Posts by BeemerBiker
1) Message boards : Number crunching : Errors (Message 66636)
Posted 17 Sep 2017 by Profile BeemerBiker
The ones named "fixed" work, but they are scattered among so many un-fixed ones that I am aborting them all.

[EDIT] After aborting 170, my system downloaded another 110, of which only 2 were "fixed". They are downloading faster than I can abort them. I should have stuck with my bitcoin mining.
2) Message boards : Number crunching : New Nvidia Driver 378.49 Causing Computation Errors (Message 66207)
Posted 19 Feb 2017 by Profile BeemerBiker
I just realized that one of my 1070s quit working, which could explain the failures. However, it was working fine on GPUGrid today and completed a long run, then switched to MilkyWay when the queue ran out. Rebooting fixed the second 1070.
3) Message boards : Number crunching : New Nvidia Driver 378.49 Causing Computation Errors (Message 66206)
Posted 19 Feb 2017 by Profile BeemerBiker
I have the same problem: over 900 tasks in total, with 460 erroring out just today within about 2 seconds each. Yesterday 420 ran OK, taking 2 minutes each.

Pair of GTX 1070s, not in SLI. Every now and then, in today's batch, there is one that is valid, but I have to look hard to find it.

Here is a typical error
<core_client_version>7.6.33</core_client_version>
<![CDATA[
<message>
Incorrect function.
(0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
<search_application> milkyway_separation 1.43 Windows x86 double OpenCL </search_application>
Reading preferences ended prematurely
BOINC GPU type suggests using OpenCL vendor 'NVIDIA Corporation'
Error loading Lua script 'astronomy_parameters.txt': [string "number_parameters: 4..."]:1: '<name>' expected near '4'
Switching to Parameter File 'astronomy_parameters.txt'
<number_WUs> 5 </number_WUs>
<number_params_per_WU> 20 </number_params_per_WU>
Using SSE3 path
Found 1 platform
Platform 0 information:
Name: NVIDIA CUDA
Version: OpenCL 1.2 CUDA 8.0.0
Vendor: NVIDIA Corporation
Extensions: cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_d3d9_sharing cl_nv_d3d10_sharing cl_khr_d3d10_sharing cl_nv_d3d11_sharing cl_nv_copy_opts
Profile: FULL_PROFILE
Using device 1 on platform 0
Found 1 CL device
Requested device is out of range of number found devices
Failed to select a device (1): MW_CL_ERROR
Failed to get information about device
Error getting device and context (1): MW_CL_ERROR
Failed to calculate likelihood
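
The failing step is the device selection: BOINC passed device index 1, but only one CL device was enumerated (one of the 1070s had dropped out, as noted in message 66207 above). Purely as an illustration, and not the project's code, here is a minimal C sketch of that same check against the NVIDIA OpenCL platform; it assumes the OpenCL headers and library are available, e.g. from the CUDA toolkit.

    /* Minimal diagnostic sketch (not MilkyWay@Home code): ask the NVIDIA
     * OpenCL platform how many GPU devices it reports and compare that with
     * the device index BOINC handed to the app.  Build with e.g.
     *   gcc check_device.c -lOpenCL
     */
    #include <stdio.h>
    #include <string.h>
    #include <CL/cl.h>

    int main(void)
    {
        cl_uint requested = 1;            /* index from "Using device 1" above */
        cl_platform_id platforms[8];
        cl_uint num_platforms = 0;

        if (clGetPlatformIDs(8, platforms, &num_platforms) != CL_SUCCESS
            || num_platforms == 0) {
            fprintf(stderr, "No OpenCL platforms found\n");
            return 1;
        }

        for (cl_uint p = 0; p < num_platforms; ++p) {
            char vendor[256] = "";
            clGetPlatformInfo(platforms[p], CL_PLATFORM_VENDOR,
                              sizeof vendor, vendor, NULL);
            if (strstr(vendor, "NVIDIA") == NULL)
                continue;                 /* only the NVIDIA platform matters here */

            cl_uint num_devices = 0;
            if (clGetDeviceIDs(platforms[p], CL_DEVICE_TYPE_GPU,
                               0, NULL, &num_devices) != CL_SUCCESS)
                num_devices = 0;

            printf("NVIDIA platform reports %u GPU device(s)\n", num_devices);
            if (requested >= num_devices)
                printf("Requested device %u is out of range, "
                       "matching the stderr above\n", requested);
        }
        return 0;
    }

With both 1070s healthy the count comes back as 2 and the check passes; with one card gone, index 1 no longer exists and each task errors out within a couple of seconds.
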
4) Message boards : Number crunching : Computation errors. (Message 60290)
Posted 3 Nov 2013 by Profile BeemerBiker
Any machine that you've ever run an AMD driver installation on will have a similar "C:\AMD\Support\..." folder tree (IIRC, they don't let you unpack the download to any disk other than C:), and you can search it for 'OpenCL.msi'


I think that explains why all OpenCL tasks are failing on one of my GTX 460 machines while another similar 460 machine runs OpenCL fine. I once had an ATI video board in it. After reading Richard's ATI info post, I uninstalled the AMD Catalyst suite using "express uninstall manager" and rebooted, but it seems I still have the same problem:

Before rebooting there were 7 or so MW tasks that had a compute error and one that had not started yet, as I had managed to suspend the project. After rebooting I resumed the MW project and the tasks started up and failed immediately. An additional task was downloaded and it also failed.

It would appear that there is still some ATI OpenCL driver somewhere causing a problem. This system is an old Tyan S2892 server with onboard ATI video (unused and disabled via jumper), two GTX 460s, and Win7 x64 Pro. It would appear I will have to use some driver cleaner to get rid of the ATI OpenCL. Alternatively, there is some other problem; maybe a guru can spot one here.
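
A quick way to check whether an ATI OpenCL runtime is still registered is to list the platforms the ICD loader can see. This is only a diagnostic sketch of my own (assuming the OpenCL headers are installed, e.g. with the NVIDIA CUDA toolkit), not anything BOINC or MW ships; if an AMD/ATI platform still appears after the Catalyst uninstall, a stale registration would explain the failures.

    /* List every OpenCL platform the ICD loader reports, with vendor and
     * version strings.  A leftover "AMD Accelerated Parallel Processing"
     * entry after removing the ATI card would point at the stale driver. */
    #include <stdio.h>
    #include <CL/cl.h>

    int main(void)
    {
        cl_platform_id platforms[16];
        cl_uint n = 0;

        if (clGetPlatformIDs(16, platforms, &n) != CL_SUCCESS || n == 0) {
            printf("No usable OpenCL runtime is registered\n");
            return 1;
        }

        printf("Found %u OpenCL platform(s)\n", n);
        for (cl_uint i = 0; i < n; ++i) {
            char name[256] = "", vendor[256] = "", version[256] = "";
            clGetPlatformInfo(platforms[i], CL_PLATFORM_NAME,    sizeof name,    name,    NULL);
            clGetPlatformInfo(platforms[i], CL_PLATFORM_VENDOR,  sizeof vendor,  vendor,  NULL);
            clGetPlatformInfo(platforms[i], CL_PLATFORM_VERSION, sizeof version, version, NULL);
            printf("  %u: %s | %s | %s\n", i, name, vendor, version);
        }
        return 0;
    }
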

EDIT - Collatz and Prime run fine on this system but are not using opencl.
5) Message boards : Number crunching : Computation errors. (Message 60267)
Posted 1 Nov 2013 by Profile BeemerBiker
Suddenly, all errors on an AMD-CPU system with a pair of GTX 460s:
http://milkyway.cs.rpi.edu/milkyway/results.php?hostid=285405&offset=0&show_names=0&state=0&appid=

The other systems have Intel CPUs and show no errors.
6) Message boards : Number crunching : AMD GPU Computation errors (Message 59981)
Posted 24 Sep 2013 by Profile BeemerBiker
I have the same driver, which uses OpenCL 1268.1, but have BOINC 7.0.28

    d975x2x-2 WIN7x64
    13 2013-09-24 8:11:10 AM ATI GPU 0: Cypress (CAL version 1.4.1848, 1024MB, 991MB available, 4176 GFLOPS peak)
    14 2013-09-24 8:11:10 AM OpenCL: ATI GPU 0: Cypress (driver version 1268.1 (VM), device version OpenCL 1.2 AMD-APP (1268.1), 1024MB, 991MB available)



This should be working but it isn't. I have since restricted WUs to only do "fit".


NOTE: THESE ARE ALL OPEN CL APPS.



You should really update from that version, it had some serious problems with OpenCL.



I am running Catalyst 13.9, the latest non-beta driver. That driver includes OpenCL 1268.1, as shown in the quoted log.

7) Message boards : Number crunching : AMD GPU Computation errors (Message 59977)
Posted 24 Sep 2013 by Profile BeemerBiker
"Modified Fit" seems to be the only tasks working. All others generating computation error on my 5850
http://milkyway.cs.rpi.edu/milkyway/results.php?hostid=497329

Yes, I have the same problem on my 6970. :(


I have a 6950 and they ARE working for me:
Task 570963671 (WU 430971605): sent 23 Sep 2013 9:55:40 UTC, reported 23 Sep 2013 14:35:48 UTC, Completed and validated, run time 8,842.33 s, CPU time 8,816.48 s, credit 267.19, Milkyway@Home Separation (Modified Fit) v1.28
http://milkyway.cs.rpi.edu/milkyway/result.php?resultid=570963671

It is running Catalyst version 13.6 and Boinc 7.0.64, the 64 bit version.



Yeah - but the "fit" tasks are the only ones working. You don't seem to be using OpenCL either. AFAICT, none of your WUs are OpenCL apps like mine.
8) Message boards : Number crunching : AMD GPU Computation errors (Message 59976)
Posted 24 Sep 2013 by Profile BeemerBiker
I have the same driver, which uses OpenCL 1268.1, but have BOINC 7.0.28

    d975x2x-2 WIN7x64
    13 2013-09-24 8:11:10 AM ATI GPU 0: Cypress (CAL version 1.4.1848, 1024MB, 991MB available, 4176 GFLOPS peak)
    14 2013-09-24 8:11:10 AM OpenCL: ATI GPU 0: Cypress (driver version 1268.1 (VM), device version OpenCL 1.2 AMD-APP (1268.1), 1024MB, 991MB available)



This should be working but it isn't. I have since restricted WUs to only do "fit".


NOTE: THESE ARE ALL OPEN CL APPS.

9) Message boards : Number crunching : AMD GPU Computation errors (Message 59970)
Posted 24 Sep 2013 by Profile BeemerBiker
"Modified Fit" seems to be the only tasks working. All others generating computation error on my 5850
http://milkyway.cs.rpi.edu/milkyway/results.php?hostid=497329
10) Message boards : Number crunching : MW app is misidentifying gpu (Message 56945)
Posted 18 Jan 2013 by Profile BeemerBiker

The HD7750 got replaced by a GTX-650 Ti.


Interesting - I picked up a GTX 650 Ti last week and am disappointed, at least here at MilkyWay, with its performance. It is taking 1494 seconds to complete a WU, compared to 744 seconds for an old GTX 280. However, it does work better in PrimeGrid's double-precision "Genefer" than my GTX 280. It also fits nicely on a motherboard that cannot take even a 2/3-length card.

I have been working on an Excel spreadsheet that shows results from running that PrimeGrid challenge --

11) Message boards : Number crunching : MW app is misidentifying gpu (Message 56941)
Posted 18 Jan 2013 by Profile BeemerBiker
SOLVED - I HAD TO UNINSTALL THE ATI DRIVER.

After uninstalling and rebooting, that GPU Caps program was able to run the OpenCL NVIDIA demo.

Question: I wonder if OpenCL will run on both boards at the same time? It seems to me that both BOINC and that GPU Caps utility should be able to discriminate between opencl_nvidia and opencl_ati.
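
For what it's worth, the OpenCL ICD model should allow exactly that: each vendor's driver registers its own platform, a program can enumerate both, and each platform gets its own context, so NVIDIA and ATI OpenCL work can run at the same time. Here is a rough illustration (my own sketch, not BOINC or GPU Caps code, and assuming both vendors' runtimes are installed):

    /* Enumerate every OpenCL platform and create a separate context on the
     * first GPU device of each.  With both an NVIDIA and an ATI driver
     * installed, two platforms show up and both contexts can be used
     * concurrently (e.g. opencl_nvidia and opencl_ati apps side by side). */
    #include <stdio.h>
    #include <CL/cl.h>

    int main(void)
    {
        cl_platform_id platforms[8];
        cl_uint num_platforms = 0;
        clGetPlatformIDs(8, platforms, &num_platforms);

        for (cl_uint p = 0; p < num_platforms; ++p) {
            char vendor[256] = "";
            clGetPlatformInfo(platforms[p], CL_PLATFORM_VENDOR,
                              sizeof vendor, vendor, NULL);

            cl_device_id dev;
            cl_uint num_devices = 0;
            if (clGetDeviceIDs(platforms[p], CL_DEVICE_TYPE_GPU,
                               1, &dev, &num_devices) != CL_SUCCESS
                || num_devices == 0)
                continue;

            cl_int err = CL_SUCCESS;
            cl_context ctx = clCreateContext(NULL, 1, &dev, NULL, NULL, &err);
            printf("Platform '%s': context %s\n", vendor,
                   err == CL_SUCCESS ? "created" : "failed");
            if (err == CL_SUCCESS)
                clReleaseContext(ctx);    /* each vendor runtime is independent */
        }
        return 0;
    }
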

12) Message boards : Number crunching : MW app is misidentifying gpu (Message 56940)
Posted 18 Jan 2013 by Profile BeemerBiker
It is set for "default" and for default I use
cpu: no
amd: yes
nvidia: yes

MW does not keep a long enough history for me to check previous WUs to verify that the GTX 460 had worked two weeks ago. There is a BOINC client post about how BOINC checks for the OpenCL library in a subdirectory here, so it appears that BOINC decides whether OpenCL is available. I wonder if that is only for ATI and not NVIDIA, or vice versa.

The GPU Caps utility shows I have OpenCL 1.1.


I did not bother uninstalling the ATI driver. Maybe that needs to be done to avoid confusion.

[EDIT] I just ran the "OpenCL demo" and IT FAILED. It said something to the effect that "OpenCL is not supported on this platform". Obviously this is a problem with my drivers and the removal of the ATI card, not MW or BOINC. Maybe it was trying to run the ATI OpenCL that was left behind, as I didn't uninstall ATI???
13) Message boards : Number crunching : MW app is misidentifying gpu (Message 56932)
Posted 17 Jan 2013 by Profile BeemerBiker
In a system that had both a GTX 460 and an HD 5850, I pulled out the ATI Cypress board and temporarily put in a second NVIDIA GTX 460 here for a PrimeGrid challenge that cannot use ATI. That challenge completed today and I turned MW back on without putting the HD 5850 back in. MW is rejecting all NVIDIA tasks, even though they used to run when a single GTX 460 was still in the system alongside the HD 5850.


On the 9th (the day I put in the second GTX 460) it seems I don't have an NVIDIA GPU, even though it is shown here:

    1/9/2013 1:50:10 PM NVIDIA GPU 0: GeForce GTX 460 (driver version 310.90, CUDA version 5.0, compute capability 2.1, 1024MB, 8381367MB available, 907 GFLOPS peak)
    1/9/2013 1:50:10 PM NVIDIA GPU 1: GeForce GTX 460 (driver version 310.90, CUDA version 5.0, compute capability 2.1, 1024MB, 940MB available, 907 GFLOPS peak)
    1/9/2013 1:50:10 PM App version needs OpenCL but GPU doesn't support it
    Milkyway@Home 1/9/2013 1:50:10 PM Application uses missing NVIDIA GPU
    1/9/2013 1:50:10 PM Config: use all coprocessors
    1/9/2013 1:50:10 PM Config: GUI RPC allowed from any host



Then, starting about 11am local time, I re-enabled MW and got these error messages. There were no old MW tasks, and all the error messages were from new tasks that were downloaded after I set "allow new tasks".


    1/17/2013 11:51:00 AM App version needs OpenCL but GPU doesn't support it
    Milkyway@Home 1/17/2013 11:51:00 AM Scheduler request completed: got 16 new tasks
    Milkyway@Home 1/17/2013 11:51:00 AM [error] App version uses non-existent NVIDIA GPU
    Milkyway@Home 1/17/2013 11:51:00 AM [error] Missing coprocessor for task de_separation_11_sSgr_1_1356215205_12166343_1; aborting



MW is simply rejecting all the NVIDIA tasks, thinking they were for an ATI card.

I assume a project reset will correct the problem, but, IMHO, this shows a problem with recognition of video boards in the MW (or maybe BOINC) code. The HD 5850 has a fan problem and it may be some time before I get it back in.


[EDIT]
The thread named "App version needs OpenCL but GPU doesn't support it" describes the same problem I have. However, I have another system with three GTX 460s and the same BOINC 7.0.28 version, and it has no problem with the GTX 460. The problem here is that the ATI card used to run MW tasks and is now missing, and MW (or BOINC??) has gotten confused.

14) Message boards : Number crunching : getting perameter, memory alloc, and milkway ignoring cc directive (Message 54241)
Posted 29 Apr 2012 by Profile BeemerBiker
I have read that the exclude stuff is being ignored in the cc_config.xml file and you MUST put it in the other file again. Sorry, brain fart, I JUST forgot the name of the other file! There are two major ways to tweak BOINC: the cc_config.xml file and.....? It is the 2nd one you have to use again.


This was working at one time, so something is broken. I did recently move the two GTS 250s (which are in SLI) from slots 2 and 3 to slots 1 and 2, and put the GTX 460 into slot 3. My motherboard has 3 PCIe slots. This move was to mitigate overheating. When I rebooted, I checked the message dialog box (events) and verified that devices 1 and 2 were still the GTS 250s.

I assume the devices are consistently numbered 0, 1, 2 in both places, and not 0, 1, 2 in one and 1, 2, 3 in the other.

Since I posted, I upgraded to BOINC Manager 7.0.25 but have yet to receive any MilkyWay tasks. Eventually I will get some and will see if the problem has disappeared.
15) Message boards : Number crunching : getting perameter, memory alloc, and milkway ignoring cc directive (Message 54229)
Posted 29 Apr 2012 by Profile BeemerBiker
Got a bunch of errors in a row on my GTX 570, as shown here for example.

A different type of error on the GTX 460 system that has two GTS 250s in addition to the GTX 460.

Although I have configured the cc file to ignore the GTS 250s, it seems they are being handed tasks that can only run on my GTX 460:

    13 2012-04-29 12:19:53 AM NVIDIA GPU 0: GeForce GTX 460 (driver version 296.10, CUDA version 4.20, compute capability 2.1, 1024MB, 954MB available, 907 GFLOPS peak)
    14 2012-04-29 12:19:53 AM NVIDIA GPU 1: GeForce GTS 250 (driver version 296.10, CUDA version 4.20, compute capability 1.1, 1024MB, 965MB available, 705 GFLOPS peak)
    15 2012-04-29 12:19:53 AM NVIDIA GPU 2: GeForce GTS 250 (driver version 296.10, CUDA version 4.20, compute capability 1.1, 1024MB, 970MB available, 705 GFLOPS peak)
    16 2012-04-29 12:19:53 AM OpenCL: NVIDIA GPU 0: GeForce GTX 460 (driver version 296.10, device version OpenCL 1.1 CUDA, 1024MB, 954MB available)
    17 2012-04-29 12:19:53 AM OpenCL: NVIDIA GPU 1: GeForce GTS 250 (driver version 296.10, device version OpenCL 1.0 CUDA, 1024MB, 965MB available)
    18 2012-04-29 12:19:53 AM OpenCL: NVIDIA GPU 2: GeForce GTS 250 (driver version 296.10, device version OpenCL 1.0 CUDA, 1024MB, 970MB available)
    26 2012-04-29 12:19:53 AM Config: use all coprocessors
    27 Milkyway@Home 2012-04-29 12:19:53 AM Config: excluded GPU. Type: all. App: all. Device: 1
    28 Milkyway@Home 2012-04-29 12:19:53 AM Config: excluded GPU. Type: all. App: all. Device: 2



As shown here, the task was handed to the GTS 250, which is a mistake. Although acknowledged in the event log above, it would appear that the following is being ignored:


    <cc_config>
    <log_flags>
    </log_flags>
    <options>
    <use_all_gpus>1</use_all_gpus>
    <exclude_gpu>
    <url>http://milkyway.cs.rpi.edu/milkyway/</url>
    <device_num>1</device_num>
    </exclude_gpu>
    <exclude_gpu>
    <url>http://milkyway.cs.rpi.edu/milkyway/</url>
    <device_num>2</device_num>
    </exclude_gpu>
    <exclusive_app>PlantsVsZombies.exe</exclusive_app>
    </options>
    </cc_config>




BOINC 7.0.18 and Win7 x64

16) Message boards : Number crunching : tasks being sent to wrong gpu card (Message 53529)
Posted 3 Mar 2012 by Profile BeemerBiker
http://boinc.berkeley.edu/wiki/Client_configuration :

<exclude_gpu>
Don't use the given GPU for the given project. If <device_num> is not specified, exclude all GPUs of the given type. <type> is required if your computer has more than one type of GPU; otherwise it can be omitted. <app> specifies the short name of an application (i.e. the <name> element within the <app> element in client_state.xml). If specified, only tasks for that app are excluded. You may include multiple <exclude_gpu> elements. (New in 6.13 )

<exclude_gpu>
<url>project_URL</url>
[<device_num>N</device_num>]
[<type>nvidia|ati</type>]
[<app>appname</app>]
</exclude_gpu>


and

From the code of mw sep v1.0x there is a command line param
--device [Device number passed by BOINC to use]

No idea if and how that works.
Anyone with 2 gpus willing to test it?


I put the following together and am running it under 7.0.18


    <cc_config>
    <log_flags>
    </log_flags>
    <options>
    <use_all_gpus>1</use_all_gpus>
    <exclude_gpu>
    <url>http://milkyway.cs.rpi.edu/milkyway/</url>
    <device_num>1</device_num>
    </exclude_gpu>
    <exclude_gpu>
    <url>http://milkyway.cs.rpi.edu/milkyway/</url>
    <device_num>2</device_num>
    </exclude_gpu>
    </options>
    </cc_config>



Results from reading cc_config follow.


    13 2012-03-03 8:28:21 AM NVIDIA GPU 0: GeForce GTX 460 (driver version 285.62, CUDA version 4.10, compute capability 2.1, 1024MB, 933MB available, 907 GFLOPS peak)
    14 2012-03-03 8:28:21 AM NVIDIA GPU 1: GeForce GTS 250 (driver version 285.62, CUDA version 4.10, compute capability 1.1, 1024MB, 970MB available, 705 GFLOPS peak)
    15 2012-03-03 8:28:21 AM NVIDIA GPU 2: GeForce GTS 250 (driver version 285.62, CUDA version 4.10, compute capability 1.1, 1024MB, 937MB available, 705 GFLOPS peak)

    <snip>
    94 2012-03-03 8:39:29 AM Re-reading cc_config.xml
    95 2012-03-03 8:39:29 AM Config: use all coprocessors
    96 Milkyway@Home 2012-03-03 8:39:29 AM Config: excluded GPU. Type: all. App: all. Device: 1
    97 Milkyway@Home 2012-03-03 8:39:29 AM Config: excluded GPU. Type: all. App: all. Device: 2



I also verified it was working: MilkyWay was assigned to device 0 only, and the other two GPUs were idle during that time.

17) Message boards : Number crunching : OpenCL video card assignment issue (Message 53493)
Posted 29 Feb 2012 by Profile BeemerBiker
I delayed switching one of my machines over after reading about this but today took the plunge, upgraded the ATI drivers to 12.1 (Win7-64) and switched MW to use the automatic open_cl app. I'm using BOINC 6.12.43 and it works fine, running MW on the HD5870 and HD4770 cards while crunching PG on the GTX460. So far no hitches at all.


FWIW, I had to go back to 11.6 as my HD 4890 started failing on the Collatz project. My HD 5850 handled the 12.1 upgrade just fine.

If you are running Collatz, I am curious whether your 4770 handles it OK.

As for MW, my system that had the 4890 and 12.1 kept giving this message:
    Milkyway@Home 2012-02-29 6:03:28 AM Message from server: Catalyst driver version is not OK for OpenCL application with this GPU


I have not seen this message since I went back to 11.6.

18) Message boards : Number crunching : tasks being sent to wrong gpu card (Message 53263)
Posted 18 Feb 2012 by Profile BeemerBiker
I still have this problem, but now it is with a GTX 570 and a GTS 250. The 570 is a full-length, triple-width card and the GTS 250 a 2/3-length, double-width card that just barely fits. I do not see any way to route MW tasks away from the 250, so I will stop MW on this system.
19) Message boards : Number crunching : cuda_opencl tasks not terminating (Message 50318)
Posted 21 Jul 2011 by Profile BeemerBiker
I can verify this problem and it is easily repeatable. The following copy-and-paste is from the BOINC forum, as I did not see this thread before I posted.

======from boinc core forum===

I thought I would mention the following problems and successes. I upgraded two systems (one with an HD 5850 + GTX 570, the other with an HD 4890) from 6.12.26 to 6.13.0, and a day later to 6.13.1.

(1) I got my first successful MilkyWay result on the HD 4890. Previously, all the work units timed out after about 80 seconds. The HD 4890 was a warranty replacement for my burned-out GTX 280, as xfxforce was out of GTX 280s.

(2) PrimeGrid tasks never start on the HD 4890. I understand this is a problem with the ATI 11.6 driver, from what I read on the PrimeGrid forum. Moo! Wrapper runs fine on the HD 4890. The 5850 does not have any of these problems; it seems to affect only the 4890 series.

For both 6.13.0 and 6.13.1, closing BOINC Manager stopped all CPU tasks and most GPU tasks for all projects, but the CUDA version of MilkyWay failed to stop. Starting BOINC Manager back up gave me two instances of MilkyWay CUDA, but the one left running when it exited quickly went into "waiting to run". This was easily repeatable.

When the ATI version of MilkyWay terminates ("ready to report"), both monitors flicker noticeably, with a second flicker about 1/2 second later when the next MilkyWay work unit starts up. Hotfix 11.6b was supposed to fix the flickering, but it had no effect with BOINC 6.13 (MilkyWay project).

Anyway, I was very happy to be able to crunch MilkyWay using my ATI 4890. Only Moo! Wrapper worked when I first put the board into the system to replace the GTX 280.
20) Message boards : News : maximum time limit elapsed bug (Message 50281)
Posted 20 Jul 2011 by Profile BeemerBiker
Switching to BOINC 6.13 got me my first valid unit for my ATI 4890
http://milkyway.cs.rpi.edu/milkyway/results.php?hostid=261689

Previously, the timeout interval was 82 seconds (from my understanding of the stderr) and my units would fail between 40% and 60% complete. After switching from 6.12.26 to 6.13, all work units are validating.

I have not yet tested PrimeGrid, which also failed with the 4890 board. The driver is 11.6 from ATI, and when I brought up the Catalyst Control Center, I got a driver failure and a message from CCC that it was switching to compatibility mode (whatever that means). I continued to get MW failures even after rebooting, and then decided to switch to 6.13.

The ATI 4890 was an xfxforce warranty replacement for my defective NVIDIA GTX 280. It seems they ran out of GTX 280s.

[EDIT]
Hmm... spoke too soon. I got failures on two MW units on the 5850. I had not had any errors on that system before upgrading to 6.13.

PrimeGrid is still failing on this 4890, but so far, all MilkyWay units are getting to completion.

