ATI generate some invalids and nVidia not
Author | Message |
---|---|
Send message Joined: 8 May 09 Posts: 3315 Credit: 519,943,276 RAC: 22,449 |
The drivers I had in September (before the update to OpenCL) didn't have the OpenCL feature, so they won't work. That was the reason I updated the drivers from ATI. I am using the newer 13.1 drivers on an AMD GPU at Moo and it IS crunching units faster, not a lot faster but some. It seems to be using 20 fewer GPU seconds and 10 fewer CPU seconds too; a unit that normally takes closer to 60 minutes is now around 57 minutes. Both numbers are averages and NOT done with a calculator. |
Send message Joined: 27 Apr 10 Posts: 35 Credit: 90,828,595 RAC: 0 |
I have run a few hundred thousand WUs on my 5850, with very few failures. Most of those were due to system crashes or other issues. They're great cards for projects that actually support them. |
Send message Joined: 1 Aug 08 Posts: 4 Credit: 115,508,956 RAC: 0 |
Yes, there seems to be an issue with the code of the N-Body Simulation on ATI/AMD for me as well. It does not seem to be with the GPU either, as all of my MilkyWay@Home v1.02 (opencl_nvidia) and MilkyWay@Home v1.00 units complete without errors. Here are two work unit results:

name: de_nbody_100K_1_30_13_1358941502_20715
application: MilkyWay@Home N-Body Simulation
created: 31 Jan 2013 | 15:35:16 UTC
minimum quorum: 2
initial replication: 3
max # of error/total/success tasks: 3, 9, 6
errors: Too many errors (may have bug)

Name: de_nbody_100K_1_30_13_1358941502_56031_0
Workunit: 304634204
Created: 2 Feb 2013 | 9:44:23 UTC
Sent: 2 Feb 2013 | 9:51:28 UTC
Received: 2 Feb 2013 | 15:38:31 UTC
Server state: Over
Outcome: Computation error
Client state: Compute error
Exit status: -1073741511 (0xffffffffc0000139) Unknown error number
Computer ID: 356381
Report deadline: 14 Feb 2013 | 9:51:28 UTC
Run time: 0.00
CPU time: 0.00
Validate state: Invalid
Credit: 0.00
Application version: MilkyWay@Home N-Body Simulation v1.06
Stderr output:
<core_client_version>7.0.28</core_client_version>
<![CDATA[
<message>
- exit code -1073741511 (0xc0000139)
</message>
]]>

396318720 306279395 29021 5 Feb 2013 | 17:18:45 UTC 5 Feb 2013 | 23:47:01 UTC Error while computing 151.15 0.00 --- MilkyWay@Home N-Body Simulation v1.06
396318719 306279394 29021 5 Feb 2013 | 17:18:45 UTC 5 Feb 2013 | 23:47:01 UTC Error while computing 154.50 0.00 --- MilkyWay@Home N-Body Simulation v1.06
396318718 306279393 29021 5 Feb 2013 | 17:18:45 UTC 5 Feb 2013 | 23:47:01 UTC Error while computing 153.25 0.00 --- MilkyWay@Home N-Body Simulation v1.06
396318717 306279392 29021 5 Feb 2013 | 17:18:45 UTC 5 Feb 2013 | 23:47:01 UTC Error while computing 149.31 0.00 --- MilkyWay@Home N-Body Simulation v1.06

My video card on this machine is a Gigabyte HD6850, factory overclocked. My CPU is an AMD X4 640 and it is overclocked a bit, but I am sure that is not the issue.
This computer crunches all my other projects fine and I 5 others with AMD cards and rarely ever have any units error out. The other thing that makes me think bug is that all the workunits with errors are almost exactly 150 seconds long. I didn't really think much about the errors till I read this post, so I decided to give my feedback. I guess I will just uncheck the N-Body app until they get it sorted out. |
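The two spellings of the exit status in the report above (-1073741511 and 0xc0000139) are the same 32-bit value, read once as a signed decimal and once as unsigned hex. A minimal Python sketch of that conversion follows. The function names are made up for illustration. As background not confirmed anywhere in this thread: 0xC0000139 is the Windows NTSTATUS value STATUS_ENTRYPOINT_NOT_FOUND, which usually points at a DLL/entry-point mismatch rather than at the GPU.

```python
def to_unsigned32(code: int) -> int:
    """Reinterpret a signed exit code as an unsigned 32-bit value."""
    return code & 0xFFFFFFFF


def to_signed32(value: int) -> int:
    """Reinterpret an unsigned 32-bit value as a signed exit code."""
    value &= 0xFFFFFFFF
    # If the top bit is set, the signed reading is negative.
    return value - (1 << 32) if value & 0x80000000 else value


exit_status = -1073741511  # value copied from the task report above

print(hex(to_unsigned32(exit_status)))           # 32-bit view
print(hex(exit_status & 0xFFFFFFFFFFFFFFFF))     # 64-bit sign-extended view
print(to_signed32(0xC0000139))                   # back to the signed form
```

The 64-bit view is why the task page shows 0xffffffffc0000139 while the stderr output shows 0xc0000139.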
Send message Joined: 1 Aug 08 Posts: 4 Credit: 115,508,956 RAC: 0 |
Typo from the end. Just to let everyone know I think it is an ATI issue: "This computer crunches all my other projects fine and I 5 others with AMD cards" Should read "This computer crunches all my other projects fine and I have 5 others with NVIDIA cards that do not have errors" |
Send message Joined: 1 Aug 08 Posts: 4 Credit: 115,508,956 RAC: 0 |
Just one more post and then I will wait for a reply. Looking over some of the failed workunits, I discovered that it does not seem to be an ATI or AMD issue, as I found a bunch of others with Nvidia cards and Intel CPUs that are having errors. It definitely seems to be just that application. I count 4 others on this workunit that errored out. http://milkyway.cs.rpi.edu/milkyway/workunit.php?wuid=299078972

Workunit 299078972
name: de_nbody_orphan_real_CHISQ_1356215205_558387
application: MilkyWay@Home N-Body Simulation
created: 21 Jan 2013 | 12:35:11 UTC
minimum quorum: 2
initial replication: 3
max # of error/total/success tasks: 3, 9, 6
errors: Too many errors (may have bug)

Task; Computer; Sent; Time reported or deadline; Status; Run time (sec); CPU time (sec); Credit; Application
386004668; 459250; 21 Jan 2013 | 12:44:45 UTC; 21 Jan 2013 | 20:40:33 UTC; Completed, can't validate; 15.08; 9.41; 0.00; MilkyWay@Home N-Body Simulation v1.04
386239118; 495021; 21 Jan 2013 | 20:50:20 UTC; 24 Jan 2013 | 12:38:12 UTC; Completed, can't validate; 7.53; 5.52; 0.00; MilkyWay@Home N-Body Simulation v1.04
387885762; 362466; 24 Jan 2013 | 12:38:25 UTC; 5 Feb 2013 | 12:38:25 UTC; Timed out - no response; 0.00; 0.00; ---; MilkyWay@Home N-Body Simulation v1.04
396167838; 29021; 5 Feb 2013 | 12:45:39 UTC; 5 Feb 2013 | 17:18:45 UTC; Error while computing; 150.13; 0.00; ---; MilkyWay@Home N-Body Simulation v1.06
396321853; 496301; 5 Feb 2013 | 17:23:08 UTC; 5 Feb 2013 | 17:25:41 UTC; Error while computing; 0.00; 0.00; ---; MilkyWay@Home N-Body Simulation v1.06
396325924; 351074; 5 Feb 2013 | 17:29:53 UTC; 6 Feb 2013 | 2:14:19 UTC; Error while computing; 0.00; 0.00; ---; MilkyWay@Home N-Body Simulation v1.06
396611703; 479231; 6 Feb 2013 | 2:22:03 UTC; 6 Feb 2013 | 4:15:12 UTC; Error while computing; 0.00; 0.00; ---; MilkyWay@Home N-Body Simulation v1.06 |
Send message Joined: 19 Jul 10 Posts: 578 Credit: 18,845,239 RAC: 856 |
N-Body runs on CPU, not GPU, so it has nothing to do with ATI or nVidia. Issues with those WUs should be posted in the news forum in the corresponding thread. |
Send message Joined: 22 Feb 13 Posts: 7 Credit: 131,958,276 RAC: 0 |
Hi all, I've been reading this thread anxiously, hoping to find the reason why my new 7970 has a success rate of less than 15.9% on GPU-crunched workunits. Total WUs for the GPU: 395, of which 63 valid, 197 invalid, 135 pending. Settings are no more than 6 CPUs used, 16 GB RAM. I was using a GT 520 before and everything was fine. I know it's CUDA, but I wanted what was considered the best cruncher around, and I'm very disappointed by the failure rate. I'm using the latest official drivers from AMD. I just started MilkyWay to see if the invalids were app-specific, because most happen with Einstein. Unfortunately I can't seem to see all my MilkyWay results through BOINC, so I cannot tell whether the WUs are valid or not. I have a total score, but I've also crunched WUs with the CPU, so I can't conclude that my total is solely from GPU WUs. Can someone help me? |
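The 15.9% figure counts pending units as if they had already failed. As a rough illustration, the Python sketch below computes the rate both ways from the counts quoted in the post (63 valid, 197 invalid, 135 pending). The function name and the split into "of total" versus "of decided" are illustrative choices, not anything BOINC itself reports.

```python
def validation_rates(valid: int, invalid: int, pending: int) -> dict:
    """Return success rates with and without pending work units."""
    total = valid + invalid + pending
    decided = valid + invalid  # units the validator has already judged
    return {
        "valid_of_total": valid / total if total else 0.0,
        "valid_of_decided": valid / decided if decided else 0.0,
        "invalid_of_decided": invalid / decided if decided else 0.0,
    }


# Counts copied from the post above.
rates = validation_rates(valid=63, invalid=197, pending=135)
for name, value in rates.items():
    print(f"{name}: {value:.1%}")
```

Either way the invalid share is far too high for a healthy card, but the second view is the one to watch, since pending units can still go either way.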
Send message Joined: 4 Oct 11 Posts: 38 Credit: 309,729,457 RAC: 0 |
State: All (131) | In progress (25) | Pending (64) | Valid (37) | Invalid (1) | Error (4). Looks good to me so far. All new and returning users for this project need to prove themselves; at least that's how it works for me. While the pending ones all show "validation inconclusive", that is 100% normal for this project. After MilkyWay validates some more (not sure how many more), and if you still keep the real invalids down to 0 or 1, your results should start validating on their own: MilkyWay is one of the few projects (that I participate in) that have a minimum quorum of 1 (not sure how they do this). Have patience and watch your real INVALIDS. As far as Einstein goes: if you check your Programs and Features and you have Intel's OpenCL installed, try removing it as a test, as it sometimes interferes with OpenCL on other devices. 7.0.52 is supposed to detect this, but I'm not sure if it fixes it. |
Send message Joined: 22 Feb 13 Posts: 7 Credit: 131,958,276 RAC: 0 |
Where in BOINC can I access the status of a particular WU? It's easy in Einstein and SETI, but in MilkyWay... the only tab I have is Home Page. |
Send message Joined: 4 Oct 11 Posts: 38 Credit: 309,729,457 RAC: 0 |
Homeboy, Homepage --> Returning participants --> "Your account - view stats, modify preferences" --> Tasks |
Send message Joined: 22 Feb 13 Posts: 7 Credit: 131,958,276 RAC: 0 |
Thanks... unfortunate that we can't get those stats directly through tabs in the BOINC app. Here goes: does this look like normal behavior? All (254) | In progress (49) | Pending (45) | Valid (152) | Invalid (1) | Error (7) Application: All (254) | MilkyWay@Home (253) | MilkyWay@Home N-Body Simulation (1) | Milkyway@Home Separation (0) |
Send message Joined: 4 Oct 11 Posts: 38 Credit: 309,729,457 RAC: 0 |
No invalids since that first one. The errors come in groups of three; possibly something is resetting your 79xx ATI. Since I run Nvidia, I cannot help with driver versions and which are best or worst for MilkyWay; someone else who has been through the Catalyst versions should jump in now. Are you running more than one GPU job at a time? Good luck, it seems to be running well otherwise. |
Send message Joined: 8 Feb 08 Posts: 261 Credit: 104,050,322 RAC: 0 |
Your invalid WU is only valid up to 7 or 8 digits and your error WUs are showing lots of "NAN" in the results. What type of 7970 is it exactly? "Clock frequency: 1000 Mhz" Is it factory overclocked or done by yourself? Try setting it back to default clock and see if you still get errors. |
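To make "only valid up to 7 or 8 digits" concrete, here is a hedged Python sketch of how two results can be compared by the number of significant digits they share, with NaN treated as an automatic mismatch. The sample values and the `required_digits=12` threshold are invented for illustration; the thread does not state the tolerance the project's validator actually uses.

```python
import math


def matching_digits(a: float, b: float) -> float:
    """Approximate number of leading significant digits a and b share."""
    if not (math.isfinite(a) and math.isfinite(b)):
        return 0.0  # NaN or infinity never matches anything
    if a == b:
        return math.inf
    rel = abs(a - b) / max(abs(a), abs(b))
    return max(0.0, -math.log10(rel))


def results_agree(a: float, b: float, required_digits: float = 12) -> bool:
    """Hypothetical check: do two results agree to the required precision?"""
    return matching_digits(a, b) >= required_digits


# Hypothetical pair of results that drift apart after about 7 digits.
mine, wingman = 1.23456789, 1.23456780
print(round(matching_digits(mine, wingman), 1))
print(results_agree(mine, wingman))
print(results_agree(float("nan"), wingman))
```

A card that is slightly unstable at its clock speed tends to produce exactly this pattern: results that are nearly right, plus occasional NaN values, which is why dropping back to the default clock is a reasonable first test.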
Send message Joined: 22 Feb 13 Posts: 7 Credit: 131,958,276 RAC: 0 |
It's an XFX at 1000 MHz, not overclocked by me or by hardware. It's not the GHz Edition at 1050 MHz. Pretty stable, no artifacts in benchmarks. |
Send message Joined: 22 Feb 13 Posts: 7 Credit: 131,958,276 RAC: 0 |
I've downclocked from the manufacturer's 1000 MHz to the original 925 MHz. I'll try it overnight and see the results. Thanks for the tips. I'll keep you posted on results. |
Send message Joined: 8 May 09 Posts: 3315 Credit: 519,943,276 RAC: 22,449 |
"I've downclocked from the manufacturer's 1000 MHz to the original 925 MHz. I'll try it overnight and see the results. Thanks for the tips. I'll keep you posted on results."
One other thing: are you using all of your CPU cores to crunch with, or are you leaving one or more free? Also, how many GPU units are you running at one time? |
Send message Joined: 22 Feb 13 Posts: 7 Credit: 131,958,276 RAC: 0 |
Well, I only have 2 invalid and 10 compute errors on over 1200 units since Friday. Seems to be within a tolerable error rate. I had stated it before, but I only use 6 cores out of 8. My error rate is way too high with Einstein... 85% plus errors, so I think I'll stick with MilkyWay. Got to say that it's really crunching away with this baby. ;-) |
Send message Joined: 8 May 09 Posts: 3315 Credit: 519,943,276 RAC: 22,449 |
"Well, I only have 2 invalid and 10 compute errors on over 1200 units since Friday. Seems to be within a tolerable error rate. I had stated it before, but I only use 6 cores out of 8. My error rate is way too high with Einstein... 85% plus errors, so I think I'll stick with MilkyWay. Got to say that it's really crunching away with this baby. ;-)"
That is GOOD news!!! But how many MW units do you crunch at the same time with that GPU? I know it is a 3 GB unit; do you crunch multiple units at once? I am thinking of getting one and am wondering how you are doing it. And YES, I too like the XFX ones; the warranty is just too good to pass up! Although, 'knock on wood', I have NEVER had a GPU totally die on me. I do have to spray the Power Lube in mine when the gunk builds up and the fans stop spinning, but they are usually back up and running in a day or so. |
Send message Joined: 22 Feb 13 Posts: 7 Credit: 131,958,276 RAC: 0 |
Just one, but it takes 48 seconds per unit so I don't think it's necessary to overdo it. BTW, can it crunch more than one at a time? I don't have a CrossFire setup, so one GPU one unit, 2 GPUs 2 units... although it would be costly and pretty hot in that case. The CPU is crunching 6 others with a mix of Einstein and SETI (but not the GPU projects). I've downclocked the GPU to original factory specs and it's been fine so far. The RAC went from 0 to 47k in 3 days... guess it is working. ;-) |
Send message Joined: 8 May 09 Posts: 3315 Credit: 519,943,276 RAC: 22,449 |
"Just one, but it takes 48 seconds per unit so I don't think it's necessary to overdo it. BTW, can it crunch more than one at a time? I don't have a CrossFire setup, so one GPU one unit, 2 GPUs 2 units... although it would be costly and pretty hot in that case. The CPU is crunching 6 others with a mix of Einstein and SETI (but not the GPU projects). I've downclocked the GPU to original factory specs and it's been fine so far. The RAC went from 0 to 47k in 3 days... guess it is working."
YES, it IS possible, but no, it doesn't always help. I tried it on my 5870s and it did NOT help. But I only have 1 GB of RAM and each unit takes a couple of minutes or so to finish; when I ran 2 units at once it jumped to almost 9 minutes per unit. To run more than one unit at once in Windows, use NOTEPAD to create this file:
<app_config>
  <app>
    <name>milkyway</name>
    <gpu_versions>
      <gpu_usage>0.5</gpu_usage>
      <cpu_usage>0.05</cpu_usage>
    </gpu_versions>
  </app>
</app_config>
Save the text file as app_config.xml in your Milkyway directory (….\BOINC\projects\milkyway.cs.rpi_milkyway). The line <gpu_usage>0.5</gpu_usage> tells your GPU to run 2 units at once; to run 3 you would use 0.33, to run 4 units you would use 0.25, etc. You also MUST be using a version of BOINC at least as new as 7.0.40 for this type of file to work. You can always get the latest version of BOINC here: http://boinc.berkeley.edu/dl/?C=M;O=D |
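As a quick sanity check on the numbers in the post above, the Python sketch below turns a `gpu_usage` value into the number of units that fit on one GPU, and compares effective time per unit when running one versus two at once. It assumes that tasks are packed until their `gpu_usage` fractions add up to at most 1, which matches the 0.5 / 0.33 / 0.25 examples given; the function names are made up, and the 2.0-minute single-unit time is a stand-in for "a couple of minutes or so".

```python
import math


def concurrent_tasks(gpu_usage: float) -> int:
    """How many tasks fit on one GPU if each reserves gpu_usage of it."""
    # Small epsilon guards against floating-point results like 3.9999999.
    return math.floor(1.0 / gpu_usage + 1e-9)


def effective_minutes_per_unit(minutes_per_task: float, concurrent: int) -> float:
    """Wall-clock minutes per finished unit when `concurrent` run side by side."""
    return minutes_per_task / concurrent


for usage in (1.0, 0.5, 0.33, 0.25):
    print(usage, "->", concurrent_tasks(usage), "unit(s) at once")

# Figures from the post: about 2 minutes alone, almost 9 minutes each when doubled up.
single = effective_minutes_per_unit(2.0, 1)
doubled = effective_minutes_per_unit(9.0, concurrent_tasks(0.5))
print(f"one at a time: {single} min/unit, two at a time: {doubled} min/unit")
```

With those figures, doubling up is slower per finished unit, which is the poster's point: running several units at once only pays off if the per-unit time grows by less than the number of units running together.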