Invalid WUs
Joined: 19 Dec 07 | Posts: 4 | Credit: 19,943,618 | RAC: 0
I am getting about a 10% invalid rate on one of my HD 4850s, but cannot figure out why. The stderr out text looks exactly the same as for a valid WU. Here is what it looks like:

Name de_s222_3s_best_4p_05r_22_596170_1259083707_0
Workunit 597171
Created 24 Nov 2009 17:28:30 UTC
Sent 24 Nov 2009 17:29:11 UTC
Received 24 Nov 2009 17:35:17 UTC
Server state Over
Outcome Success
Client state Done
Exit status 0 (0x0)
Computer ID 118608
Report deadline 27 Nov 2009 17:29:11 UTC
Run time 52.78125

stderr out
<core_client_version>6.10.17</core_client_version>
<![CDATA[
<stderr_txt>
Running Milkyway@home ATI GPU application version 0.20b (Win32, x87, CAL 1.4) by Gipsel
instructed by BOINC client to use device 1
CPU: Pentium(R) Dual-Core CPU E5200 @ 2.50GHz (2 cores/threads) 2.63307 GHz (409ms)
CAL Runtime: 1.4.255
Found 2 CAL devices
Device 0: ATI Radeon HD4700/4800 (RV740/RV770) 512 MB local RAM (remote 64 MB cached + 256 MB uncached)
GPU core clock: 700 MHz, memory clock: 993 MHz
800 shader units organized in 10 SIMDs with 16 VLIW units (5-issue), wavefront size 64 threads
supporting double precision
Device 1: ATI Radeon HD4700/4800 (RV740/RV770) 512 MB local RAM (remote 64 MB cached + 256 MB uncached)
GPU core clock: 700 MHz, memory clock: 993 MHz
800 shader units organized in 10 SIMDs with 16 VLIW units (5-issue), wavefront size 64 threads
supporting double precision
Starting WU on GPU 1
main integral, 320 iterations
predicted runtime per iteration is 150 ms (33.3333 ms are allowed), dividing each iteration in 5 parts
borders of the domains at 0 320 640 960 1280 1600
Calculated about 8.22242e+012 floatingpoint ops on GPU, 1.23583e+008 on FPU.
Approximate GPU time 52.7813 seconds.
probability calculation (stars)
Calculated about 3.34818e+009 floatingpoint ops on FPU.
WU completed.
CPU time: 6.625 seconds, GPU time: 52.7813 seconds, wall clock time: 56.508 seconds, CPU frequency: 2.63308 GHz
</stderr_txt>
]]>

Validate state Invalid
Claimed credit 0.242683513700349
Granted credit 0
application version 0.20

Does anyone out there see anything I am missing? I can see no reason for it to be invalid.
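As an aside on reading that stderr output: the split into 5 parts follows directly from the two timing numbers the app prints, with each kernel call kept under the allowed time slice. A minimal sketch of that arithmetic (an editorial illustration only, not the application's actual code; the variable names are invented):

```python
import math

# Numbers taken from the stderr output above.
predicted_ms_per_iteration = 150.0   # "predicted runtime per iteration is 150 ms"
allowed_ms_per_call = 100.0 / 3.0    # "33.3333 ms are allowed"
domain_size = 1600                   # upper border reported in the log

# Split each iteration into enough parts that no single GPU call
# exceeds the allowed time slice.
parts = math.ceil(predicted_ms_per_iteration / allowed_ms_per_call)       # -> 5
borders = [i * domain_size // parts for i in range(parts + 1)]            # -> [0, 320, ..., 1600]

print(parts, borders)
```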
Joined: 30 Aug 07 | Posts: 2046 | Credit: 26,480 | RAC: 0
Gonna see what's up. |
Joined: 30 Aug 07 | Posts: 2046 | Credit: 26,480 | RAC: 0
Going through your results, it looks like the ones that have an error are returning NaN or some very strange value (an impossible result). There might be some kind of hardware issue. Is your GPU overclocked?
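To make the failure mode concrete: a task can exit cleanly (exit status 0) and still be marked invalid if the number it hands back is NaN or outside any plausible range. A rough sketch of that kind of server-side sanity check (purely illustrative; the function name and acceptance window are made up, not the project's actual validator):

```python
import math

def looks_sane(likelihood: float) -> bool:
    # NaN/inf fail outright; the numeric window below is an arbitrary
    # placeholder, not the project's real acceptance criterion.
    return math.isfinite(likelihood) and -1e6 < likelihood < 0.0

print(looks_sane(float("nan")))  # False -> result would be flagged invalid
print(looks_sane(-3.2))          # True  -> plausible-looking value
```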
Joined: 19 Dec 07 | Posts: 4 | Credit: 19,943,618 | RAC: 0
Yes Travis. They were both mildly overclocked. I have set them both back to factory default. Let's see what happens now. Thanks. |
Joined: 19 Dec 07 | Posts: 4 | Credit: 19,943,618 | RAC: 0
Well that fixed it I think. No invalids for over 2 hours now. Strange that they were both overclocked equally but only one was having the problem. Oh well, full steam ahead again. |
Joined: 12 Aug 09 | Posts: 172 | Credit: 645,240,165 | RAC: 0
OK, CUDA working fine, but ATI showing Computation error for all. Windows coming up with:

astronomy_0.20b_ATI_amd.exe has stopped working

I click on the close program button, then it moves to the next task and errors that one as well, as below:

25/11/2009 1:54:31 p.m. Milkyway@home Computation for task de_s222_3s_best_4p_05r_25_689985_1259092066_0 finished
25/11/2009 1:54:31 p.m. Milkyway@home Output file de_s222_3s_best_4p_05r_25_689985_1259092066_0_0 for task de_s222_3s_best_4p_05r_25_689985_1259092066_0 absent
25/11/2009 1:54:33 p.m. Milkyway@home Computation for task de_s222_3s_best_4p_05r_25_689986_1259092066_0 finished
25/11/2009 1:54:33 p.m. Milkyway@home Output file de_s222_3s_best_4p_05r_25_689986_1259092066_0_0 for task de_s222_3s_best_4p_05r_25_689986_1259092066_0 absent
25/11/2009 1:54:34 p.m. Milkyway@home Computation for task de_s222_3s_best_4p_05r_25_689993_1259092066_0 finished
25/11/2009 1:54:34 p.m. Milkyway@home Output file de_s222_3s_best_4p_05r_25_689993_1259092066_0_0 for task de_s222_3s_best_4p_05r_25_689993_1259092066_0 absent
25/11/2009 1:54:34 p.m. Milkyway@home Computation for task de_s222_3s_best_4p_05r_25_689987_1259092066_0 finished
25/11/2009 1:54:34 p.m. Milkyway@home Output file de_s222_3s_best_4p_05r_25_689987_1259092066_0_0 for task de_s222_3s_best_4p_05r_25_689987_1259092066_0 absent

My startup config:

24/11/2009 10:44:58 p.m. Starting BOINC client version 6.10.17 for windows_x86_64
24/11/2009 10:44:58 p.m. Config: use all coprocessors
24/11/2009 10:44:58 p.m. log flags: file_xfer, sched_ops, task
24/11/2009 10:44:58 p.m. Libraries: libcurl/7.19.4 OpenSSL/0.9.8k zlib/1.2.3
24/11/2009 10:44:58 p.m. Data directory: C:\Users\Public\boinc data
24/11/2009 10:44:58 p.m. Running under account David
24/11/2009 10:44:58 p.m. Processor: 4 AuthenticAMD AMD Phenom(tm) II X4 965 Processor [AMD64 Family 16 Model 4 Stepping 2]
24/11/2009 10:44:58 p.m. Processor: 512.00 KB cache
24/11/2009 10:44:58 p.m. Processor features: fpu tsc pae nx sse sse2 pni
24/11/2009 10:44:58 p.m. OS: Microsoft Windows Vista: Home Premium x64 Edition, Service Pack 2, (06.00.6002.00)
24/11/2009 10:44:58 p.m. Memory: 8.00 GB physical, 16.05 GB virtual
24/11/2009 10:44:58 p.m. Disk: 498.05 GB total, 386.85 GB free
24/11/2009 10:44:58 p.m. Local time is UTC +13 hours
24/11/2009 10:44:58 p.m. ATI GPU 0: ATI Radeon HD 4700/4800 (RV740/RV770) (CAL version 1.4.427, 1024MB, 1000 GFLOPS peak)
24/11/2009 10:44:58 p.m. ATI GPU 1: ATI Radeon HD 4700/4800 (RV740/RV770) (CAL version 1.4.427, 1024MB, 1000 GFLOPS peak)
24/11/2009 10:44:58 p.m. Milkyway@home Found app_info.xml; using anonymous platform

It was running Collatz while the project was down, and still is without problems. I will try installing 6.10.18 and see if that makes a difference.
Joined: 30 Aug 07 | Posts: 2046 | Credit: 26,480 | RAC: 0
Well that fixed it I think. No invalids for over 2 hours now. Strange that they were both overclocked equally but only one was having the problem. Oh well, full steam ahead again.

Well processor manufacturing isn't an exact science, so some processors overclock better than others :)
Joined: 30 Dec 07 | Posts: 311 | Credit: 149,490,184 | RAC: 0
OK, CUDA working fine, but ATI showing Computation error for all.

Not sure, but it looks like you have the incorrect MilkyWay application variant installed. I think you need the 64-bit one for Catalyst drivers 9.2 and above; that variant contains "astronomy_0.20b_ATI_x64_ati.exe". You will need to change all 3 files to the correct version, not just the application itself. Details on which application variant is the correct one for your operating system and Catalyst driver version are in the readme file included with the download. If you are swapping graphics cards between computers with different Windows operating systems (32-bit and 64-bit) and different Catalyst driver versions installed, you have to make sure the MilkyWay application variant matches the operating system and Catalyst driver version installed on that box.
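Since all three files have to match, a quick way to confirm nothing got left behind after swapping variants is to check that every file named in the readme is actually present in the project directory before touching app_info.xml. A throwaway sketch (the directory path is assumed from the BOINC data directory shown above, and the commented-out .dll entries are placeholders; take the real names from the readme for your variant):

```python
import os

# Assumed project directory (BOINC data dir from the post above plus the
# usual per-project folder name); adjust to your actual installation.
project_dir = r"C:\Users\Public\boinc data\projects\milkyway.cs.rpi.edu_milkyway"

# The executable name is quoted in the post; the two support .dll names
# vary by variant and are listed in the readme, so they are left as placeholders.
variant_files = [
    "astronomy_0.20b_ATI_x64_ati.exe",
    # "first_support_library.dll",   # placeholder - see the readme
    # "second_support_library.dll",  # placeholder - see the readme
]

for name in variant_files:
    present = os.path.exists(os.path.join(project_dir, name))
    print(f"{name}: {'ok' if present else 'MISSING'}")
```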
Joined: 28 Jan 09 | Posts: 31 | Credit: 85,934,108 | RAC: 0
Has something changed with the workunits? A machine that has been running well is now erroring out:

de_s222_3s_best_4p_05r_21_433001_1259232804_0
Workunit 1245689
Created 26 Nov 2009 10:53:26 UTC
Sent 26 Nov 2009 10:54:10 UTC
Received 26 Nov 2009 10:55:03 UTC
Server state Over
Outcome Client error
Client state Compute error
Exit status -1073741515 (0xffffffffc0000135)
Computer ID 87699
Report deadline 29 Nov 2009 10:54:10 UTC
Run time 0

stderr out
<core_client_version>6.10.18</core_client_version>
<![CDATA[
<message>
- exit code -1073741515 (0xc0000135)
</message>
]]>

Validate state Invalid
Claimed credit 0
Granted credit 0
application version 0.20

Every unit I receive is like this. What is exit status -1073741515?
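For reference on that exit status: it is a 32-bit Windows NTSTATUS code printed as a signed decimal. Converted back to hex it is 0xC0000135, the standard Windows STATUS_DLL_NOT_FOUND error, meaning the application could not load a DLL it depends on. A one-liner shows the conversion:

```python
# Mask the signed 32-bit exit status back to unsigned hex to recover
# the familiar Windows NTSTATUS form.
code = -1073741515
print(hex(code & 0xFFFFFFFF))  # 0xc0000135, i.e. STATUS_DLL_NOT_FOUND
```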
Joined: 12 Aug 09 | Posts: 172 | Credit: 645,240,165 | RAC: 0
Thanks. All running again; it seems that when I changed CAL versions it took out the AMD*.dll files as well. FYI: the 5970 is showing a 25% speed improvement over the 4850. Not really good value, but still, we need to try. Cheers.
Joined: 28 Jan 09 | Posts: 31 | Credit: 85,934,108 | RAC: 0
Don't tell me that; I have two in the post. In fact, that has prompted me to go write down my questions about the best 5970 app_info setup in the Optimized App thread. I was wondering what coproc value I need to utilize all those shaders, i.e. if it's a 25% speed increase per unit and I can run double the units concurrently, then that's a lot more than a 25% performance boost. Here's hoping :) Otherwise I will sell the 5970s and keep the 4850x2s running, if 0.5 is still the best answer :)
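Spelling out the arithmetic in that post (a hypothetical best case based on the poster's own figures, not a measurement):

```python
# Back-of-the-envelope version of the reasoning above: if each task runs
# ~25% faster than on a 4850 and a fractional coproc count of 0.5 lets two
# tasks share one GPU, the gains multiply - assuming (optimistically) that
# the two concurrent tasks do not slow each other down.
per_task_speedup = 1.25   # "25% speed increase per unit", from the post
tasks_per_gpu = 2         # coproc count 0.5 -> two tasks per GPU

print(f"~{per_task_speedup * tasks_per_gpu:.2f}x throughput vs. one task on the 4850")
```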
Joined: 30 Dec 07 | Posts: 311 | Credit: 149,490,184 | RAC: 0
Thanks. Your 5970s seem to be throttling on the second core. GPUs 0 and 2 are taking about 30 seconds, which is about right for 1600 shaders at 725 MHz, but GPUs 1 and 3 are taking about 50 seconds, which is slower than they should be. I would experiment with reducing the load by only running one concurrent task, and/or increasing the w value from your current setting of 1.1. Perhaps you could also experiment with the f parameter to try to stop the throttling. Even if it slows down your fast GPU 0 and 2 cores a bit, your total throughput should be greater, and there will be less strain on your cards if you can reduce the load and stop the throttling on GPU cores 1 and 3. Have you reduced the memory speed for MilkyWay? Currently, with the hot weather, I am running my 4890 on MilkyWay with w 1.2 to keep the temperature under 90°C. I always run MilkyWay with the memory set at 500 MHz. If I am running Collatz with memory at 1,000 MHz and switch back to MilkyWay but forget to reduce the memory to 500 MHz, the temperature quickly rises to over 100°C, which is too hot for my liking.
Joined: 3 Jul 09 | Posts: 1 | Credit: 22,557,885 | RAC: 0
New WUs, about 4x the size of what they were before. 5 WUs executed and all "Completed, marked as invalid". System: WinXP, GTX 285, driver 190.62, BOINC 6.6.36, app version 0.21 (cuda23). No problems until these new WUs.
Joined: 12 Apr 08 | Posts: 621 | Credit: 161,934,067 | RAC: 0
Not sure what my actual error rate is, but I have over 100 on my system with GTX 295 cards ... there are almost as many that have validated. I know it is possible it is my hardware, though I rate that low for the simple reason that Collatz and GPU Grid also run on these cards and I don't see this kind of error rate. I also did not see this kind of error rate prior to the extension of run times. I think you should check some of these that are coming back as invalid to see if there might be an application/validator issue ...
Joined: 12 Aug 09 | Posts: 172 | Credit: 645,240,165 | RAC: 0
Not sure what my actual error rate is, but I have over 100 on my system with GTX 295 cards ... there are almost as many that have validated. I know it is possible it is my hardware, though I rate that low for the simple reason that Collatz and GPU Grid also run on these cards and I don't see this kind of error rate. I also did not see this kind of error rate prior to the extension of run times.

My GTX 295s and GTX 260s are all OK with the new WUs, as are the 4850 and the 5970s. One happy cruncher here. Oh yeah, stats: BOINC 6.10.18, app 0.20b, CUDA driver 190.62, all on Vista.
Joined: 12 Apr 08 | Posts: 621 | Credit: 161,934,067 | RAC: 0
Well, my ATI cards have no problems with the tasks, it seems. So the question is: why do many, if not most, of them fail on CUDA cards? I have a hard time believing that 4 cards suddenly went bad...