Hanging Work Units

KWSN imcrazynow · Joined: 22 Nov 08 · Posts: 136 · Credit: 319,414,799 · RAC: 0
Message 25850 - Posted: 17 Jun 2009, 22:53:43 UTC

I've had another seriously hanging work unit. Note the GPU time and the wall clock time. This is for task ps_sgr_218F5_3s_wtest_564836_1245261734_0, work unit ID: 81999922.

<core_client_version>6.4.7</core_client_version>
<![CDATA[
<stderr_txt>
Running Milkyway@home ATI GPU application version 0.19f by Gipsel
CPU: Intel(R) Core(TM)2 Quad CPU Q6600 @ 2.40GHz (4 cores/threads) 3.54596 GHz (289ms)
CAL Runtime: 1.3.145
Found 1 CAL device
Device 0: ATI Radeon HD 4800 (RV770) 1024 MB local RAM (remote 28 MB cached + 1024 MB uncached)
GPU core clock: 750 MHz, memory clock: 900 MHz
800 shader units organized in 10 SIMDs with 16 VLIW units (5-issue), wavefront size 64 threads
supporting double precision
3 WUs already running on GPU 0
No free GPU! Waiting ... 119.516 seconds.
Starting WU on GPU 0
main integral, 320 iterations
predicted runtime per iteration is 191 ms (33.3333 ms are allowed), dividing each iteration in 6 parts
borders of the domains at 0 272 536 800 1072 1336 1600
Calculated about 9.89542e+012 floatingpoint ops on GPU, 1.23583e+008 on FPU. Approximate GPU time 16130 seconds.
probability calculation (stars)
Calculated about 3.05993e+009 floatingpoint ops on FPU.
WU completed.
CPU time: 8.85938 seconds, GPU time: 16130 seconds, wall clock time: 16389.6 seconds, CPU frequency: 3.546 GHz

I'm running the PrimeGrid Challenge on 3 cores with 1 core reserved for MW. Does anybody have any idea what may be going on here? This is the second hanging WU I've caught this week. It causes MW to stop processing on the GPU until it completes. Apparently it does complete successfully, as I'm given credit for it. Not much credit for over 16K seconds, but it does validate and get credit.
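
To put the two timings from the log side by side, here is a quick back-of-the-envelope check in Python. It assumes the application's own estimate (191 ms per iteration, 320 iterations of the main integral) is a fair measure of how long a healthy run should take on this card; the figures are copied from the log above and the variable names are purely illustrative.

```python
# Figures copied from the stderr output of task ps_sgr_218F5_3s_wtest_564836_1245261734_0.
ITERATIONS = 320            # "main integral, 320 iterations"
MS_PER_ITERATION = 191      # "predicted runtime per iteration is 191 ms"
REPORTED_GPU_S = 16130.0    # "GPU time: 16130 seconds"
WALL_CLOCK_S = 16389.6      # "wall clock time: 16389.6 seconds"

# Assumption: the app's per-iteration estimate is roughly what a non-stalled run takes.
predicted_gpu_s = ITERATIONS * MS_PER_ITERATION / 1000.0
ratio = REPORTED_GPU_S / predicted_gpu_s

print(f"predicted GPU time : {predicted_gpu_s:.1f} s")
print(f"reported GPU time  : {REPORTED_GPU_S:.0f} s ({REPORTED_GPU_S / 3600:.2f} h)")
print(f"wall clock time    : {WALL_CLOCK_S:.0f} s ({WALL_CLOCK_S / 3600:.2f} h)")
print(f"reported/predicted : about {ratio:.0f}x")
```

On those inputs the main integral should have needed only about a minute of GPU time, while the log reports roughly 264 times that, and the wall clock time works out to about 4.55 hours, in line with the roughly 4.5-hour stall kashi describes.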

kashi · Joined: 30 Dec 07 · Posts: 311 · Credit: 149,490,184 · RAC: 0
Message 25858 - Posted: 18 Jun 2009, 0:40:06 UTC - in response to Message 25850.

Perhaps the CPU core you allocated to MilkyWay switched to PrimeGrid before the MilkyWay task completed. The MilkyWay task would then show as "Waiting to run" until BOINC calculated that the accumulated debt had been repaid. Then, after about 4.5 hours, it switched back and the task completed. It seems to me that if you run with an <avg_ncpus> value of less than 1, the estimated "To completion" time for MilkyWay tasks may be incorrect, and so a scheduling debt could build up. This is just a possibility; others with more experience and knowledge may be able to advise you better.

I now run only one MilkyWay task at a time with an <avg_ncpus> value of 1 and have had few problems. I had some trouble with this configuration once after stopping and restarting BOINC, so I now have a cc_config.xml file with <zero_debts>1</zero_debts> (this works with BOINC 6.6.11 and above). I also download 1.5 days of Einstein work at a time and then set Einstein to "No new tasks" so that MilkyWay will continue to download new work. This is working for me with BOINC 6.6.31, an HD 3850, a Xeon W3520 and Einstein as my other project. Your HD 48xx, Q6600 and PrimeGrid may require a different configuration.
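
The cc_config.xml kashi mentions is a plain XML file in the BOINC data directory. The minimal sketch below prints one whose only setting is zero_debts, built with Python's standard xml.etree.ElementTree so the nesting is explicit. It assumes the flag belongs inside the <options> section of cc_config.xml, as in the BOINC client configuration documentation of that era; build_cc_config is a hypothetical helper name, not anything shipped with BOINC.

```python
import xml.etree.ElementTree as ET


def build_cc_config(zero_debts: bool = True) -> str:
    """Return the text of a minimal cc_config.xml with only the zero_debts option set."""
    root = ET.Element("cc_config")
    options = ET.SubElement(root, "options")
    # Assumption: zero_debts lives under <options>; per kashi it needs BOINC 6.6.11 or later.
    ET.SubElement(options, "zero_debts").text = "1" if zero_debts else "0"
    ET.indent(root, space="    ")  # Python 3.9+: indent nested elements for readability
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Save the printed text as cc_config.xml in the BOINC data directory,
    # then restart the BOINC client so it reads the file at startup.
    print(build_cc_config())
```

Hand-editing the file works just as well; the point is only that <zero_debts> sits inside <options>, which in turn sits inside the <cc_config> root element.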

Conan · Joined: 2 Jan 08 · Posts: 123 · Credit: 69,524,618 · RAC: 953
Message 26007 - Posted: 19 Jun 2009, 12:21:32 UTC

I have also caught a WU that hung on my GPU:

WU completed.
CPU time: 7.875 seconds, GPU time: 25267.7 seconds, wall clock time: 25561 seconds,

Mine appears to have been caused by BOINC allocating too many resources to the other projects I run, which did not leave enough CPU power to drive the GPU, so it did nothing for about 7 hours. It had started and said it was running, but all that happened was the time to completion kept ticking over. As soon as I suspended a couple of WUs, MilkyWay started and completed all the work it had in the queue, including the 7-hour WU (I got a mighty 10 cr/h for that one). Now I have the unfortunate situation of the LTD (long-term debt) showing that MilkyWay owes other projects, so I can't get any work. I might have to micromanage BOINC for a while; I'm running Einstein, Docking, AQUA and Ralph at the moment.

Conan.