maximum time limit exceeded bug
Author | Message |
---|---|
Volunteer moderator Project administrator Project developer Project tester Project scientist Joined: 30 Aug 07 Posts: 2046 Credit: 26,480 RAC: 0 |
It seems like people are still (sigh) having this problem. Let me know if you're seeing it (and give me a host id) so I can try to debug it. --Travis |
Joined: 15 Jul 08 Posts: 383 Credit: 729,247,949 RAC: 15,852 |
It happens on all of mine unless I use an app_info.xml (all ATI). That tells me that it's an invalid flops estimate that's getting passed. The one in the app_info.xml allows enough time that the WU doesn't time out: <flops>1.0e11</flops> |
Joined: 11 Nov 07 Posts: 232 Credit: 178,229,009 RAC: 0 |
On my boxes it works just fine; it seems to depend on the configuration. I have the following... BOX 1: OS = Vista64, Driver = Catalyst 11.3, GPU = 2 X 5870, CPU = Q6600, BOINC = 6.10.60, Let BOINC use 75% of the CPU. BOX 2: OS = Vista64, Driver = Catalyst 11.6, GPU = 6970 (unlocked 6950), CPU = i7 920, BOINC = 6.10.60, Let BOINC use 75% of the CPU. |
FruehwF Joined: 28 Feb 10 Posts: 120 Credit: 109,840,492 RAC: 0 |
|
Bornerdogge Joined: 19 Sep 08 Posts: 4 Credit: 1,955,671 RAC: 0 |
I have the problem, unless I use the modified app_info.xml, as Beyond said. System: Win XP SP3 32 bits, AMD Athlon 64 X2, ATI HD4830, Catalyst 11.3, BOINC 6.12.33. |
S@NL - EStorm Joined: 15 Jul 11 Posts: 14 Credit: 5,978,191 RAC: 0 |
I also had the problem unless the app_info was used. So I tested it yesterday on BOINC 6.12.26, and you know what happened: when the WU started, the estimated speed was larger than the estimated WU size, but after the WU finished OK and the next one started, the DCF, which was just below 1, changed to 99 or 100, which caused the next one to have estimates that were OK. When I was on BOINC 6.12.33 this did not happen. |
Joined: 24 May 10 Posts: 5 Credit: 349,964,812 RAC: 1,075 |
Just thought I'd post this because I've finally gotten my ATI 4870 to work without the "elapsed time error", and thought it might be useful to others and possibly for actually fixing the problem. The recent apps posted at Arkayn address this issue. Check out their apps for your OS and download the one that fits your chip's instruction set (SSE1, SSE2, SSE3, etc.). The WUs I have been sent work with the 0.82 files. http://www.arkayn.us/forum/index.php...wnloads;cat=11
The app_info worked; the 4870 is now producing valid WUs.
Below is a post of mine from another forum describing the problem:
Just bought a used 4870 for my 2nd rig for the express purpose of crunching MW. It works fine on Collatz, but fails all MW WUs immediately. With updated drivers it would make it to 80% complete, then get a computational error. With the original disk drivers it fails at 1 second. BTW, it is not OCed. BOINC "Properties" is reporting way too high GFLOPS. System: XP Pro 32, Q6600 quad, Catalyst 10.10, BOINC 6.12.33.
Here's one of the error messages:
<core_client_version>6.10.60</core_client_version>
<![CDATA[
<message>
Incorrect function. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
Error loading Lua script 'astronomy_parameters.txt': [string "number_parameters: 4..."]:1: '<name>' expected near '4'
Error reading astronomy parameters from file 'astronomy_parameters.txt'
Trying old parameters file
Using SSE2 path
Failed to get CAL device attributes: Parameter passed in is invalid (CAL_RESULT_INVALID_PARAMETER)
Error getting device information: Parameter passed in is invalid (CAL_RESULT_INVALID_PARAMETER)
Failed to get CAL info: Parameter passed in is invalid (CAL_RESULT_INVALID_PARAMETER)
Failed to setup CAL
10:10:53 (3296): called boinc_finish
</stderr_txt>
Another error message...
Stderr output
<core_client_version>6.10.60</core_client_version>
<![CDATA[
<message>
Maximum elapsed time exceeded
</message>
<stderr_txt>
Error loading Lua script 'astronomy_parameters.txt': [string "number_parameters: 4..."]:1: '<name>' expected near '4'
Error reading astronomy parameters from file 'astronomy_parameters.txt'
Trying old parameters file
Using SSE2 path
Found 1 CAL devices
Chose device 0
Device target: CAL_TARGET_770
Revision: 2
CAL Version: 1.4.838
Engine clock: 750 Mhz
Memory clock: 900 Mhz
GPU RAM: 1024
Wavefront size: 64
Double precision: CAL_TRUE
Compute shader: CAL_TRUE
Number SIMD: 10
Number shader engines: 1
Pitch alignment: 256
Surface alignment: 256
Max size 2D: { 8192, 8192 }
Estimated iteration time 330.481667 ms
Target frequency 30.000000 Hz, polling mode 1
Dividing into 9 chunks, initially sleeping for 0 ms
Integration range: { nu_steps = 640, mu_steps = 1600, r_steps = 1400 }
Using 9 chunk(s) with sizes: 176 176 176 176 176 176 176 176 192
Integration time = 200.021940 s, average per iteration = 312.534281 ms
Integral 0 time = 202.951483 s
Likelihood time = 3.230086 s
<background_integral> 0.000928726019059 </background_integral>
<stream_integral> 353.723081457166980 130.799939675965280 1464.654321262977600 </stream_integral>
<background_likelihood> -3.602249639427543 </background_likelihood>
<stream_only_likelihood> -17.295920317294978 -4.388825045143967 -4.522211712778050 </stream_only_likelihood>
<search_likelihood> -3.096088347341647 </search_likelihood>
<search_application> milkywayathome_client separation 0.82 Windows x86 double CAL++ </search_application>
00:20:30 (3416): called boinc_finish
</stderr_txt> |
Joined: 24 Dec 07 Posts: 1947 Credit: 240,884,648 RAC: 0 |
I've seen some wu's get stuck on my 5970. Normally they complete in 1.5 to 2.5 minutes, but I've seen one at 30 minutes and still running, so I suspended and unsuspended it and it appeared to finish 'normally', with the time jumping back to about 60 sec on restart, after which it immediately completed and reported OK. Another I found with 12 minutes on the clock. Suspend and unsuspend also got it to complete, though I missed the resulting wu completion and reporting. So the question is why do they get 'stuck'? |
Joined: 24 Dec 07 Posts: 1947 Credit: 240,884,648 RAC: 0 |
Until a couple of days ago I had not seen this 'bug' but now I've had quite a few wu's get stuck. They haven't timed out with maximum time limit exceeded. Since my previous post I've had 2 more wu's get stuck. I found one after 3hrs40min and another after 3hrs20min. There was no GPU activity, but 100% CPU activity. I suspended/unsuspended both and they both jumped back to just over 1 min elapsed time and completed OK on the CPU only about 5 seconds later. They both reported OK and received credit. Other wu's are completing in the 'normal' amount of time. |
FruehwF Joined: 28 Feb 10 Posts: 120 Credit: 109,840,492 RAC: 0 |
"Until a couple of days ago I had not seen this 'bug' but now I've had quite a few wu's get stuck. They haven't timed out with maximum time limit exceeded. Since my previous post I've had 2 more wu's get stuck. I found one after 3hrs40min and another after 3hrs20min. There was no GPU activity, but 100% CPU activity. I suspended/unsuspended both and they both jumped back to just over 1 min elapsed time and completed OK on the CPU only about 5 seconds later. They both reported OK and received credit." I had this problem too, today and yesterday, on my dual GPU machine. But this has nothing to do with the time limit exceeded bug. |
Joined: 24 Dec 07 Posts: 1947 Credit: 240,884,648 RAC: 0 |
I just wish the maximum time limit would kick in and stop the wu. When I got home today I had one at 7.5hrs and counting...this is really hurting my RAC! |
Joined: 15 Jul 08 Posts: 383 Credit: 729,247,949 RAC: 15,852 |
"Until a couple of days ago I had not seen this 'bug' but now I've had quite a few wu's get stuck. They haven't timed out with maximum time limit exceeded. Since my previous post I've had 2 more wu's get stuck. I found one after 3hrs40min and another after 3hrs20min. There was no GPU activity, but 100% CPU activity. I suspended/unsuspended both and they both jumped back to just over 1 min elapsed time and completed OK on the CPU only about 5 seconds later. They both reported OK and received credit." Are they always the same type of WU or do they vary? |
Joined: 24 Dec 07 Posts: 1947 Credit: 240,884,648 RAC: 0 |
Haven't had any more get stuck. The only thing that has changed is that my 5970 is now running MW on both cores, whereas before it was only running on 1 core while Collatz was running on the other. |
Joined: 24 Dec 07 Posts: 1947 Credit: 240,884,648 RAC: 0 |
Spoke too soon. Caught a 17_3s_fix_2 wu stuck at 1hr20min when I woke up this morning... |
Joined: 24 Dec 07 Posts: 1947 Credit: 240,884,648 RAC: 0 |
And another stuck for 1hr 10min. This time a 13_3s_free_2 wu. |
Acid303 Joined: 5 Mar 11 Posts: 3 Credit: 31,381,479 RAC: 0 |
host ID: 268557 pls fix |
Joined: 24 Dec 07 Posts: 1947 Credit: 240,884,648 RAC: 0 |
I decided to run 2 wu's at a time per core to see if this would at least overcome losing RAC, and it worked. Just found another stuck wu, this time an 82_2s_mix1_1 wu at 3hrs19min. Suspended/Unsuspended/Completed OK/Granted Credit. Core stayed at 100% and kept on crunching other wu's. |
Joined: 15 Jul 08 Posts: 383 Credit: 729,247,949 RAC: 15,852 |
"I decided to run 2 wu's at a time per core to see if this would at least overcome losing RAC, and it worked. Just found another stuck wu, this time an 82_2s_mix1_1 wu at 3hrs19min. Suspended/Unsuspended/Completed OK/Granted Credit. Core stayed at 100% and kept on crunching other wu's." Glad you found a workaround. I've been watching for this since you've been posting and so far haven't seen it. Most of my machines are dual GPU but no 5970s. |
IrateAdmin Joined: 6 Apr 11 Posts: 7 Credit: 59,288,856 RAC: 0 |
I am having this problem:
7/31/2011 5:15:46 PM | Milkyway@home | Aborting task ps_separation_13_3s_fix20_2_1207178_1: exceeded elapsed time limit 118.68 (3958564.39G/33354.06G)
7/31/2011 5:15:46 PM | Milkyway@home | Aborting task ps_separation_13_3s_fix20_2_1207177_1: exceeded elapsed time limit 118.68 (3958564.39G/33354.06G)
7/31/2011 5:15:46 PM | Milkyway@home | Starting task ps_separation_13_3s_fix20_2_1207173_1 using milkyway version 82
7/31/2011 5:15:46 PM | Milkyway@home | Starting task ps_separation_13_3s_fix20_2_1207172_1 using milkyway version 82
7/31/2011 5:16:46 PM | Milkyway@home | Computation for task ps_separation_13_3s_fix20_2_1207178_1 finished
7/31/2011 5:16:46 PM | Milkyway@home | Computation for task ps_separation_13_3s_fix20_2_1207177_1 finished
7/31/2011 5:17:46 PM | Milkyway@home | Aborting task ps_separation_13_3s_fix20_2_1207173_1: exceeded elapsed time limit 118.68 (3958564.39G/33354.06G)
7/31/2011 5:17:46 PM | Milkyway@home | Aborting task ps_separation_13_3s_fix20_2_1207172_1: exceeded elapsed time limit 118.68 (3958564.39G/33354.06G)
7/31/2011 5:17:46 PM | Milkyway@home | Starting task ps_separation_13_3s_fix20_2_1207171_1 using milkyway version 82
7/31/2011 5:17:46 PM | Milkyway@home | Starting task ps_separation_13_3s_fix20_2_1207170_1 using milkyway version 82
7/31/2011 5:18:46 PM | Milkyway@home | Computation for task ps_separation_13_3s_fix20_2_1207173_1 finished
7/31/2011 5:18:46 PM | Milkyway@home | Computation for task ps_separation_13_3s_fix20_2_1207172_1 finished
7/31/2011 5:19:47 PM | Milkyway@home | Aborting task ps_separation_13_3s_fix20_2_1207171_1: exceeded elapsed time limit 118.68 (3958564.39G/33354.06G)
7/31/2011 5:19:47 PM | Milkyway@home | Aborting task ps_separation_13_3s_fix20_2_1207170_1: exceeded elapsed time limit 118.68 (3958564.39G/33354.06G)
http://milkyway.cs.rpi.edu/milkyway/results.php?hostid=297896 |
Rick Joined: 30 Jul 11 Posts: 1 Credit: 35,309 RAC: 0 |
I don't get any error messages, however I stopped receiving work a couple of days ago. I have only been on the project for a few days. |
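
For anyone trying to check the "invalid flops estimate" theory against their own logs, here is a rough Python sketch of the arithmetic. It assumes that the two figures in parentheses in the abort lines quoted in the thread are the task's operation bound and the client's speed estimate, both in units of 1e9, and that the limit is simply the first divided by the second. The helper names `parse_abort` and `elapsed_limit` are made up for this illustration and are not part of BOINC.

```python
import re

# One abort line copied from the client log quoted in the thread.
LOG_LINE = (
    "7/31/2011 5:15:46 PM | Milkyway@home | Aborting task "
    "ps_separation_13_3s_fix20_2_1207178_1: exceeded elapsed time limit "
    "118.68 (3958564.39G/33354.06G)"
)

# Assumed layout: limit in seconds, then (operation bound / speed estimate),
# each followed by "G" meaning units of 1e9.
ABORT_RE = re.compile(
    r"exceeded elapsed time limit (?P<limit>[\d.]+) "
    r"\((?P<bound>[\d.]+)G/(?P<speed>[\d.]+)G\)"
)


def parse_abort(line):
    """Return (reported_limit_s, bound_ops, speed_ops_per_s), or None."""
    match = ABORT_RE.search(line)
    if match is None:
        return None
    return (
        float(match["limit"]),
        float(match["bound"]) * 1e9,
        float(match["speed"]) * 1e9,
    )


def elapsed_limit(bound_ops, speed_ops_per_s):
    """Time limit in seconds under the assumed bound / speed relation."""
    return bound_ops / speed_ops_per_s


reported, bound, speed = parse_abort(LOG_LINE)
recomputed = elapsed_limit(bound, speed)

# Same task bound, but with the <flops>1.0e11</flops> value from app_info.xml.
override_limit = elapsed_limit(bound, 1.0e11)

print(f"reported limit:   {reported:.2f} s")
print(f"recomputed limit: {recomputed:.2f} s")
print(f"limit at 1.0e11:  {override_limit:.0f} s")
```

Under those assumptions the recomputed limit agrees with the 118.68 seconds in the log, while the same bound divided by the `1.0e11` value from the app_info.xml gives a limit of roughly 11 hours. That would be consistent with the reports that tasks stop timing out once the speed estimate is overridden.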