Welcome to MilkyWay@home

maximum time limit exceeded bug

Profile Travis
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Send message
Joined: 30 Aug 07
Posts: 2046
Credit: 26,480
RAC: 0
Message 50348 - Posted: 25 Jul 2011, 18:04:26 UTC

It seems like people are still (sigh) having this problem. Let me know if you're seeing it (and give me a host id) so I can try and debug it.

--Travis
ID: 50348 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Profile Beyond
Avatar

Send message
Joined: 15 Jul 08
Posts: 383
Credit: 729,293,740
RAC: 0
Message 50349 - Posted: 25 Jul 2011, 18:18:34 UTC

It happens on all of mine unless I use an app_info.xml (all ATI). That tells me that it's an invalid flops estimate that's getting passed. The one in the app_info.xml allows enough time that the WU doesn't time out:

<flops>1.0e11</flops>
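
To make the point about the flops estimate concrete, here is a minimal sketch rather than anything taken from the project code. It assumes that the client allows a task roughly (operation bound / estimated flops) seconds, and that `<flops>` sits inside an `<app_version>` element of app_info.xml. The wrapper element and the `FPOPS_BOUND` value are illustrative assumptions; only the `<flops>` line comes from the post.

```python
import xml.etree.ElementTree as ET

# Only the <flops> line is from the post; the <app_version> wrapper is an
# assumption about where the value lives in a full app_info.xml.
FRAGMENT = "<app_version><flops>1.0e11</flops></app_version>"


def flops_from_app_version(xml_text: str) -> float:
    """Return the <flops> value declared inside an <app_version> element."""
    root = ET.fromstring(xml_text)
    return float(root.findtext("flops"))


def elapsed_time_limit(fpops_bound: float, flops: float) -> float:
    """Assumed rule: seconds allowed = operation bound / estimated speed."""
    return fpops_bound / flops


# Hypothetical operation bound, chosen only for illustration.
FPOPS_BOUND = 4.0e15

declared = flops_from_app_version(FRAGMENT)
print(elapsed_time_limit(FPOPS_BOUND, declared))        # 40000.0 seconds
print(elapsed_time_limit(FPOPS_BOUND, declared * 100))  # 400.0 seconds
```

Under that assumption, a speed estimate that is 100 times too high cuts the allowed time by the same factor, which would explain why a modest hand-set value keeps the WU from timing out.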
ID: 50349 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Profile Simplex0
Avatar

Send message
Joined: 11 Nov 07
Posts: 232
Credit: 178,229,009
RAC: 0
Message 50350 - Posted: 25 Jul 2011, 18:33:34 UTC

On my boxes it works just fine; it seems to depend on the configuration.
I have the following...

BOX 1

OS = Vista64
Driver = Catalyst 11.3
GPU = 2 X 5870
CPU = Q6600
BOINC = 6.10.60
Let BOINC use 75% of the CPU


BOX 2

OS = Vista64
Driver = Catalyst 11.6
GPU = 6970 (unlocked 6950)
CPU = i7 920
BOINC = 6.10.60
Let BOINC use 75% of the CPU
ID: 50350 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
FruehwF

Send message
Joined: 28 Feb 10
Posts: 120
Credit: 109,840,492
RAC: 0
Message 50352 - Posted: 25 Jul 2011, 18:46:13 UTC

ID: 50352 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Bornerdogge

Send message
Joined: 19 Sep 08
Posts: 4
Credit: 1,955,671
RAC: 0
Message 50353 - Posted: 25 Jul 2011, 18:47:28 UTC

I have the problem, unless I use the modified app_info.xml, as Beyond said.

Win XP SP3 32 bits
AMD Athlon 64 X2
ATI HD4830, Catalyst 11.3
BOINC 6.12.33
ID: 50353 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
S@NL - EStorm

Send message
Joined: 15 Jul 11
Posts: 14
Credit: 5,978,191
RAC: 0
Message 50376 - Posted: 26 Jul 2011, 6:09:02 UTC
Last modified: 26 Jul 2011, 6:15:49 UTC

I also had the problem unless the app_info was used.
So I tested it yesterday on BOINC 6.12.26, and you know what happened:
When the WU started, the estimated speed was larger than the estimated WU size. After that WU finished OK and the next one started, the DCF, which had been just below 1, changed to 99 or 100, and the next WU then had estimates that were OK.
On BOINC 6.12.33 this did not happen.
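
A rough sketch of why a DCF jump would change the estimates, assuming the client of that era computes the estimated runtime as (estimated operations / estimated flops) × DCF. The formula is stated here as an assumption, and every number below is hypothetical rather than taken from this host. It only illustrates the runtime estimate and says nothing about how the abort limit is derived.

```python
def estimated_runtime(fpops_est: float, flops: float, dcf: float) -> float:
    """Assumed estimate: (operations / speed) scaled by the duration correction factor."""
    return fpops_est / flops * dcf


# Hypothetical values for illustration only; none of them come from this host.
FPOPS_EST = 4.0e14  # estimated operations in one WU
FLOPS = 3.3e13      # an inflated speed estimate

for dcf in (0.95, 100.0):
    runtime = estimated_runtime(FPOPS_EST, FLOPS, dcf)
    print(f"DCF {dcf:>6}: estimated runtime {runtime:.1f} s")
```

With the speed estimate far too high, a DCF near 1 gives an estimate of a few seconds, while a DCF near 100 stretches it back into the range of minutes.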
ID: 50376 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Profile Sphynx

Send message
Joined: 24 May 10
Posts: 5
Credit: 351,431,383
RAC: 1,121
Message 50377 - Posted: 26 Jul 2011, 11:27:52 UTC

Just thought I'd post this because I've finally gotten my ATI 4870 to work without "elapsed time error" and thought it might be useful to others and possibly for actually fixing the problem.

The recent apps posted at Arkayn address this issue. Check out the apps for your OS and download the one that fits your CPU's instruction set (SSE1, SSE2, SSE3, etc.). The WUs I have been sent work with the 0.82 files.

http://www.arkayn.us/forum/index.php...wnloads;cat=11

app_info worked. The 4870 is now producing valid WUs.


Below is a post of mine from another forum describing the problem -



Just bought a used 4870 for my 2nd rig for the express purpose of crunching MW. It works fine on Collatz, but fails all MW WUs immediately. With updated drivers it would make it to 80% complete, then get a computational error. With the original disk drivers it fails at 1 second. BTW, it is not OCed. BOINC "Properties" is reporting way too high GFLOPS.

xp pro 32
Q6600 quad
Catalyst 10.10
Boinc 6.12.33



Here's one of the error messages:

<core_client_version>6.10.60</core_client_version>
<![CDATA[
<message>
Incorrect function. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
Error loading Lua script 'astronomy_parameters.txt': [string "number_parameters: 4..."]:1: '<name>' expected near '4'
Error reading astronomy parameters from file 'astronomy_parameters.txt'
Trying old parameters file
Using SSE2 path
Failed to get CAL device attributes: Parameter passed in is invalid (CAL_RESULT_INVALID_PARAMETER)
Error getting device information: Parameter passed in is invalid (CAL_RESULT_INVALID_PARAMETER)
Failed to get CAL info: Parameter passed in is invalid (CAL_RESULT_INVALID_PARAMETER)
Failed to setup CAL
10:10:53 (3296): called boinc_finish

</stderr_txt>


Another error message...

Stderr output

<core_client_version>6.10.60</core_client_version>
<![CDATA[
<message>
Maximum elapsed time exceeded
</message>
<stderr_txt>
Error loading Lua script 'astronomy_parameters.txt': [string "number_parameters: 4..."]:1: '<name>' expected near '4'
Error reading astronomy parameters from file 'astronomy_parameters.txt'
Trying old parameters file
Using SSE2 path
Found 1 CAL devices
Chose device 0

Device target: CAL_TARGET_770
Revision: 2
CAL Version: 1.4.838
Engine clock: 750 Mhz
Memory clock: 900 Mhz
GPU RAM: 1024
Wavefront size: 64
Double precision: CAL_TRUE
Compute shader: CAL_TRUE
Number SIMD: 10
Number shader engines: 1
Pitch alignment: 256
Surface alignment: 256
Max size 2D: { 8192, 8192 }

Estimated iteration time 330.481667 ms
Target frequency 30.000000 Hz, polling mode 1
Dividing into 9 chunks, initially sleeping for 0 ms
Integration range: { nu_steps = 640, mu_steps = 1600, r_steps = 1400 }
Using 9 chunk(s) with sizes: 176 176 176 176 176 176 176 176 192
Integration time = 200.021940 s, average per iteration = 312.534281 ms
Integral 0 time = 202.951483 s
Likelihood time = 3.230086 s
<background_integral> 0.000928726019059 </background_integral>
<stream_integral> 353.723081457166980 130.799939675965280 1464.654321262977600 </stream_integral>
<background_likelihood> -3.602249639427543 </background_likelihood>
<stream_only_likelihood> -17.295920317294978 -4.388825045143967 -4.522211712778050 </stream_only_likelihood>
<search_likelihood> -3.096088347341647 </search_likelihood>
<search_application> milkywayathome_client separation 0.82 Windows x86 double CAL++ </search_application>
00:20:30 (3416): called boinc_finish

</stderr_txt>
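
As a sanity check on the second log, the figures in it appear to be internally consistent. The sketch below only re-does the arithmetic with values copied from the output above. Reading the average as "one iteration per nu step" is an inference from the numbers, not something the log states.

```python
# Values copied from the stderr output above.
nu_steps = 640
mu_steps = 1600
chunk_sizes = [176] * 8 + [192]
avg_iteration_ms = 312.534281
integration_time_s = 200.021940
integral_time_s = 202.951483
likelihood_time_s = 3.230086

# The nine chunks add up to the mu range.
print(sum(chunk_sizes) == mu_steps)

# Assumption: one iteration per nu step, so total = nu_steps * average.
predicted_s = nu_steps * avg_iteration_ms / 1000.0
print(f"{predicted_s:.3f} s predicted vs {integration_time_s:.3f} s logged")

# Compute time reported by the application itself.
total_s = integral_time_s + likelihood_time_s
print(f"about {total_s:.1f} s of reported compute time")
```

So the application reports roughly 206 s of work and a finished result, yet the task was still flagged with "Maximum elapsed time exceeded", which points at the limit rather than at the computation.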
ID: 50377 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Profile The Gas Giant
Avatar

Send message
Joined: 24 Dec 07
Posts: 1947
Credit: 240,884,648
RAC: 0
Message 50404 - Posted: 27 Jul 2011, 3:09:32 UTC

I've seen some wu's get stuck on my 5970. Normally they complete in 1.5 to 2.5 minutes, but I've seen one at 30 minutes and still running. I suspended and unsuspended it, and it appeared to finish 'normally': the time jumped back to about 60 sec on restart, and it then immediately completed and reported OK. I found another with 12 minutes on the clock. Suspend and unsuspend also got it to complete, though I missed the resulting wu completion and reporting.

So the question is why do they get 'stuck'?
ID: 50404 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Profile The Gas Giant
Avatar

Send message
Joined: 24 Dec 07
Posts: 1947
Credit: 240,884,648
RAC: 0
Message 50420 - Posted: 27 Jul 2011, 20:10:32 UTC

Until a couple of days ago I had not seen this 'bug' but now I've had quite a few wu's get stuck. They haven't timed out with maximum time limit exceeded. Since my previous post I've had 2 more wu's get stuck. I found one after 3hrs40min and another after 3hrs20min. There was no GPU activity, but 100% CPU activity. I suspended/unsuspended both and they both jumped back to just over 1 min elapsed time and completed OK on the CPU only about 5 seconds later. They both reported OK and received credit.

Other wu's are completing in the 'normal' amount of time.

ID: 50420 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
FruehwF

Send message
Joined: 28 Feb 10
Posts: 120
Credit: 109,840,492
RAC: 0
Message 50422 - Posted: 27 Jul 2011, 21:01:25 UTC - in response to Message 50420.  

Until a couple of days ago I had not seen this 'bug' but now I've had quite a few wu's get stuck. They haven't timed out with maximum time limit exceeded. Since my previous post I've had 2 more wu's get stuck. I found one after 3hrs40min and another after 3hrs20min. There was no GPU activity, but 100% CPU activity. I suspended/unsuspended both and they both jumped back to just over 1 min elapsed time and completed OK on the CPU only about 5 seconds later. They both reported OK and received credit.

Other wu's are completing in the 'normal' amount of time.



I had this problem too, today and yesterday, on my dual-GPU machine.

But this has nothing to do with the time limit exceeded bug.
ID: 50422 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Profile The Gas Giant
Avatar

Send message
Joined: 24 Dec 07
Posts: 1947
Credit: 240,884,648
RAC: 0
Message 50432 - Posted: 28 Jul 2011, 9:18:44 UTC

I just wish the maximum time limit would kick in and stop the wu. When I got home today I had one at 7.5hrs and counting...this is really hurting my RAC!
ID: 50432 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Profile Beyond
Avatar

Send message
Joined: 15 Jul 08
Posts: 383
Credit: 729,293,740
RAC: 0
Message 50433 - Posted: 28 Jul 2011, 11:30:56 UTC - in response to Message 50422.  

Until a couple of days ago I had not seen this 'bug' but now I've had quite a few wu's get stuck. They haven't timed out with maximum time limit exceeded. Since my previous post I've had 2 more wu's get stuck. I found one after 3hrs40min and another after 3hrs20min. There was no GPU activity, but 100% CPU activity. I suspended/unsuspended both and they both jumped back to just over 1 min elapsed time and completed OK on the CPU only about 5 seconds later. They both reported OK and received credit.
Other wu's are completing in the 'normal' amount of time.

I had this problem too, today and yesterday, on my dual-GPU machine.
But this has nothing to do with the time limit exceeded bug.

Are they always the same type of WU or do they vary?
ID: 50433 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Profile The Gas Giant
Avatar

Send message
Joined: 24 Dec 07
Posts: 1947
Credit: 240,884,648
RAC: 0
Message 50449 - Posted: 29 Jul 2011, 10:22:52 UTC

Haven't had any more get stuck. The only thing that has changed is that my 5970 is now running MW on both cores, whereas before it was only running on 1 core while Collatz was running on the other.
ID: 50449 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Profile The Gas Giant
Avatar

Send message
Joined: 24 Dec 07
Posts: 1947
Credit: 240,884,648
RAC: 0
Message 50452 - Posted: 29 Jul 2011, 21:40:58 UTC

Spoke too soon. Caught a 17_3s_fix_2 wu stuck at 1hr20min when I woke up this morning...
ID: 50452 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Profile The Gas Giant
Avatar

Send message
Joined: 24 Dec 07
Posts: 1947
Credit: 240,884,648
RAC: 0
Message 50453 - Posted: 30 Jul 2011, 2:13:16 UTC

And another stuck for 1hr 10min. This time a 13_3s_free_2 wu.
ID: 50453 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Acid303

Send message
Joined: 5 Mar 11
Posts: 3
Credit: 31,381,479
RAC: 0
Message 50456 - Posted: 30 Jul 2011, 7:03:04 UTC

host ID: 268557

pls fix
ID: 50456 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Profile The Gas Giant
Avatar

Send message
Joined: 24 Dec 07
Posts: 1947
Credit: 240,884,648
RAC: 0
Message 50458 - Posted: 30 Jul 2011, 9:17:28 UTC

I decided to run 2 wu's at a time per core to see if this would at least overcome losing RAC and it worked. Just found another stuck wu, this time a 82_2s_mix1_1 wu at 3hrs19min. Suspended/Unsuspended/Completed OK/Granted Credit. Core stayed at 100% and kept on crunching other wu's.
ID: 50458 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Profile Beyond
Avatar

Send message
Joined: 15 Jul 08
Posts: 383
Credit: 729,293,740
RAC: 0
Message 50459 - Posted: 30 Jul 2011, 12:43:34 UTC - in response to Message 50458.  

I decided to run 2 wu's at a time per core to see if this would at least overcome losing RAC and it worked. Just found another stuck wu, this time a 82_2s_mix1_1 wu at 3hrs19min. Suspended/Unsuspended/Completed OK/Granted Credit. Core stayed at 100% and kept on crunching other wu's.

Glad you found a workaround. I've been watching for this since you've been posting and so far haven't seen it. Most of my machines are dual GPU but no 5970s.
ID: 50459 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
IrateAdmin
Avatar

Send message
Joined: 6 Apr 11
Posts: 7
Credit: 59,288,856
RAC: 0
Message 50471 - Posted: 31 Jul 2011, 22:31:15 UTC - in response to Message 50348.  
Last modified: 31 Jul 2011, 22:32:38 UTC

I am having this problem

7/31/2011 5:15:46 PM | Milkyway@home | Aborting task ps_separation_13_3s_fix20_2_1207178_1: exceeded elapsed time limit 118.68 (3958564.39G/33354.06G)
7/31/2011 5:15:46 PM | Milkyway@home | Aborting task ps_separation_13_3s_fix20_2_1207177_1: exceeded elapsed time limit 118.68 (3958564.39G/33354.06G)
7/31/2011 5:15:46 PM | Milkyway@home | Starting task ps_separation_13_3s_fix20_2_1207173_1 using milkyway version 82
7/31/2011 5:15:46 PM | Milkyway@home | Starting task ps_separation_13_3s_fix20_2_1207172_1 using milkyway version 82
7/31/2011 5:16:46 PM | Milkyway@home | Computation for task ps_separation_13_3s_fix20_2_1207178_1 finished
7/31/2011 5:16:46 PM | Milkyway@home | Computation for task ps_separation_13_3s_fix20_2_1207177_1 finished
7/31/2011 5:17:46 PM | Milkyway@home | Aborting task ps_separation_13_3s_fix20_2_1207173_1: exceeded elapsed time limit 118.68 (3958564.39G/33354.06G)
7/31/2011 5:17:46 PM | Milkyway@home | Aborting task ps_separation_13_3s_fix20_2_1207172_1: exceeded elapsed time limit 118.68 (3958564.39G/33354.06G)
7/31/2011 5:17:46 PM | Milkyway@home | Starting task ps_separation_13_3s_fix20_2_1207171_1 using milkyway version 82
7/31/2011 5:17:46 PM | Milkyway@home | Starting task ps_separation_13_3s_fix20_2_1207170_1 using milkyway version 82
7/31/2011 5:18:46 PM | Milkyway@home | Computation for task ps_separation_13_3s_fix20_2_1207173_1 finished
7/31/2011 5:18:46 PM | Milkyway@home | Computation for task ps_separation_13_3s_fix20_2_1207172_1 finished
7/31/2011 5:19:47 PM | Milkyway@home | Aborting task ps_separation_13_3s_fix20_2_1207171_1: exceeded elapsed time limit 118.68 (3958564.39G/33354.06G)
7/31/2011 5:19:47 PM | Milkyway@home | Aborting task ps_separation_13_3s_fix20_2_1207170_1: exceeded elapsed time limit 118.68 (3958564.39G/33354.06G)


http://milkyway.cs.rpi.edu/milkyway/results.php?hostid=297896
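
The abort lines above can be checked mechanically. This is a small sketch, not part of any BOINC tool: it parses four of the lines quoted above, reads the two numbers in parentheses as (operation bound / speed estimate, both in units of 1e9), and compares the quotient with the logged limit. That reading is an assumption, although the numbers bear it out. The 100 GFLOPS what-if reuses the `<flops>1.0e11</flops>` value suggested earlier in the thread and is purely illustrative.

```python
import re
from datetime import datetime

# Four lines copied from the event log above (two tasks, start and abort).
LOG = """\
7/31/2011 5:15:46 PM | Milkyway@home | Starting task ps_separation_13_3s_fix20_2_1207173_1 using milkyway version 82
7/31/2011 5:15:46 PM | Milkyway@home | Starting task ps_separation_13_3s_fix20_2_1207172_1 using milkyway version 82
7/31/2011 5:17:46 PM | Milkyway@home | Aborting task ps_separation_13_3s_fix20_2_1207173_1: exceeded elapsed time limit 118.68 (3958564.39G/33354.06G)
7/31/2011 5:17:46 PM | Milkyway@home | Aborting task ps_separation_13_3s_fix20_2_1207172_1: exceeded elapsed time limit 118.68 (3958564.39G/33354.06G)
"""

LINE = re.compile(
    r"^(?P<ts>[^|]+?) \| [^|]+ \| (?P<verb>Starting|Aborting) task (?P<task>\w+)"
)
LIMIT = re.compile(r"limit (?P<limit>[\d.]+) \((?P<bound>[\d.]+)G/(?P<flops>[\d.]+)G\)")

starts = {}   # task name -> start time
results = []  # (task, seconds run, recomputed limit, logged limit)

for line in LOG.splitlines():
    m = LINE.match(line)
    if not m:
        continue
    when = datetime.strptime(m["ts"].strip(), "%m/%d/%Y %I:%M:%S %p")
    if m["verb"] == "Starting":
        starts[m["task"]] = when
        continue
    lim = LIMIT.search(line)
    bound_g, flops_g = float(lim["bound"]), float(lim["flops"])
    ran_s = (when - starts[m["task"]]).total_seconds()
    results.append((m["task"], ran_s, round(bound_g / flops_g, 2), float(lim["limit"])))

for task, ran_s, recomputed, logged in results:
    print(f"{task}: ran {ran_s:.0f} s, limit {logged} s (recomputed {recomputed} s)")

# What-if: the same bound with a 100 GFLOPS estimate instead of 33354.06 GFLOPS.
print(round(3958564.39 / 100.0, 1), "s allowed at 100 GFLOPS")
```

Each task was aborted 120 s after it started, just past the 118.68 s limit, and the limit equals the first number divided by the second. A speed estimate of about 33,354 GFLOPS is what shrinks the limit to under two minutes.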
ID: 50471 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Rick

Send message
Joined: 30 Jul 11
Posts: 1
Credit: 35,309
RAC: 0
Message 50486 - Posted: 2 Aug 2011, 0:45:46 UTC

I don't get any error messages, however I stopped receiving work a couple of days ago. I have only been on the project for a few days.
ID: 50486 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote

©2024 Astroinformatics Group