Welcome to MilkyWay@home

maximum time limit elapsed bug

Message boards : News : maximum time limit elapsed bug

LouisH
Joined: 13 Mar 09
Posts: 5
Credit: 1,366,490
RAC: 0
Message 50258 - Posted: 19 Jul 2011, 22:11:44 UTC

It's working with the file "app_info.xml". Thank you :)

But one CPU WU has been deleted, I think because of this :( 5 hours lost :(
Look at this:

19/07/2011 22:36:01 | | ATI GPU 0: ATI Radeon HD 4700/4800 (RV740/RV770) (CAL version 1.4.1417, 512MB, 1000 GFLOPS peak)
19/07/2011 22:36:01 | Milkyway@home | Found app_info.xml; using anonymous platform
19/07/2011 22:36:01 | Milkyway@home | [error] State file error: missing application milkyway_nbody
19/07/2011 22:36:01 | Milkyway@home | [error] Can't handle workunit in state file
19/07/2011 22:36:01 | Milkyway@home | [error] State file error: missing task ps_nbody_test3_499724
19/07/2011 22:36:01 | Milkyway@home | [error] Can't link task ps_nbody_test3_499724_0 in state file
19/07/2011 22:36:01 | Milkyway@home | [error] State file error: result ps_nbody_test3_499724_0 not found for task
ID: 50258 · Rating: 0
Joseph Stateson
Joined: 18 Nov 08
Posts: 291
Credit: 2,461,693,501
RAC: 0
Message 50281 - Posted: 20 Jul 2011, 13:22:59 UTC
Last modified: 20 Jul 2011, 14:02:08 UTC

Switching to BOINC 6.13 got me my first valid unit for my ATI 4890
http://milkyway.cs.rpi.edu/milkyway/results.php?hostid=261689

Previously, the timeout interval was 82 seconds (from my understanding of the stderr) and my units would fail between 40% and 60% complete. After switching from 6.12.26 to 6.13, all work units are validating.

I have not yet tested PrimeGrid, which also failed with the 4890 board. The driver is 11.6 from ATI, and when I brought up the Catalyst Control Center I got a driver failure and a message from CCC that it was switching to compatibility mode (whatever that means). I continued to get MW failures even after rebooting and then decided to switch to 6.13.

The ATI 4890 was an xfxforce warranty replacement for my defective nVidia GTX 280. Seems they ran out of GTX 280s.

[EDIT]
hmm ... spoke too soon. Got failures on two MW units on the 5850. Had not had any errors on that system before upgrading to 6.13

PrimeGrid is still failing on this 4890, but so far, all milkyways are getting to completion
ID: 50281 · Rating: 0
Zydor
Joined: 24 Feb 09
Posts: 620
Credit: 100,587,625
RAC: 0
Message 50286 - Posted: 20 Jul 2011, 20:28:23 UTC

hmm ... spoke too soon. Got failures on two MW units on the 5850. Had not had any errors on that system before upgrading to 6.13


Stay away from 6.13.0 ..... it was withdrawn very quickly with major bugs present. They seem to have fixed most of the ones that crept into 6.13.0 with the release of 6.13.1. However, they are all alpha releases, so I doubt 6.13.1 is really trustworthy just yet.

Stick to 6.12.33, which is the latest stable release. I doubt it will solve all (any?) of your current ills, but it's a racing certainty that 6.13.0 will not help in any way, and it is highly likely to be detrimental, to say the least.

Regards
Zy
ID: 50286 · Rating: 0
Beyond
Joined: 15 Jul 08
Posts: 383
Credit: 729,293,740
RAC: 0
Message 50290 - Posted: 20 Jul 2011, 21:31:37 UTC - in response to Message 50286.  

6.13.1 is really really bad too :(
ID: 50290 · Rating: 0
John Clark
Joined: 4 Oct 08
Posts: 1734
Credit: 64,228,409
RAC: 0
Message 50291 - Posted: 20 Jul 2011, 21:42:19 UTC

Tried to upgrade the client to the Milkyway client -

milkyway_separation_0.82_windows_intelx86_ati14, with the App_info file.

It seemed to be OK but would not download any work, and when update requests were made, FreeHAL kept resetting the work being crunched.

In the end I gave up and returned to DNETC.

I hope the stock GPU client is reworked to overcome this time lapse problem
Go away, I was asleep


ID: 50291 · Rating: 0
Zydor
Joined: 24 Feb 09
Posts: 620
Credit: 100,587,625
RAC: 0
Message 50292 - Posted: 20 Jul 2011, 22:01:00 UTC
Last modified: 20 Jul 2011, 22:08:33 UTC

Nothing dramatically new, but posting an observation as a lot of work has gone into the beasts lately.

Had three go bang in fairly short order for some reason. Temperatures are fine, PC seems stable, came out of the blue really.

<core_client_version>6.12.33</core_client_version>
<![CDATA[
<message>
Incorrect function. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
Error loading Lua script 'astronomy_parameters.txt': [string "number_parameters: 4..."]:1: '<name>' expected near '4'
Error reading astronomy parameters from file 'astronomy_parameters.txt'
Trying old parameters file
Using SSE3 path
Found 4 CAL devices
Chose device 2

I did notice the card was decidedly "draggy" after them; I had to switch tabs a few times to encourage the counter to get going. In the end I stopped and restarted the BOINC Client, which seemed to clear it fine, and away she went again. No other problems. Whether or not the bad WU(s) were stuck in the GPU after crashing and caused delays loading fresh ones, I have no idea, but it gave that impression, as all was well once the Client restarted.

Edit: Whoa .... Welcome home Murphy .... Just had a Blue Screen, first one for weeks. No CPU WUs running, so it's a "clean run" for MW. Too fast to read the Blue Screen notes, but it was definitely "ati....something" as the nominal errant file, so something is still lurking inside the GPU, application-wise. Driver appears solid at present - no driver resets happening.

Regards
Zy
ID: 50292 · Rating: 0
Beyond
Joined: 15 Jul 08
Posts: 383
Credit: 729,293,740
RAC: 0
Message 50293 - Posted: 20 Jul 2011, 22:03:35 UTC

John, one problem is that you need to update your BOINC version to current. Then we can perhaps offer suggestions.
ID: 50293 · Rating: 0
LouisH
Joined: 13 Mar 09
Posts: 5
Credit: 1,366,490
RAC: 0
Message 50294 - Posted: 20 Jul 2011, 22:03:38 UTC

My computer's graphics become slow when my GPU is working. (the windows move slowly for example).
Can we adjust the % of the GPU like the CPU? With a limit of 80% for example?
ID: 50294 · Rating: 0
Beyond
Joined: 15 Jul 08
Posts: 383
Credit: 729,293,740
RAC: 0
Message 50295 - Posted: 20 Jul 2011, 22:07:17 UTC - in response to Message 50292.  

Had three go bang in fairly short order for some reason. Temperatures are fine, PC seems stable, came out of the blue really.

Interesting because I had WUs on 3 machines do the same thing a few hours ago after those boxes had been error free for quite a while. Maybe a rash of bad WUs?
ID: 50295 · Rating: 0
Zydor
Joined: 24 Feb 09
Posts: 620
Credit: 100,587,625
RAC: 0
Message 50296 - Posted: 20 Jul 2011, 22:21:52 UTC
Last modified: 20 Jul 2011, 22:36:10 UTC

...... and another BSOD. The nominal file was "atikpmag.sys", and seeing it again, it was that one last time too. I'll turn down the card a bit and see what happens, but it ran fine at 760/300 for well over an hour. All a bit strange, see how she goes.

EDIT: Been running for 15 mins after that last BSOD @750/300 - Murphy allowing, seems ok again. Maybe 760/300 was just that little over the top, but seems strange after running with no problems for well over an hour, and 760/300 is hardly fast and furious for 2x5970s...... still, Crunch on as they say :)

Regards
Zy
ID: 50296 · Rating: 0
TJ
Joined: 12 Aug 09
Posts: 262
Credit: 92,631,041
RAC: 0
Message 50311 - Posted: 21 Jul 2011, 8:35:59 UTC - in response to Message 50293.  

John, one problem is that you need to update your BOINC version to current. Then we can perhaps offer suggestions.


I don't think so, I run 6.10.58 and that works fine for MW via ATI and Einstein, Rosetta, MW via CPU without any issues.

I never update to the latest version as they usually have more or newer bugs.
Greetings from,
TJ
ID: 50311 · Rating: 0
Vepide
Joined: 22 Apr 11
Posts: 5
Credit: 5,578,008
RAC: 0
Message 50312 - Posted: 21 Jul 2011, 8:57:31 UTC

I encountered errors with PrimeGrid after installing Catalyst 11.6 also and rolled back to 11.5 on my primary Win7 installation. I have not tested BOINC on my secondary Win7 installation, which is running Catalyst 11.6.

I think it's an ATI driver problem, as CCC 11.6 seems to be getting a few a, b, c, and d patches made available for it. But I don't know if they'll patch the problem with BOINC and PrimeGrid yet. Dual GPUs aren't both clocking to the maximum; one insists on remaining at default clocks... and the crashed PrimeGrid WUs seem to occur when the video card switches to 3D mode, or begins computation during the 3D mode switch, too abruptly, so BOINC or PrimeGrid may have to adjust a delayed response time in ms. Again, it is unclear whether this is a BOINC problem or an ATI problem. 4870 X2 here, sometimes in quadfire. I think it's faster than the 6990 in computation due to having more cores in the 4870 X2, and clocks are secondary to speed.

BOINC 6.12.33 as far as I can tell is stable with a new behavior, it seems to turn work units in soon after completion rather than allowing completed work units across projects to stack up for a manual update.
ID: 50312 · Rating: 0
Vepide
Joined: 22 Apr 11
Posts: 5
Credit: 5,578,008
RAC: 0
Message 50313 - Posted: 21 Jul 2011, 9:06:52 UTC

My 4870 X2 cores run at max 800 MHz... it's just that the memory won't OC to the max anymore like it did for a few older CCC versions. Otherwise the X2 cores are stable even at temps of 175-190 Fahrenheit; it's also my gaming rig.
ID: 50313 · Rating: 0
Werkstatt
Joined: 19 Feb 08
Posts: 350
Credit: 141,284,369
RAC: 0
Message 50314 - Posted: 21 Jul 2011, 9:45:52 UTC

Currently both of my MW-crunching systems run fine, but ...
- my main system: since I upgraded to 11.6 my second screen flickers sometimes. The problem is described in the hotfix list for the update to 11.6b; however, the flicker problem is not solved.
- my Integrator PC, which usually has one ATI and one nVidia card, was unusable after upgrading to 11.6. It failed with a blue screen twice an hour, with or without BOINC running, always with an ATI file shown in the dump (sorry, I cannot tell you which one).

I removed ALL drivers, nVidia and ATI, uninstalled all vendor software and reinstalled CCC 11.6b.

It has been running clean for three days now.
ID: 50314 · Rating: 0
S@NL - EStorm
Joined: 15 Jul 11
Posts: 14
Credit: 5,978,191
RAC: 0
Message 50315 - Posted: 21 Jul 2011, 11:58:25 UTC

Can I ask a stupid question?
Why are people discussing other problems?
This is a thread for "maximum time limit elapsed bug".
ID: 50315 · Rating: 0
Werkstatt
Joined: 19 Feb 08
Posts: 350
Credit: 141,284,369
RAC: 0
Message 50316 - Posted: 21 Jul 2011, 13:01:30 UTC

The problem is that no one can say for sure what the cause of the problem is.

In this case it makes sense to eliminate all possible sources. The one that cannot be eliminated is most likely the one causing the pain.
ID: 50316 · Rating: 0
S@NL - EStorm
Joined: 15 Jul 11
Posts: 14
Credit: 5,978,191
RAC: 0
Message 50319 - Posted: 21 Jul 2011, 13:43:34 UTC - in response to Message 50316.  

I agree, but not if people are talking BSOD\Incorrect function. (0x1) - exit code 1 (0x1)\etc. instead of the maximum time limit error.

On that note:
I wonder what will happen if, instead of <flops>1.0e11</flops> (100 GFLOPS) in my app_info file, I put in a number greater than the estimated task size received from the server.
So if the server estimates 50000 GFLOPS for the WU and I enter a value greater than that, then the program should be finished in less than a second, which of course is not true; it takes longer.
Perhaps a test for later.
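
A rough way to check the arithmetic behind this thought experiment is sketched below. It assumes the expected runtime is simply the task size divided by the claimed application speed, and it reads the "50000 GFLOPS" figure above as a task size of 50000 GFLOP (5e13 floating-point operations). The function name is made up for illustration and is not part of BOINC.

```python
def estimated_runtime_seconds(task_size_flop: float, app_flops: float) -> float:
    """Expected runtime: amount of work divided by the claimed speed.

    task_size_flop: estimated task size in floating-point operations (assumed unit).
    app_flops: speed claimed in the <flops> element, in operations per second.
    """
    if app_flops <= 0:
        raise ValueError("app_flops must be positive")
    return task_size_flop / app_flops


# Numbers taken from the post; the interpretation of the units is an assumption.
TASK_SIZE_FLOP = 50000e9  # 50000 GFLOP

# With the original <flops>1.0e11</flops> the expectation is a few minutes.
print(estimated_runtime_seconds(TASK_SIZE_FLOP, 1.0e11))

# With a claimed speed larger than the task size, it drops below one second.
print(estimated_runtime_seconds(TASK_SIZE_FLOP, 6.0e13))
```

The second call is the case described in the post: once the claimed speed is numerically larger than the task size, the expected runtime is under a second, which no real GPU run will match.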
ID: 50319 · Rating: 0
S@NL - EStorm
Joined: 15 Jul 11
Posts: 14
Credit: 5,978,191
RAC: 0
Message 50321 - Posted: 21 Jul 2011, 13:58:47 UTC

I just did a small test and I got:
<core_client_version>6.12.33</core_client_version>
<![CDATA[
<message>
Maximum elapsed time exceeded
</message>
]]>

I changed the <flops>1.0e11</flops> to <flops>6002.0e11</flops> in the app_info file.
So if the program runs without app_info and the estimated application speed is greater than the estimated task size, you will receive this error.
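
For anyone who wants to repeat this test without hand-editing the file, here is a minimal sketch that rewrites the <flops> value with Python's standard XML parser. The sample fragment is a stripped-down assumption of what an app_info.xml might contain (a real one also has file entries and version numbers), and the helper name set_flops is hypothetical.

```python
import xml.etree.ElementTree as ET

# Stripped-down, assumed fragment; a real app_info.xml contains more elements.
SAMPLE_APP_INFO = """<app_info>
  <app>
    <name>milkyway</name>
  </app>
  <app_version>
    <app_name>milkyway</app_name>
    <flops>1.0e11</flops>
  </app_version>
</app_info>"""


def set_flops(app_info_xml: str, app_name: str, new_flops: float) -> str:
    """Return the XML with <flops> replaced for every matching <app_version>."""
    root = ET.fromstring(app_info_xml)
    for version in root.iter("app_version"):
        if version.findtext("app_name") == app_name:
            flops = version.find("flops")
            if flops is None:
                # Add the element when the app version has no <flops> entry yet.
                flops = ET.SubElement(version, "flops")
            flops.text = f"{new_flops:.4e}"
    return ET.tostring(root, encoding="unicode")


# Reproduce the change from the post: 1.0e11 -> 6002.0e11.
print(set_flops(SAMPLE_APP_INFO, "milkyway", 6002.0e11))
```

Stop the BOINC client before changing the real file and keep a backup copy, since the first post in this thread shows that a bad app_info.xml can cost you tasks already in progress.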
ID: 50321 · Rating: 0
Link
Joined: 19 Jul 10
Posts: 578
Credit: 18,845,239
RAC: 856
Message 50322 - Posted: 21 Jul 2011, 14:13:55 UTC - in response to Message 50321.  

Not surprising: if you increase the flops of the app, BOINC thinks it is faster and expects it to finish the task sooner than with a lower value (= slower app). You either have to decrease the flops of the app or increase rsc_fpops_bound so that BOINC allows the app to run longer.
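
To put numbers on that, here is a small sketch. It assumes the client's time limit is rsc_fpops_bound divided by the app's flops value, which is my understanding of how the "Maximum elapsed time exceeded" check works and not something confirmed in this thread. The bound and the real runtime used below are hypothetical; only the two flops values come from the posts above.

```python
def max_elapsed_seconds(rsc_fpops_bound: float, app_flops: float) -> float:
    """Assumed limit: the work bound divided by the claimed app speed."""
    return rsc_fpops_bound / app_flops


def exceeds_limit(actual_runtime_s: float, rsc_fpops_bound: float, app_flops: float) -> bool:
    """True if a task running this long would hit the time limit."""
    return actual_runtime_s > max_elapsed_seconds(rsc_fpops_bound, app_flops)


HYPOTHETICAL_BOUND = 5.0e14  # hypothetical rsc_fpops_bound, not a real MilkyWay value
ACTUAL_RUNTIME_S = 300.0     # hypothetical real GPU runtime in seconds

for flops in (1.0e11, 6002.0e11):  # the two <flops> values from the test above
    limit = max_elapsed_seconds(HYPOTHETICAL_BOUND, flops)
    aborted = exceeds_limit(ACTUAL_RUNTIME_S, HYPOTHETICAL_BOUND, flops)
    print(f"flops={flops:.4e}  limit={limit:.2f}s  aborted={aborted}")
```

Under these assumed numbers the 1.0e11 setting leaves a limit of well over an hour, while 6002.0e11 leaves less than a second, so the task is aborted almost immediately. Lowering flops or raising rsc_fpops_bound both move the limit back up, which is the choice described above.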
ID: 50322 · Rating: 0
Beyond
Joined: 15 Jul 08
Posts: 383
Credit: 729,293,740
RAC: 0
Message 50323 - Posted: 21 Jul 2011, 14:18:03 UTC - in response to Message 50311.  
Last modified: 21 Jul 2011, 14:25:06 UTC

John, one problem is that you need to update your BOINC version to current. Then we can perhaps offer suggestions.

I don't think so, I run 6.10.58 and that works fine for MW via ATI and Einstein, Rosetta, MW via CPU without any issues.
I never update to the latest version as they usually have more or newer bugs.

If you looked you might have seen that he's running 6.12.22 which was quite buggy. Your version is more stable but 6.12.33 is better yet. It's hard to properly diagnose the type of problem he's having when he has a BOINC version with many known problems. I really don't make suggestions to hear myself talk. If you guys want me to stop helping people I will. Don't really want to waste time on senseless arguments.
ID: 50323 · Rating: 0

©2024 Astroinformatics Group