maximum time limit elapsed bug

Author	Message
LouisH Joined: 13 Mar 09 Posts: 5 Credit: 1,366,490 RAC: 0	Message 50258 - Posted: 19 Jul 2011, 22:11:44 UTC
It's working with the file "app_info.xml". Thank you :) But one WU on CPU has been deleted, I think because of this :( 5 hours lost :( Look at this:
19/07/2011 22:36:01 | | ATI GPU 0: ATI Radeon HD 4700/4800 (RV740/RV770) (CAL version 1.4.1417, 512MB, 1000 GFLOPS peak)
19/07/2011 22:36:01 | Milkyway@home | Found app_info.xml; using anonymous platform
19/07/2011 22:36:01 | Milkyway@home | [error] State file error: missing application milkyway_nbody
19/07/2011 22:36:01 | Milkyway@home | [error] Can't handle workunit in state file
19/07/2011 22:36:01 | Milkyway@home | [error] State file error: missing task ps_nbody_test3_499724
19/07/2011 22:36:01 | Milkyway@home | [error] Can't link task ps_nbody_test3_499724_0 in state file
19/07/2011 22:36:01 | Milkyway@home | [error] State file error: result ps_nbody_test3_499724_0 not found for task

Joseph Stateson Joined: 18 Nov 08 Posts: 291 Credit: 2,464,323,667 RAC: 6,166	Message 50281 - Posted: 20 Jul 2011, 13:22:59 UTC Last modified: 20 Jul 2011, 14:02:08 UTC
Switching to BOINC 6.13 got me my first valid unit for my ATI 4890: http://milkyway.cs.rpi.edu/milkyway/results.php?hostid=261689
Previously, the timeout interval was 82 seconds (from my understanding of the stderr) and my units would fail between 40% and 60% complete. After switching from 6.12.26 to 6.13, all work units are validating. I have not yet tested PrimeGrid, which also failed with the 4890 board.
The driver is 11.6 from ATI, and when I brought up the Catalyst Control Center I got a driver failure and a message from CCC that it was switching to compatibility mode (whatever that means). I continued to get MW failures even after rebooting and then decided to switch to 6.13.
The ATI 4890 was an xfxforce warranty replacement for my defective nVidia gtx280. Seems they ran out of gtx280s.
[EDIT] hmm ... spoke too soon. Got failures on two MW units on the 5850. Had not had any errors on that system before upgrading to 6.13. PrimeGrid is still failing on this 4890, but so far all Milkyway units are getting to completion.

Zydor Joined: 24 Feb 09 Posts: 620 Credit: 100,587,625 RAC: 0	Message 50286 - Posted: 20 Jul 2011, 20:28:23 UTC
"hmm ... spoke too soon. Got failures on two MW units on the 5850. Had not had any errors on that system before upgrading to 6.13"
Stay away from 6.13.0 ..... it was withdrawn very quickly with major bugs present. They seem to have fixed most of the ones that crept into 6.13.0 with the release of 6.13.1. However, they are all alpha releases, so I doubt 6.13.1 is really trustworthy just yet. Stick to 6.12.33, which is the latest stable release. I doubt it will solve all (any?) of your current ills, but it's a racing certainty that 6.13.0 will not help in any way, and it is highly likely to be detrimental, to say the least.
Regards
Zy

Beyond Joined: 15 Jul 08 Posts: 384 Credit: 743,203,143 RAC: 47,828	Message 50290 - Posted: 20 Jul 2011, 21:31:37 UTC - in response to Message 50286.
6.13.1 is really really bad too :(

John Clark Joined: 4 Oct 08 Posts: 1734 Credit: 64,228,409 RAC: 0	Message 50291 - Posted: 20 Jul 2011, 21:42:19 UTC
Tried to upgrade the client to the Milkyway client, milkyway_separation_0.82_windows_intelx86_ati14, with the app_info file. It seemed to be OK but would not download any work, and when update requests were made FreeHAL kept resetting the work being crunched. In the end I gave up and returned to DNETC. I hope the stock GPU client is reworked to overcome this time lapse problem.
Go away, I was asleep

Zydor Joined: 24 Feb 09 Posts: 620 Credit: 100,587,625 RAC: 0	Message 50292 - Posted: 20 Jul 2011, 22:01:00 UTC Last modified: 20 Jul 2011, 22:08:33 UTC
Nothing dramatically new, but posting an observation as a lot of work has gone into the beasts lately. Had three go bang in fairly short order for some reason. Temperatures are fine, PC seems stable, came out of the blue really.
<core_client_version>6.12.33</core_client_version>
<![CDATA[
<message>
Incorrect function. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
Error loading Lua script 'astronomy_parameters.txt': [string "number_parameters: 4..."]:1: '<name>' expected near '4'
Error reading astronomy parameters from file 'astronomy_parameters.txt'
Trying old parameters file
Using SSE3 path
Found 4 CAL devices
Chose device 2
I did notice the card was decidedly "draggy" after them; I had to switch tabs a few times to encourage the counter to get going. In the end I stopped the BOINC Client and restarted it, and that seemed to clear it fine, and away she went again. No other problems. Whether or not the bad WU(s) were stuck in the GPU after crashing and caused delays loading fresh ones, I have no idea, but it gave that impression, as all was well once the Client restarted.
Edit: Woa .... Welcome home Murphy .... Just had a Blue Screen, first one for weeks. No CPU WUs running, so it's a "clean run" for MW. Too fast to read the Blue Screen notes, but the nominal errant file was definitely "ati....something", so something is still lurking inside the GPU, application-wise. The driver appears solid at present, with no driver resets happening.
Regards
Zy

Beyond Joined: 15 Jul 08 Posts: 384 Credit: 743,203,143 RAC: 47,828	Message 50293 - Posted: 20 Jul 2011, 22:03:35 UTC
John, one problem is that you need to update your BOINC version to current. Then we can perhaps offer suggestions.

LouisH Joined: 13 Mar 09 Posts: 5 Credit: 1,366,490 RAC: 0	Message 50294 - Posted: 20 Jul 2011, 22:03:38 UTC
My computer's graphics become slow when my GPU is working (the windows move slowly, for example). Can we adjust the % of the GPU like the CPU? With a limit of 80%, for example?

Beyond Joined: 15 Jul 08 Posts: 384 Credit: 743,203,143 RAC: 47,828	Message 50295 - Posted: 20 Jul 2011, 22:07:17 UTC - in response to Message 50292.
"Had three go bang in fairly short order for some reason. Temperatures are fine, PC seems stable, came out of the blue really."
Interesting, because I had WUs on 3 machines do the same thing a few hours ago after those boxes had been error free for quite a while. Maybe a rash of bad WUs?

Zydor Joined: 24 Feb 09 Posts: 620 Credit: 100,587,625 RAC: 0	Message 50296 - Posted: 20 Jul 2011, 22:21:52 UTC Last modified: 20 Jul 2011, 22:36:10 UTC
...... and another BSOD. The nominal file was "atikpmag.sys", and seeing it again, it was that one last time too. I'll turn down the card a bit and see what happens, but it ran fine at 760/300 for well over an hour. All a bit strange, see how she goes.
EDIT: Been running for 15 mins after that last BSOD @750/300. Murphy allowing, it seems OK again. Maybe 760/300 was just that little bit over the top, but it seems strange after running with no problems for well over an hour, and 760/300 is hardly fast and furious for 2x5970s ...... still, Crunch on as they say :)
Regards
Zy

TJ Joined: 12 Aug 09 Posts: 262 Credit: 92,631,041 RAC: 0	Message 50311 - Posted: 21 Jul 2011, 8:35:59 UTC - in response to Message 50293.
"John, one problem is that you need to update your BOINC version to current. Then we can perhaps offer suggestions."
I don't think so. I run 6.10.58 and that works fine for MW via ATI and for Einstein, Rosetta and MW via CPU without any issues. I never update to the latest version as they usually have more or newer bugs.
Greetings from,
TJ

Vepide Joined: 22 Apr 11 Posts: 5 Credit: 5,578,008 RAC: 0	Message 50312 - Posted: 21 Jul 2011, 8:57:31 UTC
I also encountered errors with PrimeGrid after installing Catalyst 11.6 and rolled back to 11.5 on my primary Win7 installation. I have not tested BOINC on my secondary Win7 installation, which is running Catalyst 11.6.
I think it's an ATI driver problem, as CCC 11.6 seems to be getting a few a, b, c, and d patches made available for it. But I don't know yet if they'll patch the problem with BOINC and PrimeGrid. Dual GPUs aren't both clocking to the maximum; one insists on remaining at default clocks ... and the crashed PrimeGrid WUs seem to occur when the video card switches to 3D mode, or begins computation too abruptly during the 3D mode switch, so BOINC or PrimeGrid may have to adjust a delayed response time in ms. Again, it is unclear whether this is a BOINC problem or an ATI problem.
4870 X2 here, sometimes in quadfire. I think it's faster than the 6990 in computation due to having more cores in the 4870 X2, and clocks are secondary to speed.
BOINC 6.12.33, as far as I can tell, is stable with a new behavior: it seems to turn work units in soon after completion rather than allowing completed work units across projects to stack up for a manual update.

Vepide Joined: 22 Apr 11 Posts: 5 Credit: 5,578,008 RAC: 0	Message 50313 - Posted: 21 Jul 2011, 9:06:52 UTC
My 4870 X2 cores run at max 800 MHz ... it's just that the memory won't OC to the max anymore like it did for a few older CCC versions. Otherwise the X2 cores are stable even at temps of 175-190 Fahrenheit; it's also my gaming rig.

Werkstatt Joined: 19 Feb 08 Posts: 350 Credit: 141,284,369 RAC: 0	Message 50314 - Posted: 21 Jul 2011, 9:45:52 UTC
Currently both of my MW-crunching systems run fine, but ...
- my mainsys: since I upgraded to 11.6 my second screen flickers sometimes. The problem is described in the hotfix list for the update to 11.6b; however, the flicker problem is not solved.
- my Integrator-pc, which usually has one ATI and one nVidia card, was unusable after upgrading to 11.6. It failed with a blue screen twice an hour, with or without BOINC running, always with an ATI file shown in the dump (sorry, I cannot tell you which one). I removed ALL drivers, nVidia and ATI, uninstalled all vendor software and reinstalled CCC 11.6b. It has been running clean for three days now.

S@NL - EStorm Joined: 15 Jul 11 Posts: 14 Credit: 5,978,191 RAC: 0	Message 50315 - Posted: 21 Jul 2011, 11:58:25 UTC
Can I ask a stupid question? Why are people discussing other problems? This is a thread for "maximum time limit elapsed bug".

Werkstatt Joined: 19 Feb 08 Posts: 350 Credit: 141,284,369 RAC: 0	Message 50316 - Posted: 21 Jul 2011, 13:01:30 UTC
The problem is that no one can say for sure what the cause of the problem is. In this case it makes sense to eliminate all possible sources. The one which cannot be eliminated is most likely the one which causes the pain.

S@NL - EStorm Joined: 15 Jul 11 Posts: 14 Credit: 5,978,191 RAC: 0	Message 50319 - Posted: 21 Jul 2011, 13:43:34 UTC - in response to Message 50316.
I agree, but not if people are talking BSOD\Incorrect function. (0x1) - exit code 1 (0x1)\etc. instead of the maximum time limit error.
On that note: I wonder what will happen if, instead of <flops>1.0e11</flops> (100 GFLOPS) in my app_info file, I put in a number greater than the estimated task size received from the server. So if the server estimates 50000 GFLOPS for the WU and I enter a value greater than that, then the program should be finished in less than a second, which is not true of course; it takes longer. Perhaps a test for later.
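The arithmetic behind that thought experiment can be sketched as below. This is only an illustration under stated assumptions: it treats the runtime estimate as the task size divided by the `<flops>` value, ignores any correction factors the client may apply, and the function name and the 6.0e13 figure are hypothetical. Only the 50000 GFLOPS task size and the 1.0e11 setting come from the post.

```python
def estimated_runtime_seconds(task_fpops_est: float, flops: float) -> float:
    """Simplified estimate: operations in the task divided by the app's claimed speed."""
    if flops <= 0:
        raise ValueError("flops must be positive")
    return task_fpops_est / flops


# Task size from the post: 50000 GFLOPS' worth of work, i.e. 5.0e13 operations.
TASK_FPOPS_EST = 50000 * 1e9

# The <flops> value from the post's app_info (100 GFLOPS).
normal = estimated_runtime_seconds(TASK_FPOPS_EST, 1.0e11)

# Hypothetical <flops> value larger than the task size itself.
inflated = estimated_runtime_seconds(TASK_FPOPS_EST, 6.0e13)

print(f"flops=1.0e11 -> estimated {normal:.0f} s")
print(f"flops=6.0e13 -> estimated {inflated:.2f} s")
```

With the claimed speed above the task size, the estimate drops below one second, which matches the reasoning in the post.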

S@NL - EStorm Joined: 15 Jul 11 Posts: 14 Credit: 5,978,191 RAC: 0	Message 50321 - Posted: 21 Jul 2011, 13:58:47 UTC
I just did a small test and I got:
<core_client_version>6.12.33</core_client_version>
<![CDATA[
<message>
Maximum elapsed time exceeded
</message>
]]>
I changed the <flops>1.0e11</flops> to <flops>6002.0e11</flops> in the app_info file. So if the program runs without app_info and the estimated application speed is greater than the estimated task size, you will receive this error.

Link Joined: 19 Jul 10 Posts: 828 Credit: 21,653,414 RAC: 6,669	Message 50322 - Posted: 21 Jul 2011, 14:13:55 UTC - in response to Message 50321.
Not surprising. If you increase the flops of the app, BOINC thinks it is faster and expects it to be ready with the task sooner than with a lower value (= slower app). You either have to decrease the flops of the app or increase rsc_fpops_bound, so that BOINC allows the app to run longer.
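A minimal sketch of that relationship, assuming the client's time limit is simply rsc_fpops_bound divided by the app's flops value (the real client may apply further adjustments). The function names, the 1.0e15 bound and the 300-second run time are hypothetical; the two flops values are the ones from the test in Message 50321.

```python
def max_elapsed_seconds(rsc_fpops_bound: float, flops: float) -> float:
    """Assumed time limit: the task's operation bound divided by the claimed app speed."""
    if flops <= 0:
        raise ValueError("flops must be positive")
    return rsc_fpops_bound / flops


def exceeds_limit(elapsed_seconds: float, rsc_fpops_bound: float, flops: float) -> bool:
    """True if a task that has run for elapsed_seconds would be over the assumed limit."""
    return elapsed_seconds > max_elapsed_seconds(rsc_fpops_bound, flops)


FPOPS_BOUND = 1.0e15    # hypothetical rsc_fpops_bound for one work unit
ACTUAL_RUNTIME = 300.0  # hypothetical real run time on the GPU, in seconds

for flops in (1.0e11, 6002.0e11):
    limit = max_elapsed_seconds(FPOPS_BOUND, flops)
    aborted = exceeds_limit(ACTUAL_RUNTIME, FPOPS_BOUND, flops)
    print(f"flops={flops:.4g}: limit {limit:.2f} s, aborted={aborted}")

# Raising the bound by the same factor as flops restores the original limit.
restored = max_elapsed_seconds(FPOPS_BOUND * 6002, 6002.0e11)
print(f"bound x6002 with flops=6002.0e11: limit {restored:.2f} s")
```

Under these assumptions the limit scales inversely with flops, so either remedy named in the post (a lower flops value or a higher rsc_fpops_bound) lengthens the time the task is allowed to run.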

Beyond Joined: 15 Jul 08 Posts: 384 Credit: 743,203,143 RAC: 47,828	Message 50323 - Posted: 21 Jul 2011, 14:18:03 UTC - in response to Message 50311. Last modified: 21 Jul 2011, 14:25:06 UTC
"John, one problem is that you need to update your BOINC version to current. Then we can perhaps offer suggestions."
"I don't think so, I run 6.10.58 and that works fine for MW via ATI and Einstein, Rosetta, MW via CPU without any issues. I never update to the latest version as they usually have more or newer bugs."
If you looked, you might have seen that he's running 6.12.22, which was quite buggy. Your version is more stable, but 6.12.33 is better yet. It's hard to properly diagnose the type of problem he's having when he has a BOINC version with many known problems. I really don't make suggestions to hear myself talk. If you guys want me to stop helping people, I will. Don't really want to waste time on senseless arguments.