Message boards : Number crunching : No work being D/Led & no warning messages
Joined: 4 Oct 08 Posts: 1734 Credit: 64,228,409 RAC: 0
I am knocking off the MW work nicely, but the requests for new work are not succeeding in getting further work. I have no problems with work failing to validate. At the same time the BOINC messages tab is giving no reason beyond this message sequence:

```
01/02/2009 02:14:07||Time passed...reporting result now.
01/02/2009 02:14:07|Milkyway@home|Sending scheduler request: To report completed tasks
01/02/2009 02:14:07|Milkyway@home|Reporting 4 tasks
01/02/2009 02:14:12|Milkyway@home|Scheduler RPC succeeded [server version 603]
01/02/2009 02:14:12|Milkyway@home|Deferring communication for 7 sec
01/02/2009 02:14:12|Milkyway@home|Reason: requested by project
01/02/2009 02:16:23|Milkyway@home|Computation for task nm_s86_l40_477939_1233452956_0 finished
01/02/2009 02:16:23|Milkyway@home|Starting nm_s86_l40_477941_1233452956_0
01/02/2009 02:16:23|Milkyway@home|Starting task nm_s86_l40_477941_1233452956_0 using milkyway version 14
01/02/2009 02:16:26|Milkyway@home|[file_xfer] Started upload of file nm_s86_l40_477939_1233452956_0_0
01/02/2009 02:16:29|Milkyway@home|[file_xfer] Finished upload of file nm_s86_l40_477939_1233452956_0_0
01/02/2009 02:16:29|Milkyway@home|[file_xfer] Throughput 1002 bytes/sec
01/02/2009 02:16:38|Milkyway@home|Computation for task nm_s86_l41_477951_1233452957_0 finished
01/02/2009 02:16:38|Milkyway@home|Starting nm_s86_l41_477959_1233452957_0
01/02/2009 02:16:38|Milkyway@home|Starting task nm_s86_l41_477959_1233452957_0 using milkyway version 14
01/02/2009 02:16:40|Milkyway@home|[file_xfer] Started upload of file nm_s86_l41_477951_1233452957_0_0
01/02/2009 02:16:42|Milkyway@home|[file_xfer] Finished upload of file nm_s86_l41_477951_1233452957_0_0
01/02/2009 02:16:42|Milkyway@home|[file_xfer] Throughput 1409 bytes/sec
01/02/2009 02:17:33||Time passed...reporting result now.
01/02/2009 02:17:33|Milkyway@home|Sending scheduler request: To report completed tasks
01/02/2009 02:17:33|Milkyway@home|Reporting 2 tasks
01/02/2009 02:17:38|Milkyway@home|Scheduler RPC succeeded [server version 603]
01/02/2009 02:17:38|Milkyway@home|Deferring communication for 7 sec
01/02/2009 02:17:38|Milkyway@home|Reason: requested by project
01/02/2009 02:19:36|Milkyway@home|Computation for task nm_s86_l40_477941_1233452956_0 finished
```

Anyone else experiencing this? I am running 2 projects to compensate. The lack of work seems to be affecting my 2 quads and not my old dual Xeon - very strange?
Joined: 29 Aug 07 Posts: 327 Credit: 116,463,193 RAC: 0
I'm getting something similar on one of my quads, but not the other one. I thought I had hit the daily limit, but that is not the case here. It does request more work when it gets down to the last few MW WUs tho'. I'm keeping an eye on it.

Calm Chaos Forum...Join Calm Chaos Now
Joined: 22 Nov 07 Posts: 285 Credit: 1,076,786,368 RAC: 0
I hear of many people having this same issue. From what I can tell it is a problem with the DCF, if it is the same issue that I and several others have experienced. The cache runs dry: BOINC will not ask for more work until the cache is empty and the very last WU has been crunched. On 8-core systems this is an issue, because 7 cores sit idle until that last WU is completed. 4 cores, same problem. Dual cores as well; the more cores you have, the more idle time.

I edited my client_state.xml file and noticed the DCF was at 0.01xxx. I increased this to 0.2xxx and voila, work was downloaded. But soon after, no more work, with the same DCF setting. As you mentioned, it is also NOT affecting my slower boxes, like my dual quad-core Xeons, only the faster boxes. Resetting the project clears the problem as well, but you trash any WUs in the cache, and the problem will eventually come back.

John, you might try upgrading to a newer standard version of BOINC and see if that helps.
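For anyone who wants to script the edit described above, here is a minimal sketch, assuming the BOINC client has been stopped first (it rewrites client_state.xml on exit) and assuming the 0.02 floor reported later in this thread. The script itself is not from the thread, the file path will vary by install, and as written it bumps the factor for every attached project, not just Milkyway:

```python
# Hypothetical helper, not from this thread: raise any too-low
# duration_correction_factor in client_state.xml to a floor value.
# Stop the BOINC client first, or it will overwrite the file on exit.
import re
import shutil

STATE_FILE = "client_state.xml"  # path varies by install; adjust as needed
DCF_FLOOR = 0.02                 # the threshold reported in this thread

# Keep a backup in case the edit goes wrong.
shutil.copyfile(STATE_FILE, STATE_FILE + ".bak")

with open(STATE_FILE) as f:
    text = f.read()

def bump(match):
    value = float(match.group(1))
    # Only raise values below the floor; leave healthy DCFs untouched.
    return "<duration_correction_factor>%.6f</duration_correction_factor>" % (
        max(value, DCF_FLOOR))

# Note: this touches the DCF of *every* project block in the file.
text = re.sub(
    r"<duration_correction_factor>([0-9.eE+-]+)</duration_correction_factor>",
    bump,
    text,
)

with open(STATE_FILE, "w") as f:
    f.write(text)
```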
Joined: 29 Aug 07 Posts: 327 Credit: 116,463,193 RAC: 0
My machine that was doing this has somehow cleared itself up and is requesting work normally now. I don't know what fixed it, as I did nothing. But I've got Cosmo running as a backup project, so no wasted cycles here.

Calm Chaos Forum...Join Calm Chaos Now
Joined: 4 Oct 08 Posts: 1734 Credit: 64,228,409 RAC: 0
I left things overnight, and the same is still happening. I notice that I get 1 MW WU downloaded, and when this is crunched it will D/L another.

Just looked at the DCF value in the client_state.xml file and it shows 1.204559, which I presume is OK. However, the short-term and long-term debts are high:

Short term debt = -938.579505
Long term debt = -65,594.049107

Both quads crunch for Einstein as well. I will let things run a while so I can look at other things.
Joined: 15 Jul 08 Posts: 288 Credit: 5,474,012 RAC: 0
I am seeing this wonky behavior on at least one of my rigs too... It runs the cache down until the very last WU has been crunched, and then finally requests more work...

I am the Kittyman. Please visit and give a Click for Seti City.
Joined: 6 Apr 08 Posts: 2018 Credit: 100,142,856 RAC: 0
> The cache runs dry: BOINC will not ask for more work until the cache is empty and the very last WU has been crunched.

I get this on all my Core 2/hyperthreading machines, i.e. they will not load any more WUs until the last one has finished. So there will always be one crunching on its own at the end of the 16-WU downloaded batch. But then it's only 15, because one is always still there when the upload kicks in, even if it's finished and uploading or waiting to report. 16 is the max you can ever have on 2 cores at any one time.
Joined: 4 Oct 08 Posts: 1734 Credit: 64,228,409 RAC: 0
It's curious watching this happen. Crunching MW nicely for hours, then the cache dries up to one WU, then the backup project crunches, then it goes completely dry bar that 1 WU. Then the cache fills up and the cycle restarts.
Joined: 22 Nov 07 Posts: 285 Credit: 1,076,786,368 RAC: 0
This is the line you would be looking for:

```
<project>
    <master_url>http://milkyway.cs.rpi.edu/milkyway/</master_url>
    ..........
    <duration_correction_factor>0.0164825</duration_correction_factor>
```

My understanding from Logan is that with anything less than 0.02, BOINC will not request work. A reset of the project works, but only for a short time; then the problem starts happening again.
Joined: 12 Oct 07 Posts: 77 Credit: 404,471,187 RAC: 0
> My understanding from Logan is that with anything less than 0.02, BOINC will not request work.

I've got 3 machines showing DCFs of 0.011, 0.014 & 0.018 - all quite happily downloading WUs.
Joined: 22 Nov 07 Posts: 285 Credit: 1,076,786,368 RAC: 0
I have posted a topic on the BOINC help message boards as well: http://boinc.berkeley.edu/dev/forum_thread.php?id=3594 Maybe some other input might help in determining the problem.

I got to looking at some of my machines today, and almost all of my faster quads are experiencing this problem now. But the slower boxes don't seem to be affected.
Joined: 4 Oct 08 Posts: 1734 Credit: 64,228,409 RAC: 0
> I got to looking at some of my machines today, and almost all of my faster quads are experiencing this problem now.

That seems to be my experience as well on a small farm.
Joined: 16 Jan 08 Posts: 98 Credit: 1,371,299 RAC: 0
Two of my machines are having this problem. They are both C2D systems and run BOINC 5.10.45. The problem only started a few days ago. Interestingly, my third C2D system, which is running BOINC 5.4.11, is not having this problem and readily asks for more work nearly every minute as it tries to fill its cache.

Never surrender and never give up. In the darkest hour there is always hope.
Joined: 27 Aug 07 Posts: 647 Credit: 27,592,547 RAC: 0
Yeah, since I have the same problem on all 4 of my boxes as well, I've tried editing the DCF in client_state.xml - and I immediately got new work! :-))) My DCF was 0.01 and I changed it to 0.02 - but I guess it will go back to 0.01 in a short while. :-(

Lovely greetings, Cori
Joined: 30 Aug 07 Posts: 2046 Credit: 26,480 RAC: 0
I'm going to increase the WU queue (I bet that'll make a lot of people happy). The Newton methods we're running right now aren't as sensitive to the work queue as the genetic search was.
Joined: 27 Aug 07 Posts: 647 Credit: 27,592,547 RAC: 0
> I'm going to increase the WU queue (I bet that'll make a lot of people happy). The Newton methods we're running right now aren't as sensitive to the work queue as the genetic search was.

Thanks, Travis! :-))) Btw, I don't think this will change the strange work-fetch policy of recent BOINC manager versions though. :-/ There is always one last WU crunching while the other core(s) idle before the darn thing will request more work, grrrr!

Lovely greetings, Cori
Joined: 15 Aug 08 Posts: 163 Credit: 3,876,869 RAC: 0
> I'm going to increase the WU queue (I bet that'll make a lot of people happy). The Newton methods we're running right now aren't as sensitive to the work queue as the genetic search was.

The question is about the theoretical crunch time calculated by the project and the correction factor your BOINC manager applies. If the project's estimate of the task duration is higher than reality and your real duration is much lower, the DCF can fall under 0.02, and then no more work seconds will be requested until your cache is empty.

As an example: if a task is defined by the project as a 6-hour task (at DCF = 1) and the stock app completes it in less than an hour (DCF = 1/6 ≈ 0.16), and an optimized app can finish it in a few minutes, calculate...

Best regards.
Logan.

BOINC FAQ Service (now also available in Spanish)
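Working Logan's example through with concrete numbers may help: the 6-hour estimate and the roughly one-hour stock runtime are his, while the 5-minute optimized-app runtime is an assumption for illustration. The DCF converges toward the actual runtime divided by the project's published estimate:

```python
# DCF settles near actual_runtime / published_estimate.
published = 6 * 3600   # project publishes the task as a 6 h job (DCF = 1)
stock     = 1 * 3600   # stock app finishes in about an hour
opti      = 5 * 60     # optimized app: "a few minutes" (5 min assumed)

print(stock / published)  # ~0.167 -- well above 0.02, work fetch is fine
print(opti / published)   # ~0.014 -- below the 0.02 floor: fetch stalls
```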
Joined: 22 Nov 07 Posts: 285 Credit: 1,076,786,368 RAC: 0
> I'm going to increase the WU queue (I bet that'll make a lot of people happy). The Newton methods we're running right now aren't as sensitive to the work queue as the genetic search was.

It may not fix the DCF problem, but it will let you crunch longer on a larger cache before waiting to reload. Instead of 8 WUs per core and then waiting on the last WU, you will now get -- DRUM ROLL PLEASE >>> 12!!!!! WUs per core, then wait.
Joined: 27 Aug 07 Posts: 647 Credit: 27,592,547 RAC: 0
> I'm going to increase the WU queue (I bet that'll make a lot of people happy). The Newton methods we're running right now aren't as sensitive to the work queue as the genetic search was.

I agree, having a cache of 12 x number-of-cores WUs now is better than before. :-))) But I don't know how to correct the DCF permanently, because after editing the xml it always goes back to the too-low value. :-(

Lovely greetings, Cori
Joined: 17 Nov 08 Posts: 18 Credit: 130,650,263 RAC: 0
Travis, Logan is right. The real problems for those of us with fast machines are twofold.

1. The estimated duration for a WU, when it is published by the project, is way too long (e.g. the older WUs show up at around 22 hours after a reset on my machine). After several dozen WUs have completed, the DCF has been reduced to the correct amount, but it is then considered too low for BOINC to allow more work.

2. The new set of WUs started last week (I38 & I39) had estimated durations out of line with the other WUs. They took about 25% longer to crunch, but the published duration was about 2x. People weren't seeing problem #1 because we were just above the threshold of the problem, until the new WUs drove our DCFs down even lower.

My suggestion is that the published duration for each WU be cut by a factor of 10. That way the fast machines will still have a reasonable DCF, and older, slower machines can have DCFs greater than 1.0. Also, try to make sure that each new set of WUs has duration estimates in line with other outstanding sets.

Glenn
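A rough sketch of the arithmetic behind the factor-of-10 suggestion: the 22-hour post-reset estimate is from the post above, while the 20-minute actual runtime on a fast quad is an assumption for illustration. Since the DCF settles near actual runtime over published estimate, shrinking the published estimate tenfold multiplies the settled DCF tenfold:

```python
# Steady-state DCF ~ actual_runtime / published_estimate.
published = 22 * 3600  # post-reset estimate reported above, in seconds
actual    = 20 * 60    # assumed real runtime on a fast machine, in seconds

print(actual / published)         # ~0.015 -- below the 0.02 floor
print(actual / (published / 10))  # ~0.152 -- comfortably above it
```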