Message boards : Number crunching : No work being D/Led & no warning messages
Joined: 4 Oct 08 Posts: 1734 Credit: 64,228,409 RAC: 0
I am knocking off the MW work nicely, but the requests for new work are not succeeding in getting further work. I have no problems with work failing to validate. At the same time the BOINC messages tab is giving no reason beyond this message sequence:

```
01/02/2009 02:14:07||Time passed...reporting result now.
01/02/2009 02:14:07|Milkyway@home|Sending scheduler request: To report completed tasks
01/02/2009 02:14:07|Milkyway@home|Reporting 4 tasks
01/02/2009 02:14:12|Milkyway@home|Scheduler RPC succeeded [server version 603]
01/02/2009 02:14:12|Milkyway@home|Deferring communication for 7 sec
01/02/2009 02:14:12|Milkyway@home|Reason: requested by project
01/02/2009 02:16:23|Milkyway@home|Computation for task nm_s86_l40_477939_1233452956_0 finished
01/02/2009 02:16:23|Milkyway@home|Starting nm_s86_l40_477941_1233452956_0
01/02/2009 02:16:23|Milkyway@home|Starting task nm_s86_l40_477941_1233452956_0 using milkyway version 14
01/02/2009 02:16:26|Milkyway@home|[file_xfer] Started upload of file nm_s86_l40_477939_1233452956_0_0
01/02/2009 02:16:29|Milkyway@home|[file_xfer] Finished upload of file nm_s86_l40_477939_1233452956_0_0
01/02/2009 02:16:29|Milkyway@home|[file_xfer] Throughput 1002 bytes/sec
01/02/2009 02:16:38|Milkyway@home|Computation for task nm_s86_l41_477951_1233452957_0 finished
01/02/2009 02:16:38|Milkyway@home|Starting nm_s86_l41_477959_1233452957_0
01/02/2009 02:16:38|Milkyway@home|Starting task nm_s86_l41_477959_1233452957_0 using milkyway version 14
01/02/2009 02:16:40|Milkyway@home|[file_xfer] Started upload of file nm_s86_l41_477951_1233452957_0_0
01/02/2009 02:16:42|Milkyway@home|[file_xfer] Finished upload of file nm_s86_l41_477951_1233452957_0_0
01/02/2009 02:16:42|Milkyway@home|[file_xfer] Throughput 1409 bytes/sec
01/02/2009 02:17:33||Time passed...reporting result now.
01/02/2009 02:17:33|Milkyway@home|Sending scheduler request: To report completed tasks
01/02/2009 02:17:33|Milkyway@home|Reporting 2 tasks
01/02/2009 02:17:38|Milkyway@home|Scheduler RPC succeeded [server version 603]
01/02/2009 02:17:38|Milkyway@home|Deferring communication for 7 sec
01/02/2009 02:17:38|Milkyway@home|Reason: requested by project
01/02/2009 02:19:36|Milkyway@home|Computation for task nm_s86_l40_477941_1233452956_0 finished
```

Anyone else experiencing this? I am running 2 projects to compensate. The lack of work seems to be affecting my 2 quads and not my old dual Xeon - very strange?
Joined: 29 Aug 07 Posts: 327 Credit: 116,463,193 RAC: 0
I'm getting something similar on one of my quads, but not the other one. I thought I had hit the daily limit, but that is not the case here. It does request more work when it gets down to the last few MW WUs tho'. I'm keeping an eye on it.

Calm Chaos Forum...Join Calm Chaos Now
Joined: 22 Nov 07 Posts: 285 Credit: 1,076,786,368 RAC: 0
I hear of many people having this same issue. From what I can tell it is a problem with the DCF, if it is the same issue that I and several others have experienced. The cache runs dry: BOINC will not ask for more work until the cache is empty and the very last WU has been crunched. On 8-core systems this is an issue, because 7 cores sit idle until that last WU is completed. 4 cores, same problem. Dual cores as well; the more cores you have, the more idle time.

I edited my client_state.xml file and noticed the DCF was at 0.01xxx. I increased this to 0.2xxx and voila, work was downloaded. But soon after, no more work, with the same DCF setting. As you mentioned, it is also NOT affecting my slower boxes, like my dual quad-core Xeons, only the faster boxes. Resetting the project clears the problem as well, but you trash any WUs in the cache, and the problem will eventually come back.

John, you might try upgrading to a newer standard version of BOINC and see if that helps.
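For anyone who wants to script the edit described above, here is a minimal sketch, assuming the BOINC client has been stopped first (it rewrites client_state.xml on exit) and assuming the 0.02 floor reported later in this thread. The script itself is not from the thread, the file path will vary by install, and as written it bumps the factor for every attached project, not just Milkyway:

```python
# Hypothetical helper, not from this thread: raise any too-low
# duration_correction_factor in client_state.xml to a floor value.
# Stop the BOINC client first, or it will overwrite the file on exit.
import re
import shutil

STATE_FILE = "client_state.xml"  # path varies by install; adjust as needed
DCF_FLOOR = 0.02                 # the threshold reported in this thread

# Keep a backup in case the edit goes wrong.
shutil.copyfile(STATE_FILE, STATE_FILE + ".bak")

with open(STATE_FILE) as f:
    text = f.read()

def bump(match):
    value = float(match.group(1))
    # Only raise values below the floor; leave healthy DCFs untouched.
    return "<duration_correction_factor>%.6f</duration_correction_factor>" % (
        max(value, DCF_FLOOR))

# Note: this touches the DCF of *every* project block in the file.
text = re.sub(
    r"<duration_correction_factor>([0-9.eE+-]+)</duration_correction_factor>",
    bump,
    text,
)

with open(STATE_FILE, "w") as f:
    f.write(text)
```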
Joined: 29 Aug 07 Posts: 327 Credit: 116,463,193 RAC: 0
My machine that was doing this has somehow cleared itself up and is requesting work normally now. I don't know what fixed it, as I did nothing. But I've got Cosmo running as a backup project, so no wasted cycles here.

Calm Chaos Forum...Join Calm Chaos Now
Joined: 4 Oct 08 Posts: 1734 Credit: 64,228,409 RAC: 0
I left things overnight, and the same is still happening. I notice that I get 1 MW WU downloaded, and when this is crunched it will D/L another.

Just looked at the DCF value in the client_state.xml file and it shows 1.204559, which I presume is OK. However, the short-term and long-term debts are high:

Short term debt = -938.579505
Long term debt = -65,594.049107

Both quads crunch for Einstein as well. I will let things run a while so I can look at other things.
Joined: 15 Jul 08 Posts: 288 Credit: 5,474,012 RAC: 0
I am seeing this wonky behavior on at least one of my rigs too... It runs the cache down until the very last WU has been crunched, and then finally requests more work...

I am the Kittyman. Please visit and give a Click for Seti City.
Joined: 6 Apr 08 Posts: 2018 Credit: 100,142,856 RAC: 0
> The cache runs dry: BOINC will not ask for more work until the cache is empty and the very last WU has been crunched.

I get this on all my Core 2/hyperthreading machines, i.e. they will not load any more WUs until the last one has finished. So there will always be one crunching on its own at the end of the 16-WU downloaded batch. But then it's only 15, because one is always still there when the upload kicks in, even if it's finished and uploading or waiting to report. 16 is the max you can ever have on 2 cores at any one time.
Joined: 4 Oct 08 Posts: 1734 Credit: 64,228,409 RAC: 0
It's curious watching this happen. Crunching MW nicely for hours, then the cache dries up to one WU, then the backup project crunches, then it goes completely dry bar that 1 WU. Then the cache fills up and the cycle restarts.
Joined: 22 Nov 07 Posts: 285 Credit: 1,076,786,368 RAC: 0
This is the line you would be looking for:

```
<project>
    <master_url>http://milkyway.cs.rpi.edu/milkyway/</master_url>
    ..........
    <duration_correction_factor>0.0164825</duration_correction_factor>
```

My understanding from Logan is that with anything less than 0.02, BOINC will not request work. A reset of the project works, but only for a short time; then the problem starts happening again.
Joined: 12 Oct 07 Posts: 77 Credit: 404,471,187 RAC: 0
> My understanding from Logan is that with anything less than 0.02, BOINC will not request work.

I've got 3 machines showing DCFs of 0.011, 0.014 & 0.018 - all quite happily downloading WUs.
Joined: 22 Nov 07 Posts: 285 Credit: 1,076,786,368 RAC: 0
I have posted a topic on the BOINC help message boards as well: http://boinc.berkeley.edu/dev/forum_thread.php?id=3594 Maybe some other input might help in determining the problem.

I got to looking at some of my machines today, and almost all of my faster quads are experiencing this problem now. But the slower boxes don't seem to be affected.
Joined: 4 Oct 08 Posts: 1734 Credit: 64,228,409 RAC: 0
> I got to looking at some of my machines today, and almost all of my faster quads are experiencing this problem now.

That seems to be my experience as well on a small farm.
Joined: 16 Jan 08 Posts: 98 Credit: 1,371,299 RAC: 0
Two of my machines are having this problem. They are both C2D systems and run BOINC 5.10.45. The problem only started a few days ago. Interestingly, my third C2D system, which is running BOINC 5.4.11, is not having this problem and readily asks for more work nearly every minute as it tries to fill its cache.

Never surrender and never give up. In the darkest hour there is always hope.
Joined: 27 Aug 07 Posts: 647 Credit: 27,592,547 RAC: 0
Yeah, since I have the same problem on all 4 of my boxes as well, I've tried editing the DCF in client_state.xml - and I immediately got new work! :-))) My DCF was 0.01 and I changed it to 0.02 - but I guess it will go back to 0.01 in a short while. :-(

Lovely greetings, Cori
Joined: 30 Aug 07 Posts: 2046 Credit: 26,480 RAC: 0
I'm going to increase the WU queue (I bet that'll make a lot of people happy). The Newton methods we're running right now aren't as sensitive to the work queue as the genetic search was.
Joined: 27 Aug 07 Posts: 647 Credit: 27,592,547 RAC: 0
> I'm going to increase the WU queue (I bet that'll make a lot of people happy). The Newton methods we're running right now aren't as sensitive to the work queue as the genetic search was.

Thanks, Travis! :-))) Btw, I don't think this will change the strange work-fetch policy of recent BOINC manager versions though. :-/ There is always one last WU crunching while the other core(s) idle before the darn thing will request more work, grrrr!

Lovely greetings, Cori
Joined: 15 Aug 08 Posts: 163 Credit: 3,876,869 RAC: 0
> I'm going to increase the WU queue (I bet that'll make a lot of people happy). The Newton methods we're running right now aren't as sensitive to the work queue as the genetic search was.

The question is about the theoretical crunch time calculated by the project and the correction factor your BOINC manager applies. If the project's estimate of the task duration is higher than reality and your real duration is much lower, the DCF can fall under 0.02, and then no more work seconds will be requested until your cache is empty.

As an example: if a task is defined by the project as a 6-hour task (at DCF = 1) and the stock app completes it in less than an hour (DCF = 1/6 ≈ 0.16), and an optimized app can finish it in a few minutes, calculate...

Best regards.
Logan.

BOINC FAQ Service (now also available in Spanish)
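Working Logan's example through with concrete numbers may help: the 6-hour estimate and the roughly one-hour stock runtime are his, while the 5-minute optimized-app runtime is an assumption for illustration. The DCF converges toward the actual runtime divided by the project's published estimate:

```python
# DCF settles near actual_runtime / published_estimate.
published = 6 * 3600   # project publishes the task as a 6 h job (DCF = 1)
stock     = 1 * 3600   # stock app finishes in about an hour
opti      = 5 * 60     # optimized app: "a few minutes" (5 min assumed)

print(stock / published)  # ~0.167 -- well above 0.02, work fetch is fine
print(opti / published)   # ~0.014 -- below the 0.02 floor: fetch stalls
```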
Joined: 22 Nov 07 Posts: 285 Credit: 1,076,786,368 RAC: 0
> I'm going to increase the WU queue (I bet that'll make a lot of people happy). The Newton methods we're running right now aren't as sensitive to the work queue as the genetic search was.

It may not fix the DCF problem, but it will let you crunch longer on a larger cache before waiting to reload. Instead of 8 WUs per core and then waiting on the last WU, you will now get -- DRUM ROLL PLEASE >>> 12!!!!! WUs per core, then wait.
Joined: 27 Aug 07 Posts: 647 Credit: 27,592,547 RAC: 0
> I'm going to increase the WU queue (I bet that'll make a lot of people happy). The Newton methods we're running right now aren't as sensitive to the work queue as the genetic search was.

I agree, having a cache of 12 x number-of-cores WUs now is better than before. :-))) But I don't know how to correct the DCF permanently, because after editing the xml it always goes back to the too-low value. :-(

Lovely greetings, Cori
Joined: 17 Nov 08 Posts: 18 Credit: 130,650,263 RAC: 0
Travis, Logan is right. The real problems for those of us with fast machines are twofold.

1. The estimated duration for a WU, when it is published by the project, is way too long (e.g. the older WUs show up at around 22 hours after a reset on my machine). After several dozen WUs have completed, the DCF has been reduced to the correct amount, but it is then considered too low for BOINC to allow more work.

2. The new set of WUs started last week (I38 & I39) had estimated durations out of line with the other WUs. They took about 25% longer to crunch, but the published duration was about 2x. People weren't seeing problem #1 because we were just above the threshold of the problem, until the new WUs drove our DCFs down even lower.

My suggestion is that the published duration for each WU be cut by a factor of 10. That way the fast machines will still have a reasonable DCF, and older, slower machines can have DCFs greater than 1.0. Also, try to make sure that each new set of WUs has duration estimates in line with other outstanding sets.

Glenn
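A rough sketch of the arithmetic behind the factor-of-10 suggestion: the 22-hour post-reset estimate is from the post above, while the 20-minute actual runtime on a fast quad is an assumption for illustration. Since the DCF settles near actual runtime over published estimate, shrinking the published estimate tenfold multiplies the settled DCF tenfold:

```python
# Steady-state DCF ~ actual_runtime / published_estimate.
published = 22 * 3600  # post-reset estimate reported above, in seconds
actual    = 20 * 60    # assumed real runtime on a fast machine, in seconds

print(actual / published)         # ~0.015 -- below the 0.02 floor
print(actual / (published / 10))  # ~0.152 -- comfortably above it
```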