Message boards : Number crunching : No work being D/Led & no warning messages

John Clark
Joined: 4 Oct 08
Posts: 1734
Credit: 64,228,409
RAC: 0
Message 9498 - Posted: 1 Feb 2009, 2:25:16 UTC
Last modified: 1 Feb 2009, 2:33:47 UTC

I am knocking off the MW work nicely, but requests for new work are not succeeding in getting any further work. I have no problems with work validating. At the same time, the BOINC Messages tab gives no reason beyond this log extract -

01/02/2009 02:14:07||Time passed...reporting result now.
01/02/2009 02:14:07|Milkyway@home|Sending scheduler request: To report completed tasks
01/02/2009 02:14:07|Milkyway@home|Reporting 4 tasks
01/02/2009 02:14:12|Milkyway@home|Scheduler RPC succeeded [server version 603]
01/02/2009 02:14:12|Milkyway@home|Deferring communication for 7 sec
01/02/2009 02:14:12|Milkyway@home|Reason: requested by project
01/02/2009 02:16:23|Milkyway@home|Computation for task nm_s86_l40_477939_1233452956_0 finished
01/02/2009 02:16:23|Milkyway@home|Starting nm_s86_l40_477941_1233452956_0
01/02/2009 02:16:23|Milkyway@home|Starting task nm_s86_l40_477941_1233452956_0 using milkyway version 14
01/02/2009 02:16:26|Milkyway@home|[file_xfer] Started upload of file nm_s86_l40_477939_1233452956_0_0
01/02/2009 02:16:29|Milkyway@home|[file_xfer] Finished upload of file nm_s86_l40_477939_1233452956_0_0
01/02/2009 02:16:29|Milkyway@home|[file_xfer] Throughput 1002 bytes/sec
01/02/2009 02:16:38|Milkyway@home|Computation for task nm_s86_l41_477951_1233452957_0 finished
01/02/2009 02:16:38|Milkyway@home|Starting nm_s86_l41_477959_1233452957_0
01/02/2009 02:16:38|Milkyway@home|Starting task nm_s86_l41_477959_1233452957_0 using milkyway version 14
01/02/2009 02:16:40|Milkyway@home|[file_xfer] Started upload of file nm_s86_l41_477951_1233452957_0_0
01/02/2009 02:16:42|Milkyway@home|[file_xfer] Finished upload of file nm_s86_l41_477951_1233452957_0_0
01/02/2009 02:16:42|Milkyway@home|[file_xfer] Throughput 1409 bytes/sec
01/02/2009 02:17:33||Time passed...reporting result now.
01/02/2009 02:17:33|Milkyway@home|Sending scheduler request: To report completed tasks
01/02/2009 02:17:33|Milkyway@home|Reporting 2 tasks
01/02/2009 02:17:38|Milkyway@home|Scheduler RPC succeeded [server version 603]
01/02/2009 02:17:38|Milkyway@home|Deferring communication for 7 sec
01/02/2009 02:17:38|Milkyway@home|Reason: requested by project
01/02/2009 02:19:36|Milkyway@home|Computation for task nm_s86_l40_477941_1233452956_0 finished


Anyone else experiencing this?

I am running 2 projects to compensate.

The lack of work seems to be affecting my 2 quads but not my old dual Xeon. Very strange?
ID: 9498
Labbie
Joined: 29 Aug 07
Posts: 327
Credit: 116,463,193
RAC: 0
Message 9499 - Posted: 1 Feb 2009, 3:08:08 UTC

I'm getting something similar on one of my quads, but not the other one.

I thought I had hit the daily limit, but that is not the case here. It does request more work when it gets down to the last few MW WUs, though.

I'm keeping an eye on it.


Calm Chaos Forum...Join Calm Chaos Now
ID: 9499
Kevint
Joined: 22 Nov 07
Posts: 285
Credit: 1,076,786,368
RAC: 0
Message 9502 - Posted: 1 Feb 2009, 5:23:14 UTC
Last modified: 1 Feb 2009, 5:29:01 UTC

I hear of many people having this same issue.

From what I can tell, it is a problem with the DCF (duration correction factor), if it is the same issue that I and several others have experienced.

The cache runs dry; BOINC will not ask for more work until the cache is empty and the very last WU has been crunched.

On 8-core systems this is an issue, because 7 cores sit idle until the last WU is completed. 4 cores, same problem, and duo cores as well; the more cores you have, the more idle time.

I edited my client_state.xml file and noticed the DCF was at 0.01xxx. I increased this to 0.2xxx and voilà, work was downloaded. But soon after, there was no more work again, with the DCF back at the same value.
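
If you want to check the value before hand-editing, here is a minimal sketch (the data-directory path is an assumption, so adjust it for your install; and since client_state.xml is XML-like but not guaranteed to parse with a strict XML parser, it just scans the text):

# Minimal sketch: list each project's duration_correction_factor from
# client_state.xml. The path is an assumption; adjust it for your
# install (e.g. C:\ProgramData\BOINC on Windows). Stop the BOINC
# client before editing the file by hand.
import re

STATE_FILE = "/var/lib/boinc-client/client_state.xml"  # assumed location

with open(STATE_FILE) as f:
    state = f.read()

# Pair each <master_url> with the <duration_correction_factor> that
# follows it inside the same <project> block.
for block in state.split("<project>")[1:]:
    url = re.search(r"<master_url>(.*?)</master_url>", block)
    dcf = re.search(r"<duration_correction_factor>(.*?)</duration_correction_factor>", block)
    if url and dcf:
        print(url.group(1), "DCF =", dcf.group(1))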

As you mentioned, it is also NOT affecting my slower boxes, like my dual quad-core Xeons, only the faster boxes.

Resetting the project clears the problem as well, but you trash any WUs in the cache, and the problem will eventually come back.

John, you might try upgrading to a newer standard version of BOINC and see if that helps.
ID: 9502
Labbie
Joined: 29 Aug 07
Posts: 327
Credit: 116,463,193
RAC: 0
Message 9503 - Posted: 1 Feb 2009, 5:28:44 UTC

My machine that was doing this has somehow cleared itself up and is requesting work normally now. I don't know what happened to fix it, as I did nothing.

But I've got Cosmo running as a backup project, so no wasted cycles here.


Calm Chaos Forum...Join Calm Chaos Now
ID: 9503
John Clark
Joined: 4 Oct 08
Posts: 1734
Credit: 64,228,409
RAC: 0
Message 9508 - Posted: 1 Feb 2009, 8:24:56 UTC
Last modified: 1 Feb 2009, 8:36:40 UTC

I left things overnight, and the same is still happening.

I notice that I get 1 MW WU downloaded, and when this is crunched it will D/L another.

Just looked at the DCF value in the client_state.xml file, and it shows 1.204559.

I presume this is OK.

However, the short-term and long-term debts are large:

Short-term debt = -938.579505

Long-term debt = -65,594.049107

Both quads crunch for Einstein as well.

I will let things run a while so I can look at other things.
ID: 9508
msattler
Joined: 15 Jul 08
Posts: 288
Credit: 5,474,012
RAC: 0
Message 9509 - Posted: 1 Feb 2009, 9:15:14 UTC

I am seeing this wonky behavior on at least one of my rigs too...

Runs the cache down until the very last WU has been crunched, and then finally requests more work...
I am the Kittyman.

Please visit and give a Click for Seti City.
ID: 9509
GalaxyIce
Joined: 6 Apr 08
Posts: 2018
Credit: 100,142,856
RAC: 0
Message 9512 - Posted: 1 Feb 2009, 10:47:51 UTC - in response to Message 9502.  

The cache runs dry; BOINC will not ask for more work until the cache is empty and the very last WU has been crunched.


I get this on all my Core 2/hyperthreading machines, i.e. they will not load any more WUs until the last one has finished. So there will always be one crunching on its own at the end of the 16-WU downloaded batch. But then it's only 15, because there is always one still there when the upload kicks in, even if it's finished and uploading or waiting to report. 16 is the max you can ever have on 2 cores at any one time.
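
A quick sketch of that arithmetic (the 8-WUs-per-core limit is inferred from the 16-on-2-cores figure above):

# The cache tops out at (per-core limit) x (cores); 8 per core is
# inferred from the 16-WU batches reported on 2 cores.
per_core_limit = 8
for cores in (2, 4, 8):
    print("%d cores -> up to %d WUs cached" % (cores, per_core_limit * cores))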



ID: 9512
John Clark
Joined: 4 Oct 08
Posts: 1734
Credit: 64,228,409
RAC: 0
Message 9520 - Posted: 1 Feb 2009, 14:38:27 UTC

It's curious watching this happen.

It crunches MW nicely for hours, then the cache dries up to one, then the backup project crunches, then it goes completely dry bar 1 WU. Then the cache fills up and the cycle restarts.
ID: 9520
Kevint
Joined: 22 Nov 07
Posts: 285
Credit: 1,076,786,368
RAC: 0
Message 9522 - Posted: 1 Feb 2009, 15:55:18 UTC

This is the line you would be looking for:
<project>
<master_url>http://milkyway.cs.rpi.edu/milkyway/</master_url>
..........
<duration_correction_factor>0.0164825</duration_correction_factor>

My understanding from Logan is that with anything less than 0.02, BOINC will not request work.

A reset of the project works - but only for a short time, then the problem starts happening again.
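
If you would rather bump the value than reset the project, a rough sketch along these lines automates the edit; stop the BOINC client first and keep a backup. The 0.02 floor is the figure reported in this thread, not a documented BOINC constant, and the path is again an assumption:

# Rough sketch: raise any duration_correction_factor below 0.02 up to
# 0.02 in client_state.xml. This touches every project block; narrow it
# to the MilkyWay block if you only want to change that project. Stop
# the BOINC client first. The path and the 0.02 floor are assumptions.
import re
import shutil

STATE_FILE = "/var/lib/boinc-client/client_state.xml"  # assumed location
FLOOR = 0.02

shutil.copy(STATE_FILE, STATE_FILE + ".bak")  # keep a backup

with open(STATE_FILE) as f:
    text = f.read()

def bump(m):
    value = max(float(m.group(1)), FLOOR)
    return "<duration_correction_factor>%f</duration_correction_factor>" % value

text = re.sub(
    r"<duration_correction_factor>([0-9.eE+-]+)</duration_correction_factor>",
    bump, text)

with open(STATE_FILE, "w") as f:
    f.write(text)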


ID: 9522
Temujin
Joined: 12 Oct 07
Posts: 77
Credit: 404,471,187
RAC: 0
Message 9523 - Posted: 1 Feb 2009, 16:43:11 UTC - in response to Message 9522.  

My understanding from Logan is that with anything less than 0.02, BOINC will not request work.

I've got 3 machines showing DCFs of 0.011, 0.014 & 0.018, all quite happily downloading WUs.
ID: 9523
Kevint
Joined: 22 Nov 07
Posts: 285
Credit: 1,076,786,368
RAC: 0
Message 9533 - Posted: 1 Feb 2009, 20:33:06 UTC

I have posted a topic on the BOINC help message boards as well:

http://boinc.berkeley.edu/dev/forum_thread.php?id=3594

Maybe some other input might be helpful in determining the problem.



I got to looking at some of my machines today, and almost all of my faster quads are experiencing this problem now.

But the slower boxes don't seem to be affected.
ID: 9533
John Clark
Joined: 4 Oct 08
Posts: 1734
Credit: 64,228,409
RAC: 0
Message 9535 - Posted: 1 Feb 2009, 22:43:47 UTC - in response to Message 9533.  

I got to looking at some of my machines today, and almost all of my faster quads are experiencing this problem now.

But the slower boxes don't seem to be affected.


That seems to be my experience as well on a small farm.
ID: 9535
Gavin Shaw
Joined: 16 Jan 08
Posts: 98
Credit: 1,371,299
RAC: 0
Message 9536 - Posted: 1 Feb 2009, 23:00:09 UTC

Two of my machines are having this problem. They are both C2D systems and run BOINC 5.10.45. This problem only started a few days ago.

Interestingly, my third C2D system, which is using BOINC 5.4.11, is not having this problem and readily asks for more work nearly every minute as it tries to fill its cache.

Never surrender and never give up. In the darkest hour there is always hope.

ID: 9536
Cori
Joined: 27 Aug 07
Posts: 647
Credit: 27,592,547
RAC: 0
Message 9537 - Posted: 1 Feb 2009, 23:38:44 UTC - in response to Message 9522.

This is the line you would be looking for:
<project>
<master_url>http://milkyway.cs.rpi.edu/milkyway/</master_url>
..........
<duration_correction_factor>0.0164825</duration_correction_factor>

My understanding from Logan is that with anything less than 0.02, BOINC will not request work.

A reset of the project works - but only for a short time, then the problem starts happening again.


Yeah, since I have the same prob on all 4 of my boxes as well, I've tried editing the DCF in the client_state.xml, and I immediately got new work! :-)))
My DCF was 0.01 and I changed it to 0.02 - but I guess it will go back to 0.01 in a short while. :-(
Lovely greetings, Cori
ID: 9537
Travis
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 30 Aug 07
Posts: 2046
Credit: 26,480
RAC: 0
Message 9562 - Posted: 2 Feb 2009, 20:11:37 UTC - in response to Message 9537.

This is the line you would be looking for:
<project>
<master_url>http://milkyway.cs.rpi.edu/milkyway/</master_url>
..........
<duration_correction_factor>0.0164825</duration_correction_factor>

My understanding from Logan is that with anything less than 0.02, BOINC will not request work.

A reset of the project works - but only for a short time, then the problem starts happening again.


Yeah, since I have the same prob on all 4 of my boxes as well, I've tried editing the DCF in the client_state.xml, and I immediately got new work! :-)))
My DCF was 0.01 and I changed it to 0.02 - but I guess it will go back to 0.01 in a short while. :-(


I'm going to increase the WU queue (I bet that'll make a lot of people happy). The Newton methods we're running right now aren't as sensitive to the work queue as the genetic search was.
ID: 9562
Cori
Joined: 27 Aug 07
Posts: 647
Credit: 27,592,547
RAC: 0
Message 9568 - Posted: 2 Feb 2009, 20:31:01 UTC - in response to Message 9562.  

I'm going to increase the WU queue (I bet that'll make a lot of people happy). The Newton methods we're running right now aren't as sensitive to the work queue as the genetic search was.

Thanks, Travis! :-)))

Btw, I don't think this will change the strange work fetch policy of recent BOINC manager versions though. :-/
There is always one last WU crunching while the other core(s) are idling before the darn thing will request more work, grrrr!

Lovely greetings, Cori
ID: 9568
Logan
Joined: 15 Aug 08
Posts: 163
Credit: 3,876,869
RAC: 0
Message 9571 - Posted: 2 Feb 2009, 20:43:22 UTC - in response to Message 9568.  
Last modified: 2 Feb 2009, 21:25:08 UTC

I'm going to increase the WU queue (I bet that'll make a lot of people happy). The Newton methods we're running right now aren't as sensitive to the work queue as the genetic search was.

Thanks, Travis! :-)))

Btw, I don't think this will change the strange work fetch policy of recent BOINC manager versions though. :-/
There is always one last WU crunching while the other core(s) are idling before the darn thing will request more work, grrrr!


The question is about the theoretical crunch time calculated by the project versus the correction factor your BOINC manager applies...

If the project-side estimate of the task duration is higher than reality and your real duration is very low, the DCF can fall under 0.02, and then no more work can be requested until your cache is empty...

As an example: if a task is defined by the project as a 6-hour task (at DCF = 1) and the stock app completes it in under an hour (DCF ≈ 1/6 ≈ 0.17), while an optimized app can finish it in a few minutes, you can do the math...
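
Putting illustrative numbers on that (the runtimes are made up to match the example):

# Illustrative arithmetic: the DCF settles near
# (actual runtime) / (project's published estimate).
estimate_h = 6.0  # the project publishes the task as a 6-hour WU

for app, actual_h in [("stock app", 1.0), ("optimized app", 5.0 / 60)]:
    dcf = actual_h / estimate_h
    side = "below" if dcf < 0.02 else "above"
    print("%-13s DCF ~ %.3f (%s the reported 0.02 floor)" % (app, dcf, side))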

Best regards.
Logan.

BOINC FAQ Service (now also available in Spanish)
ID: 9571
Kevint
Joined: 22 Nov 07
Posts: 285
Credit: 1,076,786,368
RAC: 0
Message 9572 - Posted: 2 Feb 2009, 20:46:21 UTC - in response to Message 9568.  
Last modified: 2 Feb 2009, 20:47:48 UTC

I'm going to increase the WU queue (I bet that'll make a lot of people happy). The Newton methods we're running right now aren't as sensitive to the work queue as the genetic search was.

Thanks, Travis! :-)))

Btw, I don't think this will change the strange work fetch policy of recent BOINC manager versions though. :-/
There is always one last WU crunching while the other core(s) are idling before the darn thing will request more work, grrrr!



It may not fix the DCF problem, but it will allow you to crunch longer with a larger cache before waiting to reload.

Instead of 8 WUs per core and then waiting for the last WU, now you will get... DRUM ROLL PLEASE >>> 12!!! WUs per core before the wait.
ID: 9572
Cori
Joined: 27 Aug 07
Posts: 647
Credit: 27,592,547
RAC: 0
Message 9573 - Posted: 2 Feb 2009, 20:49:08 UTC - in response to Message 9572.  
Last modified: 2 Feb 2009, 20:50:48 UTC

I'm going to increase the WU queue (I bet that'll make a lot of people happy). The Newton methods we're running right now aren't as sensitive to the work queue as the genetic search was.

Thanks, Travis! :-)))

Btw, I don't think this will change the strange work fetch policy of recent BOINC manager versions though. :-/
There is always one last WU crunching while the other core(s) are idling before the darn thing will request more work, grrrr!



It may not fix the DCF problem, but will allow you to crunch longer with a larger cache before waiting to reload the cache.

Instead of 8 WU's per core then wait for the last WU, now you will get xx?? WU's per core then wait..



I agree, having a cache of 12 x (number of cores) WUs now is better than before. :-)))
But I don't know how to correct the DCF permanently, because after editing the XML it always goes back to the too-low value. :-(
Lovely greetings, Cori
ID: 9573
GlennG
Joined: 17 Nov 08
Posts: 18
Credit: 130,650,263
RAC: 0
Message 9576 - Posted: 2 Feb 2009, 22:10:04 UTC

Travis,

Logan is right. The real problems for those of us with fast machines are twofold.

1. The estimated duration for a WU, when it is published by the project, is way too long (e.g. the older WUs show up at around 22 hours after a reset on my machine). After several dozen WUs have completed, the DCF has been reduced to the correct amount, but it is considered too low for BOINC to allow more work.

2. The new set of WUs started last week (I38 & I39) had estimated durations out of line with the other WUs. They took about 25% longer to crunch, but the published duration was about 2x as long.

People weren't seeing problem #1 because we were just above the threshold of the problem until the new WUs drove our DCFs down even lower.

My suggestion is that the published duration for each WU be cut by a factor of 10. That way the fast machines will still have a reasonable DCF, and older, slower machines can have DCFs greater than 1.0. Also, try to make sure that each new set of WUs has duration estimates in line with the other outstanding sets.
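
To put rough numbers on that suggestion (the 22-hour estimate is from this post; the actual runtime is an assumption):

# Illustrative effect of cutting the published estimate by 10x: the
# equilibrium DCF rises by the same factor, back above the 0.02 floor.
actual_h = 0.1  # ~6 minutes per WU on a fast box (assumed)

for estimate_h in (22.0, 2.2):  # published estimate before / after the cut
    print("estimate %4.1f h -> DCF ~ %.4f" % (estimate_h, actual_h / estimate_h))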

Glenn
ID: 9576