Welcome to MilkyWay@home

Uh Oh - Ran Out Of Work Again !

Message boards : Number crunching : Uh Oh - Ran Out Of Work Again !

Jon Boy UK - Wales
Joined: 17 Nov 07
Posts: 17
Credit: 663,827
RAC: 0
Message 1697 - Posted: 13 Feb 2008, 19:23:51 UTC

Help...

I'm a lonely Xeon CPU that badly needs some more workunits to crunch!

Can you help me?

Kind Regards,

Happy Crunchin' John :0)
Travis
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 30 Aug 07
Posts: 2046
Credit: 26,480
RAC: 0
Message 1698 - Posted: 13 Feb 2008, 19:42:49 UTC - in response to Message 1697.  

I just noticed :) and started a new search. I'll also be starting a few more when Nate sends me some new data to crunch on. A little more on that later :D
Jon Boy UK - Wales
Joined: 17 Nov 07
Posts: 17
Credit: 663,827
RAC: 0
Message 1700 - Posted: 13 Feb 2008, 19:46:21 UTC

Hello Travis,

You were on the ball with the new batch, kind sir!

No sooner had I mentioned that the server had run out of WUs to forward to my rather hungry machines than they began coughing and spluttering back to life again!

It's raining Milky Way WUs - I love it!

Happy Crunchin' John :0)
Jayargh
Joined: 8 Oct 07
Posts: 289
Credit: 3,690,838
RAC: 0
Message 1704 - Posted: 13 Feb 2008, 23:42:48 UTC

Travis - I have a couple of hosts running MilkyWay solo (it's the only project they run). Is this wise? Do you foresee running out of work at any point that would leave my machines empty? The server has not gone down or run out of work for a couple of weeks now :) Thanks - Jeff
JLDun
Joined: 17 Nov 07
Posts: 77
Credit: 117,183
RAC: 0
Message 1709 - Posted: 14 Feb 2008, 5:08:04 UTC - in response to Message 1704.  

I know I'm not Travis, but I'll try. o-o

"Do you foresee running out of work at any given point that would have my machines empty of work?"

The 'problem' is foreseeing running out of work. This project, or any project, can run out of work unexpectedly due to a power outage, a hardware (server, UPS, RAM) failure, a building fire, etc.

I personally would recommend having two or three projects ready.
(If the probability of one project being out of work at a particular moment is 0.5, i.e. 50%, then the probability of two projects being out at the same moment is 0.5*0.5 = 0.25, and of three at the same time is 0.5*0.5*0.5 = 0.125 = 12.5% = 1 in 8, assuming the projects run out independently of each other.)
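JLDun's back-of-the-envelope figure is the product rule for independent events: if each attached project is out of work with probability p at a given moment, and the projects run dry independently, all n are out together with probability p^n. A minimal Python sketch; the 0.5 is JLDun's illustrative number, not a measured outage rate, and independence is an assumption (outages with a shared cause, such as a network problem on the volunteer's side, would push the real figure higher).

```python
def prob_all_out_of_work(p_out: float, n_projects: int) -> float:
    """Probability that all n projects are out of work at the same moment.

    Assumes every project runs dry independently with the same probability p_out.
    """
    if not 0.0 <= p_out <= 1.0:
        raise ValueError("p_out must be a probability between 0 and 1")
    if n_projects < 1:
        raise ValueError("n_projects must be at least 1")
    return p_out ** n_projects


for n in (1, 2, 3):
    print(f"{n} project(s): {prob_all_out_of_work(0.5, n):.3f}")
# 1 project(s): 0.500
# 2 project(s): 0.250
# 3 project(s): 0.125
```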
Jayargh
Joined: 8 Oct 07
Posts: 289
Credit: 3,690,838
RAC: 0
Message 1712 - Posted: 14 Feb 2008, 12:27:01 UTC - in response to Message 1709.  
Last modified: 14 Feb 2008, 12:29:50 UTC

Thanks, but I was really asking about the foreseen... I know about the unforeseen, but I check the machines a lot. Two weeks ago, when the server kept crashing, it wasn't even a consideration.
Travis
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 30 Aug 07
Posts: 2046
Credit: 26,480
RAC: 0
Message 1722 - Posted: 15 Feb 2008, 15:43:45 UTC - in response to Message 1712.  

Well, normally it's not an issue; I usually start up new searches when the old ones are close to being finished. Right now I'm trying to see what effect the number of searches run concurrently has on the convergence rates of our searches, so I have to wait for the previous batch of searches to stop before starting up a new batch. Once these sets of results are finished, things should go back to constantly available work. I've been trying to catch when the searches finish as fast as possible, so hopefully there won't be much downtime.
Swordfish
Joined: 19 Nov 07
Posts: 4
Credit: 82,330,797
RAC: 0
Message 1725 - Posted: 18 Feb 2008, 0:14:37 UTC

I'm hoping that there is plenty of work... I just put 5 dual quad-core machines on the project.
Jayargh
Joined: 8 Oct 07
Posts: 289
Credit: 3,690,838
RAC: 0
Message 1726 - Posted: 18 Feb 2008, 15:17:06 UTC - in response to Message 1725.  
Last modified: 18 Feb 2008, 15:56:47 UTC

Because of the 20-task limit per host here and the 20-minute deferral between scheduler RPCs, I hope you have a backup project running; otherwise some of those cores might sit idle at times.
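Jayargh's concern can be checked with rough arithmetic: once a host is refused at the 20-task limit it cannot ask again for 20 minutes, so it needs at least 20 minutes of queued work on every core. A minimal Python sketch, assuming all tasks take the same time and start fresh at the moment of refusal (tasks already part-way done finish sooner, so the estimate is optimistic); the 15-minute task runtime is a made-up example, since the thread gives none.

```python
def minutes_until_first_idle_core(task_limit: int, cores: int, task_minutes: float) -> float:
    """Rough time before at least one core runs out of work.

    Assumes the host holds `task_limit` equally long tasks, all starting fresh
    and spread as evenly as possible over its cores.
    """
    if cores < 1 or task_limit < 0 or task_minutes <= 0:
        raise ValueError("invalid host description")
    tasks_per_core = task_limit // cores  # every core gets at least this many
    return tasks_per_core * task_minutes


def cores_may_idle(task_limit: int, cores: int, task_minutes: float,
                   deferral_minutes: float = 20.0) -> bool:
    """True if some core could run dry before the host may contact the server again."""
    return minutes_until_first_idle_core(task_limit, cores, task_minutes) < deferral_minutes


# A dual quad-core box (8 cores) and a quad-core, quad-processor box (16 cores),
# each limited to 20 tasks, with hypothetical 15-minute tasks.
for cores in (8, 16):
    print(cores, "cores:", cores_may_idle(20, cores, 15.0))
# 8 cores: False
# 16 cores: True
```

Under these assumptions the 8-core host still has work when the deferral expires, while the 16-core host holds only one task per core and can go idle, in line with Travis's worry about quad-core, quad-processor machines.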
Travis
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 30 Aug 07
Posts: 2046
Credit: 26,480
RAC: 0
Message 1729 - Posted: 18 Feb 2008, 18:01:54 UTC - in response to Message 1726.  

All the work units now should be convolution work units, so I'm hoping the 20-task limit will be able to cover it. Quad-core, quad-processor machines might be an issue, though :P
Swordfish
Joined: 19 Nov 07
Posts: 4
Credit: 82,330,797
RAC: 0
Message 1730 - Posted: 19 Feb 2008, 2:25:01 UTC - in response to Message 1729.  

Dang, did I break it? lol
Travis
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 30 Aug 07
Posts: 2046
Credit: 26,480
RAC: 0
Message 1733 - Posted: 19 Feb 2008, 15:56:12 UTC - in response to Message 1730.  

And actually, I've been working on changing the server code to let us determine which work units get sent to which computers. I might be able to set it up so that there's a limit of work units per search, which could probably let us bump the WU limit to 30 or 40... I'll have to take a look into it.
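Travis's idea of capping work units per search, alongside a larger per-host cap, could look something like the rule below. This is a hypothetical sketch, not the BOINC scheduler's code: the function name, the search labels, and the per-search cap of 10 are illustrative assumptions, and 40 is simply the upper figure floated in the post.

```python
from collections import Counter


def can_send(in_progress: list[str], search_id: str,
             per_host_limit: int = 40, per_search_limit: int = 10) -> bool:
    """Decide whether one more work unit from `search_id` may go to a host.

    `in_progress` lists the search each of the host's unfinished work units
    belongs to. Both caps are hypothetical values.
    """
    if len(in_progress) >= per_host_limit:
        return False  # host already holds its overall maximum
    return Counter(in_progress)[search_id] < per_search_limit


host = ["search_a"] * 10 + ["search_b"] * 3
print(can_send(host, "search_a"))  # False: search_a is at its per-search cap
print(can_send(host, "search_b"))  # True: 13 units in total, only 3 from search_b
```

One plausible benefit of a per-search cap is that no single search can have a large share of its work tied up in one host's queue, which may be what would make a higher per-host limit acceptable.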
Jayargh
Joined: 8 Oct 07
Posts: 289
Credit: 3,690,838
RAC: 0
Message 1734 - Posted: 19 Feb 2008, 17:12:16 UTC - in response to Message 1733.  

Travis - another way to accomplish this is to shorten the deferral between RPCs from 20 min down to 10 or 15 min... but I don't know if you want to increase the server load by roughly 33-100% this way.
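The load estimate follows from the fact that a host pinned at its limit contacts the server once per deferral, so its request rate scales as 1/interval and the increase is old/new - 1: about +33% when going from 20 to 15 minutes and +100% when going from 20 to 10. A small Python check, assuming every affected host is always at its limit (hosts that are not are unaffected by the change):

```python
def rpc_load_increase(old_interval_min: float, new_interval_min: float) -> float:
    """Fractional increase in scheduler requests when the deferral shrinks.

    Assumes each affected host contacts the server once per deferral interval,
    so its request rate is proportional to 1 / interval.
    """
    if old_interval_min <= 0 or new_interval_min <= 0:
        raise ValueError("intervals must be positive")
    return old_interval_min / new_interval_min - 1.0


for new in (15, 10):
    print(f"20 -> {new} min: +{rpc_load_increase(20, new):.0%}")
# 20 -> 15 min: +33%
# 20 -> 10 min: +100%
```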
Travis
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 30 Aug 07
Posts: 2046
Credit: 26,480
RAC: 0
Message 1758 - Posted: 25 Feb 2008, 19:49:59 UTC - in response to Message 1734.  

Actually, I'll do that. The server seems to be fine with the current load and could handle a bit more.
Travis
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 30 Aug 07
Posts: 2046
Credit: 26,480
RAC: 0
Message 1772 - Posted: 27 Feb 2008, 22:42:24 UTC - in response to Message 1734.  

I actually emailed the BOINC projects list about this. Dave Anderson said that the client should automatically request new work when the number of work units is low... there shouldn't be a 20-minute wait between RPCs. He said that if you guys have any transcripts of this, you should give them to him. So if you see this problem, let me know and post a transcript, and I'll forward it on.
Jayargh
Joined: 8 Oct 07
Posts: 289
Credit: 3,690,838
RAC: 0
Message 1773 - Posted: 27 Feb 2008, 23:47:51 UTC - in response to Message 1772.  
Last modified: 28 Feb 2008, 0:08:19 UTC


2/24/2008 4:46:25 PM|Milkyway@home|Starting gs_260_1203802157_143213_0
2/24/2008 4:46:25 PM|Milkyway@home|Starting task gs_260_1203802157_143213_0 using astronomy version 113
2/24/2008 4:52:12 PM|Milkyway@home|Sending scheduler request: To fetch work
2/24/2008 4:52:12 PM|Milkyway@home|Requesting 54653 seconds of new work
2/24/2008 4:52:22 PM|Milkyway@home|Scheduler RPC succeeded [server version 511]
2/24/2008 4:52:22 PM|Milkyway@home|Message from server: No work sent
2/24/2008 4:52:22 PM|Milkyway@home|Message from server: (reached per-host limit of 20 tasks)
2/24/2008 4:52:22 PM|Milkyway@home|Deferring communication for 20 min 0 sec
2/24/2008 4:52:22 PM|Milkyway@home|Reason: requested by project
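Since Dr. Anderson asked for transcripts, a small script can pull the relevant events out of a saved client message log. A minimal Python sketch; the line layout (timestamp|project|message) and the message wording are inferred from the excerpt above, other client versions may differ, and the function name is hypothetical.

```python
import re
from datetime import datetime

# Layout inferred from the log excerpt: "timestamp|project|message".
TIME_FORMAT = "%m/%d/%Y %I:%M:%S %p"
DEFER_RE = re.compile(r"Deferring communication for (\d+) min (\d+) sec")


def limit_deferrals(lines):
    """Yield (time, project, seconds) for each deferral that follows a
    'reached per-host limit' refusal from the same project."""
    refused = set()  # projects whose latest reply hit the per-host limit
    for line in lines:
        parts = line.rstrip("\n").split("|", 2)
        if len(parts) != 3:
            continue  # not a log line in the expected layout
        stamp, project, message = parts
        if "reached per-host limit" in message:
            refused.add(project)
            continue
        match = DEFER_RE.search(message)
        if match and project in refused:
            refused.discard(project)
            seconds = int(match.group(1)) * 60 + int(match.group(2))
            yield datetime.strptime(stamp, TIME_FORMAT), project, seconds


log = """\
2/24/2008 4:52:22 PM|Milkyway@home|Message from server: (reached per-host limit of 20 tasks)
2/24/2008 4:52:22 PM|Milkyway@home|Deferring communication for 20 min 0 sec
2/24/2008 4:52:22 PM|Milkyway@home|Reason: requested by project"""
for when, project, seconds in limit_deferrals(log.splitlines()):
    print(when, project, seconds)
# 2008-02-24 16:52:22 Milkyway@home 1200
```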

This is the message we are receiving now.

So if you run out of work in less than 20 minutes, tough bananas.

It's the 20-minute communication deferral that needs to change.

Once you reach the maximum amount of work, there has to be a pause before the next RPC; otherwise we would be contacting the server every 7 seconds! Hence the delay.

What does Dr. Anderson not understand about this?
Jon Boy UK - Wales
Joined: 17 Nov 07
Posts: 17
Credit: 663,827
RAC: 0
Message 1796 - Posted: 1 Mar 2008, 19:00:19 UTC
Last modified: 1 Mar 2008, 19:01:27 UTC

Lol... You've gotta laugh, haven't you? :)

Anyone know when new WUs will be available?
Travis
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 30 Aug 07
Posts: 2046
Credit: 26,480
RAC: 0
Message 1801 - Posted: 2 Mar 2008, 3:16:02 UTC - in response to Message 1796.  


I forwarded the message on to Dr. Anderson, so hopefully he'll have a response for me soon :)
Travis
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 30 Aug 07
Posts: 2046
Credit: 26,480
RAC: 0
Message 1808 - Posted: 3 Mar 2008, 1:34:55 UTC - in response to Message 1801.  


FYI, here's the response from Dr. Anderson:

This is a long-standing design flaw:

- A host is sent max_wus_in_progress jobs
- It finishes one of them and starts uploading it
- It decides it needs more work and contacts the scheduler
(before the upload has finished)
- The scheduler sees that it already has max_wus_in_progress jobs,
refuses to give it more, and tells it to back off for 20 min
- a few seconds later the upload finishes

The right thing is to not count the finished/uploading jobs
(or to not count a limited number of them).
I'll look at this. If anyone has other ideas let me know.
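The fix Dr. Anderson describes amounts to changing what counts toward max_wus_in_progress. A minimal model in Python, not the actual BOINC scheduler code: the Job class and its state strings are hypothetical, and max_uploading_ignored stands in for his "limited number" variant.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Job:
    name: str
    state: str  # hypothetical states, e.g. "queued", "running", "uploading"


def jobs_counted(jobs: list[Job], max_uploading_ignored: Optional[int] = None) -> int:
    """Number of jobs charged against the host's limit.

    Finished jobs whose results are still uploading are not counted; if
    max_uploading_ignored is set, at most that many of them are skipped.
    """
    uploading = sum(1 for job in jobs if job.state == "uploading")
    if max_uploading_ignored is None:
        ignored = uploading
    else:
        ignored = min(uploading, max_uploading_ignored)
    return len(jobs) - ignored


def may_send_more(jobs: list[Job], max_wus_in_progress: int,
                  max_uploading_ignored: Optional[int] = None) -> bool:
    return jobs_counted(jobs, max_uploading_ignored) < max_wus_in_progress


# The race from the email: 19 jobs running, 1 finished and still uploading.
host = [Job(f"wu{i}", "running") for i in range(19)] + [Job("wu19", "uploading")]
print(may_send_more(host, 20, max_uploading_ignored=0))  # False: uploads counted, as now
print(may_send_more(host, 20))                           # True: uploads not counted
```

Capping the number of skipped uploads is presumably there so that a host whose uploads are stalled cannot keep collecting new work without limit.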


I'll try to take a look at the scheduler code to see if I can figure out a fix for this.

Webmaster Yoda
Joined: 21 Dec 07
Posts: 69
Credit: 7,048,412
RAC: 0
Message 1809 - Posted: 3 Mar 2008, 6:53:07 UTC - in response to Message 1808.  
Last modified: 3 Mar 2008, 6:54:44 UTC

I'm no expert on the server-side options of BOINC, but a search of the BOINC site shows the following seemingly relevant config options (at http://boinc.berkeley.edu/trac/wiki/ProjectOptions):

<max_wus_in_progress> N </max_wus_in_progress>
<min_sendwork_interval> N </min_sendwork_interval>

Here's what it says about that last option:

min_sendwork_interval
Minimum number of seconds to wait after sending results to a given host, before new results are sent to the same host. Helps prevent hosts with download or application problems from trashing lots of results by returning lots of error results. But don't set it to be so long that a host goes idle after completing its work, before getting new work.


What we are seeing on fast hosts, particularly with short (2-credit) work units, is exactly as described: they run out of work before they are allowed to connect again... This may become even more prevalent if the new applications are faster.

Perhaps if that were set to 5 or 10 minutes, things would run more smoothly.
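Yoda's suggestion can be sized with a simple model: if min_sendwork_interval is what keeps a host waiting (the thread does not confirm that it is the source of the 20-minute deferral in the log), a host that finishes its whole batch in D seconds sits idle for max(0, interval - D). A minimal Python sketch; the 8-minute batch time is a made-up example.

```python
def idle_seconds(min_sendwork_interval: int, seconds_to_finish_batch: int) -> int:
    """Idle time for a host that finishes all its work before the scheduler
    is allowed to send it more."""
    if min_sendwork_interval < 0 or seconds_to_finish_batch < 0:
        raise ValueError("times must be non-negative")
    return max(0, min_sendwork_interval - seconds_to_finish_batch)


# Hypothetical fast host that clears its batch in 8 minutes (480 s).
for interval in (1200, 600, 300):  # 20, 10 and 5 minutes
    print(f"{interval // 60} min interval: idle {idle_seconds(interval, 480)} s")
# 20 min interval: idle 720 s
# 10 min interval: idle 120 s
# 5 min interval: idle 0 s
```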
Join the #1 Aussie Alliance on MilkyWay!

©2024 Astroinformatics Group