Uh Oh - Ran Out Of Work Again!
Joined: 17 Nov 07 Posts: 17 Credit: 663,827 RAC: 0

Help... I'm a lonely Xeon CPU that badly needs some more workunits to crunch! Can you help me? Kind regards, Happy crunchin', John :0)
Joined: 30 Aug 07 Posts: 2046 Credit: 26,480 RAC: 0

I just noticed :) Started a new search. I'll also be starting a few more when Nate sends me some new data to crunch on. A little more on that later :D
Joined: 17 Nov 07 Posts: 17 Credit: 663,827 RAC: 0

Hello Travis, you were on the ball with the new batch, kind sir! No sooner had I mentioned that the server had run out of WUs to forward to my rather hungry machines than they began coughing and spluttering back to life again! It's raining Milky Way WUs - I love it! Happy crunchin', John :0)
Joined: 8 Oct 07 Posts: 289 Credit: 3,690,838 RAC: 0

Travis - I have a couple of hosts running Milkyway solo... it's the only project they run. Is this wise? Do you foresee running out of work at any given point that would leave my machines empty? The server has not gone down or run out of work for a couple of weeks now :) Thanks - Jeff
Joined: 17 Nov 07 Posts: 77 Credit: 117,183 RAC: 0

I know I'm not Travis, but I'll try. o-o

"Do you foresee running out of work at any given point that would have my machines empty of work?"

The 'problem' is foreseeing running out of work. This project, or any project, can run out unexpectedly due to a power outage, a hardware (server, UPS, RAM) failure, a building fire, etc. I personally would recommend having two or three projects ready. (If the odds of one project being out of work at a particular moment are .5 (50%), then the odds of two projects being out at the same moment are .5 × .5 = .25, and of three at the same time .5 × .5 × .5 = .125 = 12.5% = 1 in 8, assuming the outages are independent of each other.)
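The back-of-the-envelope odds in the post can be checked with a few lines of Python. This is a minimal sketch that, like the post, assumes every project is out of work with the same probability and that outages at different projects are independent (real outages may well be correlated); the function name is made up for illustration:

```python
def p_all_out(p_single: float, n_projects: int) -> float:
    """Probability that all n projects are out of work at the same moment,
    assuming each is out with probability p_single, independently."""
    return p_single ** n_projects


# Reproduce the post's numbers for one, two and three attached projects.
for n in (1, 2, 3):
    print(n, "project(s):", p_all_out(0.5, n))
```

Each extra project multiplies the chance of a fully idle host by p_single, which is why a second or third backup project helps so much under these assumptions.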
Joined: 8 Oct 07 Posts: 289 Credit: 3,690,838 RAC: 0

Thanks, but I was really asking about the foreseen... I know about the unforeseen... but I check the machines a lot. Two weeks ago, when the server kept crashing, it wasn't even a consideration.
Joined: 30 Aug 07 Posts: 2046 Credit: 26,480 RAC: 0

Well, normally it's not an issue; I usually start up new searches when the old ones are close to being finished. Right now I'm trying to see what effect the number of searches run concurrently has on the convergence rates of our searches, so I have to wait for the previous batch of searches to stop before starting up a new batch. Once these sets of results are finished, things should go back to constantly available work. I've been trying to catch when the searches finish as quickly as possible, so hopefully there won't be much downtime.
Joined: 19 Nov 07 Posts: 4 Credit: 82,330,797 RAC: 0

I'm hoping that there is plenty of work... 5 dual quads just put on the project.
Joined: 8 Oct 07 Posts: 289 Credit: 3,690,838 RAC: 0

Because of the limit of 20 tasks at a time here and the 20-minute deferral between RPC calls, I hope you have a backup project running, because some of those cores might be idle at times if you don't.
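Whether the cores actually go idle depends on how long each task runs. Here is a rough Python sketch of that check. It assumes every task has the same runtime, that the host has just been topped up to the 20-task limit when the 20-minute deferral starts, and that tasks start in lockstep across cores. All of these are simplifications, and the task runtimes below are hypothetical:

```python
def busy_minutes(task_limit: int, cores: int, task_minutes: float) -> float:
    """Minutes during which every core has work after the host fills up to
    task_limit tasks. Simplified model: equal runtimes, lockstep starts, so
    the host completes task_limit // cores full waves before some cores
    run out of queued tasks."""
    return (task_limit // cores) * task_minutes


def cores_go_idle(task_limit: int, cores: int,
                  task_minutes: float, deferral_minutes: float) -> bool:
    """True if some cores would sit idle before the deferral ends."""
    return busy_minutes(task_limit, cores, task_minutes) < deferral_minutes


# A dual quad (8 cores) with the 20-task limit and the 20-minute deferral:
print(cores_go_idle(20, 8, 5, 20))   # hypothetical 5-minute tasks -> True
print(cores_go_idle(20, 8, 15, 20))  # hypothetical 15-minute tasks -> False
```

Under this model, an 8-core host stays fully busy through the deferral only if each task runs at least 10 minutes, because 20 // 8 = 2 full waves have to cover 20 minutes.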
Joined: 30 Aug 07 Posts: 2046 Credit: 26,480 RAC: 0

All the work units now should be convolution, so I'm hoping the 20 limit will be able to cover it. Quad-core, quad-processor machines might be an issue, though :P
Joined: 19 Nov 07 Posts: 4 Credit: 82,330,797 RAC: 0

Dang, did I break it? lol
Joined: 30 Aug 07 Posts: 2046 Credit: 26,480 RAC: 0

And actually, I've been working on changing the server code to allow us to determine which work units get sent to which computers. I might be able to set it up so that there's a limit of work units per search, which could probably let us bump the WU limit to 30 or 40... I'll have to take a look into it.
Joined: 8 Oct 07 Posts: 289 Credit: 3,690,838 RAC: 0

Travis - another way to accomplish this is to shorten the deferral between RPC calls from 20 min to 10 or 15 min... but I don't know if you want to increase the server load by roughly 33-100% this way.
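The size of that load increase can be estimated quickly. This minimal Python sketch assumes that each host's scheduler-request rate is proportional to 1 / (deferral interval). That makes it an upper bound, since only hosts that hit the task limit are deferred. The function name is made up for illustration:

```python
def load_increase(old_interval: float, new_interval: float) -> float:
    """Fractional increase in scheduler requests when the deferral shrinks
    from old_interval to new_interval (same units), assuming the request
    rate is proportional to 1 / interval."""
    return old_interval / new_interval - 1


# The two options from the post, starting from the current 20 minutes.
for new in (15, 10):
    print(f"20 -> {new} min: +{load_increase(20, new):.0%}")
```

This prints +33% for 15 minutes and +100% for 10 minutes. Halving the interval doubles the worst-case request rate.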
Joined: 30 Aug 07 Posts: 2046 Credit: 26,480 RAC: 0

Actually, I'll do that; the server seems to be fine with the current load and could handle a bit more.
Joined: 30 Aug 07 Posts: 2046 Credit: 26,480 RAC: 0

I actually emailed the BOINC projects list about this. Dave Anderson said that the client should automatically request new work when the number of workunits is low... there shouldn't be a 20-minute wait between RPC calls. He asked that if you have any transcripts of this, you give them to him. So if you see this problem, let me know and post a transcript, and I'll forward it on.
Joined: 8 Oct 07 Posts: 289 Credit: 3,690,838 RAC: 0

2/24/2008 4:46:25 PM|Milkyway@home|Starting gs_260_1203802157_143213_0
2/24/2008 4:46:25 PM|Milkyway@home|Starting task gs_260_1203802157_143213_0 using astronomy version 113
2/24/2008 4:52:12 PM|Milkyway@home|Sending scheduler request: To fetch work
2/24/2008 4:52:12 PM|Milkyway@home|Requesting 54653 seconds of new work
2/24/2008 4:52:22 PM|Milkyway@home|Scheduler RPC succeeded [server version 511]
2/24/2008 4:52:22 PM|Milkyway@home|Message from server: No work sent
2/24/2008 4:52:22 PM|Milkyway@home|Message from server: (reached per-host limit of 20 tasks)
2/24/2008 4:52:22 PM|Milkyway@home|Deferring communication for 20 min 0 sec
2/24/2008 4:52:22 PM|Milkyway@home|Reason: requested by project

This is the message we are receiving now. So if you run out of work in less than 20 minutes, tough bananas. It's the "Deferring communication for 20 min" that needs to change. Once you reach max work there has to be a pause before the next RPC call, otherwise we would be contacting the server every 7 seconds! Hence the delay. What does Dr. Anderson not understand about this?
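Since transcripts were requested, here is a hedged Python sketch that pulls the deferral out of a client log like the one above and works out when the client may next contact the project. It assumes every relevant line has the `date time|project|message` shape and the US-style timestamp shown in this transcript. Other client versions or locales may log differently, and the function name is made up:

```python
import re
from datetime import datetime, timedelta

# Matches e.g. "Deferring communication for 20 min 0 sec"
DEFER_RE = re.compile(r"Deferring communication for (\d+) min (\d+) sec")
# Timestamp style seen in the transcript, e.g. "2/24/2008 4:52:22 PM"
TS_FORMAT = "%m/%d/%Y %I:%M:%S %p"


def next_contact(log_lines):
    """Return when the last logged deferral ends, or None if there is none."""
    result = None
    for line in log_lines:
        parts = line.split("|", 2)
        if len(parts) != 3:
            continue  # not a "timestamp|project|message" line
        stamp, _project, message = parts
        match = DEFER_RE.search(message)
        if match:
            logged_at = datetime.strptime(stamp.strip(), TS_FORMAT)
            result = logged_at + timedelta(
                minutes=int(match.group(1)), seconds=int(match.group(2))
            )
    return result


log = [
    "2/24/2008 4:52:22 PM|Milkyway@home|Message from server: (reached per-host limit of 20 tasks)",
    "2/24/2008 4:52:22 PM|Milkyway@home|Deferring communication for 20 min 0 sec",
]
print(next_contact(log))  # 2008-02-24 17:12:22
```

Comparing that time with when the host's last task finishes shows how long the cores sat idle.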
Joined: 17 Nov 07 Posts: 17 Credit: 663,827 RAC: 0

Lol..... You gotta laugh! - Haven't you :) Anyone know when new WUs will be available?
Joined: 30 Aug 07 Posts: 2046 Credit: 26,480 RAC: 0

I forwarded the message on to Dr. Anderson, so hopefully he'll have a response for me soon :)
Joined: 30 Aug 07 Posts: 2046 Credit: 26,480 RAC: 0

FYI, here's the response from Dr. Anderson: "This is a long-standing design flaw." I'll try to take a look into the scheduler code to see if I can figure out a fix for this.
Joined: 21 Dec 07 Posts: 69 Credit: 7,048,412 RAC: 0

I'm no expert on the server-side options of BOINC, but a search of the BOINC site shows the following seemingly relevant config options (at http://boinc.berkeley.edu/trac/wiki/ProjectOptions):

`<max_wus_in_progress> N </max_wus_in_progress>`
`<min_sendwork_interval> N </min_sendwork_interval>`

Here's what it says about that last option: min_sendwork_interval

What we are seeing on fast hosts, particularly with short (2-credit) work units, is exactly as described: they run out of work before they are allowed to connect again... This may become even more prevalent if the new applications are faster. Perhaps if that were set to 5 or 10 minutes, things would run more smoothly.

Join the #1 Aussie Alliance on MilkyWay!
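To make the suggestion concrete, the 5- or 10-minute idea would mean a value of 300 or 600. This rests on two assumptions based on the ProjectOptions page linked above: that `min_sendwork_interval` is given in seconds, and that it sits in the `<config>` section of the project's `config.xml`. Here is a small hypothetical Python helper that renders the fragment:

```python
import xml.etree.ElementTree as ET


def sendwork_fragment(minutes: int) -> str:
    """Render a <config> fragment setting min_sendwork_interval.
    Assumption: the option takes seconds, so minutes are converted."""
    config = ET.Element("config")
    option = ET.SubElement(config, "min_sendwork_interval")
    option.text = str(minutes * 60)
    return ET.tostring(config, encoding="unicode")


for minutes in (5, 10):
    print(sendwork_fragment(minutes))
```

Writing the value in seconds avoids accidentally configuring a 5-second or 10-second interval.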
©2024 Astroinformatics Group