Message boards :
Number crunching :
new workunit queue size (6)
Author | Message |
---|---|
Send message Joined: 16 Jan 08 Posts: 18 Credit: 4,111,257 RAC: 0 |
Still no improvement here; the WU queue increase made no difference at all. I've still got computers idle and comms backed off for hours. m4rtyn |
Send message Joined: 27 Aug 07 Posts: 647 Credit: 27,592,547 RAC: 0 |
It's really hard to catch WUs because there are several "Scheduler request completed: got 0 new tasks" messages before one eventually gets work. But it's still an improvement to have the bigger WU cache of 20/core, because if you do catch WUs it now takes longer until the boxes cry for more. *LOL* Lovely greetings, Cori |
Send message Joined: 27 Aug 07 Posts: 6 Credit: 305,610,813 RAC: 0 |
Yep, not seeing a change here either. Took 30 mins of updating every min to get like 3 mins of work. Then back to nothing. Would be nice to see 20 per core sitting there. Rick |
Send message Joined: 11 Mar 08 Posts: 10 Credit: 10,647,326 RAC: 0 |
It's taking several tries to get any work, and not even getting the full 20/core either...
27/02/2009 21:46:49|Milkyway@home|Sending scheduler request: Requested by user. Requesting 3046482 seconds of work, reporting 4 completed tasks
27/02/2009 21:46:55|Milkyway@home|Scheduler request completed: got 0 new tasks
27/02/2009 21:47:00|SHA-1 Collision Search Graz|Sending scheduler request: Requested by user. Requesting 0 seconds of work, reporting 0 completed tasks
27/02/2009 21:47:05|SHA-1 Collision Search Graz|Scheduler request completed: got 0 new tasks
27/02/2009 21:47:15|Milkyway@home|Sending scheduler request: Requested by user. Requesting 3046900 seconds of work, reporting 0 completed tasks
27/02/2009 21:47:20|Milkyway@home|Scheduler request completed: got 0 new tasks
27/02/2009 21:47:30|Milkyway@home|Fetching scheduler list
27/02/2009 21:47:35|Milkyway@home|Master file download succeeded
27/02/2009 21:47:40|Milkyway@home|Sending scheduler request: Requested by user. Requesting 3047232 seconds of work, reporting 0 completed tasks
27/02/2009 21:47:45|Milkyway@home|Scheduler request completed: got 0 new tasks
27/02/2009 21:47:56|Milkyway@home|Sending scheduler request: Requested by user. Requesting 3047577 seconds of work, reporting 0 completed tasks
27/02/2009 21:48:01|Milkyway@home|Scheduler request completed: got 0 new tasks
27/02/2009 21:48:26|Milkyway@home|Sending scheduler request: Requested by user. Requesting 3047971 seconds of work, reporting 0 completed tasks
27/02/2009 21:48:31|Milkyway@home|Scheduler request completed: got 0 new tasks
27/02/2009 21:48:46|Milkyway@home|Sending scheduler request: Requested by user. Requesting 3048311 seconds of work, reporting 0 completed tasks
27/02/2009 21:48:51|Milkyway@home|Scheduler request completed: got 0 new tasks
27/02/2009 21:49:52|Milkyway@home|Sending scheduler request: To fetch work. Requesting 3049346 seconds of work, reporting 0 completed tasks
27/02/2009 21:49:57|Milkyway@home|Scheduler request completed: got 8 new tasks
27/02/2009 21:49:59|Milkyway@home|Started download of ps_s82_9_search_parameters_186427_1235771352
27/02/2009 21:49:59|Milkyway@home|Started download of ps_s86_9_search_parameters_186428_1235771352
27/02/2009 21:50:00|Milkyway@home|Finished download of ps_s82_9_search_parameters_186427_1235771352
27/02/2009 21:50:00|Milkyway@home|Finished download of ps_s86_9_search_parameters_186428_1235771352
27/02/2009 21:50:00|Milkyway@home|Started download of ps_s86_9_search_parameters_186429_1235771352
27/02/2009 21:50:00|Milkyway@home|Started download of ps_s86_9_search_parameters_186430_1235771352
27/02/2009 21:50:01|Milkyway@home|Finished download of ps_s86_9_search_parameters_186429_1235771352
27/02/2009 21:50:01|Milkyway@home|Finished download of ps_s86_9_search_parameters_186430_1235771352
27/02/2009 21:50:01|Milkyway@home|Started download of ps_s86_9_search_parameters_186431_1235771352
27/02/2009 21:50:01|Milkyway@home|Started download of ps_s82_9_search_parameters_186382_1235771351
27/02/2009 21:50:02|Milkyway@home|Finished download of ps_s86_9_search_parameters_186431_1235771352
27/02/2009 21:50:02|Milkyway@home|Finished download of ps_s82_9_search_parameters_186382_1235771351
27/02/2009 21:50:02|Milkyway@home|Started download of ps_s82_9_search_parameters_186384_1235771351
27/02/2009 21:50:02|Milkyway@home|Started download of ps_s82_9_search_parameters_186385_1235771351
27/02/2009 21:50:03|Milkyway@home|Finished download of ps_s82_9_search_parameters_186384_1235771351
27/02/2009 21:50:03|Milkyway@home|Finished download of ps_s82_9_search_parameters_186385_1235771351
27/02/2009 21:50:08|Milkyway@home|Sending scheduler request: To fetch work. Requesting 3021818 seconds of work, reporting 0 completed tasks
27/02/2009 21:50:13|Milkyway@home|Scheduler request completed: got 0 new tasks
27/02/2009 21:51:13|Milkyway@home|Sending scheduler request: To fetch work. Requesting 3022868 seconds of work, reporting 0 completed tasks
27/02/2009 21:51:18|Milkyway@home|Scheduler request completed: got 0 new tasks
27/02/2009 21:51:18|Milkyway@home|Message from server: No work sent
27/02/2009 21:51:18|Milkyway@home|Message from server: (reached per-CPU limit of 20 tasks) |
Send message Joined: 17 Jan 09 Posts: 98 Credit: 72,182,367 RAC: 0 |
In the 8 hours since I last posted I have had 4-5 small batches of work, and that's it. |
Send message Joined: 7 Jun 08 Posts: 464 Credit: 56,639,936 RAC: 0 |
Well, I haven't been following their work feeder problem that closely. However, in general BOINC terms, if you cannot transition enough work fast enough into the scheduler's limited-size queue, then this is what you end up seeing even if there are plenty of tasks coming out of the work generator(s). Hmmm... Sorry about the delay replying. I've been working on getting a newly acquired firewall appliance straightened out and configured, so I had to block the whole rpi.edu domain to guarantee I didn't miss data points for my hosts, so I couldn't even look at the website again until now. IIRC, the big problem in increasing the scheduler queue the way it's designed is that you have to allocate more physical memory to the shared segment. If you don't have any to spare, then I guess you would be pretty well boned from a quick-fix POV. ;-) Alinator |
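Alinator's shared-memory point can be put in rough numbers. This is a back-of-envelope sketch only: `SLOT_BYTES` is an assumed per-slot figure for illustration, not BOINC's actual queue layout.

```python
# Back-of-envelope for the shared-memory constraint described above: the
# feeder's work queue lives in a fixed-size shared-memory segment, so a
# bigger queue means allocating a bigger segment. SLOT_BYTES is an assumed
# figure for illustration, not BOINC's real per-slot footprint.

SLOT_BYTES = 8 * 1024  # assumed bytes of WU/result metadata per queue slot

def shmem_bytes(slots: int) -> int:
    """Approximate segment size for a feeder queue with `slots` entries."""
    return slots * SLOT_BYTES

for slots in (100, 1000, 10000):
    print(f"{slots:>6} slots -> {shmem_bytes(slots) / 1024 ** 2:.1f} MiB")
```

Under these assumptions the segment grows linearly with the queue, which is why a quick fix is hard on a RAM-starved server.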
Send message Joined: 4 Oct 08 Posts: 1734 Credit: 64,228,409 RAC: 0 |
Looks like lots more WUs ready to distribute, but getting any work seems to be almost impossible. Even with manual forced requests ... zilch ... and the BOINC Projects tab shows MW backed off up to 3 hours and more, which means more forced requests until it's back down to 1 minute. First time this has really hit me. It's letting Einstein in for crunching. |
Send message Joined: 15 Jan 09 Posts: 169 Credit: 6,734,481 RAC: 0 |
I guess as it's the weekend and Seti always goes down at the weekend as well, a few other projects are going to get some extra work done again :) |
Send message Joined: 26 Sep 08 Posts: 12 Credit: 1,228,382 RAC: 0 |
As of 1:44:14 UTC server status shows 0 results ready to send and the validator seems to have stopped. Rats, I may have to temporarily go back to SETI on these two machines. Noooooooo |
Send message Joined: 29 Aug 07 Posts: 327 Credit: 116,463,193 RAC: 0 |
Is it just me or has it been harder to get work since the cache limit was bumped up? Calm Chaos Forum...Join Calm Chaos Now |
Send message Joined: 30 Aug 07 Posts: 2046 Credit: 26,480 RAC: 0 |
The assimilator/validator was crashed for the last hour or so, which would make it hard to get work :P |
Send message Joined: 4 Jul 08 Posts: 165 Credit: 364,966 RAC: 0 |
Just got 17 new tasks after many update tries. Glenn |
Send message Joined: 29 Aug 07 Posts: 327 Credit: 116,463,193 RAC: 0 |
The assimilator/validator was crashed for the last hour or so, which would make it hard to get work :P I'm talking about all day, not just the last hour or so. Calm Chaos Forum...Join Calm Chaos Now |
Send message Joined: 30 Aug 07 Posts: 2046 Credit: 26,480 RAC: 0 |
The assimilator/validator was crashed for the last hour or so, which would make it hard to get work :P I actually think you might be right here. If the queue on the scheduler is not large enough, it takes fewer requests to empty it. I'm going to lower the number to 6 and see if that helps. |
Send message Joined: 22 Nov 07 Posts: 285 Credit: 1,076,786,368 RAC: 0 |
|
Send message Joined: 30 Aug 07 Posts: 2046 Credit: 26,480 RAC: 0 |
2/27/2009 19:45:33|Milkyway@home|Message from server: (reached per-CPU limit of 6 tasks) See the above post. I dropped the queue down to 6 per core to see if this will help more people get work. It was looking like, with the limit at 20, the scheduler was running out of work to send very quickly and more people were getting the out-of-work message. *edit* Before you flip out too much -- this is just temporary, to see if it helps with work availability. If it doesn't help I'll increase the queue again. Also, once we get a larger queue for the scheduler we'll be able to increase the work unit queue as well. |
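The trade-off behind the change can be sketched with simple arithmetic. `Q = 300` is an illustrative queue size, not a measured one.

```python
# Toy arithmetic for the per-CPU limit change: a scheduler queue holding Q
# ready tasks can fully satisfy at most Q // (limit * cores) hosts before it
# runs dry, so a lower limit spreads one feeder refill across more hosts.
# Q = 300 is an illustrative figure, not a measured one.

def hosts_served(queue_tasks: int, per_cpu_limit: int, cores: int = 1) -> int:
    """Hosts that get a full per-core cache before the queue empties."""
    return queue_tasks // (per_cpu_limit * cores)

Q = 300
for limit in (6, 12, 20):
    print(f"limit {limit:2d}: {hosts_served(Q, limit):2d} single-core hosts per refill")
```

Each host gets less work per request, but more hosts come away with something instead of "got 0 new tasks".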
Send message Joined: 4 Jul 08 Posts: 165 Credit: 364,966 RAC: 0 |
Hi Travis, Why a limit of 6/CPU? We had 12 up till last nite, and things were running not bad with a limit of 12; maybe you should put it back to that... Glenn |
Send message Joined: 22 Nov 07 Posts: 285 Credit: 1,076,786,368 RAC: 0 |
Yea, I saw that after I posted - How fast do you generate work? With the number of GPU clients, and more to come, you may have to review how the work is generated. I think there are several projects that have a separate server just for work generation.... I also remember you saying something about a budget and not having another machine. |
Send message Joined: 30 Aug 07 Posts: 2046 Credit: 26,480 RAC: 0 |
Work is generated more than fast enough; I've never seen fewer than 300 WUs available server-side today. What I think is happening, AFAIK, is that the scheduler uses a shared memory queue to store WUs which it can send out to clients, and the feeder puts WUs into this queue for the scheduler. Before the feeder can refill the scheduler's queue, the queue gets emptied by WU requests, so people are getting no work sent. I'm pretty sure we need to increase the scheduler's queue size, but we need to do some work with labstaff to get that done, so it probably won't happen until early next week (like Monday). Until then, I think the 6 WU queue should help keep enough work available for the scheduler. |
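The feeder-versus-requests race described above can be demonstrated with a toy simulation. This is not BOINC code; the refill rate, request rate, and queue size are made-up illustrative numbers, and each request draws a random task count up to the per-CPU limit.

```python
import random

# Toy simulation (not BOINC code) of the feeder/scheduler interaction: the
# feeder refills a bounded shared-memory queue at a fixed rate, while each
# client request can drain up to `per_cpu_limit` tasks at once. When total
# demand per tick outstrips the refill rate, requests start seeing
# "got 0 new tasks". All rates are made-up illustrative numbers.

def simulate(per_cpu_limit, queue_size=100, refill_per_tick=50,
             requests_per_tick=20, ticks=1000, seed=1):
    rng = random.Random(seed)
    queue = queue_size
    served = unserved = 0
    for _ in range(ticks):
        queue = min(queue_size, queue + refill_per_tick)  # feeder refills
        for _ in range(requests_per_tick):                # clients drain
            want = rng.randint(1, per_cpu_limit)          # tasks requested
            if queue >= want:
                queue -= want
                served += 1
            else:
                unserved += 1                             # "got 0 new tasks"
    return served / (served + unserved)

for limit in (6, 12, 20):
    print(f"per-CPU limit {limit:2d}: {simulate(limit):.0%} of requests served")
```

Under these assumed rates, a smaller per-request draw lets the fixed refill rate satisfy a larger share of requests, which is the effect the 6-task limit is aiming for.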
Send message Joined: 15 Jan 09 Posts: 169 Credit: 6,734,481 RAC: 0 |
I don't know how many GPUs are already in use on this project, but with more likely to be used soon I would think it a good idea to increase the size of each WU, so there would not need to be so many requests. Given that once a client is unable to receive work a couple of times the time between requests increases, there are going to be a lot more people out of work for longer periods because of such a small cache (on all my systems, 6 WUs per core will typically keep me going between approx 60 and 90 minutes, and right now the server is not managing to feed us work that fast). |
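The 60-90 minute estimate above is just cache size times per-WU runtime. The runtimes below are assumptions chosen to reproduce that range, not measured figures.

```python
# Rough arithmetic behind the "60-90 minutes" estimate: a full per-core cache
# lasts cache_size * wu_runtime minutes on one core. The per-WU runtimes are
# assumed values for illustration, not measured ones.

def cache_minutes(wus_per_core: int, wu_runtime_min: float) -> float:
    """Minutes of work a full per-core cache gives one core."""
    return wus_per_core * wu_runtime_min

for runtime_min in (10, 15):  # assumed per-WU runtimes in minutes
    print(f"6 WUs x {runtime_min} min = {cache_minutes(6, runtime_min):.0f} min of work")
```

So with roughly 10-15 minute WUs, a 6-per-core cache empties in about an hour to an hour and a half, well within a multi-hour backoff window.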
©2024 Astroinformatics Group