Message boards :
Number crunching :
new workunit queue size (6)
Author | Message |
---|---|
Send message Joined: 16 Jan 08 Posts: 18 Credit: 4,111,257 RAC: 0 |
Still no improvement here; the WU queue increase made no difference at all. I've still got computers idle and comms backed off for hours. m4rtyn |
Send message Joined: 27 Aug 07 Posts: 647 Credit: 27,592,547 RAC: 0 |
It's really hard to catch WUs because there are several "Scheduler request completed: got 0 new tasks" messages before one eventually gets work. But it's still an improvement to have the bigger WU cache of 20/core, because if you do catch WUs it now takes longer until the boxes cry for more. *LOL* Lovely greetings, Cori |
Send message Joined: 27 Aug 07 Posts: 6 Credit: 305,610,813 RAC: 0 |
Yep, not seeing a change here either. Took 30 mins of updating every min to get like 3 mins of work. Then back to nothing. Would be nice to see 20 per core sitting there. Rick |
Send message Joined: 11 Mar 08 Posts: 10 Credit: 10,647,326 RAC: 0 |
It's taking several tries to get any work, and not even getting the full 20/core either...
27/02/2009 21:46:49|Milkyway@home|Sending scheduler request: Requested by user. Requesting 3046482 seconds of work, reporting 4 completed tasks
27/02/2009 21:46:55|Milkyway@home|Scheduler request completed: got 0 new tasks
27/02/2009 21:47:00|SHA-1 Collision Search Graz|Sending scheduler request: Requested by user. Requesting 0 seconds of work, reporting 0 completed tasks
27/02/2009 21:47:05|SHA-1 Collision Search Graz|Scheduler request completed: got 0 new tasks
27/02/2009 21:47:15|Milkyway@home|Sending scheduler request: Requested by user. Requesting 3046900 seconds of work, reporting 0 completed tasks
27/02/2009 21:47:20|Milkyway@home|Scheduler request completed: got 0 new tasks
27/02/2009 21:47:30|Milkyway@home|Fetching scheduler list
27/02/2009 21:47:35|Milkyway@home|Master file download succeeded
27/02/2009 21:47:40|Milkyway@home|Sending scheduler request: Requested by user. Requesting 3047232 seconds of work, reporting 0 completed tasks
27/02/2009 21:47:45|Milkyway@home|Scheduler request completed: got 0 new tasks
27/02/2009 21:47:56|Milkyway@home|Sending scheduler request: Requested by user. Requesting 3047577 seconds of work, reporting 0 completed tasks
27/02/2009 21:48:01|Milkyway@home|Scheduler request completed: got 0 new tasks
27/02/2009 21:48:26|Milkyway@home|Sending scheduler request: Requested by user. Requesting 3047971 seconds of work, reporting 0 completed tasks
27/02/2009 21:48:31|Milkyway@home|Scheduler request completed: got 0 new tasks
27/02/2009 21:48:46|Milkyway@home|Sending scheduler request: Requested by user. Requesting 3048311 seconds of work, reporting 0 completed tasks
27/02/2009 21:48:51|Milkyway@home|Scheduler request completed: got 0 new tasks
27/02/2009 21:49:52|Milkyway@home|Sending scheduler request: To fetch work. Requesting 3049346 seconds of work, reporting 0 completed tasks
27/02/2009 21:49:57|Milkyway@home|Scheduler request completed: got 8 new tasks
27/02/2009 21:49:59|Milkyway@home|Started download of ps_s82_9_search_parameters_186427_1235771352
27/02/2009 21:49:59|Milkyway@home|Started download of ps_s86_9_search_parameters_186428_1235771352
27/02/2009 21:50:00|Milkyway@home|Finished download of ps_s82_9_search_parameters_186427_1235771352
27/02/2009 21:50:00|Milkyway@home|Finished download of ps_s86_9_search_parameters_186428_1235771352
27/02/2009 21:50:00|Milkyway@home|Started download of ps_s86_9_search_parameters_186429_1235771352
27/02/2009 21:50:00|Milkyway@home|Started download of ps_s86_9_search_parameters_186430_1235771352
27/02/2009 21:50:01|Milkyway@home|Finished download of ps_s86_9_search_parameters_186429_1235771352
27/02/2009 21:50:01|Milkyway@home|Finished download of ps_s86_9_search_parameters_186430_1235771352
27/02/2009 21:50:01|Milkyway@home|Started download of ps_s86_9_search_parameters_186431_1235771352
27/02/2009 21:50:01|Milkyway@home|Started download of ps_s82_9_search_parameters_186382_1235771351
27/02/2009 21:50:02|Milkyway@home|Finished download of ps_s86_9_search_parameters_186431_1235771352
27/02/2009 21:50:02|Milkyway@home|Finished download of ps_s82_9_search_parameters_186382_1235771351
27/02/2009 21:50:02|Milkyway@home|Started download of ps_s82_9_search_parameters_186384_1235771351
27/02/2009 21:50:02|Milkyway@home|Started download of ps_s82_9_search_parameters_186385_1235771351
27/02/2009 21:50:03|Milkyway@home|Finished download of ps_s82_9_search_parameters_186384_1235771351
27/02/2009 21:50:03|Milkyway@home|Finished download of ps_s82_9_search_parameters_186385_1235771351
27/02/2009 21:50:08|Milkyway@home|Sending scheduler request: To fetch work. Requesting 3021818 seconds of work, reporting 0 completed tasks
27/02/2009 21:50:13|Milkyway@home|Scheduler request completed: got 0 new tasks
27/02/2009 21:51:13|Milkyway@home|Sending scheduler request: To fetch work. Requesting 3022868 seconds of work, reporting 0 completed tasks
27/02/2009 21:51:18|Milkyway@home|Scheduler request completed: got 0 new tasks
27/02/2009 21:51:18|Milkyway@home|Message from server: No work sent
27/02/2009 21:51:18|Milkyway@home|Message from server: (reached per-CPU limit of 20 tasks) |
Send message Joined: 17 Jan 09 Posts: 98 Credit: 72,182,367 RAC: 0 |
In the 8 hours since I last posted I have had 4-5 small batches of work, and that's it. |
Send message Joined: 7 Jun 08 Posts: 464 Credit: 56,639,936 RAC: 0 |
Well, I haven't been following their work feeder problem that closely. However, in general BOINC terms, if you cannot transition enough work fast enough into the scheduler's limited-size queue, then this is what you end up seeing even if there are plenty of tasks coming out of the work generator(s). Hmmm... Sorry about the delay replying. I've been working on getting a newly acquired firewall appliance straightened out and configured, so I had to block the whole rpi.edu domain to guarantee I didn't miss data points for my hosts, so I couldn't even look at the website again until now. IIRC, the big problem in increasing the scheduler queue the way it's designed is that you have to allocate more physical memory to the shared segment. If you don't have any to spare, then I guess you would be pretty well boned from a quick-fix POV. ;-) Alinator |
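Alinator's shared-memory point can be put in rough numbers. This is a back-of-envelope sketch only: `SLOT_BYTES` is an assumed per-slot figure for illustration, not BOINC's actual queue layout.

```python
# Back-of-envelope for the shared-memory constraint described above: the
# feeder's work queue lives in a fixed-size shared-memory segment, so a
# bigger queue means allocating a bigger segment. SLOT_BYTES is an assumed
# figure for illustration, not BOINC's real per-slot footprint.

SLOT_BYTES = 8 * 1024  # assumed bytes of WU/result metadata per queue slot

def shmem_bytes(slots: int) -> int:
    """Approximate segment size for a feeder queue with `slots` entries."""
    return slots * SLOT_BYTES

for slots in (100, 1000, 10000):
    print(f"{slots:>6} slots -> {shmem_bytes(slots) / 1024 ** 2:.1f} MiB")
```

Under these assumptions the segment grows linearly with the queue, which is why a quick fix is hard on a RAM-starved server.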
Send message Joined: 4 Oct 08 Posts: 1734 Credit: 64,228,409 RAC: 0 |
Looks like lots more WUs ready to distribute, but getting any work seems to be almost impossible. Even with manual forced requests ... zilch ... and the BOINC Projects tab shows MW backed off up to 3 hours and more, which means more forced requests until it's back down to 1 minute. First time this has really hit me. It's letting Einstein in for crunching. |
Send message Joined: 15 Jan 09 Posts: 169 Credit: 6,734,481 RAC: 0 |
I guess as it's the weekend and Seti always goes down at the weekend as well, a few other projects are going to get some extra work done again :) |
Send message Joined: 26 Sep 08 Posts: 12 Credit: 1,228,382 RAC: 0 |
As of 1:44:14 UTC server status shows 0 results ready to send and the validator seems to have stopped. Rats, I may have to temporarily go back to SETI on these two machines. Noooooooo |
Send message Joined: 29 Aug 07 Posts: 327 Credit: 116,463,193 RAC: 0 |
Is it just me or has it been harder to get work since the cache limit was bumped up? Calm Chaos Forum...Join Calm Chaos Now |
Send message Joined: 30 Aug 07 Posts: 2046 Credit: 26,480 RAC: 0 |
The assimilator/validator was crashed for the last hour or so, which would make it hard to get work :P |
Send message Joined: 4 Jul 08 Posts: 165 Credit: 364,966 RAC: 0 |
Just got 17 new tasks after many update tries. Glenn |
Send message Joined: 29 Aug 07 Posts: 327 Credit: 116,463,193 RAC: 0 |
The assimilator/validator was crashed for the last hour or so, which would make it hard to get work :P I'm talking about all day, not just the last hour or so. Calm Chaos Forum...Join Calm Chaos Now |
Send message Joined: 30 Aug 07 Posts: 2046 Credit: 26,480 RAC: 0 |
The assimilator/validator was crashed for the last hour or so, which would make it hard to get work :P I actually think you might be right here. If the queue on the scheduler is not large enough, it takes fewer requests to empty it. I'm going to lower the number to 6 and see if that helps. |
Send message Joined: 22 Nov 07 Posts: 285 Credit: 1,076,786,368 RAC: 0 |
|
Send message Joined: 30 Aug 07 Posts: 2046 Credit: 26,480 RAC: 0 |
2/27/2009 19:45:33|Milkyway@home|Message from server: (reached per-CPU limit of 6 tasks) See the above post. I dropped the queue down to 6 per core to see if this will help more people get work. It was looking like, with the limit at 20, the scheduler was running out of work to send very quickly and more people were getting the out-of-work message. *edit* Before you flip out too much -- this is just temporary, to see if it helps with work availability. If it doesn't help I'll increase the queue again. Also, once we get a larger queue for the scheduler we'll be able to increase the work unit queue as well. |
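The trade-off behind the change can be sketched with simple arithmetic. `Q = 300` is an illustrative queue size, not a measured one.

```python
# Toy arithmetic for the per-CPU limit change: a scheduler queue holding Q
# ready tasks can fully satisfy at most Q // (limit * cores) hosts before it
# runs dry, so a lower limit spreads one feeder refill across more hosts.
# Q = 300 is an illustrative figure, not a measured one.

def hosts_served(queue_tasks: int, per_cpu_limit: int, cores: int = 1) -> int:
    """Hosts that get a full per-core cache before the queue empties."""
    return queue_tasks // (per_cpu_limit * cores)

Q = 300
for limit in (6, 12, 20):
    print(f"limit {limit:2d}: {hosts_served(Q, limit):2d} single-core hosts per refill")
```

Each host gets less work per request, but more hosts come away with something instead of "got 0 new tasks".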
Send message Joined: 4 Jul 08 Posts: 165 Credit: 364,966 RAC: 0 |
Hi Travis, Why a limit of 6/CPU? We had 12 up till last nite, and things were running not bad with a limit of 12; maybe you should put it back to that... Glenn |
Send message Joined: 22 Nov 07 Posts: 285 Credit: 1,076,786,368 RAC: 0 |
Yea, I saw that after I posted - How fast do you generate work? With the number of GPU clients, and more to come, you may have to review how the work is generated. I think there are several projects that have a separate server just for work generation.... I also remember you saying something about a budget and not having another machine. |
Send message Joined: 30 Aug 07 Posts: 2046 Credit: 26,480 RAC: 0 |
Work is generated more than fast enough; I've never seen fewer than 300 WUs available server-side today. What I think is happening, AFAIK, is that the scheduler uses a shared memory queue to store WUs which it can send out to clients, and the feeder puts WUs into this queue for the scheduler. Before the feeder can refill the scheduler's queue, the queue gets emptied by WU requests, so people are getting no work sent. I'm pretty sure we need to increase the scheduler's queue size, but we need to do some work with labstaff to get that done, so it probably won't happen until early next week (like Monday). Until then, I think the 6 WU queue should help keep enough work available for the scheduler. |
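The feeder-versus-requests race described above can be demonstrated with a toy simulation. This is not BOINC code; the refill rate, request rate, and queue size are made-up illustrative numbers, and each request draws a random task count up to the per-CPU limit.

```python
import random

# Toy simulation (not BOINC code) of the feeder/scheduler interaction: the
# feeder refills a bounded shared-memory queue at a fixed rate, while each
# client request can drain up to `per_cpu_limit` tasks at once. When total
# demand per tick outstrips the refill rate, requests start seeing
# "got 0 new tasks". All rates are made-up illustrative numbers.

def simulate(per_cpu_limit, queue_size=100, refill_per_tick=50,
             requests_per_tick=20, ticks=1000, seed=1):
    rng = random.Random(seed)
    queue = queue_size
    served = unserved = 0
    for _ in range(ticks):
        queue = min(queue_size, queue + refill_per_tick)  # feeder refills
        for _ in range(requests_per_tick):                # clients drain
            want = rng.randint(1, per_cpu_limit)          # tasks requested
            if queue >= want:
                queue -= want
                served += 1
            else:
                unserved += 1                             # "got 0 new tasks"
    return served / (served + unserved)

for limit in (6, 12, 20):
    print(f"per-CPU limit {limit:2d}: {simulate(limit):.0%} of requests served")
```

Under these assumed rates, a smaller per-request draw lets the fixed refill rate satisfy a larger share of requests, which is the effect the 6-task limit is aiming for.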
Send message Joined: 15 Jan 09 Posts: 169 Credit: 6,734,481 RAC: 0 |
I don't know how many GPUs are already in use on this project, but with more likely to be used soon I would think it a good idea to increase the size of each WU, so there would not need to be so many requests. Given that once a client is unable to receive work a couple of times the time between requests increases, there are going to be a lot more people out of work for longer periods because of such a small cache (on all my systems, 6 WUs per core will typically keep me going between approx 60 and 90 minutes, and right now the server is not managing to feed us work that fast). |
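The 60-90 minute estimate above is just cache size times per-WU runtime. The runtimes below are assumptions chosen to reproduce that range, not measured figures.

```python
# Rough arithmetic behind the "60-90 minutes" estimate: a full per-core cache
# lasts cache_size * wu_runtime minutes on one core. The per-WU runtimes are
# assumed values for illustration, not measured ones.

def cache_minutes(wus_per_core: int, wu_runtime_min: float) -> float:
    """Minutes of work a full per-core cache gives one core."""
    return wus_per_core * wu_runtime_min

for runtime_min in (10, 15):  # assumed per-WU runtimes in minutes
    print(f"6 WUs x {runtime_min} min = {cache_minutes(6, runtime_min):.0f} min of work")
```

So with roughly 10-15 minute WUs, a 6-per-core cache empties in about an hour to an hour and a half, well within a multi-hour backoff window.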
©2024 Astroinformatics Group