Welcome to MilkyWay@home

new workunit queue size (6)

m4rtyn
Joined: 16 Jan 08
Posts: 18
Credit: 4,111,257
RAC: 0
Message 13094 - Posted: 27 Feb 2009, 19:13:46 UTC

Still no improvement here; the WU queue increase made no difference at all. I've still got computers idle and comms backed off for hours.
m4rtyn

Cori
Joined: 27 Aug 07
Posts: 647
Credit: 27,592,547
RAC: 0
Message 13096 - Posted: 27 Feb 2009, 19:20:05 UTC

It's really hard to catch WUs because there are several "Scheduler request completed: got 0 new tasks" messages before one eventually gets work.

But the bigger WU cache of 20 per core is still an improvement: when you do catch WUs, it now takes longer until the boxes cry for more. *LOL*
Lovely greetings, Cori
Buga1
Joined: 27 Aug 07
Posts: 6
Credit: 305,610,813
RAC: 0
Message 13106 - Posted: 27 Feb 2009, 21:53:33 UTC

Yep, not seeing a change here either. It took 30 minutes of updating every minute to get about 3 minutes of work, then back to nothing.

It would be nice to see 20 per core sitting there.

Rick
darkstarz1
Joined: 11 Mar 08
Posts: 10
Credit: 10,647,326
RAC: 0
Message 13107 - Posted: 27 Feb 2009, 22:00:57 UTC - in response to Message 13096.  

It's taking several tries to get any work, and I'm not even getting the full 20 per core either...

27/02/2009 21:46:49|Milkyway@home|Sending scheduler request: Requested by user. Requesting 3046482 seconds of work, reporting 4 completed tasks
27/02/2009 21:46:55|Milkyway@home|Scheduler request completed: got 0 new tasks
27/02/2009 21:47:00|SHA-1 Collision Search Graz|Sending scheduler request: Requested by user. Requesting 0 seconds of work, reporting 0 completed tasks
27/02/2009 21:47:05|SHA-1 Collision Search Graz|Scheduler request completed: got 0 new tasks
27/02/2009 21:47:15|Milkyway@home|Sending scheduler request: Requested by user. Requesting 3046900 seconds of work, reporting 0 completed tasks
27/02/2009 21:47:20|Milkyway@home|Scheduler request completed: got 0 new tasks
27/02/2009 21:47:30|Milkyway@home|Fetching scheduler list
27/02/2009 21:47:35|Milkyway@home|Master file download succeeded
27/02/2009 21:47:40|Milkyway@home|Sending scheduler request: Requested by user. Requesting 3047232 seconds of work, reporting 0 completed tasks
27/02/2009 21:47:45|Milkyway@home|Scheduler request completed: got 0 new tasks
27/02/2009 21:47:56|Milkyway@home|Sending scheduler request: Requested by user. Requesting 3047577 seconds of work, reporting 0 completed tasks
27/02/2009 21:48:01|Milkyway@home|Scheduler request completed: got 0 new tasks
27/02/2009 21:48:26|Milkyway@home|Sending scheduler request: Requested by user. Requesting 3047971 seconds of work, reporting 0 completed tasks
27/02/2009 21:48:31|Milkyway@home|Scheduler request completed: got 0 new tasks
27/02/2009 21:48:46|Milkyway@home|Sending scheduler request: Requested by user. Requesting 3048311 seconds of work, reporting 0 completed tasks
27/02/2009 21:48:51|Milkyway@home|Scheduler request completed: got 0 new tasks
27/02/2009 21:49:52|Milkyway@home|Sending scheduler request: To fetch work. Requesting 3049346 seconds of work, reporting 0 completed tasks
27/02/2009 21:49:57|Milkyway@home|Scheduler request completed: got 8 new tasks
27/02/2009 21:49:59|Milkyway@home|Started download of ps_s82_9_search_parameters_186427_1235771352
27/02/2009 21:49:59|Milkyway@home|Started download of ps_s86_9_search_parameters_186428_1235771352
27/02/2009 21:50:00|Milkyway@home|Finished download of ps_s82_9_search_parameters_186427_1235771352
27/02/2009 21:50:00|Milkyway@home|Finished download of ps_s86_9_search_parameters_186428_1235771352
27/02/2009 21:50:00|Milkyway@home|Started download of ps_s86_9_search_parameters_186429_1235771352
27/02/2009 21:50:00|Milkyway@home|Started download of ps_s86_9_search_parameters_186430_1235771352
27/02/2009 21:50:01|Milkyway@home|Finished download of ps_s86_9_search_parameters_186429_1235771352
27/02/2009 21:50:01|Milkyway@home|Finished download of ps_s86_9_search_parameters_186430_1235771352
27/02/2009 21:50:01|Milkyway@home|Started download of ps_s86_9_search_parameters_186431_1235771352
27/02/2009 21:50:01|Milkyway@home|Started download of ps_s82_9_search_parameters_186382_1235771351
27/02/2009 21:50:02|Milkyway@home|Finished download of ps_s86_9_search_parameters_186431_1235771352
27/02/2009 21:50:02|Milkyway@home|Finished download of ps_s82_9_search_parameters_186382_1235771351
27/02/2009 21:50:02|Milkyway@home|Started download of ps_s82_9_search_parameters_186384_1235771351
27/02/2009 21:50:02|Milkyway@home|Started download of ps_s82_9_search_parameters_186385_1235771351
27/02/2009 21:50:03|Milkyway@home|Finished download of ps_s82_9_search_parameters_186384_1235771351
27/02/2009 21:50:03|Milkyway@home|Finished download of ps_s82_9_search_parameters_186385_1235771351
27/02/2009 21:50:08|Milkyway@home|Sending scheduler request: To fetch work. Requesting 3021818 seconds of work, reporting 0 completed tasks
27/02/2009 21:50:13|Milkyway@home|Scheduler request completed: got 0 new tasks
27/02/2009 21:51:13|Milkyway@home|Sending scheduler request: To fetch work. Requesting 3022868 seconds of work, reporting 0 completed tasks
27/02/2009 21:51:18|Milkyway@home|Scheduler request completed: got 0 new tasks
27/02/2009 21:51:18|Milkyway@home|Message from server: No work sent
27/02/2009 21:51:18|Milkyway@home|Message from server: (reached per-CPU limit of 20 tasks)
Neal Chantrill
Joined: 17 Jan 09
Posts: 98
Credit: 72,182,367
RAC: 0
Message 13183 - Posted: 28 Feb 2009, 0:43:53 UTC

In the 8 hours since I last posted I have had 4-5 small batches of work, and that's it.
Alinator
Joined: 7 Jun 08
Posts: 464
Credit: 56,639,936
RAC: 0
Message 13186 - Posted: 28 Feb 2009, 1:21:31 UTC - in response to Message 13075.  

Well, I haven't been following their work feeder problem that closely. However, in general BOINC terms, if you cannot transition enough work fast enough into the scheduler's limited-size queue, then this is what you end up seeing even if there are plenty of tasks coming out of the work generator(s).

Alinator


Yeah, I'm thinking this might be the problem. Now I just have to figure out how to increase the scheduler's queue size.


Hmmm...

Sorry about the delay replying. I've been working on getting a newly acquired firewall appliance straightened out and configured; I had to block the whole rpi.edu domain to guarantee I didn't miss data points for my hosts, so I couldn't even look at the website until now.

IIRC, the big problem with increasing the scheduler queue as it's designed is that you have to allocate more physical memory to the shared segment. If you don't have any to spare, then I guess you'd be pretty well boned from a quick-fix POV. ;-)
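Roughly, the trade-off looks like this. A back-of-envelope sketch in Python, where the per-slot size and candidate queue sizes are invented placeholders rather than BOINC's real figures; the only real number is the kernel's per-segment limit read from /proc/sys/kernel/shmmax:

# Back-of-envelope sketch only: made-up per-slot size and slot counts,
# compared against the Linux per-segment shared-memory limit (shmmax).
SLOT_BYTES = 64 * 1024               # assumed size of one shared-memory work item
CANDIDATE_SLOTS = (100, 500, 1000)   # hypothetical feeder queue sizes

def segment_bytes(slots):
    """Total shared-memory segment size for a given number of slots."""
    return slots * SLOT_BYTES

with open("/proc/sys/kernel/shmmax") as f:   # Linux per-segment limit
    shmmax = int(f.read())

for slots in CANDIDATE_SLOTS:
    need = segment_bytes(slots)
    verdict = "fits within" if need <= shmmax else "exceeds"
    print(f"{slots:5d} slots -> {need / 2**20:6.1f} MiB ({verdict} kernel shmmax)")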

Alinator
John Clark
Joined: 4 Oct 08
Posts: 1734
Credit: 64,228,409
RAC: 0
Message 13187 - Posted: 28 Feb 2009, 1:26:14 UTC

Looks like lots more WUs ready to distribute, but getting any work seems to be almost impossible.

Even with manually forced requests ... zilch ... and the BOINC Projects tab shows MW backed off for up to 3 hours and more, which means more forced requests until it's back down to 1 minute.

First time this has really hit me. It's letting Einstein in for crunching.
Debs
Joined: 15 Jan 09
Posts: 169
Credit: 6,734,481
RAC: 0
Message 13188 - Posted: 28 Feb 2009, 1:41:15 UTC

I guess, since it's the weekend and SETI always goes down at the weekend as well, a few other projects are going to get some extra work done again :)
gomeyer
Joined: 26 Sep 08
Posts: 12
Credit: 1,228,382
RAC: 0
Message 13192 - Posted: 28 Feb 2009, 1:48:29 UTC

As of 1:44:14 UTC the server status page shows 0 results ready to send, and the validator seems to have stopped. Rats, I may have to temporarily go back to SETI on these two machines.

Noooooooo
Labbie
Joined: 29 Aug 07
Posts: 327
Credit: 116,463,193
RAC: 0
Message 13195 - Posted: 28 Feb 2009, 2:00:46 UTC

Is it just me or has it been harder to get work since the cache limit was bumped up?


Calm Chaos Forum...Join Calm Chaos Now
Travis
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 30 Aug 07
Posts: 2046
Credit: 26,480
RAC: 0
Message 13197 - Posted: 28 Feb 2009, 2:12:07 UTC - in response to Message 13195.  

The assimilator/validator had crashed for the last hour or so, which would make it hard to get work :P
Glenn Rogers
Joined: 4 Jul 08
Posts: 165
Credit: 364,966
RAC: 0
Message 13198 - Posted: 28 Feb 2009, 2:15:27 UTC

Just got 17 new tasks after many update tries
Glenn
Labbie
Joined: 29 Aug 07
Posts: 327
Credit: 116,463,193
RAC: 0
Message 13201 - Posted: 28 Feb 2009, 2:31:57 UTC - in response to Message 13197.  

The assimilator/validator had crashed for the last hour or so, which would make it hard to get work :P


I'm talking about all day, not just the last hour or so.


Calm Chaos Forum...Join Calm Chaos Now
Travis
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 30 Aug 07
Posts: 2046
Credit: 26,480
RAC: 0
Message 13202 - Posted: 28 Feb 2009, 2:39:54 UTC - in response to Message 13201.  

The assimilator/validator had crashed for the last hour or so, which would make it hard to get work :P


I'm talking about all day, not just the last hour or so.


I actually think you might be right here. If the queue on the scheduler is not large enough, it takes fewer requests to clean out its queue.

I'm going to lower the number to 6 and see if that helps.
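
On a stock BOINC server this per-core cap is, as far as I know, the max_wus_in_progress option in the project's config.xml (a patched scheduler could use a different knob entirely). A hypothetical sketch of changing it from a script, with a made-up project path:

# Hypothetical sketch: change the per-CPU task cap in a BOINC project's
# config.xml. The path is invented, and max_wus_in_progress is assumed to be
# the relevant option; this project's scheduler may differ.
import xml.etree.ElementTree as ET

CONFIG_PATH = "/home/boincadm/projects/milkyway/config.xml"  # made-up path
NEW_LIMIT = 6                                                # tasks per core

tree = ET.parse(CONFIG_PATH)
config = tree.getroot().find("config")        # options live under <boinc><config>
limit = config.find("max_wus_in_progress")
if limit is None:
    limit = ET.SubElement(config, "max_wus_in_progress")
limit.text = str(NEW_LIMIT)
tree.write(CONFIG_PATH)
print(f"max_wus_in_progress set to {NEW_LIMIT}")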
Kevint
Joined: 22 Nov 07
Posts: 285
Credit: 1,076,786,368
RAC: 0
Message 13203 - Posted: 28 Feb 2009, 2:46:13 UTC
Last modified: 28 Feb 2009, 2:47:22 UTC

Ahh, never mind - just saw the message about limiting it to 6... Ok.

I hope you have a nice big network pipe - lots of requests coming your way :)
.
Travis
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 30 Aug 07
Posts: 2046
Credit: 26,480
RAC: 0
Message 13205 - Posted: 28 Feb 2009, 2:48:01 UTC - in response to Message 13203.  
Last modified: 28 Feb 2009, 2:49:06 UTC

2/27/2009 19:45:33|Milkyway@home|Message from server: (reached per-CPU limit of 6 tasks)


???


See the above post. I dropped the queue down to 6 per core to see if this will help more people get work. It was looking like with the limit at 20, the scheduler was running out of work to send very quickly and more people were getting the out of work message.

*edit*

before you flip out too much -- this is just temporary to see if this helps with work availability. if it doesn't help i'll increase the queue again. also, once we get a larger queue for the scheduler we'll be able to increase the work unit queue as well.
Glenn Rogers
Joined: 4 Jul 08
Posts: 165
Credit: 364,966
RAC: 0
Message 13206 - Posted: 28 Feb 2009, 2:48:56 UTC - in response to Message 13202.  

Hi Travis,
Why a limit of 6 per CPU? We had 12 up till last night and things were running not bad with a limit of 12; maybe you should put it back to that...
Glenn
Kevint
Joined: 22 Nov 07
Posts: 285
Credit: 1,076,786,368
RAC: 0
Message 13208 - Posted: 28 Feb 2009, 2:59:27 UTC - in response to Message 13205.  
Last modified: 28 Feb 2009, 3:00:07 UTC


See the above post. I dropped the queue down to 6 per core to see if this will help more people get work. It was looking like with the limit at 20, the scheduler was running out of work to send very quickly and more people were getting the out of work message.

*edit*

before you flip out too much -- this is just temporary to see if this helps with work availability. if it doesn't help i'll increase the queue again. also, once we get a larger queue for the scheduler we'll be able to increase the work unit queue as well.



Yea, I saw that after I posted -

How fast do you generate work? With the number of GPU clients, and more to come, you may have to review how the work is generated. I think there are several projects that have a separate server just for work generation.... I also remember you saying something about a budget and not having another machine.
.
Travis
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 30 Aug 07
Posts: 2046
Credit: 26,480
RAC: 0
Message 13209 - Posted: 28 Feb 2009, 3:02:48 UTC - in response to Message 13208.  


See the above post. I dropped the queue down to 6 per core to see if this will help more people get work. It was looking like with the limit at 20, the scheduler was running out of work to send very quickly and more people were getting the out of work message.

*edit*

before you flip out too much -- this is just temporary to see if this helps with work availability. if it doesn't help i'll increase the queue again. also, once we get a larger queue for the scheduler we'll be able to increase the work unit queue as well.



Yea, I saw that after I posted -

How fast do you generate work? With the number of GPU clients, and more to come, you may have to review how the work is generated. I think there are several projects that have a separate server just for work generation....



Work is generated more than fast enough; I've never seen fewer than 300 WUs available server-side today.

What I think the problem is, AFAIK, is that the scheduler uses a shared-memory queue to store the WUs it can send out to clients; the feeder puts WUs into this queue for the scheduler.

What's happening is that before the feeder can refill the scheduler's queue, the queue gets emptied by WU requests, so people are getting no work sent.

I'm pretty sure we need to increase the scheduler's queue size, but we need to do some work with lab staff to get that done, so it probably won't happen until early next week (like Monday).

So until then, I think the 6-WU queue should help keep enough work available for the scheduler.
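
A toy simulation makes the dynamic easy to picture. None of the numbers below come from the project; they are placeholders chosen only to show how a fixed pool of shared-memory slots, refilled every few seconds, gets drained between feeder passes and produces the "got 0 new tasks" replies:

# Toy model only (not BOINC code, all numbers invented): the feeder refills a
# fixed array of shared-memory slots every few seconds, while scheduler
# requests drain it between refills.
import random

SHMEM_SLOTS = 100          # assumed size of the scheduler's shared-memory array
FEEDER_PERIOD_S = 5        # assumed seconds between feeder passes
REQUESTS_PER_S = 40        # assumed incoming scheduler requests per second
MAX_TASKS_PER_REQUEST = 6  # the per-core cap currently in effect

def starved_requests(seconds=60):
    random.seed(0)
    queue = SHMEM_SLOTS
    starved = 0
    for t in range(seconds):
        if t % FEEDER_PERIOD_S == 0:
            queue = SHMEM_SLOTS                    # feeder tops the slots up
        for _ in range(REQUESTS_PER_S):
            want = random.randint(1, MAX_TASKS_PER_REQUEST)
            if queue >= want:
                queue -= want
            else:
                starved += 1                       # client sees "got 0 new tasks"
    return starved

print("starved requests in one simulated minute:", starved_requests())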
Debs
Joined: 15 Jan 09
Posts: 169
Credit: 6,734,481
RAC: 0
Message 13210 - Posted: 28 Feb 2009, 3:06:38 UTC

I don't know how many GPUs are already in use on this project, but with more likely to be used soon, I would think it a good idea to increase the size of each WU so there would not need to be so many requests.

Given that the time between requests increases once a client fails to receive work a couple of times, there are going to be a lot more people out of work for longer periods because of such a small amount of work (on all my systems, 6 WUs per core will typically keep me going for roughly 60 to 90 minutes, and you have not shown you can feed us work that fast).
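
The arithmetic behind that 60-90 minute estimate is just the per-core cap times a typical task runtime; a trivial sketch with assumed runtimes:

# Assumed runtimes, purely illustrative of the 60-90 minute estimate above.
TASKS_PER_CORE = 6
MINUTES_PER_TASK_LOW, MINUTES_PER_TASK_HIGH = 10, 15

low = TASKS_PER_CORE * MINUTES_PER_TASK_LOW
high = TASKS_PER_CORE * MINUTES_PER_TASK_HIGH
print(f"a full per-core cache lasts roughly {low}-{high} minutes")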