new workunit queue size (6)

Author Message
Profile Travis
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Send message
Joined: 30 Aug 07
Posts: 1976
Credit: 26,480
RAC: 0
Message 13057 - Posted: 27 Feb 2009 | 16:28:43 UTC

Let me know if the increased queue size helps at all with the work availability.
____________

Profile GalaxyIce
Avatar
Send message
Joined: 6 Apr 08
Posts: 2018
Credit: 100,142,856
RAC: 0
Message 13058 - Posted: 27 Feb 2009 | 16:33:16 UTC - in response to Message 13057.
Last modified: 27 Feb 2009 | 16:35:23 UTC

Let me know if the increased queue size helps at all with the work availability.

Looking forward to it Travis, but right now I have this;

27/02/2009 16:31:35|Milkyway@home|Message from server: Project is temporarily shut down for maintenance

No work availability just now

[edit] I just got a task. Whooohoooo! :)
____________

Profile Travis
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Send message
Joined: 30 Aug 07
Posts: 1976
Credit: 26,480
RAC: 0
Message 13059 - Posted: 27 Feb 2009 | 16:34:12 UTC - in response to Message 13058.

Let me know if the increased queue size helps at all with the work availability.

Looking forward to it Travis, but right now I have this;

27/02/2009 16:31:35|Milkyway@home|Message from server: Project is temporarily shut down for maintenance

No work availability just now


Yeah, I just restarted the server :P Work should start flowing now.
____________

Profile [XTBA>XTC] ZeuZ
Send message
Joined: 27 Dec 07
Posts: 14
Credit: 5,089,974
RAC: 0
Message 13060 - Posted: 27 Feb 2009 | 16:36:08 UTC - in response to Message 13057.

Let me know if the increased queue size helps at all with the work availability.


Not really

27/02/2009 17:34:31 Milkyway@home Sending scheduler request: Requested by user.
27/02/2009 17:34:31 Milkyway@home Reporting 1 completed tasks, requesting new tasks
27/02/2009 17:34:36 Milkyway@home Scheduler request completed: got 0 new tasks
27/02/2009 17:35:07 Milkyway@home Sending scheduler request: Requested by user.
27/02/2009 17:35:07 Milkyway@home Requesting new tasks
27/02/2009 17:35:12 Milkyway@home Scheduler request completed: got 0 new tasks


:(

Profile Neal Chantrill
Avatar
Send message
Joined: 17 Jan 09
Posts: 96
Credit: 69,437,271
RAC: 9,249
Message 13061 - Posted: 27 Feb 2009 | 16:37:24 UTC

Still nothing here.

Profile GalaxyIce
Avatar
Send message
Joined: 6 Apr 08
Posts: 2018
Credit: 100,142,856
RAC: 0
Message 13062 - Posted: 27 Feb 2009 | 16:37:37 UTC - in response to Message 13059.
Last modified: 27 Feb 2009 | 16:45:35 UTC

Let me know if the increased queue size helps at all with the work availability.

Looking forward to it Travis, but right now I have this;

27/02/2009 16:31:35|Milkyway@home|Message from server: Project is temporarily shut down for maintenance

No work availability just now


Yeah, I just restarted the server :P Work should start flowing now.

Yes, I just got a task, but no more;

27/02/2009 16:36:26|Milkyway@home|Sending scheduler request: To fetch work. Requesting 1745280 seconds of work, reporting 0 completed tasks
27/02/2009 16:36:32|Milkyway@home|Scheduler request succeeded: got 0 new tasks

____________

Alinator
Send message
Joined: 7 Jun 08
Posts: 393
Credit: 20,843,949
RAC: 65,532
Message 13064 - Posted: 27 Feb 2009 | 16:42:27 UTC
Last modified: 27 Feb 2009 | 16:46:51 UTC

Hmmm...

Insta-Purge seems to be back, too. :-(

UGGGHHH! PITA time again for data logging.

<edit> It seems to me that if you cannot even accommodate a 1 hour delay in purging the completed tasks, then you have a serious overbooking and/or backend capacity problem that allowing more work out in the field is not going to help one iota.

Alinator

Profile GalaxyIce
Avatar
Send message
Joined: 6 Apr 08
Posts: 2018
Credit: 100,142,856
RAC: 0
Message 13065 - Posted: 27 Feb 2009 | 16:44:02 UTC

Nothing at all since that one task;


27/02/2009 16:43:14|Milkyway@home|Scheduler request succeeded: got 0 new tasks


____________

JAMC
Send message
Joined: 9 Sep 08
Posts: 96
Credit: 336,443,946
RAC: 0
Message 13067 - Posted: 27 Feb 2009 | 16:46:22 UTC

> got 49 WU's on one machine, 50 WU's on another but these are the exception... mostly '0 new tasks' after repeated manual primes...

Profile Travis
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Send message
Joined: 30 Aug 07
Posts: 1976
Credit: 26,480
RAC: 0
Message 13068 - Posted: 27 Feb 2009 | 16:49:50 UTC - in response to Message 13067.

It's gonna take the server awhile to catch up I think. Give it a little time :)
____________

Alinator
Send message
Joined: 7 Jun 08
Posts: 393
Credit: 20,843,949
RAC: 65,532
Message 13069 - Posted: 27 Feb 2009 | 16:50:30 UTC

Well, I haven't been following their work feeder problem that closely. However, in general BOINC terms, if you cannot transition work into the scheduler's limited-size queue fast enough, then this is what you end up seeing, even if there are plenty of tasks coming out of the work generator(s).

Alinator

David @ TPS
Send message
Joined: 20 Nov 08
Posts: 24
Credit: 2,561,361
RAC: 0
Message 13071 - Posted: 27 Feb 2009 | 16:53:58 UTC

Just got 80 for the Quad.
____________

Profile GalaxyIce
Avatar
Send message
Joined: 6 Apr 08
Posts: 2018
Credit: 100,142,856
RAC: 0
Message 13072 - Posted: 27 Feb 2009 | 16:56:32 UTC


OK, I just got a bunch through. It was quite a shock to see so many :P

____________

JAMC
Send message
Joined: 9 Sep 08
Posts: 96
Credit: 336,443,946
RAC: 0
Message 13073 - Posted: 27 Feb 2009 | 17:00:01 UTC - in response to Message 13072.


OK, I just got a bunch through. It was quite a shock to see so many :P


Change can be good :)

gomeyer
Avatar
Send message
Joined: 26 Sep 08
Posts: 12
Credit: 191,244
RAC: 0
Message 13074 - Posted: 27 Feb 2009 | 17:02:25 UTC

Yup, the new limit came down right away on two machines with one manual update. Looks good so far.

Profile Travis
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Send message
Joined: 30 Aug 07
Posts: 1976
Credit: 26,480
RAC: 0
Message 13075 - Posted: 27 Feb 2009 | 17:03:29 UTC - in response to Message 13069.

Well, I haven't been following their work feeder problem that closely. However, in general BOINC terms, if you cannot transition work into the scheduler's limited-size queue fast enough, then this is what you end up seeing, even if there are plenty of tasks coming out of the work generator(s).

Alinator


Yeah, I'm thinking this might be the problem. Now I just have to figure out how to increase the scheduler's queue size.
____________

JAMC
Send message
Joined: 9 Sep 08
Posts: 96
Credit: 336,443,946
RAC: 0
Message 13079 - Posted: 27 Feb 2009 | 17:45:32 UTC

I'm not seeing a reduction in the amount of '0 new tasks' and am not hitting the WU download limit.

Profile Paul D. Buck
Send message
Joined: 12 Apr 08
Posts: 621
Credit: 161,934,067
RAC: 0
Message 13081 - Posted: 27 Feb 2009 | 18:10:52 UTC

Sitting idle here ... sigh ... when I checked the server page it said 500 available ... now 437 ... refetch master file, still nothing ...

ARRRRGGGGGHHHHH!

idle ATI GPU ... man this is depressing ...

Riil
Send message
Joined: 10 Feb 09
Posts: 13
Credit: 713,485
RAC: 0
Message 13088 - Posted: 27 Feb 2009 | 18:22:06 UTC - in response to Message 13079.

Nearly 80 new WUs here as well.

BarryAZ
Send message
Joined: 1 Sep 08
Posts: 512
Credit: 223,261,844
RAC: 166,583
Message 13089 - Posted: 27 Feb 2009 | 18:41:46 UTC - in response to Message 13081.

I'm wondering if one could code a script which would request an update automatically at, say, 15-second intervals. That would eventually get work; of course, hammering the server might not have the right effect if such a script got out in the wild....
____________
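
A hedged sketch of the kind of script described above, in Python. It assumes the `boinccmd` command-line tool is installed and that a BOINC client is running locally; `PROJECT_URL` is a placeholder, not the real project address. Instead of a fixed 15-second interval it doubles the wait after every attempt, which is one way to limit the server hammering the post worries about. It does not check whether any work actually arrived, so it simply stops after a fixed number of attempts.

```python
import subprocess
import time

# Placeholder (hypothetical): substitute the URL of the attached project.
PROJECT_URL = "http://example.invalid/project/"


def backoff_delays(first=15.0, factor=2.0, cap=600.0, attempts=6):
    """Return the wait in seconds after each attempt, doubling up to a cap."""
    delays = []
    delay = first
    for _ in range(attempts):
        delays.append(min(delay, cap))
        delay *= factor
    return delays


def request_update(url=PROJECT_URL):
    """Ask the local BOINC client to contact the project scheduler."""
    # Assumes `boinccmd` is on PATH and the client accepts local commands.
    return subprocess.run(["boinccmd", "--project", url, "update"]).returncode


def poll(update=request_update, sleep=time.sleep, delays=None):
    """Issue one update per delay; `update` and `sleep` can be swapped for testing."""
    codes = []
    for delay in (backoff_delays() if delays is None else delays):
        codes.append(update())
        sleep(delay)
    return codes
```

Calling `poll()` with the defaults would issue six update requests spread over roughly a quarter of an hour, rather than sixty or so at a fixed 15-second interval.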

Profile m4rtyn
Avatar
Send message
Joined: 16 Jan 08
Posts: 18
Credit: 4,111,257
RAC: 0
Message 13094 - Posted: 27 Feb 2009 | 19:13:46 UTC

Still no improvement here; the WU limit increase made no difference at all. I've still got computers idle and comms backed off for hours.
____________
m4rtyn
******************************* *******************************

Profile Cori
Avatar
Send message
Joined: 27 Aug 07
Posts: 647
Credit: 27,592,547
RAC: 0
Message 13096 - Posted: 27 Feb 2009 | 19:20:05 UTC

It's really hard to catch WUs because there are several "Scheduler request completed: got 0 new tasks" messages before one eventually gets work.

But it's still an improvement to have the bigger WU cache of 20/core because if you catch WUs it now takes longer until the boxes cry for more. *LOL*
____________
Lovely greetings, Cori

Profile Buga1
Send message
Joined: 27 Aug 07
Posts: 6
Credit: 305,046,710
RAC: 6,138
Message 13106 - Posted: 27 Feb 2009 | 21:53:33 UTC

Yep, not seeing a change here either. Took 30 mins of updating every min to get like 3 mins of work. Then back to nothing.

Would be nice to see 20 per core sitting there.

Rick

Profile darkstarz1
Send message
Joined: 11 Mar 08
Posts: 10
Credit: 10,647,326
RAC: 0
Message 13107 - Posted: 27 Feb 2009 | 22:00:57 UTC - in response to Message 13096.

It's taking several tries to get any work, and not even getting the full 20/core either...

27/02/2009 21:46:49|Milkyway@home|Sending scheduler request: Requested by user. Requesting 3046482 seconds of work, reporting 4 completed tasks
27/02/2009 21:46:55|Milkyway@home|Scheduler request completed: got 0 new tasks
27/02/2009 21:47:00|SHA-1 Collision Search Graz|Sending scheduler request: Requested by user. Requesting 0 seconds of work, reporting 0 completed tasks
27/02/2009 21:47:05|SHA-1 Collision Search Graz|Scheduler request completed: got 0 new tasks
27/02/2009 21:47:15|Milkyway@home|Sending scheduler request: Requested by user. Requesting 3046900 seconds of work, reporting 0 completed tasks
27/02/2009 21:47:20|Milkyway@home|Scheduler request completed: got 0 new tasks
27/02/2009 21:47:30|Milkyway@home|Fetching scheduler list
27/02/2009 21:47:35|Milkyway@home|Master file download succeeded
27/02/2009 21:47:40|Milkyway@home|Sending scheduler request: Requested by user. Requesting 3047232 seconds of work, reporting 0 completed tasks
27/02/2009 21:47:45|Milkyway@home|Scheduler request completed: got 0 new tasks
27/02/2009 21:47:56|Milkyway@home|Sending scheduler request: Requested by user. Requesting 3047577 seconds of work, reporting 0 completed tasks
27/02/2009 21:48:01|Milkyway@home|Scheduler request completed: got 0 new tasks
27/02/2009 21:48:26|Milkyway@home|Sending scheduler request: Requested by user. Requesting 3047971 seconds of work, reporting 0 completed tasks
27/02/2009 21:48:31|Milkyway@home|Scheduler request completed: got 0 new tasks
27/02/2009 21:48:46|Milkyway@home|Sending scheduler request: Requested by user. Requesting 3048311 seconds of work, reporting 0 completed tasks
27/02/2009 21:48:51|Milkyway@home|Scheduler request completed: got 0 new tasks
27/02/2009 21:49:52|Milkyway@home|Sending scheduler request: To fetch work. Requesting 3049346 seconds of work, reporting 0 completed tasks
27/02/2009 21:49:57|Milkyway@home|Scheduler request completed: got 8 new tasks
27/02/2009 21:49:59|Milkyway@home|Started download of ps_s82_9_search_parameters_186427_1235771352
27/02/2009 21:49:59|Milkyway@home|Started download of ps_s86_9_search_parameters_186428_1235771352
27/02/2009 21:50:00|Milkyway@home|Finished download of ps_s82_9_search_parameters_186427_1235771352
27/02/2009 21:50:00|Milkyway@home|Finished download of ps_s86_9_search_parameters_186428_1235771352
27/02/2009 21:50:00|Milkyway@home|Started download of ps_s86_9_search_parameters_186429_1235771352
27/02/2009 21:50:00|Milkyway@home|Started download of ps_s86_9_search_parameters_186430_1235771352
27/02/2009 21:50:01|Milkyway@home|Finished download of ps_s86_9_search_parameters_186429_1235771352
27/02/2009 21:50:01|Milkyway@home|Finished download of ps_s86_9_search_parameters_186430_1235771352
27/02/2009 21:50:01|Milkyway@home|Started download of ps_s86_9_search_parameters_186431_1235771352
27/02/2009 21:50:01|Milkyway@home|Started download of ps_s82_9_search_parameters_186382_1235771351
27/02/2009 21:50:02|Milkyway@home|Finished download of ps_s86_9_search_parameters_186431_1235771352
27/02/2009 21:50:02|Milkyway@home|Finished download of ps_s82_9_search_parameters_186382_1235771351
27/02/2009 21:50:02|Milkyway@home|Started download of ps_s82_9_search_parameters_186384_1235771351
27/02/2009 21:50:02|Milkyway@home|Started download of ps_s82_9_search_parameters_186385_1235771351
27/02/2009 21:50:03|Milkyway@home|Finished download of ps_s82_9_search_parameters_186384_1235771351
27/02/2009 21:50:03|Milkyway@home|Finished download of ps_s82_9_search_parameters_186385_1235771351
27/02/2009 21:50:08|Milkyway@home|Sending scheduler request: To fetch work. Requesting 3021818 seconds of work, reporting 0 completed tasks
27/02/2009 21:50:13|Milkyway@home|Scheduler request completed: got 0 new tasks
27/02/2009 21:51:13|Milkyway@home|Sending scheduler request: To fetch work. Requesting 3022868 seconds of work, reporting 0 completed tasks
27/02/2009 21:51:18|Milkyway@home|Scheduler request completed: got 0 new tasks
27/02/2009 21:51:18|Milkyway@home|Message from server: No work sent
27/02/2009 21:51:18|Milkyway@home|Message from server: (reached per-CPU limit of 20 tasks)
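
Logs like the one above can be tallied mechanically. The following is a minimal sketch, assuming the pipe-separated `timestamp|project|message` layout shown in this post (other posts in the thread use a space-separated layout, which it does not handle). `SAMPLE_LOG` holds a handful of lines copied from the log above; the function and variable names are made up for illustration.

```python
import re

# A few lines copied from the log above (timestamp|project|message).
SAMPLE_LOG = """\
27/02/2009 21:46:55|Milkyway@home|Scheduler request completed: got 0 new tasks
27/02/2009 21:47:05|SHA-1 Collision Search Graz|Scheduler request completed: got 0 new tasks
27/02/2009 21:47:20|Milkyway@home|Scheduler request completed: got 0 new tasks
27/02/2009 21:49:57|Milkyway@home|Scheduler request completed: got 8 new tasks
27/02/2009 21:50:13|Milkyway@home|Scheduler request completed: got 0 new tasks
"""

GOT_TASKS = re.compile(r"got (\d+) new tasks")


def summarize(log_text, project):
    """Count scheduler replies for one project and how many of them delivered work."""
    replies = 0
    with_work = 0
    tasks = 0
    for line in log_text.splitlines():
        parts = line.split("|", 2)
        # Skip malformed lines and lines that belong to other projects.
        if len(parts) != 3 or parts[1] != project:
            continue
        match = GOT_TASKS.search(parts[2])
        if match is None:
            continue
        replies += 1
        count = int(match.group(1))
        tasks += count
        if count > 0:
            with_work += 1
    return {"replies": replies, "with_work": with_work, "tasks": tasks}


print(summarize(SAMPLE_LOG, "Milkyway@home"))
```

Counted by hand over the full log in this post, nine Milkyway@home scheduler replies appear and only one of them (the one with 8 tasks) delivered work.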

Profile Neal Chantrill
Avatar
Send message
Joined: 17 Jan 09
Posts: 96
Credit: 69,437,271
RAC: 9,249
Message 13183 - Posted: 28 Feb 2009 | 0:43:53 UTC

In the 8 hours since I last posted I have had 4-5 small batches of work and that's it.

Alinator
Send message
Joined: 7 Jun 08
Posts: 393
Credit: 20,843,949
RAC: 65,532
Message 13186 - Posted: 28 Feb 2009 | 1:21:31 UTC - in response to Message 13075.

Well, I haven't been following their work feeder problem that closely. However, in general BOINC terms, if you cannot transition work into the scheduler's limited-size queue fast enough, then this is what you end up seeing, even if there are plenty of tasks coming out of the work generator(s).

Alinator


Yeah, I'm thinking this might be the problem. Now I just have to figure out how to increase the scheduler's queue size.


Hmmm...

Sorry about the delay in replying. I've been getting a newly acquired firewall appliance straightened out and configured, and I had to block the whole rpi.edu domain to guarantee I didn't miss data points for my hosts, so I couldn't even look at the website again until now.

IIRC, the big problem with increasing the scheduler queue, the way it's designed, is that you have to allocate more physical memory to the shared segment. If you don't have any to spare, then I guess you would be pretty well boned from a quick-fix POV. ;-)

Alinator

John Clark
Send message
Joined: 4 Oct 08
Posts: 1613
Credit: 62,030,625
RAC: 27,550
Message 13187 - Posted: 28 Feb 2009 | 1:26:14 UTC

Looks like lots more WUs ready to distribute, but getting any work seems to be almost impossible.

Even with manual forced requests ... zilch ... and the BOINC Projects tab shows MW up to 3 hours and more, which means more forced requests until it's back down to 1 minute.

First time this has really hit me. It's letting Einstein in for crunching.

Debs
Send message
Joined: 15 Jan 09
Posts: 169
Credit: 6,734,481
RAC: 0
Message 13188 - Posted: 28 Feb 2009 | 1:41:15 UTC

I guess as it's the weekend and Seti always goes down at the weekend as well, a few other projects are going to get some extra work done again :)
____________

gomeyer
Avatar
Send message
Joined: 26 Sep 08
Posts: 12
Credit: 191,244
RAC: 0
Message 13192 - Posted: 28 Feb 2009 | 1:48:29 UTC

As of 1:44:14 UTC server status shows 0 results ready to send and the validator seems to have stopped. Rats, I may have to temporarily go back to SETI on these two machines.

Noooooooo

Profile Labbie
Avatar
Send message
Joined: 29 Aug 07
Posts: 327
Credit: 116,463,193
RAC: 0
Message 13195 - Posted: 28 Feb 2009 | 2:00:46 UTC

Is it just me or has it been harder to get work since the cache limit was bumped up?

____________

Calm Chaos Forum...Join Calm Chaos Now

Profile Travis
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Send message
Joined: 30 Aug 07
Posts: 1976
Credit: 26,480
RAC: 0
Message 13197 - Posted: 28 Feb 2009 | 2:12:07 UTC - in response to Message 13195.

The assimilator/validator was crashed for the last hour or so, which would make it hard to get work :P
____________

Profile Glenn Rogers
Avatar
Send message
Joined: 4 Jul 08
Posts: 165
Credit: 363,844
RAC: 0
Message 13198 - Posted: 28 Feb 2009 | 2:15:27 UTC

Just got 17 new tasks after many update tries
Glenn
____________

Profile Labbie
Avatar
Send message
Joined: 29 Aug 07
Posts: 327
Credit: 116,463,193
RAC: 0
Message 13201 - Posted: 28 Feb 2009 | 2:31:57 UTC - in response to Message 13197.

The assimilator/validator was crashed for the last hour or so, which would make it hard to get work :P


I'm talking about all day, not just the last hour or so.

____________

Calm Chaos Forum...Join Calm Chaos Now

Profile Travis
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Send message
Joined: 30 Aug 07
Posts: 1976
Credit: 26,480
RAC: 0
Message 13202 - Posted: 28 Feb 2009 | 2:39:54 UTC - in response to Message 13201.

The assimilator/validator was crashed for the last hour or so, which would make it hard to get work :P


I'm talking about all day, not just the last hour or so.


I actually think you might be right here. If the queue on the scheduler is not large enough, it takes fewer requests to clean out its queue.

I'm going to lower the number to 6 and see if that helps.
____________

Profile Kevint
Avatar
Send message
Joined: 22 Nov 07
Posts: 285
Credit: 1,076,786,368
RAC: 0
Message 13203 - Posted: 28 Feb 2009 | 2:46:13 UTC
Last modified: 28 Feb 2009 | 2:47:22 UTC

Ahh, never mind - just saw the message about limiting it to 6... Ok.

I hope you have a nice big network pipe - lots of requests coming your way :)
____________
.

Profile Travis
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Send message
Joined: 30 Aug 07
Posts: 1976
Credit: 26,480
RAC: 0
Message 13205 - Posted: 28 Feb 2009 | 2:48:01 UTC - in response to Message 13203.
Last modified: 28 Feb 2009 | 2:49:06 UTC

2/27/2009 19:45:33|Milkyway@home|Message from server: (reached per-CPU limit of 6 tasks)


???


See the above post. I dropped the queue down to 6 per core to see if this will help more people get work. It was looking like with the limit at 20, the scheduler was running out of work to send very quickly and more people were getting the out of work message.

*edit*

before you flip out too much -- this is just temporary to see if this helps with work availability. if it doesn't help i'll increase the queue again. also, once we get a larger queue for the scheduler we'll be able to increase the work unit queue as well.
____________

Profile Glenn Rogers
Avatar
Send message
Joined: 4 Jul 08
Posts: 165
Credit: 363,844
RAC: 0
Message 13206 - Posted: 28 Feb 2009 | 2:48:56 UTC - in response to Message 13202.

Hi Travis,
Why a limit of 6/CPU? We had 12 up until last night, and things were running not badly with a limit of 12. Maybe you should put it back to that...
Glenn
____________

Profile Kevint
Avatar
Send message
Joined: 22 Nov 07
Posts: 285
Credit: 1,076,786,368
RAC: 0
Message 13208 - Posted: 28 Feb 2009 | 2:59:27 UTC - in response to Message 13205.
Last modified: 28 Feb 2009 | 3:00:07 UTC


See the above post. I dropped the queue down to 6 per core to see if this will help more people get work. It was looking like with the limit at 20, the scheduler was running out of work to send very quickly and more people were getting the out of work message.

*edit*

before you flip out too much -- this is just temporary to see if this helps with work availability. if it doesn't help i'll increase the queue again. also, once we get a larger queue for the scheduler we'll be able to increase the work unit queue as well.



Yea, I saw that after I posted -

How fast do you generate work? With the number of GPU clients, and more to come, you may have to review how the work is generated. I think there are several projects that have a separate server just for work generation.... I also remember you saying something about a budget and not having another machine.
____________
.

Profile Travis
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Send message
Joined: 30 Aug 07
Posts: 1976
Credit: 26,480
RAC: 0
Message 13209 - Posted: 28 Feb 2009 | 3:02:48 UTC - in response to Message 13208.


See the above post. I dropped the queue down to 6 per core to see if this will help more people get work. It was looking like with the limit at 20, the scheduler was running out of work to send very quickly and more people were getting the out of work message.

*edit*

before you flip out too much -- this is just temporary to see if this helps with work availability. if it doesn't help i'll increase the queue again. also, once we get a larger queue for the scheduler we'll be able to increase the work unit queue as well.



Yea, I saw that after I posted -

How fast do you generate work? With the number of GPU clients, and more to come, you may have to review how the work is generated. I think there are several projects that have a separate server just for work generation....



Work is generated more than fast enough. I've never seen less than 300 WUs available server-side today.

What I think the problem is, AFAIK: the scheduler uses a shared-memory queue to store the WUs it can send out to clients, and the feeder puts WUs into this queue for the scheduler.

What's happening is that before the feeder can re-fill up the queue for the scheduler, the queue gets emptied by WU requests so people are getting no work sent.

I'm pretty sure we need to increase the scheduler's queue size, but we need to do some work with labstaff to get that done, so it probably won't happen until early next week (like Monday).

So until then, I think the 6 WU queue should help keep enough work available for the scheduler.
____________
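
The feeder and scheduler interaction described above can be illustrated with a toy model. This is only a sketch under loud assumptions: the queue size of 100 slots, the refill period of 10 ticks, and one client request per tick are all invented numbers, and the per-core limit is simplified to "tasks handed out per request". It is not how the BOINC scheduler is implemented.

```python
def simulate(slots, per_request, refill_every, ticks):
    """Toy model: one client request per tick, feeder tops the queue up periodically.

    Returns (requests_served, requests_empty, tasks_sent).
    """
    queue = slots
    served = empty = sent = 0
    for tick in range(ticks):
        if tick % refill_every == 0:
            queue = slots  # feeder refills the shared-memory queue to capacity
        grant = min(per_request, queue)
        queue -= grant
        if grant > 0:
            served += 1
            sent += grant
        else:
            empty += 1  # corresponds to a "got 0 new tasks" reply
    return served, empty, sent


for limit in (20, 12, 6):
    print(limit, simulate(slots=100, per_request=limit, refill_every=10, ticks=100))
```

In this model a grant of 20 drains the queue after five requests, so half of all requests come back empty, while a grant of 6 never empties it but also hands out less work in total. The model ignores that hosts holding fewer tasks come back sooner, which is the objection raised later in the thread.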

Debs
Send message
Joined: 15 Jan 09
Posts: 169
Credit: 6,734,481
RAC: 0
Message 13210 - Posted: 28 Feb 2009 | 3:06:38 UTC

I don't know how many GPUs are already in use on this project, but with more likely to be used soon I would think it a good idea to increase the size of each wu, so there will not need to be so many requests.

Given that once a client is unable to receive work a couple of times, the time between requests increases, there are going to be a lot more people out of work for longer periods because of such a small amount of work (on all my systems, 6 wu per core will typically keep me going between approx 60 and 90 minutes, and you are not proving you can feed us with work that fast).
____________
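
The buffer arithmetic in the post above can be worked through as follows. A small sketch, assuming each core runs one task at a time and every task takes the same time; only the "6 WU per core lasts about 60 to 90 minutes" figure comes from the post, and the function names are made up for illustration.

```python
def minutes_per_task(buffer_minutes, tasks_per_core):
    """Average runtime of one task, given how long a full per-core buffer lasts."""
    return buffer_minutes / tasks_per_core


def buffer_minutes(tasks_per_core, minutes_each):
    """How long a full per-core buffer lasts before the core runs dry."""
    return tasks_per_core * minutes_each


# From the post: 6 WU per core lasts roughly 60 to 90 minutes.
fast = minutes_per_task(60, 6)  # 10 minutes per task
slow = minutes_per_task(90, 6)  # 15 minutes per task

for limit in (6, 12, 20):
    print(limit, buffer_minutes(limit, fast), buffer_minutes(limit, slow))
```

Under these assumptions a limit of 12 per core would last two to three hours and a limit of 20 a little over three to five hours, which is the margin a host has to ride out a stretch of empty scheduler replies.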

Profile Kevint
Avatar
Send message
Joined: 22 Nov 07
Posts: 285
Credit: 1,076,786,368
RAC: 0
Message 13211 - Posted: 28 Feb 2009 | 3:09:08 UTC
Last modified: 28 Feb 2009 | 3:13:57 UTC

Travis,

As long as you think your server can handle the requests..

This is just 1 machine with a GPU - you will notice it requesting every few seconds, and getting 1 - it is going to destroy your bandwidth - and may cause some issues on the RPC calls on your side.. I am not sure about this, but something to consider.


2/27/2009 19:59:30|Milkyway@home|Sending scheduler request: Requested by user. Requesting 2488366 seconds of work, reporting 1 completed tasks
2/27/2009 19:59:35|Milkyway@home|Scheduler request completed: got 1 new tasks
2/27/2009 19:59:46|Milkyway@home|Sending scheduler request: To report completed tasks. Requesting 2488191 seconds of work, reporting 1 completed tasks
2/27/2009 19:59:51|Milkyway@home|Scheduler request completed: got 1 new tasks
2/27/2009 20:00:02|Milkyway@home|Sending scheduler request: To report completed tasks. Requesting 2488079 seconds of work, reporting 1 completed tasks
2/27/2009 20:00:07|Milkyway@home|Scheduler request completed: got 1 new tasks
2/27/2009 20:00:17|Milkyway@home|Sending scheduler request: To report completed tasks. Requesting 2487999 seconds of work, reporting 1 completed tasks
2/27/2009 20:00:22|Milkyway@home|Scheduler request completed: got 1 new tasks




Guess it is time to open up another project to run as well.
____________
.

gomeyer
Avatar
Send message
Joined: 26 Sep 08
Posts: 12
Credit: 191,244
RAC: 0
Message 13212 - Posted: 28 Feb 2009 | 3:10:53 UTC

Do you have the option to limit DL's to perhaps 10 at once but keep the total limit at 20 per core? That should ease the hits on the scheduler yet leave us with a little better comfort level if we do manage to get a full quota, and let us get a partial load to hold us over in the meantime.

Profile Travis
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Send message
Joined: 30 Aug 07
Posts: 1976
Credit: 26,480
RAC: 0
Message 13215 - Posted: 28 Feb 2009 | 3:13:41 UTC - in response to Message 13210.

I don't know how many GPUs are already in use on this project, but with more likely to be used soon I would think it a good idea to increase the size of each wu, so there will not need to be so many requests.


I'd rather have a real fix for the problem, because if we do this the same problem will just show up again when we get more users and start seeing the same amount of workunit requests...
____________

Profile Travis
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Send message
Joined: 30 Aug 07
Posts: 1976
Credit: 26,480
RAC: 0
Message 13216 - Posted: 28 Feb 2009 | 3:14:34 UTC - in response to Message 13212.

Do you have the option to limit DL's to perhaps 10 at once but keep the total limit at 20 per core? That should ease the hits on the scheduler yet leave us with a little better comfort level if we do manage to get a full quota, and let us get a partial load to hold us over in the meantime.


I was thinking about doing something like this. Going to let the 6 WU queue go for awhile and see if it helps anything first.
____________

Bob in FL
Send message
Joined: 19 Jul 08
Posts: 5
Credit: 2,547,855
RAC: 0
Message 13218 - Posted: 28 Feb 2009 | 3:35:50 UTC

Up until yesterday my quad had very rarely run out of work. Now, in the last 24 hours, it has run completely dry probably 5 times or more and all I get is the same as others:

"Scheduler request completed: got 0 new tasks"

Profile Arion
Avatar
Send message
Joined: 10 Aug 08
Posts: 198
Credit: 14,874,834
RAC: 33,364
Message 13219 - Posted: 28 Feb 2009 | 3:41:17 UTC - in response to Message 13216.


I was thinking about doing something like this. Going to let the 6 WU queue go for awhile and see if it helps anything first.


I haven't been able to keep my caches full (set for .1 days) for a while now. Cutting back to 6 per core is just going to make it beg even more. Seems to me this would cause more problems for the server than letting clients make fewer requests at longer intervals. So I'll help out: I'm setting all systems to pull from Einstein when the server here won't honor requests for work.

Not a complaint as I know you got problems, but would rather have something to do than the computers sitting at idle for long stretches. Maybe this will take some of the load off. Probably wouldn't hurt if others did the same thing over the weekend until you got this worked out. (but then again if more people were doing that you would probably think 6 is enough) <smile>


____________

Profile Travis
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Send message
Joined: 30 Aug 07
Posts: 1976
Credit: 26,480
RAC: 0
Message 13221 - Posted: 28 Feb 2009 | 3:44:28 UTC - in response to Message 13218.

Up until yesterday my quad had very rarely run out of work. Now, in the last 24 hours, it has run completely dry probably 5 times or more and all I get is the same as others:

"Scheduler request completed: got 0 new tasks"


I've been debugging the new code for the assimilator/validator which lets us do some validation of workunits to keep our searches from getting screwed up by invalid results. This has caused the server to crash quite a few times this evening, so that might be causing a lot of the lack of work.

The assimilator/validator doesn't seem to be crashing anymore *fingers crossed* so work availability should be better from here on out.
____________

jedirock
Avatar
Send message
Joined: 8 Nov 08
Posts: 178
Credit: 6,140,854
RAC: 0
Message 13223 - Posted: 28 Feb 2009 | 4:48:30 UTC - in response to Message 13221.

The assimilator/validator doesn't seem to be crashing anymore *fingers crossed* so work availability should be better from here on out.

My quad seems to be keeping filled so far. It's running the GPU app, 0.19.
____________

Profile Glenn Rogers
Avatar
Send message
Joined: 4 Jul 08
Posts: 165
Credit: 363,844
RAC: 0
Message 13224 - Posted: 28 Feb 2009 | 4:59:13 UTC

G'day all. I just saw this result in my task list and am wondering why it is so?

Task ID 12690510
Name ps_s82_10_394_1235776518_0
Workunit 12383039
Created 27 Feb 2009 23:15:21 UTC
Sent 27 Feb 2009 23:15:54 UTC
Received 28 Feb 2009 4:51:43 UTC
Server state Over
Outcome Success
Client state Done
Exit status 0 (0x0)
Computer ID 21898
Report deadline 2 Mar 2009 23:15:54 UTC
CPU time 1027.891
stderr out <core_client_version>6.4.6</core_client_version>
<![CDATA[
<stderr_txt>
Running Milkyway@home version 0.19 by Gipsel
CPU: Genuine Intel(R) CPU T2300 @ 1.66GHz (2 cores/threads) 1.66677 GHz (917ms)

WU completed. It took 1027.89 seconds CPU time and 1038.24 seconds wall clock time @ 1.66678 GHz.

</stderr_txt>
]]>

Validate state Invalid
Claimed credit 2.83954728750274
Granted credit 0
application version 0.19

Every other task has been validated correctly; I don't know why this is the odd one out.......
____________

BarryAZ
Send message
Joined: 1 Sep 08
Posts: 512
Credit: 223,261,844
RAC: 166,583
Message 13225 - Posted: 28 Feb 2009 | 6:09:49 UTC - in response to Message 13202.

What lowering to 6 does is temporarily (say for maybe a couple of hours) reduce the false (success / 0 new work) messages by replacing them with 'met your CPU limit of 6'. Then, when the completed work drops the queue back from 20 to 6, the same problem pops up again -- but *more* frequently. Now one needs to hit the server for more work almost continuously, since it may take about 15 minutes of server pounding to get 45 minutes of work, by which time two more work units have completed.

I'm still trying to figure out how one can write a 'pound the server continuously' script <rueful smile>.



I'm going to lower the number to 6 and see if that helps.


____________

BarryAZ
Send message
Joined: 1 Sep 08
Posts: 512
Credit: 223,261,844
RAC: 166,583
Message 13226 - Posted: 28 Feb 2009 | 6:10:54 UTC - in response to Message 13219.

Yup -- I think you are spot on here.


I haven't been able to keep my caches full (set for .1 days) for a while now. Cutting back to 6 per core is just going to make it beg even more. Seems to me this would cause more problems for the server than letting clients make fewer requests at longer intervals. So I'll help out: I'm setting all systems to pull from Einstein when the server here won't honor requests for work.

Not a complaint as I know you got problems, but would rather have something to do than the computers sitting at idle for long stretches. Maybe this will take some of the load off. Probably wouldn't hurt if others did the same thing over the weekend until you got this worked out. (but then again if more people were doing that you would probably think 6 is enough) <smile>



____________

Profile mscharmack
Joined: 4 Dec 07
Posts: 45
Credit: 1,253,522
RAC: 0
Message 13227 - Posted: 28 Feb 2009 | 6:24:52 UTC
Last modified: 28 Feb 2009 | 6:42:29 UTC

Feed the beast. Limiting the workunit queue size to 6 will not stop the beast from starving. Feed the beast.


Holy Mackerel! Call headquarters. Get the lieutenant.
____________

Alinator
Joined: 7 Jun 08
Posts: 393
Credit: 20,843,949
RAC: 65,532
Message 13228 - Posted: 28 Feb 2009 | 6:41:46 UTC - in response to Message 13224.

G'day all. Just saw this result in my task list and I'm wondering why it is so?

Task ID 12690510
Name ps_s82_10_394_1235776518_0
Workunit 12383039
Created 27 Feb 2009 23:15:21 UTC
Sent 27 Feb 2009 23:15:54 UTC
Received 28 Feb 2009 4:51:43 UTC
Server state Over
Outcome Success
Client state Done
Exit status 0 (0x0)
Computer ID 21898
Report deadline 2 Mar 2009 23:15:54 UTC
CPU time 1027.891
stderr out <core_client_version>6.4.6</core_client_version>
<![CDATA[
<stderr_txt>
Running Milkyway@home version 0.19 by Gipsel
CPU: Genuine Intel(R) CPU T2300 @ 1.66GHz (2 cores/threads) 1.66677 GHz (917ms)

WU completed. It took 1027.89 seconds CPU time and 1038.24 seconds wall clock time @ 1.66678 GHz.

</stderr_txt>
]]>

Validate state Invalid
Claimed credit 2.83954728750274
Granted credit 0
application version 0.19

Every other task has been validated correctly; I don't know why this is the odd one out.


Hmmm...

Hard to say, but assuming it wasn't just a coincidence the most likely reason is that the backend 'lost' the output file (due to the troubleshooting at MW) for some reason.

If there aren't any further repeats of it, then I wouldn't worry about it.

Alinator

Profile m4rtyn
Joined: 16 Jan 08
Posts: 18
Credit: 4,111,257
RAC: 0
Message 13229 - Posted: 28 Feb 2009 | 7:18:57 UTC

Sorry, but reducing the wu limit to 6 has only made things worse.
____________
m4rtyn
******************************* *******************************

Riil
Joined: 10 Feb 09
Posts: 13
Credit: 713,485
RAC: 0
Message 13234 - Posted: 28 Feb 2009 | 8:26:32 UTC
Last modified: 28 Feb 2009 | 8:42:00 UTC

2009-02-28 09:22:33|Milkyway@home|Message from server: No work sent
2009-02-28 09:22:33|Milkyway@home|Message from server: (reached per-CPU limit of 6 tasks)

I see this message more often than the one about getting 0 new tasks, so it's getting better now, IMO. However, I see that more WUs have granted credit 0.
As far as I can see, 0 credit is now granted for about 10% of my new WUs.

Profile GalaxyIce
Joined: 6 Apr 08
Posts: 2018
Credit: 100,142,856
RAC: 0
Message 13236 - Posted: 28 Feb 2009 | 8:50:39 UTC - in response to Message 13234.

I see that more WUs have granted credit 0.

I'm seeing WUs claiming zero credit and being awarded the credit they jolly well deserve.

(OK, it's some GPU tasks that show zero crunching time, but I've timed some of them with a stopwatch and they do actually take a few seconds. I mean, I wouldn't even see them if it were zero seconds, would I?)


____________

etrecords
Joined: 15 May 08
Posts: 7
Credit: 86,619,410
RAC: 171
Message 13238 - Posted: 28 Feb 2009 | 9:18:07 UTC

At the moment I am also seeing workunits that are getting 0 credits. I did not see this before tonight. It now seems to happen with about 10% of the WUs.

etrecords
Joined: 15 May 08
Posts: 7
Credit: 86,619,410
RAC: 171
Message 13239 - Posted: 28 Feb 2009 | 9:21:17 UTC

I forgot: for me it seems to work more smoothly. I get fewer WUs, but the system is running most of the time now. The 'reached CPU limit' message occurs regularly, but that is the setting at the moment. It means the work my server holds at that moment is the maximum it is allowed, so I think this has made the situation more stable.

Profile GalaxyIce
Joined: 6 Apr 08
Posts: 2018
Credit: 100,142,856
RAC: 0
Message 13241 - Posted: 28 Feb 2009 | 9:39:24 UTC - in response to Message 13239.

I forgot: for me it seems to work more smoothly. I get fewer WUs, but the system is running most of the time now. The 'reached CPU limit' message occurs regularly, but that is the setting at the moment. It means the work my server holds at that moment is the maximum it is allowed, so I think this has made the situation more stable.

Hallelujah, someone's happy ;)

____________

John Clark
Joined: 4 Oct 08
Posts: 1613
Credit: 62,030,625
RAC: 27,550
Message 13243 - Posted: 28 Feb 2009 | 9:44:51 UTC
Last modified: 28 Feb 2009 | 9:45:19 UTC

Compared to yesterday my rigs (CPU only) seem to be running well, and fed. Yesterday, with the cache at 20 per CPU, I was in the same position as everyone else - zilch.

Profile Cori
Joined: 27 Aug 07
Posts: 647
Credit: 27,592,547
RAC: 0
Message 13244 - Posted: 28 Feb 2009 | 9:53:38 UTC

Still have problems with comps running dry (overnight).
If they don't get work within the first requests of the BOINC manager then the requests get delayed more and more - and the boxes are sitting idle. *sniff sniff*

PS. The lame P4 boxes have more chances to have WUs running of course, the faster comps need to be fed more often. ;-)
____________
Lovely greetings, Cori

Profile GalaxyIce
Joined: 6 Apr 08
Posts: 2018
Credit: 100,142,856
RAC: 0
Message 13245 - Posted: 28 Feb 2009 | 9:53:50 UTC - in response to Message 13243.

Compared to yesterday my rigs (CPU only) seem to be running well, and fed. Yesterday, with the cache at 20 per CPU, I was in the same position as everyone else - zilch.

Everyone? Zilch? I got a shed load of WUs yesterday, I can tell you. Did anyone else get any? Did everyone get zilch WUs yesterday? Was I the only one who got any WUs yesterday?

If you're going to bitch and complain, at least don't drag me into your 'everyone'.

____________

Pwrguru
Joined: 30 Aug 08
Posts: 24
Credit: 245,446,780
RAC: 276,852
Message 13247 - Posted: 28 Feb 2009 | 9:56:08 UTC - in response to Message 13238.

At the moment I am also seeing workunits that are getting 0 credits. I did not see this before tonight. It now seems to happen with about 10% of the WUs.

I am seeing a rate of about 15%, and it only just started with the newer work units.

Profile The Gas Giant
Joined: 24 Dec 07
Posts: 1947
Credit: 240,865,573
RAC: 0
Message 13249 - Posted: 28 Feb 2009 | 10:05:24 UTC - in response to Message 13245.

Compared to yesterday my rigs (CPU only) seem to be running well, and fed. Yesterday, with the cache at 20 per CPU, I was in the same position as everyone else - zilch.

Everyone? Zilch? I got a shed load of WUs yesterday, I can tell you. Did anyone else get any? Did everyone get zilch WUs yesterday? Was I the only one who got any WUs yesterday?

If you're going to bitch and complain, at least don't drag me into your 'everyone'.

You'd have to be someone first...

Brickhead
Joined: 20 Mar 08
Posts: 92
Credit: 1,562,391,719
RAC: 912,805
Message 13265 - Posted: 28 Feb 2009 | 11:23:09 UTC

With the 20 limit, I got more 0 responses than before, but on the other hand I got more WUs when I occasionally got some.

With the limit at 6, I anticipated a lot of 'limit reached' responses, so I lowered my cache from one to half a day. So far, I've seen *none* of the 0 responses, but only a few WUs with each request.

Of course, both my ATI-equipped crunchers are asking for work almost continuously now, but that doesn't seem to matter much.
____________

Profile [BAT] Annabel
Joined: 31 Aug 08
Posts: 1
Credit: 807,772
RAC: 1,116
Message 13269 - Posted: 28 Feb 2009 | 12:09:58 UTC
Last modified: 28 Feb 2009 | 12:15:32 UTC

If there are so many problems with getting out as many WUs as are needed to keep all crunching cores happily working, why don't you increase the crunching length of a WU? If you increase it by a factor of two, then the number of SQL requests to the data server should drop and server responsiveness should improve accordingly...
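
To put rough numbers on that suggestion, here is a minimal sketch of how the number of results a host turns in per hour scales with WU length. The function name and the 4-core, 250-second figures are illustrative assumptions only, not project data, and the sketch simply assumes every finished WU has to be reported and replaced.

```python
def results_per_hour(cores: int, wu_seconds: float) -> float:
    """Work units one host finishes (and must have replaced) per hour.

    Hypothetical helper: assumes every core crunches continuously and
    that each finished WU costs the server one report plus one new task.
    """
    if cores <= 0 or wu_seconds <= 0:
        raise ValueError("cores and wu_seconds must be positive")
    return cores * 3600.0 / wu_seconds


# Illustrative figures only: a quad-core host with 250-second WUs.
baseline = results_per_hour(cores=4, wu_seconds=250.0)
doubled = results_per_hour(cores=4, wu_seconds=500.0)

print(f"250 s WUs: {baseline:.1f} results per hour")
print(f"500 s WUs: {doubled:.1f} results per hour")
print(f"ratio: {doubled / baseline:.2f}")
```

Under those assumptions, doubling the WU length halves the results each host sends back, whatever the host count is. Whether that translates one-to-one into fewer database requests depends on how the scheduler batches them, which this sketch does not model.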
____________

Copycat-Digital for WCG*
Joined: 18 Nov 07
Posts: 32
Credit: 35,792,028
RAC: 0
Message 13272 - Posted: 28 Feb 2009 | 12:26:05 UTC - in response to Message 13269.

OK shoot me for this!
The baby ATI cruncher grew into a monster with an insatiable appetite
A quick calculation shows that a 4850 in a quad can gobble up +/- 1400 work units in an hour!
These monsters multiplied rapidly when a lot of crunchers rushed to the stores to get theirs
Now the server and network can’t keep up feeding these hungry monsters.

Result:
Everyone is starving!
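
The post doesn't show its quick calculation, so the following is only one way to arrive at a figure of that order. The 250 seconds per CPU WU is what the same poster later reports for an overclocked Q8200; the 2.7 seconds per GPU WU is purely an assumption picked for illustration, and the function name is hypothetical.

```python
def host_wus_per_hour(gpu_seconds: float, cpu_cores: int, cpu_seconds: float) -> float:
    """Rough hourly WU appetite of one GPU plus its CPU cores.

    Hypothetical helper: assumes the GPU and every CPU core crunch
    back-to-back with no idle time between work units.
    """
    if gpu_seconds <= 0 or cpu_seconds <= 0 or cpu_cores < 0:
        raise ValueError("times must be positive and cpu_cores non-negative")
    gpu_rate = 3600.0 / gpu_seconds              # WUs per hour on the GPU
    cpu_rate = cpu_cores * 3600.0 / cpu_seconds  # WUs per hour on all cores
    return gpu_rate + cpu_rate


# Assumed timings: 2.7 s per GPU WU (a guess), 250 s per CPU WU on 4 cores.
total = host_wus_per_hour(gpu_seconds=2.7, cpu_cores=4, cpu_seconds=250.0)
print(f"about {total:.0f} WUs per hour")
```

With these inputs nearly all of the appetite comes from the GPU term, so the estimate is very sensitive to the assumed GPU time per WU.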

JAMC
Joined: 9 Sep 08
Posts: 96
Credit: 336,443,946
RAC: 0
Message 13273 - Posted: 28 Feb 2009 | 12:26:58 UTC
Last modified: 28 Feb 2009 | 12:27:58 UTC

I still run out of work, two CPU apps dried up over the last 3 hours left to their own devices. The small WU queue did improve the situation a lot though.
This is with a 4870- not so good:

02/28/09 06:10:38|Milkyway@home|Sending scheduler request: To fetch work. Requesting 129629 seconds of work, reporting 8 completed tasks
02/28/09 06:10:43|Milkyway@home|Scheduler request completed: got 8 new tasks
02/28/09 06:10:53|Milkyway@home|Sending scheduler request: To fetch work. Requesting 115461 seconds of work, reporting 0 completed tasks
02/28/09 06:10:58|Milkyway@home|Scheduler request completed: got 0 new tasks
02/28/09 06:12:00|Milkyway@home|Sending scheduler request: To fetch work. Requesting 132473 seconds of work, reporting 8 completed tasks
02/28/09 06:12:05|Milkyway@home|Scheduler request completed: got 0 new tasks
02/28/09 06:13:05|Milkyway@home|Sending scheduler request: To fetch work. Requesting 149010 seconds of work, reporting 0 completed tasks
02/28/09 06:13:10|Milkyway@home|Scheduler request completed: got 0 new tasks
02/28/09 06:14:11|Milkyway@home|Sending scheduler request: To fetch work. Requesting 165398 seconds of work, reporting 9 completed tasks
02/28/09 06:14:16|Milkyway@home|Scheduler request completed: got 0 new tasks
02/28/09 06:15:16|Milkyway@home|Sending scheduler request: To fetch work. Requesting 172803 seconds of work, reporting 7 completed tasks
02/28/09 06:15:21|Milkyway@home|Scheduler request completed: got 0 new tasks

Profile GalaxyIce
Joined: 6 Apr 08
Posts: 2018
Credit: 100,142,856
RAC: 0
Message 13274 - Posted: 28 Feb 2009 | 12:35:14 UTC - in response to Message 13272.
Last modified: 28 Feb 2009 | 12:36:11 UTC

OK shoot me for this!
The baby ATI cruncher grew into a monster with an insatiable appetite
A quick calculation shows that a 4850 in a quad can gobble up +/- 1400 work units in an hour!
These monsters multiplied rapidly when a lot of crunchers rushed to the stores to get theirs
Now the server and network can’t keep up feeding these hungry monsters.

Result:
Everyone is starving!

Bang! Travis has recently repeated that there are plenty of WUs on the server side. The problem is releasing them to the clients (us), which I understand requires a server memory fix which will be done on Monday, hopefully, the 20/6 WUs being a temporary fix. It has nothing to do with WU shortage so blaming ATI will do nothing for you.

Go here; http://milkyway.cs.rpi.edu/milkyway/server_status.php

There are 700 WUs for you not being gobbled by ATIs

[edit] it just went up to 800 +
____________

Profile GalaxyIce
Joined: 6 Apr 08
Posts: 2018
Credit: 100,142,856
RAC: 0
Message 13282 - Posted: 28 Feb 2009 | 12:54:06 UTC - in response to Message 13276.

Although the server side says there are plenty of WUs to be released, clearly the issuing of work in response to requests is still the problem. Again I am, uncharacteristically, out of work.

Let us hope the problem, and the issue of work waiting in the server cache, can be solved early next week.

::Waving to Anton:: who is still sore!

Sore? Try a few more idiotic statements to entertain us.

____________

Profile Al*
Joined: 8 Nov 07
Posts: 323
Credit: 1,362,120
RAC: 0
Message 13284 - Posted: 28 Feb 2009 | 13:07:07 UTC

1 WU today 0 yesterday.

John Clark
Joined: 4 Oct 08
Posts: 1613
Credit: 62,030,625
RAC: 27,550
Message 13287 - Posted: 28 Feb 2009 | 13:12:24 UTC
Last modified: 28 Feb 2009 | 13:13:30 UTC

:lolol: :lolol: :lolol: :lolol:

Profile Cori
Joined: 27 Aug 07
Posts: 647
Credit: 27,592,547
RAC: 0
Message 13294 - Posted: 28 Feb 2009 | 13:26:33 UTC

Funny kindergarten here.
____________
Lovely greetings, Cori

STE\/E
Joined: 29 Aug 07
Posts: 486
Credit: 572,432,344
RAC: 10
Message 13297 - Posted: 28 Feb 2009 | 13:31:54 UTC - in response to Message 13284.

1 WU today 0 yesterday.


The work can be had, I've been getting some across my entire Pharm, so you have some other reason you're not getting any WU's ...

Profile Al*
Joined: 8 Nov 07
Posts: 323
Credit: 1,362,120
RAC: 0
Message 13299 - Posted: 28 Feb 2009 | 13:34:37 UTC - in response to Message 13297.
Last modified: 28 Feb 2009 | 13:37:10 UTC

Beats me, I'll check out the second PC to see if it has any work.

_________________________
Other one has a ton of work, weird.

Copycat-Digital for WCG*
Joined: 18 Nov 07
Posts: 32
Credit: 35,792,028
RAC: 0
Message 13301 - Posted: 28 Feb 2009 | 13:37:31 UTC - in response to Message 13274.
Last modified: 28 Feb 2009 | 13:41:33 UTC

Bang! Travis has recently repeated that there are plenty of WUs on the server side. The problem is releasing them to the clients (us), which I understand requires a server memory fix which will be done on Monday, hopefully, the 20/6 WUs being a temporary fix. It has nothing to do with WU shortage so blaming ATI will do nothing for you.

Go here; http://milkyway.cs.rpi.edu/milkyway/server_status.php

There are 700 WUs for you not being gobbled by ATIs

[edit] it just went up to 800 +

Sorry I think you missed

Server status shows 753 ready
I immediately hit update on my dry quad and received 0.
I’m not blaming ATI (its ability is amazing!) but this started to escalate when more and more crunchers began using their ATIs.
I looked randomly at 10 of the top 100 computers by RAC – they are all ATI equipped.

Other fast PCs may also contribute to this.
I’m also guilty because I OC’d my Q8200 to 2.9 GHz to pump up my RAC.
It’s doing an S82 in 250 seconds.
Yesterday I turned the cache down to the minimum, 0.01, but its scheduler still wants 9000 seconds of work and repeatedly asks for more every 60 seconds, even with 76 WUs ready to start.
Doesn’t this put a big strain on the network and server?

Edit
I can also install ATI but with my luck my RAC will go down
____________
A BLAST FROM YOUR PAST

STE\/E
Joined: 29 Aug 07
Posts: 486
Credit: 572,432,344
RAC: 10
Message 13303 - Posted: 28 Feb 2009 | 13:43:07 UTC - in response to Message 13299.

Beats me, I'll check out the second PC to see if it has any work.

_________________________
Other one has a ton of work, weird.


Just Reset the Project (don't Detach, just reset it) if it doesn't have any work & see if you get some. You may not get any right away, but after a while you might again ...

Alinator
Joined: 7 Jun 08
Posts: 393
Credit: 20,843,949
RAC: 65,532
Message 13307 - Posted: 28 Feb 2009 | 13:50:59 UTC - in response to Message 13303.
Last modified: 28 Feb 2009 | 13:51:47 UTC

Beats me, I'll check out the second PC to see if it has any work.

_________________________
Other one has a ton of work, weird.


Just Reset the Project (don't Detach, just reset it) if it doesn't have any work & see if you get some. You may not get any right away, but after a while you might again ...


Wait a second...

Resetting the project as a workaround to get more work might be counter productive. The reason is that even if you already have the third party apps to install afterwards, your host is going to have to re-download all the input files it needs that got dumped from the reset. If running stock, it will have to download the app again too.

This is probably not going to do much to help the overall situation. ;-)

Alinator

Profile Al*
Joined: 8 Nov 07
Posts: 323
Credit: 1,362,120
RAC: 0
Message 13309 - Posted: 28 Feb 2009 | 13:54:58 UTC

Hmm ya, one out of two working is good enough.

STE\/E
Joined: 29 Aug 07
Posts: 486
Credit: 572,432,344
RAC: 10
Message 13310 - Posted: 28 Feb 2009 | 13:58:08 UTC - in response to Message 13307.

Beats me, I'll check out the second PC to see if it has any work.

_________________________
Other one has a ton of work, weird.


Just Reset the Project (don't Detach, just reset it) if it doesn't have any work & see if you get some. You may not get any right away, but after a while you might again ...


Wait a second...

Resetting the project as a workaround to get more work might be counter productive. The reason is that even if you already have the third party apps to install afterwards, your host is going to have to re-download all the input files it needs that got dumped from the reset. If running stock, it will have to download the app again too.

This is probably not going to do much to help the overall situation. ;-)

Alinator


No, Resetting the Project won't get rid of any Optimized App's, Third Party App's I don't know about but the Optimized ones don't get taken out when Resetting. Yes you'll have to re-download the input files & if you have Dial-Up it could take some time or pose a problem.

I do it all the time but I have Cable so it's no big deal for me to re-download the input files, it only takes a few seconds for me to do that. It works for me but if somebody has a problem with it then don't do it, more work for me then ... :)

Debs
Joined: 15 Jan 09
Posts: 169
Credit: 6,734,481
RAC: 0
Message 13315 - Posted: 28 Feb 2009 | 14:09:18 UTC - in response to Message 13215.

I don't know how many GPUs are already in use on this project, but with more likely to be used soon I would think it a good idea to increase the size of each wu, so there will not need to be so many requests.


I'd rather have a real fix for the problem, because if we do this the same problem will just show up again when we get more users and start seeing the same amount of workunit requests...


Larger work units would be a real fix, and would give you more time to sort out the problem with how slowly work is being released at times.
____________

Profile banditwolf
Joined: 12 Nov 07
Posts: 2425
Credit: 295,133
RAC: 0
Message 13317 - Posted: 28 Feb 2009 | 14:11:32 UTC - in response to Message 13315.

I don't know how many GPUs are already in use on this project, but with more likely to be used soon I would think it a good idea to increase the size of each wu, so there will not need to be so many requests.


I'd rather have a real fix for the problem, because if we do this the same problem will just show up again when we get more users and start seeing the same amount of workunit requests...


Larger work units would be a real fix, and would give you more time to sort out the problem with how slowly work is being released at times.


I brought this up a couple weeks ago and Travis said they contain enough info and the 'group' doesn't want to increase the size any more.
____________
Doesn't expecting the unexpected make the unexpected the expected?
If it makes sense, DON'T do it.

Debs
Joined: 15 Jan 09
Posts: 169
Credit: 6,734,481
RAC: 0
Message 13319 - Posted: 28 Feb 2009 | 14:15:41 UTC - in response to Message 13317.

I don't know how many GPUs are already in use on this project, but with more likely to be used soon I would think it a good idea to increase the size of each wu, so there will not need to be so many requests.


I'd rather have a real fix for the problem, because if we do this the same problem will just show up again when we get more users and start seeing the same amount of workunit requests...


Larger work units would be a real fix, and would give you more time to sort out the problem with how slowly work is being released at times.


I brought this up a couple weeks ago and Travis said they contain enough info and the 'group' doesn't want to increase the size any more.


LOL, I'd love to know what majority group thinks everything is fine as it is :)
____________

Profile banditwolf
Joined: 12 Nov 07
Posts: 2425
Credit: 295,133
RAC: 0
Message 13320 - Posted: 28 Feb 2009 | 14:22:22 UTC - in response to Message 13319.



I brought this up a couple weeks ago and Travis said they contain enough info and the 'group' doesn't want to increase the size any more.


LOL, I'd love to know what majority group thinks everything is fine as it is :)


Whoever he is supposed to see about the project. Probably the same people who don't think a second server is needed. :P
____________
Doesn't expecting the unexpected make the unexpected the expected?
If it makes sense, DON'T do it.

Profile GalaxyIce
Joined: 6 Apr 08
Posts: 2018
Credit: 100,142,856
RAC: 0
Message 13321 - Posted: 28 Feb 2009 | 14:26:18 UTC - in response to Message 13301.


I looked randomly at 10 of the top 100 computers by RAC – they are all ATI equipped.

I expect the top crunchers amassed most of their credits before the ATIs were available for optimized crunching. ATI GPU crunching is relatively quite recent and I expect people want a stable period of crunching with the ATIs before saying how they are doing with them.

____________

Alinator
Joined: 7 Jun 08
Posts: 393
Credit: 20,843,949
RAC: 65,532
Message 13328 - Posted: 28 Feb 2009 | 15:31:39 UTC - in response to Message 13310.
Last modified: 28 Feb 2009 | 15:32:42 UTC



Wait a second...

Resetting the project as a workaround to get more work might be counter productive. The reason is that even if you already have the third party apps to install afterwards, your host is going to have to re-download all the input files it needs that got dumped from the reset. If running stock, it will have to download the app again too.

This is probably not going to do much to help the overall situation. ;-)

Alinator


No, Resetting the Project won't get rid of any Optimized App's, Third Party App's I don't know about but the Optimized ones don't get taken out when Resetting. Yes you'll have to re-download the input files & if you have Dial-Up it could take some time or pose a problem.

I do it all the time but I have Cable so it's no big deal for me to re-download the input files, it only takes a few seconds for me to do that. It works for me but if somebody has a problem with it then don't do it, more work for me then ... :)


Agreed, not all CC's will dump the whole project directory on a reset. However, all will dump the stock app on one.

However, that wasn't the real point. The new GPU apps are bandwidth and work sponges from the project's POV.

Therefore, 'casual' resetting forces those hosts to pull all the input files and a whole new set of work from empty, with the attendant DB and bandwidth load that goes along with that. Given that the project is having trouble keeping up as it is, anything that increases that load only makes things worse for the big picture. Also, there is no guarantee that the scheduler queue hasn't been sucked dry just as the host gets reset, so the whole process would have been done for no net gain in that case.

Alinator

Profile Neal Chantrill
Joined: 17 Jan 09
Posts: 96
Credit: 69,437,271
RAC: 9,249
Message 13331 - Posted: 28 Feb 2009 | 15:40:50 UTC

I now have work on all computers, including the GPU where I was getting practically nothing.

Profile Travis
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 30 Aug 07
Posts: 1976
Credit: 26,480
RAC: 0
Message 13334 - Posted: 28 Feb 2009 | 15:47:00 UTC - in response to Message 13331.

I now have work on all computers, including the GPU where I was getting practically nothing.


Server-side, it looks like the change helped work availability quite a bit.
____________

Alinator
Joined: 7 Jun 08
Posts: 393
Credit: 20,843,949
RAC: 65,532
Message 13336 - Posted: 28 Feb 2009 | 15:53:48 UTC - in response to Message 13334.

I now have work on all computers, including the GPU where I was getting practically nothing.


Server-side, it looks like the change helped work availability quite a bit.


LOL...

Well, we had our three steps backwards over the last 36 hours or so.

So I guess that would be about 2 1/2 forward now? ;-)

Alinator

JAMC
Joined: 9 Sep 08
Posts: 96
Credit: 336,443,946
RAC: 0
Message 13341 - Posted: 28 Feb 2009 | 16:28:40 UTC - in response to Message 13334.

I now have work on all computers, including the GPU where I was getting practically nothing.


Server-side, it looks like the change helped work availability quite a bit.



The only problem- and it's a big one- is when you have 'got 0 new tasks' multiple times in a row, which does still happen, and run dry and the reconnect time goes to 2 or 3 hours and you have nothing to crunch until then... any way to change/fix(?) that?
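
For anyone wondering how a string of 'got 0 new tasks' replies can turn into a wait of hours, here is a generic exponential backoff sketch. It is not the BOINC client's actual code: the function name, the 60-second base, the doubling factor and the 4-hour cap are all assumptions chosen for illustration, and a real client may add randomness or use different limits.

```python
def backoff_delay(empty_replies: int,
                  base: float = 60.0,
                  factor: float = 2.0,
                  cap: float = 4 * 3600.0) -> float:
    """Seconds to wait before the next request (hypothetical schedule).

    The delay starts at `base` after the first empty reply, is multiplied
    by `factor` for each further consecutive empty reply, and never
    exceeds `cap`.
    """
    if empty_replies <= 0:
        return 0.0
    return min(cap, base * factor ** (empty_replies - 1))


# Print the assumed schedule for one to nine empty replies in a row.
for n in range(1, 10):
    print(f"{n} empty replies in a row: wait {backoff_delay(n) / 60.0:.0f} min")
```

With these assumed constants the wait passes two hours at the eighth empty reply in a row, which is the kind of behaviour described above. A single successful fetch (or a manual update) would normally reset the count.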

Alinator
Joined: 7 Jun 08
Posts: 393
Credit: 20,843,949
RAC: 65,532
Message 13343 - Posted: 28 Feb 2009 | 16:36:01 UTC - in response to Message 13341.

I now have work on all computers, including the GPU where I was getting practically nothing.


Server-side, it looks like the change helped work availability quite a bit.



The only problem- and it's a big one- is when you have 'got 0 new tasks' multiple times in a row, which does still happen, and run dry and the reconnect time goes to 2 or 3 hours and you have nothing to crunch until then... any way to change/fix(?) that?


It depends on what you mean by 'fix'.

If you are talking about MW being able to feed work to the scheduler faster, then no, not until Monday at the earliest.

If you're talking about not running dry, then you have two choices:

1.) Sit at the console all the time and pound on the update button when you are out.

2.) Run a backup project and just ride out these difficulties until a satisfactory backend fix can be implemented.

Alinator


Profile banditwolf
Joined: 12 Nov 07
Posts: 2425
Credit: 295,133
RAC: 0
Message 13345 - Posted: 28 Feb 2009 | 16:56:36 UTC - in response to Message 13343.



The only problem- and it's a big one- is when you have 'got 0 new tasks' multiple times in a row, which does still happen, and run dry and the reconnect time goes to 2 or 3 hours and you have nothing to crunch until then... any way to change/fix(?) that?


It depends on what you mean by 'fix'.

If you are talking about MW being able to feed work to the scheduler faster, then no, not until Monday at the earliest.

If you're talking about not running dry, then you have two choices:

1.) Sit at the console all the time and pound on the update button when you are out.

2.) Run a backup project and just ride out these difficulties until a satisfactory backend fix can be implemented.

Alinator


I think he was talking about the 3 hour delay.

____________
Doesn't expecting the unexpected make the unexpected the expected?
If it makes sense, DON'T do it.

Brickhead
Joined: 20 Mar 08
Posts: 92
Credit: 1,562,391,719
RAC: 912,805
Message 13355 - Posted: 28 Feb 2009 | 17:51:58 UTC - in response to Message 13265.

So far, I've seen *none* of the 0 responses

I spoke too soon. Quite a few 0 responses hidden in the logs, and running dry on one occasion. But still a significant improvement over both the previous per-core quotas tried, as far as my crunchers are concerned.
____________

Profile Travis
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 30 Aug 07
Posts: 1976
Credit: 26,480
RAC: 0
Message 13356 - Posted: 28 Feb 2009 | 17:58:28 UTC - in response to Message 13355.

So far, I've seen *none* of the 0 responses

I spoke too soon. Quite a few 0 responses hidden in the logs, and running dry on one occasion. But still a significant improvement over both the previous per-core quotas tried, as far as my crunchers are concerned.


Should be even better now that the workunits should take around twice as long to crunch.
____________

Profile GalaxyIce
Joined: 6 Apr 08
Posts: 2018
Credit: 100,142,856
RAC: 0
Message 13360 - Posted: 28 Feb 2009 | 18:13:33 UTC - in response to Message 13356.


Should be even better now that the workunits should take around twice as long to crunch.

Huh. Making us work twice as slow eh? :p

____________

JAMC
Joined: 9 Sep 08
Posts: 96
Credit: 336,443,946
RAC: 0
Message 13366 - Posted: 28 Feb 2009 | 18:37:22 UTC - in response to Message 13345.



The only problem- and it's a big one- is when you have 'got 0 new tasks' multiple times in a row, which does still happen, and run dry and the reconnect time goes to 2 or 3 hours and you have nothing to crunch until then... any way to change/fix(?) that?


It depends on what you mean by 'fix'.

If you are talking about MW being able to feed work to the scheduler faster, then no, not until Monday at the earliest.

If you're talking about not running dry, then you have two choices:

1.) Sit at the console all the time and pound on the update button when you are out.

2.) Run a backup project and just ride out these difficulties until a satisfactory backend fix can be implemented.

Alinator


I think he was talking about the 3 hour delay.


'If you are talking about MW being able to feed work to the scheduler faster...'
I'm not.

'I think he was talking about the 3 hour delay.'
I was.

Debs
Joined: 15 Jan 09
Posts: 169
Credit: 6,734,481
RAC: 0
Message 13367 - Posted: 28 Feb 2009 | 18:39:44 UTC

Good to see that the idea of increasing the size of a work unit was eventually taken on board. Making them smaller always seemed crazy when they are mostly completed so fast :)
____________

Profile Misfit
Joined: 27 Aug 07
Posts: 915
Credit: 1,503,115
RAC: 0
Message 13380 - Posted: 28 Feb 2009 | 19:17:39 UTC - in response to Message 13360.

Should be even better now that the workunits should take around twice as long to crunch.

Huh. Making us work twice as slow eh? :p

Ooh 2 seconds then.
____________

Profile Travis
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 30 Aug 07
Posts: 1976
Credit: 26,480
RAC: 0
Message 13382 - Posted: 28 Feb 2009 | 19:28:46 UTC - in response to Message 13380.

Should be even better now that the workunits should take around twice as long to crunch.

Huh. Making us work twice as slow eh? :p

Ooh 2 seconds then.


Hey now, some of the GPU apps are taking a whole 3 seconds.
____________

Profile banditwolf
Joined: 12 Nov 07
Posts: 2425
Credit: 295,133
RAC: 0
Message 13391 - Posted: 28 Feb 2009 | 21:04:37 UTC - in response to Message 13382.

Should be even better now that the workunits should take around twice as long to crunch.

Huh. Making us work twice as slow eh? :p

Ooh 2 seconds then.


Hey now, some of the GPU apps are taking a whole 3 seconds.


Need a faster app now.
____________
Doesn't expecting the unexpected make the unexpected the expected?
If it makes sense, DON'T do it.

Profile magyarficko
Joined: 22 Jan 09
Posts: 35
Credit: 46,731,190
RAC: 0
Message 13396 - Posted: 28 Feb 2009 | 21:24:45 UTC - in response to Message 13356.

Should be even better now that the workunits should take around twice as long to crunch.


Wonderful, we've solved the problem of WU availability, but now you've effectively cut WU credits almost in half (for me) yet again!

[B^S] Beremat
Joined: 19 Feb 09
Posts: 33
Credit: 1,111,866
RAC: 0
Message 13397 - Posted: 28 Feb 2009 | 21:35:54 UTC - in response to Message 13396.

Should be even better now that the workunits should take around twice as long to crunch.


Wonderful, we've solved the problem of WU availability, but now you've effectively cut WU credits almost in half (for me) yet again!


Just because they take longer to crunch doesn't mean that they don't give more credits. :D
____________

Profile banditwolf
Joined: 12 Nov 07
Posts: 2425
Credit: 295,133
RAC: 0
Message 13398 - Posted: 28 Feb 2009 | 21:37:00 UTC - in response to Message 13397.

Unless it's the same credits as the previous amount of work.
____________
Doesn't expecting the unexpected make the unexpected the expected?
If it makes sense, DON'T do it.

Brickhead
Joined: 20 Mar 08
Posts: 92
Credit: 1,562,391,719
RAC: 912,805
Message 13399 - Posted: 28 Feb 2009 | 21:37:05 UTC - in response to Message 13396.

Should be even better now that the workunits should take around twice as long to crunch.


Wonderful, we've solved the problem of WU availability, but now you've effectively cut WU credits almost in half (for me) yet again!


No. The longer WUs are granted more credit each.
____________

Profile magyarficko
Joined: 22 Jan 09
Posts: 35
Credit: 46,731,190
RAC: 0
Message 13400 - Posted: 28 Feb 2009 | 21:38:18 UTC - in response to Message 13396.

but now you've effectively cut WU credits almost in half (for me) yet again!


Apologies! The problem seems to have been rectified now, but earlier my WUs were running twice as long and still getting the same amount of credit.

Brickhead
Joined: 20 Mar 08
Posts: 92
Credit: 1,562,391,719
RAC: 912,805
Message 13401 - Posted: 28 Feb 2009 | 21:42:10 UTC - in response to Message 13341.

I now have work on all computers, including the GPU, where I was getting practically nothing.


Server-side, it looks like the change helped work availability quite a bit.



The only problem, and it's a big one, is when you get 'got 0 new tasks' multiple times in a row, which does still happen. You run dry, the reconnect time goes to 2 or 3 hours, and you have nothing to crunch until then. Any way to change or fix that?


I think the increasing back-off times are built into the BOINC core client, so there's nothing anyone at MW can do about it. Berkeley's idea behind this design was to ease the burden for project servers after being offline for a while (as seems to be common over at SAH), making clients contact them over a longer period of time instead of all at once.
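
For readers unfamiliar with the idea, here is a minimal sketch of an increasing, randomized back-off. It is only an illustration of the general technique and is not the BOINC client's actual formula; the function name `backoff_delays` and the `base` and `cap` values are assumptions made up for this example.

```python
import random

def backoff_delays(failures, base=60.0, cap=4 * 3600.0, seed=None):
    """Return one wait time (in seconds) per consecutive failed request.

    Hypothetical parameters: `base` is the first ceiling before doubling,
    `cap` is the longest wait allowed. Neither value comes from BOINC.
    """
    rng = random.Random(seed)
    delays = []
    for n in range(1, failures + 1):
        # The ceiling doubles with every consecutive failure, up to the cap.
        ceiling = min(cap, base * 2 ** n)
        # Pick a random point in the upper half of the window so that many
        # clients retrying after an outage do not all reconnect at once.
        delays.append(rng.uniform(ceiling / 2, ceiling))
    return delays

for attempt, delay in enumerate(backoff_delays(8, seed=1), start=1):
    print(f"empty reply {attempt}: wait about {delay / 60:.0f} min")
```

With these made-up numbers, a handful of empty replies in a row is enough to push the wait into the range of hours, which matches the behaviour described above.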
____________

Profile caferace
Joined: 4 Aug 08
Posts: 46
Credit: 8,255,900
RAC: 0
Message 13402 - Posted: 28 Feb 2009 | 21:52:09 UTC

Travis, thanks for keeping my beasties fed.

-jim

Profile The Gas Giant
Joined: 24 Dec 07
Posts: 1947
Credit: 240,865,573
RAC: 0
Message 13417 - Posted: 28 Feb 2009 | 23:07:33 UTC

My quady appears to be much happier now, as do my two work C2Ds and my old P4. I wouldn't have thought that decreasing the cached WUs per CPU would work, but this and extending the crunching time of the WUs appear to have done the job! Well done Travis.

Brickhead
Joined: 20 Mar 08
Posts: 92
Credit: 1,562,391,719
RAC: 912,805
Message 13420 - Posted: 28 Feb 2009 | 23:11:42 UTC

I second the two previous posts. Thanks, Travis (also for the work you've done *before* visible success).
____________

Profile Zanth
Joined: 18 Feb 09
Posts: 144
Credit: 44,887,561
RAC: 30,274
Message 13427 - Posted: 28 Feb 2009 | 23:34:25 UTC

My i7 is just fine, but my Core2Quad running on GPU is almost never running at its full potential. I usually only get 10 WUs when a request goes in, it crunches 8 in about 5-10 minutes, starts up two, gets a few more, maybe 8 more, sometimes 4, sometimes none. But at any rate, quite often today I've seen my system crunching 6 WUs and having none complete, so it made a request and just got nothing, or perhaps only 6. But I'm pretty positive it's not had 24 WUs at any time today.

Temujin
Joined: 12 Oct 07
Posts: 77
Credit: 404,471,187
RAC: 0
Message 13429 - Posted: 28 Feb 2009 | 23:49:09 UTC - in response to Message 13427.

All running well here, thanks Travis

BarryAZ
Joined: 1 Sep 08
Posts: 512
Credit: 223,261,844
RAC: 166,583
Message 13437 - Posted: 1 Mar 2009 | 0:49:08 UTC - in response to Message 13334.

OK -- what I see is that this works ok *if* the only application running is Milkyway -- I'm doing that as a test on a batch of computers and it does work. However, if one is doing multiple projects (like I normally do), then the small cache tends to result in other projects *with lower resource shares* actually getting a larger proportion of CPU cycles because they have larger caches with similar due dates (examples in particular would be Spinhenge and Poem for me, but it also seems to apply to a lesser degree with SETI, Einstein, Rosetta). The only project which stays reasonable is Climate -- but that's because the due dates are so far out.

I now have work on all computers, including the GPU, where I was getting practically nothing.


Server-side, it looks like the change helped work availability quite a bit.


____________

Cluster Physik
Joined: 26 Jul 08
Posts: 627
Credit: 94,940,203
RAC: 0
Message 13440 - Posted: 1 Mar 2009 | 1:05:25 UTC - in response to Message 13382.
Last modified: 1 Mar 2009 | 1:16:37 UTC

Hey now, some of the GPU apps are taking a whole 3 seconds.

Not really. But the latest test app does not use a full CPU core to poll the GPU all the time. That lowers the CPU load quite a bit and therefore the reported time, which is actually the CPU time. Ice has already posted one result that took him a mere 0.96 CPU seconds to crunch (but a bit more on the GPU of course).

It is a bit hard to get meaningful timing for a GPU app. If only one WU ran at a time, one could report the wall clock time, but my app tries to overlap several WUs to increase the GPU load. Therefore the wall clock time is also not reliable (you can have a look in the stderr.txt output). Actually I can let the app report any time you want. So if you have a wish... If you want we can skew the cpcs values (they are skewed anyway with GPU apps) on the stats sites with some creative timing ;)
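
The gap between CPU time and wall clock time can be seen in a small sketch. This is not the GPU app's code: `wait_for_device` and `poll_device` are hypothetical stand-ins that use `time.sleep` and a busy loop in place of a real GPU, purely to show why a host thread that waits idly reports almost no CPU time.

```python
import time

def timed(task):
    """Run task() and return (cpu_seconds, wall_seconds) for this process."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    task()
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return cpu, wall

def wait_for_device():
    # Stand-in for a host thread that sleeps while the GPU does the work.
    time.sleep(0.2)

def poll_device():
    # Stand-in for a host thread that spins on a CPU core until the GPU is done.
    deadline = time.perf_counter() + 0.2
    while time.perf_counter() < deadline:
        pass

for name, task in (("sleeping", wait_for_device), ("polling", poll_device)):
    cpu, wall = timed(task)
    print(f"{name}: {cpu:.3f} s CPU time, {wall:.3f} s wall clock time")
```

Both variants take about the same wall clock time, but only the polling one accumulates comparable CPU time, so any statistic based on CPU seconds looks far better for the sleeping variant.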

Profile GalaxyIce
Joined: 6 Apr 08
Posts: 2018
Credit: 100,142,856
RAC: 0
Message 13474 - Posted: 1 Mar 2009 | 14:22:59 UTC - in response to Message 13440.
Last modified: 1 Mar 2009 | 14:23:38 UTC

Hey now, some of the GPU apps are taking a whole 3 seconds.

Not really.
...<snip>...
Ice has already posted one result that took him a mere 0.96 CPU seconds to crunch (but a bit more on the GPU of course).
...<snip>...
If you want we can skew the cpcs values (they are skewed anyway with GPU apps) on the stats sites with some creative timing ;)

Yes, of course. I have a stopwatch on my mobile, which isn't too accurate for timing, but prior to the longer WUs introduced yesterday they were taking around 8 seconds GMT time.

As for the sub-second you quoted above - I caught one even quicker ;)

CPU: Intel(R) Pentium(R) D CPU 2.80GHz (2 cores/threads) 2.79297 GHz (347ms)

CAL Runtime: 1.3.145
Found 1 CAL device

Device 0: ATI Radeon HD 4800 (RV770) 512 MB local RAM (remote 28 MB cached + 512 MB uncached)
GPU core clock: 680 MHz, memory clock: 750 MHz
800 shader units organized in 10 SIMDs with 16 VLIW units (5-issue)
supporting double precision

0 WUs already running on GPU 0
Starting WU on GPU 0
Calculated about 1.85078e+012 floatingpoint ops on GPU, 6.18221e+007 on FPU.
Calculated about 8.03964e+008 floatingpoint ops on FPU (stars).
WU completed. It took 0.953125 seconds CPU time and 25.528 seconds wall clock time @ 2.79307 GHz.
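
As a rough worked example, the figures in the log above show how much a per-CPU-second statistic is inflated for a GPU result. The numbers are copied from the log; the variable names are made up for this sketch, and the wall clock figure is itself only approximate when several WUs overlap on the GPU.

```python
# Figures copied from the stderr output above.
gpu_flops = 1.85078e12      # floating point ops reported for the GPU
cpu_seconds = 0.953125      # reported CPU time
wall_seconds = 25.528       # reported wall clock time

# Throughput as it appears when divided by CPU time versus wall clock time.
per_cpu_second = gpu_flops / cpu_seconds
per_wall_second = gpu_flops / wall_seconds

print(f"per CPU second:        {per_cpu_second / 1e9:.1f} GFLOP")
print(f"per wall clock second: {per_wall_second / 1e9:.1f} GFLOP")
print(f"inflation factor:      {wall_seconds / cpu_seconds:.1f}x")
```

Dividing by CPU time makes the result look more than 25 times faster than dividing by wall clock time, which is why cpcs values for GPU apps are hard to compare with CPU-only results.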

____________

John Clark
Joined: 4 Oct 08
Posts: 1613
Credit: 62,030,625
RAC: 27,550
Message 13635 - Posted: 2 Mar 2009 | 15:38:36 UTC
Last modified: 2 Mar 2009 | 15:43:48 UTC

Thank you Travis for the other requested changes.

It looks like the changes you and Dave have made are keeping things going smoothly - at least from this end they seem to be. Crunchers are fully fed and have some morsels waiting to gnaw on.

I wonder if any of the GPU crunchers (nice RACs) are equally resplendent with the WU availability?

