Welcome to MilkyWay@home

new workunit limit

Travis
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: 30 Aug 07
Posts: 2046
Credit: 26,480
RAC: 0
Message 6951 - Posted: 29 Nov 2008, 21:06:40 UTC

It looks like the transitioner really can't keep up with what's going on with MilkyWay right now. To speed things up, I would like to reduce the workunit limit (5 would be ideal, 10 passable); that would shrink the database, which in turn would speed things up.

Now that the server is assigning WUs at a per-core rate as opposed to a per-computer rate, I think this should work out fine; it will also give us better results for the searches we're running.

I'm going to lower the WU limit to 5, and if this is really unworkable I'll raise it. Hopefully this will speed up the transitioner and make more work available.

banditwolf
Joined: 12 Nov 07
Posts: 2425
Credit: 524,164
RAC: 0
Message 6953 - Posted: 29 Nov 2008, 21:26:46 UTC

A limit of 5 doesn't do it unless the WUs are made longer, as in an hour, not 10 min.
Doesn't expecting the unexpected make the unexpected the expected?
If it makes sense, DON'T do it.

John Clark
Joined: 4 Oct 08
Posts: 1734
Credit: 64,228,409
RAC: 0
Message 6958 - Posted: 29 Nov 2008, 21:37:15 UTC
Last modified: 29 Nov 2008, 21:50:19 UTC

Travis

I think 5 per core is too low for the new Penryn Quads. These crunch WUs in about 3 - 5 minutes.

If the server contact script were modified to allow a maximum server recontact time of, say, 10 minutes, then this may work. But I would also recommend a slightly higher WU-per-core number (say 10 max). At the moment the server recontact time starts at a few minutes, then quickly escalates to 45 minutes, then 55 minutes, then to 1 hour, 2 hours and 3 hours.

My server recontact time is the same as JAMC's 'communication deferred' time.
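
A minimal sketch of the cap suggested above, in Python. The deferral steps follow the description in this post, except that "a few minutes" is represented as 3 minutes, which is an assumption. `capped_deferrals` is a hypothetical helper for illustration only; it is not part of BOINC.

```python
# Deferral steps in minutes, as described in the post.
# The first value (3) stands in for "a few minutes" and is an assumption.
OBSERVED_DEFERRALS_MIN = [3, 45, 55, 60, 120, 180]


def capped_deferrals(deferrals_min, cap_min):
    """Return the deferral sequence with every step limited to cap_min."""
    return [min(step, cap_min) for step in deferrals_min]


if __name__ == "__main__":
    # With a 10 minute cap, no deferral can outlast a small cache of short WUs.
    print(capped_deferrals(OBSERVED_DEFERRALS_MIN, cap_min=10))
```

With a cap like this, the longest wait between work requests would be 10 minutes instead of 3 hours.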

JAMC
Joined: 9 Sep 08
Posts: 96
Credit: 336,443,946
RAC: 0
Message 6959 - Posted: 29 Nov 2008, 21:38:14 UTC
Last modified: 29 Nov 2008, 21:40:57 UTC

The killer for my quads is the 'communication deferred' time in the BOINC projects tab. Is that set by the project or by BOINC? Even with network activity set to always on, when the communication deferred time goes to 2 or 3+ hours I lose all hope of keeping WUs cached and always run out, and that's with a 20 WU/core limit. We have to get longer WUs to make the change to 5 WUs/core work, and I guess that means we all have to run test apps as well...

Thierry Godefroy
Joined: 29 Jul 08
Posts: 9
Credit: 2,200,784
RAC: 0
Message 6972 - Posted: 29 Nov 2008, 22:44:13 UTC

This is pure nonsense... Even with 20 WUs per core the queue was stalling unless I manually requested more work.

The problem is that when BOINC gets the reply "Reached CPU limit" several times in a row, it starts delaying the work request, and in the end the request gets delayed by over 3 hours... And as 20 WUs are crunched in under 110 minutes, you get a queue stall (not to mention it's a hell of a nightmare to get just a few more WUs at the next request).

The solution is simple: make it so that the optimized apps need 60 minutes or so to crunch each WU (multiply the work to do per WU by 12).

As it is, I would rather crunch for another project than let the queue stall and leave the computer powered on for nothing at all...
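
The arithmetic behind the stall can be sketched in Python. The figures come from this post (20 WUs in about 110 minutes, a deferral of about 3 hours, 12 times more work per WU); treating "under 110" and "over 3 hours" as exact values is a simplifying assumption, and both function names are hypothetical.

```python
def cache_runtime_min(wus_in_cache, minutes_per_wu):
    """Minutes of crunching that one core's cache provides."""
    return wus_in_cache * minutes_per_wu


def idle_minutes(cache_min, deferral_min):
    """Minutes a core sits idle when the deferral outlasts the cache."""
    return max(0, deferral_min - cache_min)


if __name__ == "__main__":
    minutes_per_wu = 110 / 20   # about 5.5 minutes per WU, from the post
    deferral_min = 3 * 60       # "over 3 hours", taken as exactly 3 hours

    # Current short WUs with a 20 WU cache: the core runs dry before the retry.
    print(idle_minutes(cache_runtime_min(20, minutes_per_wu), deferral_min))

    # WUs with 12 times the work and a 5 WU cache: the cache outlasts the deferral.
    print(idle_minutes(cache_runtime_min(5, minutes_per_wu * 12), deferral_min))
```

Under these assumptions the short WUs leave each core idle for at least 70 minutes per long deferral, while WUs with 12 times the work would cover the deferral even with a limit of 5.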

Misfit
Joined: 27 Aug 07
Posts: 915
Credit: 1,503,319
RAC: 0
Message 6974 - Posted: 29 Nov 2008, 22:46:04 UTC - in response to Message 6951.  
Last modified: 29 Nov 2008, 22:47:13 UTC

11/29/2008 2:44:13 PM|Milkyway@home|Sending scheduler request: Requested by user. Requesting 3317 seconds of work, reporting 3 completed tasks
11/29/2008 2:44:28 PM|Milkyway@home|Scheduler request succeeded: got 0 new tasks
11/29/2008 2:44:28 PM|Milkyway@home|Message from server: No work sent
11/29/2008 2:44:28 PM|Milkyway@home|Message from server: (reached per-CPU limit of 5 tasks)


Well that brought me to the message board. :/

"I'm going to lower the WU limit to 5 and if this is really unworkable I'll raise it."

It's really unworkable. You should raise it.
me@rescam.org

Bigred
Joined: 23 Nov 07
Posts: 33
Credit: 300,042,542
RAC: 0
Message 6975 - Posted: 29 Nov 2008, 22:50:24 UTC

This strategy seems to be working for me. My Quads are staying at 20 tasks. As soon as any are done they are reported and replaced.


caspr
Joined: 22 Mar 08
Posts: 90
Credit: 501,728
RAC: 0
Message 6977 - Posted: 29 Nov 2008, 23:00:51 UTC - in response to Message 6975.  

"This strategy seems to be working for me. My Quads are staying at 20 tasks. As soon as any are done they are reported and replaced."

Same here, but my pendings are also still growing.
A clear conscience is usually the sign of a bad memory



Travis
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: 30 Aug 07
Posts: 2046
Credit: 26,480
RAC: 0
Message 6979 - Posted: 29 Nov 2008, 23:05:32 UTC - in response to Message 6977.  

I'm hoping with the lower limit the transitioner will be able to keep up with the work requests. I'll bump things up to 8 and see how that works out -- I don't want people getting communication deferreds if they're crunching too fast.

The assimilator and validator for the new app are a lot faster than the old ones, so when we make the switch to running only the new app, this should help a bit as well.

BarryAZ
Joined: 1 Sep 08
Posts: 520
Credit: 302,524,931
RAC: 15
Message 6980 - Posted: 29 Nov 2008, 23:10:02 UTC - in response to Message 6951.  

No problem, you did also say that the work units would be 12 to 20 times longer, right? If you REALLY want to lower the stress on the transitioner, then you must increase the length of the work unit. A 25 minute per core cache is only going to increase the stress on the transitioner, as it will compel everyone running MW to be continuously hitting the server.

Seriously, the problem isn't a 5, 10 or 20 WU cache limit; it is the 5 minute WU. Fix that and things would be fine. Keep it the way it is, and you end up wasting your time chasing server problems, along with users wasting their time chasing WUs.
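
A rough Python sketch of this argument. It assumes, as a worst case, that every finished WU triggers one scheduler contact to report it and fetch a replacement; the 5 minute WU length and the 12x and 20x factors are from this post, and the function names are hypothetical.

```python
def cache_minutes_per_core(wu_limit, wu_minutes):
    """How long a full per-core cache lasts, in minutes."""
    return wu_limit * wu_minutes


def contacts_per_core_hour(wu_minutes):
    """Scheduler contacts per core per hour, assuming one contact per finished WU."""
    return 60 / wu_minutes


if __name__ == "__main__":
    # 5 minute WUs today, then 12x and 20x longer as mentioned in the post.
    for wu_minutes in (5, 5 * 12, 5 * 20):
        print(
            wu_minutes,
            cache_minutes_per_core(5, wu_minutes),
            contacts_per_core_hour(wu_minutes),
        )
```

Under that assumption, a limit of 5 with 5 minute WUs gives the 25 minute cache and up to 12 contacts per core per hour, while 60 minute WUs would cut that to one contact per core per hour whatever the limit is.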


BarryAZ
Joined: 1 Sep 08
Posts: 520
Credit: 302,524,931
RAC: 15
Message 6982 - Posted: 29 Nov 2008, 23:11:50 UTC

I see you just bumped up the cache to 8 WUs from 5. But until you have reasonably timed WUs, keeping things running smoothly is just a dream.

banditwolf
Joined: 12 Nov 07
Posts: 2425
Credit: 524,164
RAC: 0
Message 6984 - Posted: 29 Nov 2008, 23:15:44 UTC - in response to Message 6980.  

"Seriously, the problem isn't a 5, 10 or 20 WU cache limit, it is the 5 minute WU,"

How many times has this been said? I know I have said it. It was why the old, old, old WUs were made to take hours in the first place, until the optimised apps came along.
Doesn't expecting the unexpected make the unexpected the expected?
If it makes sense, DON'T do it.

Travis
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: 30 Aug 07
Posts: 2046
Credit: 26,480
RAC: 0
Message 6988 - Posted: 29 Nov 2008, 23:32:42 UTC - in response to Message 6984.  

I know we need longer WUs. In fact, I think this next meeting will be all about how we can get them much longer :P

John Clark
Joined: 4 Oct 08
Posts: 1734
Credit: 64,228,409
RAC: 0
Message 6991 - Posted: 29 Nov 2008, 23:43:05 UTC - in response to Message 6988.  
Last modified: 29 Nov 2008, 23:44:13 UTC

I presume you can now think about more science to make the WUs longer and more useful to your science.

As an aside:

So far my PCs are being kept fed, and the amount of work ready to send on the servers has been high (compared to the past), which means that the work is there to satisfy demand.

The only problem I see, when the computers here are unsupervised, is the build-up of the 'deferring communications for xxxx' time. As long as this does not rise above, say, 20 minutes, I think the current WUs-per-core limit might work OK.

Travis
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: 30 Aug 07
Posts: 2046
Credit: 26,480
RAC: 0
Message 6994 - Posted: 29 Nov 2008, 23:46:36 UTC - in response to Message 6991.  

Yes, but unfortunately that takes time :( I'm going to see what we can do more short-term, until we can take the analysis up to the next level (and hopefully make the WUs really long).

banditwolf
Joined: 12 Nov 07
Posts: 2425
Credit: 524,164
RAC: 0
Message 6995 - Posted: 29 Nov 2008, 23:47:11 UTC - in response to Message 6991.  

Not quite the same, but when I get 'No work' it will defer: 1 min, 1 min, 1 min, 3 hours (or some variation).
Doesn't expecting the unexpected make the unexpected the expected?
If it makes sense, DON'T do it.

BarryAZ
Joined: 1 Sep 08
Posts: 520
Credit: 302,524,931
RAC: 15
Message 6996 - Posted: 29 Nov 2008, 23:49:44 UTC - in response to Message 6994.  

Well, it might take longer for you, since you are both the cook and the bottle washer: the more time you spend nursing the server, the less time you have for all the good stuff (smile).


banditwolf
Joined: 12 Nov 07
Posts: 2425
Credit: 524,164
RAC: 0
Message 7006 - Posted: 30 Nov 2008, 0:25:22 UTC

Seems like only a temp fix:

[As of 30 Nov 2008 0:21:21 UTC]
Results ready to send: 1,505
Results in progress: 47,165
Workunits waiting for validation: 3,524
Workunits waiting for assimilation: 461
Workunits waiting for deletion: 67
Results waiting for deletion: 92
Transitioner backlog (hours): 2

~30 min ago
ready to send: 15k
progress: 35k
valid: >100
(others about same)
deletion: 2
backlog: 2 hours
Doesn't expecting the unexpected make the unexpected the expected?
If it makes sense, DON'T do it.

m4rtyn
Joined: 16 Jan 08
Posts: 18
Credit: 4,111,257
RAC: 0
Message 7008 - Posted: 30 Nov 2008, 0:55:10 UTC

It's not gonna work! Already I'm getting repeated "No Work Sent" messages and my PCs are backing off to between 1 and 3 hours. Without constant attendance they'll spend most of the time with an empty cache.
m4rtyn

JAMC
Joined: 9 Sep 08
Posts: 96
Credit: 336,443,946
RAC: 0
Message 7009 - Posted: 30 Nov 2008, 0:59:02 UTC - in response to Message 7008.  

"It's not gonna work! Already I'm getting repeated 'No Work Sent' messages and my PCs are backing off to between 1 and 3 hours. Without constant attendance they'll spend most of the time with an empty cache."


...same here :(

©2024 Astroinformatics Group