new workunit limit
Author | Message |
---|---|
Send message Joined: 30 Aug 07 Posts: 2046 Credit: 26,480 RAC: 0 |
It looks like the transitioner really can't keep up with what's going on with Milkyway right now, so in order to speed things up I would like to reduce the workunit limit (5 would be ideal, 10 passable). That would reduce the size of the database and speed things up. Now that the server is assigning WUs at a per-core rate as opposed to a per-computer rate, I think this should work out fine; it will also give us better results for the searches we're running. I'm going to lower the WU limit to 5 and if this is really unworkable I'll raise it. Hopefully this should speed up the transitioner and make more work available. |
Send message Joined: 12 Nov 07 Posts: 2425 Credit: 524,164 RAC: 0 |
5 doesn't do it unless they are made longer, as in an hour not 10 min. Doesn't expecting the unexpected make the unexpected the expected? If it makes sense, DON'T do it. |
Send message Joined: 4 Oct 08 Posts: 1734 Credit: 64,228,409 RAC: 0 |
Travis, I think 5 per core is too low for the new Penryn Quads. These crunch WUs in about 3 - 5 minutes. If the server contact script was modified to allow a maximum server recontact time of, say, 10 minutes then this may work. But I would also recommend a slightly higher WU-per-core number (say 10 max). ATM the server recontact script boots through a few minutes then quickly escalates to 45 minutes, then 55 minutes, then to 1 hour, 2 hours and 3 hours. My server recontact time is the same as JAMC's 'communication deferred' time. |
Send message Joined: 9 Sep 08 Posts: 96 Credit: 336,443,946 RAC: 0 |
The killer for my quads is the 'communication deferred' time in the BOINC projects tab. Is that set by the project or BOINC? Even with network activity set to always on, when the communication deferred time goes to 2 or 3+ hours I lose all hope of keeping WUs cached and always run out, and that's with a 20 WU/core limit. We have to get longer WUs to make the change to 5 WUs/core work, and I guess that means we all have to run test apps as well... |
Send message Joined: 29 Jul 08 Posts: 9 Credit: 2,200,784 RAC: 0 |
This is pure nonsense... Even with 20 WUs per core the queue was stalling unless I manually requested more work. The problem is that when BOINC gets the reply "Reached CPU limit" several times in a row, it starts delaying the work request, and in the end it gets delayed by over 3 hours... And as 20 WUs are crunched in under 110 minutes, you get a queue stall (not to mention it's a hell of a nightmare to get just a few more WUs at the next request). The solution is simple: make it so that the optimized apps need 60 minutes or so to crunch each WU (multiply the work to do per WU by 12). As it is, I would rather crunch for another project than let the queue stall and leave the computer powered on for nothing at all... |
Send message Joined: 27 Aug 07 Posts: 915 Credit: 1,503,319 RAC: 0 |
11/29/2008 2:44:13 PM|Milkyway@home|Sending scheduler request: Requested by user. Requesting 3317 seconds of work, reporting 3 completed tasks 11/29/2008 2:44:28 PM|Milkyway@home|Scheduler request succeeded: got 0 new tasks 11/29/2008 2:44:28 PM|Milkyway@home|Message from server: No work sent 11/29/2008 2:44:28 PM|Milkyway@home|Message from server: (reached per-CPU limit of 5 tasks) Well, that brought me to the message board. :/ "I'm going to lower the WU limit to 5 and if this is really unworkable i'll raise it." It's really unworkable. You should raise it. me@rescam.org |
Send message Joined: 23 Nov 07 Posts: 33 Credit: 300,042,542 RAC: 0 |
This strategy seems to be working for me. My Quads are staying at 20 tasks. As soon as any are done they are reported and replaced. |
Send message Joined: 30 Aug 07 Posts: 2046 Credit: 26,480 RAC: 0 |
"This strategy seems to be working for me. My Quads are staying at 20 tasks. As soon as any are done they are reported and replaced." I'm hoping that with the lower limit the transitioner will be able to keep up with the work requests. I'll bump things up to 8 and see how that works out. I don't want people getting communication deferreds if they're crunching too fast. The assimilator/validator for the new app are a lot faster than the old one, so when we make the switch to running only the new app, this should help a bit as well. |
Send message Joined: 1 Sep 08 Posts: 520 Credit: 302,524,931 RAC: 1 |
"It looks like the transitioner really can't keep up with what's going on with milkyway right now, so in order to speed things up i would like to reduce the workunit limit (5 would be ideal, 10 passable), to reduce the size of the database, which would speed things up." No problem, you did also say that the work units would be 12 to 20 times longer, right? If you REALLY want to lower the stress on the transitioner, then you must increase the length of the work unit. A 25 minute per core cache is only going to increase the stress on the transitioner, as it will compel everyone running MW to be continuously hitting the server. Seriously, the problem isn't a 5, 10 or 20 WU cache limit, it is the 5 minute WU. Fix that and things would be fine; keep it the way it is, and you end up wasting your time chasing server problems along with users wasting their time chasing WUs. |
Send message Joined: 12 Nov 07 Posts: 2425 Credit: 524,164 RAC: 0 |
"Seriously, the problem isn't a 5, 10 or 20 WU cache limit, it is the 5 minute WU," How many times has this been said? I know I have. It was why the old, old, old WUs were made into hours in the first place, until the optimised apps came along. Doesn't expecting the unexpected make the unexpected the expected? If it makes sense, DON'T do it. |
Send message Joined: 30 Aug 07 Posts: 2046 Credit: 26,480 RAC: 0 |
"Seriously, the problem isn't a 5, 10 or 20 WU cache limit, it is the 5 minute WU," I know we need longer WUs. In fact, I think this next meeting will be all about how we can get them much longer :P |
Send message Joined: 4 Oct 08 Posts: 1734 Credit: 64,228,409 RAC: 0 |
"Seriously, the problem isn't a 5, 10 or 20 WU cache limit, it is the 5 minute WU," I presume you can now think about more science to make the WUs longer and more useful to your science. As an aside: so far my PCs are being kept fed, and the work ready to send on the servers has been high (compared to the past), which means that the work is there to satisfy demand. The only problem I see, when the computers here are unsupervised, is the build-up of the time due to deferring communications for xxxx. As long as this does not rise above, say, 20 minutes I think the current WUs-per-core limit might work OK. |
Send message Joined: 30 Aug 07 Posts: 2046 Credit: 26,480 RAC: 0 |
"Seriously, the problem isn't a 5, 10 or 20 WU cache limit, it is the 5 minute WU," Yes, but unfortunately that takes time :( I'm going to see what we can do more short-term, until we can take the analysis up to the next level (and hopefully make the WUs really long). |
Send message Joined: 12 Nov 07 Posts: 2425 Credit: 524,164 RAC: 0 |
Not quite the same, but when I get 'No work' it will defer: 1 min, 1 min, 1 min, 3 hours (or some variation). Doesn't expecting the unexpected make the unexpected the expected? If it makes sense, DON'T do it. |
Send message Joined: 1 Sep 08 Posts: 520 Credit: 302,524,931 RAC: 1 |
Well, it might take longer for you, since you are both the cook and bottle washer: the more time you spend nursing the server, the less time you have for all the good stuff (smile) |
Send message Joined: 12 Nov 07 Posts: 2425 Credit: 524,164 RAC: 0 |
Seems like only a temp fix: [As of 30 Nov 2008 0:21:21 UTC] Results ready to send: 1,505; Results in progress: 47,165; Workunits waiting for validation: 3,524; Workunits waiting for assimilation: 461; Workunits waiting for deletion: 67; Results waiting for deletion: 92; Transitioner backlog (hours): 2. ~30 min ago: ready to send: 15k; progress: 35k; valid: >100 (others about same); deletion: 2; backlog: 2 hours. Doesn't expecting the unexpected make the unexpected the expected? If it makes sense, DON'T do it. |
Send message Joined: 16 Jan 08 Posts: 18 Credit: 4,111,257 RAC: 0 |
It's not gonna work! Already I'm getting repeated "No Work Sent" messages and my PCs are backing off to between 1 & 3 hours. Without constant attendance they'll spend most of the time with an empty cache. m4rtyn |
Send message Joined: 9 Sep 08 Posts: 96 Credit: 336,443,946 RAC: 0 |
"It's not gonna work! Already I'm getting repeated "No Work Sent" messages and my PCs are backing off to between 1 & 3 hours. Without constant attendance they'll spend most of the time with an empty cache." ...same here :( |
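
The cache arithmetic argued over in this thread can be sketched roughly as below. This is only an illustration, not project or BOINC code: the function names `cache_minutes` and `idle_minutes` are made up for this example, and it assumes the simplest possible model, in which a core starts with a full cache and receives no new work until the 'communication deferred' time has run out. The numbers (5 minute WUs, limits of 5, 8 and 20 per core, a 3 hour deferral, 60 minute WUs as the proposed fix) are the ones quoted in the posts.

```python
def cache_minutes(wu_limit_per_core: int, minutes_per_wu: float) -> float:
    """Minutes of work one core holds when its cache is full."""
    return wu_limit_per_core * minutes_per_wu


def idle_minutes(wu_limit_per_core: int, minutes_per_wu: float,
                 deferral_minutes: float) -> float:
    """Minutes a core sits idle if no work arrives until the deferral ends.

    Simplifying assumption (not BOINC's actual scheduling logic): the core
    starts with a full cache and gets nothing new during the deferral.
    """
    cached = cache_minutes(wu_limit_per_core, minutes_per_wu)
    return max(0.0, deferral_minutes - cached)


if __name__ == "__main__":
    deferral = 180  # the 3 hour 'communication deferred' reported in the thread
    for wu_len in (5, 60):          # current WU length vs. the proposed ~1 hour
        for limit in (5, 8, 20):    # per-core limits mentioned in the thread
            print(f"{wu_len:>2} min WU, limit {limit:>2}: "
                  f"cache {cache_minutes(limit, wu_len):>5.0f} min, "
                  f"idle {idle_minutes(limit, wu_len, deferral):>5.0f} min")
```

Under this model a 5 WU limit with 5 minute WUs gives the "25 minute per core cache" mentioned above, which a 3 hour deferral outlasts by a wide margin, while 60 minute WUs cover the same deferral even at the lowest limit.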