increased WU limits
Author | Message |
---|---|
Joined: 30 Aug 07 Posts: 2046 Credit: 26,480 RAC: 0 |
I've increased the workunit limits some more, seeing as things stabilized and the server still seems to be running nice and snappy. Right now the total limit is up to 48, keeping the max of 3 per CPU, and with a new max of 12 per GPU. I think the server should be able to handle this, given that the new applications don't hammer the hard disks nearly as much. If things still stay nice and smooth for the next few days I'll try increasing them again. |
Joined: 24 Dec 07 Posts: 1947 Credit: 240,884,648 RAC: 0 |
Sweet! Now for double credit? |
Joined: 1 Feb 11 Posts: 17 Credit: 16,245,184 RAC: 0 |
No need to rush this. If the servers are running smoothly then we don't need it. If they aren't then it will only make matters worse. |
Joined: 2 Nov 10 Posts: 731 Credit: 131,536,342 RAC: 0 |
"No need to rush this. If the servers are running smoothly then we don't need it. If they aren't then it will only make matters worse." That's not the reason for double credit. |
Joined: 7 Jun 08 Posts: 464 Credit: 56,639,936 RAC: 0 |
Say Travis, Not surprisingly, given the issues you guys have been battling the last few weeks... I've picked up a few orphaned WU's which show up in my host list of tasks, but aren't in the DB anymore. You may want to run a search-and-destroy script to get rid of those while the DB is still in really good shape! ;-) |
Joined: 12 Aug 09 Posts: 262 Credit: 92,631,041 RAC: 0 |
"Say Travis," Good point. I have pages of them. Greetings from, TJ |
Joined: 30 Aug 07 Posts: 2046 Credit: 26,480 RAC: 0 |
"Say Travis," You mean the server has sent out more of those older workunits that were purged from the database? That shouldn't be possible as I dropped and rebuilt the workunit and result tables. Any work being sent out now should be from the new searches (which happen to have the same names as the old ones). |
Joined: 15 Jul 08 Posts: 383 Credit: 729,293,740 RAC: 0 |
"I've picked up a few orphaned WU's which show up in my host list of tasks, but aren't in the DB anymore." Those are just the pre-.59 WUs (read: .57). Nothing to worry about. |
Joined: 30 Aug 07 Posts: 2046 Credit: 26,480 RAC: 0 |
"Sweet! Now for double credit?" I think we need to debug the ATI application a bit more before that. A few users are having a bit of a problem with it. That and I need to get the nbody simulation assimilator up and running -- and to do that I need to implement BOINC's new credit scheme and get it to work with our stuff. Probably won't be until Wednesday as I have a class to teach tomorrow and a bunch of slides to make for it tonight. :P |
Joined: 7 Jun 08 Posts: 464 Credit: 56,639,936 RAC: 0 |
"Say Travis," No, I just have some detritus from the test runs that won't go away without further intervention. <edit> I'm assuming that if I can see them in my list as not in the DB, then the stock deleter doesn't see them and won't clear or get rid of them. IOW, there's a link somewhere in the BOINC DB that has to wait seven days to go away. |
Joined: 6 Nov 09 Posts: 5 Credit: 17,541,931 RAC: 0 |
Got 12 new WU's....and they all crashed out. John |
Joined: 12 Aug 09 Posts: 262 Credit: 92,631,041 RAC: 0 |
"Say Travis," Like this, Travis:
251766 144408 198526 10 Apr 2011, 9:20:55 UTC 10 Apr 2011, 9:27:28 UTC Error while computing 2.04 0.11 --- Not in DB
251760 144402 198526 10 Apr 2011, 9:20:55 UTC 10 Apr 2011, 9:27:28 UTC Error while computing 2.04 0.08 --- Not in DB
251759 144401 198526 10 Apr 2011, 9:20:55 UTC 10 Apr 2011, 9:27:28 UTC Error while computing 2.04 0.09 --- Not in DB
250177 142819 198526 10 Apr 2011, 9:18:24 UTC 10 Apr 2011, 9:20:55 UTC Error while computing 2.04 0.09 --- Not in DB
250176 142818 198526 10 Apr 2011, 9:18:24 UTC 10 Apr 2011, 9:20:55 UTC Error while computing 2.05 0.06 --- Not in DB
250175 142817 198526 10 Apr 2011, 9:18:24 UTC 10 Apr 2011, 9:20:55 UTC Error while computing 2.05 0.08 --- Not in DB
250167 142809 198526 10 Apr 2011, 9:18:24 UTC 10 Apr 2011, 9:20:55 UTC Error while computing 2.05 0.08 --- Not in DB
245825 137727 198526 10 Apr 2011, 9:11:52 UTC 10 Apr 2011, 9:18:24 UTC Error while computing 2.04 0.08 --- Not in DB
245811 137915 198526 10 Apr 2011, 9:11:52 UTC 10 Apr 2011, 9:18:24 UTC Error while computing 2.05 0.06 --- Not in DB
241242 135480 198526 10 Apr 2011, 9:04:48 UTC 10 Apr 2011, 9:11:52 UTC Error while computing 2.05 0.08 --- Not in DB
241241 135479 198526 10 Apr 2011, 9:04:48 UTC 10 Apr 2011, 9:11:52 UTC Error while computing 2.04 0.08 --- Not in DB
241239 135473 198526 10 Apr 2011, 9:04:48 UTC 10 Apr 2011, 9:11:52 UTC Error while computing 2.04 0.08 --- Not in DB
I have 3 pages of them... Greetings from, TJ |
Joined: 30 Dec 07 Posts: 311 Credit: 149,490,184 RAC: 0 |
It's just the v0.57 application that's not in the database any more. Look at the heading of the column where it says "Not in DB". It says "Application". The work units are still there in the database and will gradually clear as they are completed successfully by wingmen or exceed the maximum number of errors. Some will time out after 8 days and be reissued before they clear. http://milkyway.cs.rpi.edu/milkyway/forum_thread.php?id=2343&nowrap=true#47534 You will possibly find that many of those work units are waiting on wingmen using a CPU application to complete a task. CPU processing of separation tasks is relatively slow and CPU resources are often shared by more than one project, so tasks issued to CPUs may take many days to be completed and reported. |
Joined: 6 Oct 09 Posts: 39 Credit: 78,881,405 RAC: 0 |
How's the server doing under load so far? I'd really like to be able to get at least an hour's work for my 5870 so it stays on MW after a maintenance window triggers a 1-hour backoff. When that happens and my MW queue runs dry, BOINC turns to Collatz (backup project), gets the better part of a day's worth of work, and decides to crunch all of it before returning to MW. It's possible the latter was due to my debt levels getting messed up. They looked off when I checked earlier today, so I zeroed them out in client_state.xml and am waiting to see what happens after the next outage. |
Joined: 15 Jul 08 Posts: 383 Credit: 729,293,740 RAC: 0 |
"How's the server doing under load so far? I'd really like to be able to get at least an hour's work for my 5870 so it stays on MW after a maintenance window triggers a 1-hour backoff. When that happens and my MW queue runs dry, BOINC turns to Collatz (backup project), gets the better part of a day's worth of work, and decides to crunch all of it before returning to MW." It'll do the same thing because backup projects aren't supported in 6.10.xx. Hate to sound like a broken record... :) |
Joined: 24 Dec 07 Posts: 1947 Credit: 240,884,648 RAC: 0 |
Increased wu limits sounds like a good idea! |
Joined: 3 Oct 10 Posts: 42 Credit: 320,242 RAC: 0 |
I'm waiting another week or so before I jump into this again. 32-bit Windows XP Home, AMD Opteron 180, ASUS A8N-SLI motherboard, Nvidia 450GTS GPU, 4GB DDR memory |
Joined: 28 Feb 10 Posts: 120 Credit: 109,840,492 RAC: 0 |
There are more and more votes for a higher WU limit. Of course the goal should be a cache that can carry a client through a temporary server downtime. But I think this must be done very carefully. I don't know anything about the internals of a BOINC server, but I have experience with DB management. I only want to outline a few things the admins here have to look at. First, you have to find out whether the indices are good and whether the system actually uses them. Take validation, for example: if the increase from 6 WUs to 12 WUs leads to a 100% increase in validation time, then the index on the WU IDs doesn't work or is not in use. The increase should be about 10% at most (in the case of 128K outstanding WUs). Second, if the index works, you have to look at how big the index is and how much space it is permitted in memory. If the index becomes 2, 4 or 8 times bigger, there comes a point where the system has to page the index to disk, with a significant performance decrease. Third: the tablespace of the DB. When you exceed the permitted tablespace, the DB doesn't work anymore. And there are a lot of other things the admins have to take care of. Some of these limits were exceeded during the last upgrade, when there were suddenly no WU limits. So please be patient and stay in the project, so that the load is constant. In German we say "Gut Ding braucht Weile" (good things take time). Greetings, Franz |
Joined: 6 Oct 09 Posts: 39 Credit: 78,881,405 RAC: 0 |
"How's the server doing under load so far? I'd really like to be able to get at least an hour's work for my 5870 so it stays on MW after a maintenance window triggers a 1-hour backoff. When that happens and my MW queue runs dry, BOINC turns to Collatz (backup project), gets the better part of a day's worth of work, and decides to crunch all of it before returning to MW." They aren't? The "DL only when other projects are out of work" portion is working. What version do I need to get full support? |
Joined: 14 Feb 09 Posts: 999 Credit: 74,932,619 RAC: 0 |
"How's the server doing under load so far? I'd really like to be able to get at least an hour's work for my 5870 so it stays on MW after a maintenance window triggers a 1-hour backoff. When that happens and my MW queue runs dry, BOINC turns to Collatz (backup project), gets the better part of a day's worth of work, and decides to crunch all of it before returning to MW." Fixed that for you. Full support is in the 6.12.xx series. It will download 1 task per idle resource, crunch it and request another if needed. |
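
Franz's rule of thumb about indices (doubling the WU limit should cost roughly 10% at most if the index is used, but about 100% if it is not) can be checked with a bit of arithmetic. The sketch below is only an illustration under stated assumptions: it models the index as a balanced binary tree, treats an unindexed lookup as a full scan of every row, and assumes that doubling the per-host limit doubles the number of outstanding WUs. None of this describes the actual MilkyWay@home database, and the function names are made up for the example.

```python
def lookup_depth(rows: int, fanout: int = 2) -> int:
    """Levels a balanced tree with the given fanout needs to hold `rows` keys.

    Assumption: an indexed lookup costs one step per tree level.
    """
    depth, capacity = 0, 1
    while capacity < rows:
        capacity *= fanout
        depth += 1
    return depth


def relative_increase(before: float, after: float) -> float:
    """Fractional growth from `before` to `after` (1.0 means +100%)."""
    return (after - before) / before


# "128K outstanding WUs" is the figure from Franz's post.
OUTSTANDING = 128 * 1024

# Indexed lookup: cost grows with the depth of the tree.
indexed = relative_increase(lookup_depth(OUTSTANDING),
                            lookup_depth(2 * OUTSTANDING))

# No usable index: every lookup scans all rows, so cost grows with row count.
full_scan = relative_increase(OUTSTANDING, 2 * OUTSTANDING)

print(f"indexed lookup:  +{indexed:.1%}")    # +5.9%
print(f"full table scan: +{full_scan:.1%}")  # +100.0%
```

With these assumptions the indexed case grows from 17 to 18 steps, about 6%, which sits comfortably under the 10% ceiling in the post. Real database indices are usually B-trees with a much larger fanout than 2, so the growth would likely be even smaller there; a jump close to 100% would point to the index not being used.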
©2024 Astroinformatics Group