increased WU limits

Author	Message
Travis Volunteer moderator Project administrator Project developer Project tester Project scientist Send message Joined: 30 Aug 07 Posts: 2046 Credit: 26,480 RAC: 0	Message 47548 - Posted: 11 Apr 2011, 3:40:20 UTC I've increased the workunit limits some more, seeing as things have stabilized and the server still seems to be running nice and snappy. Right now the total limit is up to 48, keeping the max of 3 per CPU, and with a new max of 12 per GPU. I think the server should be able to handle this, given that the new applications aren't hammering the hard disks nearly as much. If things stay nice and smooth for the next few days I'll try increasing them again. ID: 47548 · Rating: 0 · rate: / Reply Quote
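
As a rough illustration of the limits described in the post above, here is a minimal sketch of how a per-host cap might be computed. It assumes the three numbers combine as "per-CPU plus per-GPU allowance, capped by the total"; the post does not say exactly how the server combines them, and the function and parameter names are hypothetical, not taken from the BOINC server code.

```python
def max_tasks_in_progress(n_cpus, n_gpus, per_cpu=3, per_gpu=12, total_cap=48):
    """Estimate how many workunits one host may hold at once.

    Assumption (not confirmed by the thread): the host's allowance is
    per_cpu * n_cpus + per_gpu * n_gpus, limited by total_cap.
    """
    if n_cpus < 0 or n_gpus < 0:
        raise ValueError("device counts must be non-negative")
    allowance = n_cpus * per_cpu + n_gpus * per_gpu
    return min(total_cap, allowance)


if __name__ == "__main__":
    # A quad-core host with one GPU: 4 * 3 + 1 * 12 tasks.
    print(max_tasks_in_progress(4, 1))
    # An eight-core host with four GPUs would exceed the total cap.
    print(max_tasks_in_progress(8, 4))
```

Under these assumptions a single-GPU host gains far more from the new per-GPU value than from the total cap, which only matters for hosts with several GPUs or many cores.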

The Gas Giant Send message Joined: 24 Dec 07 Posts: 1947 Credit: 240,884,648 RAC: 0	Message 47552 - Posted: 11 Apr 2011, 4:39:22 UTC Sweet! Now for double credit? ID: 47552 · Rating: 0 · rate: / Reply Quote

Ed.T Send message Joined: 1 Feb 11 Posts: 17 Credit: 16,245,184 RAC: 0	Message 47555 - Posted: 11 Apr 2011, 4:48:03 UTC - in response to Message 47548. No need to rush this. If the servers are running smoothly then we don't need it. If they aren't then it will only make matters worse. ID: 47555 · Rating: 0 · rate: / Reply Quote

BladeD Send message Joined: 2 Nov 10 Posts: 731 Credit: 131,536,342 RAC: 0	Message 47556 - Posted: 11 Apr 2011, 4:56:15 UTC - in response to Message 47555. No need to rush this. If the servers are running smoothly then we don't need it. If they aren't then it will only make matters worse. That's not the reason for double credit. ID: 47556 · Rating: 0 · rate: / Reply Quote

Alinator Send message Joined: 7 Jun 08 Posts: 464 Credit: 56,639,936 RAC: 0	Message 47645 - Posted: 11 Apr 2011, 20:58:56 UTC - in response to Message 47548. Say Travis, Not surprisingly, given the issues you guys have been battling the last few weeks... I've picked up a few orphaned WUs which show up in my host list of tasks, but aren't in the DB anymore. You may want to run a search and destroy script to get rid of those while the DB is still in really good shape! ;-) ID: 47645 · Rating: 0 · rate: / Reply Quote

TJ Send message Joined: 12 Aug 09 Posts: 262 Credit: 92,631,041 RAC: 0	Message 47647 - Posted: 11 Apr 2011, 21:12:08 UTC - in response to Message 47645. Say Travis, Not surprisingly, given the issues you guys have been battling the last few weeks... I've picked up a few orphaned WUs which show up in my host list of tasks, but aren't in the DB anymore. You may want to run a search and destroy script to get rid of those while the DB is still in really good shape! ;-) Good point. I have pages of them. Greetings from, TJ ID: 47647 · Rating: 0 · rate: / Reply Quote

Travis Volunteer moderator Project administrator Project developer Project tester Project scientist Send message Joined: 30 Aug 07 Posts: 2046 Credit: 26,480 RAC: 0	Message 47660 - Posted: 11 Apr 2011, 23:16:38 UTC - in response to Message 47645. Say Travis, Not surprisingly, given the issues you guys have been battling the last few weeks... I've picked up a few orphaned WUs which show up in my host list of tasks, but aren't in the DB anymore. You may want to run a search and destroy script to get rid of those while the DB is still in really good shape! ;-) You mean the server has sent out more of those older workunits that were purged from the database? That shouldn't be possible, as I dropped and rebuilt the workunit and result tables. Any work being sent out now should be from the new searches (which happen to have the same names as the old ones). ID: 47660 · Rating: 0 · rate: / Reply Quote

Beyond Send message Joined: 15 Jul 08 Posts: 383 Credit: 729,293,740 RAC: 0	Message 47661 - Posted: 11 Apr 2011, 23:17:23 UTC - in response to Message 47645. I've picked up a few orphaned WUs which show up in my host list of tasks, but aren't in the DB anymore. Those are just the pre-.59 WUs (read .57). Nothing to worry about. ID: 47661 · Rating: 0 · rate: / Reply Quote

Travis Volunteer moderator Project administrator Project developer Project tester Project scientist Send message Joined: 30 Aug 07 Posts: 2046 Credit: 26,480 RAC: 0	Message 47662 - Posted: 11 Apr 2011, 23:17:35 UTC - in response to Message 47552. Sweet! Now for double credit? I think we need to debug the ATI application a bit more before that. A few users are having some problems with it. That and I need to get the nbody simulation assimilator up and running -- and to do that I need to implement BOINC's new credit scheme and get it to work with our stuff. Probably won't be until Wednesday, as I have a class to teach tomorrow and a bunch of slides to make for it tonight. :P ID: 47662 · Rating: 0 · rate: / Reply Quote

Alinator Send message Joined: 7 Jun 08 Posts: 464 Credit: 56,639,936 RAC: 0	Message 47663 - Posted: 11 Apr 2011, 23:18:32 UTC - in response to Message 47660. Last modified: 11 Apr 2011, 23:31:05 UTC Say Travis, Not surprisingly, given the issues you guys have been battling the last few weeks... I've picked up a few orphaned WUs which show up in my host list of tasks, but aren't in the DB anymore. You may want to run a search and destroy script to get rid of those while the DB is still in really good shape! ;-) You mean the server has sent out more of those older workunits that were purged from the database? That shouldn't be possible, as I dropped and rebuilt the workunit and result tables. Any work being sent out now should be from the new searches (which happen to have the same names as the old ones). No, I just have some detritus from the test runs that won't go away without further intervention. <edit> I'm assuming that if I can see them in my list as not in the DB, then the stock deleter doesn't see them and won't clear them. In other words, there's a link somewhere in the BOINC DB that has to wait seven days to go away. ID: 47663 · Rating: 0 · rate: / Reply Quote

heffalumpen Send message Joined: 6 Nov 09 Posts: 5 Credit: 17,541,931 RAC: 0	Message 47664 - Posted: 11 Apr 2011, 23:18:34 UTC Got 12 new WU's....and they all crashed out. John ID: 47664 · Rating: 0 · rate: / Reply Quote

TJ Send message Joined: 12 Aug 09 Posts: 262 Credit: 92,631,041 RAC: 0	Message 47695 - Posted: 12 Apr 2011, 7:40:56 UTC - in response to Message 47663. Say Travis, Not surprisingly, given the issues you guys have been battling the last few weeks... I've picked up a few orphaned WUs which show up in my host list of tasks, but aren't in the DB anymore. You may want to run a search and destroy script to get rid of those while the DB is still in really good shape! ;-) You mean the server has sent out more of those older workunits that were purged from the database? That shouldn't be possible, as I dropped and rebuilt the workunit and result tables. Any work being sent out now should be from the new searches (which happen to have the same names as the old ones). No, I just have some detritus from the test runs that won't go away without further intervention. <edit> I'm assuming that if I can see them in my list as not in the DB, then the stock deleter doesn't see them and won't clear them. In other words, there's a link somewhere in the BOINC DB that has to wait seven days to go away.
Like this, Travis:
251766 144408 198526 10 Apr 2011 | 9:20:55 UTC 10 Apr 2011 | 9:27:28 UTC Error while computing 2.04 0.11 --- Not in DB
251760 144402 198526 10 Apr 2011 | 9:20:55 UTC 10 Apr 2011 | 9:27:28 UTC Error while computing 2.04 0.08 --- Not in DB
251759 144401 198526 10 Apr 2011 | 9:20:55 UTC 10 Apr 2011 | 9:27:28 UTC Error while computing 2.04 0.09 --- Not in DB
250177 142819 198526 10 Apr 2011 | 9:18:24 UTC 10 Apr 2011 | 9:20:55 UTC Error while computing 2.04 0.09 --- Not in DB
250176 142818 198526 10 Apr 2011 | 9:18:24 UTC 10 Apr 2011 | 9:20:55 UTC Error while computing 2.05 0.06 --- Not in DB
250175 142817 198526 10 Apr 2011 | 9:18:24 UTC 10 Apr 2011 | 9:20:55 UTC Error while computing 2.05 0.08 --- Not in DB
250167 142809 198526 10 Apr 2011 | 9:18:24 UTC 10 Apr 2011 | 9:20:55 UTC Error while computing 2.05 0.08 --- Not in DB
245825 137727 198526 10 Apr 2011 | 9:11:52 UTC 10 Apr 2011 | 9:18:24 UTC Error while computing 2.04 0.08 --- Not in DB
245811 137915 198526 10 Apr 2011 | 9:11:52 UTC 10 Apr 2011 | 9:18:24 UTC Error while computing 2.05 0.06 --- Not in DB
241242 135480 198526 10 Apr 2011 | 9:04:48 UTC 10 Apr 2011 | 9:11:52 UTC Error while computing 2.05 0.08 --- Not in DB
241241 135479 198526 10 Apr 2011 | 9:04:48 UTC 10 Apr 2011 | 9:11:52 UTC Error while computing 2.04 0.08 --- Not in DB
241239 135473 198526 10 Apr 2011 | 9:04:48 UTC 10 Apr 2011 | 9:11:52 UTC Error while computing 2.04 0.08 --- Not in DB
I have 3 pages of them... Greetings from, TJ ID: 47695 · Rating: 0 · rate: / Reply Quote

kashi Send message Joined: 30 Dec 07 Posts: 311 Credit: 149,490,184 RAC: 0	Message 47696 - Posted: 12 Apr 2011, 8:15:09 UTC - in response to Message 47695. Last modified: 12 Apr 2011, 8:16:49 UTC It's just the v0.57 application that's not in the database any more. Look at the heading of the column where it says "Not in DB". It says "Application". The work units are still there in the database and will gradually clear as they are completed successfully by wingmen or exceed the maximum number of errors. Some will time out after 8 days and be reissued before they clear. http://milkyway.cs.rpi.edu/milkyway/forum_thread.php?id=2343&nowrap=true#47534 You will possibly find that many of those work units are waiting on wingmen using a CPU application to complete a task. CPU processing of separation tasks is relatively slow and CPU resources are often shared by more than one project, so tasks issued to CPUs may take many days to be completed and reported. ID: 47696 · Rating: 0 · rate: / Reply Quote

DanNeely Send message Joined: 6 Oct 09 Posts: 39 Credit: 78,881,405 RAC: 0	Message 47871 - Posted: 15 Apr 2011, 4:01:40 UTC How's the server doing under load so far? I'd really like to be able to get at least an hour's work for my 5870 so it stays on MW after a maintenance window triggers a 1 hour backoff. When that happens and my MW queue runs dry, BOINC turns to Collatz (backup project), gets the better part of a day's worth of work and decides to crunch all of it before returning to MW. It's possible the latter was due to my debt levels getting messed up. They looked off when I checked earlier today, so I zeroed them out in client_state.xml and am waiting to see what happens after the next outage. ID: 47871 · Rating: 0 · rate: / Reply Quote

Beyond Send message Joined: 15 Jul 08 Posts: 383 Credit: 729,293,740 RAC: 0	Message 47875 - Posted: 15 Apr 2011, 7:14:26 UTC - in response to Message 47871. How's the server doing under load so far? I'd really like to be able to get at least an hour's work for my 5870 so it stays on MW after a maintenance window triggers a 1 hour backoff. When that happens and my MW queue runs dry, BOINC turns to Collatz (backup project), gets the better part of a day's worth of work and decides to crunch all of it before returning to MW. It's possible the latter was due to my debt levels getting messed up. They looked off when I checked earlier today, so I zeroed them out in client_state.xml and am waiting to see what happens after the next outage. It'll do the same thing because backup projects aren't supported in 6.10.xx. Hate to sound like a broken record... :) ID: 47875 · Rating: 0 · rate: / Reply Quote

The Gas Giant Send message Joined: 24 Dec 07 Posts: 1947 Credit: 240,884,648 RAC: 0	Message 47876 - Posted: 15 Apr 2011, 7:19:05 UTC Last modified: 15 Apr 2011, 7:19:21 UTC Increased wu limits sounds like a good idea! ID: 47876 · Rating: 0 · rate: / Reply Quote

Chris Send message Joined: 3 Oct 10 Posts: 42 Credit: 320,242 RAC: 0	Message 47877 - Posted: 15 Apr 2011, 8:02:37 UTC I'm waiting another week or so before I jump into this again. 32bit Windows XP Home AMD Opteron 180 ASUS A8N-SLI Motherboard Nvidia 450GTS GPU 4GB DDR Memory ID: 47877 · Rating: 0 · rate: / Reply Quote

FruehwF Send message Joined: 28 Feb 10 Posts: 120 Credit: 109,840,492 RAC: 0	Message 47879 - Posted: 15 Apr 2011, 9:37:10 UTC There are more and more votes for a higher WU limit. Of course it should be a goal to have a cache that can carry a client through a temporary server downtime. But I think this must be done very carefully. I don't know anything about the internals of a BOINC server, but I have experience in DB management. I only want to outline a few things which the admins here have to look at. First, you have to find out whether the indices are good and whether they are actually used by the system. Take validation, for example: if the increase from 6 WUs to 12 WUs leads to a 100% increase in validation time, then the index on the WU IDs doesn't work or is not in use. The increase should be about 10% at most (in the case of 128K outstanding WUs). Second, if the index works, you have to look at how big the index is and how much memory it is permitted to use. If the index becomes 2, 4 or 8 times bigger, there comes a point where the system has to page the index to disk, with a significant performance decrease. Third: the tablespace of the DB. When you exceed the permitted tablespace, the DB doesn't work anymore. And there are a lot of other things the admins have to take care of. Some of these limits were exceeded during the last upgrade, when there were suddenly no WU limits. So please be patient and stay in the project, so that the load stays constant. In German we say "Gut Ding braucht Weile" (good things take time). Greetings, Franz ID: 47879 · Rating: 0 · rate: / Reply Quote
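
The "about 10% at most" figure in the post above can be sanity-checked with a small back-of-the-envelope sketch. It assumes a simplified cost model that is not taken from the thread: a lookup without a usable index costs time proportional to the number of rows (full scan), while a lookup through a tree-style index costs time proportional to log2 of the number of rows. Real database indexes have a larger branching factor, so the true growth would typically be even smaller; the function names here are purely illustrative.

```python
import math


def full_scan_cost(n_rows):
    """Cost model for a lookup with no usable index: touch every row."""
    return float(n_rows)


def index_lookup_cost(n_rows):
    """Simplified cost model for an indexed lookup: one step per tree level."""
    return math.log2(n_rows)


def relative_increase(cost, n_before, n_after):
    """Fractional growth in per-lookup cost when the table grows."""
    return cost(n_after) / cost(n_before) - 1.0


# 128K outstanding workunits, then the limit (and so the table) doubles.
n = 128 * 1024
scan_growth = relative_increase(full_scan_cost, n, 2 * n)
index_growth = relative_increase(index_lookup_cost, n, 2 * n)

if __name__ == "__main__":
    print(f"full scan: +{scan_growth:.0%} per lookup")
    print(f"indexed:   +{index_growth:.1%} per lookup")
```

Under this model, doubling the table doubles the cost of an unindexed lookup, while an indexed lookup goes from 17 to 18 steps, an increase of roughly 6%. That is consistent with the rule of thumb in the post: a 100% jump in validation time suggests the index is not being used.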

DanNeely Send message Joined: 6 Oct 09 Posts: 39 Credit: 78,881,405 RAC: 0	Message 47886 - Posted: 15 Apr 2011, 10:52:22 UTC - in response to Message 47875. How's the server doing under load so far? I'd really like to be able to get at least an hour's work for my 5870 so it stays on MW after a maintenance window triggers a 1 hour backoff. When that happens and my MW queue runs dry, BOINC turns to Collatz (backup project), gets the better part of a day's worth of work and decides to crunch all of it before returning to MW. It's possible the latter was due to my debt levels getting messed up. They looked off when I checked earlier today, so I zeroed them out in client_state.xml and am waiting to see what happens after the next outage. It'll do the same thing because backup projects aren't supported in 6.10.xx. Hate to sound like a broken record... :) They aren't? The "download only when other projects are out of work" part is working. What version do I need to get full support? ID: 47886 · Rating: 0 · rate: / Reply Quote

arkayn Send message Joined: 14 Feb 09 Posts: 999 Credit: 74,932,619 RAC: 0	Message 47891 - Posted: 15 Apr 2011, 13:39:36 UTC - in response to Message 47886. How's the server doing under load so far? I'd really like to be able to get at least an hour's work for my 5870 so it stays on MW after a maintenance window triggers a 1 hour backoff. When that happens and my MW queue runs dry, BOINC turns to Collatz (backup project), gets the better part of a day's worth of work and decides to crunch all of it before returning to MW. It's possible the latter was due to my debt levels getting messed up. They looked off when I checked earlier today, so I zeroed them out in client_state.xml and am waiting to see what happens after the next outage. It'll do the same thing because backup projects aren't fully supported in 6.10.xx. Hate to sound like a broken record... :) They aren't? The "download only when other projects are out of work" part is working. What version do I need to get full support? Fixed that for you. Full support is in the 6.12.xx series. It will download 1 task per idle resource, crunch it and request another if needed. ID: 47891 · Rating: 0 · rate: / Reply Quote