Welcome to MilkyWay@home

increased WU limits

Profile Travis
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: 30 Aug 07
Posts: 2046
Credit: 26,480
RAC: 0
Message 47548 - Posted: 11 Apr 2011, 3:40:20 UTC

I've increased the workunit limits some more, seeing as things stabilized and the server still seems to be running nice and snappy. Right now the total limit is up to 48, keeping the max of 3 per CPU, and with a new max of 12 per GPU. I think the server should be able to handle this, given the use of the new applications not hammering the hard disks nearly as much. If things still stay nice and smooth for the next few days I'll try increasing them again.
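For anyone wondering how many tasks a given host might hold under these numbers, here is a rough sketch. It assumes the per-CPU and per-GPU allowances simply add up and are then capped by the total limit; that is only one reading of the post, not a description of the actual scheduler logic, and the function and constant names are made up for illustration.

```python
# Figures quoted in the post above; how the server combines them is an assumption.
TOTAL_LIMIT = 48  # total tasks in progress per host
PER_CPU = 3       # max tasks per CPU
PER_GPU = 12      # max tasks per GPU


def max_tasks_in_progress(n_cpus: int, n_gpus: int) -> int:
    """Hypothetical helper: sum the per-device allowances, then apply the total cap."""
    uncapped = PER_CPU * n_cpus + PER_GPU * n_gpus
    return min(TOTAL_LIMIT, uncapped)


quad_core_one_gpu = max_tasks_in_progress(4, 1)   # 3*4 + 12*1 = 24
big_host_two_gpus = max_tasks_in_progress(16, 2)  # 3*16 + 12*2 = 72, capped to 48
print(quad_core_one_gpu, big_host_two_gpus)
```

Under that assumption, a quad-core host with one GPU would top out at 24 tasks, and only larger hosts would ever reach the cap of 48.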
ID: 47548 · Rating: 0
Profile The Gas Giant
Joined: 24 Dec 07
Posts: 1947
Credit: 240,884,648
RAC: 0
Message 47552 - Posted: 11 Apr 2011, 4:39:22 UTC

Sweet! Now for double credit?
ID: 47552 · Rating: 0
Ed.T
Joined: 1 Feb 11
Posts: 17
Credit: 16,245,184
RAC: 0
Message 47555 - Posted: 11 Apr 2011, 4:48:03 UTC - in response to Message 47548.  

No need to rush this. If the servers are running smoothly then we don't need it. If they aren't then it will only make matters worse.
ID: 47555 · Rating: 0
Profile BladeD
Joined: 2 Nov 10
Posts: 731
Credit: 131,536,342
RAC: 0
Message 47556 - Posted: 11 Apr 2011, 4:56:15 UTC - in response to Message 47555.  

No need to rush this. If the servers are running smoothly then we don't need it. If they aren't then it will only make matters worse.

That's not the reason for double credit.
ID: 47556 · Rating: 0
Alinator
Joined: 7 Jun 08
Posts: 464
Credit: 56,639,936
RAC: 0
Message 47645 - Posted: 11 Apr 2011, 20:58:56 UTC - in response to Message 47548.  

Say Travis,

Not surprisingly, given the issues you guys have been battling the last few weeks...


I've picked up a few orphaned WUs which show up in my host list of tasks, but aren't in the DB anymore.

You may want to run a search and destroy script to get rid of those while the DB is still in really good shape! ;-)
ID: 47645 · Rating: 0
TJ
Joined: 12 Aug 09
Posts: 262
Credit: 92,631,041
RAC: 0
Message 47647 - Posted: 11 Apr 2011, 21:12:08 UTC - in response to Message 47645.  

Say Travis,

Not surprisingly, given the issues you guys have been battling the last few weeks...


I've picked up a few orphaned WUs which show up in my host list of tasks, but aren't in the DB anymore.

You may want to run a search and destroy script to get rid of those while the DB is still in really good shape! ;-)


Good point. I have pages of them.
Greetings from,
TJ
ID: 47647 · Rating: 0
Profile Travis
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 30 Aug 07
Posts: 2046
Credit: 26,480
RAC: 0
Message 47660 - Posted: 11 Apr 2011, 23:16:38 UTC - in response to Message 47645.  

Say Travis,

Not surprisingly, given the issues you guys have been battling the last few weeks...


I've picked up a few orphaned WUs which show up in my host list of tasks, but aren't in the DB anymore.

You may want to run a search and destroy script to get rid of those while the DB is still in really good shape! ;-)



You mean the server has sent out more of those older workunits that were purged from the database?

That shouldn't be possible as I dropped and rebuilt the workunit and result tables. Any work being sent out now should be from the new searches (which happen to have the same names as the old ones).
ID: 47660 · Rating: 0
Profile Beyond
Joined: 15 Jul 08
Posts: 383
Credit: 729,293,740
RAC: 0
Message 47661 - Posted: 11 Apr 2011, 23:17:23 UTC - in response to Message 47645.  

I've picked up a few orphaned WUs which show up in my host list of tasks, but aren't in the DB anymore.

Those are just the pre .59 WUs (read .57). Nothing to worry about.
ID: 47661 · Rating: 0
Profile Travis
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 30 Aug 07
Posts: 2046
Credit: 26,480
RAC: 0
Message 47662 - Posted: 11 Apr 2011, 23:17:35 UTC - in response to Message 47552.  

Sweet! Now for double credit?


I think we need to debug the ATI application a bit more before that. A few users are having a bit of trouble with it.

That and I need to get the nbody simulation assimilator up and running -- and to do that I need to implement BOINC's new credit scheme and get it to work with our stuff. Probably won't be until Wednesday as I have a class to teach tomorrow and a bunch of slides to make for it tonight. :P
ID: 47662 · Rating: 0
Alinator
Joined: 7 Jun 08
Posts: 464
Credit: 56,639,936
RAC: 0
Message 47663 - Posted: 11 Apr 2011, 23:18:32 UTC - in response to Message 47660.  
Last modified: 11 Apr 2011, 23:31:05 UTC

Say Travis,

Not surprisingly, given the issues you guys have been battling the last few weeks...


I've picked up a few orphaned WUs which show up in my host list of tasks, but aren't in the DB anymore.

You may want to run a search and destroy script to get rid of those while the DB is still in really good shape! ;-)



You mean the server has sent out more of those older workunits that were purged from the database?

That shouldn't be possible as I dropped and rebuilt the workunit and result tables. Any work being sent out now should be from the new searches (which happen to have the same names as the old ones).


No, I just have some detritus from the test runs that won't go away without further intervention.

<edit> I'm assuming that if I can see them in my list as not in the DB, then the stock deleter doesn't see them and won't clear or get rid of them.

IOW, there's a link somewhere that has to wait seven days to go away in the BOINC DB.
ID: 47663 · Rating: 0
Profile heffalumpen
Joined: 6 Nov 09
Posts: 5
Credit: 17,541,931
RAC: 0
Message 47664 - Posted: 11 Apr 2011, 23:18:34 UTC

Got 12 new WUs... and they all crashed out.

John
ID: 47664 · Rating: 0
TJ
Joined: 12 Aug 09
Posts: 262
Credit: 92,631,041
RAC: 0
Message 47695 - Posted: 12 Apr 2011, 7:40:56 UTC - in response to Message 47663.  

Say Travis,

Not surprisingly, given the issues you guys have been battling the last few weeks...


I've picked up a few orphaned WUs which show up in my host list of tasks, but aren't in the DB anymore.

You may want to run a search and destroy script to get rid of those while the DB is still in really good shape! ;-)



You mean the server has sent out more of those older workunits that were purged from the database?

That shouldn't be possible as I dropped and rebuilt the workunit and result tables. Any work being sent out now should be from the new searches (which happen to have the same names as the old ones).


No, I just have some detritus from the test runs that won't go away without further intervention.

<edit> I'm assuming that if I can see them in my list as not in the DB, then the stock deleter doesn't see them and won't clear or get rid of them.

IOW, there's a link somewhere that has to wait seven days to go away in the BOINC DB.



Like this, Travis:

Task    Work unit  Computer  Sent                      Time reported or deadline  Status                 Run time (sec)  CPU time (sec)  Credit  Application
251766  144408     198526    10 Apr 2011, 9:20:55 UTC  10 Apr 2011, 9:27:28 UTC   Error while computing  2.04            0.11            ---     Not in DB
251760  144402     198526    10 Apr 2011, 9:20:55 UTC  10 Apr 2011, 9:27:28 UTC   Error while computing  2.04            0.08            ---     Not in DB
251759  144401     198526    10 Apr 2011, 9:20:55 UTC  10 Apr 2011, 9:27:28 UTC   Error while computing  2.04            0.09            ---     Not in DB
250177  142819     198526    10 Apr 2011, 9:18:24 UTC  10 Apr 2011, 9:20:55 UTC   Error while computing  2.04            0.09            ---     Not in DB
250176  142818     198526    10 Apr 2011, 9:18:24 UTC  10 Apr 2011, 9:20:55 UTC   Error while computing  2.05            0.06            ---     Not in DB
250175  142817     198526    10 Apr 2011, 9:18:24 UTC  10 Apr 2011, 9:20:55 UTC   Error while computing  2.05            0.08            ---     Not in DB
250167  142809     198526    10 Apr 2011, 9:18:24 UTC  10 Apr 2011, 9:20:55 UTC   Error while computing  2.05            0.08            ---     Not in DB
245825  137727     198526    10 Apr 2011, 9:11:52 UTC  10 Apr 2011, 9:18:24 UTC   Error while computing  2.04            0.08            ---     Not in DB
245811  137915     198526    10 Apr 2011, 9:11:52 UTC  10 Apr 2011, 9:18:24 UTC   Error while computing  2.05            0.06            ---     Not in DB
241242  135480     198526    10 Apr 2011, 9:04:48 UTC  10 Apr 2011, 9:11:52 UTC   Error while computing  2.05            0.08            ---     Not in DB
241241  135479     198526    10 Apr 2011, 9:04:48 UTC  10 Apr 2011, 9:11:52 UTC   Error while computing  2.04            0.08            ---     Not in DB
241239  135473     198526    10 Apr 2011, 9:04:48 UTC  10 Apr 2011, 9:11:52 UTC   Error while computing  2.04            0.08            ---     Not in DB

I have 3 pages of them...
Greetings from,
TJ
ID: 47695 · Rating: 0
Profile kashi
Joined: 30 Dec 07
Posts: 311
Credit: 149,490,184
RAC: 0
Message 47696 - Posted: 12 Apr 2011, 8:15:09 UTC - in response to Message 47695.  
Last modified: 12 Apr 2011, 8:16:49 UTC

It's just the v0.57 application that's not in the database any more. Look at the heading of the column where it says "Not in DB". It says "Application". The work units are still there in the database and will gradually clear as they are completed successfully by wingmen or exceed the maximum number of errors. Some will time out after 8 days and be reissued before they clear.

http://milkyway.cs.rpi.edu/milkyway/forum_thread.php?id=2343&nowrap=true#47534

You will possibly find that many of those work units are waiting on wingmen using a CPU application to complete a task. CPU processing of separation tasks is relatively slow and CPU resources are often shared by more than one project, so tasks issued to CPUs may take many days to be completed and reported.
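To put a date on the 8-day timeout mentioned above, here is a small sketch. The send time is a hypothetical example for a wingman's task (borrowed from the timestamps in TJ's list), and the helper name is made up; the only figure taken from this thread is the 8 days.

```python
from datetime import datetime, timedelta, timezone

# Timeout quoted in the post above; everything else here is illustrative.
TIMEOUT = timedelta(days=8)


def reissue_time(sent: datetime) -> datetime:
    """Hypothetical helper: when an unreturned task would time out and be reissued."""
    return sent + TIMEOUT


# Assumed send time for a wingman's copy of one of the work units.
sent = datetime(2011, 4, 10, 9, 20, 55, tzinfo=timezone.utc)
deadline = reissue_time(sent)
print(deadline.strftime("%d %b %Y, %H:%M:%S UTC"))
```

So a task sent on 10 Apr and never returned would not be reissued until 18 Apr, which is why some of these work units can sit in the list for well over a week.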
ID: 47696 · Rating: 0
DanNeely
Joined: 6 Oct 09
Posts: 39
Credit: 78,881,405
RAC: 0
Message 47871 - Posted: 15 Apr 2011, 4:01:40 UTC

How's the server doing under load so far? I'd really like to be able to get at least an hour's work for my 5870 so it stays on MW after a maintenance window triggers a 1-hour backoff. When that happens and my MW queue runs dry, BOINC turns to Collatz (backup project), gets the better part of a day's worth of work, and decides to crunch all of it before returning to MW.

It's possible the latter was due to my debt levels getting messed up. They looked off when I checked earlier today, so I zeroed them out in client_state.xml and am waiting to see what happens after the next outage.
ID: 47871 · Rating: 0
Profile Beyond
Joined: 15 Jul 08
Posts: 383
Credit: 729,293,740
RAC: 0
Message 47875 - Posted: 15 Apr 2011, 7:14:26 UTC - in response to Message 47871.  

How's the server doing under load so far? I'd really like to be able to get at least an hour's work for my 5870 so it stays on MW after a maintenance window triggers a 1-hour backoff. When that happens and my MW queue runs dry, BOINC turns to Collatz (backup project), gets the better part of a day's worth of work, and decides to crunch all of it before returning to MW.

It's possible the latter was due to my debt levels getting messed up. They looked off when I checked earlier today, so I zeroed them out in client_state.xml and am waiting to see what happens after the next outage.

It'll do the same thing because backup projects aren't supported in 6.10.xx. Hate to sound like a broken record... :)
ID: 47875 · Rating: 0
Profile The Gas Giant
Joined: 24 Dec 07
Posts: 1947
Credit: 240,884,648
RAC: 0
Message 47876 - Posted: 15 Apr 2011, 7:19:05 UTC
Last modified: 15 Apr 2011, 7:19:21 UTC

Increased WU limits sounds like a good idea!
ID: 47876 · Rating: 0
Chris
Joined: 3 Oct 10
Posts: 42
Credit: 320,242
RAC: 0
Message 47877 - Posted: 15 Apr 2011, 8:02:37 UTC

I'm waiting another week or so before I jump into this again.
32bit Windows XP Home
AMD Opteron 180
ASUS A8N-SLI Motherboard
Nvidia 450GTS GPU
4GB DDR Memory
ID: 47877 · Rating: 0
FruehwF
Joined: 28 Feb 10
Posts: 120
Credit: 109,840,492
RAC: 0
Message 47879 - Posted: 15 Apr 2011, 9:37:10 UTC

More and more people are asking for a higher WU limit. Of course the goal should be a cache big enough to carry a client through a temporary server downtime.

But I think this has to be done very carefully. I don't know anything about the internals of a BOINC server, but I have experience with DB management.
I only want to outline a few things the admins here have to look at.

First, you have to find out whether the indices are good and whether the system actually uses them.
For example, validation: if increasing from 6 WUs to 12 WUs leads to a 100% increase in validation time, then the index on the WU IDs doesn't work or is not in use. The increase should be about 10% at most (in the case of 128K outstanding WUs).
Second, if the index works, you have to look at how big the index is and how much memory it is allowed to use. If the index becomes 2, 4 or 8 times bigger, there comes a point where the system has to page the index to disk, with a significant performance decrease.

Third: the tablespace of the DB. When you exceed the permitted tablespace, the DB stops working.
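The first point can be made concrete with a rough calculation. This is only a sketch under a simplifying assumption: an indexed lookup is modeled as a balanced binary search costing about log2(n) comparisons, while a lookup without an index checks every row. Real database indexes (B-trees with a high fan-out) grow even more slowly, and the function names below are made up for illustration.

```python
import math


def comparisons_with_index(n_rows: int) -> int:
    """Approximate comparisons for one lookup in a balanced binary search structure."""
    return math.ceil(math.log2(n_rows))


def comparisons_full_scan(n_rows: int) -> int:
    """Worst-case comparisons when no index is used: every row is checked."""
    return n_rows


def relative_increase(cost, before: int, after: int) -> float:
    """Fractional growth in lookup cost when the table grows from `before` to `after` rows."""
    return cost(after) / cost(before) - 1.0


before = 128 * 1024  # 128K outstanding WUs, the figure used in the post
after = 2 * before   # doubling the limit doubles the outstanding WUs

indexed = relative_increase(comparisons_with_index, before, after)
scanned = relative_increase(comparisons_full_scan, before, after)
print(f"indexed lookup: +{indexed:.1%}")
print(f"full scan:      +{scanned:.1%}")
```

With these assumptions the indexed lookup goes from 17 to 18 comparisons, an increase of roughly 6%, while the full scan doubles. That matches the rule of thumb above: a jump anywhere near 100% suggests the index is not being used.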

And there are a lot of other things the admins have to take care of.

Some of these limits were exceeded during the last upgrade, when there were suddenly no WU limits.

So please be patient and stay in the project, so that the load stays constant. In German we say "Gut Ding braucht Weile" (good things take time).

greetings Franz


ID: 47879 · Rating: 0
DanNeely
Joined: 6 Oct 09
Posts: 39
Credit: 78,881,405
RAC: 0
Message 47886 - Posted: 15 Apr 2011, 10:52:22 UTC - in response to Message 47875.  

How's the server doing under load so far? I'd really like to be able to get at least an hour's work for my 5870 so it stays on MW after a maintenance window triggers a 1-hour backoff. When that happens and my MW queue runs dry, BOINC turns to Collatz (backup project), gets the better part of a day's worth of work, and decides to crunch all of it before returning to MW.

It's possible the latter was due to my debt levels getting messed up. They looked off when I checked earlier today, so I zeroed them out in client_state.xml and am waiting to see what happens after the next outage.

It'll do the same thing because backup projects aren't supported in 6.10.xx. Hate to sound like a broken record... :)


They aren't? The part where it downloads only when other projects are out of work is working. What version do I need to get full support?
ID: 47886 · Rating: 0
Profile arkayn
Joined: 14 Feb 09
Posts: 999
Credit: 74,932,619
RAC: 0
Message 47891 - Posted: 15 Apr 2011, 13:39:36 UTC - in response to Message 47886.  

How's the server doing under load so far? I'd really like to be able to get at least an hour's work for my 5870 so it stays on MW after a maintenance window triggers a 1-hour backoff. When that happens and my MW queue runs dry, BOINC turns to Collatz (backup project), gets the better part of a day's worth of work, and decides to crunch all of it before returning to MW.

It's possible the latter was due to my debt levels getting messed up. They looked off when I checked earlier today, so I zeroed them out in client_state.xml and am waiting to see what happens after the next outage.

It'll do the same thing because backup projects aren't fully supported in 6.10.xx. Hate to sound like a broken record... :)


They aren't? The part where it downloads only when other projects are out of work is working. What version do I need to get full support?


Fixed that for you.

Full support is in the 6.12.xx series. It will download 1 task per idle resource, crunch it and request another if needed.

ID: 47891 · Rating: 0