increased WU limits
Travis
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 30 Aug 07
Posts: 2046
Credit: 26,480
RAC: 0

Message 47548 - Posted: 11 Apr 2011, 3:40:20 UTC

I've increased the workunit limits some more, seeing as things have stabilized and the server still seems to be running nice and snappy. Right now the total limit is up to 48, keeping the max of 3 per CPU, and with a new max of 12 per GPU. I think the server should be able to handle this, given that the new applications don't hammer the hard disks nearly as much. If things stay nice and smooth for the next few days I'll try increasing them again.
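
As a rough illustration of how the three numbers above could combine for a single host, here is a minimal sketch. It assumes the per-host cap is simply the smaller of the total limit and the sum of the per-CPU and per-GPU allowances; the function name and that combination rule are assumptions for illustration, not taken from the server configuration.

```python
def max_tasks_in_progress(n_cpus, n_gpus, per_cpu=3, per_gpu=12, total_cap=48):
    """Estimate how many tasks one host may hold at once.

    Assumption (not confirmed by the post): the host limit is the
    per-device sum, capped at the total limit.
    """
    if n_cpus < 0 or n_gpus < 0:
        raise ValueError("device counts must be non-negative")
    per_device_sum = per_cpu * n_cpus + per_gpu * n_gpus
    return min(total_cap, per_device_sum)


# A quad-core host with one GPU: 4 * 3 + 1 * 12 = 24 tasks.
print(max_tasks_in_progress(4, 1))
# An eight-core host with four GPUs: 24 + 48 = 72, capped at 48.
print(max_tasks_in_progress(8, 4))
```

Under this assumption, only hosts with several GPUs would ever reach the total limit of 48.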

The Gas Giant
Joined: 24 Dec 07
Posts: 1947
Credit: 240,884,648
RAC: 0

Message 47552 - Posted: 11 Apr 2011, 4:39:22 UTC

Sweet! Now for double credit?

Ed.T
Joined: 1 Feb 11
Posts: 17
Credit: 14,873,794
RAC: 0

Message 47555 - Posted: 11 Apr 2011, 4:48:03 UTC - in response to Message 47548.

No need to rush this. If the servers are running smoothly then we don't need it. If they aren't then it will only make matters worse.

BladeD
Joined: 2 Nov 10
Posts: 731
Credit: 131,536,342
RAC: 0

Message 47556 - Posted: 11 Apr 2011, 4:56:15 UTC - in response to Message 47555.

That's not the reason for double credit.

Alinator
Joined: 7 Jun 08
Posts: 464
Credit: 56,639,936
RAC: 0

Message 47645 - Posted: 11 Apr 2011, 20:58:56 UTC - in response to Message 47548.

Say Travis,

Not surprisingly, given the issues you guys have been battling the last few weeks...

I've picked up a few orphaned WUs which show up in my host's list of tasks but aren't in the DB anymore.

You may want to run a search-and-destroy script to get rid of those while the DB is still in really good shape! ;-)

TJ
Joined: 12 Aug 09
Posts: 262
Credit: 92,523,253
RAC: 7,341

Message 47647 - Posted: 11 Apr 2011, 21:12:08 UTC - in response to Message 47645.

Good point. I have pages of them.
____________
Greetings from,
TJ

Travis
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 30 Aug 07
Posts: 2046
Credit: 26,480
RAC: 0

Message 47660 - Posted: 11 Apr 2011, 23:16:38 UTC - in response to Message 47645.

You mean the server has sent out more of those older workunits that were purged from the database?

That shouldn't be possible as I dropped and rebuilt the workunit and result tables. Any work being sent out now should be from the new searches (which happen to have the same names as the old ones).

Beyond
Joined: 15 Jul 08
Posts: 383
Credit: 501,817,790
RAC: 0

Message 47661 - Posted: 11 Apr 2011, 23:17:23 UTC - in response to Message 47645.

Those are just the pre-0.59 WUs (i.e., 0.57). Nothing to worry about.

Travis
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 30 Aug 07
Posts: 2046
Credit: 26,480
RAC: 0

Message 47662 - Posted: 11 Apr 2011, 23:17:35 UTC - in response to Message 47552.

I think we need to debug the ATI application a bit more before that. A few users are having some problems with it.

That, and I need to get the nbody simulation assimilator up and running. To do that I need to implement BOINC's new credit scheme and get it to work with our stuff. Probably won't be until Wednesday, as I have a class to teach tomorrow and a bunch of slides to make for it tonight. :P

Alinator
Joined: 7 Jun 08
Posts: 464
Credit: 56,639,936
RAC: 0

Message 47663 - Posted: 11 Apr 2011, 23:18:32 UTC - in response to Message 47660.
Last modified: 11 Apr 2011, 23:31:05 UTC

No, I just have some detritus from the test runs that won't go away without further intervention.

<edit> I'm assuming that if I can see them in my list as not in the DB, then the stock deleter doesn't see them and so won't clear them.

IOW, there's a link somewhere in the BOINC DB that has to wait seven days to go away.

heffalumpen
Joined: 6 Nov 09
Posts: 5
Credit: 17,541,931
RAC: 2

Message 47664 - Posted: 11 Apr 2011, 23:18:34 UTC

Got 12 new WUs... and they all crashed out.

John

TJ
Joined: 12 Aug 09
Posts: 262
Credit: 92,523,253
RAC: 7,341

Message 47695 - Posted: 12 Apr 2011, 7:40:56 UTC - in response to Message 47663.

Like this, Travis:

Task    Work unit  Computer  Sent                     Time reported or deadline  Status                 Run time (sec)  CPU time (sec)  Credit  Application
251766  144408     198526    10 Apr 2011 9:20:55 UTC  10 Apr 2011 9:27:28 UTC    Error while computing  2.04            0.11            ---     Not in DB
251760  144402     198526    10 Apr 2011 9:20:55 UTC  10 Apr 2011 9:27:28 UTC    Error while computing  2.04            0.08            ---     Not in DB
251759  144401     198526    10 Apr 2011 9:20:55 UTC  10 Apr 2011 9:27:28 UTC    Error while computing  2.04            0.09            ---     Not in DB
250177  142819     198526    10 Apr 2011 9:18:24 UTC  10 Apr 2011 9:20:55 UTC    Error while computing  2.04            0.09            ---     Not in DB
250176  142818     198526    10 Apr 2011 9:18:24 UTC  10 Apr 2011 9:20:55 UTC    Error while computing  2.05            0.06            ---     Not in DB
250175  142817     198526    10 Apr 2011 9:18:24 UTC  10 Apr 2011 9:20:55 UTC    Error while computing  2.05            0.08            ---     Not in DB
250167  142809     198526    10 Apr 2011 9:18:24 UTC  10 Apr 2011 9:20:55 UTC    Error while computing  2.05            0.08            ---     Not in DB
245825  137727     198526    10 Apr 2011 9:11:52 UTC  10 Apr 2011 9:18:24 UTC    Error while computing  2.04            0.08            ---     Not in DB
245811  137915     198526    10 Apr 2011 9:11:52 UTC  10 Apr 2011 9:18:24 UTC    Error while computing  2.05            0.06            ---     Not in DB
241242  135480     198526    10 Apr 2011 9:04:48 UTC  10 Apr 2011 9:11:52 UTC    Error while computing  2.05            0.08            ---     Not in DB
241241  135479     198526    10 Apr 2011 9:04:48 UTC  10 Apr 2011 9:11:52 UTC    Error while computing  2.04            0.08            ---     Not in DB
241239  135473     198526    10 Apr 2011 9:04:48 UTC  10 Apr 2011 9:11:52 UTC    Error while computing  2.04            0.08            ---     Not in DB

I have 3 pages of them...
____________
Greetings from,
TJ

kashi
Joined: 30 Dec 07
Posts: 311
Credit: 148,905,504
RAC: 0

Message 47696 - Posted: 12 Apr 2011, 8:15:09 UTC - in response to Message 47695.
Last modified: 12 Apr 2011, 8:16:49 UTC

It's just the v0.57 application that's not in the database any more. Look at the heading of the column where it says "Not in DB". It says "Application". The work units are still there in the database and will gradually clear as they are completed successfully by wingmen or exceed the maximum number of errors. Some will time out after 8 days and be reissued before they clear.

http://milkyway.cs.rpi.edu/milkyway/forum_thread.php?id=2343&nowrap=true#47534

You will possibly find that many of those work units are waiting on wingmen using a CPU application to complete a task. CPU processing of separation tasks is relatively slow and CPU resources are often shared by more than one project, so tasks issued to CPUs may take many days to be completed and reported.

DanNeely
Joined: 6 Oct 09
Posts: 39
Credit: 77,490,970
RAC: 0

Message 47871 - Posted: 15 Apr 2011, 4:01:40 UTC

How's the server doing under load so far? I'd really like to be able to get at least an hour's work for my 5870 so it stays on MW after a maintenance window triggers a 1 hour backoff. When that happens and my MW queue runs dry, BOINC turns to Collatz (backup project), gets the better part of a day's worth of work, and decides to crunch all of it before returning to MW.
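
Whether a given task limit covers a one-hour backoff comes down to simple arithmetic, sketched below. The 180-second run time per task is a purely hypothetical figure chosen for illustration (the thread does not state how long a task takes on a 5870), and the function names are likewise made up.

```python
import math


def queue_minutes(task_limit, seconds_per_task):
    """Minutes of work held if tasks run one at a time on the GPU."""
    return task_limit * seconds_per_task / 60


def tasks_needed(outage_seconds, seconds_per_task):
    """Smallest queue that keeps the GPU busy for the whole outage."""
    return math.ceil(outage_seconds / seconds_per_task)


SECONDS_PER_TASK = 180  # hypothetical run time, not from the thread

# With the limit of 12 per GPU, the queue would last 36 minutes.
print(queue_minutes(12, SECONDS_PER_TASK))
# Riding out a 1 hour backoff would take 20 tasks at that run time.
print(tasks_needed(3600, SECONDS_PER_TASK))
```

With faster tasks the queue drains proportionally sooner, so the limit needed to bridge an outage depends heavily on the actual run time.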

It's possible the latter was due to my debt levels getting messed up. They looked off when I checked earlier today, so I zeroed them out in client_state.xml and am waiting to see what happens after the next outage.
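
For readers unfamiliar with the debt reset described above, the sketch below shows the general idea on a simplified, made-up XML snippet. The element names and layout are assumptions for illustration, not a verified copy of a real client_state.xml, and any such edit would presumably need the BOINC client to be stopped first so the file is not overwritten.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical stand-in for a client state file.
SAMPLE = """<client_state>
  <project>
    <master_url>http://example.org/project_a/</master_url>
    <short_term_debt>-1234.5</short_term_debt>
    <long_term_debt>98765.4</long_term_debt>
  </project>
</client_state>"""


def zero_debts(xml_text):
    """Return the XML with every element whose tag ends in '_debt' set to zero."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        if elem.tag.endswith("_debt"):
            elem.text = "0.000000"
    return ET.tostring(root, encoding="unicode")


print(zero_debts(SAMPLE))
```

Matching on the tag suffix rather than on a fixed list of names is a deliberate choice here, since the exact set of debt fields is not given in the thread.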

Beyond
Joined: 15 Jul 08
Posts: 383
Credit: 501,817,790
RAC: 0

Message 47875 - Posted: 15 Apr 2011, 7:14:26 UTC - in response to Message 47871.

It'll do the same thing because backup projects aren't supported in 6.10.xx. Hate to sound like a broken record... :)

The Gas Giant
Joined: 24 Dec 07
Posts: 1947
Credit: 240,884,648
RAC: 0

Message 47876 - Posted: 15 Apr 2011, 7:19:05 UTC
Last modified: 15 Apr 2011, 7:19:21 UTC

Increased WU limits sounds like a good idea!

Chris
Joined: 3 Oct 10
Posts: 42
Credit: 320,242
RAC: 0

Message 47877 - Posted: 15 Apr 2011, 8:02:37 UTC

I'm waiting another week or so before I jump into this again.
____________
32bit Windows XP Home
AMD Opteron 180
ASUS A8N-SLI Motherboard
Nvidia 450GTS GPU
4GB DDR Memory

FruehwF
Joined: 28 Feb 10
Posts: 120
Credit: 109,840,492
RAC: 0

Message 47879 - Posted: 15 Apr 2011, 9:37:10 UTC

There are more and more votes for a higher WU limit. Of course it should be a goal to have a cache that can carry a client through a temporary server downtime.

But I think this must be done very carefully. I don't know anything about the internals of a BOINC server, but I have experience with DB management.
I only want to outline a few things the admins here have to look at.

First, you have to find out whether the indices are good and whether the system actually uses them.
Take validation as an example: if raising the limit from 6 WUs to 12 WUs leads to a 100% increase in validation time, then the index on the WU IDs isn't working or isn't being used. With a working index the increase should be about 10% at most (in the case of 128K outstanding WUs).
Second, if the index works, you have to look at how big the index is and how much memory it is allowed to use. If the index becomes 2, 4 or 8 times bigger, there comes a point where the system has to page the index to disk, with a significant drop in performance.

Third, the tablespace of the DB: when you exceed the permitted tablespace, the DB stops working.
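
The first point can be made concrete with a rough estimate. The sketch below assumes that an indexed lookup costs roughly log2(rows) steps, as in a balanced tree, and that an unindexed lookup scans every row; both cost models and the function names are simplifying assumptions, not measurements from the project's database.

```python
import math


def indexed_increase(rows_before, rows_after):
    """Fractional growth in lookup cost if cost scales like log2(rows)."""
    return math.log2(rows_after) / math.log2(rows_before) - 1


def full_scan_increase(rows_before, rows_after):
    """Fractional growth in lookup cost if every row must be scanned."""
    return rows_after / rows_before - 1


outstanding = 128 * 1024  # the 128K outstanding WUs mentioned above

# Doubling the table with a working index: 18/17 - 1, about 5.9%.
print(f"{indexed_increase(outstanding, 2 * outstanding):.1%}")
# Doubling the table without a usable index: 100%.
print(f"{full_scan_increase(outstanding, 2 * outstanding):.1%}")
```

Under these assumptions the indexed case stays well below the 10% bound, while a doubling of validation time points to a full scan.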

And there are a lot of other things the admins have to take care of.

Some of these limits were exceeded during the last upgrade, when there were suddenly no WU limits.

So please be patient and stay in the project, so that the load is constant. In German we say "Gut Ding braucht Weile" (good things take time).

greetings Franz


DanNeely
Joined: 6 Oct 09
Posts: 39
Credit: 77,490,970
RAC: 0

Message 47886 - Posted: 15 Apr 2011, 10:52:22 UTC - in response to Message 47875.

They aren't? The "download only when other projects are out of work" part is working. What version do I need to get full support?

arkayn
Joined: 14 Feb 09
Posts: 999
Credit: 74,932,619
RAC: 0

Message 47891 - Posted: 15 Apr 2011, 13:39:36 UTC - in response to Message 47886.

It'll do the same thing because backup projects aren't fully supported in 6.10.xx. Hate to sound like a broken record... :)


They aren't? The DL only when other projects are out of work portion is working. What version do I need to get full support?


Fixed that for you.

Full support is in the 6.12.xx series. It will download 1 task per idle resource, crunch it and request another if needed.




Copyright © 2018 AstroInformatics Group