Welcome to MilkyWay@home

Daily graphs of server_status

Message boards : Number crunching : Daily graphs of server_status

Kiska

Joined: 31 Mar 12
Posts: 94
Credit: 151,919,645
RAC: 12,378
Message 74523 - Posted: 21 Oct 2022, 14:50:19 UTC

There seemed to be some recovery... but it has all been wiped out:


Septimus

Joined: 8 Nov 11
Posts: 205
Credit: 2,882,953
RAC: 253
Message 74525 - Posted: 21 Oct 2022, 15:04:44 UTC - in response to Message 74523.  

Brilliant… maybe someone will explain at least what changes were made to Nbody on or around 9th September.

Kiska

Joined: 31 Mar 12
Posts: 94
Credit: 151,919,645
RAC: 12,378
Message 74526 - Posted: 21 Oct 2022, 15:27:26 UTC - in response to Message 74525.  

Brilliant… maybe someone will explain at least what changes were made to Nbody on or around 9th September.


I presume some explanation is here: https://milkyway.cs.rpi.edu/milkyway/forum_thread.php?id=4924&postid=74351#74351

Septimus

Joined: 8 Nov 11
Posts: 205
Credit: 2,882,953
RAC: 253
Message 74527 - Posted: 21 Oct 2022, 16:49:26 UTC - in response to Message 74526.  

Brilliant… maybe someone will explain at least what changes were made to Nbody on or around 9th September.


I presume some explanation is here: https://milkyway.cs.rpi.edu/milkyway/forum_thread.php?id=4924&postid=74351#74351


Thanks for that.

Kiska

Joined: 31 Mar 12
Posts: 94
Credit: 151,919,645
RAC: 12,378
Message 74540 - Posted: 22 Oct 2022, 16:40:04 UTC

There seems to be an improvement to the validation queue:

[graphs not preserved]

Kiska

Joined: 31 Mar 12
Posts: 94
Credit: 151,919,645
RAC: 12,378
Message 74543 - Posted: 23 Oct 2022, 14:18:36 UTC





Seems the numbers improve... then unimprove.

Captiosus

Joined: 9 Apr 14
Posts: 35
Credit: 9,708,616
RAC: 0
Message 74545 - Posted: 23 Oct 2022, 23:16:00 UTC

Seems there's some improvement every time there's a server reset, but then things start overloading again and the pending validation tasks start climbing again.

So, an experimental idea would be to do a server reset every couple of days or so and see if any progress is made. If it has positive results, it could make for a workable brute-force solution to the pending issue until the new hardware is in play. Is it a good idea? Probably not.

mikey

Joined: 8 May 09
Posts: 3315
Credit: 519,951,988
RAC: 21,328
Message 74546 - Posted: 24 Oct 2022, 10:52:31 UTC - in response to Message 74545.  

Seems there's some improvement every time there's a server reset, but then things start overloading again and the pending validation tasks start climbing again.

So, an experimental idea would be to do a server reset every couple of days or so and see if any progress is made. If it has positive results, it could make for a workable brute-force solution to the pending issue until the new hardware is in play. Is it a good idea? Probably not.


Tom already said the problem is not enough memory in the current server, but the IT people are in charge of moving the stuff over to the new server, which they already have ready and waiting.

Link

Joined: 19 Jul 10
Posts: 578
Credit: 18,845,239
RAC: 856
Message 74547 - Posted: 24 Oct 2022, 11:40:50 UTC - in response to Message 74545.  

So, an experimental idea would be to do a server reset every couple of days or so and see if any progress is made. If it has positive results, it could make for a workable brute-force solution to the pending issue until the new hardware is in play. Is it a good idea? Probably not.

If memory is the issue, the only temporary workaround would be to slow down the WU generators (or the feeder) to a level where the validators can keep up, so completed tasks can be validated and removed from the database.

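A minimal sketch of the throttling argument above, assuming work is generated at a fixed daily rate and the validator clears at most a fixed number of tasks per day. The function name and every rate below are hypothetical, not figures from the MilkyWay@home servers, and memory pressure, retries and db_purge are ignored. It only shows the core point: whenever work is generated faster than it is validated, the pending queue keeps growing, and throttling generation below the validator's rate lets it drain.

```python
def simulate_backlog(initial_backlog, generated_per_day, validated_per_day, days):
    """Return the number of tasks awaiting validation at the end of each day.

    Deliberately simple model (hypothetical): fixed generation rate, fixed
    validator capacity; memory pressure, retries and db_purge are ignored.
    """
    backlog = initial_backlog
    history = []
    for _ in range(days):
        backlog += generated_per_day
        # The validator cannot clear more tasks than are actually waiting.
        backlog -= min(validated_per_day, backlog)
        history.append(backlog)
    return history


# Hypothetical rates, chosen only to show the two regimes.
unthrottled = simulate_backlog(7_000_000, 1_200_000, 1_000_000, days=5)
throttled = simulate_backlog(7_000_000, 800_000, 1_000_000, days=5)
print(unthrottled)  # grows by 200,000 tasks per day
print(throttled)    # shrinks by 200,000 tasks per day
```

Throttling only helps if generation drops below the validator's real throughput; matching it exactly just holds the backlog where it is.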
Septimus

Joined: 8 Nov 11
Posts: 205
Credit: 2,882,953
RAC: 253
Message 74548 - Posted: 24 Oct 2022, 12:13:55 UTC - in response to Message 74546.  

Maybe it’s time to stop producing new WUs and get the queue down to a manageable level; it is clearly having adverse effects at present.

Link

Joined: 19 Jul 10
Posts: 578
Credit: 18,845,239
RAC: 856
Message 74549 - Posted: 24 Oct 2022, 14:10:08 UTC - in response to Message 74548.  
Last modified: 24 Oct 2022, 14:11:27 UTC

Maybe it’s time to stop producing new WUs and get the queue down to a manageable level; it is clearly having adverse effects at present.

I think so too; it feels like being back on SETI. Not nice for those of us who want to crunch, but there's nothing else that can be done. There are now well over 7 million tasks in the database (plus whatever is still waiting for db_purge; we don't see that number on the server status page, but it might be around 1-2 million or even more).

Kiska

Joined: 31 Mar 12
Posts: 94
Credit: 151,919,645
RAC: 12,378
Message 74550 - Posted: 24 Oct 2022, 16:01:29 UTC





Seems like any improvements made yesterday have all been wiped out.

San-Fernando-Valley

Joined: 13 Apr 17
Posts: 256
Credit: 604,411,638
RAC: 0
Message 74551 - Posted: 24 Oct 2022, 16:41:03 UTC - in response to Message 74546.  

...
Tom already said the problem is not enough memory in the current server, but the IT people are in charge of moving the stuff over to the new server, which they already have ready and waiting.

Hmmm, waiting for what?
ID: 74551 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Septimus

Joined: 8 Nov 11
Posts: 205
Credit: 2,882,953
RAC: 253
Message 74552 - Posted: 24 Oct 2022, 16:58:52 UTC - in response to Message 74551.  

...
Tom already said the problem is not enough memory in the current server, but the IT people are in charge of moving the stuff over to the new server, which they already have ready and waiting.

Hmmm, waiting for what?


Presumably the long-running Nbody WUs have exacerbated the problem?

alanb1951

Joined: 16 Mar 10
Posts: 208
Credit: 105,467,104
RAC: 36,042
Message 74553 - Posted: 24 Oct 2022, 20:34:31 UTC - in response to Message 74552.  

...
Tom already said the problem is not enough memory in the current server, but the IT people are in charge of moving the stuff over to the new server, which they already have ready and waiting.

Hmmm, waiting for what?


Presumably the long-running Nbody WUs have exacerbated the problem?
The two projects have separate validators, so the only effect one might have on the other is memory demands. However, it's not N-Body that's having severe backlog problems, if the state of my current tasks in progress or waiting for validation is anything to go by...

As at about 20:00 UTC on 24th October, a typical _0 N-Body task seems to take about 20 hours to pass through the validator (which doesn't allow it to validate without a wingman for some reason...) and have a new task sent out for a second opinion. Whilst that isn't brilliant, it's not anywhere near as bad as the situation for Separation tasks!...

As at about the same time as the above, a typical _0 Separation task seems to take over 6 days to pass through the validator (whether it ends up self-validating or not) -- if it doesn't validate without a wingman, there will be two further opinions sought, but when _1 comes back the transitioner will note that 3 results are required so it will spin off another task without troubling the validator[1], and that's pretty quick!

It looks as if it adds about a day to the processing time for each million extra tasks awaiting validation, and bearing in mind that (in theory) more than 10% of initial tasks should end up needing a second (and third!) opinion, clearing out the backlog should produce a reasonable amount of available work, albeit more slowly... (And, of course, tasks that return with an error state should produce new tasks without needing to engage the work unit generator, the same as second retries...) It has been suggested that it might be a good idea to turn off the generation of new work for Separation for a while -- I'd second that suggestion!

Cheers - Al.

[1] Unless the MilkyWay team has done something really strange to the core BOINC stuff, I don't think time-out or error-driven retries and "third opinion" tasks should go anywhere near the work-unit generator so they should not be held up if the validator is not involved (unlike "second opinion" tasks). If Tom knows otherwise, I'd be interested to know what they did!

P.S. "Waiting for what?" -- as mentioned elsewhere, they seem to be waiting for the IT people to sort out the migration...

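Al's rule of thumb (roughly one extra day of validation delay per extra million tasks pending) is what Little's law predicts for a first-in, first-out queue if the validator clears about one million tasks per day; that throughput is an inference from the rule, not a published figure. The sketch below, with hypothetical function names, turns the rule and the "more than 10%" remark into arithmetic. It treats 10% as a lower bound and assumes, as the post describes, that each task which fails to self-validate needs exactly two further opinions.

```python
def estimated_wait_days(pending_tasks, validations_per_day):
    """Rough time a newly returned task waits before the validator reaches it.

    Little's law for a first-in, first-out queue in steady state:
    waiting time = queue length / throughput.
    """
    if validations_per_day <= 0:
        raise ValueError("validations_per_day must be positive")
    return pending_tasks / validations_per_day


def follow_up_tasks(initial_tasks, fraction_not_self_validating=0.10):
    """Extra tasks released once initial tasks clear the validator.

    Each initial task that does not self-validate needs a second and then a
    third opinion, i.e. two new tasks. 0.10 is a lower bound per the post.
    """
    return round(initial_tasks * fraction_not_self_validating * 2)


# Hypothetical inputs: 6 million pending, ~1 million validations per day.
print(estimated_wait_days(6_000_000, 1_000_000))  # 6.0
print(follow_up_tasks(1_000_000))                  # 200000
```

Under these assumptions every million tasks cleared from the backlog would release at least 200,000 follow-up tasks without the work unit generator being involved.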
.clair.

Joined: 3 Mar 13
Posts: 84
Credit: 779,527,603
RAC: 22,637
Message 74554 - Posted: 24 Oct 2022, 20:43:37 UTC - in response to Message 74546.  
Last modified: 24 Oct 2022, 20:46:16 UTC

Tom already said the problem is not enough memory in the current server, but the IT people are in charge of moving the stuff over to the new server, which they already have ready and waiting.

If a server needs more memory... methinks the quick fix is FIT MORE MEMORY (been there, done that), unless it already has a full set, 128 or 256 GB, whatever. OK, I know it costs money.

mikey

Joined: 8 May 09
Posts: 3315
Credit: 519,951,988
RAC: 21,328
Message 74557 - Posted: 25 Oct 2022, 2:45:11 UTC - in response to Message 74554.  

Tom already said the problem is not enough memory in the current server, but the IT people are in charge of moving the stuff over to the new server, which they already have ready and waiting.


If a server needs more memory... methinks the quick fix is FIT MORE MEMORY (been there, done that), unless it already has a full set, 128 or 256 GB, whatever. OK, I know it costs money.


The question is not money, as that was raised already, so I'm guessing the server already has all the memory it can handle. Tom said the new server has more memory and should fix the problem, but as said before, nothing happens until the IT people do their thing; the problem is NOT with Tom's people, he said.

San-Fernando-Valley

Joined: 13 Apr 17
Posts: 256
Credit: 604,411,638
RAC: 0
Message 74558 - Posted: 25 Oct 2022, 6:35:39 UTC - in response to Message 74553.  


P.S. "Waiting for what?" -- as mentioned elsewhere, they seem to be waiting for the IT people to sort out the migration...

Well, I read the post as saying that the IT team is ready (finished) and is waiting... (for what to happen?).
I guess I misunderstood the posting?

mikey

Joined: 8 May 09
Posts: 3315
Credit: 519,951,988
RAC: 21,328
Message 74559 - Posted: 25 Oct 2022, 10:36:01 UTC - in response to Message 74558.  


P.S. "Waiting for what?" -- as mentioned elsewhere, they seem to be waiting for the IT people to sort out the migration...


Well, I read the post as saying that the IT team is ready (finished) and is waiting... (for what to happen?).
I guess I misunderstood the posting?


I would guess it will take time to shut down everything, remove the old server and install the new one, and that it could affect other parts of the University as well (they share the internet connection at least). Then they need to make sure they have the right people on hand in case something goes wrong, either with hardware mismatches or in the software area, as the server does its first boot-up and gets ready to do its thing for us. Tom did not go into detail about whether it was plug-and-play from his end or what was involved, so there's a lot of guessing going on over this. There have also been suggestions to keep the old server and repurpose it for something BOINC-related, i.e. run the NBody tasks while the new server handles the Separation tasks, but Tom didn't say if that was even a possibility. Remember, SETI literally had a repurposed closet they had to fit their stuff into before they finally shut down.

Kiska

Joined: 31 Mar 12
Posts: 94
Credit: 151,919,645
RAC: 12,378
Message 74563 - Posted: 25 Oct 2022, 16:31:36 UTC





Seems like it's stagnating?

©2024 Astroinformatics Group