Message boards :
Number crunching :
Daily graphs of server_status
Author | Message |
---|---|
Send message Joined: 31 Mar 12 Posts: 96 Credit: 152,502,177 RAC: 198 |
There seemed to be some recovery... but it has all been wiped out: |
Send message Joined: 8 Nov 11 Posts: 205 Credit: 2,900,464 RAC: 0 |
Brilliant….maybe someone will explain at least what changes were made to Nbody on or around 9th September. |
Send message Joined: 31 Mar 12 Posts: 96 Credit: 152,502,177 RAC: 198 |
"Brilliant….maybe someone will explain at least what changes were made to Nbody on or around 9th September." I presume some explanation is here: https://milkyway.cs.rpi.edu/milkyway/forum_thread.php?id=4924&postid=74351#74351 |
Send message Joined: 8 Nov 11 Posts: 205 Credit: 2,900,464 RAC: 0 |
"Brilliant….maybe someone will explain at least what changes were made to Nbody on or around 9th September." Thanks for that. |
Send message Joined: 31 Mar 12 Posts: 96 Credit: 152,502,177 RAC: 198 |
There seems to be an improvement to the validation queue: |
Send message Joined: 31 Mar 12 Posts: 96 Credit: 152,502,177 RAC: 198 |
Seems the numbers improve.. then unimprove |
Send message Joined: 9 Apr 14 Posts: 35 Credit: 9,708,616 RAC: 0 |
Seems there's some improvement every time there's a server reset, but then things start overloading again and the pending validation tasks start climbing again. So, experimental idea would be to do a server reset every couple of days or so and see if there's any progress made. If it has positive results it could make for a workable brute-force solution to the pending issue until the new hardware is in play. Is it a good idea? Probably not. |
Send message Joined: 8 May 09 Posts: 3339 Credit: 524,010,781 RAC: 5 |
"Seems there's some improvement every time there's a server reset, but then things start overloading again and the pending validation tasks start climbing again." Tom already said the problem is not enough memory in the current Server but the IT people are in charge of moving the stuff over to the new Server they already have ready and waiting |
Send message Joined: 19 Jul 10 Posts: 618 Credit: 19,254,980 RAC: 10 |
"So, experimental idea would be to do a server reset every couple of days or so and see if there's any progress made. If it has positive results it could make for a workable brute-force solution to the pending issue until the new hardware is in play. Is it a good idea? Probably not." If memory is the issue, the only temporary workaround would be to slow down the WU generators (or the feeder) to a level where the validators can keep up, so completed tasks can be validated and removed from the database. |
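To make the throttling argument above concrete, here is a toy model in Python. It is only a sketch: the function name `simulate_backlog` and every rate in it are made up for illustration and are not taken from the MilkyWay server. It shows that the pending-validation backlog only shrinks once fewer results come back per hour than the validators can process.

```python
def simulate_backlog(initial_backlog, returned_per_hour, validated_per_hour, hours):
    """Return the pending-validation backlog at the end of each hour.

    All inputs are hypothetical; the real server rates are not known here.
    """
    backlog = initial_backlog
    history = []
    for _ in range(hours):
        # Results returned by hosts add to the queue; the validators drain it.
        backlog = max(0, backlog + returned_per_hour - validated_per_hour)
        history.append(backlog)
    return history


# Hypothetical: more results return per hour than the validators can handle.
unthrottled = simulate_backlog(7_000_000, 60_000, 50_000, hours=24)

# Hypothetical: work generation slowed so that returns fall below validator throughput.
throttled = simulate_backlog(7_000_000, 40_000, 50_000, hours=24)

print(unthrottled[-1])  # backlog after 24 hours without throttling
print(throttled[-1])    # backlog after 24 hours with throttling
```

With these invented rates the unthrottled backlog grows by 10,000 tasks per hour, while the throttled one falls by the same amount. The point is only the sign of the difference between the two rates, not the specific figures.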
Send message Joined: 8 Nov 11 Posts: 205 Credit: 2,900,464 RAC: 0 |
Maybe it’s time to stop producing new WUs and get the queue down to a manageable level; it clearly is having adverse effects at present. |
Send message Joined: 19 Jul 10 Posts: 618 Credit: 19,254,980 RAC: 10 |
"Maybe it’s time to stop producing new WUs and get the queue down to a manageable level; it clearly is having adverse effects at present." I think so too, it feels like being back on SETI. Not nice for those of us who want to crunch, but there's nothing else that can be done. There are now well over 7 million tasks in the database (plus whatever is still waiting for db_purge; we don't see that number on the server status page, but it might be around 1-2 million or even more). |
Send message Joined: 31 Mar 12 Posts: 96 Credit: 152,502,177 RAC: 198 |
Seems like any improvements made yesterday have all been wiped out |
Send message Joined: 13 Apr 17 Posts: 256 Credit: 604,411,638 RAC: 0 |
... Hmmm, waiting for what? |
Send message Joined: 8 Nov 11 Posts: 205 Credit: 2,900,464 RAC: 0 |
... Presumably the long-running Nbody WUs have exacerbated the problem? |
Send message Joined: 16 Mar 10 Posts: 211 Credit: 108,062,624 RAC: 3,720 |
The two projects have separate validators, so the only effect one might have on the other is memory demands. However, it's not N-Body that's having severe backlog problems, if the state of my current tasks in progress or waiting for validation is anything to go by...

As at about 20:00 UTC on 24th October, a typical _0 N-Body task seems to take about 20 hours to pass through the validator (which doesn't allow it to validate without a wingman for some reason...) and have a new task sent out for a second opinion. Whilst that isn't brilliant, it's not anywhere near as bad as the situation for Separation tasks!

As at about the same time as the above, a typical _0 Separation task seems to take over 6 days to pass through the validator (whether it ends up self-validating or not). If it doesn't validate without a wingman, there will be two further opinions sought, but when _1 comes back the transitioner will note that 3 results are required, so it will spin off another task without troubling the validator[1], and that's pretty quick!

It looks as if it adds about a day to the processing time for each million extra tasks awaiting validation, and bearing in mind that (in theory) more than 10% of initial tasks should end up needing a second (and third!) opinion, clearing out the backlog should produce a reasonable amount of available work, albeit more slowly... (And, of course, tasks that return with an error state should produce new tasks without needing to engage the work unit generator, the same as second retries...)

It has been suggested that it might be a good idea to turn off the generation of new work for Separation for a while. I'd second that suggestion!

Cheers - Al.
[1] Unless the MilkyWay team has done something really strange to the core BOINC stuff, I don't think time-out or error-driven retries and "third opinion" tasks should go anywhere near the work-unit generator so they should not be held up if the validator is not involved (unlike "second opinion" tasks). If Tom knows otherwise, I'd be interested to know what they did! P.S. "Waiting for what?" -- as mentioned elsewhere, they seem to be waiting for the IT people to sort out the migration... |
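The rule of thumb in the post above (roughly one extra day of validator delay per million pending tasks, and a bit over 10% of initial tasks needing two further opinions) can be turned into a back-of-the-envelope calculation. This is a minimal sketch under those stated assumptions. The function names and the 6 million input are illustrative only, chosen to match the "over 6 days" observation, and are not figures from the server status page.

```python
def estimated_validation_delay_days(pending_tasks, days_per_million=1.0):
    """Linear rule of thumb: about one day of delay per million pending tasks."""
    return pending_tasks / 1_000_000 * days_per_million


def followup_tasks(initial_tasks, recheck_percent=10, extra_results_per_recheck=2):
    """Tasks spawned when a share of initial results needs further opinions.

    recheck_percent=10 is the post's "(in theory) more than 10%", used as a floor;
    each such work unit needs a second and a third result.
    """
    return initial_tasks * recheck_percent // 100 * extra_results_per_recheck


# Hypothetical backlog of 6 million Separation tasks awaiting validation.
backlog = 6_000_000
print(estimated_validation_delay_days(backlog))  # 6.0
print(followup_tasks(backlog))                   # 1200000
```

On these assumptions, working through such a backlog would by itself release on the order of a million follow-up tasks, which is why pausing new work generation need not leave hosts completely idle.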
Send message Joined: 3 Mar 13 Posts: 84 Credit: 779,527,712 RAC: 0 |
"Tom already said the problem is not enough memory in the current Server but the IT people are in charge of moving the stuff over to the new Server they already have ready and waiting" If a server needs more memory... methinks, quick fix: FIT MORE MEMORY. Been there, done that. Unless it already has a full set, 128 or 256GB, whatever. OK, I know it costs money. |
Send message Joined: 8 May 09 Posts: 3339 Credit: 524,010,781 RAC: 5 |
"Tom already said the problem is not enough memory in the current Server but the IT people are in charge of moving the stuff over to the new Server they already have ready and waiting" The question is not money, as that was raised already, so I'm guessing the Server already has all the memory it can handle. Tom said the new Server has more memory and should fix the problem, but as said before, nothing happens until the IT people do their thing. The problem is NOT on Tom's people, he said. |
Send message Joined: 13 Apr 17 Posts: 256 Credit: 604,411,638 RAC: 0 |
Well, I read the post as saying that the IT team is ready (finished) and is waiting... (for what to happen?). I guess I misunderstood the posting? |
Send message Joined: 8 May 09 Posts: 3339 Credit: 524,010,781 RAC: 5 |
I would guess it will take time to shut down everything needed to remove the old Server and then install the new one, and that it could affect other parts of the University as well, since they share the internet connection at least. They also need to make sure they have the right people on hand in case something goes wrong, either with hardware mismatches or in the software area, as the Server does its first boot-up and gets ready to do its thing for us. Tom did not go into details about whether it was plug and play from his end or what was involved, so there's a lot of guessing going on over this. There have also been suggestions to keep the old Server and repurpose it for something BOINC related, i.e. run the NBody tasks while the new Server handles the Separation tasks, but Tom didn't say if that was even a possibility. Remember, SETI literally had a repurposed closet they had to fit their stuff into before they finally shut down. |
Send message Joined: 31 Mar 12 Posts: 96 Credit: 152,502,177 RAC: 198 |
Seems like it's stagnating? |
©2024 Astroinformatics Group