Welcome to MilkyWay@home

Server Maintenance 12:00 PM ET (16:00 UTC) 9/23/2022

Tom Donlon
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 10 Apr 19
Posts: 408
Credit: 120,203,200
RAC: 0
Message 74244 - Posted: 22 Sep 2022, 16:30:56 UTC

Hey Everyone,

We've been noticing some occasional long loading times and brief connection disruptions with the server, so I'm going to take things down at noon ET tomorrow (4PM UTC) and reboot the server. It should be back up pretty quickly. Hopefully this fixes the issues we're having.

Best,
Tom
ID: 74244 · Rating: 0
philhoey
Joined: 30 Dec 09
Posts: 2
Credit: 26,265,301
RAC: 0
Message 74245 - Posted: 22 Sep 2022, 18:24:31 UTC - in response to Message 74244.  

Having run servers before I retired, I'd suggest running a drive cleanup and defrag as well.
Just a thought.

Phil
ID: 74245 · Rating: 0
Tom Donlon
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 10 Apr 19
Posts: 408
Credit: 120,203,200
RAC: 0
Message 74255 - Posted: 23 Sep 2022, 15:42:08 UTC

Well, I guess MilkyWay really wanted to be restarted, because it went down in the middle of the night last night. According to the server room admins, the machine was doing some funky stuff like not being able to turn off swap or unmount any partitions, but it force rebooted and came back up again.

I'll take the server down for maintenance at the scheduled time, but it may be a few hours now because I want to make a manual backup of the DB in case anything else happens.

I guess it's good that we're planning on migrating to new hardware soon!
ID: 74255 · Rating: 0
Tom Donlon
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 10 Apr 19
Posts: 408
Credit: 120,203,200
RAC: 0
Message 74256 - Posted: 23 Sep 2022, 15:47:44 UTC

After looking at what's going on in the DB, I think I might keep things up for a little while. There are a few long (~4 hr) processes running and I want to see if they complete or if they need to be terminated. I'll keep you all posted on what I plan on doing - it could be that milkyway stays up for a day and then I restart things tomorrow instead.
ID: 74256 · Rating: 0
mayo father
Joined: 22 Aug 22
Posts: 4
Credit: 1,136,950
RAC: 0
Message 74257 - Posted: 23 Sep 2022, 19:18:04 UTC

Thanks for the update. I think I may have figured some things out too.
ID: 74257 · Rating: 0
Tom Donlon
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 10 Apr 19
Posts: 408
Credit: 120,203,200
RAC: 0
Message 74259 - Posted: 23 Sep 2022, 20:22:49 UTC

I think the server has caught up to all the work that you all did while it was down. The transitioner backlog is down to 0 hours, and there are a few hundred thousand returned tasks in the queue to go back out. I'm keeping an eye on things again in order to make sure that this number doesn't explode like it did when we had the drive fail. If the number of workunits waiting to go out starts rising quickly, I'll turn off the WU generators until that backlog is crunched.
ID: 74259 · Rating: 0
Tom Donlon
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 10 Apr 19
Posts: 408
Credit: 120,203,200
RAC: 0
Message 74260 - Posted: 23 Sep 2022, 21:48:31 UTC

Looks like the counts are dropping. Will continue monitoring but I think we're okay for now.
ID: 74260 · Rating: 0
mikey
Joined: 8 May 09
Posts: 3339
Credit: 524,010,781
RAC: 0
Message 74261 - Posted: 23 Sep 2022, 22:59:13 UTC - in response to Message 74260.  

Looks like the counts are dropping. Will continue monitoring but I think we're okay for now.


WOO HOO!!
Thanks Tom!!
ID: 74261 · Rating: 0
Spatzthecat
Joined: 1 Dec 10
Posts: 82
Credit: 15,452,009,012
RAC: 0
Message 74268 - Posted: 24 Sep 2022, 11:37:50 UTC

:)

Thanks Tom!
ID: 74268 · Rating: 0
mayo father
Joined: 22 Aug 22
Posts: 4
Credit: 1,136,950
RAC: 0
Message 74275 - Posted: 24 Sep 2022, 18:37:43 UTC - in response to Message 74260.  

Cool, yeah, I had a little downtime on a server farm, so I could have contributed to the issue. My apologies if so.
ID: 74275 · Rating: 0
Keith Myers
Joined: 24 Jan 11
Posts: 712
Credit: 553,982,065
RAC: 59,137
Message 74278 - Posted: 24 Sep 2022, 21:11:30 UTC

Web site is slow and laggy today. More than usual.
ID: 74278 · Rating: 0
San-Fernando-Valley
Joined: 13 Apr 17
Posts: 256
Credit: 604,411,638
RAC: 0
Message 74284 - Posted: 26 Sep 2022, 9:10:11 UTC - in response to Message 74244.  

Hey Everyone,

We've been noticing some occasional long loading times and brief connection disruptions with the server, so I'm going to take things down at noon ET tomorrow (4PM UTC) and reboot the server. It should be back up pretty quickly. Hopefully this fixes the issues we're having.

Best,
Tom

Hmmm, loading times are getting too long again.

And under "server status" I notice:
Workunits waiting for validation 1519530
is going up.

Also noticed
Milkyway@home N-Body Simulation 128813
is also going up.

Should we ignore this?
ID: 74284 · Rating: 0
HRFMguy
Joined: 12 Nov 21
Posts: 236
Credit: 575,038,236
RAC: 0
Message 74285 - Posted: 26 Sep 2022, 16:11:07 UTC - in response to Message 74284.  

And, if we shouldn't ignore it, should we dog-pile onto n body to clear out the backlog? (that's been done before.....)
ID: 74285 · Rating: 0
San-Fernando-Valley
Joined: 13 Apr 17
Posts: 256
Credit: 604,411,638
RAC: 0
Message 74286 - Posted: 26 Sep 2022, 17:25:13 UTC - in response to Message 74285.  

... (that's been done before.....)

I think that was a different problem than the one we are having now?
ID: 74286 · Rating: 0
HRFMguy
Joined: 12 Nov 21
Posts: 236
Credit: 575,038,236
RAC: 0
Message 74287 - Posted: 26 Sep 2022, 18:49:32 UTC - in response to Message 74286.  

... (that's been done before.....)

I think that was a different problem than the one we are having now?
Hopefully! That last one was a nightmare! But I'm willing to move over if need be.
ID: 74287 · Rating: 0
alanb1951
Joined: 16 Mar 10
Posts: 211
Credit: 108,210,146
RAC: 5,134
Message 74288 - Posted: 27 Sep 2022, 8:23:28 UTC
Last modified: 27 Sep 2022, 8:29:47 UTC

Regarding the high counts for N-body...

Over the last few days there has been an enormous backlog of stuff waiting for validation! Bearing in mind that at present the N-body tasks are sent out with initial quorum 1 (like Separation tasks) but don't ever seem to validate without a wingman (unlike Separation!), a very large proportion of the backlog would be initial result returns for N-body tasks, each of which would promptly generate a retry when eventually validated :-)

So a day or so after the validation backlog started to grow the number of tasks waiting to be sent started to shoot up as well, and it seems to have settled out around the 110,000 to 120,000 mark...

Until they work out why there's such a huge validation backlog, it's likely to stay as it is... I'm wondering if it's something to do with the extra code in the validator that backs off interesting results (using something called Toolkit for Asynchronous Optimization, I believe)

Cheers - Al.

P.S. The comments about sluggishness (in various guises) are still applicable -- this post is being made the first time I've managed to get at the server since some time yesterday, and my client has only just managed to report old results (again, stalled since yesterday...) And as for trying to download work -- not a chance!
ID: 74288 · Rating: 0
alanb1951
Joined: 16 Mar 10
Posts: 211
Credit: 108,210,146
RAC: 5,134
Message 74290 - Posted: 27 Sep 2022, 9:40:22 UTC - in response to Message 74288.  
Last modified: 27 Sep 2022, 9:49:07 UTC

[As it's too late to edit the previous post...]

Work downloading seems to have just started again :-). However, the number of work units waiting for validation is still climbing :-(.

I wonder how long it will be before it all bungs up again (or maybe even crashes [again?])

Cheers - Al.
ID: 74290 · Rating: 0
Kiska
Joined: 31 Mar 12
Posts: 96
Credit: 152,502,177
RAC: 11
Message 74294 - Posted: 27 Sep 2022, 16:38:14 UTC - in response to Message 74290.  

[As it's too late to edit the previous post...]

Work downloading seems to have just started again :-). However, the number of work units waiting for validation is still climbing :-(.

I wonder how long it will be before it all bungs up again (or maybe even crashes [again?])

Cheers - Al.


Things seem to improve and then degrade very hard... as evidenced in the image below.
ID: 74294 · Rating: 0
Speedy51
Joined: 12 Jun 10
Posts: 57
Credit: 6,171,817
RAC: 41
Message 74299 - Posted: 27 Sep 2022, 22:31:13 UTC - in response to Message 74285.  

And, if we shouldn't ignore it, should we dog-pile onto n body to clear out the backlog? (that's been done before.....)

As this will help the server I am more than happy to lend a hand. Has it helped in the past?
ID: 74299 · Rating: 0
mikey
Joined: 8 May 09
Posts: 3339
Credit: 524,010,781
RAC: 0
Message 74301 - Posted: 28 Sep 2022, 10:04:48 UTC - in response to Message 74299.  

And, if we shouldn't ignore it, should we dog-pile onto n body to clear out the backlog? (that's been done before.....)


As this will help the server I am more than happy to lend a hand. Has it helped in the past?


Yes, it has, but it really helps if you have the file name listed and then crunch the ones with the highest number at the end first,
e.g. Name de_modfit_70_bundle5_3s_south_pt2_2_1663946992_2335911_1
and Name de_modfit_70_bundle5_3s_south_pt2_3_1663946992_2258518_2

So you should crunch the second one first, as it's been in the system longer, meaning you would be the wingman clearing out the backlog, as opposed to crunching a task with a _1 or even a _0 at the end.

Yes, those are Separation tasks, but they are just an example. In the BOINC Manager you can set it to show the file Name: go to Options, Select columns, and tick the box for Name.
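
For anyone who would rather sort a list of task names than eyeball them, the ordering described above can be sketched in Python. This is only an illustration, not a BOINC tool: the helper names retry_index and crunch_order are made up here, and it assumes every task name ends in an underscore followed by a number, as in the two examples quoted above.

```python
import re


def retry_index(task_name: str) -> int:
    """Return the number after the last underscore in a task name.

    Assumption: names end in _N, as in the two examples from the post.
    """
    match = re.search(r"_(\d+)$", task_name)
    if match is None:
        raise ValueError(f"no trailing _N suffix in {task_name!r}")
    return int(match.group(1))


def crunch_order(task_names):
    """Sort task names so the highest trailing number comes first."""
    return sorted(task_names, key=retry_index, reverse=True)


tasks = [
    "de_modfit_70_bundle5_3s_south_pt2_2_1663946992_2335911_1",
    "de_modfit_70_bundle5_3s_south_pt2_3_1663946992_2258518_2",
]

# The task ending in _2 is listed ahead of the one ending in _1.
ordered = crunch_order(tasks)
for name in ordered:
    print(name)
```

The sort only suggests an order to work through by hand; it does not tell the BOINC client which task to run next.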
ID: 74301 · Rating: 0

©2024 Astroinformatics Group