Welcome to MilkyWay@home

Server Downtime March 28, 2022 (12 hours starting 00:00 UTC)

Tom Donlon
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: 10 Apr 19
Posts: 408
Credit: 120,203,200
RAC: 0
Message 72259 - Posted: 27 Mar 2022, 19:35:06 UTC

Hey Everyone,

I will be turning off the MySQL DB for the server at midnight UTC (8 PM EDT) for 12 hours. Last time I kept some processes running but turned off the WU generators to give things time to catch up from the drive failure (they are currently still off). I don't think the server finished rebuilding then, so this time I will turn all MW processes off completely for 12 hours. Hopefully the drive will have rebuilt by the time I turn things back on again.

If you only tune in to the BOINC notices, apologies for the massive drops in RAC lately. These are related to a hard drive failure that we are still in the process of recovering from.

Thanks for your patience. I know that this has been very frustrating for all of you, and it has been frustrating for me as well.

Best,
Tom
ID: 72259 · Rating: 0
FurryGuy
Joined: 1 Aug 11
Posts: 10
Credit: 51,374,490
RAC: 0
Message 72260 - Posted: 27 Mar 2022, 20:54:44 UTC

Thank you, Tom, for keeping us crunchers in the loop.
ID: 72260 · Rating: 0
Atratus
Joined: 21 Mar 09
Posts: 2
Credit: 36,643,097
RAC: 0
Message 72262 - Posted: 27 Mar 2022, 23:41:37 UTC
Last modified: 27 Mar 2022, 23:44:21 UTC

Hi Tom,

Thank you for this info.

The last time my RAID6 rebuilt, it took 4 weeks.
My setup was 8x8TB drives.

The next time, I took all my data off the storage, replaced the drive, initialized from scratch, and copied my data back onto the storage.
That took only 2 days and was a little quicker ;-)

I hope things will come back running again...
<crossedfingers>

--
Kind regards
Atratus
ID: 72262 · Rating: 0
Tom Donlon
Joined: 10 Apr 19
Posts: 408
Credit: 120,203,200
RAC: 0
Message 72264 - Posted: 28 Mar 2022, 14:39:50 UTC

Good to know, Atratus. I was under the impression that it wouldn't take that long, but that does reassure me that the problems we are experiencing are due to the RAID rebuild and not to other server problems (knock on wood).
ID: 72264 · Rating: 0
Tom Donlon
Joined: 10 Apr 19
Posts: 408
Credit: 120,203,200
RAC: 0
Message 72265 - Posted: 28 Mar 2022, 14:42:25 UTC

I've just been notified that the server rebuild has completed, so hopefully we can get things running as per normal soon! I'll be paying attention to things throughout the day in order to fix any weirdness that comes along.
ID: 72265 · Rating: 0
FurryGuy
Joined: 1 Aug 11
Posts: 10
Credit: 51,374,490
RAC: 0
Message 72266 - Posted: 28 Mar 2022, 14:49:39 UTC

3/28/2022 7:39:42 AM | Milkyway@Home | Sending scheduler request: Requested by user.
3/28/2022 7:39:42 AM | Milkyway@Home | Reporting 2 completed tasks
3/28/2022 7:39:42 AM | Milkyway@Home | Requesting new tasks for NVIDIA GPU
3/28/2022 7:39:53 AM | Milkyway@Home | Scheduler request completed: got 0 new tasks
3/28/2022 7:39:53 AM | Milkyway@Home | Server error: feeder not running

The tasks ready to report are still waiting.

I'm not surprised things are not running at full speed; that is going to take some time. When Seti@Home was active, it had weekly scheduled downtime for project/server maintenance, and the recovery time was lengthy.
ID: 72266 · Rating: 0
San-Fernando-Valley
Joined: 13 Apr 17
Posts: 256
Credit: 604,411,638
RAC: 0
Message 72267 - Posted: 28 Mar 2022, 15:19:07 UTC

@ Tom:

SERVER FEED is not running - is that planned/ok?

No uploads or downloads possible.

The project status page is dated 27 March 2022, 18:54 UTC.
ID: 72267 · Rating: 0
Tom Donlon
Joined: 10 Apr 19
Posts: 408
Credit: 120,203,200
RAC: 0
Message 72268 - Posted: 28 Mar 2022, 15:26:08 UTC
Last modified: 28 Mar 2022, 15:26:19 UTC

Huh, it looks like the feeder is running on our end, at least if you check the status page (https://milkyway.cs.rpi.edu/milkyway/server_status.php).

Let me try restarting all the processes and see if that helps at all. Otherwise, I might just restart the machine.
ID: 72268 · Rating: 0
alk44
Joined: 2 Mar 20
Posts: 131
Credit: 320,377,701
RAC: 13,859
Message 72270 - Posted: 28 Mar 2022, 16:32:13 UTC - in response to Message 72268.  

As of now, the feeder is not running, according to the Notices tab.
ID: 72270 · Rating: 0
Tom Donlon
Joined: 10 Apr 19
Posts: 408
Credit: 120,203,200
RAC: 0
Message 72271 - Posted: 28 Mar 2022, 17:04:51 UTC - in response to Message 72270.  

That might have been because I was rebooting the server and bringing things back up. When I look at the server status page, it is up now.
ID: 72271 · Rating: 0
Tom Donlon
Joined: 10 Apr 19
Posts: 408
Credit: 120,203,200
RAC: 0
Message 72272 - Posted: 28 Mar 2022, 17:05:45 UTC

Oh, you're looking at your BOINC client. Is it still down?
ID: 72272 · Rating: 0
winny33
Joined: 10 Mar 10
Posts: 1
Credit: 2,007,394
RAC: 813
Message 72273 - Posted: 28 Mar 2022, 17:07:44 UTC

Good evening to you:
I have it in the logs.
28/03/2022 19:00:46 | Milkyway@Home | Server error: feeder not running
ID: 72273 · Rating: 0
alanb1951
Joined: 16 Mar 10
Posts: 213
Credit: 108,362,921
RAC: 4,460
Message 72274 - Posted: 28 Mar 2022, 17:15:59 UTC - in response to Message 72268.  

Tom,

It is still saying the feeder is not running, and it requests a delay of 400 seconds rather than the usual 91 seconds. (I don't know whether that is specific to that feeder error state or not...)

The status page I see is nearly 24 hours out of date - the numbers seem unchanged and the timestamp line at the bottom says
Task data as of 27 Mar 2022, 18:54:47 UTC
If you're seeing something different at that URL there's something amiss!

Any clues in the various logs??? I don't envy you having to try to sort this out...

Good luck - Al
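
For anyone who wants to put a number on how stale the status page was, here is a minimal sketch. It takes the "Task data as of" stamp quoted above and the time of this post, and subtracts one from the other. The function names `parse_utc` and `staleness` are made up for illustration, and the parsing assumes an English locale for the month abbreviation.

```python
from datetime import datetime, timedelta, timezone

# Layout of the stamps shown on the forum and status page, e.g.
# "27 Mar 2022, 18:54:47 UTC". The "%b" month name assumes an
# English locale.
STAMP_FORMAT = "%d %b %Y, %H:%M:%S"


def parse_utc(stamp: str) -> datetime:
    """Parse a stamp such as '27 Mar 2022, 18:54:47 UTC' as an aware UTC datetime."""
    text = stamp.strip()
    if text.endswith(" UTC"):
        text = text[:-4]  # drop the trailing " UTC" label
    return datetime.strptime(text, STAMP_FORMAT).replace(tzinfo=timezone.utc)


def staleness(status_stamp: str, observed_stamp: str) -> timedelta:
    """Return how old the status page data was at the moment it was observed."""
    return parse_utc(observed_stamp) - parse_utc(status_stamp)


if __name__ == "__main__":
    age = staleness("27 Mar 2022, 18:54:47 UTC", "28 Mar 2022, 17:15:59 UTC")
    print(age)  # prints 22:21:12
```

That works out to a bit over 22 hours, which matches the "nearly 24 hours out of date" estimate.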
ID: 72274 · Rating: 0
San-Fernando-Valley
Joined: 13 Apr 17
Posts: 256
Credit: 604,411,638
RAC: 0
Message 72275 - Posted: 28 Mar 2022, 17:27:09 UTC

@ Tom:

The upload and download servers were not stopped: they were green, while the others were red (as far as I saw).

I'm still getting "Server error: feeder not running".

The project status page is dated the 27th.

Tasks that are ready to report are still not being reported.

I am unable to get tasks.

Why weren't the upload and download servers restarted?

There might be a difference in behaviour between a restart and a cold start; in many situations this is a problem.
Maybe a complete power-down, including unplugging from the wall socket for 10 minutes, would do it?

cheers
ID: 72275 · Rating: 0
Tom Donlon
Joined: 10 Apr 19
Posts: 408
Credit: 120,203,200
RAC: 0
Message 72276 - Posted: 28 Mar 2022, 17:35:11 UTC - in response to Message 72275.  
Last modified: 28 Mar 2022, 17:35:25 UTC

The download and upload servers don't usually get restarted when I restart all of the milkyway processes. However, I did restart the server, so they would have been restarted. The up and download servers come up automatically when the server turns on (which is why they were green), but the other processes have to be started manually (which is why they were red). I don't have physical access to the server, so I can't power cycle the machine.

So, just to confirm, you are not able to contact the server from your client? Does it say that the feeder is down, or what error does it report?
ID: 72276 · Rating: 0
Skillz
Joined: 28 May 17
Posts: 76
Credit: 4,398,910,125
RAC: 419
Message 72277 - Posted: 28 Mar 2022, 17:55:01 UTC

3/28/2022 11:52:48 AM | Milkyway@Home | Server error: feeder not running

This is what one of my team members just posted on our team forum.
ID: 72277 · Rating: 0
San-Fernando-Valley
Joined: 13 Apr 17
Posts: 256
Credit: 604,411,638
RAC: 0
Message 72278 - Posted: 28 Mar 2022, 17:58:54 UTC - in response to Message 72276.  
Last modified: 28 Mar 2022, 18:04:55 UTC

@ Tom:

3/28/2022 7:22:08 PM | Milkyway@Home | project resumed by user
3/28/2022 7:22:09 PM | Milkyway@Home | Sending scheduler request: To report completed tasks.
3/28/2022 7:22:09 PM | Milkyway@Home | Reporting 488 completed tasks
3/28/2022 7:22:09 PM | Milkyway@Home | Not requesting tasks: "no new tasks" requested via Manager
3/28/2022 7:22:09 PM | Milkyway@Home | work fetch resumed by user
3/28/2022 7:22:30 PM | Milkyway@Home | Scheduler request completed
3/28/2022 7:22:30 PM | Milkyway@Home | Server error: feeder not running
3/28/2022 7:22:30 PM | Milkyway@Home | Project requested delay of 400 seconds

3/28/2022 7:22:30 PM | Milkyway@Home | Project requested delay of 400 seconds
3/28/2022 7:57:24 PM | Milkyway@Home | update requested by user
3/28/2022 7:57:28 PM | Milkyway@Home | Sending scheduler request: Requested by user.
3/28/2022 7:57:28 PM | Milkyway@Home | Reporting 488 completed tasks
3/28/2022 7:57:28 PM | Milkyway@Home | Requesting new tasks for NVIDIA GPU
3/28/2022 7:57:48 PM | Milkyway@Home | Scheduler request completed: got 0 new tasks
3/28/2022 7:57:48 PM | Milkyway@Home | Server error: feeder not running
3/28/2022 7:57:48 PM | Milkyway@Home | Project requested delay of 400 seconds

that is all it is saying (on all of my rigs)
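
The client output above follows a "timestamp | project | message" layout, so it is easy to pull out just the server errors and the requested delays when comparing several machines. Below is a rough sketch of that idea. The function name `summarize` is invented for illustration, and the code deliberately ignores the timestamp field because its format differs between the logs posted in this thread.

```python
import re

# Matches the back-off line the client prints, e.g.
# "Project requested delay of 400 seconds".
DELAY_RE = re.compile(r"Project requested delay of (\d+) seconds")
ERROR_PREFIX = "Server error:"


def summarize(log_text: str) -> dict:
    """Collect server error messages and requested delays from pasted log text."""
    errors = []
    delays = []
    for line in log_text.splitlines():
        # Expect three fields: timestamp | project | message.
        parts = [part.strip() for part in line.split("|", 2)]
        if len(parts) != 3:
            continue  # blank or unrelated line
        message = parts[2]
        if message.startswith(ERROR_PREFIX):
            errors.append(message[len(ERROR_PREFIX):].strip())
        match = DELAY_RE.search(message)
        if match:
            delays.append(int(match.group(1)))
    return {"errors": errors, "delays": delays}


if __name__ == "__main__":
    sample = """\
3/28/2022 7:57:48 PM | Milkyway@Home | Scheduler request completed: got 0 new tasks
3/28/2022 7:57:48 PM | Milkyway@Home | Server error: feeder not running
3/28/2022 7:57:48 PM | Milkyway@Home | Project requested delay of 400 seconds
"""
    print(summarize(sample))
```

Run on the three sample lines, it reports one "feeder not running" error and one requested delay of 400 seconds.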
ID: 72278 · Rating: 0
Tom Donlon
Joined: 10 Apr 19
Posts: 408
Credit: 120,203,200
RAC: 0
Message 72279 - Posted: 28 Mar 2022, 18:12:39 UTC

Okay, I am going to try restarting all the processes again. I'll let you know when I am done, and we can check if it is working then. If not, I can reboot the whole system again.

If that doesn't work, then there's a problem somewhere that's more serious.
ID: 72279 · Rating: 0
Tom Donlon
Joined: 10 Apr 19
Posts: 408
Credit: 120,203,200
RAC: 0
Message 72280 - Posted: 28 Mar 2022, 18:13:38 UTC - in response to Message 72279.  

Okay, I've restarted things. Is the feeder still down for you?
ID: 72280 · Rating: 0
Baja Sándor
Joined: 4 Mar 20
Posts: 10
Credit: 11,135,743
RAC: 5,290
Message 72281 - Posted: 28 Mar 2022, 18:20:27 UTC - in response to Message 72259.  

Thanks for alerting us.
ID: 72281 · Rating: 0

©2024 Astroinformatics Group