Server Downtime March 28, 2022 (12 hours starting 00:00 UTC)
Author | Message |
---|---|
Send message Joined: 10 Apr 19 Posts: 408 Credit: 120,203,200 RAC: 0 |
Hey Everyone, I will be turning off the MySQL DB for the server at midnight UTC (8 PM EDT) for 12 hours. Last time I kept some processes running, but turned off the WU generators in order to give things time to catch up from the drive failure (they are currently still off). I don't think that the server finished rebuilding then, so I will be turning all MW processes completely off for 12 hours, and then hopefully by the time I turn things back on again, the drive will have rebuilt. If you only tune in to the BOINC notices, apologies for the massive drops in RAC lately. These are related to a hard drive failure that we are still in the process of recovering from. Thanks for your patience. I know that this has been very frustrating for all of you, and it has been frustrating for me as well. Best, Tom |
Send message Joined: 1 Aug 11 Posts: 10 Credit: 51,374,490 RAC: 0 |
Thank you, Tom, for keeping us crunchers in the loop. |
Send message Joined: 21 Mar 09 Posts: 2 Credit: 36,643,097 RAC: 0 |
Hi Tom, thank you for this info. The last time my RAID6 rebuilt, it took 4 weeks. My setup was 8x8TB drives. The next time, I took all my data off the storage, exchanged the drive, initialized from scratch, and copied my data back onto the storage. This took only 2 days and was a little quicker ;-) I hope things will come back up and running again... <crossedfingers> -- Kind regards Atratus |
Send message Joined: 10 Apr 19 Posts: 408 Credit: 120,203,200 RAC: 0 |
Good to know Atratus. I was under the impression that it wouldn't take that long, but that does reassure me that the problems we are experiencing are due to the RAID rebuild and not due to other server problems (knock on wood). |
Send message Joined: 10 Apr 19 Posts: 408 Credit: 120,203,200 RAC: 0 |
I've just been notified that the server rebuild has completed, so hopefully we can get things running as per normal soon! I'll be paying attention to things throughout the day in order to fix any weirdness that comes along. |
Send message Joined: 1 Aug 11 Posts: 10 Credit: 51,374,490 RAC: 0 |
3/28/2022 7:39:42 AM | Milkyway@Home | Sending scheduler request: Requested by user. The tasks ready to report are still waiting. I'm not surprised things are not running at full speed; that is going to take some time. Seti@Home, when it was active, had weekly scheduled downtime for project/server maintenance, and the recovery time was lengthy. |
Send message Joined: 13 Apr 17 Posts: 256 Credit: 604,411,638 RAC: 0 |
@ Tom: SERVER FEED is not running - is that planned/ok? No uploads or downloads possible. Project status page: 27 March 2022 18:54 UTC |
Send message Joined: 10 Apr 19 Posts: 408 Credit: 120,203,200 RAC: 0 |
Huh, it looks like the feeder is running on our end, at least if you check the status page (https://milkyway.cs.rpi.edu/milkyway/server_status.php). Let me try restarting all the processes and see if that helps at all. Otherwise, I might just restart the machine. |
Send message Joined: 2 Mar 20 Posts: 131 Credit: 319,715,221 RAC: 14,890 |
"Huh, it looks like the feeder is running on our end, at least if you check the status page (https://milkyway.cs.rpi.edu/milkyway/server_status.php)." As of this time, the feeder is not running, per the Notices tab. |
Send message Joined: 10 Apr 19 Posts: 408 Credit: 120,203,200 RAC: 0 |
That might have been because I was rebooting the server and bringing things back up. When I look at the server status page, it is up now. |
Send message Joined: 10 Apr 19 Posts: 408 Credit: 120,203,200 RAC: 0 |
Oh, you're looking at your BOINC client. Is it still down? |
Send message Joined: 10 Mar 10 Posts: 1 Credit: 1,972,743 RAC: 15 |
Good evening to you: I have it in the logs. 28/03/2022 19:00:46 | Milkyway@Home | Server error: feeder not running |
Send message Joined: 16 Mar 10 Posts: 211 Credit: 108,180,775 RAC: 5,054 |
"Huh, it looks like the feeder is running on our end, at least if you check the status page (https://milkyway.cs.rpi.edu/milkyway/server_status.php)." Tom, it is still saying the feeder is not running, and it requests a delay of 400 seconds rather than the usual 91 seconds. (I don't know whether that is specific to that feeder error state or not...) The status page I see is nearly 24 hours out of date - the numbers seem unchanged and the timestamp line at the bottom says "Task data as of 27 Mar 2022, 18:54:47 UTC". If you're seeing something different at that URL, there's something amiss! Any clues in the various logs??? I don't envy you having to try to sort this out... Good luck - Al |
Send message Joined: 13 Apr 17 Posts: 256 Credit: 604,411,638 RAC: 0 |
@ Tom: the upload and download servers were not stopped - they were green, while the others were red (as far as I saw). I'm still getting "Server error: feeder not running". The project status page is dated the 27th, tasks that are ready to report are still not being called home, and I am unable to get tasks. Why weren't the upload and download servers restarted? There might be a difference in behaviour between a "restart" and a "cold start" - in many situations this is a problem. Maybe a complete power-down, including pulling the plug from the wall socket for 10 minutes, would do it? cheers |
Send message Joined: 10 Apr 19 Posts: 408 Credit: 120,203,200 RAC: 0 |
@ Tom: The download and upload servers don't usually get restarted when I restart all of the milkyway processes. However, I did restart the server, so they would have been restarted. The up and download servers come up automatically when the server turns on (which is why they were green), but the other processes have to be started manually (which is why they were red). I don't have physical access to the server, so I can't power cycle the machine. So, just to confirm, you are not able to contact the server from your client? Does it say that the feeder is down, or what error does it report? |
Send message Joined: 28 May 17 Posts: 76 Credit: 4,398,880,029 RAC: 23,387 |
3/28/2022 11:52:48 AM | Milkyway@Home | Server error: feeder not running This is what one of my team members just posted on our team forum. |
Send message Joined: 13 Apr 17 Posts: 256 Credit: 604,411,638 RAC: 0 |
@ Tom: 3/28/2022 7:22:08 PM | Milkyway@Home | project resumed by user 3/28/2022 7:22:09 PM | Milkyway@Home | Sending scheduler request: To report completed tasks. 3/28/2022 7:22:09 PM | Milkyway@Home | Reporting 488 completed tasks 3/28/2022 7:22:09 PM | Milkyway@Home | Not requesting tasks: "no new tasks" requested via Manager 3/28/2022 7:22:09 PM | Milkyway@Home | work fetch resumed by user 3/28/2022 7:22:30 PM | Milkyway@Home | Scheduler request completed 3/28/2022 7:22:30 PM | Milkyway@Home | Server error: feeder not running 3/28/2022 7:22:30 PM | Milkyway@Home | Project requested delay of 400 seconds 3/28/2022 7:22:30 PM | Milkyway@Home | Project requested delay of 400 seconds 3/28/2022 7:57:24 PM | Milkyway@Home | update requested by user 3/28/2022 7:57:28 PM | Milkyway@Home | Sending scheduler request: Requested by user. 3/28/2022 7:57:28 PM | Milkyway@Home | Reporting 488 completed tasks 3/28/2022 7:57:28 PM | Milkyway@Home | Requesting new tasks for NVIDIA GPU 3/28/2022 7:57:48 PM | Milkyway@Home | Scheduler request completed: got 0 new tasks 3/28/2022 7:57:48 PM | Milkyway@Home | Server error: feeder not running 3/28/2022 7:57:48 PM | Milkyway@Home | Project requested delay of 400 seconds that is all it is saying (on all of my rigs) |
Send message Joined: 10 Apr 19 Posts: 408 Credit: 120,203,200 RAC: 0 |
Okay, I am going to try restarting all the processes again. I'll let you know when I am done, and we can check if it is working then. If not, I can reboot the whole system again. If that doesn't work, then there's a problem somewhere that's more serious. |
Send message Joined: 10 Apr 19 Posts: 408 Credit: 120,203,200 RAC: 0 |
Okay, I've restarted things. Is the feeder still down for you? |
Send message Joined: 4 Mar 20 Posts: 10 Credit: 10,834,914 RAC: 3,138 |
Thanks for notifying us. |
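
For anyone comparing notes across several machines, the client log excerpts quoted in this thread can be picked apart with a few lines of Python. This is only a rough sketch: it assumes the `M/D/YYYY h:mm:ss AM/PM | project | message` layout seen in most of the posts above (the `28/03/2022 19:00:46` style from one post would need a different pattern), it assumes an English locale for the AM/PM marker, and the function names `split_events` and `summarize` are made up for illustration rather than being part of BOINC.

```python
import re
from datetime import datetime

# Assumed layout, taken from the log lines quoted in the thread:
# "<M/D/YYYY h:mm:ss AM|PM> | <project> | <message>"
LINE_RE = re.compile(
    r"(\d{1,2}/\d{1,2}/\d{4} \d{1,2}:\d{2}:\d{2} [AP]M) \| ([^|]+?) \| "
)
DELAY_RE = re.compile(r"requested delay of (\d+) seconds")


def split_events(text):
    """Split pasted log text into (timestamp, project, message) tuples.

    Works on logs pasted as one long line as well as one event per line,
    because each event is delimited by the start of the next timestamp.
    """
    matches = list(LINE_RE.finditer(text))
    events = []
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        when = datetime.strptime(m.group(1), "%m/%d/%Y %I:%M:%S %p")
        events.append((when, m.group(2), text[m.end():end].strip()))
    return events


def summarize(events):
    """Return (server_errors, requested_delays_in_seconds)."""
    errors = [(when, msg) for when, _, msg in events
              if msg.startswith("Server error:")]
    delays = []
    for _, _, msg in events:
        found = DELAY_RE.search(msg)
        if found:
            delays.append(int(found.group(1)))
    return errors, delays


if __name__ == "__main__":
    sample = (
        "3/28/2022 7:22:30 PM | Milkyway@Home | Scheduler request completed "
        "3/28/2022 7:22:30 PM | Milkyway@Home | Server error: feeder not running "
        "3/28/2022 7:22:30 PM | Milkyway@Home | Project requested delay of 400 seconds"
    )
    errors, delays = summarize(split_events(sample))
    for when, msg in errors:
        print(when.isoformat(), msg)
    print("requested delays:", delays)
```

Splitting on the next timestamp rather than on newlines is a deliberate choice here, since logs pasted into a forum post often lose their line breaks.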
©2024 Astroinformatics Group