Welcome to MilkyWay@home

Server Trouble


Message boards : News : Server Trouble

Tom Donlon
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 10 Apr 19
Posts: 347
Credit: 77,345,079
RAC: 147,637
50 million credit badge · 3 year member badge
Message 71790 - Posted: 21 Feb 2022, 19:44:42 UTC

Hey Everyone,

The server appears to be having some connectivity issues. Additionally, one of the drives on the server appears to have failed - luckily things are mirrored so we haven't lost any data. However, I need to make a backup of things, which could take several hours. Once everything is backed up I'll try to clear out this transitioner/validator backlog and get things up to speed again.

Best,
Tom

Tom Donlon
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 10 Apr 19
Posts: 347
Credit: 77,345,079
RAC: 147,637
50 million credit badge · 3 year member badge
Message 71791 - Posted: 22 Feb 2022, 19:11:15 UTC

The problem with the bad drive has been handled, but things are struggling to come back up. I'm working on it currently.

alk44
Joined: 2 Mar 20
Posts: 35
Credit: 103,244,521
RAC: 239,438
100 million credit badge · 2 year member badge
Message 71792 - Posted: 22 Feb 2022, 19:49:23 UTC

Thanks a lot Tom. Really appreciate you letting us know when things are not running properly.

PecosRiverM
Joined: 25 Aug 17
Posts: 12
Credit: 724,124,078
RAC: 2,304,133
500 million credit badge · 4 year member badge
Message 71793 - Posted: 22 Feb 2022, 20:59:55 UTC - in response to Message 71792.  

> Thanks a lot Tom. Really appreciate you letting us know when things are not running properly.


Double for me. I've a few that need to come back home..

HRFMguy
Joined: 12 Nov 21
Posts: 167
Credit: 66,742,173
RAC: 625,050
50 million credit badge
Message 71794 - Posted: 22 Feb 2022, 21:07:32 UTC - in response to Message 71793.  

Things are looking up. I was able to send in 47 tasks and get one back for my slowest cpu.

Tom Donlon
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 10 Apr 19
Posts: 347
Credit: 77,345,079
RAC: 147,637
50 million credit badge · 3 year member badge
Message 71795 - Posted: 22 Feb 2022, 21:39:45 UTC

Looks like things are stable for the moment. I'm going to keep an eye on the validator and transitioner backlogs. Hopefully they will start decreasing as things begin to flow again.

I also think the server is running slower than usual because it is re-configuring itself after losing the broken hard drive.

We will have to take the server down again in a few days to replace the bad drive, but I'll let you know when that happens. I'll also keep in touch in case any more maintenance is needed before then.

Cliff
Joined: 2 Oct 09
Posts: 3
Credit: 5,015,485
RAC: 172
5 million credit badge · 12 year member badge
Message 71796 - Posted: 22 Feb 2022, 21:58:40 UTC - in response to Message 71790.  

Tom,

No worries. Have you tried hitting it with a very large hammer? :)

Seriously though, all of us "data crunchers" truly appreciate all of the infrastructure work and support that you provide.

Kind Regards,
Cliff
Philadelphia, PA

Cameron
Joined: 16 Dec 07
Posts: 33
Credit: 18,784,493
RAC: 70,134
10 million credit badge · 14 year member badge
Message 71797 - Posted: 23 Feb 2022, 1:30:10 UTC

Thanks for the hard work Tom.
One of the benefits of the 14-day deadlines is that results can wait a few days before being returned if something unexpected happens.

Peter Hucker of the Scottish Boinc Team
Joined: 5 Jul 11
Posts: 705
Credit: 273,639,751
RAC: 230,198
200 million credit badge · 10 year member badge
Message 71801 - Posted: 23 Feb 2022, 6:50:21 UTC - in response to Message 71790.  
Last modified: 23 Feb 2022, 6:53:46 UTC

> Hey Everyone,
>
> The server appears to be having some connectivity issues. Additionally, one of the drives on the server appears to have failed - luckily things are mirrored so we haven't lost any data. However, I need to make a backup of things, which could take several hours. Once everything is backed up I'll try to clear out this transitioner/validator backlog and get things up to speed again.
>
> Best,
> Tom

When I ran servers we had RAID 6. Yes, mirroring isn't exactly the same, but surely you can just pull out the bad drive while it's running, shove in another, and let it rebuild in the background? I even used this function to upgrade to larger disks without interrupting the users; I just did one at a time.

Another question, sorry. How come whenever it recovers from a problem, the server ends up with 2 million tasks queued to send out instead of the usual 11000?

Anyway it was nice to walk into my garage this morning to feel the temperature of a butterfly house. I knew what that meant! My Amazon parrots thank you.

Luciferius Infernalis Vel Tohu
Joined: 3 May 18
Posts: 6
Credit: 38,888
RAC: 1
10 thousand credit badge · 4 year member badge
Message 71803 - Posted: 23 Feb 2022, 11:57:06 UTC - in response to Message 71790.  

Hello Tom,

Can I get my points for my last work? I ask because the server was having trouble at the same time.

Best,
Luciferius Infernalis Vel Tohu

Bobzilla
Joined: 27 Aug 20
Posts: 6
Credit: 39,683,314
RAC: 284
30 million credit badge · 1 year member badge
Message 71804 - Posted: 23 Feb 2022, 12:21:20 UTC - in response to Message 71790.  

Time to break out the old Tonya Harding and give her a good whack.......

Peter Hucker of the Scottish Boinc Team
Joined: 5 Jul 11
Posts: 705
Credit: 273,639,751
RAC: 230,198
200 million credit badge · 10 year member badge
Message 71805 - Posted: 23 Feb 2022, 15:46:33 UTC - in response to Message 71804.  

> Time to break out the old Tonya Harding and give her a good whack.......

Is that Cockney rhyming slang?
I had to look up who she is. My god, how can a cute thing become so repulsive?

Peter Hucker of the Scottish Boinc Team
Joined: 5 Jul 11
Posts: 705
Credit: 273,639,751
RAC: 230,198
200 million credit badge · 10 year member badge
Message 71806 - Posted: 23 Feb 2022, 15:48:05 UTC - in response to Message 71803.  

> Luciferius Infernalis Vel Tohu
My Latin isn't so good. Infernal devil?

Baja Sandor
Joined: 4 Mar 20
Posts: 10
Credit: 7,033,241
RAC: 38,685
5 million credit badge · 2 year member badge
Message 71807 - Posted: 23 Feb 2022, 16:35:14 UTC

Thanks

Tom Donlon
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 10 Apr 19
Posts: 347
Credit: 77,345,079
RAC: 147,637
50 million credit badge · 3 year member badge
Message 71810 - Posted: 23 Feb 2022, 21:31:15 UTC - in response to Message 71801.  

> Another question, sorry. How come whenever it recovers from a problem, the server ends up with 2 million tasks queued to send out instead of the usual 11000?


This can be due to connection issues, so the server keeps accumulating jobs but isn't able to establish connections to volunteers to hand them out. It can also be due to server memory problems, where it doesn't have the memory available to query the database fast enough to hand everything out in time.
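
A rough way to picture the first explanation is a queue where jobs keep arriving while the rate at which they can be handed out drops. The sketch below is only a toy model, not the BOINC server's actual scheduling logic; the function name `simulate_backlog` and all of the rates are made-up values chosen for illustration.

```python
def simulate_backlog(arrivals_per_tick, dispatch_capacity, start_queue=0):
    """Return the queue length after each tick of a toy job queue.

    arrivals_per_tick: jobs added to the queue every tick (hypothetical rate).
    dispatch_capacity: list giving the most jobs that can be handed out
                       to volunteers on each tick (hypothetical values).
    """
    queue = start_queue
    history = []
    for capacity in dispatch_capacity:
        queue += arrivals_per_tick      # new jobs keep accumulating
        sent = min(queue, capacity)     # cannot send more than capacity allows
        queue -= sent
        history.append(queue)
    return history


# Healthy server: dispatch keeps up with arrivals, so nothing piles up.
healthy = simulate_backlog(1000, [1000] * 5)

# Degraded server: from the third tick on, only 100 jobs per tick get out.
degraded = simulate_backlog(1000, [1000, 1000, 100, 100, 100])

print(healthy)
print(degraded)
```

With these made-up rates the healthy run never builds a backlog, while the degraded run grows by 900 jobs on every tick after dispatch slows down, which is the kind of pile-up described above.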

Peter Hucker of the Scottish Boinc Team
Joined: 5 Jul 11
Posts: 705
Credit: 273,639,751
RAC: 230,198
200 million credit badge · 10 year member badge
Message 71811 - Posted: 23 Feb 2022, 21:36:49 UTC - in response to Message 71810.  

> > Another question, sorry. How come whenever it recovers from a problem, the server ends up with 2 million tasks queued to send out instead of the usual 11000?
> This can be due to connection issues, so the server keeps accumulating jobs but isn't able to establish connections to volunteers to hand them out. It can also be due to server memory problems, where it doesn't have the memory available to query the database fast enough to hand everything out in time.

I thought you'd set a limit, as it's usually precisely 10000 Separation and 1000 N-Body.

Tom Donlon
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 10 Apr 19
Posts: 347
Credit: 77,345,079
RAC: 147,637
50 million credit badge · 3 year member badge
Message 71816 - Posted: 24 Feb 2022, 3:54:18 UTC - in response to Message 71811.  

I believe the server tries to keep at least that many tasks queued up to be ready to send out, but sometimes it increases if the server is struggling.

Dataman
Joined: 14 Mar 09
Posts: 1
Credit: 9,398,786
RAC: 2,501
5 million credit badge · 13 year member badge
Message 71847 - Posted: 2 Mar 2022, 16:30:10 UTC - in response to Message 71791.  

Thanks for all you do

3man001
Joined: 1 Jan 22
Posts: 1
Credit: 5,353,141
RAC: 21,113
5 million credit badge
Message 71848 - Posted: 2 Mar 2022, 17:34:51 UTC

Server not fixed yet?

Tom Donlon
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 10 Apr 19
Posts: 347
Credit: 77,345,079
RAC: 147,637
50 million credit badge · 3 year member badge
Message 71849 - Posted: 3 Mar 2022, 4:16:14 UTC

There is a large number of workunits that are waiting to be validated. I think that this is because the server is running more slowly while it is degraded. When we get the drive back we can restore full functionality of the server. Hopefully that isn't too long, but in the meantime this big drop in RAC might just be how things are at the moment.

©2022 Astroinformatics Group