Server Trouble
Author | Message |
---|---|
Volunteer moderator Project administrator Project developer Project tester Project scientist Joined: 10 Apr 19 Posts: 347 Credit: 77,924,738 RAC: 153,626 |
Hey Everyone, The server appears to be having some connectivity issues. Additionally, one of the drives on the server appears to have failed - luckily things are mirrored so we haven't lost any data. However, I need to make a backup of things, which could take several hours. Once everything is backed up I'll try to clear out this transitioner/validator backlog and get things up to speed again. Best, Tom |
Volunteer moderator Project administrator Project developer Project tester Project scientist Joined: 10 Apr 19 Posts: 347 Credit: 77,924,738 RAC: 153,626 |
The problem with the bad drive has been handled, but things are struggling to come back up. I'm working on it currently. |
Joined: 2 Mar 20 Posts: 35 Credit: 103,987,489 RAC: 231,431 |
Thanks a lot Tom. Really appreciate you letting us know when things are not running properly. |
Joined: 25 Aug 17 Posts: 12 Credit: 758,271,591 RAC: 4,395,227 |
"Thanks a lot Tom. Really appreciate you letting us know when things are not running properly." Double for me. I've a few that need to come back home. |
Joined: 12 Nov 21 Posts: 167 Credit: 69,568,662 RAC: 681,191 |
Things are looking up. I was able to send in 47 tasks and get one back for my slowest cpu. |
Volunteer moderator Project administrator Project developer Project tester Project scientist Joined: 10 Apr 19 Posts: 347 Credit: 77,924,738 RAC: 153,626 |
Looks like things are stable for the moment. I'm going to keep an eye on the validator and transitioner backlogs. Hopefully they will start decreasing as things begin to flow again. I also think the server is running slower than usual because it is re-configuring itself after losing the broken hard drive. We will have to take the server down again in a few days to replace the bad drive, but I'll let you know when that happens. I'll also keep in touch in case any more maintenance is needed before then. |
Joined: 2 Oct 09 Posts: 3 Credit: 5,016,621 RAC: 218 |
Tom, No worries. Have you tried hitting it with a very large hammer? :) Seriously though, all of us "data crunchers" truly appreciate all of the infrastructure work and support that you provide. Kind Regards, Cliff Philadelphia, PA |
Cameron Joined: 16 Dec 07 Posts: 33 Credit: 18,943,516 RAC: 63,670 |
Thanks for the hard work Tom. One of the benefits of the 14 day deadlines is the results can have a few days till they're returned if something unexpected happens. |
Peter Hucker of the Scottish Boinc Team Joined: 5 Jul 11 Posts: 708 Credit: 278,419,186 RAC: 586,679 |
When I ran servers we had RAID 6. Yes, mirroring isn't exactly the same, but surely you can just pull out the bad drive while it's running and shove in another, and it rebuilds in the background? I even used this function to upgrade to larger disks without interrupting the users; I just did one at a time. Another question, sorry. How come whenever it recovers from a problem, the server ends up with 2 million tasks queued to send out instead of the usual 11000? Anyway, it was nice to walk into my garage this morning to feel the temperature of a butterfly house. I knew what that meant! My Amazon parrots thank you. |
Luciferius Infernalis Vel Tohu Joined: 3 May 18 Posts: 6 Credit: 39,118 RAC: 22 |
Hello Tom, Can I get my points for my last work? The server was having trouble at the same time. Best, Luciferius Infernalis Vel Tohu |
Bobzilla Joined: 27 Aug 20 Posts: 6 Credit: 39,683,314 RAC: 191 |
Time to break out the old Tonya Harding and give her a good whack....... |
Peter Hucker of the Scottish Boinc Team Joined: 5 Jul 11 Posts: 708 Credit: 278,419,186 RAC: 586,679 |
"Time to break out the old Tonya Harding and give her a good whack......." Is that Cockney Rhyming Slang? I had to look up who she is, my god how can a cute thing become so repulsive? |
Peter Hucker of the Scottish Boinc Team Joined: 5 Jul 11 Posts: 708 Credit: 278,419,186 RAC: 586,679 |
"Luciferius Infernalis Vel Tohu" My Latin isn't so good. Infernal devil? |
Baja Sandor Joined: 4 Mar 20 Posts: 10 Credit: 7,233,260 RAC: 44,283 |
Thanks |
Volunteer moderator Project administrator Project developer Project tester Project scientist Joined: 10 Apr 19 Posts: 347 Credit: 77,924,738 RAC: 153,626 |
"Another question, sorry. How come whenever it recovers from a problem, the server ends up with 2 million tasks queued to send out instead of the usual 11000?" This can be due to connection issues, so the server keeps accumulating jobs but isn't able to establish connections to volunteers to hand them out. It can also be due to server memory problems, where it doesn't have the memory available to query the database fast enough to hand everything out in time. |
Peter Hucker of the Scottish Boinc Team Joined: 5 Jul 11 Posts: 708 Credit: 278,419,186 RAC: 586,679 |
"Another question, sorry. How come whenever it recovers from a problem, the server ends up with 2 million tasks queued to send out instead of the usual 11000?" "This can be due to connection issues, so the server keeps accumulating jobs but isn't able to establish connections to volunteers to hand them out. It can also be due to server memory problems, where it doesn't have the memory available to query the database fast enough to hand everything out in time." I thought you'd set a limit, as it's usually precisely 10000 separation and 1000 Nbody. |
Volunteer moderator Project administrator Project developer Project tester Project scientist Joined: 10 Apr 19 Posts: 347 Credit: 77,924,738 RAC: 153,626 |
I believe the server tries to keep at least that many tasks queued up to be ready to send out, but sometimes it increases if the server is struggling. |
Dataman Joined: 14 Mar 09 Posts: 1 Credit: 9,398,786 RAC: 1,858 |
Thanks for all you do |
3man001 Joined: 1 Jan 22 Posts: 1 Credit: 5,432,623 RAC: 21,471 |
Server not fixed yet? |
Volunteer moderator Project administrator Project developer Project tester Project scientist Joined: 10 Apr 19 Posts: 347 Credit: 77,924,738 RAC: 153,626 |
There is a large number of workunits that are waiting to be validated. I think that this is because the server is running more slowly while it is degraded. When we get the drive back we can restore full functionality of the server. Hopefully that isn't too long, but in the meantime this big drop in RAC might just be how things are at the moment. |
©2022 Astroinformatics Group