Server Trouble
Author | Message |
---|---|
Joined: 10 Apr 19 Posts: 408 Credit: 120,203,200 RAC: 0 |
Hey Everyone, The server appears to be having some connectivity issues. Additionally, one of the drives on the server appears to have failed - luckily things are mirrored so we haven't lost any data. However, I need to make a backup of things, which could take several hours. Once everything is backed up I'll try to clear out this transitioner/validator backlog and get things up to speed again. Best, Tom |
Joined: 10 Apr 19 Posts: 408 Credit: 120,203,200 RAC: 0 |
The problem with the bad drive has been handled, but things are struggling to come back up. I'm working on it currently. |
Joined: 2 Mar 20 Posts: 131 Credit: 320,387,951 RAC: 12,870 |
Thanks a lot Tom. Really appreciate you letting us know when things are not running properly. |
Joined: 25 Aug 17 Posts: 12 Credit: 1,253,769,456 RAC: 260,710 |
"Thanks a lot Tom. Really appreciate you letting us know when things are not running properly." Double for me. I've a few that need to come back home. |
Joined: 12 Nov 21 Posts: 236 Credit: 575,038,236 RAC: 0 |
Things are looking up. I was able to send in 47 tasks and get one back for my slowest CPU. |
Joined: 10 Apr 19 Posts: 408 Credit: 120,203,200 RAC: 0 |
Looks like things are stable for the moment. I'm going to keep an eye on the validator and transitioner backlogs; hopefully they will start decreasing as things begin to flow again. I also think the server is running slower than usual because it is reconfiguring itself after losing the broken hard drive. We will have to take the server down again in a few days to replace the bad drive, but I'll let you know when that happens. I'll also keep in touch in case any more maintenance is needed before then. |
Joined: 2 Oct 09 Posts: 3 Credit: 6,511,328 RAC: 1,198 |
Tom, No worries. Have you tried hitting it with a very large hammer? :) Seriously though, all of us "data crunchers" truly appreciate all of the infrastructure work and support that you provide. Kind Regards, Cliff Philadelphia, PA |
Joined: 16 Dec 07 Posts: 37 Credit: 25,795,406 RAC: 5,827 |
Thanks for the hard work Tom. One of the benefits of the 14 day deadlines is the results can have a few days till they're returned if something unexpected happens. |
Joined: 5 Jul 11 Posts: 990 Credit: 376,143,149 RAC: 0 |
"Hey Everyone," When I ran servers we had RAID 6. Yes, mirroring isn't exactly the same, but surely you can just pull out the bad drive while it's running and shove in another, and it rebuilds in the background? I even used this function to upgrade to larger disks without interrupting the users; I just did one at a time. Another question, sorry. How come whenever it recovers from a problem, the server ends up with 2 million tasks queued to send out instead of the usual 11000? Anyway, it was nice to walk into my garage this morning to feel the temperature of a butterfly house. I knew what that meant! My Amazon parrots thank you. |
Joined: 3 May 18 Posts: 7 Credit: 45,954 RAC: 0 |
Hello Tom, can I get my points for the last work? The server was having troubles at the same time. Best, Luciferius Infernalis Vel Tohu |
Joined: 27 Aug 20 Posts: 6 Credit: 39,683,314 RAC: 0 |
Time to break out the old Tonya Harding and give her a good whack....... |
Joined: 5 Jul 11 Posts: 990 Credit: 376,143,149 RAC: 0 |
"Time to break out the old Tonya Harding and give her a good whack......." Is that Cockney rhyming slang? I had to look up who she is. My god, how can a cute thing become so repulsive? |
Joined: 5 Jul 11 Posts: 990 Credit: 376,143,149 RAC: 0 |
"Luciferius Infernalis Vel Tohu" My Latin isn't so good. Infernal devil? |
Joined: 4 Mar 20 Posts: 10 Credit: 11,135,879 RAC: 4,590 |
Thanks |
Joined: 10 Apr 19 Posts: 408 Credit: 120,203,200 RAC: 0 |
"Another question, sorry. How come whenever it recovers from a problem, the server ends up with 2 million tasks queued to send out instead of the usual 11000?" This can be due to connection issues, so the server keeps accumulating jobs but isn't able to establish connections to volunteers to hand them out. It can also be due to server memory problems, where it doesn't have the memory available to query the database fast enough to hand everything out in time. |
Joined: 5 Jul 11 Posts: 990 Credit: 376,143,149 RAC: 0 |
I thought you'd set a limit, as it's usually precisely 10000 Separation and 1000 N-Body. |
Joined: 10 Apr 19 Posts: 408 Credit: 120,203,200 RAC: 0 |
I believe the server tries to keep at least that many tasks queued up to be ready to send out, but sometimes it increases if the server is struggling. |
Joined: 14 Mar 09 Posts: 1 Credit: 11,717,564 RAC: 0 |
Thanks for all you do |
Joined: 1 Jan 22 Posts: 1 Credit: 28,764,554 RAC: 16,071 |
Server not fixed yet? |
Joined: 10 Apr 19 Posts: 408 Credit: 120,203,200 RAC: 0 |
There is a large number of workunits that are waiting to be validated. I think that this is because the server is running more slowly while it is degraded. When we get the drive back we can restore full functionality of the server. Hopefully that isn't too long, but in the meantime this big drop in RAC might just be how things are at the moment. |
©2024 Astroinformatics Group