Welcome to MilkyWay@home

Server Trouble

Mr P Hucker
Joined: 5 Jul 11
Posts: 990
Credit: 376,142,956
RAC: 2
Message 72040 - Posted: 18 Mar 2022, 19:05:43 UTC - in response to Message 72039.  
Last modified: 18 Mar 2022, 19:16:50 UTC

Was it a mirror/RAID and having one less drive lowered the read speed sufficiently to cause this problem?
The server is in RAID, although I don't remember the actual RAID setup. Losing this drive meant that the server was constantly rebuilding the data from the missing drive off of the working disks, which ate up a lot of memory overhead.
I didn't realise there was extra work from doing that, I guess there's some processing to be done to create the data from parity. This means it must be RAID and not just a mirror. The servers I've run have never been ones that do heavy processing and data storage on the same system, so I never had this problem. I'm guessing you don't have the cash to have several servers!

You mention memory overhead, maybe a memory upgrade is also in order.

Why wasn't there a spare? Drives are not that expensive.
I guess this drive was the spare, since the server is in RAID. We are now considering purchasing more drives and running in a higher RAID level (thanks to recent donations!)
Changing from RAID 5 to RAID 6 for example means that 2 drives can fail before armageddon. However you can also have a "spare" drive that's sat there not even powered up, but if one fails, the server automatically powers that up and uses it, as though you'd plugged it in as a replacement.
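To put rough numbers on the RAID 5 versus RAID 6 trade-off, here is a minimal sketch. The drive count and drive size are made up for illustration (the thread never states the server's actual layout), and `raid_summary` is a hypothetical helper, not part of any RAID tool.

```python
def raid_summary(level: int, drives: int, drive_tb: float) -> dict:
    """Usable capacity and tolerated drive failures for RAID 5 or RAID 6.

    Assumes equal-sized drives. A hot spare is not counted in `drives`:
    it adds no capacity and only shortens the time spent degraded.
    """
    parity_drives = {5: 1, 6: 2}  # RAID 5: single parity, RAID 6: double parity
    if level not in parity_drives:
        raise ValueError("this sketch only covers RAID 5 and RAID 6")
    p = parity_drives[level]
    if drives < p + 2:  # RAID 5 needs at least 3 drives, RAID 6 at least 4
        raise ValueError("not enough drives for this RAID level")
    return {"usable_tb": (drives - p) * drive_tb, "failures_tolerated": p}


if __name__ == "__main__":
    # Hypothetical array: six 4 TB drives.
    print(raid_summary(5, 6, 4.0))
    print(raid_summary(6, 6, 4.0))
```

With six hypothetical 4 TB drives, RAID 5 gives 20 TB usable and survives one failure, while RAID 6 gives 16 TB usable and survives two.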

Glad to hear the donations worked well. That really should be on the home page, I never knew it existed.

And are they SSDs or HDDs?
I believe they are HDDs. However, I could definitely be wrong about that. I didn't build the server, so I'm not super aware of what hardware it has besides what I can gather from the command line.
You can't interrogate the hardware that way? If they're not SSDs, no wonder things are slow. How big are they? SSDs are pretty cheap now, around 20 cents a GB for server-grade SSDs or NVMe drives.
ID: 72040 · Rating: 0
nairb

Joined: 17 Feb 09
Posts: 24
Credit: 3,430,768
RAC: 58
Message 72041 - Posted: 18 Mar 2022, 19:27:44 UTC

The server page is all green and the feeder is green, but 2 of my machines still get:

18/03/2022 19:24:18 | Milkyway@Home | Server error: feeder not running

Is it sit and wait time?
ID: 72041 · Rating: 0
Tom Donlon
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: 10 Apr 19
Posts: 408
Credit: 120,203,200
RAC: 0
Message 72042 - Posted: 18 Mar 2022, 19:28:06 UTC - in response to Message 72040.  

I plan on putting it on the home page at some point, I have no idea why the donation page isn't there in the first place!

Also, I did end up finding a way to identify HDD vs SSD without installing additional tools, and the server drives are HDDs.
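The post does not say which method was used. One common way on Linux, sketched here as an assumption, is to read the kernel's `rotational` flag under `/sys/block/<device>/queue/`: `1` means a spinning disk, `0` means non-rotational storage. `drive_types` is a hypothetical helper name. Behind a hardware RAID controller the flag describes the virtual device the controller exposes, so it may not reflect the physical drives.

```python
from pathlib import Path


def drive_types(sys_block: str = "/sys/block") -> dict:
    """Map each block device name to "HDD" or "SSD" using the sysfs rotational flag.

    Returns an empty dict if the sysfs directory does not exist (e.g. not Linux).
    Devices without a queue/rotational file are skipped.
    """
    root = Path(sys_block)
    if not root.is_dir():
        return {}
    result = {}
    for dev in sorted(root.iterdir()):
        flag = dev / "queue" / "rotational"
        if flag.is_file():
            # "1" = rotational (HDD); "0" = non-rotational (SSD, NVMe, also loop/ram devices)
            result[dev.name] = "HDD" if flag.read_text().strip() == "1" else "SSD"
    return result


if __name__ == "__main__":
    for name, kind in drive_types().items():
        print(f"{name}: {kind}")
```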
ID: 72042 · Rating: 0
Tom Donlon
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: 10 Apr 19
Posts: 408
Credit: 120,203,200
RAC: 0
Message 72043 - Posted: 18 Mar 2022, 19:29:36 UTC

Also, yes it is sit and wait time. The new drive is plugged in, but the server will have to recreate the faulty drive from parity, and that will take a little time.
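As a rough illustration of what "recreate the faulty drive from parity" means, here is a toy sketch that assumes single XOR parity in the style of RAID 5. The thread never states the server's actual RAID level, and real arrays work on striped blocks with rotating parity rather than on whole strings, so treat this only as the underlying idea. `xor_blocks` is a hypothetical helper.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data "drives" holding one block each, plus one parity block.
data = [b"milk", b"yway", b"home"]
parity = xor_blocks(data)

# Drive 1 fails. Its block is the XOR of every surviving block and the parity.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)
assert rebuilt == data[1]
```

The same calculation explains why a degraded array is slow: every read that touches the missing drive has to read all the other drives and recompute the block, and a full rebuild does this for the whole disk.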
ID: 72043 · Rating: 0
Mr P Hucker
Joined: 5 Jul 11
Posts: 990
Credit: 376,142,956
RAC: 2
Message 72044 - Posted: 18 Mar 2022, 19:32:46 UTC - in response to Message 72042.  

I plan on putting it on the home page at some point, I have no idea why the donation page isn't there in the first place!

Also, I did end up finding a way to identify HDD vs SSD without installing additional tools, and the server drives are HDDs.
If the donations are enough, I would suggest at some point more memory and SSDs would cheer it up. Looks like it was only just managing before the disk failed?
ID: 72044 · Rating: 0
Jimbocous
Joined: 7 Mar 20
Posts: 22
Credit: 104,728,608
RAC: 13,041
Message 72045 - Posted: 18 Mar 2022, 19:35:34 UTC

Maybe just me, but if I were putting together a server and speed isn't the primary consideration I'd be using HDs rather than SSDs just due to the disparity in expected life span.
ID: 72045 · Rating: 0
Mr P Hucker
Joined: 5 Jul 11
Posts: 990
Credit: 376,142,956
RAC: 2
Message 72046 - Posted: 18 Mar 2022, 19:50:54 UTC - in response to Message 72045.  
Last modified: 18 Mar 2022, 19:56:54 UTC

Maybe just me, but if I were putting together a server and speed isn't the primary consideration I'd be using HDs rather than SSDs just due to the disparity in expected life span.
Clearly with this server the HDDs were only just managing if one dying made it this bad.

The life span may be less on SSDs, but it's more measurable, they tell you exactly when they're going to wear out.

It all depends on the capacity required, the speed required, and the funding available.

Universe@Home and SIDock@home recently changed over to SSD. Rosetta has had SSDs for a while - 72 of them!!!
ID: 72046 · Rating: 0
Tom George
Joined: 29 Dec 21
Posts: 7
Credit: 8,995,805
RAC: 0
Message 72048 - Posted: 18 Mar 2022, 21:13:09 UTC - in response to Message 72045.  
Last modified: 18 Mar 2022, 21:14:16 UTC

Maybe just me, but if I were putting together a server and speed isn't the primary consideration I'd be using HDs rather than SSDs just due to the disparity in expected life span.


I agree with you... Maybe some SSD for cache drives (depending on if you're using a SAN or not) but I would devote the storage to HDD.
ID: 72048 · Rating: 0
GolfSierra

Joined: 11 Mar 22
Posts: 42
Credit: 21,902,543
RAC: 0
Message 72049 - Posted: 18 Mar 2022, 21:25:56 UTC

Just received another bunch of WUs and the completed ones were uploaded.

We're making progress here!

Thanks, Tom!
ID: 72049 · Rating: 0
Mr P Hucker
Joined: 5 Jul 11
Posts: 990
Credit: 376,142,956
RAC: 2
Message 72051 - Posted: 18 Mar 2022, 21:41:06 UTC - in response to Message 72048.  

Maybe just me, but if I were putting together a server and speed isn't the primary consideration I'd be using HDs rather than SSDs just due to the disparity in expected life span.


I agree with you... Maybe some SSD for cache drives (depending on if you're using a SAN or not) but I would devote the storage to HDD.
You have to if you have a lot of access. Mechanical stuff don't cut it no more.

Rosetta:
Primary file server:
72 x 1TB SSD via LSI SAS 9207-8i
ID: 72051 · Rating: 0
Keith Myers
Joined: 24 Jan 11
Posts: 696
Credit: 540,034,665
RAC: 86,706
Message 72052 - Posted: 19 Mar 2022, 0:10:36 UTC

Still waiting for all the work today to start validating.
ID: 72052 · Rating: 0
poppinfresh99

Joined: 28 Feb 22
Posts: 16
Credit: 2,400,538
RAC: 0
Message 72053 - Posted: 19 Mar 2022, 1:14:20 UTC - in response to Message 72024.  

Independent of what?

The projects listed as independent at the "Choosing BOINC projects" page do not have a university or corporate sponsor. While it may seem that supporting non-sponsored projects is "helping the underdog", corporations and universities have standards for both the research and the experts employed to do the research. This mostly guarantees that your time is not being wasted on a project that has either or both of (1) bad code or analysis or (2) code that could be sped up over 50 times (Collatz project for example). Also, sponsored projects have a means of using or publishing their results, whereas the results of independent projects may be gone when their project's website is deleted some years later.
ID: 72053 · Rating: 0
Cavalary
Joined: 23 Aug 11
Posts: 33
Credit: 11,062,253
RAC: 0
Message 72054 - Posted: 19 Mar 2022, 1:59:18 UTC

Woop, could report completed tasks at least. Thank you for the effort put into it!
Still not receiving new work though; it says GPU work is available, but I'm not getting anything for CPU.
ID: 72054 · Rating: 0
Mr P Hucker
Joined: 5 Jul 11
Posts: 990
Credit: 376,142,956
RAC: 2
Message 72055 - Posted: 19 Mar 2022, 8:28:34 UTC

Ah, all work gone back and new stuff come out and half a million less queued on the server to validate, looks like it's catching up well.
ID: 72055 · Rating: 0
Mr P Hucker
Joined: 5 Jul 11
Posts: 990
Credit: 376,142,956
RAC: 2
Message 72056 - Posted: 19 Mar 2022, 8:29:39 UTC - in response to Message 72053.  

Independent of what?

The projects listed as independent at the "Choosing BOINC projects" page do not have a university or corporate sponsor. While it may seem that supporting non-sponsored projects is "helping the underdog", corporations and universities have standards for both the research and the experts employed to do the research. This mostly guarantees that your time is not being wasted on a project that has either or both of (1) bad code or analysis or (2) code that could be sped up over 50 times (Collatz project for example). Also, sponsored projects have a means of using or publishing their results, whereas the results of independent projects may be gone when their project's website is deleted some years later.
I've heard about Collatz, yet nobody has produced this "better alternative" to the calculations.
ID: 72056 · Rating: 0
San-Fernando-Valley

Joined: 13 Apr 17
Posts: 256
Credit: 604,411,638
RAC: 0
Message 72057 - Posted: 19 Mar 2022, 9:07:52 UTC - in response to Message 72046.  

Maybe just me, but if I were putting together a server and speed isn't the primary consideration I'd be using HDs rather than SSDs just due to the disparity in expected life span.
Clearly with this server the HDDs were only just managing if one dying made it this bad.

The life span may be less on SSDs, but it's more measurable, they tell you exactly when they're going to wear out.

It all depends on the capacity required, the speed required, and the funding available.

Universe@Home and SIDock@home recently changed over to SSD. Rosetta has had SSDs for a while - 72 of them!!!


Reading newer articles on "SSD vs. HDD" lifespans is quite interesting. Things have changed in the past years.

Repairing an HDD is expensive and (usually) takes time, and you lose the data on it, which is "ok" if you have a "good" RAID system.

But I would say, either way (SSD or HDD), the key is to have enough(!) spares in/for an excellently designed RAID-setup.
So you just have to pull the disk and replace it.
Or let RAID switch by itself to an "online" spare (if present and please more than one).
The recovery will be done through the RAID-system.

This has all been said in previous posts.

It is just a matter of "cash".
So let's donate ...
ID: 72057 · Rating: 0
mikey
Joined: 8 May 09
Posts: 3315
Credit: 519,946,492
RAC: 22,331
Message 72058 - Posted: 19 Mar 2022, 10:27:17 UTC - in response to Message 72057.  

.

But I would say, either way (SSD or HDD), the key is to have enough(!) spares in/for an excellently designed RAID-setup.
So you just have to pull the disk and replace it.
Or let RAID switch by itself to an "online" spare (if present and please more than one).
The recovery will be done through the RAID-system.

This has all been said in previous posts.

It is just a matter of "cash".
So let's donate ...


And "Time" to repair the drives, as Tom mentioned, which can't be donated. Unless the "online spare" can keep itself up to date automatically, it still takes "Time" to bring it into the fold and have it ready for use, no matter how fast or big it is.

Now yes, a different server setup, with one server for tasks being sent out and one for tasks being returned, should speed up the project, and that CAN be done through donations!! BUT (and yes, there is always one) it depends on the amount of space etc. allotted for the MilkyWay servers. Remember, SETI had size limitations because they were relegated to a "small closet" before they shut down.
ID: 72058 · Rating: 0
Max_Pirx

Joined: 13 Dec 17
Posts: 46
Credit: 2,421,362,376
RAC: 0
Message 72060 - Posted: 19 Mar 2022, 10:33:53 UTC

Still no regular supply of WUs for me, despite the server status being all green and plenty of 'available' WUs to send. Very intermittent supply of a few tens of WUs to a couple of hosts. I was hoping that by now the server would have been sorted. It's getting to the point of being really annoying, and I'm thinking about moving to a different project altogether.
ID: 72060 · Rating: 0
San-Fernando-Valley

Joined: 13 Apr 17
Posts: 256
Credit: 604,411,638
RAC: 0
Message 72062 - Posted: 19 Mar 2022, 10:41:25 UTC - in response to Message 72060.  

Max:
I solved it (you have probably already tried this without success) by doing the following for the project:
no new tasks
suspend
exit boinc
wait at least 3 minutes
restart boinc
set new tasks allowed
resume
Maybe it will help your case ....
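For anyone who prefers the command line, the steps above can be sketched with `boinccmd`, the BOINC command-line tool. This is only a sketch under assumptions: the project URL below is a placeholder and must match the URL your client is attached with (`boinccmd --get_project_status` lists it), the helper names are made up, and restarting the client after the wait is left as a manual step because it depends on how BOINC is installed. By default the script only prints the commands.

```python
import subprocess

# Assumption: replace with the exact project URL your client is attached to.
PROJECT_URL = "https://milkyway.cs.rpi.edu/milkyway/"


def project_cmd(op: str, url: str = PROJECT_URL) -> list:
    """Build a boinccmd project operation (nomorework, suspend, allowmorework, resume)."""
    return ["boinccmd", "--project", url, op]


def before_restart(url: str = PROJECT_URL) -> list:
    """No new tasks, suspend the project, then tell the client to exit."""
    return [project_cmd("nomorework", url), project_cmd("suspend", url), ["boinccmd", "--quit"]]


def after_restart(url: str = PROJECT_URL) -> list:
    """Allow new tasks again and resume the project."""
    return [project_cmd("allowmorework", url), project_cmd("resume", url)]


def run(commands: list, dry_run: bool = True) -> None:
    """Print each command, or execute it when dry_run is False."""
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run(before_restart())
    print("# wait at least 3 minutes, then start the BOINC client again by hand")
    run(after_restart())
```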
ID: 72062 · Rating: 0
San-Fernando-Valley

Joined: 13 Apr 17
Posts: 256
Credit: 604,411,638
RAC: 0
Message 72063 - Posted: 19 Mar 2022, 10:45:45 UTC - in response to Message 72058.  

mikey:
Time is a stretchable "thing".
Sometimes it can mean

several minutes or

an hour only or maybe

days or even

weeks.

I wonder how long it took to repair this HDD?
ID: 72063 · Rating: 0

©2024 Astroinformatics Group