Welcome to MilkyWay@home

Server Trouble

Message boards : News : Server Trouble

Author · Message
Max_Pirx

Joined: 13 Dec 17
Posts: 46
Credit: 2,421,362,376
RAC: 0
Message 72064 - Posted: 19 Mar 2022, 10:47:24 UTC

Well, I did try all sorts of 'gymnastics' (maybe not in that particular sequence, but definitely suspend, restart, etc... all that). This is just ridiculous, there shouldn't be any need for that sort of babysitting. The admins should sit and spend some time to sort the server issues completely and not do a piecemeal job of scratching here and there and hoping that things will sort themselves out.
Mr P Hucker
Joined: 5 Jul 11
Posts: 990
Credit: 376,142,956
RAC: 2
Message 72065 - Posted: 19 Mar 2022, 11:05:34 UTC - in response to Message 72064.  

Well, I did try all sorts of 'gymnastics' (maybe not in that particular sequence, but definitely suspend, restart, etc... all that). This is just ridiculous, there shouldn't be any need for that sort of babysitting. The admins should sit and spend some time to sort the server issues completely and not do a piecemeal job of scratching here and there and hoping that things will sort themselves out.
I don't see why you're getting so upset about it. Join more than one project then you won't even care when one isn't available. It's not like they're supplying your wages or food.
Mr P Hucker
Joined: 5 Jul 11
Posts: 990
Credit: 376,142,956
RAC: 2
Message 72066 - Posted: 19 Mar 2022, 11:07:44 UTC - in response to Message 72058.  

.

But I would say, either way (SSD or HDD), the key is to have enough(!) spares in/for an excellently designed RAID-setup.
So you just have to pull the disk and replace it.
Or let RAID switch by itself to an "online" spare (if present and please more than one).
The recovery will be done through the RAID-system.

This has all been said in previous posts.

It is just a matter of "cash".
So let's donate ...


And "Time" to repair the drives as Tom mentioned as well which can't be donated. Unless the "online spare" can keep itself up to date automatically it still takes "Time" to bring it into the fold and be ready for use no matter how fast or big it is.
It takes no time at all if there are enough disks. If the system isn't loaded to the max, one or two broken disks in a RAID don't cause a problem. And when you put another in, it rebuilds in the background without anyone noticing. I used to run a server with RAID 6, where two disks could fail without a problem. As soon as one failed, I just put another one in. Didn't even have to touch the keyboard or reboot or anything. Slide one out, slide one in.
San-Fernando-Valley

Joined: 13 Apr 17
Posts: 256
Credit: 604,411,638
RAC: 0
Message 72067 - Posted: 19 Mar 2022, 11:54:44 UTC - in response to Message 72066.  

Peter:
+1
San-Fernando-Valley

Joined: 13 Apr 17
Posts: 256
Credit: 604,411,638
RAC: 0
Message 72068 - Posted: 19 Mar 2022, 11:59:05 UTC - in response to Message 72064.  

Max:
Well, ... (maybe not in that particular sequence, but definitely suspend, restart, etc... all that). ...


... the sequence does matter - at least in my cases.

and it has nothing to do with still "unsolved" server issues ...

Just let it sit a while (usually max 20 minutes) and you'll get new WUs.
I know, it could all be better (like at EatH), but that is life ...

Relax, have a beer or two on me !
Have a great Sunday.
Mr P Hucker
Joined: 5 Jul 11
Posts: 990
Credit: 376,142,956
RAC: 2
Message 72069 - Posted: 19 Mar 2022, 12:07:42 UTC - in response to Message 72068.  

Max:
Well, ... (maybe not in that particular sequence, but definitely suspend, restart, etc... all that). ...


... the sequence does matter - at least in my cases.

and it has nothing to do with still "unsolved" server issues ...

Just let it sit a while (usually max 20 minutes) and you'll get new WUs.
I know, it could all be better (like at EatH), but that is life ...

Relax, have a beer or two on me !
Have a great Sunday.
Where is it Sunday?

I just press update on the project and get some. That removes any 3 hour backoff created by my client.
San-Fernando-Valley

Joined: 13 Apr 17
Posts: 256
Credit: 604,411,638
RAC: 0
Message 72070 - Posted: 19 Mar 2022, 12:24:08 UTC - in response to Message 72069.  

Peter:

...
Have a great Sunday.
Where is it Sunday?

I just press update on the project and get some. That removes any 3 hour backoff created by my client.

Well tomorrow is Sunday.

When I press update nothing happens ...
The backoff time first goes up with each try, and after several more repeats it goes down to around 1:30 minutes, and that is it.
Nothing is loaded, so I have to do some fiddling ... (as mentioned before).

Ok, so I'll just say "have a nice day".
Cheers
WMD

Joined: 15 Jun 13
Posts: 15
Credit: 2,069,756,183
RAC: 49,499
Message 72071 - Posted: 19 Mar 2022, 16:53:23 UTC

I'm not getting any workunits either... but I notice on the status page there are 3,591,958 workunits waiting for validation right now. It may be trying to churn through these before it starts handing out new work. It's down by about a million since the new drive was installed, so, if I'm right, it could be a few more days before new units start going out.

I don't mind though... my GPU is just churning through my backup project. :)
Mr P Hucker
Joined: 5 Jul 11
Posts: 990
Credit: 376,142,956
RAC: 2
Message 72072 - Posted: 19 Mar 2022, 16:58:46 UTC - in response to Message 72071.  

It dropped to that a while ago then stopped, and was sending out new work too. Maybe he's paused things to let the disk rebuild get done? I'm thinking these are hamster powered disks he's using.

I tried a couple of other GPU projects and some projects are terrible. SRBase only uses your first GPU. Numberfields doesn't work on 280X cards. Both of these keep giving me GPU work after I told them not to. So I keep aborting them until the server learns its lesson!
WMD

Joined: 15 Jun 13
Posts: 15
Credit: 2,069,756,183
RAC: 49,499
Message 72073 - Posted: 19 Mar 2022, 18:17:00 UTC
Last modified: 19 Mar 2022, 19:08:38 UTC

Hmm, I see that the status page hasn't updated in a few hours. Normally it updates once or twice an hour...

Disk rebuild times obviously vary depending on the disk, but I've seen as low as an hour or two for a 300GB 10k rpm drive, to two days for 4TB 7200rpm. No idea what MW's disk specs are. The validation queue was consistently going up until Tom announced he'd replaced the drive, so the rebuild was probably quick. I get the feeling the validation queue is lower now, but the page simply isn't updating for some reason.

As far as other GPU projects go, I've been pretty happy with MLC@Home. Amicable Numbers also worked well, but that requires a ton of system RAM.

EDIT: Got exactly one new task, just now (19:06 UTC). So it's working, just light-years (heh heh) behind. Lots for it to still catch up on. EDIT 2: And 299 more 90 seconds later! So it's kind of working, here and there.
Luciferius Infernalis Vel Tohu

Joined: 3 May 18
Posts: 7
Credit: 45,954
RAC: 0
Message 72074 - Posted: 19 Mar 2022, 19:20:02 UTC

Hello everyone, hello Tom, how long will the server trouble last? I have done a lot of work for this group, but I haven't got the points for it. I didn't get my points for my latest work because of the server crash!

Please repair the server and give me my work points! Thanks, Tom!
WMD

Joined: 15 Jun 13
Posts: 15
Credit: 2,069,756,183
RAC: 49,499
Message 72075 - Posted: 19 Mar 2022, 19:26:59 UTC - in response to Message 72074.  

Hello everyone, hello Tom, how long will the server trouble last? I have done a lot of work for this group, but I haven't got the points for it. I didn't get my points for my latest work because of the server crash!

Please repair the server and give me my work points! Thanks, Tom!

Please check out the Server Status page. Right now it says "Workunits waiting for validation: 3423883". This is why you don't have your points yet. This number is slowly decreasing now, so you will get your points within the next few days.
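
As a rough cross-check of that "next few days" figure, the two queue sizes quoted in this thread (3,591,958 in the 16:53 UTC post and 3,423,883 in the 19:26 UTC post) can be extrapolated linearly. This is only a sketch: it assumes the status page numbers were current when each post was written and that the validation rate stays constant, and neither is guaranteed. The function name is made up for illustration.

```python
from datetime import datetime

# Queue sizes quoted in the thread. The timestamps are the post times,
# used as an assumed stand-in for when the status page was actually read.
t1, q1 = datetime(2022, 3, 19, 16, 53, 23), 3_591_958
t2, q2 = datetime(2022, 3, 19, 19, 26, 59), 3_423_883


def days_to_drain(t1, q1, t2, q2):
    """Linear extrapolation: days until the queue would reach zero."""
    hours = (t2 - t1).total_seconds() / 3600
    rate_per_hour = (q1 - q2) / hours  # workunits validated per hour
    return q2 / rate_per_hour / 24


print(f"{days_to_drain(t1, q1, t2, q2):.1f} days")
```

Under those assumptions the queue would take roughly two days to clear, which fits "within the next few days". If the server also has to validate newly returned work, the real figure would be longer.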
Luciferius Infernalis Vel Tohu

Joined: 3 May 18
Posts: 7
Credit: 45,954
RAC: 0
Message 72076 - Posted: 19 Mar 2022, 19:44:12 UTC - in response to Message 72075.  

Hallo Tom,

Thank you very much!!! I love the work for the universe! It is fantastic work and I learn much about the cosmos!

Mr P Hucker
Joined: 5 Jul 11
Posts: 990
Credit: 376,142,956
RAC: 2
Message 72077 - Posted: 19 Mar 2022, 19:44:13 UTC - in response to Message 72073.  

Hmm, I see that the status page hasn't updated in a few hours. Normally it updates once or twice an hour...

Disk rebuild times obviously vary depending on the disk, but I've seen as low as an hour or two for a 300GB 10k rpm drive, to two days for 4TB 7200rpm.
My main desktop's data drive, a 4TB 7200rpm (TV, security camera, and software installers), is 3/4 full and only takes 5.5 hours to back up, so a rebuild should be similar. I've never run a server with a 7200 drive! But if he's still running validations and serving us at the same time, the rebuild will be slower.
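
For anyone who wants to put numbers on that comparison, here is a back-of-the-envelope sketch. It assumes decimal terabytes, that the backup ran at a steady sequential rate, and that a rebuild has to write the entire 4TB rather than just the used 3/4 (typical for block-level RAID, but not for every system). The helper names are made up for illustration, and the result is a best case for an idle array, not a prediction for the MW server.

```python
TB = 1e12  # decimal terabyte, in bytes (assumption)


def implied_throughput_mb_s(bytes_copied, hours):
    """Average throughput in MB/s implied by copying this much in this time."""
    return bytes_copied / (hours * 3600) / 1e6


def full_pass_hours(capacity_bytes, mb_per_s):
    """Hours needed to write the whole disk once at a steady rate."""
    return capacity_bytes / (mb_per_s * 1e6) / 3600


# 3/4 of a 4TB drive backed up in 5.5 hours, as described in the post.
rate = implied_throughput_mb_s(0.75 * 4 * TB, 5.5)
print(f"implied rate: {rate:.0f} MB/s")
print(f"full 4TB pass: {full_pass_hours(4 * TB, rate):.1f} hours")
```

So the backup figure implies roughly 150 MB/s, and a full pass over the disk at that rate would take a bit over 7 hours. Anything that competes for the disks (validation, serving work, parity reads from the other drives) pushes the real time above this floor.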

No idea what MW's disk specs are.
Tom seems embarrassed to say! He really ought to get some SSDs.

As far as other GPU projects go, I've been pretty happy with MLC@Home. Amicable Numbers also worked well, but that requires a ton of system RAM.
I love this one because it's the only double precision one, and I have cards very good at that, I bought them for MW on purpose, since I like the science topic.

EDIT: Got exactly one new task, just now (19:06 UTC). So it's working, just light-years (heh heh) behind. Lots for it to still catch up on. EDIT 2: And 299 more 90 seconds later! So it's kind of working, here and there.
If I leave my computers alone, sometimes I spot they've got a full batch of 300 per GPU. If I pester them I get nothing. I've acquired five R9 280X cards now. I love it when they reduce the price from £130 to £50 because they "don't work". No display output? Don't care! Some of the VRAM broken, won't run big programs like Einstein, but MW is ok!
San-Fernando-Valley

Joined: 13 Apr 17
Posts: 256
Credit: 604,411,638
RAC: 0
Message 72079 - Posted: 19 Mar 2022, 21:11:26 UTC - in response to Message 72074.  

L-I-V-T:
RELAX ...
Mr P Hucker
Joined: 5 Jul 11
Posts: 990
Credit: 376,142,956
RAC: 2
Message 72080 - Posted: 19 Mar 2022, 21:15:23 UTC - in response to Message 72079.  

L-I-V-T:
RELAX ...
An Infernal Lucifer cannot relax.
HRFMguy

Joined: 12 Nov 21
Posts: 236
Credit: 575,026,987
RAC: 38,352
Message 72081 - Posted: 20 Mar 2022, 3:14:15 UTC - in response to Message 72080.  

UH OH! trouble in paradise again..

3/19/2022 10:09:24 PM | Milkyway@Home | Reporting 2 completed tasks
3/19/2022 10:09:24 PM | Milkyway@Home | Requesting new tasks for CPU
3/19/2022 10:09:46 PM | Milkyway@Home | Scheduler request failed: Failure when receiving data from the peer
3/19/2022 10:09:47 PM | | Project communication failed: attempting access to reference site
3/19/2022 10:09:48 PM | | Internet access OK - project servers may be temporarily down.
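
If you want to watch for this kind of failure without reading the log by eye, a small script can split the lines and pick out the failed requests. This is a minimal sketch that assumes the "timestamp | project | message" layout seen in the five lines above; the field names and the "failed" keyword check are illustrative choices, not anything defined by the BOINC client.

```python
from datetime import datetime
from typing import NamedTuple


class LogEntry(NamedTuple):
    when: datetime
    project: str  # empty for client-wide messages
    message: str


def parse_line(line: str) -> LogEntry:
    """Split one 'timestamp | project | message' line into its fields."""
    stamp, project, message = (part.strip() for part in line.split("|", 2))
    when = datetime.strptime(stamp, "%m/%d/%Y %I:%M:%S %p")
    return LogEntry(when, project, message)


LOG = """\
3/19/2022 10:09:24 PM | Milkyway@Home | Reporting 2 completed tasks
3/19/2022 10:09:24 PM | Milkyway@Home | Requesting new tasks for CPU
3/19/2022 10:09:46 PM | Milkyway@Home | Scheduler request failed: Failure when receiving data from the peer
3/19/2022 10:09:47 PM | | Project communication failed: attempting access to reference site
3/19/2022 10:09:48 PM | | Internet access OK - project servers may be temporarily down.
"""

entries = [parse_line(line) for line in LOG.splitlines() if line.strip()]
failures = [e for e in entries if "failed" in e.message.lower()]

for e in failures:
    print(e.when.time(), e.project or "(client)", e.message)
```

On the lines above it flags the scheduler request failure and the follow-up communication failure, about 22 seconds after the request was sent.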
WMD

Joined: 15 Jun 13
Posts: 15
Credit: 2,069,756,183
RAC: 49,499
Message 72082 - Posted: 20 Mar 2022, 3:48:46 UTC - in response to Message 72077.  

My main desktop's data drive, a 4TB 7200rpm (TV, security camera, and software installers), is 3/4 full and only takes 5.5 hours to back up, so a rebuild should be similar.

Rebuild times don't match the transfer rate of the drives, especially when rebuilding from parity. I've heard stories of multiple-day rebuilds of very large drives. (Larger than MW probably has or needs.)

I've never run a server with a 7200 drive!

It's actually more common than you think... or at least it was when I worked on that stuff several years back. Most commonly, they were the "capacity" tier of hybrid SAN arrays. One array I maintained had a full 3U tray of 3TB 7.2k disks, and rebuilding one of them once took over a day. They still use 7.2k in file servers, too - heck, AWS even offers them for cloud file servers, if you were to set one up.

But if he's still running validations and serving us at the same time, the rebuild will be slower.

Considering the validations weren't going at all when the failed drive was removed, I'm inclined to think the rebuild finished already. (Unless the validation pace picks up massively in the next several hours, in which case, I guess not!)

Tom seems embarrassed to say! He really ought to get some SSDs.

I doubt MW has the money. I mean, just a few years ago they were begging for money just to keep the research team going. I don't think they have funding for new hardware.

I love this one because it's the only double precision one, and I have cards very good at that, I bought them for MW on purpose, since I like the science topic.

Yeah, I have a Titan V myself... it's a pretty good performer for normal FP32 things, but so are cards a fifth of the price! So for other projects it's at least pretty productive, but still a waste overall.

If I leave my computers alone, sometimes I spot they've got a full batch of 300 per GPU. If I pester them I get nothing.

That's funny... I only got 300 tasks today by pestering it. Otherwise, getting nothing. :D

I've acquired five R9 280X cards now. I love it when they reduce the price from £130 to £50 because they "don't work". No display output? Don't care! Some of the VRAM broken, won't run big programs like Einstein, but MW is ok!

Nice :D the 280X is a pretty good FP64 card, especially at those prices.
Mr P Hucker
Joined: 5 Jul 11
Posts: 990
Credit: 376,142,956
RAC: 2
Message 72083 - Posted: 20 Mar 2022, 3:49:54 UTC

Validation queue is increasing again. Dare I say there is a secondary problem?
Mr P Hucker
Joined: 5 Jul 11
Posts: 990
Credit: 376,142,956
RAC: 2
Message 72084 - Posted: 20 Mar 2022, 4:02:27 UTC - in response to Message 72082.  

Rebuild times don't match the transfer rate of the drives, especially when rebuilding from parity. I've heard stories of multiple-day rebuilds of very large drives. (Larger than MW probably has or needs.)
In my experience it goes as fast as the drive, but I did build overpowered servers so there was plenty of CPU time and drive speed available. Having a $35K budget did help. I obviously made more than one with that, but it all fitted in one cabinet, once I'd released the colleague I'd locked in it for a laugh.

It's actually more common than you think... or at least it was when I worked on that stuff several years back. Most commonly, they were the "capacity" tier of hybrid SAN arrays. One array I maintained had a full 3U tray of 3TB 7.2k disks, and rebuilding one of them once took over a day. They still use 7.2k in file servers, too - heck, AWS even offers them for cloud file servers, if you were to set one up.
At the time I didn't need an enormous capacity, so I chose speed. They got bigger after a couple of years when the originals started failing and larger ones were cheaper. You'd think enterprise drives would be reliable.... I got them replaced under warranty but didn't wait for the replacements, I bought larger ones then sold the replacements. They would be refurbished drives and I didn't want that crap. I wonder if MW bought them from me on Ebay?

Considering the validations weren't going at all when the failed drive was removed, I'm inclined to think the rebuild finished already. (Unless the validation pace picks up massively in the next several hours, in which case, I guess not!)
It's going backwards now.

I doubt MW has the money. I mean, just a few years ago they were begging for money just to keep the research team going. I don't think they have funding for new hardware.
A disk is peanuts compared to the staff wages. He did say recent donations had been good, and he's going to put it on the homepage.

Yeah, I have a Titan V myself... it's a pretty good performer for normal FP32 things, but so are cards a fifth of the price! So for other projects it's at least pretty productive, but still a waste overall.
The only time I'll buy an Nvidia is if it's primarily for gaming, or if there are no DP projects. I detest this shrinking of DP speed. But the Nvidia I was soon going to buy has gone from £800 to £1200! Bitcoins causing a shortage? Surely that's been going on for years now?

That's funny... I only got 300 tasks today by pestering it. Otherwise, getting nothing. :D
I think it's just luck. With 7 computers asking, one of them will notice. When I see loads appear on the screen, I tell the rest to ask. That's why there are none left for you :-P

Nice :D the 280X is a pretty good FP64 card, especially at those prices.
The 7970 is pretty much identical (5% slower) and much cheaper. I find those sometimes.

©2024 Astroinformatics Group