Server Downtime 3/21 1PM EST
Author | Message |
---|---|
Joined: 13 Dec 17 Posts: 46 Credit: 2,421,362,376 RAC: 0 |
Maybe your completed WUs are still in the validation queue, or maybe not... one can only guess. |
Joined: 13 Apr 17 Posts: 256 Credit: 604,411,638 RAC: 0 |
... I just heard a rumor (over the grapevine), that certain tasks are being deliberately deleted (of course before they are validated) ... Which is good, because that way the queue can be cleared of strange tasks from unhappy users ... So I am very worried that soon (in a couple of weeks) all of mine will be gone ... Oh, dear ... To the rest of the crunchers: Have a nice day! |
Joined: 5 Jul 11 Posts: 990 Credit: 376,143,149 RAC: 0 |
... I just heard a rumor (over the grapevine), that certain tasks are being deliberately deleted (of course before they are validated) ... Either your joke was really funny or I've had too much homebrew. |
Joined: 8 May 09 Posts: 3339 Credit: 524,010,781 RAC: 0 |
... I just heard a rumor (over the grapevine), that certain tasks are being deliberately deleted (of course before they are validated) ... I can't imagine the blowback MW, and BOINC as well, would get if MW started deleting tasks just because someone bitched and moaned in a forum!! That alone could set BOINC back 10 years with the upper-level crunchers running 100+ CPU cores and tip-of-the-spear GPUs, let alone here at MW. IOW, I believe this rumor is just BS and you can safely ignore it!! |
Joined: 13 Apr 17 Posts: 256 Credit: 604,411,638 RAC: 0 |
Either your joke was really funny or I've had too much homebrew. ... pass some of that homebrew over ... |
Joined: 13 Apr 17 Posts: 256 Credit: 604,411,638 RAC: 0 |
Mikey and Peter: ... that was a joke ... Enjoy life!! Please ... |
Joined: 5 Jul 11 Posts: 990 Credit: 376,143,149 RAC: 0 |
Are you quite quite sure? It's 23% alcohol. Not sure how legal it is. Not sure what the shipping cost would be (assuming you aren't in the UK)? Either your joke was really funny or I've had too much homebrew. EDIT: You're in Germany. It costs £6 per litre to send to you. Assuming your alcohol in the shops is of a similar price to the UK, you can get the equivalent for £6.50 in the shops. No, we don't use your Euro rubbish over here. Freedom! |
Joined: 24 Dec 07 Posts: 33 Credit: 1,923,330,147 RAC: 3,133 |
No admin has the time to intentionally delete WUs of specific users. And Max_Pirx need not worry. He simply needs to look at his stats at one of the stats sites, like BOINCStats, and see he is indeed earning cobblestones for this project. https://www.boincstats.com/stats/61/user/detail/1307417/lastDays |
Joined: 28 Feb 22 Posts: 16 Credit: 2,400,538 RAC: 0 |
The following thread from Universe@home says their project would have failed with HDDs instead of their SSDs. Perhaps MilkyWay@home's server should switch (at least partially) to SSDs? https://universeathome.pl/universe/forum_thread.php?id=627 The thread also explains that people are coming from the paused WCG project. I am one of those (sorry!). I have put my CPUs towards the "N-Body Simulation" here. |
Joined: 5 Jul 11 Posts: 990 Credit: 376,143,149 RAC: 0 |
The following thread from Universe@home says their project would have failed with HDDs instead of their SSDs. Perhaps MilkyWay@home's server should switch (at least partially) to SSDs? The SSD was invented a decade ago, I can't believe anyone is still using hard disks for servers! |
Joined: 31 Mar 12 Posts: 96 Credit: 152,502,225 RAC: 0 |
The following thread from Universe@home says their project would have failed with HDDs instead of their SSDs. Perhaps MilkyWay@home's server should switch (at least partially) to SSDs? The SSD was invented a decade ago, I can't believe anyone is still using hard disks for servers! I can think of a reason or 2. Big data sets are one of them, and cost is the other. HDDs have a much lower cost per TB stored than SSDs. Also, if you're using consumer SSDs, their rated endurance is quite low for a server environment, e.g. 960GB WD Enterprise SSD: https://www.newegg.com/western-digital-gold-960gb/p/20-250-139 vs a 1TB consumer SSD https://www.newegg.com/western-digital-1tb-black-sn850-nvme/p/N82E16820250161 The 1 TB consumer SSD is rated for 600 TBW, while the enterprise drive is rated for 1.4 PBW. If the server is running ZFS, it would be a good idea to put an L2ARC on an SSD; it'll increase performance without too much cost. |
Joined: 5 Jul 11 Posts: 990 Credit: 376,143,149 RAC: 0 |
I can think of a reason or 2. Yeah, 5 times cheaper and 50 times slower. Not a reasonable choice. Also if you're using consumer SSDs their rated endurance is quite low in a server environment. eg 960GB WD Enterprise SSD: https://www.newegg.com/western-digital-gold-960gb/p/20-250-139 vs a 1TB consumer SSD https://www.newegg.com/western-digital-1tb-black-sn850-nvme/p/N82E16820250161 Enterprise SSDs are not much more expensive. Rosetta is on them, Universe (a small low-budget project) is on them, Sidock (another small low-budget project) is on them. By the time the SSDs wear out, there will be ones 5 times better available anyway. |
Joined: 10 Apr 19 Posts: 408 Credit: 120,203,200 RAC: 0 |
Hey all, I'm going to address a bunch of recent comments: am guessing you'll keep server stats(*) exports on a 12-hourly export while the server is running a bit behind? I haven't made any changes to the frequency of the status page updates. I think that some of the DB queries needed to update that page take a long time when there are a lot of stale WUs in the system, like when the transitioner backlog is large. This can affect how often the page updates, especially when I'm killing slow DB tasks (which might be related to updating that page). Come on Tom, please admit what century this server was made in. Rebuilds should be a couple of hours. List some specs, I dare you. It looks like 4 SCSI SMC3108 HDDs running in RAID 5. The rebuild is taking so long because the DB is constantly updating. The current server hardware was purchased 3 years ago, I believe. But I want to know why Tom is using outdated equipment and why he won't reveal what the specs are. A server which is this slow is absurd. I'm not hiding some conspiracy, and I'm not embarrassed about the hardware specs. I don't have control over what hardware is on the server, although I am able to relay problems to my supervisor. I have been working closely with them regarding this most recent drive failure. Well, it's the other way around (more or less) - we (at least some of us) are spending our money/resources to help with their research and it's only fair to get the best possible use of our resources and not waste our money. Thank you very much for volunteering! We certainly appreciate it. Your time and effort are not going to waste - we recently published work that came from the N-Body application, and we are working hard on publishing recent results from the Separation application. These things just take a lot of time and effort on our end that can't really be crowd-sourced.
Regarding the updates, I tend to make them somewhat vague because I figure that people don't want to read tech jargon. I can be more specific if people would prefer that. In the past I've given overviews and then gone in-depth on some topics, so I can try to do that more as well. Right now we are just hoping that when the drive finishes rebuilding and the current backlog of tasks clears, we will return to service "as normal". However, if that is not the case (if it takes a long time for things to get back to normal), then we will have to figure out what is causing these problems. I'm not sure of the exact cause of a lot of it - I figured that it was related to the drive failure - but it could be something else. If and when we decide to take more action, I will communicate that to you. I'm not too upset, I can just turn Einstein/Universe on as well to keep everything doing something. But I agree that communication is nice. I'd love to know the specs of the server (Rosetta has them openly displayed on the webpage), what device failed, how much it would cost to buy something better, and if we should perhaps have a whip-round for some cash to get it. Clearly this server is just on the limit of managing, and with one disk missing it collapses in a heap. Faster hardware would make everything run smoothly all the time. It's disappointing that the server appears to be fragile enough that this drive failure caused such a large issue. We are trying to discuss ways of avoiding this in the future, but it is not a conversation that has a fast turnaround time. But where is the science output from the completed tasks in the last 3-4 weeks? A single Separation run takes several months to complete, and then the data has to be analyzed, sometimes more runs need to go up, and then we have to write the publication, get it peer reviewed, and put out a press release, all while teaching/taking classes and working on other research projects.
The point is, 3-4 weeks is not a reasonable amount of time in which to expect scientific feedback. We do have someone working on updating the science pages, because those are very far behind. I'm not sure how far along that process is, though. According to the server stats, nearly 4 million tasks got validated recently (at least they disappeared from the validation pending stock), but did not materialize as valid tasks with credit... where are they? Did you end up getting the credit? I sure hope so; otherwise I would want to figure something out to make sure that you all get the credit you deserve. Enterprise SSDs are not much more expensive. Rosetta is on them, Universe (a small low-budget project) is on them, Sidock (another small low-budget project) is on them. We're looking into purchasing SSDs, although it would still cost several thousand dollars to upgrade. There's no guarantee that this will go anywhere, though. |
Joined: 10 Apr 19 Posts: 408 Credit: 120,203,200 RAC: 0 |
I'd like to remind everyone that I'm not in charge of this project, I don't control funding or hardware, or even do a substantial amount of programming for the project. I'm just a grad student who works on the project, but my thesis isn't actually on MilkyWay@home Separation. I just happen to be the most public-facing person in the group, so it seems that I'm in charge of a lot more than I actually am. I come from a physics background, and while I have IT/programming experience, I am not adept at dealing with a lot of the server bugs. Additionally, I am only able to dedicate ~10% or less of my time to the server and the project, so I apologize when work takes a long time to get done. In my opinion, we need to hire a graduate student or other staff member to work on this project full time. That's what the volunteers deserve. Most of the time I'm just trying to keep the server working so that the science can still get done. I don't want this to sound like I'm making excuses! I just want people to remember that I'm not trying to hide anything, defraud you of your time, or have any insincere goals. I'm just another overworked grad student trying to do what they can to help things move along. :) Apologies for the long stint of problems that we've had lately, and I'll try to be more communicative about specifics moving forward so that we can work on these issues together. |
Joined: 13 Apr 17 Posts: 256 Credit: 604,411,638 RAC: 0 |
Thanks for the info, Tom! ... What do you mean by this? You probably won't switch to SSDs? Well, you can count me in on donating! |
Joined: 8 Nov 11 Posts: 205 Credit: 2,900,464 RAC: 0 |
Thanks for that explanation, really appreciated. |
Joined: 28 May 17 Posts: 76 Credit: 4,398,910,125 RAC: 13 |
Would shutting the project down for a day (or however long it takes) help speed up the rebuilding process? It seems to me you could just stop sending out work, collect all the work currently out, and rebuild the new hard drive much faster than by trying to keep things going while it rebuilds. You could even go as far as taking the project/server completely offline once all outstanding work has been returned, letting the server rebuild without the DB constantly changing. |
Joined: 13 Apr 17 Posts: 256 Credit: 604,411,638 RAC: 0 |
Tom: You are doing fine. Thanks for your time and efforts. We (well, at least I) appreciate it. But it is good that you reminded the crunchers how the situation actually is, what your part is, and what you have to deal with. As we say here: You are sitting between two chairs. Cheers - |
Joined: 5 Jul 11 Posts: 990 Credit: 376,143,149 RAC: 0 |
I'd like to remind everyone that I'm not in charge of this project, I don't control funding or hardware, or even do a substantial amount of programming for the project. I'm just a grad student who works on the project, but my thesis isn't actually on MilkyWay@home Separation. I just happen to be the most public-facing person in the group, so it seems that I'm in charge of a lot more than I actually am. OK, thanks for letting us know. I thought you were the head guy; it does say "Project administrator, Project developer, Project tester, Project scientist" against your username! Several thousand sounds like a lot for SSDs. What's the storage capacity of the RAID set? |
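The RAID 5 arithmetic behind the capacity question, and behind the explanation that the rebuild is slow because the DB keeps writing, can be sketched in a few lines of Python. This is a minimal sketch assuming a textbook RAID 5 layout (one XOR parity block per stripe). The 2 TB drive size is a hypothetical placeholder, since the thread never states it, and the function names are illustrative, not anything running on the MilkyWay@home server.

```python
from functools import reduce
from typing import Sequence


def raid5_usable_capacity(num_drives: int, drive_size_tb: float) -> float:
    """Usable space of a RAID 5 set: one drive's worth of capacity holds parity."""
    if num_drives < 3:
        raise ValueError("RAID 5 needs at least 3 drives")
    return (num_drives - 1) * drive_size_tb


def xor_blocks(blocks: Sequence[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing_block(surviving_blocks: Sequence[bytes]) -> bytes:
    """Recover the failed drive's block in one stripe.

    The parity block is the XOR of the stripe's data blocks, so XORing all
    surviving blocks (data and parity) reproduces the missing one. A full
    rebuild repeats this for every stripe, reading every surviving drive.
    """
    return xor_blocks(surviving_blocks)


# Hypothetical 4-drive array with 2 TB drives (drive size not given in the thread).
print(raid5_usable_capacity(4, 2.0))  # 6.0

# One stripe of a 4-drive array: three data blocks plus their parity block.
d0, d1, d2 = b"\x01\x02", b"\x10\x20", b"\xff\x00"
parity = xor_blocks([d0, d1, d2])

# Simulate losing the drive that held d1 and rebuilding it from the others.
recovered = rebuild_missing_block([d0, d2, parity])
print(recovered == d1)  # True
```

Because rebuilding each stripe means reading every surviving drive, ordinary database reads and writes compete with the rebuild for the same disks, which fits the point that the constantly updating DB is stretching out the rebuild.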
©2024 Astroinformatics Group