Aaargh! Server out of new work!
Author | Message |
---|---|
Send message Joined: 12 Nov 07 Posts: 2425 Credit: 524,164 RAC: 0 |
Definitely! Rosetta has not had work either. I now have no tasks to do. Oh well. Doesn't expecting the unexpected make the unexpected the expected? If it makes sense, DON'T do it. |
Send message Joined: 25 Feb 10 Posts: 49 Credit: 10,137,837 RAC: 0 |
Collatz probably went down due to the mass switching of people from MW and DNETC. |
Send message Joined: 1 Sep 08 Posts: 520 Credit: 302,528,196 RAC: 276 |
Dnetc will be offline for at least another day: during their software update/upgrade one of the hard drives in the RAID array failed, and they are in rebuild mode. I hope they were at least running RAID 5 and not RAID 0, as THAT would be seriously ugly. Ideally, folks running multi-drive RAID arrays use controllers that handle RAID 5 plus a hot spare. The drives are not that expensive, but server-class RAID 5 + hot spare controllers can be a bit pricey. |
Send message Joined: 24 Dec 07 Posts: 1947 Credit: 240,884,648 RAC: 0 |
Wooohoo! I got a full MW cache. :) |
Send message Joined: 4 Oct 08 Posts: 1734 Credit: 64,228,409 RAC: 0 |
Yes, mine are now filling up since I suspended Collatz. Now need to work off the Collatz cache between Milkyway sessions. Go away, I was asleep |
Send message Joined: 24 Dec 07 Posts: 1947 Credit: 240,884,648 RAC: 0 |
Validator needs a kick. It's only validating WUs that are paired. Single WUs are not being validated at all. |
Send message Joined: 12 Aug 09 Posts: 262 Credit: 92,631,041 RAC: 0 |
Well, it is zero again. The validator has "overheated". Time to let my rigs cool down as well (and save some energy costs). Greetings from, TJ |
Send message Joined: 4 Jan 10 Posts: 86 Credit: 51,753,924 RAC: 0 |
f.ck, again... I've got used to a shutdown every weekend, but it's only Tuesday. Come on guys, you must be kidding me - one day of work and then one day off. Could you PLS fix the server??? |
Send message Joined: 20 Sep 08 Posts: 1391 Credit: 203,563,566 RAC: 0 |
"Dnetc may be back and running by the end of the week. The dreaded 'software upgrade' stress tested their server, and they had a 'mid upgrade' RAID drive failure as well as a memory module failure. They are pretty much in a full rebuild mode for now." That is good to hear :-) This place is just unreliable, so what we need is more BOINC ATI projects!!!! Don't drink water, that's the stuff that rusts pipes |
Send message Joined: 1 Sep 08 Posts: 520 Credit: 302,528,196 RAC: 276 |
One thing seems odd to me. The problem appears fairly straightforward (at least the symptoms are pretty obvious and *repetitive*). The workaround (either stopping and starting the processes or a full server stop/restart) also seems reasonably straightforward. So, a couple of questions (although I realize that RPI folks rarely clock in over here): Why does it take so long (12 hours or more) to go from symptom to restart? Wouldn't it be possible to automate the stop/restart process and run it, say, every 48 hours? Since this problem has been going on for months, I figure that, alongside efforts to track down the root cause, a workaround would be 'resource appropriate' and should have been in place by now. |
Send message Joined: 22 Apr 09 Posts: 38 Credit: 27,377,932 RAC: 0 |
That sure would be nice, as there are still very few projects using ATI GPUs. MilkyWay and Collatz C. are the only ones known to use ATI GPUs. At SETI@Home, and also at SETI Beta, a user group, the Lunatics, is testing ATI GPUs for AP computing; they already have some working apps. Here is the latest installer for Fermi GPUs. And you'll find one of the ATI GPU apps as well. Knight Who says Ni |
Send message Joined: 19 Feb 08 Posts: 350 Credit: 141,284,369 RAC: 0 |
"One thing that seems odd to me. The problem appears fairly straightforward (at least the symptoms are pretty obvious and *repetitive*). The workaround resolution (either stop/start processes or a full server stop/restart) also seems reasonably straightforward." It looks like there is something going on at RPI. Do you remember the posting 'Screensaver coming soon'? Or the project DNA@HOME? Milkyway3? It should not be a problem to detect that the validator has stopped validating. And of course they do detect it, because they stop producing WUs. But we all miss the next step, which should be a rework of the validator, be it hardware, software, setup or whatever else. Or at least a quick restart. It looks like nobody is responsible there. This is the best way to kill not only a project but the whole idea of distributed computing. Those responsible for a project should take its issues seriously. Alexander |
Send message Joined: 1 Sep 08 Posts: 520 Credit: 302,528,196 RAC: 276 |
Dnetc, when running, also supports ATI GPUs. They hope to be back up and running again later this week (they hit something close to the worst-case scenario: memory and hard drive failure in the middle of a software upgrade). They are in recovery mode for now. I'm still sort of bemused by the informational (and response time) black hole we often encounter here. To a certain degree (good news/bad news, I suppose) it seems that folks here have become acclimated to the non-response (or delayed response). "That sure would be nice as there are still very few projects using ATI GPUs" |
Send message Joined: 4 Oct 08 Posts: 1734 Credit: 64,228,409 RAC: 0 |
Barry, all the DNETC web pages have returned (home, account and forums). The only bits missing ATM are new work (a few hours yet) and the servers accepting crunched work that is "waiting to report". Go away, I was asleep |
Send message Joined: 20 Aug 10 Posts: 10 Credit: 63,514,783 RAC: 0 |
It really should not be so hard to automatically restart something that fails every day or two. Most MMOs that I have played over the years also have a daily downtime to prevent stuff like this (a totally different application, but I guess the problem is sort of the same). |
Send message Joined: 24 Dec 07 Posts: 1947 Credit: 240,884,648 RAC: 0 |
Memory leak kills the system every 2 to 3 days, therefore reboot every 1 to 2 days until the source of the memory leak is found. Simple really. So reboot every Monday, Wednesday and Friday morning. PS: I think the system is conspiring to slow down my progress towards a major milestone... in this case 100 mill cobblers on MW. Last time it was 100 mill overall. |
Send message Joined: 4 Oct 08 Posts: 1734 Credit: 64,228,409 RAC: 0 |
Work available again Go away, I was asleep |
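
Several posts above suggest automating the restart instead of waiting for someone to notice the stall. A minimal sketch of that decision logic is below. It is only an illustration of the idea from the thread, not how the project's server is actually run: the function name `should_restart`, both thresholds, and the idea of feeding it a "last validation" timestamp are assumptions. How those timestamps are obtained, and how the daemons are actually stopped and started, is left out on purpose.

```python
from datetime import datetime, timedelta

# Both thresholds are assumptions for illustration, not project settings.
STALL_LIMIT = timedelta(hours=1)   # hypothetical: no validations for this long counts as a stall
MAX_UPTIME = timedelta(hours=48)   # the "every 48 hours" preventive restart suggested above


def should_restart(now, last_validation, last_restart,
                   stall_limit=STALL_LIMIT, max_uptime=MAX_UPTIME):
    """Return a reason string if a restart is due, otherwise None."""
    # Symptom-based trigger: the validator has stopped producing results.
    if now - last_validation >= stall_limit:
        return "validator stalled"
    # Preventive trigger: restart on a fixed schedule before the failure can build up.
    if now - last_restart >= max_uptime:
        return "scheduled restart"
    return None


if __name__ == "__main__":
    now = datetime(2010, 1, 5, 12, 0)  # arbitrary example time
    # Healthy: validated 5 minutes ago, restarted 10 hours ago.
    print(should_restart(now, now - timedelta(minutes=5), now - timedelta(hours=10)))
    # Stalled: nothing validated for 3 hours.
    print(should_restart(now, now - timedelta(hours=3), now - timedelta(hours=10)))
    # Still validating, but up for 50 hours.
    print(should_restart(now, now - timedelta(minutes=5), now - timedelta(hours=50)))
```

A check like this could run from a scheduler every few minutes and call whatever stop/start procedure the admins already use by hand. Combining the two triggers covers both suggestions in the thread: react quickly to the symptom, and restart on a fixed schedule until the root cause is found.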