Welcome to MilkyWay@home

Server Crash November 10

Message boards : Number crunching : Server Crash November 10

John Clark
Joined: 4 Oct 08
Posts: 1734
Credit: 64,228,409
RAC: 0
Message 33222 - Posted: 12 Nov 2009, 1:05:32 UTC
Last modified: 12 Nov 2009, 1:06:50 UTC

Something is up with Collatz, as the Home page cannot be displayed.

Servers over there down because of the MW crunchers transferring, perhaps?

As exemplified by the current /BOINC Manager/Messages tab/ -

12/11/2009 00:55:27 Collatz Conjecture update requested by user
12/11/2009 00:55:29 Collatz Conjecture Sending scheduler request: Requested by user.
12/11/2009 00:55:29 Collatz Conjecture Reporting 1 completed tasks, not requesting new tasks
12/11/2009 00:55:51 Project communication failed: attempting access to reference site
12/11/2009 00:55:52 Internet access OK - project servers may be temporarily down.
12/11/2009 00:55:54 Collatz Conjecture Scheduler request failed: Failure when receiving data from the peer
Go away, I was asleep


ID: 33222 · Rating: 0

Dan T. Morris
Joined: 17 Mar 08
Posts: 165
Credit: 410,228,216
RAC: 0
Message 33223 - Posted: 12 Nov 2009, 1:47:13 UTC
Last modified: 12 Nov 2009, 1:48:03 UTC

Yup, we broke it. Darn it all. I guess I will have to resort to PrimeGrid and Rosetta. Hey, what the heck, if I am going to have bad credits I might just as well pick the worst ones [in credits]... smile...

DD,
ID: 33223 · Rating: 0

Kevint
Joined: 22 Nov 07
Posts: 285
Credit: 1,076,786,368
RAC: 0
Message 33224 - Posted: 12 Nov 2009, 2:58:35 UTC
Last modified: 12 Nov 2009, 2:59:22 UTC

You could give RSA Lattice Siever a shot: 50% of WUs report as invalid, and there are no message boards to report the problems.
.
ID: 33224 · Rating: 0

Zanth
Joined: 18 Feb 09
Posts: 158
Credit: 110,699,054
RAC: 0
Message 33225 - Posted: 12 Nov 2009, 4:56:30 UTC
Last modified: 12 Nov 2009, 4:57:47 UTC

Distributed.net also has a beta ATI client that seems to work pretty well. Unlike F@H it works on ATI HD5xxx cards with no stupid workarounds that yield less than stellar results. :P Off to Collatz with me tho til this gets sorted then I'll be back full force. :) Gotta keep the FLOPS up so I can make the top 100 list when it gets updated. :D
ID: 33225 · Rating: 0

Terry Stick
Joined: 5 Sep 09
Posts: 3
Credit: 267,349
RAC: 0
Message 33226 - Posted: 12 Nov 2009, 6:42:27 UTC - in response to Message 33225.  

Travis:
I read in the forums recently that you had a number of hard drives damaged due to construction around campus. To reduce the damage to the hard drives, have you looked at shock-mounted external hard drives or shock mounted hard drive internal cases?
See link:
http://www.quietpcusa.com/Smart-Damper-Hard-Disk-Drive-Antivibration-Mount-P525C3.aspx

Anti-vibration hard drive cases will protect your new hard drives on order.
ID: 33226 · Rating: 0

Beyond
Joined: 15 Jul 08
Posts: 383
Credit: 729,293,740
RAC: 0
Message 33227 - Posted: 12 Nov 2009, 6:55:33 UTC - in response to Message 33226.  

Travis:
I read in the forums recently that you had a number of hard drives damaged due to construction around campus. To reduce the damage to the hard drives, have you looked at shock-mounted external hard drives or shock mounted hard drive internal cases?

Anti-vibration hard drive cases will protect your new hard drives on order.

How about sitting the whole server on a shock absorbing foam pad?
ID: 33227 · Rating: 0

GalaxyIce
Joined: 6 Apr 08
Posts: 2018
Credit: 100,142,856
RAC: 0
Message 33229 - Posted: 12 Nov 2009, 7:55:04 UTC - in response to Message 33227.  

Travis:
I read in the forums recently that you had a number of hard drives damaged due to construction around campus. To reduce the damage to the hard drives, have you looked at shock-mounted external hard drives or shock mounted hard drive internal cases?

Anti-vibration hard drive cases will protect your new hard drives on order.

How about sitting the whole server on a shock absorbing foam pad?

How about putting the whole thing on a cruise liner and sailing it down to the doldrums :)


ID: 33229 · Rating: 0

Nuadormrac
Joined: 11 Sep 08
Posts: 22
Credit: 9,081,761
RAC: 0
Message 33231 - Posted: 12 Nov 2009, 11:40:32 UTC - in response to Message 33222.  

Something is up with Collatz, as the Home page cannot be displayed.

Servers over there down because of the MW crunchers transferring, perhaps?

As exemplified by the current /BOINC Manager/Messages tab/ -

12/11/2009 00:55:27 Collatz Conjecture update requested by user
12/11/2009 00:55:29 Collatz Conjecture Sending scheduler request: Requested by user.
12/11/2009 00:55:29 Collatz Conjecture Reporting 1 completed tasks, not requesting new tasks
12/11/2009 00:55:51 Project communication failed: attempting access to reference site
12/11/2009 00:55:52 Internet access OK - project servers may be temporarily down.
12/11/2009 00:55:54 Collatz Conjecture Scheduler request failed: Failure when receiving data from the peer


It's all on the server status page. Results in progress and results ready to send are both 0, and the transitioner backlog is over 300,000. Until there's a transitioner up and running, it's as dead as a doornail.
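
For anyone who would rather not eyeball the Messages tab, here is a minimal Python sketch that picks the failure lines out of a saved copy of a log like the one quoted above. It is only an illustration: the function name find_failures, the regular expression, and the list of failure markers are my own assumptions based on the six quoted lines, not anything provided by BOINC itself, and the timestamp format is assumed to be DD/MM/YYYY as in that excerpt.

```python
import re
from datetime import datetime

# Assumed line shape, taken from the quoted excerpt: "DD/MM/YYYY HH:MM:SS <text>"
LINE_RE = re.compile(r"^(\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2}) (.*)$")

# Hypothetical markers: just the two failure phrases seen in the quoted log.
FAILURE_MARKERS = ("Project communication failed", "Scheduler request failed")


def find_failures(log_text):
    """Return (timestamp, message) pairs for lines containing a failure marker."""
    failures = []
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip blank or differently shaped lines
        stamp = datetime.strptime(match.group(1), "%d/%m/%Y %H:%M:%S")
        message = match.group(2)
        if any(marker in message for marker in FAILURE_MARKERS):
            failures.append((stamp, message))
    return failures


SAMPLE = """
12/11/2009 00:55:27 Collatz Conjecture update requested by user
12/11/2009 00:55:29 Collatz Conjecture Sending scheduler request: Requested by user.
12/11/2009 00:55:29 Collatz Conjecture Reporting 1 completed tasks, not requesting new tasks
12/11/2009 00:55:51 Project communication failed: attempting access to reference site
12/11/2009 00:55:52 Internet access OK - project servers may be temporarily down.
12/11/2009 00:55:54 Collatz Conjecture Scheduler request failed: Failure when receiving data from the peer
"""

if __name__ == "__main__":
    for stamp, message in find_failures(SAMPLE):
        print(stamp.isoformat(), message)
```

A log with failures but an "Internet access OK" line in between, as here, points at the project servers rather than your own connection, which matches what the server status page shows.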
ID: 33231 · Rating: 0

Nuadormrac
Joined: 11 Sep 08
Posts: 22
Credit: 9,081,761
RAC: 0
Message 33232 - Posted: 12 Nov 2009, 11:47:10 UTC - in response to Message 33227.  

Travis:
I read in the forums recently that you had a number of hard drives damaged due to construction around campus. To reduce the damage to the hard drives, have you looked at shock-mounted external hard drives or shock mounted hard drive internal cases?

Anti-vibration hard drive cases will protect your new hard drives on order.

How about sitting the whole server on a shock absorbing foam pad?


But then some member of the janitorial staff might forget to leave the air conditioner on in the server closet :p Something along those lines happened at one of the colleges I went to, where no a/c meant it hit something like 140 degrees F in one of the server closets. Umm, the HDs were never quite the same.
ID: 33232 · Rating: 0

BarryAZ
Joined: 1 Sep 08
Posts: 520
Credit: 302,524,931
RAC: 2
Message 33254 - Posted: 13 Nov 2009, 22:01:58 UTC

OK -- server crash on November 10 -- the home page indicates 'hopefully replacement drives in a day or two'. Now I realize that 'a day or two' is what I refer to as 'airport time' -- you encounter a flight with delayed boarding, are told that boarding will happen in 20 minutes, you come back in 20 minutes and are told, 'we told you before, boarding will happen in 20 minutes' -- check back in 20 minutes.

So, I realize the 'day or two' is one of those where, 3 days later with no additional report, one asks about status, and I fully expect that sometime within the next 24 hours I'll get the 'day or two' response again. That notwithstanding, is there anything along the lines of a better estimate now, three days after the crash? Are we talking two or three more days, or one week, or (like a reverse of the Wrath of Khan movie) did days mean weeks?
ID: 33254 · Rating: 0

TomaszPawel
Joined: 9 Nov 08
Posts: 41
Credit: 92,786,635
RAC: 0
Message 33263 - Posted: 14 Nov 2009, 11:10:24 UTC - in response to Message 33254.  

Relax! Crunch CC until MW works again!

http://www.youtube.com/watch?v=9j3DbsdmbOU
A proud member of the Polish National Team

COME VISIT US at Polish National Team FORUM

ID: 33263 · Rating: 0

Chris S
Joined: 20 Sep 08
Posts: 1391
Credit: 203,563,566
RAC: 0
Message 33264 - Posted: 14 Nov 2009, 12:15:52 UTC

Yep, relax is the best way at the moment. The Boss-man has one of the most important goals in his life to deal with, kit is on order, it has to be delivered and fitted, and got up and running. Personally I'm not expecting any new work till next mid-week at the earliest.

I'm happy to wait while I start to set up my second 4850 and give it a run out on Collatz.....


Don't drink water, that's the stuff that rusts pipes
ID: 33264 · Rating: 0

GalaxyIce
Joined: 6 Apr 08
Posts: 2018
Credit: 100,142,856
RAC: 0
Message 33265 - Posted: 14 Nov 2009, 12:27:37 UTC - in response to Message 33264.  
Last modified: 14 Nov 2009, 12:28:01 UTC

Yep, relax is the best way at the moment. The Boss-man has one of the most important goals in his life to deal with, kit is on order, it has to be delivered and fitted, and got up and running. Personally I'm not expecting any new work till next mid-week at the earliest.

I'm happy to wait while I start to set up my second 4850 and give it a run out on Collatz.....


Relax? I don't think so. It may be quite handy for some that MW is down so they can relax and sunbathe and stuff, but there is a race under way in Prime Grid right now. How you can relax when the race is on is quite amazing...

ID: 33265 · Rating: 0

[PST]Howard
Joined: 31 Aug 07
Posts: 21
Credit: 21,004,179
RAC: 0
Message 33266 - Posted: 14 Nov 2009, 14:10:43 UTC - in response to Message 33254.  

OK -- server crash on November 10 -- the home page indicates 'hopefully replacement drives in a day or two'. Now I realize that 'a day or two' is what I refer to as 'airport time' -- you encounter a flight with delayed boarding, are told that boarding will happen in 20 minutes, you come back in 20 minutes and are told, 'we told you before, boarding will happen in 20 minutes' -- check back in 20 minutes.


I think you've spent too much time in the Airport, it's time to take a Taxi into BOINC city, go to a Bar and have a cool beer.

How many times have you ordered something and been told it will arrive in 2 days, but you end up waiting a week or more?

I would rather Milkyway be down for another 2 weeks, giving time for Travis to get things running properly before we all hammer the server.

There's plenty of work out there on other projects, that's the beauty of BOINC.

ID: 33266 · Rating: 0

banditwolf
Joined: 12 Nov 07
Posts: 2425
Credit: 524,164
RAC: 0
Message 33267 - Posted: 14 Nov 2009, 15:03:29 UTC - in response to Message 33266.  

OK -- server crash on November 10 -- the home page indicates 'hopefully replacement drives in a day or two'. Now I realize that 'a day or two' is what I refer to as 'airport time' -- you encounter a flight with delayed boarding, are told that boarding will happen in 20 minutes, you come back in 20 minutes and are told, 'we told you before, boarding will happen in 20 minutes' -- check back in 20 minutes.


I think you've spent too much time in the Airport, it's time to take a Taxi into BOINC city, go to a Bar and have a cool beer.

How many times have you ordered something and been told it will arrive in 2 days, but you end up waiting a week or more?

I would rather Milkyway be down for another 2 weeks, giving time for Travis to get things running properly before we all hammer the server.

There's plenty of work out there on other projects, that's the beauty of BOINC.

I think some of us are very doubtful that it will be quick and run well when it gets going. Past experience has shown how things tend to work here. I don't mind waiting longer either if it means a better & proper setup. And the "2 days" probably means 2 months to me especially with no communication since. And I guess that something will go wrong once the server is running again.

@ Travis: With a new HD, can the results finally stay for 12 or 24 hours?
Doesn't expecting the unexpected make the unexpected the expected?
If it makes sense, DON'T do it.
ID: 33267 · Rating: 0

BarryAZ
Joined: 1 Sep 08
Posts: 520
Credit: 302,524,931
RAC: 2
Message 33269 - Posted: 14 Nov 2009, 15:45:20 UTC - in response to Message 33267.  
Last modified: 14 Nov 2009, 15:50:48 UTC

And that's certainly fair. If the notice on the home page had said, instead of 'one or two days' (false hope that I think most of us knew was a form of optimistic denial, like a cancer patient hoping that his cancer has been cured), something more scientifically accurate such as 'no less than two or three days, and quite likely two or three weeks', I'd have appreciated it. False hope, on top of the lack of communication we've experienced here, is in my view, shall we say, a suboptimal way of handling information. Posting the more likely scenario would not have taken any more effort and would be less likely to bother users.




I would rather Milkyway be down for another 2 weeks, giving time for Travis to get things running properly before we all hammer the server.

There's plenty of work out there on other projects, that's the beauty of BOINC.


I process MW using CPUs -- so yes, there are plenty of other projects that are getting the cycle time. Those that are using ATI GPUs do have ONE current alternative. I'd love there to be more. For CUDA GPUs there are currently two running alternatives (as SETI rarely has work and Einstein is still in an inefficient beta).

Further, if the new DA-dictated credit scheme for BOINC takes hold, it will likely act as a significant disincentive to future BOINC GPU implementations.

But I'm sure you know that. Perhaps you are also engaged in a bit of optimistic denial.
ID: 33269 · Rating: 0

BarryAZ
Joined: 1 Sep 08
Posts: 520
Credit: 302,524,931
RAC: 2
Message 33270 - Posted: 14 Nov 2009, 16:05:13 UTC - in response to Message 33267.  


I think you've spent too much time in the Airport, it's time to take a Taxi into BOINC city, go to a Bar and have a cool beer.



Actually, I spent a bit too much time coping with projects where optimism trumps accuracy (or honesty) regarding project status. Also, too much time coping with projects where information neglect trumps regular communications. It tends to sour the beer.


I think some of us are very doubtful that it will be quick and run well when it gets going. Past experience has shown how things tend to work here. I don't mind waiting longer either if it means a better & proper setup. And the "2 days" probably means 2 months to me especially with no communication since. And I guess that something will go wrong once the server is running again.

@ Travis: With a new HD, can the results finally stay for 12 or 24 hours?


I think you may have the more realistic assessment here -- certainly the 1 or 2 days optimism NEVER HAD A CHANCE of happening. I hope my two or three week extrapolation is more accurate than your 2 months fear, but you may be right.

It is good to know that I'm not alone in seeing the handling of the current crash, and of the accuracy of information about it, as a demonstration of a project in ongoing trouble.

The life cycle of many BOINC projects is such that I'm glad to have other BOINC projects which have more staying power (less credit per CPU though). For me, those projects have included Climate, Spinhenge, POEM, Einstein, Rosetta (though they are really low on the credit/cycle scale). I also do some processing with SETI (though not too much -- too close to the BOINC Czar), and Malaria. For GPU work, my options are Collatz (and I hope Slicker stays interested there), and GPUGrid.

I suppose my view of the various BOINC projects is informed by over eight and a half years of this sort of support.

ID: 33270 · Rating: 0

GalaxyIce
Joined: 6 Apr 08
Posts: 2018
Credit: 100,142,856
RAC: 0
Message 33273 - Posted: 14 Nov 2009, 18:39:12 UTC - in response to Message 33269.  

Those that are using ATI GPUs do have ONE current alternative. I'd love there to be more.

Yes, wouldn't we all. But how many alternatives were there when I invested in ATI cards in the early days, not so far back at the beginning of this year?

NONE. ZILCH.

No other projects. Just MilkyWay and no promise at all that there would be any other ATI projects.

It's lucky that we have Collatz, and unlucky that I can't hammer MW right now ;)


ID: 33273 · Rating: 0

BarryAZ
Joined: 1 Sep 08
Posts: 520
Credit: 302,524,931
RAC: 2
Message 33279 - Posted: 14 Nov 2009, 20:46:17 UTC - in response to Message 33273.  

Oh, I agree with you -- credit to MW for (notwithstanding the Berkeley vantage point) encouraging the development and use of ATI GPUs. Credit also (at least early on) for bucking the Berkeley-based credit control approach.

But projects are in some ways organisms, they have life cycles, the life cycle does vary from project to project, but over the past 8 months or so, it seems that MW has moved from youthful energy to middle age decline in terms of how the project responds to events and for that matter how it communicates.


Those that are using ATI GPUs do have ONE current alternative. I'd love there to be more.

Yes, wouldn't we all. But how many alternatives were there when I invested in ATI cards in the early days, not so far back at the beginning of this year?

NONE. ZILCH.

No other projects. Just MilkyWay and no promise at all that there would be any other ATI projects.

It's lucky that we have Collatz, and unlucky that I can't hammer MW right now ;)


ID: 33279 · Rating: 0

GalaxyIce
Joined: 6 Apr 08
Posts: 2018
Credit: 100,142,856
RAC: 0
Message 33280 - Posted: 14 Nov 2009, 22:05:08 UTC - in response to Message 33279.  

But projects are in some ways organisms, they have life cycles, the life cycle does vary from project to project, but over the past 8 months or so, it seems that MW has moved from youthful energy to middle age decline in terms of how the project responds to events and for that matter how it communicates.


It's very hard to hide the disappointment. I want to be grateful that MW provided a project I wanted to crunch in, but I wonder how it could be set adrift and now be down a black hole with no server.

Hopefully we can have new hard drives in a day or two and get things back up and running.


Yes hopefully. Which day or two was it?


ID: 33280 · Rating: 0

©2024 Astroinformatics Group