Welcome to MilkyWay@home

Server Outages


Jayargh
Joined: 8 Oct 07
Posts: 289
Credit: 3,690,838
RAC: 0
Message 1449 - Posted: 12 Jan 2008, 5:29:23 UTC

I find it interesting that in 3 days under no load the server hasn't crashed... under load it was crashing at least once a day.
Jayargh
Message 1450 - Posted: 12 Jan 2008, 16:43:29 UTC

Well, there goes that theory... the server crashed this morning, unless it was taken down on purpose.
zombie67 [MM]
Joined: 29 Aug 07
Posts: 115
Credit: 258,955,921
RAC: 159
Message 1451 - Posted: 12 Jan 2008, 17:15:15 UTC

Server crashes? I thought they were due to power outages?

Jayargh
Message 1452 - Posted: 12 Jan 2008, 17:24:58 UTC - in response to Message 1451.  
Last modified: 12 Jan 2008, 17:29:41 UTC

Server crashes? I thought they were due to power outages?

When a computer reboots itself, it could be both... Travis never said all the computers in their computer room were having outages; it sounded like just this one. He also used the word "crashed" in the front page news ;)
Emanuel
Joined: 18 Nov 07
Posts: 280
Credit: 2,442,757
RAC: 0
Message 1453 - Posted: 12 Jan 2008, 23:13:20 UTC

The server could even be crashing due to an unstable power supply :)
Crystallize
Joined: 12 Nov 07
Posts: 31
Credit: 123,621
RAC: 0
Message 1456 - Posted: 12 Jan 2008, 23:52:45 UTC
Last modified: 12 Jan 2008, 23:53:45 UTC

Whatever it is, it's obviously a hardware error.

If you turn off the "auto reboot", you should get a BSOD that
gives you some more information about what could be causing it.

A CPU error is not very likely, unless it's due to overheating.
All BOINC projects increase the heat load on the CPU,
which could explain why it crashes more often when the project is on.

However, overheating is quite an unusual reason for a crash,
unless you have overclocked the CPU.

The most common fault, and the one that causes the most vague and varied symptoms,
is actually a faulty graphics card.

The next most common fault is unfortunately the one that is the hardest
to fix, and that is a motherboard failure...
Jayargh
Message 1457 - Posted: 13 Jan 2008, 1:13:51 UTC - in response to Message 1456.  

Whatever it is, it's obviously a hardware error.

If you turn off the "auto reboot", you should get a BSOD that
gives you some more information about what could be causing it.

A CPU error is not very likely, unless it's due to overheating.
All BOINC projects increase the heat load on the CPU,
which could explain why it crashes more often when the project is on.

However, overheating is quite an unusual reason for a crash,
unless you have overclocked the CPU.

The most common fault, and the one that causes the most vague and varied symptoms,
is actually a faulty graphics card.

The next most common fault is unfortunately the one that is the hardest
to fix, and that is a motherboard failure...

Good info, Crystallize... however, I would "hope" Lab Staff knows all this :)
KSMarksPsych
Joined: 9 Sep 07
Posts: 22
Credit: 320,035
RAC: 0
Message 1458 - Posted: 13 Jan 2008, 9:35:55 UTC - in response to Message 1456.  


If you turn off the "auto reboot", you should get a BSOD that
gives you some more information about what could be causing it.



Maybe if it ran Windows... But BOINC software runs under Linux.
Kathryn :o)
Crystallize
Message 1459 - Posted: 13 Jan 2008, 10:03:57 UTC - in response to Message 1458.  



Good info, Crystallize... however, I would "hope" Lab Staff knows all this :)

Perhaps, but it's taking so long, and they don't seem to have a clue.

I usually solve a hardware problem in less than two days, however complicated it may seem. So I thought I'd just give some pointers... :o)

If you turn off the "auto reboot", you should get a BSOD that
gives you some more information about what could be causing it.

Maybe if it ran Windows... But BOINC software runs under Linux.

But there must be some debugging tools for Linux too, right?
Travis
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 30 Aug 07
Posts: 2046
Credit: 26,480
RAC: 0
Message 1467 - Posted: 14 Jan 2008, 2:17:12 UTC - in response to Message 1459.  



Good info, Crystallize... however, I would "hope" Lab Staff knows all this :)

Perhaps, but it's taking so long, and they don't seem to have a clue.

I usually solve a hardware problem in less than two days, however complicated it may seem. So I thought I'd just give some pointers... :o)

If you turn off the "auto reboot", you should get a BSOD that
gives you some more information about what could be causing it.

Maybe if it ran Windows... But BOINC software runs under Linux.

But there must be some debugging tools for Linux too, right?

Unfortunately, our labstaff isn't dedicated to just working with us; they handle all the systems administration for the entire computer science department at RPI, so getting things to work correctly often takes a bit longer than we'd all like. Also unfortunately, we're kind of locked into using them :P
[B@H] Ray
Joined: 27 Dec 07
Posts: 35
Credit: 1,432,926
RAC: 0
Message 1468 - Posted: 14 Jan 2008, 3:23:19 UTC

It looks like we are running again.
Saenger
Joined: 28 Aug 07
Posts: 133
Credit: 12,837,839
RAC: 13,290
Message 1495 - Posted: 15 Jan 2008, 19:12:04 UTC - in response to Message 1468.  

It looks like we are running again.

And again ;)

Is the end of this somewhere in sight?
Greetings from Sänger
Travis
Message 1506 - Posted: 15 Jan 2008, 22:56:35 UTC - in response to Message 1495.  

It looks like we are running again.

And again ;)

Is the end of this somewhere in sight?


I've been bugging labstaff as much as I can :) School is back in session now, so hopefully they're all back from vacations and things like that. I haven't gotten any response from them in the last few days.
banditwolf
Joined: 12 Nov 07
Posts: 2425
Credit: 524,164
RAC: 0
Message 1530 - Posted: 18 Jan 2008, 20:05:38 UTC

Why is this going down almost every day?
Travis
Message 1533 - Posted: 19 Jan 2008, 14:24:45 UTC - in response to Message 1530.  

Why is this going down almost every day?


The guy I've been talking to in labstaff thinks there might be a problem in the kernel. Other than that, I'm really not too sure.
Jayargh
Message 2008 - Posted: 7 Mar 2008, 1:07:54 UTC

Is anyone looking into the recent rash of server crashes?
Travis
Message 2019 - Posted: 7 Mar 2008, 6:35:24 UTC - in response to Message 2008.  

Is anyone looking into the recent rash of server crashes?


I'll let labstaff know about it. Not quite sure what's causing them. There has been a lot of downtime for different servers on campus, so that might be part of the problem. It was up and running smoothly for a good week or two there before the recent bout of outages.
Cori
Joined: 27 Aug 07
Posts: 647
Credit: 27,592,547
RAC: 0
Message 2133 - Posted: 9 Mar 2008, 12:49:09 UTC

Phew, we're back again! :-))))
Lovely greetings, Cori
Philadelphia
Joined: 9 Nov 07
Posts: 131
Credit: 180,454
RAC: 0
Message 2139 - Posted: 9 Mar 2008, 22:48:18 UTC - in response to Message 2133.  

Phew, we're back again! :-))))


That's what we're talking about, more work :)



DoctorNow
Joined: 28 Aug 07
Posts: 146
Credit: 10,276,862
RAC: 0
Message 2146 - Posted: 10 Mar 2008, 19:08:54 UTC

Yippee, we're back.
It was about time. ;-)
Member of BOINC@Heidelberg and ATA!


©2019 Astroinformatics Group