Welcome to MilkyWay@home

More and more failures to connect to server - deja vu


JAMC
Joined: 9 Sep 08
Posts: 96
Credit: 336,443,946
RAC: 0
Message 32548 - Posted: 19 Oct 2009, 12:33:13 UTC

Looks like we are heading back to the ways of old with an overloaded server... more and more often failing to connect for more work, and rigs sitting idle... any plans to upgrade the system?

10/19/2009 7:24:37 AM Project communication failed: attempting access to reference site
10/19/2009 7:24:37 AM Milkyway@home Temporarily failed upload of de_11_1s_const_1_4463570_1255954546_0_0: HTTP error
10/19/2009 7:24:37 AM Milkyway@home Backing off 1 min 0 sec on upload of de_11_1s_const_1_4463570_1255954546_0_0
10/19/2009 7:24:38 AM Internet access OK - project servers may be temporarily down.
10/19/2009 7:24:40 AM Milkyway@home Scheduler request failed: Failure when receiving data from the peer
10/19/2009 7:24:59 AM Project communication failed: attempting access to reference site
10/19/2009 7:24:59 AM Milkyway@home Temporarily failed upload of de_11_1s_const_1_4463569_1255954546_0_0: HTTP error
10/19/2009 7:24:59 AM Milkyway@home Backing off 1 min 0 sec on upload of de_11_1s_const_1_4463569_1255954546_0_0
10/19/2009 7:25:00 AM Milkyway@home update requested by user
10/19/2009 7:25:01 AM Internet access OK - project servers may be temporarily down.
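
For anyone wondering what the "Backing off 1 min 0 sec" lines amount to: the client waits before retrying a failed transfer. The sketch below is not the BOINC client's actual algorithm, only a generic exponential backoff with jitter. The function name `backoff_delays`, the 60 s base delay, and the 1 hour cap are all assumptions chosen for illustration.

```python
import random


def backoff_delays(base_s=60.0, cap_s=3600.0, attempts=6, rng=None):
    """Return a list of retry delays in seconds.

    The ceiling doubles on each attempt (base_s, 2*base_s, 4*base_s, ...)
    up to cap_s. All values here are hypothetical, not BOINC defaults.
    """
    rng = rng or random.Random()
    delays = []
    for attempt in range(attempts):
        ceiling = min(cap_s, base_s * (2 ** attempt))
        # Draw the delay from [ceiling/2, ceiling] so that many clients
        # that failed at the same moment do not all retry in lockstep.
        delays.append(rng.uniform(ceiling / 2, ceiling))
    return delays
```

With the assumed defaults, the first retry waits between 30 and 60 seconds and the sixth between 960 and 1920 seconds. Spreading retries out like this is what keeps a struggling server from being hit by every idle rig at once.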

uBronan
Joined: 9 Feb 09
Posts: 166
Credit: 27,520,813
RAC: 0
Message 32551 - Posted: 19 Oct 2009, 16:37:17 UTC
Last modified: 19 Oct 2009, 16:38:39 UTC

What did you expect ?!
These new ATI cards work through the load so fast that the server simply can't keep up.
So I, being one of the slower ones, end up getting even less ;)
I hate ATI for making such beasts :D
But prepare to see even worse, since more and more people are switching over to the ATI cards or buying the new beasts.
It's new, it's relatively fast... my new bicycle

localizer
Joined: 28 Jan 08
Posts: 40
Credit: 379,931,801
RAC: 0
Message 32555 - Posted: 19 Oct 2009, 16:45:04 UTC

Actually, having had the same problems, I'm pretty sure that with Collatz down over this weekend, a 'few' users have joined/returned... I know I have an extra host on MW because I can't get Collatz WUs...

GalaxyIce
Joined: 6 Apr 08
Posts: 2018
Credit: 100,142,856
RAC: 0
Message 32558 - Posted: 19 Oct 2009, 17:13:37 UTC


What's the difference between people spending their money on a few Core i7 965s overclocked to 4.8GHz or a few ATI cards? Either way you can't keep blaming people for wanting to crunch here.

Crunch, crunch, crunch! ;)




verstapp
Joined: 26 Jan 09
Posts: 589
Credit: 497,834,261
RAC: 0
Message 32561 - Posted: 19 Oct 2009, 20:29:32 UTC
Last modified: 19 Oct 2009, 20:30:40 UTC

What did you expect ?!
These new Pentium Is work through the load so fast it simply can't keep up
Everyone should be made to run the project on a Z-80. </sarky>
OT: My 5870 should arrive later this week...
Cheers,

PeterV

.

GalaxyIce
Joined: 6 Apr 08
Posts: 2018
Credit: 100,142,856
RAC: 0
Message 32563 - Posted: 19 Oct 2009, 21:13:33 UTC - in response to Message 32561.  

What did you expect ?!


I expect that when I risk all and go for ATIs in the early stages at the beginning of the year when this mega crunching was all an unknown and a risky waste of money if they didn't work out, and then you go and buy one 5870 and proceed to grind me into dust as you quantum leap to billions of credits with your thousands of shaders and DDR 368 memory... I expect I might have to join a team where I can keep up :P



verstapp
Joined: 26 Jan 09
Posts: 589
Credit: 497,834,261
RAC: 0
Message 32564 - Posted: 19 Oct 2009, 21:49:58 UTC - in response to Message 32563.  

Or you can just throw more hardware at the problem... :)
Cheers,

PeterV

.

GalaxyIce
Joined: 6 Apr 08
Posts: 2018
Credit: 100,142,856
RAC: 0
Message 32565 - Posted: 19 Oct 2009, 22:33:43 UTC - in response to Message 32564.  

Or you can just throw more hardware at the problem... :)


I could be tempted ;)



Gill..
Joined: 25 Aug 09
Posts: 12
Credit: 179,143,357
RAC: 0
Message 32566 - Posted: 20 Oct 2009, 0:18:41 UTC

So project servers ARE down? Been down since late into the night last night. And, if you're wondering, I got a number of computation errors right before... which is NOT normal...

NICE on the 5870 - you'll have to let me know how it goes. I'm almost at 2 M in a couple of weeks with my 4870.

Got the GD70 with 4 PCIe slots... ohhh, imagine those stacked with 5870s (never could I afford that)...

but maybe 5850s over time!

Also, everyone on my overclock site says x8/x8 lanes don't affect this type of crunching - but they're going off Folding numbers... is that true here too?

My board will do x16/x16 or x8/x8/x8/x8.

blam!

banditwolf
Joined: 12 Nov 07
Posts: 2425
Credit: 524,164
RAC: 0
Message 32567 - Posted: 20 Oct 2009, 0:35:55 UTC - in response to Message 32566.  

So project servers ARE down? Been down since late into the night last night.

No, the server hasn't gone down. I believe that it is so overloaded AGAIN that it can't handle all of the requests and denies some/many. I have had to retry each time I connect to turn in and get work. I can't even get this to post.
Doesn't expecting the unexpected make the unexpected the expected?
If it makes sense, DON'T do it.

uBronan
Joined: 9 Feb 09
Posts: 166
Credit: 27,520,813
RAC: 0
Message 32600 - Posted: 21 Oct 2009, 8:27:55 UTC

Hehehe, yeah, it is overloaded: no access to the site again, and too many fast cards that want even more new units give us the deja vu :D
And yes, Ice, I am also tempted, but with not enough units to work on, it's no good. We need more ATI-assisted projects ;)
So we can overload them all xD

It's new, it's relatively fast... my new bicycle

Brian Silvers
Joined: 21 Aug 08
Posts: 625
Credit: 558,425
RAC: 0
Message 32605 - Posted: 21 Oct 2009, 12:55:15 UTC

Well, it's not a network thing, as I can ping the server just fine when these message board issues are happening. Here's a trace...

Tracing route to milkyway.cs.rpi.edu [128.213.28.20]
over a maximum of 30 hops:

  1    <1 ms    <1 ms    <1 ms  192.168.1.1
  2     *        *        *     Request timed out.
  3    13 ms     9 ms     9 ms  98.172.172.17
  4     9 ms     9 ms     7 ms  98.172.172.6
  5    14 ms    10 ms     7 ms  98.172.172.13
  6    44 ms    20 ms    18 ms  68.1.1.25
  7    22 ms    17 ms    17 ms  vlan99.csw4.Washington1.Level3.net [4.68.17.254]
  8    16 ms    15 ms    16 ms  ae-92-92.ebr2.Washington1.Level3.net [4.69.134.157]
  9    25 ms    19 ms    21 ms  ae-3-3.ebr1.NewYork2.Level3.net [4.69.132.90]
 10    23 ms    20 ms    21 ms  ae-16-51.car2.NewYork2.Level3.net [4.69.138.199]
 11    23 ms    24 ms    23 ms  NYSERNET.car2.NewYork2.Level3.net [4.71.188.38]
 12    25 ms    24 ms    23 ms  vccfr7-39-160.net.rpi.edu [128.113.39.161]
 13    24 ms    25 ms    28 ms  milkyway.cs.rpi.edu [128.213.28.20]
 14    28 ms    25 ms    28 ms  milkyway.cs.rpi.edu [128.213.28.20]
 15    25 ms    31 ms    24 ms  milkyway.cs.rpi.edu [128.213.28.20]

Trace complete.

Odd that it does some multiple hops there on campus listed as the same IP, but eh...routing is not my strong suit...

So the HTTP service is having a problem, but the network layer is OK, or at least it isn't saturated / congested. So it would seem likely to be a problem with database transactions per second. Perhaps Travis should increase the delay setting for no-work queries to the database to 5 minutes... if it's still at 1 minute (or whatever it was set to).
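
The reasoning above (host reachable, web service failing) can be checked in a rough way with a short script. This is only a sketch under assumptions: it opens a TCP connection instead of sending a ping, because ICMP needs raw sockets, and the names `tcp_reachable`, `http_status`, and `classify`, along with the wording of the verdicts, are made up for illustration.

```python
import http.client
import socket
import urllib.error
import urllib.request


def tcp_reachable(host, port=80, timeout=5.0):
    """True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def http_status(url, timeout=10.0):
    """Return the HTTP status code, or None if no valid response arrived."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # the server answered, but with an error status
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        return None  # refused, reset, garbled, or timed out


def classify(tcp_ok, status):
    """Turn the two probe results into a rough, hypothetical verdict."""
    if not tcp_ok:
        return "network or host unreachable"
    if status is None:
        return "port open but no HTTP response: web service overloaded or hung"
    if status >= 500:
        return "HTTP server error: backend may be struggling"
    return "HTTP service responding"
```

Calling `classify(tcp_reachable(host), http_status("http://" + host + "/"))` with the project host would give a one-line verdict. A result of "port open but no HTTP response" would match what the trace suggests, though it cannot by itself prove the database is the bottleneck.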

GalaxyIce
Joined: 6 Apr 08
Posts: 2018
Credit: 100,142,856
RAC: 0
Message 32612 - Posted: 21 Oct 2009, 17:37:29 UTC - in response to Message 32600.  

it's no good. We need more ATI-assisted projects ;)
So we can overload them all xD

It's about time someone had the idea of taking distributed processing a step further and having distributed servers. I'm sure that many would volunteer to use their powerful systems to receive WUs and issue new ones, and perhaps be given credit for WUs received, stored, and moved to the MW server when it is better able to cope, while dispensing new WUs in the meantime.



Brian Silvers
Joined: 21 Aug 08
Posts: 625
Credit: 558,425
RAC: 0
Message 32616 - Posted: 21 Oct 2009, 18:54:22 UTC - in response to Message 32612.  

it's no good. We need more ATI-assisted projects ;)
So we can overload them all xD

It's about time someone had the idea of taking distributed processing a step further and having distributed servers. I'm sure that many would volunteer to use their powerful systems to receive WUs and issue new ones, and perhaps be given credit for WUs received, stored, and moved to the MW server when it is better able to cope, while dispensing new WUs in the meantime.


That would never work because people are too demanding, too fickle, and too unpredictable, not to mention that it would be a logistical nightmare. People would demand more and more credit, or we'd have the CPP people stepping in and saying there was too much credit being issued. People would be fickle, get irritated with a project, and decide they weren't going to do it anymore. You'd also be short of signed-up volunteers who were online and able to do it at any given time. After all that, you'd have to make sure that all tasks were synchronized across multiple systems, especially for a project that depends on incoming work to generate new work. Results on the various "distributed servers" could be stale and no longer needed.

Beyond even those issues, you'd have issues of security. The area of the system would have to be encrypted, with the only people having access being the people at the PROJECT. If the donor of the system were to have access to that system, the science could be compromised and/or user results could be compromised. There'd have to be monstrous audit trails to keep everything tracked. There'd also need to be some sort of mandatory enforcement of antivirus programs and signatures being up to date.

Then you'd have to get down to actual physical specs. A mandatory requirement would be a battery backup with significant runtime. Next, you'd need those systems set up with at least RAID 5 to guard against data loss. The donor would have to be responsible for performing a full backup, probably every day, with hourly incrementals. Redundant power supplies would also be required. Finally, you'd need a broadband connection that provided significant up/down speeds and allowed server-type actions.

Once all is said and done, the hassle of dealing with the general public, the possibility of users compromising the project security or data integrity, and other significant risks to the project along with significant costs for the donor, would make this a non-starter. Even if technology improved drastically, you'd still have the bane of system admins, the users...or in this case the donors, to contend with, and as I said, they're demanding, fickle, and unpredictable...

Best solution == longer tasks for GPUs :-)
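
The arithmetic behind "longer tasks" can be sketched roughly as follows. It assumes each GPU contacts the scheduler once per finished batch of tasks, which is a simplification, and the host count and task lengths used in the note below are made-up numbers, not MilkyWay@home figures. The function name is hypothetical too.

```python
def scheduler_requests_per_second(active_gpus, task_seconds, tasks_per_request=1):
    """Average scheduler requests per second under a simple assumption.

    Each GPU is assumed to ask for work once every
    task_seconds * tasks_per_request seconds.
    """
    if task_seconds <= 0 or tasks_per_request <= 0:
        raise ValueError("task_seconds and tasks_per_request must be positive")
    return active_gpus / (task_seconds * tasks_per_request)
```

With a hypothetical 2000 GPUs finishing 60-second tasks, that comes to about 33.3 requests per second. Stretch the tasks to 600 seconds and the same 2000 GPUs generate about 3.3 requests per second, so a tenfold longer task means roughly a tenfold lighter load on the server for the same amount of crunching.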