Welcome to MilkyWay@home

Aaargh! Server out of new work!


Message boards : Number crunching : Aaargh! Server out of new work!

banditwolf
Joined: 12 Nov 07
Posts: 2425
Credit: 524,164
RAC: 0
Message 41347 - Posted: 9 Aug 2010, 18:19:35 UTC

Work is up now.
Doesn't expecting the unexpected make the unexpected the expected?
If it makes sense, DON'T do it.

John Clark
Joined: 4 Oct 08
Posts: 1734
Credit: 64,228,409
RAC: 0
Message 41348 - Posted: 9 Aug 2010, 19:39:31 UTC

Sighing with relief, and giving the raw patch at the back of my throat time to get better.

Where is my voice?
Go away, I was asleep


Dirk Broer
Joined: 11 Dec 09
Posts: 17
Credit: 53,837,363
RAC: 3,417
Message 41353 - Posted: 10 Aug 2010, 9:00:37 UTC

Feeder is not running

10-08-2010 10:51:55 Milkyway@home Sending scheduler request: Requested by user.
10-08-2010 10:51:55 Milkyway@home Reporting 2 completed tasks, requesting new tasks for GPU
10-08-2010 10:51:57 Milkyway@home Scheduler request completed: got 0 new tasks
10-08-2010 10:51:57 Milkyway@home Message from server: Server error: feeder not running

Werkstatt
Joined: 19 Feb 08
Posts: 350
Credit: 134,485,269
RAC: 11,604
Message 41354 - Posted: 10 Aug 2010, 10:06:58 UTC - in response to Message 41353.  

I've sent an email to the staff that the servers are down.

Alexander

Dirk Broer
Joined: 11 Dec 09
Posts: 17
Credit: 53,837,363
RAC: 3,417
Message 41355 - Posted: 10 Aug 2010, 10:47:03 UTC - in response to Message 41354.  

So did I, and the feeder works again!

John Clark
Joined: 4 Oct 08
Posts: 1734
Credit: 64,228,409
RAC: 0
Message 41356 - Posted: 10 Aug 2010, 11:33:09 UTC

I presume a third request to the admins for a server reboot will not go amiss. The validator has a backlog of 34,273 workunits waiting for validation, so the work shortage must have started a while ago.
Go away, I was asleep


Cartoonman
Joined: 10 Dec 09
Posts: 18
Credit: 9,455,410
RAC: 340
Message 41357 - Posted: 10 Aug 2010, 11:45:49 UTC

Well, no new WUs again. I just finished my batch...

I was wondering why there were so many WUs waiting to report...

David Glogau*
Joined: 12 Aug 09
Posts: 172
Credit: 645,240,165
RAC: 0
Message 41359 - Posted: 10 Aug 2010, 11:52:18 UTC

Since I live on the other side of the planet, I think the project admin should give me a big RED button to push whenever this happens, so the servers reset.

At least three of my babies switch automatically over to the backup project now.

Old man
Joined: 8 Mar 09
Posts: 192
Credit: 10,868,615
RAC: 0
Message 41360 - Posted: 10 Aug 2010, 12:03:25 UTC - in response to Message 41357.  

Well, no new WUs again. I just finished my batch...

I was wondering why there were so many WUs waiting to report...


I don't know about other users, but I've switched to dnetc@home. I have a cache of 12 tasks waiting to run. I will run them once this project is running again, but not before.

John Clark
Joined: 4 Oct 08
Posts: 1734
Credit: 64,228,409
RAC: 0
Message 41361 - Posted: 10 Aug 2010, 12:16:24 UTC

I moved over to Collatz, as the ATI HD3850 crunches DNETC incredibly slowly compared to either Milkyway (preferred) or Collatz (backup).

Now awaiting the results of the e-mails asking the project admins to reboot the servers.
Go away, I was asleep


Chris S
Joined: 20 Sep 08
Posts: 1387
Credit: 186,726,858
RAC: 0
Message 41362 - Posted: 10 Aug 2010, 12:30:10 UTC

I'm already crunching elsewhere with backup projects! Micro management sucks though....

Don't drink water, that's the stuff that rusts pipes

mdhittle*
Joined: 25 Jun 10
Posts: 284
Credit: 260,490,091
RAC: 0
Message 41365 - Posted: 10 Aug 2010, 12:40:58 UTC

The reliability of this project is getting almost as bad as SETI. I managed to get about 12 work units 30 minutes ago, then it stopped again.

Werkstatt
Joined: 19 Feb 08
Posts: 350
Credit: 134,485,269
RAC: 11,604
Message 41367 - Posted: 10 Aug 2010, 14:06:09 UTC - in response to Message 41365.  

The reliability of this project is getting almost as bad as SETI. I managed to get about 12 work units 30 minutes ago, then it stopped again.


Have you ever tried orbit@home or lhc@home?
That could change your mind!

For your GPU Collatz Conjecture could be a backup project.

Alexander

mdhittle*
Joined: 25 Jun 10
Posts: 284
Credit: 260,490,091
RAC: 0
Message 41368 - Posted: 10 Aug 2010, 14:42:16 UTC - in response to Message 41367.  


Have you ever tried orbit@home or lhc@home?
That could change your mind!

For your GPU Collatz Conjecture could be a backup project.

Alexander


I haven't tried orbit@home, and I gave up on lhc@home a while ago.

As far as Collatz goes, after I installed my second ATI 5970 card, all Collatz will do for me is lock up my system. I wish I could get it to run. Evidently it doesn't like an i7 980x cpu, Win7 64bit, and 2 ATI 5970 cards.

Mike..

Werkstatt
Joined: 19 Feb 08
Posts: 350
Credit: 134,485,269
RAC: 11,604
Message 41369 - Posted: 10 Aug 2010, 15:00:33 UTC - in response to Message 41368.  


As far as Collatz goes, after I installed my second ATI 5970 card, all Collatz will do for me is lock up my system. I wish I could get it to run. Evidently it doesn't like an i7 980x cpu, Win7 64bit, and 2 ATI 5970 cards.

Mike..


Mike,
Collatz likes i7, Win64 and 2 ATI cards. As you can see, my main system has a similar configuration, except that I do not have two 5970s but one 5830 and one 4870, and 'only' 8 threads. And Collatz works fine.

But when I take a look at your computers, I cannot find one with ATI GPUs. There are two listed with NVIDIA.
Maybe you have a more basic problem?

Alexander

mdhittle*
Joined: 25 Jun 10
Posts: 284
Credit: 260,490,091
RAC: 0
Message 41371 - Posted: 10 Aug 2010, 15:25:22 UTC - in response to Message 41369.  
Last modified: 10 Aug 2010, 15:27:06 UTC


Mike,
Collatz likes i7, Win64 and 2 ATI cards. As you can see, my main system has a similar configuration, except that I do not have two 5970s but one 5830 and one 4870, and 'only' 8 threads. And Collatz works fine.

But when I take a look at your computers, I cannot find one with ATI GPUs. There are two listed with NVIDIA.
Maybe you have a more basic problem?

Alexander


That's strange. When I look at my computers, the first one listed at the top is the system I am talking about, with the 2 ATI 5970 cards. I see 10 systems when I go to my list of computers.

Mike...

BarryAZ
Joined: 1 Sep 08
Posts: 519
Credit: 283,151,643
RAC: 523
Message 41372 - Posted: 10 Aug 2010, 15:42:01 UTC

An automatic pre-emptive stop/start of the server (or server processes) is something of a brute force *work-around* which doesn't deal with what appears to be a root cause problem that could use some analysis and resolution efforts.

Back in the day when Travis was more closely involved, pleas here for that sort of corrective action seemed to have more effect.

Seemingly at this point, it is more a case of auto-pilot (where the best that can be had is frequent reboots) as the various admins have a lot of other things on their plate in addition to this project.


BarryAZ
Joined: 1 Sep 08
Posts: 519
Credit: 283,151,643
RAC: 523
Message 41373 - Posted: 10 Aug 2010, 15:51:24 UTC - in response to Message 41365.  

Well, not quite -- I mean the approach these days at SETI is a weekly *three day* outage -- preceded by a 12 to 24 hour traffic jam and then followed by a post-outage traffic jam of 12 to 24 hours. I believe the idea was to improve reliability when the outage wasn't going on -- it hasn't yet done that.

So for SETI, what is now in place is a part time project, but their message boards run about 162 hours a week. Rather a fair amount of resource there for message boards it seems to me. SETI moved to close to the bottom of my list rather a long time ago.

I suspect a large part of the problem here is that, to a fair degree, the now *Doctor* Travis has moved on (as is to be expected) and there is no longer the motivational force behind this project.


The reliability of this project is getting almost as bad as SETI. I managed to get about 12 work units 30 minutes ago, then it stopped again.


BarryAZ
Joined: 1 Sep 08
Posts: 519
Credit: 283,151,643
RAC: 523
Message 41374 - Posted: 10 Aug 2010, 16:23:21 UTC - in response to Message 41361.  

For me, Milkyway dropped to my second project simply because I have a flock of GPUs that MW doesn't support (i.e. non-double-precision cards).

With DNetC now available which provides largely the same GPU support as Collatz, MW will drop down to my number three project in terms of TC within a couple of months. It is interesting the sort of reliability that Collatz and Dnetc can provide with quite limited resources (of course with lower user counts).

I agree with you regarding the somewhat finicky nature of Dnetc -- there are clearly some GPU configurations it doesn't play well with (like the dual 5970 ATI's and your 3850), and it can push the cards to a distracting degree -- I can't run Dnet on my primary computer when I am doing even ordinary tasks, compared to Collatz.

I guess we can hope that Travis is able to pass on the torch here to someone at RPI who will be 'invested' in the project as he was in the past. My other hope is that additional 'low end' GPU projects, particularly ATI GPU projects, start showing as well.


I moved over to Collatz, as the ATI HD3850 crunches DNETC incredibly slowly compared to either Milkyway (preferred) or Collatz (backup).

Now awaiting the results of the e-mails asking the project admins to reboot the servers.


Chris S
Joined: 20 Sep 08
Posts: 1387
Credit: 186,726,858
RAC: 0
Message 41375 - Posted: 10 Aug 2010, 16:36:09 UTC

Well, not quite -- I mean the approach these days at SETI is a weekly *three day* outage -- preceded by a 12 to 24 hour traffic jam and then followed by a post-outage traffic jam of 12 to 24 hours. I believe the idea was to improve reliability when the outage wasn't going on -- it hasn't yet done that.


As I understand it, the 3 day outage is to let Nitpicker run on 10 years' worth of results to sift for likely candidates to re-examine. When they tried it in real time, it zonked the servers and the database out. You can't upload or download work for 3 days, but the message boards are only out for 9-12 hours, as they were before.

I suspect a large part of the problem here is that, to a fair degree, the now *Doctor* Travis has moved on (as is to be expected) and there is no longer the motivational force behind this project.


He did say that he would be around but not have as much involvement as before, so you are about right in what you say. The point is that there is DNETC, which gives about 90% of the credits you get here, and also Collatz, which gives about 60%.
That is, of course, when running GPUs.

Talking of GPUs, I have said over and over again that the basic BOINC infrastructure used by the majority of projects was just not designed for the high levels of data throughput that the onslaught of GPU crunching has unleashed. Servers were scoped out to deal with CPU work, and it is not surprising to me at all that all the popular projects are struggling.

If you couple that with a general slow down of the www/Internet due to the world population approaching 7 billion, and the fact that China has nearly 20% of that, and is expanding its web presence at an exponential rate, everything is creaking at the seams.

Will DC survive?

Don't drink water, that's the stuff that rusts pipes

©2020 Astroinformatics Group