Welcome to MilkyWay@home

Down for maintenance?



Message boards : Number crunching : Down for maintenance?

Slicker [TopGun]
Joined: 20 Mar 08
Posts: 46
Credit: 69,382,802
RAC: 0
Message 37502 - Posted: 18 Mar 2010, 17:01:49 UTC

Collatz swamped? Yep. Normal is about 300 concurrent users. This a.m. there were over 700. It seems to have settled back down now, but the feeder is still having trouble keeping up. Once everyone's caches are full, it should be able to handle it.

The "return results immediately" setting that many have turned on to try to keep their MW cache full at 6 WUs per core doesn't help any. Contacting the server after several WUs have completed, versus after each one is done, would reduce the Collatz server load considerably (70% or more!). If a machine has 4 GPUs and each can do a WU in 10 minutes, then with results returned immediately it contacts the server every 2.5 minutes, instead of once every couple of hours (because its Collatz cache holds over a hundred WUs). While the user gets instant gratification, it really pounds on the server, and eventually the server gets overwhelmed.
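A rough sketch of the contact-rate arithmetic above. It assumes every GPU finishes WUs back to back and that the host contacts the server once per report; `contact_interval_minutes` is a hypothetical helper, not part of BOINC, and the batch size of 48 WUs is an illustrative assumption chosen to land on the "once every couple of hours" case:

```python
def contact_interval_minutes(num_gpus, minutes_per_wu, wus_per_report=1):
    """Minutes between server contacts for one host.

    Hypothetical model: all GPUs finish WUs back to back, and the host
    contacts the server once every `wus_per_report` completed WUs
    (wus_per_report=1 approximates "return results immediately").
    """
    # num_gpus WUs finish every minutes_per_wu minutes, so one WU
    # finishes every minutes_per_wu / num_gpus minutes.
    return wus_per_report * minutes_per_wu / num_gpus


MINUTES_PER_DAY = 24 * 60

# 4 GPUs at 10 minutes per WU: report every WU vs. in batches of 48.
immediate = contact_interval_minutes(4, 10)       # 2.5 minutes
batched = contact_interval_minutes(4, 10, 48)     # 120.0 minutes
print(MINUTES_PER_DAY / immediate, MINUTES_PER_DAY / batched)  # 576.0 12.0
```

Under these assumptions, reporting each WU means 576 scheduler contacts per day from this one host, versus 12 when reporting in batches of 48.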

Slicker [TopGun]
Joined: 20 Mar 08
Posts: 46
Credit: 69,382,802
RAC: 0
Message 37503 - Posted: 18 Mar 2010, 17:06:46 UTC - in response to Message 37501.  

"90% of my GPU time should go to MW, so I often just stop Collatz to fetch new work (YES, I know it's not the best way ;) ). So yesterday I wasted about 2 hours of GPU time. Idle sucks :/"

Skysnake, please excuse the out-of-place post: maybe I misunderstand, but how are you giving MW 90% of the GPU with Collatz as another share? I have had no success running Collatz alongside MW, in that Collatz takes over no matter how I set the sharing. I read that it might be a BOINC problem, but whatever it is, Collatz dictates on my machine when it's running. Is that not happening to you?

Thanks.


NP ;)

The problem is very simple. An MW WU takes about 1:40 and a Collatz WU about 5:40.
MW allows 12 WUs max, Collatz about 20 or more. So I always spend more time on Collatz, because whenever a WU is uploaded, a new one is downloaded. Perhaps just changing the preferences would help, but I haven't tried it yet.


If MW takes 1:40 and you have it limited to GPU crunching on a quad core, you can get 6 * 4 = 24 WUs @ 1:40 = ~40 minutes of work. So set the additional work buffer in your network preferences to only 0.03 days (about 43 minutes), and it won't fill several days of Collatz cache when MW is down. Since BOINC processes GPU work in first-in, first-out order, it will only have to process about 7 Collatz WUs (43 minutes at 5:40 each) before it switches back to MW when MW comes back online.

At least, in theory, that is how BOINC should work. ;-)
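The cache arithmetic above can be checked with a short sketch. It assumes MW's cap of 6 WUs per core, run times of 1:40 (MW) and 5:40 (Collatz), and a work buffer measured in days; the function names are hypothetical:

```python
SECONDS_PER_DAY = 24 * 60 * 60


def to_seconds(mm_ss):
    """Convert an 'm:ss' run time such as '1:40' to seconds."""
    minutes, seconds = mm_ss.split(":")
    return int(minutes) * 60 + int(seconds)


def mw_queue_minutes(cpu_cores, mw_runtime, wus_per_core=6):
    """Minutes of GPU work held by a full MW queue, assuming MW allows
    `wus_per_core` WUs per CPU core."""
    return cpu_cores * wus_per_core * to_seconds(mw_runtime) / 60


def collatz_wus_in_buffer(buffer_days, collatz_runtime):
    """Whole Collatz WUs that fit in a work buffer of `buffer_days`."""
    return int(buffer_days * SECONDS_PER_DAY // to_seconds(collatz_runtime))


print(mw_queue_minutes(4, "1:40"))          # quad core: 40.0
print(mw_queue_minutes(2, "1:40"))          # dual core: 20.0
print(collatz_wus_in_buffer(0.03, "5:40"))  # 7
```

A dual core (2 × 6 WUs) gets only half the MW queue of a quad core, which is why a tiny buffer leaves so little slack on such machines.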

55degrees
Joined: 8 Sep 09
Posts: 62
Credit: 61,330,584
RAC: 0
Message 37504 - Posted: 18 Mar 2010, 17:34:15 UTC - in response to Message 37503.  

A good idea to go down to a .00# cache. I have not tried a cache limit that small. However, I am at a .25 cache and have tried resource sharing at 1 for Collatz and 99 for MW, and the result: Collatz still hogged the GPU. I know this is out of place. If I continue to struggle with Collatz domination, I will start another thread. I like Collatz; I'm just trying to get at least a 50-50 split going at first to establish some control over this sharing. Great support, thanks.

Skysnake
Joined: 31 Oct 09
Posts: 20
Credit: 12,074,198
RAC: 0
Message 37505 - Posted: 18 Mar 2010, 17:34:30 UTC - in response to Message 37503.  


I only have an E8400 (a dual core), but @3.8 GHz ;), so I can only have 20 minutes of work. That's not enough. I always want a bit more work in the cache, so that if there are problems, I have something to do. And I also want to crunch for MW most of the time. Your proposal doesn't fix this problem. I would be very happy if I could cache 100+ WUs. With a 5870 it's no problem to finish them in time.

Berserk_Tux
Joined: 2 Jan 08
Posts: 79
Credit: 365,471,675
RAC: 0
Message 37507 - Posted: 18 Mar 2010, 17:50:46 UTC - in response to Message 37505.  

And still no reply from the project.

banditwolf
Joined: 12 Nov 07
Posts: 2425
Credit: 524,164
RAC: 0
Message 37508 - Posted: 18 Mar 2010, 18:17:37 UTC - in response to Message 37507.  

And still no reply from the project.


I wouldn't expect one for a bit yet. Over the last year it seems to take longer and longer to get any reply, most times.
Doesn't expecting the unexpected make the unexpected the expected?
If it makes sense, DON'T do it.

The Gas Giant
Joined: 24 Dec 07
Posts: 1947
Credit: 240,884,648
RAC: 0
Message 37511 - Posted: 18 Mar 2010, 18:51:06 UTC

Has anyone PM'd the admins?

Beyond
Joined: 15 Jul 08
Posts: 383
Credit: 602,433,172
RAC: 143,425
Message 37512 - Posted: 18 Mar 2010, 18:54:44 UTC - in response to Message 37504.  

The problem is due to:

1) BOINC's FIFO (first-in, first-out) method of processing GPU work.

2) MilkyWay's limit of 6 WUs per CPU core, which is also applied to machines with GPUs.

To make scheduling work correctly, one or both of the above policies has to change. IMO both policies are poor decisions.
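A toy simulation can illustrate why FIFO plus a per-project WU cap lets Collatz dominate. This is a sketch under stated assumptions, not BOINC's actual scheduler: one GPU runs WUs strictly in FIFO order, each project keeps a fixed number of WUs queued (12 for MW on a dual core, about 20 for Collatz, per the numbers earlier in the thread), and each finished WU is replaced by a new one from the same project. `gpu_time_shares` is a hypothetical name:

```python
from collections import deque


def gpu_time_shares(caps, runtimes, horizon_s):
    """Fraction of GPU time each project gets over `horizon_s` seconds.

    Hypothetical model (not BOINC's real scheduler): one GPU runs WUs in
    strict FIFO order, each project keeps exactly `caps[project]` WUs
    queued, and every finished WU is replaced by a new WU from the same
    project, appended to the back of the queue.
    """
    queue = deque()
    left = dict(caps)
    # Interleave the initial downloads so neither project starts ahead.
    while any(left.values()):
        for project in caps:
            if left[project] > 0:
                queue.append(project)
                left[project] -= 1

    busy = {project: 0 for project in caps}
    clock = 0
    while clock < horizon_s:
        project = queue.popleft()          # oldest queued WU runs next
        run = min(runtimes[project], horizon_s - clock)
        busy[project] += run
        clock += run
        queue.append(project)              # replacement WU joins the back
    return {project: busy[project] / horizon_s for project in caps}


# MW: 12 WUs at 100 s each; Collatz: 20 WUs at 340 s each.
# 8000 s is exactly one full rotation of the queue.
shares = gpu_time_shares({"MW": 12, "Collatz": 20},
                         {"MW": 100, "Collatz": 340},
                         horizon_s=8000)
print(shares)  # {'MW': 0.15, 'Collatz': 0.85}
```

In this model resource share never enters the calculation, so even a 1/99 split would not change the outcome; only the caps and run times do.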

bob
Joined: 12 Apr 09
Posts: 15
Credit: 199,626,534
RAC: 157,300
Message 37517 - Posted: 18 Mar 2010, 20:32:20 UTC - in response to Message 37511.  

It's Spring Break! Maybe no one is at Castle Greyskull; they're all in FLA.

John Clark
Joined: 4 Oct 08
Posts: 1734
Credit: 64,228,409
RAC: 0
Message 37519 - Posted: 18 Mar 2010, 21:10:50 UTC
Last modified: 18 Mar 2010, 21:11:27 UTC

Although Collatz is working, it's very slow, and it was down for some time earlier.

I seem to be finding that Einstein is also down, or having problems pulling down new work, with downloads running at 10.5 kbps.

Malaria is between runs, so all my projects are pooping out ATM.
Go away, I was asleep


Emanuel
Joined: 18 Nov 07
Posts: 280
Credit: 2,442,757
RAC: 0
Message 37520 - Posted: 18 Mar 2010, 21:17:12 UTC - in response to Message 37519.  

I seem to be finding that Einstein is also down, or having problems pulling down new work, with downloads running at 10.5 kbps.

Yep, their new run is starting, and it looks like all their mirrors are overloaded. Leaving BOINC at it seems to be the best solution, as downloads do work on occasion. Just 4 files left (down from, I don't know, 50?) and I'll be able to get back to crunching for them...

George
Joined: 19 Feb 10
Posts: 4
Credit: 1,050,320
RAC: 0
Message 37524 - Posted: 18 Mar 2010, 23:28:44 UTC

Anyone know how much longer it's going to be down for MAINTENANCE? I've been trying for the best part of 2 days to upload results. No downloads either!

John Clark
Joined: 4 Oct 08
Posts: 1734
Credit: 64,228,409
RAC: 0
Message 37528 - Posted: 19 Mar 2010, 0:34:19 UTC

Your guess is as good as anyone's, George. If it's not sorted by Friday, then we will be down for the weekend as well.
Go away, I was asleep


BarryAZ
Joined: 1 Sep 08
Posts: 520
Credit: 297,449,221
RAC: 119,736
Message 37529 - Posted: 19 Mar 2010, 0:45:07 UTC - in response to Message 37502.  

Thanks for coming over here to MW land to provide an explanation of the Collatz server 'response'. Now if you could answer for the unanswering MW folks, that would be REALLY impressive <smile>.


BarryAZ
Joined: 1 Sep 08
Posts: 520
Credit: 297,449,221
RAC: 119,736
Message 37533 - Posted: 19 Mar 2010, 2:17:23 UTC

One of the things that has shown up more frequently of late for MW is that when things go bump in the night, there seems to be a lack of folks 'on the job' to either notice the problem or provide information about it.

Perhaps this is some form of blowback on the project -- it demonstrates 'black hole' syndrome. Might be due to the focus of the research...

That being said, the lack of project comment on problems which sometimes extend for more than a day or two is rather troublesome and tiresome.

[PST]Howard
Joined: 31 Aug 07
Posts: 21
Credit: 21,004,179
RAC: 0
Message 37538 - Posted: 19 Mar 2010, 6:37:09 UTC

If it's still Spring Break there, nowt may happen till Monday.

George
Joined: 19 Feb 10
Posts: 4
Credit: 1,050,320
RAC: 0
Message 37548 - Posted: 19 Mar 2010, 10:46:26 UTC - in response to Message 37528.  

Thanks for the reply. Just have to wait and see; having problems with other projects as well. C'EST LA VIE!

Paul D. Buck
Joined: 12 Apr 08
Posts: 621
Credit: 161,934,067
RAC: 0
Message 37552 - Posted: 19 Mar 2010, 12:49:32 UTC - in response to Message 37502.  

Collatz swamped? Yep. Normal is about 300 concurrent users. This a.m. there were over 700. It seems to have settled back down now, but the feeder is still having trouble keeping up. Once everyone's caches are full, it should be able to handle it.

The "return results immediately" setting that many have turned on to try to keep their MW cache full at 6 WUs per core doesn't help any. ...

This is another of those features that DA has made all-or-nothing, when we have asked several times that it be made per project. Some projects want/need results as fast as possible (GPU Grid is the best example), and RRI (return results immediately) is good for them. Other projects, as you have noted, would be better served by a more batch-like mode.

There is one more reason to use RRI, and that is to avoid the issue of systems "running dry" of work. I noted that problem on the alpha list, with logs included... but of course I am on UCB's ignore list, so...

One side note: I am even having trouble getting work from GPU Grid because of the unexpected downtime of MW and the slowness of Collatz... I have one GPU core idle as I write this because I cannot get work... well, it does make the room quieter... :)

RAMen
Joined: 8 Apr 08
Posts: 45
Credit: 161,943,995
RAC: 0
Message 37555 - Posted: 19 Mar 2010, 13:48:37 UTC
Last modified: 19 Mar 2010, 13:52:04 UTC

Up and running !!

Cori
Joined: 27 Aug 07
Posts: 647
Credit: 27,592,547
RAC: 0
Message 37557 - Posted: 19 Mar 2010, 14:18:25 UTC

And now who took the last WUs without generating new ones? *LOL*
Lovely greetings, Cori

©2022 Astroinformatics Group