Down for maintenance?

Author	Message
Slicker [TopGun] Joined: 20 Mar 08 Posts: 46 Credit: 69,382,802 RAC: 0	Message 37502 - Posted: 18 Mar 2010, 17:01:49 UTC Collatz swamped? Yep. Normal is about 300 concurrent users. This a.m. there were over 700. It seems to have settled back down now, but the feeder is still having trouble keeping up. Once everyone's caches are full, it should be able to handle it. The "return results immediately" setting that many have turned on in order to keep their MW cache full at 6 WUs per core doesn't help any. Contacting the server when multiple WUs are completed, versus after each one is done, would reduce the Collatz server load considerably (70% or more!). If a machine has 4 GPUs and each can do a WU in 10 minutes, then with results returned immediately it contacts the server every 2.5 minutes instead of once every couple of hours (because of a cache size of over a hundred WUs on Collatz). While the user gets instant gratification, it really pounds on the server and eventually the server gets overwhelmed.
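The arithmetic in the post above can be sketched in a few lines of Python. This is only an illustration: the function names are made up for the example, and the 120-minute figure is an assumed stand-in for "once every couple of hours". Nothing here comes from the BOINC or Collatz server code.

```python
MINUTES_PER_DAY = 24 * 60


def minutes_between_contacts(num_gpus: int, minutes_per_wu: float) -> float:
    """Average gap between server contacts when every finished WU is reported at once."""
    return minutes_per_wu / num_gpus


def contacts_per_day(minutes_between: float) -> float:
    """Number of server contacts one host makes per day at a given interval."""
    return MINUTES_PER_DAY / minutes_between


# Figures from the post: 4 GPUs, 10 minutes per WU.
immediate_gap = minutes_between_contacts(num_gpus=4, minutes_per_wu=10)

# Assumption: "once every couple of hours" is taken as 120 minutes.
batched_gap = 120.0

print(f"immediate reporting: every {immediate_gap} min, "
      f"{contacts_per_day(immediate_gap):.0f} contacts/day")
print(f"batched reporting:   every {batched_gap} min, "
      f"{contacts_per_day(batched_gap):.0f} contacts/day")
```

These are per-host numbers only. The "70% or more" server-wide figure depends on how many hosts use the setting and is not something this sketch tries to reproduce.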

Slicker [TopGun] Joined: 20 Mar 08 Posts: 46 Credit: 69,382,802 RAC: 0	Message 37503 - Posted: 18 Mar 2010, 17:06:46 UTC - in response to Message 37501. "90% of my GPU time should be used for MW, so I often just stop Collatz to fetch new work (YES, I know, not the best way ;) ). So I wasted about 2 hours of GPU time yesterday. Idle sucks :/" skysnake, please excuse the out-of-place post: maybe I misunderstand, but how are you setting 90% GPU for MW while having Collatz as another share? I have had no success running Collatz with MW, in that Collatz takes over no matter how I set the sharing. I read that it might be a BOINC problem, but whatever it is, Collatz dictates on my machine if it is running. Is that not happening to you? Thanks. NP ;) The problem is very simple. A MW WU takes about 1:40 and a Collatz WU about 5:40. MW allows 12 WUs max, Collatz about 20 or more. So I always spend more time on Collatz, because whenever a WU is uploaded, a new WU is downloaded. Perhaps just changing the preferences would help, but I haven't tried it yet. If MW takes 1:40 and you have it limited to GPU crunching and a quad core, you can get 6 * 4 WUs @ 1:40 = ~40 minutes of work. So, set your additional network resources to only allow 0.03 days cache and it won't fill several days cache of Collatz when MW is down. Since BOINC processes GPU results in first in, first out order, it will only have to process about 7 Collatz WUs before it switches back to MW when MW comes back online. At least, in theory, that is how BOINC should work. ;-)
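For anyone who wants to check the cache numbers in this post, here is a rough Python sketch. The helper names are hypothetical, and the only inputs are the figures quoted in the post (6 WUs per core, 1:40 per MW WU, 5:40 per Collatz WU, a 0.03-day cache). It is plain arithmetic and does not model how the BOINC client actually fetches work.

```python
SECONDS_PER_DAY = 86_400


def mmss_to_seconds(mmss: str) -> int:
    """Convert a 'minutes:seconds' string such as '1:40' to seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def cache_seconds(wus_per_core: int, cores: int, wu_runtime: str) -> int:
    """Total run time of a full per-core-limited cache, in seconds."""
    return wus_per_core * cores * mmss_to_seconds(wu_runtime)


def wus_fitting_in_cache(cache_days: float, wu_runtime: str) -> int:
    """Whole WUs of the given run time that fit in a cache of cache_days."""
    return int(cache_days * SECONDS_PER_DAY // mmss_to_seconds(wu_runtime))


# Quad core, 6 MW WUs per core, 1:40 each.
mw_cache = cache_seconds(wus_per_core=6, cores=4, wu_runtime="1:40")
print(mw_cache / 60, "minutes of MW work")
print(round(mw_cache / SECONDS_PER_DAY, 3), "days of MW work")

# Collatz WUs (5:40 each) that fit in a 0.03-day cache.
print(wus_fitting_in_cache(0.03, "5:40"), "Collatz WUs in a 0.03-day cache")
```

The 40 minutes of MW work comes to roughly 0.028 days, which is where the suggested 0.03-day setting comes from. The same helper with `cores=2` gives the corresponding figure for a dual core.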

55degrees Joined: 8 Sep 09 Posts: 62 Credit: 61,330,584 RAC: 0	Message 37504 - Posted: 18 Mar 2010, 17:34:15 UTC - in response to Message 37503. A good idea to go down to .00# cache. I have not tried a cache limit so small. However, I am at .25 cache and have tried resource sharing at 1 for Collatz and 99 for MW, and the result was that Collatz still hogged the GPU. I know this is out of place. If I continue to struggle with Collatz domination, I will start another thread. I like Collatz, I'm just trying to get at least a 50-50 thing going at first to establish some control of this sharing. Great support. Thanks.

Skysnake Joined: 31 Oct 09 Posts: 20 Credit: 12,074,198 RAC: 0	Message 37505 - Posted: 18 Mar 2010, 17:34:30 UTC - in response to Message 37503. If MW takes 1:40 and you have it limited to GPU crunching and a quad core, you can get 6 * 4 WUs @ 1:40 = ~40 minutes of work. So, set your additional network resources to only allow 0.03 days cache and it won't fill several days cache of Collatz when MW is down. Since BOINC processes GPU results in first in, first out order, it will only have to process about 7 Collatz WUs before it switches back to MW when MW comes back online. At least, in theory, that is how BOINC should work. ;-) I only have an E8400, but @3.8 GHz ;), so I can only hold work for 20 min. That's not enough. I always want a bit more work in the cache, so if there are problems, I have something to do. And I also want to crunch most of the time for MW. Your proposal doesn't fix this problem. I would be very happy if I could cache 100+ WUs. With a 5870 it's no problem to finish them in time.

Berserk_Tux Joined: 2 Jan 08 Posts: 79 Credit: 365,471,675 RAC: 0	Message 37507 - Posted: 18 Mar 2010, 17:50:46 UTC - in response to Message 37505. And still no reply from the project.

banditwolf Joined: 12 Nov 07 Posts: 2425 Credit: 524,164 RAC: 0	Message 37508 - Posted: 18 Mar 2010, 18:17:37 UTC - in response to Message 37507. And still no reply from the project. I wouldn't expect one for a bit yet. Over the last year it seems to take longer and longer to get any reply, most times. Doesn't expecting the unexpected make the unexpected the expected? If it makes sense, DON'T do it.

The Gas Giant Joined: 24 Dec 07 Posts: 1947 Credit: 240,884,648 RAC: 0	Message 37511 - Posted: 18 Mar 2010, 18:51:06 UTC Has anyone PM'd the admins?

Beyond Joined: 15 Jul 08 Posts: 383 Credit: 729,293,740 RAC: 0	Message 37512 - Posted: 18 Mar 2010, 18:54:44 UTC - in response to Message 37504. A good idea to go down to .00# cache. I have not tried a cache limit so small. However, I am at .25 cache and have tried resource sharing at 1 for Collatz and 99 for MW, and the result was that Collatz still hogged the GPU. I know this is out of place. If I continue to struggle with Collatz domination, I will start another thread. I like Collatz, I'm just trying to get at least a 50-50 thing going at first to establish some control of this sharing. Great support. Thanks. The problem is due to: 1) BOINC's FIFO method of processing GPU work. 2) MilkyWay's limit of 6 WUs per CPU core, also applied to machines with GPUs. To make scheduling work correctly, one or both of the above policies has to change. IMO both policies are poor decisions.
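A toy calculation shows why those two policies together swamp the resource share. It assumes, purely for illustration, that the GPU queue is processed strictly first in, first out and always holds the WU counts mentioned earlier in the thread (12 MW WUs at 1:40 each, about 20 Collatz WUs at 5:40 each), with every finished WU replaced by a new one from the same project. The function name is made up, and this is not how the real BOINC scheduler is implemented.

```python
def fifo_time_share(queue: dict[str, tuple[int, int]]) -> dict[str, float]:
    """Fraction of GPU time per project for a FIFO queue in steady state.

    queue maps project name -> (WUs kept in the queue, seconds per WU).
    Under strict FIFO every queued WU gets run, so each project's share is
    simply its queued run time divided by the total queued run time.
    """
    busy = {name: count * secs for name, (count, secs) in queue.items()}
    total = sum(busy.values())
    return {name: t / total for name, t in busy.items()}


# Assumed steady-state queue, using the figures quoted in the thread.
shares = fifo_time_share({"MW": (12, 100), "Collatz": (20, 340)})
for project, share in shares.items():
    print(f"{project}: {share:.0%} of GPU time")
```

Under these assumptions the split is set by how much work each project lets the host queue, and the resource share setting never enters the calculation, which matches the behaviour described above.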

bob Joined: 12 Apr 09 Posts: 15 Credit: 278,731,391 RAC: 0	Message 37517 - Posted: 18 Mar 2010, 20:32:20 UTC - in response to Message 37511. It's Spring Break! Maybe no one is at Castle Greyskull, all in FLA.

John Clark Joined: 4 Oct 08 Posts: 1734 Credit: 64,228,409 RAC: 0	Message 37519 - Posted: 18 Mar 2010, 21:10:50 UTC Last modified: 18 Mar 2010, 21:11:27 UTC Although Collatz is working, it's very slow, and it was down some time earlier. I seem to be finding Einstein is also down/problems pulling down new work, with downloads reporting obtaining files at 10.5kbps. Malaria is between runs, so all my projects are pooping ATM. Go away, I was asleep

Emanuel Joined: 18 Nov 07 Posts: 280 Credit: 2,442,757 RAC: 0	Message 37520 - Posted: 18 Mar 2010, 21:17:12 UTC - in response to Message 37519. I seem to be finding Einstein is also down/problems pulling down new work, with downloads reporting obtaining files at 10.5kbps. Yep, their new run is starting and it looks like all their mirrors are overloaded - leaving BOINC at it seems to be the best solution, as downloads do work on occasion. Just 4 files left (down from, I don't know, 50?) and I'll be able to get to crunching for them again ...

George Joined: 19 Feb 10 Posts: 4 Credit: 1,050,320 RAC: 0	Message 37524 - Posted: 18 Mar 2010, 23:28:44 UTC Anyone know how much longer it's gonna be down for MAINTENANCE? I've been trying for the best part of 2 days to upload results. No downloads either!

John Clark Joined: 4 Oct 08 Posts: 1734 Credit: 64,228,409 RAC: 0	Message 37528 - Posted: 19 Mar 2010, 0:34:19 UTC Your guess is as good as anyone's, George. If it's not sorted Friday then we will be down for the weekend as well. Go away, I was asleep

BarryAZ Joined: 1 Sep 08 Posts: 520 Credit: 302,528,262 RAC: 263	Message 37529 - Posted: 19 Mar 2010, 0:45:07 UTC - in response to Message 37502. Thanks for coming over here in MW land to provide an explanation of the Collatz server 'response' -- now if you could answer for the unanswering MW folks that would be REALLY impressive <smile>. Collatz swamped? Yep. Normal is about 300 concurrent users. This a.m. there were over 700. It seems to have settled back down now, but the feeder is still having trouble keeping up. Once everyone's caches are full, it should be able to handle it. The "return results immediately" setting that many have turned on in order to keep their MW cache full at 6 WUs per core doesn't help any. Contacting the server when multiple WUs are completed, versus after each one is done, would reduce the Collatz server load considerably (70% or more!). If a machine has 4 GPUs and each can do a WU in 10 minutes, then with results returned immediately it contacts the server every 2.5 minutes instead of once every couple of hours (because of a cache size of over a hundred WUs on Collatz). While the user gets instant gratification, it really pounds on the server and eventually the server gets overwhelmed.

BarryAZ Joined: 1 Sep 08 Posts: 520 Credit: 302,528,262 RAC: 263	Message 37533 - Posted: 19 Mar 2010, 2:17:23 UTC One of the things that has shown up more frequently of late for MW is that when things go bump in the night, there seems to be a lack of folks 'on the job' to either notice that there is a problem or provide information regarding the problem. Perhaps this is some form of blowback on the project -- it demonstrates 'black hole' syndrome. Might be due to the focus of the research... That being said, the lack of project comment on problems which sometimes extend for more than a day or two is rather troublesome and tiresome.

[PST]Howard Joined: 31 Aug 07 Posts: 21 Credit: 21,004,179 RAC: 0	Message 37538 - Posted: 19 Mar 2010, 6:37:09 UTC If it's still Spring Break there, nowt may happen till Monday.

George Joined: 19 Feb 10 Posts: 4 Credit: 1,050,320 RAC: 0	Message 37548 - Posted: 19 Mar 2010, 10:46:26 UTC - in response to Message 37528. Thanks for the reply, just have to wait 'n' see, havin' problems with other projects as well. C'EST LA VIE!

Paul D. Buck Joined: 12 Apr 08 Posts: 621 Credit: 161,934,067 RAC: 0	Message 37552 - Posted: 19 Mar 2010, 12:49:32 UTC - in response to Message 37502. Collatz swamped? Yep. Normal is about 300 concurrent users. This a.m. there were over 700. It seems to have settled back down now, but the feeder is still having trouble keeping up. Once everyone's caches are full, it should be able to handle it. The "return results immediately" setting that many have turned on in order to keep their MW cache full at 6 WUs per core doesn't help any. ... This is another of those devices that DA has made all or nothing when we have asked several times that it be made per project. Some projects want/need results as fast as possible (GPU Grid is the best example) and RRI is good for them. Other projects, as you have noted, would be better served by a more batch-like mode. There is one more reason to use RRI and that is to avoid the issue of the systems "running dry" of work. A problem that I had noted on the alpha list with logs included... but of course I am on UCB's ignore list so ... One side note, I am even having trouble getting work from GPU Grid because of the unexpected down-time of MW and the slowness of Collatz ... I have one GPU core idle as I write this as I cannot get work ... well, it does make the room quieter ... :)

RAMen Joined: 8 Apr 08 Posts: 45 Credit: 161,943,995 RAC: 0	Message 37555 - Posted: 19 Mar 2010, 13:48:37 UTC Last modified: 19 Mar 2010, 13:52:04 UTC Up and running!!

Cori Joined: 27 Aug 07 Posts: 647 Credit: 27,592,547 RAC: 0	Message 37557 - Posted: 19 Mar 2010, 14:18:25 UTC And now who took the last WUs without generating new ones? LOL Lovely greetings, Cori