Welcome to MilkyWay@home

bypassing server set cache limits


Message boards : News : bypassing server set cache limits

Travis
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 30 Aug 07
Posts: 2046
Credit: 26,480
RAC: 0
Message 45739 - Posted: 24 Jan 2011, 19:34:21 UTC

While we appreciate everyone wanting to crunch more MilkyWay@Home by increasing their cache limits, this is part of the reason why we've had so many server problems lately with an unresponsive validator. Mainly, our machine/database is not fast enough to keep up with the additional number of workunits this is causing in the database. So if anyone is modifying their BOINC client to artificially increase their cache, we're asking you to stop so the project will be more stable (until we can further improve our hardware). A few of the really offending clients (who have cached 1k+ workunits) are being banned as they haven't responded to us, and they're hurting everyone's ability to crunch the project as a whole.



So in short, we need you guys to work with us, as we're working with limited hardware that can't handle more than 500k workunits at a time -- our cache is low partially for this reason. Second, as we've said in a bunch of previous threads, due to the nature of the science we're doing at the project we need a low cache because this really improves the quality of the work you guys report.



As you (hopefully) know by now, we search for structure and try to optimize parameters to fit that structure within the Milky Way galaxy. Lately we've also been doing N-body simulations of the formation of those structures. What your workunits are doing is trying to find the optimal set of parameters for those N-body simulations to end up best representing our sky survey data, or to fit those different structures (like dwarf galaxy tidal streams) from that data.



To do this, we use strategies which mimic evolution. The server keeps track of a population of good potential solutions to these problems, and then generates workunits by mutating some solutions and using others to create offspring. You guys crunch the data and return the result -- if it's a good one we insert it into the population, which improves as a whole. Over time, we get very, very good solutions which aren't really possible using other, deterministic approaches.
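
For readers who think in code, the loop described above can be sketched roughly like this. It is only a toy illustration, not the project's actual server code: the `Population` class, its method names, the 50/50 choice between mutation and recombination, and the stand-in `fitness` function are all assumptions made up for the example.

```python
import random


def fitness(params):
    """Toy stand-in for the real likelihood evaluation (lower is better)."""
    return sum((p - 3.0) ** 2 for p in params)


class Population:
    """Hypothetical server-side population of good candidate solutions."""

    def __init__(self, members, max_size):
        self.max_size = max_size
        # Store (score, params) pairs sorted best-first.
        self.members = sorted((fitness(m), list(m)) for m in members)[:max_size]

    def generate_workunit(self, rng):
        """Build a new parameter set by mutation or by recombination."""
        if rng.random() < 0.5 or len(self.members) < 2:
            # Mutation: perturb one parameter of a randomly chosen member.
            _, parent = rng.choice(self.members)
            child = list(parent)
            i = rng.randrange(len(child))
            child[i] += rng.gauss(0.0, 0.5)
        else:
            # Offspring: average the parameters of two distinct members.
            (_, a), (_, b) = rng.sample(self.members, 2)
            child = [(x + y) / 2.0 for x, y in zip(a, b)]
        return child

    def insert_result(self, params, score):
        """Keep a returned result only if it beats the current worst member."""
        if len(self.members) < self.max_size or score < self.members[-1][0]:
            self.members.append((score, list(params)))
            self.members.sort()
            del self.members[self.max_size:]
            return True
        return False

    def best(self):
        return self.members[0]


if __name__ == "__main__":
    rng = random.Random(0)
    pop = Population([[0.0, 0.0], [1.0, 5.0], [6.0, 2.0]], max_size=3)
    for _ in range(200):
        wu = pop.generate_workunit(rng)      # server hands out a workunit
        pop.insert_result(wu, fitness(wu))   # volunteer reports the result
    print(pop.best())
```

In this sketch each result comes back immediately. In the real project a result is computed from whatever the population looked like when the workunit was generated, which is where cache size comes into play.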



If people have large caches, that means the work they're crunching can come from very old versions of those populations which have since evolved quite a bit away from where they were when the user filled up their cache. When they return the results there's a lower chance for the results to improve the population of results we're currently working with.



So that's why our cache is so low, and we'd really appreciate it if you worked with us on this. There are other great BOINC projects out there which can help fill in missing crunch time when we go down, and the BOINC client can definitely handle running more than one at a time. So it might not be too bad to explore some of the other great research going on out there. :)



Thanks again for your time and understanding,

--Travis

Brickhead
Joined: 20 Mar 08
Posts: 108
Credit: 2,562,515,681
RAC: 0
Message 45744 - Posted: 24 Jan 2011, 22:06:09 UTC - in response to Message 45739.  

You have two good points, Travis, but also two different ones.

In order to prevent the clients from crunching old data, the report deadline ought to be short. Presently, that's 8 days.

In order to minimise the number of work units the server needs to keep track of, you can limit the number of cached work units per core. Presently that limit is 6.

My fastest rig (as an example) is allowed a cache of 72 WUs. That's equivalent to ~23 minutes. When that runs out, it gets work from a backup project. Hours and hours of work (much more than my set preferences suggest, for some reason). While this lot is worked on, new MW WUs usually trickle in after a few minutes, just sitting there getting older before BOINC is done with the other project's work. (This FIFO behaviour might change in a future release of the BOINC core client.)
    A short deadline is good for keeping the crunched data recent.
    A small per-core cache is good (not really, but there aren't many better ways) for limiting the server load. But for keeping crunched data recent, it's actually a little counter-productive.

As long as the N-body WUs are CPU-only, how about shortening the deadline on the separation WUs (like 1/4 or 1/8 of the current value) and reserving them for GPUs? If you also increase the separation WUs' per-core limit by 1, GPU caches will run out noticeably less often. Admittedly this will increase the server load slightly when everyone behaves, but it will also enable BOINC's built-in mechanisms to work better towards discouraging ridiculous caches on those misbehaving clients.


Jonathan Brier
Joined: 28 Nov 07
Posts: 14
Credit: 89,123,543
RAC: 16,184
Message 45763 - Posted: 25 Jan 2011, 14:54:49 UTC - in response to Message 45739.  

I am glad to see this being posted in the news feed. Not everyone reads the forums often or at all. Providing announcements here is helpful for highlighting important issues such as this. Thank you for the update.

CTAPbIi
Joined: 4 Jan 10
Posts: 86
Credit: 51,753,924
RAC: 0
Message 45805 - Posted: 26 Jan 2011, 21:55:40 UTC

Travis,
What do you think about this post:
http://milkyway.cs.rpi.edu/milkyway/forum_thread.php?id=2170&nowrap=true#45787

Cliff
Joined: 26 Nov 09
Posts: 32
Credit: 56,670,099
RAC: 4,860
Message 45820 - Posted: 27 Jan 2011, 14:35:26 UTC

I have two GPUs in one system... I would just like them to have a backup of 12 hours. I work between 3 projects. If I want MW to have any work, I have to disallow GPU work from the other projects, because the other two will download half a day of work and MW will only download 24 units per day (this is like 2 hours of work). I'm lazy: I want the system to work without me having to turn projects off and on. I will work with what I've got and support MW.

Bobsama
Joined: 21 Mar 10
Posts: 5
Credit: 19,414,466
RAC: 0
Message 45827 - Posted: 27 Jan 2011, 17:50:53 UTC
Last modified: 27 Jan 2011, 17:56:15 UTC


banditwolf
Joined: 12 Nov 07
Posts: 2425
Credit: 524,164
RAC: 0
Message 45830 - Posted: 27 Jan 2011, 19:15:27 UTC - in response to Message 45827.  


Doesn't expecting the unexpected make the unexpected the expected?
If it makes sense, DON'T do it.

Henry Bundy
Joined: 2 Mar 10
Posts: 4
Credit: 7,304,146
RAC: 0
Message 45914 - Posted: 2 Feb 2011, 0:33:44 UTC

I'm pretty sure that guys like Bill Gates can toss in a few million to solve your hardware problem. There's more to this project than screwing around with code. Get off your butt and find the funding to make this project work!

Fred J. Verster
Joined: 22 Apr 09
Posts: 38
Credit: 27,377,932
RAC: 0
Message 45958 - Posted: 4 Feb 2011, 10:07:09 UTC - in response to Message 45914.  

Haven't seen MW work in some time now. It looks like projects are basically underfunded and, like SETI and some more, crunchers need to 'donate' not only their free computer cycles but also money for new servers?!

Since the start of CUDA/CAL/OpenCL GPU crunching, only a few people are needed to 'crunch', judging by the huge difference in RAC or average work throughput between CPU-only and GPU crunchers.

Maybe the whole concept of Distributed Computing has to be reviewed. Hardware (Moore's Law), and thus software, is changing so fast that some projects are probably better off doing the crunching on a few Mega-GPU-Computers, in order to keep their server problems under control.

Just my 2 (Euro) cents..............


Knight Who says Ni

The Gas Giant
Joined: 24 Dec 07
Posts: 1947
Credit: 240,884,648
RAC: 0
Message 45959 - Posted: 4 Feb 2011, 10:38:40 UTC

I've got a cache full of wu's...

The more crunchers the more science can get done and when more scientists comprehend the computing resources available to them the more projects will eventuate. All up Distributed Computing is a massive success!

Beyond
Joined: 15 Jul 08
Posts: 383
Credit: 501,817,790
RAC: 0
Message 45962 - Posted: 4 Feb 2011, 13:58:54 UTC - in response to Message 45958.  

"Haven't seen MW work in some time now"

See the reply I left you on the Collatz board; it should solve some of your problems.

Paul D. Buck
Joined: 12 Apr 08
Posts: 621
Credit: 161,934,067
RAC: 0
Message 46064 - Posted: 8 Feb 2011, 13:39:45 UTC - in response to Message 45744.  

"When that runs out, it gets work from a backup project. Hours and hours of work (much more than my set preferences suggest, for some reason). While this lot is worked on, new MW WUs usually trickle in after a few minutes, just sitting there getting older before BOINC is done with the other project's work. (This FIFO behaviour might change in a future release of the BOINC core client.)"

FIFO for the GPU is finally gone in the 6.12.x series...

popo666
Joined: 18 Dec 09
Posts: 9
Credit: 176,672,890
RAC: 207,230
Message 46099 - Posted: 9 Feb 2011, 19:40:53 UTC

I agree that this project requires a new population every few hours, but there is a great gap between GPUs: I have a 5870 that runs out of work in less than 18 minutes (if no work is sent). I don't want to participate in math projects, so my card has been idle for 70% of the time over the last month. Some projects have the ability to choose longer WUs, but I didn't find this feature in the settings.
What can I do to reduce this idle time of my card? I can't find any suggestion here on the forums.

arkayn
Joined: 14 Feb 09
Posts: 999
Credit: 74,932,619
RAC: 0
Message 46123 - Posted: 10 Feb 2011, 3:06:50 UTC

Doctor
Joined: 11 Nov 09
Posts: 17
Credit: 7,324,208
RAC: 0
Message 46192 - Posted: 12 Feb 2011, 19:09:50 UTC

I have a task on SETI and I'm worried it might expire before I can download it; their servers haven't been liking me for a while.

Bill Walker
Joined: 19 Aug 09
Posts: 23
Credit: 631,303
RAC: 0
Message 46223 - Posted: 13 Feb 2011, 19:06:18 UTC - in response to Message 46192.  

Doctor, if you post this question in Number Crunching at SETI@Home they will probably remind you that when servers are down (like now), expired tasks can still get credit once the servers are fixed.

If it is stuck in download, don't worry, S@H will sort things out.

ExtraTerrestrial Apes
Joined: 1 Sep 08
Posts: 204
Credit: 219,354,537
RAC: 0
Message 46363 - Posted: 26 Feb 2011, 13:59:04 UTC

Travis,

thanks for posting this in the news. It should give many people a better understanding of how MW works and is supposed to work. However, I've got a nice (IMO) idea how to improve things further.

Let me first explain my 2 personal hosts:

Nr. 1 has an i7 and a HD4870. It's allowed 42 units (HT on, 7 cores used) and needs ~175s GPU time per WU, i.e. a cache of 2h.

Nr. 2 has a C2Q and a pimped HD6950. It's allowed 24 units and needs ~65s GPU time per WU, i.e. a cache of 26 mins.

The faster host is allowed to cache much less work, which totally doesn't make sense. If someone ran the work on an HT-enabled notebook i7 that cache might last for an entire day. This clearly contradicts what you want to have.
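
To make the arithmetic behind the two hosts explicit, here is a small sketch. It assumes the current rule of 6 work units per CPU core mentioned earlier in the thread, one work unit running per GPU at a time, and 4 cores for the C2Q; the function names are made up for illustration.

```python
def host_limit(cpu_cores, per_core_limit=6):
    """Work units a host may cache under the per-CPU-core rule."""
    return cpu_cores * per_core_limit


def cache_duration_seconds(cached_wus, gpu_seconds_per_wu, gpus=1):
    """How long a full cache lasts if each GPU runs one work unit at a time."""
    return cached_wus * gpu_seconds_per_wu / gpus


# Host 1: i7 with 7 cores in use, HD4870 at ~175 s per WU
host1 = cache_duration_seconds(host_limit(7), 175)
# Host 2: C2Q (assumed 4 cores), HD6950 at ~65 s per WU
host2 = cache_duration_seconds(host_limit(4), 65)

print(f"host 1: {host_limit(7)} WUs, {host1 / 3600:.2f} h")
print(f"host 2: {host_limit(4)} WUs, {host2 / 60:.0f} min")
```

The limit depends only on the CPU core count, while the time the cache lasts depends only on GPU speed, which is why the faster GPU ends up with the shorter cache.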

Also I observed that host Nr. 2 runs out of MW work rather often, way more often than Nr. 1. With such a small cache any small glitch, be it on my side or yours, will cause a backup project to kick in and keep the machine busy with (almost useless) non-MW work. And as someone else said, after a short time 24 new MW WUs will arrive and stay in the cache for hours, waiting for their turn. Again, this is not what you want.

So may I make the following suggestion? Use the "Average turnaround time" as a basis to calculate the amount of allowed WUs per host. It's a value readily available to the server, as it's already in the database. A setting of no more than 1h should probably work. This will solve several problems for you:

- it's independent of CPU count (which has been meaningless ever since the first GPUs started to appear here)
- it scales automatically with the number and speed of GPUs (or whatever co-processor will be used in the future)
- it allows fast hosts to contribute with fewer interruptions
- it prevents slow hosts from caching days of CPU work
- it prevents excessive caching by modified BOINC clients - no matter how much they request, if they can't return in time, they won't get more (and less next time)
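
A minimal sketch of how such a rule could look, assuming the server knows each host's current limit and its average turnaround time in seconds. This is not how the BOINC scheduler is actually implemented: `next_wu_limit`, the floor and ceiling constants, and the assumption that turnaround grows in proportion to the number of cached work units are all hypothetical.

```python
TARGET_TURNAROUND_S = 3600  # "no more than 1h", as suggested above
MIN_WUS = 1                 # hypothetical floor so every host can still get work
MAX_WUS = 500               # hypothetical ceiling to bound server load


def next_wu_limit(current_limit, avg_turnaround_s):
    """Scale a host's limit so its average turnaround moves toward the target.

    Simplifying assumption: turnaround is roughly proportional to the
    number of work units the host holds.
    """
    if avg_turnaround_s <= 0:
        return current_limit  # no history yet, keep the current limit
    scaled = int(current_limit * TARGET_TURNAROUND_S / avg_turnaround_s)
    return max(MIN_WUS, min(MAX_WUS, scaled))


# Fast host: 24 WUs returned in ~26 min on average, so it may cache more.
print(next_wu_limit(24, 26 * 60))
# Slower host: 42 WUs with a ~2 h turnaround, so its limit shrinks.
print(next_wu_limit(42, 7350))
# Modified client hoarding 1000 WUs and returning them after 8 days.
print(next_wu_limit(1000, 8 * 24 * 3600))
```

The point of the sketch is that the limit follows what a host actually returns, not what its client requests or how many CPU cores it reports.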

Best regards,
MrS
Scanning for our furry friends since Jan 2002

[boinc.at] Nowi
Joined: 22 Mar 09
Posts: 99
Credit: 503,422,495
RAC: 0
Message 46364 - Posted: 26 Feb 2011, 14:13:00 UTC


Farscape
Joined: 6 Apr 09
Posts: 26
Credit: 853,553,519
RAC: 75,090
Message 46367 - Posted: 26 Feb 2011, 15:21:20 UTC
Last modified: 26 Feb 2011, 15:22:22 UTC

Thank you MrS (ExtraTerrestrial Apes) for summarizing. Your idea makes far more sense than the current limit of 6 work units per (CPU) core. I have the exact problem you described. Because of the physical size of the 5970, the only computer case that the monster would fit into has a single-core processor (old MB), allowing only 6 work units at a time. With a dual-GPU video card - unless everything works perfectly at MW - I run out of WUs in +/- 9 minutes.....

I have set DNETC as my backup project on the machine and most days it does more DNETC work than MW!

I think your idea is great! I also vote for you for president!!!

ExtraTerrestrial Apes
Joined: 1 Sep 08
Posts: 204
Credit: 219,354,537
RAC: 0
Message 46380 - Posted: 27 Feb 2011, 15:32:47 UTC

Thanks for the flowers, guys :)

MrS
Scanning for our furry friends since Jan 2002