
Message boards : News : bypassing server set cache limits

Travis
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 30 Aug 07
Posts: 2046
Credit: 26,480
RAC: 0

Message 45739 - Posted: 24 Jan 2011, 19:34:21 UTC

While we appreciate everyone wanting to crunch more MilkyWay@Home by increasing their cache limits, this is part of the reason we've had so many server problems lately, including an unresponsive validator. Our machine and database are simply not fast enough to keep up with the additional workunits this puts into the database. So if you are modifying your BOINC client to artificially increase your cache, we're asking you to stop so the project stays stable (until we can further improve our hardware). A few of the worst-offending clients (which have cached 1,000+ workunits) are being banned, as they haven't responded to us and they're hurting everyone's ability to crunch the project as a whole.



In short, we need you to work with us: we're running on limited hardware that can't handle more than about 500,000 workunits at a time, which is partly why our cache limits are low. Second, as we've said in a number of previous threads, the nature of the science we're doing requires a low cache, because it really improves the quality of the work you report.



As you (hopefully) know by now, we search for structure within the Milky Way galaxy and try to optimize parameters to fit that structure. Lately we've also been running N-body simulations of how those structures formed. Your workunits are searching for the set of parameters that makes those N-body simulations best represent our sky survey data, or that best fits the different structures (like dwarf galaxy tidal streams) in that data.



To do this, we use search strategies that mimic evolution. The server keeps a population of good candidate solutions to these problems and generates workunits by mutating some solutions and recombining others to create offspring. You crunch the data and return the result; if it's a good one, we insert it into the population, which improves as a whole. Over time we get very good solutions that wouldn't really be reachable with deterministic approaches.



If people have large caches, the work they're crunching can come from very old versions of those populations, which have since evolved well past where they were when the cache was filled. When those results finally come back, there's a much lower chance they can improve the population we're currently working with.
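
(For the technically curious: here is a toy Python sketch of this kind of asynchronous evolutionary search. It is not the actual MilkyWay@Home server code -- the population size, mutation step, and fitness function are illustrative stand-ins.)

    import random

    POP_SIZE = 50   # illustrative; the real server's setting isn't stated here
    DIM = 4         # number of model parameters being fit (illustrative)
    SIGMA = 0.1     # mutation step size (illustrative)

    def toy_fitness(params):
        """Stand-in for the expensive likelihood the volunteers compute."""
        return -sum(x * x for x in params)

    def random_params():
        return [random.uniform(-1.0, 1.0) for _ in range(DIM)]

    # Server-side population of (score, params), kept sorted best-first.
    population = sorted(
        ((toy_fitness(p), p) for p in (random_params() for _ in range(POP_SIZE))),
        reverse=True)

    def generate_workunit():
        """Create a new candidate by mutating one member or recombining two."""
        if random.random() < 0.5:
            _, parent = random.choice(population)
            return [x + random.gauss(0.0, SIGMA) for x in parent]
        (_, a), (_, b) = random.sample(population, 2)
        return [random.choice(pair) for pair in zip(a, b)]

    def report_result(params, score):
        """Insert a returned result only if it beats the current worst member.
        A result generated from a much older population usually fails this
        test, because the worst member has long since improved -- which is
        why deeply cached work rarely helps the search."""
        if score > population[-1][0]:
            population[-1] = (score, params)
            population.sort(reverse=True)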



So that's why our cache limit is so low, and we'd really appreciate it if you worked with us on this. There are other great BOINC projects that can fill in the missing crunch time when we go down, and the BOINC client can easily run more than one project at a time, so it might be worth exploring some of the other great research out there. :)



Thanks again for your time and understanding,

--Travis
____________

Brickhead
Joined: 20 Mar 08
Posts: 108
Credit: 2,541,622,319
RAC: 217,879

Message 45744 - Posted: 24 Jan 2011, 22:06:09 UTC - in response to Message 45739.

You have two good points, Travis, but also two different ones.

In order to prevent the clients from crunching old data, the report deadline ought to be short. Presently, that's 8 days.

In order to minimise the number of work units the server needs to keep track of, you can limit the number of cached work units per core. Presently that limit is 6.

My fastest rig (as an example) is allowed a cache of 72 WUs. That's equivalent to ~23 minutes. When that runs out, it gets work from a backup project. Hours and hours of work (much more than my set preferences suggest, for some reason). While this lot is worked on, new MW WUs usually trickle in after a few minutes, just sitting there getting older before BOINC is done with the other project's work. (This FIFO behaviour might change in a future release of the BOINC core client.)

    A short deadline is good for keeping the crunched data recent.
    A small per-core cache is good (not really, but there aren't many better ways) for limiting the server load. But for keeping crunched data recent, it's actually a little counter-productive.
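
To put numbers on both points, here is a quick Python sketch (the core count is inferred from 72 WUs at 6 per core, and the per-WU time from 72 WUs lasting ~23 minutes; both are just this post's example figures):

    # Cache duration vs. deadline for the example host above.
    cores = 12                # 72 WUs / 6 per core implies a 12-core host
    per_core_limit = 6        # current MilkyWay@Home per-core cache limit
    seconds_per_wu = 19       # ~23 min / 72 WUs on a fast GPU

    cache_wus = cores * per_core_limit              # 72 WUs
    cache_seconds = cache_wus * seconds_per_wu      # ~1368 s, i.e. ~23 min

    deadline_seconds = 8 * 86400                    # 8-day report deadline
    print(f"cache lasts ~{cache_seconds / 60:.0f} min; "
          f"deadline is ~{deadline_seconds / cache_seconds:.0f}x longer")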

As long as the N-body WUs are CPU-only, how about shortening the deadline on the separation WUs (like 1/4 or 1/8 of the current value) and reserving them for GPUs? If you also increase the separation WUs' per-core limit by 1, GPU caches will run out noticeably less often. Admittedly this will increase the server load slightly when everyone behaves, but it will also enable BOINC's built-in mechanisms to work better towards discouraging ridiculous caches on those misbehaving clients.
____________

Jonathan Brier
Joined: 28 Nov 07
Posts: 14
Credit: 87,450,800
RAC: 5,461

Message 45763 - Posted: 25 Jan 2011, 14:54:49 UTC - in response to Message 45739.

I am glad to see this being posted in the news feed. Not everyone reads the forums often or at all. Providing announcements here is helpful for highlighting important issues such as this. Thank you for the update.
____________

CTAPbIi
Joined: 4 Jan 10
Posts: 86
Credit: 51,753,924
RAC: 0

Message 45805 - Posted: 26 Jan 2011, 21:55:40 UTC

Travis,
What do you think about this post:
http://milkyway.cs.rpi.edu/milkyway/forum_thread.php?id=2170&nowrap=true#45787

____________

Cliff
Joined: 26 Nov 09
Posts: 32
Credit: 54,169,840
RAC: 423

Message 45820 - Posted: 27 Jan 2011, 14:35:26 UTC

I have two GPUs in one system, and I'd just like them to have a backup of 12 hours. I split work between 3 projects. If I want MW to have any work, I have to disallow GPU work from the other projects, because those two will download half a day of work while MW will only download 24 units per day (about 2 hours of work). I'm lazy; I want the system to work without me having to turn projects off and on. Still, I'll work with what I've got and support MW.

Henry Bundy
Joined: 2 Mar 10
Posts: 4
Credit: 7,304,146
RAC: 0

Message 45914 - Posted: 2 Feb 2011, 0:33:44 UTC

I'm pretty sure that guys like Bill Gates can toss in a few million to solve your hardware problem. There's more to this project than screwing around with code. Get off your butt and find the funding to make this project work!

Fred J. Verster
Joined: 22 Apr 09
Posts: 38
Credit: 27,377,932
RAC: 0

Message 45958 - Posted: 4 Feb 2011, 10:07:09 UTC - in response to Message 45914.

Haven't seen MW work in some time now. It looks like projects are basically underfunded; like SETI and some others, crunchers need not only to 'donate' their spare computer cycles but also money for new servers?!

Since the start of CUDA/CAL/OpenCL GPU crunching, only a few people are needed to 'crunch', given the huge difference in RAC (average work throughput) between CPU-only and GPU crunchers.

Maybe the whole concept of distributed computing has to be reviewed. Hardware (Moore's Law), and with it software, is changing so fast that some projects are probably better off doing their crunching on a few mega GPU machines, in order to keep their server problems under control.

Just my 2 (Euro) cents...

____________

Knight Who says Ni

The Gas Giant
Joined: 24 Dec 07
Posts: 1947
Credit: 240,884,648
RAC: 0

Message 45959 - Posted: 4 Feb 2011, 10:38:40 UTC

I've got a cache full of WUs...

The more crunchers there are, the more science can get done, and as more scientists comprehend the computing resources available to them, the more projects will eventuate. All up, distributed computing is a massive success!

Beyond
Joined: 15 Jul 08
Posts: 383
Credit: 501,817,790
RAC: 0

Message 45962 - Posted: 4 Feb 2011, 13:58:54 UTC - in response to Message 45958.

    Haven't seen MW work in some time now

See the reply I left you on the Collatz board; it should solve some of your problems.

Paul D. Buck
Joined: 12 Apr 08
Posts: 621
Credit: 161,934,067
RAC: 0

Message 46064 - Posted: 8 Feb 2011, 13:39:45 UTC - in response to Message 45744.

    When that runs out, it gets work from a backup project. Hours and hours of work (much more than my set preferences suggest, for some reason). While this lot is worked on, new MW WUs usually trickle in after a few minutes, just sitting there getting older before BOINC is done with the other project's work. (This FIFO behaviour might change in a future release of the BOINC core client.)

FIFO for the GPU is finally gone in the 6.12.x series...

popo666
Joined: 18 Dec 09
Posts: 9
Credit: 125,636,642
RAC: 70,585

Message 46099 - Posted: 9 Feb 2011, 19:40:53 UTC

I agree that this project requires a new population every few hours, but there is a great gap between GPUs: I have a 5870 that runs out of work in less than 18 minutes (if no new work is sent). I don't want to participate in math projects, so my card has been idle about 70% of the time over the last month. Some projects let you choose longer WUs, but I didn't find that feature in the settings.
What can I do to reduce my card's idle time? I can't find any suggestions here on the forums.

arkayn
Joined: 14 Feb 09
Posts: 999
Credit: 74,932,619
RAC: 0

Message 46123 - Posted: 10 Feb 2011, 3:06:50 UTC

OpenCL Multibeam app for SETI:
http://lunatics.kwsn.net/index.php?module=Downloads;catd=43
____________

Doctor
Joined: 11 Nov 09
Posts: 17
Credit: 7,324,208
RAC: 0

Message 46192 - Posted: 12 Feb 2011, 19:09:50 UTC

I have a task on SETI and I'm worried it might expire before I can download it; their servers haven't been liking me for a while.

Bill Walker
Joined: 19 Aug 09
Posts: 23
Credit: 631,303
RAC: 0

Message 46223 - Posted: 13 Feb 2011, 19:06:18 UTC - in response to Message 46192.

Doctor, if you post this question in Number Crunching at SETI@Home, they will probably remind you that when the servers are down (like now), expired tasks can still get credit once the servers are fixed.

If it is stuck in download, don't worry, S@H will sort things out.
____________

ExtraTerrestrial Apes
Joined: 1 Sep 08
Posts: 204
Credit: 219,354,537
RAC: 0

Message 46363 - Posted: 26 Feb 2011, 13:59:04 UTC

Travis,

thanks for posting this in the news. It should give many people a better understanding of how MW works and is supposed to work. However, I've got a nice (IMO) idea for how to improve things further.

Let me first describe my two personal hosts:

Nr. 1 has an i7 and an HD4870. It's allowed 42 units (HT on, 7 cores used) and needs ~175 s of GPU time per WU, i.e. a cache of 2 h.

Nr. 2 has a C2Q and a pimped HD6950. It's allowed 24 units and needs ~65 s of GPU time per WU, i.e. a cache of 26 min.

The faster host is allowed to cache much less work, which totally doesn't make sense. If someone ran the same work on an HT-enabled notebook i7, that cache might last an entire day. This clearly contradicts what you want.

Also, I've observed that host Nr. 2 runs out of MW work rather often, way more often than Nr. 1. With such a small cache, any small glitch, be it on my side or yours, will cause a backup project to kick in and keep the machine busy with (almost useless) non-MW work. And as someone else said, after a short time 24 new MW WUs will arrive and sit in the cache for hours, waiting for their turn. Again, this is not what you want.

So may I make the following suggestion? Use the "Average turnaround time" as the basis for calculating the number of WUs allowed per host. It's a value readily available to the server, as it's already in the database. A setting of no more than 1 h should probably work. This would solve several problems for you (a rough sketch follows the list below):

- it's independent of CPU count (which has been meaningless ever since the first GPUs appeared here)
- it scales automatically with the number and speed of GPUs (or whatever co-processor is used in the future)
- it allows fast hosts to contribute with fewer interruptions
- it prevents slow hosts from caching days of CPU work
- it prevents excessive caching by modified BOINC clients -- no matter how much they request, if they can't return it in time, they won't get more (and less next time)
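
Server-side, the rule could be as simple as this Python sketch (not existing BOINC scheduler code; the field name, the bounds, and the exact update rule are my assumptions):

    from dataclasses import dataclass

    TARGET_SECONDS = 3600        # proposed target: results back within ~1 h
    MIN_WUS, MAX_WUS = 1, 200    # sane bounds (assumed, not part of the proposal)

    @dataclass
    class Host:
        # Seconds from send to report; the BOINC scheduler already tracks
        # this per host ("Average turnaround time").
        avg_turnaround: float

    def allowed_wus(host: Host, current_limit: int) -> int:
        """Feedback rule: a host that returns work within the target may
        cache a little more next time; a host that returns late gets its
        quota cut, no matter how much work its client requests."""
        if host.avg_turnaround <= 0:
            return MIN_WUS                          # no history yet: start small
        if host.avg_turnaround <= TARGET_SECONDS:
            return min(MAX_WUS, current_limit + 1)
        return max(MIN_WUS, current_limit // 2)

    # A fast GPU host returning in ~20 min gets headroom; a hoarder
    # returning in 2 days gets throttled regardless of what it asks for.
    print(allowed_wus(Host(avg_turnaround=1200.0), current_limit=24))     # 25
    print(allowed_wus(Host(avg_turnaround=172800.0), current_limit=500))  # 250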

Best regards,
MrS
____________
Scanning for our furry friends since Jan 2002

Farscape
Joined: 6 Apr 09
Posts: 26
Credit: 846,920,685
RAC: 9

Message 46367 - Posted: 26 Feb 2011, 15:21:20 UTC
Last modified: 26 Feb 2011, 15:22:22 UTC

Thank you, MrS (ExtraTerrestrial Apes), for summarizing. Your idea makes far more sense than the current limit of 6 work units per (CPU) core. I have exactly the problem you described: because of the physical size of the 5970, the only case the monster fits into has a single-core processor (old motherboard), allowing only 6 work units at a time. With a dual-GPU video card, unless everything works perfectly at MW, I run out of WUs in about 9 minutes...

I have set DNETC as my backup project on the machine and most days it does more DNETC work than MW!

I think your idea is great! I also vote for you for president!!!

ExtraTerrestrial Apes
Joined: 1 Sep 08
Posts: 204
Credit: 219,354,537
RAC: 0

Message 46380 - Posted: 27 Feb 2011, 15:32:47 UTC

Thanks for the flowers, guys :)

MrS
____________
Scanning for our furry friends since Jan 2002
