Message boards :
Number crunching :
20 workunit limit
Joined: 30 Aug 07 Posts: 2046 Credit: 26,480 RAC: 0
I figured I'd sticky this post because it's something that comes up pretty often. The reason we can't have more than 20 WUs at a time (and even this is too many) has been discussed here many times; it's even in the known issues section. What we're doing is dynamically updating a genetic search based on the results you return. If we feed out more than 20 workunits, by the time you finish crunching them the population has moved so far away from where those points were generated that the work you've done on them is basically useless. Ideally I'd like the number to be somewhere around 5-10. When we update the server code, you should be able to download new workunits as soon as you finish your previous ones.

If you're asking for more WUs as a buffer in case the server crashes... I don't think there's anything we can do about that, since new workunits have to be dynamically generated as results come in.

That said, we definitely are trying to increase the time per workunit. The model will be updated, and we'll be doing it across multiple streams of stars; you just need to be patient with us here. This is science in progress and Nate is working on exactly how to do that. I'm also working on incorporating a line search into each workunit, which should increase the computation time by a factor of 10-50 or more, but that's maybe a week or two away because it'll take a bit of bug fixing. I'm hoping to have it in the next version of the application.
Joined: 20 Nov 07 Posts: 13 Credit: 1,129,285 RAC: 0
Thank you for updating everyone, Travis.
Joined: 4 Dec 07 Posts: 45 Credit: 1,257,904 RAC: 0
I think what people are really looking for is a steady amount of work coming from the project. I understand MilkyWay is going through some growing pains (what project hasn't?), but I've been crunching for various projects since 2001, starting with the original SETI Classic at SETI@Home. Keep the work units flowing and we'll keep returning them. By the way, I kind of like the short return times. Keep up the good work.
Joined: 18 Nov 07 Posts: 280 Credit: 2,442,757 RAC: 0
> By the way, I kind of like the short return times. Keep up the good work.

Yeah, me too :P

By the way, how much redundancy are you using, Travis? That is, how many completed WUs (for the same parameters) do you require before you send out new ones based on that result?
Joined: 6 Mar 08 Posts: 15 Credit: 3,006,602 RAC: 0
Yes, thanks Travis. We newbies to this project were not aware of the previous discussions. We'll give MW@H the highest priority because of its dynamic requirements. I wish we could give you 100%, but we must keep a backup project going (the beauty of BOINC) until things settle down a little. :) Cheers, Rog.
Joined: 30 Aug 07 Posts: 2046 Credit: 26,480 RAC: 0
> By the way, how much redundancy are you using, Travis?

Actually, we don't use any redundancy. The beauty of the asynchronous genetic search we're doing is that if any of the workunits fail, it'll keep on crunching; there's really no dependency between WUs. If you read any of our papers, they go into this. :)

Basically, we generate initial random workunits (each corresponding to a set of parameters for our Milky Way model). These get evaluated by you guys and inserted into our genetic population, ranked by how well they matched the star data. After the initial population is filled out by sets of random parameters, we generate new workunits from that population using mutation (taking one of the individuals in the population and randomly modifying one parameter) and reproduction (either averaging two individuals to produce an offspring, or some funkier goings-on like using multiple individuals and the simplex algorithm), filling up the workunit buffer. As your results come in, they replace worse-fit individuals in the population, and eventually this process converges to the best (hopefully) sets of parameters for the model we use.

This is why, if everyone got say 5000 workunits initially, 4990 or so of them would probably be useless: by the time they were returned, so many other workunits from other clients would have moved the population far away from where those were generated.
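For readers who'd rather see the scheme than read the papers, here is a minimal sketch of the asynchronous search loop described above. Everything in it (population size, the 50/50 mutation/reproduction split, the toy fitness function) is illustrative and not the actual MilkyWay@home server code; in the real project the fitness of each parameter set is computed by a volunteer host against the star data.

```python
import random

random.seed(1)  # fixed seed so the sketch is reproducible

POP_SIZE = 50
N_PARAMS = 8

def evaluate(params):
    # Stand-in for the fit a volunteer host would compute:
    # lower is better (e.g. a distance between model and star data).
    return sum((p - 0.5) ** 2 for p in params)

# Seed the population with random parameter sets (the initial random workunits),
# stored as (fitness, params) and kept sorted best-first.
population = sorted(
    (evaluate(p), p)
    for p in ([random.random() for _ in range(N_PARAMS)] for _ in range(POP_SIZE))
)

def generate_workunit():
    """Create a new parameter set from the current population."""
    if random.random() < 0.5:
        # Mutation: copy one individual and randomly modify one parameter.
        _, parent = random.choice(population)
        child = parent[:]
        child[random.randrange(N_PARAMS)] = random.random()
    else:
        # Reproduction: average two individuals to produce an offspring.
        (_, a), (_, b) = random.sample(population, 2)
        child = [(x + y) / 2 for x, y in zip(a, b)]
    return child

def insert_result(params, fitness):
    """Asynchronous insert: replace the worst individual if this one is better.
    Results can arrive in any order; a lost result is simply never inserted."""
    if fitness < population[-1][0]:
        population[-1] = (fitness, params)
        population.sort()

# Simulate results coming back one at a time.
for _ in range(2000):
    wu = generate_workunit()
    insert_result(wu, evaluate(wu))

print(f"best fitness after 2000 results: {population[0][0]:.6f}")
```

Because each new workunit is generated from the *current* population and a returned result simply replaces a worse individual, a lost or late result costs nothing but the cycles spent on it. It is also why a 5000-deep cache would be useless: those points would all be generated from a population that had long since moved on.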
Joined: 18 Nov 07 Posts: 280 Credit: 2,442,757 RAC: 0
Aah, thanks for your reply.

> if you read any of our papers it goes into this :)

Doh, I guess I should look into it, eh? My knowledge of evolutionary algorithms isn't very great, and I don't know that much about physics either, so I tend to shy away from the actual papers; still, I'll have a look. :)

By the way, are you familiar with the Netflix Prize? Their aim is rather different, so I don't know if any of the work is relevant, but perhaps the Simon Funk method and BellKor's papers would be good reads regardless.
Joined: 4 Dec 07 Posts: 45 Credit: 1,257,904 RAC: 0
Good info on how our returns are actually used. I can see why WUs are kept down... that's okay with me.
Joined: 12 Nov 07 Posts: 2425 Credit: 524,164 RAC: 0
It would be fine to have 5-10 WUs at a time if the runtime were extended by, say, 5-10 times. They would run 55-100 min on mine then.
Joined: 2 Sep 07 Posts: 18 Credit: 180,611 RAC: 0
> It would be fine to have 5-10 wu's at a time when the time is extended by say 5-10 times. They would run 55-100 min on mine then.

Or it would be 125-250 min (Coppermine P3 and G4), up to 570-1140 min (G3), on not-so-recent computers.
Joined: 30 Aug 07 Posts: 2046 Credit: 26,480 RAC: 0
> It would be fine to have 5-10 wu's at a time when the time is extended by say 5-10 times. They would run 55-100 min on mine then.

Yeah, I think after the application update with more expensive WUs I'll drop the limit to 16 (I don't think any machines are more than quad-CPU quad-core at the moment).
Joined: 29 Aug 07 Posts: 115 Credit: 502,662,458 RAC: 203
> yeah, i think after the application update with more expensive WUs i'll drop the limit to 16 (i dont think any machines are more than quad cpu quad core at the moment).

Well, there are a few on other projects, and more here soon, I hope!

http://www.xbitlabs.com/news/cpu/display/20080312115926_Intel_Plans_to_Speed_Up_Introduction_of_Nehalem_Microprocessors_Slides.html

8 cores per chip, 2 threads per core, 2 chips per machine = 32 tasks at once. =;^)
Joined: 17 Nov 07 Posts: 17 Credit: 663,827 RAC: 0
Hello. I used to have 2x Compaq Proliant 8500R data centre servers, both 8-core. They cost an absolute fortune in electricity, so I chucked them both out at the local tip. With the money I saved on electricity I bought a few Compaq Evo W6000 workstations; these are all dual Xeon HTs (quad-core equivalent). The Evo workstations outperform the Proliant servers tenfold, and their electricity consumption and costs are much lower!

Happy crunchin',
John :)
Joined: 18 Nov 07 Posts: 280 Credit: 2,442,757 RAC: 0
> yeah, i think after the application update with more expensive WUs i'll drop the limit to 16 (i dont think any machines are more than quad cpu quad core at the moment).

Has there been any progress on assigning a number of WUs based on the number of cores detected?
Joined: 2 Mar 08 Posts: 5 Credit: 50,383,094 RAC: 0
If these WUs are time-sensitive, then why do they have a 5-day deadline? The server has been out of work for nearly 24 hours now, and there are still over 5000 WUs out in the wind. It seems to me that if new work generation depends on the old results, the deadline should be shortened. Maybe try 12 hours and see how that goes; at least if a WU ends up on a duffer, it would time out and possibly get sent to a faster, more reliable host.

Fish
Joined: 31 Aug 07 Posts: 66 Credit: 1,002,668 RAC: 0
> The server has been out of work for nearly 24 hours now, and there are still over 5000 WUs out in the wind. Seems to me that if new work generation depends on the old results, the deadline should be shortened.

And the rate at which these are being cleared is very slow. Could we use server-side aborts on 'old' WUs? Yes, it's annoying when your cache/stash gets aborted, but I can't see any point in wasting cycles on useless work. If this isn't feasible, maybe BOINC just isn't the right platform? Maybe the work driving the next generation should be non-BOINC, with other work farmed out to a slower BOINC network.

> Maybe try 12 hours and see how that goes, at least if the WU ends up on a duffer, it would time out and possibly get sent to a faster, more reliable host.

Tight deadlines are a nightmare: MW will end up in high priority permanently and other projects will be starved. Maybe each WU could carry work from a number of different genetic seeds and run for a while longer? When MW is running, it's fine for me, but my slow host may take 5 hours to turn around the last WU in a batch. :( I can't help but feel that server-side aborts are the way to go, but machines that are not on a permanent net connection are still going to waste work. :(

Al.
Joined: 18 Nov 07 Posts: 280 Credit: 2,442,757 RAC: 0
Perhaps wasting work, or even the lax deadlines, aren't that much of a problem. The server could send WUs already in progress out again to new users if it hasn't received results after a certain amount of time. The result that arrives later would simply be ignored, but credit still granted. Since computers are fickle and the WUs are short, this could mean the difference between a few hours and a few days...
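The resend-and-ignore-late policy suggested here could be sketched as follows. This is an illustrative toy, not how BOINC's scheduler actually handles deadlines and replication; the names and the one-hour threshold are invented, and times are plain numbers rather than real timestamps.

```python
ISSUED = {}          # wu_id -> [issue_time, result_seen]
RESEND_AFTER = 3600  # resend if no result within an hour (illustrative value)

def issue(wu_id, now):
    """Record that a workunit has been sent out."""
    ISSUED[wu_id] = [now, False]

def overdue(now):
    """WUs with no result yet that should be sent out again to a new host."""
    return [wu for wu, (t, done) in ISSUED.items()
            if not done and now - t > RESEND_AFTER]

def receive(wu_id, now):
    """First result back feeds the genetic search; late duplicates are
    ignored by the search but the host is still credited."""
    _, done = ISSUED[wu_id]
    ISSUED[wu_id][1] = True
    return "used" if not done else "credit-only"
```

With this scheme a WU stuck on a slow or dead host gets a second chance after an hour instead of blocking for the full five-day deadline, and no volunteer loses credit for a late return.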
Joined: 31 Aug 07 Posts: 66 Credit: 1,002,668 RAC: 0
Replying to myself... well, at least I listen to myself sometimes. ;)

The remaining WUs are being cleared at about 132 WU/hour. I usually clear about 15 WU/hour, so we've either got 9 active 'consumers' or a load of totally irrelevant results waiting to be returned from hosts bunged up with other work. Let's abort the old WUs from the server and make the science count. I no longer run a number of projects because they let stale work trickle in way past its sell-by date, only to be discarded.

Sorry, I went all *serious* for a moment there. ;)

Al.
Joined: 10 Mar 08 Posts: 7 Credit: 60,169,291 RAC: 0
Why not change from 20 WUs per host to 5 or 10 WUs per core? Also, if a WU is not returned in time, reduce the next batch by the number not returned or not completed without errors. CPDN does this, and many computers are reduced to getting only 1 WU each day (or at a time). The duffers would then be restricted to only 1 WU on hold at any time after a couple of days.
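The per-core quota with error throttling proposed above might look something like this. It is a hypothetical sketch, not how BOINC's actual daily-quota mechanism is configured; the function name and constants are invented for illustration.

```python
def wu_quota(n_cores, recent_errors, per_core=5, minimum=1):
    """Illustrative per-host quota: a few WUs per core, reduced by the
    number of WUs the host recently failed to return cleanly, but never
    below a floor of one so the host can redeem itself."""
    return max(minimum, per_core * n_cores - recent_errors)

print(wu_quota(4, 0))   # healthy quad-core host -> 20
print(wu_quota(4, 25))  # a 'duffer' is throttled to the minimum -> 1
```

A host that starts returning good results again would see `recent_errors` decay and its quota climb back, which is roughly the behaviour the CPDN comparison describes.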
©2024 Astroinformatics Group