Welcome to MilkyWay@home

20 workunit limit

Message boards : Number crunching : 20 workunit limit



Travis
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: 30 Aug 07
Posts: 2046
Credit: 26,480
RAC: 0
Message 2231 - Posted: 13 Mar 2008, 23:26:03 UTC

I figured I'd sticky this post because this topic comes up pretty often.

The reason we can't have more than 20 WUs at a time (and even this is too many) has been discussed here many times; it's even in the known issues section. What we're doing is dynamically updating a genetic search based on the results you guys return. If we hand out more than 20 workunits, then by the time you finish crunching them, the population has moved so far from where those points were generated that the work you've done on them is basically useless. Ideally I'd like the number to be somewhere around 5-10.

When we update the server code, you should be able to download new workunits as soon as you finish your previous ones. If you just want more WUs out there in case the server crashes, I don't think there's anything we can do about that, since the workunits need to be generated dynamically as new results come in.

That being said, we are definitely trying to increase the time per workunit. The model will be updated, and we'll be fitting it across multiple streams of stars; you just need to be patient with us here. This is science in progress, and Nate is working out exactly how to do that. I'm also working on incorporating a line search into each workunit (which should increase the computation time by a factor of 10-50 or more), but that's maybe a week or two away because it will take a bit of bug fixing. I'm hoping to have that in the next version of the application.
Tex2002ans

Joined: 20 Nov 07
Posts: 13
Credit: 1,129,285
RAC: 0
Message 2241 - Posted: 14 Mar 2008, 3:26:59 UTC

Thank you for updating everyone, Travis.
mscharmack

Joined: 4 Dec 07
Posts: 45
Credit: 1,257,904
RAC: 0
Message 2253 - Posted: 14 Mar 2008, 15:20:48 UTC

I think what people are really looking for is a steady amount of work coming from the project. I understand Milkyway is going through some growing pains (what project hasn't?), and I've been crunching for various projects since 2001, starting with the original SETI@Home Classic. Keep the work units flowing and we'll keep returning them. By the way, I kind of like the short return times. Keep up the good work.
Emanuel

Joined: 18 Nov 07
Posts: 280
Credit: 2,442,757
RAC: 0
Message 2254 - Posted: 14 Mar 2008, 16:42:11 UTC - in response to Message 2253.  
Last modified: 14 Mar 2008, 16:42:30 UTC

By the way, I kind of like the short return times. Keep up the good work.


Yeah, me too :P By the way, how much redundancy are you using, Travis? That is, how many completed WUs (for the same parameters) do you require before you send out new ones based on that result?
AnRM

Joined: 6 Mar 08
Posts: 15
Credit: 3,006,602
RAC: 0
Message 2258 - Posted: 14 Mar 2008, 18:31:03 UTC
Last modified: 14 Mar 2008, 18:43:39 UTC

Yes, thanks Travis. We newbies to this project were not aware of the previous discussions. We'll give MW@H the highest priority because of its dynamic requirements. I wish we could give you 100%, but we must keep a backup project going (the beauty of BOINC) until things settle down a little. :) Cheers, Rog.
Travis
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: 30 Aug 07
Posts: 2046
Credit: 26,480
RAC: 0
Message 2270 - Posted: 15 Mar 2008, 5:21:05 UTC - in response to Message 2254.  

By the way, I kind of like the short return times. Keep up the good work.


Yeah, me too :P By the way, how much redundancy are you using, Travis? That is, how many completed WUs (for the same parameters) do you require before you send out new ones based on that result?


Actually, we don't use any redundancy. The beauty of the asynchronous genetic search we're doing is that if any of the workunits fail, it'll keep on crunching; there's really no dependency between WUs. If you read any of our papers, they go into this :)

Basically, we generate initial random workunits (each corresponding to a set of parameters for our Milky Way model). These get evaluated by you guys and inserted into our genetic population, ranked by how well they match the star data.

After we have an initial population filled out with sets of random parameters, we generate new workunits from that population to fill up the workunit buffer, using mutation (taking one of the individuals in the population and randomly modifying one parameter) and reproduction (either averaging two individuals to produce an offspring, or something funkier, like using multiple individuals and the simplex algorithm). As your results come in, they replace worse-fitting individuals in the population, and eventually this process converges to the (hopefully) best sets of parameters for the model we use.

This is why, if everyone got, say, 5000 workunits initially, 4990 or so of them would probably be useless: by the time they were returned, so many workunits from other clients would have moved the population far away from where those were generated.
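The loop described above can be sketched in Python. This is a toy illustration under stated assumptions, not MilkyWay@home's code: the class and method names are hypothetical, the fitness function is a made-up stand-in for comparing a parameter set against the star data, the population size, parameter bounds and 50/50 split between mutation and averaging are arbitrary, and the simplex-based reproduction is left out.

```python
import random


class AsyncGeneticSearch:
    """Toy asynchronous genetic search (hypothetical, not the project's code)."""

    def __init__(self, n_params, pop_size, bounds=(-10.0, 10.0), seed=None):
        self.n_params = n_params
        self.pop_size = pop_size
        self.lo, self.hi = bounds
        self.rng = random.Random(seed)
        self.population = []  # (fitness, params) pairs; higher fitness is better

    def _random_params(self):
        return [self.rng.uniform(self.lo, self.hi) for _ in range(self.n_params)]

    def generate_workunit(self):
        """Return a parameter set to send out as a new workunit."""
        if len(self.population) < self.pop_size:
            return self._random_params()  # still filling the initial population
        if self.rng.random() < 0.5:
            # Mutation: copy one individual and re-draw one of its parameters.
            _, parent = self.rng.choice(self.population)
            child = list(parent)
            child[self.rng.randrange(self.n_params)] = self.rng.uniform(self.lo, self.hi)
            return child
        # Reproduction: average two different individuals.
        (_, a), (_, b) = self.rng.sample(self.population, 2)
        return [(x + y) / 2 for x, y in zip(a, b)]

    def report_result(self, params, fitness):
        """Insert a returned result; results may arrive in any order."""
        if len(self.population) < self.pop_size:
            self.population.append((fitness, params))
            return
        worst = min(range(self.pop_size), key=lambda k: self.population[k][0])
        if fitness > self.population[worst][0]:
            self.population[worst] = (fitness, params)  # replace the worst fit

    def best(self):
        return max(self.population, key=lambda pair: pair[0])


# Hypothetical stand-in for a volunteer's computation: closer to TARGET is better.
TARGET = [1.0, -2.0, 3.0]


def evaluate(params):
    return -sum((p - t) ** 2 for p, t in zip(params, TARGET))


search = AsyncGeneticSearch(n_params=3, pop_size=20, seed=1)
rng = random.Random(2)
in_flight = [search.generate_workunit() for _ in range(10)]  # WUs held by hosts
for _ in range(5000):
    wu = in_flight.pop(rng.randrange(len(in_flight)))  # some host finishes a WU
    search.report_result(wu, evaluate(wu))
    in_flight.append(search.generate_workunit())  # and is sent a new one
print(search.best())
```

The `in_flight` list stands in for workunits sitting on hosts. The bigger it gets, the older the population snapshot each returned result was generated from, which is the staleness the per-host limit is meant to contain. Note that there is no redundancy step: a lost workunit is simply never reported.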
Emanuel

Joined: 18 Nov 07
Posts: 280
Credit: 2,442,757
RAC: 0
Message 2277 - Posted: 15 Mar 2008, 16:12:03 UTC - in response to Message 2270.  

Aah, thanks for your reply.

if you read any of our papers it goes into this :)


Doh, I guess I should look into it, eh? My knowledge of evolutionary algorithms isn't very great, and I don't know that much about physics either, so I tend to shy away from the actual papers; still, I'll have a look :)

By the way, are you familiar with the Netflix Prize? Their aim is rather different, so I don't know if any of the work is relevant, but perhaps the Simon Funk method and BellKor's papers would be good reads regardless.
mscharmack

Joined: 4 Dec 07
Posts: 45
Credit: 1,257,904
RAC: 0
Message 2279 - Posted: 16 Mar 2008, 0:24:19 UTC - in response to Message 2270.  
Last modified: 16 Mar 2008, 0:24:32 UTC

Good to know how our returns are actually used. I can see why the WU limit is kept low. That's okay with me.
banditwolf

Joined: 12 Nov 07
Posts: 2425
Credit: 524,164
RAC: 0
Message 2280 - Posted: 16 Mar 2008, 0:32:04 UTC

It would be fine to have 5-10 WUs at a time if the run time were extended by, say, 5-10 times. They would then run 55-100 min on mine.
MB Atlanos

Joined: 2 Sep 07
Posts: 18
Credit: 180,611
RAC: 0
Message 2281 - Posted: 16 Mar 2008, 1:06:32 UTC - in response to Message 2280.  

It would be fine to have 5-10 wu's at a time when the time is extended by say 5-10 times. They would run 55-100 min on mine then.

Or it would be 125-250 min (Coppermine P3 and G4) up to 570-1140 min (G3) on older computers.
zombie67 [MM]

Joined: 29 Aug 07
Posts: 115
Credit: 501,600,404
RAC: 4,799
Message 2285 - Posted: 16 Mar 2008, 4:46:38 UTC - in response to Message 2280.  

It would be fine to have 5-10 wu's at a time when the time is extended by say 5-10 times. They would run 55-100 min on mine then.


The minimum needs to be 8. 8-way boxes are just too common these days.



Travis
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: 30 Aug 07
Posts: 2046
Credit: 26,480
RAC: 0
Message 2286 - Posted: 16 Mar 2008, 6:25:27 UTC - in response to Message 2285.  

It would be fine to have 5-10 wu's at a time when the time is extended by say 5-10 times. They would run 55-100 min on mine then.


Minimum needs to me 8. 8-way boxes are just too common any more.



Yeah, I think after the application update with more expensive WUs, I'll drop the limit to 16 (I don't think any machines are more than quad-CPU, quad-core at the moment).
zombie67 [MM]

Joined: 29 Aug 07
Posts: 115
Credit: 501,600,404
RAC: 4,799
Message 2287 - Posted: 16 Mar 2008, 6:41:01 UTC - in response to Message 2286.  

yeah, i think after the application update with more expensive WUs i'll drop the limit to 16 (i dont think any machines are more than quad cpu quad core at the moment).


Well, there are a few on other projects, and more here soon, I hope!

http://www.xbitlabs.com/news/cpu/display/20080312115926_Intel_Plans_to_Speed_Up_Introduction_of_Nehalem_Microprocessors_Slides.html

8 cores per chip, 2 threads per core, 2 chips per machine = 32 tasks at once. =;^)
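The same arithmetic in Python, set against the 16-WU cap Travis proposes. The variable names are just illustrative, and the idle count assumes one WU per hardware thread with nothing else queued.

```python
cores_per_chip = 8
threads_per_core = 2
chips_per_machine = 2
concurrent_tasks = cores_per_chip * threads_per_core * chips_per_machine

per_host_limit = 16  # the cap Travis suggests after the application update
idle_threads = max(0, concurrent_tasks - per_host_limit)
print(concurrent_tasks, idle_threads)  # 32 16
```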



Jon Boy UK - Wales

Joined: 17 Nov 07
Posts: 17
Credit: 663,827
RAC: 0
Message 2296 - Posted: 16 Mar 2008, 19:50:02 UTC

Hello,

I used to have two Compaq ProLiant 8500R data centre servers, both 8-core.

They cost an absolute fortune to run in electricity.

I chucked them both out at the local tip.

With the money I saved on electricity, I bought a few Compaq Evo W6000 workstations. These are all dual Xeons with Hyper-Threading (the equivalent of quad core).

These Evo workstations outperform the ProLiant servers tenfold, and their electricity consumption and costs are much lower!

Happy Crunchin' John :)
Emanuel

Joined: 18 Nov 07
Posts: 280
Credit: 2,442,757
RAC: 0
Message 2298 - Posted: 16 Mar 2008, 20:44:53 UTC - in response to Message 2286.  

yeah, i think after the application update with more expensive WUs i'll drop the limit to 16 (i dont think any machines are more than quad cpu quad core at the moment).


Has there been any progress on assigning a number of WUs based on the number of cores detected?
Fish

Joined: 2 Mar 08
Posts: 5
Credit: 50,383,094
RAC: 0
Message 2312 - Posted: 17 Mar 2008, 15:37:31 UTC

If these WUs are time-sensitive, then why do they have a 5-day deadline? The server has been out of work for nearly 24 hours now, and there are still over 5000 WUs out in the wind. It seems to me that if new work generation depends on the old results, the deadline should be shortened. Maybe try 12 hours and see how that goes; at least if a WU ends up on a duffer, it would time out and possibly get sent to a faster, more reliable host.




Fish
ChertseyAl

Joined: 31 Aug 07
Posts: 66
Credit: 1,002,668
RAC: 0
Message 2327 - Posted: 17 Mar 2008, 19:31:34 UTC - in response to Message 2312.  

The server has been out of work for nearly 24 hours now, and there are still over 5000 WUs out in the wind. Seems to me that if new work generation depends on the old results, the deadline should be shortened.


And the rate at which these are being cleared is very slow.

Could we use server-side aborts on 'old' WUs? Yes, it's annoying when your cache/stash gets aborted, but I can't see any point in wasting cycles on useless work.

If this isn't feasible, maybe BOINC just isn't the right platform? Maybe the work that drives the next generation should be done outside BOINC, with other work farmed out to a slower BOINC network.

Maybe try 12 hours and see how that goes, at least if the WU ends up on a duffer, it would time out and possibly get sent to a faster, more reliable host.


Tight deadlines are a nightmare. MW will end up in high-priority mode permanently, and other projects will be starved.

Maybe each WU could carry work from a number of different genetic seeds and run for a while longer?

When MW is running, it's fine for me. But my slow host may take 5 hours to turn around the last WU in a batch :(

I can't help but feel that server-side aborts are the way to go, but machines that are not on a permanent net connection are still going to waste work :(

Al.
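One hypothetical way to pick which 'old' WUs a server-side abort should target: tag each workunit with how many results the population had absorbed when the WU was generated, then abort any WU that has fallen more than some threshold behind. `IssuedWU`, `stale_workunits` and `max_age` are illustrative names, not anything MilkyWay@home or BOINC is known to provide, and the threshold value is arbitrary.

```python
from dataclasses import dataclass


@dataclass
class IssuedWU:
    wu_id: int
    generated_at_update: int  # population updates absorbed when this WU was created


def stale_workunits(issued, current_update, max_age):
    """IDs of WUs generated more than max_age population updates ago (hypothetical rule)."""
    return [wu.wu_id for wu in issued
            if current_update - wu.generated_at_update > max_age]


issued = [IssuedWU(1, 100), IssuedWU(2, 480), IssuedWU(3, 495)]
print(stale_workunits(issued, current_update=500, max_age=20))  # [1]
```

Measuring age in population updates rather than hours follows Travis's point that a WU's value depends on how far the population has moved, not on the clock. As Al notes, a host without a permanent connection would only learn of an abort at its next contact with the server.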
Emanuel

Joined: 18 Nov 07
Posts: 280
Credit: 2,442,757
RAC: 0
Message 2330 - Posted: 17 Mar 2008, 20:12:42 UTC - in response to Message 2327.  

Perhaps wasted work, or even the lax deadlines, aren't that much of a problem. The server could re-send in-progress WUs to other users if it hasn't received a result after a certain amount of time. Whichever result arrives later would simply be ignored, but credit would still be granted. Since computers are fickle and the WUs are short, this could mean the difference between a few hours and a few days...
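A minimal sketch of that re-send idea in Python, assuming times are tracked in hours and a 6-hour re-send threshold; `ReissueTracker` and its methods are hypothetical bookkeeping, not BOINC's API. The first result to arrive feeds the search, any later copy is ignored, and every returning host is credited.

```python
from dataclasses import dataclass


@dataclass
class Workunit:
    wu_id: int
    params: list
    issued_at: float        # hours since some reference point
    accepted: bool = False  # True once a first result has been used


class ReissueTracker:
    """Hypothetical bookkeeping for the re-send idea; not BOINC's API."""

    def __init__(self, reissue_after):
        self.reissue_after = reissue_after  # hours to wait before re-sending
        self.pending = {}  # wu_id -> Workunit (never pruned in this sketch)
        self.credit = {}   # host name -> number of credited results

    def issue(self, wu_id, params, now):
        self.pending[wu_id] = Workunit(wu_id, params, now)

    def overdue(self, now):
        """WUs with no result yet that have been out longer than reissue_after."""
        return [wu for wu in self.pending.values()
                if not wu.accepted and now - wu.issued_at > self.reissue_after]

    def reissue(self, wu, now):
        """Send a copy to another host and restart the clock."""
        wu.issued_at = now
        return wu.params

    def receive(self, wu_id, host):
        """Credit the host; return True only if this result should feed the search."""
        self.credit[host] = self.credit.get(host, 0) + 1
        wu = self.pending[wu_id]
        if wu.accepted:
            return False  # later copy: ignored by the search, but still credited
        wu.accepted = True
        return True


tracker = ReissueTracker(reissue_after=6)
tracker.issue(1, [0.1, 0.2], now=0)
late = tracker.overdue(now=7)           # WU 1 has been out for 7 hours
tracker.reissue(late[0], now=7)
print(tracker.receive(1, host="fast"))  # True
print(tracker.receive(1, host="slow"))  # False
print(tracker.credit)                   # {'fast': 1, 'slow': 1}
```

Because the search has no redundancy, the duplicate copy costs only the second host's cycles; the population sees each parameter set at most once.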
ChertseyAl

Joined: 31 Aug 07
Posts: 66
Credit: 1,002,668
RAC: 0
Message 2331 - Posted: 17 Mar 2008, 20:20:28 UTC - in response to Message 2327.  

Replying to myself ... Well, at least I listen to myself sometimes ;) ...

The remaining WUs are being cleared at about 132 WU/hour.

I usually clear about 15 WU/hour.

So we've either got about 9 active 'consumers', or a load of totally irrelevant results waiting to be returned from hosts bunged up with other work.

Let's abort the old WUs from the server and make the science count.

I no longer run a number of projects because they let stale work trickle in way past its sell-by date, only to be discarded.

Sorry, I went all *serious* for a moment there ;)

Al.
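The "9 active consumers" figure is the ratio of the two clearing rates. This rough check assumes every active host clears WUs about as fast as Al's does, which is a big simplification, since hosts vary widely.

```python
project_clear_rate = 132  # WU/hour being cleared project-wide (Al's estimate)
one_host_rate = 15        # WU/hour Al's own host usually clears

equivalent_hosts = project_clear_rate / one_host_rate
print(round(equivalent_hosts, 1))  # 8.8
```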
UBT - JohnR

Joined: 10 Mar 08
Posts: 7
Credit: 60,169,291
RAC: 0
Message 2332 - Posted: 17 Mar 2008, 20:23:36 UTC

Why not change from 20 WUs per host to 5 or 10 WUs per core? Also, if WUs are not returned in time, reduce the next batch by the number that were not returned or that came back with errors. CPDN does this, and many computers end up limited to only 1 WU each day (or at a time).
Duffers would then be restricted to only 1 WU on hold at any time after a couple of days.
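A minimal sketch of that quota rule in Python. `per_host_quota` and its parameters are hypothetical. The floor of 1 reflects the "only 1 WU on hold" outcome described above, and CPDN's actual mechanism is not modelled here.

```python
def per_host_quota(n_cores, per_core=5, missed_or_errored=0, floor=1):
    """Hypothetical quota: a per-core allowance, cut by recent misses/errors.

    n_cores: cores the host reports.
    per_core: WUs allowed per core (5 or 10 in the suggestion above).
    missed_or_errored: WUs recently not returned in time or returned with errors.
    floor: never drop below this many, so a slow host can still recover.
    """
    return max(floor, n_cores * per_core - missed_or_errored)


print(per_host_quota(4))                        # 20
print(per_host_quota(8))                        # 40
print(per_host_quota(4, missed_or_errored=25))  # 1
```

One design tension: pure per-core scaling cuts against Travis's staleness concern. A 32-thread box at 5 per core would hold 160 WUs at once, so a real scheme would probably still need an overall per-host cap.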


©2024 Astroinformatics Group