Welcome to MilkyWay@home

GECCO2008 paper accepted



Message boards : Number crunching : GECCO2008 paper accepted

Travis
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 30 Aug 07
Posts: 2046
Credit: 26,480
RAC: 0
Message 2271 - Posted: 15 Mar 2008, 5:22:46 UTC
Last modified: 15 Mar 2008, 5:25:04 UTC

good news everyone!

the paper we submitted to the GECCO 2008 conference was accepted, this paper is all about results we've gotten using the genetic search/simplex hybrid on BOINC -- so we couldn't have done it without you :)

for more information on the conference: GECCO 2008

and here's a link to our paper so you can all read it :)

An Asynchronous Hybrid Genetic-Simplex Search for Modeling the Milky Way Galaxy using Volunteer Computing

ID: 2271 · Rating: 0
LiborA
Joined: 15 Sep 07
Posts: 15
Credit: 9,744,830
RAC: 4,062
Message 2272 - Posted: 15 Mar 2008, 6:10:58 UTC

Congratulations, Travis
ID: 2272 · Rating: 0
Honza
Joined: 28 Aug 07
Posts: 31
Credit: 86,152,236
RAC: 0
Message 2274 - Posted: 15 Mar 2008, 8:30:58 UTC

Thanks for making the paper available.

From reading it, I got the feeling that returning results ASAP is essential for both the effectiveness and the quality of the project's results.

A couple of questions also arose from the reading.
If you find them interesting/meaningful and have time to answer...

Users might consider it a benefit if their results showed fitness (as on Rosetta).

Another consideration: would it be possible for the application itself to generate double shot children from their parents? This would reduce db size and server load. The client would report only results with a better fitness. Or are we not to use double shot but rather the probabilistic simplex operator, which is more effective?

Is there any data that evaluates return time against quality? I know it only takes more iterations to get the quality (as in the case of BlueGene vs. BOINC) but anyway...
EDIT: OK, I guess the question should have been about the number of evaluations rather than return time... so the answer is in the paper.

How far from or close to the limit of a single server are we? How close are we to the point where more islands would be needed?
ID: 2274 · Rating: 0
Philadelphia
Joined: 9 Nov 07
Posts: 131
Credit: 180,454
RAC: 0
Message 2275 - Posted: 15 Mar 2008, 13:39:17 UTC

Nice going Travis, congratulations.

ID: 2275 · Rating: 0
KSMarksPsych
Joined: 9 Sep 07
Posts: 22
Credit: 320,035
RAC: 0
Message 2288 - Posted: 16 Mar 2008, 9:38:31 UTC - in response to Message 2271.  

good news everyone!

the paper we submitted to the GECCO 2008 conference was accepted, this paper is all about results we've gotten using the genetic search/simplex hybrid on BOINC -- so we couldn't have done it without you :)

for more information on the conference: GECCO 2008

and here's a link to our paper so you can all read it :)

An Asynchronous Hybrid Genetic-Simplex Search for Modeling the Milky Way Galaxy using Volunteer Computing



Don't forget to add it to the Trac page.
Kathryn :o)
The BOINC FAQ Service
The Unofficial BOINC Wiki
The Trac System
More BOINC information than you can shake a stick of RAM at.
ID: 2288 · Rating: 0
Beezlebub
Joined: 18 Nov 07
Posts: 18
Credit: 38,429,435
RAC: 0
Message 2310 - Posted: 17 Mar 2008, 14:17:11 UTC

Nice paper Travis!

Question: the paragraph below seems to indicate that the results returned quickest update the database and generate a new line to compute. If so, how useful are the slower computers, since it seems their results will, as you said, be outdated when received? I would think a minimum "work unit crunch time" suggestion would be needed, pointing out the "real time" model updating as a qualifier for computers on this project, so people with slow units do not waste their time and your server space with outdated results.

In the first phase of the algorithm (while the population size is less than the maximum population size) the server is being initialized and a random population is generated. When a request work message is processed, a random parameter set is generated, and when a report work message is processed, the population is updated with the parameters and the fitness of that evaluation. When enough report work messages have been processed, the algorithm proceeds into the second phase which performs the actual genetic search. In the second phase, report work will insert the new parameters and their fitness into the population but only if they are better than the worst current member and remove the worst member if required to keep the population size the same. Otherwise the parameters and the result are discarded. Processing a request work message will either return a mutation or reproduction (crossover) from the population.
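
For readers who find code easier to follow, the two phases in the quoted paragraph can be sketched roughly as below. This is only an illustrative Python sketch, not the project's server code: the class name AsyncGeneticSearch, the bounds parameter, the 50/50 choice between mutation and crossover, and the averaging crossover are all assumptions made for the example. It also assumes that lower fitness is better, as Travis notes later in the thread, and that the population holds at least two members.

```python
import random


class AsyncGeneticSearch:
    """Hypothetical sketch of the two-phase asynchronous search (lower fitness is better)."""

    def __init__(self, max_population, bounds, seed=None):
        self.max_population = max_population
        self.bounds = bounds          # assumed: list of (low, high) per parameter
        self.population = []          # list of (fitness, params) pairs
        self.rng = random.Random(seed)

    def _random_params(self):
        return [self.rng.uniform(lo, hi) for lo, hi in self.bounds]

    def request_work(self):
        # Phase 1: population not yet full, hand out random parameter sets.
        if len(self.population) < self.max_population:
            return self._random_params()
        # Phase 2: return either a mutation or a crossover of current members.
        if self.rng.random() < 0.5:
            _, parent = self.rng.choice(self.population)
            child = list(parent)
            i = self.rng.randrange(len(child))
            lo, hi = self.bounds[i]
            child[i] = self.rng.uniform(lo, hi)   # mutate one parameter
            return child
        (_, a), (_, b) = self.rng.sample(self.population, 2)
        return [(x + y) / 2.0 for x, y in zip(a, b)]  # assumed crossover: average

    def report_work(self, params, fitness):
        # Phase 1: every reported result joins the population.
        if len(self.population) < self.max_population:
            self.population.append((fitness, list(params)))
            return True
        # Phase 2: keep the result only if it beats the worst member,
        # replacing that member so the population size stays the same.
        worst = max(range(len(self.population)),
                    key=lambda i: self.population[i][0])
        if fitness < self.population[worst][0]:
            self.population[worst] = (fitness, list(params))
            return True
        return False  # otherwise the result is discarded
```

Note that in this sketch a result is never rejected for arriving late; it is rejected only if its fitness is no better than the current worst member.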

"There is no limit to the amount of good a
person can do if they do not care who gets credit for it."


ID: 2310 · Rating: 0
Honza
Joined: 28 Aug 07
Posts: 31
Credit: 86,152,236
RAC: 0
Message 2316 - Posted: 17 Mar 2008, 16:59:44 UTC - in response to Message 2310.  

I would think a minimum "work unit crunch time" suggestion would be needed, pointing out the "real time" model updating as a qualifier for computers on this project, so people with slow units do not waste their time and your server space with outdated results.
Well, turnaround time would be the best figure to use.
One may have a fast host and finish WUs quickly but not report them back.

So people with fast machines and the Report Result Immediately feature in the BOINC core would be most effective.
ID: 2316 · Rating: 0
ChertseyAl
Joined: 31 Aug 07
Posts: 66
Credit: 1,002,668
RAC: 0
Message 2317 - Posted: 17 Mar 2008, 16:59:56 UTC - in response to Message 2310.  

Nice paper Travis!

Question: the paragraph below seems to indicate that the results returned quickest update the database and generate a new line to compute. If so, how useful are the slower computers, since it seems their results will, as you said, be outdated when received? I would think a minimum "work unit crunch time" suggestion would be needed, pointing out the "real time" model updating as a qualifier for computers on this project, so people with slow units do not waste their time and your server space with outdated results.

In the first phase of the algorithm (while the population size is less than the maximum population size) the server is being initialized and a random population is generated. When a request work message is processed, a random parameter set is generated, and when a report work message is processed, the population is updated with the parameters and the fitness of that evaluation. When enough report work messages have been processed, the algorithm proceeds into the second phase which performs the actual genetic search. In the second phase, report work will insert the new parameters and their fitness into the population but only if they are better than the worst current member and remove the worst member if required to keep the population size the same. Otherwise the parameters and the result are discarded. Processing a request work message will either return a mutation or reproduction (crossover) from the population.


As Travis explained in this thread?

Al.
ID: 2317 · Rating: 0
Crunch3r
Volunteer developer
Joined: 17 Feb 08
Posts: 363
Credit: 258,227,990
RAC: 0
Message 2318 - Posted: 17 Mar 2008, 17:00:56 UTC - in response to Message 2316.  


So people with fast machines and the Report Result Immediately feature in the BOINC core would be most effective.


That's why I'm here :P

ID: 2318 · Rating: 0
Beezlebub
Joined: 18 Nov 07
Posts: 18
Credit: 38,429,435
RAC: 0
Message 2321 - Posted: 17 Mar 2008, 17:38:06 UTC - in response to Message 2316.  
Last modified: 17 Mar 2008, 17:38:57 UTC

I would think a minimum "work unit crunch time" suggestion would be needed, pointing out the "real time" model updating as a qualifier for computers on this project, so people with slow units do not waste their time and your server space with outdated results.
Well, turnaround time would be the best figure to use.
One may have a fast host and finish WUs quickly but not report them back.

So people with fast machines and the Report Result Immediately feature in the BOINC core would be most effective.


Yes Honza, that was what I was trying to say :) Thanks for stating it that way.


"There is no limit to the amount of good a
person can do if they do not care who gets credit for it."


ID: 2321 · Rating: 0
Beezlebub
Joined: 18 Nov 07
Posts: 18
Credit: 38,429,435
RAC: 0
Message 2322 - Posted: 17 Mar 2008, 17:44:21 UTC - in response to Message 2318.  


So people with fast machines and the Report Result Immediately feature in the BOINC core would be most effective.


That's why I'm here :P


We need a thread listing OS, CPU, and run time per model to compare results. It would be interesting to see what combo returns the best times.

"There is no limit to the amount of good a
person can do if they do not care who gets credit for it."


ID: 2322 · Rating: 0
Travis
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 30 Aug 07
Posts: 2046
Credit: 26,480
RAC: 0
Message 2395 - Posted: 19 Mar 2008, 18:21:42 UTC - in response to Message 2310.  

Nice paper Travis!

Question: the paragraph below seems to indicate that the results returned quickest update the database and generate a new line to compute. If so, how useful are the slower computers, since it seems their results will, as you said, be outdated when received? I would think a minimum "work unit crunch time" suggestion would be needed, pointing out the "real time" model updating as a qualifier for computers on this project, so people with slow units do not waste their time and your server space with outdated results.

In the first phase of the algorithm (while the population size is less than the maximum population size) the server is being initialized and a random population is generated. When a request work message is processed, a random parameter set is generated, and when a report work message is processed, the population is updated with the parameters and the fitness of that evaluation. When enough report work messages have been processed, the algorithm proceeds into the second phase which performs the actual genetic search. In the second phase, report work will insert the new parameters and their fitness into the population but only if they are better than the worst current member and remove the worst member if required to keep the population size the same. Otherwise the parameters and the result are discarded. Processing a request work message will either return a mutation or reproduction (crossover) from the population.


actually if you read into the results section, we go into a bit of depth about the effect of WU round trip time on its effectiveness. the figures on the last two pages of the paper show how effective a WU was in improving the population, not only by how it was generated but by how many WUs were received between when it was generated and when the result was assimilated.

what was interesting is that WUs that were returned faster tended to be more useful as a whole, but slow WUs were still quite useful. additionally, depending on how the WU was generated by the probabilistic simplex, some generation strategies could be more effective for slower computers, while others more effective for faster ones.

so in short, even if you're returning results slowly, they're useful :D
ID: 2395 · Rating: 0
Travis
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 30 Aug 07
Posts: 2046
Credit: 26,480
RAC: 0
Message 2397 - Posted: 19 Mar 2008, 18:25:53 UTC - in response to Message 2274.  

Thanks for making the paper available.

From reading it, I got the feeling that returning results ASAP is essential for both the effectiveness and the quality of the project's results.


this isn't really true. slower results weren't as useful, but they were still quite useful - and in fact in some sense they are better at exploring different areas of the solution space, while the faster WUs tended to be better at converging to a minimum. both are equally important in a genetic search, because if you converge too fast to a minimum there's a good chance you'll miss the global best solution


A couple of questions also arose from the reading.
If you find them interesting/meaningful and have time to answer...

Users might consider it a benefit if their results showed fitness (as on Rosetta).

we might be able to do this if we got a visualization or something along those lines working. the interesting thing is that while a lower fitness is a better result - it might not be the best for the population, as we don't want to keep the population too homogeneous.


Another consideration: would it be possible for the application itself to generate double shot children from their parents? This would reduce db size and server load. The client would report only results with a better fitness. Or are we not to use double shot but rather the probabilistic simplex operator, which is more effective?


the next version of the application should be able to do a line search based on the initial parameter set and a direction. this should make the WUs a lot longer, as each will be doing multiple evaluations. not quite the same as what you're saying but it's along the same lines. having a WU do a genetic search on its own would be way too computationally expensive... i don't think we'd ever get anywhere doing that.
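
To make the line search idea concrete, here is a minimal sketch, assuming the WU receives a starting parameter set and a direction and simply evaluates a handful of fixed step sizes along that line, keeping the best one. The function name line_search, the steps tuple, and the toy fitness function are hypothetical; the real application may choose its steps quite differently.

```python
def line_search(fitness, start, direction,
                steps=(-1.0, -0.5, 0.0, 0.5, 1.0)):
    """Evaluate fitness at start + t * direction for each t; lower is better."""
    best_point, best_fitness = None, float("inf")
    for t in steps:
        point = [p + t * d for p, d in zip(start, direction)]
        value = fitness(point)          # one model evaluation per step
        if value < best_fitness:
            best_point, best_fitness = point, value
    return best_point, best_fitness


def toy_fitness(point):
    # Hypothetical stand-in for the real model evaluation; minimum at (1, 1).
    return sum((x - 1.0) ** 2 for x in point)


best_point, best_fitness = line_search(toy_fitness, [0.0, 0.0], [1.0, 1.0])
```

Each WU here performs five evaluations instead of one, which is why such WUs would run longer, and only the best point along the line would need to be reported.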


Is there any data that evaluates return time against quality? I know it only takes more iterations to get the quality (as in the case of BlueGene vs. BOINC) but anyway...
EDIT: OK, I guess the question should have been about the number of evaluations rather than return time... so the answer is in the paper.

How far from or close to the limit of a single server are we? How close are we to the point where more islands would be needed?


we didn't use return time vs quality, but rather the number of updates, because we felt it was a fairer metric. the number of updates measures how long the population had to evolve since the WU was generated, whereas raw time could be affected by a lot of other factors.
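
A rough sketch of that bookkeeping, assuming the server keeps a running count of assimilated results and remembers the count at the moment each WU is generated. The names UpdateCounter, generate and assimilate are made up for illustration and are not taken from the project's code.

```python
class UpdateCounter:
    """Hypothetical tracker: how many results arrived while a WU was out."""

    def __init__(self):
        self.updates = 0     # results assimilated so far
        self.issued = {}     # wu_id -> update count when the WU was generated

    def generate(self, wu_id):
        self.issued[wu_id] = self.updates

    def assimilate(self, wu_id):
        # Number of other results received between generation and assimilation.
        staleness = self.updates - self.issued.pop(wu_id)
        self.updates += 1
        return staleness


counter = UpdateCounter()
counter.generate("a")
counter.generate("b")
fast = counter.assimilate("a")   # nothing else came back first
counter.generate("c")
counter.assimilate("c")
slow = counter.assimilate("b")   # two results came back while "b" was out
```

A count like this does not depend on wall-clock time, so a host that is slow during a quiet period is not penalized the way it would be by a raw return-time measure.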
ID: 2397 · Rating: 0
Honza
Joined: 28 Aug 07
Posts: 31
Credit: 86,152,236
RAC: 0
Message 2400 - Posted: 19 Mar 2008, 18:44:21 UTC

Thanks for the answers, Travis, much appreciated.
ID: 2400 · Rating: 0
Odd-Rod
Joined: 7 Sep 07
Posts: 442
Credit: 1,423,410
RAC: 109
Message 2408 - Posted: 19 Mar 2008, 20:42:11 UTC - in response to Message 2271.  


and here's a link to our paper so you can all read it :)

An Asynchronous Hybrid Genetic-Simplex Search for Modeling the Milky Way Galaxy using Volunteer Computing


I trust all crunchers have downloaded this and are wading through it? I think I'll print it out and stick it on the back of the bathroom door...

JUST KIDDING!!

I have downloaded it and took a quick look at it. Very interesting, but it will take a serious reading to grasp it all.
Thanks for making it available to us.
Rod
ID: 2408 · Rating: 0
Beezlebub
Joined: 18 Nov 07
Posts: 18
Credit: 38,429,435
RAC: 0
Message 2417 - Posted: 20 Mar 2008, 0:40:52 UTC

Thanks from me also Travis.
"There is no limit to the amount of good a
person can do if they do not care who gets credit for it."


ID: 2417 · Rating: 0

©2020 Astroinformatics Group