GECCO2008 paper accepted

Author	Message
Travis Volunteer moderator Project administrator Project developer Project tester Project scientist Joined: 30 Aug 07 Posts: 2046 Credit: 26,480 RAC: 0	Message 2271 - Posted: 15 Mar 2008, 5:22:46 UTC Last modified: 15 Mar 2008, 5:25:04 UTC good news everyone! the paper we submitted to the GECCO 2008 conference was accepted. this paper is all about results we've gotten using the genetic search/simplex hybrid on BOINC -- so we couldn't have done it without you :) for more information on the conference: GECCO 2008. and here's a link to our paper so you can all read it :) An Asynchronous Hybrid Genetic-Simplex Search for Modeling the Milky Way Galaxy using Volunteer Computing

LiborA Joined: 15 Sep 07 Posts: 15 Credit: 9,818,265 RAC: 0	Message 2272 - Posted: 15 Mar 2008, 6:10:58 UTC Congratulations, Travis!

Honza Joined: 28 Aug 07 Posts: 31 Credit: 86,152,236 RAC: 0	Message 2274 - Posted: 15 Mar 2008, 8:30:58 UTC Thanks for making the paper available. From reading it, I got the feeling that returning results ASAP is essential for both the effectiveness and the quality of the project's results. A couple of questions also arose from the reading, if you find them interesting/meaningful and have time to answer... Users might consider it a benefit if their results showed the fitness (like Rosetta). Another consideration: would it be possible for the application itself to generate double shot children from the parents? This would reduce db size and server load. The client would report only results of better fitness. Or are we not to use double shot but rather the probabilistic simplex operator, which is more effective? Is there any data that evaluates return time against quality? I know it only takes more iterations to get the quality (as in the case of BlueGene vs. BOINC), but anyway... EDIT: OK, I guess the question should have been about the number of evaluations rather than return time... so I can find the answer in the paper. How close are we to the limit of a single server? How close are we to the point where more islands would be needed?

Philadelphia Joined: 9 Nov 07 Posts: 131 Credit: 180,454 RAC: 0	Message 2275 - Posted: 15 Mar 2008, 13:39:17 UTC Nice going Travis, congratulations.

KSMarksPsych Joined: 9 Sep 07 Posts: 22 Credit: 320,035 RAC: 0	Message 2288 - Posted: 16 Mar 2008, 9:38:31 UTC - in response to Message 2271. good news everyone! the paper we submitted to the GECCO 2008 conference was accepted. this paper is all about results we've gotten using the genetic search/simplex hybrid on BOINC -- so we couldn't have done it without you :) for more information on the conference: GECCO 2008. and here's a link to our paper so you can all read it :) An Asynchronous Hybrid Genetic-Simplex Search for Modeling the Milky Way Galaxy using Volunteer Computing
Don't forget to add it to the Trac page. Kathryn :o) The BOINC FAQ Service The Unofficial BOINC Wiki The Trac System More BOINC information than you can shake a stick of RAM at.

Beezlebub Joined: 18 Nov 07 Posts: 18 Credit: 38,429,435 RAC: 0	Message 2310 - Posted: 17 Mar 2008, 14:17:11 UTC Nice paper, Travis! A question: the paragraph below seems to indicate that the results returned quickest update the database and generate a new line to compute. If so, how useful are the slower computers, since it seems their results, as you said, will be outdated when received? I would think the project needs a suggested minimum "work unit crunch time", pointing out the "real time" model updating, as a qualifier for computers, so that people with slow units do not waste their time and your server space with outdated results.
In the first phase of the algorithm (while the population size is less than the maximum population size) the server is being initialized and a random population is generated. When a request work message is processed, a random parameter set is generated, and when a report work message is processed, the population is updated with the parameters and the fitness of that evaluation. When enough report work messages have been processed, the algorithm proceeds into the second phase which performs the actual genetic search. In the second phase, report work will insert the new parameters and their fitness into the population but only if they are better than the worst current member and remove the worst member if required to keep the population size the same. Otherwise the parameters and the result are discarded. Processing a request work message will either return a mutation or reproduction (crossover) from the population.
"There is no limit to the amount of good a person can do if they do not care who gets credit for it."
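
The two phases described in the quoted paragraph can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not the project's server code: the class name `AsyncGeneticSearch`, the `bounds` parameter, the single-coordinate mutation, the averaging crossover and the 50/50 choice between them are all hypothetical stand-ins, and lower fitness is assumed to be better. It also needs `max_population` to be at least 2 so a crossover can pick two parents.

```python
import random


class AsyncGeneticSearch:
    """Hypothetical sketch of the request work / report work loop."""

    def __init__(self, max_population, bounds, seed=None):
        self.max_population = max_population
        self.bounds = bounds        # assumed: list of (low, high) per parameter
        self.population = []        # list of (fitness, params) pairs
        self.rng = random.Random(seed)

    def _random_params(self):
        return [self.rng.uniform(lo, hi) for lo, hi in self.bounds]

    def request_work(self):
        # Phase 1: population not yet full, so hand out random parameter sets.
        if len(self.population) < self.max_population:
            return self._random_params()
        # Phase 2: return either a mutation or a crossover of current members.
        if self.rng.random() < 0.5:
            _, parent = self.rng.choice(self.population)
            child = list(parent)
            i = self.rng.randrange(len(child))
            lo, hi = self.bounds[i]
            child[i] = self.rng.uniform(lo, hi)   # assumed mutation operator
            return child
        (_, a), (_, b) = self.rng.sample(self.population, 2)
        return [(x + y) / 2.0 for x, y in zip(a, b)]  # assumed crossover

    def report_work(self, params, fitness):
        # Phase 1: every reported result joins the population.
        if len(self.population) < self.max_population:
            self.population.append((fitness, list(params)))
            return True
        # Phase 2: keep the result only if it beats the worst member,
        # replacing that member so the population size stays the same.
        worst = max(range(len(self.population)),
                    key=lambda i: self.population[i][0])
        if fitness < self.population[worst][0]:
            self.population[worst] = (fitness, list(params))
            return True
        return False  # otherwise the parameters and result are discarded
```

Because `request_work` and `report_work` never wait on each other, a slow host only delays its own result; whether that result is kept depends solely on how it compares with the worst member at the moment it arrives.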

Honza Joined: 28 Aug 07 Posts: 31 Credit: 86,152,236 RAC: 0	Message 2316 - Posted: 17 Mar 2008, 16:59:44 UTC - in response to Message 2310. I would think the project needs a suggested minimum "work unit crunch time", pointing out the "real time" model updating, as a qualifier for computers, so that people with slow units do not waste their time and your server space with outdated results.
Well, turn-around time would be the best figure to use. One may have a fast host that finishes WUs quickly but doesn't report back. So people with fast machines and the Report Result Immediately feature in the BOINC core would be most effective.

ChertseyAl Joined: 31 Aug 07 Posts: 66 Credit: 1,002,668 RAC: 0	Message 2317 - Posted: 17 Mar 2008, 16:59:56 UTC - in response to Message 2310. Nice paper, Travis! A question: the paragraph below seems to indicate that the results returned quickest update the database and generate a new line to compute. If so, how useful are the slower computers, since it seems their results, as you said, will be outdated when received? I would think the project needs a suggested minimum "work unit crunch time", pointing out the "real time" model updating, as a qualifier for computers, so that people with slow units do not waste their time and your server space with outdated results.
In the first phase of the algorithm (while the population size is less than the maximum population size) the server is being initialized and a random population is generated. When a request work message is processed, a random parameter set is generated, and when a report work message is processed, the population is updated with the parameters and the fitness of that evaluation. When enough report work messages have been processed, the algorithm proceeds into the second phase which performs the actual genetic search. In the second phase, report work will insert the new parameters and their fitness into the population but only if they are better than the worst current member and remove the worst member if required to keep the population size the same. Otherwise the parameters and the result are discarded. Processing a request work message will either return a mutation or reproduction (crossover) from the population.
As Travis explained in This Thread? Al.

Crunch3r Volunteer developer Joined: 17 Feb 08 Posts: 363 Credit: 258,227,990 RAC: 0	Message 2318 - Posted: 17 Mar 2008, 17:00:56 UTC - in response to Message 2316. So people with fast machines and the Report Result Immediately feature in the BOINC core would be most effective.
That's why I'm here :P Support science! Join Team BOINC United now!

Beezlebub Joined: 18 Nov 07 Posts: 18 Credit: 38,429,435 RAC: 0	Message 2321 - Posted: 17 Mar 2008, 17:38:06 UTC - in response to Message 2316. Last modified: 17 Mar 2008, 17:38:57 UTC I would think the project needs a suggested minimum "work unit crunch time", pointing out the "real time" model updating, as a qualifier for computers, so that people with slow units do not waste their time and your server space with outdated results.
Well, turn-around time would be the best figure to use. One may have a fast host that finishes WUs quickly but doesn't report back. So people with fast machines and the Report Result Immediately feature in the BOINC core would be most effective.
Yes, Honza, that was what I was trying to say :) Thanks for stating it that way. "There is no limit to the amount of good a person can do if they do not care who gets credit for it."

Beezlebub Joined: 18 Nov 07 Posts: 18 Credit: 38,429,435 RAC: 0	Message 2322 - Posted: 17 Mar 2008, 17:44:21 UTC - in response to Message 2318. So people with fast machines and the Report Result Immediately feature in the BOINC core would be most effective.
That's why I'm here :P
We need a thread listing OS, CPU, and run time per model to compare results. It would be interesting to see what combo returns the best times. "There is no limit to the amount of good a person can do if they do not care who gets credit for it."

Travis Volunteer moderator Project administrator Project developer Project tester Project scientist Joined: 30 Aug 07 Posts: 2046 Credit: 26,480 RAC: 0	Message 2395 - Posted: 19 Mar 2008, 18:21:42 UTC - in response to Message 2310. Nice paper, Travis! A question: the paragraph below seems to indicate that the results returned quickest update the database and generate a new line to compute. If so, how useful are the slower computers, since it seems their results, as you said, will be outdated when received? I would think the project needs a suggested minimum "work unit crunch time", pointing out the "real time" model updating, as a qualifier for computers, so that people with slow units do not waste their time and your server space with outdated results.
In the first phase of the algorithm (while the population size is less than the maximum population size) the server is being initialized and a random population is generated. When a request work message is processed, a random parameter set is generated, and when a report work message is processed, the population is updated with the parameters and the fitness of that evaluation. When enough report work messages have been processed, the algorithm proceeds into the second phase which performs the actual genetic search. In the second phase, report work will insert the new parameters and their fitness into the population but only if they are better than the worst current member and remove the worst member if required to keep the population size the same. Otherwise the parameters and the result are discarded. Processing a request work message will either return a mutation or reproduction (crossover) from the population.
actually, if you read into the results section, we go into a bit of depth about the effect of a WU's round trip time on its effectiveness.
the figures on the last two pages of the paper show how effective a WU was in improving the population, not only by how it was generated but also by how many WUs were received between when it was generated and when the result was assimilated. what was interesting is that WUs that were returned faster tended to be more useful as a whole, but slow WUs were still quite useful. additionally, depending on how the WU was generated by the probabilistic simplex, some generation strategies could be more effective for slower computers, while others were more effective for faster ones. so in short, even if you're returning results slowly, they're useful :D

Travis Volunteer moderator Project administrator Project developer Project tester Project scientist Joined: 30 Aug 07 Posts: 2046 Credit: 26,480 RAC: 0	Message 2397 - Posted: 19 Mar 2008, 18:25:53 UTC - in response to Message 2274.
Thanks for making the paper available. From reading it, I got the feeling that returning results ASAP is essential for both the effectiveness and the quality of the project's results.
this isn't really true. slower results weren't as useful, but they were still quite useful - and in fact in some sense they are better at exploring different areas of the solution space, while the faster WUs tended to be better at converging to a minimum. both are equally important in a genetic search, because if you converge too fast to a minimum there's a good chance you'll miss the global best solution.
A couple of questions also arose from the reading, if you find them interesting/meaningful and have time to answer... Users might consider it a benefit if their results showed the fitness (like Rosetta).
we might be able to do this if we got a visualization or something along those lines working. the interesting thing is that while a lower fitness is a better result, it might not be the best for the population, as we don't want to keep the population too homogeneous.
Another consideration: would it be possible for the application itself to generate double shot children from the parents? This would reduce db size and server load. The client would report only results of better fitness. Or are we not to use double shot but rather the probabilistic simplex operator, which is more effective?
the next version of the application should be able to do a line search based on the initial parameter set and a direction. this should make the WUs a lot longer, as each will be doing multiple evaluations. not quite the same as what you're saying, but it's along the same lines. having a WU do a genetic search on its own would be way too computationally expensive... i don't think we'd ever get anywhere doing that.
Is there any data that evaluates return time against quality? I know it only takes more iterations to get the quality (as in the case of BlueGene vs. BOINC), but anyway... EDIT: OK, I guess the question should have been about the number of evaluations rather than return time... so I can find the answer in the paper. How close are we to the limit of a single server? How close are we to the point where more islands would be needed?
we didn't use return time vs quality, but rather the number of updates, because we felt it was a fairer metric. the number of updates measures how long the population had to evolve since the WU was generated, whereas plain time could be affected by a lot of other factors.
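
The "number of updates" metric mentioned above can be illustrated with a small counter. This is only a sketch of the idea, assuming that an update means one processed report; the class `UpdateCounter` and its method names are hypothetical and do not come from the paper or the server code.

```python
class UpdateCounter:
    """Hypothetical tracker: how many reports arrived while a WU was out."""

    def __init__(self):
        self.reports_processed = 0   # total reports assimilated so far
        self.issued_at = {}          # wu_id -> reports_processed at issue time

    def on_issue(self, wu_id):
        # Remember how far the population had evolved when the WU was generated.
        self.issued_at[wu_id] = self.reports_processed

    def on_report(self, wu_id):
        # Staleness = reports processed between generation and assimilation.
        staleness = self.reports_processed - self.issued_at.pop(wu_id)
        self.reports_processed += 1
        return staleness
```

A WU that comes back before any other result has a staleness of 0 no matter how many seconds it took, which is why this count is independent of wall-clock factors such as network delays or how long a host held a finished result.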

Honza Joined: 28 Aug 07 Posts: 31 Credit: 86,152,236 RAC: 0	Message 2400 - Posted: 19 Mar 2008, 18:44:21 UTC Thanks for the answers, Travis, much appreciated.

Odd-Rod Joined: 7 Sep 07 Posts: 444 Credit: 5,715,481 RAC: 0	Message 2408 - Posted: 19 Mar 2008, 20:42:11 UTC - in response to Message 2271. and here's a link to our paper so you can all read it :) An Asynchronous Hybrid Genetic-Simplex Search for Modeling the Milky Way Galaxy using Volunteer Computing
I trust all crunchers have downloaded this and are wading through it? I think I'll print it out and stick it on the back of the bathroom door... JUST KIDDING!! I have downloaded it and taken a quick look at it. Very interesting, but it will take a serious reading to grasp it all. Thanks for making it available to us. Rod

Beezlebub Joined: 18 Nov 07 Posts: 18 Credit: 38,429,435 RAC: 0	Message 2417 - Posted: 20 Mar 2008, 0:40:52 UTC Thanks from me also Travis. "There is no limit to the amount of good a person can do if they do not care who gets credit for it."