Now that we have native ATI GPU support, how about longer tasks?
Send message Joined: 29 Aug 07 Posts: 115 Credit: 501,625,147 RAC: 2,660 |
Thanks for implementing native ATI support! And now that we have it, how about issuing tasks exclusively for ATI GPUs that run (say) for an hour (on a 4870)? No change in credits/hour. That way we could fill a normal queue of work lasting (say) a day or two. Besides letting us weather downtime or network issues, it would DRAMATICALLY drop the load on the project server and the network. |
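The load argument works out like this as a back-of-the-envelope sketch; the current per-task runtime used here is an assumption for illustration only, not a measured figure:

```python
# Back-of-the-envelope sketch of the server-load argument above.
# The current per-task runtime is an assumed figure for illustration;
# actual runtimes on a 4870 vary by workunit.
current_runtime_min = 5      # assumed: minutes per GPU task today
proposed_runtime_min = 60    # the hour-long tasks suggested above

# Credits/hour stay the same, so the GPU does the same work in fewer,
# larger tasks -- scheduler contacts and uploads shrink proportionally.
load_reduction = proposed_runtime_min / current_runtime_min   # 12x fewer transactions

# And a modest cache now covers real downtime:
tasks_for_one_day = (24 * 60) // proposed_runtime_min         # 24 tasks buffer a full day
```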
Send message Joined: 23 Nov 07 Posts: 33 Credit: 300,042,542 RAC: 0 |
> Thanks for implementing native ATI support!
Sounds like a good idea to me, but it should be for all GPUs, not just the ATIs. |
Send message Joined: 30 Aug 07 Posts: 2046 Credit: 26,480 RAC: 0 |
> Thanks for implementing native ATI support!
I think that might really take some new science from the astronomers. We do have a change in the works that should increase the compute time by (hopefully) another factor of 2-4. Once we get the server-side GPU issues settled, we'll be releasing that. |
Send message Joined: 21 Aug 08 Posts: 625 Credit: 558,425 RAC: 0 |
> Thanks for implementing native ATI support!
In my opinion, you should look at whether there is a way to use Homogeneous Redundancy classes to separate GPUs from CPUs: give GPUs the longer tasks and leave the shorter tasks to CPUs. |
Send message Joined: 14 Feb 09 Posts: 999 Credit: 74,932,619 RAC: 0 |
I also think we should ask Gipsel/Cluster Physik nicely if he will implement checkpointing in the app, as longer tasks will definitely need it. |
Send message Joined: 12 Aug 09 Posts: 172 Credit: 645,240,165 RAC: 0 |
> I also think we should ask Gipsel/Cluster Physik nicely if he will implement checkpointing in the app, as longer tasks will definitely need it.
I second that request. Checkpoints at 25%, 50%, and 75% would be my wish, thanks. |
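Fraction-based checkpointing at those marks is straightforward to sketch. This is only an illustration of the idea, not the actual app (which is a GPU binary); the state file name and the step function here are hypothetical:

```python
import json, os, tempfile

CHECKPOINT_FILE = "state.json"             # hypothetical state file name
CHECKPOINT_FRACTIONS = (0.25, 0.50, 0.75)  # the 25/50/75% marks wished for above

def compute_step(state):
    """Placeholder for one unit of real work (the GPU kernel call)."""
    return state + 1

def save_checkpoint(iteration, state):
    """Write progress atomically so a restart can resume mid-task."""
    fd, tmp = tempfile.mkstemp(dir=".")
    with os.fdopen(fd, "w") as f:
        json.dump({"iteration": iteration, "state": state}, f)
    os.replace(tmp, CHECKPOINT_FILE)       # atomic rename, no torn files

def run(total_iterations, state=0):
    saved = set()
    for i in range(1, total_iterations + 1):
        state = compute_step(state)
        frac = i / total_iterations
        for mark in CHECKPOINT_FRACTIONS:
            if frac >= mark and mark not in saved:
                saved.add(mark)
                save_checkpoint(i, state)
    return state
```

On restart, the app would read the state file (if present) and resume from the saved iteration instead of starting over; the atomic rename guarantees the file is never half-written even if the host dies mid-checkpoint.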
Send message Joined: 29 Aug 07 Posts: 115 Credit: 501,625,147 RAC: 2,660 |
HR isn't a factor here. With a quorum of 1, you don't use multiple replications for validation. |
Send message Joined: 30 Aug 07 Posts: 2046 Credit: 26,480 RAC: 0 |
We actually use a different method for validation than any of the typical BOINC redundancy strategies. My thesis goes into this in a little bit of detail, and I'm actually working on a paper about it right now. We don't need to validate EVERY workunit, unlike other projects.

Since we're doing evolutionary algorithms based on populations of solutions (newly generated work is based off different recombinations of a known population of best solutions), when we get a result back that could potentially improve the population, we validate it before we put it into the population. This keeps us from generating new work from potentially invalid results.

What we've been testing lately is a more optimistic validation strategy. Since most of your results are correct, waiting for results to be validated before putting them in the population can slow our search progress down quite a bit. So I've been trying out a validation strategy which uses potentially good results immediately, and then reverts them to previously validated results if they turn out to be invalid. So far it's working out really well, so that's what the paper is about. |
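The optimistic strategy described above can be sketched in a few lines. All names here and the representation of solutions as plain fitness values are illustrative assumptions, not the project's actual code:

```python
# Illustrative sketch of optimistic validation: admit a promising result
# into the population immediately, and revert to the previously validated
# member if the delayed validation later fails.

class Population:
    def __init__(self, members):
        self.members = list(members)   # fitness values of current best solutions
        self.pending = {}              # optimistic result -> member it displaced

    def submit(self, result):
        """Optimistically admit a result that beats the current worst member."""
        worst = min(self.members)
        if result > worst:
            self.members.remove(worst)
            self.members.append(result)
            self.pending[result] = worst        # remember what to restore

    def resolve(self, result, valid):
        """Called once validation finishes: keep the result, or roll it back."""
        displaced = self.pending.pop(result, None)
        if not valid and displaced is not None and result in self.members:
            self.members.remove(result)
            self.members.append(displaced)      # revert to the validated member
```

New work would be generated from `members` right away, which is exactly why an invalid result can briefly influence the search before being reverted, and why the speedup comes from not waiting on validation.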
Send message Joined: 21 Aug 08 Posts: 625 Credit: 558,425 RAC: 0 |
Perhaps I'm getting ahead of the curve with trying to segregate tasks, regardless of quorum. I'm not sure if there's already a way to do that, but the whole point is that GPU users need to be placed in a different classification category than CPU users. GPU users could exclusively get the 3-stream (longer-running) tasks, leaving CPU users the 1-stream, 2-stream, or other shorter-running tasks. Perhaps I'm phrasing the BOINC equivalent wrong and something like this already exists, but if the planned 2-4x increase in runtime happens, it will undo the increase in deadline and people with CPUs will start howling again... I'm advocating making everyone happier, not just a few, same as I've been doing all along. I think something like this would improve total project throughput and maybe, just maybe, allow a larger cache. It might not, but it's certainly worth a try if there's already a way to do it or if it's a minimal change. |
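The split being proposed could look something like this on the scheduler side. This is just the idea in miniature, not BOINC's actual Homogeneous Redundancy mechanism, and the task class names are assumptions:

```python
# Hypothetical task classes: longer multi-stream workunits go to GPU
# hosts, shorter ones (with friendlier deadlines) stay with CPU hosts.
TASK_CLASSES = {
    "gpu": {"3-stream"},
    "cpu": {"1-stream", "2-stream"},
}

def eligible_tasks(host_has_gpu, queue):
    """Filter the work queue down to the classes this host should receive."""
    wanted = TASK_CLASSES["gpu" if host_has_gpu else "cpu"]
    return [task for task in queue if task["class"] in wanted]
```

With a quorum of 1 this is purely a sizing policy, not a validation one, so it would not interfere with the project's own validation scheme.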
Send message Joined: 6 Oct 09 Posts: 25 Credit: 4,849,998 RAC: 0 |
In short: yes, we need longer GPU tasks, checkpoints (for the not-so-fast GPUs), and shorter tasks for CPUs. And regarding workunit validation: what if I overclock my GPU and it starts producing results with errors? Would you and I never know about those errors? |
Send message Joined: 30 Aug 07 Posts: 2046 Credit: 26,480 RAC: 0 |
> And regarding workunit validation: what if I overclock my GPU and it starts producing results with errors? Would you and I never know about those errors?
The majority of errors tend to give us good-looking results, i.e., they're false positives, especially from overclocked CPUs and GPUs. So a couple of results might get validated, but they don't harm our searches, and the majority get caught. |
Send message Joined: 15 Jul 08 Posts: 383 Credit: 729,293,740 RAC: 0 |
> I think that might really take some new science from the astronomers. We do have a change in the works that should increase the compute time by (hopefully) another factor of 2-4. Once we get the server-side GPU issues settled, we'll be releasing that.
That should be very helpful. Anything to get the queue time up for GPUs is most appreciated. Thanks!
> The majority of errors tend to give us good-looking results, i.e., they're false positives, especially from overclocked CPUs and GPUs. So a couple of results might get validated, but they don't harm our searches, and the majority get caught.
Good to hear; I'd like to know more about this when you get the chance. |
©2024 Astroinformatics Group