Now that we have native ATI GPU support, how about longer tasks?
Send message Joined: 29 Aug 07 Posts: 115 Credit: 501,625,147 RAC: 2,660 |
Thanks for implementing native ATI support! And now that we have it, how about issuing tasks exclusively for ATI GPUs that run (say) for an hour (on a 4870)? No change in credits/hour. That way we could fill a normal queue of work lasting (say) a day or two. Besides letting us weather downtime or network issues, it would DRAMATICALLY drop the load on the project server and the network. |
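The load argument works out like this as a back-of-the-envelope sketch; the current per-task runtime used here is an assumption for illustration only, not a measured figure:

```python
# Back-of-the-envelope sketch of the server-load argument above.
# The current per-task runtime is an assumed figure for illustration;
# actual runtimes on a 4870 vary by workunit.
current_runtime_min = 5      # assumed: minutes per GPU task today
proposed_runtime_min = 60    # the hour-long tasks suggested above

# Credits/hour stay the same, so the GPU does the same work in fewer,
# larger tasks -- scheduler contacts and uploads shrink proportionally.
load_reduction = proposed_runtime_min / current_runtime_min   # 12x fewer transactions

# And a modest cache now covers real downtime:
tasks_for_one_day = (24 * 60) // proposed_runtime_min         # 24 tasks buffer a full day
```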
Send message Joined: 23 Nov 07 Posts: 33 Credit: 300,042,542 RAC: 0 |
> Thanks for implementing native ATI support!
Sounds like a good idea to me, but it should be for all GPUs, not just the ATIs. |
Send message Joined: 30 Aug 07 Posts: 2046 Credit: 26,480 RAC: 0 |
> Thanks for implementing native ATI support!
I think that might really take some new science from the astronomers. We do have a change in the works that should increase the compute time by (hopefully) another factor of 2-4. Once we get the server-side GPU issues settled, we'll be releasing that. |
Send message Joined: 21 Aug 08 Posts: 625 Credit: 558,425 RAC: 0 |
> Thanks for implementing native ATI support!
In my opinion, you should look at whether there is a way to use Homogeneous Redundancy classes to separate GPUs from CPUs: give GPUs the longer tasks and leave the shorter tasks to CPUs. |
Send message Joined: 14 Feb 09 Posts: 999 Credit: 74,932,619 RAC: 0 |
I also think we should ask Gipsel/Cluster Physik nicely if he will implement checkpointing in the app, as longer tasks will definitely need it. |
Send message Joined: 12 Aug 09 Posts: 172 Credit: 645,240,165 RAC: 0 |
> I also think we should ask Gipsel/Cluster Physik nicely if he will implement checkpointing in the app, as longer tasks will definitely need it.
I second that request. Checkpoints at 25%, 50%, and 75% would be my wish, thanks. |
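Fraction-based checkpointing at those marks is straightforward to sketch. This is only an illustration of the idea, not the actual app (which is a GPU binary); the state file name and the step function here are hypothetical:

```python
import json, os, tempfile

CHECKPOINT_FILE = "state.json"             # hypothetical state file name
CHECKPOINT_FRACTIONS = (0.25, 0.50, 0.75)  # the 25/50/75% marks wished for above

def compute_step(state):
    """Placeholder for one unit of real work (the GPU kernel call)."""
    return state + 1

def save_checkpoint(iteration, state):
    """Write progress atomically so a restart can resume mid-task."""
    fd, tmp = tempfile.mkstemp(dir=".")
    with os.fdopen(fd, "w") as f:
        json.dump({"iteration": iteration, "state": state}, f)
    os.replace(tmp, CHECKPOINT_FILE)       # atomic rename, no torn files

def run(total_iterations, state=0):
    saved = set()
    for i in range(1, total_iterations + 1):
        state = compute_step(state)
        frac = i / total_iterations
        for mark in CHECKPOINT_FRACTIONS:
            if frac >= mark and mark not in saved:
                saved.add(mark)
                save_checkpoint(i, state)
    return state
```

On restart, the app would read the state file (if present) and resume from the saved iteration instead of starting over; the atomic rename guarantees the file is never half-written even if the host dies mid-checkpoint.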
Send message Joined: 29 Aug 07 Posts: 115 Credit: 501,625,147 RAC: 2,660 |
HR isn't a factor here. With a quorum of 1, you don't use multiple replications for validation. |
Send message Joined: 30 Aug 07 Posts: 2046 Credit: 26,480 RAC: 0 |
We actually use a different method for validation than any of the typical BOINC redundancy strategies. My thesis goes into this in a little bit of detail, and I'm actually working on a paper about it right now. We don't need to validate EVERY workunit, unlike other projects.

Since we're doing evolutionary algorithms based on populations of solutions (newly generated work is based off different recombinations of a known population of best solutions), when we get a result back that could potentially improve the population, we validate it before we put it into the population. This keeps us from generating new work from potentially invalid results.

What we've been testing lately is a more optimistic validation strategy. Since most of your results are correct, waiting for results to be validated before putting them in the population can slow our search progress down quite a bit. So I've been trying out a validation strategy which uses potentially good results immediately, and then reverts them to previously validated results if they turn out to be invalid. So far it's working out really well, so that's what the paper is about. |
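The optimistic strategy described above can be sketched in a few lines. All names here and the representation of solutions as plain fitness values are illustrative assumptions, not the project's actual code:

```python
# Illustrative sketch of optimistic validation: admit a promising result
# into the population immediately, and revert to the previously validated
# member if the delayed validation later fails.

class Population:
    def __init__(self, members):
        self.members = list(members)   # fitness values of current best solutions
        self.pending = {}              # optimistic result -> member it displaced

    def submit(self, result):
        """Optimistically admit a result that beats the current worst member."""
        worst = min(self.members)
        if result > worst:
            self.members.remove(worst)
            self.members.append(result)
            self.pending[result] = worst        # remember what to restore

    def resolve(self, result, valid):
        """Called once validation finishes: keep the result, or roll it back."""
        displaced = self.pending.pop(result, None)
        if not valid and displaced is not None and result in self.members:
            self.members.remove(result)
            self.members.append(displaced)      # revert to the validated member
```

New work would be generated from `members` right away, which is exactly why an invalid result can briefly influence the search before being reverted, and why the speedup comes from not waiting on validation.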
Send message Joined: 21 Aug 08 Posts: 625 Credit: 558,425 RAC: 0 |
Perhaps I'm getting ahead of the curve with trying to segregate tasks, regardless of quorum. I'm not sure if there's already a way to do that, but the whole point is that GPU users need to be placed in a different classification category than CPU users. GPU users could exclusively get the 3-stream (longer-running) tasks, leaving CPU users the 1-stream, 2-stream, or other shorter-running tasks. Perhaps I'm phrasing the BOINC equivalent wrong and something like this already exists, but if the planned 2-4x increase in runtime happens, it will undo the increase in deadline and people with CPUs will start howling again... I'm advocating making everyone happier, not just a few, same as I've been doing all along. I think something like this would improve total project throughput and maybe, just maybe, allow a larger cache. It might not, but it's certainly worth a try if there's already a way to do it or if it's a minimal change. |
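The split being proposed could look something like this on the scheduler side. This is just the idea in miniature, not BOINC's actual Homogeneous Redundancy mechanism, and the task class names are assumptions:

```python
# Hypothetical task classes: longer multi-stream workunits go to GPU
# hosts, shorter ones (with friendlier deadlines) stay with CPU hosts.
TASK_CLASSES = {
    "gpu": {"3-stream"},
    "cpu": {"1-stream", "2-stream"},
}

def eligible_tasks(host_has_gpu, queue):
    """Filter the work queue down to the classes this host should receive."""
    wanted = TASK_CLASSES["gpu" if host_has_gpu else "cpu"]
    return [task for task in queue if task["class"] in wanted]
```

With a quorum of 1 this is purely a sizing policy, not a validation one, so it would not interfere with the project's own validation scheme.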
Send message Joined: 6 Oct 09 Posts: 25 Credit: 4,849,998 RAC: 0 |
In short: yes, we need longer GPU tasks, checkpoints (for the not-so-fast GPUs), and shorter tasks for CPUs. And regarding workunit validation: what if I overclock my GPU and it starts producing results with errors? Would you and I never know about those errors? |
Send message Joined: 30 Aug 07 Posts: 2046 Credit: 26,480 RAC: 0 |
> And regarding workunit validation: what if I overclock my GPU and it starts producing results with errors? Would you and I never know about those errors?
The majority of errors tend to give us good-looking results, i.e., they're false positives, especially from overclocked CPUs and GPUs. So a couple of results might get validated, but they don't harm our searches, and the majority get caught. |
Send message Joined: 15 Jul 08 Posts: 383 Credit: 729,293,740 RAC: 0 |
> I think that might really take some new science from the astronomers. We do have a change in the works that should increase the compute time by (hopefully) another factor of 2-4. Once we get the server-side GPU issues settled, we'll be releasing that.
That should be very helpful. Anything to get the queue time up for GPUs is most appreciated. Thanks!
> The majority of errors tend to give us good-looking results, i.e., they're false positives, especially from overclocked CPUs and GPUs. So a couple of results might get validated, but they don't harm our searches, and the majority get caught.
Good to hear; I'd like to know more about this when you get the chance. |
©2024 Astroinformatics Group