Welcome to MilkyWay@home

Now that we have native ATI GPU support, how about longer tasks?

zombie67 [MM]
Joined: 29 Aug 07
Posts: 115
Credit: 501,600,397
RAC: 5,019
Message 35703 - Posted: 16 Jan 2010, 6:20:12 UTC
Last modified: 16 Jan 2010, 6:22:38 UTC

Thanks for implementing native ATI support!

And now that we have it, how about issuing tasks exclusively for ATI GPUs that run (say) for an hour (on a 4870)?

No change in credits/hour. That way we can fill up a normal queue of work lasting (say) a day or two.

In addition to letting us weather downtime or network issues, it would DRAMATICALLY reduce the load on the project server and the network.
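To put rough numbers on the queue argument, here is a minimal sketch. The one-minute task length is an assumed figure chosen only for comparison (the thread does not state current runtimes), and `tasks_needed` is a hypothetical helper, not anything from the project.

```python
import math

def tasks_needed(queue_hours: float, task_minutes: float) -> int:
    """Number of tasks one GPU needs on hand to stay busy for queue_hours."""
    return math.ceil(queue_hours * 60 / task_minutes)

# Assumed comparison: a 24-hour queue filled with 1-minute vs 60-minute tasks.
short = tasks_needed(24, 1)    # many small tasks to fetch and report
long_ = tasks_needed(24, 60)   # far fewer tasks for the same amount of GPU time
print(short, long_)
```

Every task has to be fetched and reported, so for the same amount of GPU time the number of tasks the server handles falls in proportion to the task length.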

ID: 35703 · Rating: 0
Bigred
Joined: 23 Nov 07
Posts: 33
Credit: 300,042,542
RAC: 0
Message 35705 - Posted: 16 Jan 2010, 18:48:32 UTC - in response to Message 35703.  

Thanks for implementing native ATI support!

And now that we have it, how about issuing tasks exclusively for ATI GPUs that run (say) for an hour (on a 4870)?

No change in credits/hour. That way we can fill up a normal queue of work lasting (say) a day or two.

In addition to letting us weather downtime or network issues, it would DRAMATICALLY reduce the load on the project server and the network.


Sounds like a good idea to me, but it should be for all GPUs, not just the ATIs.
ID: 35705 · Rating: 0
Travis
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: 30 Aug 07
Posts: 2046
Credit: 26,480
RAC: 0
Message 35718 - Posted: 17 Jan 2010, 4:03:12 UTC - in response to Message 35703.  

Thanks for implementing native ATI support!

And now that we have it, how about issuing tasks exclusively for ATI GPUs that run (say) for an hour (on a 4870)?

No change in credits/hour. That way we can fill up a normal queue of work lasting (say) a day or two.

In addition to letting us weather downtime or network issues, it would DRAMATICALLY reduce the load on the project server and the network.


I think that might really take some new science from the astronomers. We do have a change in the works that should increase the compute time by (hopefully) another factor of 2 to 4. Once we get the server-side GPU issues settled, we'll be releasing that.
ID: 35718 · Rating: 0
Brian Silvers

Joined: 21 Aug 08
Posts: 625
Credit: 558,425
RAC: 0
Message 35721 - Posted: 17 Jan 2010, 4:43:02 UTC - in response to Message 35718.  

Thanks for implementing native ATI support!

And now that we have it, how about issuing tasks exclusively for ATI GPUs that run (say) for an hour (on a 4870)?

No change in credits/hour. That way we can fill up a normal queue of work lasting (say) a day or two.

In addition to letting us weather downtime or network issues, it would DRAMATICALLY reduce the load on the project server and the network.


I think that might really take some new science from the astronomers. We do have a change in the works that should increase the compute time by (hopefully) another factor of 2 to 4. Once we get the server-side GPU issues settled, we'll be releasing that.


In my opinion, you should consider looking at whether or not there is a way to use Homogeneous Redundancy classes to separate GPU from CPU and give GPUs the longer task and leave the shorter tasks to CPUs.
ID: 35721 · Rating: 0
arkayn
Joined: 14 Feb 09
Posts: 999
Credit: 74,932,619
RAC: 0
Message 35725 - Posted: 17 Jan 2010, 5:12:43 UTC

I also think we should ask Gipsel/Cluster Physik nicely if he will implement checkpointing in the app as longer tasks will definitely need it.
ID: 35725 · Rating: 0
David Glogau*
Joined: 12 Aug 09
Posts: 172
Credit: 645,240,165
RAC: 0
Message 35737 - Posted: 17 Jan 2010, 11:05:20 UTC - in response to Message 35725.  

I also think we should ask Gipsel/Cluster Physik nicely if he will implement checkpointing in the app as longer tasks will definitely need it.


I second that request. Checkpoints at 25%, 50% and 75% of a task would be my wish, thanks.
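As a rough illustration of checkpointing at fixed progress marks, here is a minimal sketch. It does not use the BOINC API and says nothing about how the ATI application actually works; `run_with_checkpoints`, the JSON state file and the toy summation workload are all assumptions made for the example.

```python
import json
import os
import tempfile

CHECKPOINT_FRACTIONS = (0.25, 0.50, 0.75)

def run_with_checkpoints(total_steps, path):
    """Sum 0..total_steps-1, saving state at 25%, 50% and 75% progress.

    If a checkpoint file already exists at `path`, resume from it
    instead of starting over.
    """
    start, acc = 0, 0
    if os.path.exists(path):
        with open(path) as f:
            state = json.load(f)
        start, acc = state["next_step"], state["acc"]

    marks = {int(total_steps * frac) for frac in CHECKPOINT_FRACTIONS}
    for i in range(start, total_steps):
        acc += i  # stand-in for one unit of real work
        if i + 1 in marks:
            # Write to a temporary file first, then swap it in, so a crash
            # mid-write never leaves a half-written checkpoint behind.
            tmp = path + ".tmp"
            with open(tmp, "w") as f:
                json.dump({"next_step": i + 1, "acc": acc}, f)
            os.replace(tmp, path)
    return acc

with tempfile.TemporaryDirectory() as d:
    ckpt = os.path.join(d, "ckpt.json")
    result = run_with_checkpoints(100, ckpt)   # full run, leaves the 75% checkpoint
    resumed = run_with_checkpoints(100, ckpt)  # picks up at step 75, same answer
```

With three checkpoints, an interrupted task loses at most a quarter of its runtime, which matters more as tasks get longer.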
ID: 35737 · Rating: 0
zombie67 [MM]
Joined: 29 Aug 07
Posts: 115
Credit: 501,600,397
RAC: 5,019
Message 35751 - Posted: 17 Jan 2010, 17:04:33 UTC - in response to Message 35721.  


In my opinion, you should consider looking at whether or not there is a way to use Homogeneous Redundancy classes to separate GPU from CPU and give GPUs the longer task and leave the shorter tasks to CPUs.


HR isn't a factor here. With a quorum of 1, you don't use multiple replications for validation.

ID: 35751 · Rating: 0
Travis
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: 30 Aug 07
Posts: 2046
Credit: 26,480
RAC: 0
Message 35758 - Posted: 17 Jan 2010, 19:36:06 UTC - in response to Message 35751.  


In my opinion, you should consider looking at whether or not there is a way to use Homogeneous Redundancy classes to separate GPU from CPU and give GPUs the longer task and leave the shorter tasks to CPUs.


HR isn't a factor here. With a quorum of 1, you don't use multiple replications for validation.


We actually use a different method for validation than any of the typical BOINC redundancy strategies. My thesis goes into this in a little bit of detail, and I'm actually working on a paper right now about it.

We don't need to validate EVERY workunit, unlike other projects. Since we're doing evolutionary algorithms which are based on populations of solutions (newly generated work is based on different recombinations of a known population of best solutions), when we get a result back that could potentially improve the population, we validate it before we put it into the population. This keeps us from generating new work from potentially invalid results.
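The validate-before-insert idea can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's server code: the population is reduced to a list of fitness values (higher is better), and `consider_result` and `fake_validate` are hypothetical names, with `fake_validate` standing in for whatever check the project really performs.

```python
from typing import Callable, List

def consider_result(population: List[float], fitness: float,
                    validate: Callable[[float], bool],
                    max_size: int = 3) -> str:
    """Handle one returned result for a population of fitness values."""
    if len(population) >= max_size and fitness <= min(population):
        return "skipped"    # cannot improve the population, so it is never validated
    if not validate(fitness):
        return "rejected"   # looked promising but failed validation
    population.append(fitness)
    if len(population) > max_size:
        population.remove(min(population))  # drop the worst member
    return "accepted"

checked = []

def fake_validate(fitness: float) -> bool:
    """Assumed validator: pretend the 9.9 result came from a faulty host."""
    checked.append(fitness)
    return fitness != 9.9

pop = [1.0, 2.0, 3.0]
outcomes = [consider_result(pop, f, fake_validate) for f in (0.5, 9.9, 2.5)]
```

Only the two results that could have improved the population reach the validator; the 0.5 result is discarded without costing any validation work.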

What we've been testing lately is a more optimistic validation strategy. Since most of your results are correct, waiting for results to be validated before putting them in the population can slow our search progress down quite a bit. So I've been trying out a validation strategy which uses potentially good results immediately, and then reverts them to previously validated results if they turn out to be invalid. So far it's working out really well, so that's what the paper is about.
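A minimal sketch of the optimistic strategy, again under assumptions: the population is modelled as fixed slots of fitness values, each slot remembers its last validated value, and the class and method names are hypothetical. It ignores the case where a slot is overwritten a second time before its first validation arrives.

```python
class OptimisticPopulation:
    """Use promising results at once; revert a slot if validation later fails."""

    def __init__(self, fitnesses):
        self.validated = list(fitnesses)  # last known-good value per slot
        self.current = list(fitnesses)    # values new work is generated from

    def use_immediately(self, fitness):
        """Overwrite the worst slot if `fitness` beats it; return the slot index."""
        slot = min(range(len(self.current)), key=self.current.__getitem__)
        if fitness <= self.current[slot]:
            return None                   # no improvement, nothing to validate
        self.current[slot] = fitness
        return slot

    def on_validation(self, slot, is_valid):
        """Record the delayed validation outcome for a slot."""
        if is_valid:
            self.validated[slot] = self.current[slot]
        else:
            self.current[slot] = self.validated[slot]  # revert to known-good

pop = OptimisticPopulation([1.0, 2.0, 3.0])
bad_slot = pop.use_immediately(9.9)    # used right away, later found invalid
good_slot = pop.use_immediately(2.5)   # used right away, later confirmed
before = list(pop.current)
pop.on_validation(bad_slot, False)
pop.on_validation(good_slot, True)
```

The search keeps moving while validation is outstanding, and a bad result only influences new work until its slot is rolled back.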
ID: 35758 · Rating: 0
Brian Silvers

Joined: 21 Aug 08
Posts: 625
Credit: 558,425
RAC: 0
Message 35760 - Posted: 17 Jan 2010, 20:26:00 UTC - in response to Message 35751.  
Last modified: 17 Jan 2010, 20:28:57 UTC


In my opinion, you should consider looking at whether or not there is a way to use Homogeneous Redundancy classes to separate GPU from CPU and give GPUs the longer task and leave the shorter tasks to CPUs.


HR isn't a factor here. With a quorum of 1, you don't use multiple replications for validation.


Perhaps I'm getting ahead of the curve with trying to segregate tasks, regardless of quorum. Not sure if there's already a way to do that, but the whole point is that GPU users need to be placed in a different classification category than CPU users. You folks can exclusively have the 3-stream (longer-running) tasks, and leave CPU users with the 1-stream, 2-stream, or other shorter-running tasks.

Perhaps I am phrasing the BOINC equivalent wrong, and there is something there already, but if the planned "2 to 4 times increase" in runtime happens again, then that will undo the increase in deadline and will cause people with CPUs to start howling again...

I'm advocating making everyone happier, not just a few. Same as I've been doing all along... I think if something like what I'm suggesting is done, it will improve total project throughput and maybe, just maybe, allow you all to have a larger cache. Might not, but it is certainly worth a try if there is a way to do that already or if it is a minimal change.
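The split being suggested could look something like this in outline. It is only a sketch of the proposal: `Task`, `eligible` and the stream-count cutoff are assumptions for illustration, and nothing here corresponds to an actual BOINC scheduler feature.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Task:
    name: str
    streams: int  # more streams means a longer-running task

def eligible(task: Task, host_has_gpu: bool) -> bool:
    """GPU hosts get only the 3-stream tasks; CPU hosts get the shorter ones."""
    if host_has_gpu:
        return task.streams >= 3
    return task.streams < 3

tasks = [Task("a", 1), Task("b", 2), Task("c", 3)]
gpu_work = [t.name for t in tasks if eligible(t, host_has_gpu=True)]
cpu_work = [t.name for t in tasks if eligible(t, host_has_gpu=False)]
```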
ID: 35760 · Rating: 0
Crab

Joined: 6 Oct 09
Posts: 25
Credit: 4,849,998
RAC: 0
Message 35771 - Posted: 18 Jan 2010, 4:00:10 UTC
Last modified: 18 Jan 2010, 4:14:22 UTC

In short: yes, we need longer GPU tasks, checkpoints (for slower GPUs), and shorter tasks for CPUs.

And regarding WU validation: what if I overclock my GPU and it starts producing results with errors? Would you and I never know about these errors?
ID: 35771 · Rating: 0
Travis
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: 30 Aug 07
Posts: 2046
Credit: 26,480
RAC: 0
Message 35776 - Posted: 18 Jan 2010, 4:49:16 UTC - in response to Message 35771.  

And regarding WU validation: what if I overclock my GPU and it starts producing results with errors? Would you and I never know about these errors?


The majority of errors tend to give us results that look good, i.e., they're false positives; this is especially true of overclocked CPUs and GPUs. So a couple of results might get validated, but they don't harm our searches, and the majority get caught.
ID: 35776 · Rating: 0
Beyond
Joined: 15 Jul 08
Posts: 383
Credit: 729,293,740
RAC: 0
Message 35783 - Posted: 18 Jan 2010, 17:48:59 UTC - in response to Message 35718.  
Last modified: 18 Jan 2010, 17:49:25 UTC

I think that might really take some new science from the astronomers. We do have a change in the works that should increase the compute time by (hopefully) another factor of 2 to 4. Once we get the server-side GPU issues settled, we'll be releasing that.

That should be very helpful. Anything to get the queue time up for GPUs is most appreciated. Thanks!

The majority of errors tend to give us results that look good, i.e., they're false positives; this is especially true of overclocked CPUs and GPUs. So a couple of results might get validated, but they don't harm our searches, and the majority get caught.

Good to hear; I would like to know more about this when you get the chance.
ID: 35783 · Rating: 0


©2024 Astroinformatics Group