Welcome to MilkyWay@home

GPU app teaser



ProfileTravis
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Send message
Joined: 30 Aug 07
Posts: 2046
Credit: 26,480
RAC: 0
10 thousand credit badge10 year member badge
Message 11018 - Posted: 16 Feb 2009, 14:49:21 UTC - in response to Message 11017.  

Also, I can raise the workunit-per-cpu limit. What would be a good value?

3600*24/9 is up to ~10K WUs per day on HD4870.

Too bad BOINC is still far from ready for GPUs.
I would have suggested raising the WU limit only for hosts with GPUs, and sending such hosts WUs with fairly short deadlines or extra-large WUs...


I think one of the recent updates to the BOINC server code allows for a separate daily WU queue for GPUs. I'll look into it, and if that's the case we can give the GPUs a 10k daily limit without touching the other one.
ID: 11018 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
ProfileDaniel

Send message
Joined: 25 Nov 07
Posts: 25
Credit: 54,276,968
RAC: 2,064
50 million credit badge10 year member badge
Message 11019 - Posted: 16 Feb 2009, 14:54:51 UTC

Sounds like a good idea to me.
ID: 11019 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Honza

Send message
Joined: 28 Aug 07
Posts: 31
Credit: 86,152,236
RAC: 0
50 million credit badge10 year member badge
Message 11023 - Posted: 16 Feb 2009, 15:09:39 UTC - in response to Message 11018.  
Last modified: 16 Feb 2009, 15:10:41 UTC

I think one of the recent updates to the BOINC server code allows for a separate daily WU queue for GPUs.
Well, it may (I don't know). But I do know that even the latest BOINC client, 6.6.7, still doesn't recognize GPUs in general (meaning both nVidia and ATI/AMD GPUs), only CUDA-capable devices.

MW (and not only MW) would benefit a lot from support for ATI GPUs under BOINC, especially those capable of double precision...
BOINC Project specifications and hardware requirements
ID: 11023 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Cluster Physik

Send message
Joined: 26 Jul 08
Posts: 627
Credit: 94,940,203
RAC: 0
50 million credit badge10 year member badgeextraordinary contributions badge
Message 11024 - Posted: 16 Feb 2009, 15:09:56 UTC - in response to Message 11007.  
Last modified: 16 Feb 2009, 15:11:29 UTC

avg_ncpus set to 0.1

max_ncpus set to 3 (core2duo)

Seems to work fine; the Milkyway units are still completed in 8 or 10 seconds. That is the same time as when I used 0.50 of a core. Is that "normal"?

Too bad we have the 1000 workunit-per-cpu limit. Is it possible to send a request somewhere to have this limit removed when using an optimized app? The limit is obsolete now that you can calculate 3 or 4 times more units in a day with an optimized app. The only reason I'm moving a computer to Milkyway is to help with your excellent work on ATI graphics cards, but the credits are not very interesting ^^

Thank you for this app, anyway ^^

You should not set max_ncpus to any value other than exactly 1. It is the maximum number of cores a single WU can use. As the app is single-threaded, it can't use more than one core.

That the WUs are taking the same time no matter how many WUs are running concurrently is perfectly normal. There is probably a slight increase in efficiency (maybe 5%) if two WUs are running compared to a single one. The reason is that you can carry out the few calculations still necessary on the CPU in the time when another WU is waiting for the GPU.

But more than two WUs won't help any further (though it doesn't hurt either). You will have a throughput of about one WU per 9.x seconds either way on an HD4870. There is, however, a limit on the number of concurrent WUs. If you try to run more than 12-16 (~30) WUs on a 512MB (1GB) card, it starts to get slower and finally breaks, because there is not enough memory on the card. At the moment there is no mechanism to check for available RAM on the card. To avoid this situation, you shouldn't set avg_ncpus to very low values.
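The memory limit described above can be sketched as a quick sanity check. This is only a rough illustration: it assumes the BOINC client fills every core with this app, so that the number of concurrent WUs is about the core count divided by avg_ncpus, and it uses the per-card figures from this post (taking the lower end of "12-16" for 512MB). The names `concurrent_wus`, `fits_in_gpu_memory` and `SAFE_CONCURRENT_WUS` are made up for the example and are not part of the app or of BOINC.

```python
import math

# Rough per-card limits on concurrent WUs, taken from the figures in this post
# (lower end of "12-16" for 512 MB, "~30" for 1 GB). Illustrative values only.
SAFE_CONCURRENT_WUS = {512: 12, 1024: 30}


def concurrent_wus(n_cores: int, avg_ncpus: float) -> int:
    """Estimate how many WUs run at once if every core is given to this app."""
    # The small epsilon guards against float round-off just below a whole number.
    return math.floor(n_cores / avg_ncpus + 1e-9)


def fits_in_gpu_memory(n_cores: int, avg_ncpus: float, card_mb: int) -> bool:
    """True if the estimated concurrency stays within the assumed card limit."""
    return concurrent_wus(n_cores, avg_ncpus) <= SAFE_CONCURRENT_WUS[card_mb]


if __name__ == "__main__":
    # Dual core, 512 MB card: the default 0.50 versus a very low 0.10.
    for avg in (0.50, 0.25, 0.10):
        n = concurrent_wus(2, avg)
        ok = fits_in_gpu_memory(2, avg, 512)
        print(f"avg_ncpus={avg:.2f}: ~{n} concurrent WUs, within limit: {ok}")
```

Under these assumptions a dual core with avg_ncpus at 0.10 would try to run about 20 WUs at once, which is beyond what a 512MB card is said to handle.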

PS:
I guess the credit situation gets better if the limits are lifted by Travis ;)
ID: 11024 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Profilebanditwolf
Avatar

Send message
Joined: 12 Nov 07
Posts: 2425
Credit: 524,164
RAC: 0
500 thousand credit badge10 year member badge
Message 11031 - Posted: 16 Feb 2009, 16:52:04 UTC - in response to Message 11018.  


I think one of the recent updates to the BOINC server code allows for a separate daily WU queue for GPUs. I'll look into it, and if that's the case we can give the GPUs a 10k daily limit without touching the other one.


I think you need to make sure that there is plenty of work to do if many of these are run. Might be time for server #2.
Doesn't expecting the unexpected make the unexpected the expected?
If it makes sense, DON'T do it.
ID: 11031 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
ProfileExar Kun [HoloNet]
Avatar

Send message
Joined: 12 Nov 08
Posts: 26
Credit: 1,519,179
RAC: 2
1 million credit badge10 year member badge
Message 11038 - Posted: 16 Feb 2009, 19:28:27 UTC

max_ncpus back to "1".

avg_ncpus set to 0.25. If I keep the default value of 0.50, I have only three WUs running at the same time: two optimized MW and one World Community Grid. With avg_ncpus at 0.25 I have four workunits (two for each project).

Travis > thank you very much for your answer about the credits, that's very good news. As for the maximum number of units per core, I can say this: with a Core2Duo and an HD4850 512MB, I crunched 2,000 workunits in 6 hours (with no other project running at the same time, so the CPU wasn't fully used). So 4,000 workunits per core per day will be enough for this configuration (my computer works "only" 12 hours per day or so, so 4,000 WUs per core per day is the very, very maximum). I don't know how many workunits can be crunched with other ATI models.

Cluster > If I understand correctly, your optimization uses "only" one core at a time, is that right? Is it possible to use more cores, so we can run only Milkyway on a computer with more than one core?

Do you need anything in particular to help with your tests?

PS: while writing this I reached my 2,000 workunit limit, later than before, probably because I tried avg_ncpus at 0.10 ...
Star Wars BOINC Team



ID: 11038 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
bobgoblin

Send message
Joined: 8 Dec 07
Posts: 60
Credit: 67,028,931
RAC: 0
50 million credit badge10 year member badge
Message 11041 - Posted: 16 Feb 2009, 20:16:42 UTC - in response to Message 11038.  

max_ncpus back to "1".

avg_ncpus set to 0.25. If I keep the default value of 0.50, I have only three WUs running at the same time: two optimized MW and one World Community Grid. With avg_ncpus at 0.25 I have four workunits (two for each project).

Travis > thank you very much for your answer about the credits, that's very good news. As for the maximum number of units per core, I can say this: with a Core2Duo and an HD4850 512MB, I crunched 2,000 workunits in 6 hours (with no other project running at the same time, so the CPU wasn't fully used). So 4,000 workunits per core per day will be enough for this configuration (my computer works "only" 12 hours per day or so, so 4,000 WUs per core per day is the very, very maximum). I don't know how many workunits can be crunched with other ATI models.

Cluster > If I understand correctly, your optimization uses "only" one core at a time, is that right? Is it possible to use more cores, so we can run only Milkyway on a computer with more than one core?

Do you need anything in particular to help with your tests?

PS: while writing this I reached my 2,000 workunit limit, later than before, probably because I tried avg_ncpus at 0.10 ...



With the GPU, I can crunch 16 at a time with the i7, so a 10k limit per core, or 80,000 in my case, would be more realistic than 4,000.
ID: 11041 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Cluster Physik

Send message
Joined: 26 Jul 08
Posts: 627
Credit: 94,940,203
RAC: 0
50 million credit badge10 year member badgeextraordinary contributions badge
Message 11055 - Posted: 16 Feb 2009, 22:26:08 UTC - in response to Message 11038.  

Cluster > If I understand correctly, your optimization uses "only" one core at a time, is that right? Is it possible to use more cores, so we can run only Milkyway on a computer with more than one core?

The goal is actually not to use a full core (or even more), but maybe only 10% of a core or so. This way your CPU would be free to crunch something else.
If it is really wanted, I could put in support for simultaneous crunching of MW on GPU and CPU. But this would have a low priority on my list.
ID: 11055 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Cluster Physik

Send message
Joined: 26 Jul 08
Posts: 627
Credit: 94,940,203
RAC: 0
50 million credit badge10 year member badgeextraordinary contributions badge
Message 11057 - Posted: 16 Feb 2009, 22:33:20 UTC - in response to Message 11041.  

With the GPU, I can crunch 16 at a time with the i7, so a 10k limit per core, or 80,000 in my case, would be more realistic than 4,000.

But the throughput will still be one WU every 9 seconds or so with an HD4870. It does not get faster with more concurrent WUs. So with an HD4870 a limit of 10,000 WUs a day would be enough as long as there is no multi-GPU support implemented (or massive overclocking involved).

I would say 10,000 WUs per host per day are needed now. When multiple cards are working and/or newer GPUs are available, this needs to be raised again.
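The arithmetic behind that figure, as a minimal sketch. It assumes a fixed time per WU and that throughput simply scales with the number of cards once multi-GPU support exists; `required_daily_limit` is a hypothetical helper name for the example, not a BOINC setting.

```python
import math

SECONDS_PER_DAY = 24 * 3600


def required_daily_limit(seconds_per_wu: float, n_gpus: int = 1) -> int:
    """Daily WU quota a host needs to keep its GPUs busy for a full 24 hours."""
    # One card finishes one WU every `seconds_per_wu` seconds, no matter how
    # many WUs run concurrently; extra cards are assumed to scale linearly.
    return math.ceil(SECONDS_PER_DAY / seconds_per_wu) * n_gpus


if __name__ == "__main__":
    print(required_daily_limit(9))     # single HD4870 at ~9 s per WU
    print(required_daily_limit(9, 2))  # two such cards, if multi-GPU arrives
```

At 9 seconds per WU a single card needs 9,600 WUs a day, which is why a limit of about 10,000 per host is a reasonable round number for now.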
ID: 11057 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
ProfileGalaxyIce
Avatar

Send message
Joined: 6 Apr 08
Posts: 2018
Credit: 100,142,856
RAC: 0
100 million credit badge10 year member badge
Message 11058 - Posted: 16 Feb 2009, 22:38:37 UTC - in response to Message 11055.  

Cluster > If I understand correctly, your optimization uses "only" one core at a time, is that right? Is it possible to use more cores, so we can run only Milkyway on a computer with more than one core?

The goal is actually not to use a full core (or even more), but maybe only 10% of a core or so. This way your CPU would be free to crunch something else.
If it is really wanted, I could put in support for simultaneous crunching of MW on GPU and CPU. But this would have a low priority on my list.

What I would like to see is the ability to use my Nvidia in MW. I know you are using an ATI since it is faster, but I only have the Nvidia which I'd be interested in transferring from crunching on GPUGRID to MW.


ID: 11058 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Cluster Physik

Send message
Joined: 26 Jul 08
Posts: 627
Credit: 94,940,203
RAC: 0
50 million credit badge10 year member badgeextraordinary contributions badge
Message 11060 - Posted: 16 Feb 2009, 22:42:37 UTC - in response to Message 11058.  
Last modified: 16 Feb 2009, 22:43:38 UTC

Cluster > If I understand correctly, your optimization uses "only" one core at a time, is that right? Is it possible to use more cores, so we can run only Milkyway on a computer with more than one core?

The goal is actually not to use a full core (or even more), but maybe only 10% of a core or so. This way your CPU would be free to crunch something else.
If it is really wanted, I could put in support for simultaneous crunching of MW on GPU and CPU. But this would have a low priority on my list.

What I would like to see is the ability to use my Nvidia in MW. I know you are using an ATI since it is faster, but I only have the Nvidia which I'd be interested in transferring from crunching on GPUGRID to MW.

AFAIK, there is already a student starting to work on a CUDA app. As CUDA is easier to work with, I guess we could see some results soon ;)
But don't expect times much below 25s per WU for Nvidia's GTX line. And older cards won't work at all (lack of double precision units).
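As a rough illustration of the double precision requirement: on Nvidia hardware, double precision arrived with CUDA compute capability 1.3 (the GTX 200 series), so a check could look like the sketch below. The function name is invented for the example, and how the future CUDA app will actually detect this is not known here.

```python
def supports_double_precision(major: int, minor: int) -> bool:
    """True if a CUDA device of this compute capability has double precision units."""
    # Assumption: compute capability 1.3 is the first to support doubles.
    return (major, minor) >= (1, 3)


if __name__ == "__main__":
    # Compute capabilities of a few cards, for illustration.
    cards = {"8800 GTX": (1, 0), "9800 GTX": (1, 1), "GTX 280": (1, 3)}
    for name, (major, minor) in cards.items():
        print(name, supports_double_precision(major, minor))
```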
ID: 11060 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
ProfileGalaxyIce
Avatar

Send message
Joined: 6 Apr 08
Posts: 2018
Credit: 100,142,856
RAC: 0
100 million credit badge10 year member badge
Message 11061 - Posted: 16 Feb 2009, 22:51:43 UTC - in response to Message 11060.  

What I would like to see is the ability to use my Nvidia in MW. I know you are using an ATI since it is faster, but I only have the Nvidia which I'd be interested in transferring from crunching on GPUGRID to MW.

AFAIK, there is already a student starting to work on a CUDA app. As CUDA is easier to work with, I guess we could see some results soon ;)
But don't expect times much below 25s per WU for Nvidia's GTX line. And older cards won't work at all (lack of double precision units).

25 secs? Blimey, hurry up student :)


ID: 11061 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Cluster Physik

Send message
Joined: 26 Jul 08
Posts: 627
Credit: 94,940,203
RAC: 0
50 million credit badge10 year member badgeextraordinary contributions badge
Message 11064 - Posted: 16 Feb 2009, 23:03:04 UTC - in response to Message 11061.  

25 secs? Blimey, hurry up student :)

But that is soooo slooooow compared to the less than 10 seconds on ATI's HD4870 ;)
ID: 11064 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Temujin

Send message
Joined: 12 Oct 07
Posts: 77
Credit: 404,471,187
RAC: 0
300 million credit badge10 year member badge
Message 11067 - Posted: 16 Feb 2009, 23:16:07 UTC - in response to Message 11064.  

25 secs? Blimey, hurry up student :)

But that is soooo slooooow compared to the less than 10 seconds on ATI's HD4870 ;)

25 sec is good enough for me, hurry up student :-)
ID: 11067 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
ProfileCori
Avatar

Send message
Joined: 27 Aug 07
Posts: 647
Credit: 27,592,547
RAC: 0
20 million credit badge10 year member badge
Message 11068 - Posted: 16 Feb 2009, 23:17:58 UTC - in response to Message 11067.  

25 secs? Blimey, hurry up student :)

But that is soooo slooooow compared to the less than 10 seconds on ATI's HD4870 ;)

25 sec is good enough for me, hurry up student :-)

Rats, I need a new graphics card finally!
Lovely greetings, Cori
ID: 11068 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
bobgoblin

Send message
Joined: 8 Dec 07
Posts: 60
Credit: 67,028,931
RAC: 0
50 million credit badge10 year member badge
Message 11081 - Posted: 17 Feb 2009, 0:38:40 UTC - in response to Message 11057.  

With the GPU, I can crunch 16 at a time with the i7, so a 10k limit per core, or 80,000 in my case, would be more realistic than 4,000.

But the throughput will still be one WU every 9 seconds or so with an HD4870. It does not get faster with more concurrent WUs. So with an HD4870 a limit of 10,000 WUs a day would be enough as long as there is no multi-GPU support implemented (or massive overclocking involved).

I would say 10,000 WUs per host per day are needed now. When multiple cards are working and/or newer GPUs are available, this needs to be raised again.



Oh, I agree with that too. The turnaround time for the GPU app was about 2 1/2 minutes. I've been running the optimized app this week and it's crunching 8 WUs in 6 minutes since the .19s came out, so that limit needs to go much higher as well.
ID: 11081 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Cluster Physik

Send message
Joined: 26 Jul 08
Posts: 627
Credit: 94,940,203
RAC: 0
50 million credit badge10 year member badgeextraordinary contributions badge
Message 11087 - Posted: 17 Feb 2009, 0:54:19 UTC - in response to Message 11081.  

With the GPU, I can crunch 16 at a time with the i7, so a 10k limit per core, or 80,000 in my case, would be more realistic than 4,000.

But the throughput will still be one WU every 9 seconds or so with an HD4870. It does not get faster with more concurrent WUs. So with an HD4870 a limit of 10,000 WUs a day would be enough as long as there is no multi-GPU support implemented (or massive overclocking involved).

I would say 10,000 WUs per host per day are needed now. When multiple cards are working and/or newer GPUs are available, this needs to be raised again.



Oh, I agree with that too. The turnaround time for the GPU app was about 2 1/2 minutes. I've been running the optimized app this week and it's crunching 8 WUs in 6 minutes since the .19s came out, so that limit needs to go much higher as well.

It should be enough for your i7, as the current limit is 1,000 WUs per day per core/thread. That means on your i7 you actually have 8,000 WUs a day to play with. You won't come close to that limit with the CPU alone, but on the GPU it will last for only 21 hours a day ;)
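A small sketch of how long that quota lasts on the GPU, assuming the 1,000-per-thread limit and a constant time per WU (`hours_until_limit` is a hypothetical helper name for the example):

```python
def hours_until_limit(threads: int, limit_per_thread: int, seconds_per_wu: float) -> float:
    """Hours of GPU crunching before the host's daily WU quota is used up."""
    daily_quota = threads * limit_per_thread
    return daily_quota * seconds_per_wu / 3600


if __name__ == "__main__":
    # i7 with 8 threads, 1,000 WUs per thread per day, HD4870 timings.
    for secs in (9.0, 9.45):
        hours = hours_until_limit(8, 1000, secs)
        print(f"{secs} s per WU: quota lasts about {hours:.1f} hours")
```

With exactly 9 seconds per WU the 8,000 WUs are gone after 20 hours; the 21 hours mentioned above corresponds to roughly 9.45 seconds per WU.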
ID: 11087 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
ProfileTravis
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Send message
Joined: 30 Aug 07
Posts: 2046
Credit: 26,480
RAC: 0
10 thousand credit badge10 year member badge
Message 11101 - Posted: 17 Feb 2009, 2:04:30 UTC - in response to Message 11060.  

Cluster > If I understand correctly, your optimization uses "only" one core at a time, is that right? Is it possible to use more cores, so we can run only Milkyway on a computer with more than one core?

The goal is actually not to use a full core (or even more), but maybe only 10% of a core or so. This way your CPU would be free to crunch something else.
If it is really wanted, I could put in support for simultaneous crunching of MW on GPU and CPU. But this would have a low priority on my list.

What I would like to see is the ability to use my Nvidia in MW. I know you are using an ATI since it is faster, but I only have the Nvidia which I'd be interested in transferring from crunching on GPUGRID to MW.

AFAIK, there is already a student starting to work on a CUDA app. As CUDA is easier to work with, I guess we could see some results soon ;)
But don't expect times much below 25s per WU for Nvidia's GTX line. And older cards won't work at all (lack of double precision units).


Yeah hopefully within the next week or two we'll have an alpha CUDA application for you guys to crash :D

ID: 11101 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
ProfilePaul D. Buck

Send message
Joined: 12 Apr 08
Posts: 621
Credit: 161,934,067
RAC: 0
100 million credit badge10 year member badge
Message 11116 - Posted: 17 Feb 2009, 6:54:30 UTC - in response to Message 11101.  
Last modified: 17 Feb 2009, 6:55:37 UTC

Yeah hopefully within the next week or two we'll have an alpha CUDA application for you guys to crash :D


Well, I have one GTX 280 and 2 GTX 295s ... start your engines ...

Of course we will need a setting on the site to get only CPU work, only CUDA work ... or both ...
ID: 11116 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Profile[AF>HFR>RR] ThierryH

Send message
Joined: 2 Jan 08
Posts: 23
Credit: 495,882,464
RAC: 0
300 million credit badge10 year member badge
Message 11132 - Posted: 17 Feb 2009, 10:02:06 UTC - in response to Message 11116.  

Yeah hopefully within the next week or two we'll have an alpha CUDA application for you guys to crash :D


Well, I have one GTX 280 and 2 GTX 295s ... start your engines ...

Of course we will need a setting on the site to get only CPU work, only CUDA work ... or both ...


It is indeed important to have both options.


ID: 11132 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote

©2019 Astroinformatics Group