Welcome to MilkyWay@home

Relative GPU Crunching Speeds

The Gas Giant
Joined: 24 Dec 07
Posts: 1947
Credit: 240,884,648
RAC: 0
Message 12983 - Posted: 26 Feb 2009, 20:42:03 UTC

Since the MW tasks on the GPU take just seconds, what is the relative worth of the various suitable ATI cards? For example, the cards that can be used are the HD3850, HD3870, HD4850 and HD4870. So do we know how much better each card in the progression is than the previous one?

Live long and BOINC.

Paul D. Buck
Joined: 12 Apr 08
Posts: 621
Credit: 161,934,067
RAC: 0
Message 12984 - Posted: 26 Feb 2009, 21:04:20 UTC - in response to Message 12983.  
Last modified: 26 Feb 2009, 21:08:00 UTC

Since the MW tasks on the GPU take just seconds, what is the relative worth of the various suitable ATI cards? For example, the cards that can be used are the HD3850, HD3870, HD4850 and HD4870. So do we know how much better each card in the progression is than the previous one?

Live long and BOINC.


All I could do for you is to quote my times. I suppose if we could get enough people to quote the times we could get a feel for it. The only problem is that the times form a spread ...

s20: 6.02 - 6.3
s21: 6.05; 11.61 - 11.83
s22: 6.84 - 9.14; 12.59 - 13.06
s23: 6.34 - 6.70; 12.48 - 12.58
s79: 5.73 - 6.11
s82: 6.02 - 6.13
s86: 9.13 - 9.36

I don't know why there are two groupings of numbers for some of the streams. As I understood the idea, I would have expected clustering about one set of numbers. Either 6 or 12 seconds is fast enough ... the major hang-up I still have is that I have occasional outages ... and at times, after I get loaded with work, it does not start running tasks without intervention. Again, this is still alpha so I am not too bent about it ...

I did try 6.11, and gave up after half an hour. Much worse for me on fetching work than 6.5.0 ...

Oh, HD 4870, 512 M, standard clocks (I did nothing to change them):

770MHz GPU
900MHz memory
50% fan 57-62C

Cluster Physik
Joined: 26 Jul 08
Posts: 627
Credit: 94,940,203
RAC: 0
Message 13001 - Posted: 27 Feb 2009, 0:00:54 UTC - in response to Message 12983.  
Last modified: 27 Feb 2009, 0:04:53 UTC

Since the MW tasks on the GPU take just seconds, what is the relative worth of the various suitable ATI cards? For example, the cards that can be used are the HD3850, HD3870, HD4850 and HD4870. So do we know how much better each card in the progression is than the previous one?

When neglecting the 1-2 seconds CPU calculation for every WU (which is justified if you have two or more concurrent WUs running), one can break it down to the following performance relations (taking the HD3850 as base):
3850 : 3870 : 4830 : 4850 : 4870
  1  : 1.16 : 1.72 : 2.33 : 2.80

Of course, the ratios only apply at stock clocks (670MHz for the 3850, 775MHz for the 3870, 575MHz for the 4830, 625MHz for the 4850 and 750MHz for the 4870), with memory speed only a very minor factor.
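
As a rough cross-check, the ratios above appear consistent with a simple "shader count times core clock" model. That model is an inference from the numbers, not something stated in the post. The clocks below are the stock clocks quoted above; the shader counts are assumed from public spec sheets (only the HD3850's 320 appears elsewhere in this thread), and `relative_speed` is a hypothetical helper name.

```python
# Sketch: reproduce the quoted performance ratios from shaders x core clock.
# Clocks (MHz) are the stock values quoted in the post. Shader counts are an
# ASSUMPTION taken from public spec sheets, not from the post itself.
CARDS = {
    "3850": (320, 670),
    "3870": (320, 775),
    "4830": (640, 575),
    "4850": (800, 625),
    "4870": (800, 750),
}


def relative_speed(card, base="3850", cards=CARDS):
    """Return the speed of `card` relative to `base` under the assumed model."""
    shaders, clock = cards[card]
    base_shaders, base_clock = cards[base]
    return (shaders * clock) / (base_shaders * base_clock)


ratios = {name: round(relative_speed(name), 2) for name in CARDS}
for name, ratio in ratios.items():
    print(f"HD{name}: {ratio:.2f}")
```

Under these assumptions the rounded values match the 1 : 1.16 : 1.72 : 2.33 : 2.80 progression, which fits the remark that memory speed is only a very minor factor.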

Cluster Physik
Joined: 26 Jul 08
Posts: 627
Credit: 94,940,203
RAC: 0
Message 13011 - Posted: 27 Feb 2009, 1:22:10 UTC - in response to Message 13001.  
Last modified: 27 Feb 2009, 1:34:56 UTC

When neglecting the 1-2 seconds CPU calculation for every WU (which is justified if you have two or more concurrent WUs running), one can break it down to the following performance relations (taking the HD3850 as base):
3850 : 3870 : 4830 : 4850 : 4870
  1  : 1.16 : 1.72 : 2.33 : 2.80

Couldn't edit anymore. With a small addition it looks like this:
3850 : 3870 : 4830 : 4850 : 4870 : 4890 (850 MHz) : 4890 (850 MHz, 960 shaders?)
  1  : 1.16 : 1.72 : 2.33 : 2.80 :     3.17      :     3.81

Kevint
Joined: 22 Nov 07
Posts: 285
Credit: 1,076,786,368
RAC: 0
Message 13020 - Posted: 27 Feb 2009, 2:35:38 UTC



Have you found that a different CPU or bus type has an impact on speed?

I installed a 4870 today on an older Pentium D 3.0 - it has an older IDE drive on to a SATA -

And my times are almost double what Paul has posted.


.

caferace
Joined: 4 Aug 08
Posts: 46
Credit: 8,255,900
RAC: 0
Message 13022 - Posted: 27 Feb 2009, 3:03:33 UTC

I'm crunching with my Q6600 and a 3850 and it does 8 tasks at once, almost like it's hyperthreading. BOINC 6.4.6. Is that normal?

I can get some times tomorrow, but I imagine the variance between GPU clock and Wall clock has a bearing as well...

-jim

Cluster Physik
Joined: 26 Jul 08
Posts: 627
Credit: 94,940,203
RAC: 0
Message 13024 - Posted: 27 Feb 2009, 4:29:59 UTC - in response to Message 13020.  



Have you found that a different CPU or bus type has an impact on speed?

I installed a 4870 today on an older Pentium D 3.0 - it has an older IDE drive on to a SATA -

And my times are almost double what Paul has posted.


It is a bit hard to measure the times the WUs really need. Best would be to take a stop watch and measure the time it needs for 24 WUs of one type by hand.

The times you see in the task list are actually CPU times. But the CPU time has no immediate meaning for WUs crunched on the GPU. With a new test version I have already seen CPU times as low as 1.0 seconds for an 8-credit WU. If you lower the CPU load of the GPU app, the measured CPU times will also be lower.

Sometimes the wall clock time in the task details will give you some indication of the real times. If for instance 4 WUs run at all times, the wall clock time will be roughly 4 times the time a single WU actually needed to be crunched on the GPU. But this also only works if you have 4 identical WUs. If they are of different types this method will also give somewhat skewed results.
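
The wall clock approach described above can be sketched as a one-line estimate. This is only a minimal illustration under the stated condition that all concurrent WUs are identical; `estimate_gpu_seconds_per_wu` is a hypothetical helper, and the example numbers are made up rather than measured.

```python
def estimate_gpu_seconds_per_wu(wall_clock_s: float, concurrent_wus: int) -> float:
    """Rough per-WU GPU time when `concurrent_wus` identical WUs share one GPU.

    Each WU's reported wall clock time covers the period in which all
    concurrent WUs took turns on the GPU, so dividing by the number of
    concurrent WUs approximates the time one WU actually needed.
    """
    if concurrent_wus < 1:
        raise ValueError("concurrent_wus must be at least 1")
    return wall_clock_s / concurrent_wus


# Hypothetical reading: 4 identical WUs, each reporting 48 s wall clock time.
print(estimate_gpu_seconds_per_wu(48.0, 4))  # 12.0
```

With WUs of mixed types the division no longer isolates a single WU's share, so the result would be skewed as noted above.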

I have to think about how to measure some kind of GPU time. But this will be hard, as the GPU executes the work asynchronously. If you want to measure it correctly, you have to poll the GPU very fast (raising the CPU load). Maybe I will find a way to do this without an excessive CPU load and add it later to enable easier comparisons.

Cluster Physik
Joined: 26 Jul 08
Posts: 627
Credit: 94,940,203
RAC: 0
Message 13025 - Posted: 27 Feb 2009, 4:34:27 UTC - in response to Message 13022.  

I'm crunching with my Q6600 and a 3850 and it does 8 tasks at once, almost like it's hyperthreading. BOINC 6.4.6. Is that normal?

I can get some times tomorrow, but I imagine the variance between GPU clock and Wall clock has a bearing as well...

-jim

It's more like multithreading on a single core ;)
And it is perfectly normal. You may activate another project to crunch fewer WUs at once but individually faster (will give you the same throughput). And that would put your CPU cores to some use, too.

Paul D. Buck
Joined: 12 Apr 08
Posts: 621
Credit: 161,934,067
RAC: 0
Message 13030 - Posted: 27 Feb 2009, 6:38:23 UTC

I will make one other observation, which held true on GPU Grid and seems to be sort of true on this GPU application ... the CPU load seems to be handled better on HT than on a straight CPU core. I suppose that it is the interleave effect where the "idle" load is not competing so much for the resources and so the other tasks run well and the GPU attendance is less "costly" ...

I conclusively proved it on GPU Grid, but I cannot here because I only have the one card running on an HT machine ... I would run out and buy another card but it is income tax time and I just spent a wad on Nvidia cards and have run out of spare PCI-e slots ... alas ...

The Gas Giant
Joined: 24 Dec 07
Posts: 1947
Credit: 240,884,648
RAC: 0
Message 13034 - Posted: 27 Feb 2009, 8:12:30 UTC

Any issues with mixing an ATI card and an Nvidia card in the same machine?

John Clark
Joined: 4 Oct 08
Posts: 1734
Credit: 64,228,409
RAC: 0
Message 13043 - Posted: 27 Feb 2009, 10:28:23 UTC
Last modified: 27 Feb 2009, 10:29:40 UTC

Thanks Paul. A question I was thinking about.

Assuming the old card was left in the box (mixed cards), I assume this would drive the screen and other graphics as it has always done. But would there be any IRQ conflicts?

MJD1964
Joined: 24 Nov 07
Posts: 9
Credit: 102,125,541
RAC: 0
Message 13066 - Posted: 27 Feb 2009, 16:46:11 UTC

Do the 4830's work with the GPU app ?????

GalaxyIce
Joined: 6 Apr 08
Posts: 2018
Credit: 100,142,856
RAC: 0
Message 13078 - Posted: 27 Feb 2009, 17:44:36 UTC - in response to Message 13066.  

Do the 4830's work with the GPU app ?????

Nope. Only the HD3850, HD3870, HD4850, HD4870 and the HD4890 when it's due to come out in April.



Cluster Physik
Joined: 26 Jul 08
Posts: 627
Credit: 94,940,203
RAC: 0
Message 13082 - Posted: 27 Feb 2009, 18:11:23 UTC - in response to Message 13066.  

Do the 4830's work with the GPU app ?????

Yes they do. It is tested and they run.

Paul D. Buck
Joined: 12 Apr 08
Posts: 621
Credit: 161,934,067
RAC: 0
Message 13084 - Posted: 27 Feb 2009, 18:19:28 UTC - in response to Message 13043.  

Thanks Paul. A question I was thinking about.

Assuming the old card was left in the box (mixed cards), I assume this would drive the screen and other graphics as it has always done. But would there be any IRQ conflicts?


There should not be IRQ conflicts; however, I have never tried an Nvidia / ATI mix ... THEORY says it would work ... but some have suggested that it will not with XP, but will with Vista.

No experience.

HOWEVER ... I have had various versions of cards from Nvidia in the same system at the same time ... either a GTX280 and 9800 GT or GT 280 and GTX 295 ... and it worked. It gave me two or three cores working on GPU Grid. I am currently running two GTX 295 cards in my one dual PCI-e computer. All other machines have only one PCI-e slot, and in those I have a GTX 280, a 9800 GT and an HD4870 ...

I have an Nvidia card in the Mac Pro not that it does me any good (yet) ... and when I build the next system I will be looking to get it with at least 2 PCI-e slots or better 3 ... From there I will possibly look to replace just the MB to add PCI-e slots so I can add GPUs...

You can get, for Nvidia cards at least, a good feel for what is going on over at GPU Grid ... even if you don't yet have a card you can make an account and lurk the forums ...

Long term I am likely to be having a mix of systems with one of them with ATI cards and the other with Nvidia for the next couple years.

Napsterbater
Joined: 26 Feb 09
Posts: 4
Credit: 5,592,569
RAC: 0
Message 13086 - Posted: 27 Feb 2009, 18:20:10 UTC - in response to Message 13025.  
Last modified: 27 Feb 2009, 18:32:43 UTC

It's more like multithreading on a single core ;)
And it is perfectly normal. You may activate another project to crunch fewer WUs at once but individually faster (will give you the same throughput). And that would put your CPU cores to some use, too.


OK, I have a tri-core AMD Phenom II X3 720 with an ATI 4870. I was testing some settings and set "avg_ncpus" to 0.1, then restarted BOINC. What I saw was 3 MW WUs running and 3 SETI WUs running (2 using 100% of 2 cores, 1 using maybe 66%). Now my question is: do these WUs need 100% of a core to do the same work, or can they use about 33% (all 3 combined) and get the same work done?


What I'm trying to ask is: in theory, would allowing another project to run like that produce the same amount of credits/hr for MW as not running the other project?

Just trying to get the most out of my computer.

EDIT: To add, if I leave "avg_ncpus" set to 0.5, BOINC only runs 2 SETI WUs (2 x 100%) and 2 MW WUs at 50% each (100% of one core); with 0.1 I get 3 SETI WUs (2 at 100%, 1 at 66%) and 3 MW WUs using a total of 33% of one core.

GalaxyIce
Joined: 6 Apr 08
Posts: 2018
Credit: 100,142,856
RAC: 0
Message 13093 - Posted: 27 Feb 2009, 19:11:50 UTC - in response to Message 13082.  

Do the 4830's work with the GPU app ?????

Yes they do. It is tested and they run.

Hmmmm, I better update zslip then...

You're just getting too good Cluster Physik ;)



caferace
Joined: 4 Aug 08
Posts: 46
Credit: 8,255,900
RAC: 0
Message 13108 - Posted: 27 Feb 2009, 22:02:58 UTC

Q6600 & HD3850 512Mb using BOINC 6.4.6 Result (lone WU)

Running Milkyway@home ATI GPU application version 0.19 by Gipsel
CPU: Intel(R) Core(TM)2 Quad CPU Q6600 @ 2.40GHz (4 cores/threads) 2.4001 GHz (315ms)
CAL Runtime: 1.3.145
Found 1 CAL devices
Device 0: ATI Radeon HD 3800 (RV670) 512 MB local RAM (28 MB cached + 512 MB uncached remote)
GPU core clock: 669 MHz, memory clock: 829 MHz
320 shader units organized in 4 SIMDs with 16 5-issue VLIW units each supporting double precision

Calculated about 1.2268e+012 floatingpoint ops on GPU, 6.18221e+007 on FPU.
Calculated about 9.08961e+008 floatingpoint ops on FPU (stars).
WU completed. It took 28.4531 seconds CPU time and 29.362 seconds wall clock time @ 2.40011 GHz.

----

Q6600 & HD3850 512Mb using BOINC 6.4.6 Result (8 WU's active)

Running Milkyway@home ATI GPU application version 0.19 by Gipsel
CPU: Intel(R) Core(TM)2 Quad CPU Q6600 @ 2.40GHz (4 cores/threads) 2.4001 GHz (349ms)
CAL Runtime: 1.3.145
Found 1 CAL devices
Device 0: ATI Radeon HD 3800 (RV670) 512 MB local RAM (28 MB cached + 512 MB uncached remote)
GPU core clock: 669 MHz, memory clock: 829 MHz
320 shader units organized in 4 SIMDs with 16 5-issue VLIW units each supporting double precision

Calculated about 1.2268e+012 floatingpoint ops on GPU, 6.18221e+007 on FPU.
Calculated about 9.08961e+008 floatingpoint ops on FPU (stars).
WU completed. It took 29.0313 seconds CPU time and 226.155 seconds wall clock time @ 2.40011 GHz.

NOTE: This unit is a cruncher only, with no other applications running. Stock clocks on both CPU and 3850.

Hope this is of help...

-jim

Cluster Physik
Joined: 26 Jul 08
Posts: 627
Credit: 94,940,203
RAC: 0
Message 13118 - Posted: 27 Feb 2009, 22:32:26 UTC - in response to Message 13108.  

Q6600 & HD3850 512Mb using BOINC 6.4.6 Result (lone WU)

WU completed. It took 28.4531 seconds CPU time and 29.362 seconds wall clock time @ 2.40011 GHz.

----

Q6600 & HD3850 512Mb using BOINC 6.4.6 Result (8 WU's active)

WU completed. It took 29.0313 seconds CPU time and 226.155 seconds wall clock time @ 2.40011 GHz.


NOTE: This unit is a cruncher only, with no other applications running. Stock clocks on both CPU and 3850.

Hope this is of help...

-jim

It's great you have done this!
Let's see. A single WU takes 29.362s wall clock time. That means you would have a throughput of 122.6 WUs (of this kind) per hour.

With 8 concurrently running WUs, 8 of them together take 226.155s, or 28.27s per WU. That is the roughly one second per WU (slightly more with a slow CPU) that can be shaved off by overlapping the calculations of two or more concurrent WUs, which I have mentioned several times already. It is not that much for the HD3850 card (throughput is now 127.35 WUs per hour, less than a 4% increase), but for an HD4850 or 4870 it would already be about 10%.
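
For anyone who wants to redo this arithmetic with their own results, here is a small sketch. The two wall clock times are the ones from the quoted task output; the helper name `throughput_per_hour` and the variable names are only illustrative.

```python
def throughput_per_hour(seconds_per_wu: float) -> float:
    """WUs completed per hour if each one effectively takes `seconds_per_wu`."""
    return 3600.0 / seconds_per_wu


lone_wall_s = 29.362    # wall clock time of a WU running alone
batch_wall_s = 226.155  # wall clock time of a WU with 8 WUs active
concurrent = 8

# With 8 identical WUs sharing the GPU, each effectively takes 1/8 of the
# reported wall clock time.
per_wu_overlapped_s = batch_wall_s / concurrent

lone_rate = throughput_per_hour(lone_wall_s)
overlapped_rate = throughput_per_hour(per_wu_overlapped_s)
gain_percent = (overlapped_rate / lone_rate - 1.0) * 100.0

print(f"{per_wu_overlapped_s:.2f} s per WU when overlapped")
print(f"{lone_rate:.1f} -> {overlapped_rate:.2f} WUs/hour (+{gain_percent:.1f}%)")
```

The same calculation should carry over to other cards as long as the concurrent WUs are of the same type.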

Avatar1966
Joined: 21 Jul 08
Posts: 3
Credit: 33,031,789
RAC: 0
Message 13159 - Posted: 27 Feb 2009, 23:20:05 UTC

Hi,

I know it's old technology but what about this card?

SAPPHIRE 100228L Radeon HD 3850 512MB 256-bit GDDR3 AGP 4X/8X

How much speed loss because of the AGP Bus?

Thanks