Relative GPU Crunching Speeds

The Gas Giant · Joined: 24 Dec 07 · Posts: 1947 · Credit: 240,884,648 · RAC: 0
Message 12983 - Posted: 26 Feb 2009, 20:42:03 UTC

Since the MW tasks on the GPU take just seconds, what is the relative worth of the various suitable ATI cards? For example, the cards that can be used are the HD3850, HD3870, HD4850 and HD4870. So do we know how much better each card in the progression is than the one before?

Live long and BOINC.

Paul D. Buck · Joined: 12 Apr 08 · Posts: 621 · Credit: 161,934,067 · RAC: 0
Message 12984 - Posted: 26 Feb 2009, 21:04:20 UTC - in response to Message 12983. Last modified: 26 Feb 2009, 21:08:00 UTC

> Since the MW tasks on the GPU take just seconds, what is the relative worth of the various suitable ATI cards? For example, the cards that can be used are the HD3850, HD3870, HD4850 and HD4870. So do we know how much better each card in the progression is than the one before?
>
> Live long and BOINC.

All I could do for you is to quote my times. I suppose if we could get enough people to quote their times we could get a feel for it. The only problem is that the times form a spread ...

s20: 6.02 - 6.3
s21: 6.05; 11.61 - 11.83
s22: 6.84 - 9.14; 12.59 - 13.06
s23: 6.34 - 6.70; 12.48 - 12.58
s79: 5.73 - 6.11
s82: 6.02 - 6.13
s86: 9.13 - 9.36

I don't know why there are two groupings of numbers for some of the streams. As I understood the idea, I would have expected clustering about one set of numbers.

Either 6 or 12 seconds is fast enough ... the major hang-up I still have is that I have occasional outages ... and at times, after I get loaded with work, it does not start running tasks without intervention. Again, this is still alpha so I am not too bent about it ...

I did try 6.11, and gave up after half an hour. Much worse for me on fetching work than 6.5.0 ...

Oh, HD 4870, 512 M, standard clocks (I did nothing to change them):
770MHz GPU
900MHz memory
50% fan
57-62C

Cluster Physik · Joined: 26 Jul 08 · Posts: 627 · Credit: 94,940,203 · RAC: 0
Message 13001 - Posted: 27 Feb 2009, 0:00:54 UTC - in response to Message 12983. Last modified: 27 Feb 2009, 0:04:53 UTC

> Since the MW tasks on the GPU take just seconds, what is the relative worth of the various suitable ATI cards? For example, the cards that can be used are the HD3850, HD3870, HD4850 and HD4870. So do we know how much better each card in the progression is than the one before?

When neglecting the 1-2 seconds of CPU calculation for every WU (which is justified if you have two or more concurrent WUs running), one can break it down to the following performance relations (taking the HD3850 as base):

3850 : 3870 : 4830 : 4850 : 4870
   1 : 1.16 : 1.72 : 2.33 : 2.80

Of course the ratios only apply at stock clocks (670MHz for the 3850, 775MHz for the 3870, 575MHz for the 4830, 625MHz for the 4850 and 750MHz for the 4870), with the memory speed only a very minor factor.

Cluster Physik · Joined: 26 Jul 08 · Posts: 627 · Credit: 94,940,203 · RAC: 0
Message 13011 - Posted: 27 Feb 2009, 1:22:10 UTC - in response to Message 13001. Last modified: 27 Feb 2009, 1:34:56 UTC

> When neglecting the 1-2 seconds of CPU calculation for every WU (which is justified if you have two or more concurrent WUs running), one can break it down to the following performance relations (taking the HD3850 as base):
>
> 3850 : 3870 : 4830 : 4850 : 4870
>    1 : 1.16 : 1.72 : 2.33 : 2.80

Couldn't edit anymore. With a small addition it looks like this:

3850 : 3870 : 4830 : 4850 : 4870 : 4890 (850MHz) : 4890 (850MHz, 960 shaders ?)
   1 : 1.16 : 1.72 : 2.33 : 2.80 : 3.17          : 3.81
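The ratios quoted in this post appear consistent with a very simple model: relative speed scales with shader count times core clock. That is an inference from the numbers, not something the poster states. The sketch below checks it in Python. The shader counts are an assumption taken from the cards' public specifications (the thread itself only reports 320 units for the HD3850 and the speculative 960 for one 4890 variant), and `CARDS` and `relative_speeds` are hypothetical names used for illustration.

```python
# Each entry is (shader units, core clock in MHz).
# Shader counts are ASSUMED from public card specifications, not from the thread;
# clocks are the stock values quoted in the posts above.
CARDS = {
    "3850": (320, 670),
    "3870": (320, 775),
    "4830": (640, 575),
    "4850": (800, 625),
    "4870": (800, 750),
    "4890 (850MHz)": (800, 850),
    "4890 (850MHz, 960 shaders ?)": (960, 850),
}


def relative_speeds(cards, base="3850"):
    """Return shaders * clock for every card, normalised to the base card."""
    base_shaders, base_clock = cards[base]
    base_score = base_shaders * base_clock
    return {
        name: shaders * clock / base_score
        for name, (shaders, clock) in cards.items()
    }


speeds = relative_speeds(CARDS)
for name, ratio in speeds.items():
    print(f"{name:30s} {ratio:.2f}")
```

If the model holds, an overclocked card should gain roughly in proportion to its core clock, which matches the remark that memory speed is only a very minor factor.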

Kevint · Joined: 22 Nov 07 · Posts: 285 · Credit: 1,076,786,368 · RAC: 0
Message 13020 - Posted: 27 Feb 2009, 2:35:38 UTC

Have you found that a different CPU or bus type has an impact on speed? I installed a 4870 today on an older Pentium D 3.0. It has an older IDE drive on to a SATA, and my times are almost double what Paul has posted.

caferace · Joined: 4 Aug 08 · Posts: 46 · Credit: 8,255,900 · RAC: 0
Message 13022 - Posted: 27 Feb 2009, 3:03:33 UTC

I'm crunching with my Q6600 and a 3850 and it does 8 tasks at once, almost like it's hyperthreading. BOINC 6.4.6. Is that normal?

I can get some times tomorrow, but I imagine the variance between GPU clock and wall clock has a bearing as well...

-jim

Cluster Physik · Joined: 26 Jul 08 · Posts: 627 · Credit: 94,940,203 · RAC: 0
Message 13024 - Posted: 27 Feb 2009, 4:29:59 UTC - in response to Message 13020.

> Have you found that a different CPU or bus type has an impact on speed? I installed a 4870 today on an older Pentium D 3.0. It has an older IDE drive on to a SATA, and my times are almost double what Paul has posted.

It is a bit hard to measure the time the WUs really need. Best would be to take a stopwatch and measure by hand the time needed for 24 WUs of one type.

The times you see in the task list are actually CPU times. But the CPU time has no immediate meaning for WUs crunched on the GPU. With a new test version I have already seen CPU times as low as 1.0 seconds for an 8 credit WU. If you lower the CPU load of the GPU app, the measured CPU times will also be lower.

Sometimes the wall clock time in the task details will give you some indication of the real times. If, for instance, 4 WUs run at all times, the wall clock time will be roughly 4 times the time a single WU actually needed to be crunched on the GPU. But this only works if you have 4 identical WUs. If they are of different types, this method will give somewhat skewed results.

I have to think about how to measure some kind of GPU time. But this will be hard as the GPU executes the work asynchronously. If you want to measure it correctly, you have to poll the GPU very fast (raising the CPU load). Maybe I will find a way to do this without an excessive CPU load and add it later to enable easier comparisons.
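The two measuring approaches described in this post (timing a batch by hand, or dividing the reported wall clock time by the number of identical WUs running at once) both reduce to a single division. A minimal sketch, assuming identical WUs as the post requires; the function names and the sample numbers are hypothetical and are not measurements from this thread:

```python
def per_wu_from_batch(elapsed_seconds: float, completed_wus: int) -> float:
    """Stopwatch method: average wall time per WU over a hand-timed batch."""
    if completed_wus < 1:
        raise ValueError("completed_wus must be at least 1")
    return elapsed_seconds / completed_wus


def per_wu_from_wall_clock(wall_clock_seconds: float, concurrent_wus: int) -> float:
    """Concurrency method: reported wall clock of one WU divided by the
    number of identical WUs that were running at the same time."""
    if concurrent_wus < 1:
        raise ValueError("concurrent_wus must be at least 1")
    return wall_clock_seconds / concurrent_wus


# Hypothetical numbers, for illustration only.
print(per_wu_from_batch(150.0, 24))      # 24 WUs timed by hand over 150 s
print(per_wu_from_wall_clock(24.0, 4))   # one WU reporting 24 s with 4 running
```

Neither estimate is a true GPU time. As the post notes, mixed WU types skew the second method, and the stopwatch method only averages over whatever ran during the batch.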

Cluster Physik · Joined: 26 Jul 08 · Posts: 627 · Credit: 94,940,203 · RAC: 0
Message 13025 - Posted: 27 Feb 2009, 4:34:27 UTC - in response to Message 13022.

> I'm crunching with my Q6600 and a 3850 and it does 8 tasks at once, almost like it's hyperthreading. BOINC 6.4.6. Is that normal?
>
> I can get some times tomorrow, but I imagine the variance between GPU clock and wall clock has a bearing as well...
>
> -jim

It's more like multithreading on a single core ;) And it is perfectly normal. You may activate another project to crunch fewer WUs at once but individually faster (this will give you the same throughput). And that would put your CPU cores to some use, too.

Paul D. Buck · Joined: 12 Apr 08 · Posts: 621 · Credit: 161,934,067 · RAC: 0
Message 13030 - Posted: 27 Feb 2009, 6:38:23 UTC

I will make one other observation, which held true on GPU Grid and seems to be sort of true on this GPU application ... the CPU load seems to be handled better on HT than on a straight CPU core. I suppose that it is the interleave effect, where the "idle" load is not competing so much for the resources, so the other tasks run well and the GPU attendance is less "costly" ...

I conclusively proved it on GPU Grid, but I cannot here because I only have the one card running on an HT machine ... I would run out and buy another card, but it is income tax time and I just spent a wad on Nvidia cards and have run out of spare PCI-e slots ... alas ...

The Gas Giant · Joined: 24 Dec 07 · Posts: 1947 · Credit: 240,884,648 · RAC: 0
Message 13034 - Posted: 27 Feb 2009, 8:12:30 UTC

Any issues with mixing an ATI card and an Nvidia card in the same machine?

John Clark · Joined: 4 Oct 08 · Posts: 1734 · Credit: 64,228,409 · RAC: 0
Message 13043 - Posted: 27 Feb 2009, 10:28:23 UTC. Last modified: 27 Feb 2009, 10:29:40 UTC

Thanks Paul.

A question I was thinking about. Assuming the old card was left in the box (mixed cards), I assume this would drive the screen and other graphics as it has always done. But would there be any IRQ conflicts?

MJD1964 · Joined: 24 Nov 07 · Posts: 9 · Credit: 102,125,541 · RAC: 0
Message 13066 - Posted: 27 Feb 2009, 16:46:11 UTC

Do the 4830's work with the GPU app ?????

GalaxyIce · Joined: 6 Apr 08 · Posts: 2018 · Credit: 100,142,856 · RAC: 0
Message 13078 - Posted: 27 Feb 2009, 17:44:36 UTC - in response to Message 13066.

> Do the 4830's work with the GPU app ?????

Nope. Only the HD3850, HD3870, HD4850, HD4870 and the HD4890 when it's due to come out in April.

Cluster Physik · Joined: 26 Jul 08 · Posts: 627 · Credit: 94,940,203 · RAC: 0
Message 13082 - Posted: 27 Feb 2009, 18:11:23 UTC - in response to Message 13066.

> Do the 4830's work with the GPU app ?????

Yes they do. It is tested and they run.

Paul D. Buck · Joined: 12 Apr 08 · Posts: 621 · Credit: 161,934,067 · RAC: 0
Message 13084 - Posted: 27 Feb 2009, 18:19:28 UTC - in response to Message 13043.

> Thanks Paul.
>
> A question I was thinking about. Assuming the old card was left in the box (mixed cards), I assume this would drive the screen and other graphics as it has always done. But would there be any IRQ conflicts?

There should not be IRQ conflicts; however, I have never tried an Nvidia / ATI mix ... THEORY says it would work ... but some have suggested that with XP it will not, but will with Vista. No experience.

HOWEVER ... I have had various versions of cards from Nvidia in the same system at the same time ... either a GTX 280 and 9800 GT or GTX 280 and GTX 295 ... and it worked. It gave me two or three cores working on GPU Grid.

I am currently running two GTX 295 cards in my one dual PCI-e computer. All other machines have only one PCI-e slot, and in those I have a GTX 280, a 9800 GT and an HD4870 ... I have an Nvidia card in the Mac Pro, not that it does me any good (yet) ... and when I build the next system I will be looking to get it with at least 2 PCI-e slots, or better, 3 ... From there I will possibly look to replace just the MB to add PCI-e slots so I can add GPUs ...

You can get, for Nvidia cards at least, a good feel for what is going on over at GPU Grid ... even if you don't yet have a card you can make an account and lurk in the forums ...

Long term, I am likely to have a mix of systems, one with ATI cards and the other with Nvidia, for the next couple of years.

Napsterbater · Joined: 26 Feb 09 · Posts: 4 · Credit: 5,592,569 · RAC: 0
Message 13086 - Posted: 27 Feb 2009, 18:20:10 UTC - in response to Message 13025. Last modified: 27 Feb 2009, 18:32:43 UTC

> It's more like multithreading on a single core ;) And it is perfectly normal. You may activate another project to crunch fewer WUs at once but individually faster (this will give you the same throughput). And that would put your CPU cores to some use, too.

OK, I have a tri-core AMD Phenom II X3 720 with an ATI 4870. I was testing some settings, set "avg_ncpus" to .1, and restarted BOINC. What I saw was 3 MW WUs running and 3 Seti WUs running (2 using 100% of 2 cores, 1 using maybe 66%-ish).

Now my question is: do these MW WUs need 100% of a core to do the same work, or can they use 33%-ish (all 3 combined) and get the same work done? What I'm trying to ask is whether, in theory, allowing another project to run like that would produce the same amount of credits/hr for MW as not running the other project. Just trying to get the most out of my computer.

EDIT: To add, if I leave "avg_ncpus" set to .5, BOINC only runs 2 Seti (100% x 2) and 2 MW at 50% each (100% of one core), and .1 gets 3 x Seti (2 x 100%, 1 x 66%) and 3 MWs using a total of 33% of one core.

GalaxyIce · Joined: 6 Apr 08 · Posts: 2018 · Credit: 100,142,856 · RAC: 0
Message 13093 - Posted: 27 Feb 2009, 19:11:50 UTC - in response to Message 13082.

> > Do the 4830's work with the GPU app ?????
>
> Yes they do. It is tested and they run.

Hmmmm, I better update zslip then... You're just getting too good, Cluster Physik ;)

caferace · Joined: 4 Aug 08 · Posts: 46 · Credit: 8,255,900 · RAC: 0
Message 13108 - Posted: 27 Feb 2009, 22:02:58 UTC

Q6600 & HD3850 512Mb using BOINC 6.4.6
Result (lone WU)

Running Milkyway@home ATI GPU application version 0.19 by Gipsel
CPU: Intel(R) Core(TM)2 Quad CPU Q6600 @ 2.40GHz (4 cores/threads) 2.4001 GHz (315ms)
CAL Runtime: 1.3.145
Found 1 CAL devices
Device 0: ATI Radeon HD 3800 (RV670) 512 MB local RAM (28 MB cached + 512 MB uncached remote)
GPU core clock: 669 MHz, memory clock: 829 MHz
320 shader units organized in 4 SIMDs with 16 5-issue VLIW units each
supporting double precision
Calculated about 1.2268e+012 floatingpoint ops on GPU, 6.18221e+007 on FPU.
Calculated about 9.08961e+008 floatingpoint ops on FPU (stars).
WU completed. It took 28.4531 seconds CPU time and 29.362 seconds wall clock time @ 2.40011 GHz.

----

Q6600 & HD3850 512Mb using BOINC 6.4.6
Result (8 WU's active)

Running Milkyway@home ATI GPU application version 0.19 by Gipsel
CPU: Intel(R) Core(TM)2 Quad CPU Q6600 @ 2.40GHz (4 cores/threads) 2.4001 GHz (349ms)
CAL Runtime: 1.3.145
Found 1 CAL devices
Device 0: ATI Radeon HD 3800 (RV670) 512 MB local RAM (28 MB cached + 512 MB uncached remote)
GPU core clock: 669 MHz, memory clock: 829 MHz
320 shader units organized in 4 SIMDs with 16 5-issue VLIW units each
supporting double precision
Calculated about 1.2268e+012 floatingpoint ops on GPU, 6.18221e+007 on FPU.
Calculated about 9.08961e+008 floatingpoint ops on FPU (stars).
WU completed. It took 29.0313 seconds CPU time and 226.155 seconds wall clock time @ 2.40011 GHz.

NOTE: This unit is a cruncher only, with no other applications running. Stock clocks on both CPU and 3850.

Hope this is of help...

-jim

Cluster Physik · Joined: 26 Jul 08 · Posts: 627 · Credit: 94,940,203 · RAC: 0
Message 13118 - Posted: 27 Feb 2009, 22:32:26 UTC - in response to Message 13108.

> Q6600 & HD3850 512Mb using BOINC 6.4.6
> Result (lone WU)
> WU completed. It took 28.4531 seconds CPU time and 29.362 seconds wall clock time @ 2.40011 GHz.
>
> ----
>
> Q6600 & HD3850 512Mb using BOINC 6.4.6
> Result (8 WU's active)
> WU completed. It took 29.0313 seconds CPU time and 226.155 seconds wall clock time @ 2.40011 GHz.
>
> NOTE: This unit is a cruncher only, with no other applications running. Stock clocks on both CPU and 3850.
>
> Hope this is of help...
>
> -jim

It's great you have done this!

Let's see. A single WU takes 29.362 s wall clock time. That means you would have a throughput of 122.6 WUs (of this kind) per hour. With 8 concurrently running WUs, 8 of them together take 226.155 s, or 28.27 s per WU. That difference is the single second per WU (it would be slightly more with a slow CPU) that can be shaved off by overlapping the calculations of two (or more) concurrent WUs, which I have mentioned several times already. It is not that much for the HD3850 card (throughput is now 127.35 WUs per hour, less than a 4% increase), but for an HD4850 or 4870 it would already be about 10%.
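The arithmetic in this post can be reproduced in a few lines of Python. This is only a sketch of the calculation as described, using the two wall clock times reported in Message 13108; the function and variable names are hypothetical.

```python
SECONDS_PER_HOUR = 3600


def wus_per_hour(wall_clock_seconds: float, concurrent_wus: int = 1) -> float:
    """Throughput when `concurrent_wus` identical WUs each report the
    given wall clock time while sharing the GPU."""
    seconds_per_wu = wall_clock_seconds / concurrent_wus
    return SECONDS_PER_HOUR / seconds_per_wu


# Wall clock times reported for the HD3850 in Message 13108.
lone = wus_per_hour(29.362)                           # one WU at a time
overlapped = wus_per_hour(226.155, concurrent_wus=8)  # eight WUs at once
gain_percent = (overlapped / lone - 1) * 100

print(f"lone WU:    {lone:.2f} WUs/hour")
print(f"8 at once:  {overlapped:.2f} WUs/hour")
print(f"throughput gain: {gain_percent:.2f}%")
```

The gain comes entirely from the roughly one second per WU of CPU-side work that overlaps with GPU work when several WUs run concurrently. Since that second is a larger share of a faster card's shorter per-WU time, the percentage gain is larger there, which is the point the post makes about the HD4850 and 4870.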

Avatar1966 · Joined: 21 Jul 08 · Posts: 3 · Credit: 33,031,789 · RAC: 0
Message 13159 - Posted: 27 Feb 2009, 23:20:05 UTC

Hi, I know it's old technology, but what about this card?

SAPPHIRE 100228L Radeon HD 3850 512MB 256-bit GDDR3 AGP 4X/8X

How much speed loss because of the AGP bus?

Thanks