Relative GPU Crunching Speeds
Author | Message |
---|---|
Joined: 24 Dec 07 Posts: 1947 Credit: 240,884,648 RAC: 0 |
Since the MW tasks on the GPU take just seconds, what is the relative worth of the various suitable ATI cards? For example, the cards that can be used are the HD3850, HD3870, HD4850 and HD4870. Do we know how much better each card in the progression is than the one before it? Live long and BOINC. |
Joined: 12 Apr 08 Posts: 621 Credit: 161,934,067 RAC: 0 |
Quote: "Since the MW tasks on the GPU take just seconds, what is the relative worth of the various suitable ATI cards? For example the cards that can be used are HD3850, HD3870, HD4850 and HD4870. So do we know how much better than the other each card in the progression is?" All I can do for you is quote my times. I suppose if enough people quoted their times we could get a feel for it. The only problem is that the times (in seconds) form a spread ... s20: 6.02 - 6.3 / s21: 6.05; 11.61 - 11.83 / s22: 6.84 - 9.14; 12.59 - 13.06 / s23: 6.34 - 6.70; 12.48 - 12.58 / s79: 5.73 - 6.11 / s82: 6.02 - 6.13 / s86: 9.13 - 9.36. I don't know why there are two groupings of numbers for some of the streams. As I understood the idea, I would have expected clustering about one set of numbers. Either 6 or 12 seconds is fast enough ... the major hang-up I still have is that I have occasional outages ... and at times, after I get loaded with work, it does not start running tasks without intervention. Again, this is still alpha so I am not too bent about it ... I did try 6.11, and gave up after half an hour. Much worse for me on fetching work than 6.5.0 ... Oh, HD 4870, 512 MB, standard clocks (I did nothing to change them): 770 MHz GPU, 900 MHz memory, 50% fan, 57-62 C |
Joined: 26 Jul 08 Posts: 627 Credit: 94,940,203 RAC: 0 |
Quote: "Since the MW tasks on the GPU take just seconds, what is the relative worth of the various suitable ATI cards? For example the cards that can be used are HD3850, HD3870, HD4850 and HD4870. So do we know how much better than the other each card in the progression is?" Neglecting the 1-2 seconds of CPU calculation for every WU (which is justified if you have two or more concurrent WUs running), one can break it down to the following performance ratios, taking the HD3850 as the base: 3850 = 1, 3870 = 1.16, 4830 = 1.72, 4850 = 2.33, 4870 = 2.80. Of course, the ratios only apply at stock clocks (670 MHz for the 3850, 775 MHz for the 3870, 575 MHz for the 4830, 625 MHz for the 4850 and 750 MHz for the 4870); the memory speed is only a very minor factor. |
Joined: 26 Jul 08 Posts: 627 Credit: 94,940,203 RAC: 0 |
Quote: "When neglecting the 1-2 seconds CPU calculation for every WU (which is justified if you have two or more concurrent WUs running), one can break it down to the following performance relations (taking the HD3850 as base): 3850 = 1, 3870 = 1.16, 4830 = 1.72, 4850 = 2.33, 4870 = 2.80" Couldn't edit anymore. With a small addition it looks like this: 3850 = 1, 3870 = 1.16, 4830 = 1.72, 4850 = 2.33, 4870 = 2.80, 4890 (850 MHz) = 3.17, 4890 (850 MHz, 960 shader ?) = 3.81. |
Joined: 4 Aug 08 Posts: 46 Credit: 8,255,900 RAC: 0 |
I'm crunching with my Q6600 and a 3850 and it does 8 tasks at once, almost like it's hyperthreading. BOINC 6.4.6. Is that normal? I can get some times tomorrow, but I imagine the variance between GPU clock and Wall clock has a bearing as well... -jim |
Joined: 26 Jul 08 Posts: 627 Credit: 94,940,203 RAC: 0 |
It is a bit hard to measure the times the WUs really need. Best would be to take a stopwatch and measure by hand the time needed for 24 WUs of one type. The times you see in the task list are actually CPU times. But the CPU time has no immediate meaning for WUs crunched on the GPU. With a new test version I have already seen CPU times as low as 1.0 seconds for an 8-credit WU. If you lower the CPU load of the GPU app, the measured CPU times will also be lower. Sometimes the wall clock time in the task details will give you some indication of the real times. If, for instance, 4 WUs run at all times, the wall clock time will be roughly 4 times the time a single WU actually needed to be crunched on the GPU. But this only works if you have 4 identical WUs. If they are of different types, this method will give somewhat skewed results. I have to think about how to measure some kind of GPU time. But this will be hard, as the GPU executes the work asynchronously. If you want to measure it correctly, you have to poll the GPU very fast (raising the CPU load). Maybe I will find a way to do this without an excessive CPU load and add it later to enable easier comparisons. |
Joined: 26 Jul 08 Posts: 627 Credit: 94,940,203 RAC: 0 |
Quote: "I'm crunching with my Q6600 and a 3850 and it does 8 tasks at once, almost like it's hyperthreading. BOINC 6.4.6. Is that normal?" It's more like multithreading on a single core ;) And it is perfectly normal. You may activate another project to crunch fewer WUs at once but individually faster (this will give you the same throughput). And that would put your CPU cores to some use, too. |
Joined: 12 Apr 08 Posts: 621 Credit: 161,934,067 RAC: 0 |
I will make one other observation, which held true on GPU Grid and seems to be sort of true on this GPU application ... the CPU load seems to be handled better on HT than on a straight CPU core. I suppose that it is the interleave effect where the "idle" load is not competing so much for the resources and so the other tasks run well and the GPU attendance is less "costly" ... I conclusively proved it on GPU Grid, but I cannot here because I only have the one card running on an HT machine ... I would run out and buy another card but it is income tax time and I just spent a wad on Nvidia cards and have run out of spare PCI-e slots ... alas ... |
Joined: 24 Dec 07 Posts: 1947 Credit: 240,884,648 RAC: 0 |
Any issues with mixing an ATI card and an Nvidia card in the same machine? |
Joined: 4 Oct 08 Posts: 1734 Credit: 64,228,409 RAC: 0 |
Thanks Paul. A question I was thinking about. Assuming the old card was left in the box (mixed cards), I assume this would drive the screen and other graphics as it has always done. But would there be any IRQ conflicts? |
Joined: 24 Nov 07 Posts: 9 Credit: 102,125,541 RAC: 0 |
Do the 4830s work with the GPU app? |
Joined: 26 Jul 08 Posts: 627 Credit: 94,940,203 RAC: 0 |
Quote: "Do the 4830s work with the GPU app?" Yes they do. It is tested and they run. |
Joined: 12 Apr 08 Posts: 621 Credit: 161,934,067 RAC: 0 |
Quote: "Thanks Paul. A question I was thinking about." There should not be IRQ conflicts; however, I have never tried an Nvidia / ATI mix ... THEORY says it would work ... but some have suggested that with XP it will not, but will with Vista. No experience. HOWEVER ... I have had various versions of cards from Nvidia in the same system at the same time ... either a GTX 280 and 9800 GT or GTX 280 and GTX 295 ... and it worked. It gave me two or three cores working on GPU Grid. I am currently running two GTX 295 cards in my one dual PCI-e computer. All other machines have only one PCI-e slot, and in those I have a GTX 280, a 9800 GT and an HD4870 ... I have an Nvidia card in the Mac Pro, not that it does me any good (yet) ... and when I build the next system I will be looking to get it with at least 2 PCI-e slots, or better 3 ... From there I will possibly look to replace just the MB to add PCI-e slots so I can add GPUs ... You can get, for Nvidia cards at least, a good feel for what is going on over at GPU Grid ... even if you don't yet have a card you can make an account and lurk the forums ... Long term I am likely to have a mix of systems, one of them with ATI cards and the other with Nvidia, for the next couple of years. |
Joined: 26 Feb 09 Posts: 4 Credit: 5,592,569 RAC: 0 |
Quote: "It's more like multithreading on a single core ;)" OK, I have a tri-core AMD Phenom II X3 720 with an ATI 4870. I was testing some settings, set "avg_ncpus" to .1 and restarted BOINC. What I saw was 3 MW WUs running and 3 Seti WUs running (2 using 100% of 2 cores, 1 using maybe 66%ish). Now my question is: do these WUs need 100% of a core to do the same work, or can they use 33%ish (all 3 combined) and get the same work done? What I'm trying to ask is whether, in theory, allowing another project to run like that would produce the same amount of credits/hr for MW as not running the other project. Just trying to get the most out of my computer. EDIT: To add, if I leave "avg_ncpus" set to .5, BOINC only runs 2 Seti (100% x 2) and 2 MW at 50% each (100% of one core), and .1 gets 3 Seti (2 x 100%, 1 x 66%) and 3 MWs using a total of 33% of one core. |
Joined: 4 Aug 08 Posts: 46 Credit: 8,255,900 RAC: 0 |
Q6600 & HD3850 512Mb using BOINC 6.4.6 Result (lone WU) Running Milkyway@home ATI GPU application version 0.19 by Gipsel CPU: Intel(R) Core(TM)2 Quad CPU Q6600 @ 2.40GHz (4 cores/threads) 2.4001 GHz (315ms) CAL Runtime: 1.3.145 Found 1 CAL devices Device 0: ATI Radeon HD 3800 (RV670) 512 MB local RAM (28 MB cached + 512 MB uncached remote) GPU core clock: 669 MHz, memory clock: 829 MHz 320 shader units organized in 4 SIMDs with 16 5-issue VLIW units each supporting double precision Calculated about 1.2268e+012 floatingpoint ops on GPU, 6.18221e+007 on FPU. Calculated about 9.08961e+008 floatingpoint ops on FPU (stars). WU completed. It took 28.4531 seconds CPU time and 29.362 seconds wall clock time @ 2.40011 GHz. ---- Q6600 & HD3850 512Mb using BOINC 6.4.6 Result (8 WU's active) Running Milkyway@home ATI GPU application version 0.19 by Gipsel CPU: Intel(R) Core(TM)2 Quad CPU Q6600 @ 2.40GHz (4 cores/threads) 2.4001 GHz (349ms) CAL Runtime: 1.3.145 Found 1 CAL devices Device 0: ATI Radeon HD 3800 (RV670) 512 MB local RAM (28 MB cached + 512 MB uncached remote) GPU core clock: 669 MHz, memory clock: 829 MHz 320 shader units organized in 4 SIMDs with 16 5-issue VLIW units each supporting double precision Calculated about 1.2268e+012 floatingpoint ops on GPU, 6.18221e+007 on FPU. Calculated about 9.08961e+008 floatingpoint ops on FPU (stars). WU completed. It took 29.0313 seconds CPU time and 226.155 seconds wall clock time @ 2.40011 GHz. NOTE: This unit is a cruncher only, with no other applications running. Stock clocks on both CPU and 3850. Hope this is of help... -jim |
Joined: 26 Jul 08 Posts: 627 Credit: 94,940,203 RAC: 0 |
Quote: "Q6600 & HD3850 512Mb using BOINC 6.4.6 Result (lone WU)" It's great you have done this! Let's see. A single WU takes 29.362 s wall clock time. That means you would have a throughput of 122.6 WUs (of this kind) per hour. With 8 concurrently running WUs, 8 of them together take 226.155 s, or 28.27 s per WU. That is actually the single second per WU (it would be slightly more with a slow CPU) that can be shaved off by overlapping the calculations of two (or more) concurrent WUs, which I have talked about several times already. It is not that much for the HD3850 (throughput is now 127.35 WUs per hour, less than a 4% increase), but for an HD4850 or 4870 it would already be about 10%. |
Joined: 21 Jul 08 Posts: 3 Credit: 33,031,789 RAC: 0 |
Hi, I know it's old technology but what about this card? SAPPHIRE 100228L Radeon HD 3850 512MB 256-bit GDDR3 AGP 4X/8X How much speed loss because of the AGP Bus? Thanks |
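
For anyone who wants to redo the arithmetic from the posts above, here is a minimal Python sketch, not anything taken from the MW application itself. The first part turns wall clock times into WUs per hour, using the rule of thumb from the thread that N identical concurrent WUs each really take about 1/N of the reported wall clock time. The second part is a rough model, assumed here, that GPU speed scales with shader units times core clock. The clocks are the stock clocks quoted in the thread. The shader counts are an assumption taken from public card specs (only the HD3850's 320 units appear in the posted log), and the function and variable names are made up for illustration.

```python
def throughput_per_hour(wall_seconds, concurrent=1):
    """WUs per hour, assuming `concurrent` identical WUs were running
    side by side for the whole reported wall clock time."""
    seconds_per_wu = wall_seconds / concurrent
    return 3600.0 / seconds_per_wu


# Wall clock times from the Q6600 + HD3850 results posted in the thread.
lone = throughput_per_hour(29.362)                       # one WU at a time
overlapped = throughput_per_hour(226.155, concurrent=8)  # 8 WUs at once
gain_percent = (overlapped / lone - 1.0) * 100.0

# (shader units, stock core clock in MHz).
# Shader counts are an assumption from public specs, not from the thread,
# except the 320 units the application log reports for the HD3850.
CARDS = {
    "HD3850": (320, 670),
    "HD3870": (320, 775),
    "HD4830": (640, 575),
    "HD4850": (800, 625),
    "HD4870": (800, 750),
}


def relative_speed(card, base="HD3850"):
    """Speed of `card` relative to `base`, under the assumed
    shader-units-times-clock model (memory speed ignored)."""
    shaders, clock = CARDS[card]
    base_shaders, base_clock = CARDS[base]
    return (shaders * clock) / (base_shaders * base_clock)


if __name__ == "__main__":
    print(f"lone WU:    {lone:.2f} WUs/hour")
    print(f"8 at once:  {overlapped:.2f} WUs/hour ({gain_percent:.1f}% more)")
    for name in CARDS:
        print(f"{name}: {relative_speed(name):.2f}")
```

Under these assumptions the simple shaders-times-clock model gives the same ratios as the ones quoted in the thread for stock clocks, which fits the remark that memory speed is only a very minor factor. It says nothing about the CPU side or about bus effects such as AGP versus PCI-e.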