IMPORTANT! Nvidia's 400 series crippled by Nvidia

John P. Myers

Joined: 6 Mar 09
Posts: 5
Credit: 47,227,802
RAC: 0
Message 38378 - Posted: 8 Apr 2010, 13:30:45 UTC

In an effort to boost sales of their Tesla C2050 and C2070 cards, Nvidia has intentionally crippled the FP64 (double precision) compute capability of the 400 series by a whopping 75%. Left uncapped, the GTX 480 would have delivered 672.708 GFLOPS of FP64, almost 119 GFLOPS more than ATI's 5870. Instead, the GTX 480 comes in at 168.177 GFLOPS. Here is a link to Nvidia's own forum where we have been discussing this; you will see a CUDA-Z performance screenshot confirming it, on top of confirmation from Nvidia's own staff. Nvidia is not publicly making anyone aware that this is the case. AnandTech also tucked the update onto page 6 of a 20-page review of the GTX 480/470.
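For anyone who wants to sanity-check those figures, a rough back-of-the-envelope calculation (assuming the commonly published stock specs: 480 CUDA cores at a 1401 MHz shader clock for the GTX 480, and 1600 ALUs at 850 MHz with FP64 at one fifth of the SP rate for the HD 5870) lands within a couple of GFLOPS of the numbers above:

# Rough peak-throughput arithmetic (assumed stock specs, not official figures).
gtx480_sp = 480 * 2 * 1.401            # 480 cores x 2 flops (FMA) x 1.401 GHz ~= 1345 GFLOPS
gtx480_fp64_uncapped = gtx480_sp / 2   # Fermi can run FP64 at half the SP rate
gtx480_fp64_shipping = gtx480_sp / 8   # GeForce parts are capped at 1/4 of that
hd5870_fp64 = 1600 * 2 * 0.850 / 5     # commonly quoted HD 5870 FP64 peak

print(f"GTX 480 FP64, uncapped: {gtx480_fp64_uncapped:.1f} GFLOPS")   # ~672
print(f"GTX 480 FP64, shipping: {gtx480_fp64_shipping:.1f} GFLOPS")   # ~168
print(f"HD 5870 FP64:           {hd5870_fp64:.1f} GFLOPS")            # 544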
The Gas Giant
Joined: 24 Dec 07
Posts: 1947
Credit: 240,884,648
RAC: 0
Message 38405 - Posted: 8 Apr 2010, 18:43:38 UTC

Noted 2 weeks ago. I wonder if anyone is willing to own up to crunching with one here?
CTAPbIi
Joined: 4 Jan 10
Posts: 86
Credit: 51,753,924
RAC: 0
Message 38410 - Posted: 8 Apr 2010, 18:58:16 UTC

Me? No. My plan was to get one for GPUGRID, but now...

In this case it makes way more sense to get a 597(9)0 for Milky :-)
John P. Myers
Joined: 6 Mar 09
Posts: 5
Credit: 47,227,802
RAC: 0
Message 38430 - Posted: 8 Apr 2010, 23:32:12 UTC - in response to Message 38410.  
Last modified: 8 Apr 2010, 23:33:41 UTC

@The Gas Giant: at that time it was just a rumor. Since then it has been confirmed and demonstrated with a benchmark.

If anyone was on the fence waiting for absolute confirmation, now you have it. The link to Nvidia's forum in my first post includes a post (on page 2) showing FP64, FP32, INT32 and INT24 performance.

Funny, though, how Nvidia has still yet to admit this outside of their forums. Even EVGA was not aware this was the case. They are now.
mesyn191
Joined: 25 Dec 09
Posts: 29
Credit: 138,706,202
RAC: 28
Message 38440 - Posted: 9 Apr 2010, 5:27:34 UTC

Actually, techreport.com did a "preview" of the GF100 GPUs back in January, and it mentioned that Nvidia had intentionally crippled GPGPU performance in the non-Tesla versions.

http://techreport.com/articles.x/18332/5

By all rights, in this architecture, double-precision math should happen at half the speed of single-precision, clean and simple. However, Nvidia has made the decision to limit DP performance in the GeForce versions of the GF100 to 64 FMA ops per clock—one fourth of what the chip can do. This is presumably a product positioning decision intended to encourage serious compute customers to purchase a Tesla version of the GPU instead. Double-precision support doesn't appear to be of any use for real-time graphics, and I doubt many serious GPU-computing customers will want the peak DP rates without the ECC memory that the Tesla cards will provide. But a few poor hackers in Eastern Europe are going to be seriously bummed, and this does mean the Radeon HD 5870 will be substantially faster than any GeForce card at double-precision math, at least in terms of peak rates.


They're doing it for product differentiation, so they can sell more Teslas, which carry higher margins for them.

Pretty stupid IMO, but whatever, it's their product to screw up.
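To tie the article's figure back to the GFLOPS numbers earlier in the thread: the 64 FMA/clock cap applies to a full 16-SM GF100, while the GTX 480 ships with 15 SMs enabled, so its cap works out to 60 FMA/clock. A quick conversion, assuming the stock 1401 MHz shader clock:

# Convert the article's "FMA ops per clock" cap into GFLOPS (1 FMA = 2 flops).
shader_clock_ghz = 1.401     # assumed stock GTX 480 shader clock
full_gf100_cap = 64          # capped rate on a full 16-SM GF100, per the article
gtx480_cap = 64 * 15 // 16   # 60 FMA/clock with 15 of 16 SMs enabled

print(f"Capped full GF100: {full_gf100_cap * 2 * shader_clock_ghz:.1f} GFLOPS FP64")
print(f"Capped GTX 480:    {gtx480_cap * 2 * shader_clock_ghz:.1f} GFLOPS FP64")   # ~168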
Emanuel
Joined: 18 Nov 07
Posts: 280
Credit: 2,442,757
RAC: 0
Message 38441 - Posted: 9 Apr 2010, 6:12:44 UTC

As I posted on Einstein@Home: "I'm very disappointed, and really torn about getting one of these cards now. Even if I admit to myself I won't be getting it just for its science computing capabilities, this artificial limiting leaves a very bad taste in my mouth. There's simply no way any consumer is going to buy a Tesla card, so I really think they're shooting themselves in the foot with this. Tesla should simply imply direct, lifetime support and maybe have all the shader clusters enabled. They may think this is a good business decision, but I've seen a little of just how much money some of the more competitive crunchers have - I think doing this is just going to lose them a percentage of their sales without actually helping out Tesla in the slightest."
If this is something that was done at the hardware level there's not much we can do about it; if it's purely software, maybe Nvidia will change their minds and give us the full potential of these cards, or maybe someone will figure out how to hack their drivers to do the same thing. Only time will tell I suppose - but let's hope they at least get some seriously bad PR for this. If they're able to keep this mostly under wraps, they have nothing to worry about.
mesyn191
Joined: 25 Dec 09
Posts: 29
Credit: 138,706,202
RAC: 28
Message 38442 - Posted: 9 Apr 2010, 6:28:28 UTC - in response to Message 38441.  
Last modified: 9 Apr 2010, 6:31:15 UTC

nV is pretty firm on segmenting the market to make Tesla look good; I wouldn't expect them to do a 180 on that.

If the GPU is gimped in software (driver or firmware), a hack may be possible.

If it's done via a burned-out fuse or laser-cut traces at the fab, then there is nothing you can do.

The whole GPGPU scene is pretty small, so I wouldn't expect much, if any, bad PR for nV over this either, BTW. Outside of a few programs like MW or F@H no one really cares right now. Perhaps that will change in a few years once GPGPU gets more widespread.
Gill..
Joined: 25 Aug 09
Posts: 12
Credit: 179,143,357
RAC: 0
Message 38443 - Posted: 9 Apr 2010, 7:05:44 UTC - in response to Message 38405.  

Noted 2 weeks ago. I wonder if anyone is willing to own up to crunching with one here?


My 470 is in CT on the way to MA as we speak!

I'm torn on how I should set it up; I'm thinking a pair of 5850s upstairs and the 470 downstairs.

If it's loud, my wife will make me put it upstairs. I'll do a comparison and definitely post.

Either way, I'm getting it for $349; I couldn't pass that up. Right place, right time... I'm sure some Nvidia-phile would love it if it doesn't compare...

I'll do SETI, GPUGRID and Milky? Sounds good. I'm so psyched I can't sleep.
Brian Priebe
Joined: 27 Nov 09
Posts: 108
Credit: 430,760,953
RAC: 0
Message 38447 - Posted: 9 Apr 2010, 9:07:13 UTC - in response to Message 38442.  
Last modified: 9 Apr 2010, 9:07:46 UTC

Outside of a few programs like MW or F@H no one really cares right now.
Does Folding@Home care? At one point, Vijay Pande stated in the support forums that the GPU clients were single-precision.
Cluster Physik
Joined: 26 Jul 08
Posts: 627
Credit: 94,940,203
RAC: 0
Message 38450 - Posted: 9 Apr 2010, 9:51:30 UTC - in response to Message 38443.  

I'll do SETI, GPUGRID and Milky? Sounds good. I'm so psyched I can't sleep.

You should test Collatz, too. It will probably see the largest boost compared to a GTX 285 because of the much improved integer capabilities. That may also apply to other math projects with CUDA apps (PrimeGrid comes to mind).
mesyn191
Joined: 25 Dec 09
Posts: 29
Credit: 138,706,202
RAC: 28
Message 38451 - Posted: 9 Apr 2010, 10:17:25 UTC - in response to Message 38447.  

Though their app is single precision only, they'd probably be against the principle of GPU manufacturers gimping their hardware for product segmentation.

There's no reason they'd have to gimp just double-precision workloads on their consumer GPUs, you know. If they really wanted to, they could gimp all GPGPU workloads to make Tesla/FireStream look even better.

Like I said, it's stupid, but that's the way some of the corporate types think.
avidday
Joined: 8 Apr 10
Posts: 15
Credit: 534,184
RAC: 0
Message 38453 - Posted: 9 Apr 2010, 11:00:25 UTC - in response to Message 38451.  


There's no reason they'd have to gimp just double-precision workloads on their consumer GPUs, you know. If they really wanted to, they could gimp all GPGPU workloads to make Tesla/FireStream look even better.


Except, of course, that single precision GPGPU calculations are indistinguishable from DX10 or DX11 programmable shader calculations, so artificially limiting single precision throughput would have a direct effect on the GPU's 3D graphics performance, whereas reducing double precision performance has no such effect.

This will be the case until Microsoft mandates double precision capability in some future version of DirectX, or until OpenCL becomes the dominant compute API and an integral part of consumer operating systems and applications, and Khronos promotes the cl_khr_fp64 option from an optional extension to an integral part of the standard. Only then will double precision capability become a focus for consumer GPUs, and the current situation might change.
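As an aside, whether a given device advertises double precision at all is easy to check from the OpenCL extensions string; note that this only tells you whether FP64 is exposed, not whether its rate has been capped. A minimal sketch, assuming pyopencl and a working OpenCL driver are installed:

# List each OpenCL device and whether it advertises double precision support.
# cl_khr_fp64 is the optional Khronos extension; cl_amd_fp64 is AMD's older name.
import pyopencl as cl

for platform in cl.get_platforms():
    for dev in platform.get_devices():
        exts = dev.extensions.split()
        fp64 = "cl_khr_fp64" in exts or "cl_amd_fp64" in exts
        print(f"{dev.name.strip():40s} fp64: {'yes' if fp64 else 'no'}")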

It is worth pointing out that this is hardly a new phenomenon for NVIDIA. For as long as there have been Quadro cards, they have used firmware to enable a number of OpenGL hardware acceleration paths only on Quadro cards running Quadro-specific drivers, despite Quadro products being based on silicon identical to their GeForce counterparts.

mesyn191
Joined: 25 Dec 09
Posts: 29
Credit: 138,706,202
RAC: 28
Message 38458 - Posted: 9 Apr 2010, 12:30:19 UTC - in response to Message 38453.  

From what I've been told, there is more than one way to gimp GPGPU performance on these chips (I'm just a layman, so...) while still having a GPU that meets the DX11/10 spec and performs well in games.

I know the long-term trend is to make GPUs more general purpose and all, so I'm not really sure how that would play out. It all seems silly and self-defeating to me, so if you think it's a reasonable way to do product segmentation then I dunno what to say.
Emanuel
Joined: 18 Nov 07
Posts: 280
Credit: 2,442,757
RAC: 0
Message 38459 - Posted: 9 Apr 2010, 12:31:46 UTC - in response to Message 38453.  

For as long as there have been Quadro cards, they have used firmware to enable a number of OpenGL hardware acceleration paths only on Quadro cards running Quadro-specific drivers, despite Quadro products being based on silicon identical to their GeForce counterparts.

Then let's hope they did the same thing here, so we can crossflash our cards into their Tesla equivalents.
avidday
Joined: 8 Apr 10
Posts: 15
Credit: 534,184
RAC: 0
Message 38467 - Posted: 9 Apr 2010, 13:42:13 UTC - in response to Message 38458.  
Last modified: 9 Apr 2010, 13:42:39 UTC

From what I've been told, there is more than one way to gimp GPGPU performance on these chips (I'm just a layman, so...) while still having a GPU that meets the DX11/10 spec and performs well in games.

As someone who has done a lot of disassembly of CUDA, OpenCL and shader language code, I don't see how it could be done. There really isn't anything inside compiled code to say "this is a single precision compute job" or "this is a tessellation call" or "this is a shader fragment". It is all just the same assembler code that runs on the same shaders and uses the same ALUs/FPUs. There isn't anything obvious I can see that would allow you to artificially limit the instruction issue rate of one without affecting the other. Double precision is different, because the double precision FPUs are separate from the single precision ones, and it would be possible to have the MP scheduler artificially limit the rate of double precision instruction issue without affecting the performance of the rendering pipeline.

It all seems silly and self-defeating to me, so if you think it's a reasonable way to do product segmentation then I dunno what to say.

I never said it was reasonable; I was just making the observation that this isn't anything new and shouldn't really have been a surprise to anyone. Both hemispheres of the GPU world have been doing this sort of differentiation with their OpenGL acceleration for as long as they have both been making professional OpenGL cards based on their consumer GPUs.

As it is, NVIDIA are about to deliver (in very, very limited quantities, if the rumours are correct) a new high-end consumer card with about double the usable peak single and double precision performance of its predecessor. Down the road you will be able to buy a professional version which offers slightly lower single precision performance but about four times the double precision performance and more than twice the memory (along with features like ECC), for about 5-6 times the price. Those who truly need the additional double precision will buy the professional card. Those who don't, won't.
John P. Myers
Joined: 6 Mar 09
Posts: 5
Credit: 47,227,802
RAC: 0
Message 38469 - Posted: 9 Apr 2010, 13:49:23 UTC - in response to Message 38459.  

I don't know what the rated Integer speed on the current ATI cards is, but if someone else here does, please post it. The Integer compute speed on a 480 is ~672GIOPS. I do not recommend using any Nvidia 400 series card to crunch MW (unless a hack is found. I'm keeping my fingers crossed). They will only be ~2x faster than a GTX 285. ATI has that beat easily for the same price or less. If you've already ordered a 400 series, I'm pretty certain you could put it on eBay and get your money back and maybe a little more.
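For reference, here is where those numbers come from, assuming the stock shader clocks (1401 MHz for the GTX 480, 1476 MHz for the GTX 285) and the usual per-clock rates (one INT32 op per CUDA core, plus the capped FP64 FMA rates discussed earlier in the thread):

# Where the numbers above come from (assumed stock clocks and per-clock rates).
gtx480_int32 = 480 * 1.401     # one INT32 op per core per clock ~= 672 GIOPS

gtx480_fp64 = 60 * 2 * 1.401   # capped GTX 480: 60 FP64 FMAs/clock ~= 168 GFLOPS
gtx285_fp64 = 30 * 2 * 1.476   # GT200: 30 FP64 FMAs/clock         ~=  89 GFLOPS

print(f"GTX 480 INT32: {gtx480_int32:.0f} GIOPS")
print(f"GTX 480 FP64:  {gtx480_fp64:.0f} GFLOPS")
print(f"GTX 285 FP64:  {gtx285_fp64:.0f} GFLOPS (ratio ~{gtx480_fp64 / gtx285_fp64:.1f}x)")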

Cluster Physik
Joined: 26 Jul 08
Posts: 627
Credit: 94,940,203
RAC: 0
Message 38472 - Posted: 9 Apr 2010, 15:34:51 UTC - in response to Message 38469.  
Last modified: 9 Apr 2010, 15:36:28 UTC

I don't know what the rated Integer speed on the current ATI cards is, but if someone else here does, please post it. The Integer compute speed on a 480 is ~672GIOPS.

It depends on the instruction. For adds/subs, shifts and logical instructions it is 1360 GIOps/s on an HD 5870; 24-bit integer multiplies run at 1088 GIOps/s (it can actually do a multiply-add at the same rate, which would mean 2176 GOps/s), and 32-bit integer multiplies at 272 GIOps/s. Furthermore, you only reach the peak values if the compiler finds enough parallel instructions (32-bit multiplies don't block the execution of other instructions at the same time, for instance). To find the real-world throughput for a certain algorithm, the easiest thing is to look at the average filling of the five slots of each execution unit.

Since we already used Collatz@home as an example: the average slot utilization in the innermost loop is about 58% for HD 5000 cards. As 100% would equal 1360 giga-instructions/s, one arrives at roughly 785 giga-instructions/s. That is already above the theoretical throughput of a GTX 480, even if the Nvidia GPU reached 100% efficiency, so I somewhat doubt it will beat the crap out of an HD 5870 there. To tell the truth, the efficiency of the cache and memory system also affects Collatz performance. Nevertheless, I fully expect a 20% to 30% advantage for an HD 5870 over a GTX 480 there, which is actually quite an improvement compared to a GTX 285 (which has less than 30% of an HD 5870's speed; I expect a GTX 480 to roughly triple that).
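Those figures follow directly from the HD 5870's VLIW5 layout (320 five-slot execution units, i.e. 1600 ALUs, at 850 MHz); a quick sketch of the same arithmetic:

# Reproduce the HD 5870 integer figures from its VLIW5 layout:
# 320 execution units x 5 slots = 1600 ALUs at 850 MHz.
units, clock_ghz = 320, 0.850

adds  = units * 5 * clock_ghz     # adds/shifts/logic use all 5 slots: 1360 GIOps/s
mul24 = units * 4 * clock_ghz     # 24-bit multiplies use the 4 thin slots: 1088 GIOps/s
mul32 = units * 1 * clock_ghz     # 32-bit multiplies only run on the t-slot: 272 GIOps/s

collatz = adds * 0.58             # ~58% average slot utilization -> roughly 789 Ginstr/s

print(adds, mul24, mul32, round(collatz))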
The Gas Giant
Joined: 24 Dec 07
Posts: 1947
Credit: 240,884,648
RAC: 0
Message 38489 - Posted: 9 Apr 2010, 19:18:54 UTC - in response to Message 38469.  

I don't know what the rated Integer speed on the current ATI cards is, but if someone else here does, please post it. The Integer compute speed on a 480 is ~672GIOPS. I do not recommend using any Nvidia 400 series card to crunch MW (unless a hack is found. I'm keeping my fingers crossed). They will only be ~2x faster than a GTX 285. ATI has that beat easily for the same price or less. If you've already ordered a 400 series, I'm pretty certain you could put it on eBay and get your money back and maybe a little more.


Wow, you really do have a problem with Nvidia. The cards just won't work all that well here at MW, but they will at projects that only require SP. You don't need to keep telling us not to buy these cards...
John P. Myers
Joined: 6 Mar 09
Posts: 5
Credit: 47,227,802
RAC: 0
Message 38496 - Posted: 9 Apr 2010, 20:51:24 UTC - in response to Message 38489.  

lol Maybe I've gone a little overboard, but since Nvidia won't publicly announce the gimpage, it's only fair that everyone finds out one way or another. As a fellow cruncher, I would appreciate advice on what upgrades to stay away from.

Personally I've always used Nvidia GPUs. I'm not an ATI fanboi (but I may be soon lol). It's low-down and unfair to people like us for Nvidia to make these attempts at hiding it. Frankly, it pisses me off.

I just don't want to see fellow crunchers spending their money on what they think is a great product for this project. That is all :)
The Gas Giant
Joined: 24 Dec 07
Posts: 1947
Credit: 240,884,648
RAC: 0
Message 38508 - Posted: 9 Apr 2010, 23:25:58 UTC

Nvidia do pull a lot of marketing 'shifties', and I agree they should be outed so that we can make up our own minds about what to buy. Sadly, though, it will most likely take someone buying one, trying it here, and owning up to it before we get a real performance indication.