GF104 - A New Class of Fermi
As previously mentioned, the GeForce GTX 460 isn’t simply a neutered GTX 480. Codenamed GF104, the GPU inside it was designed to deliver as much performance as possible from a smaller chip. To accomplish this, NVIDIA made a number of changes to the GPU’s Streaming Multiprocessors (SMs, for short).
To start, NVIDIA increased the number of CUDA cores inside each SM. While the GF100 chip used in the GTX 480/470/465 has 32 CUDA cores per SM, the GF104 has 48, a 50% increase.
With more CUDA cores per SM, NVIDIA needed to keep the additional cores fed with instructions. To accomplish this, they doubled the number of dispatch units from two in GF100 to four in GF104. As a result, two instructions can be dispatched per warp each clock, for a grand total of four instructions per clock per SM.
Finally, the special function units (SFUs) and texture units were each doubled, from 4 per SM in GF100 to 8 per SM in GF104.
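To put the per-SM changes side by side, here is a minimal Python sketch that computes how much each resource grew from GF100 to GF104. The unit counts come straight from the paragraphs above; the dictionary keys and the `sm_scaling` helper are illustrative names, not anything from NVIDIA.

```python
# Per-SM unit counts as stated in the text above.
GF100_SM = {"cuda_cores": 32, "dispatch_units": 2, "sfus": 4, "texture_units": 4}
GF104_SM = {"cuda_cores": 48, "dispatch_units": 4, "sfus": 8, "texture_units": 8}


def sm_scaling(old, new):
    """Return the new/old ratio for each per-SM resource."""
    return {name: new[name] / old[name] for name in old}


if __name__ == "__main__":
    for name, ratio in sm_scaling(GF100_SM, GF104_SM).items():
        print(f"{name}: {ratio:.2f}x")
```

The ratios show that dispatch units, SFUs and texture units all doubled while the CUDA core count grew by half, so each core in a GF104 SM has proportionally more dispatch capacity behind it than in GF100.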
Let’s get down to brass tacks. I’m sure you’re wondering about the particulars and how the GTX 460 stacks up to other graphics cards. Below is a chart comparing it to its direct competitor – the Radeon HD 5830 – as well as its bigger, badder brother – the GeForce GTX 480.
GeForce GTX 460 Specifications Comparison

| | Radeon HD 5830 | GeForce GTX 460 768MB | GeForce GTX 460 1GB | GeForce GTX 480 |
|---|---|---|---|---|
| Graphics Processing Clusters | - | 2 | 2 | 4 |
| Graphics Core Clock | 800 MHz | 675 MHz | 675 MHz | 700 MHz |
| Stream Processor Clock | 800 MHz | 1,350 MHz | 1,350 MHz | 1,400 MHz |
| Memory Clock | 1,000 MHz | 900 MHz | 900 MHz | 924 MHz |
| Effective Memory Data Rate | 4,000 MHz | 3,600 MHz | 3,600 MHz | 3,696 MHz |
| Video Memory Size | 1,024MB GDDR5 | 768MB GDDR5 | 1,024MB GDDR5 | 1,536MB GDDR5 |
| Memory Bandwidth | 128 GB/sec | 86.4 GB/sec | 115.2 GB/sec | 177.4 GB/sec |
| Texture Fill-rate | 44.8 Gigatexels/sec | 37.8 Gigatexels/sec | 37.8 Gigatexels/sec | 42 Gigatexels/sec |
| Max Board Power | 175 W | 150 W | 160 W | 250 W |
As you can see, the major difference between the two GTX 460 reference SKUs is the memory. Both feature high-quality VRAM running at 900MHz (3.6GHz effective), but the larger capacity and wider memory bus give the 1GB version a significant bandwidth advantage. Two smaller differences are power consumption and ROP count. It is also worth noting that the 1GB version has 512KB of L2 cache versus the 768MB board’s 384KB.
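The bus widths themselves are not listed in the chart, but they can be inferred from the bandwidth and effective data rate figures. The Python sketch below does that back-calculation; it assumes 1 GB/sec means 10^9 bytes per second and that bandwidth is simply data rate times bus width, and the function name is only illustrative.

```python
def implied_bus_width_bits(bandwidth_gb_s, data_rate_mhz):
    """Bus width in bits implied by bandwidth (GB/sec) and effective data rate (MHz).

    Assumes bandwidth = data rate x bus width, with 1 GB = 10**9 bytes.
    """
    bytes_per_transfer = bandwidth_gb_s * 1e9 / (data_rate_mhz * 1e6)
    return round(bytes_per_transfer * 8)


# (memory bandwidth in GB/sec, effective data rate in MHz), taken from the chart.
CARDS = {
    "Radeon HD 5830": (128.0, 4000),
    "GeForce GTX 460 768MB": (86.4, 3600),
    "GeForce GTX 460 1GB": (115.2, 3600),
    "GeForce GTX 480": (177.4, 3696),
}

if __name__ == "__main__":
    for card, (bandwidth, data_rate) in CARDS.items():
        print(f"{card}: {implied_bus_width_bits(bandwidth, data_rate)}-bit")
```

Under those assumptions the two GTX 460 boards work out to 192-bit and 256-bit buses respectively, which accounts for the entire bandwidth gap since both run their memory at the same speed.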
Tessellation engines (PolyMorph engines, in NVIDIA’s terminology) aren’t brand new to graphics architectures, but they have taken center stage with the debut of DirectX 11. (Although ATI has offered a dedicated tessellation unit in GPUs dating back to the 2900 XT, it was never used in games.) Their primary job is to accelerate geometry processing for tessellation, a feature NVIDIA is banking on really taking off in this generation of games.
Comparing raw figures like memory bandwidth and texture fill-rate, it would appear the Radeon HD 5830 has the advantage. However, we’ve found that traditional metrics like these aren’t always a good indicator of actual gaming performance.
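For a sense of how large that on-paper advantage is, this short Python sketch expresses the Radeon’s lead as a percentage, using only the chart values. The `percent_advantage` helper is a made-up name for illustration, and the result says nothing about real-world frame rates.

```python
def percent_advantage(a, b):
    """How much larger a is than b, as a percentage of b."""
    return (a / b - 1.0) * 100.0


# Chart values: memory bandwidth in GB/sec and texture fill-rate in Gigatexels/sec.
HD5830_BANDWIDTH, HD5830_FILLRATE = 128.0, 44.8
GTX460_768_BANDWIDTH, GTX460_1GB_BANDWIDTH = 86.4, 115.2
GTX460_FILLRATE = 37.8

if __name__ == "__main__":
    print(f"Bandwidth vs 768MB: {percent_advantage(HD5830_BANDWIDTH, GTX460_768_BANDWIDTH):.1f}%")
    print(f"Bandwidth vs 1GB:   {percent_advantage(HD5830_BANDWIDTH, GTX460_1GB_BANDWIDTH):.1f}%")
    print(f"Fill-rate vs both:  {percent_advantage(HD5830_FILLRATE, GTX460_FILLRATE):.1f}%")
```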
Many of you are no doubt pleased to see that the GTX 460’s power consumption is significantly lower than that of the GTX 480. A reduction of 36 to 40% in maximum board power, depending on the model, is a welcome improvement indeed, considering the 480’s operating temperatures peak in excess of 90 degrees C. We’ll go into more detail about that on the next page.
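The power savings can be checked against the Max Board Power row of the chart. The Python sketch below is a simple percentage calculation on those rated figures; the helper name is illustrative, and rated board power is not the same thing as measured draw.

```python
def power_reduction_percent(baseline_w, new_w):
    """Percentage drop in rated board power relative to the baseline card."""
    return (baseline_w - new_w) / baseline_w * 100.0


GTX480_W = 250       # Max Board Power from the chart
GTX460_768MB_W = 150
GTX460_1GB_W = 160

if __name__ == "__main__":
    for name, watts in (("GTX 460 768MB", GTX460_768MB_W), ("GTX 460 1GB", GTX460_1GB_W)):
        print(f"{name}: {power_reduction_percent(GTX480_W, watts):.1f}% below the GTX 480")
```

On rated figures, the 768MB board comes in 40% below the GTX 480 and the 1GB board 36% below.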