A Ferrari 458 Italia rendered with GPU-based ray tracing
GPU ray tracing isn't quite ready for real-time gaming, however. GF100 ran this demo at 0.6 fps versus 0.2 fps for GT200.
NVIDIA’s got a big problem. Literally.
Their next-generation DirectX 11 graphics chip, internally codenamed GF100 (Graphics Fermi, “100” denotes the high-end part of the GF family of GPUs) missed Q4, the busiest time of the year, and it’s looking like it’s going to miss most of Q1 also.
The GPU is the most ambitious design NVIDIA’s ever conceived, containing over 3 billion transistors, making it the most complex chip on the planet. To provide some context, NVIDIA’s current flagship, the GT200b GPU found inside GeForce GTX 285 features 1.4 billion transistors, while ATI’s RV870 GPU used in the Radeon HD 5870 weighs in at 2.15 billion transistors. How does this compare to CPUs? Intel’s Core i7-965, which is universally considered to be the most powerful CPU money can buy, contains approximately 731 million transistors.
NVIDIA Supersonic sled demo
The supersonic sled demo uses tessellation, DX11, and PhysX effects
As you can imagine, designing a 3 billion transistor GPU isn’t easy. NVIDIA’s likely poured hundreds of millions into R&D on GF100. Part of the reason the chip is late and so large is that GF100 is dramatically different from previous GeForce GPUs. GT200 and RV870 are fundamentally based on preceding graphics designs, tweaked to support new features and scaled up to incorporate more shaders, with refinements here and there. GF100, by contrast, is a completely new architecture. NVIDIA’s redesigned or tweaked practically everything on the chip, with an emphasis on GPU compute and geometric realism.
As far as NVIDIA’s concerned, these two areas are key to ushering in the next generation of gaming.
NVIDIA envisions a future where games begin to incorporate hybrid rendering, where the strengths of different methods are combined to produce the final result. For instance, a game would use DirectX 11 to render the basic scene, then use the GPU’s compute engine for gaming applications. You could selectively incorporate ray tracing for effects like shadowing or reflections, while DirectCompute could be used to add depth of field (DoF) effects instead of using the pixel shaders (good DoF techniques don’t always perform well with pixel shaders). And of course, there’s also PhysX.
To improve geometry processing performance, NVIDIA has incorporated 16 tessellation engines and four raster engines. The graphics pipeline itself has been changed: instead of handling geometry processing at the front of the pipeline, where it’s traditionally done, it’s been incorporated directly into the shading clusters themselves, the streaming multiprocessors (SMs; ATI refers to its equivalent as SIMD units).
Each SM has its own dedicated hardware for tessellation and other geometry processing. As a result, GF100 can perform tessellation and other geometry processing in parallel, enabling breakthrough levels of geometry performance.
But it doesn’t stop there. As NVIDIA revealed at its GPU Technology Conference last year, GF100 features 512 unified shaders, more than twice the 240 found in GT200. NVIDIA has incorporated a 384-bit memory interface, which is narrower than GT200’s 512-bit interface, but thanks to the use of GDDR5 memory, GF100 should end up offering more memory bandwidth than its predecessor.
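To see why a narrower bus can still deliver more bandwidth, here's a back-of-the-envelope sketch. Peak theoretical bandwidth is simply the bus width in bytes multiplied by the effective data rate. The GTX 285 data rate used below (roughly 2.484 GT/s for its GDDR3) is our assumption rather than a figure from NVIDIA's briefing, and the 4.0 GT/s GF100 figure is purely hypothetical, since NVIDIA hasn't disclosed memory clocks.

```python
def memory_bandwidth_gb_s(bus_width_bits: int, data_rate_gt_s: float) -> float:
    """Peak theoretical bandwidth in GB/s: bytes per transfer times transfers per second."""
    return bus_width_bits / 8 * data_rate_gt_s

# GeForce GTX 285 (GT200b): 512-bit GDDR3, ~2.484 GT/s effective (assumed figure).
gtx285 = memory_bandwidth_gb_s(512, 2.484)

# Effective data rate a 384-bit bus would need just to match that.
break_even = gtx285 / (384 / 8)

# Hypothetical GF100 configuration: 384-bit GDDR5 at 4.0 GT/s (illustrative only).
gf100_guess = memory_bandwidth_gb_s(384, 4.0)

print(f"GTX 285:            {gtx285:.1f} GB/s")
print(f"384-bit break-even: {break_even:.2f} GT/s")
print(f"GF100 @ 4.0 GT/s:   {gf100_guess:.1f} GB/s")
```

Under these assumptions, any GDDR5 data rate above roughly 3.3 GT/s would put a 384-bit GF100 ahead of the GTX 285, which is why the narrower interface isn't necessarily a step backward.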
We’ll be discussing all this and more in further detail on the following pages. Unfortunately, we aren’t going to be able to discuss clock speeds or performance today; NVIDIA’s not quite ready to commit to that level of detail just yet. But we can tell you all about the architecture behind GF100. Let’s get started…
An excerpt from NVIDIA’s Supersonic Sled demo. Source: NVIDIA