The RV670 GPU
At the heart of the Radeon HD 3850 and 3870 lies AMD’s brand-new RV670 GPU. RV670 is largely based on the R600 GPU used in today’s Radeon HD 2900 XT and, like R600, features 320 stream processors. AMD has updated the design to include UVD, which provides full HD hardware decoding of H.264 and VC-1 video, and has added support for DirectX 10.1 and PCI Express 2.0.
The following table summarizes the features found in the Radeon HD 3850 and Radeon HD 3870, and how they compare to AMD’s Radeon HD 2900 XT:
AMD Radeon HD 3800 Series vs. Radeon HD 2900 XT
| | Radeon HD 3870 | Radeon HD 3850 | Radeon HD 2900 XT |
|---|---|---|---|
| # of Transistors | 666M | 666M | 700M |
| Manufacturing Process | 55-nm | 55-nm | 80-nm |
| # of Stream Processors | 320 | 320 | 320 |
| Texture Units | 16 | 16 | 16 |
| Render Back-ends (ROPs) | 16 | 16 | 16 |
| Core Clock Speed | 777MHz | 670MHz | 740MHz |
| Memory Clock Speed (Effective) | 2.25GHz | 1.80GHz | 1.65GHz |
| Memory Interface | 256-bit | 256-bit | 512-bit |
| Memory Bandwidth | 72.0GB/sec | 57.6GB/sec | 105.6GB/sec |
| Multiply-Add Math Processing Rate | 497 GigaFLOPS | 428 GigaFLOPS | 475 GigaFLOPS |
| System Bus Support | PCI Express 2.0 | PCI Express 2.0 | PCI Express 1.0/1.1 |
| DirectX Support | 10.1 | 10.1 | 10 |
| Tessellation Unit | Yes | Yes | Yes |
| UVD Support | Yes | Yes | No |
| PowerPlay Support | Yes | Yes | No |
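The bandwidth and math-rate rows follow directly from the other rows. A minimal sketch, assuming peak bandwidth is the bus width in bytes times the effective memory clock, and that each stream processor completes one multiply-add (counted as two FLOPs) per core clock:

```python
# Spec-sheet arithmetic for the table above.
# Assumption: one multiply-add (2 FLOPs) per stream processor per core clock.

def memory_bandwidth_gbs(bus_width_bits: int, effective_clock_ghz: float) -> float:
    """Peak memory bandwidth in GB/sec: bytes per transfer x billions of transfers per second."""
    return bus_width_bits / 8 * effective_clock_ghz

def madd_gflops(stream_processors: int, core_clock_mhz: float) -> float:
    """Peak multiply-add rate in GigaFLOPS: SPs x 2 FLOPs x clock."""
    return stream_processors * 2 * core_clock_mhz / 1000

cards = {
    # name: (bus width in bits, effective memory clock in GHz, stream processors, core clock in MHz)
    "Radeon HD 3870": (256, 2.25, 320, 777),
    "Radeon HD 3850": (256, 1.80, 320, 670),
    "Radeon HD 2900 XT": (512, 1.65, 320, 740),
}

for name, (bus, mem_ghz, sps, core_mhz) in cards.items():
    print(f"{name}: {memory_bandwidth_gbs(bus, mem_ghz):.1f} GB/sec, "
          f"{madd_gflops(sps, core_mhz):.1f} GigaFLOPS")
```

This reproduces the bandwidth row exactly. The math-rate results (497.3, 428.8 and 473.6 GigaFLOPS) match the table to within rounding; the 2900 XT's listed 475 GigaFLOPS corresponds to a core clock of about 742MHz rather than 740MHz.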
55-nm manufacturing process
As you can see in the chart above, the RV670 chip used in the Radeon HD 3870 and 3850 is built on TSMC’s 55-nm manufacturing process. Moving to a smaller process allows AMD to integrate all of the key features found in R600, including all 320 stream processors, while still manufacturing the chip affordably. To cut costs further, AMD trimmed unneeded transistors from the R600 design, and the narrower 256-bit memory interface (down from R600’s 512-bit) lowers RV670’s manufacturing cost as well.
![55-nm process](images/07-s.jpg) 55-nm process
As you can see in the image above, the 55-nm process gives RV670 a much smaller die: 192 square millimeters, compared with 408 square millimeters for R600. This lets AMD get more than twice as many RV670 chips from a single silicon wafer as it could R600 chips, assuming equal yields.
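To put the die-size difference in perspective, here is a rough gross-dies-per-wafer estimate using a common first-order approximation, DPW ≈ π(d/2)²/A − πd/√(2A), where d is the wafer diameter and A the die area. The 300mm wafer diameter is our assumption, and the formula ignores defects, scribe lines and die aspect ratio, so treat the results as ballpark figures only:

```python
import math

# First-order estimate of gross (candidate) dies per wafer.
# Assumes a 300mm wafer; ignores defect density, scribe lines and die aspect ratio.
def gross_dies_per_wafer(die_area_mm2: float, wafer_diameter_mm: float = 300.0) -> int:
    wafer_area = math.pi * (wafer_diameter_mm / 2) ** 2
    # Approximate number of partial dies lost around the wafer's circumference.
    edge_loss = math.pi * wafer_diameter_mm / math.sqrt(2 * die_area_mm2)
    return int(wafer_area / die_area_mm2 - edge_loss)

rv670 = gross_dies_per_wafer(192)  # RV670: 192 mm^2
r600 = gross_dies_per_wafer(408)   # R600: 408 mm^2
print(rv670, r600, round(rv670 / r600, 2))  # prints: 320 140 2.29
```

Under these assumptions the ratio comes out a little above the 2.1x implied by area alone, because smaller dies also waste less of the wafer's edge.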
Another benefit of the smaller process is reduced power consumption: an RV670-based card needs only a single 6-pin PCIe power connector. Lower power consumption also means less heat, which lets AMD cool the Radeon HD 3850 with just a single-slot heatsink/fan unit.
DirectX 10.1
AMD’s Radeon HD 3870 and 3850 are the first GPUs on the market to support DirectX 10.1, which will make its debut with the first service pack for Windows Vista sometime next year. DirectX 10.1 is an update to the original DirectX 10 spec released earlier this year with Windows Vista. The most notable addition to DirectX 10.1 is arguably support for real-time global illumination, which should allow developers to provide better lighting and shadows in their games.
![DX10.1 vs DX10](images/08-s.jpg) DX10.1 vs DX10
![ATI Ping Pong DX10.1 Demo](images/09-s.jpg) ATI Ping Pong DX10.1 Demo
DirectX 10.1 also defines standard AA sample patterns that every DX10.1-compliant card must support. This guarantees a consistent minimum level of AA image quality across DX10.1 cards regardless of manufacturer, while GPU vendors remain free to offer their own custom sample patterns for even better AA quality.
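Why a fixed pattern yields consistent results: an edge pixel's final color depends on how many of its sample positions fall inside the polygon. The sketch below counts covered samples for a straight edge. The positions are the 4x standard pattern documented for Direct3D 11, which we assume matches DX10.1's; any card using the same positions produces the same coverage, and therefore the same edge shading, for the same geometry.

```python
from fractions import Fraction

# 4x standard sample positions, as offsets from the pixel center in 1/16-pixel units.
# Taken from Direct3D 11's documented standard pattern; assumed to match DX10.1's.
STANDARD_4X = [(-2, -6), (6, -2), (-6, 2), (2, 6)]

def edge_coverage(pattern, a, b, c):
    """Fraction of samples inside the half-plane a*x + b*y + c >= 0 (x, y in pixels)."""
    inside = sum(1 for sx, sy in pattern if a * sx / 16 + b * sy / 16 + c >= 0)
    return Fraction(inside, len(pattern))

# A vertical edge through the pixel center; everything to its right is covered.
print(edge_coverage(STANDARD_4X, 1, 0, 0))      # 1/2
# The same edge shifted a quarter pixel to the right.
print(edge_coverage(STANDARD_4X, 1, 0, -0.25))  # 1/4
```

A card using a different custom pattern could report different fractions for the same edges, which is exactly the variability a mandatory standard pattern removes.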
PCIe 2.0
RV670 fully supports PCIe 2.0. PCIe 2.0 offers double the bandwidth of PCIe 1.1: a x16 slot provides 8.0GB/sec in each direction, for a total of 16GB/sec of aggregate bus bandwidth.
PowerPlay and CrossFire X
In addition to moving to a smaller manufacturing process, AMD has added a new embedded power state controller to further reduce RV670’s power draw. This controller monitors the GPU’s command buffer to determine how heavily the GPU is being used. If the GPU is only partially loaded, the controller can power down the parts of the chip that aren’t in use. According to AMD, “engine and memory clocks, voltages, clock gating and other parameters can be altered” as needed.
This goes well beyond AMD’s previous power-management solutions, which adjusted these parameters based on which applications were currently running.
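The idea of load-driven power states can be illustrated with a toy model. This is not AMD’s algorithm: the state names, the thresholds and the representation of command-buffer activity as a list of busy/idle samples are all hypothetical. It simply maps recent utilization to a power state, the way such a controller might decide how far to scale back clocks and voltages.

```python
from dataclasses import dataclass

# HYPOTHETICAL toy model: state names and thresholds are invented for illustration.
# AMD has not published how RV670's power state controller makes its decisions.
@dataclass(frozen=True)
class PowerState:
    name: str
    min_utilization: float  # use this state when utilization is at or above this value

# Ordered from highest to lowest power.
STATES = [
    PowerState("full", 0.70),
    PowerState("intermediate", 0.20),
    PowerState("low", 0.0),
]

def select_state(busy_samples: list[bool]) -> PowerState:
    """Pick a state from recent command-buffer samples (True = work was pending)."""
    utilization = sum(busy_samples) / len(busy_samples) if busy_samples else 0.0
    for state in STATES:
        if utilization >= state.min_utilization:
            return state
    return STATES[-1]

print(select_state([True] * 9 + [False]).name)        # full
print(select_state([True, False, False, False]).name)  # intermediate
```

A real controller would likely also need hysteresis, so it does not flip between states when utilization hovers near a threshold.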
RV670 also supports four-way CrossFire, which AMD calls CrossFire X. With it, up to four RV670 cards can drive up to eight monitors (AMD has posted a video of such a setup running on YouTube), or the cards can be combined to deliver performance improvements well beyond 2X. AMD will even let users overclock all four cards independently.
![CrossFire X](images/10-s.jpg) CrossFire X
![CrossFire supports independent overclocking](images/11-s.jpg) CrossFire supports independent overclocking
AMD will be providing a beta CrossFire X driver sometime next month, and we’ll see an official WHQL release by the end of January 2008.
How does RV670 stack up against G92?
The G92 GPU inside NVIDIA’s GeForce 8800 GT will be going head-to-head with the RV670 chip inside Radeon HD 3870/3850. As such, we’re sure many of you are curious to see how the two GPUs stack up against one another, as well as their predecessors. The following chart summarizes things nicely:
Mainstream GPU Comparison
| | Radeon HD 3870 | GeForce 8800 GT | Radeon HD 3850 | GeForce 8600 GTS | Radeon HD 2600 XT |
|---|---|---|---|---|---|
| Core Clock Speed | 777MHz | 600MHz | 670MHz | 675MHz | 800MHz |
| Stream Processor Clock Speed | 777MHz | 1.5GHz | 670MHz | 1.45GHz | 800MHz |
| # of Stream Processors | 320 | 112 | 320 | 32 | 120 |
| Memory Clock (Effective) | 2.25GHz | 1.8GHz | 1.8GHz | 2.0GHz | 2.2GHz |
| Memory Interface | 256-bit | 256-bit | 256-bit | 128-bit | 128-bit |
| Texture Fill-Rate (Gigatexels/sec) | 12.4 | 33.6 | 10.7 | 10.8 | 6.4 |
| Memory Bandwidth | 72.0GB/sec | 57.6GB/sec | 57.6GB/sec | 32.0GB/sec | 35.2GB/sec |
| Memory Size | 512MB GDDR4 | 512MB GDDR3 | 256MB GDDR3 | 256MB GDDR3 | 256MB GDDR4 |
| Board Power | 105W | 110W | 95W | 71W | 74W |
| MSRP | $220 | $250+ | $179 | $150 | $150 |
Keep in mind that paper specs can be deceiving; we saw this most recently with R600. With 320 stream processors, AMD clearly has more shading horsepower than NVIDIA on paper, but while the GeForce 8800 GT has fewer shaders, they run significantly faster, at 1.5GHz. The 8800 GT also has considerably more texturing horsepower than AMD’s cards. On paper NVIDIA is at a memory bandwidth disadvantage against the Radeon HD 3870, but the GDDR4 memory on the 3870 runs at higher latencies than the GDDR3 used on the 8800 GT.
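Factoring in clock speeds shows how much the shader-count gap narrows. A minimal sketch, assuming one multiply-add (two FLOPs) per stream processor per shader clock on both architectures, and a texture unit count for the 8800 GT of 56, which we infer from the table (33.6 Gigatexels/sec at 600MHz):

```python
# Peak paper specs derived from the clocks in the comparison table.
# Assumption: one multiply-add (2 FLOPs) per stream processor per shader clock.
def madd_gflops(stream_processors: int, shader_clock_mhz: float) -> float:
    return stream_processors * 2 * shader_clock_mhz / 1000

def texel_rate_gtexels(texture_units: int, core_clock_mhz: float) -> float:
    return texture_units * core_clock_mhz / 1000

hd3870 = madd_gflops(320, 777)     # 320 SPs at the 777MHz core clock
gf8800gt = madd_gflops(112, 1500)  # 112 SPs at the 1.5GHz shader clock
print(f"HD 3870: {hd3870:.0f} GFLOPS, 8800 GT: {gf8800gt:.0f} GFLOPS, "
      f"ratio {hd3870 / gf8800gt:.2f}")

# 16 texture units on RV670; 56 on G92 (inferred from 33.6 Gtexels/sec / 0.6GHz).
print(texel_rate_gtexels(16, 777), texel_rate_gtexels(56, 600))
```

With nearly three times as many stream processors, the Radeon HD 3870 ends up with only about 1.5 times the 8800 GT’s peak multiply-add rate, while the 8800 GT’s texel rate is roughly 2.7 times higher.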