Summary: Armed with new 8xMSAA and custom filter AA benchmarks, we set out to test the Radeon HD 4850 and 4870 against NVIDIA's latest GeForce GTX and 9800 GPUs. How do the new ATI cards stack up to NVIDIA? You'll be very impressed!
If you’re skeptical of this last statement, just look at what happened last week as proof. As a result of the Radeon HD 4850’s tremendous price/performance ratio, NVIDIA was forced to slash prices on their entire family of GeForce 8/9 graphics cards. The GeForce 9800 GTX went from being a $300 card on Thursday, to selling for $199.99 on Friday! NVIDIA also cooked up a brand new GeForce 9800 GTX+ SKU to take on the new Radeons that will arrive in July. This is all wonderful news if you’re a gamer who is in the market looking to upgrade: building a powerful rig for gaming just got much cheaper as a result.
So how did we get here? Let’s rewind a bit, shall we?
After experiencing repeated delays with each next-generation architecture (first R520, then R600), ATI began to realize that designing the massive, cutting-edge GPUs needed to serve the high-end market was becoming less practical. They concluded that they were reaching a point of diminishing returns, where the resources being poured into these high-end GPUs weren’t being fully realized: by the time software came out that really exploited the capabilities of the hardware, the GPU was already outdated. In addition, there were cases where functional units inside these pricey GPUs had to be disabled and/or clock speeds reduced in order to bring lower-priced SKUs to market at specific price points. Examples include the Radeon X1800 XL and the Radeon HD 2900 Pro/GT, which were based on ATI’s R520 and R600 GPUs respectively.
Rather than devote the hundreds of millions in R&D required to bring another large GPU to market, they decided to take the opposite approach for RV770; they would start with a much smaller, more cost effective midrange GPU design, and scale it up and down to meet the needs of different markets. Their belief was that a less complicated GPU design could be brought to market faster than a large GPU, and at price points that a wider group of people actually want. ATI’s engineers were given very specific transistor and die size budgets to shoot for, while at the same time they were still tasked to achieve certain levels of performance: at least double the performance of R600 was the goal.
But did they accomplish this goal? If you saw the Radeon HD 4850 benchmarks last week, you already know the answer is a resounding YES! In this article, we’re going to take a quick look at the new architecture, and then a deep dive into some benchmarks. We’re not only going to examine 4xAA performance (one of R600’s weak points), but also 8xMSAA and ATI’s custom filter AA modes as well.
ATI lovingly refers to the architecture behind RV770 as its Terascale graphics engine. The moniker is an obvious nod to its distinction as the first desktop GPU to break the 1 teraflop mark (the Radeon HD 4870 actually boasts 1.2 teraflops), and as you can see, the new GPU carries some impressive specs:
Unified Superscalar Shader Architecture
Microsoft® DirectX® 10.1 support
OpenGL 2.0 support
Dynamic Geometry Acceleration
Texture filtering features
ATI Avivo™ HD Video and Display Platform
Two independent display controllers
Two integrated DVI display outputs
Integrated AMD Xilleon™ HDTV encoder
ATI CrossFireX™ Multi-GPU Technology
956 million transistors on 55nm fabrication process
PCI Express 2.0 x16 bus interface
256-bit GDDR3/GDDR5 memory interface
The number that jumps out at you is obviously the increase in stream processors – up from 320 in R600 to 800 in RV770! This increase (along with several others) bumps the transistor count from 666 million in RV670 to 956 million in RV770. Illustrating RV770’s efficiency, ATI was able to pull this off with a die that’s only about 30% larger than RV670’s, despite the fact that both GPUs are made on the same 55nm manufacturing process. All the key features found in RV670 remain, such as DirectX 10.1 support, the tessellation unit, and PCI Express 2.0. ATI has also added a tweaked unified video decoding engine with new capabilities, along with a new microcontroller that constantly monitors thermal conditions and the activity of various blocks within the GPU. The microcontroller controls clock gating, clock speeds, and voltages to ensure the GPU is running at peak power efficiency.
800 stream processors and new texture units
256-bit memory interface
ATI has also incorporated a new memory interface into RV770, ditching the ring-bus architecture in favor of a distributed controller design, with four 64-bit memory controllers spread across the edge of the die (256-bit total), directly adjacent to the ROPs. Each memory controller has its own L2 cache, and the controllers are linked to a central hub that handles duties such as inter-chip communication, PCI Express, the display controllers, and the CrossFire interconnect. The memory controllers support GDDR3 memory as well as GDDR5, which is the memory type used on the Radeon HD 4870 and ATI’s upcoming dual-GPU Radeon HD 4870 X2.
We introduced you to the Radeon HD 4850 last week, but we’ll go over the 4850 again before proceeding to the 4870. The Radeon HD 4850 and 4870 cards we received come from VisionTek. The cards are carbon copies of ATI’s reference designs for the 4850 and 4870 GPUs; in fact, all board partners are sticking to the reference design for their first-generation boards. A few manufacturers have announced enhanced Radeon 4850 cards with features such as dual-slot cooling and 1GB of memory, but none of those boards have made it to market yet, and they won’t hit retail shelves until sometime next month. With that out of the way, let’s take a closer look at the cards…
VisionTek Radeon HD 4850
The Radeon HD 4850 is a single-slot graphics card with a single 6-pin PCIe power connector and copper cooling. According to ATI, peak power draw of the board is just 110W.
As we mentioned last week, the PCB of the 4850 board gets quite hot under load; even at idle it gets pretty toasty. Unfortunately, ATI does not employ heatpipes on this board, relying instead on a simple copper heatsink reminiscent of the cooler previously used on the Radeon HD 3850 (although it’s not the same cooler).
Supplying the heatsink with cool air is a variable-speed fan that appears to spin up based on usage rather than temperature; at the very least, it remains at the same speed at idle regardless of the actual GPU temperature. A lot of end users have complained about this, as idle temps of 70 degrees (or more) have been spotted with the fan barely breaking a sweat to compensate. Since the fan doesn’t spin up, temps progressively climb in PC cases that don’t have adequate ventilation. And since ATI doesn’t provide a way to adjust fan RPMs manually via a slider, you’ll probably want to download a third-party app like RivaTuner (once it supports RV770) to set the fan speed to something you’re comfortable with. It’s also possible that ATI could address this issue at some point with a future driver update that adjusts RPMs more aggressively.
We think the ideal solution would be for ATI to incorporate a larger fan, much like many of NVIDIA’s board partners did last year when the same problem surfaced on the GeForce 8800 GT. That single-slot board also quickly developed a reputation for running hot until a larger fan was integrated onto the card. Spinning at the same RPMs as the previous fan, this new design dramatically reduced temps without generating a lot of noise, effectively addressing one of the 8800 GT’s few shortcomings.
VisionTek Radeon HD 4870
With its higher clock speeds and power draw (up to 160W), ATI has come up with a dual-slot cooler for the Radeon HD 4870, and the card requires two 6-pin PCIe connectors for power. At 9.7”, the Radeon HD 4870 also measures slightly longer than the Radeon HD 4850, but it’s still nowhere near as long as a GeForce 9800 GTX or GTX 260/280.
For cooling, ATI has developed a system that pairs two copper heatpipes with an aluminum heatsink. A red metal plate is also used to help dissipate heat off the top of the PCB.
ATI borrows the same fan originally used on the 3870 X2 for the Radeon HD 4870. It’s a variable speed fan that runs near silently at idle, and even under load is still under 50dB. Like other dual-slot cards, the fan exhausts hot air from the GPU outside your case, helping to keep the inside of your system cool. Unfortunately though, like the Radeon HD 4850, the 4870’s PCB gets quite hot under use.
Officially Radeon HD 4870 cards will go on sale beginning today, but we’ve been told that supplies of GDDR5 memory have been holding things up and that you won’t see cards en masse until next month. So if you do happen to see a 4870 card in stock today and you want to buy it, don’t wait as retailers may have a hard time keeping boards on hand.
We should also note that while both VisionTek cards carry Mass Effect PC logos and branding (both on the box and on the cards themselves), neither of our boards actually shipped with a copy of the game itself.
SIDEBAR: The codename for the Radeon HD 4850 card is Makedon, while the 4870 board was codenamed Trojan.
Normally when AA is applied, a box filter is used to resolve the samples. With newer ATI drivers, however, end users can enable custom filters that deliver improved AA quality over the traditional box filter. In particular, we’re using ATI’s edge detect filter, which harnesses the GPU’s shaders to analyze the image and apply additional filtering along all the edges. According to ATI, the edge detect filter is capable of delivering AA similar to the image quality you’d get if three times the number of samples had been taken using a conventional box filter. In layman’s terms, 4xAA with the edge detect filter turned on is equivalent to 12xAA without the filter, and 8xAA with edge detect delivers visuals equivalent to 24xAA. We’ve provided screenshots that illustrate this:
4xAA with edge detect (12xAA)
8xAA with edge detect (24xAA)
As you can see in the 400% zoomed screenshots, the edges of the power cord have quite a few jaggies under conventional 4xMSAA, and to a lesser extent under 8xAA. The jaggies are considerably smoother once the edge detect filter is applied, though; you can really see this in the final screenshot, where 8xAA with edge detect is used to produce “24x” AA.
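ATI’s actual edge-detect resolve runs in the GPU’s shaders, but the general idea can be sketched in a few lines of NumPy: flag pixels where the local gradient is large (i.e., edges), then apply a wider filter only at those pixels, leaving the interior untouched. Everything below (the function name, the gradient threshold, the 3x3 box kernel) is a hypothetical illustration of the technique, not ATI’s implementation:

```python
import numpy as np

def edge_detect_resolve(img, threshold=0.1):
    """Toy model of an edge-detect custom filter AA pass.

    img: 2D float array (grayscale, 0..1) that has already been
    resolved with a plain box filter. Pixels whose gradient
    magnitude exceeds `threshold` are treated as edges and
    re-filtered with a wider 3x3 box kernel.
    """
    # Approximate the per-pixel gradient with central differences.
    gy, gx = np.gradient(img)
    edges = np.hypot(gx, gy) > threshold

    # Wider 3x3 box filter, computed with edge-replicated padding.
    padded = np.pad(img, 1, mode="edge")
    blurred = sum(
        padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
        for dy in range(3) for dx in range(3)
    ) / 9.0

    # Only edge pixels receive the extra filtering.
    out = img.copy()
    out[edges] = blurred[edges]
    return out, edges
```

Because the extra filtering work is confined to edge pixels, the cost of the pass scales with how much geometric edge detail is in the frame, which is consistent with the scene-dependent performance hits we measured below.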
Unfortunately, custom filter AA doesn’t come free. You will see a performance hit. The premise though is simple: CFAA can be used to enhance the image quality of older and/or less-demanding games. Or if you’re on an LCD and stuck at a max resolution of say 1680x1050, where a card like the Radeon HD 4870 has plenty of extra memory bandwidth to spare, you could conceivably use CFAA to improve game visuals while hopefully still maintaining a playable frame rate.
That’s the theory at least. But does it pan out?
To test this we booted up our Half-Life 2: Episode Two timedemo as well as Quake Wars, and ran tests with Oblivion in our foliage area. The results were definitely mixed:
As you can see, the 4850 and 4870 cards took a huge performance hit in Episode Two, but the hit wasn’t nearly as great in Quake Wars or Oblivion. In Quake Wars, for instance, the Radeon 4850 saw a performance reduction of just 15% when going from 8xMSAA to 8xMSAA with the edge detect filter (24xAA). Why is that?
For starters, our Episode Two demo takes place in a large outdoor environment where trees, bushes, and other foliage are everywhere. While our Oblivion foliage test also contains lots of vegetation, it runs in a smaller, more concentrated area that isn’t as expansive as our Episode Two demo, where trees that are miles away have to be rendered. Oblivion is also getting long in the tooth; even at max settings, a performance-oriented card like the 4850 is more than capable of handling it at 1920x1200 with HDR and 4xAA/16xAF.
Our Quake Wars timedemo is also outdoors, but it doesn’t contain nearly the amount of foliage as the other two games, hence the better CFAA performance.
Based on this, we feel pretty safe in surmising that the extent of the performance hit you’ll see from CFAA will not only vary from game-to-game, but it’s also very likely that it could vary from one level to the next. A game with a mixture of outdoor and indoor areas will see lots of performance fluctuations: in an area with large, open spaces and lots of foliage you’ll see a greater performance hit than a scene that takes place indoors with tight corridors.
Intel Core 2 Extreme QX9770
EVGA nForce 790i Ultra SLI motherboard (for GeForce cards)
ASUS P5E3 Premium WiFi AP Edition (for Radeon cards)
4GB OCZ DDR3 @ 1333MHz
GeForce 9800 GX2
GeForce 8800 GT 512MB
GeForce GTX 280
GeForce GTX 260
GeForce 9800 GTX+
GeForce 9800 GTX
AMD Radeon HD 4850
AMD Radeon HD 4870
300GB Western Digital Caviar SE
Windows Vista Ultimate 64-bit w/Service Pack 1
Company of Heroes 1.71
Both ATI and NVIDIA supplied us with newer drivers for the GeForce GTX 200/9800 GTX and Radeon HD 4800 series cards than the drivers we tested with last week. In fact, ATI’s Series 5 driver is so new that we just received it on Monday; we’ve been told it offers improved CrossFire performance over the 4850 hotfix driver that was released on Friday. We’re also testing the Radeon 4800 cards on ASUS’ P5E3 Premium X48 motherboard, whereas the 4850 was tested last week on the same motherboard as the GeForce cards, EVGA’s 790i SLI.
We’ve excluded Radeon 4870 CrossFire results from this graph, as we couldn’t get the cards to scale properly in CoH. It’s not that the cards weren’t scaling under CrossFire (they certainly were), but their scaling was barely an improvement over the 4850 CrossFire setup. We’re confident we weren’t CPU-bound either, as we saw this occur under both 4xAA and 8xAA with max game settings. No matter what we tried, the cards were only a couple of percentage points faster than 4850 CrossFire. We’re going to look into the issue further; we have a feeling we ran into a snag with the cards under Vista. CoH was the only game that exhibited this behavior with the 4870s running CrossFire, so we’re confident it was an isolated incident.
Crysis High – Direct3D
But what about today’s cards?
We can’t help but be impressed by the Radeon HD 4850. Priced at $199, it completely shatters what you expect out of a $200 graphics card: just a few months ago, a card with this kind of performance would have set you back at least $400. At its worst, it performs on par with, or slightly slower than, NVIDIA’s GeForce 9800 GTX. The GeForce 9800 GTX ran up to 13% faster than the Radeon 4850 in Lost Planet, although by 2560x1600 the 4850 had surpassed the 9800 GTX+ in performance. The Radeon 4850 also traded blows with the 9800 GTX in our Company of Heroes DX10 testing, with each card posting wins at different resolutions. At its best, there are games like BioShock where the Radeon HD 4850 outperformed NVIDIA’s $400 GeForce GTX 260!
The Radeon 4850 also outgunned the 9800 GTX+ in our Crysis tests with AA, and the 4850 also outperformed the GeForce 9800 GTX in the majority of our DX9 tests.
For those of you willing to splurge a little, ATI’s Radeon HD 4870, with its faster clocks and GDDR5 memory, generally performs about 15-20% faster than the Radeon HD 4850. That’s more than enough to best the GeForce 9800 GTX; in fact, the board often comes close to, or eclipses, the GeForce GTX 260 in performance (in BioShock the 4870 was actually faster than the GTX 280) for $100 less!
ATI has really dialed in its 8xMSAA performance, too. Whereas the GeForce GTX and 9800 boards see a significant performance hit at 8xAA, the Radeon 4800 boards continue to scale well; 8xAA is actually playable in games like Oblivion, Company of Heroes, and Quake Wars. In fact, in our testing the Radeon 4850 was capable of giving the GeForce GTX 260 a run for its money in Oblivion, Quake Wars, and Episode Two, while the 4870 actually outgunned the GTX 280! The GTX cards managed to pull ahead in CoH, though.
While we don’t have graphs, we also briefly checked out Call of Duty 4 with 8xAA and the 4850 was nearly pumping out 60 fps on average.
With its 512-bit memory interface, the GTX 280 should have flourished in these tests, but it just didn’t. We have a feeling part of this is down to drivers (NVIDIA probably hasn’t fine-tuned its 8xAA performance just yet), but it’s also a testament to the ROP and texturing improvements ATI has built into RV770, as well as the 4870’s GDDR5 memory.
The Radeon 4800s are even more impressive when combined together for CrossFire. In our testing, dual 4850s were faster than one GeForce GTX 280. We were completely shocked to see this kind of performance out of a $199 graphics card! NVIDIA argues that there are cases where two GeForce 9800 GPUs outperform the GTX 280 also, so we’ll have to test that in our next article. We’re also eager to see how well the 55-nm process NVIDIA has incorporated for the GTX+ scales. Our RV770 boards didn’t OC as well as we’d hoped, but it’s also likely that we’re being held up by cooling. Hopefully ATI’s board partners will come up with some unique Radeon 4800 cards shortly.
As exciting as these new Radeon cards are when it comes to gaming performance, the future is equally bright on the GPU computing front. At ATI’s Cinema 2.0 event, Adobe showed off the exact same Photoshop demos that we discussed in our GeForce GTX 200 article, so Radeon-accelerated Photoshop will be on the way shortly. ATI is also partnering with CyberLink to bring GPU-based video encoding to PowerDirector.
In closing, RV770 has put ATI back in the game when it comes to performance. We know they’ve also got R700 on tap to directly take on the GeForce GTX 280, but with the 4850 and 4870 cards already performing this well, does it even matter? As long as ATI can continue to keep their CrossFire drivers scaling well as newer DX10 content is released later this year, we think the $400 Radeon 4850 CrossFire combo is pretty hard to beat. If today’s CrossFire performance is any indication, R700 will just pull ATI that much further ahead of GeForce GTX 200.
|© Copyright 2003 FS Media, Inc.|