Summary: Armed with its GF110 GPU and new vapor chamber cooler, the GeForce GTX 580 is supposed to address the key weakness of GTX 480 while delivering more performance for gamers. Does it accomplish its mission? Find out in today's review!
During my undergraduate studies I had an economics professor tell us that the best dramas in the world happen between companies. Over the past five quarters we have watched one of the best stories the computer graphics business can offer unfold. In September 2009, AMD released the world’s first DirectX 11 enabled graphics cards and, not to be out-shined, Nvidia unveiled its Fermi architecture to the public in November.
However, Taiwan Semiconductor Manufacturing Company (TSMC) disclosed that it was having production problems, citing trouble with “chamber matching” and ion implantation supplies. Variability in any production process can be minimized, but the choice of production arrangements can create new problems. As an example, production bottlenecks can be offset if wafers are separated into smaller groups, spreading the work across multiple stations, and then merged back together after that step. This is fine, but high variance between the stations can be catastrophic. At 40nm we are talking about 5,000 transistors fitting on the end of a human hair, so large variations in chemical solution recipes, or a failure to remove material at a uniform rate, will result in large disparities across wafers. In TSMC's case, its troubles plagued AMD, Nvidia and consumers alike. Without a solid flow of wafers, the supply of AMD Radeon 5000 series graphics cards became tight. Another force compounded the short supply: a lack of competition at the high end from Nvidia. Together these pushed prices for Radeon based cards above the original launch targets. In January TSMC announced that it had resolved its production consistency issues, and two months later Nvidia launched its flagship consumer graphics product, GTX 480.
Fast forward to last month, when AMD released its Radeon 6000 series graphics boards. The company redesigned the silicon in an effort to maximize performance, balance the functional unit mix, cut power consumption, reduce cost per die and improve qualities like acoustics (and to one-up Nvidia). However, every good story needs a happy ending, and that brings us to the present. Like AMD's silicon, GF100 needed a lot of changes. Instead of the full 512 ALUs (“CUDA cores”), 16 PolyMorph engines (which include tessellation), and the power efficiency we were told Fermi would deliver, we received GeForce GTX 480.
Now while that may sound like someone gave us a second rate birthday present, in some ways it was. GTX 480 was the most powerful product Nvidia had ever produced, but it was not what we got all jazzed up about back in November. This is where we are today… Fermi 2.0 inside the GeForce GTX 580, and on the pages that follow you will get what the late Paul Harvey would have called “the rest of the story.”
Below you can see the original block diagram that Nvidia supplied to show some of the functional units inside GF100. You will notice something missing, and that hole is exactly what didn’t show up inside GeForce GTX 480. We received 480 (roughly 94%) of the possible 512 arithmetic logic units (ALUs), or “CUDA cores.” The missing 32 ALUs equal one full streaming multiprocessor (SM) group, composed of one SIMD unit, 32 ALUs and a tessellator.
Each SM can generate about 0.25 triangles per clock through its PolyMorph Engine, or 1 per Graphics Processing Cluster (GPC). In theory this is a balanced approach, as the single raster unit in each GPC can rasterize 1 triangle per clock. Not having the 16th SM meant that the GPU could create 3.75 triangles per clock cycle versus the 4 it could rasterize per clock. This imbalance creates a slight bottleneck between creating and rasterizing triangles, and both matter for triangle subdivision and for using textures for displacement mapping. While 0.25 triangles per clock may not seem like a lot, theoretically it equates to roughly 193 million triangles per second of diminished geometry throughput, and less performance from tessellation and vertex texture fetching. I say theoretically because in the real world not all triangles are equal. Under certain usage patterns it is closer to 2 billion triangles per second versus the 3 billion that a full 16-SM graphics processor could supposedly output.
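The arithmetic behind that deficit fits in a few lines. This is a back-of-the-envelope sketch, and the clock figure here is illustrative rather than an official specification:

```python
# Geometry throughput for a full 16-SM chip vs. the 15-SM GTX 480,
# assuming the setup hardware runs at the core clock.
core_clock_hz = 772e6                    # illustrative core clock, not a measured value
tris_full = 16 * 0.25 * core_clock_hz    # 16 PolyMorph engines, 0.25 triangles/clock each
tris_cut = 15 * 0.25 * core_clock_hz     # one SM disabled
deficit = tris_full - tris_cut           # 0.25 tri/clock * clock ~= 193 million tri/s
```

At a clock in that neighborhood, one missing SM costs on the order of 193 million triangles per second of peak setup rate.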
GF100 debuted with slower than expected core clock and memory frequencies, increased power consumption, and additional heat which required a more powerful, and louder, cooling solution. Despite its limitations, Nvidia’s GTX 480 is certainly a monster of a chip and can handle almost anything currently on the market. That being said, who cares about GF100? We now have GF110. Nvidia took what it learned from launching GTX 480 and designed a piece of silicon that combines the best of GF100 with some of the improvements from GF104 to deliver what we now know as GeForce GTX 580.
This is the Fermi I was told about, only better. FULL EVERYTHING! All things being equal, the architectural changes and fixes could improve performance by as much as 5% in Unigine Heaven 2.1 and Metro 2033. Nvidia is even claiming as much as 15% in DiRT 2.
As you can see from the tables, there are very nice improvements all over the place from GeForce GTX 480 to GTX 580. We are clearly expecting GeForce GTX 580 to crush geometry and be a bit soft on shading. A big surprise… right? When has that really been different when looking at AMD and Nvidia?
Additionally, Fermi is scalar heavy compared to the AMD GPUs. Beyond3D ran some compute tests showing that GF100 can issue twice as many scalar instructions as AMD's hardware, while AMD can issue twice as many Vec4 instructions. [Alex Voixu, et al. Beyond3D.com] Again, not too surprising, as old habits die hard and we like to do what we have always been good at. We expect GF110 to be an even bigger brute with geometry. (And it is… we peeked at the test scores.)
Nvidia gave GF100 support for more tile formats for depth buffering, which improves z-culling. The basic premise is that the z-buffer is a table. If you treat this table like a texture, it can be stored, compressed, uncompressed, given different levels of detail and so on. You can therefore access larger data sets for depth and use whatever best speeds up your application. This is something we could probably write an entire article on, but the key is that streamlining the process of getting pixels to your screen (and making sure they look correct) is what matters most. Pixels that cannot be seen, because they sit on geometry obscured by other geometry, should be removed from the work schedule as soon as possible. Just like people, the less time you spend thinking about or doing fruitless work, the more efficient you are.
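As a minimal illustration of the idea (a sketch of depth testing in general, not of Nvidia's hardware implementation), a depth buffer lets the pipeline discard occluded fragments before any shading work is spent on them:

```python
import math

# A tiny depth buffer: every pixel starts infinitely far away.
WIDTH, HEIGHT = 4, 4
depth = [[math.inf] * WIDTH for _ in range(HEIGHT)]
color = [[None] * WIDTH for _ in range(HEIGHT)]
shaded = 0  # counts how many fragments actually paid the shading cost

def submit_fragment(x, y, z, c):
    """Shade a fragment only if it is nearer than what is already stored."""
    global shaded
    if z >= depth[y][x]:
        return              # occluded: culled, no shading work done
    depth[y][x] = z
    color[y][x] = c
    shaded += 1

submit_fragment(1, 1, 0.8, "red")    # visible, gets shaded
submit_fragment(1, 1, 0.9, "blue")   # behind the red fragment, culled
submit_fragment(1, 1, 0.3, "green")  # nearer still, shades and wins
```

After these three submissions only two fragments were shaded and the pixel holds "green"; the earlier the cull happens, the more work is saved.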
GF100 incorporated fully IEEE 754-2008 compliant single and double precision. Each of these ALUs uses fused multiply-add (FMA) instructions. Looking at the diagram below, you can see that an FMA retains higher precision than a MAD (Multiply-Add) because it rounds only once. It also makes it possible to do two floating point operations per clock cycle, which is huge when AMD and Nvidia are making talking points about GFLOPS throughput calculations.
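A quick way to see the precision difference is to compare a plain multiply-then-add (two roundings) against an FMA (one rounding). This sketch emulates the FMA's single rounding with exact rational arithmetic; the specific value of `a` is chosen so the difference is visible:

```python
from fractions import Fraction

# a*a equals 1 + 2**-26 + 2**-54 exactly; the 2**-54 term sits below the
# precision of a double near 1.0, so a separate multiply rounds it away.
a = 1.0 + 2.0**-27

plain = a * a - 1.0                     # MAD: two roundings, the 2**-54 term is lost
exact = Fraction(a) * Fraction(a) - 1   # infinitely precise intermediate product
fused = float(exact)                    # FMA: a single rounding at the end

# plain == 2**-26, while fused == 2**-26 + 2**-54 (strictly larger)
```

The fused result keeps a low-order term that the two-step version silently drops, which is exactly why FMA matters for accumulating long dot products and matrix math.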
GF110 got something juicy from its little brother. GeForce GTX 460 (GF104) introduced “full speed” 64-bit floating-point (FP16) texture filtering. DX9.0c introduced a minimum 32-bit floating-point lighting precision, and when hardware started supporting FP16 blending, high dynamic range rendering (HDRR) really came alive. That being said, GF100 was designed to handle one texture address and four samples per texture unit. With 64 texture units it can still only deliver one address per unit, but thanks to GF104's design it can now return and filter four INT8 (32-bit), four FP16 (64-bit) or one FP32 (128-bit) texture samples. Not only should this help GF110 with HDR, but also with displacement mapping and texture heavy applications.
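That “four samples per address” pattern is exactly what bilinear filtering consumes: one lookup position pulls in the four neighboring texels, which are then blended by weight. A minimal sketch of the math (illustrative only, not how the hardware is wired):

```python
def bilinear_sample(tex, u, v):
    """Fetch the four texels around (u, v) and blend them by the fractional
    position -- the filtering step the texture units perform per sample.
    Valid for u, v in [0, width-1) and [0, height-1)."""
    x0, y0 = int(u), int(v)
    fx, fy = u - x0, v - y0
    t00 = tex[y0][x0]          # the four neighboring texels
    t10 = tex[y0][x0 + 1]
    t01 = tex[y0 + 1][x0]
    t11 = tex[y0 + 1][x0 + 1]
    top = t00 * (1 - fx) + t10 * fx
    bot = t01 * (1 - fx) + t11 * fx
    return top * (1 - fy) + bot * fy

tex = [[0.0, 1.0],
       [1.0, 2.0]]
center = bilinear_sample(tex, 0.5, 0.5)   # blends all four texels equally
```

With FP16 texels each of those four fetches is 64 bits wide, which is why filtering them at full rate instead of half rate matters for HDR-heavy scenes.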
Nvidia made a lot of changes to how GeForce GTX 580 handles heat, voltage levels and noise output. The printed circuit board (PCB) is exactly the same length and width as GTX 480's; both are shorter than Radeon 5870 but longer than Radeon 6870, 6850 and 5850. The first change you can see is the new cooling system. The lack of plumbing extending out the side of the shroud compared to GeForce GTX 480 is clearly visible in the image below. Another change is the shape of the cover: the top has been beveled to give air better access to the intake, and Nvidia made the intake opening 10mm wider in diameter, 65mm on GeForce GTX 580 versus 55mm on GTX 480. In a single card configuration this should not impact cooling performance, but for SLI and Triple-SLI in tight cases it should help improve airflow.
Flipping the cards over, you will see that Nvidia removed the hole in the PCB; GTX 480 had this opening so additional air could be drawn into the fan. In its place on GTX 580 is some new voltage monitoring circuitry, which you can see on the top side of GTX 580’s PCB in the image below. There are three separate units, each monitoring a different 12V power connection (8-pin, 6-pin and PCI Express). Nvidia states that this circuitry is there to adjust performance when certain applications stress the card’s power draw beyond the shipping specifications. While over-draw protection is great for the general consumer, we are not so sure it will be received the same way by extreme overclockers.
On the previous page we showed you that GeForce GTX 580 did not have copper heat pipes. Nvidia is using a vapor chamber to transport heat away from the processor. This isn’t a new technology. In fact, Sapphire used a vapor chamber on its HD 4890 Toxic cards to improve cooling efficiency for its factory overclock.
Heat pipes and vapor chambers are similar in functional design. Both utilize the natural process of phase change and the transport mechanism of a convection cell. A copper container is filled with a special liquid and then sealed. The working fluid is special because it must have a boiling point close to, but not too far above, room temperature, and it must absorb and release energy freely as it changes between liquid and gas.
As heat is applied, the liquid evaporates once it reaches its boiling point. The gas then expands and moves away from the heat source, eventually reaching a surface cooler than the boiling point, where it condenses back into a liquid. Rinse and repeat to create a nice convection cell: heat from the processor is naturally carried toward the surface with the cooling fins, and the fluid returns to be heated again.
You may be thinking, “The cards will not be situated with the GPU facing up when I put them into my case. Will this cooler work correctly?” Yes, it will. Orientation of the cards is moot, as the heating and cooling create the transport mechanism for the liquid/gas. Heat pipes work on the same premise, and there are many aftermarket coolers with pipes twisted in crazy directions that still work properly.
Once you remove the cooler from the graphics processor, you can see the large metal plate that acts as a heatsink for the memory. There were also two small modifications to the fan and its controller: the radial fan now sits on a foam cushion to help reduce vibrations, and a ring has been added around the fins. The ring acts as a retainer to keep the fins from oscillating. Nvidia also added a card specific fan profile to keep sound levels below an undisclosed threshold.
Once we remove the rest of the cooling apparatus, we can see the bare PCB. The layout is almost identical to the GTX 480's except for the changes we already talked about. The card uses Samsung K4G10325FE-HC04 5.0Gbps rated memory modules. Now to the main event! Let’s let the cards duke it out for the belt!
Our test bench for this article was a Maingear Shift. We wanted to show that systems from GeForce GTX 580 launch partners are available today. We will spend more time checking out the Maingear system in detail in a dedicated article.
The test system has been overclocked to 4.33GHz, which should eliminate any potential for the benchmarks to be CPU bound. You can find more specifics on the system in the CPU-Z screenshots. Additionally, we have supplied GPU-Z screenshots so you can see the card specific details.
For each test we ran at least three runs. The average reported per resolution and configuration is the geometric mean of all of the runs, so as to get a true center of the data; the minimum is the minimum across all of the runs. We want to demonstrate the user experience as much as possible. While that does not play well with PR and marketing types, it is what we experienced and what a gamer would experience under the same conditions. Additionally, the three resolutions we selected are the two most popular plus the maximum. For the MS DirectX SDK samples we used their defaults, and 1920x1080 for Unigine Heaven 2.1. If you have comments about the test setup or what you would like to see run through the paces, please contact us.
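For reference, the averaging described above looks like this (the FPS figures are made up purely for illustration):

```python
import math

def geometric_mean(values):
    """n-th root of the product of the values, computed via logs for stability."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

runs = [58.2, 61.7, 59.9]          # hypothetical average FPS from three runs
reported_avg = geometric_mean(runs)
reported_min = min(runs)           # the worst run becomes the reported minimum
```

Unlike the arithmetic mean, the geometric mean is less skewed by one unusually fast run, which is why it gives a truer center across repeated benchmark passes.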
Nvidia Reference GeForce GTX 580
EVGA GeForce GTX 580
Asus EAH6870
Asus EAH6850 DirectCU
AMD Reference Radeon HD 5870
AMD Reference Radeon HD 5850
As mentioned on the previous page, we decided to take two samples from the DirectX 11 software developers’ kit to see raw output. We chose the Detail Tessellation 11 and SubD11 samples because they do what their names suggest: tessellation and triangle subdivision. Earlier we noted that GF110 would be a brute when it came to geometry. Well, it is. In the screenshot you can see the wireframe and all of the pretty geometry that underlies this sample. It is a classic example that takes a simple shape, uses a displacement map for height, and then tessellates it into A LOT of triangles. As you can see for yourself, GeForce GTX 580 just chews through the geometry and says, “Thank you! May I please have another?”
Detail Tessellation 11 SDK Sample
SubD11 SDK Sample
This leads me to a point of contention for some of you: “How much tessellation is too much?” Obviously there is a point where simulating a super smooth surface on a cartoonish character will not bring me any closer to realism. If there are people who play World of Warcraft and are completely immersed, it is certainly not due to graphical realism. But before you pick up rocks and yell “Blasphemy!”, let me state for the record that I personally love the concept of breaking geometry down into as many smaller pieces as possible for more realistic surfaces; I have been saying so for almost four years. Tessellation with displacement mapping can save large amounts of memory address space and bandwidth. It is amazing. But there is a point of diminishing returns, and I believe the level of detail should be in the hands of the consumer.
Unigine Heaven 2.1
GTX 580 GeForces its will over the geometry. As with the other tests, GF110 crushes the geometry as set up by the Heaven 2.1 benchmark.
Battlefield: Bad Company 2
Battlefield: Bad Company 2 was very playable on this new high end graphics processor. All of the test subjects performed admirably. However, regardless of what the numbers say, the game felt smoother on the two Nvidia cards. This is why we actually play games with the cards; call it qualitative testing. (Yes honey, I have to “test” this game again.)
Metro 2033 is an interesting bird. It has some issues at 2560x1600: on both the Nvidia and AMD cards, the benchmark would not load textures or light the environment correctly. It did not happen every run, but it happened on all of the cards, and we had to rerun tests several times just to get everything to look correct before we actually started recording scores. Next time around I will do a walkthrough with Fraps instead of running the built-in benchmark with Fraps. That should give a better representation of “real” gameplay and hopefully bypass a repeat of the shenanigans we experienced.
Civilization V has some great surface tessellation; the game looks immensely better with it enabled. This is another place where too much tessellation could become a bad thing, but in its current state it looks great. This and other real time strategy games are about units. (…”guns, lots of guns” – Neo) Sometimes scrolling around the map can cause nasty spikes in performance. Minimum frames are something you cringe at in an FPS game because they can mean virtual life or death. RTS games like Civilization V are all about micromanagement to the n-th degree, and more units and moving all over the map can cause delays, but not because the graphics card can’t render it; it can be due to waiting for the CPU to calculate something, something to load, an event starting, and so on. That being said, the new GeForce did extremely well, beating the AMD cards in every showing.
Colin McRae: DiRT 2
|© Copyright 2003 FS Media, Inc.|