
Eternal Battle Day 4: Ultimate Gaming PC versus Ultimate Workstation Benchmarks
June 29, 2005

Summary: It's the main event: Performance Analysis. We pit our flagship FX-57 machine against our dual Opteron 252s across our gaming, digital photography, and digital video benchmark suite... and a new challenger arrives: the AMD Athlon 64 X2 4200+ with a single NVIDIA 7800 GTX.


Introduction

Recap


But we needed another gimmick, so we’re bringing in the Athlon 64 X2 4200+. This CPU features two 2.2GHz cores with 512KB of L2 cache each. Pricing is essentially dirt cheap at $585, as it is equivalent to a pair of Athlon 3500+ Venices. The question is whether it’s the best of both worlds. Going with a cheaper CPU meant that we could go with a faster GPU – so what if we put a single NVIDIA GeForce 7800 GTX into the mix?

Benchmark Design

In designing my benchmark suite, I wanted to select applications that reflected real-world uses. Likewise, instead of trying to take a brute-force approach to benchmarking, I’ve carefully selected a set of benchmarks to maximize the value of the information while keeping the testing simple and easy to follow.

Every benchmark was run in NVIDIA’s High-Quality texture mode (rather than Quality). This produces lower scores than you may be used to seeing, so remember that you do not want to simply compare these numbers against those you’ve found in other reviews. We also ran every application at 1280x1024 (when possible) to reflect the native resolution of today’s 17” and 19” LCD monitors. These are higher quality settings than the defaults, but in the end, that’s the whole point of building a high-end system, right?

First though, let’s talk about principles of benchmarking.

In our last episode…

My last article, the Dual Core AMD Opteron performance analysis, caused some commotion when it came to benchmark selection. There were those arguing about the transparency of tests such as the Matlab N72 and the relevancy of SiSoft Sandra, so I thought I’d take the time to explain the art of benchmarking and the importance of choosing your review website carefully. This part of my article contains elements of my 3DMark 05 article published in the November 2004 issue of PC Enthusiast Magazine.

Let's start with a fundamental question: What is the purpose of a benchmark?



Benchmarking Principles


Thus, the question a benchmark should answer seems clear, right? What is going to give me the fastest performance for the applications I run?

So, when it comes to games, it’s pretty easy to pick out which games are important. A good number of people play Half-Life 2 and the associated Counter-Strike: Source, and a lot of people were into Doom 3… but the rest of us play more than those two games, and in fact, I’m not sure that many people are still playing Doom 3. To get the best sense of how fast a product is, you'd really want to have performance data on every game out there. Readers would then check off the boxes of the games they played and see a computer-generated result listing the optimal hardware configuration (patent pending).
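The "check off the games you play" idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the configuration names and frame rates are hypothetical placeholders rather than results from this article, and scoring by geometric mean is one reasonable choice among several, not a method used in these tests.

```python
from math import prod

# Hypothetical frame rates (fps) per configuration and game.
# Placeholder numbers for illustration only, not measured results.
RESULTS = {
    "config_a": {"Half-Life 2": 90.0, "Doom 3": 60.0},
    "config_b": {"Half-Life 2": 80.0, "Doom 3": 75.0},
}


def geometric_mean(values):
    """Geometric mean, so one very high score cannot dominate the ranking."""
    values = list(values)
    return prod(values) ** (1.0 / len(values))


def best_config(results, selected_games):
    """Return (name, score) for the configuration with the highest
    geometric-mean fps across the games the reader checked off."""
    scores = {
        name: geometric_mean(fps[game] for game in selected_games)
        for name, fps in results.items()
    }
    best = max(scores, key=scores.get)
    return best, scores[best]


if __name__ == "__main__":
    print(best_config(RESULTS, ["Half-Life 2"]))
    print(best_config(RESULTS, ["Half-Life 2", "Doom 3"]))
```

With these placeholder numbers, the recommendation changes depending on which boxes are checked, which is the whole point: the "best" hardware depends on the games you actually play.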

Unfortunately, it's not practical for reviewers to test every single game on the market and this is where synthetic benchmarks should come in. The point of a synthetic benchmark isn't to evaluate the “theoretical” hardware performance but to serve as a marker of performance in those dozens of games you also play but cannot test. This is why Doom 3 is still benchmarked even though it’s not an actively popular game – we know that future games will be released on the Doom 3 graphics engine. Results in Doom 3 will help predict the performance results of some of tomorrow’s games.

Taking this into account, a mistake of academic benchmarkers is to believe that the perfect benchmark is one that is open source to ensure full transparency and to produce a true vendor-neutral test. While this certainly helps people gain insight into the underlying engineering of a GPU or CPU – it’s not at all important for most end-users for a simple reason:

Games aren't vendor-neutral. Applications aren’t vendor-neutral.




Benchmarking for games

Gaming

To be vendor-neutral you'd have to look at all the major hardware first, figure out what's commonly supported, and then start programming. Almost no game developer takes this approach in the strictest sense, because the completed game wouldn't harness the steady improvement in hardware over the year or two the game was in development. What really ends up happening is that the artists and programmers let their creative ideas flow, typically overshooting at first and then editing what they have already done to make it work. It's the difference between thinking in English versus thinking in a foreign language and then translating it into English.

Since a useful synthetic benchmark should mimic real games, the best synthetic benchmarks shouldn't be vendor-neutral. Think of Half-Life 2 and Doom 3; a vendor-neutral synthetic benchmark would not be able to capture those differences. What you really want is a synthetic benchmark that has multiple personalities. You want a synthetic benchmark that captures the essence of the games of today and tomorrow.

That's what 3DMark05 does. 3DMark05 is an effective tool because it is very much vendor-specific, allowing the user to utilize intermediate-precision shaders, NVIDIA specific shading models, or even selecting between version 2.0 and 3.0 DirectX Shader Models. In the default mode, 3DMark05 does the best it can to provide one number that reflects overall performance of a typical game of the future. Through the permutation of options, a talented reviewer can use that same tool to simulate even more types of real game environments. That's cool.


What about the 3Dc issue?

In the perfect world, 3DMark05 would add 3Dc support giving it the additional personality it needs to emulate the set of games that implement ATI technologies. So why didn't Futuremark include 3Dc if it would make 3DMark05 a better product?

It comes back to Futuremark's Benchmark Development Program (BDP). The BDP includes all of the major graphics companies such as ATI, NVIDIA, PowerVR, S3, and XGI. In order to produce a benchmark that remains useful for two years, Futuremark needs input from hardware manufacturers and knowledge of their future product plans. To determine what graphics techniques will be adopted, they also get feedback from game developers and review the academic literature. Once they've got all the data, they decide their course of action. Simply put, Futuremark determined that 3Dc was not supported by enough developers and/or future hardware to warrant the development cost of adding 3Dc or the extra ~150MB in the download. Moreover, 3DMark05 already features the DirectX version of normal map compression: DXT5 (also the approach used by Doom3).


3DMark (cont’d)

Indeed, the industry-standard DXT5 can provide similar bandwidth savings to 3Dc. Depending on whether or not you normalize the data, DXT5 can even provide a faster solution than 3Dc. True, 3Dc offers normal maps with fewer compression artifacts than DXT5, but if a developer wanted to implement "anti-aliased lighting" via mipmapped un-normalized normal maps, he couldn't do it with 3Dc. So it's a feature with trade-offs. Of course, should things radically change in the world of 3Dc, there's nothing to stop a "3DMark05: Second Edition" from being produced.

Still, the fact that something like 3Dc won't be universal doesn't change the fact that a handful of superb games are going to feature the technology. Without 3Dc, 3DMark05 alone will not be a good marker for games like Half-Life 2, Serious Sam 2, or Tribes Vengeance running on ATI cards. But that's OK. A good reviewer will know this and take this into account. 3DMark05 isn’t the one-stop solution for benchmarking, but it is an additional tool in the armamentarium of the reviewer, and a well designed one at that.

That is the core of our job as reviewers though -- we'll try to sort out the benchmarks we need to run and the right way to interpret them. Our goal is to help you decide what the best product is for you… and what a coincidence, that's the same purpose of a benchmark. You don't want a vendor-neutral benchmark. What you really want is a vendor-neutral reviewer with realistic benchmarks!


Open Source Benchmarks?

The concept of realistic benchmarks limits the value of open source benchmarking. It simply is not the end-all ideal solution. If you take on the task of creating an open source benchmark mimicking the process of camera RAW image using something such as dcraw, it is unclear that it would truly be any better than a SiSoft Sandra type of synthetic test. This is because the raw processors in active use by real photographers will have vastly different performance. Knowing that a specific product is faster on an open source RAW developer is of no importance to a photographer who uses Capture One on a day to day basis.

Fixed Benchmark Suites?

Fixed benchmark suites such as Bapco’s SysMark are common in the mainstream press. The appeal seems obvious – SysMark attempts to mimic “real-world” tests in the same way I’ve described above. There’s just one catch, though: these fixed benchmark suites do not age well. If you look at the applications used, they’ll be several generations old. SysMark 2004 uses Photoshop 7.01, for example. Performance in Photoshop 7.01 isn’t necessarily predictive of the performance you get with the current Photoshop CS2.

So, when you look to a website or magazine for hardware reviews, start your reading with a place where you trust the software reviews too. Anybody can get a bunch of hardware together and show you pretty graphs. That doesn't mean that they understand how those numbers translate into real performance for the games you actually play or the applications you actually run. You need someone passionate about the same software you are passionate about.

Alright, let’s look at the test suite I came up with for the 2005 Eternal Battle.



Graphics Suite


Half Life 2 Turrent Demo

There’s really no need to explain why Half-Life 2 makes the benchmark list of every gaming website. First person shooters make up the core of the high-performance PC gaming industry. The simple reason is that in an FPS, improvements in frame rates translate into a direct improvement in gaming performance. Half-Life 2 provides a good test of current generation gaming. The Turrent demo is a FiringSquad specific time demo. We set our Half-Life 2 settings to the max, including “reflect all”, and ran our numbers at 4x FSAA with 8x anisotropic filtering as the baseline quality setting, and 8xS FSAA with 16x anisotropic filtering as the maximum quality.

3DMark 05

Although an easy target when talking about the flaws of “benchmark monkeys” who run numbers instead of reviewing products, 3DMark 05 is actually a reliable test that’s useful as a predictor of performance across a wider range of DirectX9 applications when used in conjunction with our other benchmarks. Since there has been some confusion, we are running these numbers in high-quality mode from the NVIDIA drivers which will result in slower numbers than would be seen in the standard quality mode used by most.

Final Fantasy XI - Benchmark Vana’diel 3

FF11 will seem like an odd selection to many but this was selected as a pure synthetic test. It’s a good test of accumulation buffer effects and is highly sensitive to small changes in memory and system performance. It is not SLI aware.

SPEC ViewPerf 8.01

A common mistake is to look at SPEC ViewPerf and to assume that it’s a purely synthetic test. In fact, it’s closer to a timedemo of “professional” applications where you have an “infinitely fast CPU” taking care of the normal program overhead. That is, the members of SPEC sit down in a room and figure out which applications and types of geometry reflect the real-world use of 3D graphics professionals. Then, they capture the raw OpenGL commands issued to the video card by these professional graphics applications and use that data to generate the benchmark. There really isn’t a test suite like ViewPerf in the consumer world.



CPU Testing

SiSoft Sandra

We use SiSoft Sandra Professional for our synthetic CPU tests. These tests are probably most representative of system performance when you’re developing customized software. That is to say, while these tests won’t be useful to the average end user, they are probably the most predictive for all those aerospace and semiconductor companies who need all the compute power they can get their hands on. This is a great test suite and should be a core part of any enthusiast’s benchmarking library.

Digital Photography Test Suite

Our Digital Photography Test Suite has changed slightly since last time. We are continuing to use Capture One D-SLR 3.7 RC1 and Bibble Pro 4.2.2 as our RAW image processors; however, we have also added Photoshop CS2 to our tests.

Capture One D-SLR 3.7 RC1

(http://www.rawworkflow.com)
Although RAW processing software from the competition has improved significantly, Capture One continues to be one of the most popular high-end RAW processors on the market. This time, I will be evaluating RAW processing performance with 6 different cameras:
1. Canon EOS-20D (8.2 megapixels)
2. Canon EOS 1D Mark II (8.2 megapixels)
3. Canon EOS 1Ds Mark II (16.7 megapixels)
4. Nikon D2H (4.1 megapixels)
5. Nikon D2X (12.4 megapixels)
6. Phase One P25 (22 megapixels)
Version 3.7 final of Capture One is now available; however, we have benchmarked with RC1 to allow comparison with the numbers from our dual-core Opteron review.

Bibble Pro 4.2.2

(http://www.bibblelabs.com)
Bibble 4 was the epitome of vaporware at one point. But before we can talk about Version 4, we have to talk about Version 3. Bibble was one of the original 3rd party RAW developers, engineered by a Nikon shooter who was a game programmer in the past. From the very beginning, Bibble offered the best detail, color, and performance of any RAW processor. The catch was that Bibble support was Nikon-centric (understandably so given the author’s penchant). Beta Canon support was introduced in the later versions of Bibble 3.0, but it was buggy. The author of Bibble promised to include Canon support in the “next version” of Bibble, but this next version was going to take longer because it was going to be a complete rewrite.

So began the two-year wait for Bibble 4.0.
When version 4.0 came out, Bibble showed the world that its reputation was well deserved. Not only was its support for multiple processors superior to competing products, but it was significantly faster than any other RAW processor on the market. In our dual core Opteron review, I was repeating my results several times just because it was so fast!
Version 4.2.6 of Bibble Pro is now available; however, we have benchmarked with 4.2.2 to allow comparison with our dual-core Opteron review.


Noise Ninja 2

(http://www.picturecode.com)
Although today’s digital SLRs offer incredibly high-ISO and low-light performance, there continues to be a role for noise removal software as people start moving from trying to shoot under ISO 400 to trying to shoot under ISO 1600. In our original set of digital photography benchmarks, I relied on NeatImage, one of the most popular noise removal tools available at the time. Since then, I’ve decided to standardize on Noise Ninja 2 for a few reasons. NeatImage is still great, its basic version is distributed as freeware, and I highly recommend it, but Noise Ninja 2 has become more of the professional’s choice thanks to its improved workflow and faster performance. That is to say that NeatImage does a great job when you’ve got the time to fine-tune your images, but Noise Ninja 2 is clearly the better choice when you’re dealing with huge numbers of images and want the best automated noise filter. From a strict benchmarking perspective, the ability to select the number of threads used is also helpful. In this test, we just measure the performance of filtering an ISO 800 JPEG from a Canon EOS-20D. This ends up being more hard drive limited than CPU limited.

Photoshop CS2

(http://www.adobe.com)
Our Photoshop CS2 test evaluates the performance of these systems on two core elements of the new version. One test involves opening up a 16MP Canon 1Ds Mark II RAW image, and the other is a test of Photoshop CS2’s new Smart Sharpen filter which is actually a deblurring filter very similar to Focus Magic.


Digital Video Tests

Working with digital video is also an area that has gained much interest in recent years thanks to the convenience of DV and HDV format camcorders. Digital video is likely to see a resurgence in interest among home enthusiasts. On the professional line, companies such as Sony are beginning to incorporate CMOS imaging sensor technology in cameras such as the HVR-A1U, and companies like Panasonic are introducing true variable frame rate HD camcorders like the AG-HVX200 at a relative bargain. While most home videographers won’t be willing to spend anywhere near as much as required for professional products, like all technology, it’s only a matter of time until the technologies introduced in those professional products will be affordable for the home user.

Adobe After Effects 6.5

Adobe After Effects is a compositing tool similar to discreet Combustion or Apple Motion. In our test, we’ve gone with the standard publicly available “Total AE Benchmark” project starting with a generated fractal sequence and using that in a multi-layer composition. This time, we also ran our numbers using OpenGL accelerated rendering.

Canopus ProCoder 2.0

Our transcoding tests were done with Canopus ProCoder 2.0. We ran two different tests. The first test was a torture test converting a 1440x1080p WMV-HD clip into a 24MBps 1920x1080p MPEG-2 file. This is a stress test for the system as decoding WMV-HD clips in real time already requires a 3GHz class CPU. Since the WMV decoding isn’t optimized for multiple processors, we also ran a second test that converted an uncompressed 1440x1080p AVI to a 24MBps 1920x1080p MPEG-2 file.

TSUNAMI Video Encoder Xpress

Built around the renowned TMPGEnc technology, TSUNAMI Video Encoder Xpress is a high-performance standard-definition transcoder. One of the great features of the software is that it is able to take advantage of the SSE3 instructions found in our new E3 revision or newer Athlon 64’s.

Although Video Encoder Xpress does not have the high-definition capabilities of the more expensive commercial solutions, it is a relative bargain at $50 and provides a best-bang-for-the-buck solution for converting video from one format to another. It can be used as a standalone product, but can also be paired with Tsunami MPEG DVD Author, which allows users to make their own DVD movies (it is not as well suited for DVD slideshows) and is comparable to things like VideoWave 7 or DVD MovieFactory.

While TSUNAMI MPEG’s products aren’t at the level of a professional transcoding package such as Canopus ProCoder 2.0, which can do high-definition MPEG-2 and esoteric things such as transport streams, they do represent an excellent choice for the home enthusiast just looking to produce DVDs at home.

SiSoft Sandra Disk Performance

HDD throughput is also an essential element of video production and we’re using SiSoft Sandra to evaluate the bandwidth of our HDDs.



Half Life 2





NVIDIA is always cautious about having reviewers run the Quadro FX through gaming benchmarks, but it was quite the surprise to see the Quadro FX4400 SLI as the fastest GPU for Half-Life 2. Even the extra clockspeed of the Athlon FX-57 was not enough to put the ultimate desktop system ahead of the ultimate workstation. This is likely due to the additional 256MB on the Quadro cards and the faster 16x PCI-e bus available on the Tyan K8WE. The 7800GTX on the Athlon64 X2 4200+ is great – it performs like SLI’d 6800 Ultra’s.

The efficacy of low-latency RAM is evident here, as the 2-2-2-5 RAM @ 200MHz is able to keep up with the 3-4-4-10 RAM @ 289MHz. In the case of the Athlon FX-55, the lower latency RAM provided faster performance; however, with the newer memory controller of the Athlon 3500+ the case was reversed. Our Athlon FX-57 was an engineering sample and had difficulty achieving the high memory clockspeeds, so we only have our numbers available at 2-2-2-5.
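A quick back-of-the-envelope calculation helps show why the two memory configurations can end up so close. The sketch below converts the first timing number (CAS latency, in clock cycles) into nanoseconds and estimates theoretical peak bandwidth. The bandwidth figure assumes DDR signalling (two transfers per clock) on a dual-channel, 128-bit (16-byte) memory bus; that bus width is an assumption about the test platforms, not something measured in this article.

```python
def cas_latency_ns(cas_cycles, clock_mhz):
    """CAS latency in nanoseconds: cycles divided by the memory clock."""
    return cas_cycles / clock_mhz * 1000.0


def peak_bandwidth_mb_s(clock_mhz, bus_bytes=16, transfers_per_clock=2):
    """Theoretical peak bandwidth in MB/s.

    Assumes DDR (2 transfers per clock) and a dual-channel 128-bit bus
    (16 bytes per transfer); both defaults are assumptions.
    """
    return clock_mhz * transfers_per_clock * bus_bytes


if __name__ == "__main__":
    # (CAS cycles, memory clock in MHz) for the two configurations in the text
    for label, cas, mhz in [("2-2-2-5 @ 200MHz", 2, 200),
                            ("3-4-4-10 @ 289MHz", 3, 289)]:
        print(f"{label}: CAS {cas_latency_ns(cas, mhz):.2f} ns, "
              f"peak {peak_bandwidth_mb_s(mhz):.0f} MB/s")
```

Under these assumptions, the CAS latency works out to roughly 10 ns in both cases, while the 289MHz configuration has about 45% more theoretical bandwidth. The other timings (the remaining three numbers) differ more, so this is only a first-order comparison.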

Remember, we’re testing at higher resolution and higher texture quality settings which means that these Half-Life 2 benchmarks may seem slower than the numbers you may be accustomed to. You should only make comparisons within the chart.


3DMark 05






With 3DMark 2005, the Quadro FX4400’s 512MB advantage is lost. These tests are largely GPU bound, so there isn’t a significant difference between the SLI results. Data for the Athlon 3500+ in non-SLI and the FX-55 in SLI was corrupted, and hence not reported. There’s not much to see here other than the fact that SLI represents a better investment than going to a faster CPU. The 7800 GTX holds its own, but it’s still clear that owners of SLI 6800 Ultras can still enjoy high-end Shader 3.0 games.

Remember, we’re testing at higher resolution and higher texture quality settings which means that these 3DMark 2005 benchmarks may seem slower than the numbers you may be accustomed to. You should only make comparisons within the chart.
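One way to keep comparisons "within the chart" is to express every score relative to a baseline system from the same chart, so that only the ratios matter. The sketch below does that; the system names and scores are hypothetical placeholders, not results from these tests.

```python
def relative_to_baseline(scores, baseline):
    """Express each score as a percentage of the baseline system's score.

    All scores must come from the same chart (same settings, same drivers);
    the function itself cannot verify that.
    """
    base = scores[baseline]
    return {name: round(100.0 * value / base, 1)
            for name, value in scores.items()}


if __name__ == "__main__":
    # Hypothetical scores from a single chart, for illustration only.
    chart = {"system_a": 80.0, "system_b": 100.0, "system_c": 90.0}
    print(relative_to_baseline(chart, "system_a"))
```

Percentages computed this way remain meaningful even when the absolute numbers are lower than those in reviews run at default quality settings.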




Final Fantasy XI



The story changes with Final Fantasy XI. Since this is not an SLI-certified game, enabling multi-GPU support in Final Fantasy XI actually decreases performance! Here, it was the Athlon64 FX-57 that produced the highest scores, and as a reminder, these scores were obtained with the texture filtering set to the highest quality, one step above the default.

Importantly, there was a significant difference in performance between the high-bandwidth/high latency RAM and the low-latency/standard bandwidth RAM. For Final Fantasy XI, going with OCZ’s DFI Special RAM resulted in a faster performance than standard 2-2-2-5 RAM on both the Athlon 3500+ and Athlon FX-55.



SPEC ViewPerf 8.01



Here is the official SPEC description for 3dsmax-03

The 3dsmax-03 viewset was created from traces of the graphics workload generated by 3ds max 3.1. To insure a common comparison point, the OpenGL plug-in driver from Discreet was used during tracing.

The models for this viewset came from the SPECapc 3ds max 3.1 benchmark. Each model was measured with two different lighting models to reflect a range of potential 3ds max users. The high-complexity model uses five to seven positional lights as defined by the SPECapc benchmark and reflects how a high-end user would work with 3ds max. The medium-complexity lighting models uses two positional lights, a more common lighting environment.
The viewset is based on a trace of the running application and includes all the state changes found during normal 3ds max operation. Immediate-mode OpenGL calls are used to transfer data to the graphics subsystem.

There’s really “no contest” when it comes to 3D Studio Max. The Quadro FX4400 is significantly faster than the GeForce. In fact, I would expect real-world performance to be even faster once you use NVIDIA’s MAXtreme driver set. On the other hand, a pair of 6800 Ultras ($1000) performs at about 50% of the speed of the FX4400 (~$2000). Dollar for dollar, then, the performance is fairly similar, so it’s not unreasonable to go with a GeForce given the lower price. Low latency RAM ends up being more important for 3dsmax.
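The dollar-for-dollar comparison can be written out as a small calculation. This sketch uses only the approximate figures quoted above (a ~$2000 FX4400 as the 100% reference, and a $1000 pair of 6800 Ultras at about 50% of its speed); the function name is just for illustration.

```python
def perf_per_dollar(relative_speed, price_usd):
    """Relative performance delivered per dollar spent."""
    return relative_speed / price_usd


# Approximate figures from the text: the FX4400 is the 1.0 reference.
quadro_fx4400 = perf_per_dollar(1.0, 2000)
geforce_6800_ultra_pair = perf_per_dollar(0.5, 1000)

if __name__ == "__main__":
    print(f"Quadro FX4400:      {quadro_fx4400 * 1000:.2f} per $1000")
    print(f"6800 Ultra (pair):  {geforce_6800_ultra_pair * 1000:.2f} per $1000")
```

Both work out to the same performance per dollar, which is why the choice comes down to how much absolute speed in 3ds max is worth to you rather than to value alone.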


Here’s the official description for the CATIA-01
The catia-01 viewset was created from traces of the graphics workload generated by the CATIA™ V5R12 application from Dassault Systemes.
Three models are measured using various modes in CATIA. Phil Harris of LionHeart Solutions, developer of CATBench2003, supplied SPEC/GPC with the models used to measure the CATIA application. The models are courtesy of CATBench2003 and CATIA Community.
The car model contains more than two million points. SPECviewperf replicates the geometry represented by the smaller engine block and submarine models to increase complexity and decrease frame rates. After replication, these models contain 1.2 million vertices (engine block) and 1.8 million vertices (submarine).
State changes as made by the application are included throughout the rendering of the model, including matrix, material, light and line-stipple changes. All state changes are derived from a trace of the running application. The state changes put considerably more stress on graphics subsystems than the simple geometry dumps found in older SPECviewperf viewsets.
Mirroring the application, draw arrays are used for some tests and immediate mode used for others

Once again, the Quadro outclasses the GeForce. The performance benefit from SLI Quadro is relatively small. The benefit of low-latency RAM appears to be a theme for ViewPerf.

Here’s the official description for the Ensight-02
The ensight-02 viewset has been updated to bring it closer to the behavior of the real application data stream. The new viewset provides the ability to compare display list and immediate mode paths, and additional quality checks for display list results. It represents engineering and scientific visualization workloads created from traces of CEI's EnSight application.
CEI contributed the models and suggested workloads. Various modes of the EnSight application are tested using both display-list and immediate-mode paths through the OpenGL API. The model data is replicated by SPECviewperf 8.1 to generate 3.2 million vertices per frame.
State changes as made by the application are included throughout the rendering of the model, including matrix, material, light and line-stipple changes. All state changes are derived from a trace of the running application. The state changes put considerably more stress on graphics subsystems than the simple geometry dumps found in older viewsets.
Mirroring the application, both immediate-mode and display-list modes are measured. See notes on specific tests for more information.

The graph speaks for itself.



The official description for light-07
The light-07 viewset was created from traces of the graphics workload generated by the Lightscape Visualization System from Discreet Logic. Lightscape combines proprietary radiosity algorithms with a physically based lighting interface.
The most significant feature of Lightscape is its ability to accurately simulate global illumination effects by precalculating the diffuse energy distribution in an environment and storing the lighting distribution as part of the 3D model. The resulting lighting "mesh" can then be rapidly displayed.

Here, there’s only a minimal advantage for SLI Quadro’s over non-SLI Quadro’s. The poorer performance of the X2 with 7800 GTX suggests that we’re CPU limited here. Notice that the X2 4200+ is just a hair faster than the Venice single core 3500+. I’d be willing to bet that the small difference in performance is due to the second core taking care of the housekeeping tasks.


Here is the official description:
The maya-01 viewset was created from traces of the graphics workload generated by the Maya V5 application from Alias.
The models used in the tests were contributed by artists at NVIDIA. Various modes in the Maya application are measured.
State changes as made by the application are included throughout the rendering of the model, including matrix, material, light and line-stipple changes. All state changes are derived from a trace of the running application. The state changes put considerably more stress on graphics subsystems than the simple geometry dumps found in older viewsets.
As in the Maya V5 application, array element is used to transfer data through the OpenGL API.

The fact that a single Quadro FX4400 outperformed an SLI Quadro FX4400 seems a bit odd, although it may simply be due to the fact that the SLI drivers for the Quadro are still new and unoptimized. We’ve heard of many users attempting to use a GeForce 6800 Ultra for Maya as opposed to a Quadro, but these results suggest the Quadro is the way to go.



Here is the official description for the ProE-03 viewset:
The proe-03 viewset was created from traces of the graphics workload generated by the Pro/ENGINEER 2001™ application from PTC.
Two models and three rendering modes are measured during the test. PTC contributed the models to SPEC for use in measurement of the Pro/ENGINEER application. The first of the models, the PTC World Car, represents a large-model workload composed of 3.9 to 5.9 million vertices. This model is measured in shaded, hidden-line removal, and wireframe modes. The wireframe workloads are measured both in normal and antialiased mode. The second model is a copier. It is a medium-sized model made up of 485,000 to 1.6 million vertices. Shaded and hidden-line-removal modes were measured for this model.
This viewset includes state changes as made by the application throughout the rendering of the model, including matrix, material, light and line-stipple changes. The PTC World Car shaded frames include more than 100MB of state and vertex information per frame. All state changes are derived from a trace of the running application. The state changes put considerably more stress on graphics subsystems than the simple geometry dumps found in older viewsets.
Mirroring the application, draw arrays are used for the shaded tests and immediate mode is used for the wireframe. The gradient background used by the Pro/E application is also included to better model the application workload.

Here, the performance of a Quadro FX4400 is about 3x the speed of the GeForce 6800 Ultra. If Pro/E is your primary application, there’s essentially no excuse not to get a Quadro FX.



Here is the official description:
The sw-01 viewset was created from traces of the graphics workload generated by the Solidworks 2004 application from Dassault Systemes.
The model and workloads used were contributed by Solidworks as part of the SPECapc for SolidWorks 2004 benchmark.
State changes as made by the application are included throughout the rendering of the model, including matrix, material, light and line-stipple changes. All state changes are derived from a trace of the running application. The state changes put considerably more stress on graphics subsystems than the simple geometry dumps found in older viewsets.
Mirroring the application, draw arrays are used for some tests and immediate mode used for others. See notes on specific tests for more information


Interestingly, the high-bandwidth, high-latency RAM takes the lead here for the first time in the SPEC ViewPerf benchmark suite.



The official description
The ugs-04 viewset was created from traces of the graphics workload generated by Unigraphics V17.
The engine model used was taken from the SPECapc for Unigraphics V17 application benchmark. Three rendering modes are measured -- shaded, shaded with transparency, and wireframe. The wireframe workloads are measured both in normal and anti-aliased mode. All tests are repeated twice, rotating once in the center of the screen and then moving about the frame to measure clipping performance.
The viewset is based on a trace of the running application and includes all the state changes found during normal Unigraphics operation. As with the application, OpenGL display lists are used to transfer data to the graphics subsystem. Thousands of display lists of varying sizes go into generating each frame of the model.
To increase model size and complexity, SPECviewperf 8.0 replicates the model two times more than the previous ugs-03 test.

There’s actually nothing special to see here. The Quadro FX4400s so completely outclass the GeForce 6800 Ultra line-up that our tests really weren’t needed.

SPEC ViewPerf Summary

While the Quadro FX4400 was able to hold its own against the GeForce 6800 Ultra in gaming benchmarks, when it came to professional work, the GeForce 6800 Ultra could barely keep up.

The SLI Quadro FX4400 isn’t something that has been benchmarked often. Recently, one of our colleagues reported recording some of the highest SPEC ViewPerf 8.01 scores he had ever seen on a manufactured flagship workstation. Our numbers are even faster than those that were published. Given the performance benefits seen with low-latency RAM, we probably could have achieved even faster SPEC ViewPerf scores had we gone with Corsair’s Registered 2-3-2-6 XMS DDR400 instead of making our “maximum stability” decision to go with the JEDEC-certified 3-3-3-8 RAM.



SiSoft Sandra 2005Page:: ( 12 / 16 )








Interestingly, the Athlon64 X2 4200+ had poorer memory bandwidth than any of the other CPUs. Incorrect placement of the RAM was not the cause: moving the modules to the non-dual-channel slots made performance even slower, in the range of 2000MB/sec. Given how new the Athlon64 X2s are, it may be a BIOS issue.




Photography BenchmarksPage:: ( 13 / 16 )








With Capture One, the dual Opteron 252 proves to be the fastest system available, but the Athlon64 X2 is still a superstar. High-bandwidth/high-latency RAM and low-latency/standard-bandwidth RAM appeared to perform similarly.



With Bibble, the more cores you can throw at the application, the faster it gets. Again the benefit from low-latency RAM is split. The dual-core Athlon64 X2 again shows impressive results.



Like Bibble, Noise Ninja is capable of taking advantage of 4 CPU cores, and so the dual Opteron 275s are at the top of the pack.






Photoshop’s RAW processing engine does not appear to take advantage of multiple CPUs as well as Bibble does. Nevertheless, it is interesting to see that the Athlon64 X2 4200+ is notably faster than the dual Opteron 252, despite the Opteron having a faster core clock speed and significantly higher bandwidth. Although we clearly need to run more tests, it may be that this test is CPU-bound on the single-core Athlons but memory-bound on the Athlon64 X2 4200+, where low latency is valued.




Digital Video PerformancePage:: ( 14 / 16 )




When it comes to multimedia encoding, the benefit of memory bandwidth is enormous. In the WMV-HD to MPEG-2 HD conversion, the first step is dependent on Microsoft’s WMV decoding software. However, in our second test that converts an uncompressed high-definition AVI to a 1920x1080 MPEG-2 file, Canopus algorithms are used throughout the transcoding process. With the OCZ DFI special RAM, the Athlon 3500+ at 2.17GHz outperforms the Athlon64 FX-55 (2.6GHz) with 2-2-2-5 RAM. That’s quite impressive. We did not have time to run the Athlon64 X2 4200+ with OCZ RAM. It’s possible it might beat the Opterons!




The After Effects Benchmark shows that the Opteron platform performs well, but the Athlon64 X2 does a better job for the money. That said, video compositing using the onboard OpenGL hardware provides the best possible performance.



Again, we see that high-bandwidth RAM plays a significant role with video transcoding. In this case, the other impressive detail is that the Athlon64 3500+ comes so close to the Athlon FX-55. Tsunami’s Video Encoder software will take advantage of SSE3 multimedia instructions and this likely accounts for the strong performance of the Athlon 3500+. Once again the X2 4200+ seems to work wonders.



Hard Disk PerformancePage:: ( 15 / 16 )









As we promised, the Hitachi T7K250 drives are remarkably fast. In a RAID configuration, they completely outclass a single Western Digital Raptor. In buffered read/write and random read tests, even a single Hitachi T7K250 running on SATA-II is able to outperform the Western Digital Raptor 74GB.

A pair of 250GB Hitachi T7K250 drives will set me back $250, and a pair of 160GB T7K250 drives will set me back $200. In comparison, a retail boxed WD Raptor 74GB is $250 and an OEM WD Raptor 74GB is $190.

Basically, the decision comes down to spending $250 for 500GB, or saving $60 and getting an OEM WD Raptor 74GB, which has only 15% of the capacity… and is slower.
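
For readers who want to redo that arithmetic with their own local prices, here is a minimal sketch in Python. The prices and capacities are the ones quoted above; the helper name cost_per_gb and the table layout are made up purely for this illustration.

```python
# Cost-per-gigabyte comparison using the prices quoted in this article.
# cost_per_gb is a hypothetical helper written only for this illustration.

def cost_per_gb(price_usd: float, capacity_gb: float) -> float:
    """Return the price paid for each gigabyte of capacity."""
    return price_usd / capacity_gb

# (label, total price in USD, total capacity in GB)
options = [
    ("2x Hitachi T7K250 250GB", 250, 500),
    ("2x Hitachi T7K250 160GB", 200, 320),
    ("WD Raptor 74GB (retail)", 250, 74),
    ("WD Raptor 74GB (OEM)", 190, 74),
]

for label, price, capacity in options:
    per_gb = cost_per_gb(price, capacity)
    print(f"{label:26s} ${price:>3}  {capacity:>3}GB  ${per_gb:.2f}/GB")

# Capacity of one Raptor relative to the 500GB Hitachi pair.
raptor_share = 74 / 500
print(f"Raptor capacity vs. the 500GB pair: {raptor_share:.0%}")
```

Swap in current street prices to see whether the gap has narrowed; the comparison says nothing about speed, which is what the graphs above are for.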



ConclusionPage:: ( 16 / 16 )

With these benchmarks, the graphs speak for themselves. All you need to do is look at the applications you run most and check how each system performs in them. That said, there are a few general themes:

1. Opterons are fast, even for games.

Although games do not take advantage of multiple processors at this time, it’s pretty clear that the Opteron 252 at 2.6GHz performs comparably to the Athlon FX-55 at 2.6GHz. This suggests that the performance losses from running Registered DDR-RAM are countered by the second CPU’s ability to handle the “housekeeping” tasks.

2. The Athlon64 X2 is a superstar

As fast as the Opterons are, the Athlon64 X2 was a true superstar. The X2 4200+ was always in the same ballpark as the dual Opteron 252s, but at 1/3 to 1/4 of the price! We haven’t had a chance to bring dual-core Pentiums into this benchmark, but at least among AMD’s own line-up the Athlon64 X2 is amazing. Without the need to run registered RAM, and with the ability to go with exotic technologies such as 2-2-2-5 or OCZ DFI-customer RAM, our bottom-of-the-line Athlon64 X2 4200+ is able to outperform the more expensive FX-57 in all of our digital video and digital photography tests. Higher-end Athlon64 X2 models should be even better. The combo of the X2 4200+ with the 7800 GTX gives you great gaming performance and great workstation performance.



3. The Hitachi T7K250 is also a superstar.

The Hitachi T7K250 is nothing short of amazing. A single T7K250 running on SATA-II (300MB/sec with NCQ) on our nForce4 SLI board was able to outperform the Western Digital Raptor 74GB in the majority of our synthetic tests. At the moment, we can find no reason to recommend the WD Raptor over the Hitachi – if reliability is a concern, a RAID-1 mirrored array of two T7K250s still gives the better value for the money. Of course, a drive with a 10,000 rpm spindle, SATA-II 300MB/sec, and NCQ might be even better!

4. Quadro FX4400s are fast for everything

You have to remember that the GeForce 6800 Ultra boards I used were not your traditional GF6800 Ultras; they were the flagship products from BFG Tech. BFG knows what they’re doing and they were one of the first board manufacturers to rely upon NVIDIA’s drivers as their reference standard rather than try to waste time and rebadge the Detonator drivers under their own custom settings. Of course, this means that the GF6800 Ultra was clocked at higher speeds than your non-BFG Tech GF6800 Ultras.

In games, the Quadro FX4400 held its own against the game-optimized GeForce 6800 Ultra GPUs and for workstation applications, there was no contest in the superiority of the Quadro FX4400.

While we wouldn’t recommend buying Quadro FX4400s for gaming systems, this is clearly something to keep in mind if your company is offering to buy a workstation for you and is reluctant to spend the extra cash on upgrading to a “gaming GPU” such as the GeForce 6800 Ultra, but is completely open to buying you a high-end Quadro (don’t laugh – I had MANY emails after the last build guide discussing the same thing).

5. Low-latency/standard bandwidth and high-bandwidth/high-latency RAM have non-overlapping talents

For most benchmarks, low-latency 2-2-2-5 PC3200 RAM performed very similarly to “PC4650” running at 3-4-4-10. When working with video transcoding software, however, the high-bandwidth RAM seems to trump everything. The worst-case scenario for the high-bandwidth RAM was a 7% performance loss in comparison to the low-latency 2-2-2-5 RAM, but for video transcoding the low-latency/standard-bandwidth RAM was a staggering 38% slower than the high-bandwidth/high-latency RAM. Clearly, it’s important to understand your task. We like the fact that OCZ Technology understands this and offers both high-bandwidth/high-latency RAM and low-latency/standard-bandwidth RAM, allowing users to buy the right RAM for their tasks. Corsair is focused more on low latency than on high bandwidth; however, they do produce reliable modules and come up with exotic bling like the LED-driven XPERT series.
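
To put rough numbers on the two kinds of RAM, here is a back-of-the-envelope sketch in Python. It is theory only, not a measurement from our test bench. It assumes the usual DDR naming convention (a “PC” rating is the peak MB/sec of one 64-bit channel), assumes the first three timing numbers are CL-tRCD-tRP, and assumes dual-channel operation. “PC4650” is not a JEDEC grade, so its clock here is derived from the label rather than read from the modules, and the function name ddr_figures is made up for this illustration.

```python
# Theoretical DDR figures from a "PC" rating and a timing string.
# Assumptions (not measurements): "PC<N>" = N MB/sec peak per 64-bit channel,
# timings are CL-tRCD-tRP-tRAS, and the board runs the modules dual-channel.

BYTES_PER_TRANSFER = 8  # one 64-bit channel moves 8 bytes per transfer

def ddr_figures(pc_rating_mb_s, timings, channels=2):
    """Return clock, CAS latency, row-miss latency and peak bandwidth."""
    cl, trcd, trp = timings[:3]
    # DDR performs two transfers per clock cycle.
    clock_mhz = pc_rating_mb_s / BYTES_PER_TRANSFER / 2
    ns_per_cycle = 1000 / clock_mhz
    return {
        "clock_mhz": clock_mhz,
        "cas_ns": cl * ns_per_cycle,
        # Precharge + activate + CAS: the cost of hitting a closed row.
        "row_miss_ns": (trp + trcd + cl) * ns_per_cycle,
        "peak_mb_s": pc_rating_mb_s * channels,
    }

modules = {
    "PC3200 @ 2-2-2-5": (3200, (2, 2, 2, 5)),
    "PC4650 @ 3-4-4-10": (4650, (3, 4, 4, 10)),
}

for name, (rating, timings) in modules.items():
    f = ddr_figures(rating, timings)
    print(f"{name:18s} {f['clock_mhz']:.1f}MHz  "
          f"CAS {f['cas_ns']:.1f}ns  row miss {f['row_miss_ns']:.1f}ns  "
          f"peak {f['peak_mb_s']:.0f}MB/sec")
```

On paper, the higher clock nearly cancels out the looser CAS setting, so the latency penalty shows up mainly when a new row has to be opened, while streaming workloads such as transcoding can lean on the extra peak bandwidth. Real applications never reach these peak figures, so treat the output as a guide to the trade-off rather than a prediction of the graphs.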

Wrap Up

That concludes day 4 of our 2005 Eternal Battle. I think it’s pretty clear that even though we spent $4k on the parts for our ultimate desktop and $9k for our ultimate workstation, the Athlon64 X2 4200+ with the 7800 GTX proves that you don’t have to decide between gaming and work – you can enjoy both, and the best-value system is an approach worth taking. The concept of $600 for your CPU and $600 for your GPU seems like a great approach.

Our final article will be going up next week. The topic is related to the Athlon 64 X2 and GeForce 7800 GTX (which you probably could have guessed), but the real question is what kind of system it’s going to be… how much will we spend? What new toys will we discuss? Come back tomorrow for the final article of this year’s Eternal Battle.

© Copyright 2003 FS Media, Inc.