Summary: Find out what it takes to build a high-end workstation by following Alan’s real-life progress. Oh, and along the way, we take a detour to comment on the first gaming benchmarks of the Quadro FX 2000 with the never-before-seen 42.81 Detonator drivers. See how the Quadro FX performs in this article!
We’ve all built or tweaked custom gaming systems -- if you hadn’t, you wouldn’t be reading FiringSquad. Your custom-built gaming system probably handles your non-gaming tasks just as impressively, so if you ever need to build an office PC for someone else, your experience will serve you well.
However, what happens when someone asks you to advise them on building a high-performance PC where the focus is not gaming or even office applications, but honest-to-goodness computation? In that case, you need to throw out everything you know about gaming PCs: building a workstation takes a different mindset. Recently, I had an opportunity to do just that for a medical lab that relies heavily on finite element analysis. The details are unimportant; just think of it as something similar to the crash-test studies a car manufacturer runs, only on a smaller scale.
Our lab’s current computation system is an SGI Octane2 with dual R12000 400MHz CPUs, 4GB of RAM, and V10 graphics. These systems were around $40,000 when first released. Each R12000 400MHz has a SpecFP2000 score of around 350-360, which puts it approximately equal to an Athlon 1.2GHz. The caveat is that the SpecFP2000 benchmark is actually made up of a bunch of smaller tests. For computational fluid dynamics or neural network image recognition, the 400MHz SGI CPU is 2.5 to 5 times faster than the Athlon! For crash-test simulations, the Athlon is almost 2 times faster than the MIPS CPU. From the SpecFP2000 score alone, it’s hard to tell whether a PC or an SGI machine would be faster for our own research. You also have to account for operating system overhead, the crossbar architecture of the SGI, and other little details such as the MIPS CPU’s 2MB of L2 cache. In addition, it’s hard to know exactly how closely a custom app will track the synthetic benchmarks.
That said, our lab still needed to move to x86. SGI equipment was too expensive, and although we have some custom-written software for IRIX, a good amount of our work involves MATLAB. In January, the developers of MATLAB announced that they were abandoning the SGI platform. That gave me the green light to build a high-end PC.
In this article, I’ll describe the system I built. Along the way, I’ll explain why I picked certain components and why it’s different from what I’d pick in a gaming system.
SIDEBAR: Thanks to software such as Inferno, SGI systems are still the leaders for cinematic video compositing.
Pentium 4 versus Athlon
No one gets fired for buying Intel…no one gets fired for buying Dell… but at one point in time, no one got fired for using i820 Rambus.
Athlon MP versus Athlon XP
Techies all know that these two CPUs are the same, right? They’re wrong.
Dealing with power
We have our CPU. Now we need a motherboard. Before I go on, let me sidestep a bit and talk about stability. Recall from December’s article that the power supply plays a huge role in stability. For the workstation I was building, I wanted to go with an EPS12V power supply as opposed to ATX12V.
For our motherboard, we went with the Tyan Thunder K7X Pro, for a few reasons. When it comes to EPS12V-based dual AMD systems, Tyan had the most experience of the motherboard manufacturers we considered -- Google and Yahoo run on Tyan motherboards. In addition, the Thunder K7X Pro gave us two on-board Ethernet controllers, one of them Gigabit Ethernet. This was important to us since the system may become a node in a larger cluster in the future.
I went with PC2100 memory, not PC2700 or PC3200. Why? It’s not that PC2100 DDR was all I needed; if you were building a gaming or office PC around 266MHz DDR-RAM, you’d still shop for the memory with the highest rating. What makes a workstation or server different is registered ECC memory.
Two to twelve times each year, a bit in memory gets inappropriately flipped. This can be caused by cosmic rays flying through your RAM or a decay of the minute radioactive isotopes found in your RAM – the impurity need only be a single atom. Most of the time, this flipped bit is unimportant. Maybe it’s a flipped bit in unallocated memory, or maybe it just altered the position of a pixel for a fraction of a second. If you’re unlucky though, this flipped bit can alter critical data and cause your system to crash. In our situation, a flipped bit could potentially alter our results significantly.
ECC memory provides error checking and correction facilities. By adding an extra memory chip, ECC memory makes your memory act in a similar way to a RAID array. In other words, when that lone bit is inappropriately altered, the memory is able to detect this failure and correct it.
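To make the single-bit correction idea concrete, here is a toy sketch in Python. Real ECC DIMMs use a wider SECDED code over each 64-bit word (that’s what the extra chip holds), but the principle is the same as this classic Hamming(7,4) code: three parity bits protect four data bits, and any single flipped bit can be located and flipped back.

```python
# Toy Hamming(7,4) code: 4 data bits + 3 parity bits.
# Any single flipped bit in the 7-bit codeword can be located and corrected.

def encode(d):                       # d: list of 4 data bits
    p1 = d[0] ^ d[1] ^ d[3]
    p2 = d[0] ^ d[2] ^ d[3]
    p3 = d[1] ^ d[2] ^ d[3]
    # codeword positions 1..7: p1 p2 d0 p3 d1 d2 d3
    return [p1, p2, d[0], p3, d[1], d[2], d[3]]

def correct(c):                      # c: 7-bit codeword, possibly corrupted
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]   # recompute each parity check
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s3  # 1-based position of the flipped bit, 0 if none
    if syndrome:
        c = c[:]
        c[syndrome - 1] ^= 1         # flip it back
    return [c[2], c[4], c[5], c[6]]  # recover the data bits

word = [1, 0, 1, 1]
code = encode(word)
code[4] ^= 1                         # a cosmic ray flips one bit
assert correct(code) == word         # the data still comes back intact
```

Note the parallel to parity RAID: the redundant bits don’t store new information, they let the hardware reconstruct what a single failure destroyed.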
FiringSquad has always had good experiences with Corsair memory, and so I went with two 512MB sticks of registered DDR memory. In general, it’s not a good idea to rely on a single DIMM in a high-end workstation. Even though Corsair offers a lifetime warranty, replacing a bad stick of memory still requires an RMA (shipping time) or a trip to Fry’s. Building the system with two memory sticks means you’ll still have a running machine while you wait for the part to be replaced.
But I made a mistake in selecting the memory. There is no such thing as “cost-is-no-object” -- you are always on a budget. In selecting the memory, I decided that a gig of RAM would be plenty. Of course, once the system was built and we started running more test applications, I realized that 1GB was just enough to feed one CPU (some of our datasets contain over 800MB of data). Our system needed at least 2GB of memory. By going with two 512MB DIMMs instead of two 1GB DIMMs, I limited the system’s upgrade potential to 3GB rather than 4GB (unless we’re willing to sacrifice the two 512MB DIMMs). Two lessons:
1. Make sure you know your requirements when you build the PC
2. You can never have too much memory … really.
SIDEBAR: Overheard: 1GB is enough for everyone.
What video card would you choose? In our system, we decided to go with the Quadro FX 2000. This card alone would double the cost of the system. Did we need the QuadroFX instead of a slower Quadro4-class card? Probably…but we certainly needed a Quadro instead of a GeForce or a FireGL instead of a Radeon. Do you know why?
The Athlon MP, EPS power supply, and ECC memory were all chosen for stability. If you answered stability as the reason for choosing the Quadro … you’re wrong. In terms of stability, your regular GeForce lineup is already excellent. If you answered that it was because we didn’t know about “SoftQuadro” or know that the GeForce line is 95% similar to the Quadro, you’re also wrong. We went with the Quadro FX because we needed that extra 5% that the Quadro platform offered over GeForce.
Admittedly, a fair amount of the Quadro’s performance comes through workstation-optimized drivers, but a hacked GeForce isn’t the same thing as a Quadro. You might read that the hacks work, and that performance is “pretty similar except for a few benchmarks.” The problem is that the exceptions are important in true workstation applications, and I’ll prove this using real world data. The Quadro line of cards adds hardware:
Hardware Anti-Aliased Lines
In workstation applications, you’re often working with lines rather than polygons and textures. Traditional multisampling or supersampling anti-aliasing algorithms may not be the best choice for rendering points or lines. The Quadro lineup features hardware line anti-aliasing. Unfortunately, it’s not a feature that can be forced on.
Accelerated Clip Regions
Workstation 3D is almost always windowed 3D and therefore, it’s not uncommon to have multiple windows open. During a typical use, applications may pop up many windows containing other 3D scenes or menus. On regular gaming cards, overlapping windows can noticeably affect graphics performance.
Subpixel Precision
The advantage of subpixel precision is just like the advantage of having floating-point color buffers: lines are drawn more precisely in 3D, reducing artifacts. Imagine you had to draw a line from (0.5, 0.13) to (35, 50). You cannot draw a pixel at non-integer locations; subpixel precision provides the ability to keep the geometry calculations at a higher precision. The nice thing is that it’s automatically turned on -- you don’t need to wait for new software. That’s all I need to say about this feature because the pictures will explain everything. In these examples, I am using my real data sets and not a mythical “worst-case scenario.” The blue lines on the red surface represent the angle of the myocytes at that location, and subpixel accuracy is critical for positioning the lines on the surface, and not below it.
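The effect is easy to demonstrate with a little arithmetic. The sketch below is plain Python, not GPU code, and is only an analogy for what fixed-point rasterization hardware does: it interpolates the example line once with its fractional endpoints intact, and once with the endpoints snapped to integer pixel centers first. The two versions land on different pixel rows in places.

```python
# Toy illustration of subpixel precision: the same line interpolated with
# fractional endpoints kept vs. with the endpoints rounded to integers first.

def y_at(x, x0, y0, x1, y1):
    """Exact y on the line through (x0, y0)-(x1, y1) at column x."""
    return y0 + (x - x0) * (y1 - y0) / (x1 - x0)

# The line from the text: (0.5, 0.13) to (35, 50).
# Without subpixel precision, the endpoints are effectively snapped to
# the nearest pixel centers, here (0, 0), before interpolation.
for x in range(0, 36, 5):
    subpixel = y_at(x, 0.5, 0.13, 35, 50)
    snapped = y_at(x, 0, 0, 35, 50)
    if round(subpixel) != round(snapped):
        print(f"column {x}: row {round(subpixel)} vs. row {round(snapped)}")
```

Losing the fractional bits shifts where the line lands, and in 3D the same loss of precision is exactly how a line ends up dipping below the surface it should sit on.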
With a card such as the Matrox G400, there is limited subpixel accuracy, and as you can see, some of the lines are completely missing from the image. Subpixel precision is different from anti-aliasing. Anti-aliasing is still very important, and it complements subpixel precision. The Quadro FX was run with 4x anti-aliasing while the Radeon 9700 image uses 6x FSAA.
Here, the “shimmer” of the Radeon’s image is caused by lower subpixel accuracy. The blue “shell” representing the elements looks better on the Radeon thanks to its higher-sample AA. This isn’t the ideal test for the Quadro FX 2000, however, since line anti-aliasing is not a feature found in QuadroView.
We did not get a chance to ask NVIDIA about the subpixel accuracy of the standard GeForce FX, but this does bring up an interesting point. Precision through the 3D pipeline is costly. It takes up die space and eats away at performance. In the case of something such as subpixel accuracy, it is not a feature that can be turned off easily nor is it a feature that the typical user will notice.
To a certain degree, our datasets do represent unique cases. In a game, subpixel accuracy means that polygon gaps will be minimized. You’d never have lines coming across the surface though – it’d probably be a texture. Again, this is an example of how in scientific computing, the visualization is often of hard data that can’t be fudged.
SIDEBAR: Quadro FX supports twice as many pixel shader instructions as GeForce FX
There are a number of reasons why people would want a Quadro FX. It’s unmatched when it comes to subpixel accuracy, and the card offers incredible performance. On my Athlon XP 2000+, my SpecViewPerf 7.0 scores are:
3dsmax-01 Weighted Geometric Mean = 18.13
drv-08 Weighted Geometric Mean = 80.05
dx-07 Weighted Geometric Mean = 78.35
light-05 Weighted Geometric Mean = 20.51
proe-01 Weighted Geometric Mean = 28.27
ugs-01 Weighted Geometric Mean = 28.26
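For reference, each viewset score above is a weighted geometric mean of the frame rates of that viewset’s individual sub-tests. A quick sketch of the computation (the frame rates and weights here are made up for illustration, not actual SPECviewperf data):

```python
import math

# Weighted geometric mean: score = prod(fps_i ** w_i), weights normalized to 1.
# This is the aggregation SPECviewperf uses for a viewset's sub-test results.
def weighted_geomean(fps, weights):
    total = sum(weights)
    return math.exp(sum((w / total) * math.log(f) for f, w in zip(fps, weights)))

# Hypothetical sub-test frame rates and weights for one viewset.
fps = [40.0, 25.0, 60.0]
weights = [0.5, 0.3, 0.2]
print(round(weighted_geomean(fps, weights), 2))
```

The geometric mean rewards balanced performance: one very slow sub-test drags the score down multiplicatively, which a plain average would hide.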
The Unigraphics viewset (ugs-01) has been one of the FireGL X1’s strongest benchmarks (more than doubling the Quadro4 900), and yet the score of 28.26 on my XP2000+ handily beats published results of 19.10 on a FireGL X1 on a Pentium4 2.2GHz.
The CineFX architecture is also a big deal. Perhaps you have an application needing 128-bit color or Shader 2.0+ technology. The Quadro FX is the only way you’ll get shaders 2048 instructions long! But wait, you think: what good is a card that handles 2048 instructions when shaders are going to be “dumbed down” so that they’ll run on regular consumer cards? Two answers. First, Cg is a just-in-time compiler, which means that high-level code can be compiled differently for the Quadro FX than for a consumer card. Second, a lot of people will never even use the Quadro FX for game development; shader technology is already being used by 3D CAD applications such as SolidWorks to improve visualization.
Why a FireGL
Suppose you don’t need the performance of the Quadro FX or the added subpixel precision. Would the ATI Radeon 9700-based FireGL X1 be a good choice? Sure -- in fact, there are a few obvious reasons why someone would want a FireGL.
Everything comes back to the lesson of the day: know your task. Professional 3D graphics isn’t just about hardware -- it’s also the software. Quadro cards come bundled with a number of workstation applications, one being QuadroView. QuadroView is an SGI Open Inventor viewer that I can use to visualize my data sets…an utterly useless application to the gamer, but critical to me. The FireGL products offered no such bundle. In addition, QuadroView was twice as fast as the demo of TGS 3Space Assistant (a commercial Open Inventor viewer for Windows), and a freeware Open Inventor viewer from dev-gallery.com wouldn’t even open the data sets produced on the SGI.
Even though this card will ultimately be used for research work, I couldn’t let it sit idle over the weekend, so I decided to try the card in my home system. Since workstations have better case cooling, NVIDIA decided that FX Flow was not necessary. The Quadro FX 2000 cooling solution is still dual-slot, but it is closer to a traditional heatsink-and-fan setup, very similar to the GeForce4 Ti cooler. The difference is that 12 inches away from the case, the A-weighted sound level is only 52dB -- roughly the intensity of very soft music. My case features six fairly quiet temperature-controlled cooling fans, and the background noise outside my apartment was 55dB. Suffice it to say that when I first booted my computer with the Quadro FX in it, I was worried that something had failed -- it was just too quiet. At idle, the GPU was only 51-52 degrees C with the ambient (case) temperature at 47.
First of all, my test platform isn’t very impressive, so keep that in mind when reviewing the benchmarks…
AMD AthlonXP 2000+ on Albatron KT333
512MB Kingston ValueRAM DDR2100 running at 333MHz CAS 2.5
Enlight 360W power supply
I’ve run benchmarks at high resolutions when possible to minimize the influence of the CPU. By default the Quadro FX 2000 operates at 300/600MHz in 2D mode, and 400/800MHz in 3D performance mode. The new Detonators allow “auto-detection” of the optimal overclocking speed. This was determined to be 468/937. The GeForce FX 5800 Ultra runs at 500/1000. Here are the results we obtained with the card overclocked to 468/937:
Unreal Tournament 2003 Demo – Benchmark
The UT2003 flyby is a GPU limited benchmark, and so these numbers have the Quadro FX about 10% faster than a Radeon 9700, but just 2% shy of the Sapphire ATLANTIS running at 337.5/330.8.
Unreal Tournament 2003 Demo – Flyby Antalus
Most sites (including FiringSquad) use the UT2K3 benchmark feature, but some of our colleagues like to use the individual tests. Tom’s Hardware only obtained 93.7fps at 1600x1200x32, but they had an Athlon XP 2700+ on an nForce2 running DDR2700 CAS2! That said, it seems as if the latest drivers have primarily improved Unreal Tournament. My Q3A 1.17 Demo001 score at 1600x1200 was 169.1fps -- a Radeon 9700 on an XP1800+ does 189.6fps.
No AA, No AF (1024x768): 185fps and 60.9fps
What is interesting
I’m getting solid performance from a GPU that never runs past 63C and never enters “high fan speed mode.” This raises the question of how much headroom the NV30 GPU itself has, and how much it is being held back by memory bandwidth. There’s also the question of how an NV30-based card will perform in the future, when 3D applications become shader-limited rather than memory-bandwidth-limited. Why is it that I have a Quadro FX 2000 in a quiet system running incredibly stable without an FX Flow? Maybe I was lucky and got a golden sample, or maybe we are seeing one of the advantages of the Quadro FX’s longer development time. Time will tell.
So what else did we choose?
Western Digital 80GB Caviar with 8MB Cache
Sure, in an ideal world we’d run Ultra320 SCSI -- we’ve already got plenty of SCSI devices from our SGIs -- but again, price is always an issue. One thing to consider is RAID (even software RAID). Our backup policies are good enough that it was OK to skip it.
Lite-On 48x24x48 CD-RW
Plextor is king, but I’ve had great success with my own Lite-On 48X CD writer. DVD-RW and DVD+RW drives are now affordable; the media, however, is still expensive.
Teac 1.44 floppy
Don’t forget the floppy! Teac has been making 1.44 floppies since they were invented, and Teac has historically produced floppy drives with the best reliability.
Philips Seismic Edge 5.1
You always need a sound card in a Windows system; speakers are optional. A handful of Windows applications demand the presence of a sound device, and if no sound card is installed, the application will simply quit with an error message (bad programming). The sound card is there to preemptively address this problem. The Philips Seismic Edge came recommended by Tyan. Since my sound card was primarily for compatibility reasons, stability was key, and Philips has had a stellar record with their current Windows 2000/XP drivers on dual-processor systems. One significant caveat is that Linux isn’t supported, but we can get by. With the increased popularity of Linux among gamers, we certainly hope that Philips will at least produce basic drivers that support MP3 playback.
Don’t forget the keyboard and mouse either!
Chieftec DX-01 Chassis
I needed a large case to support the extended ATX motherboard. One mistake I made here was forgetting that I would need to buy longer IDE cables.
When you’re building a system for someone else, always think about what it is going to be used for and make sure you spec out the system appropriately. Is ECC memory something that’s going to be important? Will professional 3D accelerators be needed instead? Is the Athlon or Pentium4 more appropriate for the task? Do your research, be confident, and spend the money when it’s necessary.
|© Copyright 2003 FS Media, Inc.|