
Building the Ultimate High-End Gaming Workstation: Stage 2
October 20, 2003 Alan and Alexis Dang

Summary: Adding more RAM modules than you need can slow your system down.

Ok, if that blurb doesn't interest you enough, how about CPU benchmarks using real-world digital photography suites and scientific computing benches that compare the Athlon64 3200+ against the recently built Opteron 246 and a pair of P4 rigs? That's all inside, along with a brief look at our Athlon64 build: you won't want to miss this article!


Introduction

Yesterday, you learned everything there was to know about building a true high-performance system. With the dual Opteron complete, we need a few systems to compare it against. Intel elected not to offer us Xeon CPUs for Xeon/i875P in this round-up, so today we'll be comparing the Dual Opteron 246 against the Athlon 64 3200+ and the Pentium 4. Don't worry, you won't be disappointed with the content when you're done.


For the rest of you techies, before we go on with the numbers, Alexis is going to add just a quick overview of the other systems we used in the test.

Athlon 64 Construction


Alexis: For our Athlon 64 test system, we wanted to create a budget but high-performance system. We wanted a system that is stable and well cooled, with upgrade potential. Let's start with the foundation: the case and power supply.

For some people, choosing a computer case has become as confusing as choosing the appropriate necktie to go with your shirt. Today, there are so many different models and colors of cases, leading some to believe that they are all the same; nothing could be further from the truth. Cases are often overlooked, but they can have a big impact on the stability of a system, especially when it comes to cooling.

Since this system is designed as a workstation, we will pass on the addition of a viewing window due to EMI concerns. We want a case with an ample number of drive bays, both 3.5" and 5.25". Personally, I don't like cases with only 5.25" drive bays as it requires adapters for the hard drives and does take up a little more space, but the advantage is that your hard drives do get more room to breathe and you can add more bay accessories.

For cooling, it used to be the more fans the better. We've moved beyond that to consider not just the quantity but the quality of airflow. The use of fewer, larger (120mm) fans is relatively new to the PC world, but the big workstations and servers from companies like SGI have been using these fans for a while now. With these bigger fans you are able to maintain a high CFM without a noise penalty.

The debate between steel and aluminum continues in the case world. As this is a budget system, we will go with a traditional steel case. You should be aware that there are many grades of steel. The most expensive is the steel that is shiny; the cheaper kind is matte. Even within these types of steel, the thickness may vary from 0.8mm to 1.2mm. Thicker cases offer better noise isolation and sturdier construction, but are heavier.


SIDEBAR: Sometimes I spend more time thinking up random facts and the next page tags than I do writing



Chenbro SR205

With the above requirements, we scoured the internet for potential cases and settled on the Chenbro SR205. Those of you who are hardcore case modders may be familiar with this case, as it was one of the first to offer front and rear 120mm fans. It even goes beyond that with an 80mm fan in front of the hard drives. The 120mm fan positions can also accommodate 92mm fans. As you all know, hard drives are becoming a more substantial source of heat in our systems. With a fan directed at the hard drives, the drives will be kept cool and hopefully have their MTBFs extended.

This case has an ample supply of drive bays, with five 3.5" bays and three 5.25" bays. The 5.25" bays use drive rails. I think drive rails are nice, but they are often proprietary, so if you lose the set that came with your case, you'll have to contact the manufacturer. This happened to us, and we are pleased to report that Chenbro was very prompt in responding to our RMA request and was also very helpful. Clearly they take pride in customer service and stand behind their products. The case also supports extended ATX motherboards, which not all cases do. Even if you don't have an extended ATX motherboard, this alone suggests that the case will provide adequate room for peripherals and will make working in it a little easier.

[image]


A unique feature of this case is the cardholder. At first this appears like a bulky air flow restrictor, but in fact it can be used to provide additional pressure on your PCI or AGP cards, preventing them from falling out during shipping or moving of the computer. This cardholder can accommodate just about any PCI card, except for the very low profile network adaptors. I think more cases should incorporate this to ensure that the cards stay put.

[image]

This case is not perfect, though. It is clearly designed for a business environment, as removing the side panels to access the interior involves first removing the top cover and then the side panel with a unique handle mechanism. It is not a screwless case either, but personally I prefer the security of old-fashioned screws to friction or spring-based retention mechanisms that may fatigue over time. This is a steel case, so it is not a lightweight. The steel is of good quality with a dull grey finish; this is a less expensive steel than the polished shiny steel in the Supermicro case. It uses 1 mm thick steel, and the side panels exhibit minimal flex. In addition, this case won't get too much attention sitting on your desk, which may be a good or bad thing, depending upon your preferences. I would also have liked to see some front USB and FireWire ports.

Mods

So no case is perfect, but the question becomes: what are you going to do about it? Well, we're taking our Dremel out. The plastic front panel of the Chenbro obscures the front 80mm fan intake. With the stock setup, this 80mm fan has a restricted intake path, reducing airflow and increasing noise. What we've done is simply cut an opening in the plastic panel to facilitate airflow. This is a common, previously described mod for the Chenbro SR205 case.

[image]



SIDEBAR: Does anyone really read these?


Power Supply

We're going with a 400W power supply for this single-CPU setup. It never hurts to have extra power on tap. The most important thing when shopping for a power supply is not just the reported wattage, but the distribution of this power. Old CPUs like the 486 ran primarily off the +5V rail; in contrast, the current generation of CPUs uses the +12V rail. So you need to make sure that the power supply has a robust +12V rail. Some power supplies even have a dedicated +12V rail for the processor. In fact, the new ATX12V version 1.3 specification calls for more robust +12V rails as compared to prior standards.

Alan: We made initial plans to go with a Fortron/Sparkle Power power supply, probably the best in the low-cost, high-performance price range. Fortron/SPI power supplies are among the best in the industry and you'll find them rebadged in black with a quiet fan as the Zalman PSU. In this system, we've used a SilenX.com 400W PSU. SilenX is a company in Southern California that has begun developing their own power supplies in-house. Normally, we'd be wary of an unproven design house; however, SilenX is building around the Fortron/SPI platform that we trust, and the manufacturing is handled by Fortron - this is the same way PC Power and Cooling custom designs their PSUs.

Basically, SilenX worked with Fortron and contracted power supplies with better heatsinks, larger capacitors, upgraded MOSFETs, and AC line noise filters. This should improve the stability of the power supply, and SilenX.com also reports having a more stable 12V rail. After production, SilenX adds their own fan design with silicone damping mounts to reduce the noise. One thing to consider is that the power supply fan is often not the primary source of your system noise, so unless you pair this SilenX power supply with silent system fans, your total system volume may not be much lower.

[image]


I had initially planned to use the 460W or 550W SilenX active PFC power supply. SilenX.com let us know that they're very conservative with their power ratings and so we've gone with their 400W Active PFC model.

Going to an Active PFC design with a power factor > .99 will improve the efficiency of the unit when compared to a passive PFC design (80% true efficiency as opposed to 76%). This means there will be just a little bit of extra power when it's needed. Since power supplies only draw power as needed, the Active PFC models should also reduce the current drawn from the wall. For anyone who leaves their system on 24/7, the small reduction in power use could potentially make a difference. Finally, active PFC models can handle voltage fluctuations better than passive PFC models and also remove the need for 110/220V switching, something rarely important, but worth noting. At the end of the day, the improved efficiency and the environmental benefit from Active PFC make it a worthwhile investment for $10.

That said, SilenX.com does believe in their lower-end PSUs and was fully confident that their 400W passive PFC PSU would be capable of driving even our high-performance system in our challenging test environments.
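To put rough numbers on the efficiency and power factor figures above, here is a minimal sketch of what each kind of supply would pull from the wall. The 80% and 76% efficiencies and the 0.99 power factor come from the text; the 200W system load, the 0.75 passive power factor, and the 115V mains voltage are assumptions picked purely for illustration, not measurements.

```python
def wall_draw(dc_load_w, efficiency, power_factor, mains_v=115.0):
    """Return (real watts, apparent volt-amps, RMS amps) drawn from the wall."""
    real_w = dc_load_w / efficiency        # conversion losses raise the AC-side draw
    apparent_va = real_w / power_factor    # a poor power factor inflates volt-amps
    amps = apparent_va / mains_v           # RMS current the wiring has to carry
    return real_w, apparent_va, amps


DC_LOAD_W = 200.0  # hypothetical system load, not a measured figure

# Efficiencies are the article's figures; the passive power factor is an assumption.
passive = wall_draw(DC_LOAD_W, efficiency=0.76, power_factor=0.75)
active = wall_draw(DC_LOAD_W, efficiency=0.80, power_factor=0.99)

for name, (watts, va, amps) in (("passive PFC", passive), ("active PFC", active)):
    print(f"{name}: {watts:.0f} W real, {va:.0f} VA apparent, {amps:.2f} A")
```

Under these assumptions the efficiency gain trims the real power you pay for, while the higher power factor is what cuts the current drawn from the wall.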

Before I move on to the next point, it's worth noting that SilenX.com is currently fighting a battle against Ahanix, ExoticPC, and ColorCase who are also selling "SilenX" power supplies. The only SilenX PSUs that we have experience with are those sold through only.


SIDEBAR: A front panel gameport is the ugliest case feature I've ever seen.


Motherboards

Motherboard

Alexis: The debate continues over the VIA K8T800 vs the nVidia nForce3 150 chipset. Benchmarks have both chipsets running neck and neck, so it really comes down to which motherboard has the features that you want. It's disappointing that Soundstorm hasn't made its way to the nForce3, and it seems like motherboard manufacturers aren't interested in getting Soundstorm support either. In the end, we just have to hope for a good software-based real-time Dolby Digital ICE to be possible.

The Asus K8V Deluxe motherboard that we chose uses the VIA K8T800 chipset and has an adequate inventory of features with USB 2.0, FireWire, Serial and Parallel ATA RAID, and on-board 3Com gigabit LAN. Just as important is that the Asus is priced among the less expensive Athlon 64 boards. It doesn't have a fan on the chipset heatsink or any fancy LEDs on the motherboard, but like we said above, it doesn't skimp on any features. Asus provides a slot for their Wi-Fi solution, which supports 802.11g. The advantage of this is that it can turn your system into a wireless access point. One nice feature is the SPDIF output on the back, which facilitates connection to a digital receiver.

[image]


Memory

You can't do much without enough memory. We used to think that you can never have too much memory, but we'll get to that a little later. Memory needs to be fast and stable. RAM that can be overclocked to crazy speeds but which gives you occasional errors will be useless in a workstation environment. The hardcore IT guys shoot for "five nines" reliability, or 99.999% uptime, which makes sense since their jobs are on the line if stuff fails. Experience has shown that Corsair's reputation for making fast, reliable RAM has been well earned. Theirs is not the lowest-priced memory, but it always performs as advertised and sometimes a little better. If you are pushing your system to the limit, you need to minimize the variables limiting your overclock; one way to do this is to get the fastest RAM possible.
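For a feel of what "five nines" actually allows, the quick sketch below converts an uptime percentage into permitted downtime per year. It assumes a 365-day year and counts every minute as scheduled service time, which is a simplification.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def downtime_minutes_per_year(uptime_fraction):
    """Minutes per year a system may be down at the given uptime fraction."""
    return (1.0 - uptime_fraction) * MINUTES_PER_YEAR


for label, uptime in (
    ("three nines", 0.999),
    ("four nines", 0.9999),
    ("five nines", 0.99999),
):
    minutes = downtime_minutes_per_year(uptime)
    print(f"{label} ({uptime:.3%}): {minutes:.1f} minutes of downtime per year")
```

At 99.999% that works out to a little over five minutes a year, which is why a single flaky DIMM can blow the whole budget.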

Today we are using the next generation of Corsair RAM, their XMS Pro 3200-LL RAM. The Pro designation refers to the LEDs on the RAM module that indicate RAM usage. Unfortunately we made a decision to go with a windowless case so we really can't take advantage of this, but it is rather nifty. The 18 LEDs show the activity of each bank on a module. In the corporate world though, I would probably have gotten the equivalent Corsair RAM without the LEDs to save costs, especially when building multiple systems. Corsair does say that this new DIMM design has a larger heatsink, but we've already got two 120mm fans in our case. :) Although we're not using the LEDs, the bonded heatsinks give the modules a hefty and sturdy feel; this inspires confidence when installing them as there is no flex of the module. These modules are taller than standard XMS modules, which could become a factor in smaller cases.

[image]



SIDEBAR: If a critical medical device had 99.999% uptime, every patient on it would die.


The Clawhammer

Athlon 64 3200+

Athlon 64 3200+, enough said. We went with the stock heatsink and fan with Cooler Master Premium heatsink compound (Shin Etsu). The AMD stock heatsink is rather simple, constructed of aluminum without any copper core and with only a standard fan. In our testing, it proved more than adequate in keeping the CPU cool. For extreme overclocking with voltage bumps, a larger heatsink may be needed. We'd go with the Zalmans that are used on our Opteron system. I prefer the aluminum-copper Zalmans to the all-copper ones because of the weight factor. In theory, with tower cases, there may be uneven stress on the heatsink mount since the forces that hold it onto the CPU are perpendicular to the weight of the heatsink. With a desktop case this would not be a factor. I would definitely consider the weight of a heatsink if you are continually transporting your system around.

All-copper heatsinks do have a performance advantage, not only from the higher heat conductivity of copper but also from the increased mass. Recall that heat capacity is rated per gram of mass, so a heavier heatsink will be able to absorb more heat from the CPU with a smaller change in temperature.
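The mass argument can be sketched with the basic relation Q = m * c * dT. The specific heats and densities below are standard textbook values for pure aluminum and copper; the 100 cm3 of metal and the 800 J burst of heat are hypothetical numbers chosen only for illustration, and the sketch ignores the heat the fan carries away.

```python
# Textbook material properties (approximate, pure metals)
SPECIFIC_HEAT_J_PER_G_K = {"aluminum": 0.897, "copper": 0.385}
DENSITY_G_PER_CM3 = {"aluminum": 2.70, "copper": 8.96}


def temp_rise_k(material, volume_cm3, heat_j):
    """Temperature rise of a block of metal that absorbs heat_j joules."""
    mass_g = DENSITY_G_PER_CM3[material] * volume_cm3
    thermal_mass_j_per_k = mass_g * SPECIFIC_HEAT_J_PER_G_K[material]
    return heat_j / thermal_mass_j_per_k


VOLUME_CM3 = 100.0  # hypothetical: two heatsinks of identical size
HEAT_J = 800.0      # hypothetical: a short burst of CPU heat

for metal in ("aluminum", "copper"):
    mass_g = DENSITY_G_PER_CM3[metal] * VOLUME_CM3
    rise = temp_rise_k(metal, VOLUME_CM3, HEAT_J)
    print(f"{metal}: {mass_g:.0f} g, warms by {rise:.2f} K")
```

Gram for gram, aluminum actually stores more heat than copper, but a copper heatsink of the same size is more than three times heavier, so it still warms up less for the same burst of heat.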

Alan: We've heard a lot about Athlon 64 3200+ CPUs being hard to find, and so we got our entire Athlon64 setup at retail as a secret shopper. There wasn't any hassle at all. We found the best out-the-door price for the CPU from www.directron.com and got it without any shipping delays. In fact, although we used the first-time buyer free shipping discount for a Wednesday ship date, the package was already on its way here a day ahead of schedule on Tuesday. The motherboards seem easy to find. We got ours at www.ewiz.com, but we also saw the MSI Socket 754 motherboard on sale at CompUSA of all places!

Editor's Note: Although we purchased the CPU as a secret buyer, after publishing the article, Directron offered to sponsor the CPU. This means more money for us to spend on the next article for you guys :)

Alexis: In summary, we spent the big bucks on the core of the system: the memory and processor. Our case wasn't cheap at about $80 without a power supply, but it offered us good functionality with its big fans, and flexibility with its Extended ATX capable form factor and many drive bays. For some people it may look a little bland, since it is only available in beige, although I prefer my system to be known by what's inside, not how it looks. That said, Chenbro might recruit many more gamers if they offered their case with more front panel options. They definitely have the manufacturing capability, as seen in their Xpider designer series of cases.


Alan:
Pentium 4 Systems (2.8GHz and 3.0GHz)
The Pentium 4 systems we used in this article are Micron 545G ClientPro's built around the Intel Desktop Board "Bonanza" D875PBZ. This is Intel's flagship reference board on flagship systems from a tier one supplier. They don't have as many fans, as many expansion possibilities, or luxury features as a custom system, but they are well built with quality components.

We'll shortly be building a file server around the Tyan Trinity i875P motherboard, powered by a PC Power and Cooling 510 ATX so we'll leave the discussion of building a high-end P4 to Alexis's "How to build a low-cost network attached storage server." Once again, it'll be one of those articles where you'll learn more about building high-performance systems not necessarily because you'll want to build a file server, but it'll teach you how to build better gaming systems or no compromise systems.


SIDEBAR: It always helps to know what you are going to do before you start anything.



Benchmark Design

In order to answer the question of which CPU is best for gaming and work, we cannot rely on synthetic benchmarks. Instead, we need to get a good variety of real-world tests that can be interpreted and applied to other tasks. We also chose not to explore the overclocking potential of these setups because we wanted to show the minimum expected performance when buying these systems.

For my gaming benchmarks, I was planning on going head to head against systems of FiringSquad past, but in the process of running my gaming benchmarks, NVIDIA released 52.16 WHQL drivers and ATI released Catalyst 3.8, invalidating my comparison. In the interest of time, I will leave the 3D benchmark articles to Brandon and Chris, since they'll be able to provide the most meaningful results for you. For our CPU tests, we're focusing on real-world benchmarks for the digital photographer (single and dual CPU) and one scientific computing comparison.

If you look at the future of high-performance desktops, you'll likely find yourself looking at digital photography. This is an area where improvements in speed are noticeable. Digital photographers can take hundreds to thousands of pictures a day, but they aren't taking JPEGs. Instead they're capturing RAW images which contain all of the data they get back from the shoot. They'll develop the images to JPEG for proofing, etc. This is done so that if the color balance or the exposure is slightly off, it is possible to go back to the RAW "negative" and re-develop the image.

With regular workloads of 500-1500 images, even small differences in performance can make a big difference in the long run. These numbers are not unrealistic -- I'm not a professional photographer, but I took 980 images with my digital SLR this year at San Francisco Fleet Week (a public air show over the San Francisco Bay) - I'm glad there is no marginal cost with digital pictures.

I need it now!

Unlike rendering animations, where you can let the job sit and one animation is a month-long project, photographers need the end results right away so they can proof, edit, and sell the images. They may be starting a new shoot the next day. Also, when it comes to 3D content creation, most of the work is user-limited during the day (the CPU is idle when the artist is thinking about what he wants to draw). In the evening it's an overnight render of animation, which essentially means that if the render starts at the end of the workday, it just needs to be ready before the beginning of the workday the following day - it doesn't matter if it finishes at midnight or 15 minutes before the workday starts.

With increasing megapixel counts, cheaper flash cards, and more and more consumer-level cameras supporting the RAW file format, we cannot imagine a better set of applications for real-world, non-gaming CPU tests on the desktop.


SIDEBAR: If you're interested in the Mac versus PC debate for digital photography, check out Rob Galbraith's article


Test Suite
Our test suite is as follows:

Capture One D-SLR (http://www.phaseone.com)

Ask any owner with a digital SLR what they look for in a computer and it'll be RAW processing performance. I don't mean "raw performance" in the marketing sense, but development of the unprocessed RAW files from the imaging sensor. That is, while the typical consumer digital camera saves images in JPEG, photography enthusiasts and professionals prefer to save RAW images. These "smaller-than-TIFF" files losslessly record exactly what the sensor sees and allow for greater flexibility in developing the film. Capture One D-SLR is the professional's choice for RAW image development and is a multithreaded application that takes advantage of multiple CPUs.

We run Capture One D-SLR on a set of five Canon EOS D30 images and three Canon EOS D60 images. The images are converted using JPEG High Quality settings in the sRGB colorspace.

NeatImage (http://www.neatimage.com)

Sometimes called the "Big-CCD-in-software," NeatImage is a revolutionary noise reduction application which removes noise while maintaining a surprising amount of detail. The algorithm is fine-tuned and robust enough that it can handle everything from cleaning up webcam images with good results to cleaning up images from flagship D-SLRs. I don't use the term revolutionary often, but this is truly revolutionary. We'll likely be including this benchmark in the future.

Our test involves running NeatImage on 2 Canon EOS-10D JPEGs running a complete noise reduction algorithm. This is a single-threaded application.

Adobe Photoshop 7.01 - The Fred Miranda Actions (http://www.fredmiranda.com)

Look around on the 'net and chances are you'll see people using PSBench for their digital imaging benchmarks. While PSBench has its merits, when is the last time you saw a Photoshop user use the "Lens Flare" filter? Ok, how about the last time it wasn't used as a spoof at SomethingAwful? Lens flares in Photoshop are like the flashing "under construction" sign of the 'net. Likewise, how frequently do you need to rotate large 100MB images by 0.9 degrees? While that's certainly used more frequently than the lens flare, it's still not very common.

You see, the problem with PSBench is that it doesn't accurately reflect the manner in which Photoshop is used in the real world. Graphics artists make their magic by drawing - not using filters. Based upon the direction Photoshop has taken over the last few years, we chose to test Photoshop performance from the perspective of the digital photographer. In order to do this, we used the Fred Miranda digital imaging scripts as our real-world test. These Photoshop actions represent the types of Photoshop activities a real digital photographer would use on a daily basis.

Specifically we tested using Custom Sharpen Pro 2.0, Medium Strength on a D60 image and Stair Interpolation Pro 2.0 which converts a 6 megapixel D60 image to a 13x19 @ 300dpi image (22 megapixels). Professional photographers will resample every image before it is printed out, so even small differences in performance will have an enormous benefit in the long run. Normally when you print out any image, the printer driver is doing the upsampling to the 1440 or 2880 dpi of the printer. The rationale for pre-upsampling your images is that products like Photoshop or Qimage use smarter interpolation strategies which make your pictures look sharper.
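The 22 megapixel figure for the Stair Interpolation target falls straight out of the print size and resolution. Here is a quick sketch of that arithmetic; the nominal 6.0 megapixel source size is the rounded figure used in the text rather than the exact sensor dimensions.

```python
def print_megapixels(width_in, height_in, dpi):
    """Pixel dimensions and megapixel count needed for a print at a given dpi."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px, height_px, width_px * height_px / 1e6


SOURCE_MP = 6.0  # nominal D60 resolution as quoted in the text

w, h, mp = print_megapixels(13, 19, 300)
print(f"13x19 in at 300 dpi: {w} x {h} pixels = {mp:.2f} megapixels")
print(f"That is {mp / SOURCE_MP:.1f}x the pixels of the {SOURCE_MP:.0f} MP source")
```

So every print-sized upsample asks Photoshop to invent well over three pixels for each one the camera captured, which is why this action makes a useful stress test.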

Adobe Photoshop 7.01 - Camera RAW (http://www.adobe.com/products/photoshop/cameraraw.html)

Although Capture One D-SLR is probably the most advanced RAW development software, Adobe Camera Raw is also very popular and provides a good camera-independent test. We evaluated the time it took to open up a D60 image at the native 6 mpixels at 8-bit color and the time it took to internally upsample to a 15 megapixel image at 16-bit color through Camera RAW. These images were processed in Adobe RGB colorspace.
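To see why the two Camera RAW tests stress the system differently, it helps to estimate how much pixel data each one produces. The sketch below assumes three color channels (RGB) and no compression or per-image overhead, which is a simplification of what Photoshop actually holds in memory.

```python
def uncompressed_mb(megapixels, bits_per_channel, channels=3):
    """Approximate in-memory size in MB of an uncompressed RGB image."""
    total_bytes = megapixels * 1e6 * channels * bits_per_channel / 8
    return total_bytes / 1e6


native = uncompressed_mb(6, 8)       # native 6 MP open at 8 bits per channel
upsampled = uncompressed_mb(15, 16)  # upsampled 15 MP open at 16 bits per channel

print(f"6 MP at 8-bit:   {native:.0f} MB")
print(f"15 MP at 16-bit: {upsampled:.0f} MB ({upsampled / native:.0f}x the data)")
```

Under these assumptions the second test pushes roughly five times as much data through the memory subsystem as the first.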


SIDEBAR: Try NeatImage, you'll like it


Scientific Computing

MATLAB - N72 Script (http://www.mathworks.com)

MATLAB is your multipurpose scientific computing application. Every engineer and his brother has used Matlab at one point or another. It's a very flexible application used in high school to teach basic Newtonian physics and was used in industry to design the Joint Strike Fighter. This is also a single threaded application. Why? Mathworks has done their own studies and determined that for most Matlab tasks, a lot of computation is spent parsing and processing the script, something that isn't parallel at all. Parsing scripts isn't a very glamorous aspect of scientific computing, but it's very important to real-world use. Think of the car that does 0-60 in 4 seconds but requires you to refill the gas tank every 10 miles. There's no doubt that the car is fast, but no one would really use it.

To get some measure of real-world performance, I've gone with a lengthy script from our lab, which I've just left codenamed as N72. Due to proprietary technologies, I'll just leave the description brief: data from an MRI of the heart is read into memory and we process out the 3D geometry of the heart, and then "do some magic math" to figure out the position of muscle cells in the wall. This is fairly exotic stuff and the whole thing requires the system to have a gig of system RAM. This script represents a real-world example of reading in raw data and then processing it to get the meaningful data. The thing with scientific computing is that different fields of science have different tasks and different tasks will perform differently on different architectures.

It's obviously not designed to be a comprehensive Matlab performance benchmark, but it's a real-world test that has a realistic balance of true computation and script parsing performance. This is something that we feel is missing from ScienceMark. ScienceMark is better at representing the back-end calculation rather than the actual performance measures that affect user-input and response.

[image]


This is only one slice of what is scientific computing and each task is going to have a different performance signature for every application. That said, we would much rather develop a set of benchmarks on commercial scientific computing applications that are more likely to represent how CPUs are used instead of home-brew software. (i.e. it's the same question about buying a graphics card to run 3DMark versus buying a graphics card to play games).

Our version is Matlab 6.5 Release 13, and all scripts are in pcode.


SIDEBAR: I've given up coming up with new random facts for now


System Setup

Dual AMD Opteron 246
AMD Athlon 64 3200+
Intel Pentium 4 2.8C/800 (Hyper-Threading Enabled)
Intel Pentium 4 3.0C/800 (Hyper-Threading Enabled)

Tyan Thunder K8W (Opteron)
ASUS K8V Deluxe (Athlon 64)
Intel Desktop Board D875PBZ "Bonanza" (Pentium 4)

1 GB Corsair 2xCMX512RE-3200LL XMS DDR400 Registered ECC Ram (Opteron)
1 GB Corsair 2xCMX512-3200LL XMS Pro DDR400 (Athlon64)
1 GB Corsair 2xCMX512-3200LL XMS Pro DDR400 (P4 2.8)
1 GB Crucial 2x512MB DDR400 CL3 (P4 3.0)

SuperMicro SP-450 PSU (Opteron)
SilenX.com 400W 16 dB PSU (Athlon64)
Fortron/SPI 300W ATX PSU - standard MicronPC (Pentium 4)
Monster Power HTS 3600 Power Conditioning

Windows XP Professional SP1

We did not have time to test the Pentium 4 3.0GHz on Corsair RAM. CL3 RAM is what is stock on many store-bought PCs, and Crucial RAM is probably better than most generic memory. Don't worry, read through the entire article before you complain…

Final words before we show the numbers

Our focus on digital photography should be seen in two ways. If you're interested in digital photography, this will be an important evaluation of AMD versus Intel CPUs as well as a demonstration of the benefits of a dual processor system. If you're just interested in a "standard CPU" review, think of these as real-world synthetic tests of memory performance and FPU/SSE/MMX performance. All systems were running unfragmented hard drives. We had initially run these tests on a RAM drive to take the hard drive performance out of the equation, but found that for these tests, the hard drive performance was not significantly different between these ATA systems.


SIDEBAR: I told you, I have given up coming up with new comments.


Capture One D-SLR



The Pentium 4 seems to offer slightly better performance than the Athlon64 and the single Opteron 246, although they are very closely matched. What's important to note is that the P4 2.8GHz outperforms the P4 3.0GHz. This is an example of the performance difference between Crucial CL3 RAM and Corsair XMS Pro 3200-LL. Someone who cuts corners by using slow RAM with a faster CPU would in fact have poorer performance in this test.

The addition of a second processor significantly improves the Opteron 246, improving times by 34%. Due to the batch nature of Capture One D-SLR, we think performance would be even better if Phase One programmed the SMP support such that each CPU dealt with its own image except when only one raw file is in the queue.

Photoshop 7.01 Open 6MP CRW



Again the three CPUs are very closely matched. The additional latency of the registered RAM appears to slow down the Opteron 246; however, the dual processor system is able to take a lead. The addition of the second CPU reduces the time by 24%.

When the RAW image is upsampled to 15 megapixels, the ordering changes:



In this case, the P4 3.0GHz is able to take a modest lead in single-CPU performance, but it's nothing significant. There is a 30% improvement in speed with the second Opteron.
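One rough way to read the dual-CPU gains so far (34% for Capture One, 24% and 30% for the two Camera RAW tests) is through Amdahl's law, which relates the speedup from extra CPUs to the fraction of the work that runs in parallel. The sketch below assumes each percentage is a reduction in elapsed time and that the parallel portion scales perfectly across both CPUs; neither assumption is confirmed by the vendors, so treat the output as a ballpark estimate.

```python
def speedup(time_reduction):
    """Speedup factor implied by a fractional reduction in elapsed time."""
    return 1.0 / (1.0 - time_reduction)


def parallel_fraction(time_reduction, n_cpus=2):
    """Fraction of the work that must run in parallel, per Amdahl's law.

    With n CPUs the new time is (1 - p) + p / n of the old time, so the
    time saved is p * (1 - 1 / n); solving for p gives the line below.
    """
    return time_reduction / (1.0 - 1.0 / n_cpus)


# Time reductions reported in the text for the second Opteron 246
results = {
    "Capture One D-SLR": 0.34,
    "Camera RAW open, 6 MP": 0.24,
    "Camera RAW open, 15 MP": 0.30,
}

for test, reduction in results.items():
    print(f"{test}: {speedup(reduction):.2f}x speedup, "
          f"about {parallel_fraction(reduction):.0%} of the work in parallel")
```

By this estimate, somewhere between half and two thirds of the work in these tests is spread across both CPUs, with the rest still running on one.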


SIDEBAR: Just kidding, FiringSquad writers never give up.


Fred Miranda



The Athlon 64 3200+ is the surprising leader here. Its low memory latency appears to lift it above the rest of the group. Notice also that the P4 2.8 GHz and P4 3.0 GHz perform identically. This is yet more proof that you shouldn't buy the fastest CPU if you're not ready to back it up with the fastest memory.



Once again the three CPUs are very closely matched. The 5-second improvement with the dual Opteron 246 is significant given how often this action is run, and really shows how well the Opteron takes to symmetric multiprocessing.


SIDEBAR: The state of Florida is bigger than England.


Neat Image

So far our digital camera tests have been somewhat uneventful. The P4, Opteron 246, and Athlon64 3200+ have been neck and neck. In our last test with Neat Image, things get considerably more interesting, and we get yet another example that fast memory can be more important than a fast CPU.





Now there is a larger difference between the Pentium 4 and Athlon64 3200+ in this test. Since Neat Image is a single threaded application, we ran two copies of Neat Image simultaneously to evaluate the benefit of a second CPU since large image batches are common - adding the second CPU allows you to essentially process twice as many images in the same amount of time. Once again, on the Pentium 4, fast memory buys you better performance than extra megahertz.


SIDEBAR: Neat Image is almost like one of those image-enhancing programs you see in the movies. It doesn't let you see what a person looks like if they were disguised.


Matlab

Matlab N72



Matlab is a single-threaded application and so it does not take advantage of the second CPU. Nonetheless, here we see the advantage of the 128-bit memory architecture of the Opteron over the Athlon64. Matlab users have historically preferred AMD processors, and the Opteron and Athlon64 should continue this tradition.

Preliminary Conclusion

It seems like the Pentium 4 and Athlon 64 3200+ are very closely matched for RAW processing and the Fred Miranda scripts. However, the Athlon64 takes a considerable lead with Neat Image, the most CPU intensive application in our digital photography test suite. Equally important is the fact that NeatImage has become one of the "must have" applications for any digital photographer, amateur or professional.


SIDEBAR: You wish you knew what N72 referred to don't you?


It's not overPage:: ( 14 / 20 )

It's not over

If we weren't FiringSquad, we probably would have ended the article there and patted ourselves on the back. Most reviews on the net test a 2x512MB or 2x256MB memory setup. It's cheap, popular, and certainly a recommended budget configuration. But this is all about high-end systems, right? What if you want more than 2 DIMMs? That's where things get interesting.

Adding extra memory to your PC can slow it down

Athlon 64

Although Athlon64 motherboards ship with 3 DIMM slots, they can only run 2 DIMMs (4 banks) at full speed. This limits an Athlon64 system to 2x256MB (two single bank), 2x512MB (two double bank), or 2x1024MB (two high-density double bank) memory configurations. Should you go with 3x256MB, your memory speed will drop to DDR333, and going with 3x512MB of RAM will drop you all the way down to DDR200 (PC1600)!
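The configurations described above can be written down as a small lookup table. This is only a sketch of the figures quoted in this article, not a description of how the memory controller decides: the function name is made up for illustration, and treating 3x256MB as three single-bank DIMMs is an assumption based on the 2x256MB case. Configurations the article does not mention return None rather than a guess.

```python
# Memory speeds quoted in this article for the Athlon 64, keyed by
# (number of DIMMs, banks per DIMM). Only cases stated in the text are listed.
ATHLON64_REPORTED_SPEED = {
    (2, 1): "DDR400",  # 2x256MB, two single-bank DIMMs
    (2, 2): "DDR400",  # 2x512MB or 2x1024MB, two double-bank DIMMs
    (3, 1): "DDR333",  # 3x256MB (assumed single bank, like the 2x256MB case)
    (3, 2): "DDR200",  # 3x512MB, also known as PC1600
}


def athlon64_memory_speed(dimms, banks_per_dimm):
    """Return the speed quoted in the article, or None if no figure is given."""
    return ATHLON64_REPORTED_SPEED.get((dimms, banks_per_dimm))


if __name__ == "__main__":
    for config in [(2, 1), (2, 2), (3, 1), (3, 2), (1, 1)]:
        print(config, athlon64_memory_speed(*config))
```

A table is used instead of a formula because the article only gives four data points, and the drop from 3x256MB to 3x512MB is not linear in the number of banks.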

Registered Memory on Athlon 64 FX and Opteron

On the Athlon 64 FX and Opteron CPUs, there really isn't any difference in performance with additional DIMMs. Those CPUs are designed to handle 4 DIMMs (8 banks) without any problems, and since the memory is registered, adding DIMMs does not increase latency - you've already paid the toll at the register. Simply put, the registered DIMM design allows you to add additional memory to the system without adversely affecting system performance.

And Intel?

The Intel i865PE Springdale and i875P Canterwood platforms claim to preserve full DDR400 clock speeds, but will add additional latency with more than two DIMMs. Those of you with careful reading skills will notice that I used the word "claim."

Lost in Translation? Here's the summary

On the Athlon64 and Pentium4 i875 platforms, the more memory you add, the slower the memory performance. In the case of the Athlon64, the penalty is a drop to 1.6GB/s of bandwidth, half its peak. It can transfer less data in the same amount of time. In the case of the Pentium4, the performance drop is supposed to be an increase in latency while maintaining the DDR400 bandwidth of 3.2GB/sec. Its top speed is the same, but it has slower acceleration.
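For those who want to check the bandwidth figures, the arithmetic is simple: peak bandwidth is the transfer rate multiplied by the width of the memory bus. The sketch below assumes a single 64-bit (8-byte) DDR channel, which is how the 3.2GB/s and 1.6GB/s figures above work out. The function name is ours, and these are theoretical peaks, not measured results.

```python
BUS_WIDTH_BYTES = 8  # one 64-bit DDR channel (assumption stated above)


def peak_bandwidth_mb_s(mega_transfers_per_second, bus_width_bytes=BUS_WIDTH_BYTES):
    """Theoretical peak bandwidth in MB/s for a given DDR transfer rate."""
    return mega_transfers_per_second * bus_width_bytes


if __name__ == "__main__":
    ddr400 = peak_bandwidth_mb_s(400)  # DDR400, sold as PC3200
    ddr200 = peak_bandwidth_mb_s(200)  # DDR200, sold as PC1600
    print(f"DDR400: {ddr400} MB/s")
    print(f"DDR200: {ddr200} MB/s")
    print(f"DDR200 is {ddr200 / ddr400:.0%} of DDR400")
```

This is also where the PC1600 and PC3200 module names come from: the number is the peak bandwidth of one channel in MB/s.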

What we are saying is that adding extra memory can reduce system performance if it is unused. You won't believe me without the proof, so let's take a look at the benchmarks again when running additional RAM.



SIDEBAR: Come on, you've played video games before and watched movies. The big boss at the end is never the REAL evil boss.


Test SetupPage:: ( 15 / 20 )

Dual AMD Opteron 246
AMD Athlon 64 3200+
Intel Pentium 4 2.8C/800 (Hyper-Threading Enabled)

Tyan Thunder K8W (Opteron)
ASUS K8V Deluxe (Athlon 64)
Intel Desktop Board D875PBZ "Bonanza" (Pentium 4)

2 GB Corsair 4xCMX512RE-3200LL XMS DDR400 Registered ECC RAM (Opteron)
2 GB Corsair 4xCMX512-3200LL XMS Pro DDR400 (Pentium 4)
1.5 GB Corsair 3xCMX512-3200LL XMS Pro DDR400 (Athlon64)

SuperMicro SP-450 PSU (Opteron)
SilenX.com 400W 16 dB PSU (Athlon64)
Fortron/SPI 300W ATX PSU - standard MicronPC (Pentium 4)
Monster Power HTS 3600 Power Conditioning

Windows XP Professional SP1

Comments

1.5GB represents the maximum amount of memory we can install on the Athlon64 3200+ using our 512MB Corsair XMS Pro 3200LL DIMMs. Since the Athlon64 Clawhammer uses a single channel 64-bit memory controller, we wouldn't expect any problems from using an odd number of DIMMs.

On the Pentium 4 / i875P Canterwood platforms, memory is added in pairs to take advantage of the dual channel DDR. Using 1.5GB would be unfairly crippling the system, so we've gone with 2GB, or 4x512MB. We didn't have the 3.0GHz system to work on in time.

The Pentium 4 was an off-the-shelf MicronPC Client Pro 545. All we did was swap out the RAM for Corsair XMS3200LL.



SIDEBAR: We often use the product codenames such as Canterwood and Clawhammer to reiterate important points



Capture One DSLRPage:: ( 16 / 20 )

Capture One D-SLR





Remember how the Pentium 4 platform was the fastest single CPU platform for Capture One D-SLR? Imagine you were working with applications that needed 2GB of RAM and expected to get extra performance from your PC by adding more memory. On the Pentium4, this severely decreases your performance. The Opteron, running registered RAM, has no significant change in performance.

Photoshop Camera Raw



Even though the Athlon64 drops to PC1600, its lower latency is what is most critical in accelerating performance. "Upgrading" your Pentium 4 from 2x512MB to 4x512MB would actually decrease performance by 33%!


Even when working with a 15 megapixel image, the Athlon 64's PC1600 memory bandwidth is still able to beat the Pentium 4 @ 2 GB of RAM.


SIDEBAR: Astronauts Neil Armstrong and Buzz Aldrin ate roasted turkey from foil packets at their first meal on the moon. I have a novel autographed by Buzz Aldrin.


More testsPage:: ( 17 / 20 )

There's no point in commenting on these numbers anymore. It's pretty obvious what's going on.

Fred Miranda






Neat Image



Matlab






SIDEBAR: Stewardesses is supposedly the longest word typed with only the left hand. Last time I checked, asdfasdfasdfasdf is longer.


Trust No OnePage:: ( 18 / 20 )

SiSoft Memory Bandwidth




In every real-world test, the performance drop with 4 DIMMs on the Pentium 4 is much more significant than the performance drop with 3 DIMMs on the Athlon64. Using artificial tools such as SiSoftware Sandra 2004 to measure memory bandwidth can be very misleading, as you can see above. The value of this benchmark is in seeing the differences in how AMD and Intel handle the increased number of memory banks, with AMD reducing bandwidth and Intel increasing latency.


SIDEBAR: The dot over the letter "i" is called a tittle.


Closing ThoughtsPage:: ( 19 / 20 )

Closing Thoughts

There are two key teaching points in this article.

First, you should not buy a fast CPU without the fast memory to support it. In the first group of benchmarks, a P4 2.8 GHz is able to beat an identically configured P4 3.0 GHz system in certain applications if the 2.8 is running Corsair XMSPro 3200-LL RAM. It's worth noting that we weren't running bottom-of-the-line RAM in the P4 - it was still Crucial brand memory.

Second, it is clear that those who complained about the Registered DDR requirement/support of the Athlon64 FX did not fully understand the technology, or they did not consider pushing the high-performance computing envelope. Registered DDR is critical for ensuring that performance remains stable despite a large number of installed DIMMs. With "normal" CPUs such as the Pentium 4, Athlon 64, or Athlon XP, only two DIMMs (4 banks) should be installed for maximum performance. Adding additional DIMMs can in fact reduce performance, as we have shown. Nobody would agree to spend more money on their system only to have it run slower. It should be noted that these are not only our isolated observations; discussions with the motherboard and chip engineers confirm this scenario. One even suggested that some motherboards (though not their own) would be unstable or would not run at all when challenged with more than 4 memory banks occupied.

If you are currently at 2x256MB on your P4 desktop and feel that more RAM is necessary, you can consider an additional 2x256MB for a total of 4 banks. Alternatively, you would likely see higher performance by replacing the existing RAM with 2x512MB. Those of you needing more than 1GB of RAM will either have to consider 2x1GB DIMMs, available in quantity only in PC2700 speeds, or move to a Registered DDR PC3200 setup and run 4x512MB Corsair XMS 3200-LL.

It's disappointing that despite the immense press coverage of the i875P and Athlon64 platforms, no one has stopped to evaluate the performance issues of more than 4 banks of memory in a set of real-world tests. CacheMem and other synthetics cannot tell the story that a few real-world tests can show. This is particularly important because RAM is getting cheaper by the day, and it is not uncommon or difficult for users to buy more than a gig of RAM, expecting better performance. Moreover, it is particularly unfortunate that many motherboard manufacturers continue to advertise their boards with maximum memory capacity rather than maximum memory capacity at full speed. ASUS does provide documentation in their manual regarding the maximum recommended memory configuration at a given speed. We commend ASUS for their interest in keeping the customer informed about the strengths and weaknesses of their products.


SIDEBAR: Honey is the only food that doesn't spoil, but you shouldn't give it to any child under 1 year old.


Final ThoughtsPage:: ( 20 / 20 )

Conclusion

With today's ultra-high performance memory controllers, adding extra memory modules can actually cause a decrease in performance. On all non-registered systems, the use of 2 DIMM slots should be considered the optimal configuration.
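The rule of thumb can be expressed as a quick check before you buy more RAM. This is a minimal sketch based only on the limits quoted in this article: 2 DIMMs (4 banks) for unregistered memory, and 4 DIMMs (8 banks) for the registered memory used by the Athlon 64 FX and Opteron. The names are ours, and applying the registered limit to other registered platforms is an assumption; check your motherboard manual for the real figures.

```python
# Limits quoted in this article for running memory without a performance penalty.
FULL_SPEED_LIMITS = {
    "unregistered": {"dimms": 2, "banks": 4},  # Pentium 4, Athlon 64, Athlon XP
    "registered": {"dimms": 4, "banks": 8},    # Athlon 64 FX, Opteron
}


def within_full_speed_limits(memory_type, banks_per_dimm):
    """Check a planned configuration against the article's rule of thumb.

    banks_per_dimm is a list with one entry per installed DIMM,
    for example [2, 2] for two double-bank 512MB modules.
    """
    limit = FULL_SPEED_LIMITS[memory_type]
    return (len(banks_per_dimm) <= limit["dimms"]
            and sum(banks_per_dimm) <= limit["banks"])


if __name__ == "__main__":
    print(within_full_speed_limits("unregistered", [2, 2]))      # 2x512MB
    print(within_full_speed_limits("unregistered", [2, 2, 2]))   # 3x512MB
    print(within_full_speed_limits("registered", [2, 2, 2, 2]))  # 4x512MB registered
```

Both the DIMM count and the bank count are checked because three single-bank DIMMs stay under 4 banks yet still drop the Athlon 64 to DDR333 in our testing.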

Since the Pentium 4 dual channel DDR system requires memory to be installed in pairs, your options are 2x256MB, 4x256MB and 2x512MB. We expect 512MB to be the bare minimum for a high-end gaming system and so if your budget allows, you should build your Pentium 4 system around 1GB of RAM. Going with 512MB is reasonable if you are willing to accept the fact that 256MB memory modules will have limited use in the future. These Pentium 4 systems are now nearing their performance peak, with a 3rd generation Socket 478 DDR-chipset and memory bandwidth, FSB, core clock speeds, and L2 cache that have at least doubled since the platform's introduction.

In contrast, the AMD 64 bit platforms are still very new. We are still on the first generation of chipsets and it is likely that there are bugs and omissions still being worked out, such as the lack of USB 2.0 on the AMD 8000 chipset. Even so, the single-channel DDR Athlon64 platform ends up being an excellent platform for enthusiast PC building. It performs excellently in our tests, and surprisingly maintains a respectable amount of speed when going beyond 2 DIMMs in comparison to Intel's i875P Canterwood chipset. Since the Athlon64 is a single channel design, it is possible to start with a 1x512MB configuration and then move up to 2x512MB in the future. Due to the flexibility and performance characteristics of the Athlon64, we give it our Bull's Eye recommendation so long as you can accept a limit of 1GB of system memory or are willing to wait for a greater availability of 1GB DDR 400 modules.

If more than 1GB of RAM is necessary now, our recommendation is to consider the Athlon64 FX or Opteron 100-series CPUs or Intel Dual Xeons for their registered DDR memory support.

Final Verdict


Chenbro SR-205 - 84%
The SR-205 is a great workhorse case because of its great cooling capabilities and extended ATX support. In addition, its thick steel construction provides excellent EMI shielding.

Optorite DVD-/+R - 85%
The Sanyo/Optorite DVD-/+R is exactly what you'd want in a basic optical drive. You can spend more if you're really interested in getting the top-of-the-line equipment, but we have no complaints. Other good low-cost DVD+/-R burners include the Pioneer DVR-106.

Opteron 200 series - 90% Editor's Choice



The Opteron 246 is not an ideal CPU for the enthusiast given its high price and so we're admittedly uneasy giving the Opteron our Editor's Choice award. We usually like to keep our Editor's Choice for those wonder products that also happen to give a lot of bang for the buck. We see the Opteron 200 series as being on par with the Saleen S7 or Audi RS6. None of the three will ever earn an award from Consumer Reports, but in their respective markets they are leaders. It's hard to find fault with the architecture. It's highly scalable with the 2nd CPU, supports Registered DDR400 RAM, and is capable of true data-critical applications with ECC Chip Kill support (Chip Kill is like RAID for memory; it's a step above ECC). The Opteron doesn't receive Editor's Choice for use as a gaming PC, but it does earn Editor's Choice for a workstation CPU.

The AMD64 platform is a true milestone for AMD. For most of AMD's life, the business model seems to have been the role of underdog to Intel, offering the Intel alternative at a lower cost. Despite this sentiment, AMD engineers have never settled for second-best and worked to produce "better-than-Intel" products despite having a smaller R&D budget.

AMD has overtaken Intel a few times in the past. In the 486 era, the AMD DX2/80 and DX4/120 had no competition from Intel's DX2/66 and DX4/100. In the mid-Athlon era, AMD won the race to 1GHz and in that period, there was no argument for the Pentium III. Indeed, AMD 1GHz processors launched at $1300 and it was Intel who had to play the underdog for a moment by pricing the Pentium III 1GHz at only $999. Now with the Opteron, AMD has again produced an excellent core.

What's changed is timing. AMD's lead in the 486 era came at the end of the architecture's life. It was not until the second generation Thunderbird Athlon processors that AMD truly had a strong argument for becoming the CPU of gamers; the Slot A boards had nowhere near the same respect. With the Opteron, AMD is delivering a first strike attack against Intel. For the first time, AMD has shown a superior architecture at the beginning of the 64 bit processor life cycle. This is even more apparent with the Athlon64 Clawhammer core.

One of the most common complaints about the Opteron is its slow native clockspeed, but despite the perceived sluggishness, the clockspeed of the Opteron has been increasing at a faster rate than the original Athlon and the Pentium 4. There isn't magic AMD pixie dust that's bringing this ramping up of clock speed - it's the fact that AMD's Athlon XP architecture had enough momentum to keep it going while the 0.13 SOI process was refined. The Opteron had been delayed for some time before launching… as always, the challenge will be in keeping this momentum and continuing to ramp up clock speed at the appropriate pace. Hopefully as applications are able to support 64 bit instruction sets we will see the performance of the Athlon 64 and Opteron increase. Think of it as a software upgrade that will make your system much faster.

If you are a forward thinker and an optimist, you'll go with the AMD 64 bit chips for the greater potential upside as software engineers catch up to the hardware engineers. If you prefer savings bonds to stocks and are more conservative, then a lower cost, more mature Intel Pentium 4 system will be excellent for your current needs.

Thank you for reading through this entire article.



SIDEBAR: What did you think of Alan’s findings? Shocked by the results? We were surprised too! Talk about it in the news comments by clicking here!

© Copyright 2003 FS Media, Inc.