Summary: Alan sent me this message a few days ago: “In one of our real-world multithreaded benchmarks, the dual Dual-Core AMD Opteron was so fast that I repeated the test five times to make sure something wasn’t amiss.”
CPUs have been on their “last legs” many times. There was once talk of the CPU outclassing the motherboard, memory, and peripheral bus – where components such as RAM or SCSI controllers simply could not deal with the high bus-speeds of the CPU. Moore’s Law was supposed to be doomed with CPUs no longer being able to improve, and lo and behold, the concept of clock multipliers was introduced with the 486 DX2/66. Imagine a world where we didn’t have clock multipliers and you were trying to run your soundcard on a bus speed of several gigahertz!
Well, most recently, there has been talk of CPUs drawing too much power and dissipating too much heat, and of improvements in fabrication not progressing fast enough to allow CPUs to continue to grow at their current rate. The CPU industry was supposedly on its last legs again. Yet the solution to this problem was found in multi-core processor technology, where multiple CPUs could be integrated into a single physical chip. In 10 years, we will all look back on the introduction of multi-core CPUs to the mainstream as being as meaningful and significant as the invention of the clock multiplier – that is to say, one day we will all think that multi-core processors were how modern CPUs have always been.
The first dual core CPU on the market was IBM’s Power4 chip in 2001, but this was reserved for true enterprise-grade servers and sites willing to shell out over half a million dollars for a single server. By Christmas 2005, many gamers and PC enthusiasts will likely own a dual core CPU.
Today, we’ll be looking at multi-core technology from AMD, the latest manufacturer to enter the multi-core world. Officially, in today’s announcement AMD’s dual core technology is limited to Socket 940 Opteron CPUs. However, it doesn’t take any effort to predict that this technology will trickle down to the Athlon64 line in the future – AMD readily confirms that desktop dual core processors are on their way late in the 2nd quarter.
SIDEBAR: What would a dual processor article be without a reference to a Double-Double from In-N-Out?
The benefits of having multiple processors for non-gaming apps such as media and content creation are obvious. Digital photography programs such as Photoshop, Capture One DSLR, Bibble, and Noise Ninja are all written to take advantage of multiple CPUs. The same is true with scientific computing applications such as LS-DYNA, and most video NLE and compositing software also are multi-processor capable. After all, it is these kinds of software applications that have driven the market for high-end systems in the past.
However, a fair question is what multiple processors can offer for the rest of us who may simply play games and do “normal computer stuff.” Well, one of the benefits of today’s modern operating systems is that they’re all multithreaded. So, while doubling performance in a single application using a second CPU requires dedicated software support, improving your overall system performance with additional CPUs when running multiple applications simultaneously happens automatically. The classic best-case-scenario marketing examples are things like encoding an MP3 or DVD in the background while playing games. However, the benefits of multiple processors are still present in the day-to-day experience. As any owner of a dual processor system can confirm, Windows itself is just a little bit faster since there’s always a “free” CPU ready to deal with your user input and clicks. Is it a sign of bloated software that you need dual processors to get the maximum responsiveness in a GUI? Probably, but if you need to run Windows XP, the point about efficiency is moot – dual CPUs are still faster.
What about Games?
The traditional teaching has always been that games don’t benefit from multiple processors. There are a number of reasons for this “was true but not for long” statement. First, until Windows 2000, dual-processor systems required Windows NT, which could not support DirectX gaming. Second, since motherboards supporting dual processors were historically engineered for mission-critical stability, they often required slower but more reliable registered ECC RAM, resulting in poorer performance. Third, games have traditionally been single-threaded applications where the second CPU offered no performance advantage on benchmarks. Finally, the original Sound Blaster Live, the standard gaming sound card when dual CPUs first became affordable, had very unstable drivers in SMP setups.
In AMD’s case, the memory controller is tied to each physical processor. The two cores of a multi-core CPU are connected with AMD’s “Direct Connect Technology,” marketing talk for having two CPU cores with direct on-die access to the shared on-die memory controller.
In comparison to a dedicated dual Opteron system, a single dual-core Opteron has half the memory bandwidth but less latency. However, that’s the pessimistic way to look at it – a single dual core Opteron has the same memory bandwidth as one single-core Opteron, but has twice the computational units… and while I said a pair of single-core Opterons should outperform a single dual-core Opteron, a pair of dual core Opterons will be even more impressive.
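To make the bandwidth trade-off concrete, here is a rough back-of-the-envelope sketch in Python. It assumes one memory controller per physical CPU (as described above) and normalizes each controller's bandwidth to 1.0; the function name and the normalization are my own illustration, and the model ignores latency and any traffic between sockets.

```python
def memory_profile(sockets: int, cores_per_socket: int) -> dict:
    """Normalized memory bandwidth for an Opteron-style layout.

    Assumption: each physical CPU (socket) brings exactly one memory
    controller, and each controller contributes 1.0 unit of bandwidth.
    """
    if sockets < 1 or cores_per_socket < 1:
        raise ValueError("sockets and cores_per_socket must be at least 1")
    cores = sockets * cores_per_socket
    total_bandwidth = float(sockets)  # one controller per socket
    return {
        "cores": cores,
        "total_bandwidth": total_bandwidth,
        "bandwidth_per_core": total_bandwidth / cores,
    }


# The four configurations discussed in the text.
for label, sockets, cores_per_socket in [
    ("1x single-core", 1, 1),
    ("2x single-core", 2, 1),
    ("1x dual-core", 1, 2),
    ("2x dual-core", 2, 2),
]:
    print(label, memory_profile(sockets, cores_per_socket))
```

Under these assumptions a single dual-core chip has half the total bandwidth of a dual single-core system, but the same total as one single-core chip while offering twice the cores.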
At today’s launch AMD has three model lines, the Opteron x65, the x70, and the x75, where the x is 1, 2, or 8 depending on whether the chip supports 1, 2, or more CPUs. The x65 is a dual-core 1.8GHz (i.e. two Opteron 244’s), the x70 is a dual-core 2.0GHz (i.e. two Opteron 246’s), and the x75 is a dual-core 2.2GHz (i.e. two Opteron 248’s). The fastest clocked Opteron is the single-core 252, which runs at 2.6GHz.
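If the naming scheme is hard to keep straight, it can be written down as a small lookup. The sketch below only encodes what is stated in the paragraph above; the function and dictionary names are hypothetical and it deliberately rejects any model number outside the launch lineup.

```python
# Suffix -> per-core clock in GHz, taken from the launch lineup described above.
DUAL_CORE_CLOCK_GHZ = {"65": 1.8, "70": 2.0, "75": 2.2}

# Leading digit -> how many CPUs the chip supports in one system.
SERIES_SUPPORT = {"1": "1 CPU", "2": "2 CPUs", "8": "more than 2 CPUs"}


def decode_dual_core_opteron(model: str) -> dict:
    """Decode a three-digit dual-core Opteron model number such as "275"."""
    if len(model) != 3 or not model.isdigit():
        raise ValueError(f"expected a three-digit model number, got {model!r}")
    series, suffix = model[0], model[1:]
    if series not in SERIES_SUPPORT:
        raise ValueError(f"unknown series digit {series!r}")
    if suffix not in DUAL_CORE_CLOCK_GHZ:
        raise ValueError(f"{model} is not one of the dual-core launch models")
    return {
        "supports": SERIES_SUPPORT[series],
        "clock_ghz": DUAL_CORE_CLOCK_GHZ[suffix],
        "cores_per_chip": 2,
    }


print(decode_dual_core_opteron("275"))
```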
Intel’s dual-core approach is still built on top of Intel’s current design philosophy in which the memory controller is still part of the motherboard rather than the CPU. Likewise, instead of something such as HyperTransport, the interconnect between the two CPUs is also similar to a traditional Xeon architecture meaning that the processors don’t have as much bandwidth. Moreover, the current dual core processors from Intel only have a FSB of 800MHz as opposed to the fast 1066MHz FSB of the single-core Pentium 4 Extreme Edition 3.73GHz. Dual-core Pentium processors will require new motherboards and chipsets but in theory, AMD dual-core CPUs should be compatible with today’s chipsets and motherboards.
For the current dual-core processor launch, AMD is focusing their dual core technology on the high-end with Socket 940 Opterons. From a business perspective this makes sense since early on, it will be the typical Opteron user who will want dual-core CPUs first and many of these new customers will be looking to get two dual-core processors rather than just one. In a way, AMD now has the potential of selling the equivalent of 4 CPU cores per physical system. Not a bad deal for them.
Of course, that’s the beauty of the Internet and the free access to review websites – you can always read reviews from multiple sources and even today, FiringSquad has two completely independent writers covering the same product (Chris’s review is coming later today).
We use SiSoft Sandra Professional for our synthetic tests. These tests are probably most representative of system performance when you’re developing customized software. That is to say, while these tests won’t be useful to the average end user, they are probably the most predictive for all those aerospace and semiconductor companies who need all the compute power they can get their hands on.
My scientific computing tests involve two applications: MATLAB and LS-DYNA.
MATLAB Release 14
MATLAB is your basic multipurpose scientific computing application. Every engineer and his brother have used MATLAB at one point or another. It's a very flexible application used in high school to teach basic Newtonian physics and was used in industry to design the Joint Strike Fighter. It used to be said that MATLAB was single-threaded because for most tasks, a lot of computation time is spent processing the script, something that isn't parallel at all. Parsing scripts isn't a very glamorous aspect of scientific computing, but it's very important to real-world use. Think of the car that does 0-60 in 4 seconds but requires you to refill the gas tank every 10 miles. There's no doubt that the car is fast, but no one would really use it. Well, starting with Release 14 of MATLAB, multithreaded support is included through the use of the Intel Math Kernel Library. Although this library is optimized for Intel processors, it works with AMD CPUs too. AMD’s own optimized math performance library exists for Linux, where most of the applications lie, but they’ll have a Win32 version compatible with MATLAB in the future.
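The point about script processing being serial is the textbook situation that Amdahl's law describes, so a quick sketch may help. Amdahl's law is not something the MATLAB test itself uses; it is just a standard way to estimate how much a fixed serial portion limits the gain from extra cores. The 40% figure below is a made-up illustration, not a measurement.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Ideal speedup when only `parallel_fraction` of the run time scales with cores."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


# Hypothetical workload: 40% of the time is in multithreaded math routines,
# the other 60% is serial script processing.
for cores in (1, 2, 4):
    print(cores, "cores:", round(amdahl_speedup(0.4, cores), 2), "x")
```

With numbers like these, going from two cores to four buys very little, which is why the serial script overhead matters so much in practice.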
LS-DYNA Release 970
LS-DYNA is a general purpose transient finite element solver capable of simulating complex real-world problems. That’s the party line at least. Essentially, it’s software that lets you simulate all sorts of things. Automakers use it to develop safer cars by simulating crash tests, the military uses it to simulate weapons explosions, and scientists can use it to study biomechanics. I will bench the CPUs using two classic tests, a 3-vehicle collision and a single front-collision. The 3-vehicle collision takes more than 24 hours to complete – we do not have these numbers ready for this round of articles.
With 1GB memory cards exceptionally affordable, digital photographers can take hundreds to thousands of pictures a day. With digital SLRs, photographers aren't taking JPEGs but instead are capturing RAW images containing all of the data at the time of the shot. With regular workloads from a few hundred to a few thousand images, even small differences in performance can make a big difference in the long run. Moreover, unlike 3D rendering applications where you can let it sit and an animation is a month-long project, photographers often need the end results right away so they can proof, edit, and sell the images.
In addition, when it comes to 3D content creation, most of it is user-limited during the day (the CPU is idle when the artist is thinking about what he wants to draw). In the evening it's an overnight render of animation, which essentially means that if the render starts at the end of the workday, it just needs to be ready before the beginning of the workday the following day - it doesn't matter if it finishes at midnight or 15 minutes before the workday starts.
With increasing megapixel counts, cheaper flash cards, and more and more consumers able to afford cameras supporting the RAW file format, I cannot imagine a better genre to evaluate the CPUs of tomorrow.
Our digital photography benchmarking suite now consists of three applications, all worthwhile to check out (they all have free trial versions).
Capture One D-SLR 3.7 RC1
Bibble Pro 4.2.2
Noise Ninja 2
Digital Video Tests
Working with digital video is also an area that has gained much interest in recent years thanks to the convenience of DV and HDV format camcorders. Due to time constraints, we only had an opportunity to test two applications.
Adobe After Effects 6.5 Pro Bundle
Adobe After Effects is a compositing tool similar to discreet Combustion or Apple Motion. In our test, we’ve gone with a standard publicly available benchmark project starting with a generated fractal sequence and using that in a multi-layer composition.
Canopus ProCoder 2.0
Our video encoding tests were done with Canopus ProCoder 2.0. We ran two different tests. The first test was a torture test converting a 1440x1080p WMV-HD clip into a 24MBps 1920x1080p MPEG-2 file. This is a stress test for the system, as decoding WMV-HD clips in real time already requires a 3GHz-class CPU. Since the WMV decoding isn’t optimized for multiple processors, we also ran a second test that converted an uncompressed 1440x1080p AVI to a 24MBps 1920x1080p MPEG-2 file.
Colfax are the guys you turn to when you want that Quad Opteron 852 with 32GB of System RAM and 16 terabytes of HDD space in a RAID 5 configuration with a 1.3kW PSU (about $40k). Not only can you buy that system, but you’ll have the comfort of knowing that they’ve built that type of system before. While I was there, they were building dozens of rackmounted systems for customers demanding the full 16GB of memory.
One of the odd things that has happened as a result of building high-end servers and workstations for close to 20 years is that they’ve never needed to have a huge on-site service team. Yet, if you think about it, it makes sense – people who buy super-servers fall into two categories. There are people who are simply throwing hardware at a problem and need all of the handholding they can get, and then there are the academic and defense guys who need these systems to run exotic custom software, or even run classified custom-built PCI cards where any computer problems won’t be helped by Geek Squad.
Anyhow, they build high-performance desktops too. These are aimed at power users who would rather not pay for the handholding of the full-service guys, but who don’t want to build a system themselves, whether due to time constraints or because they prefer the convenience of having a system ready to go.
Probably most importantly, they’re actually one of the few retailers who have a large allotment of Dual Core AMD Opterons at launch, so if you’re in the market for one, they’ll definitely be a good source to find such systems.
We were actually surprised at how quiet the system was, although this is likely due to the high-end power supply and the fact that the Chenbro chassis uses a large 120mm rear exhaust fan and an 80mm front fan, which produce less noise than multiple smaller fans.
SIDEBAR: Should we test a quad Dual-Core Opteron system for fun?
Dhrystone ALU (higher is better)
Whetstone FPU (higher is better)
Integer SSE2 (higher is better)
Floating Point SSE2 (higher is better)
MATLAB N72 – Time to completion (shorter is better)
Although Capture One is an industry standard program used by professionals with medium format digital camera backs in the five-digit price range, we were surprised to see that Capture One is only capable of using 2 threads for processing. This means that on a 2x Dual-Core Opteron system, only two of the four cores (the equivalent of one of the Dual-Core CPUs) are in use.
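The arithmetic behind that observation is simple enough to sketch. The helper below is a hypothetical illustration that assumes each application thread can keep at most one core busy and that nothing else is competing for the CPUs.

```python
def core_utilization(app_threads: int, total_cores: int) -> float:
    """Fraction of cores an application can keep busy, at one thread per core."""
    if app_threads < 1 or total_cores < 1:
        raise ValueError("app_threads and total_cores must be at least 1")
    busy_cores = min(app_threads, total_cores)
    return busy_cores / total_cores


# A 2-thread application on a dual Dual-Core system (4 cores in total).
print(core_utilization(2, 4))
# A 4-thread application on the same system.
print(core_utilization(4, 4))
```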
Time to Process a Canon EOS-20D CR2 (shorter is better)
Time to Process a Canon EOS 1D Mark II CR2 (shorter is better)
Time to Process a Canon EOS 1Ds Mark II CR2 (shorter is better)
Time to Process a Nikon D2H NEF (shorter is better)
Time to Process a Nikon D2X NEF (shorter is better)
Time to Process a Phase One P25 RAW image (shorter is better)
The $500 Capture One Pro only supported 2 CPUs, but the one-man team of Bibble Labs and their $129 Bibble Pro (with a three-OS license covering Windows, Linux, and OS X) supported all 4 processors. When I saw the performance of the new 2x Dual Core AMD Opterons, I did not believe it. I checked to make sure I wasn’t caching something and that Bibble wasn’t processing in the background. The dual Dual-Core AMD Opteron was so fast on this test that I repeated the test five times to make sure something wasn’t amiss. Turns out, it really was that fast.
All images were converted using a “normal contrast” preset with no noise reduction.
Time to Process a Canon EOS-20D CR2 (shorter is better)
Time to Process a Canon EOS 1Ds Mark II CR2 (shorter is better)
Time to Process a Nikon D2X NEF (shorter is better)
Time to Process 1 gigabyte of Canon EOS-20D CR2 images (shorter is better)
One of the great things about Noise Ninja is that its noise-removing algorithm is designed for multiprocessor systems. While other software such as Neat Image supports multiprocessor systems, all it’s really doing is running multiple files simultaneously. That is, with Noise Ninja, all CPUs are used when cleaning an image. With Neat Image, each CPU focuses on its own image.
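The difference between the two approaches is really a difference in how the work is divided, which the toy Python sketch below tries to show. Nothing here reflects how Noise Ninja or Neat Image are actually implemented: the "denoise" step is a stand-in that just clamps pixel values, all function names are made up, and a thread pool is used only to show the structure (in CPython you would swap in a process pool to get real CPU parallelism for pure-Python work).

```python
from concurrent.futures import ThreadPoolExecutor


def denoise_rows(rows):
    """Toy stand-in for noise removal: clamp every pixel to the 0..255 range.

    A real filter would also need neighboring rows at the band edges.
    """
    return [[min(255, max(0, pixel)) for pixel in row] for row in rows]


def split_rows(image, parts):
    """Split an image (a list of rows) into at most `parts` contiguous bands."""
    size = max(1, -(-len(image) // parts))  # ceiling division
    return [image[i:i + size] for i in range(0, len(image), size)]


def within_image(image, workers):
    """All workers cooperate on a single image, one band of rows each."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        bands = list(pool.map(denoise_rows, split_rows(image, workers)))
    return [row for band in bands for row in band]


def per_image(images, workers):
    """Each worker takes a whole image; a single image never uses more than one."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(denoise_rows, images))


image = [[-5, 10], [300, 20], [7, 8]]
print(within_image(image, workers=2))
print(per_image([image, image], workers=2))
```

Both strategies produce the same pixels; the practical difference is that the per-image approach only helps when you have a batch of files queued up, while the within-image approach also speeds up a single file.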
Time to Process an ISO 800 Canon EOS-20D – 8 threads (shorter is better)
Time to Process an ISO 800 Canon EOS-20D – 4 threads (shorter is better)
Time to Process an ISO 800 Canon EOS-20D – 2 threads (shorter is better)
Time to Process an ISO 800 Canon EOS-20D – 1 thread (shorter is better)
We did not have much time to test After Effects using our own in-house compositions, but we did run a simple AE Total Benchmark Test.
Comp 1 – Fractal Generation (shorter is better)
Comp 2 – Multi-layer compositing (shorter is better)
Canopus ProCoder 2.0
We used a 21 second clip.
1440x1080p WMV-HD to 1920x1080p 24MBps MPEG-2 (shorter is better)
1440x1080p uncompressed HD to 1920x1080p 24MBps MPEG-2 (shorter is better)
So who should get a Dual-Core CPU? Well, if you read the article, you’d know that the answer is everyone – by the end of the year, when things move from the workstation to the home desktop. How about right now? Well, people who’ve always looked to dual processor machines (IT infrastructure, custom software, digital media) should strongly consider these new systems. Professional photographers who regularly work with hundreds to thousands of images a day with cameras such as the 1Ds or 1D Mark II from Canon should strongly consider getting a Dual Opteron system as their next upgrade. It truly is the “hardware accelerated RAW processor” that many have dreamed of.
Where can I get one?
Although the Dual-Core Opterons were just announced today, they are actually shipping today as well, though in relatively limited supply. That said, Colfax does have one of the larger allotments of Dual-Core Opteron systems ready to ship, ranging from basic single Dual-Core Opteron systems to the full dual Dual-Core Opterons. You can browse their workstation system configurations and launch day server offerings online.
© Copyright 2003 FS Media, Inc.