
AMD Phenom II X4 940 Black Edition Performance
January 07, 2009 Brandon Sandman Bell

Summary: With 3X the L3 cache of Phenom and a 45-nm manufacturing process, AMD's Phenom II is going Penryn-hunting. Is it able to slay its prey, though? What about OC'ing? We examine all these issues and more in today's article. See what's new with AMD's latest 45-nm CPUs and what we can expect from AMD in the near future inside!


AMD Phenom II X4 940 Black Edition Review




Let’s face it, Phenom didn’t cut it for most of the hardcore AMD crowd at launch. The Phenom 9850 Black Edition finally became somewhat tempting for these users as a result of the latest price cuts last summer. But nothing AMD has offered lately has dominated the market like the legendary 3500+ and X2 3800+ did for the budget-minded enthusiast a few years ago.

Before we get AMD enthusiasts’ hopes up too much though, a little reality check: Phenom II is not a Core i7-killer. Core i7 is still the world’s fastest CPU.

But AMD isn’t going after the bleeding-edge, sacrifice-your-first-born-child-in-order-to-afford-it crowd anymore. Instead they’re focusing on the value-conscious consumer who wants good performance but also wants something affordable. Think of the guy who buys the Camaro SS instead of the Corvette, or the BMW 135i instead of the M3. You get the idea: ~80-90% of the performance of the high-end model, but at a significantly lower price.

This is the space where AMD hopes to make some money nowadays.

So now that you know how AMD is positioning their CPUs, it’s time to find out if Phenom II hits the mark or not. As the rumors have suggested for the past few months, AMD has prepped two CPUs for launch, a 3.0GHz model (the Phenom II X4 940) and a 2.8GHz part (the Phenom II X4 920). Both CPUs feature sub-$300 price tags, support AMD’s AM2+ socket, and are backward-compatible with AMD’s existing AM2 Phenom/Athlon X2 infrastructure of motherboards. AMD has also disclosed the performance we can expect from DDR3-based AM3, as you can see in the following slide:



As you can see, AMD projects a 20% improvement in performance over Phenom 9950, due largely to the increase in clock speed, which buys Phenom II 940 12% in performance. AMD estimates an additional 3% comes from instructions per clock (IPC) enhancements included in the new core, while another 5% comes from the CPU’s larger L3 cache. Finally, AMD projects a further performance improvement of nearly 5% from DDR3-1333 when it becomes available.
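For readers who want to check the slide's math, here is a quick sketch. It assumes the three contributions are simply added together, which is our reading of the slide rather than anything AMD spells out, and it compares the 12% attributed to clock speed against the raw 2.6GHz to 3.0GHz clock increase.

```python
# AMD's projected gains for Phenom II X4 940 over Phenom 9950, as quoted in the text.
# Assumption: the contributions are treated as additive.
projected_gains_pct = {
    "clock speed (2.6GHz -> 3.0GHz)": 12,
    "IPC enhancements": 3,
    "larger L3 cache": 5,
}

total_pct = sum(projected_gains_pct.values())

# Raw clock increase, for comparison with the 12% AMD attributes to clock speed.
clock_increase_pct = (3.0 / 2.6 - 1) * 100

print(f"Sum of projected gains: {total_pct}%")
print(f"Raw clock increase: {clock_increase_pct:.1f}%")
```

The projected gain from clock speed is smaller than the raw clock increase, which is what you would expect when performance does not scale perfectly with frequency.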

This is the high-level overview of Phenom II though. Let’s take a closer look under the hood of the new CPU, and see how far the new chip overclocks.



Phenom II enhancements



Over time AMD’s engineers managed to massage more speed out of the architecture, ultimately culminating with the 2.6GHz Phenom 9950. They eked yet more speed out of the core thanks to Advanced Clock Calibration (ACC), which was integrated into AMD’s SB750 South Bridge and 790GX chipset (newer 790FX motherboards were also updated to include SB750 and ACC). Even so, Phenom never really lived up to its full potential.

AMD took the lessons learned through developing ACC with 65-nm Phenom and baked them into their 45-nm Phenom II silicon. As a result, ACC no longer provides the OC’ing benefits it did previously with 65-nm Phenom parts. In the words of AMD: “you can just as well leave ACC off for Phenom II OC testing. Since, the "go-fast" things we learned from ACC (and those CPU parameter adjustments) were factored into 45nm, the benefit unlocked previously by ACC in 65nm silicon is already being realized without having to use the ACC feature separately.” Over the Christmas holiday (before hearing this from AMD) we’d already attempted OC’ing our Phenom II CPU and sure enough, ACC provided no benefit when OC’ing the processor. We actually thought something was wrong with our 790GX motherboard until we heard back from AMD.

But that wasn’t the only tweak made with 45-nm. AMD’s new 45-nm manufacturing process continues to utilize strained silicon and silicon-on-insulator (SOI), but new to the manufacturing process is immersion lithography. With immersion lithography, liquid is used between the projection lens and the wafer. According to AMD, this improves focus and provides a 40% gain in resolution versus conventional lithography. AMD believes immersion lithography is more efficient than Intel’s approach, and also notes that Intel won’t incorporate it until they switch to 32-nm.

The smaller process also improves energy efficiency. To further reduce power consumption, though, AMD has incorporated additional power states, including a new 800MHz P0 state as well as cache flush on halt. With 65-nm Phenom, an idle processing core would have to continue operating (albeit at a lower clock speed) in order to keep the data in its L1 and L2 caches available to the other cores. In contrast, when an idle core in AMD’s 45-nm Phenom II enters halt state, it flushes the contents of its L1 and L2 caches into L3, which is a shared pool of memory that is accessible to the other cores. The idle core then essentially shuts down to save power.
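The idea behind cache flush on halt can be illustrated with a toy model. This is only a conceptual sketch: the `Core` class, the `read` helper, and the dictionary-based caches are hypothetical names we made up for illustration and say nothing about how the hardware is actually implemented.

```python
class Core:
    """Toy stand-in for a CPU core; not a model of real hardware."""

    def __init__(self, name):
        self.name = name
        self.private_cache = {}  # stands in for the core's L1 and L2 caches
        self.halted = False

    def halt(self, shared_l3):
        # Flush private cache contents into the shared L3, then power down.
        shared_l3.update(self.private_cache)
        self.private_cache.clear()
        self.halted = True


def read(address, core, shared_l3):
    # A running core checks its own cache first, then falls back to the shared L3.
    if address in core.private_cache:
        return core.private_cache[address]
    return shared_l3.get(address)


shared_l3 = {}
core0, core1 = Core("core0"), Core("core1")

core0.private_cache[0x1000] = "data written by core0"
core0.halt(shared_l3)  # core0 goes idle and flushes to L3

# core1 can still reach the data even though core0 is powered down.
value = read(0x1000, core1, shared_l3)
print(value)
```

The point of the sketch is simply that once the data lives in the shared pool, the halted core no longer has to stay awake to serve it.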

Between the new manufacturing process, new power states, and other enhancements in Cool'n'Quiet 3.0, AMD estimates power savings of 40% at idle. Now obviously you won’t see all of that at a system level when testing at the wall, but this is a nice reduction that should also allow the new processors to generate less heat than 65-nm Phenom.


IPC enhancements

One of Phenom’s key weaknesses when compared against Core 2 Quad was its IPC. IPC was previously a hallmark of AMD’s Athlon/Athlon X2 when compared against Pentium 4/D, but today’s Phenom CPUs are simply behind Intel on a clock-for-clock basis.

Phenom II improves the situation a little, thanks mostly to its larger L3 cache. AMD has also managed to incorporate a few tweaks that should directly improve the number of instructions the CPU executes per clock cycle. AMD has added path-based indirect branch prediction to improve the processor’s ability to handle branch instructions. We were also told that one algorithm for handling branches has been optimized to slightly improve branch prediction.

Phenom II also boasts larger load/store and floating-point buffering. AMD wouldn’t provide specifics on how much larger the buffers are, but this tweak should improve missed buffer performance. AMD has also added floating-point register-to-register move instruction improvements into the processor.

The processing cores inside Phenom II can also probe their L1 and L2 caches twice as often as Phenom, effectively doubling core probe bandwidth. Enhanced pre-fetching allows Phenom II to recognize data access usage patterns and speculatively pre-fetch data instructions that are likely to be needed ahead of time into cache.

Phenom II also features improved LOCK pipelining: under certain conditions, out-of-order CPUs like Athlon, Phenom, and Core 2 have to execute code in the order it was originally written. These CPUs normally like to take the instructions and reshuffle them in a way that maximizes the processor’s efficiency. Thanks to its improved LOCK pipelining, Phenom II is able to shuffle these instructions more freely than previous AMD CPUs, locking down less of the pipeline and thus improving Phenom II’s efficiency. This really pays dividends when multiple LOCKs are in process simultaneously.

Finally, Phenom II boasts lower latency to data stored in L3 cache than Phenom. The exact amount is open to debate, and it may not show up at all in some synthetic benchmarks, but AMD has attempted to reduce L3 latency. Phenom II’s L3 is more associative than Phenom’s as well, with Phenom II featuring 48-way associative L3 cache versus 32-way associative L3 in Phenom. This increases the L3 cache’s hit rate.
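To put the associativity figures in context, the number of sets in a set-associative cache is its size divided by (ways x line size). The sketch below applies that formula to both L3 caches. The 64-byte line size is our assumption, since it isn't stated anywhere in AMD's material quoted here.

```python
def cache_sets(size_bytes: int, ways: int, line_bytes: int = 64) -> int:
    """Number of sets in a set-associative cache: size / (ways * line size).

    line_bytes=64 is an assumption, not a figure from the article.
    """
    return size_bytes // (ways * line_bytes)


MB = 1024 * 1024

phenom_ii_sets = cache_sets(6 * MB, ways=48)  # Phenom II: 6MB, 48-way L3
phenom_sets = cache_sets(2 * MB, ways=32)     # Phenom: 2MB, 32-way L3

print(f"Phenom II L3 sets: {phenom_ii_sets}")
print(f"Phenom L3 sets: {phenom_sets}")
```

Under that assumption, Phenom II ends up with both more sets and more ways per set, so any given line has more places it can live before something has to be evicted.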



Specs and overclocking

The following chart summarizes the differences between AMD’s 45-nm Deneb Phenom II CPUs and the original 65-nm Phenom with Agena core:

Phenom II vs Phenom Feature Comparison

Feature | Phenom II X4 940 Black Edition | Phenom X4 9950 Black Edition
Core | Deneb | Agena
Clock Speed | 3.0GHz | 2.6GHz
L1 Cache Size | 64KB instruction + 64KB data per core (512KB total per CPU) | 64KB instruction + 64KB data per core (512KB total per CPU)
L2 Cache Size | 512KB per core (2MB total per CPU) | 512KB per core (2MB total per CPU)
L3 Cache Size | 6MB (shared) | 2MB (shared)
Memory Controller | Integrated 128-bit wide memory controller | Integrated 128-bit wide memory controller
Memory Controller Speed | 1.8GHz with Dual Dynamic Power Management | 2.0GHz with Dual Dynamic Power Management
Memory Types Supported | Unregistered DIMMs up to PC2 8500 (DDR2-1066MHz) | Unregistered DIMMs up to PC2 8500 (DDR2-1066MHz)
HyperTransport 3.0 Link | One 16-bit/16-bit link @ up to 3.6GHz full duplex (1.8GHz x2) | One 16-bit/16-bit link @ up to 4.0GHz full duplex (2.0GHz x2)
Total Processor Bandwidth | 31.5 GB/s | 33.1 GB/s
Packaging | Socket AM2+ 940-pin organic micro pin grid array (micro-PGA) | Socket AM2+ 940-pin organic micro pin grid array (micro-PGA)
Process Technology | 45-nanometer DSL SOI (silicon-on-insulator) technology | 65-nanometer DSL SOI (silicon-on-insulator) technology
Approximate Transistor Count | ~758 million | ~450 million
Approximate Die Size | 258 mm² | 285 mm²
Nominal Voltage | 0.875-1.5 Volts | 1.05-1.30 Volts
Max Ambient Case Temp | 62 degrees Celsius | 61 degrees Celsius
Max TDP | 125W | 140W
Price | $275 | $174



Notes

As you can see in the chart above, thanks to its smaller manufacturing process Deneb features a smaller die size than Agena (258 mm² vs 285 mm²) despite its increase in transistor count (758M vs 450M). The bulk of the new transistors obviously come from the dramatically increased L3 cache size of 6MB (versus 2MB in Agena).
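A rough way to see what the die shrink buys is to divide transistor count by die area. The sketch below does that with the approximate figures from the chart, so treat the results as ballpark numbers only.

```python
def density_m_per_mm2(transistors_millions: float, die_area_mm2: float) -> float:
    """Average transistor density in millions of transistors per square millimeter."""
    return transistors_millions / die_area_mm2


deneb_density = density_m_per_mm2(758, 258)  # 45-nm Phenom II
agena_density = density_m_per_mm2(450, 285)  # 65-nm Phenom

print(f"Deneb: {deneb_density:.2f}M transistors/mm^2")
print(f"Agena: {agena_density:.2f}M transistors/mm^2")
print(f"Density ratio: {deneb_density / agena_density:.2f}x")
```

Part of the gain comes from the fact that so many of the added transistors are cache, which packs more densely than logic, so this isn't a pure measure of the process shrink.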

Thanks to its smaller die, Deneb is actually cheaper for AMD to produce than Agena, assuming equal yields (which may or may not be the case at this point considering the maturity of AMD’s 45-nm process).

The other spec that stands out is Phenom II 940’s slower memory controller. Running at just 1.8GHz, the new memory controller actually trails Phenom 9950 by 200MHz and is the same speed as the original Phenom 9600. Similarly, the HyperTransport interface runs in sync with the memory controller, operating at 1.8GHz. The net effect is that overall CPU bandwidth is down from 33.1GB/sec in Phenom 9950 to just 31.5GB/sec in Phenom II 940.
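The bandwidth totals in the chart can be reproduced with a little arithmetic. The sketch below assumes the total is the sum of the HyperTransport link (both directions, double data rate, 16 bits wide) and dual-channel DDR2-1066 memory on the 128-bit controller. That breakdown is our assumption; it happens to match the chart, but the chart itself doesn't spell it out.

```python
def ht_bandwidth_gbs(clock_ghz: float, link_bytes: int = 2) -> float:
    """HyperTransport bandwidth in GB/s for one 16-bit/16-bit link.

    clock * 2 (double data rate) * link width in bytes * 2 (both directions).
    """
    return clock_ghz * 2 * link_bytes * 2


def ddr2_bandwidth_gbs(transfers_mts: float = 1066.67,
                       channel_bytes: int = 8,
                       channels: int = 2) -> float:
    """Peak memory bandwidth in GB/s: two 64-bit channels make up the 128-bit controller."""
    return transfers_mts * channel_bytes * channels / 1000


memory = ddr2_bandwidth_gbs()
phenom_ii_total = ht_bandwidth_gbs(1.8) + memory  # Phenom II 940
phenom_total = ht_bandwidth_gbs(2.0) + memory     # Phenom 9950

print(f"Phenom II 940: {phenom_ii_total:.1f} GB/s")
print(f"Phenom 9950: {phenom_total:.1f} GB/s")
```

If this breakdown is right, the entire difference between the two CPUs comes from the slower HyperTransport link, since both support the same DDR2-1066 memory.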

We were told that this compromise was made in order to bring Phenom II to market today. Future AM3-based Phenom CPUs launching later this year will have this issue corrected, with memory controller/HyperTransport speeds in line with Agena’s 2.0GHz. We’ll also see a more diverse range of AM3-based Phenom II CPUs in comparison to AM2, with AMD eventually offering dual, triple, and quad-core AM3 Phenom II CPUs. In comparison, it’s been rumored that AM2-based Phenom II may have an awfully short lifespan, with the 920 and 940 being the only AM2 Phenom II offerings to hit the market and the CPUs reaching end-of-life (EOL) status perhaps as soon as late Q2 of this year.

Rumors also point to the first AM3 processors arriving in a little over a month.

Normally this would be a cause for concern, but considering that AM3 CPUs are backward-compatible with today’s AM2+ platform, this shouldn’t be much of a problem. You can upgrade to an AM2+ CPU today, and then transition to AM3 a year from now without having to buy another motherboard.

Looking at the nominal voltage in the chart above, you’ll also notice that Phenom II now tops out at 1.5V compared to Agena’s 1.3V. Considering its smaller process, you’d normally expect the nominal voltage to be a little lower than AMD’s 65-nm parts.

In this case we were told that the fab likes to have lots of room to play with in terms of voltage when manufacturing CPUs. This obviously helps AMD obtain better yields than sticking with a narrower voltage range. Fortunately this shouldn’t affect max power consumption. One 3.0GHz part may come off the line running at 1.35V, while another may need 1.4V, but they’ll both still maintain a 125W max TDP. OC’ers though will probably want to hunt down the CPUs with the lowest voltage in the hopes that it will give them more headroom when overclocking (this may also be a case where AMD’s leaving themselves room for future CPUs running at higher clock speeds).

For what it’s worth, we noted an operating voltage of 1.336V for our particular Phenom II 940 sample.

Pricing

As we mentioned earlier, AMD’s providing two Phenom II X4 SKUs at launch, the Phenom II X4 940 Black Edition clocked at 3.0GHz, and the Phenom II X4 920 running at 2.8GHz. As a “Black Edition” part, the 940 ships with an unlocked clock multiplier, allowing enthusiasts to adjust the multiplier however they wish, while the 920 has a fixed multiplier setting of 14.0. The 940 is priced at $275 while the 920 is priced at $235. As always keep in mind that these are bulk prices AMD charges in quantities of 1,000, and not the CPU’s actual street price, which can actually be lower. The pricing of AMD’s existing 65-nm Phenom processors remains unchanged, with the 9950 BE and 9850 BE both officially selling for the same $174 and the 9750, 9650, and 9550 priced at $154.

CPUs are officially shipping as of now, and should be available for purchase in systems and at the retail level starting today.

Overclocking

Considering the enhancements AMD has integrated into Phenom II and its new 45-nm manufacturing process, we were eager to see how far we could push our particular Phenom II 940 BE chip. Starting with OCZ DDR2 RAM, MSI’s 790GX-based DKA790GX Platinum, and our trusty Zalman CNPS 9700-Cu cooler we anxiously dialed in speeds via AMD’s Overdrive utility.

First we wanted to see how far we could OC the processor at stock voltage. Here we maxed out at just 3.2GHz (200x16)! Any attempt to go further resulted in BSODs when running apps like Cinebench or gaming. Even cranking the HT speed up to 204MHz with the multiplier set at 16.0 eventually resulted in a BSOD when benchmarking.

So our particular sample wants more juice. All right, we’ll give it more voltage. We proceeded to crank up the HT speed and clock multiplier, running benchmarks all along to check stability. Eventually we ended up settling on a speed of 3.745GHz (16.5x227). We needed 1.55V of juice to get the CPU to run stable at that speed. We could boot and load Windows at higher speeds approaching 3.9GHz, but stability was sketchy at best at those speeds.


Keep in mind that we tested with 64-bit Windows Vista. With a 32-bit OS we probably could have pushed the processor a little further.



System Setup

Intel Core 2 platform
CPUs: Intel Core 2 Quad Q9400, Intel Core 2 Quad Q6600, Intel Core 2 Duo E8600
Motherboard: ASUS P5E3 Premium
Memory: 4GB (4x1GB) OCZ DDR3 PC3-16000 Platinum

Intel Core i7 platform
CPU: Intel Core i7 920 w/Turbo Mode Enabled
Motherboard: ASUS P6T Motherboard
Memory: 3GB (3x1GB) Qimonda 1067 CL7 non-ECC

AMD platform
CPUs: AMD Athlon X2 6000+, AMD Phenom X4 9950 Black Edition, AMD Phenom II X4 940 Black Edition
Motherboard: MSI DKA790GX Platinum
Memory: 4GB (4x1GB) OCZ DDR2 PC2-8500 Platinum

Common components
Graphics: ATI Radeon HD 4870 X2 with Catalyst 8.12
Hard drive: 500GB Western Digital Caviar SE16
Operating system: Windows Vista Ultimate 64-bit w/Service Pack 1


Benchmarks

Lost Planet
Left 4 Dead
Crysis
Far Cry 2

Notes

For all of our previous CPU tests we’ve relied on NVIDIA’s GeForce GTX 280 as the GPU used for testing. With a new year though we wanted to crank things up a bit, opting for ATI’s Radeon 4870 X2. By moving to a dual GPU card, we’ll be pushing the capabilities of these CPUs further, plus it prevents us from hitting the GPU-limited situations we’ve encountered previously with high quality settings at 1600x1200 and 4xAA. In all of our previous CPU reviews the GPU has always been a bottleneck in this situation, so by stepping up to a dual GPU card we were hoping to alleviate this.




Phenom II vs Phenom Clock-for-Clock Performance Comparison

Eager to see how the clock-for-clock enhancements integrated into Phenom II improve its performance over Phenom, we downclocked our 940 CPU to the same 2.6GHz clock speed as Phenom 9950. Let’s look at the results:

Gaming






We see some nice improvements in our gaming tests, with Crysis and Far Cry 2 performance improving 8% while Left 4 Dead sees a more substantial improvement of 14%! Lost Planet’s performance improves just 4% with Phenom II.


Media benchmarks






We continued to see clock-for-clock improvements in most of our media tests. Cinebench performance improves by 4%, while Phenom II cut 6% (8 seconds) off our MP3 encode time. Phenom II DivX encoding was 4% faster than Phenom on a clock-for-clock basis.

Now let’s see how Phenom II stacks up against Intel’s Core i7 and Q9400 CPUs.




Power/Media Encoding/Rendering Benchmarks











Valve Particle Simulation Benchmark



Phenom II outperformed its closest rival, Core 2 Q9400, in both Cinebench (by 9%) and our Windows Media Encoder testing (by 6%), but the Q9400 reigned supreme in our DivX conversion tests thanks largely to SSE4. Valve’s particle simulation benchmark also ran faster on Core 2 Q9400, with the 940 trailing by 10%.

Our load power consumption testing was done with Far Cry 2 running very high settings at 1600x1200 4xAA/8xAF. At this setting the majority of the power draw actually comes from our Radeon 4870 X2 GPU. Even so, you can see that Phenom II improves upon Phenom 9950, although the Q9400 bests it in overall power consumption.




Left 4 Dead Performance

Left 4 Dead – Direct3D





Notes

Unlike previous Source engine games, Left 4 Dead is fully capable of taking advantage of multi-core CPUs and we made sure to enable that setting when testing. The Phenom II 940 trails the Q9400 by 3% at 800x600 with low quality settings. The gap narrows to less than 1% at 1600x1200 with high settings. As you can see though some of the slower CPUs are holding back the Radeon 4870 X2 GPU, with Core 2 Q6600 and Phenom 9950 running over 10% slower than the Phenom II 940.



Far Cry 2

Far Cry 2 – Direct3D





Notes

Far Cry 2 seems to really run well on Intel CPUs, even at 1600x1200 with very high settings and 4xAA. At 800x600 the Phenom II 940 processor trails Q9400 by 18%. Even the dual-core E8600 manages to outrun Phenom II in this game. This is a surprise given that the game is multi-threaded.



Crysis

Crysis – Direct3D





Notes

Crysis is another game that ran faster with Intel’s Q9400. At 800x600 the Phenom II 940 trails by 7%. Under high quality settings the 4870 X2 GPU begins to bottleneck and the margin separating both processors narrows to just 1%, within the margin of error.



Lost Planet

Lost Planet – Direct3D





Notes

Lost Planet is still the best example of a multi-threaded game that scales well as you add more cores. This allows the Core i7-920 to really pull away from everyone else thanks to Hyper-Threading, where the game can run up to 8 threads simultaneously. Phenom II 940 manages to pull ahead of Q9400 in this game, running 6% faster than Intel at 800x600. This actually puts the processor more on par with the Q9550 than anything else.



Overclocking














Conclusion


The new process allows AMD to scale to dramatically higher clock frequencies than Phenom. Launch speeds top out at 3.0GHz, and we managed to OC our CPU over 3.7GHz! Getting Phenom 9950 to 3.0GHz wasn’t impossible without ACC (last year we managed 3.05GHz with our chip), but you needed a little bit of luck and good cooling. With Phenom II a 400MHz OC is nothing; a few lucky souls may even be able to hit 3.8 or 3.9GHz with the best air cooling.

When compared to Intel’s Penryn line of CPUs, the closest direct competitor to the Phenom II 940 Black Edition is Intel’s Q9400. Here the Phenom II trades wins with the Intel CPU, with each processor winning tests in our suite of benchmarks, but the Q9400 runs faster overall. It’s a much closer fight than where AMD sat a year ago, but given that the Q9400 still leads overall, AMD may want to shave a little off the top of the Phenom II 940’s price. $250 looks about right in our opinion (Intel’s Q9400 officially lists for $266, making it $9 cheaper than the Phenom II 940 as of right now). AMD could then reestablish the $275 price tag when DDR3-based AM3 arrives. The AM3 part should run a little faster than today’s AM2-based 940 CPU, and should be able to push ahead of the Q9400 in a couple of the closer benchmarks.

Honestly, though, variety is probably our biggest disappointment with today’s launch. As good as the AM2-based Phenom II CPUs are, we wish AMD offered more of it in the Phenom II lineup right now. A 2.6GHz sub-$200 SKU would be awesome for the gamer on a budget, while a high-performance range of Phenom II models with 1MB of L2 cache per core and a really large 12MB L3 could probably take on Intel’s $300+ Penryn processors. On the other hand, considering today’s economy, AMD is probably taking the right approach by focusing on the $100-$300 mainstream sweet spot of the market.

In summary Phenom II isn’t going to blow away the CPU world like Conroe Core 2 did in 2006, but just as the Phenom 9850 and 9950 Black Edition CPUs gave AMD a very competent competitor to Core 2 Quad Q6600, Phenom II 940 and 920 give AMD a serious challenger to low-end quad-core Penryn CPUs like the Q9400. We’re not getting a next-generation performance leap over Phenom just yet, but thanks to its higher clock speeds and larger cache Phenom II is a significant improvement over Phenom 9950 and will get even sweeter when AMD moves to DDR3 and AM3. The frequency scaling is also good. AMD hasn’t quite caught up to Intel in this department, but they have narrowed the gap significantly with Phenom II.

We’re a little surprised Intel has chosen not to respond to Phenom II just yet. If they don’t watch out, AMD could win back some of the enthusiasts who may have planned to go Intel with their next upgrade. AMD’s certainly got the more stable platform in terms of providing a solid upgrade path; AM3 CPUs will work in today’s AM2+ motherboards. With Nehalem eventually relying on two different sockets, and LGA775 finally coming to an end, AMD’s platform roadmap is definitely more stable.

But will this be enough for AMD to win back lost market share? That’s a question that will be interesting to follow over the course of the next 12 months…


© Copyright 2003 FS Media, Inc.