Summary: Performance junkies look out: Intel's next-generation Nehalem CPU has arrived! The CPU's architecture has been designed from the ground up to deliver improved IPC, while it's also capable of dynamically OC'ing itself to further enhance performance. See how the new Core i7 CPUs stack up against CPUs ranging from the Core 2 Duo and Athlon X2 6000+ up to the Core 2 Extreme QX9770. We also managed to OC these chips to really high levels. Read the full scoop inside!
If you recall, this was one of the chief weaknesses in Core 2's predecessor, Pentium 4/D. Pentium 4 processors sacrificed the amount of work performed per clock in exchange for more pipeline stages, 31 in the case of later Pentium D processors. Essentially Intel made a conscious decision to sacrifice IPC in exchange for higher clock speeds. Ultimately this decision came back to haunt them when Pentium 4/D had trouble scaling to clock speeds of 4GHz and beyond.
Core 2 never hit the clock speeds of Pentium 4, but because of its improved IPC, it didn't have to in order to achieve breakthrough performance.
But Intel didn't stop there. To further enhance performance, Core 2 also featured more accurate branch prediction, improved SSE/SSE2/SSE3 performance, and a unified L2 cache, with more advanced prefetchers residing in the L1 and L2 caches to reduce memory accesses.
Ultimately Core 2 was over two times faster than Intel's previous Pentium processor, and it also significantly outperformed AMD's fastest Athlon X2 and FX processors, all while drawing very little power and leaving tons of frequency headroom for overclockers. It wasn't uncommon for Core 2 Duo E6300 and E6400 chips to push 3GHz.
Late last year Intel gave Core 2 a midlife upgrade with their Penryn architecture. Besides its smaller 45-nm manufacturing process, Penryn also featured a divider twice as fast as Conroe's for math computations and a new super shuffle engine. This is a 128-bit wide, single-pass shuffle unit that improved Penryn's performance with SSE2, SSE3, and SSE4 instructions that have shuffle-like operations.
Penryn was also the first Intel processor to support SSE4.
The final ingredients Intel added to Penryn to improve performance were faster bus speeds and a larger L2 cache. Quad-core chips shipped with up to 12MB of L2 cache while dual-core parts featured 6MB of L2.
As a result of all these improvements, Penryn generally performed around 10-15% faster than Conroe/Kentsfield clock-for-clock. In apps that took advantage of SSE4, this advantage was even greater. In comparison, AMD's fastest Phenom CPU, the Phenom 9950, is just now approaching the performance of Intel's older quad-core Kentsfield CPUs like the Core 2 Quad Q6600 and Q6700.
And now, just as AMD approaches the arrival of their first 45-nm CPUs, Intel's back again with the "tock" of their tick-tock model, which follows every process shrink (in this case Penryn) with a next-generation microarchitecture (Nehalem) the following year.
As you probably know by now, Intel's next-generation microarchitecture (previously codenamed Nehalem) was officially given a brand name by Intel in August of this year: Core i7. Over the course of the past 18 months, Intel has slowly divulged most of the tech goodies that make up Core i7, including its integrated memory controller, Intel's QuickPath Interconnect (Intel's equivalent of AMD HyperTransport that previously went under the codename CSI), its new L3 cache, the return of Hyper-Threading, and Nehalem's Turbo Mode. Still, we're going to briefly go over these changes before we take a look at the new Core i7 platform and the processors behind it.
This modular design helps to reduce power consumption. Features like the memory controller and QPI all run at voltages independent of each other.
Intel has incorporated a number of improvements into Nehalem that are designed to improve IPC. For instance, the number of micro-ops (microinstructions) in flight has increased from 96 in Conroe/Penryn to 128 in Nehalem. Intel also increased the size of the load and store buffers to ensure that they wouldn't become a limiting factor.
Intel also improved Nehalem's branch prediction. A new second-level branch target buffer has been added to improve branch prediction in applications that have large footprints such as databases. This second predictor has a much larger history table which should allow it to predict branches more accurately than the first level predictor. Intel has also added a new renamed return stack buffer (RSB). RSBs store forward and return pointers associated with call and return instructions. The RSB should help Nehalem avoid return instruction mispredictions.
With its faster synchronization primitives, Nehalem has also been tweaked to handle threaded software better.
Speaking of threading, with Nehalem we see the resurgence of simultaneous multi-threading (Hyper-Threading). With Hyper-Threading, one processing core can run two threads at the same time. With four processing cores inside Core i7, the OS "sees" eight logical cores and can schedule eight threads on the CPU at once, effectively doubling the number of threads that Nehalem can run simultaneously compared with a conventional quad-core CPU.
Whereas Hyper-Threading (HT) never really took off on the Pentium 4, Intel feels that Nehalem has a distinct HT advantage thanks to its larger cache and greater memory bandwidth, both of which should allow it to deliver better HT performance. Additionally, there are also more apps capable of taking advantage of HT than there were a few years ago. As you'll see in our Lost Planet, Cinebench, and Valve benchmarks, Nehalem delivers a significant performance increase in HT-aware apps.
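To make the thread math concrete, here's a minimal sketch of how the logical processor count falls out of the core count. The function name is our own invention for illustration, and the last line simply reports whatever machine you run it on.

```python
import os


def logical_processors(physical_cores: int, threads_per_core: int = 2) -> int:
    """Number of logical processors the OS sees for a given core/SMT layout."""
    return physical_cores * threads_per_core


# Core i7: four cores, two threads per core with Hyper-Threading enabled.
core_i7 = logical_processors(4)
# A conventional quad-core without SMT exposes one thread per core.
conventional_quad = logical_processors(4, threads_per_core=1)

print(f"Core i7 logical processors: {core_i7}")
print(f"Conventional quad-core logical processors: {conventional_quad}")
# What the OS reports on the machine running this script (may be None).
print(f"Logical processors on this machine: {os.cpu_count()}")
```

Keep in mind that two logical processors sharing one core don't deliver twice the throughput; the count only tells you how many threads the OS can schedule at once.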
New cache subsystem
While Nehalem has the same 32KB instruction/32KB data L1 cache configuration as previous Core 2 CPUs, Intel has totally revamped the L2 cache and added a new L3 cache.
Nehalem is Intel's first CPU to offer SSE4.2 support. Seven new application-targeted accelerators have been added with the new instruction set. Most of them provide improved performance in string and text processing operations; one example Intel provides is the parsing of XML files at a much higher speed. Two others are focused on accelerated searching and pattern recognition in large data sets (useful for voice/handwriting recognition), and the seventh is a CRC instruction aimed at new communications capabilities such as accelerated network attached storage.
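For readers curious what that CRC instruction actually computes, here's a rough software sketch. To our knowledge the SSE4.2 CRC32 instruction accumulates a checksum using the Castagnoli polynomial (commonly called CRC-32C), which is the variant used by storage protocols like iSCSI. The polynomial constant and the invert-at-start, invert-at-end convention below come from Intel's instruction documentation and common practice, not from anything in this review, and this bit-by-bit loop is purely illustrative. The hardware does the same job a byte or more at a time.

```python
# Reflected form of the CRC-32C (Castagnoli) polynomial.
CRC32C_POLY_REFLECTED = 0x82F63B78


def crc32c(data: bytes) -> int:
    """Bitwise software CRC-32C over a byte string (slow, for illustration)."""
    crc = 0xFFFFFFFF  # conventional initial value
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # Shift out one bit; fold in the polynomial when that bit is set.
            if crc & 1:
                crc = (crc >> 1) ^ CRC32C_POLY_REFLECTED
            else:
                crc >>= 1
    return crc ^ 0xFFFFFFFF  # conventional final inversion


print(hex(crc32c(b"123456789")))
```

The inner loop runs eight iterations per byte in software, which is exactly the kind of work a dedicated instruction takes off the table.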
Intel QuickPath Interconnect
Rather than relying on the FSB for yet another processor, Intel has developed their QuickPath Interconnect (QPI) to link the CPU to the outside world.
Integrated memory controller
Nehalem sports an integrated triple-channel memory controller that supports DDR3 memory exclusively. Memory clocks are limited to just two speeds: 800MHz DDR3 and 1066MHz DDR3. Nehalem can run with faster DDR3-1333 and DDR3-1600 memory, but in this case the modules would be underclocked to run at 1066MHz (unless of course you decide to OC).
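As a back-of-the-envelope check on what triple-channel memory buys you, here's a small sketch of the theoretical peak bandwidth at the two supported speeds. It assumes the standard 64-bit (8-byte) width of a DDR3 channel and treats the quoted speeds as transfer rates in millions of transfers per second. Real-world throughput will be lower.

```python
def peak_bandwidth_gb_s(channels: int, mega_transfers_per_s: int,
                        bus_width_bits: int = 64) -> float:
    """Theoretical peak memory bandwidth in GB/s (1 GB = 1000 MB)."""
    bytes_per_transfer = bus_width_bits // 8
    return channels * mega_transfers_per_s * bytes_per_transfer / 1000


# The two officially supported memory speeds, in triple-channel mode.
for speed in (800, 1066):
    bandwidth = peak_bandwidth_gb_s(channels=3, mega_transfers_per_s=speed)
    print(f"Triple-channel DDR3-{speed}: {bandwidth:.1f} GB/s peak")
```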
One lesson Intel's learned over the years is just how slow the software industry is to adapt to the multi-core CPU world we live in today. Games for instance are just now being written with dual-core in mind, and there are only a handful of titles that truly take advantage of quad-core. As Intel goes from two, to four, and eventually eight processing cores in the future, there's potential that many of these additional cores will sit idle, completely untapped by the software. With this in mind Intel has built a new power control unit (PCU) right onto the CPU die. The PCU is solely responsible for power management, actively monitoring the cores for aspects such as utilization and temperature. The PCU can then completely shut off cores that aren't being used, helping to reduce overall CPU power consumption. This brings us to Turbo Mode.
Intel is offering three Core i7 SKUs at launch: the flagship Core i7 965 Extreme Edition clocked at 3.2GHz, the midrange Core i7 940 running at 2.93GHz, and the entry-level Core i7 920 which runs at 2.66GHz:
Nehalem is built on Intel's 45-nm manufacturing process with high-K metal gate transistor technology, with a die size of 233 square millimeters and approximately 731 million transistors. In comparison, Penryn had 820M transistors and a 214mm2 die.
As some sites have mentioned ahead of the Nehalem launch, officially the CPU supports DDR3 memory rated up to 1.6V. According to Intel, memory running at voltages higher than 1.6V can potentially damage the CPU. Most memory manufacturers have announced their own triple-channel Nehalem-ready memory kits ahead of today's launch, and we recommend anyone interested in building their own Nehalem system go with one of these kits. Intel will also be providing a list of certified memory modules on their developer website that you'll want to check out before purchasing anything.
Intel's X58 chipset is the only platform that supports Core i7 at this time. X58 is Intel's flagship chipset, with support for PCIe 2.0 and up to 36 PCIe lanes. PCI Express graphics configurations supported include 1x16, 2x16, and 4x8, with the chipset supporting ATI CrossFire and NVIDIA SLI (although as we've reported in the past, motherboard manufacturers must submit their X58 boards to NVIDIA for proper SLI certification).
The motherboard offers base clock speeds up to 240MHz in 1MHz increments. Nehalem's stock base clock is 133MHz, with the i7 920 relying on a multiplier of 20.0x (20 x 133 = 2660MHz), the 940 using a multiplier of 22.0x (22 x 133 = 2926MHz), and the 965 a multiplier of 24.0x. Memory multipliers of 6.0x and 8.0x are also selectable in BIOS (6 x 133 = 800MHz DDR3, 8 x 133 = 1066MHz DDR3), as well as 10.0x (1333MHz DDR3) and 12.0x (1600MHz DDR3). The latter two multipliers were only selectable for our Extreme Edition CPU, however.
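If you want to play with these numbers before touching the BIOS, the multiplier math is easy to sketch. The helper name below is ours, and we use the rounded 133MHz base clock quoted above, which is why some results land a few MHz short of the marketed speeds.

```python
BASE_CLOCK_MHZ = 133  # stock base clock as quoted in this article (rounded)


def clock_mhz(multiplier: float, base_clock_mhz: float = BASE_CLOCK_MHZ) -> float:
    """Resulting clock speed in MHz for a given multiplier and base clock."""
    return multiplier * base_clock_mhz


# CPU multipliers for the three launch SKUs.
for model, multiplier in (("Core i7-920", 20), ("Core i7-940", 22), ("Core i7-965", 24)):
    print(f"{model}: {multiplier} x {BASE_CLOCK_MHZ} = {clock_mhz(multiplier):.0f}MHz")

# Memory multipliers selectable in the BIOS.
for multiplier in (6, 8, 10, 12):
    print(f"Memory {multiplier}x: {clock_mhz(multiplier):.0f}MHz DDR3")
```

Passing a different `base_clock_mhz` shows what a given base clock setting would yield, for example `clock_mhz(20, 180)` for a 920 running a 180MHz base clock.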
In terms of voltages, the board provides CPU voltage settings up to 1.6V in 0.0125V increments, chipset voltages up to 1.50V (0.025V increments) and voltages for the QuickPath Interconnect up to 1.8V (0.025V increments). Memory voltage settings up to 2.5V are available in increments of 0.04V. The QPI data rate is also adjustable.
We were pleasantly surprised with how far we were able to push our Core i7 processors. The Core i7-920 managed to hit 3.6GHz (20.0 multiplier x 180MHz host bus) at 1.4875V, with the chip pushing 3.9GHz thanks to Turbo Mode. At stock voltage the chip maxed out at 3.1GHz (20 x 155MHz bus).
The Core i7-965 EE went even further, hitting 4.08GHz (30.0 x 136) with 100% stability. Once again we needed 1.4875V to get everything running stable, although in this chip's case we're pretty confident we hit the ceiling of its capabilities. At any higher speeds Windows failed to load.
To cool the processor, we used a Thermalright Ultra-120 eXtreme RT for all our OC attempts.
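To put the overclocking results above in perspective, here's a quick sketch that works out the headroom over stock for each result. The stock speeds and overclocked settings are the ones quoted in this article; the function name is our own.

```python
def overclock_gain_percent(stock_mhz: float, multiplier: float, bus_mhz: float) -> float:
    """Percentage gain of an overclock (multiplier x bus) over the stock clock."""
    overclocked_mhz = multiplier * bus_mhz
    return (overclocked_mhz - stock_mhz) / stock_mhz * 100


results = (
    # (label, stock MHz, multiplier, host bus MHz)
    ("Core i7-920 at 1.4875V", 2660, 20, 180),
    ("Core i7-920 at stock voltage", 2660, 20, 155),
    ("Core i7-965 EE at 1.4875V", 3200, 30, 136),
)

for label, stock, multiplier, bus in results:
    gain = overclock_gain_percent(stock, multiplier, bus)
    print(f"{label}: {multiplier * bus}MHz, {gain:.1f}% over stock")
```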
Intel Core 2 Extreme Edition QX9770
Intel Core 2 Quad Q9650
Intel Core 2 Quad Q6700
Intel Core 2 Duo E8600
Intel Core 2 Duo E6400
ASUS P5E3 Premium
4GB (4x1GB) OCZ DDR3 PC3-16000 Platinum
Intel Core i7-965 Extreme Edition
Intel Core i7-920
3GB (3x1GB) Qimonda 1067 CL7 non-ECC
AMD Athlon X2 6000+
AMD Phenom 9950
ASUS M3A32-MVP Deluxe
4GB (4x1GB) OCZ DDR2 PC2-8500 Platinum
80GB Intel X25-M Solid State HDD
Windows Vista Ultimate 64-bit w/Service Pack 1
SiSoft Sandra 2009
Valve Particle Simulation Benchmark
World In Conflict - Direct3D
Company of Heroes - Direct3D
Crysis - Direct3D
Lost Planet - Direct3D
As glowingly as we all raved about Conroe and its Penryn successor, however, things weren't as rosy for Intel in the server space. While Phenom has been a lackluster performer on the desktop, its server equivalent, Barcelona, is highly popular among the IT crowd, particularly as you ramp up the number of CPUs. In this realm AMD is much more competitive with Intel. Nehalem is designed from the ground up to counter this very real threat.
Nehalem's QuickPath Interconnect is Intel's answer to HyperTransport, while the chip also sports an integrated memory controller and L3 cache just like AMD. The second level TLB and branch predictor should improve Nehalem's performance when dealing with large data sets and the chip also features improved virtualization; all these goodies inside Nehalem should improve Intel's standing in the server segment.
But what about us gamers?
Fortunately some of these enhancements also benefit gaming. The integrated memory controller and QPI reduce latency and improve peak bandwidth, while the triple-channel memory improves overall memory bandwidth. Hyper-Threading is another new feature that could reap dividends if the app is multi-threaded. The only problem is most games are only dual-threaded, with only a handful of RTS and FPS titles using four or more threads. In this article we tested most of them: World In Conflict, Far Cry 2, Crysis, and Lost Planet. In the case of Far Cry 2, the Core i7 965 Extreme Edition ran 7% faster than Intel's fastest quad-core Penryn, the QX9770 (this is the same margin as the multi-threaded RTS WiC), while Lost Planet ran up to 32% faster on the i7-965. Finally, the Core i7-965 Extreme Edition ran 8% faster than the Core 2 Extreme QX9770 in Crysis.
Other than Lost Planet, this probably isn't the earth-shattering performance improvement some gamers may have been hoping for.
At the same time however, the Core 2 Extreme QX9770 is one blazing-fast chip. Our benchmarks were run with DDR3-1600MHz memory and obviously a 1600MHz FSB. When you compare Core i7's performance against more conventional Penryn CPUs and the Core 2 Quad Q6700, the Nehalem CPUs really begin to shine.
What's really remarkable is the performance showing of Intel's $284 Core i7-920. Despite its pedestrian 2.66GHz clock speed, this chip was able to give the QX9770 a run for its money in most of our gaming benchmarks. This is without a doubt the chip we'd wholeheartedly recommend to our readers interested in upgrading to the Core i7 platform. With a little bit of OC'ing, this sub-$300 chip becomes even more of a screamer.
The biggest downside to Core i7 is probably the cost. Keep in mind we're not referring to the price of the CPUs themselves; in fact we feel Intel has priced the CPUs very aggressively considering the performance you're getting. The Core i7-920 is only a little slower than the QX9770 yet it costs significantly less, while the Core i7-940 is also priced to move at $562. You can even make an argument that the Core i7-965 is a steal at $999. It is after all the world's fastest processor and it's priced $400 less than the QX9770.
The real problem Core i7 faces is the cost of its underlying platform. X58 motherboards are expected to sell for $300+ when they go on sale later this month, while triple-channel memory kits currently start at $125. That's over $400 that you'll have to spend to upgrade to Core i7 before you even pick up the processor (assuming you don't already have DDR3 memory).
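Here's the upgrade math spelled out as a quick sketch, using the launch prices quoted in this article. The motherboard figure is the low end of the expected "$300+" range and the memory figure is the cheapest kit, so treat the totals as a floor rather than a quote.

```python
# Launch prices quoted in this article, in US dollars.
MOTHERBOARD_USD = 300      # low end of the expected X58 price range
MEMORY_KIT_USD = 125       # cheapest triple-channel DDR3 kit
CPU_PRICES_USD = {
    "Core i7-920": 284,
    "Core i7-940": 562,
    "Core i7-965 Extreme Edition": 999,
}


def platform_cost() -> int:
    """Minimum spend on motherboard and memory, before the CPU."""
    return MOTHERBOARD_USD + MEMORY_KIT_USD


def upgrade_cost(cpu_name: str) -> int:
    """Minimum total for motherboard, memory, and the chosen CPU."""
    return platform_cost() + CPU_PRICES_USD[cpu_name]


print(f"Platform only: ${platform_cost()}")
for cpu_name in CPU_PRICES_USD:
    print(f"With {cpu_name}: ${upgrade_cost(cpu_name)}")
```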
Fortunately Core i7's enhancements can really reap dividends with the right software, and for some users a Core i7 upgrade would be a worthwhile investment. While Lost Planet was the only game that showed a substantial performance improvement thanks to Hyper-Threading, our 3D rendering apps are all multi-threaded and here Core i7 blew away the QX9770. Over time these apps will continue to become more prevalent, eventually becoming the norm rather than the exception. If you're the type of user who only upgrades his processor once every few years, you should definitely keep this in mind.
So there you have it, our take on Core i7. Unlike Conroe, Intel's latest microarchitecture delivers an evolutionary rather than revolutionary performance increase over its predecessor, although in some apps it has the potential to deliver performance that's truly groundbreaking. Core i7 is without a doubt the finest processor Intel's ever produced and we don't see AMD delivering anything that's performance competitive with this CPU in the near future.
The only downside is we wish Intel offered a lower cost alternative to X58 at launch. As it stands now, the Core i7 CPU we're recommending most, the Core i7-920, will probably end up selling for about the same price as the X58 motherboard underneath it. The cost of upgrading to the Core i7 platform is probably going to keep a lot of enthusiasts on a budget from upgrading today, and that's a shame in our opinion, as it's certainly a fun platform to play with. Turbo Mode in particular is a really exciting feature.
In any case, Intel's done it again boys and girls. Core i7 is indeed a pretty sweet CPU. If Intel continues to execute on their roadmap like this, AMD could have a hard time playing catch up at the high-end of the CPU market. Intel's clearly the king when it comes to CPU performance.
© Copyright 2003 FS Media, Inc.