Intel Core i7 (Nehalem) Performance Preview
It’s been a little over two years since Intel introduced the world to their first Core 2 processors, built around the next-generation Conroe core. Loosely based on the Pentium M “Yonah” core, Conroe restored Intel’s leadership position in CPUs. The chip boasted a wider execution core that could complete up to four instructions per clock cycle, along with a shorter, more efficient 14-stage pipeline that improved IPC (instructions per clock) compared with Pentium 4/D.
If you recall, low IPC was one of the chief weaknesses of Core 2’s predecessor, Pentium 4/D. Pentium 4 processors traded work performed per clock for a much deeper pipeline: 31 stages in the later Pentium 4 and Pentium D processors. Essentially, Intel made a conscious decision to sacrifice IPC in exchange for higher clock speeds. That decision came back to haunt them when Pentium 4/D struggled to scale to 4GHz and beyond.
Core 2 never hit the clock speeds of Pentium 4, but because of its improved IPC, it didn’t have to in order to deliver breakthrough performance.
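To make the IPC-versus-clock trade-off concrete, here is a minimal Python sketch. It assumes the simplified model that instruction throughput equals IPC multiplied by clock frequency. The IPC and clock figures are hypothetical round numbers chosen for illustration, not measured values for any real Pentium 4 or Core 2 chip, and the function name `throughput_gips` is likewise just an illustrative helper.

```python
# Simplified model: instructions per second = IPC x clock frequency.
# All IPC and clock values below are hypothetical, for illustration only.

def throughput_gips(ipc: float, clock_ghz: float) -> float:
    """Return billions of instructions completed per second (GIPS)."""
    return ipc * clock_ghz


# Hypothetical deep-pipeline design: high clock, low work per clock.
deep_pipeline = throughput_gips(ipc=1.0, clock_ghz=3.6)

# Hypothetical wider, shorter-pipeline design: lower clock, more work per clock.
wide_core = throughput_gips(ipc=2.0, clock_ghz=2.4)

print(f"deep pipeline: {deep_pipeline:.1f} GIPS")
print(f"wide core:     {wide_core:.1f} GIPS")
print(f"wide core advantage: {wide_core / deep_pipeline:.2f}x")
```

The point is the shape of the trade-off: a design can give up clock speed and still come out ahead if its per-clock gain is proportionally larger. In practice IPC varies from workload to workload, so no CPU has a single fixed IPC figure.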
But Intel didn’t stop there. To further enhance performance, Core 2 also featured more accurate branch prediction, improved SSE/SSE2/SSE3 performance, and a shared L2 cache, with more advanced prefetchers in the L1 and L2 caches to hide memory latency.
Ultimately, Core 2 was more than twice as fast as Intel’s previous Pentium processors, and it also significantly outperformed AMD’s fastest Athlon 64 X2 and FX processors, all while drawing relatively little power and offering plenty of frequency headroom for overclockers. It wasn’t uncommon for Core 2 Duo E6300 and E6400 chips (1.86GHz and 2.13GHz stock) to reach 3GHz.
Late last year Intel gave Core 2 a midlife upgrade with Penryn. Besides moving to a smaller 45-nm manufacturing process, Penryn roughly doubled Conroe’s divider speed for math computations and added a new Super Shuffle engine: a 128-bit-wide, single-pass shuffle unit that improved performance on SSE2, SSE3, and SSE4 instructions that perform shuffle-like operations.
Penryn was also the first Intel processor to support SSE4, specifically the SSE4.1 subset.
The final ingredients Intel added to Penryn to improve performance were faster front-side bus speeds and larger L2 caches. Quad-core chips shipped with up to 12MB of L2 cache, while dual-core parts featured up to 6MB.
As a result of all these improvements, Penryn generally performed around 10-15% faster than Conroe/Kentsfield clock-for-clock. In apps that took advantage of SSE4, this advantage was even greater. In comparison, AMD’s fastest Phenom CPU, the Phenom 9950, is just now approaching the performance of Intel’s older quad-core Kentsfield CPUs like the Core 2 Quad Q6600 and Q6700.
And now, just as AMD is on the eve of launching their first 45-nm CPUs, Intel is back with the “tock” of their tick-tock model, which follows each process shrink (the “tick,” in this case Penryn) with a next-generation microarchitecture (the “tock,” in this case Nehalem) roughly a year later.
As you probably know by now, Intel’s next-generation microarchitecture, codenamed Nehalem, was officially given a brand name in August of this year: Core i7. Over the past 18 months, Intel has gradually divulged most of the technology behind Core i7, including its integrated memory controller, QuickPath Interconnect (QPI, Intel’s equivalent of AMD’s HyperTransport, previously codenamed CSI), its new L3 cache, the return of Hyper-Threading, and Turbo Mode. We’re going to briefly go over these changes before taking a look at the new Core i7 platform and the processors behind it.