Intel Core 2 Extreme QX9650 Penryn Performance Preview
While NVIDIA’s GeForce 8800 GPU definitely comes close, you can make an argument that Intel’s Core 2 CPU was the most significant new hardware release in 2006.
Core 2 was largely designed around Intel’s mobile Core Duo “Yonah” CPU core (itself derived from the Pentium M), with several new performance enhancements. The chip featured a wider execution core, allowing the processor to handle up to four full instructions simultaneously (previous Pentium D CPUs were limited to three). Core 2 also used a comparatively short 14-stage pipeline which, combined with the wider core, allowed the CPU to perform more work per clock cycle.
If you recall, this was one of the chief weaknesses of Core 2’s predecessor, the Pentium 4/D. Previous Pentium processors traded away work performed per clock for more pipeline stages (31 in the case of later Pentium D processors) in pursuit of higher clock speeds. As we all know by now, this design decision ultimately came back to haunt Intel when the Pentium had trouble scaling to higher clock speeds…
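The trade-off between work per clock and clock speed can be summarized with the textbook relation below. This is a general simplification (IPC = instructions completed per clock, f = clock frequency), not a formula published by Intel:

```latex
\text{Performance} \;\propto\; \text{IPC} \times f
\qquad\Longrightarrow\qquad
\frac{\text{Perf}_{\text{Core 2}}}{\text{Perf}_{\text{Pentium D}}}
  = \frac{\text{IPC}_{\text{Core 2}}}{\text{IPC}_{\text{Pentium D}}}
    \cdot \frac{f_{\text{Core 2}}}{f_{\text{Pentium D}}}
```

A deep pipeline lets f rise but tends to lower IPC (for example, through costlier branch mispredictions), so it only pays off if clock speed keeps climbing faster than IPC falls. When Pentium clocks stalled, that bet stopped working.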
For increased efficiency, Core 2 used a single, unified L2 cache shared by both cores. Intel also added more advanced prefetchers to the L1 and L2 caches, along with new prefetch algorithms, to help hide memory latency and thus improve the effectiveness of the L2 cache. To further spice up the package, Core 2 also boasted improved performance on SSE, SSE2, and SSE3 instructions.
As a result of all these changes, Core 2 was not only considerably faster than Intel’s previous Pentium processors, it also significantly outperformed AMD’s fastest Athlon 64 X2 and FX processors, all while consuming relatively little power. It truly was a breakthrough product that shook up the entire PC industry.
And now it’s time for Intel’s engineers to give Core 2 its midlife upgrade – just in time for the company to put a damper on AMD’s upcoming quad-core Phenom launch…
Introducing Penryn: The Next Generation of Core 2 Processors
As you probably know by now, Penryn is Intel’s family of processors built on its new 45-nm manufacturing process. The smaller process allows Intel to cram more transistors into the processor’s die without significantly increasing its size. According to Intel, the new 45-nm high-k process doubles the transistor budget, allowing the company to add performance-enhancing features such as larger L2 caches while still delivering a cost-effective die size. For example, a dual-core Penryn chip packs 410 million transistors into a 107mm² die; in comparison, today’s dual-core Core 2 chips cram 291 million transistors into a 143mm² die.
Of course, the other appeal of the smaller process to enthusiasts who overclock is lower power: Intel cites a 30% reduction in transistor switching power between 65-nm and 45-nm. Lower power requirements also mean less heat generated by the CPU, resulting in a cooler-running PC.
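A quick density calculation puts the die figures above in perspective. This short Python sketch simply divides the quoted transistor counts by the quoted die areas; it assumes both figures refer to the dual-core parts, as stated in the text.

```python
# Transistor density from the die figures quoted above (dual-core parts).

def density_mtr_per_mm2(transistors_millions: float, die_area_mm2: float) -> float:
    """Return transistor density in millions of transistors per mm²."""
    return transistors_millions / die_area_mm2

penryn = density_mtr_per_mm2(410, 107)   # 45-nm dual-core Penryn
conroe = density_mtr_per_mm2(291, 143)   # 65-nm dual-core Core 2

print(f"Penryn: {penryn:.2f} M/mm²")           # Penryn: 3.83 M/mm²
print(f"Core 2: {conroe:.2f} M/mm²")           # Core 2: 2.03 M/mm²
print(f"Density ratio: {penryn / conroe:.2f}x")  # Density ratio: 1.88x
```

A ratio of roughly 1.9x is in line with Intel’s “twice the transistor budget” claim for the 45-nm process.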
Penryn is more than just a die shrink, though. Intel has incorporated a number of architectural enhancements into Penryn that are designed to deliver higher clock-for-clock performance than today’s Core 2 CPUs.
Fast Radix-16 divider: One key new technology Intel has incorporated into Penryn is its Fast Radix-16 divider. Intel says this new division technique doubles divider speed over previous processors for both floating-point and integer operations, processing 4 quotient bits per cycle in Penryn versus 2 bits per cycle in today’s processors.
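To see why retiring 4 bits per step instead of 2 halves the work, here is a minimal Python sketch of digit-recurrence (long) division in radix 2^k. It is a simplified restoring version written for clarity: real hardware dividers typically use SRT-style redundant digit selection with lookup tables, and Intel has not detailed Penryn’s implementation here. The function name `digit_recurrence_divide` is hypothetical.

```python
# Simplified digit-recurrence division: each iteration retires `bits_per_step`
# quotient bits (radix-4 -> 2 bits/step, radix-16 -> 4 bits/step).
# Illustrative sketch only; not Intel's actual divider design.

def digit_recurrence_divide(dividend: int, divisor: int,
                            bits_per_step: int, width: int = 32):
    """Unsigned division of a `width`-bit dividend.

    Returns (quotient, remainder, iterations).
    """
    if divisor <= 0:
        raise ValueError("divisor must be positive")
    if not 0 <= dividend < (1 << width):
        raise ValueError("dividend does not fit in width bits")
    if width % bits_per_step:
        raise ValueError("width must be a multiple of bits_per_step")

    radix = 1 << bits_per_step
    quotient = remainder = iterations = 0

    # Consume the dividend one radix-r digit at a time, most significant first.
    for shift in range(width - bits_per_step, -1, -bits_per_step):
        digit = (dividend >> shift) & (radix - 1)
        remainder = (remainder << bits_per_step) | digit
        # Largest quotient digit q in [0, radix - 1] with q * divisor <= remainder.
        # The invariant remainder < divisor * radix guarantees q fits in one digit.
        q = radix - 1
        while q * divisor > remainder:
            q -= 1
        remainder -= q * divisor
        quotient = (quotient << bits_per_step) | q
        iterations += 1

    return quotient, remainder, iterations


for bits in (2, 4):  # radix-4 vs radix-16
    q, r, n = digit_recurrence_divide(1_000_000, 7, bits)
    print(f"radix-{1 << bits}: quotient={q}, remainder={r}, iterations={n}")
```

For a 32-bit dividend, the radix-4 loop needs 16 iterations while the radix-16 loop needs 8; the catch in hardware is that choosing each of 16 possible digits per step is harder than choosing among 4, which is where the engineering effort goes.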
SSE4: Penryn will also support Intel’s new SSE4 instruction set (specifically the 47-instruction SSE4.1 subset). The majority of the new instructions are focused on compiler optimizations, but Intel has also added a number of “application targeted accelerators” that are hard-coded onto the processor’s die to improve performance in gaming, video encoding, 3D rendering, and photo imaging apps (provided that the software has been coded to use the new instructions, of course).
Super Shuffle Engine: Penryn incorporates a 128-bit-wide, single-pass shuffle unit, allowing it to perform full-width shuffles in a single cycle. The new shuffle unit will also improve Penryn’s performance on SSE2, SSE3, and SSE4 instructions that perform shuffle-like operations.
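To make “full-width shuffle” concrete, here is a hypothetical scalar model of a 128-bit byte shuffle following the semantics of the PSHUFB instruction (introduced with SSSE3). It illustrates the kind of operation a shuffle unit performs; which specific instructions gain how much on Penryn is not detailed in Intel’s description above.

```python
def pshufb(src: bytes, mask: bytes) -> bytes:
    """Scalar model of a 128-bit PSHUFB byte shuffle.

    For each output byte i: if bit 7 of mask[i] is set, the result is 0;
    otherwise it is src[mask[i] & 0x0F]. All 16 lanes are produced together,
    which is what a full-width, single-pass shuffle unit does in hardware.
    """
    if len(src) != 16 or len(mask) != 16:
        raise ValueError("both operands must be 16 bytes (128 bits)")
    return bytes(0 if m & 0x80 else src[m & 0x0F] for m in mask)


data = bytes(range(16))
reverse_mask = bytes(range(15, -1, -1))
print(pshufb(data, reverse_mask) == bytes(reversed(data)))   # True
```

A unit that can only move 64 bits per pass has to split an operation like this into more than one step; a 128-bit single-pass unit handles all 16 byte lanes at once.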
Improved Virtualization: Penryn also features Intel’s enhanced virtualization technology. Intel claims virtual machine transition times have been improved by 25 to 75 percent with Penryn.
Larger L2 cache: Penryn processors will feature a considerably larger, more associative L2 cache (24-way versus 16-way). Dual-core Penryn CPUs will ship with up to 6MB of L2 cache, while quad-core processors will contain up to 12MB. In comparison, today’s dual-core Core 2 CPUs ship with up to 4MB of L2 cache, while quad-core chips contain up to 8MB.
These larger caches help improve performance by increasing the probability that each execution core can access data from the processor’s L2 cache rather than having to get it from slower system memory.
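The relationship between capacity and associativity can be checked with basic set-associative cache arithmetic. This Python sketch assumes 64-byte cache lines and the commonly reported geometries of 6MB/24-way for dual-core Penryn and 4MB/16-way for current dual-core Core 2; `cache_sets` and `set_index` are illustrative helpers, not any real API.

```python
MB = 1024 * 1024
LINE_BYTES = 64  # assumed cache line size


def cache_sets(size_bytes: int, ways: int, line_bytes: int = LINE_BYTES) -> int:
    """Number of sets in a set-associative cache: size / (ways * line size)."""
    if size_bytes % (ways * line_bytes):
        raise ValueError("size must be a multiple of ways * line size")
    return size_bytes // (ways * line_bytes)


def set_index(address: int, num_sets: int, line_bytes: int = LINE_BYTES) -> int:
    """Which set a physical address maps to (line number modulo set count)."""
    return (address // line_bytes) % num_sets


penryn_sets = cache_sets(6 * MB, ways=24)   # assumed Penryn geometry
core2_sets = cache_sets(4 * MB, ways=16)    # assumed current Core 2 geometry
print(penryn_sets, core2_sets)              # 4096 4096
```

Under these assumptions both caches have the same number of sets, so a given address lands in the same set on either chip; Penryn’s extra capacity comes from holding 24 lines per set instead of 16, which reduces the chance that useful data is evicted by other lines competing for the same set.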
Faster Clock Speeds/FSB: Intel has already bumped the front-side bus (FSB) speed up to 1333MHz; Penryn will crank it up another notch, ultimately scaling all the way to 1600MHz. Penryn CPUs will also boast higher clock speeds, with 3.0GHz and up expected.
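What the FSB numbers mean in bandwidth terms is simple arithmetic. Intel’s FSB is 64 bits (8 bytes) wide and quad-pumped, so a “1333MHz” bus performs about 1333 million transfers per second on a roughly 333MHz base clock, and the core clock is that base clock times the CPU multiplier. The sketch below computes theoretical peak figures only; the 9x multiplier is a hypothetical example chosen to land at 3.0GHz.

```python
BUS_WIDTH_BYTES = 8  # 64-bit front-side bus


def fsb_bandwidth_gb_s(transfers_mt_s: float) -> float:
    """Theoretical peak FSB bandwidth in GB/s (decimal) for a rate in MT/s."""
    return transfers_mt_s * 1e6 * BUS_WIDTH_BYTES / 1e9


def core_clock_mhz(transfers_mt_s: float, multiplier: float) -> float:
    """Core clock = base clock x multiplier; quad-pumped, so base = rate / 4."""
    return transfers_mt_s / 4 * multiplier


for rate in (1333.33, 1600.0):
    print(f"{rate:.0f} MT/s FSB -> {fsb_bandwidth_gb_s(rate):.1f} GB/s peak")

# Hypothetical example: a 9x multiplier on a 1333 MT/s bus gives ~3.0GHz.
print(f"{core_clock_mhz(1333.33, 9):.0f} MHz")
```

The move from 1333MHz to 1600MHz raises peak bus bandwidth from about 10.7GB/s to 12.8GB/s, which matters most for quad-core chips where four cores share the same bus.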
In the coming months, Intel will be introducing several Penryn derivatives for the mobile, server, and desktop segments of the PC market. Today we’re going to be focusing on Intel’s latest enthusiast CPU for the desktop segment: the Core 2 Extreme QX9650.