Eighth-Generation Micro Architecture
There are many similarities between the Athlon XP we know today and the Opteron that is being introduced. At the same time, the passing of time has necessitated certain modifications to ensure that the Opteron and, later this year, the Athlon 64 are able to remain competitive. The first design change borrows a technique Intel employed to enhance the scalability of the Pentium 4. Namely, AMD has added two stages to its pipeline, resulting in a 12-stage integer and a 17-stage floating-point pipeline.
As we learned with the Pentium 4, a longer pipeline is a boon when it comes to increasing operating frequency, but it penalizes the number of instructions a processor can successfully execute in a clock cycle (IPC). AMD is fully aware of these ramifications and, like Intel, has taken measures to compensate. In fact, AMD claims it will be able to enhance IPC beyond what we’re currently seeing with the Athlon XP family.
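The trade-off described above can be put into rough numbers. The following is a minimal sketch, assuming throughput is simply IPC multiplied by clock frequency; the function names and the sample figures are hypothetical and are not measurements of any AMD or Intel processor.

```python
def instructions_per_second(ipc: float, clock_hz: float) -> float:
    """Simplified throughput model: instructions per clock times clocks per second."""
    return ipc * clock_hz


def break_even_clock_hz(old_ipc: float, old_clock_hz: float, new_ipc: float) -> float:
    """Clock a deeper-pipeline design needs in order to match the older design's throughput."""
    return instructions_per_second(old_ipc, old_clock_hz) / new_ipc


if __name__ == "__main__":
    # Hypothetical figures: a shorter pipeline at 2.0 GHz with an IPC of 1.0,
    # against a deeper pipeline whose IPC has slipped to 0.8.
    needed = break_even_clock_hz(old_ipc=1.0, old_clock_hz=2.0e9, new_ipc=0.8)
    print(f"Break-even clock: {needed / 1e9:.2f} GHz")
```

In this simplified model, any frequency headroom beyond the break-even clock is a net gain, which is why a longer pipeline only pays off if IPC losses are kept small.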
Opteron execution units
Opteron block diagram
Integrated DDR Memory Controller
Another step AMD has taken to further improve operating frequency is retaining its 0.13-micron manufacturing process and adding Silicon on Insulator technology, which allows AMD to reduce transistor capacitance by roughly 25 percent while adding a significant number of new transistors.
On the flip side, AMD is looking to increase IPC by moving the platform’s memory controller away from its traditional residence, the North Bridge or Memory Controller Hub, depending on whose architecture you follow, and onto the processor die itself. It’s no secret that memory bandwidth has become a pivotal statistic in describing the capabilities of a graphics card, and it has become increasingly important as processors have matured, too.
The Opteron’s memory controller is of the dual-channel DDR variety, resulting in a 128-bit interface with support for DDR200, DDR266, and DDR333 memory. In a single-processor system, DDR333 is able to provide up to 5.3GB per second of bandwidth. However, in a dual-processor machine like the ones we had the opportunity to test, the platform’s effective bandwidth is doubled to 10.6GB per second, because each processor brings its own memory controller. The unfortunate consequence is that in a dual-channel system, four memory slots need to be populated in order to realize its full bandwidth potential. According to AMD, as the processor’s operating frequency scales upward, the latencies incurred by memory accesses continue to drop as a result of the on-die controller.
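The bandwidth figures above follow from the bus width and the memory's transfer rate. Here is a minimal sketch of that arithmetic, assuming the nominal DDR transfer rates (200, 266.67, and 333.33 million transfers per second), 1GB counted as 10^9 bytes, and one independent memory controller per processor; the function and constant names are ours, for illustration only.

```python
# Nominal transfer rates in transfers per second (assumed standard DDR values).
DDR_TRANSFER_RATES = {
    "DDR200": 200.00e6,
    "DDR266": 266.67e6,
    "DDR333": 333.33e6,
}

BUS_WIDTH_BITS = 128  # two 64-bit channels, per the dual-channel controller


def peak_bandwidth_gb_s(transfers_per_second: float,
                        bus_width_bits: int = BUS_WIDTH_BITS,
                        processors: int = 1) -> float:
    """Theoretical peak bandwidth in GB/s (1GB = 10**9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    return processors * transfers_per_second * bytes_per_transfer / 1e9


if __name__ == "__main__":
    for name, rate in DDR_TRANSFER_RATES.items():
        single = peak_bandwidth_gb_s(rate)
        dual = peak_bandwidth_gb_s(rate, processors=2)
        print(f"{name}: {single:.2f} GB/s per processor, {dual:.2f} GB/s for two")
```

These are theoretical peaks. The two-processor DDR333 result comes out slightly above the 10.6GB per second quoted above, which appears to be the rounded single-processor figure doubled.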