FiringSquad: Home of the Hardcore Gamer - Games, Hardware, Reviews and News

  
Intel Core i7 (Nehalem) Performance Preview
November 02, 2008   Brandon Sandman Bell
Nehalem Architecture





Fundamentally, Nehalem is designed to be scalable. In Core i7 form, the chip has four processing cores, a triple-channel memory controller, a bi-directional QuickPath Interconnect (QPI) delivering up to 25.6GB/sec of bandwidth (12.8GB/sec in each direction), and 8MB of L3 cache. Server variants of Nehalem could have more cores, a larger L3 cache, and more QPI links (desktop chips feature one link), while mobile variants could have fewer cores with less cache and a dual-channel (rather than triple-channel) memory controller. Intel has indicated that it will even add graphics to the equation at some point, moving yet another feature off the system chipset and onto the CPU itself.
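The QPI figures quoted above follow from simple arithmetic. The sketch below assumes a link running at 6.4 GT/sec and carrying 2 bytes of data per transfer in each direction; neither number appears in this article, so treat both as assumptions chosen to match the quoted peak.

```python
def qpi_bandwidth_gb_per_s(giga_transfers_per_s: float, bytes_per_transfer: int = 2):
    """Return (per-direction, total) peak bandwidth in GB/sec for one QPI link.

    Both parameters are assumptions for illustration, not figures from the article.
    """
    per_direction = giga_transfers_per_s * bytes_per_transfer
    total = 2 * per_direction  # the link is bi-directional
    return per_direction, total


per_direction, total = qpi_bandwidth_gb_per_s(6.4)
print(f"{per_direction:.1f} GB/sec each way, {total:.1f} GB/sec total")
```

With those assumed inputs the script prints 12.8 GB/sec each way and 25.6 GB/sec total, matching the numbers quoted for Core i7.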

This modular design also helps to reduce power consumption: blocks like the memory controller and QPI run at voltages independent of each other.

Intel has incorporated a number of improvements into Nehalem that are designed to improve IPC. For instance, the number of micro-ops (microinstructions) in flight has increased from 96 in Conroe/Penryn to 128 in Nehalem. Intel also increased the size of the load and store buffers to ensure that they wouldn’t become a limiting factor.

Intel also improved Nehalem’s branch prediction. A new second-level branch target buffer has been added to improve branch prediction in applications that have large footprints, such as databases. This second-level predictor has a much larger history table, which should allow it to predict branches more accurately than the first-level predictor. Intel has also added a new renamed return stack buffer (RSB). RSBs store forward and return pointers associated with call and return instructions. The renamed RSB should help Nehalem avoid return instruction mispredictions.
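To make the idea of a return stack buffer concrete, here is a toy model in Python. It is a sketch of the general technique only: the class name, the depth, and the overflow policy are all assumptions for illustration, and it does not model the renaming that Intel added in Nehalem.

```python
class ReturnStackBuffer:
    """Toy return-address predictor: a small, fixed-depth stack (hypothetical model)."""

    def __init__(self, depth: int = 16):  # depth is an illustrative assumption
        self.depth = depth
        self.stack = []

    def on_call(self, return_address: int) -> None:
        # A call pushes the address of the instruction that follows it.
        if len(self.stack) == self.depth:
            self.stack.pop(0)  # overflow: the oldest entry is lost
        self.stack.append(return_address)

    def predict_return(self):
        # A return pops the most recent entry; None means "no prediction".
        return self.stack.pop() if self.stack else None


rsb = ReturnStackBuffer(depth=2)
for address in (0x100, 0x200, 0x300):  # three nested calls, room for only two
    rsb.on_call(address)
predictions = [rsb.predict_return() for _ in range(3)]
print(predictions)
```

In this run the two innermost returns are predicted correctly, while the outermost one has been pushed out of the buffer and gets no prediction. That is the kind of return misprediction a better RSB is meant to avoid.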

With its faster synchronization primitives, Nehalem has also been tweaked to handle threaded software better.

Speaking of threading, with Nehalem we see the return of simultaneous multi-threading (Hyper-Threading). With Hyper-Threading, one processing core can run two threads at the same time. With four processing cores inside Core i7, the OS “sees” eight logical processors and can schedule eight threads on the CPU at once, effectively doubling the number of threads that Nehalem can run simultaneously compared with a conventional quad-core CPU.
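The relationship between physical cores and what the OS reports can be shown in a few lines of Python. The helper function name is hypothetical; `os.cpu_count()` is the standard library call that reports the number of logical CPUs on whatever machine runs the script.

```python
import os


def logical_processors(physical_cores: int, threads_per_core: int) -> int:
    """Number of processors the OS would see (hypothetical helper)."""
    return physical_cores * threads_per_core


# Core i7 as described in the article: four cores, two threads per core.
print(logical_processors(4, 2))

# Logical CPUs on the machine running this script (may be None if undetermined).
print(os.cpu_count())
```

On a Core i7 with Hyper-Threading enabled, both lines would be expected to report eight; on other hardware the second line will differ.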

Whereas Hyper-Threading (HT) never really took off on the Pentium 4, Intel feels that Nehalem has a distinct HT advantage thanks to its larger cache and greater memory bandwidth, both of which should allow it to deliver better HT performance. There are also more apps capable of taking advantage of HT than there were a few years ago. As you’ll see in our Lost Planet, Cinebench, and Valve benchmarks, Nehalem delivers a significant performance increase in HT-aware apps.

New cache subsystem

While Nehalem has the same 32KB instruction/32KB data L1 cache configuration as previous Core 2 CPUs, Intel has totally revamped the L2 cache and added a new L3 cache.

Nehalem’s L2 cache is much smaller than Penryn’s. Each core has its own 256KB L2 cache for handling data and instructions. While this is significantly less than on previous processors, Nehalem’s L2 has lower latency than its predecessors’.

In addition to the L1 and L2 caches, Nehalem, like AMD’s Phenom, also features an L3 cache that is shared across all the cores. Unlike Phenom’s exclusive L3, however, Nehalem’s L3 is inclusive. Intel feels that this inclusive architecture gives it an advantage over AMD, as an exclusive L3 doesn’t hold copies of the data in the lower-level L1 and L2 caches. As a result, if a data request misses in an exclusive L3 cache, each processor core must be snooped (searched) in case its L1 or L2 cache has the requested data. This increases latency and snoop traffic between the cores.

With Nehalem these snoops are unnecessary, as an L3 miss tells the CPU that the data doesn’t reside in any core’s L1 or L2. This helps to reduce latency, and thus improve performance, while also reducing power consumption.
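The difference between the two L3 policies on a miss can be sketched as follows. This is a deliberately simplified model, not a description of either vendor's coherence protocol: the function name is hypothetical, and it assumes the requesting core has already missed in its own L1 and L2, so only the other cores are candidates for a snoop.

```python
def cores_to_snoop_on_l3_miss(inclusive: bool, requesting_core: int, core_count: int = 4):
    """Return the cores whose private caches must be checked after an L3 miss."""
    if inclusive:
        # An inclusive L3 holds a copy of every line in any L1/L2,
        # so missing in L3 proves that no core has the line.
        return []
    # An exclusive L3 does not duplicate L1/L2 contents, so the line
    # may still sit in another core's private cache.
    return [core for core in range(core_count) if core != requesting_core]


print(cores_to_snoop_on_l3_miss(inclusive=True, requesting_core=0))
print(cores_to_snoop_on_l3_miss(inclusive=False, requesting_core=0))
```

On a four-core chip the inclusive case needs no snoops, while the exclusive case has to check the three other cores before going out to memory.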

Like its two-level branch prediction, Nehalem features a two-level translation lookaside buffer (TLB), with 512 entries in the second level. Nehalem is the first CPU to feature a second-level TLB. This is another improvement Intel has incorporated into Nehalem to improve its performance with server apps like large databases.
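A two-level TLB lookup can be illustrated with a small simulation. Everything here is a sketch under stated assumptions: the class name, the 64-entry first level, and the least-recently-used replacement policy are illustrative choices, and only the 512-entry second level comes from the text above.

```python
from collections import OrderedDict


class TwoLevelTLB:
    """Toy two-level TLB with LRU replacement (hypothetical model)."""

    def __init__(self, l1_entries: int = 64, l2_entries: int = 512):
        self.l1, self.l2 = OrderedDict(), OrderedDict()
        self.l1_entries, self.l2_entries = l1_entries, l2_entries

    @staticmethod
    def _insert(level, capacity, page, frame):
        level[page] = frame
        level.move_to_end(page)
        if len(level) > capacity:
            level.popitem(last=False)  # evict the least recently used entry

    def translate(self, page, page_table):
        if page in self.l1:  # fastest path
            self.l1.move_to_end(page)
            return self.l1[page], "L1 TLB hit"
        if page in self.l2:  # slower, but still avoids a page walk
            frame = self.l2[page]
            self.l2.move_to_end(page)
            self._insert(self.l1, self.l1_entries, page, frame)
            return frame, "L2 TLB hit"
        frame = page_table[page]  # slowest path: walk the page table
        self._insert(self.l2, self.l2_entries, page, frame)
        self._insert(self.l1, self.l1_entries, page, frame)
        return frame, "page walk"


tlb = TwoLevelTLB(l1_entries=1, l2_entries=4)  # tiny sizes to force evictions
page_table = {0: 10, 1: 11}
for page in (0, 1, 0, 0):
    print(page, tlb.translate(page, page_table))
```

The third access shows the benefit of the second level: page 0 has been evicted from the tiny first level, but it is still found in the second level, so no page walk is needed.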




SSE4

Nehalem is Intel’s first CPU to offer SSE4.2 support. Seven new application-targeted accelerator instructions have been added to the instruction set. Four of them improve performance in string and text processing operations; one example Intel provides is parsing XML files at a much higher speed. Two more are focused on accelerated searching and pattern recognition in large data sets (useful for voice/handwriting recognition), and the seventh is a CRC instruction aimed at communications workloads such as accelerated network attached storage.
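For a sense of what the CRC instruction accelerates, here is a plain software version of CRC-32C, the checksum based on the Castagnoli polynomial that the SSE4.2 `crc32` instruction computes. This is a bit-by-bit reference sketch in Python, not a use of the hardware instruction; the initial value and final inversion follow the common CRC-32C convention and are applied by software around the instruction rather than by the instruction itself.

```python
CRC32C_POLY_REFLECTED = 0x82F63B78  # Castagnoli polynomial in bit-reflected form


def crc32c(data: bytes) -> int:
    """Bitwise CRC-32C; slow, but easy to follow."""
    crc = 0xFFFFFFFF  # conventional initial value
    for byte in data:
        crc ^= byte
        for _ in range(8):  # process one bit at a time
            crc = (crc >> 1) ^ (CRC32C_POLY_REFLECTED if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF  # conventional final inversion


print(hex(crc32c(b"123456789")))  # standard check value: 0xe3069283
```

The inner loop does eight shift-and-XOR steps per byte. A hardware instruction that folds a byte or more into the running CRC in a single operation is what makes checksum-heavy storage and network workloads faster.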


© 1998-2009 FS Media, Inc. All Rights Reserved