
The CPU and GPU engaged for marriage
by: Power666 (25) | Posted in cluster FiringSquad Editors Challenge Round 1 Prelim 2 | Posted 16 months ago in category DEFAULT
On October 25, 2006, AMD announced that they had completed their acquisition of graphics chip developer ATI. That same day, they also announced their plan to bring a CPU and a GPU together on one chip, a project called Fusion. (1) So while AMD made a big news splash with the announcement, the direction hardware developers were moving in guaranteed this marriage at some point in time.
----
It is inevitable, Mr. Anderson
----
Computer history has a trend of individual components being put together on one chip to reduce costs. For example, the south bridge chip inside a typical PC today handles basic IO functions like USB, S-ATA, IDE and PCI. All the functionality in the south bridge chip today was once provided by independent specialized chips. Integrating these controllers into the south bridge reduces costs, reduces power consumption and simplifies the board layout. At the same time, performance can be improved inside the south bridge by giving each of these controllers its own dedicated bandwidth internally, whereas as external chips they would have to share it.
While there are benefits to merging chips together, some thought has to be given to where in a system's design this happens. Low end systems have integrated video and, as the term suggests, it is part of another chip in the system. In particular, integrated video solutions are part of the north bridge, the chipset component that nowadays is often responsible for memory controllers, PCI-E controllers and onboard networking. High end graphics cards sit on the PCI-E bus connected to the system's north bridge.
So why are companies like AMD and Intel looking to join a GPU and a CPU directly and bypass the north bridge completely? The answer is that the major function of the north bridge, the memory controller, is being integrated into the CPU as well. AMD users have had the benefit of this for several years and Intel is expected to integrate a memory controller into their PC processors in the future. Putting the GPU on the same chip as the CPU also has the performance enhancing effect of lowering latency and increasing bandwidth between the two.
----
AMD Fusion
----
Since AMD announced the Fusion project they have been rather silent on details. However, some educated guesses can be made of how AMD will deploy and implement the technology.
AMD has been doing a rather excellent job with their CPU's gaining market share on desktop PC's. While AMD has also been gaining market share in the laptop area, their portion of the laptop market trails far behind that of their desktop sales. (2) The importance of this is highlighted by laptop sales surpassing desktops for the past several years. Fusion is the type of breakthrough technology AMD needs to rapidly expand in the laptop market.
The important benefits of merging a CPU and GPU for the laptop market are space, power, cost and performance. Space in a laptop is a scarce resource and a hybrid CPU/GPU puts two of the largest components into a single point on the motherboard. An increase in power consumption of a Fusion processor over a regular mobile Turion 64 or mobile Radeon is to be expected but Fusion should run cooler than the Turion 64 and Radeon combined. The cost of a Fusion chip is likely to be a tad higher than a plain Turion 64 but less than the total price of a Turion 64 and a Radeon GPU. Laptop manufacturers will get a cost benefit from an easier to implement system design. Additionally, laptop manufacturers will be able to reduce the size of a laptop relative to products offering similar CPU and graphics performance.
The changes to the Athlon 64 and the Radeon GPU for the first generation of Fusion will deal more with linking the two components into one monolith than with changing how each functions individually. Many of the functional units inside of Athlon 64's can be found in a modern GPU. The obvious step here is to start sharing duplicated resources like the on-die memory controller and the HyperTransport I/O bus. Two things laptop manufacturers may be asking are 'how do I plug the darn thing in?' and 'how do I hook up a display to it?', neither of which AMD has answered. AMD could easily move to a new socket for Fusion processors with a direct path for display outputs. Socket compatibility in the laptop market is not as critical as it is for desktops. AMD could also mandate that display connectivity be implemented in the chipset. With AMD now a serious player in the chipset business after purchasing ATI, they could provide such a chipset themselves without depending on third parties. Removing the logic for DVI and VGA outputs from the hybrid CPU/GPU will help keep the size of the chip under control. Regardless, the Fusion processor is going to be a rather large chip to manufacture.
The second generation of Fusion processors will likely move towards greater performance and make an arrival on desktop systems. AMD has the socket AM3 on the horizon. The big change for socket AM3 is DDR3 support for Athlon processors, but AMD can easily change the spec now to add direct video output. Still the first socket AM3 processors are not likely to be Fusion based, but making such a change at this moment would not significantly disrupt product time tables.
One critical flaw in the Fusion design that enthusiasts can easily spot is the overall memory bandwidth. Memory bandwidth is the limiting factor in performance for graphics cards and Fusion could easily be lacking in this area. Dual channel DDR2-800 provides only 12.8 GB/sec of bandwidth, which is the same as the Radeon X1300 Pro. While Radeon X1300 Pro performance is going to be adequate for mundane business tasks and handling Vista's Aero Glass interface, it isn't going to be enough for gaming. ATI does have experience with building a graphics powerhouse for gaming around limited main memory bandwidth. The XBox 360's graphics chip designed by ATI only has 22.4 GB/sec of main memory bandwidth, but it is supplemented by the eDRAM's 256 GB/sec of bandwidth for common functions. Putting eDRAM alongside a Fusion processor would make it a rather formidable gaming solution. Additionally, the eDRAM may be used as an L3 cache for the CPU when doing tasks that are not graphics intensive. So while AMD has the technology, performance estimates of Fusion are futile until the company releases more detailed information.
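The bandwidth figures above follow from simple arithmetic: peak bandwidth is the transfer rate times the bus width times the number of channels. Here is a minimal sketch of that calculation. The function name is made up for illustration, the 64-bit width per DDR2 channel is the standard DIMM width, and the 128-bit, 1400 MT/sec figures for the XBox 360 are the commonly quoted specs rather than anything stated in this article.

```python
def peak_bandwidth_gb_s(mega_transfers_per_sec, bus_width_bits, channels=1):
    """Theoretical peak memory bandwidth in GB/sec (1 GB = 10^9 bytes).

    Hypothetical helper for illustration only. Real sustained bandwidth
    is lower than this peak figure.
    """
    bytes_per_transfer = bus_width_bits / 8
    bytes_per_sec = mega_transfers_per_sec * 1e6 * bytes_per_transfer * channels
    return bytes_per_sec / 1e9


# Dual channel DDR2-800: 800 MT/sec on two 64-bit channels.
ddr2_800_dual = peak_bandwidth_gb_s(800, 64, channels=2)

# XBox 360 main memory, ASSUMING the commonly quoted 128-bit bus
# at 1400 MT/sec (not stated in the article itself).
xbox_360_main = peak_bandwidth_gb_s(1400, 128)

print(f"Dual channel DDR2-800: {ddr2_800_dual:.1f} GB/sec")
print(f"XBox 360 main memory:  {xbox_360_main:.1f} GB/sec")
```

Under those assumptions the two results come out at the 12.8 GB/sec and 22.4 GB/sec quoted above, which puts the eDRAM's 256 GB/sec at roughly twenty times what dual channel DDR2-800 can deliver.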
The other enthusiast concern is Crossfire. How will multiple GPU's be linked together when the GPU's are found inside CPU's? The easiest way is to use what the CPU's provide for I/O traffic - HyperTransport. AMD currently links the two Athlon FX processors in their new 4x4 platform via HyperTransport. It will require multiple sockets like those found on 4x4 systems for Crossfire to work. With the first iterations of Fusion likely to be laptop centric, Crossfire support won't find its way into Fusion until its desktop release later on.
----
Would you like shaders or CPU's?
----
Beyond what AMD and Intel are expected to release initially for a hybrid CPU/GPU, there is an opportunity for further integration to the point of a hybrid execution core. Such a radical change to both CPU and GPU architecture takes several years to design and validate. While it may very well take years for such an implementation to arrive in consumers' hands, it is worth exploring the gritty technical details for further improvements in performance, reduced power, and cost savings.
Modern CPU's incorporate vector capabilities like SSE instructions on x86 chips or AltiVec on PowerPC chips. GPU's are built around fast vector units inside their programmable shaders for doing state of the art graphical effects. So why not use the same vector units for commodity CPU work and heavy GPU loads? In other words, why not make the hardware behind the CPU's vector units and the GPU's shaders the same thing? GPU's are quickly becoming the equals of CPU's in terms of programmability for this type of work. There are several barriers that prevent this at the hardware level. The first is that the amount of vector work a GPU does easily dwarfs that of a CPU, but this can just as easily be solved by adding more vector units to a hybrid CPU/GPU core.
The bigger, more looming barrier is that the machine level instructions are radically different. Programs compiled for a CPU and those compiled for a GPU are in different languages at the hardware level. This is the same barrier that prevents x86 programs from running on PowerPC chips. Thankfully, CPU designers already have some experience in translating code.
The Pentium 4 is a prime example of how translating machine level instructions would work with this level of CPU/GPU hybrid. x86 instructions are decoded into a different, more efficient language for internal use only inside of the Pentium 4. The Pentium 4 line is a bit unique as the decoded instructions in its own native language get stored in the L1 instruction cache for later use. Other processors are required to decode an instruction every time they encounter it in the program. The language of the decoded instructions is completely hidden away from the programmer and compilers. It is through this type of abstraction that a hybrid CPU/GPU may be able to share a core design. Once instructions reach the L1 cache, the vector units doing the work won't be able to tell the difference between a calculation done for the CPU or the GPU. From a programming standpoint, the same differences between CPU code and GPU code will still exist as they do today. CPU programs will work without modification. GPU functionality will be exposed by DirectX and OpenGL calls.
A hybrid CPU/GPU chip using the same core for CPU and GPU functionality will inherently offer a large number of CPU cores and GPU shaders. High end designs right now offer upwards of 128 shaders in the GeForce 8800 GTX. A 128 core CPU may seem like overkill today, but it is on the horizon for the server market. A hybrid CPU/GPU of this nature would be able to function as a high end server CPU or a high end GPU. Most importantly for gamers, the number of available CPU cores vs. shaders may be dynamically changed. Got a game that is optimized for only two threads? Lower the number of available CPU cores and increase shader performance for a better gaming experience. This level of performance allocation may reach the point of complete transparency for the user with proper hardware, software and operating system support.
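To make the idea of shifting units between CPU and shader duty concrete, here is a toy sketch of one possible allocation policy. Nothing like this has been announced by AMD or Intel. The function name, the one-core-per-thread rule and the 128 unit total are all assumptions made purely for illustration.

```python
def split_units(total_units, game_threads):
    """Divide identical execution units between CPU and shader duty.

    Purely hypothetical policy: give the game one CPU core per software
    thread (at least one, so the system can still run), and hand every
    remaining unit to the GPU side as a shader.
    """
    if total_units < 1:
        raise ValueError("need at least one execution unit")
    cpu_cores = min(total_units, max(1, game_threads))
    shaders = total_units - cpu_cores
    return cpu_cores, shaders


# A game optimized for only two threads on an assumed 128 unit chip.
cpu_cores, shaders = split_units(128, 2)
print(f"{cpu_cores} CPU cores, {shaders} shaders")
```

In this sketch a two thread game leaves 126 of the 128 units free for shader work, while a heavily threaded server load would claim most or all of them as CPU cores. A real implementation would need the operating system and drivers to agree on the split, which is the "complete transparency" problem mentioned above.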
----
The empire to strike back?
----
If you were to ask a PC gamer who provides the technology behind their graphics card, you'll likely hear about nVidia or ATI. Yet neither of those companies is behind the majority of graphics found inside PC's today. While horrible for gaming, Intel's integrated graphics solutions dominate the market in terms of total sales. For the business world, integrated graphics are acceptable solutions for looking at spreadsheets all day long.
Intel hasn't publicly made any announcement about a hybrid CPU/GPU in press releases or public road maps. However, Intel has demonstrated some technology that could be used to move rapidly in that direction. The biggest indication that Intel is considering such a product was a hybrid CPU/GPU product shown at the August 2005 Intel Developer Forum. (3) A Pentium M 738, an 855GM chipset with integrated video and a dedicated voltage regulator were displayed in one multi-chip module for a single socket. The demonstration was meant to show power and space savings, so those aspects of integration are not lost on Intel.
The other technology demonstration which may lead to a hybrid CPU/GPU at Intel is their massive 80 core research project. Intel publicly showed off a prototype 80 core monster at this year's ISSCC in February. (4) The performance results were rather impressive for the 80 CPU's. The chip was able to produce over a trillion floating point operations per second (teraFLOP), but that is only equivalent to two Radeon X1900XTX's. (5) By the time the research project results in products consumers can put into their systems, dedicated GPU's will have long surpassed the teraFLOP mark. Regardless, Intel has clearly shown they have the engineering capabilities to produce something competitive in their labs. It is now up to them to release that technology into the wild in the future.
----
The best is yet to come
----
The merger of a CPU and a GPU is a major step towards the ultimate goal of a system-on-a-chip (SoC). Other computer components like hard drive controllers, networking, audio processing, USB controllers and even main memory will eventually find their way on to the same piece of silicon. The result will be a product that has the same capabilities and processing power as a PC of today in a cell phone or a set-top box sitting in your living room. Before that happens, the joining of a CPU and GPU will combine the two largest, most expensive components found in computers today into one. When the marriage is finalized, the honeymoon will usher in a new wave of compact, energy efficient, yet powerful devices.
References:
(1) AMD's "Fusion" processor to merge CPU and GPU
http://www.tgdaily.com/2006/10/25/amd_announces_fusion_processor/
(2) Intel slows Advanced Micro Devices gains in server chips
http://www.marketwatch.com/news/story/intel-slows-advanced-micro-devices/story.aspx?guid=%7BD471BF94%2D388F%2D4081%2DB1D1%2D6A37EF8DBA5C%7D
(3) IDF Fall 2005. Day 3 (page 4)
http://www.xbitlabs.com/articles/editorial/display/idf-f2005-4_4.html
(4) Intel shows off 80-core processor
http://news.com.com/Intel+shows+off+80-core+processor/2100-1006_3-6158181.html
(5) AMD launches new flagship graphics chip family X1900
http://www.tgdaily.com/2006/01/24/ati_launches_x1900/
9 User Comment(s) • 5 root comment(s)
GrapeApe (36) Mar 05, 2007 - 12:33 am
I was planning on writing a similar article, which makes me a little more critical than most I'd think, but my rating would be higher than your average so maybe not.
There are some factual inaccuracies, and there's little mention of the nV, VIA future in this segment. Both companies are facing tough realities in this segment, and expanding on nV's and VIA's options would've added more to the article IMO.
Also you needed to differentiate the limited system on a chip solutions from Intel and AMD's future modular designs, which aren't about system on a chip so much as a chip that can do a bit of everything, and then multiplying those chips to do more complex tasks.
Also mentioning IBM developments with eDRAM would have helped with that segment in that the prices are becoming more realistic for larger sizes than that small 10MB currently found on the Xenos package.
Also interesting that TeraFlop potential was already passed with the QPlex by nV, and more recently by the R600.
Also some assumptions, like a Fusion chip running cooler overall, kinda miss the problem of such a chip: the small surface area and large transistor count mean difficult dissipation issues for a laptop (it is easier to dissipate two 30W sources in two separate segments with a low rpm fan than to dissipate 55W of heat in the place of just one of those fan locations).
Overall though I agree, I'd rather see more technical reviews, and maybe with the right resources your research will get better, you seem to have the interest.
Power666 (25) Mar 05, 2007 - 10:38 am
I know I goofed big time with the date on the Fusion announcement. I'm curious if you found any other inaccuracies.
I couldn't find any solid information on what nVidia and Via are doing to counter AMD's Fusion project. While they are certainly going to be doing something, exactly what is pure conjecture and dives a bit too far into the realm of rumors than I'd like.
IBM's current SoC designs are limited to embedded markets which have different needs than the laptop/desktop market that Fusion is targeted for. Intel and AMD are moving towards a SoC design that is truly a jack-of-all-trades implementation. The goal for them is to create a single low power component for a single, general purpose user system like a laptop. I also see a divergence between consumer centric devices and server centric devices the more the companies inch towards complete SoC designs. Simply cutting features off a high end design to make a low end part isn't going to be practical as more server specific technologies get implemented on-die.
The big IBM breakthrough in eDRAM announced a few weeks ago isn't a revolutionary improvement in transistor density. Rather they're replacing transistor heavy SRAM with the inherently dense eDRAM. The three fold increase in cache size with eDRAM is due to eDRAM using less than a third the transistors SRAM uses. My mention of eDRAM was more about how that technology assists limited memory bandwidth. I don't see the actual capacity of the eDRAM being much larger than the frame buffer size needed for laptop displays. 1280 x 800 displays are currently common in low end notebooks and they wouldn't need more than 10 MB of eDRAM. 1920 x 1200 displays in laptops will likely be much more common when Fusion arrives and that'll only need 20 MB of eDRAM to hold the frame buffer and z-buffer. It will be the desktop versions of Fusion that will need massive amounts of eDRAM for gaming if the memory bus is lacking bandwidth. 32 MB barely holds the buffers for a 2560 x 1600 display and there will certainly be higher resolutions available when Fusion arrives on the desktop. Running multiple displays in 3D requires even more eDRAM. Varying the amount of eDRAM would be a simple way of distinguishing between the low end and high end versions of a chip just like L2 size does today.
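The eDRAM sizes above can be checked with a quick back-of-the-envelope calculation. This is a minimal sketch that assumes one 32-bit color buffer plus one 32-bit z-buffer per pixel and no anti-aliasing. Those assumptions, and the function name, are illustrative rather than anything AMD has specified.

```python
def buffer_mib(width, height, bytes_per_pixel=4, buffers=2):
    """Memory needed for the render buffers of one display, in MiB.

    ASSUMES one color buffer and one z-buffer (buffers=2), each at
    32 bits per pixel, with no anti-aliasing or extra back buffers.
    """
    return width * height * bytes_per_pixel * buffers / (1024 * 1024)


for width, height in [(1280, 800), (1920, 1200), (2560, 1600)]:
    size = buffer_mib(width, height)
    print(f"{width} x {height}: {size:.4f} MiB")
```

Under those assumptions the three resolutions need about 7.8 MiB, 17.6 MiB and 31.25 MiB, in line with the 10 MB, 20 MB and 32 MB figures. Anti-aliasing or an additional back buffer would push the requirement higher.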
I went with two Radeon X1950XTX's for the teraFLOP figure as they're products people can purchase today and something readers of this site are familiar with already. It would be kinda disappointing if the R600 chip didn't break ATI's old record but the public still has to wait a few months for it to arrive.
While a Fusion chip would dissipate more heat than just a pure mobile CPU, it will still be a laptop part and have to fall under the same thermal ceiling. Having a larger die also aids in heat transfer with surfaces emitting the same total amount of energy. This also allows for system builders to implement one larger, more efficient heat dissipation system than having to utilize two smaller ones for a CPU and a dedicated GPU.
