Summary: In many ways, Xbox 360's graphics processor goes well beyond anything we currently have available on the PC. In today's article, we go over all the details of ATI's architecture with ATI VP of Engineering, Bob Feldstein. Bob discusses everything ranging from the 48 pipes to ATI's "AA for free" comment. Catch all the details inside!
In order to glean more about the graphics inside Xbox 360 and its architecture, we recently had the chance to speak with Bob Feldstein, ATI's VP of Engineering on Xbox 360:
ATI: We have 48 shaders. And each shader, every cycle can do 4 floating-point operations, so that gives you 196. There's a 192 number in there too, so I'm just going to digress a little bit. The 192 is actually in our intelligent memory, every cycle we have 192 processors in our embedded intelligent memory that do things like z, alpha, stencil. So there are two different numbers and they're kind of close to each other, which leads to some confusion.
So we have a traditional shader, but it's not traditional at all though because it's a unified shader. So you have the shader instruction set. [pauses] In the past you had a vertex shader and a pixel shader, and the instruction set was different and you couldn't, you know, one couldn't operate on the other's data. Now we have one set of resources, these 48 shaders, and they naturally dynamically balance between whatever the problem at hand is.
So if it's dominated by vertices you get more resources for vertices, but if it's dominated by pixels you get more resources towards pixels, or any other kind of problem. It's a general purpose, well not a general purpose processor, but it is a processor with a good general instruction set and it can operate on a variety of different kinds of data. So unified shader means we have one set of shader hardware and it can operate on any problem.
So you have 64 threads, and it's all controlled by hardware so it's not like the programmer knows one way or another about threading at all, and the threads here are things like vertex buffers or pixel programs and the hardware just keeps the same [inaudible] in a thread buffer and we can just switch back and forth between the different threads. That way if we're waiting for data from a vertex program or vertex array we can go ahead and work on a pixel program or we can work on a second vertex or whatever, a different instruction.
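The thread switching described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not a description of the actual hardware: the `run` function, the `"alu"`/`"fetch"` operation names, the one-issue-per-cycle rule and the 4-cycle fetch latency are all hypothetical choices made for illustration.

```python
def run(threads, fetch_latency=4):
    """Toy scheduler: issue at most one op per cycle from any ready thread.

    threads: list of op lists, each op being "alu" or "fetch" (hypothetical names).
    A thread that issues a "fetch" must wait fetch_latency extra cycles
    before its next op; meanwhile the scheduler switches to another thread.
    Returns (total_cycles, cycles_in_which_an_op_was_issued).
    """
    pc = [0] * len(threads)        # next op index per thread
    ready_at = [0] * len(threads)  # first cycle each thread may issue again
    cycle = 0
    issued = 0
    while any(pc[i] < len(t) for i, t in enumerate(threads)):
        for i, t in enumerate(threads):
            if pc[i] < len(t) and ready_at[i] <= cycle:
                op = t[pc[i]]
                pc[i] += 1
                if op == "fetch":
                    ready_at[i] = cycle + 1 + fetch_latency
                issued += 1
                break  # only one op issues per cycle
        cycle += 1
    return cycle, issued


one_thread = run([["fetch", "alu"]])
five_threads = run([["fetch", "alu"] for _ in range(5)])
print(one_thread)    # a lone thread leaves the unit idle while it waits
print(five_threads)  # other threads fill the waiting cycles
```

With a single thread the unit sits idle during the fetch; with enough threads in flight, some other thread always has work ready, which is the behaviour the answer describes.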
ATI: So we have 48 shaders, each one of them does 4 floating-point ops per cycle, so 196 floating ops per clock.
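As a quick check on the arithmetic in this answer, the sketch below multiplies the two figures given in the interview. The constant and function names are made up for illustration. Note that the product comes out to 192, while the figure quoted in the transcript is 196.

```python
SHADER_UNITS = 48             # from the interview
FLOPS_PER_UNIT_PER_CLOCK = 4  # from the interview


def ops_per_clock(units, ops_per_unit):
    """Peak operations per clock if every unit is busy every cycle."""
    return units * ops_per_unit


peak = ops_per_clock(SHADER_UNITS, FLOPS_PER_UNIT_PER_CLOCK)
print(peak)  # 192
```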
The part of the pipeline that's in the daughter die, that lives with the memory is the z, the alpha, the stencil, the processors to do the resolve from multisample AA, the one that gives you AA for free, well 4x or 2x multisampling, and really what that does is it breaks one of the traditional bandwidth problems inside traditional graphics architectures. It keeps all that data so you don't have to do a read or a write. Well you don't have to read it into the general purpose, across the bus, process it and then write it back out to the memory. It all happens within the memory, and this just gives you phenomenally more bandwidth. And that's how we get the anti-aliasing for free. As for the result of the anti-aliasing, you have 4 samples for every pixel you ultimately end up with, and the resolve is done right within that memory, so it doesn't have to be read back into the parent die.
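To make the "resolve" step concrete, here is a minimal sketch of turning 4 samples into one final pixel. It assumes a plain average (box filter) of the samples; the interview does not say which filter the hardware uses, and the function name `resolve_pixel` is hypothetical.

```python
def resolve_pixel(samples):
    """Average a pixel's colour samples into one final colour.

    samples: list of (r, g, b) tuples, one per multisample position.
    Assumes a simple box filter; the real hardware filter is not
    described in the interview.
    """
    n = len(samples)
    channels = len(samples[0])
    return tuple(sum(s[c] for s in samples) / n for c in range(channels))


# A pixel on a polygon edge: two samples hit a white triangle, two miss it.
white, black = (1.0, 1.0, 1.0), (0.0, 0.0, 0.0)
edge_pixel = resolve_pixel([white, white, black, black])
print(edge_pixel)  # (0.5, 0.5, 0.5)
```

The edge pixel ends up mid-grey rather than fully white or fully black, which is what softens the jaggies.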
On the intelligent memory there's something that we call internally the, well it's a memory export path. So anywhere down the pipeline that you put data into the pipeline you can export that data out back to EDRAM or Level Two cache and then use that data again. So, and we really haven't been able to do that before, so it allows you to do things like higher order surfaces for real this time, so it's just another feature that gives us a lot of flexibility in the system.
FiringSquad: So the daughter die is where the embedded memory is?
ATI: Yes, the memory is there. There's 10MB of embedded memory surrounded by "intelligence". That's the 192 component processors in there that do the work on the embedded memory data. There's also a very high speed connection between the parent and daughter die, in case you do have to get data back and forth, and that connection is, well there's a 2GHz wide bus connection between them.
ATI: The 2-terabit (256GB/sec) number comes from within the EDRAM, that's the kind of bandwidth inside that RAM, inside the chip, the daughter die. But between the parent and daughter die there's a 236Gbit connection on a bus that's running in excess of 2GHz. It has more than one bit obviously between them.
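The bit and byte figures in this answer are easy to mix up, so here is a small conversion sketch. It assumes 1 terabit is counted as 1024 gigabits, since that is the convention that reproduces the 256GB/sec figure quoted above; the function and variable names are made up for illustration.

```python
BITS_PER_BYTE = 8
GBIT_PER_TBIT = 1024  # assumption: binary prefix, which reproduces 256GB/sec


def gbit_to_gbyte(gbit_per_sec):
    """Convert gigabits per second to gigabytes per second."""
    return gbit_per_sec / BITS_PER_BYTE


edram_internal = gbit_to_gbyte(2 * GBIT_PER_TBIT)  # the 2-terabit figure
die_to_die = gbit_to_gbyte(236)                    # the 236Gbit figure as quoted
print(edram_internal)  # 256.0
print(die_to_die)      # 29.5
```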
Also we're the memory controller for the system, so we have bandwidth between the CPU and the graphics engine, we have bandwidth between the memory, the DDR3 memory, which is also, well the DDR3 is the system memory so all these numbers are sometimes close to each other so things start to blur together.
FiringSquad: And then 22.4GB/sec to the system memory?
ATI: 25GB/sec to the system memory, and 22GB/sec between the CPU and the GPU. Which, or in this case we're more than the GPU, we're the system memory controller too, so the bus between the CPU and GPU is 22GB. It's really 11 in both directions, so 11 input and 11 output.
FiringSquad: And this is a 128-bit memory interface or 256-bit?
ATI: 128-bit, 700MHz interface.
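For reference, peak bandwidth can be estimated from bus width and clock. The sketch below assumes two transfers per clock (double data rate), which the interview does not state. Under that assumption, a 128-bit, 700MHz interface works out to the 22.4GB/sec mentioned in the earlier question. The function name is hypothetical.

```python
def peak_bandwidth_gb_per_s(bus_bits, clock_mhz, transfers_per_clock):
    """Theoretical peak bandwidth in GB/sec (1 GB = 1e9 bytes)."""
    bytes_per_transfer = bus_bits // 8
    return bytes_per_transfer * clock_mhz * 1e6 * transfers_per_clock / 1e9


# transfers_per_clock=2 is an assumption (double data rate signalling).
system_memory = peak_bandwidth_gb_per_s(128, 700, 2)
print(round(system_memory, 1))  # 22.4
```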
FiringSquad: What types of operations do the EDRAM's 192 processors perform?
ATI: Well they do z-compares, they do alpha blends, they do blends of samples to make a pixel. That kind of thing. They do stencil operations also. And this is the first time memory has access to something like this, right in the memory, so it never leaves the memory die. The memory and the logic is all built into one die. And it's also a power savings by the way.
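As a rough illustration of two of the operations named here, the sketch below runs a z-compare followed by an alpha blend on a single stored pixel. It assumes a "closer depth wins" test and the common source-over blend formula; the interview does not specify either, and all function names are hypothetical.

```python
def z_test(incoming_depth, stored_depth):
    """Pass if the new fragment is closer (assumes smaller depth = closer)."""
    return incoming_depth < stored_depth


def alpha_blend(src, dst, alpha):
    """Source-over blend: alpha * src + (1 - alpha) * dst, per channel."""
    return tuple(alpha * s + (1 - alpha) * d for s, d in zip(src, dst))


def write_fragment(pixel, color, depth, alpha):
    """Apply the z-compare, then blend into the stored pixel if it passes.

    pixel: dict with "color" (r, g, b) and "depth" keys.
    Returns the new pixel state; a rejected fragment leaves it unchanged.
    """
    if not z_test(depth, pixel["depth"]):
        return pixel
    return {"color": alpha_blend(color, pixel["color"], alpha), "depth": depth}


stored = {"color": (0.0, 0.0, 0.0), "depth": 1.0}
closer = write_fragment(stored, (1.0, 0.0, 0.0), depth=0.5, alpha=0.5)
farther = write_fragment(closer, (0.0, 1.0, 0.0), depth=0.9, alpha=1.0)
print(closer)   # half-transparent red blended over black
print(farther)  # unchanged, because the second fragment is behind the first
```

Each step reads the stored pixel and writes it back, which is the kind of read-modify-write traffic the answer says stays inside the memory die.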
One of the big uses of power is actually driving I/O pins. In this case, you never have to go off chip so everything is just internal there. So power is important you know, of course, it's not quite like the handheld or mobile space but it's still important and you want to reduce it as much as possible because we are going pretty fast. You know we have a lot of logic going pretty fast in memory, a lot of CPU logic going fast, so you want to reduce power wherever you can.
FiringSquad: You said earlier that EDRAM gives you AA for free. Is that 2xAA or 4x?
ATI: Both, and I would encourage all developers to use 4x FSAA. Well I should say there's a slight penalty, but it's not what you'd normally associate with 4x multisample AA. We're at 95-99% efficiency, so it doesn't degrade it much is what I should say, so I would encourage developers to use it. You'd be crazy not to do it.
You know even though we're in a high def resolution world with Xbox 360, well you know at standard definition with today's TVs jaggies look bad, really bad with standard definition. In hi-def everything is so sharp, that when there are aliasing issues, you're really going to see the jaggies there so I think anti-aliasing is just key and we have a great anti-aliasing story.
And of course you do know that while we support hi-def we will still output to standard def, although Microsoft has said that hi-def is the target platform, don't think "oh well, it won't work on my standard definition television".
FiringSquad: How many textures per pixel are you performing per pass?
ATI: 4 textures. We have 4 texture units. And we don't have separate texture instructions, the shader just goes and it gets textures and it applies them.
FiringSquad: Microsoft has announced 1080i support, but are there any plans to add support for 1080p?
ATI: I think 720p and 1080i are the sweet spot that developers are going for and that's what we're going to see in the next few years, for the next five years really as the main resolutions. It will be a while before 1080p becomes standard. I think 720p would be the best to go for, and 1080i is supported as well of course. So hopefully developers will be doing, or at least the best would be 720p, 4xAA. You'd get a terrific image there.
FiringSquad: With the unified memory architecture, do you feel the VPU and CPU will be fighting over the available bandwidth?
ATI: We've optimized everything enough where I don't think it's going to be a problem. You know, we've had silicon since November so I can tell you it's not going to be a problem. We were worried, you know obviously when you integrate the unified shaders and the EDRAM, we were a bit worried that there could be efficiency problems in the unified shaders, that there could just be general problems with EDRAM. And of course we're still looking, but so far it has all worked out, it isn't a problem. So we've had this debate for a while. Not that we didn't have bugs. [laughs] None of them were our fault.
ATI: In terms of size, we're a bit smaller. Of course, I'm not sure if that's a good way to compare things, and to be honest I can't talk about the number of transistors for this design. Microsoft owns the IP and that has a lot to do with their cost model and all that sort of stuff. But we're a very efficient engine and we feel very good about our design. You know, the bang for the buck is awesome. The power of the platform [pauses] we're going to be the most powerful platform out there, we've got a lot of innovation in there, we're not just a PC chip.
I think the Sony chip is going to be more expensive and awkward. We make efficient use of our shaders, we have 64 threads that we can have on the processor at once. We have a thread buffer inside to keep the [inaudible]. The threads consist of 64 vertices or 64 pixels. We have a lot of work that we can do, a lot of sophistication that the developer never has to see.
FiringSquad: Looking at the parent die, what consumes the majority of the space?
ATI: The shaders consume a lot of the space. They don't own 50% of the chip, but any one thing would be the shaders. 48 shaders, but again, they're not 50% of the chip. We have texture units, we have shaders, we have caches, we have all kinds of things on the chip. We have a sequencer that controls the threads, we have lots of latency-reducing buffers. So the chip is complex.
FiringSquad: Which feature, or I guess group of features, really sets the Xbox 360 VPU apart from anything else?
ATI: Well, there can't be one, I've got to go with two. The unified shader and the embedded DRAM are both unique and just really important to the success of the platform. They're both powerful features that just allow you to do things you couldn't otherwise do. They save bandwidth, they give you a richness. Did we talk with you about fluid reality yesterday?
FiringSquad: No, I don't think so.
ATI: Well, what we've been trying to achieve in this particular go around is, well, realism, you can see it in games like Half-Life 2 where the walls, the environment, it all looks really good and you get a good sense of realism. But the next big hurdle is this fluid reality. The idea that characters in motion, let's say humans in motion, the joints look natural as they move along. That involves a lot of vertex processing, and with this unified shader we can put all these shaders towards vertex processing. Cloth, as it's flying in the wind, like a flag for example, when it drops down on top of something, how that looks as it ripples.
Fur and feathers, the wind blows through them, grass, all that is where this idea of fluid reality comes from. I'd say we've had static quality up until now, but now the fluid rhythm, the motion quality is this next realism that we're really bringing.
Again, lots of power to devote to vertex processing, generating lots of pixels, to drive HD. HD is the platform of choice for this.
So if you have a bunch of commands that are going to the VPU, and let's say it requires a very light load of vertex shading, but a very heavy load of pixel shading, the developers don't have to specify that. That is intelligently figured out by the VPU. So the VPU looks at what the workload is and says, "okay here's how I'm going to evenly distribute the workload", so we call it the adaptive shader array.
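The idea of balancing one pool of shaders against the workload can be sketched as a simple proportional split. This is only an illustration under stated assumptions: the interview does not describe the hardware's actual policy, and the `distribute` function and its work-unit inputs are hypothetical.

```python
def distribute(units, vertex_work, pixel_work):
    """Split a shared pool of shader units in proportion to pending work.

    Proportional allocation is an assumption made for illustration; the
    real arbitration policy is not described in the interview.
    Returns (vertex_units, pixel_units).
    """
    total = vertex_work + pixel_work
    if total == 0:
        return 0, 0
    vertex_units = round(units * vertex_work / total)
    return vertex_units, units - vertex_units


light_vertex_heavy_pixel = distribute(48, vertex_work=10, pixel_work=90)
balanced = distribute(48, vertex_work=1, pixel_work=1)
print(light_vertex_heavy_pixel)  # (5, 43)
print(balanced)                  # (24, 24)
```

With fixed, separate vertex and pixel hardware the split would be the same regardless of load; the point made in the answer is that the developer never has to choose it.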
FiringSquad: Onto the video processor, is it an on-die TV encoder or something like a Rage Theater-type chip? Would that be a third chip?
ATI: It is a third tiny chip and actually Microsoft did that. Microsoft if you recall acquired, about five or six years ago, acquired WebTV. So the people in Mountain View, CA that were a part of that group, and of course, it's not just those people anymore, but they did that chip, and they've done a good job.
You know it's a good choice because it's a lot cheaper silicon, they're using 90nm.
FiringSquad: Do you know if it supports dual HD displays?
ATI: No it doesn't. I know the NVIDIA chip does, but that's only because PC products do. It doesn't seem to have a real use inside the living room, but maybe you differ with me on that.
FiringSquad: Well, on the Sony console, I think they're looking at applications that go beyond just a console in the living room, don't you think?
ATI: Yeah I really think it's just an accident because, well you know, last summer they had to change their plans. They found out that Cell didn't work as well as they wanted it to for graphics. Remember originally you had two or three Cell processors doing everything and then in August last year they had to take an NVIDIA PC chip. And as you know, all PC chips do this, and so it [dual HD display outputs] just came for free.
FiringSquad: What features does Xbox 360 have that really set it apart from what we know so far about RSX and PS3?
ATI: Well, it has a lot. I'll go through the main features. It has a great anti-aliasing story, it has a powerful shading story, in that we can, well performance but also it has a rich instruction set, giving you great image quality and its flexibility.
It has lots of headroom. This is something that developers will find is easy to program for and rich enough to last for years. It has the performance and feature set to last for years.
We'd like to thank Bob Feldstein for taking the time out to answer our questions about Xbox 360's graphics. As you can see, ATI's gone well beyond what we see today in RADEON X850 XT. While Bob didn't want to project how much more powerful the Xbox 360 VPU is over X850, clearly the chip sports many features that we won't find in anything on the PC.
© Copyright 2003 FS Media, Inc.