Summary: In many ways, Xbox 360's graphics processor goes well beyond anything we currently have available on the PC. In today's article, we go over all the details of ATI's architecture with ATI VP of Engineering, Bob Feldstein. Bob discusses everything ranging from the 48 pipes, to ATI's "AA" for free comment. Catch all the details inside!
In order to glean more about the graphics inside Xbox 360 and its architecture, we recently had the chance to speak with Bob Feldstein, ATI’s VP of Engineering on Xbox 360:
ATI: We have 48 shaders. And each shader, every cycle can do 4 floating-point operations, so that gives you 196. There’s a 192 number in there too, so I’m just going to digress a little bit. The 192 is actually in our intelligent memory, every cycle we have 192 processors in our embedded intelligent memory that do things like z, alpha, stencil. So there are two different numbers and they’re kind of close to each other, which leads to some confusion.
So we have a traditional shader, but it’s not traditional at all though because it’s a unified shader. So you have the shader instruction set. [pauses] In the past you had a vertex shader and a pixel shader, and the instruction set was different and you couldn’t, you know, one couldn’t operate on the other’s data. Now we have one set of resources, these 48 shaders, and they naturally dynamically balance between whatever the problem at hand is. So if it’s dominated by vertices you get more resources for vertices, but if it’s dominated by pixels you get more resources towards pixels, or any other kind of problem. It’s a general purpose, well not a general purpose processor, but it is a processor with a good general instruction set and it can operate on a variety of different kinds of data. So unified shader means we have one set of shader hardware and it can operate on any problem. So you have 64 threads, and it’s all controlled by hardware so it’s not like the programmer knows one way or another about threading at all, and the threads here are things like vertex buffers or pixel programs and the hardware just keeps the same [inaudible] in a thread buffer and we can just switch back and forth between the different threads. That way if we’re waiting for data from a vertex program or vertex array we can go ahead and work on a pixel program or we can work on a second vertex or whatever, a different instruction.
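To make the "dynamically balance" idea concrete, here is a minimal Python sketch of one way a shared pool of 48 shader units could be divided between vertex and pixel work. The interview does not describe how the hardware actually arbitrates, so the proportional rule, the function name `balance_shaders`, and the "at least one unit per non-empty queue" policy are all our own assumptions for illustration.

```python
from __future__ import annotations


def balance_shaders(vertex_work: int, pixel_work: int,
                    total_units: int = 48) -> tuple[int, int]:
    """Split a shared pool of shader units in proportion to pending work.

    Hypothetical policy for illustration only; the real hardware
    scheduler is not described in the interview.
    """
    if vertex_work < 0 or pixel_work < 0:
        raise ValueError("work counts must be non-negative")
    total_work = vertex_work + pixel_work
    if total_work == 0:
        return 0, 0  # nothing queued, nothing allocated

    # Proportional share for vertex work, rounded to whole units.
    vertex_units = round(total_units * vertex_work / total_work)

    # Assumed policy: a non-empty queue always gets at least one unit.
    if vertex_work > 0:
        vertex_units = max(1, vertex_units)
    if pixel_work > 0:
        vertex_units = min(total_units - 1, vertex_units)

    return vertex_units, total_units - vertex_units


if __name__ == "__main__":
    for load in [(100, 100), (900, 100), (0, 500)]:
        print(load, "->", balance_shaders(*load))
```

With a fixed split, a pixel-heavy frame would leave dedicated vertex units idle; in this sketch the same 48 units simply follow whichever queue is longer.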
ATI: So we have 48 shaders, each one of them does 4 floating-point ops per cycle, so 196 floating ops per clock. The part of the pipeline that’s in the daughter die, that lives with the memory is the z, the alpha, the stencil, the processors to do the resolve from multisample AA, the one that gives you AA for free, well 4x or 2x multisampling, and really what that does is it breaks one of the traditional bandwidth problems inside traditional graphics architectures. It keeps all that data so you don’t have to do a read or a write. Well, you don’t have to read it into the general purpose, across the bus, process it and then write it back out to the memory. It all happens within the memory, and this just gives you phenomenally more bandwidth. And that’s how we get the anti-aliasing for free. The result for the anti-aliasing, you have 4 samples for every pixel you ultimately end up with, and the resolve is done right within that memory, so it doesn’t have to be read back into the parent die. On the intelligent memory there’s something that we call internally the, well it’s a memory export path. So anywhere down the pipeline that you put data into the pipeline you can export that data out back to EDRAM or Level Two cache and then use that data again. So, and we really haven’t been able to do that before, so it allows you to do things like higher order surfaces for real this time, so it’s just another feature that gives us a lot of flexibility in the system.
FiringSquad: So the daughter die is where the embedded memory is?
ATI: Yes, the memory is there. There’s 10MB of embedded memory surrounded by “intelligence”. That’s the 192 component processors in there that do the work on the embedded memory data. There’s also a very high speed connection between the parent and daughter die, in case you do have to get data back and forth, and that connection is, well there’s a 2GHz wide bus connection between them.
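The multisample "resolve" mentioned above boils several samples down to one final pixel. As a rough illustration, the Python sketch below averages each group of 4 color samples into a single pixel. The plain average (a box filter), the function name `resolve_msaa`, and the flat list-of-tuples layout are our assumptions; the interview only says that samples are blended to make a pixel inside the memory die.

```python
def resolve_msaa(samples, samples_per_pixel=4):
    """Average each consecutive group of color samples into one pixel.

    `samples` is a flat list of (r, g, b) tuples, stored pixel by pixel.
    A plain average is assumed here; the actual hardware filter is not
    specified in the interview.
    """
    if samples_per_pixel <= 0:
        raise ValueError("samples_per_pixel must be positive")
    if len(samples) % samples_per_pixel != 0:
        raise ValueError("sample count must be a multiple of samples_per_pixel")

    pixels = []
    for start in range(0, len(samples), samples_per_pixel):
        group = samples[start:start + samples_per_pixel]
        # zip(*group) yields all red values, then all green, then all blue.
        pixel = tuple(sum(channel) / samples_per_pixel for channel in zip(*group))
        pixels.append(pixel)
    return pixels


if __name__ == "__main__":
    # One pixel on a red/blue edge: two samples hit red, two hit blue.
    edge = [(1.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.0, 0.0, 1.0)]
    print(resolve_msaa(edge))
```

The point of doing this next to the memory is that the four samples never cross the bus; only the single resolved pixel needs to leave.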
ATI: The 2-terabit (256GB/sec) number comes from within the EDRAM, that’s the kind of bandwidth inside that RAM, inside the chip, the daughter die. But between the parent and daughter die there’s a 236Gbit connection on a bus that’s running in excess of 2GHz. It has more than one bit obviously between them. Also we’re the memory controller for the system, so we have bandwidth between the CPU and the graphics engine, we have bandwidth between the memory, the DDR3 memory, which is also, well the DDR3 is the system memory so all these numbers are sometimes close to each other so things start to blur together.
FiringSquad: And then 22.4GB/sec to the system memory?
ATI: 25GB/sec to the system memory, and 22GB/sec between the CPU and the GPU. Which, or in this case we’re more than the GPU, we’re the system memory controller too, so the bus between the CPU and GPU is 22GB. It’s really 11 in both directions, so 11 input and 11 output.
FiringSquad: And this is a 128-bit memory interface or 256-bit?
ATI: 128-bit, 700MHz interface.
FiringSquad: What types of operations do the EDRAM’s 192 processors perform?
ATI: Well they do z-compares, they do alpha blends, they do blends of samples to make a pixel. That kind of thing. They do stencil operations also. And this is the first time memory has access to something like this, right in the memory, so it never leaves the memory die. The memory and the logic is all built into one die. And it’s also a power savings by the way. One of the big uses of power is actually driving I/O pins. In this case, you never have to go off chip so everything is just internal there. So power is important you know, of course, it’s not quite like the handheld or mobile space but it’s still important and you want to reduce it as much as possible because we are going pretty fast. You know we have a lot of logic going pretty fast in memory, a lot of CPU logic going fast, so you want to reduce power wherever you can.
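Since so many of the quoted figures sit close together, a short Python sketch of the arithmetic may help keep them apart. It uses only numbers from the interview, plus two assumptions of ours: that "2-terabit" means 2 × 1024 Gbit, and that the 128-bit, 700MHz memory interface moves data twice per clock (double data rate). The names below are ours as well.

```python
def gbit_to_gbyte(rate_gbit):
    """Convert a rate in gigabits per second to gigabytes per second."""
    return rate_gbit / 8.0


def ddr_bandwidth_gbyte(bus_width_bits, clock_mhz, transfers_per_clock=2):
    """Peak bandwidth of a memory bus in GB/sec.

    transfers_per_clock=2 assumes double data rate signalling.
    """
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * clock_mhz * 1e6 * transfers_per_clock / 1e9


# 48 shaders x 4 floating-point ops each works out to 192,
# not the 196 quoted in the interview.
SHADER_OPS_PER_CLOCK = 48 * 4

# "2-terabit (256GB/sec)" matches only if 1 terabit = 1024 Gbit (assumed).
EDRAM_INTERNAL_GBYTE = gbit_to_gbyte(2 * 1024)

# The quoted 236Gbit link between the parent and daughter die.
DIE_LINK_GBYTE = gbit_to_gbyte(236)

# 128-bit, 700MHz system memory interface, double data rate assumed.
SYSTEM_MEMORY_GBYTE = ddr_bandwidth_gbyte(128, 700)

# 22GB/sec between CPU and GPU, split evenly between input and output.
CPU_LINK_PER_DIRECTION_GBYTE = 22 / 2

if __name__ == "__main__":
    print("shader ops per clock:", SHADER_OPS_PER_CLOCK)
    print("EDRAM internal GB/sec:", EDRAM_INTERNAL_GBYTE)
    print("die-to-die link GB/sec:", DIE_LINK_GBYTE)
    print("system memory GB/sec:", SYSTEM_MEMORY_GBYTE)
    print("CPU link GB/sec per direction:", CPU_LINK_PER_DIRECTION_GBYTE)
```

Under those assumptions the 128-bit, 700MHz interface comes out at the 22.4GB/sec figure from our question rather than the 25GB/sec quoted in the answer.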
FiringSquad: You said earlier that EDRAM gives you AA for free. Is that 2xAA or 4x?
ATI: Both, and I would encourage all developers to use 4x FSAA. Well I should say there’s a slight penalty, but it’s not what you’d normally associate with 4x multisample AA. We’re at 95-99% efficiency, so it doesn’t degrade it much is what I should say, so I would encourage developers to use it. You’d be crazy not to do it. You know even though we’re in a high def resolution world with Xbox 360, well you know at standard definition with today’s TVs jaggies look bad, really bad with standard definition. In hi-def everything is so sharp, that when there are aliasing issues, you’re really going to see the jaggies there so I think anti-aliasing is just key and we have a great anti-aliasing story. And of course you do know that while we support hi-def we will still output to standard def, although Microsoft has said that hi-def is the target platform, don’t think “oh well, it won’t work on my standard definition television”.
FiringSquad: How many textures per pixel are you performing per pass?
ATI: 4 textures. We have 4 texture units. And we don’t have separate texture instructions, the shader just goes and it gets textures and it applies them.
FiringSquad: Microsoft has announced 1080i support, but are there any plans to add support for 1080p?
ATI: I think 720p and 1080i are the sweet spot that developers are going for and that’s what we’re going to see in the next few years, for the next five years really as the main resolutions. It will be awhile before 1080p becomes standard. I think 720p would be the best to go for, and 1080i is supported as well of course. So hopefully developers will be doing, or at least the best would be 720p, 4xAA. You’d get a terrific image there.
FiringSquad: With the unified memory architecture, do you feel the VPU and CPU will be fighting over the available bandwidth?
ATI: We’ve optimized everything enough where I don’t think it’s going to be a problem.
You know, we’ve had silicon since November so I can tell you it’s not going to be a problem. We were worried, you know obviously when you integrate the unified shaders and the EDRAM, we were a bit worried that there could be efficiency problems in the unified shaders, that there could just be general problems with EDRAM. And of course we’re still looking, but so far it has all worked out, it isn’t a problem. So we’ve had this debate for awhile. Not that we didn’t have bugs. [laughs] None of them were our fault.
ATI: In terms of size, we’re a bit smaller. Of course, I’m not sure if that’s a good way to compare things, and to be honest I can’t talk about the number of transistors for this design. Microsoft owns the IP and that has a lot to do with their cost model and all that sort of stuff. But we’re a very efficient engine and we feel very good about our design. You know, the bang for the buck is awesome. The power of the platform [pauses] we’re going to be the most powerful platform out there, we’ve got a lot of innovation in there, we’re not just a PC chip. I think the Sony chip is going to be more expensive and awkward. We make efficient use of our shaders, we have 64 threads that we can have on the processor at once. We have a thread buffer inside to keep the [inaudible]. The threads consist of 64 vertices or 64 pixels. We have a lot of work that we can do, a lot of sophistication that the developer never has to see.
FiringSquad: Looking at the parent die, what consumes the majority of the space?
ATI: The shaders consume a lot of the space. They don’t own 50% of the chip, but any one thing would be the shaders. 48 shaders, but again, they’re not 50% of the chip. We have texture units, we have shaders, we have caches, we have all kinds of things on the chip. We have a sequencer that controls the threads, we have lots of latency-reducing buffers. So the chip is complex.
FiringSquad: Which feature, or I guess group of features, really sets the Xbox 360 VPU apart from anything else?
ATI: Well, there can’t be one, I’ve got to go with two. The unified shader and the embedded DRAM are both unique and just really important to the success of the platform. They’re both powerful features that just allow you to do things you couldn’t otherwise do. They save bandwidth, they give you a richness. Did we talk with you about fluid reality yesterday?
FiringSquad: No, I don’t think so.
ATI: Well, what we’ve been trying to achieve in this particular go around is, well, realism, you can see it in games like Half-Life 2 where the walls, the environment, it all looks really good and you get a good sense of realism. But the next big hurdle is this fluid reality. The idea that characters in motion, let’s say humans in motion, the joints look natural as they move along. That involves a lot of vertex processing, and with this unified shader we can put all these shaders towards vertex processing. Cloth, as it’s flying in the wind, like a flag for example, when it drops down on top of something, how that looks as it ripples. Fur and feathers, the wind blows through them, grass, all that is where this idea of fluid reality comes from. I’d say we’ve had static quality up until now, but now the fluid rhythm, the motion quality is this next realism that we’re really bringing. Again, lots of power to devote to vertex processing, generating lots of pixels, to drive HD. HD is the platform of choice for this.
So if you have a bunch of commands that are going to the VPU, and let’s say it requires a very light load of vertex shading, but a very heavy load of pixel shading, the developers don’t have to specify that. That is intelligently figured out by the VPU. So the VPU looks at what the workload is and says, “okay here’s how I’m going to evenly distribute the workload”, so we call it the adaptive shader array.
FiringSquad: Onto the video processor, is it an on-die TV encoder or something like a Rage Theater-type chip? Would that be a third chip?
ATI: It is a third tiny chip and actually Microsoft did that. Microsoft, if you recall, about five or six years ago, acquired WebTV. So the people in Mountain View, CA that were a part of that group, and of course, it’s not just those people anymore, but they did that chip, and they’ve done a good job. You know it’s a good choice because it’s a lot cheaper silicon, they’re using 90nm.
FiringSquad: Do you know if it supports dual HD displays?
ATI: No it doesn’t. I know the NVIDIA chip does, but that’s only because PC products do. It doesn’t seem to have a real use inside the living room, but maybe you differ with me on that.
FiringSquad: Well, on the Sony console, I think they’re looking at applications that go beyond just a console in the living room don’t you think?
ATI: Yeah I really think it’s just an accident because, well you know, last summer they had to change their plans. They found out that Cell didn’t work as well as they wanted to for graphics. Remember originally you had two or three Cell processors doing everything and then in August last year they had to take an NVIDIA PC chip. And as you know, all PC chips do this, and so it [dual HD display outputs] just came for free.
FiringSquad: What features does Xbox 360 have that really set it apart from what we know so far about RSX and PS3?
ATI: Well, it has a lot. I’ll go through the main features.
It has a great anti-aliasing story, it has a powerful shading story, in that we can, well performance but also it has a rich instruction set, giving you great image quality and its flexibility. It has lots of headroom. This is something that developers will find is easy to program for and rich enough to last for years. It has the performance and feature set to last for years.
We’d like to thank Bob Feldstein for taking the time out to answer our questions about Xbox 360’s graphics. As you can see, ATI’s gone well beyond what we see today in RADEON X850 XT. While Bob didn’t want to project how much more powerful the Xbox 360 VPU is over X850, clearly the chip sports many features that we won’t find in anything on the PC.
© Copyright 2003 FS Media, Inc.