Summary: In today's article, we're debunking some of the myths that are out there about DirectX 10 and unified shaders. We'll also go over some of the other notable changes such as DX10's new geometry shader. Finally, we discuss DirectX 10's impact on gaming with Epic's Tim Sweeney. If you're curious about DirectX 10 and want to know more about it, this is one article you won't want to miss!
Microsoft and DirectX
Notes
Prior to DirectX 6.0, Microsoft’s DirectX APIs were seldom used by most game developers, who predominantly opted for 3dfx’s Glide API instead. As such, we won’t list those earlier versions here to save space:
The Birth of DirectX 10
On the graphics side, DirectX 10 has been reworked from the ground up; no aspect of the API was left untouched. The driver model has been completely reworked as well: under DX10 the driver is split into two parts, the user mode driver and the kernel mode driver. The kernel mode driver is kept distinct from the user mode driver to enhance stability. The idea here is to keep user mode drivers for Direct3D, OpenGL, and DirectX video playback (among others) isolated from the kernel driver for the operating system, so that one can’t affect the other. [image]
Under the current driver model, the majority of the graphics driver resides in the operating system’s kernel space, so if the driver were to crash while gaming, for instance, it could bring the entire operating system down with it. By separating the driver into distinct parts, the hope is that DirectX 10 will deliver better overall system stability than previous versions of DirectX. Another change in DirectX 10 is that Microsoft has removed the fixed function pipeline entirely; everything is programmable. Software developers will use shaders to emulate the fixed function pipeline for older, legacy apps that rely on it.
DirectX 10: Windows Vista Only
One thing to note about DirectX 10 is that it will only be made available for Windows Vista. Microsoft has no plans to make DX10 compatible with Windows XP or any other previous operating system. What Microsoft has done with Vista is incorporate a subsystem that supports DirectX 9.0 graphics hardware, since many users still own DX9-compliant hardware. This subsystem is named DirectX 9.0L. So, in short, if you have DirectX 9 hardware, you will be using DirectX 9.0L as your API in Windows Vista. These are some of the main features of DirectX 10. In the following pages we will go in depth on some of these new features and analyze what they will bring to the table with Vista and in the future.
It started with the 360
An example from ATI is shown above. Here the GPU is responsible for handling two very different scenarios. At the top is a shark, which is very vertex-intensive to render, as the shark is made up of thousands of vertices forming the triangles that give it its shape. At the bottom is the water. This scene has very few vertices, but is instead pixel shader-intensive, as pixel shaders are used to handle the water and any waves, reflections, or other effects the developer may want to add to it. With a traditional DX8/DX9 architecture, the vertex shaders are fully utilized in the top scenario, while the pixel shaders are only partially loaded. In the water scene, it’s the opposite: the pixel shaders are fully loaded while the vertex shaders are barely being used. [image]
Under a unified architecture, the shading units in the example above wouldn’t operate independently of each other; instead they’d tackle these tasks together. This not only increases efficiency, but also improves performance. The engineers at ATI who created the Xbox 360’s Xenos GPU developed an architecture that merges the vertex and pixel shaders into a unified bank of shaders to handle the rendering workload, which is where the gains in efficiency and relative throughput come from.
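To make the shark and water comparison concrete, here is a minimal sketch of an idealized model. The unit counts (8 vertex units, 24 pixel units) and the workload numbers are hypothetical values chosen for illustration, not figures from ATI, and the model ignores scheduling cost, memory bandwidth, and everything else a real GPU has to deal with.

```python
def frame_time_split(vertex_work, pixel_work, vertex_units, pixel_units):
    """Fixed split: each pool can only run its own kind of work,
    so the slower pool determines when the frame is done."""
    return max(vertex_work / vertex_units, pixel_work / pixel_units)


def frame_time_unified(vertex_work, pixel_work, total_units):
    """Unified pool: any unit can run either kind of work."""
    return (vertex_work + pixel_work) / total_units


def compare(scenes, vertex_units=8, pixel_units=24):
    """Return {scene: (split_time, unified_time)} for the same unit count."""
    total_units = vertex_units + pixel_units
    results = {}
    for name, (vertex_work, pixel_work) in scenes.items():
        split = frame_time_split(vertex_work, pixel_work, vertex_units, pixel_units)
        unified = frame_time_unified(vertex_work, pixel_work, total_units)
        results[name] = (split, unified)
    return results


# Hypothetical workloads in arbitrary "work units": (vertex, pixel)
scenes = {
    "shark (vertex-heavy)": (900, 100),
    "water (pixel-heavy)": (100, 900),
}
results = compare(scenes)
for name, (split, unified) in results.items():
    print(f"{name}: split={split}, unified={unified}")
```

In this toy model the unified pool is never slower than the fixed split for the same number of units, and the gap is widest when the workload is lopsided in the direction the fixed split did not anticipate.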
In the Xbox 360, this controller is dubbed the load balancing unit. The load balancing unit is responsible for organizing the flow of data that goes to the shading units. It’s designed to hand out tasks as efficiently as possible in order to ensure that the shading units are being fully utilized while also processing data in the best order so that the scene can be rendered as quickly as possible. We got an excellent explanation of how the load balancing unit works, and its relationship with the thread arbiter (which lies above the shading units) from ATI’s Dave Baumann: “In a traditional architecture there is a fixed resource split between geometry and pixel processing, and the ratio of these splits often vary depending on what portion of the market the processor is aimed at. The workloads of games are rarely uniform though – the level of geometry and pixel processing can vary significantly depending on what is occurring within the game, not just from one frame to the next, but within a frame. If there is a section of processing required that is very geometry heavy then it can mean that some or all of the potential pixel processing power in the GPU is wasted as it is bottlenecked by the geometry processing capabilities and vice versa.”
It’s important to note that the shading units can operate independently of each other, so if the GPU is waiting for data from a vertex program or a vertex array, the load balancing unit can assign the shading units to work on a pixel program or a second vertex. This further helps to ensure that the shaders are constantly being fed with data. All this is invisible to the software developer; no special programming is required.
For the Xbox 360’s Xenos GPU, ATI employs 48 shading units. Each of these shaders is general purpose and shares the same instruction set; in other words, no one shading unit is more functional than another. This allows the shaders to operate on any type of data, whether it’s a pixel program or a task for the vertex shader. The 48 shaders are grouped into three “banks” of 16 shaders each.
Unified shading in DirectX 10
Because the Xbox 360’s Xenos GPU utilizes a unified shader architecture, and ATI and Microsoft have been promoting unified shaders so heavily in comparison to the PlayStation 3’s RSX GPU, which isn’t unified, it has generally been assumed that DirectX 10 also requires a unified shader architecture. However, it turns out that this is not the case. In fact, we’ve pored over numerous DirectX 10 documents and none of them even discuss a unified architecture! This means that a hardware manufacturer like ATI, S3, or NVIDIA could develop a GPU with distinct pixel, vertex, and geometry shaders and still claim 100% DirectX 10 Shader Model 4.0 compliance. This is because with DirectX 10, Microsoft only defines the specifications of the API; it is then up to hardware manufacturers to determine what they feel is the best method to meet those specifications. In ATI’s case, they decided to go with a unified architecture for Xbox 360 and their upcoming R600 GPU because they felt it made the most sense, particularly since the shaders all share the exact same functionality anyway. In the words of ATI’s Dave Baumann, the decision to go unified came down to one simple question: "if all these parts of the pipeline [geometry, pixel, and vertex shaders] have to have the same capabilities, does it make sense to have a traditional pipeline with discrete units or a single pool that can execute all shader program types?" [image]
We think a lot of the confusion around unified shaders came from what Microsoft describes as DirectX 10’s “unified shader core”. This refers to the fact that in DirectX 10, all shaders rely on the same instruction set; in previous shader models there were restrictions in functionality between the pixel and vertex shaders. This is no longer the case in DirectX 10: pixel, vertex, and geometry shaders all share the same unified programming model.
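The difference between a unified programming model and unified hardware can be sketched with a toy interpreter. Everything here is hypothetical: the three-operation instruction set, the tuple-based program format, and the register names are invented for illustration and do not correspond to Shader Model 4.0. The point is only that every stage's program is written against the same operations, which says nothing about whether one pool of units or several dedicated pools executes them.

```python
# Hypothetical shared instruction set, available to every shader stage.
OPS = {
    "add": lambda a, b: a + b,
    "mul": lambda a, b: a * b,
    "max": lambda a, b: max(a, b),
}


def run_program(program, registers):
    """Execute (op, dest, src_a, src_b) instructions on a copy of the registers."""
    regs = dict(registers)
    for op, dest, src_a, src_b in program:
        regs[dest] = OPS[op](regs[src_a], regs[src_b])
    return regs


# Two different stages, one instruction set.
vertex_program = [("mul", "x", "x", "scale"), ("add", "x", "x", "offset")]
pixel_program = [("mul", "c", "c", "light"), ("max", "c", "c", "zero")]

vertex_out = run_program(vertex_program, {"x": 2.0, "scale": 3.0, "offset": 1.0})
pixel_out = run_program(pixel_program, {"c": 0.5, "light": -1.0, "zero": 0.0})
```

Nothing in `run_program` cares which stage a program belongs to. A hardware vendor could back it with a single shared pool or with separate vertex, geometry, and pixel units, and the programs would not change.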
DirectX 10’s Geometry Shader
DirectX 10 will introduce a new shader into the mix called the geometry shader. This shader is pretty big news considering what it can be used for. The geometry shader sits right in between the vertex shader and the pixel shader in the Direct3D10 graphics pipeline (although it’s conceivable that results from the geometry shader can be sent back to the vertex shader, and then back to the geometry shader, as there are no restrictions in sending the results from one shader type to another).
After the vertices are processed by the vertex shader, the geometry shader can be used to perform further work on them. One limitation of the vertex shader is that it can’t create new vertices. The geometry shader can: it can amplify the number of triangles, taking the incoming vertices and creating a new set of triangles from them. The geometry shader can even be used to work on the edges of a triangle to create a different shape. [image]
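The contrast between the two stages can be sketched in a few lines. This is a CPU-side illustration in Python, not shader code, and the function names `vertex_stage` and `geometry_stage` are hypothetical. The vertex stage maps one vertex to one vertex, while the geometry stage receives a whole triangle and may emit several, here by splitting it at its edge midpoints.

```python
from typing import List, Tuple

Vec3 = Tuple[float, float, float]
Triangle = Tuple[Vec3, Vec3, Vec3]


def midpoint(a: Vec3, b: Vec3) -> Vec3:
    return tuple((x + y) / 2.0 for x, y in zip(a, b))


def vertex_stage(v: Vec3) -> Vec3:
    """One vertex in, exactly one vertex out (here: a uniform scale by 2)."""
    return tuple(2.0 * c for c in v)


def geometry_stage(tri: Triangle) -> List[Triangle]:
    """One triangle in, four triangles out: new vertices are created
    at the edge midpoints, which a vertex stage cannot do."""
    a, b, c = tri
    ab, bc, ca = midpoint(a, b), midpoint(b, c), midpoint(c, a)
    return [(a, ab, ca), (ab, b, bc), (ca, bc, c), (ab, bc, ca)]


tri = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
transformed = tuple(vertex_stage(v) for v in tri)
amplified = geometry_stage(transformed)
print(len(amplified), "triangles emitted from 1 input triangle")
```

The subdivision scheme is just one easy-to-follow choice; the same one-in, many-out shape applies to the other amplification effects a geometry shader might be used for.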
In essence, the geometry shader allows the scene being created to fully utilize all geometry primitives, which include lines, points, and triangles. Additionally, it can handle adjacent primitives. In the past, these primitives were handled in different areas of the pipeline and exclusively in those areas. What does that mean? Well, think about a triangle and its shape. The geometry shader can take full control of the triangle and control its vertices, treating it like an object before it is passed to the rasterizer and pixel shader for further processing. The geometry shader will open up a slew of possibilities for developers in regards to creating new, more elaborate effects in DirectX 10 games, or enhancing performance: whether working concurrently with the pixel shader and the vertex shader or offloading CPU cycles to the GPU, it’s a huge breakthrough.
Here are some of the cool new effects that we could see from the geometry shader. They include:
• Animating organic forms (one demo Microsoft has demonstrated in the past had dynamically growing vines rendered 100% on the GPU)
• Geometry/data amplification
• Motion Blur
• More realistic wrinkles on faces
• Realistic Shadow Volume Generation
• Modeling fluid-like behavior in games (particle systems which model fluids)
• Cartoon and Falloff Effects
• Stencil Shadow Extrusion
• Procedural geometry and detailing
• Add noise to create turbulent fields
• Displacement Mapping
• Isosurface extraction
• And many more possibilities!
[image]
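Access to adjacent primitives is what makes effects like shadow volume generation practical on the GPU. As a rough CPU-side sketch in Python (hypothetical helper names, a directional light assumed, and no claim that this matches how any particular game or driver does it), an edge lies on the silhouette when one of the two triangles sharing it faces the light and the other does not:

```python
def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def dot(u, v):
    return u[0] * v[0] + u[1] * v[1] + u[2] * v[2]


def faces_light(tri, light_dir):
    """True if the triangle's front side (counter-clockwise winding) is lit.
    light_dir is the direction the light travels."""
    a, b, c = tri
    normal = cross(sub(b, a), sub(c, a))
    return dot(normal, light_dir) < 0


def is_silhouette_edge(tri, neighbor, light_dir):
    """The shared edge is a silhouette edge if exactly one side faces the light."""
    return faces_light(tri, light_dir) != faces_light(neighbor, light_dir)


light_dir = (0.0, 0.0, -1.0)  # light shining straight down the -z axis
tri = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
# Both neighbors share the edge (1,0,0)-(0,1,0) with tri.
flat_neighbor = ((0.0, 1.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0))
folded_neighbor = ((0.0, 1.0, 0.0), (1.0, 0.0, 0.0), (0.0, 0.0, -1.0))

flat_result = is_silhouette_edge(tri, flat_neighbor, light_dir)
folded_result = is_silhouette_edge(tri, folded_neighbor, light_dir)
```

A shadow volume pass would then extrude new geometry away from the light along each silhouette edge. Before the geometry shader, that detection and extrusion typically had to be done on the CPU.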
The first topic mentioned in the presentation is what DirectX 9 has to offer. Detailed characters, complex materials, and lighting effects such as HDR lighting are some of the things it highlights. DirectX 9 makes great use of these, and this is why games today look as good as they do. The presenters go on to say that game developers are doing a great job of bringing a false reality closer to life than ever before. But then the presentation gets to the issue at hand: DirectX 9’s overhead. As it stands now, game developers are getting close to utilizing DX9 to its fullest potential. Eventually, software developers are going to reach a point where they can do no more with DirectX 9’s feature set because of bottlenecks and constraints they are encountering in the API. The problem lies in the DX9 pipeline and how it functions. In the DirectX 9 pipeline, the app feeds the API objects. In this case, an object can be anything in the scene; an example would be a character model. (In fact, complex characters may be composed of many objects.) [image]
In the current DX9 pipeline, the object passes from the application to the API; the API in turn feeds these objects to the driver, and then ultimately to the graphics hardware. The issue is that each time an object is passed from the API to the driver, it introduces a bit of overhead. With one scene requiring dozens of objects for the driver to handle, this overhead can drastically increase the time it takes to process them, and longer execution time translates directly into lower performance. This is known as the small batch problem. [image]
[image]
While DirectX 10 can’t remove this overhead entirely, it is significantly reduced in DX10 thanks to new state objects. ATI’s slides indicate significantly less execution time will be devoted towards the API+Driver in DX10 (40% in DX9 vs 20% in DX10), which will allow developers to put more objects, materials, and other eye candy effects in their DX10 games. [image]
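To get a feel for what the 40% and 20% figures imply, here is a short worked calculation. It rests on an assumption the slides do not state: that the application's own share of the work is the same in both cases, so only the API+Driver cost changes. Fractions are used to keep the arithmetic exact.

```python
from fractions import Fraction


def overhead_relative_to_app(api_driver_share):
    """If API+Driver take `api_driver_share` of total execution time,
    return their cost as a multiple of the remaining (application) time."""
    return api_driver_share / (1 - api_driver_share)


dx9_overhead = overhead_relative_to_app(Fraction(40, 100))   # 2/3 of app time
dx10_overhead = overhead_relative_to_app(Fraction(20, 100))  # 1/4 of app time

# How much cheaper submission is under DX10 for the same application work.
reduction = dx9_overhead / dx10_overhead  # 8/3, roughly 2.67x

print(f"DX9 overhead:  {dx9_overhead} of app time")
print(f"DX10 overhead: {dx10_overhead} of app time")
print(f"Reduction factor: {reduction}")
```

Under that assumption, and if the overhead grows roughly in proportion to the number of objects submitted, the same overhead budget would cover about 2.67 times as many objects. That sits inside the factor of 2-4 that Tim Sweeney estimates later in this article, though both numbers are early estimates rather than measurements.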
Other additions and improvements
With all of this talk about the new additions to DirectX 10 and the new driver model, we tend to look past some of the important things that have been upgraded from the previous version. A few of these areas that have been refined are:
[image]
Alongside what we have talked about thus far, there are many more refinements that have been made to DirectX 10. These are some of the biggest changes we’ve found, but we wanted to hear what the people who will actually be using the new API to create games had to say. For this, we turned to Epic Games’ Tim Sweeney!
As great as DirectX 10 looks on paper, it’s ultimately up to the game developers to really take advantage of the API’s new features. To get the developers’ perspective on DX10, we turned to Tim Sweeney, one of the founders of Epic Games and technology director on Unreal Engine 3.
FiringSquad: As a game developer who is used to working on the cutting edge, which new features in DirectX 10 excite you the most?
Tim Sweeney: I see DirectX 10's support for virtualized video memory and multitasking as the most exciting and forward-looking features. Though they're under-the-covers improvements, they'll help a great deal to bring graphics into the mainstream and increase the visual detail available in future games.
FiringSquad: Is there anything in DirectX 10 that you couldn’t do in DirectX 9.0?
Tim Sweeney: Realistically, DirectX 10 doesn't introduce fundamentally new capabilities, but brings many new features that will enable developers to optimize games more thoroughly and thus deliver incrementally better visuals and better frame rates. If you look at the long-term graphics roadmap, there have only been a few points where we've gained fundamentally new capabilities. The most visible was the move from DirectX 6, 7 and 8, which in practice were fixed-function, 8-bit rendering APIs, to DirectX 9 with programmable shaders and support for high-precision arithmetic. Most of the in-between steps have brought welcome but incremental improvements, and DirectX 10 falls into that category. From here on, there is really only one major step remaining in the evolution of graphics hardware, and that's the eventual unification of CPU and GPU architectures into uniform hardware capable of supporting both efficiently. After that, the next 20 years of evolution in computing will just bring additional performance.
FiringSquad: A lot has been made about the speed boost DirectX 10 will bring over DX9, in part due to the new driver model and in part due to other efficiencies. In your position you get to work with the latest hardware – can you tell us without violating any NDAs if these speedups are realistic or not? Will we really see a 6X increase in games or is this all theoretical?
Tim Sweeney: We don't have hard data yet, but it looks like there's potential to reduce the CPU cost of submitting rendering by a factor of 2-4. Since DirectX9 games are often CPU-limited, this should lead to significant visible improvements in frame rate. More important, this lower overhead will enable us to render more objects per frame and increase the visual complexity of scenes in a more organic way than simply adding more polygons to existing objects.
FiringSquad: Based on what you’ve seen with DirectX 10, do you think it will be easier for game developers to program for than DirectX 9 was? If yes, which features really stand out?
Tim Sweeney: You can't really use the word "easier" in conjunction with supporting DirectX 10. Because it's only available on Windows Vista and not XP, all developers who support it will have to continue supporting DirectX9, and henceforth maintain two versions of the rendering code in their engine. It's worth doing this, and we're doing it for Unreal Engine 3. But, far from making our lives easier, it brings a considerable amount of additional development cost and overhead to PC game development.
FiringSquad: With games using higher resolution textures and screen resolutions also going up, memory bandwidth is sucked up quickly, particularly on lower-end cards with slower graphics memory. How big of a problem is this and should hardware developers be focusing more of their time on solving this problem than on adding more functions to the GPU such as physics?
Tim Sweeney: PC games deal with bandwidth differences between the high-end and low-end quite effectively by scaling our video resolutions. Today's games generally support resolutions from 640x480 up to 2560x1600, which means we can easily scale by a factor of 13 in frame buffer bandwidth. Talk of "adding physics features to GPUs" and so on misses the larger trend, that the past 12 years of dedicated GPU hardware will end abruptly at some point, and thereafter all interesting features -- graphics, physics, sound, AI -- will be software problems exclusively. The big thing that CPU and GPU makers should be worrying about is this convergence, and how to go about developing, shipping, marketing, and evolving a single architecture for computing and graphics. This upcoming step is going to change the nature of both computing and graphics in a fundamental way, creating great opportunities for the PC market, console markets, and almost all areas of computing.
FiringSquad: We know that Unreal Engine 3 was largely developed with shader model 3.0 in mind, but do you plan on adding any DirectX 10 aspects into Unreal Engine 3 and ultimately Unreal Tournament 2007 or is that coming in UE4?
Tim Sweeney: Unreal Engine 3 will make full use of DirectX 10, and many of our and our partners' games will ship in 2007 with full support for DirectX 10 and Windows Vista. But, despite the marketing hype, DirectX 10 isn't all that different from DirectX 9, so you'll mainly see performance benefits on DirectX 10 rather than striking visual differences.
FiringSquad: What are some of the things you would have liked to have seen Microsoft add to DirectX 10 that aren’t in there currently?
Tim Sweeney: Microsoft made the right key decisions in developing DirectX 10. They invested heavily in a couple of bold operating-system-wide initiatives, including video memory virtualization and support for preemption, and introduced many welcome incremental improvements. Ultimately, the DirectX 10 feature set resulted from about 7 years of discussion with key game developers. A lot of major ideas were proposed, including a multi-year effort by John Carmack to lobby for video memory virtualization. The features that didn't make it into DirectX 10 either weren't particularly beneficial, or clearly weren't practical for this timeframe.
FiringSquad: We know that the first games that are capable of taking advantage of some of DX10’s features will ship next year. But how long do you think it will take before games require DirectX 10? When should gamers really care about this new API, when will it really begin to affect them?
Tim Sweeney: Requiring DirectX 10 is tantamount to requiring Windows Vista, and we have a lot of historical data we can use as a guide to such transitions. 2006 is the first year where it became economical for developers to ship games that don't support Windows 98 and Windows ME, which implies that an operating system has a 6-year lifespan. Vista will ship in 2007, so mainstream games that require it should start appearing in 2012 or 2013. So much can happen in that kind of time period that we ought not even consider it.
On behalf of FiringSquad, we’d like to thank Epic’s Tim Sweeney for taking time out of his day to answer our questions on DirectX 10. Before we move on to the next page though, Tim had one last point he wanted to pass along on DirectX 10. From Tim:
Meanwhile, the new geometry shader can be used to offload functions that previously were done on the CPU to the GPU, freeing the CPU up for other tasks, or the geometry shader can be used to output more geometry into a game’s scene (Microsoft refers to this as limited amplification) or for other applications such as shadow volumes. As game developers gain more experience with the geometry shader, its usage will no doubt increase. All these new additions to DirectX 10 will allow for games that look much better than today’s latest titles, with richer, more detailed worlds that are filled with more objects and eye candy effects such as HDR lighting, volumetric fog, and depth of field, as well as more detailed characters. This brings us to this pair of screenshots: [image]
The screenshots above come from Microsoft’s Flight Simulator X, one of the first titles designed to take advantage of DirectX 10. The screenshot on the left comes from the game’s DirectX 9 mode, while the screenshot on the right is running DX10. It doesn’t take 20/20 vision to see the difference between the two screenshots: the water in the DX10 screenshot looks photorealistic, with many more waves, while the sky is filled with thick clouds and rays from the sun striking the side of the mountain. In comparison the DX9 shot looks rather pathetic. However, the DX10 screenshot is a little misleading, as technically DX9 is fully capable of rendering everything seen in the DX10 shot with a few extra passes. Basically the frame rate would be slower under DX9 (by how much is unknown at this point), but that’s about it. In a lot of ways, this reminds us of the first batch of shader model 3.0 screenshots that went out a few years back with Far Cry. We all saw how that one played out – early on shader model 3.0 was mainly used for performance gains, not improving graphics. Epic’s Tim Sweeney basically said as much on the previous page: “But, despite the marketing hype, DirectX 10 isn't all that different from DirectX 9, so you'll mainly see performance benefits on DirectX 10 rather than striking visual differences.” Don’t get us wrong, we’re not saying DirectX 10 won’t be a significant improvement over DirectX 9, as clearly Microsoft has implemented lots of improvements that will significantly enhance the gaming experience visually. Just don’t expect to see a sweeping change overnight. Most likely the first wave of titles will use DX10 enhancements sparingly, and these enhancements will be subtle, focusing on improving performance rather than image quality. 
After all, there will only be so many gamers out there with DX10 hardware and a copy of Windows Vista 6-12 months from now, so many game developers will probably spend the bulk of their time tweaking their games towards their much larger DX9 audience. As DX10 hardware becomes more popular (and more powerful), it’s likely that game developers will then focus more on delivering the graphical enhancements everyone’s wanting. This isn’t necessarily a bad thing either, as clearly DX9 titles look good today. Those Crysis screenshots and gameplay footage from E3 for instance were all running on DX9 hardware. The same applies to the Unreal Engine 3 content that’s been released to date. The next-generation of gaming is coming though. And by all accounts, it’s looking quite good. Now we’re eagerly awaiting the arrival of next-gen hardware. If the rumors about NVIDIA’s G80 GPU are true, we won’t have to wait too much longer. We certainly can’t wait to see what both ATI and NVIDIA have in store for their upcoming DX10 parts. If history is any indication, we should be seeing some substantial increases in graphics performance once these next-gen GPUs arrive…
© Copyright 2003 FS Media, Inc.