Summary: Can you spot the G92 cards in this article? Our visit to NVIDIA’s Santa Clara HQ earlier this week was certainly revealing…
We were essentially given unfiltered access to see and talk with the people at NVIDIA. The engineers did not have to turn off their monitors when we walked into their labs. We were simply asked to black out any parts of our images that could reveal confidential information. NVIDIA’s Santa Clara campus is actually the third location NVIDIA employees have called home. They started just down the street from the original Fry’s Electronics in Sunnyvale, before moving to Apple’s old facilities in Santa Clara. Those offices could support about 500 employees. NVIDIA finally moved to its current offices in 2001 and has spent the last six years there, growing from GeForce 4 to today’s GeForce 8 series. [image]
There has been talk about NVIDIA moving once more, this time to San Jose, but it’ll be up to the city to offer a good enough financial package to persuade the NVIDIA team to relocate. As it stands, NVIDIA will continue to expand and grow at their Santa Clara HQ. Most of NVIDIA’s office space is made up of cubicles; the newer cubicles have shorter walls. [image]
[image]
NVIDIA’s Santa Clara HQ isn’t just a building full of executives and cubicles. It’s also home to some world-class laboratories. One of these laboratories is NVIDIA’s Failure Analysis Lab. This isn’t a software lab… it’s actually NVIDIA’s Silicon Failure Analysis Lab. [image]
Here, Howard Marks and his elite team validate the manufacturing quality of NVIDIA’s foundry partners, including TSMC, UMC, and IBM. That is, even as a fabless semiconductor company, NVIDIA maintains a full-scale silicon analysis lab to ensure that their chips work properly. After all, when an Intel chip fabbed at an Intel facility breaks, you blame Intel. When an NVIDIA chip fabbed somewhere else fails, you still blame NVIDIA. By having an in-house failure analysis lab, NVIDIA also shortens the turnaround time between the “failed chip” arriving via FedEx and a solution being found. That’s because Howard’s team also has access to the design engineers. If there’s a problem with the memory controller, he can call the engineering team behind that functional block into the lab for added insight. The scanning electron microscopes that NVIDIA houses are second to none and have a resolution of approximately 1 nm. On the used market, they’re worth about half a million dollars – I imagine NVIDIA spent much more. There are several other instruments in the small room, including this focused ion beam (FIB) imaging device. Instead of using electrons to image the object, as an electron microscope does, the FIB uses a focused beam of gallium ions. These ions have considerably more energy than a typical electron beam. When the gallium ions strike the chip, the atoms of the chip are converted from the solid phase to a gaseous phase. I don’t think these phasers can be set to “stun.” [image]
Focused Ion Beams
While the scanning electron microscope lets the failure analysis team see the errors, the gallium ion beam can be used to cut the electrical connections between individual transistors or even deposit material to create new electrical connections. That is, not only can NVIDIA troubleshoot defective chips in-house, they can also patch the chip to test their hypotheses.
The Agilent 93000 allows NVIDIA to test 400 million individual transistors in 5 seconds. The tests work by sending a prescribed signal to the chip and validating that the response is correct. [image]
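To make the "prescribed signal in, expected response out" idea concrete, here is a minimal sketch in Python. It is only an illustration of the general stimulus/response principle, not how the Agilent 93000 is actually programmed. The 4-bit adder, the stuck-at fault, and every function name below are hypothetical stand-ins invented for this example. The last line simply works out the throughput implied by the figures quoted above.

```python
from typing import Callable, List, Tuple

Vector = Tuple[int, int, int]          # (input a, input b, expected output)
Failure = Tuple[int, int, int, int]    # (input a, input b, expected, observed)


def golden_adder(a: int, b: int) -> int:
    """Hypothetical reference model: a 4-bit adder that wraps on overflow."""
    return (a + b) & 0xF


def faulty_adder(a: int, b: int) -> int:
    """Hypothetical defective part: output bit 2 is stuck at 0."""
    return golden_adder(a, b) & 0b1011


def run_vectors(device: Callable[[int, int], int],
                vectors: List[Vector]) -> List[Failure]:
    """Apply each stimulus to the device and record every mismatch."""
    failures = []
    for a, b, expected in vectors:
        observed = device(a, b)
        if observed != expected:
            failures.append((a, b, expected, observed))
    return failures


# Build the test vectors from the reference model (all 256 input pairs).
vectors = [(a, b, golden_adder(a, b)) for a in range(16) for b in range(16)]

good_failures = run_vectors(golden_adder, vectors)
bad_failures = run_vectors(faulty_adder, vectors)

# Throughput implied by the article's figures: 400 million transistors in 5 s.
transistors_per_second = 400_000_000 / 5

print(len(good_failures), len(bad_failures), transistors_per_second)
```

In this toy case, every failing vector is one whose expected output has bit 2 set, which is the kind of pattern that points an engineer toward a specific part of the circuit.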
Despite having access to some state-of-the-art automated testing equipment, the failure analysis lab also houses a full-scale chemistry lab where NVIDIA can do a composition analysis. This part of the lab isn’t a relic; it’s a fully functional lab. They even test their emergency eyewash station on a weekly basis like they are supposed to. They also have some precision sanding wheels that let them take off one layer of silicon at a time. [image]
Although the Agilent test devices do a great job with analysis, Howard’s team also has several other tools. One of those is a QFI InfraScope, which helps NVIDIA isolate even more complex failures (especially as ICs themselves get more complex). Sometimes, NVIDIA has to test their chips in an actual functioning environment. To do that, NVIDIA runs their chips in a production system. Since they need to run the GPUs without a heatsink, NVIDIA relies on four Peltier elements to keep things cool. [image]
At this point, they can use the InfraScope to take a thermal image of the chip to identify problem spots. They can also do a dynamic analysis by using a laser to heat up individual groups of transistors to help isolate the defective part. Recently, NVIDIA added an Advantest T2000 to their arsenal. This was originally purchased to improve validation of the RSX chip inside the PlayStation 3 (Sony fabs use this instead of Agilent’s product). Howard was so impressed by the product that he’s starting to use it for other chips as well. As I’m sure you’ve noticed, we’ve taken pictures of several of NVIDIA’s older chips. That’s because NVIDIA will receive older chips that have failed in the field for analysis. By understanding the mechanism of failure, NVIDIA can improve the design of newer ASICs. Therefore, NVIDIA’s Silicon Failure Analysis lab has a library of every chip that has been manufactured. These boards make it possible to map every pin of the IC to a conductor on the board itself. These boards allow NVIDIA to interface their chips with the testing equipment. [image]
We can go on and on about all the cool toys they have in the lab, such as the tools they use to evaluate ESD durability (through better chip design, NVIDIA’s modern GPUs are more resistant to static electricity than their earlier GPUs), or inverted infrared microscopes that allow them to see through flip chips, but it’s time to move on to the next part of the facility. [image]
NVIDIA also houses a world-class high-performance computing cluster. Besides the usual rack after rack of Intel Clovertowns, NVIDIA also has several Unisys ES7000/one racks. These machines have 768 GB of RAM. Yeah, that’s actual accessible RAM, not something like “96 nodes with 8 GB of RAM each”. [image]
NVIDIA also has several hundred nodes with Intel Clovertown CPUs. Believe it or not, these are connected using 100 Mbps Ethernet and GigE – no fancy Myrinet or InfiniBand needed. [image]
To back up the data, NVIDIA has several large tape libraries. They don’t need to back up all of the data (the intermediate computations, for example), just the critical elements. [image]
NVIDIA tries to keep their cluster at near full capacity. Idle computers mean wasted money, but running at maximum capacity means that there’s no room to handle increased demands. NVIDIA’s limitations are power and cooling. Although today’s Clovertown CPUs offer exceptional performance per watt, a full rack of Clovertown nodes will produce more heat than air cooling can remove. They are seriously looking at water cooling as their computational demands increase. [image]
Believe it or not, NVIDIA tries to keep their compute nodes for 3 or 4 years. They’re still in the process of swapping their Pentium 4 based compute nodes for Clovertowns. So what’s the point of this compute cluster? All of these systems working in synchrony allow NVIDIA to simulate their chips in near real-time. That is, they can validate their chip, run performance metrics, debug, and even start writing drivers before any silicon is actually produced. This ensures that when a chip is taped out, it has already been validated and proven in a simulated environment. The silicon failure analysis lab then crosses the gap between the theory of the simulation and the actual product, ensuring the fastest possible turnaround from paper to product. For you and me, it’s these two labs that let NVIDIA keep pace with our demands for faster hardware and more immersive graphics.
Over 90% of NVIDIA’s 3000+ employees choose to eat lunch on-site. That’s about half the size of the Stanford University undergraduate class. At the nth Street Café, you’ll find everyone from NVIDIA, ranging from the summer intern to NVIDIA’s CEO. [image]
There’s something for everyone. There’s a dedicated noodle bar, a grill for fresh burgers or BBQ, a huge salad bar, and more. Google’s cafeteria gets a lot more press, but NVIDIA’s cafeteria is at least as good as the stuff I used to get in a Stanford Dining Hall.
The high performance computing center and silicon failure analysis lab help make NVIDIA’s GPUs a reality. But what happens once the chip has been validated and the hardware is completed? At this point, NVIDIA engineers work on additional real-world testing. One such lab is NVIDIA’s Display Compatibility Lab and thermal analysis center. They run GPUs at 32 °F and 104 °F (0 °C and 40 °C) to test their products under extreme conditions. This is also part of the validation process to ensure that their chips are capable of running at the specified clock speed across a wide thermal range. [image]
In the display lab, NVIDIA validates display compatibility with what is considered to represent over 80% of monitors on the market. They have LCDs, CRTs, TVs, and even one of those 9 megapixel LCD monitors that IBM/Viewsonic/Iiyama put out a few years ago. These are the guys who ensure that NVIDIA GPUs work with your monitor seamlessly. There have been a few times where a monitor manufacturer’s EDID is out-of-spec, and this lab helps NVIDIA identify those errors and provide a patch in the next release of ForceWare. [image]
On our way to the marketing building, we stopped by NVIDIA’s IT Help Window. They’re around to answer any questions or issues the NVIDIA staff may have. [image]
In order to maintain their performance edge, NVIDIA has a full-scale performance lab which allows them to validate and compare the performance of their chips against the competition. They have several automated testing tools in their arsenal to help things move quickly. [image]
We were only able to take a limited number of shots in this area. Apparently, they were testing something new… After having seen the place where NVIDIA GPUs are designed, validated, and tested for compatibility and performance, the natural place to end our trip to NVIDIA was the marketing and creative department. [image]
It’s in this part of the company that NVIDIA’s sales artwork, PowerPoint presentations, and marketing materials get finalized. Everyone seemed to be busy working on something big…
Leaving NVIDIA [image]
[image]
© Copyright 2003 FS Media, Inc.