Failure Analysis Lab
NVIDIA’s Santa Clara HQ isn’t just a building full of executives and cubicles. It’s also home to some world-class laboratories. One of these laboratories is NVIDIA’s Failure Analysis Lab. This isn’t a software lab… it’s actually NVIDIA’s Silicon Failure Analysis Lab.
Here, Howard Marks and his elite team validate the quality of the chips produced by NVIDIA’s manufacturing partners, including TSMC, UMC, and IBM. That is, even as a fabless semiconductor company, NVIDIA maintains a full-scale silicon analysis lab to ensure that its chips work properly. After all, when an Intel chip fabbed at an Intel facility breaks, you blame Intel. When an NVIDIA chip fabbed somewhere else fails, you still blame NVIDIA. Having an in-house failure analysis lab also shortens the turnaround time between the “failed chip” arriving via FedEx and the team coming up with a solution. That’s because Howard’s team also has access to the design engineers. If there’s a problem with the memory controller, he can call the engineering team behind that functional block into the lab for added insight.
The scanning electron microscopes that NVIDIA houses are second to none and have a resolution of approximately 1 nm. On the used market, they’re worth about half a million dollars – I imagine NVIDIA spent much more. There are several other instruments in the small room, including a focused ion beam (FIB) imaging device. Instead of using electrons to image the object, as an electron microscope does, the FIB uses a focused beam of gallium ions. These ions have considerably more energy than a typical electron beam. When the gallium ions strike the chip, the atoms of the chip are converted from the solid phase to a gaseous phase. I don’t think these phasers can be set to “stun.”
Focused Ion Beams
While the scanning electron microscope lets the failure analysis team see the errors, the gallium ion beam can be used to cut the electrical connections between individual transistors or even deposit material to create new electrical connections. That is, not only can NVIDIA troubleshoot defective chips in-house, it can also patch a chip to test a hypothesis about what went wrong.
The failure analysis lab often has to find a single errant transistor out of 681 million (that’s the number of transistors inside the GeForce 8800). We had a chance to see the actual core, and we can reassure you that most of those transistors represent core logic – not cache.
To figure out where the failure lies, the team uses several other tools including an Agilent 93000.
The Agilent 93000 allows NVIDIA to test 400 million individual transistors in 5 seconds. The tests work by sending a prescribed signal to the chip and checking that the response is correct.
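To make the idea of sending a prescribed signal and checking the response concrete, here is a minimal sketch in Python. It is a toy model only and not the Agilent 93000's actual interface: the names `run_vectors`, `good_adder`, and `faulty_adder` are hypothetical, and the "chip" is a 4-bit adder standing in for real silicon. The last line simply works out the throughput implied by the figures above.

```python
from typing import Callable, List, Tuple

Stimulus = Tuple[int, int]
Vector = Tuple[Stimulus, int]  # (prescribed input, expected response)


def run_vectors(device: Callable[[Stimulus], int], vectors: List[Vector]) -> List[int]:
    """Apply each stimulus to the device and return the indices of failing vectors."""
    failures = []
    for index, (stimulus, expected) in enumerate(vectors):
        if device(stimulus) != expected:
            failures.append(index)
    return failures


def good_adder(stimulus: Stimulus) -> int:
    """Hypothetical healthy block: a 4-bit adder."""
    a, b = stimulus
    return (a + b) & 0xF


def faulty_adder(stimulus: Stimulus) -> int:
    """Same adder with output bit 2 stuck at 0 (an assumed defect for illustration)."""
    a, b = stimulus
    return (a + b) & 0xF & ~0b0100


# Prescribed signals paired with the responses a correct chip should give.
vectors: List[Vector] = [((a, b), (a + b) & 0xF) for a in range(4) for b in range(4)]

good_failures = run_vectors(good_adder, vectors)
bad_failures = run_vectors(faulty_adder, vectors)

# Throughput implied by the article's numbers: 400 million transistors in 5 seconds.
transistors_per_second = 400_000_000 / 5

print(good_failures)           # []
print(bad_failures)            # [7, 10, 11, 13, 14, 15]
print(transistors_per_second)  # 80000000.0
```

The failing vectors are exactly those whose correct answer has bit 2 set, which is how a pattern of failures can point back at a particular part of the circuit.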