Parallelism for Everyone
Imagine that you're the manager of a shipping company and you're given a choice between two vehicles: one that can deliver 1 pallet from the warehouse to the shipping yard and return in 1 hour, or one that can deliver 10 pallets in one trip but moves more slowly and needs 8 hours for the round trip. Which do you choose?
At first glance, the second option looks better. It can transport more goods in the same amount of time: in an 8-hour work day, the first vehicle delivers 8 pallets and the second delivers 10, an additional 2 pallets. However, it's not so simple. What if your company only shipped 3 pallets a day? The first vehicle would need only 3 hours, while the second would still need its full 8-hour round trip, almost three times as long, with most of its capacity unused. What if your company needs to ship 11 pallets a day? The first vehicle needs 11 one-hour trips and the second needs two 8-hour trips. It's the difference between an 11-hour task and a 16-hour one…
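The arithmetic behind these scenarios can be sketched in a few lines of Python. This is only an illustration of the analogy: the function name `delivery_hours` and its parameters are made up for this example, and it assumes every trip is a full round trip, including the last one.

```python
import math

def delivery_hours(pallets, capacity, round_trip_hours):
    """Hours needed to move `pallets` when each round trip carries
    at most `capacity` pallets (hypothetical helper for the analogy)."""
    trips = math.ceil(pallets / capacity)  # a partly empty trip still costs a full trip
    return trips * round_trip_hours

for pallets in (3, 8, 10, 11):
    small = delivery_hours(pallets, capacity=1, round_trip_hours=1)
    large = delivery_hours(pallets, capacity=10, round_trip_hours=8)
    print(f"{pallets:>2} pallets: small vehicle {small} h, large vehicle {large} h")
```

The large vehicle only wins when the daily load sits between 9 and 10 pallets. At 3 pallets it takes 8 hours against 3, and at 11 pallets it takes 16 hours against 11.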
This captures the fundamental challenge of designing software for multi-core CPU architectures. From a hardware perspective, it's much easier to produce a CPU or GPU with more parallelism than it is to increase clock speed. It's easier and cheaper to manufacture a tri-core 3.2 GHz CPU or a 48-shader GPU at 500 MHz than it would be to manufacture a single-core CPU running at 9.6 GHz or a 12-shader GPU at 2 GHz with the same peak throughput. Unfortunately for software developers, multi-threaded designs are substantially tougher to work with. There’s a lot of potential in the hardware, but the trick is keeping the workload on each unit balanced and managing the overhead of distributing the work and its data.
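A toy model makes the balancing problem concrete. The sketch below is an assumption-laden simplification, not a description of any real scheduler: `speedup` and `overhead_per_core` are hypothetical names, each core's work is treated as a single number, and the distribution overhead is assumed to grow linearly with the number of cores.

```python
def speedup(loads, overhead_per_core=0.0):
    """Speedup over a single core doing all the work.

    `loads` holds the units of work given to each core. The job ends
    when the busiest core finishes, plus an assumed fixed cost per
    core for distributing the work (hypothetical model).
    """
    total_work = sum(loads)  # time on one core, with no distribution cost
    wall_time = max(loads) + overhead_per_core * len(loads)
    return total_work / wall_time

print(speedup([4, 4, 4]))                                   # 3.0: perfectly balanced
print(speedup([10, 1, 1]))                                  # 1.2: one core does nearly everything
print(round(speedup([4, 4, 4], overhead_per_core=0.5), 2))  # 2.18: balanced, but overhead eats the gain
```

Three cores only deliver three times the throughput in the first case. An uneven split or a modest distribution cost quickly pulls the result back toward single-core performance.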
There’s no recipe for producing high-quality multi-threaded code other than experience. Fortunately, Microsoft benefits from the work that Sony has done.