Please welcome the beautiful newborn in the NVIDIA GPGPU family: Ampere
BLOG UPDATE FROM MAY 14, 2020
In the previous episode…
In our previous blog post about Deep Learning, we explained that this technology is all about massively parallel matrix computations, and that these computations boil down to two simple operations: addition (+) and multiplication (×).
Fact 1: GPUs are good for (drum roll)…
Once you get that Deep Learning is just massively parallel matrix multiplications and additions, the magic happens. General Purpose Graphics Processing Units (GPGPUs) (i.e. GPUs, or variants of GPUs, designed for something other than graphics processing) are perfect for…
matrix multiplications and additions!
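To make this concrete, here is a minimal CUDA sketch (an illustrative naive kernel, not a tuned library implementation like cuBLAS): each thread computes one element of C = A × B using nothing but multiplications and additions.

```cuda
// Naive matrix multiplication: one thread per output element of C = A * B.
// A is M x K, B is K x N, C is M x N, all stored row-major.
__global__ void matmul(const float* A, const float* B, float* C,
                       int M, int N, int K) {
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < M && col < N) {
        float acc = 0.0f;
        for (int k = 0; k < K; ++k) {
            acc += A[row * K + k] * B[k * N + col];  // just x and +
        }
        C[row * N + col] = acc;
    }
}
```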
Perfect, isn't it? But why? Let me tell you a little story.
Fact 2: There was a time when GPUs were just GPUs
Yes, you read that correctly…
The first GPUs in the '90s were designed in a very linear way: engineers took the process used for graphics rendering and implemented it directly in hardware.
To keep it simple, the graphics rendering process combined transformation, lighting effects, triangle setup and clipping, and rendering engines, at a scale that was not achievable at the time (tens of millions of polygons per second).
The first GPUs integrated the various steps of image processing and rendering in a linear way: each part of the process had its own predefined hardware component, such as vertex shaders, tessellation modules and geometry shaders.
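As a rough illustration (a hypothetical sketch, not actual driver or hardware code), you can think of that fixed pipeline as a hard-wired sequence of stages, each backed by its own dedicated silicon:

```cuda
// Illustrative only: each stage below existed as dedicated, non-reusable
// hardware, and every frame flowed through them in this fixed order.
enum class FixedPipelineStage {
    VertexTransform,  // project 3D vertices into screen space
    Lighting,         // compute per-vertex lighting
    TriangleSetup,    // assemble vertices into triangles
    Clipping,         // discard geometry outside the view
    Rasterization,    // turn triangles into pixel fragments
    Texturing,        // sample textures onto fragments
    Output            // write final pixels to the framebuffer
};
```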
In short, graphics cards were initially designed to perform graphical processing. What a surprise!
Fact 3: CPUs are sports cars, GPUs are massive trucks
As explained earlier, for image processing and rendering, you don't want your image to be generated pixel by pixel; you want it in a single shot. That means that every pixel of the image, representing every object in the camera's view at a given time and position, needs to be calculated at once.
This is in complete contrast to CPU logic, where operations are meant to be performed sequentially. As a result, GPUs needed a massively parallel architecture to be able to process all the points (vertices), build all the meshes (tessellation), build the lighting, transform objects from the world reference frame, apply textures, and perform shading (and I'm probably still missing some parts!). However, the purpose of this blog post is not to look in depth at image processing and rendering; we will do that in a future blog post.
As explained in our previous post, CPUs are like sports cars, able to compute a chunk of data really fast with minimal latency, while GPUs are like massive trucks, moving lots of data at once but with higher latency as a result.
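To illustrate the contrast with a small sketch (a hypothetical shading step, simplified for the example): the CPU version visits pixels one after another, while the GPU version launches one thread per pixel.

```cuda
// CPU style: a single worker visits every pixel in sequence.
void shade_cpu(float* image, int width, int height) {
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x)
            image[y * width + x] = 0.5f;  // placeholder shading computation
}

// GPU style: one thread per pixel, all computed in parallel.
__global__ void shade_gpu(float* image, int width, int height) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < width && y < height)
        image[y * width + x] = 0.5f;  // same computation, massively parallel
}
```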
Here is a nice video from Mythbusters, where the two concepts of CPU and GPU are explained:
Fact 4: 2006 – NVIDIA killed image-processing Taylorism
Previously, image processing relied on specialised manpower (dedicated hardware) at every stage of the production line in the image factory.
This all changed in 2006, when NVIDIA introduced General Purpose Graphics Processing Units built around Arithmetic Logic Units (ALUs), aka CUDA cores, which were able to run multi-purpose computations (a bit like a Jean-Claude Van Damme of GPU computation units!).
Even today, modern GPU architectures (such as Fermi, Kepler or Volta) still include specialised cores, named Special Function Units (SFUs), to run high-performance mathematical operations such as sine, cosine, reciprocal, and square root, as well as Texture Mapping Units (TMUs) for the high-dimensional matrix operations involved in image texture mapping.
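Those SFUs are what CUDA's fast-math intrinsics are typically served by (the exact hardware mapping varies by generation, so take this as a sketch):

```cuda
// The fast-math intrinsics below are typically executed by the SFUs,
// trading a little precision for much higher throughput than software
// routines running on the regular CUDA cores.
__global__ void sfu_demo(const float* in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float x = in[i];
        out[i] = __sinf(x) + __cosf(x) + rsqrtf(x);  // sine, cosine, reciprocal sqrt
    }
}
```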
Fact 5: GPGPUs can be explained simply with Pokémon!
GPU architectures can seem difficult to understand at first, but trust me… they are not!
Here is my gift to you: a Pokédex to help you understand GPUs in simple terms.
Here’s how you use it…
You basically have four families of cards:
The Micro-Architecture Family
This family will already be known to many of you. We are, of course, talking about Fermi, Maxwell, Kepler, Volta, Ampere, etc.
The Architecture Family
This is the center, where the magic happens: orchestration, cache, workload scheduling… It’s the brain of the GPU.
The Multi-Core Units (aka CUDA Cores) Family
This represents the physical core, where the maths computations actually happen.
The Programming Model Family
The different layers of the programming model abstract the GPU’s parallel computation for the programmer. They also make the code portable across GPU architectures.
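In CUDA terms, those layers are the grid, the blocks and the threads. Here is a minimal sketch of how a kernel maps onto that hierarchy (a toy vector addition, just for illustration):

```cuda
// The CUDA programming model: a grid of blocks, each block a group of
// threads. The same code runs on any NVIDIA micro-architecture; the
// hardware scheduler decides how blocks are distributed across SMs.
__global__ void vector_add(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n) c[i] = a[i] + b[i];
}

// Launch with 256 threads per block and enough blocks to cover n elements:
// vector_add<<<(n + 255) / 256, 256>>>(a, b, c, n);
```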
How to play
- Start by choosing a card from the Micro-Architecture family
- Look at the components, and choose the appropriate card from the Architecture family
- Look at the components listed within the Micro-Architecture card, pick them from the Multi-Core Units family, then place them under the Architecture card
- Now, if you want to know how to program a GPU, place the Programming Model – Multi-Core Units special card on top of the Multi-Core Units cards
- Finally, on top of the Programming Model – Multi-Core Units special card, place all the Programming Model cards near the SM
- You should then have something that looks like this:
Examples of card configurations:
Fermi
Kepler
Maxwell
Pascal
Volta
Turing
After playing around with different Micro-Architectures, Architectures and Multi-Core Units for a bit, you should see that GPUs are just as simple as Pokémon!
Enjoy the attached PDF, which will allow you to print your own GPU Pokédex. You can download it here: GPU Cards Game