How do CPUs execute instructions in parallel?

Greg
4 min read · Feb 8, 2021


https://images.bit-tech.net/content_images/2009/11/memory-and-multi-core-programming/4.jpg

This blog post follows the previous one on the Central Processing Unit (CPU).

“CPU, cores, tasks, hyperthreading, multithreading, user threads, multi-cores”
There is a lot of vocabulary around parallelism in CPUs and operating systems.
Moreover, it is not obvious which layer is responsible for what:
Hardware? CPU? Assembly? OS kernel code? User code?
I tried to get a better understanding and made some illustrations.

Plan

  • Single-core CPU
  • How are user processes and threads scheduled?
  • Hyper-threading
  • Multi-core CPUs

Single-core CPU

A first and naive representation of a CPU could be the following:

Figure 1 — single core CPU
  • The CPU executes machine instructions (the binary form of assembly code) located in main memory.
  • These instructions belong either to user code or to kernel code.
  • The kernel regularly regains control through hardware interrupts (typically a timer interrupt), which pause the current program.
    This lets the kernel decide which program runs next, based on various priority parameters.
    Scheduling is a broad topic; feel free to read this for more details.

Old CPUs could only execute instructions one at a time.
Thanks to the speed of the clock, the CPU could switch between programs so often that the user got the illusion of parallelism.
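To make the time-slicing idea concrete, here is a toy round-robin scheduler in Python. It is a sketch under simplifying assumptions, not how a real kernel works: each task is just a hypothetical count of instructions, the "timer interrupt" fires after a fixed, made-up `QUANTUM` of instructions, and the "kernel" simply moves the interrupted task to the back of the run queue. Real schedulers, such as Linux's, also take priorities into account.

```python
from collections import deque

# Made-up value: how many instructions a task may run before the
# (simulated) timer interrupt hands control back to the kernel.
QUANTUM = 2


def round_robin(tasks: dict[str, int], quantum: int = QUANTUM) -> list[str]:
    """Toy single-core scheduler.

    `tasks` maps a hypothetical task name to the number of instructions it
    still has to execute. Returns the name of the task that owned the CPU
    for each executed instruction, in order.
    """
    run_queue = deque(tasks.items())
    trace: list[str] = []
    while run_queue:
        name, remaining = run_queue.popleft()   # "kernel" picks the next task
        ran = min(quantum, remaining)
        trace.extend([name] * ran)              # the CPU runs it until the interrupt
        if remaining > ran:                     # unfinished: back of the queue
            run_queue.append((name, remaining - ran))
    return trace


print("".join(round_robin({"A": 3, "B": 4, "C": 1})))  # prints AABBCABB
```

With three tasks needing 3, 4 and 1 instructions, the single core interleaves them as `AABBCABB`: no two instructions ever run at once, yet every task makes progress.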

How are user processes and threads scheduled?

From a user’s perspective, processes and threads are different things.
Inside the Linux kernel, however, they are not that different: both are represented by the same C structure, ‘task_struct’.

Threads in Linux are treated as processes that just happen to share some resources. https://stackoverflow.com/questions/21360524/linux-kernel-threading-vs-process-task-struct-vs-thread-info
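A quick way to observe this from user space is the small Python sketch below: several threads of one process report the same process ID, but each has its own native thread ID. On Linux that native ID is the ID of the thread's own kernel task, and the main thread's ID equals the PID. The thread count of 3 is arbitrary; the barrier only keeps all threads alive at the same time so that no thread ID can be reused during the experiment.

```python
import os
import threading

N_WORKERS = 3  # arbitrary number of extra threads
results: list[tuple[int, int]] = []
# All workers plus the main thread wait here, so every thread is alive at the
# same time and no thread ID can be reused by the OS during the experiment.
barrier = threading.Barrier(N_WORKERS + 1)


def record_ids() -> None:
    # Process ID: shared by every thread of the process.
    # Native thread ID: on Linux, the ID of this thread's own kernel task.
    results.append((os.getpid(), threading.get_native_id()))
    barrier.wait()


workers = [threading.Thread(target=record_ids) for _ in range(N_WORKERS)]
for w in workers:
    w.start()
record_ids()  # the main thread records its IDs too
for w in workers:
    w.join()

for pid, tid in results:
    print(f"PID={pid} TID={tid}")
```

On Linux, the same thread IDs show up as directories under `/proc/<PID>/task/`, one per task of the process.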

Figure 2 — Single core CPU handling multiple processes and threads
  • The entity that the scheduler dispatches to the CPU is a task_struct (the word ‘thread’ is often used interchangeably).

One may think that using several threads in a process on a single-core CPU doesn’t add much, because there is no real parallelism.
However:

There are still advantages to be gained, but they’re a bit situational:
If you are dealing with multiple potentially blocking resources — like file IO or GUI interaction or whatnot, then multithreading can be vital. https://stackoverflow.com/questions/20476638/single-vs-multi-threaded-programming-on-a-single-core-processor
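The situation described in this quote can be sketched in Python (an illustration under assumptions, not a benchmark): `time.sleep` stands in for a blocking call such as file or network I/O, and on Linux the process is first restricted to a single logical CPU with `os.sched_setaffinity` to mimic a single-core machine. While one thread is blocked, the core can run another one, so four threads that each block for 0.2 s finish in roughly 0.2 s instead of 0.8 s.

```python
import os
import threading
import time

DELAY = 0.2  # seconds each simulated blocking call takes (arbitrary)


def pin_to_one_cpu() -> None:
    # Linux only: restrict the calling thread (and threads it creates later)
    # to a single logical CPU, to mimic a single-core machine.
    if hasattr(os, "sched_setaffinity"):
        os.sched_setaffinity(0, {min(os.sched_getaffinity(0))})


def blocking_io() -> None:
    # Stand-in for a blocking call (file, network, GUI event...): while this
    # thread sleeps, the core is free to run another thread.
    time.sleep(DELAY)


def run_sequential(n: int) -> float:
    start = time.perf_counter()
    for _ in range(n):
        blocking_io()
    return time.perf_counter() - start


def run_threaded(n: int) -> float:
    start = time.perf_counter()
    threads = [threading.Thread(target=blocking_io) for _ in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


pin_to_one_cpu()
print(f"sequential: {run_sequential(4):.2f} s")  # about 4 x DELAY
print(f"threaded:   {run_threaded(4):.2f} s")    # about 1 x DELAY, on one core
```

For CPU-bound work the picture is different: on one core, threads only take turns, and in CPython the global interpreter lock additionally prevents threads from running Python bytecode in parallel.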

Hyper-threading

Hyper-threading begins with an observation:

  • CPUs have different sub-components (ALU, MMU, registers, …).
  • An instruction typically uses only a small subset of these components at a time. Example: reading from or writing to RAM takes a long time (many CPU cycles), during which other units of the core sit idle.

Hyper-threading, Intel’s implementation of simultaneous multithreading (SMT), is a CPU hardware technology that lets one physical core execute instructions from 2 threads at the same time, when their needs do not conflict.
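The intuition can be sketched with a deliberately simplified cycle-counting model in Python. Everything here is an assumption for illustration: the core issues at most one instruction per cycle, an `"alu"` instruction takes 1 cycle, a `"mem"` instruction stalls its thread for a made-up `MEM_LATENCY` of 4 cycles, and real cores (superscalar, out-of-order) behave very differently in detail. With a single hardware thread the stall cycles are wasted; with two, the core fills them with the other thread's instructions.

```python
from collections import deque

MEM_LATENCY = 4  # made-up number of cycles a "mem" instruction stalls its thread


def cycles(programs: list[list[str]], hw_threads: int) -> int:
    """Toy model of one physical core: cycles needed to finish all programs.

    Each cycle the core issues at most one instruction, taken from the first
    hardware thread that is not stalled. "alu" completes in 1 cycle, "mem"
    stalls its thread for MEM_LATENCY cycles. hw_threads=1 models a plain
    core, hw_threads=2 models Hyper-threading.
    """
    pending = deque(deque(p) for p in programs)  # programs not started yet
    slots: list[list] = []  # per hardware thread: [instructions left, ready_at]
    cycle = finish = 0
    while pending or slots:
        # Give every idle hardware thread a program to run.
        while pending and len(slots) < hw_threads:
            slots.append([pending.popleft(), cycle])
        # Issue one instruction from the first ready hardware thread.
        for slot in slots:
            if slot[0] and slot[1] <= cycle:
                op = slot[0].popleft()
                slot[1] = cycle + (MEM_LATENCY if op == "mem" else 1)
                finish = max(finish, slot[1])
                break
        # Free hardware threads whose program has fully completed.
        slots = [s for s in slots if s[0] or s[1] > cycle + 1]
        cycle += 1
    return finish


prog = ["alu", "mem", "alu", "mem"]
print("1 hardware thread: ", cycles([prog, prog], hw_threads=1))  # 20
print("2 hardware threads:", cycles([prog, prog], hw_threads=2))  # 12
```

With two programs that alternate ALU and memory instructions, the model needs 20 cycles with one hardware thread and 12 with two. With programs that never stall (only `"alu"`), both settings take the same time: in this model, a second hardware thread only helps when the first one leaves the core idle.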

Figure 3 — Hyper-threading

Hyper-threading is implemented by duplicating the parts of the core that hold a thread’s state (mainly the registers), while the execution units are shared.
The OS kernel therefore sees 2 logical cores, although there is only one physical core in reality.
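On Linux you can see which logical cores share a physical core by reading sysfs. The Python sketch below assumes the `/sys/devices/system/cpu/cpuN/topology/thread_siblings_list` files are present (they usually are on bare-metal Linux, but containers or VMs may hide or flatten the topology). Each file lists the logical CPUs that sit on the same physical core as cpuN, in the kernel's range format such as `0-1,4`.

```python
import glob


def parse_cpu_list(text: str) -> list[int]:
    """Parse the kernel's CPU list format, e.g. "0-1,4" -> [0, 1, 4]."""
    cpus: list[int] = []
    for part in text.strip().split(","):
        if "-" in part:
            first, last = part.split("-")
            cpus.extend(range(int(first), int(last) + 1))
        elif part:
            cpus.append(int(part))
    return cpus


def sibling_groups() -> set[tuple[int, ...]]:
    """Group logical CPUs that share one physical core (Linux sysfs).

    Returns an empty set where the sysfs files do not exist.
    """
    groups: set[tuple[int, ...]] = set()
    pattern = "/sys/devices/system/cpu/cpu[0-9]*/topology/thread_siblings_list"
    for path in glob.glob(pattern):
        with open(path) as f:
            groups.add(tuple(parse_cpu_list(f.read())))
    return groups


for group in sorted(sibling_groups()):
    print("physical core -> logical CPUs", group)
```

With Hyper-threading enabled, each printed group typically contains two logical CPUs; with it disabled, each group contains a single one.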

Figure 4 — Hyper-threading big picture

As you can see in Figure 4, only process P3 can use the parallelism offered by Hyper-threading on its own, because it has 2 threads that can be dispatched to the 2 logical cores at the same time.

Programs that do not use threads (such as many legacy programs) cannot be parallelised this way: each of them runs on only one logical core at a time.

Note, however, that even if every running process is single-threaded, Hyper-threading can still improve performance: 2 different processes can run on the 2 logical cores, so some of their instructions are executed in parallel.

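Intel has advertised performance gains of up to about 30% from Hyper-threading; the actual gain depends heavily on the workload, and some workloads see little or no benefit.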

Multi-core CPUs

Figure 5 — Multi-core CPU with Hyper-threading enabled

In Figure 5:

  • The CPU has 2 physical cores.
  • Each physical core is divided into 2 logical cores for the OS thanks to Hyper-threading.

Therefore, the OS sees 4 logical cores (sometimes called “hardware threads”) on which to schedule its tasks (also called “threads”).
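The arithmetic behind Figure 5 is simply physical cores × hardware threads per core. The Python sketch below spells it out with the figure's values (hard-coded, not detected), then asks the running system: `os.cpu_count()` returns the number of logical CPUs, i.e. already counting Hyper-threading, and on Linux `os.sched_getaffinity(0)` gives the logical CPUs this process is allowed to use.

```python
import os

# Figure 5's machine; these numbers are copied from the figure, not detected.
physical_cores = 2
threads_per_core = 2  # Hyper-threading: 2 logical cores per physical core
logical_cores = physical_cores * threads_per_core
print("Figure 5 machine:", logical_cores, "logical cores")  # 4

# The running machine: os.cpu_count() counts logical CPUs (may return None).
print("logical CPUs in this system:", os.cpu_count())
if hasattr(os, "sched_getaffinity"):  # Linux: CPUs this process may run on
    print("logical CPUs usable by this process:", len(os.sched_getaffinity(0)))
```

On Linux, `lscpu` shows the same breakdown in its “Thread(s) per core” and “Core(s) per socket” lines.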

A process can of course be divided into more threads than there are logical cores.
However, those threads cannot all execute at the same time: some have to wait for a free core.
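A toy Python model shows what waiting means here. All of it is a hypothetical simplification: every thread needs a fixed number of equal time slices, at each tick the cores simply take the first ready threads from a FIFO queue, and unfinished threads go back to the end of the queue. Real schedulers are far more elaborate, but the constraint is the same: at most one thread per logical core at any instant.

```python
from collections import deque


def schedule(threads: dict[str, int], n_cores: int) -> list[list[str]]:
    """Toy model: at each tick, up to n_cores ready threads run one time slice.

    `threads` maps a hypothetical thread name to the number of time slices it
    needs. Returns, for each tick, the names of the threads that were running.
    """
    ready = deque(threads.items())
    ticks: list[list[str]] = []
    while ready:
        running = [ready.popleft() for _ in range(min(n_cores, len(ready)))]
        ticks.append([name for name, _ in running])
        for name, left in running:
            if left > 1:  # not finished: wait in the queue for a core again
                ready.append((name, left - 1))
    return ticks


for tick, running in enumerate(schedule({f"T{i}": 2 for i in range(6)}, n_cores=4)):
    print(f"tick {tick}: {running}")
```

With 6 threads that each need 2 time slices and 4 logical cores, the model needs 3 ticks: during each of the first two, 4 threads run while the other 2 wait.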

On Linux, you can check the maximum number of threads the system can handle with the following command:

cat /proc/sys/kernel/threads-max

In my case the result is 62099.
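The same value can be read programmatically. This Python sketch (Linux only, since it relies on procfs) reads `/proc/sys/kernel/threads-max` and, for comparison, the `Threads:` line of `/proc/self/status`, which counts the threads of the current process. Note that threads-max is a system-wide upper bound; other limits, for instance `ulimit -u`, which on Linux counts threads too, can stop you earlier.

```python
from pathlib import Path


def read_threads_max() -> int:
    # Same value as `cat /proc/sys/kernel/threads-max` (Linux procfs).
    return int(Path("/proc/sys/kernel/threads-max").read_text())


def threads_in_this_process() -> int:
    # /proc/self/status has a "Threads:" line with this process's thread count.
    for line in Path("/proc/self/status").read_text().splitlines():
        if line.startswith("Threads:"):
            return int(line.split()[1])
    raise RuntimeError("no 'Threads:' line in /proc/self/status")


print("system-wide thread limit:", read_threads_max())
print("threads in this process:", threads_in_this_process())
```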

Synthesis

Figure 6 — Vocabulary memo
