Parallelism and Concurrency: The basics

This post documents my understanding of the basics of concurrency and parallelism and how they manifest in operating systems.

Definitions

Before I begin I want to define what exactly parallelism and concurrency are. However, as it turns out, this isn’t exactly set in stone, and the definitions are somewhat convoluted.

The Stack Overflow question (What is the difference between concurrency and parallelism?) in particular has quite a number of somewhat conflicting answers, which is understandable: if someone asked me what the difference between the two terms is, I wouldn’t be able to answer either. To me, both of them share the meaning of “doing multiple things at the same time”. In fact, the dictionary definitions are very similar as well.

To cut the whole story short, I will choose two widely accepted definitions and abide by them from now on:

  • Concurrency is the ability of a system to run multiple tasks in overlapping time periods. No more than one task is executing at any single point in time; instead, the system “juggles” between tasks as needed. This gives us the illusion of parallelism - an example of which is multitasking on a single-core CPU.
    • Also: Concurrency: A condition that exists when, during a given period of time, multiple threads (“multiple tasks” in the general sense) are making progress. This doesn’t mean that they were ever executing at the same single point in time.
  • Parallelism is the ability of a system to run multiple tasks simultaneously. They are truly running at the same time.
    • Also: Parallelism: A condition that arises when, at a particular point in time, multiple threads (“multiple tasks”) are executing simultaneously.

There are many metaphors that can be applied to more easily explain the aforementioned definitions, for example:

  • Concurrency is a single ATM serving a line of people over the course of an hour.
  • Parallelism is two ATMs serving two lines of people at the same time.

Also see this excellent answer with more real-world examples.
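To make the distinction tangible, here’s a minimal sketch using POSIX threads (the busy-wait loop is just a stand-in for real CPU-bound work, and the file name “demo” is mine). On a multi-core machine the two tasks run in parallel; pinned to a single core with taskset(1), they merely run concurrently and take roughly twice as long:

#include <pthread.h>
#include <stdio.h>

// Stand-in for a CPU-bound task.
static void *busy_work(void *arg) {
    for (volatile unsigned long i = 0; i < 2000000000UL; i++)
        ;
    printf("task %ld done\n", (long)arg);
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, busy_work, (void *)1L);
    pthread_create(&t2, NULL, busy_work, (void *)2L);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}

Compile with gcc demo.c -o demo -pthread, then compare time ./demo against time taskset -c 0 ./demo: the pinned run shows concurrency (one core juggling two tasks), the unpinned run shows parallelism.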

Linux

Now that I’ve established what parallelism and concurrency are - how do they manifest themselves in real-world environments, or rather, operating systems?

First it’s important to realize that modern CPUs achieve parallelism via the use of multiple physical cores.

To avoid confusion later in this document, I’ll get this out of the way right now: the “physical threads” that you hear about in the context of “the new Intel CPU has 8 cores and 16 threads” are different from the “logical threads” (or software threads) that I’ll talk about shortly. This naming collision was a slight obstacle in my early research of the topic, and I really wish the two didn’t share a name, despite both being related to concurrency/parallelism. Also, I’ll talk about most of the software concepts as they relate to Linux; however, a large portion of the semantics is shared between modern OSs, especially Unix derivatives.

Each core executes instructions independently of the others, and each has its own registers and caches. That’s why some resources even refer to physical cores simply as CPUs.
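As a quick sanity check, you can ask Linux how many of these cores are online, e.g. via sysconf(3) (nproc(1) reports the same number):

#include <stdio.h>
#include <unistd.h>

int main(void) {
    // Number of processors currently online, i.e. the upper bound
    // on how many tasks can truly run at the same time.
    long cores = sysconf(_SC_NPROCESSORS_ONLN);
    printf("online CPUs: %ld\n", cores);
    return 0;
}

Note that on CPUs with SMT/hyper-threading this counts hardware threads, not physical cores - the very naming collision mentioned above.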

Under Linux (and other modern OSs) concurrency and parallelism manifest themselves in a few different ways that I’ll briefly explain now:


Processes:

Processes represent “running programs” (flows of execution, tasks) on the system. Each process has an ID (PID) and an associated PPID (parent process ID).

Processes do not share memory with each other by default (be that the stack or the heap); they can only do so through explicit mechanisms such as shared memory mappings.
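A minimal sketch of that isolation: after fork(2) (covered in more detail below), the child writes to its own copy-on-write copy of a variable, and the parent never sees the change.

#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    int x = 0;
    pid_t pid = fork();
    if (pid == -1) {
        perror("fork");
        return EXIT_FAILURE;
    }
    if (pid == 0) {
        x = 42;  // writes to the child's private copy of the page
        printf("child:  x = %d\n", x);
        return EXIT_SUCCESS;
    }
    waitpid(pid, NULL, 0);          // let the child finish first
    printf("parent: x = %d\n", x);  // still 0: the write never reached us
    return EXIT_SUCCESS;
}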

Processes are composed of one or more (one by default) “smaller” flows of execution called threads.

Threads:

Every spawned process contains one thread by default. All threads belonging to a process share the same heap memory, but each one has its own stack. Threads are cheaper to start than processes since there is less initialization involved (namely, they share the existing virtual address space, so a new one doesn’t have to be created each time).

The most prevalent standard for threading is the pthreads (POSIX Threads) specification. On Linux it is implemented by NPTL (Native POSIX Thread Library), which was a standalone project for some time after its creation but is part of glibc now. It’s described in detail by pthreads(7).
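Here’s a minimal sketch of that sharing (compile with -pthread): an allocation made on the heap by the main thread is directly visible, and writable, from a second thread.

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

static void *worker(void *arg) {
    int *shared = arg;  // points into the heap of the whole process
    *shared += 1;       // visible to main() afterwards
    return NULL;
}

int main(void) {
    int *shared = malloc(sizeof *shared);
    if (shared == NULL)
        return EXIT_FAILURE;
    *shared = 41;

    pthread_t tid;
    if (pthread_create(&tid, NULL, worker, shared) != 0)
        return EXIT_FAILURE;
    pthread_join(tid, NULL);

    printf("%d\n", *shared);  // prints 42: the thread wrote to our heap
    free(shared);
    return EXIT_SUCCESS;
}

(The access here is safe without a lock only because pthread_join synchronizes the two threads; concurrent writers would need a mutex.)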


It is immediately apparent that processes and threads differ only in what “data” they share with each other. That is because the Linux kernel doesn’t differentiate between threads and processes at all (which makes sense considering that pthreads is implemented by a userspace library)! Under the hood, the kernel knows only of “tasks” and exposes their interface via the clone(2) system call. The arguments that clone(2) takes determine (among other things) what will be shared between the caller and the new task (virtual address space, file descriptor table, etc.). So, both threads and processes are created using clone(2), just with different parameters.

The runnable tasks that clone(2) spawns are referred to simply as “processes” in some places by the Linux kernel; however, I call them “tasks” for the sake of understanding and clarity. The gist of it, and the most important thing to understand, is that there is no difference between threads and processes to the kernel, and that both are created by clone(2) using different arguments. See the “Are Linux kernel threads really kernel processes?” and “Are threads implemented as processes on Linux?” questions on the Unix StackExchange.
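To see the flags in action, here’s a rough sketch that calls the glibc clone() wrapper directly (error handling trimmed): with CLONE_VM the new task shares our address space, thread-style; drop that flag and the write below would land in the child’s own copy, process-style.

#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>

static int child_fn(void *arg) {
    int *counter = arg;
    *counter += 1;  // with CLONE_VM this writes to the parent's memory
    return 0;
}

int main(void) {
    int counter = 0;

    // The new task needs its own stack; stacks grow downwards on x86,
    // so we pass a pointer to the top of the allocation.
    size_t stack_size = 1024 * 1024;
    char *stack = malloc(stack_size);
    if (stack == NULL)
        return EXIT_FAILURE;

    // CLONE_VM = share the virtual address space (thread-like);
    // SIGCHLD  = signal delivered to the parent when the child exits.
    int pid = clone(child_fn, stack + stack_size, CLONE_VM | SIGCHLD, &counter);
    if (pid == -1) {
        perror("clone");
        return EXIT_FAILURE;
    }

    waitpid(pid, NULL, 0);
    printf("counter = %d\n", counter);  // 1: the child saw our memory
    free(stack);
    return EXIT_SUCCESS;
}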

Also, reading fork(2) (the system call for spawning processes), we see:

C library/kernel differences: Since version 2.3.3, rather than invoking the kernel’s fork() system call, the glibc fork() wrapper that is provided as part of the NPTL threading implementation invokes clone(2) with flags that provide the same effect as the traditional system call. (A call to fork() is equivalent to a call to clone(2) specifying flags as just SIGCHLD.) The glibc wrapper invokes any fork handlers that have been established using pthread_atfork(3).

A couple of interesting points here:

  • when you call fork(2) you’re actually calling the glibc wrapper function, not the syscall itself (which is not unusual: when you call syscalls through glibc, you’re practically never invoking them directly)
  • this fork() wrapper is implemented by NPTL (which is part of glibc), while the traditional syscall is of course part of the kernel (and even inside the kernel it shares its implementation with clone(2)!)
  • the quote spells out that fork(2) is in fact implemented using clone(2), further solidifying the previous couple of paragraphs

So far I’ve talked more about what “tools” are used to achieve concurrency rather than how to actually use them. But before I do I want to explore them a bit further:

Let’s write a trivial binary that proves that the fork(2) wrapper really calls the clone(2) syscall under the hood:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>

int main(void) {
    pid_t return_pid = fork();

    if (return_pid == -1) {
        perror("Error forking");
        return EXIT_FAILURE;
    }

    if (return_pid != 0) {
        // parent
        fprintf(stdout, "Hello from parent, PID: %d\n", getpid());
    } else {
        // child
        fprintf(stdout, "Hello from child, PID: %d\n", getpid());
        return EXIT_SUCCESS;
    }

    return EXIT_SUCCESS;
}

First, the entire reason that we’re able to write data to stdout from the child process (whose code is in the last ’else’ block) is that the spawned child retains access to all the open file descriptors of the parent (and stdout is file descriptor 1):

The child inherits copies of the parent’s set of open file descriptors. Each file descriptor in the child refers to the same open file description (see open(2)) as the corresponding file descriptor in the parent. This means that the two file descriptors share open file status flags, file offset, and signal-driven I/O attributes (see the description of F_SETOWN and F_SETSIG in fcntl(2)).
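A small sketch to demonstrate the shared file offset (the /tmp path is just for illustration): because the parent’s and child’s descriptors refer to the same open file description, the child’s write advances the parent’s offset too, so the two writes don’t clobber each other.

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    int fd = open("/tmp/offset-demo", O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd == -1) {
        perror("open");
        return EXIT_FAILURE;
    }

    pid_t pid = fork();
    if (pid == -1) {
        perror("fork");
        return EXIT_FAILURE;
    }
    if (pid == 0) {
        write(fd, "child ", 6);  // advances the shared file offset
        return EXIT_SUCCESS;
    }

    waitpid(pid, NULL, 0);
    // The parent's descriptor shares the offset, so it now sits at 6:
    printf("parent offset: %ld\n", (long)lseek(fd, 0, SEEK_CUR));
    write(fd, "parent", 6);  // appends after the child's data
    close(fd);
    return EXIT_SUCCESS;
}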

Now let’s use strace to see which syscalls the binary performs:

$ strace --trace=%process ./example

--trace=%process filters the output to only show syscalls relating to process manipulation.

execve("./example", ["./example"], 0x7ffdfb5127c8 /* 36 vars */) = 0
clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f38384c0810) = 120121
Hello from child, PID: 120121
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=120121, si_uid=1000, si_status=0, si_utime=0, si_stime=0} ---
Hello from parent, PID: 120120
exit_group(0)                           = ?
+++ exited with 0 +++

We see that a call to clone(2) is in fact performed, and its return value (the child’s PID) matches the one printed in our message. We also see that the parent is sent a SIGCHLD signal once the child it spawned has finished running, but that’s a topic I’ll maybe explore later.


Let’s get back on topic: how do the tasks run by the Linux kernel relate to parallelism and concurrency? On the surface it’s really simple: the kernel uses a scheduler to determine which of the currently active tasks runs when. If it has access to multiple CPU cores, it is able to achieve true parallelism by balancing multiple tasks over multiple cores (allowing them to run truly at the same time). If there is only a single core (or if multiple tasks have to share one core), only one task can run on that core at a time, so the scheduler quickly “juggles” between tasks, giving the illusion of parallelism.
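You can actually watch this placement happen with sched_getcpu(3), which reports which core the calling task is on at that instant (a rough sketch; the scheduler is free to migrate the task between prints, so the answer can be stale the moment it’s returned):

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <unistd.h>

int main(void) {
    for (int i = 0; i < 5; i++) {
        // Which core is this task on right now?
        printf("running on CPU %d\n", sched_getcpu());
        usleep(100 * 1000);  // 100 ms pause between samples
    }
    return 0;
}

Run it under taskset -c 0 and the printed CPU stays fixed at 0; unpinned, it may hop between cores as the scheduler rebalances.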

The scheduler currently used by the Linux kernel is the Completely Fair Scheduler (CFS), which has been the default since kernel 2.6.23.

I don’t have enough experience or knowledge to talk about how these actually work, but I plan to in the future.