Multithreading VS Multiprocessing in Python (74/100 Days of Python)

Martin Mirakyan
3 min read · Mar 16, 2023

Day 74 of the “100 Days of Python” blog post series, covering multithreading vs. multiprocessing.

Python has two powerful modules for concurrent programming: threading and multiprocessing. Both let a program do several things at once and can reduce total execution time. However, they implement concurrency differently, so each suits different use cases, and the right choice depends on the task at hand.

Multithreading in Python

In multithreading, multiple threads run concurrently within the same process. All threads share the same memory space, which makes sharing data between them easy. However, the Global Interpreter Lock (GIL) in CPython, the standard Python implementation, limits the effectiveness of multithreading. The GIL ensures that only one thread executes Python bytecode at a time, so threads cannot run Python code in parallel on multiple cores, which hurts performance for CPU-heavy work.
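To make the shared memory space concrete, here is a minimal sketch using the standard threading module. The names square and results are made up for illustration. Every thread writes into the same dictionary, and the main thread sees all the writes once the workers are joined.

```python
import threading

# One dictionary that lives in the process and is visible to every thread.
results = {}

def square(n):
    # Each thread writes to its own key, so no two threads touch the same entry.
    results[n] = n * n

threads = [threading.Thread(target=square, args=(n,)) for n in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()  # wait until every thread has finished

print(sorted(results.items()))
```

Nothing had to be copied or sent anywhere: the threads simply modified an object that the main thread can read directly.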

Multiprocessing in Python

Multiprocessing, on the other hand, creates multiple processes instead of multiple threads. Each process has its own memory space and its own Python interpreter (with its own GIL) and runs independently of the others. This sidesteps the GIL bottleneck and allows programs to take full advantage of multiple CPUs and cores.
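As a rough sketch of what this looks like in practice, the example below spreads a CPU-heavy function over a pool of worker processes with multiprocessing.Pool. The function count_primes and the chosen limits are illustrative assumptions, not part of any library. The `if __name__ == "__main__":` guard matters because on some platforms each worker process imports the main module again.

```python
from multiprocessing import Pool

def count_primes(limit):
    """Count primes below `limit` by trial division (deliberately CPU-heavy)."""
    count = 0
    for n in range(2, limit):
        # n is prime if no d in [2, sqrt(n)] divides it evenly
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count

if __name__ == "__main__":
    limits = [10_000, 20_000, 30_000, 40_000]
    # Each limit is handed to a separate worker process.
    with Pool(processes=4) as pool:
        counts = pool.map(count_primes, limits)
    print(dict(zip(limits, counts)))
```

Because every worker is a separate process with its own interpreter, the four calls can run on four cores at the same time.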

What Are the Shortcomings of Multithreading?

The main disadvantage of multithreading is the GIL. Because only one thread can execute Python bytecode at a time, multithreading is not suitable for CPU-bound tasks: adding threads does not make pure-Python computation faster.

What Are the Shortcomings of Multiprocessing?

Multiprocessing has some overhead since it needs to create new processes, which is more expensive than starting threads. This overhead can make multiprocessing slower than multithreading for small tasks or tasks that involve a lot of inter-process communication. Data sharing is also more complicated since each process has its own memory space: objects sent from one process to another have to be serialized (pickled), copied, and deserialized on the other side, which can take a lot of time for large data.
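The cost of moving data between processes comes largely from serialization: multiprocessing uses pickle to pass objects through queues, pipes, and pool arguments. The sketch below does that round trip by hand inside a single process, just to show that the receiver ends up with a copy. The payload is a made-up stand-in for real data.

```python
import pickle

# Hypothetical payload standing in for data sent to a worker process.
payload = {"values": list(range(100_000))}

blob = pickle.dumps(payload)   # bytes that would travel to the other process
restored = pickle.loads(blob)  # object the receiving side rebuilds

print(f"serialized size: {len(blob)} bytes")
print("equal contents:", restored == payload)
print("same object:", restored is payload)
```

The restored object has the same contents but is a separate copy, so changes made in one process are not visible in another unless they are explicitly sent back.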

Data Sharing Issues in Both Multithreading and Multiprocessing

In both multithreading and multiprocessing, data sharing can be a challenge. When multiple threads or processes access the same data, race conditions can occur. To avoid them, we can use locks or other synchronization primitives to ensure that only one thread or process accesses the data at a time. Locks have to be used carefully, though: if two threads or processes each wait for a lock the other one holds, the result is a deadlock.
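A small sketch of the lock-based approach, using threading.Lock to protect a shared counter. The names counter and add_many are illustrative. The statement counter += 1 is a read-modify-write sequence, so without the lock two threads could read the same old value and one update could be lost.

```python
import threading

counter = 0
lock = threading.Lock()

def add_many(times):
    global counter
    for _ in range(times):
        # Only one thread at a time may run the read-modify-write below.
        with lock:
            counter += 1

threads = [threading.Thread(target=add_many, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)
```

The multiprocessing module offers a Lock with the same interface for coordinating processes, although the counter itself would then need to live in shared memory or be managed by one process.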

Examples Where One Approach Is Preferable to the Other

If the task is I/O-bound, such as web scraping or downloading files, multithreading is a good choice since a thread releases the GIL while it waits for I/O, letting other threads run in the meantime. If the task is CPU-bound, such as image processing or machine learning, multiprocessing is a better choice since it allows for full CPU utilization.

For example, consider a program that needs to resize a large number of images. Since image resizing is a CPU-bound task, multiprocessing would be a good choice. We can create multiple processes, each of which can resize a subset of the images. On the other hand, if we need to download a large number of files, multithreading would be a better choice since downloading is an I/O-bound task, and the GIL would not limit performance.
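The download scenario can be sketched with concurrent.futures.ThreadPoolExecutor. To keep the example self-contained, the hypothetical fake_download function only sleeps for a moment instead of touching the network, which imitates time spent waiting on I/O. The file names are made up as well.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_download(name):
    # Stand-in for a network request: the thread just waits,
    # and the GIL is released while it sleeps.
    time.sleep(0.2)
    return f"{name}: done"

names = [f"file_{i}.txt" for i in range(5)]

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=5) as executor:
    # map() returns results in the same order as the inputs.
    results = list(executor.map(fake_download, names))
elapsed = time.perf_counter() - start

print(results)
print(f"took {elapsed:.2f} seconds")
```

Run one after another, the five waits would add up to at least a second. With five threads they overlap, so the total stays close to a single wait. For the image resizing case, swapping in ProcessPoolExecutor from the same module gives a similar interface backed by processes instead of threads.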

Written by Martin Mirakyan

Software Engineer | Machine Learning | Founder of Profound Academy (https://profound.academy)
