Multiprocessing in Python (72/100 Days of Python)

Martin Mirakyan
6 min read · Mar 14, 2023


Day 72 of the “100 Days of Python” blog post series covering multiprocessing

Multiprocessing is a powerful feature in Python that allows you to run multiple processes concurrently. This can significantly improve the performance of your code, especially for CPU-intensive tasks.

When using the threading module, Python doesn’t actually run your threads in parallel across CPU cores. Because of the Global Interpreter Lock (GIL), only one thread executes Python bytecode at a time, and the interpreter switches back and forth between them. The GIL makes a lot of things inside Python much simpler, but it makes true parallelism for CPU-bound code very difficult to achieve with threads. We will discuss the Global Interpreter Lock in more detail in future episodes. The multiprocessing module, by contrast, offers true parallelism: it can utilize all the CPU cores on your machine, making CPU-intensive tasks several times faster.
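To get a feel for the difference, here is a minimal sketch that runs the same CPU-bound function sequentially and then across all cores with a multiprocessing.Pool (the cpu_bound function and the input sizes are arbitrary placeholders, and the exact speedup depends on your machine):

import multiprocessing
import time


def cpu_bound(n):
    return sum(i * i for i in range(n))  # Pure CPU work, no I/O


if __name__ == '__main__':
    tasks = [10_000_000] * 4

    start = time.perf_counter()
    [cpu_bound(n) for n in tasks]  # Sequential baseline
    print(f'Sequential: {time.perf_counter() - start:.2f}s')

    start = time.perf_counter()
    with multiprocessing.Pool() as pool:  # One worker process per CPU core by default
        pool.map(cpu_bound, tasks)
    print(f'Pool: {time.perf_counter() - start:.2f}s')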

What is multiprocessing?

Multiprocessing is a way of running multiple processes simultaneously on a multi-core processor or multiple processors. Each process runs independently of the other and can share data through a shared memory space or inter-process communication (IPC) mechanisms.

Python’s multiprocessing module provides a simple and consistent interface for creating and managing processes. It allows you to spawn processes, communicate between processes, and synchronize processes with locks, semaphores, and queues.

Basic usage of multiprocessing

The multiprocessing module provides a Process class to create new processes. multiprocessing.Process follows an API similar to threading.Thread, so you can reason about processes much as you would about threads. Here is a simple example that shows how to create a new process:

import multiprocessing


def worker():
    print('Worker')


if __name__ == '__main__':
    p = multiprocessing.Process(target=worker)
    p.start()

In this example, we defined a worker function that prints a message. We then created a new process using the Process class and passed the worker function as the target to execute in the new process. Finally, we started the process using the start() method.

Note that we enclosed the code that creates and starts the process inside the if __name__ == '__main__': block. This is necessary because, with the spawn start method (the default on Windows and macOS), the child process re-imports your module; without the guard, that import would try to start subprocesses of its own, leading to unexpected behavior or a RuntimeError.
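In practice you will usually also wait for the child process to finish with join(), which blocks until it exits. A small variation of the example above (the exitcode check is just for illustration):

import multiprocessing


def worker():
    print('Worker')


if __name__ == '__main__':
    p = multiprocessing.Process(target=worker)
    p.start()
    p.join()           # Block until the child process exits
    print(p.exitcode)  # 0 means the worker finished without errors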

Passing arguments to processes

You can also pass arguments to the target function when creating a new process:

import multiprocessing


def worker(num):
    print(f'Worker {num}')


if __name__ == '__main__':
    for i in range(5):
        p = multiprocessing.Process(target=worker, args=(i,))
        p.start()

In this example, we passed the i variable as an argument to the worker function using the args parameter. We created five processes, each with a different value of i. So, this program might output the numbers from 0 to 4 in no particular order:

Worker 0
Worker 2
Worker 1
Worker 3
Worker 4
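If you need to wait for all of the workers before moving on, one common pattern is to keep the Process objects in a list, start them all, and then join each one — a minimal sketch building on the same worker function:

import multiprocessing


def worker(num):
    print(f'Worker {num}')


if __name__ == '__main__':
    processes = [multiprocessing.Process(target=worker, args=(i,)) for i in range(5)]
    for p in processes:
        p.start()
    for p in processes:
        p.join()  # Wait for every worker to finish
    print('All workers done')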

Sharing Data Between Processes

When working with multiprocessing in Python, it is often necessary to share data between processes. However, sharing data between processes can be a challenging task, as each process has its own memory space and cannot access the memory of other processes directly.

When passing data from one process to another, the data first needs to be serialized in the source process, copied to the receiving process, and then deserialized back into a Python object there.
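This serialization is done with the pickle module, so anything you send to another process through a queue or a pipe must be picklable. A quick way to check whether an object can make the trip is to round-trip it through pickle yourself (the data dictionary here is just an example):

import pickle

data = {'id': 42, 'values': [1, 2, 3]}  # Plain data structures pickle fine
restored = pickle.loads(pickle.dumps(data))
print(restored == data)  # True

try:
    pickle.dumps(lambda x: x)  # Lambdas, open files, sockets, etc. are not picklable
except Exception as e:
    print(f'Cannot be sent to another process: {e}')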

Fortunately, Python provides several mechanisms for sharing data between processes, including shared memory, queues, pipes, and synchronization primitives.

Sharing Data with Shared Memory

Shared memory is one of the most efficient ways to share data between processes in Python. With shared memory, multiple processes can access the same block of memory and use it to exchange data. Python provides a module called multiprocessing.shared_memory that allows processes to create and access shared memory blocks. It exposes two classes: SharedMemory, for creating and managing raw shared memory blocks, and ShareableList, for creating and managing a shared, list-like container built on top of such a block.

For example, the following code creates a new shared memory block of at least 1024 bytes (depending on how the platform prefers to allocate memory, the actual block can be larger than the requested size):

from multiprocessing.shared_memory import SharedMemory

shm = SharedMemory(create=True, size=1024)
print(shm.size)  # At least 1024; the platform may allocate more
shm.close()      # Detach from the block
shm.unlink()     # Destroy the block and release the memory
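The snippet above only allocates a block in a single process. As a rough sketch of actual sharing (the writer function name and the b'hello' payload are just for illustration), a child process can attach to the same block by name, write into its buffer, and let the parent read the result:

import multiprocessing
from multiprocessing.shared_memory import SharedMemory


def writer(name):
    shm = SharedMemory(name=name)  # Attach to the existing block by name
    shm.buf[:5] = b'hello'         # Write raw bytes into the shared buffer
    shm.close()                    # Detach; the block itself stays alive


if __name__ == '__main__':
    shm = SharedMemory(create=True, size=1024)
    p = multiprocessing.Process(target=writer, args=(shm.name,))
    p.start()
    p.join()

    print(bytes(shm.buf[:5]))  # b'hello', written by the child process
    shm.close()
    shm.unlink()  # Release the block once no process needs it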

Sharing Data with Queues

Multiprocessing queues are a communication mechanism provided by the multiprocessing module in Python. They allow multiple processes to communicate with each other by sending and receiving messages through a common queue.

Queues are particularly useful when implementing producer-consumer patterns, where one or more processes produce data to be processed by one or more other processes. This pattern is common in many real-world scenarios, such as in data processing pipelines, where data is generated by a source, processed by multiple stages, and finally consumed by a sink:

import multiprocessing


def producer(queue):
    for i in range(10):
        queue.put(i)
    queue.put(None)


def consumer(queue):
    while True:
        item = queue.get()
        if item is None:
            break
        print(item)


if __name__ == '__main__':
    q = multiprocessing.Queue()
    p1 = multiprocessing.Process(target=producer, args=(q,))
    p2 = multiprocessing.Process(target=consumer, args=(q,))
    p1.start()
    p2.start()
    p1.join()
    p2.join()

In this example, we define two functions: producer and consumer. The producer function adds numbers 0 to 9 to the queue, followed by a None sentinel value to indicate the end of the data stream. The consumer function loops indefinitely, retrieving items from the queue and printing them until it receives the None sentinel value.

We then create a multiprocessing queue object, q, and two processes, p1 and p2. We pass the queue object to both processes as an argument. The p1 process runs the producer function, while the p2 process runs the consumer function.

Finally, we start both processes and wait for them to complete using the join() method, which guarantees that all the data has been produced and consumed before the program exits.

Using multiprocessing queues allows for efficient communication between processes while avoiding issues with shared memory access. Additionally, using queues can help balance the load between processes, by allowing a single producer to feed multiple consumers or vice versa.
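As a rough sketch of that load balancing, a single producer can feed several consumers through the same queue; one common trick is to put one sentinel per consumer so that every consumer knows when to stop (the number of consumers here is arbitrary):

import multiprocessing


def producer(queue, num_consumers):
    for i in range(10):
        queue.put(i)
    for _ in range(num_consumers):
        queue.put(None)  # One sentinel per consumer


def consumer(queue, name):
    while True:
        item = queue.get()
        if item is None:
            break
        print(f'{name} got {item}')


if __name__ == '__main__':
    q = multiprocessing.Queue()
    num_consumers = 3
    consumers = [
        multiprocessing.Process(target=consumer, args=(q, f'consumer-{i}'))
        for i in range(num_consumers)
    ]
    p = multiprocessing.Process(target=producer, args=(q, num_consumers))
    for c in consumers:
        c.start()
    p.start()
    p.join()
    for c in consumers:
        c.join()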

Sharing Data with Pipes

Multiprocessing pipes are another communication mechanism provided by the multiprocessing module in Python. They allow multiple processes to communicate with each other by sending and receiving messages through a pair of connected pipes.

Pipes are particularly useful when implementing two-way communication between processes. For example, one process might send commands to another process, and receive responses back:

import multiprocessing


def worker(conn):
    while True:               # Loop until the parent process sends a stop message
        msg = conn.recv()     # Receive a message from the parent process
        if msg == 'stop':     # Check for the stop message
            break
        res = msg * 2         # Process the message
        conn.send(res)        # Send the result back to the parent process


if __name__ == '__main__':
    parent_conn, child_conn = multiprocessing.Pipe()
    p = multiprocessing.Process(target=worker, args=(child_conn,))
    p.start()

    # Send some messages to the child process
    parent_conn.send(10)
    parent_conn.send(20)
    parent_conn.send(30)
    parent_conn.send('stop')  # Send the stop message
    p.join()                  # Wait for the child process to finish

    # Receive the results from the child process
    while parent_conn.poll():
        result = parent_conn.recv()
        print(result)

In this example, we define a function worker that runs in a separate process. The function receives messages from the parent process via a pipe, processes the messages (in this case, doubling them), and sends the results back to the parent process.

We then create a pair of connected connection objects, parent_conn and child_conn, using the Pipe function, and pass child_conn to the worker process.

We start the worker process using the Process class and then send it some messages using the send method on the parent_conn object.

Finally, we signal the worker process to stop by sending the 'stop' message and wait for it to complete using the join() method. Once it has finished, we read back the results using the recv method on the parent_conn object, with poll() telling us whether there is still data left to read.

Synchronization Primitives

Synchronization primitives are used to coordinate access to shared resources between processes. Python provides several synchronization primitives, including locks, semaphores, and events. These primitives can be used to ensure that only one process can access a shared resource at a time or to signal when a particular event has occurred. The API for synchronization primitives in multiprocessing is very similar to that of the threading module. You can have a look at how those work in our previous discussions on locks, semaphores, and barriers.
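As a quick illustration, a multiprocessing.Lock can protect a shared counter (a multiprocessing.Value in this sketch; the increment function and the loop counts are arbitrary) so that updates from different processes don’t interleave and get lost:

import multiprocessing


def increment(counter, lock):
    for _ in range(1000):
        with lock:  # Only one process may update the counter at a time
            counter.value += 1


if __name__ == '__main__':
    counter = multiprocessing.Value('i', 0)  # Shared integer, initialized to 0
    lock = multiprocessing.Lock()
    processes = [
        multiprocessing.Process(target=increment, args=(counter, lock))
        for _ in range(4)
    ]
    for p in processes:
        p.start()
    for p in processes:
        p.join()
    print(counter.value)  # 4000: no increments were lost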

What’s next?
