Multiprocessing in Python (72/100 Days of Python)
Multiprocessing is a powerful feature in Python that allows you to run multiple processes concurrently. This can significantly improve the performance of your code, especially for CPU-intensive tasks.
When using the threading module, Python doesn't actually run your code on all the cores available on the computer. Because of the Global Interpreter Lock (GIL), only one thread can execute Python bytecode at a time, so the interpreter effectively uses a single CPU core and switches back and forth between tasks. The GIL makes a lot of things very easy in Python but makes true parallelism very difficult to accomplish. We will discuss the Global Interpreter Lock in more detail in future episodes. The multiprocessing module, on the other hand, offers true parallelism: it can utilize all the CPU cores on your machine, making the execution of CPU-bound tasks several times faster.
What is multiprocessing?
Multiprocessing is a way of running multiple processes simultaneously on a multi-core processor or multiple processors. Each process runs independently of the others and can share data through shared memory or inter-process communication (IPC) mechanisms.
Python’s multiprocessing module provides a simple and consistent interface for creating and managing processes. It allows you to spawn processes, communicate between processes, and synchronize them with locks, semaphores, and queues.
Basic usage of multiprocessing
The multiprocessing module provides a Process class to create new processes. The multiprocessing.Process class follows a similar API to threading.Thread, making it easy to reason about processes and threads in a similar way. Here is a simple example that shows how to create a new process:
import multiprocessing

def worker():
    print('Worker')

if __name__ == '__main__':
    p = multiprocessing.Process(target=worker)
    p.start()
In this example, we defined a worker function that prints a message. We then created a new process using the Process class and passed the worker function as the target to execute in the new process. Finally, we started the process using the start() method.
Note that we enclosed the code that creates and starts the process inside the if __name__ == '__main__': block. This is necessary because, on platforms where new processes are spawned rather than forked (such as Windows and macOS), the child process re-imports the main module; without the guard, each import would try to start its own subprocesses, which leads to unexpected behavior or a runtime error.
Passing arguments to processes
You can also pass arguments to the target function when creating a new process:
import multiprocessing

def worker(num):
    print(f'Worker {num}')

if __name__ == '__main__':
    for i in range(5):
        p = multiprocessing.Process(target=worker, args=(i,))
        p.start()
In this example, we passed the i variable as an argument to the worker function using the args parameter (note that args must be a tuple, hence the trailing comma in (i,)). We created five processes, each with a different value of i. Since the processes run concurrently, this program might output the numbers from 0 to 4 in no particular order:
Worker 0
Worker 2
Worker 1
Worker 3
Worker 4
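Keyword arguments can be passed in the same way through the kwargs parameter. As a quick illustration (the greeting parameter and the values here are made up for the example):

import multiprocessing

def worker(num, greeting='Hello'):
    print(f'{greeting}, worker {num}')

if __name__ == '__main__':
    # args covers positional parameters, kwargs covers keyword parameters
    p = multiprocessing.Process(target=worker, args=(3,), kwargs={'greeting': 'Hi'})
    p.start()
    p.join()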
Sharing Data Between Processes
When working with multiprocessing in Python, it is often necessary to share data between processes. However, sharing data between processes can be a challenging task, as each process has its own memory space and cannot access the memory of other processes directly.
When data is passed from one process to another, it first needs to be serialized in the source process, then copied to the other process, and then deserialized back into a Python object. Under the hood, multiprocessing uses the pickle module for this serialization by default.
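Conceptually, the round trip looks like this (a plain pickle example, just to illustrate what happens to the data in transit):

import pickle

data = {'nums': [1, 2, 3]}
blob = pickle.dumps(data)      # Serialized in the source process
restored = pickle.loads(blob)  # Deserialized in the receiving process
print(restored == data)        # True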
Fortunately, Python provides several mechanisms for sharing data between processes, including shared memory, queues, pipes, and synchronization primitives.
Sharing Data with Shared Memory
Shared memory is one of the most efficient ways to share data between processes in Python. With shared memory, multiple processes can access the same block of memory, which can be used to share data between them. Python provides a module called multiprocessing.shared_memory that allows processes to create and access shared memory blocks. It exposes two classes, SharedMemory and ShareableList. SharedMemory is used to create and manage raw shared memory blocks, while ShareableList provides a list-like interface backed by shared memory.
For example, the following code creates a new shared memory block of at least 1024 bytes (depending on how the platform prefers to allocate memory, the actual block can be larger than the requested size):
from multiprocessing.shared_memory import SharedMemory

shm = SharedMemory(create=True, size=1024)
print(shm.size)  # May be larger than 1024, depending on the platform
shm.close()      # Detach this process from the block
shm.unlink()     # Free the block once no process needs it anymore
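A shared memory block is identified by its name, so another process can attach to the same block and read what was written into it. Here is a minimal sketch of that pattern (the child function and the b'hello' payload are just for illustration):

import multiprocessing
from multiprocessing.shared_memory import SharedMemory

def child(name):
    shm = SharedMemory(name=name)  # Attach to the existing block by name
    print(bytes(shm.buf[:5]))      # Reads b'hello'
    shm.close()

if __name__ == '__main__':
    shm = SharedMemory(create=True, size=1024)
    shm.buf[:5] = b'hello'  # Write raw bytes into the shared block
    p = multiprocessing.Process(target=child, args=(shm.name,))
    p.start()
    p.join()
    shm.close()
    shm.unlink()  # Release the block once both processes are done

ShareableList works along the same lines: it creates a list of primitive values backed by a shared memory block that other processes can attach to by name.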
Sharing Data with Queues
Multiprocessing queues are a communication mechanism provided by the multiprocessing module in Python. They allow multiple processes to communicate with each other by sending and receiving messages through a common queue.
Queues are particularly useful when implementing producer-consumer patterns, where one or more processes produce data to be processed by one or more other processes. This pattern is common in many real-world scenarios, such as in data processing pipelines, where data is generated by a source, processed by multiple stages, and finally consumed by a sink:
import multiprocessing

def producer(queue):
    for i in range(10):
        queue.put(i)
    queue.put(None)  # Sentinel value signalling the end of the data

def consumer(queue):
    while True:
        item = queue.get()
        if item is None:
            break
        print(item)

if __name__ == '__main__':
    q = multiprocessing.Queue()
    p1 = multiprocessing.Process(target=producer, args=(q,))
    p2 = multiprocessing.Process(target=consumer, args=(q,))
    p1.start()
    p2.start()
    p1.join()
    p2.join()
In this example, we define two functions: producer and consumer. The producer function adds the numbers 0 to 9 to the queue, followed by a None sentinel value to indicate the end of the data stream. The consumer function loops indefinitely, retrieving items from the queue and printing them until it receives the None sentinel value.
We then create a multiprocessing queue object, q, and two processes, p1 and p2, passing the queue object to both processes as an argument. The p1 process runs the producer function, while the p2 process runs the consumer function.
Finally, we start both processes and wait for them to complete using the join() method, which ensures that all the data has been produced and consumed before the program exits.
Using multiprocessing queues allows for efficient communication between processes while avoiding issues with shared memory access. Additionally, using queues can help balance the load between processes, by allowing a single producer to feed multiple consumers or vice versa.
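For instance, here is a minimal sketch of one producer feeding two consumers over the same queue (the function and process names are made up for the example; the key detail is putting one sentinel per consumer so that each one shuts down cleanly):

import multiprocessing

def producer(queue, n_consumers):
    for i in range(10):
        queue.put(i)
    for _ in range(n_consumers):
        queue.put(None)  # One sentinel per consumer

def consumer(queue, name):
    while True:
        item = queue.get()
        if item is None:
            break
        print(f'{name} got {item}')

if __name__ == '__main__':
    q = multiprocessing.Queue()
    consumers = [
        multiprocessing.Process(target=consumer, args=(q, f'consumer-{i}'))
        for i in range(2)
    ]
    p = multiprocessing.Process(target=producer, args=(q, len(consumers)))
    for c in consumers:
        c.start()
    p.start()
    p.join()
    for c in consumers:
        c.join()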
Sharing Data with Pipes
Multiprocessing pipes are another communication mechanism provided by the multiprocessing module in Python. They allow two processes to communicate with each other by sending and receiving messages through a pair of connected endpoints; by default the pipe is duplex, so both ends can send and receive.
Pipes are particularly useful when implementing two-way communication between processes. For example, one process might send commands to another process, and receive responses back:
import multiprocessing

def worker(conn):
    while True:  # Loop until the parent process sends a stop message
        msg = conn.recv()  # Receive a message from the parent process
        if msg == 'stop':  # Check for the stop message
            break
        res = msg * 2  # Process the message
        conn.send(res)  # Send the result back to the parent process

if __name__ == '__main__':
    parent_conn, child_conn = multiprocessing.Pipe()
    p = multiprocessing.Process(target=worker, args=(child_conn,))
    p.start()

    # Send some messages to the child process
    parent_conn.send(10)
    parent_conn.send(20)
    parent_conn.send(30)
    parent_conn.send('stop')  # Send the stop message

    p.join()  # Wait for the child process to finish

    # Receive the results from the child process
    while parent_conn.poll():
        result = parent_conn.recv()
        print(result)
In this example, we define a function worker that runs in a separate process. The function receives messages from the parent process via the pipe, processes them (in this case, doubling them), and sends the results back to the parent process.
We then create a pipe using the Pipe function, which returns two connection objects, parent_conn and child_conn. We start the worker process using the Process class, passing the child_conn object as an argument, and then send some messages to the worker process using the send method on the parent_conn object.

Finally, we signal the worker process to stop by sending the 'stop' message and wait for it to complete using the join() method. We then receive the results from the worker process using the recv method on the parent_conn object, calling poll() to check whether any results are still waiting in the pipe.
Synchronization Primitives
Synchronization primitives are used to coordinate access to shared resources between processes. Python provides several synchronization primitives, including locks, semaphores, and events. These primitives can be used to ensure that only one process can access a shared resource at a time, or to signal when a particular event has occurred. The API for synchronization primitives in multiprocessing is very similar to that of the threading module, so you can have a look at how they work in our previous discussions on locks, semaphores, and barriers.
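As a quick taste, here is a minimal sketch of a multiprocessing.Lock protecting a shared counter (the increment function is made up for the example; the counter lives in a multiprocessing.Value, a shared memory object holding a single typed value):

import multiprocessing

def increment(counter, lock):
    for _ in range(1000):
        with lock:  # Only one process may update the counter at a time
            counter.value += 1

if __name__ == '__main__':
    counter = multiprocessing.Value('i', 0)  # Shared integer ('i' = signed int)
    lock = multiprocessing.Lock()
    procs = [
        multiprocessing.Process(target=increment, args=(counter, lock))
        for _ in range(4)
    ]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    print(counter.value)  # 4000, since the lock prevents lost updates

Without the lock, counter.value += 1 would be a read-modify-write race between processes, and the final count would usually come out below 4000.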
What’s next?
- If you found this story valuable, please consider clapping multiple times (this really helps a lot!)
- Hands-on Practice: Free Python Course
- Full series: 100 Days of Python
- Previous topic: Synchronizing Threads in Python With Barriers
- Next topic: What Is the Python Global Interpreter Lock (GIL)?