The `multiprocessing` module in Python

This article introduces practical tips for writing safe and efficient parallel processing code using the multiprocessing module.

Basics: Why use multiprocessing?

multiprocessing enables parallelization on a process basis, so you can parallelize CPU-bound tasks without being restricted by Python's GIL (Global Interpreter Lock). For I/O-bound tasks, threading or asyncio may be simpler and more suitable.

Simple usage of Process

First, here is a basic example of running a function in a separate process using Process. This demonstrates how to start a process, wait for its completion, and pass arguments.

# Explanation:
# This example starts a separate process to run `worker` which prints messages.
# It demonstrates starting, joining, and passing arguments.

from multiprocessing import Process
import time

def worker(name, delay):
    # Print a few iterations with a delay so the interleaving with the main process is visible
    for i in range(3):
        print(f"Worker {name}: iteration {i}")
        time.sleep(delay)

if __name__ == "__main__":
    p = Process(target=worker, args=("A", 0.5))
    p.start()
    print("Main: waiting for worker to finish")
    p.join()
    print("Main: worker finished")
  • This code shows the flow where the main process launches a child process running worker and waits for its completion using join(). Arguments are passed to the function via args.
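
If you need to supervise a child process, Process also exposes join() with a timeout, is_alive(), and exitcode. Here is a minimal sketch (the function name slow_task and the timings are made up for illustration):

# Sketch: supervising a child process (names and timings are illustrative)
from multiprocessing import Process
import time

def slow_task():
    time.sleep(2)

if __name__ == "__main__":
    p = Process(target=slow_task, name="slow-task")
    p.start()
    p.join(timeout=0.5)              # wait at most 0.5 seconds
    print("Still running?", p.is_alive())
    p.join()                         # now wait for completion
    print("Exit code:", p.exitcode)  # 0 on success, negative if killed by a signal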

Simple parallelization with Pool (high-level API)

Pool.map is useful when you want to apply the same function to multiple independent tasks. It manages worker processes internally for you.

# Explanation:
# Use Pool.map to parallelize a CPU-bound function across available processes.
# Good for "embarrassingly parallel" workloads.

from multiprocessing import Pool, cpu_count
import math
import time

def is_prime(n):
    # Check primality (inefficient but CPU-heavy for demo)
    if n < 2:
        return False
    for i in range(2, int(math.sqrt(n)) + 1):
        if n % i == 0:
            return False
    return True

if __name__ == "__main__":
    nums = [10_000_000 + i for i in range(50)]
    start = time.time()
    with Pool(processes=cpu_count()) as pool:
        results = pool.map(is_prime, nums)
    end = time.time()
    print(f"Found primes: {sum(results)} / {len(nums)} in {end-start:.2f}s")
  • Pool can automatically control the number of workers, and map returns results in the original order.
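
Pool offers more than map. As a rough sketch (the functions add and square here are made-up examples), starmap unpacks argument tuples for functions that take several parameters, and imap_unordered yields results in completion order rather than input order:

# Sketch: other Pool methods (starmap for multi-argument functions,
# imap_unordered for results as workers finish)
from multiprocessing import Pool

def add(x, y):
    return x + y

def square(x):
    return x * x

if __name__ == "__main__":
    with Pool(processes=4) as pool:
        # starmap unpacks each tuple into the function's arguments
        print(pool.starmap(add, [(1, 2), (3, 4), (5, 6)]))  # [3, 7, 11]
        # imap_unordered yields results as they complete, not in input order
        for r in pool.imap_unordered(square, range(5)):
            print(r)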

Interprocess communication: Producer/Consumer pattern using Queue

Queue is a First-In-First-Out (FIFO) queue that safely transfers objects between processes. Below is a typical producer/consumer pattern.

# Explanation:
# Demonstrates a producer putting items into a Queue
# and a consumer reading them.
# This is useful for task pipelines between processes.

from multiprocessing import Process, Queue
import time
import random

def producer(q, n):
    for i in range(n):
        item = f"item-{i}"
        print("Producer: putting", item)
        q.put(item)
        time.sleep(random.random() * 0.5)
    q.put(None)  # sentinel to signal consumer to stop

def consumer(q):
    while True:
        item = q.get()
        if item is None:
            break
        print("Consumer: got", item)
        time.sleep(0.2)

if __name__ == "__main__":
    q = Queue()
    p = Process(target=producer, args=(q, 5))
    c = Process(target=consumer, args=(q,))
    p.start()
    c.start()
    p.join()
    c.join()
    print("Main: done")
  • Queue allows you to safely pass data between processes. It is common to use a special value such as None to signal termination.
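
The same pattern extends to several consumers; one common convention is to put one sentinel per consumer so each one knows when to stop. A minimal sketch (the process and item counts are arbitrary):

# Sketch: one producer, several consumers; one sentinel per consumer
from multiprocessing import Process, Queue

def producer(q, n, n_consumers):
    for i in range(n):
        q.put(i)
    for _ in range(n_consumers):
        q.put(None)  # one stop signal per consumer

def consumer(q, cid):
    while True:
        item = q.get()
        if item is None:
            break
        print(f"Consumer {cid} got {item}")

if __name__ == "__main__":
    q = Queue()
    consumers = [Process(target=consumer, args=(q, i)) for i in range(3)]
    for c in consumers:
        c.start()
    p = Process(target=producer, args=(q, 10, len(consumers)))
    p.start()
    p.join()
    for c in consumers:
        c.join()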

Shared memory: Value and Array

Use Value and Array when you want to share a single number or a small numeric array between processes. Use a Lock to avoid race conditions when several processes update the same data.

# Explanation:
# Use Value to share a single integer counter
# and Array for a small numeric array.
# Show how to use a Lock to avoid race conditions.

from multiprocessing import Process, Value, Array, Lock

def increment(counter, lock, times):
    for _ in range(times):
        with lock:
            counter.value += 1

def update_array(arr):
    for i in range(len(arr)):
        arr[i] = arr[i] + 1

if __name__ == "__main__":
    lock = Lock()
    counter = Value('i', 0)  # 'i' = signed int
    shared_arr = Array('i', [0, 0, 0])

    p1 = Process(target=increment, args=(counter, lock, 1000))
    p2 = Process(target=increment, args=(counter, lock, 1000))
    a = Process(target=update_array, args=(shared_arr,))

    p1.start(); p2.start(); a.start()
    p1.join(); p2.join(); a.join()

    print("Counter:", counter.value)
    print("Array:", list(shared_arr))
  • Value and Array share data through low-level shared memory (C-level buffers), not ordinary Python objects. They are well suited to quickly reading and writing small amounts of data, but not to handling large data sets.
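
Note that Value and Array created with the default lock=True already carry an internal lock, reachable via get_lock(). The sketch below is a variation on the counter example above that uses it instead of a separately created Lock:

# Sketch: using the lock bundled with Value instead of a separate Lock object
from multiprocessing import Process, Value

def increment(counter, times):
    for _ in range(times):
        with counter.get_lock():  # built-in lock of the synchronized Value
            counter.value += 1

if __name__ == "__main__":
    counter = Value('i', 0)
    workers = [Process(target=increment, args=(counter, 1000)) for _ in range(2)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    print("Counter:", counter.value)  # expected 2000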

Advanced sharing: Shared objects (dicts, lists) with Manager

If you want to use more flexible shared objects like lists or dictionaries, use Manager().

# Explanation:
# Manager provides proxy objects like dict/list
# that can be shared across processes.
# Good for moderate-size shared state and an easier programming model.

from multiprocessing import Process, Manager

def worker(shared_dict, key, value):
    shared_dict[key] = value

if __name__ == "__main__":
    with Manager() as manager:
        d = manager.dict()
        processes = []
        for i in range(5):
            p = Process(target=worker, args=(d, f"k{i}", i*i))
            p.start()
            processes.append(p)
        for p in processes:
            p.join()
        print("Shared dict:", dict(d))
  • Manager is convenient for sharing dictionaries and lists, but every access sends data between processes and requires pickle conversion. Therefore, frequently updating large amounts of data will slow down processing.
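
One caveat worth knowing: mutating a plain mutable object stored inside a managed dict (for example, appending to a normal list held under a key) only changes a local copy; you have to reassign the value, or store a manager.list() instead. A small sketch of the reassignment approach (the key name "results" is arbitrary):

# Sketch: changes to nested mutables do not propagate automatically;
# reassign the value so the update reaches the manager process
from multiprocessing import Process, Manager

def append_item(shared_dict, key, value):
    items = shared_dict[key]      # local copy of the list
    items.append(value)
    shared_dict[key] = items      # reassignment pushes the change back

if __name__ == "__main__":
    with Manager() as manager:
        d = manager.dict()
        d["results"] = []
        p = Process(target=append_item, args=(d, "results", 42))
        p.start()
        p.join()
        print(dict(d))  # {'results': [42]}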

Synchronization mechanisms: How to use Lock and Semaphore

Use Lock or Semaphore to control concurrent access to shared resources. You can use them concisely with the with statement.

# Explanation:
# Demonstrates using Lock to prevent simultaneous access to a critical section.
# Locks are necessary when shared resources are not atomic.

from multiprocessing import Process, Lock, Value

def safe_add(counter, lock):
    for _ in range(10000):
        with lock:
            counter.value += 1

if __name__ == "__main__":
    lock = Lock()
    counter = Value('i', 0)
    p1 = Process(target=safe_add, args=(counter, lock))
    p2 = Process(target=safe_add, args=(counter, lock))
    p1.start(); p2.start()
    p1.join(); p2.join()
    print("Counter:", counter.value)
  • Locks prevent data races, but if the locked region is too large, parallel processing performance will decrease. Only the necessary parts should be protected as a critical section.
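
Semaphore works the same way but allows up to N processes into the section at once, which is handy for limiting access to a scarce resource. A minimal sketch (the limit of 2 and the sleep are arbitrary):

# Sketch: Semaphore limits how many processes enter a section at the same time
from multiprocessing import Process, Semaphore, current_process
import time

def limited_work(sem):
    with sem:  # at most 2 processes inside this block at once
        print(current_process().name, "working")
        time.sleep(0.5)

if __name__ == "__main__":
    sem = Semaphore(2)
    procs = [Process(target=limited_work, args=(sem,)) for _ in range(5)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()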

Differences between fork on UNIX and behavior on Windows

On Linux, child processes have traditionally been created with fork, so the child inherits the parent's memory via copy-on-write, which is efficient. macOS (since Python 3.8) and Windows use spawn, which starts a fresh interpreter and re-imports your modules, so you need to take care with entry-point protection (if __name__ == "__main__":) and global initialization.

# Explanation: Check start method (fork/spawn) and set it if needed.
# Useful for debugging platform-dependent behavior.

from multiprocessing import get_start_method, set_start_method

if __name__ == "__main__":
    print("Start method:", get_start_method())

    # uncomment to force spawn on Unix for testing
    # set_start_method('spawn')
  • set_start_method can only be called once at the start of your program. It's safer not to change this arbitrarily within libraries.
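
For library code, a gentler alternative is get_context(), which returns Process, Pool, Queue, and friends bound to a specific start method without touching the global default. A short sketch:

# Sketch: a start-method-specific context instead of changing the global default
from multiprocessing import get_context

def child():
    print("Hello from a spawned child")

if __name__ == "__main__":
    ctx = get_context("spawn")
    p = ctx.Process(target=child)
    p.start()
    p.join()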

Practical example: Benchmarking CPU-bound workloads (comparison)

Below is a script that compares sequential execution with parallel execution using Pool, to see how much speedup multiprocessing provides.

# Explanation:
# Compare sequential vs parallel execution times for a CPU-bound task.
# Helps understand speedup and overhead.

import time
from multiprocessing import Pool, cpu_count
import math

def heavy_task(n):
    s = 0
    for i in range(1, n):
        s += math.sqrt(i)
    return s

def run_sequential(nums):
    return [heavy_task(n) for n in nums]

def run_parallel(nums):
    with Pool(processes=cpu_count()) as p:
        return p.map(heavy_task, nums)

if __name__ == "__main__":
    nums = [2000000] * 8  # heavy tasks
    t0 = time.time()
    run_sequential(nums)
    seq = time.time() - t0
    t1 = time.time()
    run_parallel(nums)
    par = time.time() - t1
    print(f"Sequential: {seq:.2f}s, Parallel: {par:.2f}s")
  • This example shows that depending on task load and the number of processes, parallelization can sometimes be ineffective due to overhead. The larger (“heavier”) and more independent the tasks, the greater the benefit.
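
When you have many very small tasks instead of a few heavy ones, per-task communication overhead can dominate. Passing a larger chunksize to map (or imap) sends work to each worker in batches; the sketch below uses made-up numbers just to show the parameter:

# Sketch: a larger chunksize reduces per-task overhead for many small tasks
from multiprocessing import Pool

def tiny_task(x):
    return x * x

if __name__ == "__main__":
    data = range(100_000)
    with Pool() as pool:
        results = pool.map(tiny_task, data, chunksize=1000)
    print(len(results))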

Important basic rules

Below are the basic points for using multiprocessing safely and efficiently.

  • On Windows, modules are re-imported when child processes start, so you must protect your script's entry point with if __name__ == "__main__":.
  • Inter-process communication is serialized (with pickle conversion), so transferring large objects becomes costly.
  • Since multiprocessing creates processes, it's common to decide the number of processes based on multiprocessing.cpu_count().
  • Pool worker processes are daemonic by default and cannot create child processes of their own, so avoid nesting Pool instances.
  • Since exceptions occurring in child processes are hard to detect from the main process, implement logging and error handling explicitly (see the sketch after this list).
  • Set the number of processes according to the CPU, and consider using threads for I/O-bound tasks.
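
As a follow-up to the point about exceptions above, here is one minimal sketch: with Pool.apply_async you can attach an error_callback, and calling get() on the AsyncResult re-raises the worker's exception in the main process (the function may_fail is a made-up example).

# Sketch: surfacing worker exceptions in the parent process
from multiprocessing import Pool

def may_fail(x):
    if x == 3:
        raise ValueError(f"bad input: {x}")
    return x * 2

def log_error(exc):
    print("Worker failed:", exc)

if __name__ == "__main__":
    with Pool(2) as pool:
        async_results = [pool.apply_async(may_fail, (i,), error_callback=log_error)
                         for i in range(5)]
        for r in async_results:
            try:
                print(r.get())  # re-raises the worker's exception here
            except ValueError as exc:
                print("Caught in main:", exc)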

Practical design advice

Below are some useful concepts and patterns for designing parallel processing.

  • It is efficient to separate processes by role, such as input reading (I/O), preprocessing (CPU-heavy), and aggregation (serial), i.e., to build a pipeline.
  • To simplify debugging, first check the operation in a single process before parallelizing.
  • For logging, separate log outputs per process (e.g., include the PID in file names) to make isolating issues easier.
  • Prepare retry and timeout mechanisms so you can safely recover even if a process hangs (a minimal sketch follows below).
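
For the last point, one simple pattern is join() with a timeout followed by terminate() if the child is still alive. A minimal sketch (the long sleep stands in for a task that never returns):

# Sketch: recovering from a hung worker with join(timeout) and terminate()
from multiprocessing import Process
import time

def possibly_hanging_task():
    time.sleep(60)  # stands in for a task that never returns

if __name__ == "__main__":
    p = Process(target=possibly_hanging_task)
    p.start()
    p.join(timeout=2)          # give the worker 2 seconds
    if p.is_alive():
        print("Worker timed out; terminating")
        p.terminate()          # force-stop the child process
        p.join()
    print("Exit code:", p.exitcode)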

Summary (Key points you can use right away)

Parallel processing is powerful, but it's important to correctly judge the nature of tasks, data size, and inter-process communication cost. multiprocessing is effective for CPU-bound processing, but poor design or synchronization mistakes can reduce performance. If you follow the basic rules and patterns, you can build safe and efficient parallel programs.

You can follow along with this article using Visual Studio Code on our YouTube channel, so please check it out as well.
