The `concurrent` Module in Python

In this article, we'll explain the concurrent module in Python.

While clarifying the concepts of concurrency and parallelism, we will explain how to implement asynchronous processing using the concurrent module with practical examples.

The concurrent Module in Python

When speeding up processing in Python, it's important to be mindful of the difference between concurrency and parallelism. The concurrent module is an important tool for handling asynchronous processing safely and simply with that difference in mind.

The Difference Between Concurrency and Parallelism

  • Concurrency means designing a process so that multiple tasks proceed by switching between them in small units of work. Even if tasks aren't actually running simultaneously, making use of "waiting time" allows you to make the entire process more efficient.

  • Parallelism is a mechanism that physically executes multiple tasks at the same time. By using multiple CPU cores, processing is advanced simultaneously.

Both are techniques to speed up processing, but concurrency is a design issue of 'how to proceed,' while parallelism is an execution issue of 'how it runs,' making them fundamentally different.

What Is the concurrent Module?

concurrent is a package in Python's standard library whose concurrent.futures submodule provides a high-level API for handling concurrency and parallelism safely and simply. It is designed so that you can focus on executing tasks without having to worry about low-level operations like creating and managing threads or processes.

Roles of ThreadPoolExecutor and ProcessPoolExecutor

The concurrent module provides two main options depending on the nature of the task.

  • ThreadPoolExecutor: Suitable for concurrent execution, especially tasks with many I/O waits, such as network or file operations. By switching between tasks, it makes effective use of waiting time.

  • ProcessPoolExecutor: Aimed at parallel processing and suitable for CPU-intensive tasks. It uses multiple processes to take full advantage of the available CPU cores.

Thus, a main feature of the concurrent module is that it provides a structure that allows you to properly choose between concurrency and parallelism as needed.

Basics of ThreadPoolExecutor (For I/O Tasks)

ThreadPoolExecutor is suitable for I/O-bound tasks, such as network communication and file operations. It distributes tasks among multiple threads, making effective use of waiting time.

from concurrent.futures import ThreadPoolExecutor
import time

def fetch_data(n):
    # Simulate an I/O-bound task
    time.sleep(1)
    return f"data-{n}"

with ThreadPoolExecutor(max_workers=3) as executor:
    futures = [executor.submit(fetch_data, i) for i in range(5)]

    for future in futures:
        print(future.result())

  • In this example, five I/O tasks that each wait for one second are executed concurrently. submit registers a function call as an asynchronous task and returns a Future, and calling result() waits for completion and returns the value. This makes it concise to implement concurrent processing that puts waiting time to good use.
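To see the effect of overlapping the waits, here is a rough timing sketch of the same pattern (the numbers are approximate and machine-dependent): five one-second tasks on three workers finish in roughly two seconds instead of five.

```python
from concurrent.futures import ThreadPoolExecutor
import time

def fetch_data(n):
    # Simulate an I/O-bound task that waits for one second
    time.sleep(1)
    return f"data-{n}"

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=3) as executor:
    futures = [executor.submit(fetch_data, i) for i in range(5)]
    results = [future.result() for future in futures]
elapsed = time.perf_counter() - start

# 5 tasks / 3 workers -> two "rounds" of waiting, roughly 2 seconds
print(f"elapsed: {elapsed:.1f}s", results)
```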

Simple Concurrency Using map

If complex control is unnecessary, using map can make your code more concise.

from concurrent.futures import ThreadPoolExecutor
import time

def fetch_data(n):
    # Simulate an I/O-bound task
    time.sleep(1)
    return f"data-{n}"

with ThreadPoolExecutor(max_workers=3) as executor:
    results = executor.map(fetch_data, range(5))

    for result in results:
        print(result)

  • In this example, multiple I/O tasks are executed concurrently using ThreadPoolExecutor.map. Since map returns results in the same order as the inputs, the code reads almost like sequential processing. A major advantage is that you get concurrent execution without having to handle Future objects explicitly.

Basics of ProcessPoolExecutor (For CPU-bound Tasks)

For heavy computations that fully utilize the CPU, you should use processes rather than threads. This allows you to avoid the Global Interpreter Lock (GIL) limitation.

from concurrent.futures import ProcessPoolExecutor

def heavy_calculation(n):
    # Simulate a CPU-bound task
    total = 0
    for i in range(10_000_000):
        total += i * n
    return total

if __name__ == "__main__":
    with ProcessPoolExecutor(max_workers=4) as executor:
        results = executor.map(heavy_calculation, range(4))

        for result in results:
            print(result)

In this example, CPU-intensive computations are executed in parallel using ProcessPoolExecutor. Because new worker processes are created, the if __name__ == "__main__": guard is required; with it in place, the computation can safely use multiple CPU cores in parallel.

Processing by Completion Order Using as_completed

as_completed is useful when you want to handle results in the order they complete.

from concurrent.futures import ThreadPoolExecutor, as_completed
import time

def fetch_data(n):
    # Simulate an I/O-bound task
    time.sleep(1)
    return f"data-{n}"

with ThreadPoolExecutor(max_workers=3) as executor:
    futures = [executor.submit(fetch_data, i) for i in range(5)]

    for future in as_completed(futures):
        print(future.result())

  • In this example, multiple asynchronous tasks run at the same time, and results are retrieved in the order in which they finish. Using as_completed lets you act on each result as soon as it is ready, regardless of submission order, which makes it well suited to progress display or processing results as they arrive.
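To make the completion order visible, a small variation (the sleep durations are illustrative) gives each task a different wait. Even though the tasks are submitted as 3, 1, 2, as_completed yields the fastest one first:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
import time

def fetch_data(n):
    # Larger n sleeps longer, so smaller n finishes first
    time.sleep(n * 0.3)
    return f"data-{n}"

with ThreadPoolExecutor(max_workers=3) as executor:
    # Submitted as 3, 1, 2, but yielded in completion order
    futures = [executor.submit(fetch_data, n) for n in (3, 1, 2)]
    completed = [future.result() for future in as_completed(futures)]

print(completed)
```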

Handling Exceptions

With concurrent, an exception raised inside a task is stored in its Future and re-raised when you call result().

from concurrent.futures import ThreadPoolExecutor

def risky_task(n):
    # Simulate a task that may fail for a specific input
    if n == 3:
        raise ValueError("Something went wrong")
    return n * 2

with ThreadPoolExecutor() as executor:
    futures = [executor.submit(risky_task, i) for i in range(5)]

    for future in futures:
        try:
            print(future.result())
        except Exception as e:
            print("Error:", e)

  • This example demonstrates that even if some tasks raise exceptions, the remaining tasks continue to run, and you can handle each exception individually when retrieving results. A key point is that concurrent's Future lets you handle both successes and failures in asynchronous processing safely.
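When you only want to inspect failures without re-raising them, Future also offers exception(), which waits for the task to finish and returns the raised exception, or None on success. A sketch using the same risky_task:

```python
from concurrent.futures import ThreadPoolExecutor

def risky_task(n):
    # Simulate a task that fails for a specific input
    if n == 3:
        raise ValueError("Something went wrong")
    return n * 2

with ThreadPoolExecutor() as executor:
    futures = [executor.submit(risky_task, i) for i in range(5)]

# exception() returns the stored exception (or None) without re-raising
errors = [future.exception() for future in futures]
results = [f.result() for f in futures if f.exception() is None]
print(results, errors)
```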

Guidelines for Choosing Between Threads and Processes

To use concurrency and parallelism effectively, it's important to choose the right approach based on the nature of the task.

In practice, the following criteria can help you decide.

  • For processes with a lot of I/O waits, such as communication or file operations, use ThreadPoolExecutor.
  • For CPU-bound, heavy computational tasks, use ProcessPoolExecutor.
  • If there are many simple tasks, using map allows you to write more concise code.
  • If precise control over execution order or exception handling is important, combine submit with as_completed.
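The criteria above can be folded into a small helper. This is only a sketch; run_tasks and the io_bound flag are names invented here, not part of the library:

```python
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

def run_tasks(func, inputs, io_bound=True, workers=4):
    # Hypothetical helper: choose the executor from the task's nature
    executor_cls = ThreadPoolExecutor if io_bound else ProcessPoolExecutor
    with executor_cls(max_workers=workers) as executor:
        return list(executor.map(func, inputs))

# I/O-bound work runs on threads; for CPU-bound work, pass io_bound=False
# and call the helper from under an if __name__ == "__main__": guard
print(run_tasks(str.upper, ["a", "b", "c"]))
```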

Benefits of Using concurrent

By using the concurrent module, you can handle asynchronous processing safely and intuitively.

The main benefits are as follows:

  • You don't have to bother with low-level thread or process management.
  • It's provided as part of Python's standard library, so you can use it with confidence.
  • The code becomes more readable and maintainable.
  • It's ideal as a first step in learning about concurrency and parallelism.

Simply keeping these guidelines in mind can greatly reduce failures in implementations using concurrent.

Summary

The concurrent module is the standard option for practical concurrency and parallelism in Python. It allows you to boost performance without greatly changing the content of your processing, which is a significant benefit in actual practice. By using concurrent, you can implement asynchronous processing concisely while safely managing exception handling and execution control.

You can follow along with this article using Visual Studio Code on our YouTube channel, so please check it out.
