If you program in Python, you have most likely encountered situations where you wanted to speed up some operation by executing multiple tasks in parallel or by interleaving between multiple tasks.
Python has mechanisms for taking both of these approaches, which we refer to as parallelism and concurrency. In this article we’ll detail the differences between parallelism and concurrency, and discuss how Python can employ these techniques where it makes the most sense.
Concurrency vs. parallelism
Concurrency and parallelism are names for two different mechanisms for juggling tasks in programming. Concurrency involves allowing multiple jobs to take turns accessing the same shared resources, like disk, network, or a single CPU core. Parallelism is about allowing several tasks to run side by side on independently partitioned resources, like multiple CPU cores.
Concurrency and parallelism have different aims. The goal of concurrency is to prevent tasks from blocking each other by switching among them when one is forced to wait on an external resource. A common example is completing multiple network requests. The crude way to do it is to launch one request, wait for it to finish, launch another, and so on. The concurrent way to do it is to launch all requests at once, then switch among them as they get responses back. Through concurrency, the waits overlap, so the total time approaches that of the slowest request rather than the sum of all of them.
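To make the difference concrete, here is a minimal sketch that compares the two approaches. It assumes a stand-in function, fake_request(), which only sleeps to simulate a slow server; the function names and the delay are illustrative and not part of any real API.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_request(name, delay=0.1):
    """Hypothetical stand-in for a network call: it only waits, then returns."""
    time.sleep(delay)
    return f"response from {name}"


def run_sequentially(names):
    # Each request starts only after the previous one has finished.
    return [fake_request(n) for n in names]


def run_concurrently(names):
    # All requests are started up front; their waiting periods overlap.
    with ThreadPoolExecutor(max_workers=max(1, len(names))) as ex:
        return list(ex.map(fake_request, names))


def timed(fn, *args):
    """Return (result, seconds elapsed) for a single call."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start
```

Calling timed(run_sequentially, names) and timed(run_concurrently, names) on the same list should return identical results, with the concurrent version finishing sooner because the waits overlap.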
Parallelism, by contrast, is about maximizing the use of hardware resources. If you have eight CPU cores, you don’t want to max out only one while the other seven lie idle. Rather, you want to launch processes or threads that make use of all those cores, if possible.
How Python implements concurrency and parallelism
Python provides mechanisms for both concurrency and parallelism, each with its own syntax and use cases.
Python has two different mechanisms for implementing concurrency, although they share many common components. These are threading and coroutines, or async.
For parallelism, Python offers multiprocessing, which launches multiple instances of the Python interpreter, each one running independently on its own hardware thread.
All three of these mechanisms — threading, coroutines, and multiprocessing — have distinctly different use cases. Threading and coroutines can often be used interchangeably, but not always. Multiprocessing is the most powerful, used for scenarios where you need to max out CPU utilization.
If you’re familiar with threading in general, threads in Python won’t be a big step. Threads in Python are units of work where you can take one or more functions and execute them independently of the rest of the program. You can then aggregate the results, typically by waiting for all threads to run to completion.
A simple example of threading in Python:
    from concurrent.futures import ThreadPoolExecutor
    import urllib.request as ur

    # shared list that every thread appends its result to
    datas = []

    def get_from(url):
        connection = ur.urlopen(url)
        data = connection.read()
        datas.append(data)

    urls = [
        "https://python.org",
        "https://docs.python.org/",
        "https://wikipedia.org",
        "https://imdb.com",
    ]

    # leaving the with block waits for all submitted tasks to finish
    with ThreadPoolExecutor() as ex:
        for url in urls:
            ex.submit(get_from, url)

    # let's just look at the beginning of each data stream
    # as this could be a lot of data
    print([_[:200] for _ in datas])
This snippet uses threading to read data from multiple URLs at once, running several instances of the get_from() function side by side. The results are then stored in a list.
Rather than create threads directly, the example uses one of Python’s convenient mechanisms for running threads, ThreadPoolExecutor. We could submit dozens of URLs this way without slowing things down much because each thread yields to the others whenever it’s only waiting for a remote server to respond.
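One possible variation, sketched here under assumptions, is to collect each thread’s return value from its future instead of appending to a shared list. The names fetch() and fetch_all(), and the fetcher parameter that lets you swap in a different download function, are hypothetical helpers introduced only for this illustration.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
import urllib.request as ur


def fetch(url, timeout=10):
    """Download the raw bytes at url (illustrative helper)."""
    with ur.urlopen(url, timeout=timeout) as connection:
        return connection.read()


def fetch_all(urls, fetcher=fetch):
    """Run fetcher on every URL in a thread pool.

    Returns a dict mapping each URL to its data, or to the exception
    raised while fetching it.
    """
    results = {}
    with ThreadPoolExecutor() as ex:
        # Remember which future belongs to which URL.
        futures = {ex.submit(fetcher, url): url for url in urls}
        for future in as_completed(futures):
            url = futures[future]
            try:
                results[url] = future.result()
            except Exception as exc:  # keep going if one URL fails
                results[url] = exc
    return results
```

Because each result travels back through its future, no state is shared between threads, and a failure on one URL does not hide the results of the others.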
Python users are often confused about whether threads in Python are the same as threads exposed by the underlying operating system. In CPython, the default Python implementation used in the vast majority of Python applications, Python threads are OS threads. They’re just managed by the Python runtime so that only one of them runs Python code at a time, yielding to one another as needed.
Advantages of Python threads
Threads in Python provide a convenient, well-understood way to run tasks that wait on other resources. The above example features a network call, but other waiting tasks could include a signal from a hardware device or a signal from the program’s main thread.
Also, as shown in the snippet above, Python’s standard library comes with high-level conveniences for running operations in threads. You don’t need to know how OS threads work to use Python threads.
Disadvantages of Python threads
As mentioned before, Python threads take turns rather than run simultaneously. The Python runtime lets only one thread execute Python code at a time, so that objects accessed by threads can be managed correctly. As a result, threads shouldn’t be used for CPU-intensive work. If you run a CPU-intensive operation in a thread, it will be paused whenever the runtime switches to another thread, so there will be no performance benefit over running that operation outside of a thread.
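A rough way to observe this is to time the same CPU-bound function run serially and in a thread pool. The sketch below assumes a deliberately naive prime counter, count_primes(), as the workload; both function names are illustrative. On a standard CPython build the two timings should be in the same range, though exact numbers depend on the machine and interpreter build.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def count_primes(limit):
    """Count the primes below limit by trial division (CPU-bound on purpose)."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def compare(limits):
    """Run count_primes over limits serially, then in threads.

    Returns (serial_results, threaded_results, serial_seconds, threaded_seconds).
    """
    start = time.perf_counter()
    serial = [count_primes(x) for x in limits]
    serial_time = time.perf_counter() - start

    start = time.perf_counter()
    with ThreadPoolExecutor() as ex:
        threaded = list(ex.map(count_primes, limits))
    threaded_time = time.perf_counter() - start

    return serial, threaded, serial_time, threaded_time
```

The results are identical either way; only the elapsed times are of interest, and they are best compared over several runs with larger limits.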
Another downside of threads is that you, the programmer, are responsible for managing state between them. In the above example, the only state outside of the threads is the contents of the datas list, which just aggregates the results from each thread. The only synchronization needed is provided automatically by the Python runtime when we append to the list. Nor do we check the state of that object until all threads run to completion anyway.
However, if we were to read and write to datas from different threads, we’d need to manually synchronize these operations to ensure we get the results we expect. The threading module does have tools to make this possible, but it falls to the developer to use them, and they’re complex enough to deserve a separate article.
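As a small taste of those tools, here is a sketch that guards a shared counter with threading.Lock. The SafeCounter class and count_in_threads() function are hypothetical names made up for this example; the lock ensures that each read-modify-write step completes before another thread can start its own.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class SafeCounter:
    """A counter whose updates are serialized by a lock (illustrative)."""

    def __init__(self):
        self._value = 0
        self._lock = threading.Lock()

    def increment(self):
        # Only one thread at a time may execute this block.
        with self._lock:
            self._value += 1

    @property
    def value(self):
        with self._lock:
            return self._value


def count_in_threads(workers=8, increments=10_000):
    """Have several threads increment one shared counter."""
    counter = SafeCounter()

    def work():
        for _ in range(increments):
            counter.increment()

    with ThreadPoolExecutor(max_workers=workers) as ex:
        for _ in range(workers):
            ex.submit(work)
    # The with block has waited for every worker to finish.
    return counter.value
```

With the lock in place, the final value should always equal workers multiplied by increments.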
Python coroutines and async are a different way to execute functions concurrently in Python, by way of special programming constructs rather than system threads. Coroutines are also managed by the Python runtime but require far less overhead than threads.
Here is another version of the previous program, written as an async/coroutine construct and using a library that supports asynchronous handling of network requests:
    import asyncio

    import aiohttp

    urls = [
        "https://imdb.com",
        "https://python.org",
        "https://docs.python.org",
        "https://wikipedia.org",
    ]

    async def get_from(session, url):
        async with session.get(url) as r:
            return await r.text()

    async def main():
        async with aiohttp.ClientSession() as session:
            datas = await asyncio.gather(*[get_from(session, u) for u in urls])
            print([_[:200] for _ in datas])

    if __name__ == "__main__":
        # asyncio.run() creates the event loop, runs main(), and closes the loop
        asyncio.run(main())
get_from() is a coroutine, i.e. a function object that can run side by side with other coroutines.
asyncio.gather launches several coroutines (multiple instances of get_from() fetching different URLs), waits until they all run to completion, and then returns their aggregated results as a list.
The aiohttp library allows network connections to be made asynchronously. We can’t use plain old urllib.request in a coroutine, because it would block the progress of other asynchronous requests.
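When no async-friendly library is available, one commonly used workaround is to hand the blocking call to a worker thread with asyncio.to_thread (available in Python 3.9 and later). The sketch below assumes a stand-in blocking function, blocking_read(), which only sleeps; both function names are illustrative.

```python
import asyncio
import time


def blocking_read(name, delay=0.05):
    """Hypothetical stand-in for a blocking call such as urllib.request.urlopen."""
    time.sleep(delay)
    return f"data from {name}"


async def read_all(names):
    # Each blocking call runs in a worker thread, so the event loop
    # stays free to make progress on other coroutines in the meantime.
    return await asyncio.gather(
        *(asyncio.to_thread(blocking_read, n) for n in names)
    )
```

This combines the two concurrency mechanisms: the coroutine syntax stays in place, while threads absorb the blocking calls. The results come back in the same order as the inputs.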
Advantages of Python coroutines
Coroutines make perfectly clear in the program’s syntax which functions run side by side. You can tell at a glance that get_from() is a coroutine. With threads, any function can be run in a thread, making it more difficult to reason about what may be running in a thread.
Another advantage of coroutines is that they are not bound by some of the architectural limitations of using threads. If you have many coroutines, there is less overhead involved in switching between them, and coroutines require slightly less memory than threads. Coroutines don’t even require threads, as they can be managed directly by the Python runtime, although they can be run in separate threads if needed.
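To give a sense of scale, the following sketch launches a large number of coroutines on a single thread. The tiny_task() and run_many() names and the default count of 10,000 are arbitrary choices for illustration; asyncio.sleep stands in for waiting on an external resource.

```python
import asyncio


async def tiny_task(i):
    # Simulate a short wait on an external resource.
    await asyncio.sleep(0.01)
    return i * 2


async def run_many(n=10_000):
    # All n coroutines are scheduled on one event loop in one thread.
    return await asyncio.gather(*(tiny_task(i) for i in range(n)))
```

Running asyncio.run(run_many()) should complete quickly because the waits overlap. Creating the same number of OS threads would typically cost far more memory and scheduling work.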
Disadvantages of Python coroutines
Coroutines and async require writing code that follows its own distinct syntax, the use of async def and await. Such code, by design, can’t be mingled with synchronous code. For programmers who aren’t used to thinking about how their code can run asynchronously, using coroutines and async presents a learning curve.
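The boundary between the two styles can be sketched as follows. Calling a coroutine function from ordinary code does not run it; it only creates a coroutine object, which must be awaited or handed to an event loop. The double() and call_from_sync_code() names are illustrative.

```python
import asyncio
import inspect


async def double(x):
    await asyncio.sleep(0)  # yield control to the event loop once
    return x * 2


def call_from_sync_code(x):
    # Synchronous code cannot use await, so it starts an event loop
    # with asyncio.run() and lets the loop drive the coroutine.
    return asyncio.run(double(x))


def is_pending_coroutine(obj):
    """Report whether obj is a coroutine object that has not been run."""
    return inspect.iscoroutine(obj)
```

Note that asyncio.run() cannot be called from code that is already running inside an event loop, which is part of why the two styles do not mix freely.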
Also, coroutines and async don’t enable CPU-intensive tasks to run efficiently side by side. As with threads, they’re designed for operations that need to wait on some external condition.
Multiprocessing allows you to run many CPU-intensive tasks side by side by launching multiple, independent copies of the Python runtime. Each Python instance receives the code and data needed to run the task in question.
Here is our web-reading script rewritten to use multiprocessing:
    import re
    import urllib.request as ur
    from multiprocessing import Pool

    urls = [
        "https://python.org",
        "https://docs.python.org",
        "https://wikipedia.org",
        "https://imdb.com",
    ]

    meta_match = re.compile("<meta .*?>")

    def get_from(url):
        connection = ur.urlopen(url)
        # decode the bytes to text before searching
        data = connection.read().decode("utf-8", errors="replace")
        return meta_match.findall(data)

    def main():
        with Pool() as p:
            datas = p.map(get_from, urls)
        # We're not truncating data here,
        # since we're only getting extracts anyway
        print(datas)

    if __name__ == "__main__":
        main()
The Pool() object represents a reusable group of processes.
.map() lets you submit a function to run across these processes, and an iterable to distribute between each instance of the function: in this case, get_from and the list of URLs.
One other key difference in this version of the script is that we perform a CPU-bound operation in get_from(). The regular expression searches for anything that looks like a meta tag. This isn’t the ideal way to look for such things, of course, but the point is that we can perform what could be a computationally expensive operation in get_from without having it block all the other requests.
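To isolate the CPU-bound part, here is a sketch that assumes the pages have already been downloaded as strings and distributes only the regular expression search across processes. The function names and the slightly stricter pattern are illustrative choices, not the ones used in the script above.

```python
import re
from multiprocessing import Pool

# Illustrative pattern: "<meta", one whitespace character, then anything up to ">"
META_TAG = re.compile(r"<meta\s[^>]*>", re.IGNORECASE)


def find_meta_tags(html):
    """Return every substring of html that looks like a meta tag."""
    return META_TAG.findall(html)


def find_in_parallel(pages, processes=None):
    """Search each page in a separate worker process.

    processes=None lets Pool choose a worker count based on the CPU count.
    """
    with Pool(processes) as pool:
        return pool.map(find_meta_tags, pages)
```

As in the script above, find_in_parallel() should be called from code guarded by if __name__ == "__main__": so that worker processes can import the module safely.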
Advantages of Python multiprocessing
With threading and coroutines, the Python runtime forces all operations to run serially, the better to manage access to any Python objects. Multiprocessing sidesteps this limitation by giving each operation a separate Python runtime and a full CPU core.
Disadvantages of Python multiprocessing
Multiprocessing has two distinct downsides. First, there is additional overhead associated with creating the processes. However, you can minimize the impact of this if you spin up those processes once over the lifetime of an application and reuse them. The Pool object we used in the example above can work like this: Once set up, we can submit jobs to it as needed, so there’s only a one-time cost across the lifetime of the program to start the subprocesses.
The second downside is that each subprocess needs to have a copy of the data it works with sent to it from the main process. Generally, each subprocess also has to return data to the main process. To do this, it uses Python’s pickle protocol, which serializes Python objects into binary form. Common objects (numbers, strings, lists, dictionaries, tuples, bytes, etc.) are all supported, but some exotic object types may not work.
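A quick way to check whether an object can cross the process boundary is to try pickling it yourself. The can_pickle() helper below is a hypothetical convenience written for this illustration.

```python
import pickle
import threading


def can_pickle(obj):
    """Return True if obj can be serialized with pickle, False otherwise."""
    try:
        pickle.dumps(obj)
    except Exception:  # pickling failures surface as several exception types
        return False
    return True


# Plain data structures serialize without trouble.
plain_data = {"numbers": [1, 2, 3], "name": "example", "raw": b"\x00\x01"}

# A lambda has no importable name, and a lock wraps OS-level state,
# so neither can be sent to a subprocess.
unpicklable = [lambda x: x, threading.Lock()]
```

Functions defined at the top level of a module are pickled by reference to their name, which is why the worker functions in the multiprocessing example are ordinary named functions rather than lambdas.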
Which form of Python concurrency to use
Whenever you are performing long-running, CPU-intensive operations, use multiprocessing. “CPU-intensive” can involve both work that happens directly in the Python runtime (e.g., the regular expressions example above) and work done with an external library like NumPy. In either case, you don’t want the Python runtime constrained to a single instance that blocks when doing CPU-based work.
For operations that don’t involve the CPU but require waiting on an external resource, like a network call, use threading or coroutines. While the difference in efficiency between the two is insignificant when dealing with only a few tasks at once, coroutines will be more efficient when dealing with thousands of tasks, as it’s easier for the runtime to manage large numbers of coroutines than large numbers of threads.
Finally, note that coroutines work best when using libraries that are themselves async-friendly, such as aiohttp in the example above. If the code your coroutines call is not async-friendly, meaning it blocks instead of awaiting, it can stall the progress of other coroutines.
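The stall is easy to reproduce in a small sketch. Here time.sleep stands in for any blocking call and asyncio.sleep for its async-friendly counterpart; the function names and the default count and delay are illustrative.

```python
import asyncio
import time


async def stalls_the_loop(delay):
    # Blocking call: nothing else on the event loop can run meanwhile.
    time.sleep(delay)


async def cooperates(delay):
    # Awaiting hands control back to the event loop during the wait.
    await asyncio.sleep(delay)


async def elapsed(coro_fn, count=3, delay=0.05):
    """Run count copies of coro_fn together and return the seconds taken."""
    start = time.perf_counter()
    await asyncio.gather(*(coro_fn(delay) for _ in range(count)))
    return time.perf_counter() - start
```

With the blocking version the waits run back to back, so the total time grows with the number of coroutines. With the cooperative version the waits overlap and the total stays close to a single delay.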