With the new async syntax in Python 3.5, defining asynchronous functions has become a lot simpler. This article walks through an example of the new feature.
The example code here pulls in a set of homepages of popular websites and displays the first 10 characters of every HTTP response. It uses the awesome aiohttp library. Please make sure that your machine has aiohttp and Python 3.5 installed.
Synchronous approach
First, let’s take a look at how this would have been solved in a naive, synchronous fashion. Let’s define our set of URLs that we want to retrieve.
sites = [
    'https://www.google.com',
    'https://www.yahoo.com',
    'https://www.bbc.co.uk',
    'https://en.wikipedia.org',
    'https://de.wikipedia.org',
    'https://news.ycombinator.com',
    'https://www.tagesschau.de',
]
FORMAT_STRING = "{site.url:<30.30}: {site.text:.10} in {site.elapsed}s"
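The format specifiers in FORMAT_STRING do most of the layout work, so here is a small sketch of what they produce. The fake_response object and its values are made up for illustration; it only mimics the url, text and elapsed attributes that a requests response offers.

```python
from datetime import timedelta
from types import SimpleNamespace

FORMAT_STRING = "{site.url:<30.30}: {site.text:.10} in {site.elapsed}s"

# Hypothetical stand-in for a requests response; all values are invented.
fake_response = SimpleNamespace(
    url="https://en.wikipedia.org/wiki/Main_Page",
    text="<!DOCTYPE html><html><head>",
    elapsed=timedelta(microseconds=172367),
)

# "<30.30" left-aligns the URL in a 30-character column and cuts off anything longer.
# ".10" keeps only the first 10 characters of the response body.
line = FORMAT_STRING.format(site=fake_response)
print(line)
```

Shorter URLs are padded with spaces up to 30 characters, which is what lines up the colons in the output.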
We also need some logic to retrieve the pages.
from requests import get

def get_site_snippet(site):
    # Fetch the page, then format its URL, first characters and response time
    return FORMAT_STRING.format(site=get(site))

def main():
    for site in sites:
        print(get_site_snippet(site))

main()
Once we run our example, we see the following output:
https://www.google.com/       : <!doctype  in 0:00:00.798940s
https://www.yahoo.com/        : <?xml vers in 0:00:00.883083s
http://www.bbc.co.uk/         : <!DOCTYPE  in 0:00:01.479646s
https://en.wikipedia.org/wiki/: <!DOCTYPE  in 0:00:00.172367s
https://de.wikipedia.org/wiki/: <!DOCTYPE  in 0:00:00.166793s
https://news.ycombinator.com/ : <html op=" in 0:00:01.464526s
https://www.tagesschau.de/    : <!DOCTYPE  in 0:00:00.991291s
This looks good. But knowing how this kind of code works, we quickly realize that a lot of time is spent waiting on external resources. HTTP servers can take a long time to respond to some requests, and depending on a multitude of factors, results are almost never delivered instantly.
Since one HTTP request has to finish before the next one can start, our program loses a considerable amount of time while it is blocked by an external resource. This is also known as an I/O-bound computation.
By interleaving the execution of several function calls, you can save a considerable amount of time. In this example, this means that while one function call is busy waiting for an HTTP response, another function call can already take the next site name and start the next HTTP request.
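The effect of interleaving can be tried out without any network access. The following is only a sketch: it assumes Python 3.7 or newer (for asyncio.run), and fake_request and the DELAYS values are hypothetical stand-ins for slow HTTP requests, simulated with asyncio.sleep.

```python
import asyncio
from time import perf_counter

# Hypothetical delays in seconds, standing in for slow HTTP servers.
DELAYS = [0.2, 0.1, 0.3]

async def fake_request(delay):
    # asyncio.sleep hands control back to the event loop while "waiting"
    await asyncio.sleep(delay)
    return delay

async def one_after_another():
    # Each request starts only after the previous one has finished
    return [await fake_request(delay) for delay in DELAYS]

async def interleaved():
    # All requests are started up front and wait at the same time
    return await asyncio.gather(*(fake_request(delay) for delay in DELAYS))

def timed(coroutine_function):
    start = perf_counter()
    asyncio.run(coroutine_function())
    return perf_counter() - start

sequential_time = timed(one_after_another)
interleaved_time = timed(interleaved)
print("sequential: {:.2f}s, interleaved: {:.2f}s".format(sequential_time, interleaved_time))
```

The sequential run takes roughly the sum of all delays, while the interleaved run takes roughly as long as the slowest single delay.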
Async all the things
We run the same task again, but with a twist. Instead of synchronously executing the get_site_snippet function, we are going to asynchronously get all website results and join the results in the end. Let’s take a look at how to achieve that.
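from datetime import datetime
from types import SimpleNamespace
from aiohttp import get  # module-level helper of older aiohttp releases; newer ones use ClientSession

async def async_get_site_snippet(site):
    start = datetime.now()
    response = await get(site)
    content = await response.read()
    page = SimpleNamespace(url=str(response.url), text=content.decode(errors='replace'), elapsed=datetime.now() - start)
    return FORMAT_STRING.format(site=page)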
A keen observer immediately notices the usage of async and await. I won’t go into the details of how these are handled internally in CPython. Let’s just say that the functions we’re calling do not return the desired result immediately. Instead, calling an async function returns a promise to eventually calculate a result. To be more precise, calling async_get_site_snippet('http://www.google.com') returns a coroutine object. The same applies to the two await calls: get(site) and response.read() both return such a promise, and await suspends our function until the result is available.
Now, by itself the coroutine object does nothing. Getting to the result of every coroutine call involves some extra code that we are going to add now.
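To see that a coroutine object really does nothing until something runs it, here is a minimal sketch. It assumes Python 3.7 or newer for asyncio.run, and fetch_number is a made-up coroutine function that records when its body executes.

```python
import asyncio
import inspect

calls = []

async def fetch_number():
    # Record that the body actually executed
    calls.append("ran")
    return 42

coroutine = fetch_number()            # only creates the coroutine object
was_coroutine = inspect.iscoroutine(coroutine)
ran_before = list(calls)              # still empty: the body has not run yet

value = asyncio.run(coroutine)        # an event loop drives it to completion
print(was_coroutine, ran_before, value, calls)
```

Only the asyncio.run call executes the function body; before that, calls stays empty.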
First, a list of tasks needs to be created, containing all the coroutine objects that need to be run.
from asyncio import get_event_loop, wait

def async_main():
    tasks = [async_get_site_snippet(site) for site in sites]
Then, we create a BaseEventLoop object that runs all our tasks until they have all completed. To execute all coroutines concurrently, the list of tasks needs to be wrapped with asyncio.wait. The BaseEventLoop object can then run the wrapped tasks until all results are returned.
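    loop = get_event_loop()
    # wait() returns (done, pending); we can safely discard pending futures.
    # Python 3.5 accepts bare coroutines here, Python 3.11+ requires Task objects.
    result, _ = loop.run_until_complete(wait(tasks))
    loop.close()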
The completed tasks returned by loop.run_until_complete are now contained in a set, in no particular order, and their results are ready to be retrieved.
    for task in result:
        print(task.result())

async_main()
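asyncio.wait hands back two sets, done and pending, so the order of the results is not tied to the order of sites. As a rough comparison, the sketch below puts it next to asyncio.gather, which this article does not use but which returns results in input order. It assumes Python 3.7 or newer for asyncio.run, and delayed_echo is a hypothetical stand-in for a page fetch.

```python
import asyncio

async def delayed_echo(value, delay):
    # Hypothetical stand-in for a request that takes `delay` seconds
    await asyncio.sleep(delay)
    return value

def make_coroutines():
    # Later values get shorter delays, so they finish first
    return [delayed_echo(n, 0.03 * (3 - n)) for n in range(3)]

async def with_wait():
    tasks = [asyncio.ensure_future(c) for c in make_coroutines()]
    # Returns a (done, pending) pair of sets once all tasks have finished
    return await asyncio.wait(tasks)

async def with_gather():
    # Returns a list of results in the order the coroutines were passed in
    return await asyncio.gather(*make_coroutines())

done, pending = asyncio.run(with_wait())
wait_results = sorted(task.result() for task in done)
gather_results = asyncio.run(with_gather())
print(type(done).__name__, len(pending), wait_results, gather_results)
```

If the output order matters, gather may be the more convenient choice; wait is handy when tasks should be handled as sets of finished and unfinished work.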
Pretty neat. The benefits become obvious immediately when we compare execution times:
from time import time

print("Running synchronous example")
start = time()
main()
duration = time() - start

print("Running asynchronous example")
async_start = time()
async_main()
async_duration = time() - async_start

print("Synchronous example took {} seconds".format(duration))
print("Asynchronous example took {} seconds".format(async_duration))
This outputs the following on my trusty laptop:
Running synchronous example
https://www.google.com/       : <!doctype  in 0:00:00.798940s
https://www.yahoo.com/        : <?xml vers in 0:00:00.883083s
http://www.bbc.co.uk/         : <!DOCTYPE  in 0:00:01.479646s
https://en.wikipedia.org/wiki/: <!DOCTYPE  in 0:00:00.172367s
https://de.wikipedia.org/wiki/: <!DOCTYPE  in 0:00:00.166793s
https://news.ycombinator.com/ : <html op=" in 0:00:01.464526s
https://www.tagesschau.de/    : <!DOCTYPE  in 0:00:00.991291s
Running asynchronous example
https://www.google.com/       : <!doctype  in 0:00:00.618827s
http://www.bbc.co.uk/         : <!DOCTYPE  in 0:00:00.501347s
https://en.wikipedia.org/wiki/: <!DOCTYPE  in 0:00:00.169479s
https://www.yahoo.com/        : <?xml vers in 0:00:00.762460s
https://news.ycombinator.com/ : <html op=" in 0:00:00.711696s
https://www.tagesschau.de/    : <!DOCTYPE  in 0:00:00.645607s
https://de.wikipedia.org/wiki/: <!DOCTYPE  in 0:00:00.167020s
Synchronous example took 12.025413990020752 seconds
Asynchronous example took 6.950876951217651 seconds
While the result may vary slightly depending on the network and server load, it becomes clear that we can shave off a few seconds of execution time by not letting HTTP requests block us. In this run, the execution time dropped from about 12 seconds to about 7, a reduction of roughly 40 percent.
Further reading
If you want to read more about Python’s asyncio, refer to the following sources:
Python documentation
- PEP 0492 - Coroutines with async and await syntax
- asyncio - Asynchronous I/O, event loop, coroutines and tasks