March 14th, 2016

Idiomatic Python: comprehensions

Brett Cannon
Principal Software Engineering Manager

We’re lucky to have a few people on our team who have been programming in Python for quite a while (I myself have been using the language for over 15 years). Over time we have picked up various idioms for programming in Python that may not be obvious or widely known for various reasons. To help share some of this knowledge, we plan to write occasional blog posts titled “Idiomatic Python” where we choose a topic and give some tips about it. Some of these posts may be long, some may be short. Either way, we hope you will discover something useful in them (even experienced developers, as we may sprinkle in bits of Python history here and there).

For the inaugural post of this series, I thought we could talk about comprehensions. In pretty much every programming language, a common idiom for filling a container is a for loop:

container = []
for x in another_container:
    container.append(x**2)

It’s a very simple, straightforward way to fill something like a list with objects. But this idiom is so common that it becomes a bit tiresome to write over and over again. This is why list comprehensions were added to the language in Python 2.0:

container = [x**2 for x in another_container]

(A bit of history: list comprehensions in Python were inspired by list comprehensions in Haskell.) List comprehensions have become widely used in the Python community because they provide an easy, compact way to create a list from another container (technically from any iterable, but every container type should be iterable). List comprehensions are also typically faster than the equivalent for loop, because CPython can append each item directly instead of looking up and calling the list’s append() method on every iteration.
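As a quick, hedged illustration of both points, the sketch below feeds a tuple, a range, and a dict (which iterates over its keys) to a comprehension, and checks that an explicit loop and a list comprehension build the same list. The function names `squares_loop` and `squares_listcomp` are made up for this example, and the printed timings depend on your machine and Python version, so treat them as a rough indication rather than a benchmark.

```python
import timeit


def squares_loop(iterable):
    # The explicit for-loop idiom: append() is looked up and called each pass.
    result = []
    for x in iterable:
        result.append(x**2)
    return result


def squares_listcomp(iterable):
    # The same work written as a list comprehension.
    return [x**2 for x in iterable]


# The source only has to be iterable: a tuple, a range, or a dict (its keys).
for source in [(1, 2, 3), range(1, 4), {1: "a", 2: "b", 3: "c"}]:
    assert squares_loop(source) == squares_listcomp(source) == [1, 4, 9]

data = list(range(10_000))
loop_time = timeit.timeit(lambda: squares_loop(data), number=100)
comp_time = timeit.timeit(lambda: squares_listcomp(data), number=100)
print(f"for loop: {loop_time:.3f}s  list comprehension: {comp_time:.3f}s")
```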

What you may not be aware of is that Python also has comprehension forms for generators, dicts, and sets. Generator expressions were added in Python 2.4. A cross between a generator and a list comprehension, a generator expression is great when you want to save memory: instead of building an entire list in memory at once, it only uses the memory needed to produce one item of the sequence at a time (and who doesn’t want to save memory?):

container = (x**2 for x in another_container)
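To see the memory difference for yourself, here is a small sketch using `sys.getsizeof()`. Note that it only measures the container object itself, not the integers it refers to, so it understates how much memory the list really uses; the exact byte counts will vary by platform and Python version.

```python
import sys

N = 1_000_000

squares_list = [x**2 for x in range(N)]  # all N items are computed right away
squares_gen = (x**2 for x in range(N))   # nothing has been computed yet

# The list needs one pointer slot per item; the generator object's size
# does not depend on N at all.
list_bytes = sys.getsizeof(squares_list)
gen_bytes = sys.getsizeof(squares_gen)
print(f"list: {list_bytes:,} bytes  generator: {gen_bytes:,} bytes")

# The generator still hands out every item, one at a time, on demand.
print(next(squares_gen), next(squares_gen), next(squares_gen))  # 0 1 4
```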

Much like list comprehensions, generator expressions are more or less syntactic sugar for the following:

def _():
    for x in another_container:
        yield x**2
container = _()
del _
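Here is a minimal sketch for checking that equivalence, assuming a small `another_container` list. The `calls` list and `square()` helper are hypothetical instrumentation added only to show when work actually happens: neither form computes anything until an item is requested, both yield the same values, and both are used up after a single pass.

```python
another_container = [1, 2, 3]
calls = []  # hypothetical instrumentation: records every call to square()


def square(x):
    calls.append(x)
    return x**2


# Generator expression form.
genexp = (square(x) for x in another_container)


# Hand-written generator function form, mirroring the desugaring above.
def _():
    for x in another_container:
        yield square(x)


genfunc = _()
del _

print(calls)          # []  creating either generator does no work yet
print(list(genexp))   # [1, 4, 9]
print(list(genfunc))  # [1, 4, 9]
print(list(genexp))   # []  a generator can only be iterated once
```

One wrinkle covered by “more or less”: the generator expression evaluates `another_container` (and calls `iter()` on it) immediately when the expression is created, while the generator function does not touch `another_container` until the first item is requested.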

To really see the difference, try the following two examples individually, but be aware that the second one will take a while to complete (and will use a lot of memory):

# Use xrange() instead of range() if you're using Python 2.7.
# Generator expression
('genexp' for _ in range(100000000))
# List comprehension
['listcomp' for _ in range(100000000)]

Any time you reach for a list comprehension, stop and think about how the resulting list will be used. If the list is just going to be iterated over (every object accessed sequentially, as in a for loop), then you should use a generator expression instead.
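A common way to follow this advice is to pass a generator expression straight to a function that consumes an iterable, such as `sum()` or `any()`. The sketch below assumes a hypothetical `is_big()` predicate that logs every item it checks, to show that `any()` stops early with a generator expression but a list comprehension computes every item up front.

```python
another_container = range(10)

# sum() walks its argument once, so a generator expression is enough. When a
# generator expression is the only argument, its own parentheses can be dropped.
total = sum(x**2 for x in another_container)
print(total)  # 285


def is_big(x, log):
    # Hypothetical predicate that records every item it is asked about.
    log.append(x)
    return x > 3


lazy_log, eager_log = [], []
# any() stops at the first true result, so the generator never produces 5..9.
lazy_found = any(is_big(x, lazy_log) for x in another_container)
# The list comprehension evaluates is_big() for every item before any() starts.
eager_found = any([is_big(x, eager_log) for x in another_container])
print(lazy_log)   # [0, 1, 2, 3, 4]
print(eager_log)  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```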

Preferring generator expressions also segues into a related idiom: when you only care about a sequence of objects, write your APIs to accept or return an iterable rather than a concrete type like a list. This ties in nicely with how list comprehensions work in Python 3, where they behave essentially like a generator expression passed to list(). It also means that returning an iterable such as a generator expression doesn’t prevent users of your code from easily getting a list if that’s what they want:

container = list(x**2 for x in another_container)
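A minimal sketch of such an API, using a hypothetical `squares()` function annotated with the `Iterable` and `Iterator` types from `collections.abc` (subscripting them like this assumes Python 3.9 or newer). Callers can pass any iterable, keep iterating lazily over the result, or call `list()` when they need a real list.

```python
from collections.abc import Iterable, Iterator


def squares(numbers: Iterable[int]) -> Iterator[int]:
    """Hypothetical API: accept any iterable of ints, return a lazy iterator."""
    return (x**2 for x in numbers)


# Callers can pass whatever iterable they already have, even another generator.
print(sum(squares(range(4))))            # 14
print(list(squares(n for n in (5, 6))))  # [25, 36]

# A caller who really needs a list just asks for one.
as_list = list(squares([1, 2, 3]))
print(as_list)                           # [1, 4, 9]

# In Python 3 a list comprehension gives the same result as list() over the
# equivalent generator expression.
assert [x**2 for x in range(5)] == list(x**2 for x in range(5))
```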

List comprehensions and generator expressions became so popular that Python extended the concept of comprehensions, introducing dict and set comprehensions in Python 3.0 (which were backported to Python 2.7). As the names imply, dict and set comprehensions provide syntactic sugar for creating dicts and sets with a for loop. Set comprehensions look like a generator expression but with curly braces:

container = {x**2 for x in another_container}
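A short sketch of a set comprehension: as with any set, duplicate results collapse into one, and the result equals passing the matching generator expression to `set()`. One related gotcha worth knowing is that empty curly braces `{}` create an empty dict, not an empty set.

```python
another_container = [-2, -1, 0, 1, 2, 2]

unique_squares = {x**2 for x in another_container}
print(sorted(unique_squares))  # [0, 1, 4]; duplicates collapse as in any set

# Same result as passing the equivalent generator expression to set().
assert unique_squares == set(x**2 for x in another_container)

# Empty curly braces are a dict literal; use set() for an empty set.
print(type({}).__name__)               # dict
print(type({x for x in []}).__name__)  # set
```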

Dict comprehensions look like set comprehensions, but use a colon to separate the key and value for each item in the dict, just like in dict literals:

container = {x: x**2 for x in another_container}
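A small sketch of dict comprehensions, assuming a plain list as the source. It also shows that a dict comprehension matches passing `(key, value)` pairs from a generator expression to `dict()`, and that when two items produce the same key the later value wins, just as repeated assignment in a for loop would behave.

```python
another_container = [1, 2, 3]

squares_by_number = {x: x**2 for x in another_container}
print(squares_by_number)  # {1: 1, 2: 4, 3: 9}

# Equivalent to handing (key, value) pairs from a generator expression to dict().
assert squares_by_number == dict((x, x**2) for x in another_container)

# Keys can be computed from each item; if two items produce the same key,
# the later value replaces the earlier one.
words = ["apple", "avocado", "banana"]
by_first_letter = {word[0]: word for word in words}
print(by_first_letter)  # {'a': 'avocado', 'b': 'banana'}
```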

You may have noticed that, of all the built-in container types in Python, only tuples lack a comprehension form. This is on purpose: comprehensions are meant to be thought of as syntactic sugar for a for loop, and since tuples are immutable you can’t build one up item by item in a for loop, so a tuple comprehension wouldn’t make sense. It’s also easy to create a tuple from a generator expression, so no functionality is lost, and there is no performance penalty compared to using a for loop to build a list that you then convert to a tuple:

container = tuple(x**2 for x in another_container)
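A quick sketch confirming that both routes produce the same tuple. It also shows that wrapping a comprehension in parentheses gives you a generator expression, not a tuple.

```python
another_container = range(5)

container = tuple(x**2 for x in another_container)
print(container)  # (0, 1, 4, 9, 16)

# Parentheses alone give a generator expression, not a "tuple comprehension".
not_a_tuple = (x**2 for x in another_container)
print(type(not_a_tuple).__name__)  # generator

# The loop-then-convert alternative builds a throwaway list along the way.
temp = []
for x in another_container:
    temp.append(x**2)
assert tuple(temp) == container
```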

To summarize:

  • List comprehensions were inspired by Haskell: [x**2 for x in another_container]
  • Generator expressions are like list comprehensions but for generators: (x**2 for x in another_container)
  • Prefer generator expressions over list comprehensions, and only use a list comprehension when you specifically need a list
  • When you only care about a sequence of objects, have your APIs take/return iterables instead of concrete types like a list
  • Python 2.7/3.0 added set comprehensions: {x**2 for x in another_container}
  • Python 2.7/3.0 also added dict comprehensions: {x: x**2 for x in another_container}

Hopefully you found this post useful! If you did, please let us know so we can gauge whether we should do more posts like this in the future.

Author

Brett Cannon
Principal Software Engineering Manager

Dev lead on the Python extension for Visual Studio Code. Python core developer since April 2003.
