Idiomatic Python: functions versus classes

In Python, everything is an object. Compared to Java which forces you to code everything in an object-oriented programming style but still has the concept of primitive types which are not objects on their own (although Java 5 added autoboxing to help hide this discrepancy), Python has no primitive types which aren’t objects but provides the concept of functions to support procedural programming (Python also has minor support for functional programming but that’s another conversation). While this dichotomy in Python of supporting both object-oriented and procedural programming allows using the right approach for the needs of the problem, people coming from other languages that are only procedural or object-oriented tends to lead to people missing the opportunity to harness this dichotomy that Python provides. The hope of this post is to show people coming from object-oriented-heavy languages like Java and C# that using functions when it makes sense is an idiomatic use of Python.

The biggest question you need to ask yourself when working with Python is whether a class is even appropriate for the problem (Jack Diederich gave a talk at PyCon US 2012 on this exact topic)? If you take the time to think over the problem you’re trying to solve you may find the answer is “no” on a somewhat regular basis.

All of this begs the question of what exactly is object-oriented programming for? For many this is a question that they have never stopped to think about because they came from a language that was entirely object-oriented to begin with or a language that just completely lacked it. The naïve answer is that object-oriented programming provides a way to organize code that puts methods with the data they work with. But really that’s just a kind of namespace, and namespaces are not unique to object-oriented programming.

The truly important part of object-oriented programming is dispatching/messaging. This is what inheritance is built on top of and what lets an object choose which method to use at run-time. The classic ability of object-oriented programming to affect semantics by overriding a method in a subclass comes into play and shows its usefulness in this regard. When you want to change semantics of a method that another method uses, essentially injecting your differing semantics into the middle of executing code without having to copy-and-paste the surrounding code to make such a change is where object-oriented programming shines.

But is the ability to override methods always important? Consider the heapq module in Python’s standard library. While a heap could be viewed as a classic object-oriented programming problem by providing a heap object, it might not always make sense to dictate upfront that someone use a special class just to have a heap. In the case of heapq, you pass in a mutable sequence that is to be the heap into functions instead of creating a heap class to begin with. This becomes important in terms of API flexibility as it means that while your code may want to use a heap, you can have users of your code just give you a mutable sequence and you can turn it into a heap as you deem necessary. So in this instance, it’s much better to have your code which implements a heap be functions that can work with any mutable sequence instead of a class that someone must start out using.

Essentially, unless you see a need for users of your code to customize some bit of functionality that is in the middle of a call chain, don’t bother with using a class. Namespacing in Python is very rich, so you don’t need object-oriented programming to organize data with the code that works with it (simply create a module that groups code that operates on similar objects together). And if you expect users to only change semantics of your code at the beginning or end of a computation, then you still don’t need object-oriented programming as that’s just wrapping a function call to manipulate data before/after calling your function. In the end, you really only need to object-oriented programming to provide a mechanism for which the semantics of how something is computed or accessed at a fundamental level that permeates through code implicitly.

To help make this distinction more clear, think about heapsorting again. This is a straight-forward concept and it’s easy to write a function that takes a sequence and performs a heapsort which requires just comparing objects as less-than or equal to each other. Since the algorithm is already object-agnostic, the sorting function itself doesn’t need to be a method on the sequence object. But since comparing two objects can need to vary between objects and is embedded within the heapsort algorithm, the comparison operation should be a method that can be easily overridden. While you could take a comparison function as an argument to the sorting function, that comparison function would need to know how to handle all possible objects in the sequence, while having the comparison code attached to the object means an object just needs to worry about itself in the comparison. And so in this instance it makes sense for comparing to be an object-oriented feature while the sorting of a sequence isn’t.

And even when it does make sense to define a class, don’t overdo it. For instance, do not overdo the use of staticmethod. While it might be tempting to put all related code on a class even when it doesn’t require access to the instance or class itself, the use of staticmethod should be relegated to only things which a subclass may want to override. If you have a staticmethod which is private or is not called by other methods on the class then you should simply make it a function and pass the instance into the function. It works just as well and clearly separates what is tightly coupled to the class’ implementation and what isn’t.