Python Packages and You

Jean-Paul Calderone wrote an excellent blog post on the right way to structure a python project. This post will build on that post by covering concrete examples of how to write imports, how to distribute your package, and what not to do.

As an example, we’ll be looking at my own project, passacre. There is a python package called passacre immediately in the root of the project (i.e. not in a lib or src directory). Here’s what’s inside it:

1
2
3
4
5
6
7
8
9
10
11
12
13
passacre/__init__.py
passacre/__main__.py
passacre/application.py
passacre/config.py
passacre/generator.py
passacre/multibase.py
passacre/test/__init__.py
passacre/test/test_application.py
passacre/test/test_config.py
passacre/test/test_generator.py
passacre/test/test_multibase.py
passacre/test/util.py
passacre/util.py

Intra-package imports

There is exactly one way code in your package should be importing other code inside your package, and that’s with absolute imports.

Basic imports

So, in this code, passacre.config needs to import some things from passacre.multibase. The import looks like this:

1
from passacre.multibase import MultiBase

Note that the import uses the fully-qualified name of the module, passacre.multibase. This is important for two reasons:

  1. It makes your code very explicit in what you’re importing.
  2. Your code will continue to work in the future. There’s a very, very old method of doing imports (“implicit relative imports”) wherein one would write from multibase import Multibase instead, but this is gone in python 3. It should’ve been removed from python 2.7, but unfortunately, they forgot to do it.

There is another style of imports also covered in PEP 328, wherein the code would look instead look like this:

1
from .multibase import MultiBase

I’ll only mention explicit relative imports once more in this post; I very much prefer the look of absolute imports, but this is primarily an issue of style.

Test module imports

passacre also comes with a test suite in the passacre.test submodule. These tests necessarily have to import the passacre code to test it. Here’s what the passacre.test.test_multibase module does to import passacre.multibase:

1
from passacre.multibase import MultiBase

“But wait! That’s the same as before!” you cry. Since the fully-qualified module name is always the same, the imports are also always the same. This is another advantage of absolute imports—you always know exactly what you’re getting.

Sibling subpackage imports

For demonstration purposes, we’ll pretend that there’s another subpackage passacre.test2 that contains even more tests. Here’s the imaginary addition to the package structure:

1
2
passacre/test2/__init__.py
passacre/test2/test_multibase.py

This test_multibase module needs to use some utility function from passacre.test.util, so it imports the function:

1
from passacre.test.util import excinfo_arg_0

Using absolute imports means that it doesn’t matter where in the project a module is when you try to import another module. Python will always look up the module to import from the root of your package.

Shadowed standard library module imports

Something I’m sure everyone has done at some point in their python career is name a module the same as a module in the standard library. If I wanted a passacre.socket module that used the standard library socket module, import socket should give me the standard library module instead of passacre.socket. PEP 328 covers this too by adding a __future__ feature absolute_import, which is on by default in python 3.0 and higher.

Activating absolute_import has exactly one effect: disabling implicit relative imports. Since __future__ features only affect the module enabling them, I can simply add from __future__ import absolute_import to passacre/socket.py and any import socket calls in it will import the standard library socket module.

Distribution

Once your package has some code in it that can run, you’ll probably want to let other people use your package as well.

setup.py

setup.py is your package’s entry point into distutils or setuptools. Every package should have one, as that’s how your code gets put into the right place for your users. I won’t go too much into detail in how to use them; the links above are for tutorials which will explain how to get started.

My own rule of thumb is to use distutils unless I absolutely require a feature from setuptools. Some people will say to always use setuptools, and that’s okay too.

Executable scripts

Jean-Paul’s post recommends putting python scripts in the bin directory of your project root. This is fine advice, but there are newer methods which mean you don’t need a bin directory at all.

First, there’s __main__.py. This is not quite the same as an executable script, but it allows a package to be executed via python -m. For example, passacre/__main__.py means that one can execute python -m passacre and the __main__.py file will be executed. PEP 338 has more details on exactly how this works.

passacre/__main__.py is fairly small:

1
2
from passacre.application import main
main()

Since __main__.py is executed directly, the actual main function should live elsewhere if you actually want to be able to test it. This way, test code can also from passacre.application import main and be able to call that function.

python -m is not limited to executing packages. For times when you want to quickly run some module that has an if __name__ == '__main__': block in it, you can use python -m passacre.thatmodule. Importantly, this will ensure that python recognizes your whole package, which means that your imports won’t fail.

Sometimes a project wants explicit binaries instead of requiring their users to use python -m. For this case, setuptools has automatic script creation. Here’s an excerpt from passacre’s setup.py file:

1
2
3
4
5
6
7
8
9
10
from setuptools import setup

# [...]

setup(
    # [...]
    entry_points={
        'console_scripts': ['passacre = passacre.application:main'],
    },
)

(The actual setup.py uses extras, but that’s not important for this blog post.)

This console_scripts definition makes a passacre executable that does the same thing as the __main__.py script did: imports the main function from passacre.application and calls it.

Using distutils or setuptools to generate and/or install your python scripts is important—as a part of the install process, the shebang line (e.g. #!/usr/bin/env python) will be rewritten to use the python that ran setup.py. So, if someone does /usr/local/bin/python2.6 setup.py install, passacre will start with #!/usr/local/bin/python2.6.

Additionally, python -m and automatic script creation can be used beautifully together—a project can have a scripts subpackage containing one module for every executable script. For example, passacre_generate could be passacre_generate = passacre.scripts.generate:main and also contain:

1
2
if __name__ == '__main__':
    main()

For projects with a lot of scripts, this is a good organization method to keep the scripts separate from the more library-ish code.

Documentation

Please have some, especially if you’re distributing your package. Docstrings are a good start, but a tutorial written with sphinx and/or a README with usage examples goes a long way.

Common pitfalls

Don’t directly run modules inside packages

Seriously. Never ever ever do this. What I mean specifically is doing python passacre/__main__.py, or python passacre/test/test_application.py, or anything else that starts with python passacre/. This will prevent python from knowing where the root of your package is, and so absolute imports and explicit relative imports will fail, or not import the code you think it should be importing.

If you think you have to do this, you’re wrong. Instead, you probably want to use python -m, or generate an executable script to run, or use a test runner. If you don’t think any of those cover your use case, leave a comment and I’ll show you a better method.

Don’t set PYTHONPATH to try to make it go

If you think you have to set PYTHONPATH, you’ve probably fallen victim to the first pitfall and are trying to execute a module directly. With proper package layout and proper imports, you won’t need to set PYTHONPATH to run your code.

Don’t modify sys.path from code in your package

Like modifying PYTHONPATH, but worse because it’s easier to affect other people using your code. Never do this, as it will break and make people trying to use your code very annoyed.

Don’t make your project root a package

Your project root should contain a python package, not be a package itself. If you do this, your setup.py will be very confusing (or not work at all) and it will be very difficult to run your code.

Conclusion

Really, writing a python package isn’t hard. The rules above are very simple and will lead to simple, easy-to-read-and-understand python code.

Comments