Python Packages and You
Jean-Paul Calderone wrote an excellent blog post on the right way to structure a python project. This post will build on that post by covering concrete examples of how to write imports, how to distribute your package, and what not to do.
As an example, we’ll be looking at my own project, passacre. There is a python package called passacre immediately in the root of the project (i.e. not in a lib or src directory). Here’s what’s inside it:
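The listing itself is not reproduced here, so what follows is only a sketch of a layout consistent with the modules this post names (config, multibase, application, __main__.py and the test subpackage). It is an assumption, and the real package contains more files. The snippet builds the empty skeleton in a temporary directory and prints it.

```python
import tempfile
from pathlib import Path

# Assumed layout: only the modules this post mentions by name.
# passacre's real tree has more files than this.
LAYOUT = [
    "passacre/__init__.py",
    "passacre/__main__.py",
    "passacre/application.py",
    "passacre/config.py",
    "passacre/multibase.py",
    "passacre/test/__init__.py",
    "passacre/test/test_application.py",
    "passacre/test/test_multibase.py",
]


def build_skeleton(root, layout=LAYOUT):
    """Create an empty file for every path in layout; return the sorted result."""
    root = Path(root)
    for rel in layout:
        path = root / rel
        path.parent.mkdir(parents=True, exist_ok=True)
        path.touch()
    return sorted(
        p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()
    )


with tempfile.TemporaryDirectory() as project_root:
    created = build_skeleton(project_root)
    for rel in created:
        print(rel)
```

The point to notice is that the passacre directory sits directly under the project root, with the tests living inside the package as passacre/test.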
Intra-package imports
There is exactly one way code in your package should be importing other code inside your package, and that’s with absolute imports.
Basic imports
So, in this code, passacre.config needs to import some things from passacre.multibase. The import looks like this:
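As a hedged illustration, the sketch below writes a tiny stand-in package into a temporary project root and runs passacre.config with python -m. The file bodies are placeholders, not passacre’s real source; the only line that matters is the absolute import inside config.py, which uses the Multibase name mentioned in this post.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Placeholder file bodies (assumptions, not passacre's real code).
FILES = {
    "passacre/__init__.py": "",
    "passacre/multibase.py": "class Multibase(object):\n    pass\n",
    "passacre/config.py": (
        "from passacre.multibase import Multibase\n"  # the absolute import
        "print(Multibase.__module__)\n"
    ),
}


def run_module(module, files=FILES):
    """Write files into a temp project root and run `python -m module` there."""
    with tempfile.TemporaryDirectory() as root:
        for rel, text in files.items():
            path = Path(root) / rel
            path.parent.mkdir(parents=True, exist_ok=True)
            path.write_text(text)
        return subprocess.run(
            [sys.executable, "-m", module],
            cwd=root, capture_output=True, text=True,
        )


result = run_module("passacre.config")
print(result.stdout.strip())
```

The printed module name is the fully-qualified one, which is exactly what the import statement spelled out.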
Note that the import uses the fully-qualified name of the module, passacre.multibase. This is important for two reasons:
- It makes your code very explicit in what you’re importing.
- Your code will continue to work in the future. There’s a very, very old method of doing imports (“implicit relative imports”) wherein one would write from multibase import Multibase instead, but this is gone in python 3. It should’ve been removed from python 2.7, but unfortunately, they forgot to do it.
There is another style of imports also covered in PEP 328, explicit relative imports, wherein the code would instead look like this:
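A minimal sketch of the explicit relative form, using the same kind of stand-in package as an assumption (placeholder file bodies, not passacre’s real source). The config.py here imports Multibase both ways and checks that the two spellings resolve to the same object.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Placeholder file bodies (assumptions, not passacre's real code).
FILES = {
    "passacre/__init__.py": "",
    "passacre/multibase.py": "class Multibase(object):\n    pass\n",
    "passacre/config.py": (
        "from passacre.multibase import Multibase as absolute\n"
        "from .multibase import Multibase as relative\n"  # explicit relative
        "print(absolute is relative)\n"
    ),
}


def run_module(module, files=FILES):
    """Write files into a temp project root and run `python -m module` there."""
    with tempfile.TemporaryDirectory() as root:
        for rel, text in files.items():
            path = Path(root) / rel
            path.parent.mkdir(parents=True, exist_ok=True)
            path.write_text(text)
        return subprocess.run(
            [sys.executable, "-m", module],
            cwd=root, capture_output=True, text=True,
        )


result = run_module("passacre.config")
print(result.stdout.strip())
```

The leading dot means “relative to the package this module lives in”, so the relative form only works when python knows which package the module belongs to.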
I’ll only mention explicit relative imports once more in this post; I very much prefer the look of absolute imports, but this is primarily an issue of style.
Test module imports
passacre also comes with a test suite in the passacre.test subpackage. These tests necessarily have to import the passacre code to test it. Here’s what the passacre.test.test_multibase module does to import passacre.multibase:
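As a sketch under the same assumptions (a stand-in package with placeholder bodies, and a made-up test case), here is a test module using the identical absolute import, run from the project root with the standard library’s unittest runner.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Hypothetical test body; only the import line mirrors the post.
TEST_MODULE = """\
import unittest

from passacre.multibase import Multibase


class MultibaseTests(unittest.TestCase):
    def test_imported_from_package(self):
        self.assertEqual(Multibase.__module__, "passacre.multibase")
"""

FILES = {
    "passacre/__init__.py": "",
    "passacre/multibase.py": "class Multibase(object):\n    pass\n",
    "passacre/test/__init__.py": "",
    "passacre/test/test_multibase.py": TEST_MODULE,
}


def run_tests(test_name, files=FILES):
    """Write files into a temp project root and run unittest on test_name."""
    with tempfile.TemporaryDirectory() as root:
        for rel, text in files.items():
            path = Path(root) / rel
            path.parent.mkdir(parents=True, exist_ok=True)
            path.write_text(text)
        return subprocess.run(
            [sys.executable, "-m", "unittest", test_name],
            cwd=root, capture_output=True, text=True,
        )


result = run_tests("passacre.test.test_multibase")
print(result.stderr.strip())  # unittest reports on stderr
```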
“But wait! That’s the same as before!” you cry. Since the fully-qualified module name is always the same, the imports are also always the same. This is another advantage of absolute imports—you always know exactly what you’re getting.
Sibling subpackage imports
For demonstration purposes, we’ll pretend that there’s another subpackage passacre.test2 that contains even more tests. Here’s the imaginary addition to the package structure:
passacre/test2/__init__.py
passacre/test2/test_multibase.py
This test_multibase module needs to use some utility function from passacre.test.util, so it imports the function:
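A hedged sketch of that sibling import. The post does not name the utility function, so shared_helper is a hypothetical name, and all file bodies are placeholders rather than passacre’s real code.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# shared_helper is a hypothetical name; the post does not name the function.
FILES = {
    "passacre/__init__.py": "",
    "passacre/test/__init__.py": "",
    "passacre/test/util.py": (
        "def shared_helper():\n"
        "    return 'helper from passacre.test.util'\n"
    ),
    "passacre/test2/__init__.py": "",
    "passacre/test2/test_multibase.py": (
        "from passacre.test.util import shared_helper\n"  # sibling subpackage
        "print(shared_helper())\n"
    ),
}


def run_module(module, files=FILES):
    """Write files into a temp project root and run `python -m module` there."""
    with tempfile.TemporaryDirectory() as root:
        for rel, text in files.items():
            path = Path(root) / rel
            path.parent.mkdir(parents=True, exist_ok=True)
            path.write_text(text)
        return subprocess.run(
            [sys.executable, "-m", module],
            cwd=root, capture_output=True, text=True,
        )


result = run_module("passacre.test2.test_multibase")
print(result.stdout.strip())
```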
Using absolute imports means that it doesn’t matter where in the project a module is when you try to import another module. Python always resolves the name starting from your top-level package, never from the importing module’s own directory.
Shadowed standard library module imports
Something I’m sure everyone has done at some point in their python career is name a module the same as a module in the standard library. If I wanted a passacre.socket module that used the standard library socket module, import socket should give me the standard library module instead of passacre.socket. PEP 328 covers this too by adding a __future__ feature absolute_import, which is on by default in python 3.0 and higher.
Activating absolute_import has exactly one effect: disabling implicit relative imports. Since __future__ features only affect the module enabling them, I can simply add from __future__ import absolute_import to passacre/socket.py and any import socket statements in it will import the standard library socket module.
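To make this concrete, here is a sketch with a stand-in passacre/socket.py (placeholder body, an assumption) run via python -m. On python 3 the __future__ line is a harmless no-op, so the same file behaves identically on both major versions.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Placeholder body for a module that shadows the stdlib name "socket".
FILES = {
    "passacre/__init__.py": "",
    "passacre/socket.py": (
        "from __future__ import absolute_import\n"
        "import socket\n"
        # gethostname exists only in the standard library module
        "print(hasattr(socket, 'gethostname'))\n"
    ),
}


def run_module(module, files=FILES):
    """Write files into a temp project root and run `python -m module` there."""
    with tempfile.TemporaryDirectory() as root:
        for rel, text in files.items():
            path = Path(root) / rel
            path.parent.mkdir(parents=True, exist_ok=True)
            path.write_text(text)
        return subprocess.run(
            [sys.executable, "-m", module],
            cwd=root, capture_output=True, text=True,
        )


result = run_module("passacre.socket")
print(result.stdout.strip())
```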
Distribution
Once your package has some code in it that can run, you’ll probably want to let other people use your package as well.
setup.py
setup.py is your package’s entry point into distutils or setuptools. Every package should have one, as that’s how your code gets put into the right place for your users. I won’t go too much into detail in how to use them; the links above are for tutorials which will explain how to get started.
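My own rule of thumb is to use distutils unless I absolutely require a feature from setuptools. Some people will say to always use setuptools, and that’s okay too. (On newer pythons the choice is made for you: distutils was deprecated in python 3.10 and removed from the standard library in 3.12.)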
Executable scripts
Jean-Paul’s post recommends putting python scripts in the bin directory of your project root. This is fine advice, but there are newer methods which mean you don’t need a bin directory at all.
First, there’s __main__.py. This is not quite the same as an executable script, but it allows a package to be executed via python -m. For example, passacre/__main__.py means that one can execute python -m passacre and the __main__.py file will be executed. PEP 338 has more details on exactly how this works.
passacre/__main__.py is fairly small:
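Going by the description in this post (import main from passacre.application and call it), a two-line __main__.py is enough. The sketch below is an assumption about its shape, with a placeholder application.py, and shows python -m passacre picking it up.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Placeholder application.py; the two-line __main__.py follows the
# description in the post but is still an assumption about the real file.
FILES = {
    "passacre/__init__.py": "",
    "passacre/application.py": (
        "def main():\n"
        "    print('passacre main() called')\n"
    ),
    "passacre/__main__.py": (
        "from passacre.application import main\n"
        "main()\n"
    ),
}


def run_package(package, files=FILES):
    """Write files into a temp project root and run `python -m package` there."""
    with tempfile.TemporaryDirectory() as root:
        for rel, text in files.items():
            path = Path(root) / rel
            path.parent.mkdir(parents=True, exist_ok=True)
            path.write_text(text)
        return subprocess.run(
            [sys.executable, "-m", package],
            cwd=root, capture_output=True, text=True,
        )


result = run_package("passacre")
print(result.stdout.strip())
```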
Since __main__.py is executed directly, the actual main function should live elsewhere if you want to be able to test it. This way, test code can also run from passacre.application import main and call that function itself.
python -m is not limited to executing packages. For times when you want to quickly run some module that has an if __name__ == '__main__': block in it, you can use python -m passacre.thatmodule. Importantly, this will ensure that python recognizes your whole package, which means that your imports won’t fail.
Sometimes a project wants explicit binaries instead of requiring its users to use python -m. For this case, setuptools has automatic script creation. Here’s an excerpt from passacre’s setup.py file:
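The excerpt is not reproduced here, so the following is only a sketch of the arguments such a setup.py might pass to setuptools. Everything except the console_scripts line (which follows the description in this post) is a placeholder. To keep the snippet runnable without installing anything, the arguments live in a plain dict and a small hypothetical helper, parse_console_script, splits the entry point string into its parts.

```python
# Hypothetical metadata; only the console_scripts line mirrors the post.
SETUP_KWARGS = dict(
    name="passacre",
    version="0.0",  # placeholder, not passacre's real version
    packages=["passacre", "passacre.test"],
    entry_points={
        "console_scripts": [
            "passacre = passacre.application:main",
        ],
    },
)
# In a real setup.py these would be handed to setuptools:
#     from setuptools import setup
#     setup(**SETUP_KWARGS)


def parse_console_script(spec):
    """Split 'name = package.module:function' into its three parts."""
    name, _, target = spec.partition("=")
    module, _, function = target.strip().partition(":")
    return name.strip(), module, function


for spec in SETUP_KWARGS["entry_points"]["console_scripts"]:
    print(parse_console_script(spec))
```

The left-hand side is the name of the executable to create, and the right-hand side names the module to import and the function in it to call.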
(The actual setup.py uses extras, but that’s not important for this blog post.)
This console_scripts definition makes a passacre executable that does the same thing as the __main__.py script did: imports the main function from passacre.application and calls it.
Using distutils or setuptools to generate and/or install your python scripts is important: as a part of the install process, the shebang line (e.g. #!/usr/bin/env python) will be rewritten to use the python that ran setup.py. So, if someone does /usr/local/bin/python2.6 setup.py install, passacre will start with #!/usr/local/bin/python2.6.
Additionally, python -m and automatic script creation can be used beautifully together: a project can have a scripts subpackage containing one module for every executable script. For example, passacre_generate could be defined as passacre_generate = passacre.scripts.generate:main, and passacre/scripts/generate.py could also contain:
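The usual idiom for this is an if __name__ == '__main__': guard that calls main(). The sketch below assumes a placeholder generate.py and shows both sides of the guard: python -m runs main(), while a plain import does not.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Placeholder generate.py (an assumption); the last two lines are the guard.
FILES = {
    "passacre/__init__.py": "",
    "passacre/scripts/__init__.py": "",
    "passacre/scripts/generate.py": (
        "def main():\n"
        "    print('generate main() called')\n"
        "\n"
        "if __name__ == '__main__':\n"
        "    main()\n"
    ),
}


def run_python(arg_lists, files=FILES):
    """Write files into a temp project root and run python once per arg list."""
    with tempfile.TemporaryDirectory() as root:
        for rel, text in files.items():
            path = Path(root) / rel
            path.parent.mkdir(parents=True, exist_ok=True)
            path.write_text(text)
        return [
            subprocess.run(
                [sys.executable, *args],
                cwd=root, capture_output=True, text=True,
            )
            for args in arg_lists
        ]


as_script, as_import = run_python([
    ["-m", "passacre.scripts.generate"],          # guard is true: main() runs
    ["-c", "import passacre.scripts.generate"],   # guard is false: no output
])
print(repr(as_script.stdout.strip()))
print(repr(as_import.stdout.strip()))
```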
For projects with a lot of scripts, this is a good organization method to keep the scripts separate from the more library-ish code.
Documentation
Please have some, especially if you’re distributing your package. Docstrings are a good start, but a tutorial written with sphinx and/or a README with usage examples goes a long way.
Common pitfalls
Don’t directly run modules inside packages
Seriously. Never ever ever do this. What I mean specifically is doing python passacre/__main__.py, or python passacre/test/test_application.py, or anything else that starts with python passacre/. This will prevent python from knowing where the root of your package is, and so absolute imports and explicit relative imports will fail, or not import the code you think it should be importing.
If you think you have to do this, you’re wrong. Instead, you probably want to use python -m, or generate an executable script to run, or use a test runner. If you don’t think any of those cover your use case, leave a comment and I’ll show you a better method.
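To see the failure for yourself, here is a sketch using a stand-in package with placeholder file bodies (an assumption). It runs the same __main__.py twice from the project root: once directly by path and once with python -m. It also assumes no passacre is installed in the python running it, since otherwise the direct run would quietly import that copy instead.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Placeholder file bodies (assumptions, not passacre's real code).
FILES = {
    "passacre/__init__.py": "",
    "passacre/application.py": "def main():\n    print('main() ran')\n",
    "passacre/__main__.py": "from passacre.application import main\nmain()\n",
}


def run_both_ways(files=FILES):
    """Run __main__.py by path and via `python -m` from the same project root."""
    with tempfile.TemporaryDirectory() as root:
        for rel, text in files.items():
            path = Path(root) / rel
            path.parent.mkdir(parents=True, exist_ok=True)
            path.write_text(text)

        def python(*args):
            return subprocess.run(
                [sys.executable, *args],
                cwd=root, capture_output=True, text=True,
            )

        direct = python("passacre/__main__.py")  # the pitfall
        via_module = python("-m", "passacre")    # the fix
    return direct, via_module


direct, via_module = run_both_ways()
print("direct run exit code:", direct.returncode)
print(direct.stderr.strip())
print("python -m exit code:", via_module.returncode)
print(via_module.stdout.strip())
```

Running by path puts the passacre directory itself at the front of sys.path, so there is no importable passacre package anywhere python looks; python -m puts the project root there instead.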
Don’t set PYTHONPATH to try to make it go
If you think you have to set PYTHONPATH, you’ve probably fallen victim to the first pitfall and are trying to execute a module directly. With proper package layout and proper imports, you won’t need to set PYTHONPATH to run your code.
Don’t modify sys.path from code in your package
Like modifying PYTHONPATH, but worse because it’s easier to affect other people using your code. Never do this, as it will break things and make people trying to use your code very annoyed.
Don’t make your project root a package
Your project root should contain a python package, not be a package itself. If you make the root itself a package, your setup.py will be very confusing (or not work at all) and it will be very difficult to run your code.
Conclusion
Really, writing a python package isn’t hard. The rules above are very simple and will lead to simple, easy-to-read-and-understand python code.