Most Python programmers are at least vaguely aware of sys.path, PYTHONPATH,
and the effect they have on importing modules. However, there's a lot
of confusion about how to use them properly, and how powerful these concepts
can be if you know how to apply them. Twisted - and in particular its
plugin system - makes very nuanced use of the python path, which can
sometimes make things that rely on it a bit hard to explain, since there isn't
a well-defined common terminology or good library support for working with
paths, except to the extent that they are used by importers.
This article is really about two things: the general concept of paths, and
the Twisted module "twisted.python.modules", which provides some specific
implementations of my ideas about the python path.
First of all, why should you care about python paths? To put it
simply, because very bizarre problems can result if you use them
incorrectly. Also, you need to know about them in order to use
Twisted's plugin system effectively, and of course you want to use
Twisted, right? :)
What kind of problems? Even very popular, well-regarded Python
packages by very experienced Python programmers sometimes mess this up
pretty badly. Here's a simple example of what can go wrong with a
package you probably know of, the Python Imaging Library:
>>> import Image
>>> import PIL.Image
>>> img = PIL.Image.Image()
>>> Image.__file__
'/usr/lib/python2.5/site-packages/PIL/Image.pyc'
>>> PIL.Image.__file__
'/usr/lib/python2.5/site-packages/PIL/Image.pyc'
Here we can see that you can import PIL's "Image" module as
either "PIL.Image" or simply "Image". Both these modules are loaded
from the same file. On the face of it, this is simply a
convenience. But let's dig deeper:
>>> PIL.Image == Image
False
The modules aren't the same object! This has some nasty
practical repercussions:
>>> isinstance(img, Image.Image)
False
For example, Image objects created from one of these PIL
modules do not register as instances of the other's Image class, even though
they're all the same code. Worse yet, this mistake can become "sticky" if you
use them along with a module like pickle, which carries the module and class
name into the data:
>>> from cPickle import dumps
>>> img2 = Image.Image()
>>> dumps(img)
"(iPIL.Image\nImage\n ...
>>> dumps(img2)
"(iImage\nImage\n ...
Many Python features and packages depend on matching
types. Zope Interface, for example, will not let you use adapters registered
for one Image type with the other; the objects will not compare equal even
if they really are equivalent, and so on. And none of this is a bug in the
code! Why does it happen?
PIL is a package; that is, a directory with Python source code and an
"__init__.py" in it, named "PIL". However, it also installs a ".pth"
file as part of its installation. ".pth" files are one way to add
entries to your sys.path. This particular one adds the "PIL" directory
to your path, which means its modules can be loaded from two entries: as part
of a package, from your "site-packages" directory, and as top-level modules,
from the "PIL" directory itself.
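You don't need PIL installed to see this effect. Here is a minimal sketch for a current Python 3, using only the standard library. The names "demo_pkg" and "demo_shapes" are made up for illustration, and putting both directories on sys.path by hand is my stand-in for what the ".pth" file does:

```python
import importlib
import os
import pickle
import sys
import tempfile

# Hypothetical layout: <root>/demo_pkg/__init__.py and <root>/demo_pkg/demo_shapes.py
root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "demo_pkg")
os.mkdir(pkg_dir)
with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
    f.write("")
with open(os.path.join(pkg_dir, "demo_shapes.py"), "w") as f:
    f.write("class Shape(object):\n    pass\n")

# Put the parent directory AND the package directory itself on sys.path.
# This is the assumption that mimics the effect of the ".pth" file.
sys.path[:0] = [root, pkg_dir]
importlib.invalidate_caches()

packaged = importlib.import_module("demo_pkg.demo_shapes")
top_level = importlib.import_module("demo_shapes")

# Same file on disk, but two separate module objects in sys.modules.
same_file = os.path.samefile(packaged.__file__, top_level.__file__)
same_module = packaged is top_level

# An instance made through one name is not an instance of the other's class.
cross_instance = isinstance(packaged.Shape(), top_level.Shape)

# Pickle records whichever module name the class was created under.
pickled_packaged = pickle.dumps(packaged.Shape(), protocol=0)
pickled_top = pickle.dumps(top_level.Shape(), protocol=0)

print(same_file, same_module, cross_instance)
print(pickled_packaged)
print(pickled_top)
```

The two pickles name different modules, so data written under one name can only be read back where that same name is importable.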
This isn't to pick on PIL or the Effbot; I've seen lots of projects which
have a "lib" directory with an __init__.py and change its name at
installation time, or inconsistently reference subpackages with relative and
absolute imports, or do any number of things which are just as bad. I
hope that I've convinced you not to do the same thing with your
project, but I won't dwell on the problem here, since I have a solution
handy.
Unless you already know what is going on (although I'm sure many of you
reading this already do), this can be a bit confusing to figure out.
You can use twisted.python.modules to ask where a module would be loaded
from rather directly. Here's how:
>>> from twisted.python.modules import getModule
>>> imageModule = getModule("Image")
>>> pilImageModule = getModule("PIL.Image")
>>> imageModule.pathEntry
PathEntry<FilePath('/usr/lib/python2.5/site-packages/PIL')>
>>> pilImageModule.pathEntry
PathEntry<FilePath('/usr/lib/python2.5/site-packages')>
Here we're asking twisted.python.modules to give us objects
that represent metadata about two modules, without actually loading
them. The interesting attribute here is 'pathEntry', which tells us
what entry on sys.path the module would be loaded from, if it's
imported.
>>> import sys
>>> pilImageModule.isLoaded()
False
>>> imageModule.isLoaded()
False
>>> 'PIL.Image' in sys.modules
False
>>> 'Image' in sys.modules
False
Look, no modules!
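If you don't have Twisted handy, you can approximate the "which path entry?" question with the standard library. This is a rough sketch for a current Python 3, not how twisted.python.modules is implemented. The helper name "find_path_entry" and the module name "demo_lonely_module" are invented for the example, and it only handles top-level module names, since resolving a dotted name needs the parent package's path:

```python
import importlib
import os
import sys
import tempfile
from importlib.machinery import PathFinder


def find_path_entry(name, search_path=None):
    """Return the first path entry that would supply top-level module `name`.

    Nothing is imported. Returns None if no entry provides the module.
    """
    entries = sys.path if search_path is None else search_path
    for entry in entries:
        # Ask the import machinery about this one entry only.
        if PathFinder.find_spec(name, [entry]) is not None:
            return entry
    return None


# Hypothetical module living in a fresh directory that we add to sys.path.
entry_dir = tempfile.mkdtemp()
with open(os.path.join(entry_dir, "demo_lonely_module.py"), "w") as f:
    f.write("VALUE = 1\n")
sys.path.append(entry_dir)
importlib.invalidate_caches()

found_entry = find_path_entry("demo_lonely_module")
still_unloaded = "demo_lonely_module" not in sys.modules

print(found_entry)
print(still_unloaded)
```

Searching one entry at a time is what lets the helper report the entry itself, rather than just the file that would be loaded.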
Of course, if we wanted to load those modules, it's easy enough:
>>> pilImageModule.load()
<module 'PIL.Image' from
'/usr/lib/python2.5/site-packages/PIL/Image.pyc'>
>>> imageModule.load()
<module 'Image' from
'/usr/lib/python2.5/site-packages/PIL/Image.pyc'>
You can also get lists of modules. For example, you can
see that the list of modules in the "PIL" package is suspiciously similar to
the list of top-level modules that comes from the path entry
where the "Image" module was loaded:
>>> from pprint import pprint
>>> pilModule = getModule("PIL")
>>> pprint(list(pilModule.iterModules())[:5])
[PythonModule<'PIL.ArgImagePlugin'>,
PythonModule<'PIL.BdfFontFile'>,
PythonModule<'PIL.BmpImagePlugin'>,
PythonModule<'PIL.BufrStubImagePlugin'>,
PythonModule<'PIL.ContainerIO'>]
>>> pprint(list(imageModule.pathEntry.iterModules())[:5])
[PythonModule<'ArgImagePlugin'>,
PythonModule<'BdfFontFile'>,
PythonModule<'BmpImagePlugin'>,
PythonModule<'BufrStubImagePlugin'>,
PythonModule<'ContainerIO'>]
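The standard library offers a similar kind of listing through pkgutil.iter_modules, which also reports what is available in a directory without importing any of it. A small sketch for a current Python 3, where the directory contents ("alpha", "beta", "gamma") are made up for the example:

```python
import os
import pkgutil
import tempfile

# Build a throwaway path entry with two modules and one package in it.
entry = tempfile.mkdtemp()
for filename in ("alpha.py", "beta.py"):
    with open(os.path.join(entry, filename), "w") as f:
        f.write("")
os.mkdir(os.path.join(entry, "gamma"))
with open(os.path.join(entry, "gamma", "__init__.py"), "w") as f:
    f.write("")

# iter_modules yields (module_finder, name, ispkg) records for each
# importable thing found directly inside the given directories.
found = sorted((info.name, info.ispkg) for info in pkgutil.iter_modules([entry]))
print(found)
```

Passing a package's __path__ instead of a sys.path entry lists that package's submodules in the same way.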
As you might imagine, the ability to list modules and load the
ones that seem interesting is a great way to load plugins - and that's
exactly how Twisted's plugin system is implemented. While the plugin
system itself is a topic for another post (or perhaps you could just
read the documentation) the way it finds plugins is interesting.
For example, let's take a look at the list of Mantissa plugin modules I have
installed:
>>> xmplugins = getModule('xmantissa.plugins')
>>> pprint(list(xmplugins.iterModules()))
[PythonModule<'xmantissa.plugins.adminoff'>,
PythonModule<'xmantissa.plugins.baseoff'>,
PythonModule<'xmantissa.plugins.free_signup'>,
PythonModule<'xmantissa.plugins.offerings'>]
This simple query is actually an incomplete list. It's
just the modules that come with Mantissa itself. Python has a special
little-known rule when loading modules from packages, and
twisted.python.modules honors it: if
there is a special variable called "__path__" in a package, it is a list of
path names to load modules from. However, twisted.python.modules
doesn't load modules unless you ask it to, so it can't determine the value
of that attribute. As it so happens, twisted.plugins uses the __path__
attribute in order to allow you to keep your development installations
separate, so twisted.python.modules can't determine all the places you might
need to look for plugins without some help. Let's just load that
package so we can look at its __path__ attribute:
>>> xmplugins.load()
<module 'xmantissa.plugins' from
'/home/glyph/Projects/Divmod/trunk/Mantissa/xmantissa/plugins/__init__.pyc'>
Now that we've loaded it, let's have a look at that list:
>>> pprint(list(xmplugins.iterModules()))
[PythonModule<'xmantissa.plugins.adminoff'>,
PythonModule<'xmantissa.plugins.baseoff'>,
PythonModule<'xmantissa.plugins.free_signup'>,
PythonModule<'xmantissa.plugins.offerings'>,
PythonModule<'xmantissa.plugins.mailoff'>,
PythonModule<'xmantissa.plugins.radoff'>,
PythonModule<'xmantissa.plugins.sineoff'>,
PythonModule<'xmantissa.plugins.hyperbolaoff'>,
PythonModule<'xmantissa.plugins.imaginaryoff'>,
PythonModule<'xmantissa.plugins.blendix_offering'>,
PythonModule<'xmantissa.plugins.billed_signup'>,
PythonModule<'xmantissa.plugins.billoff'>,
PythonModule<'xmantissa.plugins.derivoff'>]
That's my full list of Mantissa plugins, including my super secret
Divmod proprietary plugins.
This list is generated because plugin packages use a feature (which was
previously kind of a gross hack but will
be an officially supported feature of the next version of Twisted) to
set their path to every directory with the same name as the plugin package
which is not also a package on your python path. In other
words, if you have 2 sys.path entries, a/ and b/, and one package,
x.plugins, in a/x/plugins/__init__.py with this trick in it, then if you
have a file b/x/plugins/foo.py, it will be considered to contain the module
"x.plugins.foo". This requires that you do not have a file
b/x/__init__.py or b/x/plugins/__init__.py. If you do, this hack will
treat the two paths the same way that Python does: as duplicate packages in
your path, so the package in a/ is loaded and the package in b/ is
ignored.
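To make the a/ and b/ example concrete, here is a sketch of the trick for a current Python 3. The __init__.py body below is my approximation of the idea, not Twisted's actual code, and "demo_x" is a made-up stand-in for "x":

```python
import importlib
import os
import sys
import tempfile

# Approximation of the trick: extend __path__ with every same-named
# directory on sys.path that is NOT itself a package.
PLUGINS_INIT = '''
import os
import sys

_parts = __name__.split(".")
__path__.extend(
    os.path.abspath(os.path.join(entry, *_parts))
    for entry in sys.path
    if not os.path.exists(os.path.join(entry, *_parts, "__init__.py"))
)
'''


def write(path, text=""):
    with open(path, "w") as f:
        f.write(text)


root = tempfile.mkdtemp()
a = os.path.join(root, "a")
b = os.path.join(root, "b")
os.makedirs(os.path.join(a, "demo_x", "plugins"))
os.makedirs(os.path.join(b, "demo_x", "plugins"))

# a/ holds the real package, including the plugins package with the trick.
write(os.path.join(a, "demo_x", "__init__.py"))
write(os.path.join(a, "demo_x", "plugins", "__init__.py"), PLUGINS_INIT)
# b/ holds only a plugin file: no __init__.py anywhere under b/demo_x.
write(os.path.join(b, "demo_x", "plugins", "foo.py"), "NAME = 'foo plugin'\n")

sys.path[:0] = [a, b]
importlib.invalidate_caches()

# The plugin in b/ is importable as a submodule of the package in a/.
foo = importlib.import_module("demo_x.plugins.foo")
print(foo.NAME)
print(foo.__file__)
```

If you add an __init__.py under b/demo_x/plugins, the directory is no longer picked up this way, which matches the restriction described above.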
The distinction between packages and path entries is why all the Twisted and
Divmod projects conventionally have capitalized directory names but
lowercase package names. "Twisted" is where your path entry
should point; "twisted" is the python package that is loaded from that
path entry. "Twisted" should never have an __init__.py
in it. "twisted" always should. This goes the same for "Axiom"
and "axiom", "Mantissa" and (the unfortunately named)
"xmantissa". You will sometimes encounter other examples of this style
of naming floating around the web.
When using Twisted and Divmod infrastructure, keeping this distinction
clear is critical, because otherwise it is difficult to develop plugins
independently. You probably don't want to copy your development
plugins into your Twisted installation - they're part of your source
repository, after all, not ours. However, keeping the distinction
clear in your mind will avoid lots of obscure problems with duplicate
classes and naming, so it's generally a good idea even if you don't like our
naming conventions.
Please let me know in the comments which parts of this post you found
useful, if any. I know it's a bit rambling, and covers a number of
different topics, some of which may be obvious and some of which might be
inscrutable. I've experienced quite a bit of confusion when talking to
other python programmers about this stuff, but I'm not sure if it was my
awkward explanation of Twisted's plugin system or some inherent issue in
Python's path management.