Python mainpoints

Wednesday July 19, 2006
In Python, sometimes you want to make a module runnable from the command line, but also importable as a module. The normal idiom for doing this is:

# myproject/gizmo.py
from myproject import thing1

class Gizmo(thing1.Thingy):
    def doIt(self):
        print "I am", self, "hello!"

def main(argv):
    g = Gizmo()
    g.doIt()

if __name__ == "__main__":
    import sys
    main(sys.argv)

There are several problems with this approach, however. The first one you will notice is that package-qualified imports no longer work when the file is run as a script. Assuming that 'myproject' has an __init__.py, here is the output from an interpreter and the shell:

% python
>>> import myproject.gizmo
>>> myproject.gizmo.main([])
I am <myproject.gizmo.Gizmo object at 0xb7d38c0c> hello!
>>>
% python myproject/gizmo.py
Traceback (most recent call last):
  File "myproject/gizmo.py", line 2, in ?
    from myproject import thing1
ImportError: No module named myproject
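If you want to poke at the failure in the transcript above yourself, here is a minimal sketch that builds a throwaway 'myproject' package in a temporary directory, runs a file inside it as a script, and reports the first entry of sys.path along with whether the package-qualified import succeeded. It is written in Python 3 syntax rather than the Python 2 of the listings in this post, and the file name show_path.py and the helper run_as_script are made up for illustration.

```python
import os
import subprocess
import sys
import tempfile

# Body of a throwaway script; "show_path.py" is a hypothetical name.
SCRIPT = """\
import sys
print(sys.path[0])
try:
    from myproject import thing1
    print("import ok")
except ImportError:
    print("import failed")
"""


def run_as_script():
    """Run the script by path from the package's parent directory."""
    with tempfile.TemporaryDirectory() as root:
        pkg = os.path.join(root, "myproject")
        os.mkdir(pkg)
        files = [("__init__.py", ""), ("thing1.py", ""), ("show_path.py", SCRIPT)]
        for name, body in files:
            with open(os.path.join(pkg, name), "w") as f:
                f.write(body)
        # Drop variables that would change how sys.path is seeded.
        env = {k: v for k, v in os.environ.items()
               if k not in ("PYTHONPATH", "PYTHONSAFEPATH")}
        out = subprocess.run(
            [sys.executable, os.path.join("myproject", "show_path.py")],
            cwd=root, env=env, capture_output=True, text=True, check=True,
        ).stdout.splitlines()
        return {
            "script_dir": os.path.realpath(pkg),
            # Older interpreters may report a relative path, so anchor it at root.
            "path0": os.path.realpath(os.path.join(root, out[0])),
            "import_result": out[1],
        }


if __name__ == "__main__":
    print(run_as_script())
```

Assuming no other 'myproject' is installed, the script's own directory comes first on the path and the import of the package that contains it fails.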

In the interpreter case, sys.path starts with the current directory, which contains the 'myproject' package; in the script-as-program case, it starts with dirname(__file__) instead, which is the 'myproject' directory itself, so the package cannot be found. The normal fix for this is to make sure that your PYTHONPATH is properly set, but that can be quite a drag if you're just introducing a program to a user who wants to run something right away. Quite often, people work around this problem (or never run into it in the first place) by writing 'import thing1' instead of 'from myproject import thing1', which sidesteps the distinction. It almost looks like it works:

% python
>>> import myproject.gizmo
>>> myproject.gizmo.main([])
I am <myproject.gizmo.Gizmo object at 0xb7d72c0c> hello!
>>>
% python myproject/gizmo.py
I am <__main__.Gizmo object at 0xb7ccfb6c> hello!

You might think, with the exception fixed, that this program is now ready for action. Not so fast, though. There are a couple of subtle problems here. Notice how Gizmo's class name is different: it thinks it's in the "myproject.gizmo" module (correct) in one case, and in the "__main__" module (incorrect) in the other. That has the nasty side effect of breaking all Gizmo objects with respect to pickle, or any other persistence system which uses class names to identify code. Depending on your path configuration, this problem can get almost intractably difficult to solve.

There's another solution which I have seen applied, which is to change the 'if __name__ == "__main__"' block at the end to look something like this:

if __name__ == '__main__':
    import myproject.gizmo
    myproject.gizmo.main([])

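That works fine unless Gizmo uses a destructive metaclass (one that objects to the same class being defined twice), as is the case when an axiom.item.Item subclass is declared; many libraries, including at least a few that I've written, use such metaclasses. Even when it does work, it does all the work of setting up the module twice: once as "__main__" and once as "myproject.gizmo".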
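The double setup and the class-name problem described above are easy to observe. The sketch below, in Python 3 syntax and with a made-up helper name (run_self_importing_script), writes a stripped-down gizmo.py that imports itself from its trailing __main__ block and runs it with PYTHONPATH pointing at the package's parent directory. Treat it as an illustration under those assumptions, not a reproduction of any particular project.

```python
import os
import subprocess
import sys
import tempfile
import textwrap

GIZMO = textwrap.dedent('''\
    import pickle
    print("executing module body as", __name__)

    class Gizmo(object):
        pass

    if __name__ == "__main__":
        import myproject.gizmo
        # Two distinct classes now exist, one per copy of the module.
        print(Gizmo is myproject.gizmo.Gizmo)
        # The pickle records the class under the module name "__main__".
        print(b"__main__" in pickle.dumps(Gizmo()))
''')


def run_self_importing_script():
    """Run gizmo.py as a script and return the lines it printed."""
    with tempfile.TemporaryDirectory() as root:
        pkg = os.path.join(root, "myproject")
        os.mkdir(pkg)
        for name, body in [("__init__.py", ""), ("gizmo.py", GIZMO)]:
            with open(os.path.join(pkg, name), "w") as f:
                f.write(body)
        # PYTHONPATH makes the package importable from the script.
        env = dict(os.environ, PYTHONPATH=root)
        result = subprocess.run(
            [sys.executable, os.path.join("myproject", "gizmo.py")],
            cwd=root, env=env, capture_output=True, text=True, check=True,
        )
        return result.stdout.splitlines()


if __name__ == "__main__":
    for line in run_self_importing_script():
        print(line)
```

The module body runs once under each name, the two Gizmo classes are not the same object, and a Gizmo pickled from the script is stored as belonging to "__main__".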

I'd like to suggest an idiom which may help to alleviate this problem, without really introducing any additional overhead.  If you're going to declare a __main__ block in a module that can also be a script, do it at the top of your module, like so:

# myproject/gizmo.py
if __name__ == "__main__":
    import sys
    from os.path import dirname
    sys.path.append(dirname(dirname(__file__)))
    from myproject.gizmo import main
    sys.exit(main(sys.argv))

from myproject import thing1

class Gizmo(thing1.Thingy):
    # ...

With only two extra lines of code, your module is only imported once, path-related problems are dealt with, 'main' can return an error code just like in C, and your module will function in the same way when it runs itself as when it is imported. Here's some output:


% python
>>> from myproject.gizmo import main
>>> main([])
I am <myproject.gizmo.Gizmo object at 0xb7d69c2c> hello!
>>>
% python myproject/gizmo.py
I am <myproject.gizmo.Gizmo object at 0xb7cb7cac> hello!

As a little bonus, this even works properly with Python's "-m" switch - although at least in 2.4 you have to use the odd '/' separator rather than '.' like everywhere else:

% PYTHONPATH=. python -m myproject/gizmo
I am <myproject.gizmo.Gizmo object at 0xb7d0af4c> hello!
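For readers on a current interpreter, here is a sketch of the same top-of-module idiom in Python 3 syntax, wrapped in a small harness that runs the module both as a script and with "-m". Note that modern Python spells the "-m" argument with dots (myproject.gizmo) rather than the '/' shown above. The harness function run_both_ways and the contents of thing1.py are assumptions made for the sake of a self-contained example.

```python
import os
import subprocess
import sys
import tempfile
import textwrap

GIZMO = textwrap.dedent('''\
    # myproject/gizmo.py
    if __name__ == "__main__":
        import sys
        from os.path import dirname
        sys.path.append(dirname(dirname(__file__)))
        from myproject.gizmo import main
        sys.exit(main(sys.argv))

    from myproject import thing1

    class Gizmo(thing1.Thingy):
        def doIt(self):
            print("I am", self, "hello!")

    def main(argv):
        Gizmo().doIt()
        return 0
''')

# Stand-in for the post's thing1 module (hypothetical contents).
THING1 = "class Thingy(object):\n    pass\n"


def run_both_ways():
    """Run gizmo.py directly and via -m; return the output of each run."""
    with tempfile.TemporaryDirectory() as root:
        pkg = os.path.join(root, "myproject")
        os.mkdir(pkg)
        files = [("__init__.py", ""), ("thing1.py", THING1), ("gizmo.py", GIZMO)]
        for name, body in files:
            with open(os.path.join(pkg, name), "w") as f:
                f.write(body)
        # Start from a clean slate so only the idiom itself fixes the path.
        env = {k: v for k, v in os.environ.items()
               if k not in ("PYTHONPATH", "PYTHONSAFEPATH")}
        commands = {
            "script": [sys.executable, os.path.join("myproject", "gizmo.py")],
            "dash_m": [sys.executable, "-m", "myproject.gizmo"],
        }
        return {
            label: subprocess.run(
                cmd, cwd=root, env=env,
                capture_output=True, text=True, check=True,
            ).stdout
            for label, cmd in commands.items()
        }


if __name__ == "__main__":
    for label, text in run_both_ways().items():
        print(label, "->", text.strip())
```

In both runs the object reports itself as a myproject.gizmo.Gizmo, never as a __main__.Gizmo, which is the property the idiom is after.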