Let's Get Pumped

I want to be excited about the Python 3000 effort.  Every programmer loves a green field project; there's none of that icky legacy stuff holding you back, and you can have a beautiful, graceful new creation that exceeds the limitations of its hobbled and mis-designed ancestors.  Python 3000 could fix all of Python's warts, giving us a clean, simple language with more power and flexibility.

However, the "icky legacy stuff" happens to include every program I've written in the last 5 years, as well as really important functionality like GTK bindings, database support, and compatibility with applications (like GIMP, Gaim and Blender) which embed Python themselves. so I need something more substantial to get excited about.  Something to make it worth the rather onerous effort of upgrading the Twisted codebase, and simultaneously breaking support for millions of existing Python installations.  Some of the "substantial" new features I've seen, like the new "iostack" library, seem to be controversial.  I haven't done a thorough code review myself, but a few comments on the mailing list, like "8k is a perfect buffer size for any application", suggest that while there may be improvements, these changes are far from problem-free.  (Not to mention the fact that a large portion of what iostack is trying to accomplish is the sort of thing that would be features in Twisted anyway, so it's likely we won't be using much of that code...)

PyPy brings some of this effort along with it as well, but PyPy's advantages are much clearer: it will be about a zillion times faster, it will make writing bindings to existing native functionality fundamentally easier, and it might be possible to add core language interpreter features, like restricted execution, without having to patch the core itself.  Also, PyPy is currently targeting fundamentally the same language as Python 2.4, whereas Python 3000 is intentionally incompatible, so it will be possible to support Python 2.4 and PyPy, although PyPy may require a lot of conditional blocks to work right in the real world.

This is all a high-level understanding gathered from listening to rumors, perusing mailing list archives, reading a few websites, and attempting to read between the lines.  I could be wrong about both projects.  With my current understanding though, the plan for the Twisted project is to support the 2.x Python series and PyPy as soon as it's feasible, but ignore Py3k until there is a compatibility layer which would allow us to migrate gradually rather than in one fell swoop.

Lots of Python fans seem to read this blog, so maybe you can help me out.  What new features or idioms should I be really excited about in Python 3000?  Am I missing something fundamental about what it's trying to achieve?

(Those of you who don't have an OpenID login can feel free to answer this question by sending email to glyph@divmod.com rather than posting a comment.)

Scientific Method(ology)

I'm an empiricist at heart.  It's hard to integrate the principles of scientific discovery into one's daily life; while it makes for a good ontological basis for existence, it has the unfortunate side-effect of being damned expensive to apply consistently.  Still, I try when I can.

Recently it occurred to me that UQDS (Divmod's development methodology) has a striking similarity to the scientific process.  I can't find anything that lays it out as neatly as I remember learning it in school, but Wikipedia has a very nice treatment of the whole process.

As I understand the process of scientific knowledge acquisition, there are four phases:


  • A hypothesis is developed, which is an idea that might possibly be true.

  • An experiment is designed, and performed, to test the hypothesis.  If the experiment fails, a new hypothesis is designed.

  • The experiment is documented in a paper and published, and the publication is subjected to peer review by other scientists; the experiment is repeated.  If the experiment can't be repeated, it must be re-designed to eliminate errors, or an alternative hypothesis designed to take the errors into account.

  • The hypothesis is accepted as a fact, which may later be used to develop a model, theory, or even a law.


I know a few of my scientist friends out there might read this, so I want to be clear that I'm not proposing that this is what scientists do: scientists do a lot of things.  This is the merest subset of the process required for things to be accepted as "scientific fact", and only in the abstract sense.  Different scientific communities have different official standards.  Still, any new scientific discipline would probably have to start with these rules first to be considered "science", and then probably develop additional ones later.

UQDS corresponds rather directly, as if a code repository were its own particular branch of science.


  • hypothesis: a ticket is created, describing a feature which may be possible to implement correctly.

  • experiment: a branch is developed, which includes test cases.  Here, as the test cases fail, the branch is refined and the ticket adjusted, much as the hypothesis must be adjusted.

  • peer review: well, uh... peer review.  Another developer reviews the branch, verifies that the tests actually test the hypothesis, runs the tests (replicating the experiment) to confirm that they pass in their environment too, and reports their findings.  If the tests fail, the branch must be adjusted further.

  • fact: the changes are then accepted for merging, where they are incorporated into the codebase, which corresponds to the scientific body of knowledge.

The Web One Hundred Point Oh Challenge

I haven't had much time for blogging lately, so rather than post my musings, here are some questions.  I credit these to a combination of Iain M. Banks, r0ml, and the Long Now Foundation.  Please comment and post a link to your blog if you write about one of these.

What kind of a program would take one hundred years to write? How would one manage such a project? How would you manage planning and estimating at that scale?

What if you had to write a program that would only be run once, but was supposed to run forever - could never, ever crash, notwithstanding hardware failures? That was expected to outlast humankind on earth? What would such a program do? How would you test it?

What if you had to write a program that would be maintained for ten thousand years - how would you factor it so that the lower levels could still be improved without breaking anything? What would the release process for modules look like? What if you had inherited a program that was ten thousand years old, but it had never been designed to be maintained this way?

Finally - how does thinking about these problems of heretofore impossible scale affect the way you write programs today?

Update: I forgot to mention Alan Perlis's Epigram #28.

a scheme program

some jerk is not learning about functional programming

(define (parse callback)
  (lambda (data)
    (if (>= (string-length data) 10)
        ((parse (callback (substring data 0 10))) (substring data 10))
        (lambda (moredata)
          ((parse callback) (string-append data moredata))))))

(define (make-line-writer n)
  (lambda (line)
    (display (format "~A: '~A'\n" n line))
    (make-line-writer (+ n 1))))

((((((parse (make-line-writer 1))
     "hello") " world") " radix") " is") " dumb!!!!!!!!!!!!!!!")

Update: In addition to not being buggy, I've changed it so that this version is actually functional as well.
Update 2: Forgot the formatting change in the last update.
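
(For the curious, here's a rough, untested Python sketch of the same trick - my translation, not part of the original program: a purely functional parser which buffers input and hands fixed-size ten-character chunks to a callback, where each callback returns the callback to use for the next chunk.)

def parse(callback):
    def accept(data):
        if len(data) >= 10:
            # Enough buffered: hand one ten-character chunk to the
            # callback, then parse the rest with the callback it returns.
            return parse(callback(data[:10]))(data[10:])
        # Not enough yet: return a parser which waits for more data.
        return lambda moredata: parse(callback)(data + moredata)
    return accept

def make_line_writer(n):
    def write_line(line):
        print "%d: '%s'" % (n, line)
        return make_line_writer(n + 1)
    return write_line

parse(make_line_writer(1))("hello")(" world")(" radix")(" is")(" dumb!!!!!!!!!!!!!!!")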

Python mainpoints

In Python, sometimes you want to make a module runnable from the commandline, but also usable as a module.  The normal idiom for doing this is:

# myproject/gizmo.py
from myproject import thing1

class Gizmo(thing1.Thingy):
    def doIt(self):
        print "I am", self, "hello!"

def main(argv):
    g = Gizmo()
    g.doIt()

if __name__ == "__main__":
    import sys
    main(sys.argv)

There are several problems with this approach, however.  Immediately, you will notice that packages no longer work.  Assuming that 'myproject' has an __init__.py, here is the output from an interpreter and the shell:

% python
>>> import myproject.gizmo
>>> myproject.gizmo.main([])
I am <myproject.gizmo.Gizmo object at 0xb7d38c0c> hello!
>>>
% python myproject/gizmo.py
Traceback (most recent call last):
  File "myproject/gizmo.py", line 2, in ?
    from myproject import thing1
ImportError: No module named myproject

In the interpreter case, sys.path includes ".", but in the script-as-program case, sys.path includes dirname(__file__).  The normal fix for this is to make sure that your PYTHONPATH is properly set, but that can be quite a drag if you're just introducing a program to a user who wants to run something right away.  Quite often, people fix this problem (or never notice it in the first place) by changing 'from myproject import thing1' to 'import thing1', ignoring the distinction.  It almost looks like it works:

% python
>>> import myproject.gizmo
>>> myproject.gizmo.main([])
I am <myproject.gizmo.Gizmo object at 0xb7d72c0c> hello!
>>>
% python myproject/gizmo.py
I am <__main__.Gizmo object at 0xb7ccfb6c> hello!

You might think, with the exception fixed, that this program is now ready for action.  Not so fast, though.  There are a couple of subtle problems here.  Notice how Gizmo's class name is different - it thinks it's in the "myproject.gizmo" (correct) module in one case, and the "__main__" (incorrect) module in another.  That has the nasty side-effect of making all Gizmo objects broken with respect to Pickle, or any other persistence system which uses class names to identify code.  Depending on your path configuration, this problem can get almost intractably difficult to solve.  There's another solution which I have seen applied: change the 'if __name__' block at the end to look something like this:

if __name__ == '__main__':
    import myproject.gizmo
    myproject.gizmo.main([])

That works fine, but it does all the work of setting up the module twice - which is merely wasteful, unless Gizmo uses a destructive metaclass, as many libraries do, including at least a few that I've written; declaring an axiom.item.Item subclass, for example, has side effects which must not happen twice.
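
If the Pickle problem sounds abstract, here's a minimal standalone sketch of it (a toy module I'm calling pickledemo.py purely for illustration - it isn't part of the gizmo example):

# pickledemo.py
import pickle

class Gizmo(object):
    pass

if __name__ == "__main__":
    g = Gizmo()
    # Run as a script, the class lives in __main__, not in pickledemo:
    print Gizmo.__module__
    # ... so this pickle records the class as '__main__.Gizmo'.
    blob = pickle.dumps(g)
    # Any other process that does 'from pickledemo import Gizmo' and then
    # calls pickle.loads(blob) will go looking for Gizmo in *its* own
    # __main__ module, and fail with an AttributeError.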

I'd like to suggest an idiom which may help to alleviate this problem, without really introducing any additional overhead.  If you're going to declare a __main__ block in a module that can also be a script, do it at the top of your module, like so:

# myproject/gizmo.py
if __name__ == "__main__":
    import sys
    from os.path import dirname
    sys.path.append(dirname(dirname(__file__)))
    from myproject.gizmo import main
    sys.exit(main(sys.argv))

from myproject import thing1

class Gizmo(thing1.Thingy):
    # ...

With only a few extra lines of code, your module is only imported once, path-related problems are dealt with, 'main' can return an error-code just like in C, and your module will function in the same way when it runs itself as when it is imported.  Here's some output:

glyph@alastor:~% python
>>> from myproject.gizmo import main
>>> main([])
I am <myproject.gizmo.Gizmo object at 0xb7d69c2c> hello!
>>>
% python myproject/gizmo.py
I am <myproject.gizmo.Gizmo object at 0xb7cb7cac> hello!

As a little bonus, this even works properly with Python's "-m" switch - although at least in 2.4 you have to use the odd '/' separator rather than '.' like everywhere else:

% PYTHONPATH=. python -m myproject/gizmo
I am <myproject.gizmo.Gizmo object at 0xb7d0af4c> hello!