Highlighting buried treasure in Twisted

I've previously blogged about twisted.python.modules, but that post assumes you know about another API inside Twisted, twisted.python.filepath.  Unfortunately this module is rather under-documented and under-publicized, despite being extremely useful.  Unlike a lot of Twisted, much of the code in twisted.python can be extracted and used by itself, regardless of whether the program in question is networked or even event-driven.  This is especially true of FilePath, which is completely blocking, although sometimes I wish there were at least a version of it that wasn't.

A common sort of filesystem script walks a directory hierarchy rooted at a given path, opens each file, and does something with its contents.  For example, let's write a program that prints out a list of all Python modules (files with a .py extension) in a tree that contain shebang lines.

Here's the script using good old os.path:
import sys
import os

def os_shebangs(pathname):
    for dirpath, dirnames, filenames in os.walk(pathname):
        for filename in filenames:
            fullpath = os.path.join(dirpath, filename)
            if (fullpath.endswith(".py") and
                file(fullpath, "rb").readline().startswith("#!")):
                yield fullpath

def os_show_shebangs(pathname):
    for path in os_shebangs(pathname):
        sys.stdout.write("%s: %s\n" % (
            path,
            file(path, "rb").readline()[2:].strip()))

if __name__ == '__main__':
    os_show_shebangs(sys.argv[1])

Pretty normal-looking Python code; not too much wrong with it.  At 20 lines and 596 characters long, it's not too complex.
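
One caveat: the "file" builtin used above exists only in Python 2.  Here is a minimal sketch of the same script, assuming Python 3, where open() takes the place of file() and files read in binary mode produce bytes.  The "with" blocks, the decode step, and the argv guard are additions for this sketch rather than part of the original script:

```python
import os
import sys


def os_shebangs(pathname):
    """Yield paths of .py files under pathname whose first line starts with #!."""
    for dirpath, dirnames, filenames in os.walk(pathname):
        for filename in filenames:
            fullpath = os.path.join(dirpath, filename)
            if not fullpath.endswith(".py"):
                continue
            # open() replaces the Python 2 file() builtin; "rb" yields bytes.
            with open(fullpath, "rb") as f:
                if f.readline().startswith(b"#!"):
                    yield fullpath


def os_show_shebangs(pathname):
    for path in os_shebangs(pathname):
        with open(path, "rb") as f:
            # Drop the leading "#!" and decode so the output is text.
            interpreter = f.readline()[2:].strip().decode("utf-8", "replace")
        sys.stdout.write("%s: %s\n" % (path, interpreter))


# Guarded so that importing or running without an argument does nothing.
if __name__ == "__main__" and len(sys.argv) > 1:
    os_show_shebangs(sys.argv[1])
```

The structure is unchanged; only the file handling differs.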

Now let's have a look at a similarly idiomatic version using FilePath:
import sys
from twisted.python.filepath import FilePath

def shebangs(path):
    for p in path.walk():
        if (p.basename().endswith(".py") and
            p.open().readline().startswith("#!")):
            yield p

def showShebangs(pathobj):
    for path in shebangs(pathobj):
        sys.stdout.write("%s: %s\n" % (
            path.path,
            path.open().readline()[2:].strip()))

if __name__ == '__main__':
    showShebangs(FilePath(sys.argv[1]))

At 18 lines and 471 characters, it's almost exactly 20% smaller than the version that uses os.path.  However, a small space savings is hardly the most interesting property of this code.  The advantages over the version that uses os.path:
  • It's easier to test.  You can use a fake FilePath object rather than needing to replace the whole "os" module and the "file" builtin.
  • It's easier to read.  You need fewer names; rather than os, os.path, and builtins, the code talks mainly to one object.
  • It's easier to write.  How many of you honestly remembered that "dirpath, dirnames, filenames" is the order of the tuples yielded from os.walk?
  • It's easier to secure.  If you want to allow untrusted users to supply input to the os.path version, you need to be very, very careful.  What about "/"?  What about ".."?  With FilePath, you simply supply the input to the 'child' method, and...
    >>> from twisted.python.filepath import FilePath
    >>> fp = FilePath(".")
    >>> x = fp.child("okay")
    >>> y = fp.child("..")
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "twisted/python/filepath.py", line 308, in child
        raise InsecurePath("%r is not a child of %s" % (newpath, self.path))
    twisted.python.filepath.InsecurePath: '/home' is not a child of /home/glyph
    >>> z = fp.child("hello/world")
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "twisted/python/filepath.py", line 305, in child
        raise InsecurePath("%r contains one or more directory separators" % (path,))
    twisted.python.filepath.InsecurePath: 'hello/world' contains one or more directory separators
  • It's easier to extend.  As of revision 22464 of Twisted (i.e. the next release) you can replace twisted.python.filepath.FilePath with twisted.python.zippath.ZipArchive, and this exact same code can operate on zip files.
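
To make the testing point above concrete, here is a minimal sketch of a fake path object.  FakePath is a hypothetical class invented for this example, not part of Twisted; it implements only the three methods the script calls (walk, basename, open) plus the path attribute.  The bytes literal in shebangs assumes Python 3:

```python
import io


class FakePath:
    """Hypothetical in-memory stand-in for FilePath; not part of Twisted."""

    def __init__(self, path, content=b"", children=()):
        self.path = path
        self._content = content
        self._children = list(children)

    def basename(self):
        return self.path.rsplit("/", 1)[-1]

    def open(self):
        # Hand back a fresh in-memory file each time, like opening a real one.
        return io.BytesIO(self._content)

    def walk(self):
        # Yield this node first, then every descendant.
        yield self
        for child in self._children:
            yield from child.walk()


def shebangs(path):
    # Same logic as the FilePath version, with a bytes literal for Python 3.
    for p in path.walk():
        if (p.basename().endswith(".py") and
                p.open().readline().startswith(b"#!")):
            yield p


tree = FakePath("/proj", children=[
    FakePath("/proj/run.py", b"#!/usr/bin/env python\n"),
    FakePath("/proj/lib.py", b"import os\n"),
    FakePath("/proj/notes.txt", b"#!/bin/sh\n"),
])
found = [p.path for p in shebangs(tree)]
```

Nothing here touches the disk or patches a module; the function under test only ever talks to the object it is handed.
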
Not only does FilePath provide these benefits, it has very few dependencies.  Even if you don't like Twisted much, you can use twisted.python.filepath by copying only 3 modules into your project (twisted.python.filepath, twisted.python.win32, and twisted.python.runtime) and twiddling the appropriate imports to be relative.  Since FilePath is only one import for your code, and mostly consists of method calls, it will easily work with Twisted's version or your own.  So, share and enjoy!

Do Not Want

I do not want a book called "The Ghost Brigades", but someone thought I did, for a minute.  Those of you following this via my Blendix activity page will know what I mean.

One of the interesting things about working on Blendix is that we get to see all the ways in which other services export bad data.  Amazon is particularly weird about it, though.  As we were testing our code we saw a variety of bits of bad data intermittently published, some of which were fusions of the names of random products, some of which were actual products that had nothing to do with the user in question.  I'm a little sad that my first bogus entry was a real product, though.  Some of the incorrect entries we saw during testing were pretty hilarious.

Has anyone else out there used Amazon's various APIs to pull wishlist data and seen similar results?  Is there any way to work around it, or to recognize the bogus data?  Advice is welcome.

It Will Blend

So, as you may already have heard, we did a thing.  While I've been able to tell a few people, I've been waiting to talk about it publicly for a while.

We've gradually begun to launch Blendix.  Blendix aims to be the one site that you need to visit every day to tell you what's happening in your world.  Right now, it's an aggregator for various kinds of data around the web.  You can pull in the usual suspects (RSS, Atom), but it's a bit more than just an RSS aggregator: today, it understands a few more specific things (last.fm tracks, Amazon wishlists, Yahoo weather, Flickr photos), and lots more are planned.  Although I'm practically bursting with awesome ideas for its future development, I will try to refrain from commenting too much on those plans.  As I've said before, software is like frisbee: predictions more specific than "hey, watch this!" can be dangerous :-).  However, since I know it's the first thing you'll all suggest, I can say that yes, there will be richer integration with social networks.

In brief, it works like this.  You log in, and you create some people.  Some feeds may be automatically discovered based on their email addresses, and you can add your own.  Maybe you subscribe to some people: if you're looking for one to subscribe to, may I suggest "Glyph Lefkowitz".  I hear he's pretty interesting.  Finally, you visit your "dashboard" page, which, thanks to the magic of Athena, will update whenever Blendix detects that one of the people you maintain or are subscribed to publishes some new data.  You can expect to see more of that magic as it develops.

A word of warning, though: it doesn't work with Safari (or IE, but I don't imagine that a lot of you are using IE).  We're working on it, but for the time being, Firefox is strongly recommended.  (Firefox for the Mac works fine.)  Most of the work involved in supporting Safari is in Nevow, which is all open source, so if you are familiar with these sorts of problems, please submit patches!

This is our first live, fully public deployment of a Mantissa server, and I'm really glad to have it out there.  We are, of course, working through the usual kinks of getting our first batch of users ("beta" isn't a web 2.0 buzzword for nothing), but I'm fairly pleased with it so far.

Blendix itself isn't open source — yet.  We've mainly been keeping the product as a whole behind closed doors as a matter of expediency.  We didn't want to support an API that was heavily in development.  However, one of my goals as we get closer to a bigger launch is to get enough of the code out there for you folks in the community to write extensions and improvements for it.  There's already enough for some things (like supporting Safari!), even at the application level.  For example, a big chunk of Blendix is the "Person" object, which is available in the public Mantissa code, along with UI for editing, browsing, and viewing.

I'm really glad to have something "out there" to share with you all, and I'd like to encourage you to share back.  Please check out Blendix, and make liberal use of the information you find under the "Contact" link at the bottom of every page.  Let me know what you think, especially if you're a programmer and you've got some ideas for hacking on the code.  This will be especially useful as we get into the initial phase of pushing the core out to the community.  Also, we really want to make sure the experience is as bug-free as possible, so let us know about any problems you have.

Export for Python

I've started playing around with minor projects in my personal launchpad space, partially to try out bzr.

Most recently I wrote a hack, temporarily named "pyexport", which allows you to control the names which your library module namespaces export to application code.

So far, I've implemented a few features.
  1. export.alias(), which registers an alias for a method in another module that will not be imported until that module is imported.
  2. export.explicitly(), a convenience function which makes cooperating with __all__ easy.
  3. export.internal(), which marks a module as "internal", and warns any application code (code outside the package which defines the module) which tries to import it.
  4. export.restrict(), a method which prevents "leakage" of extraneous imported or private names.  For example, if you have a module 'foo' which imports 'sys', you can normally do 'from foo import sys' in Python and get a result; restrict() is meant to stop that.
  5. export.singleton(), which replaces the calling module with a proxy that shares a namespace between the given singleton and the module itself.
It'll be some work to turn this rough prototype into something really usable for a large system like Twisted; at the very least it will need to be rebuilt test-first and integrated with the pydoctor and pydoc documentation tools.  Let me know what you think, and if I should pursue it!
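
For anyone unfamiliar with the "leakage" that item 4 above refers to, here is a small self-contained sketch of the problem and of one possible way to block it.  The restrict() helper below is hypothetical and written only for this illustration; it is not pyexport's implementation or API.  It assumes the approach of swapping the module in sys.modules for a proxy that carries only the allowed names:

```python
import sys
import types


def restrict(module_name, allowed):
    """Hypothetical helper: swap a module for a proxy exposing only `allowed`."""
    original = sys.modules[module_name]
    proxy = types.ModuleType(module_name)
    for name in allowed:
        setattr(proxy, name, getattr(original, name))
    proxy.__all__ = list(allowed)
    sys.modules[module_name] = proxy
    return proxy


# Build a throwaway module 'foo' that imports sys and defines one public name.
foo = types.ModuleType("foo")
exec("import sys\ndef greet():\n    return 'hello'\n", foo.__dict__)
sys.modules["foo"] = foo

# Without any restriction, the imported name leaks out of foo.
from foo import sys as leaked
leaked_ok = leaked is sys

restrict("foo", ["greet"])

# After the swap, only the allowed name can be imported.
try:
    from foo import sys as leaked_again
    still_leaks = True
except ImportError:
    still_leaks = False

from foo import greet
```

The public function still works after the swap because it keeps a reference to the original module's globals; only the names visible to importers change.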

Twenty Post-Dollars

I finally hit the limit of my flickr account; I've posted 200 pictures, and so, starting from the beginning of my photo stream, some of them are going to start disappearing.

I have about 100 more pictures to post, but I've been intentionally keeping my account free, because I'm not sure if it's actually worth my time (or money) to use the features of a "pro" account if nobody is really listening.

So I'm wondering, are there 20 people out there — one for each of the dollars it would take to upgrade my account — who actually care about what I'm posting on flickr?  No need to get effusive, just ping me in the comments (or email) to let me know I should upgrade and keep posting pictures.