Back from PyCon

Tuesday March 18, 2008

Summary: PyCon 2008 was a great time. I didn't go to a single talk, except a few of the keynotes; I spent pretty much my entire time cross-pollinating with other projects and plugging the until-recently-secret Twisted Software Foundation, our nickname for the Twisted project's membership in the Software Freedom Conservancy (TSF/SFC).

I briefly addressed the audience (of over one thousand Python users) to kick off the TSF announcement, mostly to introduce Duncan. Someone managed to snap a picture of me that I think captures the feeling of awe that we've come so far in such a relatively short time.

I spoke to about 400 people at the conference, and I have a lot to follow up on. I also have a day job, as those of you that I spoke to about Mantissa rather than Twisted know ;-).

If I talked to you about something at the conference, please don't hesitate to send me email reminding me about it, ideally one to three weeks from now. I have almost 1000 unread emails right now, and while I try to be rigorous about using some unique features of Divmod's mail system to make sure I reply to each one, there is a certain volume of communication that no tool can help me cope with.

My first priority is blogging about interesting aspects of the conference in the next few days, before I've forgotten.

That Ain't Workin'

Wednesday March 05, 2008

I'm a big fan of Nine Inch Nails. Not quite to the degree of buying every Halo, but I have a number of his albums. So of course I've been intrigued by his latest offering. Today, while looking at Nine Inch Nails' "Ghosts I-IV" website, I noticed an interesting bit of information:

We have SOLD OUT of the 2500 Limited Edition Packages.

My great-uncle is fond of a saying. "It is better to be rich and healthy than to be poor and sick." Seeing this, I was reminded of it. It's not quite as catchy, but it's better to be customer-friendly and a huge success than reviled as corrupt and a failure.

The music and movie industries have been telling us for the last few years that digital restrictions are required in order to save their businesses from destruction. Well, Mr. Reznor has proven them wrong quite dramatically. Before I continue, let me address an obvious objection up front - I realize he's a superstar. However, the spokespeople for the RIAA that support their claims are also superstars: Lars Ulrich is hardly a starving artist laboring in obscurity. I'm not saying everyone can do what he did, only that the people who are already rich in the music industry can continue to be rich without the bullshit that they claim is critical.

The "Limited Edition", for those of you not up on the latest NIN happenings, is a three hundred dollar version of the album, containing a bunch of extras and a signature from Mr. Reznor himself. The full album, in lossless, non-restricted format, costs five dollars. There were 2500 copies of the limited edition.

Let me emphasize, for those of you who might not be quite as up on the terminology, that "lossless" formats (which NIN is selling for $5 here) are the highest-quality format that it is possible to distribute over the internet. Other music producers, for fear of eating into their CD revenues, have mostly refused to provide digital copies of their music at this quality.

Also, to compare pricing: Apple typically charges 99¢ for a DRM-free song. It's not lossless, but it's 256kbps, which is fairly high quality. (I would not believe it if someone told me they could hear the difference, but there is a marginal difference in the perception of value here.) There are 36 songs on "Ghosts", so buying them all at 99¢ each would cost 36 × $0.99 = $35.64, and $5 is roughly 14% of that.

So now that I've established that NIN is selling higher quality goods, in a customer-friendly way, for a fraction of the price of the competition, let's do some math:

March 6, today,
minus March 4 (the date that the "Ghosts" website became fully operational, according to Wikipedia),
is two days.
2500 copies
times 300 dollars
equals SEVEN HUNDRED AND FIFTY THOUSAND DOLLARS IN TWO DAYS.
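For anyone who wants to check the figures, here is the arithmetic from this post as a short Python snippet. Every number in it is one quoted above (99¢ per track, 36 tracks, $5 for the lossless album, 2500 copies at $300), and the dates assume "today" is March 6 as stated. Note that the two days are a duration, not a multiplier: the Limited Edition revenue is simply copies times price.

```python
from datetime import date

# Figures quoted in the post.
ITUNES_TRACK_CENTS = 99      # per DRM-free iTunes track
GHOSTS_TRACKS = 36           # songs on "Ghosts I-IV"
LOSSLESS_ALBUM_CENTS = 500   # full lossless album from NIN
LIMITED_COPIES = 2500        # Limited Edition packages sold
LIMITED_PRICE_DOLLARS = 300  # price of each Limited Edition

# Cost of buying every track individually at 99 cents each.
itunes_album_cents = ITUNES_TRACK_CENTS * GHOSTS_TRACKS
lossless_share = LOSSLESS_ALBUM_CENTS / itunes_album_cents

# The sell-out window: a duration, not something to multiply by.
days = (date(2008, 3, 6) - date(2008, 3, 4)).days

# Limited Edition revenue is just copies times price.
limited_revenue = LIMITED_COPIES * LIMITED_PRICE_DOLLARS

print("iTunes equivalent: $%.2f" % (itunes_album_cents / 100))
print("Lossless price as a share of that: %.0f%%" % (lossless_share * 100))
print("Limited Edition revenue: ${:,} in {} days".format(limited_revenue, days))
```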

Trent has now proven that if you are a superstar, you can make three quarters of a million dollars in two days on a ridiculously expensive premium edition alone. That says nothing of the people who bought, and continue to buy, the $75 version, the $10 version, or the $5 version, or of the people who are buying it through Amazon.

Coincidentally, it also proves that you don't need any RIAA thugs to help you do this, or to "market" your work, assuming people already know who you are. You just need a web server, and a swimming pool big enough to hold a million dollars.

Highlighting buried treasure in Twisted

Wednesday February 06, 2008

I've previously blogged about twisted.python.modules, but that post assumed you knew about another API inside Twisted, twisted.python.filepath. Unfortunately this module is rather under-documented and under-publicized, despite being extremely useful. Unlike a lot of Twisted, much of the code in twisted.python can be extracted and used by itself, regardless of whether the program in question is networked or even event-driven. This is especially true of FilePath, which is completely blocking, although sometimes I wish there were at least a version of it that wasn't.

A common sort of filesystem script opens each file in a directory hierarchy rooted at a given path and does something with its contents. For example, let's write a program that prints a list of all the Python modules (files with a .py extension) in a tree that contain shebang lines.

Here's the script using good old os.path:

import sys
import os

def os_shebangs(pathname):
    for dirpath, dirnames, filenames in os.walk(pathname):
        for filename in filenames:
            fullpath = os.path.join(dirpath, filename)
            if (fullpath.endswith(".py") and
                file(fullpath, "rb").readline().startswith("#!")):
                yield fullpath

def os_show_shebangs(pathname):
    for path in os_shebangs(pathname):
        sys.stdout.write("%s: %s\n" % (
                path,
                file(path, "rb").readline()[2:].strip()))

if __name__ == '__main__':
    os_show_shebangs(sys.argv[1])

Pretty normal-looking Python code; not too much wrong with it. At 20 lines and 596 characters long, it's not too complex.

Now let's have a look at a similarly idiomatic version using FilePath:

import sys
from twisted.python.filepath import FilePath

def shebangs(path):
    for p in path.walk():
        if (p.basename().endswith(".py") and
            p.open().readline().startswith("#!")):
            yield p

def showShebangs(pathobj):
    for path in shebangs(pathobj):
        sys.stdout.write("%s: %s\n" % (
                path.path,
                path.open().readline()[2:].strip()))

if __name__ == '__main__':
    showShebangs(FilePath(sys.argv[1]))

At 18 lines and 471 characters, it's about 20% smaller than the version that uses os.path. However, a small space savings is hardly the most interesting property of this code. Its advantages over the version that uses os.path:

It's easier to test. You can use a fake FilePath object rather than needing to replace the whole "os" module and the "file" builtin.
It's easier to read. You need fewer names; rather than os, os.path, and builtins, the code talks mainly to one object.
It's easier to write. How many of you honestly remembered that "dirpath, dirnames, filenames" is the order of the tuples yielded from os.walk?
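To make the testability point concrete, here is a minimal sketch of exercising the shebangs generator from the FilePath version with a fake path object instead of a real directory tree. It is adapted to Python 3 (a file opened in binary mode yields bytes, so the prefix check uses b"#!", and the file is closed with a with block). FakePath is a hypothetical class written only for this example, not part of Twisted; it implements just the three methods shebangs touches: walk(), basename() and open().

```python
import io


def shebangs(path):
    # Same logic as the FilePath version above, ported to Python 3.
    for p in path.walk():
        if p.basename().endswith(".py"):
            with p.open() as f:
                if f.readline().startswith(b"#!"):
                    yield p


class FakePath:
    """Hypothetical in-memory stand-in for a FilePath, for tests only."""

    def __init__(self, name, content=b"", children=()):
        self.name = name
        self.content = content
        self.children = list(children)

    def walk(self):
        # Like FilePath.walk(): yield this node first, then its descendants.
        yield self
        for child in self.children:
            yield from child.walk()

    def basename(self):
        return self.name

    def open(self):
        # A binary, file-like object; no disk access needed.
        return io.BytesIO(self.content)


tree = FakePath("project", children=[
    FakePath("script.py", b"#!/usr/bin/env python\nprint('hi')\n"),
    FakePath("module.py", b"import os\n"),
    FakePath("README", b"#!not python\n"),
    FakePath("pkg", children=[
        FakePath("tool.py", b"#!/usr/bin/python3\n"),
    ]),
])

found = [p.basename() for p in shebangs(tree)]
print(found)
```

Because shebangs only ever talks to the object it was handed, no monkeypatching of os or of the open builtin is needed.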

It's easier to secure. If you wanted to allow untrusted users to supply input to the os.path version, you would need to be very, very careful. What about "/"? What about ".."? With FilePath, you simply supply the input to the 'child' method, and...

>>> from twisted.python.filepath import FilePath
>>> fp = FilePath(".")
>>> x = fp.child("okay")
>>> y = fp.child("..")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "twisted/python/filepath.py", line 308, in child
    raise InsecurePath("%r is not a child of %s" % (newpath, self.path))
twisted.python.filepath.InsecurePath: '/home' is not a child of /home/glyph
>>> z = fp.child("hello/world")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "twisted/python/filepath.py", line 305, in child
    raise InsecurePath("%r contains one or more directory separators" % (path,))
twisted.python.filepath.InsecurePath: 'hello/world' contains one or more directory separators
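For comparison, here is roughly what the os.path version would have to do by hand to get similar protection. safe_child and InsecurePathError are hypothetical names written for illustration. The helper mirrors the two failures shown in the session above (a directory separator in the name, or a name that does not resolve to a direct child of the parent), but it is only a sketch, not FilePath's actual implementation, and like that check it does not resolve symlinks.

```python
import os


class InsecurePathError(ValueError):
    """Hypothetical error for this sketch (not Twisted's InsecurePath)."""


def safe_child(parent, name):
    # Reject any separator, e.g. "hello/world". os.altsep is "/" on
    # Windows (where os.sep is "\\") and None elsewhere.
    separators = [os.sep] + ([os.altsep] if os.altsep else [])
    if any(sep in name for sep in separators):
        raise InsecurePathError("%r contains a directory separator" % (name,))
    parent = os.path.abspath(parent)
    candidate = os.path.abspath(os.path.join(parent, name))
    # Reject "..", ".", "" and anything else that is not a direct child.
    if candidate == parent or os.path.dirname(candidate) != parent:
        raise InsecurePathError("%r is not a child of %s" % (candidate, parent))
    return candidate


results = {}
for name in ["okay", "..", ".", "hello/world"]:
    try:
        results[name] = safe_child("/home/glyph", name)
    except InsecurePathError:
        results[name] = "rejected"
print(results)
```

Every caller of the os.path version has to remember to route input through a helper like this; with FilePath, child() does the check for you.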

It's easier to extend. As of revision 22464 of Twisted (i.e. the next release) you can replace twisted.python.filepath.FilePath with twisted.python.zippath.ZipArchive, and this exact same code can operate on zip files.

Not only does FilePath provide these benefits, it has very few dependencies. Even if you don't like Twisted much, you can use twisted.python.filepath by copying only 3 modules into your project (twisted.python.filepath, twisted.python.win32, and twisted.python.runtime) and twiddling the appropriate imports to be relative. Since FilePath is only one import for your code, and mostly consists of method calls, it will easily work with Twisted's version or your own. So, share and enjoy!

Do Not Want

Thursday January 31, 2008

I do not want a book called "The Ghost Brigades", but someone thought I did, for a minute. Those of you following this via my Blendix activity page will know what I mean.

One of the interesting things about working on Blendix is that we get to see all the ways in which other services export bad data. Amazon is particularly weird about it, though. As we were testing our code we saw a variety of bits of bad data intermittently published, some of which were fusions of the names of random products, some of which were actual products that had nothing to do with the user in question. I'm a little sad that my first bogus entry was a real product, though. Some of the incorrect entries we saw during testing were pretty hilarious.

Has anyone else out there used Amazon's various APIs to pull wishlist data and seen similar results? Is there any way to work around it, or to recognize the bogus data? Advice is welcome.

It Will Blend

Friday January 18, 2008

So, as you may already have heard, we did a thing. While I've been able to tell a few people, I've been waiting to talk about it publicly for a while.

We've gradually begun to launch Blendix. Blendix aims to be the one site that you need to visit every day to tell you what's happening in your world. Right now, it's an aggregator for various kinds of data around the web. You can pull in the usual suspects (RSS, Atom), but it's a bit more than just an RSS aggregator: today, it understands a few more specific things (Last.fm tracks, Amazon wishlists, Yahoo weather, Flickr photos), and lots more are planned. Although I'm practically bursting with awesome ideas for its future development, I will try to refrain from commenting too much on those plans. As I've said before, software is like frisbee: predictions more specific than "hey, watch this!" can be dangerous :-). However, since I know it's the first thing you'll all suggest, I can say that yes, there will be richer integration with social networks.

In brief, it works like this. You log in, and you create some people. Some feeds may be automatically discovered based on their email addresses, and you can add your own. Maybe you subscribe to some people: if you're looking for one to subscribe to, may I suggest "Glyph Lefkowitz"? I hear he's pretty interesting. Finally, you visit your "dashboard" page, which, thanks to the magic of Athena, will update whenever Blendix detects that one of the people you maintain or are subscribed to has published some new data. You can expect to see more of that magic as it develops.

A word of warning, though: it doesn't work with Safari (or IE, but I don't imagine that a lot of you are using IE). We're working on it, but for the time being, Firefox is strongly recommended. (Firefox for the Mac works fine.) Most of the work involved in supporting Safari is in Nevow, which is all open source, so if you are familiar with these sorts of problems, please submit patches!

This is our first live, fully public deployment of a Mantissa server, and I'm really glad to have it out there. We are, of course, working through the usual kinks of getting our first batch of users ("beta" isn't a web 2.0 buzzword for nothing), but I'm fairly pleased with it so far.

Blendix itself isn't open source — yet. We've mainly been keeping the product as a whole behind closed doors as a matter of expediency. We didn't want to support an API that was heavily in development. However, one of my goals as we get closer to a bigger launch is to get enough of the code out there for you folks in the community to write extensions and improvements for it. There's already enough for some things (like supporting Safari!), even at the application level. For example, a big chunk of Blendix is the "Person" object, which is available in the public Mantissa code, along with UI for editing, browsing, and viewing.

I'm really glad to have something "out there" to share with you all, and I'd like to encourage you to share back. Please check out Blendix, and make liberal use of the information you find under the "Contact" link at the bottom of every page. Let me know what you think, especially if you're a programmer and you've got some ideas for hacking on the code. This will be particularly useful as we get into the initial phase of pushing the core out to the community. Also, we really want to make sure the experience is as bug-free as possible, so let us know about any problems you have.