Scientific Method(ology)

I'm an empiricist at heart.  It's hard to integrate the principles of scientific discovery into one's daily life; while it makes for a good ontological basis for existence, it has the unfortunate side-effect of being damned expensive to apply consistently.  Still, I try when I can.

Recently it occurred to me that UQDS (Divmod's development methodology) has a striking similarity to the scientific process.  I can't find anything that neatly lays it out the way I remember learning it in school, but Wikipedia has a very nice treatment of the whole process.

As I understand the process of scientific knowledge acquisition, there are four phases:


  • A hypothesis is developed, which is an idea that might possibly be true.

  • An experiment is designed, and performed, to test the hypothesis.  If the experiment fails, a new hypothesis is developed.

  • The experiment is documented in a paper and published, and the publication is subjected to peer review by other scientists; the experiment is repeated.  If the experiment can't be repeated, it must be re-designed to eliminate errors, or an alternative hypothesis developed to take the errors into account.

  • The hypothesis is accepted as a fact, which may later be used to develop a model, theory, or even a law.


I know a few of my scientist friends out there might read this, so I want to be clear that I'm not proposing that this is what scientists do: scientists do a lot of things, and this is the merest subset of the process required for things to be accepted as "scientific fact", and only in the abstract sense.  Different scientific communities have different official standards.  Still, any new scientific discipline would probably have to start with these rules first to be considered "science", and then probably develop additional ones later.

UQDS corresponds rather directly, as if a code repository were its own particular branch of science.


  • hypothesis: a ticket is created, describing a feature which may be possible to implement correctly.

  • experiment: a branch is developed, which includes test cases.  Here, as the test cases fail, the branch is refined and the ticket adjusted, much as the hypothesis must be adjusted.

  • peer review: well, uh... peer review.  Another developer reviews the branch, verifies that the tests actually test the hypothesis, runs the tests (replicates the experiment) to verify that they pass for them as well, possibly in a different environment, and reports their findings.  If the tests fail, the branch must be adjusted further.

  • fact: the changes are then accepted for merging, where they are incorporated into the codebase, which corresponds to the scientific body of knowledge.

The Web One Hundred Point Oh Challenge

I haven't had much time for blogging lately, so rather than post my musings, here are some questions. I credit these to a combination of Iain M. Banks, r0ml, and the Long Now Foundation. Please comment and post a link to your blog if you write about one of these.

What kind of a program would take one hundred years to write? How would one manage such a project? How would you manage planning and estimating at that scale?

What if you had to write a program that would only be run once, but was supposed to run forever - could never, ever crash, notwithstanding hardware failures? That was expected to outlast humankind on earth? What would such a program do? How would you test it?

What if you had to write a program that would be maintained for ten thousand years - how would you factor it so that the lower levels could still be improved without breaking anything? What would the release process for modules look like? What if you inherited a program that was ten thousand years old but had never been designed to be maintained this way?

Finally - how does thinking about these problems of heretofore impossible scale affect the way you write programs today?

Update: I forgot to mention Alan Perlis's Epigram #28.

a scheme program

some jerk is not learning about functional programming

(define (parse callback)
  (lambda (data)
    (if (>= (string-length data) 10)
        ((parse (callback (substring data 0 10))) (substring data 10))
        (lambda (moredata)
          ((parse callback) (string-append data moredata))))))

(define (make-line-writer n)
  (lambda (line)
    (display (format "~A: '~A'\n" n line))
    (make-line-writer (+ n 1))))

((((((parse (make-line-writer 1))
     "hello") " world") " radix") " is") " dumb!!!!!!!!!!!!!!!")

Update: In addition to not being buggy, I've changed it so that this version is actually functional as well.
Update 2: Forgot the formatting change in the last update.
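
For readers who don't speak Scheme, here is a rough Python translation of the same trick (my own illustrative sketch, not part of the original program): parse(callback) returns a function that accepts a string; each complete 10-character chunk is handed to the callback, which returns the callback to use for the next chunk, and any leftover characters are buffered in a closure until more data arrives.

```python
# Illustrative Python translation of the Scheme program above (modern
# Python 3 syntax; this is a sketch, not the original code).
def parse(callback):
    def feed(data):
        if len(data) >= 10:
            # A full chunk: hand it to the callback, then re-parse the rest
            # with whatever callback it returns.
            return parse(callback(data[:10]))(data[10:])
        def more(moredata):
            # Not enough data yet: buffer it and wait for the next call.
            return parse(callback)(data + moredata)
        return more
    return feed

def make_line_writer(n):
    def write(line):
        print("%d: '%s'" % (n, line))
        return make_line_writer(n + 1)
    return write

# Prints 1: 'hello worl' then 2: 'd radix is'; " dumb!!!" stays buffered.
parse(make_line_writer(1))("hello")(" world")(" radix")(" is")(" dumb!!!")
```

Note that, as in the Scheme version, no state is mutated anywhere: each call returns a fresh function closed over the remaining buffer.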

Python mainpoints

In Python, sometimes you want to make a module runnable from the command line, but also usable as a module.  The normal idiom for doing this is:

# myproject/gizmo.py
from myproject import thing1

class Gizmo(thing1.Thingy):
    def doIt(self):
        print "I am", self, "hello!"

def main(argv):
    g = Gizmo()
    g.doIt()

if __name__ == "__main__":
    import sys
    main(sys.argv)

There are several problems with this approach, however.  Immediately, you will notice that packages no longer work.  Assuming that 'myproject' has an __init__.py, here is the output from an interpreter and the shell:

% python
>>> import myproject.gizmo
>>> myproject.gizmo.main([])
I am <myproject.gizmo.Gizmo object at 0xb7d38c0c> hello!
>>>
% python myproject/gizmo.py
Traceback (most recent call last):
  File "myproject/gizmo.py", line 2, in ?
    from myproject import thing1
ImportError: No module named myproject

In the interpreter case, sys.path includes ".", but in the script-as-program case, sys.path includes dirname(__file__).  The normal fix for this is to make sure that your PYTHONPATH is properly set, but that can be quite a drag if you're just introducing a program to a user who wants to run something right away.  Quite often, people either fix or don't experience this problem by changing 'from myproject import thing1' to 'import thing1' to ignore this distinction.  It almost looks like it works:

% python
>>> import myproject.gizmo
>>> myproject.gizmo.main([])
I am <myproject.gizmo.Gizmo object at 0xb7d72c0c> hello!
>>>
% python myproject/gizmo.py
I am <__main__.Gizmo object at 0xb7ccfb6c> hello!

You might think, with the exception fixed, that this program is now ready for action.  Not so fast, though.  There are a couple of subtle problems here.  Notice how Gizmo's class name is different - it thinks it's in the "myproject.gizmo" (correct) module in one case, and the "__main__" (incorrect) module in the other?  That has the nasty side-effect of making all Gizmo objects broken with respect to Pickle, or any other persistence system which uses class names to identify code.  Depending on your path configuration this problem can get almost intractably difficult to solve.  There's another solution which I have seen applied, which is to change the 'if __name__ == "__main__"' block at the end to look something like this:

if __name__ == '__main__':
    import myproject.gizmo
    myproject.gizmo.main([])

That works fine, but it does all the work of setting up the module twice, which is merely wasteful unless Gizmo uses a destructive metaclass - which many libraries, including at least a few that I've written, do use; declaring an axiom.item.Item subclass is one example.

I'd like to suggest an idiom which may help to alleviate this problem, without really introducing any additional overhead.  If you're going to declare a __main__ block in a module that can also be a script, do it at the top of your module, like so:

# myproject/gizmo.py
if __name__ == "__main__":
    import sys
    from os.path import dirname
    sys.path.append(dirname(dirname(__file__)))
    from myproject.gizmo import main
    sys.exit(main(sys.argv))

from myproject import thing1

class Gizmo(thing1.Thingy):
    # ...

With only two extra lines of code, your module is only imported once, path-related problems are dealt with, 'main' can return an error-code just like in C, and your module will function in the same way when it runs itself as when it is imported.  Here's some output:


glyph@alastor:~% python
>>> from myproject.gizmo import main
>>> main([])
I am <myproject.gizmo.Gizmo object at 0xb7d69c2c> hello!
>>>
% python myproject/gizmo.py
I am <myproject.gizmo.Gizmo object at 0xb7cb7cac> hello!

As a little bonus, this even works properly with Python's "-m" switch - although at least in 2.4 you have to use the odd '/' separator rather than '.' like everywhere else:

% PYTHONPATH=. python -m myproject/gizmo
I am <myproject.gizmo.Gizmo object at 0xb7d0af4c> hello!

Unswitch

Ted Leung has a very good list of the things the open-source desktop should be doing.


I started to respond by giving pointers to various half-solutions to these problems; Deskbar or Arnic to replace Quicksilver, for example; but he really has a point. There are some definite weak areas on the Linux desktop that MacOS X addresses well. The lack of some unified scripting approach is particularly embarrassing; the fact that it is easier to script applications in both major proprietary desktops (OS X through Applescript, Windows through COM) is sad. Also, DVI and EDID are (as far as I can tell) far enough along that the hardware side of color profiling would be possible in Linux these days, it's just up to the OS and the desktop to support it.


(I am not mentioning KDE here because the last few times I've tried it, it has crashed constantly, and that mirrors my abysmal experience with programming PyQT. I understand a great many people use it, and more power to you, but it's just not likely to be relevant to me any time in the near future.)


So, rather than pointing out how a user such as Ted might hobble through Ubuntu-land, I'll give an explanation from my point of view: for me, Ubuntu is indispensable, and the Mac is basically unusable.


I'm not prejudiced against Apple. Far from it, in fact. I grew up on the Mac, and I will always have a soft spot in my heart for them. Through the first few years of my career as a professional programmer, I had the absolute first release of MacOS X server, and every beta of MacOS X. My experiences with Ubuntu have made me disappointed with Apple though, and I despair of MacOS ever being my platform of choice again. Following the structure of Ted's rant, here are some things that Apple might do to make me "switch":


Add some keyboard shortcuts. During normal use of my Ubuntu desktop, the only time I have to use the mouse is to quickly select links in a poorly-designed application (usually a web application, where keyboard bindings are really hard to get right). I move windows, maximize, minimize, resize and otherwise shuffle things around all the time - not using any crazy third-party utility, but the built-in keybindings in Metacity.


For that matter, building virtual desktops into the OS wouldn't hurt. I'm aware that there are a few third-party programs, but it would be much nicer to have there be one right way to do it. I've heard it said that this would be "too hard" for new users, but I think that's wrong. The idea of a single virtual monitor you can "move to" makes more sense to every user I've ever talked to than the mac's "invisible sheets with a menu bar and icon stuck to them" mental model you're supposed to adopt with the mac desktop. It took about five tries to explain the difference between an "open" and "closed" application on the mac to my grandmother, because the icon to indicate this difference is literally about 9 pixels, and applications will happily stay open forever with no visible windows.


Give me a panel I can really customize, not just drop applications onto. There are a few things I want to see on the screen at all times: memory usage, mounted disks, the current weather, the current time. The GNOME panel lets me put all of these somewhere omnipresent in a nice, small form factor with no fuss. These are not things that every user wants to see, so it has to be customizable. This is not a case of not having selected the right default or not having "designed" it right. Some people's needs are different enough that you need some freedom to choose.


Work on scaling applications up. I have dozens of gigabytes of free stock photos, artwork, and photos that I've taken, and while I haven't tried Aperture, iPhoto did not handle this well, and it didn't let me use it as a way to organize metadata without making copies of all the files. My mac's hard drive wasn't big enough for all the files I wanted to categorize: I have a terabyte network appliance for this purpose. I have dozens of gigabytes of music; every CD I have ever owned has been encoded (in multiple formats, including FLAC). iTunes chokes if I try to load my entire music library, let alone load it from multiple computers all on the shared drive. Unfortunately the default music player in Ubuntu (Rhythmbox) isn't great, but the fantastic Quod Libet lets me not only load my massive library, but also perform bulk cleanups on the ID3 tags (many of my CDs were ripped before I had easy access to the CDDB, some are in Ogg format, and I didn't adopt a consistent naming convention until recently).


Frankly, iTunes' DRM is offensive. I had a problem with my ITMS account, reinstalled my OS and reformatted my iPod one too many times, and now I can't play a bunch of my legitimately purchased music. (Update: Bob Ippolito corrects me in the comments; this was simply a problem I've personally had with the ITMS, not a matter of policy. It's unlikely that you'd have this problem with music you purchased today.) If iTunes were otherwise a really great music management tool and had worked well with my large repository of non-DRM'd music, I probably would have cut it some slack: but part of the problem is that iTunes corrupts its own database, and my iPod crashes when confronted with the volume of music I'm storing on it and periodically needs reformatting if I want it to work.


Finally, rather than respond to Ted's list of applications, let me talk about Free Software's killer application: APT, the Advanced Packaging Tool.


It would take me forever to list all the applications that MacOS X is missing out of the box which Ubuntu includes in a fresh installation. Just to name a few: Gaim, a messaging client where I can seamlessly integrate IRC, yahoo, AIM, jabber and a half-dozen other protocols. Inkscape, an Illustrator-style vector graphics editor which works natively in SVG. Gimp, a photoshop-style bitmap editor. OpenOffice. There are, of course, equivalents to all of these on the mac, but they're expensive, underfeatured, or clunky and obviously not native, and sometimes two of the three. The real magic here though isn't one particular application (after all, Ubuntu isn't packaged with an equivalent to GarageBand or Aperture) but the fact that they were preinstalled; I don't need to mess around with ten clicks per application just to get them set up: they're all there, immediately.


As a user, this is convenient, but as a developer, it quickly becomes indispensable. To set up a new working environment on a Mac, I would have to spend hours downloading, untarring, building, checking dependencies, installing -- or if I were lucky, clicking on packages, accepting EULAs, etc. For example: "apt-get build-dep python" summons all the packages I need to compile my own version of Python; a collection of software it might take an hour to identify without this facility, let alone install.


Sure, there's fink and darwinports, but those don't manage user-visible, GUI applications in the same way that they do UNIX-style development stuff. Note that the first things I mentioned were actual end-user apps, not arcane requirements for programming tools. In other words, these aren't really a part of the OS on MacOS X, they are a port of features from other OSes, and they feel like it.


Fundamentally, what my user experience comes down to is this: I install ten or twenty programs or libraries per week, but I spend almost no time at all actually doing the installing. If each of those twenty libraries took me five minutes to set up, that's over an hour and a half of wasted time per week, which really adds up fast. Especially in weeks like the last one, setting up a new computer, where I installed several hundred packages: 200 packages would be almost two solid work-days of just installing software!


I feel like this all boils down to an attitude. Succeed or fail, Ubuntu is just trying to provide me with tools to work with my data. It installs software as fast as it can go, it loads as much music and as many pictures as it can handle, and it doesn't bother me with the details if it can help it. The Mac is trying too, but I feel as though each application is suffering from some sort of inferiority complex. Each new application needs to make itself heard. It can't just be ready to use instantly; it must make itself felt, through whatever mechanism. I must commune with each sacred EULA alone; I must wait for the download bar, mount the virtual disk, draaaaag the icon to my Applications folder. I must select the drive that I wish the software to be installed to. I must confirm that yes, I am sure I want to overwrite files. Yes, all of them.


This could probably be fixed at a technical and legal level - after all, a good deal of these applications are free to download already, the delivery mechanism can't make that much of a difference, and Apple already delivers all of its updates through a generic "software updates" facility which could be expanded to offer the features of an APT repository. I have no idea how to deal with the cultural problem, though, where each application author (even if all their application does is launch other applications!) feels they deserve twenty minutes of your attention and five dollars.


Sometimes free software breaks. Sometimes, all software breaks. I haven't really had one of these moments with Ubuntu yet, but I'm sure I will: sometimes it breaks really, really badly, and I have to do those ridiculous things to fix it that Windows users get laughed at for doing in Apple commercials. I have, in a dim and distant past I would rather not recall, had to edit the spiritual equivalent of my AUTOEXEC.BAT. I can see why to many people it just isn't worth it. I'm not suffering under some stern ascetic desktop because I want to "support free software" or anything like that, though. In general, the experience is quite pleasant, and I don't think it would really improve that much if I used a Mac. Freedom does have some practical consequences, though. Those times when I want to do something with Linux that really would be better on a Mac, it's worth suffering through for the knowledge that, if it breaks, it's not going to intentionally stop me from fixing it because it wants to charge me $0.99 for another copy of the song.