A common sort of script that deals with a filesystem is to open each file in a directory hierarchy with a given path and do something to its contents. For example, let's write a program that prints out a list of all Python modules (with a .py extension) in a tree which contain shebang lines.
Here's the script using good old os.path:
import sys
import os
def os_shebangs(pathname):
for dirpath, dirnames, filenames in os.walk(pathname):
for filename in filenames:
fullpath = os.path.join(dirpath, filename)
if (fullpath.endswith(".py") and
file(fullpath, "rb").readline().startswith("#!")):
yield fullpath
def os_show_shebangs(pathname):
for path in os_shebangs(pathname):
sys.stdout.write("%s: %s\n" % (
path,
file(path, "rb").readline()[2:].strip()))
if __name__ == '__main__':
os_show_shebangs(sys.argv[1])
Pretty normal looking python code; not too much wrong with it. At 20 lines and 596 characters long, it's not too complex.
Now let's have a look at a similarly idiomatic version using FilePath:
At 18 lines and 471 characters, it's almost exactly 20% smaller than the version that uses os.path. However, a small space savings is hardly the most interesting property of this code. The advantages over the version that uses os.path:import sys
from twisted.python.filepath import FilePath
def shebangs(path):
for p in path.walk():
if (p.basename().endswith(".py") and
p.open().readline().startswith("#!")):
yield p
def showShebangs(pathobj):
for path in shebangs(pathobj):
sys.stdout.write("%s: %s\n" % (
path.path,
path.open().readline()[2:].strip()))
if __name__ == '__main__':
showShebangs(FilePath(sys.argv[1]))
- It's easier to test. You can use a fake FilePath object rather than needing to replace the whole "os" module and the "file" builtin.
- It's easier to read. You need fewer names; rather than os, os.path, and builtins, the code talks mainly to one object.
- It's easier to write. How many of you honestly remembered that "dirpath, dirnames, filenames" is the order of the tuples yielded from os.walk?
- It's easier to secure. If you wanted to allow untrusted users to
supply input to the os.path version, you need to be very, very
careful. What about "/"? What about ".."? With FilePath,
you simply supply the input to the 'child' method, and...
>>> from twisted.python.filepath import FilePath
>>> fp = FilePath(".")
>>> x = fp.child("okay")
>>> y = fp.child("..")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "twisted/python/filepath.py", line 308, in child
raise InsecurePath("%r is not a child of %s" % (newpath, self.path))
twisted.python.filepath.InsecurePath: '/home' is not a child of /home/glyph
>>> z = fp.child("hello/world")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "twisted/python/filepath.py", line 305, in child
raise InsecurePath("%r contains one or more directory separators" % (path,))
twisted.python.filepath.InsecurePath: 'hello/world' contains one or more directory separators - It's easier to extend. As of revision 22464 of Twisted (i.e. the next release) you can replace twisted.python.filepath.FilePath with twisted.python.zippath.ZipArchive, and this exact same code can operate on zip files.