I Can Draw Graphs Too

So, I set out to provide a slightly more cogent debunking, since various members of the audience were unsatisfied with my previous, dismissive response. I haven't gotten terribly far, since I'm pretty busy, but I did cook up some rudimentary results for you.

My goal was to show that threads don't scale terribly well at processing concurrent I/O, and in particular that Twisted, even in its slowest and most naive mode, scales better than a simple threaded implementation of a similar concept.

Unfortunately I was unable to do this easily, because Debian GNU/Linux (in its default configuration, using a 2.6.7 kernel) is not even able to spawn enough threads to generate a fair test.

My test was to write a Twisted protocol handler that would take some simple input, perform a trivial computation, generate some output, and close the connection. I then wrote a Twisted client for it, and wrote a simple threaded reactor which would handle each connection in a separate thread, to compare the amount of time each multiplexor took to finish connecting vs. the amount of time it took to complete the entire test. At around 256 threads, though, I could no longer spawn more threads to handle simultaneous connections, a problem which others have experienced.
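
For the curious, the server side looked something like this - a sketch of the shape, not the actual test code (the port number is made up), since I'm not releasing source yet:

    from twisted.internet import reactor, protocol

    class TrivialProtocol(protocol.Protocol):
        def dataReceived(self, data):
            # Take some simple input, perform a trivial computation...
            self.transport.write(data.upper())
            # ...generate some output, and close the connection.
            self.transport.loseConnection()

    factory = protocol.ServerFactory()
    factory.protocol = TrivialProtocol
    reactor.listenTCP(8123, factory)
    reactor.run()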

This indicates in part the cultural issues surrounding the performance of threads, to wit, nobody really cares enough about good threading performance to even bother making it possible to spawn a lot of them.

This test was carried out on an otherwise quiescent Linux machine, an Athlon XP 2000+ with a gig of RAM, using Python 2.3. Without further ado:

[graphs: time to finish connecting and time to complete the whole run, threaded reactor vs. Twisted's default select() reactor]

I don't expect these graphs to be taken as serious performance measurements - after all, I'm not even providing source code yet. Still, the threaded connection-start performance is abysmal even compared to select() (remember, this is Twisted's default configuration, with no tuning or optimization), and even on an otherwise excellent development machine I don't have the software to test larger numbers of concurrent connections. That might give you an idea of why I find it so hard to take this kind of comparison seriously.

This exercise has brought to light the fact that although Twisted would be an excellent test platform for different multiplexing mechanisms, nobody has yet bothered to put together a survey of them. I will do a more rigorous benchmark of threads vs. Twisted later this week and release the code; hopefully some useful work will come of this if it can be used as a simple reactor benchmark.

A grotesque parody of science

Unbelievably naive mistake, or politically motivated lies? You be the judge.

I have already spent way too much time talking about this piece of garbage, so I am putting up a little persistent comment about it.

This so-called "benchmark" is nothing more than an insult. Yes, he has graphs: but these graphs are not actually measuring anything. They claim to be performance metrics for an MDI application doing image transformations.

The advocacy position which he is so anxious to misinterpret is this: Twisted is an alternative to threads for multiplexed I/O. Many, many applications multiplex I/O, and they do it in a variety of ways which are inefficient and bug-prone, the most popular (and most dangerous) approach being thread-per-connection. There is an explanation of the problem (as well as a link to Twisted) at threading.2038bug.com.

He accuses Twisted developers of letting emotions get in the way. Am I emotional about this? Yes, I am. I am angry about it for good reason. The Twisted team has spent almost a hundred man-years of effort producing something that we could be proud of, to provide our users with a better, more test-friendly, less error-prone way of programming. It's not a new idea, but I like to think we've brought something useful to the table by providing an integrated platform that supports it.

I don't think that my efforts, or those of my colleagues, deserve to be misrepresented in this way. I am upset and writing about this because I am afraid that programmers who don't know any better will see graphs and think that he's proposing a legitimate approach to solve some problem, and I will one day be called upon to help them with their code. Code which will, thanks to "benchmarks" like this one, inevitably, perform extremely poorly and be a total mess of race conditions. I am trying to improve the state of the art in the industry, and it is a lot of work. Widespread circulation of an afternoon's dalliance with a plotting program like the above can undo years of work trying to educate people.

I thought that I would write some benchmarks where the graphs went in the other direction, but the internet is already brimming with graphs that demonstrate the superior scalability of multiplexing with events rather than threads. Dignifying this sophomoric potshot with actual data would be a lot more than it deserves. If you are truly skeptical, I would recommend attending Itamar Shtull-Trauring's Twisted talk at PyCon 2005, "Fast Networking with Python". This will give you a much clearer view of Twisted's current performance problems than the graphs drawn by some script-kiddie who has a grudge because he got banned from an IRC channel for spreading lies.

Just as refuting the numbers would be a waste of effort, refuting every one of the lies and/or misunderstandings in each paragraph could take all day. Instead, I'll just debunk a sample of the most egregious stuff, and hopefully the pattern won't be too hard to extrapolate, for those of you unfamiliar with the subject matter.

Threads will work on multiprocessor and hyperthreading machines automatically. On similar hardware Twisted will use only 1/n of the available processing power, where n is the number or virtual or physicial [sic] processors.

What he means is that Python will only use 1/n of the available processing power. For code that actually makes use of SMP, you need to write the parallelized parts in C and release the global interpreter lock around them. The GIL is a problem which is hard to solve, since making Python multi-CPU friendly actually makes it slower. Note - you don't need to do anything special to take advantage of SMP if you use Twisted's recommended, non-threaded way of parallelizing things, which is spawning multiple worker subprocesses.
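
In Twisted terms, that looks something like the following sketch. 'worker.py' is a hypothetical script, but ProcessProtocol and reactor.spawnProcess are the real APIs:

    import sys
    from twisted.internet import reactor, protocol

    class WorkerProtocol(protocol.ProcessProtocol):
        def outReceived(self, data):
            # Results arrive over the worker's stdout, through the main
            # loop; the OS spreads worker *processes* across CPUs, GIL or
            # no GIL.
            print 'result:', data

    # One worker per virtual or physical processor; 'worker.py' is a
    # hypothetical script that reads work from stdin and writes results
    # to stdout.
    for i in range(2):
        reactor.spawnProcess(WorkerProtocol(), sys.executable,
                             [sys.executable, 'worker.py'])
    reactor.run()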

So, Twisted can use exactly as much processing power as Python, especially because Twisted supports threads.

Twisted people think you should spawn a separate process for intensive work. If you do this you need to synchronize resource access as you would with threads. That is probably the most mentioned problem with threads.

You don't need to synchronize resource access when you use subprocesses. You can copy data to multiple subprocesses and serialize resource access without doing any extra work, since Twisted delivers attempts at resource access through normal I/O channels, which are processed by the main loop. Also, you can run your subprocesses in isolation, without concern for synchronized access to shared data structures. In the case he seems to be talking about, i.e. that of large shared-memory objects which need to be mutated and then operated upon by multiple cooperating processes, you can still avoid locking by using a tuple-space model of interaction and delivering work to subprocesses through pipes rather than delivering data. This still has only one simple program managing the interactions of many, taking advantage of the OS's much-vaunted context-switching abilities, and doesn't require mutexes on every operation.
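
To make that concrete, here is the worker's entire "synchronization" story under the pipe-based model - a hypothetical worker script; its only channels are stdin and stdout, so there is nothing to lock:

    import sys

    # Read one work description per line, compute, write the result back.
    # No shared state, no mutexes: the parent's main loop serializes
    # everything by construction.
    while 1:
        line = sys.stdin.readline()
        if not line:
            break
        n = int(line)
        sys.stdout.write('%d\n' % (n * n))
        sys.stdout.flush()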

You also need to find an effective portable method of inter-process communication. This is not much of a problem, but it is something you wouldn't have to do with threads.

You certainly have to do this with threads. It might appear as though you do not, and in some cases it may be slightly easier to implement, but if you don't track which objects are in use by which threads, mutex overhead will quickly cripple any performance gains that you'd see in a threaded application. So your IPC mechanism with threads is queues, or condition variables, rather than pipes or shared memory, but it's still there and it still requires a lot of maintenance.
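
For example, here is roughly what the sane threaded idiom looks like in Python - a sketch only; note that the Queue is itself an IPC mechanism, with the locking merely hidden inside it:

    import threading, Queue

    work = Queue.Queue()      # inter-thread channel #1
    results = Queue.Queue()   # inter-thread channel #2

    def worker():
        while 1:
            n = work.get()    # blocks; the queue's internal lock does the work
            if n is None:
                break
            results.put(n * n)

    for i in range(4):
        threading.Thread(target=worker).start()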

In conclusion, I stand by one of arensito's last claims:

When I approached Twisted people with questions about these results I was told I was not worth listening to. Followers stated bluntly they were smarter than me.

In fact, he isn't worth listening to, and I am proud to say that the Twisted team is smarter than he is. More than that: he's worth not listening to. The "benchmark" that he has proposed does not test anything about Twisted, and does not test anything meaningful about his hypothesis.

There's lots of work to be done on Twisted, and it certainly has its share of performance problems. It's by no means the fastest system of its kind. I am always excited to hear about ways it can be improved, but don't just make up a bunch of lies, write a while loop, slap a graph on it, and claim you've discovered something better.

every program does this, here is how

There is a pretty common thing that various applications do now, generally touted as a major feature, such as "tabbed browsing" or "single-window chat" or whatever. Basically applications start out like this:

[screenshot: MDI Windows]

and then at some point later they do this:

[screenshot: MDI Notebook]

I have been spending some of my spare time messing around with PyGTK, so here is some sample code in Python which does this thing.




import gtk

from zope.interface import Interface, implements, Attribute

class IMultiPlug(Interface):
    """Implements multi-plug stuff.
    """

    def addPlug(plug, position=0):
        """
        Add a plug.

        @param plug: implementor of IPlug

        @param position: int - useful when using tabs or other visibly
            ordered MDI interfaces
        """

    def removePlug(plug=None):
        """
        Remove a plug.

        @param plug: implementor of IPlug
        """

    def iterPlugs():
        """Return a sequence of all plugs
        """

class IPlug(Interface):
    gtkwidget = Attribute('a gtk.Widget')
    name = Attribute('a unicode name')
    icon = Attribute('a gtk.gdk.Pixbuf')

class Plug:
    implements(IPlug)
    def __init__(self, name, gtkwidget, icon=None):
        self.name = name
        self.gtkwidget = gtkwidget
        if icon is None:
            icon = gtk.icon_theme_get_default().load_icon(gtk.STOCK_YES,
                                                           64, 0)
        self.icon = icon

class MultiWindow:
    implements(IMultiPlug)

    def __init__(self, prefix=''):
        self.windows = {}
        self.plugs = []
        self.prefix = prefix

    def addPlug(self, plug, position=0):
        win = gtk.Window()
        win.set_title(self.prefix + plug.name)
        self.windows[plug] = win
        win.add(plug.gtkwidget)
        win.show_all()
        self.plugs.insert(position, plug)
        # "delete_event" handlers receive (widget, event); return True
        # because removePlug destroys the window itself.
        def killit(window, ev):
            self.removePlug(plug)
            return True
        win.connect("delete_event", killit)
        win.set_icon(plug.icon)

    def iterPlugs(self):
        return self.plugs[:]

    def removePlug(self, plug):
        for enum, oplug in enumerate(self.plugs):
            if plug is oplug:
                # Detach the widget before destroying its window, so it
                # survives to be re-parented by another IMultiPlug.
                self.windows[plug].remove(plug.gtkwidget)
                self.windows[plug].destroy()
                self.plugs.pop(enum)
                self.windows.pop(plug)
                return

class MultiTab:
    implements(IMultiPlug)

    def __init__(self, notebook, window):
        self.notebook = notebook
        self.window = window
        self.plugs = []

    def labelFactory(self, plug):
        hb = gtk.HBox()

        # Icon box
        icbox = gtk.Image()
        scaled = plug.icon.scale_simple(16, 16, gtk.gdk.INTERP_BILINEAR)
        icbox.set_from_pixbuf(scaled)
        icbox.set_padding(2, 0)
        hb.add(icbox)

        # Label
        hb.add(gtk.Label(plug.name))

        # Close Button
        cb = gtk.Button()
        cb.set_relief(gtk.RELIEF_NONE)
        im = gtk.Image()
        im.set_from_pixbuf(gtk.icon_theme_get_default().load_icon(
            gtk.STOCK_CLOSE, 12, 0))
        im.show_all()
        cb.add(im)
        hb.add(cb)
        # cb, not im, is the direct child of hb, so it is the one to pack.
        hb.set_child_packing(cb, False, False, 0, gtk.PACK_START)

        def killit(button):
            self.removePlug(plug)
        cb.connect('clicked', killit)

        return hb

    def addPlug(self, plug, position=0):
        lab = self.labelFactory(plug)
        self.notebook.insert_page(plug.gtkwidget, lab, position)
        self.plugs.insert(position, plug)
        plug.gtkwidget.show_all()
        lab.show_all()
        self.window.show_all()

    def removePlug(self, plug):
        for enum, oplug in enumerate(self.plugs):
            if oplug is plug:
                self.plugs.pop(enum)
                self.notebook.remove_page(enum)
                break
        if not self.plugs:
            self.window.hide()

    def iterPlugs(self):
        return self.plugs[:]


class Cycler:

    def __init__(self, mdi1, mdi2):
        self.mdi1 = mdi1
        self.mdi2 = mdi2
        win = gtk.Window()
        win.set_title("Cycler")
        but = gtk.Button("cycle MDI")
        but.set_size_request(200, 200)
        win.add(but)
        win.connect('destroy', gtk.main_quit)
        but.connect('clicked', self.swap)
        win.show_all()

    def swap(self, ev):
        # iterPlugs returns a copy, so mutating mdi1 inside the loop is safe.
        for plug in self.mdi1.iterPlugs():
            self.mdi1.removePlug(plug)
            self.mdi2.addPlug(plug)
        self.mdi1, self.mdi2 = self.mdi2, self.mdi1

def test():
    mdi1 = MultiWindow('MDI Demo: ')
    n = gtk.Notebook()
    nw = gtk.Window()
    nw.set_title("MDI Notebook")
    nw.add(n)
    mdi2 = MultiTab(n, nw)

    widgets = [gtk.Button('Button %d' % x) for x in range(4)]
    import random
    itheme = gtk.icon_theme_get_default()
    for e, widget in enumerate(widgets):
        widget.set_size_request(300, 200)
        mdi1.addPlug(Plug('Plug #%d' % e, widget,
                          itheme.load_icon(random.choice(itheme.list_icons()),
                                           64, 0)))
    Cycler(mdi1, mdi2)
    gtk.main()


if __name__ == '__main__':
    test()


Unrelated Epiphany

Warning: Deep Nerd Zone

I was thinking recently about Imagination, the problems I was having with event propagation, and collect() as the kernel being problematic because of the need for constant re-collect()ing - that is, re-executing an expensive tree-walking algorithm.

Then I realized: the solution is basically correct, but should be inverted. collect() illustrates the correct mediator/observer relationship, but is broken in the face of complex event hierarchies. Well, even simple ones, given that the one exposed by "look" and a stateful UI is fairly "complex" by these standards. The real design I'm trying to get to is an observer-based model with mediated observation. In other words, when an object arrives at some point in the object graph, it sends out tendrils looking for all the interfaces to its external environment and asks to be notified of changes in that direction / through that mediator.

Back to our favorite example. A glass box publishes an "external" mediation factory and an "internal" mediation factory. External things register an observer on the external mediator. Internal things register an observer on the internal mediator. These mediators are aware at all times of whatever objects they can access, so collect() isn't necessarily an expensive, arbitrarily-complex graph-walk; the edges of the graph are pre-defined, mediators aren't constructed on the fly, and there is some structural notion of containment so that you can iterate the items that your mediator presents as available without worrying about random changes.
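
In code, the shape I have in mind is roughly this (invented names, obviously; this isn't Imagination's actual API):

    class Mediator:
        """Always knows what objects are reachable through it."""
        def __init__(self):
            self.contents = []
            self.observers = []

        def addObserver(self, observer):
            self.observers.append(observer)

        def objectArrived(self, obj):
            # No graph walk: the edge is pre-defined, so notification is
            # just a loop over registered observers.
            self.contents.append(obj)
            for observer in self.observers:
                observer.changed(self, obj)

    class GlassBox:
        def __init__(self):
            self.external = Mediator()  # what the outside world observes
            self.internal = Mediator()  # what the contents observe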

The collect() interface remains available for when it's convenient, such as automatically locating targets and tools for actions, but we lessen the focus on it so that actors can also be stateful observers instead of simply making periodic requests of the model. Now all I have to do is think of names for the methods involved.


I hope that there is at least one person in the world who understands what all of those words that I just typed mean, in the order that I typed them.

Good Luck

My luck seems to be improving: I just had the shortest and most uneventful ride back from New York ever. I got back in time to order some pizza.

So now, I'm waiting for the other shoe to drop...