This Isn't How PyPy Works, But it Might as Well Be

Sunday February 12, 2012

It seems like a lot of the Python programmers I speak with are deeply confused by PyPy and can't understand how it works. The stereotypical interlocutor will often say things like: A Python VM in Python? That's just crazy! How can that be fast? Isn't Python slower than C? Aren't all compilers written in C? How does it make an executable?

I am not going to describe to you how PyPy actually works. Lucky for you, I'm not smart enough to do that. But I would like to help you all understand how PyPy could work, and hopefully demystify the whole idea.

The people who are smart enough to explain how PyPy actually works will do it over at the PyPy blog. At some level it's really quite straightforward, but this impression of straightforwardness is not conveyed well by posts with titles like "Optimizing Traces of the Flow Graph Language". In addition to being a Python interpreter in Python, PyPy is a mind-blowingly advanced exploration of the cutting-est cutting-edge compiler and runtime technology, which can make it seem complex. In fact, the fact that it's in Python is what lets it be so cutting-edge.

Most people with a formal computer science background are already familiar with the fairly generic nature of compilers, as well as the concept of a self-hosting compiler. If you do have that background, then that's all PyPy is: a self-hosting compiler. The same way GCC is written in C, PyPy is written in Python. When you strip away the advanced techniques, that's all that's there.

A lot of folks who are confused by PyPy's existence, though, I suspect don't have that background; many working programmers these days don't. Or if they do, they've forgotten it, because the practical implications of the CSS box model are so complex that they squeeze simpler ideas, like turing completeness and the halting problem, out of the average human brain. So here's the easier explanation.

A compiler is a program that turns a string (source code: your program text written in Python, C, Ruby, Java, or whatever) into some kind of executable code (bytecode or runtime interpreter operations or a platform-native executable).

Let's examine that last one, since it seems to be a sticking point for most folks. A platform-native executable is simply a bunch of bytes in a file. There's nothing magic about it. It's not even a particularly complex type of file. It's a packed binary file, not a text file, but so are PNGs and JPEGs, and few programmers find it difficult to believe that such files might be created by Python. The formats are standard and very long-lived and there are tons of tools to work with them. If you're curious, even Wikipedia has a good reference for the formats used by each popular platform.

As to Python being slower than C: once a program has been transformed into executable code, it doesn't matter how slow the process for translating it was: the running program is now just executable instructions for your CPU, so it doesn't matter that Python is slower than C, because it was just the compiler that was in Python, and by the time your program is running, the original Python has effectively vanished and all you're left with is your program executing.

(Actually, Python is faster than C anyway, especially at producing strings.)

In reality, PyPy takes a hybrid approach, where it is a program which produces a program and then does some stuff to it and creates some C code which it compiles with the compiler of your choice and then creates some code which then creates other code and then puts it into memory, not a file, and then executes it directly, but all of that is ancillary tricks and techniques to make your code run faster, not a fundamental property of the kind of thing that PyPy is. Plus, as I said, this article isn't actually about how PyPy works anyway, it's just about how you should pretend it works. So you should ignore this whole paragraph.

For the sake of argument, assume that you know all the ins and outs of binary executable formats for different operating systems, and the machine code for various CPU architectures. The question you should really ask yourself is: if you have to write a program (a compiler) which translates one kind of string (source code) into another kind of string (a compiled program): would you rather write it in C or Python? What if the strings in question were a template document and an HTML page?

It shouldn't be surprising that PyPy is written in Python. For the same reasons that you might use Django templates and not snprintf for generating your HTML, it's easier to use Python than C to generate compiled code. This is why PyPy is at the forefront of so many advanced techniques that are too sophisticated to cover in a quick article like this. Since the compiler is written in a higher-level language, it can do more advanced things, since lower-level concerns can be abstracted away, just as they are in your own applications.

The Concurrency Spectrum: from Callbacks to Coroutines to Craziness

Thursday January 19, 2012

Concurrent programming idioms are on a spectrum of complexity.

Obviously, writing code that isn't concurrent in any way is the easiest. If you never introduce any concurrent tasks, you never have to debug any problems with things running in an unexpected order. But, in today's connected world, concurrency of some sort is usually a requirement. Each additional point where concurrency can happen introduces a bit of cognitive overhead, another place you need to think about what might happen, so as a codebase adds more of them it becomes more difficult to understand them all, and it becomes more challenging to understand subtle nuances of parallel execution.

So, at the simplest end of the spectrum, you have callback-based concurrency. Every time you have to proceed to the next step of a concurrent operation, you have to create a new function and new scope, and pass it to the operation so that the appropriate function will be called when the operation completes. This is very explicit and reasonably straightforward to debug and test, but it can be tedious and overly verbose, especially in Python where you have to think up a new function name and argument list for every step. The extra lines for the function definition and return statement can be an impediment to quickly understanding the code's intentions, so what facilitates understanding of the concurrency model can inhibit understanding of the code's actual logical purpose, depending on how much concurrent stuff it has to do. Twisted's Deferreds make this a bit easier than raw callback-passing without fundamentally changing the execution dynamic, so they're at this same level.

Then you have explicit concurrency, where every possible switch-point has to be labeled somehow. This is yield-based coroutines, or inlineCallbacks, in Twisted. This is more compact than using callbacks, but also more limiting. For example, you can only resume a generator once, whereas you can run a callback multiple times. However, for a logical flow of sequential concurrent steps, it reads very naturally, and is shorter, as it collapses out the 'def' and 'return' lines, and you have to think of at least two fewer names per step.

However, that very ease can be misleading. You might gloss over a 'result = yield ...' more easily than a 'def whatever(result): return result; something(whatever)'. Nevertheless, if you have 'yield's everywhere you might swap your stack, then when you have a concurrency bug, you can look at any given arbitrary chunk of code and know that you don't need any locks in it, as long as you can't see any yield statements. Where you do see yield statements, you know that you have some code that needs to be inspected.

To continue down that spectrum, a cooperatively multithreading program with implicit context switches makes every line with any function call on it (or any line which might be a function call, like any operator which can be overridden by a special method) a possible, but not likely culprit. Now when you have a concurrency bug you have to audit absolutely every line of code you've got, although you still have a few clues which will help you narrow it down and rule out certain areas of the code. For example, you can guess that it would be pathological for 'x = []; ...; x.append(y)' to context switch. (Although, given arbitrary introspection craziness, it is still possible, depending on what "..." is.) This is way more lines than you have to consider with yield, although with some discipline it can be kept manageable. However, experience has taught me that "with some discipline" is a code phrase for "almost never, on real-life programming projects".

All the way at the end of the spectrum of course you have preemptive multithreading, where every line of code is a mind-destroying death-trap hiding every possible concurrency peril you could imagine, and anything could happen at any time. When you encounter a concurrency bug you have to give up and just try to drink your sorrows away. Or just change random stuff in your 'settings.py' until it starts working, or something. I never really did get comfortable in that style. With some discipline, you can manage this problem by never manipulating shared state, and only transferring data via safe queueing mechanisms, but... there's that phrase again.

Some programming languages, like Erlang, support efficient preemptive processes with state isolation and built-in super-cheap super-fast queues to transfer immutable values. (Some other languages call these "threads" anyway, even though I would agree with Erlang's classification as "processes".) That's a different programming model entirely though, with its own advantages and challenges, which doesn't land neatly on this spectrum; if I'm talking about left and right here, Erlang and friends are somewhere above or below. I'm just describing Python and its ilk, where threads give you a big pile of shared, mutable state, and you are constantly tempted to splash said state all over your program.

Personally I like Twisted's style best; the thing that you yield is itself an object whose state can be inspected, and you can write callback-based or yield-based code as each specific context merits. My opinion on this has shifted over time, but currently I find that it's best to have a core which is written in the super-explicit callback-based approach with no coroutines at all, and then high-level application logic which wraps that core using yield-based coroutines (@inlineCallbacks, for Twisted fans).

I hope that in a future post, I may explain why, but that would take more words than I've got in me tonight.

I'm Sorry It's Come To This

Thursday December 15, 2011

If you want to be a great leader,

you must learn to follow the Tao.

Stop trying to control.

Let go of fixed plans and concepts,

and the world will govern itself.

- Tao Te Ching ch. 59,

as translated by S. Mitchell

I usually try not to get too political in my public persona – on blogs, twitter, IRC, mailing lists et cetera – and that's a conscious choice.

I work on open source software. I have for the last ten years. I am lucky enough to have founded a project of my own, but in open source, leaders are more beholden to their followers than vice versa. I depend on people showing up to effectively work for me, for free, on a regular basis. So, I try to avoid politics not because I don't have strong convictions (anyone who knows me personally can tell you that I certainly do) but because I don't want someone to avoid showing up and helping do some good in the world in one area, just because we might disagree in another.

This is a benefit of living in a free and democratic society: we have ways to dispute issues that we have strong feelings about, so we can cooperate on some things without having to agree on everything. It's rarely perfect but we can usually get some good stuff done, with rough consensus and running code.

Today though, there's a political issue which I can't ignore. The purpose of Twisted (the open source project which I founded) is to facilitate the transfer of information across the Internet. A new law, SOPA, is threatening to radically alter the legal infrastructure of the Internet in the United States, granting sweeping new powers to copyright cartels and fundamentally restricting the legal right to transfer any information, and to build tools that transfer it. Twisted is designed to make it easy to implement new protocols, to easily experiment with improvements to systems like the Domain Name System. SOPA might well make those potential improvements, and with only a little paranoid fantasizing, Twisted itself, illegal.

It's my view that this law is a blatantly unconstitutional restriction on free speech. It will kill job creation, at a time when our nation can scarce afford another blow to its economy. It will create the infrastructure to suppress political dissent, similar to the infrastructure in China and Syria, at a time when our corrupt political system needs dissent more than ever. It is the wrong thing at the wrong time.

This bill is being discussed in the house today. If you're in the US, call your representative right now.

(As always, I don't speak for anyone but myself; no one else has reviewed or endorsed these remarks.)

Blocking vs. Running

Friday November 04, 2011

I've heard tell of some confusion lately around what the term "non-blocking" means. This isn't the first time I've tried to explain it, and it certainly won't be the last, but blogging is easier than the job Sisyphus got, so I can't complain.

A thread is blocking when it is performing an input or output operation that may take an unknown amount of time. Crucially, a blocking thread is doing no useful work. It is stuck, consuming resources - in particular, its thread stack, and its process table entry. It is sucking up resources and getting nothing done. These are resources that one can most definitely run out of, and are in fact artificially limited on most operating systems, because if one has too many of them, the system bogs down and becomes unusable.

A thread may also be "stuck" doing some computationally intensive work; performing a complex computation, and sucking up CPU cycles. There is a very important distinction here, though. If that thread is burning up CPU, it is getting work done. It is computing. This is why we have computers: to compute things.

It is of course possible for a program to have a bug where a program goes into an infinite loop, or otherwise performs work on the CPU without actually getting anything useful to the user done, but if that's happening then the program is just buggy, or inefficient. But such a program is not blocking: it might be "thrashing" or "stuck" or "broken", but "blocking" means something more specific: that the program is sitting around, doing nothing, while it is waiting for some other thing to get work done, and not doing any of its own.

A program written in an event-driven style may be busy as long as it needs to be, but that does not mean it is blocking. Hence, event-driven and non-blocking are synonyms.

Furthermore, non-blocking doesn't necessarily mean single-process. Twisted is non-blocking, for example, but it has a sophisticated facility for starting, controlling and stopping other processes. Information about changes to those processes is represented as plain old events, making it reasonably easy to fold the results of computation in another process back into the main one.

If you need to perform a lengthy computation in an event-driven program, that does not mean you need to stop the world in order to do it. It doesn't mean that you need to give up on the relatively simple execution model of an event loop for a mess of threads, either. Just ask another process to do the work, and handle the result of that work as just another event.

2L2T: DjangoCon Feedback

Friday September 09, 2011

I've been having a great time over here at DjangoCon, but now that I've had an opportunity to relax and process some feedback from my talk, I have noticed a couple of themes to that feedback. This isn't really a full article, just a response, but it's too long to tweet.

If you're curious about the talk, you can view it here.

For the most part, the talk was exceedingly well-received and I want to thank the Django community both for the opportunity to speak and for the overwhelmingly positively response. Thanks for making an outsider to your community feel welcome and appreciated.

There have been a couple misconceptions though, and perhaps I didn't express myself clearly on a few points.

I realize that there are times – plenty of times, even – when using some component that's in a different language from your main application is the right choice. I wasn't trying to say "all Python all the time no matter what, no exceptions". I just want you all to consider that there is a cost to using a component that's in a different language, and you should be aware of that cost. It's not as simple as a tick-a-box feature comparison of the features and drawbacks of multiple products. If I came out as sounding really extreme on this, it just was to provoke a response.
You can have an architecture which is driven by Python and organized by Python without actually having all the implementation be in Python. For example, an inordinate number of people asked me about memcache. If you want using something like that, sure, use memcache, there's not a lot that it being in Python would buy you. Some might say that the whole point of memcache is that it isn't very deeply configurable and doesn't have much in the way of behavior. Plus, it's an internal component, not an externally visible service, so even my usual flimsy "no buffer overflows" argument doesn't really hold up; it's more like a library than a server. You can incorporate memcache into a Python-in-the-driver's-seat architecture by spawning memcache from your Python process instead of making memcache a configuration dependency. That way, you don't need a separate configuration file and a separately managed service or a chef script that boots memcache for you before your application. This applies equally well to any other, similar services: write their config files from your Python code, and start them automatically.

Finally, thanks to everyone who really thought about what I said, took the time to respond, and prompted me to write this.