The problem is, on the internet, nobody can hear you.

Today I realized what Q2Q is. It is a (I swear, this just came to me, I was not even trying to make it sound like anything) Self-Certifying Remote Endpoint Authentication Mechanism, or "SCREAM".

A SCREAM in this sense is a mechanism whereby connections are authenticated by cryptographic means, and the handshake includes information identifying the connector to an arbitrary level of precision (in Q2Q's case, via the SSL certificate with which the connection is authenticated).

It is self-certifying because the connection identifies itself, both via an in-band nonce and via TLS. All security is transport security.

It concerns a remote endpoint, which is the other end of a networked communication. It identifies not only the user, but their agent, and optionally the capabilities and permissions of that agent.

It is an authentication mechanism because you use it to prove that your connection is authentic.
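
Q2Q's own wire format is its own thing, but the core idea is easy to see with nothing but Python's standard ssl module: identity comes off the connection itself, from the certificate the peer presented during the handshake, rather than from a separate login step. A minimal sketch, not Q2Q (the function, host, port, and CA file names are made up):

    import socket
    import ssl

    def connect_and_identify(host, port, ca_file):
        # Require the peer to present a certificate signed by a CA we
        # trust; the TLS handshake itself fails otherwise.
        ctx = ssl.create_default_context(cafile=ca_file)
        ctx.check_hostname = False            # we read identity ourselves
        ctx.verify_mode = ssl.CERT_REQUIRED
        conn = ctx.wrap_socket(socket.create_connection((host, port)))
        # The connection is now "self-certifying": whoever is on the
        # other end provably holds the key for this certificate.
        subject = dict(pair[0] for pair in conn.getpeercert()['subject'])
        return conn, subject.get('commonName')  # e.g. 'alice@divmod.com'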

Also, Vertex will blow a hole the size of a watermelon in your NAT device: no kidding. Vertex is the Divmod implementation of Q2Q. We really want Q2Q to become a standard, so we are making a big deal out of the separation between product and protocol.
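
For the curious, the watermelon trick is, at bottom, ordinary UDP hole-punching. Here is a minimal sketch of the idea, not Vertex's actual code, assuming each peer has already learned the other's public address from some rendezvous server:

    import socket

    def punch(local_port, peer_host, peer_port, attempts=10):
        # Both peers run this at roughly the same time. Each outbound
        # packet opens (or refreshes) a mapping in our own NAT; once
        # both sides have sent, each side's packets can reach the other.
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        sock.bind(('', local_port))
        sock.settimeout(1.0)
        for _ in range(attempts):
            sock.sendto('punch', (peer_host, peer_port))
            try:
                data, addr = sock.recvfrom(1024)
            except socket.timeout:
                continue
            if addr[0] == peer_host:
                return sock   # the hole is open; reuse sock for the session
        raise IOError("NAT traversal failed")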

(I really feel like there are some uses for this thing that I've missed. I really hope I have enough time to work on it in the next 6 months to see something through to fruition: other, less focused, worse P2P and identity solutions are starting to get some traction, and it bothers me.)

Six Megabytes

Alan Cox on Twisted:
"6Mbytes of unauditable weirdness"
"First they laugh at you", etc. :)

Knowing Santa Claus is Fake Doesn't Ruin Christmas

There's no such thing as magic. So when someone tells you that you can magically transform blocking code into Deferreds, as in this Python Cookbook posting, From blocking functions to Deferred functions, you should be suspicious.

As Itamar suggested, this particular goal can be accomplished with Twisted's standard twisted.internet.threads.deferToThread, which lacks the horrible, possibly crashing bugs present in the recipe above.
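
For reference, the supported spelling is a one-liner: deferToThread runs the blocking call in the reactor's thread pool and hands you back a Deferred that fires with its result. (Python 2, as in the era of this post; blocking_work stands in for whatever blocking call you actually have.)

    import time
    from twisted.internet import reactor
    from twisted.internet.threads import deferToThread

    def blocking_work(n):
        time.sleep(n)        # stands in for any blocking call
        return n * 2

    def done(result):
        print "got", result
        reactor.stop()

    deferToThread(blocking_work, 3).addCallback(done)
    reactor.run()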

But, I'm not really here to talk about the recipe, or to impugn its author, Michele Simionato, who has written several other excellent recipes on ASPN; I have even personally used the DOT-grapher for inheritance hierarchies. I doubt Michele spent much time on this quick hack, or considered it a statement in the holy war I'm about to bring up, so please don't interpret what follows as a personal attack.

What concerns me is that there is a persistent meme around the periphery of the Twisted community that asynchronous programming is too hard, and that things would be easier if it looked like it were multi-threaded. This recently came up in a mailing list post I wrote as well.

My personal opinion on this, and I believe this is a matter of public record, is as follows: CONCURRENCY IS HARD. If you are going to write concurrent programs you need to think about it all the time; you need to plan for race conditions, draw your state-transition diagrams, and put big explicit comments in any section of the code that has critical-section requirements, even if, as in an event-driven system, you don't have to "lock" it. No invention has significantly eased the cognitive difficulty of writing scalable concurrent applications, and it is unlikely that any will in the near term. Systems like Twisted and Erlang have both provided powerful tools, but only if you are smart and willing to invest energy in learning to use them properly; they don't make the basic problems any easier. Most of all, threads do not help; in fact, they make the problem worse in many cases. To plagiarize a famous Lisp fellow: if you have a concurrency problem, and you decide to use threads, now you have two problems.
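
To make "critical-section requirements without locks" concrete, here is the classic read-modify-write race reproduced with Deferreds and no threads at all. A contrived sketch using inlineCallbacks; the confirm argument stands in for any asynchronous step:

    from twisted.internet import defer

    balance = 100

    @defer.inlineCallbacks
    def withdraw(amount, confirm):
        global balance
        if balance >= amount:
            # CRITICAL SECTION: 'yield' hands control back to the event
            # loop. Another withdraw() can run right here, observe the
            # same balance, and also pass the check above - so two
            # withdrawals of 100 from a balance of 100 can both succeed.
            yield confirm()
            balance -= amount
        defer.returnValue(balance)

No threads were needed to create the hazard, and no lock primitive will save you; the fix is to re-check or reserve the state before yielding, and that is exactly the kind of thing the big explicit comment is for.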

Let's put that aside for the moment, though.

Whether you agree with me about threads or not, though, Twisted was written, and is maintained, by a large group of people who feel basically the same way about this. We have some subtle differences of opinion, but the consensus is the same: threads are bad. Only use them when you have to, and understand clearly what that means; don't loudly provide "conveniences" for threads, or use those "conveniences" for code which could otherwise be written as non-blocking.

Please, Twisted users, please stop trying to turn Twisted into something it isn't. If you want to use threads, write a multi-threaded program, and please stop trying to write infrastructure for Twisted to turn it into a big multi-threaded application platform. WSGI and Zope efforts are excluded from this comment, by the way: that's not trying to help people write threaded Twisted code, that's trying to help Twisted be the container for code written using a totally different paradigm, on a different framework, and not written to use the Twisted libraries directly.

Programs written with these kinds of thread-happy conveniences are generally the ones which end up the buggiest, the hardest to test, and most likely the least efficient as well. Worst of all, when you do run into those problems, if you ask the Twisted dev team, you are likely to get a lot of smug "I told you so" and very little actual help, since we have seen the problem before and we keep trying to tell folks not to start down this path. Personally, it's frustrating to have that advice disregarded again and again and still to get help requests from people who ignore it.

Imagine a man walks into a doctor's office, and says, "Doctor doctor, it hurts when I do this OW", promptly shooting himself in the hand with a nailgun. If this is the third time this week the doctor has removed such a nail, do you think the doctor is going to show this patient much patience? Now, imagine the doctor isn't getting paid for his services. The fellow would be lucky to walk out without a second nail...

You may have some awesome ideas about how multi-threaded programs should work. Good for you. I love reading the work of people who think differently than I do and succeed. It's a good way to learn. However, if you ask me, or if you use Twisted, you are going to run into a lot of advice to discard those ideas, a lot of roadblocks related to pervasive multi-threading, and general "impedance mismatch" problems with the differences between the way you think and the way Twisted works. It would probably be less work for you to start from scratch, or to use a system that has threads as a fundamental part of its programming model.

So, please, I'm not offended if you don't like Twisted, but if you like it, appreciate it for what it is, and if you don't, don't bother with it at all. Trying to use it while sweeping the most central parts of it under the rug isn't going to help anyone, least of all you.

Tiny Flag Day

If anyone out there is using Q2Q, the divmod.net server is getting upgraded from the code in the Quotient repository to the code in the Vertex project, in the new Divmod repository mentioned in earlier posts. This will make it slightly incompatible with the divmod.com server for the next few weeks, as upgrading that is a more significant undertaking.

It will still mostly work, but you'll see a lot of tracebacks and none of the NAT-traversal code is compatible any more.

This will probably happen 2 or 3 more times before the protocol is totally stable. (There are backwards-compatibility mechanisms implemented, but at our current small scale of deployment, they're hardly worth using.)

Encoding.

Mr. Bicking wants to change his default encoding. Since there is some buzz about this, I figure it's a good opportunity to answer something that has already emerged as a FAQ during Axiom's short life: its treatment of strings.

Axiom does not have strings. It has two attribute types that look suspiciously like strings: text() and bytes().

However, text() does not convert a Python str to text for you, and never, ever will. This is not an accident, and it is not because guessing at this sort of automatic conversion is hard. Lots of packages do it, including Python itself: str(unicode(x)) does do something, after all.
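
Concretely, here is roughly what that looks like. A sketch: the item class and attribute names here are invented, but text() and bytes() are the real Axiom attribute types.

    from axiom.store import Store
    from axiom.item import Item
    from axiom.attributes import text, bytes

    class Note(Item):
        typeName = 'example_note'
        schemaVersion = 1
        title = text()    # accepts unicode, and only unicode
        blob = bytes()    # accepts str, and only str

    s = Store()                 # an in-memory store, for illustration
    Note(store=s, title=u'caf\xe9', blob='\x89PNG...')   # fine
    # Note(store=s, title='cafe', ...) would raise: a str where text()
    # is expected is an error, never an implicit decode.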

However, in my mind, that is an unfortunate coincidence, and I avoid using the default encoding anywhere I can. Let me respond directly to part of his post, point by point:
"Are people claiming that there should be no default encoding?"

That's what I would say, yes. The default encoding is a process-global variable that sets you up for a lot of confusion, since encoding is always context- and data-type-dependent. Occasionally I get lazy and use the default encoding, since I know that regardless of what it is, it probably has ASCII as a subset (and I know that my data is something like an email address or a URL which functionally must be ASCII), but this is not generally good behavior.

"As long as we have non-Unicode strings, I find the argument less than convincing, and I think it reflects the perspective of people who take Unicode very seriously, as compared to programmers who aren't quite so concerned but just want their applications to not be broken; and the current status quo is very deeply broken."

I believe that in the context of this discussion, the term "string" is meaningless. There is text, and there is byte-oriented data (which may very well represent text, but is not yet converted to it). In Python types, text is unicode and data is str. The idea of "non-Unicode text" is just a programming error waiting to happen.
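
For context, the knob Ian wants to change is process-global, and Python 2 goes out of its way to hide it: site.py deletes sys.setdefaultencoding at startup, which is why the folk remedy involves the infamous reload trick.

    >>> import sys
    >>> sys.getdefaultencoding()
    'ascii'
    >>> sys.setdefaultencoding('utf-8')
    Traceback (most recent call last):
      ...
    AttributeError: 'module' object has no attribute 'setdefaultencoding'
    >>> reload(sys).setdefaultencoding('utf-8')   # the folk remedy: don't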

The fact that English text - the sort that programmers commonly use to converse with, code with, identify network endpoints with, and test program input with - looks very similar in its decoded and encoded forms is an unfortunate and misleading phenomenon. It means that programs are often very confused about what kind of data they are processing, but appear to work anyway, and make serious errors only when presented with input which differs in encoded and decoded form.
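
Python 2 itself demonstrates the phenomenon: mixing str and unicode appears to work right up until the bytes stop being ASCII, because the implicit coercion silently applies the default encoding.

    >>> 'hello ' + u'world'    # 'hello ' is silently decoded as ASCII
    u'hello world'
    >>> '\xc3\xa9' + u'!'      # the UTF-8 bytes for 'é': data, not yet text
    Traceback (most recent call last):
      ...
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)
    >>> '\xc3\xa9'.decode('utf-8') + u'!'    # decode explicitly instead
    u'\xe9!'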

SQLite unfortunately succumbs to this malady as well, although at least they tried. Right now we are using its default COLLATE NOCASE for case-insensitive indexing and searches. This is defined according to the docs as "The same as binary, except the 26 upper case characters used by the English language are folded to their lower case equivalents before the comparison is performed." Needless to say, despite SQLite's pervasive use of Unicode throughout the database, that is not how you case-insensitively compare Unicode strings.
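
The sqlite3 module makes the limitation easy to demonstrate: NOCASE happily matches ASCII case differences, and nothing else. (The real fix would be to register a genuinely Unicode-aware comparison with con.create_collation, which is more work than it sounds, since proper Unicode case folding is locale-sensitive.)

    >>> import sqlite3
    >>> con = sqlite3.connect(':memory:')
    >>> con.execute("SELECT 'STRASSE' = 'strasse' COLLATE NOCASE").fetchall()
    [(1,)]
    >>> con.execute(u"SELECT '\xc9' = '\xe9' COLLATE NOCASE").fetchall()  # 'É' vs 'é'
    [(0,)]
    >>> u'\xc9'.lower() == u'\xe9'.lower()    # Python's unicode knows better
    True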

Using the default encoding with Unicode only worsens this. Now the program appears to work, and may in fact be correct in the face of non-English, or even non-human-language input, but it breaks randomly and mangles data when moved to a different host environment with a different locally-specified default encoding. "Everybody use UTF-8" isn't a solution either; setting aside the huge accidental diversity in this detail of configuration, in Asian countries especially the system's default encoding implies certain things to a lot of different software. It would be extremely unwise to force your encoding choice upon everyone else.

I don't think that Ian has an entirely unreasonable position; the only reason I know anything about Unicode at all was that I was exposed to a lot of internationalization projects during my brief stint in the game industry, and mostly on projects that had taken multilingual features into account from the start.

The situation that I describe, where text and bytes are clearly delineated and never the twain shall meet, is a fantasy-land sort of scenario. Real-world software still handles multilingual text very badly, and encoding and decoding properly within your software does no good, and is a lot of extra work, when you're interfacing with a system that only deals with code points 65-90 (that is, A through Z). Forcing people to deal with this detail is often viewed as arrogance on the part of the system designer, and in many scenarios the effort is wasted because the systems you're interfacing with are already broken.

Still, I believe that forcing programmers to consider encoding issues whenever they have to store some text is a very useful exercise, since otherwise - this is important - foreign-language users may be completely unable to use your application. What is to you simply a question mark or a box where you expected to see an "é" is, to billions of users the world over, a page full of binary puke where they expected to see a letter they just typed. Even pure English users can benefit: consider the difference between and . Finally, if you are integrating with a crappy, non-Unicode-aware system (or a system that handles Unicode but extremely poorly), you can explicitly note the nature of its disease and fail before passing it data outside the range (usually ASCII) that you know it can handle.
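
That last defensive move is one line, and worth it: encode at the boundary and let the codec do the complaining. (A sketch; legacy_write here is a hypothetical transport into the broken system.)

    def send_to_legacy_system(text, legacy_write):
        # The downstream system is known to choke on anything but ASCII;
        # fail loudly here, at the boundary, instead of mangling data there.
        data = text.encode('ascii')    # raises UnicodeEncodeError early
        legacy_write(data)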

Consider the other things that data - regular python 'str' objects - might represent. Image data, for example. If there were a culture of programmers that expected image data to always be unpacked 32-bit RGBA byte sequences, it would be very difficult to get the Internet off the ground; image formats like PNG and JPEG have to be decoded before they are useful image data, and it is very difficult to set a 'system default image format' and have them all magically decoded and encoded properly. If we did have sys.defaultimageformat, or sys.defaultaudiocodec, we'd end up with an upsetting amount of multi-color snow and shrieking noise on our computers.

That is why Axiom does not, will not, and can not, automatically decode and encode your strings for you. Your string could be a chunk of oscilloscope data, and there is no Unicode encoding for that. If you need to store it, store it unencoded, as data, and load it and interpret it later. There are good reasons why people use different audio and image codecs; there are perhaps less good, but nevertheless valid reasons why people use different Unicode codecs.

To avoid a similarly common kind of error, I don't think that Axiom is going to provide a 'float' type before we've implemented a 'money' type - more on why money needs to be encoded and decoded just like Unicode in my next installment :)
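
As a down payment on that installment: binary floats are to money roughly what the default encoding is to text, a representation that appears to round-trip until it doesn't. Nothing below is the eventual Axiom API, just the arithmetic:

    >>> 0.10 + 0.20           # binary float cannot represent these exactly
    0.30000000000000004
    >>> from decimal import Decimal
    >>> Decimal('0.10') + Decimal('0.20')
    Decimal('0.30')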