Last week I checked in
a branch which implements the kernel of Twisted in Java, including the
core and TCP server portions of the reactor API and an
AMP IProtocol implementation.
A very small part of this code was "serious work", for establishing a bridge
between a Java server and a Python client; however, the actual protocol
being spoken was not AMP. I merely needed to decode some AMP data in Java.
The rest of it was written over the weekend on a lark, to see how much
Java I remembered from the days of my youth and what it's like to write
a Java application these days.
It may seem pretty odd that I'm hacking in Java for fun, especially
after being a semiprofessional Java-hater some years ago. Although I must
concede that Java has
made some strides in the last few years, my pythonista fans need not
worry. I still find Java far more tedious to write than Python, and
the development process more cumbersome. I'm not doing this work
because I want to switch, but rather, because I want to give AMP some
cross-language street cred.
AMP, the Asynchronous Messaging Protocol, is a new protocol that will first
be released in the upcoming Twisted version 2.5. The idea behind AMP is
fairly simple. In some ways, it's the same idea behind XML-RPC. The gist of
the idea behind XML-RPC is that a large variety of applications need a
protocol library with a few properties:
- extremely simple - a simple protocol means you need to think about
what you expose, which means compact, easy-to-understand interfaces.
- language neutral - the protocol should have a reasonable set of data
types, but not be tied to any specific programming language, to gather the
widest possible adoption (and thereby provide the greatest value to
implementors, as there will be more services they can access)
- easy to deploy - you don't need to allocate a new server, you can just
drop new XML-RPC responders in by creating a new file in most cases
These properties, it turns out, are amazingly useful for a huge class of
applications, and XML-RPC services have cropped up all over the place.
Twisted's main audience is often substantially different from the original
target audience of XML-RPC. Although superficially both are building network
services, traditional network services are well-served by ... well,
traditional infrastructure, like Apache, MySQL and the plethora of web
frameworks available to Python, Perl, PHP, Ruby, Lisp, and so on.
Applications that are quirky, hard to classify, and have a bit of a "twist"
need Twisted (no pun intended -- okay maybe a little). In fact, much as I
wish it were not so, "normal" applications will often hit roadblocks in
Twisted where they won't on other platforms, because so much more effort has
been spent on edge cases and behavior in exceptional situations than a
smooth transition for "average" usage.
Although, for the reasons given above, Twisted applications defy genre to
some extent, they do have a few features in common. They are network
applications, after all. Many are message-routing or custom control panel
applications, which require a two-way monitoring, streaming, or control
protocol. Such applications need a protocol with a second set of properties:
- bit-for-bit correct - you can send chunks of data without error-prone
encoding or escaping them in a format like base64. Typical applications
exchange small chunks of text-based data. Many Twisted applications are
exchanging media chunks or encrypted packets, where a botched newline
conversion or mismatched codec is a catastrophic failure rather than an
intermittent annoyance. Anyone who has used an XML library in the real
world can attest to the difficulty of getting this right.
- two-way - in many cases, notifications need to be generated and pushed
down an active connection, rather than the request/response model offered
by XML-RPC. In principle, of course, this is possible with an HTTP-based
protocol as well as with a connection-oriented one, but in practice the
presence of NATs and firewalls will prevent incoming connections in many
situations where outgoing connections are easy.
- asynchronicity - the ability to send and receive messages while
outstanding requests are pending. The nature of Twisted's event loop is,
of course, suited to this, but not all protocols provide this. IMAP4, for
example, has several states where the client must implicitly synchronize
with the server and halt sending further requests until all of its
outstanding requests have been answered, regardless of whether it's
possible to send those messages programmatically or not. (This is
why I had to write a Reactor and a Deferred for Java - it's very hard to
get real two-way communication going when you have to manage large groups
of threads.)
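The asynchronicity property can be sketched in a few lines: tag every request, remember which ones are still outstanding, and let answers arrive in any order. This is only an illustration under stated assumptions, not code from Twisted or from the Java branch. The names `RequestTracker`, `send`, and `answer_received` are hypothetical, a standard-library `Future` stands in for a Deferred, and a plain list stands in for the socket.

```python
from concurrent.futures import Future
from itertools import count


class RequestTracker:
    """Matches tagged answers to outstanding requests, in any order."""

    def __init__(self):
        self._tags = count(1)
        self._pending = {}  # tag -> Future awaiting that answer

    def send(self, payload, transport):
        """Tag a request, hand it to the transport, and return a Future."""
        tag = str(next(self._tags))
        future = Future()
        self._pending[tag] = future
        transport.append((tag, payload))  # stand-in for writing to a socket
        return future

    def answer_received(self, tag, result):
        """Resolve whichever outstanding request this answer belongs to."""
        self._pending.pop(tag).set_result(result)


wire = []
tracker = RequestTracker()
first = tracker.send("status?", wire)
second = tracker.send("volume?", wire)

# Both requests are in flight; the peer happens to answer the second one first.
tracker.answer_received("2", "volume is 4")
tracker.answer_received("1", "status is ok")
```

Neither request has to wait for the other, and the sender never has to stop and synchronize with the peer before issuing the next message.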
For quite a while now, PB, or "Perspective Broker" has been providing
Twisted users with the latter triad of properties, and more. PB has lots of
fun features: distributed garbage collection, string table compression, a
pluggable object marshalling framework, dynamic proxies, authentication
support, and remote error reporting.
In case that buzzword stew didn't clue you in, PB doesn't exactly have the
"extremely simple" property. Unfortunately, nor is it language neutral in a
practical way. There are Emacs Lisp, Scheme, and Java implementations of the
protocol's encoding, and it is possible to write cross-language
applications, but in practice it is prohibitively difficult to declare or
replicate all the implicit nuances of Python objects serialized via
PB. As an example, most Python objects are dictionaries, and the PB
protocol reflects this - it is possible to have an object's state be a
dictionary that refers to itself. That configuration is an
impossibility in most languages. In some ways, this is a strength; you
can simply assemble your objects and ship them over the network, which saves
a lot of time during development, especially in protocols which need to
communicate about a wide variety of different types.
In short, PB is great for the applications where you need it. It works
especially well for prototyping games, and could probably be optimized into
an awesome protocol even in production, and it can serve in the cases where
you don't really need all of its power. However, over years of working with
PB and discussing it with others who use Twisted, I've discovered that there
are many applications that need a protocol with both the first and the
second set of properties. There's nothing inherent about wanting to send
asynchronous notifications which requires exact, implicit over-the-wire
copying of any type of object. It seems to me that there's a fairly
widespread need - at least widespread among Twisted users - for a protocol
that falls somewhere between PB's incredible power and XML-RPC's militant
simplicity.
AMP is an attempt to build that protocol. It is a simple protocol that
bridges both sets of requirements. It has some features of PB.
Messages are tagged with request/response headers, so that you can easily
encapsulate one into a Deferred. It has some features of
XML-RPC. Only a few, very simple parameter types are described in the
protocol itself, and there's an extremely simple low-level encoding (far
simpler than XML) so that you don't have to implement very much at all to
get something useful.
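To give a feel for how small that low-level encoding is, here is a minimal sketch of it. The post itself does not spell out the wire format, so treat the details as an assumption based on how Twisted's AMP module describes it: a message (a "box") is a series of key/value pairs, each key and value prefixed with a 16-bit big-endian length, ended by an empty key. The function names `encode_box` and `decode_box` are made up for this example.

```python
import struct


def encode_box(box):
    """Serialize a dict of bytes -> bytes as length-prefixed key/value pairs."""
    parts = []
    for key, value in box.items():
        if not 0 < len(key) <= 0xFF:
            raise ValueError("key must be 1-255 bytes long")
        if len(value) > 0xFFFF:
            raise ValueError("value must fit in a 16-bit length")
        parts.append(struct.pack("!H", len(key)) + key)
        parts.append(struct.pack("!H", len(value)) + value)
    parts.append(b"\x00\x00")  # an empty key terminates the box
    return b"".join(parts)


def decode_box(data):
    """Parse one box from the start of data; return (box, remaining bytes)."""
    box = {}
    offset = 0
    while True:
        (key_len,) = struct.unpack_from("!H", data, offset)
        offset += 2
        if key_len == 0:
            return box, data[offset:]
        key = data[offset:offset + key_len]
        offset += key_len
        (value_len,) = struct.unpack_from("!H", data, offset)
        offset += 2
        box[key] = data[offset:offset + value_len]
        offset += value_len


# A request carries a tag (_ask) so the eventual answer can be matched to it.
request = {
    b"_ask": b"1",
    b"_command": b"IncreaseVolume",
    b"howMuch": b"4",
}
encoded = encode_box(request)
decoded, rest = decode_box(encoded)
```

Because values are length-prefixed rather than delimited, arbitrary bytes (newlines, NULs, encrypted packets) pass through without any escaping, which is the "bit-for-bit correct" property from the list above.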
A mediocre programmer like yours truly can implement AMP in an hour or two
in a language that they are familiar with. It was designed to be language
neutral, which brings me back to the Java implementation: I think it is a
good proof of concept that an AMP API can
be equally idiomatic in non-Python languages. In Python, this is what
defining and sending a command to increase the volume on a media player
might look like:
    from twisted.protocols.amp import Command, Integer

    class IncreaseVolume(Command):
        commandName = 'IncreaseVolume'
        # Each entry pairs a parameter name with its AMP argument type.
        arguments = [('howMuch', Integer())]
        response = [('currentVolume', Integer())]

    # ...
    d = ampInstance.callRemote(IncreaseVolume, howMuch=4)
    def showVol(result):
        print('Current volume: %d' % result['currentVolume'])
    d.addCallback(showVol)
In Java, this is a bit more verbose, but I don't feel that Java
is a second-class citizen in AMPtown, merely that the language itself is
more verbose:
    class IncreaseVolume {
        // Static, so they can be created without an enclosing IncreaseVolume.
        public static class Arguments {
            public int howMuch;
        }
        public static class Response {
            public int currentVolume;
        }
    }
    // ...
    IncreaseVolume.Arguments ae = new IncreaseVolume.Arguments();
    ae.howMuch = 7;
    ampInstance.callRemote("IncreaseVolume", ae, new IncreaseVolume.Response())
        .addCallback(new Deferred.Callback() {
            // The method name and signature here are assumed.
            public Object callback(Object response) {
                System.out.println("Current volume: " +
                    ((IncreaseVolume.Response) response).currentVolume);
                return response;
            }
        });
I hope that in the coming year, AMP will enable a new
generation of networked applications. I know that Divmod will be using
it for a few. It's dead simple to write a custom chat protocol with
it: for example, one for a whiteboarding application that wants to mix user
text messages with application control messages. Other two-way,
asynchronous applications include custom GUIs that update when data is
changed, lightweight notification of "push" software or RSS-feed updates,
and monitoring software that aggregates and reports on the status of large
numbers of devices in real time. It's as hard to give a complete
survey of the list of possible AMP applications as it would be to give a
complete list of possible socket applications, and I'm sure that the really
interesting ones are the ones that haven't occurred to me yet.
In the nearer term, the code that spawned this post can be used for a
lightweight bridge between Java and Python applications which need to
exchange a small set of messages. The few initial users who are
looking at this will probably be using it in a configuration where a Java
application needs a feature from Twisted which is not very well supported in
Java-land. Virtualizable IMAP4 server support is one such example, but
there are others. Rather than try to convince an SVN snapshot of
Jython to run Twisted's mainloop, you can run the IMAP4 code in a subprocess
from Java and communicate with it over a socket, exchanging only the
necessary messages. Since there's a reactor in Java as well as one in
Python, the code looks fairly similar at both ends of the wire and the
impedance mismatch is minimized. The Java reactor can happily run in a
thread, and need not interfere with existing Java code. The same is
true if you need to make use of a Java library from a Twisted
application.
As more AMP implementations are done, these kinds of integrations can be
used for multi-language plugin systems. Rather than defining a thick
API and loading various scripting systems as shared libraries, your
application can define a simple AMP protocol and run subprocesses which are
expected to speak it. I believe that would substantially reduce the
complexity of most application-scripting systems that want to support
multiple languages. That would also let you turn an existing Python
scriptability system into one which could connect to an arbitrary language,
by spawning that process from Python code.
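As a rough sketch of that plugin arrangement, the host below spawns a subprocess and exchanges one AMP-style box with it over stdin and stdout. Everything specific here is an assumption made for illustration: the `Shout` command, its `text` field, and the helper names are hypothetical, and the framing (16-bit length-prefixed keys and values, ended by an empty key) follows the AMP box format as Twisted's AMP module describes it. The plugin is written in Python only to keep the example self-contained; the point is that it could be in any language able to read and write those few bytes.

```python
import struct
import subprocess
import sys

# Hypothetical plugin: reads one box from stdin and answers it on stdout.
PLUGIN_SOURCE = r'''
import struct, sys

def read_box(stream):
    box = {}
    while True:
        (key_len,) = struct.unpack("!H", stream.read(2))
        if key_len == 0:
            return box
        key = stream.read(key_len)
        (value_len,) = struct.unpack("!H", stream.read(2))
        box[key] = stream.read(value_len)

def write_box(stream, box):
    for key, value in box.items():
        stream.write(struct.pack("!H", len(key)) + key)
        stream.write(struct.pack("!H", len(value)) + value)
    stream.write(b"\x00\x00")
    stream.flush()

request = read_box(sys.stdin.buffer)
reply = {b"_answer": request[b"_ask"], b"text": request[b"text"].upper()}
write_box(sys.stdout.buffer, reply)
'''


def write_box(stream, box):
    """Write one length-prefixed box to a binary stream."""
    for key, value in box.items():
        stream.write(struct.pack("!H", len(key)) + key)
        stream.write(struct.pack("!H", len(value)) + value)
    stream.write(b"\x00\x00")  # an empty key terminates the box
    stream.flush()


def read_box(stream):
    """Read one length-prefixed box from a binary stream."""
    box = {}
    while True:
        (key_len,) = struct.unpack("!H", stream.read(2))
        if key_len == 0:
            return box
        key = stream.read(key_len)
        (value_len,) = struct.unpack("!H", stream.read(2))
        box[key] = stream.read(value_len)


def call_plugin(text):
    """Spawn the plugin, send it one Shout request, and return its answer."""
    plugin = subprocess.Popen(
        [sys.executable, "-c", PLUGIN_SOURCE],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
    )
    try:
        write_box(plugin.stdin,
                  {b"_ask": b"1", b"_command": b"Shout", b"text": text})
        return read_box(plugin.stdout)
    finally:
        plugin.stdin.close()
        plugin.stdout.close()
        plugin.wait()


reply = call_plugin(b"hello")
```

The host and the plugin share nothing but the byte format, so each side needs only its own dozen lines of framing code rather than a binding to the other's runtime.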
Finally, AMP in more languages means a way to spread the asynchronous
Twisted gospel beyond the confines of the Python implementation
itself. Python is great, but there are a lot of programmers out there
writing a lot of code in a lot of other languages. I think that all of
it could be improved by moving away from bundles of threads and back towards
a simple, unified, event-driven model.