The Television Writer's Guide to Cryptography

On television shows, sometimes characters encounter encrypted data.  There are a number of popular tropes regarding this:
  1. A technically savvy villain has encrypted some data.  The hero needs to guess the password to decrypt it.  To do so, the hero delves into the villain's psychology.  Eventually we discover that the most important thing to the villain is actually their pet rabbit, named "fluffy bunny", not their secret terrorist organization as we initially guessed.  The hero enters "fluffy", just in the nick of time.  Hooray, the hero has cracked the encryption!
  2. A technically savvy villain has encrypted some data, and the hero has their hard drive.  It will take 10 hours to decrypt, but the first bomb goes off in 8 hours!  The hero manages to deal with the first blast, giving our diligent technicians time to decrypt the data.
  3. A technically savvy villain has encrypted the data.  Normally it would be easy to break, but there are multiple layers of encryption, each somehow more devious than the last!  However, our diligent technicians report hourly progress as they break through each "layer".
  4. A technically savvy villain has a computer system that the heroes wish to acquire remote access to.  In order to access this system, the hero hacker must "break the encryption".  This will take some time, but, when the "encryption" is "broken", they have access to the villain's computer, and can control it completely.
These are wrong.  They are so wrong that they set my teeth on edge.

I am not an expert on cryptography.  I have a passing interest in computer security, but I am by no means an expert.  So, I will not approach the topic as an expert.  I won't try to explain any of the math involved; I suspect that previous explanations may have failed to reach these writers' ears because they were too confusing.  Here are a few simple facts about the plot-lines above:
  1. Nobody who has even twenty minutes of experience with encryption software will choose a password like "fluffy".  Of course, many users have weak passwords for their Facebook accounts, but a child-prodigy criminal mastermind who expects federal agents to get his encrypted hard drive will have a password like "qua2IeshvePhu2QuAeShohd8".  They will train themselves to type this from memory, very quickly.  Better yet, if their data is encrypted, it is likely encrypted with a key.  This key will most likely be separate from their data, and the key will itself be encrypted with the password.  These are not crazy military-grade precautions; this is the default behavior of the free encryption software present in various operating systems.
  2. Here's a simple rule of thumb.  If you only take one thing away from this article, I hope it will be this:

    You cannot "break" encryption.  Ever.

    In an age when movie stars will spend months and millions of dollars intensely learning kung-fu so that they can accurately portray martial-arts moves, it is amazing to me that it isn't worth one hour of the average television writer's time, when incorporating cryptography as a plot device, to learn this one, very basic piece of information.
    Brute-force attacks against current cryptographic methods would, using present-day computing technology, take — and this is not an exaggeration — a billion billion billion billion billion years to succeed.  (There is a back-of-the-envelope calculation at the end of this list, for the curious.)  While there have been a few successful attacks against modern cryptographic methods, they are almost exclusively attacks which involve a bug in a popular piece of software, not a flaw in the cryptographic math.  Those bugs are fixed quickly when they are discovered, and someone concerned about the integrity of their encrypted data could quickly and easily find out about them and get a fixed version of the software in question.  If one cryptographic algorithm were well and truly cracked, there are dozens of others which our villains could upgrade to.  Again, none of this is crazy military-grade security.  This is software that any teenager with a free hour to search the internet could find.  I was encrypting my hard drive with stuff like this when I was 12.
    That's not to say that you can't have encryption being cracked on a television show.  Please be aware, however, that generalized crypto-cracking as a routine task performed by technicians, even extremely skilled technicians, is science fiction.  It is inappropriate in a dramatic show that is trying to be realistic.
    Again, for emphasis: cracking crypto isn't "really hard".  It isn't "practically impossible".  You don't need an "elite hacker" who is "really good" to do it.  Breaking crypto is really, totally, theoretically impossible, and there is a worldwide, very public community of mathematicians and researchers trying to make sure it stays that way.  If your heroes work for some kind of secret spy agency that can do it anyway, they should remark upon the ethical considerations of their special access to technology that the general public and the scientific community do not have and are not even aware of.
    The one exception to this rule is if the villain chooses a weak password, which can be guessed by a random password guesser.  Our heroes may get lucky and discover that the villain chose a password which a brute-force decryptor guesses in the first quintillion or so tries.  However, in this case, there is no way to know how long the cracking will take before it is done.  Each new guess for the password is totally blind; either it decrypts the data or it doesn't.  There's no way to tell how many more guesses you have to go, or in fact whether any of the guesses will work before your guesser runs out of things it could reasonably try.
  3. Since one "layer" of encryption is effectively impossible to break, it would be very strange for our villain to use "layers" of encryption; there's rarely a need.  There are some obscure possible exceptions: the villains might want to ensure co-operation within their group, and so encrypt the data in such a way that multiple keys were required to decrypt it.  Or they might be using onion routing.  However, each "layer" of encryption is equally impossible to break, so it still wouldn't make sense to talk about breaking them one at a time.
  4. All "encryption" is, is converting a block of sensible data ("plaintext") into a block of what appears to be unreadable nonsense ("ciphertext") unless you have the secret decoder ring.  If the hero "breaks the encryption" (which, as I've said above, is probably impossible) they still can't access the villain's computer over the internet, unless the thing that was encrypted was the villain's remote access password.
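
For the curious, here is the back-of-the-envelope calculation behind the claim in point 2 above.  These are my own numbers: assume a fairly modest 128-bit key, and a ludicrously well-funded attacker making a trillion guesses per second.

\[
\frac{2^{128}\ \text{keys}}{10^{12}\ \text{guesses/second} \times 3.15 \times 10^{7}\ \text{seconds/year}} \approx 1.1 \times 10^{19}\ \text{years}
\]

A 256-bit key pushes that figure to roughly 3.7 × 10^57 years; either way, the sun will have burned out long before the progress bar moves.
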
In summary, the worst recurring theme here (although I recognize its dramatic value) is the "progress bar" approach to computer security problems.  If someone is going to attempt to decrypt some data or to remotely access a computer system, either it will work nearly instantly (we know the password for the encrypted data, we know an exploit for the remote system) or it will not work at all: the progress indicator will sit at 0% complete forever.

The underlying misconception, I think, is to believe that cryptography is like a locked box that the villain has put their data into.  If the cops found a locked box with some evidence in it, they could ask you for the key (which you would have to hide in one of a limited number of places) or they could simply drill a hole in the box.  Stressed technicians in these TV shows frequently declare that they are "going as fast as they can" with the decryption, as if they were drilling through some very hard metal.

Cryptography is not a metal box.  It's more like a parallel dimension.  There isn't really a good analogy, because no physical security system is quite like cryptography.  But since you're presumably a TV writer if you're reading this (right?), think of it like a Stargate.  Imagine that portable stargates are cheap to manufacture.  Everybody has one; when you buy stuff over the internet, you put your credit card into a stargate and it comes out near the payment processor, securely.  (This is how the little lock on your browser works.)

The Cryptogate is not exactly like a Stargate, either.  There isn't a small, limited number of places it can go.  These little devices can go to any point in the multiverse.  Rather than a rotating wheel with a number of characters, they have a little slot, where you insert a piece of glass.  It etches a random pattern on the glass (this is your "private key") that describes the point where your object will be sent: you don't know where it is, except that it will be a spot where it's safe to stick your hand to retrieve it.  It could be anywhere in an infinite number of worlds, in a cave, in the sky: nobody knows, not even you.  You put your "private key" in the key slot, the gate opens up, you drop your valuables in, and then you take your key out.  Those valuables are gone forever.  The gate is a useless hoop of metal without your "key"; there's no way to guess what mysterious pattern of scratches it put on that glass, the destination was random.

You may notice there's no password in that extended metaphor, and indeed, one can use cryptography entirely without passwords; the private key is the important bit.  However, since many people leave the private key on their hard drive rather than storing it separately, it is itself encrypted with a password.  We can extend the metaphor even further to include this: let's say that your little piece of glass only selects which galaxy your valuables will be sent to, and a magic phrase that you choose selects the location within that galaxy.  So, you insert the key, but the gate is still useless until you say the word.  Then it opens up to reveal your stuff.

If you need a physical analogy in your mind, this is what you should imagine breaking cryptography is like.  A bunch of very frustrated technical people sitting around, staring at a useless loop of metal, knowing that it contains what they need, but totally unable to make it do anything useful without a tiny piece of glass that they don't have, and a magic word that they don't know.  They can sit around guessing words and scratching random patterns on glass all day, but they will never know if they're "20% done".

Now that I've destroyed any possible dramatic tension that can come from the race to "break the code", here are some suggestions you can replace these tired old fallacies with:
  1. It's not just bad guys who use cryptography.  In any secure super-secret anti-terrorism anti-supervillain government organization, encrypting all communication is likely to be routine.  What if one of the villains got hold of one of the heroes' private keys, via some kind of deception?  The heroes would be confident that their communications were secure and authentic, because the code is "unbreakable" — but humans are always the weakest link.
  2. A bad guy is planning something bad, and encrypting their plans.  The good guys know that if they barge in, the bad guy is going to instantly destroy the key, making the data they need permanently irretrievable.  Cryptography may be secure, but there are some real-life things that aren't.  Like monitors and keyboards.  (Wouldn't it be spooky to show your spy characters determining what someone was typing by listening to them with a stethoscope against a wall?  Or looking at their screen through a solid object?  That's something you can really do!)
  3. A bad guy is using SSL encryption to communicate with a web site.  Luckily our baddy doesn't really know how security works, so the good guys execute a man in the middle attack with the complicity of the baddy's ISP and a valid certificate authority such as VeriSign, for all intents and purposes becoming the "real" web site.  If you're one of those too-clever-by-half writer types that likes that highfalutin social commentary stuff, this might be an intriguing look at our society's blind trust of the flawed security model of the web.
  4. I took away four plot devices, so I'll give you four back: one of our heroes (either temporarily or permanently) loses their encryption key, and cannot access vital information.  Can they get the key back in time?  Or: can they remember enough of their data to work without access to their computerized information?
  5. As a bonus: Spooks ran an interesting episode about a game-over exploit for TLS.  There was still a lot of cringeworthy misunderstanding of what crypto really is, though.  (In a typical mistake, the guy who possesses the crypto crack can mysteriously control computers with it.  But I could suspend my disbelief, because if he could really break crypto that easily, he could observe any communication with the supposedly secure systems, including network sessions that included passwords.)
If anyone reading this knows someone who works as a writer for television shows or movies, please, please recommend that they read this post.  These days, a lot of people learn about technology from popular culture.  We need to have better understanding of basic, everyday technologies like cryptography and digital media, if we are ever going to get sane laws about those things.

Full-Duplex Metablog

I've been doing some thinking about what I use this medium for.  I've come to the conclusion that I'm not really sure.  And yet, I know there are a lot of things I don't use it for.

I try not to write about blogging (although this post is clearly evidence that I do sometimes) because a medium that does nothing but navel-gaze is dull.  I got really tired of reading a few otherwise good authors (names withheld to protect the guilty) who seemed unable to stop writing about how amazing it was that they were writing all this stuff!

I try not to write about politics, because I think it'll do more harm than good.  Here, I will name names, because one person in particular stood out: ESR did a lot to damage his credibility in my eyes with his "Anti-Idiotarian" manifesto and subsequent political blathering.  (Link omitted on purpose. I'd rather remember his useful writing, not go read that junk again.)  My distaste was not really because of the views he espoused, but because they demonstrated the poorly-socialized adolescent perspective on politics that the popular media is so quick to ascribe to all nerds: "If we insult our opponents enough eventually they'll realize our superior reasoning is obviously right".  Now, I can't help but read all of his writing in that light.  I live in mild fear that I might be a poorly-socialized man-child myself.  I don't want to walk around wearing that fact on my sleeve.

I try not to write about personal events, because in many cases they're not that interesting to a wide audience.  For every blog post about how cool blogging is, there are two about how angry somebody is that somebody else is wasting all their time with articles about their cats.  MC Frontalot told me so himself.  (Although I wonder: does anyone complain about how much time is wasted by people forcing them to read blog posts about blog posts about people talking about their cats?  If so, is the irony intense enough to be a carbon-neutral power source?)

I don't bother to write on a schedule, because it seems somewhat arbitrary.  Nobody is desperately waiting to hear my experience of audio problems in Linux.  Nobody's week is going to be ruined if they don't hear me complain about shared-state multithreading yet again.  And of course, keeping to a schedule is hard.

There are lots of things I don't do.  So why am I publishing this stuff at all?

The one thing I can say that I do already do is write about technical topics.  I'm pretty confident in what I'm saying.  I have complex ideas that I want to refer to later, so writing things up in detail is useful to me personally.  It seems like other people sometimes enjoy my perspective.

More generally, I think writing is an important skill, and practicing it helps to develop it.  The process of writing itself is enjoyable.  It's like programming, but easier.  I don't have to be right all the time; if I misspell a few words or use the passive voice, the article doesn't crash.  I find writing for an audience a useful way to explore my own ideas, and having an audience is a good way to draw attention to my work.  This is particularly useful in the dog-eat-dog world of open source, where attention is the fuel that projects run on.  It's also nice to keep others up to date with what I'm doing.  Public writing serves as a sort of social lubricant.  It's always nice to start a conversation with, "so I heard on your blog...".  It short-circuits awkward "I-don't-know-what-you're-about" smalltalk, as well as the need to repeat oneself when asked "what's going on with you".

My motivation, as stated, partially contradicts the above list of self-imposed prohibitions.  If I want to draw an audience, writing about emerging communication technologies, such as blogs (perhaps, especially blogs), seems to attract a default audience that is enthusiastic.  New Media geeks do, after all, read a lot of New Media.  Not that I'd have to force it, either; I have a lot of ideas about the world wide net that I could share.  Similarly, the popular reaction I mention to stories-about-cats blogs indicates that lots of people read them (albeit with mixed levels of disdain).  Writing about personal events would be useful to some of my audience; members of my family, for example, who don't really care about, or even understand, my technical stuff.  Even writing about politics might clarify my thinking.  Since I'm very concerned about politeness to those I disagree with, maybe I could write about my ideas without dropping an "anti-idiotarian" firebomb.

Writing on a schedule might also improve my writing skills somewhat, in that it would cause me to come up with things to write about even if the words weren't bursting forth.  In fact, it was Jeff Atwood tying "success" as a blogger to keeping a schedule that got me thinking about this in the first place, both about writing on a schedule and about defining "success" for myself.

If I were on a schedule, I'd have to learn tricks like coming up with concise ways to phrase things quickly, rather than re-editing and deleting and re-editing and polishing in odd moments for weeks.  For reference, I wrote this post in one sitting, so it was easy to time how long it took.  Granted, it's non-technical, so it's a less comfortable area for me, and I wasn't giving it my full attention, but still: it took 9 hours, 4 of which were almost exclusively proofreading and deleting.  If this were a typing test, I'd clock in at just under 3 words per minute.  I could stand to get a bit faster.

Ultimately these thoughts end up being circular, though.  I don't know how I'm going to evolve my writing habits because I don't know what my audience really is.  The whole point of writing for an audience is that one has to leave aside one's particular whims and try to communicate what the audience is interested in.  FeedBurner hasn't given me too much information about you thus far.  I know there's about 300 of you.  I know that most of you use Google Reader to read my blog.  Beyond that, the general trend seems to be that later posts have gotten more interest, but that may just be a function of the fact that I'm picking up readers as I go along.  (Plus, FeedBurner never seemed to work quite right for my LiveJournal.  I'm sure I'm missing a lot of data.)

That's the magic of blogs as a medium though, right?  The instant feedback from the audience, transforming the creation process from the simple act of creation to a feedback loop?  From an isolated experience in the mind of the author to a continuous process, a genuine collaboration between creator and consumer? By the way, if you ask me for more blogging about blogging, don't complain if you get more pretentious nonsense like this.  I've got buckets of it.

Please, tell me what you'd like to read.  I'd also like to expand my audience to other people who might not have exactly your interests, so I'd like to hear what you wouldn't mind.  Would it bother you to see the occasional funny picture of my friends' cats?  To read all about my plan to revitalize the economy by investing a trillion dollars of federal money in buying me a lot of really nice computers, cars, and houses?  How about my Culture / Star Trek crossover fan-fiction?

If I do tackle a wider diversity of topics, do you have any preferences for how I should segregate them?  Different blogs?  Tags?  Does it matter if I segregate them, or would knowing that I'm a card-carrying member of the American Reptoid Control Party destroy your confidence in my technical acumen forever, even if you had to click through a bunch of links to find out?

I'd also be interested in when you'd like to read it.  Personally, I've cared about posting schedules for comics and serialized fiction.  I've never really been waiting around for the next episode of Joel on Software or Coding Horror, though.  Have you?  Would you like to see a posting schedule here?

So, there you have it.  Your move, Internet!

The Dark Art of Sound on Linux

Introduction and Goal

Experiments need to be slotted into some larger context of research, and their results need to be communicated to other practitioners. That's what makes them true "experiments" instead of private fetishes.
 — Bruce Sterling, The Last Viridian Note
I've been trying to get a USB headset to work gracefully with a variety of applications on Linux for quite some time.  Recently I had a bit more time to investigate why this is so difficult, and to learn a few things about ALSA.  Inspired by Mr. Sterling, I feel compelled to share the results of this ... experimentation.

I realize that many of the things I am going to describe here are bugs.  Some are feature requests.  If I had only discovered one or two, I'd just file them myself, but in this case I feel that producing a comprehensive report, detailing the consequences of the relationships between bugs in different packages, would be more useful.  However, such a report is only useful as a resource for others to come along and extract the individual bug reports and do something about it.  I strongly encourage those of you with the appropriate know-how to extract bug reports from this article, report them, and link here for reference.  I will update the article, and subscribe / vote on any bug reports that get made.  When appropriate, I encourage you all to use the Launchpad bug-tracking service to report these issues.

My purpose here is to provide a snapshot of the state of audio on Linux, and to ask the maintainers of various packages to reflect upon their complicity in this sprawling disaster.  To prompt them, perhaps, to fix the relatively basic parts of the audio stack before enhancing the crazy extra features that it sports.  There is plenty of finger-pointing and pointless whinging about the state of Linux's audio setup, but I haven't found much in the way of a detailed critique.

Let me be clear: every single layer in the Linux audio stack is broken: the ALSA kernel drivers, the ALSA library, the sound servers (Pulse, of course, but also ones that I don't cover here, like ESD, aRts, and Jack), and the applications which use all of these things.  I don't want to excuse their brokenness.  But the developers of these things have each given us some interesting code, for free, and we shouldn't blame them for that unless they are actively opposed to making it work right, which I don't believe they are.  We should give them as much clarity as possible into the nature of the problems, because every layer can work around the deficiencies of the others.

In the meanwhile, if elements of my own jury-rigged setup are helpful to anyone else, I hope that reading about my investigations will be easier than performing their own.  Before you dive into this whole mess, though, I'd like to be quite clear: it is not really possible to attach multiple audio devices to a Linux machine and have arbitrary applications select between them and use them.

Methodology

Right now I'm doing everything with Ubuntu Hardy.  When I eventually upgrade to Intrepid, I will post an update that describes any differences in my results.  However, in the course of reading the various forums out there, looking for answers to my questions, I believe that the problems persist, not just in Hardy, but in recent versions of RedHat and SuSE desktop distributions as well.  At least, my main problem with pulseaudio remains.

I'm writing about the problems in the most direct way that I perceive them.  That means that I oscillate between APIs, implementation code, configuration and end-user experience.  Although I'd like to motivate everyone to make a sound system that Aunt Tillie would find so pleasant and easy to use that it's almost invisible, I think that a sound system which is usable by a deeply knowledgeable, motivated, skilled programmer is a necessary pre-requisite to that.  So right now, I'm only looking for something that I can use.

Finally, I'm not really editing this.  It's already been a depressing amount of effort to compile all this information, so I'm trying to write it up quickly rather than agonize over the wording.

The Problem(s)

I have a headset with a USB sound card, and a normal, onboard sound card that is hooked up to speakers.  I would like to use each of these sound cards at different times for different activities.

For one thing, I like to listen to music.  My desk is right next to my wife's; sometimes we feel like enjoying the same music, sometimes not.  So I'd like to be able to switch my music player between my headphones and my speakers easily.

I also like to make and receive calls via Voice-over-IP.  I would prefer to use the headset to actually make the calls, but when a call comes in, as with a regular phone, I'd like to be able to hear it on my speakers until I make it to the desktop.  Importantly I'd like to be able to do this even if I'm listening to music already.  Without this requirement, I'd probably be fine with just one sound device.  I'm beginning to wonder if it's worth the trouble.

I also play video games.  In Linux, my selection is somewhat limited, but World of Warcraft (when tweaked appropriately) runs very nicely under Cedega.  I'd like to be able to do VoIP and gaming on the same headset.

In order to do all of this, I need to be able to tell the following applications which audio device to use, and to be able to use the same device concurrently from more than one of them at a time:
  • For VoIP
    • Ekiga
    • Skype
    • Twinkle
    • Ventrilo (via WINE)
    • Adobe Flash 10
  • For Gaming
    • World of Warcraft (via Cedega)
    • Neverwinter Nights
    • Quake4
    • On the Rain-Slick Precipice of Darkness (in other words, Torque)
  • For Media Playback
    • Quod Libet
    • Totem-GStreamer
    • VLC
On MacOS or Windows, telling the equivalent applications what device to use is trivial.  On Linux, I think it's impossible in the general case, and certainly challenging otherwise.

The Solution (Not Really)

I'm sure that, even given this disclaimer, somebody is going to tell me that if I used Pulse, all of my problems would be solved and life would be great.

I've previously written about not using PulseAudio.  In that article I briefly mentioned random lockups and crashes.  That's a pretty serious issue, and it makes Pulse unsuitable for daily use.  I've had others tell me that more recent versions of Pulse are more reliable, and so perhaps this is not such a concern any more.  However, it's not the only problem.

Many applications — including all of my VoIP applications, Ekiga, Skype, Ventrilo (i.e. WINE) and Twinkle — do not yet have support for Pulse.  That should be fine, because there's a "pulse" ALSA plugin which connects ALSA clients to PulseAudio servers.
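
For reference, turning that plugin on system-wide is, in principle, just a couple of stanzas in /etc/asound.conf or ~/.asoundrc.  This is more or less the standard snippet you will find in the alsa-plugins documentation and the various wikis, nothing clever of mine:

pcm.!default {
    type pulse
}
ctl.!default {
    type pulse
}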

The only problem there is that the compatibility layer doesn't work.  Until recently, it didn't work for Flash; this has been fixed in the Flash 10 update, but it still doesn't work for Skype.  By the way, if you have any interest in audio and Linux, please click on that issue and vote for it.  Even if you don't care about Skype in particular, other audio programmers are going to look at Skype for an example, so it would be good if they fixed their most serious problems.

The pulse plugin, in my experience (although this is less recent than the rest of the article), also has weird, intermittent issues: audio artifacts creeping into streams from certain applications, timing problems, latency, and programs locking up for a few seconds when opening the audio device.

Another issue with Pulse is that non-pulse-native applications can't tell that there are multiple devices that Pulse is managing.  The whole point of my attempt at a setup here is to have different applications use different devices for different purposes.  Applications like Ekiga and Skype can't see or select those devices at all, so even a perfectly working compatibility layer wouldn't get me what I want.

Death of 1000 Bugs

The first and most obvious problem that I face is that although the default device is properly configured so that multiple programs can open it at once, other devices (such as my USB headset) do not inherit this configuration.  If you want software mixing you have to set it up yourself.  Luckily I sort of knew how to do that already.  Unfortunately the online documentation describes a setup that defines lots of extra PCM devices.  These cause programs like Skype (among others) to open lots of half-valid PCMs, emitting lots of scary-looking errors on stderr, pausing while they wait for the device, and generally looking broken.

Application Bugs

First, the bugs I already knew about when I started this.

Most applications choke when they can't open their device, and don't display a useful error message (at best, they pop up an unhelpful modal dialog).  They won't tell me what's wrong, so if I don't want to waste time trying to map a PID to a PCM device to an ALSA configuration and figure out what the heck went wrong, I have to make sure everybody uses dmix all the time.

Quod Libet has the bad habit of leaving the device open when the music is paused.  Worse, there's no "stop" button, so the only way to free the device is to quit the application.  If anything needs exclusive access to the sound device it's using, too bad.

Flash and most native Linux games (quake4, NWN, OTRSPOD) don't let you configure what audio device they're going to use.  Quod Libet requires you to edit a text file and memorize gstreamer pipeline syntax, which I can never remember.

ALSA Configuration

I set out to define the entire thing in one simple .asoundrc stanza, understanding each line of it rather than just copying and pasting.  This is the area where I'd like to raise my first complaint.  ALSA configuration is really, really poorly documented.  Reading the various wikis, one gets the impression that nobody really understands how it works.

In fact, reading pcm.c, I'm not sure that the ALSA people themselves really understand how it works.  There's no intermediary structure to represent actual devices and such, just the tree of config data itself.  Therefore there's no way to verify that a config file is valid without actually calling a bunch of snd_pcm_* functions.

However, with the help of two pieces of documentation, "PCM (digital audio) plugins" and "Configuration files", as well as some reading of the aforementioned .c files, I worked out what was expected in ~/.asoundrc.

I've annotated the final results of this adventure in the resultant configuration file, which is almost an article in its own right.

Broken By Default

Another complaint here is that ALSA's policy seems to be as broken as possible by default.  When you start defining a device, you get a device that can't do software mixing.  Why isn't dmix just part of a regular PCM?  What possible value does making device access exclusive have?  If there is some value, why isn't it explained anywhere?  Okay, okay, so I'll set up dmix.  Wait, now I can mix output, but can't share the input?  Why do I have to learn about dsnoop?  All right, now I've set up dsnoop.  Now I have to learn about "asym" to put them together.  Okay.  Wait, now 'arecord' can't record from the device?  What?  The sample rate is wrong?  Oh, I have to use the "plughw" plugin to allow mixing at any sample rate.  Wait, what?  dmix doesn't work with plughw?  Oh, I have to wrap a "plug" around the outer device?  Why?  Wait, now that I've set this all up, I have to turn on access for other users who have already been explicitly granted permission to write to this audio device?  And of course, if I unplug and plug in my USB devices in a different order, or restart my computer, all my configuration is now wrong, because all the examples use card indexes rather than more stable identifiers.  So I have to go find the (mostly undocumented) stable identifier and start using that.
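
To make that rant concrete, here is roughly the shape of the stanza that all of this adds up to.  This is a simplified sketch, not my exact file (the annotated configuration linked above has the real thing); in particular, "Headset" is a stand-in for whatever stable card identifier your hardware reports, and the ipc_key values just need to be unique numbers.

pcm.headset {
    type plug                   # converts sample rates/formats for picky applications
    slave.pcm "headset_asym"
}
pcm.headset_asym {
    type asym                   # glues the playback and capture halves back together
    playback.pcm "headset_dmix"
    capture.pcm "headset_dsnoop"
}
pcm.headset_dmix {
    type dmix                   # software mixing for playback
    ipc_key 2048
    slave.pcm "hw:Headset"      # stable identifier, not a card index like hw:1
}
pcm.headset_dsnoop {
    type dsnoop                 # shared access for capture
    ipc_key 2049
    slave.pcm "hw:Headset"
}
ctl.headset {
    type hw
    card "Headset"
}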

Which device?

However, now that I've defined my custom device, there's still another problem.  From my list of software above, Skype can see the new device, but Ekiga and Twinkle can't.  Neither "aplay -l" nor "arecord -l" shows my new device.  Meanwhile, Ventrilo, interrogating the Windows sound API via WINE, gets a list of devices which always contains "default", but randomly contains "dsnoop:0" or "dmix:0", or sometimes two or three devices with blank names.

A workaround is possible with aplay, arecord, and Twinkle, all of which allow me to explicitly supply a device.  Leaving WINE aside for the moment, I decided to investigate why it was that Ekiga (purportedly as desktop- and sound-savvy a program as one is likely to find) could not see my custom device.

A Detour: The Mystery of ALSA Device Enumeration

Ekiga uses a library, Portable Windows Library, or PWLib, to address audio devices.  When the ALSA PWLib plugin lists devices, it follows the example of the alsa utility "aplay" listing devices, which is to say it uses the API to list only hardware devices.  Ekiga has an explicit provision for the "default" device, realizing that someone may have reconfigured that to do something dynamic, but with no provision to allow you to select a different one (even by explicitly typing it in).

The Bluetooth-Alsa project has also noticed this problem, and has cooked up a patch which looks a little silly to me, hard-coding the device name "headset".  Oddly this would have fixed my problem, even though my "headset" device is not (currently) bluetooth.

After discovering all this, I resolved to find the "right" way to enumerate ALSA devices.  Since Skype was the only "good" example and I was unable to get its source code, I spent a while downloading the source to different programs.  Eventually I discovered an example in PortAudio which yielded similar results to Skype's introspection.  I implemented a brief C program of my own to verify it.

The lesson here is for programmers: if you are writing an application or library which uses libasound directly, you need to enumerate the configuration hierarchy under "pcm" and make some enlightened guesses about which devices are interesting.  It's not completely correct, but it will at least get you a list that contains the stuff the user wrote in their ~/.asoundrc.
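
For the programmers in the audience, here is a minimal sketch of that approach.  This is my own toy program, not code lifted from PortAudio or Skype, and the file name and gcc invocation are just suggestions (build with something like "gcc list_pcms.c -o list_pcms -lasound"):

#include <stdio.h>
#include <alsa/asoundlib.h>

int main(void)
{
    snd_config_t *pcms;
    snd_config_iterator_t i, next;

    /* Load /usr/share/alsa/alsa.conf, /etc/asound.conf and ~/.asoundrc
     * into the global 'snd_config' tree. */
    if (snd_config_update() < 0 || snd_config == NULL) {
        fprintf(stderr, "could not load ALSA configuration\n");
        return 1;
    }

    /* Everything the user defined shows up as a child of the "pcm" node. */
    if (snd_config_search(snd_config, "pcm", &pcms) < 0) {
        fprintf(stderr, "no 'pcm' section in the configuration\n");
        return 1;
    }

    snd_config_for_each(i, next, pcms) {
        snd_config_t *entry = snd_config_iterator_entry(i);
        const char *id;
        if (snd_config_get_id(entry, &id) < 0)
            continue;
        /* This prints "default", "dmix", "dsnoop", and anything defined in
         * ~/.asoundrc (e.g. "headset").  A real application would make those
         * "enlightened guesses" here and filter out the internal ones. */
        printf("%s\n", id);
    }
    return 0;
}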

No really, which device?

So, in the absence of any trick to convince Ekiga and friends to actually look at my newly built configuration — and remember, WINE was even worse — I needed a way to trick the ALSA library into paying attention to environment variables.  As it happens, ALSA does pay attention to environment variables!  Unfortunately, it pays just enough attention to hurt.

In its stock configuration, ALSA respects several environment variables: ALSA_PCM_CARD, ALSA_CARD (which mean "what hardware card to use by default") and ALSA_PCM_DEVICE (which means "what hardware device to use by default").  None of these options allow you to specify an additional config file.  None of these variables allow you to select a non-hardware PCM by default.

And it's quite tricky to tell it how to do that.  There's an example on the ALSA wiki which describes how to do this if you're using Pulse (which, sadly, I am not).  ALSA provides some very useful configuration on the default device, including (in my case, at least) automatic dmix, and I'm not really sure how it does it, so I don't want to override it for most applications.  In the case of the unusual, uncooperative ones, I wanted to be able to set an environment variable.

The configuration language for ALSA really sucks, which is weird, because it contains a LISP implementation that would have been perfect for doing this sort of thing.  However, asound.conf does not have support for conditionals (so you can't say "if this environment variable isn't set, omit this stanza").  Most importantly, the user configuration is evaluated before the system configuration, so even using the extremely primitive facilities available to you, you can't copy the default pcm.default and ctl.default declarations before you decide to override them with your own.

I discovered a fun fact at this point: while "pcm.!default = pcm.default" will just exit with an error, more creative forms of looping (e.g. with "getenv") will segfault applications that use ALSA.  You can probably guess why.  So this ad-hoc mini-language is just complex enough to be dangerous, but not enough to be useful.  This is a good example of the repeated theme that once you need to invoke functions, it's better to just use a real programming language.

With enough head-banging, I eventually realized that while I couldn't conditionally load a configuration file, I could decide to load a default (i.e. empty) file or a file of the user's choosing based on an environment variable.  While this definitely isn't as graceful as "sh -c 'ALSA_PCM=mypcm ekiga'", it works and I can put it into shell aliases and desktop icons and just forget about it.  You can see the result in my config file.

Finally, I created two files: .asoundrc.empty, which is empty, and .asoundrc.defaultheadset, which contains simply:
pcm.!default "headset"
ctl.!default "headset"
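
The hook in the main ~/.asoundrc that picks between those two files looks roughly like this.  This is a simplified sketch rather than the exact incantation (which is in the annotated config linked above); ALSA_BONUS_CONFIG and the ".asoundrc.<suffix>" naming scheme are conventions I made up:

@hooks [
    {
        func load
        errors false
        files [
            {
                @func concat
                strings [
                    { @func getenv vars [ HOME ] default "" }
                    "/.asoundrc."
                    { @func getenv vars [ ALSA_BONUS_CONFIG ] default "empty" }
                ]
            }
        ]
    }
]

So "ALSA_BONUS_CONFIG=defaultheadset quodlibet" picks up ~/.asoundrc.defaultheadset, while anything launched without the variable loads the empty file and keeps the stock defaults.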

Okay, this device!

Now, I was ready to test out this setup and see if I could convince e.g. Ventrilo to use my headset, while most programs would start up with my speakers.  Experimenting with ALSA_BONUS_CONFIG using quodlibet, aplay and friends seemed to work fine.  Great!  Only 20 hours, two or three dozen C files, and a painstakingly custom-configured system later, I could use my headset to listen to music!

Surprise!  It Doesn't Quite Work

While most programs work OK in this configuration, it turns out that Wine crumbled under more than superficial testing.  Ventrilo starts up and plays some sounds, but if it's configured to use the dmix playback device, after the PTT button is pushed once, it chokes.  It doesn't matter if I use the OSS drivers with ALSA emulation, a different asym with the hardware capture device, or whatever.  dmix plus Ventrilo equals broken.  Except that it does work on my actual default device, if I plug my mic into the regular microphone port rather than into the USB sound card.  Supposedly my default sound card is using dmix as well, so — where's the problem?

Although the circumstances are different, this is very much like the last time I was messing with Linux audio, trying to get Skype and Flash to work with Pulse.  They didn't work with the Pulse plugin, but there was no indication why.  Audio skipped, it stopped, the sound server locked up, but in no case did I get a useful error message, or even a distinctive enough error message to google for.  The failure mode of pretty much every layer of the audio stack (and as I have just demonstrated, they all fail a lot) is to emit silence and record silence, to freeze, and to have applications crash before they start up, leaving the user wondering what happened.  It would be a lot less depressing trying to find and diagnose bugs if there were more error messages that made sense.

Still, this hasn't been a complete waste.  I have an environment variable which can select an appropriate default sound device for a given process.  Even if pulse does start working for me, that should be useful for the huge piles of applications that don't directly support it yet.

Probably Enough for Now

My fight with ALSA isn't quite over, and I haven't reached any terribly useful conclusions beyond "this sucks".  I do have a few suggestions, however.

To Audio Infrastructure Programmers

I'd like to say "make everything work", but I realize that isn't realistic.  For now, please just give those of us who are trying to slog through all of this stuff some better tools to understand what is going on, and to debug it.  See how your spiffy sound driver/plugin/library/server works with existing applications.  Even — or perhaps especially — with proprietary applications that users have little recourse to change.

The most frustrating thing about spending so much time trying to make such a simple use-case work is that I've learned so little of value.  I want to provide really useful, detailed bug reports, but at best I understand something somewhere is setting the wrong sample rate, and at worst I am completely baffled.

To Audio Application Programmers

The audio landscape in Linux is obviously a mess.  Sorry.  That doesn't mean that you can get away with supporting just one of the panoply of sound mechanisms available.  Use PortAudio or OpenAL or something if you possibly can, so that you can leave the work of choosing whether to try Pulse or ALSA or OSS or Jack or whatever to someone else.

Please test your applications with more than one sound plugin.  Try it with dmix and dsnoop, try it with pulse, and try it with at least a USB sound device and an internal sound card.  It would probably be good to try it with more than one kind of internal sound card, too, since the drivers apparently vary a lot.

It would also be good if some of you could start talking to the infrastructure guys and maybe agree on some kind of standard for telling applications where their audio should go.  We have $DISPLAY for a reason; there are plenty of times when the "default" isn't really the default.

To You Poor Users

If you possibly can, stick to just one sound card in Linux.  I did that for a while and it worked OK.  It's obviously possible to go beyond that, but depending on what kind of card you're getting, you might have problems.

One trick I've learned is that if you want to exclusively use something other than your onboard sound card, you can disable the onboard card in the BIOS.  This is a lot more reliable than telling Linux to do it, and you still get the benefit of Linux treating your other sound card as the "default", which somehow gets various bits of ALSA mixing magic applied to it, not all of which I've figured out yet.

Hopefully by 2010 all of these concerns will be moot, though.  The Linux world has been moving shockingly fast in every other area; maybe it's time for audio to catch up.

The X Window System

Sometimes people ask me what my desktop system looks like.  I use the X Window System.

[screenshot: the X Window System desktop]

The X Window System supports bit-mapped displays with multiple color depths, from black-and-white to the millions of colors shown here.  It supports overlapping windows, multiple fonts, keyboards, and pointing devices such as "mice" and "trackballs".

With a sufficiently powerful display adapter, you can also run popular games such as "World of Warcraft" by Blizzard Entertainment.

[screenshot: World of Warcraft running under the X Window System]

Okay, that's enough of that.  Please forgive the self-indulgent nostalgia and inside humor — I hope some of you enjoyed it.

To those of you who have no idea what I'm on about: when I was younger, I was really interested in different operating systems and desktop environments, and read a great deal about them.  At the time I was doing this (circa 2000) I was already using a Linux machine, and therefore X windows.  It was a pretty reasonable (if somewhat rough and DIY) desktop environment at the time, but every time I ran across some online publication talking about it, the included picture was some hilariously improbable shot showing TWM, XBiff, XClock, and XLogo.  Inevitably there would inexplicably be an XEyes window as well.  Who would actually run a program that did nothing but display a logo?  Wouldn't a pair of googly eyes following your mouse around be distracting?  Why would you run TWM when you could run WindowMaker?  To go with this improbable screenshot, there was typically a retro blurb explaining that it supported "bit-mapped graphics" and "multiple colors".

These descriptions were written at a time when such things were taken for granted (and not written up in contemporary descriptions of, for example, the BeOS desktop environment).  I suppose the authors were somewhat lazily copying from ancient marketing copy, unable to make sense of the bizarre constellation of window managers and desktop environments surrounding X itself.

Unfortunately most of these pages have since faded from the web, but as an ironic example, Wikipedia still has an article that mentions "raster graphics" and "pointing devices" as well as the traditional screenshot.

Unlike other systems which throw their history down the memory hole, every major Linux distribution still ships with a recent, up-to-date copy of these archaic tools.  So, every once in a while, I choose "TWM" from the "sessions" menu in GDM and have a chuckle.  Once Cedega allowed me to replace the traditional "xdémineur" with WoW, I realized this screenshot was just begging to be taken.  If you're curious what my desktop actually looks like, here's me switching between windows while writing this article:

[screenshot: my actual desktop, switching between windows]

Maybe one day Steve Holden will ask me to participate in "On Your Desktop" and I'll go into a bit more detail.

Zork: Now In Full HD

I am a fan of interactive fiction.  While it would be an understatement to say that the medium has experienced a bit of a lull since its heyday in the era of Infocom, I have been fairly impressed by the IFComp competitions of recent years, really enjoying these games as new, unique experiences rather than as nostalgia or kitsch.

One thing that has been consistently irritating is getting all of my fancy, modern hardware to play these games in a satisfying way.  The one place I definitely don't want any nostalgia in this experience is remembering the 14-inch flickery CRTs that I played these games on in my youth.  I could point fingers at various unsatisfactory pieces of software, but the fact is that, with the increasing variety of interpreters (glulxe, hugo, tads, and z-machines of various versions), it is sometimes frustrating even to get these games to run on Linux, let alone look good.

Enter Gargoyle, a classy looking, multi-platform, multi-interpreter interactive fiction system.

My original experience with Gargoyle, several months ago, was actually pretty bad.  It looked nice, but it took forever to figure out how to compile it.  I had to patch it, and I couldn't figure out how to do that gracefully enough to compile on anybody else's system.  When I did build it, it opened a small, apparently fixed-size window with a fairly small font.  Not that good for an immersive experience on a desktop, and terrible for playing on the couch on my HDTV across the room.

Today, I discovered that, first of all, it builds now.  Not only that, there are some packages for Ubuntu if you are adventurous enough to try them.  Finally, and perhaps most importantly, I discovered that you can configure it pretty easily.  In a few minutes, thanks to the "Toggle Fullscreen" key in Compiz's "Extra WM Actions" plugin, I had a config file which looks really nice either on a "full HD" (1920x1080) TV or WUXGA desktop monitor.

Thanks to Gargoyle, here is what Zork looks like today:

[screenshot: Zork running full-screen in Gargoyle]

Get your copy of "Zork HD" (by which I mean, "my ~/.garglkrc file") here.  Toggle it full-screen and enjoy.