The Dark Art of Sound on Linux

Sunday January 11, 2009

Introduction and Goal

Experiments need to be slotted into some larger context of research, and their results need to be communicated to other practitioners. That's what makes them true "experiments" instead of private fetishes.
 — Bruce Sterling, The Last Viridian Note
I've been trying to get a USB headset to work gracefully with a variety of applications on Linux for quite some time.  Recently I had a bit more time to investigate why this is so difficult, and to learn a few things about ALSA.  Inspired by Mr. Sterling, I feel compelled to share the results of this ... experimentation.

I realize that many of the things I am going to describe here are bugs.  Some are feature requests.  If I had only discovered one or two, I'd just file them myself, but in this case I feel that producing a comprehensive report, detailing the consequences of the relationships between bugs in different packages, would be more useful.  However, such a report is only useful if others come along, extract the individual bug reports, and do something about them.  I strongly encourage those of you with the appropriate know-how to extract bug reports from this article, report them, and link here for reference.  I will update the article, and subscribe to / vote on any bug reports that get filed.  When appropriate, I encourage you all to use the Launchpad bug-tracking service to report these issues.

My purpose here is to provide a snapshot of the state of audio on Linux, and to ask the maintainers of various packages to reflect upon their complicity in this sprawling disaster.  To prompt them, perhaps, to fix the relatively basic parts of the audio stack before enhancing the crazy extra features that it sports.  There is plenty of finger-pointing and pointless whinging about the state of Linux's audio setup, but I haven't found much in the way of a detailed critique.

Let me be clear: every single layer in the Linux audio stack is broken: the ALSA kernel drivers, the ALSA library, the sound servers (Pulse, of course, but also ones that I don't cover here, like ESD, aRts, and Jack), and the applications which use all of these things.  I don't want to excuse their brokenness.  But the developers of these things have each given us some interesting code, for free, and we shouldn't blame them for that unless they are actively opposed to making it work right, which I don't believe they are.  We should give them as much clarity as possible into the nature of the problems, because every layer can work around the deficiencies of the others.

In the meantime, if elements of my own jury-rigged setup are helpful to anyone else, I hope that reading about my investigations will be easier than repeating them.  Before you dive into this whole mess, though, I'd like to be quite clear: it is not really possible to have multiple audio devices attached to a Linux machine which arbitrary applications can select from and use.


Right now I'm doing everything with Ubuntu Hardy.  When I eventually upgrade to Intrepid, I will post an update that describes any differences in my results.  However, from reading the various forums while looking for answers to my questions, I believe that the problems persist, not just in Hardy, but in recent versions of the Red Hat and SuSE desktop distributions as well.  At least, my main problem with pulseaudio remains.

I'm writing about the problems in the most direct way that I perceive them.  That means that I oscillate between APIs, implementation code, configuration and end-user experience.  Although I'd like to motivate everyone to make a sound system that Aunt Tillie would find so pleasant and easy to use that it's almost invisible, I think that a sound system which is usable by a deeply knowledgeable, motivated, skilled programmer is a necessary prerequisite to that.  So right now, I'm only looking for something that I can use.

Finally, I'm not really editing this.  It's already been a depressing amount of effort to compile all this information, so I'm trying to write it up quickly and avoid

The Problem(s)

I have a headset with a USB sound card, and a normal, onboard sound card that is hooked up to speakers.  I would like to use each of these sound cards at different times for different activities.

For one thing, I like to listen to music.  My desk is right next to my wife's; sometimes we feel like enjoying the same music, sometimes not.  So I'd like to be able to switch my music player between my headphones and my speakers easily.

I also like to make and receive calls via Voice-over-IP.  I would prefer to use the headset to actually make the calls, but when a call comes in, as with a regular phone, I'd like to be able to hear it on my speakers until I make it to the desktop.  Importantly, I'd like to be able to do this even if I'm listening to music already.  Without this requirement, I'd probably be fine with just one sound device.  I'm beginning to wonder if it's worth the trouble.

I also play video games.  In Linux, my selection is somewhat limited, but World of Warcraft (when tweaked appropriately) runs very nicely under Cedega.  I'd like to be able to do VoIP and gaming on the same headset.

To do all of this, I need to be able to tell each of the following applications which audio device to use, and to let more than one of them use the same device concurrently:
  • For VoIP
    • Ekiga
    • Skype
    • Twinkle
    • Ventrilo (via WINE)
    • Adobe Flash 10
  • For Gaming
    • World of Warcraft (via Cedega)
    • Neverwinter Nights
    • Quake4
    • On the Rain-Slick Precipice of Darkness (in other words, Torque)
  • For Media Playback
    • Quod Libet
    • Totem-GStreamer
    • VLC
On MacOS or Windows, telling the equivalent applications what device to use is trivial.  On Linux, I think it's impossible in the general case, and certainly challenging otherwise.

The Solution (Not Really)

I'm sure that, even given this disclaimer, somebody is going to tell me that if I used Pulse, all of my problems would be solved and life would be great.

I've previously written about not using PulseAudio.  In that article I briefly mentioned random lockups and crashes.  That's a pretty serious issue, and it makes Pulse unsuitable for daily use.  I've had others tell me that more recent versions of Pulse are more reliable, and so perhaps this is not such a concern any more.  However, it's not the only problem.

Many applications — including all of my VoIP applications, Ekiga, Skype, Ventrilo (i.e. WINE) and Twinkle — do not yet have support for Pulse.  That should be fine, because there's a "pulse" ALSA plugin which connects ALSA clients to PulseAudio servers.

The only problem there is that the compatibility layer doesn't work.  Until recently, it didn't work for Flash; this has been fixed in the Flash 10 update, but it still doesn't work for Skype.  By the way, if you have any interest in audio and Linux, please click on that issue and vote for it.  Even if you don't care about Skype in particular, other audio programmers are going to look at Skype for an example, so it would be good if they fixed their most serious problems.

The pulse plugin, in my experience (although this is less recent than the rest of the article), also has weird, intermittent issues: audio artifacts creeping into streams from certain applications, timing problems, latency, and programs sometimes locking up for a few seconds when opening the audio device.

Another issue with Pulse is that non-pulse-native applications can't tell that there are multiple devices that Pulse is managing.  The whole point of my attempt at a setup here is to have different applications use different devices for different purposes.  Applications like Ekiga and Skype don't

Death of 1000 Bugs

The first and most obvious problem that I face is that although the default device is properly configured so that multiple programs can open it at once, other devices (such as my USB headset) do not inherit this configuration.  If you want software mixing you have to set it up yourself.  Luckily I sort of knew how to do that already.  Unfortunately the online documentation describes a setup that defines lots of extra PCM devices.  These cause programs like Skype (among others) to open lots of half-valid PCMs, emitting lots of scary-looking errors on stderr, pausing while they wait for the device, and generally looking broken.
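For a sense of what "set it up yourself" means, here is a minimal sketch of playback mixing on a second card. It assumes the headset's ALSA card ID is "Headset" (a hypothetical name; the real ID is the bracketed name in /proc/asound/cards) and an arbitrary ipc_key. It defines one extra PCM rather than the long list in the online examples, and it only covers playback:

```
# Minimal playback-only software mixing for a second card.
# Assumption: the USB headset's card ID is "Headset" (hypothetical).
pcm.headset_dmix {
	type dmix
	ipc_key 1024            # any integer no other dmix/dsnoop instance uses
	slave.pcm "hw:Headset,0"
}
```

With that in place, running two copies of "aplay -D headset_dmix" at once should let both be heard on the headset.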

Application Bugs

First, the bugs I already knew about when I started this.

Most applications choke and don't display a useful error message (or, at best, pop up a modal dialog) when they can't open their device.  They won't tell me what's wrong, so if I don't want to waste time trying to map a PID to a PCM device to an ALSA configuration and figure out what the heck went wrong, I have to make sure everybody uses dmix all the time.

Quod Libet has the bad habit of leaving the device open when the music is paused.  Worse, there's no "stop" button, so the only way to free the device is to quit the application.  If anything needs exclusive access to the sound device it's using, too bad.

Flash and most native Linux games (quake4, NWN, OTRSPOD) don't let you configure what audio device they're going to use.  Quod Libet requires you to edit a text file and memorize GStreamer pipeline syntax, which I can never remember.

ALSA Configuration

I set out to define the entire thing in one simple alsaconf stanza, understanding each line of it rather than just copying and pasting.  This is the area where I'd like to raise my first complaint.  ALSA configuration is really, really poorly documented.  Reading the various wikis, one gets the impression that nobody really understands how it works.

In fact, reading pcm.c, I'm not sure that the ALSA people themselves really understand how it works.  There's no intermediary structure to represent actual devices and such, just the tree of config data itself.  Therefore there's no way to verify that a config file is valid without actually calling a bunch of snd_pcm_* functions.

However, with the help of two pieces of documentation, "PCM (digital audio) plugins" and "Configuration files", as well as some reading of the aforementioned .c files, I worked out what was expected in ~/.asoundrc.

I've annotated the final results of this adventure in the resultant configuration file, which is almost an article in its own right.

Broken By Default

Another complaint here is that ALSA's policy seems to be as broken as possible by default.  When you start defining a device, you get a device that can't do software mixing.  Why isn't dmix just part of a regular PCM?  What possible value does making device access exclusive have?  If there is some value, why isn't it explained anywhere?  Okay, okay, so I'll set up dmix.  Wait, now I can mix output, but can't share input?  Why do I have to learn about dsnoop?  All right, now I've set up dsnoop.  Now I have to learn about "asym" to put them together.  Okay.  Wait, now 'arecord' can't record from the device?  What?  The sample rate is wrong?  Oh, I have to use the "plughw" plugin to get automatic sample-rate conversion.  Wait, what?  dmix doesn't work with plughw?  Oh, I have to wrap a "plug" around the outer device?  Why?  Wait, now that I've set this all up, I have to grant access to other users who have already been explicitly granted permission to write to this audio device?  And of course, if I unplug and plug in my USB devices in a different order, or restart my computer, all my configuration is now wrong, because all the examples use card indexes rather than more stable identifiers.  So I have to go find the (mostly undocumented) stable identifier and start using that.
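Put together, the whole chain can be written as a single nested stanza, which also avoids defining the extra named PCMs mentioned earlier. This is an illustrative sketch, not the annotated file mentioned above: the card ID "Headset", the ipc_key values, and the 0666 permissions are assumptions to adjust for your own system.

```
# Playback through dmix, capture through dsnoop, joined by asym,
# with plug on the outside for rate/format conversion.
# Assumption: the USB headset's card ID is "Headset" (hypothetical); a card
# ID stays stable, unlike a card index that changes with plug-in order.
pcm.headset {
	type plug
	slave.pcm {
		type asym
		playback.pcm {
			type dmix
			ipc_key 20090111        # pick integers no other instance uses
			ipc_perm 0666           # let other users share the mixer
			slave.pcm "hw:Headset,0"
		}
		capture.pcm {
			type dsnoop
			ipc_key 20090112        # distinct from the dmix key
			ipc_perm 0666
			slave.pcm "hw:Headset,0"
		}
	}
}

ctl.headset {
	type hw
	card Headset
}
```

If this behaves as intended, "aplay -D headset" and "arecord -D headset" should both work, and from several programs at once.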

Which device?

However, now that I've defined my custom device, there's still another problem.  From my list of software above, Skype can see the new device, but Ekiga and Twinkle can't.  Neither "aplay -l" nor "arecord -l" shows my new device.  Meanwhile Ventrilo, interrogating the Windows sound API through WINE, gets a list of devices which always contains "default", but randomly contains "dsnoop:0", "dmix:0", or sometimes two or three devices with blank names.

A workaround is possible with aplay, arecord, and Twinkle, all of which allow me to explicitly supply a device.  Leaving WINE aside for the moment, I decided to investigate why Ekiga (purportedly as desktop- and sound-savvy a program as one is likely to find) could not see my custom device.

A Detour: The Mystery of ALSA Device Enumeration

Ekiga uses a library, Portable Windows Library, or PWLib, to address audio devices.  When the ALSA PWLib plugin lists devices, it follows the example of the alsa utility "aplay" listing devices, which is to say it uses the API to list only hardware devices.  Ekiga has an explicit provision for the "default" device, realizing that someone may have reconfigured that to do something dynamic, but with no provision to allow you to select a different one (even by explicitly typing it in).

The Bluetooth-Alsa project has also noticed this problem, and has cooked up a patch which looks a little silly to me, hard-coding the device name "headset".  Oddly this would have fixed my problem, even though my "headset" device is not (currently) bluetooth.

After discovering all this, I resolved to find the "right" way to enumerate ALSA devices.  Skype was the only "good" example, and since I couldn't get its source code, I spent a while downloading source for different programs.  Eventually I discovered an example in PortAudio which yielded results similar to Skype's introspection.  I wrote a brief C program of my own to verify it.

The lesson here is for programmers: if you are writing an application or library which uses libasound directly, you need to enumerate the configuration hierarchy under "pcm" and make some enlightened guesses about which devices are interesting.  It's not completely correct, but it will at least get you a list that contains the stuff the user wrote in their ~/.asoundrc.
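On the configuration side, alsa-lib also offers a way for a user-defined PCM to advertise itself: a "hint" block, which snd_device_name_hint() reads and which "aplay -L" (capital L, in reasonably recent alsa-utils) prints. A sketch, assuming the hypothetical "Headset" card ID; whether any given application consults hints at all is up to that application, and behavior may vary between alsa-lib versions.

```
# Hypothetical example: let hint-based device lists show a user-defined PCM.
pcm.headset_hinted {
	type plug
	slave.pcm "dmix:CARD=Headset"   # playback only, via the stock dmix definition
	hint {
		show on
		description "USB headset (software-mixed playback)"
	}
}
```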

No really, which device?

So, in the absence of any trick to convince Ekiga and friends to actually look at my newly built configuration (and remember, WINE was even worse), I needed a way to trick the ALSA library into paying attention to environment variables.  As it happens, ALSA does pay attention to environment variables!  Unfortunately, it pays just enough attention to hurt.

In its stock configuration, ALSA respects several environment variables: ALSA_PCM_CARD, ALSA_CARD (which mean "what hardware card to use by default") and ALSA_PCM_DEVICE (which means "what hardware device to use by default").  None of these variables lets you specify an additional config file, and none of them lets you select a non-hardware PCM by default.
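For reference, the system alsa.conf consults these variables roughly as follows. This is a simplified paraphrase (the real definition carries extra subdevice and hint plumbing, and the layout differs between alsa-lib versions), shown only to illustrate that the environment feeds the CARD and DEV arguments of the hardware-level "hw" PCM and nothing else. It is not something to paste into ~/.asoundrc.

```
# Simplified paraphrase of the "hw" PCM in the system alsa.conf.
# The environment only supplies a card and a device number.
pcm.hw {
	@args [ CARD DEV ]
	@args.CARD {
		type string
		default {
			@func getenv
			vars [ ALSA_PCM_CARD ALSA_CARD ]
			default {
				@func refer
				name defaults.pcm.card
			}
		}
	}
	@args.DEV {
		type integer
		default {
			@func igetenv
			vars [ ALSA_PCM_DEVICE ]
			default {
				@func refer
				name defaults.pcm.device
			}
		}
	}
	type hw
	card $CARD
	device $DEV
}
```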

And it's quite tricky to tell it how to do that.  There's an example on the ALSA wiki which describes how to do this if you're using Pulse (which, sadly, I am not).  ALSA provides some very useful configuration on the default device, including (in my case, at least) automatic dmix, and I'm not really sure how it does it, so I don't want to override it for most applications.  In the case of the unusual, uncooperative ones, I wanted to be able to set an environment variable.

The configuration language for ALSA really sucks, which is weird, because it contains a LISP implementation that would have been perfect for doing this sort of thing.  However, asound.conf does not have support for conditionals (so you can't say "if this environment variable isn't set, omit this stanza").  Most importantly, the user configuration is evaluated before the system configuration, so even using the extremely primitive facilities available to you, you can't copy the default pcm.default and ctl.default declarations before you decide to override them with your own.

I discovered a fun fact at this point: while "pcm.!default = pcm.default" will just exit with an error, more creative forms of looping (i.e. with "getenv") will segfault applications that use ALSA.  You can probably guess why.  So this ad-hoc mini-language is just complex enough to be dangerous, but not enough to be useful.  This is a good example of the repeated theme that once you need to invoke functions, it's better to just use a real programming language.

With enough head-banging, I eventually realized that while I couldn't conditionally load a configuration file, I could decide to load a default (i.e. empty) file or a file of the user's choosing based on an environment variable.  While this definitely isn't as graceful as "sh -c 'ALSA_PCM=mypcm ekiga'", it works and I can put it into shell aliases and desktop icons and just forget about it.  You can see the result in my config file.

Finally I created an empty .asoundrc.empty and a .asoundrc.defaultheadset, which contained simply:
pcm.!default "headset"
ctl.!default "headset"
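The loading trick described above might look something like the following in ~/.asoundrc. This is a reconstruction rather than a copy of the config file mentioned earlier: it assumes that the load hook evaluates @func nodes in its file list and expands "~" in the resulting path, both of which are worth checking against your alsa-lib version. The variable name ALSA_BONUS_CONFIG and the ".asoundrc.<name>" naming come from the text; the "headset" PCM itself would still be defined elsewhere in ~/.asoundrc.

```
# Load "~/.asoundrc.<NAME>", where NAME comes from the ALSA_BONUS_CONFIG
# environment variable and falls back to "empty" when it is unset.
@hooks [
	{
		func load
		files [
			{
				@func concat
				strings [
					"~/.asoundrc."
					{
						@func getenv
						vars [ ALSA_BONUS_CONFIG ]
						default "empty"
					}
				]
			}
		]
		errors false            # don't fail if the chosen file is missing
	}
]
```

With this in place, something like "ALSA_BONUS_CONFIG=defaultheadset ekiga" should start Ekiga with the headset as its default device, while programs started without the variable load .asoundrc.empty and keep the stock default.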

Okay, this device!

Now I was ready to test out this setup and see if I could convince e.g. Ventrilo to use my headset, while most programs would start up with my speakers.  Experimenting with ALSA_BONUS_CONFIG using Quod Libet, aplay and friends seemed to work fine.  Great!  Only 20 hours, two or three dozen C files, and a painstakingly custom-configured system later, I could use my headset to listen to music!

Surprise!  It Doesn't Quite Work

While most programs work OK in this configuration, it turns out that WINE crumbled under more than superficial testing.  Ventrilo starts up and plays some sounds, but if it's configured to use the dmix playback device, after the PTT button is pushed once, it chokes.  It doesn't matter if I use the OSS drivers with ALSA emulation, a different asym with the hardware capture device, or whatever.  dmix plus Ventrilo equals broken.  Except that it does work on my actual default device, if I plug my mic into the regular microphone port rather than into the USB sound card.  Supposedly my default sound card is using dmix as well, so where's the problem?

Although the circumstances are different, this is very much like the last time I was messing with Linux audio, trying to get Skype and Flash to work with Pulse.  They didn't work with the Pulse plugin, but there was no indication why.  Audio skipped, it stopped, the sound server locked up, but in no case did I get a useful error message, or even a distinctive enough error message to google for.  The failure mode of pretty much every layer of the audio stack (and as I have just demonstrated, they all fail a lot) is to emit silence and record silence, to freeze, and to have applications crash before they start up, leaving the user wondering what happened.  It would be a lot less depressing trying to find and diagnose bugs if there were more error messages that made sense.

Still, this hasn't been a complete waste.  I have an environment variable which can select an appropriate default sound device for a given process.  Even if pulse does start working for me, that should be useful for the huge piles of applications that don't directly support it yet.

Probably Enough for Now

My fight with ALSA isn't quite over, and I haven't reached any terribly useful conclusions beyond "this sucks".  I do have a few suggestions, however.

To Audio Infrastructure Programmers

I'd like to say "make everything work", but I realize that isn't realistic.  For now, please just give those of us who are trying to slog through all of this stuff some better tools to understand what is going on, and to debug it.  See how your spiffy sound driver/plugin/library/server works with existing applications.  Even — or perhaps especially — with proprietary applications that users have little recourse to change.

The most frustrating thing about spending so much time trying to make such a simple use-case work is that I've learned so little of value.  I want to provide really useful, detailed bug reports, but at best I understand something somewhere is setting the wrong sample rate, and at worst I am completely baffled.

To Audio Application Programmers

The audio landscape in Linux is obviously a mess.  Sorry.  That doesn't mean that you can get away with supporting only one of the panoply of sound mechanisms available.  Use PortAudio or OpenAL or something if you possibly can, so that you can leave the work of choosing whether to try pulse or ALSA or OSS or Jack or whatever to someone else.

Please test your applications with more than one sound plugin.  Try it with dmix and dsnoop, try it with pulse, and try it with at least a USB sound device and an internal sound card.  It would probably be good to try it with more than one kind of internal sound card, too, since the drivers apparently vary a lot.

It would also be good if some of you could start talking to the infrastructure guys and maybe agree on some kind of standard for telling applications where their audio should go.  We have $DISPLAY for a reason; there are plenty of times when the "default" isn't really the default.

To You Poor Users

If you possibly can, stick to just one sound card in Linux.  I did that for a while and it worked OK.  It's obviously possible to go beyond that, but depending on what kind of card you're getting, you might have problems.

One trick I've learned is that if you want to exclusively use something other than your onboard sound card, you can disable the onboard card in the BIOS.  This is a lot more reliable than telling Linux to do it, and you still get the benefit of Linux treating your other sound card as the "default", which somehow gets various bits of ALSA mixing magic applied to it, not all of which I think I've figured out yet.

Hopefully by 2010 all of these concerns will be moot, though.  The Linux world has been moving shockingly fast in every other area; maybe it's time for audio to catch up.