Introduction and Goal
Experiments need to be slotted into some larger context of research, and
their results need to be communicated to other practitioners. That's
what makes them true "experiments" instead of private fetishes.
— Bruce Sterling, The Last
Viridian Note
I've been trying to get a USB headset to work gracefully with
a variety of applications on Linux for quite some time. Recently I
had a bit more time to investigate why this is so difficult, and to learn
a few things about ALSA. Inspired by Mr. Sterling, I feel compelled
to share the results of this ... experimentation.
I realize that many of the things I am going to describe here are bugs.
Some are feature requests. If I had only discovered one or
two, I'd just file them myself, but in this case I feel that producing a
comprehensive report, detailing the consequences of the relationships
between bugs in different packages, would be more useful. However,
such a report is only useful if others come along,
extract the individual bug reports, and do something about them. I
strongly encourage those of you with the appropriate know-how to extract
bug reports from this article, report them, and link here for reference.
I will update the article, and subscribe / vote on any bug reports
that get made. When appropriate, I encourage you all to use the
Launchpad bug-tracking service to report these issues.
My purpose here is to provide a snapshot of the state of audio on Linux,
and to ask the maintainers of various packages to reflect upon their
complicity in this sprawling disaster. To prompt them, perhaps, to
fix the relatively basic parts of the audio stack before enhancing the
crazy extra features that it sports. There is plenty of finger-pointing
and pointless whinging about the state of Linux's audio setup, but I
haven't found much in the way of a detailed critique.
Let me be clear: every single layer in the Linux audio stack is
broken: the ALSA kernel drivers, the ALSA library, the sound servers
(Pulse, of course, but also ones that I don't cover here, like ESD, aRts,
and Jack), and the applications which use all of these things. I
don't want to excuse their brokenness. But the developers of these
things have each given us some interesting code, for free, and we
shouldn't blame them for that unless they are actively opposed to making
it work right, which I don't believe they are. We should give them
as much clarity as possible into the nature of the problems, because every
layer can work around the deficiencies of the others.
In the meanwhile, if elements of my own jury-rigged setup are helpful
to anyone else, I hope that reading about my investigations will be easier
than performing their own. Before you dive into this whole mess,
though, I'd like to be quite clear: it is not really possible to have
multiple audio devices attached to a Linux machine that arbitrary
applications can select from and use.
Methodology
Right now I'm doing everything with Ubuntu Hardy.
When I eventually upgrade to Intrepid, I will post an update that
describes any differences in my results. However, from
reading the various forums out there, looking for answers to my questions,
I believe that the problems persist, not just in Hardy, but in recent
versions of RedHat and SuSE desktop distributions as well. At least,
my main problem with pulseaudio remains.
I'm writing about the problems in the most direct way that I perceive
them. That means that I oscillate between APIs, implementation code,
configuration and end-user experience. Although I'd like to motivate
everyone to make a sound system that
Aunt Tillie
would find so pleasant and easy to use that it's almost invisible, I think
that a sound system which is usable by a deeply knowledgeable, motivated,
skilled programmer is a necessary pre-requisite to that. So right
now, I'm only looking for something that I can use.
Finally, I'm not really editing this. It's already been a depressing
amount of effort to compile all this information, so I'm trying to write
it up quickly and avoid
The Problem(s)
I have a headset with a USB sound card, and a normal,
onboard sound card that is hooked up to speakers. I would like
to use each of these sound cards at different times for different
activities.
For one thing, I like to listen to music. My desk is right next to
my wife's; sometimes we feel like enjoying the same music, sometimes not.
So I'd like to be able to switch my music player between my
headphones and my speakers easily.
I also like to make and receive calls via Voice-over-IP. I would
prefer to use the headset to actually make the calls, but when a call
comes in, as with a regular phone, I'd like to be able to hear it on my
speakers until I make it to the desktop. Importantly I'd like to be
able to do this even if I'm listening to music already. Without this
requirement, I'd probably be fine with just one sound device. I'm
beginning to wonder if it's worth the trouble.
I also play video games. In Linux, my selection is somewhat limited,
but World of Warcraft (when tweaked appropriately) runs very nicely under
Cedega. I'd like to be able to do VoIP and
gaming on the same headset.
To do all of this, I need to be able to
tell the following applications which audio device to use, and to
use the same device concurrently from more than one of them at a
time:
- For VoIP
- Ekiga
- Skype
- Twinkle
- Ventrilo (via WINE)
- Adobe Flash 10
- For Gaming
- World of Warcraft (via Cedega)
- Neverwinter Nights
- Quake4
- On the Rain-Slick Precipice of Darkness (in other words, Torque)
- For Media Playback
- Quod Libet
- Totem-GStreamer
- VLC
On MacOS or Windows, telling the equivalent applications what device
to use is trivial. On Linux, I think it's impossible in the general
case, and certainly challenging otherwise.
The Solution (Not Really)
I'm sure that, even given this
disclaimer, somebody is going to tell me that if I used Pulse, all of my
problems would be solved and life would be great.
I've previously written about not using PulseAudio. In that article I briefly
mentioned random lockups and crashes. That's a pretty serious issue,
and it makes Pulse unsuitable for daily use. I've had others tell me
that more recent versions of Pulse are more reliable, and so perhaps this
is not such a concern any more. However, it's not the only
problem.
Many applications — including all of my VoIP applications, Ekiga, Skype,
Ventrilo (i.e. WINE) and Twinkle — do not yet have support for Pulse.
That should be fine, because there's a "pulse" ALSA plugin which
connects ALSA clients to PulseAudio servers.
The only problem there is that the compatibility layer doesn't work.
Until recently, it didn't work for Flash; this has been fixed in the
Flash 10 update, but it still doesn't work for Skype. By the way, if you
have any interest in audio and
Linux, please click on that issue and vote for it. Even if you don't
care about Skype in particular, other audio programmers are going to look
at Skype for an example, so it would be good if they fixed their most
serious problems.
The pulse plugin, in my experience (although this is less recent than the
rest of the article), also has weird, intermittent issues: audio
artifacts creeping into streams from certain applications, timing
issues, latency, and programs sometimes locking up for a few seconds
when opening the audio device.
Another issue with Pulse is that non-pulse-native applications can't tell
that there are multiple devices that Pulse is managing. The whole
point of my attempt at a setup here is to have different applications use
different devices for different purposes. Applications like Ekiga
and Skype don't
Death of 1000 Bugs
The first and most obvious problem that I face is that although the
default device is properly configured so that
multiple programs can open it at once, other devices (such as my USB
headset) do not inherit this configuration. If you want software
mixing you have to set it up yourself. Luckily I sort of knew how to
do that already. Unfortunately, the online documentation
describes a setup that defines lots of extra PCM devices. These
cause programs like Skype (among others) to open lots of half-valid PCMs,
emitting lots of scary-looking errors on stderr, pausing while they wait
for the device, and generally looking broken.
Application Bugs
First, the bugs I already knew about when I
started this.
Most applications choke and don't display a useful error message (or, at
best, pop up a modal dialog) when they can't open their device. They
won't tell me what's wrong, so if I don't want to waste time trying to map
a PID to a PCM device to an alsa configuration and figure out what the
heck went wrong, I have to make sure everybody uses dmix all the time.
Quod Libet has the bad habit of leaving the device open when the music is
paused. Worse, there's no "stop" button, so the only way to free the
device is to quit the application. If anything needs exclusive
access to the sound device it's using, too bad.
Flash and most native Linux games (quake4, NWN, OTRSPOD) don't let you
configure what audio device they're going to use. Quod Libet
requires you to edit a text file and memorize gstreamer pipeline syntax,
which I can never remember.
ALSA Configuration
I set out to define the entire thing in one
simple alsaconf stanza, understanding each line of it rather than just
copying and pasting. This is the area where I'd like to raise my
first complaint. ALSA configuration is really, really poorly
documented. Reading the various wikis, one gets the impression that
nobody really understands how it works.
In fact, reading pcm.c, I'm not sure that the ALSA people
themselves really understand how it works. There's no intermediary
structure to represent actual devices and such, just the tree of config
data itself. Therefore there's no way to verify that a config file
is valid without actually calling a bunch of snd_pcm_* functions.
However, with the help of two pieces of documentation, "PCM
(digital audio) plugins" and "Configuration files", as well as some
reading of the aforementioned .c files, I worked out what was expected in
~/.asoundrc.
I've annotated the final results of this adventure in the
resultant configuration file, which is almost an article in its own
right.
Broken By Default
Another complaint here is that ALSA's policy
seems to be as broken as possible by default. When you start
defining a device, you get a device that can't do software mixing.
Why isn't dmix just part of a regular PCM? What possible value
does making device access exclusive have? If there is some
value, why isn't it explained anywhere? Okay, okay, so I'll set up
dmix. Wait, now I can mix playback, but can't share capture?
Why do I have to learn about dsnoop? All right, now I've set
up dsnoop. Now I have to learn about "asym" to put them together.
Okay. Wait, now 'arecord' can't record from the device?
What? The sample rate is wrong? Oh, I have to use the
"plughw" plugin to allow mixing at any sample rate. Wait what?
dmix doesn't work with plughw? Oh, I have to wrap a "plug"
around the outer device? Why? Wait, now that I've set this all
up, I have to turn on access to other users who already have been
explicitly granted permission to write to this audio device? And
of course, if I unplug and plug in my USB devices in a different order, or
restart my computer, all my configuration is now wrong, because all the
examples use card indexes rather than more stable identifiers. So I
have to go find the (mostly undocumented) stable identifier and start
using that.
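To make that chain concrete, here is a minimal sketch of the kind of stanza it all adds up to. This is an illustration, not the annotated configuration file mentioned above: the PCM names, the ipc_key values, the 48000 Hz rate and the card ID "Headset" are all placeholders. The card ID stands in for the stable identifier, which is the name shown in square brackets in /proc/asound/cards.

```
# Playback: let several programs share the headset through dmix.
pcm.headset_out {
    type dmix
    # Any integer that is unique among your dmix/dsnoop definitions.
    ipc_key 2048
    # Let other users attach to the shared mixer.
    ipc_perm 0666
    slave {
        # Card ID rather than an index such as hw:1 (placeholder name).
        pcm "hw:Headset"
        rate 48000
    }
}

# Capture: let several programs read the microphone through dsnoop.
pcm.headset_in {
    type dsnoop
    ipc_key 2049
    ipc_perm 0666
    slave {
        pcm "hw:Headset"
        rate 48000
    }
}

# asym glues the two half-duplex devices into one full-duplex device.
pcm.headset_duplex {
    type asym
    playback.pcm "headset_out"
    capture.pcm "headset_in"
}

# plug goes on the outside so clients can ask for any rate or format.
pcm.headset {
    type plug
    slave.pcm "headset_duplex"
}

# Matching control device so mixer applications find the same card.
ctl.headset {
    type hw
    card "Headset"
}
```

Each layer exists only to work around a limitation of the one below it, which is why the whole thing has to be read from the bottom up: applications open "headset", which converts formats and hands off to the asym device, which routes to dmix or dsnoop, which finally talk to the hardware.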
Which device?
However, now that I've defined my custom device,
there's still another problem. From my list of software above, Skype
can see the new device, but Ekiga and Twinkle can't. Neither
"aplay -l" nor "arecord -l" shows my new device.
Meanwhile, when Ventrilo interrogates the Windows sound
API, it gets a list of devices which always contains "default", but
randomly contains "dsnoop:0", "dmix:0", or sometimes two or three
devices with blank names.
A workaround is possible with aplay, arecord, and Twinkle, all of which
allow me to explicitly supply a device. Leaving WINE aside for
the moment, I decided to investigate why it was that Ekiga (purportedly as
desktop- and sound-savvy a program as one is likely to find) could not see
my custom device.
A Detour: The Mystery of ALSA Device Enumeration
Ekiga uses a library, Portable Windows Library, or PWLib, to address
audio devices. When the ALSA PWLib plugin lists devices,
it follows the example of the ALSA utility "aplay" listing devices,
which is to say it uses the API to list only hardware devices. Ekiga
has an explicit provision for the "default" device, realizing that someone
may have reconfigured that to do something dynamic, but with no
provision to allow you to select a different one (even by explicitly
typing it in).
The Bluetooth-Alsa project has also noticed this problem, and has cooked up
a patch which looks a little silly to me, hard-coding the device name
"headset". Oddly, this would have fixed my problem, even though my
"headset" device is not (currently) Bluetooth.
After discovering all this, I resolved to find the "right" way to
enumerate ALSA devices. With Skype as the only "good" example, and
unable to get its source code, I spent a while downloading source to
different programs. Eventually I discovered an
example in PortAudio which yielded similar results to Skype's
introspection. I implemented a brief C program of my own to verify
it.
The lesson here is for programmers: if you are writing an application or
library which uses libasound directly, you need to enumerate
the configuration hierarchy under "pcm" and make some enlightened guesses
about which devices are interesting. It's not completely correct,
but it will at least get you a list that contains the stuff the user wrote
in their ~/.asoundrc.
No really, which device?
So, in the absence of any trick to
convince Ekiga and friends to actually look at my newly built
configuration (and remember, WINE was even worse), I needed a way to
trick the ALSA library into paying attention to environment variables.
As it happens, ALSA does pay attention to environment variables!
Unfortunately, it pays just enough attention to hurt.
In its stock configuration, ALSA
respects several environment variables: ALSA_PCM_CARD and
ALSA_CARD (which mean "what hardware card to use by default")
and ALSA_PCM_DEVICE (which means "what hardware device to use
by default"). None of these variables allow you to specify an
additional config file. None of them allow you to select
a non-hardware PCM by default.
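The reason is visible in how the stock configuration consumes these variables. The excerpt below is abridged and reconstructed from memory in the style of the alsa.conf shipped with alsa-lib, so treat the exact layout as an assumption that may differ between versions. The point is only that the variables feed the card and device arguments of a hardware definition, and nothing else.

```
pcm.hw {
    # Abridged: the stock definition takes further arguments.
    @args [ CARD DEV ]
    @args.CARD {
        type string
        default {
            # First variable that is set wins; otherwise use the default card.
            @func getenv
            vars [ ALSA_PCM_CARD ALSA_CARD ]
            default {
                @func refer
                name defaults.pcm.card
            }
        }
    }
    @args.DEV {
        type integer
        default {
            # Integer variant of getenv.
            @func igetenv
            vars [ ALSA_PCM_DEVICE ]
            default {
                @func refer
                name defaults.pcm.device
            }
        }
    }
    type hw
    card $CARD
    device $DEV
}
```

Since the result is always plugged into a "type hw" definition, there is no value you can put in any of these variables that names a dmix, asym or plug device.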
And it's quite tricky to tell it how to do that. There's
an example on the ALSA wiki which describes how to do this if you're
using Pulse (which, sadly, I am not). ALSA provides some very useful
configuration on the default device, including (in my case, at least)
automatic dmix, and I'm not really sure how it does it, so I don't want to
override it for most applications. In the case of the unusual,
uncooperative ones, I wanted to be able to set an environment
variable.
The configuration language for ALSA really sucks, which is weird, because
it contains a LISP implementation that would have been perfect for
doing this sort of thing. However, asound.conf does not have support
for conditionals (so you can't say "if this environment variable isn't
set, omit this stanza"). Most importantly, the user configuration is
evaluated before the system configuration, so even using the extremely
primitive facilities available to you, you can't copy the default
pcm.default and ctl.default declarations before you decide to override
them with your own.
I discovered a fun fact at this point: while "pcm.!default =
pcm.default" will just exit with an error, more creative forms of
looping (i.e. with "getenv") will segfault applications that use ALSA.
You can probably guess why. So this ad-hoc mini-language is
just complex enough to be dangerous, but not enough to be useful.
This is a good example of the repeated theme that once you need to
invoke functions, it's better to just use a real programming language.
With enough head-banging, I eventually realized that while I couldn't
conditionally load a configuration file, I could decide to load a
default (i.e. empty) file or a file of the user's choosing based on an
environment variable. While this definitely isn't as graceful as
"sh -c 'ALSA_PCM=mypcm ekiga'", it works and I can put it
into shell aliases and desktop icons and just forget about it. You
can see the result in my config file.
Finally I created both .asoundrc.empty and .asoundrc.defaultheadset, the
latter of which contained simply:
pcm.!default "headset"
ctl.!default "headset"
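The selection mechanism itself can be sketched as a load hook in ~/.asoundrc. This is an illustration of the idea rather than a copy of my config file, and it rests on two assumptions worth checking against your alsa-lib version. The first is that entries in a load hook's "files" list may be computed with "@func getenv". The second is that hooks placed in a user file are honored at all.

```
@hooks [
    {
        func load
        files [
            {
                # Use the file named by $ALSA_BONUS_CONFIG if it is set,
                # otherwise fall back to the empty file.
                @func getenv
                vars [ ALSA_BONUS_CONFIG ]
                default "~/.asoundrc.empty"
            }
        ]
        errors false
    }
]
```

With something like that in place, a launcher or alias sets the variable for one process only, for example by starting Ekiga with ALSA_BONUS_CONFIG pointing at $HOME/.asoundrc.defaultheadset, while everything else keeps the stock default device.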
Okay, this device!
Now, I was ready to test out this setup
and see if I could convince e.g. Ventrilo to use my headset, while most
programs would start up with my speakers. Experimenting with
ALSA_BONUS_CONFIG using quodlibet, aplay and friends seemed to work fine.
Great! Only 20 hours, two or three dozen C files, and a
painstakingly custom-configured system later, I could use my headset to
listen to music!
Surprise! It Doesn't Quite Work
While most programs work OK
in this configuration, it turns out that Wine crumbled under more than
superficial testing. Ventrilo starts up and plays some sounds, but
if it's configured to use the dmix playback device, after the PTT button
is pushed once, it chokes. It doesn't matter if I use the OSS
drivers with ALSA emulation, a different asym with the hardware capture
device, or whatever. dmix plus Ventrilo equals broken. Except
that it does work on my actual default device, if I plug my mic into the
regular microphone port rather than into the USB sound card.
Supposedly my default sound card is using dmix as well, so where's
the problem?
Although the circumstances are different, this is very much like the last
time I was messing with Linux audio, trying to get Skype and Flash to work
with Pulse. They didn't work with the Pulse plugin, but there was no
indication why. Audio skipped, it stopped, the sound server locked
up, but in no case did I get a useful error message, or even a distinctive
enough error message to google for. The failure mode of pretty much
every layer of the audio stack (and as I have just demonstrated, they all
fail a lot) is to emit silence and record silence, to freeze, and to have
applications crash before they start up, leaving the user wondering what
happened. It would be a lot less depressing trying to find and
diagnose bugs if there were more error messages that made sense.
Still, this hasn't been a complete waste. I have an environment
variable which can select an appropriate default sound device for a given
process. Even if pulse does start working for me, that should be
useful for the huge piles of applications that don't directly support it
yet.
Probably Enough for Now
My fight with ALSA isn't quite over, and I
haven't reached any terribly useful conclusions beyond "this sucks".
I do have a few suggestions, however.
To Audio Infrastructure Programmers
I'd like to say "make
everything work", but I realize that isn't realistic. For now,
please just give those of us who are trying to slog through all of this
stuff some better tools to understand what is going on, and to debug it.
See how your spiffy sound driver/plugin/library/server works with
existing applications. Even — or perhaps especially — with
proprietary applications that users have little recourse to change.
The most frustrating thing about spending so much time trying to make such
a simple use-case work is that I've learned so little of value. I
want to provide really useful, detailed bug reports, but at best I
understand something somewhere is setting the wrong sample rate, and at
worst I am completely baffled.
To Audio Application Programmers
The audio landscape in Linux is
obviously a mess. Sorry. That doesn't mean that you can get
away with supporting only one of the panoply of sound mechanisms available.
Use PortAudio or OpenAL or something if you possibly can, so that
you can leave the work of choosing whether to try pulse or ALSA or OSS or
Jack or whatever to someone else.
Please test your applications with more than one sound plugin. Try
it with dmix and dsnoop, try it with pulse, and try it with at least a USB
sound device and an internal sound card. It would probably be good
to try it with more than one kind of internal sound card, too, since the
drivers apparently vary a lot.
It would also be good if some of you could start talking to the
infrastructure guys and maybe agree on some kind of standard for telling
applications where their audio should go. We have $DISPLAY for a
reason; there are plenty of times when the "default" isn't really the
default.
To You Poor Users
If you possibly can, stick to just one sound
card in Linux. I did that for a while and it worked OK. It's
obviously possible to go beyond that, but depending on what kind of card
you're getting, you might have problems.
One trick I've learned is that if you want to exclusively use something
other than your onboard sound card, you can disable the onboard card in
the BIOS. This is a lot more reliable than telling Linux to do it,
and you can still get the benefits of Linux thinking that your other sound
card is the "default", which somehow gets various bits of ALSA mixing
magic applied to it, which I don't think I've figured out all of yet.
Hopefully by 2010 all of these concerns will be moot, though. The
Linux world has been moving shockingly fast in every other area; maybe
it's time for audio to catch up.