Lennart Poettering from Red Hat, who develops and maintains PulseAudio, has written a detailed explanation of the underlying technical improvements in the upcoming version of PulseAudio. “A while ago I started development of special branch of PulseAudio which is called glitch-free. In a few days I will merge it back to PulseAudio trunk, and eventually release it as 0.9.11. I think it’s time to explain a little what all this ‘glitch-freeness’ is about, what made it so tricky to implement, and why this is totally awesome technology.”
What’s Cooking in PulseAudio’s ‘glitch-free’ Branch
17 Comments
2008-04-10 11:14 am iwbcman
I really do like the new OSS 4.x. It is a wonderful system which is very promising, but there is no direct competition between OSS and Pulseaudio.
Pulseaudio is not a sound card driver system like ALSA and OSS. Moreover, Pulseaudio provides functionality which cannot be addressed at the sound driver level. For example, Pulseaudio enables a networked sound system where sound events (movie/video audio, MP3s, DVD/CD audio etc.) can be re-routed from one machine to another. This is coupled with Avahi (same author as Pulseaudio) mDNS automatic service discovery, i.e. each machine on a network registers its available sinks/sources, which thus become available to every other machine on the network.
Pulseaudio is entirely userspace, in stark contrast to ALSA and OSS. Additionally, OSS does not support all of the cards supported by ALSA, and although it has superior support for some sound cards relative to ALSA, it falls short on others. Moreover, the Linux kernel abandoned support for OSS about 2 years ago. OSS unfortunately still leaves a bad taste in the mouth of many people in the Linux world due to the ancient OSS API. Back in the day lots of software was tied to that ancient API, and this has caused nothing but headaches and nightmares for years and years. People were literally forced to purchase multiple sound cards because apps which used the ancient OSS API each demanded exclusive access to /dev/dspX.
Now, with OSS 4.x, this is no longer the case: it now has really good software mixing, although it is very CPU-intensive compared to both Pulseaudio and ALSA. What needs to happen is for a consensus to be formed concerning layers. Although I love the per-application volume control of OSS 4.x, this is not something that belongs at the sound driver system level.
Let us try to break up the layers:
1) sound drivers (OSS and ALSA)
2) sound system: coordination of sound events, network sound (Pulseaudio)
3) sound API: high-level API for sound events (gstreamer, libsydney, and Phonon)
4) applications (totem, amarok, banshee etc.)
This attempt to break up the layers is somewhat misleading, since there is functionality overlap between the layers, but it is an attempt to clarify the different functionality needed. From my point of view no application should have direct sound card access unless it needs to be directly tied to card specifics, which is the case for professional-level audio recording applications; apps which use Jack fall into this category.
The problem is we have multiple apps vying for access to the hardware at the same time. The desktop needs sound notification at the same time as Totem is playing a DVD, at the same time as Pidgin is notifying you that your buddy has just entered into a chat with you, while you are using Skype to talk to someone far away. Coordinating these events is a challenge, and it is difficult to do in a way that does not result in stutter and glitches.
Moreover we want the ability to route audio around, not merely from one sound card to another but from one machine to another. Pulseaudio is the best attempt yet to resolve these issues. It works with both ALSA and OSS. It also provides per-application volume control, and this is the correct layer at which to do so IMHO. Right now almost all leading distros are implementing Pulseaudio, yet the most important level of integration has not taken place yet: the desktop. KDE via Phonon can use Pulseaudio, GNOME via gstreamer can use Pulseaudio.
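To make the coordination problem a bit more concrete, here is a minimal sketch of what software mixing with per-application volume boils down to. It is not how Pulseaudio, ALSA or OSS actually implement it; the function name, the application names and the assumption of signed 16-bit PCM are all made up for illustration, and a real mixer also has to deal with resampling, timing and buffering.

```python
from typing import Dict, List

# Assumption: all streams are signed 16-bit PCM at the same sample rate.
SAMPLE_MIN, SAMPLE_MAX = -32768, 32767


def mix_streams(streams: Dict[str, List[int]], volumes: Dict[str, float]) -> List[int]:
    """Mix per-application PCM buffers into a single output buffer.

    `streams` maps a (hypothetical) application name to its samples;
    `volumes` maps the same names to a gain factor, defaulting to 1.0.
    """
    length = max((len(samples) for samples in streams.values()), default=0)
    mixed = []
    for i in range(length):
        total = 0.0
        for app, samples in streams.items():
            if i < len(samples):  # shorter streams simply contribute silence
                total += samples[i] * volumes.get(app, 1.0)
        # Clamp so that loud sums saturate instead of wrapping around.
        mixed.append(max(SAMPLE_MIN, min(SAMPLE_MAX, int(round(total)))))
    return mixed


# Totem at half volume plus a short Pidgin notification at full volume.
result = mix_streams(
    {"totem": [1000, 2000], "pidgin": [500]},
    {"totem": 0.5},
)
print(result)
```

The point of doing this in one place, rather than in every application or in every driver, is that the per-application gain becomes a single table which a mixer GUI can edit.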
But there is still a lot of work to do to properly integrate this stuff. For example I want a GUI which enumerates, in a human-readable way, which audio hardware is available, i.e. my motherboard AC/97, my Soundblaster PCI, my Bluetooth headset and my USB sound device. I also would like to see the audio sinks/sources from the 4 other computers that I run on my local network. This GUI must work with DBUS and HAL so that it dynamically registers Bluetooth and USB devices. I only have 1 set of 5.1 speakers, which are physically connected to 1 machine. I also have a headset. Audio from most applications must go to the speakers, yet I want Skype and Teamspeak to go to my headset.
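The routing wish above (everything to the speakers, Skype and Teamspeak to the headset) can be sketched as a small lookup with a fallback. This is only an illustration under my own assumptions: the sink names, the `pick_sink` function and the idea of falling back to the default when a hot-plugged device is missing are invented here and are not taken from Pulseaudio.

```python
from typing import Dict, Set


def pick_sink(app: str, overrides: Dict[str, str], available: Set[str], default_sink: str) -> str:
    """Choose the output sink for an application.

    `overrides` maps lower-case application names to a preferred sink.
    If that sink is not currently available (e.g. the headset is
    unplugged), the application falls back to `default_sink`.
    """
    wanted = overrides.get(app.lower())
    if wanted is not None and wanted in available:
        return wanted
    return default_sink


# Hypothetical setup: one set of 5.1 speakers plus a headset.
overrides = {"skype": "headset", "teamspeak": "headset"}
available = {"speakers-5.1", "headset"}

print(pick_sink("Skype", overrides, available, "speakers-5.1"))
print(pick_sink("Totem", overrides, available, "speakers-5.1"))
```

A desktop GUI would then only need to edit the `overrides` table and refresh `available` whenever a Bluetooth or USB device appears or disappears.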
Then come the issues with down/up mixing. You can see how complicated this gets. OSS cannot address all of these issues, and neither can ALSA. Most apps need a very simple sound API for playback, but gstreamer is overkill for this purpose. Libsydney promises to provide such a high-level API; Phonon does this for KDE.
Now with all this complexity comes the necessity of shielding users and application writers from it. Some apps need only a high-level API (Phonon/libsydney), some need a more powerful API (gstreamer), some need direct ties to the kernel drivers (Jack applications; Jack has plugins for ALSA, OSS, gstreamer, and Pulseaudio).
We users need a mixer which is readable and deterministic, and the ability to do per-application tuning. We need to be able to pick and choose our sources/sinks both locally and on the network.
And we still haven’t addressed OpenAL/SDL. As much as I like OSS, we must convince software writers to depend as little as possible on the sound drivers, i.e. not to write for the OSS *or* the ALSA API unless it is absolutely necessary.
Right now there is no good way for OSS and ALSA to coexist. Maybe someday this will come, but if we go that way, we are demanding that app writers support both. If 90% of apps used layers above OSS/ALSA we might be able to get the last 10% of apps to properly support both OSS and ALSA. Totem should not care whether my soundcard needs OSS or ALSA drivers; likewise all of my music players, Flash and Skype/Teamspeak. These applications do not *need* specifics about my hardware to function properly.
Conversely, applications which are not directly dependent on OSS/ALSA are more likely to be fully cross-platform (Windows, OS X, *BSD, Solaris and Linux). The way that OSS 4.x comes now it is all or nothing: it provides its own hardware detection, has its own mixers and GUIs, etc. What is really needed, but unlikely to happen, is that someone would take each of the drivers supported by OSS and ALSA and rewrite them so that they are programmable with both APIs, and combine this with *a* hardware detection system based on HAL and one set of GUIs. Of course this is only a problem for Linux. And of course this is nothing but a pipe dream.
The Linux camp holds ALSA to be the victor. If you google around looking for *any* information on OSS 4.x for Linux you will find virtually nothing. It is really quite sad. ALSA has improved dramatically over the last years, yet it is the most user-hostile system ever written. The ALSA wiki is an utter abomination: it is outdated, inaccurate, incomplete, misleading and generally useless. The tools surrounding ALSA are horrid, the configuration files are a study in intense nightmares, the integration via gstreamer in GNOME still confronts us with techno-babble which no one can understand, the mixer names are not readable and the mixer functions are anything but deterministic, i.e. you click on one checkbox and 10 other things change with no ability to figure out what does what. I used OSS for a while last year and I felt as if I had entered the 21st century.
pulse audio a o k here, using hardy and some shitty realtek ac97 onboard sound. flash definitely fine with opera, dunno about wine… but that uses alsa drivers so a bunch of pa functionality is not present.
I think it’s an interesting concept, tho i don’t really fathom it all… if wiser heads than mine think it’s the way of the future then i am willing to sit tight and see how it all plays out.
I generally find PA to be quite neat, esp the network transparency. Vista and osx both have something similar to pa. I’m not saying just because everybody else does it we have to do it too, but the fact that others have done it well (i.e. coreaudio) implies that the effort is justified.
Also, as the post explains, the point of alsa is to be closer to hardware, whereas pa is meant to be a userspace daemon that incidentally is easier and more flexible to use than alsa (there are numerous complaints about the alsa api).
It is a sad situation. The Free software world has *a* graphical windowing system: Xorg. Graphical applications work across the board on Linux, Solaris, and *BSD. As much as OSS would like to be *the* solution, it is not and will not ever be. It may be *the* system for Solaris and perhaps the *BSDs, but it will always be runner-up to ALSA on Linux. This situation hurts the Free software world. We have no bargaining position with audio hardware manufacturers and effectively 0 support from the industry.
There are lots and lots of hackers working on software related to audio, but they are each working in their own little worlds and very little is being shared. We need a Freedesktop project to shepherd cross-platform sound and form a unified identity, to be able to have a bargaining position from which to demand specs from manufacturers. Perhaps this should be something like a meta-project, a place where people working in all of their own little silos can look over their shoulders to see what others are doing. Such a project could really help the Free software community in the long run.
Here’s how:
1) Together with xdg, specs could be established for 3rd-party (Skype/Teamspeak/Flash etc.) app writers developing apps which use audio for the free desktop: a) discourage use of APIs tied directly to the drivers, as a first step towards cross-platform compatibility; b) list advantages/disadvantages and problematic areas when using specific APIs; c) promote specific cross-platform capable APIs.
2) By promoting specific cross-platform APIs this project could also become a bargaining tool to help get hardware specs opened and published. This would mean interfacing with the industry, and this would give industry a place to voice their needs.
3) This project could also host hackers from the entire freedesktop world to work on making sure that the APIs which are available are available for all of the platforms, and where this is not possible, work together to find work-arounds which mitigate platform issues.
4) This project could serve as a mediator between the DEs, WMs and app writers and those hacking on various aspects of audio. This could really help in getting the necessary level of desktop integration and help facilitate the solution of those problems which users are likely to encounter in their desktop usage.
5) This project could choose “blessed” APIs and establish “best coding practices” to ensure that apps take advantage of what’s available, properly integrate themselves and work cross-platform.
This meta-project would need members from each of the separate projects: people from ALSA, OSS, Pulseaudio, Phonon, SDL, OpenAL but also people from Mplayer, Xine, GNOME, KDE, XFCE and people from the Linux kernel project as well as people from Solaris and the *BSDs.
Then a list of specific to-dos could be established and prioritized, for example:
1) apps written to ancient OSS API should be updated to 4.x
2) ALSA-OSS emulation software should be updated to newest ALSA and newest OSS.
3) OSS support for Pulseaudio and gstreamer should be updated to 4.x
4) Libsydney could be made into a freedesktop standard API (libsydney is still in the design phase, with not much real code yet, but it appears to be a cross-platform form of Pulseaudio*)
*) Pulseaudio is right now very closely tied to features of the Linux kernel and ALSA. Although a version of Pulseaudio is available for Windows, it is quite outdated. Pulseaudio seems to be heading in the direction of extreme integration (bluetooth audio, rt, HAL etc.) and thus is tightly coupled to Linux to the exclusion of every other platform. Libsydney will be API compatible with Pulseaudio but is designed to be cross-platform. Apparently libsydney will be a portable subset of Pulseaudio.
> > > “Actually, the goals for libsydneyaudio are at a lower level than
> > > gstreamer/helix/phonon as it is only for PCM audio. The idea is mainly
> > > to have a powerful, but easy to use cross-platform API for PCM audio
> > > capture and playback. So it would sit right on top of ALSA, pulseaudio,
> > > OSS, … and abstract away all hardware-related complexity (e.g. having
> > > to check whether a card supports XYZ on ALSA before using it).”
> >
libsydney is just an API for PCM. And it’s intended to become the only
supported PCM API for PulseAudio. That way if people want to code
against PA they get cross-platform support for free.
From these quotes it appears as if libsydney might be the kind of high-level API that Phonon is, or at least play a similar role: i.e. app writers simply write to the libsydney API, which in turn talks directly to the underlying levels, which could be Pulseaudio itself, or ALSA or OSS.
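As a rough illustration of that idea, here is a sketch of an application writing to one PCM interface while the backend is chosen underneath it. To be clear, this is not the libsydney API (which, as said, barely exists as code yet); every class and function name below is hypothetical, and the backends are fakes that only record what was written.

```python
from abc import ABC, abstractmethod
from typing import List, Sequence


class PcmBackend(ABC):
    """Hypothetical interface an output layer (Pulseaudio, ALSA, OSS) would implement."""

    name = "abstract"

    @abstractmethod
    def is_available(self) -> bool:
        """Report whether this backend can be used on the current machine."""

    @abstractmethod
    def write(self, samples: Sequence[int]) -> int:
        """Play PCM samples and return how many were accepted."""


class FakeBackend(PcmBackend):
    """Stand-in backend that records samples instead of touching real hardware."""

    def __init__(self, name: str, available: bool) -> None:
        self.name = name
        self._available = available
        self.written: List[int] = []

    def is_available(self) -> bool:
        return self._available

    def write(self, samples: Sequence[int]) -> int:
        self.written.extend(samples)
        return len(samples)


def open_playback(backends: Sequence[PcmBackend]) -> PcmBackend:
    """Return the first usable backend, in the caller's order of preference."""
    for backend in backends:
        if backend.is_available():
            return backend
    raise RuntimeError("no PCM backend available")


# The application only ever talks to the object returned by open_playback().
preferred = [
    FakeBackend("pulseaudio", available=False),
    FakeBackend("alsa", available=True),
    FakeBackend("oss", available=True),
]
out = open_playback(preferred)
accepted = out.write([0, 1200, -1200])
print(out.name, accepted)
```

The application code stays the same whether the machine ends up on Pulseaudio, ALSA or OSS, which is the whole attraction for app writers.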
Perhaps Pulseaudio can be best understood from this vantage point: Pulseaudio seems to be trying to replace the software mixing aspects of the sound card driver systems (OSS and ALSA). By virtue of its tight integration with Linux kernel specifics it is relegating OSS and ALSA to being mere sound card drivers, i.e. let the kernel drivers do their kernelspace work and let everything above kernelspace, i.e. userspace, be handled by Pulseaudio.
The real advances in OSS and ALSA in the last years have been in software mixing. Software mixing is not really portable, because in order to be performant it must utilize kernel specifics (real time, lock-free etc.). I could of course be wrong and Lennart Poettering might vehemently contradict me, but this is what I have gleaned.
A couple of final notes: I was depressed to find my own writings in Google when I searched for OSS, pulseaudio, and libsydney. This means the people working on this stuff are not writing much of anything, leaving people like me to try to synthesize and glean the ongoing developments, which is bound to result in many errors and misunderstandings. There has been 0 uptake of OSS 4.x in the Linux world, which is truly sad. Lennart Poettering responded to my request for better OSS support in Pulseaudio with this:
To be honest I don’t really care about OSS being open sourced or not. ALSA won and that’s not going to change any time soon. PA supports both ALSA and OSS. ALSA support in PA will get a lot more love than OSS support will. Why? Because everyone is going to be using ALSA and not OSS.
I cannot find any discussion between OSS devs and ALSA devs. None at all. The gstreamer guys have announced updated OSS support-but nothing more than a one line blurb.
I fear that OSS will only become relevant in Linux again if someone like Sun decides to purchase 4front tech and create a community around OSS. Why would this impact Linux? Because Sun would be pushing for cross-platform audio support. Totally unlikely, but the only real hope I see. Even though OSS is basically being shunned by the Linux community, there are probably far more OSS users running Linux than those running Solaris, the *BSDs etc. taken together, i.e. probably a couple of thousand. And we still have the legacy of apps written to use the ancient OSS API, much of which will never be re-written or updated to the new 4.x API. And ALSA is so seriously under-manned that it just hurts to see. Lennart Poettering would like Pulseaudio to be something like CoreAudio for Linux. But something needs to be done to get people rallying around improved cross-platform audio so that hardware vendors and proprietary software companies start properly supporting freedesktop audio architecture.
Sorry for going on and on. Audio on Linux has been a nightmare for 15 years now, the whole scene has been fragmented into camps for years, and if Solaris and the BSDs become more popular in the next 5 years we will end up where we were when Linux abandoned OSS 3 years ago.
Wow, perhaps some day Pulse will be able to do what JACK audio server has been able to do for years : P
I hope they get the flash bug fixed. I keep crashing out of firefox when I try to view flash movies. Same movies run fine on my CentOS with the same flash plugin.
Not only does the link fail, but Google’s supplied URL also fails when I search for “pulse-glitch-free.html”.
With Google’s cache I get on the same search for “pulse-glitch-free.html”
http://209.85.165.104/search?q=cache:gfAMRSLenXoJ:0pointer.de/blog/…
The problem is on your end, I think, since it works perfectly fine on all my machines here. The link is correct.
They should use git.
It came with my upgrade to Hardy, and it was just too bad.
Opening KDE4, Firefox and Skype at the same time just did not work, and CPU usage was getting high. It just seems like another implementation of a layer, much like arts for KDE3 was.
Just removed it and I’m happy with alsa-only again
I had a similar experience with Fedora 8. PulseAudio caused lags: hitting play/pause/stop in Amarok had a delay and audio/video in VLC had an offset. And when I switched to another login (no matter if X11 or text only) the sound of the currently playing music stopped. When I switched back I heard a crippled bit of the song at the position where I switched the first time and then the current position was played. IMHO this is all crap that no one needs. After removing PulseAudio all behaved like it should.
Well, but flash keeps crashing sometimes. But at least it doesn’t crash Firefox. Only the Flashplayer gets gray.
I don’t see how an additional layer of software makes everything suddenly glitchfree. If the device and/or driver is shitty, no amount of layers of code is going to fix that.
How glitchfree everything is can be noticed on Vista. The onboard HD codec, the various Soundblasters and the C-Media 8788 based cards I’ve tried all glitched under relatively low load spikes.
Edit: Citing this because Pulseaudio is also a userland mixer.
Edited 2008-04-09 18:41 UTC
I have had a terrible time getting sound – especially the microphone – working with games under Wine (generic, cedega or CXOffice), but since I upgraded to Hardy Heron Beta with PulseAudio, it works flawlessly. I can now chat while I frag in Half-life. Woohoo! YMMV
Using Fedora 8 and I had to kill PA as well. Still it does seem like a good idea to me once they get it stable, it is still relatively new after all.
I finally got my Creative X-Fi working on Linux with the OSS v4 drivers (which seem to use a similar design in some ways, userland daemon I believe). And I couldn’t be happier with it. Steam apps work great again (using ALSA gave me nothing, or worse, crashes). So if this is where PA is headed, I’m all for it.
Isn’t PA basically ESD redux? Isn’t this a solution we all decided sucked over 10 years ago?
Having read the summary of what this branch does I can say I’m impressed. My first reaction is: Get it into ALSA. In the Linux world we can’t agree on much, and even if I like what I hear I won’t ever agree to use PulseAudio. If this is as wonderful as it sounds, put it where it belongs: at the root of the audio stack, in the kernel.
If this offends low-latency buffs give them a compile- or run-time tunable to switch back to whatever behavior they like.
Edited 2008-04-10 00:23 UTC
Don’t get me wrong, I like the idea of a sound server. It looks nice on paper, but for now, since it is new, it is a pain. The real issue is added complexity. Adding another layer means more transactions to complete the goal of outputting audio. Basically all the sound application developers need to adjust their apps so they are aware of the sound server. This is really the main issue. Flash video crashes don’t happen on a Fedora 8 system with pulseaudio removed. Lag in totem also disappears with pulseaudio removed. Pulseaudio mostly works for me in Fedora 8. The Flash crashes and probably static noise when the volume hits zero in sound applications bother me the most. I think the sound server idea and abstraction of sound in/out is an excellent idea. It is just that the apps have not had time to catch up yet. It is similar to moving from a single-speed bike to a 12-speed: there is more chance of problems from the added factors, like switching gears. Complexity has a price. Hopefully Fedora 9 will pick up some of this “glitch-free”-ness. I will have to check out how the Ubuntu people have integrated it. Seems like their experience is more “glitch-free.”
Why use hyped PulseAudio or ALSA when you can use OSS with in-kernel live mixing of channels on ALL major UNIX and Linux variants like *BSD, Solaris, HP-UX, AIX, Tru64 and Linux?
While FreeBSD has its own great implementation of OSS drivers, OSS from OpenSound also works great.
You cannot argue about its license, because it’s available under ALL major open source licenses: BSD, CDDL, GPL2.
Why waste time on something totally useless like ALSA/PulseAudio when you have a READY, OPEN, FREE, COMPLETE and CROSS PLATFORM solution for ALL sound needs?