
Portaudio: Bindings for the Portaudio portable sound library

John Clements

 (require (planet clements/portaudio:3:=1))
This collection provides bindings to the cross-platform “Portaudio” library, capable of playing sound on Windows, OS X, and Linux.
This package includes the Portaudio dynamic libraries for Windows and Mac, where I believe that users would have trouble compiling and installing them. On Linux, users must install Portaudio themselves.
In addition, there is a small C library that provides interface code. This is provided in compiled form for all platforms.
These C libraries, like all other higher-level parts of this package, assume that all samples are represented as 16-bit signed integers and that there are exactly two channels of interleaved audio. Supporting any other format would require recompiling the callback code, or writing your own callbacks and calling the Portaudio functions directly.
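To make that format concrete, here’s a small sketch (not part of the package; the helper names are made up) that turns a list of mono samples in [-1.0, 1.0] into the two-channel interleaved s16vector layout, clamping out-of-range input so it can’t overflow 16 bits:

```racket
#lang racket
(require ffi/vector)

;; Scale a real sample in [-1.0, 1.0] to a 16-bit signed integer,
;; clamping out-of-range input instead of letting it overflow.
(define (real->s16/clamped x)
  (exact-round (* 32767 (max -1.0 (min 1.0 x)))))

;; Two interleaved channels: frame i occupies indices 2i (channel 0)
;; and 2i+1 (channel 1). Here both channels get the same sample.
(define (mono->stereo-s16vector samples)
  (define vec (make-s16vector (* 2 (length samples))))
  (for ([x (in-list samples)]
        [i (in-naturals)])
    (define s (real->s16/clamped x))
    (s16vector-set! vec (* 2 i) s)
    (s16vector-set! vec (add1 (* 2 i)) s))
  vec)

(define example
  (s16vector->list (mono->stereo-s16vector '(0.0 1.0 -2.0))))
;; example is '(0 0 32767 32767 -32767 -32767)
```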
The raw Portaudio functions are also provided directly. These are not documented here; instead, read "portaudio.rkt" to see the Portaudio header file, which turns out to be essentially the best existing documentation for the library. Furthermore, these functions are not at all safe: it’s definitely possible to crash Racket using them.
There are also three higher-level functions for playing sounds and streams: s16vec-play, stream-play, and stream-play/unsafe. The first two are safe. All three are documented below.
Platforms differ in their support for simultaneous streams. The Mac OS X CoreAudio API seems able to support hundreds of simultaneous streams, while the Windows APIs that I’ve tested don’t reliably support more than one. On platforms limited to a single stream, playing multiple sounds at once requires mixing them all into one stream. This approach also has the advantage of substantially lower latency. The RSound library provides support for this in the form of its play/s function.
My ability to test on different platforms is limited; I’m always eager to hear about successes and failures that people experience with different OS / Hardware combinations.
Cheers!

1 Using Windows, Choosing Host APIs

Using Portaudio on Windows raises a few extra challenges. In particular, Windows machines generally support several different "host APIs" that Portaudio can use to interact with the machine, and each host API may in turn target several different devices.

The default host API for Windows is MME. My observations suggest that this API is limited: it can open only a small number of simultaneous streams, and the latency for playing sounds is extremely high.

The WASAPI API (if that’s not redundant) has its own issues; in particular, it seems to be necessary to manually set the playback device to the right sample rate (for rsound, typically 44100Hz) before starting DrRacket. Failing to do so simply results in an "invalid device" error from Portaudio.

To address these issues, Portaudio includes a number of functions used to control the selection of the host API. Using these functions should not be necessary on Mac OS X.

procedure

(default-host-api) → symbol?

Returns the default API for the platform.

procedure

(all-host-apis) → (listof symbol?)

Returns a list of the APIs supported by the platform.

parameter

(host-api) → (or/c false? symbol?)

(host-api api) → void?
  api : (or/c false? symbol?)
Controls the choice of API made when opening a stream (including calls to s16vec-play and stream-play).

procedure

(find-output-device desired-latency)
  → exact-nonnegative-integer?
  desired-latency : number?
Given a latency, finds a device number that uses the current API and has the desired latency and two output channels.

procedure

(device-low-output-latency device-number) → number?
  device-number : exact-nonnegative-integer?
Given a device number, returns the "low output latency" associated with that device.

2 Playing Sounds

The first high-level interface copies the entire sound into a malloc’ed buffer and then plays it. This is relatively low-latency. On the other hand, copying the sound doubles the memory required for the sound itself, so it’s a bad idea to call this for really big sounds (> 100MB?).
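For a sense of scale: in this package’s format each frame takes 2 channels × 2 bytes = 4 bytes, so at 44100 Hz a sound costs 176,400 bytes per second, and 100MB corresponds to roughly 9.5 minutes of audio. Here’s a quick sketch of that arithmetic (the helper names are made up, not part of the package):

```racket
#lang racket

;; 2 channels × 2 bytes per 16-bit sample
(define bytes-per-frame 4)

;; Bytes needed to hold `seconds` of audio in this package's format.
(define (sound-bytes seconds sample-rate)
  (* bytes-per-frame (exact-round (* seconds sample-rate))))

;; s16vec-play copies the sound, so the footprint roughly doubles.
(define (bytes-while-copying seconds sample-rate)
  (* 2 (sound-bytes seconds sample-rate)))

;; How many seconds of audio fit in a given number of bytes.
(define (seconds-in-bytes budget sample-rate)
  (/ budget (* bytes-per-frame sample-rate)))

(define one-minute (sound-bytes 60 44100))                 ; 10584000 bytes
(define one-minute-copied (bytes-while-copying 60 44100))  ; 21168000 bytes
(define at-100mb
  (exact->inexact (seconds-in-bytes 100000000 44100)))     ; about 567 seconds
```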

procedure

(s16vec-play s16vec
             start-frame
             end-frame
             sample-rate) → (-> void?)
  s16vec : s16vector?
  start-frame : nat?
  end-frame : nat?
  sample-rate : nonnegative-real?
Given an s16vector containing interleaved 16-bit signed integer samples, plays the sound from start-frame to end-frame. Returns a thunk that can be used to halt the sound, if desired. Playback is asynchronous: control returns as soon as the sound has started playing.

This function signals an error if the start and end frames are out of order or do not lie within the vector.
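The precise rules aren’t spelled out here. One plausible reading of "ordered and legal", written as a check you could run yourself before calling s16vec-play, is sketched below. frame-range-ok? is a made-up name; treating end-frame as exclusive (so it may equal the vector’s frame count) is an assumption, suggested by the example below passing 88200 as end-frame for an 88200-frame vector. Whether the package accepts an empty range is also assumed.

```racket
#lang racket
(require ffi/vector)

;; Hypothetical check: frames are natural numbers, start-frame <= end-frame,
;; and end-frame is at most the number of frames in the vector
;; (two interleaved samples per frame).
(define (frame-range-ok? vec start-frame end-frame)
  (define total-frames (quotient (s16vector-length vec) 2))
  (and (exact-nonnegative-integer? start-frame)
       (exact-nonnegative-integer? end-frame)
       (<= start-frame end-frame total-frames)))

(define v (make-s16vector 10)) ; 5 frames
;; (frame-range-ok? v 0 5) => #t   whole sound
;; (frame-range-ok? v 3 2) => #f   out of order
;; (frame-range-ok? v 0 6) => #f   past the end
```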

Here’s an example of a short program that plays a sine wave at 426 Hz for 2 seconds:

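#lang racket
 
(require (planet clements/portaudio)
         ffi/vector)
 
(define pitch 426)                 ; Hz
(define sample-rate 44100.0)
(define seconds 2)
(define frames (* seconds 44100))  ; 88200 frames
(define tpisr (* 2 pi (/ 1.0 sample-rate)))
 
;; convert a real in [-1, 1] to a 16-bit signed sample
(define (real->s16 x)
  (inexact->exact (round (* 32767 x))))
 
;; two interleaved channels: frame t occupies indices 2t and 2t+1
(define vec (make-s16vector (* frames 2)))
(for ([t (in-range frames)])
  (define sample (real->s16 (* 0.2 (sin (* tpisr t pitch)))))
  (s16vector-set! vec (* 2 t) sample)
  (s16vector-set! vec (add1 (* 2 t)) sample))
 
;; returns immediately; call (stop) to halt playback early
(define stop (s16vec-play vec 0 frames sample-rate))
 
;; keep the program alive while the sound plays
;; (needed when running from the command line rather than DrRacket)
(sleep seconds)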

3 Playing Streams

procedure

(stream-play buffer-filler
             buffer-time
             sample-rate)
  → (list/c (-> real?) (-> (listof (list/c symbol? number?))) (-> void?))
  buffer-filler : (-> buffer-setter? nat? void?)
  buffer-time : nonnegative-real?
  sample-rate : nonnegative-real?
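Given a buffer-filling callback, a buffer time (in seconds), and a sample rate, starts playing a stream that uses the given callback to supply data. The buffer-filler receives two arguments: a setter procedure, which takes a sample index and a 16-bit sample value and stores the sample in the buffer, and the length of the buffer in frames. Since the audio is two-channel interleaved, frame i occupies sample indices 2i and 2i+1.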

The function returns a list containing three functions: one that queries the stream for a time in seconds, one that returns statistics about the stream, and a third that stops the stream.

This function is believed safe; it should not be possible to crash DrRacket by using this function badly (unless you exhaust memory by choosing an enormous buffer size).
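Because stream-play hands the buffer-filler a setter rather than the buffer itself, a filler can be exercised without any audio hardware by passing in a stand-in. Here’s a sketch of that; make-mock-setter and make-ramp-filler are hypothetical names, and the setter’s (index sample) argument order is inferred from the sine-wave example in this section rather than from a documented contract.

```racket
#lang racket
(require ffi/vector)

;; Hypothetical stand-in for the setter that stream-play supplies:
;; stores `sample` at interleaved sample index `index` of `buf`.
(define (make-mock-setter buf)
  (lambda (index sample)
    (s16vector-set! buf index sample)))

;; A filler with the same (setter frames) shape as the sine-wave example,
;; producing a simple ramp so the output is easy to check.
(define (make-ramp-filler)
  (define next-frame 0)                 ; running frame count, like base-frames
  (lambda (setter frames)
    (for ([i (in-range frames)])
      (define sample (+ next-frame i))
      (setter (* 2 i) sample)           ; channel 0 of frame i
      (setter (add1 (* 2 i)) sample))   ; channel 1 of frame i
    (set! next-frame (+ next-frame frames))))

(define buf (make-s16vector (* 2 4)))   ; room for 4 frames
(define setter (make-mock-setter buf))
(define filler (make-ramp-filler))
(filler setter 4)
(define first-fill (s16vector->list buf))   ; '(0 0 1 1 2 2 3 3)
(filler setter 4)
(define second-fill (s16vector->list buf))  ; '(4 4 5 5 6 6 7 7)
```

The closure keeps its own frame counter, which is what keeps the waveform continuous from one call to the next.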

Here’s an example of a program that uses stream-play to play a constant pitch of 426 Hz forever:

#lang racket
 
(require (planet clements/portaudio))
 
(define pitch 426)                 ; Hz
(define base-frames 0)             ; frames generated so far
(define sample-rate 44100.0)
(define tpisr (* 2 pi (/ 1.0 sample-rate)))
(define (real->s16 x)
  (inexact->exact (round (* 32767 x))))
 
;; called with a setter and the number of frames to fill;
;; frame i occupies sample indices 2i and 2i+1
(define (buffer-filler setter frames)
  (for ([i (in-range frames)]
        [f (in-range base-frames (+ base-frames frames))])
    (define sample
      (real->s16 (* 0.2 (sin (* tpisr f pitch)))))
    (setter (* i 2) sample)
    (setter (+ 1 (* i 2)) sample))
  ;; remember where we left off so the waveform stays continuous
  (set! base-frames (+ base-frames frames)))
 
;; 0.2-second buffer; call (stopper) to stop the stream
(match-define (list timer stats stopper)
              (stream-play buffer-filler 0.2 sample-rate))

Note that this example uses a long buffer of 0.2 seconds (= 200 milliseconds) so that most GC pauses won’t interrupt it.

However, a latency of 200ms is pretty terrible for an interactive system. I usually use 50ms, and just put up with the occasional miss in return for lower latency.
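Buffer times translate directly into frame counts, and comparing them with the 50-100ms GC pauses mentioned in section 4 shows why 200ms is comfortable while 50ms occasionally misses. A small sketch of that arithmetic (the helper names are made up, and survives-pause? is a simplification that assumes the buffer is full when the pause starts):

```racket
#lang racket

;; Frames needed to hold buffer-time seconds of audio.
(define (buffer-frames buffer-time sample-rate)
  (exact-round (* buffer-time sample-rate)))

;; Simplified model: a full buffer keeps the sound going through a
;; GC pause only if it holds at least that much audio.
(define (survives-pause? buffer-time pause-seconds)
  (>= buffer-time pause-seconds))

(define long-buffer (buffer-frames 0.2 44100.0))   ; 8820 frames
(define short-buffer (buffer-frames 0.05 44100.0)) ; 2205 frames
;; (survives-pause? 0.2 0.1)  => #t
;; (survives-pause? 0.05 0.1) => #f
```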

procedure

(stream-play/unsafe buffer-filler
                    buffer-time
                    sample-rate)
  → (list/c (-> real?) (-> void?))
  buffer-filler : (-> cpointer? int? void?)
  buffer-time : nonnegative-real?
  sample-rate : nonnegative-real?
Given a callback, a buffer time (in seconds), and a sample rate, starts playing a stream using the given callback to supply data.

The difference from stream-play is that this function’s callback is called with a cpointer rather than a set!-proxy. This saves the overhead of a function call and several checks per sample, but perhaps more importantly it allows the use of functions like memcpy and vector-add that can operate at much higher speeds (currently ~5x) than Racket’s vector operations. Unlike s16vec-play and stream-play, this function is not safe: writing outside the buffer through the pointer can crash Racket.
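To illustrate why a raw pointer helps, here’s a sketch of a callback in the (-> cpointer? int? void?) shape that bulk-copies pre-rendered audio with memcpy from ffi/unsafe and pads with silence via memset once the source runs out. Treating the int argument as a frame count (as with stream-play) is an assumption on my part, not something documented here; check the package source before relying on it. make-copying-filler is a made-up name, and the example drives the callback with a malloc’ed buffer instead of a real stream.

```racket
#lang racket
(require ffi/unsafe ffi/vector)

;; Hypothetical pre-rendered sound: 4 interleaved stereo frames.
(define source (list->s16vector '(10 -10 20 -20 30 -30 40 -40)))

;; Builds a callback of shape (-> cpointer? int? void?).
;; ASSUMPTION: the int is the number of frames to write (2 samples each).
(define (make-copying-filler src)
  (define total-frames (quotient (s16vector-length src) 2))
  (define pos 0)                        ; next source frame to copy
  (lambda (ptr frames)
    (define n (min frames (- total-frames pos)))
    ;; bulk copy n frames (2n samples) starting at source frame pos;
    ;; offsets and count are in units of _sint16
    (memcpy ptr 0 (s16vector->cpointer src) (* 2 pos) (* 2 n) _sint16)
    ;; fill whatever is left of the output with silence
    (memset ptr (* 2 n) 0 (* 2 (- frames n)) _sint16)
    (set! pos (+ pos n))))

;; Drive it with a raw 3-frame buffer instead of a real stream.
(define out (malloc (* 2 3 (ctype-sizeof _sint16)) 'raw))
(define filler (make-copying-filler source))
(filler out 3)
(define block1 (for/list ([i 6]) (ptr-ref out _sint16 i))) ; '(10 -10 20 -20 30 -30)
(filler out 3)
(define block2 (for/list ([i 6]) (ptr-ref out _sint16 i))) ; '(40 -40 0 0 0 0)
(free out)
```

Note that nothing here checks that `frames` fits the buffer behind `ptr`; that missing check is exactly what makes this style unsafe.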

4 A Note on Memory, Synchronization, and Concurrency

Note: the following is not organized to the high standards of a technical paper. The Management would like to apologize in advance, and humbly requests your forgiveness.

Interacting with sound libraries is tricky. The basic framework for this library is what’s called a "pull" architecture; the OS makes a call to a callback every 5-50ms[*], asking for new data to be shoveled into a given buffer. This callback runs on a separate OS thread, which means that Racket must somehow synchronize with this thread to provide data when needed.

One difficulty here is that Racket is garbage-collected, with GC pauses that typically run from 50ms to 100ms. This means that when a program is generating garbage, there are simply bound to be hiccoughs in a stream-based program. In general, these don’t seem to be too awful, and it’s often possible to write programs that generate very little garbage.

After trying several architectures, the model that seems to work the best is a shared-memory design, where the callback is written entirely in C, and takes its data from a buffer shared with Racket. If Racket has written the data into the buffer, then this routine copies it into the OS’s buffer. If not, then it just zeros out the buffer to play silence.

4.1 Copying Vs. Streaming

This package supports two different play interfaces: a "copying" interface and a "streaming" interface.

The copying interface is simple: Racket stuffs an entire sound into a buffer, then opens a new stream, providing a callback that pulls samples out of the buffer until it’s done. This means that the sound is not affected by GC pauses or Racket’s speed. On the other hand, it means duplicating the entire sound (expensive, for large sounds), and it requires a platform that can support multiple simultaneous streams (OS X, yes; Windows, usually no). It also tends to have higher startup latency (especially on Windows), because starting a new stream takes time. Finally, it requires pre-rendering the entire sound, which rules out interactivity.

The streaming interface solves these problems, but exposes more of the grotty stuff to the programmer. Rather than providing sound data, the user provides a Racket callback that can generate sound data on demand. If the given callback can’t keep up with the demand, the stream starts to hiccough.

More specifically, this package uses a ring buffer whose length can be specified independently of the underlying machine latency. The Portaudio engine calls the user’s Racket callback quite frequently (on the order of every 1-5ms) to top up this ring buffer. When GC pauses occur, the C callback drains everything left in the ring buffer and then just plays silence.

Choosing the length of this ring buffer is therefore difficult: too short, and you’ll hear frequent hiccoughs as the C callback runs out of data; too long, and you get high latency and sluggish response. Times on the order of 50ms seem to be an acceptable compromise.
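The ring-buffer behavior described above can be modeled in a few lines of plain Racket. This is a single-threaded sketch, not the package’s implementation: the real consumer is C code on a separate OS thread, which needs synchronization this model leaves out, and all names here are made up. ring-top-up! plays the role of the Racket callback; ring-drain! plays the role of the C callback, including the switch to silence when the ring runs dry.

```racket
#lang racket
(require ffi/vector)

;; Ring of interleaved stereo s16 frames. `read` and `written` are running
;; frame counts; positions in `data` are taken modulo `capacity`.
(struct ring (data capacity [read #:mutable] [written #:mutable]))

(define (make-ring capacity)
  (ring (make-s16vector (* 2 capacity)) capacity 0 0))

(define (ring-available r) (- (ring-written r) (ring-read r)))
(define (ring-space r) (- (ring-capacity r) (ring-available r)))

;; Index of the channel-0 sample of running frame f.
(define (slot r f) (* 2 (modulo f (ring-capacity r))))

;; Producer (the Racket side): fill all free space, computing each
;; frame's sample with frame->sample; returns frames written.
(define (ring-top-up! r frame->sample)
  (define n (ring-space r))
  (for ([f (in-range (ring-written r) (+ (ring-written r) n))])
    (define s (frame->sample f))
    (s16vector-set! (ring-data r) (slot r f) s)
    (s16vector-set! (ring-data r) (add1 (slot r f)) s))
  (set-ring-written! r (+ (ring-written r) n))
  n)

;; Consumer (standing in for the C callback): copy `frames` frames into
;; `out`, writing zeros (silence) once the ring runs dry.
;; Returns the number of real frames delivered.
(define (ring-drain! r out frames)
  (define n (min frames (ring-available r)))
  (for ([k (in-range frames)])
    (define f (+ (ring-read r) k))
    (define-values (s0 s1)
      (if (< k n)
          (values (s16vector-ref (ring-data r) (slot r f))
                  (s16vector-ref (ring-data r) (add1 (slot r f))))
          (values 0 0)))
    (s16vector-set! out (* 2 k) s0)
    (s16vector-set! out (add1 (* 2 k)) s1))
  (set-ring-read! r (+ (ring-read r) n))
  n)

;; A 4-frame ring drained 3 frames at a time.
(define r (make-ring 4))
(define out (make-s16vector 6))
(define filled (ring-top-up! r add1))  ; frame f gets sample f+1
(define got1 (ring-drain! r out 3))
(define block1 (s16vector->list out))  ; '(1 1 2 2 3 3)
;; no top-up here: pretend a GC pause kept Racket busy
(define got2 (ring-drain! r out 3))    ; only 1 real frame left
(define block2 (s16vector->list out))  ; '(4 4 0 0 0 0)
```

Making the ring longer means more frames of slack before the zeros kick in, and also more frames between "generated" and "heard", which is the latency trade-off in a nutshell.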

4.2 Memory

Shared memory management is a big pain. Racket is garbage-collected, but it’s interacting with an audio library that is not. It’s nearly impossible to avoid all possible race conditions related to the freeing of memory.

The first and largest issue is the block of memory shared between the Racket engine and the C callback. The current setup is that the memory is freed by a close-stream callback associated with the stream on the Portaudio side. The sequence is therefore this: Racket calls CloseStream. Portaudio then stops calling the callback, and closes the stream. Then, it calls the provided "all-done" callback, which frees the memory. One note here is that Racket should probably wrap the pointer in a mutable object so that it can be severed on the Racket side when the stream is closed. Actually, that’s true of the stream, as well.
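Here’s a sketch of the "mutable wrapper" idea from the previous paragraph. It is hypothetical (the text only says Racket should probably do this), and the names are made up. The Racket side clears its reference before the memory is freed, so a late access raises an ordinary Racket error instead of touching freed memory. It does nothing about races with the C side.

```racket
#lang racket
(require ffi/unsafe)

;; Hypothetical wrapper around the buffer shared with the C callback;
;; the pointer lives in a mutable field so Racket can sever it.
(struct shared-buffer ([ptr #:mutable]))

;; Every Racket-side access goes through this check.
(define (shared-buffer-ptr/checked sb)
  (or (shared-buffer-ptr sb)
      (error 'shared-buffer "stream is closed; buffer no longer available")))

;; Run when the stream is closed, before the memory is freed.
(define (sever! sb)
  (set-shared-buffer-ptr! sb #f))

;; malloc/free stand in for the real allocation and the "all-done" callback.
(define raw (malloc 16 'raw))           ; room for 8 16-bit samples
(define sb (shared-buffer raw))
(ptr-set! (shared-buffer-ptr/checked sb) _sint16 0 42)
(define before (ptr-ref (shared-buffer-ptr/checked sb) _sint16 0)) ; 42
(sever! sb)
(free raw)
;; from here on, (shared-buffer-ptr/checked sb) raises an exn:fail error
```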

[*] Platforms differ; currently, this package insists on a latency of at most 50ms, or it just refuses to run. It appears that all modern platforms can provide this, though it’s sometimes a bit tricky to decide which output device to use.