Equipment Standard – Audio latency
One of the first issues I wanted to look into when I started working on my master's thesis was audio latency.
Audio latency is the delay between when an audio signal enters and when it emerges from a system. Potential contributors to latency in an audio system include analog-to-digital conversion, buffering, digital signal processing, transmission time, digital-to-analog conversion and the speed of sound in air.
The reason I wanted to look into this immediately is that, as a drummer who is quite conscious about timing, I felt this would be an essential aspect.
To elaborate further:
When playing with reverbs and other effects that last for a certain amount of time, I don't feel that a few milliseconds back or forth are a disadvantage; it could actually be an interesting musical aspect.
But when it comes to triggering sounds, delays, looping, gating and other effects that have to react fast, audio latency is relevant.
When I started testing this, I was very fixated on finding the sound card with the lowest latency, and I thought that when I did, it would revolutionize my setup.
With my current setup, I have to live with a total latency of about 8 milliseconds. What I gathered from my research (as shown below) is that I could obtain a latency of about 2 milliseconds with a certain, quite affordable setup. That's a difference of 6 milliseconds; let's put that into context.
In dry air at 20 °C, the speed of sound is 343.2 meters per second. That is 343.2/1000 = 0.3432 meters per millisecond, or roughly one meter every 3 milliseconds. So the difference in audio latency between the two sound cards corresponds to about two meters of distance to a sound source. Many guitarists have at least that distance to their amplifier. The distance from my drums to my ears could be close to a meter; that's about 3 milliseconds. A drummer taller than me would have even more latency. From these reflections, I can't conclude anything other than that it's all about habituation.
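For reference, the numbers above can be checked with a quick calculation (a minimal sketch in Python, using the same figures as in this post):

```python
SPEED_OF_SOUND = 343.2  # m/s in dry air at 20 degrees C

def latency_to_distance(latency_ms):
    """Distance (in meters) that sound travels in latency_ms milliseconds."""
    return SPEED_OF_SOUND * latency_ms / 1000.0

print(latency_to_distance(6))  # ~2.06 m: the 6 ms difference between the two setups
print(latency_to_distance(3))  # ~1.03 m: roughly the distance from my drums to my ears
```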
So, to a certain extent one can get used to a fair amount of latency: you learn to press "loop" a little bit before you actually want the loop to start, and you make sure you push the beat just a little bit when playing with delays or triggering samples. This definitely doesn't have to be conscious; by practicing and listening, one adapts to it quite quickly.
But compressors, gates and similar effects still demand low latency to operate at their best. When compressing drums, I don't think you would want to add another 8 milliseconds before the compressor actually kicks in. That's why I completed the following test.
First, a brief run-through of the parameters needed to control the latency.
– Sampling rate:
When analog audio is sent into the computer, it goes into an ADC (analog-to-digital converter), which takes X samples of the audio per second. X, defined in hertz, is called the sampling rate and is usually 44,100 or above if you want to listen to Swedish-American engineer Harry Nyquist, whose theorem is described on Wikipedia as follows:
In theory, a Nyquist frequency just larger than the signal bandwidth is sufficient to allow perfect reconstruction of the signal from the samples: see Sampling theorem: Critical frequency. However, this reconstruction requires an ideal filter that passes some frequencies unchanged while suppressing all others completely (commonly called a brick-wall filter). In practice, perfect reconstruction is unattainable. Some amount of aliasing is unavoidable.
Signal frequencies higher than the Nyquist frequency will encounter a “folding” about the Nyquist frequency, back into lower frequencies. For example, if the sample rate is 20 kHz, the Nyquist frequency is 10 kHz, and an 11 kHz signal will fold, or alias, to 9 kHz. However, a 9 kHz signal can also fold up to 11 kHz in that case if the reconstruction filter is not adequate. Both types of aliasing can be important.
When attainable filters are used, some degree of oversampling is necessary to accommodate the practical constraints on anti-aliasing filters: instead of a brickwall, one has flat response in the passband up to a point called the cutoff frequency or corner frequency, (pass all frequencies below there unchanged), then gradual rolloff in a transition band, finally suppressing signals above a certain point completely or almost completely in the stopband. Thus, frequencies close to the Nyquist frequency may be distorted in the sampling and reconstruction process, so the bandwidth should be kept below the Nyquist frequency by some margin (frequency headroom) that depends on the actual filters used.
For example, audio CDs have a sampling frequency of 44100 Hz. The Nyquist frequency is therefore 22050 Hz, which is an upper bound on the highest frequency the data can unambiguously represent. If the chosen anti-aliasing filter (a low-pass filter in this case) has a transition band of 2000 Hz, then the cut-off frequency should be no higher than 20050 Hz to yield a signal with negligible power at frequencies of 22050 Hz and greater.
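To make the folding described above concrete, here is a small sketch of my own (not from the Wikipedia article) that computes where a given input frequency lands after sampling:

```python
def alias_frequency(f_in, sample_rate):
    """Frequency an input tone appears at after sampling,
    folding about the Nyquist frequency (sample_rate / 2)."""
    f = f_in % sample_rate                    # everything repeats every sample_rate Hz
    return f if f <= sample_rate / 2 else sample_rate - f

print(alias_frequency(11_000, 20_000))  # 9000: the 11 kHz tone from the example folds to 9 kHz
print(alias_frequency(9_000, 20_000))   # 9000: indistinguishable from a real 9 kHz tone
```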
– Buffer size:
A buffer in this context can be explained as how much time the computer gets for "processing" before we have to see or hear something happening, and it is defined in samples. So, if the buffer size is 512 samples and the sampling rate is 44,100 Hz, the buffer adds an additional delay of 512/44100 ≈ 0.0116 seconds ≈ 12 milliseconds. The computer does need a buffer; how low you can go regarding buffer size depends on the computer's CPU, RAM and HDD speed, but a really important factor is also which audio interface (sound card) you are using. This is because all the manufacturers write their own drivers and use different ADCs (analog-to-digital converters) and DACs (digital-to-analog converters), which are all major contributors to latency.
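The arithmetic is simple enough to put in a couple of lines (a sketch using the same numbers as above):

```python
def buffer_latency_ms(buffer_size_samples, sample_rate_hz):
    """Added delay of one buffer, in milliseconds."""
    return buffer_size_samples / sample_rate_hz * 1000.0

print(buffer_latency_ms(512, 44_100))  # ~11.6 ms, i.e. roughly the 12 ms mentioned above
print(buffer_latency_ms(64, 44_100))   # ~1.45 ms per buffer at a more aggressive setting
```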
The complete audio monitoring latency of a configuration, often referred to as the roundtrip latency, is the sum of the following (a small worked example follows these two lists):
- AD converter latency
- Input portion of the I/O buffer
- Audio driver latency
- Output portion of the I/O buffer
- DA converter latency
When just doing playback, of course you don't have to go through all of these, and the latency is the sum of:
- Audio driver latency
- Output portion of the I/O Buffer
- DA converter latency
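As a sketch of how these components add up, here is the same sum in code; the converter and driver figures are placeholders I made up for illustration, not measured values for any of the interfaces below:

```python
SAMPLE_RATE = 44_100
BUFFER_SIZE = 64  # samples

# Hypothetical figures, for illustration only (real values depend on the interface and driver).
ad_converter_ms = 0.5
da_converter_ms = 0.5
driver_ms = 1.0
buffer_ms = BUFFER_SIZE / SAMPLE_RATE * 1000.0  # one I/O buffer's worth of delay, each way

roundtrip_ms = ad_converter_ms + buffer_ms + driver_ms + buffer_ms + da_converter_ms
playback_ms = driver_ms + buffer_ms + da_converter_ms

print(f"roundtrip: {roundtrip_ms:.2f} ms, playback only: {playback_ms:.2f} ms")
```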
Now, let’s move on to the actual test:
These were the candidates:
Focusrite Pro40 (FW)
RME Babyface (USB)
RME Fireface 800 (FW)
Lynx Aurora 8 (FW) – didn't get this to communicate properly with OS X 10.7.1, or I had the wrong breakout cable.
Focusrite ISA 828 (ADAT)
I tested the I/O latency by running a cable from the output to the input and measuring the delay with a program called Max/MSP.
I used a MacBook Pro 6,2 (15″ mid 2010), 2.66 GHz Intel Core i7, 4 GB 1067 MHz DDR3 RAM, running Mac OS X Lion 10.7.1 (11B26).
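The measurement itself was done with a Max/MSP patch, but the loopback idea can be sketched in a few lines of Python. This is my own approximation using the sounddevice and numpy libraries, not the patch I actually used; it plays a click out of the interface, records it back through the loopback cable, and finds the delay to the recorded peak:

```python
import numpy as np
import sounddevice as sd

SAMPLE_RATE = 44_100

# A short click (impulse) padded with silence, played out and recorded back
# through a physical cable from the interface's output to its input.
signal = np.zeros(SAMPLE_RATE // 2, dtype=np.float32)
signal[0] = 1.0

recording = sd.playrec(signal, samplerate=SAMPLE_RATE, channels=1, blocking=True)
recording = recording.flatten()

# The index of the recorded peak is the round-trip latency in samples.
latency_samples = int(np.argmax(np.abs(recording)))
print(f"{latency_samples} samples = {latency_samples / SAMPLE_RATE * 1000:.3f} ms")
```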
I/O vector size and signal vector size, as described in the Max/MSP manual:
- The I/O Vector Size (I/O stands for input/output) controls the number of samples that are transferred to and from the audio interface at one time.
- The Signal Vector Size sets the number of samples that are calculated by MSP objects at one time. This can be less than or equal to the I/O Vector Size, but not more. If the Signal Vector Size is less than the I/O Vector Size, MSP calculates two or more signal vectors in succession for each I/O vector that needs to be calculated.
- With an I/O vector size of 256, and a sampling rate of 44.1 kHz, MSP calculates about 5.8 milliseconds of audio data at a time.
The I/O Vector Size may have an effect on latency and overall performance. A smaller vector size may reduce the inherent delay between audio input and audio output, because MSP has to perform calculations for a smaller chunk of time. On the other hand, there is an additional computational burden each time MSP prepares to calculate another vector (the next chunk of audio), so it is easier over-all for the processor to compute a larger vector.
However, there is another side to this story. When MSP calculates a vector of audio, it does so in what is known as an interrupt. If MSP is running on your computer, whatever you happen to be doing (word processing, for example) is interrupted and an I/O vector’s worth of audio is calculated and played. Then the computer returns to its normally scheduled program. If the vector size is large enough, the computer may get a bit behind and the audio output may start to click because the processing took longer than the computer expected. Reducing the I/O Vector Size may solve this problem, or it may not. On the other hand, if you try to generate too many interrupts, the computer will slow down trying to process them (saving what you are doing and starting another task is hard work). Therefore, you’ll typically find the smaller I/O Vector Sizes consume a greater percentage of the computer’s resources.
Optimizing the performance of any particular signal network when you are close to the limit of your CPU’s capability is a trial-and-error process. That’s why MSP provides you with a choice of vector sizes.
—–
Babyface (USB), 44.1 kHz, Line (XLR):
I/O vector size: 16, Signal vector size: 16 = 171 samples = 3.877551 ms
I/O vector size: 32, Signal vector size: 32 = 203 samples = 4.603175 ms
I/O vector size: 64, Signal vector size: 64 = 267 samples = 6.054422 ms
I/O vector size: 1024, Signal vector size: 1024 = 2187 samples = 49.591835 ms
Babyface (USB) + ISA 828, 44.1 kHz, Line (XLR):
I/O vector size: 16, Signal vector size: 16 = 170 samples = 3.854875 ms
I/O vector size: 32, Signal vector size: 32 = 202 samples = 4.580499 ms
I/O vector size: 64, Signal vector size: 64 = 266 samples = 6.031746 ms
I/O vector size: 1024, Signal vector size: 1024 = 2186 samples = 49.56916 ms
Babyface (USB), 48 kHz, Line (XLR):
I/O vector size: 16, Signal vector size: 16 = 171 samples = 3.5625 ms
I/O vector size: 32, Signal vector size: 32 = 203 samples = 4.229167 ms
I/O vector size: 64, Signal vector size: 64 = 267 samples = 5.5625 ms
I/O vector size: 1024, Signal vector size: 1024 = 2187 samples = 45.5625 ms
Babyface (USB) + ISA 828, 48 kHz, Line (XLR):
I/O vector size: 16, Signal vector size: 16 = 170 samples = 3.541667 ms
I/O vector size: 32, Signal vector size: 32 = 202 samples = 4.208333 ms
I/O vector size: 64, Signal vector size: 64 = 266 samples = 5.541667 ms
I/O vector size: 1024, Signal vector size: 1024 = 2186 samples = 45.541668 ms
Babyface (USB), 96 kHz, Line (XLR):
I/O vector size: 16, Signal vector size: 16 = 140 samples = 1.458333 ms
I/O vector size: 32, Signal vector size: 32 = 263 samples = 2.739583 ms
I/O vector size: 64, Signal vector size: 64 = 327 samples = 3.40625 ms
I/O vector size: 1024, Signal vector size: 1024 = 2247 samples = 23.40625 ms
Babyface (USB) + ISA 828, 96 kHz, Line (XLR):
I/O vector size: 16, Signal vector size: 16 = 140 samples = 1.458333 ms
I/O vector size: 32, Signal vector size: 32 = 263 samples = 2.739583 ms
I/O vector size: 64, Signal vector size: 64 = 327 samples = 3.40625 ms
I/O vector size: 1024, Signal vector size: 1024 = 2247 samples = 23.40625 ms
Babyface (USB), 192 kHz, Line (XLR):
I/O vector size: 16, Signal vector size: 16 = 116 samples = 0.604167 ms OUTPUT HAS POLARITY REVERSED
I/O vector size: 32, Signal vector size: 32 = 228 samples = 1.1875 ms OUTPUT HAS POLARITY REVERSED
I/O vector size: 64, Signal vector size: 64 = 422 samples = 2.197917 ms OUTPUT HAS POLARITY REVERSED
I/O vector size: 1024, Signal vector size: 1024 = 2342 samples = 12.197917 ms OUTPUT HAS POLARITY REVERSED
Fireface 800 (FW400 -> 800), 44.1 kHz, Line:
I/O vector size: 16, Signal vector size: 16 = 219 samples = 4.965986 ms
I/O vector size: 32, Signal vector size: 32 = 251 samples = 5.69161 ms
I/O vector size: 64, Signal vector size: 64 = 315 samples = 7.142857 ms
I/O vector size: 2048, Signal vector size: 2048 = 4283 samples = 97.120178 ms
Fireface 800 (FW400 -> 800) + ISA 828, 44.1 kHz, Line:
I/O vector size: 16, Signal vector size: 16 = 213 samples = 4.83 ms OUTPUT HAS POLARITY REVERSED
I/O vector size: 32, Signal vector size: 32 = 245 samples = 5.555555 ms OUTPUT HAS POLARITY REVERSED
I/O vector size: 64, Signal vector size: 64 = 309 samples = 7.006803 ms OUTPUT HAS POLARITY REVERSED
I/O vector size: 2048, Signal vector size: 2048 = 4277 samples = 96.984123 ms OUTPUT HAS POLARITY REVERSED
Fireface 800 (FW400 -> 800), 48 kHz, Line:
I/O vector size: 16, Signal vector size: 16 = 219 samples = 4.5625 ms
I/O vector size: 32, Signal vector size: 32 = 251 samples = 5.229167 ms
I/O vector size: 64, Signal vector size: 64 = 315 samples = 6.5625 ms
I/O vector size: 2048, Signal vector size: 2048 = 4284 samples = 89.25 ms
Fireface 800 (FW400 -> 800) + ISA 828, 48 kHz, Line:
I/O vector size: 16, Signal vector size: 16 = 213 samples = 4.4375 ms OUTPUT HAS POLARITY REVERSED
I/O vector size: 32, Signal vector size: 32 = 245 samples = 5.104167 ms OUTPUT HAS POLARITY REVERSED
I/O vector size: 64, Signal vector size: 64 = 309 samples = 6.4375 ms OUTPUT HAS POLARITY REVERSED
I/O vector size: 2048, Signal vector size: 2048 = 4277 samples = 89.104164 ms OUTPUT HAS POLARITY REVERSED
Fireface 800 (FW400 -> 800), 96 kHz, Line:
I/O vector size: 16, Signal vector size: 16 = 188 samples = 1.958333 ms OUTPUT HAS POLARITY REVERSED
I/O vector size: 32, Signal vector size: 32 = 348 samples = 3.625 ms
I/O vector size: 64, Signal vector size: 64 = 412 samples = 4.291667 ms
I/O vector size: 2048, Signal vector size: 2048 = 4380 samples = 45.625 ms
Fireface 800 (FW400 -> 800) + ISA 828, 96 kHz, Line:
I/O vector size: 16, Signal vector size: 16 = 189 samples = 1.96875 ms
I/O vector size: 32, Signal vector size: 32 = 347 samples = 3.614583 ms
I/O vector size: 64, Signal vector size: 64 = 411 samples = 4.28125 ms
I/O vector size: 2048, Signal vector size: 2048 = 4379 samples = 45.614582 ms
Fireface 800 (FW400 -> 800), 192 kHz, Line:
I/O vector size: 16, Signal vector size: 16 = 166 samples = 0.864583 ms
I/O vector size: 32, Signal vector size: 32 = 326 samples = 1.697917 ms
I/O vector size: 64, Signal vector size: 64 = 598 samples = 3.114583 ms
I/O vector size: 2048, Signal vector size: 2048 = 4566 samples = 23.78125 ms
Fireface 800 (FW400 -> 800) + ISA 828, 192 kHz, Line:
I/O vector size: 16, Signal vector size: 16 = 166 samples = 0.864583 ms
I/O vector size: 32, Signal vector size: 32 = 326 samples = 1.697917 ms
I/O vector size: 64, Signal vector size: 64 = 598 samples = 3.114583 ms
I/O vector size: 2048, Signal vector size: 2048 = 4566 samples = 23.78125 ms
Pro40, 44.1 kHz, Line:
I/O vector size: 16, Signal vector size: 16 = 5.827664 ms
I/O vector size: 32, Signal vector size: 32 = 6.55328 ms
I/O vector size: 64, Signal vector size: 64 = 8.004535 ms
I/O vector size: 2048, Signal vector size: 2048 = 97.981857 ms
Pro40 + ISA 828, 44.1 kHz, Line:
I/O vector size: 16, Signal vector size: 16 = 6.485261 ms
I/O vector size: 32, Signal vector size: 32 = 7.210885 ms
I/O vector size: 64, Signal vector size: 64 = 8.662131 ms
I/O vector size: 2048, Signal vector size: 2048 = 98.639458 ms
Pro40, 48 kHz, Line:
I/O vector size: 16, Signal vector size: 16 = 5.354167 ms
I/O vector size: 32, Signal vector size: 32 = 6.020833 ms
I/O vector size: 64, Signal vector size: 64 = 7.354167 ms
I/O vector size: 2048, Signal vector size: 2048 = 90.020836 ms
Pro40 + ISA 828, 48 kHz, Line:
I/O vector size: 16, Signal vector size: 16 = 5.958333 ms
I/O vector size: 32, Signal vector size: 32 = 6.625 ms
I/O vector size: 64, Signal vector size: 64 = 7.958333 ms
I/O vector size: 2048, Signal vector size: 2048 = 90.625 ms
Pro40, 96 kHz, Line:
I/O vector size: 32, Signal vector size: 32 = 5.541667 ms
I/O vector size: 64, Signal vector size: 64 = 6.208333 ms
I/O vector size: 128, Signal vector size: 128 = 7.541667 ms
I/O vector size: 2048, Signal vector size: 2048 = 47.541668 ms
Pro40 + ISA 828, 96 kHz, Line:
I/O vector size: 32, Signal vector size: 32 = 5.916667 ms
I/O vector size: 64, Signal vector size: 64 = 6.583333 ms
I/O vector size: 128, Signal vector size: 128 = 7.916667 ms
I/O vector size: 2048, Signal vector size: 2048 = 47.916668 ms
—–
Note that hardware with a very slow transient response may decrease the accuracy of the measurement.
When I connected the ADAT cable while the ISA was set to 192 kHz and then turned it on, the whole system crashed …
Here are some graphs (x = buffer size, y = latency in milliseconds):
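If you want to recreate such a graph from the tables above, a minimal matplotlib sketch could look like this (here using the Babyface 44.1 kHz line-input numbers):

```python
import matplotlib.pyplot as plt

# Babyface (USB), 44.1 kHz, line input: (I/O vector size, measured round-trip latency in ms)
buffer_sizes = [16, 32, 64, 1024]
latency_ms = [3.877551, 4.603175, 6.054422, 49.591835]

plt.plot(buffer_sizes, latency_ms, marker="o")
plt.xlabel("buffer size (samples)")
plt.ylabel("latency (ms)")
plt.title("Babyface (USB), 44.1 kHz, line input")
plt.show()
```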
—–
One parameter I didn't realize until later that I should have measured is how well different sound cards handle low latency, also called LLP (Low Latency Performance). I didn't record any numbers for this, as I didn't know how to, but I experienced that the RME Babyface handled low buffer sizes quite well compared to the other cards. I used the "test section" in Ableton Live, which simulates a certain CPU usage and lets you listen to a test tone to hear if there are any drop-outs.
I also experienced that I could increase the sampling rate, thereby lowering the latency (which is defined by the number of samples in the buffer), and still get good performance on the test tone. Any thoughts on sampling rate vs. buffer size and how this affects the performance of the system? Why choose a low sampling rate and a low buffer size instead of a high sampling rate and a high buffer size, if they result in the same amount of latency? To get an answer to this and to the concept of LLP, I had to seek help amongst other nerds at gearslutz.com, and I stumbled upon this forum thread where they answered my question on buffer size vs. sampling rate:
“TAFKAT”:
Raising the sample rate not only effects the latency value but also increases the overhead in DSP processing , throughput and data file sizes so its a balancing act. For those needing to work at the higher sample rates the usual practice is to raise the buffer sizes accordingly to compensate for the increase in processing overhead. There is no real advantage to increase the sample rate simply to lower the latency value if the higher sample rates are not specifically required for the project IMO.
“tuRnitUpsuM”:
(…) Higher sample rates do not equal lower latency in the bigger picture when you (action <> reaction) have to raise buffer values to compensate for increased DSP data overflow. Data size increases, time value increases to process the change. Whats gained on the front end – is lost on the back end.
My response:
Are these relations linear? From the numbers I ended up with, I got the impression that they're not. Would there be any reason to measure the sound cards' LLP at different sample rates and see if the performance/sample ratio is the same?
For example: would a setup running at 44.1 kHz with a buffer size of 16 samples perform exactly the same as a setup running at 88.2 kHz with a buffer size of 32 samples on the same computer?
“Timur Born”:
There is a linear correlation and a non-linear one.
Linear: Double the sample-rate at double the audio buffer size equals exactly the same audio buffer latency.
Linear: Double the sample-rate equals exactly double the data/bandwidth the CPU + RAM + HD need to process, which at double the audio buffer size equals about half the number of tracks/fx being usable.
Non-linear: AD/DA conversion is faster (=lower latency) with higher sample-rates. But we are talking about a maximum of 2 ms here, usually less.
Non-linear: Your computer (CPU + RAM + HD) may not be able to deal with the number of “theoretically” possible tracks/fx, so either you may not get double the tracks/fx at half the sample-rate or may not get half the number of tracks/fx at double the sample-rate.
“TAFKAT”:
I don’t believe it will remain linear with the number of Plugins/Polyphony when using a higher sample rate at double the buffer to maintain the same latency, as the system still has to deal with the higher processing associated with the higher sample rate.
It may be interesting to get an empirical value to the exact scaling variable , I have some 96K alternate sessions of DAWbench DSP that I will give a run up when I get some time and see what it tips up.
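To make the linear part of that relationship concrete with the numbers from my question above (just a quick check, nothing more):

```python
def buffer_latency_ms(buffer_size, sample_rate):
    """Delay of one buffer, in milliseconds."""
    return buffer_size / sample_rate * 1000.0

# Same buffer latency either way, but the second setup pushes twice the data per second.
print(buffer_latency_ms(16, 44_100))  # ~0.363 ms
print(buffer_latency_ms(32, 88_200))  # ~0.363 ms
```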
They have also developed quite an extensive database of the LLP of different audio interfaces, using a benchmark program called DAWbench DSP to measure it. Keep in mind that some of these cards are high-speed PCI cards (which will soon be available to laptops through Thunderbolt expansion bays) and not USB or FW cards:
I made a more graphical presentation of the mentioned LLP rating values; the numbers are all expressed relative to the best card, and higher is better:
When researching latency beyond this point, it starts to get quite technical, and the more I learn, the more I realize I don't know. I feel that, for now, I know enough about the subject to understand the latency and DSP performance of my current setup. I've learned that latency is not as relevant as I originally feared, and that there is more to the concept of latency and DSP than I thought.