What is the format of the data in the AudioBuffer memory buffer, and how do I convert it to something else?

Raymond Chen

The Windows Runtime AudioBuffer class represents a buffer of audio data. What is the format of this data?

The memory buffer you obtain from the AudioBuffer object takes the form of an array of audio samples. Each audio sample is a collection of IEEE single-precision floating point numbers which represent a linear range of waveform amplitude from −1.0 to +1.0.

In C#, this floating point format is known as System.Single. In C++, it is typically represented by float.

Each sample contains one value for each channel, and the channels come in a specified order, described in the documentation for the WAVEFORMATEXTENSIBLE structure.

For example, suppose you have a dual-channel audio buffer, say stereo left/right. The table in the WAVEFORMATEXTENSIBLE documentation says that the channels come in the order left then right. Therefore the values come in this order:

Sample index   Channel   Value index
0              Left      0
0              Right     1
1              Left      2
1              Right     3
2              Left      4
2              Right     5

Some people call this interleaved format, but whether it’s interleaved depends on what color glasses you’re wearing.

It’s interleaved if you look at it from the point of view of a channel, since the data from one channel is interleaved with data from the other channels.

But it’s a perfectly linear format if you look at it from the point of view of the samples, since all the data for one sample is packed together.
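To make the layout concrete, here is a minimal C++ sketch of the index arithmetic implied by the table. The helper names `ValueIndex` and `ExtractChannel` are made up for illustration and are not part of any Windows API. The sketch assumes you already have a pointer to the float values and that the number of values is a multiple of the channel count.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: position of one channel's value in the flat float array.
// Channel 0 is the first channel in the documented order (Left for stereo).
std::size_t ValueIndex(std::size_t sampleIndex, std::size_t channel,
                       std::size_t channelCount)
{
    assert(channel < channelCount);
    return sampleIndex * channelCount + channel;
}

// Hypothetical helper: copy the values of a single channel into their own array.
// Assumes valueCount is a multiple of channelCount.
std::vector<float> ExtractChannel(const float* values, std::size_t valueCount,
                                  std::size_t channel, std::size_t channelCount)
{
    assert(channelCount > 0 && channel < channelCount);
    const std::size_t sampleCount = valueCount / channelCount;
    std::vector<float> result;
    result.reserve(sampleCount);
    for (std::size_t sample = 0; sample < sampleCount; ++sample) {
        result.push_back(values[ValueIndex(sample, channel, channelCount)]);
    }
    return result;
}
```

For the stereo case in the table, `ValueIndex(2, 1, 2)` is 5: the right-channel value of sample 2.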

Okay, next question: How do you convert this to other formats?

Well, that depends on what other format you’re converting it to. If you’re converting to a linear format,¹ then you can perform a simple linear conversion. We know that the result is going to be f(x) = ax + b for some values of a and b. We just need to figure out what those values are.

Substitute x = −1.0:  vmin = a × (−1) + b
Substitute x = +1.0:  vmax = a × (+1) + b

Solving the system of simultaneous equations gives

a = (vmax − vmin)/2
b = (vmax + vmin)/2
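Here is one way the formula might look in C++. The names `LinearMap`, `MakeLinearMap`, and `Apply` are invented for this sketch, and it works in double precision purely for convenience.

```cpp
#include <cassert>

// Hypothetical type holding the coefficients of f(x) = a*x + b.
struct LinearMap
{
    double a;
    double b;
};

// Build the map that sends -1.0 to vmin and +1.0 to vmax.
LinearMap MakeLinearMap(double vmin, double vmax)
{
    assert(vmin < vmax);
    return { (vmax - vmin) / 2.0, (vmax + vmin) / 2.0 };
}

// Evaluate f(x) = a*x + b.
double Apply(const LinearMap& map, double x)
{
    return map.a * x + map.b;
}
```

For an unsigned 8-bit destination (vmin = 0, vmax = 255), this gives a = b = 127.5, so −1.0 maps to 0 and +1.0 maps to 255.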

There is an extra wrinkle to this formula: If the destination is an integer range, then the negative values will have one extra value of range compared to the positive values. For example, a 16-bit signed value will range from −32768 to +32767. If we plug zero into the function, we get just b, which is the average between the high and low values, and which will be numerically −½ rather than zero.

I don’t know how audio people usually solve this problem. One option I’ve seen is to throw out the most negative value, so the effective range for a 16-bit signed value is −32767 to +32767. In that case, the formula simplifies to merely multiplying by vmax.
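A sketch of that symmetric option in C++ might look like the following. The function name `FloatToInt16Symmetric` is made up, and clamping out-of-range input and rounding to the nearest integer are my own assumptions rather than anything the format requires. It also does no dithering.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical helper: map [-1.0, +1.0] onto [-32767, +32767],
// leaving -32768 unused so that 0.0 maps exactly to 0.
std::int16_t FloatToInt16Symmetric(float x)
{
    assert(!std::isnan(x));
    // Assumption: values outside the nominal range are clamped, not wrapped.
    const float clamped = std::max(-1.0f, std::min(1.0f, x));
    // Assumption: round to nearest rather than truncate.
    return static_cast<std::int16_t>(std::lround(clamped * 32767.0f));
}
```

With this version, 0.0 converts to exactly 0, and −1.0 and +1.0 convert to −32767 and +32767.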

Maybe some audio people can tell me what they do here.

¹ There are nonlinear formats for audio data. For example, μ-law is a companding algorithm which uses a nonlinear formula in order to express a wider dynamic range in a small space, at a cost of resolution at the extremes.
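
For a feel of what "nonlinear" means here, this is a sketch of the continuous μ-law compression curve, F(x) = sgn(x) · ln(1 + μ|x|) / ln(1 + μ), with μ = 255. The function name `MuLawCompress` is made up, and this is only the idealized curve: real μ-law codecs quantize the result to 8 bits using a piecewise approximation, which the sketch does not attempt.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: idealized mu-law compression of x in [-1.0, +1.0].
// The result is also in [-1.0, +1.0], with small amplitudes stretched out
// and large amplitudes squeezed together.
double MuLawCompress(double x, double mu = 255.0)
{
    assert(x >= -1.0 && x <= 1.0);
    const double magnitude = std::log1p(mu * std::fabs(x)) / std::log1p(mu);
    return std::copysign(magnitude, x);
}
```

The endpoints stay put (−1.0, 0.0 and +1.0 map to themselves), but everything in between is pushed outward: an input of 0.5 comes out well above 0.5.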



  • aybe 0

    +10000 Most people want to believe that one value is a sample, but indeed it’s a value.
    When you ask them what you’d call the values that form, say, an L/R signal, they answer ‘frames’ or, even better, ‘a pair of samples’ 😂

    For the conversion from float to integer, here’s how it’s done in some frameworks: http://blog.bjornroche.com/2009/12/int-float-int-its-jungle-out-there.html

    But there’s a critical step you forgot to mention when going to integer: you have to apply some dithering.
    Once done, in software like Wavelab, you can assert that you have a good master by checking it through the ‘Bit-meter’ window.

  • anonymous 0

    This comment has been deleted.

    • switchdesktopwithfade@hotmail.com 0

      Why doesn’t Media Foundation support OGG? It’s not like there are any licensing requirements. Why doesn’t Media Foundation support every free format up to this point? It’s not like the code is going to change from year to year. FFMPEG shouldn’t be necessary for anything for the most part.

    • 紅樓鍮 0

      In Julia an unsigned fixed-point number type N0fn represents a number from 0 to 1 inclusive, using an underlying n-bit unsigned integer, which makes the resolution 1/(2^n – 1). For example, 0x01 as an N0f8 represents 1/255. This practice of uniform scaling by (2^n – 1) seems standard in the field of digital signal processing.

      • Brian Boorman 0

        Interesting non-standard notation. Old embedded DSP guys use Q notation (e.g. Q15) and the newer standard is fx notation (e.g. fx1.16 is the same as Q15).

  • aybe 0

    Raymond, I have a question for you!

    Why is it that when Task Manager’s “Options/Always on top” gets clicked, the application “reboots” itself, or seems to?

    Thank you.

    I’ve uploaded a small GIF that shows what I’m talking about, check it here : https://ibb.co/CndvPVk

    • Jan Ringoš 0

      This is really off topic, but the answer is that the Task Manager doesn’t actually use standard Always on Top. Instead it creates its window in a higher “band”. This band is always above all other Always on Top windows. Which is where you want Task Manager to be. Metro apps in Windows 8 used to be in this band, but those now live in the same band as all other windows. Only Task Manager and Genuine Notification (IIRC) now use higher bands.

    • GL 0

      The program does reboot itself (I don’t know the reason for this design, though), which can be inferred from command line switches. In the current Windows release, the switches /0 ~ /4 mean started from Win+X menu / UI in Task Manager / Ctrl+Shift+Esc / Windows Security dialog / Taskbar context menu, respectively. If you start Task Manager and then choose / un-choose Always on Top, the command line switch “changes to” /1.

      • 紅樓鍮 0

        I remember my Task Manager always showed itself started with /7. What does that mean?
