How Your DAW Does Math

Much of the number crunching that occurs within your audio software remains a mystery; does your DAW treat audio as floating-point numbers or as fixed-point numbers? What's the difference, and should you really care?

Desktop musicians have lived with computers and digital audio long enough to know how binary and decimal numbers relate to one another. Had humans developed with two fingers instead of ten, we might well consider binary numbers normal. Yet much of the number crunching that occurs within your audio software remains a mystery. For example, does your DAW treat audio as floating-point numbers or as fixed-point numbers? Do you know what the difference is, and should you really care?

The Fix Is In

FIG. 1: The binary number 1101 represents 8 (1 × 2³) + 4 (1 × 2²) + 0 (0 × 2¹) + 1 (1 × 2⁰), or 13, in the decimal system.

Decimal numbers are known as base 10 numbers because the meaning of each column is based on increasing powers of 10. The base, 10, is also known as the radix, and the dot that divides an expression into integers on the left and fractional values on the right is called the radix point (it's called the decimal point in the decimal system). The first digit to the left of the radix point represents the number of ones (the ones column); the second, multiples of 10 (10¹); the third, multiples of 100 (10²); and so on. Each place represents another power of the radix, with the exponent (the number indicating the power) corresponding to the number of places to the left of the ones column.

The same pattern applies to the right of the radix point, but the exponents are negative. The first decimal place (tenths) consists of multiples of 10⁻¹; the second (hundredths), multiples of 10⁻²; the third (thousandths), multiples of 10⁻³; and so on. The negative exponent means the same as dividing 1 by that power of 10. For example, 10⁻³ is the same as 1 ÷ 10³ or 1 ÷ 1,000; each means a thousandth. And yes, the ones place holds multiples of 10⁰, because any number raised to the power of 0 is 1.

All of this is known as fixed-point arithmetic because the radix point never moves. The same principles apply in a binary system, but because the radix is 2, each place is an appropriate power of 2. The binary number 1101 represents 8 + 4 + 0 + 1, or 13, in the decimal system (see Fig. 1). Try not to think too hard about binary fractions or your head might explode — the same pattern of negative exponents applies.
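If you would rather let the computer do the place-value bookkeeping, here is a minimal Python sketch of the pattern described above. The function name `fixed_point_value` is made up for illustration; it simply multiplies each digit by the radix raised to the exponent for its column.

```python
def fixed_point_value(digits, radix=2):
    """Sum digit * radix**exponent for a string such as '1101' or '0.101'."""
    whole, _, frac = digits.partition(".")
    total = 0.0
    # Left of the radix point: exponents count up from 0 at the ones column.
    for exponent, digit in enumerate(reversed(whole)):
        total += int(digit, radix) * radix ** exponent
    # Right of the radix point: exponents are -1, -2, -3, and so on.
    for position, digit in enumerate(frac, start=1):
        total += int(digit, radix) * radix ** -position
    return total


print(fixed_point_value("1101"))      # 13.0, as in Fig. 1
print(fixed_point_value("0.101"))     # 0.625, that is 1/2 + 0/4 + 1/8
print(fixed_point_value("1101", 10))  # 1101.0 when the same digits are read in base 10
```

The binary fraction in the second line is nothing exotic: the columns are halves, quarters, and eighths instead of tenths, hundredths, and thousandths.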

Fixed-point arithmetic is pretty straightforward, but there are times when it is inefficient or limiting. In particular, describing very large or very small numbers in fixed-point notation is awkward. For example, as I write this, the U.S. national debt is reported to be $8,551,759,486,884.88. (No, wait: it just went up $3 million!) That's a long string of digits, so it is usually expressed as “$8½ trillion.” Although the word trillion is not terribly meaningful to a calculator or a computer, it's a useful description for us because it makes a very large number more manageable visually and mentally.

Float Me a Trillion?

Floating-point numbers are good for expressing very large or very small numbers. For example, in floating-point parlance one would say that the Andromeda galaxy is about 2.1 × 10¹⁹ kilometers away. In fixed-point terminology, however, the distance must be expressed using a number that is 20 digits long: 21,000,000,000,000,000,000! To create a floating-point number from a fixed-point number, lop off the significant digits from the left of the expression (the significand, or 2.1 in this case) and specify how many places you needed to shift the radix point (19 in this example). The significand is sometimes called the mantissa, although some prefer to reserve that term for use with logarithms. The rest of the expression then indicates the magnitude of the number, that is, how big (or small) this number really is. It consists of a radix and an exponent (which is effectively a zoom level). You may recognize this as the way your calculator represents very large numbers: scientific notation, or decimal floating-point notation.
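The lop-and-shift recipe can be sketched in a few lines of Python using the standard `decimal` module. The helper name `to_scientific` is hypothetical, and the sketch handles decimal (radix 10) numbers only.

```python
from decimal import Decimal


def to_scientific(value):
    """Split a decimal string into (significand, exponent), one digit left of the point."""
    number = Decimal(value.replace(",", ""))
    # adjusted() reports how many places the radix point must shift.
    exponent = number.adjusted()
    # Shift the point, then drop the trailing zeros that are no longer needed.
    significand = number.scaleb(-exponent).normalize()
    return significand, exponent


print(to_scientific("21,000,000,000,000,000,000"))  # (Decimal('2.1'), 19)
```

Feeding it the national debt figure quoted earlier returns an exponent of 12, which is the computer's way of saying "trillion."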

The 32-bit floating-point numbers that most host-based DAWs use allocate 1 bit (called the sign bit) to indicate whether the number is positive or negative; 8 bits (256 values) of exponent; and 23 bits of significand (see Fig. 2). Interestingly, there are still 24 bits of precision in the significand. That is because a “normalized” floating-point number always has exactly one nonzero digit to the left of the radix point, and because with a binary number that can only be a 1, a bit isn't wasted on it. This is known as the hidden bit.
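To see those three fields for yourself, the following Python sketch pulls them out of a number. It assumes the common IEEE 754 single-precision layout, which this article does not name explicitly, along with that format's exponent offset of 127. The function names are made up for illustration.

```python
import struct


def float32_fields(value):
    """Return (sign, stored_exponent, fraction) for value packed as a 32-bit float."""
    (bits,) = struct.unpack(">I", struct.pack(">f", value))
    sign = bits >> 31                # 1 bit
    exponent = (bits >> 23) & 0xFF   # 8 bits, stored with an offset of 127 (assumed IEEE 754)
    fraction = bits & 0x7FFFFF       # 23 bits, the part after the hidden bit
    return sign, exponent, fraction


def rebuild(sign, exponent, fraction):
    """Reassemble a normalized value, putting the hidden leading 1 back."""
    significand = 1 + fraction / 2 ** 23
    return (-1) ** sign * significand * 2.0 ** (exponent - 127)


fields = float32_fields(-0.75)
print(fields)            # (1, 126, 4194304)
print(rebuild(*fields))  # -0.75
```

Notice that `rebuild` has to add the 1 back in by hand. That is the hidden bit at work: it is never stored, yet it still counts toward the 24 bits of precision.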

Reality Sets In

There's nothing inherently better about either system. As long as you can keep adding digits to the left or the right of the radix, you can represent any real number in fixed-point format. Problems start when you have a limited number of digits (or bits, in the case of binary) with which to work.

Consider the problem of adding the fixed-point decimal numbers 5,200 and 6,582 if restricted to using only four digits. The answer that one would like to come up with is 11,782, but that uses five digits. This is known as an overflow or, in audio terms, clipping.
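What actually happens to that fifth digit depends on the system. As a rough illustration (not a description of any particular DAW), this Python sketch shows two possible behaviors for a four-digit register: wrapping around like an odometer, or pinning the result at the largest value the register can hold, which is the behavior audio engineers call clipping. Both function names are invented for the example.

```python
DIGITS = 4
LIMIT = 10 ** DIGITS  # 10,000 is the first value that needs a fifth digit


def add_wrapping(a, b):
    """Keep only the lowest four digits, as an odometer would."""
    return (a + b) % LIMIT


def add_clipping(a, b):
    """Pin the result at the largest four-digit value instead of wrapping."""
    return min(a + b, LIMIT - 1)


print(add_wrapping(5200, 6582))  # 1782
print(add_clipping(5200, 6582))  # 9999
```

Neither answer is the 11,782 we wanted, but the clipped result is at least in the right neighborhood.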

FIG. 2: Here is the anatomy of a 32-bit floating-point number: 1 sign bit (positive or negative), 8 exponent bits, and 23 significand bits.

The floating-point system presents similar challenges: (5.2 × 10³) + (6.582 × 10³) should equal 1.1782 × 10⁴, but if restricted to using only four digits of significand, the final digit (2) is lost. And if restricted to using exponents of no more than 3, there is an overflow. Clipping and loss of resolution are the omnipresent bugaboos of the math that makes our DAWs tick.
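Python's `decimal` module lets you impose exactly these restrictions, so both failures can be reproduced in a short sketch. The helper `add_limited` is hypothetical, and the rounding mode is an assumption on my part; with a final digit of 2, most rounding modes give the same answer.

```python
from decimal import Context, Decimal, Overflow, ROUND_HALF_EVEN


def add_limited(a, b, max_exponent=None):
    """Add two decimal floats with a four-digit significand and an optional exponent cap."""
    ctx = Context(prec=4, rounding=ROUND_HALF_EVEN, traps=[Overflow])
    if max_exponent is not None:
        ctx.Emax = max_exponent
    return ctx.add(Decimal(a), Decimal(b))


# Four digits of significand: the trailing 2 of 1.1782 is lost.
print(add_limited("5.2E3", "6.582E3"))  # 1.178E+4

# Exponents capped at 3: the sum no longer fits at all.
try:
    add_limited("5.2E3", "6.582E3", max_exponent=3)
except Overflow:
    print("overflow")
```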

Both numbering systems present unique challenges and require unique solutions for handling overflow, resolution, and other quirks. The sound that a particular DAW has is due in part to the skills of the programmers who wrote the algorithms that decide how signals are summed, scaled, and otherwise manipulated internally. Given sufficient resolution, however, the differences between the two fade into relative obscurity. Modern DAWs often use double-precision arithmetic, meaning that fixed-point systems allow 48 bits to process 24-bit signals and that some floating-point systems allow 64 bits to handle 32-bit signals.

The other big distinction between the two systems is how they handle quantization error. Quantization error is the unavoidable result of the finite resolution of digital samples. When the measured voltage at an A/D converter or the result of a processing algorithm is between two numbers, you must round the result up or down to the system's resolution. Rounding very small fixed-point numbers (representing very quiet signals) results in very large errors relative to the overall signal level: rounding from 4.5 up to 5 is a 10 percent error! The louder the signal, the less significant quantization error becomes: rounding from 99.5 up to 100 is a 0.5 percent error and rounding from 999.5 up to 1,000 is only 0.05 percent.
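The percentages above are easy to check. This small Python sketch rounds each level to the nearest whole number (halves round up, to match the examples) and reports the error as a percentage of the rounded result. The function name is made up for illustration.

```python
import math


def rounding_error_percent(value):
    """Round half up to a whole number; return the error as a percent of the result."""
    rounded = math.floor(value + 0.5)
    return abs(rounded - value) * 100 / rounded


for level in (4.5, 99.5, 999.5):
    print(level, "->", rounding_error_percent(level), "percent")
```

The rounding step is the same half unit every time; only its size relative to the signal changes.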

Things are different, however, with floating-point numbers. Rounding is still more problematic with very small significand values than with large significand values. But each time that the exponent increases or decreases, the pattern starts over, just as the rising pitch of an engine drops and restarts every time you shift up. Quantization error no longer fades into obscurity with rising levels. It can be quite large in a loud floating-point signal compared with an equivalent fixed-point signal, but because the signal is loud, how apparent will the quantization distortion be? Some observers suggest that the absolute level of the quantization distortion is less significant than the fact that its level “pumps” with each change in exponent.
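The gear-shift pattern can be made visible with a short Python sketch. It assumes a normalized 32-bit float with 24 bits of precision (hidden bit included) and computes the size of one rounding step at several signal levels. The function name `float32_step` is hypothetical.

```python
import math


def float32_step(value):
    """Spacing between neighboring 32-bit floats near value (normalized range only)."""
    _, exponent = math.frexp(abs(value))  # abs(value) = m * 2**exponent, with 0.5 <= m < 1
    return 2.0 ** (exponent - 24)         # 24 bits of precision, hidden bit included


for level in (1.0, 1.99, 2.0, 3.99, 4.0):
    step = float32_step(level)
    print(f"level {level:<5} step {step:.3e}  step relative to level {step / level:.3e}")
```

Between 1.0 and 1.99 the step stays the same size, so it shrinks relative to the rising level. At 2.0 the exponent ticks up, the step doubles, and the relative figure jumps back to where it started at 1.0.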


The advantages of either system diminish with increased resolution, when more bits are available to buffer overflow and retain resolution. To put everything in perspective, you will get much more mileage from wise mic selection and placement than from obsessing over the differences between fixed- and floating-point systems, and neither type of arithmetic will save a bad song.

Brian Smithers is a musician and educator in central Florida.