By Brian Smithers | April 1, 2007

Desktop musicians have lived with computers and digital audio long enough to know how binary and decimal numbers relate to one another. Had humans developed with two fingers instead of ten, we might well consider binary numbers normal. Yet much of the number crunching that occurs within your audio software remains a mystery. For example, does your DAW treat audio as floating-point numbers or as fixed-point numbers? Do you know what the difference is, and should you really care?

Decimal numbers are known as base 10 numbers because the meaning of each column is based on increasing powers of 10. The base, 10, is also known as the *radix*, and the dot that divides an expression into integers on the left and fractional values on the right is called the *radix point* (it's called the *decimal point* in the decimal system). The first digit to the left of the radix point represents the number of ones (the ones column); the second, multiples of 10 ($10^{1}$); the third, multiples of 100 ($10^{2}$); and so on, with the *exponent* (the number indicating the power) corresponding to the number of places to the left of the ones column.

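The same pattern applies to the right of the radix point, but the exponents are negative. The first decimal place (tenths) consists of multiples of $10^{-1}$; the second (hundredths), multiples of $10^{-2}$; the third (thousandths), multiples of $10^{-3}$; and so on. A negative exponent simply means division: $10^{-3}$ is the same as $1/10^{3}$, or one thousandth. By the same logic, the ones column is $10^{0}$, because any nonzero number raised to the power of zero equals 1.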
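To make the column pattern concrete, here is a short Python sketch that pairs each digit of a decimal numeral with the exponent of its column and then sums digit × 10^exponent. The names `expand_decimal` and `evaluate` are hypothetical helpers made up for this illustration; `Fraction` from the standard library keeps the arithmetic exact so that the computer's own binary rounding doesn't muddy the result.

```python
from fractions import Fraction

def expand_decimal(numeral: str) -> list[tuple[int, int]]:
    """Return (digit, exponent) pairs for a decimal numeral such as '305.27'.

    Exponents count up from 0 at the ones column going left, and down
    from -1 at the tenths column going right.
    """
    whole, _, frac = numeral.partition(".")
    pairs = [(int(d), len(whole) - 1 - i) for i, d in enumerate(whole)]
    pairs += [(int(d), -i) for i, d in enumerate(frac, start=1)]
    return pairs

def evaluate(pairs: list[tuple[int, int]]) -> Fraction:
    """Sum digit * 10**exponent exactly (negative exponents become fractions)."""
    return sum((d * Fraction(10) ** e for d, e in pairs), Fraction(0))

pairs = expand_decimal("305.27")
print(pairs)
print(evaluate(pairs) == Fraction("305.27"))
```

Running it prints `[(3, 2), (0, 1), (5, 0), (2, -1), (7, -2)]` followed by `True`: three hundreds, no tens, five ones, two tenths, and seven hundredths.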

All of this is known as *fixed-point* arithmetic because the radix point never moves. The same principles apply in a binary system, but because the radix is 2, each place is an appropriate power of 2. The binary number 1101 represents 8 + 4 + 0 + 1, or 13, in the decimal system (see **Fig. 1**). Try not to think too hard about binary fractions or your head might explode — the same pattern of negative exponents applies.
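The same column logic carries over to base 2. This sketch uses a hypothetical helper, `binary_to_fraction`, to evaluate a binary fixed-point numeral exactly: Python's built-in `int(..., 2)` handles the whole-number part, and each bit after the radix point is worth a half, a quarter, an eighth, and so on.

```python
from fractions import Fraction

def binary_to_fraction(bits: str) -> Fraction:
    """Evaluate a binary fixed-point numeral such as '1101' or '0.101' exactly."""
    whole, _, frac = bits.partition(".")
    result = Fraction(int(whole, 2)) if whole else Fraction(0)
    # Each fractional bit is worth 2**-1, 2**-2, 2**-3, ... (halves, quarters, eighths)
    for i, b in enumerate(frac, start=1):
        result += int(b) * Fraction(1, 2 ** i)
    return result

print(binary_to_fraction("1101"))   # 8 + 4 + 0 + 1
print(binary_to_fraction("0.101"))  # 1/2 + 0/4 + 1/8
```

It prints `13` for 1101 and `5/8` for the binary fraction 0.101, so the negative-exponent pattern really is the same one used in decimal.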

Fixed-point arithmetic is pretty straightforward, but there are times when it is inefficient or limiting. In particular, describing very large or very small numbers in fixed-point notation is awkward. For example, as I write this, the U.S. national debt is reported to be $8,551,759,486,884.88. (No, wait, it just went up $3 million!) That's a long string of digits, so it is usually expressed as “$8½ trillion.” Although the word *trillion* is not terribly meaningful to a calculator or a computer, it's a useful description for us because it makes a very large number more manageable visually and mentally.

Floating-point numbers are good for expressing very large or very small numbers. For example, in floating-point parlance one would say that the Andromeda galaxy is about $2.1 \times 10^{19}$ kilometers away. Rather than writing out all those zeros, you would write down the significant digits (the *significand*, or 2.1 in this case) and specify how many places you needed to shift the radix point (19 in this example). The significand is sometimes called the *mantissa*, although some prefer to reserve that term for use with logarithms. The rest of the expression then indicates the magnitude of the number: how big (or small) this number really is. It consists of a radix and an exponent (which is effectively a zoom level). You may recognize this as the way your calculator represents very large numbers: *scientific notation*, or decimal floating-point notation.
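Python's standard `decimal` module can split a number into the significand and exponent just described. The helper `scientific_parts` is hypothetical; it relies on `Decimal.adjusted()`, which returns the power of ten of the leading digit, and `Decimal.scaleb()`, which shifts the radix point by a given number of places.

```python
from decimal import Decimal

def scientific_parts(numeral: str) -> tuple[Decimal, int]:
    """Split a decimal numeral into (significand, exponent), one nonzero digit before the point."""
    d = Decimal(numeral)
    exponent = d.adjusted()            # power of ten of the leading digit
    significand = d.scaleb(-exponent)  # shift the radix point left by that many places
    return significand, exponent

# 2.1 x 10^19 written out in full, and the national-debt figure quoted earlier
for numeral in ["21000000000000000000", "8551759486884.88"]:
    sig, exp = scientific_parts(numeral)
    print(f"{numeral} = {sig.normalize()} x 10^{exp}")
```

The first line prints `2.1 x 10^19`; the second turns the debt into `8.55175948688488 x 10^12`, which is the "$8½ trillion" shorthand with every digit kept.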

The 32-bit floating-point numbers that most host-based DAWs use allocate 1 bit (called the *sign bit*) to indicate whether the number is positive or negative; 8 bits (256 values) of exponent; and 23 bits of significand (see **Fig. 2**). Interestingly, there are still 24 bits of *precision* in the significand. That is because a “normalized” floating-point number always has exactly one nonzero digit to the left of the radix point, and because with a binary number that can only be a 1, a bit isn't wasted on it. This is known as the *hidden bit*.
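To see the three fields, Python's standard `struct` module can pack a number as a 32-bit float and read the raw bits back. This sketch assumes the standard IEEE 754 single-precision layout, which adds one detail not covered above: the stored exponent is biased by 127 so that negative exponents fit in an unsigned 8-bit field. The helper names are hypothetical, and `decode_normal` handles only normalized numbers (not zero, denormals, infinities, or NaN).

```python
import struct

def float32_fields(x: float) -> tuple[int, int, int]:
    """Return (sign, biased exponent, 23-bit fraction) of x stored as a 32-bit float."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31                # 1 sign bit
    exponent = (bits >> 23) & 0xFF   # 8 exponent bits: 256 possible values
    fraction = bits & 0x7FFFFF       # 23 stored bits of significand
    return sign, exponent, fraction

def decode_normal(sign: int, exponent: int, fraction: int) -> float:
    """Rebuild a normalized value; the hidden leading 1 gives 24 bits of precision."""
    significand = 1 + fraction / 2 ** 23  # hidden bit plus the stored fraction
    return (-1) ** sign * significand * 2.0 ** (exponent - 127)

fields = float32_fields(-6.5)
print(fields)
print(decode_normal(*fields))
```

For -6.5 (binary -1.101 × 2^2) this prints `(1, 129, 5242880)` and then `-6.5`: the sign bit is set, the biased exponent is 2 + 127, and the stored fraction holds only the `101` that follows the hidden leading 1.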

There's nothing inherently better about either system. As long as you can keep adding digits to the left or the right of the radix, you can represent any real number in fixed-point format. Problems start when you have a limited number of digits (or bits, in the case of binary) with which to work.

Consider the problem of adding the fixed-point decimal numbers 5,200 and 6,582 if restricted to using only four digits. The answer that one would like to come up with is 11,782, but that uses five digits. This is known as an overflow or, in audio terms, *clipping*.
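Here is the four-digit example as a hypothetical Python function. It assumes the system responds to overflow by clipping (saturating) at the largest representable value, which matches the audio result described above; some real fixed-point hardware wraps around instead.

```python
MAX_DIGITS = 4
LIMIT = 10 ** MAX_DIGITS - 1  # 9,999 is the largest four-digit value

def add_fixed(a: int, b: int) -> tuple[int, bool]:
    """Add two non-negative four-digit numbers, clipping at the largest value.

    Returns (result, overflowed).
    """
    total = a + b
    if total > LIMIT:
        return LIMIT, True  # saturate ("clip") rather than wrap around
    return total, False

print(add_fixed(5200, 6582))
```

It prints `(9999, True)`: the true answer, 11,782, cannot be stored, so the result is pinned at the ceiling, just as an overloaded signal flattens against full scale.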

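The floating-point system presents similar challenges: $(5.2 \times 10^{3}) + (6.582 \times 10^{3}) = 11.782 \times 10^{3} = 1.1782 \times 10^{4}$. The exponent can grow to accommodate the larger result, so nothing clips, but with only four digits of significand the answer must be rounded to $1.178 \times 10^{4}$, or 11,780. Instead of overflowing, the calculation loses resolution.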
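Python's `decimal` module can limit a calculation to four significant digits, which reproduces this example. Setting `prec = 4` inside `localcontext()` is an assumption standing in for a four-digit significand; the exponent is left unrestricted, as in the text.

```python
from decimal import Decimal, localcontext

with localcontext() as ctx:
    ctx.prec = 4                 # only four significant digits available
    a = Decimal("5.2E3")
    b = Decimal("6.582E3")
    total = a + b                # exact sum 11,782 needs five digits

print(total)             # rounded to fit the significand
print(total.adjusted())  # the exponent moved from 3 to 4
print(int(total))        # the value actually stored
```

It prints `1.178E+4`, `4`, and `11780`: no clipping, but the final 2 is gone.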

Both numbering systems present unique challenges and require unique solutions for handling overflow, resolution, and other quirks. The sound of a particular DAW is due in part to the skills of the programmers who wrote the algorithms that decide how signals are summed, scaled, and otherwise manipulated internally. Given sufficient resolution, however, the differences between the two fade into relative obscurity. Modern DAWs often use *double-precision* arithmetic: fixed-point systems use 48 bits to process 24-bit signals, and some floating-point systems use 64 bits to process 32-bit signals.
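A simplified integer model shows why the extra bits matter. The hypothetical `mix` function sums eight full-scale 24-bit samples, clipping the running total to the accumulator's width after every addition, and a final integer division stands in for pulling down a master fader. This is only an illustration under those assumptions; real DAW mix engines differ in how and where they scale and clip.

```python
def clip(value: int, bits: int) -> int:
    """Saturate a signed integer to the range of a two's-complement word of `bits` bits."""
    hi = 2 ** (bits - 1) - 1
    lo = -(2 ** (bits - 1))
    return max(lo, min(hi, value))

FULL_SCALE_24 = 2 ** 23 - 1  # largest positive 24-bit sample: 8,388,607

def mix(samples: list[int], acc_bits: int) -> int:
    """Sum samples in an accumulator of `acc_bits` bits, clipping after every addition."""
    acc = 0
    for s in samples:
        acc = clip(acc + s, acc_bits)
    return acc

tracks = [FULL_SCALE_24] * 8           # eight full-scale tracks at one instant
print(mix(tracks, 24))                 # 24-bit accumulator: clips
print(mix(tracks, 48))                 # 48-bit accumulator: keeps the true sum
print(clip(mix(tracks, 48) // 8, 24))  # "fader" down by 8, back within 24 bits
```

The 24-bit accumulator sticks at 8,388,607, while the 48-bit one holds the full 67,108,856, which can then be scaled back into 24-bit range without ever having clipped.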

The other big distinction between the two systems is how they handle quantization error. Quantization error is the unavoidable result of the finite resolution of digital samples. When the measured voltage at an A/D converter or the result of a processing algorithm falls between two representable values, it must be rounded up or down to the system's resolution. Rounding very small fixed-point numbers (representing very quiet signals) results in very large errors relative to the overall signal level: rounding from 4.5 up to 5 is a 10 percent error! The louder the signal, the less significant quantization error becomes: rounding from 99.5 up to 100 is a 0.5 percent error, and rounding from 999.5 up to 1,000 is only 0.05 percent.
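These percentages can be checked with a few lines of Python. As in the text, the error is measured relative to the rounded result; the helper name is hypothetical.

```python
def relative_error_percent(value: float, rounded: float) -> float:
    """Rounding error as a percentage of the rounded result."""
    return abs(rounded - value) / rounded * 100

for value, rounded in [(4.5, 5), (99.5, 100), (999.5, 1000)]:
    print(f"{value} -> {rounded}: {relative_error_percent(value, rounded):.2f}%")
```

The same half-step error shrinks from 10.00% to 0.50% to 0.05% as the signal gets louder, which is why quiet fixed-point signals suffer most.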

Things are different, however, with floating-point numbers. Rounding is still more problematic with very small significand values than with large significand values. But each time that the exponent increases or decreases, the pattern starts over, just as the rising pitch of an engine drops and restarts every time you shift up. Quantization error no longer fades into obscurity with rising levels. It can be quite large in a loud floating-point signal compared with an equivalent fixed-point signal, but because the signal is loud, how apparent will the quantization distortion be? Some observers suggest that the absolute level of the quantization distortion is less significant than the fact that its level “pumps” with each change in exponent.
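The "shift up and start over" behavior can be computed directly. This sketch assumes IEEE 754 single precision with its 24 bits of significand, so within each power-of-two band [2^e, 2^(e+1)) adjacent values are 2^(e-23) apart. The helper names are hypothetical, and `float32_step` assumes positive, normalized inputs; `to_float32` uses the standard `struct` module to round a value to the nearest 32-bit float.

```python
import math
import struct

def to_float32(x: float) -> float:
    """Round a Python float (a double) to the nearest 32-bit float value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def float32_step(x: float) -> float:
    """Spacing between adjacent normalized float32 values near positive x."""
    e = math.floor(math.log2(x))  # which power-of-two band x falls in
    return 2.0 ** (e - 23)        # 24 bits of precision within that band

for level in [0.25, 0.4, 0.5, 0.8, 1.0]:
    step = float32_step(level)
    print(f"level {level:<5} step {step:.3e}  step/level {step / level:.3e}")

# A change smaller than half a step is simply lost when stored as float32
print(to_float32(1.0 + float32_step(1.0) / 4) == 1.0)
```

The absolute step doubles at 0.5 and again at 1.0, while the step relative to the level shrinks across each band and jumps back up at the next power of two: the sawtooth "pumping" described above.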

The advantages of either system diminish with increased resolution, when more bits are available to buffer overflow and retain resolution. To put everything in perspective, you will get much more mileage from wise mic selection and placement than from obsessing over the differences between fixed- and floating-point systems, and neither type of arithmetic will save a bad song.

*Brian Smithers is a musician and educator in central Florida.*
