Peter Heinrich is a video and computer game programmer who has worked on products for Amiga, PC, Sega, 3DO, and MacIntosh. He studied mathematics at the University of Oregon, and now lives in Seattle. Nathan Dwyer lives and works in Seattle. His interests are widespread and include programmer/computer interfaces, and graphics.
Introduction
When developing graphics applications, programmers must frequently make a trade off between precision and speed. Many graphics algorithms must store and calculate fractional values, but if those algorithms use floating-point they may incur such a large speed hit as to become unusable. At the other extreme, integer calculations are fast, but simple integers in C cannot represent fractional values. In cases like this, programmers can use a fixed-point representation to strike a good compromise.
Fixed-point representation trades range and precision for speed, and is based on a simple observation. Binary numbers are sums of powers of two. Each digit position in a binary number represents a unique power of two, that is, 2 raised to an integral exponent such as 20, 21, 22, etc. Adding places to the right adds positions of decreasing exponents: 2-1, 2-2, 2-3, and so on (see Figure 1a) . The method for converting the number to decimal remains the same as with integers: sum the powers of two whose digits have the value 1. Thus, the number 0.11 represents the decimal number 2-1 + 2-2 = 0.5 + 0.25 = 0.75, and 0101.0110 represents the number 22+20+2-2+2-3 = 4 + 1 + 0.25 + 0.125 = 5.375 (see Figure 1b) . The separator between positive and negative exponents is called the binary point (analogous to the decimal point in decimal numbers).
In a fixed-point representation, the number's digits are stored in a Standard C type variable (such as int or long), but the digits do not represent the same quantities as they would for a simple integer. The fixed-point representation includes an implied binary point, which is assigned a fixed position within the variable. Since some bits must represent the fractional part, fewer bits are available for use by the integer part and the range is decreased. For example, a 16-bit number usually has a range of [ -32768, 32767 ]. If we set the binary point at the fourth bit, the range is decreased to [ -4096, 4095 ], but it can now represent numbers to 1/16 precision. I notate this as 12.4: twelve bits of integer, four bits of mantissa. In exponential notation, fixed-point numbers evaluate to N + f * 2-d, where N is the integer part, f is the fractional part, and d is the number of digits assigned to the fractional component.
Speed Comparisons
Fixed-point is, for the most part, a change in how integer numbers are interpreted. Fixed-point operations require a small amount of extra work, which is at most an extra shift per operation. Thus, fixed-point operates at almost exactly the same speed as the integer equivalent. Generally, this speed is several orders of magnitude faster than floating-point, with or without an FPU.
Range/Precision Trade-offs for Graphics
Fixed-point is particularly applicable to computer graphics. Since computer screens have limited resolution, usually representable by eleven or twelve bits, the fixed-point number's decreased range does not represent too much of a handicap.
For example, the 12.4 representation works well for many applications. Recall that 12.4 applies to 16-bit integers, with four bits assigned to the fractional part of the number, giving 0.0625 precision (1/16) and a range of [ -4096, 4095 ]. This range provides enough precision to run a Bresenham line algorithm on a very large screen. For algorithms involving color averaging, it's tempting to use 8.8 representation, since color images commonly contain eight-bit color components. However, this representation runs the risk of overflow. A safer bet is 10.6, which still gives precision to the 1/64. If you need very high precision (as in the trigonometric functions discussed later), you can use 2.14 representation. Here two bits form the integer part to represent both 1 and -1, and numbers can be precise to the 1/16384, or 0.000061. 32-bit integers allow even higher levels of precision while still maintaining a large range.
Basic Fixed-Point Operations
Basic mathematical operations work a little differently with fixed-point numbers than with integers. Consequently, the C programmer must do some bookkeeping to keep track of which numbers are of which type. Also, the program will require conversion routines to convert from fixed-point to integer and vice versa. I describe here the conversion operations, basic math operations, and along the way, demonstrate the bookkeeping required.
Converting
Converting between fixed-point numbers and integers is fairly simple. To convert from integer to fixed-point, shift the integer up by the number of bits in the mantissa (the fractional part). Thus, to convert the integer 00001101 to 4.4 representation, shift it left four places. The resulting fixed-point number is 11010000, and has four bits of precision. To convert from fixed-point to integer, shift the opposite direction.
I set these operations up in C by defining a shift constant (FixedPointShiftAmount) and conversion operators written in terms of that constant. Listing 1 shows the code for these conversion functions (FPToInteger and IntegerToFP), as well as other basic operations.
If you use these conversion functions you should either check for overflow errors or ensure that integers being converted to fixed-point format have a compatible range. If all fixed-point related shifts are done with a single constant it is easy to change precision later on.
Fixed-point addition and subtraction work exactly the same as integer addition and subtraction. To see why, consider the numbers in exponential notation, as in Figure 2a. With four bits in the mantissa, 2.5 + 3.125 = 0010.1000 + 0011.0010 = 0101.1010 = 5.625. (Keep in mind that you can't add fixed-point numbers and integers together like this. Nor can you add two fixed-point numbers with different binary points. To add or subtract two numbers they must both be in the same format.)
Multiplication and division are only slightly more complicated. Examine the exponential form of multiplication in Figure 2b. When two fixed-point numbers are multiplied, the result ends up shifted to the left by the number of bits in the mantissa. The code must allow for these extra bits of precision, and an additional shift must take place after the multiplication to normalize the result.
Division is similar except the operation causes a loss rather than a gain in precision. To compensate, the code shifts one argument up before division.
Note that multiplication and division, while not difficult, require more care than addition and subtraction. The extra shifts involved in multiplication and division must be placed carefully, or significant bits in the result may disappear into oblivion. Inattentive programming can produce total loss of precision. This is particularly true when using 32-bit fixed-point numbers. (See sidebar for more information on avoiding loss of precision.)
One property of fixed-point numbers allows for occasional optimizations: multiplying a fixed-point number and an integer results in a fixed-point number that does not require the additional shift. A programmer who carefully monitors mixing of types can use this to advantage.
Advanced Operations
A basic understanding of the principles discussed above will enable use of fixed-point mathematics in real situations. In many cases, switching to fixed-point is transparent. A 3D system, for example, may model physical bodies with position, velocity, acceleration, etc., all of which may be represented in fixed-point with very little impact on the code.
However, the basic fixed-point operations discussed above don't fully satisfy the needs of graphics applications. For example, a 3D graphics system will probably rely heavily on trigonometric functions, traditionally accessible only via floating-point numbers. Without fixed-point alternatives for, say, cos and sin, modest speed improvements in other areas may disappear in the noise. Programmers concerned with performance will probably want to deal with fixed-point numbers exclusively. In this section I present several fixed-point trigonometric functions. But first I discuss a concept upon which all trigonometric functions depend: fixed-point angles.
Angle Representation
Angles are used extensively in all sorts of calculations, including trajectories, light-source shading, reflection, and rotation. A program that uses the standard library trig functions will declare these angles as type double and will express the angles in radians. As long as the standard library routines aren't used, however, there's no reason not to choose some other type, or different units more suitable to integer/fixed-point arithmetic.
Consider making all angles fit in an unsigned char. An angle variable can then provide 256 divisions in a circle, for a resolution of approximately 1.41° per division. This really isn't much precision, though; it would be nice to be able to represent angles of less than 1°. What if all angles fit in an unsigned short? Then each division would represent approximately 0.00549° because a circle would have 65,536 divisions.
As a side effect, using an integer to hold an angle provides automatic overflow protection. That is, the results of arithmetic operations will always be modulo the maximum value (+1) an integer can represent. For examples, assume angle A is an unsigned short equal to 57,330. Adding the value 32,980 sets A to 24,744, not 90,310 (the result will be modulo 65,536). In terms of angles, this is equivalent to adding 315° to 181° and obtaining 136° (instead of 496°); the result always remains on the unit circle.
Fixed-Point Trig Functions
The trig functions presented here make extensive use of mathematical tables. (Some programmers spurn math tables, preferring algorithmic solutions, but a simple table lookup is frequently the most efficient and elegant way to approach a problem. For example, a sine or cosine math table for angles contained in unsigned chars requires at most 256 entries (even less, as it turns out). Of course, unsigned shorts will require more entries, but the table need not grow to an unreasonable length.
Assume the lowest four bits of an angle are insignificant when used as an argument to a trig function. The resulting table needs only 4,096 entries, and if the full-scale range of the angle argument is ± 180°, the trig function can still resolve angles as small as 0.0879°. I've made no reference to the size of the entries in the table, because it is completely arbitrary. Each entry may hold a 2.6, 12.4, or even 4.28 fixed-point value. The size of each value and the position of its binary point depends on the range and precision required. For example, a custom fixed-point version of cos might take a 16-bit argument (whose lower four bits weren't significant) and return a 2.14 fixed-point result. A fixed-point atan2 routine might take two 16.16 values and return an 8-bit result. The needs of the program dictate how the tables shape up.
Listing 2 shows fixed-point substitutes for sin and cos. This listing demonstrates how a single table may be used for both routines. It also shows how the size of the table may be reduced by taking advantage of the functions' symmetry. [For sanity's sake we've abbreviated this table in the printed listing; the full table and source are available on this month's code disk. mb] Listing 3 shows the code for generating the table used in Listing 2.
Fixed-Point in Parametric Equations
One graphics-related task that benefits from fixed-point math is the computation of parametric equations. Parametric formulas are generally functions of a single variable, normally t, that evaluate to a point in space for each value of t in the range [ 0, 1 ]. I describe two optimizations here: using fixed-point to represent t, and computing the equation iteratively.
Given two points X1 and X2, the line between them is given by the parametric equation in Figure 3a. This equation can be broken up into the system of equations in Figure 3b.
Listing 4 shows floating-point and fixed-point implementations that compute the equations of Figure 3b. In the fixed-point implementation t is a fixed-point number, while everything else is an integer. Recall that fixed-point/integer multiplication requires no compensating shift operation at the end; the code takes advantage of this extra efficiency to avoid converting all the coordinates into fixed-point. The fixed point implementation runs quite a bit faster without losing the clarity of the code.
To further optimize the algorithm, the program could compute each point on the line iteratively that is, it would compute each point from the previous point. Given that t changes by a discrete amount (dt) each iteration, X(t) can be computed as shown in Figure 3c.
To compute each value of X(t), just add ( X2 - X1 )dt to the previous value of X(t). ( X2 - X1 )dt is a constant that can be computed before the loop is entered; each iteration of the loop involves merely a few additions. Listing 5 contains a typical implementation. In this case, I've kept all the coordinates as fixed-point to maintain precision.
A Test Program
Listing 6 shows a test program, which draws a sine curve, a cosine curve, and several diagonal lines, to exercise the fixed-point functions presented in this article. The program calls functions from the Borland Graphics Interface (BGI), such as initgraph (initialize graphics) and putpixel; if you don't have a Borland compiler you will need to substitute equivalent functions supplied with your compiler. The code disk also contains an EXE file for users who just want to run the test program.
Summary
Fixed-point speeds up arithmetic requiring fractional precision without much loss of code clarity. Graphics algorithms can especially benefit from fixed-point number representations because graphics applications usually operate over a limited range of the numbers. Using fixed-point requires the C programmer to make a conceptual shift in thinking about how values are stored. Once this conceptual shift has been made, fixed-point becomes a simple matter of bookkeeping.
Precision/Data Loss
Using fixed-point arithmetic can result in data loss due to overflow. Checking for overflow can seriously reduce the speed increase achieved through the use of fixed-point.
Addition and subtraction are generally safe. Assume a fixed-point number of N bits (integer and fractional part). If the largest possible number is added to itself, the computation is represented by the following equation:
( 2N - 1 )+(2N - 1 )= 2*2N - 2 = 2N+1 - 22N+1 - 2 can be represented in N + 1 bits. If the range of values to be represented can fit in seven or fifteen bits, then addition will never overflow the data type being used. Otherwise, as in the angle representation example discussed in the body of the article, the range can be scaled to fit into 16 bits, so that the nature of the values represented allows overflow to be ignored.Multiplication requires some care. The algorithm used for multiplication computes an intermediate value which can be equal to the largest representable number squared. This multiplication is represented by this equation:
(2N - 1)(2N - 1 )= 22N- 2*2N + 1 = 22N-2N + 1 + 1This number requires 2N bits to be represented. If the fixed-point value is close to 16 bits, the intermediate value will require 32. An 8-bit fixed-point multiplication will require a 16-bit intermediate value. If the fixed-point number is represented with more than 16 bits, a special integer multiplication routine may be needed to compute the >32-bit result.Division has the same caveats as multiplication, since the first value is left-shifted N bits before the division. The same rules about size apply.
The practical application of fixed-point arithmetic requires an understanding of the usage model of the fixed-point numbers. The best technique is to find a range of values that can be represented by an 8- or 16-bit data type and ensure that values stay within that range. Fixed-point numbers can then simply be cast to the next larger data type at the start of the multiplication and division routines.
Figure 1 Fixed-point representation with implied binary points. a) shows general from where b0 - b7 can each be 0 or 1, and weight applied to each digit is shown on top. b) shows how to evaluate a specific number
Figure 2 Exponential form addition and multiplication
A fixed-point number can be represented as: N + f*2-d integer (N) + fraction (f*2-d) This is represented in an integer by left-shifting by d bits, which is the same as multiplying by 2d: 2d*(N + f*2-d) =N*2d + fa) Addition
(N*2d + f) + (M*2d + g) = (N + M)*2d + f + gb) Multiplication
(N*2d + f)*(M*2d + g) =2d(N + f*2-d)*2d(M + g*2-d) =2d(A)*2d(B) =22d(A*B) =2d[2d(A*B)]Figure 3 Parametric line formulas
a) Vector form
X(t) = X1 + (X2 - X1)tb) Scalar form
x(t) = x1 + (x2 - x1)t y(t) = y1 + (y2 - y1)tc) Iterative form
X(t) = X1 + (X2 - X1)(t) X(t + dt) = X1 + (X2 -X1)(t + dt) = X1 + (X2 - X1)t + (X2 - X1)dt = X(t) + (X2 - X1)dt, X(0) = X1
Listing 1 Conversion functions
Shift amount for fixed-point/integer conversions. */ #define FixedPointShiftAmount 8 #define FPToInteger( FP ) ( FP >> FixedPointShiftAmount ) #define IntegerToFP( Integer ) ( Integer << FixedPointShiftAmount ) short FPMulFP( short N, short M ) { return (short) ( ( (long) N * (long) M ) >> FixedPointShiftAmount ); } short FPDivFP( short N, short M ) { return (short) ( ( (long) N << FixedPointShiftAmount ) / M ); } /* End of File */
Listing 2 Fixed-point trig functions
#include <stdio.h> #include <stdlib.h> #define DEGREES_90 0x0400 #define DEGREES_180 0x0800 typedef short Trig; /* describes a 2.14 fixed-point number */ typedef short Angle; /* describes a 12.4 fixed-point number */ static Trig cosineTable[ DEGREES_90 + 1 ] = /* This example uses a circle divided uniformly into 4096 "degrees". The cosine values for the first quadrant are stored here. Only the first quadrant is necessary because the lookup routine is able to take advantage of cosine's symmetry. This table was generated by another small program using floating point. All it needed to do was compute the cosine every (1/4096)(2*pi) radians and convert it into a 2.14 fixed-point number. The values stored here are truncated, but some applications may benefit from rounding. instead. Note that this table contains one more entry than there are divisions in a quadrant. This is to ensure that all angles in the range [0,90] are accounted for. */ { 0x4000, 0x3fff, 0x3fff, 0x3fff, 0x3fff. 0x3fff, 0x3fff, 0x3fff, 0x3ffe. 0x3ffe, 0x3ffe, 0x3ffd, 0x3ffd, 0x3ffc, 0x3ffc, 0x3ffb, ... [full table available on code disk -mb] 0x0192, 0x0178, 0x015f, 0x0146, 0x012d. 0x0114, 0x00fb. 0x00e2, 0x00c9, 0x00af, 0x0096, 0x007d1, 0x0064, 0x004b, 0x00321, 0x00191, 0x000 }; Trig cosine( Angle angle ) /* This routine returns the cosine of angle, returning a fixed-point value in the range [-1, 1]. Since this is performed with a simple table lookup, the Trig type returned may have any format desired. This example treats it as a 2.14 fixed-point number. The Angle type describes a 12.4 fixed-point number. This divides the unit circle into 4096 integral "degrees". Since every angle may be represented in 16 bits, overflow is impossible: the implicit modulus of limited-precision integer math guarantees this. */ { Trig value: /* Convert angle from 12.4 fixed-point to integer. */ angle >>= 4; /* Since cos(x) == cos(-x), combine both cases. */ if( 0 > angle ) angle = -angle: /* Check which quadrant contains angle. This isn't strictly necessary, but by doing so this routine is able to exploit the symmetry of the cosine function, reducing the size of the lookup table by three quarters. */ if( DEGREES_90 < angle ) { /* If not in first quadrant, adjust lookup index by noting that cos(x) = -cos(180 - x). */ angle = DEGREES_180 - angle; value = -cosineTable[ angle ]; } else value = cosineTable[ angle ]; return( value ); } Trig sine( Angle angle ) /* This routine returns the sine of angle. Like the routine above, it returns a Trig value, which represents a 2.14 fixed-point number. */ { /* Make use of the fact that sin(x) = cos(x - 90). Note that the constant must be converted to 12.4 fixed-point before it may be subtracted. */ angle -= (DEGREES_90 << 4); return( cosine( angle ) ); } /* End of File */
Listing 3 Cosine Table generation
#include <math.h> #include <stdio.h> #include <stdlib.h> /* This is the routine used to generate the table used in the fixed-point sine and cosine functions. */ #define PI 3.14159265358979323846 #define DEGREES_IN_CIRCLE (1L << 12) #define TABLE_UNITY (1L << 14) int main( void ) { double degree, step; unsigned short value; int column; step = (2.0 * PI) / DEGREES_IN_CIRCLE; for( degree = 0.0, column = 0; degree < PI / 2.0; degree += step ) { value = (unsigned short)(TABLE_UNITY * cos( degree )); printf( "0x%04hx,%c", value, ++column % 8 ? ' ' : '\n' ); } return( 0 ); } /* End of File */
Listing 4 Floating-point and fixed-point line routines
/* These functions rely on the Borland graphics library */ #include <graphics.h> #define FixedPointShiftAmount 8 #define FPToInteger( FP ) ( FP >> FixedPointShiftAmount ) #define IntegerToFP( Integer ) ( Integer << FixedPointShiftAmount) typedef long FixedPnt; void FloatingPointLine( float x1, float y1, float x2, float y2 ) { float t, x, y; for( t = 0.0; t < 1.0; t += 0.05 ) { x = x1 + ( x2 - x1 ) * t; y = y1 + ( y2 - y1 ) * t; putpixel( x, y, 12 ); } } void FixedPointLine( short x1, short y1, short x2, short y2 ) { // This dt gives 128 steps FixedPnt t, dt = 1 << ( FixedPointShiftAmount - 6 ); short x, y; for( t = 0; t < IntegerToFP( 1 ); t += dt ) { x = x1 + FPToInteger( (( x2 - x1 ) * t )); y = y1 + FPToInteger( (( y2 - y1 ) * t )); putpixel( x, y, 13 ); } } /* End of File */
Listing 5 Iterative, fixed-point line routine
/* This function relies on the Borland graphics library */ #include <graphics.h> #define FixedPointShiftAmount 8 #define FPToInteger( FP ) ( FP >> FixedPointShiftAmount ) #define IntegerToFP( Integer ) ( Integer << FixedPointShiftAmount ) typedef long FixedPnt; void IterativeLine( short x1, short y1, short x2, short y2 ) { /* This dt gives 128 steps */ FixedPnt t, dt = 1 << ( FixedPointShiftAmount - 6 ); FixedPnt x, y, dx, dy; /* Compute FixedPnt deltas */ dx = ( x2 - x1 ) *dt; dy = (y2 - y1 ) * dt; /* Initial values */ x = IntegerToFP((long) x1 ); y = IntegerToFP((long) y1 ); putpixel( x1, y1, 15 ); for( t = 0; t < IntegerToFP( 1 ); t += dt ) { X += dx; y += dy; putpixel( FPToInteger( x ), FPToInteger( y ), 14 ); } } /* End of File */
Listing 6 Test program
/* This test driver is meant to be linked with listings 2, 4, and 5, and uses the Borland graphics library. Egavga.bgi needs to be in the same directory as the executable */ #include <conio.h> #include <stdio.h> #include <stdlib.h> #include <graphics.h> #define SIN_COLOR 12 #define COS_COLOR 12 typedef short Trig; /* describes a 2.14 fixed-point number */ typedef short Angle; /* describes a 12.4 fixed-point number */ extern Trig cosine( Angle angle ); extern Trig sine( Angle angle ); extern void FloatingPointLine( float x1, float y1, float x2, float y2 ); extern void FixedPointLine( short x1, short y1, short x2, short y2 ); extern void IterativeLine( short x1, short y1, short x2, short y2 ); static void DrawAxis( void ); static void Label( const char *pLabel ); void main( void ) { int driver = VGA, mode = VGAHI, errorcode; Angle angle; Trig sin, cos; short screenX, screenY; short centerX, centerY; /* Init graphics */ initgraph( &driver, &mode, "" ); if( ( errorcode = graphresult() ) != grOk ) { printf( "Error: %s\n", grapherrormsg( errorcode ) ); exit( 10 ); } centerX = getmaxx() / 2; centerY = getmaxy() / 2; /* Draw Sin function */ clearviewport(); DrawAxis(); Label( "Sin" ); for( angle=0x8000; angle<0x7fff; angle++ ) { /* Get sine value, scale, and convert to integer */ sin = sine( angle ); /* scale to screen */ screenX = angle >> 7; screenY = sin >> 7; /* plot */ putpixel( centerX + screenX, centerY + screenY, SIN_COLOR ); } getch(); /* Draw Cos function */ clearviewport(); DrawAxis(); Label( "Cos" ); for( angle=0x8000; angle<0x7fff; angle++ ) { /* Get sine value, scale, and convert to integer */ sin = cosine( angle ); /* scale to screen */ screenX = angle >> 7; screenY = sin >> 7; /* plot */ putpixel( centerX + screenX, centerY + screenY, SIN_COLOR ); } getch(); /* Draw Lines */ clearviewport(); DrawAxis(); Label( "Lines" ); FloatingPointLine( centerX, centerY - 50, centerX + 50, centerY ); FloatingPointLine( centerX, centerY + 50, centerX + 50, centerY ); FloatingPointLine( centerX - 50, centerY, centerX, centerY - 50 ); FloatingPointLine( centerX - 50, centerY, centerX, centerY + 50 ); FixedPointLine( centerX, centerY - 100, centerX + 100, centerY ); FixedPointLine( centerX, centerY + 100, centerX + 100, centerY ); FixedPointLine( centerX - 100, centerY, centerX, centerY - 100 ); FixedPointLine( centerX - 100, centerY, centerX, centerY + 100 ); IterativeLine( centerX, centerY - 150, centerX + 150, centerY ); IterativeLine( centerX, centerY + 150, centerX + 150, centerY ); IterativeLine( centerX - 150, centerY, centerX, centerY - 150 ); IterativeLine( centerX - 150, centerY, centerX, centerY + 150 ); getch(); /* DeInit */ closegraph(); } void DrawAxis( void ) { int maxx, maxy; /* get screen extents */ maxx = getmaxx(); maxy = getmaxy(); /* Draw Axis */ line( maxx / 2, 0, maxx / 2, maxy ); line( 0, maxy / 2, maxx, maxy / 2 ); } void Label( const char *pLabel ) { outtextxy( 5, 5, pLabel ); } /* End of File */