Too many textbooks provide overly ornate approaches to simple combinatoric problems. Heres a simple technique that is all too often overlooked.
How many different ways can you combine items chosen from an N-element set? If order is unimportant, the number of such combinations is 2N, including all subsets of size 0 through size N. Now what if you needed to know the membership of each combination? This problem is inherently intractible for large N, but I (Larry) was asked by a fellow graduate researcher at the University of Michigan to create a generator of all such combinations [1] . After Id tried a number of possible solutions, colleague Mark Flacy suggested the following simple representation (see Listing 1) , which incidentally should run efficiently on almost any hardware:
Use N bits of a long integer; this can represent exactly 2N distinct values. Associate every bit position with one of the N items. Then every one of the 2N possible values has a unique combination of bits set to 1, and the positions of the set bits are indexes to the N items. Incrementing the integer from 1 to 2N covers all possible combinations.
To find out which bits of the integer are set, we use a bitmask with one bit set, and perform a logical AND between the integer and the bitmask. If the AND returns 1, the bit is set, so we have found an element of the combination; we emit in some way the element associated with that bits position. Then we shift the mask left by 1 bit position.
After we shift N times, we reset the mask and increment the integer. This generates a new combination, so we delimit it in some way from the previous combination, and then examine its bits as before.
Since we are performing 2N operations, we put the integer and mask in registers to minimize memory accesses. We use function pointers for both the delimiter and emitter functions so the implementor may use those functions most appropriate for the data. (Or, of course, an implementor may prefer to substitute appropriate inline code for efficiency.) We pass an unsigned char as the index to the emitter function, as N is not likely to exceed 32 on most machines, this being the largest scalar register size typically available. (The purpose is to conserve memory if the indexes are merely to be stored, since most implementations can store four times as many chars as longs, and twice as many as shorts.) The space requirement is three registers (plus two more for the loop counters) and whatever is required by the called delimiter and emitter functions. We expect the bulk of the run time (by far) to be taken up by the function calls. Tests show an unloaded Sparc to complete for N = 20 in a few seconds and N = 25 in about 5 minutes, calling null delimiter and emitter functions. (If that sounds slow for a set of register operations, do the math; the first figure represents over 20 million items emitted; the second, over 800 million.)
What if n > 8*sizeof(long)? Well, where are you going to put the data? If N = 32, youre going to have over 4 billion combinations, each composed of up to 32 objects, and each object of a non-zero size. If the ability to deal with larger N is required, it shouldnt be difficult to extend this representation to words larger than a long, but (of course) at considerable cost in speed.
A variant function, filtering for combinations of size K, is included on the code disk. o
Footnote
[1] Research funded by General Motors, NSF, and ARPA.
Mark Flacy and Larry Brunelle did graduate work together at the University of Michigan and are now employed as Members of Scientific Staff at Bell Northern Research in Richardson, TX. Mark and Larry may be reached, respectively, at flacy@bnr.ca or 73700.1577@compuserve.com, and larryb@eecs.umich.edu or 76334@compuserve.com.