Bobby recovers from the trade show blues, and examines a few proposed changes to Standard C.
Copyright © 1998 Robert H. Schmidt
A real grab-bag this time: editorials on SD and C9X, a new (to me) development environment, more on abstracted C++ pointers, and the ubiquitous Diligent Readers.
SaDistic bEAST
From what I've been told, attendance at SD East (29 September - 3 October) was around 700, down at least 300 from the same show six months earlier in San Francisco. While nobody seems to know for certain the reasons for this drop, my guesses include:
- the needs of the local population. I suspect that much of the work done within the Washington DC beltway differs from that done within the SF Bay area. A talk abstract appealing to a West attendee may leave an East attendee cold.
- Miller Freeman scheduling the East coast SD the same week as Rosh Hashana (Jewish New Year), preventing much of the regional population from attending.
- Miller Freeman also scheduling the East coast SD and the West coast Embedded Systems Conference for the same week. The ESC show attendance was way up, about 20% over expectations. Frequent flyer Dan Saks actually attended both shows, but I bet he's about the only one who did.
- Microsoft scheduling a programmers' conference the week before SD.
- Alan Greenspan's fears of inflationary attendance figures.
- El Niño.
The fallout from this, of course, was lowered attendance at many individual talks. And while other speakers had small numbers, none could match my nadir of zero. You read right: nobody came to my namespaces talk, which I'd given at previous conferences to enthusiastic response. Go figure.
We're going to try this again at SD '98 West, the second week of February in San Francisco. Attendance considerations aside, I still prefer that show: the flight is much shorter from Seattle, I can easily piggyback a trip to my family in Arizona, and Annie and I enjoy the California coast. In contrast, I won't be at the DC show, which is scheduled for August this time around. I don't know about you, but I find the choice of August in DC or August in Seattle a no-brainer.
MachTen CodeBuilder
On an Internet mail list, I caught wind of a virtual Unix development system running atop the Mac OS. Called MachTen CodeBuilder, this software is developed and sold by Tenon Intersystems. Once I visited the company's web site, http://www.tenon.com, I became quite intrigued by the technical specifications, wrote to the sales people, and managed to snare a complimentary software subscription (one of the perks of membership in the fourth estate).
In Unix acumen, I fall between rank novice and guru. Most of whatever knowledge I have was quite dormant before I installed CodeBuilder. Nonetheless, the installation went without a hitch, and I was able to configure the system to my taste. Granted, I've not yet tried anything fancier than a couple "Hello world" projects; but still, given the complexity of Unix and the number of places things could go boom, I was mildly impressed.
CodeBuilder solves two long-standing problems. First, I wanted a command-line interpreter on the Mac; while I have come to appreciate the Mac's interface, I have also missed the ability to manipulate files or text wholesale. Second, I wanted to test my published code and ideas against a larger range of translators. Since CodeBuilder includes GNU compilers for C and C++, I reckoned many of you (especially in academia) would find my CodeBuilder experiences relevant.
The package also features translators for Ada, Fortran, Java, and Objective-C, allowing me the chance to play with language comparisons. It supports an X Window System desktop as well, including a NeXT look-alike. With this amalgam of Mac OS and Unix, I'm guessing the experience is somewhat like that of Apple's Rhapsody.
Starting next month, I will run all my published code samples through the GNU compilers, and report on any discovered quirks or deviations from the ever-changing ISO Standards.
WG14 Follies
As a member of the ISO committee standardizing C9X (Version 2.0 of the C Language), I receive a torrent of messages from that committee's email reflector. Among the many other benefits, the frequent arrival of such messages ensures my modem stays connected while I surf the Web. By sharing this bounty with you, I provide the analogous benefit of ensuring your brain cells stay connected while you read this magazine.
Around the time you see these words, the first publicly available Working Draft of the C Standard should appear [1]. Anticipating this, I want to describe some new proposed types being added to C, and comment on the design issues they raise. (Since I'm writing this in October 1997, there's always a chance that the following will be dropped or modified in the Final Working Draft. I think it unlikely, however, since these proposals have been on the books for a long time.)
First off, 32-bit long and unsigned long are no longer the high cards in the integer deck. They've been trumped by the 64-bit long long and unsigned long long. Think of long long as the new Super Biggie fries to long's plain Biggies; what used to seem huge is now just ordinary.
As you may imagine, no addition of an integer type comes cheaply:
- To differentiate long long constants from plain long, tack on an extra L, as in 36LL. For unsigned long long, you now have such graceful examples as 11ULL, or its hex equivalent, 0xBULL.
- <limits.h> sports new values. The largest is ULLONG_MAX, with a mind-numbing minimum of 18,446,744,073,709,551,615. This is close enough to infinity for most everyday use, unless you are counting atoms in the galaxy or fans of this column [2].
- You can no longer assume that unsigned long holds the largest integer value. This renders familiar paradigms like
/* works regardless of size_t representation */
printf("size is %lu", (unsigned long) sizeof(n));
unreliable (see the sketch after this list).
- The grammar is complicated by the allowed repetition of the keyword long. While my reading may be in error, I can't find anything specific in the grammar forbidding the unusual
long unsigned long int n;
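Here's a minimal sketch of one workaround for the size_t paradigm above, assuming long long and its new ll length modifier are available (I'm told the draft also adds a z modifier just for size_t, but casting to the widest unsigned type is the conservative play):
#include <stdio.h>
int main(void)
{
    int n[10];
    /* cast to the new widest unsigned type; %llu matches unsigned long long */
    printf("size is %llu\n", (unsigned long long) sizeof(n));
    return 0;
}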
Integer Meta-types
By the time 21st-century versions of C and C++ roll around, we'll have to fix the above syntax; otherwise we face 256-bit long long long longs. The proposed C9X solution is a set of typedefs and macros living in the new standard header <inttypes.h>. These new definitions allow you to specify for a given implementation
- integers of exactly certain sizes
- integers of at least certain sizes
- the "fastest" or most efficient integers of certain sizes
- the fastest integer of any size
- the largest integer
- an integer big enough to hold a pointer
Examples:
int16_t       i;  // a signed int of exactly 16 bits
int_least16_t j;  // a signed int of at least 16 bits
int_fast16_t  k;  // an efficient 16-bit signed int
intfast_t     m;  // the most efficient signed int
intmax_t      n;  // the largest signed int
intptr_t      p;  // big enough to hold a (void *)
Each typedef has collateral macros specifying various limits, such as
INT16_MAX        // largest int16_t value
INT_LEAST16_MAX  // largest int_least16_t value
INT_FAST16_MAX   // largest int_fast16_t value
INTFAST_MAX      // largest intfast_t value
and so on. By testing for the presence of these macros, you can know if an implementation supports a particular sized integer:
#if defined INTPTR_MAX
printf("implementation has intptr_t");
#else
printf("implementation has no intptr_t");
#endif
<inttypes.h> also defines collateral macros abstracting scanf and printf format specifiers:
int16_t n;
// reads a hex int16_t
scanf(SCNx16, &n);
// writes a decimal int16_t
printf(PRId16, n);
constant definitions:
// n set to 64-bit 123
int64_t n = INT64_C(123);
and limits of types defined in other headers:
// largest possible size_t value
size_t s = SIZE_MAX;
as well as a handful of integer/string conversion functions.
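Putting a few of these pieces together, here's a minimal sketch; it assumes only that the draft's <inttypes.h> is present, and it uses the format macros in the complete-format-string style shown above:
#include <inttypes.h>
#include <stdio.h>
int main(void)
{
#if defined INT64_MAX
    /* the implementation supports an exact 64-bit signed type */
    int64_t big = INT64_C(123456789);
    printf(PRId64, big);
    printf("\n");
#else
    puts("no exact 64-bit type here");
#endif
    return 0;
}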
The combined effect is a thin abstraction around the normal char/short/int/long pantheon. But even this is really a systematic hack, reflecting the grammatical limitations of C. For C++, I envision the language eventually adopting template-like syntax for such attributed integer types, allowing declarations like
int<16> i;       // 16-bit signed integer
unsigned<64> u;  // 64-bit unsigned integer
With clever use of templates, I bet you could get close to simulating such a scheme today, an exercise well within the spirit of my ongoing column series, but one I definitely leave for the student.
<inttypes.h> has fostered much discussion on the reflector. The glut of global typedefs and especially macros does not sit well with many committee members. It certainly doesn't jibe with the C++ committee's efforts to move standard names out of the global namespace and to reduce dependency on macros. For this reason if nothing else, I have trouble believing the C++ group will ever adopt <inttypes.h> in its current form.
Because other new Standard C library functions may reference these sized types, it follows that a certain subset of these types must be available on all conforming implementations. But that may impose a burden on an implementation that, for example, really doesn't have a fast 32-bit unsigned int, but must support a library function using uint_fast32_t anyway.
Then there's the meaning of terms like "fixed-size integral types." For example, is an int16_t physically 16 bits, no more, no less? Or can the type contain values only in the range -32,768 to +32,767, with the possibility of "extra" unused bits in the actual hardware? Since -32,768 is the lower limit for 2's complement architectures, does an int16_t on a 1's complement machine have a different specified exact range? And what if an implementation maps several bit patterns (representations) into the same mathematical value: does the 16-bit limit refer to the number of representations, or to the number of values?
I'm guessing you have a general feeling of what these types mean. But for the legislature creating the C Constitution, general feelings are not enough. The committee moves asymptotically toward an impossible goal: identify exactly the hierarchy of needs in the C community, define an unambiguous technical vocabulary, precisely express needs and solutions in that vocabulary, and have those solutions interpreted identically by all educated parties. To help nudge the Standard closer to that goal, I urge you to read the Draft once it's publicly available, understand the parts of interest or importance to you, and let the committee members know your thoughts and feelings.
complex is Simple
Compared to the <inttypes.h> definitions, the new complex type brings fewer complications, in part because it's not competing with prior C art. Unlike the typedefs and macros in <inttypes.h>, complex is a new keyword. However, unlike "normal" type-name keywords such as int, complex is a keyword exactly when a translation unit includes the new standard header <complex.h>; otherwise, it's available for your use as an identifier.
This probably muddies your mental model, introducing a so-called keyword that is not innately known to the translator. Instead, the keyword nature must be enforced by the programmer. Such a solution avoids breaking legacy code with home-brew complex types. Were complex a real keyword, you'd be forced to change any of your identifiers named complex, even if you never intended to use the language's built-in complex type.
I certainly don't like the idea of complex being a keyword in one translation unit but not another, and actually prefer it not be considered a keyword at all. We have precedent for introducing standard names that aren't faux keywords (e.g., size_t). Perhaps a different designation, like "reserved identifier" or "predefined identifier," would better capture the spirit. Also, calling C's complex a keyword creates conflict with C++'s complex, which the C++ Standard does not consider a keyword.
Type complex works as you'd expect. Think of it as you think of FILE, a structure with supporting macros and functions, all defined in one header. The only potential surprise comes in the syntax. Where C++'s complex is a template:
complex<float> c; // float real and float imag parts
C's complex features a non-template syntax:
float complex c; // float real and float imag parts
As with the keyword vs. non-keyword debate earlier, this syntactic difference represents an incompatibility between C and C++. I'm led to an inescapable realization: once this new C Standard hits the streets, we can no longer think of C as a subset of C++. The respective committees have created a fork in the language design, which I expect only to widen with Version 2 of Standard C++.
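Before leaving complex, here's a minimal sketch of the C form in action. I'm assuming the draft's <complex.h> supplies an I macro for the imaginary unit and crealf/cimagf-style accessors, and that the usual arithmetic operators apply; treat those names as my assumptions, not draft gospel:
#include <complex.h>   /* complex is a keyword in this translation unit */
#include <stdio.h>
int main(void)
{
    float complex c = 1.0f + 2.0f * I;   /* I: assumed imaginary-unit macro */
    float complex d = c * c;             /* assumed operator support */
    printf("d = %f + %fi\n", crealf(d), cimagf(d));
    return 0;
}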
bool
Like complex, the proposed bool is a keyword in a translation unit exactly when its header <stdbool.h> is included. This header defines the usual suspects: bool, false, and true. The same philosophical fork from C++ exists here as with complex, although in a different direction: unlike complex, bool is considered by the C++ Standard to be a real keyword. (At least the syntax is the same in both languages.)
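A minimal sketch of the proposed usage, assuming the header behaves as described:
#include <stdbool.h>   /* bool, false, and true live here */
#include <stdio.h>
int main(void)
{
    bool seen = false;   /* bool acts as a keyword in this translation unit */
    if (!seen)
        puts("first time through");
    seen = true;
    return seen ? 0 : 1;
}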
Once you have a standard bool, you naturally want to start defining standard library routines in terms of bool. And unlike complex, which has its standard library routines ensconced in the same header defining the type, bool's standard library clients can more reasonably appear in any header.
But since bool is not a real keyword, this creates a dilemma: does a standard header referencing bool internally include <stdbool.h>, or does it require the user to include <stdbool.h> explicitly first, even though the user code may never reference bool directly?
Well, reading between the Standard's lines, I deduce standard headers don't include one another. For commonly used standard names (like size_t) referenced in multiple headers, the solution is to duplicate the needed definition in each header. Presumably then, any standard header referencing bool could simply define it. But this creates the prospect of a translation unit defining bool without having explicitly included <stdbool.h>. In such a translation unit, is bool still a keyword?
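To make the dilemma concrete, a standard header wanting bool might duplicate the definition under an include guard, along these lines (the guard name and the underlying type here are my inventions, not draft text):
/* hypothetical fragment of a standard header that references bool */
#ifndef _BOOL_DEFINED
#define _BOOL_DEFINED
#define bool  int   /* or whatever the committee settles on */
#define true  1
#define false 0
#endif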
I've long believed C held a certain design elegance, especially compared to C++. But issues like the above start wadding Standard C into the familiar C++-esque knot of conflicting goals, ambiguous meaning, and the dreaded Unintended Consequences. I'd rather Standard C either make bool and complex real keywords, break legacy code, and be done with it; or acknowledge the weight of prior art and leave them alone. It's too late to change the language into what some wish it had been all along, while simultaneously making that change invisible to those content with the present system [3].
Our Underwriters
Earlier I mentioned the folks at Tenon Intersystems providing me a complimentary subscription to CodeBuilder. The fact is, all the Mac OS development systems I use in this column come from such subscriptions. This specifically includes (along with CodeBuilder) Metrowerks CodeWarrior Professional and Symantec C++.
While Symantec appear to have given up on the Mac C++ market, Metrowerks continue to send updates every few months. Unfortunately, they are lagging pretty far behind the C++ Standard in some key areas (e.g., no support for namespaces, no partial template specialization, outdated STL definitions). I have not yet banged on CodeBuilder's GNU implementation enough to pass judgment, but will in the coming months.
On a related note, publishers Addison Wesley have kindly sent me copies of pretty much any technical book I've requested. My point person at AW, publicist Chris Guzikowski, has patiently and consistently fulfilled my requests, even though I forever put off reviewing books I promise him I'll get to.
None of these companies is run by altruists; all believe it in their self-interest to send me material. Even so, I feel right giving them all a long overdue public thanks.
Red October
In my October column, I cited Diligent Reader M. Seitz, first name unknown. I have since learned his full name is Matthew Seitz, and promised him I'd print a correction.
In response to that same column, Hugh S. Myers recommends the book Human Factors and Typography for More Readable Programs, written by Ronald M. Baecker and Aaron Marcus, and published by Addison Wesley in 1990. Hugh says this is the definitive book for those who must have "the printed (electronic or otherwise) form of their source code communicate."
More On Abstracted C++ Pointers
Diligent Reader Marcus Barnes takes exception to my use of macros in last September's column. I used those macros to tailor auto_pointer for objects allocated by ways other than scalar new. Marcus recommends a technique using an STL-like traits class. I'm going to ponder Marcus' proposal, and summarize it in the next column. In that same column I'll show a couple other ways you might want to allocate pointed-to objects, and how to represent them (probably in both my and Marcus' style).
But what about objects that aren't dynamically allocated at all? auto_pointer, in all its flavors, assumes it will eventually deallocate whatever it points to. Beyond that, auto_pointer also protects against some misuse of pointer semantics. If we removed the auto-ness from auto_pointer, we'd have a class offering stronger semantics for all pointers, regardless of how their referenced objects came to be:
template <class t>
class pointer
{
public:
    ~pointer();
    pointer(t * const = NULL);
    pointer(pointer const &);
    pointer(class nil const &);
    pointer &operator=(pointer const &);
    bool operator==(pointer const &) const;
    bool operator!=(pointer const &) const;
    pointer &operator+=(size_t const);
    pointer &operator-=(size_t const);
    operator t *() const;
    t **operator&();
    t * const *operator&() const;
    t &operator*() const;
    t *operator->() const;
private:
    pointer(void *);
    operator void *() const;
    t &operator[](size_t) const;
    t *pointer_;
};
Differences from auto_pointer:
- ~pointer() does not free the contained pointer_.
- There is no pointer ownership, obviating members release and is_owner_.
- No members are protected or virtual. auto_pointer was designed to be a base class, allowing variations in how pointer_ is auto-freed. Since pointer doesn't free anything, the same impetus for derivation no longer exists.
- A constructor accepting the nil pointer introduced last month.
- operator+= and operator-=. I felt these were too dangerous for auto_pointers, since they'd change the value of the underlying pointer_; when auto_pointer went to free pointer_, the freed value could be invalid. Since pointer doesn't free anything, these operations are safer.
- operator== and operator!=. In the presence of owned pointer_s, I wasn't sure of the semantic meaning of comparing two auto_pointers. Did p == q mean p.pointer_ == q.pointer_, or did it also require p.is_owner_ == q.is_owner_? Since class pointer has no ownership, the semantics of comparison are simpler.
The implementation for class pointer appears as Listing 1. You should find it an unsurprising mutation of auto_pointer.
Next month, I'll wrap up the pointer discussion, and introduce the next fundamental abstraction: arrays.
Notes
1. I don't know yet if this draft will be freely available on the Internet. Assuming it is, I suggest you try anonymous ftp to: ftp://ftp.dmk.com/DMK/sc22wg14. I'll track down the real answer by next month and publish the verdict.
2. Conversely, to hold the number of attendees at an SD presentation, a signed char should do, or a bool if it's at SD East.
3. For a more "real world" analogy, consider the ongoing debate over U.S. Tax Code reform.
Bobby Schmidt is a freelance writer, teacher, consultant, and programmer. He is also a member of the ANSI/ISO C standards committee, an alumnus of Microsoft, and an original "associate" of (Dan) Saks & Associates. In other career incarnations, Bobby has been a pool hall operator, radio DJ, private investigator, and astronomer. You may summon him at 14518 104th Ave NE Bothell WA 98011; by phone at +1-425-488-7696, or via Internet e-mail as rschmidt@netcom.com.