The Learning C/C++urve

Bobby Schmidt

Getting to the Point(er)

Bobby offers more opinions on protected vs. private inheritance, and on keeping track of resources under MAPI.

Copyright © 1997 Robert H. Schmidt

A shorter-than-usual column this month, as I recover from the Software Development conference held in San Francisco at the beginning of April. Sometimes I think this conference runs technical seminars as a smokescreen, covering its true nature as a non-stop party and schmoozefest. This show was particularly dense with social activity: the JavaOne booths were set up next door to Software Development's, leading to much cross-pollination on the show floors.

CUJ Managing Editor Marc Briand and I were literally front row center for Bill Gates' keynote. While I found Bill's pontification his usual thinly veiled product pitch, I did note a couple of interesting points. First, Windows is spreading like a virus into consumer and embedded hardware; Bill promises/threatens Windows CE in places where "you won't even know it's Windows." Second, like the rest of the world, Bill doesn't really know what to do with Java. I suspect he's just trying to stay with the pack until he figures out a way to either own the Java market, or spin the market into some other direction (ActiveJ anyone?).

With the prominence of Java at these shows, I found myself recalling a scene from Apocalypse Now. Martin Sheen's character (Captain Willard) comes upon a seemingly perpetual battle: the Americans build Do Lung bridge each night, only to have the North Vietnamese destroy it each morning. Willard looks in vain for the Americans' commanding officer, eventually finding a soldier who thinks that Willard is in command. The truth is, nobody is in command, yet all parties keep flailing away as if somebody, somewhere, has a master plan.

I also thought about Beanie Babies. Every kid (and parent) now has to have them, even though their value is artificially created by every other kid and parent wanting them — a self-contained synthetic market. I counted around 200 Java vendors at these shows. Can the market sustain this? Do these vendors sell to anyone except one another? Maybe Java doesn't technically fit the definition of a pyramid scheme, but I wonder if the effect won't be the same. Today's Java Beanie Baby [1] could be tomorrow's Ada Cabbage Patch doll.

More on Access Control

I've received two more e-letters on protected vs. private, which (I hope!) are the last on this topic, for a few months anyway.

Diligent Reader the First a.k.a. Sean Hoffman writes in defense of protected. I'm excerpting most of Sean's letter here, since he raises several points I'd like to address:

"...[Y]ou made a point regarding the 'privacy' of member data in class declarations. You go on to say that your practice is to declare all member data private.

I vociferously disagree, based on the following reason.

By declaring data as private, you are making a fundamental assumption about all future derived classes that would/could be derived from your class. I don't think it's fair to assume that at the time a base class is/was written, you will know all possible means in which it could be utilized in the future.

...[T]here have been several times when I've cursed IBM for making critical data that I needed to get to private, in which case my only alternative is/was to roll my own class from scratch, duplicating IBM's functionality and killing the whole idea of code reuse.

I believe in the whole idea of protecting data, but you have to assume that someone who's inheriting your class will at least have done enough homework on studying your class to know what they're getting themselves into. Basically, you've got to trust that they'll use their access wisely.

I'd go even further to say that, one should lean towards using the protected keyword even more than private, as it lends itself more easily to inheritance."

To me, Sean's argument distills out to three points:

Authors of base classes can't presume to know all contexts using those classes.

At the same time, those authors can assume that class users thoroughly research and understand the base classes.

Easy inheritance is desirable.

Ironically enough, where Sean finds these points justification for protected, I find them cause for private. I differentiate "could be utilized" (Sean's phrase) from "should be utilized" or "could be reliably utilized." My perspective is conservative: if you can't make reasonable guarantees that people can successfully muck with your object's state (as preserved in data members), then don't tempt them. Put another way, if you can show only some use is valid, and your options are to allow no use (private) or essentially unrestricted use (protected with thin derivation), choose the former [2].

The third point above (desirable inheritance) transcends protected/private, and probably warrants its own column. I find inheritance overused (especially by C++ novices) and, in the presence of virtual bases, borderline dangerous. In my experience, large inheritance schemes are thickets of poor cohesion and pathologically tight coupling (I submit MFC as defense exhibit A).

Code Reuse

Diligent Reader the Second (Golden E. Murray) opines:

"Having spent a lot of time with interface libraries and trying to extend them I have come to form the opinion that it is better to not use the private access specifier unless the data you are trying to protect is really critical and must not be changed in a derived class. Part of the whole reason behind OOP (as I see it) is to provide reusable (and yet safe) components. If someone really wants to use the standard specified access modifier, well, that's their problem. After all, the private access specifier is just as vulnerable.

How?!?!?! you say?

Well, when I was working with the Zinc interface library years ago, they had specified many of their data members to be private.

Too many of them, so that I could not extend their scrollbar to do what I needed. So, I fired up the editor, opened the header file, and changed the access specifier to "protected" as it should have been declared in the first place. It works even better than the standard access modifier — it works with any and all compilers!"

Much of my response here hinges on the definition of "code reuse." As I see it, once you alter a class from its originally authored state, you are no longer reusing the same code — you have instead cloned the class, and are using (not reusing) the clone. I contrast this to client code that uses a base class as is, and is therefore sensitive to any changes in that base (e.g., code calling into or deriving from the C++ STL classes).

If you take a snapshot of a class, then change the access specifiers as Golden suggests, the original base class is no worse off; the class's author can change the base with no effect on you. The clone becomes part of your project, not the base author's, and is therefore under your control. You become responsible for regression testing clone changes — the original author is no longer on the hook to maintain compatibility.

I think my view stems from this premise: a class author understands that class better than does a class user. As Sean says, a class author can't know everywhere a class can be used; but I'd wager that author knows better than you or I where the class can't be used. Changing a base class requires a leap of faith: you assume you can successfully alter the code's private state outside the code's control, without encountering the unforgiving Law Of Unintended Consequences.

In the end, protected is one of those half-empty half-full concepts: you can see it as either allowing extra access relative to private, or as restricting access relative to public. However, given that protected is really a hint [3], the choice effectively comes to full public or full private; in support of this, I almost never use protected in my own work. But as Golden suggests, all access control (or for that matter, anything a class author writes) is ultimately a hint, since any text editor can trump it.
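A minimal sketch of why protected amounts to a hint: a thin derived class can republish a protected member as public with a using-declaration. The names Counter, OpenCounter, and demo_elevated_access are hypothetical, invented only for this illustration.

```cpp
#include <cassert>

// Hypothetical base class: count_ is protected, so its author
// intends only derived classes to touch it.
class Counter
    {
public:
    Counter() : count_(0) {}
    int count() const { return count_; }
protected:
    int count_;
    };

// Thin derivation: a using-declaration in a public section
// re-exposes the protected member to every caller.
class OpenCounter : public Counter
    {
public:
    using Counter::count_;
    };

int demo_elevated_access()
    {
    OpenCounter c;
    c.count_ = 42;      // legal: count_ is public when named through OpenCounter
    assert(c.count() == 42);
    return c.count();
    }
```

Nothing in Counter can prevent this derivation, so any invariant the base class maintains on count_ depends on the goodwill of every deriving class. Had count_ been private, the using-declaration would not compile.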

Back to Our Regularly Scheduled Program

With this protected imbroglio safely receding in our rear-view mirror, let us turn once again to our abstraction saga vis-a-vis Microsoft's COM and MAPI. Last month I promised we'd explore our first abstracted type for this project, and explore we shall. But first, a little background/refresher.

At the start of this year, I wrote a couple of columns surveying fundamental C and C++ abstraction techniques. In April, I introduced Microsoft's COM and MAPI as examples of interfaces not tuned to C++'s abstraction strengths. I suggested then that exploring alternatives to standard MAPI would give real-world examples of C++ abstraction at work.

In these next months, we'll abstract basic or foundational types that work in many contexts, and that give rise to more complex abstractions, all within the framework of MAPI. I start off this month with pointers, first by surveying their role in MAPI, then following with the first of several abstracted alternatives.

MAPI functions have two ways to hand values back to callers:

function return values

by-address (pointer) parameters

Note that reference parameters are not in this short list. Because MAPI is language-independent, it aims for the lowest common (interface) denominator. Many languages support pointers, or can be coerced into simulating them; few languages support references. Thus, MAPI traffics heavily in pointers.

While pointers have the virtue of ubiquity, they suffer several well-known problems in C++. Among them:

Non-static pointers are not automatically initialized to NULL or some other well-defined pointer value.

Not all pointed-to objects are allocated and deallocated identically, or at all.

Jumps out of sequence (e.g., via exception, return, or exit) can leave pointed-to objects in memory.

Pointers to scalars can be indexed as if they pointed to arrays.

These problems are exacerbated in MAPI-calling functions, which often have multiple return points. Because MAPI is not tuned to C++, MAPI functions don't throw exceptions; instead, they return success/failure codes which callers must interrogate. At each MAPI call that could fail, the caller must know which pointers to deallocate and how to deallocate them.

For example, much of Microsoft's MAPI client code (both internally and in public samples) looks like
SomeMAPIType *p1, *p2, *p3;
HRESULT result;

result = SomeMAPICall1(&p1);
if (SUCCEEDED(result))
    {
    result = SomeMAPICall2(p1, &p2);
    if (SUCCEEDED(result))
        {
        result = SomeMAPICall3(&p3);
        if (SUCCEEDED(result))
            SomeMAPICall4();
        else
            {
            // p3 was never allocated; undo p1 and p2 only
            p1->Release();
            delete p2;
            return E_SOME_ERROR_CODE_1;
            }
        MAPIFreeBuffer(p3);
        }
    else
        {
        // p2 was never allocated; undo p1 only
        p1->Release();
        return E_SOME_ERROR_CODE_2;
        }
    delete p2;
    }
else
    return E_SOME_ERROR_CODE_4;
p1->Release();
return S_OK;
where

HRESULT holds a COM function result code.

SUCCEEDED is a standard COM macro, testing HRESULTs for success.

S_OK is the standard COM result code for success.

E_xxx is a placeholder for Microsoft's predefined COM error result codes.

Release is the real Microsoft COM member corresponding to the reference-decrementing release function I described in April's column.

MAPIFreeBuffer — surprise! — frees a buffer allocated by MAPI.

You could streamline the code by pulling common sequences into functions, corralling pointer declarations into nested blocks, or using one of several other techniques. However you repackage this code (which is essentially C design written in C++), you still must cope with pointer bookkeeping.
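As one illustration of such repackaging, the sketch below initializes every pointer to 0 and funnels all exits through a single cleanup sequence. It is only a sketch under stated assumptions: Status, FakeObject, FakeOpen, FakeAllocate, and FakeFreeBuffer are invented stand-ins, not real MAPI or COM declarations.

```cpp
#include <cassert>
#include <cstdlib>

// Stand-ins for illustration only; none of these are real MAPI or COM names.
typedef long Status;
const Status kOk = 0;
const Status kFail = -1;

int outstanding = 0;        // resources handed out and not yet returned

struct FakeObject           // loosely mimics a COM object holding one reference
    {
    void Release() { --outstanding; delete this; }
    };

Status FakeOpen(FakeObject **out)
    {
    *out = new FakeObject;
    ++outstanding;
    return kOk;
    }

Status FakeAllocate(bool fail, void **out)
    {
    if (fail)
        return kFail;           // simulated failure: *out is left untouched
    *out = std::malloc(16);
    if (*out == 0)
        return kFail;
    ++outstanding;
    return kOk;
    }

void FakeFreeBuffer(void *p)
    {
    std::free(p);
    --outstanding;
    }

// One exit, one cleanup sequence. The caller must still remember
// which pointers exist and how each one is freed.
Status do_work(bool fail_allocation)
    {
    FakeObject *object = 0;
    void *buffer = 0;

    Status status = FakeOpen(&object);
    if (status == kOk)
        status = FakeAllocate(fail_allocation, &buffer);

    if (buffer != 0)
        FakeFreeBuffer(buffer);
    if (object != 0)
        object->Release();
    return status;
    }

bool demo_no_leaks()
    {
    Status good = do_work(false);
    assert(outstanding == 0);
    Status bad = do_work(true);
    assert(outstanding == 0);
    return good == kOk && bad == kFail;
    }
```

The nesting is gone, but the bookkeeping is not: do_work still pairs each pointer by hand with its matching release call.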

To solve these problems, I'd like to rearchitect C++ pointers so they have those C features I like (including COM compatibility) without suffering the C limitations I don't like. As is my typical solution for such dilemmas, I'll craft a C++ class that starts with the default C++ behavior, selectively restricting and enhancing that behavior through the class interface. Happily, Standard C++ already offers an excellent starting point for just such a class: auto_ptr.

auto_ptr

P.J. Plauger described the auto_ptr template class in his July 1996 CUJ column. That class is a black sheep within the otherwise STLish header <memory> (although I suppose you could argue any pointer is an iterator, thus qualifying as an STL wannabe). Because Plauger's column listed out the auto_ptr contents, I will not repeat that listing here. For our discussion, know that an auto_ptr acts much like a normal C++ pointer with several crucial differences:

An auto_ptr must initialize from some other pointer value (the default constructor initializes from NULL). This reduces the chance of dereferencing a dangling or uninitialized pointer.

An auto_ptr contains a private pointer to a dynamically-allocated object, much as a string contains a private char *.

When an auto_ptr goes out of scope, its destructor may call delete on the private pointed-to object, removing a prime opportunity for memory leakage.

The private object is owned: at any moment, at most one auto_ptr is responsible for deleting it.

To understand this last notion more fully, consider what happens when the auto_ptrs in
{
auto_ptr<char> p1(new char);
auto_ptr<char> p2(new char);
auto_ptr<char> p3 = p2;
}
go out of scope:

p1's destructor calls delete on its allocated (and owned) char.

p2 allocates a char, then hands ownership of that char to p3 as a side-effect of being copied to p3. As a result, p2's destructor does not touch the allocated char.

p3, although it did not originally allocate the char, owns the char anyway by virtue of the copy construction from p2. Thus, p3's destructor deletes the char.
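The walk-through above can be sketched in runnable form. Because std::auto_ptr is no longer part of current C++, this sketch uses a hypothetical stand-in, owner_flag_ptr, that models only the behavior described here: a pointer plus an ownership flag, with copying handing the flag to the copy. Tracked and demo_ownership are likewise invented for the illustration.

```cpp
#include <cassert>

int deletions = 0;          // counts how many Tracked objects were destroyed

struct Tracked
    {
    ~Tracked() { ++deletions; }
    };

// Hypothetical stand-in, not the library class: copying transfers
// ownership but leaves the source still pointing at the object.
template <class T>
class owner_flag_ptr
    {
public:
    explicit owner_flag_ptr(T *p = 0) : owns_(p != 0), ptr_(p) {}
    owner_flag_ptr(const owner_flag_ptr &other)
        : owns_(other.owns_), ptr_(other.ptr_)
        { other.owns_ = false; }
    ~owner_flag_ptr() { if (owns_) delete ptr_; }
    bool owns() const { return owns_; }
    T *get() const { return ptr_; }
private:
    owner_flag_ptr &operator=(const owner_flag_ptr &); // assignment omitted from this sketch
    mutable bool owns_;
    T *ptr_;
    };

int demo_ownership()
    {
    deletions = 0;
        {
        owner_flag_ptr<Tracked> p1(new Tracked);
        owner_flag_ptr<Tracked> p2(new Tracked);
        owner_flag_ptr<Tracked> p3 = p2;    // p3 takes ownership from p2
        assert(p1.owns() && !p2.owns() && p3.owns());
        assert(p2.get() == p3.get());       // both still hold the same address
        }
    return deletions;                       // one deletion per allocated object
    }
```

Two objects are allocated and two are destroyed: p1 deletes its own, p3 deletes the one p2 allocated, and p2 deletes nothing.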

Such ownership helps ensure that the contained object is deleted exactly once, even in the presence of exceptions and other early exits. However, the ownership scheme is not bullet-proof:
auto_ptr<char> p1(new char);
if (true)
    {
    auto_ptr<char> p2 = p1; // #1
    }                       // #2
*p1 = 'a';                  // #3
At the line marked #1, p2 owns the allocated char as a side-effect of being copied from p1. Once p2 goes out of scope (#2), its destructor deletes that allocated char. At #3, p1 then tries to assign into the (now non-existent) char.

To be fair, this same problem can occur even with plain C-style pointers; the trouble stems from the final user of a resource not being the one freeing that resource. auto_ptr makes a reasonable stab at remedying this for many situations. Were this its only liability, we could consider auto_ptr as a replacement for true pointers in MAPI calls. Unfortunately, ownership and destruction of objects is not always as simple as auto_ptr would suggest.

Many MAPI calls allocate COM objects on your behalf, incrementing the object's reference count and assuming you will similarly decrement that count when you're done with the object. Once the object's count hits zero, the object is supposed to self-destruct. In contrast, objects contained by auto_ptrs do not self-destruct; instead they are explicitly deleted by the auto_ptr destructor. Clearly, these two philosophies are in opposition.
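To make the clash concrete, here is a minimal sketch of a reference-counted object and a holder whose destructor calls Release rather than delete. It assumes nothing about real COM beyond the counting behavior just described; CountedObject, ReleasingHolder, and demo_release are hypothetical names, and this is not the design the following columns develop.

```cpp
#include <cassert>

// Hypothetical reference-counted object standing in for a COM object.
// It destroys itself when its count reaches zero.
class CountedObject
    {
public:
    static int alive;                   // number of live objects
    CountedObject() : refs_(1) { ++alive; }
    unsigned long AddRef() { return ++refs_; }
    unsigned long Release()
        {
        unsigned long remaining = --refs_;
        if (remaining == 0)
            delete this;                // self-destruct
        return remaining;
        }
private:
    ~CountedObject() { --alive; }       // private: outside code cannot call delete
    unsigned long refs_;
    };

int CountedObject::alive = 0;

// Hypothetical holder: gives up one reference instead of deleting.
class ReleasingHolder
    {
public:
    explicit ReleasingHolder(CountedObject *p) : ptr_(p) {}
    ~ReleasingHolder() { if (ptr_ != 0) ptr_->Release(); }
private:
    ReleasingHolder(const ReleasingHolder &);            // not copyable
    ReleasingHolder &operator=(const ReleasingHolder &);
    CountedObject *ptr_;
    };

bool demo_release()
    {
    CountedObject *raw = new CountedObject;     // count is 1
    raw->AddRef();                              // a second user: count is 2
        {
        ReleasingHolder holder(raw);            // adopts one of the references
        }                                       // holder releases: count is 1
    bool survived_holder = (CountedObject::alive == 1);
    raw->Release();                             // count is 0: object deletes itself
    return survived_holder && CountedObject::alive == 0;
    }
```

A holder that called delete on raw would destroy the object while the second user still held a reference to it. In this sketch it would not even compile, since the destructor is private.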

Further, for objects that do need explicit destruction, auto_ptr assumes scalar delete is the proper destruction vehicle. This assumption is fine for the simple example of
auto_ptr<char> p(new char);
but not so fine for
auto_ptr<char> p(new char[1]);
which requires delete [] in the auto_ptr destructor, or
auto_ptr<char> p((char *) malloc(1));
which requires free. As they are independent of C++, MAPI objects (even those without reference counts) are typically not created by new, rendering auto_ptr's assumption of delete invalid.
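One way to picture the mismatch is a holder that takes its destruction method as a template parameter, so each allocation is paired with the matching release. This is a sketch under stated assumptions, not auto_ptr and not the class developed in later columns; char_holder, the three policy structs, and demo_matching_destruction are invented names.

```cpp
#include <cassert>
#include <cstdlib>

int scalar_deletes = 0, array_deletes = 0, frees = 0;

// Hypothetical policies: each knows exactly one way to destroy.
struct ScalarDelete
    {
    static void destroy(char *p) { delete p; ++scalar_deletes; }
    };

struct ArrayDelete
    {
    static void destroy(char *p) { delete [] p; ++array_deletes; }
    };

struct MallocFree
    {
    static void destroy(char *p) { std::free(p); ++frees; }
    };

// Hypothetical holder: the destructor defers to the chosen policy
// instead of assuming scalar delete.
template <class Destroyer>
class char_holder
    {
public:
    explicit char_holder(char *p) : ptr_(p) {}
    ~char_holder() { if (ptr_ != 0) Destroyer::destroy(ptr_); }
    char *get() const { return ptr_; }
private:
    char_holder(const char_holder &);            // not copyable
    char_holder &operator=(const char_holder &);
    char *ptr_;
    };

bool demo_matching_destruction()
    {
    scalar_deletes = array_deletes = frees = 0;
        {
        char_holder<ScalarDelete> a(new char);
        char_holder<ArrayDelete> b(new char[1]);
        char_holder<MallocFree> c(static_cast<char *>(std::malloc(1)));
        assert(a.get() != 0 && b.get() != 0);
        }
    return scalar_deletes == 1 && array_deletes == 1 && frees == 1;
    }
```

The design choice here is to make the pairing part of the holder's type, so a char from new[] cannot silently end up in a holder that calls plain delete.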

All this means that, as specified in the C++ Standard, auto_ptr is unsuitable for many MAPI objects. This does not mean, however, that the idea underlying auto_ptr is unsuitable; in fact, encapsulating pointers in class objects is a foundation of C++ type design. As demonstration, next month I'll incrementally modify auto_ptr to address the above MAPI-related limitations. In addition, I'll make other changes to auto_ptr, independent of the MAPI considerations, that make the class safer. Finally, I'll start extrapolating from this auto_ptr-inspired class to others abstracting pointer and pointer-like behavior.

Notes

[1] And no, I didn't see the pun of "Java Bean" and "Beanie Baby" until I literally typed the words here. Now that the genie's out of the bottle, I wonder if we'll see little bean-filled Java marketing collateral at future trade shows. Personally, I'm waiting for the first Java lamp, with blobs of market share oozing up and down.

[2] To quote Robert Heinlein: "You can have peace. Or you can have freedom. Don't ever count on having both at once."

[3] As I've discussed in previous columns (and at SD), derived classes can easily elevate protected member access to public.

Bobby Schmidt is a freelance writer, teacher, consultant, and programmer. He is also a member of the ANSI/ISO C standards committee, an alumnus of Microsoft, and an original "associate" of (Dan) Saks & Associates. In other career incarnations, Bobby has been a pool hall operator, radio DJ, private investigator, and astronomer. You may summon him at 14518 104th Ave NE Bothell WA 98011; by phone at +1-206-488-7696, or via Internet e-mail as rschmidt@netcom.com.