Back to TOC Columns


Questions & Answers

Pete Becker

Traits to the Rescue

How can a program know the range of a type without knowing the type in advance? Pete shows a method based on the magic of traits.

To ask Pete a question about C or C++, send e-mail to pbecker@wpo.borland.com, use subject line: Questions and Answers, or write to Pete Becker, C/C++ Users Journal, 1601 W. 23rd St., Ste. 200, Lawrence, KS 66046.


Q

My implementation of an associative test assumes that the comparison operators are defined for Key, so I can compare between keys without knowing their type in advance. But I also need to know the maximum and minimum possible keys, which depend on the actual Key type (e.g. -MAXINT...MAXINT in the case of int keys, "".."zzzzzz" in the case of string keys, etc.) How can I let my implementation know these values without knowing the type of Key in advance? In other words, is there a way to associate a constant value with a type, in the same way an operator is associated with a type?

A

This is easy to do, with a technique developed by some folks working on the ANSI/ISO standard, who ran into the same problem in designing the string and iostream templates. Their solution was to introduce an additional template that defines various traits for the type that is being used. Let's take a quick look at the sort of problem that this solves in the string template, then we can look at your specific problem.

The goal of the basic_string template in the ANSI/ISO working paper is to provide a uniform interface to strings of more or less arbitrary character types. The immediate motivation for this was the need for both a string class to hold ordinary char types and a wide string class to hold wide chars. These were originally two separate classes, named string and wstring, with nearly identical interfaces. The differences between the two were mostly in the argument types that their member functions took: the string class took text arguments of type const char *, and the wstring class took text arguments of type const wchar_t *. Some people thought that this redundancy was a bad thing, and proposed creating a template that could be instantiated for any character-like type.

This general string template quickly ran into a problem, though. Depending on the actual character type, there may be much more efficient ways of implementing common operations such as copying a string of characters. In a general-purpose template the only way to copy a string of characters is to write a short loop that copies individual characters:


for( int i = 0; i < length(); i++ )
     dest[i] = source[i];

Using a loop like that to copy char types, however, is almost certainly much slower than using the standard library functions that already know how to do such copies:


strcpy( dest, source );

or, if the string does not store a null terminator in its text buffer,


memcpy( dest, source, length() );

However, neither strcpy nor memcpy can handle strings of wchar_t, nor can they handle strings of arbitrary character types.

One possible solution to this efficiency problem would be to permit specialization of the basic_string template for these particular types. That would allow implementers to provide optimized versions of basic_string for the most common types, char and wchar_t, while still permitting users to create basic_strings of their own types when needed. That's not much better than the original proposal, though: it still involves two complete implementations of nearly identical string classes, one for chars and one for wchar_ts, and doesn't help users who need to create fast implementations of basic_string for other types.

The solution that the ISO/ANSI committee adopted was to create a traits template that tells the basic_string template how to handle time-critical operations efficiently. Rather than specialize the basic_string template for each of the character types, implementers can specialize the char_traits template -- a much less daunting task -- and get the efficiency they need. The skeleton of the char_traits template is shown in Listing 1.

Now, that's still a fairly large class, but it's nowhere near as large as the basic_string class itself, and it's definitely much simpler than the iostream hierarchy that it also supports. I'm not going to go through it in detail: most of the members are self-explanatory. Remember, though, that the purpose of this template is to provide a default implementation of all of its members, which will probably work with most character-like types. In order to work efficiently, it will almost certainly have to be specialized for the particular character type being used.

For example, the working paper requires that implementers provide a specialization for char that looks like this:


namespace std {
  struct char_traits<char> {
     typedef char char_type;
     typedef int int_type;
     typedef streampos pos_type;
     typedef streamoff off_type;
     typedef mbstate_t state_type;

     static void assign(
         char_type &c1,
         const char_type &c2 );
     static bool eq(
         const char_type &c1,
         const char_type &c2 );
     // remainder omitted.
  };
}


Looks a lot like the other one, doesn't it? In fact, the interface to all of the standard specializations of char_traits is the same except for the typedefs at the beginning. These need to be changed to reflect the actual type being specialized. To handle wide chars, the typedefs look like this:


namespace std {
  struct char_traits<wchar_t> {
     typedef wchar_t char_type;
     typedef wint_t int_type;
     typedef wstreampos pos_type;
     typedef wstreamoff off_type;
     typedef mbstate_t state_type;

     // remainder omitted.
  };
}

Since the member functions of char_traits are all written in terms of these typedef names, changing the typedefs themselves is all we need do to change the entire interface.

Each of these specializations has its own implementation, which can be written with its particular type in mind. So, for instance, to be able to use memcpy to copy chars, we'd implement the copy member function of char_traits<char> like this:


static inline char_type *
     char_traits<char>::copy( char_type *s1,
                              const char_type *s2,
                              size_t n )
{
memcpy( s1, s2, n );
}

Now whenever basic_string needs to copy characters around it can call traits<char>::copy and get this more efficient version. In fact the call involves a bit more indirection, but we need to look a bit more at basic_string to see how this is actually done.

In order to hook the traits class into basic_string we need another template parameter. The basic_string template actually looks like this:


template <class charT,
          class traits = char_traits<charT>,
          class Allocator = allocator<charT> >
class basic_string;


Don't worry about the third template parameter. Allocators aren't relevant to what we're discussing here. We're interested only in the first two parameters, which describe the character type that our string holds and the traits class that basic_string should use. Note that the traits class has a default type, which is simply the char_traits template instantiation for the character type being used. That's a convenience: it means that the standard library can provide a definition for a string of ordinary characters like this:


typedef basic_string<char> string;


This tells the compiler that the name string refers to basic_string<char, char_traits<char>, allocator<char> >. Within the template we can use the name traits to refer to the second parameter. That means that code within the template that needs to copy chars around looks like this:


traits::copy( dest, source, length );

As we've seen above, for basic_string<char>, traits::copy is an inline function that calls memcpy. That's how we write an efficient string template. So, to get back to your original question, how do we associate specific values with a particular type? It's easy: write a template and specialize it. Listing 2 makes a stab at it.

Notice that I've used the numeric_limits template to specify the minimum and maximum values for int. That's part of the standard library, and can be used for any of the built-in integral types. As its name suggests, it provides information about the numeric properties of the built-in types. If you stick to built-in types for your Key you don't need to provide your own template to tell you the minimum and maximum values: they're already provided for you by numeric_limits. I wouldn't recommend extending numeric_limits to cover char arrays, however; they're not really numeric types, and the things you have to provide to specialize numeric_limits just don't make sense for char arrays. Your design will be much cleaner if you provide your own template that's specific to the task you're working on, and pull the information you need from numeric_limits when it makes sense to.

Now you can write your code in terms of range<Key>::min() and range<Key>::max(), without having to worry about the details of the Key type. Of course, you have to provide template specializations for each of your Key types, but that's not a big job. Certainly not as big as rewriting your template code for each Key type.

Q

Your final realloc code in your August 1996 C/C++ Users Journal article contains an egregious bug, obvious at a glance. You can fix the bug and make the code more natural by removing count++; from inside the if{} and changing ptr[count-1] = val; to ptr[count++] = val;.

You really should at least read your code before publishing it (if you did read it, then you need to work on your reading comprehension of C code).
-- Jim Balter

A

You are, indeed, correct: the final code snippet does not increment count correctly. That's the sort of thing that happens when you incrementally change code to illustrate different approaches. Is there an Evelyn Wood reading comprehension course for C? o

Pete Becker is Senior Development Manager for C++ Quality Assurance at Borland International. He has been involved with C++ as a developer and manager at Borland for the past six years, and is Borland's principal representative to the ANSI/ISO C++ standardization committee.