If you're having trouble getting all those parentheses right on a complex new-expression, maybe you need Dan's latest parsing tool.
Copyright © 1997 by Dan Saks
Last month, as part of my continuing series on new- and delete-expressions, I presented a grammar for new-expressions. I compared that grammar with a grammar for declarations in an effort to demonstrate that the skills you need to master new-expressions are essentially the same skills you need to master declarations.
Last year, as part of a series on declarations, I presented a program called decl that translated C++ declarations into English. (The final form of that program appeared in "C++ Theory and Practice: Declarators, Finale," CUJ, October 1996.) If, in fact, there is considerable overlap between the syntax of new-expressions and the syntax of declarations, then it should have been easy for me to adapt decl into another program that translates new-expressions into English. Indeed it was, and I will show you that program, which I call newexpr.
With decl, you can type in a declaration such as:
T *T::*f(int) const;and decl replies with:
f is ... const member function with parameter(s) ... <type> is ... int returning ... pointer to member of T with type ... pointer to ... TWith newexpr, you can type in a new-expression such as:
new T*[N];and the program responds with:
<type> is ... array with N elements of type... pointer to ... T without an initializer list using an allocation function selected by member lookupnewexpr can recognize a new-expression with a placement argument list, an initializer, or a leading :: (scope resolution operator). For example, given the input:
::new (a, 1) T (x);newexpr replies:
<type> is ... T with initializer list: x with placement arguments: a, 1 using a global allocation functionnewexpr can detect many (but not all) input errors. For example, newexpr responds to the input:
new T[N](3);with the message:
error: array type cannot have non-empty initializer listAn Updated decl
Before I explain how I produced newexpr from decl, let me take just a moment to review decl. Very briefly, the program consists of three files, scanner.h, scanner.cpp, and decl.cpp. scanner.h and scanner.cpp define and implement the scanner class. A scanner object reads characters from an input stream and groups those characters into tokens, such as identifiers, literals, and operators. The scanner represents each token as an object of class type token, defined and implemented entirely in scanner.h.
decl.cpp defines and implements the parser class. A parser object repeatedly obtains a sequence of tokens from a scanner, verifies that the token sequence constitutes a valid declaration, and if so, translates it into English.
Since I last showed you the decl program, I've made a few small changes. You can obtain the updated code from the CUJ online sources (see page 3 for details). I'll just alert you to the changes.
Some of the changes reflect that fact that I'm using newer compiler versions. I tested the code with the 32-bit Borland C++ compiler (bcc32) version 5.01, and with Microsoft Visual C++ 5.0, both under Windows 95. The C++ that these compilers accept is sufficiently similar that I could take advantage of new language and library features without cluttering the code with #ifdefs.
In particular, both compilers now support bool as a built-in type, so decl no longer simulates the bool type. If your compiler does not yet support bool, you must supply definitions for bool, false, and true, such as
enum bool { false, true };or
#define int bool #define false 0 #define true 1Earlier versions of decl contained instances of the following using-directive in a conditional-compilation wrapper:
#ifdef NAMESPACE using namespace std; #endifThe wrapper was a concession to the fact that some compilers placed STL (Standard Template Library) components, such as the class template deque, in namespace std while other compilers didn't support namespaces at all. Both compilers I used for the latest version of decl now place the STL components in namespace std. decl still has the using-directives, but not the surrounding preprocessor statements.
The latest version of decl also reflects my recently acquired habit of writing the const qualifier, if present, as the rightmost specifier in a type-specifier sequence. That is, I now write declarations such as
scanner(scanner const &);in preference to
scanner(const scanner &);This is purely a matter of style. It allows you to read declarations more naturally from right to left, as "reference to constant scanner" rather than as "reference to a scanner that's const".
Hardly anyone, except for my pal Bobby Schmidt, places const as I do. When I discussed the merits of this style (in "C++ Theory and Practice: Mixing const with Type Names," CUJ, December 1996), I wasn't using this style myself. Since then, I've decided to give it a serious try. I realize that I'm swimming against a very strong tide, and so someday I may look back on this as just an experiment.
Non-converting Constructors
Another small refinement is that the scanner declares its contructor using the keyword explicit:
explicit scanner(istream &);This prevents my code from accidentally using this constructor as a user-defined conversion. Permit me to digress for a bit while I elaborate [1] .
By default, a constructor that you can call with one argument is a converting constructor. For example, suppose class rational represents exact fractions, such as 1/3 or -2/5:
class rational { public: rational(int); ...This rational constructor lets you initialize a rational number by specifying only the numerator. For instance,
rational r(3);initializes rational number r with the value 3/1.
Since this constructor is a converting constructor, it is also a rule for converting an int into a rational. If, for example, the program also contains a function
void f(rational);and a call such as f(3), the compiler does not simply reject the call. Rather, it uses the converting constructor to convert 3 to a rational and passes the result of the conversion as the argument to f.
For an arithmetic types such as rational and int, using a constructor as an implicit conversion rule makes sense. However, for many other types, such implicit conversions can be a source of errors.
In the earlier versions of decl, the scanner constructor
scanner(istream &);is also a converting constructor. However, I never intended the compiler to use this a rule to implicitly convert an istream into a scanner. I only want to use it to explicitly construct scanner objects. Therefore, I added the keyword explicit to its declaration.
The keyword explicit is a fairly recent addition to C++. I believe that explicit should have been the default all along, but such is life. Now that the keyword explicit is available, you'll see me using it more often, and I encourage you to do the same.
The keyword explicit has no effect on constructors that could never be converting constructors. That is, it has no effect on a constructor that must be called with zero arguments, or with two or more arguments. Adding the keyword explicit to such constructors does no harm. Therefore, you can opt for a style that places explicit in front of every constructor declaration.
On the other hand, the keyword explicit does have an effect on a constructor such as
T(int, int = 0);Although this constructor has two parameters, the second has a default argument value. Thus, you can call this constructor with only one argument, so it is a converting constructor. Adding the keyword explicit to this constructor's declaration renders it non-converting.
Updating the Scanner
I began turning the decl program into the newexpr program by comparing their grammars. Table 1 shows the grammar for C++ object and function declarations that I used in designing decl. Table 2 shows the grammar for new-expressions that I used in designing newexpr. Both grammars are expressed in the EBNF notation summarized in Table 3.
Many of the productions (grammar rules) in Table 2 refer to productions defined in Table 1. For example, the production
type-id = type-specifier-seq abstract-declarator .appearing in Table 2 refers to abstract-declarator defined in Table 1. The grammar in Table 3 ultimately refers to all but the first three productions from Table 1.
The grammar for new-expressions introduces only one token not already in the grammar for declarations, namely the keyword new. Modifying the scanner to work with newexpr required only three small changes.
The definition for class scanner in scanner.h defines an enumerated type category with an enumerator for each different kind of token. I simply added another enumerator called NEW to represent the keyword new.
class token { friend class scanner; public: enum category { ... NEW, ... };The scanner class has a static data member keyword which is a table that contains the spelling for each keyword and its associated token category value. I added an entry for new to that member's definition appearing in scanner.cpp:
scanner::tk_pair const scanner::keyword[] = { ... { "new", token::NEW }, ... };The image function converts a token category to a null-terminated character string (mostly for displaying readable error messages). I added one more case to the switch statement for the keyword new:
char const *image(token::category tc) { switch (tc) { ... case token::NEW: return "new"; ... } return "no such token"; }These changes do not interfere with the operation of the decl parser in any way. Therefore, decl and newexpr use the same scanner.
The newexpr Parser
Writing a parser for newexpr was a bit more work than extending the scanner. Although I was able to transplant most of decl's parsing functions into newexpr without change, I had to put a little effort into altering the declarator and direct_declarator functions. I also had to add a function for expression-list to handle initializers and placement arguments.
Listing 1 contains a partial listing of the file newexpr.cpp, the newexpr parser. It omits those functions that are identical to the corresponding functions in the decl parser. Again, you can obtain the complete source from the CUJ online sources.
Earlier versions of the decl parser had a member function called decl_specifier_seq. In a declaration, the sequence of specifiers leading up to a declarator can be storage-class-specifiers (such as static or extern) or function-specifiers (such as inline or virtual) as well as type-specifiers (such as int, const, or type-names). In a new-expression, the specifiers can be only type-specifiers. For example, you can declare
extern char *p;but you cannot allocate storage using
p = new extern char *;because extern is not a type-specifier.
The grammar in Table 1 shows that the decl program doesn't recognize any decl-specifiers other than type-specifiers. Thus, the decl-specifier-seq production in Table 1 is actually equivalent to the type-specifier-seq production in Table 2. In the new version of decl, I simply renamed the decl_specifier_seq function as type_specifier_seq so I could use the same exact function in both decl and newexpr.
The declarators in a declaration can be either abstract or concrete. An abstract-declarator is one that cannot have a declarator-id, and a concrete-declarator is one that must have a declarator-id. In the decl program, the declarator and direct_declarator functions do the work of parsing both kinds of declarator. Both functions accept a parameter of enumerated type declarator_category, which indicates whether the particular call can accept an abstract-declarator, a concrete-declarator, or either.
New-expressions can accept a third kind of declarator, called a new-declarator. As I explained last month, a new-declarator is essentially an abstract-declarator that cannot contain parentheses. For example, in:
p = new T *[N];T is a type-specifier and *[N] is a new-declarator, and together they mean "array with N elements of type pointer to T". If you try to use parentheses in the new-declarator, as in
p = new T (*)[N];compilers will interpret (*) as an ill-formed initializer, and [N] as a subscript operator following the new-expression.
The only way to get parentheses into the declarator of a new- expression is to enclose the entire allocation type in parentheses, as in:
p = new (T (*)[N]);In this case, compilers parse the declarator (*)[N] as an abstract-declarator, which may use parentheses. This new-expression allocates a "pointer to array with N elements of type T".
You can verify that a new-declarator is essentially a restricted form of abstract-declarator as follows. Start with the production for declarator from Table 1:
declarator = direct-declarator | ptr-operator declarator .Then, change all occurrences of declarator to new-declarator to obtain:
new-declarator = direct-new-declarator | ptr-operator new-declarator .which is exactly the same as the production for new-declarator in Table 2.
The production for direct-declarator from Table 1 is:
direct-declarator = [ declarator-id | "(" declarator ")" ] { array-suffix | function-suffix } .In this case, substituting new-declarator for declarator yields:
direct-new-declarator = [ new-declarator-id | "(" new-declarator ")" ] { array-suffix | function-suffix } .which doesn't look very much like the production for direct-new-declarator in Table 2. At least, not yet.
Since a new-declarator is always an abstract-declarator, you can simplify the production by removing the declarator-id part:
direct-new-declarator = [ "(" new-declarator ")" ] { array-suffix | function-suffix } .Moreover, a new-declarator can't have parentheses, so you can remove the parts that use parentheses and simplify the production even further:
direct-new-declarator = { array-suffix } .If you substitute the right-hand-side of the production for array-suffix (from Table 1) , you get:
direct-new-declarator = { "[" [ constant-expression ] "]" } .This is much closer to the production for direct-new-declarator in Table 2, but they're still not the same. The production just above (derived from new-declarator in Table 1) says that all the array dimensions must be constant-expressions. The production in Table 2:
direct-new-declarator = [ "[" expression "]" ] { "[" constant-expression "]" } .allows the first dimension to be a non-constant expression.
This distinction is important in C++, but not in newexpr. newexpr, like decl, does not maintain a symbol table, so it can't tell whether a name represents a constant or non-constant value. (The scanner regards all identifiers that begin with an uppercase letter as previously declared names, but makes no further distinction among them. They're all just names.) Therefore, we could simplify the production for direct-new-declarator in Table 2 as:
direct-new-declarator = { "[" constant-expression "]" } .Now the only difference between this production and
direct-new-declarator = { "[" [ constant-expression ] "]" } .(the one that evolved from declarator in Table 1) is that the constant-expression is mandatory in the former and optional in the latter. For example,
new char *[][N]is not a valid new-expression. (A new-expression cannot allocate an array of unspecified dimension.) On the other hand,
char *table[][N]is valid, either in an extern declaration or a parameter declaration.
The grammar in Table 1 also suggests that
char *table[M][] char *table[][]are valid, even though they really aren't. C++ has a semantic rule stating that only the first dimension of a multidimensional array can be unspecified. Earlier versions of decl did not enforce this semantic rule they simply accepted unspecified array bounds in any array-suffix. The new version of decl does enforce the rule, and rejects both of those declarations just above. I'll explain how a little later.
Rather than write new parsing functions for new-declarator and direct-new-declarator, I augmented declarator and direct_declarator to recognize new-declarators as well. I merely extended the enumerated type declarator_category to include NEW as another possible value:
class parser { ... private: enum declarator_category { ABSTRACT, CONCRETE, EITHER, NEW };I also added just a few well-placed conditional expressions to turn off parentheses and declarator-ids for new-declarators. Look in the function direct_declarator in Listing 1 for occurrences of the declarator_category value NEW.
As I mentioned before, prior versions of decl allowed unspecified array bounds in any array-suffix. In order to catch invalid uses of unspecified array bounds, I had to split the type_category value ARRAY into two type_categories, SIZED_ARRAY and UNSIZED_ARRAY:
class parser { ... private: ... enum type_category { SIMPLE, SIZED_ARRAY, UNSIZED_ARRAY, FUNCTION, POINTER, REFERENCE };I also augmented the array_suffix function to return a type_category value of either SIZED_ARRAY or UNSIZED_ARRAY. Look again in the direct_declarator function of Listing 1 to see how the parser uses these values to detect semantically incorrect array declarators.
If you compare the declaration for the declarator function in the decl parser with the corresponding declaration in the newexpr parser, you will see that the one in newexpr has an additional parameter of type declarator_category & called inner. This allows newexpr to enforce semantic constraints that do not apply to declarations.
Every declarator has an inner type category and an outer type category. The inner type category is the category of the highest priority declarator operator, the outer is the category of the lowest. For example, in the declaration
T (*x[N])();the declarator specifies that x has type "array with N elements of type pointer to function with no parameters". In this case, the inner type category is SIZED_ARRAY, and the outer category is FUNCTION. A declarator with no declarator operators has type category SIMPLE. For example, in the declaration
T x;both the inner and outer type categories are SIMPLE.
newexpr uses the inner type category to check that a new-expression does not attempt to allocate a function, a reference, or an array of unspecified dimension. For example, whereas
int &ris a valid declaration
new int &is not a valid new-expression. The statements that enforce this constraint appear in the new_expression parsing function in Listing 1.
At first, you might think it's impossible to even specify a new- expression that allocates a function type. After all, if you write
new T()compilers interpret the parentheses as an empty initializer, not as a parameter list in a function declarator. However, if you write
new (T ())compilers will interpret the inner parentheses as a parameter list. The allocated type is "function with no parameters returning T". But this violates the aforementioned constraint, so it's not valid.
Delete-expressions
By comparison with new-expressions, delete-expressions have a much simpler syntax. Although I don't believe that writing a program that interprets delete-expressions would be a particularly enlightening exercise, the syntax still deserves a few comments.
You can describe delete-expressions with a single production:
delete-expression = [ "::" ] "delete" [ "[" "]" ] cast-expression .The leading :: is optional. It affects the deallocation function lookup just as the :: in a new-expression affects the allocation function lookup. That is, the :: tells the compiler to skip the normal member name lookup and look directly in global scope.
The optional [] after the keyword delete indicates that the cast- expression designates the first element in an array. I covered the rules for array delete in the very first article of this series (see "C++ Theory and Practice: new and delete", CUJ, January 1997).
Although we most often see the cast-expression in a delete- expression as just a simple identifier, it can be an elaborate expression. For example,
delete static_cast<D *>(p);deletes the object addressed by the result of converting p to a D *. Although this is something you don't want to do very often, remember that compilers must allow for things like this when trying to make sense of erroneous delete-expressions.
It's a Wrap
Well, this ends my saga of new and delete, at least for the time being. Have fun with the newexpr and decl programs. I invite feedback on these programs, but please understand that I cannot provide technical support for those of you who have trouble building the programs. o
Reference
[1] To find out more about converting constructors read Scott Meyers' article, "Mastering User-Defined Conversion Functions," CUJ, August 1995.
Dan Saks is the president of Saks & Associates, which offers training and consulting in C++ and C. He is active in C++ standards, having served nearly 7 years as secretary of the ANSI and ISO C++ standards committees. Dan is coauthor of C++ Programming Guidelines, and codeveloper of the Plum Hall Validation Suite for C++ (both with Thomas Plum). You can reach him at 393 Leander Dr., Springfield, OH 45504- 4906 USA, by phone at +1-937-324-3601, or electronically at dsaks@wittenberg.edu.