If you have to write yet another expression evaluator, maybe this time you should start with a set of classes that does most of the hard bits.
Introduction
Evaluating expressions is a common operation in programming languages, so much so that many programmers may not appreciate the complexity of implementing an expression evaluator that is, until they need to implement one of their own. Arithmetic and Boolean operator precedence, variable data, and infix notation all complicate the process of expression evaluation. Its tempting to try to simplify the implementation, by restricting it to a minimal set of operators, or disallowing parenthesis and variables. We could even do away with operator precedence by requiring users to enter expressions in postfix or prefix notation. Unless they are old FORTH programmers, however, they will probably rebel. Few users will want to change their way of thinking to accommodate the requirements of an application.
In this article, I present a set of expression objects that will properly implement a flexible and expandable infix expression evaluator. Sample uses include creating interfaces that allow users to enter values as expressions or as spreadsheet-like operations. The expression objects could even be used to implement a simple programming language interpreter. For those interested in object-oriented programming, this article also demonstrates an application for which inheritance and polymorphism are a natural fit.
Design Specs
I need to set a few design boundaries before developing the expression evaluator. Specifically, the evaluator will:
- evaluate expressions in algebraic infix notation
- provide the following basic set of functionality:
- Arithmetic operations (+, -, *, /)
- Boolean operations (and, or, not)
- Relational operations (>, <, >=, <=, ==, !=)
- handle constant as well as variable data maintained by a symbol table
- not be bounded by syntax
- be designed for expandability
While the above cannot fill the needs of every application needing an expression evaluator, it does cover a wide range of commonly needed capabilities.
Designing an Expression Tree
To begin the design process, I determine how an expression can be decomposed into objects. As shown in Figure 1, an expression, such as 1+2*3/(4-2), can be represented in a tree structure (known as an expression tree) which breaks an expression down into its components.
An in-order traversal of the expression tree will reconstruct the infix expression. Each node in the tree possesses a value, which is either inherent (as in literals) or is some function of the values of its children (as in operators like +, , *, /). Evaluating the root node (the + in Figure 1) will cause a post-order traversal of the expression tree, resulting in the value of the expression as a whole.
Base Classes
Beginning with the Exp_Obj class shown below, I begin modeling the nodes in the expression tree with objects.
// Base Expression Object class Exp_Obj { public: virtual float GetValue( Symbol_Table &sym ) {return 0.0;}; virtual int GetType( void ) {return NONE;}; virtual int GetPriority( void ) {return 0;}; };This class does nothing but define a standard interface for the rest of the expression objects. By deriving all other expression objects directly or indirectly from the base Exp_Obj class, I can use polymorphism and dynamic binding to facilitate easy manipulation of the expression objects, regardless of their specific type or functionality. The GetValue operation returns the value of the expression object. The derived classes will implement the functionality of the object by redefining this operation. The sym parameter to GetObject provides access to the symbol table information if needed. (I describe the symbol table in more detail later.)
I have defined five categories, or types, of expression objects: NONE, OPERAND_TYPE, BINARY_OPERATOR_TYPE, UNARY_OPERATOR_TYPE, and EXPRESSION_TYPE. The GetType operation returns an expression objects type.
GetPriority returns the infix priority of an object, which is used to maintain operator precedence when evaluating the expression. In the expression 1+2*3/(4-2), the parentheses have the highest precedence, so their content is evaluated first. The * and / have equal precedence in this expression, so they are evaluated next, in a left-to-right order. The + operator has the lowest priority in this expression and is evaluated last. Table 1 shows the infix priority of the operators I implement as expression objects.
I derive the Unary_Obj (shown below) and Binary_Obj classes from Exp_Obj to abstract characteristics common to two types of operators: those that operate on a single object and those that operate on two objects. The Unary_Obj class holds a pointer to another Exp_Obj. Unary_Objs constructors initialize this pointer, either to a NULL or to point to an Exp_Obj. SetObj enables changing the pointer after construction, and GetType always returns UNARY_OPERATOR_TYPE. Classes derived from Unary_Obj will redefine the pure virtual function GetValue to incorporate the functionality of the specific unary operator.
class Unary_Obj : public Exp_Obj { protected: Exp_Obj *obj; public: Unary_Obj( void ) { obj=NULL; } Unary_Obj( Exp_Obj *o ) { obj=o; } virtual void SetObj(Exp_Obj *o) { obj=o; } virtual int GetType( void ) {return UNARY_OPERATOR_TYPE;} virtual float GetValue( Symbol_Table &sym )=0; };The Binary_Obj class (below) is very similar to the Unary_Obj class. A Binary_Obj maintains pointers to both a left and a right Exp_Obj. The pointers are initialized at construction and may be changed thereafter via the SetLeft and SetRight operators. GetType always returns BINARY_OPERATOR_TYPE, and the pure virtual function GetValue is defined by derived classes to implement the binary operators functionality.
class Binary_Obj : public Exp_Obj { protected: Exp_Obj *left, *right; public: Binary_Obj( void ) { left=NULL; right=NULL; } Binary_Obj( Exp_Obj*l, Exp_Obj *r ) { left=l; right=r; } virtual void SetLeft(Exp_Obj *l){ left=l;} virtual void SetRight(Exp_Obj *r){right=r;} virtual int GetType( void ) {returnBINARY_OPERATOR_TYPE;} virtual float GetValue( Symbol_Table &sym )=0; };Operators
Now that Ive provided for the basic structure of the expression tree, I can start to fill the tree with more concrete objects. Operators can connect to terminal objects (which may be either variables, such as A, or literals, such as 123.4) or to subtrees.
The Unary_Minus_Obj class implements the unary minus operation (-). It is derived from Unary_Obj. GetValue returns the negative of the referenced Exp_Objs value, and GetPriority returns the operators precedence. Remember that GetType and SetObj are also available since they are inherited from the Unary_Obj class.
class Unary_Minus_Obj : public Unary_Obj { public: Unary_Minus_Obj( Exp_Obj *o ): Unary_Obj( o ){}; Unary_Minus_Obj( void ): Unary_Obj() {}; virtual float GetValue(Symbol_Table &sym) {return -(obj->GetValue(sym));} virtual GetPriority( void ) { return UNARY_SUB_PRIORITY; } };The Add_Obj implements a binary addition (+) in its GetValue operation and returns the operators infix priority in GetPriority. All children of Binary_Obj inherit SetLeft, SetRight, and GetType. The subtraction (-), multiplication (*), and division (/) operations are implemented in a similar fashion. The complete set of expression objects are available on the code disk and online sources.
class Add_Obj : public Binary_Obj { public: Add_Obj(Exp_Obj *l,Exp_Obj *r): Binary_Obj(l,r){}; Add_Obj( void ):Binary_Obj(){}; virtual int GetValue( Symbol_Table &sym ) {return left->GetValue(sym) + right->GetValue(sym);} virtual int GetPriority( void ) { return ADD_SUB_PRIORITY;} };Terminal Objects
Literals and variables are what I call terminals; they occupy the leaves of the expression tree. I model literals with the Literal_Obj class, which defines a variable to hold the literal value. The literals value is initialized by the constructor, GetType returns OPERAND_TYPE, and GetValue returns the literal value stored in the object. To simplify the interface to all objects, GetValue always takes a reference to the symbol table even when it is not needed.
class Literal_Obj : public Exp_Obj { private: float value; public: Literal_Obj( int v ) { value=v;} virtual float GetValue( Symbol_Table &sym ) { return value; } virtual int GetType( void ) { return OPERAND_TYPE; } };The last component of the expression tree is the Variable_Obj, which holds not the variables data, but its name. The variables data resides in a symbol table. The symbol table maintains the name and value of each variable and provides access to each variables data. The Symbol_Table class in Listing 1 provides three operations: Get, Set, and Is_In. The Get operation retrieves a variables data by name, Set updates a variables data by name (or creates the variable if it is not in the table), and Is_In returns a pointer to an entry in the symbol table if the variable exists. In the Variable_Obj class, the GetValue operation simply looks up the variables value in the symbol table.
class Variable_Obj : public Exp_Obj { private: char name[25]; public: Variable_Obj( char *var_name ) { strcpy(name,var_name); } virtual float GetValue( Symbol_Table &sym ) { return sym.Get(name); } virtual int GetType( void ) { return OPERAND_TYPE; } };Putting Together the Tree
The expression evaluator needs to find the expression objects arranged according to their associated expression tree. The process of linking the expression objects together to form the expression tree is the most complicated part of the Object-Oriented Expression Evaluator. Listing 2 shows part of a new class, Base_Expression, derived from Exp_Obj to build an expression tree from an expression string.
In Base_Expression the Exp_Obj pointer exp points to the root of the expression tree. GetType returns EXPRESSION_TYPE, and GetExp returns the original expression string. Base_Expression provides two GetValue operations, both of which return the value of the root node in the expression tree. The first function is equivalent to the GetValue in any other expression object. The second function creates a temporary symbol table with nothing in it so that the user does not have to create a symbol table when evaluating expressions that do not require one.
Building the expression tree involves four operations: Get_Token, Build_Expression, Build, and Push_Tree. The Get_Token operation finds the next token in the expression string and its type, and reports whether or not any errors occurred getting the token. Get_Token is a pure virtual function, which must be defined in a child class. Thus, the syntax of the expression (how the operators, identifiers, etc., are combined) is abstracted out of the basic objects. This gives the user a very flexible engine for implementing any number of expression syntax rules.
The Build_Expression operation creates the appropriate expression objects according to the information it receives from Get_Token and uses the Build operation to add the objects to the expression tree. The Build_Expression and Build operations implement the algorithms shown in Figure 2.
The Build algorithm requires the use of two stacks: one for the expression being built and one to temporarily store operators. The ExpStack class shown in Listing 3 implements a stack data structure to store objects of the Exp_Obj type and provides three operations: Push, Pop, and Empty. Since all of the expression objects are derived directly or indirectly from Exp_Obj, polymorphism allows any expression object to be stored on the stack regardless of its specific type.
I provide four control objects to aid in the expression-tree building process: Open_Paren_Obj, Close_Paren_Obj, Start_Obj, and Stop_Obj (Listing 4). Open_Paren_Obj and Close_Paren_Obj supply a priority and type so they can be used in the Build operation to correctly override operator precedence when necessary. The Start and Stop objects identify the first and last tokens.
As objects are pushed onto the expression stack, the Push_Tree operation converts the stack into a tree as it is built. If Push_Tree is adding a unary operator to the tree, it will pop an object off the expression stack, set the operator objects pointer to the popped object, and push the operator object onto the expression stack. If a binary operator is added to the tree, Push_Tree will pop two objects off the expression stack, assign the operator objects left and right pointers to the two objects, and push the binary operator object onto expression stack. The only object left in the stack after the expression string is parsed will be the root of the expression tree. Figure 3 illustrates how an expression tree is created from an expression string using the algorithm described above.
One more class needs to be implemented before the expression objects can be used. Since the Get_Token operation is a pure virtual function in Base_Expression, a child class must be derived to implement it. I have created two classes; the C_Like_Expression class implements the Get_Token operation to recognize a C-like expression syntax while the Pascal_Like_Expression class implements a Pascal-like expression syntax. These classes are not shown here, but are available on the code disk and online sources. Listing 5 shows several examples utilizing the expression classes to evaluate both C-like and Pascal-like expressions.
Hierarchy at a Glance
Now that Ive presented all the components, I show the complete class hierarchy for the expression evaluator in Figure 4.
This is by no means a complete set of objects; however, it is a strong foundation and allows easy addition of new objects. It is fairly simple to add support for different data types (like int, float, double, and long) into the expression objects.
Command-Line Evaluator
The code disk also includes a program that uses the C_Like_Expression class to implement a command-line expression evaluator. This example is very simple, but it illustrates how creating an Object-Oriented Expression Evaluator makes evaluating expressions easy.
Conclusion
C++s inheritance and polymorphism features are especially effective in modeling the semantics of infix algebraic expressions. The Object-Oriented Expression Evaluator provides the programmer with a set of objects that exploit the features of an object-oriented programming language to create objects that are easy to understand, flexible, and expandable. Using the hierarchy presented here as a starting point, it should be possible to create code to handle more elaborate and interesting kinds of expressions. o
Joey Rogers is an Application Engineer for the Intergraph Corporation in Huntsville, Alabama. His expertise includes neural networks, language interpretation, and object-oriented programming. He can be reached at srogers@hq.pcmail.ingr.com.