C/C++ Contributing Editors


Questions & Answers: The Danger of Undeclared Functions

Pete Becker

Lots of things can go wrong on a function call, particularly if you omit prototypes or get ambitious with function pointers.


To ask Pete a question about C or C++, send e-mail to petebecker@acm.org, use subject line: Questions and Answers, or write to Pete Becker, C/C++ Users Journal, 1601 W. 23rd St., Ste. 200, Lawrence, KS 66046.

Q

I've got a simple question about C. What is the output from the following code?

printf("%f\n", sqrt(36.0));

Is it 6, 6.0, 6.000000, or an absurd result? Please clarify this! — Srikanth

A

If this were an Internet newsgroup, the answer would be simple: none of the above. The code will not compile — you can't call printf or any other function from global scope. In typical newsgroup fashion, such an answer is literally correct but not at all helpful. It does hint at a relevant point, though: context is important. If you want to understand the run-time behavior of a piece of code, you must write it in a way that compiles and runs. Then you can get meaningful answers to questions about why it acts the way it does.

I'll begin by adding the minimal context required to make this code compile:

int main()
{
printf("%f\n", sqrt(36.0));
}

When I compile this code with Borland's version 5.2 and run the resulting executable file, I get the following output:

0.000000

I think that qualifies as absurd. There are a couple of problems with this code, both related to the lack of function declarations. When the C compiler sees a call to a function it knows nothing about, it guesses that the function will return an int. Since I know that sqrt actually returns a double, I'm relatively certain that I'll get strange results if the compiler expects an int. To fix this, I tell the compiler that sqrt actually returns a double:

double sqrt();

int main()
{
printf("%f\n", sqrt(36.0));
}

I've added a declaration for the function sqrt. The empty parentheses tell the compiler that I'm not saying anything about what arguments it takes. Now the output is

6.000000

which is much better than what I had before. I still don't have a robust program, though. Suppose I had called sqrt with the value 36 instead of 36.0:

double sqrt();

int main()
{
printf("%f\n", sqrt(36));
}

Now I'm back to

0.000000

because the compiler doesn't know that sqrt takes an argument of type double. When the compiler sees a call to a function and it doesn't know what argument types the function takes, it assumes that you've called it with arguments of exactly the types that the function takes. When I called sqrt with the argument 36.0, the compiler inferred that sqrt takes a single argument of type double because I called it with an argument of type double. Since I happened to call it with the correct type, the code worked correctly. When I called it with 36 instead, the compiler inferred that sqrt takes a single argument of type int, and called sqrt accordingly. Since sqrt is written to take a double, neither the value it received nor the result made sense. To fix this call, I must tell the compiler that sqrt takes an argument of type double:

double sqrt(double);

int main()
{
printf("%f\n", sqrt(36));
}

Now the declaration of sqrt also contains information on sqrt's argument types. Adding the argument information turns the declaration into a prototype [1], and I now get

6.000000

because the compiler knows that sqrt takes a double, and converts 36 to double in order to call sqrt.

Rather than remember to type in the exact prototypes for all of the library functions your program uses, you should use an #include directive to pull in the standard header that contains the prototypes for those functions. The prototype for sqrt is contained in <math.h>, so I can change my code to look like this:

#include <math.h>

int main()
{
printf("%f\n", sqrt(36));
}

Including this header doesn't change the result of running the program, but it's less error-prone because I don't have to remember to type in the actual signature for sqrt. If I used more math functions, I wouldn't need explicit prototypes for them either, because their prototypes are also included in math.h.

I haven't said anything about the call to printf yet, because it's a bit peculiar. Since there is no declaration for printf visible at the point where it is called, the compiler will infer that it takes two arguments, one of type char* and one of type double. That leads to a problem if I subsequently call it with different arguments:

#include <math.h>

int main()
{
printf("%f\n", sqrt(36));
printf("%d %d\n", 1, 2 );
}

The second call to printf presents a rather different set of arguments, and the compiler is entitled to complain that the call doesn't make sense. After all, it originally inferred that printf takes two arguments of types char* and double, and now I'm calling it with three arguments of types char*, int, and int. Although I know this is a valid way to call printf, the compiler does not. In fact, I've been on rather thin ice all along, because I haven't told the compiler the truth about printf. As you may remember, its first argument is of type const char*, and it takes an arbitrary number of additional arguments of arbitrary types. Its argument list is variable, and I need to tell the compiler this.

The rule is actually rather draconian: I am not allowed to call a function that takes a variable argument list unless the compiler has already seen a prototype for that function. That's different from the rule for functions like sqrt, which I can call without a prototype as long as I am sure that the compiler will infer the correct types. In the case of a function with a variable argument list, the compiler cannot actually infer what the correct types are because the set of correct types is unbounded. So the rule is that you aren't allowed to do it. Note, however, that in my next-to-last program, with a single call to printf, the compiler can't tell from the call that printf takes a variable argument list. It will simply infer that printf takes a char* and a double. Although this will work with many compilers, the C language definition does not actually require it to work.

I'm in the realm of undefined behavior here: the language definition says that this call is not valid, but it does not require the compiler to tell me that I made a mistake, and it does not require the compiler to do anything sensible with this code. The solution is to add a prototype for printf:

#include <math.h>
int printf( const char *, ... );

int main()
{
printf("%f\n", sqrt(36));
}

Now the program is valid and will work as expected. Once again, however, I should use the standard headers instead of writing out the prototype explicitly:

#include <math.h>
#include <stdio.h>

int main()
{
printf("%f\n", sqrt(36));
}

I hope this discussion has made you nervous about relying on the compiler's inferences about function signatures in the absence of prototypes. It's very easy to go wrong here, so most programmers adopt a rule of never calling a function without a preceding prototype. Many compilers help you out here by warning you when you call a function that does not have a prototype. Take those warnings seriously, and add prototypes as needed.

The dangers in the C rule are leading to language changes. C++ does not permit calling a function unless the compiler has seen a prototype before the call. The C9X working paper, which will probably result in a revised C standard sometime in 1999, also adopts this rule. So if you're in the habit of relying on implicit function declarations, break it — both to make your code more robust and to avoid having to rewrite a lot of code when the new C standard arrives.

Q

I found an interesting problem with procedures that take variable numbers of arguments in C++. If the last fixed parameter before the start of the variable parameters is passed by reference, the va_start() macro does not work correctly. The va_start() macro uses the address operator (&) to get the address of the last fixed parameter on the stack. However, if that parameter is passed by reference, the address operator returns the address of the value that is being referred to, not the address of the reference on the stack. This results in the variable argument macros trying to find the arguments relative to the address of the value that was passed by reference, rather than those relative to the reference on the stack. Is this just an accepted limitation of the va_start macros, or should the authors of the C++ runtime libraries make this work? Can they make it work? — Todd Niec

A

This is an accepted limitation of variable argument functions: the last named argument cannot be a reference. The technique you describe for implementing the va_start macro is fairly common, and it can't handle references. Another technique that some compilers use is to add a dummy argument after the last named argument, and to access the arguments in the variable part of the argument list using their positions relative to this dummy argument. That would make it possible to use a reference as the last named argument, but at the cost of adding the dummy argument. The previous technique does not require the compiler to do anything special when it encounters a function that takes a variable argument list, so it's easier to implement. It's also the technique often used in C libraries, where there is no pass-by-reference, so this problem does not arise. Rather than require C++ compiler vendors to change their mechanism to accommodate references, the C++ language definition says that compiler vendors don't have to make this work.

Q

I am trying to create a function that returns a pointer to another function. Is there a way to do this? If so, what is the syntax for the declaration? If there's no direct way of doing it, is there a workaround? Here is a simplified example:

Ret_Func_Pointer(int (*f)(void))
{
return f;
}

I need the output of Ret_Func_Pointer to be of the same type as the argument passed in. (In this case, the output should be a pointer to a function that returns an integer and that has no arguments.) — Daniela Toneva

A

Yes, it can be done. The simplest way to do it is to create a typedef for the function pointer, and write the function in terms of that typedef:

typedef int (*fptr)(void);
fptr Ret_Func_Pointer(fptr f)
{
return f;
}

This snippet defines fptr as the name of a type for a pointer to a function that takes no arguments and returns int. It then defines Ret_Func_Pointer as a function that takes an argument of type fptr and returns an fptr. You could also write the function declaration directly, without the typedef:

int (*Ret_Func_Pointer(
    int (*f)(void)))(void)
{
return f;
}

I hope the latter version leaves you thoroughly confused; that ought to be enough to keep you from using it in any actual code you write. In general, when you are defining complicated types, you ought to do the same thing that you'd do when you are defining complicated computations: break them down into simpler parts, make the parts work correctly, and build up the more complicated items from the simpler ones. With complicated computations, break the computation down into smaller functions; with complicated declarations, break the declaration down into smaller declarations. For example, if you needed to create an array of ten pointers to functions of the type previously discussed, your first inclination might be to try to define the type on-the-fly:

int (*(*data[10])(int (*)(void)))(void);

I believe this declaration is correct, but I won't swear to it. If I came across it in code I was maintaining, I'd have to work fairly hard to assure myself that I understood what it meant. It's much clearer if I break it down into pieces:

typedef int (*fptr)(void);
typedef fptr (*func)(fptr);
func data[10];

See how much easier it is to understand? Don't be afraid to use typedefs. They can make your code much clearer.

Q

I have a question about this code:

#include <stdio.h>
class FOO
{
public:
        FOO();
        int  GetX() const;
private:
        int  X;
        int& GetX();
};

int main()
{
FOO foo;
printf("X is %d\n", foo.GetX());
return 0;
}

I have two functions:

int FOO::GetX() const
int& FOO::GetX()

I thought that the former would have been called in main, but it does not compile. This seems mighty strange to me. Maybe you can enlighten me. — Adrian Pybus

A

The problem is that the compiler does not consider access restrictions when it decides which overloaded function to call. In this case, the compiler chooses the non-const version of GetX, then tells you that it cannot call it because it is private. That's what the language definition requires, and there actually is a good reason for it.

Before I discuss why, though, look a bit more carefully at what's happening here. There are two functions named GetX, neither of which takes any explicit arguments. One is const, and the other is not. Figuring out which to call is easy: the const version is called when the function is invoked on a const object, and the non-const version is called for a non-const object. In your code example, foo is not const, so the non-const version of GetX is the one chosen by the overloading rules.

It would be possible to write a consistent set of overloading rules that took into account only the versions of an overloaded function that could, in fact, be called. Such a set of rules would exclude the non-const version of GetX because it is private and therefore cannot be called from main. That would leave only the const version of GetX to consider, and since it is valid to call a const member function on a non-const object, the compiler would generate code to call the const version of GetX.

The danger in a rule like this is that changing the access specifier for a function can change which version of a function is called. If the non-const version of GetX were changed from private to public, it — rather than the const version — would become the correct function to call. That, in turn, would require a maintenance programmer who changed an access specifier to review every use of every function affected by that access specifier to see if the change in access affected the application of the overloading rules. Every change in overloading would require re-testing, so the change in one access specifier would ripple throughout the code that used this class. The C++ language definition avoids this ripple effect by making overload resolution independent of access. That makes programs much more maintainable. Even though it surprises people occasionally, there are far fewer surprises than there would be with a different rule.

Q

I have a problem with an array of function pointers, and I must admit it is beginning to drive me crazy. Could you please help me solve this problem? Please forgive my English. I speak Norwegian 99 percent of the year, so parts of this may sound weird to you [2]. My problem is as follows: I have a class called CCXParser. This class contains an array of function pointers and is, for the time being, called m_ptrFuncArray. This array contains pointers to certain member functions of class CCXParser. In C++ this translates into:

class CCXParser {
public :
    CCXParser();
private :
 // array to hold function pointers.
 // ( g_nNumResWordANSI = 62 )
    void ( *m_ptrFuncArray[ g_nNumResWordANSI ] ) ();
    // member function with pointers stored in m_ptrFuncArray.
    void parseASM();
    void parseAUTO();
    ...
    ...
    void parseVOLATILE();
    void parseWHILE();
    void parseXALLOC();
};

In the constructor for this class, the initialization for m_ptrFuncArray is implemented as follows:

CCXParser::CCXParser()
{
m_ptrFuncArray[  0 ] = parseASM;
m_ptrFuncArray[  1 ] = parse_ASM;
m_ptrFuncArray[  2 ] = parseAUTO;
....
....
m_ptrFuncArray[  59 ] = parseVOLATILE;
m_ptrFuncArray[  60 ] = parseWHILE;
m_ptrFuncArray[  61 ] = parseXALLOC;
}

In another function CheckSyntax, I make an effort to actually call the member functions:

void CCXParser::CheckSyntax()
{
for ( int i = 0;
      i < m_aTokens.GetSize(); i++ ) {
    CCXToken l_ctToken;
    enumTokenType l_eTokenType;
    l_ctToken = m_aTokens.GetAt( i );
    l_eTokenType =
        l_ctToken.GetTokenType();
    ( *m_ptrFuncArray[ 
            l_eTokenType - ttASM ] ) ();
}
}

What I intend to do in this function is to call parseFOO based on the type of token present in l_ctToken. The variable l_eTokenType holds an enumerated value based on the token type. By taking the result of (l_eTokenType - ttASM) and using this as an index into m_ptrFuncArray, I hope to call the correct function. If l_eTokenType equals ttASM, the function parseASM should be called (array element 0). Perhaps at this point you are laughing your head off; I, though, am not. I feel perhaps I am grasping for something way beyond my reach when trying to complete the application I am designing and creating. But I refuse to surrender to better judgement, hence this mail.

Now for the frustrating facts. I'm using Visual C++ v4.2 (that's frustrating), on Windows 95 (also frustrating) :-). The error (or 62 errors) I get when compiling is this:

error C2446: '=' : no conversion from
    'void (CCXParser::*)(void)'
to 'void (__cdecl *)(void)'

A

The error message contains a clue to the problem. Notice that it says that it can't convert a type that includes the class name CCXParser to a type that does not include that name. In fact, the problem you're running into is a common one: a member function is not like an ordinary function, and you cannot use the address of a member function as an ordinary pointer to function. There are three ways I can think of to fix this problem. First, you can use an array of member function pointers instead of an array of function pointers. Second, you can write global functions that call these member functions. Third, you can use objects instead of functions. Let me briefly describe each of these techniques and their respective advantages and disadvantages.

The first approach is to change to an array of member function pointers. The array m_ptrFuncArray should be declared as an array of pointers to member functions of CCXParser:

void (CCXParser::*m_ptrFuncArray[g_nNumResWordANSI])();

This declares m_ptrFuncArray to be an array of g_nNumResWordANSI elements, with each element containing a pointer to a member function of CCXParser that takes no arguments and returns void. You then must initialize these array elements to point to the appropriate member functions:

CCXParser::CCXParser()
{
m_ptrFuncArray[  0 ] = &CCXParser::parseASM;
m_ptrFuncArray[  1 ] = &CCXParser::parse_ASM;
m_ptrFuncArray[  2 ] = &CCXParser::parseAUTO;
....
....
m_ptrFuncArray[  59 ] = &CCXParser::parseVOLATILE;
m_ptrFuncArray[  60 ] = &CCXParser::parseWHILE;
m_ptrFuncArray[  61 ] = &CCXParser::parseXALLOC;
}

The & and the class name are required. You can actually get away with leaving them out in Visual C++, which is probably why the error message and its accompanying description did not help much. However, the language definition requires them, so you should get in the habit of using them; your code will be much more portable.

Finally, to call these functions, you must use the syntax for a member function pointer:

void CCXParser::CheckSyntax()
{
for ( int i = 0; i < m_aTokens.GetSize(); i++ ) {
        CCXToken l_ctToken;
        enumTokenType l_eTokenType;
        l_ctToken = m_aTokens.GetAt( i );
        l_eTokenType = l_ctToken.GetTokenType();
        (this->*m_ptrFuncArray[l_eTokenType-ttASM])();
}
}

Notice the use of the ->* operator in calling the member function on the object pointed to by this. If you had a reference to an object instead of a pointer, you'd use the .* operator.

The major drawback in using pointers to member functions is that the compiler-generated code for such a call can be quite lengthy. That's mostly because the function it calls can be virtual, and the generated code must call the proper virtual function, based on the dynamic type of the object it's being called on. That is, the compiler must call a different function depending on whether the object is of the base type or the derived type. That code can be nasty. You're not using virtual functions here, but the compiler doesn't know that when it sees the call, so it must generate code that can deal with them.

The second approach is to write global functions to handle the dispatching:

void parseASM( CCXParser *p )
{
p->parseASM();
}

This time your array declaration needs only a slight change to reflect the fact that these global functions take an argument:

void (*m_ptrFuncArray[g_nNumResWordANSI ])(CCXParser*);

Your initialization code for this array should work fine. You may, however, need to add a :: qualifier preceding the names of the global functions if Microsoft's compiler insists that parseASM refers to the member function rather than the global one. Your calls to these functions are fine: they are, after all, just ordinary functions.

The major advantage of this approach is that the resulting code is almost certainly faster than calling through pointers to member functions. The major disadvantage, of course, is that you must write all those extra functions.

The third approach, writing small parser objects instead of one big one, is my favorite. For example:

class Parser
{
public:
    virtual void parse() = 0;
};
class ParseASM : public Parser
{
public:
    void parse();
};
....// definitions for the rest of the parser classes
ParseASM parseASM;
....// definitions for the rest of the parser objects

Now your array simply needs to hold pointers to objects of type Parser:

Parser *m_ptrFuncArray[g_nNumResWordANSI];

Initializing it is easy:

m_ptrFuncArray[0] = &parseASM;
....

Calling the appropriate functions is also easy:

m_ptrFuncArray[l_eTokenType-ttASM]->parse();

This approach makes the class CCXParser much smaller: it just has to know how to dispatch parser calls, without worrying about the details of parsing individual keywords. Pushing the details out into individual classes can make program maintenance simpler.

Choosing among these three design approaches involves weighing the benefits and drawbacks of each. Of course, the ultimate decision is affected by the details of your application. Still, I suspect that the third approach, with lots of little parser classes, may well be the best one in most cases. That is, provided you have a good set of tools for keeping track of lots of little classes.

Notes

[1] Note that a function definition also counts as a prototype:

#include <stdio.h>

void f(int i)
{
printf( "%d\n", i );
}

int main()
{
f(1);        /* legal in C9X: definition of f
                supplies complete argument information */
}

[2] You've presented a clear description of a reasonable approach to this problem. Despite your protests, your English is quite good, and is better than the English I sometimes hear from native speakers.

Pete Becker is Technical Project Leader for Dinkumware, Ltd. He spent eight years in the C++ group at Borland International, both as a developer and manager. He is a member of the ANSI/ISO C++ standardization committee. He can be reached by email at petebecker@acm.org.