19.1 Grammatical Difficulties

CHAPTER 19: LALR(1) Java Grammar

Previous

Java Language

Index

Next

19.1 Grammatical Difficulties

19.1.1 Problem #1: Names Too Specific , 19.1.2 Problem #2: Modifiers Too Specific , 19.1.3 Problem #3: Field Declaration versus Method Declaration , 19.1.4 Problem #4: Array Type versus Array Access , 19.1.5 Problem #5: Cast versus Parenthesized Expression

There are five problems with the grammar presented in preceding chapters.

19.1.1 Problem #1: Names Too Specific

Consider the two groups of productions:


PackageName:

	Identifier

	PackageName . Identifier

TypeName:

	Identifier

	PackageName . Identifier

and:


MethodName:

	Identifier

	AmbiguousName . Identifier

AmbiguousName:

	Identifier

	AmbiguousName . Identifier

Now consider the partial input:

class Problem1 { int m() { hayden.

When the parser is considering the token hayden , with one-token lookahead to symbol ". ", it cannot yet tell whether hayden should be a PackageName that qualifies a type name, as in:

hayden.Dinosaur rex = new Hayden.Dinosaur(2);

or an AmbiguousName that qualifies a method name, as in:

hayden.print("Dinosaur Rex!");

Therefore, the productions shown above result in a grammar that is not LALR(1). There are also other problems with drawing distinctions among different kinds of names in the grammar.

The solution is to eliminate the nonterminals PackageName, TypeName, ExpressionName, MethodName, and AmbiguousName, replacing them all with a single nonterminal Name:


Name:

	SimpleName

	QualifiedName

SimpleName:

	Identifier

QualifiedName:

	Name . Identifier

A later stage of compiler analysis then sorts out the precise role of each name or name qualifier.

For related reasons, these productions in S4.3:


ClassOrInterfaceType:

	ClassType

	InterfaceType

ClassType:

	TypeName

InterfaceType:

	TypeName

were changed to:


ClassOrInterfaceType:

	Name

ClassType:

	ClassOrInterfaceType

InterfaceType:

	ClassOrInterfaceType

19.1.2 Problem #2: Modifiers Too Specific

Consider the two groups of productions:


FieldDeclaration:

	FieldModifiersopt Type VariableDeclarators ;

FieldModifiers:

	FieldModifier

	FieldModifiers FieldModifier

FieldModifier: one of

	public protected private

	final static transient volatile

and:


MethodHeader:

	MethodModifiersopt ResultType MethodDeclarator Throwsopt

MethodModifiers:

	MethodModifier

	MethodModifiers MethodModifier

MethodModifier: one of

	public protected private

	static

	abstract final native synchronized

Now consider the partial input:

class Problem2 { public static int

When the parser is considering the token static , with one-token lookahead to symbol int -or, worse yet, considering the token public with lookahead to static -it cannot yet tell whether this will be a field declaration such as:

public static int maddie = 0;

or a method declaration such as:

public static int maddie(String art) { return art.length(); }

Therefore, the parser cannot tell with only one-token lookahead whether static (or, similarly, public ) should be reduced to FieldModifier or MethodModifier. Therefore, the productions shown above result in a grammar that is not LALR(1). There are also other problems with drawing distinctions among different kinds of modifiers in the grammar.

While not all contexts provoke the problem, the simplest solution is to combine all contexts in which such modifiers are used, eliminating all six of the nonterminals ClassModifiers (S8.1.2), FieldModifiers (S8.3.1), MethodModifiers (S8.4.3), ConstructorModifiers (S8.6.3), InterfaceModifiers (S9.1.2), and ConstantModifiers (S9.3) from the grammar, replacing them all with a single nonterminal Modifiers:


Modifiers:

	Modifier

	Modifiers Modifier

Modifier: one of

	public protected private

	static

	abstract final native synchronized transient volatile

A later stage of compiler analysis then sorts out the precise role of each modifier and whether it is permitted in a given context.

19.1.3 Problem #3: Field Declaration versus Method Declaration

Consider the two productions (shown after problem #2 has been corrected):


FieldDeclaration:

	Modifiersopt Type VariableDeclarators ;

and:


MethodHeader:

	Modifiersopt ResultType MethodDeclarator Throwsopt

where ResultType is defined as:


ResultType:

	Type

	void

Now consider the partial input:

class Problem3 { int julie

Note that, in this simple example, no Modifiers are present. When the parser is considering the token int , with one-token lookahead to symbol julie , it cannot yet tell whether this will be a field declaration such as:

int julie = 14;

or a method declaration such as:

int julie(String art) { return art.length(); }

Therefore, after the parser reduces int to the nonterminal Type, it cannot tell with only one-token lookahead whether Type should be further reduced to ResultType (for a method declaration) or left alone (for a field declaration). Therefore, the productions shown above result in a grammar that is not LALR(1).

The solution is to eliminate the ResultType production and to have separate alternatives for MethodHeader:


MethodHeader:

	Modifiersopt Type MethodDeclarator Throwsopt

	Modifiersopt void MethodDeclarator Throwsopt

This allows the parser to reduce int to Type and then leave it as is, delaying the decision as to whether a field declaration or method declaration is in progress.

19.1.4 Problem #4: Array Type versus Array Access

Consider the productions (shown after problem #1 has been corrected):


ArrayType:

	Type [ ]

and:


ArrayAccess:

	Name [ Expression ]

	PrimaryNoNewArray [ Expression ]

Now consider the partial input:

class Problem4 { Problem4() { peter[

When the parser is considering the token peter , with one-token lookahead to symbol [ , it cannot yet tell whether peter will be part of a type name, as in:

peter[] team;

or part of an array access, as in:

peter[3] = 12;

Therefore, after the parser reduces peter to the nonterminal Name, it cannot tell with only one-token lookahead whether Name should be reduced ultimately to Type (for an array type) or left alone (for an array access). Therefore, the productions shown above result in a grammar that is not LALR(1).

The solution is to have separate alternatives for ArrayType:


ArrayType:

	PrimitiveType [ ]

	Name [ ]

	ArrayType [ ]

This allows the parser to reduce peter to Name and then leave it as is, delaying the decision as to whether an array type or array access is in progress.

19.1.5 Problem #5: Cast versus Parenthesized Expression

Consider the production:


CastExpression:

	( PrimitiveType ) UnaryExpression

	( ReferenceType ) UnaryExpressionNotPlusMinus

Now consider the partial input:

class Problem5 { Problem5() { super((matthew)

When the parser is considering the token matthew , with one-token lookahead to symbol ) , it cannot yet tell whether (matthew) will be a parenthesized expression, as in:

super((matthew), 9);

or a cast, as in:

super((matthew)baz, 9);

Therefore, after the parser reduces matthew to the nonterminal Name, it cannot tell with only one-token lookahead whether Name should be further reduced to PostfixExpression and ultimately to Expression (for a parenthesized expression) or to ClassOrInterfaceType and then to ReferenceType (for a cast). Therefore, the productions shown above result in a grammar that is not LALR(1).

The solution is to eliminate the use of the nonterminal ReferenceType in the definition of CastExpression, which requires some reworking of both alternatives to avoid other ambiguities:


CastExpression:

	( PrimitiveType Dimsopt ) UnaryExpression

	( Expression ) UnaryExpressionNotPlusMinus

	( Name Dims ) UnaryExpressionNotPlusMinus

This allows the parser to reduce matthew to Expression and then leave it there, delaying the decision as to whether a parenthesized expression or a cast is in progress. Inappropriate variants such as:

(int[])+3

and:

(matthew+1)baz

must then be weeded out and rejected by a later stage of compiler analysis.

The remaining sections of this chapter constitute a LALR(1) grammar for Java syntax, in which the five problems described above have been solved.