CHAPTER 3: Lexical Structure

3.5 Input Elements and Tokens

The input characters and line terminators that result from escape processing (§3.3) and then input line recognition (§3.4) are reduced to a sequence of input elements. Those input elements that are not white space (§3.6) or comments (§3.7) are tokens. The tokens are the terminal symbols of the Java syntactic grammar (§2.3).

This process is specified by the following productions:


Input:

	InputElements_opt Sub_opt

InputElements:

	InputElement

	InputElements InputElement

InputElement:

	WhiteSpace

	Comment

	Token

Token:

	Identifier

	Keyword

	Literal

	Separator

	Operator

Sub:

	the ASCII SUB character, also known as "control-Z"
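
The productions above can be illustrated with a rough sketch that splits a string into input elements and keeps only the tokens. It is a simplification under stated assumptions, not the specification's algorithm: the class and method names (InputElementSketch, scan, tokens) are hypothetical, white space is approximated with Character.isWhitespace, words and digit runs are lumped together instead of being classified as Identifier, Keyword, or Literal, and every other character is treated as a one-character token.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; all names here are assumptions, not part of the specification.
final class InputElementSketch {
    enum Kind { WHITE_SPACE, COMMENT, TOKEN }

    // One input element: its kind and the exact text it covers.
    static final class Element {
        final Kind kind;
        final String text;
        Element(Kind kind, String text) { this.kind = kind; this.text = text; }
    }

    // Splits the input into a sequence of input elements (InputElements).
    static List<Element> scan(String input) {
        List<Element> out = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            int start = i;
            char c = input.charAt(i);
            Kind kind;
            if (Character.isWhitespace(c)) {
                // Assumption: isWhitespace is broader than the white space the specification defines.
                while (i < input.length() && Character.isWhitespace(input.charAt(i))) i++;
                kind = Kind.WHITE_SPACE;
            } else if (input.startsWith("//", i)) {
                // End-of-line comment: runs up to, but not including, the line terminator.
                while (i < input.length() && input.charAt(i) != '\n' && input.charAt(i) != '\r') i++;
                kind = Kind.COMMENT;
            } else if (input.startsWith("/*", i)) {
                // Traditional comment: runs through the closing delimiter, or to end of input.
                int end = input.indexOf("*/", i + 2);
                i = (end < 0) ? input.length() : end + 2;
                kind = Kind.COMMENT;
            } else if (Character.isLetterOrDigit(c) || c == '_' || c == '$') {
                // Simplification: identifiers, keywords, and numeric literals are not told apart.
                while (i < input.length()) {
                    char d = input.charAt(i);
                    if (!(Character.isLetterOrDigit(d) || d == '_' || d == '$')) break;
                    i++;
                }
                kind = Kind.TOKEN;
            } else {
                // Simplification: any other character is a one-character token.
                i++;
                kind = Kind.TOKEN;
            }
            out.add(new Element(kind, input.substring(start, i)));
        }
        return out;
    }

    // Tokens are the input elements that are neither white space nor comments.
    static List<String> tokens(String input) {
        List<String> result = new ArrayList<>();
        for (Element e : scan(input)) {
            if (e.kind == Kind.TOKEN) result.add(e.text);
        }
        return result;
    }
}
```

Under these assumptions, tokens("a /* c */ b // d") yields the two tokens a and b; the white space and both comments are recognized as input elements but discarded.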

White space (§3.6) and comments (§3.7) can serve to separate tokens that, if adjacent, might be tokenized in another manner. For example, the ASCII characters - and = in the input can form the operator token -= (§3.12) only if there is no intervening white space or comment.
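
The effect of adjacency can be seen in a short example. The class and method names (TokenSeparation, demo) are hypothetical, and the second statement extends the point to a pair of - characters, which goes slightly beyond the case named in the text.

```java
// Hypothetical illustration only; the names here are not part of the specification.
final class TokenSeparation {
    static int demo() {
        int x = 10;
        x -= 3;      // "-" and "=" are adjacent, so they form the single operator token "-="; x is 7
        x = x - -2;  // the space keeps two "-" tokens apart; written adjacently they would form "--"; x is 9

        // Neither of the following lines compiles, because the intervening
        // white space or comment leaves "-" and "=" as two separate tokens:
        //     x - = 3;
        //     x -/* comment */= 3;
        return x;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints 9
    }
}
```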

As a special concession for compatibility with certain operating systems, the ASCII SUB character (\u001a, or control-Z) is ignored if it is the last character in the escaped input stream.
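
A minimal sketch of this rule follows, assuming the input is held in a String that has already been through escape processing. The names TrailingSub and dropTrailingSub are hypothetical. In line with the Sub_opt production, at most one trailing SUB is dropped, and a SUB anywhere else in the input is left alone.

```java
// Hypothetical illustration only; the names here are not part of the specification.
final class TrailingSub {
    // The ASCII SUB character, U+001A (control-Z).
    static final char SUB = (char) 0x1A;

    // Returns the input without a single trailing SUB character, if one is present.
    static String dropTrailingSub(String input) {
        if (!input.isEmpty() && input.charAt(input.length() - 1) == SUB) {
            return input.substring(0, input.length() - 1);
        }
        return input;
    }
}
```

For example, dropTrailingSub("class Empty {}" + TrailingSub.SUB) returns "class Empty {}", while a string with SUB in the middle is returned unchanged.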

Consider two tokens x and y in the resulting input stream. If x precedes y, then we say that x is to the left of y and that y is to the right of x. For example, in this simple piece of Java code:


class Empty {
}

we say that the } token is to the right of the { token, even though it appears, in this two-dimensional representation on paper, downward and to the left of the { token. This convention about the use of the words left and right allows us to speak, for example, of the right-hand operand of a binary operator or of the left-hand side of an assignment.

© 1996 Sun Microsystems, Inc. All rights reserved.