CHAPTER 3: Lexical Structure

3.1 Unicode

Java programs are written using the Unicode character set, version 2.0. Information about this encoding may be found at:

http://www.unicode.org and ftp://unicode.org

Versions of Java prior to 1.1 used Unicode version 1.1.5 (see The Unicode Standard: Worldwide Character Encoding (§1.2) and updates). See §20.5 for a discussion of the differences between Unicode version 1.1.5 and Unicode version 2.0.

Except for comments (§3.7), identifiers, and the contents of character and string literals (§3.10.4, §3.10.5), all input elements (§3.5) in a Java program are formed only from ASCII characters (or Unicode escapes (§3.3) which result in ASCII characters). ASCII (ANSI X3.4) is the American Standard Code for Information Interchange. The first 128 characters of the Unicode character encoding are the ASCII characters.
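
As an informal illustration of the paragraph above, the following sketch shows a Unicode escape that results in an ASCII character, an identifier written partly with such an escape, and a check that a string stays within the first 128 Unicode characters. The class name UnicodeSketch and the method isAscii are hypothetical names chosen for this example; they are not part of the Java language or its libraries.

```java
class UnicodeSketch {
    // Hypothetical helper: returns true if every character of s lies in the
    // range 0..127, that is, within the first 128 Unicode characters (ASCII).
    static boolean isAscii(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 127) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // A Unicode escape that results in the ASCII letter A.
        char a = '\u0041';
        System.out.println(a == 'A');        // prints true
        System.out.println((int) a);         // prints 65

        // The escape below is translated before tokens are formed,
        // so this declares the identifier abc.
        int \u0061bc = 42;
        System.out.println(abc);             // prints 42

        // String literals may contain non-ASCII characters.
        System.out.println(isAscii("int x = 1;"));   // prints true
        System.out.println(isAscii("caf\u00e9"));    // prints false
    }
}
```

The escape inside the identifier is shown only to make the translation step visible; ordinary programs would simply write abc.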

© 1996 Sun Microsystems, Inc. All rights reserved.