
- •New to the Tenth Edition
- •Preface
- •Acknowledgments
- •About the Author
- •Contents
- •1.1 Reasons for Studying Concepts of Programming Languages
- •1.2 Programming Domains
- •1.3 Language Evaluation Criteria
- •1.4 Influences on Language Design
- •1.5 Language Categories
- •1.6 Language Design Trade-Offs
- •1.7 Implementation Methods
- •1.8 Programming Environments
- •Summary
- •Problem Set
- •2.1 Zuse’s Plankalkül
- •2.2 Pseudocodes
- •2.3 The IBM 704 and Fortran
- •2.4 Functional Programming: LISP
- •2.5 The First Step Toward Sophistication: ALGOL 60
- •2.6 Computerizing Business Records: COBOL
- •2.7 The Beginnings of Timesharing: BASIC
- •2.8 Everything for Everybody: PL/I
- •2.9 Two Early Dynamic Languages: APL and SNOBOL
- •2.10 The Beginnings of Data Abstraction: SIMULA 67
- •2.11 Orthogonal Design: ALGOL 68
- •2.12 Some Early Descendants of the ALGOLs
- •2.13 Programming Based on Logic: Prolog
- •2.14 History’s Largest Design Effort: Ada
- •2.15 Object-Oriented Programming: Smalltalk
- •2.16 Combining Imperative and Object-Oriented Features: C++
- •2.17 An Imperative-Based Object-Oriented Language: Java
- •2.18 Scripting Languages
- •2.19 The Flagship .NET Language: C#
- •2.20 Markup/Programming Hybrid Languages
- •Review Questions
- •Problem Set
- •Programming Exercises
- •3.1 Introduction
- •3.2 The General Problem of Describing Syntax
- •3.3 Formal Methods of Describing Syntax
- •3.4 Attribute Grammars
- •3.5 Describing the Meanings of Programs: Dynamic Semantics
- •Bibliographic Notes
- •Problem Set
- •4.1 Introduction
- •4.2 Lexical Analysis
- •4.3 The Parsing Problem
- •4.4 Recursive-Descent Parsing
- •4.5 Bottom-Up Parsing
- •Summary
- •Review Questions
- •Programming Exercises
- •5.1 Introduction
- •5.2 Names
- •5.3 Variables
- •5.4 The Concept of Binding
- •5.5 Scope
- •5.6 Scope and Lifetime
- •5.7 Referencing Environments
- •5.8 Named Constants
- •Review Questions
- •6.1 Introduction
- •6.2 Primitive Data Types
- •6.3 Character String Types
- •6.4 User-Defined Ordinal Types
- •6.5 Array Types
- •6.6 Associative Arrays
- •6.7 Record Types
- •6.8 Tuple Types
- •6.9 List Types
- •6.10 Union Types
- •6.11 Pointer and Reference Types
- •6.12 Type Checking
- •6.13 Strong Typing
- •6.14 Type Equivalence
- •6.15 Theory and Data Types
- •Bibliographic Notes
- •Programming Exercises
- •7.1 Introduction
- •7.2 Arithmetic Expressions
- •7.3 Overloaded Operators
- •7.4 Type Conversions
- •7.5 Relational and Boolean Expressions
- •7.6 Short-Circuit Evaluation
- •7.7 Assignment Statements
- •7.8 Mixed-Mode Assignment
- •Summary
- •Problem Set
- •Programming Exercises
- •8.1 Introduction
- •8.2 Selection Statements
- •8.3 Iterative Statements
- •8.4 Unconditional Branching
- •8.5 Guarded Commands
- •8.6 Conclusions
- •Programming Exercises
- •9.1 Introduction
- •9.2 Fundamentals of Subprograms
- •9.3 Design Issues for Subprograms
- •9.4 Local Referencing Environments
- •9.5 Parameter-Passing Methods
- •9.6 Parameters That Are Subprograms
- •9.7 Calling Subprograms Indirectly
- •9.8 Overloaded Subprograms
- •9.9 Generic Subprograms
- •9.10 Design Issues for Functions
- •9.11 User-Defined Overloaded Operators
- •9.12 Closures
- •9.13 Coroutines
- •Summary
- •Programming Exercises
- •10.1 The General Semantics of Calls and Returns
- •10.2 Implementing “Simple” Subprograms
- •10.3 Implementing Subprograms with Stack-Dynamic Local Variables
- •10.4 Nested Subprograms
- •10.5 Blocks
- •10.6 Implementing Dynamic Scoping
- •Problem Set
- •Programming Exercises
- •11.1 The Concept of Abstraction
- •11.2 Introduction to Data Abstraction
- •11.3 Design Issues for Abstract Data Types
- •11.4 Language Examples
- •11.5 Parameterized Abstract Data Types
- •11.6 Encapsulation Constructs
- •11.7 Naming Encapsulations
- •Summary
- •Review Questions
- •Programming Exercises
- •12.1 Introduction
- •12.2 Object-Oriented Programming
- •12.3 Design Issues for Object-Oriented Languages
- •12.4 Support for Object-Oriented Programming in Smalltalk
- •12.5 Support for Object-Oriented Programming in C++
- •12.6 Support for Object-Oriented Programming in Objective-C
- •12.7 Support for Object-Oriented Programming in Java
- •12.8 Support for Object-Oriented Programming in C#
- •12.9 Support for Object-Oriented Programming in Ada 95
- •12.10 Support for Object-Oriented Programming in Ruby
- •12.11 Implementation of Object-Oriented Constructs
- •Summary
- •Programming Exercises
- •13.1 Introduction
- •13.2 Introduction to Subprogram-Level Concurrency
- •13.3 Semaphores
- •13.4 Monitors
- •13.5 Message Passing
- •13.6 Ada Support for Concurrency
- •13.7 Java Threads
- •13.8 C# Threads
- •13.9 Concurrency in Functional Languages
- •13.10 Statement-Level Concurrency
- •Summary
- •Review Questions
- •Problem Set
- •14.1 Introduction to Exception Handling
- •14.2 Exception Handling in Ada
- •14.3 Exception Handling in C++
- •14.4 Exception Handling in Java
- •14.5 Introduction to Event Handling
- •14.6 Event Handling with Java
- •14.7 Event Handling in C#
- •Review Questions
- •Problem Set
- •15.1 Introduction
- •15.2 Mathematical Functions
- •15.3 Fundamentals of Functional Programming Languages
- •15.4 The First Functional Programming Language: LISP
- •15.5 An Introduction to Scheme
- •15.6 Common LISP
- •15.8 Haskell
- •15.10 Support for Functional Programming in Primarily Imperative Languages
- •15.11 A Comparison of Functional and Imperative Languages
- •Review Questions
- •Problem Set
- •16.1 Introduction
- •16.2 A Brief Introduction to Predicate Calculus
- •16.3 Predicate Calculus and Proving Theorems
- •16.4 An Overview of Logic Programming
- •16.5 The Origins of Prolog
- •16.6 The Basic Elements of Prolog
- •16.7 Deficiencies of Prolog
- •16.8 Applications of Logic Programming
- •Review Questions
- •Programming Exercises
- •Bibliography
- •Index

Summary 197
represents state 0 of the parser. The input contains the input string with an end marker, in this case a dollar sign, attached to its right end. At each step, the parser actions are dictated by the top (rightmost in Figure 4.4) symbol of the parse stack and the next (leftmost in Figure 4.4) token of input. The correct action is chosen from the corresponding cell of the ACTION part of the parse table. The GOTO part of the parse table is used after a reduction action. Recall that GOTO is used to determine which state symbol is placed on the parse stack after a reduction.
Following is a trace of a parse of the string id + id * id, using the LR parsing algorithm and the parsing table shown in Figure 4.5.
Stack |
Input |
0 |
id + id * id $ |
0id5 |
+ id * id $ |
0F3 |
+ id * id $ |
0T2 |
+ id * id $ |
0E1 |
+ id * id $ |
0E1+6 |
id * id $ |
0E1+6id5 |
* id $ |
0E1+6F3 |
* id $ |
0E1+6T9 |
* id $ |
0E1+6T9*7 |
id $ |
0E1+6T9*7id5 |
$ |
0E1+6T9*7F10 |
$ |
0E1+6T9 |
$ |
0E1 |
$ |
Action
Shift 5
Reduce 6 (use GOTO[0, F])
Reduce 4 (use GOTO[0, T])
Reduce 2 (use GOTO[0, E])
Shift 6
Shift 5
Reduce 6 (use GOTO[6, F])
Reduce 4 (use GOTO[6, T])
Shift 7
Shift 5
Reduce 6 (use GOTO[7, F])
Reduce 3 (use GOTO[6, T])
Reduce 1 (use GOTO[0, E])
Accept
The algorithms to generate LR parsing tables from given grammars, which are described in Aho et al. (2006), are not overly complex but are beyond the scope of a book on programming languages. As stated previously, there are a number of different software systems available to generate LR parsing tables.
S U M M A R Y
Syntax analysis is a common part of language implementation, regardless of the implementation approach used. Syntax analysis is normally based on a formal syntax description of the language being implemented. A context-free grammar, which is also called BNF, is the most common approach for describing syntax. The task of syntax analysis is usually divided into two parts: lexical analysis and syntax analysis. There are several reasons for separating lexical analysis—namely, simplicity, efficiency, and portability.

198 |
Chapter 4 Lexical and Syntax Analysis |
A lexical analyzer is a pattern matcher that isolates the small-scale parts of a program, which are called lexemes. Lexemes occur in categories, such as integer literals and names. These categories are called tokens. Each token is assigned a numeric code, which along with the lexeme is what the lexical analyzer produces. There are three distinct approaches to constructing a lexical analyzer: using a software tool to generate a table for a table-driven analyzer, building such a table by hand, and writing code to implement a state diagram description of the tokens of the language being implemented. The state diagram for tokens can be reasonably small if character classes are used for transitions, rather than having transitions for every possible character from every state node. Also, the state diagram can be simplified by using a table lookup to recognize reserved words.
Syntax analyzers have two goals: to detect syntax errors in a given program and to produce a parse tree, or possibly only the information required to build such a tree, for a given program. Syntax analyzers are either top-down, meaning they construct leftmost derivations and a parse tree in top-down order, or bottom-up, in which case they construct the reverse of a rightmost derivation and a parse tree in bottom-up order. Parsers that work for all unambiguous grammars have complexity O(n3). However, parsers used for implementing syntax analyzers for programming languages work on subclasses of unambiguous grammars and have complexity O(n).
A recursive-descent parser is an LL parser that is implemented by writing code directly from the grammar of the source language. EBNF is ideal as the basis for recursive-descent parsers. A recursive-descent parser has a subprogram for each nonterminal in the grammar. The code for a given grammar rule is simple if the rule has a single RHS. The RHS is examined left to right. For each nonterminal, the code calls the associated subprogram for that nonterminal, which parses whatever the nonterminal generates. For each terminal, the code compares the terminal with the next token of input. If they match, the code simply calls the lexical analyzer to get the next token. If they do not, the subprogram reports a syntax error. If a rule has more than one RHS, the subprogram must first determine which RHS it should parse. It must be possible to make this determination on the basis of the next token of input.
Two distinct grammar characteristics prevent the construction of a recursive-descent parser based on the grammar. One of these is left recursion. The process of eliminating direct left recursion from a grammar is relatively simple. Although we do not cover it, an algorithm exists to remove both direct and indirect left recursion from a grammar. The other problem is detected with the pairwise disjointness test, which tests whether a parsing subprogram can determine which RHS is being parsed on the basis of the next token of input. Some grammars that fail the pairwise disjointness test often can be modified to pass it, using left factoring.
The parsing problem for bottom-up parsers is to find the substring of the current sentential form that must be reduced to its associated LHS to get the next (previous) sentential form in the rightmost derivation. This substring is called the handle of the sentential form. A parse tree can provide an intuitive

Review Questions |
199 |
basis for recognizing a handle. A bottom-up parser is a shift-reduce algorithm, because in most cases it either shifts the next lexeme of input onto the parse stack or reduces the handle that is on top of the stack.
The LR family of shift-reduce parsers is the most commonly used bottomup parsing approach for programming languages, because parsers in this family have several advantages over alternatives. An LR parser uses a parse stack, which contains grammar symbols and state symbols to maintain the state of the parser. The top symbol on the parse stack is always a state symbol that represents all of the information in the parse stack that is relevant to the parsing process. LR parsers use two parsing tables: ACTION and GOTO. The ACTION part specifies what the parser should do, given the state symbol on top of the parse stack and the next token of input. The GOTO table is used to determine which state symbol should be placed on the parse stack after a reduction has been done.
R E V I E W Q U E S T I O N S
1.What are three reasons why syntax analyzers are based on grammars?
2.Explain the three reasons why lexical analysis is separated from syntax analysis.
3.Define lexeme and token.
4.What are the primary tasks of a lexical analyzer?
5.Describe briefly the three approaches to building a lexical analyzer.
6.What is a state transition diagram?
7.Why are character classes used, rather than individual characters, for the letter and digit transitions of a state diagram for a lexical analyzer?
8.What are the two distinct goals of syntax analysis?
9.Describe the differences between top-down and bottom-up parsers.
10.Describe the parsing problem for a top-down parser.
11.Describe the parsing problem for a bottom-up parser.
12.Explain why compilers use parsing algorithms that work on only a subset of all grammars.
13.Why are named constants used, rather than numbers, for token codes?
14.Describe how a recursive-descent parsing subprogram is written for a rule with a single RHS.
15.Explain the two grammar characteristics that prohibit them from being used as the basis for a top-down parser.
16.What is the FIRST set for a given grammar and sentential form?
17.Describe the pairwise disjointness test.
18.What is left factoring?

200Chapter 4 Lexical and Syntax Analysis
19.What is a phrase of a sentential form?
20.What is a simple phrase of a sentential form?
21.What is the handle of a sentential form?
22.What is the mathematical machine on which both top-down and bottom-up parsers are based?
23.Describe three advantages of LR parsers.
24.What was Knuth’s insight in developing the LR parsing technique?
25.Describe the purpose of the ACTION table of an LR parser.
26.Describe the purpose of the GOTO table of an LR parser.
27.Is left recursion a problem for LR parsers?
P R O B L E M S E T
1.Perform the pairwise disjointness test for the following grammar rules.
a.A → aB b cBB
b.B → aB bA aBb
c.C → aaA b caB
2.Perform the pairwise disjointness test for the following grammar rules.
a.S → aSb bAA
b.A → b{aB} a
c.B → aB a
3.Show a trace of the recursive descent parser given in Section 4.4.1 for
the string a + b * c.
4. Show a trace of the recursive descent parser given in Section 4.4.1 for the string a * (b + c).
5.Given the following grammar and the right sentential form, draw a parse tree and show the phrases and simple phrases, as well as the handle.
S → aAb bBA A → ab aAB B → aB b
a.aaAbb
b.bBab
c.aaAbBb
6.Given the following grammar and the right sentential form, draw a parse tree and show the phrases and simple phrases, as well as the handle.
S → AbB bAc A → Ab aBB B → Ac cBb c
a.aAcccbbc
b.AbcaBccb
c.baBcBbbc