Cooper K., Engineering a Compiler
3.3. TOP-DOWN PARSING 69
arrange the non-terminals in some order A1, A2, . . . , An
for i ← 1 to n
    for j ← 1 to i−1
        replace each production of the form Ai → Aj γ with the productions
            Ai → δ1 γ | δ2 γ | . . . | δk γ,
            where Aj → δ1 | δ2 | . . . | δk are all the current Aj productions
    eliminate any immediate left recursion on Ai using the direct transformation

Figure 3.2: Removal of left recursion
The transformation shown above eliminates immediate left recursion. Left recursion can occur indirectly, when a chain of rules such as α→β, β→γ, and γ→αδ combines to create the situation that α ⇒+ αδ. This indirect left recursion is not always obvious; it can be hidden through an arbitrarily long chain of productions.
To convert these indirect left recursions into right recursion, we need a more systematic approach than inspection followed by application of our transformation. Figure 3.2 shows an algorithm that achieves this goal. It assumes that the grammar has no cycles (A ⇒+ A) and no ε-productions (A → ε).
The algorithm works by imposing an arbitrary order on the non-terminals. The outer loop cycles through the non-terminals in this order, while the inner loop ensures that a production expanding Ai has no non-terminal Aj with j < i. When it encounters such a non-terminal, it forward substitutes the non-terminal away. This eventually converts each indirect left recursion into a direct left recursion. The final step in the outer loop converts any direct recursion on Ai to right recursion using the simple transformation shown earlier. Because new non-terminals are added at the end of the order and only involve right recursion, the loop can ignore them—they do not need to be checked and converted.
Considering the loop invariant for the outer loop may make this more clear. At the start of the ith outer loop iteration,

∀ k < i, no production expanding Ak has Al in its rhs, for l < k.

At the end of this process (i = n), all indirect left recursion has been eliminated through the repeated application of the inner loop, and all immediate left recursion has been eliminated by the final step of each iteration.
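The algorithm of Figure 3.2 is short enough to sketch directly. The Python below is our own rendition, not the book's code: the names (`eliminate_left_recursion`, `remove_immediate_left_recursion`) and the dict-of-lists grammar representation are assumptions, and the empty list plays the role of ε.

```python
# A sketch of the algorithm in Figure 3.2 (our own names and representation,
# not the book's): a grammar is a dict mapping each non-terminal to a list of
# right-hand sides, each a list of symbols, with [] standing for epsilon.

def remove_immediate_left_recursion(grammar, a):
    """The direct transformation: A -> A alpha | beta becomes
    A -> beta A' and A' -> alpha A' | epsilon."""
    recursive = [rhs[1:] for rhs in grammar[a] if rhs and rhs[0] == a]
    if not recursive:
        return
    base = [rhs for rhs in grammar[a] if not (rhs and rhs[0] == a)]
    a_new = a + "'"          # fresh non-terminal, logically added at the end of the order
    grammar[a] = [rhs + [a_new] for rhs in base]
    grammar[a_new] = [rhs + [a_new] for rhs in recursive] + [[]]

def eliminate_left_recursion(grammar, order):
    for i, a_i in enumerate(order):
        # Inner loop: forward-substitute any leading A_j with j < i.
        for a_j in order[:i]:
            new_rhss = []
            for rhs in grammar[a_i]:
                if rhs and rhs[0] == a_j:
                    # A_i -> A_j gamma becomes A_i -> delta_1 gamma | ... | delta_k gamma
                    new_rhss.extend(delta + rhs[1:] for delta in grammar[a_j])
                else:
                    new_rhss.append(rhs)
            grammar[a_i] = new_rhss
        # Final step of the outer loop: the direct transformation on A_i.
        remove_immediate_left_recursion(grammar, a_i)
    return grammar
```

On the indirectly left-recursive grammar S → A a | b, A → S d | c, with the order (S, A), the inner loop rewrites A's rules to A → A a d | b d | c, and the direct transformation then yields A → b d A′ | c A′ and A′ → a d A′ | ε.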
3.3.3 Predictive Parsing
When we parsed x − 2 × y with the right-recursive expression grammar, we did not need to backtrack. In fact, we can devise a parser for the right-recursive
expression grammar that never needs to backtrack. To see this, consider how the parser reaches a decision that it must later retract through backtracking.
The critical choice occurs when the parser selects a production with which to expand the lower fringe of the partially constructed parse tree. When it tries to expand some non-terminal α, it must pick a rule α→β. The algorithm, as shown in Figure 3.1, picks that rule arbitrarily. If, however, the parser could always pick the appropriate rule, it could avoid backtracking.
In the right-recursive variant of the expression grammar, the parser can make the correct choice by comparing the next word in the input stream against the right-hand sides. Look at the situation that arose in the derivation of x − 2 × y in the previous section. When the parser was in the state

Rule or Action | Sentential Form | Input
8 | Id Expr′ | x ↑ − 2 × y

it needed to choose an expansion for Expr′. The possible right-hand sides were: + Term Expr′, − Term Expr′, and ε. Since the next word in the input stream was −, the second choice is the only one that can succeed. The first choice generates a leading +, so it can never match and will lead directly to backtracking. Choosing ε can only match the end of string, since Expr′ can only occur at the right end of a sentential form.
Fortuitously, this grammar has a form where the parser can predict the correct expansion by comparing the possible right-hand sides against the next word in the input stream. We say that such a grammar is predictive; parsers built on this property are called predictive parsers.
Before going further, we should define the property that makes a grammar predictively parsable. For any two productions A → α | β, the set of initial terminal symbols derivable from α must be distinct from those derivable from β. If we define First(α) as the set of tokens that can appear as the first symbol in some string derived from α, then we want

First(α) ∩ First(β) = ∅

For an entire grammar G, the desired property is that, for every set of rules A → α1 | α2 | · · · | αn in G, the First sets of the alternatives are pairwise disjoint:

First(αi) ∩ First(αj) = ∅, for all i ≠ j
If this property holds for each non-terminal in the grammar, then the grammar can be parsed predictively. Unfortunately, not all grammars are predictive.1 Consider, for example, a slightly more realistic version of our ongoing example. A natural extension to our right-recursive expression grammar would replace the production Factor→Identifier with a set of productions that describe the syntax for scalar variable references, array variable references, and function calls.
1This condition is also called the LL(1) condition. Any grammar that meets this condition can be used to construct a table-driven, LL(1) parser.
∀ A ∈ NT,
    find the longest prefix α common to two or more right-hand sides
    if α ≠ ε then
        replace the rules expanding A,
            A → αβ1 | αβ2 | · · · | αβn | γ
        with
            A → α fie | γ
            fie → β1 | β2 | · · · | βn
        and add fie to NT
Repeat until no common prefixes remain.
Figure 3.3: Left Factoring a Grammar
11. Factor → Identifier
12.        | Identifier [ Exprlist ]
13.        | Identifier ( Exprlist )
20. Exprlist → Expr , Exprlist
21.        | Expr
Here, we show the C syntactic convention of using parentheses for function calls and square brackets for array references. This grammar fragment is not predictive, because the initial terminal symbol generated in rules 11, 12, and 13 is the same. Thus, the parser, in trying to expand a Factor on the parse tree’s lower fringe, cannot decide between 11, 12, and 13 on the basis of a single word lookahead. Of course, looking ahead two tokens would allow it to predict the correct expansion.
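The First sets make this failure concrete. Below is a sketch in Python (our own names and representation, not the book's) that computes First sets and tests the pairwise-disjointness condition defined above; on a simplified rendering of rules 11–13 it reports the conflict, and on a left-factored version it reports none. When a right-hand side can derive ε, the full LL(1) test also consults Follow sets, which the text has not yet introduced; this sketch checks only First-set disjointness.

```python
# A sketch, not the book's code: compute First sets for a grammar stored as a
# dict mapping non-terminals to lists of right-hand sides (lists of symbols),
# and test the pairwise-disjointness condition. "" stands for epsilon.

def first_sets(grammar, terminals):
    first = {t: {t} for t in terminals}
    for nt in grammar:
        first[nt] = set()
    changed = True
    while changed:                       # iterate to a fixed point
        changed = False
        for nt, rhss in grammar.items():
            for rhs in rhss:
                before = len(first[nt])
                for sym in rhs:
                    first[nt] |= first[sym] - {""}
                    if "" not in first[sym]:
                        break
                else:                    # every symbol could derive epsilon
                    first[nt].add("")
                if len(first[nt]) != before:
                    changed = True
    return first

def first_of(rhs, first):
    """First set of an entire right-hand side."""
    out = set()
    for sym in rhs:
        out |= first[sym] - {""}
        if "" not in first[sym]:
            return out
    out.add("")
    return out

def is_predictive(grammar, first):
    for rhss in grammar.values():
        sets = [first_of(rhs, first) for rhs in rhss]
        for i in range(len(sets)):
            for j in range(i + 1, len(sets)):
                if sets[i] & sets[j]:    # overlapping alternatives
                    return False
    return True
```
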
We can rewrite rules 11, 12, and 13 to make them predictive.
11. Factor → Identifier arguments
12. arguments → [ Exprlist ]
13.        | ( Exprlist )
14.        | ε
In this case, we were able to transform the grammar into a predictive grammar. In essence, we introduced a new non-terminal to represent the common prefix of the three rules that were not predictive. We can apply this transformation, systematically and mechanically, to an arbitrary grammar. Figure 3.3 shows a simple algorithm that does this. However, left factoring the grammar will not always produce a predictive grammar.
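The algorithm of Figure 3.3 can be sketched as follows (a sketch, not the book's code: the names `left_factor` and `longest_common_prefix` and the tuple-based representation are ours; the empty tuple stands for ε, and each fresh non-terminal plays the role of the book's fie).

```python
# A sketch of the left-factoring algorithm of Figure 3.3. Right-hand sides
# are tuples of symbols; the empty tuple () stands for epsilon.

def longest_common_prefix(rhss):
    """Longest prefix shared by two or more right-hand sides (maybe empty)."""
    best = ()
    for i, r in enumerate(rhss):
        for s in rhss[i + 1:]:
            k = 0
            while k < min(len(r), len(s)) and r[k] == s[k]:
                k += 1
            if k > len(best):
                best = r[:k]
    return best

def left_factor(grammar):
    counter = 0
    changed = True
    while changed:                       # repeat until no common prefixes remain
        changed = False
        for a in list(grammar):
            rhss = grammar[a]
            prefix = longest_common_prefix(rhss)
            if prefix:
                counter += 1
                fie = f"{a}_f{counter}"  # fresh non-terminal
                shared = [r[len(prefix):] for r in rhss if r[:len(prefix)] == prefix]
                rest = [r for r in rhss if r[:len(prefix)] != prefix]
                grammar[a] = [prefix + (fie,)] + rest
                grammar[fie] = shared    # may contain (), i.e. epsilon
                changed = True
    return grammar
```

Applied to rules 11–13 above, one pass factors out the common prefix Identifier and introduces a new non-terminal whose alternatives are ε, [ Exprlist ], and ( Exprlist ), mirroring the rewritten rules 11–14.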
Not all languages can be expressed in a predictive grammar. Using left-recursion elimination and left factoring, we may be able to transform a grammar to the point where it can be predictively parsed. In general, however, it is undecidable whether or not a predictive grammar exists for an arbitrary context-free language. For example, the language
{aⁿ 0 bⁿ | n ≥ 1} ∪ {aⁿ 1 b²ⁿ | n ≥ 1}
has no predictive grammar.
3.3.4 Top-Down Recursive Descent Parsers
Given a predictive grammar G, we can construct a hand-coded parser for G that operates by recursive descent. A recursive descent parser is structured as a set of mutually recursive routines, one for each non-terminal in the grammar. The routine for non-terminal A recognizes an instance of A in the input stream. To accomplish this, the routine invokes other routines to recognize the various non-terminals on A’s right-hand side.
Consider a set of productions A → β1 | β2 | β3. Since G is predictive, the parser can select the appropriate right hand side (one of β1, β2, or β3) using only the input token and the First sets. Thus, the code in the routine for A should have the form:
/* find an A */
if (current token ∈ First(β1))
    then find a β1 and return true
else if (current token ∈ First(β2))
    then find a β2 and return true
else if (current token ∈ First(β3))
    then find a β3 and return true
else {
    report an error based on A and current token
    return false
}
For each right-hand side A → βi, the routine needs to recognize each term in βi. The code must check for each term, in order.
• For a terminal symbol, the code compares the input symbol against the specified terminal. If they match, it advances the input token and checks the next symbol on the right-hand side. If a mismatch occurs, the parser should report the syntax error in a suitably informative message.
• For a non-terminal symbol, the code invokes the routine that recognizes that non-terminal. That routine either returns true as an indication of success, or it reports an error to the user and returns false. Success allows the parser to continue, recognizing the next symbol on the right-hand side, if any more exist. A return value of false forces the routine to return false to its calling context.
For a right-hand side β1 = γδρ, with γ, ρ ∈ NT and δ ∈ T, the code needs to recognize a γ, a δ, and a ρ (abstracted away as “find a β1 and return true” in the previous code fragment). This code might look like:
if (current token ∈ First(β1)) then {
    if (Parse γ() = false)
        then return false
    else if (current token ≠ δ) then {
        report an error finding δ in A → γδρ
        return false
    }
    current token ← next token()
    if (Parse ρ() = false)
        then return false
        else return true
}
The routine Parse A will contain a code fragment like this for each alternative right-hand side for A.
To construct a complete recursive descent parser, then, the strategy is clear. For each non-terminal, we construct a routine that recognizes that non-terminal. Each routine relies on the other routines to recognize non-terminals, and directly tests the terminal symbols that arise in its own right hand sides. Figure 3.4 shows a top-down recursive descent parser for the predictive grammar that we derived in Section 3.3.2. Notice that it repeats code for similar right hand sides. For example, the code in ExprP() under the tests for ‘+’ and for ‘-’ could be combined to produce a smaller parser.
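The structure that Figure 3.4 describes translates directly into a modern language. The sketch below is our own compact Python rendition, not the book's code: it collapses the result-flag style into boolean short-circuiting, uses ASCII * and / in place of × and ÷, and assumes a trivial token interface (a list of strings, with the sentinel string "eof" past the end).

```python
# A compact Python rendition of the recursive descent scheme of Figure 3.4
# (a sketch, not the book's code). Tokens are plain strings.

class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    @property
    def token(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else "eof"

    def advance(self):
        self.pos += 1

    def expr(self):              # Expr -> Term Expr'
        return self.term() and self.eprime()

    def eprime(self):            # Expr' -> + Term Expr' | - Term Expr' | epsilon
        if self.token in ("+", "-"):
            self.advance()
            return self.term() and self.eprime()
        return True              # the epsilon alternative always succeeds

    def term(self):              # Term -> Factor Term'
        return self.factor() and self.tprime()

    def tprime(self):            # Term' -> * Factor Term' | / Factor Term' | epsilon
        if self.token in ("*", "/"):
            self.advance()
            return self.factor() and self.tprime()
        return True

    def factor(self):            # Factor -> ( Expr ) | Number | Identifier
        if self.token == "(":
            self.advance()
            if not self.expr():
                return False
            if self.token != ")":
                return False     # syntax error: unmatched parenthesis
            self.advance()
            return True
        if self.token != "eof" and (self.token.isdigit() or self.token.isidentifier()):
            self.advance()       # a Number or Identifier token
            return True
        return False             # syntax error

def parse(tokens):
    p = Parser(tokens)
    return p.expr() and p.token == "eof"
```

As in Figure 3.4, each routine mirrors one non-terminal of the predictive grammar, and each alternative is selected by a single token of lookahead.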
Automating the Process Top-down recursive-descent parsing is usually considered a technique for hand coding a parser. Of course, we could build a parser generator that automatically emitted top-down recursive descent parsers for suitable grammars. The parser generator would first construct the necessary First sets for each grammar symbol, check each non-terminal to ensure that the First sets of its alternative right-hand sides are disjoint, and emit a suitable parsing routine for each non-terminal symbol in the grammar. The resulting parser would have the advantages of top-down recursive-descent parsers, such as speed, code-space locality, and good error detection. It would also have the advantages of a grammar-generated system, such as a concise, high-level specification and a reduced implementation effort.
3.4 Bottom-up Parsing
To parse a stream of words bottom-up, the parser begins by creating leaves in the parse tree for each word. This creates the base of the parse tree. To construct a derivation, it adds layers of non-terminals on top of the leaves, in a structure dictated by both the grammar and the input stream. The upper edge of this partially-constructed parse tree is called its upper frontier. Each layer extends the frontier upward, toward the tree’s root.
To do this, the parser repeatedly looks for a part of that upper frontier that matches the right-hand side of a production in the grammar. When it finds a
Main()
    token ← next token()
    if (Expr() = false)
        then return false
        else next compilation step

Expr()
    result ← true
    if (Term() = false)
        then result ← false
    else if (EPrime() = false)
        then result ← false
    return result

EPrime()
    result ← true
    if (token = ’+’) then
        token ← next token()
        if (Term() = false)
            then result ← false
        else if (EPrime() = false)
            then result ← false
    else if (token = ’-’) then
        token ← next token()
        if (Term() = false)
            then result ← false
        else if (EPrime() = false)
            then result ← false
    else result ← true    /* ε */
    return result

Term()
    result ← true
    if (Factor() = false)
        then result ← false
    else if (TPrime() = false)
        then result ← false
    return result

TPrime()
    result ← true
    if (token = ×) then
        token ← next token()
        if (Factor() = false)
            then result ← false
        else if (TPrime() = false)
            then result ← false
    else if (token = ÷) then
        token ← next token()
        if (Factor() = false)
            then result ← false
        else if (TPrime() = false)
            then result ← false
    else result ← true    /* ε */
    return result

Factor()
    result ← true
    if (token = ’(’) then
        token ← next token()
        if (Expr() = false)
            then result ← false
        else if (token ≠ ’)’) then
            report syntax error
            result ← false
        else token ← next token()
    else if (token = Number)
        then token ← next token()
    else if (token = Identifier)
        then token ← next token()
    else
        report syntax error
        result ← false
    return result

Figure 3.4: Recursive descent parser for expressions
Digression: Predictive parsers versus DFAs
Predictive parsing is the natural extension of dfa-style reasoning to parsers. A dfa makes its transition based on the next input character. A predictive parser requires that the expansions be uniquely determined by the next word in the input stream. Thus, at each non-terminal in the grammar, there must be a unique mapping from the first token in any acceptable input string to a specific production that leads to a derivation for that string. The real difference in power between a dfa and a predictively-parsable, or ll(1), grammar derives from the fact that one prediction may lead to a right-hand side with many symbols, whereas, in a regular grammar, it predicts only a single symbol. This lets predictive grammars include productions like
p → ( p )
which is beyond the power of a regular expression to describe. (Recall that a regular expression can recognize (+ Σ∗ )+, but this does not specify that the opening and closing parentheses must match.)
Of course, a hand-constructed top-down, recursive-descent parser can use arbitrary tricks to disambiguate production choices. For example, if a particular left-hand side cannot be predicted with a single symbol lookahead, the parser could use two symbols. Done judiciously, this should not cause problems.
match, it builds a node to represent the non-terminal symbol on the production’s left-hand side, and adds edges to the nodes representing elements of the right-hand side. This extends the upper frontier. This process terminates under one of two conditions.
1. The parser reduces the upper frontier to a single node that represents the grammar’s start symbol. If all the input has been consumed, the input stream is a valid sentence in the language.
2. No match can be found. Since the parser has been unable to build a derivation for the input stream, the input is not a valid sentence. The parser should issue an appropriate error message.
A successful parse runs through every step of the derivation. A failed parse halts when it can find no further steps, at which point it can use the context accumulated in the tree to produce a meaningful error message.
Derivations begin with the goal symbol and work towards a sentence. Because a bottom-up parser proceeds bottom-up in the parse tree, it discovers derivation steps in reverse order. Consider a production α→β, where β ∈ T. A bottom-up parser will “discover” the derivation step α→β before it discovers the step that derives α. Thus, if a derivation consists of a series of steps that produces a series of sentential forms
S = γ0 ⇒ γ1 ⇒ γ2 ⇒ · · · ⇒ γn−1 ⇒ γn = sentence,
76 |
CHAPTER 3. PARSING |
the bottom-up parser will discover γn−1 ⇒ γn before it discovers γn−2 ⇒ γn−1. (It must add the nodes implied by γn−1 ⇒ γn to the frontier before it can discover any matches that involve those nodes. Thus, it cannot discover the nodes in an order inconsistent with the reverse derivation.) At each point, the parser operates on the frontier of the partially constructed parse tree; the current frontier is a prefix of the corresponding sentential form in the derivation. Because the sentential form occurs in a rightmost derivation, the missing suffix consists entirely of terminal symbols.
Because the scanner considers the words in the input stream in a left-to-right order, the parser should look at the leaves from left to right. This suggests a derivation order that produces terminals from right to left, so that its reverse order matches the scanner’s behavior. This leads, rather naturally, to bottom-up parsers that construct, in reverse, a rightmost derivation.
In this section, we consider a specific class of bottom-up parsers called lr(1) parsers. Lr(1) parsers scan the input from left to right, the order in which scanners return classified words. Lr(1) parsers build a rightmost derivation, in reverse. Lr(1) parsers make decisions, at each step in the parse, based on the history of the parse so far and, at most, a lookahead of one symbol. The name lr(1) derives from these three properties: left-to-right scan, reverse-rightmost derivation, and 1 symbol of lookahead.2 Informally, we will say that a language has the lr(1) property if it can be parsed in a single left-to-right scan, to build a reverse rightmost derivation, using only one symbol of lookahead to determine parsing actions.
3.4.1 Using handles
The key to bottom-up parsing lies in using an efficient mechanism to discover matches along the tree’s current upper frontier. Formally, the parser must find some substring βγδ of the upper frontier where
1. βγδ is the right-hand side of some production α → βγδ, and
2. α → βγδ is one step in the rightmost derivation of the input stream.
It must accomplish this while looking no more than one word beyond the right end of βγδ.
We can represent each potential match as a pair ⟨α→βγδ, k⟩, where α→βγδ is a production in G and k is the position on the tree’s current frontier of the right end of δ. If replacing the occurrence of βγδ that ends at k with α is the next step in the reverse rightmost derivation of the input string, then ⟨α→βγδ, k⟩ is a handle of the bottom-up parse. A handle concisely specifies the next step in building the reverse rightmost derivation.
A bottom-up parser operates by repeatedly locating a handle on the frontier of the current partial parse tree, and performing the reduction that it specifies.
2The theory of lr parsing defines a family of parsing techniques, the lr(k) parsers, for arbitrary k ≥ 0. Here, k denotes the amount of lookahead that the parser needs for decision making. Lr(1) parsers accept the same set of languages as lr(k) parsers.
     | Token | Frontier              | Handle                   | Action
 1.  | Id    |                       | — none —                 | extend
 2.  | −     | Id                    | ⟨Factor→Id, 1⟩           | reduce
 3.  | −     | Factor                | ⟨Term→Factor, 1⟩         | reduce
 4.  | −     | Term                  | ⟨Expr→Term, 1⟩           | reduce
 5.  | −     | Expr                  | — none —                 | extend
 6.  | Num   | Expr −                | — none —                 | extend
 7.  | ×     | Expr − Num            | ⟨Factor→Num, 3⟩          | reduce
 8.  | ×     | Expr − Factor         | ⟨Term→Factor, 3⟩         | reduce
 9.  | ×     | Expr − Term           | — none —                 | extend
10.  | Id    | Expr − Term ×         | — none —                 | extend
11.  | eof   | Expr − Term × Id      | ⟨Factor→Id, 5⟩           | reduce
12.  | eof   | Expr − Term × Factor  | ⟨Term→Term × Factor, 5⟩  | reduce
13.  | eof   | Expr − Term           | ⟨Expr→Expr − Term, 3⟩    | reduce
14.  | eof   | Expr                  | ⟨Goal→Expr, 1⟩           | reduce
15.  | eof   | Goal                  | — none —                 | accept

Figure 3.5: States of the bottom-up parser on x − 2 × y
When the frontier does not contain a handle, the parser extends the frontier by adding a single terminal to the right end of the frontier. To see how this works, consider parsing the string x − 2 × y using the classic expression grammar. At each step, the parser either finds a handle on the frontier, or it adds to the frontier. The state of the parser, at each step, is summarized in Figure 3.5, while Figure 3.6 shows the corresponding partial parse tree for each step in the process; the trees are drawn with their frontier elements justified along the top of each drawing.
As the example shows, the parser only needs to examine the upper frontier of the partially constructed parse tree. Using this fact, we can build a particularly clean form of bottom-up parser, called a shift-reduce parser. These parsers use a stack to hold the frontier; this simplifies the algorithm in two ways. First, the stack trivializes the problem of managing space for the frontier. To extend the frontier, the parser simply shifts the current input symbol onto the stack. Second, it ensures that all handles occur with their right end at the top of the stack; this eliminates any overhead from tracking handle positions.
Figure 3.7 shows a simple shift-reduce parser. To begin, it shifts an invalid symbol onto the stack and gets the first input symbol from the scanner. Then, it follows the handle-pruning discipline: it shifts symbols from the input onto the stack until it discovers a handle, and it reduces handles as soon as they are found. It halts when it has reduced the stack to Goal on top of invalid and consumed all the input.
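The shift-reduce discipline can be sketched in Python as follows. This is not Figure 3.7 itself: the handle finder below is hard-coded for the classic expression grammar using one token of lookahead, ASCII * and / stand in for × and ÷, and "$" plays the role of the book's invalid marker; a real lr(1) parser derives the same decisions from tables.

```python
# A sketch of a shift-reduce parser for the classic expression grammar.
# Tokens are category names: "Id", "Num", "+", "-", "*", "/".

def shift_reduce(tokens):
    """Return the reverse rightmost derivation (a list of productions, each a
    (lhs, rhs) pair) for the token stream, or None if the parse fails."""
    stack = ["$"]                               # bottom-of-stack marker
    derivation = []
    i = 0
    while True:
        look = tokens[i] if i < len(tokens) else "eof"
        handle = find_handle(stack, look)
        if handle is not None:
            lhs, rhs = handle
            del stack[len(stack) - len(rhs):]   # pop the handle...
            stack.append(lhs)                   # ...and push its left-hand side
            derivation.append(handle)
        elif look != "eof":
            stack.append(look)                  # no handle: extend the frontier
            i += 1
        else:
            break
    return derivation if stack == ["$", "Goal"] else None

def find_handle(stack, look):
    """Ad hoc handle finder with one token of lookahead, specific to the
    classic expression grammar; tables replace this in a real LR(1) parser."""
    top3, top1 = tuple(stack[-3:]), stack[-1]
    if top3 in (("Term", "*", "Factor"), ("Term", "/", "Factor")):
        return ("Term", top3)
    if top3 in (("Expr", "+", "Term"), ("Expr", "-", "Term")) and look not in ("*", "/"):
        return ("Expr", top3)
    if top1 in ("Id", "Num"):
        return ("Factor", (top1,))
    if top1 == "Factor":
        return ("Term", ("Factor",))
    if top1 == "Term" and look not in ("*", "/"):
        return ("Expr", ("Term",))
    if stack == ["$", "Expr"] and look == "eof":
        return ("Goal", ("Expr",))
    return None
```

On the token stream for x − 2 × y, this sketch performs exactly the nine reductions of Figure 3.5, in the same order, and accepts with Goal above the bottom marker.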
[Figure 3.6 shows the partial parse tree for each of steps 2–15 of Figure 3.5, drawn with its frontier elements justified along the top.]

Figure 3.6: Bottom-up parse of x − 2 × y