Parsing example

This section illustrates the process of parsing input using grammar defined in previous and shown on Fig. 3. Below on Fig. 5 the grammar is presented once again, this time with productions written on a separate line.

Fig. 5 Grammar

(0) entire_file: SENTENCE

(1) SENTENCE: A ADJECTIVES Fox MANYJUMPS Over A Dog / $eof

(2) ADJECTIVES: Quick / Fox

(3) ADJECTIVES: Brown / Fox

(4) ADJECTIVES: Quick Brown / Fox

(5) MANYJUMPS: Jumps / Over, And

(6) MANYJUMPS: MANYJUMPS And Jumps / Over, And

For each production a list of terminal symbols that can appear on input next is shown (after slash).

Parser table calculated with Alpag is shown on Fig. 6.

Note that some publications prefer to use two tables for PDA description: one for shift and reduce actions performed under terminal symbols and a separate one for gotos which are performed under nonterminals. Here a single table contains all the information.

Fig. 6 Pushdown Automaton action table

State	Next symbol	Action	Next state	Reduce by production	Reduce to nonterminal
#1	A	shift	#3
#1	SENTENCE	goto	#2
#2	$eof	reduce		(0)
#3	Quick	shift	#5
	Brown	shift	#6
	ADJECTIVES	goto	#4
#4	Fox	shift	#8
#5	Brown	shift	#7
#5	Fox	reduce		(2)	ADJECTIVES
#6	Fox	reduce		(3)	ADJECTIVES
#7	Fox	reduce		(4)	ADJECTIVES
#8	Jumps	shift	#10
#8	MANYJUMPS	goto	#9
#9	Over	shift	#11
#9	And	shift	#12
#10	Over	reduce		(5)	MANYJUMPS
#10	And	reduce		(5)	MANYJUMPS
#11	A	shift	#14
#12	Jumps	shift	#13
#13	Over	reduce		(6)	MANYJUMPS
#13	And	reduce		(6)	MANYJUMPS
#14	Dog	shift	#15
#15	$eof	reduce		(1)	SENTENCE

The input text being parsed is:

A Quick Fox Jumps And Jumps Over Dog

The text was chosen to show how LR(1) handles:

Recursion. In the grammar left recursion is present in production (6)
Ambiguity between productions (2) and (4). This ambiguity known as shift-reduce conflict is resolved by using lookahead symbol in state #5.

When analyzing process of parsing it is necessary to consider the grammar (Fig. 5), the automaton (Fig. 6) as well as current state of parser stack and remaining input which are changing from step to step.

For each parsing step shown are:

Current state of automaton (its number)
Automaton stack. Stack contains identifiers of most recent automatons states and terminals or nonterminals pushed on stack. Top-of stack state is always the same as current state
Remaining input. First symbol of the input is also the current lookahead symbol used to choose between possible actions available in current state.
Position of current state against the grammar marked with ^ symbol.
Automaton states correspond to particular points within grammar. The meaning of ^ symbol is "we are here providing that this interpretation of input is correct".

Selected grammar productions are shown with possible 'next' symbols (listed after slash '/'). These are symbols that are likely to appear next in input stream given the surrounding grammar context. Next possible symbols are calculated during parser table generation and converted to lookahead symbols for individual actions where applicable.

Below is a sequence of actions taken by parser. A short explanation of each step is included where applicable.

Starting position:

State #1

Stack: #1

Input: A Quick Fox Jumps And Jumps Over A Dog

Current position in grammar:

(1) SENTENCE: ^ A ADJECTIVES Fox MANYJUMPS Over A Dog

This is the starting state of automaton

Next symbol: A

Action: shift A to stack and go to state #3

State #3

Stack: #1 A #3

Input: Quick Fox Jumps And Jumps Over A Dog

Current position in grammar:

(1) SENTENCE: A ^ ADJECTIVES Fox MANYJUMPS Over A Dog

(2) ADJECTIVES: ^ Quick

(3) ADJECTIVES: ^ Brown

(4) ADJECTIVES: ^ Quick Brown

Current position in production (1) before nonterminal ADJECTIVES expands also to starting positions in all productions for that nonterminal (2),(3),(4). At this moment all these production s are plausible alternatives.

In current state next possible terminal can be Quick or Brown. Action taken by parser depends on next terminal.

Next symbol: Quick

Action: shift Quick to stack and goto state #5

State #5

Stack: #1 A #3 Quick #5

Input: Fox Jumps And Jumps Over A Dog $eof

Current position in grammar:

(2) ADJECTIVES: Quick ^ / Fox

(4) ADJECTIVES: Quick ^ Brown

After reading Quick only productions (2) and (4) are likely choices.

If the parser was LR(0) it would be impossible to decide whether to reduce right now by production (2) or postpone reduction and read next token which could be Brown matching production (4).

Since the grammar is LR(1), grammar tables contain rules based on preview of next symbol. If next symbol is Fox reduction will be done by production (2). If next symbol is Brown parser will continue pushing symbols to the stack in accordance with production (4)

Next symbol: Fox

Action: reduce by production (2). Remove top of stack element Quick (which corresponds to right-hand-side of this production). It is replaced with nonterminal ADJECTIVES. State below removed element is #3. Parser then performs goto operation under nonterminal that was just placed on stack. Next state after goto is #4.

State #4

Stack: #1 A #3 ADJECTIVES (made of ‘Quick’) #4

Input: Fox Jumps And Jumps Over A Dog $eof

Current position in grammar:

(1) SENTENCE: A ADJECTIVES ^ Fox MANYJUMPS Over A Dog

Currently the only possible next token is Fox (which was used as lookahead in previous step but was not read from input and remains as next input token)

Next symbol: Fox

Action: shift Fox to stack and go to state #8

State #8

Stack: #1 A #3 ADJECTIVES #4 Fox #8

Input: Jumps And Jumps Over A Dog $eof

Current position in grammar:

(1) SENTENCE: A ADJECTIVES Fox ^ MANYJUMPS Over A Dog

(5) MANYJUMPS: ^ Jumps / Over, And

(6) MANYJUMPS: ^ MANYJUMPS And Jumps / Over, And

Current position in production (1) ahead of MANYJUMPS expands also to beginnings of all productions for that nonterminal. Only possible terminal in current state is Jumps.

Next symbol: Jumps

Action: shift Jumps to stack and go to state #10

State #10

Stack #1 A #3 ADJECTIVES #4 Fox #8 Jumps #10

Input: And Jumps Over A Dog $eof

Current position in grammar:

(5) MANYJUMPS: Jumps ^ / Over, And

In current state the only possible action is to reduce by rule (5). Possible next tokens (lookaheads) are Over and And. Since for both these symbols decision is the same, analyzing lookahead symbol is not really necessary to make the decision.

A Full LR(1) parser will have explicit rules for both possible next symbols (Over, And) . A simplified parser may have one default catch-all rule causing reduction by production (5) regardless of the next symbol. Process of discarding lookahead information where it is not necessary to resolve ambiguities is called defaulting. If state #10 was defaulted, the only action in the state would become: 'reduce by production (5) regardless what the next symbol is'. When state is defaulted LR(1) parser behaves like a simple LR(0) no-lookahead parser in these states.

Next symbol: And

Action: reduce by production (5). Remove top of stack symbol Jumps and get back to state #8. Then go to state #9 pushing the nonterminal ‘MANYJUMPS’.

State #9

Stack #1 A #3 ADJECTIVES #4 Fox #8 MANYJUMPS (made of Jumps) #9

Input: And Jumps Over A Dog $eof

Current position in grammar:

(1) SENTENCE: A ADJECTIVES Fox MANYJUMPS ^ Over A Dog

(6) MANYJUMPS: MANYJUMPS ^ And Jumps

Next input token can be either Over or And. Decision is made after reading next token.

Next symbol: And

Action: Shift And to stack and go to state #12

State #12

Stack #1 A #3 ADJECTIVES #4 Fox #8 MANYJUMPS #9 And #12

Input: Jumps Over A Dog $eof

Current position in grammar:

(6) MANYJUMPS: MANYJUMPS And ^ Jumps

Next symbol: Jumps

Action: Shift Jumps to stack and go to state #13

State #13

Stack #1 A #3 ADJECTIVES #4 Fox #8 MANYJUMPS #9 And #12 Jumps #13

Input: Over A Dog $eof

Current position in grammar:

(6) MANYJUMPS: MANYJUMPS And Jumps ^

At the end of production the only possible action is reduction. Full LR(1) grammar contains explicit rules for all possible next-symbols ( Over, And). However since only one action is possible, there is no real need to analyze lookahead symbol.

An LR(1) parser implementation can be made to prefetch lookahead symbol always or only when necessary. Parsers generated by Alpag can be configured to act either way.

Next symbol: Over (which is not very relevant at the moment)

Action: reduce by production (6). Production has three elements on its right hand side: MANYJUMPS, And, Jumps which correspond to current top-of-stack configuration. Reduction removes these three symbols. Top of stack state becomes #8. Then parser goes to state #9 pushing MANYJUMPS to stack. Of course this newly pushed MANYJUMPS corresponds to left-hand-side of production (6). It is not the same as MANYJUMPS that was earlier on the stack in the same place.

State #9

Stack: #1 A #3 ADJECTIVES #4 Fox #8 MANYJUMPS (made of MANYJUMPS, And, Jumps) #9

Input: Over A Dog $eof

Current position in grammar:

(1) SENTENCE: A ADJECTIVES Fox MANYJUMPS ^ Over A Dog

(6) MANYJUMPS: MANYJUMPS ^ And Jumps

Next symbol: Over

Action: Shift Over to stack and go to state #11

State #11

Stack: #1 A #3 ADJECTIVES #4 Fox #8 MANYJUMPS #9 Over #11

Input: A Dog $eof

Current position in grammar:

(1) SENTENCE: A ADJECTIVES Fox MANYJUMPS Over ^ A Dog

Next symbol: A

Action: Shift A to stack and go to state #14

State #14

Stack #1 A #3 ADJECTIVES #4 Fox #8 MANYJUMPS #9 Over #11 A #14

Input: Dog $eof

Current position in grammar:

(1) SENTENCE: A ADJECTIVES Fox MANYJUMPS Over A ^ Dog

Next symbol: Dog

Action: Shift Dog to stack and go to state #15.

State #15

Stack: #1 A #3 ADJECTIVES #4 Fox #8 MANYJUMPS #9 Over #11 A #15 Dog #15

Input: $eof

Current position in grammar:

(1) SENTENCE: A ADJECTIVES Fox MANYJUMPS Over A Dog ^

End of input stream was reached (the lookahead symbol is $eof).

In current state the only action (for end-of-file) is to reduce by production (1).

Next symbol: $eof

Action: reduce by production (1). Remove seven elements from stack that correspond to right hand side of production (1). State on stack below this production is #1. In state #1 the parser pushes just reduced nonterminal SENTENCE and goes to next state #2.

State #2

Stack #1 SENTENCE (made of A ADJECTIVES Fox MANYJUMPS Over A Dog) #2

Input: $eof

Current position in grammar:

(1) SENTENCE: A ADJECTIVES Fox MANYJUMPS Over A Dog ^

The action in state #2 under symbol $eof is to recognize current top of stack nonterminal SENTENCE as valid input file which ends the parsing process with success.

Syntax tree of parsed text is shown on Fig. 7. It describes decomposition of all nonterminals into their elementary terminals and nonterminals.

Fig. 7 Syntax tree of parsed text

< Parser algorithm | First steps >

Alpag Manual