alpag.net manual
Introduction / Parser / Parsing example
< Parser algorithm | First steps >

Parsing example

This section illustrates the process of parsing input using grammar defined in previous and shown on Fig. 3. Below on Fig. 5 the grammar is presented once again, this time with productions written on a separate line.

Fig. 5 Grammar
(0) entire_file: SENTENCE
(1) SENTENCE: A ADJECTIVES Fox MANYJUMPS Over A Dog / $eof
(2) ADJECTIVES: Quick / Fox
(3) ADJECTIVES: Brown / Fox
(4) ADJECTIVES: Quick Brown / Fox
(5) MANYJUMPS: Jumps / Over, And
(6) MANYJUMPS: MANYJUMPS And Jumps / Over, And
For each production a list of terminal symbols that can appear on input next is shown (after slash).

Parser table calculated with Alpag is shown on Fig. 6.

Note that some publications prefer to use two tables for PDA description: one for shift and reduce actions performed under terminal symbols and a separate one for gotos which are performed under nonterminals. Here a single table contains all the information.

Fig. 6 Pushdown Automaton action table

State

Next symbol

Action

Next state

Reduce by production

Reduce to nonterminal

#1

A

shift

#3

SENTENCE

goto

#2

#2

$eof

reduce

(0)

#3

Quick

shift

#5

Brown

shift

#6

ADJECTIVES

goto

#4

#4

Fox

shift

#8

#5

Brown

shift

#7

Fox

reduce

(2)

ADJECTIVES

#6

Fox

reduce

(3)

ADJECTIVES

#7

Fox

reduce

(4)

ADJECTIVES

#8

Jumps

shift

#10

MANYJUMPS

goto

#9

#9

Over

shift

#11

And

shift

#12

#10

Over

reduce

(5)

MANYJUMPS

And

reduce

(5)

MANYJUMPS

#11

A

shift

#14

#12

Jumps

shift

#13

#13

Over

reduce

(6)

MANYJUMPS

And

reduce

(6)

MANYJUMPS

#14

Dog

shift

#15

#15

$eof

reduce

(1)

SENTENCE

The input text being parsed is:

A Quick Fox Jumps And Jumps Over Dog

The text was chosen to show how LR(1) handles:

When analyzing process of parsing it is necessary to consider the grammar (Fig. 5), the automaton (Fig. 6) as well as current state of parser stack and remaining input which are changing from step to step.

For each parsing step shown are:

Selected grammar productions are shown with possible 'next' symbols (listed after slash '/'). These are symbols that are likely to appear next in input stream given the surrounding grammar context. Next possible symbols are calculated during parser table generation and converted to lookahead symbols for individual actions where applicable.

Below is a sequence of actions taken by parser. A short explanation of each step is included where applicable.

Starting position:

State #1
Stack: #1
Input: A Quick Fox Jumps And Jumps Over A Dog
Current position in grammar:
(1) SENTENCE: ^ A ADJECTIVES Fox MANYJUMPS Over A Dog

This is the starting state of automaton

Next symbol: A

Action: shift A to stack and go to state #3

State #3
Stack: #1 A #3
Input: Quick Fox Jumps And Jumps Over A Dog
Current position in grammar:
(1) SENTENCE: A ^ ADJECTIVES Fox MANYJUMPS Over A Dog
(2) ADJECTIVES: ^ Quick
(3) ADJECTIVES: ^ Brown
(4) ADJECTIVES: ^ Quick Brown

Current position in production (1) before nonterminal ADJECTIVES expands also to starting positions in all productions for that nonterminal (2),(3),(4). At this moment all these production s are plausible alternatives.

In current state next possible terminal can be Quick or Brown. Action taken by parser depends on next terminal.

Next symbol: Quick

Action: shift Quick to stack and goto state #5

State #5
Stack: #1 A #3 Quick #5
Input: Fox Jumps And Jumps Over A Dog $eof
Current position in grammar:
(2) ADJECTIVES: Quick ^ / Fox
(4) ADJECTIVES: Quick ^ Brown

After reading Quick only productions (2) and (4) are likely choices.

If the parser was LR(0) it would be impossible to decide whether to reduce right now by production (2) or postpone reduction and read next token which could be Brown matching production (4).

Since the grammar is LR(1), grammar tables contain rules based on preview of next symbol. If next symbol is Fox reduction will be done by production (2). If next symbol is Brown parser will continue pushing symbols to the stack in accordance with production (4)

Next symbol: Fox

Action: reduce by production (2). Remove top of stack element Quick (which corresponds to right-hand-side of this production). It is replaced with nonterminal ADJECTIVES. State below removed element is #3. Parser then performs goto operation under nonterminal that was just placed on stack. Next state after goto is #4.

State #4
Stack: #1 A #3 ADJECTIVES (made of ‘Quick’) #4
Input: Fox Jumps And Jumps Over A Dog $eof
Current position in grammar:
(1) SENTENCE: A ADJECTIVES ^ Fox MANYJUMPS Over A Dog

Currently the only possible next token is Fox (which was used as lookahead in previous step but was not read from input and remains as next input token)

Next symbol: Fox

Action: shift Fox to stack and go to state #8

State #8
Stack: #1 A #3 ADJECTIVES #4 Fox #8
Input: Jumps And Jumps Over A Dog $eof
Current position in grammar:
(1) SENTENCE: A ADJECTIVES Fox ^ MANYJUMPS Over A Dog
(5) MANYJUMPS: ^ Jumps / Over, And
(6) MANYJUMPS: ^ MANYJUMPS And Jumps / Over, And

Current position in production (1) ahead of MANYJUMPS expands also to beginnings of all productions for that nonterminal. Only possible terminal in current state is Jumps.

Next symbol: Jumps

Action: shift Jumps to stack and go to state #10

State #10
Stack #1 A #3 ADJECTIVES #4 Fox #8 Jumps #10
Input: And Jumps Over A Dog $eof
Current position in grammar:
(5) MANYJUMPS: Jumps ^ / Over, And

In current state the only possible action is to reduce by rule (5). Possible next tokens (lookaheads) are Over and And. Since for both these symbols decision is the same, analyzing lookahead symbol is not really necessary to make the decision.

A Full LR(1) parser will have explicit rules for both possible next symbols (Over, And) . A simplified parser may have one default catch-all rule causing reduction by production (5) regardless of the next symbol. Process of discarding lookahead information where it is not necessary to resolve ambiguities is called defaulting. If state #10 was defaulted, the only action in the state would become: 'reduce by production (5) regardless what the next symbol is'. When state is defaulted LR(1) parser behaves like a simple LR(0) no-lookahead parser in these states.

Next symbol: And

Action: reduce by production (5). Remove top of stack symbol Jumps and get back to state #8. Then go to state #9 pushing the nonterminal ‘MANYJUMPS’.

State #9
Stack #1 A #3 ADJECTIVES #4 Fox #8 MANYJUMPS (made of Jumps) #9
Input: And Jumps Over A Dog $eof
Current position in grammar:
(1) SENTENCE: A ADJECTIVES Fox MANYJUMPS ^ Over A Dog
(6) MANYJUMPS: MANYJUMPS ^ And Jumps

Next input token can be either Over or And. Decision is made after reading next token.

Next symbol: And

Action: Shift And to stack and go to state #12

State #12
Stack #1 A #3 ADJECTIVES #4 Fox #8 MANYJUMPS #9 And #12
Input: Jumps Over A Dog $eof
Current position in grammar:
(6) MANYJUMPS: MANYJUMPS And ^ Jumps

Next symbol: Jumps

Action: Shift Jumps to stack and go to state #13

State #13
Stack #1 A #3 ADJECTIVES #4 Fox #8 MANYJUMPS #9 And #12 Jumps #13
Input: Over A Dog $eof
Current position in grammar:
(6) MANYJUMPS: MANYJUMPS And Jumps ^

At the end of production the only possible action is reduction. Full LR(1) grammar contains explicit rules for all possible next-symbols ( Over, And). However since only one action is possible, there is no real need to analyze lookahead symbol.

An LR(1) parser implementation can be made to prefetch lookahead symbol always or only when necessary. Parsers generated by Alpag can be configured to act either way.

Next symbol: Over (which is not very relevant at the moment)

Action: reduce by production (6). Production has three elements on its right hand side: MANYJUMPS, And, Jumps which correspond to current top-of-stack configuration. Reduction removes these three symbols. Top of stack state becomes #8. Then parser goes to state #9 pushing MANYJUMPS to stack. Of course this newly pushed MANYJUMPS corresponds to left-hand-side of production (6). It is not the same as MANYJUMPS that was earlier on the stack in the same place.

State #9
Stack: #1 A #3 ADJECTIVES #4 Fox #8 MANYJUMPS (made of MANYJUMPS, And, Jumps) #9
Input: Over A Dog $eof
Current position in grammar:
(1) SENTENCE: A ADJECTIVES Fox MANYJUMPS ^ Over A Dog
(6) MANYJUMPS: MANYJUMPS ^ And Jumps

Next symbol: Over

Action: Shift Over to stack and go to state #11

State #11
Stack: #1 A #3 ADJECTIVES #4 Fox #8 MANYJUMPS #9 Over #11
Input: A Dog $eof
Current position in grammar:
(1) SENTENCE: A ADJECTIVES Fox MANYJUMPS Over ^ A Dog

Next symbol: A

Action: Shift A to stack and go to state #14

State #14
Stack #1 A #3 ADJECTIVES #4 Fox #8 MANYJUMPS #9 Over #11 A #14
Input: Dog $eof
Current position in grammar:
(1) SENTENCE: A ADJECTIVES Fox MANYJUMPS Over A ^ Dog

Next symbol: Dog

Action: Shift Dog to stack and go to state #15.

State #15
Stack: #1 A #3 ADJECTIVES #4 Fox #8 MANYJUMPS #9 Over #11 A #15 Dog #15
Input: $eof
Current position in grammar:
(1) SENTENCE: A ADJECTIVES Fox MANYJUMPS Over A Dog ^

End of input stream was reached (the lookahead symbol is $eof).

In current state the only action (for end-of-file) is to reduce by production (1).

Next symbol: $eof

Action: reduce by production (1). Remove seven elements from stack that correspond to right hand side of production (1). State on stack below this production is #1. In state #1 the parser pushes just reduced nonterminal SENTENCE and goes to next state #2.

State #2
Stack #1 SENTENCE (made of A ADJECTIVES Fox MANYJUMPS Over A Dog) #2
Input: $eof
Current position in grammar:
(1) SENTENCE: A ADJECTIVES Fox MANYJUMPS Over A Dog ^

The action in state #2 under symbol $eof is to recognize current top of stack nonterminal SENTENCE as valid input file which ends the parsing process with success.

Syntax tree of parsed text is shown on Fig. 7. It describes decomposition of all nonterminals into their elementary terminals and nonterminals.

Fig. 7 Syntax tree of parsed text
< Parser algorithm | First steps >
Alpag Manual