Handling alternative productions

Grammars often contain multiple alternative productions that start in the same way and differ only by their terminal elements. Parser working through input from left to right cannot choose between alternative productions after having seen just a few initial symbols, especially if these symbols are same in multiple productions. Parser must be considering multiple alternative interpretations of input sequence in parallel and keep reading. When, after reading enough symbols, only single interpretation remains valid, parser can choose it.

One of greatest strengths of LR(1) algorithm is ability to deal with such cases.

Example given below illustrates how LR(1) parser chooses between multiple alternatives which start in the same way. Studying this example can be very educative and help analyze automatons for production grammars.

Grammar is:

@1 STMT: he DOESWHAT;
@2 STMT: he drinks WHAT;
@3 DOESWHAT: drinks BEVERAGE;
@4 WHAT: tea;
@5 BEVERAGE: coffee;

The grammar can accept two inputs which differ only by last terminal:

he drinks coffee

he drinks tea

Note that depth of nesting grammar productions is different for both these inputs:

@1 STMT: he ( @3 DOESWHAT : drinks ( @5 BEVERAGE: coffee ) )
@2 STMT: he drinks (@4 WHAT: tea )

Parser reading input sequence starting with 'he drinks' is not capable of telling whether it matches first or second expansion. In first case after reading symbol he parser should recursively nest into production @3 for symbol drinks. In second case after reading he next terminal symbol drinks is on the same level in production @2.

Parser automaton is built to consider both these scenarios in parallel. This means that after reading 'he drinks' parser is at the same time inside production @2 as well as inside nested productions @1 and @3. Final decision is made after reading following token (coffee or tea). This final decision results in a sequence of reductions which can handle any nesting depth.

Here is automaton for the grammar:

state #1
$accept: . STMT
actions
he shift to #3
STMT goto #2
state #2
$accept: STMT .
actions
$end reduce by @6 to $accept
state #3
STMT: he . DOESWHAT
STMT: he . drinks WHAT
actions
drinks shift to #5
DOESWHAT goto #4
state #4
STMT: he DOESWHAT .
actions
$end reduce by @1 to STMT
state #5
STMT: he drinks . WHAT
DOESWHAT: drinks . BEVERAGE
actions
tea shift to #8
coffee shift to #9
WHAT goto #6
BEVERAGE goto #7
state #6
STMT: he drinks WHAT .
actions
$end reduce by @2 to STMT
state #7
DOESWHAT: drinks BEVERAGE .
actions
$end reduce by @3 to DOESWHAT
state #8
WHAT: tea .
actions
$end reduce by @4 to WHAT
state #9
BEVERAGE: coffee .
actions
$end reduce by @5 to BEVERAGE

Example will be explained using two scenarios. Initial steps are the same for both these scenarios:

State #1

stack: #1

pending input : he drinks (coffee or tea)

Current position in grammar (either of):

@1 STMT: ^ he DOESWHAT;

@2 STMT: ^ he drinks WHAT;

After reading he:

State #3

stack: #1 he #3

pending input : drinks (coffee or tea)

Current position in grammar (either of):

@1 STMT: he ^ DOESWHAT;

@2 STMT: he ^ drinks WHAT;

After reading drinks:

State #5

stack: #1 he #3 drinks #5

pending input : (coffee or tea)

Current position in grammar:

scenario 1:

@1 STMT: he DOESWHAT; // inside DOESWHAT...

@3 DOESWHAT: drinks ^ BEVERAGE;

scenario 2:

@2 STMT: he drinks ^ WHAT;

Now depending on next input terminal either of two scenarios will take place.

Studying further steps pay attention to the sequence of reductions which convert sequence of tokens of, actually, the same length into parse trees of different depths.

Scenario 1

Providing next symbol is coffee, situation is:

State #9

stack: #1 he #3 drinks #5 coffee #9

pending input: $end

Current position in grammar:

@1 STMT: he DOESWHAT ^;

@3 DOESWHAT: drinks BEVERAGE ^;

@5 BEVERAGE: coffee ^;

Sequence of reductions follows (turn attention to stack contents).

Current state is #9. Action for next symbol $end is to reduce coffee to BEVERAGE which replaces top of stack with nonterminal:

State #7

stack: #1 he #3 drinks #5 BEVERAGE #7

pending input: $end

Current position in grammar:

@1 STMT: he DOESWHAT ^;

@3 DOESWHAT: drinks BEVERAGE ^;

In state #7 next action is to reduce two top-of-stack elements drinks BEVERAGE to DOESWHAT.

State #4

stack: #1 he #3 DOESWHAT #4

pending input: $end

Current position in grammar:

@1 STMT: he DOESWHAT ^current;

In state #4 last reduction leads to:

State #2

stack: #1 STMT #2 input: $end

Scenario 2

If last input symbol is tea situation is:

State #8

stack: #1 he #3 drinks #5 tea #8

pending input: $end

Current position in grammar:

@2 STMT: he drinks WHAT ^current;

@4 WHAT: tea ^current;

In state #8 action under next symbol $end is to reduce by @4:

State #6

stack: #1 he #3 drinks #5 WHAT #6 input: $end

Current position in grammar:

@2 STMT: he drinks WHAT ^current;

In state #6 next action is to reduce by @2 which leads to:

State #2

stack: #1 STMT #2 input: $end

Conclusions

If grammar contains multiple similar productions, parser can postpone decision on choosing one of them and read more input symbols until only one production matches.

When reduction is executed, length of reduced production tells how many elements will be removed off the stack. It also determines the state in which parser resumes its operation after reduction. As a consequence similar stack configurations can be 'sliced' into different numbers of nested productions.

< Basic parsing | Recursion >

Alpag Manual