Types of conflicts

Parser reads subsequent input tokens and at the same time walks states of automaton. Each visited state is supposed to provide information on what to do next depending on input token read. A conflict occurs when it is not possible to build an automaton where in each state for each possible next token there would be just a single action. In other words a conflict is a situation, where in some state there are two or more candidate actions for a single symbol.

Parser can perform two types of actions: reduce, which reduces top of stack elements according to some production, and shift, which pushes next token to stack hoping to match some longer production. Given these two types of actions possible conflicts are:

shift-reduce – parser does not know if to reduce now according to some matching rule, or to shift waiting for longer match
reduce-reduce – top-of-stack symbols match more than one grammar rule. Parser is unable to say by which rule stack should be reduced

Note that there are no shift-shift conflicts.

Shift-reduce

The most typical example of shift-reduce conflict is known as 'dangling else'. In many programming languages syntax of IF instruction allows optional else block. Corresponding grammar is:

@1 code: IF_INSTR;
@2 IF_INSTR: IF condition THEN code
@3 IF_INSTR: IF condition THEN code ELSE code

Providing that input is:

IF condition THEN IF condition THEN code ELSE code

it is not possible to tell if it should be interpreted as:

@2 IF condition THEN ( @3 IF_INSTR : IF condition THEN code ELSE code )

@3 IF condition THEN (@2 IF_INSTR : IF condition THEN code ) ELSE code

That is whether the terminal ELSE code belongs to the outer or inner IF_INSTR.

When parser is processing input from left to right it must make final decision when in the following situation:

read so far: IF condition THEN IF condition THEN code
pending input: ELSE code

Possible options are:

decision to reduce top of stack according to production @2:

IF condition THEN ( IF condition THEN code ) // top of stack matches production @2, so reduce
IF condition THEN code: F_INSTR // now ELSE will be pushed
IF condition THEN code: F_INSTR ELSE
// now code will be pushed...
IF condition THEN code: IF_INSTR ELSE code // top of stack matches production @3, so reduce
IF_INSTR

alternative decision is to shift ELSE:

IF condition THEN IF condition THEN code // push ELSE
IF condition THEN IF condition THEN code ELSE // now push code
IF condition THEN ( IF condition THEN code ELSE code ) // top of stack matches @3, reduce
( IF condition THEN code: IF_INSTR ) // now top of stack matches @2, so reduce again
IF_INSTR

Dangling else is an example of grammar ambiguity. User must explicitly specify which interpretation of above situation should be assumed when building parser. To resolve the conflict user must manually resolve the conflict by telling Alpag whether to shift or reduce in given situation.

Reduce-reduce

A reduce-reduce conflict is almost always caused by serious grammar error. Conflicts of this type cannot be resolved by pointing to one of possible solutions, though some global options for handling them exist. Usually grammar must be rewritten to avoid such conflicts.

In most cases it is possible to rework the grammar so that general idea behind it is preserved and conflict disappears.

Usually reduce-reduce conflicts arise when certain construction (sequence of symbols) can be achieved or build from grammar rules in more than one way, as in the example:

COMMAND_ARRAY: COMMAND_ARRAY COMMAND | COMMAND;
COMMAND:
CMD_OPT_INT |
CMD_OPT_STR;
CMD_OPT_INT:
CMD |
CMD INT;
CMD_OPT_STR:
CMD |
CMD STR;

The idea behind above erroneous the grammar is to allow three alternative syntaxes of a COMMAND:

CMD // CMD alone
CMD INT // CMD with INT argument
CMD STR // CMD with STR argument

However when a standalone CMD token (not followed by INT or STR) appears, two alternative interpretations of this CMD token exist:

COMMAND: CMD_OPT_INT where CMD_OPT_INT: CMD
COMMAND: CMD_OPT_STR where CMD_OPT_STR: CMD

It is not possible to tell between these two interpretations of CMD token.

Note that introducing such ambiguity was not probably the intent of the author.

Parser must provide unambiguous decisions in all states. After CMD was read possible decisions (given next token is known) looks as follows:

CMD next token: STR // shift STR (next move will be to reduce to CMD_OPT_STR)
CMD next token: INT // shift INT (next move will be to reduce to CMD_OPT_INT)
CMD next token: CMD // should reduce by CMD_OPT_INT or CMD_OPT_STR ???

In above state parser is unable to tell by which production to reduce. Construction of parser fails and grammar must be rewritten.

Note that above grammar can be easily rewritten to avoid the conflict while preserving the original idea:

COMMAND_ARRAY: COMMAND_ARRAY COMMAND | COMMAND;
COMMAND:
CMD |
CMD INT |
CMD STR;

Trivial example of reduce-reduce conflict used in this chapter is not representative for entire class of r-r conflicts that can arise in production grammars. No simple rule or advice can be given on how to avoid such conflicts. However usually r-r conflicts are consequence of decomposing grammar into very deep trees of overly nested productions.

< Conflicts | Conflict context >

Alpag Manual