Parser algorithm

Once CFG grammar is correctly defined, the process of parser generation is fully automatic. Generation may fail if defined grammar is outside of LR(1) grammar class that Alpag can handle even if grammar itself is correct.

The process of defining CFG grammar that matches the language, is not ambiguous and fits the LR(1) grammar class can be long, complex and require many iterations. That’s why it is necessary to fully understand the way LR algorithm works.

LR parsers perform bottom-up parsing using a pushdown automaton (PDA). Pushdown automaton has two major components:

Automaton defined over finite number of states. Each state contains list of actions to take, depending on next terminal symbol read from input. The automaton is a static description of automaton calculated once for given input grammar and used on each run.
Stack which contains current list of symbols that were read from input and were not yet recognized or classified as part of any grammar production. Parser reads consecutive input symbols and pushes them to the stack. Whenever top-of-stack sequence of elements matches a right-hand-side of grammar production for certain nonterminal, this sequence is reduced, that removed from stack, and replaced by left-hand-side nonterminal for that production.

A state does not contain actions for all possible input symbols but only for symbols that are expected in that state. If next input symbol does not match any of symbols defined in current state an error is reported.

There are three types of actions. Two of these – shift, and reduce – are performed conditionally depending on next input terminal. Third type of action – goto – is performed for nonterminals being results of reductions.

Shift action removes the matched terminal symbol from input and places it on top of parser stack. Then it goes to the next state as specified for that action. During shift operation previous automaton state is saved on stack.
Reduce action does not remove terminal symbol from input. The symbol remains on input and can be processed by further actions. Reduction removes top-of-stack elements corresponding to a particular grammar production and replaces them with nonterminal for which this production was defined. Reduction also returns automaton to the state it was in before reduced symbols where pushed to stack. Reduction is immediately followed by goto.
Goto action performs transition to next state under nonterminal. It is executed whenever a nonterminal is added to top of the stack, which occurs during reduction operation. This means that goto is executed always after reduce.

Most publications prefer to treat goto as different type of action from shift and reduce and present two parser tables, one for shift/reduce and one for goto. In this manual and in report files generated by Alpag always single parser table is presented covering all shift/reduce/goto actions.

This chapter contains a minimum necessary to understand contents of parser tables presented further. A more detailed description of parser operation can be found in Parser operation chapter.

< Grammar definition | Parsing example >

Alpag Manual