The parser grammar defined in the previous sections does not contain any custom user code. Such code must be provided to report recognized grammar constructs. In addition, tokens fed to the parser as input usually come with values, and these values should be processed as well.
Whenever the parser recognizes a certain grammar construct, it performs a reduction, replacing multiple grammar elements with one nonterminal. User code executed during the reduction must perform all processing necessary to preserve the data attached to the reduced symbols.
Upon reduction user code can do two things:
Often some mixture of both approaches is used.
The parser can store on its stack values that are provided by the user and assigned to terminals and nonterminals. The type of the stored value can be chosen by the user. It is also possible to declare a record with multiple value types, different for particular terminals and nonterminals.
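The idea of a value stack whose entries can carry different value types can be sketched as follows. This is a language-neutral illustration written in Python, not Alpag syntax; the names `StackValue`, `text` and `number` are hypothetical and do not come from Alpag.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StackValue:
    """Hypothetical record: each symbol uses only the field that suits it."""
    text: Optional[str] = None    # e.g. value attached to a keyword terminal
    number: Optional[int] = None  # e.g. value attached to a numeric terminal


# The value stack runs in parallel with the stack of grammar symbols.
value_stack = []
value_stack.append(StackValue(text="k1"))    # pushed when a keyword is read
value_stack.append(StackValue(number=123))   # pushed when a number is read

# A reduction pops the right-hand-side values and pushes a single result.
num = value_stack.pop()
keyw = value_stack.pop()
value_stack.append(StackValue(text=f"{keyw.text}={num.number}"))
```

With a single value type, as in the example used in this section, the record collapses to one plain string per stack entry.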
Values for nonterminals are calculated during reduction. Values for terminals must be assigned when the parser reads the input, and usually come from the lexer.
In the example below, a single value of type string is used for all terminals and nonterminals. The test parser is not interfaced with a real lexer. Instead, the source of tokens (terminal symbols) is simulated using a predefined array of tokens and an array of values for these tokens, both embedded directly in the parser.
Below is a complete parser input file. It can be used directly as input for Alpag.
User code to be placed inside the parser class is contained in %code blocks.
Code executed during reduction has been added to individual productions.
The parser_next_token code is executed each time the parser needs the next input token from the lexer. In the example, parser_next_token prints a line containing the identifier of the fetched token and its associated value.
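The simulated token source can be pictured with the following minimal sketch. It is written in Python for illustration only and is not the code from the parser input file. The class `TokenSource`, the `EOF` marker, the log format and the value `456` are assumptions.

```python
# Hypothetical token identifiers and values; the real arrays are defined
# in the parser input file.
TOKENS = ["KEYW", "NUM", "KEYW", "NUM"]
VALUES = ["k1", "123", "k2", "456"]
EOF = "EOF"  # assumed end-of-input marker


class TokenSource:
    """Replays two parallel arrays in place of a real lexer."""

    def __init__(self, tokens, values):
        self.tokens = tokens
        self.values = values
        self.pos = 0

    def next_token(self):
        """Return (token, value) and log the fetch, as parser_next_token does."""
        if self.pos >= len(self.tokens):
            print(f"token {EOF}")
            return EOF, None
        token, value = self.tokens[self.pos], self.values[self.pos]
        self.pos += 1
        print(f"token {token} value {value}")
        return token, value


source = TokenSource(TOKENS, VALUES)
first = source.next_token()  # first == ("KEYW", "k1")
```

Keeping tokens and values in two arrays of equal length mirrors the setup described above: the index of the next token also selects its value.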
The code for each production logs the reduction along with its result value. This makes it possible to trace parser operation.
Reduction occurs when all right-hand-side elements have been recognized and the parser is certain that they match exactly one production. During reduction, all right-hand-side elements are replaced by one left-hand-side nonterminal. Both the reduced elements and the resulting nonterminal can have associated values. User code can calculate the result value and assign it to the $$ variable. Values of the right-hand-side elements are available in the $1, $2, $3… variables and can be used in the calculation.
Since string is used for all values, the result of each reduction can be calculated by concatenating the values of all right-hand-side elements. Results are additionally wrapped in easily distinguishable delimiters to emphasize the order of reductions.
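The effect of such a reduction on the value stack can be sketched as follows. This is a Python illustration of the mechanism, not Alpag-generated code. The function name `reduce_production`, the space separator and the exact log format are assumptions; braces are used as the delimiters.

```python
def reduce_production(value_stack, lhs, rhs_len):
    """Pop rhs_len values ($1..$n), push the result ($$), and log it."""
    start = len(value_stack) - rhs_len
    rhs_values = value_stack[start:]           # $1, $2, ... in order
    del value_stack[start:]                    # remove the reduced elements
    result = "{" + " ".join(rhs_values) + "}"  # $$ = delimited concatenation
    value_stack.append(result)                 # value of the new nonterminal
    print(f"reduce {lhs}: {result}")
    return result


stack = ["k1", "123"]                # values of KEYW and NUM
reduce_production(stack, "CMD", 2)   # stack is now ["{k1 123}"]
```

Because each result is wrapped in its own pair of delimiters, nested reductions stay visible in the final string.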
Executing the above code should reveal the sequence of all operations, including fetching input tokens and reducing productions.
Rebuild the parser by invoking:
The generated parser can now be added to the target application.
Create a console application and place the following code inside its main method:
Note that the Parse() procedure does not return until the entire input has been read.
When launched, the above code should print:
The last row represents the entire recognized input. It corresponds to FILE, the top-level nonterminal of the grammar. Braces reveal the order of reductions.
Note that at the beginning the parser read three symbols: the two symbols k1 and 123 necessary to perform the first reduction CMD: KEYW, NUM, and also a third symbol, k2. If you inspect the grammar carefully, you will notice that reading the first two symbols is completely sufficient to perform the first reduction. In other words, these first two symbols could not be anything but the production CMD: KEYW, NUM. Why, then, did the parser decide to read one more symbol?
Parsers built by Alpag can be configured to read a lookahead symbol always, or only when necessary. If the mode is not set explicitly, Alpag chooses it automatically. For the quite minimalistic grammar above, Alpag decided to build a parser that always reads a lookahead symbol, whether necessary or not.
Add one more line to the head part of the input file:
This option forces defaulting, that is, performing default reductions without reading a lookahead symbol.
Build the parser once again, recompile the test code, and run the program.
This time the output will be:
At the beginning the parser reads two symbols, immediately decides that the only possible reduction at this point is CMD: KEYW, NUM, and performs it right away. The final result, the string in the last row, is the same as in the earlier case, but the exact sequence of actions taken by the parser is different.
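The difference between the two modes can be reproduced with a small hand-written simulation. The sketch below is in Python and is not the parser generated by Alpag. It assumes a hypothetical grammar with FILE: CMD, FILE: FILE CMD and the production CMD: KEYW, NUM, hard-codes its reductions, and uses an invented log format and a fourth input value `456`.

```python
EOF = ("EOF", "")  # assumed end-of-input marker


def parse(tokens, always_lookahead):
    """Parse (kind, value) pairs; return the log of reads and reductions."""
    log, stack = [], []        # the stack holds symbol names only
    pos, lookahead = 0, None

    while True:
        # Reductions that are certain from the stack contents alone.
        if stack[-2:] == ["KEYW", "NUM"]:
            rule = ("CMD", 2)
        elif stack == ["CMD"]:
            rule = ("FILE", 1)
        elif stack == ["FILE", "CMD"]:
            rule = ("FILE", 2)
        else:
            rule = None

        # Fetch a lookahead when one is needed, or always if so configured.
        if lookahead is None and (rule is None or always_lookahead):
            lookahead = tokens[pos] if pos < len(tokens) else EOF
            pos += 1
            log.append(f"read {lookahead[0]} {lookahead[1]}".strip())

        if rule is not None:
            lhs, length = rule
            del stack[len(stack) - length:]   # drop right-hand-side symbols
            stack.append(lhs)                 # push the resulting nonterminal
            log.append(f"reduce {lhs}")
        elif lookahead == EOF:
            if stack != ["FILE"]:
                raise SyntaxError("unexpected end of input")
            return log
        else:
            stack.append(lookahead[0])        # shift the lookahead symbol
            lookahead = None


tokens = [("KEYW", "k1"), ("NUM", "123"), ("KEYW", "k2"), ("NUM", "456")]
with_lookahead = parse(tokens, always_lookahead=True)
with_defaulting = parse(tokens, always_lookahead=False)
```

In this sketch `with_lookahead` records three reads before the first `reduce CMD`, while `with_defaulting` records only two. Both runs perform the same reductions in the same order, which is why the final result does not change.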
The behavior of parsers generated by Alpag depends not only on the input grammar but also on the options controlling parser generation. Understanding these options is necessary to generate a parser that behaves exactly as expected.
A complete discussion of parser generation issues can be found in the Parser section.