The parser rules section is declared using %prules or %pr.
A parser rule starts with a nonterminal, which must be placed at the beginning of a line, followed by a colon ':'. After the colon come the right-hand-side elements of the production, which are terminals or nonterminals. Multiple right-hand sides (and thus multiple grammar productions) can be declared by separating them with a pipe '|'. A grammar rule is terminated with a semicolon ';'.
Each production can contain a code block, which is executed when the production has been fully recognized by the parser and is just about to be reduced. This code block is placed at the end of the production (after the last right-hand-side element).
A production can also contain mid-rule code blocks, which are placed between right-hand-side elements. A code block cannot be placed at the beginning of a production, before any elements.
Each production can have a number of options, which are placed after all elements but before the final code block. Moreover, individual elements of a production can have their own modifiers.
Below is an outline of the most general form of a grammar production (some of the elements shown are optional). Individual elements are explained in detail later in this chapter.
The right-hand side of a production can be of zero length. To improve readability, the %empty token can be placed instead of an empty sequence of elements.
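To make the rule structure concrete, here is a minimal Python model that splits simplified rule text into (left-hand side, right-hand side) productions. It is not Alpag's own parser. It ignores code blocks, options, modifiers and the beginning-of-line requirement, and the token pattern is an assumption made for illustration.

```python
import re

# Assumed token pattern for a simplified rule syntax: %empty, identifiers,
# quoted terminals, and the punctuation ':', '|' and ';'.
# Characters that match none of these (e.g. whitespace) are skipped.
TOKEN_RE = re.compile(r"""%empty|[A-Za-z_]\w*|'[^']*'|"[^"]*"|[:|;]""")


def parse_rules(text):
    """Split simplified rule text into a list of (lhs, rhs_tuple) productions."""
    tokens = TOKEN_RE.findall(text)
    productions = []
    i = 0
    while i < len(tokens):
        lhs = tokens[i]
        if i + 1 >= len(tokens) or tokens[i + 1] != ":":
            raise SyntaxError(f"expected ':' after {lhs!r}")
        i += 2
        rhs = []
        while True:
            if i >= len(tokens):
                raise SyntaxError(f"rule {lhs!r} is not terminated with ';'")
            tok = tokens[i]
            i += 1
            if tok in ("|", ";"):
                # Each '|' or ';' closes one production; %empty and a
                # literally empty sequence both yield a zero-length RHS.
                productions.append((lhs, tuple(rhs)))
                rhs = []
                if tok == ";":
                    break
            elif tok != "%empty":
                rhs.append(tok)
    return productions


grammar = """
expr : expr '+' term | term ;
opt  : %empty | item ;
"""
print(parse_rules(grammar))
```

In this model %empty is purely cosmetic: `opt : %empty | item ;` and `opt : | item ;` produce the same productions.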
Nonterminals (on both the left-hand and right-hand side) are referenced by their identifier, like:
Terminals can be referenced by identifier, by user-defined string name, or by the character that was used to assign a code to the terminal:
When referencing terminals, either single or double quotes can be used, regardless of whether a user-defined name or a character code is used.
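The three ways of referring to a terminal can be sketched as a lookup that strips either kind of quote before matching. The tables, terminal identifiers and lookup order below are hypothetical; they only illustrate that the quote style carries no meaning.

```python
# Hypothetical symbol tables for illustration:
# terminal identifier -> user-defined string name, and
# terminal identifier -> character used to assign its code.
TERMINAL_NAMES = {"PLUS": "plus", "NUMBER": "number"}
TERMINAL_CHARS = {"PLUS": "+"}


def resolve_terminal(ref):
    """Map a terminal reference to its identifier.

    Accepts a bare identifier, or a user-defined name or character wrapped
    in either single or double quotes; the quote style is not significant.
    """
    if len(ref) >= 2 and ref[0] == ref[-1] and ref[0] in "'\"":
        inner = ref[1:-1]
        # Lookup order (names first, then characters) is an assumption.
        for table in (TERMINAL_NAMES, TERMINAL_CHARS):
            for ident, text in table.items():
                if text == inner:
                    return ident
        raise KeyError(f"unknown terminal {ref}")
    if ref in TERMINAL_NAMES or ref in TERMINAL_CHARS:
        return ref
    raise KeyError(f"unknown terminal {ref}")


print(resolve_terminal("'+'"), resolve_terminal('"number"'))  # PLUS NUMBER
```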
The typical form of a production is:
It specifies the left-hand-side nonterminal, a sequence of right-hand-side elements, and the code executed when the parser reduces the entire production.
Occasionally, mid-rule code is placed between right-hand-side elements:
Mid-rule code is executed when the parser reaches its location within the production.
When mid-rule code fires, the subsequent right-hand-side elements have not yet been read and recognized, and there is no guarantee that they will be matched. This means that mid-rule code can fire even if the entire production is never matched.
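The timing issue can be illustrated with a deliberately simplified sequential matcher in Python. This is not an LR parser like the one Alpag generates; it only shows that code placed in the middle of a production runs before the remaining elements are known to match. The terminal names are hypothetical.

```python
def match_production(elements, tokens, log):
    """Match a production's elements in order against the start of tokens.

    Strings are terminals; callables stand in for mid-rule code and are
    invoked as soon as the matcher reaches their position.
    Returns True only if every element matched.
    """
    pos = 0
    for element in elements:
        if callable(element):
            element(log)              # mid-rule code fires here
        elif pos < len(tokens) and tokens[pos] == element:
            pos += 1
        else:
            return False              # a later element failed to match
    log.append("reduce")              # end-of-production code
    return True


def mid_rule(log):
    log.append("mid-rule code")


production = ["IF", "COND", mid_rule, "THEN", "STMT"]

log = []
print(match_production(production, ["IF", "COND", "ELSE"], log), log)
# False ['mid-rule code']
```

The mid-rule code has already run even though the production was rejected, so any side effects it performs must be safe to leave in place (or be undone elsewhere).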
Multiple productions for the same left-hand-side nonterminal can be declared within a single rule, separated by pipes '|':
Productions for the same nonterminal can also be declared using separate rules.
The left-hand-side nonterminal can be declared with an optional name and an optional value type:
name can be used to reference the value of the nonterminal in code blocks.
valueType declares the value type of the nonterminal. This way of declaring the value type is interchangeable with the %type command. If both appear, they must not conflict.
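The rule that both declarations must agree can be modeled as a merge with a conflict check. This Python sketch assumes that "non-conflicting" means the two type names are identical; the nonterminal and type names are hypothetical.

```python
def merge_value_types(inline_types, type_command_types):
    """Combine value types declared on rule left-hand sides with %type ones.

    Both arguments map nonterminal -> value type name. A nonterminal may be
    typed by either mechanism or by both; when both are present we assume
    "non-conflicting" means the two type names must be identical.
    """
    merged = dict(type_command_types)
    for nonterminal, value_type in inline_types.items():
        existing = merged.get(nonterminal)
        if existing is not None and existing != value_type:
            raise ValueError(
                f"conflicting value types for {nonterminal}: "
                f"{existing} vs {value_type}")
        merged[nonterminal] = value_type
    return merged


print(merge_value_types({"expr": "int"}, {"expr": "int", "stmt": "Node"}))
# {'expr': 'int', 'stmt': 'Node'}
```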
Right-hand-side elements can also have modifiers:
where:
A single nonterminal must be selected as the start nonterminal of the grammar. This nonterminal becomes the topmost construct of the grammar, corresponding to the entire input stream. By default, the left-hand-side nonterminal of the first production declared in the grammar becomes the start nonterminal.
The start nonterminal can be declared explicitly using the command:
This command must be placed in the parser definitions section (%%pdefs).
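The selection rule amounts to "explicit declaration if present, otherwise the first production's left-hand side". A minimal Python sketch follows; the check that an explicitly named start symbol actually has productions is an assumption, not something this chapter states.

```python
def start_nonterminal(productions, explicit_start=None):
    """Choose the grammar's start nonterminal.

    productions: (lhs, rhs) pairs in declaration order.
    explicit_start: nonterminal named by the explicit start command, if any.
    """
    if explicit_start is not None:
        # Assumed sanity check: the declared start symbol needs productions.
        if all(lhs != explicit_start for lhs, _ in productions):
            raise ValueError(f"start nonterminal {explicit_start!r} has no productions")
        return explicit_start
    if not productions:
        raise ValueError("grammar has no productions")
    return productions[0][0]  # default: LHS of the first declared production


rules = [("program", ("stmts",)), ("stmts", ("stmts", "stmt")), ("stmts", ("stmt",))]
print(start_nonterminal(rules), start_nonterminal(rules, "stmts"))  # program stmts
```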
Each production can be declared with an explicit lookahead context. This context determines which terminal symbols may appear next on the input stream for the production to be matched.
A lookahead context is specified as one or more comma-separated terminals placed at the end of the production, after a slash '/':
The production above will be matched only if the next input terminal is lahTerm1 or lahTerm2.
A grammar can contain multiple productions with the same left-hand and right-hand sides, provided their lookahead contexts are disjoint. Within such a group there can be one production with no lookahead context; it matches all lookahead symbols not explicitly covered by the other productions.
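The disjointness requirement and the catch-all production can be modeled as building a terminal-to-production table. In this Python sketch the production identifiers are hypothetical labels for productions sharing the same left-hand and right-hand sides; a real generator would also restrict the catch-all to lookaheads that are actually valid.

```python
def build_lookahead_table(group):
    """Index productions that share LHS and RHS by their lookahead context.

    group: list of (production_id, lookahead) where lookahead is a set of
    terminals, or None for the single production without a context.
    Returns (table, default_production).
    """
    table = {}
    default = None
    for prod_id, lookahead in group:
        if lookahead is None:
            if default is not None:
                raise ValueError("only one production may omit the lookahead context")
            default = prod_id
            continue
        for terminal in lookahead:
            if terminal in table:
                raise ValueError(f"lookahead contexts overlap on {terminal!r}")
            table[terminal] = prod_id
    return table, default


def select_production(table, default, next_terminal):
    """Production matched for next_terminal; the context-free one covers the rest."""
    return table.get(next_terminal, default)


table, default = build_lookahead_table(
    [("p1", {"lahTerm1", "lahTerm2"}), ("p2", {"SEMI"}), ("p3", None)])
print(select_production(table, default, "lahTerm2"),   # p1
      select_production(table, default, "COMMA"))      # p3
```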
Normally Alpag calculates the possible lookahead symbols itself. An explicit declaration enables fine-tuning of the parser's behavior.
Below is a list of options that can be placed at the end of a production:
Specifies the precedence of this production, used to resolve shift-reduce conflicts.
With the first syntax, the precedence of the production becomes the same as the precedence of the referenced terminal symbol. With the second syntax, the production takes the precedence of the named precedence class.
This option overrides any precedence that could otherwise be derived from the elements of the production.
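This chapter does not spell out how precedence levels are compared, so the following Python sketch assumes the conventions of yacc/Bison: the derived precedence is that of the last terminal on the right-hand side, a higher production precedence favors reduce, a higher terminal precedence favors shift, and ties fall back to associativity. Only the rule that an explicit option overrides the derived value comes from the text above.

```python
def production_precedence(rhs, terminal_prec, explicit=None):
    """Effective precedence level of a production (higher binds tighter).

    An explicit precedence option always wins. Otherwise we assume the
    Bison rule: use the precedence of the last terminal on the RHS.
    """
    if explicit is not None:
        return explicit
    for symbol in reversed(rhs):
        if symbol in terminal_prec:
            return terminal_prec[symbol]
    return None


def resolve_shift_reduce(prod_prec, term_prec, term_assoc):
    """Resolve one shift-reduce conflict using assumed Bison conventions.

    Returns 'reduce', 'shift', 'error' (nonassoc), or None when either side
    lacks a precedence and the conflict stays unresolved.
    """
    if prod_prec is None or term_prec is None:
        return None
    if prod_prec != term_prec:
        return "reduce" if prod_prec > term_prec else "shift"
    return {"left": "reduce", "right": "shift", "nonassoc": "error"}[term_assoc]


prec = {"'+'": 1, "'*'": 2}
p = production_precedence(["expr", "'+'", "expr"], prec)   # 1
print(resolve_shift_reduce(p, prec["'*'"], "left"))        # shift
```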
Disables a single warning. The warning code can be decimal or hexadecimal. Multiple %warnoff options can be specified for a single production.
Both of these options have the same meaning: they require that the automaton state at the end of the production be defaulted. When a state is defaulted, the production is reduced without analyzing the lookahead token.
Parser generation fails if the end-of-production state cannot be defaulted.
This option disables defaulting of the automaton state at the end of the production, even if defaulting is possible.
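Since a defaulted state reduces without looking at the next token, it can only be a state where the lookahead could not change the decision. The Python sketch below uses that criterion (no shifts, a single reduction) as an assumption; Alpag's exact test is not given here. It also models the two kinds of option: requiring defaulting, which fails generation when impossible, and disabling it.

```python
def default_reduction(actions):
    """Return the production a state can be defaulted to, or None.

    actions maps lookahead terminal -> ("shift", state) or ("reduce", prod).
    Assumed criterion: no shifts, and every lookahead reduces by the same
    production, so the parser need not inspect the lookahead at all.
    """
    reductions = set()
    for kind, target in actions.values():
        if kind == "shift":
            return None
        reductions.add(target)
    return reductions.pop() if len(reductions) == 1 else None


def end_state_default(actions, require_default=False, forbid_default=False):
    """Model the options that require or disable defaulting."""
    prod = default_reduction(actions)
    if require_default and prod is None:
        raise RuntimeError("end-of-production state cannot be defaulted")
    return None if forbid_default else prod


print(default_reduction({"PLUS": ("reduce", "p1"), "EOF": ("reduce", "p1")}))  # p1
```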
This option ignores (disables) all shift-reduce conflicts for the production.
This option ignores (disables) shift-reduce conflicts on the given terminal symbol. It can be specified multiple times for different symbols.
This option ignores (disables) shift-reduce conflicts against the given place (declared in another production). It can be specified multiple times for different place identifiers.
This option ignores (disables) a given number (count) of shift-reduce conflicts for this production, regardless of their character. When combined with other %igsr options, it covers only conflicts not resolved by other means.
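How the conflict-ignoring options combine can be sketched as successive filters over a production's conflicts. In this Python model the `Conflict` record and its fields are hypothetical, and treating the count as "suppress the first N leftover conflicts" is an assumption; the text only says that the count applies to conflicts not handled by the other options.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Conflict:
    terminal: str  # lookahead terminal on which shift and reduce clash
    place: str     # identifier of the competing place in another production


def reported_conflicts(conflicts, ignore_all=False, ignore_terminals=frozenset(),
                       ignore_places=frozenset(), ignore_count=0):
    """Return the shift-reduce conflicts of one production still reported."""
    if ignore_all:
        return []
    remaining = [c for c in conflicts
                 if c.terminal not in ignore_terminals
                 and c.place not in ignore_places]
    # The count covers only conflicts left over after the options above;
    # suppressing the first `ignore_count` of them is an assumption.
    return remaining[ignore_count:]


conflicts = [Conflict("PLUS", "a"), Conflict("MINUS", "b"), Conflict("TIMES", "c")]
print(reported_conflicts(conflicts, ignore_terminals={"PLUS"}, ignore_count=1))
```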
Code associated with parser rules is executed when a production is reduced. The code usually calculates the value of the left-hand-side nonterminal (the result of the reduction) from the values of the right-hand-side elements. To perform this task, the code must have access to all of these values.
The following placeholders can be used inside reduction code:
A name is matched against:
To use the $NAME syntax, the referenced name must be unique within the production. If the name of a terminal or nonterminal is used, there can be only one such element in the production. If a user-assigned name is used, it must be unique within the production.
Both integer and name arguments can be wrapped in square brackets:
Example (the operators used in the code are purely illustrative):
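To show how these placeholders map onto value records, here is a small Python expander. It rewrites `$$`, `$N`, `$NAME`, `$[N]` and `$[NAME]` into the `dataR` / `dataN` naming used later in this chapter, and enforces the uniqueness rule for names. The element list and names are hypothetical, and we assume a name may match either an element's symbol or its user-assigned name; the generated identifiers are illustrative, not Alpag's actual output.

```python
import re

# $$, the bracketed forms $[N] / $[NAME], and the plain forms $N / $NAME
PLACEHOLDER_RE = re.compile(r"\$\$|\$\[(\w+)\]|\$(\w+)")


def expand_placeholders(code, elements):
    """Replace value placeholders in reduction code with record names.

    elements: list of (symbol, user_name_or_None) for RHS positions 1..n.
    """
    def resolve(name):
        if name.isdigit():
            index = int(name)
            if not 1 <= index <= len(elements):
                raise IndexError(f"${name} is out of range")
            return f"data{index}"
        # A name must identify exactly one element of the production.
        matches = [i for i, (symbol, user_name) in enumerate(elements, start=1)
                   if name in (symbol, user_name)]
        if len(matches) != 1:
            raise ValueError(f"${name} matches {len(matches)} elements, expected 1")
        return f"data{matches[0]}"

    def replace(match):
        if match.group(0) == "$$":
            return "dataR"
        return resolve(match.group(1) or match.group(2))

    return PLACEHOLDER_RE.sub(replace, code)


elements = [("expr", "left"), ("'+'", None), ("expr", "right")]
print(expand_placeholders("$$ = $left + $[3];", elements))  # dataR = data1 + data3;
```

Referring to `$expr` here would fail, because two elements of the production are `expr` nonterminals.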
The parser can be configured to process file location information.
Location information for individual elements can be accessed using the following placeholders:
The format of the variables storing file location information depends on the configuration.
The value type of the expressions $$, $1, $2, ... is taken from the declaration of the respective terminal, nonterminal or mid-rule code. This default behavior can be overridden using type casting:
The consequences of casting a value depend on the way value types are handled. By default, alternative value types are stored in dedicated fields of a value data record, so forcing a value type simply results in referencing another field of this record.
Value types can also be declared using a custom code template for both field declaration and access. In that case, the consequences of casting a value type depend on how the underlying fields are stored.
It is also possible to access the entire value data record using an asterisk '*', like so:
With this syntax, any field of the value data record can be accessed using an expression like $<*>$.MyField.
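Under the default layout, a cast only changes which field of the record is referenced, and `*` refers to the record itself. The Python sketch below models that rewrite. It assumes each value type `T` is stored in a field named `T`, that casts on numbered elements follow the same `$<T>N` shape as `$<*>$`, and it reuses the illustrative `dataR` / `dataN` record names.

```python
import re

# $<Type>$ / $<Type>N casts and the $<*>$ / $<*>N whole-record form
CAST_RE = re.compile(r"\$<(\*|\w+)>(\$|\d+)")


def expand_casts(code):
    """Expand typed placeholders, assuming each value type T is stored in a
    record field named T (the default field-per-type layout)."""
    def replace(match):
        type_name, target = match.groups()
        record = "dataR" if target == "$" else f"data{target}"
        if type_name == "*":
            return record                # the whole value data record
        return f"{record}.{type_name}"   # the field dedicated to that type
    return CAST_RE.sub(replace, code)


print(expand_casts("$<*>$.MyField = $<int>2;"))  # dataR.MyField = data2.int;
```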
By default, an expression like $$ denotes a simple field that can be used both for getting and for assigning the value. Sometimes, however, it may be necessary to use different syntax for setting and getting the value.
Alpag provides a special syntax for assignment:
The assigned-to value type must have a %set code template defined.
The %set code must contain:
Note that $$ stands for the entire value data record, not just the single field holding the value of that particular type.
The above mechanism can be used both with value fields generated by Alpag and with completely custom code.
A field named MyType is added automatically to the value data structure (via the %type declaration).
Getting the value of this field is done in the usual way; setting it is done via the provided setter template.
If the reduction code is:
The generated code will be:
where dataR, data2 and data4 stand for the actual locations of the value data records.
The above declaration adds a field named UserField to the value data structure.
If the reduction code is:
The generated code will be:
The assignment syntax can be used even if no %set code template has been declared. In that case the assignment:
is replaced with:
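The two cases, a %set template versus the plain-field fallback, can be modeled in Python as follows. The `{record}` and `{value}` placeholders and the `SetMyType` setter are hypothetical stand-ins, since the exact contents of a %set template are not reproduced here; the sketch also assumes Python's `str.format`, so templates containing literal braces would need escaping.

```python
def expand_assignment(record, value_type, value_expr, set_templates):
    """Expand an assignment of value_expr to the value of type value_type.

    record: name of the value data record (dataR for $$ in this chapter).
    set_templates: value type -> setter template using the hypothetical
    {record} and {value} placeholders.
    """
    template = set_templates.get(value_type)
    if template is None:
        # No %set template: fall back to a plain field assignment.
        return f"{record}.{value_type} = {value_expr};"
    # The setter receives the entire record, not just one field.
    return template.format(record=record, value=value_expr)


templates = {"MyType": "{record}.SetMyType({value});"}
print(expand_assignment("dataR", "MyType", "data2.MyType + data4.MyType", templates))
# dataR.SetMyType(data2.MyType + data4.MyType);
```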
A number of predefined macros can be used within reduction code. These are: