The parser definitions section is declared using %pdefs or %pd.
When this section is present in the input file, the file is assumed to contain a parser definition.
The parser input is a sequence of terminal symbols coming from the lexer. Whenever the lexer detects a meaningful sequence of input characters, it reports a symbol.
Symbol codes are integers. To exchange them, the lexer and the parser must agree on a common set of codes, with one unique code for each possible input terminal symbol.
By default, Alpag assigns symbol codes to the declared terminals as consecutive integers. When necessary, the user can also assign symbol codes manually. Both approaches can be mixed.
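The exact rule Alpag uses when manual and automatic codes are mixed is not described here. The sketch below assumes one plausible policy: manually assigned codes are kept as given, and the remaining terminals receive consecutive integers starting at 1, skipping codes already taken. The function name assign_codes is hypothetical.

```python
def assign_codes(terminals):
    """Assign symbol codes to terminals listed in declaration order.

    terminals: list of (name, manual_code) pairs; manual_code is None when
    the code should be assigned automatically.
    Hypothetical policy (not confirmed for Alpag): manual codes are kept,
    the others get consecutive integers from 1, skipping taken codes.
    """
    codes = {}
    used = set()
    # First pass: reserve every manually assigned code.
    for name, code in terminals:
        if code is not None:
            if code <= 0:
                raise ValueError(f"{name}: symbol codes must be positive")
            if code in used:
                raise ValueError(f"{name}: code {code} is already in use")
            codes[name] = code
            used.add(code)
    # Second pass: hand out the smallest free codes in declaration order.
    next_code = 1
    for name, code in terminals:
        if code is None:
            while next_code in used:
                next_code += 1
            codes[name] = next_code
            used.add(next_code)
    return codes
```

For example, declaring NUM automatically, PLUS with manual code 2 and MINUS automatically yields NUM=1, PLUS=2, MINUS=3 under this policy.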
For performance reasons, it is better to use the codes assigned by Alpag.
When terminal symbol codes are assigned by the parser, the source of terminal symbols (the lexer) must use them when reporting matches. Sometimes the token source uses its own fixed token identifiers, which for some reason cannot be changed to the parser-assigned ones. In such a case the parser must be configured to use these identifiers, which is done by declaring the parser terminals with manually assigned codes.
Alpag can use any positive integers as codes for input terminal symbols. If custom codes are used, it is better to keep them small, preferably in a continuous range starting from 1.
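One common reason small, continuous codes are cheaper: table-driven parsers often store each state's actions in an array indexed directly by terminal code, so the array size depends on the largest code rather than on the number of terminals. Sparse external codes can instead be translated to a dense range first, at the cost of an extra lookup per token. This is a general sketch, not Alpag's actual table layout; dense_row and compress_codes are hypothetical names.

```python
def dense_row(actions):
    """Build one parse-table row indexed directly by terminal code.

    actions: dict mapping terminal code -> action.
    The row has max(code) + 1 slots, so large codes waste space.
    """
    row = [None] * (max(actions) + 1)
    for code, action in actions.items():
        row[code] = action
    return row


def compress_codes(external_codes):
    """Map arbitrary positive codes (e.g. fixed lexer ids) to dense 1..N."""
    return {code: i for i, code in enumerate(sorted(external_codes), start=1)}
```

With codes 1 and 3 the row has 4 slots; with codes 1 and 1000 it has 1001 slots, most of them empty. compress_codes([1000, 7, 42]) maps 7, 42 and 1000 to 1, 2 and 3.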
Symbols passed to the parser can have an associated value. Nonterminal symbols resolved during parsing can also have an associated value. The value type can be the same for all grammar elements, or it can be declared separately for each terminal and nonterminal.
If the parser uses multiple value types, these must be declared explicitly using the %valtype keyword. Such a value type can then be referenced when declaring terminals and nonterminals by putting its name in angle brackets, for example <valtype>.
Terminals and nonterminals can also be declared with no value type at all.
One way of resolving ambiguity in a grammar is to use information about the associativity of individual terminal symbols. Terminal symbols can be declared as right-associative, left-associative or non-associative.
Another way to resolve ambiguity is to group terminals into precedence classes. These are used to resolve shift-reduce conflicts by choosing the grammar construct with the higher-precedence terminal. By default, all terminals have the same precedence.
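The sketch below follows the conflict-resolution rules documented for yacc and Bison: compare the precedence of the rule that could be reduced with that of the lookahead terminal, and fall back to associativity when they are equal. Given the yacc compatibility stated in this section, Alpag is assumed to behave similarly, but that is not confirmed here. The names Assoc and resolve are hypothetical.

```python
from enum import Enum
from typing import Optional


class Assoc(Enum):
    LEFT = "left"
    RIGHT = "right"
    NONASSOC = "nonassoc"


def resolve(rule_prec: Optional[int], token_prec: Optional[int],
            token_assoc: Assoc) -> str:
    """Decide a shift-reduce conflict; larger numbers mean higher precedence.

    rule_prec: precedence of the rule that could be reduced
    (in yacc, by default that of the rule's last terminal).
    token_prec: precedence of the lookahead terminal.
    """
    if rule_prec is None or token_prec is None:
        # No precedence information: yacc shifts and reports the conflict.
        return "shift"
    if token_prec > rule_prec:
        return "shift"    # lookahead binds tighter: '*' after a + b
    if token_prec < rule_prec:
        return "reduce"   # rule binds tighter: '+' after a * b
    # Equal precedence: associativity decides.
    if token_assoc is Assoc.LEFT:
        return "reduce"   # a - b - c parses as (a - b) - c
    if token_assoc is Assoc.RIGHT:
        return "shift"    # a = b = c parses as a = (b = c)
    return "error"        # non-associative: a < b < c is rejected
```

With '+' at level 1 and '*' at level 2, seeing '*' after a + b shifts, while a second '+' after a + b reduces because '+' is left-associative.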
The traditional way of declaring terminals with precedence is to use the %precedence, %left or %right keyword. Each of these keywords, in a single operation, introduces a new precedence class and declares the terminal symbols that belong to it.
An alternative is to declare a named precedence class using the %prclass keyword. This named class can then be referenced from a terminal symbol declaration, which puts all declared terminals in that class.
To reference a precedence class, use its name preceded by an exclamation mark, for example !precClass.
Terminal symbols are declared using the %token, %left, %right, %nonassoc and %precedence keywords (described further in this section). The following types of information can be specified when declaring terminals using one of these keywords:
Declares a named value type which can be used when declaring terminals or nonterminals.
Available options are:
name of the programming language type
code declaring the variable.
code template for accessing the variable. The code must contain the $$ placeholder for the variable.
code template for getting the value of the variable. The code must contain the $$ placeholder for the variable. Effectively the same as the %access code.
code template for setting the value of the variable. The code must contain the $$ placeholder for the variable and the $1 placeholder for the assigned value.
specifies that this type is the default type. The default type is used for all terminals that do not have a value type specified explicitly.
disables the warning with the given integer code
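The access, get and set templates above rely on placeholder substitution: $$ stands for the variable and $1 for the assigned value. The sketch below shows one way such expansion can work, assuming plain textual replacement with no escaping rules (Alpag's exact expansion rules are not described here). Both placeholders are replaced in a single pass, so text inserted for one placeholder is never re-expanded. The function expand is hypothetical.

```python
import re

# Matches either placeholder; "$$" is tried first.
_PLACEHOLDER = re.compile(r"\$\$|\$1")


def expand(template, var, value=None):
    """Expand a value-type code template.

    $$ -> the variable expression, $1 -> the assigned value (set templates).
    """
    if "$$" not in template:
        raise ValueError("template must contain the $$ placeholder")

    def substitute(match):
        if match.group(0) == "$$":
            return var
        if value is None:
            raise ValueError("template uses $1 but no value was supplied")
        return value

    return _PLACEHOLDER.sub(substitute, template)
```

For example, a hypothetical access template "$$.ival" expands to "yylval.ival" for the variable yylval, and a set template "$$.ival = $1;" with value 42 expands to "yylval.ival = 42;".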
Terminals can also be declared with no value type.
In yacc, multiple value types are declared using the %union keyword.
Declares a named precedence class.
The order of %prclass declarations also becomes the order of their respective precedence classes. Precedence classes declared later in the file have higher priority.
The %prclass keyword introduces a named class which can then be used to declare symbols anywhere in the file. Note that the %precedence, %left and %right declarations used to declare tokens also introduce their own anonymous precedence classes, which participate in the global order of precedence classes.
A named precedence class can later be referenced using its name prefixed with an exclamation mark.
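The ordering rule above can be sketched as a single pass over declarations in file order: every %prclass and every %precedence, %left or %right line takes the next precedence level, and tokens that reference a named class receive that class's level no matter where they are declared. The tuple encoding of declarations and the name precedence_levels are hypothetical simplifications; associativity is left out.

```python
def precedence_levels(decls):
    """Compute token -> precedence level from declarations in file order.

    decls items (hypothetical encoding):
      ("prclass", name)                          named class
      ("left" | "right" | "precedence", tokens)  anonymous class
      ("token", class_name, tokens)              tokens joining a named class
    Later classes get higher levels.
    """
    levels = {}      # token -> level
    named = {}       # %prclass name -> level
    next_level = 1
    for decl in decls:
        kind = decl[0]
        if kind == "prclass":
            named[decl[1]] = next_level
            next_level += 1
        elif kind in ("left", "right", "precedence"):
            # The declaration itself introduces a new anonymous class.
            for tok in decl[1]:
                levels[tok] = next_level
            next_level += 1
        elif kind == "token":
            class_name, tokens = decl[1], decl[2]
            for tok in tokens:
                levels[tok] = named[class_name]
    return levels
```

Declaring %prclass ADD, then %left * /, then tokens + and - in !ADD gives + and - level 1 and * and / level 2: the named class keeps the position of its %prclass declaration, even though its tokens are declared later.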
A terminal symbol declaration can be introduced using one of the following keywords: %token, %left, %right, %nonassoc, %precedence.
Each keyword can be used to declare multiple tokens. All tokens declared on the same line share the same precedence class and associativity.
When a token declaration starts with a !precClass reference, all tokens declared on the same line are added to this precedence class.
A value type can be referenced anywhere in the line using the <valtype> syntax. Such a value type is assigned to all tokens that follow it (until the end of the line or another value type reference).
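The line-level rules above (an optional leading !precClass, and each <valtype> applying to the tokens after it) can be sketched as a small parser. It is deliberately simplified and hypothetical: tokens are plain whitespace-separated names, the !precClass reference is only recognized directly after the keyword, and numeric or character codes are not handled.

```python
def parse_decl_line(line):
    """Parse a simplified token declaration line.

    Returns (keyword, prec_class, [(token, valtype), ...]).
    """
    words = line.split()
    keyword, rest = words[0], words[1:]
    prec_class = None
    # A leading !name puts every token on the line into that class.
    if rest and rest[0].startswith("!"):
        prec_class, rest = rest[0][1:], rest[1:]
    valtype = None
    tokens = []
    for word in rest:
        if word.startswith("<") and word.endswith(">"):
            # Applies to all following tokens until the next <...>.
            valtype = word[1:-1]
        else:
            tokens.append((word, valtype))
    return keyword, prec_class, tokens
```

For the line "%token !ADD PLUS <num> NUMBER INT <str> NAME", all four tokens join class ADD, PLUS has no value type, NUMBER and INT get num, and NAME gets str.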
The general format of a declaration is as follows (using %token as an example; square brackets denote optional elements):
where:
A token identifier declaration can have one of the following forms (square brackets denote an optional element):
where:
Either integerId or 'character' can be specified, but not both. If neither is given, the code for the terminal is assigned automatically.
Do not use single-quoted characters in declarations (like 'c') unless you need the symbol codes to match the ASCII codes of these characters. If you want to reference tokens from the grammar using a string, declare them with double quotes (like "c"), which does not interfere with symbol code assignment.
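The difference between the two quoting styles can be sketched as follows: a single-quoted character pins the code to the character's ASCII value, while any other spelling, including a double-quoted string, simply takes the next automatic code. The function token_code and its counter-passing style are hypothetical, and the real declaration forms are not reproduced here.

```python
def token_code(spelling, next_auto):
    """Return (code, next_auto) for a token spelling in a declaration.

    'c'       -> ord(c); the automatic counter is unchanged
    otherwise -> the next automatic code; the counter advances
    """
    if len(spelling) == 3 and spelling[0] == "'" and spelling[2] == "'":
        return ord(spelling[1]), next_auto
    return next_auto, next_auto + 1
```

Declaring '+' therefore fixes its code at 43, which leaves a gap in an otherwise small continuous code range, whereas "+" just receives the next automatic code.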
To maintain backward compatibility, Alpag uses the same set of keywords for declaring terminal symbols as yacc, that is: %token, %left, %right, %nonassoc, %precedence. The syntax of these commands has been extended in Alpag, but their default behavior is the same as in yacc. This means that a yacc grammar can be copied verbatim to Alpag and will behave the same way.
Declares nonterminals with an optional value type.
where:
Grammar nonterminals do not have to be declared explicitly. Each identifier that appears in grammar productions and was not declared as a terminal is considered a valid nonterminal.
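The implicit-declaration rule can be sketched as: collect every symbol used in the productions and keep the identifiers that were not declared as terminals. The sketch assumes quoted literals such as '(' are not identifiers and skips them; the function name infer_nonterminals is hypothetical.

```python
def infer_nonterminals(productions, terminals):
    """Return identifiers used in productions that are not declared terminals.

    productions: list of (lhs, rhs) pairs, rhs being a list of symbols.
    terminals: set of names declared with %token, %left, etc.
    """
    symbols = set()
    for lhs, rhs in productions:
        symbols.add(lhs)
        symbols.update(rhs)
    # Quoted literals are not identifiers, so they never become nonterminals.
    return {s for s in symbols
            if s not in terminals and not s.startswith(("'", '"'))}
```

For productions expr: expr PLUS term | term and term: NUMBER | '(' expr ')', with PLUS and NUMBER declared as terminals, the inferred nonterminals are expr and term.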
Declares a named error token.
where the options are:
Code executed when the parser enters error recovery mode.
Code executed in error recovery mode for each unmatched input token that is skipped.
Code executed when error recovery is over.
With this option, the cached lookahead token is discarded when error recovery starts.
With this option, the cached lookahead token is kept for analysis when error recovery starts.
By default, the parser contains only one predefined error-handling token, named error.
If the required error-recovery behavior differs from location to location, the user can define multiple error tokens, each providing specific behavior and custom error-handling code.
User-defined error tokens can be used anywhere the built-in error token can be used.
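The error token options above can be modelled as a set of hooks around the token-skipping phase of recovery. This is a hypothetical sketch: the option keywords are not reproduced here, so ErrorToken, on_enter, on_skip, on_exit and discard_lookahead are invented names, and a real LR parser also pops stack states until the error token can be shifted, which is not modelled. Each named error token is simply another instance with its own hooks.

```python
import itertools
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass
class ErrorToken:
    """Hypothetical model of a user-defined error token and its options."""
    name: str
    on_enter: Callable[[], None] = lambda: None        # recovery starts
    on_skip: Callable[[str], None] = lambda tok: None  # a token is skipped
    on_exit: Callable[[], None] = lambda: None         # recovery is over
    discard_lookahead: bool = True


def recover(err: ErrorToken, lookahead: str, rest: Iterable[str],
            can_resume: Callable[[str], bool]) -> Optional[str]:
    """Skip input tokens until one allows parsing to resume.

    lookahead: the cached token that triggered the error.
    Returns the token at which parsing resumes, or None at end of input.
    """
    err.on_enter()
    pending = iter(rest)
    if not err.discard_lookahead:
        # Keep the cached lookahead: it is examined before any new input.
        pending = itertools.chain([lookahead], pending)
    for tok in pending:
        if can_resume(tok):
            err.on_exit()
            return tok
        err.on_skip(tok)
    return None
```

A statement-level error token might log each skipped token and resume at ";", while an expression-level one keeps the lookahead and resumes at ")"; defining both gives different recovery behavior at different places in the grammar.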