alpag.net manual

Lexer rules section

The lexer rules section is declared using %lrules or %lr. It can contain only lexer rule definitions.

Rule format

Each lexer rule is defined by a regular expression pattern at the beginning of the line, followed by a space and optional user code. The code is executed when the pattern is matched.

The general format of a lexer rule is as follows (square brackets denote optional elements):

[<mode1,mode2...>][<<EOF>>]regularExpression [options] [{ code }]

No spaces are allowed between the mode list, the <<EOF>> token, and the regular expression.

If a rule has no modes specified, it is active in the INITIAL mode and in all modes declared with %s.
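The default-mode behaviour can be modelled in a few lines of Python. This is only an illustrative sketch, not Alpag's implementation: the function name is_rule_active, the representation of modes as strings, and the s_modes argument are assumptions. The handling of <*> follows the Examples section, where it marks a rule active in all modes.

```python
def is_rule_active(rule_modes, current_mode, s_modes):
    """Return True if a rule is active in current_mode.

    rule_modes: mode names written in <...> before the pattern,
                or an empty list when the rule has no mode list.
    s_modes:    names of the modes declared with %s (hypothetical input).
    """
    if not rule_modes:
        # No mode list: active in INITIAL and in every %s mode.
        return current_mode == "INITIAL" or current_mode in s_modes
    if "*" in rule_modes:
        # <*> marks a rule as active in all modes (see Examples).
        return True
    return current_mode in rule_modes


# Hypothetical setup: STRING is declared with %s, COMMENT is not.
s_modes = {"STRING"}
print(is_rule_active([], "STRING", s_modes))
print(is_rule_active([], "COMMENT", s_modes))
```

Under these assumptions a rule without a mode list is seen in STRING but not in COMMENT.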

The format used by lex (the lexer companion of yacc) allows 'bare' code that is not wrapped in curly braces. This syntax is not allowed in Alpag.

Options

The regular expression pattern can be followed by options. The allowed options are:

[identifier]
%name identifier

Defines an optional name (identifier) for this rule.

%use namedCodeIdentifier

References a predefined named code block, which is executed when this rule is matched. This option can be used instead of a {code} section.

%return retcodeName <valueType> parserTokenIdentifier
%return <valueType> parserTokenIdentifier
%return retcodeName parserTokenIdentifier
%return parserTokenIdentifier

The %return keyword automatically generates code that reports a match using the given parserTokenIdentifier. This mechanism can be used only if the file also contains a parser definition. The specified parserTokenIdentifier must match one of the tokens declared for the parser.

The return code is generated from the template specified in the matching %retcode declaration. The declaration is looked up using valueType and retcodeName. When retcodeName is not specified, the default retcode for the given value type is assumed. When the value type is not specified either, the lookup uses the value type of the token as declared in the parser grammar. For this last variant to work, the value type names used in the parser grammar must match the value types declared with %retcode.

The %return option replaces both the %use option and an explicit code section.
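The retcode lookup order described for %return can be sketched in Python. This is a model of the documented fallback steps only, not Alpag's code: the function find_retcode, the dictionary layout, and the sample type and token names are hypothetical. The sketch also assumes that a %return with a retcodeName but no valueType takes the value type from the token, which the text does not state explicitly.

```python
def find_retcode(retcodes, token_types, token, value_type=None, retcode_name=None):
    """Pick the %retcode template for one %return option.

    retcodes:    dict mapping (value_type, retcode_name) to a template;
                 the key (value_type, None) stands for the default
                 retcode of that value type (hypothetical layout).
    token_types: dict mapping parser token names to their value types.
    """
    if value_type is None:
        # Fall back to the value type declared for the token in the grammar.
        value_type = token_types[token]
    # A missing retcode_name (None) selects the default for the value type.
    # A KeyError here means no %retcode declaration matches.
    return retcodes[(value_type, retcode_name)]


# Hypothetical declarations, with template bodies replaced by labels.
retcodes = {
    ("int", None): "default int template",
    ("int", "hex"): "hex int template",
    ("string", None): "default string template",
}
token_types = {"NUMBER": "int", "NAME": "string"}

print(find_retcode(retcodes, token_types, "NUMBER", "int", "hex"))
print(find_retcode(retcodes, token_types, "NUMBER", "int"))
print(find_retcode(retcodes, token_types, "NAME"))
```

In this model, the last call succeeds only because the value type of NAME in token_types is also a key in retcodes, which mirrors the requirement that the type names match.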

%warnoff warningCode

Locally disables the warning with the given code (decimal or hex). Multiple %warnoff options can be specified, each disabling a single warning.

Chaining patterns

One common block of code can be assigned to multiple rules. To do this, end every rule in the group except the last one with a pipe character '|':

pattern1 |
pattern2 |
...
patternN { code }
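The effect of chaining can be modelled as a simple expansion in which every chained pattern receives the code of the last rule in the group. The Python sketch below is illustrative only: the function expand_chain and the representation of rules as (pattern, code) pairs are assumptions, and rejecting a chain with no final rule is a choice of this sketch, not documented Alpag behaviour.

```python
def expand_chain(rules):
    """Give every rule in a '|' chain the code of the chain's last rule.

    rules: list of (pattern, code) pairs in file order, where code is
           the string "|" for a chained rule, or the rule's own code.
    """
    expanded = []
    pending = []  # patterns still waiting for the shared code block
    for pattern, code in rules:
        if code == "|":
            pending.append(pattern)
            continue
        for chained in pending:
            expanded.append((chained, code))
        expanded.append((pattern, code))
        pending = []
    if pending:
        # Choice made for this sketch: a chain must end with a normal rule.
        raise ValueError("chain is not closed by a final rule")
    return expanded


rules = [
    ("pattern1", "|"),
    ("pattern2", "|"),
    ("patternN", "{ code }"),
]
for pattern, code in expand_chain(rules):
    print(pattern, code)
```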

Grouping by modes

Multiple lexer rules often share the same set of modes. The set can be declared once for all of these rules by wrapping them in curly braces:

<mode1,mode2> {
lexerRule1
lexerRule2
// ...
}

This mechanism does not support nesting.
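A mode group can be read as shorthand for writing the same mode list in front of every rule it contains. The following Python sketch performs that rewrite on rule text. The function name ungroup is hypothetical, and the sketch assumes that the rules inside the group carry no mode list of their own.

```python
def ungroup(modes, rule_lines):
    """Rewrite the rules of a mode group as stand-alone rules.

    modes:      mode names from the group header, e.g. ["mode1", "mode2"].
    rule_lines: text of the rules found between the curly braces.
    """
    # No space may separate the mode list from the regular expression.
    prefix = "<" + ",".join(modes) + ">"
    return [prefix + line.strip() for line in rule_lines]


group = ["lexerRule1", "  lexerRule2"]
for line in ungroup(["mode1", "mode2"], group):
    print(line)
```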

Examples

[A-Za-z]+ { return TOKEN_WORD; } // typical lexer rule
<COMMENT>[^*]+ // rule active only in a specific mode and taking no action
<*>BREAK { return TOKEN_BREAK; } // rule active in all modes
[0-9]+ | // two rules…
0x[0-9A-F]+ { return TOKEN_DEC_OR_HEX; } // …with common code block
<INITIAL,SPECIAL>{ // rules with common set of modes
[0-9]+ { return TOKEN_DIGITS; }
[A-Z]+ { return TOKEN_LETTERS; }
}