Generating lexer report

Alpag can generate a report file that contains a summary of the input grammar and a description of the generated automaton.

To get the report, run Alpag again, this time with the -lr (lexer report) switch:

alpag -lr myLexer.alp

Alpag now generates one more file: myLexer.lx.txt.

You can also use the additional -lro <options> switch to control exactly what is included in the report. The default configuration is suitable for most uses.

The generated file contains a number of sections.

The rules section contains the list of rules that were effectively read.

rules
&1
modes: INITIAL
regex: {LETTER}+
&2
modes: INITIAL
regex: \ +

Rules can be grouped into modes. By default there is only one mode: INITIAL.

If the rule structure is complex, you can outline it by generating the report with the option -lro all,rt.

The symbols section lists all distinct nonoverlapping character ranges used in your grammar. You should see the following:

symbols
%1 symEOL
%2 elements: A..Z charCount: 26
%3 elements: 0x20

This means that whenever the lexer encounters an input character in the A-Z range, it translates it to internal symbol %2, while a space (0x20) is translated to symbol %3. The symbol for EOL is generated automatically for the end-of-line handling mechanism.

You don’t have to be concerned with the codes assigned to inner symbols unless you plan to analyze the inner tables of the lexer automaton.
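The character-to-symbol translation described above can be pictured with a small sketch. This is only an illustration written in Python, not Alpag's actual code: the function name to_symbol and the constant names are hypothetical, and only the symbol numbers and ranges are taken from the report. The report does not say which characters produce the EOL symbol, so the sketch leaves it unmapped.

```python
# Hypothetical model of the symbols section of the report (not Alpag internals).
SYM_EOL = 1      # %1 symEOL (generated automatically; not mapped in this sketch)
SYM_LETTER = 2   # %2 elements: A..Z
SYM_SPACE = 3    # %3 elements: 0x20


def to_symbol(ch):
    """Translate one input character to an internal symbol number.

    Returns None for characters that no range in the report covers.
    """
    if "A" <= ch <= "Z":
        return SYM_LETTER
    if ch == "\x20":
        return SYM_SPACE
    return None


if __name__ == "__main__":
    print([to_symbol(c) for c in "AB Z"])
```

Because every character in A-Z maps to the same symbol, the automaton needs only one transition per symbol instead of 26 transitions per state.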

Input rules are internally expressed using the calculated symbols. You can see this representation in the effectiveRules section:

effectiveRules
mode 0 INITIAL
&1
regex: [%2][%2]*
&2
regex: [%3][%3]*
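Note that the effective form spells "one or more" as a single occurrence followed by "zero or more". As a quick check outside Alpag, the following sketch uses Python's re module to confirm that both spellings accept the same strings. The character class A-Z stands in for symbol %2; this substitution and the sample strings are assumptions made for the example.

```python
import re

# Original spelling of rule &1, with {LETTER} written out as A-Z.
plus_form = re.compile(r"[A-Z]+")
# Effective spelling from the report: [%2][%2]*
expanded_form = re.compile(r"[A-Z][A-Z]*")

samples = ["", "A", "ABC", "AB1", " "]

# Both forms should agree on every sample, matching or not.
results = [
    (s, bool(plus_form.fullmatch(s)), bool(expanded_form.fullmatch(s)))
    for s in samples
]

if __name__ == "__main__":
    for text, plus_ok, expanded_ok in results:
        print(repr(text), plus_ok, expanded_ok)
```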

These rules are then used to build the automaton:

automaton
states
@1
goto @2 under %2
goto @3 under %3
@2
accept &1
goto @2 under %2
@3
accept &2
goto @3 under %3

The automaton section lists states and transitions.

The automaton always starts in state @1. If the input character is in the range A-Z (symbol %2), the automaton performs a transition to state @2 (goto @2 under %2).

State @2 is an accepting state for rule &1 (the rule matching one or more letters). If more letters (symbol %2) appear on input, the automaton performs a transition from state @2 back to @2 (ready to accept any number of letters).

State @3 works in a similar manner for sequences of spaces (symbol %3).
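To make the walk through the states concrete, here is a minimal sketch in Python that replays the automaton listed in the report. The transition table and accepting states are copied from the report; everything else (the names TRANSITIONS, ACCEPTS, to_symbol, next_token, tokenize, and the longest-match scanning loop) is an assumption made for illustration and does not show how the generated lexer is actually implemented.

```python
# state -> {symbol: next state}, copied from the automaton section of the report
TRANSITIONS = {
    1: {2: 2, 3: 3},  # @1: goto @2 under %2, goto @3 under %3
    2: {2: 2},        # @2: goto @2 under %2
    3: {3: 3},        # @3: goto @3 under %3
}
# accepting state -> rule number (accept &1 in @2, accept &2 in @3)
ACCEPTS = {2: 1, 3: 2}


def to_symbol(ch):
    """Hypothetical character-to-symbol mapping based on the symbols section."""
    if "A" <= ch <= "Z":
        return 2
    if ch == " ":
        return 3
    return None


def next_token(text, pos):
    """Return (rule, end) for the longest match starting at pos, or None."""
    state = 1          # the automaton always starts in state @1
    best = None
    i = pos
    while i < len(text):
        nxt = TRANSITIONS[state].get(to_symbol(text[i]))
        if nxt is None:
            break      # no transition under this symbol: stop scanning
        state = nxt
        i += 1
        if state in ACCEPTS:
            best = (ACCEPTS[state], i)  # remember the latest accepting position
    return best


def tokenize(text):
    """Split text into (rule, lexeme) pairs; raise on unmatched input."""
    pos = 0
    tokens = []
    while pos < len(text):
        match = next_token(text, pos)
        if match is None:
            raise ValueError(f"no rule matches at position {pos}")
        rule, end = match
        tokens.append((rule, text[pos:end]))
        pos = end
    return tokens


if __name__ == "__main__":
    print(tokenize("AB  C"))  # [(1, 'AB'), (2, '  '), (1, 'C')]
```

In the sample run, the letters AB drive the automaton from @1 to @2 and keep it there, the following space has no transition from @2, so rule &1 is accepted and scanning restarts from @1.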

Inspecting the lexer automaton is almost never necessary. Lexer behavior can usually be explained by analyzing the grammar alone. You may need to look at the automaton only when tracking down very stubborn errors.

XML and HTML report

Using the -lrx option you can generate an XML report. Setting this option produces one more file:

myLexer.lx.xml

The XML report is intended mainly for exporting and exchanging information with third-party tools.

Alpag comes with an XSL template file, lexer.xsl, that can translate the XML report to HTML for easy browsing. You must use a third-party XSL processor (such as msxsl) to perform the task. When using msxsl, the HTML report can be generated as follows:

msxsl myLexer.lx.xml lexer.xsl -o myLexer.html

The contents of the generated HTML file are generally the same as in the textual report, with hyperlinks added for easier browsing.

