alpag.net manual
Introduction / General concepts
< Introduction | Lexer >

General concepts

This section explains the rationale for using automatic lexer and parser generators and the motivation behind two-step lexing/parsing process.

Role and purpose of parser generator

Parsing text is a typical computing task. Code performing this task, a parser, can be written manually or generated by a dedicated tool: a parser generator. Parser generator can generate code capable of parsing text conforming to grammar provided by user. This generated code can be included as a part of any application.

When language features simple grammar, parser for that language can be usually written by hand, often resulting in code which is faster and easier to understand than an automatically generated code would ever be. If language has complex grammar, writing parser completely manually is not easy. In such cases automatic parser generator is preferred.

Using automatic parser generator has several advantages over manual approach:

There are also disadvantages:

For big and complex grammars it is usually impractical to write parsers by hand, so using an automatic parser generator is inevitable. Decision remains which tool to choose.

Alpag deals with above disadvantages in a number of ways:

Approach to parsing

When reading input stream one must recognize such language elements like words, or sentences. Some of these constructs, like elementary words, can be easily distinguished using just their textual properties without analyzing entire text from start to end. Such analysis can be thus performed locally. On the other hand higher-level language constructs, like entire sentences, usually require considering context (like surrounding words) to properly identify role of individual language element. Such analysis cannot be done locally but must consider entire phrase.

Difference in requirements for low-level and high-level parsing paves the way for using two analyzers: one for discovery of low-level lexical elements and another for performing high-level text-wide analysis. Alpag follows this two-level approach and can generate two analyzers:

It is possible to build a monolithic parser, capable of analyzing input stream starting from individual characters up to the level of entire sentences. Two-stage approach is far more elastic though. Combined power of two analyzers can be greater than power of a single monolithic parser. Moreover intermediate code can be placed between lexer and parser giving even more elasticity.

Alpag can generate a lexer alone a parser alone or both. There is no obligation to use lexer and parser both generated by Alpag. User is free to combine either of these analyzers with code written by hand or generated using third-party tools.

This manual presents lexer generation and parser generation issues is separate chapters. Still it should be understood that parser and lexer must cooperate in the task of analyzing input. In practice lexer and parser grammars are written in parallel and corrected one against the other.

< Introduction | Lexer >
Alpag Manual