alpag.net manual
Lexer / Programming interface / Basic interface
< Programming interface | Token Value >

Basic interface

Lexer class constructor can be instantiated without any parameters:

MyLexer lexer = new MyLexer();

Reading input stream and providing input to lexer must be handled by user code (as described below). Once user is ready to provide input to the lexer, actual processing analysis can begin.

To get next match user invokes NextToken() method:

int token = lexer.NextToken();

NextToken() reads input (using custom code provided by user) and finds longest possible match. Once match is found, NextToken() executes user-defined code for matched rule. If this user-defined code contains a return statement, then value associated with that statement is returned from NextToken(). If user code contains no return statement, or if no user code is attached to the rule, NextToken() will not return but instead try to find next match.

Internal structure of NextToken() method can be schematically presented as follows:

int NextToken()
{
for(;;) // loop until EOR or ERROR
{
matched_rule_or_status = ReadInputAndFindLongestMatch();
if( matched_rule_or_status is EOF_or_ERROR )
{
return matched_rule_or_status; // EOF or ERROR;
}
switch( rule )
{
case rule1:
// user code for rule 1
// do something and return token
return USER_CONSTANT_FOR_RULE1;
case rule2:
// user code for rule 2
// do something but don’t return
break;
// no user code provided for other rules...
}
}
}

Usually code attached to rules returns a unique user-defined constant which describes matched rule. In such case NextToken() quits after each match and must be reinvoked to get next match. This mode of operation is called 'pull' operation.

Alternatively user may choose to do entire processing inside rule-attached code without returning. In such case NextToken() will not exit until entire input has been read. This approach is known as 'push' operation.

Using custom code attached to rules any combination of above operation modes can be achieved.

< Programming interface | Token Value >
Alpag Manual