Integration with application

Once the grammar is ready and the lexer code has been generated without errors, it can be integrated into the target application.

Code generated by Alpag makes no assumptions about where the input data comes from. That is, the generated lexer does not itself read any input stream, memory table, array, or other data source. The code for reading input must be provided by the user.

The interface used by the Alpag lexer matches the typical interface exposed by file or text streams. It is assumed that input is read in chunks (blocks) until the end of input is reached. The user must provide their own code for opening the stream, calling that stream’s read() method (or its equivalent) whenever the lexer needs more data, and finally closing it when done.
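The open, read-in-chunks, close lifecycle described above can be pictured with a standard .NET TextReader. This is a minimal sketch, not code generated by Alpag: the ChunkedSource class and its ReadChars method are hypothetical names chosen for illustration, and a StringReader stands in for a real file. Only StringReader, TextReader.Read(char[], int, int) and StringBuilder are actual .NET APIs here.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical wrapper illustrating the open / read-in-chunks / close
// lifecycle; it is not part of Alpag.
sealed class ChunkedSource : IDisposable
{
    private readonly TextReader reader;

    // "Open": a StringReader stands in for a real file or text stream.
    public ChunkedSource(string text)
    {
        reader = new StringReader(text);
    }

    // "Read": place up to 'count' chars into 'buffer' starting at 'offset'.
    // Returns the number of chars actually read, or 0 at end of input.
    public int ReadChars(char[] buffer, int offset, int count)
    {
        return reader.Read(buffer, offset, count);
    }

    // "Close": release the underlying reader.
    public void Dispose()
    {
        reader.Dispose();
    }
}

static class ChunkedSourceDemo
{
    // Pulls the whole text through a small buffer, one chunk per call,
    // and returns everything that was read.
    public static string ReadAll(string text, int chunkSize)
    {
        var buffer = new char[chunkSize];
        var result = new StringBuilder();
        using (var source = new ChunkedSource(text))
        {
            int n;
            while ((n = source.ReadChars(buffer, 0, buffer.Length)) > 0)
                result.Append(buffer, 0, n);
        }
        return result.ToString();
    }

    static void Main()
    {
        // prints: MY SAMPLE INPUT
        Console.WriteLine(ReadAll("MY SAMPLE INPUT", 4));
    }
}
```

A consumer that drives the reads, as the loop in ReadAll does here, only needs the returned count and treats zero as end of input.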

The example code in this chapter does not read any real file or stream. Reading a file is simulated by walking over a constant string. To simplify things even further, the entire code, including the input text, is located inside the lexer class.

Lexers generated by Alpag can operate on byte streams or character streams. The example uses a character-oriented lexer (set with the Lexer.In.Formats option).

Here is the complete lexer input file. Code to be placed inside the lexer class is contained in %code blocks:

/* myLexer – my first lexer */

// Lexer is using character-oriented input
%option Lexer.In.Formats Chars
%option Lexer.Code.Lexer.ClassName "MyLexer"

// this code is placed inside lexer class body (in head part of class)
%code lexer_top {
    // input string
    const string INPUT_TEXT = "MY SAMPLE INPUT";
    // current position in string (mimics file pointer)
    int inputOffset = 0;

    void Log( string message )
    {
        Console.WriteLine( message );
    }
}

// this code is placed inside input-reading method
// the method usually reads input in chunks
%code read_input_chars {
    // arguments which are available here:
    //   char[] buffer - buffer to be filled with input data
    //   int offset    - buffer offset to place data at
    //   int count     - max number of chars to place in buffer at offset
    // return: count of characters read (at least one),
    //         or zero when end of input was reached

    // end of input reached (0 counts as EOF)
    if( inputOffset >= INPUT_TEXT.Length )
        return 0;

    // available space in buffer is usually large;
    // it is okay to return fewer chars than requested.
    // we return one character at a time;
    // there is always room in the buffer, so no need to test 'count'
    buffer[offset] = INPUT_TEXT[inputOffset];
    inputOffset++;
    return 1;
}

%%ldefs
LETTER [A-Z]
%%lrules
LETTER+ { return 23; }
\ +
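The read_input_chars block in the listing above hands back one character per call. Because the lexer passes a maximum count, a reader could instead copy as many characters as fit in a single call. The sketch below illustrates that idea as a standalone class under the same (buffer, offset, count) contract. The StringInput class and its Read method are hypothetical names, not code generated by Alpag, and the sketch assumes the caller always passes a count of at least one.

```csharp
using System;

// Hypothetical stand-in for the input-reading code; not generated by Alpag.
sealed class StringInput
{
    private readonly string text;
    private int inputOffset = 0;   // current position, mimics a file pointer

    public StringInput(string text)
    {
        this.text = text;
    }

    // Copies up to 'count' chars into 'buffer' starting at 'offset'.
    // Returns the number of chars copied, or 0 at end of input.
    // Assumption: the caller passes count >= 1, so 0 always means EOF.
    public int Read(char[] buffer, int offset, int count)
    {
        int remaining = text.Length - inputOffset;
        if (remaining <= 0)
            return 0;
        int n = Math.Min(count, remaining);
        text.CopyTo(inputOffset, buffer, offset, n);
        inputOffset += n;
        return n;
    }
}

static class StringInputDemo
{
    static void Main()
    {
        var input = new StringInput("MY SAMPLE INPUT");
        var buffer = new char[8];
        int n;
        // prints "8: MY SAMPL" and then "7: E INPUT"
        while ((n = input.Read(buffer, 0, buffer.Length)) > 0)
            Console.WriteLine(n + ": " + new string(buffer, 0, n));
    }
}
```

Inside a real %code read_input_chars block only the body of Read would be needed, with INPUT_TEXT in place of the text field.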

Build the lexer by invoking:

alpag myLexer.lex

Now the generated code can be embedded in the target application.

Create a console application and place the following code inside its main method (the using directive belongs at the top of the source file):

// default namespace for generated code
using Common;
// ...

MyLexer lexer = new MyLexer();
for(;;)
{
    int token = lexer.NextToken();
    // a token value less than zero indicates an error
    if( token < 0 )
    {
        // the only harmless 'error' is the end-of-file marker (= -1)
        if( token == MyLexer.EOF )
            break; // ok, we are done
        // there are a number of predefined error codes
        // (you can also define custom error codes;
        // these must have values below MyLexer.USER_FIRST)
        Console.WriteLine( "Error: " + token );
        break;
    }
    // lexer.NextToken() returns the user-defined constant
    // returned from the action code (like 'return 23' in the example lexer).
    // the actual input text that was matched is stored in the lexer's buffer;
    // several methods exist for fetching the matched text from the buffer.
    // the simplest convenience method is:
    string tokenVal = lexer.TokenValueGetString();
    Console.WriteLine( "Token: " + token + " Value: " + tokenVal );
}

When launched, this code should print:

Token: 23 Value: MY
Token: 23 Value: SAMPLE
Token: 23 Value: INPUT

A complete discussion of lexer generation and integration can be found in the Lexer section.


Alpag Manual