Buffers

Buffer allocation

During operation the lexer must allocate space for its buffers. By default the lexer performs memory allocation itself. The user can override the default allocation methods to gain finer control over the process. The methods for allocating and releasing buffer space are:

byte[] InByteBufNew( int size )
void InByteBufDelete( byte[] oldBuf )
char[] InCharBufNew( int size )
void InCharBufDelete( char[] oldBuf )
int[] LexBufNew( int size )
void LexBufDelete( int[] oldBuf )
byte[] OutByteBufNew( int size )
void OutByteBufDelete( byte[] oldBuf )
char[] OutCharBufNew( int size )
void OutCharBufDelete( char[] oldBuf )

Allocation methods ('New') should return a new buffer of at least the requested size. A larger buffer may be allocated, and the lexer will use the excess space. The allocated buffer does not have to be cleared (it can contain garbage).

A buffer passed to a 'Delete' method can be either discarded or reused for any purpose.
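The contract above (return at least the requested size, no clearing required, released buffers may be reused) can be illustrated with a small pooling sketch. PooledByteBuffers is a hypothetical helper class, not part of Alpag; only the method names InByteBufNew and InByteBufDelete come from the list above, and how such methods are wired into the generated lexer is not shown here.

```csharp
using System.Collections.Generic;

// Hypothetical helper, not part of Alpag: keeps released byte buffers for reuse.
public class PooledByteBuffers
{
    private readonly List<byte[]> free = new List<byte[]>();

    // Mirrors InByteBufNew: returns a buffer of at least 'size' bytes.
    public byte[] InByteBufNew(int size)
    {
        for (int i = 0; i < free.Count; i++)
        {
            if (free[i].Length >= size)
            {
                byte[] reused = free[i];
                free.RemoveAt(i);
                return reused;   // may be larger than requested and is not cleared
            }
        }
        return new byte[size];
    }

    // Mirrors InByteBufDelete: keeps the released buffer for later reuse.
    public void InByteBufDelete(byte[] oldBuf)
    {
        if (oldBuf != null) free.Add(oldBuf);
    }
}
```

A released 16-byte buffer would satisfy a later request for 8 bytes, which is permitted because the lexer accepts buffers larger than requested.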

Switching input buffers

Sometimes it is necessary to switch the lexer to another input stream without resetting its operation.

When the option Lexer.Buffers.InBufSwitchingEnabled is set to true, the lexer allows switching between different input buffers during a run.

When the user switches the lexer to another buffer, the buffer previously used by the lexer is returned. The context of the old buffer is saved in a structure:

class LexerInBufState {
    public object UserTag; // custom user data
    // ...
}

The structure is allocated by the lexer and returned to the user. It contains the old buffer along with supporting fields. The user may delete the structure, or keep it and later pass it back to the lexer to restore the input buffer state.

The lexer class contains an InBufUserTag field. This field is automatically saved to and restored from the UserTag field of the LexerInBufState structure when buffers are switched. The user can store any information there.

Saving and restoring buffer

The methods for manipulating buffer state are:

LexerInBufState InBufSaveAndResetState( [options] )

Saves the current input state and returns it to the user. The lexer is then initialized with a new input buffer.

void InBufRestoreState( LexerInBufState state [,options] )

Switches the lexer input buffer to the provided state. The previous lexer buffer is discarded. The contents of the state passed as an argument are cleared and cannot be used again (the user should discard the structure).

LexerInBufState InBufReplaceState( LexerInBufState state [,options] )

Switches the lexer input buffer to the provided state. The previous lexer buffer state is saved and returned to the user. The contents of the state structure passed by the user are taken over by the lexer and the structure is emptied. The user cannot pass it to the lexer a second time.

void InBufDeleteState( LexerInBufState state )

Deletes a buffer state that will not be used again.

In most cases obsolete LexerInBufState objects can simply be abandoned. It is safer, though, to delete them formally via the dedicated lexer method. Deleting old lexer buffers this way invokes any custom buffer cleanup methods provided by the user.
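The save and restore cycle described above can be sketched as follows, for example when an included input is processed in the middle of an outer one. MockLexer is a hypothetical stand-in, not the generated Alpag lexer: it models only the behavior stated in this section, omits the options arguments, and represents the buffer as a plain string. The Data and Emptied fields are placeholders for the supporting fields that the manual does not list.

```csharp
using System;

// Stand-in for the state structure. Only UserTag is documented;
// Data and Emptied are hypothetical placeholders for the elided fields.
public class LexerInBufState
{
    public object UserTag;
    public string Data;
    public bool Emptied;
}

// Hypothetical mock, NOT the generated Alpag lexer. Options are omitted.
public class MockLexer
{
    public object InBufUserTag;
    public string Current = "";   // hypothetical stand-in for the active input buffer

    public LexerInBufState InBufSaveAndResetState()
    {
        var saved = new LexerInBufState { UserTag = InBufUserTag, Data = Current };
        Current = "";             // the lexer continues with a new, empty buffer
        return saved;
    }

    public void InBufRestoreState(LexerInBufState state)
    {
        if (state.Emptied) throw new InvalidOperationException("state was already consumed");
        Current = state.Data;     // the previous buffer is discarded
        InBufUserTag = state.UserTag;
        state.Data = null;        // the passed state is emptied and must not be reused
        state.UserTag = null;
        state.Emptied = true;
    }
}

public static class IncludeSketch
{
    public static string Run()
    {
        var lexer = new MockLexer { Current = "outer text", InBufUserTag = "outer.txt" };
        LexerInBufState outer = lexer.InBufSaveAndResetState();
        lexer.InBufUserTag = "inner.txt";
        lexer.Current = "inner text";
        // ... tokens of the inner input would be read here ...
        lexer.InBufRestoreState(outer);
        return lexer.Current + " from " + lexer.InBufUserTag;
    }
}
```

In this model the user tag travels with the saved state, so the outer input's tag is back in InBufUserTag after the restore, and a second restore of the same state object is rejected.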

Buffer stack

Setting the Lexer.Buffers.InBufStackEnabled option to true enables an additional buffer handling mechanism: the buffer stack. When the buffer stack is enabled, two additional methods are available:

void InBufPushAndResetState( [options] )

Saves the current input buffer on the lexer's internal stack. The lexer is reset to a new, empty input buffer.

bool InBufPopState( [options] )

Discards the current input buffer and reverts to the buffer most recently saved on the stack. If the stack is empty, this method returns false and the current lexer buffer is not discarded.

These two methods can be used together with the methods for manually saving and restoring buffer state. This makes it possible to swap the current lexer buffer while preserving the contents of the stack.
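The push and pop behavior can be modeled with an ordinary stack. The sketch below uses a hypothetical MockStackLexer, not the generated Alpag lexer; it omits the options arguments and represents each buffer as a string. Its purpose is only to show the documented return value of InBufPopState when the stack is empty.

```csharp
using System.Collections.Generic;

// Hypothetical mock, NOT the generated Alpag lexer. Options are omitted.
public class MockStackLexer
{
    public string Current = "";   // hypothetical stand-in for the active input buffer
    private readonly Stack<string> saved = new Stack<string>();

    // Saves the current buffer on the internal stack and starts with an empty one.
    public void InBufPushAndResetState()
    {
        saved.Push(Current);
        Current = "";
    }

    // Reverts to the most recently pushed buffer.
    // Returns false and keeps the current buffer when the stack is empty.
    public bool InBufPopState()
    {
        if (saved.Count == 0) return false;
        Current = saved.Pop();
        return true;
    }
}
```

A caller can therefore loop on InBufPopState() to unwind nested inputs and stop when it returns false, at which point the outermost buffer is still active.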

Handling EOF

The ReadInputBytes and ReadInputChars methods report the end of the input stream by returning zero or an EOF code. This puts the lexer in a state in which no more input is accepted. It also prevents switching the lexer to another buffer to continue reading.

This behavior can be changed by providing an implementation of the ContinueOnEOF() method.

Whenever one of the input reading methods reports the end of the stream, the ContinueOnEOF() method is invoked. If the method returns false, the lexer stops. If it returns true, the lexer continues reading input.

The implementation of the ContinueOnEOF() method is controlled by setting Lexer.Code.ContinueOnEOFImpl to either UserCode or Virtual. With UserCode, the implementation must be provided by the user in a continue_on_eof code block, which is inserted into the method body. With Virtual, the user should override ContinueOnEOF() in a subclass.

The ContinueOnEOF() method is primarily intended for switching the lexer to another input buffer and to another stream associated with that buffer. The buffer should be switched from within this method.

Note that when ReadInputBytes or ReadInputChars returns EOF, the lexer marks the current input buffer as closed. The lexer will not attempt another read into such a buffer. This behavior assumes that the current buffer is going to be swapped for another, still active buffer.

The user may instead provide a custom implementation that switches to another input stream while keeping the same buffer. To continue reading into the current buffer after EOF has been reached, the user should invoke:

InEndOfFileClear()

This method clears the end-of-file indication for the current input buffer, allowing more data to be appended to it, possibly from another input stream.
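A minimal sketch of the Virtual variant follows, assuming a lexer that reads several streams one after another into the same buffer. MockLexerBase is a hypothetical stand-in for the generated lexer class: the EndOfFile flag, the SimulateEof hook, and the protected access level of ContinueOnEOF are all assumptions made for illustration. Only the names ContinueOnEOF and InEndOfFileClear come from the text above.

```csharp
using System.Collections.Generic;
using System.IO;

// Hypothetical stand-in for the generated lexer class, NOT actual Alpag output.
public class MockLexerBase
{
    public bool EndOfFile;   // hypothetical flag: current buffer marked as closed

    public void InEndOfFileClear() { EndOfFile = false; }

    // Assumed to be overridable when ContinueOnEOFImpl is set to Virtual.
    protected virtual bool ContinueOnEOF() { return false; }

    // Hypothetical hook simulating a read method that reported end of stream.
    public bool SimulateEof()
    {
        EndOfFile = true;
        return ContinueOnEOF();
    }
}

// Reads a queue of streams one after another into the same buffer.
public class ChainedInputLexer : MockLexerBase
{
    private readonly Queue<TextReader> pending = new Queue<TextReader>();
    public TextReader Input;   // hypothetical: stream used by the read methods

    public void Enqueue(TextReader reader) { pending.Enqueue(reader); }

    protected override bool ContinueOnEOF()
    {
        if (pending.Count == 0) return false;   // nothing left: let the lexer stop
        Input = pending.Dequeue();              // same buffer, next stream
        InEndOfFileClear();                     // allow further reads into the buffer
        return true;
    }
}
```

Returning false when the queue is empty keeps the default behavior for the last stream, so the lexer still stops at the true end of input.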

Handling start of file

When buffers are switched, information about the current position in the input stream is saved together with the buffer. It is also reset when a new empty buffer is provided. Valid position information is important for lexers that track file position (such as row and column). It is also important when automatic detection of input encoding is used.

When switching buffers, position information can be controlled by setting the following options:

KeepPosition leaves the current file position (column and row) unchanged when switching to another buffer. If this option is not set, the position changes to the one saved with the buffer being activated.
NotStartOfFile applies when a new buffer is allocated. Specifying this flag explicitly informs the lexer that the newly allocated buffer is not the start of a new stream, so no encoding autodetection should be performed.
By default, allocating a new buffer is interpreted as opening a new input stream, and the lexer will attempt to detect its encoding even if KeepPosition was specified.

The lexer can be manually forced to enter the start-of-file condition by invoking:

SetStartOfFile( [options] )

This method should be used only if the most recent buffer switch left the lexer positioned at the beginning of an input stream.
