alpag.net manual

More and Less

By default, each invocation of NextToken() discards the previously matched text and continues the search from the position at the end of the previous match. Two mechanisms, colloquially called 'more' and 'less', can be used to modify this behavior.

More

The lexer can be instructed to leave the most recently matched text in the output buffer. In that case, when the next match is found, it is appended to the value of the previous match. This way the output token value can accumulate the results of several consecutive matches.

This mechanism must be explicitly enabled by setting Lexer.MoreEnabled to true. Once enabled, the following methods become available:

CurrentMatchPrependToNext()
MORE()

Invoking either of these methods tells the lexer to prepend the current match to the result of the next match. The call affects the next match only. To continue concatenation, it must be called again after each match.

CurrentMatchDiscardBeforeNext()
NO_MORE()

Calling either of these methods disables more, reverting the effect of an earlier call to CurrentMatchPrependToNext().
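The one-shot behavior described above can be illustrated with a small toy model. This is only a sketch in Python, not Alpag's implementation: the class ToyMoreLexer and its methods are hypothetical names that stand in for MORE(), NO_MORE() and the matching step.

```python
class ToyMoreLexer:
    """Toy model of the 'more' flag (hypothetical, not the Alpag API)."""

    def __init__(self):
        self.current = ""          # value of the most recent match
        self.prepend_next = False  # set by more(), cleared after every match

    def more(self):
        # Stands in for CurrentMatchPrependToNext() / MORE().
        self.prepend_next = True

    def no_more(self):
        # Stands in for CurrentMatchDiscardBeforeNext() / NO_MORE().
        self.prepend_next = False

    def match(self, text):
        """Report a new match and return the resulting token value."""
        prefix = self.current if self.prepend_next else ""
        self.current = prefix + text
        self.prepend_next = False  # 'more' affects the next match only
        return self.current


lexer = ToyMoreLexer()
first = lexer.match("foo")   # plain match
lexer.more()                 # keep "foo" for the next match
second = lexer.match("bar")  # accumulated value
third = lexer.match("baz")   # flag was not renewed, so the value starts over
```

In this model, second holds the concatenation of both matches, while third holds only the last match because more() was not called again.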

Data for the output token value can be stored in the input, lexer or output buffer, depending on the configuration of buffers. Using more may cause that buffer to grow. Limit the use of more to a few logically connected consecutive tokens, and avoid uncontrolled growth of the concatenated value.

Less

By default, each invocation of NextToken() continues from the position in the input stream at the end of the most recently matched text. It is possible to make the lexer step back and rescan all or part of that text.

Lexer can step back only as far as the beginning of recently reported match (or sum of matches concatenated with more).

The less mechanism must be explicitly enabled by setting Lexer.LessEnabled to true. Once enabled, the following methods become available:

CurrentMatchLeaveForRescan( [int exceptFirstN] [,RunLengthKind kind] )

LESS( [exceptFirstN] )

Invoking these methods moves current position in input stream to position at the beginning of recently reported match plus optional offset given by exceptFirstN, which must be in range 0 to TokenValueLength.

Activating less decreases length of current value of token which is trimmed to specified exceptFirstN. Characters past exceptFirstN are immediately returned to input buffer. To process entire recently matched value, one should do it before invoking LESS.
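The arithmetic behind this can be sketched as follows. The function below is a hypothetical Python model, not Alpag code. It assumes that one unit of exceptFirstN corresponds to one character, and that an out-of-range value is rejected with an error (the manual does not say how the real lexer reacts).

```python
def less(match_start, token_value, except_first_n):
    """Toy model of LESS(exceptFirstN).

    Returns (new_position, kept_value, returned_text).
    """
    if not 0 <= except_first_n <= len(token_value):
        # Assumption: the real lexer's reaction to a bad value is not documented here.
        raise ValueError("exceptFirstN must be in range 0 to TokenValueLength")
    new_position = match_start + except_first_n   # where the next scan begins
    kept_value = token_value[:except_first_n]     # trimmed token value
    returned_text = token_value[except_first_n:]  # goes back to the input
    return new_position, kept_value, returned_text


# A match "AAA" that started at position 10, followed by LESS(2):
position, kept, returned = less(10, "AAA", 2)
```

Here position is 12, kept is "AA" and returned is "A". Since the token value is trimmed at once, any processing of the full "AAA" has to happen before the call.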

The CurrentMatchLeaveForRescan method also accepts an optional second parameter, kind, of type RunLengthKind, which specifies the units in which exceptFirstN is measured. By default, output buffer units are assumed.

Note that the lexer will not split a character 'in half'. If the value specified in exceptFirstN falls in the middle of a multibyte-encoded character, the lexer skips to the end of that sequence.
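The rounding rule can be illustrated for one concrete case. The sketch below assumes UTF-8 input and an offset counted in bytes. Both are assumptions made for illustration, and the helper name is hypothetical. An offset that lands on a continuation byte is moved forward to the end of the character.

```python
def round_up_to_char_boundary(data: bytes, offset: int) -> int:
    """Move offset forward until it no longer points inside a UTF-8 character."""
    # UTF-8 continuation bytes have the bit pattern 10xxxxxx.
    while offset < len(data) and (data[offset] & 0xC0) == 0x80:
        offset += 1
    return offset


data = "aéb".encode("utf-8")                   # 'é' occupies bytes 1 and 2
inside = round_up_to_char_boundary(data, 2)    # offset in the middle of 'é'
boundary = round_up_to_char_boundary(data, 1)  # offset already on a boundary
```

In this example inside becomes 3 (just past 'é'), while boundary stays at 1.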

Activating less has effect on next match only.

Once activated, less can also be turned off using:

CurrentMatchNoRescan()
NO_LESS

This is equivalent to calling LESS with full length of most recently matched text.

Usually LESS is used in combination with modes to rescan input with different set of active lexer rules.

LESS can be used together with MORE. In that case the exceptFirstN parameter is counted from the beginning of the entire accumulated text, including portions remaining from previous matches. It is possible to step back to any point within this text, including fragments from previous matches.

Using LESS introduces additional overhead and disables certain optimizations. LESS(0) is cheaper than LESS with nonzero value.

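It is possible to fall into an endless loop using LESS(0). Because LESS(0) returns the entire match to the input, the same rule may match the same text again unless the rule's code changes the lexer state first, for example by switching modes.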

Example

Using grammar:

A* { if(TokenValueLength > 1 ) LESS(TokenValueLength-1 ); }
B* { ... }

and input text:

AAABB

Lexer operation will be as follows:

1. match input. Rule A* is matched with text AAA (TokenValueLength == 3). Current position in input stream is after AAA.

2. execute code for the rule. LESS( 2 ) is called. Token value is trimmed to AA. Current position in input stream is after AA.

3. match input. Rule A* is reported with matched text A (TokenValueLength == 1). Current position in input stream is after AAA.

4. execute code for the rule. Nothing is called. Current position in input stream is still after AAA.

5. match input. Rule B* is matched with text BB.
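The trace above can be reproduced with a toy simulation. The Python sketch below is not generated by Alpag and does not use its API. It assumes that the rule with the longest match wins and that ties go to the earlier rule. It hard-codes the action of rule A* from the grammar above.

```python
import re

# The two rules of the example grammar, as ordinary regular expressions.
RULES = [("A*", re.compile(r"A*")), ("B*", re.compile(r"B*"))]


def tokenize(text):
    """Return a list of (rule, token value, position after the rule's code)."""
    tokens = []
    pos = 0
    while pos < len(text):
        # Assumption: longest match wins; max() keeps the earlier rule on ties.
        name, matched = max(
            ((rule, rx.match(text, pos).group()) for rule, rx in RULES),
            key=lambda item: len(item[1]),
        )
        if not matched:
            raise ValueError(f"no rule matches at position {pos}")
        start = pos
        pos += len(matched)
        value = matched
        # Code of rule A*: if (TokenValueLength > 1) LESS(TokenValueLength - 1);
        if name == "A*" and len(value) > 1:
            keep = len(value) - 1
            pos = start + keep    # step back; the rest will be rescanned
            value = value[:keep]  # token value is trimmed
        tokens.append((name, value, pos))
    return tokens


trace = tokenize("AAABB")
```

For the input AAABB the model yields three tokens: A* with value AA (position 2), A* with value A (position 3) and B* with value BB (position 5), which matches steps 1 to 5.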
