Lexer grammar can have multiple rules matching various tokens that can possibly appear on input stream. Lexer reads subsequent characters from input stream and tries to match them against all rules. If, after having read a few characters, some rules don't match current input sequence, lexers ignores such rules and continues operation for other rules that still give hope of match. Lexer continues the walk until no rule can match any more characters. Then it stops and reports rule that gave longest match.
Lexer operation principles can be summarized as follows:
Above rules are illustrated using example (see Fig. 9 ).
Using grammar with two rules:
With input text:
ABBCCX
When lexer starts reading input, both rules are plausible options.
After reading first three letters ABB both rules are still likely options. Rule AB* gives full match for this sequence and could be potentially reported as matching. Rule A[BC]*D does not match current input yet, since it requires terminal 'D', but it gives hope for longer match.
After next letter C, current input sequence becomes ABBC. This sequence does not match rule AB*, so lexer ignores that rule. However the other rule A[BC]*D still gives hope for match, so lexer continues the scan.
When letter X is reached, input sequence is ABBCCX. Rule A[BC]*D does not match it.
Lexer has to report a longest match. It gets back to rule that gave longest match so far but was abandoned in search for better match. In this case the rule giving longest valid match is AB* with value ABB. Lexer reports match for this rule.