Input handling options
Lexer.In.Encoding
Input encoding format. When encoding switching is enabled, this is the default encoding.
ASCII - 7-bit ASCII encoding
ASCIICP - 8-bit encoding with code page translation
ASCIIRAW - 8-bit encoding without interpretation of the 128..255 range
UTF8 - Unicode UTF-8 variable-length encoding (full Unicode coverage)
UCS2 - Unicode UCS-2 fixed two-byte encoding (BMP-only coverage)
UTF16 - Unicode UTF-16 variable-length encoding, two or four bytes per character (full Unicode coverage)
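The coverage difference between UTF8, UCS2 and UTF16 can be illustrated with a short Python sketch. Python's built-in codecs stand in for the lexer's own decoders here, and the helper names `encoded_lengths` and `fits_ucs2` are hypothetical, not part of the lexer.

```python
def encoded_lengths(ch: str) -> dict:
    """Return the byte length of one character in UTF-8 and UTF-16 (no BOM)."""
    return {
        "UTF8": len(ch.encode("utf-8")),
        "UTF16": len(ch.encode("utf-16-le")),
    }


def fits_ucs2(ch: str) -> bool:
    """UCS2 is limited to the Basic Multilingual Plane (code points up to U+FFFF)."""
    return ord(ch) <= 0xFFFF


if __name__ == "__main__":
    for ch in ("A", "\u20ac", "\U0001F600"):
        print(f"U+{ord(ch):04X}", encoded_lengths(ch), "UCS2 ok:", fits_ucs2(ch))
```

A character outside the BMP needs four bytes in UTF-16 (a surrogate pair) and cannot be represented in UCS2 at all, which is why UCS2 is described as BMP-only.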
Lexer.In.EncodingSet
List of input encodings supported by the lexer (multiple values allowed). Leave empty to support all encodings.
ASCII - 7-bit ASCII encoding
ASCIICP - 8-bit encoding with code page translation
ASCIIRAW - 8-bit encoding without interpretation of the 128..255 range
UTF8 - Unicode UTF-8 variable-length encoding (full Unicode coverage)
UCS2 - Unicode UCS-2 fixed two-byte encoding (BMP-only coverage)
UTF16 - Unicode UTF-16 variable-length encoding, two or four bytes per character (full Unicode coverage)
Lexer.In.EncodingMaps
Enables input code maps for some known encodings. These maps can be used in ASCIICP mode.
_none
Windows
ISO8859
Lexer.In.ZeroCountAction
Controls what happens when a read returns zero bytes: either the lexer treats it as end-of-file, or it reads again until an explicit EOF is reached. By default, a zero byte count is treated as EOF.
EOF - A zero byte count is interpreted as end-of-file
RepeatRead - When a read returns zero bytes, another read is performed until an explicit EOF is reached
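The two behaviours can be sketched as a read loop. This is only an illustration under stated assumptions: `read` is a hypothetical callback that returns bytes (possibly empty) and returns `None` to signal an explicit EOF; the lexer's real input interface may look different.

```python
from typing import Callable, Optional


def drain(read: Callable[[int], Optional[bytes]],
          repeat_read: bool,
          chunk: int = 4096) -> bytes:
    """Collect input from `read` until end of input.

    `read` is a hypothetical callback: bytes for data, None for explicit EOF.
    """
    out = bytearray()
    while True:
        data = read(chunk)
        if data is None:      # explicit EOF ends input in both modes
            break
        if not data:          # zero bytes returned
            if repeat_read:   # RepeatRead: perform another read
                continue
            break             # EOF: a zero count is treated as end-of-file
        out += data
    return bytes(out)


def make_reader(chunks):
    """Build a hypothetical read callback that replays a fixed list of results."""
    it = iter(chunks)
    return lambda n: next(it, None)
```

With a source that yields `b"ab"`, then an empty read, then `b"cd"`, the EOF mode stops after the first chunk while RepeatRead continues to the explicit end. A real RepeatRead loop would normally block or wait between reads rather than spin.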
Lexer.In.Format
Input buffer format (bytes or characters)
None
Bytes - Buffer is a byte array
Chars - Buffer is a character array
BytesChars
Lexer.In.FormatDefault
Input buffer default format if multiple formats are enabled
Bytes - Buffer is a byte array
Chars - Buffer is a character array
Lexer.In.EncodingSwitchable
Enables switching the input buffer encoding at runtime.
Lexer.In.EncodingAutodetect
Enables input encoding autodetection.
Lexer.In.EncodingAutodetectOptions
Enables non-standard input encoding autodetection mechanisms. By default only Byte Order Mark (BOM) is interpreted.
None
SkipWide - Disables detection of wide encodings (UCS2, UTF-16)
WideIsUCS - When a wide encoding is detected, assume UCS2 (not UTF-16)
SkipWideNonStandard - Do not perform non-standard tests for wide encodings (used in the absence of a BOM)
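BOM-based detection, together with the effect of SkipWide and WideIsUCS, can be sketched as follows. The function name `detect_by_bom`, its parameters and its return shape are assumptions made for illustration; only the BOM byte values (taken from Python's `codecs` module) are standard. The non-standard heuristics used when no BOM is present are not modelled.

```python
import codecs


def detect_by_bom(data: bytes, skip_wide: bool = False, wide_is_ucs: bool = False):
    """Return (encoding, byte_order, bom_length), or None if no BOM is recognized."""
    if data.startswith(codecs.BOM_UTF8):          # EF BB BF
        return ("UTF8", None, 3)
    if not skip_wide:                             # SkipWide disables the wide checks
        wide = "UCS2" if wide_is_ucs else "UTF16" # WideIsUCS picks UCS2 over UTF-16
        if data.startswith(codecs.BOM_UTF16_LE):  # FF FE
            return (wide, "LE", 2)
        if data.startswith(codecs.BOM_UTF16_BE):  # FE FF
            return (wide, "BE", 2)
    return None
```

A caller that wants the BOM consumed rather than passed on could skip the first `bom_length` bytes of the input after a successful detection.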
Lexer.In.EatBOM
When set, the initial Byte Order Mark (BOM) character is consumed (not passed to the lexer).
Lexer.In.Endiannes
Default byte order (endianness) for input bytes. Ignored for character buffers.
LE - Little-endian ordering (lowest byte comes first)
BE - Big-endian ordering (highest byte comes first)
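The effect of byte order on a two-byte encoding can be shown with a small sketch. The helper `decode_units` is hypothetical and only illustrates how the same byte pair yields different 16-bit code units under LE and BE.

```python
def decode_units(data: bytes, byte_order: str = "LE") -> list:
    """Combine byte pairs into 16-bit code units using the given byte order.

    A trailing odd byte, if any, is ignored in this sketch.
    """
    order = "little" if byte_order == "LE" else "big"
    return [int.from_bytes(data[i:i + 2], order)
            for i in range(0, len(data) - 1, 2)]
```

For example, the bytes `41 00` give the code unit 0x0041 (the letter A) when read as LE, and 0x4100 when read as BE.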
Lexer.In.ErrorReaction
Reaction to an invalid input byte sequence.
Throw - Throw an exception
ReturnError - Return an error code from the lexer
IgnoreSkip - Ignore and skip invalid input characters, resynchronizing on the next valid character
ReplaceWithChar - Replace erroneous input characters with the replacement character
Lexer.In.ErrorReplacementChar
Replacement character for unrecognized input characters. Using characters outside of intended lexer character-map width is not advised (ASCII range is always safe).
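The four error reactions and the replacement character can be sketched with Python's UTF-8 decoder standing in for the lexer's input stage. The function `decode_utf8` and its use of `None` as the "error code" are assumptions for illustration only.

```python
def decode_utf8(data: bytes, reaction: str = "Throw", replacement: str = "?"):
    """Decode UTF-8 input, handling invalid byte sequences per `reaction`.

    Returns the decoded text, or None for ReturnError when input is invalid.
    """
    out = []
    pos = 0
    while pos < len(data):
        try:
            out.append(data[pos:].decode("utf-8"))
            break
        except UnicodeDecodeError as err:
            if reaction == "Throw":
                raise                      # propagate the exception
            if reaction == "ReturnError":
                return None                # stand-in for an error code
            # Keep the valid prefix before the bad sequence.
            out.append(data[pos:pos + err.start].decode("utf-8"))
            if reaction == "ReplaceWithChar":
                out.append(replacement)    # substitute the replacement character
            pos += err.end                 # resync after the invalid bytes
    return "".join(out)
```

Using an ASCII replacement such as `?` keeps the substituted character inside any character-map width, in line with the advice above.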