Several options control setting and switching encoding.
The Lexer.In.EncodingSwitchable option enables switching encodings at runtime. When it is set to false, the lexer supports only a single default encoding.
The Lexer.In.Encoding option sets the input encoding. If the lexer is configured to support multiple encodings, the provided value becomes the default encoding. When switching encodings is disabled, the provided value becomes the only supported encoding.
An explicit list of supported encodings can be provided using the Lexer.In.Encodings option. When this option is left empty, all encodings are supported.
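The interaction of the three options above can be modelled in a few lines. This is only a sketch of the rules as described, not the lexer's API: the function name and its parameters are hypothetical, and it assumes that the default encoding is always accepted even when it is missing from the explicit list.

```python
from typing import Sequence


def is_encoding_supported(name: str,
                          default: str,
                          switchable: bool,
                          encodings: Sequence[str] = ()) -> bool:
    """Model of the rules described above (hypothetical helper, not the lexer's API).

    default    -- value of Lexer.In.Encoding
    switchable -- value of Lexer.In.EncodingSwitchable
    encodings  -- value of Lexer.In.Encodings (empty means "no restriction")
    """
    if name == default:
        # Assumption: the default encoding is always accepted.
        return True
    if not switchable:
        # Switching disabled: the default is the only supported encoding.
        return False
    if not encodings:
        # Empty list: all encodings are supported.
        return True
    return name in encodings
```

For example, with switching disabled only the default passes the check, while with switching enabled and an empty list any encoding name is accepted.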
When the lexer starts reading an input stream, the encoding can be detected automatically using methods such as detection of a BOM (Byte Order Mark).
Autodetection can be enabled using the Lexer.In.EncodingAutodetect option. Autodetection considers only the encodings listed in Lexer.In.EncodingSet. The encoding specified in the Lexer.In.Encoding option serves as the default guess. The mechanism can be further customized using the Lexer.In.EncodingAutodetectOptions option, with flags:
The Lexer.In.Endiannes option controls the endianness of the input stream for encodings with two-byte code units (such as UCS-2 or UTF-16). When encoding autodetection is on, the value provided here is the default endianness.
When Lexer.In.EatBOM is set to true, the Byte Order Mark (BOM), if present, is removed from the input and not passed to the lexer.
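BOM-based detection and BOM removal can be illustrated with a short, standalone sketch. It is not the lexer's implementation: the function detect_bom and its parameters are hypothetical, the encoding names are Python codec names rather than the lexer's own, and including UTF-32 in the table is an assumption made for completeness.

```python
import codecs
from typing import Tuple

# Longer marks come first: the UTF-32 LE BOM starts with the UTF-16 LE BOM.
_BOMS = (
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
)


def detect_bom(data: bytes,
               default: str = "utf-8",
               eat_bom: bool = True) -> Tuple[str, bytes]:
    """Return (encoding, payload) for the start of an input stream.

    default -- the guess used when no BOM is present (role of Lexer.In.Encoding)
    eat_bom -- drop the BOM from the payload (role of Lexer.In.EatBOM)
    """
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, (data[len(bom):] if eat_bom else data)
    return default, data
```

When no BOM is found, the caller falls back to the configured default, which mirrors the "default guess" role described above.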
If the input stream contains byte sequences that are not valid in the current encoding, an error is reported.
The reaction to such errors can be controlled using the Lexer.In.ErrorReaction option. Possible values are:
The Lexer.In.ErrorReplacementChar option specifies the replacement character to use.
The replacement character should be within the character range of the configured input encodings. The default replacement character is a space (' '), which is safe for all encodings.
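The effect of substituting a replacement character for invalid input can be sketched with Python's standard codecs machinery. This only illustrates the idea and is not how the lexer is implemented. The handler name "lexer_replace_space" and both helper functions are hypothetical, and replacing each invalid byte run with exactly one character is an assumption.

```python
import codecs


def make_replacer(replacement: str = " "):
    """Build a decode error handler that substitutes `replacement`."""
    def handler(err: UnicodeError):
        if not isinstance(err, UnicodeDecodeError):
            raise err
        # Emit one replacement character for the invalid bytes
        # and resume decoding right after them.
        return replacement, err.end
    return handler


# Register the handler under a hypothetical name.
codecs.register_error("lexer_replace_space", make_replacer(" "))


def decode_with_replacement(data: bytes, encoding: str = "utf-8") -> str:
    """Decode `data`, replacing invalid sequences instead of failing."""
    return data.decode(encoding, errors="lexer_replace_space")


print(decode_with_replacement(b"ab\xffcd"))  # prints "ab cd"
```

A space is used here because it exists in every encoding the input could be declared as, which matches the reasoning given for the default.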
When the input encoding is set to ASCIICP, the lexer translates byte-oriented input to Unicode characters using a user-provided map. The map should be assigned to the InCodeToUnicodeCharMap field and should have 256 entries, each holding the Unicode character code for one possible input byte value. The user is free to provide any map.
Predefined maps for several well-known encodings can be added directly to the lexer using the Lexer.In.EncodingMaps option. One of these maps must be assigned to InCodeToUnicodeCharMap before processing.
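A 256-entry byte-to-Unicode map of the kind described above can be sketched as follows. The helper names are hypothetical and the code does not touch the lexer itself. It assumes a single-byte code page, derives the table from one of Python's built-in codecs, and fills bytes that are undefined in the code page with a space.

```python
from typing import List, Sequence


def build_map_from_codec(codec_name: str, fallback: int = 0x20) -> List[int]:
    """Build a 256-entry table: input byte value -> Unicode code point.

    Assumes `codec_name` is a single-byte code page known to Python.
    """
    table = []
    for byte in range(256):
        try:
            table.append(ord(bytes([byte]).decode(codec_name)))
        except UnicodeDecodeError:
            # Byte has no mapping in this code page.
            table.append(fallback)
    return table


def translate(data: bytes, code_map: Sequence[int]) -> str:
    """Translate byte-oriented input to Unicode text using the map."""
    if len(code_map) != 256:
        raise ValueError("map must have exactly 256 entries")
    return "".join(chr(code_map[b]) for b in data)


cp1252_map = build_map_from_codec("cp1252")
print(translate(b"A\x80", cp1252_map))  # prints "A€"
```

A table like cp1252_map plays the role of the value that would be assigned to InCodeToUnicodeCharMap; any other list of 256 code points would work equally well.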