A simple table with multiple Unicode characters, their respective code points, UTF-8 and UTF-16 encodings, and names. The last two characters are supplementary characters.
a  U+0061   61           0061       "Latin Small Letter A"
~  U+007E   7E           007E       Tilde
¥  U+00A5   C2_A5        00A5       "Yen Sign"
»  U+00BB   C2_BB        00BB       "Right-Pointing Double Angle Quotation Mark"
½  U+00BD   C2_BD        00BD       "Vulgar Fraction One Half"
¿  U+00BF   C2_BF        00BF       "Inverted Question Mark"
ß  U+00DF   C3_9F        00DF       "Latin Small Letter Sharp S"
ä  U+00E4   C3_A4        00E4       "Latin Small Letter A with Diaeresis"
ï  U+00EF   C3_AF        00EF       "Latin Small Letter I with Diaeresis"
œ  U+0153   C5_93        0153       "Latin Small Ligature Oe"
€  U+20AC   E2_82_AC     20AC       "Euro Sign"
東 U+6771   E6_9D_B1     6771       "CJK Unified Ideograph-6771"
𝄞  U+1D11E  F0_9D_84_9E  D834_DD1E  "Musical Symbol G Clef"
𠀇 U+20007  F0_A0_80_87  D840_DC07  "CJK Unified Ideograph-20007"
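The code point and encoding columns of the table can be recomputed from the character itself. The following is a minimal sketch; the helper name `table_row` is hypothetical and not part of any ReliableTXT library, and UTF-16 is taken to be big-endian, as in the table.

```python
def table_row(ch: str) -> str:
    """Return 'char code-point utf8 utf16' for a single character (illustrative helper)."""
    code_point = f"U+{ord(ch):04X}"
    # UTF-8 bytes, joined with underscores as in the table.
    utf8 = "_".join(f"{b:02X}" for b in ch.encode("utf-8"))
    # UTF-16 code units (big-endian), two bytes per unit.
    raw16 = ch.encode("utf-16-be")
    utf16 = "_".join(raw16[i:i + 2].hex().upper() for i in range(0, len(raw16), 2))
    return f"{ch} {code_point} {utf8} {utf16}"


for ch in "a€𝄞":
    print(table_row(ch))
```

The supplementary character 𝄞 needs four bytes in UTF-8 and a surrogate pair (two code units) in UTF-16, which is why its last two columns are longer than those of the other rows.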
An empty text file, containing only the preamble bytes.
A text file containing four lines. The last line is empty. A text editor that interprets this file as a POSIX text file will display only three lines instead of four, because a line in a POSIX text file must end with a line feed character, whereas in ReliableTXT the line feed character separates two lines.
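The difference between "terminated by" and "separated by" can be sketched as follows. Both function names and the sample text are hypothetical, and the POSIX variant is a simplification that only models the trailing line feed.

```python
def reliabletxt_lines(text: str) -> list[str]:
    # The line feed separates lines, so a trailing LF starts one more (empty) line.
    return text.split("\n")


def posix_lines(text: str) -> list[str]:
    # The line feed terminates lines, so a trailing LF does not start a new line.
    lines = text.split("\n")
    if lines and lines[-1] == "":
        lines.pop()
    return lines


sample = "Line 1\nLine 2\nLine 3\n"  # hypothetical content, three LF characters
print(len(reliabletxt_lines(sample)))  # 4
print(len(posix_lines(sample)))        # 3
```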
A text file containing two long lines (13,000 characters per line). Unlike POSIX text files, ReliableTXT does not limit the length of a line.
A text file containing the 32 C0 control characters (U+0000-U+001F). A ReliableTXT-compliant text editor will display only two lines, because only the line feed character is interpreted as a line break. The first line therefore contains the 10 characters U+0000-U+0009 (NUL, SOH, STX, ETX, EOT, ENQ, ACK, BEL, BS, TAB). The second line contains the 21 characters U+000B-U+001F (VT, FF, CR, SO, SI, DLE, DC1, DC2, DC3, DC4, NAK, SYN, ETB, CAN, EM, SUB, ESC, FS, GS, RS, US). Unlike in POSIX text files, the null character U+0000 is allowed in ReliableTXT files.
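The two line lengths can be checked with a short sketch, assuming the file content is simply the 32 characters in code point order:

```python
# All 32 C0 control characters, U+0000 through U+001F, in order (assumed file content).
c0_text = "".join(chr(cp) for cp in range(0x20))

# Only the line feed (U+000A) acts as a line break.
c0_lines = c0_text.split("\n")

print(len(c0_lines))                  # number of lines
print([len(line) for line in c0_lines])  # characters per line
print("\x00" in c0_lines[0])          # the null character is part of the first line
```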
A text file containing the 7 Unicode line break characters. A ReliableTXT-compliant text editor will show a line break only where the line feed character (U+000A) occurs.
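As an illustration, the sketch below contrasts splitting on the line feed alone with Python's `str.splitlines`, which treats many more characters as line boundaries. The list of seven characters (LF, VT, FF, CR, NEL, LS, PS) and the sample text are assumptions, since the example file's exact content is not given here.

```python
# Assumed set of the seven Unicode line break characters.
LINE_BREAKS = ["\n", "\v", "\f", "\r", "\x85", "\u2028", "\u2029"]

# Hypothetical sample: one letter before each line break character, one after the last.
sample_breaks = "".join(letter + br for letter, br in zip("abcdefg", LINE_BREAKS)) + "h"

lf_only = sample_breaks.split("\n")       # ReliableTXT-style: LF is the only line break
all_breaks = sample_breaks.splitlines()   # Python treats all seven as line boundaries

print(len(lf_only))
print(len(all_breaks))
```

A reader built on `splitlines` (or on universal-newline file modes) would therefore disagree with a ReliableTXT reader about where lines end.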
A text file containing the first 100 characters of the CJK Unified Ideographs block (U+4E00-...). In this example, the UTF-8 encoding produces a larger file than the UTF-16 encoding, because each of these characters takes three bytes in UTF-8 but only two in UTF-16.
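The size difference can be verified with a small sketch. It assumes the content is exactly the 100 consecutive characters starting at U+4E00 and counts only the payload, not the preamble.

```python
# 100 consecutive characters starting at U+4E00 (assumed file content).
cjk_text = "".join(chr(0x4E00 + i) for i in range(100))

utf8_size = len(cjk_text.encode("utf-8"))       # three bytes per character
utf16_size = len(cjk_text.encode("utf-16-be"))  # two bytes per character

print(utf8_size, utf16_size)
```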
The table example with all 4 ReliableTXT encodings (UTF-8, UTF-16, UTF-16-Reverse, UTF-32) but without the preamble.
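The table example written with UTF-32-Reverse encoding (Little Endian), with and without a preamble. A ReliableTXT-compliant text editor will wrongly read the example with the preamble as a UTF-16-Reverse encoded file, because the UTF-32 Little Endian preamble (FF FE 00 00) begins with the same two bytes as the UTF-16-Reverse preamble (FF FE).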
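The misdetection can be reproduced with a minimal preamble check. This is a sketch under assumptions: the function name `detect_encoding` is hypothetical, the preambles are taken to be the standard Unicode byte order marks for the four encodings, and returning `None` for a missing preamble is an illustrative choice rather than specified behavior.

```python
# Assumed preambles: the standard byte order marks of the four supported encodings.
PREAMBLES = [
    (b"\xEF\xBB\xBF", "UTF-8"),
    (b"\xFE\xFF", "UTF-16"),
    (b"\xFF\xFE", "UTF-16-Reverse"),
    (b"\x00\x00\xFE\xFF", "UTF-32"),
]


def detect_encoding(data: bytes):
    """Return the encoding name matching the leading bytes, or None if nothing matches."""
    for preamble, name in PREAMBLES:
        if data.startswith(preamble):
            return name
    return None


payload = "a".encode("utf-32-le")               # UTF-32-Reverse data, no preamble
with_preamble = b"\xFF\xFE\x00\x00" + payload   # UTF-32 Little Endian byte order mark

print(detect_encoding(with_preamble))  # UTF-16-Reverse
print(detect_encoding(payload))        # None
```

Because UTF-32-Reverse is not among the supported encodings, no check can distinguish its preamble from a UTF-16-Reverse file that starts with the character U+0000.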
Four files containing invalidly encoded data. The UTF-8 example contains an invalid 0xFF byte. Both UTF-16 examples contain an unpaired surrogate: a high surrogate that is not followed by a low surrogate. The UTF-32 example contains a code point that is higher than the maximum code point U+10FFFF. A ReliableTXT-compliant text editor must show an error message instead of ignoring or replacing invalid characters.
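Strict decoding of this kind can be sketched with Python's built-in codecs, which raise `UnicodeDecodeError` by default instead of substituting replacement characters. The byte sequences below are hypothetical stand-ins for the four example files (without preambles), and `is_valid` is an illustrative helper name.

```python
# Hypothetical invalid payloads, one per encoding (preambles omitted).
INVALID_SAMPLES = {
    "utf-8": b"a\xFFb",                  # 0xFF never occurs in valid UTF-8
    "utf-16-be": b"\xD8\x34\x00\x61",    # high surrogate D834 followed by 'a'
    "utf-16-le": b"\x34\xD8\x61\x00",    # same sequence, reversed byte order
    "utf-32-be": b"\x00\x11\x00\x00",    # 0x110000, above the maximum U+10FFFF
}


def is_valid(data: bytes, codec: str) -> bool:
    """Return True only if the bytes decode without error in strict mode."""
    try:
        data.decode(codec)  # errors="strict" is the default
    except UnicodeDecodeError:
        return False
    return True


for codec, data in INVALID_SAMPLES.items():
    print(codec, is_valid(data, codec))
```

An editor following the rule above would surface the failure to the user rather than fall back to a lenient mode such as `errors="replace"` or `errors="ignore"`.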