Stenway Developer Network

ReliableTXT Test Files

Example01 - Table

A simple table with multiple Unicode characters, their respective code points, UTF-8 and UTF-16 encodings, and names. The last two characters are supplementary characters.

a 	U+0061    61            0061        "Latin Small Letter A"
~ 	U+007E    7E            007E        Tilde
¥ 	U+00A5    C2_A5         00A5        "Yen Sign"
» 	U+00BB    C2_BB         00BB        "Right-Pointing Double Angle Quotation Mark"
½ 	U+00BD    C2_BD         00BD        "Vulgar Fraction One Half"
¿ 	U+00BF    C2_BF         00BF        "Inverted Question Mark"
ß 	U+00DF    C3_9F         00DF        "Latin Small Letter Sharp S"
ä 	U+00E4    C3_A4         00E4        "Latin Small Letter A with Diaeresis"
ï 	U+00EF    C3_AF         00EF        "Latin Small Letter I with Diaeresis"
œ 	U+0153    C5_93         0153        "Latin Small Ligature Oe"
€ 	U+20AC    E2_82_AC      20AC        "Euro Sign"
東 	U+6771    E6_9D_B1      6771        "CJK Unified Ideograph-6771"
𝄞 	U+1D11E   F0_9D_84_9E   D834_DD1E   "Musical Symbol G Clef"
𠀇 	U+20007   F0_A0_80_87   D840_DC07   "CJK Unified Ideograph-20007"
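
The encoding columns of the table can be reproduced with a few lines of Python. This is only a sketch for checking the values; the helper names (code_point, utf8_hex, utf16_hex) are made up for this illustration and are not part of any ReliableTXT library.

```python
def code_point(ch: str) -> str:
    """Format the code point of a single character as U+XXXX."""
    return f"U+{ord(ch):04X}"


def utf8_hex(ch: str) -> str:
    """Return the UTF-8 bytes of a character as underscore-separated hex."""
    return "_".join(f"{b:02X}" for b in ch.encode("utf-8"))


def utf16_hex(ch: str) -> str:
    """Return the UTF-16 code units (big endian) as underscore-separated hex."""
    data = ch.encode("utf-16-be")
    units = [data[i:i + 2] for i in range(0, len(data), 2)]
    return "_".join(unit.hex().upper() for unit in units)


# Euro sign, G clef and a supplementary CJK ideograph from the table
for ch in ("\u20AC", "\U0001D11E", "\U00020007"):
    print(code_point(ch), utf8_hex(ch), utf16_hex(ch))
```

The two supplementary characters need four bytes in UTF-8 and a surrogate pair (two code units) in UTF-16, which is why their rows are wider than the others.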

Example02 - Empty

An empty text file, containing only the preamble bytes.
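
As an illustration, the following sketch lists what such an empty file would contain for each of the four encodings. It assumes that the ReliableTXT preambles match the byte order marks defined in Python's codecs module, with UTF-16 and UTF-32 being big endian and UTF-16-Reverse being little endian; the dictionary name PREAMBLES is hypothetical.

```python
import codecs

# Assumed preamble bytes per ReliableTXT encoding (check against the specification)
PREAMBLES = {
    "UTF-8": codecs.BOM_UTF8,               # EF BB BF
    "UTF-16": codecs.BOM_UTF16_BE,          # FE FF
    "UTF-16-Reverse": codecs.BOM_UTF16_LE,  # FF FE
    "UTF-32": codecs.BOM_UTF32_BE,          # 00 00 FE FF
}


def empty_file_bytes(encoding: str) -> bytes:
    """Return the content of an empty file: the preamble and nothing else."""
    return PREAMBLES[encoding]


for name in PREAMBLES:
    data = empty_file_bytes(name)
    print(name, len(data), data.hex(" ").upper())
```

Under these assumptions, an empty file is 3 bytes long in UTF-8, 2 bytes in both UTF-16 variants, and 4 bytes in UTF-32.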

Example03 - Four Lines

A text file containing four lines. The last line is empty. A text editor that interprets this file as a POSIX text file will display only three lines instead of four, because a line in a POSIX text file must end with a line feed character, as opposed to ReliableTXT, where two lines are separated by a line feed character.
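
The difference between the two interpretations can be sketched as follows. The sample text and the function names are assumptions made for this illustration; the actual content of the test file may differ.

```python
def reliabletxt_lines(text: str) -> list[str]:
    """Line feed separates lines, so n line feeds always give n + 1 lines."""
    return text.split("\n")


def posix_lines(text: str) -> list[str]:
    """Line feed terminates lines, so a trailing line feed starts no new line."""
    lines = text.split("\n")
    if lines[-1] == "":
        lines.pop()
    return lines


sample = "Line 1\nLine 2\nLine 3\n"  # hypothetical file content
print(len(reliabletxt_lines(sample)))  # 4
print(len(posix_lines(sample)))        # 3
```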

Example04 - Long Lines

A text file containing two long lines (13,000 characters per line). Unlike POSIX text files, ReliableTXT does not limit the length of a line.

Example05 - C0 Control Characters

A text file containing the 32 C0 control characters (U+0000-U+001F). A ReliableTXT compliant text editor will display only two lines, because only the line feed character is interpreted as a line break. The first line therefore contains the 10 characters U+0000-U+0009 (NUL, SOH, STX, ETX, EOT, ENQ, ACK, BEL, BS, TAB). The second line contains the 21 characters U+000B-U+001F (VT, FF, CR, SO, SI, DLE, DC1, DC2, DC3, DC4, NAK, SYN, ETB, CAN, EM, SUB, ESC, FS, GS, RS, US). Unlike in POSIX text files, the null character U+0000 is allowed in ReliableTXT files.
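
The line counts given above can be checked with a short sketch. It assumes that the file contains the 32 characters in code point order and nothing else.

```python
# All 32 C0 control characters in code point order (assumed file content)
c0_controls = "".join(chr(cp) for cp in range(0x20))

# Only the line feed (U+000A) separates lines; CR, VT, FF and the others do not
lines = c0_controls.split("\n")

print(len(lines))                      # 2
print([len(line) for line in lines])   # [10, 21]
print("\x00" in lines[0])              # True, the null character is kept
```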

Example06 - Unicode Line Break Characters

A text file containing the 7 Unicode line break characters. A ReliableTXT compliant text editor will show a line break only where the line feed character (U+000A) occurs.

Example07 - CJK

A text file containing the first 100 characters of the CJK Unified Ideographs block (U+4E00-...). In this example, the UTF-8 encoding produces a larger file than the UTF-16 encoding, because each of these characters takes three bytes in UTF-8 but only two bytes in UTF-16.
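
The size difference can be estimated with the following sketch. It assumes that the file consists of exactly the 100 characters starting at U+4E00 plus the preamble, without any line breaks, and that the preambles correspond to the byte order marks in Python's codecs module.

```python
import codecs

# First 100 characters of the CJK Unified Ideographs block (assumed file content)
text = "".join(chr(0x4E00 + i) for i in range(100))

utf8_bytes = codecs.BOM_UTF8 + text.encode("utf-8")
utf16_bytes = codecs.BOM_UTF16_BE + text.encode("utf-16-be")

print(len(utf8_bytes))   # 303 = 3 preamble bytes + 100 * 3 bytes
print(len(utf16_bytes))  # 202 = 2 preamble bytes + 100 * 2 bytes
```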

InvalidExample01 - Table

The table example with all 4 ReliableTXT encodings (UTF-8, UTF-16, UTF-16-Reverse, UTF-32) but without the preamble.

The table example written with UTF-32-Reverse encoding (Little Endian), with and without a preamble. A ReliableTXT compliant text editor will wrongly read the example with the preamble as a UTF-16-Reverse encoded file.
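
Why the misdetection happens can be seen in a sketch of a preamble-based detection. The function name detect_encoding and the preamble table are assumptions for this illustration, with the preambles taken to be the byte order marks from Python's codecs module. The little endian UTF-32 byte order mark (FF FE 00 00) starts with the same two bytes as the UTF-16-Reverse preamble (FF FE).

```python
import codecs

# Assumed preambles of the four ReliableTXT encodings
PREAMBLES = [
    ("UTF-8", codecs.BOM_UTF8),               # EF BB BF
    ("UTF-16", codecs.BOM_UTF16_BE),          # FE FF
    ("UTF-16-Reverse", codecs.BOM_UTF16_LE),  # FF FE
    ("UTF-32", codecs.BOM_UTF32_BE),          # 00 00 FE FF
]


def detect_encoding(data: bytes) -> str:
    """Return the encoding whose preamble the data starts with."""
    for name, preamble in PREAMBLES:
        if data.startswith(preamble):
            return name
    raise ValueError("no ReliableTXT preamble found")


# A UTF-32 little endian file with byte order mark, which is not a ReliableTXT encoding
utf32_reverse = codecs.BOM_UTF32_LE + "a".encode("utf-32-le")
print(detect_encoding(utf32_reverse))  # UTF-16-Reverse
```

A file without any preamble, as in the first set of invalid examples, makes this sketch raise a ValueError instead of guessing an encoding.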

InvalidExample02 - Corrupt Data

Four files containing invalidly encoded data. The UTF-8 example contains an invalid 0xFF byte. Both UTF-16 examples contain an unpaired surrogate, starting with a high surrogate that is not followed by a low surrogate. The UTF-32 example contains a code point that is higher than the maximum code point U+10FFFF. A ReliableTXT compliant text editor must show an error message instead of ignoring or replacing invalid characters.
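
The required behavior corresponds to strict decoding. The following sketch uses Python's built-in codecs with hypothetical byte sequences that mimic the three kinds of corruption described above; the actual bytes in the test files may differ, and the preamble is left out for brevity.

```python
# Hypothetical corrupt payloads, one per encoding (preamble omitted)
CORRUPT_SAMPLES = {
    "utf-8": b"a\xFFb",                  # 0xFF never occurs in valid UTF-8
    "utf-16-be": b"\xD8\x34\x00\x61",    # high surrogate followed by 'a'
    "utf-16-le": b"\x34\xD8\x61\x00",    # same, little endian
    "utf-32-be": b"\x00\x11\x00\x00",    # U+110000, above U+10FFFF
}


def is_valid(data: bytes, codec: str) -> bool:
    """Decode strictly and report whether the data is well-formed."""
    try:
        data.decode(codec, errors="strict")
        return True
    except UnicodeDecodeError:
        return False


for codec, data in CORRUPT_SAMPLES.items():
    print(codec, is_valid(data, codec))
```

Using errors="replace" or errors="ignore" instead would silently alter the text, which is exactly what a compliant editor must not do.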