A WSV document is essentially a jagged array (an array of arrays) of string values. You can use it to write tabular data, but you are not limited to that. One line could contain a single value, another could contain a thousand. There is no limit on the number of values per line.
Here is an example of a table with four columns and three data rows:
FirstName LastName Age PlaceOfBirth
William Smith 30 Boston
Olivia Jones 27 "San Francisco"
Lucas Brown - Chicago
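To make the "jagged array" idea concrete, here is a minimal Python sketch of how the table above could be held in memory. The variable names are illustrative, and using `None` for the null value is an assumption of this sketch, not something the format prescribes.

```python
# The four-column table as a list of lists of strings.
# None stands in for the WSV null value, which is written as "-".
table = [
    ["FirstName", "LastName", "Age", "PlaceOfBirth"],
    ["William", "Smith", "30", "Boston"],
    ["Olivia", "Jones", "27", "San Francisco"],
    ["Lucas", "Brown", None, "Chicago"],
]

# Rows are independent of each other, so their lengths may differ.
jagged = [["OnlyOneValue"], ["a", "b", "c"]]
row_lengths = [len(row) for row in jagged]
```

Note that all values are strings: the age `30` is stored as `"30"`, and any interpretation as a number is left to the application.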
And here is an example of non-tabular data: a list of four points that are connected to form two triangles:
2D
Pts 1 1 1 -1 -1 -1 -1 1
T 0 1 2
T 2 3 0
Values containing one or more whitespace characters must be enclosed in doublequotes:
"My Value"
Values containing doublequotes must be enclosed in doublequotes. The contained doublequote character is written as an escape sequence of two doublequotes:
"My ""Value"""
Values containing one or more hash characters must be quoted. Otherwise the hash and everything after it is interpreted as a comment:
"MyValue#1"
Values containing one or more line feed characters must be quoted. The line feed is written as an escape sequence of a doublequote followed by a slash and another doublequote:
"Line1"/"Line2"
An empty string must be quoted:
""
A value consisting of a single hyphen-minus must be quoted; otherwise it is interpreted as null:
"-"
To write a null value, use the hyphen-minus character:
-
All other values do not need to be enclosed in doublequotes:
MyValue
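The quoting rules above can be collected into one small function. The following Python sketch is only an illustration, not a reference implementation: `serialize_value`, `serialize_line` and `WSV_WHITESPACE` are names chosen here, and `None` is assumed to represent a null value.

```python
from typing import List, Optional

# The 24 characters WSV treats as whitespace (line feed is excluded
# from this set and checked separately below).
WSV_WHITESPACE = frozenset(
    "\u0009\u000B\u000C\u000D\u0020\u0085\u00A0\u1680"
    "\u2000\u2001\u2002\u2003\u2004\u2005\u2006\u2007\u2008\u2009\u200A"
    "\u2028\u2029\u202F\u205F\u3000"
)


def serialize_value(value: Optional[str]) -> str:
    """Return the WSV text for one value, quoting only when required."""
    if value is None:
        return "-"      # null
    if value == "":
        return '""'     # empty string must be quoted
    if value == "-":
        return '"-"'    # a lone hyphen-minus would otherwise mean null
    needs_quotes = any(
        ch in WSV_WHITESPACE or ch in '"#\n' for ch in value
    )
    if not needs_quotes:
        return value
    # Double the quotes first, then replace line feeds with "/" so that
    # the quotes of the line break escape are not doubled again.
    escaped = value.replace('"', '""').replace("\n", '"/"')
    return '"' + escaped + '"'


def serialize_line(values: List[Optional[str]]) -> str:
    """Join serialized values with a single space."""
    return " ".join(serialize_value(v) for v in values)
```

For example, `serialize_line(["Lucas", "Brown", None, "Chicago"])` yields `Lucas Brown - Chicago`.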
Single line comments are written using the hash character:
# My comment
To get multiple lines of comments, precede each of them with a hash character:
# My first comment
# My second comment
Comments can come after values:
Value1 Value2 # My comment
Comments can contain any character except the line feed character, which terminates the comment:
# My comment with ### hashes and "doublequotes"
Leading and trailing whitespace on a line is skipped, and any number of whitespace characters may be used:
  Value1a    Value1b
Value2a  Value2b
      Value3a   Value3b
Unicode has 25 characters marked as whitespace. WSV uses 24 of them as whitespace, and the line-feed character as line break character (LF / U+000A).
Codepoint | Name |
---|---|
U+0009 | Character Tabulation |
U+000A | Line Feed |
U+000B | Line Tabulation |
U+000C | Form Feed |
U+000D | Carriage Return |
U+0020 | Space |
U+0085 | Next Line |
U+00A0 | No-Break Space |
U+1680 | Ogham Space Mark |
U+2000 | En Quad |
U+2001 | Em Quad |
U+2002 | En Space |
U+2003 | Em Space |
U+2004 | Three-Per-Em Space |
U+2005 | Four-Per-Em Space |
U+2006 | Six-Per-Em Space |
U+2007 | Figure Space |
U+2008 | Punctuation Space |
U+2009 | Thin Space |
U+200A | Hair Space |
U+2028 | Line Separator |
U+2029 | Paragraph Separator |
U+202F | Narrow No-Break Space |
U+205F | Medium Mathematical Space |
U+3000 | Ideographic Space |
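The table can be turned into a lookup set directly. The sketch below uses names chosen for illustration (`UNICODE_WHITESPACE`, `WSV_WHITESPACE`, `is_wsv_whitespace`) and builds the set from the code points listed above instead of relying on a language built-in.

```python
# The 25 code points from the table above.
UNICODE_WHITESPACE = frozenset(
    [0x0009, 0x000A, 0x000B, 0x000C, 0x000D, 0x0020, 0x0085, 0x00A0, 0x1680]
    + list(range(0x2000, 0x200B))  # U+2000 (En Quad) .. U+200A (Hair Space)
    + [0x2028, 0x2029, 0x202F, 0x205F, 0x3000]
)

LINE_FEED = 0x000A

# WSV treats the line feed as a line break, leaving 24 whitespace characters.
WSV_WHITESPACE = UNICODE_WHITESPACE - {LINE_FEED}


def is_wsv_whitespace(ch: str) -> bool:
    """True if the single character ch separates values in WSV."""
    return ord(ch) in WSV_WHITESPACE
```

An explicit set is used because Python's `str.isspace()` is not a drop-in substitute: it also accepts some control characters such as U+001C, which are not in the table.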
A WSV file is a ReliableTXT file. Therefore one of the following encodings must be used:
All four encodings must write a preamble (BOM) that indicates which encoding is used.
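Because the preamble is mandatory, a reader can pick the decoder from the first bytes alone. The following Python sketch assumes the four encodings are UTF-8, UTF-16, UTF-16 Reverse and UTF-32 as named by the ReliableTXT specification, and that they map to the Python codecs shown. The function name `decode_reliable_txt` is hypothetical.

```python
import codecs

# Assumption: these are the four ReliableTXT encodings and their preambles.
# No preamble in this list is a prefix of another, so order does not matter.
PREAMBLES = [
    (codecs.BOM_UTF8, "utf-8"),          # EF BB BF
    (codecs.BOM_UTF16_BE, "utf-16-be"),  # FE FF
    (codecs.BOM_UTF16_LE, "utf-16-le"),  # FF FE (UTF-16 Reverse)
    (codecs.BOM_UTF32_BE, "utf-32-be"),  # 00 00 FE FF
]


def decode_reliable_txt(data: bytes) -> str:
    """Decode bytes according to their preamble; fail if none is present."""
    for preamble, encoding in PREAMBLES:
        if data.startswith(preamble):
            return data[len(preamble):].decode(encoding)
    raise ValueError("Document does not have a ReliableTXT preamble")
```

Rejecting input without a preamble, instead of guessing an encoding, is what makes loading deterministic.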
WSV document:
Values:
Whitespace:
Comments:
Note: "Any whitespace character" and "Any character except whitespace" refer to the 25 Unicode whitespace characters.
String not closed (1, 19):
a b c "hello world
Invalid double quote after value (1, 4):
a b"hello world"
Invalid character after string (1, 14):
"hello world"a b c
Invalid string line break (1, 9):
"Line1"/ "Line2"
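The four errors above can be reproduced with a small hand-written line parser. This Python sketch is an illustration only: `parse_line` and `WsvParseError` are hypothetical names, the pair in parentheses is assumed to be a 1-based (line, column) position as the examples suggest, and `None` is used for null.

```python
from typing import List, Optional

WS = frozenset(
    "\u0009\u000B\u000C\u000D\u0020\u0085\u00A0\u1680"
    "\u2000\u2001\u2002\u2003\u2004\u2005\u2006\u2007\u2008\u2009\u200A"
    "\u2028\u2029\u202F\u205F\u3000"
)


class WsvParseError(Exception):
    def __init__(self, message: str, line: int, column: int):
        super().__init__(f"{message} ({line}, {column})")
        self.line, self.column = line, column


def parse_line(text: str, line_number: int = 1) -> List[Optional[str]]:
    """Parse one line (without its line feed) into a list of values."""
    values: List[Optional[str]] = []
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        if ch in WS:
            i += 1                      # skip separating whitespace
        elif ch == "#":
            break                       # rest of the line is a comment
        elif ch == '"':
            parts = []
            i += 1                      # skip the opening double quote
            while True:
                if i >= n:
                    raise WsvParseError("String not closed", line_number, i + 1)
                if text[i] != '"':
                    parts.append(text[i])
                    i += 1
                    continue
                nxt = text[i + 1] if i + 1 < n else ""
                if nxt == '"':          # "" is an escaped double quote
                    parts.append('"')
                    i += 2
                elif nxt == "/":        # "/" is an escaped line feed
                    if i + 2 < n and text[i + 2] == '"':
                        parts.append("\n")
                        i += 3
                    else:
                        raise WsvParseError(
                            "Invalid string line break", line_number, i + 3)
                elif nxt == "" or nxt in WS or nxt == "#":
                    i += 1              # closing double quote
                    break
                else:
                    raise WsvParseError(
                        "Invalid character after string", line_number, i + 2)
            values.append("".join(parts))
        else:
            start = i
            while i < n and text[i] not in WS and text[i] != "#":
                if text[i] == '"':
                    raise WsvParseError(
                        "Invalid double quote after value", line_number, i + 1)
                i += 1
            token = text[start:i]
            values.append(None if token == "-" else token)
    return values
```

A whole document can then be parsed by splitting it on the line feed character and calling `parse_line` for each line with its line number.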
The term CSV (comma-separated values) has a variety of interpretations, and no single standard is defined. RFC 4180 is one example of how a CSV file can be interpreted, but many variations remain in use, and corner cases such as null values, multi-line values, comments, empty lines, and leading and trailing whitespace cause problems. The following export dialog of LibreOffice Calc lets you choose commas, semicolons, colons, tabs or spaces as the field delimiter, as well as the quote character used to enclose values. There is also an option to use a fixed column width:
Value 1,Value 2,Value 3
The import dialog shows how such variety leads to a lot of trial and error. Loading CSV files is therefore a fragile process that can involve many manual steps:
Loading a WSV document can be fully automated and requires no manual steps. When saving a WSV document, the only choices are which of the four ReliableTXT encodings to use and whether to pretty-print, that is, to pad values with whitespace so that they align.
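The pretty-printing option can be sketched in a few lines of Python. The function name `align_columns` is hypothetical, and the input is assumed to consist of tokens that are already serialized (quoted where necessary). Padding is computed from the code point count, which is a simplification for text containing wide characters.

```python
from typing import List


def align_columns(rows: List[List[str]]) -> List[str]:
    """Pad serialized WSV tokens with spaces so that columns line up."""
    widths: List[int] = []
    for row in rows:
        for i, token in enumerate(row):
            if i == len(widths):
                widths.append(0)
            widths[i] = max(widths[i], len(token))
    lines = []
    for row in rows:
        padded = [token.ljust(widths[i]) for i, token in enumerate(row)]
        # Trailing whitespace carries no meaning, so it is trimmed.
        lines.append(" ".join(padded).rstrip())
    return lines


rows = [
    ["FirstName", "LastName", "Age", "PlaceOfBirth"],
    ["William", "Smith", "30", "Boston"],
    ["Olivia", "Jones", "27", '"San Francisco"'],
    ["Lucas", "Brown", "-", "Chicago"],
]
aligned = align_columns(rows)
```

Since extra whitespace between values is skipped when loading, the aligned document parses to exactly the same values as the compact one.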
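Thanks to WSV's special line break escape sequence ("/"), a value containing the line feed character does not span multiple lines. In CSV, such a value is typically written across two lines:
"Line1
Line2"
In WSV, it stays on a single line:
"Line1"/"Line2"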
Grammar
  Chars
    <WsChar> 0009 000B 000C 000D 0020 0085 00A0 1680 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 200A 2028 2029 202F 205F 3000
    <LineBreakChar> 000A
    <HashChar> "#"
    <CommentChar> Any Except <LineBreakChar>
    <DoubleQuoteChar> 0022
    <MinusChar> "-"
    <ValueChar> Any Except <HashChar> <DoubleQuoteChar> <LineBreakChar> <WsChar>
    <OneCharValueChar> <ValueChar> Except <MinusChar>
    <SlashChar> /
    <StringChar> Any Except <DoubleQuoteChar> <LineBreakChar>
  End
  Tokens
    <LineBreak> <LineBreakChar>
    <Ws> Repeat+ <WsChar>
    <CommentText> Repeat+ <CommentChar>
    <CommentStart> <HashChar>
    <Null> <MinusChar>
    <Value> [ ( <ValueChar> Repeat+ <ValueChar> ) <OneCharValueChar> ]
    <StringStart> <DoubleQuoteChar>
    <StringEnd> <DoubleQuoteChar>
    <StringLineBreak> <SlashChar>
    <EscapedDoubleQuote> <DoubleQuoteChar> <DoubleQuoteChar>
    <StringText> Repeat+ <StringChar>
  End
  Syntax
    <Comment> <CommentStart> Optional <CommentText>
    <SingleLineString> <StringStart> Repeat* [ <StringText> <EscapedDoubleQuote> ] <StringEnd>
    <String> <SingleLineString> Repeat* ( <StringLineBreak> <SingleLineString> )
    <WsvLineValue> [ <Value> <Null> <String> ]
    <WsvLineValues> <WsvLineValue> Repeat* ( <Ws> <WsvLineValue> ) Optional <Ws>
    <WsvLine> Optional <Ws> Optional <WsvLineValues> Optional <Comment>
    <Wsv> Optional ( <WsvLine> Repeat* ( <LineBreak> <WsvLine> ) )
  End
  RootRule <Wsv>
End