Stenway Developer Network

WSV Specification

Structure

A WSV document is essentially a jagged array (an array of arrays) of string values. You can use it to write tabular data, but you are not limited to it: one line could contain only one value, while another could contain a thousand. There is no limit on the number of values.

Here is an example of a table with four columns and three data rows:

FirstName      LastName  Age PlaceOfBirth
William        Smith     30  Boston
Olivia         Jones     27  "San Francisco"
Lucas          Brown     -   Chicago

And here is an example of non-tabular data, a list of four points that are connected to form two triangles (each T line lists three zero-based point indices):

2D
Pts 1 1 1 -1 -1 -1 -1 1
T 0 1 2
T 2 3 0
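The non-tabular example can be read with a deliberately simplified Python sketch. It assumes the lines contain neither quoted values nor comments (true here, but not in general) and maps a lone hyphen-minus to None as described under Null. The names `split_simple_line`, `points` and `triangles` are hypothetical, chosen only for illustration.

```python
import re

# The 24 WSV whitespace characters (U+000A is the line break, not a separator)
WS = ("\u0009\u000B\u000C\u000D\u0020\u0085\u00A0\u1680"
      "\u2000\u2001\u2002\u2003\u2004\u2005\u2006\u2007"
      "\u2008\u2009\u200A\u2028\u2029\u202F\u205F\u3000")
WS_RUN = re.compile(f"[{WS}]+")


def split_simple_line(line):
    """Hypothetical helper: split a line that has NO quotes and NO comments.

    A lone hyphen-minus becomes None (null); everything else stays a str.
    """
    stripped = line.strip(WS)
    if not stripped:
        return []
    return [None if v == "-" else v for v in WS_RUN.split(stripped)]


document = "2D\nPts 1 1 1 -1 -1 -1 -1 1\nT 0 1 2\nT 2 3 0"
rows = [split_simple_line(line) for line in document.split("\n")]  # jagged array

# Interpret the rows: Pts holds x/y pairs, each T row holds three point indices
coords = [int(v) for v in rows[1][1:]]
points = list(zip(coords[0::2], coords[1::2]))
triangles = [tuple(int(v) for v in row[1:]) for row in rows if row and row[0] == "T"]

print([len(row) for row in rows])  # [1, 9, 4, 4]
print(points)                      # [(1, 1), (1, -1), (-1, -1), (-1, 1)]
print(triangles)                   # [(0, 1, 2), (2, 3, 0)]
```

The rows have different lengths, which is exactly the jagged-array shape described above; the meaning of each row is left entirely to the application.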

Quoting Values

Values containing one or more whitespace characters must be enclosed in doublequotes:

"My Value"

Values containing doublequotes must be enclosed in doublequotes. The contained doublequote character is written as an escape sequence of two doublequotes:

"My ""Value"""

Values containing one or more hash characters must be quoted; otherwise the hash character starts a comment:

"MyValue#1"

Values containing one or more line feed characters must be quoted. The line feed is written as an escape sequence of a doublequote followed by a slash and another doublequote:

"Line1"/"Line2"

An empty string must be quoted:

""

A value consisting of a single hyphen-minus must be quoted; otherwise it is interpreted as null:

"-"

Null

To write a null value, use the hyphen-minus character:

-

Normal Values

All other values do not need to be enclosed in doublequotes:

MyValue
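The quoting rules above can be condensed into a small serializer. This is a sketch under assumptions: values are Python `str` or `None`, values on a line are separated by a single space, and the function names `serialize_value` and `serialize_line` are hypothetical rather than part of any official WSV library.

```python
# The 24 WSV whitespace characters (line feeds are handled separately)
WS = frozenset(
    "\u0009\u000B\u000C\u000D\u0020\u0085\u00A0\u1680"
    "\u2000\u2001\u2002\u2003\u2004\u2005\u2006\u2007"
    "\u2008\u2009\u200A\u2028\u2029\u202F\u205F\u3000"
)


def serialize_value(value):
    """Hypothetical helper: encode a str or None as a single WSV value."""
    if value is None:
        return "-"                          # null
    if value in ("", "-"):
        return '"' + value + '"'            # empty string or literal hyphen-minus
    if not any(c in WS or c in '"#\n' for c in value):
        return value                        # normal value, no quotes needed
    # Escape doublequotes first, then turn each line feed into "/"
    escaped = value.replace('"', '""').replace("\n", '"/"')
    return '"' + escaped + '"'


def serialize_line(values):
    """Hypothetical helper: join serialized values with a single space."""
    return " ".join(serialize_value(v) for v in values)


print(serialize_value('My "Value"'))     # "My ""Value"""
print(serialize_value("Line1\nLine2"))  # "Line1"/"Line2"
print(serialize_line(["Lucas", "Brown", None, "Chicago"]))  # Lucas Brown - Chicago
```

The order of the two replacements matters: escaping line feeds first would cause the doublequotes of each `"/"` sequence to be doubled as well.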

Comments

Single line comments are written using the hash character:

# My comment

To write a multi-line comment, start each line with a hash character:

# My first comment
# My second comment

Comments can come after values:

Value1 Value2 # My comment

Comments can contain any character except the line feed, which terminates the comment:

# My comment with ### hashes and "doublequotes"
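To find where a comment begins, a reader only has to track whether it is currently inside a quoted value. A minimal sketch, assuming the line is otherwise valid WSV; `comment_start` is a hypothetical helper name. Because an escaped doublequote (`""`) and the line break escape (`"/"`) each contain two doublequotes, a simple toggle stays in sync.

```python
def comment_start(line):
    """Hypothetical helper: index of the '#' that starts a comment, or -1.

    Assumes the line is otherwise valid WSV. A '#' inside a quoted value is
    data, so only hashes seen while outside a string count.
    """
    in_string = False
    for i, c in enumerate(line):
        if c == '"':
            in_string = not in_string   # "" and "/" toggle twice, leaving the state unchanged
        elif c == "#" and not in_string:
            return i
    return -1


line = 'Value1 Value2 # My comment'
i = comment_start(line)
print(repr(line[:i]))      # 'Value1 Value2 '
print(repr(line[i + 1:]))  # ' My comment'
```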

Leading And Trailing Whitespace

Leading and trailing whitespace on a line is skipped, so any number of whitespace characters may be used there, as well as between values:

Value1a       Value1b
  Value2a    Value2b 
    Value3a Value3b  

Whitespace Characters

Unicode has 25 characters marked as whitespace. WSV uses 24 of them as whitespace and the line feed character (LF / U+000A) as the line break character:

Codepoint Name
U+0009 Character Tabulation
U+000A Line Feed
U+000B Line Tabulation
U+000C Form Feed
U+000D Carriage Return
U+0020 Space
U+0085 Next Line
U+00A0 No-Break Space
U+1680 Ogham Space Mark
U+2000 En Quad
U+2001 Em Quad
U+2002 En Space
U+2003 Em Space
U+2004 Three-Per-Em Space
U+2005 Four-Per-Em Space
U+2006 Six-Per-Em Space
U+2007 Figure Space
U+2008 Punctuation Space
U+2009 Thin Space
U+200A Hair Space
U+2028 Line Separator
U+2029 Paragraph Separator
U+202F Narrow No-Break Space
U+205F Medium Mathematical Space
U+3000 Ideographic Space
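Implementations should use exactly this set rather than a language's built-in notion of whitespace. In Python, for instance, `str.isspace()` also returns True for U+001C to U+001F, which are not in the table, so `str.strip()` without arguments would remove characters that WSV treats as ordinary value characters. A sketch, with the hypothetical names `UNICODE_WS`, `WSV_WS` and `strip_wsv`:

```python
# All 25 Unicode whitespace code points from the table above
UNICODE_WS = [
    0x0009, 0x000A, 0x000B, 0x000C, 0x000D, 0x0020, 0x0085, 0x00A0,
    0x1680, 0x2000, 0x2001, 0x2002, 0x2003, 0x2004, 0x2005, 0x2006,
    0x2007, 0x2008, 0x2009, 0x200A, 0x2028, 0x2029, 0x202F, 0x205F,
    0x3000,
]
# U+000A separates lines; the other 24 separate values
WSV_WS = "".join(chr(cp) for cp in UNICODE_WS if cp != 0x000A)


def strip_wsv(line):
    """Hypothetical helper: skip leading/trailing WSV whitespace only."""
    return line.strip(WSV_WS)


print(len(UNICODE_WS), len(WSV_WS))                          # 25 24
print(repr(strip_wsv("\u3000  Value2a    Value2b \u2003")))  # 'Value2a    Value2b'
# U+001F (unit separator) is whitespace to Python but a value character in WSV
print(repr("\x1fValue".strip()), repr(strip_wsv("\x1fValue")))  # 'Value' '\x1fValue'
```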

Text Encoding

A WSV file is a ReliableTXT file. Therefore, one of the following encodings must be used:

  • UTF-8
  • UTF-16 (Big Endian)
  • UTF-16 Reversed (Little Endian)
  • UTF-32 (Big Endian)

All four encodings must write a preamble (BOM) indicating the encoding used.
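Reading a file therefore comes down to inspecting the preamble. The sketch below maps each of the four BOMs to a Python codec name and strips the BOM before decoding. The function names are hypothetical, and rejecting a missing BOM with `ValueError` is an assumed error policy. Python's endian-specific codecs such as `utf-16-le` do not write a BOM themselves, which is why the encoder prepends it explicitly.

```python
# BOM of each ReliableTXT encoding, paired with the matching Python codec name
BOMS = [
    (b"\xEF\xBB\xBF", "utf-8"),
    (b"\xFE\xFF", "utf-16-be"),
    (b"\xFF\xFE", "utf-16-le"),
    (b"\x00\x00\xFE\xFF", "utf-32-be"),
]


def decode_reliable_txt(data: bytes) -> str:
    """Hypothetical helper: pick the codec from the BOM, then decode the rest."""
    for bom, codec in BOMS:
        if data.startswith(bom):
            return data[len(bom):].decode(codec)
    raise ValueError("missing or unsupported BOM")  # assumed error policy


def encode_reliable_txt(text: str, codec: str = "utf-8") -> bytes:
    """Hypothetical helper: these codecs write no BOM, so prepend it."""
    for bom, name in BOMS:
        if name == codec:
            return bom + text.encode(codec)
    raise ValueError(f"unsupported encoding: {codec}")


data = encode_reliable_txt("a b\nc", "utf-16-le")
print(data[:2])                   # b'\xff\xfe'
print(decode_reliable_txt(data))  # prints "a b" and "c" on two lines
```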

Syntax Diagrams

[Syntax diagrams: WSV document, Values, Whitespace, Comments]

Note: "Any whitespace character" and "Any character except whitespace" refer to the 25 Unicode whitespace characters.

Parser Errors

Each error is reported with its position as (line, column), both counting from 1:

String not closed (1, 19):

a b c "hello world

Invalid double quote after value (1, 4):

a b"hello world"

Invalid character after string (1, 14):

"hello world"a b c

Invalid string line break (1, 9):

"Line1"/ "Line2"

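The four errors above can be reproduced by a hand-written line parser. In this sketch the numbers in parentheses are read as (line, column), counting from 1, which matches all four examples. The class `WsvParserError`, the functions `parse_line` and `parse_document`, and the exact message format are assumptions; a reference implementation may detect further errors or report them differently.

```python
# The 24 WSV whitespace characters (line feed separates lines instead)
WS = frozenset(
    "\u0009\u000B\u000C\u000D\u0020\u0085\u00A0\u1680"
    "\u2000\u2001\u2002\u2003\u2004\u2005\u2006\u2007"
    "\u2008\u2009\u200A\u2028\u2029\u202F\u205F\u3000"
)


class WsvParserError(ValueError):
    """Hypothetical error type carrying a 1-based line and column."""

    def __init__(self, message, line, column):
        super().__init__(f"{message} ({line}, {column})")
        self.line, self.column = line, column


def parse_line(text, line_no=1):
    """Hypothetical parser: one line into a list of str/None values."""
    values, i, n = [], 0, len(text)
    while True:
        while i < n and text[i] in WS:          # skip separating whitespace
            i += 1
        if i >= n or text[i] == "#":            # end of line or start of comment
            return values
        if text[i] == '"':                      # quoted value
            chars, i = [], i + 1
            while True:
                if i >= n:
                    raise WsvParserError("String not closed", line_no, i + 1)
                if text[i] != '"':
                    chars.append(text[i])
                    i += 1
                    continue
                i += 1                          # consume the doublequote
                if i < n and text[i] == '"':    # "" -> literal doublequote
                    chars.append('"')
                    i += 1
                elif i < n and text[i] == "/":  # "/" -> line feed
                    if i + 1 < n and text[i + 1] == '"':
                        chars.append("\n")
                        i += 2
                    else:
                        raise WsvParserError("Invalid string line break", line_no, i + 2)
                else:
                    break                       # closing doublequote
            values.append("".join(chars))
            if i < n and text[i] not in WS and text[i] != "#":
                raise WsvParserError("Invalid character after string", line_no, i + 1)
        else:                                   # unquoted value or null
            start = i
            while i < n and text[i] not in WS and text[i] not in '#"':
                i += 1
            value = text[start:i]
            values.append(None if value == "-" else value)
            if i < n and text[i] == '"':
                raise WsvParserError("Invalid double quote after value", line_no, i + 1)


def parse_document(text):
    """Hypothetical helper: parse every line of an already decoded document."""
    return [parse_line(line, no) for no, line in enumerate(text.split("\n"), start=1)]


print(parse_line('Olivia  Jones  27  "San Francisco"'))  # ['Olivia', 'Jones', '27', 'San Francisco']
try:
    parse_line('"Line1"/ "Line2"')
except WsvParserError as error:
    print(error)  # Invalid string line break (1, 9)
```
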
Comparison To CSV And TSV

The term CSV (comma-separated values) has a variety of interpretations, and no single standard defines it. RFC 4180 is one example of how a CSV file can be interpreted, but many variations are still in use, and corner cases in particular, such as null values, multi-line values, comments, empty lines, and leading and trailing whitespace, lead to problems. The following export dialog of LibreOffice Calc lets you choose commas, semicolons, colons, tabs, or spaces as the field delimiter, as well as the quote character used to enclose values. There is also an option to use a fixed column width:

Value 1,Value 2,Value 3

The import dialog shows how such variety leads to a lot of trial and error. Loading CSV files is therefore a fragile process that can involve many manual steps:

Loading a WSV document is a 100% automatable process that does not require any manual steps. When saving a WSV document, the only choices are which of the four ReliableTXT encodings to use and whether to pretty-print the output by padding values with whitespace so that they align.
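A pretty-printing writer can pad each serialized value to the widest entry in its column. A sketch under assumptions: `pretty_print` is a hypothetical name, widths are measured in code points (so wide CJK characters will not line up visually), and padding after the last value of a line is removed.

```python
# The 24 WSV whitespace characters
WS = frozenset(
    "\u0009\u000B\u000C\u000D\u0020\u0085\u00A0\u1680"
    "\u2000\u2001\u2002\u2003\u2004\u2005\u2006\u2007"
    "\u2008\u2009\u200A\u2028\u2029\u202F\u205F\u3000"
)


def serialize_value(value):
    """Quote a value only when the WSV rules require it (None becomes '-')."""
    if value is None:
        return "-"
    if value not in ("", "-") and not any(c in WS or c in '"#\n' for c in value):
        return value
    return '"' + value.replace('"', '""').replace("\n", '"/"') + '"'


def pretty_print(rows):
    """Hypothetical pretty-printer: pad values so that columns line up."""
    cells = [[serialize_value(v) for v in row] for row in rows]
    widths = {}
    for row in cells:
        for col, cell in enumerate(row):
            widths[col] = max(widths.get(col, 0), len(cell))  # code points, not display width
    lines = []
    for row in cells:
        padded = [cell.ljust(widths[col]) for col, cell in enumerate(row)]
        lines.append(" ".join(padded).rstrip(" "))  # drop padding after the last value
    return "\n".join(lines)


table = [
    ["FirstName", "LastName", "Age", "PlaceOfBirth"],
    ["William", "Smith", "30", "Boston"],
    ["Olivia", "Jones", "27", "San Francisco"],
    ["Lucas", "Brown", None, "Chicago"],
]
print(pretty_print(table))
# FirstName LastName Age PlaceOfBirth
# William   Smith    30  Boston
# Olivia    Jones    27  "San Francisco"
# Lucas     Brown    -   Chicago
```

Because leading, trailing and repeated whitespace carries no meaning in WSV, the padded output parses to the same values as the compact form.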

Because WSV escapes line breaks inside strings ("/"), values containing the line feed character never span multiple lines. Compare a multi-line value quoted CSV-style with the same value in WSV:

"Line1
Line2"
"Line1"/"Line2"

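The difference can be observed with Python's standard `csv` module standing in for a typical CSV writer (an assumption; other CSV dialects behave differently). The WSV side applies the escaping rules described under Quoting Values directly, without any library.

```python
import csv
import io

value = "Line1\nLine2"

# CSV via Python's csv module: the quoted field keeps a raw line feed
buffer = io.StringIO()
csv.writer(buffer, lineterminator="\n").writerow([value])
csv_text = buffer.getvalue()

# WSV: the line feed inside the value is escaped as "/"
wsv_text = '"' + value.replace('"', '""').replace("\n", '"/"') + '"' + "\n"

print(len(csv_text.splitlines()))  # 2 physical lines for one record
print(len(wsv_text.splitlines()))  # 1 physical line
```

A line-oriented tool (for example one that counts or splits records by line) therefore sees one WSV line per logical row, while the CSV record is split in two.
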
Grammar

Grammar
  Chars
    <WsChar>              0009 000B 000C 000D 0020 0085 00A0 1680 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 200A 2028 2029 202F 205F 3000
    <LineBreakChar>       000A
    <HashChar>            "#"
    <CommentChar>         Any Except <LineBreakChar>
    <DoubleQuoteChar>     0022
    <MinusChar>           "-"
    <ValueChar>           Any Except <HashChar> <DoubleQuoteChar> <LineBreakChar> <WsChar>
    <OneCharValueChar>    <ValueChar> Except <MinusChar>
    <SlashChar>           /
    <StringChar>          Any Except <DoubleQuoteChar> <LineBreakChar>
  End
  Tokens
    <LineBreak>           <LineBreakChar>
    <Ws>                  Repeat+ <WsChar>
    <CommentText>         Repeat+ <CommentChar>
    <CommentStart>        <HashChar>
    <Null>                <MinusChar>
    <Value>               [ ( <ValueChar> Repeat+ <ValueChar> ) <OneCharValueChar> ]
    <StringStart>         <DoubleQuoteChar>
    <StringEnd>           <DoubleQuoteChar>
    <StringLineBreak>     <SlashChar>
    <EscapedDoubleQuote>  <DoubleQuoteChar> <DoubleQuoteChar>
    <StringText>          Repeat+ <StringChar>
  End
  Syntax
    <Comment>             <CommentStart> Optional <CommentText>
    <SingleLineString>    <StringStart> Repeat* [ <StringText> <EscapedDoubleQuote> ] <StringEnd>
    <String>              <SingleLineString> Repeat* ( <StringLineBreak> <SingleLineString> )
    <WsvLineValue>        [ <Value> <Null> <String> ]
    <WsvLineValues>       <WsvLineValue> Repeat* ( <Ws> <WsvLineValue> ) Optional <Ws>
    <WsvLine>             Optional <Ws> Optional <WsvLineValues> Optional <Comment>
    <Wsv>                 Optional ( <WsvLine> Repeat* ( <LineBreak> <WsvLine> ) )
  End
  RootRule <Wsv>
End
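Since no rule other than <Wsv> involves the line break, each line can be checked on its own, and the remaining rules transcribe almost one-to-one into a regular expression. The sketch below uses Python's `re` module; the constant names mirror the grammar's rule names but are otherwise hypothetical, and `is_valid_wsv` only accepts or rejects input without building values or reporting error positions.

```python
import re

# <WsChar>: the 24 code points listed in the grammar
WS_CHARS = "".join(chr(cp) for cp in (
    0x0009, 0x000B, 0x000C, 0x000D, 0x0020, 0x0085, 0x00A0, 0x1680,
    0x2000, 0x2001, 0x2002, 0x2003, 0x2004, 0x2005, 0x2006, 0x2007,
    0x2008, 0x2009, 0x200A, 0x2028, 0x2029, 0x202F, 0x205F, 0x3000,
))

WS = f"[{WS_CHARS}]+"                                           # <Ws>
VALUE_CHAR = f'[^#"\\n{WS_CHARS}]'                               # <ValueChar>
ONE_CHAR_VALUE_CHAR = f'[^#"\\n\\-{WS_CHARS}]'                   # <OneCharValueChar>
VALUE = f"(?:{VALUE_CHAR}{VALUE_CHAR}+|{ONE_CHAR_VALUE_CHAR})"  # <Value>
NULL = "-"                                                      # <Null>
SINGLE_LINE_STRING = '"(?:[^"\\n]|"")*"'                        # <SingleLineString>
STRING = f"{SINGLE_LINE_STRING}(?:/{SINGLE_LINE_STRING})*"      # <String>
LINE_VALUE = f"(?:{VALUE}|{NULL}|{STRING})"                     # <WsvLineValue>
LINE_VALUES = f"{LINE_VALUE}(?:{WS}{LINE_VALUE})*(?:{WS})?"     # <WsvLineValues>
COMMENT = "#[^\\n]*"                                            # <Comment>
WSV_LINE = re.compile(f"(?:{WS})?(?:{LINE_VALUES})?(?:{COMMENT})?")  # <WsvLine>


def is_valid_wsv(text):
    """Hypothetical helper for <Wsv>: every line must match <WsvLine>."""
    return all(WSV_LINE.fullmatch(line) for line in text.split("\n"))


print(is_valid_wsv("Value1 Value2 # My comment"))  # True
print(is_valid_wsv('"hello world"a b c'))          # False
```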