Stenway Developer Network

Grammar Description Format - GrammarSML

Character Sets

One Character

Specify one character:

Grammar
  Chars
    <SmallA>    a
    <EuroSign>  €
    <GClef>     𝄞
  End
End

The name of a character set must be written in angle brackets.

Special characters:

Grammar
  Chars
    <Space>       " "
    <Minus>       "-"
    <Hash>        "#"
    <DoubleQuote> """"
    <LineFeed>    ""/""
  End
End

Unicode code points with four digits:

Grammar
  Chars
    <Space>       0020
    <Minus>       002D
    <Hash>        0023
    <DoubleQuote> 0022
    <LineFeed>    000A
  End
End

Unicode code points with six digits (supplementary characters):

Grammar
  Chars
    <SmallA>    0061
    <EuroSign>  20AC
    <GClef>     01D11E
  End
End
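
To illustrate how the hexadecimal notation maps to characters, here is a minimal Python sketch. The helper name char_from_code_point is hypothetical and not part of GrammarSML, and rejecting any length other than four or six digits is an assumption based on the two forms shown above.

```python
def char_from_code_point(hex_digits: str) -> str:
    """Convert a four- or six-digit hexadecimal code point into a character."""
    # Assumption: only the two notations shown above (4 or 6 digits) are valid.
    if len(hex_digits) not in (4, 6):
        raise ValueError("expected four or six hexadecimal digits")
    return chr(int(hex_digits, 16))


small_a = char_from_code_point("0061")    # <SmallA>
euro_sign = char_from_code_point("20AC")  # <EuroSign>
g_clef = char_from_code_point("01D11E")   # <GClef>, a supplementary character
```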

Multiple Characters

Multiple characters:

Grammar
  Chars
    <Digit>   0 1 2 3 4 5 6 7 8 9
  End
End

Reuse sets:

Grammar
  Chars
    <Digit>   0 1 2 3 4 5 6 7 8 9
    <Hex>     <Digit> a b c d e f A B C D E F
  End
End

Ranges

Define a range of characters with Range, followed by the first and the last character of the range:

Grammar
  Chars
    <Digit>   Range 0 9
    <Hex>     <Digit> Range a f Range A F
  End
End
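
The following Python sketch shows one way to read these definitions as sets of characters. It assumes that both ends of a range are included, which matches the explicit <Digit> listing earlier. The helper char_range is a hypothetical name.

```python
def char_range(first: str, last: str) -> set[str]:
    """Return every character from first to last (assumed inclusive)."""
    return {chr(cp) for cp in range(ord(first), ord(last) + 1)}


# <Digit>   Range 0 9
digit = char_range("0", "9")

# <Hex>     <Digit> Range a f Range A F
hex_chars = digit | char_range("a", "f") | char_range("A", "F")
```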

Unicode Categories

All characters belonging to the specified Unicode category:

Grammar
  Chars
    <UpperCaseLetter>     Category Lu
    <NumberDecimalDigit>  Category Nd
  End
End

Categories:

Cc =  Other, control
Cf =  Other, format
Cn =  Other, not assigned
Co =  Other, private use
Cs =  Other, surrogate
Lo =  Letter, other
Ll =  Letter, lowercase
Lm =  Letter, modifier
Lt =  Letter, titlecase
Lu =  Letter, uppercase
Mc =  Mark, spacing combining
Me =  Mark, enclosing
Mn =  Mark, nonspacing
Nd =  Number, decimal digit
Nl =  Number, letter
No =  Number, other
Pc =  Punctuation, connector
Pd =  Punctuation, dash
Pe =  Punctuation, close
Pf =  Punctuation, final quote
Pi =  Punctuation, initial quote
Po =  Punctuation, other
Ps =  Punctuation, open
Sc =  Symbol, currency
Sk =  Symbol, modifier
Sm =  Symbol, math
So =  Symbol, other
Zl =  Separator, line
Zp =  Separator, paragraph
Zs =  Separator, space 

Presets

Predefined sets:

Grammar
  Chars
    <Letter>          Letter
    <Digit>           Digit
    <LetterOrDigit>   Letter Digit
  End
End

Letter = Unicode Categories Ll, Lm, Lo, Lt, Lu

Digit = Unicode Categories Nd

Number = Unicode Categories Nd, Nl, No
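
As a rough sketch of what category membership means, the Python standard library exposes the same two-letter general category codes through unicodedata.category. The constant names and the helper in_categories below are hypothetical, and the result for a given character depends on the Unicode version shipped with the Python interpreter.

```python
import unicodedata

# Category groups as listed for the presets above.
LETTER_CATEGORIES = {"Ll", "Lm", "Lo", "Lt", "Lu"}
DIGIT_CATEGORIES = {"Nd"}
NUMBER_CATEGORIES = {"Nd", "Nl", "No"}


def in_categories(ch: str, categories: set[str]) -> bool:
    """Check whether the general category of ch is one of the given codes."""
    return unicodedata.category(ch) in categories


is_letter = in_categories("A", LETTER_CATEGORIES)
is_digit = in_categories("7", DIGIT_CATEGORIES)
# U+2167 ROMAN NUMERAL EIGHT has category Nl: a number, but not a decimal digit.
roman_is_number = in_categories("\u2167", NUMBER_CATEGORIES)
roman_is_digit = in_categories("\u2167", DIGIT_CATEGORIES)
```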

Exceptions

Exclude characters:

Grammar
  Chars
    <Digit>     0 1 2 3 4 5 6 7 8 9
    <EvenDigit> <Digit> Except 1 3 5 7 9
  End
End
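
Read as set operations, Except corresponds to a set difference. A minimal Python sketch of the example above, under that assumption:

```python
# <Digit>     0 1 2 3 4 5 6 7 8 9
digit = set("0123456789")

# <EvenDigit> <Digit> Except 1 3 5 7 9
even_digit = digit - set("13579")
```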

Any

Any character (Unicode scalar value):

Grammar
  Chars
    <Any>           Range 0000 D7FF Range E000 10FFFF
    <NotLineFeed>   <Any> Except 000A
  End
End

Note that values in the surrogate range U+D800 to U+DFFF are not allowed, because surrogates are not Unicode scalar values.

Shortcut:

Grammar
  Chars
    <NotLineFeed>   Any Except 000A
  End
End
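
The two ranges behind Any can be checked numerically. The following Python sketch uses hypothetical function names and only mirrors the ranges given above. It is not how a GrammarSML implementation necessarily works.

```python
def is_unicode_scalar(code_point: int) -> bool:
    """True for U+0000..U+D7FF and U+E000..U+10FFFF (surrogates excluded)."""
    return 0x0000 <= code_point <= 0xD7FF or 0xE000 <= code_point <= 0x10FFFF


def is_not_line_feed(code_point: int) -> bool:
    """Mirror of <NotLineFeed>: any scalar value except U+000A."""
    return is_unicode_scalar(code_point) and code_point != 0x000A
```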

Tokens

Use Charsets

Grammar
  Chars
    <DigitChar>   Range 0 9
  End
  Tokens
    <Digit>   <DigitChar>
  End
End

Concatenation

Grammar
  Chars
    <Digit>   Range 0 9
  End
  Tokens
    <ThreeDigitNumber>  <Digit> <Digit> <Digit>
  End
End

Repeat*

Repeat 0..N:

Grammar
  Chars
    <Digit>       Range 0 9
    <Digit1To9>   Range 1 9
  End
  Tokens
    <Number>  <Digit1To9> Repeat* <Digit>
  End
End

Repeat+

Repeat 1..N:

Grammar
  Chars
    <Letter>  Range A Z Range a z
  End
  Tokens
    <Name>  Repeat+ <Letter>
  End
End

Optional

Occurrence 0..1:

Grammar
  Chars
    <Letter>  Range A Z Range a z
  End
  Tokens
    <OneOrTwoLetterName>  <Letter> Optional <Letter>
  End
End

Grouping

Grammar
  Chars
    <Letter>  Range A Z Range a z
  End
  Tokens
    <OneOrThreeLetterName>  <Letter> Optional ( <Letter> <Letter> )
  End
End

Alternative

Use brackets to specify alternatives:

Grammar
  Chars
    <LcLetter>  Range a z
    <UcLetter>  Range A Z
  End
  Tokens
    <TwoLetterName>   [ ( <UcLetter> <UcLetter> ) ( <LcLetter> <LcLetter> ) ]
  End
End
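
For readers who know regular expressions, the token operators above can be compared to regex constructs. The Python sketch below assumes that Repeat*, Repeat+, Optional, parentheses and brackets behave like *, +, ?, grouping and alternation. The pattern names and the helper matches are hypothetical, and the character classes stand in for the Chars sets.

```python
import re

# <Number>  <Digit1To9> Repeat* <Digit>
NUMBER = re.compile(r"[1-9][0-9]*")
# <Name>  Repeat+ <Letter>
NAME = re.compile(r"[A-Za-z]+")
# <OneOrTwoLetterName>  <Letter> Optional <Letter>
ONE_OR_TWO = re.compile(r"[A-Za-z][A-Za-z]?")
# <OneOrThreeLetterName>  <Letter> Optional ( <Letter> <Letter> )
ONE_OR_THREE = re.compile(r"[A-Za-z](?:[A-Za-z][A-Za-z])?")
# <TwoLetterName>  [ ( <UcLetter> <UcLetter> ) ( <LcLetter> <LcLetter> ) ]
TWO_LETTER = re.compile(r"(?:[A-Z][A-Z]|[a-z][a-z])")


def matches(pattern: re.Pattern, text: str) -> bool:
    """True if the whole text is matched by the pattern."""
    return pattern.fullmatch(text) is not None
```

With these patterns, matches(NUMBER, "105") is true while matches(NUMBER, "015") is false, because the first character must come from the 1 to 9 range.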

Strings

Case-sensitive string:

Grammar
  Tokens
    <BeginKeyword>  String Begin 
    <EndKeyword>    String End
  End
End

Case-insensitive string:

Grammar
  Tokens
    <BeginKeyword>  CiString Begin 
    <EndKeyword>    CiString End
  End
End
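
The difference between the two string kinds can be sketched in Python as follows. The function names are hypothetical, and the use of str.casefold for the case-insensitive comparison is an assumption. The exact comparison rules are not specified here.

```python
def match_string(text: str, keyword: str) -> bool:
    """Case-sensitive comparison, as for String."""
    return text == keyword


def match_ci_string(text: str, keyword: str) -> bool:
    """Case-insensitive comparison, as for CiString (assumed: case folding)."""
    return text.casefold() == keyword.casefold()
```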

Syntax

Syntax rules are built from tokens or other syntax rules. The root rule is defined with RootRule.

Grammar
  Chars
    <StringChar>      Any Except <DoubleQuote>
    <DoubleQuote>     0022
    <SpaceChar>       0020
  End
  Tokens
    <StringStart>     <DoubleQuote>
    <StringEnd>       <DoubleQuote>
    <StringContent>   Repeat+ <StringChar>
    <Space>           Repeat+ <SpaceChar>
  End
  Syntax
    <String>          <StringStart> Optional <StringContent> <StringEnd>
    <Document>        <String> Repeat* ( <Space> <String> )
  End
  RootRule <Document>
End
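
To see which inputs this grammar accepts, the following Python sketch approximates it with a regular expression. It assumes the regex analogy for Repeat and Optional, treats <StringChar> as any character except the double quote, and uses hypothetical names throughout. A real GrammarSML parser would build a syntax tree rather than only accept or reject the input.

```python
import re

# <String>: <StringStart> Optional <StringContent> <StringEnd>
STRING = r'"[^"]*"'
# <Space>: Repeat+ <SpaceChar>
SPACE = r" +"
# <Document>: <String> Repeat* ( <Space> <String> )
DOCUMENT = re.compile(rf"{STRING}(?:{SPACE}{STRING})*")


def is_document(text: str) -> bool:
    """True if the whole text matches the <Document> root rule."""
    return DOCUMENT.fullmatch(text) is not None
```

For example, is_document('"a"  "b c"') is true, while is_document('"a""b"') is false because the two strings are not separated by at least one space.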