The Mk2 Markup Language

This is a draft standard; this notice will disappear once the specification is final.

Mk2 is a human-readable, plain-text language for expressing Geneva documents.¹ It is designed with both ergonomics and technical pragmatism in mind.

Syntax

This formal definition uses the modified BNF syntax of ANSI CL's Notational Conventions.¹ The following axioms are used throughout the definition:

String—A character sequence. The exact grammar depends on the surrounding context. See Escape Rules.

LF—A character sequence denoting a line break. The exact representation is platform dependent.

EOF—The end of input.

SP—A whitespace character. The exact set of characters considered whitespace is platform dependent.

Document and Section

Symbol     Expression
document   [ element separator ]* EOF
section    "<" title separator [ element separator ]* ">" separator
title      rich-text
element    section | table | plaintext | media | listing | paragraph
separator  double-lf | EOF
double-lf  LF [ LF ]+
Table 1. Document and section syntax.
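Every alternative of the element rule except paragraph begins with its own leading token, so a parser can tell which kind of element it is reading from the first characters alone, with paragraph as the fallback. The following Python sketch illustrates this dispatch. The function name classify_element is hypothetical, and the sketch assumes the element text has already been cut out at a separator.

```python
def classify_element(element: str) -> str:
    """Return the grammar symbol of an element, judged by its leading token.

    Hypothetical helper: assumes `element` is the text of a single element,
    already cut at a separator. An escaped leading token (e.g. "\\#code")
    matches none of the prefixes below and falls through to "paragraph".
    """
    # Checked in the order of the element rule in Table 1.
    prefixes = [
        ("<", "section"),
        ("#table", "table"),
        ("#code", "plaintext"),
        ("#media", "media"),
        ("+", "listing"),
    ]
    for prefix, symbol in prefixes:
        if element.startswith(prefix):
            return symbol
    return "paragraph"
```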

Paragraph and Listing

Symbol     Expression
paragraph  text-token+
listing    item+
item       "+" rich-text
Table 2. Paragraph and Listing syntax.

Table, Media and Plaintext

Symbol       Expression
table        "#table" description "#" LF table-body
description  rich-text
table-body   row* last-row
row          column+ LF
last-row     column+
column       "|" rich-text
Table 3. Table syntax.
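Under the column rule, every row (including the last one) is a sequence of “|”-prefixed cells. A minimal Python sketch of splitting one row into its cells might look as follows, assuming the row has already been separated at its LF. The name split_columns is hypothetical. Treating an unescaped “|” as the start of the next column follows from the escape rules, but trimming whitespace around each cell is an assumption the grammar does not state (the example in Figure 3 pads cells for alignment). Escapes are left in place for the rich-text stage.

```python
def split_columns(row: str) -> list[str]:
    """Split one table row into its column texts (hypothetical helper)."""
    if not row.startswith("|"):
        raise ValueError("a table row must start with '|'")
    columns: list[str] = []
    current: list[str] = []
    i = 1  # skip the leading '|' of the first column
    while i < len(row):
        ch = row[i]
        if ch == "\\" and i + 1 < len(row):
            # Keep the escape pair intact; rich-text parsing resolves it later.
            current.append(row[i : i + 2])
            i += 2
            continue
        if ch == "|":
            # An unescaped '|' begins the next column.
            columns.append("".join(current).strip())  # trimming is an assumption
            current = []
        else:
            current.append(ch)
        i += 1
    columns.append("".join(current).strip())
    return columns
```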

Symbol       Expression
media        "#media" description "#" LF String
description  rich-text
Table 4. Media syntax.

Symbol       Expression
plaintext    "#code" description "#" LF line+ end
description  rich-text
line         String LF
end          SP* "#"
Table 5. Plaintext syntax.
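In a plaintext element, the description ends at the first unescaped “#” of the header line, and the end rule allows the closing “#” to be indented. The Python sketch below shows one way to extract the description and body lines. The name parse_code_block is hypothetical; it assumes LF is "\n" and treats only spaces and tabs as SP, since the specification leaves both platform dependent. Stripping the space after “#code” from the description is also an assumption.

```python
import re

# "#code" description "#": the description is any run of escaped
# characters or characters other than '#' and '\'.
HEADER = re.compile(r"#code((?:\\.|[^#\\])*)#")
# end rule: SP* "#", assuming SP means space or tab.
END = re.compile(r"[ \t]*#")


def parse_code_block(text: str) -> tuple[str, list[str]]:
    """Split a plaintext element into its description and body lines."""
    lines = text.split("\n")  # assumption: LF is "\n"
    header = HEADER.fullmatch(lines[0])
    if header is None:
        raise ValueError("malformed #code header")
    body: list[str] = []
    for line in lines[1:]:
        if END.fullmatch(line):
            if not body:
                raise ValueError("a plaintext element needs at least one line")
            # Strip surrounding spaces from the description (assumption).
            return header.group(1).strip(), body
        body.append(line)
    raise ValueError("missing closing '#' line")
```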

Rich Text

Symbol       Expression
rich-text    text-token*
text-token   bold | italic | fixed-width | url | plain
bold         "*" String "*"
italic       "_" String "_"
fixed-width  "{" String "}"
url          "[" String "]" [ "(" String ")" ]
plain        String
Table 6. Rich text syntax.
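Read together with the escape rules, Table 6 suggests a single-pass tokenizer: an opening “*”, “_”, “{” or “[” starts the corresponding token, and anything else is plain text. The Python sketch below is one possible reading, not a reference implementation. The Token type and the functions read_string, read_closed and tokenize_rich_text are hypothetical. The sketch assumes that a plain token ends where another text token could begin (at one of the four openers) and that the input contains no double-lf. Escapes are resolved, so the backslash is dropped and the escaped character is kept literally.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Token:
    kind: str                     # "bold", "italic", "fixed-width", "url" or "plain"
    text: str                     # token content with escapes resolved
    target: Optional[str] = None  # only set for a labeled url


# Opening character -> (token kind, closing character).
DELIMITED = {"*": ("bold", "*"), "_": ("italic", "_"), "{": ("fixed-width", "}")}
# Assumption: a plain token ends where another text token could begin.
PLAIN_STOP = "*_{["


def read_string(text: str, i: int, stop: str) -> tuple[str, int]:
    """Read a String from index i up to the first unescaped character in stop.

    Returns the String with escapes resolved and the index where reading
    stopped (len(text) if no stop character was found).
    """
    out = []
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text):
            out.append(text[i + 1])  # keep the escaped character literally
            i += 2
        elif ch in stop:
            break
        else:
            out.append(ch)
            i += 1
    return "".join(out), i


def read_closed(text: str, i: int, closer: str, kind: str) -> tuple[str, int]:
    """Read a String that must end with closer; return it and the index after closer."""
    value, j = read_string(text, i, closer)
    if j >= len(text):
        raise ValueError(f"unterminated {kind} token")
    return value, j + 1


def tokenize_rich_text(text: str) -> list[Token]:
    tokens: list[Token] = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch in DELIMITED:
            kind, closer = DELIMITED[ch]
            value, i = read_closed(text, i + 1, closer, kind)
            tokens.append(Token(kind, value))
        elif ch == "[":
            label, i = read_closed(text, i + 1, "]", "url")
            target = None
            if i < len(text) and text[i] == "(":  # optional "(" String ")"
                target, i = read_closed(text, i + 1, ")", "url")
            tokens.append(Token("url", label, target))
        else:
            value, i = read_string(text, i, PLAIN_STOP)
            tokens.append(Token("plain", value))
    return tokens
```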

Escape Rules

The “\” (backslash) can be used to escape the next character. The grammatical significance of a character following “\” is ignored.

The exact grammar of the String axiom is context dependent: a String may not contain unescaped terminating sequences. The terminating sequences of a String are double-lf and every token that follows the String axiom in a rule. To escape a terminating sequence, escape its first character.

For illustration, consider the grammar in Table 7, which uses the String axiom. In rule, the String axiom is followed by terminator, so “foo” is a terminating sequence of String in rule. Table 8 shows valid and invalid character sequences for String in rule.

Symbol      Expression
rule        String terminator
terminator  "foo"
Table 7. Exemplary grammar rules to illustrate escape rules for the String axiom.

Valid               Invalid
quick brown \foo    quick brown foo
Table 8. Valid and invalid character sequences for String in rule.
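The escape rule can be checked mechanically: scan the candidate String, skip over every escaped character, and report whether an unescaped terminating sequence occurs. The Python sketch below reproduces Table 8 for the grammar of Table 7. The names read_string and is_valid_string are hypothetical, and LF is assumed to be "\n", so double-lf is detected as "\n\n".

```python
DOUBLE_LF = "\n\n"  # assumption: LF is "\n" on this platform


def read_string(text: str, terminators: list[str]) -> tuple[str, int]:
    """Read a String up to the first unescaped terminating sequence.

    Returns the String with escapes resolved and the index at which the
    terminating sequence starts (len(text) if none occurs).
    """
    stops = list(terminators) + [DOUBLE_LF]  # double-lf always terminates
    out = []
    i = 0
    while i < len(text):
        if text[i] == "\\" and i + 1 < len(text):
            # An escaped character never starts a terminating sequence,
            # so escaping the first character of "foo" defuses it.
            out.append(text[i + 1])
            i += 2
        elif any(text.startswith(stop, i) for stop in stops):
            break
        else:
            out.append(text[i])
            i += 1
    return "".join(out), i


def is_valid_string(candidate: str, terminators: list[str]) -> bool:
    """True if candidate contains no unescaped terminating sequence."""
    _, end = read_string(candidate, terminators)
    return end == len(candidate)
```

For rule, reading the input “quick brown \foofoo” yields the String “quick brown foo” and stops at the second “foo”, which is then matched by terminator.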

Examples

Document and Section

The Mk2 file in Figure 1 contains a paragraph (A quick brown fox...) and a section titled “On Pangrams”, which contains another paragraph (A pangram is...).

A quick brown fox jumps over the lazy dog.

< On Pangrams

 A pangram is a phrase that contains all of the letters of the
 alphabet.

>
Figure 1

Listing and Text Tokens

The listing in Figure 2 contains six items, each consisting of a single text token.

+ Plain text token
+ *Bold text token*
+ _Italic text token_
+ {Fixed-width text token}
+ [http://example.org/url/text-token]
+ [Labeled URL](http://example.org)
Figure 2

Table, Media and Plaintext

The Mk2 file in Figure 3 contains a table, a media object and a plaintext object, each with a description and its respective body.

#table Source: Wikipedia.#
| State                  | Area          | Total Population
| Bavaria                | 70,549.44 km² | 12,604,244
| North Rhine-Westphalia | 34,084.13 km² | 17,571,856

#media Imaginary embedded video.#
http://example.org/video.ogv

#code {SQUARE} function in Common Lisp.#
(defun square (n)
  (expt n 2))
#
Figure 3

Escaping

Mk2 is designed to minimize the need to escape control tokens. Still, in some cases the user has to use the \ (backslash) character to suppress the meaning of a specific token. The most common cases are shown below.

Mk2:     In ECMAScript anonymous functions can be expressed using the {function (...) { ... \}} special form.
Result:  In ECMAScript anonymous functions can be expressed using the function (...) { ... } special form.
Figure 4. Escaping unintended text token markup.

The Mk2 file in Figure 4 escapes the first } (curly bracket) character inside a fixed-width text token to avoid terminating the fixed-width token prematurely. Note that only the closing bracket needs to be escaped, because it is the only terminating token of the String in a fixed-width token.

Mk2:     On DOS, {\\} (backslash) is used to separate the components of a pathname.
Result:  On DOS, \ (backslash) is used to separate the components of a pathname.
Figure 5. Including the literal backslash character.

Sometimes the user needs to include a literal backslash character in the prose. As Figure 5 shows, the \ (backslash) character can be escaped with itself, just like any other character.