Namespace Markup Language
This spec is not finished yet! For more info about nml, you can check out my parser.
The Namespace Markup Language (nml) is an attempt to create a more powerful and user-friendly version of the Extensible Markup Language, aka xml. Its name was coined by Phil Andy, originally as a name for a community-maintained version of html (see this thread for more info).
Are you a big fan of xml? Me too. "But wait!" you say, "then why are you trying to get rid of it?" I am not. nml will probably never replace xml, and that is not its goal. nml can easily be converted to xml at any time.
The problem with xml is that it's easy for parsers to read but hard for programmers to write. Why does nobody use xhtml? Because they would lose comfortable features like boolean attributes, unquoted values and UPPERCASE TAG NAMES (yes, some people still use them). The goal of nml is to be as comfortable as possible without losing xml's readability.
Huh, that's a good point. Actually, I don't know. Personally, I'd like to use something based on css syntax, but there are many, many possibilities and it's better to let the community decide.
This spec is intended to be suitable for everybody, whether a beginner or an expert writing their own nml-based browser (or something else). You will find four types of content here: normative sections for parser makers, advanced sections for users who know xml and html, and beginner sections for (wait for it...) beginners! The fourth type is the example section, which helps with understanding. Note that advanced sections may be omitted where nml does not differ from xml and html.
Beginner section
Basically, any file written in nml is a document. A document is everything in your file. The building blocks of a document are tags and text.
Example
<library>
<book id=1 >The Little Prince</book>
<book id=2 >The Universe Versus Alex Woods</book>
<book id=42>The Hitchhikers Guide to The Galaxy</book>
</library>
Normative section
Every nml document is represented by a DOM Document object.
Every document consists of a root element, optionally preceded by a doctype declaration. The root element is either empty or contains child elements and/or text.
Every element is either empty, represented by a single tag, or contains one or more nodes enclosed in a begin tag and an end tag. Both tags must have the same tag name and namespace and must be children of the same element (or, in the case of the root element, must not be children of any element).
A node is either an element or a text string containing any characters except U+003C (LESS-THAN SIGN) and U+0000 (NULL).
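One way to picture this structure is as a small tree type. The TypeScript sketch below is a hypothetical in-memory model, not an API defined by this spec (a conforming parser exposes a DOM Document object). The names NmlNode, NmlElement, NmlDocument, isValidText and book are illustrative assumptions.

```typescript
// Hypothetical in-memory model of an nml document (illustrative names only;
// a conforming parser exposes a DOM Document object instead).
type NmlNode = NmlElement | string; // a node is an element or a text string

interface NmlElement {
  namespace: string | null; // null: no prefix written, the default namespace applies
  name: string;
  attributes: Map<string, string>;
  children: NmlNode[]; // empty for an element written as a single tag
}

interface NmlDocument {
  doctype: string | null; // raw doctype value, or null if there is none
  root: NmlElement;
}

// Text may contain any character except U+003C (<) and U+0000 (NULL).
function isValidText(text: string): boolean {
  return !/[<\u0000]/.test(text);
}

// Helper that builds one book element of the library example.
const book = (id: string, title: string): NmlElement => ({
  namespace: null,
  name: "book",
  attributes: new Map<string, string>([["id", id]]),
  children: [title],
});

// A trimmed version of the library example, built by hand.
const library: NmlDocument = {
  doctype: null,
  root: {
    namespace: null,
    name: "library",
    attributes: new Map<string, string>(),
    children: [
      book("1", "The Little Prince"),
      book("2", "The Universe Versus Alex Woods"),
    ],
  },
};
```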
Beginner section
Doctypes are used to determine the default namespace of a document. Most of the time, you will probably use <!doctype html>, which selects the namespace of the html api.
Example
<!doctype html>
<!doctype "./mydoc.dtd">
Normative section
Form (regex): /<!doctype\s+([A-\u02AF\-]+|("[^"]*")|('[^']*'))\s*>/i
Group 1 of the regular expression is either the name of a globally recognized API or a URL enclosed in double or single quotes. If a URL is provided, it has to be a valid link to a markup declaration file (e.g. a DTD or an XML Schema). If an API name is provided, the parser's own markup declaration rules will be loaded. These define the default namespace that will be used if none is specified.
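The following TypeScript sketch shows one way to apply the form above. TypeScript is used because the patterns in this spec are written in ecma-262 syntax. The Doctype result type and the parseDoctype name are hypothetical. The sketch strips the quotes that group 1 still contains for URLs, and it does not check where in the document the doctype appears.

```typescript
// Doctype form, copied from the normative section (case-insensitive).
const DOCTYPE = /<!doctype\s+([A-\u02AF\-]+|("[^"]*")|('[^']*'))\s*>/i;

// Hypothetical result shape: either an API name or a quoted URL.
type Doctype =
  | { kind: "api"; name: string }
  | { kind: "url"; url: string };

// Returns the parsed doctype, or null if the input contains none.
// The pattern is not anchored, so position checks are left to the caller.
function parseDoctype(source: string): Doctype | null {
  const match = DOCTYPE.exec(source);
  if (match === null) return null;
  const value = match[1] ?? "";
  if (value.startsWith('"') || value.startsWith("'")) {
    // Quoted form: group 1 still contains the quotes, so strip them.
    return { kind: "url", url: value.slice(1, -1) };
  }
  return { kind: "api", name: value };
}
```

With this sketch, parseDoctype("<!doctype html>") gives { kind: "api", name: "html" } and parseDoctype('<!doctype "./mydoc.dtd">') gives { kind: "url", url: "./mydoc.dtd" }.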
If a URL was provided and the markup declaration file does not provide any namespace name:
- find the last U+002F (SOLIDUS) in the URL and take the substring after it,
- apply /^[A-\u02AF\-]+/ (regex) to the substring,
- if the match is at least one character long, use it as the default namespace name,
- else use the empty string.
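The fallback steps above could look like this in TypeScript. fallbackNamespace is a hypothetical name. The handling of a URL with no slash at all (the whole URL is used) is an assumption, since the steps do not cover that case.

```typescript
// Identifier pattern anchored at the start of the string, as in the steps above.
const LEADING_IDENTIFIER = /^[A-\u02AF\-]+/;

// Derives a default namespace name from a declaration-file URL when the
// file itself names no namespace.
function fallbackNamespace(url: string): string {
  // Substring after the last U+002F (SOLIDUS). Assumption: if the URL has no
  // slash, lastIndexOf returns -1 and the whole URL is used.
  const afterSlash = url.slice(url.lastIndexOf("/") + 1);
  const match = LEADING_IDENTIFIER.exec(afterSlash);
  // A match is always at least one character long because of the "+".
  return match !== null ? match[0] : "";
}
```

For the example doctype, fallbackNamespace("./mydoc.dtd") returns "mydoc", because "." (U+002E) lies outside the allowed range. A URL ending in a slash returns the empty string.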
Beginner section
Tags are used to structure the document. Tags are the things between the < and > characters.
There are three types of tags:
- Single tag
- Begin tag
- End tag
A begin tag looks like this: <name>. It marks the beginning of the name element.
An end tag is similar to a begin tag, but it has a slash before the name: </name>. It marks the end of the name element.
A single tag is a combination of a begin tag and an end tag. It looks like this: <name /> and means the same as <name></name>: it creates an empty element.
What does element mean? Elements are virtual parts of the document that have a special meaning defined by their namespace. Elements may have child elements, but they don't have to. As an example of what they can do: in the html api, there's an a element used to mark a piece of text as an anchor (link).
Namespaces are sets of rules that define the behaviour of elements. The most common namespace is html, which defines tags such as title (page title) and a (anchors). There are many other namespaces, each serving a different purpose. To specify which namespace an element uses, you can type <html:a>, which creates an a element in the html namespace. If you don't specify the namespace, the default one will be used.
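For the curious, here is a minimal TypeScript sketch of how a parser might split a name like html:a into its namespace and element name. resolveName, QualifiedName and the defaultNamespace parameter are hypothetical. The lowercasing follows the normative rule that names are case insensitive and converted to lowercase.

```typescript
// Hypothetical result of splitting "ns:name".
interface QualifiedName {
  namespace: string;
  name: string;
}

// Splits a tag name at the first colon; without a prefix, the given
// default namespace is used.
function resolveName(tagName: string, defaultNamespace: string): QualifiedName {
  const colon = tagName.indexOf(":");
  // Names are case insensitive and normalized to lowercase.
  if (colon === -1) {
    return { namespace: defaultNamespace, name: tagName.toLowerCase() };
  }
  return {
    namespace: tagName.slice(0, colon).toLowerCase(),
    name: tagName.slice(colon + 1).toLowerCase(),
  };
}
```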
Single and begin tags may have attributes that hold extra information. They use a simple foo=bar format, written right after the element name (and a space). In this example, foo is called the attribute name or key and bar is the attribute value. If you want to use whitespace characters in a value, you have to quote it like this: foo="some bars" (single quotes work too).
There are also single attributes (also called boolean attributes), created by omitting the equals sign and the value. For example, the attribute bool means bool="" (the value is empty).
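Here is a rough TypeScript sketch of reading attributes. It assumes the text after the tag name has already been cut out, and parseAttributes is a hypothetical helper. The pattern is the normative Attribute pattern with Identifier written out and the g flag added. Quoted values lose their quotes, and boolean attributes get the empty string.

```typescript
// Hypothetical helper: parses the text between the tag name and the closing
// ">" (or "/>") into a map of lowercase keys to values.
function parseAttributes(source: string): Map<string, string> {
  // Fresh regex per call, so the g flag's lastIndex state is never shared.
  const pattern = /([A-\u02AF\-]+)(?:\s*=\s*([^"'\s]+|"[^"]*"|'[^']*'))?/g;
  const attributes = new Map<string, string>();
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(source)) !== null) {
    const key = (match[1] ?? "").toLowerCase();
    let value = match[2] ?? ""; // boolean attribute: value is the empty string
    if (value.startsWith('"') || value.startsWith("'")) {
      value = value.slice(1, -1); // strip the surrounding quotes
    }
    attributes.set(key, value); // assumption: a repeated key overwrites the earlier one
  }
  return attributes;
}
```

For example, parseAttributes('foo="some bars" bool') maps foo to some bars and bool to the empty string.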
Example
<abc>
- Begin tag of an abc element.
</foo:bar>
- End tag of a bar element from the foo namespace.
<name first=Joe middle=Patata last=Rodriguez />
- Single tag of a name element with three attributes.
<a>Hi there!</a>
- Text between a begin tag and an end tag, which together make one a element.
<x><foo /></x>
- The foo element is a child of the x element.
<hello world />
- Element hello with a single attribute world.
<warning catch=attribute/>
- Begin tag (!), because the slash is part of the attribute value.
<warning catch=attribute />
- Single tag; the attribute value and the slash are separated by a space.
Advanced section
Single tags are interpreted in the same way as in xml.
Attributes are interpreted the same way as in html5 (unquoted and boolean attributes are allowed).
To specify which namespace an element uses, type <ns:tagname>. If you don't, the default namespace will be used.
Note that nml is case insensitive!
All characters in the range U+0041 (LATIN CAPITAL LETTER A) to U+02AF (LATIN SMALL LETTER TURNED H WITH FISHHOOK AND TAIL), plus the U+002D (HYPHEN-MINUS) character, are allowed in tag, namespace and attribute names.
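The allowed characters can be checked one code unit at a time, as in the hypothetical isIdentifier helper below. This is a sketch, not part of the spec. Note that, as written, the range does not include the ASCII digits U+0030 to U+0039, so this check rejects a name such as h1.

```typescript
// Hypothetical check for tag, namespace and attribute names: every character
// must be in U+0041..U+02AF or be U+002D (HYPHEN-MINUS).
function isIdentifier(name: string): boolean {
  if (name.length === 0) return false;
  for (let i = 0; i < name.length; i++) {
    // charCodeAt is enough: all allowed characters are in the BMP, and
    // surrogate halves (U+D800 and above) fall outside the range anyway.
    const code = name.charCodeAt(i);
    if (code === 0x2d) continue;
    if (code < 0x41 || code > 0x2af) return false;
  }
  return true;
}

// Names are case insensitive, so compare their lowercase forms.
function sameName(a: string, b: string): boolean {
  return a.toLowerCase() === b.toLowerCase();
}
```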
Normative section
Identifier (regex): /[A-\u02AF\-]+/
In the following patterns, Identifier and Attribute stand for the patterns with those names.
Attribute (regex): /Identifier(\s*=\s*([^"'\s]+|("[^"]*")|('[^']*')))?/
Begin tag (regex): /<Identifier(:Identifier)?(\s+Attribute)*\s*>/
Single tag (regex): /<Identifier(:Identifier)?(\s+Attribute)*\s*\/>/
End tag (regex): /<\/Identifier(:Identifier)?\s*>/
Namespace, tag and attribute names are case insensitive and are converted to lowercase.
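Here is a sketch of how the three tag patterns might be used to classify a tag. It is written in TypeScript, with Identifier and Attribute expanded and each pattern anchored so that the whole string has to match. classifyTag and TagKind are hypothetical. The end-tag pattern here allows an optional namespace prefix and closes with >, to match the </name> and </foo:bar> examples. The begin-tag pattern is tried first because <warning catch=attribute/> also satisfies the single-tag pattern by backtracking, while the beginner examples treat it as a begin tag.

```typescript
// Tag patterns with Identifier and Attribute written out, anchored (^...$)
// so the whole string must be one tag.
const ID = "[A-\\u02AF\\-]+";
const ATTR = `${ID}(?:\\s*=\\s*(?:[^"'\\s]+|"[^"]*"|'[^']*'))?`;
const BEGIN_TAG = new RegExp(`^<${ID}(?::${ID})?(?:\\s+${ATTR})*\\s*>$`);
const SINGLE_TAG = new RegExp(`^<${ID}(?::${ID})?(?:\\s+${ATTR})*\\s*\\/>$`);
const END_TAG = new RegExp(`^<\\/${ID}(?::${ID})?\\s*>$`);

type TagKind = "begin" | "single" | "end" | "invalid";

// Hypothetical classifier. The begin-tag pattern is tried first: an unquoted
// value such as catch=attribute/ can also satisfy the single-tag pattern by
// backtracking, and the examples treat that case as a begin tag.
function classifyTag(tag: string): TagKind {
  if (BEGIN_TAG.test(tag)) return "begin";
  if (SINGLE_TAG.test(tag)) return "single";
  if (END_TAG.test(tag)) return "end";
  return "invalid";
}
```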
Unless stated otherwise, nml conforms to the Whatwg DOM Standard. The following Web IDL snippets may extend or implement classes and types that are not defined in this specification; such classes are defined in the Whatwg standard.
//TODO
- Ecma International – ECMA-262, Regular Expressions.
- The Unicode Consortium – The Unicode Standard.
- W3C – Web IDL.
- Whatwg – DOM Standard.
m93a (Michal Grňo) - wrote this spec
Oscar Godson - invented the way to blend html & xml syntax
Phil Andy - created the name
W3C - Thanks for standardizing the Web and sorry for breaking your standards :)
Copyright (c) 2015 Michal Grňo
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.