ietf-wg-cellar / ebml-specification
The specification for the EBML format
License: Creative Commons Attribution 4.0 International
From RFC7322:
4.8.5. Security Considerations Section
All RFCs must contain a section that discusses the security
considerations relevant to the specification; see "Guidelines for
Writing RFC Text on Security Considerations" [BCP72] for more
information. - http://www.rfc-editor.org/info/bcp72
The current tables make it hard to use cross-references and forward references within the document. I propose reforming these three sections to use nested headers instead of tables.
Once added, include forward references for terms and concepts that are used in the document before they are explained, such as EBMLMaxIDLength, EBMLMaxSizeLength, EBML Data Type, unknownsizeallowed, and level.
See suggestions for forward references in https://mailarchive.ietf.org/arch/msg/cellar/Mfj44tK1gyfqjU3Uu1tI5fMovtk.
As discussed here https://github.com/Matroska-Org/ebml-specification/pull/3/files#r29920563.
The `path` should be extended with `header | body` as the root before the first `/`.
"Empty Element" in specification.markdown uses "VINT_DATA" but only "VINT" is defined.
This section from specification.markdown does not appear to define its terms in a structure that is related to the format or in a logical fashion that follows from the format. Instead it appears to be ordered alphabetically, but not fully.
Is that intentional?
On the CELLAR list Tim asked:
What is an implementation supposed to do if it encounters a file that violates this restriction? What is the purpose of reserving the various all-ones Element IDs? Or the zero Element ID, for that matter?
which regards the current line in the spec:
"The VINT_DATA component of the Element ID MUST NOT be set to either all zero values or all one values."
I don't know the answer to this. @robUx4, @mbunkus, do you know the answer to the questions above?
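To make the restriction under discussion concrete, here is a minimal Python sketch of how a reader might detect an Element ID whose VINT_DATA is all zeros or all ones. It only detects the condition; what a reader should then do is exactly the open question above. The function name `element_id_data_is_valid` is hypothetical, and the sketch assumes the usual VINT layout (leading zero bits as VINT_WIDTH, a single `1` bit as VINT_MARKER, then 7 bits of VINT_DATA per octet).

```python
def element_id_data_is_valid(element_id: bytes) -> bool:
    """Return False when VINT_DATA is all zero bits or all one bits.

    Hypothetical helper: assumes element_id holds exactly one complete VINT.
    """
    if not element_id or element_id[0] == 0:
        raise ValueError("first octet carries no VINT_MARKER")
    # The number of leading zero bits (VINT_WIDTH) plus one gives the length in octets.
    length = 9 - element_id[0].bit_length()
    if len(element_id) != length:
        raise ValueError("octet count does not match VINT_WIDTH")
    data_bits = 7 * length                 # bits left after VINT_WIDTH and VINT_MARKER
    all_ones = (1 << data_bits) - 1
    data = int.from_bytes(element_id, "big") & all_ones
    return data != 0 and data != all_ones


print(element_id_data_is_valid(bytes.fromhex("1A45DFA3")))  # a four-octet Element ID
print(element_id_data_is_valid(bytes.fromhex("FF")))        # one octet, VINT_DATA all ones
```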
Void is not listed as 'multiple'. Also CRC is not multiple either but certainly multiple CRC are allowed in one EBML Document.
Is it right to consider that 'multiple' indicates whether the element may occur more than once within its parent master-element (as opposed to multiple times within the whole EBML Document)?
In section Element Name: Signature something is missing:
Description: The signature of the data (until a new.
At least a closing ")" is missing, but perhaps more, as I cannot understand the sentence.
As it stands the EBML specification lacks an Introduction section, which is an IETF requirement.
RFC2223 describes the Introduction Section as follows:
" Each RFC should have an Introduction section that (among other things) explains the motivation for the RFC and (if appropriate) describes the applicability of the protocol described."
"Normally, this will be the "abstract" section from the Internet Draft. If the RFC is not based on an I-D, other possibilities are:
Protocol
This protocol is intended to provide the bla-bla service,
and be used between clients and servers on host computers.
Typically the clients are on workstation hosts and the
servers on mainframe hosts.
or
This protocol is intended to provide the bla-bla service,
and be used between special purpose units such as terminal
servers or routers and a monitoring host.
Discussion
The purpose of this RFC is to focus discussion on particular
problems in the Internet and possible methods of solution.
No proposed solutions in this document are intended as
standards for the Internet. Rather, it is hoped that a
general consensus will emerge as to the appropriate solution
to such problems, leading eventually to the adoption of
standards.
Interest
This RFC is being distributed to members of the Internet
community in order to solicit their reactions to the
proposals contained in it. While the issues discussed may
not be directly relevant to the research problems of the
Internet, they may be interesting to a number of researchers
and implementers.
Status Report
In response to the need for maintenance of current
information about the status and progress of various
projects in the Internet community, this RFC is issued for
the benefit of community members. The information contained
in this document is accurate as of the date of publication,
but is subject to change. Subsequent RFCs will reflect such
changes.
These paragraphs need not be followed word for word, but the
general intent of the RFC must be made clear."
The page 23 is rendered as… miniature.
Something of this kind:
<enum value="1"> <documentation language="en">Rectangular</documentation> </enum>
A lot of language of the ebml spec could be simplified if Void was not allowed to occur at level 0. I'm scanning a large Matroska collection from archive.org to see if Void at level 0 ever occurs. So far 3864 mkv files tested with only EBML and Segment at Level 0. Will report back when the test is done.
This was defined first as a Matroska issue but it seems to be more for EBML.
Add scope to the abstract.
This issue is based on an initial discussion at #98 (comment).
The `EBML Schema` section claims:
The `EBML Schema` does not itself document the `EBML Header`, but documents all data of the `EBML Document` that follows the `EBML Header`. The `EBML Header` itself is documented by this specification in the `EBML Header Elements` (see EBML Header Elements). The `EBML Schema` also does not document `Global Elements` that are defined by this document (namely the `Void Element` and the `CRC-32 Element`).
However we have cases where EBML Document Types (Matroska) add additional constraints to the EBML Header. For example, EBML allows EBMLMaxSizeLength=9 but Matroska doesn't.
Should we change the above noted paragraph to say:
The `EBML Schema` MUST begin with a definition of the `EBML Header` and EBML's `Global Elements`, which is followed by a definition of the `EBML Body`. The `EBML Schema` MUST document the `Elements` of the `EBML Header` as documented by this specification in the `EBML Header Elements` (see [EBML Header Elements](#ebml-header-elements)), but MAY constrain those `Elements` by adding or constraining the `range` attributes. For example, an `EBML Schema` MAY constrain the `EBMLMaxSizeLength` to a maximum value of `8`.
The example schema would then need to be updated to include the Header and Global Elements. The downside is that this causes some redundancy between the EBML spec and the EBML Schema, but it makes the EBML Schema less dependent on the EBML spec. Also the old specdata.xml documented the EBML Header and Global Elements in addition to the Body.
Up until 3304085, there was no default for docType in the EBML Schema. So the docType Element was required to be stored according to the EBML specification. The referenced patch was done to resolve a perceived discrepancy between the EBML and Matroska specs since the EBML did not define a default for docType, but the Matroska spec did define a docType default (matroska).
At the time it was deemed that the Matroska spec was correct partly because it had been historically better maintained than the EBML spec.
However, I noted that the Matroska version of that on https://matroska.org/technical/specs/index.html is followed by a statement:
The default values defined for the EBML header correspond to the values for a Matroska stream/file.
When parsing the EBML header the default values are different, irrespective of the DocType defined.
Because of this statement, I think that the discrepancy between the two specs was intentional: EBML's definition of docType is relevant to any EBML Document, i.e. docType MUST be stored since it is mandatory with no default, whereas the Matroska spec listed matroska as the default because a Matroska file must semantically understand that 'matroska' is the docType.
I propose reverting 3304085, so that a generic EBML Document MUST store docType whether it is Matroska or not. The Matroska spec should then note that for Matroska the docType MUST be 'matroska'. This would change the current implication that docType does not necessarily need to be stored if the docType is matroska.
In the new definition of maxOccurs, it's unclear what to say if the maximum occurrence is unbounded. Before we had `maxOccurs='unbounded'`, what now?
For instance what to do when finding a bad CRC value or a duplicate Element that isn't allowed to appear multiple times.
Is this a bug? The EBML element is listed as "Multiple". Thus a file could use:
<EBML>
<Segment>
<EBML>
or
<EBML>
<EBML>
<EBML>
<Segment>
I can't see any reason this would be helpful. How do EBML parsers handle cases with multiple EBML elements? For instance, what if one EBML element says matroska version 1 and another EBML element says webm version 2?
I suggest that if EBML elements are listed as multiple, that there be some documentation to advise on how to interpret their potentially contradictory contents.
Another issue with this is that the definition of the EBML element says that the EBML document must start with an EBML element, but since multiple EBML elements are allowed, more EBML elements may be used elsewhere (the middle or tail). I noticed that in XML specs, their equivalent of the EBML element which is the XML Declaration is only allowed to occur at the beginning and may only occur once (see http://stackoverflow.com/questions/20251560/are-multiple-xml-declarations-in-a-document-well-formed-xml and its citations).
I propose removing the Multiple option for the EBML element.
I think we should have an XML Schema as a machine-readable document to verify that an EBML Schema (which is an XML Document) is correctly formed.
Similar to this Matroska issue ietf-wg-cellar/matroska-specification#129 but for any EBML format.
Also related to this issue #140 that ended up saying we should do a separate document for that.
As discussed at VDD, I started this work with a hard 72 column line wrap, but if a 72 column output is needed this could be done as an output rather than maintained through all edits. Any objection if I remove the hard 72 column line wrap?
Occurence -> Occurrence
see #40 (comment)
I suggest that these need a max value. A value of 7,625,597,484,987 for EBMLMaxIDLength is allowed, but it is not sane to expect that an EBML Reader should prepare for it.
Except for the ones used for DRM already in use in WebM.
Suppose an Element has minOccurs=2 and a default value. Should this be allowed?
Either:
I prefer 2.
For example the description of float ranges is very complex:
Within an expression of a float range, as in an integer range, the `-` (hyphen) character is the separator between the minimal and maximum value permitted by the range. Hexadecimal Floating-Point Constants also use a `-` (hyphen) when indicating a negative binary power. Within a float range, when a `-` (hyphen) is immediately preceded by a letter `p`, then the `-` (hyphen) is a part of the Hexadecimal Floating-Point Constant which notes negative binary power. Within a float range, when a `-` (hyphen) is not immediately preceded by a letter `p`, then the `-` (hyphen) represents the separator between the minimal and maximum value permitted by the range.
Perhaps we should consider EBNF for these expressions.
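As an illustration of how much the prose rule is really saying, the hyphen disambiguation quoted above can be sketched in a few lines of Python. This is a minimal sketch, not text from the specification: the name `split_float_range` is hypothetical, accepting an uppercase `P` is an assumption, and bounds with a leading minus sign are rejected because the quoted rule does not address them.

```python
import re

# A separator is a hyphen that is NOT immediately preceded by the letter p.
# Accepting uppercase P as well is an assumption of this sketch.
_SEPARATOR = re.compile(r"(?<![pP])-")


def split_float_range(expr):
    """Split 'min-max' written with Hexadecimal Floating-Point Constants.

    Hypothetical helper: raises ValueError unless exactly one separator is found.
    """
    parts = _SEPARATOR.split(expr)
    if len(parts) != 2:
        raise ValueError(f"expected exactly one range separator in {expr!r}")
    low, high = (float.fromhex(part) for part in parts)
    return low, high


print(split_float_range("0x1p-4-0x1.8p+2"))  # the hyphen after 'p' stays inside the first constant
```

A grammar (EBNF or similar) would state the same thing declaratively: the exponent sign belongs to the constant, and the only free-standing hyphen is the separator.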
And maybe put in a separate paragraph. Even if marking the code in CSS is possible it would be better not to rely on that, since an RFC will not have such features anyway.
Need specific documentation on the definition of an EBML Element, including definitions of Element definition attributes like mandatory, default, and range.
As mentioned on matroska-devel, the EBML spec needs to clarify that maxsizeid and maxsizelength do not describe the elements of the EBML Header but only the non-header parts of the EBML Document. The max sizes of IDs and sizes for the header are now fixed to 4 within the specification (via MediaArea@ab4afa2).
Hi, I need a little help with a question:
I am implementing a demuxer that formats WebM for an MSE stream.
Is the initialization segment the same for every SourceBuffer? Does the segment for track x contain only blocks of that track?
Line 33 contains "XML Declaration" with a capital 'D'.
Is that intended to be the case? (I am not objecting if that is how it was intended to be.)
This term is used in a few places but not (yet) defined.
The first table in https://github.com/Matroska-Org/ebml-specification/blob/master/specification.markdown#element-id appears broken, at least in the markdown rendering on github.com.
The '+' in some level values is not explained. In fact the definition of level says it should be an integer.
More generally I think Nesting should be an attribute like `default`, `unknownsizeallowed`, etc.
There are several places, especially in Matroska tags and chapters, where it's helpful to express the Top Level Element in an XML form. I suggest that the EBML spec (or perhaps the MKV spec) define how this should be done.
I reviewed the xml examples by @mbunkus at https://github.com/mbunkus/mkvtoolnix/tree/master/examples. Overall I think the correlation between EBML Element and XML representation of that element is clear, though in some forms the expressions are not consistent in the primary applications using the XML.
For instance both `ChapterTimeStart` and `EditionFlagDefault` are defined as unsigned integers; however they are expressed directly in existing implementations.
For instance if a chapter.xml file contains:
<ChapterTimeStart>1</ChapterTimeStart>
and
<EditionFlagDefault>1</EditionFlagDefault>
and mkvpropedit is used to insert that data into a file, then the result is an error:
mkvpropedit -c chapters.xml unsigned.integers.in.action.S01E01.HDTV.x264-dericed.mkv
Error: The XML chapter file 'chapters.xml' contains an error: The tag or attribute 'ChapterTimeStart' at position 223 contains invalid or mal-formed data. Expected a time in the following format: HH:MM:SS.nnn (HH = hour, MM = minute, SS = second, nnn = millisecond up to nanosecond. You may use up to nine digits for 'n' which would mean nanosecond precision). You may omit the hour as well. Found '1' instead. Additional error message: Invalid format: At least minutes and seconds have to be given, but no colon was found
If there are exceptions to how to express a value in XML perhaps this is better defined in MKV rather than in EBML (though I think it would be more globally useful in EBML). Are there other such differences in implementations of EBML fragments expressed in XML?
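To show the gap between the two representations, here is a minimal Python sketch that turns an unsigned integer into the `HH:MM:SS.nnn` text form named in the error message above. It assumes the stored integer counts nanoseconds, which is how the Matroska specification describes ChapterTimeStart; the function name `ns_to_timestamp` is hypothetical and is not part of mkvpropedit.

```python
def ns_to_timestamp(ns):
    """Format an unsigned nanosecond count as HH:MM:SS.nnnnnnnnn.

    Hypothetical helper: assumes the EBML unsigned integer is in nanoseconds.
    """
    if ns < 0:
        raise ValueError("an unsigned integer cannot be negative")
    seconds, fraction = divmod(ns, 1_000_000_000)
    minutes, seconds = divmod(seconds, 60)
    hours, minutes = divmod(minutes, 60)
    # Nine fractional digits give the nanosecond precision the error message allows.
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}.{fraction:09d}"


print(ns_to_timestamp(1))  # the value 1 from the chapters.xml example
```

Any XML mapping defined by the spec would need to say, per Element, whether the raw integer or a formatted form like this one is expected.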
Early discussion on this here: http://lists.matroska.org/pipermail/matroska-devel/2015-July/004730.html
Just a note to review all language about 'levels' after #83 is accepted. See in particular #103 (comment).
See #78 (comment).
Assigned to @dericed.
Based on Tim's comment:
A number of the things listed as "attacks" are simply things you said someone writing an EBML Document MUST NOT do (or not doing things you said they MUST do). But in none of those cases do you say what is expected of someone reading a document containing these attacks.
@mbunkus, @robUx4, ideas for such a section? There is a bit of similar language at https://github.com/Matroska-Org/matroska-test-files/blob/master/readme.md but I don't think we have such language for EBML Readers yet.
Tim asked in cellar review:
If maxOccurs is a positive integer, and maxOccurs instances of the element have already been encountered within a Master Element with an unknown size, is another instance of this element considered "...the beginning of the next element that is not a valid sub-element of that Master-element"?
For instance if Segment is unknown sized and knowing the Attachments can only occur once within the Segment, then with this structure:
<Segment>
<Attachments/>
<Attachments/>
Is the Segment expected to end when the second occurrence of Attachments is read, since a second Attachments element "is not a valid sub-element of that Master-element"?
From Tim's review:
Can you describe reasons someone might reasonably violate these SHOULDs? You've specified when they are allowed to, but not why. Is the "is permitted" case the only case in which it is permitted? If so, you should structure the normative language to make that clear. What does it mean for EBML to be "used in transmission or streaming"? If someone sends you an EBML document over HTTP, and you save it to disk, are you required to strip out any non-element data from Master Elements?
Referenced section is in the Master Element definition. I suggest working on the clarification branch since that also affects this area.