
ebml-specification's People

Contributors

ablwr, bastik-1001, dericed, epiil, lu-zero, marionj1, matthewleon, mbunkus, mcr, mkver, nerg4l, nithin-mk, retokromer, robux4, stefh

ebml-specification's Issues

move definitions of attributes, data types, and elements from tables to sections

The current tables make it hard to use cross-references and forward references within the document. I propose reforming these three sections to use nested headers instead of tables.

Once these are added, include forward references for terms and concepts that the document uses before explaining them, such as EBMLMaxIDLength, EBMLMaxSizeLength, EBML Data Type, unknownsizeallowed, and level.

See suggestions for forward references in https://mailarchive.ietf.org/arch/msg/cellar/Mfj44tK1gyfqjU3Uu1tI5fMovtk.

Ordering of Notation and Conventions

This section from specification.markdown does not appear to define its terms in a structure that is related to the format or in a logical fashion that follows from the format. Instead it appears to be ordered alphabetically, but not fully.

Is that intentional?

why are Element IDs with all 0 or all 1 in VINT_DATA reserved?

On the CELLAR list Tim asked:

What is an implementation supposed to do if it encounters a file that violates this restriction? What is the purpose of reserving the various all-ones Element IDs? Or the zero Element ID, for that matter?

which regards the current line in the spec:

"The VINT_DATA component of the Element ID MUST NOT be set to either all zero values or all one values."

I don't know the answer to this. @robUx4, @mbunkus, do you know the answer to the questions above?
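
To make the restriction concrete, here is a minimal Python sketch of a check a reader could run on an Element ID. It assumes the usual VINT layout (leading zero bits give the width, then a marker bit, then 7 bits of VINT_DATA per octet); the function name `vint_data_is_reserved` is hypothetical and it does not answer what a reader should do when the check fails.

```python
def vint_data_is_reserved(element_id: bytes) -> bool:
    """Return True if the VINT_DATA bits of an Element ID are all 0 or all 1.

    Assumes element_id holds the complete ID, marker bit included.
    """
    if not element_id or element_id[0] == 0:
        raise ValueError("Element ID must start with a non-zero octet")
    length = len(element_id)
    # Leading zero bits of the first octet, plus one, give the encoded length.
    leading_zeros = 8 - element_id[0].bit_length()
    if leading_zeros + 1 != length:
        raise ValueError("length prefix does not match the number of octets")
    data_bits = 7 * length
    mask = (1 << data_bits) - 1
    # Dropping the width and marker bits leaves only VINT_DATA.
    data = int.from_bytes(element_id, "big") & mask
    return data == 0 or data == mask
```

For example, the octet 0x80 (VINT_DATA all zeros) and the octet 0xFF (VINT_DATA all ones) are both flagged, while the EBML Header ID 0x1A45DFA3 is not.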

use of 'multiple' in element definitions

Void is not listed as 'multiple'. CRC is not listed as multiple either, but multiple CRC elements are certainly allowed in one EBML Document.

Is it right to consider that 'multiple' indicates whether the element may occur more than once within its parent master-element (as opposed to multiple times within the whole EBML Document)?

Typo?

In section Element Name: Signature something is missing:

Description: The signature of the data (until a new.

At least a ")" is missing, but perhaps more, as I cannot understand the sentence.

Abstract / Introduction section

As it stands the EBML specification lacks an Introduction section, which is an IETF requirement.

RFC2223 describes the Introduction Section as follows:

" Each RFC should have an Introduction section that (among other things) explains the motivation for the RFC and (if appropriate) describes the applicability of the protocol described."

"Normally, this will be the "abstract" section from the Internet Draft. If the RFC is not based on an I-D, other possibilities are:

Protocol

        This protocol is intended to provide the bla-bla service,
        and be used between clients and servers on host computers.
        Typically the clients are on workstation hosts and the
        servers on mainframe hosts.

        or

        This protocol is intended to provide the bla-bla service,
        and be used between special purpose units such as terminal
        servers or routers and a monitoring host.

Discussion

        The purpose of this RFC is to focus discussion on particular
        problems in the Internet and possible methods of solution.
        No proposed solutions in this document are intended as
        standards for the Internet.  Rather, it is hoped that a
        general consensus will emerge as to the appropriate solution
        to such problems, leading eventually to the adoption of
        standards.

     Interest

        This RFC is being distributed to members of the Internet
        community in order to solicit their reactions to the
        proposals contained in it.  While the issues discussed may
        not be directly relevant to the research problems of the
        Internet, they may be interesting to a number of researchers
        and implementers.

     Status Report

        In response to the need for maintenance of current
        information about the status and progress of various
        projects in the Internet community, this RFC is issued for
        the benefit of community members.  The information contained
        in this document is accurate as of the date of publication,
        but is subject to change.  Subsequent RFCs will reflect such
        changes.

  These paragraphs need not be followed word for word, but the
  general intent of the RFC must be made clear."

consider change Void from 0+ to 1+

A lot of language of the ebml spec could be simplified if Void was not allowed to occur at level 0. I'm scanning a large Matroska collection from archive.org to see if Void at level 0 ever occurs. So far 3864 mkv files tested with only EBML and Segment at Level 0. Will report back when the test is done.

how to constrain EBML Header with an EBML Schema

This issue is based on an initial discussion at #98 (comment).

The EBML Schema section claims:

The EBML Schema does not itself document the EBML Header, but documents all data of the EBML Document that follows the EBML Header. The EBML Header itself is documented by this specification in the EBML Header Elements (see EBML Header Elements). The EBML Schema also does not document Global Elements that are defined by this document (namely the Void Element and the CRC-32 Element).

However we have cases where EBML Document Types (Matroska) add additional constraints to the EBML Header. For example, EBML allows EBMLMaxSizeLength=9 but Matroska doesn't.

Should we change the above noted paragraph to say:

The EBML Schema MUST begin with a definition of the EBML Header and EBML's Global Elements, which is followed by a definition of the EBML Body. The EBML Schema MUST document the Elements of the EBML Header as documented by this specification in the EBML Header Elements (see [EBML Header Elements](#ebml-header-elements)), but MAY constrain those Elements by adding or constraining the range attributes. For example, an EBML Schema MAY constrain the EBMLMaxSizeLength to a maximum value of 8.

The example schema would then need to be updated to include the Header and Global Elements. The downside is that this causes some redundancy between the EBML spec and the EBML Schema, but it makes the EBML Schema less dependent on the EBML spec. Also the old specdata.xml documented the EBML Header and Global Elements in addition to the Body.

matroska as default docType?

Up until 3304085, there was no default for docType in the EBML Schema, so the docType Element was required to be stored according to the EBML specification. The referenced patch was made to resolve a perceived discrepancy between the EBML and Matroska specs: the EBML spec did not define a default for docType, but the Matroska spec did (matroska).

At the time it was deemed that the Matroska spec was correct partly because it had been historically better maintained than the EBML spec.

However, I noted that the Matroska version of that on https://matroska.org/technical/specs/index.html is followed by a statement:

The default values defined for the EBML header correspond to the values for a Matroska stream/file.
When parsing the EBML header the default values are different, irrespective of the DocType defined.

Because of this statement, I think that the discrepancy between the two specs was intentional. EBML's definition of docType is relevant to any EBML, i.e. docType MUST be stored since it is mandatory with no default. In the Matroska spec, matroska was listed as the default because a Matroska file must semantically understand that 'matroska' is the docType.

I propose reverting 3304085, so that a generic EBML Document MUST store docType whether it is Matroska or not. The Matroska spec should then note that for Matroska the docType MUST be 'matroska'. This would change the current implication that docType does not necessarily need to be stored if the docType is matroska.

cc @robUx4 @mbunkus

EBML element as Multiple?

Is this a bug? The EBML element is listed as "Multiple". Thus a file could use:

<EBML>
<Segment>
<EBML>

or

<EBML>
<EBML>
<EBML>
<Segment>

I can't see any reason this would be helpful. How do EBML parsers handle cases with multiple EBML elements? For instance, what if one EBML element says matroska version 1 and another EBML element says webm version 2?

I suggest that if EBML elements are listed as multiple, that there be some documentation to advise on how to interpret their potentially contradictory contents.

Another issue with this is that the definition of the EBML element says that the EBML document must start with an EBML element, but since multiple EBML elements are allowed, more EBML elements may be used elsewhere (the middle or tail). I noticed that in XML specs, their equivalent of the EBML element which is the XML Declaration is only allowed to occur at the beginning and may only occur once (see http://stackoverflow.com/questions/20251560/are-multiple-xml-declarations-in-a-document-well-formed-xml and its citations).

I propose removing the Multiple option for the EBML element.

draft XML Schema for EBML Schemas

I think it would be useful to have an XML Schema as a machine-readable document to verify that an EBML Schema (which is an XML Document) is correctly formed.

line breaks

As discussed at VDD, I started this work with a hard 72 column line wrap, but if a 72 column output is needed this could be done as an output rather than maintained through all edits. Any objection if I remove the hard 72 column line wrap?

clarify meaning of default if minOccurs>1

Suppose an Element has minOccurs=2 and a default value. Should this be allowed?
Either:

  1. The EBML Reader should insert enough copies of the Element, with default values, to satisfy the minOccurs as needed, or
  2. We could say that 'default' values are only valid when minOccurs equals 1.

I prefer 2.
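
Option 2 would be simple to enforce when validating an EBML Schema. The following Python sketch is only an illustration: the `ElementDef` class and its field names are hypothetical stand-ins for a parsed schema entry, not part of any existing tool.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ElementDef:
    """Hypothetical in-memory form of one element definition from a schema."""
    name: str
    min_occurs: int = 0
    default: Optional[str] = None


def default_is_allowed(element: ElementDef) -> bool:
    """Apply option 2: a default is only valid when minOccurs equals 1."""
    return element.default is None or element.min_occurs == 1
```

Under this rule an element with minOccurs=2 and a default would be rejected at schema validation time, so an EBML Reader never has to decide how many default copies to insert.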

consider EBNF for range expressions

For example the description of float ranges is very complex:

Within an expression of a float range, as in an integer range, the - (hyphen) character is the separator between the minimal and maximum value permitted by the range. Hexadecimal Floating-Point Constants also use a - (hyphen) when indicating a negative binary power. Within a float range, when a - (hyphen) is immediately preceded by a letter p, then the - (hyphen) is a part of the Hexadecimal Floating-Point Constant which notes negative binary power. Within a float range, when a - (hyphen) is not immediately preceded by a letter p, then the - (hyphen) represents the separator between the minimal and maximum value permitted by the range.

Perhaps we should consider EBNF for these expressions.
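
For comparison, the rule quoted above fits in a few lines of code. This is a minimal Python sketch assuming both bounds are Hexadecimal Floating-Point Constants that `float.fromhex` accepts; the function name `split_float_range` is hypothetical, and negative bounds (a leading hyphen) are deliberately outside the sketch because the quoted text does not cover them.

```python
import re
from typing import Tuple


def split_float_range(expr: str) -> Tuple[float, float]:
    """Split 'min-max' where each bound is a hexadecimal float constant."""
    # A hyphen directly after the letter p is the sign of the binary exponent,
    # so only hyphens NOT preceded by p are candidate separators.
    separators = [m.start() for m in re.finditer(r"(?<!p)-", expr)]
    if len(separators) != 1:
        raise ValueError(f"expected exactly one range separator in {expr!r}")
    cut = separators[0]
    # float.fromhex raises ValueError if either side is empty or malformed.
    return float.fromhex(expr[:cut]), float.fromhex(expr[cut + 1:])
```

With this sketch, `split_float_range("0x1p-2-0x1p+1")` treats the first hyphen as part of the exponent and the second as the separator, giving the bounds 0.25 and 2.0.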

define ebml element template

We need specific documentation on the definition of an EBML element, including definitions of element definition attributes like mandatory, default, and range.

clarify that ebml header doesn't self-define

As mentioned on matroska-devel, the EBML spec needs to clarify that maxsizeid and maxsizelength do not describe the elements of the EBML header but only the non-header parts of the EBML document. The max sizes of IDs and sizes for the header are now fixed to 4 within the specification (via MediaArea@ab4afa2).

Segmentation webml

Hi, I need a little help with a question:
I am implementing a demuxer that formats WebM for an MSE stream.
Is the initialization segment the same for every SourceBuffer? Does the segment for track x contain only the blocks of that track?

Nested elements are not defined

The '+' in some level values is not explained. In fact the definition of level says it should be an integer.

More generally I think Nesting should be an attribute like default, unknownsizeallowed, etc.

define how to express EBML within XML

There are several places, especially in Matroska tags and chapters, where it's helpful to express the Top Level Element in an XML form. I suggest that the EBML spec (or perhaps the MKV spec) define how this should be done.

I reviewed the xml examples by @mbunkus at https://github.com/mbunkus/mkvtoolnix/tree/master/examples. Overall I think the correlation between EBML Element and XML representation of that element is clear, though in some forms the expressions are not consistent in the primary applications using the XML.

For instance both ChapterTimeStart and EditionFlagDefault are defined as unsigned integers; however, they are expressed differently in existing implementations.

For instance, if a chapters.xml file contains:

<ChapterTimeStart>1</ChapterTimeStart>

and

<EditionFlagDefault>1</EditionFlagDefault>

and I use mkvpropedit to insert that data into a file, then the result is an error:

mkvpropedit -c chapters.xml unsigned.integers.in.action.S01E01.HDTV.x264-dericed.mkv

Error: The XML chapter file 'chapters.xml' contains an error: The tag or attribute 'ChapterTimeStart' at position 223 contains invalid or mal-formed data. Expected a time in the following format: HH:MM:SS.nnn (HH = hour, MM = minute, SS = second, nnn = millisecond up to nanosecond. You may use up to nine digits for 'n' which would mean nanosecond precision). You may omit the hour as well. Found '1' instead. Additional error message: Invalid format: At least minutes and seconds have to be given, but no colon was found

If there are exceptions to how to express a value in XML perhaps this is better defined in MKV rather than in EBML (though I think it would be more globally useful in EBML). Are there other such differences in implementations of EBML fragments expressed in XML?
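
To show what a defined mapping could look like, here is a minimal Python sketch that converts the time format named in the error message above into an unsigned integer. It is not mkvpropedit's code; the name `parse_chapter_time`, the optional fractional part, and the choice of nanoseconds as the stored unit are assumptions made for the example.

```python
import re

# Optional hours, then minutes and seconds, then up to nine fractional digits.
_TIME_RE = re.compile(r"^(?:(\d+):)?(\d{1,2}):(\d{1,2})(?:\.(\d{1,9}))?$")


def parse_chapter_time(text: str) -> int:
    """Convert 'HH:MM:SS.nnn' (hours optional) to a count of nanoseconds."""
    match = _TIME_RE.match(text)
    if match is None:
        raise ValueError(f"not a HH:MM:SS.nnn time: {text!r}")
    hours, minutes, seconds, fraction = match.groups()
    # Pad the fraction on the right so '5' means 500 ms, not 5 ns.
    nanos = int(fraction.ljust(9, "0")) if fraction else 0
    whole = int(hours or 0) * 3600 + int(minutes) * 60 + int(seconds)
    return whole * 1_000_000_000 + nanos
```

A bare `1`, as in the failing example, is rejected by this sketch as well, which is why the spec would need to state which textual form each Element takes.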

define EBML Reader & Writer and define its expected behavior

Based on Tim's comment:

A number of the things listed as "attacks" are simply things you said someone writing an EBML Document MUST NOT do (or not doing things you said they MUST do). But in none of those cases do you say what is expected of someone reading a document containing these attacks.

@mbunkus, @robUx4, ideas for such a section? There is a bit of similar language at https://github.com/Matroska-Org/matroska-test-files/blob/master/readme.md but I don't think we have such language for EBML Readers yet.

review method to determine end of unknown-size element

Tim asked in cellar review:

If maxOccurs is a positive integer, and maxOccurs instances of the element have already been encountered within a Master Element with an unknown size, is another instance of this element considered "...the beginning of the next element that is not a valid sub-element of that Master-element"?

For instance if Segment is unknown sized and knowing the Attachments can only occur once within the Segment, then with this structure:

<Segment>
<Attachments/>
<Attachments/>

Is the Segment expected to end when the second occurrence of Attachments is read, since a second occurrence of Attachments "is not a valid sub-element of that Master-element"?
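
The two possible readings can be stated side by side in code. This Python sketch is only a way to frame the question, not a proposed answer: the function name, the `allowed` mapping of child names to maxOccurs, and the `strict_max_occurs` switch are all hypothetical.

```python
from collections import Counter
from typing import Dict, Optional


def next_element_ends_master(
    next_name: str,
    allowed: Dict[str, Optional[int]],  # child name -> maxOccurs (None = unbounded)
    seen: Counter,                      # children already read in this Master Element
    strict_max_occurs: bool,
) -> bool:
    """Decide whether next_name terminates an unknown-size Master Element."""
    if next_name not in allowed:
        # Never a valid child: both readings agree that the element ends here.
        return True
    limit = allowed[next_name]
    if strict_max_occurs and limit is not None and seen[next_name] >= limit:
        # Reading A: an occurrence beyond maxOccurs is "not a valid sub-element".
        return True
    # Reading B: a known child never ends the parent, even beyond maxOccurs.
    return False
```

With `allowed = {"Attachments": 1}` and one Attachments already seen, a second Attachments ends the Segment under reading A (`strict_max_occurs=True`) and stays inside it under reading B.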

document why a Master Element should include junk data

From Tim's review:

Can you describe reasons someone might reasonably violate these SHOULDs? You've specified when they are allowed to, but not why. Is the "is permitted" case the only case in which it is permitted? If so, you should structure the normative language to make that clear. What does it mean for EBML to be "used in transmission or streaming"? If someone sends you an EBML document over HTTP, and you save it to disk, are you required to strip out any non-element data from Master Elements?

Referenced section is in the Master Element definition. I suggest working on the clarification branch since that also affects this area.
