In this section: https://github.com/privatezero/flac_markdown/blob/master/flac.md#coded-number
It's talking about UTF-8, and UCS-2 aka UTF-16, so which encoding format does it use?
(btw sorry about the formatting, idk why the UTF encoding process is bolded)
UTF-8 encoding: The number of leading 1 bits in the first byte tells you how many bytes make up the code point. A leading 0 means it's a single-byte, ASCII-compatible character with 7 payload bits; otherwise the leading byte starts with 2, 3, or 4 ones followed by a zero bit, giving the total byte count. (A byte starting with 0b10 is a continuation byte, never a leading byte.)
Example: 🦄 is U+1F984, encoded in UTF-8 as the bytes 0xF0 0x9F 0xA6 0x84. The leading byte, 0xF0 (0b11110000), says there are 4 bytes in this code point. Every subsequent byte has 0b10 in its top 2 bits, marking it as a continuation byte, and you strip those prefix bits when decoding:
  (0xF0 & 0x07) << 18 = 0x00000
+ (0x9F & 0x3F) << 12 = 0x1F000
+ (0xA6 & 0x3F) <<  6 = 0x00980
+ (0x84 & 0x3F) <<  0 = 0x00004
                      = 0x1F984
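The mask-and-shift arithmetic above can be sketched in Python (a minimal 4-byte decoder for illustration, not a full validating UTF-8 decoder):

```python
def decode_utf8_4byte(b: bytes) -> int:
    # Leading byte 0b11110xxx carries 3 payload bits; each of the three
    # continuation bytes (0b10xxxxxx) carries 6 payload bits.
    return ((b[0] & 0x07) << 18) | \
           ((b[1] & 0x3F) << 12) | \
           ((b[2] & 0x3F) << 6)  | \
            (b[3] & 0x3F)

cp = decode_utf8_4byte(bytes([0xF0, 0x9F, 0xA6, 0x84]))
print(hex(cp))  # 0x1f984
print(chr(cp))  # 🦄
```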
UCS-2, i.e., UTF-16 before surrogate pairs existed, is just a straight 16-bit value with no special encoding.
UTF-16 proper (often misidentified as UCS-2): if the codepoint is 0xD7FF or less, or it's at least 0xE000 and at most 0xFFFF, it's stored as-is in a single 16-bit value; otherwise it's split into a surrogate pair like this:
Since we're encoding the same unicorn from above, we need two 16-bit code units, because 0x1F984 is above 0xFFFF.
So, we take the codepoint and subtract 0x10000, giving 0xF984. For the high surrogate, we divide 0xF984 by 0x400 to get 0x003E, then add 0xD800 to get 0xD83E.
For the low surrogate, we take 0xF984 mod 0x400 to get 0x184, then add 0xDC00 to get 0xDD84.
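The surrogate-pair math above can be sketched as (a minimal sketch; only valid for codepoints above 0xFFFF, no validation):

```python
def to_surrogates(cp: int) -> tuple[int, int]:
    v = cp - 0x10000
    high = 0xD800 + (v >> 10)    # same as v // 0x400
    low  = 0xDC00 + (v & 0x3FF)  # same as v %  0x400
    return high, low

print([hex(u) for u in to_surrogates(0x1F984)])  # ['0xd83e', '0xdd84']
```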
So which encoding is the spec actually using? The wording is very confusing.