veovera / enhanced-rtmp

This industry-sanctioned project introduces significant enhancements to the RTMP and FLV specifications, outlining advanced features aimed at revitalizing and modernizing the RTMP solution.

Home Page: https://veovera.org

License: Apache License 2.0

rtmp enhanced-rtmp rtmp-protocol

enhanced-rtmp's Introduction

Veovera Software Organization (VSO)

Intro

Founded in 2022, the Veovera Software Organization (VSO) is a dedicated not-for-profit entity committed to enhancing software technology for the public good. VSO has played a pivotal role in advancing the Real-Time Messaging Protocol (RTMP), notably through collaboration with industry leaders such as Adobe, YouTube, Twitch and others. Our collective efforts focus on refining and broadening the capabilities of RTMP to ensure it meets the evolving demands of modern media streaming.

Abstract

As the media streaming landscape continues to evolve, the VSO has emerged as a pivotal force in modernizing the RTMP/FLV protocol, a cornerstone of streaming technology that had remained largely unaltered for over two decades. Through our pioneering initiatives, we have introduced the initial enhancements, "enhanced-rtmp-v1" (or "enhanced-rtmp-v1.pdf"), and the latest alpha specification, "enhanced-rtmp-v2" (or "enhanced-rtmp-v2.pdf"). This key publication outlines substantial advancements to legacy RTMP, including:

Advanced Audio Codecs: Integration of codecs such as AC-3, E-AC-3, Opus, and FLAC, catering to diverse audio quality and compression needs, ensuring compatibility with modern audio playback systems.
Multichannel Audio Configurations: Addition of multichannel audio configurations, enhancing the auditory experience without sacrificing compatibility with existing setups.
Advanced Video Codecs: Integration of video codecs like VP8, VP9, HEVC, and AV1 with High Dynamic Range (HDR) support, catering to modern displays and content requirements.
Video Metadata: Addition of VideoPacketType.Metadata, expanding the range of supported video metadata types.
FourCC Signaling: Addition of FourCC signaling for the advanced codecs mentioned above, as well as for legacy AVC, AAC, and MP3 AV codecs.
Multitrack Capabilities: Introduction of audio and video Multitrack capabilities, allowing multiple media streams to be managed and processed concurrently, enhancing the overall media experience.
Reconnect Request Feature: Implementation of a Reconnect Request feature, aimed at improving connection stability and resilience.

These enhancements are designed to align RTMP with contemporary streaming technology standards, ensuring its relevance and efficacy in today's digital landscape.

For the latest insights and updates on these exciting developments, we invite you to explore our news feed.

Community Engagement

In the spirit of open collaboration and continuous improvement, we highly value the community's feedback and contributions. Our work, including detailed documentation, the latest enhancements, and avenues for community engagement, is hosted on GitHub. We invite you to join our efforts by visiting our GitHub repository at <https://github.com/veovera/enhanced-rtmp>. Here, you can access our publications, contribute to the project, and share your insights by creating new issues.

Together, we can shape the future of media streaming technology, ensuring that RTMP remains innovative. Your participation and feedback are crucial to our ongoing endeavors.

Join Veovera

For information on how to join the Veovera Software Organization, please contact <mailto:[email protected]>.

enhanced-rtmp's Issues

Hakiiii

What about adding also H.266/VVC?

Indeed, thanks from our side as well for the good work on enhancing the RTMP & FLV specs!

We at Bytedance are also interested in adding support for H.266/VVC. Is it OK to add that?

connect command fourCcList is video specific but doesn't have "video" in its name

in the "Extending NetConnection connect Command" section, there is a new property "fourCcList". the description says it is an array of strings each "representing a supported video codec". the current Enhanced RTMP document is only extending video codecs; however, one would assume (from #2) that audio is coming soon.

with the name "fourCcList", one would presume that in the future, audio and video fourCCs would be intermingled in that array. this might be fine for some applications, but may complicate the semantics if it's important to tell the difference between "is there an audio codec i've never heard of" and "is there a video codec i've never heard of".

since today the audioCodecs and videoCodecs bitmaps are already separate, for symmetry i propose that the arrays of fourCCs for video and audio also be separated, and that they have names indicating which they're for. that way, if the difference is important to an application, it is at least available.
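
A minimal sketch of what the proposed separation could look like to a receiver. The property names `videoFourCcList` and `audioFourCcList`, the FourCC `vvc1`, and the "known" sets are all assumptions made for illustration; none of them is taken from the spec.

```python
# Hypothetical sketch: the property names and the "known" sets below are
# assumptions for illustration, not part of the Enhanced RTMP spec.
KNOWN_VIDEO = {"av01", "vp09", "hvc1"}
KNOWN_AUDIO = {"Opus", "fLaC"}


def unknown_codecs(command_object):
    """Return (unknown_video, unknown_audio) FourCCs from a connect command object."""
    video = command_object.get("videoFourCcList", [])
    audio = command_object.get("audioFourCcList", [])
    unknown_video = [c for c in video if c not in KNOWN_VIDEO]
    unknown_audio = [c for c in audio if c not in KNOWN_AUDIO]
    return unknown_video, unknown_audio


# Example connect command object; "vvc1" stands in for a codec the
# receiver has never heard of.
connect_args = {
    "videoFourCcList": ["hvc1", "vvc1"],
    "audioFourCcList": ["Opus"],
}
print(unknown_codecs(connect_args))
```

With a single intermingled array the receiver could not tell which of the two returned lists an unfamiliar FourCC belongs in.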

presence of composition time offset in PacketTypeCodedFrames is codec-specific

in the VideoTagBody section of version 2023-03-v1.0.0-B.9, there are two PacketTypes defined for coded frames: PacketTypeCodedFrames = 1 and PacketTypeCodedFramesX = 3, where type 3 has an implicit composition time offset of 0 (meaning the presentation time and decode time are the same).

the pseudocode for If FourCC == HEVC explicitly shows the composition time offset SI24 for PacketTypeCodedFrames and defines it to 0 for PacketTypeCodedFramesX. the other defined codecs (VP9 and AV1) use PacketTypeCodedFrames but don't show a composition time offset (presumably because those codecs don't code independent frames out of presentation order so such a field isn't needed).

since the notion of a presentation time different from decode time occurs in at least two codecs today (AVC and HEVC), and in the interest of consistency, separation of layers and concerns (notional header vs payload), and code reuse, i think PacketTypeCodedFrames should always have an SI24 Composition Time Offset field, and for codecs where that's always 0, they MAY use PacketTypeCodedFramesX to save those 3 bytes. any codec should be able to use either packet type 1 or 3 with a consistent parsing and interpretation.
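
The codec-independent parsing proposed here might be sketched as follows. This assumes the body handed to the function starts immediately after the FourCC and that SI24 is a signed, big-endian 24-bit integer as elsewhere in FLV; the function name is hypothetical.

```python
PACKET_TYPE_CODED_FRAMES = 1    # carries an SI24 composition time offset
PACKET_TYPE_CODED_FRAMES_X = 3  # composition time offset is implicitly 0


def split_coded_frames(packet_type, body):
    """Return (composition_time_offset, payload) for a coded-frames body.

    `body` is assumed to start right after the FourCC. The same rule is
    applied to every codec, which is the point of the proposal.
    """
    if packet_type == PACKET_TYPE_CODED_FRAMES:
        if len(body) < 3:
            raise ValueError("truncated composition time offset")
        # SI24: signed 24-bit integer, big-endian
        offset = int.from_bytes(body[:3], "big", signed=True)
        return offset, body[3:]
    if packet_type == PACKET_TYPE_CODED_FRAMES_X:
        return 0, body
    raise ValueError(f"not a coded-frames packet type: {packet_type}")
```

A codec whose offset is always 0 could then use either packet type, and the parser would not need to know which codec it is handling.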

ideally the Enhanced RTMP spec wouldn't itself define any new codec mappings, and instead would define the generic syntax. an addendum, appendix, or separate registry would define mappings for AV1, VP9, and HEVC to start.

connect response doesn't indicate support for Enhanced RTMP

the client can indicate support for Enhanced RTMP by including a fourCcList member of the connect command's argument/command object (though see #10 about the name of that member).

however, clients can't tell if servers support Enhanced RTMP. while an unaware server can simply forward Enhanced RTMP messages as they come in, this won't have the desired effect for clients subscribing to a stream after its publish has started. in particular, servers unaware of Enhanced RTMP won't have special treatment of PacketSequenceStart, PacketTypeMetadata, and PacketTypeMPEG2TSSequenceStart messages, remembering them and sending them to new subscribers before the coded frames.

a publishing client should be able to tell that the server will or won't perform the special sequence/metadata processing for subscribers, and subscribing clients should be able to tell that they may not receive the sequence/metadata messages for enhanced messages. this could be accomplished by echoing the fourCcList (and/or others as appropriate) back in the connect transaction's _result Info Object.
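
For illustration, the special treatment that an Enhanced RTMP aware server would give these messages could look roughly like the sketch below. The class and method names are hypothetical, packet types are represented as plain strings, and replaying the remembered messages in order of most recent arrival is an assumption rather than something the spec states.

```python
class StreamCache:
    """Remembers the latest sequence/metadata messages of a published stream."""

    # Packet types a server would need to remember for late subscribers.
    STICKY_TYPES = {
        "PacketTypeSequenceStart",
        "PacketTypeMetadata",
        "PacketTypeMPEG2TSSequenceStart",
    }

    def __init__(self):
        self._sticky = {}  # packet type -> most recent message

    def on_publish(self, packet_type, message):
        """Record a message from the publisher if it is a sticky type."""
        if packet_type in self.STICKY_TYPES:
            # Re-insert so that dict order reflects the latest arrival order.
            self._sticky.pop(packet_type, None)
            self._sticky[packet_type] = message

    def preamble(self):
        """Messages to send to a new subscriber before any coded frames."""
        return list(self._sticky.values())
```

A server that is unaware of Enhanced RTMP has no equivalent of `preamble()` for these packet types, which is why a late subscriber may never receive them.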

Enhance audio codecs

Such as Opus over RTMP, so that it can be passed to WebRTC with no need to transcode.

AudioPacketType.SequenceEnd is not defined

The spec uses AudioPacketType.SequenceEnd in one place, but it has never defined a value for it anywhere.

PacketTypeMetadata SHOULD or MUST come before the video sequence it affects?

the "Metadata Frame" section says "When leveraging PacketTypeMetadata to deliver HDR metadata, the metadata should be sent prior to the video sequence that it affects".

if the intention of the PacketTypeMetadata packet is to set metadata that needs to be known before decoding the video sequence, i think this sentence should use a BCP 14 "MUST" instead of lower-case "should".

also, is the qualification of "when leveraging ... to deliver HDR metadata" needed? presumably this message can be used for any metadata that needs to be known before decoding the video sequence. also presumably there's no more than one of these that's expected to be active at a time, and a new message supersedes a previous one.

would this message ever be expected, or be meaningful, after a video sequence is underway? in other words, when encountering one of these messages, would one expect the immediate next video messages to be a sequence start/init message and then a key frame? can a receiver count on that with the force of a BCP 14 MUST?

please clarify meaning and semantics of PacketTypeMPEG2TSSequenceStart

PacketTypeMPEG2TSSequenceStart is referenced in the pseudocode for AV1. please clarify whether:

this packet type has a meaning for any other codecs
if this packet type can be used in combination with PacketTypeSequenceStart or if they are mutually exclusive. if they can be used in combination, which one SHOULD/MUST come first?

how and for what is PacketTypeMPEG2TSSequenceStart expected to be used?

What's the plan to make the document an Internet standard?

Thanks for the good work on enhancing the RTMP specs. Is there any plan to publish the document as an Internet standard? Would it be possible to submit the document to the IETF, like the HLS spec, so that it becomes more popular and is used by more companies around the world?

fourCcList description should explicitly state support for receiving those codecs

the fourCcList currently says it contains strings each "representing a supported video codec". as discussed at #11 (comment), the description should be clarified to say that this lists codecs that the client is able to receive and process.

the practical historical usage of the videoCodecs and audioCodecs bitmaps sent by the client in the connect command has been to indicate which codecs the client is able to receive. historically it's been irrelevant to the server what codecs a client can send, since the server could handle all traditional RTMP codecs.

OBS and SRS support Enhanced RTMP, so users can push HEVC via RTMP.

This is NOT an issue but a NOTICE

The SRS media server 6.0.42+ supports this Enhanced RTMP specification, so you can easily push HEVC via RTMP to SRS:

git clone https://github.com/ossrs/srs.git
cd srs/trunk && ./configure --h265=on && make
./objs/srs -c conf/http.ts.live.conf

Then, you can use OBS 29.1+ to push HEVC via RTMP.
Start OBS with the following settings in the Settings > Stream tab:

Server: rtmp://localhost/live
Stream Key: livestream
Encoder: Please select the HEVC hardware encoder.

Finally, open the player http://localhost:8080/players/srs_player.html?stream=livestream.ts

Or use VLC or ffplay to play http://localhost:8080/live/livestream.ts

Opus Sequence Headers

I'm a bit confused about the OpusSequenceHeader.
The spec says "read either identification or comment header".

I'm not sure how to interpret that. FFmpeg considers only the identification header as "extradata", and only carries that around.
Is it fine to only send that?

And how is "either" to be interpreted here? If we wanted to send the comment header, we would need to send a second SequenceStart?
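
To make the two candidates concrete, here is a minimal sketch that tells them apart by their magic signatures and reads the fixed fields of the identification header. The layout follows the Ogg Opus encapsulation (RFC 7845); whether Enhanced RTMP expects exactly these bytes in OpusSequenceHeader is the open question of this issue, so treat that as an assumption. The function name is hypothetical.

```python
import struct


def classify_opus_header(data):
    """Classify an Opus header by its 8-byte magic signature (RFC 7845 layout)."""
    magic = data[:8]
    if magic == b"OpusHead":
        if len(data) < 19:
            raise ValueError("truncated identification header")
        # version, channel count, pre-skip, input sample rate,
        # output gain, channel mapping family (all little-endian)
        version, channels, pre_skip, sample_rate, gain, mapping = struct.unpack_from(
            "<BBHIhB", data, 8
        )
        return {
            "kind": "identification",
            "version": version,
            "channels": channels,
            "pre_skip": pre_skip,
            "sample_rate": sample_rate,
            "output_gain": gain,
            "mapping_family": mapping,
        }
    if magic == b"OpusTags":
        return {"kind": "comment"}
    raise ValueError("not an Opus identification or comment header")


# A stereo identification header with a 312-sample pre-skip at 48 kHz.
sample = b"OpusHead" + struct.pack("<BBHIhB", 1, 2, 312, 48000, 0, 0)
print(classify_opus_header(sample)["kind"])
```

Only the identification header carries the parameters a decoder needs to start, which matches FFmpeg treating it alone as extradata.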

consider enabling Discussions on this repo

i wanted to ask if folks have test/sample media (FLVs) with Enhanced content, and clients & servers to test/interop with. i think that sort of thing belongs in a Discussion rather than an Issue. please consider enabling Discussions on this repo to discuss that and other topics that don't belong in Issues. :)

perhaps eventually links to test media and compatible clients & servers could be on a wiki page or in the README.

Additional Audio Codec: FLAC

Per #2 (comment)

Currently, the only way to deliver lossless audio over RTMP is via linear PCM, which has some drawbacks:

Limited to 16-bit audio
Max 2 channels
Max sample rate of 44.1kHz

Adding FLAC would allow some flexibility for stream producers that want to transcode for multiple bandwidths without losing audio quality. Currently the best solution is to max out the AAC bitrate and hope there isn't too much quality loss on transcodes.

Plus, assuming FLAC would signal its own bit depth, sample rate, and channel count (I know AAC signals its own sample rate; I can't recall if it also signals channel count), you could stream incredibly high-quality audio.

As far as solutions shipping today that would benefit from FLAC support:

OBS

Say I want to produce a web radio show. Most web radios are based on Icecast/shoutcast and a lot of the tooling around Icecast isn't great.

OBS is actually a pretty decent tool for producing radio-type content with builtin audio filters, support for multiple audio devices, scripting, hotkeys, and so on.

If OBS could output with FLAC over RTMP - a receiving server could take the single RTMP stream, and transcode to different formats (output to Icecast, HLS, and DASH) with significantly less quality loss.

HLS/DASH repackagers

Similar to above - I know the various nginx RTMP add-ons support converting RTMP into HLS and DASH. Being able to accept a FLAC stream would allow for producing HLS, DASH, and even other RTMP outputs with multiple codecs for multiple devices.

Say I want to create a multivariant playlist in HLS with next-generation codecs like xHE-AAC. Having a single endpoint with a lossless codec would allow me to create a multivariant playlist with perfect sample alignment (since it's all from the same source). Currently I need to have my apps produce multiple streams at multiple bitrates and it's hard to get everything aligned perfectly, or I have to just stream with AAC and re-encode/repackage server side and lose quality.

Those are two off the top of my head, but it would really expand the possibilities for using RTMP as an intermediary/transport format. A user producing a web radio could stream in FLAC to a server over RTMP, which transcodes to formats like AAC, HE-AAC, and xHE-AAC as HLS, and allows the listener to auto-fallback to codecs based on their bandwidth.

Long story short, the audio producer wouldn't need to worry if the final output/destination supports a particular codec - they would just stream in FLAC and have the ingest handle getting the right codecs out to the receivers. FLAC essentially allows nearly any other codecs to be supported via server-side transcoding.

Standardized way to signal audio track language?

While the spec does talk a bit about the ordering being used to identify the language of an audio stream, I do think that some way to tag an audio stream with a language would be beneficial.
Adding three bytes for an ISO country code or something somewhere would be super nice and might prevent potential vendor-specific order-magic.

onMetaData: videocodecid not defined for HEVC, VP9 and AV1

Hi,

I am really glad a new version of RTMP is coming 🙏

In the FLV spec, there is a field in onMetaData called videocodecid where the codec was specified:

videocodecid: Number: Video codec ID used in the file (see E.4.3.1 for available CodecID values)

with CodecID:

Codec Identifier. The following values are defined:
2 = Sorenson H.263
3 = Screen video
4 = On2 VP6
5 = On2 VP6 with alpha channel
6 = Screen video version 2
7 = AVC

Unfortunately, in the current version of enhancements of RTMP spec, there is no mention of onMetaData's videocodecid field.
Is it something missing or is it because onMetaData is not really useful?

Best regards,
Thibault
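
A sketch of how a reader of onMetaData might handle videocodecid if the FourCC codecs were signaled there. The legacy table is the one quoted above; packing a FourCC into a big-endian 32-bit number is only one conceivable encoding, assumed here for illustration, and both function names are hypothetical.

```python
# Legacy CodecID values as listed in the FLV spec excerpt above.
LEGACY_VIDEO_CODEC_IDS = {
    2: "Sorenson H.263",
    3: "Screen video",
    4: "On2 VP6",
    5: "On2 VP6 with alpha channel",
    6: "Screen video version 2",
    7: "AVC",
}


def fourcc_to_number(fourcc):
    """Pack a four-character code into a big-endian 32-bit number (assumed encoding)."""
    raw = fourcc.encode("ascii")
    if len(raw) != 4:
        raise ValueError("a FourCC must be exactly four ASCII characters")
    return int.from_bytes(raw, "big")


def describe_videocodecid(value):
    """Map a videocodecid Number to a legacy codec name or a FourCC string."""
    if value in LEGACY_VIDEO_CODEC_IDS:
        return LEGACY_VIDEO_CODEC_IDS[value]
    number = int(value)
    if not 0 <= number < 2**32:
        return "unknown"
    raw = number.to_bytes(4, "big")
    if raw.isascii() and raw.decode("ascii").isprintable():
        return raw.decode("ascii")
    return "unknown"
```

Such a number still fits exactly in an AMF Number (a double), so existing onMetaData readers would not need a new value type.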

fourCcList definition should provide for a wildcard

in traditional RTMP, clients that are capable of receiving any codec (like recorders or forwarders) can just set the videoCodecs and audioCodecs fields to 0xffff to enable all possible code points. however, the number of possible future fourCCs is way too large to list them all. to accommodate clients that can handle any codec, i propose defining a special value to include in the fourCcList codec list (or its replacement(s)) to indicate "any/all codecs".

i further propose this special value to be the one-character string "*" (ASTERISK). being one character long it can never conflict with any real fourCC, and * is already commonly used to mean "wildcard".
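
A minimal sketch of how a server could apply the proposed wildcard when deciding whether to forward a codec to a client. The function name is hypothetical, and `fourcc_list` stands for whatever array the client sent in its connect command.

```python
WILDCARD = "*"  # proposed "any/all codecs" marker; one character, so never a real FourCC


def client_accepts(fourcc_list, fourcc):
    """Return True if the client declared it can receive `fourcc`."""
    return WILDCARD in fourcc_list or fourcc in fourcc_list


# A recorder that takes anything, and a player with a fixed codec set.
print(client_accepts(["*"], "av01"))
print(client_accepts(["hvc1", "vp09"], "av01"))
```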

See issue SRS#512 and SRS#547