Comments (13)
At this point I am leaning towards only supporting ImageBitmap in the first version of WebCodecs. This sidesteps plane and alignment questions (by not providing mappable buffers at all) while maintaining good performance for playback and transcode cases.
This does require a readback to access pixel data, and conversion to RGB would be done as part of that process.
For now the best approach may be to ensure that future versions of WebCodecs can easily add new image representations.
from webcodecs.
From my point of view, being able to provide an efficient way of manipulating video frames is a must for WebCodecs; funny hats or head tracking are the obvious use cases.
Given typical encoder/decoder APIs, I think we should provide direct access to a planar image with stride,
similar to what is already available on Blink's native VideoFrame object:
https://cs.chromium.org/chromium/src/media/base/video_frame.h
from webcodecs.
Yeah, the ImageData is just what was convenient to stick there at the time.
from webcodecs.
See https://github.com/dsanders11/imagebitmap-getimagedata-demo.
from webcodecs.
I'm not sure whether this should be pushed into another bug or the discussion started here will do, but here goes my $0.02.
ImageData is totally unsuitable for modern-day applications. The only way to access the content is via an 8-bit RGB data buffer. Accessing that data when the decoder is a GPU one would require a memory readback, which would kill performance.
We need to be able to directly retrieve a decoded image such that it can be accessed via a handle such as a surface ID, so it is usable directly with WebGL or accessible via GPU-only methods (such as a GL shader).
Additionally, we need to know the format of that image. Most hardware decoders would output NV12 (8-bit), P010, or P016 (10-bit and 12/16-bit respectively); software decoders would output YUV 4:2:0, etc.
Additionally, we need to know whether it's 4:2:0, 4:2:2, 4:4:4, etc.
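The plane geometry implied by those subsampling schemes is straightforward to sketch. A minimal plain-JS helper (the `SUBSAMPLING` table and `planeDims` are illustrative names, not WebCodecs API):

```javascript
// Chroma subsampling factors: [horizontal divisor, vertical divisor].
const SUBSAMPLING = {
  "4:2:0": [2, 2], // I420, NV12, P010: chroma is quarter resolution
  "4:2:2": [2, 1], // chroma halved horizontally only
  "4:4:4": [1, 1], // full-resolution chroma
};

// Compute the dimensions of the luma and chroma planes of a frame.
function planeDims(width, height, subsampling) {
  const [sx, sy] = SUBSAMPLING[subsampling];
  return {
    luma: { width, height },
    chroma: { width: Math.ceil(width / sx), height: Math.ceil(height / sy) },
  };
}
```

For example, `planeDims(1280, 720, "4:2:0")` gives a 640x360 chroma plane, which is why a decoder's output format matters so much to downstream processing.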
from webcodecs.
Yes, the more I've looked into it, the more I have to agree. Unfortunately, the same may also be true of ImageBitmap. Currently, I'm leaning toward defining a new VideoFrame type that has:
.format: enum of "i420", "nv12", etc.
.planes: int (convenience; could be inferred from format)
.onGpu: bool
.getPixelData(plane): returns raw pixel data of one plane; blows up if .onGpu
And then the same WebGL methods that work with ImageBitmap and HTMLVideoElement (such as texImage2D) would just work with a VideoFrame passed in, and no readback would be required.
If one really wanted to do a readback, we could support that with something like:
.readFromGpu(): returns a new VideoFrame (async) that has .onGpu == false.
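A rough mock of that proposed shape, using the member names from the comment above (nothing here is a real WebCodecs interface; the class name and constructor are illustrative):

```javascript
// Mock of the proposed VideoFrame surface; not a real WebCodecs interface.
class VideoFrameSketch {
  constructor(format, planeData, onGpu) {
    this.format = format;           // e.g. "i420", "nv12"
    this.planes = planeData.length; // convenience; could be inferred from format
    this.onGpu = onGpu;
    this._data = planeData;         // one Uint8Array per plane (CPU-backed only)
  }
  // Raw pixel data of one plane; "blows up" for GPU-backed frames.
  getPixelData(plane) {
    if (this.onGpu) throw new Error("frame is GPU-backed; read back first");
    return this._data[plane];
  }
}
```

The key design point is that planar access is cheap and direct when the data is already in CPU memory, while the expensive GPU readback stays an explicit, separate step.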
from webcodecs.
I'm not very good at JS, so I have one question about ImageData, ImageBitmap, and VideoFrame data.
How does WebCodecs take care of the line stride in picture data?
Some hardware acceleration, such as SIMD instruction sets (SSE, NEON, etc.), expects picture data in which each line is aligned to the bus bandwidth. That may not always be a multiple of the macroblock size (e.g. 720x480 -> MB: 16x16, AVX: 256-bit/512-bit registers).
So the picture-data object needs to carry offset information to access each line in each format.
GPUs, on the other hand, have APIs that transfer data from the CPU's domain into aligned memory in the GPU's domain for their streaming processors.
I think WebCodecs should account for memory alignment for efficient video processing.
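The alignment arithmetic being described can be sketched in a few lines; `alignUp` and `pixelOffset` are hypothetical helper names, not proposed API:

```javascript
// Round a row's byte width up to the next multiple of the bus/SIMD alignment.
function alignUp(bytes, alignment) {
  return Math.ceil(bytes / alignment) * alignment;
}

// Byte offset of pixel (x, y) in a plane whose rows are padded out to `stride`
// bytes; this is exactly the "offset information" a picture object must carry.
function pixelOffset(x, y, stride, bytesPerPixel = 1) {
  return y * stride + x * bytesPerPixel;
}
```

For example, a 720-byte luma row padded for 64-byte alignment has `alignUp(720, 64) === 768`, so consumers that assume stride equals width would read garbage from the 48 padding bytes at the end of each row.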
(Should I create a new issue? 😅 )
from webcodecs.
It makes you wonder who this API is targeted at, then, and who has shown interest in implementing it.
Any API that requires dealing with RGB and performing readbacks and conversions will not be used for video.
from webcodecs.
ImageBitmap provides a relatively efficient (GPU->GPU texture copy) path to shader access in WebGL, and a very efficient display path (ImageBitmapRenderingContext.transferFromImageBitmap()). Especially given that we don't have YUV data from all decoders (Android MediaCodec in particular), I don't think I want to start designing a new image primitive for the web for the first version of WebCodecs.
I do think we should design to allow opting-in to such a planar image primitive in the future, when it exists.
from webcodecs.
Efficient manipulation of video frames implies GPU-only operation (hardware buffer or texture primitive). There exist platforms where uncompressed frames are stored in CPU memory, in which case it is convenient to offer that access, but in general it implies a very expensive GPU readback operation.
from webcodecs.
By efficient video manipulation I meant with as few memory copies/conversions as possible.
For example, in the funny-hat case:
//Get cam
const cam = await navigator.mediaDevices.getUserMedia({video: true, audio: false});
//Get video track reader
const reader = window.reader = new VideoTrackReader(cam.getVideoTracks()[0]);
//Create writer
const writer = new VideoTrackWriter({});
//Create transform stream
const transformer = window.transformer = new TransformStream({
  transform: (frame, controller) => {
    //paint something on the frame
    controller.enqueue(frame);
  }
});
reader.readable.pipeTo(transformer.writable);
transformer.readable.pipeTo(writer.writable);
//Send it
peerconnection.addTrack(writer.track);
(Note that this code works in Chrome right now, except obviously the image manipulation.)
The image bytes would already be in CPU memory in planar I420 layout (in most cases), or would have to be converted to I420 for the WebRTC encoders (let's ignore VP9 profile 2 for now).
It would be desirable to be able to expose the underlying image data (if in memory) and, if not, to be able to export it to an ImageBitmap. We would also need a way to create a VideoFrame from an ImageBitmap, ImageData, or raw YUV data.
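For the raw-YUV case, the I420 layout itself is simple enough to sketch. A hypothetical helper that slices the three planes out of one contiguous buffer (just the layout math, not a proposed API; it assumes even dimensions and no row padding):

```javascript
// Split a tightly packed I420 buffer (Y plane, then U, then V) into views.
// Each chroma plane is quarter-sized (width/2 x height/2) per 4:2:0 sampling.
function i420Planes(buf, width, height) {
  const ySize = width * height;
  const cSize = (width / 2) * (height / 2);
  return {
    y: buf.subarray(0, ySize),
    u: buf.subarray(ySize, ySize + cSize),
    v: buf.subarray(ySize + cSize, ySize + 2 * cSize),
  };
}
```

A `subarray` view means no copies: a "create VideoFrame from raw YUV" API could wrap caller-provided memory exactly like this.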
from webcodecs.
> I do think we should design to allow opting-in to such a planar image primitive in the future, when it exists.
My answer probably references #47 too.
I also don't think it should be a feature within WebCodecs or a substitute for ImageBitmap, because ImageBitmap is a really good piece of interoperability between the different APIs available for rendering and displaying.
But specific plane access from the start is a necessary feature, especially because if it isn't done from the start, you'll probably end up in the same state as Android and never implement it, or struggle to implement it afterwards. Likewise, an onGpu flag probably adds little abstraction compared with an opaque picture in general.
One solution to match both the WebCodecs API (which could stream ImageBitmaps) and the ImageBitmap API would be the ability to get an ImageBitmap reference for a plane of the ImageBitmap, and to expose more metadata on it. That way, you could have:
// pic is an ImageBitmap with format NV12
glTexImage2D(..., pic);   // performs the NV12 -> RGB conversion like before
ImageBitmap plane = pic.getPlane(0);
glTexImage2D(..., plane); // no conversion, this is a GL_LUMINANCE texture
That way you stay backward compatible with ImageBitmap, and you elegantly handle cases where you cannot extract the plane (Android, for example) by exposing RGB directly, letting the underlying graphics system handle the chroma conversion, without extending the Vulkan or OpenGL APIs.
This is also in line with APIs like GBM; take a look at gbm_bo_get_plane_fd, for example.
To get back to #47: we probably don't care about colorspace within the ImageBitmap. It is information designed for the display system (so it can be private data here) and for the processing systems, which will probably generate code or use extensions for it, so it can be provided by WebCodecs, or even the previous layers, in a different object than the ImageBitmap itself. That's what we would expect here in VLC, at least, as the information comes from the demuxer and not the decoder, and it could evolve quickly whenever you want to add colorspace, mastering data, etc.
from webcodecs.
The spec now offers a VideoFrame interface with Plane interfaces for accessing the pixel data. An ImageBitmap can be generated from a VideoFrame for painting to canvas.
With this now defined, I'd like to close this issue and have new sub-issues filed for remaining gaps. Some known issues/plans described below. Please file a new issue for anything I've neglected.
We intend to add new features to this interface shortly, including:
Planar access to GPU-backed frames is still a problem. In the short term we intend to at least make this transparent: GPU-backed VideoFrames will not initially offer any planar access, but will provide a converter function that performs the copy to CPU memory when invoked.
Down the road we would like GPU-backed frames to have some "buffer" type from WebGPU, such that inspection/manipulation of the pixels can happen without a GPU-to-CPU copy, using WebGPU APIs.
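The short-term pattern described above can be mocked in a few lines (all names here are illustrative, and the real converter function would be asynchronous rather than returning directly):

```javascript
// Mock of the planned pattern: a GPU-backed frame exposes no planes until an
// explicit readback produces a CPU-backed copy with visible cost.
class GpuFrameSketch {
  constructor(pixels) { this._gpuPixels = pixels; }
  // No planar access while the frame lives on the GPU.
  get planes() { return null; }
  // Explicit converter; the real API would return a promise of a CPU frame.
  copyToCpu() {
    return { planes: [Uint8Array.from(this._gpuPixels)] };
  }
}
```

Making the copy an explicit call keeps the fast path (display, texture upload) free of hidden readbacks while still letting applications reach the bytes when they ask for them.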
from webcodecs.