fclc / multi-plexer

Goal: a low-power cluster capable of serving 24+ streams of 4K HDR 60 fps source transcodes while consuming no more than 100 W at peak and idling at less than 10 W.

License: MIT License

C 75.22% Cuda 20.86% Makefile 0.46% JavaScript 3.46%

zfs zfsonlinux cuda jetson-nano jetson jetson-xavier-nx raspberry-pi-4 arm64 pocl ffmpeg


multi-plexer's Issues

Distribution infrastructure: UnicornTranscoder vs kube-plex

I need to find out whether I'm better off using UnicornTranscoder (UT) or kube-plex (KP).

Kubernetes is borderline standard for distributed compute these days. It would make scaling out to more nodes much easier, but it is more involved (and I'd need to learn Kubernetes on top of everything else...).

UnicornTranscoder seems relatively simple and is simpler to deploy, but it takes a hard-coded approach.

Either solution will require changing the build of ffmpeg (kind of the whole point of this exercise?) and building a capture script that translates the requested arguments into whatever arguments I'm looking for.

Parsing of host arguments to node-appropriate, HW-accelerated variants

One of the requirements to pull this whole thing together will be the ability to parse the arguments requested by the host and change them to hardware-accelerated versions.
For example, if the host requests a command like:
ffmpeg -i 1080p_HEVC_source -vf scale=1280:h -c:v libx264 -b:v 5M output

We need to parse a few things and change some others.
First, we know that the source is HEVC and that HEVC can be hardware accelerated, so we need to request hardware decoding. However, unlike the normal CUDA filters, the jocover transcode filters do not specify the hardware. They can do this only because the target platform is known.
Therefore, for an HEVC source the beginning of our command looks like the following (use h264_nvmpi instead if the source is H.264):

ffmpeg -c:v hevc_nvmpi -i 1080p_HEVC_source -vf scale=1280:h -c:v libx264 -b:v 5M output
Now, this is still going to be very slow because the scaling and the encode run on the CPU. So we also swap in the hardware encoder (the output is pretty much always H.264, so it can be locked to -c:v h264_nvmpi -b:v "requested bit rate"). We also need to use the scale_cuda filter. This involves adding the CUDA selector and looks like

ffmpeg -y -vsync 0 -hwaccel cuda -hwaccel_output_format cuda -i input.mp4 -vf scale_cuda=1280:720
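The decoder and encoder swap described above could be sketched roughly as follows. This is only an illustration, not the project's capture script: the function names `nvmpi_decoder_for` and `build_hw_command` are hypothetical, the filter string is passed through untouched (whether scale_cuda can take frames straight from the nvmpi decoder is not confirmed here), and a real wrapper would build an argv array rather than a single string.

```c
#include <stdio.h>
#include <string.h>

/* Map a source codec name to the matching nvmpi decoder.
 * Returns NULL when no hardware decoder is known, so the caller
 * can fall back to the original (software) command. */
const char *nvmpi_decoder_for(const char *codec)
{
    if (strcmp(codec, "hevc") == 0) return "hevc_nvmpi";
    if (strcmp(codec, "h264") == 0) return "h264_nvmpi";
    return NULL;
}

/* Write the rewritten command into out: hardware decoder in front of -i,
 * output encoder locked to h264_nvmpi, filter chain passed through as-is.
 * Returns the command length, or -1 if the codec is unsupported or the
 * buffer is too small. */
int build_hw_command(char *out, size_t out_len, const char *codec,
                     const char *input, const char *vf,
                     const char *bitrate, const char *output)
{
    const char *dec = nvmpi_decoder_for(codec);
    if (dec == NULL) return -1;

    int n = snprintf(out, out_len,
                     "ffmpeg -c:v %s -i %s -vf %s -c:v h264_nvmpi -b:v %s %s",
                     dec, input, vf, bitrate, output);
    if (n < 0 || (size_t)n >= out_len) return -1;
    return n;
}
```

Returning -1 for an unknown codec leaves the decision to the caller, which can then run the host's original command unchanged.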
Rinse and repeat as needed for different arguments. One of the special cases will be adding in tonemapping if the source is HDR. There are a few ways to do this, but I'm planning on something simple like
char source = ffprobe "input.mp4" | grep "pixel format"
switch (source): etc.
where the tonemapping filter is inserted if/when needed.
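One possible shape for that check, as a hedged sketch: instead of the pixel format (which mainly shows bit depth), this version assumes the wrapper reads the stream's transfer characteristic, for example with `ffprobe -v error -select_streams v:0 -show_entries stream=color_transfer -of default=noprint_wrappers=1:nokey=1 input.mp4`. The helper names `is_hdr_transfer` and `build_filter_chain` are hypothetical, and the tonemap filter name is passed in by the caller because the CUDA tonemap filter does not exist yet.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* True when the transfer characteristic marks an HDR source:
 * "smpte2084" is PQ (HDR10) and "arib-std-b67" is HLG. */
bool is_hdr_transfer(const char *color_transfer)
{
    return strcmp(color_transfer, "smpte2084") == 0 ||
           strcmp(color_transfer, "arib-std-b67") == 0;
}

/* Build the -vf argument: the base chain alone for SDR sources,
 * or "base,tonemap_filter" for HDR sources.
 * Returns the string length, or -1 if the buffer is too small. */
int build_filter_chain(char *out, size_t out_len, const char *base_vf,
                       const char *color_transfer, const char *tonemap_filter)
{
    int n;
    if (is_hdr_transfer(color_transfer))
        n = snprintf(out, out_len, "%s,%s", base_vf, tonemap_filter);
    else
        n = snprintf(out, out_len, "%s", base_vf);

    if (n < 0 || (size_t)n >= out_len) return -1;
    return n;
}
```

Placing the tonemap step after the base chain is an arbitrary choice in this sketch; the real ordering depends on which pixel formats the finished filter accepts.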

Cuda accelerated tonemap Filter

The initial idea of using POCL as a CUDA translation layer isn't viable because POCL does not work with image formats on CUDA.

Currently reaching out to Yaroslav Pogrebnyak, the developer of the vf_overlay_cuda ffmpeg filter.

Also reaching out to nyanmisaka, who seems to have a lot of experience working on FFmpeg filters and frankly knows more than I do.

In addition to this, I'm collaborating with Ed Borasky to confirm that it functions on Jetson platforms.

vf_tonemap_cuda.txt
(renamed from .c to .txt to make GitHub happy)

Missing: tonemap.cu with the proper kernel-side code. This is easy once I know how to properly call the CUDA kernel side from the ffmpeg side.

Standard stride blocks should work, with the total number of blocks defined from the frame height. Most resolutions will be 16:9, so by using the height parameter we have a higher chance of getting a value that divides cleanly by 3, which lets us take advantage of CUDA's data structures.

The other option is to work on the R, G and B values of a given pixel, whose count is guaranteed to be a multiple of 3. This might also help for other tone mapping algorithms that use the relative offset from local peak luma as input for the tonemapping output.
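As a rough CPU-side illustration of the two ideas above (block count derived from the height, and a strided pass over R, G, B triplets), here is a minimal C sketch. It is not the missing tonemap.cu: the Reinhard operator is an assumption chosen only because it is short, the input is assumed to be linear-light interleaved RGB floats, and the names `blocks_for`, `reinhard` and `tonemap_rgb` are hypothetical. In a CUDA kernel, `first` and `stride` would correspond to the thread's global index and the total thread count.

```c
#include <stddef.h>

/* Ceiling division: how many blocks are needed to cover `items`
 * work items (e.g. frame rows) when each block handles `per_block`. */
unsigned blocks_for(unsigned items, unsigned per_block)
{
    return (items + per_block - 1) / per_block;
}

/* Reinhard operator (assumed here): maps [0, inf) into [0, 1). */
float reinhard(float x)
{
    return x / (1.0f + x);
}

/* Strided pass over interleaved RGB data. Each "thread" starts at pixel
 * `first` and advances by `stride` pixels, so every pixel's three
 * channels are handled together by the same thread. */
void tonemap_rgb(float *rgb, size_t pixels, size_t first, size_t stride)
{
    for (size_t p = first; p < pixels; p += stride) {
        float *px = rgb + 3 * p;
        px[0] = reinhard(px[0]);
        px[1] = reinhard(px[1]);
        px[2] = reinhard(px[2]);
    }
}
```

For a 2160-row frame with 256 rows per block, `blocks_for(2160, 256)` gives 9 blocks; the last block is only partly used, which the `p < pixels` bound check in the loop accounts for.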
