babitmf / bmf

Cross-platform, customizable multimedia/video processing framework. With strong GPU acceleration, a heterogeneous design, multi-language support, ease of use, multi-framework compatibility, and high performance, the framework is ideal for transcoding, AI inference, algorithm integration, live video streaming, and more.

Home Page: https://babitmf.github.io/

License: Apache License 2.0

CMake 5.16% Python 24.84% C++ 56.73% C 2.92% Makefile 0.07% Java 1.63% Shell 1.83% Objective-C 1.13% Objective-C++ 3.25% Cuda 1.27% Go 1.13% Dockerfile 0.04%
ai arm bmf bytedance cpp cross-platform cuda ffmpeg gpu heterogeneous live-video mediacodec multimedia numpy nvidia opencv python tensorrt transcode x86-64

bmf's People

Contributors

chutiantian0923, frankfengw519, huheng, jie-fang, mmdzzh, mpr0xy, sbraveyoung, sfeiwong, taoboyang, tongyuantongyu, xiaoweiw-nv, zhitianwu


bmf's Issues

When both decoding and encoding are hardware-accelerated on the GPU, does BMF copy the GPU data back to host memory?

case:
ffmpeg -vsync 0 -hwaccel cuda -hwaccel_output_format cuda -hwaccel_device 0 -c:v h264_cuvid -i input.h264 -c:v nvenc_h264 output.h264

Q:
Both decoding and encoding are hardware-accelerated on the GPU: decoding completes on the GPU and encoding completes on the GPU, with no memory copy from GPU to host memory.
In this case, how does BMF's handling of the decode process differ from CPU-mode decoding? Do both use an AVFrame to receive the decoded result, with the buffer address in the AVFrame being on the GPU in one case and on the CPU in the other?
Also, in this scenario, does BMF copy the GPU data back to host memory, wrap it in a Task, and put it in the scheduling queue? Wouldn't that defeat the original goal of reducing memory copies?
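For context, the BMF equivalent of the ffmpeg command above can be assembled from options that appear in other issues on this page (hwaccel cuda on the decoder, h264_nvenc plus pix_fmt cuda on the encoder); a minimal sketch, with placeholder paths:

    import bmf

    # Decode on the GPU and hand CUDA frames straight to NVENC, mirroring the
    # ffmpeg command above. Whether frames stay on the GPU end to end is
    # exactly what this issue is asking about.
    video = bmf.graph().decode({
        "input_path": "./input.h264",
        "video_params": {"hwaccel": "cuda"},
    })["video"]

    bmf.encode(video, None, {
        "output_path": "./output.h264",
        "video_params": {"codec": "h264_nvenc", "pix_fmt": "cuda"},
    }).run()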

fill_task_input doesn't handle incomplete data; must the module itself cache and splice partial inputs when handling a Task?

If a module (e.g. OVERLAY) requires multiple input streams to function normally, and only some of those input streams have data, is the implication that the module itself must account for the incomplete data and do local caching and splicing when handling the Task? Otherwise the input data will be incomplete and processing will fail.

bool ImmediateInputStreamManager::fill_task_input(Task &task) {
    bool task_filled = false;
    for (auto &input_stream : input_streams_) {
        if (input_stream.second->is_empty()) {
            continue;
        }
        // One task can contain multiple packets. Do we NEED to add a max-packet control?
        while (not input_stream.second->is_empty()) {
            Packet pkt = input_stream.second->pop_next_packet(false);
            if (pkt.timestamp() == BMF_EOF) {
                if (input_stream.second->probed_) {
                    BMFLOG(BMF_INFO) << "immediate sync got EOF from dynamical update";
                    pkt.set_timestamp(DYN_EOS);
                    input_stream.second->probed_ = false;
                } else {
                    stream_done_[input_stream.first] = 1;
                }
            }
            // READ: move the packet into the task's corresponding input queue
            task.fill_input_packet(input_stream.second->get_id(), pkt);
            task_filled = true;
        }
    }
    return task_filled;
}
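Not an official answer, but to make the implied contract concrete: since fill_task_input forwards whatever packets currently exist, a multi-input module has to buffer per-stream until every input has data. Below is a minimal caching sketch using only the Python module API shown in other issues on this page; the one-packet-per-stream pairing policy and the merge() step are assumptions:

    from bmf import Module, Packet, ProcessResult, Timestamp

    class cached_multi_input(Module):
        def __init__(self, node, option=None):
            self.node_ = node
            self.cache_ = {}      # per-input-stream packet buffer
            self.eof_ids_ = set()

        def process(self, task):
            inputs = task.get_inputs()
            # Drain whatever arrived on each input stream into the cache;
            # a given Task may carry packets for only SOME of the streams.
            for in_id in inputs:
                buf = self.cache_.setdefault(in_id, [])
                while not inputs[in_id].empty():
                    pkt = inputs[in_id].get()
                    if pkt.timestamp == Timestamp.EOF:
                        self.eof_ids_.add(in_id)
                    else:
                        buf.append(pkt)

            out_queue = task.get_outputs()[0]
            # Emit only when every input stream has at least one packet cached.
            while inputs and all(self.cache_.get(in_id) for in_id in inputs):
                group = [self.cache_[in_id].pop(0) for in_id in inputs]
                out_queue.put(self.merge(group))  # merge() is a hypothetical per-module step

            if inputs and len(self.eof_ids_) == len(inputs):
                out_queue.put(Packet.generate_eof_packet())
                task.timestamp = Timestamp.DONE
            return ProcessResult.OK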

AttributeError: 'bmf.lib._bmf.sdk.Packet' object has no attribute 'get_data'

/usr/lib/python3.7/site-packages/bmf/modules/null_sink.py in process(self, task)
     21         elif pkt.get_timestamp() != Timestamp.UNSET:
     22             Log.log_node(LogLevel.DEBUG, task.get_node(),
---> 23                          "process data", pkt.get_data(), 'time',
     24                          pkt.get_timestamp())
     25         return ProcessResult.OK

AttributeError: 'bmf.lib._bmf.sdk.Packet' object has no attribute 'get_data'

In the demo, the file on Google Drive cannot be downloaded

!gdown --fuzzy https://drive.google.com/file/d/1l8bDSrWn6643aDhyaocVStXdoUbVC3o2/view?usp=sharing -O big_bunny_10s_30fps.mp4
Traceback (most recent call last):
  File "/usr/local/bin/gdown", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.10/dist-packages/gdown/cli.py", line 151, in main
    filename = download(
  File "/usr/local/lib/python3.10/dist-packages/gdown/download.py", line 203, in download
    filename_from_url = m.groups()[0]
AttributeError: 'NoneType' object has no attribute 'groups'

How do I set up two outputs within one graph?

Running the face detection demo (trt_face_detect.py) with the guard statement if (output_queue_size >= 2): commented out, an error is reported:
[2024-05-11 04:09:39.784] [info] node:c_ffmpeg_encoder 3 scheduler 1
[2024-05-11 04:09:39.807] [error] node id:1 catch exception: KeyError: (1,)
At:
/root/bmf/lvyantest/tensorrt_post/trt_face_detect.py(226): process
[2024-05-11 04:09:39.807] [error] node id:1 Process node failed, will exit.
[2024-05-11 04:09:39.808] [info] node 1 got exception, close directly
[2024-05-11 04:09:39.808] [info] schedule queue 0 start to join thread
[2024-05-11 04:09:39.808] [error] node id:1 catch exception: KeyError: (1,)

At:
/root/bmf/lvyantest/tensorrt_post/trt_face_detect_ok.py(226): process

[2024-05-11 04:09:39.808] [error] node id:1 Process node failed, will exit.
Traceback (most recent call last):
File "testtrt1.py", line 54, in
main()
File "testtrt1.py", line 50, in main
video.run()
File "/root/bmf/output/bmf/builder/bmf_stream.py", line 82, in run
return self.node_.get_graph().run(self)
File "/root/bmf/output/bmf/builder/bmf_graph.py", line 747, in run
self.exec_graph_.close()
File "/root/bmf/lvyantest/tensorrt_post/trt_face_detect_ok.py", line 226, in process
output_queue_1 = task.get_outputs()[1]
KeyError: 1
Is it because only one output is configured in the graph, while the logic in trt_face_detect.py also emits a text output? How should the graph be set up so that both results are output correctly?

My graph code is as follows:
import time
import tensorrt as trt
import bmf.hml.hmp as mp
from nms import NMS
import PIL
from PIL import Image
from bmf import graph

def main():
    graph1 = graph({'dump_graph': 1})

    video = graph1.decode({
        "input_path": "./face.mp4",
        "video_params": {
            "hwaccel": "cuda",
        }
    })["video"]

    video = video.module("trt_face_detect", {
        "model_path": "./version-RFB-640.engine",
        "input_shapes": {
            "input": [1, 3, 480, 640]
        }}, entry="trt_face_detect_ok.trt_face_detect")

    video = video.module("face_postprocess",
                         entry="face_postprocess.face_postprocess")

    video = video.encode(
        None, {
            "output_path": "./trt_out.mp4",
            "video_params": {
                "codec": "h264_nvenc",
                "bit_rate": 5000000,
                "max_fr": 30
            }
        })

    video.run()

if __name__ == "__main__":
    main()
Thanks!
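Not an authoritative answer, but a sketch of how a second output is usually wired up: the module only sees len(task.get_outputs()) >= 2 when the graph gives its second output stream a downstream consumer. This assumes the builder lets you pick a node's extra output streams by integer index, which should be verified against the demos:

    import bmf

    def main():
        g = bmf.graph({'dump_graph': 1})
        video = g.decode({"input_path": "./face.mp4"})["video"]

        detect = video.module("trt_face_detect", {
            "model_path": "./version-RFB-640.engine",
            "input_shapes": {"input": [1, 3, 480, 640]},
        }, entry="trt_face_detect.trt_face_detect")

        # Stream 0 carries the frames, stream 1 the detection results.
        frames = detect[0].encode(None, {"output_path": "./trt_out.mp4"})
        # Hypothetical sink module: any consumer of stream 1 makes
        # output_queue_size >= 2 inside trt_face_detect.
        detect[1].module("result_sink", entry="result_sink.result_sink")
        frames.run()

    if __name__ == "__main__":
        main()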

Pass non-image data between modules using Packet

When building the controlnet demo, I am trying to build a pipeline that looks like this:

image decoder ---> controlnet inference module ---> image encoder
                              ^
                              |
prompt reader ----------------+

The prompt is read from a file and passed to the controlnet inference module through bmf.Packet. The prompt is a Python dict, and bmf/python/py_module_sdk.cpp shows that bmf.Packet supports all Python types. But when I run the pipeline, I get the following error:

[2023-10-11 02:09:40.995] [error] node id:2 catch exception: BMF(0.0.8) /home/scratch.xiaoweiw_sw/bytedance/babitmf/bmf/engine/c_engine/src/node.cpp:352: error: (-5:Bad argument) [Node_2_c_ffmpeg_filter] Process result != 0.
 in function 'process_node'

[2023-10-11 02:09:40.996] [error] node id:2 Process node failed, will exit.
[2023-10-11 02:09:40.996] [info] node 2 got exception, close directly
[2023-10-11 02:09:40.996] [info] node id:2 process eof, add node to scheduler
[2023-10-11 02:09:40.996] [info] schedule queue 0 start to join thread
[ipp1-2035:1225 :0:1315] Caught signal 11 (Segmentation fault: address not mapped to object at address 0x1e0)
==== backtrace (tid:   1315) ====
 0 0x0000000000042520 __sigaction()  ???:0
 1 0x000000000015856d CFFFilter::init_filtergraph()  /home/scratch.xiaoweiw_sw/bytedance/babitmf/bmf/c_modules/src/ffmpeg_filter.cpp:242
 2 0x0000000000159b9c CFFFilter::process_filter_graph()  /home/scratch.xiaoweiw_sw/bytedance/babitmf/bmf/c_modules/src/ffmpeg_filter.cpp:393
 3 0x000000000015ae0e CFFFilter::process()  /home/scratch.xiaoweiw_sw/bytedance/babitmf/bmf/c_modules/src/ffmpeg_filter.cpp:578
 4 0x0000000000372a9e bmf_engine::Node::process_node()  /home/scratch.xiaoweiw_sw/bytedance/babitmf/bmf/engine/c_engine/src/node.cpp:348
 5 0x00000000003a5a7a bmf_engine::SchedulerQueue::exec()  /home/scratch.xiaoweiw_sw/bytedance/babitmf/bmf/engine/c_engine/src/scheduler_queue.cpp:153
 6 0x00000000003a5678 bmf_engine::SchedulerQueue::exec_loop()  /home/scratch.xiaoweiw_sw/bytedance/babitmf/bmf/engine/c_engine/src/scheduler_queue.cpp:111
 7 0x00000000003a8b1e std::__invoke_impl<int, int (bmf_engine::SchedulerQueue::*)(), bmf_engine::SchedulerQueue*>()  /usr/include/c++/11/bits/invoke.h:74
 8 0x00000000003a8a72 std::__invoke<int (bmf_engine::SchedulerQueue::*)(), bmf_engine::SchedulerQueue*>()  /usr/include/c++/11/bits/invoke.h:96
 9 0x00000000003a89d3 std::thread::_Invoker<std::tuple<int (bmf_engine::SchedulerQueue::*)(), bmf_engine::SchedulerQueue*> >::_M_invoke<0ul, 1ul>()  /usr/include/c++/11/bits/std_thread.h:259
10 0x00000000003a898a std::thread::_Invoker<std::tuple<int (bmf_engine::SchedulerQueue::*)(), bmf_engine::SchedulerQueue*> >::operator()()  /usr/include/c++/11/bits/std_thread.h:266
11 0x00000000003a896a std::thread::_State_impl<std::thread::_Invoker<std::tuple<int (bmf_engine::SchedulerQueue::*)(), bmf_engine::SchedulerQueue*> > >::_M_run()  /usr/include/c++/11/bits/std_thread.h:211
12 0x00000000000dc253 std::error_code::default_error_condition()  ???:0
13 0x0000000000094b43 pthread_condattr_setpshared()  ???:0
14 0x0000000000126a00 __xmknodat()  ???:0
=================================
Segmentation fault (core dumped)

I haven't used the ffmpeg filter module in the graph myself, yet the backtrace shows that bmf inserted ffmpeg filter modules into the graph. Test code as follows.

test_controlnet.py:

import sys

sys.path.append("../../")
import bmf

sys.path.pop()

def test():
    input_video_path = "./ControlNet/test_imgs/bird.png"
    input_prompt_path = "./prompt.txt"
    output_path = "./output.jpg"

    graph = bmf.graph()

    video = graph.decode({'input_path': input_video_path})
    prompt = graph.module('text_module', {'path': input_prompt_path})
    concat = bmf.concat(video['video'], prompt)
    concat.module('controlnet_module', {}).run()

if __name__ == '__main__':
    test()

text_module.py:

import sys
import random
from typing import List, Optional
import pdb

from bmf import *
import bmf.hml.hmp as mp

class text_module(Module):
    def __init__(self, node, option=None):
        self.node_ = node
        self.eof_received_ = False
        self.prompt_path = './prompt.txt'
        if 'path' in option.keys():
            self.prompt_path = option['path']

    def process(self, task):
        pdb.set_trace()
        output_queue = task.get_outputs()[0]

        if self.eof_received_:
            output_queue.put(Packet.generate_eof_packet())
            Log.log_node(LogLevel.DEBUG, self.node_, 'output text stream', 'done')
            task.set_timestamp(Timestamp.DONE)
            return ProcessResult.OK

        prompt_dict = dict()
        with open(self.prompt_path) as f:
            for line in f:
                pk, pt = line.partition(":")[::2]
                prompt_dict[pk] = pt

        out_pkt = Packet(prompt_dict)
        out_pkt.timestamp = 0
        output_queue.put(out_pkt)
        self.eof_received_ = True

        return ProcessResult.OK

def register_inpaint_module_info(info):
    info.module_description = "Text file IO module"
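A workaround worth trying: skip concat entirely (the backtrace above shows the crash inside the CFFFilter node it introduces) and wire the two streams into the controlnet module as separate inputs. The multi-input bmf.module(...) helper used below is an assumption to verify against the builder API:

    import sys
    sys.path.append("../../")
    import bmf

    def test():
        graph = bmf.graph()
        video = graph.decode({'input_path': "./ControlNet/test_imgs/bird.png"})
        prompt = graph.module('text_module', {'path': "./prompt.txt"})

        # Assumed multi-input form: the first list element becomes input 0
        # (frames) and the second becomes input 1 (the prompt dict packets).
        bmf.module([video['video'], prompt], 'controlnet_module', {}).run()

    if __name__ == '__main__':
        test()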

Usage of the output_queue_size logic in the face detection demo trt_face_detect.py

Code snippet from trt_face_detect.py:

def process(self, task):
    input_queue = task.get_inputs()[0]
    output_queue_0 = task.get_outputs()[0]
    output_queue_size = len(task.get_outputs())
    if output_queue_size >= 2:
        output_queue_1 = task.get_outputs()[1]

    while not input_queue.empty():
        pkt = input_queue.get()
        if pkt.timestamp == Timestamp.EOF:
            self.eof_received_ = True
        if pkt.is_(VideoFrame):
            self.frame_cache_.put(pkt.get(VideoFrame))

    while self.frame_cache_.qsize(
    ) >= self.in_frame_num_ or self.eof_received_:
        out_frames, detect_result_list = self.inference()
        for idx, frame in enumerate(out_frames):
            pkt = Packet(frame)
            pkt.timestamp = frame.pts
            output_queue_0.put(pkt)

            if (output_queue_size >= 2):
                pkt = Packet(detect_result_list[idx])
                pkt.timestamp = frame.pts
                output_queue_1.put(pkt)

        if self.frame_cache_.empty():
            break

    if self.eof_received_:
        for key in task.get_outputs():
            task.get_outputs()[key].put(Packet.generate_eof_packet())
            Log.log_node(LogLevel.DEBUG, self.node_, "output stream",
                         "done")
        task.timestamp = Timestamp.DONE

    return ProcessResult.OK

The code contains the check if (output_queue_size >= 2):
When that branch is taken, the detection results (detect_result_list) are emitted on an extra output.
But when I run the demo I can never trigger this branch. What do I need to do so that output_queue_size >= 2 and the extra detection-result output is produced?
Thanks

require sm at /home/dan/zs/cuda118/bmf/bmf/hml/src/core/stream.cpp:130, Stream on device type 1 is not supported

Python Stack ignored

Stack trace (most recent call last):
#5 Object "/usr/bin/python3.8", at 0x5d6065, in _PyObject_MakeTpCall
#4 Object "/usr/bin/python3.8", at 0x5d5498, in PyCFunction_Call
#3 Object "/home/dan/zs/cuda118/bmf/output/bmf/lib/_hmp.cpython-38-x86_64-linux-gnu.so", at 0x7fe59311d0f4, in PyInit__hmp
#2 Object "/home/dan/zs/cuda118/bmf/output/bmf/lib/hmp.cpython-38-x86_64-linux-gnu.so", at 0x7fe593113266, in
#1 Object "/home/dan/zs/cuda118/bmf/output/bmf/lib/libhmp.so.1", at 0x7fe592ef2948, in hmp::current_stream(hmp::Device::Type)
#0 Object "/home/dan/zs/cuda118/bmf/output/bmf/lib/libhmp.so.1", at 0x7fe592eec1b9, in hmp::logging::dump_stack_trace(int)
Traceback (most recent call last):
  File "detect_trt_sample.py", line 41, in <module>
    main()
  File "detect_trt_sample.py", line 13, in main
    trt_face_detect = bmf.create_module(
  File "/home/dan/zs/cuda118/bmf/output/bmf/builder/bmf.py", line 28, in create_module
    return engine.Module(module_info, json.dumps(option), "", "", "")
  File "/home/dan/zs/cuda118/bmf/output/demo/face_detect/trt_face_detect.py", line 90, in __init__
    self.stream_ = mp.current_stream(mp.kCUDA)
RuntimeError: require sm at /home/dan/zs/cuda118/bmf/bmf/hml/src/core/stream.cpp:130, Stream on device type 1 is not supported

ModuleNotFoundError: No module named 'bmf.lib._hmp'

Following the README.md instructions, I set up a conda virtual environment; after installing the dependencies and running the demo, the hmp library cannot be found.

(deoldify_py39) root@bd912f7bf229:~/bmf/bmf/demo/colorization_python# python3.9 deoldify_demo.py 
Traceback (most recent call last):
  File "/root/bmf/bmf/demo/colorization_python/deoldify_demo.py", line 1, in <module>
    import bmf
  File "/root/bmf/output/bmf/__init__.py", line 3, in <module>
    from bmf.python_sdk.module_functor import make_sync_func
  File "/root/bmf/output/bmf/python_sdk/__init__.py", line 1, in <module>
    from .module_functor import make_sync_func, ProcessDone
  File "/root/bmf/output/bmf/python_sdk/module_functor.py", line 1, in <module>
    import bmf.lib._hmp
ModuleNotFoundError: No module named 'bmf.lib._hmp'

The dependencies were installed correctly:

(deoldify_py39) root@bd912f7bf229:~/bmf/bmf/demo/colorization_python# pip3 list | grep Babit
BabitMF                  0.0.9
BabitMF-GPU              0.0.9

The library file _hmp.cpython-39-x86_64-linux-gnu.so can also be found on disk:

(deoldify_py39) root@bd912f7bf229:~/bmf/bmf/demo/colorization_python# ls /root/miniconda3/envs/deoldify_py39/lib/python3.9/site-packages/bmf/lib/
_bmf.cpython-39-x86_64-linux-gnu.so  libbenchmark.a       libbmf_module_sdk.so        libbmf_py_loader.so      libbuiltin_modules.so.0.0.9  libengine.so.0.0.9  libhmp.so.1
_hmp.cpython-39-x86_64-linux-gnu.so  libbenchmark_main.a  libbmf_module_sdk.so.0      libbuiltin_modules.so    libengine.so                 libfmt.a            libhmp.so.1.2.0
libbackward.a                        libbmf_go_loader.so  libbmf_module_sdk.so.0.0.9  libbuiltin_modules.so.0  libengine.so.0               libhmp.so           libspdlog.a

RuntimeError: [json.exception.type_error.302] type must be string, but is array

import bmf
import bmf.hml.hmp as mp

# 'stream' holds the live-stream URL (defined elsewhere)
pkts = (
    bmf.graph().decode({
        'input_path': stream,
        "loglevel": "quiet",
    })['video']
    .start()  # this will return a packet generator
)

for i, pkt in enumerate(pkts):
    # convert frame to a nd array
    if pkt.is_(bmf.VideoFrame):
        vf = pkt.get(bmf.VideoFrame)
        rgb = mp.PixelInfo(mp.kPF_RGB24)
        np_vf = vf.reformat(rgb).frame().plane(0).numpy()
        # we can add some more processing here, e.g. predicting
        print("frame", i, "shape", np_vf.shape)
    else:
        break

When I used the above code to read a stream, an error occurred. When I switched the input to a local video file, the error disappeared. I don't know where the problem is; the video stream itself is fine, since I can use ffmpeg to read it and save it as an mp4.
The following is the error message:
[screenshot of the error message]

PyCUDA ERROR: The context stack was not empty upon module cleanup

When the input_path passed to graph.decode is a live stream and the stream suddenly disconnects, bmf core-dumps:

[2024-03-23 04:12:10.668] [info] node:c_ffmpeg_encoder 2 scheduler 1
[2024-03-23 04:14:31.322] [info] node id:0 decode flushing
[2024-03-23 04:14:31.322] [info] node id:0 Process node end
[2024-03-23 04:14:31.364] [info] node id:0 close node
[2024-03-23 04:14:31.364] [info] node 0 close report, closed count: 1
[2024-03-23 04:14:31.364] [info] node id:1 eof received
[2024-03-23 04:14:31.364] [info] node id:1 eof processed, remove node from scheduler
[2024-03-23 04:14:31.365] [info] node id:1 process eof, add node to scheduler
[2024-03-23 04:14:31.373] [info] node id:1 Process node end
[2024-03-23 04:14:31.373] [info] node id:1 close node
[2024-03-23 04:14:31.373] [info] node 1 close report, closed count: 2
[2024-03-23 04:14:31.373] [info] node id:2 eof received
[2024-03-23 04:14:31.373] [info] node id:2 eof processed, remove node from scheduler
[2024-03-23 04:14:31.374] [info] node id:2 process eof, add node to scheduler
[2024-03-23 04:14:31.374] [info] node id:2 Process node end
[2024-03-23 04:14:31.374] [info] node id:2 close node
[2024-03-23 04:14:31.374] [info] node 2 close report, closed count:3
[2024-03-23 04:14:31.374] [info] schedule queue 0 start to join thread
[2024-03-23 04:14:31.374] [info] schedule queue 0 thread quit
[2024-03-23 04:14:31.375] [info] schedule queue 0 closed
[2024-03-23 04:14:31.375] [info] schedule queue 1 start to join thread
[2024-03-23 04:14:31.375] [info] schedule queue 1 thread quit
[2024-03-23 04:14:31.375] [info] schedule queue 1 closed
[2024-03-23 04:14:31.375] [info] all scheduling threads were joint

PyCUDA ERROR: The context stack was not empty upon module cleanup.

A context was still active when the context stack was being
cleaned up. At this point in our execution, CUDA may already
have been deinitialized, so there is no way we can finish
cleanly. The program will be aborted now.
Use Context.pop() to avoid this problem.

core dumped in frame extract

Hi, when I run the code below on a 4-CPU machine, Aborted (core dumped) happens. The error rate is 6/10 (run 10 times, the error occurs 6 times). On a 16-CPU machine it doesn't happen. I observed that on the 4-CPU machine CPU usage is almost 100%, which may be the reason. Apart from adding more CPUs, is there any way to avoid this problem?

import bmf
import time
from multiprocessing.pool import ThreadPool
import glob
import numpy as np

def generator_mode(input_list):
    input_path,threads = input_list
    start = time.time()
    graph = bmf.graph()
    video =  graph.decode({
                    'input_path': input_path,
                    "log_level":"quiet",
                    "dec_params": {"threads": threads},
                })['video'].start() # this will return a packet generator
    for pkt in video:
        # convert frame to a nd array
        if pkt.is_(bmf.VideoFrame):
            vf = pkt.get(bmf.VideoFrame)
            v_frame = vf.frame().plane(2).numpy()
        else:
            break
    use = time.time() - start
    return use



if __name__ == '__main__':
    # serial run
    # print(time.time())
    test_threads = [0,2,4,6,8]
    video_paths = glob.glob("/root/ori/*.mp4")

    for threads in test_threads:
        for infilename in video_paths:
            extract_u_frame_time = []
            run_path = []
            for i in range(20):
                run_path.append([infilename, str(threads)])
            with ThreadPool(2) as p:
                extract_u_frame_time.extend(p.map(generator_mode, run_path))

the environment version is below:

python=3.7.12
ffmpeg version 4.1.11-0+deb10u1
numpy==1.21.6
BabitMF==0.0.8

stdout of error:

terminate called without an active exception
Aborted (core dumped)

Running command

nohup python3 generator_mode.py 

When running the face detection demo it keeps printing: [info] *** dropping frame 7 at ts 3584. What does "dropping frame" mean? Is it sampling frames for detection? Thanks.

My code is as follows:
import torch
import torch.nn.functional as F
import numpy as np
import sys
import time
import tensorrt as trt
import bmf.hml.hmp as mp
from nms import NMS
import PIL
from PIL import Image
from bmf import graph

def main():
    graph1 = graph({'dump_graph': 1})

    video = graph1.decode({
        "input_path": "./face.mp4",
        #"video_params": {
        #    "hwaccel": "cuda",
        #}
    })["video"]

    video = video.module("trt_face_detect", {
        "model_path": "./version-RFB-640.engine",
        "label_to_frame": 1,
        "input_shapes": {
            "input": [1, 3, 480, 640]
        }}, entry="trt_face_detect.trt_face_detect")

    video = video.encode(
        None, {
            "output_path": "./trt_out.mp4",
            "video_params": {
                "codec": "h264_nvenc",
                "bit_rate": 5000000,
            }
        })

    video.run()

if __name__ == "__main__":
    main()

[rtsp @ 0x7f7d7a7f8980] max delay reached. need to consume packet [rtsp @ 0x7f7d7a7f8980] RTP: missed 6 packets

[rtsp @ 0x7f7d7a7f8980] max delay reached. need to consume packet
[rtsp @ 0x7f7d7a7f8980] RTP: missed 2 packets
[rtsp @ 0x7f7d7a7f8980] max delay reached. need to consume packet
[rtsp @ 0x7f7d7a7f8980] RTP: missed 2 packets
[rtsp @ 0x7f7d7a7f8980] max delay reached. need to consume packet
[rtsp @ 0x7f7d7a7f8980] RTP: missed 6 packets
[rtsp @ 0x7f7d7a7f8980] max delay reached. need to consume packet
[rtsp @ 0x7f7d7a7f8980] RTP: missed 2 packets

The input_path is an RTSP stream from a camera and the output_path is an RTMP stream. During the run there are many packet-loss warnings like the above, and the resulting RTMP stream shows heavy mosaic artifacts and stuttering.
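One mitigation worth testing is forcing the RTSP input onto TCP so RTP packets are not lost on UDP. The sketch below assumes the decoder's dec_params block (used for decoder threads in another issue on this page) is also forwarded to the demuxer as AVOptions, which needs verifying:

    import bmf

    graph = bmf.graph()
    video = graph.decode({
        "input_path": "rtsp://camera.example/stream",   # placeholder URL
        # Assumption: dec_params entries reach avformat_open_input, so the
        # RTSP demuxer sees rtsp_transport=tcp instead of the default UDP.
        "dec_params": {"rtsp_transport": "tcp"},
    })["video"]

    bmf.encode(video, None, {
        "output_path": "rtmp://example.com/live/out",   # placeholder URL
        "format": "flv",
        "video_params": {"codec": "h264"},
    }).run()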

Running the demo fails with an error on macOS 13.4.1 (22F82)

Hi, I ran the demo on my Mac but got the error below. How can I fix it?

demo % python broadcaster/broadcaster.py
Traceback (most recent call last):
  File "/Users/weiliang/Develop/bmf/bmf/demo/broadcaster/broadcaster.py", line 7, in <module>
    import bmf
  File "/Users/weiliang/.pyenv/versions/3.9.18/lib/python3.9/site-packages/bmf/__init__.py", line 3, in <module>
    from bmf.python_sdk.module_functor import make_sync_func
  File "/Users/weiliang/.pyenv/versions/3.9.18/lib/python3.9/site-packages/bmf/python_sdk/__init__.py", line 1, in <module>
    from .module_functor import make_sync_func, ProcessDone
  File "/Users/weiliang/.pyenv/versions/3.9.18/lib/python3.9/site-packages/bmf/python_sdk/module_functor.py", line 1, in <module>
    import bmf.lib._hmp
ImportError: dlopen(/Users/weiliang/.pyenv/versions/3.9.18/lib/python3.9/site-packages/bmf/lib/_hmp.cpython-39-darwin.so, 0x0002): Library not loaded: @executable_path/../../../../Python
  Referenced from: /Users/weiliang/.pyenv/versions/3.9.18/lib/python3.9/site-packages/bmf/lib/_hmp.cpython-39-darwin.so
  Reason: tried: '/Users/weiliang/Python' (no such file), '/usr/local/lib/Python' (no such file), '/usr/lib/Python' (no such file, not in dyld cache)

How can the BMF framework support PCM audio-stream input?

Specifically: the audio data does not come from a media source; it is streaming packets continuously received over the network, and I want to encode/process them in real time (without saving to a local file first and then reading the file back). How can this be implemented?
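A sketch adapted from the raw-frame push example further down this page, feeding PCM through BMFAVPacket in PUSHDATA mode. The raw-audio decoder options (codec name, sample-rate and channel fields) and the audio-only encode call are assumptions that must be checked against the decoder's push_raw_stream support:

    import numpy as np
    import bmf
    from bmf import GraphMode, Packet, BMFAVPacket

    def receive_pcm_chunks():
        # Hypothetical network source: yields raw s16le stereo PCM chunks.
        for _ in range(100):
            yield bytes(4096)

    graph = bmf.graph()
    audio = graph.input_stream("audio_stream").decode({
        "push_raw_stream": 1,
        # Assumed raw-PCM options; verify the exact field names.
        "audio_codec": "pcm_s16le",
        "sample_rate": 44100,
        "channels": 2,
    })
    bmf.encode(None, audio, {
        "output_path": "./out.mp4",
        "audio_params": {"codec": "aac", "sample_rate": 44100},
    })
    graph.run_wo_block(mode=GraphMode.PUSHDATA)

    pts = 0
    for chunk in receive_pcm_chunks():
        pkt = BMFAVPacket(len(chunk))
        pkt.data.numpy()[:] = np.frombuffer(chunk, dtype=np.uint8)
        pkt.pts = pts
        packet = Packet(pkt)
        packet.timestamp = pts
        pts += len(chunk) // 4          # s16 stereo: 4 bytes per sample frame
        graph.fill_packet("audio_stream", packet)

    graph.fill_packet("audio_stream", Packet.generate_eof_packet())
    graph.close()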

Error when running the blur_gpu module in Docker

1.docker pull babitmf/bmf_runtime:latest;

2.nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.84 Driver Version: 460.84 CUDA Version: 11.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Tesla P40 Off | 00000000:88:00.0 Off | 0 |
| N/A 35C P0 50W / 250W | 16435MiB / 22919MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 1 Tesla P40 Off | 00000000:8D:00.0 Off | 0 |
| N/A 41C P0 51W / 250W | 18213MiB / 22919MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 2 Tesla P40 Off | 00000000:B3:00.0 Off | 0 |
| N/A 32C P0 49W / 250W | 15643MiB / 22919MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 3 Tesla P40 Off | 00000000:B6:00.0 Off | 0 |
| N/A 34C P0 50W / 250W | 11013MiB / 22919MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
+-----------------------------------------------------------------------------+

3.nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:33:58_PDT_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0

4. cvcuda.gaussian_into reports an error:
Line 563: '' failed: no kernel image is available for execution on the device

Does the Docker environment need any additional configuration?

The internally registered SIGTERM and SIGINT signal handlers prevent the application from exiting normally

Hi, bmf is very nice to use,
but I found this in the Graph::Graph constructor in bmf/engine/c_engine/src/graph.cpp:

Graph::Graph(
    GraphConfig graph_config,
    std::map<int, std::shared_ptr<Module>> pre_modules,
    std::map<int, std::shared_ptr<ModuleCallbackLayer>> callback_bindings) {
    std::signal(SIGTERM, terminate);
    std::signal(SIGINT, interrupted);
    ...
}

These two registered handlers always prevent my application from exiting normally, because they take over handling of both signals, like this:

^Cinterrupted, ending bmf gracefully...
^Cinterrupted, ending bmf gracefully...
^Cinterrupted, ending bmf gracefully...

But as an SDK / dependency, it should not rely on signal handling to clean up resources; it should use RAII or other mechanisms instead.

Alternatively, is there some other way to let my application exit normally when it receives SIGTERM or SIGINT?
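One possible workaround, sketched under two assumptions: run_wo_block() can be called without arguments for a normal graph, and Python's signal.signal() (which replaces the process-wide C handler) overrides the handlers std::signal installed in Graph::Graph:

    import signal
    import bmf

    graph = bmf.graph()
    video = graph.decode({"input_path": "./input.mp4"})["video"]
    bmf.encode(video, None, {"output_path": "./output.mp4"})

    # Graph construction inside run_wo_block() is what installs bmf's
    # SIGTERM/SIGINT handlers; re-register ours right after it returns.
    graph.run_wo_block()
    signal.signal(signal.SIGINT, signal.default_int_handler)
    signal.signal(signal.SIGTERM, signal.SIG_DFL)

    graph.close()   # wait for the pipeline to finish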

Can ffmpeg encoder parameters be passed through?

int main(int argc, char** argv) {
	std::cout << "hello world!" << std::endl;
	std::string output_file = "rtsp://172.31.60.105/live/test";
	// BMF_CPP_FILE_REMOVE(output_file);

	nlohmann::json graph_para = {{"dump_graph", 0}};
	auto graph = bmf::builder::Graph(bmf::builder::NormalMode, bmf_sdk::JsonParam(graph_para));

	nlohmann::json decode_para = {{"input_path", "/d1/video/renshu.mp4"}};
	auto video = graph.Decode(bmf_sdk::JsonParam(decode_para));

	nlohmann::json logoPara = {{"input_path", "/d1/video/Snipaste_2024-04-27_20-10-56.png"}};
	auto logo = graph.Decode(bmf_sdk::JsonParam(logoPara));

	auto output_stream =
		video["video"].Scale("1280:720").Trim("start=0:duration=7").Setpts("PTS-STARTPTS");

	auto overlay = logo["video"].Scale("300:200").Loop("loop=0:size=10000").Setpts("PTS+0/TB");

	nlohmann::json encode_para = {{"output_path", output_file},
								{"format", "rtsp"},
								{"video_params",
								 {
									 {"rtsp_transport", "tcp"},
									 {"width", 640},
									 {"height", 480},
									 {"codec", "h264"},
								 }}};

	output_stream[0]
		.Overlay({overlay}, "x=if(between(t,0,7),0,NAN):y=if(between(t,0,7),0,NAN):repeatlast=1")
		.EncodeAsVideo(bmf_sdk::JsonParam(encode_para));

	graph.Run();

	return 0;
}

I'm trying to push the stream to an RTSP address over TCP, but the rtsp_transport parameter doesn't seem to take effect; it still uses the default UDP.
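A guess worth testing: rtsp_transport is a muxer/protocol option rather than a codec option, so it may be ignored inside video_params. The Python sketch below moves it into a mux_params block, assuming the encoder forwards that block to the muxer (the key name needs verifying against the encoder's documented options):

    import bmf

    graph = bmf.graph({"dump_graph": 0})
    video = graph.decode({"input_path": "/d1/video/renshu.mp4"})

    bmf.encode(video["video"], None, {
        "output_path": "rtsp://172.31.60.105/live/test",
        "format": "rtsp",
        # Assumption: muxer/protocol AVOptions belong here, not in video_params.
        "mux_params": {"rtsp_transport": "tcp"},
        "video_params": {"width": 640, "height": 480, "codec": "h264"},
    }).run()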

RuntimeError: BMF(0.0.7) /root/bmf/bmf/c_modules/src/ffmpeg_decoder.cpp:736: error: (-224:BMF Transcode Error) avformat_open_input failed: Protocol not found in function 'init_input'

When I use BMF to process RTSP video, the following problem occurs:

RuntimeError: BMF(0.0.7) /root/bmf/bmf/c_modules/src/ffmpeg_decoder.cpp:736: error: (-224:BMF Transcode Error) avformat_open_input failed: Protocol not found in function 'init_input'

Based on the example test_generator.py, I replaced 'input_path': "../../files/big_bunny_10s_30fps.mp4" in the code with the following:

frames = (
    bmf.graph()
    .decode({'input_path': "https://*****:1101/rtp/0615746E.live.flv"})['video']
    .fps(1)
    # .ff_filter('scale', 299, 299)  # or you can use '.scale(299, 299)'
    .start()  # this will return a packet generator
)

docker images : babitmf/bmf_runtime:latest

Running the demo reports: [swscaler @ 0x7f5cbb7f1200] No accelerated colorspace conversion found from yuv420p to rgb24.

Running the official image:
docker pull babitmf/bmf_runtime:latest
docker run --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all -it babitmf/bmf_runtime:latest bash
export CMAKE_ARGS="-DBMF_ENABLE_CUDA=ON"
./build.sh
After building, run the demo:
python3 ~/bmf/output/demo/video_enhance/enhance_demo.py
The code does not error out and a video is produced, but during the run it keeps reporting: No accelerated colorspace conversion found from yuv420p to rgb24.
Problem:
The generated output.mp4 shows nothing recognizable during playback; the picture is completely garbled.

[swscaler @ 0x7fc7d6f80fc0] No accelerated colorspace conversion found from yuv420p to rgb24.

When I try to run enhance_demo I hit this bug. I know it comes from ffmpeg, but my machine has a CUDA environment and the GPU was in use by Python while running the demo. I tested on two machines with the same result.

I used the Docker image you provided: docker pull babitmf/bmf_runtime:latest

1080Ti CUDA 12.2

+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.161.08             Driver Version: 535.161.08   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce GTX 1080 Ti     Off | 00000000:03:00.0 Off |                  N/A |
| 31%   53C    P2             221W / 250W |    496MiB / 11264MiB |     26%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A      1872      G   /usr/libexec/Xorg                            56MiB |
|    0   N/A  N/A      2189      G   /usr/bin/gnome-shell                          7MiB |
|    0   N/A  N/A      3434      C   python3.8                                   428MiB |
+---------------------------------------------------------------------------------------+

V100 CUDA 11.4

[root@node02 ~]# nvidia-smi 
Fri May 24 10:59:34 2024       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.57.02    Driver Version: 470.57.02    CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla V100-PCIE...  Off  | 00000000:18:00.0 Off |                    0 |
| N/A   44C    P0    37W / 250W |   4846MiB / 16160MiB |     17%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  Tesla V100-PCIE...  Off  | 00000000:3B:00.0 Off |                    0 |
| N/A   34C    P0    26W / 250W |      4MiB / 16160MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A     51555      C   python3                          3363MiB |
|    0   N/A  N/A     76909      C   python3.8                        1479MiB |
+-----------------------------------------------------------------------------+

Maybe the problem is caused by the CUDA version; I see the project uses CUDA 11.8, but the version on my machines is not compatible at the moment.

The output video also ends up with wrong pixel data because the colorspace conversion does not work:

[screenshots of the corrupted output]

I think the program treats the video data as RGB24 when it is actually YUV420P, so the UV color data is wrong, and the Y data is also wrong for every single pixel, because Y is 1 byte per pixel whereas RGB24 is 3 bytes per pixel.

CFFFilter Demo

The parameters for CFFFilter seem quite complex; could you provide a Python example using CFFFilter?
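For what it's worth, the builder's ff_filter helper (used elsewhere on this page as .ff_filter('scale', 299, 299)) is the usual Python entry point to CFFFilter; a minimal sketch, with placeholder paths and an assumed zero-argument filter call:

    import bmf

    graph = bmf.graph()
    video = graph.decode({"input_path": "./input.mp4"})["video"]

    # Each ff_filter call becomes a CFFFilter node wrapping the named ffmpeg
    # filter; '.scale(299, 299)' is sugar for ff_filter('scale', 299, 299).
    processed = video.ff_filter("scale", 640, 360).ff_filter("hflip")

    processed.encode(None, {"output_path": "./output.mp4"}).run()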

cpp copy module can't work with gpu transcoding

Module: test/c_module

def test():
    input_video_path = xxx
    output_path = xxxx
    video = bmf.graph().decode({
            "input_path": input_video_path,
            "video_params": {
                "hwaccel": "cuda",
            }
        })["video"]

    video2 = video.c_module('cpp_copy_module',
                            "../../test/c_module/libcopy_module.so", # use your path
                            "copy_module:CopyModule")
        
    (bmf.encode(
        video2,
        video["audio"],
        {
            "output_path": output_path,
            "video_params": {
                "codec": "h264_nvenc",
                "pix_fmt": "cuda",
            }
        }).run())

The output video isn't encoded properly: there are green and red areas in the picture.

But with CPU decoding and GPU encoding, the results are good.

def test():
    input_video_path = xxx
    output_path = xxxx
    video = bmf.graph().decode({
            "input_path": input_video_path,
        })["video"]

    video2 = video.c_module('cpp_copy_module',
                            "../../test/c_module/libcopy_module.so", # use your path
                            "copy_module:CopyModule")
        
    (bmf.encode(
        video2,
        video["audio"],
        {
            "output_path": output_path,
            "video_params": {
                "codec": "h264_nvenc",
            }
        }).run())

RuntimeError: require false at /root/bmf/bmf/hml/src/imgproc/imgproc.cpp:154, Unsupport PixelInfo

There was a problem when I used bmf/test/generator/test_generator.py for stream-reading tests.

# bmf/test/generator/test_generator.py
for i, pkt in enumerate(pkts):
    # convert frame to a nd array
    if pkt.is_(bmf.VideoFrame):
        vf = pkt.get(bmf.VideoFrame)
        rgb = mp.PixelInfo(mp.kPF_RGB24)
        np_vf = vf.reformat(rgb).frame().plane(0).numpy()  # <------ RuntimeError: require false at /root/bmf/bmf/hml/src/imgproc/imgproc.cpp:154, Unsupport PixelInfo
        # we can add some more processing here, e.g. predicting
        print("frame", i, "shape", np_vf.shape)
    else:
        break

I also tried the method from the documentation, but it doesn't seem to work either.

# https://babitmf.github.io/docs/bmf/multiple_features/graph_mode/generatemode/
for i, frame in enumerate(frames):
     # convert frame to a nd array
     if frame is not None:
         np_frame = frame.to_ndarray(format='rgb24')    # <------ AttributeError: 'bmf.lib._bmf.sdk.Packet' object has no attribute 'to_ndarray'

         # we can add some more processing here, e.g. predicting
         print('frame', i, 'shape', np_frame.shape)
     else:
         break

What does "Unsupport PixelInfo" mean?
And is there another way to process the stream into video frames that can be read iteratively, the way OpenCV does?
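Not sure about the root cause, but one fallback that stays close to the APIs used elsewhere on this page is to skip the RGB reformat and read the raw planes, optionally after moving the frame to host memory; the .cpu() call is an assumption to verify against the VideoFrame API:

    import bmf

    pkts = bmf.graph().decode({'input_path': "./input.mp4"})['video'].start()
    for i, pkt in enumerate(pkts):
        if pkt.is_(bmf.VideoFrame):
            vf = pkt.get(bmf.VideoFrame)
            frame = vf.cpu().frame()           # assumed: copy to host before reading
            y_plane = frame.plane(0).numpy()   # raw Y plane as an ndarray
            print("frame", i, "Y shape", y_plane.shape)
        else:
            break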

test push data with raw frame got an error

I use h264 as the encoding codec and got the error below; when I changed it to mpeg4, it works well.
My OS is Ubuntu 20.04, Python is 3.9, installed via pip install BabitMF.

[2023-09-25 03:08:33.274] [error] node id:1 Codec 'libx264' not found
[2023-09-25 03:08:33.274] [error] node id:1 init codec error
[2023-09-25 03:08:33.274] [error] node id:1 catch exception: BMF(0.0.8) /project/bmf/engine/c_engine/src/node.cpp:352: error: (-5:Bad argument) [Node_1_c_ffmpeg_encoder] Process result != 0.
in function 'process_node'

[2023-09-25 03:08:33.274] [error] node id:1 Process node failed, will exit.

import io

import numpy as np
import bmf
from bmf import GraphMode, Module, Log, LogLevel, InputType, ProcessResult, Packet, Timestamp, scale_av_pts, av_time_base, BmfCallBackType, VideoFrame, AudioFrame, BMFAVPacket
from PIL import Image

def init_push_graph(output):
    graph = bmf.graph({"dump_graph": 1, "loglevel": "debug"})
    video_stream = graph.input_stream("video_stream")
    # audio_stream = graph.input_stream("wav_stream")
    decode_stream = video_stream.decode({
        "loglevel": "trace",
        's': '720:1280',
        'pix_fmt': 'rgb24',
        "push_raw_stream": 1,
        "video_codec": "bmp",
        "video_time_base": "1,30000"
        })

    bmf.encode(
            decode_stream,
            None,
            {
                "video_params": {
                    "codec": "h264",
                    "width": 720,
                    "height": 1280,
                    "max_fr": 30,
                    "crf": "23",
                    "preset": "veryfast"
                },
                # "audio_params": {"sample_rate": 44100, "codec": "aac"},
                "loglevel": "trace",
                "output_path": output
            },
        )
    graph.run_wo_block(mode=GraphMode.PUSHDATA)
    return graph

graph = init_push_graph('./test1.mp4')

pts = 0
timestamp = 0
for _ in range(100):
    frame = np.zeros((1280, 720, 3), dtype=np.uint8)
    image = Image.fromarray(frame, mode="RGB")
    byte_stream = io.BytesIO()
    image.save(byte_stream, format='BMP')
    image_bytes = byte_stream.getvalue()
    pkt = BMFAVPacket(len(image_bytes))
    memview = pkt.data.numpy()
    memview[:] = np.frombuffer(image_bytes, dtype=np.uint8)
    pkt.pts = pts
    packet = Packet(pkt)
    packet.timestamp = timestamp
    pts += 1001
    timestamp += 1
    graph.fill_packet("video_stream", packet)

graph.fill_packet("video_stream", Packet.generate_eof_packet())
graph.close()

Built-in resources and reusable Modules

1. Reading through some of the test code, I found that some resource files cannot be found. Where can these resources be obtained?
For example, "../files/dynamic_add.json" in the dynamic_add function in test_graph.cpp:

TEST(graph, dynamic_add) {
    BMFLOG_SET_LEVEL(BMF_INFO);

    time_t time1 = clock();
    std::string config_file = "../files/graph_dyn.json";
    std::string dyn_config_file = "../files/dynamic_add.json";
    GraphConfig graph_config(config_file);
    GraphConfig dyn_config(dyn_config_file);
    std::map<int, std::shared_ptr<Module>> pre_modules;
    std::map<int, std::shared_ptr<ModuleCallbackLayer>> callback_bindings;
    std::shared_ptr<Graph> graph =
        std::make_shared<Graph>(graph_config, pre_modules, callback_bindings);
    std::cout << "init graph success" << std::endl;

    graph->start();
    usleep(400000);

    std::cout << "graph dynamic add nodes" << std::endl;
    graph->update(dyn_config);

    graph->close();
    time_t time2 = clock();
    std::cout << "time:" << time2 - time1 << std::endl;
}

2. The number of built-in Modules is currently small. Are there reusable Modules available? If so, where can they be obtained?
