
chenyme-aavt's People

Contributors

chenyme


chenyme-aavt's Issues

Feature request: add Docker support

This is a great automation project. It would help to have a Docker-based installation method so the program can run on a server.
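A rough sketch of what such an image could look like. This is an assumption-heavy illustration, not taken from the repository: the entry script name `home.py` is hypothetical, and the dependency list is copied from the install.bat commands quoted in the next issue.

```dockerfile
# Hypothetical Dockerfile sketch; adjust the entry script and dependencies
# to match the actual repository layout.
FROM python:3.10-slim

# ffmpeg is needed for the audio/video processing steps
RUN apt-get update && apt-get install -y --no-install-recommends ffmpeg \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /app
COPY . .

# dependency list taken from the project's install.bat
RUN pip install --no-cache-dir streamlit openai openai-whisper \
    langchain torch faster-whisper

EXPOSE 8501
CMD ["streamlit", "run", "home.py", "--server.address=0.0.0.0"]
```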

GPU acceleration on Linux fails: Could not load library libcudnn_ops_infer.so.8

Installed according to install.bat:
pip install streamlit -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install -U openai-whisper -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install openai -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install langchain -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install torch torchvision torchaudio -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install faster-whisper -i https://pypi.tuna.tsinghua.edu.cn/simple

After launching, GPU acceleration fails, even though /workspace/venv/lib/python3.10/site-packages/nvidia/cudnn/lib/libcudnn_ops_infer.so.8 exists:
Could not load library libcudnn_ops_infer.so.8. Error: libcudnn_ops_infer.so.8: cannot open shared object file: No such file or directory
Please make sure libcudnn_ops_infer.so.8 is in your library path!

root@ae950ec2447b:/workspace# find / -type f -name libcudnn_ops_infer.so.8
/opt/conda/lib/python3.10/site-packages/torch/lib/libcudnn_ops_infer.so.8
/opt/conda/pkgs/pytorch-2.1.2-py3.10_cuda11.8_cudnn8.7.0_0/lib/python3.10/site-packages/torch/lib/libcudnn_ops_infer.so.8
find: '/proc/17/task/17/net': Invalid argument
find: '/proc/17/net': Invalid argument
find: '/proc/18/task/18/net': Invalid argument
find: '/proc/18/net': Invalid argument
find: '/proc/19/task/19/net': Invalid argument
find: '/proc/19/net': Invalid argument
find: '/sys/kernel/slab': Input/output error
/workspace/venv/lib/python3.10/site-packages/nvidia/cudnn/lib/libcudnn_ops_infer.so.8
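A commonly used workaround (the faster-whisper project documents a similar one) is to put the pip-installed cuDNN directory on the dynamic loader's search path before starting the app. The path below is the one found by the `find` output above; adjust it to your environment.

```shell
# Prepend the venv's bundled cuDNN libraries to the loader path,
# then launch the app from this same shell session.
export LD_LIBRARY_PATH="/workspace/venv/lib/python3.10/site-packages/nvidia/cudnn/lib:${LD_LIBRARY_PATH}"
```

Note that this must be exported before the Python process starts; setting it from inside an already-running process generally has no effect on the loader.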

HeyGen video translation

What if we went a step further:

  • Whisper handles speech-to-subtitles
  • LLMs (ChatGPT) or Google Translate handle the multilingual translation
  • MockingBird or so-vits-svc-fork trains the original speakers' voice timbres (voiceprints)
  • Using the time axis extracted from the text, ffmpeg splits the video into per-voice segments, while the trained voices generate audio tracks from the translated text
  • (optional) GeneFace++ or Wav2Lip corrects the lip sync
  • finally merge everything back together (ffmpeg)

Is this roughly how HeyGen video translation is implemented? Of course I am a rookie and the real process is surely far more complex. The biggest difficulty here is identifying the start and end times of each distinct voice; there are also many other problems in between, such as background-audio removal and calibrating recognition errors.
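The splitting step above can be sketched by invoking ffmpeg once per segment. The timestamps and speaker labels below are made up for illustration; in practice they would come from the ASR/diarization step.

```python
# Sketch: cut per-speaker clips with ffmpeg from ASR-style timestamps.
def cut_cmd(src, start, end, out):
    # stream-copy keeps this fast; re-encoding would cut more precisely
    return ["ffmpeg", "-ss", f"{start:.2f}", "-to", f"{end:.2f}",
            "-i", src, "-c", "copy", out]

# hypothetical (start, end, speaker) tuples standing in for real ASR output
segments = [(0.0, 3.2, "spk1"), (3.2, 7.5, "spk2")]
cmds = [cut_cmd("input.mp4", s, e, f"{spk}_{i}.mp4")
        for i, (s, e, spk) in enumerate(segments)]
```

Each command in `cmds` can then be run with `subprocess.run`, and the generated voice tracks merged back in a final ffmpeg pass.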

KeyError: st.session_state has no key "w_model_option"

Traceback (most recent call last):
  File "C:\Users\Administrator\pinokio\bin\miniconda\lib\site-packages\streamlit\runtime\state\session_state_proxy.py", line 119, in __getattr__
    return self[key]
  File "C:\Users\Administrator\pinokio\bin\miniconda\lib\site-packages\streamlit\runtime\state\session_state_proxy.py", line 90, in __getitem__
    return get_session_state()[key]
  File "C:\Users\Administrator\pinokio\bin\miniconda\lib\site-packages\streamlit\runtime\state\safe_session_state.py", line 91, in __getitem__
    return self._state[key]
  File "C:\Users\Administrator\pinokio\bin\miniconda\lib\site-packages\streamlit\runtime\state\session_state.py", line 400, in __getitem__
    raise KeyError(_missing_key_error_message(key))
KeyError: 'st.session_state has no key "w_model_option". Did you forget to initialize it? More info: https://docs.streamlit.io/library/advanced-features/session-state#initialization'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\Administrator\pinokio\bin\miniconda\lib\site-packages\streamlit\runtime\scriptrunner\script_runner.py", line 535, in _run_script
    exec(code, module.__dict__)
  File "E:\software\Chenyme_AAVT_0.5.1\Chenyme_AAVT_0.5.1\pages\📽️视频(Video).py", line 59, in <module>
    result = get_whisper_result(uploaded_file, output_file, device, st.session_state.w_model_option,
  File "C:\Users\Administrator\pinokio\bin\miniconda\lib\site-packages\streamlit\runtime\state\session_state_proxy.py", line 121, in __getattr__
    raise AttributeError(_missing_attr_error_message(key))
AttributeError: st.session_state has no attribute "w_model_option". Did you forget to initialize it? More info: https://docs.streamlit.io/library/advanced-features/session-state#initialization
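The usual fix for this class of error is to initialize the key before the page reads it (in the app, something like `if "w_model_option" not in st.session_state: st.session_state.w_model_option = default` near the top of the page script). A minimal sketch of the pattern, using a plain dict in place of `st.session_state`; the `"base"` default is hypothetical:

```python
# Defensive-initialization pattern behind Streamlit's recommended fix:
# give the key a default before anything reads it.
session_state = {}  # stand-in for st.session_state

def ensure_key(state, key, default):
    # initialize the key on first access instead of raising KeyError
    if key not in state:
        state[key] = default
    return state[key]

model = ensure_key(session_state, "w_model_option", "base")
```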

Character limit on translation

When using Kimi for translation, once the subtitles exceed a certain length the translation stops working. For example, for a video longer than about ten minutes, the remainder cannot be translated and the returned subtitles are still in English.
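A common workaround for context-length limits is batching: split the subtitle lines into chunks under a size budget and translate each chunk in its own request. A sketch (the budget value is arbitrary and would need tuning against the model's actual limit):

```python
def chunk_subtitles(lines, max_chars=3000):
    """Split subtitle lines into batches whose combined length stays
    under max_chars, so each translation request fits the model's context."""
    batches, current, size = [], [], 0
    for line in lines:
        # flush the current batch when adding this line would exceed the budget
        if current and size + len(line) > max_chars:
            batches.append(current)
            current, size = [], 0
        current.append(line)
        size += len(line)
    if current:
        batches.append(current)
    return batches

batches = chunk_subtitles(["hello world"] * 10, max_chars=30)
```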

Is there a command-line mode?

I left the program running in the background for over two hours while watching a movie. Edge then killed the background tab; the command-line window was still running, but everything shown on the page looked as if it had been reset, so I could not see the finished result at all.

Error: openai_key not found

I entered the key in the browser and also set it in the config file, but asking the AI assistant on the home page still raises an error saying the key does not exist; the same happens when generating a video.
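For debugging, it may help to check where the key is (not) being picked up. Below is a hypothetical loader illustrating a common fallback order; the `OPENAI_API_KEY` environment variable and the `"openai_key"` config field are assumptions for illustration, not this project's actual names:

```python
import json
import os

def load_openai_key(config_path="config.json"):
    # 1) environment variable wins; 2) fall back to the config file
    key = os.environ.get("OPENAI_API_KEY")
    if not key and os.path.exists(config_path):
        with open(config_path, encoding="utf-8") as f:
            key = json.load(f).get("openai_key")
    if not key:
        raise RuntimeError("openai_key not found in environment or config")
    return key
```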


Bugs found while testing 0.6.1

Converting audio raises an error; converting video does not.
(screenshot: 2024-03-12_22-30-46)

2024-03-12 07:30:23.183 Uncaught app exception
Traceback (most recent call last):
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python39\lib\site-packages\streamlit\runtime\scriptrunner\script_runner.py", line 542, in _run_script
    exec(code, module.__dict__)
  File "C:\Chenyme_AAVT_0.6.1\pages\🎙️音频(Audio).py", line 46, in <module>
    result = get_whisper_result(uploaded_file, cache_dir, device, w_model_option, w_version, vad)
TypeError: get_whisper_result() missing 3 required positional arguments: 'lang', 'beam_size', and 'min_vad'
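The error message itself pins down the mismatch: the Audio page passes six arguments, while the function also requires `lang`, `beam_size`, and `min_vad`. A minimal reproduction using a stub whose parameter names are taken from the error text (the real function's body and any defaults are unknown here, and the argument values are placeholders):

```python
# Stub signature reconstructed from the TypeError message.
def get_whisper_result(uploaded_file, cache_dir, device,
                       w_model_option, w_version, vad,
                       lang, beam_size, min_vad):
    return (lang, beam_size, min_vad)

try:
    # the Audio page's six-argument call
    get_whisper_result("f", "c", "cuda", "base", "faster", True)
except TypeError as e:
    missing = str(e)  # reports the three missing positional arguments

# the fix is to pass the three extra values, as the Video page does
result = get_whisper_result("f", "c", "cuda", "base", "faster", True,
                            "en", 5, 0.5)
```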

Recommendation: use a virtual environment

The install script should create a dedicated virtual environment and install the dependency packages inside it, rather than into the global Python environment.
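A sketch of what the install script could do. POSIX shell is shown; on Windows, install.bat would use `.venv\Scripts\activate.bat` instead, and the final `pip install` line would list the project's actual dependencies:

```shell
# Create a dedicated virtual environment and install into it,
# leaving the global Python untouched.
python3 -m venv .venv
. .venv/bin/activate
python -c 'import sys; print(sys.prefix)'  # now resolves to the .venv directory
# pip install streamlit openai openai-whisper faster-whisper ...
```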

Video generation problems in 0.6.2

I ran into problems when generating video subtitles with the latest 0.6.2 release. My configuration was as follows:

(screenshot of configuration)

During video generation, I observed that GPU (CUDA) utilization rose noticeably for the first few minutes, but the second half showed only CPU usage.

After generation finished, opening the video file in an external player such as VLC shows the subtitles twice: once burned into the video frames and once as an external subtitle track, overlapping in the player window.

  1. How can the GPU be used more effectively during generation, avoiding the inefficient CPU-only phase?
  2. Could video generation skip burning the subtitles into the frames and keep only the external subtitle track?
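For question 2, ffmpeg can mux the subtitles as a separate stream instead of burning them into the frames; stream-copying the video also avoids a re-encode, which is likely the CPU-only phase observed above. A sketch that only builds the command line (file names are placeholders):

```python
def soft_sub_cmd(video, srt, out):
    """Build an ffmpeg command that embeds subtitles as a soft track.
    -c:v/-c:a copy avoid re-encoding, so nothing is drawn onto the frames."""
    return ["ffmpeg", "-i", video, "-i", srt,
            "-c:v", "copy", "-c:a", "copy",
            "-c:s", "mov_text",  # MP4-compatible subtitle codec
            out]

cmd = soft_sub_cmd("input.mp4", "subs.srt", "output.mp4")
```

Players like VLC then render the subtitle track on demand, and users can toggle it off.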
