
sd-wav2lip-uhq's Introduction

🔉👄 Wav2Lip STUDIO extension for Stable Diffusion WebUI Automatic1111

demo.mp4

THE STANDALONE VERSION CAN BE FOUND HERE: WAV2LIP STUDIO STANDALONE

In the standalone version you can:

  • ♻ Manage projects: Added a feature to manage multiple projects
  • 👪 Multiple face swap: Can now swap multiple faces in one shot
  • ⛔ Visible face restriction: Can now run the whole process even if no face is detected in a frame!
  • 📺 Video size: Works with high-resolution video input (tested with 1920x1080; should work with 4K, but slowly)
  • 🔑 Keyframe manager: Added a keyframe manager for better control of the video generation
  • 🍪 coqui TTS integration: Removed the bark integration; uses coqui TTS instead
  • 💬 Conversation: Added a conversation feature with multiple speakers
  • 🔈 Record your own voice: Added a feature to record your own voice
  • 👬 Clone voice: Added a feature to clone a voice from a video
  • 🎏 Translate video: Added a feature to translate a video with voice cloning (HeyGen-like)
  • 🔉 Volume amplifier for Wav2Lip: Added a feature to amplify the volume of the Wav2Lip output
  • 🕡 Delay: Added a delay before speech starts
  • 🚀 Speed-up: Sped up the whole process

💡 Description

This repository contains a Wav2Lip Studio extension for Automatic1111.

It's an all-in-one solution: just choose a video and a speech file (WAV or MP3), and the extension will generate a lip-sync video. It improves the quality of the lip-sync videos generated by the Wav2Lip tool by applying specific post-processing techniques with Stable Diffusion tools.

Illustration

📖 Quick Index

🚀 Updates

2023.09.13

  • 👪 Introduced face swap: facefusion integration (see Usage section); this feature is experimental.

2023.08.22

  • 👄 Introduced bark (see Usage section); this feature is experimental.

2023.08.20

  • 🚢 Introduced the GFPGAN model as an option.
  • ▶ Added the feature to resume generation.
  • 📏 Optimized to release memory post-generation.

2023.08.17

  • 🐛 Fixed purple lips bug

2023.08.16

  • ⚡ Added both the Wav2Lip and the enhanced video outputs, with the option to download whichever is best for you, likely the "generated video".
  • 🚢 Updated user interface: introduced control over CodeFormer fidelity.
  • 👄 Removed image input; SadTalker is better suited for that.
  • 🐛 Fixed a bug where a discrepancy between the input and output video incorrectly positioned the mask.
  • 💪 Refined the quality process for greater efficiency.
  • 🚫 Interrupting the process now still generates a video from the frames created so far.

2023.08.13

  • ⚡ Sped up computation
  • 🚢 Changed user interface: added controls for previously hidden parameters
  • 👄 Track only the mouth when needed
  • 📰 Debug controls
  • 🐛 Fixed resize factor bug

🔗 Requirements

  • The latest version of Stable Diffusion WebUI Automatic1111; follow the instructions in the Stable Diffusion WebUI repository.
  • FFmpeg: download it from the official FFmpeg site and follow the instructions for your operating system. Note that ffmpeg must be accessible from the command line.
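A quick way to confirm the FFmpeg requirement is met is a small check from Python before running the extension. This is just a convenience sketch using the standard library; the extension itself does not ship this helper.

```python
# Minimal sketch: verify that ffmpeg is reachable from the command line.
# Pure standard library; the only assumption is that ffmpeg is on PATH.
import shutil
import subprocess

def check_ffmpeg() -> bool:
    """Return True if an ffmpeg binary is on PATH and runs."""
    if shutil.which("ffmpeg") is None:
        return False
    result = subprocess.run(["ffmpeg", "-version"],
                            capture_output=True, text=True)
    return result.returncode == 0

if __name__ == "__main__":
    print("ffmpeg found" if check_ffmpeg()
          else "ffmpeg NOT found; install it and add it to PATH")
```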

💻 Installation

  1. Launch Automatic1111
  2. Face Swap: on Windows, download and install Visual Studio. During the install, make sure to include the Python and C++ packages.
  3. In the extensions tab, enter the following URL in the "Install from URL" field and click "Install":

Illustration

  4. Go to the "Installed" tab in the extensions tab and click "Apply and quit".

Illustration

  5. If you don't see the "Wav2Lip UHQ" tab, restart Automatic1111.

  6. 🔥 Important: get the weights. Download the model weights from the following locations and place them in the corresponding directories (pay attention to the filenames, especially for s3fd):

| Model | Description | Link to the model | Install folder |
|---|---|---|---|
| Wav2Lip | Highly accurate lip-sync | Link | extensions\sd-wav2lip-uhq\scripts\wav2lip\checkpoints\ |
| Wav2Lip + GAN | Slightly inferior lip-sync, but better visual quality | Link | extensions\sd-wav2lip-uhq\scripts\wav2lip\checkpoints\ |
| s3fd | Face detection pre-trained model | Link | extensions\sd-wav2lip-uhq\scripts\wav2lip\face_detection\detection\sfd\s3fd.pth |
| landmark predictor | Dlib 68-point face landmark prediction (click on the download icon) | Link | extensions\sd-wav2lip-uhq\scripts\wav2lip\predicator\shape_predictor_68_face_landmarks.dat |
| landmark predictor | Dlib 68-point face landmark prediction (alternate link) | Link | extensions\sd-wav2lip-uhq\scripts\wav2lip\predicator\shape_predictor_68_face_landmarks.dat |
| landmark predictor | Dlib 68-point face landmark prediction (alternate link, click on the download icon) | Link | extensions\sd-wav2lip-uhq\scripts\wav2lip\predicator\shape_predictor_68_face_landmarks.dat |
| face swap model | Model used by face swap | Link | extensions\sd-wav2lip-uhq\scripts\faceswap\model\inswapper_128.onnx |
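If you prefer scripting the downloads, a minimal sketch like the following places a file in the folder the extension expects. The URL is a placeholder (use the links from the table above) and the fetch_weight helper is hypothetical, not part of the extension; keep the exact filenames, especially s3fd.pth.

```python
# Sketch of placing a downloaded checkpoint where the extension expects it.
# The URL passed in is a placeholder; use the links from the table above.
import os
import urllib.request

EXT_ROOT = r"extensions\sd-wav2lip-uhq\scripts"

def fetch_weight(url: str, relative_path: str) -> str:
    """Download `url` into EXT_ROOT/relative_path, creating folders as needed."""
    dest = os.path.join(EXT_ROOT, relative_path)
    os.makedirs(os.path.dirname(dest), exist_ok=True)
    if not os.path.exists(dest):
        urllib.request.urlretrieve(url, dest)
    return dest

# Example: the s3fd face-detection weights must keep this exact filename.
# fetch_weight("<link from the table>",
#              r"wav2lip\face_detection\detection\sfd\s3fd.pth")
```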

🐍 Usage

  1. Choose a video (AVI or MP4 format) with a face in it. If even a single frame of the video has no face, the process will fail. Note that an AVI file will not appear in the video input preview, but the process will still work.
  2. Face Swap (takes time, so be patient):
    1. Face Swap: choose the image of the face you want to swap with the face in the video.
    2. Face Index: if there are multiple faces in the image, you can choose the face you want to swap with the face in the video. 0 is the first face from left to right.
  3. Audio, 2 options:
    1. Put audio file in the "Speech" input.
    2. Generate audio with the text-to-speech bark integration:
      1. Choose the language: Turkish, English, Chinese, Hindi, Italian, Japanese, Korean, Portuguese, Russian, Spanish, Polish, German, French
      2. Choose the gender
      3. Choose your speaker; you can hear a sample in the "Audio Example"
      4. Choose Low VRAM: True (default) if you have a video card with less than 16GB of VRAM
      5. Write your text in the "Prompt" text area
        • Note that bark can only generate 14 seconds of audio at a time, so if you want to generate longer audio, you have to use "[split]" in your text.
        • For example, if you want to generate 30 seconds of audio, write your text like this:
          • "This is the first part of my text [split] This is the second part of my text"
      6. Temperature: 0.0 is supposed to be closer to the voice and 1.0 more creative, but in practice 0.0 yields strange results and 1.0 something very far from the voice. 0.7 is the default value set by bark; try different values to see what works best for you.
      7. Silence: time in seconds between each punctuation mark (。!!.??,). Default is 0.25 seconds.
      8. See Bark documentation for more details.
      9. Below is a list of some known non-speech sounds.
        • [laughter]
        • [laughs]
        • [sighs]
        • [music]
        • [gasps]
        • [clears throat]
        • "-" or ... for hesitations
        • ♪ for song lyrics
        • CAPITALIZATION for emphasis of a word
        • [MAN] and [WOMAN] to bias Bark toward male and female speakers, respectively
  4. Choose a checkpoint (see the table above).
  5. Padding: Wav2Lip uses this to move the mouth. This is useful if the mouth is not in the right place. The default value is usually fine, but some videos may need adjustment.
  6. No Smooth: when checked, this option retains the original mouth shape without smoothing.
  7. Resize Factor: a resize factor for the video. The default value is 1.0, but you can change it to suit your needs. This is useful if the video size is too large.
  8. Only Mouth: this option tracks only the mouth, removing other facial motions like those of the cheeks and chin.
  9. Mouth Mask Dilate: dilates the mouth mask to cover more area around the mouth. Depends on the mouth size (see the OpenCV sketch after this list).
  10. Face Mask Erode: erodes the face mask to remove some area around the face. Depends on the face size.
  11. Mask Blur: blurs the mask to make it smoother; try to keep it less than or equal to Mouth Mask Dilate.
  12. Code Former Fidelity:
    1. A value of 0 offers higher quality but may significantly alter the person's facial appearance and cause noticeable flickering between frames.
    2. A value of 1 provides lower quality but maintains the person's face more consistently and reduces frame flickering.
    3. Using a value below 0.5 is not advised. Adjust this setting to achieve optimal results; starting with a value of 0.75 is recommended.
  13. Active debug: creates step-by-step images in the debug folder.
  14. Click the "Generate" button.
  15. ⚠ The "Resume" button can be used once the face swap and Wav2Lip steps have completed; you can then adjust "mouth mask dilate", "face mask erode" and "mask blur", and change the "restoration model", without regenerating the face swap and Wav2Lip results.
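For steps 9 to 11, here is a rough OpenCV illustration of how dilate, erode, and blur typically combine on a mouth mask. This is a sketch of the idea, not the extension's actual code; the parameter names simply mirror the UI settings.

```python
# Illustration of the three mask settings from steps 9-11 with OpenCV.
import cv2
import numpy as np

def postprocess_mask(mouth_mask: np.ndarray,
                     mouth_dilate: int = 4,
                     face_erode: int = 2,
                     mask_blur: int = 5) -> np.ndarray:
    """mouth_mask: single-channel uint8 image, 255 inside the mouth region."""
    kernel = np.ones((3, 3), np.uint8)
    # Cover more area around the mouth (Mouth Mask Dilate).
    mask = cv2.dilate(mouth_mask, kernel, iterations=mouth_dilate)
    # Pull back from the face edge (Face Mask Erode).
    mask = cv2.erode(mask, kernel, iterations=face_erode)
    if mask_blur > 0:
        k = mask_blur | 1  # GaussianBlur needs an odd kernel size
        mask = cv2.GaussianBlur(mask, (k, k), 0)  # soften the seam
    return mask
```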

👄 Note on the bark Fidelity

Bark is interesting but sometimes yields strange results (or even hilarious ones). Each generation will give you something different, and it may take several generations before you achieve something conclusive. Apart from English, the other languages tend to sound as if spoken by a foreigner. Sometimes, even if you choose "Male", it will speak like a woman, and vice versa. Sometimes, even when choosing a specific speaker, it will sound like another speaker or even another language.
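For reference, the "[split]" convention from the Usage section can be reproduced with bark's public API (suno-ai/bark). A minimal sketch, assuming bark is installed; the speaker preset is just an example.

```python
# Sketch of the "[split]" convention using bark's public API.
import numpy as np
from bark import SAMPLE_RATE, generate_audio, preload_models
from scipy.io.wavfile import write as write_wav

preload_models()

text = "This is the first part of my text [split] This is the second part of my text"
chunks = [c.strip() for c in text.split("[split]") if c.strip()]

# Generate each ~14-second-max chunk separately, then concatenate.
pieces = [generate_audio(c, history_prompt="v2/en_speaker_0") for c in chunks]
write_wav("speech.wav", SAMPLE_RATE, np.concatenate(pieces))
```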

📺 Examples

deforum_wav2lip.mp4
deforum_wav2lip_en.mp4
real_person_ch.mp4
FaceSwap.mp4

📖 Behind the scenes

This extension operates in several stages to improve the quality of Wav2Lip-generated videos:

  1. Generate face swap video: The script first generates the face swap video if an image is set in the "Face Swap" field. This operation takes time, so be patient.
  2. Generate a Wav2Lip video: The script then generates a low-quality Wav2Lip video using the input video and audio.
  3. Video quality enhancement: Creates a high-quality video from the low-quality one, using the enhancer defined by the user.
  4. Mask creation: The script creates a mask around the mouth and tries to keep other facial motions, like those of the cheeks and chin.
  5. Video generation: The script then takes the high-quality mouth image and overlays it onto the original image, guided by the mouth mask.
  6. Video post-processing: The script then uses the ffmpeg tool to generate the final video.
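Step 5 is essentially alpha compositing. A minimal sketch of the idea, assuming uint8 frames and a single-channel mask; this is not the extension's actual code.

```python
# Blend the enhanced mouth back onto the original frame, guided by the
# (already blurred) mouth mask.
import numpy as np

def composite(original: np.ndarray, enhanced: np.ndarray,
              mask: np.ndarray) -> np.ndarray:
    """original/enhanced: HxWx3 uint8; mask: HxW uint8, 255 where enhanced wins."""
    alpha = (mask.astype(np.float32) / 255.0)[..., None]  # HxWx1 in [0, 1]
    out = (enhanced.astype(np.float32) * alpha
           + original.astype(np.float32) * (1.0 - alpha))
    return out.astype(np.uint8)
```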

💪 Quality tips

  • Use a high-quality video as input.
  • Use a video with a consistent frame rate. Occasionally, videos exhibit unusual playback frame rates (not the standard 24, 25, 30, 60), which can lead to issues with the face mask (see the sketch after this list).
  • Use a high-quality audio file as input, without background noise or music. Clean audio with a tool like https://podcast.adobe.com/enhance.
  • Dilate the mouth mask. This helps the model retain some facial motion and hide the original mouth.
  • Keep Mask Blur at most twice the value of Mouth Mask Dilate. If you want to increase the blur, increase Mouth Mask Dilate as well; otherwise the mouth will be over-blurred and the underlying mouth may become visible.
  • Upscaling can improve the result, particularly around the mouth area, but it extends the processing duration. Use this tutorial from Olivio Sarikas to upscale your video: https://www.youtube.com/watch?v=3z4MKUqFEUk. Ensure the denoising strength is set between 0.0 and 0.05, select the 'revAnimated' model, and use batch mode. I'll create a tutorial for this soon.
  • Ensure there is a face in every frame of the video. If a face is not detected, the process will stop.
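For the frame-rate tip above, a hedged sketch of normalizing a clip to a constant 25 fps with ffmpeg before processing (assumes ffmpeg is on PATH; 25 fps is just an example target):

```python
# Re-encode a source clip to a constant frame rate before feeding it in.
import subprocess

def normalize_fps(src: str, dst: str, fps: int = 25) -> None:
    subprocess.run(
        ["ffmpeg", "-y", "-i", src, "-r", str(fps),
         "-c:v", "libx264", "-crf", "18", "-c:a", "copy", dst],
        check=True,
    )

# normalize_fps("input.mp4", "input_25fps.mp4")
```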

⚠ Noted Constraints

  • To speed up the process, try to keep the resolution under 1000x1000 px: use the resize factor, and upscale after processing.
  • If the initial phase is excessively lengthy, consider using the "resize factor" to decrease the video's dimensions.
  • While there's no strict size limit for videos, larger videos require more processing time. It's advisable to use the "resize factor" to minimize the video size and then upscale the video once processing is complete.

📖 Troubleshooting

  • Mac users: dlib will not install correctly. In requirements.txt, replace "dlib-bin" with "dlib".

📝 To do

  • Tutorials
  • Convert AVI to MP4 (AVI is not shown in the video input, but the process works fine)
  • Add the ability to use a video as the audio input
  • Standalone version
  • ComfyUI integration

😎 Contributing

We welcome contributions to this project. When submitting pull requests, please provide a detailed description of the changes. See CONTRIBUTING for more information.

🙏 Appreciation

☕ Support Wav2lip Studio

This project is an open-source effort that is free to use and modify. I rely on the support of users to keep this project going and to help improve it. If you'd like to support me, you can make a donation on my Patreon page. Any contribution, large or small, is greatly appreciated!

Your support helps me cover the costs of development and maintenance, and allows me to allocate more time and resources to enhancing this project. Thank you for your support!

patreon page

📝 Citation

If you use this project in your own work, in articles, tutorials, or presentations, we encourage you to cite this project to acknowledge the efforts put into it.

To cite this project, please use the following BibTeX format:

@misc{wav2lip_uhq,
  author = {numz},
  title = {Wav2Lip UHQ},
  year = {2023},
  howpublished = {GitHub repository},
  publisher = {numz},
  url = {https://github.com/numz/sd-wav2lip-uhq}
}

📜 License

  • The code in this repository is released under the MIT license as found in the LICENSE file.

sd-wav2lip-uhq's People

Contributors

numz, vantang, vincentqyw


sd-wav2lip-uhq's Issues

Can't reload after installing the extension

/test/SDXL/stable-diffusion-webui$ ./webui.sh --listen

################################################################
Install script for stable-diffusion + Web UI
Tested on Debian 11 (Bullseye)
################################################################

################################################################
Running on aiteam user
################################################################

################################################################
Repo already cloned, using it as install directory
################################################################

################################################################
Create and activate python venv
################################################################

################################################################
Launching launch.py...
################################################################
Using TCMalloc: libtcmalloc_minimal.so.4
Python 3.10.12 (main, Jun 11 2023, 05:26:28) [GCC 11.4.0]
Version: v1.5.2
Commit hash: c9c8485bc1e8720aba70f029d25cba1c4abf2b5c
Installing wav2lip_uhq requirement: dlib-bin
Installing wav2lip_uhq requirement: opencv-python
Installing wav2lip_uhq requirement: pillow
Installing wav2lip_uhq requirement: librosa==0.10.0.post2
Installing wav2lip_uhq requirement: opencv-contrib-python
Installing wav2lip_uhq requirement: git+https://github.com/suno-ai/bark.git

Launching Web UI with arguments: --listen
no module 'xformers'. Processing without...
no module 'xformers'. Processing without...
No module 'xformers'. Proceeding without it.
./webui.sh: line 254: 1963 Illegal instruction (core dumped) "${python_cmd}" "${LAUNCH_SCRIPT}" "$@"

AUTO1111's WebUI dev branch launch error

dev branch 60183eebc37a69545e41cb6b00189609b85129b0

sd-wav2lip-uhq launch error

Here's the console log:

*** Error loading script: ui.py
Traceback (most recent call last):
File "J:\SDWebUI\modules\scripts.py", line 319, in load_scripts
script_module = script_loading.load_module(scriptfile.path)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "J:\SDWebUI\modules\script_loading.py", line 10, in load_module
module_spec.loader.exec_module(module)
File "<frozen importlib._bootstrap_external>", line 940, in exec_module
File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
File "J:\SDWebUI\extensions\sd-wav2lip-uhq\scripts\ui.py", line 3, in <module>
from scripts.wav2lip.w2l import W2l
File "J:\SDWebUI\extensions\sd-wav2lip-uhq\scripts\wav2lip\w2l.py", line 2, in <module>
import cv2, os, scripts.wav2lip.audio as audio
File "J:\SDWebUI\extensions\sd-wav2lip-uhq\scripts\wav2lip\audio.py", line 1, in <module>
import librosa
File "J:\SDWebUI\venv\Lib\site-packages\librosa\__init__.py", line 211, in <module>
from . import core
File "J:\SDWebUI\venv\Lib\site-packages\librosa\core\__init__.py", line 9, in <module>
from .constantq import * # pylint: disable=wildcard-import
^^^^^^^^^^^^^^^^^^^^^^^^
File "J:\SDWebUI\venv\Lib\site-packages\librosa\core\constantq.py", line 1058, in <module>
dtype=np.complex,
^^^^^^^^^^
File "J:\SDWebUI\venv\Lib\site-packages\numpy\__init__.py", line 305, in __getattr__
raise AttributeError(former_attrs[attr])
AttributeError: module 'numpy' has no attribute 'complex'.
np.complex was a deprecated alias for the builtin complex. To avoid this error in existing code, use complex by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use np.complex128 here.
The aliases was originally deprecated in NumPy 1.20; for more details and guidance see the original release note at:
https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations


*** Error loading script: wav2lip_uhq.py
Traceback (most recent call last):
File "J:\SDWebUI\modules\scripts.py", line 319, in load_scripts
script_module = script_loading.load_module(scriptfile.path)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "J:\SDWebUI\modules\script_loading.py", line 10, in load_module
module_spec.loader.exec_module(module)
File "<frozen importlib._bootstrap_external>", line 940, in exec_module
File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
File "J:\SDWebUI\extensions\sd-wav2lip-uhq\scripts\wav2lip_uhq.py", line 15, in <module>
init_wav2lip_uhq()
File "J:\SDWebUI\extensions\sd-wav2lip-uhq\scripts\wav2lip_uhq.py", line 9, in init_wav2lip_uhq
from ui import on_ui_tabs
File "J:\SDWebUI\extensions\sd-wav2lip-uhq\scripts\ui.py", line 3, in <module>
from scripts.wav2lip.w2l import W2l
File "J:\SDWebUI\extensions\sd-wav2lip-uhq\scripts\wav2lip\w2l.py", line 2, in <module>
import cv2, os, scripts.wav2lip.audio as audio
File "J:\SDWebUI\extensions\sd-wav2lip-uhq\scripts\wav2lip\audio.py", line 1, in <module>
import librosa
File "J:\SDWebUI\venv\Lib\site-packages\librosa\__init__.py", line 211, in <module>
from . import core
File "J:\SDWebUI\venv\Lib\site-packages\librosa\core\__init__.py", line 9, in <module>
from .constantq import * # pylint: disable=wildcard-import
^^^^^^^^^^^^^^^^^^^^^^^^
File "J:\SDWebUI\venv\Lib\site-packages\librosa\core\constantq.py", line 1058, in <module>
dtype=np.complex,
^^^^^^^^^^
File "J:\SDWebUI\venv\Lib\site-packages\numpy\__init__.py", line 305, in __getattr__
raise AttributeError(former_attrs[attr])
AttributeError: module 'numpy' has no attribute 'complex'.
np.complex was a deprecated alias for the builtin complex. To avoid this error in existing code, use complex by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use np.complex128 here.
The aliases was originally deprecated in NumPy 1.20; for more details and guidance see the original release note at:
https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
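A possible workaround for the np.complex error in the two tracebacks above, until the librosa pin is updated: either install "numpy<1.24" in the WebUI venv, or restore the removed alias before librosa is imported. A minimal sketch of the latter; this is a quick fix, not an official patch.

```python
# Restore the alias that NumPy 1.24 removed, before librosa is imported.
# Pinning "numpy<1.24" in the WebUI venv is the cleaner option.
import numpy as np

if not hasattr(np, "complex"):
    np.complex = complex  # deprecated alias removed in NumPy 1.24

import librosa  # now imports without the AttributeError
```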

Error loading script: wav2lip_uhq

Can't make it work. Can you help me with this? ThankYou.

Python 3.10.6 (tags/v3.10.6:9c7b4bd, Aug 1 2022, 21:53:49) [MSC v.1932 64 bit (AMD64)]
Version: v1.5.1
Commit hash: 68f336bd994bed5442ad95bad6b6ad5564a5409a
Installing wav2lip_uhq requirement: dlib-bin
Installing wav2lip_uhq requirement: opencv-python
Installing wav2lip_uhq requirement: pillow
Installing wav2lip_uhq requirement: librosa==0.10.0.post2
Installing wav2lip_uhq requirement: opencv-contrib-python

Launching Web UI with arguments: --autolaunch --xformers
*** Error loading script: ui.py
Traceback (most recent call last):
File "C:\AI\SDXL\webui\modules\scripts.py", line 319, in load_scripts
script_module = script_loading.load_module(scriptfile.path)
File "C:\AI\SDXL\webui\modules\script_loading.py", line 10, in load_module
module_spec.loader.exec_module(module)
File "<frozen importlib._bootstrap_external>", line 883, in exec_module
File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
File "C:\AI\SDXL\webui\extensions\sd-wav2lip-uhq\scripts\ui.py", line 4, in <module>
from scripts.wav2lip.wav2lip_uhq import Wav2LipUHQ
File "C:\AI\SDXL\webui\extensions\sd-wav2lip-uhq\scripts\wav2lip\wav2lip_uhq.py", line 8, in <module>
import dlib
File "C:\AI\SDXL\system\python\lib\site-packages\dlib\__init__.py", line 19, in <module>
from _dlib_pybind11 import *
ImportError: DLL load failed while importing _dlib_pybind11: A dynamic link library (DLL) initialization routine failed.


*** Error loading script: wav2lip_uhq.py
Traceback (most recent call last):
File "C:\AI\SDXL\webui\modules\scripts.py", line 319, in load_scripts
script_module = script_loading.load_module(scriptfile.path)
File "C:\AI\SDXL\webui\modules\script_loading.py", line 10, in load_module
module_spec.loader.exec_module(module)
File "<frozen importlib._bootstrap_external>", line 883, in exec_module
File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
File "C:\AI\SDXL\webui\extensions\sd-wav2lip-uhq\scripts\wav2lip_uhq.py", line 15, in <module>
init_wav2lip_uhq()
File "C:\AI\SDXL\webui\extensions\sd-wav2lip-uhq\scripts\wav2lip_uhq.py", line 9, in init_wav2lip_uhq
from ui import on_ui_tabs
File "C:\AI\SDXL\webui\extensions\sd-wav2lip-uhq\scripts\ui.py", line 4, in <module>
from scripts.wav2lip.wav2lip_uhq import Wav2LipUHQ
File "C:\AI\SDXL\webui\extensions\sd-wav2lip-uhq\scripts\wav2lip\wav2lip_uhq.py", line 8, in <module>
import dlib
File "C:\AI\SDXL\system\python\lib\site-packages\dlib\__init__.py", line 19, in <module>
from _dlib_pybind11 import *
ImportError: DLL load failed while importing _dlib_pybind11: A dynamic link library (DLL) initialization routine failed.

Nothing happens

Hi there.
I finally understand the issue with the code errors: there should not be any interruptions in video frames where there is no face on the screen, so the face must always be visible for it to work.
So it runs, but it did nothing.
I chose the video, cut out the audio, and then inserted that video and the same cut audio as the audio input.
The program performed the task for several hours, and as a result I now have the same video with audio. No animations, no effects, nothing; it looks like the output of a video converter where I just merged video and audio.
Maybe I did something wrong. I chose the revAnimated checkpoint and the rest as shown on the 'code' page.

Availability through the A1111 API

A1111 can be started with the --api option - the docs are available on http://127.0.0.1:7860/docs
I see, for example, /sdapi/v1/txt2img to access the txt2img functionality. Is there a way to access Wav2lip Studio in a similar fashion? Thanks

Installation fails; can anyone explain what's going on?

*** Error loading script: wav2lip_uhq.py
Traceback (most recent call last):
File "D:\sd\sd-webui-aki-v4\modules\scripts.py", line 382, in load_scripts
script_module = script_loading.load_module(scriptfile.path)
File "D:\sd\sd-webui-aki-v4\modules\script_loading.py", line 10, in load_module
module_spec.loader.exec_module(module)
File "<frozen importlib._bootstrap_external>", line 883, in exec_module
File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
File "D:\sd\sd-webui-aki-v4\extensions\sd-wav2lip-uhq\scripts\wav2lip_uhq.py", line 11, in <module>
init_wav2lip_uhq()
File "D:\sd\sd-webui-aki-v4\extensions\sd-wav2lip-uhq\scripts\wav2lip_uhq.py", line 7, in init_wav2lip_uhq
from ui import on_ui_tabs
File "D:\sd\sd-webui-aki-v4\extensions\sd-wav2lip-uhq\scripts\ui.py", line 7, in <module>
from scripts.bark.tts import TTS
File "D:\sd\sd-webui-aki-v4\extensions\sd-wav2lip-uhq\scripts\bark\tts.py", line 5, in <module>
from bark.generation import (
ModuleNotFoundError: No module named 'bark.generation'
Note: The Python runtime threw an exception. Please check the troubleshooting page.

LLVM ERROR: Symbol not found: __svml_cosf4_ha

I installed the latest version of sd-wav2lip-uhq and kept only sd-wav2lip-uhq in the extensions folder, just to troubleshoot the problem. SD-webui 1.5.1 latest version

Here is the error message from the backend:
Using cuda for inference.
Reading video frames...
Number of frames available for inference: 459
LLVM ERROR: Symbol not found: __svml_cosf4_ha
Please press any key to continue . . .

RuntimeError

RuntimeError: Detected that PyTorch and TorchAudio were compiled with different CUDA versions. PyTorch has CUDA version 11.8 whereas TorchAudio has CUDA version 11.7. Please install the TorchAudio version that matches your PyTorch version.

Output video does not complete the talking

Hello...
Thanks for your adaptation of Wav2LIp for Automatic1111 !!

I encountered a problem: creating the low-resolution video was successful, but the HD process stopped after 279 frames, as if the movie were finished.

Processing frame: 265 of 1133 -- 43s/it]
100%|██████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 2.65it/s]
Processing frame: 266 of 1133 -- 43s/it]
100%|██████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 2.62it/s]
Processing frame: 267 of 1133 -- 44s/it]
100%|██████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 2.65it/s]
Processing frame: 268 of 1133 -- 44s/it]
100%|██████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 2.65it/s]
Processing frame: 269 of 1133 -- 44s/it]
100%|██████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 2.64it/s]
Processing frame: 270 of 1133 -- 44s/it]
100%|██████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 2.66it/s]
Processing frame: 271 of 1133 -- 44s/it]
100%|██████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 2.65it/s]
Processing frame: 272 of 1133 -- 44s/it]
100%|██████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 2.66it/s]
Processing frame: 273 of 1133 -- 43s/it]
100%|██████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 2.65it/s]
Processing frame: 274 of 1133 -- 43s/it]
100%|██████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 2.63it/s]
Processing frame: 275 of 1133 -- 43s/it]
100%|██████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 2.65it/s]
Processing frame: 276 of 1133 -- 43s/it]
100%|██████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 2.64it/s]
Processing frame: 277 of 1133 -- 43s/it]
100%|██████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 2.65it/s]
Processing frame: 278 of 1133 -- 43s/it]
100%|██████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 2.65it/s]
Processing frame: 279 of 1133 -- 43s/it]
100%|██████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 2.65it/s]
Processing frame: 280 of 1133 -- 43s/it]
[INFO] Create Video output!
[INFO] Extract Audio from input!
[INFO] Add Audio to Video!
[INFO] Done! file save in output/video_output.mp4

result_voice.mp4
output_video.mp4

How can I solve this problem?

*** Error loading script: ui.py
Traceback (most recent call last):
File "C:\stable-diffusion-webui\modules\scripts.py", line 382, in load_scripts
script_module = script_loading.load_module(scriptfile.path)
File "C:\stable-diffusion-webui\modules\script_loading.py", line 10, in load_module
module_spec.loader.exec_module(module)
File "<frozen importlib._bootstrap_external>", line 883, in exec_module
File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
File "C:\stable-diffusion-webui\extensions\sd-wav2lip-uhq\scripts\ui.py", line 8, in <module>
from scripts.faceswap.swap import FaceSwap
File "C:\stable-diffusion-webui\extensions\sd-wav2lip-uhq\scripts\faceswap\swap.py", line 6, in <module>
import insightface
ModuleNotFoundError: No module named 'insightface'


*** Error loading script: wav2lip_uhq.py
Traceback (most recent call last):
File "C:\stable-diffusion-webui\modules\scripts.py", line 382, in load_scripts
script_module = script_loading.load_module(scriptfile.path)
File "C:\stable-diffusion-webui\modules\script_loading.py", line 10, in load_module
module_spec.loader.exec_module(module)
File "<frozen importlib._bootstrap_external>", line 883, in exec_module
File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
File "C:\stable-diffusion-webui\extensions\sd-wav2lip-uhq\scripts\wav2lip_uhq.py", line 11, in <module>
init_wav2lip_uhq()
File "C:\stable-diffusion-webui\extensions\sd-wav2lip-uhq\scripts\wav2lip_uhq.py", line 7, in init_wav2lip_uhq
from ui import on_ui_tabs
File "C:\stable-diffusion-webui\extensions\sd-wav2lip-uhq\scripts\ui.py", line 8, in <module>
from scripts.faceswap.swap import FaceSwap
File "C:\stable-diffusion-webui\extensions\sd-wav2lip-uhq\scripts\faceswap\swap.py", line 6, in <module>
import insightface
ModuleNotFoundError: No module named 'insightface'


Incompatibility issue with lower dependency environment for wav2lip-uhq

Could you please upgrade the related dependency versions or advise a solution?
Issue: I installed wav2lip-uhq and configured the dependency environment, downgrading the versions of packages such as opencv-python and tqdm, but that caused a lot of tools to stop working.
Please refer to the alert below.
image

bark

*** stderr: Running command git clone --filter=blob:none --quiet https://github.com/suno-ai/bark.git /tmp/pip-req-build-922ue580
*** error: RPC failed; curl 16 Error in the HTTP2 framing layer
*** fatal: expected flush after ref listing
*** error: subprocess-exited-with-error


*** × git clone --filter=blob:none --quiet https://github.com/suno-ai/bark.git /tmp/pip-req-build-922ue580 did not run successfully.
*** │ exit code: 128
*** ╰─> See above for output.


*** note: This error originates from a subprocess, and is likely not a problem with pip.
*** error: subprocess-exited-with-error


*** × git clone --filter=blob:none --quiet https://github.com/suno-ai/bark.git /tmp/pip-req-build-922ue580 did not run successfully.
*** │ exit code: 128
*** ╰─> See above for output.


*** note: This error originates from a subprocess, and is likely not a problem with pip.


*** [notice] A new release of pip is available: 23.0.1 -> 23.2.1
*** [notice] To update, run: python -m pip install --upgrade pip


Launching Web UI with arguments: --no-download-sd-model --xformers --share --listen --enable-insecure-extension-access

Does not work offline / HD problem

Hi, and thanks for the great work.
I have two issues.
They may be similar to some earlier issues, but being a novice, I'm unsure...

  1. Files larger than 896x512, like 720p or 1080p, do not seem to work; the GPU gets stuck at 100%.
    What is the max video input resolution? I have a 2080 GPU.
    I tried installing the Nvidia-specific A1111 installer, but it did not work, and I had to go with option 2.

  2. Every time I run A1111 it does this:
    Installing wav2lip_uhq requirement: dlib-bin
    Installing wav2lip_uhq requirement: opencv-python
    Installing wav2lip_uhq requirement: pillow
    Installing wav2lip_uhq requirement: librosa==0.10.0.post2
    Installing wav2lip_uhq requirement: opencv-contrib-python
    Installing wav2lip_uhq requirement: git+https://github.com/suno-ai/bark.git

If I start without network I get:
venv "C:\Users\Boya PC\sd.webui\webui\venv\Scripts\Python.exe"
Python 3.10.6 (tags/v3.10.6:9c7b4bd, Aug 1 2022, 21:53:49) [MSC v.1932 64 bit (AMD64)]
Version: v1.5.2
Commit hash: c9c8485bc1e8720aba70f029d25cba1c4abf2b5c
*** Error running install.py for extension C:\Users\Boya PC\sd.webui\webui\extensions\sd-wav2lip-uhq.
*** Command: "C:\Users\Boya PC\sd.webui\webui\venv\Scripts\python.exe" "C:\Users\Boya PC\sd.webui\webui\extensions\sd-wav2lip-uhq\install.py"
*** Error code: 1
*** stdout: Installing wav2lip_uhq requirement: dlib-bin
*** Installing wav2lip_uhq requirement: opencv-python
*** Installing wav2lip_uhq requirement: pillow
*** Installing wav2lip_uhq requirement: librosa==0.10.0.post2
*** Installing wav2lip_uhq requirement: opencv-contrib-python
*** Installing wav2lip_uhq requirement: git+https://github.com/suno-ai/bark.git


*** stderr: Traceback (most recent call last):
*** File "C:\Users\Boya PC\sd.webui\webui\extensions\sd-wav2lip-uhq\install.py", line 10, in <module>
*** launch.run_pip(f"install {lib}", f"wav2lip_uhq requirement: {lib}")
*** File "C:\Users\Boya PC\sd.webui\webui\modules\launch_utils.py", line 136, in run_pip
*** return run(f'"{python}" -m pip {command} --prefer-binary{index_url_line}', desc=f"Installing {desc}", errdesc=f"Couldn't install {desc}", live=live)
*** File "C:\Users\Boya PC\sd.webui\webui\modules\launch_utils.py", line 113, in run
*** raise RuntimeError("\n".join(error_bits))
*** RuntimeError: Couldn't install wav2lip_uhq requirement: git+https://github.com/suno-ai/bark.git.
*** Command: "C:\Users\Boya PC\sd.webui\webui\venv\Scripts\python.exe" -m pip install git+https://github.com/suno-ai/bark.git --prefer-binary
*** Error code: 1
*** stdout: Collecting git+https://github.com/suno-ai/bark.git
*** Cloning https://github.com/suno-ai/bark.git to c:\users\boya pc\appdata\local\temp\pip-req-build-wo4lxrkr


*** stderr: Running command git clone --filter=blob:none --quiet https://github.com/suno-ai/bark.git 'C:\Users\Boya PC\AppData\Local\Temp\pip-req-build-wo4lxrkr'
*** fatal: unable to access 'https://github.com/suno-ai/bark.git/': Could not resolve host: github.com
*** error: subprocess-exited-with-error


*** git clone --filter=blob:none --quiet https://github.com/suno-ai/bark.git 'C:\Users\Boya PC\AppData\Local\Temp\pip-req-build-wo4lxrkr' did not run successfully.
*** exit code: 128


*** See above for output.


*** note: This error originates from a subprocess, and is likely not a problem with pip.
*** error: subprocess-exited-with-error


*** git clone --filter=blob:none --quiet https://github.com/suno-ai/bark.git 'C:\Users\Boya PC\AppData\Local\Temp\pip-req-build-wo4lxrkr' did not run successfully.
*** exit code: 128


*** See above for output.


*** note: This error originates from a subprocess, and is likely not a problem with pip.


Launching Web UI with arguments:
no module 'xformers'. Processing without...
no module 'xformers'. Processing without...
No module 'xformers'. Proceeding without it.
Loading weights [6ce0161689] from C:\Users\Boya PC\sd.webui\webui\models\Stable-diffusion\v1-5-pruned-emaonly.safetensors
*** Error executing callback ui_tabs_callback for C:\Users\Boya PC\sd.webui\webui\extensions\sd-wav2lip-uhq\scripts\wav2lip_uhq.py
Traceback (most recent call last):
File "C:\Users\Boya PC\sd.webui\webui\extensions\sd-wav2lip-uhq\scripts\ui.py", line 68, in on_ui_tabs
audio_example = gr.Audio(label="Audio example",
File "C:\Users\Boya PC\sd.webui\webui\venv\lib\site-packages\gradio\components.py", line 2390, in __init__
IOComponent.__init__(
File "C:\Users\Boya PC\sd.webui\webui\modules\scripts.py", line 654, in IOComponent_init
res = original_IOComponent_init(self, *args, **kwargs)
File "C:\Users\Boya PC\sd.webui\webui\venv\lib\site-packages\gradio\components.py", line 215, in __init__
else self.postprocess(initial_value)
File "C:\Users\Boya PC\sd.webui\webui\venv\lib\site-packages\gradio\components.py", line 2591, in postprocess
file_path = self.make_temp_copy_if_needed(y)
File "C:\Users\Boya PC\sd.webui\webui\venv\lib\site-packages\gradio\components.py", line 259, in make_temp_copy_if_needed
temp_dir = self.hash_file(file_path)
File "C:\Users\Boya PC\sd.webui\webui\venv\lib\site-packages\gradio\components.py", line 223, in hash_file
with open(file_path, "rb") as f:
OSError: [Errno 22] Invalid argument: 'https://dl.suno-models.io/bark/prompts/prompt_audio/en_speaker_0.mp3'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\Boya PC\sd.webui\webui\modules\script_callbacks.py", line 153, in ui_tabs_callback
    res += c.callback() or []
  File "C:\Users\Boya PC\sd.webui\webui\extensions\sd-wav2lip-uhq\scripts\ui.py", line 37, in on_ui_tabs
    with gr.Blocks(analytics_enabled=False) as wav2lip_uhq_interface:
  File "C:\Users\Boya PC\sd.webui\webui\venv\lib\site-packages\gradio\blocks.py", line 1411, in __exit__
    self.config = self.get_config_file()
  File "C:\Users\Boya PC\sd.webui\webui\venv\lib\site-packages\gradio\blocks.py", line 1378, in get_config_file
    props = block.get_config() if hasattr(block, "get_config") else {}
  File "C:\Users\Boya PC\sd.webui\webui\venv\lib\site-packages\gradio\components.py", line 2408, in get_config
    "value": self.value,
AttributeError: 'Audio' object has no attribute 'value'

Creating model from config: C:\Users\Boya PC\sd.webui\webui\configs\v1-inference.yaml
LatentDiffusion: Running in eps-prediction mode
DiffusionWrapper has 859.52 M params.

Thanks for being a Gradio user! If you have questions or feedback, please join our Discord server and chat with us: https://discord.gg/feTf9x3ZSB
Running on local URL: http://127.0.0.1:7860

To create a public link, set share=True in launch().
Startup time: 21.4s (launcher: 12.9s, import torch: 3.3s, import gradio: 0.8s, setup paths: 0.8s, other imports: 0.9s, load scripts: 1.6s, create ui: 0.4s, gradio launch: 0.5s, app_started_callback: 0.1s).
Applying attention optimization: Doggettx... done.
Model loaded in 3.9s (load weights from disk: 0.7s, create model: 0.4s, apply weights to model: 0.8s, apply half(): 0.7s, move model to device: 1.1s).

Getting this error on M1

*** Error loading script: wav2lip_uhq.py
Traceback (most recent call last):
File "/Users/chief/stable-diffusion-webui/modules/scripts.py", line 319, in load_scripts
script_module = script_loading.load_module(scriptfile.path)
File "/Users/chief/stable-diffusion-webui/modules/script_loading.py", line 10, in load_module
module_spec.loader.exec_module(module)
File "<frozen importlib._bootstrap_external>", line 883, in exec_module
File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
File "/Users/chief/stable-diffusion-webui/extensions/sd-wav2lip-uhq/scripts/wav2lip_uhq.py", line 11, in <module>
init_wav2lip_uhq()
File "/Users/chief/stable-diffusion-webui/extensions/sd-wav2lip-uhq/scripts/wav2lip_uhq.py", line 7, in init_wav2lip_uhq
from ui import on_ui_tabs
File "/Users/chief/stable-diffusion-webui/extensions/sd-wav2lip-uhq/scripts/ui.py", line 5, in <module>
from scripts.wav2lip.wav2lip_uhq import Wav2LipUHQ
File "/Users/chief/stable-diffusion-webui/extensions/sd-wav2lip-uhq/scripts/wav2lip/wav2lip_uhq.py", line 4, in <module>
import dlib
File "/Users/chief/stable-diffusion-webui/venv/lib/python3.10/site-packages/dlib/__init__.py", line 19, in <module>
from _dlib_pybind11 import *
ImportError: dlopen(/Users/chief/stable-diffusion-webui/venv/lib/python3.10/site-packages/_dlib_pybind11.cpython-310-darwin.so, 0x0002): Library not loaded: '/opt/homebrew/opt/ffmpeg/lib/libavdevice.59.dylib'
Referenced from: '/Users/chief/stable-diffusion-webui/venv/lib/python3.10/site-packages/_dlib_pybind11.cpython-310-darwin.so'
Reason: tried: '/opt/homebrew/opt/ffmpeg/lib/libavdevice.59.dylib' (no such file), '/usr/local/lib/libavdevice.59.dylib' (no such file), '/usr/lib/libavdevice.59.dylib' (no such file), '/opt/homebrew/Cellar/ffmpeg/6.0_1/lib/libavdevice.59.dylib' (no such file), '/usr/local/lib/libavdevice.59.dylib' (no such file), '/usr/lib/libavdevice.59.dylib' (no such file)

Several code errors

Screenshot 2023-08-05 105922

Could you please explain what is happening? It looks like there are some errors in the code.
ffmpeg was installed (it would be great if there were more detailed instructions on how to do that, to save time for users).
ffmpeg is installed and visible in cmd, and the extension is installed, but there are still errors.

wav2lip does not show in the SD menu bar

System: Mac M1 Pro, 16 GB
SD version: 1.4
I have already git-cloned [sd-wav2lip-uhq] and placed the model files in the correct folders, but the tab does not show in the SD menu. Any guidance would be appreciated, thank you.

urllib.error.URLError

Downloading the detection model to C:\Users\unis.ifnude/detector.onnx
*** Error loading script: ui.py
Traceback (most recent call last):
File "urllib\request.py", line 1348, in do_open
File "http\client.py", line 1283, in request
File "http\client.py", line 1329, in _send_request
File "http\client.py", line 1278, in endheaders
File "http\client.py", line 1038, in _send_output
File "http\client.py", line 976, in send
File "http\client.py", line 1448, in connect
File "http\client.py", line 942, in connect
File "socket.py", line 845, in create_connection
File "socket.py", line 833, in create_connection
TimeoutError: [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or the established connection failed because the connected host has failed to respond.
Note: The Python runtime threw an exception. Please check the troubleshooting page.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "D:\sd-webui-aki-v4\modules\scripts.py", line 382, in load_scripts
    script_module = script_loading.load_module(scriptfile.path)
  File "D:\sd-webui-aki-v4\modules\script_loading.py", line 10, in load_module
    module_spec.loader.exec_module(module)
  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "D:\sd-webui-aki-v4\extensions\sd-wav2lip-uhq-main\scripts\ui.py", line 8, in <module>
    from scripts.faceswap.swap import FaceSwap
  File "D:\sd-webui-aki-v4\extensions\sd-wav2lip-uhq-main\scripts\faceswap\swap.py", line 16, in <module>
    from ifnude import detect
  File "D:\sd-webui-aki-v4\py310\lib\site-packages\ifnude\__init__.py", line 1, in <module>
    from .detector import detect
  File "D:\sd-webui-aki-v4\py310\lib\site-packages\ifnude\detector.py", line 36, in <module>
    download(model_url, model_path)
  File "D:\sd-webui-aki-v4\py310\lib\site-packages\ifnude\detector.py", line 16, in download
    request = urllib.request.urlopen(url)
  File "urllib\request.py", line 216, in urlopen
  File "urllib\request.py", line 519, in open
  File "urllib\request.py", line 536, in _open
  File "urllib\request.py", line 496, in _call_chain
  File "urllib\request.py", line 1391, in https_open
  File "urllib\request.py", line 1351, in do_open
urllib.error.URLError: <urlopen error [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or the established connection failed because the connected host has failed to respond.>

Downloading the detection model to C:\Users\unis.ifnude/detector.onnx
*** Error loading script: wav2lip_uhq.py
Traceback (most recent call last):
File "urllib\request.py", line 1348, in do_open
File "http\client.py", line 1283, in request
File "http\client.py", line 1329, in _send_request
File "http\client.py", line 1278, in endheaders
File "http\client.py", line 1038, in _send_output
File "http\client.py", line 976, in send
File "http\client.py", line 1448, in connect
File "http\client.py", line 942, in connect
File "socket.py", line 845, in create_connection
File "socket.py", line 833, in create_connection
TimeoutError: [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or the established connection failed because the connected host has failed to respond.
Note: The Python runtime threw an exception. Please check the troubleshooting page.

Running on CPU while saying "Using cuda for inference."

I see the process obviously running on the CPU and filling my memory in Task Manager, with the CPU at 100%, even though it said "Using cuda for inference." Would it be possible to add some more checks and error raising to avoid this situation?

FileNotFoundError: [Errno 2] No such file or directory: 'ffmpeg ...

Hi. Thanks so much for implementing this in A1111.

I'm running in Paperspace and confirmed FFmpeg -version: "ffmpeg version 4.2.7-0ubuntu0.1"

However I keep getting this error:
FileNotFoundError: [Errno 2] No such file or directory: 'ffmpeg -y -i /tmp/gradio/5aeb60fa9989b9e4f701c6358b152a2d62947b62/21028_Infiniti_WildWorld-AIRedux_071823_RK.wav -i /notebooks/sd/stable-diffusion-webui/extensions/sd-wav2lip-uhq/scripts/wav2lip/temp/result.avi -strict -2 -q:v 1 /notebooks/sd/stable-diffusion-webui/extensions/sd-wav2lip-uhq/scripts/wav2lip/results/result_voice.mp4'

Any idea if its an FFmpeg issue or something else?

Thanks
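One plausible cause, given that the whole command string appears as the "file" in the error: Python's subprocess treats an unsplit command string as a single executable name unless shell=True is set. A hedged sketch of the safe pattern (paths shortened for illustration, not the extension's actual code):

```python
# FileNotFoundError with the entire command as the "filename" usually means
# the command string was never split into an argument list.
import shlex
import subprocess

cmd = "ffmpeg -y -i input.wav -i result.avi -strict -2 -q:v 1 result_voice.mp4"
subprocess.run(shlex.split(cmd), check=True)  # split into argv, not one string
```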

Errors when trying to generate

Hi. First, the link to s3fd.pth is broken. But I found it elsewhere.

Then, when I try to generate, I get the following errors:

File "C:\StableDiff\venv\lib\site-packages\gradio\routes.py", line 422, in run_predict
output = await app.get_blocks().process_api(
File "C:\StableDiff\venv\lib\site-packages\gradio\blocks.py", line 1323, in process_api
result = await self.call_function(
File "C:\StableDiff\venv\lib\site-packages\gradio\blocks.py", line 1051, in call_function
prediction = await anyio.to_thread.run_sync(
File "C:\StableDiff\venv\lib\site-packages\anyio\to_thread.py", line 31, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "C:\StableDiff\venv\lib\site-packages\anyio_backends_asyncio.py", line 937, in run_sync_in_worker_thread
return await future
File "C:\StableDiff\venv\lib\site-packages\anyio_backends_asyncio.py", line 867, in run
result = context.run(func, *args)
File "C:\StableDiff\extensions\sd-wav2lip-uhq\scripts\ui.py", line 43, in generate
w2l.execute()
File "C:\StableDiff\extensions\sd-wav2lip-uhq\scripts\wav2lip\w2l.py", line 223, in execute
mel = audio.melspectrogram(wav)
File "C:\StableDiff\extensions\sd-wav2lip-uhq\scripts\wav2lip\audio.py", line 47, in melspectrogram
S = _amp_to_db(_linear_to_mel(np.abs(D))) - hp.ref_level_db
File "C:\StableDiff\extensions\sd-wav2lip-uhq\scripts\wav2lip\audio.py", line 95, in _linear_to_mel
_mel_basis = _build_mel_basis()
File "C:\StableDiff\extensions\sd-wav2lip-uhq\scripts\wav2lip\audio.py", line 100, in _build_mel_basis
return librosa.filters.mel(hp.sample_rate, hp.n_fft, n_mels=hp.num_mels,
TypeError: mel() takes 0 positional arguments but 2 positional arguments (and 3 keyword-only arguments) were given
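This TypeError matches librosa >= 0.10, where librosa.filters.mel takes keyword-only arguments. A hedged sketch of the corrected call from audio.py, with placeholder values standing in for the extension's hp object:

```python
import librosa

# Placeholder hyperparameters standing in for wav2lip's `hp` object.
sample_rate, n_fft, num_mels, fmin, fmax = 16000, 800, 80, 55, 7600

# old (librosa < 0.10): librosa.filters.mel(sample_rate, n_fft, n_mels=num_mels, ...)
mel_basis = librosa.filters.mel(sr=sample_rate, n_fft=n_fft, n_mels=num_mels,
                                fmin=fmin, fmax=fmax)
```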

Progress in the console

Hello

I am currently making a video with sound, 1080x1920.
image

But I don't see the progress in the console; I only see that the video card's CUDA is active. Is it somehow possible to display the progress, as when generating in Automatic1111, so the process is visible? As it stands, it is not clear how much more it has to do.

Screen-1549

Is it possible to make the progress bar not so big?
Screen-1559

Feature request

This is really cool, and I would like to request some kind of control for the lip-sync.

For example, the way the lips move should be based on some parameters and on audio frequency (+increase or -decrease),
or something that can help control the amount of movement of the lips.

Errors on code

I updated the extension and changed the video and audio (MP4 and MP3 files),
but the program loaded something for about 10 minutes and then gave me errors. See the attached file.
Could you please advise what the problem is and how to get rid of it?
Screenshot 2023-08-13 121001

Every time I run, I get this error: RuntimeError: unexpected EOF, expected 9218993 more bytes. The file might be corrupted.

Using cuda for inference.
Reading video frames...
Number of frames available for inference: 114
(80, 1335)
Length of mel chunks: 496
0% 0/4 [00:00<?, ?it/s]
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/gradio/routes.py", line 422, in run_predict
output = await app.get_blocks().process_api(
File "/usr/local/lib/python3.10/dist-packages/gradio/blocks.py", line 1323, in process_api
result = await self.call_function(
File "/usr/local/lib/python3.10/dist-packages/gradio/blocks.py", line 1051, in call_function
prediction = await anyio.to_thread.run_sync(
File "/usr/local/lib/python3.10/dist-packages/anyio/to_thread.py", line 31, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "/usr/local/lib/python3.10/dist-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread
return await future
File "/usr/local/lib/python3.10/dist-packages/anyio/_backends/_asyncio.py", line 867, in run
result = context.run(func, *args)
File "/content/gdrive/MyDrive/sd/stable-diffusion-webui/extensions/sd-wav2lip-uhq/scripts/ui.py", line 154, in generate
w2l.execute()
File "/content/gdrive/MyDrive/sd/stable-diffusion-webui/extensions/sd-wav2lip-uhq/scripts/wav2lip/w2l.py", line 250, in execute
for i, (img_batch, mel_batch, frames, coords) in enumerate(tqdm(gen,
File "/usr/local/lib/python3.10/dist-packages/tqdm/std.py", line 1195, in __iter__
for obj in iterable:
File "/content/gdrive/MyDrive/sd/stable-diffusion-webui/extensions/sd-wav2lip-uhq/scripts/wav2lip/w2l.py", line 114, in datagen
face_det_results = self.face_detect(frames) # BGR2RGB for CNN face detection
File "/content/gdrive/MyDrive/sd/stable-diffusion-webui/extensions/sd-wav2lip-uhq/scripts/wav2lip/w2l.py", line 65, in face_detect
detector = face_detection.FaceAlignment(face_detection.LandmarksType._2D,
File "/content/gdrive/MyDrive/sd/stable-diffusion-webui/extensions/sd-wav2lip-uhq/scripts/wav2lip/face_detection/api.py", line 59, in __init__
self.face_detector = face_detector_module.FaceDetector(device=device, verbose=verbose)
File "/content/gdrive/MyDrive/sd/stable-diffusion-webui/extensions/sd-wav2lip-uhq/scripts/wav2lip/face_detection/detection/sfd/sfd_detector.py", line 24, in __init__
model_weights = torch.load(path_to_detector)
File "/content/gdrive/MyDrive/sd/stable-diffusion-webui/modules/safe.py", line 108, in load
return load_with_extra(filename, *args, extra_handler=global_extra_handler, **kwargs)
File "/content/gdrive/MyDrive/sd/stable-diffusion-webui/modules/safe.py", line 156, in load_with_extra
return unsafe_torch_load(filename, *args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/serialization.py", line 815, in load
return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
File "/usr/local/lib/python3.10/dist-packages/torch/serialization.py", line 1051, in _legacy_load
typed_storage._untyped_storage._set_from_file(
RuntimeError: unexpected EOF, expected 9218993 more bytes. The file might be corrupted.

Transfer lips from original video to generated video

Hi,
My video is not in English, and I've created a video using SD+CN with img2img (creating a video from an existing video).
So I was wondering if it is possible to use your great project to map the original video's lip and face motions onto the generated video (which lacks accuracy in lip and face motions)?

Thanks
Regards

Unexpected version found while deserializing dlib::shape_predictor

Any way to get around this error?

Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/gradio/routes.py", line 422, in run_predict
output = await app.get_blocks().process_api(
File "/usr/local/lib/python3.10/dist-packages/gradio/blocks.py", line 1323, in process_api
result = await self.call_function(
File "/usr/local/lib/python3.10/dist-packages/gradio/blocks.py", line 1051, in call_function
prediction = await anyio.to_thread.run_sync(
File "/usr/local/lib/python3.10/dist-packages/anyio/to_thread.py", line 31, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "/usr/local/lib/python3.10/dist-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread
return await future
File "/usr/local/lib/python3.10/dist-packages/anyio/_backends/_asyncio.py", line 867, in run
result = context.run(func, *args)
File "/storage/stable-diffusion/stable-diffusion-webui/extensions/sd-wav2lip-uhq/scripts/ui.py", line 46, in generate
w2luhq.execute()
File "/storage/stable-diffusion/stable-diffusion-webui/extensions/sd-wav2lip-uhq/scripts/wav2lip/wav2lip_uhq.py", line 173, in execute
detector, predictor = self.initialize_dlib_predictor()
File "/storage/stable-diffusion/stable-diffusion-webui/extensions/sd-wav2lip-uhq/scripts/wav2lip/wav2lip_uhq.py", line 146, in initialize_dlib_predictor
predictor = dlib.shape_predictor(self.wav2lip_folder + "/predicator/shape_predictor_68_face_landmarks.dat")
RuntimeError: Unexpected version found while deserializing dlib::shape_predictor.

Close the Original Mouth Video Feature?

I'm trying to reuse a video of someone talking to make them say something else, but when I process the video using Wav2Lip I can still see the original mouth talking during silences / gaps in the new audio.

Is it possible to close the original mouth on a video before adding the new lip movements?
Or is there something we could add to the text prompt to close the mouth as a first pass?

Thanks and keep up the good work! 👍

code 1: wav2lip does not show in the menu bar

          Can you provide me with the cmd console log?

Originally posted by @numz in #39 (comment)

note: This error originates from a subprocess, and is likely not a problem with pip.
Traceback (most recent call last):
File "C:\Users\Derry\Desktop\sd-webui-aki\sd-webui-aki-v4.4\extensions\sd-wav2lip-uhq\install.py", line 10, in <module>
launch.run_pip(f"install {lib}", f"wav2lip_uhq requirement: {lib}")
File "C:\Users\Derry\Desktop\sd-webui-aki\sd-webui-aki-v4.4.launcher\pyinterop.hfkx1kkk0g4q7.zip\swlpatches\progress\launch.py", line 49, in wrapped_run_pip
File "C:\Users\Derry\Desktop\sd-webui-aki\sd-webui-aki-v4.4\modules\launch_utils.py", line 138, in run_pip
return run(f'"{python}" -m pip {command} --prefer-binary{index_url_line}', desc=f"Installing {desc}", errdesc=f"Couldn't install {desc}", live=live)
File "C:\Users\Derry\Desktop\sd-webui-aki\sd-webui-aki-v4.4\modules\launch_utils.py", line 115, in run
raise RuntimeError("\n".join(error_bits))
RuntimeError: Couldn't install wav2lip_uhq requirement: git+https://github.com/suno-ai/bark.git.

So Far Not So Good

Hi, I'm a content creator on YouTube and I was really excited to try out your new extension so I could make a video about it... Well, in short, I can't get it to work, so it would not be a very good video... Seriously though, this is to be expected since you just released it. Anyway, when I attempt to generate an output, I get the following console error. A1111 is up to date, as are all my dependencies. I have downloaded the appropriate models and placed them in the correct folders. Do you have any idea what I'm doing wrong, or is this in fact a problem with your extension? Thanks!
Wav2Lip

Crash at the end of image generation

Hello, I installed your extension, but sadly it crashes after the last frame is done generating.
I'm using Auto 1.5XX.

Here is the error output:

Processing frame: 135 of 135 --
Traceback (most recent call last):
File "E:\stable-diffusion-webui\venv\lib\site-packages\gradio\routes.py", line 422, in run_predict
output = await app.get_blocks().process_api(
File "E:\stable-diffusion-webui\venv\lib\site-packages\gradio\blocks.py", line 1323, in process_api
result = await self.call_function(
File "E:\stable-diffusion-webui\venv\lib\site-packages\gradio\blocks.py", line 1051, in call_function
prediction = await anyio.to_thread.run_sync(
File "E:\stable-diffusion-webui\venv\lib\site-packages\anyio\to_thread.py", line 33, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "E:\stable-diffusion-webui\venv\lib\site-packages\anyio\_backends\_asyncio.py", line 877, in run_sync_in_worker_thread
return await future
File "E:\stable-diffusion-webui\venv\lib\site-packages\anyio\_backends\_asyncio.py", line 807, in run
result = context.run(func, *args)
File "E:\stable-diffusion-webui\extensions\sd-wav2lip-uhq\scripts\ui.py", line 46, in generate
w2luhq.execute()
File "E:\stable-diffusion-webui\extensions\sd-wav2lip-uhq\scripts\wav2lip\wav2lip_uhq.py", line 243, in execute
cv2.destroyAllWindows()
cv2.error: OpenCV(4.7.0) D:\a\opencv-python\opencv-python\opencv\modules\highgui\src\window.cpp:1266: error: (-2:Unspecified error) The function is not implemented. Rebuild the library with Windows, GTK+ 2.x or Cocoa support. If you are on Ubuntu or Debian, install libgtk2.0-dev and pkg-config, then re-run cmake or configure script in function 'cvDestroyAllWindows'

Thanks in advance for your help.
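The cvDestroyAllWindows error usually indicates an OpenCV build without GUI support (for example opencv-python-headless). A hedged workaround, assuming the call is merely cleanup, is to guard it; installing the full opencv-python package is the alternative.

```python
# Guard the GUI teardown so headless OpenCV builds don't crash the run.
import cv2

try:
    cv2.destroyAllWindows()
except cv2.error:
    pass  # headless OpenCV has no windows to destroy; safe to ignore
```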

Use without UI

Hi,

Thanks for your great work. Can you explain how to use it from the command line? Or do you have any related scripts?

About API

I used this plugin in one of my recent projects and needed to rewrite the UI. Could you please provide the interface documentation? Thank you so much.

Stable Diffusion 1.6.0 version reports an error

In Stable Diffusion version 1.6.0, after downloading the plug-in, the name changed to Wav2Lip Studio and it cannot be used. When opening the plug-in interface, it keeps reporting errors. Is it currently unavailable in version 1.6.0, and is there a limit to the voice duration? Python 3.10.11

The system cannot find the path specified

Hi!
Trying to install this extension, but after installing and restarting the UI an error is thrown.
I am using the portable A1111 build by Xpuct, and all the extensions I have installed are working properly. This build is kept up to date with the original version. FFmpeg and other dependencies are installed, and paths are set.

Screenshots of the error THERE

Is there any solution to this problem?

Menu does not appear

Hello,

I installed Automatic1111 and it works well. I followed the installation instructions, but the menu does not appear. I tried rebooting the server and the menu still does not appear.

Is there anything I can do to see if there is an error?
