r3gm / sonitranslate

Synchronized Translation for Videos. Video dubbing

License: Apache License 2.0

Python 84.14% Jupyter Notebook 15.86%

audio-processing diarization translation translate-audio translate-video video-dubbing asr automatic-dubbing document-translator dubbing

sonitranslate's Introduction

💫 About Me:

Hello there! Welcome to my profile. I'm an individual with a keen interest in data analytics and data science. I am passionate about exploring and deriving meaningful insights from data to drive informed decision-making and solve complex problems. This README will provide you with an overview of my background, skills, and areas of expertise.

sonitranslate's People

Forkers

hitech777 wonhyeongseo sangramdhurve kennytat oijoijcoiejoijce denskl1 aryusoni27 render-ai transonit magicse ap1075 kingmacth serjik777 huangweiboy2 jmaigc kaptainkangaroo vcstack yaranbarzi clebersonjf83 vamsigottipati kcbf isaacmuxic bzxxxxxx vieux448 b4zz4 michahial alilotfi1389 trananh1992 arghyadipbiswas ponros gusakovgiorgi dev-mallettes shs2008 victorzrv akuzmenkov niumosun nata-art-60 artvandalism fishercc atulshukla telegahjkl davahiatak1 positivewon dsyemen syedusama5556 zhenqi1688 ultramarkorj berkblg96 orkcode grim-reapper lcsouzamenezes imole-bj djmuratb lailson trader6363

sonitranslate's Issues

Installation

Could someone write up how to install it properly? Not everyone is experienced enough to install it without instructions.

Document Translation (No generated SRT file)

Hey there, when translating documentation and creating videobooks, I would love it if we could include the actual translated documentation, or have the option to show the original on one side and the translation on the other.

But my real issue is that when I create an audiobook, I definitely need an SRT file, generated when the audio is translated.

I know it's almost impossible, because you have to define the number of characters, and since the text is split into segments (let's pretend I've set the maximum to 200 characters) it will be hard to generate an SRT file if the segments do not exactly match the actual documentation. But there has to be some way to achieve this. This would be a game changer for my work! We work in IT Accessibility.

Thanks!
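
One possible direction, sketched under assumptions: if the document pipeline knows each text segment and the duration of the audio generated for it (neither is confirmed by the issue), SRT timestamps can be accumulated from those durations. The names `format_timestamp` and `build_srt` are hypothetical and not part of SoniTranslate.

```python
def format_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def build_srt(segments):
    """Build SRT text from (text, audio_duration_seconds) pairs in reading order."""
    blocks = []
    start = 0.0
    for index, (text, duration) in enumerate(segments, start=1):
        end = start + duration
        blocks.append(
            f"{index}\n{format_timestamp(start)} --> {format_timestamp(end)}\n{text}\n"
        )
        start = end  # the next segment begins where this one ends
    return "\n".join(blocks)
```

Because the timestamps come from the generated audio rather than from the character limit, the segments would not need to match the layout of the original document.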

Speech too fast and out of sync

Hello and congratulations for this work! I just tested a lot of other projects and yours is clearly the most efficient :)

Unfortunately, and I do not understand why, the translated speech plays at something like 10x speed, so of course nothing is intelligible ^^ Then, 15 seconds later, the speech returns to a normal speed.
Here is a short extract to illustrate the problem: FinalTest

Could you give me a clue so that I can find a solution?

(For the same project, I did a transcription + translation with Whisper without any problems, but I'm missing the voice translation and video sync.)

Many thanks in advance for your reply!

public url problem...

can you help?

Issues with diarization and pyannote 3.1

Hey, truly incredible, thank you for all your efforts,

I am having an issue with pyannote 3.1 on Google Colab (pyannote 2.0 works fine):

Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/gradio/queueing.py", line 495, in call_prediction
output = await route_utils.call_process_api(
File "/usr/local/lib/python3.10/dist-packages/gradio/route_utils.py", line 235, in call_process_api
output = await app.get_blocks().process_api(
File "/usr/local/lib/python3.10/dist-packages/gradio/blocks.py", line 1627, in process_api
result = await self.call_function(
File "/usr/local/lib/python3.10/dist-packages/gradio/blocks.py", line 1173, in call_function
prediction = await anyio.to_thread.run_sync(
File "/usr/local/lib/python3.10/dist-packages/anyio/to_thread.py", line 33, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "/usr/local/lib/python3.10/dist-packages/anyio/_backends/_asyncio.py", line 877, in run_sync_in_worker_thread
return await future
File "/usr/local/lib/python3.10/dist-packages/anyio/_backends/_asyncio.py", line 807, in run
result = context.run(func, *args)
File "/usr/local/lib/python3.10/dist-packages/gradio/utils.py", line 690, in wrapper
response = f(*args, **kwargs)
File "/content/SoniTranslate/app_rvc.py", line 288, in batch_multilingual_media_conversion
output_file = self.multilingual_media_conversion(
File "/content/SoniTranslate/app_rvc.py", line 549, in multilingual_media_conversion
self.result_diarize = diarize_speech(
File "/content/SoniTranslate/soni_translate/speech_segmentation.py", line 175, in diarize_speech
raise error
TypeError: exceptions must derive from BaseException
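
For context, the last line of this traceback is the message Python produces when `raise` is given an object that is not an exception. The sketch below reproduces it under the assumption (not confirmed by the log) that the `error` variable in `diarize_speech` held something like a string at that point; `wrap_error` is a hypothetical helper showing one way to keep the original information.

```python
def raise_value(error):
    """Raise whatever is passed in, mimicking a bare `raise error`."""
    raise error


def wrap_error(error):
    """Return a real exception, wrapping values that are not exceptions."""
    if isinstance(error, BaseException):
        return error
    return RuntimeError(f"diarization failed: {error!r}")


try:
    raise_value("model could not be loaded")  # a str, not an exception
except TypeError as exc:
    message = str(exc)
    print(message)
```

If that assumption holds, the real cause of the pyannote 3.1 failure is hidden behind the `TypeError`, and wrapping the value would surface it.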

Whisper

Whisper does a good job of translating subtitles with this command:

whisper x.wav --model large-v3 --task translate --output_format srt --threads 10

It is even better than Google Translate and handles multiple languages.
So I hope you can add an option to use it.
Thank you
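
For reference, the same command could be driven from Python through `subprocess`. This is only a sketch: `build_whisper_command` and `run_whisper_translate` are hypothetical helpers, not part of SoniTranslate, and it assumes the `whisper` command-line tool is installed and on the PATH. Note that Whisper's `translate` task produces English output only, so it would cover English as a target language.

```python
import subprocess


def build_whisper_command(audio_path, model="large-v3", threads=10):
    """Build the argument list for the whisper CLI call quoted in the issue."""
    return [
        "whisper", audio_path,
        "--model", model,
        "--task", "translate",
        "--output_format", "srt",
        "--threads", str(threads),
    ]


def run_whisper_translate(audio_path):
    """Run the command; raises CalledProcessError if whisper exits with an error."""
    subprocess.run(build_whisper_command(audio_path), check=True)
```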

How can I add the OpenAI API in the Colab version?

Request for help on installing on Windows

Could you please add a guide to the README on how to run everything on Windows? I'm having big problems getting everything running. I will be grateful to you.

Problem after the installation

Trying to run it after the installation gave me this problem:


python app_rvc.py
Traceback (most recent call last):
  File "/home/bazza/src/sonitr/SoniTranslate/app_rvc.py", line 7, in <module>
    import whisperx
  File "/home/bazza/src/miniconda3/envs/sonitr/lib/python3.10/site-packages/whisperx/__init__.py", line 1, in <module>
    from .transcribe import load_model
  File "/home/bazza/src/miniconda3/envs/sonitr/lib/python3.10/site-packages/whisperx/transcribe.py", line 10, in <module>
    from .asr import load_model
  File "/home/bazza/src/miniconda3/envs/sonitr/lib/python3.10/site-packages/whisperx/asr.py", line 13, in <module>
    from .vad import load_vad_model, merge_chunks
  File "/home/bazza/src/miniconda3/envs/sonitr/lib/python3.10/site-packages/whisperx/vad.py", line 11, in <module>
    from pyannote.audio.pipelines import VoiceActivityDetection
  File "/home/bazza/src/miniconda3/envs/sonitr/lib/python3.10/site-packages/pyannote/audio/pipelines/__init__.py", line 26, in <module>
    from .speaker_diarization import SpeakerDiarization
  File "/home/bazza/src/miniconda3/envs/sonitr/lib/python3.10/site-packages/pyannote/audio/pipelines/speaker_diarization.py", line 42, in <module>
    from pyannote.audio.pipelines.speaker_verification import PretrainedSpeakerEmbedding
  File "/home/bazza/src/miniconda3/envs/sonitr/lib/python3.10/site-packages/pyannote/audio/pipelines/speaker_verification.py", line 45, in <module>
    from speechbrain.pretrained import (
  File "/home/bazza/src/miniconda3/envs/sonitr/lib/python3.10/site-packages/speechbrain/__init__.py", line 4, in <module>
    from .core import Stage, Brain, create_experiment_directory, parse_arguments
  File "/home/bazza/src/miniconda3/envs/sonitr/lib/python3.10/site-packages/speechbrain/core.py", line 38, in <module>
    from speechbrain.utils.optimizers import rm_vector_weight_decay
  File "/home/bazza/src/miniconda3/envs/sonitr/lib/python3.10/site-packages/speechbrain/utils/__init__.py", line 11, in <module>
    from . import *  # noqa
  File "/home/bazza/src/miniconda3/envs/sonitr/lib/python3.10/site-packages/speechbrain/utils/train_logger.py", line 268, in <module>
    class ProgressSampleLogger:
  File "/home/bazza/src/miniconda3/envs/sonitr/lib/python3.10/site-packages/speechbrain/utils/train_logger.py", line 337, in ProgressSampleLogger
    "saver": _get_image_saver(),
  File "/home/bazza/src/miniconda3/envs/sonitr/lib/python3.10/site-packages/speechbrain/utils/train_logger.py", line 260, in _get_image_saver
    import torchvision
  File "/home/bazza/src/miniconda3/envs/sonitr/lib/python3.10/site-packages/torchvision/__init__.py", line 6, in <module>
    from torchvision import _meta_registrations, datasets, io, models, ops, transforms, utils
  File "/home/bazza/src/miniconda3/envs/sonitr/lib/python3.10/site-packages/torchvision/_meta_registrations.py", line 164, in <module>
    def meta_nms(dets, scores, iou_threshold):
  File "/home/bazza/src/miniconda3/envs/sonitr/lib/python3.10/site-packages/torch/library.py", line 440, in inner
    handle = entry.abstract_impl.register(func_to_register, source)
  File "/home/bazza/src/miniconda3/envs/sonitr/lib/python3.10/site-packages/torch/_library/abstract_impl.py", line 30, in register
    if torch._C._dispatch_has_kernel_for_dispatch_key(self.qualname, "Meta"):
RuntimeError: operator torchvision::nms does not exist
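
An error like this often shows up when the installed torch and torchvision builds do not match each other, although the traceback alone does not prove that. A quick way to check is to list the installed versions; the sketch below uses only the standard library, and the package list is an assumption about which packages matter here.

```python
from importlib import metadata


def installed_versions(packages=("torch", "torchvision", "torchaudio")):
    """Return {package: version or None} for the given distribution names."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```

The printed versions can then be compared against the compatibility table published by the PyTorch project.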

Stuck after acceleration

[INFO] >> Apply acceleration
[INFO] >> Content in 'audio2/audio/' removed.
0it [00:00, ?it/s]

and it runs forever.
A few days ago everything worked fine. How can I fix this?

IDEA: clone voice Whisper Speech

https://github.com/collabora/whisperspeech

Problem with Doc translate (PDF and DOCX)

hello,

I have installed SoniTranslate on Windows.

(sonitr) D:\GITHUB\Sonitranslate\SoniTranslate>python app_rvc.py --theme aliabid94/new-theme --language french
The torchaudio backend is switched to 'soundfile'. Note that 'sox_io' is not supported on Windows.
The torchaudio backend is switched to 'soundfile'. Note that 'sox_io' is not supported on Windows.
[INFO] >> PIPER TTS disabled
[INFO] >> Coqui XTTS enabled
[INFO] >> In this app, by using Coqui TTS (text-to-speech), you acknowledge and agree to the license.
You confirm that you have read, understood, and agreed to the Terms and Conditions specified at the following link:
https://coqui.ai/cpml.txt.
[INFO] >> Working in: cuda
Running on local URL: http://127.0.0.1:7860

DOCX DOCUMENT
I can translate a plain text document. With a DOCX document, SoniTranslate stops and disconnects.
The DOCX file is read, but processing stops at " de 2014 à 2017 il est élu député en 2017 dans la dixième >> audio/1.0.ogg"

PDF DOCUMENT

Exception in callback _ProactorBasePipeTransport._call_connection_lost(None)
handle: <Handle _ProactorBasePipeTransport._call_connection_lost(None)>
Traceback (most recent call last):
  File "C:\Users\ryzen\miniconda3\envs\sonitr\lib\asyncio\events.py", line 80, in _run
    self._context.run(self._callback, *self._args)
  File "C:\Users\ryzen\miniconda3\envs\sonitr\lib\asyncio\proactor_events.py", line 165, in _call_connection_lost
    self._sock.shutdown(socket.SHUT_RDWR)
ConnectionResetError: [WinError 10054] Une connexion existante a dû être fermée par l’hôte distant (An existing connection was forcibly closed by the remote host)

thanks for your help

rmvpe+

Hi, sorry for posting here, but you don't respond on Hugging Face. I have a question about rmvpe+: where did it come from? I can't find anything about it on the Internet. Where can I download it for local use? It works very well.

Voiceless Track Separation Error

One file is processed and works well, but with another file I get this error:

[INFO] >> Voiceless Track Separation...
100%|█████████████████████████████████████████| 210/210 [00:51<00:00,  4.05it/s]
[ERROR] >> Error comnand
Traceback (most recent call last):
  File "/home/bazza/src/miniconda3/envs/sonitr/lib/python3.10/site-packages/gradio/queueing.py", line 495, in call_prediction
    output = await route_utils.call_process_api(
  File "/home/bazza/src/miniconda3/envs/sonitr/lib/python3.10/site-packages/gradio/route_utils.py", line 235, in call_process_api
    output = await app.get_blocks().process_api(
  File "/home/bazza/src/miniconda3/envs/sonitr/lib/python3.10/site-packages/gradio/blocks.py", line 1627, in process_api
    result = await self.call_function(
  File "/home/bazza/src/miniconda3/envs/sonitr/lib/python3.10/site-packages/gradio/blocks.py", line 1173, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "/home/bazza/src/miniconda3/envs/sonitr/lib/python3.10/site-packages/anyio/to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
  File "/home/bazza/src/miniconda3/envs/sonitr/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 2144, in run_sync_in_worker_thread
    return await future
  File "/home/bazza/src/miniconda3/envs/sonitr/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 851, in run
    result = context.run(func, *args)
  File "/home/bazza/src/miniconda3/envs/sonitr/lib/python3.10/site-packages/gradio/utils.py", line 690, in wrapper
    response = f(*args, **kwargs)
  File "/home/bazza/src/sonitr/SoniTranslate/app_rvc.py", line 365, in batch_multilingual_media_conversion
    output_file = self.multilingual_media_conversion(
  File "/home/bazza/src/sonitr/SoniTranslate/app_rvc.py", line 1085, in multilingual_media_conversion
    run_command(command_volume_mix)
  File "/home/bazza/src/sonitr/SoniTranslate/soni_translate/utils.py", line 66, in run_command
    raise Exception(errors.decode())
Exception: ffmpeg version 6.1.1 Copyright (c) 2000-2023 the FFmpeg developers
  built with gcc 12.3.0 (conda-forge gcc 12.3.0-5)

I would like to suggest a couple of functions

Hello, your creation is beautiful!
I would like to suggest a couple of functions.
Simple ones:
Add a checkbox to turn audio acceleration on/off
Detailed volume settings (volume percentage) in the audio mixer output settings
RVC settings (index_rate, rms_mix_rate, protect)
More complex ones:
Save .srt subtitle files in the original and target languages.
Editing of the translated subtitles (a field showing the translated text that can be changed and saved, after which the audio is assembled again and RVC is applied)

portable version<3

Hello! Can I run this project locally on my PC? <3

Recommend feature that integrates GPT's API (OpenAI) to translate subtitles

Hi,

Could I recommend a feature that integrates GPT's API (OpenAI) to translate subtitles

error on M1 Mac when creating SRT file from audio file.

I'm running a local install on my M1 Mac Air. I'm getting this error when trying to create an SRT file from an audio file.

"Error
Model has been downloaded but the SHA256 checksum does not not match. Please retry loading the model."

Here's my log `(sonitr) userName@MacBook-Air SoniTranslate % python app_rvc.py
objc[5317]: Class AVFFrameReceiver is implemented in both /Users/userName/anaconda3/envs/sonitr/lib/libavdevice.58.8.100.dylib (0x122394798) and /Users/userName/anaconda3/envs/sonitr/lib/python3.10/site-packages/av/.dylibs/libavdevice.60.1.100.dylib (0x179a54760). One of the two will be used. Which one is undefined.
objc[5317]: Class AVFAudioReceiver is implemented in both /Users/userName/anaconda3/envs/sonitr/lib/libavdevice.58.8.100.dylib (0x1223947e8) and /Users/userName/anaconda3/envs/sonitr/lib/python3.10/site-packages/av/.dylibs/libavdevice.60.1.100.dylib (0x179a547b0). One of the two will be used. Which one is undefined.
[INFO] >> Working in: cpu
[WARNING] >> No module named 'piper'
[INFO] >> PIPER TTS disabled
[INFO] >> Coqui XTTS enabled
[INFO] >> In this app, by using Coqui TTS (text-to-speech), you acknowledge and agree to the license.
You confirm that you have read, understood, and agreed to the Terms and Conditions specified at the following link:
https://coqui.ai/cpml.txt.
Running on local URL: http://127.0.0.1:7860

To create a public link, set share=True in launch().
[WARNING] >> Make sure to select a 'TTS Speaker' suitable for the translation language to avoid errors with the TTS.
[INFO] >> Cache flushed
[INFO] >> Processing audio...
[INFO] >> Transcribing...
Traceback (most recent call last):
File "/Users/userName/anaconda3/envs/sonitr/lib/python3.10/site-packages/gradio/queueing.py", line 495, in call_prediction
output = await route_utils.call_process_api(
File "/Users/userName/anaconda3/envs/sonitr/lib/python3.10/site-packages/gradio/route_utils.py", line 235, in call_process_api
output = await app.get_blocks().process_api(
File "/Users/userName/anaconda3/envs/sonitr/lib/python3.10/site-packages/gradio/blocks.py", line 1627, in process_api
result = await self.call_function(
File "/Users/userName/anaconda3/envs/sonitr/lib/python3.10/site-packages/gradio/blocks.py", line 1173, in call_function
prediction = await anyio.to_thread.run_sync(
File "/Users/userName/anaconda3/envs/sonitr/lib/python3.10/site-packages/anyio/to_thread.py", line 56, in run_sync
return await get_async_backend().run_sync_in_worker_thread(
File "/Users/userName/anaconda3/envs/sonitr/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 2144, in run_sync_in_worker_thread
return await future
File "/Users/userName/anaconda3/envs/sonitr/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 851, in run
result = context.run(func, *args)
File "/Users/userName/anaconda3/envs/sonitr/lib/python3.10/site-packages/gradio/utils.py", line 690, in wrapper
response = f(*args, **kwargs)
File "/Users/userName/Documents/projects/vsCode projects/soni_translate/SoniTranslate/app_rvc.py", line 436, in multilingual_media_conversion
audio, self.result = transcribe_speech(
File "/Users/userName/Documents/projects/vsCode projects/soni_translate/SoniTranslate/soni_translate/speech_segmentation.py", line 34, in transcribe_speech
model = whisperx.load_model(
File "/Users/userName/anaconda3/envs/sonitr/lib/python3.10/site-packages/whisperx/asr.py", line 347, in load_model
vad_model = load_vad_model(torch.device(device), use_auth_token=None, **default_vad_options)
File "/Users/userName/anaconda3/envs/sonitr/lib/python3.10/site-packages/whisperx/vad.py", line 47, in load_vad_model
raise RuntimeError(
RuntimeError: Model has been downloaded but the SHA256 checksum does not not match. Please retry loading the model.`

Control working mode: cpu/cuda

Hi guys,

Can someone please suggest how to effectively control the working mode?

app_rvc.py is automatically started in cuda mode:

[INFO] >> Working in: cuda

However I have a quite old MX150 GPU, and it constantly fails with a CUDA out of memory on Transcribing stage, no matter how I tweak Batch size / Compute type / Whisper ASR model or PYTORCH_CUDA_ALLOC_CONF.

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 78.00 MiB. GPU 0 has a total capacity of 2.00 GiB of which 0 bytes is free. Of the allocated memory 101.51 MiB is allocated by PyTorch, and 72.49 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)

Therefore I would like to fall back to CPU mode.

Running SoniTranslate dev_24_3 on Windows 11.

Thanks!
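
One workaround that may help, sketched under an assumption: if SoniTranslate chooses its device from what PyTorch reports as available (the log line suggests this, but the source does not confirm it), hiding the GPU from the process should make it start in CPU mode. `CUDA_VISIBLE_DEVICES` is a standard CUDA environment variable; `cpu_only_env` and `launch_app_on_cpu` are hypothetical helper names.

```python
import os
import subprocess
import sys


def cpu_only_env(base_env=None):
    """Return a copy of the environment in which no CUDA device is visible."""
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = "-1"  # an invalid index hides every GPU
    return env


def launch_app_on_cpu(script="app_rvc.py"):
    """Start the app with the GPU hidden; run this from the SoniTranslate folder."""
    return subprocess.run([sys.executable, script], env=cpu_only_env())
```

The variable has to be set before PyTorch initializes CUDA, which is why the sketch sets it for a fresh child process rather than inside the running app.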

Collaboration Proposal

Hey, your work is truly amazing, love it!!!

I am a UI/UX designer and Python developer. Sometimes I want to translate Indian, Korean, and Chinese lectures into my target language, whether on YouTube or other streaming platforms. Some services, including yours, translate videos into other languages. However, would it not be easier and simpler to develop something like an extension that translates right away, the way Google translates web pages at the click of a button? Plus, podcasts...

I am willing to collaborate and further develop your amazing program if you like the idea, you can reach me at:
[email protected]

Sincerely,
Shakhruz Bakhtiyarov

AI Dubbing API

Thank you for building this project! I work at a company called Sieve and this is a part of what inspired us to build our Dubbing API. It's a bit different than this as it supports voice cloning, different voice engines, and higher quality translations using other closed-source solutions but it's an example of the bounds of what this tech can do today.

I'd love to contribute our learnings in some way to this project. I think the most challenging part of the problem is how one handles audio speedups and slowdowns across languages. Different applications seem to want different tradeoffs between the "sync"-ness and how drastic the speedup tends to be.

Curious if there are improvements in the queue on that vector for this project and if we can contribute in any way? Would also love feedback on what we've built as I think it's something the community would love!
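
The speedup trade-off described above can be made concrete with a small sketch. It assumes each translated clip has to fit the time slot of the original segment; `fit_speed` and the default `max_speed` of 1.5 are illustrative choices, not values taken from SoniTranslate or from the Dubbing API.

```python
def fit_speed(tts_duration, slot_duration, max_speed=1.5):
    """Return (speed, overrun_seconds) for a TTS clip that should fit a time slot.

    speed is capped at max_speed; overrun is how far the clip still runs past
    the slot after the capped speedup (0.0 when it fits).
    """
    if tts_duration <= 0 or slot_duration <= 0:
        raise ValueError("durations must be positive")
    needed = tts_duration / slot_duration       # speed required for a perfect fit
    speed = min(max(needed, 1.0), max_speed)    # never slow down, never exceed the cap
    overrun = max(tts_duration / speed - slot_duration, 0.0)
    return speed, overrun
```

A low cap keeps speech natural but lets clips drift out of sync, while a high cap keeps sync at the cost of rushed speech, which is the trade-off different applications weigh differently.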

[WinError 2] The system cannot find the specified file (Windows)

Hi, I just installed and ran sonitranslate and everything works fine.
But the problem is that it only works when I pass it a YouTube url, but when I want to upload a file locally it gives me the error "[WinError 2] The system cannot find the specified file".
I attach the details of the console.

The translation speed is incredible

Hi, Your work is incredible.
The translator has become much faster; it now runs 10 times faster (on GPU).
Is it possible to speed up RVC processing? During processing, my video card is only loaded to 30%. Could it be processed in several threads to speed it up (an option to select the number of threads would be really cool)? WhisperX, by comparison, uses 100% of the video card and converts audio into text in just a few seconds.

Also, as I understand, piper tts does not work on Windows
[WARNING] >> No module named 'piper'
[INFO] >> PIPER TTS disabled

during installation it says: ERROR: Cannot install piper-tts==1.1.0 and piper-tts==1.2.0 because these package versions have conflicting dependencies.
Maybe there is some solution?

There is also an excellent, fast TTS engine called Silero: https://github.com/snakers4/silero-models
Maybe it will suit you.
Having a choice of TTS engines is always good.

Thank you so much for your work!

Yandex SpeechKit Python SDK

Could you add a Yandex voice speaker?
They have an SDK:
https://github.com/TikhonP/yandex-speechkit-lib-python

Thanks for your hard work!

Issue installing Piper TTS and Coqui XTTS

I don't know much about coding, but these are the last lines that appeared:

DEPRECATION: omegaconf 2.0.6 has a non-standard dependency specifier PyYAML>=5.1.*. pip 24.0 will enforce this behaviour change. A possible replacement is to upgrade to a newer version of omegaconf or contact the author to suggest that they release a version with a conforming dependency specifiers. Discussion can be found at pypa/pip#12063
Installing collected packages: onnxruntime-gpu
Successfully installed onnxruntime-gpu-1.17.1

(sonitr) A:\Art_intel\SoniTranslate>pip install -q piper-tts==1.2.0
ERROR: Could not find a version that satisfies the requirement piper-phonemize~=1.1.0 (from piper-tts) (from versions: none)
ERROR: No matching distribution found for piper-phonemize~=1.1.0

(sonitr) A:\Art_intel\SoniTranslate>pip install -q -r requirements_xtts.txt
DEPRECATION: omegaconf 2.0.6 has a non-standard dependency specifier PyYAML>=5.1.*. pip 24.0 will enforce this behaviour change. A possible replacement is to upgrade to a newer version of omegaconf or contact the author to suggest that they release a version with a conforming dependency specifiers. Discussion can be found at pypa/pip#12063
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
pyannote-audio 3.1.1 requires omegaconf<3.0,>=2.1, but you have omegaconf 2.0.6 which is incompatible.
pyannote-database 5.1.0 requires typer>=0.12.1, but you have typer 0.9.4 which is incompatible.

I would like to add lip sync.

Hi,
I would like to add lip sync in a separate tab, using SadTalkerVideo-Lip (my fixed fork) or video-retalking.
I would also like to translate it into my own language.

I hope you like the first idea and add it.

Need FastAPI support for SoniTranslate

I am looking for FastAPI support for this project. I have gone through the codebase and cannot find it. Can anyone help me with that? I would be very grateful. Alternatively, can someone guide me step by step? I want to build a FastAPI service that takes a video, translates it using Coqui TTS, and then dubs the video.

Best Regards

ffmpeg error

How can I fix this?

Metadata:
encoder : Lavf60.16.100
Duration: 00:01:43.31, bitrate: 1411 kb/s
Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 44100 Hz, 2 channels, s16, 1411 kb/s
[aist#1:0/pcm_s32le @ 0000014CFD9B66C0] Guessed Channel Layout: mono
Input #1, wav, from 'audio_dub_solo.ogg':
Duration: 00:01:42.73, bitrate: 768 kb/s
Stream #1:0: Audio: pcm_s32le ([1][0][0][0] / 0x0001), 24000 Hz, 1 channels, s32, 768 kb/s
[aost#0:0 @ 0000014CFDA64CC0] Unknown encoder 'libmp3lame'
[aost#0:0 @ 0000014CFDA64CC0] Error selecting an encoder
Error opening output file audio_mix.mp3.
Error opening output files: Encoder not found
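
The log shows that this ffmpeg build has no `libmp3lame` encoder, so the MP3 mix cannot be written. A small sketch for checking whether a given ffmpeg binary offers an encoder, based on the output of `ffmpeg -hide_banner -encoders`; the helper names are hypothetical.

```python
import shutil
import subprocess


def has_encoder(encoders_output, name):
    """Return True if `name` is listed as an encoder in `ffmpeg -encoders` output."""
    for line in encoders_output.splitlines():
        fields = line.split()
        # Encoder rows have the form: <flags> <encoder name> <description...>
        if len(fields) >= 2 and fields[1] == name:
            return True
    return False


def ffmpeg_supports(name, ffmpeg="ffmpeg"):
    """Ask the ffmpeg binary on PATH whether it was built with the given encoder."""
    if shutil.which(ffmpeg) is None:
        return False
    result = subprocess.run(
        [ffmpeg, "-hide_banner", "-encoders"],
        capture_output=True, text=True, check=False,
    )
    return has_encoder(result.stdout, name)
```

If `ffmpeg_supports("libmp3lame")` returns False, installing an ffmpeg build that includes that encoder is the likely fix.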

How to increase the maximum number of speakers?

I tried to increase the number of speakers to at least 8, but I end up with error messages such as: "NameError: name 'model_voice_path08' is not defined. Did you mean: 'model_voice_path00'?"
I modified three folders, but this is obviously not enough. How can I have 8 speakers?
Modif.zip
Thanks

PS: In app_rvc.py, look at lines 307 and 1409 = "auto" compute mode. I found this in the WhisperX documentation.

Implement a command line interface

Implement a command line interface for processing of several files with predefined parameters
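
To make the request concrete, here is a rough sketch of what such an interface could look like using `argparse`. Every flag name and default below is hypothetical, and `plan_jobs` only builds the job list; hooking it up to the actual SoniTranslate pipeline is left open.

```python
import argparse
from pathlib import Path


def build_parser():
    """Create a parser for batch processing (all options are hypothetical)."""
    parser = argparse.ArgumentParser(description="Batch-dub media files (sketch).")
    parser.add_argument("inputs", nargs="+", type=Path, help="media files to process")
    parser.add_argument("--target-language", default="English")
    parser.add_argument("--max-speakers", type=int, default=1)
    parser.add_argument("--output-dir", type=Path, default=Path("outputs"))
    return parser


def plan_jobs(args):
    """Turn parsed arguments into one job description per input file."""
    return [
        {
            "input": path,
            "target_language": args.target_language,
            "max_speakers": args.max_speakers,
            "output_dir": args.output_dir,
        }
        for path in args.inputs
    ]
```

With this shape, something like `python cli.py a.mp4 b.mp4 --target-language French` would apply the same predefined parameters to every file.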

AttributeError: 'list' object has no attribute 'endswith'

After installing the dependencies on Windows 10, I got this error:

AttributeError: 'list' object has no attribute 'endswith'
Traceback (most recent call last):
File "C:\SoniTranslate\soni\lib\site-packages\gradio\queueing.py", line 407, in call_prediction
output = await route_utils.call_process_api(
File "C:\SoniTranslate\soni\lib\site-packages\gradio\route_utils.py", line 226, in call_process_api
output = await app.get_blocks().process_api(
File "C:\SoniTranslate\soni\lib\site-packages\gradio\blocks.py", line 1559, in process_api
data = self.postprocess_data(fn_index, result["prediction"], state)
File "C:\SoniTranslate\soni\lib\site-packages\gradio\blocks.py", line 1447, in postprocess_data
prediction_value = block.postprocess(prediction_value)
File "C:\SoniTranslate\soni\lib\site-packages\gradio\components\file.py", line 247, in postprocess
"name": self.make_temp_copy_if_needed(y),
File "C:\SoniTranslate\soni\lib\site-packages\gradio\components\base.py", line 233, in make_temp_copy_if_needed
temp_dir = self.hash_file(file_path)
File "C:\SoniTranslate\soni\lib\site-packages\gradio\components\base.py", line 197, in hash_file
with open(file_path, "rb") as f:
FileNotFoundError: [Errno 2] No such file or directory: 'sub_ori.[]'

RuntimeError: Model has been downloaded but the SHA256 checksum does not not match

[INFO] >> Cache flushed
[INFO] >> Processing video...
[INFO] >> Process video...
[INFO] >> Process audio...
[INFO] >> Transcribing...
Traceback (most recent call last):
File "C:\Users\ded\bin\miniconda\envs\venvst\lib\site-packages\gradio\queueing.py", line 495, in call_prediction
output = await route_utils.call_process_api(
File "C:\Users\ded\bin\miniconda\envs\venvst\lib\site-packages\gradio\route_utils.py", line 235, in call_process_api
output = await app.get_blocks().process_api(
File "C:\Users\ded\bin\miniconda\envs\venvst\lib\site-packages\gradio\blocks.py", line 1627, in process_api
result = await self.call_function(
File "C:\Users\ded\bin\miniconda\envs\venvst\lib\site-packages\gradio\blocks.py", line 1173, in call_function
prediction = await anyio.to_thread.run_sync(
File "C:\Users\ded\bin\miniconda\envs\venvst\lib\site-packages\anyio\to_thread.py", line 56, in run_sync
return await get_async_backend().run_sync_in_worker_thread(
File "C:\Users\ded\bin\miniconda\envs\venvst\lib\site-packages\anyio\_backends\_asyncio.py", line 2144, in run_sync_in_worker_thread
return await future
File "C:\Users\ded\bin\miniconda\envs\venvst\lib\site-packages\anyio\_backends\_asyncio.py", line 851, in run
result = context.run(func, *args)
File "C:\Users\ded\bin\miniconda\envs\venvst\lib\site-packages\gradio\utils.py", line 690, in wrapper
response = f(*args, **kwargs)
File "C:\SoniTranslate\app_rvc.py", line 436, in multilingual_media_conversion
audio, self.result = transcribe_speech(
File "C:\SoniTranslate\soni_translate\speech_segmentation.py", line 34, in transcribe_speech
model = whisperx.load_model(
File "C:\Users\ded\bin\miniconda\envs\venvst\lib\site-packages\whisperx\asr.py", line 347, in load_model
vad_model = load_vad_model(torch.device(device), use_auth_token=None, **default_vad_options)
File "C:\Users\ded\bin\miniconda\envs\venvst\lib\site-packages\whisperx\vad.py", line 47, in load_vad_model
raise RuntimeError(
RuntimeError: Model has been downloaded but the SHA256 checksum does not not match. Please retry loading the model.
How can I solve this problem?
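
The message means the downloaded model file does not hash to the expected value, which usually points to an incomplete or corrupted download. A sketch of checking a file's SHA256 and deleting it on mismatch, so that the next run downloads it again. The file path and expected hash are placeholders to be filled in: the log does not show where the model is cached.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def remove_if_corrupt(path, expected_sha256):
    """Delete the file if its hash differs from the expected one.

    Returns True if the file was removed, False if it was missing or valid.
    """
    path = Path(path)
    if not path.exists():
        return False
    if sha256_of(path) == expected_sha256.lower():
        return False
    path.unlink()
    return True
```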

Selecting a speaker in the subtitle editor

Speaker detection does not always work correctly: a segment may get a male voice instead of a female voice, and vice versa. Would it be possible to let the speaker be assigned in the subtitle editor?
Something like:

   {
     "speaker": 1,
     "start": 1.172,
     "text": "Your work is very cool."
   },
   {
     "speaker": 3,
     "start": 2.372,
     "text": "Yes, I agree too, SoniTranslate is great."
   }

So that if necessary, you could fix it manually.
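
A minimal sketch of how such a manual correction could be applied, using the JSON shape from the example above. `apply_speaker_overrides` is a hypothetical helper, and keying the corrections by start time is just one possible design.

```python
import json


def apply_speaker_overrides(segments, overrides):
    """Return a copy of the segments with speakers replaced per start time.

    overrides maps a segment's start time to the corrected speaker number.
    """
    updated = []
    for segment in segments:
        segment = dict(segment)  # copy so the original list is left untouched
        if segment["start"] in overrides:
            segment["speaker"] = overrides[segment["start"]]
        updated.append(segment)
    return updated


segments = json.loads("""
[
  {"speaker": 1, "start": 1.172, "text": "Your work is very cool."},
  {"speaker": 3, "start": 2.372, "text": "Yes, I agree too, SoniTranslate is great."}
]
""")

# Reassign the second segment from speaker 3 to speaker 2.
fixed = apply_speaker_overrides(segments, {2.372: 2})
```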

Status gets stuck in transcription (30%)

Status gets stuck in transcription, do you know what the problem could be?

I'm running on Ubuntu WSL, I installed all the dependencies, I set my HF token but even so, when I load the video and put it to dub into Portuguese it gets stuck (for now at 3 hours at 30%, transcription stage).

app_rvc.py

When running python app_rvc.py, how do I resolve this?
/tmp/gradio/6cb4020ad75bb1cb116c865ab91842f8753c7acc/Video_main.mp4
Process video...
process audio...
process audio...
...
Error can't create the audio file
Traceback (most recent call last):
  File "/home/mohit/Projects/venv/lib/python3.10/site-packages/gradio/routes.py", line 488, in run_predict
    output = await app.get_blocks().process_api(
  File "/home/mohit/Projects/venv/lib/python3.10/site-packages/gradio/blocks.py", line 1434, in process_api
    data = self.postprocess_data(fn_index, result["prediction"], state)
  File "/home/mohit/Projects/venv/lib/python3.10/site-packages/gradio/blocks.py", line 1335, in postprocess_data
    prediction_value = block.postprocess(prediction_value)
  File "/home/mohit/Projects/venv/lib/python3.10/site-packages/gradio/components/file.py", line 254, in postprocess
    "name": self.make_temp_copy_if_needed(y),
  File "/home/mohit/Projects/venv/lib/python3.10/site-packages/gradio/components/base.py", line 226, in make_temp_copy_if_needed
    temp_dir = self.hash_file(file_path)
  File "/home/mohit/Projects/venv/lib/python3.10/site-packages/gradio/components/base.py", line 190, in hash_file
    with open(file_path, "rb") as f:
FileNotFoundError: [Errno 2] No such file or directory: 'sub_ori.srt'

API Keys

Hello, I'm running SoniTranslate on Windows (Anaconda), and I don't know where to put the OpenAI API key. Is there an option to save it permanently in a file? The same goes for the HF token. Would someone be kind enough to suggest how to deal with this?
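
One common approach is to keep the keys in a small text file and load them into environment variables before the app starts. The sketch below is generic: the file name, `load_env_file`, and the variable names in the usage note are assumptions, since the source does not say which variables SoniTranslate reads. The official `openai` Python package does read `OPENAI_API_KEY` by default.

```python
import os


def load_env_file(path, environ=None):
    """Load KEY=VALUE lines from a file into the environment.

    Existing variables are not overwritten. Returns the values now in effect
    for the keys found in the file.
    """
    environ = os.environ if environ is None else environ
    loaded = {}
    with open(path, encoding="utf-8") as handle:
        for raw in handle:
            line = raw.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue  # skip blanks, comments, and malformed lines
            key, value = line.split("=", 1)
            key = key.strip()
            value = value.strip().strip('"').strip("'")
            environ.setdefault(key, value)
            loaded[key] = environ[key]
    return loaded
```

A file such as `keys.env` could then hold lines like `OPENAI_API_KEY=...` and a line for the Hugging Face token, and should be kept out of version control.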

help me please

Hello, I'm deploying on a computer, it seems that everything is installed. I get the link, everything works. I insert a token, and a link to the video, then an error pops up. What's wrong?
Initial log:

end log with error:

Thanks a lot! Good luck)

Python Unicode string stored as \u0410\u0433\u0430.

Hi, thank you for your work! There is a small problem with how text encoding is displayed.
Small snippet of example:
Generated subtitles:
{
"start": 19.768,
"text": "\u0410\u0433\u0430."
},

Here is a description of this problem and a solution: https://stackoverflow.com/questions/11094380/python-unicode-string-stored-as-u84b8-u6c7d-u5730-in-file-how-to-convert-it
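
In Python, this escaping is what `json.dumps` does by default; passing `ensure_ascii=False` keeps the characters readable. The sketch below shows the difference on the subtitle from the snippet above. Whether SoniTranslate serializes subtitles through `json.dumps` is an assumption.

```python
import json

subtitles = [{"start": 19.768, "text": "Ага."}]

# Default behaviour: non-ASCII characters are written as \uXXXX escapes.
escaped = json.dumps(subtitles)

# With ensure_ascii=False the Cyrillic text is kept as-is.
readable = json.dumps(subtitles, ensure_ascii=False, indent=2)

print(escaped)
print(readable)
```

Both strings decode to the same data, so this only affects how the file looks to a person editing it. When writing the readable form to disk, the file should be opened with `encoding="utf-8"`.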

Add Background music after done please!

Wow, this is the best model chain I have ever seen! Here is one improvement I would like to ask for: can you add a function to add the input video's background music back after all processing is done?

[ERROR] >> [ONNXRuntimeError] : 1 : FAIL : Non-zero status code returned while running Transpose node. Name:'Transpose_2'

Firstly, thanks for your great project <3
I hope there will be a free AI translation option in a future update, since the OpenAI API seems a little expensive for me. But thank you for this awesome tool.
My issue is:
Everything works fine until I use Voiceless Track. The problem seems to be in onnxruntime-gpu, but I don't know why. I have a GeForce 960M with 4 GB of VRAM.

[INFO] >> Creating final translated video...
1it [00:00,  5.05it/s][INFO] >> Avoid overlap for audio2/audio/5.1.ogg with 5.6
19it [00:01, 10.99it/s][INFO] >> Avoid overlap for audio2/audio/90.8.ogg with 91.02
29it [00:02, 10.91it/s][INFO] >> Avoid overlap for audio2/audio/160.8.ogg with 161.22000000000003
31it [00:02, 10.84it/s][INFO] >> Avoid overlap for audio2/audio/165.8.ogg with 166.54000000000005
[INFO] >> Avoid overlap for audio2/audio/170.8.ogg with 171.56000000000006
37it [00:03, 10.36it/s]
[INFO] >> Voiceless Track Separation...
2024-05-19 04:15:50.5631879 [E:onnxruntime:, sequential_executor.cc:514 onnxruntime::ExecuteKernel] Non-zero status code returned while running Transpose node. Name:'Transpose_2' Status Message: CUDA error cudaErrorNoKernelImageForDevice:no kernel image is available for execution on the device
[ERROR] >> [ONNXRuntimeError] : 1 : FAIL : Non-zero status code returned while running Transpose node. Name:'Transpose_2' Status Message: CUDA error cudaErrorNoKernelImageForDevice:no kernel image is available for execution on the device
[INFO] >> Done: C:\Users\PORTATIL\SoniTranslate\outputs\Fury __en.mp4

(sonitr) PS C:\Users\PORTATIL\SoniTranslate> python app_rvc.py
The torchaudio backend is switched to 'soundfile'. Note that 'sox_io' is not supported on Windows.
The torchaudio backend is switched to 'soundfile'. Note that 'sox_io' is not supported on Windows.
[INFO] >> PIPER TTS enabled
[INFO] >> Coqui XTTS enabled
[INFO] >> In this app, by using Coqui TTS (text-to-speech), you acknowledge and agree to the license.
You confirm that you have read, understood, and agreed to the Terms and Conditions specified at the following link:
https://coqui.ai/cpml.txt.
[INFO] >> Working in: cuda
Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
IMPORTANT: You are using gradio version 4.19.2, however version 4.29.0 is available, please upgrade.

Sorry for my English

IndexError in text_to_speech.py when Processing Certain WAV Files

Hi, I ran into a problem when processing a specific audio file. The error occurs when the code handles a WAV file called "XTTS/AUTOMATIC_SPEAKER_00.wav". The traceback below provides more information:

[INFO] >> XTTS/AUTOMATIC_SPEAKER_00.wav
Traceback (most recent call last):
File "/home/lapo/anaconda3/envs/sonitr/lib/python3.10/site-packages/gradio/queueing.py", line 495, in call_prediction
output = await route_utils.call_process_api(
...
File "/home/lapo/SoniTranslate/soni_translate/text_to_speech.py", line 474, in create_new_files_for_vc
if filtered_speaker[0]["tts_name"] == "XTTS/AUTOMATIC.wav":
IndexError: list index out of range

It seems the text-to-speech code has a bug. When it creates new audio files for voice conversion, it reads filtered_speaker[0] from a list that is empty, which raises the IndexError.
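
A minimal sketch of the kind of guard that avoids this class of error is shown below. It is not the project's code: the function name first_speaker_tts_name and the shape of the input (a list of dicts with "speaker" and "tts_name" keys) are assumptions made for illustration.

```python
def first_speaker_tts_name(speaker_configs, speaker):
    """Return the tts_name configured for `speaker`, or None if none matches.

    Hypothetical helper: `speaker_configs` is assumed to be a list of dicts
    with "speaker" and "tts_name" keys.
    """
    filtered_speaker = [c for c in speaker_configs if c.get("speaker") == speaker]
    if not filtered_speaker:
        # An empty list means the speaker was never configured; indexing
        # with [0] here is what would raise "list index out of range".
        return None
    return filtered_speaker[0]["tts_name"]


configs = [{"speaker": "SPEAKER_00", "tts_name": "XTTS/AUTOMATIC.wav"}]

# A known speaker returns its configured voice.
assert first_speaker_tts_name(configs, "SPEAKER_00") == "XTTS/AUTOMATIC.wav"

# An unknown speaker returns None instead of raising IndexError.
assert first_speaker_tts_name(configs, "SPEAKER_01") is None
```

Returning None lets the caller decide whether to skip the segment or report a clear error message.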

Could you please take a look into this issue? I'm unsure if the problem is with how the audio files are named or somewhere else in the steps used.

Thank you for your assistance on this project and the great work you've done so far.

Tutorial video

Hey, I didn't know how to contact you so I'm posting an issue here instead 😂

I made a tutorial video on how to use it, it's for a presentation

https://youtu.be/SmGkFaSzq_Q?si=16Jt9K144qtdCaR6

Fine with me if you want to use it.

Choosing voices by character

An option to choose the voice for each person detected with WhisperX, so that each person in a video keeps their own voice.

New TTS multilingual model with voice cloning

I love this webui!
Coqui released XTTS, which would be ideal for SoniTranslate.

https://huggingface.co/spaces/coqui/xtts

The queue can accept several tasks

"The queue can accept several tasks at the same time."
Can I upload multiple files to the queue?
And how to do it?

Setting your Hugging Face token as an environment in windows ?

How or where do I insert my Hugging Face token on Windows? I already have one, but where do I put it?

any way to add srt file in the source?

Hi,
Translating directly from Danish to English never works correctly with anything I have tried,
but I can get AI-translated subtitles that are 70-80% correct and then modify them to be understandable in English.
So is there any way to either modify whatever SoniTranslate translates, or get it to take an SRT file with timing into account when generating new audio?

Using RVC 2 model starts the process, does everything up to 90% but then it crashes

I'm using an RVC model with a pth and index file. Using the RVC 2 model starts the process and does everything up to 90%, but then it crashes, and I'm not sure why. I changed every setting and cleared the audios and audios2 folders.

I played those audios and it did a really good job. So what's going on lol

[INFO] >> audio2/audio/0.287.ogg, Tony Robbins.pth
[INFO] >> audio2/audio/2.168.ogg, Tony Robbins.pth
[INFO] >> audio2/audio/6.811.ogg, Tony Robbins.pth
[INFO] >> audio2/audio/8.052.ogg, Tony Robbins.pth
[INFO] >> audio2/audio/14.757.ogg, Tony Robbins.pth
[INFO] >> audio2/audio/18.699.ogg, Tony Robbins.pth
[INFO] >> audio2/audio/20.461.ogg, Tony Robbins.pth
[INFO] >> audio2/audio/21.641.ogg, Tony Robbins.pth
[INFO] >> audio2/audio/23.162.ogg, Tony Robbins.pth

(sonitr) C:\Tools\AI\SoniTranslate>