
sonitranslate's Introduction

💫 About Me:

Hello there! Welcome to my profile. I'm an individual with a keen interest in data analytics and data science. I am passionate about exploring and deriving meaningful insights from data to drive informed decision-making and solve complex problems. This README will provide you with an overview of my background, skills, and areas of expertise.

🌐 Socials:

LinkedIn

💻 Tech Stack:

Python, R, Netlify, AWS, Anaconda, FastAPI, MySQL, Postgres, SQLite, Adobe After Effects, Adobe Audition, Adobe Illustrator, Adobe Photoshop, Blender, Canva, GIMP (GNU Image Manipulation Program), Inkscape, Keras, NumPy, Pandas, Plotly, PyTorch, scikit-learn, TensorFlow, Linux, Notion, Docker, Trello, Streamlit, Folium, Google Colab, VS Code, Jupyter, Power BI

📊 GitHub Stats:





sonitranslate's Issues

Installation

Could someone write instructions on how to install it properly? Not everyone is experienced enough to install it without instructions.

Document Translation (No generated SRT file)

Hey there, when translating documents and creating videobooks, I would love it if we could include the actual document, translated, or have the option to show the original on one side and the translation on the other.

But my real issue is that when I create an audiobook, I definitely need an SRT file generated during the audio translation.

I know this is hard: you have to set a maximum number of characters per segment (let's say 200), and because the text is split into segments, it is difficult to generate an SRT file when the segments do not match the original document exactly. But there must be some way to achieve this. It would be a game changer for my work! We work in IT accessibility.

Thanks!
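One way around the segmentation problem is to time subtitles from the audio rather than from the document: if every text chunk sent to TTS is kept together with the duration of the audio synthesized for it, the cues can be laid end to end. The sketch below assumes exactly that (the `Chunk` type and `build_srt` helper are hypothetical, not part of SoniTranslate) and emits standard SRT timestamps (`HH:MM:SS,mmm`).

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str        # text chunk sent to TTS (e.g. at most 200 characters)
    duration: float  # length in seconds of the audio synthesized for it


def srt_timestamp(seconds: float) -> str:
    # SRT uses HH:MM:SS,mmm with a comma before the milliseconds.
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def build_srt(chunks: list[Chunk], gap: float = 0.0) -> str:
    # Chunks are played back to back, so each cue starts where the previous
    # one ended, plus an optional silence gap between chunks.
    cues = []
    start = 0.0
    for index, chunk in enumerate(chunks, start=1):
        end = start + chunk.duration
        cues.append(f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{chunk.text}\n")
        start = end + gap
    return "\n".join(cues)
```

Because the cue boundaries follow the TTS chunks, the subtitles stay aligned with the audiobook even when the chunks do not match the document's paragraphs.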

Speech too fast and out of sync

Hello and congratulations for this work! I just tested a lot of other projects and yours is clearly the most efficient :)

Unfortunately, and I do not understand why, the translated speech sometimes plays at roughly 10x speed, so of course nothing can be understood ^^. Then, 15 seconds later, the speech returns to a normal speed.
Here is a short extract to illustrate the problem: FinalTest

Could you give me a clue so that I can find a solution?

(For the same project, I did a transcription + translation with Whisper without any problems, but I'm missing the voice translation and video sync.)

Many thanks in advance for your reply!
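A plausible explanation, assuming the dubbing step speeds each TTS clip up so that it fits the time slot of its original subtitle segment: the required rate is the clip length divided by the slot length, so a 3-second clip aimed at a 0.3-second slot needs a 10x speed-up. The hypothetical helper below shows that arithmetic with a cap; the cap keeps speech intelligible, at the cost of the clip running past its slot and drifting out of sync.

```python
def speedup_factor(tts_duration: float, slot_duration: float, max_speed: float = 1.8) -> float:
    """Playback-rate multiplier needed to fit a TTS clip into its slot.

    1.0 means unchanged; values above 1.0 mean the clip is sped up.
    The result is capped at max_speed (an assumed default, not a value
    taken from SoniTranslate) so a tiny slot cannot force x10 speech.
    """
    if slot_duration <= 0:
        return max_speed
    factor = tts_duration / slot_duration
    return min(max(factor, 1.0), max_speed)
```

With the cap removed (`max_speed=float("inf")`), `speedup_factor(3.0, 0.3, ...)` comes out at about 10, which matches the symptom described above.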

Issues with diarization and pyannote 3.1

Hey, truly incredible, thank you for all your efforts,

I am having an issue with pyannote 3.1 on Google Colab (pyannote 2.0 works fine):

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/gradio/queueing.py", line 495, in call_prediction
    output = await route_utils.call_process_api(
  File "/usr/local/lib/python3.10/dist-packages/gradio/route_utils.py", line 235, in call_process_api
    output = await app.get_blocks().process_api(
  File "/usr/local/lib/python3.10/dist-packages/gradio/blocks.py", line 1627, in process_api
    result = await self.call_function(
  File "/usr/local/lib/python3.10/dist-packages/gradio/blocks.py", line 1173, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "/usr/local/lib/python3.10/dist-packages/anyio/to_thread.py", line 33, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "/usr/local/lib/python3.10/dist-packages/anyio/_backends/_asyncio.py", line 877, in run_sync_in_worker_thread
    return await future
  File "/usr/local/lib/python3.10/dist-packages/anyio/_backends/_asyncio.py", line 807, in run
    result = context.run(func, *args)
  File "/usr/local/lib/python3.10/dist-packages/gradio/utils.py", line 690, in wrapper
    response = f(*args, **kwargs)
  File "/content/SoniTranslate/app_rvc.py", line 288, in batch_multilingual_media_conversion
    output_file = self.multilingual_media_conversion(
  File "/content/SoniTranslate/app_rvc.py", line 549, in multilingual_media_conversion
    self.result_diarize = diarize_speech(
  File "/content/SoniTranslate/soni_translate/speech_segmentation.py", line 175, in diarize_speech
    raise error
TypeError: exceptions must derive from BaseException
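The last frame is `raise error`, and this `TypeError` is what Python raises when the object being raised is not an exception instance (for example a string or `None`). The real diarization error is therefore hidden. The sketch below reproduces that behaviour and shows a hypothetical wrapper that always raises a proper exception carrying the original message; it assumes `error` at that line can hold a non-exception value, which the traceback implies but the source code is not shown here.

```python
def show_original_bug() -> str:
    try:
        raise "pyannote failed"  # noqa: what `raise error` does if error is a str
    except TypeError as exc:
        return str(exc)


def reraise_as_exception(error: object) -> None:
    # Re-raise real exceptions unchanged; wrap anything else so the
    # underlying message (token, model access, etc.) becomes visible.
    if isinstance(error, BaseException):
        raise error
    raise RuntimeError(f"Diarization failed: {error!r}")
```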

Whisper

Whisper produces a good translation of subtitles with this command:

whisper x.wav --model large-v3 --task translate --output_format srt --threads 10

It is even better than Google Translate, and it handles multiple languages.
So I hope you can add an option to use it.
Thank you
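If such an option were added, one simple route is to shell out to the same CLI. The sketch below builds the command above as an argument list (adding `--output_dir`) and returns where the subtitle should appear; the function names are hypothetical. Note that Whisper's `--task translate` always translates into English, so other target languages would still need a separate translation step.

```python
import subprocess
from pathlib import Path


def whisper_translate_cmd(audio: str, model: str = "large-v3",
                          output_dir: str = ".", threads: int = 10) -> list[str]:
    # Same flags as the command above, plus --output_dir so the caller
    # knows where the .srt file will be written.
    return [
        "whisper", audio,
        "--model", model,
        "--task", "translate",
        "--output_format", "srt",
        "--threads", str(threads),
        "--output_dir", output_dir,
    ]


def translate_to_srt(audio: str, output_dir: str = ".") -> Path:
    subprocess.run(whisper_translate_cmd(audio, output_dir=output_dir), check=True)
    # The whisper CLI names the subtitle after the input file's stem.
    return Path(output_dir) / (Path(audio).stem + ".srt")
```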

Problem after the installation

Trying to run it after the installation gave me this problem:


python app_rvc.py
Traceback (most recent call last):
  File "/home/bazza/src/sonitr/SoniTranslate/app_rvc.py", line 7, in <module>
    import whisperx
  File "/home/bazza/src/miniconda3/envs/sonitr/lib/python3.10/site-packages/whisperx/__init__.py", line 1, in <module>
    from .transcribe import load_model
  File "/home/bazza/src/miniconda3/envs/sonitr/lib/python3.10/site-packages/whisperx/transcribe.py", line 10, in <module>
    from .asr import load_model
  File "/home/bazza/src/miniconda3/envs/sonitr/lib/python3.10/site-packages/whisperx/asr.py", line 13, in <module>
    from .vad import load_vad_model, merge_chunks
  File "/home/bazza/src/miniconda3/envs/sonitr/lib/python3.10/site-packages/whisperx/vad.py", line 11, in <module>
    from pyannote.audio.pipelines import VoiceActivityDetection
  File "/home/bazza/src/miniconda3/envs/sonitr/lib/python3.10/site-packages/pyannote/audio/pipelines/__init__.py", line 26, in <module>
    from .speaker_diarization import SpeakerDiarization
  File "/home/bazza/src/miniconda3/envs/sonitr/lib/python3.10/site-packages/pyannote/audio/pipelines/speaker_diarization.py", line 42, in <module>
    from pyannote.audio.pipelines.speaker_verification import PretrainedSpeakerEmbedding
  File "/home/bazza/src/miniconda3/envs/sonitr/lib/python3.10/site-packages/pyannote/audio/pipelines/speaker_verification.py", line 45, in <module>
    from speechbrain.pretrained import (
  File "/home/bazza/src/miniconda3/envs/sonitr/lib/python3.10/site-packages/speechbrain/__init__.py", line 4, in <module>
    from .core import Stage, Brain, create_experiment_directory, parse_arguments
  File "/home/bazza/src/miniconda3/envs/sonitr/lib/python3.10/site-packages/speechbrain/core.py", line 38, in <module>
    from speechbrain.utils.optimizers import rm_vector_weight_decay
  File "/home/bazza/src/miniconda3/envs/sonitr/lib/python3.10/site-packages/speechbrain/utils/__init__.py", line 11, in <module>
    from . import *  # noqa
  File "/home/bazza/src/miniconda3/envs/sonitr/lib/python3.10/site-packages/speechbrain/utils/train_logger.py", line 268, in <module>
    class ProgressSampleLogger:
  File "/home/bazza/src/miniconda3/envs/sonitr/lib/python3.10/site-packages/speechbrain/utils/train_logger.py", line 337, in ProgressSampleLogger
    "saver": _get_image_saver(),
  File "/home/bazza/src/miniconda3/envs/sonitr/lib/python3.10/site-packages/speechbrain/utils/train_logger.py", line 260, in _get_image_saver
    import torchvision
  File "/home/bazza/src/miniconda3/envs/sonitr/lib/python3.10/site-packages/torchvision/__init__.py", line 6, in <module>
    from torchvision import _meta_registrations, datasets, io, models, ops, transforms, utils
  File "/home/bazza/src/miniconda3/envs/sonitr/lib/python3.10/site-packages/torchvision/_meta_registrations.py", line 164, in <module>
    def meta_nms(dets, scores, iou_threshold):
  File "/home/bazza/src/miniconda3/envs/sonitr/lib/python3.10/site-packages/torch/library.py", line 440, in inner
    handle = entry.abstract_impl.register(func_to_register, source)
  File "/home/bazza/src/miniconda3/envs/sonitr/lib/python3.10/site-packages/torch/_library/abstract_impl.py", line 30, in register
    if torch._C._dispatch_has_kernel_for_dispatch_key(self.qualname, "Meta"):
RuntimeError: operator torchvision::nms does not exist
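This error typically appears when the installed torchvision was built for a different torch release than the one installed (each torchvision release is paired with a specific torch release; see the compatibility table in the torchvision README). A minimal sketch for printing what is actually installed in the active environment, using only the standard library:

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages=("torch", "torchvision", "torchaudio")) -> dict:
    # Map each distribution name to its installed version, or None if absent.
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(f"{name}: {ver or 'not installed'}")
```

If the pair does not match, reinstalling torch and torchvision together from the same index is usually the way to realign them.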

Stuck after acceleration

[INFO] >> Apply acceleration
[INFO] >> Content in 'audio2/audio/' removed.
0it [00:00, ?it/s]

and it hangs forever.
A few days ago everything was fine. How can I fix this?

Problem with Doc translate (PDF and DOCX)

Hello,

I have installed SoniTranslate on Windows.

(sonitr) D:\GITHUB\Sonitranslate\SoniTranslate>python app_rvc.py --theme aliabid94/new-theme --language french
The torchaudio backend is switched to 'soundfile'. Note that 'sox_io' is not supported on Windows.
The torchaudio backend is switched to 'soundfile'. Note that 'sox_io' is not supported on Windows.
[INFO] >> PIPER TTS disabled
[INFO] >> Coqui XTTS enabled
[INFO] >> In this app, by using Coqui TTS (text-to-speech), you acknowledge and agree to the license. You confirm that you have read, understood, and agreed to the Terms and Conditions specified at the following link: https://coqui.ai/cpml.txt.
[INFO] >> Working in: cuda
Running on local URL: http://127.0.0.1:7860

DOCX DOCUMENT
I can translate text documents. With a DOCX document, SoniTranslate stops and disconnects.
The DOCX is read, but processing stops at: " de 2014 à 2017 il est élu député en 2017 dans la dixième >> audio/1.0.ogg" (French for "from 2014 to 2017 he is elected deputy in 2017 in the tenth")

PDF DOCUMENT

Exception in callback _ProactorBasePipeTransport._call_connection_lost(None)
handle: <Handle _ProactorBasePipeTransport._call_connection_lost(None)>
Traceback (most recent call last):
  File "C:\Users\ryzen\miniconda3\envs\sonitr\lib\asyncio\events.py", line 80, in _run
    self._context.run(self._callback, *self._args)
  File "C:\Users\ryzen\miniconda3\envs\sonitr\lib\asyncio\proactor_events.py", line 165, in _call_connection_lost
    self._sock.shutdown(socket.SHUT_RDWR)
ConnectionResetError: [WinError 10054] Une connexion existante a dû être fermée par l'hôte distant
(English: An existing connection was forcibly closed by the remote host.)

Thanks for your help!

rmvpe+

Hi, sorry for posting here, but you don't respond on Hugging Face. I have a question about rmvpe+: where did it come from? I can't find anything about it on the Internet. Where can I download it for local use? It works very well.

Voiceless Track Separation Error

I submitted one file and it worked well, but another file fails with this error:

[INFO] >> Voiceless Track Separation...
100%|█████████████████████████████████████████| 210/210 [00:51<00:00,  4.05it/s]
[ERROR] >> Error comnand
Traceback (most recent call last):
  File "/home/bazza/src/miniconda3/envs/sonitr/lib/python3.10/site-packages/gradio/queueing.py", line 495, in call_prediction
    output = await route_utils.call_process_api(
  File "/home/bazza/src/miniconda3/envs/sonitr/lib/python3.10/site-packages/gradio/route_utils.py", line 235, in call_process_api
    output = await app.get_blocks().process_api(
  File "/home/bazza/src/miniconda3/envs/sonitr/lib/python3.10/site-packages/gradio/blocks.py", line 1627, in process_api
    result = await self.call_function(
  File "/home/bazza/src/miniconda3/envs/sonitr/lib/python3.10/site-packages/gradio/blocks.py", line 1173, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "/home/bazza/src/miniconda3/envs/sonitr/lib/python3.10/site-packages/anyio/to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
  File "/home/bazza/src/miniconda3/envs/sonitr/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 2144, in run_sync_in_worker_thread
    return await future
  File "/home/bazza/src/miniconda3/envs/sonitr/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 851, in run
    result = context.run(func, *args)
  File "/home/bazza/src/miniconda3/envs/sonitr/lib/python3.10/site-packages/gradio/utils.py", line 690, in wrapper
    response = f(*args, **kwargs)
  File "/home/bazza/src/sonitr/SoniTranslate/app_rvc.py", line 365, in batch_multilingual_media_conversion
    output_file = self.multilingual_media_conversion(
  File "/home/bazza/src/sonitr/SoniTranslate/app_rvc.py", line 1085, in multilingual_media_conversion
    run_command(command_volume_mix)
  File "/home/bazza/src/sonitr/SoniTranslate/soni_translate/utils.py", line 66, in run_command
    raise Exception(errors.decode())
Exception: ffmpeg version 6.1.1 Copyright (c) 2000-2023 the FFmpeg developers
  built with gcc 12.3.0 (conda-forge gcc 12.3.0-5)
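The exception text is only ffmpeg's version banner because ffmpeg writes its banner and build configuration to stderr first; the actual error line comes at the end. A hypothetical variant of a `run_command` helper (not the project's actual function in `soni_translate/utils.py`) that checks the exit code and reports only the tail of stderr could look like this:

```python
import subprocess


def run_command(command: list[str], tail_lines: int = 15) -> str:
    """Run a command and return its stdout.

    On a non-zero exit code, raise with the last `tail_lines` lines of
    stderr, where ffmpeg puts the real error. The exit code is checked
    instead of "stderr is non-empty", because ffmpeg also logs to stderr
    when it succeeds.
    """
    result = subprocess.run(command, capture_output=True, text=True)
    if result.returncode != 0:
        tail = "\n".join(result.stderr.strip().splitlines()[-tail_lines:])
        raise RuntimeError(f"Command failed with exit code {result.returncode}:\n{tail}")
    return result.stdout
```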

I would like to suggest a couple of functions

Hello, your creation is beautiful!
I would like to suggest a couple of functions.
Simple ones:
An on/off checkbox for audio acceleration
Detailed volume settings (volume percentage) in the audio mixer output settings
RVC settings (index_rate, rms_mix_rate, protect)
More complex ones:
Save .srt subtitle files in both the original and target languages.
Edit the translated subtitles (a field containing the translated text that can be changed and saved, after which the audio is assembled again and RVC is reapplied)
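The volume-percentage suggestion maps naturally onto ffmpeg's `volume` and `amix` filters. The sketch below is a hypothetical command builder, not SoniTranslate's mixer: it converts percentages to gain factors (100% = 1.0) for a dubbed voice track and a background track. Note that `amix` rescales its inputs by default, so the overall loudness of the mix may still need adjusting.

```python
def volume_mix_command(voice: str, background: str, output: str,
                       voice_percent: float = 100,
                       background_percent: float = 100) -> list[str]:
    # 100% -> gain 1.0, 50% -> gain 0.5, and so on.
    voice_gain = voice_percent / 100
    background_gain = background_percent / 100
    filter_graph = (
        f"[0:a]volume={voice_gain}[voice];"
        f"[1:a]volume={background_gain}[bg];"
        "[voice][bg]amix=inputs=2:duration=longest[out]"
    )
    return ["ffmpeg", "-y", "-i", voice, "-i", background,
            "-filter_complex", filter_graph, "-map", "[out]", output]
```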

error on M1 Mac when creating SRT file from audio file.

I'm running a local install on my M1 Mac Air. I'm getting this error when trying to create an SRT file from an audio file.

"Error
Model has been downloaded but the SHA256 checksum does not not match. Please retry loading the model."

Here's my log:

(sonitr) userName@MacBook-Air SoniTranslate % python app_rvc.py
objc[5317]: Class AVFFrameReceiver is implemented in both /Users/userName/anaconda3/envs/sonitr/lib/libavdevice.58.8.100.dylib (0x122394798) and /Users/userName/anaconda3/envs/sonitr/lib/python3.10/site-packages/av/.dylibs/libavdevice.60.1.100.dylib (0x179a54760). One of the two will be used. Which one is undefined.
objc[5317]: Class AVFAudioReceiver is implemented in both /Users/userName/anaconda3/envs/sonitr/lib/libavdevice.58.8.100.dylib (0x1223947e8) and /Users/userName/anaconda3/envs/sonitr/lib/python3.10/site-packages/av/.dylibs/libavdevice.60.1.100.dylib (0x179a547b0). One of the two will be used. Which one is undefined.
[INFO] >> Working in: cpu
[WARNING] >> No module named 'piper'
[INFO] >> PIPER TTS disabled
[INFO] >> Coqui XTTS enabled
[INFO] >> In this app, by using Coqui TTS (text-to-speech), you acknowledge and agree to the license.
You confirm that you have read, understood, and agreed to the Terms and Conditions specified at the following link:
https://coqui.ai/cpml.txt.
Running on local URL: http://127.0.0.1:7860

To create a public link, set share=True in launch().
[WARNING] >> Make sure to select a 'TTS Speaker' suitable for the translation language to avoid errors with the TTS.
[INFO] >> Cache flushed
[INFO] >> Processing audio...
[INFO] >> Transcribing...
Traceback (most recent call last):
  File "/Users/userName/anaconda3/envs/sonitr/lib/python3.10/site-packages/gradio/queueing.py", line 495, in call_prediction
    output = await route_utils.call_process_api(
  File "/Users/userName/anaconda3/envs/sonitr/lib/python3.10/site-packages/gradio/route_utils.py", line 235, in call_process_api
    output = await app.get_blocks().process_api(
  File "/Users/userName/anaconda3/envs/sonitr/lib/python3.10/site-packages/gradio/blocks.py", line 1627, in process_api
    result = await self.call_function(
  File "/Users/userName/anaconda3/envs/sonitr/lib/python3.10/site-packages/gradio/blocks.py", line 1173, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "/Users/userName/anaconda3/envs/sonitr/lib/python3.10/site-packages/anyio/to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
  File "/Users/userName/anaconda3/envs/sonitr/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 2144, in run_sync_in_worker_thread
    return await future
  File "/Users/userName/anaconda3/envs/sonitr/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 851, in run
    result = context.run(func, *args)
  File "/Users/userName/anaconda3/envs/sonitr/lib/python3.10/site-packages/gradio/utils.py", line 690, in wrapper
    response = f(*args, **kwargs)
  File "/Users/userName/Documents/projects/vsCode projects/soni_translate/SoniTranslate/app_rvc.py", line 436, in multilingual_media_conversion
    audio, self.result = transcribe_speech(
  File "/Users/userName/Documents/projects/vsCode projects/soni_translate/SoniTranslate/soni_translate/speech_segmentation.py", line 34, in transcribe_speech
    model = whisperx.load_model(
  File "/Users/userName/anaconda3/envs/sonitr/lib/python3.10/site-packages/whisperx/asr.py", line 347, in load_model
    vad_model = load_vad_model(torch.device(device), use_auth_token=None, **default_vad_options)
  File "/Users/userName/anaconda3/envs/sonitr/lib/python3.10/site-packages/whisperx/vad.py", line 47, in load_vad_model
    raise RuntimeError(
RuntimeError: Model has been downloaded but the SHA256 checksum does not not match. Please retry loading the model.
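The error comes from whisperx's VAD loader, which compares the downloaded model file against an expected SHA256 hash. The log does not show the cache path or the expected hash, so both are assumptions below; the sketch only shows how to hash a file in chunks and delete a cached download that does not match, so the next run can fetch it again. If the freshly downloaded file still mismatches, the download source itself is the more likely culprit.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    # Hash in 1 MiB blocks so large model files are not read into memory at once.
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for block in iter(lambda: handle.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def remove_if_corrupt(path: Path, expected_sha256: str) -> bool:
    """Delete a cached download whose hash does not match; return True if removed."""
    if path.exists() and sha256_of(path) != expected_sha256.lower():
        path.unlink()
        return True
    return False
```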

Control working mode: cpu/cuda

Hi guys,

Can someone please suggest how to effectively control the working mode?

app_rvc.py is automatically started in cuda mode:

[INFO] >> Working in: cuda

However, I have a fairly old MX150 GPU, and it constantly fails with CUDA out of memory at the Transcribing stage, no matter how I tweak Batch size, Compute type, Whisper ASR model, or PYTORCH_CUDA_ALLOC_CONF.

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 78.00 MiB. GPU 0 has a total capacity of 2.00 GiB of which 0 bytes is free. Of the allocated memory 101.51 MiB is allocated by PyTorch, and 72.49 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)

Therefore, I would like to fall back to CPU mode.

Running SoniTranslate dev_24_3 on Windows 11.

Thanks!
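A general way to make a PyTorch-based app run on the CPU, independent of any SoniTranslate option (none is shown here), is to hide the GPU from the process before torch is imported: with `CUDA_VISIBLE_DEVICES` set to an empty value, `torch.cuda.is_available()` returns `False`. The hypothetical helpers below show that idea and the usual device choice that follows from it.

```python
import os


def force_cpu_mode() -> None:
    # Hide every CUDA device from this process. This must run before torch
    # (or anything that imports torch) is imported; afterwards CUDA may
    # already be initialized and the setting is ignored.
    os.environ["CUDA_VISIBLE_DEVICES"] = ""


def pick_device(cuda_available: bool, prefer_cpu: bool = False) -> str:
    # Typical selection logic: use CUDA only if it exists and is wanted.
    return "cuda" if cuda_available and not prefer_cpu else "cpu"
```

Used at the very top of a launcher script, `force_cpu_mode()` followed by `import torch` would make the app report `cpu` instead of `cuda`.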

Collaboration Proposal

Hey, your work is truly amazing, love it!!!

I am a UI/UX designer and Python developer. Sometimes I want to translate Indian, Korean, and Chinese lectures into my target language, whether on YouTube or other streaming platforms. Some services translate videos into other languages, and so does yours, but wouldn't it be easier and simpler to develop an extension that translates right away, the way Google Translate translates web pages at the click of a button? Plus, podcasts...

I am willing to collaborate on further developing your amazing program if you like the idea. You can reach me at:
[email protected]

Sincerely,
Shakhruz Bakhtiyarov

AI Dubbing API

Thank you for building this project! I work at a company called Sieve, and this project is part of what inspired us to build our Dubbing API. It's a bit different from this one, as it supports voice cloning, different voice engines, and higher-quality translations using other closed-source solutions, but it's an example of what this technology can do today.

I'd love to contribute our learnings to this project in some way. I think the most challenging part of the problem is how to handle audio speed-ups and slowdowns across languages. Different applications seem to want different trade-offs between how closely the audio stays in sync and how drastic the speed-up becomes.

I'm curious whether improvements on that front are planned for this project and whether we can contribute in any way. I would also love feedback on what we've built, as I think it's something the community would love!

[WinError 2] The system cannot find the specified file (Windows)

Hi, I just installed and ran SoniTranslate, and everything works fine.
The problem is that it only works when I pass it a YouTube URL; when I upload a local file, it gives me the error "[WinError 2] The system cannot find the specified file".
I have attached the console details.
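On Windows, `[WinError 2]` is what `subprocess` raises (as `FileNotFoundError`) when the program it is asked to start cannot be found, and what file functions raise for a missing path. The console details are not preserved here, so which file is missing is unknown; a quick first check, assuming a missing external tool such as `ffmpeg`, is to ask `shutil.which` whether each required program is on `PATH`.

```python
import shutil


def missing_executables(names) -> list[str]:
    # shutil.which returns None when a program cannot be found on PATH;
    # on Windows it also tries the extensions listed in PATHEXT (.exe, ...).
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_executables(["ffmpeg"])
    print("Missing:", missing or "none")
```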

The translation speed is incredible

Hi, Your work is incredible.
The translator has become much faster; it now runs 10 times faster (on GPU).
Is it possible to speed up RVC processing? During processing my video card is only about 30% loaded, so it could perhaps be processed in several threads to speed it up (an option to select the number of threads would be really cool). WhisperX, by comparison, loads the video card to 100% and converts audio to text in just a few seconds.

Also, as I understand it, Piper TTS does not work on Windows:
[WARNING] >> No module named 'piper'
[INFO] >> PIPER TTS disabled

during installation it says: ERROR: Cannot install piper-tts==1.1.0 and piper-tts==1.2.0 because these package versions have conflicting dependencies.
Maybe there is some solution?

There is also silero, an excellent fast TTS: https://github.com/snakers4/silero-models
Maybe it will suit you.
Having a choice of TTS engines is always good.

Thank you so much for your work!
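The multi-threading idea can be sketched with the standard library's thread pool. `convert` below is a hypothetical per-segment RVC function, not SoniTranslate's API. Whether this actually speeds things up depends on where the time goes: it helps when part of each segment's work waits on I/O or on the GPU while the GPU sits idle, while each worker holding its own model copy costs extra VRAM.

```python
from concurrent.futures import ThreadPoolExecutor


def convert_all(segments, convert, workers: int = 2) -> list:
    """Apply `convert` to every segment using up to `workers` threads.

    Results are returned in the same order as `segments`, because
    ThreadPoolExecutor.map preserves input order.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(convert, segments))
```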

Issue installing Piper TTS and Coqui XTTS

I don't know much about coding, but these are the last steps that appeared:

DEPRECATION: omegaconf 2.0.6 has a non-standard dependency specifier PyYAML>=5.1.*. pip 24.0 will enforce this behaviour change. A possible replacement is to upgrade to a newer version of omegaconf or contact the author to suggest that they release a version with a conforming dependency specifiers. Discussion can be found at pypa/pip#12063
Installing collected packages: onnxruntime-gpu
Successfully installed onnxruntime-gpu-1.17.1

(sonitr) A:\Art_intel\SoniTranslate>pip install -q piper-tts==1.2.0
ERROR: Could not find a version that satisfies the requirement piper-phonemize~=1.1.0 (from piper-tts) (from versions: none)
ERROR: No matching distribution found for piper-phonemize~=1.1.0

(sonitr) A:\Art_intel\SoniTranslate>pip install -q -r requirements_xtts.txt
DEPRECATION: omegaconf 2.0.6 has a non-standard dependency specifier PyYAML>=5.1.*. pip 24.0 will enforce this behaviour change. A possible replacement is to upgrade to a newer version of omegaconf or contact the author to suggest that they release a version with a conforming dependency specifiers. Discussion can be found at pypa/pip#12063
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
pyannote-audio 3.1.1 requires omegaconf<3.0,>=2.1, but you have omegaconf 2.0.6 which is incompatible.
pyannote-database 5.1.0 requires typer>=0.12.1, but you have typer 0.9.4 which is incompatible.

Need Support for FastAPI for the SoniTranslate

I am looking for FastAPI support for this project. I have gone through the codebase and cannot find it. Can anyone help me with that? I would be very grateful. Or can someone guide me step by step? I want to build a FastAPI service that takes a video, translates it using Coqui TTS, and then dubs the video.

Best Regards

ffmpeg error

How do I fix this?

Exception: ffmpeg version 6.1.1 Copyright (c) 2000-2023 the FFmpeg developers
built with clang version 17.0.6

Metadata:
encoder : Lavf60.16.100
Duration: 00:01:43.31, bitrate: 1411 kb/s
Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 44100 Hz, 2 channels, s16, 1411 kb/s
[aist#1:0/pcm_s32le @ 0000014CFD9B66C0] Guessed Channel Layout: mono
Input #1, wav, from 'audio_dub_solo.ogg':
Duration: 00:01:42.73, bitrate: 768 kb/s
Stream #1:0: Audio: pcm_s32le ([1][0][0][0] / 0x0001), 24000 Hz, 1 channels, s32, 768 kb/s
[aost#0:0 @ 0000014CFDA64CC0] Unknown encoder 'libmp3lame'
[aost#0:0 @ 0000014CFDA64CC0] Error selecting an encoder
Error opening output file audio_mix.mp3.
Error opening output files: Encoder not found
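`Unknown encoder 'libmp3lame'` means this ffmpeg build was compiled without the LAME MP3 encoder, so it cannot write `audio_mix.mp3`. The fix is either an ffmpeg build that includes it or an output format whose encoder is available. The sketch below checks a build by parsing the output of `ffmpeg -hide_banner -encoders`; the helper names are hypothetical.

```python
import subprocess


def has_encoder(encoders_listing: str, name: str) -> bool:
    # Encoder lines look like " A....D libmp3lame  libmp3lame MP3 ...":
    # capability flags first, then the encoder name.
    return any(
        len(parts) >= 2 and parts[1] == name
        for parts in (line.split() for line in encoders_listing.splitlines())
    )


def ffmpeg_has_encoder(name: str, ffmpeg: str = "ffmpeg") -> bool:
    listing = subprocess.run([ffmpeg, "-hide_banner", "-encoders"],
                             capture_output=True, text=True, check=True).stdout
    return has_encoder(listing, name)
```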

How increase the number of max speakers?

I tried to increase the number of speakers to at least 8, but I end up with error messages such as: "NameError: name 'model_voice_path08' is not defined. Did you mean: 'model_voice_path00'?"
I modified three folders, but this is obviously not enough. How can I get 8 speakers?
Modif.zip
Thanks

PS: In app_rvc.py, see lines 307 and 1409: "auto" compute mode, which I found in the WhisperX documentation.
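The `NameError` suggests each speaker's voice lives in its own numbered variable (`model_voice_path00`, `model_voice_path01`, ...), so adding speakers means declaring new names everywhere they are used. A sketch of the usual alternative, a single mapping keyed by speaker label, is below; the function is hypothetical and the `SPEAKER_00` label style is an assumption based on pyannote's default naming.

```python
def build_speaker_voices(voices: list[str], max_speakers: int) -> dict[str, str]:
    """Map speaker labels SPEAKER_00..SPEAKER_NN to TTS voices.

    One dict instead of separate variables means raising the speaker
    limit never requires declaring new names.
    """
    if not voices:
        raise ValueError("at least one voice is required")
    # Reuse voices cyclically if fewer voices than speakers were selected.
    return {f"SPEAKER_{i:02d}": voices[i % len(voices)] for i in range(max_speakers)}
```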

AttributeError: 'list' object has no attribute 'endswith'

After installing the dependencies on Windows 10, I got this error:

AttributeError: 'list' object has no attribute 'endswith'
Traceback (most recent call last):
  File "C:\SoniTranslate\soni\lib\site-packages\gradio\queueing.py", line 407, in call_prediction
    output = await route_utils.call_process_api(
  File "C:\SoniTranslate\soni\lib\site-packages\gradio\route_utils.py", line 226, in call_process_api
    output = await app.get_blocks().process_api(
  File "C:\SoniTranslate\soni\lib\site-packages\gradio\blocks.py", line 1559, in process_api
    data = self.postprocess_data(fn_index, result["prediction"], state)
  File "C:\SoniTranslate\soni\lib\site-packages\gradio\blocks.py", line 1447, in postprocess_data
    prediction_value = block.postprocess(prediction_value)
  File "C:\SoniTranslate\soni\lib\site-packages\gradio\components\file.py", line 247, in postprocess
    "name": self.make_temp_copy_if_needed(y),
  File "C:\SoniTranslate\soni\lib\site-packages\gradio\components\base.py", line 233, in make_temp_copy_if_needed
    temp_dir = self.hash_file(file_path)
  File "C:\SoniTranslate\soni\lib\site-packages\gradio\components\base.py", line 197, in hash_file
    with open(file_path, "rb") as f:
FileNotFoundError: [Errno 2] No such file or directory: 'sub_ori.[]'
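The file name `'sub_ori.[]'` hints that the subtitle extension was formatted from an empty list rather than a string such as `"srt"`, since `str([])` is `"[]"`. That reading is a guess from the file name alone. The hypothetical helper below shows the mechanism and a defensive way to build the names from either a single format or a list of formats.

```python
def subtitle_filenames(base: str, formats) -> list[str]:
    # f"{base}.{formats}" with a list gives names like 'sub_ori.[]';
    # normalize to a list of strings first and reject an empty selection.
    if isinstance(formats, str):
        formats = [formats]
    if not formats:
        raise ValueError("no subtitle format selected")
    return [f"{base}.{fmt}" for fmt in formats]
```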

RuntimeError: Model has been downloaded but the SHA256 checksum does not not match

[INFO] >> Cache flushed
[INFO] >> Processing video...
[INFO] >> Process video...
[INFO] >> Process audio...
[INFO] >> Transcribing...
Traceback (most recent call last):
  File "C:\Users\ded\bin\miniconda\envs\venvst\lib\site-packages\gradio\queueing.py", line 495, in call_prediction
    output = await route_utils.call_process_api(
  File "C:\Users\ded\bin\miniconda\envs\venvst\lib\site-packages\gradio\route_utils.py", line 235, in call_process_api
    output = await app.get_blocks().process_api(
  File "C:\Users\ded\bin\miniconda\envs\venvst\lib\site-packages\gradio\blocks.py", line 1627, in process_api
    result = await self.call_function(
  File "C:\Users\ded\bin\miniconda\envs\venvst\lib\site-packages\gradio\blocks.py", line 1173, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "C:\Users\ded\bin\miniconda\envs\venvst\lib\site-packages\anyio\to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
  File "C:\Users\ded\bin\miniconda\envs\venvst\lib\site-packages\anyio\_backends\_asyncio.py", line 2144, in run_sync_in_worker_thread
    return await future
  File "C:\Users\ded\bin\miniconda\envs\venvst\lib\site-packages\anyio\_backends\_asyncio.py", line 851, in run
    result = context.run(func, *args)
  File "C:\Users\ded\bin\miniconda\envs\venvst\lib\site-packages\gradio\utils.py", line 690, in wrapper
    response = f(*args, **kwargs)
  File "C:\SoniTranslate\app_rvc.py", line 436, in multilingual_media_conversion
    audio, self.result = transcribe_speech(
  File "C:\SoniTranslate\soni_translate\speech_segmentation.py", line 34, in transcribe_speech
    model = whisperx.load_model(
  File "C:\Users\ded\bin\miniconda\envs\venvst\lib\site-packages\whisperx\asr.py", line 347, in load_model
    vad_model = load_vad_model(torch.device(device), use_auth_token=None, **default_vad_options)
  File "C:\Users\ded\bin\miniconda\envs\venvst\lib\site-packages\whisperx\vad.py", line 47, in load_vad_model
    raise RuntimeError(
RuntimeError: Model has been downloaded but the SHA256 checksum does not not match. Please retry loading the model.
How can I solve this problem?

Selecting a speaker in the subtitle editor

Speaker detection does not always work correctly: a male voice may be used instead of a female one, and vice versa. Would it be possible to assign the speaker in the subtitle editor?
Let's say something like:

   {
     "speaker": 1,
     "start": 1.172,
     "text": "Your work is very cool."
   },
   {
     "speaker": 3,
     "start": 2.372,
     "text": "Yes, I agree too, SoniTranslate is great."
   }

So that if necessary, you could fix it manually.
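If the editor accepted a `speaker` field like the one above, the app would need to validate it before re-synthesizing. A minimal sketch, assuming 1-based speaker numbers as in the example and an internal `SPEAKER_00`-style label (both assumptions; `apply_speaker_edits` is hypothetical):

```python
import json


def apply_speaker_edits(segments_json: str, num_speakers: int) -> list[dict]:
    """Parse edited segments and validate the manual 'speaker' field."""
    segments = json.loads(segments_json)
    for seg in segments:
        speaker = seg.get("speaker")
        if not isinstance(speaker, int) or not 1 <= speaker <= num_speakers:
            raise ValueError(f"segment at {seg.get('start')}s has invalid speaker {speaker!r}")
        # Convert the user-facing number to a diarization-style label: 3 -> "SPEAKER_02".
        seg["speaker_label"] = f"SPEAKER_{speaker - 1:02d}"
    return segments
```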

Status gets stuck in transcription (30%)

The status gets stuck at transcription. Do you know what the problem could be?

I'm running on Ubuntu under WSL. I installed all the dependencies and set my HF token, but even so, when I load the video and set it to dub into Portuguese, it gets stuck at 30% in the transcription stage (it has been stuck there for 3 hours so far).

app_rvc.py

When running python app_rvc.py, how do I resolve this?
/tmp/gradio/6cb4020ad75bb1cb116c865ab91842f8753c7acc/Video_main.mp4
Process video...
process audio...
process audio...
...
Error can't create the audio file
Traceback (most recent call last):
  File "/home/mohit/Projects/venv/lib/python3.10/site-packages/gradio/routes.py", line 488, in run_predict
    output = await app.get_blocks().process_api(
  File "/home/mohit/Projects/venv/lib/python3.10/site-packages/gradio/blocks.py", line 1434, in process_api
    data = self.postprocess_data(fn_index, result["prediction"], state)
  File "/home/mohit/Projects/venv/lib/python3.10/site-packages/gradio/blocks.py", line 1335, in postprocess_data
    prediction_value = block.postprocess(prediction_value)
  File "/home/mohit/Projects/venv/lib/python3.10/site-packages/gradio/components/file.py", line 254, in postprocess
    "name": self.make_temp_copy_if_needed(y),
  File "/home/mohit/Projects/venv/lib/python3.10/site-packages/gradio/components/base.py", line 226, in make_temp_copy_if_needed
    temp_dir = self.hash_file(file_path)
  File "/home/mohit/Projects/venv/lib/python3.10/site-packages/gradio/components/base.py", line 190, in hash_file
    with open(file_path, "rb") as f:
FileNotFoundError: [Errno 2] No such file or directory: 'sub_ori.srt'

API Keys

Hello, I'm running SoniTranslate on Windows (Anaconda), and I don't know where to put the OpenAI API key. Is there an option to save it permanently somewhere in a file? The same goes for the HF token. Would someone be kind enough to suggest how to deal with this?
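A common pattern for keeping keys out of the UI is a `.env` file loaded into environment variables at startup. `OPENAI_API_KEY` is the variable the `openai` Python library reads by default and `HF_TOKEN` is one that `huggingface_hub` reads; whether SoniTranslate picks either up without entering them in the interface is an assumption to verify. The hypothetical loader below (the `python-dotenv` package offers a fuller version) reads `KEY=VALUE` lines and never overrides values that are already set.

```python
import os
from pathlib import Path


def load_env_file(path: Path) -> None:
    """Load KEY=VALUE lines into os.environ without overriding existing values."""
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments and malformed lines
        key, value = line.split("=", 1)
        os.environ.setdefault(key.strip(), value.strip().strip('"').strip("'"))
```

A matching `.env` file would contain lines such as `OPENAI_API_KEY=...` and `HF_TOKEN=...`, and should be kept out of version control.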

help me please

Hello, I'm deploying this on my computer, and everything seems to be installed. I get the link and everything works. I enter a token and a video link, then an error pops up. What's wrong?
Initial log:
[image]
End log with error:
[image]
Thanks a lot! Good luck :)

Add Background music after done please!

Wow, this is by far the best model chain I have ever seen! Here is an improvement I'd like to ask for: could you add a function that adds the input video's background music back in after all processing is done?

[ERROR] >> [ONNXRuntimeError] : 1 : FAIL : Non-zero status code returned while running Transpose node. Name:'Transpose_2'

First, thanks for your great project <3
I hope a future update will include a free AI translation option; the OpenAI API seems a little expensive for me. Thank you for this awesome tool.
My issue:
Everything works fine until I use Voiceless Track. The problem seems to be in onnxruntime-gpu, but I don't know why. I have a GeForce 960M with 4 GB of VRAM.

[INFO] >> Creating final translated video...
1it [00:00,  5.05it/s][INFO] >> Avoid overlap for audio2/audio/5.1.ogg with 5.6
19it [00:01, 10.99it/s][INFO] >> Avoid overlap for audio2/audio/90.8.ogg with 91.02
29it [00:02, 10.91it/s][INFO] >> Avoid overlap for audio2/audio/160.8.ogg with 161.22000000000003
31it [00:02, 10.84it/s][INFO] >> Avoid overlap for audio2/audio/165.8.ogg with 166.54000000000005
[INFO] >> Avoid overlap for audio2/audio/170.8.ogg with 171.56000000000006
37it [00:03, 10.36it/s]
[INFO] >> Voiceless Track Separation...
2024-05-19 04:15:50.5631879 [E:onnxruntime:, sequential_executor.cc:514 onnxruntime::ExecuteKernel] Non-zero status code returned while running Transpose node. Name:'Transpose_2' Status Message: CUDA error cudaErrorNoKernelImageForDevice:no kernel image is available for execution on the device
[ERROR] >> [ONNXRuntimeError] : 1 : FAIL : Non-zero status code returned while running Transpose node. Name:'Transpose_2' Status Message: CUDA error cudaErrorNoKernelImageForDevice:no kernel image is available for execution on the device
[INFO] >> Done: C:\Users\PORTATIL\SoniTranslate\outputs\Fury __en.mp4
(sonitr) PS C:\Users\PORTATIL\SoniTranslate> python app_rvc.py
The torchaudio backend is switched to 'soundfile'. Note that 'sox_io' is not supported on Windows.
The torchaudio backend is switched to 'soundfile'. Note that 'sox_io' is not supported on Windows.
[INFO] >> PIPER TTS enabled
[INFO] >> Coqui XTTS enabled
[INFO] >> In this app, by using Coqui TTS (text-to-speech), you acknowledge and agree to the license.
You confirm that you have read, understood, and agreed to the Terms and Conditions specified at the following link:
https://coqui.ai/cpml.txt.
[INFO] >> Working in: cuda
Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
IMPORTANT: You are using gradio version 4.19.2, however version 4.29.0 is available, please upgrade.

Sorry for my English
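The `cudaErrorNoKernelImageForDevice` message above usually means the installed CUDA kernels were not compiled for the GPU's architecture (the GeForce 960M is an older Maxwell-generation card). Because the failure happens while the model is running rather than when the session is created, listing the CPU provider as a fallback likely does not help on its own; a plausible workaround is to run just the separation model on the CPU. The helper below is hypothetical. `ort.get_available_providers()` and the `providers=` argument of `ort.InferenceSession` are real ONNX Runtime APIs, but where SoniTranslate creates its session, and the model path, are assumptions.

```python
# Hypothetical helper, not SoniTranslate code: builds the execution-provider
# list for an ONNX Runtime session, optionally forcing CPU execution.
def choose_providers(available: list, force_cpu: bool = False) -> list:
    """Return providers in priority order, always ending with the CPU.

    CUDA is used only if it is available and force_cpu is False.
    """
    providers = []
    if not force_cpu and "CUDAExecutionProvider" in available:
        providers.append("CUDAExecutionProvider")
    providers.append("CPUExecutionProvider")
    return providers


# Example usage (requires onnxruntime; "model.onnx" is a placeholder path):
#   import onnxruntime as ort
#   providers = choose_providers(ort.get_available_providers(), force_cpu=True)
#   session = ort.InferenceSession("model.onnx", providers=providers)
```

Running on the CPU will be slower, but it avoids the missing GPU kernels entirely.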

IndexError in text_to_speech.py when Processing Certain WAV Files

Hi, I ran into a problem when trying to process a specific audio file. The error occurs exactly when the code tries to handle a WAV file called "XTTS/AUTOMATIC_SPEAKER_00.wav". Below is the traceback with more information:

[INFO] >> XTTS/AUTOMATIC_SPEAKER_00.wav
Traceback (most recent call last):
  File "/home/lapo/anaconda3/envs/sonitr/lib/python3.10/site-packages/gradio/queueing.py", line 495, in call_prediction
    output = await route_utils.call_process_api(
...
  File "/home/lapo/SoniTranslate/soni_translate/text_to_speech.py", line 474, in create_new_files_for_vc
    if filtered_speaker[0]["tts_name"] == "XTTS/AUTOMATIC.wav":
IndexError: list index out of range

It seems the code running the text-to-speech process has a bug. Specifically, when it creates new audio files for voice conversion, it accesses the first element of filtered_speaker, which is an empty list at that point. This raises an 'IndexError'.

Could you please look into this issue? I'm not sure whether the problem lies in how the audio files are named or somewhere else in the processing steps.

Thank you for your assistance on this project and the great work you've done so far.
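From the traceback, `filtered_speaker` is an empty list when line 474 runs, so indexing it with `[0]` fails. A minimal sketch of a guard follows. Only the `filtered_speaker[0]["tts_name"] == "XTTS/AUTOMATIC.wav"` check comes from the traceback; the segment structure, the `speaker` key, and the helper names are hypothetical, and whether skipping the speaker is the right behavior depends on code the traceback does not show.

```python
from typing import Optional


# Hypothetical sketch: only the filtered_speaker[0]["tts_name"] check is
# taken from the traceback; the segment layout and helper names are assumed.
def first_tts_name(filtered_speaker: list) -> Optional[str]:
    """Return the first entry's 'tts_name', or None if the list is empty.

    An empty list is exactly the case where filtered_speaker[0] raises
    IndexError: list index out of range.
    """
    if not filtered_speaker:
        return None
    return filtered_speaker[0].get("tts_name")


def uses_automatic_voice(segments: list, speaker: str) -> bool:
    """Check whether `speaker` is assigned the automatic XTTS reference voice."""
    # Assumed filter: keep the segments attributed to this speaker.
    filtered_speaker = [seg for seg in segments if seg.get("speaker") == speaker]
    tts_name = first_tts_name(filtered_speaker)
    if tts_name is None:
        # No segment for this speaker: report False instead of crashing.
        return False
    return tts_name == "XTTS/AUTOMATIC.wav"
```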

Any way to use an SRT file as the source?

Hi,
translating directly from Danish to English has never worked correctly with anything I have tried,
but I can get AI-translated subtitles that are 70-80% correct and then edit them to be understandable in English.
So is there any way to either edit whatever SoniTranslate translates, or have it take an SRT file and respect its timing when generating the new audio?
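Whether SoniTranslate can take an SRT file as its source is not established in this thread. As a sketch of what respecting the timing involves, the parser below reads standard SRT cues (an index line, an `HH:MM:SS,mmm --> HH:MM:SS,mmm` line, then text lines) into start and end times in seconds plus text. Each cue's duration (`end - start`) is the time slot the generated speech would have to fit. `Cue` and `parse_srt` are hypothetical names.

```python
import re
from dataclasses import dataclass

# Matches "00:01:02,250 --> 00:01:04,000" (a '.' separator is also accepted).
TIMESTAMP = re.compile(
    r"(\d+):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2})[,.](\d{3})"
)


@dataclass
class Cue:
    start: float  # seconds
    end: float    # seconds
    text: str


def _seconds(h: str, m: str, s: str, ms: str) -> float:
    """Convert timestamp fields to seconds."""
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


# Hypothetical sketch, not SoniTranslate's own subtitle loader.
def parse_srt(content: str) -> list:
    """Parse SRT text into Cue objects, keeping each cue's timing."""
    cues = []
    normalized = content.replace("\r\n", "\n").strip()
    # Cues are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", normalized):
        lines = block.splitlines()
        for i, line in enumerate(lines):
            match = TIMESTAMP.search(line)
            if match:
                g = match.groups()
                text = "\n".join(lines[i + 1:]).strip()
                cues.append(Cue(_seconds(*g[:4]), _seconds(*g[4:]), text))
                break  # one timing line per cue
    return cues
```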

Using an RVC 2 model: the process starts and reaches 90%, but then it crashes

I'm using an RVC model (.pth and index file). Everything works fine at first: the RVC 2 step starts, processes everything up to 90%, and then crashes. I'm not sure why. I changed all the settings and cleared the audio and audio2 folders.

I played those audio files and they sounded really good. So what's going on? lol

[INFO] >> audio2/audio/0.287.ogg, Tony Robbins.pth
[INFO] >> audio2/audio/2.168.ogg, Tony Robbins.pth
[INFO] >> audio2/audio/6.811.ogg, Tony Robbins.pth
[INFO] >> audio2/audio/8.052.ogg, Tony Robbins.pth
[INFO] >> audio2/audio/14.757.ogg, Tony Robbins.pth
[INFO] >> audio2/audio/18.699.ogg, Tony Robbins.pth
[INFO] >> audio2/audio/20.461.ogg, Tony Robbins.pth
[INFO] >> audio2/audio/21.641.ogg, Tony Robbins.pth
[INFO] >> audio2/audio/23.162.ogg, Tony Robbins.pth

(sonitr) C:\Tools\AI\SoniTranslate>
