
audiosplitter_whisper's People

Contributors

aotraz, jarodmica, smotto


audiosplitter_whisper's Issues

Could not download 'pyannote/segmentation-3.0' model.

Hello, I recently tried to spin up this program again, but it seems there's some problem with speaker diarization.

It worked fine a month ago, and I have not changed anything in the config file.

I tried to fix it by generating a new token and making sure I had accepted the gated-model conditions for all three models, as in the YouTube tutorial, but it's still failing.

`Could not download 'pyannote/segmentation-3.0' model.
It might be because the model is private or gated so make
sure to authenticate. Visit https://hf.co/settings/tokens to
create your access token and retry with:

>>> Model.from_pretrained('pyannote/segmentation-3.0',
...                       use_auth_token=YOUR_AUTH_TOKEN)

If this still does not work, it might be because the model is gated:
visit https://hf.co/pyannote/segmentation-3.0 to accept the user conditions.
Traceback (most recent call last):
File "c:\Users\arkad\Desktop\ai\audiosplitter_whisper\venv\Scripts\whisperx-script.py", line 33, in <module>
sys.exit(load_entry_point('whisperx==3.1.1', 'console_scripts', 'whisperx')())
File "C:\Users\arkad\Desktop\ai\audiosplitter_whisper\venv\lib\site-packages\whisperx\transcribe.py", line 211, in cli
diarize_model = DiarizationPipeline(use_auth_token=hf_token, device=device)
File "C:\Users\arkad\Desktop\ai\audiosplitter_whisper\venv\lib\site-packages\whisperx\diarize.py", line 19, in __init__
self.model = Pipeline.from_pretrained(model_name, use_auth_token=use_auth_token).to(device)
File "C:\Users\arkad\Desktop\ai\audiosplitter_whisper\venv\lib\site-packages\pyannote\audio\core\pipeline.py", line 136, in from_pretrained
pipeline = Klass(**params)
File "C:\Users\arkad\Desktop\ai\audiosplitter_whisper\venv\lib\site-packages\pyannote\audio\pipelines\speaker_diarization.py", line 128, in __init__
model: Model = get_model(segmentation, use_auth_token=use_auth_token)
File "C:\Users\arkad\Desktop\ai\audiosplitter_whisper\venv\lib\site-packages\pyannote\audio\pipelines\utils\getter.py", line 89, in get_model
model.eval()
AttributeError: 'NoneType' object has no attribute 'eval'
Traceback (most recent call last):
File "c:\Users\arkad\Desktop\ai\audiosplitter_whisper\split_audio.py", line 183, in
main()
File "c:\Users\arkad\Desktop\ai\audiosplitter_whisper\split_audio.py", line 180, in main
process_audio_files(input_folder, settings)
File "c:\Users\arkad\Desktop\ai\audiosplitter_whisper\split_audio.py", line 148, in process_audio_files
diarize_audio_with_srt(audio_file_path, srt_file, speaker_segments_dir)
File "c:\Users\arkad\Desktop\ai\audiosplitter_whisper\split_audio.py", line 77, in diarize_audio_with_srt
subs = pysrt.open(srt_file)
File "C:\Users\arkad\Desktop\ai\audiosplitter_whisper\venv\lib\site-packages\pysrt\srtfile.py", line 151, in open
source_file, encoding = cls._open_unicode_file(path, claimed_encoding=encoding)
File "C:\Users\arkad\Desktop\ai\audiosplitter_whisper\venv\lib\site-packages\pysrt\srtfile.py", line 292, in _open_unicode_file
encoding = claimed_encoding or cls._detect_encoding(path)
File "C:\Users\arkad\Desktop\ai\audiosplitter_whisper\venv\lib\site-packages\pysrt\srtfile.py", line 279, in _detect_encoding
file_descriptor = open(path, 'rb')
FileNotFoundError: [Errno 2] No such file or directory: 'C:\Users\arkad\Desktop\ai\Pudzian\output\plik.srt'`
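
A quick way to test the token outside of WhisperX is the snippet from the error message itself (YOUR_AUTH_TOKEN is a placeholder for an hf.co access token; this sketch assumes pyannote.audio is importable from the repo's venv):

from pyannote.audio import Model

model = Model.from_pretrained(
    "pyannote/segmentation-3.0",
    use_auth_token="YOUR_AUTH_TOKEN",  # placeholder: your hf.co access token
)
# A Model instance means the token and gating are fine; None reproduces the
# failure in the traceback above (get_model() then calls .eval() on None).
print(model)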

Error when I tried it on Google Colab

First, I cloned the repository with this command:

git clone https://github.com/JarodMica/audiosplitter_whisper.git

Second, I ran these commands:

%cd /content/audiosplitter_whisper
!sudo apt update && sudo apt upgrade
!sudo apt-get update
!sudo apt install ffmpeg
!sudo apt install python3.10-venv

Then I ran setup-cuda.py.
Next, I ran this command to run split_audio.py:

%cd /content/audiosplitter_whisper
!source /content/audiosplitter_whisper/venv/bin/activate; python3 /content/audiosplitter_whisper/split_audio.py

And the error appeared:

Traceback (most recent call last):
  File "/content/audiosplitter_whisper/venv/bin/whisperx", line 33, in <module>
    sys.exit(load_entry_point('whisperx==3.1.1', 'console_scripts', 'whisperx')())
  File "/content/audiosplitter_whisper/venv/bin/whisperx", line 25, in importlib_load_entry_point
    return next(matches).load()
  File "/usr/lib/python3.10/importlib/metadata/__init__.py", line 171, in load
    module = import_module(match.group('module'))
  File "/usr/lib/python3.10/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 992, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/content/audiosplitter_whisper/venv/lib/python3.10/site-packages/whisperx/__init__.py", line 1, in <module>
    from .transcribe import load_model
  File "/content/audiosplitter_whisper/venv/lib/python3.10/site-packages/whisperx/transcribe.py", line 9, in <module>
    from .alignment import align, load_align_model
  File "/content/audiosplitter_whisper/venv/lib/python3.10/site-packages/whisperx/alignment.py", line 11, in <module>
    import torchaudio
  File "/content/audiosplitter_whisper/venv/lib/python3.10/site-packages/torchaudio/__init__.py", line 1, in <module>
    from torchaudio import (  # noqa: F401
  File "/content/audiosplitter_whisper/venv/lib/python3.10/site-packages/torchaudio/_extension/__init__.py", line 43, in <module>
    _load_lib("libtorchaudio")
  File "/content/audiosplitter_whisper/venv/lib/python3.10/site-packages/torchaudio/_extension/utils.py", line 61, in _load_lib
    torch.ops.load_library(path)
  File "/content/audiosplitter_whisper/venv/lib/python3.10/site-packages/torch/_ops.py", line 643, in load_library
    ctypes.CDLL(path)
  File "/usr/lib/python3.10/ctypes/__init__.py", line 374, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: libtorch_hip.so: cannot open shared object file: No such file or directory
Traceback (most recent call last):
  File "/content/audiosplitter_whisper/split_audio.py", line 189, in <module>
    process_audio_files(input_folder)
  File "/content/audiosplitter_whisper/split_audio.py", line 185, in process_audio_files
    extract_audio_with_srt(audio_file_path, srt_file, speaker_segments_dir)
  File "/content/audiosplitter_whisper/split_audio.py", line 101, in extract_audio_with_srt
    subs = pysrt.open(srt_file)
  File "/content/audiosplitter_whisper/venv/lib/python3.10/site-packages/pysrt/srtfile.py", line 151, in open
    source_file, encoding = cls._open_unicode_file(path, claimed_encoding=encoding)
  File "/content/audiosplitter_whisper/venv/lib/python3.10/site-packages/pysrt/srtfile.py", line 292, in _open_unicode_file
    encoding = claimed_encoding or cls._detect_encoding(path)
  File "/content/audiosplitter_whisper/venv/lib/python3.10/site-packages/pysrt/srtfile.py", line 279, in _detect_encoding
    file_descriptor = open(path, 'rb')
IsADirectoryError: [Errno 21] Is a directory: '/content/audiosplitter_whisper/data/output/model'

Also, has anyone tried it on Google Colab?
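
For what it's worth, the OSError above means torchaudio's native extension in the venv was built against ROCm (HIP) while no matching ROCm torch runtime is present, i.e. the venv ended up with mismatched wheels. A possible remedy, assuming Colab's GPU runtime is CUDA-based and the cu118 wheels referenced by requirements-cuda.txt are the intended ones (the exact torch/torchaudio version pairing is an assumption), is to force-reinstall matching CUDA builds inside the venv:

%cd /content/audiosplitter_whisper
!source venv/bin/activate; pip install --force-reinstall --index-url https://download.pytorch.org/whl/cu118 torch==2.0.0+cu118 torchaudio==2.0.1+cu118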

Error while executing setup-cuda.py

Hi there,

thanks for this repo.

When I set up this repo with python setup-cuda.py, I got the errors below. I understand that to solve this problem I should edit some requirements file and remove the platform_system == "Linux" and platform_machine == "x86_64" markers, since I know that I am on Linux, but I don't know which file I should edit.

Please give me some advice.

  Cloning https://github.com/pyannote/pyannote-audio (to revision 11b56a137a578db9335efc00298f6ec1932e6317) to /tmp/pip-install-qvyhc3rj/pyannote.audio
  Running command git clone -q https://github.com/pyannote/pyannote-audio /tmp/pip-install-qvyhc3rj/pyannote.audio
  Running command git checkout -q 11b56a137a578db9335efc00298f6ec1932e6317
  Running command git submodule update --init --recursive -q
Collecting triton==2.0.0; platform_system == "Linux" and platform_machine == "x86_64" (from torch==2.0.0+cu118->-r requirements-cuda.txt (line 4))
  WARNING: Retrying (Retry(total=4, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ReadTimeoutError("HTTPSConnectionPool(host='download.pytorch.org', port=443): Read timed out. (read timeout=15)")': /whl/torch_stable.html
  ERROR: Could not find a version that satisfies the requirement triton==2.0.0; platform_system == "Linux" and platform_machine == "x86_64" (from torch==2.0.0+cu118->-r requirements-cuda.txt (line 4)) (from versions: 0.4.1, 0.4.2)
ERROR: No matching distribution found for triton==2.0.0; platform_system == "Linux" and platform_machine == "x86_64" (from torch==2.0.0+cu118->-r requirements-cuda.txt (line 4))
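
Note that the triton==2.0.0; platform_system == "Linux" and platform_machine == "x86_64" requirement comes from torch 2.0.0's own dependency metadata (the log shows it is pulled in from torch==2.0.0+cu118), not from a file in this repo, so there is no marker to edit out of requirements-cuda.txt. The ERROR is also immediately preceded by a read timeout against download.pytorch.org, so one hedged thing to try, assuming the timeout is the real failure, is rerunning the install inside the venv with a longer pip timeout:

venv/bin/python -m pip install -r requirements-cuda.txt --timeout 60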

Exception has occurred: FileNotFoundError [Errno 2] No such file or directory

I have accepted all user conditions on https://huggingface.co/pyannote, but I am still getting the error below.

(base) PS E:\Programs\AIVoiceProject\audiosplitter_whisper> venv\Scripts\activate
(venv) (base) PS E:\Programs\AIVoiceProject\audiosplitter_whisper> e:; cd 'e:\Programs\AIVoiceProject\audiosplitter_whisper'; & 'e:\Programs\AIVoiceProject\audiosplitter_whisper\venv\Scripts\python.exe' 'c:\Users\assas.vscode\extensions\ms-python.python-2023.18.0\pythonFiles\lib\python\debugpy\adapter/../..\debugpy\launcher' '64065' '--' 'E:\Programs\AIVoiceProject\audiosplitter_whisper\split_audio.py'
CUDA is available. Running on GPU.
The torchaudio backend is switched to 'soundfile'. Note that 'sox_io' is not supported on Windows.
The torchaudio backend is switched to 'soundfile'. Note that 'sox_io' is not supported on Windows.
Traceback (most recent call last):
File "E:\Programs\AIVoiceProject\audiosplitter_whisper\venv\Scripts\whisperx-script.py", line 33, in
sys.exit(load_entry_point('whisperx==3.1.1', 'console_scripts', 'whisperx')())
File "E:\Programs\AIVoiceProject\audiosplitter_whisper\venv\lib\site-packages\whisperx\transcribe.py", line 162, in cli
model = load_model(model_name, device=device, device_index=device_index, compute_type=compute_type, language=args['language'], asr_options=asr_options, vad_options={"vad_onset": vad_onset, "vad_offset": vad_offset}, task=task, threads=faster_whisper_threads)
File "E:\Programs\AIVoiceProject\audiosplitter_whisper\venv\lib\site-packages\whisperx\asr.py", line 52, in load_model
model = WhisperModel(whisper_arch,
File "E:\Programs\AIVoiceProject\audiosplitter_whisper\venv\lib\site-packages\faster_whisper\transcribe.py", line 128, in init
self.model = ctranslate2.models.Whisper(
ValueError: Requested float16 compute type, but the target device or backend do not support efficient float16 computation.

Exception has occurred: FileNotFoundError
[Errno 2] No such file or directory: 'E:\Programs\AIVoiceProject\audiosplitter_whisper\data\output\seriy4.srt'
File "E:\Programs\AIVoiceProject\audiosplitter_whisper\split_audio.py", line 96, in extract_audio_with_srt
subs = pysrt.open(srt_file)
File "E:\Programs\AIVoiceProject\audiosplitter_whisper\split_audio.py", line 150, in process_audio_files
extract_audio_with_srt(audio_file_path, srt_file, speaker_segments_dir)
File "E:\Programs\AIVoiceProject\audiosplitter_whisper\split_audio.py", line 180, in main
process_audio_files(input_folder, settings)
File "E:\Programs\AIVoiceProject\audiosplitter_whisper\split_audio.py", line 183, in
main()
FileNotFoundError: [Errno 2] No such file or directory: 'E:\Programs\AIVoiceProject\audiosplitter_whisper\data\output\seriy4.srt'
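
The float16 error above is raised by CTranslate2 when the target device cannot do efficient float16 computation, and the missing .srt is just the downstream symptom of that failed transcription. The usual workaround is to request a different compute type. As a sketch, assuming conf.yaml is where split_audio.py reads the compute_type value it passes to whisperx via --compute_type (the exact key name is an assumption):

compute_type: float32   # or int8; float16 needs a GPU with efficient fp16 support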

FileNotFoundError: [Errno 2] No such file or directory: 'conf.yaml'

Everything ran perfectly until the last step with
python split_audio.py

returned

Traceback (most recent call last):
  File "E:\voice stuff\audiosplitter_whisper\split_audio.py", line 11, in <module>
    with open("conf.yaml", "r") as file:
FileNotFoundError: [Errno 2] No such file or directory: 'conf.yaml'
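
split_audio.py opens conf.yaml with a relative path, so the file is looked up in the current working directory. Under that assumption, running the script from the repo root (where conf.yaml lives) avoids the error:

cd "E:\voice stuff\audiosplitter_whisper"
python split_audio.py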

No such file or directory

[Errno 2] 'F:\audiosplitter_whisper-master\data\output\engTest.srt'
File "F:\audiosplitter_whisper-master\split_audio.py", line 78, in diarize_audio_with_srt
subs = pysrt.open(srt_file)
File "F:\audiosplitter_whisper-master\split_audio.py", line 149, in process_audio_files
diarize_audio_with_srt(audio_file_path, srt_file, speaker_segments_dir)
File "F:\audiosplitter_whisper-master\split_audio.py", line 181, in main
process_audio_files(input_folder, settings)
File "F:\audiosplitter_whisper-master\split_audio.py", line 184, in
main()
FileNotFoundError: [Errno 2] No such file or directory: 'F:\audiosplitter_whisper-master\data\output\engTest.srt'

myIssue

Can someone help me?

Tkinter not working on headless OSes
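
The listing below is split_audio.py with the Tkinter folder dialog removed: the input folder is given as a plain string path instead, so nothing needs a display server.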

import os
import re
import subprocess
import unicodedata

import pysrt
import torch
import yaml
from pydub import AudioSegment

# NOTE: device, whisper_model, language, hf_token, compute_type and diarize are
# used below as module-level settings; in split_audio.py they come from conf.yaml.

def sanitize_filename(filename):
    # Remove diacritics and normalize Unicode characters
    normalized = unicodedata.normalize('NFKD', filename)
    sanitized = ''.join(c for c in normalized if not unicodedata.combining(c))

    # Regular expression matching invalid filename characters
    invalid_chars_pattern = r'[<>:"/\\|?*]'

    # Replace invalid characters with an underscore
    sanitized_filename = re.sub(invalid_chars_pattern, '_', sanitized)

    return sanitized_filename

def diarize_audio_with_srt(audio_file, srt_file, output_dir, padding=0.0):
    '''
    Use whisperx-generated SRT files to split the audio file with speaker
    numbering and diarization.

    Args:
        - audio_file (str) - path to the audio file being processed
        - srt_file (str) - path to the srt file being used for the splicing
        - output_dir (str) - directory for the outputted files
        - padding (float) - how much additional sound to include before and after
          each segment; can be useful for audio that is getting clipped.
    '''
    audio = AudioSegment.from_file(audio_file)
    subs = pysrt.open(srt_file)

    for i, sub in enumerate(subs):
        # Extract speaker from subtitle, e.g. "[SPEAKER_00] ..." -> "SPEAKER_00"
        speaker = sub.text.split(']')[0][1:]
        sanitized_speaker = sanitize_filename(speaker)

        # Create speaker-specific output directory
        speaker_dir = os.path.join(output_dir, sanitized_speaker)
        if not os.path.exists(speaker_dir):
            os.makedirs(speaker_dir)

        # Calculate start and end times with padding (pydub uses milliseconds)
        start_time = max(0, sub.start.ordinal - padding * 1000)
        end_time = min(len(audio), sub.end.ordinal + padding * 1000)

        # Extract segment from audio
        segment = audio[start_time:end_time]

        # Generate output filename with suffix count
        existing_files = os.listdir(speaker_dir)
        file_count = len(existing_files)
        output_filename = f"segment_{file_count + 1}.wav"
        output_path = os.path.join(speaker_dir, output_filename)

        # Save segment
        segment.export(output_path, format="wav")

        print(f"Saved segment {i+1} to {output_path}")

def extract_audio_with_srt(audio_file, srt_file, output_dir, padding=0.0):
    '''
    Use whisperx-generated SRT files to split the audio file.

    Args:
        - audio_file (str) - path to the audio file being processed
        - srt_file (str) - path to the srt file being used for the splicing
        - output_dir (str) - directory for the outputted files
        - padding (float) - how much additional sound to include before and after
          each segment; can be useful for audio that is getting clipped.
    '''
    audio = AudioSegment.from_file(audio_file)
    subs = pysrt.open(srt_file)

    if not os.path.exists(output_dir):
        os.makedirs(output_dir)

    # Get existing file count in the output directory
    existing_files = os.listdir(output_dir)
    file_count = len(existing_files)

    for i, sub in enumerate(subs):
        # Calculate start and end times with padding (pydub uses milliseconds)
        start_time = max(0, sub.start.ordinal - padding * 1000)
        end_time = min(len(audio), sub.end.ordinal + padding * 1000)

        # Extract segment from audio
        segment = audio[start_time:end_time]

        # Generate output filename with suffix count
        output_filename = f"segment_{file_count + i + 1}.wav"
        output_path = os.path.join(output_dir, output_filename)

        # Save segment
        segment.export(output_path, format="wav")

        print(f"Saved segment {i+1} to {output_path}")

def run_whisperx(audio_files, output_dir):
    '''Generate an SRT file using whisperx.'''
    if diarize:
        subprocess.run(["whisperx", audio_files,
                        "--device", device,
                        "--model", whisper_model,
                        "--output_dir", output_dir,
                        "--language", language,
                        "--diarize",
                        "--hf_token", hf_token,
                        "--output_format", "srt",
                        "--compute_type", compute_type])
    else:
        subprocess.run(["whisperx", audio_files,
                        "--device", device,
                        "--model", whisper_model,
                        "--output_dir", output_dir,
                        "--language", language,
                        "--output_format", "srt",
                        "--compute_type", compute_type])

def create_directory(name):
    if not os.path.exists(name):
        os.makedirs(name)

def process_audio_files(input_folder):
    output_dir = os.path.join(input_folder, "output")
    wav_dir = os.path.join(input_folder, "wav_files")

    create_directory(output_dir)
    create_directory(wav_dir)

    for audio_file in os.listdir(input_folder):
        audio_file_path = os.path.join(input_folder, audio_file)
        if not os.path.isfile(audio_file_path):
            continue

        if not audio_file.endswith(".wav"):
            # Set output .wav file path
            wav_file_path = os.path.join(wav_dir, f"{os.path.splitext(audio_file)[0]}.wav")
            try:
                subprocess.run(['ffmpeg', '-i', audio_file_path, wav_file_path], check=True)
                audio_file_path = wav_file_path  # Point at the converted file
            except subprocess.CalledProcessError as e:
                print(f"Error: {e.output}. Couldn't convert {audio_file} to .wav format.")
                continue

        run_whisperx(audio_file_path, output_dir)
        srt_file = os.path.join(output_dir, f"{os.path.splitext(audio_file)[0]}.srt")

        # Output directory for segments: a subdirectory named after the .wav file
        speaker_segments_dir = os.path.join(output_dir, os.path.splitext(audio_file)[0])
        create_directory(speaker_segments_dir)

        if diarize:
            diarize_audio_with_srt(audio_file_path, srt_file, speaker_segments_dir)
        else:
            extract_audio_with_srt(audio_file_path, srt_file, speaker_segments_dir)

def choose_input_folder(input_folder):
    process_audio_files(input_folder)

input_folder = "/path/to/input/folder"
choose_input_folder(input_folder)

Error

Lightning automatically upgraded your loaded checkpoint from v1.5.4 to v2.0.4. To apply the upgrade to your files permanently, run python -m pytorch_lightning.utilities.upgrade_checkpoint --file C:\Users\zafki\.cache\torch\pyannote\models--pyannote--segmentation\snapshots\c4c8ceafcbb3a7a280c2d357aee9fbc9b0be7f9b\pytorch_model.bin
Model was trained with pyannote.audio 0.0.1, yours is 2.1.1. Bad things might happen unless you revert pyannote.audio to 0.x.
Model was trained with torch 1.10.0+cu102, yours is 2.0.0+cu118. Bad things might happen unless you revert torch to 1.x.
Traceback (most recent call last):
File "\\?\D:\Work\Voice_Changer_OpenSorce\WhisperX_AudioSpliter\audiosplitter_whisper\venv\Scripts\whisperx-script.py", line 33, in <module>
sys.exit(load_entry_point('whisperx==3.1.1', 'console_scripts', 'whisperx')())
File "D:\Work\Voice_Changer_OpenSorce\WhisperX_AudioSpliter\audiosplitter_whisper\venv\lib\site-packages\whisperx\transcribe.py", line 203, in cli
diarize_model = DiarizationPipeline(use_auth_token=hf_token, device=device)
File "D:\Work\Voice_Changer_OpenSorce\WhisperX_AudioSpliter\audiosplitter_whisper\venv\lib\site-packages\whisperx\diarize.py", line 16, in __init__
self.model = Pipeline.from_pretrained(model_name, use_auth_token=use_auth_token).to(device)
File "D:\Work\Voice_Changer_OpenSorce\WhisperX_AudioSpliter\audiosplitter_whisper\venv\lib\site-packages\pyannote\audio\core\pipeline.py", line 135, in from_pretrained
pipeline = Klass(**params)
File "D:\Work\Voice_Changer_OpenSorce\WhisperX_AudioSpliter\audiosplitter_whisper\venv\lib\site-packages\pyannote\audio\pipelines\speaker_diarization.py", line 165, in __init__
self.embedding = PretrainedSpeakerEmbedding(
File "D:\Work\Voice_Changer_OpenSorce\WhisperX_AudioSpliter\audiosplitter_whisper\venv\lib\site-packages\pyannote\audio\pipelines\speaker_verification.py", line 490, in PretrainedSpeakerEmbedding
File "D:\Work\Voice_Changer_OpenSorce\WhisperX_AudioSpliter\audiosplitter_whisper\venv\lib\site-packages\pyannote\audio\pipelines\speaker_verification.py", line 249, in __init__
self.classifier_ = SpeechBrain_EncoderClassifier.from_hparams(
File "D:\Work\Voice_Changer_OpenSorce\WhisperX_AudioSpliter\audiosplitter_whisper\venv\lib\site-packages\speechbrain\pretrained\interfaces.py", line 367, in from_hparams
hparams_local_path = fetch(
File "D:\Work\Voice_Changer_OpenSorce\WhisperX_AudioSpliter\audiosplitter_whisper\venv\lib\site-packages\speechbrain\pretrained\fetching.py", line 135, in fetch
destination.symlink_to(sourcepath)
File "D:\Work\Python and Pycharm\Python-3.10.6\lib\pathlib.py", line 1255, in symlink_to
self._accessor.symlink(target, self, target_is_directory)
OSError: [WinError 1314] A required privilege is not held by the client: 'C:\Users\zafki\.cache\huggingface\hub\models--speechbrain--spkrec-ecapa-voxceleb\snapshots\5c0be3875fda05e81f3c004ed8c7c06be308de1e\hyperparams.yaml' -> 'C:\Users\zafki\.cache\torch\pyannote\speechbrain\hyperparams.yaml'

WinError 1314 when attempting to use diarization

Hello there, sorry to bother you, but I'm getting this error when attempting to use diarization on Windows 11:
OSError: [WinError 1314] A required privilege is not held by the client: 'C:\Users\nicho\.cache\huggingface\hub\models--speechbrain--spkrec-ecapa-voxceleb\snapshots\5c0be3875fda05e81f3c004ed8c7c06be308de1e\hyperparams.yaml' -> 'C:\Users\nicho\.cache\torch\pyannote\speechbrain\hyperparams.yaml'
I have confirmed that I have access to all three gated models I need, so I'm not sure what's causing this error.
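
For reference, WinError 1314 here comes from speechbrain's fetch() calling destination.symlink_to(sourcepath); on Windows, creating symlinks requires either an elevated process or Developer Mode. Two hedged remedies, assuming the missing privilege is the only problem: enable Developer Mode (Settings > Privacy & security > For developers), or start an elevated PowerShell and rerun the script from there:

Start-Process powershell -Verb RunAs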

[Errno 2] No such file or directory: 'D:/AI/voice-changer/audiosplitter_whisper/data\\output\\Gawr Gura Vocals.srt'

File "D:\AI\voice-changer\audiosplitter_whisper\split_audio.py", line 83, in extract_audio_with_srt
subs = pysrt.open(srt_file)
File "D:\AI\voice-changer\audiosplitter_whisper\split_audio.py", line 167, in process_audio_files
extract_audio_with_srt(audio_file_path, srt_file, speaker_segments_dir)
File "D:\AI\voice-changer\audiosplitter_whisper\split_audio.py", line 173, in
process_audio_files(input_folder)
FileNotFoundError: [Errno 2] No such file or directory: 'D:/AI/voice-changer/audiosplitter_whisper/data\output\Gawr Gura Vocals.srt'

[Errno 2] No such file or directory

I am running into these errors and I'm not sure why:
Exception has occurred: FileNotFoundError
[Errno 2] No such file or directory: 'C:\Users\bobca\OneDrive\Documents\AI\aiVoiceMaker\audiosplitter_whisper\data\output\1.srt'
File "C:\Users\bobca\OneDrive\Documents\AI\aiVoiceMaker\audiosplitter_whisper\split_audio.py", line 96, in extract_audio_with_srt
subs = pysrt.open(srt_file)
^^^^^^^^^^^^^^^^^^^^
File "C:\Users\bobca\OneDrive\Documents\AI\aiVoiceMaker\audiosplitter_whisper\split_audio.py", line 150, in process_audio_files
extract_audio_with_srt(audio_file_path, srt_file, speaker_segments_dir)
File "C:\Users\bobca\OneDrive\Documents\AI\aiVoiceMaker\audiosplitter_whisper\split_audio.py", line 180, in main
process_audio_files(input_folder, settings)
File "C:\Users\bobca\OneDrive\Documents\AI\aiVoiceMaker\audiosplitter_whisper\split_audio.py", line 183, in
main()
FileNotFoundError: [Errno 2] No such file or directory: 'C:\Users\bobca\OneDrive\Documents\AI\aiVoiceMaker\audiosplitter_whisper\data\output\1.srt'

CUDA is available. Running on GPU.
The torchaudio backend is switched to 'soundfile'. Note that 'sox_io' is not supported on Windows.
The torchaudio backend is switched to 'soundfile'. Note that 'sox_io' is not supported on Windows.
Lightning automatically upgraded your loaded checkpoint from v1.5.4 to v2.2.0.post0. To apply the upgrade to your files permanently, run python -m pytorch_lightning.utilities.upgrade_checkpoint C:\Users\bobca\.cache\torch\whisperx-vad-segmentation.bin
Model was trained with pyannote.audio 0.0.1, yours is 3.1.1. Bad things might happen unless you revert pyannote.audio to 0.x.
Model was trained with torch 1.10.0+cu102, yours is 2.0.0+cu118. Bad things might happen unless you revert torch to 1.x.

Performing transcription...
Traceback (most recent call last):
File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
File "C:\Users\bobca\OneDrive\Documents\AI\aiVoiceMaker\audiosplitter_whisper\venv\Scripts\whisperx.exe\__main__.py", line 7, in <module>
File "C:\Users\bobca\OneDrive\Documents\AI\aiVoiceMaker\audiosplitter_whisper\venv\Lib\site-packages\whisperx\transcribe.py", line 176, in cli
result = model.transcribe(audio, batch_size=batch_size, chunk_size=chunk_size, print_progress=print_progress)
File "C:\Users\bobca\OneDrive\Documents\AI\aiVoiceMaker\audiosplitter_whisper\venv\Lib\site-packages\whisperx\asr.py", line 218, in transcribe
for idx, out in enumerate(self.__call__(data(audio, vad_segments), batch_size=batch_size, num_workers=num_workers)):
File "C:\Users\bobca\OneDrive\Documents\AI\aiVoiceMaker\audiosplitter_whisper\venv\Lib\site-packages\transformers\pipelines\pt_utils.py", line 124, in __next__
item = next(self.iterator)
File "C:\Users\bobca\OneDrive\Documents\AI\aiVoiceMaker\audiosplitter_whisper\venv\Lib\site-packages\transformers\pipelines\pt_utils.py", line 125, in __next__
processed = self.infer(item, **self.params)
File "C:\Users\bobca\OneDrive\Documents\AI\aiVoiceMaker\audiosplitter_whisper\venv\Lib\site-packages\transformers\pipelines\base.py", line 1102, in forward
model_outputs = self._forward(model_inputs, **forward_params)
File "C:\Users\bobca\OneDrive\Documents\AI\aiVoiceMaker\audiosplitter_whisper\venv\Lib\site-packages\whisperx\asr.py", line 152, in _forward
outputs = self.model.generate_segment_batched(model_inputs['inputs'], self.tokenizer, self.options)
File "C:\Users\bobca\OneDrive\Documents\AI\aiVoiceMaker\audiosplitter_whisper\venv\Lib\site-packages\whisperx\asr.py", line 47, in generate_segment_batched
encoder_output = self.encode(features)
File "C:\Users\bobca\OneDrive\Documents\AI\aiVoiceMaker\audiosplitter_whisper\venv\Lib\site-packages\whisperx\asr.py", line 86, in encode
return self.model.encode(features, to_cpu=to_cpu)
RuntimeError: Library cublas64_12.dll is not found or cannot be loaded
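
This cublas64_12.dll failure looks like the same one addressed in the issue "Exception has occurred: FileNotFoundError (Solution Included)" further down, which copies the CUDA 12 runtime DLLs next to torch.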

Mac Support

Initially this didn't work on Mac, but I found a fix.

On line 174 of split_audio.py, change

`return filedialog.askdirectory(title="Select input folder").replace("/","\\")`

to 

`return filedialog.askdirectory(title="Select input folder")`

I hope this helps any fellow Mac users!
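
A variant of the same fix that keeps the Windows behaviour, assuming the backslash replacement was only ever wanted there, is to branch on the platform (the helper name below is hypothetical; in the repo this logic is the line around 174):

import os
from tkinter import filedialog

def choose_input_folder():
    # Only rewrite separators on Windows; macOS and Linux use "/" natively.
    path = filedialog.askdirectory(title="Select input folder")
    return path.replace("/", "\\") if os.name == "nt" else path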

Exception has occurred: SystemExit

When i try to run and debug the setup-cuda.py script inside Visual Studio I get the following error:

"Exception has occurred: SystemExit
1
File "E:\train\audiosplitter_whisper\setup-cuda.py", line 9, in create_virtual_environment
venv.create('venv', with_pip=True)
PermissionError: [Errno 13] Permission denied: 'E:\train\audiosplitter_whisper\venv\Scripts\python.exe'

During handling of the above exception, another exception occurred:

File "E:\train\audiosplitter_whisper\setup-cuda.py", line 12, in create_virtual_environment
sys.exit(1)
File "E:\train\audiosplitter_whisper\setup-cuda.py", line 30, in main
create_virtual_environment()
File "E:\train\audiosplitter_whisper\setup-cuda.py", line 34, in
main()
SystemExit: 1"

Any idea how to fix this? Sorry if it's an obvious problem; I'm pretty new to all of this and I am following the video guide.
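
The underlying PermissionError says venv.create() could not overwrite venv\Scripts\python.exe, which typically happens when a venv from an earlier run still exists and its python.exe is in use or locked. A possible remedy, assuming that is the case here, is to delete the stale venv and rerun the setup from a plain terminal rather than the debugger:

rmdir /s /q E:\train\audiosplitter_whisper\venv
python setup-cuda.py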

Inaccurate Splitting with whisper

Hey, I wanted to ask if anybody else is facing this issue.
I am using parts of this repo to split up a long recording into utterances and speakers with diarization.

However, the splitting is not word-accurate. Most of the split parts are cut off at the end, which is quite bad for TTS datasets.
Increasing the padding didn't solve the cut-word issue; it just moved the cut to a later position in the sentence.

Any ideas?
Thanks a lot!

Exception has occurred: FileNotFoundError (Solution Included)

I used the installation method shown in the YouTube tutorial, line by line.

Running split_audio.py threw this error

Exception has occurred: FileNotFoundError
[Errno 2] No such file or directory: 'D:\ai\programs\audiosplitter_whisper\data\output\1.srt'
File "D:\ai\programs\audiosplitter_whisper\split_audio.py", line 96, in extract_audio_with_srt
subs = pysrt.open(srt_file)
File "D:\ai\programs\audiosplitter_whisper\split_audio.py", line 150, in process_audio_files
extract_audio_with_srt(audio_file_path, srt_file, speaker_segments_dir)
File "D:\ai\programs\audiosplitter_whisper\split_audio.py", line 180, in main
process_audio_files(input_folder, settings)
File "D:\ai\programs\audiosplitter_whisper\split_audio.py", line 183, in
main()
FileNotFoundError: [Errno 2] No such file or directory: 'D:\ai\programs\audiosplitter_whisper\data\output\1.srt'

Additionally, the terminal was saying something about not having or not finding cublas64_12 (I can't remember exactly what it said).

The error is thrown because the program can't find the SRT file: the SRT file never gets created, and that is caused by a mismatch of CUDA versions. Torch (or something) comes with CUDA 11, but the script (or whatever) needs the CUDA 12 runtime. I'm not a programmer and I don't know exactly what is what; all I know is that I fixed it.

To fix this, do the following.

  1. Download and install CUDA 12 https://developer.nvidia.com/cuda-12-0-0-download-archive
  2. Navigate to "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.0\bin"
  3. Copy cublas64_12.dll, cublasLt64_12.dll, cudart64_12.dll
  4. Navigate to "...\audiosplitter_whisper\venv\Lib\site-packages\torch\lib"
  5. Paste the DLLs into this folder

Now when you run split_audio.py, it will be able to create the SRT file, fixing the issue of not being able to find said file.
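
An alternative to copying the DLLs, assuming CUDA 12 is installed in its default location, is to put its bin directory on PATH for the session, since Windows also searches PATH when loading DLLs:

set PATH=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.0\bin;%PATH%
python split_audio.py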

Make it work with AMD ROCm on Linux

Hi,
How can I make this work with AMD ROCm 5.7 on Linux?
I tried to make a new "setup-ROCm" file, but it didn't work... here is what I tried to change:

import subprocess
import sys
import venv


def create_virtual_environment():
    # Create a virtual environment in the "venv" directory
    try:
        venv.create('venv', with_pip=True)
    except Exception as e:
        print(f"Failed to create virtual environment. Error: {e}")
        sys.exit(1)


def install_requirements():
    # Specify the path to the Python executable in the virtual environment
    if sys.platform == 'win32':
        python_bin = 'venv\\Scripts\\python'
    else:
        python_bin = 'venv/bin/python'
    
    # Use the Python interpreter in the virtual environment to run pip
    try:
        subprocess.run([python_bin, '-m', 'pip', 'install', '-r', 'requirements-AMD.txt'], check=True)
    except subprocess.CalledProcessError as e:
        print(f"Failed to install requirements. Error: {e}")
        sys.exit(1)

def main():
    create_virtual_environment()
    install_requirements()

if __name__ == '__main__':
    main()

requirements-AMD.txt:

git+https://github.com/m-bain/whisperx.git

--pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/rocm5.7
pysrt
pydub
pyyaml
wheel
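
One likely problem with the file above: pip's requirements format expects one requirement per line, with global options such as --pre and an extra index URL on lines of their own, so the combined torch line is not valid requirements syntax. A sketch of a valid layout, assuming the nightly ROCm 5.7 index actually serves matching wheels:

--pre
--extra-index-url https://download.pytorch.org/whl/nightly/rocm5.7
torch
torchvision
torchaudio
git+https://github.com/m-bain/whisperx.git
pysrt
pydub
pyyaml
wheel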
