The script has been working well for transcribing local videos, but when I tried a new video, I got an error.
Here's my process so far:

    source venv/bin/activate
    python3 transcribe.py --local

The output is:
Option: from local files
Transcribing /home/user/Documents/Multilingual-Video-Transcription-using-Whisper/data/videos/video.mp4
Transcribing /home/user/Documents/Multilingual-Video-Transcription-using-Whisper/data/videos/.gitkeep
Traceback (most recent call last):
  File "/home/user/Documents/Multilingual-Video-Transcription-using-Whisper/venv/lib/python3.10/site-packages/whisper/audio.py", line 46, in load_audio
    ffmpeg.input(file, threads=0)
  File "/home/user/Documents/Multilingual-Video-Transcription-using-Whisper/venv/lib/python3.10/site-packages/ffmpeg/_run.py", line 325, in run
    raise Error('ffmpeg', out, err)
ffmpeg._run.Error: ffmpeg error (see stderr output for detail)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/user/Documents/Multilingual-Video-Transcription-using-Whisper/transcribe.py", line 93, in <module>
    transcript = transcribe(model, video_path, args.save)
  File "/home/user/Documents/Multilingual-Video-Transcription-using-Whisper/transcribe.py", line 40, in transcribe
    result = model.transcribe(video_path)
  File "/home/user/Documents/Multilingual-Video-Transcription-using-Whisper/venv/lib/python3.10/site-packages/whisper/transcribe.py", line 121, in transcribe
    mel = log_mel_spectrogram(audio, padding=N_SAMPLES)
  File "/home/user/Documents/Multilingual-Video-Transcription-using-Whisper/venv/lib/python3.10/site-packages/whisper/audio.py", line 130, in log_mel_spectrogram
    audio = load_audio(audio)
  File "/home/user/Documents/Multilingual-Video-Transcription-using-Whisper/venv/lib/python3.10/site-packages/whisper/audio.py", line 51, in load_audio
    raise RuntimeError(f"Failed to load audio: {e.stderr.decode()}") from e
RuntimeError: Failed to load audio: ffmpeg version 4.4.2-0ubuntu0.22.04.1 Copyright (c) 2000-2021 the FFmpeg developers
built with gcc 11 (Ubuntu 11.2.0-19ubuntu1)
configuration: --prefix=/usr --extra-version=0ubuntu0.22.04.1 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --arch=amd64 --enable-gpl --disable-stripping --enable-gnutls --enable-ladspa --enable-libaom --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libcodec2 --enable-libdav1d --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm --enable-libjack --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librabbitmq --enable-librubberband --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libsrt --enable-libssh --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx265 --enable-libxml2 --enable-libxvid --enable-libzimg --enable-libzmq --enable-libzvbi --enable-lv2 --enable-omx --enable-openal --enable-opencl --enable-opengl --enable-sdl2 --enable-pocketsphinx --enable-librsvg --enable-libmfx --enable-libdc1394 --enable-libdrm --enable-libiec61883 --enable-chromaprint --enable-frei0r --enable-libx264 --enable-shared
WARNING: library configuration mismatch
avcodec configuration: --prefix=/usr --extra-version=0ubuntu0.22.04.1 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --arch=amd64 --enable-gpl --disable-stripping --enable-gnutls --enable-ladspa --enable-libaom --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libcodec2 --enable-libdav1d --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm --enable-libjack --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librabbitmq --enable-librubberband --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libsrt --enable-libssh --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx265 --enable-libxml2 --enable-libxvid --enable-libzimg --enable-libzmq --enable-libzvbi --enable-lv2 --enable-omx --enable-openal --enable-opencl --enable-opengl --enable-sdl2 --enable-pocketsphinx --enable-librsvg --enable-libmfx --enable-libdc1394 --enable-libdrm --enable-libiec61883 --enable-chromaprint --enable-frei0r --enable-libx264 --enable-shared --enable-version3 --disable-doc --disable-programs --enable-libaribb24 --enable-libopencore_amrnb --enable-libopencore_amrwb --enable-libtesseract --enable-libvo_amrwbenc --enable-libsmbclient
libavutil 56. 70.100 / 56. 70.100
libavcodec 58.134.100 / 58.134.100
libavformat 58. 76.100 / 58. 76.100
libavdevice 58. 13.100 / 58. 13.100
libavfilter 7.110.100 / 7.110.100
libswscale 5. 9.100 / 5. 9.100
libswresample 3. 9.100 / 3. 9.100
libpostproc 55. 9.100 / 55. 9.100
/home/user/Documents/Multilingual-Video-Transcription-using-Whisper/data/videos/.gitkeep: Invalid data found when processing input
I've tried several things, including using ffmpeg to re-encode the video to H.264 (still as an .mp4 file), but I get the same error as above.
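Looking at the log again, the last ffmpeg line complains about `data/videos/.gitkeep` ("Invalid data found when processing input"), not about `video.mp4`. So the script seems to pass every file in that folder to Whisper, including the empty `.gitkeep` placeholder. I haven't checked how transcribe.py builds its file list, so this is only a rough sketch of a filter that could be tried; `looks_like_video`, `list_local_videos`, and the extension set are my own hypothetical names, not part of the repo:

```python
from pathlib import Path

# Hypothetical allow-list of extensions to hand to Whisper; extend it for your own files.
VIDEO_EXTENSIONS = {".mp4", ".mkv", ".mov", ".avi", ".webm"}


def looks_like_video(path: Path) -> bool:
    """True if the file name is not hidden (e.g. .gitkeep) and has a known video extension."""
    # Path(".gitkeep").suffix is "" because pathlib treats a leading dot as part of the name,
    # so the extension check alone already rejects it; the dot check makes the intent explicit.
    return not path.name.startswith(".") and path.suffix.lower() in VIDEO_EXTENSIONS


def list_local_videos(folder: str) -> list[Path]:
    """Return the video files directly inside `folder`, sorted by name, skipping everything else."""
    return sorted(
        p for p in Path(folder).iterdir() if p.is_file() and looks_like_video(p)
    )
```

If the loop near line 93 of transcribe.py iterates over the raw directory listing, swapping in something like `list_local_videos("data/videos")` should stop `.gitkeep` from reaching ffmpeg. Re-encoding the video would not help in that case, which would match what I saw.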
If you have any idea how to sort this out, that would be awesome!
PS: Tested on Linux Mint 21.1 Cinnamon, Linux kernel 6.1.0-1025-oem, an NVIDIA TU106 GeForce RTX 2060 graphics card, and a 12th Gen Intel© Core™ i7-12700F × 12 CPU.