
pywhispercpp's Introduction

pywhispercpp

Python bindings for whisper.cpp with a simple Pythonic API on top of it.


whisper.cpp is:

High-performance inference of OpenAI's Whisper automatic speech recognition (ASR) model:

  • Plain C/C++ implementation without dependencies
  • Apple silicon first-class citizen - optimized via Arm Neon and Accelerate framework
  • AVX intrinsics support for x86 architectures
  • VSX intrinsics support for POWER architectures
  • Mixed F16 / F32 precision
  • Low memory usage (Flash Attention)
  • Zero memory allocations at runtime
  • Runs on the CPU
  • C-style API


Installation

First, install ffmpeg:

# on Ubuntu or Debian
sudo apt update && sudo apt install ffmpeg

# on Arch Linux
sudo pacman -S ffmpeg

# on MacOS using Homebrew (https://brew.sh/)
brew install ffmpeg

# on Windows using Chocolatey (https://chocolatey.org/)
choco install ffmpeg

# on Windows using Scoop (https://scoop.sh/)
scoop install ffmpeg

PyPI

Once ffmpeg is installed, install pywhispercpp:

pip install pywhispercpp

If you want to use the examples, you will need to install extra dependencies:

pip install pywhispercpp[examples]

From source

You can install the latest dev version from GitHub:

pip install git+https://github.com/abdeladim-s/pywhispercpp

CoreML support

Thanks to @tangm, CoreML is now supported.

To build and install, clone the repository and run the following commands:

export CMAKE_ARGS="-DWHISPER_COREML=1"
python -m build --wheel  # run in this repository; assumes `build` is installed (pip install build)
pip install dist/<generated>.whl

Then download and convert the appropriate model using the original whisper.cpp repository, producing a <model>.mlmodelc directory.

You can now verify that everything's working:

from pywhispercpp.model import Model

model = Model('<model_path>/ggml-base.en.bin', n_threads=6)
print(Model.system_info())  # and you should see COREML = 1

If successful, you should also see the following on your terminal:

whisper_init_state: loading Core ML model from '<model_path>/ggml-base.en-encoder.mlmodelc'
whisper_init_state: first run on a device may take a while ...
whisper_init_state: Core ML model loaded

Quick start

from pywhispercpp.model import Model

model = Model('base.en', n_threads=6)
segments = model.transcribe('file.mp3', speed_up=True)
for segment in segments:
    print(segment.text)

You can also assign a custom new_segment_callback:

from pywhispercpp.model import Model

model = Model('base.en', print_realtime=False, print_progress=False)
segments = model.transcribe('file.mp3', new_segment_callback=print)

  • The ggml model will be downloaded automatically.
  • You can pass any whisper.cpp parameter as a keyword argument to the Model class or to the transcribe function (see the sketch after this list).
  • The transcribe function accepts any media file (audio/video), in any format.
  • Check the Model class documentation for more details.
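
For instance, here is a minimal sketch of forwarding whisper.cpp parameters as keyword arguments; the language and translate names mirror the CLI options documented below, but the exact set of accepted parameters depends on the bundled whisper.cpp version:

from pywhispercpp.model import Model

# Sketch: whisper.cpp parameters are forwarded as keyword arguments;
# `language` and `translate` are assumed to match the CLI options below.
model = Model('base', n_threads=4)
segments = model.transcribe('file.mp3',
                            language='es',   # source language
                            translate=True)  # translate the result to English
for segment in segments:
    print(segment.text)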

Examples

The examples folder contains several examples inspired by the original whisper.cpp/examples.

Main

Just a straightforward example with a simple Command Line Interface.

Check the source code here, or use the CLI as follows:

pwcpp file.wav -m base --output-srt --print_realtime true

Run pwcpp --help to get the help message:

usage: pwcpp [-h] [-m MODEL] [--version] [--processors PROCESSORS] [-otxt] [-ovtt] [-osrt] [-ocsv] [--strategy STRATEGY]
             [--n_threads N_THREADS] [--n_max_text_ctx N_MAX_TEXT_CTX] [--offset_ms OFFSET_MS] [--duration_ms DURATION_MS]
             [--translate TRANSLATE] [--no_context NO_CONTEXT] [--single_segment SINGLE_SEGMENT] [--print_special PRINT_SPECIAL]
             [--print_progress PRINT_PROGRESS] [--print_realtime PRINT_REALTIME] [--print_timestamps PRINT_TIMESTAMPS]
             [--token_timestamps TOKEN_TIMESTAMPS] [--thold_pt THOLD_PT] [--thold_ptsum THOLD_PTSUM] [--max_len MAX_LEN]
             [--split_on_word SPLIT_ON_WORD] [--max_tokens MAX_TOKENS] [--speed_up SPEED_UP] [--audio_ctx AUDIO_CTX]
             [--prompt_tokens PROMPT_TOKENS] [--prompt_n_tokens PROMPT_N_TOKENS] [--language LANGUAGE] [--suppress_blank SUPPRESS_BLANK]
             [--suppress_non_speech_tokens SUPPRESS_NON_SPEECH_TOKENS] [--temperature TEMPERATURE] [--max_initial_ts MAX_INITIAL_TS]
             [--length_penalty LENGTH_PENALTY] [--temperature_inc TEMPERATURE_INC] [--entropy_thold ENTROPY_THOLD]
             [--logprob_thold LOGPROB_THOLD] [--no_speech_thold NO_SPEECH_THOLD] [--greedy GREEDY] [--beam_search BEAM_SEARCH]
             media_file [media_file ...]

positional arguments:
  media_file            The path of the media file or a list of files separated by space

options:
  -h, --help            show this help message and exit
  -m MODEL, --model MODEL
                        Path to the `ggml` model, or just the model name
  --version             show program's version number and exit
  --processors PROCESSORS
                        number of processors to use during computation
  -otxt, --output-txt   output result in a text file
  -ovtt, --output-vtt   output result in a vtt file
  -osrt, --output-srt   output result in a srt file
  -ocsv, --output-csv   output result in a CSV file
  --strategy STRATEGY   Available sampling strategies: GreedyDecoder -> 0, BeamSearchDecoder -> 1
  --n_threads N_THREADS
                        Number of threads to allocate for the inference, defaults to min(4, available hardware_concurrency)
  --n_max_text_ctx N_MAX_TEXT_CTX
                        max tokens to use from past text as prompt for the decoder
  --offset_ms OFFSET_MS
                        start offset in ms
  --duration_ms DURATION_MS
                        audio duration to process in ms
  --translate TRANSLATE
                        whether to translate the audio to English
  --no_context NO_CONTEXT
                        do not use past transcription (if any) as initial prompt for the decoder
  --single_segment SINGLE_SEGMENT
                        force single segment output (useful for streaming)
  --print_special PRINT_SPECIAL
                        print special tokens (e.g. <SOT>, <EOT>, <BEG>, etc.)
  --print_progress PRINT_PROGRESS
                        print progress information
  --print_realtime PRINT_REALTIME
                        print results from within whisper.cpp (avoid it, use callback instead)
  --print_timestamps PRINT_TIMESTAMPS
                        print timestamps for each text segment when printing realtime
  --token_timestamps TOKEN_TIMESTAMPS
                        enable token-level timestamps
  --thold_pt THOLD_PT   timestamp token probability threshold (~0.01)
  --thold_ptsum THOLD_PTSUM
                        timestamp token sum probability threshold (~0.01)
  --max_len MAX_LEN     max segment length in characters
  --split_on_word SPLIT_ON_WORD
                        split on word rather than on token (when used with max_len)
  --max_tokens MAX_TOKENS
                        max tokens per segment (0 = no limit)
  --speed_up SPEED_UP   speed-up the audio by 2x using Phase Vocoder
  --audio_ctx AUDIO_CTX
                        overwrite the audio context size (0 = use default)
  --prompt_tokens PROMPT_TOKENS
                        tokens to provide to the whisper decoder as initial prompt
  --prompt_n_tokens PROMPT_N_TOKENS
                        number of tokens to provide to the whisper decoder as initial prompt
  --language LANGUAGE   for auto-detection, set to None, "" or "auto"
  --suppress_blank SUPPRESS_BLANK
                        common decoding parameters
  --suppress_non_speech_tokens SUPPRESS_NON_SPEECH_TOKENS
                        common decoding parameters
  --temperature TEMPERATURE
                        initial decoding temperature
  --max_initial_ts MAX_INITIAL_TS
                        max_initial_ts
  --length_penalty LENGTH_PENALTY
                        length_penalty
  --temperature_inc TEMPERATURE_INC
                        temperature_inc
  --entropy_thold ENTROPY_THOLD
                        similar to OpenAI's "compression_ratio_threshold"
  --logprob_thold LOGPROB_THOLD
                        logprob_thold
  --no_speech_thold NO_SPEECH_THOLD
                        no_speech_thold
  --greedy GREEDY       greedy
  --beam_search BEAM_SEARCH
                        beam_search

Assistant

This is a simple example showcasing the use of pywhispercpp as an assistant. The idea is to use a VAD to detect speech (in this example we used webrtcvad), and when some speech is detected, we run the transcription.
It is inspired by the whisper.cpp/examples/command example.

You can check the source code here or you can use the class directly to create your own assistant:

from pywhispercpp.examples.assistant import Assistant

my_assistant = Assistant(commands_callback=print, n_threads=8)
my_assistant.start()

Here we set the commands_callback to a simple print, so the commands will just get printed on the screen.
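
For example, here is a callback that appends each command to a file instead of printing it (a minimal sketch; only Assistant and commands_callback come from the example above, the file name is arbitrary):

from pywhispercpp.examples.assistant import Assistant

def save_command(text):
    # append each recognized command to a log file
    with open('commands.log', 'a') as f:
        f.write(text + '\n')

my_assistant = Assistant(commands_callback=save_command, n_threads=8)
my_assistant.start()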

You can run this example from the command line as well:

$ pwcpp-assistant --help

usage: pwcpp-assistant [-h] [-m MODEL] [-ind INPUT_DEVICE] [-st SILENCE_THRESHOLD] [-bd BLOCK_DURATION]

options:
  -h, --help            show this help message and exit
  -m MODEL, --model MODEL
                        Whisper.cpp model, default to tiny.en
  -ind INPUT_DEVICE, --input_device INPUT_DEVICE
                        ID of the input device (aka microphone)
  -st SILENCE_THRESHOLD, --silence_threshold SILENCE_THRESHOLD
                        The duration of silence after which the inference will be run, defaults to 16
  -bd BLOCK_DURATION, --block_duration BLOCK_DURATION
                        minimum time between audio updates in ms, defaults to 30

Recording

Another simple example to transcribe your own recordings.

You can use it from Python as follows:

from pywhispercpp.examples.recording import Recording

myrec = Recording(5)
myrec.start()

Or from the command line:

$ pwcpp-recording --help

usage: pwcpp-recording [-h] [-m MODEL] duration

positional arguments:
  duration              duration in seconds

options:
  -h, --help            show this help message and exit
  -m MODEL, --model MODEL
                        Whisper.cpp model, default to tiny.en

Live Stream Transcription

This example is an attempt to transcribe a livestream in real time, but the results are not quite satisfactory yet; the CPU jumps quickly to 100% and I cannot use huge models on my decent machine. (Or maybe I am doing something wrong!) 😅

If you have a powerful machine, give it a try.

From Python:

from pywhispercpp.examples.livestream import LiveStream

url = ""  # Make sure it is a direct stream URL
ls = LiveStream(url=url, n_threads=4)
ls.start()

From the command line:

$ pwcpp-livestream --help

usage: pwcpp-livestream [-h] [-nt N_THREADS] [-m MODEL] [-od OUTPUT_DEVICE] [-bls BLOCK_SIZE] [-bus BUFFER_SIZE] [-ss SAMPLE_SIZE] url

positional arguments:
  url                   Stream URL

options:
  -h, --help            show this help message and exit
  -nt N_THREADS, --n_threads N_THREADS
                        number of threads, default to 3
  -m MODEL, --model MODEL
                        Whisper.cpp model, default to tiny.en
  -od OUTPUT_DEVICE, --output_device OUTPUT_DEVICE
                        the output device, aka the speaker, leave it None to take the default
  -bls BLOCK_SIZE, --block_size BLOCK_SIZE
                        block size, default to 1024
  -bus BUFFER_SIZE, --buffer_size BUFFER_SIZE
                        number of blocks used for buffering, default to 20
  -ss SAMPLE_SIZE, --sample_size SAMPLE_SIZE
                        Sample size, default to 4

Advanced usage

  • First check the API documentation for more advanced usage.
  • If you are a more experienced user, you can access the C-style API directly; almost all functions from whisper.h are exposed through the binding module _pywhispercpp.

import _pywhispercpp as pwcpp

ctx = pwcpp.whisper_init_from_file('path/to/ggml/model')
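
As a rough sketch of a full transcription loop through the C-style API, assuming the whisper.h functions and enums are exposed one-to-one (the exact Python signatures may differ, so verify them against the binding):

import _pywhispercpp as pwcpp
import numpy as np

ctx = pwcpp.whisper_init_from_file('path/to/ggml/model')
# assumption: the sampling strategy enum is exposed under this name
params = pwcpp.whisper_full_default_params(pwcpp.whisper_sampling_strategy.WHISPER_SAMPLING_GREEDY)
audio = np.zeros(16000, dtype=np.float32)  # placeholder: 1 s of 16 kHz silence
pwcpp.whisper_full(ctx, params, audio, len(audio))
for i in range(pwcpp.whisper_full_n_segments(ctx)):
    print(pwcpp.whisper_full_get_segment_text(ctx, i))
pwcpp.whisper_free(ctx)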

Discussions and contributions

If you find a bug, please open an issue.

If you have any feedback, or you want to share how you are using this project, feel free to use the Discussions and open a new topic.

License

This project is licensed under the same license as whisper.cpp (MIT License).

pywhispercpp's People

Contributors

abdeladim-s, dependabot[bot], joseporiolayats, tangm


pywhispercpp's Issues

Unknown language error

When passing in a language parameter, the string doesn't seem to be passed through properly.

Example:

  • run pwcpp ./some_audio.wav --language "es" -m base --print_realtime true

Error output:
whisper_lang_id: unknown language '\�'


"Cannot find source file: ggml.h" when trying to install on Ubuntu 22.04 on aarch64

Would appreciate some help on this.
I'm not sure why some files are missing when trying to build the wheel.

This is an Oracle Cloud Free tier instance.
VM.Standard.A1.Flex (Arm processor from Ampere) - 4 CPU, 24 GB RAM.

ubuntu@server1:~$ pip install pywhispercpp
Defaulting to user installation because normal site-packages is not writeable
Collecting pywhispercpp
  Using cached pywhispercpp-1.0.8.tar.gz (229 kB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Collecting pydub
  Using cached pydub-0.25.1-py2.py3-none-any.whl (32 kB)
Requirement already satisfied: platformdirs in /usr/lib/python3/dist-packages (from pywhispercpp) (2.5.1)
Requirement already satisfied: tqdm in ./.local/lib/python3.10/site-packages (from pywhispercpp) (4.64.1)
Requirement already satisfied: numpy in ./.local/lib/python3.10/site-packages (from pywhispercpp) (1.24.2)
Requirement already satisfied: requests in /usr/lib/python3/dist-packages (from pywhispercpp) (2.25.1)
Building wheels for collected packages: pywhispercpp
  Building wheel for pywhispercpp (pyproject.toml) ... error
  error: subprocess-exited-with-error

  × Building wheel for pywhispercpp (pyproject.toml) did not run successfully.
  │ exit code: 1
  ╰─> [109 lines of output]
      running bdist_wheel
      running build
      running build_py
      creating build
      creating build/lib.linux-aarch64-3.10
      creating build/lib.linux-aarch64-3.10/pywhispercpp
      copying ./pywhispercpp/_logger.py -> build/lib.linux-aarch64-3.10/pywhispercpp
      copying ./pywhispercpp/__init__.py -> build/lib.linux-aarch64-3.10/pywhispercpp
      copying ./pywhispercpp/constants.py -> build/lib.linux-aarch64-3.10/pywhispercpp
      copying ./pywhispercpp/utils.py -> build/lib.linux-aarch64-3.10/pywhispercpp
      copying ./pywhispercpp/model.py -> build/lib.linux-aarch64-3.10/pywhispercpp
      creating build/lib.linux-aarch64-3.10/pywhispercpp/examples
      copying ./pywhispercpp/examples/main.py -> build/lib.linux-aarch64-3.10/pywhispercpp/examples
      copying ./pywhispercpp/examples/assistant.py -> build/lib.linux-aarch64-3.10/pywhispercpp/examples
      copying ./pywhispercpp/examples/__init__.py -> build/lib.linux-aarch64-3.10/pywhispercpp/examples
      copying ./pywhispercpp/examples/recording.py -> build/lib.linux-aarch64-3.10/pywhispercpp/examples
      copying ./pywhispercpp/examples/livestream.py -> build/lib.linux-aarch64-3.10/pywhispercpp/examples
      running build_ext
      -- The C compiler identification is GNU 11.3.0
      -- The CXX compiler identification is GNU 11.3.0
      -- Detecting C compiler ABI info
      -- Detecting C compiler ABI info - done
      -- Check for working C compiler: /usr/bin/cc - skipped
      -- Detecting C compile features
      -- Detecting C compile features - done
      -- Detecting CXX compiler ABI info
      -- Detecting CXX compiler ABI info - done
      -- Check for working CXX compiler: /usr/bin/c++ - skipped
      -- Detecting CXX compile features
      -- Detecting CXX compile features - done
      -- pybind11 v2.9.2
      -- Found PythonInterp: /usr/bin/python3 (found version "3.10.6")
      -- Found PythonLibs: /usr/lib/aarch64-linux-gnu/libpython3.10.so
      -- Performing Test HAS_FLTO
      -- Performing Test HAS_FLTO - Success
      -- Looking for pthread.h
      -- Looking for pthread.h - found
      -- Performing Test CMAKE_HAVE_LIBC_PTHREAD
      -- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
      -- Found Threads: TRUE
      -- CMAKE_SYSTEM_PROCESSOR: aarch64
      -- ARM detected
      -- CMAKE_SYSTEM_PROCESSOR: aarch64
      -- ARM detected
      -- Configuring done
      CMake Error at whisper.cpp/CMakeLists.txt:190 (add_library):
        Cannot find source file:

          ggml.h

        Tried extensions .c .C .c++ .cc .cpp .cxx .cu .mpp .m .M .mm .ixx .cppm .h
        .hh .h++ .hm .hpp .hxx .in .txx .f .F .for .f77 .f90 .f95 .f03 .hip .ispc


      CMake Error at whisper.cpp/CMakeLists.txt:190 (add_library):
        No SOURCES given to target: whisper


      CMake Generate step failed.  Build files cannot be regenerated correctly.
      Traceback (most recent call last):
        File "/usr/lib/python3/dist-packages/pip/_vendor/pep517/in_process/_in_process.py", line 363, in <module>
          main()
        File "/usr/lib/python3/dist-packages/pip/_vendor/pep517/in_process/_in_process.py", line 345, in main
          json_out['return_val'] = hook(**hook_input['kwargs'])
        File "/usr/lib/python3/dist-packages/pip/_vendor/pep517/in_process/_in_process.py", line 261, in build_wheel
          return _build_backend().build_wheel(wheel_directory, config_settings,
        File "/usr/lib/python3/dist-packages/setuptools/build_meta.py", line 230, in build_wheel
          return self._build_with_temp_dir(['bdist_wheel'], '.whl',
        File "/usr/lib/python3/dist-packages/setuptools/build_meta.py", line 215, in _build_with_temp_dir
          self.run_setup()
        File "/usr/lib/python3/dist-packages/setuptools/build_meta.py", line 158, in run_setup
          exec(compile(code, __file__, 'exec'), locals())
        File "setup.py", line 132, in <module>
          setup(
        File "/usr/lib/python3/dist-packages/setuptools/__init__.py", line 153, in setup
          return distutils.core.setup(**attrs)
        File "/usr/lib/python3/dist-packages/setuptools/_distutils/core.py", line 148, in setup
          return run_commands(dist)
        File "/usr/lib/python3/dist-packages/setuptools/_distutils/core.py", line 163, in run_commands
          dist.run_commands()
        File "/usr/lib/python3/dist-packages/setuptools/_distutils/dist.py", line 967, in run_commands
          self.run_command(cmd)
        File "/usr/lib/python3/dist-packages/setuptools/_distutils/dist.py", line 986, in run_command
          cmd_obj.run()
        File "/usr/lib/python3/dist-packages/wheel/bdist_wheel.py", line 299, in run
          self.run_command('build')
        File "/usr/lib/python3/dist-packages/setuptools/_distutils/cmd.py", line 313, in run_command
          self.distribution.run_command(command)
        File "/usr/lib/python3/dist-packages/setuptools/_distutils/dist.py", line 986, in run_command
          cmd_obj.run()
        File "/usr/lib/python3/dist-packages/setuptools/_distutils/command/build.py", line 135, in run
          self.run_command(cmd_name)
        File "/usr/lib/python3/dist-packages/setuptools/_distutils/cmd.py", line 313, in run_command
          self.distribution.run_command(command)
        File "/usr/lib/python3/dist-packages/setuptools/_distutils/dist.py", line 986, in run_command
          cmd_obj.run()
        File "/usr/lib/python3/dist-packages/setuptools/command/build_ext.py", line 79, in run
          _build_ext.run(self)
        File "/usr/lib/python3/dist-packages/setuptools/_distutils/command/build_ext.py", line 339, in run
          self.build_extensions()
        File "/usr/lib/python3/dist-packages/setuptools/_distutils/command/build_ext.py", line 448, in build_extensions
          self._build_extensions_serial()
        File "/usr/lib/python3/dist-packages/setuptools/_distutils/command/build_ext.py", line 473, in _build_extensions_serial
          self.build_extension(ext)
        File "setup.py", line 118, in build_extension
          subprocess.run(
        File "/usr/lib/python3.10/subprocess.py", line 524, in run
          raise CalledProcessError(retcode, process.args,
      subprocess.CalledProcessError: Command '['cmake', '/tmp/pip-install-0xv4591a/pywhispercpp_c5e26fcae91046c186dddac942177d54', '-DCMAKE_LIBRARY_OUTPUT_DIRECTORY=/tmp/pip-install-0xv4591a/pywhispercpp_c5e26fcae91046c186dddac942177d54/build/lib.linux-aarch64-3.10/', '-DPYTHON_EXECUTABLE=/usr/bin/python3', '-DCMAKE_BUILD_TYPE=Release', '-DEXAMPLE_VERSION_INFO=1.0.8', '-GNinja', '-DCMAKE_MAKE_PROGRAM:FILEPATH=/tmp/pip-build-env-mt4f3311/overlay/local/lib/python3.10/dist-packages/ninja/data/bin/ninja']' returned non-zero exit status 1.
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building wheel for pywhispercpp
Failed to build pywhispercpp
ERROR: Could not build wheels for pywhispercpp, which is required to install pyproject.toml-based projects

Unable to load `quantized` models

I am trying to load the quantized whisper tiny model (ggml-tiny-q5_1.bin), using a Jupyter kernel in Visual Studio Code.
But when I try to run

model = Model('./models/quantized/ggml-tiny-q5_1.bin', print_progress=False)

the kernel dies (unable to load the quantized model). The remaining unquantized models work fine, so the issue seems specific to the quantized version.

Could someone please help?

How to add space between subtitles?

Hello. There is no space between two sentences when using this model. In other words, when the speaker finishes a sentence, the subtitle is still shown. I want it to be displayed only while the speaker is speaking, but subtitles always appear. What setting should I change?

word-level timestamps?

Hi - thanks for making this. I was trying to get word-level timestamps, but haven't been able to figure out how. Any tips? Thanks again!
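
One possible approach, based on the token_timestamps and max_len options shown in the CLI help above (an untested sketch; that Segment exposes t0/t1 start/end times is an assumption):

from pywhispercpp.model import Model

model = Model('base.en')
# token_timestamps/max_len mirror the CLI flags above; max_len=1 should
# split the output into (roughly) one word per segment
segments = model.transcribe('file.mp3', token_timestamps=True, max_len=1)
for seg in segments:
    print(seg.t0, seg.t1, seg.text)  # t0/t1 assumed to be segment start/end times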

Using the agent for interacting with ollama models

Thank you for simplifying programmatic access to whisper.cpp; I really appreciate your kind gift to the community. Please forgive my question, but I can't seem to figure out how to call ollama from your agent module. I assume I need to modify the callback parameter and use the ollama function from langchain's LLM module, but I can't find any example code. Will you be publishing example code? The documentation seems to suggest you will. I would very much appreciate some direction. Thank you again for sharing your wonderful code.

Tool is super slow / runs forever

I'm trying to transcribe a 45s mp3 with the audio of a YouTube Short.
I'm doing it like this:

from pywhispercpp.model import Model
model = Model('base.en', print_realtime=False, print_progress=True, n_threads=6)
segments = model.transcribe(short_audio_file, speed_up=True, new_segment_callback=print)

It runs forever and never finishes; this is all the output I get. It just keeps running, seemingly doing nothing, with the CPU at 100%:

[2024-01-09 23:28:50,941] {utils.py:38} INFO - No download directory was provided, models will be downloaded to /home/marius/.local/share/pywhispercpp/models
[2024-01-09 23:28:50,943] {utils.py:46} INFO - Model base.en already exists in /home/marius/.local/share/pywhispercpp/models
[2024-01-09 23:28:50,944] {model.py:221} INFO - Initializing the model ...
whisper_init_from_file_no_state: loading model from '/home/marius/.local/share/pywhispercpp/models/ggml-base.en.bin'
whisper_model_load: loading model
whisper_model_load: n_vocab       = 51864
whisper_model_load: n_audio_ctx   = 1500
whisper_model_load: n_audio_state = 512
whisper_model_load: n_audio_head  = 8
whisper_model_load: n_audio_layer = 6
whisper_model_load: n_text_ctx    = 448
whisper_model_load: n_text_state  = 512
whisper_model_load: n_text_head   = 8
whisper_model_load: n_text_layer  = 6
whisper_model_load: n_mels        = 80
whisper_model_load: ftype         = 1
whisper_model_load: type          = 2
whisper_model_load: mem required  =  310.00 MB (+    6.00 MB per decoder)
whisper_model_load: adding 1607 extra tokens
whisper_model_load: model ctx     =  140.60 MB
whisper_model_load: model size    =  140.54 MB
whisper_init_state: kv self size  =    5.25 MB
whisper_init_state: kv cross size =   17.58 MB
[2024-01-09 23:28:52,186] {model.py:130} INFO - Transcribing ...

Any ideas what could be wrong or how to improve the speed? Thanks for any help. I appreciate it. This is the most promising of the python bindings for whisper.cpp as the others don't even build anymore...

Unable to install on raspberry pi 4

Hello,

The original code by ggerganov works with the raspberry pi as well, I was hoping a python wrapper could also work with it.

Currently when I run pip install pywhispercpp I get a build error:

exit status 1.

ERROR: FAILED building wheel for pywhispercpp
Failed to build pywhispercpp
ERROR: Could not build wheels for pywhispercpp, which is required to install pyproject.toml-based projects

Integrating pywhispercpp as the first extension to lollms-webui

Hi Abdeladim. I am finally starting to write extensions for lollms, and I was thinking the first extension should be audio in and audio out. But I need to comply with my rule number 1: everything should be done locally. No data is sent anywhere outside of your PC.

To do this, I think whisper is really cool. Even cooler is whisper.cpp, but since I use Python, I need pywhispercpp :)

Do you have an example of your code that uses direct input stream from microphone? That would simplify the integration greatly.

How to make transcription and speaker diarization using pywhispercpp

Hello,

I am interested in using pywhispercpp for speech recognition and speaker diarization.

I have installed the library and followed the instructions in the README file, but I am not sure how to use it for my use case.

Could you please provide some guidance or examples on how to do transcription and speaker diarization using pywhispercpp?

Note: I'm using google colab.

Thank you.

ERROR - unable to initialize from path

Hello.

It cannot be initialized from any path as shown below.

from pathlib import Path
from pywhispercpp.model import Model
model = Model(Path('/home/user/.local/share/pywhispercpp/models/ggml-large.bin'))

The following error will be output.

:
Invoked with: PosixPath('/home/user/.local/share/pywhispercpp/models/ggml-large.bin')
Segmentation fault (core dumped)

I think it can be initialized by converting the Path object to str.
https://github.com/abdeladim-s/pywhispercpp/blob/main/pywhispercpp/model.py#L83
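
In the meantime, converting the Path to str before passing it should work as a workaround (a sketch based on the suggestion above):

from pathlib import Path
from pywhispercpp.model import Model

model_path = Path('/home/user/.local/share/pywhispercpp/models/ggml-large.bin')
model = Model(str(model_path))  # pass a plain string, not a PosixPath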

_pywhispercpp module could not be found

Just did a standard PyPI install in my venv as per:

pip install pywhispercpp

A standard script with:

import pywhispercpp.model as m

modelPath: str = ...
filePath: str = ...
outputPath: str = ...

model = m.Model(modelPath, n_threads=6)
segments = model.transcribe(filePath, token_timestamps=True, max_len=1)

with open(outputPath, 'w') as file:
    for segment in segments:
        file.write(segment.text + '\n')

It fails with the following error:

Traceback (most recent call last):
  File "...\whisper_file.py", line 1, in <module>
    import pywhispercpp.model as m
  File "...\model.py", line 13, in <module>
    import _pywhispercpp as pw
ImportError: DLL load failed while importing _pywhispercpp: The specified module could not be found.

For reference, FFmpeg is installed:

╰─ ffmpeg -version                                                                                                   ─╯
ffmpeg version 4.4-essentials_build-www.gyan.dev Copyright (c) 2000-2021 the FFmpeg developers
built with gcc 10.2.0 (Rev6, Built by MSYS2 project)
configuration: --enable-gpl --enable-version3 --enable-static --disable-w32threads --disable-autodetect --enable-fontconfig --enable-iconv --enable-gnutls --enable-libxml2 --enable-gmp --enable-lzma --enable-zlib --enable-libsrt --enable-libssh --enable-libzmq --enable-avisynth --enable-sdl2 --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxvid --enable-libaom --enable-libopenjpeg --enable-libvpx --enable-libass --enable-libfreetype --enable-libfribidi --enable-libvidstab --enable-libvmaf --enable-libzimg --enable-amf --enable-cuda-llvm --enable-cuvid --enable-ffnvcodec --enable-nvdec --enable-nvenc --enable-d3d11va --enable-dxva2 --enable-libmfx --enable-libgme --enable-libopenmpt --enable-libopencore-amrwb --enable-libmp3lame --enable-libtheora --enable-libvo-amrwbenc --enable-libgsm --enable-libopencore-amrnb --enable-libopus --enable-libspeex --enable-libvorbis --enable-librubberband
libavutil      56. 70.100 / 56. 70.100
libavcodec     58.134.100 / 58.134.100
libavformat    58. 76.100 / 58. 76.100
libavdevice    58. 13.100 / 58. 13.100
libavfilter     7.110.100 /  7.110.100
libswscale      5.  9.100 /  5.  9.100
libswresample   3.  9.100 /  3.  9.100
libpostproc    55.  9.100 / 55.  9.100

pywhispercpp/whisper.cpp/ggml-opencl.c:4:10: fatal error: 'clblast_c.h' file not found #include <clblast_c.h>

  1. On my Mac M2,
  2. I cloned the code and downloaded the whisper.cpp repository,
  3. and ran the command: cd pywhispercpp && python setup.py install
Building C object CMakeFiles/_pywhispercpp.dir/whisper.cpp/ggml-opencl.c.o
/Users/diaojunxian/Documents/github/pywhispercpp/whisper.cpp/ggml-opencl.c:4:10: fatal error: 'clblast_c.h' file not found
#include <clblast_c.h>
         ^~~~~~~~~~~~~
1 error generated.
make[2]: *** [CMakeFiles/_pywhispercpp.dir/whisper.cpp/ggml-opencl.c.o] Error 1
make[1]: *** [CMakeFiles/_pywhispercpp.dir/all] Error 2
make: *** [all] Error 2
Traceback (most recent call last):
  File "/Users/diaojunxian/Documents/github/pywhispercpp/setup.py", line 132, in <module>
    setup(
  File "/Users/diaojunxian/anaconda3/envs/3.9/lib/python3.9/site-packages/setuptools/__init__.py", line 87, in setup
    return distutils.core.setup(**attrs)
  File "/Users/diaojunxian/anaconda3/envs/3.9/lib/python3.9/site-packages/setuptools/_distutils/core.py", line 185, in setup
    return run_commands(dist)
  File "/Users/diaojunxian/anaconda3/envs/3.9/lib/python3.9/site-packages/setuptools/_distutils/core.py", line 201, in run_commands
    dist.run_commands()
  File "/Users/diaojunxian/anaconda3/envs/3.9/lib/python3.9/site-packages/setuptools/_distutils/dist.py", line 969, in run_commands
    self.run_command(cmd)
  File "/Users/diaojunxian/anaconda3/envs/3.9/lib/python3.9/site-packages/setuptools/dist.py", line 1208, in run_command
    super().run_command(command)
  File "/Users/diaojunxian/anaconda3/envs/3.9/lib/python3.9/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
    cmd_obj.run()
  File "/Users/diaojunxian/anaconda3/envs/3.9/lib/python3.9/site-packages/setuptools/command/install.py", line 74, in run
    self.do_egg_install()
  File "/Users/diaojunxian/anaconda3/envs/3.9/lib/python3.9/site-packages/setuptools/command/install.py", line 123, in do_egg_install
    self.run_command('bdist_egg')
  File "/Users/diaojunxian/anaconda3/envs/3.9/lib/python3.9/site-packages/setuptools/_distutils/cmd.py", line 318, in run_command
    self.distribution.run_command(command)
  File "/Users/diaojunxian/anaconda3/envs/3.9/lib/python3.9/site-packages/setuptools/dist.py", line 1208, in run_command
    super().run_command(command)
  File "/Users/diaojunxian/anaconda3/envs/3.9/lib/python3.9/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
    cmd_obj.run()
  File "/Users/diaojunxian/anaconda3/envs/3.9/lib/python3.9/site-packages/setuptools/command/bdist_egg.py", line 165, in run
    cmd = self.call_command('install_lib', warn_dir=0)
  File "/Users/diaojunxian/anaconda3/envs/3.9/lib/python3.9/site-packages/setuptools/command/bdist_egg.py", line 151, in call_command
    self.run_command(cmdname)
  File "/Users/diaojunxian/anaconda3/envs/3.9/lib/python3.9/site-packages/setuptools/_distutils/cmd.py", line 318, in run_command
    self.distribution.run_command(command)
  File "/Users/diaojunxian/anaconda3/envs/3.9/lib/python3.9/site-packages/setuptools/dist.py", line 1208, in run_command
    super().run_command(command)
  File "/Users/diaojunxian/anaconda3/envs/3.9/lib/python3.9/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
    cmd_obj.run()
  File "/Users/diaojunxian/anaconda3/envs/3.9/lib/python3.9/site-packages/setuptools/command/install_lib.py", line 11, in run
    self.build()
  File "/Users/diaojunxian/anaconda3/envs/3.9/lib/python3.9/site-packages/setuptools/_distutils/command/install_lib.py", line 112, in build
    self.run_command('build_ext')
  File "/Users/diaojunxian/anaconda3/envs/3.9/lib/python3.9/site-packages/setuptools/_distutils/cmd.py", line 318, in run_command
    self.distribution.run_command(command)
  File "/Users/diaojunxian/anaconda3/envs/3.9/lib/python3.9/site-packages/setuptools/dist.py", line 1208, in run_command
    super().run_command(command)
  File "/Users/diaojunxian/anaconda3/envs/3.9/lib/python3.9/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
    cmd_obj.run()
  File "/Users/diaojunxian/anaconda3/envs/3.9/lib/python3.9/site-packages/setuptools/command/build_ext.py", line 84, in run
    _build_ext.run(self)
  File "/Users/diaojunxian/anaconda3/envs/3.9/lib/python3.9/site-packages/setuptools/_distutils/command/build_ext.py", line 346, in run
    self.build_extensions()
  File "/Users/diaojunxian/anaconda3/envs/3.9/lib/python3.9/site-packages/setuptools/_distutils/command/build_ext.py", line 468, in build_extensions
    self._build_extensions_serial()
  File "/Users/diaojunxian/anaconda3/envs/3.9/lib/python3.9/site-packages/setuptools/_distutils/command/build_ext.py", line 494, in _build_extensions_serial
    self.build_extension(ext)
  File "/Users/diaojunxian/Documents/github/pywhispercpp/setup.py", line 121, in build_extension
    subprocess.run(
  File "/Users/diaojunxian/anaconda3/envs/3.9/lib/python3.9/subprocess.py", line 528, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['cmake', '--build', '.']' returned non-zero exit status 2.
But when I run the command cd pywhispercpp/whisper.cpp && make clean && make, it succeeds:
 make clean && make
[  8%] Building C object CMakeFiles/whisper.dir/ggml.c.o
[ 16%] Building CXX object CMakeFiles/whisper.dir/whisper.cpp.o
[ 25%] Linking CXX shared library libwhisper.dylib
[ 25%] Built target whisper
[ 33%] Building CXX object examples/CMakeFiles/common.dir/common.cpp.o
[ 41%] Building CXX object examples/CMakeFiles/common.dir/common-ggml.cpp.o
[ 50%] Linking CXX static library libcommon.a
[ 50%] Built target common
[ 58%] Building CXX object examples/main/CMakeFiles/main.dir/main.cpp.o
[ 66%] Linking CXX executable ../../bin/main
[ 66%] Built target main
[ 75%] Building CXX object examples/bench/CMakeFiles/bench.dir/bench.cpp.o
[ 83%] Linking CXX executable ../../bin/bench
[ 83%] Built target bench
[ 91%] Building CXX object examples/quantize/CMakeFiles/quantize.dir/quantize.cpp.o
/Users/diaojunxian/Documents/github/pywhispercpp/whisper.cpp/examples/quantize/quantize.cpp:112:29: warning: cast from 'const int *' to 'char *' drops const qualifier [-Wcast-qual]
        fout.write((char *) &ftype_dst,             sizeof(hparams.ftype));
                            ^
/Users/diaojunxian/Documents/github/pywhispercpp/whisper.cpp/examples/quantize/quantize.cpp:148:33: warning: cast from 'const char *' to 'char *' drops const qualifier [-Wcast-qual]
            finp.read ((char *) word.data(), len);
                                ^
/Users/diaojunxian/Documents/github/pywhispercpp/whisper.cpp/examples/quantize/quantize.cpp:149:33: warning: cast from 'const char *' to 'char *' drops const qualifier [-Wcast-qual]
            fout.write((char *) word.data(), len);
                                ^
3 warnings generated.
[100%] Linking CXX executable ../../bin/quantize
[100%] Built target quantize

Nothing happens

Hello, I'm using version 1.1.1 of the pywhispercpp library and when I try to run the code, nothing happens. I've tried using the CLI and the same problem persists. Also, the library breaks my other whisper installations, and I need to uninstall it to get them working again.

Is there anything I can do to resolve it?

I'll attach a video showing what appears when I run the code.

2023-07-16.20-43-28.mp4

Model class is not supporting relative paths to files

I'm experimenting with your library, and I've noticed that the Model class does not support relative paths to files. Here is the traceback.

In [5]: asr_result = model.transcribe("../../audio1470766962.wav")
---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
Cell In[5], line 1
----> 1 asr_result = model.transcribe("../../audio1470766962.wav")

File ~/whisper/lib/python3.10/site-packages/pywhispercpp/model.py:118, in Model.transcribe(self, media, n_processors, new_segment_callback, **params)
    116     media_path = Path(media).absolute()
    117     if not media_path.exists():
--> 118         raise FileNotFoundError(media)
    119     audio = self._load_audio(media_path)
    120 # update params if any

FileNotFoundError: ../../audio1470766962.wav

I assume this is because of the use of the absolute method from pathlib. If I'm reading the documentation correctly, using the resolve method instead of absolute will resolve (pun intended) the issue 🙂

Here is the example

In [3]: pathlib.Path('../audio1470766962.wav').absolute()
Out[3]: PosixPath('/Users/guschin/whisper/../audio1470766962.wav')

In [4]: pathlib.Path('../audio1470766962.wav').resolve()
Out[4]: PosixPath('/Users/guschin/audio1470766962.wav')

Would you consider this change, please? I can send you a PR.
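
For reference, the proposal amounts to a one-line swap in Model.transcribe (a sketch of the suggestion above, not a merged patch):

# pywhispercpp/model.py, inside Model.transcribe (proposed):
media_path = Path(media).resolve()  # resolve() instead of absolute()
if not media_path.exists():
    raise FileNotFoundError(media)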

"ggml-metal.metal" file couldn't be found when loading the large-v3 model for CoreML

Hello everyone,
I'm working with an M3 Max and I've tried to load the "ggml-large-v3.bin" model with the following code:

from pywhispercpp.model import Model
model = Model('/Users/my_user/Dev/Models/Whisper_large_v3/ggml-large-v3.bin', n_threads=6)
print(Model.system_info())  # and you should see COREML = 1

But it's unable to find the ggml-metal.metal file even though it is actually present in the whisper.cpp folder. It gives me the following result:

[2024-04-30 17:38:52,675] {model.py:221} INFO - Initializing the model ...
AVX = 0 | AVX2 = 0 | AVX512 = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | METAL = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | SSSE3 = 0 | VSX = 0 | CUDA = 0 | COREML = 1 | OPENVINO = 0
whisper_init_from_file_with_params_no_state: loading model from '/Users/my_user/Dev/Models/Whisper_large_v3/ggml-large-v3.bin'
whisper_model_load: loading model
whisper_model_load: n_vocab       = 51866
whisper_model_load: n_audio_ctx   = 1500
whisper_model_load: n_audio_state = 1280
whisper_model_load: n_audio_head  = 20
whisper_model_load: n_audio_layer = 32
whisper_model_load: n_text_ctx    = 448
whisper_model_load: n_text_state  = 1280
whisper_model_load: n_text_head   = 20
whisper_model_load: n_text_layer  = 32
whisper_model_load: n_mels        = 128
whisper_model_load: ftype         = 1
whisper_model_load: qntvr         = 0
whisper_model_load: type          = 5 (large v3)
whisper_model_load: adding 1609 extra tokens
whisper_model_load: n_langs       = 100
whisper_backend_init: using Metal backend
ggml_metal_init: allocating
ggml_metal_init: found device: Apple M3 Max
ggml_metal_init: picking default device: Apple M3 Max
ggml_metal_init: default.metallib not found, loading from source
ggml_metal_init: GGML_METAL_PATH_RESOURCES = nil
ggml_metal_init: error: could not use bundle path to find ggml-metal.metal, falling back to trying cwd
ggml_metal_init: loading 'ggml-metal.metal'
ggml_metal_init: error: Error Domain=NSCocoaErrorDomain Code=260 "The file “ggml-metal.metal” couldn’t be opened because there is no such file." UserInfo={NSFilePath=ggml-metal.metal, NSUnderlyingError=0x60000250f690 {Error Domain=NSPOSIXErrorDomain Code=2 "No such file or directory"}}
whisper_backend_init: ggml_backend_metal_init() failed
whisper_model_load:      CPU total size =  3094.36 MB
whisper_model_load: model size    = 3094.36 MB
whisper_backend_init: using Metal backend
ggml_metal_init: allocating
ggml_metal_init: found device: Apple M3 Max
ggml_metal_init: picking default device: Apple M3 Max
ggml_metal_init: default.metallib not found, loading from source
ggml_metal_init: GGML_METAL_PATH_RESOURCES = nil
ggml_metal_init: error: could not use bundle path to find ggml-metal.metal, falling back to trying cwd
ggml_metal_init: loading 'ggml-metal.metal'
ggml_metal_init: error: Error Domain=NSCocoaErrorDomain Code=260 "The file “ggml-metal.metal” couldn’t be opened because there is no such file." UserInfo={NSFilePath=ggml-metal.metal, NSUnderlyingError=0x600002508f00 {Error Domain=NSPOSIXErrorDomain Code=2 "No such file or directory"}}
whisper_backend_init: ggml_backend_metal_init() failed
whisper_init_state: kv self size  =  220.20 MB
whisper_init_state: kv cross size =  245.76 MB
whisper_init_state: loading Core ML model from '/Users/my_user/Dev/Models/Whisper_large_v3/ggml-large-v3-encoder.mlmodelc'
whisper_init_state: first run on a device may take a while ...
whisper_init_state: Core ML model loaded
whisper_init_state: compute buffer (conv)   =   10.92 MB
whisper_init_state: compute buffer (cross)  =    9.38 MB
whisper_init_state: compute buffer (decode) =  209.26 MB

I've tried to add the path to the environment variables with:

export GGML_METAL_PATH_RESOURCES=/Users/gregoiredesauvage/Dev/Modules/pywhispercpp/whisper.cpp/ggml-metal.metal

but it didn't work.
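
One thing worth checking: as far as I can tell, whisper.cpp treats GGML_METAL_PATH_RESOURCES as the directory containing ggml-metal.metal rather than the file itself, so pointing it at the folder (set before the model is constructed) might help. A hedged sketch:

import os

# assumption: the variable should name the directory holding ggml-metal.metal
os.environ["GGML_METAL_PATH_RESOURCES"] = "/Users/gregoiredesauvage/Dev/Modules/pywhispercpp/whisper.cpp"

from pywhispercpp.model import Model
model = Model('/Users/my_user/Dev/Models/Whisper_large_v3/ggml-large-v3.bin', n_threads=6)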

I have the "ggml-large-v3-encoder.mlmodelc" file in the same folder as the "ggml-large-v3.bin" file.

Any idea?

ERROR - Invalid model name `./model.bin`

Hello.
I want to create my own assistant, so I downloaded the assistant example in assistant.py:

from assistant import Assistant

def file(text):
    with open("text.txt","a") as f:
        f.write(text+"\n")

hope = Assistant(commands_callback=file, model="./model.bin")
hope.start()

I want to use my own model, but when I give the Assistant class its path, I get this error:

[2023-08-17 14:48:31,033] {utils.py:34} ERROR - Invalid model name `./model.bin`, available models are: ['tiny.en', 'tiny', 'base.en', 'base', 'small.en', 'small', 'medium.en', 'medium', 'large-v1', 'large']
[2023-08-17 14:48:31,034] {model.py:221} INFO - Initializing the model ...
whisper_init_from_file_no_state: loading model from '(null)'

Why this error? And how do I solve it?

Thanks in advance.
