
whisper-cpp-server's Introduction

whisper-cpp service

Open source address

GitHub | Gitee

Whisper-CPP-Server Introduction

Whisper-CPP-Server is a high-performance speech recognition service written in C++, designed to provide developers and enterprises with a reliable and efficient speech-to-text inference engine. The project builds on ggml to run inference on the open-source Whisper model. While maintaining speed and accuracy, it supports pure CPU-based inference, delivering high-quality speech recognition without the need for specialized hardware accelerators.

Real-time speech recognition with recognition results displayed in the browser

Backend

https://github.com/litongjava/whisper-cpp-server

Frontend

https://github.com/litongjava/listen-know-web

Test video

whisper-cpp-server-test.mp4

Main Features

1. Pure C++ Inference Engine: Whisper-CPP-Server is written entirely in C++, leveraging the language's efficiency to process large volumes of voice data rapidly, even in environments that only have CPUs for compute.

2. High Performance: Thanks to the computational efficiency of C++, Whisper-CPP-Server offers high processing speeds that meet real-time or near-real-time speech recognition demands, making it especially suited to scenarios that process large volumes of voice data.

3. Support for Multiple Languages: The service supports speech recognition in multiple languages, broadening its applicability across various linguistic contexts.

4. Docker Container Support: A Docker image is provided, enabling quick deployment of the service with a single command and greatly simplifying installation and configuration. Deploy using the following command:

docker run -dit --name whisper-server -p 8080:8080 litongjava/whisper-cpp-server:1.0.0-large-v3

This means you can run Whisper-CPP-Server on any platform that supports Docker, including but not limited to Linux, Windows, and macOS.

5. Easy Integration for Clients: Detailed client integration documentation is provided, helping developers quickly incorporate speech recognition functionality into their applications. See the Client Code Documentation.

Applicable Scenarios

Whisper-CPP-Server is suitable for a variety of applications that require fast and accurate speech recognition, including but not limited to:

  • Voice-driven interactive applications
  • Transcription of meeting records
  • Automatic subtitle generation
  • Automatic translation of multi-language content

How to build it

build with cmake and vcpkg

git clone https://github.com/litongjava/whisper-cpp-server.git
git submodule init
git submodule update
cmake -B cmake-build-release
cp ./ggml-metal.metal cmake-build-release 
cmake --build cmake-build-release --config Release -- -j 12
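If you manage the third-party dependencies (e.g. SDL2, libuv) with vcpkg, you typically point CMake at vcpkg's toolchain file when configuring. A minimal sketch, assuming vcpkg is checked out at ~/vcpkg (adjust the path to your installation):

# configure with vcpkg's toolchain file so find_package can locate the installed ports
cmake -B cmake-build-release -DCMAKE_TOOLCHAIN_FILE=$HOME/vcpkg/scripts/buildsystems/vcpkg.cmake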

macOS

cmake -B cmake-build-release -DWHISPER_COREML=1
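The -DWHISPER_COREML=1 build also expects a converted Core ML encoder model alongside the ggml model. Upstream whisper.cpp ships a conversion script for this; a sketch, run from inside the whisper.cpp submodule (path assumed) with its Python dependencies installed:

# generate the Core ML encoder (produces models/ggml-base.en-encoder.mlmodelc)
./models/generate-coreml-model.sh base.en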

run with simplest

./cmake-build-release/simplest -m models/ggml-base.en.bin test.wav
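The run commands in this section assume a ggml model at models/ggml-base.en.bin. If you do not have one yet, the ggml-format Whisper models published for upstream whisper.cpp on Hugging Face are a common source (URL assumed; any ggml Whisper model works):

# download the base.en ggml model into the models directory
mkdir -p models
curl -L -o models/ggml-base.en.bin https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-base.en.bin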

run with http-server

./cmake-build-release/whisper_http_server_base_httplib -m models/ggml-base.en.bin 

run with websocket-server

./cmake-build-release/whisper_server_base_on_uwebsockets -m models/ggml-base.en.bin

copy command

mkdir bin
cp ./ggml-metal.metal bin
cp ./cmake-build-release/simplest bin
cp ./cmake-build-release/whisper_http_server_base_httplib bin 
cp ./cmake-build-release/whisper_server_base_on_uwebsockets bin

simplest

cmake-build-debug/simplest -m models/ggml-base.en.bin samples/jfk.wav
simplest [options] file0.wav file1.wav ...

options:
  -h,        --help              [default] show this help message and exit
  -m FNAME,  --model FNAME       [models/ggml-base.en.bin] model path
  -di,       --diarize           [false  ] stereo audio diarization

whisper_http_server_base_httplib

A simple HTTP service. WAV, MP4, and M4A files are passed to the inference model via HTTP requests.

./whisper_http_server_base_httplib -h

usage: ./bin/whisper_http_server_base_httplib [options]

options:
  -h,        --help              [default] show this help message and exit
  -t N,      --threads N         [4      ] number of threads to use during computation
  -p N,      --processors N      [1      ] number of processors to use during computation
  -ot N,     --offset-t N        [0      ] time offset in milliseconds
  -on N,     --offset-n N        [0      ] segment index offset
  -d  N,     --duration N        [0      ] duration of audio to process in milliseconds
  -mc N,     --max-context N     [-1     ] maximum number of text context tokens to store
  -ml N,     --max-len N         [0      ] maximum segment length in characters
  -sow,      --split-on-word     [false  ] split on word rather than on token
  -bo N,     --best-of N         [2      ] number of best candidates to keep
  -bs N,     --beam-size N       [-1     ] beam size for beam search
  -wt N,     --word-thold N      [0.01   ] word timestamp probability threshold
  -et N,     --entropy-thold N   [2.40   ] entropy threshold for decoder fail
  -lpt N,    --logprob-thold N   [-1.00  ] log probability threshold for decoder fail
  -debug,    --debug-mode        [false  ] enable debug mode (eg. dump log_mel)
  -tr,       --translate         [false  ] translate from source language to english
  -di,       --diarize           [false  ] stereo audio diarization
  -tdrz,     --tinydiarize       [false  ] enable tinydiarize (requires a tdrz model)
  -nf,       --no-fallback       [false  ] do not use temperature fallback while decoding
  -ps,       --print-special     [false  ] print special tokens
  -pc,       --print-colors      [false  ] print colors
  -pp,       --print-progress    [false  ] print progress
  -nt,       --no-timestamps     [false  ] do not print timestamps
  -l LANG,   --language LANG     [en     ] spoken language ('auto' for auto-detect)
  -dl,       --detect-language   [false  ] exit after automatically detecting language
             --prompt PROMPT     [       ] initial prompt
  -m FNAME,  --model FNAME       [models/ggml-base.en.bin] model path
  -oved D,   --ov-e-device DNAME [CPU    ] the OpenVINO device used for encode inference
  --host HOST,                   [127.0.0.1] Hostname/ip-adress for the service
  --port PORT,                   [8080   ] Port number for the service
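For example, combining a few of the options above to listen on all interfaces with more threads and automatic language detection (flags taken from the list above; values are only illustrative):

./whisper_http_server_base_httplib -m models/ggml-base.en.bin --host 0.0.0.0 --port 8080 -t 8 -l auto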

start whisper_http_server_base_httplib

./cmake-build-debug/whisper_http_server_base_httplib -m models/ggml-base.en.bin

Test the server

See the request examples below and the request documentation in the doc directory.

request examples

/inference

curl --location --request POST http://127.0.0.1:8080/inference \
--form file=@"./samples/jfk.wav" \
--form temperature="0.2" \
--form response-format="json" \
--form audio_format="wav"

/load

curl 127.0.0.1:8080/load \
-H "Content-Type: multipart/form-data" \
-F model="<path-to-model-file>"

whisper_server_base_on_uwebsockets

WebSocket server.

Start the server:

./cmake-build-debug/whisper_server_base_on_uwebsockets -m models/ggml-base.en.bin

Test the server: see the Python client.

Docker

run whisper-cpp-server:1.0.0

Dockerfile

docker run -dit --name=whisper-server -p 8080:8080 -v "$(pwd)/models/ggml-base.en.bin":/models/ggml-base.en.bin litongjava/whisper-cpp-server:1.0.0 /app/whisper_http_server_base_httplib -m /models/ggml-base.en.bin

The service listens on port 8080.

test

curl --location --request POST 'http://127.0.0.1:8080/inference' \
--header 'Accept: */*' \
--form 'file=@"E:\\code\\cpp\\cpp-study\\cpp-study-clion\\audio\\jfk.wav"' \
--form 'temperature="0.2"' \
--form 'response-format="json"' \
--form 'audio_format="wav"'

run whisper-cpp-server:1.0.0-base-en

Dockerfile

docker run -dit --name whisper-server -p 8080:8080 litongjava/whisper-cpp-server:1.0.0-base-en

run whisper-cpp-server:1.0.0-large-v3

Dockerfile

docker run -dit --name whisper-server -p 8080:8080 litongjava/whisper-cpp-server:1.0.0-large-v3

run whisper-cpp-server:1.0.0-tiny.en-q5_1

Dockerfile

docker run -dit --name whisper-server -p 8080:8080 litongjava/whisper-cpp-server:1.0.0-tiny.en-q5_1

Client code

whisper-cpp-server's People

Contributors

afrizaloky, eschmidbauer, litongjava


whisper-cpp-server's Issues

Illegal instruction (core dumped)

I'm running the whisper-cpp-server Docker image in a Kubernetes cluster as a microservice. I'm attaching the models folder by extending the Docker image like this:

FROM litongjava/whisper-cpp-server:1.0.0

ADD models/ models/

which seems to work pretty well. Then, in my Kubernetes deployment, I translate the docker run command provided in the docs:

      containers:
        - name: whisper
          image: jonnyburkholder/whisper
          stdin: true
          tty: true
          command: ["/app/whisper_http_server_base_httplib"]
          args: ["-m", "$(MODEL_PATH)"]
          ports:
            - containerPort: 8080

This loads the model, then exits immediately with the following output:

 - dev:deployment/whisper: container whisper terminated with exit code 132
    - dev:pod/whisper-c994db857-vp7t8: container whisper terminated with exit code 132
      > [whisper-c994db857-vp7t8 whisper] whisper_init_from_file_with_params_no_state: loading model from 'models/ggml-tiny.en-q5_1.bin'
      > [whisper-c994db857-vp7t8 whisper] whisper_model_load: loading model
      > [whisper-c994db857-vp7t8 whisper] whisper_model_load: n_vocab       = 51864
      > [whisper-c994db857-vp7t8 whisper] whisper_model_load: n_audio_ctx   = 1500
      > [whisper-c994db857-vp7t8 whisper] whisper_model_load: n_audio_state = 384
      > [whisper-c994db857-vp7t8 whisper] whisper_model_load: n_audio_head  = 6
      > [whisper-c994db857-vp7t8 whisper] whisper_model_load: n_audio_layer = 4
      > [whisper-c994db857-vp7t8 whisper] whisper_model_load: n_text_ctx    = 448
      > [whisper-c994db857-vp7t8 whisper] whisper_model_load: n_text_state  = 384
      > [whisper-c994db857-vp7t8 whisper] whisper_model_load: n_text_head   = 6
      > [whisper-c994db857-vp7t8 whisper] whisper_model_load: n_text_layer  = 4
      > [whisper-c994db857-vp7t8 whisper] whisper_model_load: n_mels        = 80
      > [whisper-c994db857-vp7t8 whisper] whisper_model_load: ftype         = 9
      > [whisper-c994db857-vp7t8 whisper] whisper_model_load: qntvr         = 1
      > [whisper-c994db857-vp7t8 whisper] whisper_model_load: type          = 1 (tiny)
      > [whisper-c994db857-vp7t8 whisper] whisper_model_load: adding 1607 extra tokens
      > [whisper-c994db857-vp7t8 whisper] whisper_model_load: n_langs       = 99
 - dev:deployment/whisper failed. Error: container whisper terminated with exit code 132.

I've assumed that the container exits because the Docker image is running in the background and not actively doing work in the container. So to keep it from exiting early, I modified the command to tail /dev/null:

      containers:
        - name: whisper
          image: jonnyburkholder/whisper
          stdin: true
          tty: true
          command: ["/bin/sh", "-c"]
          args: ["/app/whisper_http_server_base_httplib -m $(MODEL_PATH); tail -f /dev/null"]
          ports:
            - containerPort: 8080

This keeps the container alive, but gives me the error "Illegal instruction (core dumped)". This output pops up in the logs every few seconds:

[whisper] whisper_init_from_file_with_params_no_state: loading model from 'models/ggml-tiny.en-q5_1.bin'
[whisper] whisper_model_load: loading model
[whisper] whisper_model_load: n_vocab       = 51864
[whisper] whisper_model_load: n_audio_ctx   = 1500
[whisper] whisper_model_load: n_audio_state = 384
[whisper] whisper_model_load: n_audio_head  = 6
[whisper] whisper_model_load: n_audio_layer = 4
[whisper] whisper_model_load: n_text_ctx    = 448
[whisper] whisper_model_load: n_text_state  = 384
[whisper] whisper_model_load: n_text_head   = 6
[whisper] whisper_model_load: n_text_layer  = 4
[whisper] whisper_model_load: n_mels        = 80
[whisper] whisper_model_load: ftype         = 9
[whisper] whisper_model_load: qntvr         = 1
[whisper] whisper_model_load: type          = 1 (tiny)
[whisper] whisper_model_load: adding 1607 extra tokens
[whisper] whisper_model_load: n_langs       = 99
[whisper] Illegal instruction (core dumped)

I'm using the ggml-tiny.en-q5_1 model. I've tested it with the docker run command on the command line, and it works fine from there. I'm having trouble understanding why it isn't working in my cluster, however. Any insight into why this doesn't work would be appreciated. Thanks!
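Exit code 132 corresponds to SIGILL (illegal instruction). With whisper.cpp-based binaries this typically means the executable was built with CPU instructions (for example AVX/AVX2/F16C) that the cluster node's CPU does not provide, which would explain why the same image runs fine on another machine. A quick check, assuming a Linux node, is to list the SIMD flags the node's CPU advertises:

# list the SIMD-related CPU flags on the node that runs the pod
grep -o -w -E 'avx[0-9a-z_]*|f16c|fma|sse4_[12]' /proc/cpuinfo | sort -u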

SDL2::SDL2-static not found

I use the CLion IDE and msys64\mingw64 to run CMake on this project, but the following CMake error occurs:

CMake Error at CMakeLists.txt:16 (target_link_libraries):
Target "sdl_version" links to:

SDL2::SDL2-static

but the target was not found. Possible reasons include:

* There is a typo in the target name.
* A find_package call is missing for an IMPORTED target.
* An ALIAS target is missing.

CMake Error at CMakeLists.txt:25 (target_link_libraries):
Target "stream_local" links to:

SDL2::SDL2-static

but the target was not found. Possible reasons include:

* There is a typo in the target name.
* A find_package call is missing for an IMPORTED target.
* An ALIAS target is missing.

-- Generating done
CMake Generate step failed. Build files cannot be regenerated correctly.

[Finished]

What can I do? Please help.
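This error usually means CMake did not find an SDL2 package that exports the SDL2::SDL2-static target. A first step worth trying under msys64/mingw64, as a sketch (assuming the standard MSYS2 package; whether it exports the static target depends on how the package was built), is to install SDL2 for the mingw64 toolchain and reconfigure:

# from an MSYS2 shell: install SDL2 for the mingw64 toolchain, then re-run CMake
pacman -S mingw-w64-x86_64-SDL2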

Cannot build on Apple

Hi
I would love to try this out, but if I do the following

git clone https://github.com/litongjava/whisper-cpp-server
cmake .

I get

CMake Error at CMakeLists.txt:11 (find_package):
  Could not find a package configuration file provided by "libuv" with any of
  the following names:

    libuvConfig.cmake
    libuv-config.cmake

  Add the installation prefix of "libuv" to CMAKE_PREFIX_PATH or set
  "libuv_DIR" to a directory containing one of the above files.  If "libuv"
  provides a separate development package or SDK, be sure it has been
  installed.

Even though I have installed libuv:

brew install libuv

Do you have any idea how to get past this problem?
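As the error message itself suggests, one workaround is to add the installation prefix of libuv to CMAKE_PREFIX_PATH. A sketch, assuming Homebrew's libuv package ships its CMake config files:

# point CMake at Homebrew's libuv prefix and reconfigure
cmake -B cmake-build-release -DCMAKE_PREFIX_PATH="$(brew --prefix libuv)"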
