Transcription Stream Community Edition

Created by https://transcription.stream with special thanks to MahmoudAshraf97 for his work on whisper-diarization, and to jmorganca for Ollama and its amazing simplicity of use.

Overview

Transcription Stream is a turnkey self-hosted diarization service that works completely offline. Out of the box it includes:

  • drag and drop diarization and transcription via SSH
  • a web interface for upload, review, and download of files
  • summarization with Ollama and Mistral
  • Meilisearch for full text search

A web interface and SSH drop zones make it simple to use and to integrate into your workflows. Ollama provides a powerful toolset, limited only by your prompting skills, for performing complex operations on your transcriptions. Meilisearch adds ridiculously fast full-text search.

Use the web interface to upload, listen to, review, and download output files, or drop files via SSH into transcribe or diarize. Files are processed with output placed into a named and dated folder. Have a quick look at the install and ts-web walkthrough videos for a better idea.

ssh upload and transcribed

Upload a file to be diarized into the diarize folder; completed files appear in their own named and dated folders under transcribed.

ts-web interface

Example Image

ts-gpu diarization example

watch video on youtube

mistral summary

local ollama mistral summary
prompt_text = f"""
Summarize the transcription below. Be sure to include pertinent information about the speakers, including name and anything else shared.
Provide the summary output in the following style

Speakers: names or identifiers of speaking parties
Topics: topics included in the transcription
Ideas: any ideas that may have been mentioned
Dates: dates mentioned and what they correspond to
Locations: any locations mentioned
Action Items: any action items

Summary: overall summary of the transcription

The transcription is as follows

{transcription_text}

"""

Prerequisite: NVIDIA GPU

Warning: The resulting ts-gpu image is ~26GB and might take a hot second to create

Quickstart (no build)

Pulls all docker images and starts services

./start-nobuild.sh

Build and Run Instructions

If you'd like to build the images locally

Automated Install and Run

chmod +x install.sh;
./install.sh;

Run

chmod +x run.sh;
./run.sh

Additional Information

Ports

  • SSH: 22222
  • HTTP: 5006
  • Ollama: 11434
  • Meilisearch: 7700

SSH Server Access

  • Port: 22222
  • User: transcriptionstream
  • Password: nomoresaastax
  • Usage: Place audio files in transcribe or diarize. Completed files are stored in transcribed.
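
As an example, an audio file can be dropped into one of the watch folders over SFTP. The sketch below is a hypothetical illustration using paramiko and the default credentials listed above; it assumes the container is reachable on localhost and that the transcribe and diarize folders sit directly under the SSH user's home directory, which may differ in your setup.

# Hypothetical sketch: drop an audio file into the diarize watch folder over SFTP.
# Uses the default credentials above; the host, port, and remote path are assumptions.
import paramiko

client = paramiko.SSHClient()
client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
client.connect("localhost", port=22222,
               username="transcriptionstream", password="nomoresaastax")

sftp = client.open_sftp()
# Use "transcribe/meeting.wav" instead for transcription without diarization.
sftp.put("meeting.wav", "diarize/meeting.wav")
sftp.close()
client.close()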

Web Interface

  • URL: http://dockerip:5006
  • Features:
    • Audio file upload/download
    • Task completion alerts with interactive links
    • HTML5 web player with speed control and transcription highlighting
    • Time-synced transcription scrubbing/highlighting/scrolling

Ollama API
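
The GPU container talks to Ollama over its HTTP API on port 11434. As a quick sanity check (useful if the install log shows curl failing to reach port 11434), the hypothetical snippet below lists the models the endpoint has pulled; adjust the host to the endpoint IP set in .env.

# Sketch: verify the Ollama endpoint is reachable and list available models.
# The localhost address is an assumption; use the endpoint IP from .env.
import requests

tags = requests.get("http://localhost:11434/api/tags", timeout=10).json()
print([model["name"] for model in tags.get("models", [])])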

Meilisearch API
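
Completed transcriptions are indexed into Meilisearch for full-text search. Until a search interface lands in ts-web (see the to-do list below), the instance on port 7700 can be queried directly; the sketch below is a hypothetical example that assumes an index named transcriptions and a master key, both of which may differ in your setup.

# Hypothetical sketch: full-text search against the Meilisearch instance on port 7700.
# The index name "transcriptions" and the master key placeholder are assumptions.
import requests

resp = requests.post(
    "http://localhost:7700/indexes/transcriptions/search",
    headers={"Authorization": "Bearer YOUR_MASTER_KEY"},
    json={"q": "action items", "limit": 5},
)
resp.raise_for_status()
for hit in resp.json()["hits"]:
    print(hit)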

Warning: This is example code for example purposes and should not be used in production environments without additional security measures.

Customization and Troubleshooting

  • Update variables in the .env file
  • Change the password for transcriptionstream in the ts-gpu Dockerfile.
  • Update the Ollama api endpoint IP in .env if you want to use a different endpoint
  • Update the secret in .env for ts-web
  • Use .env to choose which models are included in the initial build.
  • Change the prompt text in ts-gpu/ts-summarize.py to fit your needs. Update ts-web/templates/transcription.html if you want to call it something other than summary.
  • 12GB of VRAM may not be enough to run both whisper-diarization and Ollama's mistral. Whisper-diarization is fairly light on GPU memory out of the box, but Ollama's runner holds enough GPU memory open that diarization/transcription occasionally runs out of CUDA memory. Since I can't run both on the same host reliably, I've set the batch size for both whisper-diarization and whisperx to 16, up from their default of 8, and let an M-series Mac run the Ollama endpoint.

To-do

  • Need to fix an issue with ts-web that throws an error to the console when loading a transcription for which no summary.txt file exists. Lots of other annoyances with ts-web, but it's functional.
  • Need to add a search/control interface to ts-web for Meilisearch

transcriptionstream's Issues

Improve diarization

Hello,

I have some meeting recordings that I would like to transcribe and diarize. Unfortunately, the bare install doesn't do very well. Is there a way to improve this? What should I look for?

Meilisearch functionality

Hello,

I am not able to get Meilisearch working reliably. At first the logs said that I needed to restart Meilisearch with a master key, which I did. Then I entered the same master key in index-single.py, but it always responds that the task was not successful, even though the document added to the index looks right.

Here is the log:

ts-meilisearch              | 2024-03-15T16:20:09.150794Z  INFO HTTP request{method=GET host="172.30.1.12:7700" route=/ query_parameters= user_agent=python-requests/2.31.0 status_code=200}: meilisearch: close time.busy=28.3µs time.idle=143µs
ts-meilisearch              | 2024-03-15T16:20:09.165457Z  INFO HTTP request{method=POST host="172.30.1.12:7700" route=/indexes/transcriptions/documents query_parameters= user_agent=Meilisearch Python (v0.31.0) status_code=202}: meilisearch: close time.busy=599µs time.idle=11.3ms
ts-meilisearch              | 2024-03-15T16:20:09.170094Z  INFO index_scheduler: A batch of tasks was successfully completed with 0 successful tasks and 1 failed tasks.

Any ideas what could be wrong? How should we use this feature?

ERROR: failed to solve: process /bin/sh -c python3 diarize.py -a /home/transcriptionstream/test.wav did not complete successfully: exit code: 1

When I tried to build the Docker images locally, I encountered this error:

 => ERROR [16/17] RUN python3 diarize.py -a /home/transcriptionstream/test.wav                                                                                         12.6s
------
 > [16/17] RUN python3 diarize.py -a /home/transcriptionstream/test.wav:
6.664 /usr/local/lib/python3.10/dist-packages/pyannote/audio/core/io.py:43: UserWarning: torchaudio._backend.set_audio_backend has been deprecated. With dispatcher enabled, this function is no-op. You can remove the function call.
6.664   torchaudio.set_audio_backend("soundfile")
7.095 torchvision is not available - cannot save figures
11.17 Traceback (most recent call last):
11.17   File "/whisper-diarization/diarize.py", line 21, in <module>
11.17     from ctc_forced_aligner import (
11.17 ImportError: cannot import name 'load_alignment_model' from 'ctc_forced_aligner' (unknown location)
------
Dockerfile:59
--------------------
  57 |     
  58 |     # Run the necessary scripts so we have our transcription models IN the image. Adds to build time for download.
  59 | >>> RUN python3 diarize.py -a /home/transcriptionstream/test.wav
  60 |     RUN whisperx --model large-v3 --language en /home/transcriptionstream/test.wav --compute_type int8
  61 |     #RUN python3 -m pytorch_lightning.utilities.upgrade_checkpoint ../root/.cache/torch/whisperx-vad-segmentation.bin
--------------------
ERROR: failed to solve: process "/bin/sh -c python3 diarize.py -a /home/transcriptionstream/test.wav" did not complete successfully: exit code: 1

Any suggestions?

Docker image fails to build

My environment is as below:

NAME="CentOS Linux"
VERSION="8 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="8"
PLATFORM_ID="platform:el8"
PRETTY_NAME="CentOS Linux 8 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:8"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"

CENTOS_MANTISBT_PROJECT="CentOS-8"
CENTOS_MANTISBT_PROJECT_VERSION="8"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="8"

I have added the traceback where it fails.

Step 17/20 : RUN python3 diarize.py -a /home/transcriptionstream/test.wav
 ---> Running in e1b9141c2bcb
/usr/local/lib/python3.10/dist-packages/pyannote/audio/core/io.py:43: UserWarning: torchaudio._backend.set_audio_backend has been deprecated. With dispatcher enabled, this function is no-op. You can remove the function call.
  torchaudio.set_audio_backend("soundfile")
torchvision is not available - cannot save figures
Traceback (most recent call last):
  File "/whisper-diarization/diarize.py", line 21, in <module>
    from ctc_forced_aligner import (
ImportError: cannot import name 'load_alignment_model' from 'ctc_forced_aligner' (unknown location)
The command '/bin/sh -c python3 diarize.py -a /home/transcriptionstream/test.wav' returned a non-zero code: 1
Creating Docker volumes...
transcriptionstream
Starting services with docker-compose...
./install.sh: line 53: docker-compose: command not found.
Downloading  transcriptionstream mistral model
curl: (7) Failed to connect to 172.30.1.3 port 11434: Connection timed out
Re-attaching to console logs
./install.sh: line 63: docker-compose: command not found.
All services are up and running.

Great work, but I have some questions.

1: Is it necessary to upload two files (one to transcribe and one to diarize) for ts-web to run, as shown in the ts-web walkthrough video (https://www.youtube.com/watch?v=pbZ8o7_MjG4)?
2: What are the transcribe & diarize files? Do they have to be in mp3 format? Are there any restrictions on the size or duration of the audio files?
3: Does it support audio in other languages, for example Chinese?
4: I would greatly appreciate it if you could provide the transcribe & diarize audio files used in the example https://www.youtube.com/watch?v=pbZ8o7_MjG4 .

Thanks.

Compatibility with Colab or Kaggle?

Hi, your demo looks really interesting. I was just wondering if this would be compatible with a Colab or Kaggle notebook? It would be very helpful, as you can often use a Tesla T4 16GB for free.

OSError: libcudart.so.11.0: cannot open shared object file: No such file or directory

I get the following error after uploading the test file to the transcribe folder.

ts-gpu                      | Starting new instance of transcribe_example.sh. Current count: 0
ts-gpu                      | --- transcribing /transcriptionstream/incoming/transcribe/test.wav...
ts-web                      | 10.10.20.153 - - [11/Jun/2024 13:07:32] "GET / HTTP/1.1" 200 -
ts-web                      | 10.10.20.153 - - [11/Jun/2024 13:07:32] "GET /static/styles.css HTTP/1.1" 304 -
ts-web                      | 10.10.20.153 - - [11/Jun/2024 13:07:32] "GET /static/gears.svg HTTP/1.1" 304 -
ts-web                      | 10.10.20.153 - - [11/Jun/2024 13:07:32] "GET /static/alerts.js HTTP/1.1" 304 -
ts-web                      | 10.10.20.153 - - [11/Jun/2024 13:07:32] "GET /static/fileHandling.js HTTP/1.1" 304 -
ts-web                      | 10.10.20.153 - - [11/Jun/2024 13:07:32] "GET /static/transcription.js HTTP/1.1" 304 -
ts-web                      | 10.10.20.153 - - [11/Jun/2024 13:07:32] "GET /static/audioControls.js HTTP/1.1" 304 -
ts-gpu                      | Traceback (most recent call last):
ts-gpu                      |   File "/usr/local/bin/whisperx", line 5, in <module>
ts-gpu                      |     from whisperx.transcribe import cli
ts-gpu                      |   File "/usr/local/lib/python3.10/dist-packages/whisperx/__init__.py", line 1, in <module>
ts-gpu                      |     from .transcribe import load_model
ts-gpu                      |   File "/usr/local/lib/python3.10/dist-packages/whisperx/transcribe.py", line 9, in <module>
ts-gpu                      |     from .alignment import align, load_align_model
ts-gpu                      |   File "/usr/local/lib/python3.10/dist-packages/whisperx/alignment.py", line 11, in <module>
ts-gpu                      |     import torchaudio
ts-gpu                      |   File "/usr/local/lib/python3.10/dist-packages/torchaudio/__init__.py", line 1, in <module>
ts-gpu                      |     from torchaudio import (  # noqa: F401
ts-gpu                      |   File "/usr/local/lib/python3.10/dist-packages/torchaudio/_extension/__init__.py", line 43, in <module>
ts-gpu                      |     _load_lib("libtorchaudio")
ts-gpu                      |   File "/usr/local/lib/python3.10/dist-packages/torchaudio/_extension/utils.py", line 61, in _load_lib
ts-gpu                      |     torch.ops.load_library(path)
ts-gpu                      |   File "/usr/local/lib/python3.10/dist-packages/torch/_ops.py", line 1032, in load_library
ts-gpu                      |     ctypes.CDLL(path)
ts-gpu                      |   File "/usr/lib/python3.10/ctypes/__init__.py", line 374, in __init__
ts-gpu                      |     self._handle = _dlopen(self._name, mode)
ts-gpu                      | OSError: libcudart.so.11.0: cannot open shared object file: No such file or directory
ts-gpu                      | --- done processing /transcriptionstream/incoming/transcribe/test.wav - output placed in /transcriptionstream/transcribed/test_20240611130731
ts-gpu                      | transcription:  
ts-gpu                      | Runtime for processing /transcriptionstream/incoming/transcribe/test.wav = 2
ts-gpu                      | ------------------------------------

P.S.: I have the NVIDIA toolkit installed on my system.

(base) bash-4.4$ yum list installed | grep toolkit
cuda-toolkit-11-4.x86_64                           11.4.0-1                                      @cuda-rhel8-11-4-local                
cuda-toolkit-11-4-config-common.noarch             11.4.43-1                                     @cuda-rhel8-11-4-local                
cuda-toolkit-11-config-common.noarch               11.4.43-1                                     @cuda-rhel8-11-4-local                
cuda-toolkit-config-common.noarch                  11.4.43-1                                     @cuda-rhel8-11-4-local                
libnvidia-container-tools.x86_64                   1.15.0~rc.3-1                                 @nvidia-container-toolkit-experimental
libnvidia-container1.x86_64                        1.15.0~rc.3-1                                 @nvidia-container-toolkit-experimental
nvidia-container-toolkit.x86_64                    1.15.0~rc.3-1                                 @nvidia-container-toolkit-experimental
nvidia-container-toolkit-base.x86_64               1.15.0~rc.3-1                                 @nvidia-container-toolkit-experimental

CUDA Initialization Error with ts-gpu

I am facing a CUDA initialization issue with the ts-gpu Docker container. I have successfully verified that my Docker setup (WSL 2) can access the GPU by running the nvidia-smi command, which showed the following output on my host machine:

NVIDIA-SMI 555.52.01 Driver Version: 555.99 CUDA Version: 12.5
+-----------------------------------------------------------------------------------------+
| GPU Name                 Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC    |
| Fan  Temp  Perf  Pwr:Usage/Cap        |         Memory-Usage | GPU-Util  Compute M.    |
|========================================+======================+=========================|
|   0  NVIDIA RTX A2000 8GB Lap...  On   | 00000000:01:00.0 On  |                  N/A    |
| N/A   63C    P3    19W / 72W           |  3179MiB / 8192MiB   |      0%      Default    |
+----------------------------------------+----------------------+-------------------------+

Despite this, I encounter the following runtime error during the execution of GPU-accelerated tasks within the container:

RuntimeError: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 500: named symbol not found

Could you offer any guidance on how to troubleshoot this error, or are there any known issues with this setup that I should be aware of?

$'\r': command not found

I am using Docker Desktop, and I am fairly new to containers, so I am not sure if this is an issue with my setup. However, when I run the container I get the following error repeating every 5 seconds.

2024-03-28 00:41:16 ts-gpu | /root/scripts/ts-control.sh: line 7: $'\r': command not found
2024-03-28 00:41:16 ts-gpu | /root/scripts/ts-control.sh: line 11: $'\r': command not found
2024-03-28 00:41:16 ts-gpu | /root/scripts/ts-control.sh: line 14: $'\r': command not found
2024-03-28 00:41:16 ts-gpu | /root/scripts/ts-control.sh: line 17: $'\r': command not found
2024-03-28 00:41:16 ts-gpu | /root/scripts/ts-control.sh: line 19: syntax error near unexpected token `$'{\r''
2024-03-28 00:41:16 ts-gpu | /root/scripts/ts-control.sh: line 19: `start_process_if_allowed() {
