team-re-verb / re-verb

speaker diarization system using an LSTM

License: MIT License

Python 76.31% JavaScript 7.23% HTML 2.82% Vue 12.10% Shell 0.23% Dockerfile 1.32%

speaker-diarization ml machine-learning ai lstm neural-network pytorch docker vue reverb

re-verb's Introduction

About the project

RE:VERB is a speaker diarization system: it lets the user send or record audio of a conversation and receive timestamps of who spoke when.

RE:VERB is our final project in Magshimim, and consists of a web client and a server.

The client can record audio and show the timestamp results graphically.
The server exposes a simple REST API, so it can be used with many other clients.

Built With

Client

Vue.js - The front end framework used
Wavesurfer.js - A library for waveform visualization

Server

PyTorch - A deep learning library for Python with strong GPU support through CUDA
Express.js - Node.js web server framework

Getting Started

The project contains the server and the web client (a CLI client also exists for debugging purposes).

The server is located at ./server and the web client is located at ./client/website.

Server

The model, the scripts for downloading and training, and the weights from our training are located at ./server/speech_diarization/model

We used Docker to create a cross-platform environment to run the server on.

The server is made up of:

a container for the web server
a container for the diarization process
a container for a Redis database that lets the other two communicate

Docker Compose runs and manages all 3 at once.

Docker and docker-compose must be installed to build and run the server. Everything else is taken care of.

Installing

cd server
docker-compose up

This will run all 3 containers and install dependencies.

If you make a change in the server, use

docker-compose up --build

to rebuild.

Usage:

Sending an HTTP POST request with an audio file to the server at http://localhost:1337/upload (default port and URL) returns JSON with the timestamps in milliseconds, grouped per speaker, for example:
{"0": [[40, 120], [3060, 3460], [3480, 3560]], "1": [[1260, 1660], [1680, 1960]]}
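A client usually needs this response as a single timeline rather than per-speaker lists. The sketch below is one way to flatten it, assuming the top-level keys ("0", "1") are speaker labels and each inner pair is [start, end] in milliseconds, as the example above suggests. The function names to_timeline and speaking_time_ms are hypothetical helpers and are not part of the project.

```python
import json


def to_timeline(response_text):
    """Flatten the JSON response into (start_ms, end_ms, speaker) tuples.

    Assumes the top-level keys are speaker labels and every segment is a
    [start, end] pair in milliseconds (an assumption based on the example).
    """
    segments_by_speaker = json.loads(response_text)
    timeline = [
        (start, end, speaker)
        for speaker, segments in segments_by_speaker.items()
        for start, end in segments
    ]
    timeline.sort()  # tuples sort by start time first
    return timeline


def speaking_time_ms(response_text):
    """Sum the length of every segment per speaker, in milliseconds."""
    totals = {}
    for start, end, speaker in to_timeline(response_text):
        totals[speaker] = totals.get(speaker, 0) + (end - start)
    return totals


if __name__ == "__main__":
    sample = (
        '{"0": [[40, 120], [3060, 3460], [3480, 3560]], '
        '"1": [[1260, 1660], [1680, 1960]]}'
    )
    for start, end, speaker in to_timeline(sample):
        print(f"{start:>6} ms - {end:>6} ms  speaker {speaker}")
    print(speaking_time_ms(sample))
```

Sorting the flattened tuples orders the segments by start time, which is the form a waveform view or a transcript would typically consume.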

Client

The client requires npm or yarn. More information about the client can be found here.

To install:

cd client/website
npm install

Afterwards, you can use

npm run serve

to run a development server.

Authors

Ofir Naccache - ofirnaccache
Matan Yesharim - Tralfazz

License

This project is licensed under the MIT License. See the LICENSE.md file for details.

Acknowledgments

The diarization algorithm is an implementation of this research; we also used their implementation of the spectral clustering.
We took inspiration and some code from Harry Volek's implementation of a different but similar problem: speaker verification.

Future Plans

We had problems training on the AMI corpus, so we used the TIMIT corpus for the provided model.
We plan to train again on the VoxCeleb 1 and 2 datasets, which contain much more data and will hopefully improve feature extraction.
We want to add integration with a speech-to-text service to transcribe the created segments.


re-verb's Issues

"negative dimensions are not allowed" error

The server returns this error for all of my WAV files. With the project's test WAV files, it works fine.
I converted the WAV files to 41kHz, 16k bit rate, mono, just like those test files.
My sample test file: https://www.dropbox.com/s/q1kssodw8dx92f0/test1.wav?dl=0

The server responds with "ERROR", and the underlying error is "negative dimensions are not allowed".

"ERROR", while the error is "negative dimensions are not allowed"

The server responds with "ERROR", and the underlying error is "negative dimensions are not allowed". Could you help me with this problem and describe the expected audio file format, please?
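The format the server expects is not stated in these reports, so one way to narrow the problem down is to print the header fields of a failing file next to those of a working test file and compare them. This is a minimal sketch using Python's standard wave module. The helper name describe_wav is hypothetical, and the sketch only reads the header; it does not reproduce or explain the server error.

```python
import sys
import wave


def describe_wav(path):
    """Return the basic header fields of a PCM WAV file as a dict."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate_hz": rate,
            "sample_width_bytes": wav.getsampwidth(),
            "duration_s": frames / rate,
        }


if __name__ == "__main__":
    # Usage: python describe_wav.py working.wav failing.wav
    for wav_path in sys.argv[1:]:
        print(wav_path, describe_wav(wav_path))
```

Any field that differs between the two files (channel count, sample rate, sample width, or a very short duration) is a reasonable first thing to include in the issue report.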

Service 'speech_diarization' failed to build

An error occurs when trying to build and launch the project:

Successfully built wave pyyaml wget sklearn future
Failed to build webrtcvad
Installing collected packages: redis, idna, urllib3, certifi, chardet, requests, wave, numpy, future, ffmpeg-python, webrtcvad, pydub, pyyaml, wget, spectralcluster, scipy, speechpy, simpleder, threadpoolctl, joblib, scikit-learn, sklearn
    Running setup.py install for webrtcvad: started
    Running setup.py install for webrtcvad: finished with status 'error'
    ERROR: Command errored out with exit status 1:
     command: /opt/conda/bin/python -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-j9x05w_a/webrtcvad/setup.py'"'"'; __file__='"'"'/tmp/pip-install-j9x05w_a/webrtcvad/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /tmp/pip-record-nh4e9vsi/install-record.txt --single-version-externally-managed --compile --install-headers /opt/conda/include/python3.7m/webrtcvad
         cwd: /tmp/pip-install-j9x05w_a/webrtcvad/
    Complete output (17 lines):
    running install
    running build
    running build_py
    creating build
    creating build/lib.linux-x86_64-3.7
    copying webrtcvad.py -> build/lib.linux-x86_64-3.7
    running build_ext
    building '_webrtcvad' extension
    creating build/temp.linux-x86_64-3.7
    creating build/temp.linux-x86_64-3.7/cbits
    creating build/temp.linux-x86_64-3.7/cbits/webrtc
    creating build/temp.linux-x86_64-3.7/cbits/webrtc/common_audio
    creating build/temp.linux-x86_64-3.7/cbits/webrtc/common_audio/signal_processing
    creating build/temp.linux-x86_64-3.7/cbits/webrtc/common_audio/vad
    gcc -pthread -B /opt/conda/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -DWEBRTC_POSIX -Icbits -I/opt/conda/include/python3.7m -c cbits/pywebrtcvad.c -o build/temp.linux-x86_64-3.7/cbits/pywebrtcvad.o
    unable to execute 'gcc': No such file or directory
    error: command 'gcc' failed with exit status 1
    ----------------------------------------
ERROR: Command errored out with exit status 1: /opt/conda/bin/python -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-j9x05w_a/webrtcvad/setup.py'"'"'; __file__='"'"'/tmp/pip-install-j9x05w_a/webrtcvad/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /tmp/pip-record-nh4e9vsi/install-record.txt --single-version-externally-managed --compile --install-headers /opt/conda/include/python3.7m/webrtcvad Check the logs for full command output.
ERROR: Service 'speech_diarization' failed to build: The command '/bin/sh -c pip install -r requirements.txt --ignore-installed' returned a non-zero code: 1

What is an "utterance" in this case?

So the data's shape is (speaker, utterance, log filterbanks) and the output is (speaker, utterance, embeddings). What is an utterance in this case?