
re-verb's Introduction




About the project

RE:VERB is a speaker diarization system: it lets the user send or record audio of a conversation and receive timestamps of who spoke when.

RE:VERB is our final project in Magshimim, and consists of a web client and a server.

  • The client can record audio and show the timestamp results graphically.

  • The server exposes a simple REST API, so it can be used with many other clients as well.

Built With

client

server

  • PyTorch - a deep learning library for Python with strong GPU support through CUDA

  • Express.js - Node.js web server framework

Getting Started

The project contains the server and the web client (a CLI client also exists for debugging purposes).

The server is located at ./server and the web client is located at ./client/website.


Server

The model, the scripts for downloading and training, and the weights from our training are located at ./server/speech_diarization/model

We used Docker to create a cross-platform environment to run the server on.

The server is made up of:

  • a container for the web server
  • a container for the diarization process
  • a container for a redis database that will allow the others to communicate

Docker Compose runs and manages all 3 containers at once.

Docker and docker-compose need to be installed in order to build and run the server; everything else is taken care of.
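The message format the containers exchange through Redis is not described in this README. Purely as an illustration of the layout above, the following Python sketch shows one common pattern: the web server pushes a job onto a queue and the diarization process pops it and stores the result. Everything here is an assumption made for the example: the `InMemoryQueue` stand-in (used so the sketch runs without a Redis server), the job fields `id` and `audio_path`, and the `fake_diarize` function. With the redis-py client, `lpush` and `brpop` on a list key would play the roles of `push` and `pop`.

```python
import json
import queue


class InMemoryQueue:
    """Hypothetical stand-in for a Redis list, so the sketch runs without Redis."""

    def __init__(self):
        self._items = queue.Queue()

    def push(self, message):
        # Messages are serialized to JSON, as they would be before going into Redis.
        self._items.put(json.dumps(message))

    def pop(self, timeout=1.0):
        # Block for up to `timeout` seconds; return None if nothing arrives.
        try:
            return json.loads(self._items.get(timeout=timeout))
        except queue.Empty:
            return None


def submit_job(jobs, job_id, audio_path):
    """Web server side: enqueue a diarization request (field names are assumed)."""
    jobs.push({"id": job_id, "audio_path": audio_path})


def work_once(jobs, results, diarize, timeout=1.0):
    """Diarization side: handle at most one job. Returns True if a job was processed."""
    job = jobs.pop(timeout=timeout)
    if job is None:
        return False
    results[job["id"]] = diarize(job["audio_path"])
    return True


def fake_diarize(audio_path):
    """Placeholder for the real model; returns timestamps in the documented shape."""
    return {"0": [[40, 120]], "1": [[1260, 1660]]}


if __name__ == "__main__":
    jobs, results = InMemoryQueue(), {}
    submit_job(jobs, "job-1", "conversation.wav")
    work_once(jobs, results, fake_diarize)
    print(results)
```

Keeping the queue behind two small functions means the web server and the diarization process only need to agree on the message fields, not on each other's internals.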

Installing

cd server
docker-compose up

This will run all 3 containers and install dependencies.

If you make a change in the server, use

docker-compose up --build

to rebuild.

Usage:

Sending an HTTP POST request with an audio file to the server at http://localhost:1337/upload (default port and URL) returns a JSON object with the timestamps in milliseconds.

{"0": [[40, 120], [3060, 3460], [3480, 3560]], "1": [[1260, 1660], [1680, 1960]]}
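As a minimal sketch of the usage above, the following Python snippet uploads a file and turns the response into a timeline. Two things are assumptions rather than documented behavior: the multipart field name `file`, and the reading of the JSON keys as speaker labels with `[start, end]` pairs as values. The helper names are made up for this example, and `upload_audio` relies on the third-party `requests` package.

```python
import json

UPLOAD_URL = "http://localhost:1337/upload"  # default URL from this README


def upload_audio(path, url=UPLOAD_URL, field="file"):
    """POST an audio file and return the decoded JSON response.

    The multipart field name ("file") is an assumption; check the server code.
    """
    import requests  # third-party; imported here so the helpers below work without it

    with open(path, "rb") as audio:
        response = requests.post(url, files={field: audio}, timeout=300)
    response.raise_for_status()
    return response.json()


def to_timeline(result):
    """Flatten {speaker: [[start_ms, end_ms], ...]} into tuples sorted by start time."""
    segments = [
        (start, end, speaker)
        for speaker, spans in result.items()
        for start, end in spans
    ]
    return sorted(segments)


def speaking_time_ms(result):
    """Total speaking time per speaker, in milliseconds."""
    return {
        speaker: sum(end - start for start, end in spans)
        for speaker, spans in result.items()
    }


if __name__ == "__main__":
    # The sample response shown above, used here instead of a live server.
    sample = json.loads(
        '{"0": [[40, 120], [3060, 3460], [3480, 3560]], '
        '"1": [[1260, 1660], [1680, 1960]]}'
    )
    for start, end, speaker in to_timeline(sample):
        print(f"{start / 1000:7.2f}s - {end / 1000:7.2f}s  speaker {speaker}")
    print(speaking_time_ms(sample))
```

With a running server, `to_timeline(upload_audio("conversation.wav"))` would give the same kind of timeline for a real recording, provided the field name matches what the server expects.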

Client

The client needs npm or yarn to be installed. More info about the client can be found here.

To install:

cd client/website
npm install

Afterwards, you can use

npm run serve

to run a development server.


Authors

License

This project is licensed under the MIT License. See the LICENSE.md file for details.

Acknowledgments

  • The diarization algorithm is an implementation of this research. We also used their implementation of the spectral clustering.

  • We took inspiration and some code from Harry Volek's implementation of a different but similar problem, speaker verification.

Future Plans

  • We had problems training on the AMI corpus, so we used the TIMIT corpus for the model provided.

  • We plan to train again on the VoxCeleb 1 and 2 datasets, which contain much more data and will hopefully improve feature extraction.

  • We want to add integration with a speech-to-text service to transcribe the created segments.

re-verb's People

Contributors

ccactuss, dependabot[bot], tralfazz


re-verb's Issues

Service 'speech_diarization' failed to build

An error occurs when trying to build and launch the project:

Successfully built wave pyyaml wget sklearn future
Failed to build webrtcvad
Installing collected packages: redis, idna, urllib3, certifi, chardet, requests, wave, numpy, future, ffmpeg-python, webrtcvad, pydub, pyyaml, wget, spectralcluster, scipy, speechpy, simpleder, threadpoolctl, joblib, scikit-learn, sklearn
    Running setup.py install for webrtcvad: started
    Running setup.py install for webrtcvad: finished with status 'error'
    ERROR: Command errored out with exit status 1:
     command: /opt/conda/bin/python -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-j9x05w_a/webrtcvad/setup.py'"'"'; __file__='"'"'/tmp/pip-install-j9x05w_a/webrtcvad/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /tmp/pip-record-nh4e9vsi/install-record.txt --single-version-externally-managed --compile --install-headers /opt/conda/include/python3.7m/webrtcvad
         cwd: /tmp/pip-install-j9x05w_a/webrtcvad/
    Complete output (17 lines):
    running install
    running build
    running build_py
    creating build
    creating build/lib.linux-x86_64-3.7
    copying webrtcvad.py -> build/lib.linux-x86_64-3.7
    running build_ext
    building '_webrtcvad' extension
    creating build/temp.linux-x86_64-3.7
    creating build/temp.linux-x86_64-3.7/cbits
    creating build/temp.linux-x86_64-3.7/cbits/webrtc
    creating build/temp.linux-x86_64-3.7/cbits/webrtc/common_audio
    creating build/temp.linux-x86_64-3.7/cbits/webrtc/common_audio/signal_processing
    creating build/temp.linux-x86_64-3.7/cbits/webrtc/common_audio/vad
    gcc -pthread -B /opt/conda/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -DWEBRTC_POSIX -Icbits -I/opt/conda/include/python3.7m -c cbits/pywebrtcvad.c -o build/temp.linux-x86_64-3.7/cbits/pywebrtcvad.o
    unable to execute 'gcc': No such file or directory
    error: command 'gcc' failed with exit status 1
    ----------------------------------------
ERROR: Command errored out with exit status 1: /opt/conda/bin/python -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-j9x05w_a/webrtcvad/setup.py'"'"'; __file__='"'"'/tmp/pip-install-j9x05w_a/webrtcvad/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /tmp/pip-record-nh4e9vsi/install-record.txt --single-version-externally-managed --compile --install-headers /opt/conda/include/python3.7m/webrtcvad Check the logs for full command output.
ERROR: Service 'speech_diarization' failed to build: The command '/bin/sh -c pip install -r requirements.txt --ignore-installed' returned a non-zero code: 1

What is an "utterance" in this case?

So the data's shape is (speaker, utterance, log filterbanks) and the output is (speaker, utterance, embeddings). What is utterance in this case?
