Comments (28)

adamnsandle avatar adamnsandle commented on June 30, 2024

Added < 250ms compatibility
image

from silero-vad.

snakers4 avatar snakers4 commented on June 30, 2024

New V3 Silero VAD is Already Here

Main changes

  • One VAD to rule them all! New model includes the functionality of the previous ones with improved quality and speed!
  • Flexible sampling rate, 8000 Hz and 16000 Hz are supported;
  • Flexible chunk size, minimum chunk size is just 30 milliseconds!
  • 100k parameters;
  • GPU and batching are supported;
  • Radically simplified examples;
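
To make the chunk-size bullet concrete, here is a small sketch that converts a chunk duration into a sample count for the two supported sampling rates. The helper name `chunk_samples` is hypothetical and is not part of the silero-vad utilities.

```python
def chunk_samples(duration_ms: int, sampling_rate: int) -> int:
    """Number of samples in a chunk of the given duration (hypothetical helper)."""
    return duration_ms * sampling_rate // 1000


# Minimum chunk size (30 ms) at both supported sampling rates.
for rate in (8000, 16000):
    print(rate, chunk_samples(30, rate))
```

Going the other way, a window of 1536 samples at 16000 Hz lasts 1536 / 16000 = 0.096 s, i.e. 96 ms.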

Migration

Please see the new examples.

The new get_speech_timestamps is a simplified, unified replacement for the deprecated get_speech_ts and get_speech_ts_adaptive methods.

speech_timestamps = get_speech_timestamps(wav, model, sampling_rate=16000)
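
A minimal sketch of post-processing the result, assuming the function returns a list of dicts with `start` and `end` given as sample indices. The values below are made up for illustration, and `to_seconds` is a hypothetical helper, not a library function.

```python
SAMPLING_RATE = 16000

# Made-up values shaped like the assumed output (sample indices).
speech_timestamps = [
    {"start": 16000, "end": 32000},
    {"start": 48000, "end": 56000},
]


def to_seconds(timestamps, sampling_rate):
    """Convert sample-index boundaries to seconds (hypothetical helper)."""
    return [
        {"start": ts["start"] / sampling_rate, "end": ts["end"] / sampling_rate}
        for ts in timestamps
    ]


segments = to_seconds(speech_timestamps, SAMPLING_RATE)
total_speech = sum(seg["end"] - seg["start"] for seg in segments)
print(segments)
print(total_speech)
```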

The new VADIterator class serves as an example for streaming tasks, replacing the deprecated VADiterator and VADiteratorAdaptive.

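vad_iterator = VADIterator(model)
window_size_samples = 1536  # number of samples in a single audio chunk

# Feed the audio to the iterator one chunk at a time
for i in range(0, len(wav), window_size_samples):
    speech_dict = vad_iterator(wav[i: i + window_size_samples], return_seconds=True)
    if speech_dict:
        print(speech_dict, end=' ')
vad_iterator.reset_states()  # reset the model states after each audio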
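
A rough sketch of how the per-chunk results from such a loop could be collected into segments. It assumes each call yields either nothing, a dict with a `start` key, or a dict with an `end` key. The event values are made up, and `collect_segments` is a hypothetical helper.

```python
def collect_segments(events):
    """Pair {'start': t} / {'end': t} events into (start, end) tuples."""
    segments = []
    current_start = None
    for event in events:
        if not event:  # no speech boundary detected in this chunk
            continue
        if "start" in event:
            current_start = event["start"]
        elif "end" in event and current_start is not None:
            segments.append((current_start, event["end"]))
            current_start = None
    return segments


# Made-up per-chunk results standing in for successive iterator calls.
events = [None, {"start": 0.5}, None, None, {"end": 2.1}, None, {"start": 3.0}, {"end": 4.2}]
print(collect_segments(events))
```

A segment that has started but not yet ended is simply left open here; a real streaming application would need to decide how to handle it when the audio stops.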

snakers4 avatar snakers4 commented on June 30, 2024

Initial models, examples, utils for VAD only uploaded (no number detector or language classifier yet)

snakers4 avatar snakers4 commented on June 30, 2024

First readable public release

snakers4 avatar snakers4 commented on June 30, 2024

Added VAD latency and throughput metrics

snakers4 avatar snakers4 commented on June 30, 2024

Updated VAD quality
Before / after (precision / recall)
image

Sontref avatar Sontref commented on June 30, 2024

Added number detector

snakers4 avatar snakers4 commented on June 30, 2024

Language detector example, readme update + FAQ

snakers4 avatar snakers4 commented on June 30, 2024

Audiotok benchmarks added
Looks like all energy based solutions are kind of similar

snakers4 avatar snakers4 commented on June 30, 2024

Added a utility to tune the VAD params properly for a domain

snakers4 avatar snakers4 commented on June 30, 2024

Some final benchmarks posted here - pyannote/pyannote-audio#604 (comment)
Probably we are done with benchmarks for now

snakers4 avatar snakers4 commented on June 30, 2024

Added micro (10k params, 100x smaller) VAD models

snakers4 avatar snakers4 commented on June 30, 2024

Added micro (10k params, 100x smaller) VAD models for 8 kHz audio

snakers4 avatar snakers4 commented on June 30, 2024
  • Added mini (100k params) VAD models for 8 kHz and 16 kHz
  • Added adaptive vad iterator

#54

snakers4 avatar snakers4 commented on June 30, 2024
  • Fixed examples and notebooks
  • Updated README
  • Added adaptive examples

snakers4 avatar snakers4 commented on June 30, 2024
  • Added a language classifier for 116 languages
  • It classifies audio into languages and mutually intelligible language groups (e.g. Serbian + Bosnian + Croatian, Russian + Ukrainian + others, Hindi + Urdu, etc.), see the full list here and here
  • Probably some artificial / unspoken languages will be excluded and a large model will be trained

snakers4 avatar snakers4 commented on June 30, 2024

improved language classifier

  • 95 languages (85% accuracy), 58 language groups (90% accuracy)
  • Mutually intelligible languages are united into language groups (e.g. Serbian, Croatian and Bosnian are very similar)
  • Trained on approximately 20k hours of data (10k of which cover the 5 most popular languages)
  • 4.7M params

snakers4 avatar snakers4 commented on June 30, 2024

updated further reading section

snakers4 avatar snakers4 commented on June 30, 2024

Even Better V3 Silero VAD

  • Models with even higher quality (just see the plots with metrics!);
  • New model ~ large model >> all previous (even large) models;
  • The model now behaves as expected quality-wise, i.e. 100 ms > 60 ms > 30 ms chunks and 16 kHz > 8 kHz;

snakers4 avatar snakers4 commented on June 30, 2024

This summarises new progress well

image

snakers4 avatar snakers4 commented on June 30, 2024

New V3 ONNX VAD Released

We finally were able to port a model to ONNX:

  • Compact model (~100k params);
  • Both PyTorch and ONNX models are not quantized;
  • Same quality model as the latest best PyTorch release;
  • Only 16 kHz is available for now (ONNX has some issues with if-statements and / or tracing vs. scripting, and fails with cryptic errors);
  • In our tests, on short audios (chunks) ONNX is 2-3x faster than PyTorch (the gap narrows with larger batches or long audios);
  • Audio examples and non-core models moved out of the repo to save space;

snakers4 avatar snakers4 commented on June 30, 2024

Support For Sampling Rates Higher Than 16 kHz

  • jit model now can handle 8, 16, 32 and 48 kHz directly (change implemented within the model itself);
  • onnx model as well, but only via external wrappers;
  • This support is mostly a hack, i.e. we just use each n-th sample for higher sampling rates (instead of averaging);
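
The "each n-th sample" approach can be sketched in plain Python as below, with an averaging variant alongside for contrast. Both function names are hypothetical, the input is a plain list rather than a tensor, and this is an illustration of the idea rather than the code used inside the model or the wrappers.

```python
def decimate(samples, sampling_rate, target_rate=16000):
    """Keep every n-th sample, where n = sampling_rate / target_rate."""
    if sampling_rate % target_rate != 0:
        raise ValueError("sampling rate must be a multiple of the target rate")
    step = sampling_rate // target_rate
    return samples[::step], target_rate


def average_downsample(samples, sampling_rate, target_rate=16000):
    """Alternative for contrast: average each group of n samples."""
    if sampling_rate % target_rate != 0:
        raise ValueError("sampling rate must be a multiple of the target rate")
    step = sampling_rate // target_rate
    usable = len(samples) - len(samples) % step  # drop an incomplete tail group
    averaged = [sum(samples[i:i + step]) / step for i in range(0, usable, step)]
    return averaged, target_rate


audio_48k = [0, 3, 6, 9, 12, 15]  # made-up samples at 48 kHz
print(decimate(audio_48k, 48000))
print(average_downsample(audio_48k, 48000))
```

Dropping samples is cheaper than averaging, but it applies no low-pass filtering, which is presumably why the comment calls it a hack.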

snakers4 avatar snakers4 commented on June 30, 2024

⚠️ Important Information for VAD Python Users ⚠️

If you are using the VAD in a:

  • multi-threaded or
  • multi-process application,

do not forget to disable gradients in EACH process and / or thread.
Otherwise memory may leak noticeably.
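
One possible way to follow this advice with a thread pool is sketched below, assuming PyTorch is installed. `torch.set_grad_enabled` and `torch.is_grad_enabled` are standard PyTorch calls; the pool layout and the function names `init_worker` and `grad_enabled_in_worker` are illustrative only.

```python
from concurrent.futures import ThreadPoolExecutor

import torch


def init_worker():
    # Gradient mode is thread-local in PyTorch, so every worker
    # has to switch it off for itself.
    torch.set_grad_enabled(False)


def grad_enabled_in_worker(_):
    # Stand-in for real VAD inference: report the gradient mode seen by the worker.
    return torch.is_grad_enabled()


with ThreadPoolExecutor(max_workers=2, initializer=init_worker) as pool:
    flags = list(pool.map(grad_enabled_in_worker, range(4)))

print(flags)
```

`ProcessPoolExecutor` accepts the same `initializer` argument, so the same pattern should carry over to a multi-process setup.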

snakers4 avatar snakers4 commented on June 30, 2024

image

image

adamnsandle avatar adamnsandle commented on June 30, 2024

New V4 VAD Released

Changes:

  • Improved quality
  • Improved performance
  • Both 8k and 16k sampling rates are now supported by the ONNX model
  • Batching is now supported by the ONNX model
  • Added audio_forward method for one-line processing of a single audio or a batch of audios without postprocessing

snakers4 avatar snakers4 commented on June 30, 2024

It is worth posting this chart:

image

snakers4 avatar snakers4 commented on June 30, 2024
  • Remove picovoice mentions

snakers4 avatar snakers4 commented on June 30, 2024
  • Deprecate language classifier and number detector models, since they are not maintained anymore.
