Comments (28)
from silero-vad.
New V3 Silero VAD is Already Here
Main changes
- One VAD to rule them all! New model includes the functionality of the previous ones with improved quality and speed!
- Flexible sampling rate, 8000 Hz and 16000 Hz are supported;
- Flexible chunk size, minimum chunk size is just 30 milliseconds!
- 100k parameters;
- GPU and batching are supported;
- Radically simplified examples;
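To make the chunk-size claim concrete, here is a small sketch that converts between milliseconds and samples for the two supported sampling rates. The helper names `ms_to_samples` and `samples_to_ms` are hypothetical and are not part of the silero-vad utilities; this is plain arithmetic only.

```python
def ms_to_samples(duration_ms: int, sampling_rate: int) -> int:
    """Number of samples that cover duration_ms at sampling_rate (in Hz)."""
    return sampling_rate * duration_ms // 1000


def samples_to_ms(num_samples: int, sampling_rate: int) -> float:
    """Duration in milliseconds of num_samples at sampling_rate (in Hz)."""
    return 1000 * num_samples / sampling_rate


# The 30 ms minimum chunk expressed in samples for each supported rate.
for sr in (8000, 16000):
    print(f"{sr} Hz: 30 ms = {ms_to_samples(30, sr)} samples")
```

Under this arithmetic, a 30 ms chunk is 240 samples at 8000 Hz and 480 samples at 16000 Hz.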
Migration
Please see the new examples.
New get_speech_timestamps is a simplified and unified version of the old deprecated get_speech_ts or get_speech_ts_adaptive methods.
speech_timestamps = get_speech_timestamps(wav, model, sampling_rate=16000)
New VADIterator class serves as an example for streaming tasks instead of the old deprecated VADiterator and VADiteratorAdaptive.
vad_iterator = VADIterator(model)
window_size_samples = 1536  # samples per chunk
for i in range(0, len(wav), window_size_samples):
    speech_dict = vad_iterator(wav[i: i + window_size_samples], return_seconds=True)
    if speech_dict:
        print(speech_dict, end=' ')
vad_iterator.reset_states()  # reset model states after each audio
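One detail the loop above leaves open is the final chunk, which is shorter than window_size_samples whenever the audio length is not a multiple of the window. The sketch below shows one possible way to handle it, by zero-padding the last window. The helper `iter_windows` is hypothetical, and whether the model needs padding (rather than simply dropping or accepting the short chunk) is an assumption, not something stated in this thread.

```python
from typing import Iterator, List, Sequence


def iter_windows(wav: Sequence[float], window_size_samples: int) -> Iterator[List[float]]:
    """Yield consecutive fixed-size windows, zero-padding the last one if it is short."""
    for start in range(0, len(wav), window_size_samples):
        chunk = list(wav[start:start + window_size_samples])
        missing = window_size_samples - len(chunk)
        if missing > 0:
            # Assumption: pad with silence so every window has the same length.
            chunk.extend([0.0] * missing)
        yield chunk


# Dummy audio: 4000 samples, which is not a multiple of 1536.
wav = [0.1] * 4000
windows = list(iter_windows(wav, 1536))
print(len(windows), [len(w) for w in windows])
```

Each yielded window could then be passed to the iterator in place of the raw slice.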
from silero-vad.
Initial models, examples, utils for VAD only uploaded (no number detector or language classifier yet)
from silero-vad.
First readable public release
from silero-vad.
Added VAD latency and throughput metrics
from silero-vad.
Updated VAD quality
Before / after (precision / recall)
from silero-vad.
Added number detector
from silero-vad.
Language detector example, readme update + FAQ
from silero-vad.
Audiotok benchmarks added
Looks like all energy-based solutions are kind of similar
from silero-vad.
Added a utility to tune the VAD params properly for a domain
from silero-vad.
Some final benchmarks posted here - pyannote/pyannote-audio#604 (comment)
Probably we are done with benchmarks for now
from silero-vad.
Added micro (10k params, 100x smaller) VAD models
from silero-vad.
Added micro (10k params, 100x smaller) VAD models for 8 kHz audio
from silero-vad.
- Added mini (100k params) VAD models for 8 kHz and 16 kHz
- Added adaptive VAD iterator
from silero-vad.
- Fixed examples and notebooks
- Updated README
- Added adaptive examples
from silero-vad.
- Added a language classifier for 116 languages
- It classifies audio into languages and mutually intelligible language groups (e.g. Serbian + Bosnian + Croatian, Russian + Ukrainian + others, Hindi + Urdu, etc.), see the full list here and here
- Probably some artificial / unspoken languages will be excluded and a large model will be trained
from silero-vad.
Improved language classifier
- 95 languages (85% accuracy), 58 language groups (90% accuracy)
- Mutually intelligible languages are united into language groups (e.g. Serbian + Croatian + Bosnian are very similar)
- Trained on approx 20k hours of data (10k of which are for 5 most popular languages)
- 4.7M params
from silero-vad.
Updated further reading section
from silero-vad.
Even Better V3 Silero VAD
- Models with even higher quality (just see the plots with metrics!);
- New model ~ large model >> all previous (even large) models;
- Now the model behaves as expected quality-wise, i.e. 100 ms > 60 ms > 30 ms chunks and 16 kHz > 8 kHz;
from silero-vad.
This summarises new progress well
from silero-vad.
New V3 ONNX VAD Released
We finally were able to port a model to ONNX:
- Compact model (~100k params);
- Both PyTorch and ONNX models are not quantized;
- Same quality model as the latest best PyTorch release;
- Only 16 kHz is available for now (ONNX has some issues with if-statements and / or tracing vs scripting, with cryptic errors);
- In our tests, on short audios (chunks) ONNX is 2-3x faster than PyTorch (this is mitigated with larger batches or long audios);
- Audio examples and non-core models moved out of the repo to save space;
from silero-vad.
Support For Sampling Rates Higher Than 16 kHz
- jit model now can handle 8, 16, 32 and 48 kHz directly (change implemented within the model itself);
- onnx model as well, but only via external wrappers (we just use each n-th sample for higher sampling rates);
- This support is mostly a hack, i.e. we just use each n-th sample for higher sampling rates (instead of averaging);
from silero-vad.
⚠️ Important Information for VAD Python Users ⚠️
If you are using the VAD in a:
- multi-threaded or
- multi-process application
Do not forget to disable gradients in EACH process and / or thread.
Otherwise memory may leak noticeably.
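A minimal sketch of what this can look like with Python threads, assuming PyTorch is installed. The worker function and the dummy tensor are placeholders for real VAD inference; the only point is that `torch.set_grad_enabled(False)` is called inside every thread, since grad mode in PyTorch is thread-local. In a multi-process application the same call would go at the start of each worker process.

```python
import threading

import torch


def worker(results: list, idx: int) -> None:
    # Grad mode is thread-local, so it is disabled inside the thread itself.
    torch.set_grad_enabled(False)
    # Placeholder for model inference: any tensor computation will do here.
    x = torch.ones(4, requires_grad=True)
    y = (x * 2).sum()
    # With gradients disabled, no autograd graph is recorded for y.
    results[idx] = y.requires_grad


results = [None] * 4
threads = [threading.Thread(target=worker, args=(results, i)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results)
```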
from silero-vad.
New V4 VAD Released
Changes:
- Improved quality
- Improved performance
- Both 8k and 16k sampling rates are now supported by the ONNX model
- Batching is now supported by the ONNX model
- Added audio_forward method for one-line processing of one or more audio files without postprocessing
from silero-vad.
It is worth posting this chart:
from silero-vad.
- Remove picovoice mentions
from silero-vad.
- Deprecate language classifier and number detector models, since they are not maintained anymore.
from silero-vad.