Comments (4)
Can you quantify "slow"?
When you just process the whole audio, you obviously assume that you have this audio on disk or in memory (or wherever).
Therefore you get tremendous boosts from batching. If you process files on disk, there is no point in streaming in the case of PyTorch NNs.
When you do streaming, you pretend to "listen" to the audio one chunk at a time, as in real life when you listen to someone.
Either way it ends up running much faster than real time, because in the example the Python iterator does not really "wait" for the audio to play. But processing audio one chunk at a time is obviously slower than batching several chunks together.
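The difference between the two modes can be sketched in plain Python. This is only an illustration under stated assumptions: `toy_model` is a hypothetical stand-in for the network's forward pass (it is not part of silero-vad), and the window and batch sizes are arbitrary. Both modes produce the same per-window scores, but the batched mode makes far fewer model calls, which is where the speed-up comes from when each call carries fixed overhead.

```python
from typing import Iterator, List, Sequence, Tuple


def iter_windows(samples: Sequence[float], window_size: int) -> Iterator[Sequence[float]]:
    """Yield consecutive, non-overlapping windows (a trailing partial window is dropped)."""
    for start in range(0, len(samples) - window_size + 1, window_size):
        yield samples[start:start + window_size]


def toy_model(batch: List[Sequence[float]]) -> List[float]:
    """Hypothetical stand-in for a forward pass: mean absolute amplitude per window."""
    return [sum(abs(x) for x in window) / len(window) for window in batch]


def run_streaming(samples: Sequence[float], window_size: int) -> Tuple[List[float], int]:
    """Feed the model one window per call, as a streaming loop would."""
    scores: List[float] = []
    calls = 0
    for window in iter_windows(samples, window_size):
        scores.extend(toy_model([window]))
        calls += 1
    return scores, calls


def run_batched(samples: Sequence[float], window_size: int, batch_size: int) -> Tuple[List[float], int]:
    """Group windows into batches and feed the model one batch per call."""
    scores: List[float] = []
    calls = 0
    batch: List[Sequence[float]] = []
    for window in iter_windows(samples, window_size):
        batch.append(window)
        if len(batch) == batch_size:
            scores.extend(toy_model(batch))
            calls += 1
            batch = []
    if batch:  # flush the last, possibly smaller, batch
        scores.extend(toy_model(batch))
        calls += 1
    return scores, calls


# Toy signal: 1600 samples of silence followed by 1600 samples of constant amplitude.
samples = [0.0] * 1600 + [1.0] * 1600
stream_scores, stream_calls = run_streaming(samples, window_size=400)
batch_scores, batch_calls = run_batched(samples, window_size=400, batch_size=8)
print(stream_calls, batch_calls)
```

With a real network the per-call overhead (Python dispatch, framework bookkeeping) is paid once per call, so fewer and larger calls are usually cheaper per window.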
If the speed is not enough for your application, try the 10k model: it is 3-4x faster on server hardware and 100x smaller.
We also did some research on model speed. When you mimic streaming, it takes:
- Around 3-4 ms for a small model per 300 ms audio batch (each batch consists of 8 windows, so you may assume that 1 window "takes" around 0.5 ms);
- One 30 ms window in WebRTC takes around 0.05 ms, so a 300 ms window would be around 0.5 ms;
- We tried going even smaller (1k params), but the speed boosts are negligible and quality starts to drop off even more;
- We are already on the same order of magnitude as WebRTC, and I doubt you can go much faster.
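The arithmetic behind these figures can be checked with a few lines of Python. This is just a worked calculation using the numbers quoted above; the helper names are made up for illustration, and the 3.5 ms midpoint is an assumption taken from the 3-4 ms range.

```python
def per_window_ms(batch_ms: float, windows_per_batch: int) -> float:
    """Average compute time attributed to one window of a batch."""
    return batch_ms / windows_per_batch


def scaled_cost_ms(audio_ms: float, window_ms: float, cost_per_window_ms: float) -> float:
    """Compute time for audio_ms of audio when each window_ms window costs cost_per_window_ms."""
    return audio_ms / window_ms * cost_per_window_ms


def realtime_factor(audio_ms: float, compute_ms: float) -> float:
    """How many times faster than real time the processing runs."""
    return audio_ms / compute_ms


# Small model: 3-4 ms per 300 ms batch of 8 windows.
model_low = per_window_ms(3.0, 8)    # 0.375 ms per window
model_high = per_window_ms(4.0, 8)   # 0.5 ms per window

# WebRTC: 0.05 ms per 30 ms window, scaled up to 300 ms of audio.
webrtc_300 = scaled_cost_ms(300.0, 30.0, 0.05)

# Real-time factors for 300 ms of audio (3.5 ms is the assumed midpoint).
model_rtf = realtime_factor(300.0, 3.5)
webrtc_rtf = realtime_factor(300.0, webrtc_300)

print(model_low, model_high, webrtc_300, round(model_rtf, 1), round(webrtc_rtf, 1))
```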
from silero-vad.
But the speed is slow compared with the "Full Audio" example (which takes 2.879 s).
This is roughly in line with our single-thread benchmarks (since this file is 60 seconds long), assuming you use num_steps=8 (plus maybe some initial warm-up time).
We have decided not to publish the GPU version (because it has very little production use), but if your target is processing files on disk, you can run 10 single-thread processes in parallel and you will get 300-400 RTS, which is ample.
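As a rough check of these numbers, here is a small calculation. It assumes that "RTS" means seconds of audio processed per second of wall-clock time, and that throughput scales linearly with the number of processes; both are assumptions for illustration, not something stated in this thread.

```python
def rts(audio_seconds: float, wall_seconds: float) -> float:
    """Seconds of audio processed per second of wall-clock time (assumed meaning of RTS)."""
    return audio_seconds / wall_seconds


def aggregate_rts(per_process_rts: float, num_processes: int) -> float:
    """Total throughput, assuming ideal linear scaling across independent processes."""
    return per_process_rts * num_processes


# The reported run: a 60-second file processed in 2.879 s (warm-up included).
reported = rts(60.0, 2.879)

# 300-400 RTS across 10 processes implies 30-40 RTS per process.
implied_low = 300.0 / 10
implied_high = 400.0 / 10

print(round(reported, 1), implied_low, implied_high, aggregate_rts(reported, 10))
```

The reported single run lands a little below the implied per-process range, which fits the remark about initial warm-up time.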
How can I try the 10k model?
I use the example code to download the model:
model, utils = torch.hub.load(repo_or_dir='snakers4/silero-vad',
                              model='silero_vad',
                              force_reload=True)
how to try 10k model
https://github.com/snakers4/silero-vad#getting-started
model, utils = torch.hub.load(repo_or_dir='snakers4/silero-vad',
                              model='silero_vad_micro',
                              force_reload=True)