Hi,
Batching is complicated and error-prone, and we discourage users from using it.
from silero-vad.
If a service processes arbitrary audios at arbitrary times, it may become complicated to keep track of this.
If you look at the ONNX wrapper you will see how state can be cached externally.
Thank you for the helpful answer!
My scenario is to process multi-channel audio files, one file at a time, using one Python process with one model; you can consider it serial processing. The number of channels is fixed each time. In my case, I think I don't have to worry about the state problem as long as I remember to reset the state before processing the next file. Am I right?
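The reset-between-files question above can be illustrated with a toy sketch. The `StatefulVAD` class below is a hypothetical stand-in for the real model (silero-vad exposes a similar `reset_states()` method, but everything else here is an assumption for illustration): without a reset, the recurrent state from one file leaks into the next.

```python
# Hypothetical stand-in for a stateful VAD model; a running average plays
# the role of the model's recurrent memory.
class StatefulVAD:
    def __init__(self):
        self.state = 0.0

    def reset_states(self):
        self.state = 0.0

    def __call__(self, chunk):
        # blend previous state with the current chunk's mean level
        self.state = 0.5 * self.state + 0.5 * sum(chunk) / len(chunk)
        return self.state

vad = StatefulVAD()
file_a = [[0.9, 0.9], [0.8, 0.8]]   # "speech-heavy" file
file_b = [[0.1, 0.1]]               # "quiet" file

# WITHOUT a reset, leftover state from file A leaks into file B's score:
for chunk in file_a:
    vad(chunk)
leaky = vad(file_b[0])

# WITH a reset, file B is scored from a clean state:
vad.reset_states()
clean = vad(file_b[0])

print(leaky, clean)  # leaky > clean: file A's history inflated file B
```

So yes, for strictly serial per-file processing, resetting the state at each file boundary is the key requirement.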
Based on my simple tests, the batching result is exactly the same as running single processing multiple times, so I believe we can say the batching result is solid. I also believe that with an appropriate adaptation of the function get_speech_timestamps, it will handle the batching result correctly. By the way, the model output is not perfect for every chunk, but get_speech_timestamps makes the final result almost perfect, which is very impressive. Although I don't fully understand the function, that won't prevent me from applying it to the batching result. I will try it tomorrow and observe how much performance improvement can be achieved through batch processing, because performance improvement is what this is all about.
If I have understood any part of this wrong, please point it out. Until then, I am going to implement my plan.
Thank you so much!
Hi,
Batching is complicated and error-prone, and we discourage users from using it.
Thank you for answering!
One last question: which part is error-prone when doing batched VAD, the result the model returns, or the custom code that deals with the result? If it's the latter, I think the errors are avoidable, right?
Because today I figured out how to pass the batching parameter, I think that if the batching result is solid, it is worth a try.
Looking forward to your answer.
The result the model returns, or the custom code that deals with the result?
The key problem is that the VAD is not stateless, i.e. it holds a state at all times.
When you use a batch, the model holds a separate sequential internal state (or memory) for each batch index.
If a service processes random audios at random times it may become complicated to keep track of this.
If you look at the ONNX wrapper you will see how state can be cached externally.
The optimal architecture may differ for each particular case. For example, if the state is handled externally, you can return it each time and pass it back on the next invocation to process "batches".
The problem arises because most producer-consumer messaging systems do not support batches (apart from Celery, maybe). The architecture can be handled in a bunch of different ways in Python, e.g. using built-in abstractions like ProcessPoolExecutor or ThreadPoolExecutor, or via a messaging system with the Remote-Procedure-Call pattern.
If you will be using it in C++, Java, or something else, I would suggest using ONNX Runtime and taking a look at how our ONNX wrapper handles the state.
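The "state handled externally" idea described above can be sketched in a few lines. Here `step()` is a stand-in for whatever actually runs the model (e.g. an ONNX Runtime `session.run()` whose inputs and outputs include the recurrent state tensors); the function name, the state shape, and the stream-id scheme are all assumptions for illustration. The point is that the caller owns the state and passes it back on every invocation, so one worker can interleave chunks from many independent streams:

```python
# Stand-in for a stateless model invocation: takes (chunk, state),
# returns (speech_probability, new_state). In practice this would wrap
# an ONNX Runtime session.run() call.
def step(chunk, state):
    new_state = 0.5 * state + 0.5 * (sum(chunk) / len(chunk))
    prob = new_state  # toy stand-in for the speech probability
    return prob, new_state

# The service caches one state per stream id; fresh streams start clean.
states = {}

def process(stream_id, chunk):
    state = states.get(stream_id, 0.0)
    prob, states[stream_id] = step(chunk, state)
    return prob

p1 = process("call-A", [0.9, 0.9])   # starts call-A's state
p2 = process("call-B", [0.1, 0.1])   # unaffected by call-A's history
p3 = process("call-A", [0.9, 0.9])   # continues call-A's state
```

With this shape, "batching" becomes a matter of stacking the cached states for whichever streams happen to have a chunk ready, which is exactly the bookkeeping the answer above warns can get complicated.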
Although I don't fully understand the function, it won't prevent me from applying it to the batching result
My advice is to process each channel separately, extract model outputs, then run the function separately for each channel, and then apply some simple voting mechanism.
You see, the heuristics in the post-processing function are very non-batch friendly.
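One possible reading of the "simple voting mechanism" suggested above (my interpretation, not code from the repo): run get_speech_timestamps separately per channel, rasterise each channel's segments onto a common time grid, and keep only the instants where at least `min_votes` channels agree. The segment format `{"start": ..., "end": ...}` mirrors what get_speech_timestamps returns; the voting logic itself is an assumption.

```python
# Majority vote over per-channel speech segments.
def vote(channel_segments, total_len, min_votes=2):
    votes = [0] * total_len
    for segments in channel_segments:
        for seg in segments:
            for t in range(seg["start"], seg["end"]):
                votes[t] += 1
    # collapse the voted mask back into segments
    result, start = [], None
    for t, v in enumerate(votes):
        if v >= min_votes and start is None:
            start = t
        elif v < min_votes and start is not None:
            result.append({"start": start, "end": t})
            start = None
    if start is not None:
        result.append({"start": start, "end": total_len})
    return result

ch0 = [{"start": 10, "end": 50}]
ch1 = [{"start": 20, "end": 60}]
merged = vote([ch0, ch1], total_len=100, min_votes=2)
# keeps only the overlap of the two channels: [{"start": 20, "end": 50}]
```

Setting `min_votes=1` instead would union the channels rather than intersect them; which is appropriate depends on whether the channels carry the same speakers.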
Although I don't fully understand the function, it won't prevent me from applying it to the batching result
My advice is to process each channel separately, extract model outputs, then run the function separately for each channel, and then apply some simple voting mechanism.
You see, the heuristics in the post-processing function are very non-batch friendly.
What about processing all channels in a batch, then applying the voting mechanism to each result separately?
I tried a 20 s audio file with 2 channels: it takes about 0.28 s to process the channels separately, but only 0.16 s when batching (Intel Xeon W-2245 CPU). That improvement is big enough. The only thing I worry about is accuracy, but if the batching result is solid, there must be a way to handle it correctly. In other words, the post-processing function doesn't have to be batch-friendly. Do you agree?