Comments (6)

snakers4 commented on August 16, 2024

Hi,

Batching is complicated and error-prone, and we discourage users from using it.

Simon-chai commented on August 16, 2024

If a service processes random audios at random times it may become complicated to keep track of this.
If you look at the ONNX wrapper you will see how state can be cached externally.

Thank you for the helpful answer!
My scenario is processing multi-channel audio files, one file at a time, in a single Python process with one model. You can consider it serial processing, and the number of channels is fixed each time. In my case, I think I don't have to worry about state as long as I remember to reset it before processing the next file, am I right?
Based on my simple tests, the batching result is exactly the same as running single inference multiple times, so I believe we can say the batching result is solid. I also believe that, with an appropriate adaptation, the function get_speech_timestamps will handle the batching result correctly. By the way, the model output is not perfect for every chunk, but get_speech_timestamps makes the final result almost perfect, which is very impressive. Although I don't fully understand the function, that won't prevent me from applying it to the batching result. I will try it tomorrow and see how much performance improvement batch processing brings, because performance improvement is what this is all about.
If I got any part of this wrong, please point it out. Until then, I am going to implement my plan.
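
A minimal sketch of that plan, assuming the torch.hub model exposes reset_states() as in the silero-vad examples (the file names are placeholders):

```python
import torch
import torchaudio

# Load the model and helpers via torch.hub, as in the silero-vad README.
model, utils = torch.hub.load('snakers4/silero-vad', 'silero_vad')
get_speech_timestamps = utils[0]

SR = 16000
files = ['file_01.wav', 'file_02.wav']        # placeholder file names

for path in files:
    wav, sr = torchaudio.load(path)           # shape: (channels, samples)
    if sr != SR:
        wav = torchaudio.functional.resample(wav, sr, SR)
    for ch in range(wav.shape[0]):            # fixed channel count per file
        model.reset_states()                  # reset state before each channel/file
        print(path, ch, get_speech_timestamps(wav[ch], model, sampling_rate=SR))
```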
Thank you so much!

Simon-chai commented on August 16, 2024

Hi,

Batching is complicated and error-prone, and we discourage users from using it.

Thank you for answering!
One last question: which part is error-prone when doing batched VAD? The result the model returns, or the custom code that deals with the result? If it's the latter, I think the errors are avoidable, right?
I ask because today I figured out how to pass a batched input, and I think that if the batching result is solid, it's worth a try.
Looking forward to your answer.

snakers4 commented on August 16, 2024

The result the model returns, or the custom code that deals with the result?

The key problem is that the VAD is not stateless, i.e. it holds state at all times.
When you use a batch, it keeps a separate sequential internal state (or memory) for each batch index.

If a service processes random audios at random times it may become complicated to keep track of this.
If you look at the ONNX wrapper you will see how state can be cached externally.

The optimal architecture may differ for each particular case. For example, if the state is handled externally, you can return it on each call, pass it back on the next invocation, and process "batches" that way.
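
A rough sketch of that external-state pattern with onnxruntime; the input/output names ('input', 'state', 'sr') and the (2, batch, 128) state shape are assumptions taken from one version of the silero-vad ONNX model and may differ between versions, so check the ONNX wrapper in the repo before relying on them:

```python
import numpy as np
import onnxruntime as ort

SR = 16000
CHUNK = 512                                      # samples per chunk at 16 kHz
BATCH = 2                                        # e.g. two channels side by side

sess = ort.InferenceSession('silero_vad.onnx')   # path to the ONNX model file

# The caller owns the state and passes it back on every invocation.
state = np.zeros((2, BATCH, 128), dtype=np.float32)

def vad_step(chunks, state):
    """chunks: float32 array of shape (BATCH, CHUNK); returns (probs, new_state)."""
    probs, new_state = sess.run(
        None,
        {'input': chunks,
         'state': state,
         'sr': np.array(SR, dtype=np.int64)},
    )
    return probs.reshape(-1), new_state          # one speech probability per batch row

# One step with silence; a real service would persist `state` per stream
# and pass it back on the next call for the same stream.
probs, state = vad_step(np.zeros((BATCH, CHUNK), dtype=np.float32), state)
```

The important point is only that the caller, not the session, owns the state and hands the returned state back on the next call for the same stream.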

The problem arises because most publisher-consumer messaging systems do not support batches (apart from Celery, maybe). The architecture can be handled in a number of different ways in Python, e.g. using built-in abstractions like ProcessPoolExecutor or ThreadPoolExecutor, or via a messaging system with a Remote Procedure Call pattern.
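
For instance, a minimal ProcessPoolExecutor layout with one model loaded per worker process, so no internal state is ever shared between audios (the helper names and file list are hypothetical):

```python
from concurrent.futures import ProcessPoolExecutor

import torch

_model = None
_utils = None

def _init_worker():
    """Load one model per worker process so no internal state is shared."""
    global _model, _utils
    _model, _utils = torch.hub.load('snakers4/silero-vad', 'silero_vad')

def _process_file(path):
    get_speech_timestamps, _, read_audio, _, _ = _utils
    _model.reset_states()
    wav = read_audio(path, sampling_rate=16000)   # read_audio returns mono at 16 kHz
    return path, get_speech_timestamps(wav, _model, sampling_rate=16000)

if __name__ == '__main__':
    files = ['a.wav', 'b.wav', 'c.wav']           # placeholder inputs
    with ProcessPoolExecutor(max_workers=2, initializer=_init_worker) as pool:
        for path, segments in pool.map(_process_file, files):
            print(path, segments)
```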

If you are going to use it from C++, Java, or something else, I would suggest using the ONNX runtime and taking a look at how our ONNX wrapper handles the state.

snakers4 commented on August 16, 2024

Although I don't fully understand the function, that won't prevent me from applying it to the batching result

My advice is to process each channel separately, extract model outputs, then run the function separately for each channel, and then apply some simple voting mechanism.

You see, the heuristics in the post-processing function are very non-batch friendly.
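
To make that concrete, here is a hedged sketch of the per-channel approach; the majority-vote rule at the end is only an illustration, not something silero-vad provides:

```python
import torch
import torchaudio

model, utils = torch.hub.load('snakers4/silero-vad', 'silero_vad')
get_speech_timestamps = utils[0]

SR = 16000
wav, sr = torchaudio.load('multichannel.wav')   # placeholder file, shape (channels, samples)
if sr != SR:
    wav = torchaudio.functional.resample(wav, sr, SR)

# 1) Run the model and post-processing separately per channel
#    (get_speech_timestamps returns {'start': ..., 'end': ...} sample offsets by default).
per_channel = []
for ch in range(wav.shape[0]):
    model.reset_states()
    per_channel.append(get_speech_timestamps(wav[ch], model, sampling_rate=SR))

# 2) Simple vote: mark a sample as speech if at least half of the channels agree.
votes = torch.zeros(wav.shape[1])
for segments in per_channel:
    for seg in segments:
        votes[seg['start']:seg['end']] += 1
speech_mask = votes >= max(1, wav.shape[0] / 2)
```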

Simon-chai commented on August 16, 2024

Although I don't fully understand the function, that won't prevent me from applying it to the batching result

My advice is to process each channel separately, extract model outputs, then run the function separately for each channel, and then apply some simple voting mechanism.

You see, the heuristics in the post-processing function are very non-batch friendly.

What about processing all channels in a batch and then applying the voting mechanism to each channel's result separately?
I tried a 20 s audio clip with 2 channels: it takes about 0.28 s to process when handling the channels separately, but only 0.16 s when batching (Intel Xeon W-2245 CPU). The improvement is big enough. The only thing I worry about is accuracy, but if the batching result is solid, there must be a way to deal with it correctly. In other words, the post-processing function doesn't have to be batch-friendly, do you agree?
