Comments (6)

snakers4 commented on August 16, 2024

Hi,

Batching is complicated and error-prone, and we discourage users from using it.

Simon-chai commented on August 16, 2024

If a service processes random audios at random times it may become complicated to keep track of this.
If you look at the ONNX wrapper you will see how state can be cached externally.

Thank you for the helpful answer!
My scenario is processing multi-channel audio files, one file at a time, in a single Python process with one model. You can consider it serial processing, and the number of channels is fixed each time. In my case, I think I don't have to worry about state as long as I remember to reset it before processing the next file, am I right?
Based on my simple tests, the batching result is exactly the same as running single inference multiple times, so I believe we can say the batching result is solid. I also believe that, with an appropriate adaptation, the function get_speech_timestamps will handle the batching result correctly. By the way, the model output is not perfect for every chunk, but get_speech_timestamps makes the final result almost perfect, which is very impressive. Although I don't fully understand the function, that won't prevent me from applying it to the batching result. I will try it tomorrow and see how much performance improvement batch processing brings, because performance improvement is what this is all about.
If I got any part of this wrong, please point it out. Until then, I am going to implement my plan.
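
A minimal sketch of that plan, assuming the torch.hub model exposes reset_states() as in the silero-vad examples (the file names are placeholders):

```python
import torch
import torchaudio

# Load the model and helpers via torch.hub, as in the silero-vad README.
model, utils = torch.hub.load('snakers4/silero-vad', 'silero_vad')
get_speech_timestamps = utils[0]

SR = 16000
files = ['file_01.wav', 'file_02.wav']        # placeholder file names

for path in files:
    wav, sr = torchaudio.load(path)           # shape: (channels, samples)
    if sr != SR:
        wav = torchaudio.functional.resample(wav, sr, SR)
    for ch in range(wav.shape[0]):            # fixed channel count per file
        model.reset_states()                  # reset state before each channel/file
        print(path, ch, get_speech_timestamps(wav[ch], model, sampling_rate=SR))
```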
Thank you so much!

Simon-chai commented on August 16, 2024

Hi,

Batching is complicated and error-prone, and we discourage users from using it.

Thank you for answering!
One last question: which part is error-prone when doing batched VAD? The result the model returns, or the custom code that deals with the result? If it's the latter, I think the errors are avoidable, right?
I ask because today I figured out how to pass a batched input, and I think that if the batching result is solid, it's worth a try.
Looking forward to your answer.

snakers4 commented on August 16, 2024

The result the model returns, or the custom code that deals with the result?

The key problem is that the VAD is not stateless, i.e. it holds state at all times.
When you use a batch, it keeps a separate sequential internal state (or memory) for each batch index.

If a service processes random audios at random times it may become complicated to keep track of this.
If you look at the ONNX wrapper you will see how state can be cached externally.

The optimal architecture may differ for each particular case. For example, if the state is handled externally, you can return it on each call, pass it back on the next invocation, and process "batches" that way.
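
A rough sketch of that external-state pattern with onnxruntime; the input/output names ('input', 'state', 'sr') and the (2, batch, 128) state shape are assumptions taken from one version of the silero-vad ONNX model and may differ between versions, so check the ONNX wrapper in the repo before relying on them:

```python
import numpy as np
import onnxruntime as ort

SR = 16000
CHUNK = 512                                      # samples per chunk at 16 kHz
BATCH = 2                                        # e.g. two channels side by side

sess = ort.InferenceSession('silero_vad.onnx')   # path to the ONNX model file

# The caller owns the state and passes it back on every invocation.
state = np.zeros((2, BATCH, 128), dtype=np.float32)

def vad_step(chunks, state):
    """chunks: float32 array of shape (BATCH, CHUNK); returns (probs, new_state)."""
    probs, new_state = sess.run(
        None,
        {'input': chunks,
         'state': state,
         'sr': np.array(SR, dtype=np.int64)},
    )
    return probs.reshape(-1), new_state          # one speech probability per batch row

# One step with silence; a real service would persist `state` per stream
# and pass it back on the next call for the same stream.
probs, state = vad_step(np.zeros((BATCH, CHUNK), dtype=np.float32), state)
```

The important point is only that the caller, not the session, owns the state and hands the returned state back on the next call for the same stream.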

The problem arises because most publisher-consumer messaging systems do not support batches (apart from Celery, maybe). The architecture can be handled in a number of different ways in Python, e.g. using built-in abstractions like ProcessPoolExecutor or ThreadPoolExecutor, or via a messaging system with a Remote Procedure Call pattern.
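
For instance, a minimal ProcessPoolExecutor layout with one model loaded per worker process, so no internal state is ever shared between audios (the helper names and file list are hypothetical):

```python
from concurrent.futures import ProcessPoolExecutor

import torch

_model = None
_utils = None

def _init_worker():
    """Load one model per worker process so no internal state is shared."""
    global _model, _utils
    _model, _utils = torch.hub.load('snakers4/silero-vad', 'silero_vad')

def _process_file(path):
    get_speech_timestamps, _, read_audio, _, _ = _utils
    _model.reset_states()
    wav = read_audio(path, sampling_rate=16000)   # read_audio returns mono at 16 kHz
    return path, get_speech_timestamps(wav, _model, sampling_rate=16000)

if __name__ == '__main__':
    files = ['a.wav', 'b.wav', 'c.wav']           # placeholder inputs
    with ProcessPoolExecutor(max_workers=2, initializer=_init_worker) as pool:
        for path, segments in pool.map(_process_file, files):
            print(path, segments)
```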

If you are going to use it from C++, Java, or something else, I would suggest using the ONNX runtime and taking a look at how our ONNX wrapper handles the state.

snakers4 commented on August 16, 2024

Although I don't fully understand the function, that won't prevent me from applying it to the batching result

My advice is to process each channel separately, extract model outputs, then run the function separately for each channel, and then apply some simple voting mechanism.

You see, the heuristics in the post-processing function are very non-batch friendly.
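
To make that concrete, here is a hedged sketch of the per-channel approach; the majority-vote rule at the end is only an illustration, not something silero-vad provides:

```python
import torch
import torchaudio

model, utils = torch.hub.load('snakers4/silero-vad', 'silero_vad')
get_speech_timestamps = utils[0]

SR = 16000
wav, sr = torchaudio.load('multichannel.wav')   # placeholder file, shape (channels, samples)
if sr != SR:
    wav = torchaudio.functional.resample(wav, sr, SR)

# 1) Run the model and post-processing separately per channel
#    (get_speech_timestamps returns {'start': ..., 'end': ...} sample offsets by default).
per_channel = []
for ch in range(wav.shape[0]):
    model.reset_states()
    per_channel.append(get_speech_timestamps(wav[ch], model, sampling_rate=SR))

# 2) Simple vote: mark a sample as speech if at least half of the channels agree.
votes = torch.zeros(wav.shape[1])
for segments in per_channel:
    for seg in segments:
        votes[seg['start']:seg['end']] += 1
speech_mask = votes >= max(1, wav.shape[0] / 2)
```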

Simon-chai commented on August 16, 2024

Although I don't fully understand the function, that won't prevent me from applying it to the batching result

My advice is to process each channel separately, extract model outputs, then run the function separately for each channel, and then apply some simple voting mechanism.

You see, the heuristics in the post-processing function are very non-batch friendly.

What about processing all channels in a batch and then applying the voting mechanism to each channel's result separately?
I tried a 20 s audio clip with 2 channels: it takes about 0.28 s to process when handling the channels separately, but only 0.16 s when batching (Intel Xeon W-2245 CPU). The improvement is big enough. The only thing I worry about is accuracy, but if the batching result is solid, there must be a way to deal with it correctly. In other words, the post-processing function doesn't have to be batch-friendly, do you agree?
