
vad-python's Introduction

IMPORTANT NOTE

I'm not working on this project anymore. I advise everyone curious about voice detection to have a look at some more modern approaches using deep learning, like:

Voice Activity Detector

Python code to apply a voice activity detector to a wave file. The voice activity detector is based on the ratio between the energy in the speech band and the total energy.

Requirements

  • numpy
  • scipy
  • matplotlib
  • tkinter (sudo apt install python3-tk)

Basic Idea

Input audio data is processed as follows:

  1. Convert stereo to mono.
  2. Move a window of 20ms along the audio data.
  3. Calculate the ratio between the energy in the speech band and the total energy for each window.
  4. If the ratio is above the threshold (0.6 by default), label the window as speech.
  5. Apply median filter with length of 0.5s to smooth detected speech regions.
  6. Represent speech regions as intervals of time.
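Steps 1 to 5 above can be sketched with NumPy and SciPy as follows. This is a minimal illustration under stated assumptions, not the code in vad.py: the speech band (300 to 3000 Hz), the use of non-overlapping windows and the function name detect_speech_windows are all assumptions made here.

```python
import numpy as np
from scipy.signal import medfilt

def detect_speech_windows(data, rate, window_s=0.02, band=(300.0, 3000.0),
                          threshold=0.6, median_s=0.5):
    """Hypothetical sketch: one boolean per window, True where speech is detected."""
    data = np.asarray(data, dtype=float)
    # 1. Stereo to mono: average the channels.
    if data.ndim == 2:
        data = data.mean(axis=1)
    # 2. Cut the signal into non-overlapping windows of window_s seconds.
    n = int(round(rate * window_s))
    count = len(data) // n
    frames = data[: count * n].reshape(count, n)
    # 3. Ratio of speech-band energy (assumed band) to total energy per window.
    energy = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    freqs = np.fft.rfftfreq(n, d=1.0 / rate)
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    total = energy.sum(axis=1)
    ratio = np.zeros(count)
    # Silent windows (total == 0) keep ratio 0 instead of producing NaN.
    np.divide(energy[:, in_band].sum(axis=1), total, out=ratio, where=total > 0)
    # 4. Threshold the ratio.
    labels = (ratio > threshold).astype(float)
    # 5. Median filter spanning about median_s seconds; medfilt needs an odd kernel.
    kernel = int(round(median_s / window_s)) | 1
    return medfilt(labels, kernel_size=kernel) > 0.5
```

For example, a 3 s signal at 16 kHz with a 50 Hz hum throughout and a 1 kHz tone in the middle second yields 150 windows, of which only windows 50 to 99 are labeled as speech. Step 6, turning these labels into time intervals, is a simple run-length merge.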

How To

To use the detector:

  1. Import the vad module.
  2. Create an instance of the VoiceActivityDetector class with the full path to the wave file.
  3. Run the detection method to find speech regions.
  4. Optionally, plot the original wave data and the detected speech regions.

Example Python script that saves the speech intervals to a JSON file:

./detectVoiceInWave.py ./wav-sample.wav ./results.json
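Turning per-window speech labels into time intervals and writing them to JSON could look like the sketch below. The README does not document the exact layout that detectVoiceInWave.py writes, so the function names and the speech_begin/speech_end keys here are hypothetical.

```python
import json

def windows_to_intervals(labels, window_s):
    """Merge runs of consecutive True labels into intervals (hypothetical key names)."""
    intervals = []
    start = None
    for i, is_speech in enumerate(labels):
        if is_speech and start is None:
            start = i  # a speech run begins at window i
        elif not is_speech and start is not None:
            intervals.append({"speech_begin": start * window_s, "speech_end": i * window_s})
            start = None
    if start is not None:  # a run that lasts until the last window
        intervals.append({"speech_begin": start * window_s,
                          "speech_end": len(labels) * window_s})
    return intervals

def save_intervals(intervals, path):
    """Write the interval list to a JSON file."""
    with open(path, "w") as fp:
        json.dump(intervals, fp, indent=2)
```

With 0.5 s windows, the labels [False, True, True, False, True] become two intervals: 0.5 to 1.5 s and 2.0 to 2.5 s.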

Example Python code to plot the detected speech regions:

from vad import VoiceActivityDetector

filename = '/Users/user/wav-sample.wav'
v = VoiceActivityDetector(filename)
v.plot_detected_speech_regions()
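plot_detected_speech_regions() opens an interactive window, which is presumably why tkinter is listed as a requirement (matplotlib's Tk backend needs it). On a machine without a display, a sketch like the following saves a similar plot to a file using matplotlib's non-interactive Agg backend. plot_speech_regions and the (begin, end) interval format in seconds are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: no display or tkinter needed
import matplotlib.pyplot as plt
import numpy as np

def plot_speech_regions(data, rate, intervals, path):
    """Save a waveform plot with (begin, end) speech intervals, in seconds, shaded."""
    data = np.asarray(data, dtype=float)
    if data.ndim == 2:  # stereo: plot the channel average
        data = data.mean(axis=1)
    t = np.arange(len(data)) / rate
    fig, ax = plt.subplots(figsize=(10, 3))
    ax.plot(t, data, linewidth=0.5)
    for begin, end in intervals:
        ax.axvspan(begin, end, color="orange", alpha=0.3)
    ax.set_xlabel("Time (s)")
    ax.set_ylabel("Amplitude")
    fig.tight_layout()
    fig.savefig(path)
    plt.close(fig)  # free the figure; nothing is shown on screen
    return path
```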

Alexander USOLTSEV 2015 (c) MIT License

vad-python's People

Contributors

a-n-rose, marsbroshok, mayank25402, surajmahendra2552, umagunturi, yapro


vad-python's Issues

An error occurred running the code

C:\Users\PC\PycharmProjects\untitled\Net resource\vad.py:150: RuntimeWarning: invalid value encountered in double_scalars
speech_ratio = sum_voice_energy / sum_full_energy

The warning above appeared when I ran the following code. How can I fix it?

from vad import VoiceActivityDetector
filename = r'C:\Users\PC\PycharmProjects\untitled\audio\test.wav'
v = VoiceActivityDetector(filename)
v.plot_detected_speech_regions()
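The message is a RuntimeWarning rather than an error. It most likely means that sum_full_energy was 0 for some window, for example a stretch of digital silence where every sample is zero. Since the speech-band energy is part of the total, it is 0 as well, and 0/0 yields NaN. One way to avoid this, sketched below with a hypothetical helper name, is to define the ratio as 0 for windows that carry no energy:

```python
import numpy as np

def safe_speech_ratio(voice_energy, full_energy):
    """Element-wise voice/full energy ratio, 0.0 wherever a window has no energy."""
    voice = np.asarray(voice_energy, dtype=float)
    full = np.asarray(full_energy, dtype=float)
    ratio = np.zeros(np.broadcast(voice, full).shape)
    # Divide only where the denominator is positive; other entries stay 0.0,
    # so no "invalid value" warning is raised.
    np.divide(voice, full, out=ratio, where=full > 0)
    return ratio

print(safe_speech_ratio([1.0, 0.0, 3.0], [2.0, 0.0, 4.0]).tolist())  # [0.5, 0.0, 0.75]
```

Inside a per-window loop like the one in the warning, the scalar equivalent would be: speech_ratio = sum_voice_energy / sum_full_energy if sum_full_energy > 0 else 0.0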

Odd label connection

Hello! The for-loop in this function puzzles me.

def _connect_energy_with_frequencies(self, data_freq, data_energy):
    energy_freq = {}
    for (i, freq) in enumerate(data_freq):
        energy_freq[abs(freq)] = data_energy[i]
    return energy_freq

The data_freq array is "symmetrical": if it contains -1.3, it also contains 1.3.
We are assigning different values to the same dict key [abs(freq)] twice.
This is wasteful from a performance perspective, but more importantly,
sign-opposite keys in the dict represent amplitudes of the same-frequency wave in antiphase, so one of the two values is silently discarded.
Shouldn't we do this instead?

def _connect_energy_with_frequencies(self, data_freq, data_energy):
    energy_freq = {}
    for (i, freq) in enumerate(data_freq):
        if abs(freq) in energy_freq:
            energy_freq[abs(freq)] += data_energy[i]
        else:
            energy_freq[abs(freq)] = data_energy[i]
    return energy_freq

Or perhaps '-='?
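For a real-valued window the FFT is conjugate-symmetric, X(-f) = conj(X(f)), so the energy stored under -f is exactly the energy stored under +f. Overwriting therefore discards nothing, '+=' doubles every bin except DC and (for an even window length) Nyquist, and '-=' would cancel each pair to roughly zero. The sketch below checks this with NumPy. It also shows np.fft.rfft and np.fft.rfftfreq, which return each non-negative frequency exactly once and so make the dict merge unnecessary. The 16 kHz rate and the random test window are assumptions for illustration.

```python
import numpy as np

def two_sided_energy(window, rate):
    """Energy per FFT bin, covering both negative and positive frequencies."""
    spectrum = np.fft.fft(window)
    freqs = np.fft.fftfreq(len(window), d=1.0 / rate)
    return freqs, np.abs(spectrum) ** 2

def one_sided_energy(window, rate):
    """Energy per non-negative frequency bin (valid because the input is real)."""
    spectrum = np.fft.rfft(window)
    freqs = np.fft.rfftfreq(len(window), d=1.0 / rate)
    return freqs, np.abs(spectrum) ** 2

rate = 16000                                                   # assumed sample rate
window = np.random.default_rng(0).standard_normal(rate // 50)  # one 20 ms window (320 samples)

freqs2, energy2 = two_sided_energy(window, rate)
k = np.arange(1, len(window) // 2)  # bins strictly between DC and Nyquist
print(np.allclose(energy2[k], energy2[-k]))  # True: |X(f)|^2 == |X(-f)|^2

freqs1, energy1 = one_sided_energy(window, rate)
print(np.allclose(energy1, energy2[: len(energy1)]))  # True: rfft keeps bins 0..N/2
```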

bug in text

You meant "ratio" but wrote "ration" in:

Basic Idea
Input audio data treated as following:

  • Convert stereo to mono
  • Move a window of 20ms along the audio data
  • Calculate ration between energy of speech band and total energy for window
  • If ratio is more than threshold (0.6 by default) label windows as speech
  • Apply median filter with length of 0.5s to smooth detected speech regions
  • Represent speech regions as intervals of time

VAD not recognizing foreign language as speech

Hi. Is there a reason why VAD would fail to identify speech in a foreign language in some cases? I tested it on some audio in Spanish and French, and convert_windows_to_readible_labels() failed to return any labels for those files. I was wondering whether there are some parameters I need to set in those cases.

VAD on singing voice?

I am trying to adapt this script to detect voice/silence segments in an audio file containing a source-separated singing voice signal obtained with http://github.com/sigsep/open-unmix-pytorch

I have some questions:

  • Does it make sense to compute the threshold for each data_window independently, instead of using a fixed speech_energy_threshold? I would do that by computing the energy of the data_window signal, normalizing it and taking its mean value. If this value is 0.0, I can label that segment as silence.

  • Is there a clever way to choose parameters like sample_window, sample_overlap, speech_window that would be more appropriate for singing voice signals?

Thanks a lot!
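The per-file normalization idea from the first question could be sketched as below. It is a hypothetical energy gate, not part of vad.py: each 20 ms window's energy is divided by the energy of the loudest window in the file, and windows below a small floor (0.01 here, an arbitrary choice) are labeled silence. A small floor rather than an exact 0.0 test is used because a separated vocal track may not be exactly zero between phrases.

```python
import numpy as np

def energy_gate(signal, rate, window_s=0.02, floor=0.01):
    """Hypothetical gate: True (voiced) / False (silence) per non-overlapping window.

    Energies are normalized by the loudest window, so the gate adapts to the
    overall level of the track instead of using a fixed absolute threshold.
    """
    signal = np.asarray(signal, dtype=float)
    n = int(round(rate * window_s))
    count = len(signal) // n
    frames = signal[: count * n].reshape(count, n)
    energy = np.sum(frames ** 2, axis=1)
    peak = energy.max() if count else 0.0
    if peak == 0.0:  # empty or completely silent input
        return np.zeros(count, dtype=bool)
    return energy / peak > floor
```

The resulting labels can still be smoothed with a median filter, as in the README's step 5.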

ValueError: Unexpected end of file

I get an error when I run the code:

Traceback (most recent call last):
  File "detectVoiceInWave.py", line 6, in <module>
    v = VoiceActivityDetector(filename)
  File "/home/ebtesam/Desktop/Code/VAD-python-master/VAD-python-master/vad.py", line 11, in __init__
    self._read_wav(wave_input_filename)._convert_to_mono()
  File "/home/ebtesam/Desktop/Code/VAD-python-master/VAD-python-master/vad.py", line 20, in _read_wav
    self.rate, self.data = wf.read(wave_file)
  File "/home/ebtesam/.local/lib/python3.7/site-packages/scipy/io/wavfile.py", line 246, in read
    raise ValueError("Unexpected end of file.")
ValueError: Unexpected end of file.

How can I solve it?
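scipy.io.wavfile.read raises "Unexpected end of file." when it runs out of bytes before reaching the end it expects from the file header. A likely cause is a truncated or otherwise damaged WAV file whose RIFF header declares more bytes than the file actually contains. The hypothetical helper below compares the size declared in the header with the real file size:

```python
import os
import struct

def check_riff_size(path):
    """Return (declared_size, actual_size) in bytes for a RIFF/WAVE file."""
    actual = os.path.getsize(path)
    with open(path, "rb") as f:
        header = f.read(12)
    if len(header) < 12 or header[:4] != b"RIFF" or header[8:12] != b"WAVE":
        raise ValueError(f"{path} is not a RIFF/WAVE file")
    # The RIFF size field counts every byte after the first 8 ("RIFF" + size).
    (riff_size,) = struct.unpack("<I", header[4:8])
    return riff_size + 8, actual
```

If the declared size is larger than the actual size, the file is probably truncated; re-exporting it with an audio editor or recording it again is the usual remedy.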

Run VAD on Microphone Input Signal

Hello,

Is there also an option to run this VAD on a single frame (e.g. 30 ms) of a microphone input signal?
It would be nice to have an example script using PyAudio and this module.

Regards,
Josef
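vad.py reads a whole WAV file at once, but the same band-energy ratio can be applied frame by frame. The class below is a hypothetical sketch: its name, the 300 to 3000 Hz band and the 15-frame majority vote (about 0.45 s at 30 ms frames, standing in for the median filter) are assumptions. It accepts raw 16-bit mono PCM bytes in native byte order, which is what a PyAudio input stream opened with format=pyaudio.paInt16, channels=1 and input=True returns from stream.read(); that PyAudio wiring is not shown or tested here.

```python
import collections
import numpy as np

class StreamingVAD:
    """Hypothetical frame-by-frame VAD based on the speech-band energy ratio."""

    def __init__(self, rate, band=(300.0, 3000.0), threshold=0.6, smooth_frames=15):
        self.rate = rate
        self.band = band
        self.threshold = threshold
        self.history = collections.deque(maxlen=smooth_frames)  # recent raw decisions

    def _ratio(self, frame):
        """Speech-band energy divided by total energy; 0.0 for a silent frame."""
        energy = np.abs(np.fft.rfft(frame)) ** 2
        freqs = np.fft.rfftfreq(len(frame), d=1.0 / self.rate)
        total = energy.sum()
        if total == 0.0:
            return 0.0
        in_band = (freqs >= self.band[0]) & (freqs <= self.band[1])
        return float(energy[in_band].sum() / total)

    def process(self, frame_bytes):
        """Classify one frame of int16 PCM bytes and return the smoothed decision."""
        frame = np.frombuffer(frame_bytes, dtype=np.int16).astype(float)
        self.history.append(self._ratio(frame) > self.threshold)
        # Causal majority vote over recent frames; ties count as non-speech.
        return sum(self.history) * 2 > len(self.history)
```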
