marsbroshok / vad-python

Voice Activity Detector in Python

vad-python's Introduction

IMPORTANT NOTE

I'm not working on this project anymore. I advise everyone curious about voice detection to have a look at some more modern approaches using deep learning, like:

Voice Activity Detector

Python code to apply a voice activity detector to a wave file. The voice activity detector is based on the ratio between the energy in the speech band and the total energy.

Requirements

numpy
scipy
matplotlib
tkinter (sudo apt install python3-tk)

Basic Idea

Input audio data is processed as follows:

Convert stereo to mono.
Move a window of 20 ms along the audio data.
Calculate the ratio between the energy of the speech band and the total energy for each window.
If the ratio is more than the threshold (0.6 by default), label the window as speech.
Apply a median filter with a length of 0.5 s to smooth the detected speech regions.
Represent speech regions as intervals of time.
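The steps above can be sketched with NumPy and SciPy as follows. This is a simplified illustration under stated assumptions, not the module's own code: windows do not overlap, the speech band is taken as 300 Hz to 3000 Hz (the band mentioned in one of the issues on this page), smoothing uses `scipy.signal.medfilt`, and `sketch_speech_labels` / `labels_to_intervals` are hypothetical names.

```python
import numpy as np
from scipy.signal import medfilt


def sketch_speech_labels(data, rate, window_s=0.02, threshold=0.6,
                         band=(300.0, 3000.0), median_s=0.5):
    """Return (labels, win): one 0/1 speech label per non-overlapping window."""
    x = np.asarray(data, dtype=np.float64)
    if x.ndim == 2:                        # 1. stereo -> mono by averaging channels
        x = x.mean(axis=1)
    win = int(round(rate * window_s))      # 2. window length in samples (20 ms)
    n_windows = len(x) // win
    freqs = np.fft.rfftfreq(win, d=1.0 / rate)
    in_band = (freqs >= band[0]) & (freqs <= band[1])  # assumed speech band
    labels = np.zeros(n_windows)
    for k in range(n_windows):
        frame = x[k * win:(k + 1) * win]
        energy = np.abs(np.fft.rfft(frame)) ** 2
        total = energy.sum()
        # 3-4. speech-band energy / total energy compared with the threshold;
        # fully silent windows (total == 0) stay labelled as non-speech
        if total > 0 and energy[in_band].sum() / total > threshold:
            labels[k] = 1.0
    # 5. median filter spanning about 0.5 s of windows (kernel size must be odd)
    kernel = int(round(median_s / window_s))
    kernel += 1 - kernel % 2
    return medfilt(labels, kernel_size=kernel).astype(int), win


def labels_to_intervals(labels, win, rate):
    """6. Turn runs of speech windows into (start_s, end_s) tuples."""
    intervals, start = [], None
    for k, lab in enumerate(labels):
        if lab and start is None:
            start = k
        elif not lab and start is not None:
            intervals.append((start * win / rate, k * win / rate))
            start = None
    if start is not None:
        intervals.append((start * win / rate, len(labels) * win / rate))
    return intervals
```

For example, one second of silence, one second of a 1 kHz tone and another second of silence at 8 kHz should yield a single interval from 1.0 s to 2.0 s.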

How To

Create object:

Import the vad module.
Create an instance of the VoiceActivityDetector class with the full path to the wave file.
Run a method to detect speech regions.
Optionally, plot the original wave data and the detected speech regions.

Example Python script that saves speech intervals to a JSON file:

./detectVoiceInWave.py ./wav-sample.wav ./results.json
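The last step of such a script is just serializing the detected intervals. A minimal sketch, assuming intervals are (start, end) pairs in seconds; the helper name and the `speech_begin` / `speech_end` keys are assumptions for illustration and may not match the script's actual output format.

```python
import json
import os
import tempfile


def save_intervals_json(intervals, path):
    """Write (start_s, end_s) pairs as a JSON list of objects (hypothetical layout)."""
    records = [{"speech_begin": begin, "speech_end": end} for begin, end in intervals]
    with open(path, "w") as fp:
        json.dump(records, fp, indent=2)
    return records


# Usage with made-up example intervals (not detector output):
out_path = os.path.join(tempfile.mkdtemp(), "results.json")
saved = save_intervals_json([(1.0, 2.0), (2.5, 3.25)], out_path)
```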

Example Python code to plot detected speech regions:

from vad import VoiceActivityDetector

filename = '/Users/user/wav-sample.wav'
v = VoiceActivityDetector(filename)
v.plot_detected_speech_regions()

Alexander USOLTSEV 2015 (c) MIT License


vad-python's Issues

An error occurred while running the code

C:\Users\PC\PycharmProjects\untitled\Net resource\vad.py:150: RuntimeWarning: invalid value encountered in double_scalars
speech_ratio = sum_voice_energy / sum_full_energy

The message above appeared when I ran the following code. How can I solve this problem?

from vad import VoiceActivityDetector
filename = r'C:\Users\PC\PycharmProjects\untitled\audio\test.wav'
v = VoiceActivityDetector(filename)
v.plot_detected_speech_regions()
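An "invalid value" RuntimeWarning on this line most likely means `sum_full_energy` was 0 for some window (for example, a stretch of exact digital silence), so NumPy evaluated 0/0 as `nan`. A minimal guard, assuming silent windows should simply count as non-speech, could look like this; `speech_ratio` is a hypothetical helper, not part of vad.py.

```python
import numpy as np


def speech_ratio(sum_voice_energy, sum_full_energy):
    """Speech-band / total energy ratio that treats silent windows as 0.0."""
    if sum_full_energy == 0:
        # A window with no energy at all: 0.0 / 0.0 would give nan together
        # with the "invalid value encountered" RuntimeWarning.
        return 0.0
    return float(sum_voice_energy / sum_full_energy)


with np.errstate(invalid="ignore"):
    print(np.float64(0.0) / np.float64(0.0))           # nan (unguarded division)
print(speech_ratio(np.float64(0.0), np.float64(0.0)))  # 0.0
```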

Odd label connection

Hello! The for loop in this function puzzles me.

def _connect_energy_with_frequencies(self, data_freq, data_energy):
    energy_freq = {}
    for (i, freq) in enumerate(data_freq):
        energy_freq[abs(freq)] = data_energy[i]
    return energy_freq

The data_freq array is "symmetrical": if it contains '-1.3', it also contains '1.3'.
We assign different values to the same dict key, [abs(freq)], twice.
Not only is this pointless from a general performance perspective;
sign-opposite keys in the dict also represent amplitudes of the same-frequency wave in antiphase.
Shouldn't we do this instead?

def _connect_energy_with_frequencies(self, data_freq, data_energy):
    energy_freq = {}
    for (i, freq) in enumerate(data_freq):
        if abs(freq) in energy_freq:
            energy_freq[abs(freq)] += data_energy[i]
        else:
            energy_freq[abs(freq)] = data_energy[i]
    return energy_freq

Or perhaps '-='?
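To see what the original loop actually overwrites, one can compare the energies stored at +f and -f directly. The sketch below assumes `data_energy` is the squared magnitude of `np.fft.fft` applied to a real-valued window and `data_freq` comes from `np.fft.fftfreq`; the random window and the 8 kHz rate are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
signal = rng.standard_normal(64)              # any real-valued window
freqs = np.fft.fftfreq(len(signal), d=1.0 / 8000)
energy = np.abs(np.fft.fft(signal)) ** 2      # squared magnitude: phase is discarded

# For real input, X[-f] is the complex conjugate of X[f], so |X[-f]|**2 == |X[f]|**2.
pos = np.arange(1, len(signal) // 2)          # skip DC (index 0) and Nyquist (index N/2)
neg = len(signal) - pos                       # index of the matching negative frequency
print(np.allclose(freqs[neg], -freqs[pos]))   # True
print(np.allclose(energy[neg], energy[pos]))  # True
```

Under these assumptions the overwriting loop stores the same value twice, `+=` doubles every paired bin, and `-=` would set every paired bin to 0, because the antiphase information is no longer present in a squared magnitude.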

bug in text

You meant to write 'ratio' but wrote 'ration' in:

Basic Idea
Input audio data treated as following:

Convert stereo to mono
Move a window of 20ms along the audio data
Calculate ration between energy of speech band and total energy for window
If ratio is more than threshold (0.6 by default) label windows as speech
Apply median filter with length of 0.5s to smooth detected speech regions
Represent speech regions as intervals of time

VAD not recognizing foreign language as speech

Hi. Is there a reason why the VAD would fail to identify speech in a foreign language in some cases? I tested it with some audio in Spanish and French, and convert_windows_to_readible_labels() failed to return any labels for those recordings. I was wondering whether there are some parameters I need to initialize in those cases.

Why use 300 Hz to 3000 Hz for detection?

VAD on singing voice?

I am trying to adapt this script to detect voice/silence segments in an audio file containing a source-separated singing-voice signal obtained from http://github.com/sigsep/open-unmix-pytorch

I have some questions:

Does it make sense to compute the threshold for each data_window independently, instead of using a fixed speech_energy_threshold? I would do that by computing the energy of the data_window signal, normalizing it, and taking its mean value. If this value equals 0.0, I can label that segment as silence.
Is there a clever way to choose parameters such as sample_window, sample_overlap, and speech_window that would be more appropriate for singing-voice signals?

Thanks a lot!
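The per-window approach proposed in the first question can be sketched as follows. The question does not specify the normalization, so this sketch assumes each window's mean energy is divided by the largest window energy in the file; `silence_mask`, `window_s` and `rel_threshold` are hypothetical names. With `rel_threshold=0.0` only windows of exact digital silence are flagged, matching the "equals 0.0" test; recordings with residual noise would probably need a small positive value.

```python
import numpy as np


def silence_mask(signal, rate, window_s=0.02, rel_threshold=0.0):
    """Return a boolean array, True where a non-overlapping window counts as silence."""
    x = np.asarray(signal, dtype=np.float64)
    win = max(1, int(round(rate * window_s)))
    n = len(x) // win
    frames = x[:n * win].reshape(n, win)
    energy = np.mean(frames ** 2, axis=1)         # mean energy per window
    peak = energy.max() if n else 0.0
    # normalize by the loudest window so the threshold is relative to the file
    norm = energy / peak if peak > 0 else np.zeros_like(energy)
    return norm <= rel_threshold
```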

Data indices as result of vad instead of time

How can I get the data indices where speech starts or ends, instead of the times at which this happens?
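Because the audio has a fixed sample rate, a time in seconds maps to a sample index by multiplying by the rate and rounding. A minimal sketch, assuming intervals are (start, end) pairs in seconds and `rate` is the WAV sample rate; for window-level labels, window k starts at sample k × step, where step is the hop between windows (window length minus any overlap). Both helper names are hypothetical.

```python
def intervals_to_sample_indices(intervals, rate):
    """Convert (start_s, end_s) pairs in seconds into (start_idx, end_idx) sample indices."""
    return [(int(round(start * rate)), int(round(end * rate))) for start, end in intervals]


def window_to_sample_range(k, win, step):
    """Sample range [start, end) covered by window k of length win with hop step."""
    start = k * step
    return start, start + win


print(intervals_to_sample_indices([(1.0, 2.0), (2.5, 3.02)], 8000))
# [(8000, 16000), (20000, 24160)]
```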

A more modern VAD with pre-trained models

I recently searched for the term "VAD" on GitHub and found many abandoned projects with a decent amount of traction, but mostly lacking pre-trained models.

So I decided to share our new pre-trained VAD:

Please see this vad - https://github.com/snakers4/silero-vad
Quality benchmarks here - https://github.com/snakers4/silero-vad#vad-quality-metrics
Overall comparison with webrtc - wiseman/py-webrtcvad#68

Not working with this wav (file attached)

Hello,
I am a radio amateur trying to find a way to detect voice in VHF audio recordings.
I have this sample without voice, but your algorithm detects voice in it.

Why can't I detect voice in a clip? What are the requirements for the input audio?

The result.json is empty.

Any reference (paper) about this method?

ValueError: Unexpected end of file

I ran into an issue when running the code; it says:
Traceback (most recent call last):
  File "detectVoiceInWave.py", line 6, in <module>
    v = VoiceActivityDetector(filename)
  File "/home/ebtesam/Desktop/Code/VAD-python-master/VAD-python-master/vad.py", line 11, in __init__
    self._read_wav(wave_input_filename)._convert_to_mono()
  File "/home/ebtesam/Desktop/Code/VAD-python-master/VAD-python-master/vad.py", line 20, in _read_wav
    self.rate, self.data = wf.read(wave_file)
  File "/home/ebtesam/.local/lib/python3.7/site-packages/scipy/io/wavfile.py", line 246, in read
    raise ValueError("Unexpected end of file.")
ValueError: Unexpected end of file.

How can I solve it?

Regards,
Josef
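The traceback shows this ValueError is raised inside `scipy.io.wavfile.read` while the file is being loaded, before any detection runs, which most likely means the WAV file is truncated or malformed (for example, an interrupted download or a recording that was not closed properly). A quick pre-check can be written with Python's standard-library `wave` module. `check_wav` is a hypothetical helper, and `wave` reads only uncompressed PCM, so other encodings are reported as not ok even where SciPy might read them.

```python
import os
import tempfile
import wave


def check_wav(path):
    """Return (ok, info); ok is False if the file looks truncated or malformed."""
    try:
        with wave.open(path, "rb") as w:
            channels = w.getnchannels()
            width = w.getsampwidth()
            rate = w.getframerate()
            declared = w.getnframes()
            data = w.readframes(declared)
    except (wave.Error, EOFError) as exc:
        # e.g. missing fmt/data chunk, or a non-PCM format the wave module rejects
        return False, {"error": str(exc)}
    expected = declared * channels * width
    info = {"channels": channels, "sample_width_bytes": width, "rate": rate,
            "declared_frames": declared, "data_bytes": len(data),
            "expected_bytes": expected}
    # fewer data bytes than the header declares means the file was cut short
    return len(data) == expected, info


# Demo: a short valid mono 16-bit file, then a copy with its last 100 bytes removed.
tmp_dir = tempfile.mkdtemp()
good_path = os.path.join(tmp_dir, "good.wav")
with wave.open(good_path, "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)
    w.setframerate(8000)
    w.writeframes(b"\x00\x00" * 800)

bad_path = os.path.join(tmp_dir, "truncated.wav")
with open(good_path, "rb") as src, open(bad_path, "wb") as dst:
    dst.write(src.read()[:-100])

print(check_wav(good_path)[0])  # True
print(check_wav(bad_path)[0])   # False
```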