dpirch / libfvad

Voice activity detection (VAD) library, based on WebRTC's VAD engine

License: BSD 3-Clause "New" or "Revised" License

Makefile 2.71% Shell 1.48% M4 1.25% C 93.44% CMake 1.12%


libfvad's Issues

How to use the VAD?

Hello,

After building libfvad,
I want to use the VAD to cut up a speech file (WAV or raw).
How do I use it?

Thanks
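A rough sketch of the usual frame loop, in case it helps: split the 16-bit samples into fixed-length frames and ask a classifier for one decision per frame. The names `frame_classifier`, `classify_frames` and `energy_classifier` are made up for this illustration and are not part of libfvad. With libfvad, the callback would be a thin wrapper around `fvad_process`, and `energy_classifier` is only a stand-in so the sketch runs without the library.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback type: returns 1 for speech, 0 for non-speech,
 * -1 on error. A wrapper around fvad_process() would fit this shape. */
typedef int (*frame_classifier)(void *ctx, const int16_t *frame, size_t n);

/* Walk `samples` in frames of `frame_len` samples and store one decision
 * per frame in `out`. Trailing samples that do not fill a whole frame are
 * ignored. Returns the number of frames classified, or -1 on failure. */
long classify_frames(const int16_t *samples, size_t n_samples, size_t frame_len,
                     frame_classifier classify, void *ctx,
                     int *out, size_t out_cap)
{
    if (frame_len == 0)
        return -1;
    size_t n_frames = n_samples / frame_len;
    if (n_frames > out_cap)
        n_frames = out_cap;
    for (size_t i = 0; i < n_frames; i++) {
        int r = classify(ctx, samples + i * frame_len, frame_len);
        if (r < 0)
            return -1;
        out[i] = r;
    }
    return (long)n_frames;
}

/* Stand-in classifier for demonstration only (NOT the WebRTC algorithm):
 * flags a frame when its mean absolute amplitude exceeds the threshold
 * that `ctx` points to. */
int energy_classifier(void *ctx, const int16_t *frame, size_t n)
{
    long threshold = *(const long *)ctx;
    long sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += frame[i] < 0 ? -(long)frame[i] : (long)frame[i];
    return (sum / (long)n) > threshold;
}
```

With the real library, the setup would be `fvad_new()`, `fvad_set_sample_rate()`, optionally `fvad_set_mode()`, then `fvad_process()` per frame and `fvad_free()` at the end. examples/fvadwav.c in the repository shows the complete flow for WAV files.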

My audio data is in the form of unsigned char* arrays. fvad_process takes a signed short. Do I need to just convert from char to short? Will there be a loss of correctness as far as the vad is concerned?
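Whether a plain cast is enough depends on what the bytes actually hold, so here is a hedged sketch of the two common cases. It assumes the data is either 16-bit little-endian signed PCM stored as raw bytes, or 8-bit unsigned PCM. Both function names are hypothetical helpers, not libfvad functions.

```c
#include <stddef.h>
#include <stdint.h>

/* Case 1 (assumed): bytes hold 16-bit little-endian signed PCM.
 * Two bytes form one sample. Returns the number of samples written. */
size_t s16le_bytes_to_samples(const unsigned char *bytes, size_t n_bytes,
                              int16_t *out)
{
    size_t n = n_bytes / 2;
    for (size_t i = 0; i < n; i++) {
        int u = bytes[2 * i] | (bytes[2 * i + 1] << 8);   /* 0..65535 */
        out[i] = (int16_t)(u >= 0x8000 ? u - 0x10000 : u); /* to signed */
    }
    return n;
}

/* Case 2 (assumed): bytes hold 8-bit unsigned PCM, silence at 128.
 * Each byte becomes one sample, re-centred and scaled to 16 bits. */
size_t u8_to_samples(const unsigned char *bytes, size_t n_bytes, int16_t *out)
{
    for (size_t i = 0; i < n_bytes; i++)
        out[i] = (int16_t)((bytes[i] - 128) * 256);
    return n_bytes;
}
```

In case 1 nothing is lost, since the bytes are only reassembled. In case 2 the samples keep their 8-bit resolution, which may affect detection quality compared with true 16-bit audio.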

Cut audio into chunks and extract start time and end time

Hi,
I want to run libfvad on my own audio file and save the audio chunks detected as voiced frames in ".wav" format. I also want to extract the start time and end time of each chunk.

Currently I am able to reproduce the output "libfvad/tests/data/wavtest.expect", so I can see that it detects voiced and unvoiced frames in an audio file.

Thanks
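One possible way to get start and end times is to merge consecutive voiced frames into segments. The sketch below assumes you already have one 0/1 decision per frame (for example collected from `fvad_process`) and a fixed frame length in milliseconds. The `segment` type and `decisions_to_segments` are hypothetical names for this illustration.

```c
#include <stddef.h>

/* One voiced stretch, in milliseconds from the start of the audio. */
typedef struct {
    long start_ms;
    long end_ms;
} segment;

/* Merge runs of voiced frames (value 1) into segments.
 * Writes at most `out_cap` segments and returns how many were written. */
size_t decisions_to_segments(const int *voiced, size_t n_frames, int frame_ms,
                             segment *out, size_t out_cap)
{
    size_t count = 0;
    int in_speech = 0;
    long start = 0;

    /* Iterate one step past the end so a trailing run gets closed. */
    for (size_t i = 0; i <= n_frames && count < out_cap; i++) {
        int v = (i < n_frames) ? voiced[i] : 0;
        if (v && !in_speech) {
            in_speech = 1;
            start = (long)i * frame_ms;
        } else if (!v && in_speech) {
            in_speech = 0;
            out[count].start_ms = start;
            out[count].end_ms = (long)i * frame_ms;
            count++;
        }
    }
    return count;
}
```

To cut the audio, a time in milliseconds maps to a sample index as `ms * sample_rate / 1000`. Writing each range to its own ".wav" file can then be done with libsndfile, as examples/fvadwav.c does for its output files.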

How to build?

Hi,
I cloned the project with git,
then ran:
cd libfvad/
./configure
but it tells me:
-bash: ./configure: No such file or directory
Of course, I have already run
sudo apt install autoconf libtool pkg-config

So what is wrong?
How do I solve the problem?

Can't rebuild examples/fvadwav.c

After building and installing I try:

cp examples/fvadwav.c ~/tmp_proj/
g++ -v -g fvadwav.c

and get an error:

Using built-in specs.
COLLECT_GCC=g++
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/8/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu 8.3.0-6ubuntu1~18.04.1' --with-bugurl=file:///usr/share/doc/gcc-8/README.Bugs --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++ --prefix=/usr --with-gcc-major-version-only --program-suffix=-8 --program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --enable-bootstrap --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-libmpx --enable-plugin --enable-default-pie --with-system-zlib --with-target-system-zlib --enable-objc-gc=auto --enable-multiarch --disable-werror --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-offload-targets=nvptx-none --without-cuda-driver --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
gcc version 8.3.0 (Ubuntu 8.3.0-6ubuntu1~18.04.1) 
COLLECT_GCC_OPTIONS='-v' '-g' '-shared-libgcc' '-mtune=generic' '-march=x86-64'
 /usr/lib/gcc/x86_64-linux-gnu/8/cc1plus -quiet -v -imultiarch x86_64-linux-gnu -D_GNU_SOURCE fvadwav.c -quiet -dumpbase fvadwav.c -mtune=generic -march=x86-64 -auxbase fvadwav -g -version -fstack-protector-strong -Wformat -Wformat-security -o /tmp/ccxmJrHK.s
GNU C++14 (Ubuntu 8.3.0-6ubuntu1~18.04.1) version 8.3.0 (x86_64-linux-gnu)
	compiled by GNU C version 8.3.0, GMP version 6.1.2, MPFR version 4.0.1, MPC version 1.1.0, isl version isl-0.19-GMP

GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
ignoring duplicate directory "/usr/include/x86_64-linux-gnu/c++/8"
ignoring nonexistent directory "/usr/local/include/x86_64-linux-gnu"
ignoring nonexistent directory "/usr/lib/gcc/x86_64-linux-gnu/8/../../../../x86_64-linux-gnu/include"
#include "..." search starts here:
#include <...> search starts here:
 /usr/include/c++/8
 /usr/include/x86_64-linux-gnu/c++/8
 /usr/include/c++/8/backward
 /usr/lib/gcc/x86_64-linux-gnu/8/include
 /usr/local/include
 /usr/lib/gcc/x86_64-linux-gnu/8/include-fixed
 /usr/include/x86_64-linux-gnu
 /usr/include
End of search list.
GNU C++14 (Ubuntu 8.3.0-6ubuntu1~18.04.1) version 8.3.0 (x86_64-linux-gnu)
	compiled by GNU C version 8.3.0, GMP version 6.1.2, MPFR version 4.0.1, MPC version 1.1.0, isl version isl-0.19-GMP

GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
Compiler executable checksum: 27ae9a20c27efba91196488dcf7713bb
COLLECT_GCC_OPTIONS='-v' '-g' '-shared-libgcc' '-mtune=generic' '-march=x86-64'
 as -v --64 -o /tmp/ccNkTGt2.o /tmp/ccxmJrHK.s
GNU assembler version 2.30 (x86_64-linux-gnu) using BFD version (GNU Binutils for Ubuntu) 2.30
COMPILER_PATH=/usr/lib/gcc/x86_64-linux-gnu/8/:/usr/lib/gcc/x86_64-linux-gnu/8/:/usr/lib/gcc/x86_64-linux-gnu/:/usr/lib/gcc/x86_64-linux-gnu/8/:/usr/lib/gcc/x86_64-linux-gnu/
LIBRARY_PATH=/usr/lib/gcc/x86_64-linux-gnu/8/:/usr/lib/gcc/x86_64-linux-gnu/8/../../../x86_64-linux-gnu/:/usr/lib/gcc/x86_64-linux-gnu/8/../../../../lib/:/lib/x86_64-linux-gnu/:/lib/../lib/:/usr/lib/x86_64-linux-gnu/:/usr/lib/../lib/:/usr/lib/gcc/x86_64-linux-gnu/8/../../../:/lib/:/usr/lib/
COLLECT_GCC_OPTIONS='-v' '-g' '-shared-libgcc' '-mtune=generic' '-march=x86-64'
 /usr/lib/gcc/x86_64-linux-gnu/8/collect2 -plugin /usr/lib/gcc/x86_64-linux-gnu/8/liblto_plugin.so -plugin-opt=/usr/lib/gcc/x86_64-linux-gnu/8/lto-wrapper -plugin-opt=-fresolution=/tmp/cc5K4mhk.res -plugin-opt=-pass-through=-lgcc_s -plugin-opt=-pass-through=-lgcc -plugin-opt=-pass-through=-lc -plugin-opt=-pass-through=-lgcc_s -plugin-opt=-pass-through=-lgcc --build-id --eh-frame-hdr -m elf_x86_64 --hash-style=gnu --as-needed -dynamic-linker /lib64/ld-linux-x86-64.so.2 -pie -z now -z relro /usr/lib/gcc/x86_64-linux-gnu/8/../../../x86_64-linux-gnu/Scrt1.o /usr/lib/gcc/x86_64-linux-gnu/8/../../../x86_64-linux-gnu/crti.o /usr/lib/gcc/x86_64-linux-gnu/8/crtbeginS.o -L/usr/lib/gcc/x86_64-linux-gnu/8 -L/usr/lib/gcc/x86_64-linux-gnu/8/../../../x86_64-linux-gnu -L/usr/lib/gcc/x86_64-linux-gnu/8/../../../../lib -L/lib/x86_64-linux-gnu -L/lib/../lib -L/usr/lib/x86_64-linux-gnu -L/usr/lib/../lib -L/usr/lib/gcc/x86_64-linux-gnu/8/../../.. /tmp/ccNkTGt2.o -lstdc++ -lm -lgcc_s -lgcc -lc -lgcc_s -lgcc /usr/lib/gcc/x86_64-linux-gnu/8/crtendS.o /usr/lib/gcc/x86_64-linux-gnu/8/../../../x86_64-linux-gnu/crtn.o
/tmp/ccNkTGt2.o: In function `process_sf(SNDFILE_tag*, Fvad*, unsigned long, SNDFILE_tag**, _IO_FILE*)':
/home/t4nner/proj/learning/vad/fvadwav.c:38: undefined reference to `sf_read_double'
/home/t4nner/proj/learning/vad/fvadwav.c:44: undefined reference to `fvad_process'
/home/t4nner/proj/learning/vad/fvadwav.c:57: undefined reference to `sf_write_double'
/tmp/ccNkTGt2.o: In function `main':
/home/t4nner/proj/learning/vad/fvadwav.c:114: undefined reference to `fvad_new'
/home/t4nner/proj/learning/vad/fvadwav.c:126: undefined reference to `fvad_set_mode'
/home/t4nner/proj/learning/vad/fvadwav.c:179: undefined reference to `sf_open'
/home/t4nner/proj/learning/vad/fvadwav.c:181: undefined reference to `sf_strerror'
/home/t4nner/proj/learning/vad/fvadwav.c:190: undefined reference to `fvad_set_sample_rate'
/home/t4nner/proj/learning/vad/fvadwav.c:205: undefined reference to `sf_open'
/home/t4nner/proj/learning/vad/fvadwav.c:207: undefined reference to `sf_strerror'
/home/t4nner/proj/learning/vad/fvadwav.c:242: undefined reference to `sf_close'
/home/t4nner/proj/learning/vad/fvadwav.c:244: undefined reference to `sf_close'
/home/t4nner/proj/learning/vad/fvadwav.c:246: undefined reference to `fvad_free'
collect2: error: ld returned 1 exit status

My /usr/include contains:

➜  vad ll /usr/include | grep sndfile  
-rw-r--r--   1 root root  29K Jun  8  2019 sndfile.h
-rw-r--r--   1 root root  13K Jun  8  2019 sndfile.hh

/usr/local/include:

➜  vad ll -t /usr/local/include | head -n 2
-rw-r--r-- 1 root root 2.6K Jan 28 15:01 fvad.h

How can I reproduce your example code without errors?

Update

I found a solution: the libraries have to be passed to the linker.

g++ -v -g fvadwav.c -lsndfile -lfvad

Please add this to the description.

Does not take into account bit depth and channel numbers

To work out how many bytes are in x milliseconds of audio, you must take the bit depth and the number of channels into account. For example, to find the number of bytes in x milliseconds, this is what I do:

long bytes_per_second(int sample_rate, int8_t bit_depth, int8_t channels) {
    // bit depth in bytes, e.g. 16 bits -> 2 bytes
    auto byte_depth = bit_depth / 8;

    return static_cast<long>(sample_rate) * channels * byte_depth;
}

int frame_ms = 10;  // 10, 20 or 30 milliseconds
auto bps = bytes_per_second(sample_rate, bit_depth, channels);
auto bytes_per_millisecond = bps / 1000;
auto bytes_per_chunk = bytes_per_millisecond * frame_ms;

For 10 milliseconds of audio at an 8000 Hz sample rate, this could be 80 bytes (8-bit mono), 160 bytes (16-bit mono), 160 bytes (8-bit, 2 channels), and so on. Currently fvad_process only accepts 80 bytes for 8000 Hz and 10 milliseconds. Does the WebRTC VAD impose these limitations?

Also, would it hurt the accuracy of the WebRTC VAD if I gave it a number of bytes in between the 10, 20 and 30 millisecond sizes? For example, 11 or 24 milliseconds' worth of bytes? This is a problem for me because I end up with leftover bytes that don't fit neatly into 10, 20 or 30 milliseconds.

So I was hoping I could redistribute the bytes so that some chunks include 1 more byte, which would use up the remainder. I hope this makes sense: given 18 bytes / 4 chunk_size = 4 bytes per chunk with a remainder of 2 bytes, a solution could be to use 5 bytes per chunk for the first 10 bytes and then 4-byte chunks for the remaining 8 bytes.
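A possible way to look at this, hedged because it is based on my reading of fvad.h rather than on anything stated above: `fvad_process` takes a count of 16-bit mono samples (not bytes), and only frame lengths of 10, 20 or 30 ms are accepted, so other lengths are rejected rather than processed less accurately. Under those assumptions, the sketch below computes frame sizes, downmixes interleaved stereo to mono, and pads a short final frame with silence instead of redistributing bytes. All function names are hypothetical helpers, and zero-padding is just one common choice.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Samples per channel in one frame, e.g. 8000 Hz * 10 ms / 1000 = 80. */
size_t frame_samples(int sample_rate_hz, int frame_ms)
{
    return (size_t)sample_rate_hz * (size_t)frame_ms / 1000;
}

/* Bytes that one frame occupies in a source buffer of the given format. */
size_t frame_bytes(int sample_rate_hz, int frame_ms, int bit_depth, int channels)
{
    return frame_samples(sample_rate_hz, frame_ms)
           * (size_t)(bit_depth / 8) * (size_t)channels;
}

/* Average interleaved 16-bit stereo (L R L R ...) into mono.
 * `n_pairs` is the number of L/R pairs. */
void stereo_to_mono(const int16_t *interleaved, size_t n_pairs, int16_t *mono)
{
    for (size_t i = 0; i < n_pairs; i++) {
        int32_t sum = (int32_t)interleaved[2 * i] + interleaved[2 * i + 1];
        mono[i] = (int16_t)(sum / 2);
    }
}

/* Copy frame number `index` into `frame` (`frame_len` samples). If the
 * input ends mid-frame, the remainder is filled with zeros (silence).
 * Returns the number of real samples copied, or 0 past the end. */
size_t get_padded_frame(const int16_t *samples, size_t n_samples,
                        size_t frame_len, size_t index, int16_t *frame)
{
    size_t offset = index * frame_len;
    if (offset >= n_samples)
        return 0;
    size_t avail = n_samples - offset;
    if (avail > frame_len)
        avail = frame_len;
    memcpy(frame, samples + offset, avail * sizeof *frame);
    memset(frame + avail, 0, (frame_len - avail) * sizeof *frame);
    return avail;
}
```

Dropping the leftover samples is another option: at 8000 Hz a partial 10 ms frame is less than 80 samples, which rarely matters for segmentation.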

C++ only works when installed from repo not from release

Hey there!

Trying to get this lib to work with C++ was giving me headaches until I discovered this pull request ("Add support against linkage with C++ programs") and your comment that you added the support with this commit.

I had downloaded the lib from the release page, which is quite outdated, and this fix was not included in that release. Cloning and building from the repo solved my problem, and the header then included the code I needed.

So either advise people to build directly from master or make a new release.

Many thanks :)

Are there any parameters I can use to configure the VAD module?

Or does it always output the same thing for the same audio input?
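For what it's worth, the library does expose `fvad_set_mode` and `fvad_set_sample_rate`. As far as I recall from fvad.h, modes run from 0 ("quality", the default) to 3 ("very aggressive"), valid sample rates are 8000, 16000, 32000 and 48000 Hz, and frames must be 10, 20 or 30 ms long. Treat those values as an assumption to check against your copy of the header. The helpers below are hypothetical and only encode these constraints so that input can be validated before calling the library.

```c
#include <stdbool.h>
#include <stddef.h>

/* Sample rates assumed to be accepted by fvad_set_sample_rate(). */
bool vad_rate_is_valid(int rate_hz)
{
    return rate_hz == 8000 || rate_hz == 16000 ||
           rate_hz == 32000 || rate_hz == 48000;
}

/* Aggressiveness modes assumed to be accepted by fvad_set_mode(). */
bool vad_mode_is_valid(int mode)
{
    return mode >= 0 && mode <= 3;
}

/* A frame must hold exactly 10, 20 or 30 ms of samples at the given rate. */
bool vad_frame_len_is_valid(int rate_hz, size_t n_samples)
{
    if (!vad_rate_is_valid(rate_hz))
        return false;
    size_t per_ms = (size_t)rate_hz / 1000;
    return n_samples == 10 * per_ms ||
           n_samples == 20 * per_ms ||
           n_samples == 30 * per_ms;
}

/* Human-readable mode names as given in the header comments (assumed). */
const char *vad_mode_name(int mode)
{
    static const char *names[] = {
        "quality", "low bitrate", "aggressive", "very aggressive"
    };
    return vad_mode_is_valid(mode) ? names[mode] : "invalid";
}
```

A higher mode makes the detector more restrictive about reporting speech, so the same audio can give different results depending on the mode.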

Information request

Hi,
I'm new to the libfvad library.
I am wondering about using this library on an embedded 32-bit microcontroller based on the Cortex-M4 (or on an ESP32), but I'm not able to find any information about its memory or CPU requirements.
Has anyone tried this?
Thank you.
Regards.

MinGW build

I'm trying to build libfvad in MS Windows (i.e. using MinGW) and was getting strange errors, but I think I have them resolved. If I get it to run, I'll issue a pull request.

"-std=c11" is required

While README.md states:

Recommended CFLAGS to turn on warnings: -std=c11 -Wall -Wextra -Wpedantic

It is actually required to use "-std=c11" for the compiled library to work, otherwise problems will arise either when compiling or importing. Maybe the flag should be added to the makefiles.

Tested with GCC 4.8.5: if no "-std" flags are specified, compilation fails with

fvad.c:75:5: error: ‘for’ loop initial declarations are only allowed in C99 mode
     for (size_t i = 0; i < arraysize(valid_rates); i++) {
     ^
fvad.c:75:5: note: use option -std=c99 or -std=gnu99 to compile your code

If "-std=c99" is specified and warnings are turned on, many warnings like this one appear:

In file included from signal_processing/signal_processing_library.h:34:0,
                 from signal_processing/get_scaling_square.c:18:
signal_processing/spl_inl.h: In function ‘WebRtcSpl_CountLeadingZeros32’:
signal_processing/spl_inl.h:42:3: warning: implicit declaration of function ‘static_assert’ [-Wimplicit-function-declaration]
   RTC_COMPILE_ASSERT(sizeof(unsigned int) == sizeof(uint32_t));
   ^

Using a library compiled this way then raises an error similar to "undefined symbol: static_assert".
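A small illustration of why the standard matters here: `static_assert` is only provided by `<assert.h>` from C11 onwards, so in C99 mode the compiler treats it as a call to an undeclared function, which matches the warning and the undefined symbol above. The function `count_leading_zeros32` below is a hypothetical portable stand-in written for this example, not the WebRTC implementation.

```c
#include <assert.h>  /* C11: provides the static_assert macro */
#include <stdint.h>

/* Compile-time check, evaluated by the compiler and never at run time.
 * Needs -std=c11 or later; under -std=c99 this line does not compile. */
static_assert(sizeof(uint32_t) == 4, "uint32_t must be 4 bytes wide");

/* Count the zero bits above the highest set bit of a 32-bit value.
 * Returns 32 for n == 0. */
int count_leading_zeros32(uint32_t n)
{
    int zeros = 0;
    if (n == 0)
        return 32;
    while ((n & 0x80000000u) == 0) {
        n <<= 1;
        zeros++;
    }
    return zeros;
}
```

Building with `-std=c11` (or a newer default such as the one in recent GCC versions) lets the assertion be checked at compile time and leaves no `static_assert` symbol to resolve.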

Node integration

I integrated it with Node.js:
https://github.com/4t4nner/js-libfvad

Maybe adding this example to the code or the description would be a good idea?

How do I use this library in my own application?

I'm new to C++ and I just want to use the fvad_process functionality. How do I include this feature in my own application?

Go wrappers

I wrote a simple Go wrapper for libfvad, if you'd like to start a readme section about that. Or not. :)

Error when using sndfile

Hi,
Thanks for your work.
When I tried to compile the code,
this error occurred:
./configure: line 12008: syntax error near unexpected token `sndfile,'
./configure: line 12008: `PKG_CHECK_MODULES(sndfile, sndfile)'

I have already installed libsndfile1-dev, but it still happens,
and it also prevents the example from being compiled.
Can you tell me how to fix this bug?

Regards
Robin

Quality benchmarks between auditok / webrtcvad / silero-vad

Instruments

We have compared 3 easy-to-use off-the-shelf instruments for voice activity / audio activity detection:

Silero-vad from here - https://github.com/snakers4/silero-vad;
A popular Python version of the webrtcvad - https://github.com/wiseman/py-webrtcvad;
Auditok from this repo - https://github.com/amsehili/auditok;

Caveats

Full disclaimer - we are mostly interested in voice detection, not just silence detection;
In our extensive experiments we noticed that WebRTC is actually much better at detecting silence than at detecting speech (probably by design). It has a lot of false positives when detecting speech;
auditok provides Audio Activity Detection, which in layman's terms probably just means detecting silence;
silero-vad is geared towards speech detection (as opposed to noise or music);
A sensible chunk size for our VAD is at least 75-100 ms (pauses in speech shorter than 100 ms are not very meaningful, but we prefer 150-250 ms chunks, see quality comparison here), while auditok and webrtcvad use 30-50 ms chunks (we used the default values of 30 ms for webrtcvad and 50 ms for auditok);
We have excluded pyannote-audio for now (https://github.com/pyannote/pyannote-audio), since it features models pre-trained only on limited academic datasets and is mostly a recipe collection / toolkit to build your own tools, not a finished tool per se (also, for such a simple task the amount of code bloat is puzzling from a production standpoint; our internal VAD training code is literally just 5 Python modules);

Methodology

Please refer here - https://github.com/snakers4/silero-vad#vad-quality-metrics-methodology

Quality Benchmarks

Finished tests:

Portability and Speed

It looks like webrtcvad was originally written in C++ around 2016, so theoretically it can be ported to many platforms;
I have asked in the community: the original VAD seems to have matured, and the Python version is based on the 2018 version;
It looks like auditok is written in plain Python, but I guess the algorithm itself can be ported;
silero-vad is based on PyTorch and ONNX, so it has the same portability options that both of these frameworks offer (mobile, different backends for ONNX, Java and C++ inference APIs, graph conversion from ONNX);

This is by no means extensive or complete research on the topic; please point out anything that is lacking.

'./configure: no such file or directory' error, why?

I solved it.

libfvad classifies any noise as a human voice.

Any noise or intense sound is classified as a human voice.
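One thing that sometimes helps with short noise bursts, offered as a generic post-processing idea rather than anything libfvad provides: ignore voiced runs that are shorter than a minimum number of frames. The function name `drop_short_bursts` is hypothetical, and the right minimum length depends on your audio. Trying a more aggressive mode via `fvad_set_mode` is the other obvious knob.

```c
#include <stddef.h>

/* Rewrite `voiced` in place: every run of 1s that is shorter than
 * `min_run` frames is cleared to 0, longer runs are left untouched. */
void drop_short_bursts(int *voiced, size_t n_frames, size_t min_run)
{
    size_t i = 0;
    while (i < n_frames) {
        if (!voiced[i]) {
            i++;
            continue;
        }
        size_t start = i;
        while (i < n_frames && voiced[i])
            i++;                      /* i now points just past the run */
        if (i - start < min_run) {
            for (size_t j = start; j < i; j++)
                voiced[j] = 0;
        }
    }
}
```

With 10 ms frames, a `min_run` of 3 discards anything detected for less than 30 ms. This will not help with sustained loud noise, which needs a different detector or a noise-suppression step in front of the VAD.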

Issue while building libfvad on Ubuntu 22.04.1 LTS

I am unable to install libfvad on Ubuntu 22.04.1 LTS. When running sudo autoreconf -i, I get the error below:

libtoolize: putting auxiliary files in AC_CONFIG_AUX_DIR, 'ac-aux'.
libtoolize: copying file 'ac-aux/ltmain.sh'
libtoolize: putting macros in AC_CONFIG_MACRO_DIRS, 'm4'.
libtoolize: copying file 'm4/libtool.m4'
libtoolize: copying file 'm4/ltoptions.m4'
libtoolize: copying file 'm4/ltsugar.m4'
libtoolize: copying file 'm4/ltversion.m4'
libtoolize: copying file 'm4/lt~obsolete.m4'
configure.ac:16: error: possibly undefined macro: _AC_C_STD_TRY
If this token and others are legitimate, please use m4_pattern_allow.
See the Autoconf documentation.
configure.ac:22: error: possibly undefined macro: AC_MSG_ERROR
autoreconf: error: /usr/bin/autoconf failed with exit status: 1

Please help me fix this issue.

modules/audio_processing/vad porting

Thanks for the good project!

I'm new to VAD. It seems that the folder modules/audio_processing/vad also contains some VAD-related code. Is that right? If so, are you planning to port that part too?

I also wonder whether you plan to write some Python binding examples, so that people can easily work with this library. :-)

what does "libfvad" stand for?

I guess "Library for Voice Activity Detection", but I'm not sure... It would be nice in the README

How do I use it in unimrcp?

I tried to rewrite the unimrcp mpf_activity_detector and use libfvad to replace the VAD module,
but I get errors like:
../../platforms/libunimrcp-client/.libs/libunimrcpclient.so: undefined reference to `fvad_new'
../../platforms/libunimrcp-client/.libs/libunimrcpclient.so: undefined reference to `fvad_reset'
../../platforms/libunimrcp-client/.libs/libunimrcpclient.so: undefined reference to `fvad_process'

How to use it in an Android application?

How can I use this in an Android application? Can anyone help me?