
disvoice's Introduction

DisVoice


DisVoice is a Python framework designed to compute features from speech files. DisVoice computes glottal, phonation, articulation, prosody, and phonological features, as well as feature representations learned with autoencoders. The features can be computed from both sustained vowels and continuous speech utterances, with the aim of recognizing paralinguistic aspects of speech.

The features can be used in classifiers to recognize emotions, or the communication capabilities of patients with different speech disorders, including diseases with a functional origin such as larynx cancer or nodules; craniofacial disorders such as the hypernasality developed with cleft lip and palate; and neurodegenerative disorders such as Parkinson's and Huntington's diseases.

The features are also suitable for evaluating mood disorders such as depression, based on speech patterns.

For additional details about each feature type and how to use DisVoice, please check the project documentation.

Install

Praat should be installed first, and the path to its executable should be added as an environment variable.

For Linux

apt-get install praat
pip install disvoice

or

python setup.py install

For Windows

Download the latest version of Praat from https://www.fon.hum.uva.nl/praat/download_win.html

and add the path to the Praat executable to the environment variables.

Then

pip install disvoice

or

python setup.py install

Kaldi must be installed beforehand if you want features in Kaldi output format.
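
A minimal usage sketch after installation. This is a sketch assuming the class-based API and the extract_features_file signature that appear elsewhere on this page; module paths and fmt options may differ between versions, and the audio path is illustrative:

    from disvoice.phonation import Phonation

    # Extract dynamic (frame-level) phonation features from a speech file
    # and return them as a NumPy array; static=True would instead return
    # one vector of utterance-level statistics.
    phonation = Phonation()
    features = phonation.extract_features_file("audio/utterance.wav", static=False, plots=False, fmt="npy")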

Reference

If you use DisVoice for research purposes, please cite the following papers, depending on the features you use:

Glottal features

[1] Belalcázar-Bolaños, E. A., Orozco-Arroyave, J. R., Vargas-Bonilla, J. F., Haderlein, T., & Nöth, E. (2016, September). Glottal Flow Patterns Analyses for Parkinson’s Disease Detection: Acoustic and Nonlinear Approaches. In International Conference on Text, Speech, and Dialogue (pp. 400-407). Springer.

Phonation features

[1] T. Arias-Vergara, J. C. Vásquez-Correa, J. R. Orozco-Arroyave. "Parkinson's Disease and Aging: Analysis of Their Effect in Phonation and Articulation of Speech." Cognitive Computation (2017).

[2] Vásquez-Correa, J. C., et al. "Towards an automatic evaluation of the dysarthria level of patients with Parkinson's disease." Journal of communication disorders 76 (2018): 21-36.

Articulation features

[1] Vásquez-Correa, J. C., et al. "Towards an automatic evaluation of the dysarthria level of patients with Parkinson's disease." Journal of communication disorders 76 (2018): 21-36.

[2] J. R. Orozco-Arroyave, J. C. Vásquez-Correa, et al. "NeuroSpeech: An open-source software for Parkinson's speech analysis." Digital Signal Processing (2017).

Prosody features

[1] N. Dehak, P. Dumouchel, and P. Kenny. "Modeling prosodic features with joint factor analysis for speaker verification." IEEE Transactions on Audio, Speech, and Language Processing 15.7 (2007): 2095-2103.

[2] Vásquez-Correa, J. C., et al. "Towards an automatic evaluation of the dysarthria level of patients with Parkinson's disease." Journal of communication disorders 76 (2018): 21-36.

Phonological features

[1] Vásquez-Correa, J. C., et al (2019). Phonet: a Tool Based on Gated Recurrent Neural Networks to Extract Phonological Posteriors from Speech. Proc. Interspeech 2019, 549-553.

Representation learning-based features

[1] Vasquez-Correa, J. C., et al. (2020). Parallel Representation Learning for the Classification of Pathological Speech: Studies on Parkinson’s Disease and Cleft Lip and Palate. Speech Communication, 122, 56-67.

License

MIT

disvoice's People

Contributors

deepsource-autofix[bot], deepsourcebot, dependabot[bot], g-thor, jcvasquezc, luigiattorresi, neshvig10, nicanor5, nicanorgarcia, samuelcahyawijaya, tariasvergara


disvoice's Issues

TypeError: plot_pros() takes 5 positional arguments but 7 were given

Thanks for this project.
I encountered some errors when I ran ./test_prosody.sh.
Could you tell me which plot function I should call at this line?
prosody.py#L247

Error message:
Traceback (most recent call last):
  File "prosody.py", line 406, in <module>
    profeats = prosody_dynamic(audio_file)
  File "prosody.py", line 248, in prosody_dynamic
    plot_pros(data_audio, fs, F0, seg_voiced, Ev, featvec, f0v)

Minimum length of input audio segment

Hi, this is a really useful library for extracting interpretable speech features! Thanks!!

I want to ask about the minimum length of the input audio that goes into each of the feature extraction functions. It seems that, for the prosody features, the input has to be longer than 0.6 s?

        pitchON = np.where(F0!=0)[0]     # indices of voiced frames (nonzero F0)
        dchange = np.diff(pitchON)
        change = np.where(dchange>1)[0]  # gaps mark boundaries between voiced segments
        iniV = pitchON[0]                # first voiced frame; raises IndexError if none exist

And is this the same for the phonation features?

Thanks again.
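
If a minimum length is indeed required, a hedged guard like the following sketch (assuming SoundFile for I/O and the 0.6 s threshold mentioned above, with an illustrative filename) can skip clips before extraction fails:

    import soundfile as sf

    # pitchON[0] above raises an IndexError when no voiced frames are found,
    # so skip clips that are too short to contain a usable voiced segment.
    MIN_DURATION_S = 0.6  # assumed threshold, per the question above

    signal, fs = sf.read("clip.wav")
    if len(signal) / fs < MIN_DURATION_S:
        print("clip shorter than 0.6 s; skipping prosody features")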

Error in Articulation features

Hello,

I am trying to extract articulation features and I am getting the following error. How can I fix it? Thank you!

[screenshot of the error attached]

VisibleDeprecationWarning and TypeError: can't convert cuda:0 device type tensor to numpy.

Thanks for the awesome toolkit.

After installing all required packages, I got the warning message below when I ran glottal.py and articulation.py, respectively.

/usr/local/lib/python3.6/dist-packages/numpy/core/_asarray.py:136: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray
  return array(a, dtype, copy=False, order=order, subok=True)
But I can get the desired output file.

When I run phonation.py or phonological.py, I can get the desired output file without any warning messages.

And if I run Representationlearning.py, I get the error below.

root@198c2471ad59:/codes/m456_smk/DisVoice/replearning# ./test_replearning.sh
/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py:1639: UserWarning: nn.functional.sigmoid is deprecated. Use torch.sigmoid instead.
  warnings.warn("nn.functional.sigmoid is deprecated. Use torch.sigmoid instead.")
Traceback (most recent call last):
  File "replearning.py", line 225, in <module>
    script_manager(sys.argv, replearning)
  File "/codes/m456_smk/DisVoice/replearning/../script_mananger.py", line 31, in script_manager
    features=feature_method.extract_features_file(audio, static=static, plots=plots, fmt=fmt)
  File "replearning.py", line 110, in extract_features_file
    hb=self.AEspeech.compute_bottleneck_features(audio)
  File "/codes/m456_smk/DisVoice/replearning/AEspeech.py", line 177, in compute_bottleneck_features
    return bot.data.numpy()
TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.

(The same warning and traceback are then printed a second time.)

I tried several solutions based on similar issues on Stack Overflow, but none of them worked.

I am running these in Docker. Some environment details are shown below:
Ubuntu 20.04.1
Python 3.6.9
Package Version


chainer 7.7.0
chardet 3.0.4
click 7.1.2
cloudpickle 1.3.0
cntk-gpu 2.7
cupy 7.8.0
cycler 0.10.0
Cython 0.29.21
grpcio 1.32.0
h5py 2.10.0
httplib2 0.18.1
idna 2.10
imageio 2.9.0
importlib-metadata 1.7.0
ipykernel 5.3.4
ipython 7.16.1
ipython-genutils 0.2.0
ipywidgets 7.5.1
kaldi-io 0.9.0
Keras 2.4.3
Keras-Preprocessing 1.1.2
librosa 0.8.0
matplotlib 3.0.2
numba 0.51.2
numpy 1.19.5
pandas 1.1.2
pandocfilters 1.4.2
parso 0.7.1
pathlib 1.0.1
pickleshare 0.7.5
Pillow 7.2.0
pip 21.0.1
pooch 1.2.0
praat-parselmouth 0.3.3
ptyprocess 0.6.0
pyasn1 0.4.8
pyasn1-modules 0.2.8
pycparser 2.20
pydub 0.24.1
Pygments 2.6.1
pygobject 3.26.1
pygpu 0.7.6
pyparsing 2.4.7
pyrsistent 0.16.0
PySocks 1.7.1
pysptk 0.1.16
python-apt 1.6.5+ubuntu0.3
python-dateutil 2.8.1
python-distutils-extra 2.39
python-gflags 1.5.1
python-speech-features 0.6
pytz 2021.1
PyWavelets 1.1.1
PyYAML 5.3.1
pyzmq 19.0.2
qtconsole 4.7.6
QtPy 1.9.0
scikit-image 0.17.2
scikit-learn 0.23.2
scipy 1.5.2
seaborn 0.9.0
Send2Trash 1.5.0
setuptools 54.1.1
simplegeneric 0.8.1
six 1.15.0
SoundFile 0.10.3.post1
stopit 1.1.1
suds-jurko 0.6
tabulate 0.8.7
tensorboard 2.4.1
tensorboard-plugin-wit 1.7.0
tensorflow 2.4.1
tensorflow-estimator 2.4.0
tensorflow-gpu 2.3.0
tensorflow-probability 0.11.0
Theano 1.0.5
threadpoolctl 2.1.0
tifffile 2020.8.25
torch 1.7.0
torchaudio 0.7.0
torchvision 0.8.0.dev20200828+cu101
Werkzeug 1.0.1
wheel 0.36.2
widgetsnbextension 3.5.1
wrapt 1.12.1
zipp 3.1.0

Could you help me with these?
Many thanks
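
A likely fix for the reported TypeError, following the hint in the error message itself (a sketch, not the maintainers' patch): move the tensor to host memory before converting it to NumPy in AEspeech.compute_bottleneck_features.

    import torch

    # Stand-in for the bottleneck tensor; falls back to CPU when CUDA is absent.
    bot = torch.zeros(3, device="cuda" if torch.cuda.is_available() else "cpu")

    # bot.data.numpy() raises the TypeError above on a CUDA tensor;
    # detaching and copying to the CPU first avoids it.
    features = bot.detach().cpu().numpy()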

Will a parselmouth-praat version be released?

I'm running my code on a shared commercial server where it is difficult to install Praat, so I've been relying on the Python wrapper for Praat called Parselmouth. I wonder if this is something you would implement? Thanks!

Unable to install disvoice on Mac M1 chip, ERROR: Could not find a version that satisfies the requirement kaldi_iotqdmmatplotlibnumpytorchlibrosapandaspysptkphonetscipyscikit_learn

I have a miniforge Python environment on a Mac M1 chip. The reason I am using this environment is that it's the only way I can successfully install TensorFlow on my Mac M1 chip. When trying to install disvoice with pip I get the error:
ERROR: Could not find a version that satisfies the requirement kaldi_iotqdmmatplotlibnumpytorchlibrosapandaspysptkphonetscipyscikit_learn

Any help will be appreciated.

Feature Selection Algorithm

You mentioned some feature selection algorithms, such as LASSO and Relief-F, in your paper. Where is the implementation of the feature selection algorithms in your code?

Prosodic Features

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
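
For context, a minimal reproduction of this error: NumPy refuses to collapse a multi-element boolean array into a single truth value, so an explicit .any() or .all() reduction is needed wherever the code tests such an array.

    import numpy as np

    a = np.array([0.0, 1.0, 2.0])
    try:
        if a != 0:  # ambiguous: three booleans, one condition
            pass
    except ValueError as e:
        print(e)  # the message quoted above

    if (a != 0).any():  # explicit reduction fixes it
        print("at least one nonzero element")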

How to use the dynamic phonation features with mfcc/fbank features as the input to feed a DNN

Hi @jcvasquezc ,

I would like to use the dynamic phonation features together with mfcc/fbank features as the input to feed a DNN.

The related code is shown below:

phonafeature = phonation.extract_features_file(filename, static=False, plots=False, fmt="npy")
fbankfeature, energies = python_speech_features.fbank(filename, samplerate=16000, nfilt=40, nfft=768, winlen=0.04, winstep=0.02, winfunc=np.hamming)

Because I noticed that the dynamic phonation features use winlen=0.04 and winstep=0.02, I set the same parameter values in the fbank function.
However, len(phonafeature) and len(fbankfeature) for the same filename input are not the same.
E.g.: filename=demo.wav; this demo.wav is 15 s long with a 16000 Hz sample rate.
The shape of phonafeature for this demo.wav is (430, 7), and the shape of fbankfeature is (749, 40).

For concatenation purposes, I have to pad phonafeature with the constant value 0 to match len(fbankfeature), i.e., from (430, 7) to (749, 7). Then I can get the concatenated phonation-plus-fbank features of shape (749, 47) for demo.wav.

But I don't think this is the correct way to use the dynamic phonation features with mfcc/fbank features as the input to feed a DNN.

Could you help me with this issue?
And why is there a difference in the length of the output phonation features and fbank features under the same winlen and winstep?

Many thanks
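
The mismatch is likely because the dynamic phonation features are computed only over voiced frames, while fbank covers the whole file; note also that python_speech_features.fbank expects a signal array rather than a filename. One rough alternative to zero-padding, assuming approximate alignment is acceptable (the voiced-only phonation frames are not strictly time-aligned with the fbank frames), is truncation to the shorter length; a sketch with placeholder arrays standing in for the real features:

    import numpy as np

    # Placeholder shapes from the example above: dynamic phonation (430, 7)
    # and fbank (749, 40) frames for the same 15 s file.
    phonafeature = np.zeros((430, 7))
    fbankfeature = np.zeros((749, 40))

    # Truncate to the shorter sequence, then concatenate along the feature axis.
    n_frames = min(len(phonafeature), len(fbankfeature))
    combined = np.concatenate([phonafeature[:n_frames], fbankfeature[:n_frames]], axis=1)
    print(combined.shape)  # (430, 47)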

Error on prosody extraction

I can reproduce the plot in static mode, but get the error after that plot. Using dynamic mode gives a similar error without the resulting plot. I think the issue is that the wav file argument is appended to the Praat script's folder instead of being resolved against the current directory.
Here is the complete error message:

$python prosody.py "./001_ddk1_PCGITA.wav" "featuresDDKdyn.txt" "static" "true"
Error: Cannot open file “/tmp/DisVoice/praat/./001_ddk1_PCGITA.wav”.
Script line 14 not performed or completed:
« Read from file... 'fileName$' »
Script “/tmp/DisVoice/prosody/../praat/vuv_praat.praat” not completed.
Praat: script command <</tmp/DisVoice/prosody/../praat/vuv_praat.praat ./001_ddk1_PCGITA.wav /tmp/DisVoice/prosody/../tempfiles/pitchtemp.txt /tmp/DisVoice/prosody/../tempfiles/voicetemp.txt 60 350 0.01 0.02 0.01>> not completed.

/tmp/DisVoice/prosody/../praat/praat_functions.py:135: UserWarning: loadtxt: Empty input file: "/tmp/DisVoice/prosody/../tempfiles/pitchtemp.txt"
  pitch_data=np.loadtxt(fileTxt)
Traceback (most recent call last):
  File "prosody.py", line 677, in <module>
    avgF0slopes,stdF0slopes,MSEF0, SVU,VU,UVU,VVU,VS,US,URD,VRD,URE,VRE,PR,maxvoicedlen,maxunvoicedlen,minvoicedlen,minunvoicedlen,rvuv,energyslope,RegCoefenergy,msqerrenergy,RegCoeff0,meanNeighborenergydiff,stdNeighborenergydiff, F0_rec, f0real, venergy, uenergy  = intonation_duration(audio_file, flag_plots=flag_plots)
  File "prosody.py", line 352, in intonation_duration
    pitch_z,ttotal = praat_functions.decodeF0(temp_filename_f0,len(data_audio)/fs,size_step)
  File "/tmp/DisVoice/prosody/../praat/praat_functions.py", line 140, in decodeF0
    time_voiced=pitch_data[0] # First datum is the time stamp
IndexError: index 0 is out of bounds for axis 0 with size 0

Error while extracting articulation features

Unable to extract features:
FileNotFoundError: [Errno 2] No such file or directory: '/usr/local/lib/python3.10/dist-packages/disvoice/articulation/../../tempfiles/tempFormantsartic4065_v.txt'

Thank you

Praat scripts to compute the fundamental frequency do not work properly with relative paths

There are some errors when the arguments for the wav files are entered with a relative path, because the Praat scripts do not allow relative paths.

There are two options to fix the issue:

  1. Use absolute paths when you enter the audio file (see the sketch below), or
  2. Change the default algorithm to compute the fundamental frequency from 'praat' to 'rapt' in the phonation and prosody analyses.
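
A sketch of option 1 (assuming the class-based API of recent DisVoice releases; the module path and file name are illustrative): resolve the path before handing it to the extractor.

    import os
    from disvoice.prosody import Prosody  # assumed module path; adjust to your install

    # Resolve the relative path so the Praat scripts receive an absolute one.
    audio_file = os.path.abspath("./001_ddk1_PCGITA.wav")

    prosody = Prosody()
    features = prosody.extract_features_file(audio_file, static=True, plots=False, fmt="npy")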

Error with Glottal Features

The glottal feature worked for most of my audio files, but for one of them it had this error:

File "/Users/sruthikurada/PycharmProjects/ML-Parkinson-Disease/DisVoice/glottal/glottal.py", line 287, in extract_features_file
    df[k]=[feat_st[e]]
IndexError: index 20 is out of bounds for axis 0 with size 20

Preprocessing before feature extraction

Hi @jcvasquezc thanks again for the great lib!

I am just wondering if I should perform any data preprocessing before feeding the audio to extract_features_file. My audio files are utterances (> 2 s), mostly one per speaker (sometimes one contains a second speaker saying "yes" or "um"), but there are loudness differences between the two speakers' utterances. Do you suggest I scale the audio waveforms to (-1, +1), save the audio files, and then feed them to the feature extractors?

The downstream task is classification, so I didn't want to complicate it by performing more advanced preprocessing. Min-max scaling seems sufficient; do you think so?
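
A minimal peak-normalization sketch along those lines (assuming SoundFile for I/O; filenames are illustrative):

    import numpy as np
    import soundfile as sf

    # Scale the waveform to (-1, +1) by its absolute peak, then save a
    # normalized copy to feed into the feature extractors.
    signal, fs = sf.read("utterance.wav")
    signal = signal / np.max(np.abs(signal))
    sf.write("utterance_norm.wav", signal, fs)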

Cannot extract glottal features

I ran into a problem when extracting glottal features: winShift(=5)/1000*fs evaluates to 0 (Python 2 integer division), which causes the error.

So I fixed the error like this:
# Calculate LP-residual and extract N maxima per mean-based signal determined intervals
res = utils_gci.GetLPCresidual(x, winLen*fs/1000, winShift*fs/1000, LPC_ord, VUV_inter)

Now the code is still not working. Can someone tell me how to fix this problem?
[screenshot of the error attached]

I used Python 2.7 on Ubuntu 16.04 x64.

By the way, the pysptk package cannot be successfully installed under a Python 3.6 virtualenv.

Thanks and Regards
XU SHIHAO
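
For context, a minimal illustration of the suspected Python 2 pitfall: with integer operands, winShift/1000 truncates to 0 before the multiplication by fs, so multiplying first (or using floats) keeps the window shift nonzero.

    fs = 16000
    winShift = 5  # window shift in milliseconds

    print(winShift / 1000 * fs)  # 0 under Python 2 integer division; 80.0 under Python 3
    print(winShift * fs / 1000)  # 80 under both: multiply before dividing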

ValueError in glottal feature extraction

Thanks for the library! I am trying to extract glottal features from my audio files. The ValueError below showed up in two of my feature extraction pipelines. Do you have any idea about the cause of the error? The feature extraction takes a long time, so I really want to keep it error-free if possible. Thanks again!

Traceback (most recent call last):
    feats = glottalf.extract_features_file(file_audio, static=False, plots=False, fmt="npy")
  File "/mnt/sdb/Tools/DisVoice/glottal/glottal.py", line 194, in extract_features_file
    g_iaif=IAIF(data_frame,fs,GCI)
  File "/mnt/sdb/Tools/DisVoice/glottal/GCI.py", line 147, in IAIF
    residual1=calc_residual(x_filt,x_filt,ord_lpc2,GCI)
  File "/mnt/sdb/Tools/DisVoice/glottal/utils_gci.py", line 470, in calc_residual
    vector_res[start:stop]=vector_res[start:stop]+residual_win
ValueError: operands could not be broadcast together with shapes (20,) (2,)

and

  File "/mnt/sdb/Tools/DisVoice/glottal/glottal.py", line 194, in extract_features_file
    g_iaif=IAIF(data_frame,fs,GCI)
  File "/mnt/sdb/Tools/DisVoice/glottal/GCI.py", line 147, in IAIF
    residual1=calc_residual(x_filt,x_filt,ord_lpc2,GCI)
  File "/mnt/sdb/Tools/DisVoice/glottal/utils_gci.py", line 470, in calc_residual
    vector_res[start:stop]=vector_res[start:stop]+residual_win
ValueError: operands could not be broadcast together with shapes (24,) (26,) 

FileNotFoundError working with Articulation Features

Hi, I have pulled the latest version of this repository. I am having trouble extracting the articulation features from my own audio. I was able to successfully run all of the provided IPython Notebooks.

File "PycharmProjects/ML-Parkinson-Disease/DisVoice/articulation/articulation.py", line 251, in extract_features_file
F0,_=praat_functions.decodeF0(temp_filename_f0,len(data_audio)/float(fs),self.step)
File "PycharmProjects/ML-Parkinson-Disease/DisVoice/articulation/../praat/praat_functions.py", line 139, in decodeF0
if os.stat(fileTxt).st_size==0:

FileNotFoundError: [Errno 2] No such file or directory: 'PycharmProjects/ML-Parkinson-Disease/DisVoice/articulation/../tempfiles/tempF0articulationID17_pd__12_2_1_0.txt'

If it is relevant, I am running this script in my ML-Parkinson-Disease folder, which contains the DisVoice folder within it.

Is there a simpler way to obtain glottal flow signal?

Hello, my apologies for opening this issue. I just need to extract the glottal flow signal from a *.wav file. Is there a simple way to do this? In the ideal scenario, the signature of my function would be something like the following:

def glottal_Flow(file_id):
    # some actions
    return time, glottal_flow

I have been looking at the glottal.py file; it's very complete indeed. I thought you might have gone through this before.

Thanks in advance.

Text file not found. I think it internally uses Praat to estimate the pitch; or do I need to provide the text file when running prosody.py?

Processing audio 1 from 1 001_ddk1_PCGITA.wav
Error: Cannot open file “/home/shsheikh/clones/DisVoice/praat/001_ddk1_PCGITA.wav”.
Script line 14 not performed or completed:
« Read from file... 'fileName$' »
Script “/home/shsheikh/clones/DisVoice/prosody/../praat/vuv_praat.praat” not completed.
Praat: script command <<../praat/vuv_praat.praat 001_ddk1_PCGITA.wav /home/shsheikh/clones/DisVoice/prosody/../tempfiles/tempF0001_ddk1_PCGITA.txt ../tempfiles/tempVUV001_ddk1_PCGITA.txt 60 350 0.01 0.02 0.01>> not completed.

Traceback (most recent call last):
  File "prosody.py", line 392, in <module>
    feat_vec=prosody_static(audio_file, flag_plots, pitch_method='praat')
  File "prosody.py", line 271, in prosody_static
    F0,_=praat_functions.decodeF0(temp_filename_f0,len(data_audio)/float(fs),0.01)
  File "/home/shsheikh/clones/DisVoice/prosody/../praat/praat_functions.py", line 136, in decodeF0
    pitch_data=np.loadtxt(fileTxt)
  File "/usr/local/lib/python3.5/site-packages/numpy/lib/npyio.py", line 962, in loadtxt
    fh = np.lib._datasource.open(fname, 'rt', encoding=encoding)
  File "/usr/local/lib/python3.5/site-packages/numpy/lib/_datasource.py", line 266, in open
    return ds.open(path, mode, encoding=encoding, newline=newline)
  File "/usr/local/lib/python3.5/site-packages/numpy/lib/_datasource.py", line 624, in open
    raise IOError("%s not found." % path)
OSError: /home/shsheikh/clones/DisVoice/prosody/../tempfiles/tempF0001_ddk1_PCGITA.txt not found.

About the phonological and replearning problems

Thank you for your outstanding work! But I have some problems.
First, about the phonological features: I input my own wav files (English and Chinese) but get nothing in the plot, so I would like to know how to fix it.
Second, about replearning: TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.
Sorry to bother you.
