
disvoice's Introduction

DisVoice


DisVoice is a Python framework designed to compute features from speech files. DisVoice computes glottal, phonation, articulation, prosody, and phonological features, as well as feature representations learned with autoencoders. The features can be computed from both sustained vowels and continuous speech utterances, with the aim of recognizing paralinguistic aspects of speech.

The features can be used in classifiers to recognize emotions, or the communication capabilities of patients with different speech disorders, including diseases with a functional origin such as larynx cancer or nodules; craniofacial disorders such as the hypernasality developed with cleft lip and palate; and neurodegenerative disorders such as Parkinson's and Huntington's diseases.

The features are also suitable for evaluating mood disorders such as depression, based on speech patterns.

For additional details about each feature type and how to use DisVoice, please check the project documentation.

Install

Praat should be installed first, and the path to its executable should be added as an environment variable.

For Linux

apt-get install praat
pip install disvoice

or

python setup.py install

For Windows

Download the latest version of Praat from https://www.fon.hum.uva.nl/praat/download_win.html

and add the path to the Praat executable to the environment variables.

Then

pip install disvoice

or

python setup.py install

Kaldi must be installed beforehand if you want features in Kaldi output format.
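
A minimal usage sketch after installation. This is a sketch assuming the class-based API and the extract_features_file signature that appear elsewhere on this page; module paths and fmt options may differ between versions, and the audio path is illustrative:

    from disvoice.phonation import Phonation

    # Extract dynamic (frame-level) phonation features from a speech file
    # and return them as a NumPy array; static=True would instead return
    # one vector of utterance-level statistics.
    phonation = Phonation()
    features = phonation.extract_features_file("audio/utterance.wav", static=False, plots=False, fmt="npy")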

Reference

If you use DisVoice for research purposes, please cite the following papers, depending on the features you use:

Glottal features

[1] Belalcázar-Bolaños, E. A., Orozco-Arroyave, J. R., Vargas-Bonilla, J. F., Haderlein, T., & Nöth, E. (2016, September). Glottal Flow Patterns Analyses for Parkinson’s Disease Detection: Acoustic and Nonlinear Approaches. In International Conference on Text, Speech, and Dialogue (pp. 400-407). Springer.

Phonation features

[1] T. Arias-Vergara, J. C. Vásquez-Correa, J. R. Orozco-Arroyave. "Parkinson's Disease and Aging: Analysis of Their Effect in Phonation and Articulation of Speech." Cognitive Computation (2017).

[2] Vásquez-Correa, J. C., et al. "Towards an automatic evaluation of the dysarthria level of patients with Parkinson's disease." Journal of communication disorders 76 (2018): 21-36.

Articulation features

[1] Vásquez-Correa, J. C., et al. "Towards an automatic evaluation of the dysarthria level of patients with Parkinson's disease." Journal of communication disorders 76 (2018): 21-36.

[2] J. R. Orozco-Arroyave, J. C. Vásquez-Correa, et al. "NeuroSpeech: An open-source software for Parkinson's speech analysis." Digital Signal Processing (2017).

Prosody features

[1] N. Dehak, P. Dumouchel, and P. Kenny. "Modeling prosodic features with joint factor analysis for speaker verification." IEEE Transactions on Audio, Speech, and Language Processing 15.7 (2007): 2095-2103.

[2] Vásquez-Correa, J. C., et al. "Towards an automatic evaluation of the dysarthria level of patients with Parkinson's disease." Journal of communication disorders 76 (2018): 21-36.

Phonological features

[1] Vásquez-Correa, J. C., et al (2019). Phonet: a Tool Based on Gated Recurrent Neural Networks to Extract Phonological Posteriors from Speech. Proc. Interspeech 2019, 549-553.

Representation learning-based features

[1] Vasquez-Correa, J. C., et al. (2020). Parallel Representation Learning for the Classification of Pathological Speech: Studies on Parkinson’s Disease and Cleft Lip and Palate. Speech Communication, 122, 56-67.

License

MIT

disvoice's People

Contributors

deepsource-autofix[bot], deepsourcebot, dependabot[bot], g-thor, jcvasquezc, luigiattorresi, neshvig10, nicanor5, nicanorgarcia, samuelcahyawijaya, tariasvergara


disvoice's Issues

TypeError: plot_pros() takes 5 positional arguments but 7 were given

Thanks for this project.
I encountered some errors when I ran ./test_prosody.sh.
Could you tell me which plot function I should call at this line?
prosody.py#L247

Error message:
Traceback (most recent call last):
  File "prosody.py", line 406, in <module>
    profeats = prosody_dynamic(audio_file)
  File "prosody.py", line 248, in prosody_dynamic
    plot_pros(data_audio, fs, F0, seg_voiced, Ev, featvec, f0v)

Minimum length of input audio segment

Hi, this is a really useful library for extracting interpretable speech features! Thanks!!

I want to ask about the minimum length of the input audio that goes into each of the feature extraction functions. It seems that, for the prosody features, the input has to be longer than 0.6 s?

        pitchON = np.where(F0!=0)[0]     # indices of voiced frames (nonzero F0)
        dchange = np.diff(pitchON)
        change = np.where(dchange>1)[0]  # gaps mark boundaries between voiced segments
        iniV = pitchON[0]                # first voiced frame; raises IndexError if none exist

And is this the same for the phonation features?

Thanks again.
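
If a minimum length is indeed required, a hedged guard like the following sketch (assuming SoundFile for I/O and the 0.6 s threshold mentioned above, with an illustrative filename) can skip clips before extraction fails:

    import soundfile as sf

    # pitchON[0] above raises an IndexError when no voiced frames are found,
    # so skip clips that are too short to contain a usable voiced segment.
    MIN_DURATION_S = 0.6  # assumed threshold, per the question above

    signal, fs = sf.read("clip.wav")
    if len(signal) / fs < MIN_DURATION_S:
        print("clip shorter than 0.6 s; skipping prosody features")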

Error in Articulation features

Hello,

I am trying to extract articulation features and I am getting the following error. How can I fix it? Thank you!

[screenshot of the error attached]

VisibleDeprecationWarning and TypeError: can't convert cuda:0 device type tensor to numpy.

Thanks for the awesome toolkit.

After installing all required packages, I got the warning message below when I ran glottal.py and articulation.py, respectively.

/usr/local/lib/python3.6/dist-packages/numpy/core/_asarray.py:136: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray
  return array(a, dtype, copy=False, order=order, subok=True)
But I can get the desired output file.

When I run phonation.py or phonological.py, I can get the desired output file without any warning messages.

And if I run Representationlearning.py, I get the error below.

root@198c2471ad59:/codes/m456_smk/DisVoice/replearning# ./test_replearning.sh
/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py:1639: UserWarning: nn.functional.sigmoid is deprecated. Use torch.sigmoid instead.
  warnings.warn("nn.functional.sigmoid is deprecated. Use torch.sigmoid instead.")
Traceback (most recent call last):
  File "replearning.py", line 225, in <module>
    script_manager(sys.argv, replearning)
  File "/codes/m456_smk/DisVoice/replearning/../script_mananger.py", line 31, in script_manager
    features=feature_method.extract_features_file(audio, static=static, plots=plots, fmt=fmt)
  File "replearning.py", line 110, in extract_features_file
    hb=self.AEspeech.compute_bottleneck_features(audio)
  File "/codes/m456_smk/DisVoice/replearning/AEspeech.py", line 177, in compute_bottleneck_features
    return bot.data.numpy()
TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.

(The same warning and traceback are then printed a second time.)

I tried several solutions based on similar issues on Stack Overflow, but none of them worked.

I am running these in Docker. Some environment details are shown below:
Ubuntu 20.04.1
Python 3.6.9
Package Version


chainer 7.7.0
chardet 3.0.4
click 7.1.2
cloudpickle 1.3.0
cntk-gpu 2.7
cupy 7.8.0
cycler 0.10.0
Cython 0.29.21
grpcio 1.32.0
h5py 2.10.0
httplib2 0.18.1
idna 2.10
imageio 2.9.0
importlib-metadata 1.7.0
ipykernel 5.3.4
ipython 7.16.1
ipython-genutils 0.2.0
ipywidgets 7.5.1
kaldi-io 0.9.0
Keras 2.4.3
Keras-Preprocessing 1.1.2
librosa 0.8.0
matplotlib 3.0.2
numba 0.51.2
numpy 1.19.5
pandas 1.1.2
pandocfilters 1.4.2
parso 0.7.1
pathlib 1.0.1
pickleshare 0.7.5
Pillow 7.2.0
pip 21.0.1
pooch 1.2.0
praat-parselmouth 0.3.3
ptyprocess 0.6.0
pyasn1 0.4.8
pyasn1-modules 0.2.8
pycparser 2.20
pydub 0.24.1
Pygments 2.6.1
pygobject 3.26.1
pygpu 0.7.6
pyparsing 2.4.7
pyrsistent 0.16.0
PySocks 1.7.1
pysptk 0.1.16
python-apt 1.6.5+ubuntu0.3
python-dateutil 2.8.1
python-distutils-extra 2.39
python-gflags 1.5.1
python-speech-features 0.6
pytz 2021.1
PyWavelets 1.1.1
PyYAML 5.3.1
pyzmq 19.0.2
qtconsole 4.7.6
QtPy 1.9.0
scikit-image 0.17.2
scikit-learn 0.23.2
scipy 1.5.2
seaborn 0.9.0
Send2Trash 1.5.0
setuptools 54.1.1
simplegeneric 0.8.1
six 1.15.0
SoundFile 0.10.3.post1
stopit 1.1.1
suds-jurko 0.6
tabulate 0.8.7
tensorboard 2.4.1
tensorboard-plugin-wit 1.7.0
tensorflow 2.4.1
tensorflow-estimator 2.4.0
tensorflow-gpu 2.3.0
tensorflow-probability 0.11.0
Theano 1.0.5
threadpoolctl 2.1.0
tifffile 2020.8.25
torch 1.7.0
torchaudio 0.7.0
torchvision 0.8.0.dev20200828+cu101
Werkzeug 1.0.1
wheel 0.36.2
widgetsnbextension 3.5.1
wrapt 1.12.1
zipp 3.1.0

Could you help me with these?
Many thanks
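
A likely fix for the reported TypeError, following the hint in the error message itself (a sketch, not the maintainers' patch): move the tensor to host memory before converting it to NumPy in AEspeech.compute_bottleneck_features.

    import torch

    # Stand-in for the bottleneck tensor; falls back to CPU when CUDA is absent.
    bot = torch.zeros(3, device="cuda" if torch.cuda.is_available() else "cpu")

    # bot.data.numpy() raises the TypeError above on a CUDA tensor;
    # detaching and copying to the CPU first avoids it.
    features = bot.detach().cpu().numpy()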

Will a parselmouth-praat version be released?

I'm running my code on a shared commercial server where it is difficult to install Praat, so I've been relying on the Python wrapper for Praat called Parselmouth. I wonder if this is something you would implement? Thanks!

Unable to install disvoice on Mac M1 chip, ERROR: Could not find a version that satisfies the requirement kaldi_iotqdmmatplotlibnumpytorchlibrosapandaspysptkphonetscipyscikit_learn

I have a miniforge Python environment on a Mac M1 chip. The reason I am using this environment is that it's the only way I can successfully install TensorFlow on my Mac M1 chip. When trying to install disvoice with pip I get the error:
ERROR: Could not find a version that satisfies the requirement kaldi_iotqdmmatplotlibnumpytorchlibrosapandaspysptkphonetscipyscikit_learn

Any help will be appreciated.

Feature Selection Algorithm

You mentioned some feature selection algorithms, such as LASSO and Relief-F, in your paper. Where is the implementation of the feature selection algorithms in your code?

Prosodic Features

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
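
For context, a minimal reproduction of this error: NumPy refuses to collapse a multi-element boolean array into a single truth value, so an explicit .any() or .all() reduction is needed wherever the code tests such an array.

    import numpy as np

    a = np.array([0.0, 1.0, 2.0])
    try:
        if a != 0:  # ambiguous: three booleans, one condition
            pass
    except ValueError as e:
        print(e)  # the message quoted above

    if (a != 0).any():  # explicit reduction fixes it
        print("at least one nonzero element")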

How to use the dynamic phonation features with mfcc/fbank features as the input to feed a DNN

Hi @jcvasquezc ,

I would like to use the dynamic phonation features together with mfcc/fbank features as the input to feed a DNN.

The related code is shown below:

phonafeature = phonation.extract_features_file(filename, static=False, plots=False, fmt="npy")
fbankfeature, energies = python_speech_features.fbank(filename, samplerate=16000, nfilt=40, nfft=768, winlen=0.04, winstep=0.02, winfunc=np.hamming)

Because I noticed that the dynamic phonation features use winlen=0.04 and winstep=0.02, I set the same parameter values in the fbank function.
However, len(phonafeature) and len(fbankfeature) for the same filename input are not the same.
E.g.: filename=demo.wav; this demo.wav is 15 s long with a 16000 Hz sample rate.
The shape of phonafeature for this demo.wav is (430, 7), and the shape of fbankfeature is (749, 40).

For concatenation purposes, I have to pad phonafeature with the constant value 0 to match len(fbankfeature), i.e., from (430, 7) to (749, 7). Then I can get the concatenated phonation-plus-fbank features of shape (749, 47) for demo.wav.

But I don't think this is the correct way to use the dynamic phonation features with mfcc/fbank features as the input to feed a DNN.

Could you help me with this issue?
And why is there a difference in the length of the output phonation features and fbank features under the same winlen and winstep?

Many thanks
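
The mismatch is likely because the dynamic phonation features are computed only over voiced frames, while fbank covers the whole file; note also that python_speech_features.fbank expects a signal array rather than a filename. One rough alternative to zero-padding, assuming approximate alignment is acceptable (the voiced-only phonation frames are not strictly time-aligned with the fbank frames), is truncation to the shorter length; a sketch with placeholder arrays standing in for the real features:

    import numpy as np

    # Placeholder shapes from the example above: dynamic phonation (430, 7)
    # and fbank (749, 40) frames for the same 15 s file.
    phonafeature = np.zeros((430, 7))
    fbankfeature = np.zeros((749, 40))

    # Truncate to the shorter sequence, then concatenate along the feature axis.
    n_frames = min(len(phonafeature), len(fbankfeature))
    combined = np.concatenate([phonafeature[:n_frames], fbankfeature[:n_frames]], axis=1)
    print(combined.shape)  # (430, 47)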

Error on prosody extraction

I can reproduce the plot in static mode, but get the error after that plot. Using dynamic mode gives a similar error without the resulting plot. I think the issue is that the wav file argument is appended to the Praat script's folder instead of being resolved against the current directory.
Here is the complete error message:

$python prosody.py "./001_ddk1_PCGITA.wav" "featuresDDKdyn.txt" "static" "true"
Error: Cannot open file “/tmp/DisVoice/praat/./001_ddk1_PCGITA.wav”.
Script line 14 not performed or completed:
« Read from file... 'fileName$' »
Script “/tmp/DisVoice/prosody/../praat/vuv_praat.praat” not completed.
Praat: script command <</tmp/DisVoice/prosody/../praat/vuv_praat.praat ./001_ddk1_PCGITA.wav /tmp/DisVoice/prosody/../tempfiles/pitchtemp.txt /tmp/DisVoice/prosody/../tempfiles/voicetemp.txt 60 350 0.01 0.02 0.01>> not completed.

/tmp/DisVoice/prosody/../praat/praat_functions.py:135: UserWarning: loadtxt: Empty input file: "/tmp/DisVoice/prosody/../tempfiles/pitchtemp.txt"
  pitch_data=np.loadtxt(fileTxt)
Traceback (most recent call last):
  File "prosody.py", line 677, in <module>
    avgF0slopes,stdF0slopes,MSEF0, SVU,VU,UVU,VVU,VS,US,URD,VRD,URE,VRE,PR,maxvoicedlen,maxunvoicedlen,minvoicedlen,minunvoicedlen,rvuv,energyslope,RegCoefenergy,msqerrenergy,RegCoeff0,meanNeighborenergydiff,stdNeighborenergydiff, F0_rec, f0real, venergy, uenergy  = intonation_duration(audio_file, flag_plots=flag_plots)
  File "prosody.py", line 352, in intonation_duration
    pitch_z,ttotal = praat_functions.decodeF0(temp_filename_f0,len(data_audio)/fs,size_step)
  File "/tmp/DisVoice/prosody/../praat/praat_functions.py", line 140, in decodeF0
    time_voiced=pitch_data[0] # First datum is the time stamp
IndexError: index 0 is out of bounds for axis 0 with size 0

Error while extracting articulation features

Unable to extract features:
FileNotFoundError: [Errno 2] No such file or directory: '/usr/local/lib/python3.10/dist-packages/disvoice/articulation/../../tempfiles/tempFormantsartic4065_v.txt'

Thank you

Praat scripts to compute the fundamental frequency do not work properly with relative paths

There are some errors when the arguments for the wav files are entered with a relative path, because the Praat scripts do not allow relative paths.

There are two options to fix the issue:

  1. Use absolute paths when you enter the audio file (see the sketch below), or
  2. Change the default algorithm to compute the fundamental frequency from 'praat' to 'rapt' in the phonation and prosody analyses.
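
A sketch of option 1 (assuming the class-based API of recent DisVoice releases; the module path and file name are illustrative): resolve the path before handing it to the extractor.

    import os
    from disvoice.prosody import Prosody  # assumed module path; adjust to your install

    # Resolve the relative path so the Praat scripts receive an absolute one.
    audio_file = os.path.abspath("./001_ddk1_PCGITA.wav")

    prosody = Prosody()
    features = prosody.extract_features_file(audio_file, static=True, plots=False, fmt="npy")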

Error with Glottal Features

The glottal feature worked for most of my audio files, but for one of them it had this error:

File "/Users/sruthikurada/PycharmProjects/ML-Parkinson-Disease/DisVoice/glottal/glottal.py", line 287, in extract_features_file
    df[k]=[feat_st[e]]
IndexError: index 20 is out of bounds for axis 0 with size 20

Preprocessing before feature extraction

Hi @jcvasquezc thanks again for the great lib!

I am just wondering if I should perform any data preprocessing before feeding the audio to extract_features_file. My audio files are utterances (> 2 s), mostly one per speaker (sometimes one contains a second speaker saying "yes" or "um"), but there are loudness differences between the two speakers' utterances. Do you suggest I scale the audio waveforms to (-1, +1), save the audio files, and then feed them to the feature extractors?

The downstream task is classification, so I didn't want to complicate it by performing more advanced preprocessing. Min-max scaling seems sufficient; do you think so?
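
A minimal peak-normalization sketch along those lines (assuming SoundFile for I/O; filenames are illustrative):

    import numpy as np
    import soundfile as sf

    # Scale the waveform to (-1, +1) by its absolute peak, then save a
    # normalized copy to feed into the feature extractors.
    signal, fs = sf.read("utterance.wav")
    signal = signal / np.max(np.abs(signal))
    sf.write("utterance_norm.wav", signal, fs)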

Cannot extract glottal features

I ran into a problem when extracting glottal features: winShift(=5)/1000*fs evaluates to 0 (Python 2 integer division), which causes the error.

So I fixed the error like this:
# Calculate LP-residual and extract N maxima per mean-based signal determined intervals
res = utils_gci.GetLPCresidual(x, winLen*fs/1000, winShift*fs/1000, LPC_ord, VUV_inter)

Now the code is still not working. Can someone tell me how to fix this problem?
[screenshot of the error attached]

I used Python 2.7 on Ubuntu 16.04 x64.

By the way, the pysptk package cannot be successfully installed under a Python 3.6 virtualenv.

Thanks and Regards
XU SHIHAO
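
For context, a minimal illustration of the suspected Python 2 pitfall: with integer operands, winShift/1000 truncates to 0 before the multiplication by fs, so multiplying first (or using floats) keeps the window shift nonzero.

    fs = 16000
    winShift = 5  # window shift in milliseconds

    print(winShift / 1000 * fs)  # 0 under Python 2 integer division; 80.0 under Python 3
    print(winShift * fs / 1000)  # 80 under both: multiply before dividing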

ValueError in glottal feature extraction

Thanks for the library! I am trying to extract glottal features from my audio files. The ValueError below showed up in two of my feature extraction pipelines. Do you have any idea about the cause of the error? The feature extraction takes a long time, so I really want to keep it error-free if possible. Thanks again!

Traceback (most recent call last):
    feats = glottalf.extract_features_file(file_audio, static=False, plots=False, fmt="npy")
  File "/mnt/sdb/Tools/DisVoice/glottal/glottal.py", line 194, in extract_features_file
    g_iaif=IAIF(data_frame,fs,GCI)
  File "/mnt/sdb/Tools/DisVoice/glottal/GCI.py", line 147, in IAIF
    residual1=calc_residual(x_filt,x_filt,ord_lpc2,GCI)
  File "/mnt/sdb/Tools/DisVoice/glottal/utils_gci.py", line 470, in calc_residual
    vector_res[start:stop]=vector_res[start:stop]+residual_win
ValueError: operands could not be broadcast together with shapes (20,) (2,)

and

  File "/mnt/sdb/Tools/DisVoice/glottal/glottal.py", line 194, in extract_features_file
    g_iaif=IAIF(data_frame,fs,GCI)
  File "/mnt/sdb/Tools/DisVoice/glottal/GCI.py", line 147, in IAIF
    residual1=calc_residual(x_filt,x_filt,ord_lpc2,GCI)
  File "/mnt/sdb/Tools/DisVoice/glottal/utils_gci.py", line 470, in calc_residual
    vector_res[start:stop]=vector_res[start:stop]+residual_win
ValueError: operands could not be broadcast together with shapes (24,) (26,) 

FileNotFoundError working with Articulation Features

Hi, I have pulled the latest version of this repository. I am having trouble extracting the articulation features from my own audio. I was able to successfully run all of the provided IPython Notebooks.

File "PycharmProjects/ML-Parkinson-Disease/DisVoice/articulation/articulation.py", line 251, in extract_features_file
F0,_=praat_functions.decodeF0(temp_filename_f0,len(data_audio)/float(fs),self.step)
File "PycharmProjects/ML-Parkinson-Disease/DisVoice/articulation/../praat/praat_functions.py", line 139, in decodeF0
if os.stat(fileTxt).st_size==0:

FileNotFoundError: [Errno 2] No such file or directory: 'PycharmProjects/ML-Parkinson-Disease/DisVoice/articulation/../tempfiles/tempF0articulationID17_pd__12_2_1_0.txt'

If it is relevant, I am running this script in my ML-Parkinson-Disease folder, which contains the DisVoice folder within it.

Is there a simpler way to obtain glottal flow signal?

Hello, my apologies for opening this issue. I just need to extract the glottal flow signal from a *.wav file. Is there a simple way to do this? In the ideal scenario, the signature of my function would be something like the following:

def glottal_Flow(file_id):
    # some actions
    return time, glottal_flow

I have been looking at the glottal.py file; it's very complete indeed. I thought you might have gone through this before.

Thanks in advance.

Text file not found. I think it internally uses Praat to estimate the pitch; or do I need to provide the text file when running prosody.py?

Processing audio 1 from 1 001_ddk1_PCGITA.wav
Error: Cannot open file “/home/shsheikh/clones/DisVoice/praat/001_ddk1_PCGITA.wav”.
Script line 14 not performed or completed:
« Read from file... 'fileName$' »
Script “/home/shsheikh/clones/DisVoice/prosody/../praat/vuv_praat.praat” not completed.
Praat: script command <<../praat/vuv_praat.praat 001_ddk1_PCGITA.wav /home/shsheikh/clones/DisVoice/prosody/../tempfiles/tempF0001_ddk1_PCGITA.txt ../tempfiles/tempVUV001_ddk1_PCGITA.txt 60 350 0.01 0.02 0.01>> not completed.

Traceback (most recent call last):
  File "prosody.py", line 392, in <module>
    feat_vec=prosody_static(audio_file, flag_plots, pitch_method='praat')
  File "prosody.py", line 271, in prosody_static
    F0,_=praat_functions.decodeF0(temp_filename_f0,len(data_audio)/float(fs),0.01)
  File "/home/shsheikh/clones/DisVoice/prosody/../praat/praat_functions.py", line 136, in decodeF0
    pitch_data=np.loadtxt(fileTxt)
  File "/usr/local/lib/python3.5/site-packages/numpy/lib/npyio.py", line 962, in loadtxt
    fh = np.lib._datasource.open(fname, 'rt', encoding=encoding)
  File "/usr/local/lib/python3.5/site-packages/numpy/lib/_datasource.py", line 266, in open
    return ds.open(path, mode, encoding=encoding, newline=newline)
  File "/usr/local/lib/python3.5/site-packages/numpy/lib/_datasource.py", line 624, in open
    raise IOError("%s not found." % path)
OSError: /home/shsheikh/clones/DisVoice/prosody/../tempfiles/tempF0001_ddk1_PCGITA.txt not found.

About the phonological and replearning problems

Thank you for your outstanding work! But I have some problems.
First, about the phonological features: I input my own wav files (English and Chinese) but get nothing in the plot, so I would like to know how to fix it.
Second, about replearning: TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.
Sorry to bother you.
