tyiannak / pyaudioanalysis

Python Audio Analysis Library: Feature Extraction, Classification, Segmentation and Applications

License: Apache License 2.0

Python 92.51% MATLAB 0.78% HTML 1.82% CSS 0.97% Shell 3.92%

audio machine-learning signal-processing audio-data audio-analysis-tasks pyaudioanalysis python

pyaudioanalysis's Introduction

A Python library for audio feature extraction, classification, segmentation and applications

This is general info. See the wiki for complete documentation, and the linked introduction for a more generic overview of audio data handling.

News

[2022-01-01] If you are not interested in training audio models from your own data, you can check the Deep Audio API, where you can directly send audio data and receive predictions regarding the respective audio content (speech vs silence, musical genre, speaker gender, etc.).
[2021-08-06] deep-audio-features: deep audio classification and feature extraction using CNNs and PyTorch
Check out paura, a Python script for real-time recording and analysis of audio data

General

pyAudioAnalysis is a Python library covering a wide range of audio analysis tasks. Through pyAudioAnalysis you can:

Extract audio features and representations (e.g. MFCCs, spectrogram, chromagram)
Train, parameter tune and evaluate classifiers of audio segments
Classify unknown sounds
Detect audio events and exclude silence periods from long recordings
Perform supervised segmentation (joint segmentation - classification)
Perform unsupervised segmentation (e.g. speaker diarization) and extract audio thumbnails
Train and use audio regression models (example application: emotion recognition)
Apply dimensionality reduction to visualize audio data and content similarities

Installation

Clone the source of this library: git clone https://github.com/tyiannak/pyAudioAnalysis.git
Install dependencies: pip install -r ./requirements.txt
Install using pip: pip install -e .

An audio classification example

More examples and detailed tutorials can be found at the wiki

pyAudioAnalysis provides easy-to-call wrappers to execute audio analysis tasks. For example, the following code first trains an audio segment classifier from a set of WAV files stored in folders (each folder representing a different class), and then uses the trained classifier to classify an unknown WAV file:

from pyAudioAnalysis import audioTrainTest as aT
aT.extract_features_and_train(["classifierData/music","classifierData/speech"], 1.0, 1.0, aT.shortTermWindow, aT.shortTermStep, "svm", "svmSMtemp", False)
aT.file_classification("data/doremi.wav", "svmSMtemp","svm")

Result: (0.0, array([ 0.90156761, 0.09843239]), ['music', 'speech'])
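
The result can be unpacked as in the sketch below. It assumes, based only on the output shown above, that the first element is the index of the winning class, the second holds the per-class probabilities, and the third lists the class names; the variable names are illustrative.

```python
import numpy as np

# Return value copied from the example output above.
result = (0.0, np.array([0.90156761, 0.09843239]), ['music', 'speech'])

class_index, probabilities, class_names = result

# The index is returned as a float, so convert it before using it as an index.
winner = class_names[int(class_index)]
confidence = float(probabilities[int(class_index)])

print("{} (p={:.3f})".format(winner, confidence))
```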

In addition, command-line support is provided for all functionalities. E.g. the following command extracts the spectrogram of an audio signal stored in a WAV file: python audioAnalysis.py fileSpectrogram -i data/doremi.wav

Author

Theodoros Giannakopoulos, Principal Researcher of Multimodal Machine Learning at the Multimedia Analysis Group of the Computational Intelligence Lab (MagCIL) of the Institute of Informatics and Telecommunications, of the National Center for Scientific Research "Demokritos"

pyaudioanalysis's Issues

eyeD3 version

This project is amazing! I tried the speaker diarization feature and it really works. Super cool! However, it would be worthwhile to specify the version of eyeD3 being used. While installing this package I hit an error in the eyeD3 module: the package does not work with the latest eyeD3 0.7.x releases, only with 0.6.x.

I hope this project will be added to PyPI. The project is promising!

How to set up pyAudioAnalysis

I am new to Raspberry Pi (Raspbian). I have successfully installed all the dependencies and am now trying the sample code provided with pyAudioAnalysis. When I run it, the system shows ImportError: No module named pyAudioAnalysis. I don't understand the Python path steps. Can anyone teach me? I have added export PYTHONPATH=$PYTHONPATH:"/home/pi" to the .bashrc file, but I don't understand the part about updating the path, which is source ~/.bashrc. May I know where to type source ~/.bashrc?
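
For reference, source ~/.bashrc is a shell command: it is typed in the same terminal in which python is then run (opening a new terminal has the same effect). As an alternative sketch, the search path can also be extended from inside a script. This assumes, as in the post, that the pyAudioAnalysis folder sits directly under /home/pi; add_to_path is a hypothetical helper, not part of the library.

```python
import os
import sys

def add_to_path(directory):
    """Put `directory` at the front of Python's module search path (once)."""
    directory = os.path.abspath(os.path.expanduser(directory))
    if directory not in sys.path:
        sys.path.insert(0, directory)
    return directory

# Parent folder of the pyAudioAnalysis checkout (assumption taken from the post).
parent = add_to_path("/home/pi")

# If the folder exists there, an import such as the following should now resolve:
# from pyAudioAnalysis import audioTrainTest as aT
```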

Non-music detection training fails

I'm trying to make a classifier with pyAudioAnalysis to detect non-music sections of songs. When I set everything up and run it, the trainer blows through every step in about 5 seconds total. It neither learns, nor changes with different parameters. If 40% of the data is music, it gets 40% wrong. It simply chooses to predict either music or non-music for every piece of data. Any insight into what might cause this?

problem with evaluateSegmentationClassificationDir(dirName, modelName, methodName) from audioSegmentation.py

Error in computePreRec! Confusion matrix and classNames list must be of the same size!
Traceback (most recent call last):
File "C:\Program Files (x86)\JetBrains\PyCharm Community Edition 2016.2.3\helpers\pydev\pydevd.py", line 1580, in
globals = debugger.run(setup['file'], None, None, is_module)
File "C:\Program Files (x86)\JetBrains\PyCharm Community Edition 2016.2.3\helpers\pydev\pydevd.py", line 964, in run
pydev_imports.execfile(file, globals, locals) # execute the script
File "E:/int/pyAudioAnalysis-master/recognizerEmotionInVoice.py", line 76, in
testSegmentationEvaluation(select_folderEmotion,selected_types[numModel],selected_models[numModel])
File "E:/int/pyAudioAnalysis-master/recognizerEmotionInVoice.py", line 69, in testSegmentationEvaluation
aA.segmentationEvaluation(dirName, modelName, methodName)
File "E:/int/pyAudioAnalysis-master\audioAnalysis.py", line 212, in segmentationEvaluation
aS.evaluateSegmentationClassificationDir(dirName, modelName, methodName)
File "E:/int/pyAudioAnalysis-master\audioSegmentation.py", line 580, in evaluateSegmentationClassificationDir
[Rec, Pre, F1] = computePreRec(CM, classNames)
TypeError: 'NoneType' object is not iterable

Query in stChromFeatures

In here -->
https://github.com/tyiannak/pyAudioAnalysis/blob/master/audioFeatureExtraction.py#L267
The audio spectral values are divided by a scalar after they are pushed into the chroma audio bins. Could you let me know why that is done?
If it is for some kind of normalization, could you elucidate what normalization it is, as it is not clear to me.

Question about regression model

Hi!
I am trying to create a regression model based on some musical data I have, using pyAudioAnalysis. My training data contains some WAV audio clips and two CSV files: one for arousal and one for valence, with values between 1 and 9 (but my data does not include either end value). What kind of normalization should I apply to get an output with values between -1 and 1?
Does the duration of the audio files in the test set necessarily need to be similar or equal to the duration of the ones in the training set?
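
One straightforward option for the first question is a linear (min-max) rescaling from the annotation range [1, 9] to [-1, 1]. The sketch below is generic arithmetic, not a pyAudioAnalysis function; whether the library expects targets in [-1, 1] is taken from the question, not verified here.

```python
def rescale(value, old_min=1.0, old_max=9.0, new_min=-1.0, new_max=1.0):
    """Map `value` linearly from [old_min, old_max] to [new_min, new_max]."""
    if old_max == old_min:
        raise ValueError("old_min and old_max must differ")
    return new_min + (value - old_min) * (new_max - new_min) / (old_max - old_min)

# With the default ranges this reduces to (value - 5) / 4.
ratings = [1.0, 3.0, 5.0, 7.5, 9.0]
normalized = [rescale(r) for r in ratings]
print(normalized)
```

Using the nominal scale limits (1 and 9) rather than the observed minimum and maximum keeps the mapping identical for training and test data.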

Processing segmentation times

I would like to know how I can get the segmentation times in a format similar to the segment files that are provided for training.

I read issue #31, but I can't understand why you recommend using mtFileClassification.

Thanks

Add FFT function to FeatureExtraction

Since the frequency domain features in https://github.com/tyiannak/pyAudioAnalysis/blob/master/audioFeatureExtraction.py expect abs(fft) of the frame as input value, it would be nice to also provide an FFT function in this library.

Thanks!

Regression Data Preparation

Hi,
This is a query, not an issue.
Is there any annotated audio dataset for valence and arousal that can be freely used for regression purposes?
Thanks
Kathakali Seth

scikits.talkbox should be listed in dependencies on the wiki page

I had to install this module to use this library

I used pip to do that
sudo pip install scikits.talkbox

It is imported in audioFeatureExtraction.py

speakerDiarization(): ValueError: Found array with 0 feature(s)

Hi, I passed a wav file path and numOfSpeakers parameter to speakerDiarization() function, but it is returning "ValueError: Found array with 0 feature(s) (shape=(8, 0)) while a minimum of 1 is required." and is thrown by "k_means.fit(MidTermFeaturesNorm.T)", as MidTermFeaturesNorm is a blank vector as returned by "MidTermFeaturesNorm = (clf.transform(MidTermFeaturesNorm.T)).T".
I tried running the code on Windows as well as RHEL with different input wav but the error does not vanish!
Please help.

Thanks and Regards,
Nishant Mandavkar

word isolation

Hi, I am new in audio analysis, with some background in machine learning. Is there a way to isolate single words in a wav using audioSegmentation?

Thanks.

overflow when calculating nChroma in stChromaFeaturesInit(nfft, fs)

When fs = 8000,
nfft=80,
the following code in stChromaFeaturesInit(nfft, fs) will yield a 'nChroma' that contains values > nfft.

freqs = numpy.array([((f + 1) * fs) / (2 * nfft) for f in range(nfft)])
Cp = 27.50
nChroma = numpy.round(12.0 * numpy.log2(freqs / Cp)).astype(int)
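
The overflow can be reproduced in isolation. The sketch below recomputes the three quoted lines with NumPy for fs = 8000 and nfft = 80; names other than those quoted in the report are illustrative.

```python
import numpy as np

fs = 8000
nfft = 80

# Same computation as quoted above, with float division made explicit.
freqs = np.array([((f + 1) * fs) / (2.0 * nfft) for f in range(nfft)])
Cp = 27.50
n_chroma = np.round(12.0 * np.log2(freqs / Cp)).astype(int)

# Any value >= nfft is not a valid index into an array of length nfft.
too_large = n_chroma[n_chroma >= nfft]
print("largest index:", n_chroma.max(), "| out-of-range entries:", too_large.size)
```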

Valence/Arousal tagged file for Emotion Regression

I don't find the 2 files valence.csv & arousal.csv for SVR training. Could you kindly help me with the path to access those files? Thanks.

MemoryError during speakerDiarization

$ python audioAnalysis.py speakerDiarization -i ~/sounds/0F21C41C69FFD804C1257C7F0041617C.wav --num 0 | tee sp.log
Traceback (most recent call last):
  File "audioAnalysis.py", line 465, in <module>
    speakerDiarizationWrapper(args.input, args.num, args.flsd)
  File "audioAnalysis.py", line 229, in speakerDiarizationWrapper
    aS.speakerDiarization(inputFile, numSpeakers, LDAdim=0, PLOT=True)
  File "/[--REDACTED--]/audioSegmentation.py", line 685, in speakerDiarization
    DistancesAll = numpy.sum(distance.squareform(distance.pdist(MidTermFeaturesNorm.T)), axis=0)
  File "/usr/lib/python2.7/dist-packages/scipy/spatial/distance.py", line 1457, in squareform
    M = np.zeros((d, d), dtype=np.double)
MemoryError

The wave file is about 4300 seconds long. The program was running for several minutes at least before crash.
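
The traceback ends where squareform allocates a full d x d matrix of doubles. A minimal sketch of one alternative, assuming only the per-point sum of distances is needed (as in the quoted line): accumulate the sums block by block so the full matrix never exists at once. summed_distances and chunk_size are illustrative names, not part of pyAudioAnalysis or SciPy.

```python
import numpy as np

def summed_distances(points, chunk_size=256):
    """Sum of Euclidean distances from each row of `points` to all rows.

    Rows are observations (i.e. the transposed feature matrix in the
    traceback). Only a (chunk_size x n) block of distances is held in memory.
    """
    points = np.asarray(points, dtype=float)
    n = points.shape[0]
    totals = np.zeros(n)
    for start in range(0, n, chunk_size):
        block = points[start:start + chunk_size]
        # Pairwise differences between this block and every point: (chunk, n, dims)
        diff = block[:, None, :] - points[None, :, :]
        totals[start:start + chunk_size] = np.sqrt((diff ** 2).sum(axis=2)).sum(axis=1)
    return totals

rng = np.random.default_rng(0)
demo_points = rng.normal(size=(1000, 8))
print(summed_distances(demo_points)[:3])
```

A smaller chunk_size lowers peak memory at the cost of more loop iterations.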

Pickle error: "ImportError: No module named copy_reg"

Based on some of the other issues reported here, TensorFlow is having issues with Windows.
I respect that this is out of your hands.

I am curious about pickle on a Windows machine, though.
I'm using the simple example from the segmentation part of your wiki:

python audioAnalysis.py silenceRemoval -i data/recording3.wav --smoothing 1.0 --weight 0.3
python audioAnalysis.py classifyFolder -i data/recording3_ --model svm --classifier data/svmSM --detail

I love this on my Macs: it works like a charm, and it's the foothold I need to understand your code better as I explore.

Then I ran through every dependency to get this working on Windows.
The first command worked after a lot of sweat and tears, but I can't make any headway on command 2.
From what I've read, there are definitely some issues with pickle and the way carriage returns are handled on Windows.

EX:
http://stackoverflow.com/questions/5927606/pickle-load-not-working
and
http://stackoverflow.com/questions/556269/importerror-no-module-named-copy-reg-pickle

here's my error:

C:\Users\USER1\Documents\GitHub\pyAudioAnalysis>python audioAnalysis.py
uments\Extracts --model svm --classifier C:\Users\USER1\Documents\GitHub
Traceback (most recent call last):
  File "audioAnalysis.py", line 462, in <module>
    classifyFolderWrapper(args.input, args.model, args.classifier, args.details)
  File "audioAnalysis.py", line 139, in classifyFolderWrapper
    [Result, P, classNames] = aT.fileClassification(wavFile, modelName, modelTyp
  File "C:\Users\USER1\Documents\GitHub\pyAudioAnalysis\audioTrainTest.p
    [Classifier, MEAN, STD, classNames, mtWin, mtStep, stWin, stStep, computeBEA
  File "C:\Users\USER1\Documents\GitHub\pyAudioAnalysis\audioTrainTest.p
    SVM = cPickle.load(fid)
ImportError: No module named copy_reg

Any advice is much appreciated and I'll run with anything you give me. Just please point me in a direction and I'll run down that lead like a dog.
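
The linked discussions attribute this kind of error to line-ending conversion of pickle files handled as text. A generic precaution, sketched below with a stand-in object rather than a real pyAudioAnalysis model, is to always open pickle files in binary mode ("wb" / "rb"). Whether this resolves the model files shipped in the data folder is an assumption; files already altered by a text-mode transfer would need to be fetched again.

```python
import os
import pickle
import tempfile

# Stand-in for a trained model object (hypothetical content).
model = {"classes": ["music", "speech"], "weights": [0.1, 0.2]}

path = os.path.join(tempfile.mkdtemp(), "model.pkl")

# Binary mode prevents the platform from translating "\n" to "\r\n" on write.
with open(path, "wb") as fid:
    pickle.dump(model, fid, protocol=2)

# Binary mode on read as well, so the bytes reach pickle unchanged.
with open(path, "rb") as fid:
    restored = pickle.load(fid)

print(restored == model)
```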

diarization

I have been experimenting with diarization of two-party phone calls. I am using real phonecall recordings and ones "assembled" from various publicly available speech corpus data. These are fast paced phone-calls without significant silence between the speakers.

The diarization code almost always gets the number of segments right. However, it consistently places the segment borders between 0.5 and 1.5 seconds earlier than the ground truth.

Is this expected? Which parameters can be adjusted to affect the segment borders?

problem with nchroma in stChromaFeatures

Hi,
I got the following error :
Traceback (most recent call last):
File "/home/jmw/Bureau/bacasable/musiques-stream/testpydub.py", line 42, in
F = pafe.stFeatureExtraction(mysample[:2*w], Fs, w, pas)
File "/home/jmw/Bureau/bacasable/musiques-stream/pyAudioAnalysis-master/pyAudioAnalysis/audioFeatureExtraction.py", line 591, in stFeatureExtraction
chromaNames, chromaF = stChromaFeatures(X, Fs, nChroma, nFreqsPerChroma)
File "/home/jmw/Bureau/bacasable/musiques-stream/pyAudioAnalysis-master/pyAudioAnalysis/audioFeatureExtraction.py", line 283, in stChromaFeatures
print "nFreqsPerChroma[nChroma]",nFreqsPerChroma[nChroma]
IndexError: index 56 is out of bounds for axis 1 with size 55
If the window is small, e.g. below max(nChroma), then C /= nFreqsPerChroma[nChroma] crashes. In my case the window size is 110 samples (10 ms with Fs=11000), so the half window is 55.

This error happens only with a small window, e.g. Fs=11000 with a 10 ms window, so I set Win=110 samples and half window=55 for the FFT.

I put a few prints in the code and saw that the nChroma table contains 92 samples, so the code:
C /= nFreqsPerChroma[nChroma]
in stChromaFeatures crashes.
This is because the nFreqsPerChroma array is of size nChroma (=55), but nChroma contains values larger than 55.
I think you should have a look at this code.
best regards
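
The report concerns semitone indices that exceed the length of the array they index. The sketch below shows a different technique, not the library's code: each FFT bin is folded directly onto one of 12 pitch classes (semitone index modulo 12) and accumulated with np.bincount, so no index can fall out of range regardless of window size. chroma_vector and ref_hz are illustrative names; the bin-to-frequency mapping mirrors the formula quoted in the related report.

```python
import numpy as np

def chroma_vector(power_spectrum, fs, ref_hz=27.5):
    """Average spectral power per pitch class (12 values)."""
    power_spectrum = np.asarray(power_spectrum, dtype=float)
    n_bins = power_spectrum.shape[0]
    # Centre frequency assigned to each bin, then its semitone distance from ref_hz.
    freqs = (np.arange(n_bins) + 1) * fs / (2.0 * n_bins)
    semitones = np.round(12.0 * np.log2(freqs / ref_hz)).astype(int)
    pitch_class = np.mod(semitones, 12)          # always in 0..11
    energy = np.bincount(pitch_class, weights=power_spectrum, minlength=12)
    counts = np.bincount(pitch_class, minlength=12)
    # Divide by the number of bins per class; classes with no bins stay at 0.
    return np.divide(energy, counts, out=np.zeros(12), where=counts > 0)

# Small window from the report: 55 bins at Fs = 11000.
chroma = chroma_vector(np.ones(55), 11000)
print(chroma.shape)
```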

Add an option to disable PLOT

I ran python audioAnalysis.py speakerDiarization -i ~/sounds/0F21C41C69FFD804C1257C7F0041617C.wav --num 0 via PuTTY and got:

Traceback (most recent call last):
  File "audioAnalysis.py", line 465, in <module>
    speakerDiarizationWrapper(args.input, args.num, args.flsd)
  File "audioAnalysis.py", line 229, in speakerDiarizationWrapper
    aS.speakerDiarization(inputFile, numSpeakers, LDAdim=0, PLOT=True)
  File "/home/jlopuszanski/projects/diarization-pyAudioAnalysis/pyAudioAnalysis/audioSegmentation.py", line 855, in speakerDiarization
    fig = plt.figure()
  File "/usr/lib/pymodules/python2.7/matplotlib/pyplot.py", line 423, in figure
    **kwargs)
  File "/usr/lib/pymodules/python2.7/matplotlib/backends/backend_tkagg.py", line 79, in new_figure_manager
    return new_figure_manager_given_figure(num, figure)
  File "/usr/lib/pymodules/python2.7/matplotlib/backends/backend_tkagg.py", line 87, in new_figure_manager_given_figure
    window = Tk.Tk()
  File "/usr/lib/python2.7/lib-tk/Tkinter.py", line 1767, in __init__
    self.tk = _tkinter.create(screenName, baseName, className, interactive, wantobjects, useTk, sync, use)
_tkinter.TclError: no display name and no $DISPLAY environment variable

I see that PLOT=True is passed in speakerDiarizationWrapper regardless of command line options.
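
When no display is available (for example over SSH or PuTTY), one common workaround is to select matplotlib's non-interactive Agg backend before pyplot is imported, and to save figures instead of showing them. The sketch below is independent of pyAudioAnalysis and only illustrates that approach; it does not add the requested command-line option.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend: no Tk window, no $DISPLAY needed
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([0, 1, 2], [0, 1, 0])
ax.set_xlabel("time (s)")

# Render to an in-memory PNG instead of calling plt.show().
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
plt.close(fig)

png_bytes = buffer.getvalue()
print(len(png_bytes), "bytes written")
```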

What should be the sound path?

I have multiple voice recordings. I need to extract features from them and save them as a CSV file. What should I do, starting from scratch? Do I need to select the path?

mlpy has no attribute 'LibSvm'

Fedora21 x86_64
python 2.7.12, Anaconda 4.1.1
downloaded and installed mlpy-3.5.0 as instructed.
When testing silence segmentation, got the following error:

segments = aS.silenceRemoval(x, Fs, 0.020, 0.020, smoothWindow = 1.0, Weight = 0.3, plot = True)
Traceback (most recent call last):
File "", line 1, in
File "/home/yangz2/libs/pyAudioAnalysis/audioSegmentation.py", line 567, in silenceRemoval
SVM = aT.trainSVM(featuresNormSS, 1.0) # train the respective SVM probabilistic model (ONSET vs SILENCE)
File "/home/yangz2/libs/pyAudioAnalysis/audioTrainTest.py", line 165, in trainSVM
svm = mlpy.LibSvm(svm_type='c_svc', kernel_type='linear', eps=0.0000001, C=Cparam, probability=True)
AttributeError: 'module' object has no attribute 'LibSvm'

writeTrainDataToARFF(), tuple index out of range,

So I am trying to make a rudimentary emotion classifier, and I have collected a small data sample to test whether it works. My code is as follows:

aA.trainClassifierWrapper('svm', False, ["C:\Users\gover_000\Desktop\Angry", "C:\Users\gover_000\Desktop\Happy", "C:\Users\gover_000\Desktop\Sad", "C:\Users\gover_000\Desktop\Scared", "C:\Users\gover_000\Desktop\Neutral"], "testSVM")

If I execute this, it gives this error:

IndexError Traceback (most recent call last)
<ipython-input-7-2e5393432e89> in <module>()
5 "C:\Users\gover_000\Desktop\Scared",
6 "C:\Users\gover_000\Desktop\Neutral"],
----> 7 "testSVM")

C:\Users\gover_000\Documents\GitHub\Emotion-Recognition-Prototype\pyAudioAnalysis\audioAnalysis.pyc in trainClassifierWrapper(method, beatFeatures, directories, modelName)
88 raise Exception("At least 2 directories are needed")
89 aT.featureAndTrain(directories, 1, 1, aT.shortTermWindow, aT.shortTermStep,
---> 90 method.lower(), modelName, computeBEAT=beatFeatures)

C:\Users\gover_000\Documents\GitHub\Emotion-Recognition-Prototype\pyAudioAnalysis\audioTrainTest.pyc in featureAndTrain(listOfDirs, mtWin, mtStep, stWin, stStep, classifierType, modelName, computeBEAT, perTrain)
275 featureNames = ["features" + str(d + 1) for d in range(numOfFeatures)]
276
--> 277 writeTrainDataToARFF(modelName, features, classNames, featureNames)

C:\Users\gover_000\Documents\GitHub\Emotion-Recognition-Prototype\pyAudioAnalysis\audioTrainTest.pyc in writeTrainDataToARFF(modelName, features, classNames, featureNames)
1097 for c, fe in enumerate(features):
1098 for i in range(fe.shape[0]):
-> 1099 for j in range(fe.shape[1]):
1100 f.write("{0:f},".format(fe[i, j]))
1101 f.write(classNames[c]+"\n")

IndexError: tuple index out of range

I know shape[1] is supposed to return the number of columns of an array,
but why does it raise an error here?
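
shape[1] exists only for arrays with at least two dimensions. If the feature array for one class ends up one-dimensional or empty (an assumption about this case, for example when no WAV files were read from a folder), its shape tuple has a single entry and indexing it at position 1 raises exactly this error:

```python
import numpy as np

# Shape (0,): what the features of an empty class could look like.
empty_features = np.array([])
try:
    empty_features.shape[1]
    message = None
except IndexError as exc:
    message = str(exc)
print(message)

# A 2-D array (rows = samples, columns = features) has both entries.
features = np.atleast_2d(np.zeros(34))
print(features.shape)
```

Printing the shape of each per-class feature array before training is a quick way to see which folder is affected.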

setup.py

This could probably do with a setup.py.

To make this work, everything should be moved into a subdirectory.

Is there a preferred way to install pyAudioAnalysis currently?

stFeatureExtraction() ValueError

Thanks for this effort. It really looks promising.

The following code:

[Fs, x] = audioBasicIO.readAudioFile('myWavFile.wav')
F = audioFeatureExtraction.stFeatureExtraction(x, Fs, 0.050*Fs, 0.025*Fs)

results in the following error:

File "/Users/anthony.mccoy/anaconda/lib/python2.7/site-packages/pyAudioAnalysis/audioFeatureExtraction.py", line 48, in stEnergyEntropy
subWindows = frame.reshape(subWinLength, numOfShortBlocks, order='F').copy()

ValueError: total size of new array must be unchanged

Flask compatibility

Can I use pyAudioAnalysis as an external library in Flask?
If so, how should I do that?

calling mlpy in audioFeature

I don't want to install this because I don't need it but wanted to use the chroma features. Anyways this isn't even used in the entire file so I don't see why it would be imported here. I think it should be removed.

sklearn.hmm deprecated

Hi, It looks like sklearn.hmm is deprecated.
I tried using hmmlearn instead, but get this error:
"This GaussianHMM instance is not fitted yet. Call 'fit' with appropriate arguments before using this method."

how to generate .segment file for each wav file?

ValueError during feature extraction

Hello, I am facing ValueError during feature extraction. Is there anything I am missing to handle?

> Python 2.7.6 (default, Jun 22 2015, 17:58:13) 
> [GCC 4.8.2] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> >>> from pyAudioAnalysis import audioBasicIO
> >>> from pyAudioAnalysis import audioFeatureExtraction
> >>> [Fs, x] = audioBasicIO.readAudioFile("./trainaudio/training80_43/39o1zJFeM7E.004.mp4.wav")
> >>> F = audioFeatureExtraction.stFeatureExtraction(x, Fs, 0.050*Fs, 0.025*Fs);
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "pyAudioAnalysis/audioFeatureExtraction.py", line 564, in stFeatureExtraction
>     curFV[2] = stEnergyEntropy(x)                    # short-term entropy of energy
>   File "pyAudioAnalysis/audioFeatureExtraction.py", line 48, in stEnergyEntropy
>     subWindows = frame.reshape(subWinLength, numOfShortBlocks, order='F').copy()
> ValueError: total size of new array must be unchanged
> >>>

audio file: http://vocaroo.com/i/s0o0Hs3fZywG
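
The reshape in the traceback only succeeds when the frame holds exactly subWinLength * numOfShortBlocks values. The sketch below is a standalone re-implementation of an entropy-of-energy feature, not the library's code: it trims the frame to a multiple of the block count and rejects non-mono input explicitly. That the failing file is stereo, or that the frame length is otherwise unexpected, is only an assumption; energy_entropy and its parameters are illustrative names.

```python
import numpy as np

def energy_entropy(frame, n_short_blocks=10, eps=1e-8):
    """Entropy of the normalised energies of consecutive sub-blocks of a frame."""
    frame = np.asarray(frame, dtype=float)
    if frame.ndim != 1:
        raise ValueError("expected a mono (1-D) frame, got shape %s" % (frame.shape,))
    sub_len = frame.size // n_short_blocks
    if sub_len == 0:
        raise ValueError("frame is shorter than n_short_blocks")
    # Drop trailing samples so the frame divides evenly into blocks.
    trimmed = frame[:sub_len * n_short_blocks]
    blocks = trimmed.reshape(n_short_blocks, sub_len)   # one block per row
    total_energy = np.sum(trimmed ** 2) + eps
    p = np.sum(blocks ** 2, axis=1) / total_energy
    return float(-np.sum(p * np.log2(p + eps)))

# A 1103-sample frame does not divide by 10; the last 3 samples are dropped.
value = energy_entropy(np.ones(1103))
print(round(value, 4))
```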

Error on using audioTrainTest, audioBasicIO, and Utilities...

Dear Fellows, I am trying to run pyAudioAnalysis on Windows. Many library functions run without errors, but some fail, and I have searched all the related places without finding a solution.
Errors: I can't import audioTrainTest as aT in Jupyter, and even at the Windows command prompt the error says:
Traceback (most recent call last):
File "", line 1, in
ImportError: No module named 'audioTrainTest
When I tried to install it through pip install, it shows me:
Could not find a version that satisfies the requirement audioTrainTest (from versions: )
No matching distribution found for utilities.
The same error occurs for audioBasicIO and Utilities as well.

I would appreciate your help in correcting these errors. I am using Windows 10 with Anaconda Python 3.5.
Thanks a lot.
Blessings to you all.....

using the available trained HMM model for speech-music discrimination

playing around with your interesting project, I encountered an issue whilst trying to perform a HMM-based segmentation and classification, using the available trained HMM model for speech-music discrimination.

note that I am running this on an EC2 server with only ssh access.

IndexError: index 80 is out of bounds for size 80

I got the error below when running audioSegmentation.py.

/home/gary/git/pyAudioAnalysis/audioFeatureExtraction.py:272: FutureWarning: assignment exception type will change in the future
  C[nChroma[0:I-1]] = spec
Traceback (most recent call last):
  File "/home/gary/Ide/pycharm-community-5.0.4/helpers/pydev/pydevd.py", line 2411, in <module>
    globals = debugger.run(setup['file'], None, None, is_module)
  File "/home/gary/Ide/pycharm-community-5.0.4/helpers/pydev/pydevd.py", line 1802, in run
    launch(file, globals, locals) # execute the script
  File "/home/gary/git/pyAudioAnalysis/test.py", line 4, in <module>
    segments = aS.silenceRemoval(x, Fs, 0.020, 0.020, smoothWindow = 1.0, Weight = 0.3, plot = True)
  File "/home/gary/git/pyAudioAnalysis/audioSegmentation.py", line 613, in silenceRemoval
    ShortTermFeatures = aF.stFeatureExtraction(x, Fs, stWin * Fs, stStep * Fs) # extract short-term features
  File "/home/gary/git/pyAudioAnalysis/audioFeatureExtraction.py", line 579, in stFeatureExtraction
    chromaNames, chromaF = stChromaFeatures(X, Fs, nChroma, nFreqsPerChroma)
  File "/home/gary/git/pyAudioAnalysis/audioFeatureExtraction.py", line 272, in stChromaFeatures
    C[nChroma[0:I-1]] = spec
IndexError: index 80 is out of bounds for size 80

Problem with featureAndTrain

Hello.
I have a problem. When I try to create a new model using featureAndTrain I get an error:
line 199, in featureAndTrain
numOfFeatures = features[0].shape[1]
IndexError: tuple index out of range
I have no idea what I'm doing wrong. Please, help!

Usability

What are the prerequisites of using this library?

My aim is to extract features from speech automatically: I want to use data from an FTP server as input in a programmatic way. Should I use pyAudioAnalysis with Django?

Is there any full sample code so I can know exactly which steps come first?

I am still new to Python and to the audio analysis field, but I need to get better since I need this for my Master's thesis.

Thank you for your understanding.

Install pyAudioAnalysis as a Module

Hello,

would be nice to have the ability to install pyAudioAnalysis through pip or at least having a setup.py.

Thanks!

More API doc

Hello, Theodoros,
This library is awesome, but I can't find any doc for the api, for example the meaning of parameters and how to tune them?

Thanks

Python 3 Support

The Scikits.Talkbox module seems to only work on Python 2.7.

Is there a way that I can use this software on Python 3.3 and above?

npy_logl undefined error .

How to resolve this issue ?

ImportError: /usr/local/lib/python2.7/dist-packages/hmmlearn/_hmmc.so: undefined symbol: npy_logl

data folder dependency

Can you please detail in the documentation what the data folder is for and where files such as "knnSpeakerAll" come from?
I tried to use the package after "pip install" and it didn't work because of dependencies on files under the data folder.
I tried to use the speakerDiarization code and I fail to see why it needs to be based on pre-saved models. Moreover, the feature selection in the code is based on indexes, which is very hard to keep track of.

About regression and valence-arousal values

Hello,
First, thank you very much for this project. I hope it will be helpful for my diploma thesis!
I would like to clarify something about the value ranges of valence and arousal and SVR.
What values am I supposed to input for the ground truth?
What is their range? From the wiki I would guess they should vary in the [-1,1] interval, but I'm not sure.

Thank you in advance,
Anastasia

Speaker Verification

Should I be able to do speaker verification using aT.featureAndTrain & aT.fileClassification? I'm trying to train the SVM model using 7 WAV files that I have for each of 5 users. Is that too small a training set? I have another 3 WAV files for each user that I'm using as a test set, and it's correctly classifying the test set. However, it's also classifying some of the WAV files in this project as one of the users from my dataset (with a high probability, in one case ~0.8).

Any advice/tips would be greatly appreciated.
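
A closed-set classifier always picks one of the classes it was trained on, so audio from an unknown speaker is still assigned to some user. One generic way to approximate verification, sketched below, is to reject predictions whose top probability falls under a threshold. The function name, the example probabilities, and the 0.9 threshold are hypothetical; a usable threshold would have to be tuned on held-out data that includes speakers outside the training set.

```python
def accept_or_reject(probabilities, class_names, threshold=0.9):
    """Return the predicted class name, or None if the classifier is not confident."""
    best = max(range(len(probabilities)), key=lambda i: probabilities[i])
    if probabilities[best] < threshold:
        return None
    return class_names[best]

users = ["user1", "user2", "user3", "user4", "user5"]

# Confident prediction: accepted.
known = accept_or_reject([0.95, 0.02, 0.01, 0.01, 0.01], users)
# Top probability of 0.8, as in the case described above: rejected at this threshold.
unknown = accept_or_reject([0.80, 0.05, 0.05, 0.05, 0.05], users)

print(known, unknown)
```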

Arousal and valence calculation

Can you please explain how the arousal and valence values are calculated for the audio signal in this module and what exactly the values from arousal.csv and valence.csv specify?

Use case question

Can pyAudioAnalysis be used for speech recognition?

How do I get segment data from audio file?

Hi,

Thanks for writing this module.

I am a little confused about how to use audioSegmentation.py to get raw segment data (i.e., something like

0.01,9.90,speech
9.90,10.70,silence
10.70,23.50,speech

) from the audio file.

Which function should I use to get such data(in any form of data type)?

Many thanks.
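
Independently of which library function produces the labels, lines in the start,end,label format shown above can be built from a sequence of per-window labels by merging consecutive windows with the same label. The sketch below assumes such a label list and a fixed window step in seconds are already available; both helper names are hypothetical.

```python
def labels_to_segments(labels, step):
    """Merge runs of equal per-window labels into (start, end, label) tuples."""
    segments = []
    if not labels:
        return segments
    start = 0
    for i in range(1, len(labels) + 1):
        # Close the current run at the end of the list or when the label changes.
        if i == len(labels) or labels[i] != labels[start]:
            segments.append((start * step, i * step, labels[start]))
            start = i
    return segments

def format_segments(segments):
    """Render segments as 'start,end,label' lines."""
    return "\n".join("%.2f,%.2f,%s" % seg for seg in segments)

window_labels = ["speech"] * 3 + ["silence"] * 2 + ["speech"]
segments = labels_to_segments(window_labels, step=1.0)
print(format_segments(segments))
```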

how to print (or export) diarization data

I see how to run the diarization sample which displays the plot on the screen.

How can I export segmentation data for diarization, where each segment starts and who the speaker is.

thanks

Problems of Feature Extraction

After typing the code for the feature extraction step, I got an error.

"audioFeatureExtraction.py", line 536, in stFeatureExtraction
    N = len(signal)                                # total number of samples
TypeError: object of type 'numpy.float64' has no len()

It says that this type does not have len().
Do you know how to fix it?
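
The message indicates that the value passed as signal was a single NumPy scalar rather than an array of samples. One possible cause, which is an assumption since the calling code is not shown, is that the arguments were passed in a different order than expected or that a single sample was passed. A small guard (a hypothetical helper, not part of pyAudioAnalysis) makes the failure explicit:

```python
import numpy as np

def require_sample_array(signal):
    """Return `signal` as an array, refusing scalars that have no length."""
    signal = np.asarray(signal)
    if signal.ndim == 0:
        raise TypeError("expected an array of samples, got a scalar: %r" % (signal.item(),))
    return signal

samples = require_sample_array([0.0, 0.5, -0.5])

try:
    require_sample_array(np.float64(0.5))
    scalar_rejected = False
except TypeError:
    scalar_rejected = True

print(len(samples), scalar_rejected)
```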

What data was used to train the SVM classifier?

I wonder which dataset was used for the default SVM model in the segmentation? I used some of my own data to train a new model but the results are not as good.

bug in PLOT of stChromagram()

Hi,
you should add the following line after the Ratio calculation :
Ratio = chromaGramToPlot.shape[1] / (3*chromaGramToPlot.shape[0])
if Ratio < 1 : Ratio=1

I had Ratio equal to 0 that created problems with range afterwards.
best regards
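
The suggested guard can be written compactly with integer division and a lower bound of 1. The sketch below uses plain integers in place of chromaGramToPlot.shape; plot_ratio is an illustrative name.

```python
def plot_ratio(n_rows, n_cols):
    """Integer ratio n_cols / (3 * n_rows), never smaller than 1."""
    return max(1, n_cols // (3 * n_rows))

# Few columns: the raw integer ratio would be 0, which the guard lifts to 1.
short_ratio = plot_ratio(12, 20)
# Many columns: the ratio is unchanged by the guard.
long_ratio = plot_ratio(12, 360)

print(short_ratio, long_ratio)
```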

Can't install scikits.talkbox

I ran into an issue when installing scikits.talkbox.
ubuntu 14.04 LTS
python 2.7.6

Traceback (most recent call last):
File "", line 17, in
File "/tmp/pip_build_root/scikits.talkbox/setup.py", line 10, in
from numpy.distutils.core import setup
ImportError: No module named 'numpy'
Complete output from command python setup.py egg_info:
Traceback (most recent call last):
File "", line 17, in
File "/tmp/pip_build_root/scikits.talkbox/setup.py", line 10, in
from numpy.distutils.core import setup
ImportError: No module named 'numpy'

Cleaning up...
Command python setup.py egg_info failed with error code 1 in /tmp/pip_build_root/scikits.talkbox
Storing debug log for failure in /home/****/.pip/pip.log

My numpy installation succeeded: I can import numpy, and from numpy.distutils.core import setup works.

A Question

When dealing with voiced audio, can we conclude that if the number of segments is n for example then the number of pauses is n - 1 ?
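
As a sketch under the assumption that segments are (start, end) pairs in seconds: n segments have at most n - 1 gaps between them, but two segments that touch leave no pause, and silence before the first or after the last segment is not counted. pauses_between and min_gap are illustrative names.

```python
def pauses_between(segments, min_gap=0.0):
    """Return the (start, end) gaps between consecutive segments longer than min_gap."""
    ordered = sorted(segments)
    pauses = []
    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
        if next_start - prev_end > min_gap:
            pauses.append((prev_end, next_start))
    return pauses

# Three segments, but the last two are adjacent, so only one pause is found.
segments = [(0.5, 2.0), (2.4, 3.0), (3.0, 4.1)]
pauses = pauses_between(segments)
print(len(segments), "segments,", len(pauses), "pause(s)")
```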