Generate ShortTermFeatures for an audio signal of 0.1 seconds of a mono, 16KHz and PCM wave file

prashant-saxena commented on July 19, 2024

Generate ShortTermFeatures for an audio signal of 0.1 seconds of a mono, 16KHz and PCM wave file

Comments (3)

Caparrini commented on July 19, 2024

Hi!

I’d like to help with the feature extraction issue you're facing. To do so, I need a bit more info:

Python Version: Which version are you using?
Libraries: Could you list the libraries and their versions you're working with?
Error Message: What error comes up with the smaller values?
Context: Any other details about your setup might be helpful.

This will help me understand the problem better and find a solution for you.

Thanks!

prashant-saxena commented on July 19, 2024

Hi,
Windows 10
Python 3.10.0

customtkinter==5.2.1
dm-tree==0.1.8
dtaidistance==2.3.11
eyed3==0.9.7
fastdtw==0.3.4
fCWT==0.1.18
fqdn==1.5.1
google-auth-oauthlib==1.2.0
isoduration==20.11.0
jsonpointer==2.4
lesscpy==0.15.1
noisereduce==3.0.2
notebook==7.1.2
pandas==2.2.1
pipdeptree==2.16.1
pyAudioAnalysis==0.3.14
pydub==0.25.1
python-speech-features==0.6
resampy==0.4.3
tensorflow==2.16.1
tensorflow-estimator==2.15.0
tkinterdnd2==0.3.0
toml==0.10.2
uri-template==1.3.0
webcolors==1.13
wurlitzer==3.0.3
xlwt==1.3.0

Error when using

F, f_names = ShortTermFeatures.feature_extraction(x[0:1600], Fs, 160, 160, deltas=False)

---------------------------------------------------------------------------
File D:\projects\vrt\.venv\lib\site-packages\pyAudioAnalysis\ShortTermFeatures.py:662, in feature_extraction(signal, sampling_rate, window, step, deltas)
    657 feature_vector[n_time_spectral_feats:mffc_feats_end, 0] = \
    658     mfcc(fft_magnitude, fbank, n_mfcc_feats).copy()
    660 # chroma features
    661 chroma_names, chroma_feature_matrix = \
--> 662     chroma_features(fft_magnitude, sampling_rate, num_fft)
    663 chroma_features_end = n_time_spectral_feats + n_mfcc_feats + \
    664                       n_chroma_feats - 1
    665 feature_vector[mffc_feats_end:chroma_features_end] = \
    666     chroma_feature_matrix

File D:\projects\vrt\.venv\lib\site-packages\pyAudioAnalysis\ShortTermFeatures.py:293, in chroma_features(signal, sampling_rate, num_fft)
    291     I = np.nonzero(num_chroma > num_chroma.shape[0])[0][0]
    292     C = np.zeros((num_chroma.shape[0],))
--> 293     C[num_chroma[0:I - 1]] = spec
    294     C /= num_freqs_per_chroma
    295 final_matrix = np.zeros((12, 1))

ValueError: shape mismatch: value array of shape (80,) could not be broadcast to indexing result of shape (27,)

I need a distinctive sound feature to train a model for my CNN-based project. Each data frame is 1600 samples (0.1 seconds at 16 kHz).

In the above plot, you can see 7 MFCCs generated from 7 different wave files, all of which contain a similar sound.
The idea is to make the features as similar as possible for similar types of data, so that a good prediction score can be achieved.

Caparrini commented on July 19, 2024

Hello again,

I conducted a small experiment and was able to replicate the issue you described. It appears that there isn't sufficient information to compute chroma features. To address this and ensure the code functions (even if it means the chroma feature values are zeroes), I've implemented a fix. I plan to submit a pull request for this fix, pending the library author's approval.
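
To illustrate why a 160-sample window leaves so little to work with, here is a minimal sketch that reproduces the two shapes from the error message using NumPy alone. It assumes the library maps each FFT bin to a semitone index with round(12 * log2(f / 27.5)) and uses window / 2 magnitude bins; that formula and the helper name bin_to_chroma_index are assumptions made for illustration, not code copied from pyAudioAnalysis.

```python
import numpy as np


def bin_to_chroma_index(num_fft, sampling_rate, ref_hz=27.5):
    """Map each FFT bin to a semitone index relative to ref_hz (assumed form)."""
    freqs = np.array([(k + 1) * sampling_rate / (2 * num_fft)
                      for k in range(num_fft)])
    return np.round(12.0 * np.log2(freqs / ref_hz)).astype(int)


sampling_rate = 16000
window = 160              # 10 ms at 16 kHz, as in the failing call
num_fft = window // 2     # 80 magnitude bins, the (80,) in the error

num_chroma = bin_to_chroma_index(num_fft, sampling_rate)
spec = np.ones(num_fft)   # stand-in for the squared FFT magnitude

# Same index computation as line 291 of the traceback
first_overflow = np.nonzero(num_chroma > num_chroma.shape[0])[0][0]
target = num_chroma[0:first_overflow - 1]

print(spec.shape, target.shape)  # (80,) (27,)
```

Under these assumptions only the first 27 of the 80 bins fall below the cutoff, so the 80-element spectrum cannot be assigned to the 27-element slice, which is the shape mismatch reported above.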

For testing, I took the following approach. I recommend expressing the window as a fraction of a second multiplied by the sampling rate, Fs, rather than as a raw sample count, but the choice is yours. In my tests, Fs was 44100:

from pyAudioAnalysis import ShortTermFeatures
from pyAudioAnalysis import audioBasicIO


def extract_features(frac_second, samples_features, Fs, x):
    samples_frac_second = frac_second * Fs
    samples_windows = samples_features // samples_frac_second

    F, f_names = ShortTermFeatures.feature_extraction(x[:samples_features], Fs, frac_second*Fs, frac_second*Fs,
                                                      deltas=False)

    print(f"In {frac_second} there are {samples_frac_second} samples")
    print(f"In {samples_features} there are {samples_windows} windows")
    print(len(F[0]))
    print(len(f_names))

    return F, f_names


def issue_396():
    [Fs, x] = audioBasicIO.read_audio_file('./audio/limbo_mono.wav')

    for frac_second in [0.1, 0.05, 0.025, 0.01, 0.0036, 0.0018]:
        print(f"Experiment with {frac_second} frac of second")
        F, f_names = extract_features(frac_second, 16000, Fs, x)


if __name__ == '__main__':
    issue_396()

Output generated:

Experiment with 0.1 frac of second
In 0.1 there are 4410.0 samples
In 16000 there are 3.0 windows
3
34
Experiment with 0.05 frac of second
In 0.05 there are 2205.0 samples
In 16000 there are 7.0 windows
7
34
Experiment with 0.025 frac of second
In 0.025 there are 1102.5 samples
In 16000 there are 14.0 windows
14
34
Experiment with 0.01 frac of second
In 0.01 there are 441.0 samples
In 16000 there are 36.0 windows
36
34
Experiment with 0.0036 frac of second
In 0.0036 there are 158.76 samples
In 16000 there are 100.0 windows
101
34
Experiment with 0.0018 frac of second
In 0.0018 there are 79.38 samples
In 16000 there are 201.0 windows
202
34
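
Several of the window lengths above are fractional (for example 158.76 samples). The sketch below shows one way to convert a duration into a whole number of samples and to count the non-overlapping windows that fit in a segment. Both helpers, seconds_to_samples and count_frames, are hypothetical names introduced here and are not part of pyAudioAnalysis. The frame counts printed above are consistent with the window length being truncated to an integer (158.76 to 158), which the second half of the sketch checks.

```python
def seconds_to_samples(duration_s, sampling_rate):
    """Convert a duration in seconds to a whole number of samples."""
    return int(round(duration_s * sampling_rate))


def count_frames(num_samples, window, step):
    """Number of full windows that fit when hopping by `step` samples."""
    if window <= 0 or step <= 0 or num_samples < window:
        return 0
    return 1 + (num_samples - window) // step


# Setup from the original question: 0.1 s segment, 10 ms window, 16 kHz
fs = 16000
window = seconds_to_samples(0.010, fs)
segment = seconds_to_samples(0.1, fs)
print(window, segment, count_frames(segment, window, window))  # 160 1600 10

# Truncated window lengths from the experiment (Fs = 44100, 16000 samples)
for w in (4410, 2205, 1102, 441, 158, 79):
    print(w, count_frames(16000, w, w))
```

The loop prints 3, 7, 14, 36, 101 and 202 frames, matching the lengths of F[0] in the output above.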

Fix: in the function chroma_features in ShortTermFeatures.py, change the else branch as follows:

    else:
        I = np.nonzero(num_chroma > num_chroma.shape[0])[0][0]
        C = np.zeros((num_chroma.shape[0],))
        if I > 1:
            # If I <= 1 there are no chroma features that can be extracted
            C[num_chroma[0:I - 1]] = spec[num_chroma[0:I - 1]]
            C /= num_freqs_per_chroma
    final_matrix = np.zeros((12, 1))
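
As a small standalone check of what the guard does, the sketch below applies the same slice assignment to two hand-made index arrays. The function name guarded_assign and the array values are invented for illustration, and the division by num_freqs_per_chroma is left out to keep the example short.

```python
import numpy as np


def guarded_assign(spec, num_chroma):
    """Guarded slice assignment from the fix above (normalization omitted)."""
    first_overflow = np.nonzero(num_chroma > num_chroma.shape[0])[0][0]
    C = np.zeros((num_chroma.shape[0],))
    if first_overflow > 1:
        idx = num_chroma[0:first_overflow - 1]
        C[idx] = spec[idx]
    return C


# Case 1: the first two indices are usable, so two values are copied
spec_a = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
chroma_a = np.array([2, 3, 3, 9, 12])   # made-up indices; 9 > 5 at position 3
print(guarded_assign(spec_a, chroma_a))  # [ 0.  0. 30. 40.  0.]

# Case 2: every index is out of range, so the result stays all zeros
spec_b = np.array([1.0, 2.0, 3.0])
chroma_b = np.array([7, 8, 9])           # made-up indices; all > 3
print(guarded_assign(spec_b, chroma_b))  # [0. 0. 0.]
```

In the second case the unguarded slice 0:I - 1 would be 0:-1, which selects two elements for a three-element spectrum and would raise the same kind of shape mismatch.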

I'm submitting a pull request from my fork (https://github.com/Caparrini/pyAudioAnalysis), although I'm not sure the fix matches the intended behavior. The patched code is available there if you would rather use it than modify your local copy of the library. Please choose whichever option suits you best.

Best regards,
