Comments (3)

Caparrini commented on June 11, 2024

Hi!

I'd like to help with the feature extraction issue you're facing. To do so, I need a bit more info:

  • Python Version: Which version are you using?
  • Libraries: Could you list the libraries and their versions you're working with?
  • Error Message: What error comes up with the smaller values?
  • Context: Any other details about your setup might be helpful.

This will help me understand the problem better and find a solution for you.

Thanks!

prashant-saxena commented on June 11, 2024

Hi,
Windows 10
Python 3.10.0

customtkinter==5.2.1
dm-tree==0.1.8
dtaidistance==2.3.11
eyed3==0.9.7
fastdtw==0.3.4
fCWT==0.1.18
fqdn==1.5.1
google-auth-oauthlib==1.2.0
isoduration==20.11.0
jsonpointer==2.4
lesscpy==0.15.1
noisereduce==3.0.2
notebook==7.1.2
pandas==2.2.1
pipdeptree==2.16.1
pyAudioAnalysis==0.3.14
pydub==0.25.1
python-speech-features==0.6
resampy==0.4.3
tensorflow==2.16.1
tensorflow-estimator==2.15.0
tkinterdnd2==0.3.0
toml==0.10.2
uri-template==1.3.0
webcolors==1.13
wurlitzer==3.0.3
xlwt==1.3.0

Error when using

F, f_names = ShortTermFeatures.feature_extraction(x[0:1600], Fs, 160, 160, deltas=False)
---------------------------------------------------------------------------
File D:\projects\vrt\.venv\lib\site-packages\pyAudioAnalysis\ShortTermFeatures.py:662, in feature_extraction(signal, sampling_rate, window, step, deltas)
    657 feature_vector[n_time_spectral_feats:mffc_feats_end, 0] = \
    658     mfcc(fft_magnitude, fbank, n_mfcc_feats).copy()
    660 # chroma features
    661 chroma_names, chroma_feature_matrix = \
--> 662     chroma_features(fft_magnitude, sampling_rate, num_fft)
    663 chroma_features_end = n_time_spectral_feats + n_mfcc_feats + \
    664                       n_chroma_feats - 1
    665 feature_vector[mffc_feats_end:chroma_features_end] = \
    666     chroma_feature_matrix

File D:\projects\vrt\.venv\lib\site-packages\pyAudioAnalysis\ShortTermFeatures.py:293, in chroma_features(signal, sampling_rate, num_fft)
    291     I = np.nonzero(num_chroma > num_chroma.shape[0])[0][0]
    292     C = np.zeros((num_chroma.shape[0],))
--> 293     C[num_chroma[0:I - 1]] = spec
    294     C /= num_freqs_per_chroma
    295 final_matrix = np.zeros((12, 1))

ValueError: shape mismatch: value array of shape (80,) could not be broadcast to indexing result of shape (27,)

I need a distinctive sound feature to build a model for my CNN-based project. The data frame size is 1600 samples (0.1 seconds).
[Graph: MFCCs generated from 7 wave files]
In the plot above, you can see 7 MFCCs generated from 7 different wave files. All the wave files contain a similar sound.
The idea is to make the features as similar as possible for similar types of data so that the model can achieve a good prediction score.

Caparrini commented on June 11, 2024

Hello again,

I conducted a small experiment and was able to replicate the issue you described. It appears that there isn't sufficient information to compute chroma features. To address this and ensure the code functions (even if it means the chroma feature values are zeroes), I've implemented a fix. I plan to submit a pull request for this fix, pending the library author's approval.
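
A rough way to see where the two shapes in the error message come from is to redo the bin arithmetic by hand. The sketch below is pure Python and rests on assumptions: the helper chroma_indices is hypothetical, the mapping round(12 * log2(freq / 27.5)) and num_fft = window // 2 are my reading of how the library sets things up and are not confirmed by the traceback, and Fs = 16000 is inferred from "1600 samples (0.1 seconds)". Only the slice expression num_chroma[0:I - 1] is taken directly from the traceback.

```python
import math


def chroma_indices(num_fft, sampling_rate, ref_freq=27.5):
    """Hypothetical helper: assumed mapping from FFT bin to semitone index."""
    indices = []
    for f in range(num_fft):
        # Assumed centre frequency of bin f.
        freq = (f + 1) * sampling_rate / (2 * num_fft)
        indices.append(round(12.0 * math.log2(freq / ref_freq)))
    return indices


sampling_rate = 16000    # Hz, inferred from the report (1600 samples = 0.1 s)
window = 160             # window size used in the failing call
num_fft = window // 2    # assumption: the spectrum keeps half the window

num_chroma = chroma_indices(num_fft, sampling_rate)

# Same test as line 291 of the traceback: first bin whose index exceeds the array size.
first_overflow = next(i for i, c in enumerate(num_chroma) if c > len(num_chroma))

# Line 293 assigns the whole spectrum into num_chroma[0:I - 1].
target_len = len(num_chroma[0:first_overflow - 1])
spec_len = num_fft

print(f"spectrum values: {spec_len}, target slots: {target_len}")
```

Under these assumptions the script reports 80 spectrum values and 27 target slots, the same pair of shapes as in the ValueError above: with such a short window, most bins map to indices beyond the array, so the target slice is much shorter than the spectrum being assigned to it.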

For testing, I took the following approach. I recommend specifying the window as a fraction of a second multiplied by the sampling rate, Fs, rather than as a raw sample count, but the choice is yours. In my tests, Fs was 44100:

from pyAudioAnalysis import ShortTermFeatures
from pyAudioAnalysis import audioBasicIO


def extract_features(frac_second, samples_features, Fs, x):
    samples_frac_second = frac_second * Fs
    samples_windows = samples_features // samples_frac_second

    F, f_names = ShortTermFeatures.feature_extraction(x[:samples_features], Fs, frac_second*Fs, frac_second*Fs,
                                                      deltas=False)

    print(f"In {frac_second} there are {samples_frac_second} samples")
    print(f"In {samples_features} there are {samples_windows} windows")
    print(len(F[0]))
    print(len(f_names))

    return F, f_names


def issue_396():
    # Load the test recording: Fs is the sampling rate, x holds the samples.
    Fs, x = audioBasicIO.read_audio_file('./audio/limbo_mono.wav')

    for frac_second in [0.1, 0.05, 0.025, 0.01, 0.0036, 0.0018]:
        print(f"Experiment with {frac_second} frac of second")
        F, f_names = extract_features(frac_second, 16000, Fs, x)


# Run the experiment when this file is executed as a script.
if __name__ == '__main__':
    issue_396()

Output generated:

Experiment with 0.1 frac of second
In 0.1 there are 4410.0 samples
In 16000 there are 3.0 windows
3
34
Experiment with 0.05 frac of second
In 0.05 there are 2205.0 samples
In 16000 there are 7.0 windows
7
34
Experiment with 0.025 frac of second
In 0.025 there are 1102.5 samples
In 16000 there are 14.0 windows
14
34
Experiment with 0.01 frac of second
In 0.01 there are 441.0 samples
In 16000 there are 36.0 windows
36
34
Experiment with 0.0036 frac of second
In 0.0036 there are 158.76 samples
In 16000 there are 100.0 windows
101
34
Experiment with 0.0018 frac of second
In 0.0018 there are 79.38 samples
In 16000 there are 201.0 windows
202
34
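
The frame counts in this output (3, 7, 14, 36, 101, 202) differ slightly from the printed "windows" estimate for the smallest fractions. A minimal sketch of one explanation follows. It assumes the library truncates a fractional window to an integer number of samples and emits one frame for every full window that fits. The helper frame_layout is hypothetical and not part of pyAudioAnalysis.

```python
def frame_layout(frac_second, fs, num_samples):
    """Hypothetical helper: integer window size and expected frame count.

    Assumes the window is truncated to whole samples and that the step
    equals the window (non-overlapping frames), as in the test script.
    """
    window = int(frac_second * fs)   # e.g. 158.76 -> 158
    step = window
    if num_samples < window:
        return window, 0
    frames = (num_samples - window) // step + 1
    return window, frames


fs = 44100            # sampling rate used in the experiment
num_samples = 16000   # length of the slice passed to feature_extraction
fractions = [0.1, 0.05, 0.025, 0.01, 0.0036, 0.0018]

layout = [frame_layout(frac, fs, num_samples) for frac in fractions]
for frac, (window, frames) in zip(fractions, layout):
    print(f"{frac} s -> window of {window} samples, {frames} frames")
```

With these assumptions the predicted frame counts match the lengths printed above, including 101 and 202 for the two smallest fractions. Passing an explicit integer window such as int(frac_second * Fs) makes that truncation visible in the calling code.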

Fix: in the function chroma_features in ShortTermFeatures.py, change the else branch as follows:

    else:
        I = np.nonzero(num_chroma > num_chroma.shape[0])[0][0]
        C = np.zeros((num_chroma.shape[0],))
        if I > 1:
            # If I <= 1 there are no chroma features that can be extracted
            C[num_chroma[0:I - 1]] = spec[num_chroma[0:I - 1]]
            C /= num_freqs_per_chroma
    final_matrix = np.zeros((12, 1))

I'm submitting a pull request (https://github.com/Caparrini/pyAudioAnalysis), although I'm not sure it matches the expected behavior. The fix is uploaded at that link for your convenience, should you prefer it over modifying your local library directly. Please choose whichever option suits you best.

Best regards,
