sonopy's Introduction

Sonopy

A simple audio feature extraction library

Sonopy is a lightweight Python library for calculating the mel-frequency cepstral coefficients (MFCCs) of an audio signal. It implements the following audio vectorization functions:

  • Power spectrogram
  • Mel spectrogram
  • Mel-frequency cepstral coefficient (MFCC) spectrogram
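
As a rough illustration of the first of these functions, the sketch below computes a power spectrogram with plain NumPy. It is not Sonopy's source code: the helper names frame_signal and power_spectrogram, and the default window of 160 samples with a stride of 80, are assumptions chosen for the example.

```python
import numpy as np

def frame_signal(audio, window, stride):
    """Split a 1-D signal into overlapping frames of `window` samples."""
    if len(audio) < window:
        return np.empty((0, window))
    num_frames = 1 + (len(audio) - window) // stride
    # Row i holds the sample indices of frame i.
    idx = np.arange(window)[None, :] + stride * np.arange(num_frames)[:, None]
    return audio[idx]

def power_spectrogram(audio, window=160, stride=80, fft_size=512):
    """Squared magnitude of the real FFT of each frame, scaled by fft_size."""
    frames = frame_signal(audio, window, stride)
    spectrum = np.fft.rfft(frames, n=fft_size)  # zero-pads each frame to fft_size
    return (spectrum.real ** 2 + spectrum.imag ** 2) / fft_size

audio = np.random.random(2 * 16000)  # two seconds of noise at 16 kHz
spec = power_spectrogram(audio)
print(spec.shape)  # (399, 257): 399 frames, 512 // 2 + 1 frequency bins
```

Each row is one frame in time and each column is one frequency bin, so a longer stride gives fewer rows and a larger FFT size gives more columns.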

Features

  • Lightweight
  • Tiny, readable source code
  • Visualize steps in calculation

Usage

import numpy as np
from sonopy import power_spec, mel_spec, mfcc_spec, filterbanks

sr = 16000  # Sample rate in Hz
audio = np.random.random(2 * sr)  # Two seconds of random samples as stand-in audio

# window_stride is a (window, stride) pair
powers = power_spec(audio, window_stride=(100, 50), fft_size=512)
mels = mel_spec(audio, sr, window_stride=(1600, 800), fft_size=1024, num_filt=30)
mfccs = mfcc_spec(audio, sr, window_stride=(160, 80), fft_size=512, num_filt=20, num_coeffs=13)
filters = filterbanks(sr, 20, 257)  # Probably not ever useful

# Get every intermediate result along with the MFCCs
powers, filters, mels, mfccs = mfcc_spec(audio, sr, return_parts=True)
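
To make the four parts returned above easier to picture, here is a textbook-style NumPy sketch of how a power spectrogram can be turned into mel energies and then MFCCs. It is an approximation under stated assumptions, not Sonopy's implementation: the names hz_to_mel, mel_to_hz, triangular_filterbank, and dct_ii are hypothetical, the mel formula is the common 2595 * log10(1 + f / 700) variant, and the DCT is left unscaled.

```python
import numpy as np

def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def triangular_filterbank(sample_rate, num_filt, num_bins):
    """Build num_filt triangular filters spaced evenly on the mel scale."""
    nyquist = sample_rate / 2.0
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(nyquist), num_filt + 2)
    # Fractional FFT-bin position of each filter edge and center.
    bin_points = mel_to_hz(mel_points) / nyquist * (num_bins - 1)
    bins = np.arange(num_bins)
    filters = np.zeros((num_filt, num_bins))
    for i in range(num_filt):
        left, center, right = bin_points[i:i + 3]
        rising = (bins - left) / (center - left)
        falling = (right - bins) / (right - center)
        filters[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return filters

def dct_ii(x):
    """Unscaled DCT-II along the last axis."""
    n = x.shape[-1]
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (2 * m + 1) / (2 * n))
    return x @ basis.T

powers = np.random.random((399, 257))            # stand-in power spectrogram
filters = triangular_filterbank(16000, 20, 257)  # one row per mel filter
mels = np.log(powers @ filters.T + 1e-10)        # log mel energies per frame
mfccs = dct_ii(mels)[:, :13]                     # keep the first 13 coefficients
print(filters.shape, mels.shape, mfccs.shape)    # (20, 257) (399, 20) (399, 13)
```

The first axis stays the frame index throughout; only the second axis shrinks, from FFT bins to mel filters to cepstral coefficients.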

Installation

pip install sonopy
pip install "sonopy[example]"  # For example.py
pip install "sonopy[comparison]"  # For comparison.py

Speed Comparison

Param Set  Audio Len  Stride  Window  FFT Size  Sample Rate  Cepstral Coeffs  Num Filters  Loops
C          16000      0.1     0.1     2048      16000        13               20           2000
B          240000     0.05    0.05    2048      16000        13               20           200
A          480000     0.01    0.01    2048      16000        13               20           20
D          16000      0.1     0.1     512       16000        13               20           20000
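
A parameter set like these could be timed roughly as follows. This is only a sketch: it assumes that Stride and Window are given in seconds and that Loops is the number of repeated calls, neither of which the table states, and it times a plain NumPy stand-in (the hypothetical stand_in_extractor) rather than any of the compared libraries.

```python
import timeit
import numpy as np

# Parameter set D from the table. Reading Stride/Window as seconds is an assumption.
params = {"audio_len": 16000, "stride": 0.1, "window": 0.1,
          "fft_size": 512, "sample_rate": 16000}

window = round(params["window"] * params["sample_rate"])  # 1600 samples
stride = round(params["stride"] * params["sample_rate"])  # 1600 samples
audio = np.random.random(params["audio_len"])

def stand_in_extractor():
    """Placeholder workload: framed power spectrum in plain NumPy."""
    num_frames = 1 + (len(audio) - window) // stride
    frames = np.stack([audio[i * stride:i * stride + window]
                       for i in range(num_frames)])
    # rfft truncates each 1600-sample frame to fft_size samples here.
    spectrum = np.fft.rfft(frames, n=params["fft_size"])
    return (spectrum.real ** 2 + spectrum.imag ** 2) / params["fft_size"]

loops = 100  # the table lists 20000 for set D; reduced to keep the sketch quick
seconds = timeit.timeit(stand_in_extractor, number=loops)
print(f"{loops} loops: {seconds:.4f} s ({seconds / loops * 1e6:.1f} us per call)")
```

Swapping stand_in_extractor for a call into the library under test keeps the rest of the harness unchanged.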

Credits

Thanks to SpeechPy for providing an example of the concrete calculations for MFCCs. Many of the calculations in this library are influenced by it.

sonopy's People

Contributors

matthewscholefield

sonopy's Issues

[Question] Dimensionality of output & input parameters relationships

After playing around with the mfcc_spec function, I noticed some behavior that I would like clarified.

Say I have an audio file with sampling rate sr and a length of N * sr samples.
I have chosen the following parameters (matching librosa's defaults):

  • window length: 2048
  • window stride: 512
  • n_fft: 2048
  • n_filters: 128
  • n_coeffs: 20

However, I ran into the following problems:

  1. n_fft has to be 2 * window_length - 1 for dimensionality to be preserved.
  2. n_filters does not work unless n_filters = window_length.
  3. These default params cause a memory error.

I worked around these issues by setting the params to:

  • window length: 1024
  • window stride: 512
  • n_fft: 2047
  • n_filters: 1024
  • n_coeffs: 20

This results in an MFCC output of shape (374, 20, 1024). From what I understand, mfcc.shape[1] = n_coeffs and mfcc.shape[-1] = window_length (?), but what does mfcc.shape[0] correspond to? Furthermore, is there any way to replicate the librosa default params without running into memory issues?
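
For reference, the shape arithmetic under the usual framing convention can be sketched as below. This is not a statement about Sonopy's internals: expected_shapes is a hypothetical helper, and it assumes a mono 1-D signal, no padding, and a real FFT. Under those assumptions a leading dimension of 374 with a window of 1024 and a stride of 512 matches a frame count for a signal of roughly 192000 samples, and the MFCC output would be two-dimensional.

```python
import numpy as np

def expected_shapes(num_samples, window, stride, n_fft, n_filters, n_coeffs):
    """Array shapes at each stage for a mono signal framed without padding."""
    num_frames = 1 + (num_samples - window) // stride if num_samples >= window else 0
    num_bins = n_fft // 2 + 1  # number of bins returned by a real FFT
    return {
        "power": (num_frames, num_bins),
        "filterbank": (n_filters, num_bins),
        "mel": (num_frames, n_filters),
        "mfcc": (num_frames, min(n_coeffs, n_filters)),
    }

# 192000 samples is an assumed length chosen to reproduce the 374 frames above.
shapes = expected_shapes(192000, window=1024, stride=512,
                         n_fft=2048, n_filters=128, n_coeffs=20)
for name, shape in shapes.items():
    print(name, shape)

# np.fft.rfft zero-pads frames shorter than n_fft, so n_fft may exceed the window.
check = np.fft.rfft(np.zeros((2, 1024)), n=2048)
print(check.shape)  # (2, 1025)

# Size of a float64 power spectrogram with that shape, in megabytes.
print(np.prod(shapes["power"]) * 8 / 1e6)
```

With these numbers the power spectrogram is only a few megabytes, so under the stated assumptions the librosa-style parameters do not require large arrays by themselves.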
