fast-audiomentations

fast-audiomentations is a Python library that leverages GPU acceleration for efficient audio augmentation, suitable for high-throughput audio analysis and machine learning applications.

Key Features

  • Performance Showcase: Highlights the advantages of using the GPU for audio processing tasks.
  • Diverse Audio Augmentations: Offers a variety of transformations including noise addition, filtering, and gain adjustments, optimized for GPU performance.
  • User-Friendly API: Designed for ease of use, enabling quick integration into audio processing pipelines.
  • Triton-based GPU kernels: Triton makes it possible to keep the GPU highly utilized. Real-world examples of Triton applied to straightforward kernels are still scarce, so the library also serves as a practical study resource (see the sketch below).
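
To give a flavour of the approach, here is a minimal, hypothetical Triton kernel for an element-wise gain, written in the style of the official Triton tutorials. It is not taken from the library's source; it only illustrates the kind of kernel fast-audiomentations builds on:

import torch
import triton
import triton.language as tl

@triton.jit
def gain_kernel(x_ptr, out_ptr, n_elements, gain, BLOCK_SIZE: tl.constexpr):
    # Each program instance processes one contiguous block of samples.
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements  # guard against the ragged last block
    x = tl.load(x_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x * gain, mask=mask)

def apply_gain(x: torch.Tensor, gain: float) -> torch.Tensor:
    # x must live on the GPU; one launch covers the whole (flattened) tensor.
    out = torch.empty_like(x)
    n = x.numel()
    grid = lambda meta: (triton.cdiv(n, meta['BLOCK_SIZE']),)
    gain_kernel[grid](x, out, n, gain, BLOCK_SIZE=1024)
    return out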

Installation

Currently, there is no ready-to-use package on PyPI, so you need to:

git clone https://github.com/Lallapallooza/fast-audiomentations
cd fast-audiomentations
python -m pip install -r requirements.txt
python -m examples.test

Assuming "SUCCESS" is printed, the library is now ready to use. Examples can be found in the examples/ and benchmark/ directories.

Usage

Here's an example of how to apply background noise addition to an audio sample using fast-audiomentations:

from fast_audiomentations import AddBackgroundNoise
import torch

batch_size = 128

dataloader = AddBackgroundNoise.get_dali_dataloader(
    'some/path/to/noises/in/dali/format',
    buffer_size=batch_size,
    n_workers=4
)
add_background_noise = AddBackgroundNoise(
    noises_dataloader=dataloader,
    min_snr=-10,
    max_snr=10,
    buffer_size=batch_size,
    p=1.0
)

sample_rate = 48000
audio = load_my_audio_batch()  # load a (batch_size, num_samples) torch.Tensor on the GPU
audio_lens = torch.tensor([audio.shape[1]] * batch_size, device='cuda')  # length in samples of each item

# the augmented batch is returned as a torch.Tensor
audios_with_noise = add_background_noise(
    samples=audio, 
    samples_lens=audio_lens, 
    sample_rate=sample_rate
)
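
The transform consumes a single padded batch plus a vector of per-example lengths. Below is a minimal sketch, assuming a (batch, time) layout and using only plain PyTorch, of how such a batch could be assembled from variable-length waveforms; pad_batch is a hypothetical helper, not part of the library:

import torch

def pad_batch(waveforms, device='cuda'):
    # waveforms: list of 1-D float tensors with (possibly) different lengths
    lens = torch.tensor([w.shape[0] for w in waveforms], device=device)
    batch = torch.zeros(len(waveforms), int(lens.max()), device=device)
    for i, w in enumerate(waveforms):
        batch[i, :w.shape[0]] = w.to(device)
    return batch, lens

# The padded tensor and the length vector map onto the `samples` and
# `samples_lens` arguments used in the call above.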

Benchmarking

To validate the hypothesis that the GPU can accelerate audio augmentation, benchmarks were run against two libraries: audiomentations and torch-audiomentations.

Details about the specific batch sizes, augmentation types, and parameters can be found in the code. See the benchmark/ directory.

The benchmarks were run on the following configuration:

  • NVIDIA GeForce RTX 3090 Ti
  • 64GB RAM
  • 12th Gen Intel(R) Core(TM) i9-12900KF
  • Samsung 980 PRO

The results are available in the benchmark_results_local/ directory. To run benchmarks, you need to:

  1. Change the parameters in benchmark/benchmark_data.py.
  2. Run python -m benchmark.run_all

Observations show that for stateless operations (no IO required), the speedup is significant (often tens of times). For augmentations such as adding background noise, however, the benefit is smaller because of IO/PCIe overhead; even with DALI the situation does not change much, and the GPU remains heavily underutilized. At the same time, the NVMe drive is not saturated either, which suggests either an inefficiency in the reading code or a limit imposed by Python's IO performance.
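
For a quick sanity check outside the full benchmark suite, the throughput of a single transform can be measured with plain wall-clock timing. measure_throughput below is a hypothetical helper that reuses the call signature shown in the Usage section:

import time
import torch

def measure_throughput(transform, samples, samples_lens, sample_rate, n_iters=100):
    # Warm-up run so kernel compilation and caching do not skew the numbers.
    transform(samples=samples, samples_lens=samples_lens, sample_rate=sample_rate)
    torch.cuda.synchronize()

    start = time.perf_counter()
    for _ in range(n_iters):
        transform(samples=samples, samples_lens=samples_lens, sample_rate=sample_rate)
    torch.cuda.synchronize()  # make sure all GPU work has finished before stopping the clock
    elapsed = time.perf_counter() - start

    # Real-time factor: seconds of audio processed per second of wall time.
    seconds_of_audio = float(samples_lens.sum()) / sample_rate * n_iters
    return seconds_of_audio / elapsed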

Future work

  • Add more augmentations (RIR convolution, SpecAugment, etc.). Requests for additional augmentations are welcome and will be implemented as promptly as possible.
  • Rewrite the IO process for stateful cases (RIR, add_background_noise) to fully utilize IO capabilities.
  • Conduct benchmarks with network devices for stateful scenarios to demonstrate that asynchronous data reading may reduce latency.
  • Support XLA backend
  • Add wrappers such as Compose, OneOf, etc. (a sketch of what these could look like follows this list)
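
As a rough idea of what such wrappers could look like, here is a minimal, hypothetical sketch that reuses the (samples, samples_lens, sample_rate) call signature shown above; it is not part of the library yet:

import random

class Compose:
    # Apply a sequence of transforms in order.
    def __init__(self, transforms):
        self.transforms = transforms

    def __call__(self, samples, samples_lens, sample_rate):
        for transform in self.transforms:
            samples = transform(samples=samples, samples_lens=samples_lens, sample_rate=sample_rate)
        return samples

class OneOf:
    # Apply exactly one transform, chosen at random.
    def __init__(self, transforms):
        self.transforms = transforms

    def __call__(self, samples, samples_lens, sample_rate):
        transform = random.choice(self.transforms)
        return transform(samples=samples, samples_lens=samples_lens, sample_rate=sample_rate)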
