
audio-utils-webui's Introduction

audio-utils-webui - A Gradio WebUI for audio utilities.



Current audio utilities included in WebUI


Features

  • Separates audio tracks into stems such as vocals, instrumental, drums, and bass using Demucs models.
  • Splits audio files based on silence detection using audio slicer.
  • Transcribes speech in audio to text using OpenAI's Whisper models.
  • Run locally on your own hardware.
  • Supports various audio types including MP3, WAV, and FLAC.
  • Provides a simple and intuitive web UI for easy use.
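To illustrate the idea behind silence-based splitting, here is a minimal pure-Python sketch. It is not the audio slicer's actual implementation; the function names, the RMS threshold, and the frame sizes are hypothetical and chosen only for illustration. It marks a frame as silent when its RMS level falls below a threshold, and closes a chunk once enough consecutive silent frames have been seen.

```python
import math


def rms(frame):
    """Root-mean-square level of a non-empty list of samples."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def split_on_silence(samples, frame_size=4, threshold=0.1, min_silence_frames=2):
    """Return (start, end) sample-index pairs for the non-silent chunks.

    All parameter names and defaults are illustrative assumptions.
    """
    chunks = []
    start = None     # start index of the chunk being built, if any
    loud_end = 0     # index just past the last loud frame seen
    silent_run = 0   # number of consecutive silent frames
    for i in range(0, len(samples), frame_size):
        frame = samples[i:i + frame_size]
        if rms(frame) >= threshold:
            if start is None:
                start = i
            loud_end = i + len(frame)
            silent_run = 0
        elif start is not None:
            silent_run += 1
            if silent_run >= min_silence_frames:
                # The gap is long enough: close the chunk at the last loud frame.
                chunks.append((start, loud_end))
                start = None
    if start is not None:
        chunks.append((start, loud_end))
    return chunks


# Two loud regions separated by a gap of two silent frames.
signal = [0.5] * 8 + [0.0] * 8 + [0.5] * 4
print(split_on_silence(signal))
```

A gap shorter than `min_silence_frames` does not split the audio, which keeps brief pauses inside a single chunk.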

Possible future utilities

  • Other denoise, de-echo, and dereverb utilities.
  • WebUI for helpful FFmpeg commands (concatenating audio files, etc.).

FFmpeg Requirement

# Debian-based Linux: install FFmpeg with apt
sudo apt -y install ffmpeg
# Windows: download and extract the latest release.
# Direct download link: https://www.gyan.dev/ffmpeg/builds/ffmpeg-git-full.7z
# Add the extracted bin directory to PATH so the application can access the ffmpeg binaries.

# Windows: adding to PATH
# Open a cmd terminal as administrator and run the following command (example assumes the binaries are extracted to C:\ffmpeg)
setx /m PATH "C:\ffmpeg\bin;%PATH%"
# macOS
# Homebrew is the easiest option if it is installed; otherwise refer to the official FFmpeg install guides.
brew install ffmpeg
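
After installing, it can help to confirm that the ffmpeg binary is actually visible on PATH. The following is a small sketch using only the Python standard library; the helper names `find_ffmpeg` and `ffmpeg_version` are hypothetical and not part of this project.

```python
import shutil
import subprocess
from typing import Optional


def find_ffmpeg(binary: str = "ffmpeg") -> Optional[str]:
    """Return the full path of the binary if it is on PATH, otherwise None."""
    return shutil.which(binary)


def ffmpeg_version(path: str) -> str:
    """Return the first line printed by `ffmpeg -version`."""
    result = subprocess.run(
        [path, "-version"], capture_output=True, text=True, check=True
    )
    return result.stdout.splitlines()[0]


if __name__ == "__main__":
    path = find_ffmpeg()
    if path is None:
        print("ffmpeg was not found on PATH; revisit the install steps.")
    else:
        print(f"Found {path}")
        print(ffmpeg_version(path))
```

On Windows, open a new terminal after running `setx` before checking, since the updated PATH is only picked up by newly started processes.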

Python Environment Setup and WebUI Installation

Before running the program, ensure you have the necessary Python packages installed. Using a virtual environment is recommended; the example below uses Conda.


Linux, Windows or MacOS

# Using a miniconda virtual environment

conda create -y -n demucs python=3.10.9
conda activate demucs
# Install Python requirements with pip
pip install demucs gradio soundfile
# Pip install latest version of Whisper
pip install git+https://github.com/openai/whisper.git

Next, clone the repository and navigate to the project directory:

git clone https://github.com/bradsec/audio-utils-webui.git
cd audio-utils-webui
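
Before launching, a quick check that the packages installed above can be found in the active environment may save a failed start. This is a sketch rather than part of the project; the module list assumes the usual import names (`demucs`, `gradio`, `soundfile`, `whisper`), and `missing_modules` is a hypothetical helper.

```python
from importlib.util import find_spec

# Import names assumed for the packages installed with pip above.
REQUIRED_MODULES = ["demucs", "gradio", "soundfile", "whisper"]


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules were found.")
```

Run it inside the activated Conda environment so that it inspects the same interpreter the WebUI will use.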

Usage

To start the program, run the following command in the project directory:

python webui.py

This will launch the Gradio WebUI on http://0.0.0.0:7860. If you are launching it on the same PC, open http://localhost:7860 in a web browser to access the user interface. Binding to 0.0.0.0 also allows it to be accessed from any computer on the local network using the assigned IP address of the machine hosting the WebUI, e.g. http://192.168.0.1:7860.
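
To find the address other machines on the network should use, one option is to ask the operating system which local IP it would route outbound traffic from. The sketch below uses only the standard library; `lan_ip` and `webui_url` are hypothetical helpers, and the port 7860 is taken from the text above. If no route is available it falls back to 127.0.0.1.

```python
import socket


def lan_ip() -> str:
    """Best-effort guess of this machine's LAN IP address."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # connect() on a UDP socket sends no packets; it only selects a route.
        # 192.0.2.1 is a reserved documentation address and is never contacted.
        sock.connect(("192.0.2.1", 80))
        return sock.getsockname()[0]
    except OSError:
        return "127.0.0.1"
    finally:
        sock.close()


def webui_url(port: int = 7860) -> str:
    """Build the URL other machines on the network would open."""
    return f"http://{lan_ip()}:{port}"


if __name__ == "__main__":
    print(webui_url())
```

On machines with several network interfaces the result reflects the default route, which may not be the interface you intend to share.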


Note: On first use, Demucs and Whisper will download the required checkpoints and models. Check the terminal output for the status of the download and processing.


Testing

  • September 2023 - Tested the above install and WebUI usage on Debian 12 (Bookworm) and Microsoft Windows 11.

Limitations

  • There may be issues with very long audio files.

Troubleshooting

Check the terminal output for information. All commands and progress will be shown in the terminal. As mentioned, models need to be downloaded on first use; the number of files and their size may vary. Other messages, such as GPU out-of-memory errors, will also be shown in the terminal.


Licence Information / Credit
