
deep-audio-prior's Introduction

Deep Audio Prior - Pytorch Implementation

Yapeng Tian, Chenliang Xu, and Dingzeyu Li

University of Rochester and Adobe Research

[ArXiv] [Demo] [Dataset]

Our deep audio prior enables several audio applications: blind sound source separation, interactive mask-based editing, audio texture synthesis, and audio watermarker removal.

Blind source separation

Our DAP-based BSS model can separate individual sound sources from a sound mixture without using any external training data. For evaluation, we compose a 2-channel input sound from two individual sounds, s1 and s2, and then generate a sound mixture: s_mix = s1 + s2.

   $ cd ~/code/
   $ python dap_sep.py --input_mix data/sep/violin_basketball.wav --output output/sep

The separated sounds and other intermediate results can be found in the "code/output/sep" folder.
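
The evaluation mixture s_mix = s1 + s2 described above can be illustrated with a few lines of NumPy. This is a minimal sketch, not code from the repository: the helper name `mix_sources`, the synthetic tone and noise signals, the 16 kHz sample rate, and the choice to trim to the shorter signal are all assumptions made for illustration.

```python
import numpy as np

def mix_sources(s1: np.ndarray, s2: np.ndarray) -> np.ndarray:
    """Return s_mix = s1 + s2, trimming both signals to the shorter length."""
    n = min(len(s1), len(s2))
    return s1[:n] + s2[:n]

# Synthetic stand-ins for two recordings (hypothetical data, 16 kHz assumed).
sr = 16000
t = np.arange(sr) / sr
s1 = 0.5 * np.sin(2 * np.pi * 440.0 * t)  # one second of a 440 Hz tone
s2 = 0.1 * np.random.default_rng(0).standard_normal(sr // 2)  # half a second of noise

s_mix = mix_sources(s1, s2)
print(s_mix.shape)
```

Real recordings would be loaded from WAV files instead of generated, and may need resampling to a common rate before they are summed.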

Interactive mask-based editing

Users can interact with the generated masks for the audio sources to further improve the separation results.

   $ cd ~/code/
   $ python dap_mask_1st.py --input_mix xxx --out data/mask/ckpt
   # Prepare a binary map to deactivate regions in a generated mask and save it into "data/mask/ckpt"
   $ python dap_mask_2rd.py --input_mix xxx --dea_map xxx --dea_map_id xxx --output xxxx

For the second round with mask interaction, we have two additional parameters: dea_map and dea_map_id, which refer to an annotated binary map and the corresponding audio source ID. We provide one example that refines separation results from a dog and violin mixture with an annotated deactivation binary map for the dog sound:

    $ cd ~/code/
    $ python dap_mask_2rd.py --input_mix data/mask/violin_dog.wav --dea_map data/mask/ckpt/mask2_dea.npy --dea_map_id 2 --output output/mask
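
One way to prepare such a binary map is sketched below with NumPy. Everything specific here is an assumption rather than the repository's documented format: the 256 x 256 mask shape, the convention that 1 marks a region to deactivate, the helper name `make_deactivation_map`, and the rectangle coordinates. Check the shape and value convention of the masks written by dap_mask_1st.py before relying on it.

```python
import os
import tempfile

import numpy as np

def make_deactivation_map(shape, regions):
    """Build a binary map with 1s inside each (row0, row1, col0, col1) box.

    Assumption: 1 = deactivate, 0 = leave unchanged. Flip the values if the
    script expects the opposite convention.
    """
    dea_map = np.zeros(shape, dtype=np.uint8)
    for r0, r1, c0, c1 in regions:
        dea_map[r0:r1, c0:c1] = 1
    return dea_map

# Hypothetical example: deactivate one rectangular region of the mask.
dea_map = make_deactivation_map((256, 256), [(0, 64, 100, 180)])

# Save as .npy; a temporary directory stands in for data/mask/ckpt here.
out_dir = tempfile.mkdtemp()
out_path = os.path.join(out_dir, "mask2_dea.npy")
np.save(out_path, dea_map)

loaded = np.load(out_path)
print(loaded.shape, int(loaded.sum()))
```

The saved file path would then be passed as --dea_map, with --dea_map_id set to the ID of the source whose mask the map annotates.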

Audio texture synthesis

DAP can be used to synthesize audio textures.

   $ cd ~/code/
   $ python dap_audio_synthesis.py --input data/synthesis/water.wav --output output/synthesis

Co-separation/audio watermarker removal

DAP can also be applied to audio watermarker removal via co-separation. Given 3 sounds that contain audio watermarkers, our cosep model can generate 3 individual music sounds and the corresponding watermarker.

   $ cd ~/code/
   $ python dap_cosep.py --input1 data/cosep/audiojungle/01.mp3 --input2 data/cosep/audiojungle/02.mp3 --input3 data/cosep/audiojungle/03.mp3 --output output/cosep

Installing dependencies

Use pip to install the dependencies listed in requirements.txt:

   $ pip install -r requirements.txt

Citation

@Article{dap2019,
  author={Tian, Yapeng and Xu, Chenliang and Li, Dingzeyu},
  title={Deep Audio Prior},
  journal = {ArXiv},
  year = {2019}
}

deep-audio-prior's People

Contributors

abhisheksrikanth, dependabot[bot], dingzeyuli, yapengtian


deep-audio-prior's Issues

Creating the binary mask and audio source ID files

Hi,

Thanks for the great project. I was hoping you could give a little more detail about creating the binary mask npy file and selecting the proper source ID. I expected that running dap_mask_1st.py would create this file, but it did not.

Thank you!

Environment setup

Hello,

Is there a requirements.txt somewhere that I can use for setup? I'm trying to understand the various requirements for the toolbox.

Failing to run due to a deprecation in numba

Hi,

The current version of numba (>0.48) has deprecated the numba.decorators module, causing dap_sep.py to fail. This can be easily resolved by pinning numba==0.48 in requirements.txt.
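
The installed numba version can be checked against this pin before running dap_sep.py. The following is a minimal sketch, assuming Python 3.8+ for `importlib.metadata`; the helper names `version_tuple` and `numba_needs_pin` are hypothetical and not part of the repository.

```python
from importlib import metadata

def version_tuple(version: str) -> tuple:
    """Parse leading numeric components, e.g. '0.50.0rc1' -> (0, 50, 0)."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)

def numba_needs_pin(version: str) -> bool:
    """True when the version is newer than the 0.48 series named in this issue."""
    return version_tuple(version)[:2] > (0, 48)

try:
    installed = metadata.version("numba")
    status = "newer than 0.48, consider pinning" if numba_needs_pin(installed) else "ok"
    print(f"numba {installed}: {status}")
except metadata.PackageNotFoundError:
    print("numba is not installed")
```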

Failing to run due to possible torch and pillow incompatibility

I'm trying to run blind source separation on the example violin_basketball.wav as described in the readme.
I've created conda environments with Python 3.7 and 3.8 and installed the dependencies as described for each version.

However, upon running
   $ cd ~/code/
   $ python dap_sep.py --input_mix data/sep/violin_basketball.wav --output output/sep

I encounter the error below, which looks like an incompatibility between the package versions. Alternatively, is there something I'm missing? Thank you.

Traceback (most recent call last):
  File "dap_sep.py", line 15, in <module>
    from net import skip, skip_mask_vec
  File "/home/cheng/tracking_engagement/audio_processing/Deep-Audio-Prior/code/net/__init__.py", line 13, in <module>
    from .skip_model import skip, skip_mask, skip_mask_vec,unet, sound_rec
  File "/home/cheng/tracking_engagement/audio_processing/Deep-Audio-Prior/code/net/skip_model.py", line 15, in <module>
    from .layers import *
  File "/home/cheng/tracking_engagement/audio_processing/Deep-Audio-Prior/code/net/layers.py", line 17, in <module>
    from .downsampler import Downsampler
  File "/home/cheng/tracking_engagement/audio_processing/Deep-Audio-Prior/code/net/downsampler.py", line 16, in <module>
    from utils.image_io import *
  File "/home/cheng/tracking_engagement/audio_processing/Deep-Audio-Prior/code/utils/image_io.py", line 16, in <module>
    import torchvision
  File "/home/cheng/anaconda3/envs/source_separation/lib/python3.7/site-packages/torchvision/__init__.py", line 2, in <module>
    from torchvision import datasets
  File "/home/cheng/anaconda3/envs/source_separation/lib/python3.7/site-packages/torchvision/datasets/__init__.py", line 9, in <module>
    from .fakedata import FakeData
  File "/home/cheng/anaconda3/envs/source_separation/lib/python3.7/site-packages/torchvision/datasets/fakedata.py", line 3, in <module>
    from .. import transforms
  File "/home/cheng/anaconda3/envs/source_separation/lib/python3.7/site-packages/torchvision/transforms/__init__.py", line 1, in <module>
    from .transforms import *
  File "/home/cheng/anaconda3/envs/source_separation/lib/python3.7/site-packages/torchvision/transforms/transforms.py", line 17, in <module>
    from . import functional as F
  File "/home/cheng/anaconda3/envs/source_separation/lib/python3.7/site-packages/torchvision/transforms/functional.py", line 5, in <module>
    from PIL import Image, ImageOps, ImageEnhance, PILLOW_VERSION
ImportError: cannot import name 'PILLOW_VERSION' from 'PIL' (/home/cheng/anaconda3/envs/source_separation/lib/python3.7/site-packages/PIL/__init__.py)

Hardware

Hello, this looks like a great project, but I am having issues running it on my system, specifically the blind source separation part.

I'm not sure whether this is due to my CUDA configuration/version or a conflict between the PyTorch and CUDA versions, since I get an error upon reaching "Flag = s.optimize()".

What type of hardware did you run this on and what CUDA version were you running? Thanks!
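
For a report like this one, it helps to include the Python, PyTorch, and CUDA details of the failing machine. The following is a minimal sketch for gathering them; the helper name `collect_env_info` is hypothetical, and it assumes only the standard `torch.__version__`, `torch.version.cuda`, and `torch.cuda` calls.

```python
import platform

def collect_env_info() -> dict:
    """Gather Python, PyTorch, and CUDA details for a bug report."""
    info = {
        "python": platform.python_version(),
        "torch": None,
        "cuda_build": None,
        "cuda_available": False,
        "gpu": None,
    }
    try:
        import torch
    except ImportError:
        return info  # PyTorch is not installed; keep the defaults

    info["torch"] = torch.__version__
    info["cuda_build"] = torch.version.cuda  # None for CPU-only builds
    info["cuda_available"] = torch.cuda.is_available()
    if info["cuda_available"]:
        info["gpu"] = torch.cuda.get_device_name(0)
    return info

for key, value in collect_env_info().items():
    print(f"{key}: {value}")
```

A `cuda_build` value paired with `cuda_available: False` usually points to a driver or installation mismatch rather than a problem in the separation code.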
