adobe / deep-audio-prior

Audio Source Separation Without Any Training Data.

Home Page: https://opensource.adobe.com/Deep-Audio-Prior/

License: Other

Python 100.00%

deep-audio-prior's Introduction

Deep Audio Prior - PyTorch Implementation

Yapeng Tian, Chenliang Xu, and Dingzeyu Li

University of Rochester and Adobe Research

[ArXiv] [Demo] [Dataset]

Our deep audio prior can enable several audio applications: blind sound source separation, interactive mask-based editing, audio texture synthesis, and audio watermark removal.

Blind source separation

Our DAP-based BSS model can separate individual sound sources from a sound mixture without using any external training data. For evaluation, we compose a 2-channel input sound with two individual sounds: s1 and s2, then we generate a sound mixture: s_mix = s1+s2.

   $ cd ~/code/
   $ python dap_sep.py --input_mix data/sep/violin_basketball.wav --output output/sep

The separated sounds and other intermediate results can be found in the "code/output/sep" folder.
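
The evaluation mixture described above, s_mix = s1 + s2, can be sketched in a few lines of NumPy. This is only an illustration, not code from the repository: the helper name `mix_sources`, the sample rate, and the two synthetic tones standing in for s1 and s2 are all assumptions, and the signals are treated as mono arrays.

```python
import numpy as np

def mix_sources(s1, s2):
    """Return s_mix = s1 + s2, trimming both signals to the shorter length."""
    n = min(len(s1), len(s2))
    a = np.asarray(s1[:n], dtype=np.float64)
    b = np.asarray(s2[:n], dtype=np.float64)
    return a + b

# Synthetic stand-ins for s1 and s2 (hypothetical tones, not repository data).
sr = 16000                                # assumed sample rate in Hz
t = np.arange(sr) / sr                    # one second of sample times
s1 = 0.5 * np.sin(2 * np.pi * 440.0 * t)  # stand-in for the first source
s2 = 0.3 * np.sin(2 * np.pi * 220.0 * t)  # stand-in for the second source

s_mix = mix_sources(s1, s2)
```

In practice s1 and s2 would be loaded from audio files, and the mixture may need rescaling before it is written back to disk to avoid clipping.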

Interactive mask-based editing

Users can interact with the generated masks for the audio sources to further improve separation results.

   $ cd ~/code/
   $ python dap_mask_1st.py --input_mix xxx --out data/mask/ckpt
   # Prepare a binary map that deactivates regions in a generated mask, and save it into "data/mask/ckpt".
   $ python dap_mask_2rd.py --input_mix xxx --dea_map xxx --dea_map_id xxx --output xxxx

For the second round with mask interaction, we have two additional parameters: dea_map and dea_map_id, which refer to an annotated binary map and the corresponding audio source ID. We provide one example that refines separation results from a dog and violin mixture with an annotated deactivation binary map for the dog sound:

    $ cd ~/code/
    $ python dap_mask_2rd.py --input_mix data/mask/violin_dog.wav --dea_map data/mask/ckpt/mask2_dea.npy --dea_map_id 2 --output output/mask
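
The README does not specify the layout of the deactivation map, so the following is only a sketch of how a binary map could be built and saved as an .npy file. It assumes the map is a 2-D array with the same shape as the generated mask, with 1 marking regions to deactivate. The shape, the box coordinates, and the helper name `make_deactivation_map` are hypothetical and should be checked against what dap_mask_2rd.py actually expects.

```python
import os
import tempfile

import numpy as np

def make_deactivation_map(shape, regions):
    """Build a binary map: 1 inside each (f0, f1, t0, t1) box, 0 elsewhere."""
    dea = np.zeros(shape, dtype=np.float32)
    for f0, f1, t0, t1 in regions:
        dea[f0:f1, t0:t1] = 1.0  # mark this frequency/time box as deactivated
    return dea

# Hypothetical mask size and a single annotated box (assumptions, not repo values).
dea_map = make_deactivation_map((256, 256), [(0, 64, 100, 180)])

# Saved to a temporary folder here; the README example uses data/mask/ckpt/mask2_dea.npy.
out_path = os.path.join(tempfile.mkdtemp(), "mask2_dea.npy")
np.save(out_path, dea_map)
```

The saved file would then be passed through `--dea_map`, with `--dea_map_id` naming the audio source whose mask the map applies to.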

Audio texture synthesis

DAP can be used to synthesize audio textures.

   $ cd ~/code/
   $ python dap_audio_synthesis.py --input data/synthesis/water.wav --output output/synthesis

Co-separation/audio watermark removal

DAP can also remove audio watermarks through co-separation. Given 3 sounds with audio watermarks, our cosep model can generate 3 individual music sounds and the corresponding watermark.

   $ cd ~/code/
   $ python dap_cosep.py --input1 data/cosep/audiojungle/01.mp3 --input2 data/cosep/audiojungle/02.mp3 --input3 data/cosep/audiojungle/03.mp3 --output output/cosep

Installing dependencies

Use pip to install the dependencies listed in requirements.txt:

   $ pip install -r requirements.txt

Citation

@Article{dap2019,
  author={Tian, Yapeng and Xu, Chenliang and Li, Dingzeyu},
  title={Deep Audio Prior},
  journal = {ArXiv},
  year = {2019}
}

deep-audio-prior's Issues

Creating the binary mask and audio source ID files

Hi,

Thanks for the great project. I was hoping you could give a little more detail about creating the binary mask npy file and selecting the proper source ID. I figured running dap_mask_1st.py would create this file, but it did not.

Thank you!

Environment setup

Hello,

Is there a requirements.txt somewhere that I can use for setup? I'm trying to understand the various requirements for the toolbox.

Failing to run due to a deprecation in numba

Hi,

The current version of numba (>0.48) has deprecated the numba.decorators module, causing dap_sep.py to fail. This can be easily resolved by pinning numba==0.48 in requirements.txt.

Failing to run due to possible torch and pillow incompatibility

I'm trying to run blind source separation on the example violin_basketball.wav as described in the readme.
I've created conda environments with Python 3.7 and 3.8 and installed the dependencies as described for each version.

However, upon running
$ cd ~/code/
$ python dap_sep.py --input_mix data/sep/violin_basketball.wav --output output/sep

I encounter the error below. It seems like there's some incompatibility between the package versions. Alternatively, is there something I'm missing? Thank you.

Traceback (most recent call last):
  File "dap_sep.py", line 15, in <module>
    from net import skip, skip_mask_vec
  File "/home/cheng/tracking_engagement/audio_processing/Deep-Audio-Prior/code/net/__init__.py", line 13, in <module>
    from .skip_model import skip, skip_mask, skip_mask_vec,unet, sound_rec
  File "/home/cheng/tracking_engagement/audio_processing/Deep-Audio-Prior/code/net/skip_model.py", line 15, in <module>
    from .layers import *
  File "/home/cheng/tracking_engagement/audio_processing/Deep-Audio-Prior/code/net/layers.py", line 17, in <module>
    from .downsampler import Downsampler
  File "/home/cheng/tracking_engagement/audio_processing/Deep-Audio-Prior/code/net/downsampler.py", line 16, in <module>
    from utils.image_io import *
  File "/home/cheng/tracking_engagement/audio_processing/Deep-Audio-Prior/code/utils/image_io.py", line 16, in <module>
    import torchvision
  File "/home/cheng/anaconda3/envs/source_separation/lib/python3.7/site-packages/torchvision/__init__.py", line 2, in <module>
    from torchvision import datasets
  File "/home/cheng/anaconda3/envs/source_separation/lib/python3.7/site-packages/torchvision/datasets/__init__.py", line 9, in <module>
    from .fakedata import FakeData
  File "/home/cheng/anaconda3/envs/source_separation/lib/python3.7/site-packages/torchvision/datasets/fakedata.py", line 3, in <module>
    from .. import transforms
  File "/home/cheng/anaconda3/envs/source_separation/lib/python3.7/site-packages/torchvision/transforms/__init__.py", line 1, in <module>
    from .transforms import *
  File "/home/cheng/anaconda3/envs/source_separation/lib/python3.7/site-packages/torchvision/transforms/transforms.py", line 17, in <module>
    from . import functional as F
  File "/home/cheng/anaconda3/envs/source_separation/lib/python3.7/site-packages/torchvision/transforms/functional.py", line 5, in <module>
    from PIL import Image, ImageOps, ImageEnhance, PILLOW_VERSION
ImportError: cannot import name 'PILLOW_VERSION' from 'PIL' (/home/cheng/anaconda3/envs/source_separation/lib/python3.7/site-packages/PIL/__init__.py)
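
A likely cause, offered as an assumption rather than a confirmed diagnosis for this environment: the `PILLOW_VERSION` constant was removed in Pillow 7.0.0, while the torchvision release in the traceback still imports it. The sketch below reports whether a given Pillow version would still expose the constant. The helper name `exposes_pillow_version` is hypothetical, and the check looks only at the major version number.

```python
def exposes_pillow_version(pillow_version):
    """Return True if this Pillow version predates 7.0.0, where PILLOW_VERSION was removed."""
    major = int(pillow_version.split(".")[0])
    return major < 7

# Inspect the locally installed Pillow, if there is one.
try:
    import PIL
    installed = getattr(PIL, "__version__", None)
except ImportError:
    installed = None

if installed is not None:
    status = "exposes" if exposes_pillow_version(installed) else "does not expose"
    print("Pillow", installed, status, "PILLOW_VERSION")
```

If the constant is missing, two common remedies are installing a Pillow release below 7 or upgrading torchvision to a release that no longer imports it. Which one suits this repository depends on the versions its other dependencies require.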

Hardware

Hello, this looks like a great project, but my system is having issues running it. More specifically, the Blind source separation part.

I'm not currently sure if this is due to my CUDA configuration/version or maybe some conflict with the PyTorch version and CUDA since I get an error upon hitting "Flag = s.optimize()".

What type of hardware did you run this on and what CUDA version were you running? Thanks!

Support for separation of >2 sources

Hello, I am just wondering if this code supports separation of more than 2 sources. Thanks!

TypeError: mean() received an invalid combination of arguments

a = torch.mean(x, (1,2,3))
TypeError: mean() received an invalid combination of arguments - got (Tensor, tuple), but expected one of: