
NU-Wave 2 - Official PyTorch Implementation

NU-Wave 2: A General Neural Audio Upsampling Model for Various Sampling Rates
Seungu Han, Junhyeok Lee @ MINDsLab Inc., SNU


Official PyTorch + PyTorch Lightning implementation of NU-Wave 2.

The official checkpoint can be downloaded from here.

Requirements

Clone our Repository

git clone --recursive https://github.com/mindslab-ai/nuwave2.git
cd nuwave2
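
The dependencies are listed in requirements.txt (see Repository Structure below); they can be installed with pip:

pip install -r requirements.txt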

Preprocessing

Before running our project, you need to download the dataset and preprocess it into .wav files:

  1. Download the VCTK dataset (version 0.92).
  2. Remove speakers p280 and p315.
  3. Set data.base_dir in hparameter.yaml to the path of the downloaded dataset.
  4. Run utils/flac2wav.py:
python utils/flac2wav.py
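
For orientation, the block below is a minimal sketch of the kind of FLAC-to-WAV conversion this step performs, assuming the soundfile package and the directory layout from the data section further down; utils/flac2wav.py is the authoritative implementation.

# Hypothetical sketch of the preprocessing step, not the repository's code.
from pathlib import Path
import soundfile as sf

base_dir = Path('/DATA1/VCTK-0.92/wav48_silence_trimmed/')     # data.base_dir
out_dir = Path('/DATA1/VCTK-0.92/wav48_silence_trimmed_wav/')  # data.dir

for flac in sorted(base_dir.glob('*/*mic1.flac')):
    audio, sr = sf.read(str(flac))                  # decode FLAC (48 kHz in VCTK 0.92)
    target = out_dir / flac.parent.name / (flac.stem + '.wav')
    target.parent.mkdir(parents=True, exist_ok=True)
    sf.write(str(target), audio, sr)                # write uncompressed WAV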

Training

  1. Adjust hparameter.yaml, especially the train section:
train:
  batch_size: 12 # Dependent on GPU memory size
  lr: 2e-4
  weight_decay: 0.00
  num_workers: 8 # Dependent on CPU cores
  gpus: 2 # number of GPUs
  opt_eps: 1e-9
  beta1: 0.9
  beta2: 0.99
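
These names map directly onto torch.optim.Adam. As a reading aid, here is a minimal sketch that builds the optimizer from this section, assuming OmegaConf for YAML loading; lightning_model.py may wire this up differently.

# Hypothetical sketch: constructing Adam from the train section above.
import torch
from omegaconf import OmegaConf

hparams = OmegaConf.load('hparameter.yaml')
model = torch.nn.Linear(16, 16)  # stand-in for the NU-Wave 2 model

optimizer = torch.optim.Adam(
    model.parameters(),
    lr=float(hparams.train.lr),          # float(): YAML may read 2e-4 as a string
    betas=(hparams.train.beta1, hparams.train.beta2),
    eps=float(hparams.train.opt_eps),
    weight_decay=hparams.train.weight_decay,
)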
  • Adjust the data section in hparameter.yaml:
data:
  timestamp_path: 'vctk-silence-labels/vctk-silences.0.92.txt'
  base_dir: '/DATA1/VCTK-0.92/wav48_silence_trimmed/'
  dir: '/DATA1/VCTK-0.92/wav48_silence_trimmed_wav/' #dir/spk/format
  format: '*mic1.wav'
  cv_ratio: (100./108., 8./108., 0.00) #train/val/test
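
The cv_ratio tuple divides the 108 VCTK speakers into train/val/test partitions. Below is a minimal sketch of a speaker-level split under that reading; dataloader.py is the authoritative implementation.

# Hypothetical sketch of a speaker-level split driven by cv_ratio.
from pathlib import Path

data_dir = Path('/DATA1/VCTK-0.92/wav48_silence_trimmed_wav/')  # data.dir
cv_ratio = (100. / 108., 8. / 108., 0.00)  # train/val/test fractions

speakers = sorted(p.name for p in data_dir.iterdir() if p.is_dir())
n_train = int(len(speakers) * cv_ratio[0])
n_val = int(len(speakers) * cv_ratio[1])

train_spk = speakers[:n_train]
val_spk = speakers[n_train:n_train + n_val]
test_spk = speakers[n_train + n_val:]  # empty when the test ratio is 0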
  2. Run trainer.py:
$ python trainer.py
  • If you want to resume training from a checkpoint, check the parser:
    parser = argparse.ArgumentParser()
    parser.add_argument('-r', '--resume_from', type=int,
            required=False, help="Resume Checkpoint epoch number")
    parser.add_argument('-s', '--restart', action="store_true",
            required=False, help="Significant change occurred, use this")
    parser.add_argument('-e', '--ema', action="store_true",
            required=False, help="Start from ema checkpoint")
    args = parser.parse_args()
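    For example, resuming from a hypothetical epoch 100 with the EMA weights:
$ python trainer.py -r 100 -e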
  • During training, the TensorBoard logger logs the loss, spectrograms, and audio.
$ tensorboard --logdir=./tensorboard --bind_all

Evaluation

Run for_test.py:

python for_test.py -r {checkpoint_number} --sr {input sampling rate} {-e:option, if ema} {--save:option}

Please check the parser:

    parser = argparse.ArgumentParser()
    parser.add_argument('-r', '--resume_from', type=int,
                required=True, help="Resume Checkpoint epoch number")
    parser.add_argument('-e', '--ema', action="store_true",
                required=False, help="Start from ema checkpoint")
    parser.add_argument('--save', action="store_true",
                required=False, help="Save file")
    parser.add_argument('--sr', type=int,
                required=True, help="input sampling rate")
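
For example, evaluating a hypothetical epoch-100 EMA checkpoint on 24 kHz inputs and saving the results:

python for_test.py -r 100 --sr 24000 -e --save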

Inference

  • Run inference.py:
python inference.py -c {checkpoint_path} -i {input audio} --sr {Sampling rate of input audio} {--steps:option} {--gt:option}

Please check the parser:

    parser = argparse.ArgumentParser()
    parser.add_argument('-c',
                        '--checkpoint',
                        type=str,
                        required=True,
                        help="Checkpoint path")
    parser.add_argument('-i',
                        '--wav',
                        type=str,
                        default=None,
                        help="audio")
    parser.add_argument('--sr',
                        type=int,
                        required=True,
                        help="Sampling rate of input audio")
    parser.add_argument('--steps',
                        type=int,
                        required=False,
                        help="Steps for sampling")
    parser.add_argument('--gt', action="store_true",
                        required=False, help="Whether the input audio is 48 kHz ground truth audio.")
    parser.add_argument('--device',
                        type=str,
                        default='cuda',
                        required=False,
                        help="Device, 'cuda' or 'cpu'")

References

This implementation uses code from the following repositories:

This README and the webpage for the audio samples are inspired by:

The audio samples on our webpage are partially derived from:

  • VCTK dataset (0.92): 46 hours of English speech from 108 speakers.
  • LJSpeech: a single-speaker English dataset consisting of 13,100 short audio clips of a female speaker reading passages from 7 non-fiction books, approximately 24 hours in total.

Repository Structure

.
|-- Dockerfile
|-- LICENSE
|-- README.md
|-- dataloader.py           # Dataloader for train/val(=test)
|-- diffusion.py            # DPM
|-- for_test.py             # Test with a for loop.
|-- hparameter.yaml         # Config
|-- inference.py            # Inference
|-- lightning_model.py      # NU-Wave 2 implementation.
|-- model.py                # NU-Wave 2 model based on lmnt-com's DiffWave implementation
|-- requirements.txt        # requirement libraries
|-- trainer.py              # Lightning trainer
|-- utils
|   |-- flac2wav.py             # Preprocessing
|   |-- stft.py                 # STFT layer
|   `-- tblogger.py             # Tensorboard Logger for lightning
|-- docs                    # For github.io
|   |-- ...
`-- vctk-silence-labels     # For trimming
    |-- ...

Citation & Contact

If you find this repository useful for your research, please consider citing!

@article{han2022nu,
  title={NU-Wave 2: A General Neural Audio Upsampling Model for Various Sampling Rates},
  author={Han, Seungu and Lee, Junhyeok},
  journal={arXiv preprint arXiv:2206.08545},
  year={2022}
}

If you have any questions or inquiries, please contact Seungu Han at [email protected].
