
valle's Introduction

Language: 🇺🇸 | 🇨🇳

An unofficial PyTorch implementation of VALL-E (Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers).


Demo

Broader impacts

Since VALL-E can synthesize speech that maintains speaker identity, the model carries a risk of misuse, such as spoofing voice identification or impersonating a specific speaker.

We will not provide well-trained models or services.

Progress


  • Text and Audio Tokenizer
  • Dataset module and loaders
  • VALL-F: seq-to-seq + PrefixLanguageModel
    • AR Decoder
    • NonAR Decoder
  • VALL-E: PrefixLanguageModel
    • AR Decoder
    • NonAR Decoder
  • update README.zh-CN
  • Training
  • Inference: In-Context Learning via Prompting

Installation

To get up and running quickly just follow the steps below:

# phonemizer
apt-get install espeak-ng
## OSX: brew install espeak
pip install phonemizer

# lhotse
# https://github.com/lhotse-speech/lhotse/pull/956
# https://github.com/lhotse-speech/lhotse/pull/960
pip uninstall lhotse
pip uninstall lhotse
pip install git+https://github.com/lhotse-speech/lhotse

# k2 icefall
# pip install k2
git clone https://github.com/k2-fsa/k2.git
cd k2
export K2_MAKE_ARGS="-j12"
export K2_CMAKE_ARGS="-DK2_WITH_CUDA=OFF"
python setup.py install
cd -

git clone https://github.com/k2-fsa/icefall
cd icefall
pip install -r requirements.txt
export PYTHONPATH=`pwd`/../icefall:$PYTHONPATH
echo "export PYTHONPATH=`pwd`/../icefall:\$PYTHONPATH" >> ~/.zshrc
echo "export PYTHONPATH=`pwd`/../icefall:\$PYTHONPATH" >> ~/.bashrc
cd -

# valle
git clone https://github.com/lifeiteng/valle.git
cd valle
pip install -e .
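
After the steps above, a quick import check can confirm that each package is visible to the current Python interpreter. The sketch below is not part of the repository. It assumes the import names `phonemizer`, `lhotse`, `k2`, `icefall`, and `valle` match the packages installed above, and the helper name `missing_modules` is hypothetical.

```python
import importlib.util

# Import names assumed to correspond to the packages installed above.
REQUIRED = ["phonemizer", "lhotse", "k2", "icefall", "valle"]


def missing_modules(names):
    """Return the names that cannot be found on the current Python path."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # find_spec can raise for broken or partially initialised modules.
            found = False
        if not found:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    if missing:
        print("Not importable:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```

If `icefall` is reported as missing, the `PYTHONPATH` export from the installation steps has probably not been picked up by the current shell.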

Getting started

The quickest way to get started is to take a look at the detailed working code examples found in the examples subdirectory.

Training

cd egs/libritts

# These stages are very time-consuming
./prepare.sh

# nano: on NV GPU with 12G memory
# python3 bin/trainer.py \
#     --decoder-dim 128 --nhead 4 --num-decoder-layers 4 \
#     --max-duration 40 --model-name vallf \
#     --exp-dir exp/vallf_nano_full

python3 bin/trainer.py \
    --decoder-dim 128 --nhead 4 --num-decoder-layers 4 \
    --max-duration 40 --model-name valle \
    --exp-dir exp/valle_nano_full

# same as the paper, but needs more memory
python3 bin/trainer.py \
  --decoder-dim 1024 --nhead 16 --num-decoder-layers 12 \
  --exp-dir exp/valle
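
The two configurations above differ mainly in width and depth. In standard multi-head attention the model dimension is split evenly across heads, so `--decoder-dim` should be divisible by `--nhead`. The sketch below only does that arithmetic for the values shown above. It assumes the trainer uses standard multi-head attention, and the function name `head_dim` is illustrative, not part of `bin/trainer.py`.

```python
def head_dim(decoder_dim: int, nhead: int) -> int:
    """Per-head width when decoder_dim is split evenly across nhead heads."""
    if decoder_dim % nhead != 0:
        raise ValueError(
            f"decoder_dim={decoder_dim} is not divisible by nhead={nhead}"
        )
    return decoder_dim // nhead


# (decoder_dim, nhead) pairs taken from the two training commands above.
CONFIGS = {"nano": (128, 4), "paper-size": (1024, 16)}

if __name__ == "__main__":
    for name, (dim, heads) in CONFIGS.items():
        print(f"{name}: {heads} heads x {head_dim(dim, heads)} dims per head")
```

Running the same check before changing `--decoder-dim` or `--nhead` catches an incompatible pair before a long training job starts.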

Troubleshooting

Inference: In-Context Learning via Prompting

  • TBD

Contributing

  • Parallelize bin/tokenizer.py on multi-GPUs
  • Reduce memory usage of Training
  • Provide GPU resources (MyEmail: [email protected])
  • Buy Me A Coffee

Citing

To cite this repository:

@misc{valle,
  author={Feiteng Li},
  title={VALL-E: A neural codec language model},
  year={2023},
  url={http://github.com/lifeiteng/valle}
}
@article{VALL-E,
  title     = {Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers},
  author    = {Chengyi Wang and Sanyuan Chen and Yu Wu and
               Ziqiang Zhang and Long Zhou and Shujie Liu and
               Zhuo Chen and Yanqing Liu and Huaming Wang and
               Jinyu Li and Lei He and Sheng Zhao and Furu Wei},
  year      = {2023},
  eprint    = {2301.02111},
  archivePrefix = {arXiv},
  volume    = {abs/2301.02111},
  url       = {http://arxiv.org/abs/2301.02111},
}
