DeepPhonemizer

A G2P library in PyTorch

DeepPhonemizer is a library for grapheme-to-phoneme (G2P) conversion based on Transformer models. It is intended for text-to-speech production systems that need high accuracy and efficiency. You can choose between a forward Transformer model (trained with CTC) and its autoregressive counterpart. The former is faster and more stable, while the latter is slightly more accurate.

The main advantages of this repo are:

  • Easy-to-use API for training and inference.
  • Multilingual: You can train a single model on several languages.
  • Accuracy: Phoneme and word error rates are comparable to the state of the art.
  • Speed: Inference is optimized for speed through dictionary lookups and batching.

Check out the inference and training tutorials on Colab!

Read the documentation at: https://as-ideas.github.io/DeepPhonemizer/

Installation

pip install deep-phonemizer

Quickstart

Download the pretrained model: en_us_cmudict_ipa_forward

from dp.phonemizer import Phonemizer

# Path to the downloaded checkpoint file; adjust it to wherever you saved the model.
phonemizer = Phonemizer.from_checkpoint('en_us_cmudict_ipa.pt')
phonemizer('Phonemizing an English text is imposimpable!', lang='en_us')

'foʊnɪmaɪzɪŋ æn ɪŋglɪʃ tɛkst ɪz ɪmpəzɪmpəbəl!'

Training

You can easily train your own autoregressive or forward transformer model. All necessary parameters are set in a config.yaml, which you can find under:

dp/configs/forward_config.yaml
dp/configs/autoreg_config.yaml

for the forward and autoregressive transformer model, respectively.

Distributed training is supported. You can specify which GPUs to use by setting the CUDA_VISIBLE_DEVICES environment variable:

CUDA_VISIBLE_DEVICES=0,1 python run_training.py

Inside the training script, prepare the data as (language, word, phonemes) tuples and use the preprocess and train API:

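import torch
import torch.multiprocessing as mp

from dp.preprocess import preprocess
from dp.train import train

# Each entry is a (language, word, phonemes) tuple.
train_data = [('en_us', 'young', 'jʌŋ'),
              ('de', 'benützten', 'bənʏt͡stn̩'),
              ('de', 'gewürz', 'ɡəvʏʁt͡s')] * 1000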

val_data = [('en_us', 'young', 'jʌŋ'),
            ('de', 'benützten', 'bənʏt͡stn̩')] * 100

config_file = 'dp/configs/forward_config.yaml'

preprocess(config_file=config_file,
           train_data=train_data,
           val_data=val_data,
           deduplicate_train_data=False)

num_gpus = torch.cuda.device_count()

if num_gpus > 1:
    mp.spawn(train, nprocs=num_gpus, args=(num_gpus, config_file))
else:
    train(rank=0, num_gpus=num_gpus, config_file=config_file)

Model checkpoints are stored in the checkpoint path specified in the config.yaml.
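
The tuple format used for train_data and val_data above can be built from a pronunciation lexicon. The following is a minimal sketch, assuming a tab-separated "word, phonemes" layout with one entry per line. The helper names read_lexicon and split_train_val and the sample lines are hypothetical and not part of the library.

```python
import random
from typing import Iterable, List, Tuple

Entry = Tuple[str, str, str]  # (language, word, phonemes)


def read_lexicon(lines: Iterable[str], lang: str) -> List[Entry]:
    """Turn 'word<TAB>phonemes' lines into (lang, word, phonemes) tuples."""
    entries = []
    for line in lines:
        line = line.strip()
        if not line or '\t' not in line:
            continue  # skip blank or malformed lines
        word, phonemes = line.split('\t', 1)
        entries.append((lang, word.strip(), phonemes.strip()))
    return entries


def split_train_val(entries: List[Entry], val_fraction: float = 0.1,
                    seed: int = 42) -> Tuple[List[Entry], List[Entry]]:
    """Shuffle deterministically and hold out a fraction for validation."""
    shuffled = list(entries)
    random.Random(seed).shuffle(shuffled)
    n_val = max(1, int(len(shuffled) * val_fraction))
    return shuffled[n_val:], shuffled[:n_val]


# Hypothetical lexicon content; in practice this would be read from a file.
lexicon_lines = ['young\tjʌŋ', 'text\ttɛkst', '', 'english\tɪŋglɪʃ']
data = read_lexicon(lexicon_lines, lang='en_us')
train_data, val_data = split_train_val(data, val_fraction=0.34)
```

The resulting lists have the same shape as the hand-written examples above, so they could be passed to preprocess in the same way.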

Inference

Load the phonemizer from a checkpoint and run a prediction. By default, the phonemizer stores a dictionary of word-phoneme mappings that is applied first, and it uses the Transformer model only to predict out-of-dictionary words.

from dp.phonemizer import Phonemizer

phonemizer = Phonemizer.from_checkpoint('checkpoints/best_model.pt')
phonemes = phonemizer('Phonemizing an English text is imposimpable!', lang='en_us')
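
The dictionary-first strategy described above can be illustrated independently of the library. The sketch below is not DeepPhonemizer's implementation: phonemize_words, the lexicon argument, and the predict_batch callable are hypothetical names, and fake_predict is a stand-in for a real model call. It only shows the idea of looking up known words first and sending the remaining words to the model in batches.

```python
from typing import Callable, Dict, List


def phonemize_words(words: List[str],
                    lexicon: Dict[str, str],
                    predict_batch: Callable[[List[str]], List[str]],
                    batch_size: int = 8) -> Dict[str, str]:
    """Look words up in the lexicon first; send only unknown words to the model."""
    result = {w: lexicon[w] for w in words if w in lexicon}
    # Deduplicate out-of-dictionary words while keeping their order.
    unknown = list(dict.fromkeys(w for w in words if w not in lexicon))
    for start in range(0, len(unknown), batch_size):
        batch = unknown[start:start + batch_size]
        for word, phonemes in zip(batch, predict_batch(batch)):
            result[word] = phonemes
    return result


# Stand-in for a model call; a real predictor would run the Transformer here.
def fake_predict(batch: List[str]) -> List[str]:
    return [f'<{w}>' for w in batch]


lexicon = {'an': 'æn', 'text': 'tɛkst'}
out = phonemize_words(['an', 'imposimpable', 'text', 'imposimpable'],
                      lexicon, fake_predict)
```

In this sketch only 'imposimpable' reaches the stand-in predictor, and it is predicted once even though it appears twice in the input.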

If you need more inference information, you can use the following API:

from dp.phonemizer import Phonemizer

phonemizer = Phonemizer.from_checkpoint('checkpoints/best_model.pt')
result = phonemizer.phonemise_list(['Phonemizing an English text is imposimpable!'], lang='en_us')

# Print each word with its predicted phonemes and the model confidence.
for word, pred in result.predictions.items():
    print(f'{word} {pred.phonemes} {pred.confidence}')
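
One possible use of the per-word confidence is to flag uncertain predictions for manual review. The sketch below assumes only what the loop above shows: a mapping from words to objects with phonemes and confidence attributes. The function name low_confidence_words, the threshold of 0.5, and the example values are made up for illustration, and the sketch assumes that higher confidence values mean more certain predictions.

```python
from types import SimpleNamespace
from typing import Dict, List, Tuple


def low_confidence_words(predictions: Dict[str, object],
                         threshold: float = 0.5) -> List[Tuple[str, str, float]]:
    """Return (word, phonemes, confidence) below the threshold, least confident first."""
    flagged = [(word, pred.phonemes, pred.confidence)
               for word, pred in predictions.items()
               if pred.confidence < threshold]
    return sorted(flagged, key=lambda item: item[2])


# Stand-in objects with made-up values; with the library you would pass
# result.predictions instead.
example = {
    'text': SimpleNamespace(phonemes='tɛkst', confidence=0.98),
    'imposimpable': SimpleNamespace(phonemes='ɪmpəzɪmpəbəl', confidence=0.41),
}
flagged = low_confidence_words(example, threshold=0.5)
```

A suitable threshold depends on the model and the data, so it is best chosen by inspecting a sample of predictions.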

Pretrained Models

Model                      Language                  Dataset      Repo Version
en_us_cmudict_ipa_forward  en_us                     cmudict-ipa  0.0.10
en_us_cmudict_forward      en_us                     cmudict      0.0.10
latin_ipa_forward          en_uk, en_us, de, fr, es  wikipron     0.0.10

TorchScript Export

You can easily export the underlying transformer models with TorchScript:

import torch
from dp.phonemizer import Phonemizer

phonemizer = Phonemizer.from_checkpoint('checkpoints/best_model.pt')
model = phonemizer.predictor.model
phonemizer.predictor.model = torch.jit.script(model)
# The phonemizer call expects a language code, as in the examples above.
phonemizer('Running the torchscript model!', lang='en_us')

Maintainers

References

Transformer based Grapheme-to-Phoneme Conversion

Grapheme-to-Phoneme Conversion Using Long Short-Term Memory Recurrent Neural Networks
