tortoise-tts-modal-api's Introduction

๐Ÿข Tortoise TTS pay-as-you-go API

Links

Twitter Thread: link

Deployed Website: tts.themetavoice.xyz

Introduction

A "pay-as-you-go" API for Tortoise TTS. It uses Modal underneath.

Tortoise is one of the best text-to-speech systems ever built, but it currently requires users to deploy their own service on a GPU, which can be time-consuming, difficult, and expensive. The alternative is to use paid services, which are closed-off and charge through monthly pricing tiers. This repository aims to provide a usage-based, pay-as-you-go API built on open-source code instead.

We have made some improvements to Tortoise to make the inference ~30% faster, and welcome contributions on our repo to improve it further!

We also provide a Python API wrapper that can be used for easy integration into your Python code.

Usage

You can use the HTTP end-point directly, or the provided Python wrapper.

HTTP end-point

Synthesis is done by sending a POST request to "https://vatsalaggarwal--tts-app.modal.run".

Headers

Make sure to include headers that keep the connection alive long enough, as inference can take 30 to 150 seconds.

"Connection: keep-alive",
"Keep-Alive: timeout=600, max=100",

Body

  • api_key: Key that manages payment (note the charges are "at-cost"). It can be generated by visiting https://tts.themetavoice.xyz
  • text: Text you want to synthesize. Normalized text (1 -> "one") with proper punctuation works best. 225 characters is probably the maximum you should try given how the model was trained, and the constraints on the Modal backend.
  • voices: String specifying which voice should be used to synthesize the text. There are four ways to get different voices out of this model:
    • "random": The model randomly picks a voice in its embedding space
    • "<name>": Use one of the voices the model was trained on (e.g. train_grace)
      • Choices are: angie, applejack, cond_latent_example, daniel, deniro, emma, freeman, geralt, halle, jlaw, lj, mol, myself, pat, pat2, rainbow, snakes, tim_reynolds, tom, train_atkins, train_daws, train_dotrice, train_dreams, train_empire, train_grace, train_kennard, train_lescault, train_mouse, weaver, william
    • "<name>&<name>": Combine two voices (e.g. train_grace&emma)
    • "": (Zero-shot) Use with the target_file parameter, described next, to synthesize text in the voice of an utterance supplied by the user.
  • [optional] target_file: A pointer to an mp3 or wav file from the speaker whose voice you want to synthesize text in. The file should be uploaded to a static file store (like a public S3 bucket).
    • This parameter is unused if voices is set to anything other than ""; in that case, leave it out of the body.
    • If this parameter is used, make sure that voices is set to "".
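
The headers and body fields above can be put together as in the following minimal sketch. It is not the repository's wrapper. It assumes the endpoint accepts a JSON body (the encoding is not stated here), treats the 225-character guideline as a hard limit for simplicity, and returns the raw response bytes because the response format is not described. The names build_payload and synthesize are hypothetical helpers introduced only for illustration.

```python
import http.client
import json
from urllib.parse import urlsplit

ENDPOINT = "https://vatsalaggarwal--tts-app.modal.run"
MAX_CHARS = 225  # suggested upper bound on text length, enforced strictly in this sketch

HEADERS = {
    "Content-Type": "application/json",  # assumption: the body is sent as JSON
    "Connection": "keep-alive",
    "Keep-Alive": "timeout=600, max=100",
}


def build_payload(api_key, text, voices="random", target_file=None):
    """Assemble the request body, checking the rules for voices and target_file."""
    if len(text) > MAX_CHARS:
        raise ValueError(f"text is {len(text)} characters; the suggested maximum is {MAX_CHARS}")
    if target_file is not None and voices != "":
        raise ValueError('voices must be "" when target_file is given')
    if target_file is None and voices == "":
        raise ValueError('voices "" (zero-shot) requires target_file')
    payload = {"api_key": api_key, "text": text, "voices": voices}
    if target_file is not None:
        payload["target_file"] = target_file
    return payload


def synthesize(payload, timeout=600):
    """POST the payload and return the raw response bytes (format not assumed)."""
    parts = urlsplit(ENDPOINT)
    connection = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    try:
        connection.request(
            "POST",
            parts.path or "/",
            body=json.dumps(payload).encode("utf-8"),
            headers=HEADERS,
        )
        response = connection.getresponse()
        data = response.read()
        if response.status != 200:
            raise RuntimeError(f"request failed with HTTP status {response.status}")
        return data
    finally:
        connection.close()
```

A call might look like synthesize(build_payload("<your api key>", "Hello there, how are you today?", voices="train_grace&emma")). For zero-shot synthesis, pass voices="" together with a target_file value that points at your uploaded audio file.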

Python Wrapper

run_api.py provides an example.

Contributing

TODOs

  • Use GPT-3 to perform text normalisation
  • Inference optimisations
    • TODO: add ideas
  • Finetuning code

Developer environment

  • python=3.10.8 (important for Modal)

tortoise-tts-modal-api's People

Contributors

vatsalaggarwal, sidroopdaska
