VoiceSmith [WIP]

VoiceSmith makes it possible to train single-speaker and multispeaker models, and to run inference with them, without any coding experience. It fine-tunes a pretty solid text-to-speech pipeline, based on modified versions of DelightfulTTS and UnivNet, on your dataset. Both models were pretrained on a proprietary 5000-speaker dataset. It also provides some tools for dataset preprocessing, such as automatic text normalization.
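
To give a feel for what text normalization means in a text-to-speech context, here is a minimal sketch. It is not VoiceSmith's normalizer: the abbreviation table, the digit-by-digit number reading, and the `normalize` function name are all assumptions made up for illustration.

```python
# Illustrative only: a toy normalizer, NOT the one VoiceSmith ships.
# Assumption: abbreviations are expanded from a tiny hand-written table and
# numbers are read digit by digit.
ABBREVIATIONS = {"dr.": "doctor", "mr.": "mister"}
DIGIT_WORDS = {
    "0": "zero", "1": "one", "2": "two", "3": "three", "4": "four",
    "5": "five", "6": "six", "7": "seven", "8": "eight", "9": "nine",
}


def normalize(text: str) -> str:
    """Lowercase the text, expand known abbreviations, and spell out digits."""
    words = []
    for token in text.lower().split():
        if token in ABBREVIATIONS:
            words.append(ABBREVIATIONS[token])
        elif all(ch in DIGIT_WORDS for ch in token):
            # Spell each digit separately: "42" becomes "four two".
            words.extend(DIGIT_WORDS[ch] for ch in token)
        else:
            words.append(token)
    # Joining on single spaces also collapses repeated whitespace.
    return " ".join(words)


if __name__ == "__main__":
    print(normalize("Dr.  Smith has 42 cats"))
```

A real normalizer also has to handle dates, currencies, ordinals and language-specific rules, which is why an automated tool is useful here.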

If you want to play around with a model trained on a highly emotional 60-speaker dataset using an earlier version of this software, click here.

Requirements

Hardware

  • OS: Windows or any Linux-based operating system. If you want to run this on macOS, you have to follow the steps in "Build from source" to create the installer. This is untested since I don't currently own a Mac.
  • Graphics: An NVIDIA GPU with CUDA support is highly recommended. You can train on CPU otherwise, but it will take days if not weeks.
  • RAM: 8 GB of RAM. You can try with less, but it may not work.
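
If you want to check a machine against these recommendations before installing, something like the following sketch may help. It is not part of VoiceSmith. It assumes that finding `nvidia-smi` on the PATH is a reasonable hint that an NVIDIA driver is installed, and it can only read the RAM size on systems where Python exposes `os.sysconf` (Linux and macOS, not Windows). The function names and the 7.5 GiB slack value are illustrative choices.

```python
import os
import shutil
from typing import Optional


def has_nvidia_smi() -> bool:
    """True if `nvidia-smi` is on PATH (a hint, not proof, of a usable CUDA GPU)."""
    return shutil.which("nvidia-smi") is not None


def total_ram_gib() -> Optional[float]:
    """Best-effort physical RAM in GiB, or None if the platform cannot report it."""
    try:
        pages = os.sysconf("SC_PHYS_PAGES")
        page_size = os.sysconf("SC_PAGE_SIZE")
    except (AttributeError, ValueError, OSError):
        return None  # e.g. Windows, where os.sysconf does not exist
    if pages <= 0 or page_size <= 0:
        return None
    return pages * page_size / 1024 ** 3


def ram_is_enough(ram_gib: Optional[float], minimum_gib: float = 7.5) -> Optional[bool]:
    """Compare against the 8 GB recommendation.

    Assumption: a nominal 8 GB machine usually reports slightly less than
    8 GiB, so the default threshold leaves some slack.
    """
    if ram_gib is None:
        return None
    return ram_gib >= minimum_gib


if __name__ == "__main__":
    print("nvidia-smi found:", has_nvidia_smi())
    ram = total_ram_gib()
    print("RAM (GiB):", "unknown" if ram is None else round(ram, 1))
    print("RAM meets recommendation:", ram_is_enough(ram))
```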

Software

How to install

  1. Download the latest installer from the releases page.
  2. Double click to run the installer.

How to develop

  1. Make sure you have the latest version of Node.js installed.

  2. Clone the repository

    git clone https://github.com/dunky11/voicesmith
    
  3. Install the dependencies. This can take a minute.

    cd voicesmith
    npm install
    
  4. Click here, select the folder with the latest version, download all files, and place them inside the repository's assets folder.

  5. Start the project

    npm start
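
If `npm start` fails, the usual suspects are a missing tool or an empty assets folder. The sketch below checks both before you start. It is not part of the repository. It assumes the assets folder is a directory named `assets` at the top of the clone, and the function names are made up for illustration.

```python
import shutil
from pathlib import Path
from typing import Iterable, List

# Tools used by the steps above.
REQUIRED_TOOLS = ("git", "node", "npm")


def missing_tools(tools: Iterable[str] = REQUIRED_TOOLS) -> List[str]:
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def assets_present(repo_dir: str) -> bool:
    """True if <repo_dir>/assets exists and contains at least one entry.

    Assumption: the assets folder is named `assets` and sits at the
    repository root.
    """
    assets = Path(repo_dir) / "assets"
    return assets.is_dir() and any(assets.iterdir())


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing tools:", ", ".join(missing))
    if not assets_present("."):
        print("The assets folder is missing or empty (see step 4).")
    if not missing and assets_present("."):
        print("Looks ready for `npm start`.")
```

Run it from inside the cloned `voicesmith` folder so that `.` refers to the repository root.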
    

Build from source

  1. Follow steps 1 - 4 from "How to develop" above.

  2. Run make. This creates a folder named out/make with an installer inside. The installer differs depending on your operating system.

    npm run make
    

Architecture

VoiceSmith currently uses a two-stage modified DelightfulTTS and UnivNet pipeline.
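
In a two-stage pipeline of this kind, the acoustic model (here the modified DelightfulTTS) turns text into a mel spectrogram, and the vocoder (here UnivNet) turns that spectrogram into a waveform. The sketch below only illustrates the data flow with stand-in functions that return zeros. None of the names or constants come from VoiceSmith: `N_MELS`, `HOP_LENGTH` and `FRAMES_PER_CHAR` are assumed values, and a real acoustic model predicts durations instead of using a fixed number of frames per character.

```python
from typing import List

# Assumed, illustrative values; not taken from VoiceSmith's configuration.
N_MELS = 80          # mel bins per spectrogram frame
HOP_LENGTH = 256     # audio samples generated per spectrogram frame
FRAMES_PER_CHAR = 5  # placeholder for the durations a real model predicts

Mel = List[List[float]]


def acoustic_model(text: str) -> Mel:
    """Stage 1 stand-in: text -> mel spectrogram of shape (frames, N_MELS)."""
    n_frames = len(text) * FRAMES_PER_CHAR
    return [[0.0] * N_MELS for _ in range(n_frames)]


def vocoder(mel: Mel) -> List[float]:
    """Stage 2 stand-in: mel spectrogram -> waveform samples."""
    return [0.0] * (len(mel) * HOP_LENGTH)


def synthesize(text: str) -> List[float]:
    """Chain the two stages: text -> mel spectrogram -> waveform."""
    return vocoder(acoustic_model(text))


if __name__ == "__main__":
    mel = acoustic_model("hi")
    audio = synthesize("hi")
    print(len(mel), "frames x", len(mel[0]), "mel bins ->", len(audio), "samples")
```

Splitting the pipeline this way means each stage can be fine-tuned or swapped on its own, since the mel spectrogram is the only interface between the two.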

Contribute

Show your support by ⭐ starring the project. Pull requests are always welcome.

License

This project is licensed under the Apache-2.0 license - see the LICENSE.md file for details.
