Echo-XI

Info

I published a video tour of all the features on YouTube; click here to view it.

The main goal of the project is to offer speech to text to speech.

It now has a GUI, and it stores all the settings you input. Sensitive details such as API Keys are stored in the system keyring.
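
As a rough illustration of the keyring idea, here is a minimal sketch. The service name, the function names and the in-memory stand-in are all hypothetical, not the project's actual code. The helpers accept any object with keyring-style `set_password`/`get_password` functions; the third-party `keyring` module exposes functions with those names, so it could be passed in directly.

```python
from typing import Optional

SERVICE = "echo-xi-example"  # hypothetical service name, not the project's real one


class InMemoryStore:
    """Stand-in for the system keyring, for demonstration only."""

    def __init__(self) -> None:
        self._data = {}

    def set_password(self, service: str, username: str, password: str) -> None:
        self._data[(service, username)] = password

    def get_password(self, service: str, username: str) -> Optional[str]:
        return self._data.get((service, username))


def save_api_key(store, provider: str, key: str) -> None:
    """Store an API key under the provider's name."""
    store.set_password(SERVICE, provider, key)


def load_api_key(store, provider: str) -> Optional[str]:
    """Return the stored key, or None if nothing was saved for this provider."""
    return store.get_password(SERVICE, provider)
```

With the real package installed, `save_api_key(keyring, "elevenlabs", key)` would write to the operating system's credential store instead of a dictionary.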

If you want to use the CLI instead, simply call the script from the command line with the --cli argument.
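
A flag like this is commonly handled with `argparse`. The sketch below is only an assumption about how such a switch could be wired up; `parse_args` and `pick_interface` are illustrative names, not functions from the project.

```python
import argparse


def parse_args(argv=None):
    """Parse the command line; argv=None means 'use sys.argv'."""
    parser = argparse.ArgumentParser(description="Speech to text to speech")
    parser.add_argument(
        "--cli",
        action="store_true",
        help="use the command-line interface instead of the GUI",
    )
    return parser.parse_args(argv)


def pick_interface(argv=None) -> str:
    """Return "cli" when --cli was passed, otherwise "gui"."""
    return "cli" if parse_args(argv).cli else "gui"
```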

It offers three separate speech recognition services:

  • Vosk, with recasepunc to add punctuation
  • Azure speech recognition
  • Whisper, either running locally (now using faster-whisper for faster recognition and lower VRAM usage) or through OpenAI's API

In addition, it automatically translates the output into a language of the user's choosing (from those supported by ElevenLabs' multilingual model), if the user is speaking a different language.

Each speech recognition provider has different language support, so be sure to read the details.

Translation is provided via either DeepL for supported languages, or Google Translate.
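
The selection logic described above might look roughly like the following sketch. The language set is a small illustrative placeholder, not DeepL's real list of supported languages, and `choose_translator` is a hypothetical name.

```python
from typing import Optional

# Illustrative placeholder only; the real DeepL language list is longer.
DEEPL_LANGUAGES = frozenset({"en", "de", "it"})


def choose_translator(
    spoken: str, target: str, deepl_languages=DEEPL_LANGUAGES
) -> Optional[str]:
    """Pick a translation backend for a (spoken, target) language pair."""
    spoken, target = spoken.lower(), target.lower()
    if spoken == target:
        # The user already speaks the target language: nothing to translate.
        return None
    if spoken in deepl_languages and target in deepl_languages:
        return "deepl"
    # Fall back to Google Translate for everything DeepL does not cover.
    return "google"
```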

The recognized and translated text is then sent to a TTS provider, of which two are supported:

  • ElevenLabs, through the elevenlabslib module: a high-quality but paid online TTS service that supports multiple languages.
  • pyttsx3, a low-quality TTS engine that runs locally.

The project also allows you to synchronize the detected text with an OBS text source using obsws-python.
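
A minimal sketch of what such a text update could look like. It assumes `client` is an `obsws_python.ReqClient` (which, to my knowledge, provides `set_input_settings(name, settings, overlay)`) and that the text source stores its content under the `"text"` setting. `push_text_to_obs` and `RecordingClient` are hypothetical names used only for this illustration.

```python
def push_text_to_obs(client, source_name: str, text: str) -> None:
    """Update the text shown by an OBS text source."""
    # overlay=True merges the new value into the source's existing settings
    # instead of replacing them wholesale.
    client.set_input_settings(source_name, {"text": text}, True)


class RecordingClient:
    """Fake client that records calls, so the sketch runs without OBS."""

    def __init__(self) -> None:
        self.calls = []

    def set_input_settings(self, name, settings, overlay) -> None:
        self.calls.append((name, settings, overlay))
```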

Installation and usage

Warning: Python 3.11 is still not fully supported by PyTorch (though it should work on the nightly build). I'd recommend using Python 3.10.6.

Before anything else: you'll need to have ffmpeg in your $PATH. If you're on Windows, you can follow this tutorial.

Additionally, if you're on Linux, you'll need to make sure PortAudio is installed.

On Windows:
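
If you want to verify the ffmpeg requirement from Python, a small check along these lines should work. `find_missing_tools` is a hypothetical helper, not part of the project. It only covers executables, so it cannot detect the PortAudio library.

```python
import shutil


def find_missing_tools(tools=("ffmpeg",)):
    """Return the names of the executables that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]
```

An empty list from `find_missing_tools()` means ffmpeg was found.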

  1. Clone the repo: git clone https://github.com/lugia19/Echo-XI.git

  2. Run run.bat - it will handle all the following steps for you.

Everywhere else:

  1. Clone the repo: git clone https://github.com/lugia19/Echo-XI.git

  2. Create a venv: python -m venv venv

  3. Activate the venv: source venv/bin/activate (if you're doing this manually on Windows, use venv\Scripts\activate instead)

  4. If you did it correctly, there should be (venv) at the start of the command line.

  5. Install the requirements: pip install -r requirements.txt

  6. Run the script (add --cli if you want the command-line interface).
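
If you're unsure whether the venv is active or which Python version you're on, both can be checked from Python itself. This is only a sketch with hypothetical helper names, and the version check simply encodes the 3.10 recommendation from the warning above.

```python
import sys


def in_virtualenv() -> bool:
    """True when the interpreter is running inside a venv."""
    # Inside a venv, sys.prefix points at the venv directory while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


def is_recommended_python(version=sys.version_info) -> bool:
    """True for Python 3.10.x, the series recommended above."""
    return tuple(version[:2]) == (3, 10)
```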

If you would like to use the voice on something like Discord, use VB-Cable. In the script, select your normal microphone as the input and VB-Cable Input as the output; then, in Discord, select VB-Cable Output as the input. Yes, it's a little confusing.

Notes on vosk/recasepunc

If you're looking to use vosk/recasepunc and you need something besides the included (downloadable) models, read on.

Vosk models can be found here. The same page also offers some recasepunc models. For additional ones, you can look in the recasepunc repo.

For English I use vosk-model-en-us-0.22 and vosk-recasepunc-en-0.22. Recasepunc is technically optional when using vosk, but it's highly recommended because it improves the output.

The script looks for models under the models/vosk and models/recasepunc folders.

A typical folder structure would look something like this. Recasepunc models can either be in their own folder or by themselves, depending on which source you download them from; both are supported.

-misc
-models
    -vosk
        -vosk-model-en-us-0.22
        -vosk-model-it-0.22
    -recasepunc
        -vosk-recasepunc-en-0.22
        it.22000
-speechRecognition
-ttsProviders
helper.py
speechToSpeech.py
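
To make the layout concrete, here is a sketch of how models in such a tree could be discovered. It assumes vosk models are always folders, and it accepts recasepunc models as either folders or single files, as described above. `list_models` is a hypothetical helper, not the script's actual lookup code.

```python
from pathlib import Path


def list_models(project_root):
    """Return the model names found under models/vosk and models/recasepunc."""
    models_dir = Path(project_root) / "models"
    found = {"vosk": [], "recasepunc": []}

    vosk_dir = models_dir / "vosk"
    if vosk_dir.is_dir():
        # Vosk models are assumed to be directories.
        found["vosk"] = sorted(p.name for p in vosk_dir.iterdir() if p.is_dir())

    recasepunc_dir = models_dir / "recasepunc"
    if recasepunc_dir.is_dir():
        # Recasepunc models may be a folder or a bare checkpoint file.
        found["recasepunc"] = sorted(
            p.name for p in recasepunc_dir.iterdir() if p.is_dir() or p.is_file()
        )
    return found
```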

For everything else, simply run the script and follow the instructions.

Contributors

lugia19