
RealtimeTTS's Introduction

RealtimeTTS

Easy to use, low-latency text-to-speech library for realtime applications

About the Project

RealtimeTTS is a state-of-the-art text-to-speech (TTS) library designed for real-time applications. It stands out for its ability to quickly convert text streams into high-quality audio output with minimal latency.

Hint: Check out Linguflex, the original project from which RealtimeTTS is spun off. It lets you control your environment by speaking and is one of the most capable and sophisticated open-source assistants currently available.

Demo video: Short_RealtimeTTS_Demo.mov

Key Features

  • Low Latency
    • almost instantaneous text-to-speech conversion
    • compatible with LLM outputs
  • High-Quality Audio
    • generates clear and natural-sounding speech
  • Multiple TTS Engine Support
    • supports OpenAI TTS, Elevenlabs, Azure Speech Services, Coqui TTS and System TTS
  • Multilingual
  • Robust and Reliable
    • ensures continuous operation with a fallback mechanism
    • switches to alternative engines in case of disruptions, guaranteeing consistent performance and reliability, which is vital for critical and professional use cases

Hint: check out RealtimeSTT, the input counterpart of this library, for speech-to-text capabilities. Together, they form a powerful realtime audio wrapper around large language models.

FAQ

Check the FAQ page for answers to many questions about using RealtimeTTS.

Updates

Latest Version: v0.3.46

See release history.

Tech Stack

This library uses:

  • Text-to-Speech Engines

    • OpenAIEngine: OpenAI's TTS system offers six natural-sounding voices.
    • CoquiEngine: High-quality local neural TTS.
    • AzureEngine: Microsoft's leading TTS technology. 500,000 characters free per month.
    • ElevenlabsEngine: Offers the best-sounding voices available.
    • SystemEngine: Native engine for quick setup.
  • Sentence Boundary Detection

    • NLTK Sentence Tokenizer: Uses the Natural Language Toolkit's sentence tokenizer for precise and efficient sentence segmentation.

By using "industry standard" components, RealtimeTTS offers a reliable, high-end technological foundation for developing advanced voice solutions.

Installation

Simple installation:

pip install RealtimeTTS

This will install all the necessary dependencies, including a CPU-only version of PyTorch (needed for the Coqui engine).

Installation into virtual environment with GPU support:

python -m venv env_realtimetts
env_realtimetts\Scripts\activate.bat
python.exe -m pip install --upgrade pip
pip install RealtimeTTS
pip install torch==2.2.1+cu118 torchaudio==2.2.1 --index-url https://download.pytorch.org/whl/cu118

More information about CUDA installation can be found in the CUDA installation section below.

Engine Requirements

Different engines supported by RealtimeTTS have unique requirements. Ensure you fulfill these requirements based on the engine you choose.

SystemEngine

The SystemEngine works out of the box using your system's built-in TTS capabilities. No additional setup is needed.

OpenAIEngine

To use the OpenAIEngine:

  • set environment variable OPENAI_API_KEY
  • install ffmpeg (see CUDA installation point 3)
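
A minimal sketch, assuming the default OpenAIEngine constructor needs no further arguments (the placeholder key below must be replaced with a real one, ideally set in your shell rather than in code):

import os
os.environ.setdefault("OPENAI_API_KEY", "sk-...")  # placeholder; prefer setting this in your shell

from RealtimeTTS import TextToAudioStream, OpenAIEngine

engine = OpenAIEngine()
TextToAudioStream(engine).feed("Hello from the OpenAI voices.").play()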

AzureEngine

To use the AzureEngine, you will need:

  • Microsoft Azure Text-to-Speech API key (provided via AzureEngine constructor parameter "speech_key" or in the environment variable AZURE_SPEECH_KEY)
  • Microsoft Azure service region.

Make sure you have these credentials available and correctly configured when initializing the AzureEngine.
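
A hedged sketch; speech_key is documented above, and the service_region parameter name appears in user code further down this page:

from RealtimeTTS import TextToAudioStream, AzureEngine

engine = AzureEngine(
    speech_key="YOUR_AZURE_SPEECH_KEY",  # or set the AZURE_SPEECH_KEY environment variable
    service_region="westeurope",         # your Azure service region
)
TextToAudioStream(engine).feed("Hello from Azure.").play()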

ElevenlabsEngine

For the ElevenlabsEngine, you need:

  • Elevenlabs API key (provided via ElevenlabsEngine constructor parameter "api_key" or in the environment variable ELEVENLABS_API_KEY)

  • mpv installed on your system (essential for streaming MPEG audio; Elevenlabs only delivers MPEG).

    🔹 Installing mpv:

    • macOS:

      brew install mpv
    • Linux and Windows: Visit mpv.io for installation instructions.
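
A minimal sketch, assuming the api_key constructor parameter documented above (remember that mpv must be installed first):

from RealtimeTTS import TextToAudioStream, ElevenlabsEngine

engine = ElevenlabsEngine(api_key="YOUR_ELEVENLABS_API_KEY")  # or set ELEVENLABS_API_KEY
TextToAudioStream(engine).feed("Hello from Elevenlabs.").play()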

CoquiEngine

Delivers high-quality, local, neural TTS with voice cloning.

The engine downloads a neural TTS model on first use. With GPU synthesis it is fast enough for real-time use in most cases. It needs around 4-5 GB of VRAM.

  • to clone a voice, submit the filename of a WAV file containing the source voice as the "voice" parameter to the CoquiEngine constructor
  • voice cloning works best with a 22050 Hz mono 16-bit WAV file containing a short (~5-30 second) sample

On most systems, GPU support is needed to run fast enough for real-time playback; otherwise you will experience stuttering.
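
A hedged cloning sketch using the "voice" constructor parameter described above (my_speaker.wav is a placeholder; the __main__ guard matters because CoquiEngine starts a worker process, as several issues further down this page show):

from RealtimeTTS import TextToAudioStream, CoquiEngine

if __name__ == "__main__":
    # "voice" takes the filename of a ~5-30 s, 22050 Hz mono 16-bit WAV sample
    engine = CoquiEngine(voice="my_speaker.wav")
    stream = TextToAudioStream(engine)
    stream.feed("This should sound like the cloned voice.")
    stream.play()
    engine.shutdown()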

Quick Start

Here's a basic usage example:

from RealtimeTTS import TextToAudioStream, SystemEngine, AzureEngine, ElevenlabsEngine

engine = SystemEngine() # replace with your TTS engine
stream = TextToAudioStream(engine)
stream.feed("Hello world! How are you today?")
stream.play_async()

Feed Text

You can feed individual strings:

stream.feed("Hello, this is a sentence.")

Or you can feed generators and character iterators for real-time streaming:

import openai  # note: this example uses the pre-1.0 openai API (see "Testing the Library" below)

def write(prompt: str):
    for chunk in openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
        stream=True
    ):
        if (text_chunk := chunk["choices"][0]["delta"].get("content")) is not None:
            yield text_chunk

text_stream = write("A three-sentence relaxing speech.")

stream.feed(text_stream)
char_iterator = iter("Streaming this character by character.")
stream.feed(char_iterator)

Playback

Asynchronously:

import time

stream.play_async()
while stream.is_playing():
    time.sleep(0.1)

Synchronously:

stream.play()

Testing the Library

The test subdirectory contains a set of scripts to help you evaluate and understand the capabilities of the RealtimeTTS library.

Note that most of the tests still rely on the "old" OpenAI API (<1.0.0). Usage of the new OpenAI API is demonstrated in openai_1.0_test.py.

  • simple_test.py

    • Description: A "hello world" styled demonstration of the library's simplest usage.
  • complex_test.py

    • Description: A comprehensive demonstration showcasing most of the features provided by the library.
  • coqui_test.py

    • Description: Test of the local Coqui TTS engine.
  • translator.py

    • Dependencies: Run pip install openai realtimestt.
    • Description: Real-time translations into six different languages.
  • openai_voice_interface.py

    • Dependencies: Run pip install openai realtimestt.
    • Description: Wake-word-activated, voice-based user interface to the OpenAI API.
  • advanced_talk.py

    • Dependencies: Run pip install openai keyboard realtimestt.
    • Description: Choose TTS engine and voice before starting AI conversation.
  • minimalistic_talkbot.py

    • Dependencies: Run pip install openai realtimestt.
    • Description: A basic talkbot in 20 lines of code.
  • simple_llm_test.py

    • Dependencies: Run pip install openai.
    • Description: A simple demonstration of how to integrate the library with large language models (LLMs).
  • test_callbacks.py

    • Dependencies: Run pip install openai.
    • Description: Showcases the callbacks and lets you check the latency times in a real-world application environment.

Pause, Resume & Stop

Pause the audio stream:

stream.pause()

Resume a paused stream:

stream.resume()

Stop the stream immediately:

stream.stop()
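
A hedged sketch combining these controls with asynchronous playback (the sleep durations are arbitrary):

import time
from RealtimeTTS import TextToAudioStream, SystemEngine

stream = TextToAudioStream(SystemEngine())
stream.feed("A longer text that takes a while to speak, so the controls have something to act on.")
stream.play_async()

time.sleep(2.0)
stream.pause()   # playback halts
time.sleep(1.0)
stream.resume()  # playback continues where it paused

while stream.is_playing():
    time.sleep(0.1)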

Requirements Explained

  • Python Version:

    • Required: Python >= 3.9, < 3.12
    • Reason: The library depends on the "TTS" library from Coqui, which requires Python versions in this range.
  • requests (>=2.31.0): to send HTTP requests for API calls and voice list retrieval

  • PyAudio (>=0.2.13): to create an output audio stream

  • stream2sentence (>=0.1.1): to split the incoming text stream into sentences

  • pyttsx3 (>=2.90): System text-to-speech conversion engine

  • azure-cognitiveservices-speech (>=1.31.0): Azure text-to-speech conversion engine

  • elevenlabs (>=0.2.24): Elevenlabs text-to-speech conversion engine

Configuration

Initialization Parameters for TextToAudioStream

When you initialize the TextToAudioStream class, you have various options to customize its behavior. Here are the available parameters:

engine (BaseEngine)

  • Type: BaseEngine
  • Required: Yes
  • Description: The underlying engine responsible for text-to-audio synthesis. You must provide an instance of BaseEngine or its subclass to enable audio synthesis.

on_text_stream_start (callable)

  • Type: Callable function
  • Required: No
  • Description: This optional callback function is triggered when the text stream begins. Use it for any setup or logging you may need.

on_text_stream_stop (callable)

  • Type: Callable function
  • Required: No
  • Description: This optional callback function is activated when the text stream ends. You can use this for cleanup tasks or logging.

on_audio_stream_start (callable)

  • Type: Callable function
  • Required: No
  • Description: This optional callback function is invoked when the audio stream starts. Useful for UI updates or event logging.

on_audio_stream_stop (callable)

  • Type: Callable function
  • Required: No
  • Description: This optional callback function is called when the audio stream stops. Ideal for resource cleanup or post-processing tasks.

on_character (callable)

  • Type: Callable function
  • Required: No
  • Description: This optional callback function is called when a single character is processed.

output_device_index (int)

  • Type: Integer
  • Required: No
  • Default: None
  • Description: Specifies the output device index to use. None uses the default device.
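
If you are unsure which index to pass, PyAudio (already a dependency of this library) can enumerate the available output devices. A hedged sketch:

import pyaudio

pa = pyaudio.PyAudio()
for i in range(pa.get_device_count()):
    info = pa.get_device_info_by_index(i)
    if info.get("maxOutputChannels", 0) > 0:  # only list devices that can play audio
        print(i, info["name"])
pa.terminate()

# e.g. TextToAudioStream(engine, output_device_index=3)  # 3 is an example index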

tokenizer (string)

  • Type: String
  • Required: No
  • Default: nltk
  • Description: Tokenizer to use for sentence splitting (currently "nltk" and "stanza" are supported).

language (string)

  • Type: String
  • Required: No
  • Default: en
  • Description: Language to use for sentence splitting.

muted (bool)

  • Type: Bool
  • Required: No
  • Default: False
  • Description: Global mute switch. If True, no pyAudio stream will be opened; playback via local speakers is disabled (useful when you want to synthesize to file or process audio chunks), and this setting overrides the muted parameter of play.

level (int)

  • Type: Integer
  • Required: No
  • Default: logging.WARNING
  • Description: Sets the logging level for the internal logger. This can be any integer constant from Python's built-in logging module.

Example Usage:

engine = YourEngine()  # Substitute with your engine
stream = TextToAudioStream(
    engine=engine,
    on_text_stream_start=my_text_start_func,
    on_text_stream_stop=my_text_stop_func,
    on_audio_stream_start=my_audio_start_func,
    on_audio_stream_stop=my_audio_stop_func,
    level=logging.INFO
)

Methods

play and play_async

These methods are responsible for executing the text-to-audio synthesis and playing the audio stream. The difference is that play is a blocking function, while play_async runs in a separate thread, allowing other operations to proceed.

fast_sentence_fragment (bool)
  • Default: False
  • Description: When set to True, the method will prioritize speed, generating and playing sentence fragments faster. This is useful for applications where latency matters.
buffer_threshold_seconds (float)
  • Default: 2.0

  • Description: Specifies the time in seconds for the buffering threshold, which impacts the smoothness and continuity of audio playback.

    • How it Works: Before synthesizing a new sentence, the system checks if there is more audio material left in the buffer than the time specified by buffer_threshold_seconds. If so, it retrieves another sentence from the text generator, assuming that it can fetch and synthesize this new sentence within the time window provided by the remaining audio in the buffer. This process allows the text-to-speech engine to have more context for better synthesis, enhancing the user experience.

    A higher value ensures that there's more pre-buffered audio, reducing the likelihood of silence or gaps during playback. If you experience breaks or pauses between sentences, consider raising this value.

minimum_sentence_length (int)
  • Default: 3
  • Description: Sets the minimum character length to consider a string as a sentence to be synthesized. This affects how text chunks are processed and played.
log_characters (bool)
  • Default: False
  • Description: Enable this to log the individual characters that are being processed for synthesis.
log_synthesized_text (bool)
  • Default: False
  • Description: When enabled, logs the text chunks as they are synthesized into audio. Helpful for auditing and debugging.

By understanding and setting these parameters and methods appropriately, you can tailor the TextToAudioStream to meet the specific needs of your application.
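
A sketch passing several of the parameters documented above to play (the values are illustrative, not recommendations):

from RealtimeTTS import TextToAudioStream, SystemEngine

stream = TextToAudioStream(SystemEngine())
stream.feed("First sentence. Second sentence, a little longer than the first.")
stream.play(
    fast_sentence_fragment=True,   # prioritize speed for the first audio
    buffer_threshold_seconds=2.0,  # pre-buffer audio to avoid gaps
    minimum_sentence_length=3,     # treat strings of 3+ characters as sentences
    log_synthesized_text=True,     # log chunks as they are synthesized
)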

CUDA installation

These steps are recommended for those who require better performance and have a compatible NVIDIA GPU.

Note: to check if your NVIDIA GPU supports CUDA, visit the official CUDA GPUs list.

To use torch with CUDA support, follow these steps:

Note: newer PyTorch installations may no longer require the Toolkit (and possibly cuDNN) installation; this is unverified.

  1. Install the NVIDIA CUDA Toolkit: for example, to install Toolkit 11.8, visit the NVIDIA CUDA Toolkit Archive, then download and install the version for your system.

  2. Install NVIDIA cuDNN: for example, to install cuDNN 8.7.0 for CUDA 11.x:

    • Visit NVIDIA cuDNN Archive.
    • Click on "Download cuDNN v8.7.0 (November 28th, 2022), for CUDA 11.x".
    • Download and install the software.
  3. Install ffmpeg:

    You can download an installer for your OS from the ffmpeg Website.

    Or use a package manager:

    • On Ubuntu or Debian:

      sudo apt update && sudo apt install ffmpeg
    • On Arch Linux:

      sudo pacman -S ffmpeg
    • On macOS using Homebrew (https://brew.sh/):

      brew install ffmpeg
    • On Windows using Chocolatey (https://chocolatey.org/):

      choco install ffmpeg
    • On Windows using Scoop (https://scoop.sh/):

      scoop install ffmpeg
  4. Install PyTorch with CUDA support:

    pip install torch==2.1.1+cu118 torchaudio==2.1.1+cu118 --index-url https://download.pytorch.org/whl/cu118
  5. Fix to resolve compatibility issues: if you run into library compatibility issues, try pinning these libraries to fixed versions:

    pip install networkx==2.8.8
    pip install typing_extensions==4.8.0
    pip install fsspec==2023.6.0
    pip install imageio==2.31.6
    pip install numpy==1.24.3
    pip install requests==2.31.0

💖 Acknowledgements

Huge shoutout to the team behind Coqui AI for being the first to give us local, high-quality synthesis with real-time speed and even a clonable voice!

Contribution

Contributions are always welcome (e.g. PR to add a new engine).

License Information

❗ Important Note:

While the source of this library is open-source, the usage of many of the engines it depends on is not: external engine providers often restrict commercial use in their free plans. This means the engines can be used for noncommercial projects, but commercial usage requires a paid plan.

Engine Licenses Summary:

CoquiEngine

  • License: Open-source only for noncommercial projects.
  • Commercial Use: Requires a paid plan.
  • Details: CoquiEngine License

ElevenlabsEngine

  • License: Open-source only for noncommercial projects.
  • Commercial Use: Available with every paid plan.
  • Details: ElevenlabsEngine License

AzureEngine

  • License: Open-source only for noncommercial projects.
  • Commercial Use: Available from the standard tier upwards.
  • Details: AzureEngine License

SystemEngine

  • License: Mozilla Public License 2.0 and GNU Lesser General Public License (LGPL) version 3.0.
  • Commercial Use: Allowed under this license.
  • Details: SystemEngine License

OpenAIEngine

Disclaimer: This is a summarization of the licenses as understood at the time of writing. It is not legal advice. Please read and respect the licenses of the different engine providers yourself if you plan to use them in a project.

Author

Kolja Beigel
Email: [email protected]
GitHub


RealtimeTTS's Issues

Can we get a docker image created for quick setup?

Hey,

I'll take a stab myself, but wanted to suggest getting a docker image created to help with some of these dependency issues, to see if that helps out. I thought it best to record it here for both the TTS/STT projects.

async not working as expected?

This code:

stream = TextToAudioStream(engine)
stream.feed("Hello")
stream.play_async()
time.sleep(0.1)
stream.feed("friend")
if stream.is_playing():
    stream.play_async()

... only plays "Hello" and not "friend". However, if I comment out time.sleep() it plays both. Also, if I sleep for 2+ seconds it also plays both words.

Is this expected?

[Feature request] Ability to specify the output device

Hi, I'm doing a project now, and I really need to be able to specify the output device (I want to output to a virtual microphone) the result of streaming.

It would be cool if it would be possible to specify the output device as it is done in RealtimeSTT with the input device.

Thanks for your work )

OS Error: No such file or directory

OSError: libespeak.so.1: cannot open shared object file: No such file or directory

Trying to run:

from RealtimeTTS import TextToAudioStream, SystemEngine

def dummy_generator():
    yield "This is a sentence. And here's another! Yet, "
    yield "there's more. This ends now."

TextToAudioStream(SystemEngine()).feed(dummy_generator()).play()

Ubuntu 22

Unclear output with the CoquiEngine when using short input feed

Hi,
For my usage I am feeding the engine the sentence word by word.
Using the SystemEngine I got a somewhat coherent sentence (the words were clear but the sentence was too fast),
but when using the CoquiEngine the words became very unclear and I experienced pauses.
I tried raising buffer_threshold_seconds to 7 but with no apparent improvement.
Any suggestions how I can improve the output?
When feeding the engine complete sentences I got pretty good results. I am also using voice cloning, but this phenomenon persists with the default voice too.
Thank you!

Small update for your README.md

I've just been doing a load of work with the Coqui TTS engine and I thought it wanted 24000 Hz for sample files. Turns out as standard it wants 22050 Hz. They both work, but if you look in the config.json file that comes downloaded with the models, it has a set preference for 22050 Hz as the input file (and yes, mono 16-bit etc.).

I was just taking an interest in your RealtimeTTS and thinking of pulling it into my project and spotted https://github.com/KoljaB/RealtimeTTS#coquiengine figured you may want to update it.


Thanks

Can I use this in Flask Python webapp?

Hello, thanks for your work first.

Can I use this in Flask Python webapp?

I'm going to send request to Flask app from JS to get audio streaming.

Is this possible as well?

Hope to hear from you soon.

multiprocessing RuntimeError on Coqui engine

Hey! I've tried running the CoquiEngine using your example, but I'm receiving error

        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.

The traceback points at line 107 of RealtimeTTS/engines/coqui_engine.py. Could this be caused by a different torch or Python version?

[Question] Is there a way to save the streamed audio to file?

I've managed to get the RealtimeTTS library to work. I'm wondering if there's any way to save / keep appending audio chunks to a file as they come in, so that I can play it back later on? I want to listen to the output audio exactly as the stream plays it using the play_async() function.

Thanks!
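
The play() signature quoted further down this page includes an output_wavfile parameter, so a hedged sketch could look like this (session.wav is a placeholder name):

from RealtimeTTS import TextToAudioStream, SystemEngine

stream = TextToAudioStream(SystemEngine())
stream.feed("Save me to a file while playing.")
stream.play(output_wavfile="session.wav")  # writes the audio to session.wav as it plays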

on_sentence_START_synthesized

Thanks for the great repo again :)
It would be great to have something like on_sentence_START_synthesized, so the sentence can be printed before the speech starts.

No module named 'RealtimeTTS'

I created a python virtual env and installed the lib, but I am having this error; any idea how to solve this?

from RealtimeTTS import TextToAudioStream, SystemEngine, AzureEngine, ElevenlabsEngine
ModuleNotFoundError: No module named 'RealtimeTTS'

I am using Ubuntu LTS 22.04

thank you so much

.play(muted=True) in VPS but error: ALSA lib confmisc.c:855:(parse_card) cannot find card '0'

I'm trying to run a simple TextToAudioStream in an Ubuntu Lightsail container on AWS like this to stream to browser js:

stream = client.chat.completions.create(
    model="gpt-4-turbo-preview",
    messages=[{"role": "user", "content": text}],
    stream=True
)
tts_stream = TextToAudioStream(
    AzureEngine(
        speech_key='',
        service_region='westeurope',
        voice='zh-CN-XiaochenMultilingualNeural',
        rate='5'
    ),
    log_characters=True
)
tts_stream.feed(stream).play(on_audio_chunk=handle_audio_chunk, muted=True)

This works in my local environment with muted=True but I'm getting this error even though I don't require audio playback and set muted=True

Is there a way around this or somehow to get this working in a VPS environment?

ALSA lib confmisc.c:855:(parse_card) cannot find card '0'
ALSA lib conf.c:5178:(_snd_config_evaluate) function snd_func_card_inum returned error: No such file or directory
ALSA lib confmisc.c:422:(snd_func_concat) error evaluating strings
...
ALSA lib conf.c:5701:(snd_config_expand) Evaluate error: No such file or directory
ALSA lib pcm.c:2664:(snd_pcm_open_noupdate) Unknown PCM sysdefault
ALSA lib pcm.c:2664:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.front
ALSA lib pcm.c:2664:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.rear

KeyError: 'VoiceAge' on Mac M1

(venv) ➜  RealtimeTTS git:(main) ✗ python3 simple_test.py
Traceback (most recent call last):
  File "/Users/lout/Documents/projects/explore_ai/RealtimeTTS/simple_test.py", line 12, in <module>
    engine = SystemEngine() # replace with your TTS engine
             ^^^^^^^^^^^^^^
  File "/Users/lout/Documents/projects/explore_ai/venv/lib/python3.11/site-packages/RealtimeTTS/engines/base_engine.py", line 10, in __call__
    instance = super().__call__(*args, **kwargs)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/lout/Documents/projects/explore_ai/venv/lib/python3.11/site-packages/RealtimeTTS/engines/system_engine.py", line 36, in __init__
    self.set_voice(voice)
  File "/Users/lout/Documents/projects/explore_ai/venv/lib/python3.11/site-packages/RealtimeTTS/engines/system_engine.py", line 105, in set_voice
    installed_voices = self.engine.getProperty('voices')
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/lout/Documents/projects/explore_ai/venv/lib/python3.11/site-packages/pyttsx3/engine.py", line 146, in getProperty
    return self.proxy.getProperty(name)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/lout/Documents/projects/explore_ai/venv/lib/python3.11/site-packages/pyttsx3/driver.py", line 173, in getProperty
    return self._driver.getProperty(name)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/lout/Documents/projects/explore_ai/venv/lib/python3.11/site-packages/pyttsx3/drivers/nsss.py", line 69, in getProperty
    return [self._toVoice(NSSpeechSynthesizer.attributesForVoice_(v))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/lout/Documents/projects/explore_ai/venv/lib/python3.11/site-packages/pyttsx3/drivers/nsss.py", line 69, in <listcomp>
    return [self._toVoice(NSSpeechSynthesizer.attributesForVoice_(v))
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/lout/Documents/projects/explore_ai/venv/lib/python3.11/site-packages/pyttsx3/drivers/nsss.py", line 64, in _toVoice
    attr['VoiceAge'])
    ~~~~^^^^^^^^^^^^
  File "/Users/lout/Documents/projects/explore_ai/venv/lib/python3.11/site-packages/objc/_convenience_mapping.py", line 18, in __getitem__objectForKey_
    return container_unwrap(res, KeyError, key)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/lout/Documents/projects/explore_ai/venv/lib/python3.11/site-packages/objc/_convenience.py", line 134, in container_unwrap
    raise exc_type(*exc_args)
KeyError: 'VoiceAge'

Code:

from RealtimeTTS import TextToAudioStream, SystemEngine

def dummy_generator():
    yield "This is a sentence. And here's another! Yet, "
    yield "there's more. This ends now."

TextToAudioStream(SystemEngine()).feed(dummy_generator()).play()

Also I had to install
pip3 install pyobjc==9.0.1
for this sample to work:

import pyttsx3
engine = pyttsx3.init()
engine.say("I will speak this text")
engine.runAndWait()

Getting an error from multiprocessing for Coqui

I installed the Coqui engine for my project by referring to the GitHub docs, and I got the following error:

File "", line 1, in
File "C:\Users\david\anaconda3\Lib\multiprocessing\spawn.py", line 122, in spawn_main
exitcode = _main(fd, parent_sentinel)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\david\anaconda3\Lib\multiprocessing\spawn.py", line 131, in _main
prepare(preparation_data)
File "C:\Users\david\anaconda3\Lib\multiprocessing\spawn.py", line 246, in prepare
_fixup_main_from_path(data['init_main_from_path'])
File "C:\Users\david\anaconda3\Lib\multiprocessing\spawn.py", line 297, in _fixup_main_from_path
main_content = runpy.run_path(main_path,
^^^^^^^^^^^^^^^^^^^^^^^^^
File "", line 291, in run_path
File "", line 98, in _run_module_code
File "", line 88, in _run_code
File "C:\Users\david\PycharmProjects\ai\e.py", line 3, in
engine = CoquiEngine() # replace with your TTS engine
^^^^^^^^^^^^^
File "C:\Users\david\AppData\Roaming\Python\Python311\site-packages\RealtimeTTS\engines\base_engine.py", line 11, in call
instance = super().call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\david\AppData\Roaming\Python\Python311\site-packages\RealtimeTTS\engines\coqui_engine.py", line 190, in init
self.create_worker_process()
File "C:\Users\david\AppData\Roaming\Python\Python311\site-packages\RealtimeTTS\engines\coqui_engine.py", line 248, in create_worker_process
self.synthesize_process.start()
File "C:\Users\david\anaconda3\Lib\multiprocessing\process.py", line 121, in start
self._popen = self._Popen(self)
^^^^^^^^^^^^^^^^^
File "C:\Users\david\anaconda3\Lib\multiprocessing\context.py", line 224, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\david\anaconda3\Lib\multiprocessing\context.py", line 336, in _Popen
return Popen(process_obj)
^^^^^^^^^^^^^^^^^^
File "C:\Users\david\anaconda3\Lib\multiprocessing\popen_spawn_win32.py", line 45, in init
prep_data = spawn.get_preparation_data(process_obj._name)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\david\anaconda3\Lib\multiprocessing\spawn.py", line 164, in get_preparation_data
_check_not_importing_main()
File "C:\Users\david\anaconda3\Lib\multiprocessing\spawn.py", line 140, in _check_not_importing_main
raise RuntimeError('''
RuntimeError:
An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.

    This probably means that you are not using fork to start your
    child processes and you have forgotten to use the proper idiom
    in the main module:

        if __name__ == '__main__':
            freeze_support()
            ...

    The "freeze_support()" line can be omitted if the program
    is not going to be frozen to produce an executable.

    To fix this issue, refer to the "Safe importing of main module"
    section in https://docs.python.org/3/library/multiprocessing.html

What parameters should be used for speech generation for long text?

Thank you for your work, I think it's cool. I find that speech generated from short text works great, but when I try to generate speech for longer text, the speech starts out fast and gets slower and slower, and there are occasional repetitive sentences. What are the appropriate parameters for speech generation for long text? Thank you.

Voice cloning procedure.

The front page says:
to clone a voice submit the filename of a wave file containing the source voice as cloning_reference_wav to the CoquiEngine constructor.

This code works; where do I put the cloning_reference_wav? Thanks

if __name__ == '__main__':
    from RealtimeTTS import TextToAudioStream, CoquiEngine

    import logging
    logging.basicConfig(level=logging.INFO)
    engine = CoquiEngine(level=logging.INFO)

    stream = TextToAudioStream(engine)

    print("Starting to play stream")
    stream.feed("Everything is going perfectly")
    stream.play()  # pause(), resume(), stop()

    engine.shutdown()

KeyError: 'speaker_embedding' and TypeError: Log._log() got an unexpected keyword argument 'exc_info'

Hi, I encountered the following error, could someone help me?

Python 3.10.12 | packaged by conda-forge | (main, Jun 23 2023, 22:40:32) [GCC 12.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.

from RealtimeTTS import TextToAudioStream, CoquiEngine
engine = CoquiEngine()
tts_models/multilingual/multi-dataset/xtts_v2 is already downloaded.
/home/yipyewmun/.local/share/tts/tts_models--multilingual--multi-dataset--xtts_v2
Using model: xtts

Process Process-1:
Traceback (most recent call last):
File "/home/yipyewmun/miniforge3/envs/realtimeTTS/lib/python3.10/site-packages/RealtimeTTS/engines/coqui_engine.py", line 149, in _synthesize_worker
gpt_cond_latent, speaker_embedding = get_conditioning_latents(cloning_reference_wav)
File "/home/yipyewmun/miniforge3/envs/realtimeTTS/lib/python3.10/site-packages/RealtimeTTS/engines/coqui_engine.py", line 92, in get_conditioning_latents
speaker_embedding = (torch.tensor(latents["speaker_embedding"]).unsqueeze(0).unsqueeze(-1))
KeyError: 'speaker_embedding'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/home/yipyewmun/miniforge3/envs/realtimeTTS/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
self.run()
File "/home/yipyewmun/miniforge3/envs/realtimeTTS/lib/python3.10/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/home/yipyewmun/miniforge3/envs/realtimeTTS/lib/python3.10/site-packages/RealtimeTTS/engines/coqui_engine.py", line 152, in _synthesize_worker
logging.exception(f"Error initializing main faster_whisper transcription model: {e}")
File "/home/yipyewmun/miniforge3/envs/realtimeTTS/lib/python3.10/logging/init.py", line 2113, in exception
error(msg, *args, exc_info=exc_info, **kwargs)
File "/home/yipyewmun/miniforge3/envs/realtimeTTS/lib/python3.10/logging/init.py", line 2105, in error
root.error(msg, *args, **kwargs)
File "/home/yipyewmun/miniforge3/envs/realtimeTTS/lib/python3.10/logging/init.py", line 1506, in error
self._log(ERROR, msg, args, **kwargs)
TypeError: Log._log() got an unexpected keyword argument 'exc_info'

How can I send every call to stream.feed(content) straight to synthesize?

I made my own logic to build the sentences up.

How can I send each call of

stream.feed(content)

straight into synthesis? I don't want the stream to wait for the next batch or sentence; it should synthesize each time it's called.

I have:

buffer_threshold_seconds = 0
fast_sentence_fragment = True

stream.stop() doesn't work

Hi, first of all thanks for your project, it is very cool to be able to get the result almost in realtime.

I use CoquiEngine.
I am using FastAPI, and I have a problem: when I want to stop a stream and then restart it, maybe with a different text, I can't do it.

After I do stream.stop(), the subsequent stream.play_async() doesn't work and I have to restart the server to get everything working again.

To make it clearer, I recorded a small video and attached a simple server code

stop_problem.mp4

Here is a simple server code that demonstrates the problem
server.py

import uvicorn
from fastapi import FastAPI
from pydantic import BaseModel
from RealtimeTTS import TextToAudioStream, CoquiEngine

app = FastAPI()

class SynthesisRequest(BaseModel):
    text: str

# Stopping occurs correctly, but when we start a new stream with new text, 
# we can't do anything with it, the stream stops playing and doesn't work anymore, 
# only restarting helps to fix it
@app.get("/tts_stop")
async def tts_stop():
    stream.stop()

# It works fine, but after we do stream.stop() it crashes and doesn't work anymore.
@app.post("/tts_to_audio")
async def tts_to_audio(request: SynthesisRequest):
    stream.feed(request.text)
    stream.play_async()

    return {"message": "stream"}

if __name__ == "__main__":
    engine = CoquiEngine()
    stream = TextToAudioStream(engine)

    uvicorn.run(app,port=8010)

UI like in showcase video?

Hello, I'm super interested in your project and have been messing around with AI of various types for a few months, however, I am relatively new to python and coding in general.

This project stands out to me and I'm forcing myself to learn python and to be able to utilize it fully, due to myself being mute and unable to speak. I want to be able to use this to join in and talk to my friends over discord and have more of a presence.

Short version: is it possible to share the simple UI (or is it already there and I'm just dumb..) so that I may learn from it and expand on it?

Sorry, wasn't sure where else to ask this, Kindest regards.

Error initializing main faster_whisper transcription model: Error opening 'female.wav': System error.

Hi, I just tried using your API in a Jupyter notebook with the CoquiTTS engine. It seems like it expects a female.wav file to be present? Here's the error I'm getting:

ERROR:root:Error initializing main faster_whisper transcription model: Error opening 'female.wav': System error.
Traceback (most recent call last):
  File "/home/ubuntu/miniconda3/envs/whisper/lib/python3.9/site-packages/RealtimeTTS/engines/coqui_engine.py", line 135, in _synthesize_worker
    gpt_cond_latent, speaker_embedding = get_conditioning_latents(cloning_reference_wav)
  File "/home/ubuntu/miniconda3/envs/whisper/lib/python3.9/site-packages/RealtimeTTS/engines/coqui_engine.py", line 95, in get_conditioning_latents
    gpt_cond_latent, speaker_embedding = tts.get_conditioning_latents(audio_path=filename)
  File "/home/ubuntu/miniconda3/envs/whisper/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/ubuntu/miniconda3/envs/whisper/lib/python3.9/site-packages/TTS/tts/models/xtts.py", line 343, in get_conditioning_latents
    audio = load_audio(file_path, load_sr)
  File "/home/ubuntu/miniconda3/envs/whisper/lib/python3.9/site-packages/TTS/tts/models/xtts.py", line 73, in load_audio
    audio, lsr = torchaudio.load(audiopath)
  File "/home/ubuntu/miniconda3/envs/whisper/lib/python3.9/site-packages/torchaudio/_backend/utils.py", line 203, in load
    return backend.load(uri, frame_offset, num_frames, normalize, channels_first, format, buffer_size)
  File "/home/ubuntu/miniconda3/envs/whisper/lib/python3.9/site-packages/torchaudio/_backend/soundfile.py", line 26, in load
    return soundfile_backend.load(uri, frame_offset, num_frames, normalize, channels_first, format)
  File "/home/ubuntu/miniconda3/envs/whisper/lib/python3.9/site-packages/torchaudio/_backend/soundfile_backend.py", line 221, in load
    with soundfile.SoundFile(filepath, "r") as file_:
  File "/home/ubuntu/miniconda3/envs/whisper/lib/python3.9/site-packages/soundfile.py", line 658, in __init__
    self._file = self._open(file, mode_int, closefd)
  File "/home/ubuntu/miniconda3/envs/whisper/lib/python3.9/site-packages/soundfile.py", line 1216, in _open
    raise LibsndfileError(err, prefix="Error opening {0!r}: ".format(self.name))
soundfile.LibsndfileError: Error opening 'female.wav': System error.
Process Process-1:

TypeError: issubclass() arg 1 must be a class

I have run this

from RealtimeTTS import TextToAudioStream, SystemEngine, AzureEngine, ElevenlabsEngine

engine = SystemEngine() # replace with your TTS engine
stream = TextToAudioStream(engine)
stream.feed("Hello world! How are you today?")
stream.play_async()

Error

Traceback (most recent call last):
File ".\tts-realtime.py", line 8, in <module>
import RealtimeTTS
File "C:\Users\12345\AppData\Local\Programs\Python\Python38\lib\site-packages\RealtimeTTS\__init__.py", line 1, in <module>
from .text_to_stream import TextToAudioStream
File "C:\Users\12345\AppData\Local\Programs\Python\Python38\lib\site-packages\RealtimeTTS\text_to_stream.py", line 1, in <module>
from .engines import BaseEngine
File "C:\Users\12345\AppData\Local\Programs\Python\Python38\lib\site-packages\RealtimeTTS\engines\__init__.py", line 4, in <module>
from .elevenlabs_engine import ElevenlabsEngine
File "C:\Users\12345\AppData\Local\Programs\Python\Python38\lib\site-packages\RealtimeTTS\engines\elevenlabs_engine.py", line 2, in <module>
from elevenlabs import voices, generate, stream
File "C:\Users\12345\AppData\Local\Programs\Python\Python38\lib\site-packages\elevenlabs\__init__.py", line 3, in <module>
from .types import (
File "C:\Users\12345\AppData\Local\Programs\Python\Python38\lib\site-packages\elevenlabs\types\__init__.py", line 4, in <module>
from .add_project_response_model import AddProjectResponseModel
File "C:\Users\12345\AppData\Local\Programs\Python\Python38\lib\site-packages\elevenlabs\types\add_project_response_model.py", line 7, in <module>
from .project_response import ProjectResponse
File "C:\Users\12345\AppData\Local\Programs\Python\Python38\lib\site-packages\elevenlabs\types\project_response.py", line 15, in <module>
class ProjectResponse(pydantic.BaseModel):
File "pydantic\main.py", line 205, in pydantic.main.ModelMetaclass.__new__
File "pydantic\fields.py", line 491, in pydantic.fields.ModelField.infer
File "pydantic\fields.py", line 421, in pydantic.fields.ModelField.__init__
File "pydantic\fields.py", line 537, in pydantic.fields.ModelField.prepare
File "pydantic\fields.py", line 636, in pydantic.fields.ModelField._type_analysis
File "pydantic\fields.py", line 781, in pydantic.fields.ModelField._create_sub_type
File "pydantic\fields.py", line 421, in pydantic.fields.ModelField.__init__
File "pydantic\fields.py", line 537, in pydantic.fields.ModelField.prepare
File "pydantic\fields.py", line 641, in pydantic.fields.ModelField._type_analysis
File "C:\Users\12345\AppData\Local\Programs\Python\Python38\lib\typing.py", line 774, in __subclasscheck__
return issubclass(cls, self.__origin__)
TypeError: issubclass() arg 1 must be a class

Dependencies missing after "pip install RealtimeTTS" (Windows 11, VSCode, py_10.11)

Dependencies missing after "pip install RealtimeTTS"

(Windows 11, VSCode, py_10.11) almost fresh VSCode (only torch, numpy).
Resolved manually as described below:

Simple test code:

from RealtimeTTS import TextToAudioStream, SystemEngine
TextToAudioStream(SystemEngine()).feed(dummy_generator()).play()

Got an error:

(.venv) C:\dev_free\w1>python th_cuda.py
C:\dev_free\w1.venv\lib\site-packages\pydub\utils.py:170: RuntimeWarning: Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work
warn("Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work", RuntimeWarning)
...
C:\dev_free\w1.venv\lib\site-packages\pydub\utils.py:198: RuntimeWarning: Couldn't find ffprobe or avprobe - defaulting to ffprobe, but may not work
warn("Couldn't find ffprobe or avprobe - defaulting to ffprobe, but may not work", RuntimeWarning)
WARNING:root:engine system failed to synthesize sentence "This is a sentence." with error: [WinError 2] The system cannot find the file specified
Traceback: Traceback (most recent call last):
File "C:\dev_free\w1.venv\lib\site-packages\RealtimeTTS\text_to_stream.py", line 279, in synthesize_worker
success = self.engine.synthesize(sentence)

--------------------------------------------

installed the ffmpeg/ffprobe

pip install ffmpeg
pip install ffprobe

did not help

google gave:

https://stackoverflow.com/questions/74651215/couldnt-find-ffmpeg-or-avconv-python

ffmpeg-downloader package:

pip install ffmpeg-downloader

did not help

ffdl install --add-path

did not help

#--> restart VSCode (as after all previous steps)

now it worked!

Use coqui engine play_async Invalid output device error

stream = TextToAudioStream(engine, log_characters=True).feed(translation_stream)
stream.play_async(tokenizer="stanza",language="zh",on_audio_chunk=on_audio_chunk_callback,muted=True)

Thanks for upgrading RealtimeTTS to v0.3.42, but when using the Coqui engine with play_async in a Linux Ubuntu Server environment, I cannot get the callback data.

error in play() with engine coqui: [Errno -9996] Invalid output device (no default output device)
Traceback: Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/RealtimeTTS/text_to_stream.py", line 254, in play
self.player.start()
File "/usr/local/lib/python3.10/dist-packages/RealtimeTTS/stream_player.py", line 269, in start
self.audio_stream.open_stream()
File "/usr/local/lib/python3.10/dist-packages/RealtimeTTS/stream_player.py", line 68, in open_stream
self.stream = self.pyaudio_instance.open(
File "/usr/local/lib/python3.10/dist-packages/pyaudio/init.py", line 639, in open
stream = PyAudio.Stream(self, *args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/pyaudio/init.py", line 441, in init
self._stream = pa.open(**arguments)
OSError: [Errno -9996] Invalid output device (no default output device)


Yield output as numpy array

Hi! I’m interesting is there any way to get audio stream (chunks while generated) output as numpy array?

Possible invalid elevenlabs version being used

I believe the elevenlabs dependency version needs freezing, or the import should be refactored so it only happens when their engine is used:

I'm getting the following error on Python 3.12:

Traceback (most recent call last):
  File "tts.py", line 1, in <module>
    from RealtimeTTS import CoquiEngine, TextToAudioStream
  File ".venv/lib/python3.12/site-packages/RealtimeTTS/__init__.py", line 1, in <module>
    from .text_to_stream import TextToAudioStream
  File ".venv/lib/python3.12/site-packages/RealtimeTTS/text_to_stream.py", line 1, in <module>
    from .engines import BaseEngine
  File ".venv/lib/python3.12/site-packages/RealtimeTTS/engines/__init__.py", line 4, in <module>
    from .elevenlabs_engine import ElevenlabsEngine
  File ".venv/lib/python3.12/site-packages/RealtimeTTS/engines/elevenlabs_engine.py", line 2, in <module>
    from elevenlabs import voices, generate, stream

ImportError: cannot import name 'generate' from 'elevenlabs' (venv path here)

Install error on Windows - Microsoft Visual C++ 14.0 or greater is required.

Hello!
This seems like a very cool project! :) I'm trying to test it out for this little game thing I'm doing, but I get an error message when trying to install on Windows 11 (from inside a PyCharm venv).

error: Microsoft Visual C++ 14.0 or greater is required. Get it with "Microsoft C++ Build Tools": https://visualstudio.microsoft.com/visual-cpp-build-tools/

Does this seem correct to you?
Would I really need to install MSVC for python RealtimeTTS ?

Thank you for making your work public!

Cheers!
Fred

error in play() with engine system: [Errno -9996] Invalid output device (no default output device)

from RealtimeTTS import TextToAudioStream, SystemEngine, AzureEngine, ElevenlabsEngine

engine = SystemEngine() # replace with your TTS engine

import ipdb;ipdb.set_trace()

stream = TextToAudioStream(engine)
stream.feed("Hello world! How are you today?")
print(f"type:{type(stream)}")
print(f"stream:{stream}")

stream.play_async()

stream.play()

error in play() with engine system: [Errno -9996] Invalid output device (no default output device)

How can such problems be solved, and what causes them?

wav file cloning seems to not work

Hello,

Since v0.3.0, cloning from a wav file seems to not work.
When it's a json file, it finds it and uses the voice accordingly, but when it's a wav file it falls back to the coqui_default_voice.

Thank you very much for your work.

failed install

Install fails at pyaudio.
The following are various errors from subsequent attempts to resolve the issue:
ERROR: Could not build wheels for PyAudio, which is required to install pyproject.toml-based projects
Building wheel for PyAudio (pyproject.toml) ... error
error: subprocess-exited-with-error

× Building wheel for PyAudio (pyproject.toml) did not run successfully.
│ exit code: 1
╰─> [22 lines of output]
running bdist_wheel
running build
running build_py
creating build
creating build/lib.macosx-12.6-x86_64-cpython-39
creating build/lib.macosx-12.6-x86_64-cpython-39/pyaudio
copying src/pyaudio/init.py -> build/lib.macosx-12.6-x86_64-cpython-39/pyaudio
running build_ext
building 'pyaudio._portaudio' extension
creating build/temp.macosx-12.6-x86_64-cpython-39
creating build/temp.macosx-12.6-x86_64-cpython-39/src
creating build/temp.macosx-12.6-x86_64-cpython-39/src/pyaudio
clang -Wno-unused-result -Wsign-compare -Wunreachable-code -DNDEBUG -g -fwrapv -O3 -Wall -DOPENSSL_NO_SSL3 -DMACOS=1 -I/usr/local/include -I/usr/include -I/opt/homebrew/include -I/Users/karibu/myenv/include -I/Users/.pyenv/versions/3.9.18/include/python3.9 -c src/pyaudio/device_api.c -o build/temp.macosx-12.6-x86_64-cpython-39/src/pyaudio/device_api.o
clang -Wno-unused-result -Wsign-compare -Wunreachable-code -DNDEBUG -g -fwrapv -O3 -Wall -DOPENSSL_NO_SSL3 -DMACOS=1 -I/usr/local/include -I/usr/include -I/opt/homebrew/include -I/Users/karibu/myenv/include -I/Users/.pyenv/versions/3.9.18/include/python3.9 -c src/pyaudio/host_api.c -o build/temp.macosx-12.6-x86_64-cpython-39/src/pyaudio/host_api.o
clang -Wno-unused-result -Wsign-compare -Wunreachable-code -DNDEBUG -g -fwrapv -O3 -Wall -DOPENSSL_NO_SSL3 -DMACOS=1 -I/usr/local/include -I/usr/include -I/opt/homebrew/include -I/Users/karibu/myenv/include -I/Users/.pyenv/versions/3.9.18/include/python3.9 -c src/pyaudio/init.c -o build/temp.macosx-12.6-x86_64-cpython-39/src/pyaudio/init.o
clang -Wno-unused-result -Wsign-compare -Wunreachable-code -DNDEBUG -g -fwrapv -O3 -Wall -DOPENSSL_NO_SSL3 -DMACOS=1 -I/usr/local/include -I/usr/include -I/opt/homebrew/include -I/Users/karibu/myenv/include -I/Users/.pyenv/versions/3.9.18/include/python3.9 -c src/pyaudio/mac_core_stream_info.c -o build/temp.macosx-12.6-x86_64-cpython-39/src/pyaudio/mac_core_stream_info.o
In file included from src/pyaudio/mac_core_stream_info.c:3:
src/pyaudio/mac_core_stream_info.h:13:10: fatal error: 'pa_mac_core.h' file not found
#include "pa_mac_core.h"
^~~~~~~~~~~~~~~
1 error generated.
error: command '/usr/bin/clang' failed with exit code 1
[end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for PyAudio
Building wheel for TTS (pyproject.toml) ... done
Created wheel for TTS: filename=TTS-0.22.0-cp39-cp39-macosx_12_0_x86_64.whl size=903439 sha256=ea358468699ac39beab3575f19324aa69622c73a77c6e429933673579e3aee0d
Stored in directory: /Users/karibu/Library/Caches/pip/wheels/e9/94/e7/52e526c3ef9c07ac0b67a7dce87f81b6fb83ffd2d1754224e3
Successfully built TTS
Failed to build PyAudio
ERROR: Could not build wheels for PyAudio, which is required to install pyproject.toml-based projects

wave.Error: file does not start with RIFF id

I tested tests/chinese_test.py, but there is an error. Does anyone know how to solve it?

Traceback: Traceback (most recent call last):
File "/Users/zhujunming/Desktop/AIQQ/tts/RealtimeTTS/RealtimeTTS/text_to_stream.py", line 265, in synthesize_worker
success = self.engine.synthesize(sentence)
File "/Users/zhujunming/Desktop/AIQQ/tts/RealtimeTTS/RealtimeTTS/engines/system_engine.py", line 72, in synthesize
with wave.open(self.file_path, 'rb') as wf:
File "/usr/local/Cellar/[email protected]/3.10.13_1/Frameworks/Python.framework/Versions/3.10/lib/python3.10/wave.py", line 509, in open
return Wave_read(f)
File "/usr/local/Cellar/[email protected]/3.10.13_1/Frameworks/Python.framework/Versions/3.10/lib/python3.10/wave.py", line 163, in init
self.initfp(f)
File "/usr/local/Cellar/[email protected]/3.10.13_1/Frameworks/Python.framework/Versions/3.10/lib/python3.10/wave.py", line 130, in initfp
raise Error('file does not start with RIFF id')
wave.Error: file does not start with RIFF id

Crash on Windows when using CoquiEngine

When I test RealtimeTTS with the code below:

from RealtimeTTS import TextToAudioStream, CoquiEngine

engine = CoquiEngine()  # replace with your TTS engine
stream = TextToAudioStream(engine)
stream.feed("Hello world! How are you today?")
stream.play_async()

it crashes and give the error:

   engine = CoquiEngine()  # replace with your TTS engine
  File "D:\Python\xxxx\venv\lib\site-packages\RealtimeTTS\engines\base_engine.py", line 11, in __call__
    instance = super().__call__(*args, **kwargs)
  File "D:\Python\xxxx\venv\lib\site-packages\RealtimeTTS\engines\coqui_engine.py", line 96, in __init__
    self.synthesize_process.start()
  File "C:\Users\xxxx\AppData\Local\Programs\Python\Python310\lib\multiprocessing\process.py", line 121, in start
    self._popen = self._Popen(self)
  File "C:\Users\xxxx\AppData\Local\Programs\Python\Python310\lib\multiprocessing\context.py", line 224, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "C:\Users\xxxx\AppData\Local\Programs\Python\Python310\lib\multiprocessing\context.py", line 336, in _Popen
    return Popen(process_obj)
  File "C:\Users\xxxx\AppData\Local\Programs\Python\Python310\lib\multiprocessing\popen_spawn_win32.py", line 45, in __init__
    prep_data = spawn.get_preparation_data(process_obj._name)
  File "C:\Users\xxxx\AppData\Local\Programs\Python\Python310\lib\multiprocessing\spawn.py", line 154, in get_preparation_data
    _check_not_importing_main()
  File "C:\Users\xxxx\AppData\Local\Programs\Python\Python310\lib\multiprocessing\spawn.py", line 134, in _check_not_importing_main
    raise RuntimeError('''
RuntimeError: 
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.


KeyError: 'audio'

I continue to get this error and have yet to figure out why. Any ideas?

Exception in thread Thread-2 (play):
Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/threading.py", line 1038, in _bootstrap_inner
self.run()
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/threading.py", line 975, in run
self._target(*self._args, **self._kwargs)
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/RealtimeTTS/text_to_stream.py", line 231, in play
self.engine.synthesize(self.char_iter)
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/RealtimeTTS/engines/elevenlabs_engine.py", line 121, in synthesize
self.stream(self.audio_stream)
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/RealtimeTTS/engines/elevenlabs_engine.py", line 177, in stream
for chunk in audio_stream:
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/elevenlabs/api/tts.py", line 134, in generate_stream_input
if data["audio"]:
KeyError: 'audio'

About a paper

Hello Author,
Really great job with your four speech-assistant-related projects, trying to bring the latency down as much as possible!

I was wondering, have you seen this paper? I think it claims the same; I still don't know which is faster, yours or theirs:
https://arxiv.org/abs/2309.11210

Thanks in advance!

Pass Var to coqui_engine.py from stream.()

I am trying to pass a var I can use inside the coqui_engine.py _synthesize_worker loop,
but I am not finding the right way to do it.

So we have our

stream.feed(content)

which is
in coqui_engine.py _synthesize_worker
text = data['text']

But I want to be able to pass an id like
tts_id_file = '95986845'
stream.tts_id(tts_id_file)

and then be able to get it inside the loop in coqui_engine.py _synthesize_worker.

I am using this instead of the output filename: since I am doing custom logic with the chunks, I want to be able to name each chunk batch with the unique id.

I know it's a bit unusual; I've been trying for a few hours with no luck ;/

Can we use on_audio_chunk callback as input data in realtime service?

play(self,
     fast_sentence_fragment: bool = True,
     buffer_threshold_seconds: float = 0.0,
     minimum_sentence_length: int = 10,
     minimum_first_fragment_length: int = 10,
     log_synthesized_text=False,
     reset_generated_text: bool = True,
     output_wavfile: str = None,
     on_sentence_synthesized=None,
     on_audio_chunk=None,
     tokenizer: str = "nltk",
     language: str = "en",
     context_size: int = 12,
     muted: bool = False,
     ):
I need to use the on_audio_chunk data as input data in a realtime service. Can I just use the callback data without having the system play the audio?
Thanks!
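
Based on the signature above, a hedged sketch that consumes chunks without local playback (send_to_client is a hypothetical placeholder for your transport):

from RealtimeTTS import TextToAudioStream, SystemEngine

def handle_audio_chunk(chunk):
    send_to_client(chunk)  # hypothetical: forward the raw chunk to your realtime service

stream = TextToAudioStream(SystemEngine())
stream.feed("Streaming chunks to a service.")
stream.play(on_audio_chunk=handle_audio_chunk, muted=True)  # muted=True skips local playback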

"tests/write_to_file.py" does not seem producing any wav files

Example "tests/write_to_file.py" not producing any files.
I've tried for SystemEngine() and for CoquiEngine()
Used file name, relative and full file name.
Same result as .play():
.play(file_name) outputs to speakers and no "system_output.wav" anywhere on C:\ drive

  stream.load_engine(system_engine)
  stream.feed(dummy_generator())
  # works as a .play() without parameters->output to speakers
  stream.play(output_wavfile=stream.engine.engine_name + "_output.wav") 

in my case it looks like:

def speakSys():
  text_gen   = dummy_generator()
  sys_engine = SystemEngine()
  stream = TextToAudioStream(sys_engine)
  
  stream.feed(text_gen)

  # last attempt to put it into the working folder since just file name or ".\\" did not work  
  output_wavfile = "C:\\dev_free\\w1\\" + stream.engine.engine_name + "_output.wav" 
 
  print (f"Writing to {output_wavfile} ...")
  stream.play(output_wavfile)
