olaviinha / neuraltexttoaudio

Text prompt steered synthetic audio generators

Jupyter Notebook 100.00%

text2music text2audio audio-generation audio-synthesis audioldm music-generation voice-synthesis mubert mubertai voice-cloning

neuraltexttoaudio's Introduction

Colab notebooks for text-to-audio generators

User-friendly Colab notebooks for various text prompt steered synthetic audio generators.

Available notebooks:

AudioLDM – text-to-audio
TorToiSe TTS – text-to-speech w/ voice-cloning
MubertAI Text-to-Music – text-to-music
TTS Voice Cloning – text-to-speech w/ voice-cloning

AudioLDM: Text-to-Audio Generation with Latent Diffusion Models

Paper: Text-to-Audio Generation with Latent Diffusion Models

Colab for AudioLDM. Generates audio from a text description. This is probably the beginning of a "Stable Diffusion of audio". Currently produces 16 kHz audio only.

TorToiSe: Text-to-speech

Paper: TorToiSe - Spending Compute for High Quality TTS

Colab for TorToiSe text-to-speech voice-cloning. This notebook takes a text string and an audio file (or files) of a speaker's voice, and attempts to synthesize the text using the given voice. Currently works with English text only.

MubertAI Text-to-Music

UPDATE: it seems the Mubert API now requires a (paid) API key.

Colab for MubertAI Text-to-Music. Generates music from a text description using predefined blocks that are, as far as I know, created by the community. See the source repository for details such as licensing.

TTS Voice Cloning

Paper: Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis

Colab for Real-Time-Voice-Cloning text-to-speech voice-cloning. This notebook takes a text string and an audio file of a speaker's voice, and attempts to synthesize the text using the given voice. Fair warning: results are not great.

neuraltexttoaudio's People

Contributors

Stargazers

Watchers

Forkers

zekelhealthcare tankvtt blocksats wumuyu9 deyh2020

neuraltexttoaudio's Issues

Adding support to seamlessm4t model

It would be great to have support for SeamlessM4T, so that any language can be used for TTS.

Issue running the TorToiSe TTS Colab

I'm getting the following error while trying to run the generation:

IndexError Traceback (most recent call last)
in <cell line: 64>()
88 bytes_collected = 0
89 for voice_file in voice_files:
---> 90 voice_file = remove_silence(voice_file, window_size=2, threshold=0.1, save_as=dir_tmp_processed+path_leaf(voice_file))
91 file_duration = get_audio_duration(voice_file)
92 slice_file = dir_tmp_slices+path_leaf(voice_file)

2 frames
in clip_audio(audio_data, start, duration, sr)
94 xstart = librosa.time_to_samples(start, sr=sr)
95 xduration = librosa.time_to_samples(start+duration, sr=sr)
---> 96 audio_data = audio_data[:, xstart:xduration]
97 return audio_data
98

IndexError: too many indices for array: array is 1-dimensional, but 2 were indexed
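
The traceback suggests that `clip_audio` expects a two-dimensional `(channels, samples)` array, while the voice file was loaded as a one-dimensional mono array. One possible workaround is to slice along the last axis only, which works for both shapes. The following is a minimal sketch, not the notebook's actual fix: it reuses the `clip_audio` signature from the traceback, replaces `librosa.time_to_samples` with plain arithmetic to stay self-contained, and the default `sr` value is an assumption.

```python
import numpy as np

def clip_audio(audio_data, start, duration, sr=22050):
    """Return the samples between start and start + duration (in seconds).

    Accepts mono arrays of shape (samples,) and multi-channel arrays of
    shape (channels, samples). The default sr is an assumed value.
    """
    audio_data = np.asarray(audio_data)
    xstart = int(start * sr)
    xend = int((start + duration) * sr)
    # The ellipsis keeps any leading axes and slices only the time axis.
    return audio_data[..., xstart:xend]

mono = np.zeros(22050 * 3)
stereo = np.zeros((2, 22050 * 3))
print(clip_audio(mono, 1.0, 1.0).shape)    # (22050,)
print(clip_audio(stereo, 1.0, 1.0).shape)  # (2, 22050)
```

Slicing with `...` avoids branching on `ndim`, so the same line handles files loaded as mono or with separate channels.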

AudioLDM colab notebook broken

During setup

/root/.cache/audiol 100%[===================>]   2.38G  19.3MB/s    in 2m 9s   

2023-05-30 22:35:33 (19.0 MB/s) - ‘/root/.cache/audioldm/audioldm-full-s-v2.ckpt’ saved [2559017383/2559017383]

---------------------------------------------------------------------------
SameFileError                             Traceback (most recent call last)
<ipython-input-1-37733fc54a37> in <cell line: 200>()
    219   op(c.warn, 'Downloading', use_ckpt)
    220   get_ipython().system('wget {ckpt_url} -O {models_dir}{use_ckpt}')
--> 221   shutil.copy(models_dir+use_ckpt, use_ckpt_path+use_ckpt)
    222   op(c.ok, 'Done.')
    223 

1 frames
/usr/lib/python3.10/shutil.py in copy(src, dst, follow_symlinks)
    415     if os.path.isdir(dst):
    416         dst = os.path.join(dst, os.path.basename(src))
--> 417     copyfile(src, dst, follow_symlinks=follow_symlinks)
    418     copymode(src, dst, follow_symlinks=follow_symlinks)
    419     return dst

/usr/lib/python3.10/shutil.py in copyfile(src, dst, follow_symlinks)
    232 
    233     if _samefile(src, dst):
--> 234         raise SameFileError("{!r} and {!r} are the same file".format(src, dst))
    235 
    236     file_size = 0

SameFileError: '/root/.cache/audioldm/audioldm-full-s-v2.ckpt' and '/root/.cache/audioldm/audioldm-full-s-v2.ckpt' are the same file
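
The error message shows that the source and destination resolve to the same path, so `models_dir` and `use_ckpt_path` apparently point to the same directory in this run and `shutil.copy` refuses to copy the file onto itself. A possible guard is sketched below. The helper name `copy_if_different` is hypothetical and not part of the notebook; the temporary file only stands in for the checkpoint.

```python
import os
import shutil
import tempfile

def copy_if_different(src, dst):
    """Copy src to dst unless both paths already refer to the same file.

    Hypothetical helper for illustration, not part of the notebook.
    """
    if os.path.isdir(dst):
        dst = os.path.join(dst, os.path.basename(src))
    if os.path.exists(dst) and os.path.samefile(src, dst):
        return dst  # the file is already in place, nothing to copy
    return shutil.copy(src, dst)

# Usage with a throwaway file standing in for the checkpoint.
with tempfile.TemporaryDirectory() as tmp:
    ckpt = os.path.join(tmp, "model.ckpt")
    with open(ckpt, "wb") as f:
        f.write(b"weights")
    same = copy_if_different(ckpt, ckpt)  # no SameFileError
    other = copy_if_different(ckpt, os.path.join(tmp, "copy.ckpt"))
    print(same == ckpt, os.path.exists(other))  # True True
```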

When attempting to generate audio

NameError                                 Traceback (most recent call last)
<ipython-input-7-28de7531aad6> in <cell line: 77>()
    144     else:
    145       file_out = fo_head+slug(prompt)[:trunc]+'.wav'
--> 146     generated_audio = text2audio(prompt, duration, None, guidance_scale, seed, candidates, ddim_steps)
    147   elif action == 'audio2audio':
    148     file_out = fo_head+basename(init_path)+'.wav'

NameError: name 'text2audio' is not defined
