How do I select speakers at inference, and how do I show the list of speakers?

Multi-Speaker Inference (digitalphonetics/ims-toucan)

tomschelsen commented on August 20, 2024

It is exposed through the voice_seed argument of the read method of the ControllableInterface class; see:

IMS-Toucan/InferenceInterfaces/ControllableInterface.py

Line 44 in 1c581e0

voice_seed,

See https://github.com/DigitalPhonetics/IMS-Toucan/blob/ControllableMultilingual/run_gradio_demo.py for an example usage of that class.


kin0303 commented on August 20, 2024

Can you give me an example?


kin0303 commented on August 20, 2024

I trained on 2 languages, each with 2 speakers: male and female. How can I choose which speaker to use during inference? As far as I understand, the only way to change the speaker is to use a reference audio.


Flux9665 commented on August 20, 2024

There is no fixed list of speakers; in theory, infinitely many speakers are possible. To change the voice, you first create an inference interface using a multispeaker model:

IMS-Toucan/InferenceInterfaces/FastSpeech2Interface.py

Line 23 in 1c581e0

class InferenceFastSpeech2(torch.nn.Module):

and then call the set_utterance_embedding method on the interface:

IMS-Toucan/InferenceInterfaces/FastSpeech2Interface.py

Line 93 in 1c581e0

def set_utterance_embedding(self, path_to_reference_audio="", embedding=None):

As its argument, it takes the file path of a reference audio, which it loads and extracts an embedding from. This embedding is then used as the conditioning signal during inference. To switch between the two speakers in your data, just call set_utterance_embedding and pass in a sample from the dataset spoken by the speaker you want.
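Since there is no built-in speaker list, one way to get something like the "list of speakers" asked about is to keep your own mapping from speaker labels to reference recordings. The following is only a minimal sketch and not part of IMS-Toucan: the labels, the file paths, and the helpers `list_speakers` and `select_speaker` are hypothetical, and `FakeInterface` is a stand-in that copies the `set_utterance_embedding` signature quoted above so the snippet runs without the repository. With the real toolkit you would presumably pass an `InferenceFastSpeech2` instance instead.

```python
# Hypothetical mapping from speaker labels to one reference recording each.
# Labels and paths are placeholders; use samples from your own dataset.
SPEAKER_REFERENCES = {
    "lang1_female": "data/lang1/female/sample_0001.wav",
    "lang1_male": "data/lang1/male/sample_0001.wav",
    "lang2_female": "data/lang2/female/sample_0001.wav",
    "lang2_male": "data/lang2/male/sample_0001.wav",
}


def list_speakers(references=SPEAKER_REFERENCES):
    """Return the speaker labels we have registered ourselves."""
    return sorted(references)


def select_speaker(tts, name, references=SPEAKER_REFERENCES):
    """Condition `tts` on the reference audio registered under `name`."""
    if name not in references:
        raise KeyError(f"Unknown speaker {name!r}; choose from {list_speakers(references)}")
    tts.set_utterance_embedding(path_to_reference_audio=references[name])
    return references[name]


class FakeInterface:
    """Stand-in for the real interface (assumption, NOT the IMS-Toucan class).

    It only records the path it was given instead of extracting an embedding.
    """

    def __init__(self):
        self.reference = None

    def set_utterance_embedding(self, path_to_reference_audio="", embedding=None):
        self.reference = path_to_reference_audio


tts = FakeInterface()
select_speaker(tts, "lang1_male")
print(list_speakers())
print(tts.reference)
```

Keeping the mapping outside the model reflects the point made above: the model itself has no notion of named speakers, only of embeddings derived from whatever audio it is given.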


kin0303 commented on August 20, 2024

Thank you for your reply. How do I set the language on this line if I use a multilingual model?

IMS-Toucan/InferenceInterfaces/FastSpeech2Interface.py

Line 25 in 1c581e0

    
def __init__(self, device="cpu", model_name="Meta", language="en", use_enhancement=False):


Flux9665 commented on August 20, 2024

You can either set the language when you create the inference interface object or change it later with the set_language method of the inference object:

IMS-Toucan/InferenceInterfaces/FastSpeech2Interface.py

Line 108 in 1c581e0

def set_language(self, lang_id):
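As a rough illustration of the call pattern, the sketch below switches language and voice on one interface object. It rests on assumptions: `FakeInterface` is a stand-in that only mirrors the `set_language` and `set_utterance_embedding` signatures quoted in this thread, `configure_voice` is a hypothetical helper, and the language code `"de"` and the file path are placeholders. The order of the two calls is an arbitrary choice here.

```python
class FakeInterface:
    """Stand-in for the real interface (assumption, NOT the IMS-Toucan class)."""

    def __init__(self, language="en"):
        # Mirrors the idea of setting the language at construction time.
        self.language = language
        self.reference = None

    def set_language(self, lang_id):
        self.language = lang_id

    def set_utterance_embedding(self, path_to_reference_audio="", embedding=None):
        self.reference = path_to_reference_audio


def configure_voice(tts, lang_id, reference_audio):
    """Hypothetical helper: change the language, then the speaker conditioning."""
    tts.set_language(lang_id)
    tts.set_utterance_embedding(path_to_reference_audio=reference_audio)


tts = FakeInterface(language="en")  # language chosen when the object is created
configure_voice(tts, "de", "data/de/female/sample_0001.wav")  # changed later
print(tts.language, tts.reference)
```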

