Comments (7)

primepake commented on July 30, 2024

what is your language?

from styletts2.

yl4579 commented on July 30, 2024

It depends on how much speech data you have. If you only have 10-20 hours of data, like LJSpeech, then you need millions of sentences to pre-train PL-BERT for good performance. Otherwise, if you have thousands of hours of speech data, you probably don't even need a pre-trained PL-BERT to begin with.

JohnHerry commented on July 30, 2024

Thanks, Is there any PL

> what is your language?

For Mandarin and English mixed TTS

yl4579 commented on July 30, 2024

@JohnHerry What do you mean by mixed? Like a sentence contains both English and Mandarin, or just something trained on the mixture of English and Mandarin datasets?

JohnHerry commented on July 30, 2024

> @JohnHerry What do you mean by mixed? Like a sentence contains both English and Mandarin, or just something trained on the mixture of English and Mandarin datasets?

@yl4579 Sorry for the unclear wording. I mean a single sentence that contains both English and Mandarin, e.g. "我们来讨论一下这个 paper idea". The PLM should be based on Mandarin phonemes but also support English phonemes. I opened an issue in the phonemizer project and learned that its "language_switch" feature is limited and that the output may contain errors when the input text is bilingual.
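
One possible workaround for the bilingual case is to split the sentence by script before phonemization, so that each segment can go to a single-language backend. The following is a minimal sketch under stated assumptions, not something confirmed in this thread: the helper name `split_by_script` is hypothetical, CJK Unified Ideographs are treated as Mandarin, ASCII letters are treated as English, and punctuation and digits are simply dropped.

```python
import re

# Assumption: characters in U+4E00..U+9FFF are Mandarin, ASCII letters are English.
# An English run may contain inner spaces and apostrophes but starts and ends on a letter.
_SEGMENT_RE = re.compile(
    r"(?P<zh>[\u4e00-\u9fff]+)"
    r"|(?P<en>[A-Za-z][A-Za-z' ]*[A-Za-z]|[A-Za-z])"
)


def split_by_script(text):
    """Return (language_tag, segment) pairs in reading order.

    Hypothetical helper: anything that is neither a CJK ideograph nor part of
    an English run (punctuation, digits, stray spaces) is skipped.
    """
    segments = []
    for match in _SEGMENT_RE.finditer(text):
        # lastgroup is the name of the alternative that matched: "zh" or "en".
        segments.append((match.lastgroup, match.group()))
    return segments


segments = split_by_script("我们来讨论一下这个 paper idea")
for lang, segment in segments:
    print(lang, segment)
```

Each `(lang, segment)` pair could then be passed to a per-language phonemizer call. Whether the concatenated output is good enough for PLM training is not something this sketch establishes.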

My other concern is the alphabet of the bilingual phonemes. As you said in StyleTTS issue #10, you tried mapping Mandarin Pinyin to IPA characters, but it did not work. I guess this may be because the alphabet in your Pinyin-to-IPA mapping is not the same as the alphabet used by https://github.com/bootphon/phonemizer. If the PLM takes phonemizer output as input, which means they share the same phoneme alphabet, then your Pinyin-to-IPA mapping will not work. I could not find a way to get an IPA alphabet from the phonemizer project that supports cross-lingual phoneme transcription. Do you have any idea how to get such an alphabet?
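
One way to check whether two phoneme sources share an alphabet is to build a symbol table from one source and list the symbols of the other that it does not cover. The sketch below assumes one symbol per Unicode code point, which ignores multi-character phonemes and combining marks. `build_symbol_table` and `unknown_symbols` are hypothetical helper names, and the two sample strings are the IPA transcriptions of 了解 quoted later in this thread.

```python
def build_symbol_table(phoneme_strings, pad="_"):
    """Map every distinct character in the given strings to an integer id.

    Hypothetical helper: id 0 is reserved for the padding symbol.
    """
    table = {pad: 0}
    for symbol in sorted({ch for s in phoneme_strings for ch in s}):
        if symbol not in table:
            table[symbol] = len(table)
    return table


def unknown_symbols(phoneme_string, table):
    """Return the sorted characters of phoneme_string missing from the table."""
    return sorted({ch for ch in phoneme_string if ch not in table})


mandarin = "lˈjaʊtɕˈjɛ"
japanese = "ɽˈʲokˈai"

# Build the table from one transcription and test the other against it.
table = build_symbol_table([mandarin])
missing = unknown_symbols(japanese, table)
print(missing)
```

A non-empty result means the model's symbol table would need to be extended (or the transcription remapped) before the second source could be used.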

yl4579 commented on July 30, 2024

@JohnHerry Actually, I have trained a multilingual model in English, Mandarin and Japanese, and it works quite well with all the settings I provided in yl4579/StyleTTS#10. The PL-BERT was also pre-trained on the Wikipedia corpora of these 3 languages, where I used word-level tokenizers for all languages and merged shared graphemes for Japanese and Chinese (for example, both /ɽˈʲokˈai/ and /lˈjaʊtɕˈjɛ/ correspond to the grapheme 了解). It is compatible with phonemizer, since the English IPA was generated by it.
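
The merged-grapheme idea can be illustrated with a small sketch: one word-level token per grapheme, with per-language phoneme strings attached to it. This is only an illustration, not the actual PL-BERT preprocessing. `build_word_vocab` is a hypothetical helper, and the language tags (`ja` for /ɽˈʲokˈai/, `zh` for /lˈjaʊtɕˈjɛ/) are an assumed reading of the example above.

```python
from collections import defaultdict


def build_word_vocab(entries):
    """Build a word-level vocabulary in which shared graphemes share one id.

    entries: iterable of (language, grapheme, phonemes) tuples.
    Returns (token_ids, pronunciations), where token_ids maps a grapheme to
    its id and pronunciations maps an id to {language: phonemes}.
    """
    token_ids = {}
    pronunciations = defaultdict(dict)
    for language, grapheme, phonemes in entries:
        if grapheme not in token_ids:
            # First time this grapheme is seen in any language: new token id.
            token_ids[grapheme] = len(token_ids)
        pronunciations[token_ids[grapheme]][language] = phonemes
    return token_ids, dict(pronunciations)


entries = [
    ("ja", "了解", "ɽˈʲokˈai"),    # assumed Japanese reading
    ("zh", "了解", "lˈjaʊtɕˈjɛ"),  # assumed Mandarin reading
]
token_ids, pronunciations = build_word_vocab(entries)
print(token_ids)
print(pronunciations)
```

With this layout the word-level prediction target is the same token for both languages, while the phoneme input differs per language.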

As for your question regarding mixed language, I'm not sure how to get training data for this. The PL-BERT I pre-trained was trained on a mixture of per-language datasets, not on sentences that mix languages. However, if your speech data contains a lot of samples of this type, it would probably work with this pre-trained PL-BERT as well.

JohnHerry commented on July 30, 2024

Thanks for the information, it is helpful.
