
Comments (3)

yl4579 commented on July 30, 2024

Yes, you can leave the multispeaker setting set to true. I used the same inference code as in the Colab notebook: https://colab.research.google.com/github/yl4579/StyleTTS2/blob/main/Colab/StyleTTS2_Finetune_Demo.ipynb

I haven't really tested different max_len values, but try to increase it as much as you can while keeping the batch size at least 2, and also run the SLM adversarial training if you can (it is very memory-intensive, though). I know the code is currently not very friendly to low-memory GPUs because of the DP implementation. You can wait for a fixed DDP implementation for mixed-precision training.
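To make the trade-off concrete, here is a rough sketch for listing (batch_size, max_len) pairs to try. It is not part of the StyleTTS2 code: the helper name `candidate_settings` is hypothetical, and it assumes memory use grows roughly with batch_size * max_len, which is only a heuristic and has not been measured on this model. The budget is taken from a setting that is already known to fit (batch_size 4, max_len 100).

```python
def candidate_settings(frame_budget, min_batch_size=2, max_len_cap=400, step=50):
    """List (batch_size, max_len) pairs with batch_size * max_len <= frame_budget.

    Assumption (not measured): memory scales roughly with batch_size * max_len.
    Pairs are ordered from the largest max_len to the smallest.
    """
    candidates = []
    for max_len in range(max_len_cap, 0, -step):
        batch_size = frame_budget // max_len
        # Keep the batch size at least 2, as suggested in the comment above.
        if batch_size >= min_batch_size:
            candidates.append((batch_size, max_len))
    return candidates


# Budget from a run that fit in memory: batch_size=4, max_len=100.
budget = 4 * 100
for batch_size, max_len in candidate_settings(budget):
    print(f"batch_size={batch_size}, max_len={max_len}")
```

The first pair in the list has the largest max_len that still allows a batch size of 2 under the assumed budget. Whether it actually fits has to be checked by running the training script.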

from styletts2.

yl4579 commented on July 30, 2024

For 4, did you change multispeaker to true or false? The default is true, and the default settings do produce better results than what you got. The only other difference I can see is batch_size (from 16 to 4), but that shouldn't make this big a difference. max_len from 400 to 100 is probably the cause. This is what I got by finetuning with one hour of data: https://voca.ro/1aC4vr4jErDL using the default setting.


danielmsu commented on July 30, 2024

For 4, did you change multispeaker to true or false?

I fine-tuned the model with multispeaker: true and then tried inference with both true and false. It definitely works better with true; the example I attached was also generated with multispeaker: true. I didn't try fine-tuning with false, but I guess a model fine-tuned with true in the config should produce better results anyway. Is that correct?

max_len from 400 to 100 is probably the cause

Do you know what the minimum value is for decent results? Unfortunately, I cannot use 400, but maybe I could set it a bit higher than 100 if I reduce batch_size even more. Training speed is not a concern for me.

This is what I got by finetuning with one hour of data: https://voca.ro/1aC4vr4jErDL using the default setting.

Yes, that sounds much better. Could you please share the inference parameters? It would be awesome if you still have the alpha/beta values and the name of the reference clip, so I can compare my results using the same values.

Thanks!

