
Comments (5)

albertaparicio commented on June 5, 2024

Dear Hudson,

Sorry for the delay in my response; I have been busy working on the seq2seq model.

Regarding the Zaska and dtw scripts, they belong to the Signal Theory and Communications department at the UPC university in Catalonia (this project is being developed for my bachelor thesis).

I have contacted the people at the department to ask whether I can distribute these scripts. I'll get back to you as soon as I have a response.

Regarding the sound produced by the system, I am aware that the results are not very accurate. The scripts you mention belong to the 'baseline' of the system. This version was developed only to provide a reference level for result quality, as we have been focusing our efforts (and still are) on the sequence-to-sequence model.

If you find a way to improve this baseline, that would be great news, but we are not going to work on it anymore.

As always, thank you for your interest in this project.

Cheers!

from tfg-voice-conversion.

HudsonHuang commented on June 5, 2024

Dear Albert,

Thank you so much for your response. The seq2seq model is definitely a good idea.

As a reference, you can also check out this company: https://lyrebird.ai/. They are trying to offer an API-level voice conversion solution for commercial purposes, and they seem to have a good team, including Yoshua Bengio.

But as you can see, they still have not reached a much higher quality than the Mixture Neural Network solution in your project. Maybe they have hit the current ceiling for voice conversion systems, which is still not very natural, so don't be discouraged if the seq2seq solution doesn't work much better than the Mixture Neural Network solution.

Best regards!


HudsonHuang commented on June 5, 2024

@MissPassenger
I found that ZASKA is a DTW toolkit developed by the UPC, and that dtw is a DTW tool inside it.
So I am trying to replace it with the mfcc and dtw code in SPTK, like this:
```sh
b=2
sox mfcc/${DIR_REF}/${FILENAME}_sil.wav mfcc/${DIR_REF}/${FILENAME}_sil.raw
sox mfcc/${DIR_TST}/${FILENAME}_sil.wav mfcc/${DIR_TST}/${FILENAME}_sil.raw

x2x +sf < mfcc/${DIR_REF}/${FILENAME}_sil.raw | frame -l 480 -p 80 | \
	mfcc -l 480 -m 20 -s 16 > mfcc/${DIR_REF}/${FILENAME}.mfcc

x2x +sf < mfcc/${DIR_TST}/${FILENAME}_sil.raw | frame -l 480 -p 80 | \
	mfcc -l 480 -m 20 -s 16 > mfcc/${DIR_TST}/${FILENAME}.mfcc

dtw -l 480 mfcc/${DIR_REF}/${FILENAME}.mfcc < mfcc/${DIR_TST}/${FILENAME}.mfcc >> ${DIR_DTW}/${FILENAME}_ascii.dtw

x2x +af ${DIR_DTW}/${FILENAME}_ascii.dtw ${DIR_DTW}/beam${b}/${FILENAME}.dtw
```

But the dtw command outputs a format that the x2x command and build_datatable cannot read, and which seems to be ASCII.
I used x2x +af to convert it, but that fails.
Any ideas?
Thanks.
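
One way to narrow this down is to check what kind of data the dtw output file actually contains before converting it. SPTK tools generally exchange headerless binary floats, so the sketch below dumps a file as 32-bit floats with the standard `od` utility. It assumes native-endian 32-bit floats (an assumption, not something confirmed in this thread), and `dump_floats` and the sample file are hypothetical names used only for illustration.

```sh
#!/bin/sh
# Dump a file as 32-bit floats, one group per line, without offsets.
# Assumption: the file holds headerless native-endian 32-bit floats.
dump_floats() {
    od -An -v -tf4 "$1"
}

# Demo on a hand-made sample: the little-endian bytes of 1.0 and 2.5.
sample=$(mktemp)
printf '\000\000\200\077\000\000\040\100' > "$sample"
dump_floats "$sample"
rm -f "$sample"
```

If the real output file shows plausible values when dumped this way, it is probably binary float data rather than ASCII, in which case a float-to-ASCII conversion (`x2x +fa`) would be the direction to try instead of `x2x +af`.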


albertaparicio commented on June 5, 2024


HudsonHuang commented on June 5, 2024

That helps a lot~ many thanks.

