
A question about "Zaska" and "dtw -b": how could I get more features by running "compute_dtw.sh"?

HudsonHuang commented on June 5, 2024

A question about "Zaska" and "dtw -b": how could I get more features by running "compute_dtw.sh"?

from tfg-voice-conversion.

Comments (5)

albertaparicio commented on June 5, 2024

Dear Hudson,

Sorry for the delay in my response, I have been busy working on the seq2seq model.

Regarding the Zaska and dtw scripts, they belong to the Signal Theory and Communications department at the UPC university in Catalonia (this project is being developed for my bachelor thesis).

I have contacted the people at the department to ask them if I can distribute these scripts. I'll get back to you as soon as I have a response.

Regarding the resulting sound of the system, I am aware that it does not give very accurate results. You see, the scripts you are asking about belong to the 'baseline' of the system. This version was developed only to have a reference level of result quality, as we have been focusing our efforts (and still are) on the sequence-to-sequence model.

If you find a way to improve this baseline, that is great news, but we are not going to work on it anymore.

As always, thank you for your interest in this project.

Cheers!


HudsonHuang commented on June 5, 2024

Dear Albert,

Thank you so much for your response. The seq2seq model is definitely a good idea.

As a reference, you can also check out this company: https://lyrebird.ai/. They are trying to offer an API-level voice conversion solution for commercial purposes, and it seems they have a good team, including Yoshua Bengio.

But as you can see, they still have not reached a much higher quality than the Mixture Neural Network solution in your project. Maybe this marks the current peak level for voice conversion systems, which is still not very natural, so don't be discouraged if the seq2seq solution doesn't work much better than the Mixture Neural Network solution.

Best regards!


HudsonHuang commented on June 5, 2024

@MissPassenger
I found that ZASKA is a DTW toolkit developed by the UPC, and that dtw is a DTW tool inside it.
So I am trying to replace it with the mfcc and dtw code in SPTK,
like this:
```
b=2
# Convert the WAV files to headerless raw audio with sox
sox mfcc/${DIR_REF}/${FILENAME}_sil.wav mfcc/${DIR_REF}/${FILENAME}_sil.raw
sox mfcc/${DIR_TST}/${FILENAME}_sil.wav mfcc/${DIR_TST}/${FILENAME}_sil.raw

# Extract MFCCs: short-to-float conversion, 480-sample frames with an 80-sample shift
x2x +sf < mfcc/${DIR_REF}/${FILENAME}_sil.raw | frame -l 480 -p 80 | \
	mfcc -l 480 -m 20 -s 16 > mfcc/${DIR_REF}/${FILENAME}.mfcc

x2x +sf < mfcc/${DIR_TST}/${FILENAME}_sil.raw | frame -l 480 -p 80 | \
	mfcc -l 480 -m 20 -s 16 > mfcc/${DIR_TST}/${FILENAME}.mfcc

# Align the test features against the reference features with the SPTK dtw
dtw -l 480 mfcc/${DIR_REF}/${FILENAME}.mfcc < mfcc/${DIR_TST}/${FILENAME}.mfcc >> ${DIR_DTW}/${FILENAME}_ascii.dtw

# Attempted ASCII-to-float conversion of the dtw output (this is the step that fails)
x2x +af ${DIR_DTW}/${FILENAME}_ascii.dtw  ${DIR_DTW}/beam${b}/${FILENAME}.dtw
```
But the dtw command outputs a format that the x2x command and build_datatable cannot read, and which seems to be ASCII.
I used x2x +af to convert it, but it fails.
Any idea?
Thanks.
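
Before choosing an x2x conversion, it can help to check whether the dtw output is really text or already binary. The following is only a minimal sketch: `is_ascii_text` is a hypothetical helper written for this illustration (it is not part of SPTK or of this repository), and it relies solely on standard `tr` and `wc`.

```shell
# Hypothetical helper (not part of SPTK or tfg-voice-conversion):
# succeeds only if the file contains nothing but printable ASCII and whitespace.
is_ascii_text() {
    # Delete every printable or whitespace byte; any byte left over is binary data.
    leftover=$(( $(LC_ALL=C tr -d '[:print:][:space:]' < "$1" | wc -c) ))
    [ "$leftover" -eq 0 ]
}

# Usage sketch: report what kind of file the given path looks like.
describe_file() {
    if is_ascii_text "$1"; then
        echo "$1 looks like ASCII text"
    else
        echo "$1 contains binary data"
    fi
}
```

Since `x2x +af` converts ASCII numbers to floats, it is only meaningful when the check above reports text, for example `describe_file ${DIR_DTW}/${FILENAME}_ascii.dtw`.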


albertaparicio commented on June 5, 2024

Regarding Zaska, I have contacted the people at UPC who developed this toolkit to ask if I can redistribute these programs, but I have had no answer.

Regarding the data formats, I am not aware of the format used by SPTK. In the case of the output of Zaska, the data was stored in float format with no headers, but I do not know how SPTK outputs the data.

I have checked in past commits, and if I am not mistaken, in this script I used the SPTK DTW. Maybe this can help: https://github.com/albertaparicio/tfg-voice-conversion/blob/a4aeea2a244cf9f74ae3f03d4d7a6bb10c0a6594/data/training/align_training.sh

Albert
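
A headerless float file carries no record of its own vector dimension, so one quick sanity check is to see whether the file size is a whole number of frames for the dimension you expect. The sketch below assumes 4-byte (single-precision) floats, which is an assumption and not something confirmed in this thread; `count_frames` is a hypothetical helper, not an SPTK or Zaska command.

```shell
# Hypothetical helper: number of frames in a headerless float file.
# Assumes 4 bytes per value (float32); adjust if the data is stored as doubles.
count_frames() {
    file=$1
    dim=$2                                  # values per frame (vector dimension)
    bytes=$(( $(wc -c < "$file") ))         # total file size in bytes
    frame_bytes=$(( dim * 4 ))              # bytes occupied by one frame
    if [ $(( bytes % frame_bytes )) -ne 0 ]; then
        echo "size $bytes is not a multiple of $frame_bytes bytes" >&2
        return 1
    fi
    echo $(( bytes / frame_bytes ))
}
```

For example, `count_frames mfcc/${DIR_REF}/${FILENAME}.mfcc 20` would print the frame count if each frame holds 20 values; a failure suggests the assumed dimension or value size does not match the file.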



HudsonHuang commented on June 5, 2024

That helps a lot~ many thanks.

