ncsoft / avocodo
Official implementation of "Avocodo: Generative Adversarial Network for Artifact-Free Vocoder" (AAAI2023)
License: Other
Hi,
Thanks for sharing Avocodo. I'd like to use this vocoder at a higher sampling rate, 32 kHz. Can you give me some suggestions on how to change the PQMF part when training Avocodo at 32 kHz?
Thanks,
Bolong Wen
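One way to frame the 32 kHz question is to look at what each PQMF subband covers at the new rate. The sketch below is only arithmetic for a uniform N-band filter bank: each band spans Nyquist / N, and each band signal is decimated to sample_rate / N. The function name `subband_layout` is hypothetical, the 22050 Hz baseline is an assumption, and the subband counts 4 and 16 are taken from the `pqmf_config` values quoted in a later issue on this page, assuming the first entry of each list is the number of subbands. It does not say how the filter taps, cutoff, or beta should be retuned.

```python
def subband_layout(sample_rate, num_subbands):
    """Return (bandwidth_hz, decimated_rate_hz) for a uniform N-band filter bank."""
    nyquist = sample_rate / 2
    bandwidth = nyquist / num_subbands        # width of each subband in Hz
    decimated_rate = sample_rate / num_subbands  # sample rate of each subband signal
    return bandwidth, decimated_rate


# 22050 Hz is an assumed baseline; 32000 Hz is the rate asked about above.
for sr in (22050, 32000):
    for n in (4, 16):
        bw, dr = subband_layout(sr, n)
        print(f"{sr} Hz, {n} subbands: {bw:.1f} Hz per band, {dr:.1f} Hz per-band rate")
```

Under these assumptions the subband counts themselves do not depend on the sampling rate; what changes is the absolute frequency range each band covers.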
Hello,
Thank you for presenting awesome ideas with your work and addressing fundamental issues in previous works.
In the Training Setup section of your paper, the learning rate is given as 2e-3, whereas your implementation uses 2e-4.
2e-4 sounds more reasonable (given the HiFi-GAN baseline). However, I couldn't achieve balanced training with this value; it always ended up with a slight metallic artifact.
I am 1M steps in with 2e-3 and it looks better, but I still have doubts about it.
Can you explain the discrepancy?
Thank you
Nice work! The example results sound promising. It would be better if you could provide some pretrained models.
As I saw in HiFiGAN, after training on ground-truth (GT) mels, they further fine-tuned the model on teacher-forced mels from TTS inference and got better results.
Is this strategy also suitable for Avocodo?
Hello, I'm training an Avocodo model on my own dataset, which consists of multiple datasets.
I changed some of the generator's parameters to change the input and target sample rates: the model generates a 32 kHz waveform from a 24 kHz mel. The hop size is 400.
When I train my Avocodo model, the feature matching loss increases even after the discriminator loss stops descending.
As an aside, strangely enough, the mel loss keeps decreasing and the quality of the audio output is pretty good.
Is this normal when training a vocoder? Will the feature matching loss ever stop increasing?
We'd love to hear about your experiences.
Thank you.
HYPER PARAMETERS
model:
upsample_rates: '[[5], [5], [4], [4]]'
upsample_kernel_sizes: '[[11], [11], [8], [8]]'
upsample_initial_channel: 384
resblock_kernel_sizes: '[3,7,11]'
resblock_dilation_sizes: '[[1,3,5], [1,3,5], [1,3,5]]'
projection_filters: '[0, 1, 1, 1]'
projection_kernels: '[0, 5, 7, 11]'
combd_h_u: '[[16, 64, 256, 1024, 1024, 1024], [16, 64, 256, 1024, 1024, 1024], [16,
64, 256, 1024, 1024, 1024]]'
combd_d_k: '[[7, 11, 11, 11, 11, 5], [11, 21, 21, 21, 21, 5], [15, 41, 41, 41, 41,
5]]'
combd_d_s: '[[1, 1, 4, 4, 4, 1], [1, 1, 4, 4, 4, 1], [1, 1, 4, 4, 4, 1]]'
combd_d_d: '[[1, 1, 1, 1, 1, 1], [1, 1, 1, 1, 1, 1], [1, 1, 1, 1, 1, 1]]'
combd_d_g: '[[1, 4, 16, 64, 256, 1], [1, 4, 16, 64, 256, 1], [1, 4, 16, 64, 256,
1]]'
combd_d_p: '[[3, 5, 5, 5, 5, 2], [5, 10, 10, 10, 10, 2], [7, 20, 20, 20, 20, 2]]'
combd_op_f: '[1, 1, 1]'
combd_op_k: '[3, 3, 3]'
combd_op_g: '[1, 1, 1]'
sbd_filters: '[[64, 128, 256, 256, 256],[64, 128, 256, 256, 256],[64, 128, 256,
256, 256],[32, 64, 128, 128, 128]]'
sbd_strides: '[[1, 1, 3, 3, 1], [1, 1, 3, 3, 1], [1, 1, 3, 3, 1], [1, 1, 3, 3, 1]]'
sbd_kernel_sizes: '[ [[7, 7, 7],[7, 7, 7],[7, 7, 7],[7, 7, 7],[7, 7, 7]], [[5,
5, 5],[5, 5, 5],[5, 5, 5],[5, 5, 5],[5, 5, 5]], [[3, 3, 3],[3, 3, 3],[3,
3, 3],[3, 3, 3],[3, 3, 3]], [[5, 5, 5],[5, 5, 5],[5, 5, 5],[5, 5, 5],[5,
5, 5]] ]'
sbd_dilations: '[ [[5, 7, 11], [5, 7, 11], [5, 7, 11], [5, 7, 11], [5, 7,
11]], [[3, 5, 7], [3, 5, 7], [3, 5, 7], [3, 5, 7], [3, 5, 7]], [[1,
2, 3], [1, 2, 3], [1, 2, 3], [1, 2, 3], [1, 2, 3]], [[1, 2, 3], [1, 2,
3], [1, 2, 3], [2, 3, 5], [2, 3, 5]] ]'
sbd_band_ranges: '[[0, 6], [0, 11], [0, 16], [0, 64]]'
sbd_transpose: '[False, False, False, True]'
model_pqmf_config: '{ ''sbd'': [16, 256, 0.03, 10.0], ''fsbd'': [64,
256, 0.1, 9.0] }'
segment_size: 32000
pqmf_config: '{ ''lv1'': [4, 192, 0.25, 10.0], ''lv2'': [16, 256,
0.03, 10.0] }'
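A quick sanity check on the values above, as a minimal sketch: in HiFi-GAN-style generators the product of the upsampling factors has to equal the number of output samples per mel frame, which here is the stated hop size of 400. The snippet parses the quoted `upsample_rates` string and checks it against the hop size and `segment_size`. The helper name `total_upsampling` is hypothetical, and treating each inner list as one upsampling stage is an assumption about how this config is read.

```python
import ast
from math import prod

# Values copied from the hyperparameters above; the config stores lists as strings.
upsample_rates = ast.literal_eval('[[5], [5], [4], [4]]')
hop_size = 400          # stated in the issue text
segment_size = 32000    # from the config


def total_upsampling(rates):
    """Multiply all upsampling factors across stages (assumes nested lists of ints)."""
    return prod(r for stage in rates for r in stage)


factor = total_upsampling(upsample_rates)
frames_per_segment = segment_size // hop_size

print("total upsampling factor:", factor)
print("matches hop size:", factor == hop_size)
print("segment divisible by hop:", segment_size % hop_size == 0)
print("mel frames per segment:", frames_per_segment)
```

With these numbers the generator geometry is self-consistent, so the rising feature matching loss would have to come from somewhere other than a rate mismatch.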