AcademiCodec: An Open Source Audio Codec Model for Academic Research
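Uncommenting the OMP_NUM_THREADS line in launch.py can speed up training and also improve GPU utilization. By default all cores are used (on machines with many cores, such as A100 machines), and the interaction between cores can be costly. If 1 feels too small, you can set the value yourself in front of train.sh (for example, 8). Training on LibriTTS has not been tested yet.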
also see yangdongchao/SoundStorm#34
Not yet verified in this repository.
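As a minimal sketch of the thread-count advice above: OMP_NUM_THREADS is a standard OpenMP environment variable, and it has to be in the environment before the training processes start. The helper name `build_train_env` and the default of 8 are illustrative assumptions, not part of launch.py.

```python
import os


def build_train_env(num_threads=8, base_env=None):
    """Return a copy of the environment with OMP_NUM_THREADS pinned.

    `num_threads=8` follows the value suggested above; tune it per machine.
    """
    env = dict(os.environ if base_env is None else base_env)
    # Respect a value that was already exported; otherwise cap the thread count.
    env.setdefault("OMP_NUM_THREADS", str(num_threads))
    return env


env = build_train_env(num_threads=8, base_env={"PATH": "/usr/bin"})
print(env["OMP_NUM_THREADS"])
```

The resulting dictionary can be passed as `env=` to `subprocess.run(["bash", "train.sh"], env=env)`; from a shell, prefixing the command as `OMP_NUM_THREADS=8 bash train.sh` has the same effect.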
Missing config.json for HiFi-Codec
FileNotFoundError: [Errno 2] No such file or directory: 'logs/config.json'
--config_path ${log_root}/config.json
While analyzing the new HiFi-Codec code I encountered three small bugs:
1. AcademiCodec/HiFi-Codec/train.py, line 31 in 3ee7baf: MelSpectrogram is used without being imported. Fix: from torchaudio.transforms import MelSpectrogram
2. AcademiCodec/HiFi-Codec/msstftd.py, line 16 in 3ee7baf: modules is not present inside the HiFi-Codec folder, so it needs to be copied in, or the reference changed to another model's modules implementation.
3. AcademiCodec/HiFi-Codec/vqvae.py, line 33 in 3ee7baf: shape of the input tensor x. For a 24 kHz mono-channel wav, the shape of x before line 33 is [batch, samples, 1], and after the .unsqueeze(1) operation at line 33 it becomes [batch, 1, samples, 1], a 4D tensor where a 3D tensor is expected. So the shape of x needs to be checked before line 33: if it has 3 dimensions and the last dimension is 1, the last dimension should be squeezed.
With the shape of x fixed, the code works without errors, and I am able to get the desired output.
Thanks @yangdongchao.
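The shape check described for the third bug can be sketched as follows. This is a minimal illustration assuming mono audio, not the fix that was merged into vqvae.py; the function name `to_batch_channel_time` is hypothetical.

```python
import torch


def to_batch_channel_time(x: torch.Tensor) -> torch.Tensor:
    """Normalize a mono waveform batch to [batch, 1, samples]."""
    # [batch, samples, 1] -> [batch, samples]: drop the trailing singleton dim.
    if x.dim() == 3 and x.shape[-1] == 1:
        x = x.squeeze(-1)
    # [batch, samples] -> [batch, 1, samples]: add the channel dim.
    if x.dim() == 2:
        x = x.unsqueeze(1)
    return x


wav = torch.zeros(4, 24000, 1)  # dummy batch: 1 s of 24 kHz mono audio
print(to_batch_channel_time(wav).shape)
```

An input that is already [batch, 1, samples] passes through unchanged, so the helper can sit in front of the encoder regardless of how the data loader shapes its batches.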
Hello, what is the final training loss of the Encodec_24k_240d model?
The VQVAE class in HiFi-Codec does not seem to have a decode function. Does calling VQVAE's forward function directly correspond to the decode process? I ask because I see
Hmm, it seems I misunderstood. According to
AcademiCodec/HiFi-Codec-24k-320d/vqvae.py
Line 32 in 4e277f4
vq_code
is not the acoustic_tokens; it needs to go through self.quantizer.embed()
to become the acoustic_tokens. Is this understanding correct?
Hi @yangdongchao
I am planning to train the SoundStream codec from this repo on the clean version of the Libri-light dataset plus VCTK, and will open-source the checkpoint. However, I only have a single A100. Is it possible to train SoundStream on a single A100 with a smaller batch size for a longer time?
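I am currently learning the training pipeline of this model, but the paper says that training to convergence takes 8 GPUs and more than a month, so I certainly cannot train it well in a short time. I would like to know whether a trained discriminator model is available. Thanks.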
Hello, I mixed the three datasets LibriTTS, AISHELL and VCTK, following the dataset setup in the HiFi-Codec paper, for a total of about 400 hours, but the trained model does not reach the performance of the pretrained model you released. Which datasets exactly are mixed in the paper's "and more, with a total duration of over 1000 hours"?
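You can save the model in a format like the one below; otherwise models saved in single-machine training run into indexing problems. I will submit a fixed version later.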
if epoch % config.common.save_interval == 0:
    model_to_save = model.module if config.distributed.data_parallel else model
    disc_model_to_save = disc_model.module if config.distributed.data_parallel else disc_model
    if not config.distributed.data_parallel or dist.get_rank() == 0:
        save_master_checkpoint(epoch, model_to_save, optimizer, scheduler, f'{config.checkpoint.save_location}epoch{epoch}_lr{config.optimization.lr}.pt')
        save_master_checkpoint(epoch, disc_model_to_save, optimizer_disc, disc_scheduler, f'{config.checkpoint.save_location}epoch{epoch}_disc_lr{config.optimization.lr}.pt')
I am training Encodec on my own dataset (300+ hours, 1.2 million samples). One iteration takes 1.7 s (8 V100s, batch size 28 per GPU), so training 300 epochs takes about 30 days in total. 😱
Is this speed normal?
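The 30-day figure can be cross-checked from the numbers quoted above. This is a back-of-the-envelope sketch that assumes each sample is seen once per epoch and the data is sharded evenly across the 8 GPUs; it comes out at roughly 31.6 days, consistent with the estimate.

```python
import math

num_samples = 1_200_000   # "1.2 million samples"
num_gpus = 8
batch_per_gpu = 28
sec_per_iter = 1.7
epochs = 300

# One iteration consumes num_gpus * batch_per_gpu samples in total.
iters_per_epoch = math.ceil(num_samples / (num_gpus * batch_per_gpu))
hours_per_epoch = iters_per_epoch * sec_per_iter / 3600
total_days = hours_per_epoch * epochs / 24

print(iters_per_epoch, round(hours_per_epoch, 2), round(total_days, 1))
```

Since the total scales linearly with seconds per iteration, any speed-up per step translates directly into the same relative reduction in wall-clock time.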
Have you evaluated it at higher bitrates? It seems HiFi-Codec-16k supports only two code rates: 1 kbps using 1 quantization layer and 2 kbps using 2 layers. Also, the training code seems to train only with 2-layer quantization.
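The two bitrates mentioned above follow from the usual codec formula: frames per second times number of codebooks times bits per code. The sketch below reproduces them under assumptions that are not stated in this thread: a hop length of 320 samples, a codebook size of 1024, and 2 codebook groups per quantization layer.

```python
import math


def bitrate_bps(sample_rate, hop_length, num_codebooks, codebook_size):
    """Bits per second produced by a frame-level discrete codec."""
    frames_per_second = sample_rate / hop_length
    bits_per_code = math.log2(codebook_size)
    return frames_per_second * num_codebooks * bits_per_code


groups = 2  # assumed number of codebook groups per layer
for layers in (1, 2):
    print(layers, bitrate_bps(16000, 320, groups * layers, 1024))
```

Under the same assumptions, each additional layer would add another 1 kbps, which is the kind of higher-bitrate setting the question asks about.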
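encodec_16k_lanch fixes the following problems:
from feiteng
https://github.com/yangdongchao/AcademiCodec/blob/master/academicodec/quantization/core_vq.py#L149 should not be commented out.
If it is commented out, multi-GPU training gives somewhat worse results, while single-GPU training is unaffected. So retraining with the latest code can be expected to give better results than the currently released encodec weights.
dongchao
OK, I will update this later. If this line is uncommented, the current code cannot run on multiple GPUs. I now have a version that runs on multiple GPUs without commenting it out.
However, encodec_16k_lanch has not been merged into the repository's academicodec directory.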
e.g. for results in Table 1.
is it 'audio' and are there audio results?
there are too many typos...
Hi,
Thanks for your great work!
Can you share the config file for HiFi-Codec-24k-240d?
(soundstream) root@autodl-container-1cb1119f52-820c06c3:~/autodl-tmp/paper/HiFi-Codec# bash test.sh
checkpoint path: ./checkpoint/HiFi-Codec-24k-240d
Init model and load weights
Traceback (most recent call last):
  File "./vqvae_copy_syn.py", line 35, in <module>
    model = VqvaeTester(args)
  File "/root/autodl-tmp/paper/HiFi-Codec/vqvae_tester.py", line 20, in __init__
    self.vqvae = VQVAE(hp.config_path, hp.model_path, with_encoder=True)
  File "/root/autodl-tmp/paper/HiFi-Codec/vqvae.py", line 12, in __init__
    ckpt = torch.load(ckpt_path)
  File "/root/miniconda3/envs/soundstream/lib/python3.8/site-packages/torch/serialization.py", line 815, in load
    return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
  File "/root/miniconda3/envs/soundstream/lib/python3.8/site-packages/torch/serialization.py", line 1033, in _legacy_load
    magic_number = pickle_module.load(f, **pickle_load_args)
_pickle.UnpicklingError: invalid load key, '<'.
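The load key '<' in the error means the first byte of the file is '<', which a PyTorch checkpoint would not normally start with but an HTML or XML page would. One possible cause (an assumption here, not confirmed in the thread) is that a web page was saved in place of the weights file. A quick check, with the hypothetical helper name `looks_like_markup`:

```python
import os
import tempfile


def looks_like_markup(path, probe_bytes=64):
    """Return True if the file starts with '<', as an HTML/XML page would."""
    with open(path, "rb") as f:
        head = f.read(probe_bytes)
    return head.lstrip().startswith(b"<")


# Demo on a throwaway file standing in for a bad download.
with tempfile.NamedTemporaryFile(delete=False, suffix=".pt") as tmp:
    tmp.write(b"<!DOCTYPE html><html>...</html>")
print(looks_like_markup(tmp.name))
os.remove(tmp.name)
```

If the check returns True for the file under ./checkpoint, opening it in a text editor should show what was actually downloaded.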
Hi yangdongchao!
When I train Encodec 24k_240 at 1 kbps, the model shows very high loss and significant oscillation in the early stages. Is this normal?
<epoch:8, iter:8250, total_loss_g:20.7092, adv_g_loss:2.1068, feat_loss:15.4339, rec_loss:3.1594, commit_loss:0.0000, loss_d:1.2053>, d_weight: 1.0000
46%|██████████████████████████████████████████████████▋ | 8259/18075 [1:34:07<1:51:14, 1.47it/s]<epoch:8, iter:8260, total_loss_g:1448.0029, adv_g_loss:2.0795, feat_loss:1439.2244, rec_loss:6.6940, commit_loss:0.0000, loss_d:0.5836>, d_weight: 1.0000
46%|██████████████████████████████████████████████████▊ | 8269/18075 [1:34:15<1:51:27, 1.47it/s]<epoch:8, iter:8270, total_loss_g:588.6943, adv_g_loss:2.1234, feat_loss:577.0657, rec_loss:9.4847, commit_loss:0.0000, loss_d:0.8170>, d_weight: 1.0000
46%|██████████████████████████████████████████████████▊ | 8279/18075 [1:34:21<1:51:37, 1.46it/s]<epoch:8, iter:8280, total_loss_g:316.6624, adv_g_loss:2.1950, feat_loss:306.5796, rec_loss:7.8813, commit_loss:0.0000, loss_d:0.8256>, d_weight: 1.0000
46%|██████████████████████████████████████████████████▉ | 8289/18075 [1:34:29<1:51:56, 1.46it/s]<epoch:8, iter:8290, total_loss_g:6425.9717, adv_g_loss:2.1269, feat_loss:6398.3364, rec_loss:25.5026, commit_loss:0.0000, loss_d:0.9661>, d_weight: 1.0000
46%|██████████████████████████████████████████████████▉ | 8299/18075 [1:34:36<1:52:12, 1.45it/s]<epoch:8, iter:8300, total_loss_g:2867.6846, adv_g_loss:2.2306, feat_loss:2847.7778, rec_loss:17.6676, commit_loss:0.0000, loss_d:0.1482>, d_weight: 1.0000
46%|███████████████████████████████████████████████████ | 8309/18075 [1:34:41<1:52:00, 1.45it/s]<epoch:8, iter:8310, total_loss_g:4510.4780, adv_g_loss:1.9837, feat_loss:4476.9551, rec_loss:31.5352, commit_loss:0.0000, loss_d:1.1329>, d_weight: 1.0000
46%|███████████████████████████████████████████████████ | 8319/18075 [1:34:47<1:51:03, 1.46it/s]<epoch:8, iter:8320, total_loss_g:3507.8118, adv_g_loss:1.9984, feat_loss:3480.6077, rec_loss:25.1733, commit_loss:0.0000, loss_d:1.0020>, d_weight: 1.0000
46%|███████████████████████████████████████████████████▏ | 8329/18075 [1:34:56<1:50:40, 1.47it/s]<epoch:8, iter:8330, total_loss_g:17506.3809, adv_g_loss:1.9943, feat_loss:17494.1309, rec_loss:10.2544, commit_loss:0.0000, loss_d:0.8280>, d_weight: 1.0000
46%|███████████████████████████████████████████████████▏ | 8339/18075 [1:35:01<1:50:31, 1.47it/s]<epoch:8, iter:8340, total_loss_g:30781.5254, adv_g_loss:2.1298, feat_loss:30761.4688, rec_loss:17.8869, commit_loss:0.0000, loss_d:0.4086>, d_weight: 1.0000
46%|███████████████████████████████████████████████████▎ | 8349/18075 [1:35:08<1:50:59, 1.46it/s]<epoch:8, iter:8350, total_loss_g:361517.0312, adv_g_loss:2.1185, feat_loss:361338.4688, rec_loss:176.4266, commit_loss:0.0000, loss_d:0.2256>, d_weight: 1.0000
46%|███████████████████████████████████████████████████▎ | 8359/18075 [1:35:15<1:49:04, 1.48it/s]<epoch:8, iter:8360, total_loss_g:32.4452, adv_g_loss:2.1076, feat_loss:28.3426, rec_loss:1.9913, commit_loss:0.0000, loss_d:1.3850>, d_weight: 1.0000
46%|███████████████████████████████████████████████████▍ | 8369/18075 [1:35:23<1:50:03, 1.47it/s]<epoch:8, iter:8370, total_loss_g:304.8588, adv_g_loss:2.2852, feat_loss:299.7329, rec_loss:2.8386, commit_loss:0.0000, loss_d:1.0175>, d_weight: 1.0000
46%|███████████████████████████████████████████████████▍ | 8379/18075 [1:35:30<1:50:01, 1.47it/s]<epoch:8, iter:8380, total_loss_g:34873.7617, adv_g_loss:2.1054, feat_loss:34844.2266, rec_loss:27.4251, commit_loss:0.0000, loss_d:0.3069>, d_weight: 1.0000
46%|███████████████████████████████████████████████████▌ | 8389/18075 [1:35:37<1:50:02, 1.47it/s]<epoch:8, iter:8390, total_loss_g:40341.5039, adv_g_loss:2.2593, feat_loss:40214.2148, rec_loss:125.0235, commit_loss:0.0000, loss_d:0.6393>, d_weight: 1.0000
46%|███████████████████████████████████████████████████▌ | 8399/18075 [1:35:43<1:50:03, 1.47it/s]<epoch:8, iter:8400, total_loss_g:184210.6719, adv_g_loss:2.0305, feat_loss:184145.6875, rec_loss:62.9335, commit_loss:0.0000, loss_d:1.0710>, d_weight: 1.0000
47%|███████████████████████████████████████████████████▋ | 8409/18075 [1:35:49<1:48:46, 1.48it/s]<epoch:8, iter:8410, total_loss_g:1336.8246, adv_g_loss:2.1409, feat_loss:1317.9712, rec_loss:16.7082, commit_loss:0.0000, loss_d:0.9688>, d_weight: 1.0000
47%|███████████████████████████████████████████████████▋ | 8419/18075 [1:35:57<1:49:31, 1.47it/s]<epoch:8, iter:8420, total_loss_g:13977.8945, adv_g_loss:2.2973, feat_loss:13938.0557, rec_loss:37.5274, commit_loss:0.0000, loss_d:0.2749>, d_weight: 1.0000
47%|███████████████████████████████████████████████████▊ | 8429/18075 [1:36:04<1:49:48, 1.46it/s]<epoch:8, iter:8430, total_loss_g:3301.4082, adv_g_loss:2.1330, feat_loss:3262.6450, rec_loss:36.6189, commit_loss:0.0000, loss_d:0.6580>, d_weight: 1.0000
47%|███████████████████████████████████████████████████▊ | 8438/18075 [1:36:10<1:49:44, 1.46it/s]
2023-06-20-12-58: <epoch:0, total_loss_g_valid:155.6049, recon_loss_valid:21.3568, adversarial_loss_valid:1.6380, feature_loss_valid:132.6101, commit_loss_valid:0.0000, valid_loss_d:1.2365, best_epoch:0>
2023-06-20-16-30: <epoch:1, total_loss_g_valid:508.1316, recon_loss_valid:21.7350, adversarial_loss_valid:1.7627, feature_loss_valid:484.6339, commit_loss_valid:0.0000, valid_loss_d:1.0418, best_epoch:0>
2023-06-20-20-02: <epoch:2, total_loss_g_valid:302.2671, recon_loss_valid:20.5088, adversarial_loss_valid:2.1077, feature_loss_valid:279.6506, commit_loss_valid:0.0000, valid_loss_d:1.1599, best_epoch:2>
2023-06-20-23-34: <epoch:3, total_loss_g_valid:1090.3598, recon_loss_valid:20.4632, adversarial_loss_valid:2.0897, feature_loss_valid:1067.8068, commit_loss_valid:0.0000, valid_loss_d:0.9414, best_epoch:3>
2023-06-21-03-07: <epoch:4, total_loss_g_valid:1666.9553, recon_loss_valid:21.7679, adversarial_loss_valid:2.0294, feature_loss_valid:1643.1580, commit_loss_valid:0.0000, valid_loss_d:1.0660, best_epoch:3>
2023-06-21-06-39: <epoch:5, total_loss_g_valid:1438.0695, recon_loss_valid:21.1533, adversarial_loss_valid:2.1540, feature_loss_valid:1414.7622, commit_loss_valid:0.0000, valid_loss_d:1.1304, best_epoch:3>
2023-06-21-10-11: <epoch:6, total_loss_g_valid:918.1003, recon_loss_valid:21.4004, adversarial_loss_valid:2.1242, feature_loss_valid:894.5757, commit_loss_valid:0.0000, valid_loss_d:1.1136, best_epoch:3>
2023-06-21-13-43: <epoch:7, total_loss_g_valid:1691.1200, recon_loss_valid:20.3575, adversarial_loss_valid:2.1024, feature_loss_valid:1668.6601, commit_loss_valid:0.0000, valid_loss_d:0.9036, best_epoch:7>
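In the logs above, almost all of total_loss_g comes from feat_loss (for example 361338.4688 out of 361517.0312 at iter 8350), while commit_loss stays at 0.0000. A small sketch for pulling such spikes out of a log file; the regular expression matches the line format shown here, and the threshold of 1000 is an arbitrary illustrative choice.

```python
import re

LOG_PATTERN = re.compile(r"iter:(\d+),.*?feat_loss:([\d.]+)")


def find_feat_loss_spikes(lines, threshold=1000.0):
    """Return (iteration, feat_loss) pairs whose feat_loss exceeds threshold."""
    spikes = []
    for line in lines:
        match = LOG_PATTERN.search(line)
        if match and float(match.group(2)) > threshold:
            spikes.append((int(match.group(1)), float(match.group(2))))
    return spikes


sample = [
    "<epoch:8, iter:8250, total_loss_g:20.7092, adv_g_loss:2.1068, "
    "feat_loss:15.4339, rec_loss:3.1594, commit_loss:0.0000, loss_d:1.2053>",
    "<epoch:8, iter:8350, total_loss_g:361517.0312, adv_g_loss:2.1185, "
    "feat_loss:361338.4688, rec_loss:176.4266, commit_loss:0.0000, loss_d:0.2256>",
]
print(find_feat_loss_spikes(sample))
```

Validation lines use different field names (feature_loss_valid) and are skipped by this pattern.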
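Thanks for open-sourcing this excellent work!
I would like to confirm my understanding of the ordering of the output codes:
The output of the VQVAE encode function has shape [B, T, 4].
Suppose B=1 and T=2, and the codes are
[[a,b,c,d]
[e,f,g,h]]
My understanding:
a is the code from the first quantization of the first half of the feature at T=1,
b is the code from the first quantization of the second half of the feature at T=1,
c is the code from quantizing the residual of a,
...
h is the code from quantizing the residual of f.
Is this understanding correct?
Thanks,
Puyuan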
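The ordering hypothesized in the question can be written out as a toy group-residual quantizer. This sketch only illustrates that hypothesis with tiny hand-made codebooks; it is not taken from the repository's quantizer and does not confirm how the repository orders its codes. All names here are hypothetical.

```python
def nearest(codebook, vec):
    """Index of the codebook entry closest to vec (squared L2 distance)."""
    dists = [sum((c - v) ** 2 for c, v in zip(entry, vec)) for entry in codebook]
    return dists.index(min(dists))


def group_residual_codes(frame, codebooks):
    """Quantize one frame; codebooks[layer][group] is a list of entries.

    Codes are emitted layer by layer, and group by group within a layer,
    i.e. [a, b, c, d] = [g0/l0, g1/l0, g0/l1, g1/l1] as in the question.
    """
    n_groups = len(codebooks[0])
    size = len(frame) // n_groups
    residuals = [list(frame[g * size:(g + 1) * size]) for g in range(n_groups)]
    codes = []
    for layer in codebooks:
        for g in range(n_groups):
            idx = nearest(layer[g], residuals[g])
            codes.append(idx)
            # The next layer quantizes what this layer failed to capture.
            residuals[g] = [r - c for r, c in zip(residuals[g], layer[g][idx])]
    return codes


codebooks = [
    [[[0.0, 0.0], [1.0, 0.0]], [[0.0, 0.0], [0.0, 1.0]]],  # layer 0: group 0, group 1
    [[[0.0, 0.0], [0.0, 1.0]], [[0.0, 0.0], [0.0, 1.0]]],  # layer 1: group 0, group 1
]
frame = [1.0, 0.0, 0.0, 2.0]
print(group_residual_codes(frame, codebooks))
```

Whether the real encode output follows this layer-major order or a group-major one is exactly what the question asks the maintainers to confirm.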
Hi, thanks for your great work. I notice that the NSynthDataset used in SoundStream contains data augmentation that adds two audio waveforms together, which does not appear in Encodec. I wonder where this technique was proposed, and whether you have found that it helps audio quality. Thanks.
Excuse me, if HiFi-Codec is used for VALL-E, will it work like Encodec, with the first layer used for AR and layers 2-4 used for NAR?
The model does not converge when I use HiFi-Codec to train the NAR part of VALL-E. My data is a Chinese dataset of 5000 hours. How can I train VALL-E with HiFi-Codec?
The command:
python test.py "../datasets/Nsynth/nsynth-valid/audio/" "./audiofake"
--resume_path "./model_path/2023-05-10-08-55/best_1.pth"
The error is as follows:
(soundstream) root@autodl-container-1cb1119f52-820c06c3:~/autodl-tmp/paper/SoundStream_24k_240d# bash test.sh
Traceback (most recent call last):
  File "test.py", line 159, in <module>
    test_batch()
  File "test.py", line 151, in test_batch
    soundstream.load_state_dict(new_state_dict) # load model
  File "/root/miniconda3/envs/soundstream/lib/python3.8/site-packages/torch/nn/modules/module.py", line 2041, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for SoundStream:
size mismatch for encoder.model.0.conv.conv.bias: copying a param with shape torch.Size([32]) from checkpoint, the shape in current model is torch.Size([48]).
size mismatch for encoder.model.0.conv.conv.weight_g: copying a param with shape torch.Size([32, 1, 1]) from checkpoint, the shape in current model is torch.Size([48, 1, 1]).
size mismatch for encoder.model.0.conv.conv.weight_v: copying a param with shape torch.Size([32, 1, 7]) from checkpoint, the shape in current model is torch.Size([48, 1, 7]).
size mismatch for encoder.model.1.block.1.conv.conv.bias: copying a param with shape torch.Size([16]) from checkpoint, the shape in current model is torch.Size([24]).
size mismatch for encoder.model.1.block.1.conv.conv.weight_g: copying a param with shape torch.Size([16, 1, 1]) from checkpoint, the shape in current model is torch.Size([24, 1, 1]).
size mismatch for encoder.model.1.block.1.conv.conv.weight_v: copying a param with shape torch.Size([16, 32, 3]) from checkpoint, the shape in current model is torch.Size([24, 48, 3]).
size mismatch for encoder.model.1.block.3.conv.conv.bias: copying a param with shape torch.Size([32]) from checkpoint, the shape in current model is torch.Size([48]).
size mismatch for encoder.model.1.block.3.conv.conv.weight_g: copying a param with shape torch.Size([32, 1, 1]) from checkpoint, the shape in current model is torch.Size([48, 1, 1]).
size mismatch for encoder.model.1.block.3.conv.conv.weight_v: copying a param with shape torch.Size([32, 16, 1]) from checkpoint, the shape in current model is torch.Size([48, 24, 1]).
size mismatch for encoder.model.1.shortcut.conv.conv.bias: copying a param with shape torch.Size([32]) from checkpoint, the shape in current model is torch.Size([48]).
size mismatch for encoder.model.1.shortcut.conv.conv.weight_g: copying a param with shape torch.Size([32, 1, 1]) from checkpoint, the shape in current model is torch.Size([48, 1, 1]).
size mismatch for encoder.model.1.shortcut.conv.conv.weight_v: copying a param with shape torch.Size([32, 32, 1]) from checkpoint, the shape in current model is torch.Size([48, 48, 1]).
size mismatch for encoder.model.3.conv.conv.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([96]).
size mismatch for encoder.model.3.conv.conv.weight_g: copying a param with shape torch.Size([64, 1, 1]) from checkpoint, the shape in current model is torch.Size([96, 1, 1]).
size mismatch for encoder.model.3.conv.conv.weight_v: copying a param with shape torch.Size([64, 32, 4]) from checkpoint, the shape in current model is torch.Size([96, 48, 4]).
size mismatch for encoder.model.4.block.1.conv.conv.bias: copying a param with shape torch.Size([32]) from checkpoint, the shape in current model is torch.Size([48]).
size mismatch for encoder.model.4.block.1.conv.conv.weight_g: copying a param with shape torch.Size([32, 1, 1]) from checkpoint, the shape in current model is torch.Size([48, 1, 1]).
size mismatch for encoder.model.4.block.1.conv.conv.weight_v: copying a param with shape torch.Size([32, 64, 3]) from checkpoint, the shape in current model is torch.Size([48, 96, 3]).
size mismatch for encoder.model.4.block.3.conv.conv.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([96]).
size mismatch for encoder.model.4.block.3.conv.conv.weight_g: copying a param with shape torch.Size([64, 1, 1]) from checkpoint, the shape in current model is torch.Size([96, 1, 1]).
size mismatch for encoder.model.4.block.3.conv.conv.weight_v: copying a param with shape torch.Size([64, 32, 1]) from checkpoint, the shape in current model is torch.Size([96, 48, 1]).
size mismatch for encoder.model.4.shortcut.conv.conv.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([96]).
size mismatch for encoder.model.4.shortcut.conv.conv.weight_g: copying a param with shape torch.Size([64, 1, 1]) from checkpoint, the shape in current model is torch.Size([96, 1, 1]).
size mismatch for encoder.model.4.shortcut.conv.conv.weight_v: copying a param with shape torch.Size([64, 64, 1]) from checkpoint, the shape in current model is torch.Size([96, 96, 1]).
size mismatch for encoder.model.6.conv.conv.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([192]).
size mismatch for encoder.model.6.conv.conv.weight_g: copying a param with shape torch.Size([128, 1, 1]) from checkpoint, the shape in current model is torch.Size([192, 1, 1]).
size mismatch for encoder.model.6.conv.conv.weight_v: copying a param with shape torch.Size([128, 64, 8]) from checkpoint, the shape in current model is torch.Size([192, 96, 8]).
size mismatch for encoder.model.7.block.1.conv.conv.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([96]).
size mismatch for encoder.model.7.block.1.conv.conv.weight_g: copying a param with shape torch.Size([64, 1, 1]) from checkpoint, the shape in current model is torch.Size([96, 1, 1]).
size mismatch for encoder.model.7.block.1.conv.conv.weight_v: copying a param with shape torch.Size([64, 128, 3]) from checkpoint, the shape in current model is torch.Size([96, 192, 3]).
size mismatch for encoder.model.7.block.3.conv.conv.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([192]).
size mismatch for encoder.model.7.block.3.conv.conv.weight_g: copying a param with shape torch.Size([128, 1, 1]) from checkpoint, the shape in current model is torch.Size([192, 1, 1]).
size mismatch for encoder.model.7.block.3.conv.conv.weight_v: copying a param with shape torch.Size([128, 64, 1]) from checkpoint, the shape in current model is torch.Size([192, 96, 1]).
size mismatch for encoder.model.7.shortcut.conv.conv.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([192]).
size mismatch for encoder.model.7.shortcut.conv.conv.weight_g: copying a param with shape torch.Size([128, 1, 1]) from checkpoint, the shape in current model is torch.Size([192, 1, 1]).
size mismatch for encoder.model.7.shortcut.conv.conv.weight_v: copying a param with shape torch.Size([128, 128, 1]) from checkpoint, the shape in current model is torch.Size([192, 192, 1]).
size mismatch for encoder.model.9.conv.conv.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([384]).
size mismatch for encoder.model.9.conv.conv.weight_g: copying a param with shape torch.Size([256, 1, 1]) from checkpoint, the shape in current model is torch.Size([384, 1, 1]).
size mismatch for encoder.model.9.conv.conv.weight_v: copying a param with shape torch.Size([256, 128, 10]) from checkpoint, the shape in current model is torch.Size([384, 192, 10]).
size mismatch for encoder.model.10.block.1.conv.conv.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([192]).
size mismatch for encoder.model.10.block.1.conv.conv.weight_g: copying a param with shape torch.Size([128, 1, 1]) from checkpoint, the shape in current model is torch.Size([192, 1, 1]).
size mismatch for encoder.model.10.block.1.conv.conv.weight_v: copying a param with shape torch.Size([128, 256, 3]) from checkpoint, the shape in current model is torch.Size([192, 384, 3]).
size mismatch for encoder.model.10.block.3.conv.conv.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([384]).
size mismatch for encoder.model.10.block.3.conv.conv.weight_g: copying a param with shape torch.Size([256, 1, 1]) from checkpoint, the shape in current model is torch.Size([384, 1, 1]).
size mismatch for encoder.model.10.block.3.conv.conv.weight_v: copying a param with shape torch.Size([256, 128, 1]) from checkpoint, the shape in current model is torch.Size([384, 192, 1]).
size mismatch for encoder.model.10.shortcut.conv.conv.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([384]).
size mismatch for encoder.model.10.shortcut.conv.conv.weight_g: copying a param with shape torch.Size([256, 1, 1]) from checkpoint, the shape in current model is torch.Size([384, 1, 1]).
size mismatch for encoder.model.10.shortcut.conv.conv.weight_v: copying a param with shape torch.Size([256, 256, 1]) from checkpoint, the shape in current model is torch.Size([384, 384, 1]).
size mismatch for encoder.model.12.conv.conv.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for encoder.model.12.conv.conv.weight_g: copying a param with shape torch.Size([512, 1, 1]) from checkpoint, the shape in current model is torch.Size([768, 1, 1]).
size mismatch for encoder.model.12.conv.conv.weight_v: copying a param with shape torch.Size([512, 256, 12]) from checkpoint, the shape in current model is torch.Size([768, 384, 12]).
size mismatch for encoder.model.13.lstm.weight_ih_l0: copying a param with shape torch.Size([2048, 512]) from checkpoint, the shape in current model is torch.Size([3072, 768]).
size mismatch for encoder.model.13.lstm.weight_hh_l0: copying a param with shape torch.Size([2048, 512]) from checkpoint, the shape in current model is torch.Size([3072, 768]).
size mismatch for encoder.model.13.lstm.bias_ih_l0: copying a param with shape torch.Size([2048]) from checkpoint, the shape in current model is torch.Size([3072]).
size mismatch for encoder.model.13.lstm.bias_hh_l0: copying a param with shape torch.Size([2048]) from checkpoint, the shape in current model is torch.Size([3072]).
size mismatch for encoder.model.13.lstm.weight_ih_l1: copying a param with shape torch.Size([2048, 512]) from checkpoint, the shape in current model is torch.Size([3072, 768]).
size mismatch for encoder.model.13.lstm.weight_hh_l1: copying a param with shape torch.Size([2048, 512]) from checkpoint, the shape in current model is torch.Size([3072, 768]).
size mismatch for encoder.model.13.lstm.bias_ih_l1: copying a param with shape torch.Size([2048]) from checkpoint, the shape in current model is torch.Size([3072]).
size mismatch for encoder.model.13.lstm.bias_hh_l1: copying a param with shape torch.Size([2048]) from checkpoint, the shape in current model is torch.Size([3072]).
size mismatch for encoder.model.15.conv.conv.weight_v: copying a param with shape torch.Size([512, 512, 7]) from checkpoint, the shape in current model is torch.Size([512, 768, 7]).
size mismatch for decoder.model.0.conv.conv.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for decoder.model.0.conv.conv.weight_g: copying a param with shape torch.Size([512, 1, 1]) from checkpoint, the shape in current model is torch.Size([768, 1, 1]).
size mismatch for decoder.model.0.conv.conv.weight_v: copying a param with shape torch.Size([512, 512, 7]) from checkpoint, the shape in current model is torch.Size([768, 512, 7]).
size mismatch for decoder.model.1.lstm.weight_ih_l0: copying a param with shape torch.Size([2048, 512]) from checkpoint, the shape in current model is torch.Size([3072, 768]).
size mismatch for decoder.model.1.lstm.weight_hh_l0: copying a param with shape torch.Size([2048, 512]) from checkpoint, the shape in current model is torch.Size([3072, 768]).
size mismatch for decoder.model.1.lstm.bias_ih_l0: copying a param with shape torch.Size([2048]) from checkpoint, the shape in current model is torch.Size([3072]).
size mismatch for decoder.model.1.lstm.bias_hh_l0: copying a param with shape torch.Size([2048]) from checkpoint, the shape in current model is torch.Size([3072]).
size mismatch for decoder.model.1.lstm.weight_ih_l1: copying a param with shape torch.Size([2048, 512]) from checkpoint, the shape in current model is torch.Size([3072, 768]).
size mismatch for decoder.model.1.lstm.weight_hh_l1: copying a param with shape torch.Size([2048, 512]) from checkpoint, the shape in current model is torch.Size([3072, 768]).
size mismatch for decoder.model.1.lstm.bias_ih_l1: copying a param with shape torch.Size([2048]) from checkpoint, the shape in current model is torch.Size([3072]).
size mismatch for decoder.model.1.lstm.bias_hh_l1: copying a param with shape torch.Size([2048]) from checkpoint, the shape in current model is torch.Size([3072]).
size mismatch for decoder.model.3.convtr.convtr.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([384]).
size mismatch for decoder.model.3.convtr.convtr.weight_g: copying a param with shape torch.Size([512, 1, 1]) from checkpoint, the shape in current model is torch.Size([768, 1, 1]).
size mismatch for decoder.model.3.convtr.convtr.weight_v: copying a param with shape torch.Size([512, 256, 12]) from checkpoint, the shape in current model is torch.Size([768, 384, 12]).
size mismatch for decoder.model.4.block.1.conv.conv.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([192]).
size mismatch for decoder.model.4.block.1.conv.conv.weight_g: copying a param with shape torch.Size([128, 1, 1]) from checkpoint, the shape in current model is torch.Size([192, 1, 1]).
size mismatch for decoder.model.4.block.1.conv.conv.weight_v: copying a param with shape torch.Size([128, 256, 3]) from checkpoint, the shape in current model is torch.Size([192, 384, 3]).
size mismatch for decoder.model.4.block.3.conv.conv.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([384]).
size mismatch for decoder.model.4.block.3.conv.conv.weight_g: copying a param with shape torch.Size([256, 1, 1]) from checkpoint, the shape in current model is torch.Size([384, 1, 1]).
size mismatch for decoder.model.4.block.3.conv.conv.weight_v: copying a param with shape torch.Size([256, 128, 1]) from checkpoint, the shape in current model is torch.Size([384, 192, 1]).
size mismatch for decoder.model.4.shortcut.conv.conv.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([384]).
size mismatch for decoder.model.4.shortcut.conv.conv.weight_g: copying a param with shape torch.Size([256, 1, 1]) from checkpoint, the shape in current model is torch.Size([384, 1, 1]).
size mismatch for decoder.model.4.shortcut.conv.conv.weight_v: copying a param with shape torch.Size([256, 256, 1]) from checkpoint, the shape in current model is torch.Size([384, 384, 1]).
size mismatch for decoder.model.6.convtr.convtr.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([192]).
size mismatch for decoder.model.6.convtr.convtr.weight_g: copying a param with shape torch.Size([256, 1, 1]) from checkpoint, the shape in current model is torch.Size([384, 1, 1]).
size mismatch for decoder.model.6.convtr.convtr.weight_v: copying a param with shape torch.Size([256, 128, 10]) from checkpoint, the shape in current model is torch.Size([384, 192, 10]).
size mismatch for decoder.model.7.block.1.conv.conv.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([96]).
size mismatch for decoder.model.7.block.1.conv.conv.weight_g: copying a param with shape torch.Size([64, 1, 1]) from checkpoint, the shape in current model is torch.Size([96, 1, 1]).
size mismatch for decoder.model.7.block.1.conv.conv.weight_v: copying a param with shape torch.Size([64, 128, 3]) from checkpoint, the shape in current model is torch.Size([96, 192, 3]).
size mismatch for decoder.model.7.block.3.conv.conv.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([192]).
size mismatch for decoder.model.7.block.3.conv.conv.weight_g: copying a param with shape torch.Size([128, 1, 1]) from checkpoint, the shape in current model is torch.Size([192, 1, 1]).
size mismatch for decoder.model.7.block.3.conv.conv.weight_v: copying a param with shape torch.Size([128, 64, 1]) from checkpoint, the shape in current model is torch.Size([192, 96, 1]).
size mismatch for decoder.model.7.shortcut.conv.conv.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([192]).
size mismatch for decoder.model.7.shortcut.conv.conv.weight_g: copying a param with shape torch.Size([128, 1, 1]) from checkpoint, the shape in current model is torch.Size([192, 1, 1]).
size mismatch for decoder.model.7.shortcut.conv.conv.weight_v: copying a param with shape torch.Size([128, 128, 1]) from checkpoint, the shape in current model is torch.Size([192, 192, 1]).
size mismatch for decoder.model.9.convtr.convtr.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([96]).
size mismatch for decoder.model.9.convtr.convtr.weight_g: copying a param with shape torch.Size([128, 1, 1]) from checkpoint, the shape in current model is torch.Size([192, 1, 1]).
size mismatch for decoder.model.9.convtr.convtr.weight_v: copying a param with shape torch.Size([128, 64, 8]) from checkpoint, the shape in current model is torch.Size([192, 96, 8]).
size mismatch for decoder.model.10.block.1.conv.conv.bias: copying a param with shape torch.Size([32]) from checkpoint, the shape in current model is torch.Size([48]).
size mismatch for decoder.model.10.block.1.conv.conv.weight_g: copying a param with shape torch.Size([32, 1, 1]) from checkpoint, the shape in current model is torch.Size([48, 1, 1]).
size mismatch for decoder.model.10.block.1.conv.conv.weight_v: copying a param with shape torch.Size([32, 64, 3]) from checkpoint, the shape in current model is torch.Size([48, 96, 3]).
size mismatch for decoder.model.10.block.3.conv.conv.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([96]).
size mismatch for decoder.model.10.block.3.conv.conv.weight_g: copying a param with shape torch.Size([64, 1, 1]) from checkpoint, the shape in current model is torch.Size([96, 1, 1]).
size mismatch for decoder.model.10.block.3.conv.conv.weight_v: copying a param with shape torch.Size([64, 32, 1]) from checkpoint, the shape in current model is torch.Size([96, 48, 1]).
size mismatch for decoder.model.10.shortcut.conv.conv.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([96]).
size mismatch for decoder.model.10.shortcut.conv.conv.weight_g: copying a param with shape torch.Size([64, 1, 1]) from checkpoint, the shape in current model is torch.Size([96, 1, 1]).
size mismatch for decoder.model.10.shortcut.conv.conv.weight_v: copying a param with shape torch.Size([64, 64, 1]) from checkpoint, the shape in current model is torch.Size([96, 96, 1]).
size mismatch for decoder.model.12.convtr.convtr.bias: copying a param with shape torch.Size([32]) from checkpoint, the shape in current model is torch.Size([48]).
size mismatch for decoder.model.12.convtr.convtr.weight_g: copying a param with shape torch.Size([64, 1, 1]) from checkpoint, the shape in current model is torch.Size([96, 1, 1]).
size mismatch for decoder.model.12.convtr.convtr.weight_v: copying a param with shape torch.Size([64, 32, 4]) from checkpoint, the shape in current model is torch.Size([96, 48, 4]).
size mismatch for decoder.model.13.block.1.conv.conv.bias: copying a param with shape torch.Size([16]) from checkpoint, the shape in current model is torch.Size([24]).
size mismatch for decoder.model.13.block.1.conv.conv.weight_g: copying a param with shape torch.Size([16, 1, 1]) from checkpoint, the shape in current model is torch.Size([24, 1, 1]).
size mismatch for decoder.model.13.block.1.conv.conv.weight_v: copying a param with shape torch.Size([16, 32, 3]) from checkpoint, the shape in current model is torch.Size([24, 48, 3]).
size mismatch for decoder.model.13.block.3.conv.conv.bias: copying a param with shape torch.Size([32]) from checkpoint, the shape in current model is torch.Size([48]).
size mismatch for decoder.model.13.block.3.conv.conv.weight_g: copying a param with shape torch.Size([32, 1, 1]) from checkpoint, the shape in current model is torch.Size([48, 1, 1]).
size mismatch for decoder.model.13.block.3.conv.conv.weight_v: copying a param with shape torch.Size([32, 16, 1]) from checkpoint, the shape in current model is torch.Size([48, 24, 1]).
size mismatch for decoder.model.13.shortcut.conv.conv.bias: copying a param with shape torch.Size([32]) from checkpoint, the shape in current model is torch.Size([48]).
size mismatch for decoder.model.13.shortcut.conv.conv.weight_g: copying a param with shape torch.Size([32, 1, 1]) from checkpoint, the shape in current model is torch.Size([48, 1, 1]).
size mismatch for decoder.model.13.shortcut.conv.conv.weight_v: copying a param with shape torch.Size([32, 32, 1]) from checkpoint, the shape in current model is torch.Size([48, 48, 1]).
size mismatch for decoder.model.15.conv.conv.weight_v: copying a param with shape torch.Size([1, 32, 7]) from checkpoint, the shape in current model is torch.Size([1, 48, 7]).
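Every mismatched dimension in the log above differs by the same factor (32 vs 48, 64 vs 96, 128 vs 192). That pattern would be consistent with the model being built with a wider base channel count than the one the checkpoint was trained with, though the log alone does not confirm this. A minimal sketch for checking the pattern on a full log, using only the standard library (the helper names `parse_shape` and `width_ratios` are hypothetical, not part of this repository):

```python
import re
from fractions import Fraction

# Matches one "size mismatch" line as printed by load_state_dict.
PATTERN = re.compile(
    r"size mismatch for (?P<name>[\w.]+): copying a param with shape "
    r"torch\.Size\(\[(?P<ckpt>[\d, ]*)\]\) from checkpoint, "
    r"the shape in current model is torch\.Size\(\[(?P<model>[\d, ]*)\]\)"
)

def parse_shape(text):
    """Turn '32, 64, 3' into (32, 64, 3)."""
    return tuple(int(p) for p in text.split(",") if p.strip())

def width_ratios(log_text):
    """Collect model/checkpoint ratios over all dimensions that differ."""
    ratios = set()
    for m in PATTERN.finditer(log_text):
        ckpt = parse_shape(m["ckpt"])
        model = parse_shape(m["model"])
        for c, n in zip(ckpt, model):
            if c != n:
                ratios.add(Fraction(n, c))
    return ratios

# Two lines copied from the log above.
sample_log = """\
size mismatch for decoder.model.13.shortcut.conv.conv.weight_v: copying a param with shape torch.Size([32, 32, 1]) from checkpoint, the shape in current model is torch.Size([48, 48, 1]).
size mismatch for decoder.model.15.conv.conv.weight_v: copying a param with shape torch.Size([1, 32, 7]) from checkpoint, the shape in current model is torch.Size([1, 48, 7]).
"""
ratios = width_ratios(sample_log)
print(ratios)
```

A single ratio across the whole log points at one width hyperparameter in the model config, not at a structural difference between the two networks.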
Hello,
I am wondering how you computed STOI and PESQ. Which repositories were used?
Sorry, Mr. Yang, I can't find the command instructions for fine-tuning the pre-trained model. Could you provide some information?
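When running egs/SoundStream_24k_240d/main3_ddp.py, execution reaches line 9, which imports the custom module academicodec/models/encodec/distributed/launch.py. launch.py then fails at its line 5 with an error saying the module cannot be found.
Rewriting line 5 of launch.py as from . import distributed as dist_fn
fixes the problem.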
Hi, dongchao
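I have recently been surveying the AudioLM family of papers. I found that your SoundStorm reproduction is fairly complete, so I plan to build on it (at the moment https://github.com/yangdongchao/SoundStorm only has S2, not S1). I then came across the AcademiCodec repository. Looking at test.py and the training script main3_ddp.py for Encodec_24k_32d and Encodec_16k_320, I found that the model being loaded is SoundStream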
AcademiCodec/Encodec_16k_320/main3_ddp.py
Line 10 in d03142b
There is no LICENSE file.
What is the license for this project and the pretrained models?
I want to train SoundStorm, but it needs SoundStream. Do you have any plans to open-source the pre-trained SoundStream model? I can't find it in the Hugging Face branch.
Hi! Thanks for your work.
Do you have an example command line to train the EnCodec?
The second parameter is not the resample rate; it is the resample num (the number of samples).
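The issue does not name the function involved. Assuming it behaves like scipy.signal.resample(x, num), whose second argument is the number of samples in the output and not a rate, the value to pass can be derived from the two sample rates. A minimal sketch (the helper name `target_num_samples` is hypothetical):

```python
def target_num_samples(n_in: int, orig_sr: int, target_sr: int) -> int:
    """Number of output samples when resampling n_in samples
    from orig_sr Hz to target_sr Hz (duration is preserved)."""
    if n_in < 0 or orig_sr <= 0 or target_sr <= 0:
        raise ValueError("sample count must be >= 0 and rates must be > 0")
    return round(n_in * target_sr / orig_sr)

# One second of 48 kHz audio resampled to 24 kHz.
num = target_num_samples(48000, 48000, 24000)
print(num)
```

Under that assumption the call would take `num` as its second argument, where passing the target rate directly would only give the right length for a clip that is exactly one second long.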
Thanks for open-sourcing your wonderful work!
I was trying to fine-tune on my own dataset, but I found that only a pretrained generator is available, with no pretrained discriminator.
So, would you please release your pretrained discriminator?
Thanks!
122 z = self.spec_transform(x) # [B, 2, Freq, Frames, 2]
But when I try to train the model, z has shape torch.Size([8, 1, 513, 43, 2]):
the second dim is 1, not 2.
An error is then raised when running z = torch.cat([z.real, z.imag], dim=1):
RuntimeError: real is not implemented for tensors with non-complex dtypes.
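The error suggests that spec_transform is returning a real-valued tensor whose trailing dimension of size 2 holds the (real, imaginary) pair, which is how older torchaudio releases represented complex spectrograms. A possible workaround, sketched under that assumption, is to accept both layouts before concatenating (the helper name `split_real_imag` is hypothetical, not part of the repository):

```python
import torch

def split_real_imag(z: torch.Tensor) -> torch.Tensor:
    """Concatenate real and imaginary parts along dim=1.

    Accepts either a complex tensor [B, C, Freq, Frames] or a real tensor
    [B, C, Freq, Frames, 2] whose last dimension is (real, imag).
    """
    if torch.is_complex(z):
        real, imag = z.real, z.imag
    elif z.shape[-1] == 2:
        real, imag = z[..., 0], z[..., 1]
    else:
        raise ValueError(f"unexpected spectrogram shape {tuple(z.shape)}")
    return torch.cat([real, imag], dim=1)

# Same shape as reported in the issue: real-valued, trailing dim of 2.
z_real_layout = torch.randn(8, 1, 513, 43, 2)
out = split_real_imag(z_real_layout)

# The same data viewed as a complex tensor gives an identical result.
z_complex = torch.view_as_complex(z_real_layout.contiguous())
out_from_complex = split_real_imag(z_complex)
print(tuple(out.shape))
```

With mono input this yields a tensor of shape [B, 2, Freq, Frames], so the channel dimension becomes 2 only after the concatenation, not before it.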
Hello! Do you have any plans of uploading the code for SoundStream and Encodec in the near future? Thanks in advance!
I found that the Encodec project does not use LM for the codebook compared to Facebook's Encodec. Have you made any attempts to do this?
Hello,
I could not figure out which validation set was used for the results in the paper.
Could you help?