doubiiu / codetalker
[CVPR 2023] CodeTalker: Speech-Driven 3D Facial Animation with Discrete Motion Prior
License: MIT License
I signed in at 'https://voca.is.tue.mpg.de/' but could not find these files. Can you tell me how to download them? Thanks~
Hi, thanks for sharing your great work.
When training stage 2 by executing sh scripts/train.sh CodeTalker_s2 config/vocaset/stage2.yaml vocaset s2, I get cuDNN error: CUDNN_STATUS_NOT_INITIALIZED. The error occurs when passing the audio into the audio encoder (wav2vec), at F.conv1d.
I followed the provided environment setup, and the error still occurs.
Can you help me solve the problem?
Hello, when I was looking at the evaluation code, I only found some files for BIWI, such as the vertex index files lve.txt and fdd.txt. Can you provide these two files for the VOCA dataset? I would be very grateful if that were possible.
First of all, thank you to the author for open-sourcing the code. I am currently confused about the output video. I would like to apply the animation to a full-body 3D human model. What method should I use to do that? Can you provide an answer?
The evaluation scores from my retraining do not match those in the CodeTalker paper.
The first-stage model was reused and the second-stage model was retrained, but the final scores were inconsistent with those in the paper.
Could you help explain? Thanks.
Retraining result:
Frame Number: 3879
Lip Vertex Error: 5.2776e-04
FDD: 4.4944e-05
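For context, the two metrics are usually computed along the following lines; the sketch below is an approximation that assumes predictions and ground truth as (T, V, 3) NumPy arrays and a lip-vertex index list such as the one in lve.txt, not the repository's exact evaluation code:

```python
import numpy as np

def lip_vertex_error(pred, gt, lip_idx):
    """Approximate LVE: for each frame, take the maximal squared L2 error over
    the lip vertices, then average over all frames. pred/gt: (T, V, 3) arrays."""
    err = np.sum((pred[:, lip_idx] - gt[:, lip_idx]) ** 2, axis=2)  # (T, n_lip)
    return float(err.max(axis=1).mean())

# FDD instead compares the temporal standard deviation (the "dynamics") of the
# upper-face vertices between prediction and ground truth; its vertex set comes
# from fdd.txt, analogous to lve.txt for the lip region.
```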
Can you provide some generated video demos? I do not see any link to your generated videos in the paper.
Traceback (most recent call last):
File "/media/E/3DTalk/CodeTalker-main/main/train_pred.py", line 242, in
main()
File "/media/E/3DTalk/CodeTalker-main/main/train_pred.py", line 48, in main
main_worker(args.train_gpu, args.ngpus_per_node, args)
File "/media/E/3DTalk/CodeTalker-main/main/train_pred.py", line 109, in main_worker
loss_train, motion_loss_train, reg_loss_train = train(train_loader, model, loss_fn, optimizer, epoch, cfg)
File "/media/E/3DTalk/CodeTalker-main/main/train_pred.py", line 148, in train
model.autoencoder.eval()
File "/home/anaconda3/envs/3d/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in getattr
raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'VQAutoEncoder' object has no attribute 'autoencoder'
Could you tell me where the "autoencoder" is?
Hi, what a nice work!
I am currently attempting to reproduce this work on the MEAD dataset. Stage 1 of the process went smoothly; however, I am encountering an issue in Stage 2. After 20 epochs of training, I am not observing any movement in the output, and it remains static.
Do you have any idea?
Many thanks!
In BIWI cal_metric, training_subjects is the same as the training data.
Maybe cal_metric training_subjects should be the same as the test data.
Line 9 in e687bbe
Hi, "the autoregressive model is trained in a teacher-forcing scheme" is mentioned in your article, but why? In previous related work, FaceFormer pointed out that using this strategy leads to poor results. Can you please tell me your opinion?
I tried to train on the multiface dataset, which has a lot of detail in the upper face, but the driven results cannot close the eyes while talking, whereas MeshTalk can.
The reconstruction stage works well when closing the eyes?
Any suggestions?
CodeTalker/config/vocaset/stage1.yaml
Line 34 in 3335dcd
Ture->True?
We used the pre-trained model to test on the vocaset dataset, but none of the rendered videos blink. How should we solve this?
The name is right, but the link points to another paper.
Can I use a FLAME face model outside of vocaset (training, validation, and test sets) as a template to generate facial animation? How do I set it up, and can I do it using a pre-trained model, or does the model need to be retrained?
Hello,
Thanks for releasing this amazing work. I was curious whether the vocaset pre-trained models also produce diverse upper-face expressions, because the original dataset does not have them. If so, could you comment on how you trained for that?
Once again, great work! Thank you!
Hello !
The work is impressive! I wonder whether it would be feasible to use it with real-time generated TTS to produce realistic facial animation on a 3D face model in Unity.
I used the pretrained model (biwi_stage2.pth.tar) for fine-tuning and found that the loss was significant on both the training and validation sets, which seems a bit unreasonable. Was biwi_stage2.pth.tar trained on the test set, or are there additional techniques not included in the code? Could you help explain? Many thanks.
After processing BIWI, I found the dataset has 536 vertices_npy files and 532 wav files. Are these counts correct for the BIWI dataset? Thanks.
Thanks for sharing the great work! I want to follow your work and I'm trying to reproduce all the experiment results. Could you provide more details about Fig.4 in the paper? I have successfully generated videos using the scripts provided, but I don't know how to export a single frame w/ or w/o background color. Moreover, how did you generate the heat map (mean & std) in the figure?
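For the heat map specifically, here is a minimal sketch of one plausible way to produce per-vertex mean/std error colors (assuming (T, V, 3) prediction and ground-truth arrays; the paper's exact rendering pipeline may differ, and the inputs below are hypothetical stand-ins):

```python
import numpy as np
import matplotlib.cm as cm

# Hypothetical stand-ins for one test sequence; replace with real data.
T, V = 100, 5023
pred = np.random.rand(T, V, 3)
gt = np.random.rand(T, V, 3)

err = np.linalg.norm(pred - gt, axis=2)              # (T, V) per-vertex L2 error
mean_err, std_err = err.mean(axis=0), err.std(axis=0)

# Normalize and map the per-vertex statistic to a colormap; the resulting RGB
# values can be attached as vertex colors (e.g. with trimesh) and rendered as
# a single frame with pyrender, with or without a background color.
colors = cm.viridis(mean_err / (mean_err.max() + 1e-8))[:, :3]
```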
Generally, by the second epoch of training, the outputs are all NaN. At that point, I checked the bias and weight of the linear layer, and they are all NaN.
self.encoder.vertice_mapping[0]
Linear(in_features=15069, out_features=1024, bias=True)
self.encoder.vertice_mapping[0].bias
Parameter containing:
tensor([nan, nan, nan, ..., nan, nan, nan], device='cuda:0',
requires_grad=True)
self.encoder.vertice_mapping[0].weight
Parameter containing:
tensor([[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
...,
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan]], device='cuda:0',
requires_grad=True)
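As a side note, one way to catch this earlier is to scan all parameters for NaN/Inf values right after each optimizer step; this is a generic PyTorch sketch, not code from this repository:

```python
import torch

def find_bad_params(model: torch.nn.Module):
    """Return the names of parameters that contain NaN or Inf values."""
    bad = []
    for name, param in model.named_parameters():
        if not torch.isfinite(param).all():
            bad.append(name)
    return bad

# Example: call this right after optimizer.step() and stop training (or lower
# the learning rate / enable gradient clipping) as soon as the list is non-empty.
```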
Hello, I have been learning about 3D speech-driven animation recently, and your paper inspired me a lot. I wonder when you will release the code so that I can learn more details. Thank you.
May I ask whether anyone has the BIWI dataset available for download? Could you share it?
Using the autoregressive training scheme, I run out of GPU memory at Epoch[1/100][110/190].
My GPU has 40 GB of memory (A100). Where is the problem? Most of the time, GPU memory usage is below 15000 MB.
Which epoch's model was used for evaluation?
After testing, I found that the model overfits during training, and the last epoch (epoch 100) may not be the best.
Could you help explain? Thanks.
Is it necessary to comment out the self.padding_mode != 'zeros' check?
No error is reported without making any modifications. Will it affect model accuracy? Thanks.
IMPORTANT: Please make sure to modify the site-packages/torch/nn/modules/conv.py file by commenting out the self.padding_mode != 'zeros' line to allow replicated padding for ConvTranspose1d, as shown in https://github.com/NVIDIA/tacotron2/issues/182.
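For context, the guard being referred to looks roughly like the snippet below in the PyTorch versions this setup targets (paraphrased, not an exact copy of conv.py); commenting it out lets ConvTranspose1d be constructed with a non-zeros padding_mode instead of raising:

```python
import torch.nn as nn

# On an unmodified PyTorch install, _ConvTransposeNd.__init__ contains a guard
# roughly equivalent to:
#     if padding_mode != 'zeros':
#         raise ValueError('Only "zeros" padding mode is supported for ConvTranspose1d')
# Commenting that line out (as the README asks) lets construction succeed.
try:
    deconv = nn.ConvTranspose1d(64, 64, kernel_size=5, stride=2, padding=2,
                                output_padding=1, padding_mode='replicate')
    print("padding_mode check disabled: layer constructed")
except ValueError as e:
    print("stock PyTorch still enforces the check:", e)
```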
Hello & thanks for your work. I was wondering about lip sync error comparison for VOCASET-test data. I saw it reported for BIWI but couldn't find the one for VOCASET in the paper. Please let me know if I'm missing something
No such files (data_verts.npy, raw_audio_fixed.pkl, templates.pkl, and subj_seq_to_idx.pkl) are found at this URL: https://voca.is.tue.mpg.de/download.php
Could you tell me exactly where to find these files?
When running !sh scripts/demo.sh vocaset:
Some weights of Wav2Vec2Model were not initialized from the model checkpoint at facebook/wav2vec2-base-960h and are newly initialized: ['wav2vec2.masked_spec_embed']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
=> loading checkpoint 'vocaset/vocaset_stage2.pth.tar'
=> loaded checkpoint 'vocaset/vocaset_stage2.pth.tar'
Generating facial animation for demo/wav/man.wav...
2023-08-01 13:21:18.492516: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
Traceback (most recent call last):
File "/content/CodeTalker/main/demo.py", line 219, in <module>
main()
File "/content/CodeTalker/main/demo.py", line 129, in main
test(model, cfg.demo_wav_path, save_folder, condition, subject)
File "/content/CodeTalker/main/demo.py", line 167, in test
prediction = model.predict(audio_feature, template, one_hot)
File "/content/CodeTalker/models/stage2.py", line 115, in predict
hidden_states = self.audio_encoder(audio, self.dataset).last_hidden_state
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/content/CodeTalker/models/lib/wav2vec.py", line 132, in forward
encoder_outputs = self.encoder(
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/transformers/models/wav2vec2/modeling_wav2vec2.py", line 788, in forward
position_embeddings = self.pos_conv_embed(hidden_states)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/transformers/models/wav2vec2/modeling_wav2vec2.py", line 397, in forward
hidden_states = hidden_states.transpose(1, 2)
AttributeError: 'tuple' object has no attribute 'transpose'
audio duration: 23s
error:
File "CodeTalker/main/demo.py", line 187, in test
prediction = model.predict(audio_feature, template, one_hot)
File "CodeTalker/models/stage2.py", line 133, in predict
feat_out = self.transformer_decoder(vertice_input, hidden_states, tgt_mask=tgt_mask, memory_mask=memory_mask)
File "/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/nn/functional.py", line 5016, in multi_head_attention_forward
raise RuntimeError(f"The shape of the 3D attn_mask is {attn_mask.shape}, but should be {correct_3d_size}.")
RuntimeError: The shape of the 3D attn_mask is torch.Size([4, 600, 600]), but should be (4, 601, 601).
Hi, I tried to train stage 2 on my dataset, but the loss only oscillates and doesn't go down.
Should I reduce the learning rate (it is currently 1e-4) ?
Or should I reset the weight between loss_motion and loss_reg?
How much should the loss go down?
Do you have any tips for stage 2 training?
Thank you!
Frame Number: 3879
Lip Vertex Error: 1.8149e-03
FDD: 4.6162e-05
During training, the perplexity keeps rising. What does the perplexity mean?
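For reference, in typical VQ-VAE implementations (stage 1 here presumably follows the same convention, though that is an assumption) the perplexity measures how evenly the codebook entries are used: it is the exponential of the entropy of the average code assignment.

```python
import torch

def codebook_perplexity(encodings: torch.Tensor) -> torch.Tensor:
    """encodings: (N, K) one-hot code assignments for N quantized tokens over a
    codebook of size K. Returns exp(entropy of average usage); it ranges from 1
    (only one code used) up to K (all codes used equally often)."""
    avg_probs = encodings.float().mean(dim=0)                     # (K,)
    entropy = -(avg_probs * torch.log(avg_probs + 1e-10)).sum()
    return torch.exp(entropy)
```

A rising perplexity during training therefore usually means the model is spreading its usage over more codebook entries rather than collapsing onto a few.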
For instance, given a MetaHuman face mesh, can it be driven?
In order to train with 4 GPUs at the same time, I set a to True and set train_gpu to [0,1,2,3], but I got the following error:
[2023-04-18 15:43:41,052 INFO train_pred.py line 71 682101]=>=> creating model ... [195/1825]
Traceback (most recent call last):
File "/root/miniconda3/envs/py3.9/lib/python3.9/runpy.py", line 197, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/root/miniconda3/envs/py3.9/lib/python3.9/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/debugpy/__main__.py", line 39, in <module>
cli.main()
File "/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/debugpy/server/cli.py", line 430, in main
run()
File "/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/debugpy/server/cli.py", line 284, in run_file
runpy.run_path(target, run_name="__main__")
File "/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 321, in run_path
return _run_module_code(code, init_globals, run_name,
File "/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 135, in _run_module
_code
_run_code(code, mod_globals, init_globals,
File "/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 124, in _run_code
exec(code, run_globals)
File "main/train_pred.py", line 259, in <module>
main()
File "main/train_pred.py", line 45, in main
mp.spawn(main_worker, nprocs=args.ngpus_per_node, args=(args.ngpus_per_node, args))
File "/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/torch/multiprocessing/spawn.py", line 240, in spawn
return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
File "/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/torch/multiprocessing/spawn.py", line 198, in start_processes
while not context.join():
File "/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/torch/multiprocessing/spawn.py", line 160, in join
raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:
-- Process 0 terminated with the following error:
Traceback (most recent call last):
File "/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/torch/multiprocessing/spawn.py", line 69, in _wrap
fn(i, *args)
File "/root/autodl-tmp/hzt/code/CodeTalker/main/train_pred.py", line 120, in main_worker
loss_train, motion_loss_train, reg_loss_train = train(train_loader, model, loss_fn, optimizer, epoch, cfg)
File "/root/autodl-tmp/hzt/code/CodeTalker/main/train_pred.py", line 162, in train
model.autoencoder.eval()
File "/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1185, in __getattr__
raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'DistributedDataParallel' object has no attribute 'autoencoder'
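For context (a generic PyTorch note, not this repository's specific fix): once a model is wrapped in DistributedDataParallel, its custom attributes are only reachable through the .module attribute, so a call like model.autoencoder.eval() has to become model.module.autoencoder.eval() in the multi-GPU path. A minimal sketch:

```python
import torch.nn as nn

class Stage2Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.autoencoder = nn.Linear(8, 8)  # stand-in for the frozen VQ autoencoder

model = Stage2Model()
model.autoencoder.eval()                    # works on the bare module

# After wrapping for multi-GPU training (requires an initialized process group):
# ddp_model = nn.parallel.DistributedDataParallel(model, device_ids=[rank])
# ddp_model.autoencoder.eval()              # AttributeError, as in the traceback above
# ddp_model.module.autoencoder.eval()       # the wrapped module is reached via .module
```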
Make sure the paths of pre-trained models are correct, i.e., vqvae_pretrained_path and wav2vec2model_path in config/<vocaset|BIWI>/stage2.yaml.
cat config/vocaset/stage2.yaml
vqvae_pretrained_path: RUN/vocaset/CodeTalker_s1/model/model.pth.tar
wav2vec2model_path: facebook/wav2vec2-base-960h
Where do I get RUN/vocaset/CodeTalker_s1/model/model.pth.tar and facebook/wav2vec2-base-960h?
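As a point of reference, facebook/wav2vec2-base-960h is a Hugging Face Hub model identifier rather than a file shipped with the repository; the transformers library downloads and caches it on first use, whereas the RUN/vocaset/CodeTalker_s1/model/model.pth.tar checkpoint is produced by the stage-1 training run. A minimal sketch of how such an identifier resolves:

```python
from transformers import Wav2Vec2Model, Wav2Vec2Processor

# Passing the hub identifier triggers an automatic download into the local
# Hugging Face cache; a local directory containing the same files also works.
processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-base-960h")
wav2vec = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base-960h")
```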
Maybe the test_pred.py condition subject should be test_subjects, not train_subjects.
Line 48 in e687bbe
I completed both stages of training on the vocaset dataset, but when I test with input audio, the results are not good: apart from the first few frames, the face barely moves afterwards.
Data Preparation
Place your vertices data (.npy files) and audio data (.wav files) in the <dataset_dir>/vertices_npy and <dataset_dir>/wav folders, respectively.
Save the templates of all subjects to a templates.pkl file and put it in <dataset_dir>, as done for the BIWI and vocaset datasets. Export an arbitrary template to .ply format and put it in <dataset_dir>/.
A question about the Data Preparation step of "Play with Your Own Data":
.npy
templates.pkl
.ply
How are these files prepared?
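One hedged sketch of how such files are commonly produced (the exact array shapes and subject/sequence naming must match what the repository's data loader expects, so they should be checked against the BIWI/vocaset loaders; trimesh is used here only as one convenient way to write a .ply, and all sizes below are hypothetical):

```python
import os
import pickle
import numpy as np
import trimesh

dataset_dir = "my_dataset"                               # stands in for <dataset_dir>
os.makedirs(os.path.join(dataset_dir, "vertices_npy"), exist_ok=True)
os.makedirs(os.path.join(dataset_dir, "wav"), exist_ok=True)

# Per-sequence vertex animation: one .npy per clip, assumed here to be a
# (num_frames, num_vertices * 3) float array.
verts = np.zeros((120, 5023 * 3), dtype=np.float32)
np.save(os.path.join(dataset_dir, "vertices_npy", "subject01_sentence01.npy"), verts)

# templates.pkl: a pickled dict mapping each subject name to its neutral-face
# template vertices, assumed here to be a (num_vertices, 3) array per subject.
templates = {"subject01": np.zeros((5023, 3), dtype=np.float32)}
with open(os.path.join(dataset_dir, "templates.pkl"), "wb") as f:
    pickle.dump(templates, f)

# An arbitrary template exported to .ply; the face/topology array is a
# hypothetical placeholder and only matters for rendering the results.
faces = np.zeros((9976, 3), dtype=np.int64)
trimesh.Trimesh(vertices=templates["subject01"], faces=faces,
                process=False).export(os.path.join(dataset_dir, "template.ply"))
```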
Why use a teacher-forcing scheme?
The teacher-forcing scheme was shown to be worse than the autoregressive scheme in several papers, such as FaceFormer and FaceXHuBERT.
sh scripts/train.sh CodeTalker_s1 config/vocaset/stage1.yaml vocaset s1
sh scripts/train.sh CodeTalker_s2 config/vocaset/stage2.yaml vocaset s2
During vocaset training, NaN appears in both the first and second stages.
[2023-04-21 18:27:21,120 INFO train_vq.py line 189 19610]=>Epoch: [1/200][60/314] Data: 0.027 (0.038) Batch: 0.076 (0.141) Remain: 02:27:53 Loss: 0.1405
[2023-04-21 18:27:22,283 INFO train_vq.py line 189 19610]=>Epoch: [1/200][70/314] Data: 0.028 (0.037) Batch: 0.077 (0.138) Remain: 02:24:02 Loss: 0.1339
[2023-04-21 18:27:23,436 INFO train_vq.py line 189 19610]=>Epoch: [1/200][80/314] Data: 0.025 (0.036) Batch: 0.070 (0.135) Remain: 02:21:09 Loss: 0.1392
[2023-04-21 18:27:24,593 INFO train_vq.py line 189 19610]=>Epoch: [1/200][90/314] Data: 0.027 (0.035) Batch: 0.144 (0.133) Remain: 02:18:51 Loss: 0.1353
[2023-04-21 18:27:25,681 INFO train_vq.py line 189 19610]=>Epoch: [1/200][100/314] Data: 0.025 (0.034) Batch: 0.068 (0.130) Remain: 02:16:20 Loss: 0.1325
[2023-04-21 18:27:26,705 INFO train_vq.py line 189 19610]=>Epoch: [1/200][110/314] Data: 0.027 (0.033) Batch: 0.072 (0.128) Remain: 02:13:38 Loss: 0.1300
[2023-04-21 18:27:27,809 INFO train_vq.py line 189 19610]=>Epoch: [1/200][120/314] Data: 0.027 (0.033) Batch: 0.075 (0.126) Remain: 02:12:05 Loss: 0.1290
[2023-04-21 18:27:28,815 INFO train_vq.py line 189 19610]=>Epoch: [1/200][130/314] Data: 0.027 (0.032) Batch: 0.139 (0.124) Remain: 02:10:00 Loss: nan
WARNING:root:NaN or Inf found in input tensor.
WARNING:root:NaN or Inf found in input tensor.
[2023-04-21 18:27:29,784 INFO train_vq.py line 189 19610]=>Epoch: [1/200][140/314] Data: 0.026 (0.032) Batch: 0.072 (0.122) Remain: 02:07:55 Loss: nan
INFO:main-logger:Epoch: [1/200][140/314] Data: 0.026 (0.032) Batch: 0.072 (0.122) Remain: 02:07:55 Loss: nan
WARNING:root:NaN or Inf found in input tensor.
WARNING:root:NaN or Inf found in input tensor.
Hello, have you tried to use blendshapes with this network? I trained under the teacher-forcing strategy, but I found that inference was very problematic; if I also use teacher-forcing-like guidance at inference time, the result is very good. But inferring like that is pointless. Do you have any suggestions for this?
Hello,
I'm trying to run the Colab online demo, but I get several different errors at runtime:
1)
```
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
torchdata 0.6.1 requires torch==2.0.1, but you have torch 1.11.0 which is incompatible.
torchtext 0.15.2 requires torch==2.0.1, but you have torch 1.11.0 which is incompatible.
```
2) `ERROR: Cannot install pyglet==1.5.27, pyopengl==3.1.5, pyrender==0.1, pyrender==0.1.1, pyrender==0.1.10, pyrender==0.1.11, pyrender==0.1.12, pyrender==0.1.13, pyrender==0.1.14, pyrender==0.1.15, pyrender==0.1.16, pyrender==0.1.17, pyrender==0.1.18, pyrender==0.1.2, pyrender==0.1.20, pyrender==0.1.21, pyrender==0.1.22, pyrender==0.1.23, pyrender==0.1.24, pyrender==0.1.25, pyrender==0.1.26, pyrender==0.1.27, pyrender==0.1.28, pyrender==0.1.29, pyrender==0.1.3, pyrender==0.1.30, pyrender==0.1.31, pyrender==0.1.32, pyrender==0.1.33, pyrender==0.1.34, pyrender==0.1.35, pyrender==0.1.36, pyrender==0.1.39, pyrender==0.1.4, pyrender==0.1.40, pyrender==0.1.41, pyrender==0.1.42, pyrender==0.1.43, pyrender==0.1.44, pyrender==0.1.45, pyrender==0.1.5, pyrender==0.1.6, pyrender==0.1.7, pyrender==0.1.8 and pyrender==0.1.9 because these package versions have conflicting dependencies.`
3)
```
Building wheels for collected packages: tokenizers, sacremoses
error: subprocess-exited-with-error
× Building wheel for tokenizers (pyproject.toml) did not run successfully.
│ exit code: 1
╰─> See above for output.
note: This error originates from a subprocess, and is likely not a problem with pip.
Building wheel for tokenizers (pyproject.toml) ... error
ERROR: Failed building wheel for tokenizers
Failed to build tokenizers
ERROR: Could not build wheels for tokenizers, which is required to install pyproject.toml-based projects
Hit:1 https://cloud.r-project.org/bin/linux/ubuntu jammy-cran40/ InRelease
Hit:2 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64 InRelease
Hit:3 http://security.ubuntu.com/ubuntu jammy-security InRelease
Hit:4 http://archive.ubuntu.com/ubuntu jammy InRelease
Hit:5 https://ppa.launchpadcontent.net/c2d4u.team/c2d4u4.0+/ubuntu jammy InRelease
Hit:6 http://archive.ubuntu.com/ubuntu jammy-updates InRelease
Hit:7 http://archive.ubuntu.com/ubuntu jammy-backports InRelease
Hit:8 https://ppa.launchpadcontent.net/deadsnakes/ppa/ubuntu jammy InRelease
Hit:9 https://ppa.launchpadcontent.net/graphics-drivers/ppa/ubuntu jammy InRelease
Hit:10 https://ppa.launchpadcontent.net/ubuntugis/ppa/ubuntu jammy InRelease
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
16 packages can be upgraded. Run 'apt list --upgradable' to see them.
```
4)
```
Traceback (most recent call last):
File "/content/CodeTalker/main/demo.py", line 9, in <module>
from transformers import Wav2Vec2Processor
ModuleNotFoundError: No module named 'transformers'
```
Do you know how to get the talking result with texture when I reconstruct a new face from the FLAME template and obtain its material file (.mtl)?
The material file references the depth image, normal image, and texture image.
Hello, your work is excellent! May I ask whether the online demo you mentioned is on Google Colab? And another question: can this be tested using an .obj file from FFHQ? I look forward to your reply.