The talkshow from yhw-yhw

Why use SMPL-X parameters instead of rot6D as the motion representation?

rot6D is a motion representation commonly used in co-speech gesture generation tasks. In the code, the author sets convert2rot6D to False. What is the reason?
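
For readers unfamiliar with the term, rot6D usually refers to the continuous 6D rotation representation that stores two columns of the rotation matrix. Below is a minimal NumPy sketch of converting an axis-angle vector (the per-joint format SMPL-X parameters use) to that form and back. The function names are hypothetical, the column convention is an assumption (conventions differ between libraries), and this is not the repository's convert2rot6D code path.

```python
import numpy as np

def axis_angle_to_matrix(aa):
    """Rodrigues' formula: axis-angle vector (3,) -> rotation matrix (3, 3)."""
    aa = np.asarray(aa, dtype=np.float64)
    angle = np.linalg.norm(aa)
    if angle < 1e-8:
        return np.eye(3)
    x, y, z = aa / angle
    # Skew-symmetric cross-product matrix of the unit axis.
    K = np.array([[0.0, -z, y],
                  [z, 0.0, -x],
                  [-y, x, 0.0]])
    return np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)

def matrix_to_rot6d(R):
    """Keep the first two columns of R, concatenated -> (6,)."""
    return np.concatenate([R[:, 0], R[:, 1]])

def rot6d_to_matrix(d6):
    """Gram-Schmidt on the two stored columns, cross product for the third."""
    a1, a2 = d6[:3], d6[3:]
    b1 = a1 / np.linalg.norm(a1)
    a2 = a2 - np.dot(b1, a2) * b1
    b2 = a2 / np.linalg.norm(a2)
    b3 = np.cross(b1, b2)
    return np.stack([b1, b2, b3], axis=1)

R = axis_angle_to_matrix([0.3, -0.2, 0.5])
d6 = matrix_to_rot6d(R)
R_back = rot6d_to_matrix(d6)
```

The round trip recovers the original matrix, which is why the 6D form can replace axis-angle without losing information, at the cost of doubling the per-joint dimensionality.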

Cannot Download LS3DCG Pretrained Checkpoints

Hi, the link for downloading the LS3DCG pre-trained models seems to be invalid. Could you update it?

ValueError: need at least one array to concatenate

Hello,

I am getting this ValueError while trying to run your scripts by following the README.md:

ValueError: need at least one array to concatenate

This is the traceback for the error:

Traceback (most recent call last):
  File "scripts/train.py", line 10, in <module>
    trainer = Trainer()
  File "/data/users1/user/TalkSHOW/trainer/Trainer.py", line 72, in __init__
    self.init_dataloader()
  File "/data/users1/user/TalkSHOW/trainer/Trainer.py", line 168, in init_dataloader
    config=self.config
  File "/data/users1/user/TalkSHOW/data_utils/dataloader_torch.py", line 255, in __init__
    self.complete_data=np.concatenate(self.complete_data, axis=0)
  File "<__array_function__ internals>", line 6, in concatenate
ValueError: need at least one array to concatenate

For now, I am trying to load only two speakers, oliver and conan, to make sure everything works smoothly.

I have changed this in apply_split.py:

speakers = ['oliver', 'conan']

I did the same in body_vq.json and set data_root to my dataset path.

Could you let me know how to fix this issue?

Thanks a bunch in advance.
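
The message itself is what NumPy raises when np.concatenate receives an empty sequence. In this traceback that suggests self.complete_data was still empty at line 255, i.e. no samples were collected for the configured speakers and data root. Here is a minimal sketch that reproduces the message and shows a clearer guard. The helper concat_or_explain and its arguments are hypothetical, not the repository's loader.

```python
import numpy as np

def concat_or_explain(chunks, data_root, speakers):
    """Concatenate per-clip arrays, failing with a clearer message if none were found."""
    if len(chunks) == 0:
        raise FileNotFoundError(
            f"No samples found under {data_root!r} for speakers {speakers}; "
            "check the data_root path, the speaker names, and the split folders."
        )
    return np.concatenate(chunks, axis=0)

# Reproduce the original message: an empty list is what triggers it.
try:
    np.concatenate([], axis=0)
except ValueError as err:
    message = str(err)

# With at least one array, concatenation works as expected.
combined = concat_or_explain([np.zeros((2, 3)), np.ones((1, 3))], "/path/to/data", ["oliver"])
```

If the guard fires, the usual things to check are whether data_root points at the folder that actually contains the speaker directories and whether the train/val/test split has been applied to them.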

Generalization Ability To Chinese

Really excellent project! I am wondering whether TalkSHOW could generalise to Chinese speech. If not, how can I make this possible? Perhaps I should create paired Chinese speech and SMPL-X labels (fitted from videos using the pipeline in your paper).

Thank you very much.

Configuration files are inconsistent

The configuration file in the ./experiments folder of the pre-trained weights is inconsistent with the one in ./talkshow/config.
e.g. ./TalkSHOW/experiments/2022-11-02-smplx_S2G-body-pixel-3d/smplx_S2G.json vs. ./TalkSHOW/config/body_pixel.json

ModuleNotFoundError: No module named 'python_speech_features'

Why does this error occur: ModuleNotFoundError: No module named 'python_speech_features'?

Are these rendered videos without sound?

The videos I have rendered have no sound. How can I combine the sound and the video?
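
One common way to add the audio track is to mux the rendered video with the original .wav file using ffmpeg. The following is a minimal sketch, assuming ffmpeg is installed and on PATH. The function names and file paths are placeholders, not part of the repository.

```python
import subprocess

def build_mux_command(video_path, audio_path, output_path):
    """Build an ffmpeg command that copies the video stream and adds the audio."""
    return [
        "ffmpeg", "-y",            # overwrite the output file if it exists
        "-i", video_path,          # input 0: silent rendered video
        "-i", audio_path,          # input 1: speech audio
        "-map", "0:v:0",           # take video from input 0
        "-map", "1:a:0",           # take audio from input 1
        "-c:v", "copy",            # do not re-encode the video
        "-c:a", "aac",             # encode audio to AAC for MP4 containers
        "-shortest",               # stop at the shorter of the two streams
        output_path,
    ]

def mux(video_path, audio_path, output_path):
    """Run ffmpeg; raises CalledProcessError if ffmpeg exits with an error."""
    subprocess.run(build_mux_command(video_path, audio_path, output_path), check=True)

cmd = build_mux_command("render.mp4", "speech.wav", "render_with_audio.mp4")
```

Calling mux("render.mp4", "speech.wav", "render_with_audio.mp4") would then write the combined file, provided the render's frame rate matches the audio duration.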

Missing train_3d_mfcc.pkl?

I get the following error when I run 'train_body_vq.sh'. It says that I need train_3d_mfcc.pkl. Where can I get it?

No problems

No problems now.

Possible Miscalculation of MSELoss in Face Generator

I would like to report a possible miscalculation of the loss in the face generator.

Issue description

Please have a look at the following code snippet:

TalkSHOW/nets/smplx_face.py

Lines 155 to 159 in 38aab30

    
    MSELoss = torch.mean(torch.abs(pred_poses[:, :, :6] - gt_poses[:, :, :6]))
    if self.expression:
        expl = torch.mean((pred_poses[:, :, -100:] - gt_poses[:, :, -100:])**2)
    else:
        expl = 0

I believe the loss calculation at line 155 is wrong. The slice should go up to index 3, not 6, because the jaw_pose has 3 dimensions.

Recall that pred_poses has shape (N, seq_length, 103), where the first 3 dimensions are the jaw_pose and the remaining 100 are the expression.

gt_poses has shape (N, seq_length, 265), where the first 3 dimensions are the jaw pose and the last 100 are the expression. The 3 dimensions right after the jaw pose are the left eye.

When we compute MSELoss = torch.mean(torch.abs(pred_poses[:, :, :6] - gt_poses[:, :, :6])), the first 3 jaw_pose features are compared correctly, but the 3 left-eye features from gt_poses are also compared against 3 expression features from pred_poses.

Proposed Fix:

I believe the correct way to calculate the loss is by changing 6 to 3, as follows:
MSELoss = torch.mean(torch.abs(pred_poses[:, :, :3] - gt_poses[:, :, :3])).

Please let me know if my assertion is correct or whether I misunderstood something.
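
The mismatch described above can be checked numerically. The sketch below uses NumPy instead of torch to keep it dependency-light, and it assumes the layouts stated in this issue (jaw pose in the first 3 dimensions of both tensors, expression in the last 100). The random data and variable names are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T = 2, 4

# Ground truth: 265 dims = jaw (3) + other pose parameters + expression (last 100).
gt_poses = rng.normal(size=(N, T, 265))

# A "perfect" prediction: 103 dims = jaw (3) + expression (100), copied from gt.
pred_poses = np.concatenate([gt_poses[:, :, :3], gt_poses[:, :, -100:]], axis=2)

# Slice as in the current code: dims 3:6 compare predicted expression
# values against ground-truth left-eye values.
loss_6 = np.mean(np.abs(pred_poses[:, :, :6] - gt_poses[:, :, :6]))

# Slice as proposed: only the jaw pose is compared.
loss_3 = np.mean(np.abs(pred_poses[:, :, :3] - gt_poses[:, :, :3]))

print(loss_6 > 0.0, loss_3 == 0.0)
```

Even though the prediction matches the ground truth exactly in every dimension it models, the 6-wide slice still reports a non-zero loss, which is the symptom the issue describes.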

Can we still expect a Colab for TalkSHOW?

Thanks for the Colab for SHOW. That's very helpful. I'm still hopeful that there will be a Colab for TalkSHOW. Please let us know if we can still expect one. Thanks again!

This is about speech-driven 3D face generation, suitable for modeling digital humans and for use in videos

Dataset Subjects Number

Hi Hongwei,

I noticed that your dataset SHOW_dataset_v1.0.zip has four subjects. However, in your data preprocessing code, there are only three persons:

Is this a mistake?

Thanks,
Jeremy

Consider avoiding psbody as the visualization library

The psbody mesh library is no longer maintained, and 99% of users hit build failures with its code (Windows, macOS, etc.).

Consider using simpler visualization tools instead.

New speaker dataset generation

Is it possible to generate the GT dataset for a new speaker and train the model on that?

Different Results with Paper

I got very different metric values for the face using your provided checkpoint, compared with the values on paper.

On paper:

I tested:

Questions about autoregressive models

Hello everyone, I have some questions.

When I was training a pixel autoregressive model, I encountered the following two problems:

1. Gated PixelCNN overfits very easily. After 1 or 2 epochs, the val loss keeps rising and never comes back down.
2. The autoregressive model suffers from identity leakage. For example, when generating for speakerA, the actions and gestures of speakerB appear.

Does anyone have strategies for solving these problems?

@yhw-yhw @feifeifeiliu

Is there any code to make our own dataset?

That is, to go from a short video to .pkl and .wav files.

Is it real-time?

Does it support streaming voice input?

Windows 11

Are there any clear instructions to follow for Windows 11 with a Conda environment?

Runtime error on Huggingface demo

There is a runtime error on the Hugging Face demo.

File body_pixel.json is Missing

The ./config/body_pixel.json is missing

What does convert_to_6d mean?

File config/style_gestures.json is Missing

I did not find config/style_gestures.json in your repo. When I run test.py, it reports "FileNotFoundError: [Errno 2] No such file or directory: './config/style_gestures.json'". When I run test_face.sh, it reports "ValueError: need at least one array to concatenate". I am not sure how to fix this problem.

Avoid using MPI-Mesh lib

This library is not maintained. Please switch to pytorch3d.

rendering problem

AttributeError: 'GLXPlatform' object has no attribute 'OSMesa'

The error is raised at:
/NeRF/TalkSHOW/scripts/diversity.py(340)main()

I tried the following:
sudo apt-get install libosmesa6
sudo apt-get install libosmesa6-dev
and set os.environ['PYOPENGL_PLATFORM'] = 'osmesa', but it still doesn't work.
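
One thing worth checking, offered as an assumption rather than a confirmed diagnosis: PyOpenGL selects its platform when it is first imported, so PYOPENGL_PLATFORM has to be set before pyrender or OpenGL is imported anywhere in the process. A minimal sketch of the ordering follows. The helper osmesa_requested is hypothetical, and the pyrender import is left commented so the snippet runs without it.

```python
import os

# Must run before the first import of OpenGL/pyrender in this process;
# setting it afterwards does not change the already-selected platform.
os.environ["PYOPENGL_PLATFORM"] = "osmesa"

def osmesa_requested():
    """Return True if the environment asks PyOpenGL for the OSMesa backend."""
    return os.environ.get("PYOPENGL_PLATFORM") == "osmesa"

# import pyrender  # import only after the variable has been set

requested = osmesa_requested()
```

In practice this means placing the assignment at the very top of the entry script, above any import that might pull in OpenGL indirectly.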

new data training

Awesome work! I trained on new video data, but the test results are shaky and jittery.
The test data is the same as the training data, and the scores are:
LVD=0.06410979024071979
error=10.288773031787722
diverse=0.006435648955709916
Is the training data too small, or are some training parameters set incorrectly?
Or are the SHOW outputs themselves shaky?

Missing 'var' in scripts/diversity.py

The 'var' is missing in scripts/diversity.py

Missing Realism Score (RS) Computation

I did not find the RS computation code in your repo. Could you release the code for RS evaluation and the corresponding checkpoint?
