yhw-yhw / talkshow
This is the official repository for TalkSHOW: Generating Holistic 3D Human Motion from Speech [CVPR2023].
rot6D is a rotation representation commonly used in co-speech gesture generation. In the code, the author sets convert2rot6D to False. What is the reason for this?
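For context, the 6D representation (Zhou et al., 2019) keeps the first two columns of a joint's 3x3 rotation matrix, so each joint grows from 3 axis-angle values to 6 features. The sketch below converts one axis-angle vector to 6D in plain Python via Rodrigues' formula. It is only an illustration, not the repository's convert2rot6D code path, and some libraries take the first two rows instead of the first two columns, so the ordering is a convention you should check.

```python
import math


def axis_angle_to_matrix(v):
    """Rodrigues' formula: axis-angle vector (3,) -> 3x3 rotation matrix (row-major lists)."""
    theta = math.sqrt(sum(c * c for c in v))
    if theta < 1e-8:
        # Near-zero rotation: return the identity.
        return [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    x, y, z = (c / theta for c in v)  # unit rotation axis
    c, s = math.cos(theta), math.sin(theta)
    C = 1.0 - c
    return [
        [c + x * x * C, x * y * C - z * s, x * z * C + y * s],
        [y * x * C + z * s, c + y * y * C, y * z * C - x * s],
        [z * x * C - y * s, z * y * C + x * s, c + z * z * C],
    ]


def matrix_to_rot6d(R):
    """6D representation: first two columns of R, flattened column by column."""
    return [R[0][0], R[1][0], R[2][0], R[0][1], R[1][1], R[2][1]]


def axis_angle_to_rot6d(v):
    """3 axis-angle values per joint -> 6 features per joint."""
    return matrix_to_rot6d(axis_angle_to_matrix(v))
```

For example, axis_angle_to_rot6d([0.0, 0.0, math.pi / 2]) (a 90 degree turn about z) gives approximately [0, 1, 0, -1, 0, 0].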
Hi, it seems that the link for downloading the LS3DCG pre-trained models is invalid. Could you update the link?
Hello,
I am getting the following ValueError when running your scripts as described in README.md:
ValueError: need at least one array to concatenate
This is the traceback:
Traceback (most recent call last):
  File "scripts/train.py", line 10, in <module>
    trainer = Trainer()
  File "/data/users1/user/TalkSHOW/trainer/Trainer.py", line 72, in __init__
    self.init_dataloader()
  File "/data/users1/user/TalkSHOW/trainer/Trainer.py", line 168, in init_dataloader
    config=self.config
  File "/data/users1/user/TalkSHOW/data_utils/dataloader_torch.py", line 255, in __init__
    self.complete_data=np.concatenate(self.complete_data, axis=0)
  File "<__array_function__ internals>", line 6, in concatenate
ValueError: need at least one array to concatenate
For now, I am trying to load only two speakers, oliver and conan, to make sure everything works smoothly. I changed this in apply_split.py:
speakers = ['oliver', 'conan']
I made the same change in body_vq.json and set data_root to the dataset path.
Could you let me know how to fix this issue?
Thanks a bunch in advance.
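np.concatenate raises exactly this ValueError when it is given an empty list, which suggests that no sequences were collected for the selected speakers. A minimal pre-flight check could look like the sketch below. It assumes a hypothetical layout with one directory per speaker directly under data_root, which may not match the real dataset structure, and both helper names are made up for illustration.

```python
from pathlib import Path


def missing_speakers(data_root, speakers):
    """Return the speakers that have no directory under data_root.

    Assumes a hypothetical data_root/<speaker>/... layout; adjust to the real one.
    """
    root = Path(data_root)
    return [s for s in speakers if not (root / s).is_dir()]


def require_nonempty(chunks, speakers):
    """Fail with a clearer message before np.concatenate receives an empty list."""
    if not chunks:
        raise RuntimeError(
            f"no data was loaded for speakers {speakers}; "
            "check data_root and the split files"
        )
    return chunks
```

Running missing_speakers("/path/to/dataset", ["oliver", "conan"]) before training quickly shows whether data_root points at the directory the loader expects.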
Really excellent project! I am wondering whether TalkSHOW can generalize to Chinese speech. If not, how could I make this possible? Perhaps I should create paired Chinese speech/SMPL-X labels (fitted from videos using the pipeline in your paper).
Thank you very much.
The configuration file in the ./experiments folder of the pre-trained weights is inconsistent with the one in ./talkshow/config.
For example: ./TalkSHOW/experiments/2022-11-02-smplx_S2G-body-pixel-3d/smplx_S2G.json vs. ./TalkSHOW/config/body_pixel.json
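To see exactly where two such configurations diverge, a small recursive diff over the parsed JSON can help. This is a generic sketch with made-up function names, not a tool shipped with the repository.

```python
import json


def diff_configs(a, b, prefix=""):
    """Recursively list (key_path, value_in_a, value_in_b) for every differing key."""
    diffs = []
    for key in sorted(set(a) | set(b)):
        path = f"{prefix}.{key}" if prefix else key
        if key not in a:
            diffs.append((path, "<missing>", b[key]))
        elif key not in b:
            diffs.append((path, a[key], "<missing>"))
        elif isinstance(a[key], dict) and isinstance(b[key], dict):
            # Descend into nested sections such as model or training params.
            diffs.extend(diff_configs(a[key], b[key], path))
        elif a[key] != b[key]:
            diffs.append((path, a[key], b[key]))
    return diffs


def diff_config_files(path_a, path_b):
    """Load two JSON config files and diff them."""
    with open(path_a) as fa, open(path_b) as fb:
        return diff_configs(json.load(fa), json.load(fb))
```

For the pair above: for row in diff_config_files("./TalkSHOW/experiments/2022-11-02-smplx_S2G-body-pixel-3d/smplx_S2G.json", "./TalkSHOW/config/body_pixel.json"): print(row)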
Why does this error occur: ModuleNotFoundError: No module named 'python_speech_features'?
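This error usually means the package is simply not installed in the active environment; it is published on PyPI as python_speech_features, so pip install python_speech_features should resolve it. A quick standard-library check for several imports at once might look like this (the module list is only an example):

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # For these three modules the import name matches the pip package name;
    # that is not true of every package.
    for name in missing_modules(["python_speech_features", "numpy", "torch"]):
        print(f"missing: {name} (try: pip install {name})")
```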
The videos I rendered have no sound. How can I combine the audio with the video?
No problems now.
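One common way to add the speech track back is to mux the rendered video with the input .wav file using ffmpeg (assumed to be installed and on PATH). The sketch below builds and runs that command; the file names are placeholders.

```python
import subprocess


def mux_command(video_path, audio_path, out_path):
    """Build an ffmpeg command that copies the video stream and adds the audio."""
    return [
        "ffmpeg", "-y",
        "-i", video_path,               # input 0: silent rendered video
        "-i", audio_path,               # input 1: speech audio (.wav)
        "-map", "0:v:0", "-map", "1:a:0",
        "-c:v", "copy",                 # keep the rendered frames as-is
        "-c:a", "aac",                  # encode audio for the .mp4 container
        "-shortest",                    # stop at the shorter of the two streams
        out_path,
    ]


def mux(video_path, audio_path, out_path):
    subprocess.run(mux_command(video_path, audio_path, out_path), check=True)
```

For example, mux("render.mp4", "speech.wav", "render_with_audio.mp4").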
I would like to report a possible miscalculation of the loss in the face generator.
Please have a look at the following code snippet:
Lines 155 to 159 in 38aab30
I believe the loss calculation at line 155 is wrong: the slice should go up to index 3, not 6, because jaw_pose has only 3 dimensions.
I would like to remind you that pred_poses has shape (N, seq_length, 103), where the first 3 dimensions are the jaw_pose and the remaining 100 are the expression. gt_poses has shape (N, seq_length, 265), where the first 3 dimensions are the jaw pose, the next 3 are the left eye, and the last 100 are the expression.
So when we compute
MSELoss = torch.mean(torch.abs(pred_poses[:, :, :6] - gt_poses[:, :, :6]))
we correctly compare the 3 jaw_pose features, but we also compare the 3 left-eye features from gt_poses with 3 expression features from pred_poses. (Despite its name, this line computes a mean absolute error, not a mean squared error.)
I believe the correct way to calculate the loss is to change 6 to 3:
MSELoss = torch.mean(torch.abs(pred_poses[:, :, :3] - gt_poses[:, :, :3]))
Please let me know whether my assertion is correct or whether I misunderstood something.
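The slicing mismatch can be reproduced without the model. The plain-Python sketch below stands in for torch.mean(torch.abs(...)) over a single sequence of per-frame vectors, using only the layout described in the report (jaw in dims 0 to 2 of both tensors, left eye in dims 3 to 5 of gt_poses, expression in dims 3 to 102 of pred_poses and in the last 100 dims of gt_poses). The expression slices are an extra illustration of how matching features would line up under that layout, not something the report proposes.

```python
# Layout as described in the report; the middle of the gt vector is left unspecified.
PRED_DIM, GT_DIM, JAW_DIM, EXP_DIM = 103, 265, 3, 100

JAW = slice(0, JAW_DIM)                        # jaw_pose in both pred and gt
PRED_EXP = slice(JAW_DIM, JAW_DIM + EXP_DIM)   # pred: dims 3..102
GT_EXP = slice(GT_DIM - EXP_DIM, GT_DIM)       # gt: last 100 dims


def mean_abs(pred_frames, gt_frames, pred_sl, gt_sl):
    """Mean absolute difference between matched slices of per-frame vectors."""
    total, count = 0.0, 0
    for p, g in zip(pred_frames, gt_frames):
        ps, gs = p[pred_sl], g[gt_sl]
        assert len(ps) == len(gs), "slices must select the same number of features"
        total += sum(abs(a - b) for a, b in zip(ps, gs))
        count += len(ps)
    return total / count


# One frame where the jaw matches exactly but the gt left eye is nonzero.
pred = [[1.0] * 3 + [0.0] * 100]
gt = [[1.0] * 3 + [5.0] * 3 + [0.0] * (GT_DIM - 6)]
```

Here mean_abs(pred, gt, JAW, JAW) is 0.0, while the :6 version, mean_abs(pred, gt, slice(0, 6), slice(0, 6)), reports 2.5 purely because left-eye values are compared against expression values.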
Thanks for the Colab for SHOW. That's very helpful. I'm still hopeful that there will be a Colab for TalkSHOW. Please let us know if we can still expect one. Thanks again!
Hi Hongwei,
I noticed that your dataset SHOW_dataset_v1.0.zip has four subjects. However, in your data preprocessing code, there are only three persons:
Is this a mistake?
Thanks,
Jeremy
This mesh package is no longer maintained, and 99% of users run into build failures with its code (Windows, macOS, etc.).
Consider using simpler visualization tools instead.
Is it possible to generate the GT dataset for a new speaker and train the model on that?
Hello everyone, I have a question:
When I was training the pixel autoregressive model, I encountered the following two problems:
Does anyone have a strategy for solving them?
How can I convert a short video into .pkl and .wav files?
Does it support streaming voice input?
Are there any clear instructions to follow for Windows 11 with a Conda environment?
There is a runtime error on Hugging Face.
The file ./config/body_pixel.json is missing.
I could not find config/style_gestures.json in your repo. When I run test.py, it reports "FileNotFoundError: [Errno 2] No such file or directory: './config/style_gestures.json'", and when I run test_face.sh, it reports "ValueError: need at least one array to concatenate". I'm not sure how to fix this problem.
This library is not maintained; please switch to PyTorch3D.
AttributeError: 'GLXPlatform' object has no attribute 'OSMesa'
/NeRF/TalkSHOW/scripts/diversity.py(340)main()
I ran:
sudo apt-get install libosmesa6
sudo apt-get install libosmesa6-dev
and set os.environ['PYOPENGL_PLATFORM'] = 'osmesa', but it still doesn't work.
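A possible cause, offered only as a hypothesis: PyOpenGL chooses its platform when the OpenGL package is first imported, and the GLXPlatform in the error suggests the variable was set after that import had already happened (for example, after pyrender was imported). A minimal sketch of the intended ordering, placed at the very top of the entry script:

```python
import os
import sys

# PyOpenGL reads PYOPENGL_PLATFORM when the OpenGL package is first imported,
# so this must run before importing pyrender or anything else that pulls in OpenGL.
if "OpenGL" in sys.modules:
    print("warning: OpenGL is already imported; PYOPENGL_PLATFORM may be ignored")
os.environ["PYOPENGL_PLATFORM"] = "osmesa"

# Import rendering libraries only after this point, e.g.:
# import pyrender
```

Exporting PYOPENGL_PLATFORM=osmesa in the shell before launching the script should have the same effect, since the variable is then present before any Python import runs.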
Awesome work! I trained on new video data, but the test results are shaky and jittery.
The test data is the same as the training data, and the scores are:
LVD=0.06410979024071979
error=10.288773031787722
diverse=0.006435648955709916
Is the training data insufficient, or are other training parameters set incorrectly?
Or are the SHOW outputs themselves shaky?
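One generic way to check whether the jitter is high-frequency noise is to low-pass the predicted pose sequence over time and compare the result. The centered moving average below is a swapped-in post-processing technique, not part of TalkSHOW, and averaging axis-angle components is only a reasonable approximation when consecutive rotations are close to each other.

```python
def smooth_sequence(frames, window=5):
    """Centered moving average over time for a list of per-frame pose vectors.

    Near the ends of the sequence the window is truncated rather than padded.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    n = len(frames)
    out = []
    for t in range(n):
        lo, hi = max(0, t - half), min(n, t + half + 1)
        span = frames[lo:hi]
        # Average each feature dimension across the frames in the window.
        out.append([sum(vals) / len(span) for vals in zip(*span)])
    return out
```

A larger window removes more jitter but also flattens fast, intentional gestures, so it is worth trying a few sizes.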
I did not find the RS computation code in your repo. Could you release the code for RS evaluation and the corresponding checkpoint?