
fastmetro's People

Contributors

fastmetro, jhcho99

fastmetro's Issues

FLOPs

Hi, thanks for this work. I wanted to ask how you calculated the FLOPs for each of your models, and whether there was a particular library you used.
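For reference, a common way to estimate FLOPs for a PyTorch model is a profiling package such as thop; the sketch below profiles a stand-in ResNet-50 and is only an illustration, not necessarily the method used in the paper (which is what this issue asks about).

    import torch
    import torchvision
    from thop import profile

    # Stand-in backbone; swap in the actual model to be profiled.
    model = torchvision.models.resnet50().eval()
    dummy = torch.randn(1, 3, 224, 224)  # one 224x224 RGB image

    macs, params = profile(model, inputs=(dummy,))
    print(f"MACs: {macs / 1e9:.2f} G, Params: {params / 1e6:.2f} M")
    # thop counts multiply-accumulates; FLOPs are often reported as 2 * MACs,
    # and custom transformer ops may need hand-registered counting hooks.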

3DPW experimental results without training on 3DPW training data

Hi, some HMR papers report 3DPW evaluation results without using 3DPW training data. For a fair comparison, I took your released checkpoint [FastMETRO-S-R50_h36m_state_dict.bin], evaluated it on the 3DPW test set, and got an MPVPE of ~129. This result is actually worse than some papers that did not train on 3DPW (3DCrowdNet, PyMAF, ...).
I then fine-tuned this checkpoint on the 3DPW training data and got an MPVPE of ~93. The gain from fine-tuning on 3DPW is surprisingly larger than for other papers (e.g. PyMAF).
Do you know why the model does not perform particularly well before fine-tuning on 3DPW, yet reaches SOTA performance after fine-tuning?
Thank you!

Linked model licensing

Just wanted to confirm that the trained non-parametric models and the src/data/*.pt files are MIT-licensed, since the repository contains an MIT license.

About the best training args on the FreiHAND dataset

Hello, I am very interested in your work.
Were the experimental results on the FreiHAND dataset in your paper achieved with the following command and the parameters defined in parse_args()?

python -m torch.distributed.launch --nproc_per_node=4 \
       src/tools/run_fastmetro_handmesh.py \
       --train_yaml freihand/train.yaml \
       --val_yaml freihand/test.yaml \
       --arch hrnet-w64 \
       --model_name FastMETRO-L \
       --num_workers 4 \
       --per_gpu_train_batch_size 16 \
       --per_gpu_eval_batch_size 16 \
       --lr 1e-4 \
       --num_train_epochs 200 \
       --output_dir FastMETRO-L-H64_freihand/

and

def parse_args():
    parser = argparse.ArgumentParser()
    #########################################################
    # Data related arguments
    #########################################################
    parser.add_argument("--data_dir", default='datasets', type=str, required=False,
                        help="Directory with all datasets, each in one subfolder")
.......

In particular, regarding the loss-weight parameters, I would like to know whether you trained with the following settings:

    parser.add_argument("--joints_2d_loss_weight", default=100.0, type=float)
    parser.add_argument("--vertices_3d_loss_weight", default=100.0, type=float)
    parser.add_argument("--edge_normal_loss_weight", default=100.0, type=float)
    parser.add_argument("--joints_3d_loss_weight", default=1000.0, type=float)
    parser.add_argument("--vertices_fine_loss_weight", default=0.50, type=float) 
    parser.add_argument("--vertices_coarse_loss_weight", default=0.50, type=float)
    parser.add_argument("--edge_gt_loss_weight", default=1.0, type=float) 
    parser.add_argument("--normal_loss_weight", default=0.1, type=float)

I would be very grateful if you could reply. Have a good day!
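For what it's worth, weights like these are normally applied as a weighted sum of the individual loss terms. The sketch below (with made-up scalar losses, and a composition that may not match run_fastmetro_handmesh.py exactly) only illustrates that pattern using the default values quoted above.

    import torch

    # Hypothetical per-term losses, just to make the weighted-sum pattern concrete.
    loss_3d_joints, loss_2d_joints = torch.tensor(0.02), torch.tensor(0.05)
    loss_vertices_fine, loss_vertices_coarse = torch.tensor(0.03), torch.tensor(0.04)
    loss_edge, loss_normal = torch.tensor(0.01), torch.tensor(0.02)

    total_loss = (1000.0 * loss_3d_joints                              # joints_3d_loss_weight
                  + 100.0 * loss_2d_joints                             # joints_2d_loss_weight
                  + 100.0 * (0.5 * loss_vertices_fine                  # vertices_3d * fine/coarse weights
                             + 0.5 * loss_vertices_coarse)
                  + 100.0 * (1.0 * loss_edge + 0.1 * loss_normal))     # edge_normal * edge_gt/normal weights
    print(total_loss)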

About 3DPW evaluation

Hi! Thanks for the good work!
I have a question about how you evaluate on multi-person sequences in the 3DPW dataset: do you evaluate only one person per sequence, or every person one by one?

Demo body mesh result with FastMETRO-S-R50

Thank you for sharing your work.

I followed your instructions in the Demo section and got the same results with the FastMETRO-L models.
However, the body mesh predicted with the FastMETRO-S models is very strange.
I changed args.model_name to FastMETRO-S before running the demo.

Do I need to change anything else? Can you please take a look at it?

About the performance difference between different `PA_MPJPE` calculation methods on hands

Hi, I adopted the reconstruction_error function in utils/metric_pampjpe.py to calculate the PA_MPJPE metric on hands (in the repo it is originally used for the body), and I'm pretty sure the result is aligned (i.e. I subtract the position of the wrist).
But the result is worse than the de-facto standard way of calculating PA_MPJPE for hands, which is what the provided code does: saving the results as pred.json and using eval.py from FreiHAND to compute the aligned result (which, to my knowledge, is the final PA_MPJPE).
The result from reconstruction_error is roughly 24~25 mm, whereas the pred.json method mentioned above gives about 6~7 mm. This huge difference bothers me.
Could you please explain why the two methods differ so much? Your reply would be highly appreciated.
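For context, PA-MPJPE is computed after a rigid-plus-scale (Procrustes) alignment of the prediction to the ground truth. Below is a generic, self-contained sketch of that alignment, not necessarily identical to utils/metric_pampjpe.py or to FreiHAND's eval.py.

    import numpy as np

    def procrustes_align(pred, gt):
        """Similarity transform (scale, rotation, translation) that best aligns
        pred (N, 3) to gt (N, 3); this is the 'PA' step behind PA-MPJPE."""
        mu1, mu2 = pred.mean(0), gt.mean(0)
        X1, X2 = pred - mu1, gt - mu2
        var1 = (X1 ** 2).sum()
        K = X1.T @ X2
        U, s, Vt = np.linalg.svd(K)
        Z = np.eye(3)
        Z[-1, -1] = np.sign(np.linalg.det(Vt.T @ U.T))  # ensure a proper rotation
        R = Vt.T @ Z @ U.T
        scale = np.trace(R @ K) / var1
        t = mu2 - scale * (R @ mu1)
        return scale * (R @ pred.T).T + t

    def pa_mpjpe_mm(pred, gt):
        """PA-MPJPE in millimetres for joints given in metres."""
        aligned = procrustes_align(pred, gt)
        return 1000.0 * np.linalg.norm(aligned - gt, axis=-1).mean()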

Remember to use mixed dataset when reproducing H3.6M results

Hi, I trained FastMETRO-L-H64 on H3.6M but only got this performance:
INFO:FastMETRO:Best Results: (PA-MPJPE) <MPVPE> 0.00 <MPJPE> 75.36 <PA-MPJPE> 47.05 at Epoch 60.00
I tried evaluating the official checkpoint and got the same performance as published:
INFO:FastMETRO:Validation Epoch: 0 MPVPE: 0.00, MPJPE: 52.95, PA-MPJPE: 33.58

I didn't alter any hyperparameters, except that I am using 8 V100 GPUs:
python3.8 -m torch.distributed.launch --nproc_per_node=8 --master_port=29502 \
       src/tools/run_fastmetro_bodymesh.py \
       --arch hrnet-w64 \
       --model_name FastMETRO-L \
       --num_workers 4 \
       --per_gpu_train_batch_size 16 \
       --per_gpu_eval_batch_size 16 \
       --lr 1e-4 \
       --num_train_epochs 60 \
       --output_dir FastMETRO-L-H64_h36m/

I did modify run_fastmetro_bodymesh.py by deleting all the mesh visualization code.
I am using the hrnetv2_w64_imagenet_pretrained.pth backbone.

Any clue as to what I could be doing wrong? Thanks!

FLOPs of the model

Could you provide the FLOPs of your small, medium, and large models?
Thanks!

About DDP training on specified GPUs

Hello, I am very interested in your work, but I have encountered some difficulties running the project. I would like to know whether I can select specific GPUs on a single-machine multi-GPU server to train your model.
For example, my server has 8 RTX 3080 GPUs. Can I run distributed training of your model using only GPU 4 and GPU 7?
Thank you very much.
Thank you very much.
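A common way to do this with torch.distributed.launch (general PyTorch practice, not specific to this repo) is to hide the other GPUs via CUDA_VISIBLE_DEVICES, so that GPUs 4 and 7 appear as cuda:0 and cuda:1 inside the launched processes; the remaining flags stay as in the usual training command, e.g.:

CUDA_VISIBLE_DEVICES=4,7 python -m torch.distributed.launch --nproc_per_node=2 \
       src/tools/run_fastmetro_bodymesh.py \
       --arch hrnet-w64 \
       --model_name FastMETRO-L \
       ...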

Attention mask and global attention

Hi, in your code an attention mask is applied to prevent non-adjacent vertices from attending to each other. However, in your visualization there are strong attention scores between non-adjacent vertices, e.g. between the right wrist and the head. Since the visualization code is not yet available, could you explain why this happens? This might be a dumb question, but I still hope to get an answer. Thanks so much in advance!
[screenshot: attention visualization showing strong scores between non-adjacent vertices]
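For context, here is a minimal sketch of how such an adjacency-based attention mask is typically built from mesh faces (toy faces here; the repo presumably derives its mask from the coarse mesh topology, and its construction may differ):

    import torch

    faces = torch.tensor([[0, 1, 2], [1, 2, 3]])  # hypothetical toy faces
    num_vertices = 4

    adjacency = torch.eye(num_vertices, dtype=torch.bool)  # each vertex attends to itself
    for face in faces:
        for i in face:
            for j in face:
                adjacency[i, j] = True  # vertices sharing a face are adjacent

    attention_mask = ~adjacency  # True = attention blocked for this non-adjacent pair
    print(attention_mask)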

ValueError: could not broadcast input array from shape (2048,2025,3) into shape (2048,0,3)

When I try to change the size of the input image from 224x224 to 256x192, I get an error:
ValueError: could not broadcast input array from shape (2048,2025,3) into shape (2048,0,3)

I modified self.img_res = [192, 256] in human_mesh_tsv.py. Then, when loading the data in the main program (for _, (img_keys, images, annotations) in enumerate(train_dataloader):), an error is raised, but only for some of the images. The new_x and old_x computed by the crop function in image_ops.py both have a first value larger than the second, e.g. new_x = (1957, 1934) and old_x = (0, -23), which leads to the final error. How should I change the code?
I noticed that the annotated center x coordinates of the failing images are all negative values with large absolute values (more than 100). Could that be the cause?

pretrained models

I am very interested in your work! How can I train on my own dataset, or could you provide some pretrained models?

Time cost and memory cost of FastMETRO

@FastMETRO

Hello! FastMETRO is nice work. I would like to know the training memory and time cost of the two versions of FastMETRO.

With per_gpu_train_batch_size 16 and the mixed datasets, how long does it take to train each of the two versions for one epoch on your cards, and how much GPU memory is used on a single card?

Training args?

I understand that this code repo is under development.
Could you simply tell me the training args

self.transformer_config_1 = {"model_dim": args.model_dim_1, "dropout": args.transformer_dropout, "nhead": args.transformer_nhead,
                             "feedforward_dim": args.feedforward_dim_1, "num_enc_layers": num_enc_layers, "num_dec_layers": num_dec_layers,
                             "pos_type": args.pos_type}
# configurations for the second transformer
self.transformer_config_2 = {"model_dim": args.model_dim_2, "dropout": args.transformer_dropout, "nhead": args.transformer_nhead,
                             "feedforward_dim": args.feedforward_dim_2, "num_enc_layers": num_enc_layers, "num_dec_layers": num_dec_layers,
                             "pos_type": args.pos_type}

...

self.conv_1x1 = nn.Conv2d(args.conv_1x1_dim, self.transformer_config_1["model_dim"], kernel_size=1)


for network training?

Thanks!

Demo with an object detector and projecting onto the original image

Hi,
I used an off-the-shelf object detector to get the bounding box for the input image. I fed the cropped bounding box into the model, but how can I project the mesh back onto the original input image?
I can only project the mesh back onto the bounding-box crop; when I project it onto the original image, the scale is incorrect.
Is there a formula for converting the camera parameters, or anything else I can do?

Thank you!
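For reference, one generic way to handle this (not specific to FastMETRO) is to project the mesh into the crop as usual and then map the resulting 2D pixel coordinates back to the original image using the detection box that produced the crop; the crop_to_original helper and bbox_xywh layout below are hypothetical.

    import numpy as np

    def crop_to_original(points_2d, bbox_xywh, crop_res=224):
        """Map (N, 2) pixel coordinates in the square crop back to the
        original image, given the (x0, y0, w, h) box the crop was taken from."""
        x0, y0, w, h = bbox_xywh
        scale = np.array([w / crop_res, h / crop_res])
        return points_2d * scale + np.array([x0, y0])

    # Example: the crop centre maps back to the centre of the detection box.
    print(crop_to_original(np.array([[112.0, 112.0]]), (300, 150, 224, 224)))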

Tensor size not matched during training

After running the "Training on Human3.6M" steps in Experiments.md:

python -m torch.distributed.launch --nproc_per_node=1 \
       src/tools/run_fastmetro_bodymesh.py \
       --train_yaml Tax-H36m-coco40k-Muco-UP-Mpii/train.yaml \
       --val_yaml human3.6m/valid.protocol2.yaml \
       --arch hrnet-w64 \
       --model_name FastMETRO-L \
       --num_workers 1 \
       --per_gpu_train_batch_size 16 \
       --per_gpu_eval_batch_size 16 \
       --lr 1e-4 \
       --num_train_epochs 60 \
       --output_dir FastMETRO-L-H64_h36m/

I encountered the following error:

File "/hdd/input_pruning_exp/HMR_transformer/FastMETRO/src/modeling/_smpl.py", line 99, in forward
    v_posed = v_shaped + torch.matmul(posedirs, lrotmin[:, :, None]).view(-1, 6890, 3)
RuntimeError: The size of tensor a (480) must match the size of tensor b (16) at non-singleton dimension 0
Killing subprocess 5813

It seems that the v_shaped dimension does not match posedirs and lrotmin for a batch size of 16. I have printed out the tensor sizes of v_shaped, posedirs, and lrotmin for reference:

v_shaped.shape = torch.Size([480, 6890, 3])
lrotmin[:, :, None].shape = torch.Size([16, 207, 1])
posedirs.shape = torch.Size([16, 20670, 207])
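For what it's worth, a standalone shape check (a sketch, not the repo's code) shows that this matmul is self-consistent for a batch of 16, so the error points at v_shaped arriving with an unexpected batch dimension of 480 further upstream.

    import torch

    B = 16
    posedirs = torch.randn(B, 20670, 207)   # 20670 = 6890 vertices * 3
    lrotmin = torch.randn(B, 207)           # flattened relative joint rotations
    v_shaped = torch.randn(B, 6890, 3)      # shape-blended template vertices

    offsets = torch.matmul(posedirs, lrotmin[:, :, None]).view(-1, 6890, 3)
    print(offsets.shape)                     # torch.Size([16, 6890, 3])
    v_posed = v_shaped + offsets             # fine here; fails if v_shaped has batch 480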

About pretrained CNN network

In PARE (ICCV 2021) and a number of other papers, it is mentioned that pre-training ResNet or HRNet on a pose-estimation task on the COCO dataset can help improve the performance of a human reconstruction network. However, it seems that in your experiments the backbone was pre-trained on ImageNet. Have you done any relevant experiments?
