
fastmetro's People

Contributors

fastmetro, jhcho99

fastmetro's Issues

FLOPs

Hi, thanks for this work. I wanted to ask how you calculated the FLOPs for each of your models, and whether there was a particular library you used.
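For reference, a common way to estimate FLOPs for a PyTorch model is a profiling package such as thop; the sketch below profiles a stand-in ResNet-50 and is only an illustration, not necessarily the method used in the paper (which is what this issue asks about).

    import torch
    import torchvision
    from thop import profile

    # Stand-in backbone; swap in the actual model to be profiled.
    model = torchvision.models.resnet50().eval()
    dummy = torch.randn(1, 3, 224, 224)  # one 224x224 RGB image

    macs, params = profile(model, inputs=(dummy,))
    print(f"MACs: {macs / 1e9:.2f} G, Params: {params / 1e6:.2f} M")
    # thop counts multiply-accumulates; FLOPs are often reported as 2 * MACs,
    # and custom transformer ops may need hand-registered counting hooks.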

3DPW experimental results without training on 3DPW training data

Hi, some HMR papers report 3DPW evaluation results without using 3DPW training data. For a fair comparison, I took your released checkpoint [FastMETRO-S-R50_h36m_state_dict.bin], evaluated it on the 3DPW test set, and got an MPVPE of ~129. This result is actually worse than some papers that did not train on 3DPW (3DCrowdNet, PyMAF, ...).
I then fine-tuned this checkpoint on the 3DPW training data and got an MPVPE of ~93. The gain from fine-tuning on 3DPW is surprisingly larger than for other papers (e.g. PyMAF).
Do you know why the model does not perform particularly well before fine-tuning on 3DPW, yet reaches SOTA performance after fine-tuning?
Thank you!

Linked model licensing

Just wanted to confirm that the trained non-parametric models and the src/data/*.pt files are MIT-licensed, since the repository contains an MIT license.

About the best training args on the FreiHAND dataset

Hello, I am very interested in your work.
Were the experimental results on the FreiHAND dataset in your paper achieved with the following command and the parameters defined in parse_args()?

python -m torch.distributed.launch --nproc_per_node=4 \
       src/tools/run_fastmetro_handmesh.py \
       --train_yaml freihand/train.yaml \
       --val_yaml freihand/test.yaml \
       --arch hrnet-w64 \
       --model_name FastMETRO-L \
       --num_workers 4 \
       --per_gpu_train_batch_size 16 \
       --per_gpu_eval_batch_size 16 \
       --lr 1e-4 \
       --num_train_epochs 200 \
       --output_dir FastMETRO-L-H64_freihand/

and

def parse_args():
    parser = argparse.ArgumentParser()
    #########################################################
    # Data related arguments
    #########################################################
    parser.add_argument("--data_dir", default='datasets', type=str, required=False,
                        help="Directory with all datasets, each in one subfolder")
.......

In particular, regarding the loss-weight parameters, I would like to know whether you trained with the following settings:

    parser.add_argument("--joints_2d_loss_weight", default=100.0, type=float)
    parser.add_argument("--vertices_3d_loss_weight", default=100.0, type=float)
    parser.add_argument("--edge_normal_loss_weight", default=100.0, type=float)
    parser.add_argument("--joints_3d_loss_weight", default=1000.0, type=float)
    parser.add_argument("--vertices_fine_loss_weight", default=0.50, type=float) 
    parser.add_argument("--vertices_coarse_loss_weight", default=0.50, type=float)
    parser.add_argument("--edge_gt_loss_weight", default=1.0, type=float) 
    parser.add_argument("--normal_loss_weight", default=0.1, type=float)

I would be very grateful if you could reply. Have a good day!
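For what it's worth, weights like these are normally applied as a weighted sum of the individual loss terms. The sketch below (with made-up scalar losses, and a composition that may not match run_fastmetro_handmesh.py exactly) only illustrates that pattern using the default values quoted above.

    import torch

    # Hypothetical per-term losses, just to make the weighted-sum pattern concrete.
    loss_3d_joints, loss_2d_joints = torch.tensor(0.02), torch.tensor(0.05)
    loss_vertices_fine, loss_vertices_coarse = torch.tensor(0.03), torch.tensor(0.04)
    loss_edge, loss_normal = torch.tensor(0.01), torch.tensor(0.02)

    total_loss = (1000.0 * loss_3d_joints                              # joints_3d_loss_weight
                  + 100.0 * loss_2d_joints                             # joints_2d_loss_weight
                  + 100.0 * (0.5 * loss_vertices_fine                  # vertices_3d * fine/coarse weights
                             + 0.5 * loss_vertices_coarse)
                  + 100.0 * (1.0 * loss_edge + 0.1 * loss_normal))     # edge_normal * edge_gt/normal weights
    print(total_loss)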

About 3DPW evaluation

Hi! Thanks for the good work!
I have a question about how you evaluate on multi-person sequences in the 3DPW dataset: do you evaluate only one person per sequence, or every person one by one?

Demo body mesh result with FastMETRO-S-R50

Thank you for sharing your work.

I followed your instructions in the Demo section and got the same results with the FastMETRO-L models.
However, the body mesh predicted with the FastMETRO-S models is very strange.
I changed args.model_name to FastMETRO-S before running the demo.

Do I need to change anything else? Can you please take a look at it?

About the performance difference between different `PA_MPJPE` calculation methods on hands

Hi, I adopted the reconstruction_error function in utils/metric_pampjpe.py to calculate the PA_MPJPE metric on hands (in the repo it is originally used for the body), and I'm pretty sure the result is aligned (i.e. I subtract the position of the wrist).
But the result is worse than the de-facto standard way of calculating PA_MPJPE for hands, which is what the provided code does: saving the results as pred.json and using eval.py from FreiHAND to compute the aligned result (which, to my knowledge, is the final PA_MPJPE).
The result from reconstruction_error is roughly 24~25 mm, whereas the pred.json method mentioned above gives about 6~7 mm. This huge difference bothers me.
Could you please explain why the two methods differ so much? Your reply would be highly appreciated.
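For context, PA-MPJPE is computed after a rigid-plus-scale (Procrustes) alignment of the prediction to the ground truth. Below is a generic, self-contained sketch of that alignment, not necessarily identical to utils/metric_pampjpe.py or to FreiHAND's eval.py.

    import numpy as np

    def procrustes_align(pred, gt):
        """Similarity transform (scale, rotation, translation) that best aligns
        pred (N, 3) to gt (N, 3); this is the 'PA' step behind PA-MPJPE."""
        mu1, mu2 = pred.mean(0), gt.mean(0)
        X1, X2 = pred - mu1, gt - mu2
        var1 = (X1 ** 2).sum()
        K = X1.T @ X2
        U, s, Vt = np.linalg.svd(K)
        Z = np.eye(3)
        Z[-1, -1] = np.sign(np.linalg.det(Vt.T @ U.T))  # ensure a proper rotation
        R = Vt.T @ Z @ U.T
        scale = np.trace(R @ K) / var1
        t = mu2 - scale * (R @ mu1)
        return scale * (R @ pred.T).T + t

    def pa_mpjpe_mm(pred, gt):
        """PA-MPJPE in millimetres for joints given in metres."""
        aligned = procrustes_align(pred, gt)
        return 1000.0 * np.linalg.norm(aligned - gt, axis=-1).mean()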

Remember to use mixed dataset when reproducing H3.6M results

Hi, I trained FastMETRO-L-H64 on H3.6M but only got this performance:
INFO:FastMETRO:Best Results: (PA-MPJPE) <MPVPE> 0.00 <MPJPE> 75.36 <PA-MPJPE> 47.05 at Epoch 60.00
I tried evaluating the official checkpoint and got the same performance as published:
INFO:FastMETRO:Validation Epoch: 0 MPVPE: 0.00, MPJPE: 52.95, PA-MPJPE: 33.58

I didn't alter any hyperparameters, except that I am using 8 V100 GPUs:
python3.8 -m torch.distributed.launch --nproc_per_node=8 --master_port=29502 \
       src/tools/run_fastmetro_bodymesh.py \
       --arch hrnet-w64 \
       --model_name FastMETRO-L \
       --num_workers 4 \
       --per_gpu_train_batch_size 16 \
       --per_gpu_eval_batch_size 16 \
       --lr 1e-4 \
       --num_train_epochs 60 \
       --output_dir FastMETRO-L-H64_h36m/

I did modify run_fastmetro_bodymesh.py by deleting all the mesh visualization code.
I am using the hrnetv2_w64_imagenet_pretrained.pth backbone.

Any clue as to what I could be doing wrong? Thanks!

FLOPs of the model

Could you provide the FLOPs of your small, medium, and large models?
Thanks!

About DDP training on specified GPUs

Hello, I am very interested in your work, but I have encountered some difficulties running the project. I would like to know whether I can select specific GPUs on a single-machine multi-GPU server to train your model.
For example, my server has 8 RTX 3080 GPUs. Can I run distributed training of your model using only GPU 4 and GPU 7?
Thank you very much.
Thank you very much.
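A common way to do this with torch.distributed.launch (general PyTorch practice, not specific to this repo) is to hide the other GPUs via CUDA_VISIBLE_DEVICES, so that GPUs 4 and 7 appear as cuda:0 and cuda:1 inside the launched processes; the remaining flags stay as in the usual training command, e.g.:

CUDA_VISIBLE_DEVICES=4,7 python -m torch.distributed.launch --nproc_per_node=2 \
       src/tools/run_fastmetro_bodymesh.py \
       --arch hrnet-w64 \
       --model_name FastMETRO-L \
       ...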

Attention mask and global attention

Hi, in your code an attention mask is applied to prevent non-adjacent vertices from attending to each other. However, in your visualization there are strong attention scores between non-adjacent vertices, e.g. between the right wrist and the head. Since the visualization code is not yet available, could you explain why this happens? This might be a dumb question, but I still hope to get an answer. Thanks so much in advance!
[screenshot: attention visualization showing strong scores between non-adjacent vertices]
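For context, here is a minimal sketch of how such an adjacency-based attention mask is typically built from mesh faces (toy faces here; the repo presumably derives its mask from the coarse mesh topology, and its construction may differ):

    import torch

    faces = torch.tensor([[0, 1, 2], [1, 2, 3]])  # hypothetical toy faces
    num_vertices = 4

    adjacency = torch.eye(num_vertices, dtype=torch.bool)  # each vertex attends to itself
    for face in faces:
        for i in face:
            for j in face:
                adjacency[i, j] = True  # vertices sharing a face are adjacent

    attention_mask = ~adjacency  # True = attention blocked for this non-adjacent pair
    print(attention_mask)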

ValueError: could not broadcast input array from shape (2048,2025,3) into shape (2048,0,3)

When I try to change the size of the input image from 224x224 to 256x192, I get an error:
ValueError: could not broadcast input array from shape (2048,2025,3) into shape (2048,0,3)

I modified self.img_res = [192, 256] in human_mesh_tsv.py. Then, when loading the data in the main program (for _, (img_keys, images, annotations) in enumerate(train_dataloader):), an error is raised, but only for some of the images. The new_x and old_x computed by the crop function in image_ops.py both have a first value larger than the second, e.g. new_x = (1957, 1934) and old_x = (0, -23), which leads to the final error. How should I change the code?
I noticed that the annotated center x coordinates of the failing images are all negative values with large absolute values (more than 100). Could that be the cause?

pretrained models

I am very interested in your work! How can I train on my own dataset, or could you provide some pretrained models?

Time cost and memory cost of FastMETRO

@FastMETRO

Hello! FastMETRO is nice work. I would like to know the training memory and time cost of the two versions of FastMETRO.

With per_gpu_train_batch_size 16 and the mixed datasets, how long does it take to train each of the two versions for one epoch on your cards, and how much GPU memory is used on a single card?

Training args?

I understand that this code repo is under development.
Could you simply tell me the training args

self.transformer_config_1 = {"model_dim": args.model_dim_1, "dropout": args.transformer_dropout, "nhead": args.transformer_nhead,
                             "feedforward_dim": args.feedforward_dim_1, "num_enc_layers": num_enc_layers, "num_dec_layers": num_dec_layers,
                             "pos_type": args.pos_type}
# configurations for the second transformer
self.transformer_config_2 = {"model_dim": args.model_dim_2, "dropout": args.transformer_dropout, "nhead": args.transformer_nhead,
                             "feedforward_dim": args.feedforward_dim_2, "num_enc_layers": num_enc_layers, "num_dec_layers": num_dec_layers,
                             "pos_type": args.pos_type}

...

self.conv_1x1 = nn.Conv2d(args.conv_1x1_dim, self.transformer_config_1["model_dim"], kernel_size=1)


for network training?

Thanks!

Demo with an object detector and projecting onto the original image

Hi,
I used an off-the-shelf object detector to get the bounding box for the input image. I fed the cropped bounding box into the model, but how can I project the mesh back onto the original input image?
I can only project the mesh back onto the bounding-box crop; when I project it onto the original image, the scale is incorrect.
Is there a formula for converting the camera parameters, or anything else I can do?

Thank you!
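For reference, one generic way to handle this (not specific to FastMETRO) is to project the mesh into the crop as usual and then map the resulting 2D pixel coordinates back to the original image using the detection box that produced the crop; the crop_to_original helper and bbox_xywh layout below are hypothetical.

    import numpy as np

    def crop_to_original(points_2d, bbox_xywh, crop_res=224):
        """Map (N, 2) pixel coordinates in the square crop back to the
        original image, given the (x0, y0, w, h) box the crop was taken from."""
        x0, y0, w, h = bbox_xywh
        scale = np.array([w / crop_res, h / crop_res])
        return points_2d * scale + np.array([x0, y0])

    # Example: the crop centre maps back to the centre of the detection box.
    print(crop_to_original(np.array([[112.0, 112.0]]), (300, 150, 224, 224)))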

Tensor size not matched during training

After running the "Training on Human3.6M" steps in Experiments.md:

python -m torch.distributed.launch --nproc_per_node=1 \
       src/tools/run_fastmetro_bodymesh.py \
       --train_yaml Tax-H36m-coco40k-Muco-UP-Mpii/train.yaml \
       --val_yaml human3.6m/valid.protocol2.yaml \
       --arch hrnet-w64 \
       --model_name FastMETRO-L \
       --num_workers 1 \
       --per_gpu_train_batch_size 16 \
       --per_gpu_eval_batch_size 16 \
       --lr 1e-4 \
       --num_train_epochs 60 \
       --output_dir FastMETRO-L-H64_h36m/

I encountered the following error:

File "/hdd/input_pruning_exp/HMR_transformer/FastMETRO/src/modeling/_smpl.py", line 99, in forward
    v_posed = v_shaped + torch.matmul(posedirs, lrotmin[:, :, None]).view(-1, 6890, 3)
RuntimeError: The size of tensor a (480) must match the size of tensor b (16) at non-singleton dimension 0
Killing subprocess 5813

It seems that the v_shaped dimension does not match posedirs and lrotmin for a batch size of 16. I have printed out the tensor sizes of v_shaped, posedirs, and lrotmin for reference:

v_shaped.shape = torch.Size([480, 6890, 3])
lrotmin[:, :, None].shape = torch.Size([16, 207, 1])
posedirs.shape = torch.Size([16, 20670, 207])
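For what it's worth, a standalone shape check (a sketch, not the repo's code) shows that this matmul is self-consistent for a batch of 16, so the error points at v_shaped arriving with an unexpected batch dimension of 480 further upstream.

    import torch

    B = 16
    posedirs = torch.randn(B, 20670, 207)   # 20670 = 6890 vertices * 3
    lrotmin = torch.randn(B, 207)           # flattened relative joint rotations
    v_shaped = torch.randn(B, 6890, 3)      # shape-blended template vertices

    offsets = torch.matmul(posedirs, lrotmin[:, :, None]).view(-1, 6890, 3)
    print(offsets.shape)                     # torch.Size([16, 6890, 3])
    v_posed = v_shaped + offsets             # fine here; fails if v_shaped has batch 480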

About pretrained CNN network

In PARE (ICCV 2021) and a number of other papers, it is mentioned that pre-training ResNet or HRNet on a pose-estimation task on the COCO dataset can help improve the performance of a human reconstruction network. However, it seems that in your experiments the backbone was pre-trained on ImageNet. Have you done any relevant experiments?
