
improved-body-parts's Introduction

Hi there 👋

  • 🔭 I’m currently working on Multimodal Emotion Recognition and Computer Vision.
  • 🌟 I'm interested in Neural Networks and Self-Supervised Learning.
  • 🌱 I like to learn interesting theories that can be diagrammed.


improved-body-parts's People

Contributors

hellojialee


improved-body-parts's Issues

About focal l2 loss

Hi,

Thank you for sharing this great work.
I have implemented the focal L2 loss but unfortunately didn't get better results compared with the normal L2 loss.
Here are some questions about the focal L2 loss.

  1. In the paper, you mentioned that before applying the focal L2 loss, the network is first trained with the normal L2 loss. Does that mean two-stage training: the first stage trains with the L2 loss, and then the best checkpoint from that stage is used as the initial weights to train the network with the focal L2 loss? Or can you train with the focal L2 loss directly, without any pretraining stage?
  2. Is the focal L2 loss sensitive to hyper-parameters? I adopted nearly the same hyper-parameters as your implementation; I guess maybe this is the reason why I didn't get better results?
    I'm looking forward to your suggestions.
    Thank you in advance.
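
For concreteness, here is a minimal sketch of the focal L2 loss as I read it from the paper and from the settings quoted in the other issues on this page (alpha = 0.1, beta = 0.02, gamma = 2; the released code reportedly uses alpha = beta = 0 with a factor of abs(1 - st) instead). Treat the function name and the threshold as assumptions, not the repo's verified code:

import torch

def focal_l2_loss(pred, gt, mask, alpha=0.1, beta=0.02, gamma=2.0, th=0.01):
    # st rescales the prediction so that confident foreground (gt near 1) and
    # confident background (gt near 0) pixels both yield st close to 1, which
    # shrinks the focal factor for these easy samples.
    st = torch.where(gt > th, pred - alpha, 1.0 - pred - beta)
    factor = (1.0 - st) ** gamma  # paper variant; the repo uses torch.abs(1. - st)
    return (factor * (pred - gt) ** 2 * mask).sum()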

Doesn't run without GPU

$ ls link2checkpoints_distributed/
PoseNet_52_epoch_without_optimizer_statue.pth
$ python3 ./demo_image.py --resume --opt-level O0 --image ~/in.jpg --output ~/out.jpg
[...]
Traceback (most recent call last):
  File "./demo_image.py", line 631, in <module>
    loss_scale=args.loss_scale)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/apex-0.1-py3.6.egg/apex/amp/frontend.py", line 358, in initialize
    return _initialize(models, optimizers, _amp_state.opt_properties, num_losses, cast_model_outputs)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/apex-0.1-py3.6.egg/apex/amp/_initialize.py", line 171, in _initialize
    check_params_fp32(models)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/apex-0.1-py3.6.egg/apex/amp/_initialize.py", line 93, in check_params_fp32
    name, param.type()))
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/apex-0.1-py3.6.egg/apex/amp/_amp_state.py", line 32, in warn_or_err
    raise RuntimeError(msg)
RuntimeError: Found param posenet.pre.conv1.weight with type torch.FloatTensor, expected torch.cuda.FloatTensor.
When using amp.initialize, you need to provide a model with parameters
located on a CUDA device before passing it no matter what optimization level
you chose. Use model.to('cuda') to use the default device.

Please advise.
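
A possible CPU-only workaround, as a minimal sketch under assumptions (the import path, constructor arguments, and checkpoint layout below are guesses, not the repo's verified API): load the checkpoint with map_location and skip the apex call entirely.

import torch
from models.posenet import PoseNet  # assumed import path; adjust to the repo layout

model = PoseNet()  # construct with the same options used for training (assumed)
ckpt = torch.load('link2checkpoints_distributed/PoseNet_52_epoch_without_optimizer_statue.pth',
                  map_location='cpu')  # remap tensors saved on CUDA onto the CPU
model.load_state_dict(ckpt)           # or ckpt['weights'], depending on how it was saved
model.eval()
# ...then skip the amp.initialize(...) call in demo_image.py entirely; apex
# requires CUDA parameters no matter which opt_level is chosen.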

About the paper's "before generating precise ground truth Gaussian peaks"

Hi, jialee,
I was fortunate to read the SimplePose paper. Regarding the explanation of how the ground-truth Gaussian heatmaps are generated, I don't understand why the output stride R and the coordinate conversion are involved. The paper describes it as follows:

By the way, we map the pixel p at the location p(x, y) in the j-th ground truth heatmap to its original floating-point location p̃(x̃, ỹ) = (x · R + R/2 − 0.5, y · R + R/2 − 0.5) in the input image, in which R is the output stride, before generating the precise ground truth Gaussian peaks.

In heatmapper.py there are related comments, e.g.:
"x, y coordinates of centers of bigger grid; stride / 2 - 0.5 is so that the grid-cell center is used when computing the response maps"
"basically we should use center of grid, but in this place classic implementation uses left-top point."
Looking forward to your reply, thanks!
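
As I read the quoted formula, the mapping simply moves from the top-left indexing of a heatmap cell to the center of the stride-R patch it covers in the input image. A minimal sketch (the default stride value of 4 is an assumption):

def heatmap_to_image_coord(x, y, stride=4):
    # Heatmap pixel (x, y) covers a stride x stride patch of the input image;
    # the patch center lies at x * stride + stride / 2 - 0.5, taking pixel
    # centers to sit at integer coordinates, exactly as in the paper's formula.
    return x * stride + stride / 2 - 0.5, y * stride + stride / 2 - 0.5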

About pretrained model and focal loss

Hi,

  1. Could you explain the three pretrained models? How do they relate to the methods in the paper?
  2. Could you give more info on the focal loss function? The latest code in loss_model.py seems to differ from the formula in the paper (a = b = 0, and abs(1. - st) instead of (1. - st) ** gamma). The loss in loss_model_parallel.py also differs from the one in loss_model.py. Which is better?
  3. The model trained with the focal loss seems to produce more false positives and wrong connections, even though the AP is higher. Does that make sense?
  4. Could you comment on the focal loss applied to heatmaps in CornerNet? Would that also work on this model?

Thanks

Heatmap Visualization

Hello, which part of the code produces the heatmap visualization? How can I generate it?
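
Not a pointer to the repo's own code, but a minimal standalone sketch of how a predicted heatmap channel is typically overlaid on the input image (the heatmaps array below is a random placeholder standing in for the real network output):

import cv2
import numpy as np
import matplotlib.pyplot as plt

img = cv2.cvtColor(cv2.imread('input.jpg'), cv2.COLOR_BGR2RGB)
heatmaps = np.random.rand(50, 64, 64).astype(np.float32)  # placeholder; use the network output
channel = 0                                               # e.g. the first keypoint channel
hm = cv2.resize(heatmaps[channel], (img.shape[1], img.shape[0]))

plt.imshow(img)
plt.imshow(hm, alpha=0.5, cmap='jet')  # semi-transparent overlay on the image
plt.axis('off')
plt.show()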

further improvements in speed while maintaining high accuracy

Hi, nice work. But I have some recommendations to further improve the speed while maintaining high accuracy:
1. Optimize the post-processing speed (rewrite it in C++).
2. Try knowledge distillation (I think the model size could currently be reduced by 70-80% with comparable accuracy); a minimal sketch follows this list.
3. Try TensorRT.
But first I think you need to clean up the source-code architecture so that everyone can easily help you.
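
To make item 2 concrete, a minimal knowledge-distillation sketch (my own illustration, not something in this repo): the compact student is supervised by a blend of the ground-truth heatmaps and the frozen teacher's predictions.

import torch.nn.functional as F

def distillation_loss(student_out, teacher_out, gt_heatmaps, w=0.5):
    # Supervise the student with the ground truth and with the (detached)
    # predictions of the large teacher network; w balances the two terms.
    loss_gt = F.mse_loss(student_out, gt_heatmaps)
    loss_kd = F.mse_loss(student_out, teacher_out.detach())
    return (1 - w) * loss_gt + w * loss_kd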

Test without multi-scale and flip

Hi,
Thanks for your work and the project. I have run the demo with the pretrained model on some pictures. With the default settings (flip and multi-scale), the pose results are OK, similar to the OpenPose BODY25 model. But without flip and multi-scale, the results are not so good even on very simple images.
Could you help check whether there is an issue?

[Attached images: the original image, the result with flip (result_withflip), and the result without flip (result_woflip)]

Evaluation results are always zero

Average precision and average recall from evaluate.py always give zero.


I am using images of size 512 (other parameters are left at their defaults). I trained the model without a checkpoint for 20 epochs on the COCO dataset, and the qualitative results were decent. But the evaluation is zero (evaluation on the validation set with 50 ids). What could be the reason?

Thanks,
Aravind

Request for licensing information

I hope this message finds you well. I came across your project on GitHub and I'm impressed with its capabilities. However, I noticed that there is no information about the licensing of the project.

I would like to inquire about the licensing of this project and whether it can be used for commercial purposes. I am interested in using your project to develop a commercial software, and I need to know the licensing terms before proceeding.

I would greatly appreciate it if you could provide me with more information about the licensing of SimplePose. If there are any specific terms or conditions, please let me know as well.

Thank you very much for your time and assistance. I look forward to hearing from you soon.

Best regards,
Bing BAI

Out of memory

Hi, I ran python evaluation.py to evaluate the model on a 2080 Ti,
but it reports that it is out of memory.

I haven't changed anything in this repository. Why does this happen? Thank you very much.

About Fig. 4: is it the improved hourglass or not?

Hello, your work is excellent, but there is one thing I don't understand very well. Figure 4 in the paper shows an improved hourglass network, but in Figure 3 it is still labeled as a plain hourglass, and I cannot find the module from Figure 4 in your code. Excuse me, I don't know whether I have misunderstood Figure 4.

Understanding body part heatmap implementation

@hellojialee, kudos to the excellent work!

I am referring to your ground-truth heatmap generation implementation because I need to generate body-part heatmaps for my study. In short, I am looking to achieve this:
[attached reference image]

I intend to generate an 18-channel heatmap where each channel stores one body part (the line segment between wrist and elbow, the line segment between elbow and shoulder, etc.).
It is not clear to me how the elliptical Gaussian has been implemented. Can you please explain the steps in put_limb_gaussian_maps in /py_cocodata_server/py_data_heatmapper.py, since I believe this is the function doing my desired task.

Thanks in advance,
jysa01
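
Not the repo's verified implementation, but a minimal sketch of one common way to realize an "elliptical" limb Gaussian: let the response decay with the squared distance from each pixel to the joint-to-joint segment (the function and parameter names here are my own):

import numpy as np

def put_limb_gaussian(h, w, p1, p2, sigma=2.0):
    # Pixel grid in (row, col) order; xs/ys are image coordinates.
    ys, xs = np.mgrid[0:h, 0:w].astype(np.float32)
    p1 = np.asarray(p1, np.float32)
    p2 = np.asarray(p2, np.float32)
    d = p2 - p1
    seg_len2 = max(float(d @ d), 1e-6)
    # Project each pixel onto the segment, clamped to the endpoints.
    t = np.clip(((xs - p1[0]) * d[0] + (ys - p1[1]) * d[1]) / seg_len2, 0.0, 1.0)
    cx = p1[0] + t * d[0]
    cy = p1[1] + t * d[1]
    dist2 = (xs - cx) ** 2 + (ys - cy) ** 2
    return np.exp(-dist2 / (2.0 * sigma ** 2))  # elongated Gaussian along the limb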

Joints heatmaps

Hi, I want to ask what the multi-person joint heatmaps generated with the heatmapper contain.
Is it just a Gaussian around each joint location, so that the same semantic joint (e.g. left shoulder) sits on the same heatmap channel for all the human targets in the scene, but at different x, y locations?

Also, could you vectorize the joint heatmap emitter, e.g. with a rendered Gaussian? I see many loops with numpy code there, so I am wondering whether it could be vectorized with some PyTorch ops (a sketch follows below).
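
On the vectorization question, a minimal PyTorch sketch (my own, assuming a max over people is the desired reduction) that renders all Gaussians of one channel with broadcasting instead of per-person loops:

import torch

def render_joint_channel(h, w, centers, sigma=2.0):
    # centers: (N, 2) tensor of (x, y) locations of one joint type for N people.
    ys = torch.arange(h, dtype=torch.float32).view(h, 1, 1)
    xs = torch.arange(w, dtype=torch.float32).view(1, w, 1)
    dx = xs - centers[:, 0].view(1, 1, -1)                   # broadcasts to (1, w, N)
    dy = ys - centers[:, 1].view(1, 1, -1)                   # broadcasts to (h, 1, N)
    g = torch.exp(-(dx ** 2 + dy ** 2) / (2 * sigma ** 2))   # (h, w, N)
    return g.amax(dim=-1)  # one channel shared by all people: keep the max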

data/coco_masks_hdf5.py

prev_center.append(np.append(person_center, max(img_anns[p]["bbox"][2], img_anns[p]["bbox"][3])))

Here p is a fixed value; could that be a problem? And is the max of the width and height supposed to be that of the current main_persons entry?

How to separate different people's keypoints in the same image?

Hello, when I was studying CornerNet and CenterNet, the same kind of keypoint from different targets was grouped via embedding vectors, e.g. 83 channels = 80 classes + embedding vector + x/y offsets. So when there are multiple people in an image, how is the output of IMHN parsed? What does the IMHN output look like, and what do its channels mean? My understanding is that the same keypoint of all the human bodies is predicted on the same heatmap; how do you then distinguish different people? Thanks.

How much does multi-scale search contribute?

Hi, from experiments 16 vs 19 in Table 1, multi-scale search seems to contribute more than 3% AP.
Did all the other models, including your baseline in Table 1, use multi-scale search?

Data pre-processing

Thank you for a job well done! I have some doubts I'd like to ask about. In the data pre-processing, why is there a problem with center-point alignment? Is this because the size of the input image changes as it passes through the CNN?

An error during training: "may appear if you passed in a non-contiguous input"

So glad to see your project; I successfully ran the demo and created the h5 file. But when I try to train the model, an error appears:
RuntimeError: cuDNN error: CUDNN_STATUS_NOT_SUPPORTED. This error may appear if you passed in a non-contiguous input.
I really hope to get your help, thank you very much.
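
A common workaround for this cuDNN message (an assumption about the cause, not a repo-specific fix) is to make the offending tensor contiguous before it reaches the convolution:

import torch

x = torch.randn(1, 128, 128, 3)          # stand-in for the offending tensor
x = x.permute(0, 3, 1, 2).contiguous()   # permute/transpose/expand return
                                         # non-contiguous views; .contiguous()
                                         # copies into a dense layout for cuDNN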

Inference is very slow, 6 seconds per frame.

Hi, and thank you for making this code available.

I am running it on Windows, on a GTX 1080, using the demo_image.py file with the model from Google Drive, and the time it takes to detect keypoints is more than 6 seconds.

What am I doing wrong? How can I get close to the 38 fps that you mention in the README?

Thank you again!


>python demo_image.py --image input.jpg
0 neck->nose
1 neck->Reye
2 neck->Leye
3 neck->Rear
4 neck->Lear
5 nose->Reye
6 nose->Leye
7 Reye->Rear
8 Leye->Lear
9 neck->Rsho
10 Rsho->Relb
11 Relb->Rwri
12 neck->Lsho
13 Lsho->Lelb
14 Lelb->Lwri
15 neck->Rhip
16 Rhip->Rkne
17 Rkne->Rank
18 neck->Lhip
19 Lhip->Lkne
20 Lkne->Lank
21 nose->Rsho
22 nose->Lsho
23 Rsho->Rhip
24 Rhip->Lkne
25 Lsho->Lhip
26 Lhip->Rkne
27 Rear->Rsho
28 Lear->Lsho
29 Rhip->Lhip
{0: 'neck->nose',
 1: 'neck->Reye',
 2: 'neck->Leye',
 3: 'neck->Rear',
 4: 'neck->Lear',
 5: 'nose->Reye',
 6: 'nose->Leye',
 7: 'Reye->Rear',
 8: 'Leye->Lear',
 9: 'neck->Rsho',
 10: 'Rsho->Relb',
 11: 'Relb->Rwri',
 12: 'neck->Lsho',
 13: 'Lsho->Lelb',
 14: 'Lelb->Lwri',
 15: 'neck->Rhip',
 16: 'Rhip->Rkne',
 17: 'Rkne->Rank',
 18: 'neck->Lhip',
 19: 'Lhip->Lkne',
 20: 'Lkne->Lank',
 21: 'nose->Rsho',
 22: 'nose->Lsho',
 23: 'Rsho->Rhip',
 24: 'Rhip->Lkne',
 25: 'Lsho->Lhip',
 26: 'Lhip->Rkne',
 27: 'Rear->Rsho',
 28: 'Lear->Lsho',
 29: 'Rhip->Lhip',
 30: 'nose',
 31: 'neck',
 32: 'Rsho',
 33: 'Relb',
 34: 'Rwri',
 35: 'Lsho',
 36: 'Lelb',
 37: 'Lwri',
 38: 'Rhip',
 39: 'Rkne',
 40: 'Rank',
 41: 'Lhip',
 42: 'Lkne',
 43: 'Lank',
 44: 'Reye',
 45: 'Leye',
 46: 'Rear',
 47: 'Lear',
 48: 'background',
 49: 'reverseKeypoint'}
Resuming from checkpoint ......
Network weights have been resumed from checkpoint...
cuda
Selected optimization level O1:  Insert automatic casts around Pytorch functions and Tensor methods.

Defaults for this optimization level are:
enabled                : True
opt_level              : O1
cast_model_type        : None
patch_torch_functions  : True
keep_batchnorm_fp32    : None
master_weights         : None
loss_scale             : dynamic
Processing user overrides (additional kwargs that are not None)...
After processing overrides, optimization options are:
enabled                : True
opt_level              : O1
cast_model_type        : None
patch_torch_functions  : True
keep_batchnorm_fp32    : None
master_weights         : None
loss_scale             : dynamic
start processing...
the 0th keypoint detection result is :  ([(384.98810766687865, 156.99848021452428), (392.0089789786089, 140.00016588448665), (372.00392927155144, 141.9994244210869), (396.997404715929, 137.00354114471122), (339.00678492184926, 140.0066329927729), (424.0065017794617, 191.99842561943024), (304.9960763460449, 220.00916854059585), (443.0001489242592, 272.0109579295975), (292.00050351624543, 310.9984260760411), (465.0083100132065, 350.99493035095674), (293.00562399904305, 404.00513994760007), (420.99916662586236, 393.0031377139439), (349.9987046664099, 401.00452761418853), (413.99545615615057, 536.0021693790678), (351.0002542695355, 541.9933765298466), (376.0021593526506, 644.988972815169), (352.00185668667876, 677.9945526718805)], 0.9674948892626798)
processing time is 6.45740

Group

Hello, does this library have a solution for the unreasonable keypoint connections caused by the AE (associative embedding) grouping strategy used by the valid script in HigherHRNet?

it seems the inference speed is slow!

Hi, @jialee93
Thanks for your great work. When I use demo_images.py to test on my own images, each image takes about 40 seconds (Titan V)! But as you said in the README, the speed achieves real time. I don't know why and hope you can give some advice~

l2 focal loss is different from the paper

So glad to see your project; I successfully ran the demo. But I found that the L2 focal loss in this project (models/loss_model_parallel.py) sets alpha=0 and beta=0 with factor = torch.abs(1. - st), which is different from what your paper shows: alpha=0.1, beta=0.02 and gamma=2 with factor = (1. - st) ** gamma. I'm really confused about that.
I really hope to get your help, thank you very much.

With python demo_image.py

Defaults for this optimization level are:
enabled : True
opt_level : O1
cast_model_type : None
patch_torch_functions : True
keep_batchnorm_fp32 : None
master_weights : None
loss_scale : dynamic
Processing user overrides (additional kwargs that are not None)...
After processing overrides, optimization options are:
enabled : True
opt_level : O1
cast_model_type : None
patch_torch_functions : True
keep_batchnorm_fp32 : None
master_weights : None
loss_scale : dynamic
Warning: multi_tensor_applier fused unscale kernel is unavailable, possibly because apex was installed without --cuda_ext --cpp_ext. Using Python fallback. Original ImportError was: ModuleNotFoundError("No module named 'amp_C'",)
start processing...
Traceback (most recent call last):
  File "demo_image.py", line 637, in <module>
    params, model_params = config_reader()
  File "/scratch/gp/Improved-Body-Parts-master/utils/config_reader.py", line 9, in config_reader
    param = config['param']  # a dict-like type that inherits from dict
  File "/scratch/mool/ana3/envs/gpp/lib/python3.6/site-packages/configobj.py", line 554, in __getitem__
    val = dict.__getitem__(self, key)
KeyError: 'param'
(gpp) zhhu@k8s-master01:/scratch/gp/Improved-Body-Parts-master$ python demo_image.py
I am running it on Ubuntu 18.04 with CUDA 10.0, but it failed. Can you help me look at this error? Thanks!!!
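
One likely cause (my assumption, not verified against this repo): ConfigObj does not raise by default when the file it is given does not exist; it just yields an empty mapping, and the missing 'param' section then surfaces as this KeyError. A quick check, run from the repo root:

from configobj import ConfigObj

config = ConfigObj('config')  # the path is my guess at what config_reader() opens
print(list(config.keys()))    # an empty list means the file was not found, and
                              # config['param'] will then raise KeyError: 'param'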

A small question about the formulation (2)

Hi, thanks for sharing your brilliant work. I noticed that when computing Sd, both lines of the formulation give values close to 1 (the first line: when S ≈ 1, Sd ≈ 0.9; the second line: when S ≈ 0.01, Sd ≈ 0.97). How does Sd balance easy and hard samples?
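
For reference, here is formulation (2) as I reconstruct it from the numbers above; this is my reading, not a quote from the paper, and alpha = 0.1, beta = 0.02 are taken from the settings quoted in the other issues on this page:

% Reconstructed formulation (2); alpha and beta values are assumptions.
S_d =
\begin{cases}
S - \alpha,    & \text{where the ground truth is foreground (near a peak)} \\
1 - S - \beta, & \text{otherwise (background)}
\end{cases}
% Sanity check: S \approx 1 in the foreground gives S_d \approx 0.9, and
% S \approx 0.01 in the background gives S_d \approx 0.97, matching the above.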

ValueError: not enough values to unpack (expected 5, got 3)

Hi, I encountered an error when running train.py. How should I modify it?
Thank you for your answer!!!

Test phase, Epoch: 0
Traceback (most recent call last):
  File "train.py", line 206, in <module>
    test(epoch, show_image=False)
  File "train.py", line 178, in test
    images, mask_misses, heatmaps, offsets, mask_offsets = target_tuple
ValueError: not enough values to unpack (expected 5, got 3)
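
A guess at the cause (an assumption, not a confirmed fix): in this configuration the test dataloader yields only three tensors while train.py expects five, so either match the unpacking to what the loader produces or enable the extra targets in the dataset:

# If the loader yields only (images, mask_misses, heatmaps), unpack three values:
images, mask_misses, heatmaps = target_tuple
# ...or enable the offset targets in the dataset so that five tensors are produced.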
