chenyilun95 / tf-cpn

Cascaded Pyramid Network for Multi-Person Pose Estimation (CVPR 2018)

License: MIT License

Python 93.62% Shell 0.04% C++ 2.67% Cuda 3.63% Makefile 0.03%

tf-cpn's Issues

How about the training details?

Thank you very much for your work!
Could you tell me more about your training details?

What batch size did you use for training? Unfortunately, my GPU can only fit a batch size of 16 (24 runs out of memory), and I get 72.5 AP on the COCO minival dataset (yours is 72.9). I think a larger batch size could give better performance.
How many GPUs did you use, and how long did training take? I used 4 GPUs and it took 3.x days.
Also, how much memory does each of your GPU cards have?

Thanks!

Windows

Great repo! Do you have any plans to make this compatible with Windows?

Performing Inference

How do I perform inference on a single image? I want to run the model on an image and get the locations of the human joints. Can you please help me with this?

error "local variable 'label' referenced before assignment" shows if vis=True

Set vis=True
Run python mptest.py -d 0-1 -r 350
The following error is shown:

05-29 15:47:43 Current epoch is 350.
ran 0s >> << left 0s
Process Worker-1:
Traceback (most recent call last):
  File "/usr/lib/python3.5/multiprocessing/process.py", line 249, in _bootstrap
    self.run()
  File "/media/projects/tf-cpn2/models/COCO.res101.384x288.CPN/../../lib/tfflat/mp_utils.py", line 34, in run
    msg = self._func(self.id, *self.args, **self.kwargs)
  File "/media/projects/tf-cpn2/models/COCO.res101.384x288.CPN/mptest.py", line 220, in func
    return test_net(tester, logger, dets, range)
  File "/media/projects/tf-cpn2/models/COCO.res101.384x288.CPN/mptest.py", line 90, in test_net
    test_img, detail = Preprocessing(test_data[i], stage='test')
  File "/media/projects/tf-cpn2/models/COCO.res101.384x288.CPN/dataset.py", line 241, in Preprocessing
    draw_skeleton(tmpimg, label.astype(int))
UnboundLocalError: local variable 'label' referenced before assignment
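For anyone hitting the same error: the traceback suggests that draw_skeleton is called with label during the 'test' stage, where no label has been assigned. A minimal sketch of one way to guard against this is below. The function preprocess and the sample keys "joints" and "imgpath" are hypothetical stand-ins, not the repo's actual code.

```python
def preprocess(sample, stage="train", vis=False):
    """Hypothetical stand-in for Preprocessing() in dataset.py."""
    drawn = []
    label = None  # defined on every code path, so it can be tested safely below
    if stage == "train":
        # Ground-truth joints only exist for training samples (assumption).
        label = sample["joints"]
    if vis:
        # Draw the skeleton only when a label is actually available.
        if label is not None:
            drawn.append(("skeleton", label))
        drawn.append(("image", sample["imgpath"]))
    return drawn
```

With this pattern, visualization in the test stage shows only the image instead of raising UnboundLocalError.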

Question about RefineNet architecture

Hi,

I am wondering why 8x upsampling after three bottlenecks is used in RefineNet. Isn't 8x upsampling too harsh?

Thanks.

evaluation results

First of all, thanks for sharing the work. I quickly ran an AP test and got the following results. Do you know why they are so low?

python3 models/COCO.res50.256x192.CPN/mptest.py -d 0-1 -r 350
loading annotations into memory...
Done (t=2.09s)
creating index...
index created!
loading the precalcuated json files
Loading and preparing results...
4581
4581
DONE (t=2.98s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type keypoints
there are 40504 unique images
DONE (t=14.41s).
Accumulating evaluation results...
DONE (t=0.53s).
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets= 20 ] = 0.093
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets= 20 ] = 0.116
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets= 20 ] = 0.102
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets= 20 ] = 0.089
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets= 20 ] = 0.099
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 20 ] = 0.097
Average Recall (AR) @[ IoU=0.50 | area= all | maxDets= 20 ] = 0.117
Average Recall (AR) @[ IoU=0.75 | area= all | maxDets= 20 ] = 0.104
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets= 20 ] = 0.092
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets= 20 ] = 0.103
AP50
ap50 is 0.141489
ap is 0.099431

I added the AP calculation and saved the json file already

dataset config file confused

Why do you change the prefix for the train stage in COCOAllJoints.py? And why is the dataset config name 2014?

Training loss fluctuates

Hello, I am training ResNet-101 at 384x288 with batch size 16 and lr 1.5625e-05, and the loss fluctuates between 60 and 100. Is this normal?

Single-image test program

Hello, would you consider writing a test program for single images or video streams?

Train Details

First of all, thanks for sharing the work. You used 8 Titan X GPUs for training. I want to retrain the keypoint detection model on the COCO dataset as you did, but I only have one GPU (GTX 1080, 8 GB memory). Can I complete the retraining with this GPU? And how much GPU memory does retraining need at minimum?
Thank you

Ensemble in the paper?

Hi, what ensemble method did you use in your paper? Which models did you use, and how did you ensemble them? Thanks.

ImportError: libcudart.so.8.0: cannot open shared object file: No such file or directory

Traceback (most recent call last):
  File "mptest.py", line 17, in <module>
    from lib_kernel.lib_nms.gpu_nms import gpu_nms
ImportError: libcudart.so.8.0: cannot open shared object file: No such file or directory
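This error means the dynamic loader could not find the CUDA 8.0 runtime library that the compiled gpu_nms extension links against. As a quick diagnostic, the sketch below tries to load a shared library by name and reports whether that succeeds. The helper name can_load is hypothetical, and the check only tells you whether the loader can see the library, not how to install it.

```python
import ctypes

def can_load(library_name):
    """Return True if the dynamic loader can open the named shared library."""
    try:
        ctypes.CDLL(library_name)
        return True
    except OSError:
        # Raised when the file is missing or not on the loader's search path.
        return False

if __name__ == "__main__":
    print("libcudart.so.8.0 loadable:", can_load("libcudart.so.8.0"))
```

If this prints False while a different CUDA version is installed, the extension was most likely built against a CUDA release other than the one on the machine, and rebuilding it is probably needed.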

About the provided human detection boxes results

Hello, could you share the total AP and human AP of the model that produced the human detection boxes you provide? Thanks a lot.

About heatmap generation

tf-cpn/models/COCO.res50.256x192.CPN/dataset.py

Line 156 in 7c01aba

ret[i][j] /= am / 255

I am curious about how the heatmap is generated. Why multiply by 255 here? It seems that the network cannot converge when I remove the 255.
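To make the effect of that line concrete: dividing by am / 255 rescales a heatmap so that its peak becomes 255, assuming am holds the heatmap's maximum value. A minimal NumPy sketch under that assumption follows. The function gaussian_heatmap and its parameters are illustrative only and are not taken from dataset.py.

```python
import numpy as np

def gaussian_heatmap(height, width, cx, cy, sigma=2.0, peak=255.0):
    """Build a 2D Gaussian centered at (cx, cy) and rescale its peak to `peak`."""
    ys, xs = np.mgrid[0:height, 0:width]
    hm = np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2.0 * sigma ** 2))
    am = hm.max()  # assumed meaning of `am` in the quoted line
    if am > 0:
        hm /= am / peak  # same form as: ret[i][j] /= am / 255
    return hm

hm = gaussian_heatmap(64, 48, cx=10, cy=20)
```

Removing the 255 leaves the targets in the range [0, 1], which makes every regression target (and therefore the loss and its gradients) 255 times smaller. That would be one plausible reason the same learning rate no longer converges.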

calculate BN on multi GPUs?

Hi, thanks for your great work. I have a question about the BN calculation. In your code, BN statistics are computed on a single GPU only. What if I want to compute BN across multiple GPUs to make the normalization step more accurate? It may help the network converge better. Thanks.

Batch size

How do you think a batch size of 8, compared with the batch size you used, would affect the final output? Have you experimented with batch size at all to see its effect on the results?

How about training from scratch?

Hi! Thanks for providing such wonderful work.
Have you tried a ResNet backbone without ImageNet pretraining?
Is it possible that the pretrained model is one of the keys to the performance improvement?

About last_fm = None

Hi, thanks for sharing nice work.
In the create_global_net function of network.py, last_fm is defined as None. However, that would prevent the upsampling and pixel-wise summation of feature maps described in your paper (Fig1).

Which one is correct? The code or Fig1 of paper?

About saving model during training

The original code saves the model after every epoch. What if I want to save the model only when the validation loss decreases? How can I add evaluation code during training? Thanks.
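One common pattern for this is to track the best validation loss seen so far and call the save routine only when it improves. The sketch below is framework-agnostic. BestCheckpointKeeper and save_fn are hypothetical names, and in practice save_fn would wrap whatever the trainer uses to write a snapshot.

```python
class BestCheckpointKeeper:
    """Call `save_fn(epoch)` only when the validation loss improves."""

    def __init__(self, save_fn):
        self.save_fn = save_fn
        self.best_loss = float("inf")

    def update(self, epoch, val_loss):
        """Return True if a checkpoint was saved for this epoch."""
        if val_loss < self.best_loss:
            self.best_loss = val_loss
            self.save_fn(epoch)
            return True
        return False

saved_epochs = []
keeper = BestCheckpointKeeper(saved_epochs.append)
for epoch, loss in enumerate([3.0, 2.5, 2.7, 2.1]):
    keeper.update(epoch, loss)
```

Here only epochs 0, 1 and 3 trigger a save, because epoch 2 has a higher loss than the best so far. The validation loss itself still has to be computed by running the model on held-out data at the end of each epoch.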

how about generate the label heatmap in the 384x288 resolution and resize the label to 96x72?

Hi,
I see that you generate the label heatmap at 96x72 resolution, so [int(x/4.), int(y/4.)] is used as the center of the Gaussian blur. But this may cause a mismatch with the original coordinate, e.g. int(17/4) = 4, but 4*4 = 16. So I wonder whether I could generate the label heatmap at 384x288 resolution and then resize it to 96x72. This would be much slower than your implementation, but would it be more accurate?
Thanks in advance!
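The size of the mismatch described above can be checked with a few lines of arithmetic. This sketch assumes a stride of 4 between the input and the heatmap, as in the question. The helper names are hypothetical.

```python
def quantization_error(x, stride=4):
    """Pixels lost when x is mapped to a heatmap cell and back by truncation."""
    cell = int(x / float(stride))  # e.g. int(17 / 4.) == 4
    return x - cell * stride       # e.g. 17 - 4 * 4 == 1

def subpixel_center(x, stride=4):
    """Alternative: keep the fractional center on the heatmap grid."""
    return x / float(stride)       # e.g. 17 / 4. == 4.25

# Error for every possible x coordinate along a 384-pixel axis.
errors = [quantization_error(x) for x in range(384)]
```

Truncation loses between 0 and 3 input pixels per axis. Placing the Gaussian at the fractional center (4.25 in the example) is one way to avoid that loss without rendering the heatmap at full resolution first.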

the influence of detector

Improving the detector's human AP from 57.2 to 62.9 gives roughly an extra 1% in pose estimation. Will you release the detection results of the 62.9 human AP detector, so that the importance of the detector can be analyzed?

About BN location

Hi! Thank you for making this project open to the world!
I have run into some confusion because I am new to TensorFlow.
Did you add Batch Normalization (BN) to the layers you added, e.g. the 1*1 conv kernel and the bottleneck in RefineNet?
Could you please give me a hint about where you add BN?

JSON Training File

What are the parameters per image that are needed in the JSON training file if we want to train with our own data? Currently trying to write a script that can automate the JSON file generation for my own data. Thanks!

How about the speed?

Hi @chenyilun95, thanks for your great work. I would like to ask a question about the speed of CPN. Thanks.

About data preprocess and data augmentation

I would like to know more details about the data preprocessing, especially the cropping strategy.

e2e training or 2stage training?

I am wondering whether e2e training is the best way.

If I train the global net until it is stable, then freeze it and start training the refine net, what would the result be like? Have you tried this approach, and if so, what were the results?

FileNotFoundError: [Errno 2] No such file or directory: '.../data/COCO/dets/person_detection_minival411_human553.json'

Hello, I recently forked this project and tried to evaluate its performance, but I get an error when I run 'python3 mptest.py -d 0-1 -r 350'. It says "FileNotFoundError: [Errno 2] No such file or directory: '.../data/COCO/dets/person_detection_minival411_human553.json' ". I can't find this file in the Google Drive. Where can I download this JSON file?
Thank you very much ^_^

Why is the GPU not being used?

Thanks for your good work.
I ran the command python3 network.py -d 0-1 . Why is the GPU not being used?

annotations files not complete?

Hi, I want to evaluate your pretrained model. I downloaded the "minival annotation" file and the "Person detection results in COCO Minival" file from your link, but it seems that some images are missing. Here is my test code:

import json

# Load the minival annotations, the training annotations, and the provided person detections.
val = json.load(open('./MSCOCO/annotations/person_keypoints_minival2014.json', 'r'))
train = json.load(open('./MSCOCO/annotations/person_keypoints_trainvalminusminival2014.json', 'r'))
det = json.load(open('./dets/person_detection_minival411_human553.json', 'r'))

# Collect the image and annotation ids that appear in each file.
det_image_ids = {i['image_id'] for i in det}
val_image_ids = {i['id'] for i in val['images']}
val_annot_ids = {i['id'] for i in val['annotations']}
val_annot_img_ids = {i['image_id'] for i in val['annotations']}

# Number of annotated images listed in minival, and number also covered by the detections.
print(len(val_image_ids & val_annot_img_ids))
print(len(det_image_ids & val_annot_img_ids))

output is

2693
2692

It seems that many annotated images (ground truth) do not exist in your detection result file or in your minival. What is the reason? Thanks.

A little confused about "epoch_size".

Thanks for your great work!
But I'm a little confused about epoch_size = 60000 # include flip * 2, aug * 4, batch * 16 in config.py
Could you explain how the value 60000 is obtained? We know that COCO has about 150k person instances for training. Is there any connection between these two figures?

Test on my own data

I want to use the pretrained model on my own dataset (not the COCO dataset). Which Python files do I need to modify?

About coordinate recovery

When coordinates are recovered from the predicted heatmaps, I notice that 2 is added to the result on both the x-axis and the y-axis. Why is this done? Is it to compensate for misalignment in the prediction?

tf-cpn/models/COCO.res101.256x192.CPN/mptest.py

Line 143 in bdc09bb

cls_skeleton[test_image_id, w, :2] = (x * 4 + 2, y * 4 + 2)
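One plausible reading of the + 2 (an interpretation, not something confirmed by the authors): with a stride of 4, heatmap cell x covers input pixels 4x to 4x + 3, so adding half the stride maps the cell back to roughly the middle of that range instead of its left or top edge. A small sketch of that mapping, with a hypothetical helper name:

```python
def heatmap_to_input(x, y, stride=4):
    """Map a heatmap cell index to input-image coordinates near the cell center."""
    offset = stride // 2  # 2 for stride 4, matching the quoted line
    return x * stride + offset, y * stride + offset

# Heatmap cell (4, 10) covers input pixels x in [16, 19] and y in [40, 43].
center = heatmap_to_input(4, 10)
```

Without the offset, every recovered coordinate would sit at the top-left corner of its cell, which biases predictions by up to 3 pixels in one direction. With the offset, the error is at most 2 pixels in either direction.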

Why add a border to the image during image processing?

Hi, I'm quite new to human pose estimation, and your work helps me a lot. I'm confused about why you add a border to the image during image processing, as in this code:

bimg = cv2.copyMakeBorder(img, add, add, add, add, borderType=cv2.BORDER_CONSTANT,
                              value=cfg.pixel_means.reshape(-1))

It seems that, to handle person regions that extend beyond the image (e.g. xmin < 0), you pad the image before cropping the region with the person. Did I understand that correctly? If so, would cropping first and then padding the cropped image be another option?
Thanks in advance.
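The pad-then-crop idea from the question can be sketched with plain NumPy. This is an illustration under stated assumptions, not the repo's code: pad_then_crop is a hypothetical helper, the box is given as (x0, y0, x1, y1) in original image coordinates, and the padding width add is assumed to be large enough to contain the whole box.

```python
import numpy as np

def pad_then_crop(img, box, add, fill):
    """Pad `img` by `add` pixels on every side with `fill`, then crop `box`."""
    h, w, c = img.shape
    padded = np.empty((h + 2 * add, w + 2 * add, c), dtype=img.dtype)
    padded[...] = np.asarray(fill, dtype=img.dtype)  # constant border color
    padded[add:add + h, add:add + w] = img
    x0, y0, x1, y1 = box
    # Shifting by `add` turns possibly negative coordinates into valid indices.
    return padded[y0 + add:y1 + add, x0 + add:x1 + add]

img = np.full((4, 4, 3), 9, dtype=np.uint8)
crop = pad_then_crop(img, box=(-2, -1, 2, 3), add=3, fill=(0, 0, 0))
```

Cropping first and padding afterwards gives the same pixels, but the crop indices then have to be clipped to the image and the missing margins computed separately for each side. Padding first keeps the slicing logic to a single line.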

How about just detecting persons during training?

Hi, I have been reading your paper recently and think it is really cool. It mentions that you used all eighty categories in the dataset to train the detector but kept only the person boxes for the follow-up work. I wonder whether it is possible to detect persons only and ignore the other categories. What are the benefits of detecting so many categories?

Loss evaluations

Very interesting work.

During your training what were your refine loss, global loss and total loss values like in the final epochs? I've had to modify the repo due to different graphics cards so I am wondering if my values are similar.

Thanks again for all your hard work in this and other repos.

In Preprocessing() in dataset.py, does objcenter mean the center point of the bounding box?

I saw that bbox is read from the JSON file, and that the value of the key 'bbox' is (start_x, start_y, width, height). So the code adds the last two values to the first two to get end_x and end_y.
Here is the part that confuses me:
objcenter = np.array([bbox[0] + bbox[2] / 2., bbox[1] + bbox[3] / 2.])
If this is meant to calculate the center point, why wasn't it written as (bbox[0] + bbox[2]) / 2. instead? Division comes before addition, right? Am I getting the wrong idea?
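A worked example may help here. Because 'bbox' stores (start_x, start_y, width, height), the center is the start coordinate plus half the size, which is exactly what the quoted line computes. The helper name and the numbers below are illustrative only.

```python
import numpy as np

def bbox_center(bbox):
    """Center of a box given as (start_x, start_y, width, height)."""
    return np.array([bbox[0] + bbox[2] / 2., bbox[1] + bbox[3] / 2.])

bbox = [100.0, 50.0, 40.0, 80.0]  # spans x in [100, 140], y in [50, 130]
center = bbox_center(bbox)        # start plus half the size

# The alternative form averages a coordinate with a size, not two coordinates.
averaged = (bbox[0] + bbox[2]) / 2.
```

For this box the center x is 120, while (bbox[0] + bbox[2]) / 2. gives 70, which lies outside the box. The averaged form would only be right if the box were stored as (start_x, start_y, end_x, end_y).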

A doubt about global loss and refine loss

I find that the global loss and the refine loss are calculated differently. The refine loss ignores keypoints with valid < 0.1, so they generate no loss. In the global loss, when valid < 1.1 the label is set to 0 (as global_label), but global_out is not changed, which would mean the global loss only focuses on the visible points. Is my understanding correct?

how much fps?

Hi,
Can this be run in real time?
Do you have any benchmark data (fps) for a given hardware setup?

I'd like to know whether it's possible to get real-time performance on a Jetson TX2.

More training details

How many epochs did you use to train the models? Did you train the different models for different numbers of epochs? I read in your paper that the learning rate is decreased by a factor of 2 every 10 epochs. Is this schedule used for every model, or are there differences between the models?
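For reference, the schedule quoted from the paper (halve the learning rate every 10 epochs) can be written as a simple step decay. The function name and the base learning rate used in the example are placeholders, not values taken from the repo's config.

```python
def lr_at_epoch(base_lr, epoch, step=10, factor=0.5):
    """Step decay: multiply the learning rate by `factor` every `step` epochs."""
    return base_lr * factor ** (epoch // step)

# Example with a placeholder base learning rate of 1.0.
schedule = [lr_at_epoch(1.0, e) for e in (0, 9, 10, 25)]
```

Under this schedule the rate stays constant within each block of 10 epochs and is divided by 2 at epochs 10, 20, 30, and so on.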

Compare with paper result

The paper reports 73.0 on test-dev using ensembled models, while this code can achieve it with a single model?

Doubt about the pretrained model's precision and recall

Is this because the released model is underfitting? I tried drawing the keypoints predicted by the model, and the results shown are not correct.

error: no default __reduce__ due to non-trivial __cinit__

Hello, I cloned this project recently and tried to evaluate the pretrained model's performance. But when I ran 'python3 mptest.py -d 0-1 -r 350', I got this error message:
ran 2337s >> << left 0s
Traceback (most recent call last):
  File "/share1/home/chunyang/anaconda3/envs/cpn/lib/python3.6/multiprocessing/queues.py", line 234, in _feed
    obj = _ForkingPickler.dumps(obj)
  File "/share1/home/chunyang/anaconda3/envs/cpn/lib/python3.6/multiprocessing/reduction.py", line 51, in dumps
    cls(buf, protocol).dump(obj)
  File "stringsource", line 2, in pyarrow.lib.Buffer.__reduce_cython__
TypeError: no default __reduce__ due to non-trivial __cinit__
Can you please tell me how to solve this problem?

P.S. This error didn't come up until the evaluation process was almost done.

How about the training detail?

Thank you very much for your wonderful work.
I trained resnet50.256x192 model and used the default settings. And here is my performance in the COCO minival dataset:

I do not know why the gap is so large. I trained the model with four 1080 GPUs.

something about continue_train

Hi, if I want to fine-tune the 350epoch-snapshot.ckpt model on my own keypoint data, do I need to set the cfg variable "continue_train" to True?

my results are lower than what is indicated

Congratulations on your COCO Challenge result, and thank you for sharing your code.

I'm testing your code, and the problem is that I get these results on the 2014 validation set:

- I'm using ResNet-101 with an input size of 384x288.
- I have changed almost nothing in the code (except the config file; in mptest I pass dump_method=1 as an argument to the function MultiProc).
- I'm using the pretrained model you trained.
- For the dataset, I downloaded the 2014 version (train and val).
- I'm using the annotation file and the bounding boxes you provided.
- I don't understand why there is such a large difference between your results and mine. Have I done something wrong?

about heatmap size

Hi @chenyilun95,great work!
How about generating the heatmap at the same size as the input image (img: 256x192, heatmap: 256x192)? Would it increase AP thanks to the pixel-to-pixel match?
Thanks.

error about multiprocessing

about training without segmentation

Dear author:
Did you train without segmentation?

Why the keypoints whose coordinates are out of the input shape are reserved when generating the label heatmaps?

Hi,
I see that when you generate the heatmaps, you discard points whose coordinates are less than 0. But points whose coordinates exceed the input shape are instead replaced with the boundary coordinates:
label[i][j << 1 | 1] = min(label[i][j << 1 | 1], ori_size[0] - 1)
label[i][j << 1] = min(label[i][j << 1], ori_size[1] - 1)
I wonder why you keep these keypoints and generate heatmaps at positions that differ from the keypoints' original locations.
Thanks!
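For readers unfamiliar with the indexing in the two quoted lines: j << 1 equals 2 * j and j << 1 | 1 equals 2 * j + 1, so the label row appears to be a flat list [x0, y0, x1, y1, ...] with x at even and y at odd positions. The sketch below reproduces the clamping under that assumption, and also assumes ori_size is (height, width). The function name is hypothetical.

```python
def clamp_joints(flat_xy, ori_size):
    """Clamp a flat [x0, y0, x1, y1, ...] list to the last valid pixel index."""
    height, width = ori_size
    out = list(flat_xy)
    for j in range(len(out) // 2):
        # j << 1 is the x slot of joint j; j << 1 | 1 is its y slot.
        out[j << 1] = min(out[j << 1], width - 1)
        out[j << 1 | 1] = min(out[j << 1 | 1], height - 1)
    return out

clamped = clamp_joints([10, 20, 300, 500], ori_size=(256, 192))
```

In this example the second joint at (300, 500) lies outside a 192-wide, 256-high input and is moved to (191, 255), which is the relocation the question is asking about.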

How can I run the code on my own task and see the results for my picture or video?

Thank you very much