Comments (19)
@dragonfly90 I found that my previous "simple bind error" was caused by using the CPU version of MXNet, which I hadn't realized before. I have restarted training from the beginning and will try a larger batch_size.
By the way, I'm trying Mask R-CNN by Kaiming He for pose estimation. If you are interested in it, we can talk later.
from mxnet_pose_for_ai_challenger.
@qqsh0214 Maybe you could try batch_size = 10. It seems to work well for the COCO dataset. I am interested in Mask R-CNN, too. Which version of Mask R-CNN do you want to try? I am trying to implement the feature pyramid head of Mask R-CNN in MXNet.
@dragonfly90 OK, I will try batch_size=10. I am referring to Section 5 of https://arxiv.org/abs/1703.06870. I am trying to implement Mask R-CNN for pose estimation based on the Faster R-CNN implementation in MXNet: https://github.com/precedenceguo/mx-rcnn
@qqsh0214 Cool! Then we may work together. How is it going? I could work on the feature pyramid head first; I have already written some code for it. You could work on the mask and ROIAlign first if you would like to.
@dragonfly90 I have worked on the mask. I will try to implement ROIAlign based on the C++ source code of ROIPooling in MXNet. But I am a little confused about how to change the data I/O code for training on pose. Do you have any ideas?
@qqsh0214 I think we could first use the ground-truth bounding boxes to train the region proposal network, then get the human mask, and then do keypoint regression within the mask. But I am not sure this is right; we may need to code and debug a lot. Did you get any results using CPM? I am short of GPUs now because our server is occupied by other tasks.
@dragonfly90 I don't have results yet because I got this error:
ValueError: Too many slices. Some splits are empty.
The training terminated at around 4500 iterations. We have GPUs available.
@qqsh0214 I don't recognize this error. It may be a code issue. Could you figure out which image causes it? I am using someone else's computer to train on the validation dataset because it is smaller than the training dataset (200k images, if I am right). Are you using the training dataset? Maybe we could talk tomorrow?
@dragonfly90 It trained for a day and terminated this morning. I don't know which image causes the error. I trained on the training dataset, which has 210K images. We can talk tomorrow.
@qqsh0214 Did you fix the bug? My training has produced some results. It seems to work well on the neck, but it cannot distinguish the left and right shoulders or other symmetric parts.
@dragonfly90 I have not fixed the bug, but I think it may be related to multi-GPU training, with one of the GPUs not working. Sorry, I have gone home for the National Day holiday, so my work will stop for a few days.
@qqsh0214 No problem. Have a good holiday! I will think about the problem.
@dragonfly90 I have updated the evaluation code. You can have a look:
https://github.com/PoseAIChallenger/mxnet_pose_for_AI_challenger/blob/master/evaluate.py
@qqsh0214 Cool, thank you.
@qqsh0214 @dragonfly90 I hit the same error while training on the 210k training images. Here is the training log:
iteration: 4518
start heat: 28.3897827148
start paf: 119.586230469
end heat: 28.4323547363
end paf: 119.595092773
Traceback (most recent call last):
  File "TrainWeight.py", line 222, in <module>
    cmodel.fit(aidata, num_epoch = iteration, batch_size = batch_size, carg_params = newargs)
  File "TrainWeight.py", line 120, in fit
    cmodel.forward(data_batch, is_train=True) # compute predictions
  File "/usr/local/lib/python2.7/dist-packages/mxnet/module/module.py", line 594, in forward
    self.reshape(new_dshape, new_lshape)
  File "/usr/local/lib/python2.7/dist-packages/mxnet/module/module.py", line 459, in reshape
    self._exec_group.reshape(self._data_shapes, self._label_shapes)
  File "/usr/local/lib/python2.7/dist-packages/mxnet/module/executor_group.py", line 348, in reshape
    self.bind_exec(data_shapes, label_shapes, reshape=True)
  File "/usr/local/lib/python2.7/dist-packages/mxnet/module/executor_group.py", line 310, in bind_exec
    self.data_layouts = self.decide_slices(data_shapes)
  File "/usr/local/lib/python2.7/dist-packages/mxnet/module/executor_group.py", line 255, in decide_slices
    self.slices = _split_input_slice(self.batch_size, self.workload)
  File "/usr/local/lib/python2.7/dist-packages/mxnet/executor_manager.py", line 64, in _split_input_slice
    raise ValueError('Too many slices. Some splits are empty.')
ValueError: Too many slices. Some splits are empty.
Any suggestions about this bug? Thanks.
@neilyoyoyoyo One solution is to divide the training data into 5 parts, each with fewer than 45000 images. You will also need 5 corresponding data.json files.
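The split suggested above could be sketched roughly as follows. This is a minimal sketch under assumptions, not the repository's tooling: it assumes data.json holds a single JSON object keyed by image ID (which matches how the iterator indexes `self.data[self.keys[...]]`), and the helper names `split_annotations` and `write_parts` and the `data_part` file prefix are hypothetical.

```python
import json


def split_annotations(data, max_per_part):
    """Split a dict of annotations into consecutive parts.

    Each part holds at most `max_per_part` entries. `data` is assumed to be
    a dict keyed by image ID (an assumption about the data.json layout).
    """
    keys = list(data.keys())
    return [
        {k: data[k] for k in keys[start:start + max_per_part]}
        for start in range(0, len(keys), max_per_part)
    ]


def write_parts(parts, prefix="data_part"):
    """Write each part to '<prefix><index>.json' and return the file paths.

    The file naming is hypothetical; adapt it to whatever the training
    script expects.
    """
    paths = []
    for index, part in enumerate(parts):
        path = "%s%d.json" % (prefix, index)
        with open(path, "w") as handle:
            json.dump(part, handle)
        paths.append(path)
    return paths
```

With about 210K entries and `max_per_part=45000`, `split_annotations` yields 5 parts, the last one smaller than the others.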
@qqsh0214 Well, that's useful, thank you.
@dragonfly90 I may have found why the above bug happens. In "TrainWeight.py", there is a break in the "next" function of class "AIChallengerIterweightBatch":
```
def next(self):
    if self.cur_batch < self.num_batches:
        transposeImage_batch = []
        heatmap_batch = []
        pagmap_batch = []
        heatweight_batch = []
        vecweight_batch = []
        for i in range(batch_size):
            if self.cur_batch >= 45174:
                break
            image, mask, heatmap, pagmap = getImageandLabel(self.data[self.keys[self.cur_batch]])
            maskscale = mask[0:368:8, 0:368:8, 0]
            heatweight = np.ones((numofparts, 46, 46))
            vecweight = np.ones((numoflinks*2, 46, 46))
            for i in range(numofparts):
                heatweight[i,:,:] = maskscale
            for i in range(numoflinks*2):
                vecweight[i,:,:] = maskscale
            transposeImage = np.transpose(np.float32(image), (2,0,1))/256 - 0.5
            self.cur_batch += 1
            transposeImage_batch.append(transposeImage)
            heatmap_batch.append(heatmap)
            pagmap_batch.append(pagmap)
            heatweight_batch.append(heatweight)
            vecweight_batch.append(vecweight)
        return DataBatchweight(mx.nd.array(transposeImage_batch),
                               mx.nd.array(heatmap_batch),
                               mx.nd.array(pagmap_batch),
                               mx.nd.array(heatweight_batch),
                               mx.nd.array(vecweight_batch))
    else:
        raise StopIteration
```
I don't know what "45174" means. If possible, could you fix it?
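To see how the hard-coded limit could produce an empty batch, here is a small simulation of only the counting logic in the loop above. It is a sketch rather than the repository's code: `batch_sizes_with_hard_limit` is a hypothetical helper, and batch_size = 10 is an assumption taken from earlier in the thread.

```python
def batch_sizes_with_hard_limit(num_batches, batch_size, limit):
    """Return the size of each batch when the fill loop breaks at `limit`.

    Mirrors only the counter handling of the quoted `next` method:
    the sample counter stops advancing once it reaches `limit`.
    """
    cur = 0
    sizes = []
    for _ in range(num_batches):
        size = 0
        for _ in range(batch_size):
            if cur >= limit:
                break
            cur += 1
            size += 1
        sizes.append(size)
    return sizes


# Assumed values: batch_size = 10, limit = 45174 as in the quoted code.
sizes = batch_sizes_with_hard_limit(num_batches=4520, batch_size=10, limit=45174)
first_empty = sizes.index(0)
print(first_empty)  # zero-based index of the first empty batch
```

Under these assumptions the last non-empty batch holds only 4 samples and every batch after it is empty. An empty batch leaves nothing to split across devices, which would be consistent with the "Too many slices" error, although that link has not been checked against the MXNet source here.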
@neilyoyoyoyo Thank you very much! I made a mistake there. I used the same code for Microsoft COCO, where the number of training images is 45174.
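One way to avoid a dataset-specific constant is to bound the iterator by the actual number of samples. The class below is a minimal sketch under assumptions: `BatchKeyIterator` is a hypothetical stand-in for `AIChallengerIterweightBatch` that yields lists of sample keys instead of MXNet arrays, and it is not the fix that was committed to the repository.

```python
class BatchKeyIterator(object):
    """Hypothetical stand-in that yields lists of sample keys per batch."""

    def __init__(self, keys, batch_size):
        self.keys = list(keys)
        self.batch_size = batch_size
        self.cur = 0  # index of the next unread sample

    def __iter__(self):
        return self

    def __next__(self):
        # Stop based on the real dataset size, not a constant from another dataset.
        if self.cur >= len(self.keys):
            raise StopIteration
        batch = self.keys[self.cur:self.cur + self.batch_size]
        self.cur += len(batch)
        return batch

    next = __next__  # Python 2 spelling, matching the traceback's interpreter


batches = list(BatchKeyIterator(range(25), batch_size=10))
```

This never returns an empty batch, but the final batch can still be shorter than `batch_size`. Whether a short final batch is acceptable depends on the training loop and the number of devices, so dropping or padding it may still be necessary.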