
Comments (19)

qqsh0214 commented on August 24, 2024

@dragonfly90 I found that my previous "simple bind error" was caused by using the CPU version of MXNet; I hadn't realized it before. So I have restarted training from the beginning. I will try a larger batch_size.
By the way, I'm trying Kaiming He's Mask R-CNN for pose estimation. If you are interested in it, we can talk later.

from mxnet_pose_for_ai_challenger.

dragonfly90 commented on August 24, 2024

@qqsh0214 Maybe you could try batch_size = 10. It seems to work well for the COCO dataset. I am interested in Mask R-CNN, too. Which version of Mask R-CNN do you want to try? I am trying to implement a feature pyramid head for Mask R-CNN in MXNet.

qqsh0214 commented on August 24, 2024

@dragonfly90 OK, I will try batch_size=10. I am following Section 5 of https://arxiv.org/abs/1703.06870. I am trying to implement Mask R-CNN for pose estimation based on the Faster R-CNN implementation in MXNet: https://github.com/precedenceguo/mx-rcnn
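
For reference, Section 5 of the paper models each keypoint as a one-hot m × m mask (m = 56 in the paper) and, for each visible keypoint, minimizes a cross-entropy loss over an m²-way softmax. A minimal NumPy sketch of that target encoding, assuming the keypoint and the box are both given in image coordinates and the box is non-degenerate; `keypoint_to_onehot` and `keypoint_loss` are hypothetical names, not part of mx-rcnn:

```python
import numpy as np

def keypoint_to_onehot(keypoint, box, m=56):
    """Encode one keypoint as an m x m one-hot target inside a box.

    keypoint: (x, y) in image coordinates.
    box: (x1, y1, x2, y2) in image coordinates, with x2 > x1 and y2 > y1.
    Returns (target, flat_index), or (None, None) if the keypoint falls
    outside the box (such keypoints would simply get no loss).
    """
    x, y = keypoint
    x1, y1, x2, y2 = box
    # Map image coordinates to continuous grid coordinates in [0, m).
    gx = (x - x1) / float(x2 - x1) * m
    gy = (y - y1) / float(y2 - y1) * m
    if not (0 <= gx < m and 0 <= gy < m):
        return None, None
    col, row = int(np.floor(gx)), int(np.floor(gy))
    target = np.zeros((m, m), dtype=np.float32)
    target[row, col] = 1.0  # exactly one foreground pixel
    return target, row * m + col

def keypoint_loss(logits, flat_index):
    """Cross-entropy over an m*m-way softmax for one visible keypoint."""
    flat = logits.reshape(-1).astype(np.float64)
    flat = flat - flat.max()  # subtract max for numerical stability
    log_probs = flat - np.log(np.exp(flat).sum())
    return -log_probs[flat_index]
```

With all-zero logits the softmax is uniform, so the loss for any keypoint equals log(56 × 56), which is a handy sanity check for a ported implementation.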

dragonfly90 commented on August 24, 2024

@qqsh0214 Cool! Then we could work together. How is it going? I could work on the feature pyramid head first; I have already written some code for it. You could work on the mask branch and ROIAlign first, if you would like.

qqsh0214 commented on August 24, 2024

@dragonfly90 I have worked on the mask branch. I will try implementing ROIAlign based on the C++ source of ROIPooling in MXNet. But I am a little confused about how to change the data IO code for training on pose. Do you have any ideas?
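
The key change from ROIPooling to ROIAlign, as described in the Mask R-CNN paper, is that box and bin boundaries are no longer rounded to integer feature-map cells: each bin is sampled at a few regularly spaced real-valued points using bilinear interpolation, and the samples are aggregated (average or max). A minimal single-channel NumPy sketch that could serve as a reference when checking a C++ port; `roi_align`, the 2 × 2 samples per bin, average aggregation, and the convention that pixel centers sit at integer coordinates are assumptions, and real operators differ in how they handle boundaries:

```python
import numpy as np

def bilinear(feat, y, x):
    """Bilinearly interpolate a 2-D feature map at real-valued (y, x)."""
    h, w = feat.shape
    # Clamp to the valid range (a simplification of boundary handling).
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    ly, lx = y - y0, x - x0
    return ((1 - ly) * (1 - lx) * feat[y0, x0] + (1 - ly) * lx * feat[y0, x1]
            + ly * (1 - lx) * feat[y1, x0] + ly * lx * feat[y1, x1])

def roi_align(feat, roi, out_size=7, spatial_scale=1.0, samples=2):
    """ROIAlign for one ROI on one channel.

    roi: (x1, y1, x2, y2) in input-image coordinates; spatial_scale maps
    them onto the feature map. No coordinate is ever rounded.
    """
    x1, y1, x2, y2 = [c * spatial_scale for c in roi]
    bin_h = (y2 - y1) / float(out_size)
    bin_w = (x2 - x1) / float(out_size)
    out = np.zeros((out_size, out_size), dtype=np.float64)
    for i in range(out_size):
        for j in range(out_size):
            acc = 0.0
            # Regularly spaced sample points inside bin (i, j).
            for sy in range(samples):
                for sx in range(samples):
                    y = y1 + (i + (sy + 0.5) / samples) * bin_h
                    x = x1 + (j + (sx + 0.5) / samples) * bin_w
                    acc += bilinear(feat, y, x)
            out[i, j] = acc / (samples * samples)  # average the samples
    return out
```

Because bilinear interpolation is exact for a feature map that is linear in x and y, such a map (for example `feat[y, x] = 10*y + x`) makes a convenient unit test: each output bin should equal the map's value at the bin center.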

dragonfly90 commented on August 24, 2024

@qqsh0214 I think we could first use the ground-truth bounding boxes to train the region proposal network, then get the human mask, and then do keypoint regression within the mask. But I am not sure this is right; we may need to code and debug a lot. Did you get any results using CPM? I am short of GPUs right now because our server is occupied by other tasks.

qqsh0214 commented on August 24, 2024

@dragonfly90 I didn't get results because I hit this error:
`ValueError: Too many slices. Some splits are empty.`
Training terminated at around 4,500 iterations. We do have GPUs available.

dragonfly90 commented on August 24, 2024

@qqsh0214 I don't know what causes this kind of error; it may be a code issue. Could you figure out which image causes it? I am using someone else's computer to train on the validation dataset because it is smaller than the training dataset (200K images, if I remember right). Are you using the training dataset? Maybe we could talk tomorrow?

qqsh0214 commented on August 24, 2024

@dragonfly90 It trained for a day and terminated this morning. I don't know which image causes the error. I trained on the training dataset, which has 210K images. We can talk tomorrow.

dragonfly90 commented on August 24, 2024

@qqsh0214 Did you fix the bug? My training has produced some results. It seems to work well on the neck, but it cannot distinguish the left and right shoulders or other symmetric parts.
[Image: aichallengerneck]

[Image: aichalllenger]

qqsh0214 commented on August 24, 2024

@dragonfly90 I have not fixed the bug, but I think it may be related to multi-GPU training, with one of the GPUs not working. Sorry, I am going home for the National Day holiday, so my work will be paused for a few days.

dragonfly90 commented on August 24, 2024

@qqsh0214 No problem. Have a good holiday! I will think about the problem.

qqsh0214 commented on August 24, 2024

@dragonfly90 I have updated the evaluation code. You can take a look:
https://github.com/PoseAIChallenger/mxnet_pose_for_AI_challenger/blob/master/evaluate.py

dragonfly90 commented on August 24, 2024

@qqsh0214 Cool, thank you.

neilyoyoyoyo commented on August 24, 2024

@qqsh0214 @dragonfly90 I hit the same error while training on the 210K training images. Here is the training log:
```
iteration: 4518
start heat: 28.3897827148
start paf: 119.586230469
end heat: 28.4323547363
end paf: 119.595092773
Traceback (most recent call last):
  File "TrainWeight.py", line 222, in <module>
    cmodel.fit(aidata, num_epoch = iteration, batch_size = batch_size, carg_params = newargs)
  File "TrainWeight.py", line 120, in fit
    cmodel.forward(data_batch, is_train=True) # compute predictions
  File "/usr/local/lib/python2.7/dist-packages/mxnet/module/module.py", line 594, in forward
    self.reshape(new_dshape, new_lshape)
  File "/usr/local/lib/python2.7/dist-packages/mxnet/module/module.py", line 459, in reshape
    self._exec_group.reshape(self._data_shapes, self._label_shapes)
  File "/usr/local/lib/python2.7/dist-packages/mxnet/module/executor_group.py", line 348, in reshape
    self.bind_exec(data_shapes, label_shapes, reshape=True)
  File "/usr/local/lib/python2.7/dist-packages/mxnet/module/executor_group.py", line 310, in bind_exec
    self.data_layouts = self.decide_slices(data_shapes)
  File "/usr/local/lib/python2.7/dist-packages/mxnet/module/executor_group.py", line 255, in decide_slices
    self.slices = _split_input_slice(self.batch_size, self.workload)
  File "/usr/local/lib/python2.7/dist-packages/mxnet/executor_manager.py", line 64, in _split_input_slice
    raise ValueError('Too many slices. Some splits are empty.')
ValueError: Too many slices. Some splits are empty.
```
Any suggestions for this bug? Thanks.
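
The traceback shows that `Module.forward` reshapes the executors when a batch arrives with a new shape, and the reshape ends in `_split_input_slice`, which divides the batch across the devices in proportion to their workload. A simplified re-implementation of that check (written for illustration, not a copy of MXNet's code; `split_input_slice` and `can_split` are hypothetical names) shows that any batch with fewer samples than devices, including an empty batch, produces this error:

```python
def split_input_slice(batch_size, work_load_list):
    """Split batch indices [0, batch_size) across devices by workload.

    Simplified sketch of the idea behind MXNet's _split_input_slice:
    each device gets a share proportional to its workload, and an empty
    share raises the same ValueError seen in the log.
    """
    total = float(sum(work_load_list))
    counts = [int(round(w * batch_size / total)) for w in work_load_list]
    # Absorb any rounding difference in the last device's share.
    counts[-1] += batch_size - sum(counts)
    slices, end = [], 0
    for n in counts:
        begin = end
        end = min(begin + n, batch_size)
        if begin >= end:
            raise ValueError('Too many slices. Some splits are empty.')
        slices.append(slice(begin, end))
    return slices

def can_split(batch_size, num_devices):
    """True if a batch of this size can be spread over equal-weight devices."""
    try:
        split_input_slice(batch_size, [1] * num_devices)
        return True
    except ValueError:
        return False
```

For example, `can_split(10, 4)` is True, while a short final batch of 4 images on 8 GPUs, or an empty batch on any number of GPUs, is not, so the question to ask of the data iterator is whether it ever yields a batch smaller than the number of devices.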

qqsh0214 commented on August 24, 2024

@neilyoyoyoyo One workaround is to divide the training data into 5 parts, each with fewer than 45,000 images. You also have to generate the 5 corresponding data.json files.
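
A sketch of that workaround, assuming `data.json` is a JSON object with one entry per image (the iterator indexes it as `self.data[self.keys[...]]`, which suggests a dict). The function names and the `data_part<i>.json` output filenames are hypothetical. Splitting 210,000 images into 5 parts gives 42,000 images per part, under the 45,000 limit:

```python
import json
import os

def split_annotations(data, num_parts=5):
    """Split a dict of per-image annotations into num_parts near-equal dicts."""
    keys = sorted(data)  # deterministic order across runs
    base, extra = divmod(len(keys), num_parts)
    parts, start = [], 0
    for i in range(num_parts):
        # The first `extra` parts each take one extra key.
        end = start + base + (1 if i < extra else 0)
        parts.append({k: data[k] for k in keys[start:end]})
        start = end
    return parts

def split_json_file(path, out_dir, num_parts=5):
    """Write data_part0.json ... data_part{num_parts-1}.json into out_dir."""
    with open(path) as f:
        data = json.load(f)
    sizes = []
    for i, part in enumerate(split_annotations(data, num_parts)):
        with open(os.path.join(out_dir, 'data_part%d.json' % i), 'w') as f:
            json.dump(part, f)
        sizes.append(len(part))
    return sizes
```

Each part file would then be loaded in place of the full `data.json` for one training run. Note that this only keeps each part below the 45174 cutoff in the iterator; it does not remove the cutoff itself.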

neilyoyoyoyo commented on August 24, 2024

@qqsh0214 Well, that's useful, thank you.

neilyoyoyoyo commented on August 24, 2024

@dragonfly90 I think I may have found why the above bug happens. In `TrainWeight.py`, you put a `break` in the `next` method of the class `AIChallengerIterweightBatch`:
```python
def next(self):
    if self.cur_batch < self.num_batches:

        transposeImage_batch = []
        heatmap_batch = []
        pagmap_batch = []
        heatweight_batch = []
        vecweight_batch = []

        for i in range(batch_size):
            if self.cur_batch >= 45174:
                break
            image, mask, heatmap, pagmap = getImageandLabel(self.data[self.keys[self.cur_batch]])
            maskscale = mask[0:368:8, 0:368:8, 0]
            heatweight = np.ones((numofparts, 46, 46))
            vecweight = np.ones((numoflinks*2, 46, 46))

            for i in range(numofparts):
                heatweight[i,:,:] = maskscale

            for i in range(numoflinks*2):
                vecweight[i,:,:] = maskscale

            transposeImage = np.transpose(np.float32(image), (2,0,1))/256 - 0.5

            self.cur_batch += 1

            transposeImage_batch.append(transposeImage)
            heatmap_batch.append(heatmap)
            pagmap_batch.append(pagmap)
            heatweight_batch.append(heatweight)
            vecweight_batch.append(vecweight)

        return DataBatchweight(mx.nd.array(transposeImage_batch),
                               mx.nd.array(heatmap_batch),
                               mx.nd.array(pagmap_batch),
                               mx.nd.array(heatweight_batch),
                               mx.nd.array(vecweight_batch))
    else:
        raise StopIteration
```
I don't know what `45174` means; could you fix it if possible?

dragonfly90 commented on August 24, 2024

@neilyoyoyoyo Thank you very much! I made a mistake there: I reused the same code from Microsoft COCO, where the number of training images is 45174.
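
The arithmetic lines up with the log: assuming batch_size = 10 (the value suggested earlier in the thread), 45174 = 4517 × 10 + 4, so the hard-coded cutoff leaves a 4-image batch right after 4517 full ones, and any batch requested after that is empty, consistent with the failure near iteration 4518. One way to avoid both the hard-coded count and the short final batch is to derive the batch count from `len(self.keys)` and drop the incomplete last batch. A minimal sketch of only the indexing logic; `KeyBatcher` is a hypothetical helper, not part of the repository, and dropping the remainder (rather than padding it) is a design choice:

```python
class KeyBatcher(object):
    """Hypothetical stand-in for the batching logic of
    AIChallengerIterweightBatch.next(): it only decides which keys go into
    each batch; loading images and labels stays in the real iterator."""

    def __init__(self, keys, batch_size):
        self.keys = list(keys)
        self.batch_size = batch_size
        # Derive the batch count from the data instead of a hard-coded
        # image count, and keep only full batches so every batch can be
        # split across all devices.
        self.num_batches = len(self.keys) // batch_size
        self.cur_batch = 0  # counts batches, not images

    def reset(self):
        self.cur_batch = 0

    def __iter__(self):
        return self

    def next(self):
        if self.cur_batch >= self.num_batches:
            raise StopIteration
        start = self.cur_batch * self.batch_size
        self.cur_batch += 1
        return self.keys[start:start + self.batch_size]

    __next__ = next  # Python 3 iterator protocol; Python 2 uses next()
```

In the real iterator, the inner loop would then run over the keys returned for each batch instead of checking `self.cur_batch >= 45174`.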
