
pytorch-deform-conv's Introduction

PyTorch implementation of Deformable Convolution

!!!Warning: There are some issues in this implementation and this repo is no longer maintained. Please consider using TORCHVISION.OPS.DEFORM_CONV instead.

TODO List

  • implement offsets mapping in pytorch
  • all tests passed
  • deformable convolution module
  • Fine-tuning the deformable convolution modules
  • scaled mnist demo
  • improve speed with cached grid array
  • use MNIST dataset from pytorch (instead of Keras)
  • support input image with different width and height
  • benchmark with tensorflow implementation

Deformable Convolutional Networks

Dai, Jifeng, Haozhi Qi, Yuwen Xiong, Yi Li, Guodong Zhang, Han Hu, and Yichen Wei. 2017. “Deformable Convolutional Networks.” arXiv [cs.CV]. arXiv. http://arxiv.org/abs/1703.06211

The following animation is generated by Felix Lau (with his tensorflow implementation):

Also check out Felix Lau's summary of the paper: https://medium.com/@phelixlau/notes-on-deformable-convolutional-networks-baaabbc11cf3

pytorch-deform-conv's People

Contributors

  • oeway


pytorch-deform-conv's Issues

GPU mode is not available!

Hello oeway

Firstly, thanks for sharing your pytorch version code of deformable convolution.
When I tried to port your code into another codebase that uses CUDA, it gives the error below.

'Type torch.cuda.FloatTensor doesn't implement stateless method range'

However, when I change it to CPU mode, there is no error.

It seems that 'torch.range' does not work in GPU mode. What should the solution be?

Thanks in advance
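I can't see the exact line without the traceback, but the usual fix for this error in older PyTorch versions is to replace the deprecated `torch.range` with `torch.arange` and create the tensor on the input's device. A minimal sketch (the helper name `make_index_range` is mine, not from the repo):

```python
import torch

def make_index_range(n, like):
    # torch.range is deprecated (and its stateless CUDA form was missing in
    # old releases); torch.arange works on both CPU and GPU. arange excludes
    # the endpoint, so arange(0, n) replaces range(0, n - 1).
    return torch.arange(0, n, dtype=like.dtype, device=like.device)

x = torch.zeros(4)                  # stand-in for a feature-map tensor
idx = make_index_range(5, x)
print(idx.tolist())                 # [0.0, 1.0, 2.0, 3.0, 4.0]
```

Because the result inherits `device` from the input tensor, the same code path runs unchanged in CPU and CUDA mode.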

Why only fine tuning on deformable convnet

Hi, why train first on the plain convnet and only afterwards fine-tune the deformable convnet?
Why not train the deformable network from the start (with the offsets frozen initially, as written in the paper)?

about the implementation.. view instead of permute ?

Hello. Thanks for sharing the code.
I have a question about the implementation of offset, in [https://github.com/oeway/pytorch-deform-conv/blob/master/torch_deform_conv/deform_conv.py#L182]
the code :

offsets = offsets.view(batch_size, -1, 2)

the input tensor offsets is b * (2c) * h * w after the first normal conv. I think the offsets of deform-conv correspond to the output channels, so shouldn't the code be:

offsets = offsets.view(b, 2*c, h, w)
offsets = offsets.permute(0, 2, 3, 1)
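The two snippets are indeed not equivalent. A tiny numpy sketch of the difference (numpy's `reshape`/`transpose` have the same semantics as torch's `view`/`permute` here):

```python
import numpy as np

b, c, h, w = 1, 2, 1, 2                 # tiny example: 2 channels, 1x2 map
# offsets laid out channel-first, as a conv produces them: (b, 2c, h, w)
offsets = np.arange(b * 2 * c * h * w).reshape(b, 2 * c, h, w)

# view(batch, -1, 2): pairs consecutive elements of the flattened
# (channel, h, w) block -- i.e. two SPATIAL positions of the same channel.
paired_by_view = offsets.reshape(b, -1, 2)

# permute then view: moves channels last, so each spatial location
# contributes a (2c,) vector that splits into c proper (x, y) pairs.
paired_by_permute = offsets.transpose(0, 2, 3, 1).reshape(b, -1, 2)

print(paired_by_view[0, 0])      # [0 1] -- both values come from channel 0
print(paired_by_permute[0, 0])   # [0 2] -- channel 0 and channel 1 at the
                                 #          same spatial location
```

So whether `view(batch_size, -1, 2)` is a bug depends on how the surrounding code orders the offset channels; the permute variant is the one that groups a (x, y) pair per location.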

related to TF_version #4

The issue proposed in issue 4.
I tested the situation where both networks are trained on the original dataset (not deformed MNIST). Test accuracy:

---original net---
test_data:             99.03%
test_scaled_data: 62.22%
---deformable net---
test_data:              98.67%
test_scaled_data: 54.43%

I think the training phase should use undeformed data; this would show the deformable CNN's advantage.

Offset output channel

super(ConvOffset2D, self).__init__(self.filters, self.filters*2, 3, padding=1, bias=False, **kwargs)
Your implementation is not the same as the original paper describes in section 2.1. The offset output channel dimension should be 2 x N, where N is the number of kernel sampling locations (k*k in the 2D case). That is, the offsets are shared across the channel dimension. You could refer to the implementation at https://github.com/ChunhuanLin/deform_conv_pytorch

Gif image visualization

Hi,

is there a visualization of the method, as seen in the provided gif file? How would one ideally implement this feature?


The deform-conv layers reduce the detection accuracy

I trained your deform-conv on the original data. It gets 70% on the original test set and 89% on scaled data, while the normal CNN model trained on the original data gets 99% on the original test set and 64% on scaled data. The deform-conv may not work.
So I tried deleting two of the deform-conv layers, keeping one. That gets 93% on the original test set and 97% on scaled data.
Your deform-conv may not work; I think you should check it.

Input image w!=h

Hi~ Does anyone know what to do if the input image has w != h, e.g. h=256, w=128? Where should I change the code?
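I haven't traced every place the repo assumes a square input, but the grid generation is the usual culprit. A hedged numpy sketch of a sampling grid built with separate height and width ranges (the helper name is mine):

```python
import numpy as np

def make_grid(h, w):
    # Use a separate coordinate range per axis instead of one shared size;
    # this is the change needed wherever the code assumes h == w.
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    return np.stack([ys, xs], axis=-1)     # shape (h, w, 2)

grid = make_grid(256, 128)
print(grid.shape)        # (256, 128, 2)
```

Each `grid[y, x]` holds the (row, column) coordinate of that pixel, so offsets can be added per-axis without the two dimensions being conflated.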

Weighting in deformed kernel

I am a little bit confused how the weighting is done of each input element of the deformed kernel in your implementation.

In your implementation it looks like you are first calculating the offsets and with these rearrange the input in the ConvOffset2D module (so ConvOffset2D just outputs the rearranged features according to the learned offsets). After that, on the rearranged feature maps you apply a normal Conv2D, which should then do the actual weighting for the deformed kernel, right?

Is this the same as in the original paper, where I understand the weighting to be applied directly to the deformed input elements (kernel elements with offsets), i.e.

y(p_0) = sum_{p_n in R} w(p_n) * x(p_0 + p_n + Δp_n)

I don't see whether this is the same, or whether it makes a different assumption about deformable convolutions than the original paper.
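For what it's worth, the two formulations can be written side by side (my reading, not verified against every line of the code):

```latex
% Paper: weights applied directly at the offset sampling points
y(p_0) = \sum_{p_n \in \mathcal{R}} w(p_n)\, x(p_0 + p_n + \Delta p_n)

% This repo: resample first, then apply a regular convolution
\tilde{x}(p) = x(p + \Delta p), \qquad
y(p_0) = \sum_{p_n \in \mathcal{R}} w(p_n)\, \tilde{x}(p_0 + p_n)
```

Substituting the definition of \(\tilde{x}\) into the second sum recovers the first, so the weighting step itself is the same; the remaining difference is how \(\Delta p\) is indexed (per kernel location in the paper, per channel in this repo, as other issues point out).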

Confusion about the shape of offset

Hello,

I am confused about the shape of offset.
The paper mentions:
"The grid R defines the receptive field size and dilation. For example,R = {(−1,−1),(−1,0),...,(0,1),(1,1)}.
In deformable convolution, the regular grid R is aug- mented with offsets {∆pn |n = 1, ..., N }, where N = |R|.
The output offset fields have the same spatial resolution with the input feature map. The channel dimension 2N corresponds to N 2D offsets."
So, I think the shape of the offset field should be [2*9, H, W] = [18, H, W] if a 3x3 kernel is used.
But in your implementation, the shape of the offsets seems to be [batch_size, 2*n_channels, H, W]?


Offset BUG

The paper has offsets with shape [B, 2*9, H, W],
but this repo produces offsets with shape [B, 2*C, H, W].
I think these mean two totally different things.

indexing with a detached variable

Hi oeway,

Any chance you can help me understand your code? On this line, you index the input with a detached variable, so I'm wondering how you propagate the gradient backward through vals_lt, etc. It seems like mapped_vals would not have any parent nodes with gradients? Does that make sense? When I try to do a similar thing here for a spatial transformer network, it gives me a "no nodes require gradients" error.

Do you get around this by freezing the entire network? I feel like you would get the same error if the network wasn't frozen. Any insight you can provide into this would be appreciated.

EDIT: Ok, I get that the gradient propagates through the coords_offset_lt value... can you describe where you got this interpolation algorithm from? Thanks :)
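To illustrate the point in the EDIT with a minimal numpy sketch of standard bilinear interpolation (not the repo's exact code): the integer corner indices are piecewise constant in the coordinates, so detaching them loses nothing; the gradient path is entirely through the interpolation weights, which are linear in the (offset) coordinates.

```python
import numpy as np

def bilinear(img, y, x):
    # Corner indices: integer floor/ceil, piecewise constant in (y, x) --
    # no gradient flows through them, which is what detaching reflects.
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1, x1 = y0 + 1, x0 + 1
    # Interpolation weights: linear in (y, x), so d(value)/d(offset) is
    # well defined -- the gradient reaches the offsets through these terms.
    wy, wx = y - y0, x - x0
    return ((1 - wy) * (1 - wx) * img[y0, x0] + (1 - wy) * wx * img[y0, x1]
            + wy * (1 - wx) * img[y1, x0] + wy * wx * img[y1, x1])

img = np.arange(16, dtype=float).reshape(4, 4)
print(bilinear(img, 1.5, 0.5))   # 6.5: midpoint of rows 1-2, cols 0-1
```

In autograd terms: as long as the weights `wy`, `wx` are computed from the non-detached offset tensor, `mapped_vals` keeps a gradient path to the offsets even though the gather indices themselves are detached.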

Question about implementation

In the file deform_conv.py, what does this line do?
inds = indices[:, 0]*input.size(1)*input.size(2) + indices[:, 1]*input.size(2) + indices[:, 2]
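If I read it correctly, the line converts (d0, d1, d2) index triples into flat indices for a row-major-flattened 3-D tensor, so a 1-D gather/take can be used. A numpy sketch of the same arithmetic:

```python
import numpy as np

x = np.arange(2 * 3 * 4).reshape(2, 3, 4)   # stand-in for `input`
indices = np.array([[0, 1, 2], [1, 2, 3]])  # (d0, d1, d2) triples

# Same arithmetic as the line in question: row-major flattening of a
# 3-D index, so x.ravel()[inds] == x[d0, d1, d2].
inds = (indices[:, 0] * x.shape[1] * x.shape[2]
        + indices[:, 1] * x.shape[2] + indices[:, 2])

print(inds)                                       # [ 6 23]
print(x.ravel()[inds])                            # [ 6 23]
print(np.ravel_multi_index(indices.T, x.shape))   # [ 6 23] -- equivalent
```

It is the hand-rolled equivalent of `np.ravel_multi_index`, needed because the torch version used here could only take elements from a flat tensor.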

A re-trained normal CNN on scaled data beats deform-conv.

First, I appreciate your work, which is easy to use and read.
In scaled_mnist.py, the normal CNN model is trained on the original data and then tested on the scaled data. It shows a bad accuracy of 60% (in my run). Then you fine-tune a deform-conv on the scaled data, and its accuracy is much better. However, I tried re-training the trained CNN model on the scaled data, and the result thoroughly confuses me: it gets 96% on the original test set and 98% on scaled data.
So this experiment cannot prove the effectiveness of the deform-conv layers.

What is the size of the offset?

Hi,
I have read the paper "Deformable Convolutional Networks" and your pytorch-deform-conv code.
I think the size of the offsets may be b * c * kernel[0] * kernel[1] * h * w. Maybe I understand it wrong. Can you explain it?
