
Comments (10)

hughperkins commented on June 29, 2024

I'm porting some cunn code over to clnn

Awesome!

Are there any plans to implement those cases?

I sort of implement stuff as and when it becomes necessary. If you have a moment to implement one of the commented out cases that would be great! :-)

Also, some CNN noob questions: what do those cases stand for?

In max-pooling, we take a square of pooling width (kW) by pooling height (kH) pixels, and find the maximum value in that grid. The output from that grid is this max value. Then, we take the next square of pixels, and do the same thing. Useful web pages: http://ufldl.stanford.edu/wiki/index.php/Pooling and http://andrew.gibiansky.com/blog/machine-learning/convolutional-neural-networks/
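For concreteness, here's a minimal Torch sketch of the forward pass (the sizes here are made up for illustration):

require 'nn'

-- 2 x 2 pooling with a 2 x 2 stride: each output pixel is the max of a
-- non-overlapping 2 x 2 square of the input
local pool = nn.SpatialMaxPooling(2, 2, 2, 2)  -- kW, kH, dW, dH
local input = torch.rand(1, 4, 4)              -- nInputPlane x height x width
local output = pool:forward(input)             -- size 1 x 2 x 2
print(output)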

When we back-propagate, we take the gradient from each gradOutput pixel and send it back to the input pixel that produced the max value on the forward pass. So we have to store which pixel that was when we do the forward propagation.
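Conceptually, the routing looks like this (a scalar sketch of the idea, not the actual GPU kernel; indices is assumed to hold, for each output pixel, the flat offset of the input pixel that won the max on the forward pass):

local function updateGradInputSketch(gradInput, gradOutput, indices)
  gradInput:zero()
  local gi = gradInput:view(-1)   -- flat views, assuming contiguous tensors
  local go = gradOutput:view(-1)
  local idx = indices:view(-1)
  for o = 1, go:size(1) do
    -- each gradOutput value is added to the pixel that produced the max
    gi[idx[o]] = gi[idx[o]] + go[o]
  end
  return gradInput
end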

The pooling areas can be exactly contiguous, which is the easy situation: each gradInput pixel takes input from exactly 0 or 1 gradOutput pixels. But rather than making the pooling areas contiguous, we can make them overlap, and then some gradInput pixels need to be updated with the sum of several gradOutput pixels. This is a bit tricky to do from parallel threads, so we consider this case separately. The strides, dW and dH, decide the horizontal and vertical distance between each pooling area (kW by kH). When dW == kW and dH == kH, this is the contiguous case. When dW is less than kW, or dH is less than kH, we have overlapping pools.
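A toy illustration of the difference (not library code):

-- pool start columns along one input row
local function poolStarts(width, kW, dW)
  local starts = {}
  for x = 1, width - kW + 1, dW do starts[#starts + 1] = x end
  return starts
end

local contiguous  = poolStarts(7, 2, 2)  -- {1, 3, 5}: pools cover columns 1-2, 3-4, 5-6
local overlapping = poolStarts(7, 3, 2)  -- {1, 3, 5}: pools cover 1-3, 3-5, 5-7, so
                                         -- columns 3 and 5 each belong to two pools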

Actually, you might not need to handle this, because:

if (input->nDimension == 3) {

Spatial max pooling is over a 2d image, so that is 2 dimensions, W (width) and H (height). But typically there will be more than one image plane per incoming example, so we have nInputPlane image planes per example: three dimensions. Finally, we can provide a mini-batch of multiple examples, so that would be 4 dimensions.

Currently, the implementation handles mini-batches, but not single non-batched examples.

  • if you can shoe-horn your current code into mini-batches, e.g. by reshaping to have an extra dimension of length 1, then you don't need to change the SpatialMaxPooling code at all (see the sketch after this list)
  • alternatively, perhaps we can simply modify the clnn_SpatialMaxPooling_updateGradInput method to reshape the incoming tensor to have an extra dimension? In which case, we can reuse the rest of the updateGradInput code as-is
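The first option is a one-liner at the call site. A sketch, assuming input is a 3D nInputPlane x H x W tensor and pool is the nn.SpatialMaxPooling module:

-- wrap the single example in a mini-batch of size 1, pool, then drop the
-- extra dimension again
local batched = input:view(1, input:size(1), input:size(2), input:size(3))
local output = pool:forward(batched):select(1, 1)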

As far as this bit:

} else if((kW>>1) <= dW && (kH>>1) <= dH) {

To handle overlapping pools, I noticed that we mostly only have 3 x 3 pools, with a 2 x 2 stride. So, as a first simplification, I only consider this case. Actually, let's consider an even simpler case for now: an input image with one single row, with 1 x 3 pools and a 1 x 2 stride. In this case, the pools overlap horizontally:

|pool 1|
     |pool 2|
          |pool 3|
              ... etc

But no more than 2 pools overlap for any input/gradInput pixel. So, we can do the pooling in two batches. In the first batch, we do the odd pools:

|pool 1|
          |pool 3|

These don't overlap :-)

Then we do the even pools:

     |pool 2|
               |pool 4|

... don't overlap either. And we add their results to the gradInput results from the first batch of pools.
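In pseudocode, the scheme looks like this (a sketch only; gradInput, gradOutput and indices are assumed 1D here, with indices holding the argmax column stored on the forward pass):

gradInput:zero()
for pass = 0, 1 do
  -- within one pass, consecutive processed pools are 2*dW = 4 columns
  -- apart, more than the pool width of 3, so no two of them write to
  -- the same gradInput pixel; across passes the results accumulate
  for p = 1 + pass, numPools, 2 do
    local maxCol = indices[p]
    gradInput[maxCol] = gradInput[maxCol] + gradOutput[p]
  end
end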

In 2 dimensions, we'll need 4 such batches. First we will have these pools:

X . X . X .
. . . . . .
X . X . X .
. . . . . .

(where 'X' are the pools we are calculating, and '.' are the ones we skip for now)

Next batch will be:

. X . X . X
. . . . . . 
. X . X . X
. . . . . .

Then:

. . . . . .
X . X . X .
. . . . . .
X . X . X .

Finally

. . . . . .
. X . X . X
. . . . . . 
. X . X . X

So, 4 times.

This will actually generalize to any case where the stride is at least half the pooling size, so that at most 2 pools overlap in each direction, hence that weird-looking if condition earlier.

It would be easy enough to handle the fully general case though: e.g., if the stride is 1 and the pooling size is 3, then we'd simply need to do the backpropagation 9 times, adding the results to the output of the earlier updateGradInput batches.
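The general scheme is just a pair of nested batch offsets. A sketch, where backwardOneBatch is a hypothetical stand-in for one kernel launch over a non-overlapping subset of pools:

-- number of passes needed so that pools within one pass never overlap
local batchesW = math.ceil(kW / dW)  -- 3 x 3 pools, 2 x 2 stride -> 2
local batchesH = math.ceil(kH / dH)  -- size 3, stride 1 -> 3 (9 passes total)
for oy = 0, batchesH - 1 do
  for ox = 0, batchesW - 1 do
    -- handle every batchesH-th pool row starting at oy, and every
    -- batchesW-th pool column starting at ox; writes within one pass
    -- are race-free, and successive passes accumulate into gradInput
    backwardOneBatch(gradInput, gradOutput, indices, oy, ox, batchesH, batchesW)
  end
end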


fmassa commented on June 29, 2024

@hughperkins nn and cunn SpatialMaxPooling (and soon SpatialAveragePooling) were updated to support arbitrary padding pad_w and pad_h. Also, the CUDA kernels were changed (borrowed from Caffe), and now there's no more need for atomic operations in the backward case. I have absolutely no knowledge of OpenCL, but I think adapting those kernels would make it easier to generalise SpatialMaxPooling to all the cases.
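For reference, the gather-style trick works roughly like this: instead of one thread per gradOutput pixel scattering into gradInput (which needs atomics when pools overlap), you launch one thread per gradInput pixel and have it gather from every pool whose window covers it. A scalar sketch of the per-pixel logic, ignoring padding for simplicity (gradAt is a hypothetical helper; indices is assumed to store flat input offsets):

local function gradAt(h, w, W, gradOutput, indices, kH, kW, dH, dW)
  local sum = 0
  -- range of pool rows/columns whose window can cover pixel (h, w)
  local phStart = math.max(1, math.ceil((h - kH) / dH) + 1)
  local phEnd = math.min(gradOutput:size(1), math.floor((h - 1) / dH) + 1)
  local pwStart = math.max(1, math.ceil((w - kW) / dW) + 1)
  local pwEnd = math.min(gradOutput:size(2), math.floor((w - 1) / dW) + 1)
  for ph = phStart, phEnd do
    for pw = pwStart, pwEnd do
      if indices[ph][pw] == (h - 1) * W + w then  -- this pool's max was at (h, w)
        sum = sum + gradOutput[ph][pw]
      end
    end
  end
  return sum  -- written to gradInput[h][w]; no other thread touches it
end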


hughperkins commented on June 29, 2024

@fmassa Ah, good info. Thanks!


hughperkins commented on June 29, 2024

Hi Gustavo,

Several people on the neural-style project encountered the same issue, so I've copy/pasted an implementation for contiguous non-batched pools; if you update your clnn (luarocks install clnn), it might do what you need. If you're using non-contiguous pools, you'll need to either do some extra copy/pasting (or even, ideally, some factorization :-) ), or else follow Francisco's heads-up that the cunn implementation might be worth re-porting over now. I reckon the path of least resistance for now would be to just add an extra copy-paste block :-) Or perhaps factorize a bit.

In terms of testing clnn, I basically run the following tests currently, on a single OS (Ubuntu 14.04) and a single GPU (NVIDIA 940M). This is quite quick to do, so please feel free to change things in whatever way you think is beautiful :-) Ideally not diverging too much from cunn, but I've changed tons of stuff, so I'm fairly flexible. The tests I run are:

th -l clnn -e 'clnn.test()'
-- all tests should pass
git clone git@github.com:karpathy/char-rnn.git
cd char-rnn
th train.lua -opencl 1
-- training loss should decrease from 3-4 ish to about 2-3 ish, shouldn't become NaN, shouldn't crash.  About 5 iterations are sufficient


brunoro commented on June 29, 2024

Hi @hughperkins, thanks for the detailed explanation. Cool, one of the pieces of code I was trying to get running with clnn was actually neural-style, so thanks for pointing out the issue on their repo.

I'll try this week to get it running with some copy/pasting (and/or factorization). Otherwise, re-porting the cunn implementation might be a nice way to brush up my OpenCL :D


hughperkins commented on June 29, 2024

Cool :-) By the way, if you re-port cunn code, you might want to try the following:

git clone git@github.com:torch/cunn.git
git clone git@github.com:hughperkins/clnn.git
cd clnn
python util/port.py
meld port . &

This will show you the diff between the automatically ported cunn files (in the port directory) and the current clnn files (in the . directory). It's far from perfect, but it can provide a useful first draft to work from.


hughperkins commented on June 29, 2024

Hi Gustavo, please note that cunn SpatialMaxPooling has been ported across now, since :ceil() is needed by neural-style. You can luarocks install clnn to pull down the latest version.
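For reference, ceil mode is switched on per module, e.g.:

-- with :ceil(), the output size is rounded up instead of down, so a
-- partial pool at the right/bottom edge still produces an output pixel
local pool = nn.SpatialMaxPooling(3, 3, 2, 2):ceil()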


brunoro commented on June 29, 2024

Oh, sweet. I was in the middle of porting the kernel from cunn, so I guess I don't need to finish that.


hughperkins commented on June 29, 2024

Ok. I can close this issue, right?


brunoro commented on June 29, 2024

Yep, thanks a lot!


