
A Visual Exploration of Convolutional Neural Networks (CNNs)

For this report, I used the following two links to explore the inner workings of a CNN.

To start off, my understanding of CNNs was fuzzy at best. I understood that a kernel was a sliding window with a set of parameters, and that at every neuron in a convolutional layer the kernel would slide across the pixels of the image (assuming, of course, the input was an image), perform some operation on them, and produce a new image. I also knew there was downsampling (and upsampling, where applicable) involved. Beyond this, I did not understand the details.

Exploring both links offered much deeper insight. Both visualize the process of convolution, and it is really cool to actually see the output images after every step. First of all, I got a better sense of what convolution actually does to the image. I had assumed that the early layers of an image-classifier CNN would only spot simple features such as edges, and this assumption was confirmed by the output of the first convolutional layer, which mainly highlights the edges of the input, as in the image below.

Result of convolution from an early layer of a CNN

It was really interesting to realize that a convolution is not magic, but simply the dot product of the pixel values in the image and the weights in the kernel, with a bias added. This made clear exactly which weights were being trained and how altering them shifts the result of the convolution. It also made sense that, because the later layers take a selection of neurons from the previous layer as inputs, the features detected quickly grow in complexity. When inspecting the later layers, the outputs were no longer interpretable to me, as they are an amalgamation of multiple input neurons.
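To make the "dot product plus bias" idea concrete, here is a minimal NumPy sketch of a single-channel convolution. It is an illustrative stand-in, not code from either visualization, and the Sobel-style edge kernel is hand-crafted to mimic the kind of edge detector an early layer might learn.

```python
import numpy as np

def conv2d(image, kernel, bias=0.0):
    """Slide the kernel over the image, taking a dot product at each position."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i:i + kh, j:j + kw]
            out[i, j] = np.sum(patch * kernel) + bias   # dot product + bias
    return out

# Hand-crafted 3x3 kernel that responds strongly to vertical edges.
edge_kernel = np.array([[1, 0, -1],
                        [2, 0, -2],
                        [1, 0, -1]], dtype=float)

image = np.random.rand(8, 8)              # stand-in for a grayscale input image
feature_map = conv2d(image, edge_kernel)
print(feature_map.shape)                  # (6, 6): slightly smaller than the input
```

Training a convolutional layer amounts to adjusting the numbers inside `edge_kernel` (and the bias) by gradient descent rather than writing them by hand.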

Downsampling through max pooling was easy enough to understand, and I could clearly see why it was used in this scenario. As we are simply doing classification and don't need to reconstruct the image, we don't care too much about preserving its size. Rather, we just want to downsize the image so that it is less computationally intensive to run through the rest of the CNN.
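Max pooling itself is simple enough to sketch as well. The snippet below (again just an illustrative NumPy sketch with made-up data) keeps the strongest response in each 2x2 block, quartering the amount of data passed to the next layer.

```python
import numpy as np

def maxpool2d(feature_map, size=2):
    """Downsample by keeping only the largest value in each size x size block."""
    h, w = feature_map.shape
    h, w = h - h % size, w - w % size                      # drop any ragged edge
    blocks = feature_map[:h, :w].reshape(h // size, size, w // size, size)
    return blocks.max(axis=(1, 3))

feature_map = np.arange(16, dtype=float).reshape(4, 4)
print(maxpool2d(feature_map))
# [[ 5.  7.]
#  [13. 15.]]  -- one value (the maximum) survives from each 2x2 block
```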

Finally, I understood the classification step quite clearly. First, the outputs from the final round of convolution and downsampling get flattened. A fully connected layer (which has to be fully connected, since we need information from every final output) is then used to calculate the likelihood of the image belonging to each category. Softmax is used as the last step to turn these scores into likelihoods; I assume softmax is ideal here since we only want to predict one category. The category with the highest score is of course the predicted one.
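Put together, the classification head is roughly the following sketch (NumPy, with random placeholder weights and feature maps, purely to show the shapes involved):

```python
import numpy as np

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    shifted = logits - logits.max()            # subtract max for numerical stability
    exps = np.exp(shifted)
    return exps / exps.sum()

# Pretend these are the pooled outputs of the last convolutional layer:
# 4 channels, each a 3x3 feature map.
features = np.random.rand(4, 3, 3)
flat = features.reshape(-1)                    # flatten: a vector of 36 values

num_classes = 10                               # one score per category
W = np.random.randn(num_classes, flat.size) * 0.01   # fully connected weights
b = np.zeros(num_classes)

logits = W @ flat + b                          # every output feeds every class score
probs = softmax(logits)
print(probs.sum())                             # 1.0
print(probs.argmax())                          # index of the predicted category
```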

I learned a lot from this exploration. The inner workings of a CNN were made a lot clearer, and I now have visual insight into the previously abstract concepts of convolution, downsampling, and image classification.
