cuda_cnn's Introduction

CNN implementation with C++ and CUDA

Various versions (CPU, CUDA_NAIVE, CUDA_TILED, GEMM) of convolutional neural network implementations
by Heechul Lim
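
A note on the GEMM variant: in CNN code, "GEMM" usually means lowering the convolution to one matrix multiplication by first unrolling input patches into columns (often called im2col). The block below is a minimal CPU sketch of that idea, assuming a square single-channel input, stride 1 and no padding. The function names `im2col`, `gemm` and `conv_forward_gemm` are hypothetical and are not taken from this repository; the actual GEMM path here runs on the GPU and may be organized differently.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Unroll every ksize x ksize patch of a square, single-channel image into one
// column of a row-major (ksize*ksize) x (out*out) matrix.
std::vector<float> im2col(const std::vector<float>& image,
                          std::size_t in_size, std::size_t ksize) {
    assert(in_size >= ksize && image.size() == in_size * in_size);
    const std::size_t out = in_size - ksize + 1;  // valid convolution, stride 1
    const std::size_t num_cols = out * out;
    std::vector<float> cols(ksize * ksize * num_cols);
    for (std::size_t ky = 0; ky < ksize; ++ky)
        for (std::size_t kx = 0; kx < ksize; ++kx) {
            const std::size_t row = ky * ksize + kx;
            for (std::size_t oy = 0; oy < out; ++oy)
                for (std::size_t ox = 0; ox < out; ++ox)
                    cols[row * num_cols + oy * out + ox] =
                        image[(oy + ky) * in_size + (ox + kx)];
        }
    return cols;
}

// Plain triple-loop GEMM: C (m x n) = A (m x k) * B (k x n), all row-major.
std::vector<float> gemm(const std::vector<float>& a, const std::vector<float>& b,
                        std::size_t m, std::size_t k, std::size_t n) {
    assert(a.size() == m * k && b.size() == k * n);
    std::vector<float> c(m * n, 0.0f);
    for (std::size_t i = 0; i < m; ++i)
        for (std::size_t p = 0; p < k; ++p)
            for (std::size_t j = 0; j < n; ++j)
                c[i * n + j] += a[i * k + p] * b[p * n + j];
    return c;
}

// Convolution as GEMM: weights (num_kernels x ksize*ksize) times the im2col
// matrix gives num_kernels feature maps of out*out values each.
std::vector<float> conv_forward_gemm(const std::vector<float>& image,
                                     std::size_t in_size,
                                     const std::vector<float>& weights,
                                     std::size_t num_kernels, std::size_t ksize) {
    const std::size_t out = in_size - ksize + 1;
    return gemm(weights, im2col(image, in_size, ksize),
                num_kernels, ksize * ksize, out * out);
}
```

The trade-off is extra memory for the unrolled matrix in exchange for reusing a single, well-optimized matrix-multiply routine.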

Layer configuration

  • Convolution (Forward CPU, CUDA_NAIVE, CUDA_TILED, GEMM)
    Input: 1 * 32 * 32
    Output: 8 * 20 * 20
    Kernel size: 13 * 13   Number of kernels (output channels): 8

  • Pooling (Forward CPU, CUDA_NAIVE, CUDA_TILED)
    Input: 8 * 20 * 20
    Output: 8 * 5 * 5
    Kernel size: 4

  • ReLU (CUDA)

  • Inner product 1 (CUDA)
    Input: 8 * 5 * 5 (flattened to 200)
    Output: 200

  • ReLU (CUDA)

  • Inner product 2 (CUDA)
    Input: 200
    Output: 200

  • ReLU (CUDA)

  • Inner product 3 (CUDA)
    Input: 200
    Output: 10

  • Softmax
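
The shapes in the list above follow from two small formulas: a convolution without padding and with stride 1 shrinks each side to `in - kernel + 1` (32 - 13 + 1 = 20), and non-overlapping pooling divides each side by the window (20 / 4 = 5). With 8 kernels the convolution produces 8 feature maps of 20 * 20, which is the pooling input. The sketch below shows those formulas plus a naive CPU forward pass. It assumes stride 1, no padding, no bias and cross-correlation (no kernel flip); the function names are hypothetical and the repository's CPU version may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Side length after a convolution with no padding and stride 1.
std::size_t conv_out_size(std::size_t in, std::size_t kernel) {
    assert(in >= kernel);
    return in - kernel + 1;
}

// Side length after non-overlapping pooling (stride equals window).
std::size_t pool_out_size(std::size_t in, std::size_t window) {
    assert(window > 0 && in % window == 0);
    return in / window;
}

// Naive forward convolution of one square, single-channel image.
// input:   in_size * in_size values, row-major
// weights: num_kernels filters of ksize * ksize values each
// returns: num_kernels * out * out values, one feature map per kernel
std::vector<float> conv_forward_naive(const std::vector<float>& input,
                                      std::size_t in_size,
                                      const std::vector<float>& weights,
                                      std::size_t num_kernels,
                                      std::size_t ksize) {
    assert(input.size() == in_size * in_size);
    assert(weights.size() == num_kernels * ksize * ksize);
    const std::size_t out = conv_out_size(in_size, ksize);
    std::vector<float> output(num_kernels * out * out, 0.0f);
    for (std::size_t k = 0; k < num_kernels; ++k)
        for (std::size_t oy = 0; oy < out; ++oy)
            for (std::size_t ox = 0; ox < out; ++ox) {
                float sum = 0.0f;
                for (std::size_t ky = 0; ky < ksize; ++ky)
                    for (std::size_t kx = 0; kx < ksize; ++kx)
                        sum += input[(oy + ky) * in_size + (ox + kx)] *
                               weights[(k * ksize + ky) * ksize + kx];
                output[(k * out + oy) * out + ox] = sum;
            }
    return output;
}
```

Calling `conv_forward_naive(image, 32, weights, 8, 13)` returns 8 * 20 * 20 = 3200 values. After pooling with window 4 the result is 8 * 5 * 5 = 200 values, which is the flattened input of the first inner product layer.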

Convolution accounts for 80-90% of the total execution time
(source: http://on-demand.gputechconf.com/gtc/2015/webinar/gtc-express-deep-learning-with-cuDNN-webinar.pdf)
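
As a rough sanity check for this network, one can count multiply-accumulate operations (MACs) in the forward pass using the layer sizes listed above. This is only a proxy: it ignores pooling, ReLU, softmax, the backward pass and memory traffic, so it does not measure execution time. The helper names below are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// MACs for a stride-1, unpadded convolution producing num_kernels feature maps
// of out_size x out_size from in_channels input channels.
std::uint64_t conv_macs(std::uint64_t in_channels, std::uint64_t num_kernels,
                        std::uint64_t out_size, std::uint64_t ksize) {
    return in_channels * num_kernels * out_size * out_size * ksize * ksize;
}

// MACs for a fully connected (inner product) layer.
std::uint64_t fc_macs(std::uint64_t inputs, std::uint64_t outputs) {
    return inputs * outputs;
}

// Fraction of the total MACs spent in the convolution.
double conv_share(std::uint64_t conv, std::uint64_t rest) {
    assert(conv + rest > 0);
    return static_cast<double>(conv) / static_cast<double>(conv + rest);
}
```

For the layers above, `conv_macs(1, 8, 20, 13)` gives 540,800 MACs, while the three inner product layers together need 200 * 200 + 200 * 200 + 200 * 10 = 82,000. By this count the convolution is about 87% of the forward MACs, which is in line with the cited range.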

Dataset

  • MNIST: 60k train set, 10k test set
  • Input size: 1 * 32 * 32 (28 * 28 images zero-padded by 2 pixels on each side)
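
MNIST digits are 28 * 28, so a padding of 2 on every side gives the 32 * 32 network input. A minimal sketch of that step follows, assuming the padding value is zero and the image is stored row-major; `zero_pad` is a hypothetical helper, not a function from this repository.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Surround a square, row-major image with `pad` zero pixels on every side.
std::vector<float> zero_pad(const std::vector<float>& image,
                            std::size_t size, std::size_t pad) {
    assert(image.size() == size * size);
    const std::size_t padded = size + 2 * pad;
    std::vector<float> out(padded * padded, 0.0f);
    for (std::size_t y = 0; y < size; ++y)
        for (std::size_t x = 0; x < size; ++x)
            out[(y + pad) * padded + (x + pad)] = image[y * size + x];
    return out;
}
```

`zero_pad(digit, 28, 2)` returns 32 * 32 = 1024 values with the original digit centered.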

Accuracy

  • 1.3 epoch: 90%
  • 30 epoch: 98%

Accuracy depends on the minibatch size and the learning rate.
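
To see where these two hyperparameters enter, here is a minimal sketch of a minibatch update, assuming plain SGD with gradients averaged over the minibatch. The source does not state which optimizer is used, so this is an illustration only, and both function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Number of weight updates in one pass over the data (last batch may be short).
std::size_t updates_per_epoch(std::size_t num_samples, std::size_t batch) {
    assert(batch > 0);
    return (num_samples + batch - 1) / batch;
}

// One SGD step: grad_sum holds gradients summed over the minibatch, so the
// step uses their mean scaled by the learning rate.
void sgd_step(std::vector<float>& weights, const std::vector<float>& grad_sum,
              std::size_t batch, float learning_rate) {
    assert(batch > 0 && weights.size() == grad_sum.size());
    const float scale = learning_rate / static_cast<float>(batch);
    for (std::size_t i = 0; i < weights.size(); ++i)
        weights[i] -= scale * grad_sum[i];
}
```

A smaller minibatch means more, noisier updates per epoch, while the learning rate sets the size of each update, so the accuracy reached after a fixed number of epochs changes with both.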

Experiment environment

  • CPU: Intel Xeon E5-2630 v4 @ 2.2 GHz
  • GPU: NVIDIA GTX 1080 Ti

Results on the training set (60k)

  • Minibatch 100

    Name         Elapsed time, 1 epoch (sec)   Processing speed (images/sec)
    CPU          39.391                        1523.2
    CUDA NAIVE   5.693                         10539.9
    CUDA TILED   5.160                         11628.1
    GEMM         7.890                         7604.7

  • Minibatch 2

    Name         Elapsed time, 1 epoch (sec)   Processing speed (images/sec)
    CPU          53.303                        1125.6
    CUDA NAIVE   17.048                        3519.5
    CUDA TILED   15.877                        3778.9
    GEMM         18.475                        3247.6
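
The two numeric columns are related: multiplying elapsed time by processing speed gives about 60,000 images in every row, so the elapsed time is in seconds and one epoch covers 60,000 training images. The small helpers below (hypothetical names) reproduce the speed column and compute speedups relative to the CPU version.

```cpp
#include <cassert>

// Throughput for one epoch over num_images images.
double images_per_second(double num_images, double elapsed_sec) {
    assert(elapsed_sec > 0.0);
    return num_images / elapsed_sec;
}

// How many times faster a variant is than the baseline.
double speedup(double baseline_sec, double variant_sec) {
    assert(variant_sec > 0.0);
    return baseline_sec / variant_sec;
}
```

For example, `speedup(39.391, 5.160)` is about 7.6 for CUDA TILED over CPU at minibatch 100, and `speedup(53.303, 15.877)` is about 3.4 at minibatch 2.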

Usage

cd ./Release
make clean
make
./CNN

cuda_cnn's People

Contributors

skyde1021

cuda_cnn's Issues

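Hello, I would like to ask about the training accuracy of this project.

I ported the project to Windows, but the accuracy stays at only about 10%.
The timing code you use is Linux-specific, so I commented that part out, but this should not affect the accuracy.
Are there any other changes I need to make?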
