
deep-learning-benchmark's Introduction

Benchmark on Deep Learning Frameworks and GPUs

The performance of popular deep learning frameworks and GPUs is compared, including the effect of adjusting the floating-point precision. (The new Volta architecture allows a performance boost by using half- or mixed-precision calculations.)
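To give a feel for what "half precision" means numerically, here is a minimal sketch that rounds a Python float to a 16-bit IEEE 754 value and back using only the standard library. The helper name `to_float16` is hypothetical and is not part of the benchmark code; the benchmarks themselves rely on the frameworks' own half-precision support.

```python
import struct

def to_float16(x):
    """Round a Python float to IEEE 754 half precision and convert it back."""
    # The "e" format character packs a value as a 16-bit float.
    return struct.unpack("<e", struct.pack("<e", x))[0]

# 0.1 is not exactly representable, so a nearby 16-bit value is stored instead.
print(to_float16(0.1))
# Very small magnitudes fall below the 16-bit range and round to zero.
print(to_float16(1e-8))
```

The reduced range and resolution shown here are why half precision is often combined with 32-bit accumulation in mixed-precision setups.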

Deep Learning Frameworks

Note: Docker images from NVIDIA GPU Cloud were used to keep the benchmarks controlled and repeatable by anyone.

  • PyTorch 0.3.0

    • docker pull nvcr.io/nvidia/pytorch:17.12
  • Caffe2 0.8.1

    • docker pull nvcr.io/nvidia/caffe2:17.12
  • TensorFlow 1.4.0 (coming next)

    • docker pull nvcr.io/nvidia/tensorflow:17.12
  • MXNet 1.0.0 (anyone interested?)

    • docker pull nvcr.io/nvidia/mxnet:17.12
  • CNTK (anyone interested?)

    • docker pull nvcr.io/nvidia/cntk:17.12

GPUs

| Model | Architecture | Memory | CUDA Cores | Tensor Cores | F32 TFLOPS | F16 TFLOPS | Retail | Cloud |
|---|---|---|---|---|---|---|---|---|
| Tesla V100 | Volta | 16GB HBM2 | 5120 | 640 | 15.7 | 125 | | $3.06/hr (p3.2xlarge) |
| Titan V | Volta | 12GB HBM2 | 5120 | 640 | 15 | 110* | $2999 | N/A |
| 1080 Ti | Pascal | 11GB GDDR5 | 3584 | 0 | 11 | N/A | $699 | N/A |

CUDA / CuDNN

  • CUDA 9.0.176
  • CuDNN 7.0.0.5
  • NVIDIA driver 387.34

These versions apply except where noted.

Networks

  • VGG16
  • Resnet152
  • Densenet161
  • Any others you might be interested in?

Benchmark Results

PyTorch 0.3.0

The results are based on running the models with images of size 224 x 224 x 3 and a batch size of 16. "Eval" shows the duration of a single forward pass, averaged over 20 passes. "Train" shows the duration of one forward pass plus one backward pass, averaged over 20 runs. In both scenarios, 20 warm-up runs are performed first and are not counted toward the measured numbers.
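The measurement procedure described above can be sketched as a small timing helper. This is an illustration under stated assumptions, not the repository's actual benchmark script: `benchmark`, `step`, and `sync` are hypothetical names, where `step` stands for one forward pass (or one forward and backward pair) and `sync` stands for a call that waits for queued GPU work, such as `torch.cuda.synchronize`.

```python
import time

def benchmark(step, warmup=20, runs=20, sync=None):
    """Return the mean duration of step() in milliseconds.

    step: hypothetical callable running one pass on a batch.
    sync: optional callable that blocks until pending GPU work is done.
    """
    for _ in range(warmup):
        step()                      # warm-up runs, excluded from the timing
    if sync is not None:
        sync()                      # make sure warm-up work has finished
    start = time.perf_counter()
    for _ in range(runs):
        step()
    if sync is not None:
        sync()                      # wait for asynchronous kernels before stopping the clock
    elapsed = time.perf_counter() - start
    return elapsed / runs * 1000.0  # seconds per run -> milliseconds

# Usage with a CPU-only stand-in for a model pass.
print(f"{benchmark(lambda: sum(range(10000))):.3f} ms per run")
```

Synchronizing before reading the clock matters on GPUs because kernel launches return before the work completes; without it the measured time can be far too small.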

Titan V gets a significant speedup when going to half precision by using its Tensor Cores, while the 1080 Ti gets only a small speedup from half-precision computation. The numbers from a V100 on an Amazon p3 instance are also shown. It is faster than the Titan V, and its speedup when going to half precision is similar to that of the Titan V.

Titan V

| Precision | VGG16 eval | VGG16 train | Resnet152 eval | Resnet152 train | Densenet161 eval | Densenet161 train |
|---|---|---|---|---|---|---|
| 32-bit | 31.3ms | 108.8ms | 48.9ms | 180.2ms | 52.4ms | 174.1ms |
| 16-bit | 14.7ms | 74.1ms | 26.1ms | 115.9ms | 32.2ms | 118.9ms |

1080 Ti

| Precision | VGG16 eval | VGG16 train | Resnet152 eval | Resnet152 train | Densenet161 eval | Densenet161 train |
|---|---|---|---|---|---|---|
| 32-bit | 39.3ms | 131.9ms | 57.8ms | 206.4ms | 62.9ms | 211.9ms |
| 16-bit | 33.5ms | 117.6ms | 46.9ms | 193.5ms | 50.1ms | 191.0ms |

V100 (Amazon p3, CUDA 9.0.176, CuDNN 7.0.0.3)

| Precision | VGG16 eval | VGG16 train | Resnet152 eval | Resnet152 train | Densenet161 eval | Densenet161 train |
|---|---|---|---|---|---|---|
| 32-bit | 26.2ms | 83.5ms | 38.7ms | 136.5ms | 48.3ms | 142.5ms |
| 16-bit | 12.6ms | 58.8ms | 21.7ms | 92.9ms | 35.7ms | 102.3ms |
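The half-precision speedups discussed in this section can be checked directly from the three PyTorch tables above. The sketch below does this for the VGG16 columns only; the dictionary layout and the `speedup` helper are illustrative names and do not come from the benchmark code.

```python
# Durations in milliseconds, copied from the PyTorch 0.3.0 tables above.
# Each GPU maps a precision to (VGG16 eval, VGG16 train).
PYTORCH_VGG16_MS = {
    "Titan V": {"32-bit": (31.3, 108.8), "16-bit": (14.7, 74.1)},
    "1080 Ti": {"32-bit": (39.3, 131.9), "16-bit": (33.5, 117.6)},
    "V100":    {"32-bit": (26.2, 83.5),  "16-bit": (12.6, 58.8)},
}

def speedup(fp32_ms, fp16_ms):
    """How many times faster the 16-bit run is than the 32-bit run."""
    return fp32_ms / fp16_ms

for gpu, rows in PYTORCH_VGG16_MS.items():
    eval_x = speedup(rows["32-bit"][0], rows["16-bit"][0])
    train_x = speedup(rows["32-bit"][1], rows["16-bit"][1])
    print(f"{gpu}: eval {eval_x:.2f}x, train {train_x:.2f}x")
```

For VGG16 eval this works out to roughly 2.1x on both Volta cards and about 1.2x on the 1080 Ti, which matches the qualitative description above.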

TensorFlow 1.4.0

Titan V

| Precision | VGG16 eval | VGG16 train | Resnet152 eval | Resnet152 train | Densenet161 eval | Densenet161 train |
|---|---|---|---|---|---|---|
| 32-bit | 31.8ms | 157.2ms | 50.3ms | 269.8ms | | |
| 16-bit | 16.1ms | 96.7ms | 28.4ms | 193.3ms | | |

1080 Ti

| Precision | VGG16 eval | VGG16 train | Resnet152 eval | Resnet152 train | Densenet161 eval | Densenet161 train |
|---|---|---|---|---|---|---|
| 32-bit | 43.4ms | 131.3ms | 69.6ms | 300.6ms | | |
| 16-bit | 38.6ms | 121.1ms | 53.9ms | 257.0ms | | |

Caffe2 0.8.1

Titan V

| Precision | VGG16 eval | VGG16 train | Resnet152 eval | Resnet152 train | Densenet161 eval | Densenet161 train |
|---|---|---|---|---|---|---|
| 32-bit | 57.5ms | 185.4ms | 74.4ms | 214.1ms | | |
| 16-bit | 41.6ms | 156.1ms | 56.9ms | 172.7ms | | |

1080 Ti

| Precision | VGG16 eval | VGG16 train | Resnet152 eval | Resnet152 train | Densenet161 eval | Densenet161 train |
|---|---|---|---|---|---|---|
| 32-bit | 47.0ms | 158.9ms | 77.9ms | 223.9ms | | |
| 16-bit | 40.1ms | 137.8ms | 61.7ms | 184.1ms | | |

Comparison Graphs

Comparison of Titan V vs 1080 Ti, PyTorch 0.3.0 vs TensorFlow 1.4.0 vs Caffe2 0.8.1, and FP32 vs FP16, in terms of images processed per second:

[Graphs: vgg16-eval, vgg16-train, resnet152-eval, resnet152-train]
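The per-batch durations in the tables can be converted to images per second as follows. This is a sketch that assumes the batch size of 16 stated for the PyTorch runs also applies to the number being converted; `images_per_second` is a hypothetical helper name.

```python
BATCH_SIZE = 16  # batch size stated in the benchmark description

def images_per_second(duration_ms, batch_size=BATCH_SIZE):
    """Convert the duration of one batch (in ms) to throughput in images/s."""
    return batch_size / (duration_ms / 1000.0)

# Titan V, PyTorch 0.3.0, VGG16 eval (durations from the table above).
print(f"32-bit: {images_per_second(31.3):.0f} images/s")
print(f"16-bit: {images_per_second(14.7):.0f} images/s")
```

Because throughput is the reciprocal of duration, a halved duration shows up as a doubled bar in an images-per-second graph.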

Contributors

  • Yusaku Sako
  • Bartosz Ludwiczuk (thank you for supplying the V100 numbers)
