lambda-tensorflow-benchmark's Introduction

This is the code to produce the TensorFlow benchmark on this website

Here are also some related blog posts:

Tested Environment:

  • OS: Ubuntu 18.04
  • TensorFlow version: 1.15.4 or 2.3.1
  • CUDA version: 10.0
  • cuDNN version: 7.6.5

You can use Lambda Stack, which installs the above software stack system-wide. If you already have CUDA 10.0 installed, you can instead create a Python virtual environment by following these steps:

virtualenv -p /usr/bin/python3.6 venv
. venv/bin/activate

pip install matplotlib

# Install only one of the following TensorFlow versions.
# TensorFlow 1.15.4
pip install tensorflow-gpu==1.15.4

# TensorFlow 2.3.1
pip install tensorflow-gpu==2.3.1

Step One: Clone benchmark repo

git clone https://github.com/lambdal/lambda-tensorflow-benchmark.git --recursive

Step Two: Run benchmark with thermal profiler

TF_XLA_FLAGS=--tf_xla_auto_jit=2 \
./batch_benchmark.sh \
min_num_gpus max_num_gpus \
num_runs num_batches_per_run \
thermal_sampling_frequency \
config_file

Note that if min_num_gpus differs from max_num_gpus, the benchmark is launched multiple times: once for each GPU count from min_num_gpus to max_num_gpus.
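The sweep over GPU counts can be sketched in Python as follows. This is a minimal illustration, not the script's actual code: the helper name gpu_counts is hypothetical, the range is assumed to be inclusive at both ends, and the descending order mirrors the `seq ${MAX_NUM_GPU} -1 ${MIN_NUM_GPU}` loop quoted in one of the issues further down this page.

```python
def gpu_counts(min_num_gpus, max_num_gpus):
    """Return the GPU counts a sweep would cover, largest first.

    Hypothetical helper for illustration only; it assumes the range is
    inclusive at both ends.
    """
    if min_num_gpus < 1 or max_num_gpus < min_num_gpus:
        raise ValueError("expected 1 <= min_num_gpus <= max_num_gpus")
    return list(range(max_num_gpus, min_num_gpus - 1, -1))


# One benchmark launch per entry in the returned list.
print(gpu_counts(1, 4))
print(gpu_counts(4, 4))
```

Under these assumptions, a 1-to-4 sweep yields four launches and a 4-to-4 sweep yields a single launch.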

This example benchmarks 4 GPUs (min_num_gpus=4 and max_num_gpus=4) for a single run (num_runs=1) of 100 batches (num_batches_per_run=100), sampling GPU temperature every 2 seconds (thermal_sampling_frequency=2) and using the config file config/config_resnet50_replicated_fp32_train_syn:

TF_XLA_FLAGS=--tf_xla_auto_jit=2 \
./batch_benchmark.sh 4 4 \
1 100 \
2 \
config/config_resnet50_replicated_fp32_train_syn
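Because the script takes six positional arguments, it is easy to mix up their order. The following Python sketch maps named parameters onto that order and prints the resulting command line. The function name build_benchmark_command is hypothetical and is not part of the repository; the argument order is taken from the usage shown above.

```python
import shlex


def build_benchmark_command(min_num_gpus, max_num_gpus, num_runs,
                            num_batches_per_run, thermal_sampling_frequency,
                            config_file):
    """Return the argument list for batch_benchmark.sh in positional order.

    Hypothetical helper; it only assembles the arguments and runs nothing.
    """
    return [
        "./batch_benchmark.sh",
        str(min_num_gpus), str(max_num_gpus),
        str(num_runs), str(num_batches_per_run),
        str(thermal_sampling_frequency),
        config_file,
    ]


args = build_benchmark_command(
    4, 4, 1, 100, 2, "config/config_resnet50_replicated_fp32_train_syn")

# Print a shell-ready command line, with the XLA flag set as in the example.
print("TF_XLA_FLAGS=--tf_xla_auto_jit=2 "
      + " ".join(shlex.quote(a) for a in args))
```

To launch the benchmark from Python rather than print the command, the same list could be passed to subprocess.run together with an environment that sets TF_XLA_FLAGS.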

The config file config_resnet50_replicated_fp32_train_syn.sh sets up a training throughput test for resnet50 that uses replicated mode for parameter updates, fp32 precision, and synthetic (syn) data:

MODELS="resnet50"
VARIABLE_UPDATE="replicated"
PRECISION="fp32"
RUN_MODE="train"
DATA_MODE="syn"
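To inspect such a config file outside the shell, the assignments can be read into a dictionary. The sketch below is a minimal Python illustration that assumes one plain KEY="value" assignment per line with no shell expansion; parse_config is a hypothetical helper, not part of the repository. Splitting MODELS on whitespace reflects the `for model in $MODELS` loop quoted in one of the issues further down this page.

```python
import shlex


def parse_config(text):
    """Parse simple KEY="value" lines into a dict.

    Assumes one plain assignment per line with no shell expansion;
    blank lines and # comment lines are skipped.
    """
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, raw = line.partition("=")
        # shlex.split removes the surrounding shell quotes.
        parts = shlex.split(raw)
        settings[key.strip()] = parts[0] if parts else ""
    return settings


example = '''MODELS="resnet50"
VARIABLE_UPDATE="replicated"
PRECISION="fp32"
RUN_MODE="train"
DATA_MODE="syn"
'''

config = parse_config(example)
# MODELS may hold several space-separated model names.
print(config["MODELS"].split())
print(config["PRECISION"], config["RUN_MODE"], config["DATA_MODE"])
```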

You can find more example configurations in the config folder.

Step Three: Report Results

These commands gather the results in the logs folder into CSV files:

python tools/log2csv.py --precision fp32 
python tools/log2csv.py --precision fp16

The gathered results are saved in tf-train-throughput-fp16.csv, tf-train-throughput-fp32.csv, tf-train-bs-fp16.csv and tf-train-bs-fp32.csv.

Add your own log to the list_system dictionary in tools/log2csv.py so that it is included in the generated CSV files.

You can also display the throughput vs. time and GPU temperature vs. time graphs using this command:

python tools/display_thermal.py path-to-thermal.log --thermal_threshold threshold

For example, this is the command to display the graphs of a ResNet50 training using 8x2080Ti:

python tools/display_thermal.py \
logs/Gold_6230-GeForce_RTX_2080_Ti_XLA_trt_TF2_2.logs/syn-replicated-fp16-8gpus/resnet50-128/thermal/1 \
--thermal_threshold 89

Synthetic Data vs. Real Data

Setting DATA_MODE="syn" in the config file makes the benchmarks use synthetic data. In this case, images of random pixel colors are generated in GPU memory to avoid overheads such as I/O and data augmentation.

You can also benchmark with real data. To do so, set DATA_MODE="real" in the config file. You also need ImageNet TFRecords. For the purpose of benchmarking training throughput, you can download this mini portion of ImageNet (1.3 GB) and unzip it to your home directory.
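Switching a config between the two data modes amounts to rewriting a single line. As a minimal Python sketch, assuming the config contains a line that starts with DATA_MODE= as in the example config shown earlier, the change could be made like this. The helper set_data_mode is hypothetical and not part of the repository.

```python
import re


def set_data_mode(config_text, mode):
    """Return config_text with its DATA_MODE line set to "syn" or "real"."""
    if mode not in ("syn", "real"):
        raise ValueError("mode must be 'syn' or 'real'")
    new_text, count = re.subn(
        r"^DATA_MODE=.*$",
        'DATA_MODE="{}"'.format(mode),
        config_text,
        flags=re.MULTILINE,
    )
    if count == 0:
        raise ValueError("no DATA_MODE line found")
    return new_text


original = 'RUN_MODE="train"\nDATA_MODE="syn"\n'
print(set_data_mode(original, "real"))
```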

AMD

Follow the guidance here

alias drun='sudo docker run \
      -it \
      --network=host \
      --device=/dev/kfd \
      --device=/dev/dri \
      --ipc=host \
      --shm-size 16G \
      --group-add video \
      --cap-add=SYS_PTRACE \
      --security-opt seccomp=unconfined \
      -v $HOME/dockerx:/dockerx'

drun rocm/tensorflow:rocm3.5-tf2.1-dev

# Install these two MIOpen kernel packages inside the container:
https://repo.radeon.com/rocm/apt/3.5/pool/main/m/miopenkernels-gfx906-60/miopenkernels-gfx906-60_1.0.0_amd64.deb
https://repo.radeon.com/rocm/apt/3.5/pool/main/m/miopenkernels-gfx906-64/miopenkernels-gfx906-64_1.0.0_amd64.deb

cd /home/dockerx
git clone https://github.com/lambdal/lambda-tensorflow-benchmark.git --recursive

# Run a quick resnet50 test in FP32
./batch_benchmark.sh 1 1 1 100 2 config_resnet50_replicated_fp32_train_syn

# Run full test for all models, FP32 and FP16, training and inference
./batch_benchmark.sh 1 1 1 100 2 config_all

lambda-tensorflow-benchmark's People

Contributors

chuanli11, jeremybobbin, sclarkson, stephenbalaban

lambda-tensorflow-benchmark's Issues

NVLink memory pooling

Congrats on the article.

Regarding this sentence:

Benchmark the 2080 Ti with multiple GPUs, with and without the NVLINK connector.

It sounds like memory can be pooled across GeForce cards even without NVLink. Are you sure? (I.e., with 2 cards we could use a bigger batch size.)

Average time of benchmark

Could you please add an estimate of the benchmark's average running time somewhere on the info page?

Thanks!

Usage for GPU Integrity Verification?

Just wondering, how would you use this code to test the integrity of a GPU, if at all?
Give or take suggestions for how to modify it for that.

I'd commit back my modifications if this comes up as a viable and good fit.

I'm not able to run the benchmarks...running into a few issues

I'm using the lambdalabs docker images to build (https://github.com/lambdal/lambda-stack-dockerfiles/).

This is the command I run:

sudo nvidia-docker run -v /dev/nvidia0:/dev/nvidia0 -v $(pwd)/lambda-tensorflow-benchmark:/dockerx -v $(pwd)/run_nvidia_benchmark.sh:/run_nvidia_benchmark.sh -it --gpus 1 lambda-stack:20.04 /run_nvidia_benchmark.sh

run_nvidia_benchmark.sh contains the following:

#!/bin/bash

export DEBIAN_FRONTEND=noninteractive
export LTB=lambda-tensorflow-benchmark
export WORKDIR=${WORKDIR:-/dockerx}

apt update && apt install -y 'libcudart10.1' kmod
pip install matplotlib tensorflow-gpu==2.3.1

git config --global --add safe.directory ${WORKDIR}
cd ${WORKDIR}

TF_XLA_FLAGS=--tf_xla_auto_jit=2 \
	./batch_benchmark.sh 1 1 \
	1 100 \
	2 \
	config/config_resnet50_replicated_fp32_train_syn

However, this is the error I get:

Processor-NVIDIA_A100-SXM4-80GB
modinfo: ERROR: Module alias nvidia not found.
Batchsize for VRAM size '80GB' not optimized
--optimizer=sgd --model=resnet50 --num_gpus=1 --batch_size= --variable_update=replicated --distortions=false --num_batches=100 --data_name=imagenet --all_reduce_spec=nccl
2022-11-13 01:52:00.900420: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/tensorflow/python/compat/v2_compat.py:96: disable_resource_variables (from tensorflow.python.ops.variable_scope) is deprecated and will be removed in a future version.
Instructions for updating:
non-resource variables are not supported in the long term
FATAL Flags parsing error: flag --batch_size=: invalid literal for int() with base 10: ''
Pass --helpshort or --helpfull to see help on flags.

From digging in the bash code it seems that $batch_size is an evaluation of $model:

  # approximately line 163 of benchmark.sh
  eval batch_size=\$$model

Which is being passed in via run_benchmark_all...

run_benchmark_all() {
  for model in $MODELS; do
    for num_gpus in $(seq ${MAX_NUM_GPU} -1 ${MIN_NUM_GPU}); do
      for iter in $(seq 1 $ITERATIONS); do
        run_benchmark
        sleep 10
      done
    done
  done
}

So then somehow MODELS is not being read from the config. I'll be investigating further, but if you happen to know anything I'm doing wrong, then please say so.
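For readers following the trace, the indirection in `eval batch_size=\$$model` can be mimicked in Python. This is only a sketch: the table of batch sizes is made up for illustration and is not taken from the repository, and lookup_batch_size is a hypothetical name. It shows how a model name with no matching variable yields an empty string, which is consistent with the `--batch_size=` flag in the log above but does not by itself establish the root cause.

```python
def lookup_batch_size(model, shell_vars):
    # Mimics the shell indirection: read the variable whose name is the
    # value of `model`. An unset shell variable expands to an empty string.
    return shell_vars.get(model, "")


# Hypothetical values for illustration; not taken from the repository.
shell_vars = {"resnet50": "64"}

for model in ["resnet50", "vgg16"]:
    batch_size = lookup_batch_size(model, shell_vars)
    print("--model={} --batch_size={}".format(model, batch_size))
```

The second printed line ends in an empty `--batch_size=`, the same shape as the flag that the parser rejects in the log.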

I've ensured that docker can detect the gpus:

$ sudo docker run --gpus 1 lambda-stack:20.04 python3 -c "import torch; print(torch.cuda.device_count())"
1

Benchmark Issue

Really good article, guys. I have an RTX 2070 here and would like to run the test. I installed your stack from your website; the only thing I did after installation was update the Nvidia drivers so that the RTX cards are recognized. However, the tests are not working. Here is the output:

running benchmark for frameworks ['pytorch', 'tensorflow', 'caffe2']
cuda version= None
cudnn version= 7201
/home/bizon/benchmark/deep-learning-benchmark-master/frameworks/pytorch/models.py:17: UserWarning: volatile was removed and now has no effect. Use with torch.no_grad(): instead.
self.eval_input = torch.autograd.Variable(x, volatile=True).cuda() if precision == 'fp32'
Segmentation fault

Thanks in advance

Permission Denied Error

When I run the command "./benchmark.sh 0", I get a "Permission Denied" error in the log file. I'm running the command prompt as administrator on Windows 10. Any idea what's causing the error?
