lambdal / lambda-tensorflow-benchmark

License: BSD 3-Clause "New" or "Revised" License

Shell 53.34% Python 46.66%

lambda-tensorflow-benchmark's Introduction

This is the code to produce the TensorFlow benchmark on this website

Here are also some related blog posts:

RTX 2080 Ti Deep Learning Benchmarks with TensorFlow - 2020: https://lambdalabs.com/blog/2080-ti-deep-learning-benchmarks/
Titan RTX Deep Learning Benchmarks: https://lambdalabs.com/blog/titan-rtx-tensorflow-benchmarks/
Titan V Deep Learning Benchmarks with TensorFlow in 2019: https://lambdalabs.com/blog/titan-v-deep-learning-benchmarks/

Tested Environment:

OS: Ubuntu 18.04
TensorFlow version: 1.15.4 or 2.3.1
CUDA Version 10.0
CUDNN Version 7.6.5

You can use Lambda Stack, which installs the above software stack system-wide. If you already have CUDA 10.0 installed, you can instead create a Python virtual environment by following these steps:

virtualenv -p /usr/bin/python3.6 venv
. venv/bin/activate

pip install matplotlib

# Install one of the two tested versions.
# TensorFlow 1.15.4
pip install tensorflow-gpu==1.15.4

# or TensorFlow 2.3.1
pip install tensorflow-gpu==2.3.1
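
After installing, it can help to confirm which TensorFlow version the virtual environment actually picked up. The following is a minimal sketch, not part of the repository; the helper names `is_tested_version` and `report_installed_tensorflow` are hypothetical, and the version list is copied from the Tested Environment section above.

```python
# Versions listed under "Tested Environment" in this README.
TESTED_TF_VERSIONS = ("1.15.4", "2.3.1")


def is_tested_version(version):
    """Return True if `version` is one of the versions this README lists as tested."""
    return version in TESTED_TF_VERSIONS


def report_installed_tensorflow():
    """Describe the TensorFlow install in the current environment, if any."""
    try:
        # Imported lazily so the sketch also runs where TensorFlow is absent.
        import tensorflow as tf
    except ImportError:
        return "TensorFlow is not installed in this environment"
    status = "tested" if is_tested_version(tf.__version__) else "untested"
    return "TensorFlow {} ({})".format(tf.__version__, status)


if __name__ == "__main__":
    print(report_installed_tensorflow())
```

Run it with the virtual environment activated; an "untested" result does not mean the benchmark will fail, only that the version differs from the ones listed above.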

Step One: Clone benchmark repo

git clone https://github.com/lambdal/lambda-tensorflow-benchmark.git --recursive

Step Two: Run benchmark with thermal profiler

TF_XLA_FLAGS=--tf_xla_auto_jit=2 \
./batch_benchmark.sh \
min_num_gpus max_num_gpus \
num_runs num_batches_per_run \
thermal_sampling_frequency \
config_file

Note that if min_num_gpus differs from max_num_gpus, the benchmark is launched multiple times, once for each GPU count from min_num_gpus to max_num_gpus.
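
To estimate how many launches a given set of arguments produces, the sweep can be sketched as below. This is an illustration only: the function name `planned_launches` is hypothetical, the count is per model in the config file, and the ordering (largest GPU count first) is an assumption about the script rather than something this section states.

```python
def planned_launches(min_num_gpus, max_num_gpus, num_runs):
    """List (num_gpus, run_index) pairs, one per benchmark launch for one model.

    Assumption: GPU counts are visited from max_num_gpus down to
    min_num_gpus, inclusive, and each count is repeated num_runs times.
    """
    if min_num_gpus > max_num_gpus:
        raise ValueError("min_num_gpus must not exceed max_num_gpus")
    return [
        (num_gpus, run_index)
        for num_gpus in range(max_num_gpus, min_num_gpus - 1, -1)
        for run_index in range(1, num_runs + 1)
    ]


if __name__ == "__main__":
    # Sweeping 1 to 4 GPUs with 2 runs each gives 4 * 2 launches per model.
    for num_gpus, run_index in planned_launches(1, 4, 2):
        print("num_gpus={} run={}".format(num_gpus, run_index))
```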

This example benchmarks 4 GPUs (min_num_gpus=4 and max_num_gpus=4) for a single run (num_runs=1) of 100 batches (num_batches_per_run=100), takes a thermal sample every 2 seconds (thermal_sampling_frequency=2), and uses the config file config/config_resnet50_replicated_fp32_train_syn.

TF_XLA_FLAGS=--tf_xla_auto_jit=2 \
./batch_benchmark.sh 4 4 \
1 100 \
2 \
config/config_resnet50_replicated_fp32_train_syn
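
If you prefer to drive the script from Python (for example to queue several configurations), the same invocation can be assembled as follows. This is a sketch under assumptions: the wrapper functions `build_benchmark_command` and `run_benchmark` are hypothetical, and `repo_dir` is assumed to be the root of the cloned repository.

```python
import os
import subprocess


def build_benchmark_command(min_num_gpus, max_num_gpus, num_runs,
                            num_batches_per_run, thermal_sampling_frequency,
                            config_file):
    """Return the argument list for batch_benchmark.sh in the order shown above."""
    return [
        "./batch_benchmark.sh",
        str(min_num_gpus), str(max_num_gpus),
        str(num_runs), str(num_batches_per_run),
        str(thermal_sampling_frequency),
        config_file,
    ]


def run_benchmark(command, repo_dir="."):
    """Run the command from repo_dir with the XLA flag used in the example."""
    env = dict(os.environ, TF_XLA_FLAGS="--tf_xla_auto_jit=2")
    return subprocess.run(command, cwd=repo_dir, env=env, check=True)


if __name__ == "__main__":
    command = build_benchmark_command(
        4, 4, 1, 100, 2, "config/config_resnet50_replicated_fp32_train_syn")
    # Print only; call run_benchmark(command) to actually start the benchmark.
    print(" ".join(command))
```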

The config file config_resnet50_replicated_fp32_train_syn.sh sets up a training throughput test for resnet50 that uses replicated mode for parameter updates, fp32 precision, and synthetic (syn) data:

MODELS="resnet50"
VARIABLE_UPDATE="replicated"
PRECISION="fp32"
RUN_MODE="train"
DATA_MODE="syn"
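
To inspect a config file from Python, the assignments can be read into a dictionary. The sketch below assumes the file contains only simple KEY="value" lines like the ones above; it does not evaluate shell expansions, and `parse_config` is a hypothetical helper, not part of the repository.

```python
import shlex


def parse_config(text):
    """Parse simple shell-style KEY="value" lines into a dict of strings."""
    settings = {}
    for line in text.splitlines():
        # shlex strips the quotes and drops "#" comments for us.
        tokens = shlex.split(line, comments=True)
        if len(tokens) != 1 or "=" not in tokens[0]:
            continue  # skip blank lines and anything that is not one assignment
        key, value = tokens[0].split("=", 1)
        settings[key] = value
    return settings


if __name__ == "__main__":
    example = '''
MODELS="resnet50"
VARIABLE_UPDATE="replicated"
PRECISION="fp32"
RUN_MODE="train"
DATA_MODE="syn"
'''
    for key, value in parse_config(example).items():
        print("{} = {}".format(key, value))
```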

You can find more example configurations in the config folder.

Step Three: Report Results

These commands gather the results in the logs folder into CSV files:

python tools/log2csv.py --precision fp32 
python tools/log2csv.py --precision fp16

The gathered results are saved in tf-train-throughput-fp16.csv, tf-train-throughput-fp32.csv, tf-train-bs-fp16.csv and tf-train-bs-fp32.csv.
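
The generated files can then be loaded with any CSV reader. The column layout is not described in this README, so the sketch below makes only one assumption, that the first row is a header; `load_results` is a hypothetical helper.

```python
import csv
import os

RESULT_FILES = (
    "tf-train-throughput-fp16.csv",
    "tf-train-throughput-fp32.csv",
    "tf-train-bs-fp16.csv",
    "tf-train-bs-fp32.csv",
)


def load_results(path):
    """Return (header, rows) from a CSV file; the first row is assumed to be a header."""
    with open(path, newline="") as handle:
        rows = list(csv.reader(handle))
    if not rows:
        return [], []
    return rows[0], rows[1:]


if __name__ == "__main__":
    for name in RESULT_FILES:
        if not os.path.exists(name):
            print("{}: not found".format(name))
            continue
        header, rows = load_results(name)
        print("{}: {} columns, {} rows".format(name, len(header), len(rows)))
```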

Add your own log to the list_system dictionary in tools/log2csv.py so that it is included in the generated CSV files.

You can also display the throughput vs. time and GPU temperature vs. time graphs using this command:

python tools/display_thermal.py path-to-thermal.log --thermal_threshold threshold

For example, this is the command to display the graphs of a ResNet50 training using 8x2080Ti:

python tools/display_thermal.py \
logs/Gold_6230-GeForce_RTX_2080_Ti_XLA_trt_TF2_2.logs/syn-replicated-fp16-8gpus/resnet50-128/thermal/1 \
--thermal_threshold 89

Synthetic Data vs. Real Data

Setting DATA_MODE="syn" in the config file makes the benchmarks use synthetic data. In this case, images of random pixel colors are generated in GPU memory to avoid overheads such as I/O and data augmentation.

You can also benchmark with real data. To do so, set DATA_MODE="real" in the config file. You also need ImageNet tfrecords. For the purpose of benchmarking training throughput, you can download and unzip this mini portion of ImageNet (1.3 GB) to your home directory.
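
Switching between the two modes amounts to changing one line of the config file. The following is a minimal sketch of that edit on the config text, assuming the file has a single DATA_MODE=... line; `set_data_mode` is a hypothetical helper and only the two values named in this section are accepted.

```python
import re

# Matches the whole DATA_MODE assignment line, whatever its current value.
_DATA_MODE_LINE = re.compile(r"^DATA_MODE=.*$", flags=re.MULTILINE)


def set_data_mode(config_text, mode):
    """Return config_text with its DATA_MODE line set to "syn" or "real"."""
    if mode not in ("syn", "real"):
        raise ValueError("mode must be 'syn' or 'real'")
    if not _DATA_MODE_LINE.search(config_text):
        raise ValueError("config has no DATA_MODE line")
    return _DATA_MODE_LINE.sub('DATA_MODE="{}"'.format(mode), config_text)


if __name__ == "__main__":
    original = 'MODELS="resnet50"\nDATA_MODE="syn"\n'
    print(set_data_mode(original, "real"))
```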

AMD

Follow the guidance here

alias drun='sudo docker run \
      -it \
      --network=host \
      --device=/dev/kfd \
      --device=/dev/dri \
      --ipc=host \
      --shm-size 16G \
      --group-add video \
      --cap-add=SYS_PTRACE \
      --security-opt seccomp=unconfined \
      -v $HOME/dockerx:/dockerx'

drun rocm/tensorflow:rocm3.5-tf2.1-dev

# Install these two packages inside the container:
https://repo.radeon.com/rocm/apt/3.5/pool/main/m/miopenkernels-gfx906-60/miopenkernels-gfx906-60_1.0.0_amd64.deb 
https://repo.radeon.com/rocm/apt/3.5/pool/main/m/miopenkernels-gfx906-64/miopenkernels-gfx906-64_1.0.0_amd64.deb

cd /home/dockerx
git clone https://github.com/lambdal/lambda-tensorflow-benchmark.git --recursive

# Run a quick resnet50 test in FP32
./batch_benchmark.sh 1 1 1 100 2 config_resnet50_replicated_fp32_train_syn

# Run full test for all models, FP32 and FP16, training and inference
./batch_benchmark.sh 1 1 1 100 2 config_all

lambda-tensorflow-benchmark's Issues

When will the final results come? And could you compare the Titan V to the RTX 2080 Ti? Thanks.

No tf_cnn_benchmark.py

When I run 'benchmark.sh', I get an error saying there is no tf_cnn_benchmark.py. Please help.

NVLink memory pooling

Congrats on the article.

Regarding this sentence:

Benchmark the 2080 Ti with multiple GPUs, with and without the NVLINK connector.

It sounds like we can pool the memory of the GeForce cards even without NVLink. Are you sure? (I.e., with 2 cards we can use a bigger batch size.)

(standard_in) 2: syntax error when running report.sh

I get "(standard_in) 2: syntax error" when running report.sh. Searching online suggests that this is likely a bc error. Any help would be great. Thank you.

Average time of benchmark

Could you please put somewhere in the info page an average estimate of the running time of the experiment?

Thanks!

Benchmark with own data and model

Hello,
Would it be possible to provide a starting point for using the pipeline with our own data and our own models?

[Feature Request]

Hi folks, any chance of upgrading this to TensorFlow 2.x?

Command to benchmark NVIDIA 1050 Ti

Hello,
May I get the command to run the benchmark on NVIDIA 1050 Ti?

The current script is reporting an error.

Thanks

How to benchmark with multiple GPUs?

I tried to change the MAX_NUM_GPU in line 9 of benchmark.sh but it didn't work.

Thanks!

Tensorflow 2.10.1?

New models like stable-diffusion? tensorflow 2.10.1?

Usage for GPU Integrity Verification?

Just wondering, how would you use this code to test the integrity of a GPU, if at all?
Give or take suggestions for how to modify it for that.

I'd commit back my modifications if this comes up as a viable and good fit.

I'm not able to run the benchmarks...running into a few issues

I'm using the lambdalabs docker images to build (https://github.com/lambdal/lambda-stack-dockerfiles/).

This is the command I run:

sudo nvidia-docker run -v /dev/nvidia0:/dev/nvidia0 -v $(pwd)/lambda-tensorflow-benchmark:/dockerx -v $(pwd)/run_nvidia_benchmark.sh:/run_nvidia_benchmark.sh -it --gpus 1 lambda-stack:20.04 /run_nvidia_benchmark.sh

run_nvidia_benchmark.sh contains the following:

#!/bin/bash

export DEBIAN_FRONTEND=noninteractive
export LTB=lambda-tensorflow-benchmark
export WORKDIR=${WORKDIR:-/dockerx}

apt update && apt install -y 'libcudart10.1' kmod
pip install matplotlib tensorflow-gpu==2.3.1

git config --global --add safe.directory ${WORKDIR}
cd ${WORKDIR}

TF_XLA_FLAGS=--tf_xla_auto_jit=2 \
	./batch_benchmark.sh 1 1 \
	1 100 \
	2 \
	config/config_resnet50_replicated_fp32_train_syn

However, this is the error I get:

Processor-NVIDIA_A100-SXM4-80GB
modinfo: ERROR: Module alias nvidia not found.
Batchsize for VRAM size '80GB' not optimized
--optimizer=sgd --model=resnet50 --num_gpus=1 --batch_size= --variable_update=replicated --distortions=false --num_batches=100 --data_name=imagenet --all_reduce_spec=nccl
2022-11-13 01:52:00.900420: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/tensorflow/python/compat/v2_compat.py:96: disable_resource_variables (from tensorflow.python.ops.variable_scope) is deprecated and will be removed in a future version.
Instructions for updating:
non-resource variables are not supported in the long term
FATAL Flags parsing error: flag --batch_size=: invalid literal for int() with base 10: ''
Pass --helpshort or --helpfull to see help on flags.

From digging in the bash code it seems that $batch_size is an evaluation of $model:

  # approximately line 163 of benchmark.sh
  eval batch_size=\$$model

$model, in turn, is set by the loop in run_benchmark_all:

run_benchmark_all() {
  for model in $MODELS; do
    for num_gpus in $(seq ${MAX_NUM_GPU} -1 ${MIN_NUM_GPU}); do
      for iter in $(seq 1 $ITERATIONS); do
        run_benchmark
        sleep 10
      done
    done
  done
}

So then somehow MODELS is not being read from the config. I'll be investigating further, but if you happen to know anything I'm doing wrong, then please say so.
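
The indirect lookup quoted above can be mimicked in Python to see how an empty --batch_size= arises. This is a sketch of one possible reading, not a confirmed diagnosis: the log shows --model=resnet50 together with an empty --batch_size=, which is consistent with $model being set while no shell variable named resnet50 holds a batch size. The helper names and the value "64" are hypothetical.

```python
def lookup_batch_size(shell_vars, model):
    # Mimics `eval batch_size=\$$model`: read the variable named after the model.
    # An unset shell variable expands to an empty string, hence the default.
    return shell_vars.get(model, "")


def batch_size_flag(shell_vars, model):
    """Build the --batch_size flag the way the script would pass it on."""
    return "--batch_size={}".format(lookup_batch_size(shell_vars, model))


def parse_batch_size(flag):
    """Parse the flag value as an integer, as the flag parser in the log does."""
    value = flag.split("=", 1)[1]
    return int(value)  # int("") raises ValueError, matching the FATAL line


if __name__ == "__main__":
    # A variable named after the model exists: the flag is well formed.
    print(batch_size_flag({"resnet50": "64"}, "resnet50"))
    # No such variable: the flag ends with "=" and cannot be parsed.
    print(batch_size_flag({}, "resnet50"))
```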

I've ensured that docker can detect the gpus:

$ sudo docker run --gpus 1 lambda-stack:20.04 python3 -c "import torch; print(torch.cuda.device_count())"
1

Benchmark Issue

Really good article, guys. I have an RTX 2070 here and would like to run the test. I installed your stack from your website; the only thing I did after installation was update the NVIDIA drivers so that the RTX cards are recognized. However, the tests are not working. Here is the issue:

running benchmark for frameworks ['pytorch', 'tensorflow', 'caffe2']
cuda version= None
cudnn version= 7201
/home/bizon/benchmark/deep-learning-benchmark-master/frameworks/pytorch/models.py:17: UserWarning: volatile was removed and now has no effect. Use with torch.no_grad(): instead.
self.eval_input = torch.autograd.Variable(x, volatile=True).cuda() if precision == 'fp32'
Segmentation fault

Thanks in advance

Permission Denied Error

When I run command "./benchmark.sh 0" I get a "Permission Denied" error in the log file. I'm running the command prompt as administrator on Windows 10. Any idea what's causing the error?