
Efficient ImageNet Classification

🚀 Training ResNet50 on ImageNet in 8 hours.

This repo provides an efficient implementation of ImageNet classification, based on PyTorch, DALI, and Apex.

If you have any questions, please create an issue or contact me at [email protected]

Features

  • Accelerated pre-processing of the input data with DALI
  • Half/mixed-precision training with Apex
  • Real-time logger
  • Extremely simple structure

Getting Started

Installation

1. Download repo

git clone https://github.com/13952522076/Efficient_ImageNet_Classification.git
cd Efficient_ImageNet_Classification

2. Requirements

  • Python 3.6
  • PyTorch 1.3+
  • CUDA 10+
  • GCC 5.0+
pip install -r requirements.txt

3. Install DALI and Apex

DALI Installation:

cd ~
# For CUDA10
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist nvidia-dali-cuda100
# or
# For CUDA11
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist nvidia-dali-cuda110

For more details, please see Nvidia DALI installation.
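
For reference, here is a minimal sketch of a DALI training pipeline using the fn API (the repo itself may use the older ops-class API; data_dir and all parameter values below are placeholders, not values taken from this repo):

from nvidia.dali import pipeline_def, fn, types
from nvidia.dali.plugin.pytorch import DALIClassificationIterator

@pipeline_def
def train_pipe(data_dir):
    # read JPEGs and labels from an ImageNet-style folder tree
    jpegs, labels = fn.readers.file(file_root=data_dir,
                                    random_shuffle=True, name="Reader")
    # decode on the GPU ("mixed" = CPU input, GPU output)
    images = fn.decoders.image(jpegs, device="mixed", output_type=types.RGB)
    images = fn.random_resized_crop(images, size=(224, 224))
    # normalize with the usual ImageNet statistics, output NCHW float tensors
    images = fn.crop_mirror_normalize(
        images, dtype=types.FLOAT, output_layout="CHW",
        mean=[0.485 * 255, 0.456 * 255, 0.406 * 255],
        std=[0.229 * 255, 0.224 * 255, 0.225 * 255])
    return images, labels

pipe = train_pipe(data_dir="/path/to/imagenet/train",
                  batch_size=32, num_threads=4, device_id=0)
pipe.build()
loader = DALIClassificationIterator(pipe, reader_name="Reader")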

Apex Installation:

cd ~
git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./

For more details, please see Apex or Apex Full API documentation.
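
A minimal sketch of how Apex typically plugs into a training script (this mirrors, but is not copied from, what main_step.py/main_cosine.py do; the opt_level and hyper-parameters are placeholders):

import torch
import torchvision.models as models
from apex import amp
from apex.parallel import DistributedDataParallel as DDP

model = models.resnet50().cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

# "O1" patches common ops to run in FP16 while keeping FP32 master weights
model, optimizer = amp.initialize(model, optimizer, opt_level="O1")
model = DDP(model, delay_allreduce=True)

# inside the training loop, scale the loss before backpropagation:
#   with amp.scale_loss(loss, optimizer) as scaled_loss:
#       scaled_loss.backward()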

Training & Testing

We provide two training strategies: a step LR scheduler and a cosine LR scheduler, implemented in main_step.py and main_cosine.py respectively.
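
For intuition, the two schedules look roughly like this (a hedged sketch; the exact step size, decay factor, and epoch count used by the scripts may differ):

import math

def step_lr(base_lr, epoch, step=30, gamma=0.1):
    # decay the learning rate by `gamma` every `step` epochs
    return base_lr * (gamma ** (epoch // step))

def cosine_lr(base_lr, epoch, total_epochs=120):
    # anneal the learning rate from base_lr to 0 along a half cosine
    return 0.5 * base_lr * (1 + math.cos(math.pi * epoch / total_epochs))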

The trained models (the last and the best checkpoints) and the log file are saved to "checkpoints/imagenet/model_name" by default.
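
The usual last/best checkpoint pattern looks like the sketch below (the file names are assumptions, not taken from this repo):

import os
import torch

def save_checkpoint(state, is_best, save_dir="checkpoints/imagenet/model_name"):
    os.makedirs(save_dir, exist_ok=True)
    # always overwrite the latest checkpoint
    torch.save(state, os.path.join(save_dir, "checkpoint.pth.tar"))
    if is_best:
        # keep a separate copy of the best-accuracy model
        torch.save(state, os.path.join(save_dir, "model_best.pth.tar"))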


I personally suggest manually setting the path to the ImageNet dataset in main_step.py (line 49) and main_cosine.py (line 50): replace the default value with your real path.

Alternatively, you can pass the --data argument in the training commands below.

For the step learning rate scheduler, run the following command:

# change the parameters accordingly if necessary
# e.g., if you have 4 GPUs, set nproc_per_node to 4; to train in FP32, remove --fp16.
python3 -m torch.distributed.launch --nproc_per_node=8 main_step.py -a old_resnet50 --fp16 --b 32

For the cosine learning rate scheduler, run the following command:

# change the parameters accordingly if necessary
python3 -m torch.distributed.launch --nproc_per_node=8 main_cosine.py -a old_resnet18 --b 64 --opt-level O0

Add New Models

Please follow the same coding style in models/resnet.py.

  1. Add a new model file in the models folder
  2. Import the model file in the model package, i.e. models/__init__.py (see the sketch below)
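
A sketch of step 2, assuming a hypothetical new file models/my_net.py that defines a constructor my_net50:

# models/__init__.py
from .resnet import *
from .my_net import *   # hypothetical new model file

The new model can then presumably be selected with -a my_net50, mirroring the -a old_resnet50 usage above.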

Calculate Parameters and FLOPs

python3 count_Param.py

๐Ÿ›: It would not consider the forward operations. For example, defining a pooling layer in init function and implementing the pooling operation in forward function will lead to different results.

Acknowledgements

This implementation is built upon the PyTorch ImageNet demo and PytorchInsight.

Many thanks to Xiang Li for his great work.


