imagenet-clean's Introduction

ImageNet Clean

This repository contains Bash scripts that clean up the ImageNet 1k dataset, together with PyTorch models pretrained in different configurations.

The Bash scripts can be downloaded from https://www.dropbox.com/s/pyzem2svhnx5h6m/imagenet_clean_scripts.tar.gz?dl=0.

The pretrained PyTorch models can be downloaded from https://www.dropbox.com/s/lzm60bz90wfl6ys/imagenet_clean_models.tar.gz?dl=0.

Requirements

Clean up ImageNet 1k (Validation set)

Download the scripts and extract them into a directory. Copy the imagenet_val_*.sh scripts into the validation set subdirectory of the dataset (val/) and run them in the following order:

  1. Fix image labels based on confident learning:
./imagenet_val_1_image_fixes.sh
  2. Remove the wrong or problematic images based on model consensus and confident learning:
./imagenet_val_2_image_removal.sh
  3. Apply categorical fixes:
./imagenet_val_3_categorical_fixes.sh
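
Because each step builds on the previous one, it can help to run the scripts through a small driver that stops at the first failure. The following is a minimal sketch in Python, not part of the repository: the function name `run_in_order` and the `runner` parameter are hypothetical, and invoking the scripts through `bash` is an assumption based on their description as Bash scripts.

```python
import subprocess
from pathlib import Path

# Script names and their order are taken from the steps listed above.
VAL_SCRIPTS = (
    "imagenet_val_1_image_fixes.sh",
    "imagenet_val_2_image_removal.sh",
    "imagenet_val_3_categorical_fixes.sh",
)


def run_in_order(directory, scripts=VAL_SCRIPTS, runner=subprocess.run):
    """Run each script from inside `directory`, stopping at the first failure.

    `runner` is a hypothetical hook (default: subprocess.run) so the
    function can be exercised without touching a real dataset.
    """
    workdir = Path(directory)

    # Refuse to start if a script was not copied next to the images,
    # so the dataset is never left half-processed for that reason.
    missing = [name for name in scripts if not (workdir / name).is_file()]
    if missing:
        raise FileNotFoundError(f"not found in {workdir}: {missing}")

    for name in scripts:
        # check=True raises CalledProcessError on a non-zero exit code,
        # so a later step never runs on top of a failed earlier one.
        runner(["bash", f"./{name}"], cwd=workdir, check=True)
```

A call such as `run_in_order("/path/to/imagenet/val")` (the path is a placeholder) would then execute the three steps in sequence. The same function should work for the training set and ImageNetV2 scripts if their names are passed as `scripts`.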

Clean up ImageNet 1k (Training set)

Download the scripts and extract them into a directory. Copy the imagenet_train_*.sh scripts into the training set subdirectory of the dataset (train/) and run them in the following order:

  1. Fix image labels based on confident learning:
./imagenet_train_1_image_fixes.sh
  2. Remove the wrong or problematic images based on model consensus and confident learning:
./imagenet_train_2_image_removal.sh
  3. Apply categorical fixes:
./imagenet_train_3_categorical_fixes.sh

Optional steps:

  • Removing the wrong images only found by confident learning (a subset of point 2): imagenet_train_2_image_removal1.sh
  • Removing the wrong images only found by model consensus (a subset of point 2): imagenet_train_2_image_removal3.sh
  • Applying the image fixes and removal to CAE/EDSR images (https://github.com/hendrycks/imagenet-r/tree/master/DeepAugment) before the categorical fixes: imagenet_train_cae_edsr_1_image_fixes.sh and imagenet_train_cae_edsr_2_image_removal.sh

Note: The CAE/EDSR scripts expect the CAE/EDSR images to be renamed to a new naming scheme (e.g. n01440764_10042.JPEG -> n01440764_10042_CAE.JPEG).
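
The renaming described in the note could be done along the following lines. This is a minimal sketch, not part of the repository: the helper names `tagged_name` and `rename_images` are hypothetical, the suffix is assumed to go directly before the `.JPEG` extension as in the example above, and an `EDSR` tag is assumed by analogy with `CAE`.

```python
from __future__ import annotations

from pathlib import Path


def tagged_name(filename: str, tag: str) -> str:
    """Insert `_<tag>` before the extension: x.JPEG -> x_<tag>.JPEG."""
    path = Path(filename)
    return f"{path.stem}_{tag}{path.suffix}"


def rename_images(directory: str, tag: str, dry_run: bool = True) -> list[tuple[str, str]]:
    """Tag every *.JPEG file below `directory` that is not tagged yet.

    Returns the (old name, new name) pairs. With dry_run=True (the default)
    nothing on disk is changed, so the result can be inspected first.
    """
    renamed = []
    # sorted() materialises the listing before any file is renamed.
    for path in sorted(Path(directory).rglob("*.JPEG")):
        if path.stem.endswith(f"_{tag}"):
            continue  # already renamed; running twice is harmless
        target = path.with_name(tagged_name(path.name, tag))
        if not dry_run:
            path.rename(target)
        renamed.append((path.name, target.name))
    return renamed
```

Running `rename_images(cae_directory, "CAE")` first as a dry run, where `cae_directory` is a placeholder for the folder holding the CAE images, shows what would be renamed before anything is modified.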

Clean up ImageNetV2 Matched Frequency (Validation set)

Download the scripts and extract them into a directory. Copy the imagenetv2_*.sh scripts into the ImageNetV2 subdirectory and run them in the following order:

  1. Fix image labels based on confident learning:
./imagenetv2_matched_frequency_format_1_image_fixes.sh
  2. Remove the wrong or problematic images based on model consensus and confident learning:
./imagenetv2_matched_frequency_format_2_image_removal.sh
  3. Apply categorical fixes:
./imagenetv2_matched_frequency_format_3_categorical_fixes.sh

Optional steps:

  • Removing the wrong images only found by confident learning (a subset of point 2): imagenetv2_matched_frequency_format_2_image_removal1.sh
  • Removing the wrong images only found by model consensus (a subset of point 2): imagenetv2_matched_frequency_format_2_image_removal3.sh
  • Renaming the alphabetical folder names to nxxxxxxx format: imagenetv2_folder_name_fixes.sh

Pretrained PyTorch models

The pretrained models follow this naming scheme:

model_name-widthxheight-variant.pth.tar

  • model_name - efficientnet_b0, shufflenet_v2_x1_5 or squeezenet1_1
  • variant - baseline (trained on original ImageNet), clean (trained on ImageNet Clean), clean-imagenet-r (trained on ImageNet Clean with CAE/EDSR images)
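
To illustrate the naming scheme, a checkpoint file name can be split into its parts with a regular expression. This is a minimal sketch under two assumptions that are not stated in the repository: model names contain no hyphen (true for the three listed above), and everything between the size and `.pth.tar` is the variant, which may itself contain hyphens. The names `CheckpointName` and `parse_checkpoint_name` are hypothetical.

```python
import re
from typing import NamedTuple


class CheckpointName(NamedTuple):
    model_name: str
    width: int
    height: int
    variant: str


# model_name-widthxheight-variant.pth.tar
_PATTERN = re.compile(
    r"^(?P<model_name>[^-]+)"           # e.g. efficientnet_b0 (assumed hyphen-free)
    r"-(?P<width>\d+)x(?P<height>\d+)"  # e.g. 384x216
    r"-(?P<variant>.+)"                 # e.g. clean; may contain hyphens
    r"\.pth\.tar$"
)


def parse_checkpoint_name(filename: str) -> CheckpointName:
    """Split a checkpoint file name into model name, size and variant."""
    match = _PATTERN.match(filename)
    if match is None:
        raise ValueError(f"not a model_name-widthxheight-variant.pth.tar name: {filename!r}")
    return CheckpointName(
        model_name=match["model_name"],
        width=int(match["width"]),
        height=int(match["height"]),
        variant=match["variant"],
    )
```

For instance, `parse_checkpoint_name("shufflenet_v2_x1_5-224x224-baseline.pth.tar")` yields the model name `shufflenet_v2_x1_5`, a 224 by 224 size and the variant `baseline`.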

Install PyTorch Image Models (timm):

pip3 install timm

Pretrained PyTorch models (example validations)

Validate an EfficientNet-B0 model (trained on ImageNet Clean, portrait input 216x384) on the cleaned ImageNetV2 dataset (top-1/top-5 - 69.26 %/89.29 %):

./validate.py --model efficientnet_b0 --checkpoint efficientnet_b0-384x216-clean.pth.tar -b 64 --log-interval 100 --input-size 3 216 384 --num-classes 1000 IMAGENETV2_DIRECTORY

Validate a SqueezeNet 1.1 model (trained on ImageNet Clean+CAE/EDSR, landscape input 320x180) on the ImageNet validation dataset (top-1/top-5 - 60.89 %/83.15 %):

./validate.py --torchvision-model squeezenet1_1 --checkpoint squeezenet1_1-180x320-clean-imagenetr.pth.tar -b 64 --log-interval 100 --input-size 3 320 180 --num-classes 1000 IMAGENET_VALIDATION_DIRECTORY

Validate a ShuffleNetV2 (x1_5) model (trained on original ImageNet, standard input 224x224) on the cleaned ImageNet validation dataset (top-1/top-5 - 77.93 %/94.57 %):

./validate.py --hub-model-github-or-dir kecsap/vision --hub-model shufflenet_v2_x1_5 --checkpoint shufflenet_v2_x1_5-224x224-baseline.pth.tar -b 64 --log-interval 100 --num-classes 1000 IMAGENET_VALIDATION_DIRECTORY

Citation

If this helps your research, please cite the paper (https://arxiv.org/abs/2103.16324):

@misc{kertész2021automated,
      title={Automated Cleanup of the ImageNet Dataset by Model Consensus, Explainability and Confident Learning}, 
      author={Csaba Kertész},
      year={2021},
      eprint={2103.16324},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

imagenet-clean's Issues

codes of your pretrained nets

Hi, Mr Kertész. You've done a great job and I'm very interested in your work, but I have run into problems reproducing it. I use rwightman's training code, as you do, and set the hyperparameters the same as yours. When I train enetb0 on T, the results on V and Vc are almost the same as yours. However, when I train enetb0 on Tc, the top-1 accuracy of enetb0 on Vc is 81.19.
I get Tc by running these scripts:
./imagenet_train_1_image_fixes.sh
./imagenet_train_2_image_removal.sh
./imagenet_train_3_categorical_fixes.sh
Is the way I get Tc wrong? Or are there other hyperparameters that I should set? Could you also send me your code for SqueezeNet or ShuffleNet? rwightman's code does not contain these, so did you just import them from PyTorch?
Thank you for your kind consideration of this request.
