
obow's Introduction

Online Bag-of-Visual-Words Generation for Unsupervised Representation Learning

Official PyTorch implementation of the OBoW paper accepted at CVPR 2021

[OBoW method overview figure]

Spyros Gidaris, Andrei Bursuc, Gilles Puy, Nikos Komodakis, Matthieu Cord, Patrick Pérez, CVPR 2021

If you use the OBoW code or framework in your research, please consider citing:

@inproceedings{gidaris2021obow,
    title={Online Bag-of-Visual-Words Generation for Unsupervised Representation Learning},
    author={Gidaris, Spyros and Bursuc, Andrei and Puy, Gilles and Komodakis, Nikos and Cord, Matthieu and P{\'e}rez, Patrick},
    booktitle={CVPR},
    year={2021}
}

License

This code is released under the MIT License (refer to the LICENSE file for details).

Preparation

Pre-requisites

  • Python 3.7
  • PyTorch >= 1.3.1 (tested with 1.3.1)
  • CUDA 10.0 or higher

Installation

(1) Clone the repo:

$ git clone https://github.com/valeoai/obow

(2) Install this repository and the dependencies using pip:

$ pip install -e ./obow

With this, you can edit the obow code on the fly and import obow functions and classes in other projects as well.
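
A quick way to verify that the editable install works is to import the package from Python. This is only a sanity-check sketch, assuming the package installed above is importable as obow:

# Sanity check for the editable install (assumes the installed package is named obow).
import obow

# Because the install is editable (-e), this path should point inside the cloned repository,
# and edits to the source take effect without reinstalling.
print(obow.__file__)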

(3) Optional. To uninstall this package, run:

$ pip uninstall obow

(4) Create experiment directory:

$ cd obow
$ mkdir ./experiments

You can take a look at the Dockerfile if you are uncertain about the steps to install this project.

Download our ResNet50 pre-trained model

Method | Epochs | Batch size | Dataset  | ImageNet linear acc. | Links to pre-trained weights
OBoW   | 200    | 256        | ImageNet | 73.8                 | entire model / only feature extractor

To download our ResNet50 pre-trained model from the command line run:

# Run from the OBoW directory
$ mkdir ./experiments/ImageNetFull
$ cd ./experiments/ImageNetFull

# To download all model files
$ wget https://github.com/valeoai/obow/releases/download/v0.1.0/ImageNetFull_ResNet50_OBoW_full.zip
$ unzip ImageNetFull_ResNet50_OBoW_full.zip

# To download only the student feature extractor in torchvision-like format
$ wget https://github.com/valeoai/obow/releases/download/v0.1.0/ImageNetFull_ResNet50_OBoW_full_feature_extractor.zip
$ unzip ImageNetFull_ResNet50_OBoW_full_feature_extractor.zip

$ cd ../../
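
Below is a rough sketch of how the student feature extractor in the torchvision-like format could be loaded into a standard torchvision ResNet50. The file name and the checkpoint's key layout are placeholders and assumptions, not guaranteed by the release; adjust them to whatever the downloaded zip actually contains.

import torch
import torchvision.models as models

# Placeholder path: point this at the checkpoint file extracted from the zip above.
checkpoint_path = "./experiments/ImageNetFull/resnet50_student_feature_extractor.pth.tar"

checkpoint = torch.load(checkpoint_path, map_location="cpu")
# Some releases store the state dict directly, others nest it under a key such as "network";
# handle both cases here (the key name is an assumption).
state_dict = checkpoint.get("network", checkpoint) if isinstance(checkpoint, dict) else checkpoint

model = models.resnet50()
# strict=False tolerates a missing classification head if only the backbone weights were released.
missing, unexpected = model.load_state_dict(state_dict, strict=False)
print("missing keys:", missing)
print("unexpected keys:", unexpected)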

Experiments: Training and evaluating ImageNet self-supervised features.

Train a ResNet50-based OBoW model (full solution) on the ImageNet dataset.

# Run from the obow directory
# Train the OBoW model.
$ python main_obow.py --config=ImageNetFull/ResNet50_OBoW_full --workers=32 -p=250 --dst-dir=./experiments/ --data-dir=/datasets_local/ImageNet --multiprocessing-distributed --dist-url='tcp://127.0.0.1:4444'

Here, --data-dir=/datasets_local/ImageNet assumes that the ImageNet dataset is located at /datasets_local/ImageNet. The configuration file for this experiment, specified by the --config argument, is located at ./config/ImageNetFull/ResNet50_OBoW_full.py. Note that all experiment configuration files are placed in the ./config/ directory. The data of this experiment, such as checkpoints and logs, will be stored at ./experiments/ImageNetFull/ResNet50_OBoW_full.

Evaluate on the ImageNet linear classification protocol

Train an ImageNet linear classification model on top of the frozen features learned by the student network of the OBoW model.

# Run from the obow directory
# Train and evaluate a linear classifier for the 1000-way ImageNet classification task.
$ python main_linear_classification.py --config=ImageNetFull/ResNet50_OBoW_full --workers=32 -p=250 -b 1024 --wd 0.0 --lr 10.0 --epochs 100 --cos-schedule --dataset ImageNet --name "ImageNet_LinCls_b1024_wd0lr10_e100" --dst-dir=./experiments/ --data-dir=/datasets_local/ImageNet --multiprocessing-distributed --dist-url='tcp://127.0.0.1:4444'

The data of this experiment, such as checkpoints and logs, will be stored at ./experiments/ImageNetFull/ResNet50_OBoW_full/ImageNet_LinCls_b1024_wd0lr10_e100.
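
For reference, the linear classification protocol keeps the pre-trained backbone frozen and trains only a linear layer on top of its pooled features. The following is a minimal, generic sketch of that setup (not the repository's actual training code; the torchvision backbone and the hyper-parameters mirroring -b 1024, --lr 10.0 and --wd 0.0 are used only for illustration):

import torch
import torch.nn as nn
import torchvision.models as models

# Generic linear-probing sketch; model loading and the data pipeline are omitted placeholders.
backbone = models.resnet50()
backbone.fc = nn.Identity()          # expose the 2048-d globally pooled features
for p in backbone.parameters():
    p.requires_grad = False          # the feature extractor stays frozen
backbone.eval()

classifier = nn.Linear(2048, 1000)   # 1000-way ImageNet classifier
optimizer = torch.optim.SGD(classifier.parameters(), lr=10.0, momentum=0.9, weight_decay=0.0)
criterion = nn.CrossEntropyLoss()

def training_step(images, labels):
    with torch.no_grad():            # no gradients flow into the frozen backbone
        features = backbone(images)
    loss = criterion(classifier(features), labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()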

Evaluate on the Places205 linear classification protocol

Train a Places205 linear classification model on top of frozen features extracted from the OBoW model.

# Run from the obow directory
# Train and evaluate a linear classifier for the 205-way Places205 classification task.
$ python main_linear_classification.py --config=ImageNetFull/ResNet50_OBoW_full --dataset Places205 --batch-norm --workers=32 -p=500 -b 256 --wd 0.00001 --lr 0.01 --epochs 28 --schedule 10 20 --name "Places205_LinCls_b256_wd1e4lr0p01_e28" --dst-dir=./experiments/ --data-dir=/datasets_local/Places205 --multiprocessing-distributed --dist-url='tcp://127.0.0.1:4444'

The data of this experiment, such as checkpoints and logs, will be stored at ./experiments/ImageNetFull/ResNet50_OBoW_full/Places205_LinCls_b256_wd1e4lr0p01_e28.

ImageNet semi-supervised evaluation setting.

# Run from the obow directory
# Fine-tune with 1% of ImageNet annotated images.
$ python main_semisupervised.py --config=ImageNetFull/ResNet50_OBoW_full --workers=32 -p=50  --dst-dir=./experiments/ --data-dir=/datasets_local/ImageNet --multiprocessing-distributed --dist-url='tcp://127.0.0.1:4444' --percentage 1 --lr=0.0002 --lr-head=0.5 --lr-decay=0.2 --wd=0.0 --epochs=40 --schedule 24 32 --name="semi_supervised_prc1_wd0_lr0002lrp5_e40"
# Fine-tune with 10% of ImageNet annotated images.
$ python main_semisupervised.py --config=ImageNetFull/ResNet50_OBoW_full --workers=32 -p=50  --dst-dir=./experiments/ --data-dir=/datasets_local/ImageNet --multiprocessing-distributed --dist-url='tcp://127.0.0.1:4444' --percentage 10 --lr=0.0002 --lr-head=0.5 --lr-decay=0.2 --wd=0.0 --epochs=20 --schedule 12 16 --name="semi_supervised_prc10_wd0_lr0002lrp5_e20"

The data of these experiments, such as checkpoints and logs, will be stored at ./experiments/ImageNetFull/ResNet50_OBoW_full/semi_supervised_prc1_wd0_lr0002lrp5_e40 and ./experiments/ImageNetFull/ResNet50_OBoW_full/semi_supervised_prc10_wd0_lr0002lrp5_e20 (for the 1% and 10% settings respectively).
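
The --lr and --lr-head flags correspond to fine-tuning the pre-trained backbone and the newly added classification head with different learning rates. A generic sketch of how such per-group learning rates and the --schedule / --lr-decay milestones can be expressed in PyTorch (not the repository's exact optimizer code; the backbone and head below are placeholders) is shown here:

import torch
import torch.nn as nn
import torchvision.models as models

# Placeholder modules standing in for the pre-trained feature extractor and the new head.
backbone = models.resnet50()
backbone.fc = nn.Identity()
head = nn.Linear(2048, 1000)

# Two parameter groups with different learning rates, mirroring --lr and --lr-head.
optimizer = torch.optim.SGD(
    [
        {"params": backbone.parameters(), "lr": 0.0002},  # small lr for pre-trained weights
        {"params": head.parameters(), "lr": 0.5},         # larger lr for the randomly initialized head
    ],
    momentum=0.9,
    weight_decay=0.0,
)

# --schedule 24 32 with --lr-decay 0.2 corresponds to multiplying both learning rates
# by 0.2 at epochs 24 and 32 (MultiStepLR applies gamma to every parameter group).
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[24, 32], gamma=0.2)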

Convert to torchvision format.

The ResNet50 model that we trained is stored in a different format than that of the torchvision ResNet50 model. The following command converts it to the torchvision format.

$ python main_obow.py --config=ImageNetFull/ResNet50_OBoW_full --dst-dir=./experiments/ --data-dir=/datasets_local/ImageNet --multiprocessing-distributed --dist-url='tcp://127.0.0.1:4444' --convert-to-torchvision

Pascal VOC07 Classification evaluation.

First convert the model to the torchvision format (see the command above), and then convert it from the torchvision format to the caffe2 format:

# Run from the obow directory
$ python utils/convert_pytorch_to_caffe2.py --pth_model ./experiments/ImageNetFull/ResNet50_OBoW_full/tochvision_resnet50_student_K8192_epoch200.pth.tar --output_model ./experiments/ImageNetFull/ResNet50_OBoW_full/caffe2_resnet50_student_K8192_epoch200_bgr.pkl --rgb2bgr True

For the following steps, you first need to download and install fair_self_supervision_benchmark.

# Run from the fair_self_supervision_benchmark directory
$ python setup.py install
$ python -c 'import self_supervision_benchmark'
# Step 1: prepare the dataset.
$ mkdir obow_ep200
$ mkdir obow_ep200/voc
$ mkdir obow_ep200/voc/voc07
$ python extra_scripts/create_voc_data_files.py --data_source_dir /datasets_local/VOC2007/ --output_dir ./obow_ep200/voc/voc07/
# Step 2: extract features from voc2007
$ mkdir obow_ep200/ssl-benchmark-output
$ mkdir obow_ep200/ssl-benchmark-output/extract_features_gap
$ mkdir obow_ep200/ssl-benchmark-output/extract_features_gap/data
# ==> Extract pool5 features from the train split.
$ python tools/extract_features.py \
    --config_file [obow directory path]/utils/configs/benchmark_tasks/image_classification/voc07/resnet50_supervised_extract_gap_features.yaml \
    --data_type train \
    --output_file_prefix trainval \
    --output_dir ./obow_ep200/ssl-benchmark-output/extract_features_gap/data \
    NUM_DEVICES 1 TEST.BATCH_SIZE 64 TRAIN.BATCH_SIZE 64 \
    TEST.PARAMS_FILE [obow directory path]/experiments/obow/ImageNetFull/ResNet50_OBoW_full/caffe2_resnet50_student_K8192_epoch200_bgr.pkl \
    TRAIN.DATA_FILE ./obow_ep200/voc/voc07/train_images.npy \
    TRAIN.LABELS_FILE ./obow_ep200/voc/voc07/train_labels.npy
# ==> Extract pool5 features from the test split.
$ python tools/extract_features.py \
    --config_file [obow directory path]/utils/configs/benchmark_tasks/image_classification/voc07/resnet50_supervised_extract_gap_features.yaml \
    --data_type test \
    --output_file_prefix test \
    --output_dir ./obow_ep200/ssl-benchmark-output/extract_features_gap/data \
    NUM_DEVICES 1 TEST.BATCH_SIZE 64 TRAIN.BATCH_SIZE 64 \
    TEST.PARAMS_FILE [obow directory path]/experiments/obow/ImageNetFull/ResNet50_OBoW_full/caffe2_resnet50_student_K8192_epoch200_bgr.pkl \
    TRAIN.DATA_FILE ./obow_ep200/voc/voc07/test_images.npy TEST.DATA_FILE ./obow_ep200/voc/voc07/test_images.npy \
    TRAIN.LABELS_FILE ./obow_ep200/voc/voc07/test_labels.npy TEST.LABELS_FILE ./obow_ep200/voc/voc07/test_labels.npy
# Step 3: Train and test linear SVMs.
# ==> Train linear svms.
$ mkdir obow_ep200/ssl-benchmark-output/extract_features_gap/data/voc07_svm
$ mkdir obow_ep200/ssl-benchmark-output/extract_features_gap/data/voc07_svm/svm_pool5bn
$ python tools/svm/train_svm_kfold.py \
    --data_file ./obow_ep200/ssl-benchmark-output/extract_features_gap/data/trainval_pool5_bn_features.npy \
    --targets_data_file ./obow_ep200/ssl-benchmark-output/extract_features_gap/data/trainval_pool5_bn_targets.npy \
    --costs_list "0.05,0.1,0.3,0.5,1.0,3.0,5.0" \
    --output_path ./obow_ep200/ssl-benchmark-output/extract_features_gap/data/voc07_svm/svm_pool5bn/  
# ==> Test the linear svms.
$ python tools/svm/test_svm.py \
    --data_file ./obow_ep200/ssl-benchmark-output/extract_features_gap/data/test_pool5_bn_features.npy \
    --targets_data_file ./obow_ep200/ssl-benchmark-output/extract_features_gap/data/test_pool5_bn_targets.npy \
    --costs_list "0.05,0.1,0.3,0.5,1.0,3.0,5.0" \
    --output_path ./obow_ep200/ssl-benchmark-output/extract_features_gap/data/voc07_svm/svm_pool5bn/    

Pascal VOC07+12 Object Detection evaluation.

(1) First install Detectron2.

(2) Convert a pre-trained model from the torchvision format to the caffe2 format required by Detectron2 (see command above).

(3) Put the dataset under the "./datasets" directory, following the directory structure required by Detectron2.

(4) Copy the provided config file into the Detectron2 repo under configs/PascalVOC-Detection.

(5) From the Detectron2 directory, launch the train_net.py script to reproduce the object detection experiments on Pascal VOC:

$ python tools/train_net.py --num-gpus 8 --config-file configs/PascalVOC-Detection/pascal_voc_0712_faster_rcnn_R_50_C4_BoWNetpp_K8192.yaml

Other experiments: Training using 20% of ImageNet and ResNet18.

A single GPU is enough for the following experiments.

ResNet18-based OBoW vanilla solution.

# Run from the obow directory
# Train the model.
$ python main_obow.py --config=ImageNet20/ResNet18_OBoW_vanilla --workers=16 --dst-dir=./experiments/ --data-dir=/datasets_local/ImageNet
# Few-shot evaluation.
$ python main_obow.py --config=ImageNet20/ResNet18_OBoW_vanilla --workers=16 --episodes 200 --fewshot-q 1 --fewshot-n 50 --fewshot-k 1 5 --evaluate --start-epoch=-1 --dst-dir=./experiments/ --data-dir=/datasets_local/ImageNet
# Linear classification evaluation. Note that the following command precaches the extracted features at /root/local_storage/spyros/cache/obow.
$ python main_linear_classification.py --config=ImageNet20/ResNet18_OBoW_vanilla --workers=16 -b 256 --wd 0.000002 --dataset ImageNet --name "ImageNet_LinCls_precache_b256_lr10p0wd2e6" --precache --lr 10.0 --epochs 50 --schedule 15 30 45 --subset=260 --dst-dir=./experiments/ --data-dir=/datasets_local/ImageNet --cache-dir=/root/local_storage/spyros/cache/obow

ResNet18-based OBoW full solution.

# Run from the obow directory
# Train the model.
$ python main_obow.py --config=ImageNet20/ResNet18_OBoW_full --workers=16 --dst-dir=./experiments/ --data-dir=/datasets_local/ImageNet
# Few-shot evaluation.
$ python main_obow.py --config=ImageNet20/ResNet18_OBoW_full --workers=16 --episodes 200 --fewshot-q 1 --fewshot-n 50 --fewshot-k 1 5 --evaluate --start-epoch=-1 --dst-dir=./experiments/ --data-dir=/datasets_local/ImageNet
# Linear classification evaluation. Note that the following command precaches the extracted features at /root/local_storage/spyros/cache/obow.
$ python main_linear_classification.py --config=ImageNet20/ResNet18_OBoW_full --workers=16 -b 256 --wd 0.000002 --dataset ImageNet --name "ImageNet_LinCls_precache_b256_lr10p0wd2e6" --precache --lr 10.0 --epochs 50 --schedule 15 30 45 --subset=260 --dst-dir=./experiments/ --data-dir=/datasets_local/ImageNet --cache-dir=/root/local_storage/spyros/cache/obow

Download the ResNet18-based OBoW models pre-trained on 20% of ImageNet.

# Run from the OBoW directory
$ mkdir ./experiments/ImageNet20
$ cd ./experiments/ImageNet20

# To download the full OBoW version
$ wget https://github.com/valeoai/obow/releases/download/v0.1.0/ImageNet20_ResNet18_OBoW_full.zip
$ unzip ImageNet20_ResNet18_OBoW_full.zip

# To download the vanilla OBoW version
$ wget https://github.com/valeoai/obow/releases/download/v0.1.0/ImageNet20_ResNet18_OBoW_vanilla.zip
$ unzip ImageNet20_ResNet18_OBoW_vanilla.zip

$ cd ../../

obow's Issues

which section of imagenet data should be used?

Hi, I am a new research student in DL. Could you tell me which part of ImageNet I should download and use? Thank you!
(The question then quotes the ImageNet "Download ImageNet Data" page; its options are summarized below.)

  • Face-blurred ILSVRC 2012–2017 classification data (released March 11, 2021); researchers are strongly urged to use this privacy-aware version. The ILSVRC 2012 classification and localization test set was updated on October 10, 2019.
  • Winter 2021 release: ImageNet21K (MD5: ab313ce03179fd803a401b02c651c0a2), a processed version of ImageNet21K using the script of "ImageNet-21K Pretraining for the Masses", and ImageNet10K from Deng et al., ECCV 2010.
  • People subtree annotations (FAT 2020): unsafe synsets and imageability annotations; due to the sensitivity of the data, the demographic annotations are only available on request.
  • ImageNet Large Scale Visual Recognition Challenge (ILSVRC) data for 2010–2017 and the ILSVRC 2012–2017 evaluation server.
  • Face obfuscation in ILSVRC: face annotations, blurred training images, and blurred validation images.
  • Object bounding boxes (AAAI 2010) and object attributes (ECCV workshop 2010).
  • Visual Domain Decathlon image data (PASCAL in Detail Workshop Challenge), 6.1 GB.
  • Downsampled image data (8x8, 16x16, 32x32, 64x64), npz format.
  • Tiny ImageNet (Stanford CS231N), 236 MB.

Docker Image not working

There is some problem with the Dockerfile mentioned here!

The Dockerfile shows as not available!

Please solve this ASAP.

Asymmetry between the student and the teacher networks

Great work and thanks for sharing!
Here I have two questions for the method design:

  1. I notice that in the teacher network the codes are computed by L2 distance, while in the student network the codes are computed by the inner product (cosine). Any special insights into this?
  2. In the student network, the dynamically generated words are L2-normalized; however, the features from the backbone (S(x)) are not. May I ask why?

`mu_min_dist` got float(-inf) value

First of all, I'd like to thank you for your great work. I have adopted your BoW implementation in my method, and mu_min_dist in the BoWExtractor class gets an inf value after a few epochs in the early stage, even though I kept most of your BoW code in my implementation. What could be the problem?
Here I use a batch size of 16 to debug.

Epoch 0:   0%|                     | 2/7972 [00:16<12:29:47,  5.64s/it, loss=41.7, v_num=0_21]
features: torch.Size([16, 1024, 14, 14])                                                                
embedding_w: torch.Size([4096, 1024, 1, 1])                                                   
embedding_b: torch.Size([4096])                                                               
dist: torch.Size([16, 4096, 14, 14])                                                          
mu_min_dist: tensor(2409.4023, device='cuda:0') 
selected_feaures: torch.Size([16, 1024])
self.mu_min_dist: tensor([89.2840], device='cuda:0')
inv_delta_adaptive: tensor([0.1680], device='cuda:0')
features: torch.Size([16, 2048, 7, 7])
embedding_w: torch.Size([4096, 2048, 1, 1])
embedding_b: torch.Size([4096])
dist: torch.Size([16, 4096, 7, 7])
mu_min_dist: tensor(2662.2830, device='cuda:0') 
selected_feaures: torch.Size([16, 2048])
self.mu_min_dist: tensor([95.8128], device='cuda:0')
inv_delta_adaptive: tensor([0.1566], device='cuda:0')
bow_loss: tensor(42.8774, device='cuda:0', grad_fn=<SumBackward0>)
Epoch 0:   0%|                      | 3/7972 [00:17<9:30:46,  4.30s/it, loss=42.1, v_num=0_21]
features: torch.Size([16, 1024, 14, 14])
embedding_w: torch.Size([4096, 1024, 1, 1])
embedding_b: torch.Size([4096])
dist: torch.Size([16, 4096, 14, 14])
mu_min_dist: tensor(-inf, device='cuda:0')

Meanwhile, I saw that you drop the boundary of the feature maps with margin = 1:

features = features[:, :, 1:-1, 1:-1].contiguous() # drop the boundary, which turns the spatial size into [H-2, W-2]

What is the intuition behind this? Might it affect the model's performance if we do not crop the features?

custom data

How can we use custom data? What needs to be modified?
Thanks

About indoor usage

Hi, thanks for your great contribution. I would like to ask some questions about an indoor experiment.

In the paper, all experiments are conducted on public datasets, and most of them are for object detection and classification. I wonder what would be different if the method were applied to a real indoor environment to help SLAM relocalization and loop detection, such as a home, an office, or even a room with repeated regions (for example, a server room where many server computers with the same appearance are placed). What should I pay attention to when collecting training data?
Thanks for your attention; I am looking forward to your kind response and any advice.

Best, Slamer
