
ltr-weight-balancing's Introduction

Long-Tailed Recognition via Weight Balancing

[CVPR2022 paper] [poster] [slides] [video]

In the real open world, data tends to follow long-tailed class distributions, motivating the well-studied long-tailed recognition (LTR) problem. Naive training produces models that are biased toward common classes, achieving higher accuracy on them. The key to addressing LTR is to balance various aspects of learning, including the data distribution, training losses, and gradients. We explore an orthogonal direction, weight balancing, motivated by the empirical observation that a naively trained classifier has "artificially" larger weight norms for common classes (because there is abundant data to train them, unlike the rare classes). We investigate three techniques to balance weights: L2-normalization, weight decay, and MaxNorm. We first point out that L2-normalization "perfectly" balances per-class weights to be unit norm, but such a hard constraint might prevent classes from learning better classifiers. In contrast, weight decay penalizes larger weights more heavily and so learns small, balanced weights; the MaxNorm constraint encourages growing small weights within a norm ball but caps all the weights by the radius. Our extensive study shows that both help learn balanced weights and greatly improve LTR accuracy. Surprisingly, weight decay, although underexplored in LTR, significantly improves over prior work. Therefore, we adopt a two-stage training paradigm and propose a simple approach to LTR: (1) learning features using the cross-entropy loss by tuning weight decay, and (2) learning classifiers using a class-balanced loss by tuning weight decay and MaxNorm. Our approach achieves state-of-the-art accuracy on five standard benchmarks, serving as a future baseline for long-tailed recognition.
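To make the weight-balancing techniques concrete, below is a minimal PyTorch sketch of weight decay (applied through the optimizer) and the MaxNorm projection. The model shape, learning rate, weight decay value, and radius are illustrative stand-ins, not the paper's exact settings.

import torch

# Weight decay is applied through the optimizer; its value is tuned per stage.
# The classifier head and hyperparameters below are illustrative stand-ins.
model = torch.nn.Linear(64, 100)  # stand-in classifier head: 100 classes
optimizer = torch.optim.SGD(model.parameters(), lr=0.01,
                            momentum=0.9, weight_decay=5e-3)

def maxnorm_project_(weight, radius=1.0):
    """Project each per-class weight vector back into an L2 ball of the
    given radius (the MaxNorm constraint), in place."""
    with torch.no_grad():
        norms = weight.norm(dim=1, keepdim=True).clamp(min=1e-12)
        weight.mul_((radius / norms).clamp(max=1.0))

# In the training loop, the projection runs right after each optimizer step:
# optimizer.step(); maxnorm_project_(model.weight, radius=1.0)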

Code Description

This folder contains two executable Jupyter Notebook files demonstrating our training approach and how we will open-source our code. The Jupyter Notebook files are sufficiently self-explanatory, with detailed comments and displayed output. The files reproduce the results on CIFAR100-LT (imbalance factor 100) shown in Table 1 of the paper.

Running the files requires some common packages (e.g., PyTorch, as detailed later). Please run the first-stage training demo before the second-stage one.

  1. demo1_first-stage-training.ipynb
    Running this file will train a naive network using the cross-entropy loss and stochastic gradient descent (SGD) without weight decay; it should achieve an overall accuracy of ~39% on CIFAR100-LT (imbalance factor 100). It then trains another network with weight decay. Running this file takes ~2 hours on a GPU (e.g., the NVIDIA GeForce RTX 3090 used in our work). The runtime can be reduced by changing total_epoch_num to 100. The training results and model parameters are saved at exp/demo_1.

  2. demo2_second-stage-training.ipynb
    Running this file will compare various regularizers used in the second-stage training, such as L2-normalization, τ-normalization, and MaxNorm with weight decay (the normalizations are sketched right after this list). The latter should achieve an overall accuracy >52%. Running this file takes a few minutes on a GPU.
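For reference outside the notebook, here is a minimal PyTorch sketch of the two post-hoc weight normalizations compared in demo 2. The shapes and the τ value are illustrative stand-ins, not the notebook's exact settings.

import torch

def l2_normalize(weight):
    """Rescale every per-class weight vector to unit L2 norm."""
    return weight / weight.norm(dim=1, keepdim=True).clamp(min=1e-12)

def tau_normalize(weight, tau=1.0):
    """tau-normalization: divide each per-class weight vector by its norm
    raised to the power tau (tau=1 recovers L2-normalization, tau=0 is a
    no-op)."""
    return weight / weight.norm(dim=1, keepdim=True).clamp(min=1e-12).pow(tau)

# Usage on a (stand-in) trained classifier head:
classifier = torch.nn.Linear(64, 100)
with torch.no_grad():
    classifier.weight.copy_(tau_normalize(classifier.weight, tau=0.5))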

Why Jupyter Notebook?

We prefer to release the code as Jupyter Notebooks (https://jupyter.org) because they allow for interactive demonstration for educational purposes.

We also provide Python scripts in case readers would like to run them rather than the Jupyter Notebooks. These Python scripts were converted using the Jupyter commands below:

  • jupyter nbconvert --to script demo1_first-stage-training.ipynb

  • jupyter nbconvert --to script demo2_second-stage-training.ipynb

Requirements

We installed Python and most packages through Anaconda. Some packages might not be installed by default, such as pandas, torchvision, and PyTorch; we suggest installing them before running our code. Below are the versions of Python and PyTorch used in our work.

  • Python version: 3.7.4 [GCC 7.3.0]
  • PyTorch version: 1.7.1
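If any of these are missing, they can be installed, for example, with the commands below (the version pin is illustrative; adjust it to your CUDA setup):

  • conda install pytorch=1.7.1 torchvision -c pytorch

  • conda install pandas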

We suggest reserving ~300 MB of disk space to run all the demos, because they save model parameters.

If you find our model/method/dataset useful, please cite our work:

@inproceedings{LTRweightbalancing,
  title={Long-Tailed Recognition via Weight Balancing},
  author={Alshammari, Shaden and Wang, Yuxiong and Ramanan, Deva and Kong, Shu},
  booktitle={CVPR},
  year={2022}
}

ltr-weight-balancing's People

Contributors: shadealsha, shadennaif

ltr-weight-balancing's Issues

Some questions about comparison experiments

Hello, I read this paper and saw the comparisons with other methods. However, I could not find the source code for the DiVE paper, so I would like to ask how you obtained DiVE's results on the many, medium, and few splits. Did you write the code yourself, or did you find its source code or a trained model?
Thank you for taking time out of your busy schedule to read my questions. I look forward to your reply!

Some implementation confusion about ImageNet and iNaturalist

I really appreciate your work and am grateful for your open-sourcing of the code.
I was able to successfully reproduce the experimental results on CIFAR100-LT using your code, but I was wondering if you could provide me with some additional information on the hyperparameter settings for the larger datasets, ImageNet-LT and iNaturalist.

Specifically, I would greatly appreciate it if you could share the following:

For Stage 1:
the initial learning rate, epoch number, weight decay settings,
and any data augmentation techniques used (such as color jittering for ImageNet).

For Stage 2:
the initial learning rate, epoch number, weight decay settings,
and the hyperparameters in CBLoss (loss type, beta, and gamma; see the sketch below).

Thank you once again for your fantastic work and for your generosity in sharing your code with the community.
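For context on the beta hyperparameter mentioned above: the class-balanced (CB) loss of Cui et al. (2019) weights each class by the inverse of its "effective number" of samples. A minimal sketch, assuming the softmax variant of CBLoss (the counts and beta below are illustrative; gamma only enters the focal variant):

import torch

def class_balanced_weights(samples_per_class, beta=0.999):
    """Per-class weights from 'Class-Balanced Loss Based on Effective
    Number of Samples' (Cui et al., 2019): w_c = (1 - beta) / (1 - beta^n_c),
    rescaled so the weights sum to the number of classes."""
    n = torch.as_tensor(samples_per_class, dtype=torch.float)
    weights = (1.0 - beta) / (1.0 - torch.pow(beta, n))
    return weights / weights.sum() * len(n)

# Toy 3-class long-tailed counts, passed to the softmax cross-entropy loss.
w = class_balanced_weights([500, 50, 5], beta=0.999)
loss_fn = torch.nn.CrossEntropyLoss(weight=w)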

Some questions about the experiments.

  1. In the code, you use the ResNet-34 model for CIFAR100-LT, but in the paper, you use the ResNet-32 model.

     dataset       stage   loss  base lr  scheduler  batch  epochs  WD     model      result_all
     CIFAR100-100  stage1  CE    0.01     CosLR      64     320     0.005  ResNet-32  40.1
     CIFAR100-100  stage1  CE    0.01     CosLR      64     320     0.005  ResNet-34  47.3

     I used these settings to train the model and got a bad result (7% lower than the open-source experiment in Colab); could you please point out my problem?

  2. Could you provide more details about how to choose a proper weight decay value for long-tailed recognition? It would help a lot. (A toy sweep is sketched below.)
  3. I experimented with several methods, including MiSLAS and BAMLS. I found that 5e-4 is good enough and that tuning weight decay improves performance only slightly. Maybe tuning weight decay is not the core point of imbalanced learning?
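Regarding point 2 above, one generic way to pick the weight decay value (an assumption about the protocol, not necessarily what the authors did) is a validation sweep. Here is a tiny self-contained toy version, with synthetic data standing in for a real training run:

import torch

torch.manual_seed(0)
X, y = torch.randn(600, 20), torch.randint(0, 3, (600,))
X_tr, y_tr, X_va, y_va = X[:400], y[:400], X[400:], y[400:]

def train_and_eval(wd):
    """Train a toy linear classifier with the given weight decay and
    return validation accuracy."""
    model = torch.nn.Linear(20, 3)
    opt = torch.optim.SGD(model.parameters(), lr=0.1, weight_decay=wd)
    loss_fn = torch.nn.CrossEntropyLoss()
    for _ in range(200):
        opt.zero_grad()
        loss_fn(model(X_tr), y_tr).backward()
        opt.step()
    return (model(X_va).argmax(1) == y_va).float().mean().item()

# Sweep a small grid of candidate values and keep the best.
best_wd = max([5e-4, 1e-3, 5e-3, 1e-2], key=train_and_eval)
print("best weight decay:", best_wd)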

Number of seeds

How many seeds were averaged to obtain the final reported results, and what are they?

Reproducibility Issues

Hello,

I am unable to reproduce the results in your paper for CIFAR-100 IF 100 with a ResNet-32 backbone. I have followed the details in the paper (cosine scheduler, lr=0.01, batch size 64, etc.) and made some minor modifications to your code to use a ResNet-32 (see resnet_cifar.py in MiSLAS). However, even with weight decay tuning via Bayesian optimization, I cannot attain a test accuracy beyond 39.5% (a small improvement over the baseline).

Would you be able to share the specific weight decay value used to attain the paper results (46.08% acc)?

Thank you for your time!

Reproducibility.. Shocking..

Also, I find that I can never achieve 46% within 200 epochs with tuned WD and cross-entropy.

You trained it for 320 epochs.

Two-stage training

Do you need to freeze the weights of the feature network during the second-stage training of the classification layer?
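For what it's worth, a common way to implement such a second stage (a sketch under the assumption that the backbone is frozen, not the authors' confirmed code) is to disable gradients for the feature extractor and optimize only the final classifier:

import torch
import torchvision

model = torchvision.models.resnet34(num_classes=100)  # stand-in backbone

# Freeze everything except the final fully connected layer ("fc").
for name, param in model.named_parameters():
    param.requires_grad = name.startswith("fc")

# Optimize only the trainable (classifier) parameters.
optimizer = torch.optim.SGD(
    (p for p in model.parameters() if p.requires_grad),
    lr=0.01, momentum=0.9, weight_decay=5e-3,
)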

Reproducibility issue... ResNet-32 vs. ResNet-34... Fraud?

Thank you for your efforts, but I still doubt the performance you reported, since the Colab code uses ResNet-34 but your paper reports the performance with ResNet-32.

ResNet-34 is very different from ResNet-32 in terms of internal channel widths. ResNet-32 uses 3 stages with internal channels 16, 32, and 64, and its final fully connected layer uses only a 64-dim feature.

ResNet-34 uses a 512-dim feature, with much larger internal channels starting from 64.

The difference is far more than just piling up 2 extra layers. Yet you only released ResNet-34, which has far more parameters, while you reported using ResNet-32 for CIFAR100-LT to achieve the performance.

Config for the best-accuracy stage-1 model

Can you give me the configuration of the best model in stage 1? You write (and demo 1 shows) a stage-1 accuracy of 46.7%, but you load weights with 47.9% accuracy.

About the long-tailed learning

This research field is very messy, and most people are playing around without getting to the bottom of the problem.
The performance can be greatly improved just by adjusting the hyperparameters of generic networks, including weight decay and others that you have used but not mentioned.
I think the future direction is to solve the optimization problem of long-tailed learning.

Question about Figure 3 on your paper

When you compute the filter L2 norms on a trained model, how do you calculate the variance of the filter norms (in Figure 3)?
Since the model is already trained, the filters are learned and therefore fixed.
Thus each norm is just a single value, without variance.
Am I right?

Thanks.
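A plausible reading of that figure (an assumption on my part, not the authors' confirmed answer) is that the variance is taken across the per-class filter norms of a single trained model, i.e., each class contributes one norm and the spread is measured over classes:

import torch

classifier = torch.nn.Linear(64, 100)  # stand-in trained head: 100 classes

with torch.no_grad():
    per_class_norms = classifier.weight.norm(dim=1)  # one L2 norm per class
print("mean:", per_class_norms.mean().item(),
      "variance:", per_class_norms.var().item())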

Cannot Reproduce Your Performance

Hi, thank you for your work,

The thing is, I cannot reproduce your results, even though I am using exactly the same parameters that you used.

Moreover, even when I follow the notebook cell by cell, you achieved 91% training accuracy at epoch 41, but I can only get 60~70%, no matter how many times I try.

Hyperparameters used

Hi,
Interesting work here!
Can you please provide the details on the following hyperparameters used for each of the following datasets,
iNaturalist18: the weight decay and learning rate used.
ImageNet-LT: the weight decay and learning rate used.

Thank you in advance!
