Surprising Instabilities in Training Deep Networks and a Theoretical Analysis (NeurIPS 2022)

Created by Yuxin Sun, Dong Lao, Ganesh Sundaramoorthi and Anthony Yezzi from Georgia Tech, UCLA and Raytheon Technologies.

NeurIPS page / Arxiv

Demo

Introduction

This repository contains source code for the NeurIPS 2022 paper "Surprising Instabilities in Training Deep Networks and a Theoretical Analysis". We provide the code for producing experimental data, analyzing results and generating figures.

Please follow the installation instructions below.

Instructions

1. Requirements

Our codebase uses PyTorch. The code was tested with Python 3.7.9 and torch 1.8.0 on Ubuntu 18.04; later versions should also work.

2. Simulation of the discretization scheme for the heat PDE

We provide a demo (heat.py) that illustrates instability in discretizing the heat equation. It plots the variable u at different times.

It generates a plot for stable evolution by setting dt = 0.4.

It generates a plot for unstable evolution by setting dt = 0.8.
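The effect of the step size can be sketched with a standard forward-Euler, central-difference discretization of the 1D heat equation u_t = u_xx. This is not the repository's heat.py: the function name evolve_heat, the unit grid spacing dx = 1, the periodic boundary, and the single-spike initial condition are all assumptions made for illustration. Under those assumptions the scheme is stable only for dt <= dx^2 / 2 = 0.5, which is consistent with dt = 0.4 being stable and dt = 0.8 being unstable.

```python
def evolve_heat(dt, steps=50, n=64, dx=1.0):
    """Forward-Euler, central-difference scheme for u_t = u_xx.

    Illustrative assumptions (not taken from heat.py): periodic
    boundary, unit spike as initial condition, grid spacing dx.
    """
    u = [0.0] * n
    u[n // 2] = 1.0  # unit spike in the middle of the grid
    r = dt / dx ** 2
    for _ in range(steps):
        # u[i - 1] wraps around at i = 0, and (i + 1) % n wraps at the
        # right end, which together give the periodic boundary.
        u = [u[i] + r * (u[i - 1] - 2.0 * u[i] + u[(i + 1) % n])
             for i in range(n)]
    return u


def max_abs(values):
    """Largest absolute value in a sequence."""
    return max(abs(v) for v in values)


stable = evolve_heat(dt=0.4)    # r = 0.4 <= 0.5: the spike diffuses
unstable = evolve_heat(dt=0.8)  # r = 0.8 > 0.5: high frequencies blow up
print("dt = 0.4, max |u|:", max_abs(stable))
print("dt = 0.8, max |u|:", max_abs(unstable))
```

With dt = 0.4 the maximum of |u| stays bounded by its initial value, while with dt = 0.8 the highest-frequency mode is amplified at every step and the solution grows without bound.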

3. Evidence of Instability in Training Deep Networks

The folder ./instabillity contains the code that provides empirical evidence of restrained instabilities in current deep learning training practice.

3.1 Perturbed SGD on Common CNNs

We provide code that reports the final test accuracy over different seeds (batch selections) and different floating-point perturbations (rows) for ResNet56 trained on CIFAR-10. It generates the data for Table 1 (left) in Section 4.2 of the paper. ResNet56 is provided with two activation functions, ReLU and Swish. Run: ./instabillity/main6.py

A different seed can be chosen by changing r in the arguments.

We also provide data analysis code for this experiment in ./scripts/mean_variance.py. It computes the final accuracy and standard deviation for the input data.

3.2 Demonstration of Restrained Instabilities

We provide code that shows the divergence in network weights between the original SGD weights and the perturbed SGD weights. It generates the data for the two plots in Figure 2. Run: ./instabillity/main2.py

A different learning rate divisor can be chosen by changing n in the arguments.

We also provide code for producing Figure 2(right) in ./instability/compute_iteration.m.
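One simple way to quantify how far two weight trajectories have drifted apart is a relative L1 difference between the two weight vectors. The sketch below is only an illustration of that idea: the function name relative_l1_difference, the flat-list representation of the weights, and the choice of metric are assumptions, and the scripts in this repository may measure divergence differently.

```python
def relative_l1_difference(weights_a, weights_b):
    """Return sum|a - b| / sum|a| for two equally long flat weight lists.

    Hypothetical helper for illustration; not part of this repository.
    """
    if len(weights_a) != len(weights_b):
        raise ValueError("weight lists must have the same length")
    numerator = sum(abs(a - b) for a, b in zip(weights_a, weights_b))
    denominator = sum(abs(a) for a in weights_a)
    if denominator == 0.0:
        # All reference weights are zero: the ratio is undefined unless
        # the two vectors coincide exactly.
        return 0.0 if numerator == 0.0 else float("inf")
    return numerator / denominator


# Made-up weights, used only to show the call. The second vector is the
# first one scaled by a tiny relative perturbation.
baseline = [0.5, -1.0, 2.0]
perturbed = [w * (1.0 + 1e-7) for w in baseline]
print(relative_l1_difference(baseline, perturbed))
```

Evaluating such a quantity at every iteration for the original and perturbed runs would give a curve whose growth indicates how quickly the two runs separate.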

4. PDE Stability Analysis of a Simplified CNN

In folder ./single_layer, we include the code for experiments on a single layer network.

4.1 Empirical Validation of Stability Bounds for Linear PDE

We provide code to validate two stability bounds for the linearized PDE (9). It generates the loss plots for the discretization of the linearized PDE (9). Run: ./single_layer/linear.py

A different learning rate can be chosen by changing learning_rate in the parameters.

We also provide code for producing Figure 3 in ./scripts/loss_plot.py.

4.2 Restrained Instabilities in the Non-linear PDE

We provide code that shows restrained instabilities are present in the (non-linear) gradient descent PDE of the one-layer CNN. It generates the loss plots for the non-linear PDE (8) for various choices of learning rate (dt). Run: ./single_layer/PySun.py

A different learning rate can be chosen by changing learning_rate in the parameters.

We also provide code for producing Figure 4 (left) in ./scripts/oscillation_plot.py.

4.3 L1 error accumulation in the non-linear PDE

We provide code that shows error amplification occurs in the (non-linear) gradient descent PDE of the one-layer CNN. It generates the data for Figure 4 (right). Run: ./single_layer/PySun2.py

A different learning rate can be chosen by changing learning_rate in the parameters.

We also provide code for producing Figure 4 (right) in ./instability/compute_onelayer.m.

Thanks

This research was supported in part by Army Research Labs (ARL) W911NF-22-1-0267 and Raytheon Technologies Research Center.

License and Citation

If you find our work useful in your research, please cite our paper:

Preprint:

@inproceedings{NEURIPS2022_7b97adea,
 author = {Sun, Yuxin and LAO, DONG and Sundaramoorthi, Ganesh and Yezzi, Anthony},
 booktitle = {Advances in Neural Information Processing Systems},
 editor = {S. Koyejo and S. Mohamed and A. Agarwal and D. Belgrave and K. Cho and A. Oh},
 pages = {19567--19578},
 publisher = {Curran Associates, Inc.},
 title = {Surprising Instabilities in Training Deep Networks and a Theoretical Analysis },
 url = {https://proceedings.neurips.cc/paper_files/paper/2022/file/7b97adeafa1c51cf65263459ca9d0d7c-Paper-Conference.pdf},
 volume = {35},
 year = {2022}
}

See LICENSE file.
