Replica Exchange Stochastic Gradient MCMC

Experiment code for "Non-convex Learning via Replica Exchange Stochastic Gradient MCMC" (ICML 2020). The method is a scalable replica exchange (also known as parallel tempering) stochastic gradient MCMC algorithm with clear acceleration guarantees: it proposes bias-corrected swaps to connect the high-temperature process, used for exploration, with the low-temperature process, used for exploitation.
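
At a high level, the low-temperature chain exploits while the high-temperature chain explores, and the two periodically propose to exchange their parameters; the acceptance rule is corrected by a factor F to account for the noise in the stochastic energy estimates. Below is a minimal, simplified sketch of such a corrected swap test in Python; the function propose_swap, its arguments, and the exact form of the correction are illustrative only and differ in detail from the paper and from bayes_cnn.py.

import math
import random

def propose_swap(loss_low, loss_high, T_low, T_high, var_hat, bias_F):
    # Simplified, hypothetical corrected-swap test (not the exact rule from the paper).
    # loss_low / loss_high: stochastic estimates of the energy (training loss) of the
    # low- and high-temperature chains; var_hat: estimated variance of those estimates.
    dT = 1.0 / T_low - 1.0 / T_high
    # Hypothetical correction term: it shrinks as bias_F grows, so a huge bias_F
    # (e.g. 1e300) effectively recovers the naive, uncorrected swap.
    correction = dT * var_hat / bias_F
    log_rate = dT * (loss_low - loss_high - correction)
    # Clamp the exponent at zero so the acceptance probability never exceeds one.
    return random.random() < math.exp(min(log_rate, 0.0))

In the CIFAR100 experiments below, -bias_F sets the initial correction factor, while -F_jump controls how it is rescaled during training (our reading of the flag); under the hypothetical form above, the naive variant's bias_F=1e300 makes the correction negligible.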

@inproceedings{reSGMCMC,
  title={Non-convex Learning via Replica Exchange Stochastic Gradient MCMC},
  author={Wei Deng and Qi Feng* and Liyao Gao* and Faming Liang and Guang Lin},
  booktitle={Proceedings of the 37th International Conference on Machine Learning},
  pages={2474--2483},
  year={2020},
  volume={119}
}

Simulation of Gaussian mixture distributions

Environment

  1. R

  2. numDeriv (library)

  3. ggplot2 (library)

Please check the file in the simulation folder
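
The repository's simulation is an R script; purely for illustration, here is a rough Python analogue of the same replica exchange idea on a toy Gaussian mixture. The mixture, step size, and temperatures are illustrative choices, not the settings used in the paper, and the swap rule here is the standard uncorrected one.

import numpy as np

def neg_log_density(x):
    # Two well-separated modes, so a single low-temperature chain tends to get stuck.
    p = 0.5 * np.exp(-0.5 * (x + 3.0) ** 2) + 0.5 * np.exp(-0.5 * (x - 3.0) ** 2)
    return -np.log(p + 1e-300)

def grad(x, eps=1e-5):
    # Central-difference gradient (the R code lists numDeriv, a numerical-differentiation
    # package, among its dependencies).
    return (neg_log_density(x + eps) - neg_log_density(x - eps)) / (2.0 * eps)

rng = np.random.default_rng(0)
lr, T_low, T_high = 1e-2, 1.0, 10.0
x_low, x_high = 0.0, 0.0
samples = []
for _ in range(20000):
    # Langevin updates at two temperatures: exploitation (low T) and exploration (high T).
    x_low += -lr * grad(x_low) + np.sqrt(2.0 * lr * T_low) * rng.standard_normal()
    x_high += -lr * grad(x_high) + np.sqrt(2.0 * lr * T_high) * rng.standard_normal()
    # Standard replica exchange swap between the two chains.
    log_rate = (1.0 / T_low - 1.0 / T_high) * (neg_log_density(x_low) - neg_log_density(x_high))
    if rng.random() < np.exp(min(log_rate, 0.0)):
        x_low, x_high = x_high, x_low
    samples.append(x_low)
# Samples from the low-temperature chain should now cover both modes.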

Optimization of Supervised Learning on CIFAR100

Environment

  1. Python 2.7

  2. PyTorch >= 1.1

  3. Numpy

How to run the code on CIFAR100 using ResNet20

Setup: batch size 256 and 500 epochs. Simulated annealing is used by default.

  • SGHMC: set the default learning rate (lr) to 2e-6 and the temperature (T) to 0.01.
$ python bayes_cnn.py -data cifar100 -model resnet -depth 20 -sn 500 -train 256 -lr 2e-6 -T 0.01 -chains 1
  • reSGHMC: the low-temperature chain uses the same settings as SGHMC; the high-temperature chain uses a higher lr=3e-6 (2e-6/LRgap) and a higher T=0.05 (0.01/Tgap); the initial correction factor F is 3e5.
$ python bayes_cnn.py -data cifar100 -model resnet -depth 20 -sn 500 -train 256 -chains 2 -LRgap 0.66 -Tgap 0.2 -F_jump 0.8 -bias_F 3e5
  • Naive reSGHMC: simply set bias_F=1e300 and F_jump=1 as follows.
$ python bayes_cnn.py -data cifar100 -model resnet -depth 20 -sn 500 -train 256 -chains 2 -F_jump 1 -bias_F 1e300

To use a large batch size of 1024, you need a slower annealing rate and 2000 epochs to keep the total number of iterations the same: the batch is four times larger than 256, so each epoch contains a quarter as many iterations, and 4 x 500 = 2000 epochs matches the iteration count of the batch-256 runs.

$ python bayes_cnn.py -data cifar100 -model resnet -depth 20 -sn 2000 -train 1024 -chains 1 -lr_anneal 0.996 -anneal 1.005
$ python bayes_cnn.py -data cifar100 -model resnet -depth 20 -sn 2000 -train 1024 -chains 2 -lr_anneal 0.996 -anneal 1.005 -F_jump 0.8

Remark: if you perform Bayesian model averaging every epoch and two swaps occur within the same epoch, the acceleration may be neutralized. To handle this issue, consider adding a cooling time between swaps.

To run the WRN models (WRN-16-8 and WRN-28-10), you can try the following:

$ python bayes_cnn.py -data cifar100 -model wrn -sn 500 -train 256 -chains 2 -F_jump 0.8 -cool 20 -bias_F 3e5
$ python bayes_cnn.py -data cifar100 -model wrn28 -sn 500 -train 256 -chains 2 -F_jump 0.8 -cool 20 -bias_F 3e5

Note that for the WRN models we include the extra cooling time because two consecutive swaps within the same epoch happen frequently and cancel out the acceleration effect.
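
One plausible reading of the -cool flag (the authoritative logic lives in bayes_cnn.py): after an accepted swap, further swaps are suppressed for a fixed number of epochs. A minimal sketch, with hypothetical names:

class SwapCooling:
    # Hypothetical illustration of a cooling time between swaps; the class and
    # attribute names are made up, not taken from bayes_cnn.py.
    def __init__(self, cool_epochs):
        self.cool = cool_epochs
        self.last_swap_epoch = -cool_epochs  # allow a swap right away

    def try_swap(self, epoch, accepted):
        # accepted: outcome of the (corrected) swap test for this proposal.
        if accepted and epoch - self.last_swap_epoch >= self.cool:
            self.last_swap_epoch = epoch
            return True  # exchange the two chains' parameters
        return False

cooling = SwapCooling(cool_epochs=20)  # would correspond to -cool 20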

To reduce the hyperparameter tuning cost, you can use the greedy type instead of the swap type to break detailed balance. This strategy achieves the same optimization performance as the swap type. For example:

$ python bayes_cnn.py -data cifar100 -model wrn -types greedy -sn 500 -train 256 -chains 2 -cool 20 -bias_F 3e5
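
The README does not define the greedy move precisely, so the sketch below is only one plausible reading of the two -types options; the function apply_move and its arguments are hypothetical and may differ from what bayes_cnn.py actually does.

import copy

def apply_move(params_low, params_high, move_type):
    # Returns the new (low-temperature, high-temperature) parameter pair.
    if move_type == "swap":
        # Replica exchange: the two chains trade parameters.
        return params_high, params_low
    if move_type == "greedy":
        # Greedy move (breaks detailed balance): the low-temperature chain jumps to
        # the high-temperature chain's position, which keeps exploring from there.
        return copy.deepcopy(params_high), params_high
    return params_low, params_high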

Semi-supervised Learning via Bayesian GAN

Environment

  1. Python 2.7

  2. TensorFlow == 1.0.0 (the exact version may be critical)

  3. Numpy

How to run the code on CIFAR10 using Replica Exchange Stochastic Gradient MCMC

$ python ./bayesian_gan_hmc.py --dataset cifar --numz 10 --num_mcmc 2 --data_path ./output --out_dir ./output --train_iter 15000 --N 4000 --lr 0.00045 -LRgap 0.66 -Tgap 100 --semi_supervised --n_save 100 --gen_observed 4000 --fileName cifar10_4000_0.00045_0.66_100

For detailed instructions, please check the README.md file inside the semi_supervised_learning folder.

Contributors

gaoliyao, waynedw


Issues

Scripts to reproduce the baseline (M-SGD)

Dear authors, would you please provide the scripts to reproduce your baseline (ResNet20 on CIFAR10/CIFAR100 with M-SGD) and the results shown in Table 1 of the paper? Thanks.
