InstaHide training on CIFAR-10 with PyTorch

Overview

InstaHide[1] is a practical instance-hiding method for image data encryption in privacy-sensitive distributed deep learning.

InstaHide encrypts each training image with the Mixup[2] method and a one-time secret key. The key consists of a pixel-wise random sign-flipping mask and the images that are mixed in, which are sampled either from the same training dataset (Inside-dataset InstaHide) or from a large public dataset (Cross-dataset InstaHide). It plugs easily into an existing distributed learning pipeline, is efficient, and incurs only a minor reduction in accuracy.
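
The two ingredients described above (a Mixup-style weighted sum, then a pixel-wise random sign flip) can be illustrated with a minimal sketch. It is not the code from this repository: the function name `instahide_encrypt`, the flat-list image layout, and the toy pixel values are assumptions made for illustration, and the training scripts operate on PyTorch tensors instead.

```python
import random


def instahide_encrypt(images, coeffs, rng=random):
    """Mix k equally sized images, then flip the sign of each pixel at random.

    images: list of k images, each a flat list of floats (assumed layout).
    coeffs: list of k non-negative mixing weights that sum to 1.
    """
    if len(images) != len(coeffs):
        raise ValueError("need exactly one coefficient per image")
    n_pixels = len(images[0])
    if any(len(img) != n_pixels for img in images):
        raise ValueError("all images must have the same number of pixels")
    # Mixup step: pixel-wise weighted sum of the k images.
    mixed = [sum(c * img[p] for c, img in zip(coeffs, images))
             for p in range(n_pixels)]
    # One-time key: an independent random sign for every pixel.
    mask = [rng.choice((-1.0, 1.0)) for _ in range(n_pixels)]
    encrypted = [m * s for m, s in zip(mixed, mask)]
    return encrypted, mask


# Toy example: two 4-pixel "images" with hypothetical values.
private = [0.2, -0.4, 0.6, 0.8]
other = [1.0, 1.0, -1.0, 0.0]
encrypted, mask = instahide_encrypt([private, other], [0.6, 0.4],
                                    random.Random(0))
print(encrypted)
```

Because the mask only changes signs, the magnitude of every encrypted pixel equals the magnitude of the corresponding mixed pixel.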

We also release a challenge to further investigate the security of InstaHide.

Citation

If you use InstaHide or this codebase in your research, then please cite our paper:

@inproceedings{hsla20,
    title = {InstaHide: Instance-hiding Schemes for Private Distributed Learning},
    author = {Yangsibo Huang and Zhao Song and Kai Li and Sanjeev Arora},
    booktitle = {International Conference on Machine Learning (ICML)},
    year = {2020}
}

How to run

Install dependencies

  • Create an Anaconda environment with Python 3.6
conda create -n instahide python=3.6
  • Run the following command to install dependencies
conda activate instahide
pip install -r requirements.txt

Important script arguments

Training configurations:

  • model: network architecture (default: 'ResNet18')
  • lr: learning rate (default: 0.1)
  • batch-size: batch size (default: 128)
  • decay: weight decay (default: 1e-4)
  • no-augment: turn off data augmentation
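
As a rough illustration of how the training flags listed above could be declared, here is a sketch using Python's standard `argparse` module. The flag names and defaults are taken from the list; the function name `build_parser`, the help strings, and the `augment` destination are assumptions, and the parsers in the actual training scripts may be set up differently.

```python
import argparse


def build_parser():
    # Hypothetical parser that mirrors the documented training flags.
    parser = argparse.ArgumentParser(
        description="Sketch of the documented training options")
    parser.add_argument("--model", default="ResNet18", type=str,
                        help="network architecture")
    parser.add_argument("--lr", default=0.1, type=float,
                        help="learning rate")
    parser.add_argument("--batch-size", default=128, type=int,
                        help="batch size")
    parser.add_argument("--decay", default=1e-4, type=float,
                        help="weight decay")
    # Assumption: the flag is stored as a boolean named `augment`.
    parser.add_argument("--no-augment", dest="augment", action="store_false",
                        help="turn off data augmentation")
    return parser


# Parse an example command line instead of sys.argv.
args = build_parser().parse_args(["--lr", "0.05", "--no-augment"])
print(args.model, args.lr, args.batch_size, args.decay, args.augment)
```

Flags that are not passed keep the defaults listed above, and `argparse` exposes `--batch-size` as `args.batch_size`.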

InstaHide configurations:

  • klam: the number of images mixed in one InstaHide encryption, k in the paper (default: 4)
  • mode: 'instahide' or 'mixup' (default: 'instahide')
  • upper: the upper bound of any coefficient, c1 in the paper (default: 0.65)
  • dom: the lower bound of the sum of coefficients of two private images, c2 in the paper (default: 0.3, only for Cross-dataset InstaHide)
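
To make the roles of klam, upper, and dom concrete, the sketch below draws one set of mixing coefficients by rejection sampling. The sampling scheme, the function name `sample_coefficients`, and the convention that the first two coefficients belong to the private images are assumptions for illustration, not necessarily what the training scripts do.

```python
import random


def sample_coefficients(k, upper=0.65, dom=None, rng=random, max_tries=10000):
    """Draw k non-negative coefficients that sum to 1.

    upper: no single coefficient may exceed this value (c1 in the paper).
    dom:   if given, the first two coefficients (assumed here to be the
           private images) must sum to at least this value (c2 in the paper).
    """
    if k < 1 or upper * k < 1.0:
        raise ValueError("upper must be at least 1/k")
    if dom is not None and k < 2:
        raise ValueError("dom requires at least two images")
    for _ in range(max_tries):
        raw = [abs(rng.gauss(0.0, 1.0)) for _ in range(k)]
        total = sum(raw)
        if total == 0.0:
            continue
        lams = [r / total for r in raw]
        if max(lams) > upper:
            continue  # one image would dominate the mixture
        if dom is not None and lams[0] + lams[1] < dom:
            continue  # the private images would contribute too little
        return lams
    raise RuntimeError("no valid coefficients found; check upper and dom")


inside = sample_coefficients(k=4, upper=0.65, rng=random.Random(0))
cross = sample_coefficients(k=6, upper=0.65, dom=0.3, rng=random.Random(1))
print(inside)
print(cross)
```

The `upper * k < 1.0` check guards against settings that can never be satisfied, since k coefficients that each stay below 1/k cannot sum to 1.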

Inside-dataset InstaHide:

Inside-dataset InstaHide mixes each training image with random images from the same private training dataset.

For inside-dataset InstaHide training, run the following script:

python train_inside.py --mode instahide --klam 4 --data cifar10
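
One simple way to choose the "random images within the same private training dataset" is to shuffle the positions of a batch once per extra image. The sketch below illustrates that idea under stated assumptions: the function name `partner_indices` and the per-batch permutation scheme are hypothetical and are not confirmed to match train_inside.py.

```python
import random


def partner_indices(batch_size, k, rng=random):
    """Return k index lists for one batch.

    List 0 is the identity (each image itself); each of the other k - 1
    lists is a random permutation that assigns every image one partner.
    A permutation can map an image to itself, which this sketch allows.
    """
    identity = list(range(batch_size))
    partners = [identity]
    for _ in range(k - 1):
        perm = identity[:]
        rng.shuffle(perm)
        partners.append(perm)
    return partners


groups = partner_indices(batch_size=8, k=4, rng=random.Random(0))
# Batch positions that would be mixed together for the first image:
print([perm[0] for perm in groups])
```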

Cross-dataset InstaHide:

Cross-dataset InstaHide, arguably more secure, mixes each training image with random images from a large public dataset. In the paper, we use the unlabelled ImageNet[3] as the public dataset.

For cross-dataset InstaHide training, first, prepare and preprocess your public dataset, and save it in PATH/TO/FILTERED_PUBLIC_DATA. Then, run the following training script:

python train_cross.py --mode instahide --klam 6 --data cifar10 --pair --dom 0.3 --help_dir PATH/TO/FILTERED_PUBLIC_DATA
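
The description of dom above implies that each cross-dataset encryption combines two private images with images from the public dataset. A minimal sketch of selecting such a mixing set, assuming the remaining k - 2 images are all public, could look as follows. The function name `cross_mixing_set` and its arguments are hypothetical and do not come from train_cross.py.

```python
import random


def cross_mixing_set(private_index, n_private, n_public, k, rng=random):
    """Pick the images for one cross-dataset encryption.

    Returns the two private indices (the image itself plus one random
    partner) and k - 2 distinct indices into the public dataset.
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    if n_private < 2:
        raise ValueError("need at least two private images")
    others = [i for i in range(n_private) if i != private_index]
    second_private = rng.choice(others)
    public = rng.sample(range(n_public), k - 2)
    return [private_index, second_private], public


private_ids, public_ids = cross_mixing_set(
    private_index=3, n_private=10, n_public=100, k=6, rng=random.Random(0))
print(private_ids, public_ids)
```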

Try InstaHide on new datasets or your own data?

You can customize the dataloader to test InstaHide on more datasets (see the 'Prepare data' section of train_inside.py and train_cross.py).

You can also try new models by defining their network architectures under the model folder.
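
As a starting point for your own data, the sketch below shows the map-style dataset protocol (`__len__` and `__getitem__`) that PyTorch dataloaders consume. It is written in plain Python so that it runs without PyTorch installed; in practice you would subclass `torch.utils.data.Dataset` and return tensors. The class name `ListImageDataset` and the toy data are hypothetical.

```python
class ListImageDataset:
    """Hypothetical in-memory dataset of (image, label) pairs."""

    def __init__(self, images, labels, transform=None):
        if len(images) != len(labels):
            raise ValueError("images and labels must have the same length")
        self.images = images
        self.labels = labels
        self.transform = transform  # optional callable applied per image

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        image = self.images[idx]
        if self.transform is not None:
            image = self.transform(image)
        return image, self.labels[idx]


# Toy data: three 2-pixel "images", with pixel values scaled to [0, 1].
dataset = ListImageDataset(
    images=[[0, 255], [128, 64], [255, 255]],
    labels=[0, 1, 1],
    transform=lambda img: [p / 255.0 for p in img],
)
print(len(dataset), dataset[0])
```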

Questions

If you have any questions, please open an issue or contact [email protected].

References:

[1] InstaHide: Instance-hiding Schemes for Private Distributed Learning, Yangsibo Huang, Zhao Song, Kai Li, Sanjeev Arora, ICML 2020

[2] mixup: Beyond Empirical Risk Minimization, Hongyi Zhang, Moustapha Cisse, Yann N. Dauphin, David Lopez-Paz, ICLR 2018

[3] ImageNet: A Large-Scale Hierarchical Image Database, Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, Li Fei-Fei, CVPR 2009
