compare_dp_mechanisms

Introduction

The code in this repository compares the performance of three differentially private mechanisms: the Laplace mechanism, the exponential mechanism, and the method proposed by Johnson & Shmatikov (2013). The data used for demonstration are simulated genome-wide association study (GWAS) data generated by HAP-SAMPLE.

The code in this repository was used to produce the results in the following paper:

"Scalable privacy-preserving data sharing methodology for genome-wide association studies." Yu, F., S. Fienberg, S. Slavković, C. Uhler (2014). Journal of Biomedical Informatics (forthcoming).

The paper is available here and here.

Example data generation

The example case/control genotype data (example/case_genotypes.dat and example/anticase_genotypes.dat) were generated by HAP-SAMPLE with the following options:

  • Population: CEU
  • Source for SNPs: Chrom9_Chrom13_snps.txt
  • Disease Model File: AR_chrom9_chrom13_Nov09_additive1_MAF025.txt
  • Simulation Type: Case/Control
  • Number of Cases: 1000
  • Number of Controls: 1000
  • Average breaks per cM: 1
  • Output Format: SNPs v. Individuals

The disease model file describes a disease with 2 causative SNPs having additive effects. For more details about the disease model, see Malaspinas & Uhler (2010).

Analysis on the example data

See http://nbviewer.ipython.org/github/fy/compare_dp_mechanisms/blob/master/notebooks/compare_all_dp_mechanisms.ipynb

Alternatively, if you have IPython Notebook installed, run the following in a terminal:

cd notebooks
../start_ipynb.sh

and open the compare_all_dp_mechanisms notebook in the browser. By default, the IPython notebook server is at http://localhost:8888/

How to use the code

First, untar the example data:

tar xvfz example.tar.gz

Then convert raw data to genotype tables:

python ./notebooks/raw_to_geno_table.py ./example/case_genotypes.dat ./example/anticase_genotypes.dat ./table.tmp
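
To make the idea of a genotype table concrete, here is a minimal sketch that tabulates case/control genotype counts for a single SNP into a 2x3 table and computes a Pearson chi-square statistic from it. It is an illustration only: the 0/1/2 genotype coding, the function names genotype_table and pearson_chisquare, and the toy data are assumptions, and the file formats and computations used by the scripts in this repository may differ.

```python
from collections import Counter


def genotype_table(case_genotypes, control_genotypes):
    """Return a 2x3 table: rows = (cases, controls), columns = genotype 0, 1, 2.

    Assumes each genotype is coded as a minor-allele count (0, 1, or 2).
    """
    case_counts = Counter(case_genotypes)
    control_counts = Counter(control_genotypes)
    return [
        [case_counts.get(g, 0) for g in (0, 1, 2)],
        [control_counts.get(g, 0) for g in (0, 1, 2)],
    ]


def pearson_chisquare(table):
    """Pearson chi-square statistic for a contingency table (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            if expected > 0:  # skip genotype columns that are empty in both groups
                stat += (observed - expected) ** 2 / expected
    return stat


# Hypothetical genotypes for one SNP, one value per individual.
cases = [0, 0, 1, 2, 2, 2]
controls = [0, 0, 0, 1, 1, 2]
table = genotype_table(cases, controls)
print(table)
print(pearson_chisquare(table))
```

In a GWAS setting, one such table would be built per SNP, and the resulting statistics would be used to rank the SNPs.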

To get results from the Laplace mechanism, run:

python ./notebooks/write_chisquare.py ./table.tmp ./chisquare.tmp
python ./notebooks/get_laplace_results.py k e n_case n_control ./chisquare.tmp

where "k" is the number of top SNPs to release, "e" is the privacy budget (commonly known as epsilon in epsilon-differential privacy), and "n_case" and "n_control" are the numbers of cases and controls.
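
As a rough illustration of how "k" and "e" interact, the sketch below perturbs each SNP's score with Laplace noise and reports the indices of the k largest noisy scores. It is not the code in get_laplace_results.py: the function names, the sensitivity argument (which in practice would be derived from n_case and n_control), and the noise scale 2 * k * sensitivity / epsilon are assumptions chosen to show the general idea.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_top_k(scores, k, epsilon, sensitivity, seed=None):
    """Return the indices of the k largest scores after adding Laplace noise.

    The noise scale below is an assumed calibration for illustration;
    a smaller epsilon or a larger k means more noise per score.
    """
    rng = random.Random(seed)
    scale = 2.0 * k * sensitivity / epsilon
    noisy = [s + laplace_noise(scale, rng) for s in scores]
    ranked = sorted(range(len(scores)), key=lambda i: noisy[i], reverse=True)
    return ranked[:k]


# Hypothetical chi-square scores for six SNPs and a hypothetical sensitivity.
scores = [1.2, 35.0, 0.4, 28.5, 3.3, 0.9]
print(noisy_top_k(scores, k=2, epsilon=1.0, sensitivity=4.0, seed=0))
```

With a very large privacy budget the noise becomes negligible and the output approaches the true top-k SNPs; with a small budget the selection becomes close to random.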

Similarly, to get results from the exponential mechanism, run:

python ./notebooks/write_chisquare.py ./table.tmp ./chisquare.tmp
python ./notebooks/get_expo_results.py k e n_case n_control ./chisquare.tmp
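
For comparison, here is a minimal sketch of selecting k SNPs with the exponential mechanism, drawing one SNP per round without replacement and spending e / k of the privacy budget in each round. It is not the code in get_expo_results.py: the function name exponential_top_k and the sensitivity argument are assumptions, and the sampling weights use the textbook form exp(epsilon_round * score / (2 * sensitivity)).

```python
import math
import random


def exponential_top_k(scores, k, epsilon, sensitivity, seed=None):
    """Sample k distinct indices, favoring high scores, one per round."""
    rng = random.Random(seed)
    remaining = list(range(len(scores)))
    chosen = []
    rounds = min(k, len(scores))
    for _ in range(rounds):
        eps_round = epsilon / k  # each round spends an equal share of the budget
        # Subtract the current maximum so the largest weight is exp(0) = 1,
        # which avoids overflow without changing the sampling probabilities.
        top = max(scores[i] for i in remaining)
        weights = [
            math.exp(eps_round * (scores[i] - top) / (2.0 * sensitivity))
            for i in remaining
        ]
        pick = rng.choices(remaining, weights=weights, k=1)[0]
        chosen.append(pick)
        remaining.remove(pick)
    return chosen


# Hypothetical chi-square scores for six SNPs and a hypothetical sensitivity.
scores = [1.2, 35.0, 0.4, 28.5, 3.3, 0.9]
print(exponential_top_k(scores, k=2, epsilon=1.0, sensitivity=4.0, seed=0))
```

Unlike the Laplace sketch, no noisy scores are produced here; only the identities of the selected SNPs are random.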

To get results from the Johnson & Shmatikov method, run:

python ./notebooks/write_JS_distance.py -p pval ./table.tmp ./js_distance.tmp
python ./notebooks/get_JS_results.py k e ./js_distance.tmp

where "pval" is the p-value specified in Johnson & Shmatikov (2013), which can be interpreted as the overall p-value (say 0.05) of the multiple testing problem involving thousands of SNPs.
