DeepLINK: Deep Learning INference using Knockoffs

DeepLINK is a variable selection algorithm that guarantees false discovery rate (FDR) control in high-dimensional settings. It consists of two major parts: an autoencoder network for knockoff variable construction and a multilayer perceptron (MLP) network for feature selection with FDR control.

Dependencies

DeepLINK requires Python 3 (>= 3.7.6) with the following packages:

  • numpy >= 1.18.5
  • keras >= 2.4.3
  • tensorflow >= 2.3.0

Installation

Clone the GitHub repository and enter the DeepLINK directory with

$ git clone https://github.com/zifanzhu/DeepLINK.git
$ cd DeepLINK

Usage

The inputs of DeepLINK are a sample-by-feature data matrix and the corresponding response vector. If the data matrix is n-by-p, the response should be an n-by-1 vector. Both must be in '.npy' format, without row or column names. To convert your data to '.npy' format, load it into Python with numpy.loadtxt, trim the row and column names, and save it with numpy.save. You need to have 'numpy' installed to do so. The output directory is also a required input.

The outputs of DeepLINK are two space-delimited files containing the indices of the features selected with the knockoff and knockoff+ thresholds, named selected_variable_ko and selected_variable_ko+ respectively. Note that feature indices are 0-based, so the first feature (the first column of the data matrix) has index 0.
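The conversion to '.npy' described above can be sketched as follows. This is a minimal sketch, not part of DeepLINK: the helper name table_to_npy and its parameters are hypothetical, and it assumes a delimited text file whose first row holds column names and whose first column holds row names.

```python
import numpy as np


def table_to_npy(txt_path, npy_path, delimiter=",", has_header=True, has_row_names=True):
    """Load a delimited text table, drop its labels, and save the numbers as .npy.

    Hypothetical helper for illustration; it is not shipped with DeepLINK.
    """
    # Read every cell as a string so that label cells do not break parsing.
    raw = np.loadtxt(txt_path, delimiter=delimiter, dtype=str, ndmin=2)
    if has_header:
        raw = raw[1:, :]  # drop the row of column names
    if has_row_names:
        raw = raw[:, 1:]  # drop the column of row names
    values = raw.astype(float)
    np.save(npy_path, values)
    return values
```

Because ndmin=2 is used, a single-column response file comes back as an n-by-1 array, which matches the shape expected for the response vector.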

Options

-h, --help            show this help message and exit
-X DATA, --data=DATA  data matrix in '.npy' format (row sample, column
                      feature)
-y RESPONSE, --response=RESPONSE
                      response vector in '.npy' format
-o OUTPUT_DIR, --out=OUTPUT_DIR
                      output directory
-s                    use this flag if you do not want to center and scale
                      data (center/scale makes every feature column have
                      mean 0/sd 1)
-l L1, --l1=L1        l1 regularization coefficient used in the feature
                      selection MLP [default: 0.001]
-r LR, --lr=LR        learning rate used in the feature selection MLP
                      [default: 0.001]
-a ACT, --act=ACT     activation function used in the feature selection MLP
                      [default: elu]
-L LOSS, --loss=LOSS  loss function used in the feature selection MLP
                      [default: mean_squared_error]
-q Q, --fdr_level=Q   fdr level [default: 0.2]
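To illustrate how the FDR level q relates to the knockoff and knockoff+ thresholds, here is a minimal sketch of the standard knockoff filter thresholding rule. It assumes a vector W of knockoff statistics (one per feature, large positive values favoring the original feature); the function names are hypothetical and the code in deeplink.py may differ in its details.

```python
import numpy as np


def knockoff_threshold(W, q=0.2, plus=False):
    """Smallest t with (offset + #{W_j <= -t}) / max(1, #{W_j >= t}) <= q.

    offset is 1 for the knockoff+ threshold and 0 for the knockoff threshold.
    Illustrative sketch of the standard rule, not DeepLINK's own code.
    """
    W = np.asarray(W, dtype=float)
    offset = 1.0 if plus else 0.0
    # Candidate thresholds: nonzero magnitudes of W (np.unique returns them sorted).
    candidates = np.unique(np.abs(W[W != 0]))
    for t in candidates:
        ratio = (offset + np.sum(W <= -t)) / max(1, np.sum(W >= t))
        if ratio <= q:
            return t
    return np.inf  # no threshold reaches the target level, so nothing is selected


def select_features(W, q=0.2, plus=False):
    """Return the 0-based indices j with W[j] >= threshold."""
    t = knockoff_threshold(W, q, plus)
    return np.flatnonzero(np.asarray(W, dtype=float) >= t)
```

The knockoff+ threshold is never smaller than the knockoff threshold, so it selects the same features or fewer at the same q.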

A toy example

Run the following command; the outputs are written to the folder test/out:

$ python deeplink.py -X test/X.npy -y test/y.npy -o test/out
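Once the run finishes, the selected indices can be read back into Python. The sketch below assumes only what is stated above, namely that each output file holds space-delimited 0-based feature indices; the helper names are hypothetical, and parsing through float is a precaution in case the indices were written in a floating-point format.

```python
from pathlib import Path


def read_selected(path):
    """Read space-delimited 0-based feature indices; an empty file gives []."""
    text = Path(path).read_text()
    # int(float(...)) also accepts tokens such as "3.0" or "3.000000e+00".
    return [int(float(token)) for token in text.split()]


def indices_to_names(indices, feature_names):
    """Map 0-based indices back to your own column names (kept outside the .npy file)."""
    return [feature_names[i] for i in indices]
```

For the toy example, read_selected("test/out/selected_variable_ko+") would return the features chosen with the knockoff+ threshold.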

Replicate studies in the paper

Simulation studies

The code used in the simulation studies is in the folder Simulations/. An example Python script, example.py, is provided for the simulation setting with a linear factor model and a linear link function. Results for the other settings can be obtained by running this script with different setting parameters (x_design, y_design, etc.). Choices of factor model design x_design: linear, add_quad, logistic. Choices of link function design y_design: linear, nonlinear. Note that the l1 regularization factor (l1) and the learning rate (lr) used in the MLP training need to be tuned for optimal performance.
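The settings listed above can be enumerated as a grid before editing example.py for each run. This is only a sketch: the design names come from the text above, the function simulation_grid is hypothetical, and the default l1 and lr values simply reuse the command-line defaults of 0.001 rather than tuned values.

```python
from itertools import product

X_DESIGNS = ["linear", "add_quad", "logistic"]  # factor model designs
Y_DESIGNS = ["linear", "nonlinear"]             # link function designs


def simulation_grid(l1_values=(1e-3,), lr_values=(1e-3,)):
    """List every combination of designs and candidate l1/lr values.

    The candidate values are placeholders; pick your own tuning ranges.
    """
    return [
        {"x_design": x, "y_design": y, "l1": l1, "lr": lr}
        for x, y, l1, lr in product(X_DESIGNS, Y_DESIGNS, l1_values, lr_values)
    ]
```

Each dictionary names one setting to copy into the script's parameters; how example.py actually reads its parameters is not shown here and should be checked in the script itself.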

Real data analyses

The data matrices and code used in the real data analyses are in the folder Real_data_analyses/. Preprocessed data matrices are given as csv files under the corresponding data/ folder. For the microbiome data analysis, we used an independent dataset, yu_CRC_common.csv, for screening. For the other two scRNA-seq datasets, the screening was done using 50% of the dataset itself. The comparison folder contains the code for a prediction comparison analysis between DeepLINK, IPAD and Random Forests.
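Setting aside 50% of a dataset for screening can be sketched as below. This is an assumption-laden illustration: the text above does not say how the half was chosen, so the random permutation, the seed, and the function name split_half are all hypothetical and are not taken from the scripts in Real_data_analyses/.

```python
import numpy as np


def split_half(X, y, seed=0):
    """Randomly split samples into a screening half and a remaining half.

    Hypothetical helper; the authors' actual splitting procedure may differ.
    """
    X = np.asarray(X)
    y = np.asarray(y)
    n = X.shape[0]
    perm = np.random.default_rng(seed).permutation(n)  # shuffled sample indices
    half = n // 2
    screen_idx, rest_idx = perm[:half], perm[half:]
    return (X[screen_idx], y[screen_idx]), (X[rest_idx], y[rest_idx])
```

Fixing the seed keeps the split reproducible, and the two halves never share a sample.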

Copyright and License Information

Copyright (C) 2021 University of Southern California

Authors: Zifan Zhu, Yingying Fan, Yinfei Kong, Jinchi Lv, Fengzhu Sun

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see http://www.gnu.org/licenses/.

Commercial users should contact Dr. Fengzhu Sun ([email protected]) or Dr. Yingying Fan ([email protected]), copyright at University of Southern California.
