
L1000 Analysis Tools v1.1

Copyright 2011-2015 Broad Institute of MIT and Harvard.

A collection of software tools to read and analyze data produced by the LINCS project. For more information on the project, including licensing, please visit lincsproject.org.

Analysis Tools

A brief description of the tools included in this software package is given below. The MATLAB implementation is currently the most mature; some basic utilities in R, Java, and Python are also included. We will update the tools as they become available. Please see the changelog for changes.

Matlab Tools: matlab/

Requirements

  1. MATLAB R2014b or later
  2. Statistics Toolbox
  3. Parallel Computing Toolbox [Optional]

Setting the MATLAB path

Run cd l1ktools/matlab; l1kt_setup or enter the pathtool command, click "Add with Subfolders...", and select the directory l1ktools/matlab.
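
For example, the one-line setup looks like this in the MATLAB Command Window:

% change into the MATLAB tools folder and add it (with subfolders) to the path
cd l1ktools/matlab
l1kt_setup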

Running the standard CMap data processing pipeline

All scripts are contained within the matlab/data_pipeline folder. A directory of example .lxb files from a LINCS Joint Project (LJP) plate is provided under the data/lxb directory. Note that processing a full 384-well plate can take a long time depending on the machine you're running it on, so for test purposes we also provide a test plate with a smaller number of samples. Pre-computed results are included in matlab/data_pipeline/results so that you can view the outputs without running the pipeline and/or compare your results with CMap's; these outputs are described in the section below. By default all outputs are saved in the current working directory, but this can be overridden using the plate_path argument to any of the scripts below.

% Set up the environment
l1kt_setup

% Plate to process
plate_name='LJP009_A375_24H_X1_B20';
% plate_name='TEST_A375_24H_X1_B20'; % A smaller test plate
% Parent folder containing raw lxb files for a given plate
raw_path=fullfile(mortarpath, '../data/lxb');
% Path to map files with sample annotations
map_path=fullfile(mortarpath, '../data/maps');
% Results folder
plate_path=pwd;

% Run the full pipeline on a plate of data
[gex_ds, qnorm_ds, inf_ds, zs_ds_qnorm, zs_ds_inf] = process_plate('plate', plate_name, 'raw_path', raw_path, 'map_path', map_path);

% Run specific parts of the pipeline
% Convert a directory of LXB files (level 1) into gene expression (GEX, level 2) matrix.
% here, using example data
gex_ds = level1_to_level2('plate', plate_name, 'raw_path', raw_path, 'map_path', map_path)

% Convert the GEX matrix (level 2) to quantile normalized (QNORM, level 3) matrices
% in both landmark and inferred (INF) gene spaces.
[qnorm_ds, inf_ds] = level2_to_level3('plate', plate_name, 'plate_path', plate_path)

% Convert the QNORM matrix (level 3) into z-scores (level 4).
% same procedure can be performed using INF matrix (not shown).
zs_ds = level3_to_level4(qnorm_ds, 'plate', plate_name, 'plate_path', plate_path)

Note: Because the peak detection algorithm is non-deterministic, data in levels 2 through 4 may differ slightly between two runs of the same plate. The code allows reproducing a previous run by passing a random seed file to the process_plate script; we provide such a file at resources/rndseed.mat. The results provided in matlab/data_pipeline/results can be reproduced as follows:

% reproduce provided results
[gex_ds, qnorm_ds, inf_ds, zs_ds_qnorm, zs_ds_inf] = process_plate('plate', plate_name, 'raw_path', raw_path, 'map_path', map_path, 'rndseed', 'resources/rndseed.mat');

Description of Pipeline Outputs

File | Data Level | Gene Space | Description
--- | --- | --- | ---
LJP009_A375_24H_X1_B20.map | n/a | n/a | Sample annotations file
LJP009_A375_24H_X1_B20_COUNT_n384x978.gct | n/a | landmark | Matrix of analyte counts per sample
LJP009_A375_24H_X1_B20_GEX_n384x978.gct | 2 | landmark | Gene expression (GEX) values
LJP009_A375_24H_X1_B20_NORM_n371x978.gct | n/a | landmark | LISS-normalized expression profiles
LJP009_A375_24H_X1_B20_QNORM_n371x978.gct | 3 | landmark | Quantile-normalized (QNORM) expression profiles
LJP009_A375_24H_X1_B20_INF_n371x22268.gct | 3 | full | Quantile-normalized expression profiles in the inferred (INF) full gene space
LJP009_A375_24H_X1_B20_ZSPCQNORM_n371x978.gct | 4 | landmark | Differential expression (z-score) signatures
LJP009_A375_24H_X1_B20_ZSPCINF_n371x22268.gct | 4 | full | Differential expression (z-score) signatures
dpeak | n/a | n/a | Folder containing peak detection outputs and QC
liss | n/a | n/a | Folder containing LISS outputs and QC
calibplot.png | n/a | n/a | Plot of invariant gene sets for each sample
quantile_plots.png | n/a | n/a | Plot of the normalized and non-normalized expression quantiles

Other Tools and Demos (under matlab/demos_and_examples)

  • l1kt_dpeak.m: Performs peak deconvolution for all analytes in a single LXB file, and outputs a report of the detected peaks.
  • l1kt_plot_peaks.m: Plots intensity distributions for one or more analytes in an LXB file.
  • l1kt_parse_lxb.m: Reads an LXB file and returns the RID and RP1 values.
  • l1kt_liss.m: Performs Luminex Invariant Set Smoothing on a raw (GEX) input .gct file
  • l1kt_qnorm.m: Performs quantile normalization on an input .gct file
  • l1kt_infer.m: Infers expression of target genes from expression of landmark genes in an input .gct file

See the documentation included with each script for details on usage and input parameters; a minimal usage sketch is shown below.
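
As an illustration, the following runs peak detection on a single LXB file. The call pattern and output are assumptions, so consult the script's help for the exact interface:

% minimal sketch (call pattern assumed): peak deconvolution on one LXB file
lxb_file = 'A10.lxb';            % illustrative path to a single LXB file
pkstats = l1kt_dpeak(lxb_file);  % report of detected peaks for all analytes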

Demo

  • dpeak_demo.m: Demo of peak detection. To run the demo, start MATLAB, change to the folder containing dpeak_demo and type dpeak_demo in the Command Window. This will read a sample LXB file (A10.lxb), generate a number of intensity distribution plots and create a text report of the statistics of the detected peaks (A10_pkstats.txt).

  • example_methods.m: Reads in a .gct and a .gctx file, z-scores the data in the .gctx file, and reads in an .lxb file. To run the demo, start MATLAB, change to the folder containing example_methods and type example_methods at the command line.


R Tools: R/

Requirements:

  1. R version 3.1 or later
  2. prada package: http://www.bioconductor.org/packages/devel/bioc/html/prada.html
  3. rhdf5 package: http://bioconductor.org/packages/release/bioc/html/rhdf5.html

Tools:

R tools are found under R/cmap

  • lxb2txt.R: Saves values from an LXB file as a tab-delimited text file.
  • lxb2txt.sh: Bash wrapper to lxb2txt.R
  • io.R: Classes for reading and writing .gct / .gctx files

Demo

  • example_methods.R: To run the demo, change to the folder containing example_methods.R and source the script. It will read in a .gctx file and display its contents.

Java Tools: java/

Tools

  • GctxDataset.java: Class to store a GCTX data set.
  • GctxReader.java: Class to read a GCTX file.
  • LXBUtil.java: Class to read and store .lxb files.

Demo

  • ReadGctxExample.java: To run the demo, change to the java/examples folder, compile by running sh compileExamples.sh, then run the runExample*.sh script that matches your OS.

Python Tools: python/

Requirements

  1. Python 2.7 (untested under Python 3)
  2. numpy: http://numpy.scipy.org
  3. pandas: http://pandas.pydata.org/
  4. requests: http://docs.python-requests.org/en/latest/
  5. pytables: http://www.pytables.org/moin
  6. blessings: http://pypi.python.org/pypi/blessings

Setting the Python path

To run, append l1ktools/python to the PYTHONPATH environment variable.

Tools

  • cmap/io/gct.py: Classes to interact with .gct and .gctx files.

Demo

  • example_methods.py: To run the demo, change to the folder containing example_methods.py and run the script. It will read in a .gctx file, display its contents, and write it to disk.

Common data analysis tasks

Reading .gct and .gctx files

  • MATLAB: Use the parse_gctx function (a minimal sketch follows this list).
  • R: Source the script l1ktools/R/cmap/io.R. Then use the parse.gctx function.
  • Python: Import the module cmap.io.gct. Then instantiate a GCT object, and call its read() method. For more information, see the documentation on the GCT class.
  • Java: See ReadGctxExample.java for an example.
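
Illustrating the MATLAB option, a minimal sketch; the field names of the returned structure are assumptions, so inspect the output for the exact layout:

% minimal sketch: read a pipeline output into a dataset structure
ds = parse_gctx('LJP009_A375_24H_X1_B20_QNORM_n371x978.gct');
% the returned structure is expected to hold the data matrix along with row (gene)
% and column (sample) identifiers; the field name 'mat' below is an assumption
size(ds.mat)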

Creating .gct and .gctx files

  • MATLAB: Use the mkgct and mkgctx functions (a minimal sketch follows this list).
  • R: Source the script l1ktools/R/cmap/io.R. Then use the write.gctx or write.gct functions.
  • Python: Import cmap.io.gct and instantiate a GCT object. Then call the build method or build_from_DataFrame method to assemble a GCT object from a data matrix and, optionally, row and column annotations. Finally, call the write method to write to file as a .gctx.
    • Please note that a major update for gct/x related parsing and writing is planned for the near future. Key changes include:

      • Underlying data stored in pandas dataframe objects instead of SQLite. This will also enable users to take advantage of pandas’ pre-existing tools for data analysis, including slicing, multi-indexing, plotting, and more.
      • Object oriented class representation of parsed gct/x data, containing data and metadata fields (both separately and as a multi-indexed pandas data frame)
      • Easy to use methods for parsing, writing, and merging both gct and gctx files
      • Extensive testing (open to community contributions) so that edge cases are better documented and fixed as we and users encounter them
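
Returning to the MATLAB option above, a minimal sketch; the argument order for mkgct and mkgctx is an assumption, so check each function's documentation:

% minimal sketch: write a dataset structure out in both formats (argument order assumed)
ds = parse_gctx('LJP009_A375_24H_X1_B20_QNORM_n371x978.gct');
mkgct('my_copy.gct', ds);      % text .gct
mkgctx('my_copy.gctx', ds);    % binary (HDF5) .gctx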

Z-Scoring a data set

  • MATLAB: Use the robust_zscore function. Also see the example_methods.m script.
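
The underlying idea, sketched directly in MATLAB for reference; the library's robust_zscore may differ in details such as a minimum-MAD floor, and the dataset field name is an assumption:

% minimal sketch of a row-wise robust z-score, applied to a QNORM dataset:
% center each gene by its median across samples and scale by 1.4826 * MAD,
% which makes the MAD consistent with the standard deviation for normal data
ds   = parse_gctx('LJP009_A375_24H_X1_B20_QNORM_n371x978.gct');
x    = ds.mat;                                       % genes x samples (field name assumed)
med  = median(x, 2);
madv = median(abs(bsxfun(@minus, x, med)), 2);
zs   = bsxfun(@rdivide, bsxfun(@minus, x, med), 1.4826 * madv);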

Reading / converting .lxb files

  • MATLAB: To read an .lxb into the MATLAB workspace, use the l1kt_parse_lxb function.
  • R: To convert an .lxb file to text, use the R/cmap/lxb2txt.sh script.
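
For the MATLAB route, a minimal sketch; the exact shape of the returned data is an assumption, though the function is documented above as returning RID and RP1 values:

% minimal sketch: load bead-level data from one LXB file into the workspace
lxb = l1kt_parse_lxb('A10.lxb');   % expected to contain analyte IDs (RID) and intensities (RP1)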
