pathway-centrality

This program implements a module that calculates pathway-centrality scores for pre-defined pathway gene sets. Pathway centrality measures the amount of disease-specific communication passing through each pathway gene set by counting the shortest paths between disease genes and differentially expressed genes. The significance of each observed pathway-centrality score is assessed via permutation tests using 10,000 pathway genes randomly selected from the 2-core of the input network.
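The scoring idea can be illustrated with a minimal sketch. It is not the code in PCmain.py: the function names are hypothetical, the network is treated as undirected and unweighted, and a shortest path is assumed to "pass through" a pathway when at least one of its intermediate genes belongs to the pathway gene set. The program's exact counting rule may differ.

```python
from collections import deque


def build_graph(edges):
    """Build an undirected adjacency map from (gene_a, gene_b) pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def all_shortest_paths(graph, source, target):
    """Enumerate every shortest path from source to target using BFS."""
    if source not in graph or target not in graph:
        return []
    if source == target:
        return [[source]]
    dist = {source: 0}
    preds = {source: []}  # predecessors along shortest paths
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            break  # all predecessors of target are already recorded
        for nxt in graph[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                preds[nxt] = [node]
                queue.append(nxt)
            elif dist[nxt] == dist[node] + 1:
                preds[nxt].append(node)
    if target not in dist:
        return []
    # Walk the predecessor links back from target to source.
    paths = []
    stack = [[target]]
    while stack:
        partial = stack.pop()
        head = partial[-1]
        if head == source:
            paths.append(partial[::-1])
        else:
            for p in preds[head]:
                stack.append(partial + [p])
    return paths


def pathway_centrality(graph, disease_genes, diff_exp_genes, pathway_genes):
    """Count shortest paths whose intermediate nodes hit the pathway (assumed rule)."""
    pathway = set(pathway_genes)
    score = 0
    for d in disease_genes:
        for e in diff_exp_genes:
            for path in all_shortest_paths(graph, d, e):
                if pathway.intersection(path[1:-1]):
                    score += 1
    return score


# Toy network with made-up gene labels.
toy = build_graph([("d1", "a"), ("d1", "b"), ("a", "e1"), ("b", "e1")])
score_a = pathway_centrality(toy, ["d1"], ["e1"], {"a"})
```

In the toy network there are two shortest paths from d1 to e1, and only one of them passes through a, so a pathway containing only a would receive a score of 1 under this assumed rule.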

The program requires 5 arguments:

  1. a file containing genes with known mutations associated with the disease of interest (-d): e.g., sample_data/bpd.disease.genes.txt
  2. a file containing genes differentially expressed in the disease of interest (-e): e.g., sample_data/bpd.diff.exp.genes.txt
  3. a file containing protein-protein interaction pairs (-p): e.g., sample_data/hippie_high_ppi.txt
  4. a file containing pathway gene sets in .gmt file format (-g): e.g., sample_data/c2.cp.kegg.v6.0.entrez.gmt
  5. the output directory where all output files will be placed (-o): e.g., sample_data/output/
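For readers unfamiliar with the .gmt format used for argument 4, here is a minimal parsing sketch. It assumes the usual GMT layout (one gene set per line, tab-separated: set name, description, then gene identifiers). The function name parse_gmt and the example lines are hypothetical, and the program's own reader may handle edge cases differently.

```python
def parse_gmt(lines):
    """Parse GMT lines into a {set_name: set_of_gene_ids} dictionary."""
    gene_sets = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank lines and lines without any gene
        name, _description, *genes = fields
        gene_sets[name] = {g for g in genes if g}
    return gene_sets


# Made-up example lines; real files hold one pathway per line.
example_lines = [
    "EXAMPLE_PATHWAY_A\thttp://example.org/a\t10\t20\t30\n",
    "EXAMPLE_PATHWAY_B\thttp://example.org/b\t20\t40\n",
    "\n",
]
pathways = parse_gmt(example_lines)
```

Because an open file is iterable line by line, the same function could be called as parse_gmt(handle) inside a with open(...) block.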

Genes must use exactly the same identifiers across all input files. In our sample_data, genes are identified by Entrez Gene IDs.
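A quick sanity check before running the program is to see how many genes from a gene list appear as nodes in the interaction network. A very low fraction often points to mismatched identifier types. The sketch below is only an illustration: the helper id_overlap_report is hypothetical and not part of the program, and the identifiers are made up.

```python
def id_overlap_report(gene_ids, network_edges):
    """Return the fraction of genes found in the network and the missing ones."""
    nodes = {gene for pair in network_edges for gene in pair}
    genes = set(gene_ids)
    missing = genes - nodes
    fraction = (len(genes) - len(missing)) / len(genes) if genes else 0.0
    return fraction, sorted(missing)


# Made-up identifiers: two numeric IDs match, one gene symbol does not.
fraction, missing = id_overlap_report(
    ["1", "2", "TP53"],
    [("1", "2"), ("2", "3")],
)
```

Here the gene symbol is reported as missing because the toy network uses numeric identifiers only.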

An example command line to run the program is:

  python PCmain.py -d sample_data/bpd.disease.genes.txt -e sample_data/bpd.diff.exp.genes.txt -p sample_data/hippie_high_ppi.txt -g sample_data/c2.cp.kegg.v6.0.entrez.gmt -o sample_data/output/

The program will create 11 files:

  1. pc_disease_genes.txt: input disease genes, except those that also appear in the differentially expressed gene set
  2. pc_diff_exp_genes.txt: a copy of the input differentially expressed genes
  3. pc_overlapping_genes.log: genes that appear in both the disease gene set and the differentially expressed gene set; these genes are removed from the disease gene set
  4. pc_network_lcc.txt: protein-protein interaction pairs in the largest connected component of the given PPI network
  5. pc_disease_genes_not_in_lcc.log: disease genes that are not in the largest connected component (file 4) and are excluded from the experiment
  6. pc_diff_exp_genes_not_in_lcc.log: differentially expressed genes that are not in the largest connected component (file 4) and are excluded from the experiment
  7. pc_shortest_paths.txt: all shortest paths from input disease genes to differentially expressed genes in the largest connected component
  8. pc_pathway_genes.txt: input pathway gene sets, excluding disease genes and differentially expressed genes
  9. pc_scores.txt: pathway-centrality scores calculated for all pathway gene sets
  10. pc_p_cent.txt: p-value of the observed pathway-centrality score for each pathway gene set, calculated using permutation tests
  11. pc_p_cent.log: log file for the permutation test, containing the genes in the pool for random sampling and time records of progress
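The preprocessing implied by files 1 to 6 can be sketched as follows. This is an assumption-laden illustration, not the program's code: the function names and dictionary keys are hypothetical, the network is treated as undirected, and the gene labels are made up.

```python
from collections import deque


def largest_connected_component(edges):
    """Return the node set of the largest connected component."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    seen, best = set(), set()
    for start in graph:
        if start in seen:
            continue
        component = {start}
        queue = deque([start])
        while queue:  # breadth-first traversal of one component
            node = queue.popleft()
            for nxt in graph[node]:
                if nxt not in component:
                    component.add(nxt)
                    queue.append(nxt)
        seen |= component
        if len(component) > len(best):
            best = component
    return best


def preprocess(disease_genes, diff_exp_genes, edges):
    """Split the gene lists the way output files 1 to 6 describe."""
    diff_exp = set(diff_exp_genes)
    overlapping = set(disease_genes) & diff_exp   # file 3
    disease = set(disease_genes) - overlapping    # file 1
    lcc = largest_connected_component(edges)
    return {
        "overlapping": overlapping,
        "disease": disease,
        "lcc_edges": [(a, b) for a, b in edges if a in lcc and b in lcc],  # file 4
        "disease_not_in_lcc": disease - lcc,      # file 5
        "diff_exp_not_in_lcc": diff_exp - lcc,    # file 6
    }


# Toy input with made-up gene labels.
result = preprocess(
    ["a", "c", "x"],
    ["c", "y", "b"],
    [("a", "b"), ("b", "c"), ("x", "y")],
)
```

In the toy input, c is both a disease gene and a differentially expressed gene, so it is removed from the disease set, while x and y fall outside the largest component and would be logged as excluded.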
