Systems Biology Analytics

Analysis workflows for computational tools often used in systems biology and bioinformatics to extract insights from data-driven findings
by Amal Katrib
 

To perform functional enrichment analysis, leveraging the Enrichr list of curated gene set libraries to extract significantly represented:

  • Pathways, molecular functions, & biological processes
  • Co-expressed/-localized molecular factors & interactors
  • Phenotypes & clinical traits
  • Diseases, pathologies, & symptoms
  • Drugs & therapeutic targets

The analysis can be performed either:
[ ONLINE ]   using the web interface
[ OFFLINE ]   by downloading the R package from CRAN using install.packages("enrichR")
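
For the offline route, a minimal sketch of a query might look like the following. The wrapper `run_enrichr()` is a hypothetical helper, not part of enrichR, and the gene symbols and library names in the commented example are illustrative assumptions; `enrichR::listEnrichrDbs()` lists the libraries that are actually available. The package still contacts the Enrichr server, so the example call needs internet access.

```r
# Hypothetical helper (not part of enrichR): wraps the package call so that
# nothing is sent to the Enrichr server until the function is invoked.
run_enrichr <- function(genes, libraries) {
  if (!requireNamespace("enrichR", quietly = TRUE)) {
    stop("Install the package first: install.packages(\"enrichR\")")
  }
  # enrichr() returns a named list with one data frame per requested library
  enrichR::enrichr(genes, libraries)
}

# Example call (requires internet access); inputs are illustrative only
# genes <- c("TP53", "BRCA1", "MYC", "EGFR", "PTEN")
# libs  <- c("GO_Biological_Process_2021", "KEGG_2021_Human")
# res   <- run_enrichr(genes, libs)
# head(res[["KEGG_2021_Human"]])
```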
 

Researchers apply a wide range of "Combined Score" cutoffs to assess the significance of gene set/functional enrichment findings from Enrichr. The ad hoc choice of a significance threshold appears, for the most part, arbitrary and subjective: it is rarely backed by clear-cut logic or scientific reasoning and is, in some cases, biased by the temptation to produce favorable outcomes.

A common practice that is arguably reasonable, though not without shortcomings, is to: (a) apply an adjusted p-value ("q-value") cutoff of 0.01-0.1 to filter enriched terms; (b) sort the filtered terms in descending order of the "Combined Score" (reported to outperform other ranking metrics because it applies a z-score permutation background correction to the Fisher's exact test p-value [1]); and then (c) select the top X highest-ranked terms to identify significantly over-represented functional categories.
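
Steps (a) to (c) can be sketched in base R as follows. The helper `top_enriched()` and the toy table are hypothetical; the column names `Adjusted.P.value` and `Combined.Score` are assumed to match those of the data frames returned by enrichR, and the default cutoff of 0.05 is just one value within the 0.01-0.1 range mentioned above.

```r
# Toy table mimicking the columns of an enrichR result data frame;
# all terms and numbers below are made up for illustration.
res <- data.frame(
  Term             = c("Term A", "Term B", "Term C", "Term D", "Term E"),
  Adjusted.P.value = c(0.001, 0.200, 0.030, 0.049, 0.008),
  Combined.Score   = c(12.5, 250.0, 48.0, 95.3, 31.7),
  stringsAsFactors = FALSE
)

# Hypothetical helper implementing steps (a)-(c)
top_enriched <- function(res, q_cutoff = 0.05, top_n = 10) {
  # (a) keep terms whose adjusted p-value passes the cutoff
  kept <- res[res$Adjusted.P.value < q_cutoff, , drop = FALSE]
  # (b) sort the remaining terms by combined score, highest first
  kept <- kept[order(kept$Combined.Score, decreasing = TRUE), , drop = FALSE]
  # (c) return the top_n highest-ranked terms
  head(kept, top_n)
}

top3 <- top_enriched(res, q_cutoff = 0.05, top_n = 3)
print(top3$Term)
```

In this toy table, "Term B" has the highest combined score but is dropped in step (a) because its adjusted p-value exceeds the cutoff, which is the reason for filtering before ranking.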

Although less common, this workflow can be modified further to better address the question at hand and make the enrichment findings easier to interpret in context. The simplest modification is to impose an additional "Combined Score" threshold (such as >15 or >30, with higher values being more stringent) to narrow down the list of significant results. The selection of enriched terms can also be tuned to preserve and prioritize those that are key to the underlying question, for example by excluding irrelevant gene sets, assigning knowledge-based weights to favor some gene sets over others, concatenating closely linked gene sets, or mapping genes/gene sets onto functional interaction networks to identify topologically matching processes.
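
As an illustration of the two simplest refinements (an extra "Combined Score" threshold and the exclusion of irrelevant gene sets), a hedged sketch is given below. The helper `refine_terms()`, its parameter names, and the toy table are assumptions made for this example; weighting, concatenation, and network mapping are not covered.

```r
# Toy enrichment table; terms and numbers are made up for illustration.
res <- data.frame(
  Term             = c("Apoptosis", "Cell cycle",
                       "Olfactory transduction", "DNA repair"),
  Adjusted.P.value = c(0.004, 0.020, 0.010, 0.030),
  Combined.Score   = c(62.0, 18.5, 40.1, 9.7),
  stringsAsFactors = FALSE
)

# Hypothetical helper: q-value filter, combined score floor, and an
# exclusion list of terms judged irrelevant to the question at hand
refine_terms <- function(res, q_cutoff = 0.05, min_score = 15,
                         exclude = character(0)) {
  keep <- res$Adjusted.P.value < q_cutoff &
    res$Combined.Score > min_score &
    !(res$Term %in% exclude)
  out <- res[keep, , drop = FALSE]
  # rank the surviving terms by combined score, highest first
  out[order(out$Combined.Score, decreasing = TRUE), , drop = FALSE]
}

refined <- refine_terms(res, min_score = 15,
                        exclude = "Olfactory transduction")
print(refined$Term)
```

Raising `min_score` from 15 to 30 makes the filter more stringent and, in this toy table, leaves a single term.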


To extract "spatially-correlated" genes, isoforms, & proteins that exhibit a significant overlap in organ-, tissue-, and cell type-specific expression profiles, using data downloaded from the Human Protein Atlas (HPA) and then further adjusted to facilitate a streamlined analysis

The original data files can be downloaded directly from the HPA webpage.
 
[ hpa.tissue.csv ]   user-adjusted dataset is formed by running:

# load dplyr, which provides %>% and left_join()
library(dplyr)

# load the following HPA datasets:
# "normal tissue" & "RNA consensus tissue gene"
tissue1 <- read.csv("normal_tissue.tsv", sep="\t")
tissue2 <- read.csv("rna_tissue_consensus.tsv", sep="\t")

# merge the data frames, keeping every row of the largest one (tissue2);
# with no `by` argument, left_join() matches on all shared column names
hpa.tissue <- tissue2 %>% left_join(tissue1)

# adjust the formatting as needed

 
[ hpa.blood.csv ]   user-adjusted dataset is formed by running:

# load dplyr, which provides %>% and left_join()
library(dplyr)

# load the following HPA datasets:
# "RNA HPA blood cell gene", "RNA Monaco blood cell gene", & "RNA Schmiedel blood cell gene"
blood1 <- read.csv("rna_blood_cell.tsv", sep="\t")
blood2 <- read.csv("rna_blood_cell_monaco.tsv", sep="\t")
blood3 <- read.csv("rna_blood_cell_schmiedel.tsv", sep="\t")

# merge the data frames, keeping every row of the largest one (blood2)
# and matching on gene ID, gene name, and blood cell type
hpa.blood <- blood2 %>%
  left_join(blood1, by = c("Gene", "Gene.name", "Blood.cell")) %>%
  left_join(blood3, by = c("Gene", "Gene.name", "Blood.cell"))

# adjust the formatting as needed
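
Once a merged table is available, one possible way to look for "spatially-correlated" genes is to correlate expression profiles across tissues. The sketch below is an assumption-laden illustration, not necessarily the method used to produce this repository's results: `spatial_partners()` is a hypothetical helper, the `nTPM` column name is assumed, the Spearman cutoff of 0.8 is arbitrary, and the toy values are made up.

```r
# Toy long-format table with one row per gene/tissue pair (made-up values)
toy <- data.frame(
  Gene.name = rep(c("GENE_A", "GENE_B", "GENE_C"), each = 4),
  Tissue    = rep(c("liver", "brain", "kidney", "lung"), times = 3),
  nTPM      = c(10, 1, 5, 2,   20, 2, 11, 4,   1, 30, 2, 8),
  stringsAsFactors = FALSE
)

# Hypothetical helper: genes whose tissue profile correlates with `gene`
spatial_partners <- function(df, gene, min_rho = 0.8) {
  # pivot to a gene x tissue matrix (mean expression per pair)
  mat <- tapply(df$nTPM, list(df$Gene.name, df$Tissue), mean)
  # Spearman correlation between the query gene and every gene
  rho <- cor(t(mat), method = "spearman")[gene, ]
  # keep sufficiently correlated genes, excluding the query gene itself
  names(rho)[rho >= min_rho & names(rho) != gene]
}

print(spatial_partners(toy, "GENE_A"))
```

A rank-based correlation is used here so that genes with similar tissue ordering but different absolute expression levels are still matched.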

 

To conceptualize gene interactions by using read count measurements to construct a corresponding weighted gene co-expression network (via WGCNA) and analyzing it to extract network-level trends that are specific to the data
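
The core idea behind a weighted co-expression network can be sketched in base R as shown below. This is a simplified illustration and not the WGCNA package pipeline, which goes further (for example, topological overlap and automated module detection). The expression matrix is simulated, the soft-thresholding power `beta <- 6` is an assumed value, and cutting the tree into two modules is a choice made only because the toy data contain two groups.

```r
set.seed(42)  # simulated data, for illustration only
n_samples <- 30
driver_a <- rnorm(n_samples)
driver_b <- rnorm(n_samples)

# samples in rows, genes in columns (e.g. normalized, log-transformed counts)
expr <- cbind(
  gene1 = driver_a + rnorm(n_samples, sd = 0.1),
  gene2 = driver_a + rnorm(n_samples, sd = 0.1),
  gene3 = driver_a + rnorm(n_samples, sd = 0.1),
  gene4 = driver_b + rnorm(n_samples, sd = 0.1),
  gene5 = driver_b + rnorm(n_samples, sd = 0.1),
  gene6 = driver_b + rnorm(n_samples, sd = 0.1)
)

beta <- 6  # soft-thresholding power (assumed value)

# unsigned weighted adjacency: strong correlations are kept, weak ones shrink
adjacency <- abs(cor(expr))^beta

# cluster genes on the dissimilarity 1 - adjacency and cut into two modules
tree <- hclust(as.dist(1 - adjacency), method = "average")
modules <- cutree(tree, k = 2)

print(split(names(modules), modules))
```

Raising the absolute correlation to a power keeps the network weighted rather than imposing a hard cutoff, so weak correlations fade gradually instead of being discarded outright.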

 
