psichomics

Original article:

Nuno Saraiva-Agostinho and Nuno L. Barbosa-Morais (2019). psichomics: graphical application for alternative splicing quantification and analysis. Nucleic Acids Research. 47(2), e7.

Interactive R package with an intuitive Shiny-based graphical interface for alternative splicing quantification and integrative analyses of alternative splicing and gene expression based on The Cancer Genome Atlas (TCGA), the Genotype-Tissue Expression (GTEx) project, Sequence Read Archive (SRA) and user-provided data.

psichomics interactively performs:

  • Dimensionality reduction
  • Median- and variance-based differential splicing and gene expression analyses
  • Survival analysis
  • Correlation analysis
  • Grouping by clinical and molecular features (such as tumour stage or survival)
  • Genomic mapping and functional annotation of alternative splicing events and genes

[Figure: Differential splicing analysis in psichomics]

Install and start running

Bioconductor

To install the package from Bioconductor, type the following in RStudio or in an R console:

install.packages("BiocManager")
BiocManager::install("psichomics")
library("psichomics")

Start the visual interface of psichomics with psichomics()

GitHub

Install from GitHub (specify a branch or tag via the ref argument):

install.packages("remotes")
remotes::install_github("nuno-agostinho/psichomics", ref="master")
library("psichomics")

Start the visual interface of psichomics with psichomics()

Docker

The Docker images are based on Bioconductor Docker and contain psichomics and its dependencies.

  1. Pull the latest Docker image:
docker pull ghcr.io/nuno-agostinho/psichomics:latest
  2. Start RStudio Web from the Docker image:
docker run -e PASSWORD=bioc -p 8787:8787 ghcr.io/nuno-agostinho/psichomics:latest
  3. Go to RStudio Web via the web browser at http://localhost:8787
  4. Log in to RStudio with user rstudio and password bioc
  5. Load psichomics using library(psichomics)
  6. Start the visual interface of psichomics with psichomics()

Tutorials

The following case studies and tutorials are available and were based on our original article:

Another tutorial was published as part of the Methods in Molecular Biology book series (the code for performing the analysis can be found here):

Nuno Saraiva-Agostinho and Nuno L. Barbosa-Morais (2020). Interactive Alternative Splicing Analysis of Human Stem Cells Using psichomics. In: Kidder B. (eds) Stem Cell Transcriptional Networks. Methods in Molecular Biology, vol 2117. Humana, New York, NY

Workflow

Data input

Automatic retrieval and loading of pre-processed data from the following sources:

  • TCGA data of given tumours, including subject- and sample-associated information, junction quantification and gene expression data
  • GTEx data of given tissues, including subject- and sample-associated information, junction quantification and gene expression data
  • SRA data from select SRA projects via the recount package

Other SRA, VAST-TOOLS and user-provided data can also be manually loaded. Please read Loading user-provided data for more information.

Alternative splicing quantification

The quantification of each alternative splicing event is based on the proportion of junction reads that support the inclusion isoform, known as percent spliced-in or PSI (Wang et al., 2008).

This value is estimated as the number of reads supporting the inclusion of an exon divided by the total number of reads supporting either the inclusion or the exclusion of that exon. Computing this estimate requires:

  1. Alternative splicing annotation: human annotation is provided and custom annotations can be prepared for use in psichomics.
  2. Quantification of RNA-Seq reads aligning to exon-exon splice junctions (exon-exon junction quantification), either user-provided or retrieved from TCGA, GTEx and SRA.
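
As a rough illustration of the estimate described above, the following R sketch computes PSI from junction read counts. The function name estimate_psi and the min_reads threshold are hypothetical and are not part of the psichomics API; the package itself derives the inclusion and exclusion counts from the annotated junctions of each event.

```r
# Hypothetical helper (not the psichomics API): PSI from junction read counts
estimate_psi <- function(inclusion, exclusion, min_reads = 10) {
  total <- inclusion + exclusion
  psi <- inclusion / total
  # Assumption: estimates supported by too few reads are set to NA
  psi[total < min_reads] <- NA
  psi
}

# Made-up read counts for three samples
inclusion <- c(sample1 = 30, sample2 = 5, sample3 = 0)
exclusion <- c(sample1 = 10, sample2 = 2, sample3 = 40)

psi <- estimate_psi(inclusion, exclusion)
print(psi)
```

Here sample2 is set to NA because only 7 reads cover the event, which is below the assumed threshold of 10.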

Gene expression processing

Gene expression can be normalised, filtered and log2-transformed in-app or provided by the user.
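
The three processing steps can be sketched in base R as follows. This is only an illustration with made-up counts: the filtering threshold and the counts-per-million normalisation are assumptions chosen for simplicity and do not necessarily match what psichomics does in-app.

```r
# Made-up raw counts: rows are genes, columns are samples
counts <- matrix(
  c(100, 0,  50,   # sample s1
    200, 1, 150,   # sample s2
    300, 0, 250),  # sample s3
  nrow = 3,
  dimnames = list(c("geneA", "geneB", "geneC"), c("s1", "s2", "s3"))
)

# Filter: keep genes with at least 10 reads across all samples (assumed threshold)
keep <- rowSums(counts) >= 10
filtered <- counts[keep, , drop = FALSE]

# Normalise: counts per million, dividing each column by its library size
cpm <- sweep(filtered, 2, colSums(filtered), "/") * 1e6

# Transform: log2 with a pseudocount of 1 to avoid log2(0)
log_cpm <- log2(cpm + 1)
print(round(log_cpm, 2))
```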

Data grouping

Molecular and clinical sample-associated attributes can be used to establish groups, which can then be explored in data analyses.

For instance, TCGA data can be analysed based on smoking history, gender and race, among other attributes. Groups can also be manipulated (e.g. merged, intersected, etc.), allowing for complex attribute combinations. Groups can also be saved and loaded between different sessions.
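
Conceptually, groups behave like sets of sample identifiers. The sketch below shows how merging, intersecting and saving groups could look in base R; the group names and sample identifiers are hypothetical, and this is not how psichomics stores groups internally.

```r
# Hypothetical groups of sample identifiers
groups <- list(
  smokers = c("sample1", "sample2", "sample3"),
  female  = c("sample2", "sample3", "sample4")
)

# Merge (union), intersect and subtract groups
groups$smokers_or_female  <- union(groups$smokers, groups$female)
groups$female_smokers     <- intersect(groups$smokers, groups$female)
groups$smokers_not_female <- setdiff(groups$smokers, groups$female)

# Save the groups and load them again, as would happen between sessions
group_file <- tempfile(fileext = ".rds")
saveRDS(groups, group_file)
restored <- readRDS(group_file)
print(restored$female_smokers)
```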

Data Analyses

  • Dimensionality reduction via principal and independent component analysis (PCA and ICA) on alternative splicing quantification and gene expression.

  • Differential splicing and gene expression analysis using parametric and non-parametric statistical tests that compare medians and variances between groups.

  • Correlation between gene expression and splicing quantification, useful, for instance, to correlate the quantification of a given splicing event with the expression of RNA-binding proteins.

  • Survival analysis via Kaplan-Meier curves and Cox models based on sample-associated features. Additionally, we can study the impact of a splicing event (based on its quantification) or a gene (based on its expression) on patient survival.

  • Gene, transcript and protein annotation, including relevant research articles.
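
To make the median- and variance-based comparison more concrete, here is a minimal sketch for a single splicing event using tests from the base stats package. The simulated PSI values and group labels are made up, and the choice of the Wilcoxon rank-sum and Fligner-Killeen tests is an assumption for illustration rather than a description of the tests run by psichomics.

```r
set.seed(42)

# Simulated PSI values of one event in two hypothetical groups
psi_normal <- rbeta(15, 2, 5)
psi_tumour <- rbeta(15, 5, 2)

# Median-based comparison: Wilcoxon rank-sum test (non-parametric)
median_test <- wilcox.test(psi_normal, psi_tumour)

# Variance-based comparison: Fligner-Killeen test (non-parametric)
values <- c(psi_normal, psi_tumour)
group  <- factor(rep(c("normal", "tumour"), each = 15))
variance_test <- fligner.test(values, group)

# Effect sizes that are often reported next to the p-values
delta_median   <- median(psi_tumour) - median(psi_normal)
delta_variance <- var(psi_tumour) - var(psi_normal)

print(c(wilcoxon_p = median_test$p.value, fligner_p = variance_test$p.value))
```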

Feedback and support

Please send any feedback and questions on psichomics to:

Nuno Saraiva-Agostinho ([email protected])

Disease Transcriptomics Lab, Instituto de Medicina Molecular (Portugal)

References

Wang, E. T., R. Sandberg, S. Luo, I. Khrebtukova, L. Zhang, C. Mayr, S. F. Kingsmore, G. P. Schroth, and C. B. Burge. 2008. Alternative isoform regulation in human tissue transcriptomes. Nature 456 (7221): 470–76.

psichomics's Issues

Normalise gene expression

Besides loading gene expression that is already normalised, it would be useful to normalise it within the app.

Select dataset and/or data category to perform a given analysis

This may depend on the analyses available for each dataset type: for instance, an analysis may work with either gene expression or junction quantification. However, two different datasets of the same type are sometimes loaded (for example, two junction quantification datasets obtained through different methods).

Use text suggestions where possible

One such place that could use text suggestions is when asking the user for folders where data is stored (don't forget that the filesystem separator differs by OS and can be retrieved in R using the variable .Platform$file.sep).

  • Selecting folder
  • Subset expression (suggest dataset columns)
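
A possible way to produce folder suggestions from a partially typed path is sketched below. The function suggest_paths is hypothetical; it relies on file.path(), dirname() and basename() so that path handling is left to R rather than done by manual string splitting. For simplicity, it does not treat a trailing separator specially.

```r
# Hypothetical helper: suggest folders that complete a partially typed path
suggest_paths <- function(typed) {
  parent <- dirname(typed)   # folder typed so far
  prefix <- basename(typed)  # incomplete name being typed
  if (!dir.exists(parent)) return(character(0))
  # Only folders are listed (no files), without descending into subfolders
  dirs <- list.dirs(parent, full.names = FALSE, recursive = FALSE)
  file.path(parent, dirs[startsWith(dirs, prefix)])
}

# Example: folders in the session's temporary directory starting with "al"
print(suggest_paths(file.path(tempdir(), "al")))
```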

Calculate inclusion levels for all event types

  • Exon skipping
  • Mutually exclusive exons
  • Intron retention (not available using only junction read counts)
  • Alternative 3' splice site
  • Alternative 5' splice site
  • Alternative first exon
  • Alternative last exon
  • Tandem UTR

Only show groups when groups are needed

Instead of having a tab to edit groups, it would be best to allow group editing only when asking a user for groups. This way, it would be much more intuitive to understand what the groups are for with no more explanation (in the current method, people need to understand what the groups are for before doing analysis).

Add Wilcoxon test warnings in user interface

The Wilcoxon test prints a warning in the console when it cannot compute exact p-values because of zeroes or ties. These warnings should be shown in the user interface instead.

  • Show warnings in the interface
  • Describe the error
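
One way to collect such warnings so that they can later be displayed in the interface is to intercept them with withCallingHandlers(). The helper capture_warnings below is a hypothetical sketch; how the collected messages are rendered in the interface is left out.

```r
# Hypothetical helper: evaluate an expression and collect its warnings
capture_warnings <- function(expr) {
  warnings <- character(0)
  value <- withCallingHandlers(
    expr,
    warning = function(w) {
      # Store the message and stop the warning from reaching the console
      warnings <<- c(warnings, conditionMessage(w))
      invokeRestart("muffleWarning")
    }
  )
  list(value = value, warnings = warnings)
}

# Data with ties: depending on the R version, wilcox.test() may warn here
res <- capture_warnings(wilcox.test(c(1, 2, 2, 3), c(2, 3, 4, 4)))
print(res$value$p.value)
print(res$warnings)  # messages that could be shown to the user
```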

Allow (un)selecting all checkboxes in groups

The groups should be easily (un)selectable using a master checkbox (which already exists but currently does nothing). This could be solved by calling the JavaScript function after the checkbox is generated (using a DataTables callback).

Easy to do it with the DT R package, see #151.

Add heatmap

  • Heatmap
  • Allow to plot heatmap with the points of other analyses

Uniform colour for each group

This could be achieved by selecting a colour for each created group in the groups section. The colours chosen there would then be used in all plots when representing that same group.

This may prove somewhat problematic if there are many groups.
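
A simple way to keep colours consistent is to store them in a vector named after the groups and look them up whenever a plot is drawn. The sketch below is an assumption about how this could work; assign_group_colours is a hypothetical helper that generates default colours with hcl.colors(), which the user could then override.

```r
# Hypothetical helper: one default colour per group, named by group
assign_group_colours <- function(group_names) {
  colours <- grDevices::hcl.colors(length(group_names), palette = "Dark 3")
  setNames(colours, group_names)
}

group_colours <- assign_group_colours(c("normal", "tumour", "metastasis"))

# Any plot can look up the same colour by group name
sample_groups <- c("tumour", "normal", "tumour", "metastasis")
point_colours <- group_colours[sample_groups]
print(point_colours)
```

With many groups, neighbouring colours in the palette become harder to tell apart, which is the limitation mentioned above.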

Introduce parallel programming in slow loops

The number of cores to use could default to 1 (for safety reasons) or maybe use 2 or 4 depending on cores available… is it possible to check available cores in R?

This should be a setting the user could modify (see #80).
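
Available cores can be checked with detectCores() from the parallel package that ships with R. The sketch below picks a conservative default; the default_cores helper and the cap of 4 cores are hypothetical choices, not an existing psichomics setting.

```r
library(parallel)

# Hypothetical helper: choose a conservative default number of cores
default_cores <- function(max_cores = 4L) {
  available <- detectCores()
  if (is.na(available)) return(1L)  # unknown: fall back to a single core
  # Leave one core free and never exceed the cap
  max(1L, min(as.integer(max_cores), available - 1L))
}

cores <- default_cores()

# mclapply() relies on forking, which is unavailable on Windows
workers <- if (.Platform$OS.type == "windows") 1L else cores
squares <- mclapply(1:4, function(i) i^2, mc.cores = workers)
print(unlist(squares))
```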

Show table of splicing events with statistical tests

It can be a bit slow to calculate all statistical tests for the events. For this reason, this action should probably be initiated by the user.

The rank/sort feature for the user is easy if this is done in a Data Table.
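
As an illustration of what such a table could contain, the sketch below tests every event in a simulated PSI matrix and returns a data frame sorted by p-value, ready to be shown in a sortable table. The function test_events, the simulated data and the choice of the Wilcoxon test are assumptions made for this example.

```r
set.seed(7)

# Simulated PSI matrix: rows are splicing events, columns are samples
n_events <- 5
psi <- matrix(
  runif(n_events * 12), nrow = n_events,
  dimnames = list(paste0("event", 1:n_events), paste0("sample", 1:12))
)
group <- rep(c("A", "B"), each = 6)

# Hypothetical function, meant to run only when the user requests it
test_events <- function(psi, group) {
  is_a <- group == unique(group)[1]
  p_value <- apply(psi, 1, function(x) wilcox.test(x[is_a], x[!is_a])$p.value)
  delta_median <- apply(psi, 1, function(x) median(x[!is_a]) - median(x[is_a]))
  stats <- data.frame(
    event        = rownames(psi),
    delta_median = delta_median,
    p_value      = p_value,
    p_adjusted   = p.adjust(p_value, method = "BH"),
    row.names    = NULL
  )
  stats[order(stats$p_value), ]  # most significant events first
}

stats <- test_events(psi, group)
print(stats)
```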

Limit enclosing box of text complete suggestions and adjust placement

Currently, this box becomes too large when the elements within it are large and may extend outside the browser window, which prevents the user from reading the suggestions.

If resizing the box is not possible, an alternative is to at least keep the left side of the box inside the browser window.

Save loaded files as individual RDS files

This will allow faster loading of files next time the program opens. By using the exact same name (excluding extension), it'd be easy to compare which files have RDS equivalents or not.
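
A minimal sketch of this caching idea is shown below. The function load_with_cache is hypothetical: it stores an RDS file next to the original using the same name without the extension, and reuses it on later calls as long as it is not older than the source file.

```r
# Hypothetical helper: load a file, caching the parsed result as RDS
load_with_cache <- function(path, reader = read.delim) {
  rds <- paste0(tools::file_path_sans_ext(path), ".rds")
  # Reuse the cache only if it is at least as recent as the source file
  if (file.exists(rds) && file.mtime(rds) >= file.mtime(path)) {
    return(readRDS(rds))
  }
  data <- reader(path)
  saveRDS(data, rds)
  data
}

# Example with a small tab-delimited file in a temporary location
src <- tempfile(fileext = ".tsv")
write.table(data.frame(gene = c("A", "B"), count = c(10, 20)), src,
            sep = "\t", row.names = FALSE, quote = FALSE)

first  <- load_with_cache(src)  # parses the file and writes the RDS cache
second <- load_with_cache(src)  # reads the RDS cache
print(identical(first, second))
```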

Check if folder exists in the folder fields

Right now, it only tries to get what's inside the path without giving any error.

  • Check if path is not to a file (else warn the user) (2f71fa1)
  • Check if folder exists (else warn the user) (2f71fa1)
  • Only suggest folders in suggestions instead of showing both folders and files (43e9a90)
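
The first two checks can be expressed with dir.exists() and file.exists() from base R. The function check_folder below is a hypothetical sketch that returns a message to show to the user; it is not the code from the commits referenced above.

```r
# Hypothetical helper: validate the path typed in a folder field
check_folder <- function(path) {
  if (dir.exists(path)) {
    list(ok = TRUE, message = "")
  } else if (file.exists(path)) {
    list(ok = FALSE, message = "The path points to a file, not a folder.")
  } else {
    list(ok = FALSE, message = "The folder does not exist.")
  }
}

# Example: the session's temporary directory versus a file inside it
a_file <- tempfile(fileext = ".txt")
file.create(a_file)
print(check_folder(tempdir())$ok)
print(check_folder(a_file)$message)
```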

Create clusters in PCA plot and use those clusters as data groups

These clusters could then be used in survival curves and other analyses.

  • Separate based on user-created line (maybe validated with Fisher's exact test)
  • Separate by k-means / PAM / CLARA
  • Assist methods by showing optimal number of clusters (using total within sum of squares, silhouette width and gap statistics)
  • Validate with cluster validity methods
  • Plot clusters in PCA using the polygon series
  • Create groups based on clusters
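
The k-means route could look roughly like the sketch below, which runs PCA on a simulated PSI matrix, clusters the samples on the first two principal components and turns each cluster into a group of sample identifiers. The data, the number of clusters and the variable names are all assumptions made for this illustration.

```r
set.seed(1)

# Simulated PSI matrix: rows are samples, columns are splicing events
psi <- matrix(
  runif(200), nrow = 20,
  dimnames = list(paste0("sample", 1:20), paste0("event", 1:10))
)

# PCA on centred data, keeping the first two principal components
pca <- prcomp(psi, center = TRUE, scale. = FALSE)
scores <- pca$x[, 1:2]

# k-means with 2 clusters (assumed) and several random starts
km <- kmeans(scores, centers = 2, nstart = 10)

# One group of sample identifiers per cluster
groups <- split(rownames(scores), km$cluster)
print(lengths(groups))

# Total within-cluster sum of squares, one criterion for choosing k
print(km$tot.withinss)
```

The resulting groups list could then be used like any other data grouping, for example in survival curves.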

See also cluster analysis in R (1 and 2) and cluster validation using clValid
