pquant's Introduction

pquant

pquantR is a Python and R package for downstream analysis of proteomicsLFQ quantitative data. It also includes an R Shiny application designed for downstream analysis of proteomics datasets. The following figures are currently included: heatmap, volcano plot, and QC plot.

Because the application is still under development, the test figures are currently drawn separately by test R packages.

Installation of environment

The build.sh bash script uses conda and mamba to create the environment and install all the Python and R dependencies and packages.

Prerequisites:

After installing conda and mamba, you can run the following script:

$> source build.sh

sample datasets

Shiny application

  1. Prepare for the Shiny app.
  • Download the pquantR folder from this page and put it in a suitable working path.
  2. Run app.R in the folder. You can run it in the following ways.
  • Upload the data, choose the MSstats parameters, then submit.
  3. View the visualizations of the processed data and the differentially abundant proteins.

Todo list

  1. Add a download function supporting PDF, PNG, and other formats.
  2. Add other interactive plots.

pquant's People

Contributors

daichengxin, douerww, enriquea, ypriverol

pquant's Issues

Methods to implement.

The shiny application should implement three major methods:

These are the three fundamental methods that we will use for downstream analysis of LFQ and TMT data. The user should be able to choose one or the other from the Shiny application. Major challenge: Triqler is a Python application, which means that Shiny will not perform the analysis itself but will only be able to visualize the results. This is the main reason why it should be left as the last option.

First release of pquantr

The following issue tracks all the pending tasks before the formal release of the pquantr Shiny application:

  • Fix the current issues #44
  • Decide whether Proteus will be implemented in the package as another alternative for downstream analysis #71

Export results to EA file format

@enriquea @Douerww :

For every pipeline:

  • MSstats
  • Proteus
  • Triqler

We should output a file with the following structure:

https://github.com/bigbio/pquant/blob/dev/output/output_format/E-PROT-39-DE/E-PROT-39-analytics.tsv

In that file the columns have the following meaning:

  • The first column can be Gene ID or Protein ID, depending on the pipeline.
  • Gene Name, if available.
  • g4_g3.p-value: the p-value for the comparison between g4 and g3.
  • g4_g3.log2foldchange: the log2 fold change for the comparison between g4 and g3.
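
As a rough illustration of that column layout, the following Python sketch writes such a tab-separated file. It is only a sketch under stated assumptions: the header labels "Gene ID", "Protein ID" and "Gene Name", the function names, and the in-memory `rows` structure are hypothetical and are not taken from the real E-PROT-39-analytics.tsv file. The values are placeholders, not results.

```python
import csv
import io


def analytics_header(contrast_ids, id_label="Gene ID"):
    """Return the header row: ID, gene name, then two columns per contrast."""
    header = [id_label, "Gene Name"]
    for contrast_id in contrast_ids:
        header.append(f"{contrast_id}.p-value")
        header.append(f"{contrast_id}.log2foldchange")
    return header


def write_analytics(handle, contrast_ids, rows, id_label="Gene ID"):
    """Write one tab-separated line per protein or gene.

    rows is a hypothetical in-memory structure: a list of dicts with
    'id', an optional 'name', and 'stats' mapping a contrast id to
    a (p_value, log2_fold_change) tuple.
    """
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(analytics_header(contrast_ids, id_label))
    for row in rows:
        line = [row["id"], row.get("name", "")]
        for contrast_id in contrast_ids:
            # Leave both cells empty when a pipeline reports nothing for a contrast.
            p_value, log2fc = row["stats"].get(contrast_id, ("", ""))
            line.extend([p_value, log2fc])
        writer.writerow(line)


# Placeholder values for illustration only, not real results.
example_rows = [
    {"id": "PROTEIN_1", "name": "GENE_1", "stats": {"g4_g3": (0.01, 1.5)}},
]
buffer = io.StringIO()
write_analytics(buffer, ["g2_g1", "g4_g3"], example_rows, id_label="Protein ID")
print(buffer.getvalue())
```

Keeping the header construction in one function means that every pipeline (MSstats, Proteus, Triqler) would produce the same column order for the same list of contrasts.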

g1, g2, g3, and g4 are groups of samples from the SDRF. They are defined here:

https://github.com/bigbio/pquant/blob/dev/output/output_format/E-PROT-39-DE/E-PROT-39-configuration.xml

In the configuration file we have the following:

<assay_groups>
            <assay_group id="g1" label="octogenarian age bracket; Alzheimer's disease">
                <assay>4.AD_B</assay>
                <assay>4.AD_A</assay>
                <assay>4.AD_C</assay>
            </assay_group>
            <assay_group id="g2" label="octogenarian age bracket; normal">
                <assay>3.AD_B</assay>
                <assay>3.AD_C</assay>
                <assay>3.AD_A</assay>
            </assay_group>
            <assay_group id="g3" label="sexagenarian age bracket; Alzheimer's disease">
                <assay>2.AD_B</assay>
                <assay>2.AD_A</assay>
                <assay>2.AD_C</assay>
            </assay_group>
            <assay_group id="g4" label="sexagenarian age bracket; normal">
                <assay>1.AD_B</assay>
                <assay>1.AD_A</assay>
                <assay>1.AD_C</assay>
            </assay_group>
        </assay_groups>
        <contrasts>
            <contrast id="g2_g1" cttv_primary="1">
                <name>'Alzheimer's disease' vs 'normal' in 'octogenarian age bracket'</name>
                <reference_assay_group>g2</reference_assay_group>
                <test_assay_group>g1</test_assay_group>
            </contrast>
            <contrast id="g4_g3" cttv_primary="1">
                <name>'Alzheimer's disease' vs 'normal' in 'sexagenarian age bracket'</name>
                <reference_assay_group>g4</reference_assay_group>
                <test_assay_group>g3</test_assay_group>
            </contrast>
        </contrasts> 

The ids g1, g2, g3, and g4 are a simple way to refer to a group condition without spelling out the full condition. Contrasts are the comparisons between two groups, and the contrast id (for example g4_g3) determines how the corresponding columns are named in https://github.com/bigbio/pquant/blob/dev/output/output_format/E-PROT-39-DE/E-PROT-39-analytics.tsv
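
To make the link between the configuration file and the analytics columns concrete, here is a minimal Python sketch that reads the assay groups and contrasts and derives the column names for each contrast. The `<configuration>` root element, the `read_contrasts` function, and the shape of its return value are assumptions made for this example. The fragment above does not show the real root element, so the code searches the whole tree instead of relying on it.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the fragment above, wrapped in an assumed root element.
CONFIG_SNIPPET = """
<configuration>
  <assay_groups>
    <assay_group id="g3" label="sexagenarian age bracket; Alzheimer's disease">
      <assay>2.AD_B</assay>
      <assay>2.AD_A</assay>
      <assay>2.AD_C</assay>
    </assay_group>
    <assay_group id="g4" label="sexagenarian age bracket; normal">
      <assay>1.AD_B</assay>
      <assay>1.AD_A</assay>
      <assay>1.AD_C</assay>
    </assay_group>
  </assay_groups>
  <contrasts>
    <contrast id="g4_g3" cttv_primary="1">
      <name>'Alzheimer's disease' vs 'normal' in 'sexagenarian age bracket'</name>
      <reference_assay_group>g4</reference_assay_group>
      <test_assay_group>g3</test_assay_group>
    </contrast>
  </contrasts>
</configuration>
"""


def read_contrasts(xml_text):
    """Return one dict per contrast with its assays and analytics column names."""
    root = ET.fromstring(xml_text.strip())
    # Map each group id (g1, g2, ...) to the list of assays it contains.
    groups = {
        group.get("id"): [assay.text for assay in group.findall("assay")]
        for group in root.iter("assay_group")
    }
    contrasts = []
    for contrast in root.iter("contrast"):
        contrast_id = contrast.get("id")
        reference = contrast.findtext("reference_assay_group")
        test = contrast.findtext("test_assay_group")
        contrasts.append({
            "id": contrast_id,
            "reference_assays": groups[reference],
            "test_assays": groups[test],
            "columns": [f"{contrast_id}.p-value", f"{contrast_id}.log2foldchange"],
        })
    return contrasts


for item in read_contrasts(CONFIG_SNIPPET):
    print(item["id"], item["columns"])
```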

Implement package structure

We need to implement a package structure for the project. This will organize the following things:

  • Organize dependencies.
  • Organize the structure of the data, the code, and the sample data.
  • Organize the analysis of the project into two types: TMT and label-free.
  • Organize the data structure for visualization purposes.

Need some kind of progressbar when processing is happening

Douer:

We need some kind of progress bar while the MSstats/Proteus processing is running. Otherwise the user does not know what is going on and may keep clicking around the interface without realizing that a process is running.

Proteus implementation for TMT and LabelFree

Currently, the main method in the application for downstream analysis is MSstats. However, Proteus, a limma-based package for TMT and LFQ data, can also be used in pquantr. Before implementing it in the package, the following tasks should be done:

  • Benchmark Proteus and MSstats for TMT and LFQ data. Explore the differences.
  • If Proteus offers better results, implement Proteus as an alternative downstream analysis package.
  • Release a new version of the tool with both packages.
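
One possible starting point for the benchmark is sketched below in Python, assuming both tools' results have already been exported to the shared analytics TSV layout (first column an ID, then `<contrast>.p-value` and `<contrast>.log2foldchange` columns). The function names, the 0.05 significance threshold, and the choice of summary numbers are assumptions for illustration, not an agreed benchmark protocol. The input values are placeholders.

```python
import csv
import io


def load_results(handle, contrast_id):
    """Map each ID in the first column to (p_value, log2fc) for one contrast."""
    reader = csv.DictReader(handle, delimiter="\t")
    id_column = reader.fieldnames[0]
    results = {}
    for row in reader:
        p_value = row.get(f"{contrast_id}.p-value")
        log2fc = row.get(f"{contrast_id}.log2foldchange")
        if p_value and log2fc:  # skip proteins without a value for this contrast
            results[row[id_column]] = (float(p_value), float(log2fc))
    return results


def compare(results_a, results_b, alpha=0.05):
    """Summarize agreement between two tools on the proteins both report."""
    shared = sorted(set(results_a) & set(results_b))
    significant_a = {key for key in shared if results_a[key][0] < alpha}
    significant_b = {key for key in shared if results_b[key][0] < alpha}
    differences = [abs(results_a[key][1] - results_b[key][1]) for key in shared]
    return {
        "shared": len(shared),
        "significant_in_both": len(significant_a & significant_b),
        "significant_only_a": len(significant_a - significant_b),
        "significant_only_b": len(significant_b - significant_a),
        "mean_abs_log2fc_diff": sum(differences) / len(differences) if differences else None,
    }


# Placeholder tables for illustration only, not real MSstats or Proteus output.
TABLE_A = "Protein ID\tGene Name\tg4_g3.p-value\tg4_g3.log2foldchange\nP1\tG1\t0.01\t1.0\nP2\tG2\t0.5\t0.25\n"
TABLE_B = "Protein ID\tGene Name\tg4_g3.p-value\tg4_g3.log2foldchange\nP1\tG1\t0.02\t1.5\nP2\tG2\t0.01\t0.25\nP3\tG3\t0.2\t0.1\n"

summary = compare(
    load_results(io.StringIO(TABLE_A), "g4_g3"),
    load_results(io.StringIO(TABLE_B), "g4_g3"),
)
print(summary)
```

On a spike-in dataset such as UPS1, the same summary could be restricted to the spiked proteins to see which tool recovers them more reliably.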

Even if Proteus is not implemented after the benchmark, it would be great, @Douerww, to keep track of that benchmark for future decisions about the project and also for publication purposes.

Running triqler

I have run Triqler on the UPS1 dataset using the parameters suggested by @MatthewThe:

  • What happens if you set the minimum number of samples in which a peptide must be quantified to 2 (the default)? The example that @MatthewThe uses sets it to 6.
  • What happens if we use 0.5 as fold_change_eval? We want to load all the quantified proteins into a system and let the system run queries across values.
  • @MatthewThe, in order to compare between methods we have decided to use the following format (https://github.com/bigbio/pquant/blob/main/output/output_format/E-PROT-39-DE/E-PROT-39-analytics.tsv). I have seen that the output of Triqler is per comparison. I want to double-check with you, for each file, which columns correspond to g4_g3.p-value and g4_g3.log2foldchange.

Improvements and Bugs to be fixed in pquantr

@Douerww Some major improvements we should make:

  • Can we remove the heatmap generation that uses Python and use R instead? I think we should remove the Python call from the code.

  • I think the plots should be placed one after another instead of side by side.

  • I think we should try to avoid this:

    setwd("../data/")

    If the user provides a file, why do we need to change to the data directory? Shouldn't it be possible to read from whatever path the user gives? One solution would be to copy the file to the same location where the application is stored.

  • When the user selects a protein in the protein table, it should be possible to change the panels in the protein plots. I think that was the approach of the Proteus Shiny app.
