... adding Bayesian flavour to peptide quantification
◾Bayesian inference ◾Peptide quantification ◾Mass spectrometry ◾Proteomics
The mass spectrometry ion peak area of a peptide is linearly related to its absolute amount through a proportionality constant called the conversion factor [4]. QPuB (Quantification of Peptides using Bayesian inference) employs Bayesian statistical inference based on Markov chain Monte Carlo (MCMC) sampling to learn the posterior distributions of the conversion factors for the peptide products without further experimentation.
The following instructions will help you download QPuB and run it to estimate the conversion factors for your peptide products.
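As a minimal numerical sketch of this linear relation, the toy example below assumes the form signal = conversion factor × amount. The direction of the relation, the variable names and all numbers are illustrative assumptions, not values or code taken from QPuB.

```r
# Toy illustration only: assumes signal = conversion_factor * amount.
conversion_factor <- 2.5e6          # hypothetical signal units per pmol
amount_pmol <- c(0, 1, 2, 4)        # hypothetical absolute amounts

# Forward direction: the peak area one would measure for each amount.
signal <- conversion_factor * amount_pmol

# Inverse direction: once the conversion factor is known,
# absolute amounts follow directly from the measured signal.
recovered_pmol <- signal / conversion_factor
print(data.frame(amount_pmol, signal, recovered_pmol))
```

In practice the conversion factor is unknown and differs between peptides, which is why QPuB infers a posterior distribution for each one rather than a single fixed value.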
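In order to run QPuB, users must install R ≥ 3.5.0 and the following R packages: R.utils, tictoc, mcmcse, mvtnorm, tmvtnorm, corpcor, dqrng, coda, matrixcalc, ggdmc, sys, and matrixStats.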
To install the above packages one by one, start your R shell, and execute the following command:
install.packages("package_name")
where package_name should be replaced by the name of each package listed above. Alternatively, one can copy and paste the following command into the R terminal
install.packages(c("R.utils", "tictoc", "mcmcse", "mvtnorm", "tmvtnorm", "corpcor", "dqrng", "coda", "matrixcalc", "ggdmc", "sys", "matrixStats"))
and press enter.
Note that on Linux, the above command might require root access.
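To avoid reinstalling packages that are already present, one could first check which of them are missing. The sketch below uses only base R; `missing_packages` is a hypothetical helper name, not part of QPuB, and the actual install call is left commented out.

```r
# Packages named in the install command above.
required <- c("R.utils", "tictoc", "mcmcse", "mvtnorm", "tmvtnorm", "corpcor",
              "dqrng", "coda", "matrixcalc", "ggdmc", "sys", "matrixStats")

# Hypothetical helper: return the required packages that are not yet installed.
missing_packages <- function(required,
                             installed = rownames(installed.packages())) {
  setdiff(required, installed)
}

to_install <- missing_packages(required)
if (length(to_install) > 0) {
  message("Missing packages: ", paste(to_install, collapse = ", "))
  # Uncomment to install them:
  # install.packages(to_install)
} else {
  message("All required packages are installed.")
}
```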
Once all the prerequisites are met, QPuB needs to be downloaded from its GitHub repository (https://github.com/QuantSysBio/QPuB). Download the .zip file and unzip it. QPuB is designed to be invoked from the command line. It can also be run from a graphical user interface such as RStudio, with minor modifications to the code. See Running QPuB from RStudio below.
In order to run QPuB on your data from the terminal, follow these steps:
1. Make sure everything is properly installed on your computer (see Chapter 3 of the Documentation).
2. Make sure the data files and the input folder have the right structure (see Sections 2.3.1 and 2.3.2 of the Documentation for more details).
3. Open the terminal/command prompt (How to: Linux, Mac, Windows).
4. Navigate to the directory from which you want to run QPuB. For example, if the name of the directory is "ABC", the command takes the following generic form:
$ cd <path to the ABC folder>
For more details on how to navigate to a particular directory using the cd command, see here.
5. Once you are in your working directory (the directory from which you wish to run QPuB), say "ABC", the command to run QPuB has the following generic form:
$ Rscript <path to runQPuB.r relative to the folder ABC> -infol <path to input_folder> -outfol <output_folder> -titr <titration_file>
NOTE: The commands above only show the generic structure and must not be copy-pasted directly into the terminal.
For execution, each field enclosed within <>, including the <> symbols themselves, must be replaced by the appropriate folder or file name along with its relative path (if required), as explained in the Examples below. For the output folder, provide only the name of the folder, not a path. The same applies to the titration data file. Note that, for now, the ./ and ../ notations do not work; please use the full path. Also avoid leading and trailing slashes.
The table below describes the meaning of the command line arguments:
Flag | Meaning |
---|---|
-infol | flag for the input_folder. This input is mandatory. |
-outfol | flag for the output_folder. This input is optional, and the name can be defined by the user. |
-titr | flag for the titration_file. It is a csv-file. This input is optional. |
Flags can also be specified using -- as prefix, for example --infol=inputfolder. If you do not wish to provide the output folder and/or titration data, please do not type the respective flags. Please follow the examples below for illustrations.
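The path rules above can be checked before launching a run. The following sketch encodes three of them: no ./ or ../ segments and no trailing slash in the input folder, and bare names without a path for the output folder and the titration file. The helpers `check_qpub_args` and `qpub_args` are hypothetical and not part of QPuB, and the check assumes forward slashes as path separators.

```r
# Hypothetical helper (not part of QPuB): list violations of the path rules.
check_qpub_args <- function(infol, outfol = NULL, titr = NULL) {
  problems <- character(0)
  # A path segment that is exactly "." or ".." is reported not to work.
  if (grepl("(^|/)\\.{1,2}(/|$)", infol)) {
    problems <- c(problems, "infol: avoid ./ and ../ notation")
  }
  if (grepl("/$", infol)) {
    problems <- c(problems, "infol: avoid a trailing slash")
  }
  # Output folder and titration file are given by name only, without a path.
  if (!is.null(outfol) && grepl("/", outfol, fixed = TRUE)) {
    problems <- c(problems, "outfol: give the folder name only, not a path")
  }
  if (!is.null(titr) && grepl("/", titr, fixed = TRUE)) {
    problems <- c(problems, "titr: give the file name only, not a path")
  }
  problems
}

# Hypothetical helper: assemble the arguments that follow `Rscript`.
# Optional flags are omitted entirely when no value is supplied.
qpub_args <- function(script, infol, outfol = NULL, titr = NULL) {
  problems <- check_qpub_args(infol, outfol, titr)
  if (length(problems) > 0) stop(paste(problems, collapse = "; "))
  c(script, "-infol", infol,
    if (!is.null(outfol)) c("-outfol", outfol),
    if (!is.null(titr)) c("-titr", titr))
}

args <- qpub_args("QPuB/runQPuB.r", "examples/gp100_40-52_proteasome",
                  titr = "190423_K386_titration_substrate_charge_3.csv")
print(args)
# To launch the run from within R, one could then call: system2("Rscript", args)
```

Returning the list of problems, instead of stopping at the first one, makes it possible to report every issue with a command in a single pass.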
The examples folder contains two toy examples of endopeptidase digestion, one toy example of exopeptidase digestion, and a real dataset from the digestion of gp100 40-52 by 26S proteasomes. In order to run these examples, execute the following commands in the terminal, assuming your current directory is the QPuB-master folder:
Rscript ./QPuB/runQPuB.r -infol examples/insilico_endopep_nonoise
Rscript ./QPuB/runQPuB.r -infol examples/insilico_endopep_noise
Rscript ./QPuB/runQPuB.r -infol examples/insilico_exopep_noise
Rscript ./QPuB/runQPuB.r -infol examples/gp100_40-52_proteasome -titr 190423_K386_titration_substrate_charge_3.csv
Running QPuB from RStudio
One can execute QPuB from RStudio, but in that case one first needs to open the main R script runQPuB.r and comment out the following code snippet at the top of that script:
args = commandArgs(trailingOnly=FALSE, asValue=TRUE)
keys <- attachLocally(args)
if (!exists('infol')){
stop("This is the QPuB package. \n
At least one argument must be supplied: -infol.\n
-infol inputfolder (COMPULSORY): folder containing all of the following input files \n AND folder 'data' with at least one csv file with input data. \n
-outfol outputfolder (OPTIONAL): if not provided, default name will be 'OUTPUT_inputfolder(i)'. \n
-titr titrationfile (OPTIONAL): csv file with titration data \n
For documentation see: https://github.com/QuantSysBio/QPuB.\n", call.=FALSE)
}
and uncomment the following snippet, editing the variables according to the desired example. For Prot_K386 with an output folder my_fancy_name, the snippet looks like the following:
# setwd("...")
# file <- 'runQPuB.r'
# infol <- 'examples/Prot_K386'
# outfol <- 'my_fancy_name'
# titr <- '190423_K386_titration_substrate_charge_3_K386.csv'
However, in order to avoid complications (especially crashing of RStudio for large datasets), users are strongly recommended to use the command line as described above.
QPuB generates the following set of output files and folders as the algorithm proceeds:
File/Folder | Description |
---|---|
runQPuB_ROUT.txt | Progress report of the algorithm. The output of every print command directly goes into this file. |
plots_inputsignals | Folder containing the kinetic plots of the input signal intensities for all peptide products |
peptide_i.png | Kinetic plot of the input signal intensities for peptide i. Located in the plots_inputsignals folder. |
identifier.csv | Table of peptide numbers, sequences and position codes |
plots_diagnostics | Folder containing the diagnostic plots of the chain and the residuals |
chain_j.pdf | Trace plots and distributions of the Markov chain at the jth iteration. Located in the plots_diagnostics folder. |
residuals_j.txt | Residual plots at the jth iteration. Located in the plots_diagnostics folder. |
chain.RData | Markov chain: time series of all parameters |
chain_backscaled.RData | Markov chain after backscaling |
statistics.csv | Summary statistics of the backscaled Markov chain |
statistics.pdf | Graphical summary statistics of the backscaled conversion factors for all peptide products |
substratetitration_given.png | Plot of the titration data and the linear fit. Only if titration provided. |
substratetitration_normalized.png | Plot of the normalised titration data and the linear fits. Only if titration provided. |
concentrations | Folder containing the numerical values of the absolute amounts (if titration provided) or normalised signals (if not provided) |
conc_five_rp.csv, conc_median_rp.csv, conc_ninetyfive_rp.csv | 0.05, 0.5 and 0.95 quantiles of the concentration kinetics of all products for the rpth replicate. Located in the concentrations folder. |
plots_concentrations | Folder containing the kinetic plots of the absolute amounts (if titration provided) or normalised signals (if not provided) |
peptide_i.png | Kinetic plot of the absolute amounts or normalised signal for peptide i. Located in the plots_concentrations folder. |
The outputs generated while running the examples are illustrated in Chapter 5 of the Documentation.
Below is a non-exhaustive list of issues that the user might run into while running QPuB. Users are strongly recommended to keep track of runQPuB_ROUT.txt from the very beginning of the run, as all outputs, including warnings about erroneous input, are printed there.
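Once a run has finished, the summary table and the backscaled chain can be loaded into R for further inspection. The sketch below assumes that statistics.csv and chain_backscaled.RData sit directly inside the output folder, and it makes no assumption about the column names or the names of the objects stored in the .RData file. `read_qpub_outputs` is a hypothetical helper, not part of QPuB.

```r
# Hypothetical helper: read the summary statistics and the backscaled chain
# from an output folder. Files that do not exist are returned as NULL.
read_qpub_outputs <- function(outfol) {
  stats_file <- file.path(outfol, "statistics.csv")
  chain_file <- file.path(outfol, "chain_backscaled.RData")
  out <- list(statistics = NULL, chain_objects = NULL)
  if (file.exists(stats_file)) {
    out$statistics <- read.csv(stats_file)
  }
  if (file.exists(chain_file)) {
    # Load into a separate environment so nothing in the workspace is overwritten.
    env <- new.env()
    load(chain_file, envir = env)
    out$chain_objects <- as.list(env)
  }
  out
}

# Usage, with "my_fancy_name" standing in for your own output folder:
res <- read_qpub_outputs("my_fancy_name")
str(res, max.level = 1)
```

Loading the .RData file into its own environment also makes it easy to see which objects the file contains, via `names(res$chain_objects)`.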
The user must make sure that the installed version of R is 3.5.0 or above. If the user has multiple installations of R, the path to the R executable corresponding to the version required by QPuB should be provided. For Windows, please click here for more details on how to include the desired R executable in the PATH variable.
Care must be taken while writing the command line to run QPuB. For example, the following type of command will generate an error:
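A quick way to see which R installation is actually in use is to query it from within R. The sketch below uses only base R functions; on Windows the executable carries an additional .exe extension.

```r
# Version and location of the R installation that is running this code.
min_version <- "3.5.0"
version_ok <- getRversion() >= min_version
rscript_path <- file.path(R.home("bin"), "Rscript")

message("R version: ", as.character(getRversion()),
        if (version_ok) " (OK)" else " (too old for QPuB)")
message("Rscript executable: ", rscript_path)
```

Running the same lines through `Rscript -e` in the terminal shows which installation the command line picks up, which may differ from the one used by RStudio.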
$ Rscript path_to_runQPuB.r -infol ../examples/example_folder
Instead, the right command type is
$ Rscript path_to_runQPuB.r -infol examples/example_folder
The same rule applies when including the titration file. For example, assuming the example folder is named P and the corresponding titration file is named titration_P, the following command
$ Rscript path_to_runQPuB.r -infol examples/P -titr examples/P/titration_P.csv
will generate an error, while the command
$ Rscript path_to_runQPuB.r -infol examples/P -titr titration_P.csv
will work nicely.
It may happen that the Markov chain does not explore the parameter space. There can be several reasons for that. For example, the value of sigma_start may be so low that the chain stops exploring the parameter space. At the same time, if the value of GammaExponent is high, the adaptation vanishes quickly, so the covariance matrix remains stationary and leaves the chain stuck in an undesired region. One solution is to switch off the adaptation at the beginning to allow sufficient exploration, and to resume it after visually inspecting the chain. It may take several trial runs to determine the number of iterations needed before starting the adaptation. Remember that this problem is different from the one where the chain needs to adapt from the beginning to reach the target covariance matrix; in that case, it may take millions of iterations before the chain reaches the target region, especially in higher dimensions. Overall, depending on the problem, users are recommended to experiment with the parameters controlling the adaptation scheme and the Markov chain. In [3], the author provides an excellent summary of various pathological cases regarding the adaptive scheme and optimal scaling for MCMC.
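To make the interplay between the initial proposal width, the vanishing adaptation and a delayed adaptation start more concrete, here is a minimal sketch of a generic adaptive random-walk Metropolis sampler in base R. It is not QPuB's implementation: the function `adaptive_metropolis`, the argument `adapt_start`, the step size (n - adapt_start + 1)^(-gamma_exponent) and the toy target are all assumptions made for illustration, and `sigma_start` and `gamma_exponent` only borrow their names from the parameters discussed above.

```r
# Illustrative adaptive random-walk Metropolis sampler (not QPuB's own code).
adaptive_metropolis <- function(log_target, x0, n_iter,
                                sigma_start = 0.1,
                                gamma_exponent = 0.6,
                                adapt_start = 500) {
  d <- length(x0)
  chain <- matrix(NA_real_, nrow = n_iter, ncol = d)
  x <- x0
  lp <- log_target(x)
  mu <- x0                          # running estimate of the mean
  Sigma <- diag(sigma_start^2, d)   # running estimate of the covariance
  accepted <- 0
  for (n in seq_len(n_iter)) {
    # Gaussian proposal scaled by 2.38^2 / d; a small jitter keeps it
    # positive definite for the Cholesky factorisation.
    prop_cov <- (2.38^2 / d) * Sigma + diag(1e-8, d)
    y <- x + as.vector(t(chol(prop_cov)) %*% rnorm(d))
    lp_y <- log_target(y)
    if (log(runif(1)) < lp_y - lp) {   # Metropolis acceptance step
      x <- y
      lp <- lp_y
      accepted <- accepted + 1
    }
    chain[n, ] <- x
    # Before adapt_start the proposal stays fixed at its initial width.
    # Afterwards the step size gamma_n shrinks with n (vanishing adaptation);
    # a larger gamma_exponent freezes the covariance sooner.
    if (n > adapt_start) {
      gamma_n <- (n - adapt_start + 1)^(-gamma_exponent)
      delta <- x - mu
      mu <- mu + gamma_n * delta
      Sigma <- Sigma + gamma_n * (tcrossprod(delta) - Sigma)
    }
  }
  list(chain = chain, acceptance_rate = accepted / n_iter)
}

# Toy target: zero-mean bivariate normal with correlation 0.8.
S_inv <- solve(matrix(c(1, 0.8, 0.8, 1), nrow = 2))
log_target <- function(x) -0.5 * sum(x * (S_inv %*% x))

set.seed(1)
res <- adaptive_metropolis(log_target, x0 = c(3, -3), n_iter = 2000)
print(res$acceptance_rate)
```

Rerunning the toy example with a very small `sigma_start`, a large `gamma_exponent` or a different `adapt_start`, and then plotting the columns of `res$chain`, gives a feel for how these settings change the exploration behaviour.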
Documentation explaining the details of QPuB is available as a PDF.
[1] Henze, S., Paul, D., Mansurkhodzhaev, A., Henklein, P., Textoris-Taube, K., Urlaub, H., Mishto, M., and Liepe, J. (2019). Quantification of in vitro peptide hydrolysis and protease-catalyzed peptide splicing using Bayesian inference. Under review.
[2] Henze, S.*, Paul, D.*, Mishto, M., and Liepe, J. (2019). QPuB - Quantification of Peptides using Bayesian inference. In preparation. *Equal contributions.
[3] Rosenthal, J. S. (2011). Optimal proposal distributions and adaptive MCMC. Handbook of Markov Chain Monte Carlo, 4(10.1201).
[4] Peters, B., Janek, K., Kuckelkorn, U., Holzhütter, H.-G. (2002). Assessment of proteasomal cleavage probabilities from kinetic analysis of time-dependent product formation, Journal of molecular biology, 318(3), 847-862.
This project is licensed under the GNU General Public License v3.0 - see the LICENSE.md file for details.