It is now quite clear that 100% digestion efficiency with trypsin should not be assumed in proteomics workflows. Inefficient trypsin digestion also poses a serious problem for absolute quantitation workflows that use isotopically labelled standards.
Isotopic standards are currently used as follows: the peptides to be quantified are synthesised in labelled form. A known amount of each labelled peptide is then spiked into the sample prior to its analysis by LC-MS. After acquisition, the amount of unlabelled peptide (and hence of its protein of origin) is computed as follows:
quantity_unlabelled = signal_unlabelled/signal_labelled * quantity_labelled
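As a minimal sketch, the formula above can be written as a small Python function. The function name and the numbers in the usage line are made up for illustration only; the result is in whatever unit the spiked-in amount is given in.

```python
def quantity_unlabelled(signal_unlabelled, signal_labelled, quantity_labelled):
    """Amount of unlabelled peptide, in the same unit as quantity_labelled."""
    if signal_labelled <= 0:
        raise ValueError("signal_labelled must be positive")
    return signal_unlabelled / signal_labelled * quantity_labelled


# Hypothetical values: 50 fmol of labelled peptide spiked in,
# unlabelled/labelled signal ratio of 0.4.
print(quantity_unlabelled(2.0e6, 5.0e6, 50.0))
```

Note that the calculation assumes the unlabelled peptide is released from the protein quantitatively, which is exactly the assumption questioned here.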
Consider quantitation of the following peptide: VTTYFPSVNLR. Below is a piece of protein sequence it originates from:
GNIR.VTTYFPSVNLR.KSSQK
Note that to release the peptide from the protein, digestion must occur after R. However, this R is followed by K, which is expected to result in two dead-end products:
VTTYFPSVNLR and VTTYFPSVNLRK
As a result, the amount of the VTTYFPSVNLR peptide is no longer proportional to the protein amount, and if absolute quantitation is performed using this peptide only, the amount of protein will be underestimated (a specific example of this happening is given in ref1).
The most obvious approach to counteract the problem is to ignore peptides like this. However, this is usually not possible, given that only a limited number of peptides suitable for quantitation is available per protein. Thus the best solution is to mimic the cleavage site by adding 3 amino acids before and after the peptide in the labelled standard.
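The two products named above can be enumerated with a short Python sketch. It is a deliberately simplified model, not how cleaver works: the function name `c_terminal_variants` is hypothetical, only consecutive K/R residues directly after the peptide are considered, and the proline rule is ignored.

```python
def c_terminal_variants(protein, peptide):
    """List the peptide plus every C-terminally extended form that arises
    when the peptide is directly followed by further K/R residues."""
    start = protein.find(peptide)
    if start < 0:
        raise ValueError("peptide not found in protein")
    end = start + len(peptide)
    variants = [peptide]
    # Each additional basic residue is an alternative cleavage site.
    while end < len(protein) and protein[end] in "KR":
        end += 1
        variants.append(protein[start:end])
    return variants


print(c_terminal_variants("GNIRVTTYFPSVNLRKSSQK", "VTTYFPSVNLR"))
```

A peptide for which this returns more than one entry is a candidate for the underestimation problem described above.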
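A minimal Python sketch of the 3 amino acid overhang idea follows. The function name `with_overhangs` and the parameter `n_flank` are assumptions made for illustration; the overhangs are simply copied from the protein sequence and truncated at the protein ends.

```python
def with_overhangs(protein, peptide, n_flank=3):
    """Return the peptide together with up to n_flank residues of its
    native sequence context on each side."""
    start = protein.find(peptide)
    if start < 0:
        raise ValueError("peptide not found in protein")
    end = start + len(peptide)
    left = max(0, start - n_flank)
    right = min(len(protein), end + n_flank)
    return protein[left:right]


# NIR + VTTYFPSVNLR + KSS
print(with_overhangs("GNIRVTTYFPSVNLRKSSQK", "VTTYFPSVNLR"))
```

The labelled standard built this way has to be digested together with the sample, so that it experiences the same cleavage site context as the native protein.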
However, consider the following peptide:
QNGRLR.HFTIPSHR.ARAGR
If we add RLR to the N-terminus of the peptide sequence, the cleavage site again does not mimic what happens in the protein: if cleavage occurs after the first R in the protein, it yields a dead-end product:
LR.HFTIPSHR
Hence the overhang needs to be extended by 3 aa before the RLR. However, such an extension of overhangs is not always possible, since there is a limit to the length of peptide that can be synthesised (usually no longer than 20 aa). Additional parameters therefore need to be passed to the model to determine the optimal compromise.
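The overhang extension described above could be sketched in Python roughly as follows. This is only one possible reading of the rule and not an existing cleaver feature: the name `design_standard` and the parameters `n_flank` and `max_len` are hypothetical, only the N-terminal overhang is extended (the C-terminal side gets a plain overhang), and the sketch merely reports whether the result fits the length limit rather than choosing a compromise.

```python
def design_standard(protein, peptide, n_flank=3, max_len=20):
    """Return (construct, fits) where construct is the peptide with its
    overhangs and fits says whether it respects the synthesis limit."""
    start = protein.find(peptide)
    if start < 0:
        raise ValueError("peptide not found in protein")
    end = start + len(peptide)
    left = max(0, start - n_flank)
    # Skip the K/R at start - 1: it forms the peptide's own cleavage site.
    i = start - 2
    while i >= left:
        if protein[i] in "KR":
            # An extra cleavage site inside the overhang needs its own
            # n_flank residues of upstream context.
            left = min(left, max(0, i - n_flank))
        i -= 1
    right = min(len(protein), end + n_flank)
    construct = protein[left:right]
    return construct, len(construct) <= max_len


print(design_standard("QNGRLRHFTIPSHRARAGR", "HFTIPSHR"))
```

For the example above the N-terminal overhang grows from RLR to QNGRLR. When `fits` is false, some trade-off between overhang length and peptide length is needed, which is where the additional parameters would come in.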
I will write out a detailed outline of the workflow if this functionality is to be added to cleaver.
references:
- ref1: Kito, Keiji, et al. "A synthetic protein approach toward accurate mass spectrometric quantification of component stoichiometry of multiprotein complexes." Journal of Proteome Research 6.2 (2007): 792-800.