lgatto / msnbase

Base Classes and Functions for Mass Spectrometry and Proteomics

Home Page: http://lgatto.github.io/MSnbase/

msnbase's Introduction

The `MSnbase` package

MSnbase is an R/Bioconductor package that provides infrastructure for plotting, manipulating and processing mass spectrometry and proteomics data. The project was started by Laurent Gatto in October 2010 (Mon Oct 4 23:35:23 2010, according to the git log) and has, since then, benefited from various contributions, in particular from Sebastian Gibb and Johannes Rainer.

The official package page is the Bioconductor landing page (release or devel versions). The GitHub page is for active development, issue tracking and forking/pulling purposes.

To get an overview of the package, see the MSnbase-demo vignette. More vignettes are available in the Articles tab.

The R for Mass Spectrometry initiative

The aim of the R for Mass Spectrometry initiative is to provide efficient, thoroughly documented, tested and flexible R software for the analysis and interpretation of high throughput mass spectrometry assays, including proteomics and metabolomics experiments. The project formalises the longtime collaborative development efforts of its core members under the R for Mass Spectrometry organisation to facilitate dissemination and accessibility of their work.

If you are using MSnbase, consider switching to the R for Mass Spectrometry packages, in particular, Spectra for raw data, PSMatch for identification data, and QFeatures for quantitative data. See https://RforMassSpectrometry.org for details.

Installation

To install the package:

install.packages("BiocManager")
BiocManager::install("MSnbase")

If you need the github version (not recommended unless you know what you are doing), use

BiocManager::install("lgatto/MSnbase")

Questions

General questions should be asked on the Bioconductor support forum, using MSnbase to tag the question. Feel free also to open a GitHub issue, in particular for bug reports.

Citation

To cite the MSnbase package in publications, please use:

Gatto L, Lilley KS. MSnbase - an R/Bioconductor package for isobaric tagged mass spectrometry data visualization, processing and quantitation. Bioinformatics. 2012 Jan 15;28(2):288-9. doi:10.1093/bioinformatics/btr645. PubMed PMID:22113085.

MSnbase, efficient and elegant R-based processing and visualisation of raw mass spectrometry data. Laurent Gatto, Sebastian Gibb, Johannes Rainer. bioRxiv 2020.04.29.067868; doi: https://doi.org/10.1101/2020.04.29.067868

Contributing

Contributions to the package are more than welcome. If you want to contribute to this package, you should follow the same conventions as the rest of the functions. Please do get in touch (preferably by opening a GitHub issue) to discuss any suggestions. The MSnbase development vignette gives some background on the class infrastructure.

Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.

msnbase's Issues

internal fragmentation

I am now doing some intact protein analysis and it was recently demonstrated that when you fragment proteins you produce a lot of internal fragments:

http://www.ncbi.nlm.nih.gov/pubmed/25716753

Considering these internal fragments results in a huge boost in coverage. Could internal fragmentation be introduced in calculateFragments?

the generic for "trimws" hides base::trimws

Since MSnbase 1.19.9 the import of Spectrum.xml files in synapter isn't working anymore. That happens because MSnbase masks base::trimws by defining a new generic.

To reproduce:

Rdevel> R.version$version.str
[1] "R Under development (unstable) (2015-11-08 r69614)"
Rdevel> trimws(" xx ")
[1] "xx"
Rdevel> library("MSnbase"); packageVersion("MSnbase")
# ...
The following object is masked from ‘package:base’:

    trimws

[1] ‘1.19.9’
Rdevel> trimws(" xx ")
Error in (function (classes, fdef, mtable)  : 
  unable to find an inherited method for function ‘trimws’ for signature ‘"character"’

IMHO there are two possible solutions:

1. Defining a generic that uses all arguments of trimws:

# AllGenerics.R
setGeneric("trimws", function(x, which = c("both", "left", "right")) standardGeneric("trimws"))
# utils.R
setMethod("trimws", c("data.frame", "character"), function(x, which) { ... })

2. Or creating a new trimws that, in contrast to base::trimws, accepts the ... argument (as BiocGenerics does for is.unsorted: https://github.com/Bioconductor-mirror/BiocGenerics/blob/master/R/is.unsorted.R)
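
A minimal sketch of the second idea, using only base R and the methods package. To avoid masking base::trimws in the session, the generic gets a hypothetical name, trimws2; the body of the data.frame method is also an assumption made for illustration and is not the MSnbase code.

```r
library(methods)

## Hypothetical generic: `...` lets methods forward any argument of base::trimws.
setGeneric("trimws2", function(x, ...) standardGeneric("trimws2"))

## Default method: plain vectors fall through to base::trimws.
setMethod("trimws2", "ANY", function(x, ...) base::trimws(x, ...))

## data.frame method: trim every character column, leave the others untouched.
setMethod("trimws2", "data.frame", function(x, ...) {
    for (i in which(vapply(x, is.character, logical(1))))
        x[[i]] <- base::trimws(x[[i]], ...)
    x
})

df <- data.frame(a = c(" a ", "b "), n = 1:2, stringsAsFactors = FALSE)
trimws2(" xx ")   # character input still works
trimws2(df)$a     # character columns are trimmed
```

Because the default method simply delegates, arguments such as which = "left" keep working for character vectors.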

adding mzidentml data to MSexp object fails

I try to add a .mzid file to my MSexp object but when I look at fData I only see NAs. There are for sure identifications in the file (it's 50 MB and when I open it in an editor I see lots of entries). The .mzid is generated by msgf+ on a search performed on an MGF file. The MSexp object is generated with the same MGF file. I also loaded the results in Peptideshaker and tried to create an .mzid file with the Peptideshaker export. Same result.
When I try adding identification files in MSnbase with the files provided with the MSnbase package, it works.

Is it something to do with the MGF file?

Greetz

Check centroided in plotmzDelta

Is documented in Rd file.

Check centroided on quantify

When quantifying an MSnExp or Spectrum, check the spectrum@centroided slots and stop/warn accordingly.

Use BiocParallel

Replace foreach/doMC by BiocParallel.

Implement a combine,MSnExp,MSnExp method

use combine,AnnotatedDataFrame,AnnotatedDataFrame-method to combine feature data (as in the MSnSet method); other slots should use same methods as MSnSet as well
a column should also be added to the feature data to record the experiment and/or file names the respective spectra originate from
for the assay data, if the combined feature names are unique, they could be combined and kept as is [1]
if, however, there are duplicates in the combined feature names, they should be made unique, possibly using updateFeatureNames [1], or by adding .1 and .2 to the spectra of the first and second inputs.

[1]: this will probably require a featureNames<- for the MSnExp class.
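
The renaming rule in the last point could be sketched in base R as follows. The helper name combine_feature_names is hypothetical, and the choice to suffix all names (not only the clashing ones) as soon as one duplicate is found is an assumption.

```r
## Combine two vectors of feature names. If any name occurs in both inputs,
## append ".1" to the names of the first input and ".2" to those of the second.
combine_feature_names <- function(fn1, fn2) {
    if (length(intersect(fn1, fn2)) > 0) {
        fn1 <- paste0(fn1, ".1")
        fn2 <- paste0(fn2, ".2")
    }
    c(fn1, fn2)
}

combine_feature_names(c("X1", "X2"), c("X3", "X4"))  # no clash: kept as is
combine_feature_names(c("X1", "X2"), c("X2", "X3"))  # clash: all names suffixed
```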

Can readMSData reading more than two files be run with parallel?

Hi Laurent,
Can readMSData be run in parallel when reading more than two files? It would be very useful to reduce the time needed to read multiple files.
What do you think?
Best regards!
Bo
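
For illustration only, the general pattern (one file per worker, results combined afterwards) can be sketched with the parallel package that ships with R. The reader read_one and the temporary text files are stand-ins, not MSnbase functions or real raw data.

```r
library(parallel)

## Stand-in reader: one numeric value per line of a text file.
read_one <- function(f) as.numeric(readLines(f))

## Three small temporary files play the role of the raw data files.
files <- vapply(1:3, function(i) {
    f <- tempfile(fileext = ".txt")
    writeLines(as.character(i * 1:4), f)
    f
}, character(1))

## Forking is unavailable on Windows, so fall back to a single core there.
cores <- if (.Platform$OS.type == "windows") 1L else 2L
res <- mclapply(files, read_one, mc.cores = cores)

## Results come back in the order of the input files.
combined <- unlist(res)
```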

bin_Spectrum bug

The latest commits 77515b9 and ecaaa32 fix a bug reported by Weibo Xie by email and add a unit test.

@sgibb - could you check if you are fine with this before I commit to Bioc?

pickPeaks error in release

Using MSnbase version 1.16.0 (currently release):

library("MSnbase")
data(itraqdata)
  > pickPeaks(itraqdata)
  |=                                                                     |   2%Error in noise[, 2L] : incorrect number of dimensions

This however works fine with devel. The problem is in the MSnbase:::pickPeaks_Spectrum

isAboveNoise <- object@intensity > (SNR * noise[, 2L])

which is

isAboveNoise <- object@intensity > (SNR * noise)

in devel.
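
The error suggests that noise is a plain vector in one code path and a matrix in the other. A defensive sketch that tolerates both shapes is shown below; the helper names are hypothetical, and the assumption that the second column of a noise matrix holds the intensities is inferred from the noise[, 2L] call above.

```r
## Return the intensity part of a noise estimate, whatever its shape.
noise_intensity <- function(noise) {
    if (is.matrix(noise)) noise[, 2L] else noise
}

## Flag peaks whose intensity exceeds SNR times the estimated noise.
is_above_noise <- function(intensity, noise, SNR = 1) {
    intensity > SNR * noise_intensity(noise)
}

int <- c(5, 50, 8)
is_above_noise(int, noise = c(10, 10, 10))                      # vector input
is_above_noise(int, noise = cbind(mass = 1:3, intensity = 10))  # matrix input
```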

@sgibb is this something that was not backported in release?

problem with loading mzml with a single MS1 spectrum

I have a quite unusual application for MSnbase. To assess peptide purity I perform DI on orbitrap then I pull all MS1 spectra, perform charge state deconvolution (XTRACT algorithm) and export them as a single spectrum so I get mzML with only a single MS1 spectrum. When I try to plot it in MSnbase I get the following problem:

rawdata <- readMSData("C1_01.mzML", msLevel = 1, verbose = FALSE)
MSSpectrum <- rawdata[[1]]
MSSpectrum
Object of class "Spectrum1"
Retention time: 5:0
MSn level: 1
Total ion count: 19567
Polarity:
plot (MSSpectrum)
Error in (function (el, elname) :
Element title must be a element_text object.

I understand something is missing in this spectrum, but I do not know what. Perhaps you can have a look (sent the file by email)?

iPQF: spectra/PSMs or peptides

@martifis In your comments, you mention that you start from spectra/PSMs to summarise quantitative data into proteins. I believe you also use mzTab data as input (which is really great, by the way). I am working on an updated mzTab importer (see MzTab), but that's only the first step; eventually, this will produce MSnSet instances. While reading the specifications for the mzTab format version 1.0 (updated 20 June 2014), I understand that there is no quantitation data at the spectra/PSM level, but only at the peptide (or protein) level; more specifically, in the PSM section on pages 12-13, table 5, I don't see any abundance field. Could you refer to this, or any other relevant specifications, so that I can make sure that the new mzTab importer and iPQF work well together?

Reading OpenSWATH data

library("MSnbase")
library("readr")
## from http://www.peptideatlas.org/PASS/PASS00289
x <- read_tsv("rawOpenSwathResults_1pcnt_only.tsv")

## preparing phenoData
pd <- data.frame(Filename = unique(x$align_origfilename))
pd$Filename <- gsub(".*strep_align/(.*)_all_peakgroups.*", "\\1", pd$Filename)
pd$Condition <- gsub("(Strep.*)_Repl.*", "\\1", pd$Filename)
pd$BioReplicate <- gsub(".*Repl([[:digit:]])_.*", "\\1", pd$Filename)
pd$Run <- seq_len(nrow(pd))
rownames(pd) <- pd$runId <- unique(x$run_id)
pd$nPeps <- as.numeric(table(x$run_id))

## Generate MSnSetList with one MSnSet per sample
msnl <- lapply(unique(x$run_id),
               function(i) {
                   x0 <- readMSnSet2(x[x$run_id == i, ], "Intensity",
                                     fnames = "transition_group_id")
                   sampleNames(x0) <- i
                   pData(x0) <- pd[i, ]
                   updateFvarLabels(x0, i)
               })

## Combine into one MSnSet
msn <- Reduce(combine, msnl)

## checks
stopifnot(all.equal(pData(msn), pd),
          validObject(msnl),
          validObject(msn))

TMT 10 plex

this information is from the kit description:

http://www.piercenet.com/instructions/2162457.pdf

I have saved the description as well as a csv table I created at:

data:\CCP\LabTalks\tags (feel free to move it)

the tags have the distribution of isotopes as shown

The table gives a very counter-intuitive explanation for purity correction

I would assume that whenever an isotope is added to or subtracted from the reporter it happens on C12/C13. That means that the C series can only be contaminated by other C-series tags, while the N series can only be contaminated by other N-series tags (see figure above).

This is the case for -1 and -2 channels, e.g. for 129C -1 is 128C and -2 is 127C.

On the other hand + 1 and + 2 always follow the pattern: + 1 is always from the same series, but + 2 is the next tag in the list, e.g. for 128C + 1 is 129C, but + 2 IS NOT 130C as you would expect, but 129N.

I am not really sure whether this is a mistake and they meant 130C but wrote 129N instead, or whether they actually measured 129N, which does not make any sense to me.

add normalise(..., method = "median")

Make a generic MSnSet constructor for spreadsheets

There is a need for a light and generic MSnSet constructor that reads text-based spreadsheets. The exprs columns would be defined by name or by index, and all the others would be considered as featureData.

Consider a min.int filter for each channel individually, recycling the vector appropriately.

(The readIspyData functions would be file specific wrappers around that function. Consider also writing other specific importers for MaxQuant and others.)
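
The column-splitting step could look like the following base R sketch. The function name split_spreadsheet and its ecol argument are hypothetical; a real constructor would wrap the two pieces into an MSnSet, which is left out here.

```r
## Split a spreadsheet-like data.frame into an expression matrix and
## feature data. `ecol` selects the expression columns by name or by index.
split_spreadsheet <- function(df, ecol) {
    if (is.character(ecol))
        ecol <- match(ecol, names(df))
    stopifnot(!anyNA(ecol))
    list(exprs = as.matrix(df[, ecol, drop = FALSE]),
         fdata = df[, -ecol, drop = FALSE])
}

df <- data.frame(id = c("p1", "p2"), x1 = c(1, 2), x2 = c(3, 4),
                 desc = c("a", "b"), stringsAsFactors = FALSE)
by_name  <- split_spreadsheet(df, c("x1", "x2"))
by_index <- split_spreadsheet(df, 2:3)
```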

vectorise highlightOnPlot

highlightOnPlot(dunkley2006, foi = list(foi1, foi2, foi3), col = c("blue", "red", "green"))

Also allow to print the feature names for the foi.

heatmap for MSnSet

Wrap a call to ComplexHeatmap::Heatmap for MSnSet instances, so that pData and fData are used to build HeatmapAnnotation.

No filename when adding id data

When loading mzid files that do not return the id filename as part of the data.frame column, addIdentificationData throws the following error via utils.addSingleIdentificationDataFile:

Error in basename(id$spectrumFile) : a character vector argument expected

See https://groups.google.com/forum/#!topic/rbioc-sig-proteomics/na-dMumFgAU

There should be an alternative mechanism to handle these situations.

Impurity matrix for iTRAQ 8-plex

From http://bfg.oxfordjournals.org/content/7/2/127/T2.expansion.html

TAG	−2	−1	+1	+2
113	0	2.5	3	0.1
114	0	1	5.9	0.2
115	0	2	5.6	0.1
116	0	3	4.5	0.1
117	0.1	4	3.5	0.1
118	0.1	2	3	0.1
119	0.1	2	4	0.1
121	0.1	2	3	0.1

add to ?purityCorrect.
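
As a sketch of how such a table can be turned into a correction, the code below builds a mixing matrix from the values above and inverts it. It assumes that each row gives the percentage of that tag's signal found at the stated mass offset, and that spill-over onto a mass without a reporter channel (for example 120) is simply lost. This is an illustration of the linear-algebra idea, not necessarily what purityCorrect does; the helper name and the test intensities are made up.

```r
tags <- c(113, 114, 115, 116, 117, 118, 119, 121)
offsets <- c(-2, -1, 1, 2)
## Percent of each tag's signal found at the given mass offset (table above).
pct <- matrix(c(0,   2.5, 3,   0.1,
                0,   1,   5.9, 0.2,
                0,   2,   5.6, 0.1,
                0,   3,   4.5, 0.1,
                0.1, 4,   3.5, 0.1,
                0.1, 2,   3,   0.1,
                0.1, 2,   4,   0.1,
                0.1, 2,   3,   0.1),
              ncol = 4, byrow = TRUE, dimnames = list(tags, offsets))

## M[j, i] is the fraction of tag i's signal that is observed in channel j.
make_impurity_matrix <- function(tags, offsets, pct) {
    n <- length(tags)
    M <- matrix(0, n, n, dimnames = list(tags, tags))
    for (i in seq_len(n)) {
        M[i, i] <- 1 - sum(pct[i, ]) / 100           # stays in its own channel
        for (k in seq_along(offsets)) {
            j <- match(tags[i] + offsets[k], tags)   # receiving channel, if any
            if (!is.na(j)) M[j, i] <- pct[i, k] / 100
        }
    }
    M
}

M <- make_impurity_matrix(tags, offsets, pct)
true_int <- c(100, 200, 150, 50, 300, 120, 80, 60)   # made-up intensities
observed <- drop(M %*% true_int)                     # what would be measured
corrected <- solve(M, observed)                      # undo the mixing
```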

Why is there no polarity in Spectrum2 objects?

Hi,

the Spectrum2 objects have no @polarity slot (as opposed to Spectrum1 objects). Is there a specific reason for this, and would you consider adding one? For me they would be useful, otherwise I have to carry the information somewhere else for downstream processing (or I might add it myself in my RmbSpectrum2 subclass, see https://github.com/MassBank/RMassBank/blob/master/R/SpectrumClasses.R ...)

pSet ordering

ordering could be defined by the feature data row order
assay features would always be ordered before being returned to the user
this would require a change on the validity, and only require that sorted assay and feature data names to be identical
all accessors will need to be reordered

xtandem results in MSexp?

Hi,
You are probably aware that xtandem does not use the mzid output format for its results. The xtandem XML can thus not be loaded directly into MSexp. It is possible to read the results with the rtandem package. Is there some way to get xtandem results or the rtandem result class into MSexp without using external tools like http://www.psidev.info/mzidentml to convert the XML to mzid (which, btw, didn't work when I tried it: some NullPointerException)?

greetz

check precScanNum

precScanNum is set to 0, as reported in the mzRamp header. Check if this can be obtained at the mzR level or, if not, recalculate.

Change [, [[ and $ for pSet/MSnSet

Explore the possibility to dispatch on fData instead of pData. Could this be set as an option?

Not able to install MSnbase

I tried to install MSnbase package in R (linux), but I couldn't do it. I did as follows

source("http://bioconductor.org/biocLite.R")
biocLite()

Then
biocLite("MSnbase")

I got following message at the end.

Warning messages:
1: In install.packages(pkgs = doing, lib = lib, ...) :
installation of package ‘mzR’ had non-zero exit status
2: In install.packages(pkgs = doing, lib = lib, ...) :
installation of package ‘MSnbase’ had non-zero exit status

When I tried
library(MSnbase)

The library MSnbase package was not found.

Please help me to install this package ( I searched in google, but I couldn't find the solution).

Thanks

combine NAnnotatedDataFrame

(Problem reported by @lgatto using pavel's computer)

> quantify(exampleMgf, reporters = ionToQuant, verbose = FALSE)
Preparing meta-data
Error in .local(x, y, ...) : 
  'combine,AnnotatedDataFrame,AnnotatedDataFrame-method' objects have diffrenent classes 'NAnnotatedDataFrame', 'AnnotatedDataFrame'

see in MSnbase:::quantify_MSnExp

.phenoData <- new("AnnotatedDataFrame", data = data.frame(mz = reporters@mz, 
        reporters = reporters@name, row.names = reporters@reporterNames))
    if (nrow(pData(object)) > 0) {
        if (nrow(pData(object)) == length(reporters)) {
            .phenoData <- combine(phenoData(object), .phenoData)
        }
        else {
            if (verbose) 
                message("Original MSnExp and new MSnSet have different number of samples in phenoData. Dropping original.")
        }
    }

analy[s|z][Details]

Old objects have an analyserDetails MIAPE slot, but it has been renamed analyzerDetails (see commit bc3139a8-67e5-0310-9ffc-ced21a209358) to be consistent with mzTab, while analyser has been retained.

Two possible solutions

1. have an updateMIAPE function that fixes this and apply it to all existing objects.
2. have all possible s/z combinations and keep them in sync.

Add also a proper accessor.

Currently favour solution 1.

Affected instances: itraqdata, instances in pRolocdata, ...

reordering spectra in MSnExp

Currently, when spectra are loaded from mzML, they are not ordered according to retention time. E.g.

spectra <- readMSData ("H2A_EThcD_1e6_Reagent.mzML")
barplot (rtime (spectra))

I tried to reorder the spectra but this does not seem to help:

spectra <- spectra[order (as.double (rtime (spectra)))]

Any suggestions?

Can't install MSnbase from github

Hi
I would like to install MSnbase from GitHub to use the new calculateFragments features.

I did

devtools::install_github("lgatto/MSnbase")
library(MSnbase)

I got:

> library(MSnbase)  
Error in get(hookname, envir = env, inherits = FALSE) : 
  lazy-load database '/home/adriaan/R/x86_64-unknown-linux-gnu-library/3.1/mzR/R/mzR.rdb' is corrupt
In addition: Warning message:
In get(hookname, envir = env, inherits = FALSE) :
  internal error -3 in R_decompress1
Error: package or namespace load failed for ‘MSnbase’

My sessioninfo is:

> sessionInfo()
R version 3.1.2 (2014-10-31)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
[1] BiocParallel_1.0.3   mzR_2.1.10           Rcpp_0.11.4         
[4] Biobase_2.26.0       BiocGenerics_0.12.1  BiocInstaller_1.16.1
[7] dplyr_0.4.1         

loaded via a namespace (and not attached):
 [1] affy_1.44.0           affyio_1.34.0         assertthat_0.1       
 [4] base64enc_0.1-2       BatchJobs_1.5         BBmisc_1.9           
 [7] bitops_1.0-6          brew_1.0-6            checkmate_1.5.1      
[10] codetools_0.2-9       colorspace_1.2-4      compiler_3.1.2       
[13] DBI_0.3.1             devtools_1.7.0        digest_0.6.8         
[16] doParallel_1.0.8      evaluate_0.5.5        fail_1.2             
[19] foreach_1.4.2         formatR_1.0           ggplot2_1.0.0        
[22] grid_3.1.2            gtable_0.1.2          httr_0.6.1           
[25] impute_1.40.0         IRanges_2.0.1         iterators_1.0.7      
[28] knitr_1.9             lattice_0.20-29       lazyeval_0.1.10      
[31] limma_3.22.5          magrittr_1.5          MALDIquant_1.11      
[34] MASS_7.3-35           munsell_0.4.2         mzID_1.4.1           
[37] pcaMethods_1.56.0     plyr_1.8.1            preprocessCore_1.28.0
[40] proto_0.3-10          RCurl_1.95-4.5        reshape2_1.4.1       
[43] RSQLite_1.0.0         S4Vectors_0.4.0       scales_0.2.4         
[46] sendmailR_1.2-1       stats4_3.1.2          stringr_0.6.2        
[49] tools_3.1.2           vsn_3.34.0            XML_3.98-1.1         
[52] zlibbioc_1.12.0

Have a nice evening

calculateFragments() missing features

Hi,
I was using the calculateFragments() function in MSnbase and I think there are some small modifications that could make it more useful.

1. The intensities of the matched peaks are missing.
2. If 2 peaks are in the defined interval near a theoretical peak, only 1 is reported. (I guess the one with the highest intensity?) It would maybe be useful to set an option to report all matched peaks (especially informative when the intensity is also reported)
3. Neutral losses are not considered currently (I think?).
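
Points 1 and 2 could be sketched in base R as below: every observed peak within a tolerance of a theoretical m/z is reported, together with its intensity. The function name match_all_peaks and its arguments are hypothetical and do not reflect the calculateFragments interface.

```r
## Report all observed peaks within `tolerance` of each theoretical m/z,
## keeping the observed intensity for every match.
match_all_peaks <- function(theo_mz, obs_mz, obs_int, tolerance = 0.1) {
    hits <- lapply(seq_along(theo_mz), function(i) {
        j <- which(abs(obs_mz - theo_mz[i]) <= tolerance)
        data.frame(theoretical = rep(theo_mz[i], length(j)),
                   observed = obs_mz[j],
                   intensity = obs_int[j])
    })
    do.call(rbind, hits)
}

m <- match_all_peaks(theo_mz = c(100, 200),
                     obs_mz  = c(99.95, 100.05, 150, 200.02),
                     obs_int = c(10, 20, 30, 40))
```

Here both peaks near 100 are kept, so the caller can decide afterwards whether to retain the most intense one or all of them.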

Have a nice evening

writeMgfData doesn't work for MS1

Hi,

These lines in writeMgfContent make writeMgfData fail on MS1 spectra:

  .cat("\nRTINSECONDS=", rtime(sp), "\nPEPMASS=", precursorMz(sp))

  if (length(precursorCharge(sp)) && !is.na(precursorCharge(sp))) {
    .cat("\nCHARGE=", precursorCharge(sp), "+")
  }

d> writeMgfData(cpds.tot[[1]]@parent)
Error in precursorMz(sp) : No precursor MZ value for MS1 spectra.

Admittedly my Spectrum1 objects are not created by MSnbase directly but by RMassBank, but I think this should still work...
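
One possible fix is to write the precursor fields only for MS level 2 and above. The sketch below uses plain arguments in place of a Spectrum object; mgf_header is a hypothetical helper, not the actual writeMgfContent code.

```r
## Build the MGF header lines; precursor fields are skipped for MS1 spectra.
mgf_header <- function(rt, ms_level, precursor_mz = NA_real_,
                       precursor_charge = NA_integer_) {
    lines <- paste0("RTINSECONDS=", rt)
    if (ms_level > 1L) {
        lines <- c(lines, paste0("PEPMASS=", precursor_mz))
        if (length(precursor_charge) && !is.na(precursor_charge))
            lines <- c(lines, paste0("CHARGE=", precursor_charge, "+"))
    }
    lines
}

mgf_header(12.5, ms_level = 1L)                  # MS1: retention time only
mgf_header(12.5, ms_level = 2L, 500.25, 2L)      # MS2: precursor m/z and charge
```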

Allow NA pattern in filterNA

Something like filterNA(, pattern = "011110") that would tolerate NA values in the first and last reporter channels.
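
A base R sketch of the idea on a plain matrix, assuming that "1" marks a channel that must be observed and "0" a channel where NA is tolerated. The function name filter_na_pattern is hypothetical.

```r
## Keep the rows of `x` that have no NA in the channels marked "1" in
## `pattern`; channels marked "0" may contain NA.
filter_na_pattern <- function(x, pattern) {
    required <- strsplit(pattern, "")[[1]] == "1"
    stopifnot(length(required) == ncol(x))
    keep <- rowSums(is.na(x[, required, drop = FALSE])) == 0
    x[keep, , drop = FALSE]
}

x <- matrix(1, nrow = 3, ncol = 6)
x[1, 1] <- NA   # NA in the first channel: tolerated
x[2, 3] <- NA   # NA in a middle channel: row is dropped
x[3, 6] <- NA   # NA in the last channel: tolerated
filter_na_pattern(x, "011110")
```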

idSummary produces NA for identification files where the first spectrum is missing

If the first spectrum of the combination of spectrumFile and idFile could
not match against an entry in the identification data.frame idSummary
produces NA for the idFile column.

library("MSnbase")
library("mzID")

quantFile <- dir(system.file(package = "MSnbase", dir = "extdata"),
                 full.name = TRUE, pattern = "mzXML$")
identFile <- dir(system.file(package = "MSnbase", dir = "extdata"),
                 full.name = TRUE, pattern = "dummyiTRAQ.mzid")

msnexp <- readMSData(quantFile)
mzid <- flatten(mzID(identFile))

## correct output because the first spectrum was found
msnexpid <- addIdentificationData(msnexp, mzid)
idSummary(msnexpid)
#       spectrumFile          idFile coverage
#1 dummyiTRAQ.mzXML dummyiTRAQ.mzid      0.6

## change the acquisitionnum to demonstrate what happens if the first spectrum
## is not in the identification file
mzid$acquisitionnum[mzid$acquisitionnum == 1] <- 3

msnexpid <- addIdentificationData(msnexp, mzid)

## results in NA for idFile
idSummary(msnexpid)
#       spectrumFile idFile coverage
#1 dummyiTRAQ.mzXML   <NA>      0.6

estimateNoise of Spectrum objects

@sgibb there would be some interest here in estimating the signal-to-noise ratio using MSnbase. The MALDIquant::estimateNoise method seems to be a good candidate.

Have you ever tried it on MS2 spectra? Do the spectra need to be in profile mode, or is centroided fine too?

Add file name when reading mzTab

MSnExp unit test fails for latest R-devel (3.2.0)

Dear Laurent,

I installed the latest version of R-devel (3.2.0; 2014-04-04 r65373) to install the new MSnbase. But now R CMD check fails because of the MSnExp unit test:

1. Failure(@test_MSnExp.R#23): readMSData -----------------------------
all.equal(aa, msx) isn't true

Error: Test failures

It is caused by the different R versions:

all.equal(msx, aa)
# [1] "Attributes: < Component “.__classVersion__”: Component “R”: Mean relative difference: 1 >"

attr(msx, ".__classVersion__")
#       R  Biobase     pSet   MSnExp 
# "3.1.0" "2.23.6"  "0.1.0"  "0.3.0" 
attr(aa, ".__classVersion__")
#       R  Biobase     pSet   MSnExp 
# "3.2.0" "2.23.6"  "0.1.0"  "0.3.0"

all.equal(msx, aa, check.attributes=FALSE)
# [1] TRUE

Is there a way to circumvent this (without check.attributes=FALSE)?
Or is this a feature?

Best wishes,

Sebastian

Support for mzTab v1.0 in readMzTabData and writeMzTabData

Of the twelve examples found on the mzTab homepage, readMzTabData will only successfully read eight of them.

library(MSnbase)
library(plyr)

zip_file <- "examples.zip"
download.file(
  "http://www.ebi.ac.uk/pride/resources/tools/jmztab/latest/examples.zip", 
  zip_file
)
example_dir <- "mzTab_examples"
unzip(zip_file, exdir = example_dir)
files <- dir(example_dir, full.names = TRUE)
length(tryapply(files, readMzTabData)) # 8

Tested with MSnbase 1.14.1, though the development version here also contains the warning

Support for mzTab version 0.9 only. Support will be added soon.

In order to further adoption of the new file format in R workflows, it would be very useful if MSnbase supported the current (v1.0) spec for reading and writing.

Is `npsm` really `npsm` in addIdentificationData?

It seems that the npsm feature variable, added by addIdentificationData, actually corresponds to the number of proteins in the protein group rather than the actual number of PSMs for that entry (ignoring NA).

example(addIdentificationData)
fData(msexp)$npsm
## [1]  2  1 NA NA  1
sapply(MSnbase:::utils.ssv2list(fData(msexp)$accession), length)
## [1] 2 1 1 1 1

npsm should actually be a vector of 1s, as for each entry, only 1 PSM has been detected. The current npsm is probably confusing here and should be removed. The actual npsm, npep and nprot should be calculated as follows (ignoring NAs):

## npsm
tapply(fData(msexp)$accession, fData(msexp)$accession, length)
##        ECA0510 ECA0984;ECA3829         ECA1028 
##             1               1               1 
## npep
tapply(fData(msexp)$accession, fData(msexp)$pepseq, length)
##    IDGQWVTHQWLKK           LVILLFR VESITARHGEVLQLRPK 
##                1                 1                 1 
## nprot
sapply(MSnbase:::utils.ssv2list(fData(msexp)$accession), length)
## [1] 2 1 1 1 1

Other example:

fData(msexp)$pepseq[3:4] <- c("ABC", "ABC")
fData(msexp)$accession[3:4] <- fData(msexp)$accession[1]

## npsm
tapply(fData(msexp)$accession, fData(msexp)$accession, length)
##       ECA0510 ECA0984;ECA3829         ECA1028 
##              1               3               1 
## npep
tapply(fData(msexp)$accession, fData(msexp)$pepseq, length)  
##             ABC     IDGQWVTHQWLKK           LVILLFR VESITARHGEVLQLRPK 
##                2                 1                 1                 1 
## nprot
sapply(MSnbase:::utils.ssv2list(fData(msexp)$accession), length)
## [1] 2 1 2 2 1

fDataToUnknown

Promote fDataToUnknown to a generic and implement methods for vector and MSnSet

combine samples in an MSnSet

Working with synapter now I need to use correction of saturation on a combined MSnSet (after several synapter runs have been converted to MSnSet and combined into a single MSnSet).

The next step after saturation correction is to merge several replicas, but averageMSnSet works with a list of MSnSets.

Could you please modify averageMSnSet so that it takes an argument of the numbers of samples (i.e. numbers of columns in exprs(MSnSet), e.g. 1:5), so it can work with an MSnSet that has been combined?
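
On the expression matrix alone, the requested behaviour amounts to averaging selected columns. A minimal sketch follows; average_samples and its groups argument are hypothetical and this is not the averageMSnSet interface.

```r
## Average groups of sample columns. `groups` is a named list of column
## indices, e.g. list(a = 1:3, b = 4:6); NAs are ignored within a group.
average_samples <- function(x, groups) {
    vapply(groups,
           function(idx) rowMeans(x[, idx, drop = FALSE], na.rm = TRUE),
           numeric(nrow(x)))
}

x <- matrix(1:12, nrow = 2)   # 2 features, 6 samples
avg <- average_samples(x, list(a = 1:3, b = 4:6))
```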

iPQF method

iPQF is available in MSnbase version >= 1.17.8 (GitHub and Bioconductor). There is a dedicated man page at ?iPQF that needs some updates. I have added unit tests, based on the code and data provided by @martifis. There are still a few things to do though:

Feature variable names: currently, "sequence", "accession","charge", "modifications", "mass_to_charge", "search_engine_score" are hard-coded. This should be customisable.
improve iPQF man page
Update reference to manuscript
Mention in vignette, or add a section

rt() does not work since there is a rt() in stats package

I tried to use rt on a Spectrum1 object and got the following error:

Error in rt(xx[[i]]) : argument "df" is missing, with no default

I imagine it is because rt() is present in stats package with df as argument.

I then tried the MSnbase::rt() syntax and got this error:

Error: 'rt' is not an exported object from 'namespace:MSnbase'

Vignette updates

Document in vignette

features of interest
raw MS data plotting such as MSmap's plot and plot3D.

"fData<-" breaks validity

> suppressPackageStartupMessages(library("affydata"))
     Package    LibPath                                               
[1,] "affydata" "/home/lgatto/R/x86_64-unknown-linux-gnu-library/2.16"
     Item       Title                        
[1,] "Dilution" "AffyBatch instance Dilution"
> data(Dilution)
> e <- rma(Dilution)
Loading required package: AnnotationDbi

Background correcting
Normalizing
Calculating Expression
> exprs(e)["123", ] <- 1
Error in `[<-`(`*tmp*`, "123", , value = 1) : subscript out of bounds
> fData(e)$a <- "a"
> fData(e)["123", "a"] <- "b"
> dim(e)
Features  Samples 
   12625        4 
> dim(exprs(e))
[1] 12625     4
> dim(fData(e))
[1] 12626     1
> validObject(e)
Error in validObject(e) : 
  invalid class “ExpressionSet” object: 1: feature numbers differ between assayData and featureData
invalid class “ExpressionSet” object: 2: featureNames differ between assayData and featureData

Needs to be patched upstream, in Biobase.

MSnbase:::quantifySI_MSnExp if MSnExp is based on multiple files

To fulfill all validity checks of an MSnSet instance I have to call
colnames(exprs(object)) <- sampleNames(object) to create assayData that correspond to phenoData. For label free quantitation we use a matrix with only one column. That's why the mentioned line fails.
Maybe we should reconsider the design of MSnSet (or pSet).

Code to reproduce:

## checkout branch labelfree; load package
f <- list.files("../../playground/label_free", pattern="6000.mzML", full.names=TRUE)
id <- list.files("../../playground/label_free", pattern="6000.mzid", full.names=TRUE)

s <- readMSData(f)
si <- addIdentificationData(s, id)

mset <- MSnbase:::quantifySI_MSnExp(si)
# Error in `colnames<-`(`*tmp*`, value = c("1", "2", "3")) :
# length of 'dimnames' [2] not equal to array extent

combine same samples

When combining identical samples, features that are common still have different quantitation data (as expected). This produces

Error in combine(x[[nm]], y[[nm]]) : 
  matrix shared row and column elements differ: Mean relative difference: 0.6605909

How best to deal with this? Take the mean? Use updateFeatureNames to treat the common ones as different features?
Need to make this explicit in the vignette and describe a workaround.

Confusing description for fData() column

Not sure this is the right place to ask but here I go.
In the feature data frame (from fData()), there are multiple semicolon-separated values in the accession column. The MSnbase manual says that these are all the matches sorted by their rank values. I'm a bit confused by the meaning of this. Are these the highest scoring PSM, second highest PSM and so on? Or something else?
If so, how are shared peptides (same peptides occurring in different protein sequences) annotated?
The featuredata I use comes from msgfplus.