computational-metabolomics / mspurity

R-package - Automated Evaluation of Precursor Ion Purity for Mass Spectrometry Based Fragmentation in Metabolomics

Home Page: http://bioconductor.org/packages/release/bioc/html/msPurity.html

License: GNU General Public License v3.0

R 13.27% HTML 86.62% TeX 0.11%

metabolomics mass-spectrometry precursor-ion-purity lc-ms dims lc-msms fragmentation bioconductor-package

mspurity's Introduction

msPurity: Package to assess precursor ion purity, process fragmentation spectra and perform spectral matching

See NEWS file for updates


------------Which version to use? ------------

The recommendation for most use cases is to install and use the Bioconductor stable version of msPurity.

The code available from both the Bioconductor development branch and the master branch on github has the newest functionality.

About

The msPurity R package and associated Galaxy tools were developed to: 1) assess the spectral quality of fragmentation spectra by evaluating the "precursor ion purity", 2) process fragmentation spectra, and 3) perform spectral matching.

Functionalities:

Assess the contribution of the targeted precursor of acquired fragmentation spectra by checking isolation windows using a metric called "precursor ion purity" (Works for both LC-MS(/MS) and DI-MS(/MS) data)
Assess the anticipated “precursor ion purity” (see below) of XCMS LC-MS features and DIMS features where no fragmentation has been acquired
Map fragmentation spectra to XCMS LC-MS features
Filter and average MS/MS spectra from an LC-MS/MS dataset
Create databases of LC-MS(/MS) spectra and associated annotations
Perform spectral matching of query MS/MS spectra against library MS/MS spectra
Export fragmentation spectra to MSP format
Basic processing of DIMS data. Note that these functionalities are no longer actively developed; see DIMSpy (https://github.com/computational-metabolomics/dimspy) for the recommended alternative for DIMS data processing

What is precursor ion purity?

What we call "precursor ion purity" is a measure of the contribution of a selected precursor peak in an isolation window used for fragmentation. The simple calculation divides the intensity of the selected precursor peak by the total intensity of the isolation window. When assessing MS/MS spectra, this calculation is done before and after the MS/MS scan of interest, and the purity is interpolated at the recorded time of the MS/MS acquisition. Additionally, isotopic peaks can be removed, low abundance peaks that are thought to contribute little to the resulting MS/MS spectra are removed, and the isolation efficiency of the mass spectrometer can be used to normalise the intensities used for the calculation.
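The simple calculation described above can be sketched in base R. This is only an illustration under stated assumptions, not msPurity's implementation: the helper names `window_purity` and `interpolate_purity`, the `ppm` tolerance used to pick out the precursor peak, and the toy scans are all hypothetical, and isotope removal, low abundance filtering and isolation efficiency normalisation are left out.

```r
# Hypothetical helper: purity of a target peak inside one isolation window.
# mz, intensity: numeric vectors holding the peaks of a single full scan.
window_purity <- function(mz, intensity, target_mz, offset_low, offset_high,
                          ppm = 10) {
  in_window <- mz >= (target_mz - offset_low) & mz <= (target_mz + offset_high)
  if (!any(in_window)) return(NA_real_)
  # Peaks within `ppm` of the target m/z are treated as the precursor.
  is_target <- in_window & (abs(mz - target_mz) / target_mz * 1e6 <= ppm)
  sum(intensity[is_target]) / sum(intensity[in_window])
}

# Hypothetical helper: linear interpolation of purity at the MS/MS time.
interpolate_purity <- function(rt_before, purity_before,
                               rt_after, purity_after, rt_msms) {
  approx(x = c(rt_before, rt_after),
         y = c(purity_before, purity_after),
         xout = rt_msms)$y
}

# Toy full scans acquired before and after an MS/MS scan (made-up values).
scan_before <- data.frame(mz = c(199.60, 200.00, 200.30), i = c(100, 800, 100))
scan_after  <- data.frame(mz = c(199.60, 200.00, 200.30), i = c(200, 600, 200))

p_before <- window_purity(scan_before$mz, scan_before$i, 200.00, 0.5, 0.5)
p_after  <- window_purity(scan_after$mz, scan_after$i, 200.00, 0.5, 0.5)

# MS/MS acquired at rt = 11, between full scans at rt = 10 and rt = 12.
purity_at_msms <- interpolate_purity(10, p_before, 12, p_after, 11)
print(c(before = p_before, after = p_after, interpolated = purity_at_msms))
```

In this toy case the precursor holds 800 of 1000 intensity units before and 600 of 1000 after, so the interpolated value sits halfway between the two.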

Associated paper msPurity: Automated Evaluation of Precursor Ion Purity for Mass Spectrometry Based Fragmentation in Metabolomics. Analytical Chemistry [1]

Use the following links for more details:

Bioconductor: http://bioconductor.org/packages/msPurity/
Vignette: https://bioconductor.org/packages/devel/bioc/vignettes/msPurity/inst/doc/msPurity-vignette.html
Manual: http://bioconductor.org/packages/devel/bioc/manuals/msPurity/man/msPurity.pdf
Galaxy implementation: https://github.com/computational-metabolomics/mspurity-galaxy
Bioconda (stable): https://anaconda.org/bioconda/bioconductor-mspurity
Conda (dev and testing): https://anaconda.org/tomnl/bioconductor-mspurity

Install

Bioconductor

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("msPurity")

Github

library(devtools)
install_github('computational-metabolomics/msPurity')

Ref

[1] Lawson, T.N., Weber, R.J., Jones, M.R., Chetwynd, A.J., Rodriguez Blanco, G.A., Di Guida, R., Viant, M.R. and Dunn, W.B., 2017. msPurity: Automated Evaluation of Precursor Ion Purity for Mass Spectrometry Based Fragmentation in Metabolomics. Analytical Chemistry.


mspurity's Issues

Database design

Would be useful to update the Library and SQLite database design so that either can be used for Query or Library when using spectral matching.

unused argument (xcmsObj = xcmsObj)

Hi experts

I am following the package here: http://bioconductor.org/packages/release/bioc/vignettes/msPurity/inst/doc/msPurity-lcmsms-data-processing-and-spectral-matching-vignette.html

I have the following error:

Error in frag4feature(pa = pa, xcmsObj = xcmsObj) :
unused argument (xcmsObj = xcmsObj)

while trying to use the following script:
pa <- frag4feature(pa = pa, xcmsObj = xcmsObj)

Could anyone help please?

Cyan
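An "unused argument" error means the installed function does not declare an argument with that name, which can happen when the installed package version differs from the one a vignette was written for. A minimal base R sketch of how to check this follows. The toy function and its `xset` argument name are assumptions made for illustration only; they are not taken from the msPurity source.

```r
# Toy stand-in for an installed function with a different signature.
# The argument name `xset` is an assumption used only for illustration.
frag4feature_toy <- function(pa, xset, ppm = 5) invisible(NULL)

# TRUE if `fun` declares an argument called `arg`.
accepts_arg <- function(fun, arg) {
  arg %in% names(formals(fun))
}

has_xcmsObj <- accepts_arg(frag4feature_toy, "xcmsObj")
declared <- names(formals(frag4feature_toy))
print(has_xcmsObj)
print(declared)

# With the real package installed, the same check would be:
#   accepts_arg(msPurity::frag4feature, "xcmsObj")
#   packageVersion("msPurity")
```

Comparing the declared argument names and the installed version against the vignette being followed usually shows whether the package needs updating.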

MSnbase update

Update the backend to be compatible with MSnbase

New Database Compilation

Hi,

Is there a plan to update the great 2021 DB shared here https://github.com/computational-metabolomics/msp2db/releases ?

Thanks a lot !

@yguitton

Any plan to implement HAMMER's probability score and matching distance score into the spectral matching module?

Thanks for maintaining this package. I found HAMMER's algorithm also provides the probability score (P_Score) and matching distance score (MD_Score). Is there any plan to implement these two scores into the spectral matching module?

Installing the msPurity package in R studio

Greetings,
I was trying to install the msPurity package. But I am getting two error messages

packages ‘BiocVersion’, ‘msPurity’ are not available for Bioconductor version '3.15'
'SSL peer certificate or SSH remote key was not OK' or 'Problem with the SSL CA cert (path? access rights?)'

Can you please help? Is there any other way I can install the package?

Processing Agilent QToF files

Hi @Tomnl,

I'm running some Agilent QToF files and I'm facing a new issue in R.

purityA(file,offset=c(0.65,0.65), mostIntense = TRUE, nearest = FALSE, ppmInterp = 7)
MS2 data has no associated scan data, will use most recent full scan for information
Error in assessPuritySingle(filepth = pa@fileList[[i]], mostIntense = mostIntense, :
task 1 failed - "non-numeric argument to mathematical function"

the file is here https://workflow4metabolomics.usegalaxy.fr/datasets/086de8e126a393ff/display?to_ext=mzml

Any clue for me?
Many Thanks
Yann

assessPuritySingle for MSMS from QTOF

Hi !

Thank you very much for your package. It is very useful to analyze MSMS spectra.

I would like to assess the purity of my MSMS data acquired on a QTOF and an Orbitrap for comparison. However, for QTOF data, I can't apply the function assessPuritySingle on an mzML file.

The error is the following :
Error in if (scanids$pre == scn1) { : argument is of length zero
In addition: Warning message:
In for (i in seq_len(n)) { : closing unused connection 11 (Data_001.mzML)
(translated from French)

The traceback shows the following steps:
`8: linearPurity(rowi, scan_peaks, minoff, maxoff, ppm, scanids,
nearest, mostIntense, iwNorm, iwNormFun, ilim, plotP, plotdir,
isotopes, im)

7: .fun(piece, ...)

6: (function (i)
{
piece <- pieces[[i]]
if (.inform) {
res <- try(.fun(piece, ...))
if (inherits(res, "try-error")) {
piece <- paste(utils::capture.output(print(piece)),
collapse = "\n")
stop("with piece ", i, ": \n", piece, call. = FALSE)
}
}
else {
res <- .fun(piece, ...)
}
progress$step()
res
})(1L)
5: loop_apply(n, do.ply)
4: llply(.data = .data, .fun = .fun, ..., .progress = .progress,
.inform = .inform, .parallel = .parallel, .paropts = .paropts)
3: ldply(.data = pieces, .fun = .fun, ..., .progress = .progress,
.inform = .inform, .parallel = .parallel, .paropts = .paropts)
2: plyr::ddply(mrdfshrt, ~seqNum, .parallel = pBool, get_interp_purity,
scan_peaks = scans, ppm = ppmInterp, ms2 = mrdf[mrdf$msLevel ==
2, ]$seqNum, prec_scans = prec_scans, minoff = minoff,
maxoff = maxoff, mostIntense = mostIntense, plotP = plotP,
plotdir = plotdir, interpol = interpol, nearest = nearest,
iwNorm = iwNorm, iwNormFun = iwNormFun, ilim = ilim, isotopes = isotopes,
im = im)
1: assessPuritySingle(filepth = "Data_001.mzML")`

In advance, thank you for your reply,
Sincerely,

Marie

msPurity::spectralMatching Error in getSmeta(con, pids) : No meta data for spectra available

Hello,
When I try to run the "spectralMatching" function from the msPurity package on R I run into the following error:

q_dbPth <-
        createDatabase(pa = pa,
                       xcmsObj = xcmsObj,
                       dbName = 'test-mspurity-vignette.sqlite')

result <- spectralMatching(
        q_dbPth, 
        cores = 1
)

Error in getSmeta(con, pids) : No meta data for spectra available

I tried assigning a metadata list when I create the database with the function "msPurity::createDatabase", but this does not solve the issue

q_dbPth <- createDatabase(pa, xcmsObj, metadata=list('polarity'='positive','instrument'='Q-Exactive'))

Here is also the session info

sessionInfo("msPurity")
R version 4.2.0 (2022-04-22 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows Server x64 (build 14393)

Matrix products: default

locale:
[1] LC_COLLATE=German_Switzerland.1252  LC_CTYPE=German_Switzerland.1252    LC_MONETARY=German_Switzerland.1252 LC_NUMERIC=C                        LC_TIME=German_Switzerland.1252    

attached base packages:
character(0)

other attached packages:
[1] msPurity_1.22.0

loaded via a namespace (and not attached):
  [1] utils_4.2.0                 ProtGenerics_1.28.0         bitops_1.0-7                matrixStats_0.62.0          doParallel_1.0.17           RColorBrewer_1.1-3          GenomeInfoDb_1.32.4         backports_1.4.1            
  [9] MSnbase_2.22.0              tools_4.2.0                 utf8_1.2.2                  R6_2.5.1                    affyio_1.66.0               rpart_4.1.16                Hmisc_4.7-1                 DBI_1.1.3                  
 [17] BiocGenerics_0.42.0         colorspace_2.0-3            nnet_7.3-17                 withr_2.5.0                 gridExtra_2.3               tidyselect_1.2.0            compiler_4.2.0              MassSpecWavelet_1.62.0     
 [25] preprocessCore_1.58.0       graph_1.74.0                cli_3.4.1                   Biobase_2.56.0              htmlTable_2.4.1             datasets_4.2.0              DelayedArray_0.22.0         base_4.2.0                 
 [33] checkmate_2.1.0             scales_1.2.1                DEoptimR_1.0-11             robustbase_0.95-0           affy_1.74.0                 RBGL_1.72.0                 stringr_1.4.1               digest_0.6.29              
 [41] foreign_0.8-82              XVector_0.36.0              htmltools_0.5.3             jpeg_0.1-9                  base64enc_0.1-3             pkgconfig_2.0.3             MatrixGenerics_1.8.1        fastmap_1.1.0              
 [49] dbplyr_2.2.1                limma_3.52.4                grDevices_4.2.0             htmlwidgets_1.5.4           rlang_1.0.6                 rstudioapi_0.14             impute_1.70.0               generics_0.1.3             
 [57] mzID_1.34.0                 BiocParallel_1.30.4         dplyr_1.0.10                RCurl_1.98-1.9              magrittr_2.0.3              GenomeInfoDbData_1.2.8      Formula_1.2-4               interp_1.1-3               
 [65] MALDIquant_1.21             Matrix_1.4-1                Rcpp_1.0.9                  munsell_0.5.0               S4Vectors_0.34.0            fansi_1.0.3                 MsCoreUtils_1.8.0           lifecycle_1.0.3            
 [73] vsn_3.64.0                  stringi_1.7.8               MASS_7.3-56                 SummarizedExperiment_1.26.1 zlibbioc_1.42.0             plyr_1.8.7                  grid_4.2.0                  parallel_4.2.0             
 [81] doSNOW_1.0.20               deldir_1.0-6                methods_4.2.0               lattice_0.20-45             MsFeatures_1.4.0            splines_4.2.0               mzR_2.30.0                  knitr_1.40                 
 [89] xcms_3.18.0                 pillar_1.8.1                igraph_1.3.5                fastcluster_1.2.3           GenomicRanges_1.48.0        reshape2_1.4.4              codetools_0.2-18            stats4_4.2.0               
 [97] XML_3.99-0.11               glue_1.6.2                  latticeExtra_0.6-30         data.table_1.14.2           pcaMethods_1.88.0           BiocManager_1.30.18         CAMERA_1.52.0               vctrs_0.4.2                
[105] png_0.1-7                   foreach_1.5.2               graphics_4.2.0              gtable_0.3.1                RANN_2.6.1                  clue_0.3-61                 assertthat_0.2.1            ggplot2_3.3.6              
[113] xfun_0.33                   ncdf4_1.19                  survival_3.3-1              tibble_3.1.8                snow_0.4-4                  iterators_1.0.14            stats_4.2.0                 IRanges_2.30.1             
[121] cluster_2.1.3

Has anyone encountered this issue before or know how to circumvent it or solve it?
Thanks a lot in advance, I've been dealing with this issue for several days.

Spectral matching

Implement spectral matching to library spectra.

currently in development spectral_matching

Correction for the align function in the spectralMatching.R

The default setting of the align function is l_ppmProd=100, q_ppmProd=100, raDiffThres=10. However, the default setting of l_ppmProd and q_ppmProd in the function spectralMatching is 10. I believe this is a typo. My correction for the default setting is l_ppmProd=10, q_ppmProd=10, raDiffThres=100.

The second point concerns this line:
intenc <- iD[ppmD==1 & iD<raDiffThres & !is.na(ppmD) & !is.na(raDiffThres)]
In this code, ppmD stands for ppm difference; we should use ppmB, which stands for ppm bool. So my correction is:
intenc <- iD[ppmB==1 & iD<raDiffThres & !is.na(ppmD) & !is.na(raDiffThres)]

Error in `dimnames<-.data.frame` when using "flag_remove" function

Hello,

Description:
I encountered the following error while executing the "flag_remove" function:

xcmsObj2 <- flag_remove(xcmsObj, ref.class = "meoh") 
# or
xcmsObj2 <- flag_remove(xcmsObj)

Error Message:

Error in `dimnames<-.data.frame`(`*tmp*`, value = list(n)) : 
  invalid 'dimnames' given for data frame

Context:

The "xcmsObj@phenoData@data" object is a dataframe with a "class" column containing levels: c("blank", "meoh", "sample").
The "xcmsObj@phenoData@varMetadata" object is also a dataframe:

> xcmsObj@phenoData@varMetadata
  labelDescription
1   sampleMetadata
2            class
3   injectionOrder
4             mode

The xcmsObj is of class "XCMSnExp"

R and msPurity Versions:

R version 4.2.2 (2022-10-31 ucrt)
msPurity version 1.24.0

Any insights or suggestions to resolve this issue would be greatly appreciated.

Thank you.

Isotope consideration

Currently isotopes found in the isolation window of the selected precursor are not removed from the purity calculation.

It would be useful to have the purity calculation with and without the isotopes included.

Convert compound name to database ID

Hi,

Is there a recommended way to convert the search results of msPurity to public database IDs, such as HMDB, PubChem and so on?

Thanks,

-hh1985

Spectral matching (MSP)

Spectral matching should be able to be performed using MSP files.

Ideally for both Library and Query databases.

createDatabase low memory version

It would be useful to have a low memory option for the createDatabase function

Error about creating a spectral-database

Hi all,

I wish to use the XCMS-msPurity workflow to analyze LC-MS data. I followed the XCMS tutorial and processed the data using peak detection, refinement, alignment, correspondence and feature grouping. Next, following the msPurity tutorial, I used the purityA, frag4feature, filterFragSpectra and averageAllFragSpectra functions to further process the data. After finishing the above, I got the following error while creating the SQLite database:

Creating a database of fragmentation spectra and LC features
Error in data.frame(filename = basename(fileList), filepth = fileList, : arguments imply differing number of rows: 19, 0

I don't understand what's causing this, if you could help that would be really appreciated!
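The message "differing number of rows: 19, 0" is what base R's `data.frame` reports when one column has 19 entries and another is empty. The sketch below reproduces that situation with made-up inputs. Which vector was empty in the reported run is not known from the post, so the `group` column here is purely hypothetical.

```r
# 19 hypothetical file paths, mirroring the count in the error message.
file_list <- file.path("data", paste0("sample_", 1:19, ".mzML"))

# Hypothetical second column that ended up empty (length 0).
group <- character(0)

# Capture the error text instead of stopping the script.
msg <- tryCatch(
  {
    data.frame(filename = basename(file_list),
               filepth = file_list,
               group = group)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)
```

Checking the length of every vector that feeds the call (for example with `length()` on each piece of sample metadata) is a quick way to find the empty one.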

purityX on single file

Add feedback reference between pa@av_spectra and pa@grped_df objects

It would be very useful to be able to track from data in grped_df object which MS/MS spectra were included or excluded in inter and intra spectra averaging steps.

createMSP the precursor mz and retention time stored in the MSP file - should be the median

When using the createMSP function with averaged data, it uses the mz and retention time values from a "representative" peak from the grouped peak cluster.

See below

https://github.com/computational-metabolomics/msPurity/blob/master/R/purityA-create-msp.R#L209

This should really be a median of the mz values or ideally using the XCMS predetermined "mzmed" for the grouped feature.
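The difference between a representative peak and the median can be sketched with a toy peak group. The values and column names below are hypothetical and the code does not touch createMSP itself; it only shows the summary the issue asks for.

```r
# Toy peak group: one row per peak assigned to the same grouped feature.
# Values are made up for illustration.
group_peaks <- data.frame(
  mz = c(164.0714, 164.0717, 164.0725),
  rt = c(126.9, 127.4, 128.3)
)

# Using a single "representative" peak (here simply the first row).
representative <- group_peaks[1, ]

# Using the median across all peaks in the group instead.
precursor_mz <- median(group_peaks$mz)
retention_time <- median(group_peaks$rt)

print(representative)
print(c(precursor_mz = precursor_mz, retention_time = retention_time))
```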

Precursor ion purity calculations for mzXML and other file formats

The msPurity precursor ion purity calculation currently parses mzML files to automatically extract the isolation windows.

This is an issue if a user is using either an mzXML file or another file format (see #50).

Although mzML is the standard file format for proteomics and metabolomics, for backward compatibility it might be useful to be able to extract the isolation windows from other file formats such as mzXML.

averageIntraFragSpectra parameter remove_peaks = FALSE still removes spectra and features from outputs

Of the 5 spectra in the peak group only 2 are retained, and no flags are present to show which ones were removed.

> pa@av_spectra$1234
$av_intra
$av_intra$2
  cl       mz        i      snr      rsd count total inPurity        ra frac snr_pass_flag minfrac_pass_flag ra_pass_flag pass_flag
1  1 76.03936 260408.3 8.168558       NA     1     1        1 100.00000    1          TRUE              TRUE         TRUE      TRUE
2  2 87.05532 146866.0 4.606933       NA     1     1        1  56.39836    1          TRUE              TRUE         TRUE      TRUE

$av_intra$3
  cl       mz        i      snr      rsd count total inPurity        ra frac snr_pass_flag minfrac_pass_flag ra_pass_flag pass_flag
1  1 76.03939 387969.3 9.944705 53.27152     2     2        1  72.63818    1          TRUE              TRUE         TRUE      TRUE
2  2 87.05528 275511.9 7.062117 24.43868     2     2        1  51.58316    1          TRUE              TRUE         TRUE      TRUE

> pa@grped_ms2$1234
[[1]]
[,1] [,2]
[1,] 50.01557 921.1038
[2,] 51.02344 1703.2170
[3,] 52.01890 863.3292
[4,] 58.06551 1357.3210
[5,] 62.01553 713.1802
[6,] 62.92937 22061.6680
[7,] 63.02336 928.4113
[8,] 63.92888 4076.4280
[9,] 64.18902 704.4742
[10,] 64.92747 1621.6434
[11,] 70.06499 933.9158
[12,] 74.09650 721.6017
[13,] 76.03937 4613.7686
[14,] 82.64358 701.4169
[15,] 87.05553 1146.8417
[16,] 89.03884 1021.0724
[17,] 90.92950 775.0679
[18,] 95.04922 1272.6851
[19,] 98.06015 1559.4219
[20,] 105.96336 1471.1497
[21,] 107.95026 1670.2163
[22,] 116.13265 731.6583
[23,] 121.96613 1027.8542
[24,] 125.28481 777.2943
[25,] 133.06065 1336.6599
[26,] 144.07999 723.9156
[27,] 149.99741 1323.1761
[28,] 150.23601 687.5054
[29,] 162.05566 759.3861
[30,] 167.02388 3471.1697
[31,] 178.73198 813.5861

[[2]]
[,1] [,2]
[1,] 50.01559 1283.4817
[2,] 51.02339 1680.2363
[3,] 51.22222 552.0405
[4,] 52.89830 605.9756
[5,] 58.06548 1588.9603
[6,] 61.54425 685.7117
[7,] 62.01529 842.8239
[8,] 62.12664 631.6484
[9,] 62.92934 27617.1602
[10,] 63.02320 1746.2920
[11,] 63.88893 615.5449
[12,] 63.92894 5638.6279
[13,] 64.92746 3266.7673
[14,] 65.03875 738.9711
[15,] 66.15918 677.6177
[16,] 70.06518 684.8496
[17,] 74.09651 1855.7240
[18,] 76.03940 4177.2236
[19,] 79.20387 613.4620
[20,] 87.05527 3352.0144
[21,] 89.03860 1753.1893
[22,] 98.06007 2796.7131
[23,] 105.96281 1392.4180
[24,] 107.95052 1219.4855
[25,] 108.90427 925.5315
[26,] 116.07047 1023.8954
[27,] 118.06483 884.3790
[28,] 121.96614 1048.5869
[29,] 123.98153 1197.1688
[30,] 135.69659 879.2622
[31,] 144.08060 1583.9188
[32,] 149.01352 1434.3975
[33,] 149.99742 1520.7753
[34,] 162.05470 1677.3496
[35,] 167.02385 1517.7867

[[3]]
[,1] [,2]
[1,] 57.89136 27293.72
[2,] 68.13218 29213.34
[3,] 76.03936 260408.34
[4,] 87.05532 146866.03
[5,] 101.84837 31550.03
[6,] 160.96516 32208.68

[[4]]
[,1] [,2]
[1,] 53.16459 60097.17
[2,] 76.03942 534112.06
[3,] 77.78919 92281.82
[4,] 87.05525 323122.44
[5,] 91.48993 69106.78
[6,] 101.85719 76224.40
[7,] 102.97038 77937.10
[8,] 133.06093 86291.38
[9,] 152.53438 78336.17

[[5]]
[,1] [,2]
[1,] 51.35890 30431.23
[2,] 56.07704 29754.07
[3,] 76.03936 241826.47
[4,] 76.43085 31852.93
[5,] 80.44976 34344.87
[6,] 82.04591 31178.42
[7,] 85.86448 29377.30
[8,] 87.05531 227901.34
[9,] 104.75587 37067.77
[10,] 112.30231 30200.48
[11,] 122.46079 35904.29
[12,] 144.23920 35326.61
[13,] 176.26358 34763.23
[14,] 202.34436 39012.65

pa@av_intra_params
$minfrac
[1] 0.5

$minnum
[1] 1

$ppm
[1] 5

$snr
[1] 0

$ra
[1] 0

$av
[1] "median"

$sum_i
[1] FALSE

$plim
[1] 0.8

$ra_pre
[1] 0

$snr_pre
[1] 3

$cores
[1] 3

$remove_peaks
[1] FALSE

Update bio.tools entry

Hi! I came across your tool as part of a general review of Galaxy Metabolomics resources (GCC21 cofest).

It would be great if the bio.tools entry could be updated to also contain information regarding the input and output data. This would help possible positioning of the tools inside a workflow.

Also, links to the Galaxy Services could be added to the entry in order to increase the galaxy tool's visibility (https://biotools.readthedocs.io/en/latest/curators_guide.html#linktype).

Thank you very much!

purityX does not work when fill peaks has been used

purityX does not work when fill peaks has been used.

This is because fillpeaks 'creates' peaks that were not originally peak picked. As such they do not have any corresponding scan number or retention time that lines up to the original data.

Any suggestion to summarize xcmsMatchedResults?

In the output of spectralMatching, I can get xcmsMatchedResults like this:

    pid grpid       mz    mzmin    mzmax       rt   rtmin    rtmax npeaks mzML  LCMSMS_1  LCMSMS_2
30 1659    12 116.0706 116.0706 116.0706 47.73467 47.6888 47.78054      2    2 130337063 124086404
13 1659    12 116.0706 116.0706 116.0706 47.73467 47.6888 47.78054      2    2 130337063 124086404
14 1659    12 116.0706 116.0706 116.0706 47.73467 47.6888 47.78054      2    2 130337063 124086404
6  1659    12 116.0706 116.0706 116.0706 47.73467 47.6888 47.78054      2    2 130337063 124086404
7  1659    12 116.0706 116.0706 116.0706 47.73467 47.6888 47.78054      2    2 130337063 124086404
10 1659    12 116.0706 116.0706 116.0706 47.73467 47.6888 47.78054      2    2 130337063 124086404
12 1659    12 116.0706 116.0706 116.0706 47.73467 47.6888 47.78054      2    2 130337063 124086404
31 1659    12 116.0706 116.0706 116.0706 47.73467 47.6888 47.78054      2    2 130337063 124086404
16 1659    12 116.0706 116.0706 116.0706 47.73467 47.6888 47.78054      2    2 130337063 124086404
25 1659    12 116.0706 116.0706 116.0706 47.73467 47.6888 47.78054      2    2 130337063 124086404
   grp_name  lpid mid       dpc rdpc      cdpc mcount allcount library_precursor_type
30  M116T48 53039   7 0.9996946    1 0.6664630      1        2                 [M+H]+
13  M116T48 53818  13 0.9892785    1 0.6595190      1        2                 [M+H]+
14  M116T48 53819  14 0.9831214    1 0.6554143      1        2                 [M+H]+
6   M116T48 53824  19 0.9792574    1 0.8902340      1       10                 [M+H]+
7   M116T48 53825  20 0.9780311    1 0.6520207      1        2                 [M+H]+
10  M116T48 53815  10 0.9779941    1 0.6519961      1        2                 [M+H]+
12  M116T48 53817  12 0.9779793    1 0.6519862      1        2                 [M+H]+
31  M116T48 53827  22 0.9779269    1 0.6519513      1        2                 [M+H]+
16  M116T48 53820  15 0.9712787    1 0.6475191      1        2                 [M+H]+
25  M116T48 53830  25 0.9706852    1 0.9013506      1       13                 [M+H]+
   library_entry_name                    inchikey library_source_name library_compound_name
30            proline ONIBWKKTOPOVIA-UHFFFAOYSA-N            massbank             L-PROLINE
13            Proline ONIBWKKTOPOVIA-UHFFFAOYSA-N            massbank             L-PROLINE
14            Proline ONIBWKKTOPOVIA-UHFFFAOYSA-N            massbank             L-PROLINE
6             Proline ONIBWKKTOPOVIA-UHFFFAOYSA-N            massbank             L-PROLINE
7             Proline ONIBWKKTOPOVIA-UHFFFAOYSA-N            massbank             L-PROLINE
10            Proline ONIBWKKTOPOVIA-UHFFFAOYSA-N            massbank             L-PROLINE
12            Proline ONIBWKKTOPOVIA-UHFFFAOYSA-N            massbank             L-PROLINE
31            Proline ONIBWKKTOPOVIA-UHFFFAOYSA-N            massbank             L-PROLINE
16            Proline ONIBWKKTOPOVIA-UHFFFAOYSA-N            massbank             L-PROLINE
25            Proline ONIBWKKTOPOVIA-UHFFFAOYSA-N            massbank             L-PROLINE

Different compounds might be matched to different MS/MS spectra. Is there any suggestion for collapsing the results so that each XCMS peak group has its potential compound(s) and corresponding confidence values?

Thanks,

-Han
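One possible way to collapse such a table, sketched in base R, is to keep one row per peak group and compound with the best score and the number of supporting library hits. This is an assumption about what a useful summary might look like, not an msPurity feature. The first three rows reuse values from the table above; the fourth row is a hypothetical second compound added only so the grouping is visible.

```r
# Toy subset of xcmsMatchedResults-style output.
matches <- data.frame(
  grpid = c(12, 12, 12, 12),
  inchikey = c("ONIBWKKTOPOVIA-UHFFFAOYSA-N",
               "ONIBWKKTOPOVIA-UHFFFAOYSA-N",
               "ONIBWKKTOPOVIA-UHFFFAOYSA-N",
               "HYPOTHETICAL-KEY"),              # made-up entry
  library_compound_name = c("L-PROLINE", "L-PROLINE", "L-PROLINE",
                            "hypothetical compound"),
  dpc = c(0.9996946, 0.9892785, 0.9831214, 0.70),
  stringsAsFactors = FALSE
)

keys <- c("grpid", "inchikey", "library_compound_name")

# Best dot product cosine per peak group and compound.
best <- aggregate(dpc ~ grpid + inchikey + library_compound_name,
                  data = matches, FUN = max)
names(best)[names(best) == "dpc"] <- "best_dpc"

# Number of library spectra supporting each compound.
n_hits <- aggregate(dpc ~ grpid + inchikey + library_compound_name,
                    data = matches, FUN = length)
names(n_hits)[names(n_hits) == "dpc"] <- "n_hits"

summary_tbl <- merge(best, n_hits, by = keys)
summary_tbl <- summary_tbl[order(summary_tbl$grpid, -summary_tbl$best_dpc), ]
print(summary_tbl)
```

Grouping on the InChIKey rather than the library entry name avoids treating "proline" and "Proline" as different compounds.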

msPurity spectralMatching empty for negative mode

Hi Tom,

I have an issue processing my negative mode metabolomics MS/MS data with msPurity (same issue under Galaxy or under RStudio).
It may be my mistake, but I don't see where I'm going wrong. My Galaxy history is here https://workflow4metabolomics.usegalaxy.fr/u/yguitton44/h/copy-of-metabo-neg-dda
Do you have any idea?

It works in positive mode, but for an unknown reason I get "No Match found" when I use the negative mode input.

I have tested manually, and the MS/MS of grpid 400 matches phenylalanine on the MoNA website (accession PR309414).

Spectra extracted from MSP file generated by msPurity

NAME: MZ:164.0717 | RT:127.4 | grpid:400 | file:3 | adduct:NA
PRECURSORMZ: 164.071671352252
RETENTIONTIME: 127.374984741211
XCMS groupid (grpid): 400
COMMENT: Exported from msPurity purityA object using function createMSP, using method 'av_intra' msPurity version:1.12.2
Num Peaks: 7
72.0085405865847 402802.96875 36.04
91.0548060473143 47103.625 4.21
103.054815151088 68894.373046875 6.16
118.066677625233 6107.96728515625 0.55
147.044825781743 1117757.375 100
163.839736077141 44750.62890625 4
164.072411612146 1085359.75 97.1

I have also checked that my MetaboNEG.sqlite file from createDatabase is not empty, and I found that at some point the spectralMatching function removes all MS/MS spectra from the query (q_speakmeta is empty for negative mode MS/MS)

q_speakmeta <- msPurity:::filterSMeta(purity = q_purity, pol = q_pol,
instrumentTypes = q_instrumentTypes, instruments = q_instruments,
sources = q_sources, pids = q_pids, rtrange = q_rtrange,
con = q_con, xcmsGroups = q_xcmsGroups, spectraTypes = q_spectraTypes,
accessions = q_accessions)

Many thanks for your help
Yann

purity score calculation

Hi Thomas,

I'm working on a file from our laboratory and we have some questions about the purity score calculation: what is the isolation window, and where can we find it?

After some searching, we found the get_isolation_offsets function.

Our problem is that we can't find the lines it searches for:

if (!lowFound){
      low <- as.numeric(stringr::str_match(oneLine, '^.*name=\"isolation window lower offset\" value=\"([0-9]+\\.[0-9]+).*$')[,2])
      if(!is.na(low)){lowFound=TRUE}
    }

    if (!highFound){
      high <- as.numeric(stringr::str_match(oneLine, '^.*name=\"isolation window upper offset\" value=\"([0-9]+\\.[0-9]+).*$')[,2])
      if(!is.na(high)){highFound=TRUE}
    }

We used a Thermo Q Exactive spectrometer, and the line to search for would be the following:
<precursorMz precursorScanNum="367" precursorIntensity="4.065790625e05" activationMethod="HCD" windowWideness="2.0">251.104995727539</precursorMz>

I would add that here we have a window width that should be divided by 2...

What do you think? Is this something you have already faced?

Thanks !!

Julien
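The idea in this post can be sketched in base R: read the `windowWideness` attribute from the mzXML `precursorMz` line and halve it to get lower and upper offsets. This assumes the window is symmetric around the precursor m/z, as the post suggests. The helper name `offsets_from_mzxml_line` is hypothetical and is not part of msPurity.

```r
# The mzXML line quoted in the post.
one_line <- paste0(
  '<precursorMz precursorScanNum="367" ',
  'precursorIntensity="4.065790625e05" activationMethod="HCD" ',
  'windowWideness="2.0">251.104995727539</precursorMz>'
)

# Hypothetical helper: symmetric offsets from the windowWideness attribute.
# Returns NA offsets when the attribute is not present on the line.
offsets_from_mzxml_line <- function(line) {
  hit <- regmatches(line,
                    regexec('windowWideness="([0-9]+\\.?[0-9]*)"', line))[[1]]
  if (length(hit) < 2) {
    return(c(low = NA_real_, high = NA_real_))
  }
  half <- as.numeric(hit[2]) / 2
  c(low = half, high = half)
}

offsets <- offsets_from_mzxml_line(one_line)
print(offsets)
```

Unlike the pattern quoted earlier in the post, this expression also accepts integer widths such as `windowWideness="2"`.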

Correction for the reverse dot product cosine (rdpc) in the spectralMatching module

In the spectralMatching module, the code for calculating the reverse dot product cosine (rdpc) is:
rl <- aligned$l[!aligned$q==0]
rq <- aligned$q[!aligned$q==0]
rdpcOut <- dpc(rq, rl)
The code above removes the library peaks that have no match in the query spectrum. According to the definition of a reverse library search, we should instead remove the query peaks that have no match in the library spectrum. My correction for the reverse dot product cosine (rdpc) is:
rl <- aligned$l[!aligned$l==0]
rq <- aligned$q[!aligned$l==0]
rdpcOut <- dpc(rq, rl)

Further discussion can be accessed here
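The two filters discussed in this issue give different scores on the same aligned spectra, which a small base R sketch can show. The `dpc` function below is a plain cosine similarity and is only assumed to behave like the package's helper; the aligned intensities are made up, with 0 meaning "no matching peak in that spectrum".

```r
# Plain cosine similarity (assumed stand-in for the package's dpc helper).
dpc <- function(a, b) {
  sum(a * b) / sqrt(sum(a^2) * sum(b^2))
}

# Toy aligned spectra: q = query intensities, l = library intensities.
aligned <- data.frame(
  q = c(100, 50, 0, 20),
  l = c(100, 40, 30, 0)
)

# Filter as currently written in the issue: keep rows where the query has a peak.
keep_q <- aligned$q != 0
score_keep_query_peaks <- dpc(aligned$q[keep_q], aligned$l[keep_q])

# Filter as proposed in the issue: keep rows where the library has a peak.
keep_l <- aligned$l != 0
score_keep_library_peaks <- dpc(aligned$q[keep_l], aligned$l[keep_l])

print(c(keep_query_peaks = score_keep_query_peaks,
        keep_library_peaks = score_keep_library_peaks))
```

With these toy values the first filter ignores the unmatched library peak and the second ignores the unmatched query peak, so the choice of filter decides which kind of unmatched peak is penalised.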

fillChromPeaks and NA values

Hi Thomas !

I just ran some tests and got an error after running fillPeaks on a test dataset, because the result included NA values like this:

     mz    mzmin    mzmax        rt rtmin rtmax       into intb
26703 498.2523 498.2509 498.2528  56.96091    NA    NA 373449.017   NA
26704 498.2896 498.2889 498.2907  95.44486    NA    NA 291589.534   NA
26705 498.8040 498.8039 498.8053  37.53883    NA    NA  82723.066   NA
26706 498.8403 498.8397 498.8417  36.86524    NA    NA  58967.879   NA
26707 498.9380 498.9378 498.9383  31.88292    NA    NA  16315.707   NA
26708 499.6689 499.6676 499.6691 100.84240    NA    NA   8494.532   NA
            maxo sn sample is_filled   cid          filename rtminCorrected
26703 117439.977 NA      3         1 32416 dataset_15797.dat       54.23714
26704  82254.477 NA      3         1 32417 dataset_15797.dat       90.25576
26705  80204.953 NA      3         1 32418 dataset_15797.dat       36.72130
26706  61036.531 NA      3         1 32419 dataset_15797.dat       35.63115
26707  13742.608 NA      3         1 32420 dataset_15797.dat       30.48961
26708   8568.181 NA      3         1 32421 dataset_15797.dat       97.97043
      rtmaxCorrected
26703       60.16270
26704       97.45180
26705       38.78409
26706       38.52948
26707       32.86408
26708      101.93605

With these values, the matching produces NA lines, which results in an error during the ppm error calculation!

Have you seen an error like this? Have you ever tried the fillPeaks function before processing?

Thanks for the answer !
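One possible workaround, sketched here in base R, is to set aside filled peaks (and any rows with missing retention time bounds) before the matching step. This is only an assumption about how the NA rows could be avoided, not the package's fix. The first two rows reuse values from the table above; the third row is a hypothetical peak that was picked normally.

```r
# Toy peak table with the columns relevant to the problem.
peaks <- data.frame(
  mz = c(498.2523, 498.2896, 120.0655),
  rt = c(56.96091, 95.44486, 30.5),
  rtmin = c(NA, NA, 29.8),
  rtmax = c(NA, NA, 31.2),
  is_filled = c(1, 1, 0)   # 1 = created by peak filling
)

# Keep only peaks that were originally picked and have complete rt bounds.
complete_bounds <- !is.na(peaks$rtmin) & !is.na(peaks$rtmax)
usable <- peaks[peaks$is_filled == 0 & complete_bounds, ]

# Rows set aside, kept for reporting rather than silently dropped.
set_aside <- peaks[!(peaks$is_filled == 0 & complete_bounds), ]

print(usable)
print(nrow(set_aside))
```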

groupPeaks function (clustering bug)

The groupPeaks and groupPeaksEx functions have a bug: when more than one peak is clustered within the same dataframe, the resulting output contains duplicate rows.

The fix is to average (median) the values when this happens.
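The proposed fix can be illustrated with base R's `aggregate`, taking the median of each numeric column per cluster so that every cluster ends up as a single row. The column names `cl`, `mz` and `i` and the values are toy assumptions; this is not the code used inside groupPeaks.

```r
# Toy clustered peaks: two peaks share cluster 1.
clustered <- data.frame(
  cl = c(1, 1, 2),
  mz = c(76.03936, 76.03942, 87.05532),
  i  = c(260408.3, 534112.1, 146866.0)
)

# One row per cluster, using the median of m/z and intensity.
collapsed <- aggregate(cbind(mz, i) ~ cl, data = clustered, FUN = median)
print(collapsed)
```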

combineAnnotation function too slow

The combineAnnotation tool takes too long because of the API calls to PubChem and KEGG.

An option to run with a local database is required to speed up the processing.

createDatabase function

Hi,

I would like to use the createDatabase (or create_database) function with the Galaxy wrapper (see here https://github.com/computational-metabolomics/mspurity-galaxy/blob/master/tools/msPurity/createDatabase.R)

My question is: the documentation says "createDatabase replaces the create_database function", but on the msPurity GitHub I can see that it is the "create_database" function that has been updated?

So which one should I use ?
Because for the moment, I obtain this error with the wrapper and createDatabase :
Error in if (pa@filter_frag_params$allfrag) { :
argument is of length zero

I get this because I don't want to filter anything...

And I can see that this code is not in the "create_database" function!

So what should I do with these two functions?

Thanks!

Julien Saint-Vanne
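The "argument is of length zero" error arises in R whenever `if` is given `NULL`, for example when a parameter list was never filled in because the filtering step was skipped. A minimal sketch of the failure and of a defensive `isTRUE` guard follows. The plain list stands in for `pa@filter_frag_params`, and the guard is a general R pattern, not a statement of how the package handles it.

```r
# Stand-in for pa@filter_frag_params when no filtering step was run.
filter_frag_params <- list()

# Reproduce the failure: `if (NULL)` raises the error from the post.
err <- tryCatch(
  {
    if (filter_frag_params$allfrag) "filtered" else "unfiltered"
  },
  error = function(e) conditionMessage(e)
)
print(err)

# Defensive version: isTRUE(NULL) is FALSE, so a missing parameter
# is treated as "no filtering requested".
allfrag <- isTRUE(filter_frag_params$allfrag)
status <- if (allfrag) "filtered" else "unfiltered"
print(status)
```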