bihealth / neatms



NeatMS is an open source python package for untargeted LCMS signal labelling and filtering.

Home Page: https://neatms.readthedocs.io/en/latest/

License: MIT License

Languages: Python 78.13%, Jupyter Notebook 21.87%

Topics: neural-network, lcms, peak-detection

neatms's Introduction

NeatMS

Introduction

NeatMS is an open-source Python package for untargeted LCMS signal labelling and filtering. NeatMS enables automated filtering of false-positive MS¹ peaks reported by commonly used LCMS data processing pipelines. It relies on neural-network-based classification and is distributed as open-source software under the MIT License.

Source code

The source code and related materials (e.g. tutorials, example data, neural network model) are available at https://github.com/bihealth/NeatMS.

Reference & publication

Publication is currently pending; please cite the NeatMS GitHub repository when referencing NeatMS.

Documentation

Documentation is available at https://neatms.readthedocs.io/en/latest/

neatms's People

Contributors

alexzajac ricmonteiro

neatms's Issues

KeyError: 1.0 - Error while running XCMS input generating 'experiment'

When creating the experiment object with the ntms.Experiment method, I get an error while loading my own data. I followed the XCMS R input-generation steps you provide in xcms_input_table_gen.

I can send data and feature_table if needed.

import NeatMS as ntms

raw_data_folder_path = '../data/neg_data/'
feature_table_path = '../data/NeatMS_aligned_feature_table.csv'

# This is important for NeatMS to read the feature table correctly
input_data = 'XCMS'

exp = ntms.Experiment(raw_data_folder_path, feature_table_path, input_data)
  2021-09-12 13:41:54,483 | INFO : Data reader backend: pymzml
  2021-09-12 13:41:54,483 | INFO : Loading file 1 / 41
  2021-09-12 13:41:54,483 | INFO : Loading file 2 / 41
  2021-09-12 13:41:54,483 | INFO : Loading file 3 / 41
  2021-09-12 13:41:54,484 | INFO : Loading file 4 / 41
  2021-09-12 13:41:54,484 | INFO : Loading file 5 / 41
  2021-09-12 13:41:54,484 | INFO : Loading file 6 / 41
  2021-09-12 13:41:54,484 | INFO : Loading file 7 / 41
  2021-09-12 13:41:54,484 | INFO : Loading file 8 / 41
  2021-09-12 13:41:54,484 | INFO : Loading file 9 / 41
  2021-09-12 13:41:54,484 | INFO : Loading file 10 / 41
  2021-09-12 13:41:54,484 | INFO : Loading file 11 / 41
  2021-09-12 13:41:54,484 | INFO : Loading file 12 / 41
  2021-09-12 13:41:54,484 | INFO : Loading file 13 / 41
  2021-09-12 13:41:54,484 | INFO : Loading file 14 / 41
  2021-09-12 13:41:54,484 | INFO : Loading file 15 / 41
  2021-09-12 13:41:54,484 | INFO : Loading file 16 / 41
  2021-09-12 13:41:54,484 | INFO : Loading file 17 / 41
  2021-09-12 13:41:54,484 | INFO : Loading file 18 / 41
  2021-09-12 13:41:54,484 | INFO : Loading file 19 / 41
  2021-09-12 13:41:54,484 | INFO : Loading file 20 / 41
  2021-09-12 13:41:54,485 | INFO : Loading file 21 / 41
  2021-09-12 13:41:54,485 | INFO : Loading file 22 / 41
  2021-09-12 13:41:54,485 | INFO : Loading file 23 / 41
  2021-09-12 13:41:54,485 | INFO : Loading file 24 / 41
  2021-09-12 13:41:54,485 | INFO : Loading file 25 / 41
  2021-09-12 13:41:54,485 | INFO : Loading file 26 / 41
  2021-09-12 13:41:54,485 | INFO : Loading file 27 / 41
  2021-09-12 13:41:54,485 | INFO : Loading file 28 / 41
  2021-09-12 13:41:54,485 | INFO : Loading file 29 / 41
  2021-09-12 13:41:54,485 | INFO : Loading file 30 / 41
  2021-09-12 13:41:54,485 | INFO : Loading file 31 / 41
  2021-09-12 13:41:54,485 | INFO : Loading file 32 / 41
  2021-09-12 13:41:54,485 | INFO : Loading file 33 / 41
  2021-09-12 13:41:54,485 | INFO : Loading file 34 / 41
  2021-09-12 13:41:54,485 | INFO : Loading file 35 / 41
  2021-09-12 13:41:54,485 | INFO : Loading file 36 / 41
  2021-09-12 13:41:54,485 | INFO : Loading file 37 / 41
  2021-09-12 13:41:54,485 | INFO : Loading file 38 / 41
  2021-09-12 13:41:54,485 | INFO : Loading file 39 / 41
  2021-09-12 13:41:54,485 | INFO : Loading file 40 / 41
  2021-09-12 13:41:54,486 | INFO : Loading file 41 / 41
  2021-09-12 13:41:54,486 | INFO : Loading feature table: /data/NeatMS_aligned_feature_table.csv
  2021-09-12 13:41:54,486 | INFO : Feature table format: XCMS
  2021-09-12 13:41:55,132 | INFO : Feature table contains aligned peaks
  2021-09-12 13:41:55,144 | INFO : Loading 14483 features and 330493 peaks 
  Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
    File "/usr/local/lib/python3.9/site-packages/NeatMS/experiment.py", line 33, in __init__
      self.load_feature_table(feature_table_path, labels)
    File "/usr/local/lib/python3.9/site-packages/NeatMS/experiment.py", line 141, in load_feature_table
      feature_table.load_features(self.samples)
    File "/usr/local/lib/python3.9/site-packages/NeatMS/feature.py", line 639, in load_features
      self.load_unaligned_features(unaligned_feature_array, sample_map)
    File "/usr/local/lib/python3.9/site-packages/NeatMS/feature.py", line 492, in load_unaligned_features
      sample = sample_map[feature_array[10][entry]]
  KeyError: 1.0
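The missing key `1.0` looks like a numeric sample index rather than a file name, which hints (an assumption, not a confirmed diagnosis) that the value NeatMS reads as the sample identifier does not match any of the 41 loaded raw files. A quick way to check this before calling ntms.Experiment is to compare the identifiers in the feature table with the mzML files in the raw-data folder. The sketch below uses only the standard library; the function name `find_unmatched_samples` is hypothetical, and the default column name `sample_name` is assumed from the XCMS-derived feature_dataframe shown in another issue on this page.

```python
import csv
from pathlib import Path


def _strip_mzml(name):
    # Drop a trailing '.mzML' (any case) so 'run_01' and 'run_01.mzML' compare equal
    return name[:-5] if name.lower().endswith(".mzml") else name


def find_unmatched_samples(feature_table_path, raw_data_folder_path, column="sample_name"):
    """Return sorted sample identifiers from the feature table that match no mzML file.

    Hypothetical helper, not part of NeatMS. `column` is assumed to hold file names.
    """
    # Names (without extension) of all mzML files in the raw-data folder
    raw_names = {
        _strip_mzml(p.name)
        for p in Path(raw_data_folder_path).iterdir()
        if p.name.lower().endswith(".mzml")
    }

    unmatched = set()
    with open(feature_table_path, newline="") as handle:
        for row in csv.DictReader(handle):
            value = row[column]
            if _strip_mzml(value) not in raw_names:
                unmatched.add(value)
    return sorted(unmatched)
```

For this report, calling `find_unmatched_samples('../data/NeatMS_aligned_feature_table.csv', '../data/neg_data/')` and seeing values such as `1.0` in the result would suggest that the sample column holds indices or is in a different position than expected.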

Results show more samples than samples in input

Dear developers,

I have been trying out NeatMS on LC-MS data processed with XCMS.
First remark: not all of my XCMS peaks were assigned to a feature, which left NA values in the feature.id column for some peaks. I deleted the peaks that were not assigned to a feature, but I'm not sure whether this is good practice.
Everything else seems to work as expected, but when I looked at the results (using the code from the tutorial), I got this output:

# Feature collection: 7124
Number of consensus features labeled as 'High quality':
   of size 10 :   2100
   of size  9 :    953
   of size  0 :    749
   of size  8 :    668
   of size  7 :    532
   of size  6 :    431
   of size  4 :    362
   of size  1 :    345
   of size  5 :    336
   of size  3 :    281
   of size  2 :    262
   of size 11 :     27
   of size 13 :     16
   of size 12 :     14
   of size 14 :     10
   of size 15 :      9
   of size 20 :      8
   of size 19 :      7
   of size 16 :      6
   of size 17 :      6
   of size 18 :      1
   of size 23 :      1
        total :   7124

If I understand correctly, each line means that y features are present in x samples. How, then, can a feature be present in more samples than there are input samples (10)?
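A consensus feature can only report more than 10 members across 10 samples if at least one sample contributes several peaks to it. One way to test that hypothesis (an assumption, not an explanation confirmed by the NeatMS documentation) is to count, per feature, the number of peaks and the number of distinct samples in the aligned table. The function below is a hypothetical stdlib sketch; the column names `feature_id` and `sample` are assumptions and should be adjusted to the actual header of aligned_pos_no_NA.csv.

```python
from collections import defaultdict


def features_with_duplicate_samples(rows, feature_col="feature_id", sample_col="sample"):
    """Return {feature_id: (n_peaks, n_distinct_samples)} for features that
    contain more than one peak from the same sample.

    rows: iterable of dicts, e.g. a csv.DictReader over the aligned table.
    """
    peak_counts = defaultdict(int)
    sample_sets = defaultdict(set)
    for row in rows:
        fid = row[feature_col]
        peak_counts[fid] += 1
        sample_sets[fid].add(row[sample_col])
    # Keep only features where peaks outnumber the samples they come from
    return {
        fid: (peak_counts[fid], len(sample_sets[fid]))
        for fid in peak_counts
        if peak_counts[fid] > len(sample_sets[fid])
    }
```

Running it over `csv.DictReader(open('aligned_pos_no_NA.csv', newline=''))` and getting a non-empty result would indicate that some features group several peaks from one sample, which could explain sizes above 10.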

All files needed to reproduce this are attached, and the mzML files can be downloaded here: https://we.tl/t-mWPcWAXEGc
aligned peaks: aligned_pos_no_NA.csv
notebook: NeatMS.zip

Thanks in advance for the help!

Kind regards
Pablo

Number of peaks per sample in NeatMS experiment does not match

Hello,

I am following the tutorial of NeatMS for basic usage.

I have an experiment with 28 files that I have processed (peak picking, retention time adjustment based on subset, peak grouping, peak filling) in R with XCMS (version 3.16.1). I created the feature table as per the documentation and successfully created a NeatMS experiment.

However, I do not understand the number of peaks returned when I run this code:

for sample in experiment.samples:
    print('Sample {} : {} peaks'.format(sample.name, len(sample.feature_list)))

I will focus on one sample as an example (sample number 1, name = 03_MS31_20221004_stdmix_03.mzML).

I get the following from the code above for this sample:

Sample 03_MS31_20221004_stdmix_03 : 7039 peaks

However, according to the XCMS object ("res") in R and the feature_dataframe generated from it, this sample should have 7638 peaks:

> feature_dataframe %>% filter(sample == 1) %>% nrow()
[1] 7638
> feature_dataframe %>% filter(sample_name == "03_MS31_20221004_stdmix_03.mzML") %>% nrow()
[1] 7638
> chromPeakData(filterFile(res, 1)) %>% nrow()
[1] 7638

The number of peaks before peak filling is 3261:

> feature_dataframe <- as.data.frame(chromPeaks(dropAdjustedRtime(res)))
> feature_dataframe %>% filter(sample == 1) %>% nrow()
[1] 3261

Can you please help me understand where the number 7039 peaks is coming from?
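Because NeatMS parses the exported CSV rather than the R object, one way to localise the gap between 7638 and 7039 is to count rows per sample in that exact CSV and compare them with the per-sample counts NeatMS reports. The sketch below is hypothetical (not part of NeatMS) and assumes a `sample_name` column as in the feature_dataframe above. It strips the `.mzML` extension because the NeatMS output above prints sample names without it.

```python
from collections import Counter


def _strip_mzml(name):
    # NeatMS prints sample names without the '.mzML' extension
    return name[:-5] if name.lower().endswith(".mzml") else name


def peak_count_differences(rows, neatms_counts, sample_col="sample_name"):
    """Compare per-sample row counts in the feature table with NeatMS counts.

    rows: iterable of dicts (e.g. csv.DictReader over the exported table)
    neatms_counts: {sample name: number of peaks} collected from NeatMS
    Returns {sample: (rows_in_table, peaks_in_neatms)} for samples that differ.
    """
    table_counts = Counter(_strip_mzml(row[sample_col]) for row in rows)
    names = set(table_counts) | set(neatms_counts)
    return {
        name: (table_counts.get(name, 0), neatms_counts.get(name, 0))
        for name in sorted(names)
        if table_counts.get(name, 0) != neatms_counts.get(name, 0)
    }
```

The NeatMS side can be built with the same attributes used in the loop above, e.g. `{s.name: len(s.feature_list) for s in experiment.samples}`. If the CSV itself already shows 7039 rows for this sample, the difference arises during export rather than inside NeatMS.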

Index Error in annotation tool

When trying to generate training data with the annotation tool, the tool crashes after around 20-30 labels with an Index Error (see text below). Any advice would be appreciated. I am happy to share my notebook and data with you if necessary.

IndexError Traceback (most recent call last)
c:\users\ethan\appdata\local\programs\python\python37\lib\site-packages\NeatMS\peak.py in get_chromatogram(
self=<NeatMS.peak.Peak object>,
margin=1
)
106 rt_start, rt_end = self.get_window_margin(margin)
107 # We extract the values inside the peak window (mz and RT window)
--> 108 chromatogram = self.sample.extract_chromatogram(rt_start, rt_end, self.mz_min, self.mz_max)
chromatogram = undefined
self.sample.extract_chromatogram = <bound method Sample.extract_chromatogram of <NeatMS.sample.Sample object at 0x000002D0DA4AB188>>
rt_start = 16.961999999999996
rt_end = 17.562000000000005
self.mz_min = 1053.8347677336883
self.mz_max = 1053.8979997167116
109 # Containt [rt, intensity, mz]
110 return chromatogram

c:\users\ethan\appdata\local\programs\python\python37\lib\site-packages\NeatMS\sample.py in extract_chromatogram(
self=<NeatMS.sample.Sample object>,
rt_start=16.961999999999996,
rt_end=17.562000000000005,
mz_min=1053.8347677336883,
mz_max=1053.8979997167116
)
36
37 def extract_chromatogram(self, rt_start, rt_end, mz_min, mz_max):
---> 38 return self.raw_data.extract_chromatogram(rt_start, rt_end, mz_min, mz_max)
self.raw_data.extract_chromatogram = <bound method RawData.extract_chromatogram of <NeatMS.data.RawData object at 0x000002D0DA9ED1C8>>
rt_start = 16.961999999999996
rt_end = 17.562000000000005
mz_min = 1053.8347677336883
mz_max = 1053.8979997167116
39
40

c:\users\ethan\appdata\local\programs\python\python37\lib\site-packages\NeatMS\data.py in extract_chromatogram(
self=<NeatMS.data.RawData object>,
rt_start=16.961999999999996,
rt_end=17.562000000000005,
mz_min=1053.8347677336883,
mz_max=1053.8979997167116
)
26 def extract_chromatogram(self, rt_start, rt_end, mz_min, mz_max):
27 if self.MS1 is None:
---> 28 chromatogram = self.reader.extract_chromatogram(self.file, rt_start, rt_end, mz_min, mz_max)
chromatogram = undefined
self.reader.extract_chromatogram = <bound method PymzmlDataReader.extract_chromatogram of <NeatMS.data.PymzmlDataReader object at 0x000002D0DA6B9808>>
self.file = WindowsPath('../data/mzMLs/covid_plasma/tmp/B1_NIST1950_3_6540.mzML')
rt_start = 16.961999999999996
rt_end = 17.562000000000005
mz_min = 1053.8347677336883
mz_max = 1053.8979997167116
29 else:
30 # If The MS1 data has been extracted, then extract the chromatogram directly from the array (splited into two lines for comprehension purposes)

c:\users\ethan\appdata\local\programs\python\python37\lib\site-packages\NeatMS\data.py in extract_chromatogram(
self=<NeatMS.data.PymzmlDataReader object>,
file_path=WindowsPath('../data/mzMLs/covid_plasma/tmp/B1_NIST1950_3_6540.mzML'),
rt_start=16.961999999999996,
rt_end=17.562000000000005,
mz_min=1053.8347677336883,
mz_max=1053.8979997167116
)
111 # One more step is required as a single rt can contain several mz values, we need to find and sum those values
112 # Extract all unique rt values
--> 113 all_rt = np.unique(chromatogram[2])
all_rt = undefined
global np.unique = <function unique at 0x000002D1400B24C8>
global chromatogram = undefined
114 # Sum Intensities for rt values that are not unique (Equivalent of pandas groupby but using numpy arrays)
115 # Average mz for rt values that are not unique (mz values are kept for consistensy but not used)

IndexError: index 2 is out of bounds for axis 0 with size 0
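The traceback shows that no raw data points fell inside the requested window (RT 16.96 to 17.56, m/z 1053.83 to 1053.90), so the array handed to `np.unique` was empty. The comments in data.py describe the next step as summing intensities for repeated RT values, a NumPy equivalent of a pandas groupby. The sketch below illustrates that step with an explicit guard for an empty window. It is not the NeatMS implementation: the function name and the choice to pass RT, intensity, and m/z as separate 1-D arrays are assumptions made for clarity.

```python
import numpy as np


def sum_intensity_per_rt(rt, intensity, mz, rt_start, rt_end, mz_min, mz_max):
    """Return (unique_rt, summed_intensity, mean_mz) for points inside the window.

    Returns three empty arrays instead of raising when the window holds no data.
    """
    rt = np.asarray(rt, dtype=float)
    intensity = np.asarray(intensity, dtype=float)
    mz = np.asarray(mz, dtype=float)

    # Keep only the points inside both the RT and the m/z window
    mask = (rt >= rt_start) & (rt <= rt_end) & (mz >= mz_min) & (mz <= mz_max)
    if not mask.any():
        # Empty window: return empty arrays rather than indexing into nothing
        return np.empty(0), np.empty(0), np.empty(0)

    rt, intensity, mz = rt[mask], intensity[mask], mz[mask]
    # One scan (RT) can yield several m/z values: group points by RT
    unique_rt, group = np.unique(rt, return_inverse=True)
    summed = np.bincount(group, weights=intensity)
    # Average m/z per RT group (kept for consistency, as the original comment notes)
    mean_mz = np.bincount(group, weights=mz) / np.bincount(group)
    return unique_rt, summed, mean_mz
```

Returning empty arrays lets the caller decide how to handle a peak with no raw data in its window, for example by skipping it in the annotation tool, instead of crashing partway through labelling.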

feature_id from aligned_features not translated to NeatMS output

Hi there,
I have exported XCMS features from patRoon (rickhelmus/patRoon) and created an aligned feature list. After running NeatMS, the feature_id values that were originally in the aligned_features.csv are overridden with sequential numbers. Is it possible to keep the feature_id values throughout the analysis?

Thanks,

Doesn't seem to support targeted mzML data, right?

Hello,
I have a question, as stated in the title.
When I run predictions on untargeted mzML data, I get results successfully. However, when I run predictions on my targeted mzML data, predict_peaks() raises an error that I cannot resolve. I suspect this is because the algorithm does not support targeted data.

Annotation tool - dash host issue

Add a host option to the annotation tool to prevent it from crashing and to let users change the address.

About predict_peaks()

Hello, I have a question about running NeatMS. When I run predict_peaks(), I get the error below:
ValueError: Unexpected result of predict_function (Empty batch_outputs). Please use Model.compile(..., run_eagerly=True), or tf.config.run_functions_eagerly(True) for more information of where went wrong, or file a issue/bug to tf.keras.
How can I fix this problem? Thanks.

Getting Empty DataFrame/CSV file from experiment

Hi,
I've been trying to export the experiment built from the provided sample data to a DataFrame/CSV file, but I keep getting an empty DataFrame and/or file. I can see peaks and inspect the data in other ways, but for some reason nothing is returned when I export. Any ideas? Here is my code in case it helps:

import NeatMS as ntms
raw_data_folder_path = 'C:/Test/mzML'
feature_table_path = 'C:/Test/aligned_features.csv'
input_data = 'mzmine'
experiment = ntms.Experiment(raw_data_folder_path, feature_table_path, input_data)
my_dataframe = experiment.export_to_dataframe()
print(my_dataframe)