aberystwythsystemsbiology / dimepy

Python package for the high-throughput nontargeted metabolite fingerprinting of nominal mass direct injection mass spectrometry directly from mzML files.

Home Page: https://dimepy.readthedocs.io/en/latest/

License: GNU General Public License v3.0

Python 100.00%

metabolomics python direct-infusion-metablomics statistics

dimepy's Introduction

DIMEpy: Direct Infusion MEtabolomics processing in Python

Important news about the future of this project

It has been more than 4 years since I started this project as a learning exercise in metabolomics. Since then the project has been downloaded almost 50,000 times! I am appreciative of everyone who has got in contact with me over the past couple of years with comments and suggestions. I've met a number of amazing people through this project, and I'll forever see it as a key driver in pushing me on in the field of metabolomics.

Unfortunately, this project has become somewhat of a burden in recent years and I am overwhelmed with emails asking for assistance. I have asked numerous times for people to submit problems via the issues tracker, but this has sadly fallen on deaf ears. So, with all of the above in mind, I've decided to archive this project as read-only and suggest that those interested in new updates or submitting patches fork the project.

If you really like this project, that's ok too! The project is GPL licensed, so you can fork it and run it on whatever you like so long as you respect the terms of said license.

For my part, I'm extremely proud to have led this project, and I'm sorry I've been unable to commit more time to it for everyone. I hope you all understand.

Best wishes, Keiron

Features

- Loading mass spectrometry files from mzML.
  - Support for polarity switching.
  - MAD-estimated infusion profiling.
- Assay-wide outlier spectrum detection.
- Spurious peak elimination.
- Spectrum export for direct dissemination using Metaboanalyst.
- Spectral binning.
- Value imputation.
- Spectral normalisation.
  - including TIC, median, mean...
- Spectral transformation.
  - including log10, cube, nlog, log2, glog, sqrt, ihs...
- Export to array for statistical analysis in Metaboanalyst.

Installation

DIMEpy requires Python 3+ and is unfortunately not compatible with Python 2. If you are still using Python 2, a clever workaround is to install Python 3 and use that instead.

You can install it through pypi using pip:

pip install dimepy

If you want the 'bleeding edge' version of this, you can also install directly from this repository using git - but beware of dragons:

pip install git+https://www.github.com/AberystwythSystemsBiology/DIMEpy

Usage

To use the package, type the following into your Python console:

>>> import dimepy

At the moment, this pipeline only supports mzML files. You can easily convert proprietary formats to mzML using ProteoWizard.

Loading a single file

If you are only going to load in a single file for fingerprint matrix estimation, then just create a new Spectrum object. If the sample belongs to a class (its stratification), it is recommended that you also pass it through when instantiating a new Spectrum object.

>>> filepath = "/file/to/file.mzML"
>>> spec = dimepy.Spectrum(filepath, identifier="example", stratification="class_one")
/file/to/file.mzML

By default, the Spectrum object doesn't set a signal-to-noise ratio (SNR) estimator. It is strongly recommended that you set a signal-to-noise estimation method when instantiating the Spectrum object.

If your experimental protocol makes use of mixed-polarity scanning, then please ensure that you limit the scan ranges to best match what polarity you're interested in analysing:

>>> spec.limit_polarity("negative")

If you are using FIE-MS it is strongly recommended that you use just the infusion profile to generate your mass spectrum. For example, if your scan profiles look like this:

        |        _
      T |       / \
      I |      /   \_
      C |_____/       \_________________
        0     0.5     1     1.5     2 [min]

Then it is fair to assume that the infusion occurred during the scans ranging from 30 seconds to 1 minute. The limit_infusion() method does this by estimating the median absolute deviation (MAD) of the total ion counts (TIC), then limiting the profile to the time range singled out by the chosen multiple of the MAD:

>>> spec.limit_infusion(2) # 2 times the MAD.
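
To make the idea concrete, here is a minimal NumPy sketch of MAD-based infusion detection. It is not DIMEpy's implementation: the helper name `infusion_window` and the exact rule (a scan counts as part of the infusion when its TIC exceeds the median TIC by more than `threshold` times the MAD) are assumptions made for illustration.

```python
import numpy as np


def infusion_window(times, tics, threshold=2.0):
    """Return (start, end) of the scans whose TIC stands out from the baseline.

    Hypothetical helper, not part of DIMEpy. A scan is kept when its TIC is
    greater than median(TIC) + threshold * MAD(TIC).
    """
    times = np.asarray(times, dtype=float)
    tics = np.asarray(tics, dtype=float)

    median = np.median(tics)
    # Median absolute deviation of the TIC values.
    mad = np.median(np.abs(tics - median))

    mask = tics > median + threshold * mad
    if not mask.any():
        return None  # no scan stands out from the baseline

    selected = times[mask]
    return float(selected.min()), float(selected.max())


# Toy profile: a flat baseline with a burst between 0.75 and 1.25 minutes.
times = [0.0, 0.25, 0.5, 0.75, 1.0, 1.25, 1.5, 1.75, 2.0]
tics = [10, 11, 10, 90, 100, 80, 12, 10, 11]
print(infusion_window(times, tics, threshold=2.0))
```

Using the median and MAD rather than the mean and standard deviation keeps the baseline estimate from being dragged upwards by the infusion burst itself.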

Now, we are free to load in the scans to generate a base mass_spectrum:

>>> spec.load_scans()

You should now be able to access the generated mass spectrum using the masses and intensities attributes:

>>> spec.masses
array([ ... ])
>>> spec.intensities
array([ ... ])

Working with multiple files

A more realistic pipeline would be to use multiple mass-spectrum files. This is where things really start to get interesting. The SpectrumList object facilitates this through the use of the append method:

>>> speclist = dimepy.SpectrumList()
>>> speclist.append(spec)

You can loop over your files to generate Spectrum objects one after another, or do it manually if you want.
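
As a rough sketch of such a loop, the helper below collects every mzML file in a directory and derives an identifier from each file name. The name `find_mzml_files` and the directory layout are hypothetical; the dimepy calls in the trailing comment are the ones shown in this README and assume that `stratification` can be omitted.

```python
from pathlib import Path


def find_mzml_files(directory):
    """Return sorted (identifier, path) pairs for every .mzML file in a directory.

    Hypothetical helper, not part of DIMEpy. The identifier is the file name
    without its extension.
    """
    pairs = []
    for path in sorted(Path(directory).glob("*.mzML")):
        pairs.append((path.stem, str(path)))
    return pairs


# Usage with the calls shown above (needs dimepy and real mzML files):
#
# speclist = dimepy.SpectrumList()
# for identifier, filepath in find_mzml_files("/path/to/mzml"):
#     spec = dimepy.Spectrum(filepath, identifier=identifier)
#     spec.limit_infusion(2)
#     spec.load_scans()
#     speclist.append(spec)
```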

If you're only using this pipeline to extract mass spectra for Metaboanalyst, then you can now simply call the to_csv method:

>>> speclist.to_csv("/path/to/output.csv", output_type="metaboanalyst")

That being said, this pipeline contains many of the preprocessing methods found in Metaboanalyst - so it may be easier for you to just use ours.

As a diagnostic measure, the TIC can provide an estimate of factors that may adversely affect the overall intensity count of a run. As a rule, it is common to remove spectra in which the TIC deviates from the median by more than 2 or 3 times the median absolute deviation (MAD). We can do this by calling the detect_outliers method:

>>> speclist.detect_outliers(thresh = 2, verbose=True)
Detected Outliers: outlier_one;outlier_two
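
The rule itself can be sketched in a few lines of NumPy. This is an illustration of MAD-based outlier flagging under the assumption that each spectrum is summarised by a single TIC value; `tic_outliers` is a hypothetical name and the code is not taken from DIMEpy.

```python
import numpy as np


def tic_outliers(identifiers, tics, thresh=2.0):
    """Return identifiers whose TIC lies more than thresh * MAD from the median.

    Hypothetical helper, not part of DIMEpy.
    """
    tics = np.asarray(tics, dtype=float)
    median = np.median(tics)
    deviations = np.abs(tics - median)
    mad = np.median(deviations)
    return [name for name, dev in zip(identifiers, deviations) if dev > thresh * mad]


ids = ["s1", "s2", "s3", "s4", "s5"]
tics = [100, 102, 99, 101, 250]
print("Detected Outliers:", ";".join(tic_outliers(ids, tics, thresh=2)))
```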

A common first step in the analysis of mass-spectrometry data is to bin the data to a given mass-to-charge (m/z) width. To do this for all Spectrum objects held within our SpectrumList object, simply apply the bin method:

>>> speclist.bin(0.25) # binning our data to a bin width of 0.25 m/z
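
For readers unfamiliar with binning, the following NumPy sketch shows one way fixed-width m/z binning can work. It assumes that intensities falling into the same bin are summed and that bins are counted from m/z 0; DIMEpy may aggregate differently, and `bin_spectrum` is a hypothetical name.

```python
import numpy as np


def bin_spectrum(masses, intensities, bin_width=0.25):
    """Sum intensities into fixed-width m/z bins.

    Hypothetical helper, not part of DIMEpy. Returns the lower edge of each
    occupied bin and the summed intensity inside it.
    """
    masses = np.asarray(masses, dtype=float)
    intensities = np.asarray(intensities, dtype=float)

    # Index of the bin each peak falls into, counted from m/z 0.
    indices = np.floor(masses / bin_width).astype(int)

    # Collapse peaks that share a bin index and sum their intensities.
    unique, inverse = np.unique(indices, return_inverse=True)
    summed = np.bincount(inverse, weights=intensities)

    return unique * bin_width, summed


bin_edges, binned = bin_spectrum(
    [100.05, 100.10, 100.30, 101.00], [10, 5, 7, 3], bin_width=0.25
)
print(bin_edges, binned)
```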

In FIE-MS, null values should account for no more than 3% of the total number of identified bins. However, imputation is still required to streamline the analysis process (as most multivariate techniques are unable to accommodate missing data points). To perform value imputation, just use value_imputate:

>>> speclist.value_imputate()
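
This README does not say which imputation strategy value_imputate applies. As an illustration only, the sketch below uses one common choice for metabolomics data: replacing each missing value with half of the smallest observed intensity in the same sample. The function name and the strategy are assumptions, not DIMEpy's documented behaviour.

```python
import numpy as np


def impute_half_minimum(matrix):
    """Replace NaNs in each row (sample) with half of that row's minimum.

    Hypothetical helper, not part of DIMEpy. Rows that are entirely missing
    are left untouched.
    """
    matrix = np.array(matrix, dtype=float)  # np.array copies, so the input is not modified
    for row in matrix:  # each row is a view into the copied matrix
        missing = np.isnan(row)
        if missing.all():
            continue
        row[missing] = np.nanmin(row) / 2.0
    return matrix


samples = [
    [4.0, np.nan, 2.0],
    [np.nan, 6.0, 8.0],
]
print(impute_half_minimum(samples))
```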

Transforming and normalising the Spectrum objects in a sample-independent fashion can now be done using the following:

>>> speclist.transform()
>>> speclist.normalise()
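
"Sample-independent" here means that each spectrum is processed using only its own values. The sketch below illustrates two of the methods named in the feature list, TIC normalisation and log10 transformation, on a single intensity vector. The function names are hypothetical and the code is a plain NumPy illustration, not DIMEpy's implementation.

```python
import numpy as np


def tic_normalise(intensities):
    """Scale a spectrum so that its intensities sum to 1 (hypothetical helper)."""
    intensities = np.asarray(intensities, dtype=float)
    return intensities / intensities.sum()


def log10_transform(intensities):
    """Apply a log10 transformation to strictly positive intensities (hypothetical helper)."""
    return np.log10(np.asarray(intensities, dtype=float))


print(tic_normalise([1, 1, 2]))
print(log10_transform([1, 10, 100]))
```

Because neither function looks at any other sample, the result for one spectrum does not change when spectra are added to or removed from the list.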

Once completed, you are now free to export the data to a data matrix:

>>> speclist.to_csv("/path/to/proc_metabo.csv", output_type="matrix")

This should give you something akin to:

Sample ID	M0	M1	M2	M3	...
Sample 1	213	634	3213	546	...
Sample 2	132	34	713	6546	...
Sample 3	1337	42	69	420	...

Bug reporting and feature suggestions

Please report all bugs or feature suggestions to the issues tracker. Please do not email me directly as I'm struggling to keep track of what needs to be fixed.

We welcome all sorts of contribution, so please be as candid as you want(!)

Documentation

Documentation for the project can be found on its readthedocs page.

Contributors

Lead Developer: Keiron O'Shea
Developer: Rob Bolton
Project Supervisor: Chuan Lu
Project Supervisor: Luis AJ Mur
Methods Expert: Manfred Beckmann

License

DIMEpy is licensed under the GNU General Public License v3.0.

dimepy's Issues

Submitting Data

I stopped at above step
Can you share a short video on how to submit a data?

Replicate support?

Is it possible to use this tool for experiments that make use of assay replicates?

Extension of normalisation techniques

The following methods need to be merged from existing code:

MS-total useful signal (MSTUS)
MAD
PQN
Cyclic LOWESS

Binning fails on Microsoft Windows

Platform: Microsoft Windows 10 (x64 bit)
Code to reproduce:

import dimepy
import matplotlib.pyplot as plt

sl = dimepy.SpectrumList()

# Raw strings keep the backslashes in the Windows paths literal.
s0 = dimepy.Spectrum(r"E:\Data\Mixes\mzml\Mix1a.mzML", polarity="positive", snr_estimator="mean")
s0.baseline_correction(qtl=0.6)
sl.append(s0)
s1 = dimepy.Spectrum(r"E:\Data\Mixes\mzml\Mix2a.mzML", polarity="positive", snr_estimator="mean")
s1.baseline_correction(qtl=0.6)
sl.append(s1)

slp = dimepy.SpectrumListProcessor(sl)

slp.binning(n_jobs=2)

Description: I am unable to perform multiprocessed binning on a Microsoft Windows VM.

Stacktrace:

(dimepy-dev) C:\Users\keiron\Documents>python test.py
Traceback (most recent call last):
  File "test.py", line 15, in <module>
    slp.binning(n_jobs=2)
  File "C:\Users\keiron\vens\dimepy-dev\lib\site-packages\dimepy\SpectrumListProcessor.py", line 134, in binning
    _bin, [spectrum for spectrum in self.to_list()]).get()
  File "C:\Users\keiron\vens\dimepy-dev\lib\site-packages\multiprocess\pool.py", line 567, in get
    raise self._value
cPickle.PicklingError: Can't pickle <type 'function'>: attribute lookup __builtin__.function failed

Trouble installing

Describe the bug

I've just caught up with my inbox and a couple of you have emailed me in regards to being unable to install dimepy on newer versions of Python. I can't believe I'm having to say this, but please, in future, submit an issue as I prioritise them over cold emails. My inbox is not a suitable place to report issues, or ask for help.

To Reproduce
Steps to reproduce the behavior:

Set up a virtualenv
Install dimepy through pip
Attempt to import dimepy.
See error, given below:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
RuntimeError: module compiled against API version 0xe but this version of numpy is 0xd

---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
<ipython-input-44-01df2d44ecbd> in <module>
----> 1 from dimepy import Spectrum

~/Projects/mtbls/env/lib/python3.9/site-packages/dimepy/__init__.py in <module>
      2 # encoding: utf-8
      3 
----> 4 from .spectrum import Spectrum
      5 from .spectrumList import SpectrumList
      6 from .scan import Scan

~/Projects/mtbls/env/lib/python3.9/site-packages/dimepy/spectrum.py in <module>
     29 import itertools
     30 import os
---> 31 import matplotlib.pyplot as plt
     32 
     33 """

~/Projects/mtbls/env/lib64/python3.9/site-packages/matplotlib/__init__.py in <module>
    172 
    173 
--> 174 _check_versions()
    175 
    176 

~/Projects/mtbls/env/lib64/python3.9/site-packages/matplotlib/__init__.py in _check_versions()
    157     # Quickfix to ensure Microsoft Visual C++ redistributable
    158     # DLLs are loaded before importing kiwisolver
--> 159     from . import ft2font
    160 
    161     for modname, minver in [

ImportError: numpy.core.multiarray failed to import

Expected behavior
For it to be imported.

Desktop (please complete the following information):

OS: Red Hat Fedora 40 GNU/Linux
Python Version: 3.9

Extend SpectrumList to take file path.

Ideally I'd like to emulate xcms and FIEMSpro in being able to provide a filepath and some Spectrum parameters which will "automate" the loading process - as opposed to only offering .append() functionality.

TypeError when creating a "blank" Spectrum object

Platform: Microsoft Windows 10 (x64 bit)
Code to reproduce:

import dimepy

s = dimepy.Spectrum()

Description: When testing on a Windows VM, DIMEpy crashes when a blank Spectrum object is created. Not sure if this is consistent across all platforms.

Stacktrace:

(dimepy-dev) C:\Users\keiron\Documents>python test.py
Traceback (most recent call last):
  File "test.py", line 3, in <module>
    s = dimepy.Spectrum()
  File "C:\Users\keiron\vens\dimepy-dev\lib\site-packages\dimepy\Spectrum.py", line 70, in __init__
    self._get_id_from_fp()
  File "C:\Users\keiron\vens\dimepy-dev\lib\site-packages\dimepy\Spectrum.py", line 98, in _get_id_from_fp
    self.id = os.path.splitext(os.path.basename(self.file_path))[0]
  File "C:\Users\keiron\vens\dimepy-dev\lib\ntpath.py", line 208, in basename
    return split(p)[1]
  File "C:\Users\keiron\vens\dimepy-dev\lib\ntpath.py", line 180, in split
    d, p = splitdrive(p)
  File "C:\Users\keiron\vens\dimepy-dev\lib\ntpath.py", line 115, in splitdrive
    if len(p) > 1:
TypeError: object of type 'NoneType' has no len()

list of known analytes intensity extraction

Hi,
I have a list of known analytes (around 50-70), and I want to extract the m/z and intensity of only those analytes from the DIMS data. Is there any way to apply the known list to the DIMEpy process?
I also want to normalise the data against the respective internal standard in each file. Is there any way to do this?
I am able to generate Metaboanalyst ZIP data for multiple files, but I am still facing a problem with matrix data file generation for multiple files. Can you provide example Python code for matrix file generation?

thanks
Satish

Documentation

Right now, there exists zero documentation other than the example provided in the README. It would be good to generate some documentation with Sphinx, perhaps hosting it on a site like readthedocs.

RuntimeWarning: invalid value encountered in greater

Platform: Microsoft Windows 10 (x64 bit)
Code to reproduce:

import dimepy

s = dimepy.Spectrum(r"E:\Data\Mixes\mzml\Mix1a.mzML", polarity="positive", snr_estimator="mean")

Description: When testing on a Windows VM, the following warnings are given.

Stacktrace:

(dimepy-dev) C:\Users\keiron\Documents>python test.py
C:\Users\keiron\vens\dimepy-dev\lib\site-packages\dimepy\Spectrum.py:262: RuntimeWarning:

divide by zero encountered in divide

C:\Users\keiron\vens\dimepy-dev\lib\site-packages\dimepy\Spectrum.py:262: RuntimeWarning:

invalid value encountered in divide

C:\Users\keiron\vens\dimepy-dev\lib\site-packages\dimepy\Spectrum.py:263: RuntimeWarning:

invalid value encountered in greater
