Characteristic Substructure of Metabolites Application

This project is an implementation of a method (PDF) that should help identify metabolites. It requires two inputs:

  • A fragmentation pattern of a molecule.
  • Candidate molecules, obtained by looking up the spectral data in a major chemical database or search engine such as HMDB or PubChem.

It then produces a Characteristic Substructure (CS): a molecule that is representative of all the input molecules. The CS can optionally be fed into a tool such as CFM-ID, which uses machine learning and heuristics to break the molecule back down into a fragmentation pattern. Ideally, if this predicted pattern matches the original one, the algorithm has produced a good representation.

As this is a lot of information, here is a visualisation of how the application is intended to be used.

[Figure: visualisation of the intended workflow of the application]

More details on how the algorithms work and on the general flow of the application are given here.
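The final comparison step described above (does the pattern predicted from the CS match the original one?) can be illustrated with a minimal sketch. It is not the scoring used by this project or by CFM-ID. It assumes each fragmentation pattern is a list of (m/z, intensity) pairs and uses a simple cosine-style score with a hypothetical m/z tolerance `tol`. The peak values are invented for illustration.

```python
from __future__ import division, print_function
import math


def match_score(observed, predicted, tol=0.01):
    """Cosine-style similarity between two peak lists of (mz, intensity).

    Each observed peak is paired with the closest unused predicted peak
    whose m/z lies within `tol`. Returns a value between 0.0 and 1.0.
    """
    used = set()
    dot = 0.0
    for mz_o, int_o in observed:
        best, best_diff = None, tol
        for j, (mz_p, _) in enumerate(predicted):
            diff = abs(mz_o - mz_p)
            if j not in used and diff <= best_diff:
                best, best_diff = j, diff
        if best is not None:
            used.add(best)
            dot += int_o * predicted[best][1]
    norm_o = math.sqrt(sum(i * i for _, i in observed))
    norm_p = math.sqrt(sum(i * i for _, i in predicted))
    if norm_o == 0 or norm_p == 0:
        return 0.0
    return dot / (norm_o * norm_p)


# Illustrative (made-up) peak lists: original spectrum vs. pattern
# predicted from the CS.
original = [(50.0, 100.0), (77.04, 40.0), (105.03, 15.0)]
from_cs = [(50.0, 90.0), (77.04, 55.0)]
print("similarity: %.3f" % match_score(original, from_cs))
```

A score close to 1 would suggest that the CS reproduces the original pattern well. Which threshold counts as "good" is a judgment call that this sketch does not settle.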

Getting Started

This section explains the requirements of the application and how to get it running.

Prerequisites

The application has the following dependencies:

  • Python 2 - the language the application is implemented in.
  • Python Enum - needed for compatibility purposes.
  • NetworkX - a graph library that is used to create the Characteristic Substructure.

Optionally

  • Matplotlib, NumPy and RDKit - required to draw molecules.
  • PubChemPy - required if you want to do lookups on the PubChem database.
  • CFM-ID - used to create a fragmentation pattern from a molecule. An older version is already included by default for testing purposes (it is a Windows binary, so it will probably not work on Linux distributions).
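Before running the application, it can help to check which of the Python dependencies above are importable. The following is a minimal sketch, not part of the project. The module names are the usual import names of the listed packages (`enum` being the name under which the Python 2 Enum backport is imported). CFM-ID is an external binary and is not covered.

```python
from __future__ import print_function
import importlib

# Usual import names for the packages listed above.
REQUIRED = ["enum", "networkx"]
OPTIONAL = ["matplotlib", "numpy", "rdkit", "pubchempy"]


def check_modules(names):
    """Return a dict mapping each module name to True if it can be imported."""
    status = {}
    for name in names:
        try:
            importlib.import_module(name)
            status[name] = True
        except ImportError:
            status[name] = False
    return status


if __name__ == "__main__":
    for label, names in (("required", REQUIRED), ("optional", OPTIONAL)):
        for name, ok in sorted(check_modules(names).items()):
            print("%-9s %-11s %s" % (label, name, "found" if ok else "MISSING"))
```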

Installing

The dependencies listed above must be installed before the application can be used.

From there, all that is needed is to download the src folder and run src/main.py.

Running

The application takes its inputs on the command line and has two modes:

main.py [-h] {cs,rm}

{cs,rm}
  cs        Find characteristic substructure and optionally use
            fragmentation comparison.
  rm        Find a the best-matching molecule from a list, when building 2
            CS.

The cs mode creates a characteristic substructure from a list of molecules and draws it if the drawing libraries are installed. The rm mode is an experimental process that finds the closest-fitting molecule across several lists of candidate molecules via the CS.

At a bare minimum, you need to run python main.py cs FILE_NAME to create a CS. Many more options are available and can be listed via python main.py cs -h.
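For readers unfamiliar with this style of command line, the two-mode layout can be sketched with argparse sub-commands. This is only an illustration of the interface shape shown above, not the project's actual main.py. The FILE_NAME positional and the -img flag are taken from this README; the arguments of the rm mode and all help strings are assumptions.

```python
from __future__ import print_function
import argparse


def build_parser():
    """Build a parser with two sub-commands, mirroring `main.py [-h] {cs,rm}`."""
    parser = argparse.ArgumentParser(prog="main.py")
    modes = parser.add_subparsers(dest="mode")

    cs = modes.add_parser("cs", help="find a characteristic substructure")
    cs.add_argument("file_name", metavar="FILE_NAME",
                    help="file containing the list of candidate molecules")
    cs.add_argument("-img", action="store_true",
                    help="also draw the CS as an image")

    # Assumption: the rm mode takes several candidate lists as files.
    rm = modes.add_parser("rm", help="find the best-matching molecule")
    rm.add_argument("file_names", metavar="FILE_NAME", nargs="+",
                    help="hypothetical: one file per candidate list")
    return parser


args = build_parser().parse_args(
    ["cs", "../test_data/acetylaminofluorenes.txt", "-img"])
print("mode=%s file=%s img=%s" % (args.mode, args.file_name, args.img))
```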

Example

Some example files of metabolites and spectral patterns are provided in test_data and can be used as inputs. Results are written to output_data. Here is an example of running one of them:

python main.py cs ../test_data/acetylaminofluorenes.txt

This creates cs.txt in output_data/acetylaminofluorenes, containing the created CS, along with an image of it. Optionally, the -img flag can be specified to have the CS drawn in the same folder as cs.png.
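The example suggests that the output folder is named after the input file without its extension. Assuming that pattern holds for other inputs (this README only shows one case), the expected result paths can be derived as in the sketch below. The helper `expected_outputs` and its `output_root` parameter are hypothetical and not part of the application.

```python
from __future__ import print_function
import os


def expected_outputs(input_path, output_root="output_data"):
    """Guess where cs.txt and cs.png land for a given input file.

    Assumes the output folder is <output_root>/<input name without extension>,
    as in the acetylaminofluorenes example.
    """
    name = os.path.splitext(os.path.basename(input_path))[0]
    folder = os.path.join(output_root, name)
    return os.path.join(folder, "cs.txt"), os.path.join(folder, "cs.png")


txt_path, png_path = expected_outputs("../test_data/acetylaminofluorenes.txt")
print(txt_path)
print(png_path)
```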

Contributing

Please contact me if you want to contribute to this project. There is room for enhancement with regard to the characteristic substructure and the heuristic choices.
