Comments (8)

michaelwitting commented on June 30, 2024

Well, this is why I'm working with the MassBank record format here. It is rich in metadata and human readable, but also easy to parse due to partially controlled variables. The functions I wrote read this format into a Spectra object and will also write back to a MassBank record.
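
As a rough illustration of why the record format is easy to parse, here is a minimal Python sketch. It is not the masstrixR code (which is R and builds a Spectra object); the function name `parse_massbank_record` is hypothetical, and the example record, including its accession and peak values, is made up. It assumes the usual `TAG: value` lines, a `PK$PEAK:` header followed by indented peak rows, and `//` as the end-of-record marker.

```python
def parse_massbank_record(text):
    """Parse one MassBank-style record into (fields, peaks).

    fields maps each tag to a list of values (tags such as CH$NAME may repeat).
    peaks is a list of (mz, intensity, relative_intensity) tuples.
    """
    fields = {}
    peaks = []
    in_peaks = False
    for line in text.splitlines():
        if line.startswith("//"):  # end-of-record marker
            break
        if in_peaks and line.startswith("  "):
            # Indented rows after PK$PEAK hold: m/z, intensity, relative intensity
            mz, intensity, rel_intensity = line.split()[:3]
            peaks.append((float(mz), float(intensity), float(rel_intensity)))
            continue
        in_peaks = False
        tag, sep, value = line.partition(": ")
        if not sep:
            continue  # skip blank or malformed lines
        if tag == "PK$PEAK":
            in_peaks = True  # the value is only the column header
        else:
            fields.setdefault(tag, []).append(value)
    return fields, peaks


# Made-up record for illustration only (not real MassBank data).
example_record = """
ACCESSION: EXAMPLE-000001
CH$NAME: Caffeine
CH$NAME: 1,3,7-trimethylxanthine
CH$FORMULA: C8H10N4O2
PK$NUM_PEAK: 3
PK$PEAK: m/z int. rel.int.
  110.0710 1200 150
  138.0660 8000 999
  195.0880 4000 500
//
"""

fields, peaks = parse_massbank_record(example_record)
print(fields["CH$NAME"], len(peaks))
```

Subtags inside a value (for example in the `AC$` lines) are left unparsed here; a real reader would need to handle those as well.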

By the way: I implemented two new spectra comparison methods in masstrixR. One is a standard forward score (a.k.a. dot product) that aligns the spectra instead of binning them; the second is a reverse score (reverse dot product), which uses only the peaks that are present in the library spectrum. If the match is good, both should be quite high. If the forward score is low and the reverse score is high, then you either have a lot of "contaminating" peaks in your query spectrum or it is just the wrong hit.
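
To make the difference between the two scores concrete, here is a minimal Python sketch. It is not the masstrixR implementation: the function names, the greedy nearest-peak alignment, the absolute m/z tolerance `tol`, and the unweighted cosine are all assumptions made for illustration.

```python
import math


def align_peaks(query, library, tol=0.01):
    """Greedily pair each library peak with the closest unused query peak.

    Spectra are lists of (mz, intensity). Returns a list of
    (query_intensity, library_intensity) pairs, one per library peak
    (query intensity is 0.0 if nothing matched), plus the intensities
    of the query peaks that matched no library peak.
    """
    pairs = []
    used = set()
    for lib_mz, lib_int in library:
        best, best_diff = None, tol
        for i, (q_mz, _) in enumerate(query):
            diff = abs(q_mz - lib_mz)
            if i not in used and diff <= best_diff:
                best, best_diff = i, diff
        if best is None:
            pairs.append((0.0, lib_int))
        else:
            used.add(best)
            pairs.append((query[best][1], lib_int))
    unmatched = [q_int for i, (_, q_int) in enumerate(query) if i not in used]
    return pairs, unmatched


def cosine(pairs):
    """Normalized dot product over (query, library) intensity pairs."""
    dot = sum(q * l for q, l in pairs)
    norm_q = math.sqrt(sum(q * q for q, _ in pairs))
    norm_l = math.sqrt(sum(l * l for _, l in pairs))
    return dot / (norm_q * norm_l) if norm_q and norm_l else 0.0


def forward_score(query, library, tol=0.01):
    """Uses all peaks: unmatched query peaks count against the score."""
    pairs, unmatched = align_peaks(query, library, tol)
    return cosine(pairs + [(q, 0.0) for q in unmatched])


def reverse_score(query, library, tol=0.01):
    """Uses only peaks present in the library spectrum."""
    pairs, _ = align_peaks(query, library, tol)
    return cosine(pairs)


# Made-up spectra: the query contains the library peaks plus two contaminants.
library = [(100.0, 50.0), (150.0, 100.0)]
query = library + [(120.0, 200.0), (130.0, 200.0)]
print(forward_score(query, library), reverse_score(query, library))
```

With these made-up spectra the reverse score is high while the forward score is low, which is the "contaminated query" pattern described above.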

from compounddb.

stanstrup commented on June 30, 2024

Yes, those don't have spectra.

jorainer commented on June 30, 2024

A little complication from HMDB: HMDB provides one XML file for each spectrum associated with a compound. The same spectrum (same values) can, however, be associated with different compounds. In that case HMDB uses the same spectrum_id, but provides two (or more) XML files, one for each compound_id.

A complicated solution to handle this would be:

  • Insert only unique spectra into the spectrum table.
  • Add an additional table providing the mapping between the spectrum and compound tables (to handle the n:m mapping).

Disadvantage: queries are more complicated and possibly slower.

Simple solution:

  • Insert each spectrum as it is provided, but assign our own internal, unique ID to each spectrum.
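
The two options can be compared with a small sqlite3 sketch. The table names, column names and IDs below are hypothetical and do not reflect the actual compounddb schema; the point is only to show the extra join the n:m variant needs and the duplicated rows the simple variant accepts.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
-- Variant 1 (n:m): unique spectra plus a mapping table
CREATE TABLE spectrum_nm (
    spectrum_id TEXT PRIMARY KEY,
    peaks TEXT
);
CREATE TABLE compound_spectrum (
    compound_id TEXT,
    spectrum_id TEXT REFERENCES spectrum_nm(spectrum_id),
    PRIMARY KEY (compound_id, spectrum_id)
);
-- Variant 2 (simple): one row per provided file, with an internal ID
CREATE TABLE spectrum_simple (
    internal_id INTEGER PRIMARY KEY AUTOINCREMENT,
    original_spectrum_id TEXT,
    compound_id TEXT,
    peaks TEXT
);
""")

# Made-up input: the same spectrum is provided once per compound.
records = [
    ("CMP_A", "SPEC_1", "100:50 150:100"),
    ("CMP_B", "SPEC_1", "100:50 150:100"),
]
for compound_id, spectrum_id, peaks in records:
    # Variant 1: store the spectrum once, then record the association
    con.execute("INSERT OR IGNORE INTO spectrum_nm VALUES (?, ?)",
                (spectrum_id, peaks))
    con.execute("INSERT INTO compound_spectrum VALUES (?, ?)",
                (compound_id, spectrum_id))
    # Variant 2: store every provided spectrum as its own row
    con.execute(
        "INSERT INTO spectrum_simple (original_spectrum_id, compound_id, peaks) "
        "VALUES (?, ?, ?)",
        (spectrum_id, compound_id, peaks),
    )

# Variant 1 needs a join to get the spectra of a compound ...
nm_query = con.execute(
    "SELECT s.peaks FROM compound_spectrum cs "
    "JOIN spectrum_nm s ON s.spectrum_id = cs.spectrum_id "
    "WHERE cs.compound_id = ?",
    ("CMP_B",),
).fetchall()
# ... while variant 2 is a plain lookup, at the cost of duplicated peak data.
simple_query = con.execute(
    "SELECT peaks FROM spectrum_simple WHERE compound_id = ?",
    ("CMP_B",),
).fetchall()
print(nm_query == simple_query)
```

Keeping the original spectrum_id in its own column means the simple variant does not lose the information that two rows are the same HMDB spectrum.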

michaelwitting commented on June 30, 2024

I have at least a function that can read MassBank records. That doesn't help with MoNA, but it does with all other MassBank-related records.
Check my masstrixR package later today. There is a branch called masstrixR_RaMoNA_merge. It is based on our in-house tool MassTRIX [1]. There might also be some other useful functions we can use or reuse.

[1] http://dx.plos.org/10.1371/journal.pone.0039860

jorainer commented on June 30, 2024

Cool! Thanks for your input @michaelwitting. I had a look at the MoNA SDF file and it should be straightforward to extract all relevant information (compound annotations and spectra) from that. It's just a bummer that every database/resource uses its own identifiers and nomenclature.

jorainer commented on June 30, 2024

When you talk about the MassBank record format, where do you get that data? Is it from https://github.com/MassBank/MassBank-data? Apparently not MoNA...

michaelwitting commented on June 30, 2024

Yes, for example. We also use the MassBank format for our internal database.
Regarding the MoNA JSON: it is very inconsistent. When I read data from their web service, I have difficulty getting the entries I would like to access. Not every library they have uses exactly the same format. Maybe it is different with the JSON files...

jorainer commented on June 30, 2024

Import of open data from MassBank is discussed in issue #34.
