
Comments (5)

jorainer commented on August 16, 2024

For ChEBI (2018-12-03):

  • No. of compounds: 46765
  • No. of compounds without InChI: 7075
  • No. of compounds with InChI: 39690
  • No. of unique InChIs: 38946
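A minimal sketch of how counts like these could be derived, assuming the compounds are available as a list of records with an `inchi` field that is `None` when missing. The record layout and the `inchi_summary` helper are hypothetical, not part of compounddb; the sample rows reuse IDs from this thread.

```python
# Hypothetical in-memory records; None marks a missing InChI.
URIC_ACID_INCHI = (
    "InChI=1S/C5H4N4O3/c10-3-1-2(7-4(11)6-1)8-5(12)9-3/"
    "h(H4,6,7,8,9,10,11,12)"
)

records = [
    {"compound_id": "CHEBI:17775", "inchi": URIC_ACID_INCHI},
    {"compound_id": "CHEBI:46811", "inchi": URIC_ACID_INCHI},
    {"compound_id": "CHEBI:46814", "inchi": URIC_ACID_INCHI},
    {"compound_id": "CHEBI:10003", "inchi": None},
    {"compound_id": "CHEBI:10036", "inchi": None},
]


def inchi_summary(rows):
    """Count rows with/without an InChI and the number of distinct InChIs."""
    inchis = [row["inchi"] for row in rows if row["inchi"] is not None]
    return {
        "compounds": len(rows),
        "without_inchi": len(rows) - len(inchis),
        "with_inchi": len(inchis),
        "unique_inchis": len(set(inchis)),
    }


summary = inchi_summary(records)
print(summary)
```

If `unique_inchis` is smaller than `with_inchi`, at least one InChI is shared by several entries.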

So we don't have an InChI for all of them, and we have compounds with the same InChI! Apart from the name and the ID, however, these compounds are identical:

      compound_id                            compound_name
8564  CHEBI:17775   7,9-dihydro-1H-purine-2,6,8(3H)-trione
18506 CHEBI:46811 2,6-dihydroxy-7,9-dihydro-8H-purin-8-one
18507 CHEBI:46814                    9H-purine-2,6,8-triol
18509 CHEBI:46817                    7H-purine-2,6,8-triol
18513 CHEBI:46823                    1H-purine-2,6,8-triol
27249 CHEBI:62589     6-hydroxy-1H-purine-2,8(7H,9H)-dione
                                                                         inchi
8564  InChI=1S/C5H4N4O3/c10-3-1-2(7-4(11)6-1)8-5(12)9-3/h(H4,6,7,8,9,10,11,12)
18506 InChI=1S/C5H4N4O3/c10-3-1-2(7-4(11)6-1)8-5(12)9-3/h(H4,6,7,8,9,10,11,12)
18507 InChI=1S/C5H4N4O3/c10-3-1-2(7-4(11)6-1)8-5(12)9-3/h(H4,6,7,8,9,10,11,12)
18509 InChI=1S/C5H4N4O3/c10-3-1-2(7-4(11)6-1)8-5(12)9-3/h(H4,6,7,8,9,10,11,12)
18513 InChI=1S/C5H4N4O3/c10-3-1-2(7-4(11)6-1)8-5(12)9-3/h(H4,6,7,8,9,10,11,12)
27249 InChI=1S/C5H4N4O3/c10-3-1-2(7-4(11)6-1)8-5(12)9-3/h(H4,6,7,8,9,10,11,12)
                        inchi_key  formula    mass
8564  LEHOTFFKMJEONL-UHFFFAOYSA-N C5H4N4O3 168.028
18506 LEHOTFFKMJEONL-UHFFFAOYSA-N C5H4N4O3 168.028
18507 LEHOTFFKMJEONL-UHFFFAOYSA-N C5H4N4O3 168.028
18509 LEHOTFFKMJEONL-UHFFFAOYSA-N C5H4N4O3 168.028
18513 LEHOTFFKMJEONL-UHFFFAOYSA-N C5H4N4O3 168.028
27249 LEHOTFFKMJEONL-UHFFFAOYSA-N C5H4N4O3 168.028
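The claim that entries sharing an InChI differ only in name and ID could be checked along these lines. This is a sketch under assumptions: the dict-based records and the `conflicting_fields` helper are hypothetical, and only two of the rows above are reproduced.

```python
from collections import defaultdict

URIC_ACID_INCHI = (
    "InChI=1S/C5H4N4O3/c10-3-1-2(7-4(11)6-1)8-5(12)9-3/"
    "h(H4,6,7,8,9,10,11,12)"
)

# Two of the rows from the table above, as hypothetical dict records.
rows = [
    {"compound_id": "CHEBI:46814", "compound_name": "9H-purine-2,6,8-triol",
     "inchi": URIC_ACID_INCHI, "inchi_key": "LEHOTFFKMJEONL-UHFFFAOYSA-N",
     "formula": "C5H4N4O3", "mass": 168.028},
    {"compound_id": "CHEBI:46817", "compound_name": "7H-purine-2,6,8-triol",
     "inchi": URIC_ACID_INCHI, "inchi_key": "LEHOTFFKMJEONL-UHFFFAOYSA-N",
     "formula": "C5H4N4O3", "mass": 168.028},
]


def conflicting_fields(records, ignore=("compound_id", "compound_name")):
    """For each InChI shared by 2+ records, list the fields that differ."""
    groups = defaultdict(list)
    for record in records:
        if record["inchi"] is not None:
            groups[record["inchi"]].append(record)

    conflicts = {}
    for inchi, members in groups.items():
        if len(members) < 2:
            continue  # not a duplicate group
        fields = [f for f in members[0] if f != "inchi" and f not in ignore]
        conflicts[inchi] = [
            f for f in fields if len({m[f] for m in members}) > 1
        ]
    return conflicts


conflicts = conflicting_fields(rows)
print(conflicts)
```

An empty list for a given InChI means the grouped entries agree on every field except the ignored name and ID.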

The question is whether these compounds would have different MS2 spectra. If so, it would not make sense to combine them!

Some of the compounds without an InChI are listed below:

     compound_id            compound_name inchi inchi_key
3    CHEBI:10003     ribostamycin sulfate  <NA>      <NA>
15   CHEBI:10036                wax ester  <NA>      <NA>
91   CHEBI:10283     2-hydroxy fatty acid  <NA>      <NA>
140  CHEBI:10545                 electron  <NA>      <NA>
148  CHEBI:10583        kappa-carrageenan  <NA>      <NA>
154 CHEBI:106304 sphingomyelin d18:1/16:0  <NA>      <NA>
                     formula    mass
3       C17H34N4O10.(H2O4S)n      NA
15                     CO2R2  43.990
91  C2H3O3R __ C2H3O3R(CH2)n  75.008
140                     <NA>   0.000
148            (C12H17O12S)n      NA
154              C39H79N2O6P 702.568

from compounddb.

SiggiSmara commented on August 16, 2024

In the case of CHEBI:46814 and CHEBI:46817, for instance (and I suspect the rest of them), they are not the same chemical at first glance (see below, different locations of a hydrogen), but they are in fact tautomers of each other. This is also indicated in the ChEBI entries of some of them if you look them up in ChEBI. That means they readily convert from one to the other without any external input (energy or otherwise) and thus should really be thought of as a mixture of all of them. The MS2 spectrum "should" be similar if not identical, but the actual ionization conditions (pH, buffer ions, etc.) might also have a big effect, leading to different MS2 spectra.

Here I would suggest getting input from people who actually work with tautomers to hear what they have to say about it.

[Images: structures of CHEBI:46814 and CHEBI:46817]


jorainer commented on August 16, 2024

Thanks for your input @SiggiSmara! I'll try to get some input from people actually working with MS2 spectra and identification.


stanstrup commented on August 16, 2024

I have no experience with tautomers but one option could be to use the SMILES where this is explicit. You can also generate a non-standard InChI with the fixed-H layer from the SMILES.


jorainer commented on August 16, 2024

I also had feedback from Steffen. They use the same approach as PubChem: a compound table with unique InChIs and a substance table with additional annotations (possibly multiple entries per compound).
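A rough sketch of what such a compound/substance split might look like, assuming flat input records like the ones shown earlier in this thread. The `split_tables` helper, the output field names, and the choice to keep entries without an InChI as substances with no compound link are all assumptions for illustration, not the actual PubChem or compounddb schema.

```python
URIC_ACID_INCHI = (
    "InChI=1S/C5H4N4O3/c10-3-1-2(7-4(11)6-1)8-5(12)9-3/"
    "h(H4,6,7,8,9,10,11,12)"
)

# Hypothetical flat records: two tautomer entries and one entry without InChI.
entries = [
    {"compound_id": "CHEBI:46814", "compound_name": "9H-purine-2,6,8-triol",
     "inchi": URIC_ACID_INCHI, "formula": "C5H4N4O3", "mass": 168.028},
    {"compound_id": "CHEBI:46817", "compound_name": "7H-purine-2,6,8-triol",
     "inchi": URIC_ACID_INCHI, "formula": "C5H4N4O3", "mass": 168.028},
    {"compound_id": "CHEBI:10036", "compound_name": "wax ester",
     "inchi": None, "formula": "CO2R2", "mass": 43.990},
]


def split_tables(records):
    """Split flat records into a compound table (one row per unique InChI)
    and a substance table (one row per source entry)."""
    compounds = {}   # InChI -> compound row; dicts keep insertion order
    substances = []
    for record in records:
        inchi = record["inchi"]
        if inchi is not None and inchi not in compounds:
            compounds[inchi] = {
                "inchi": inchi,
                "formula": record["formula"],
                "mass": record["mass"],
            }
        # Assumption: entries without an InChI stay as unlinked substances.
        substances.append({
            "substance_id": record["compound_id"],
            "name": record["compound_name"],
            "inchi": inchi,  # link to the compound table, or None
        })
    return list(compounds.values()), substances


compound_table, substance_table = split_tables(entries)
print(len(compound_table), len(substance_table))
```

With this layout the tautomer entries collapse into one compound row, while their individual names and source IDs are preserved in the substance table.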

