
Comments (9)

stuppie commented on June 19, 2024

Hi Chris,
I was thinking about how to map wikidata exposure terms to ECTO. For chemical exposures (the bulk of exposure terms), many can be easily mapped. NIOSH has added "has cause" edges between chemical exposures and the chemical itself, and the chemical often has a ChEBI ID associated with it.

For example, exposure to ammonia (ECTO:0000411) can be mapped to the Wikidata item ammonia exposure with this query: the exposure's "has cause" target carries a ChEBI ID that matches the ECTO term containing "intersection_of: RO:0002233 CHEBI:16134 ! has input ammonia". Here is a query for all chemical exposures.
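The join described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the ECTO terms are available as OBO-format stanzas, and the Wikidata side has already been reduced to (exposure item, ChEBI ID of its "has cause" target) pairs. The function names and input shapes are hypothetical; only the `intersection_of: RO:0002233 CHEBI:...` pattern comes from the discussion.

```python
import re
from typing import Dict, List, Optional, Tuple

# Matches the "has input" axiom quoted above, e.g.
# "intersection_of: RO:0002233 CHEBI:16134 ! has input ammonia"
HAS_INPUT = re.compile(r"^intersection_of:\s+RO:0002233\s+(CHEBI:\d+)")


def index_ecto_by_chebi(obo_text: str) -> Dict[str, str]:
    """Map each ChEBI ID used in a 'has input' axiom to the ECTO term using it."""
    index: Dict[str, str] = {}
    current_id: Optional[str] = None
    for raw in obo_text.splitlines():
        line = raw.strip()
        if line == "[Term]":
            current_id = None  # a new stanza starts
        elif line.startswith("id: "):
            current_id = line[4:].strip()
        else:
            match = HAS_INPUT.match(line)
            if match and current_id:
                index[match.group(1)] = current_id
    return index


def map_exposures(
    wd_pairs: List[Tuple[str, str]], ecto_index: Dict[str, str]
) -> Tuple[Dict[str, str], List[str]]:
    """Join (Wikidata exposure, ChEBI ID of its cause) pairs to ECTO terms.

    Returns the mappings plus the exposures that cannot be mapped this way.
    """
    mapped: Dict[str, str] = {}
    unmapped: List[str] = []
    for exposure, chebi in wd_pairs:
        if chebi in ecto_index:
            mapped[exposure] = ecto_index[chebi]
        else:
            unmapped.append(exposure)
    return mapped, unmapped
```

Feeding it a stanza for ECTO:0000411 and the pair ("ammonia exposure", "CHEBI:16134") yields one mapping; an exposure whose cause has no ChEBI ID ends up in the unmapped list, which is the set the next paragraph discusses.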

Of the 690 chemical exposures in Wikidata, 259 cannot be mapped this way (link). However, many of these can be mapped through other chemical identifiers, such as PubChem ID or RxNorm CUI (example: 1,1,2-trichloro-1,2,2-trifluoroethane exposure). Others, such as asphalt exposure, may be harder to map and require string matching or manual curation.

587 of the 690 have a PubChem CID or SID, an RxNorm CUI, and/or a ChEBI ID (link).
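The fallback implied here (ChEBI first, then other chemical identifiers, then curation) can be made explicit. For reference, the figures above work out to roughly 62% mappable through ChEBI alone ((690 - 259) / 690 = 431 / 690) and roughly 85% with any chemical identifier (587 / 690). The sketch below is a minimal illustration, assuming each exposure's cause has been flattened into a dict of identifier fields; the field names and the priority order are hypothetical choices, not an established convention.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical priority order: ChEBI IDs map directly onto ECTO "has input"
# axioms; the others mainly serve as supporting IDs for a ChEBI term request.
ID_PRIORITY = ["chebi", "pubchem_cid", "pubchem_sid", "rxnorm_cui"]


def best_identifier(ids: Dict[str, str]) -> Optional[Tuple[str, str]]:
    """Return the highest-priority (id_type, value) present, or None."""
    for id_type in ID_PRIORITY:
        value = ids.get(id_type)
        if value:
            return id_type, value
    return None


def coverage(records: List[Dict[str, str]]) -> Dict[str, int]:
    """Count exposures by the best identifier available for their cause."""
    counts: Dict[str, int] = {id_type: 0 for id_type in ID_PRIORITY}
    counts["none"] = 0  # left for ENVO, string matching or manual curation
    for ids in records:
        best = best_identifier(ids)
        counts[best[0] if best else "none"] += 1
    return counts
```

Counting by the *best* available identifier keeps each exposure in exactly one bucket, so the buckets sum to the total number of exposures.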

from environmental-exposure-ontology.

cmungall commented on June 19, 2024

Thanks for the comments!

For our purposes it works out well to have everything in CHEBI, as we can leverage a single ontologically based classification. We have been using this as something of a gap-filling exercise, see for example ebi-chebi/ChEBI#3218, ebi-chebi/ChEBI#3217 and other queries.

It definitely helps CHEBI if we can provide an alternate ID. So what I think we should do is take your query and extend it to match other chemical IDs. I can use this as the basis for a term request.
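One way to extend the query to other chemical IDs is to make each identifier an OPTIONAL pattern, so exposures are returned even when only some IDs are present. The sketch below only assembles the SPARQL text. It assumes the Wikidata properties P828 (has cause), P683 (ChEBI ID), P662 (PubChem CID), P2153 (PubChem SID) and P3345 (RxNorm CUI), which should be double-checked on Wikidata before use, and it takes the exposure items as an explicit list of QIDs instead of assuming a particular class for chemical exposures. The `wd:` and `wdt:` prefixes are predefined on the Wikidata Query Service.

```python
from typing import Iterable

# Assumed Wikidata property IDs; verify them on wikidata.org before relying on them.
CHEMICAL_ID_PROPS = {
    "chebi": "P683",
    "pubchem_cid": "P662",
    "pubchem_sid": "P2153",
    "rxnorm_cui": "P3345",
}


def build_exposure_query(exposure_qids: Iterable[str]) -> str:
    """Build a SPARQL query listing each exposure's cause and its chemical IDs."""
    values = " ".join(f"wd:{qid}" for qid in exposure_qids)
    optionals = "\n".join(
        f"  OPTIONAL {{ ?cause wdt:{pid} ?{name} . }}"
        for name, pid in CHEMICAL_ID_PROPS.items()
    )
    variables = " ".join(f"?{name}" for name in CHEMICAL_ID_PROPS)
    return (
        f"SELECT ?exposure ?cause {variables} WHERE {{\n"
        f"  VALUES ?exposure {{ {values} }}\n"
        f"  ?exposure wdt:P828 ?cause .\n"
        f"{optionals}\n"
        f"}}"
    )
```

For instance, `build_exposure_query(["Q21175429", "Q21175420"])` produces a query over the zinc chloride fume and warfarin exposure items discussed further down; rows with no identifier at all are the candidates for another ontology.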

This may seem clunky but we are planning some tools that make this easier.

If there are no chemical IDs, then we'd find another ontology.

For exposure to asphalt, we'd use this ENVO class, see #7. ENVO is not yet mapped to wikidata. For this subset we could query and then map, and possibly feed back mappings to WD.

btw I had been keeping SPARQL queries here: https://github.com/cmungall/environmental-conditions/tree/master/src/resources/wikidata

it's a bit of a random jumble and I'm not sure of the best way of organizing these.


cmungall commented on June 19, 2024

hmm, so your list includes cobalt poisoning which has-cause cobalt.

cobalt in fact is in CHEBI, or rather the various forms are:

I am not sure if there is no mapping in WD simply because this has not been done yet, or if it is a deliberate omission due to the different levels of precision.

This is something that is important to get right. Some of the time we want to treat different forms as interchangeable (e.g. conjugate bases). In other cases we want to be very careful about not mixing up different forms as they have very different toxicity, e.g. mercury.

We have some discussion of this in:

Hill, D. P., Adams, N., Bada, M., Batchelor, C., Berardini, T. Z., Dietze, H., … Lomax, J. (2013). Dovetailing biology and chemistry: integrating the Gene Ontology with the ChEBI chemical ontology. BMC Genomics, 14(1), 513. http://doi.org/10.1186/1471-2164-14-513
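The distinction drawn above, between forms that can be treated as interchangeable and forms that must be kept apart, can be encoded when matching by collapsing IDs into curated equivalence groups, except for IDs flagged as strict. A minimal sketch: the group assignments and the strict set are hypothetical curation inputs, and the identifiers used with it should be read as placeholders, not real ChEBI IDs.

```python
from typing import Callable, Dict, Set


def make_normalizer(
    groups: Dict[str, Set[str]], strict: Set[str]
) -> Callable[[str], str]:
    """Return a function mapping an ID to a canonical representative.

    groups: representative ID -> IDs treated as interchangeable with it
            (e.g. a conjugate acid and its conjugate base).
    strict: IDs that must never be merged with anything else
            (e.g. different mercury species with different toxicity).
    """
    canonical: Dict[str, str] = {}
    for rep, members in groups.items():
        if rep in strict:
            continue  # a strict ID can never anchor a merge group
        for member in members | {rep}:
            if member not in strict:
                canonical[member] = rep

    def normalize(chem_id: str) -> str:
        # Strict IDs and IDs outside any group are returned unchanged.
        return canonical.get(chem_id, chem_id)

    return normalize


def same_agent(a: str, b: str, normalize: Callable[[str], str]) -> bool:
    """True if two IDs should count as the same exposure agent."""
    return normalize(a) == normalize(b)
```

Keeping the strict set separate from the groups means a curation mistake that puts two mercury species in one group still cannot merge them.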


cmungall commented on June 19, 2024

Adding ecto classes for the following based on wd gap analysis:


sebotic commented on June 19, 2024

Hi Chris,
Exposure data most likely comes from NIOSH; they are very active contributors in that area. User James Hare, also a NIOSH employee, is currently the main contributor.

Regarding ChEBI: for now, I have imported only compounds that have an InChIKey and are therefore real chemical compounds. The ontology structure with all the higher-level concepts is not in Wikidata yet. So in the cobalt example, only the cobalt atom would have been imported; the reason it was not is most likely an import error. These were quite frequent for chemical elements/atoms, mostly due to inconsistent unique IDs (e.g. PubChem ID) or problems with links to Wikipedia. They can all be fixed; the count is ~1.5K.
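The import criterion described here (only compounds with an InChIKey) amounts to a simple filter. A minimal sketch, assuming records are dicts with an optional `inchikey` field (a hypothetical layout); it checks only the standard InChIKey shape (14 uppercase letters, hyphen, 10 uppercase letters, hyphen, 1 uppercase letter), not whether the key resolves to an actual structure.

```python
import re
from typing import Dict, List

# Standard InChIKey layout: 14 letters - 10 letters - 1 letter, all uppercase.
INCHIKEY = re.compile(r"[A-Z]{14}-[A-Z]{10}-[A-Z]")


def has_valid_inchikey(record: Dict[str, str]) -> bool:
    """True if the record carries a well-formed InChIKey."""
    return INCHIKEY.fullmatch(record.get("inchikey", "")) is not None


def importable(records: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep only records that pass the InChIKey criterion."""
    return [r for r in records if has_valid_inchikey(r)]
```

`fullmatch` is used instead of `match` with `$` so that a trailing newline or extra characters cannot slip through.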

-sebastian


stuppie commented on June 19, 2024

These are likely due to errors in mapping and/or inconsistencies with the way compounds are represented.

For example, there is one item for arsenic the element and another for arsenic the chemical compound. It looks like the chemical compound arsenic was created by the NIOSH people and linked to the element with "has part". The chemical compound arsenic has a ChEBI ID, and inorganic arsenic exposure is linked to the compound.

For cobalt, there is currently an element item (to which I've just added the ChEBI ID) and a cation item (which already has a ChEBI ID). Cobalt poisoning "has cause" the element cobalt. This can all be fixed; it just requires some work.
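Cases like these can be surfaced with a small check over the "has cause" and "has part" links. A minimal sketch over an in-memory graph: the dict layout, the item names used with it, and the idea of reporting "has part" neighbours as candidate ChEBI sources are illustrative assumptions, not an existing bot or Wikidata API.

```python
from typing import Dict, List, Tuple


def find_fixable_causes(
    has_cause: Dict[str, str],
    has_part: Dict[str, List[str]],
    chebi: Dict[str, str],
) -> List[Tuple[str, str, List[str]]]:
    """Report exposures whose cause lacks a ChEBI ID.

    Each report lists the exposure, its cause, and any 'has part'
    neighbours of the cause that do carry a ChEBI ID, as candidates
    for manual review (never for automatic substitution).
    """
    report: List[Tuple[str, str, List[str]]] = []
    for exposure, cause in has_cause.items():
        if cause in chebi:
            continue  # already mappable
        candidates = [part for part in has_part.get(cause, []) if part in chebi]
        report.append((exposure, cause, candidates))
    return report
```

Candidates are only reported, because, as discussed earlier in the thread, silently swapping one form for another (element vs. cation vs. compound) can change the toxicological meaning.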


cmungall commented on June 19, 2024

Notes mostly to self, to be expanded later

fumes vs chemicals

id: https://www.wikidata.org/wiki/Q21175429
name: exposure to zinc chloride fume
xref: CHEBI:49976 ! zinc chloride

mixtures vs parts

id: https://www.wikidata.org/wiki/Q21175420
name: Warfarin exposure
xref: CHEBI:87732 ! 4-hydroxy-3-(3-oxo-1-phenylbutyl)-1-benzopyran-2-one

In ChEBI, warfarin is modeled as a mixture, with CHEBI:87732 as one of its parts.

generic class vs specific class

https://www.wikidata.org/wiki/Q21174342 monogermane exposure

caused by

https://www.wikidata.org/wiki/Q415811 germanium tetrahydride (xref CHEBI:30443 ! germane )

subclass of

https://www.wikidata.org/wiki/Q354160 germane (no CHEBI)

CHEBI doesn't distinguish between these and doesn't have a class 'monogermane'. The language translations also seem confused between the generic and the specific form.


cmungall commented on June 19, 2024

@stuppie

For example, there is an item for Arsenic the element and Arsenic the chemical compound. It looks like the chemical compound arsenic was created by the NIOSH people and linked to the element with has part. The chemical compound arsenic has a chebi ID. Inorganic arsenic exposure is linked to the compound

Are there design patterns that people follow here, or does it depend on who adds things?

It looks like the chemical branch implicitly follows the CHEBI design patterns (which AFAICT are not documented) in having the 'compound-has-part-element' pattern. Should the nomenclature also follow? Is it not potentially confusing to have 'Arsenic has-part Arsenic'?

And for the exposures, what if we want to add 'organic arsenic exposure' to wikidata? Would it have the same relationships as 'inorganic arsenic exposure'?


stuppie commented on June 19, 2024

We try to follow design patterns when they exist and define them when they don't. If we define them, we try to document them as best we can so that they will be reused. However, Wikidata is open and editable to the world and so there is no way to enforce this! We are working on bots that will patrol well defined item types and check that they conform to a defined schema, but that doesn't exist at the moment. The typical course of action is to change it ourselves and/or leave a message on the user's talk page and suggest they change it.
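A patrol bot of the kind mentioned here would compare each item against a schema; Wikidata supports ShEx-based entity schemas for this purpose. The sketch below is a much simpler stand-in in plain Python, not ShEx. It assumes an item is a dict of property ID to a list of values, and that a chemical exposure item should have exactly one P828 (has cause) whose target has a P683 (ChEBI ID). Both requirements are hypothetical, chosen to mirror this thread; real exposures caused by mixtures may legitimately have several causes.

```python
from typing import Dict, List

# Hypothetical item layout: property ID -> list of values (QIDs or strings).
Item = Dict[str, List[str]]


def check_exposure(qid: str, items: Dict[str, Item]) -> List[str]:
    """Return human-readable problems; an empty list means the item conforms."""
    problems: List[str] = []
    causes = items.get(qid, {}).get("P828", [])
    if len(causes) != 1:
        problems.append(
            f"{qid}: expected exactly one P828 (has cause), found {len(causes)}"
        )
    for cause in causes:
        if not items.get(cause, {}).get("P683"):
            problems.append(f"{qid}: cause {cause} has no P683 (ChEBI ID)")
    return problems
```

Returning messages rather than raising fits the workflow described above, where a human fixes the item or leaves a note on the editor's talk page.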

I agree that it is confusing. I'd lean toward having the element and the molecule arsenic as one item, which appears to be how most of the common ones (query) are already handled in Wikidata (example).

This can be modeled in whatever way makes the most sense. For example, we could create an item for organic arsenic that is an instance of organic compound and has part arsenic.

