Hi Chris,
I was thinking about how to map Wikidata exposure terms to ECTO. For chemical exposures (the bulk of exposure terms), many can be easily mapped. NIOSH has added "has cause" edges between chemical exposures and the chemical itself, and the chemical often has a ChEBI ID associated with it.
For example, exposure to ammonia (ECTO:0000411) maps to ammonia exposure with this query: the result can be matched to the ECTO term containing "intersection_of: RO:0002233 CHEBI:16134 ! has input ammonia". Here is a query for all chemical exposures.
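The ECTO side of that match can be sketched in Python, assuming the ontology is available as OBO text. The function name `index_by_chebi` and the abbreviated example stanza are illustrative; only the ID, the label and the has-input line come from this thread.

```python
import re

# Matches an OBO "intersection_of" line that relates a term to a ChEBI class
# via RO:0002233 (has input), like the ammonia line quoted above.
HAS_INPUT = re.compile(r"^intersection_of:\s+RO:0002233\s+(CHEBI:\d+)")


def index_by_chebi(obo_text):
    """Return {ChEBI ID: term ID} for every stanza with a has-input ChEBI filler."""
    index = {}
    term_id = None
    for raw in obo_text.splitlines():
        line = raw.strip()
        if line == "[Term]":
            term_id = None  # a new stanza starts; forget the previous id
        elif line.startswith("id: "):
            term_id = line[len("id: "):]
        else:
            match = HAS_INPUT.match(line)
            if match and term_id:
                index[match.group(1)] = term_id
    return index


# Abbreviated, hypothetical stanza: a real ECTO stanza carries more tags.
EXAMPLE = """\
[Term]
id: ECTO:0000411
name: exposure to ammonia
intersection_of: RO:0002233 CHEBI:16134 ! has input ammonia
"""

print(index_by_chebi(EXAMPLE))  # {'CHEBI:16134': 'ECTO:0000411'}
```

With such an index, a ChEBI ID coming back from Wikidata becomes a dictionary lookup.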
Of the 690 chemical exposures in Wikidata, 259 cannot be mapped this way (link). However, many can be mapped through other chemical identifiers, such as PubChem ID or RxNorm CUI (example: 1,1,2-trichloro-1,2,2-trifluoroethane exposure). Others, such as asphalt exposure, may be harder and require string matching or manual curation.
587/690 have a PubChem CID or SID, CUI, and/or ChEBI ID (link).
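A rough sketch of how this gap analysis could be tallied, assuming each exposure has already been flattened into a dict of the identifiers found on its cause. The field names and the placeholder identifier value are hypothetical; only the ammonia ChEBI ID comes from this thread.

```python
from collections import Counter

# Identifier kinds named above, tried only when no ChEBI ID is present.
FALLBACK_KEYS = ("pubchem_cid", "pubchem_sid", "rxnorm_cui")


def classify(exposure):
    """Label one exposure record by the best identifier available for its cause."""
    if exposure.get("chebi"):
        return "chebi"
    if any(exposure.get(key) for key in FALLBACK_KEYS):
        return "other_id"
    return "manual"  # needs string matching or manual curation


def summarize(exposures):
    """Count how many exposures fall into each mapping route."""
    return Counter(classify(e) for e in exposures)


# Toy records; "CID-PLACEHOLDER" stands in for a real PubChem CID.
records = [
    {"label": "ammonia exposure", "chebi": "CHEBI:16134"},
    {"label": "1,1,2-trichloro-1,2,2-trifluoroethane exposure",
     "pubchem_cid": "CID-PLACEHOLDER"},
    {"label": "asphalt exposure"},
]

print(summarize(records))
```

Run over the full result set, the three buckets would correspond to the directly mappable exposures, those reachable through other identifiers, and the remainder.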
from environmental-exposure-ontology.
Thanks for the comments!
For our purposes it works out well to have everything in CHEBI, as we can leverage a single ontologically based classification. We have been using this as something of a gap-filling exercise, see for example ebi-chebi/ChEBI#3218, ebi-chebi/ChEBI#3217 and other queries.
It definitely helps CHEBI if we can provide an alternate ID. So what I think we should do is take your query and extend it to match other chemical IDs. I can use this as the basis for a term request.
This may seem clunky but we are planning some tools that make this easier.
If there are no chemical IDs, then we'd find another ontology.
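One way to extend the query is to add an OPTIONAL clause per identifier property. The sketch below only builds the query string. The Wikidata property IDs are assumptions that should be checked before use, and the restriction to chemical exposure items is left out because the original query is not reproduced in this thread.

```python
# Wikidata property IDs: assumptions to verify against Wikidata before use.
HAS_CAUSE = "P828"
ID_PROPERTIES = {
    "chebi": "P683",
    "pubchem_cid": "P662",
    "pubchem_sid": "P2153",
    "rxnorm": "P3345",
}


def build_query(id_properties=ID_PROPERTIES, has_cause=HAS_CAUSE):
    """Build a SPARQL query listing each cause with whichever IDs it has."""
    variables = " ".join(f"?{name}" for name in id_properties)
    optionals = "\n".join(
        f"  OPTIONAL {{ ?chemical wdt:{pid} ?{name} . }}"
        for name, pid in id_properties.items()
    )
    return (
        f"SELECT ?exposure ?chemical {variables} WHERE {{\n"
        f"  ?exposure wdt:{has_cause} ?chemical .\n"
        f"{optionals}\n"
        f"}}"
    )


print(build_query())
```

As written, the pattern matches anything with a has-cause edge, so a real query would also need the triple pattern that selects exposure items.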
For exposure to asphalt, we'd use this ENVO class, see #7. ENVO is not yet mapped to wikidata. For this subset we could query and then map, and possibly feed back mappings to WD.
btw I had been keeping SPARQL queries here: https://github.com/cmungall/environmental-conditions/tree/master/src/resources/wikidata
it's a bit of a random jumble and I'm not sure of the best way of organizing these.
from environmental-exposure-ontology.
hmm, so your list includes cobalt poisoning which has-cause cobalt.
cobalt in fact is in CHEBI, or rather the various forms are:
- CHEBI:23336 ! cobalt cation
- CHEBI:27638 ! cobalt atom
- CHEBI:33888 ! cobalt molecular entity
I am not sure if there is no mapping in WD simply because this has not been done yet, or if it is a deliberate omission due to the different levels of precision.
This is something that is important to get right. Some of the time we want to treat different forms as interchangeable (e.g. conjugate bases). In other cases we want to be very careful about not mixing up different forms as they have very different toxicity, e.g. mercury.
We have some discussion of this in:
Hill, D. P., Adams, N., Bada, M., Batchelor, C., Berardini, T. Z., Dietze, H., … Lomax, J. (2013). Dovetailing biology and chemistry: integrating the Gene Ontology with the ChEBI chemical ontology. BMC Genomics, 14(1), 513. http://doi.org/10.1186/1471-2164-14-513
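To make the "interchangeable or not" decision explicit rather than implicit, a mapping tool could consult a curated table. The sketch below is only an illustration: the grouping shown is a hypothetical curator choice, not a recommendation, and the IDs are the cobalt forms listed above.

```python
# Sets of ChEBI IDs that a curator has explicitly declared interchangeable
# for mapping purposes. Anything not listed is treated as distinct.
INTERCHANGEABLE = [
    {"CHEBI:27638", "CHEBI:23336"},  # cobalt atom, cobalt cation (hypothetical)
]


def same_for_mapping(a, b, groups=INTERCHANGEABLE):
    """True if two ChEBI IDs are identical or share a curated group."""
    if a == b:
        return True
    return any(a in group and b in group for group in groups)


print(same_for_mapping("CHEBI:27638", "CHEBI:23336"))  # True
print(same_for_mapping("CHEBI:27638", "CHEBI:33888"))  # False
```

Defaulting to "distinct" keeps cases like mercury safe: forms are merged only where someone has decided that is acceptable.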
from environmental-exposure-ontology.
Adding ecto classes for the following based on wd gap analysis:
- CHEBI:33343,"hafnium atom"
- CHEBI:81950,"Ammonium sulfamate"
- CHEBI:87362,"pentyl acetate"
- CHEBI:82256,"ANTU"
- CHEBI:27563,"arsenic atom"
- CHEBI:63317,"barium chloride"
- CHEBI:63921,"2-butoxyethanol"
- CHEBI:31344,"calcium oxide"
- CHEBI:27871,"chloroacetaldehyde"
- CHEBI:34624,"Chloroacetyl chloride"
- CHEBI:75955,"copper(II) oxide"
- CHEBI:82111,"Crufomate"
- CHEBI:4503,"Dichloroacetylene"
- CHEBI:34573,"Bis(2-chloroethyl)ether"
- CHEBI:81859,"Dalapon"
- CHEBI:38658,"dicrotophos"
- CHEBI:85259,"diethylamine"
- CHEBI:87755,"pentan-3-one"
- CHEBI:89195,"2-Methyl-4-heptanone"
- CHEBI:84254,"N,N-dimethylacetamide"
- CHEBI:82280,"Dimethylcarbamoyl chloride"
- CHEBI:81364,"Dioxation"
- CHEBI:89484,"4-Heptanone"
- CHEBI:64163,"diquat"
- CHEBI:34733,"EPN"
- CHEBI:50139,"heptan-3-one"
- CHEBI:34750,"Ethylenethiourea"
- CHEBI:34760,"fensulfothion"
- CHEBI:38689,"fonofos"
- CHEBI:16397,"formamide"
- CHEBI:64276,"glutaraldehyde"
- CHEBI:17754,"glycerol"
- CHEBI:62995,"2-methylpentane-2,4-diol"
- CHEBI:16503,"selane"
- CHEBI:30430,"indium atom"
- CHEBI:77517,"3-methyl-2-butanol"
- CHEBI:34800,"isophorone"
- CHEBI:89993,"4-Methyl-3-penten-2-one, 9CI"
- CHEBI:25219,"methacrylic acid"
- CHEBI:69441,"p-methoxyphenol"
- CHEBI:28124,"4,4'-methylene-bis-(2-chloroaniline)"
- CHEBI:53216,"dicyclohexylmethane-4,4'-diisocyanate"
- CHEBI:88432,"5-Methyl-2-hexanone"
- CHEBI:35060,"alpha-Methylstyrene"
- CHEBI:15733,"N-methylaniline"
- CHEBI:34399,"4-Chloronitrobenzene"
- CHEBI:50637,"2-nitronaphthalene"
- CHEBI:76261,"1-nitropropane"
- CHEBI:35227,"4-nitrotoluene"
- CHEBI:82367,"Phenyl glycidyl ether"
- CHEBI:34877,"N-Phenyl-2-naphthylamine"
- CHEBI:30335,"phosphorus pentachloride"
- CHEBI:38218,"isophthalonitrile"
- CHEBI:82261,"Pindone"
- CHEBI:82370,"1,3-Propane sultone"
- CHEBI:40116,"propyl acetate"
- CHEBI:33359,"rhodium atom"
- CHEBI:82125,"Fenchlorphos"
- CHEBI:30434,"selenium hexafluoride"
- CHEBI:33348,"tantalum atom"
- CHEBI:38945,"sulfotep"
- CHEBI:30469,"tellurium hexafluoride"
- CHEBI:82149,"TEPP"
- CHEBI:30183,"tetramethyllead"
- CHEBI:71240,"sodium diphosphate"
- CHEBI:28950,"N-methyl-N-picrylnitramine"
- CHEBI:37825,"p-toluidine"
- CHEBI:30956,"trichloroacetic acid"
- CHEBI:46053,"2,4,6-trinitrotoluene"
- CHEBI:84069,"pentanal"
- CHEBI:59001,"4-vinylcyclohexene dioxide"
- CHEBI:82550,"Vinylidene fluoride"
- CHEBI:33342,"zirconium atom"
- CHEBI:9195,"Soman"
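A minimal sketch of how rows like these could be turned into draft class definitions, assuming the label pattern "exposure to X" (as in exposure to ammonia) and the has-input line quoted earlier in the thread. The function name and output fields are hypothetical, and the ChEBI label is used verbatim.

```python
import csv
import io


def term_requests(csv_text):
    """Turn 'CHEBI:ID,"label"' rows into draft exposure class definitions."""
    rows = csv.reader(io.StringIO(csv_text))
    return [
        {
            "label": f"exposure to {name}",
            "logical_def": f"intersection_of: RO:0002233 {chebi_id} ! has input {name}",
        }
        for chebi_id, name in rows
    ]


# Two rows from the list above; the second has commas inside the quoted label.
SAMPLE = """\
CHEBI:33343,"hafnium atom"
CHEBI:28124,"4,4'-methylene-bis-(2-chloroaniline)"
"""

for request in term_requests(SAMPLE):
    print(request["label"])
    print("  " + request["logical_def"])
```

The `csv` module is used instead of splitting on commas because several labels in the list contain commas themselves.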
from environmental-exposure-ontology.
Hi Chris,
Exposure data most likely comes from NIOSH; they are very active contributors on that end. User James Hare, also a NIOSH employee, is the main contributor right now.
Regarding ChEBI, for now I imported compounds which have an InChI key and therefore are real chemical compounds. The ontology structure with all higher-level concepts is not in Wikidata yet. So for the cobalt example, only cobalt atom would have been imported. The reason it has not been is most likely an import error; these were quite frequent for chemical elements/atoms, mostly due to inconsistent unique IDs (e.g. PubChem ID) or issues with links to Wikipedia. These can all be fixed; the count is ~1.5K.
-sebastian
from environmental-exposure-ontology.
These are likely due to errors in mapping and/or inconsistencies with the way compounds are represented.
For example, there is an item for Arsenic the element and Arsenic the chemical compound. It looks like the chemical compound arsenic was created by the NIOSH people and linked to the element with has part. The chemical compound arsenic has a ChEBI ID. Inorganic arsenic exposure is linked to the compound.
For cobalt, there is currently an element (which I've just added the ChEBI ID for) and a cation (which has a ChEBI ID). Cobalt poisoning has cause the element cobalt. This can all be fixed... just requires some work...
from environmental-exposure-ontology.
Notes mostly to self, to be expanded later
fumes vs chemicals
id: https://www.wikidata.org/wiki/Q21175429
name: exposure to zinc chloride fume
xref: CHEBI:49976 ! zinc chloride
mixtures vs parts
id: https://www.wikidata.org/wiki/Q21175420
name: Warfarin exposure
xref: CHEBI:87732 ! 4-hydroxy-3-(3-oxo-1-phenylbutyl)-1-benzopyran-2-one
in chebi Warfarin is modeled as a mixture, with CHEBI:87732 as one of the parts:
generic class vs specific class
https://www.wikidata.org/wiki/Q21174342 monogermane exposure
caused by
https://www.wikidata.org/wiki/Q415811 germanium tetrahydride (xref CHEBI:30443 ! germane )
subclass of
https://www.wikidata.org/wiki/Q354160 germane (no CHEBI)
CHEBI doesn't distinguish between these and doesn't have a class 'monogermane'. The language translations seem confused as well between the generic and specific form
from environmental-exposure-ontology.
For example, there is an item for Arsenic the element and Arsenic the chemical compound. It looks like the chemical compound arsenic was created by the NIOSH people and linked to the element with has part. The chemical compound arsenic has a chebi ID. Inorganic arsenic exposure is linked to the compound.
Are there design patterns that people follow here, or is it dependent on who adds things?
It looks like the chemical branch implicitly follows the CHEBI design patterns (which AFAICT are not documented) in having the 'compound-has-part-element' pattern. Should the nomenclature also follow? Is it not potentially confusing to have 'Arsenic has-part Arsenic'?
And for the exposures, what if we want to add 'organic arsenic exposure' to wikidata? Would it have the same relationships as 'inorganic arsenic exposure'?
from environmental-exposure-ontology.
We try to follow design patterns when they exist and define them when they don't. If we define them, we try to document them as best we can so that they will be reused. However, Wikidata is open and editable to the world and so there is no way to enforce this! We are working on bots that will patrol well defined item types and check that they conform to a defined schema, but that doesn't exist at the moment. The typical course of action is to change it ourselves and/or leave a message on the user's talk page and suggest they change it.
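To give a feel for what such a conformance check might look like, here is a toy sketch. The schema, the property names and the item structure are all hypothetical and do not reflect an existing bot or the Wikidata data model.

```python
# Hypothetical schema: properties an exposure item is required to carry.
EXPOSURE_SCHEMA = {"required": ["instance of", "has cause"]}


def violations(item, schema=EXPOSURE_SCHEMA):
    """Return the required properties that the item has no value for."""
    claims = item.get("claims", {})
    return [prop for prop in schema["required"] if not claims.get(prop)]


# Toy item: it has a cause but no "instance of" statement.
item = {"label": "cobalt poisoning", "claims": {"has cause": ["cobalt"]}}

print(violations(item))  # ['instance of']
```

A patrolling bot could report the returned list on the item's talk page rather than editing the item directly.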
I agree that is confusing. I'd lean toward having the element and molecule arsenic as one item, which is how most of the common ones (query) already appear to be done in Wikidata (example).
This can be modeled any way that would make the most sense. We can create an item for organic arsenic, that is an instance of organic compound and has part arsenic, for example.
from environmental-exposure-ontology.