related-sciences / nxontology-data
NXOntology data: making ontologies accessible as simple JSON files
License: Other
Hey @dhimmel, I was trying to pull the 2021 MeSH NXO like this:
import pandas as pd
from nxontology import NXOntology
url = "https://github.com/related-sciences/nxontology-data/raw/71cf538dc5c258ada880d58663b0205b7b7f8561/001_medical_subject_headings_mesh_desctree.json.gz"
nxo = NXOntology.read_node_link_json(url)
I was a little surprised to find that the node ids are ints and that there isn't a lot of data attached to them:
pd.Series(type(n) for n in nxo.graph.nodes).value_counts()
<class 'int'> 920388
nxo.node_info(1).data
{'name': 'Organisms Category',
'description': None,
'pubchem_hnid': 1269010,
'url': 'http://www.ncbi.nlm.nih.gov/mesh/1000066'}
Is there another way to get the unique ids, class, and tree numbers (for descriptors)?
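To check what a node-link JSON export actually carries before loading it, something like the following sketch may help. It assumes the usual networkx node-link layout (a `nodes` list of dicts keyed by `id`). The toy payload reuses the values shown above rather than the real export, and the helper name `summarize_nodes` is hypothetical.

```python
from collections import Counter


def summarize_nodes(node_link: dict) -> tuple[Counter, Counter]:
    """Count node id types and node attribute keys in node-link JSON data."""
    id_types = Counter()
    attribute_keys = Counter()
    for node in node_link["nodes"]:
        id_types[type(node["id"]).__name__] += 1
        # Every key other than "id" is data attached to the node.
        attribute_keys.update(key for key in node if key != "id")
    return id_types, attribute_keys


# Toy payload modelled on the node shown above (not the real export).
example = {
    "nodes": [
        {
            "id": 1,
            "name": "Organisms Category",
            "description": None,
            "pubchem_hnid": 1269010,
            "url": "http://www.ncbi.nlm.nih.gov/mesh/1000066",
        }
    ]
}
id_types, attribute_keys = summarize_nodes(example)
print(id_types)
print(sorted(attribute_keys))
```

If identifiers, classes, or tree numbers were exported, they would show up among the attribute keys.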
Background in EBISPOT/efo#935.
We currently extract database cross-references for EFO using the oboInOwl:hasDbXref predicate. However, MONDO is providing xrefs with greater specificity using the mondo:exactMatch and mondo:closeMatch predicates. Furthermore, there are axioms (with rdf:type owl:Axiom) that annotate oboInOwl:hasDbXref instances with values like MONDO:equivalentTo.
EFO:0000479 is a good example of a class that has all types of xrefs:
1. oboInOwl:hasDbXref without axioms
2. oboInOwl:hasDbXref with axioms
3. mondo:exactMatch and mondo:closeMatch
It would be nice to further understand the relation between 2 and 3.
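One way to keep the three kinds apart during extraction is to tag each xref as it is read. This is only a sketch: the record shape (predicate, value, axiom annotation) and the function name `xref_kind` are assumptions, and the xref values are placeholders rather than real EFO:0000479 data.

```python
from typing import Optional


def xref_kind(predicate: str, axiom_annotation: Optional[str] = None) -> str:
    """Assign an xref to one of the three kinds listed above."""
    if predicate in {"mondo:exactMatch", "mondo:closeMatch"}:
        return "mondo_match"
    if predicate == "oboInOwl:hasDbXref":
        return "dbxref_with_axiom" if axiom_annotation else "dbxref_without_axiom"
    raise ValueError(f"unexpected xref predicate: {predicate}")


# Hypothetical records: (predicate, xref value, axiom annotation or None).
records = [
    ("oboInOwl:hasDbXref", "EXAMPLE:1", None),
    ("oboInOwl:hasDbXref", "EXAMPLE:2", "MONDO:equivalentTo"),
    ("mondo:exactMatch", "EXAMPLE:2", None),
]
kinds = [xref_kind(predicate, axiom) for predicate, _value, axiom in records]
print(kinds)
```

Comparing the xref values that land in the second and third kinds would be one way to explore how they relate.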
It would be great for topical descriptor nodes in our MeSH ontologies to have a data attribute/property like qualifiers for the list of allowed qualifiers.
For example the disease Exostoses, Multiple Hereditary has the following descriptors available through hasDescriptor:
See also https://hhs.github.io/meshrdf/descriptor-qualifier-pairs
Some supplementary concept records (SCRs) in MeSH only have a preferredMappedTo whose object is an AllowedDescriptorQualifierPair rather than the usual TopicalDescriptor.
For example, the SCR Disease Familial spinal arachnoiditis has a preferredMappedTo pointing to the AllowedDescriptorQualifierPair Arachnoiditis/congenital. So Arachnoiditis is the parent topical descriptor and congenital is the qualifier.
Currently, these edges are dropped:
nxontology-data/nxontology_data/mesh/mesh.py
Lines 200 to 202 in 33ea9e9
Need to investigate whether any of these AllowedDescriptorQualifierPair parents would break our is-a / parent assumption. If they are consistent with a hierarchical conceptual relationship, then I'm thinking we just add an edge from Arachnoiditis to Familial spinal arachnoiditis with an edge property recording the congenital qualifier.
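The proposed edge could look roughly like this. It is a sketch only: `ParentEdge` and `edge_from_mapping` are hypothetical names, and splitting a "Descriptor/qualifier" label on the slash is a simplification. Real code would more likely follow the pair's descriptor and qualifier links in the RDF.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ParentEdge:
    parent: str               # topical descriptor label
    child: str                # SCR label
    qualifier: Optional[str]  # None when mapped directly to a descriptor


def edge_from_mapping(scr: str, mapped_to: str) -> ParentEdge:
    """Turn a preferredMappedTo target into a parent edge.

    Assumes pair targets are written "Descriptor/qualifier", as in the
    example above, and that plain descriptor targets contain no slash.
    """
    descriptor, sep, qualifier = mapped_to.partition("/")
    return ParentEdge(parent=descriptor, child=scr, qualifier=qualifier if sep else None)


edge = edge_from_mapping("Familial spinal arachnoiditis", "Arachnoiditis/congenital")
print(edge)
```

Keeping the qualifier as an optional edge property means edges to plain topical descriptors keep the same shape, with the qualifier left empty.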
Here's an example MeSH query for pharmacological action relationships:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX meshv: <http://id.nlm.nih.gov/mesh/vocab#>
SELECT *
FROM <http://id.nlm.nih.gov/mesh>
WHERE {
?source_uri meshv:pharmacologicalAction ?action_uri .
?source_uri rdfs:label ?source_label.
?action_uri rdfs:label ?action_label.
?source_uri meshv:identifier ?source_id.
?action_uri meshv:identifier ?action_id.
}
ORDER BY ?source_uri ?action_uri
That produces results like
source_uri | action_uri | source_label | action_label | source_id | action_id |
---|---|---|---|---|---|
mesh:C000002 | mesh:D000894 | bevonium | Anti-Inflammatory Agents, Non-Steroidal | C000002 | D000894 |
mesh:C000006 | mesh:D007004 | insulin, neutral | Hypoglycemic Agents | C000006 | D007004 |
mesh:C000081 | mesh:D000697 | 4-methylaminorex | Central Nervous System Stimulants | C000081 | D000697 |
mesh:C000082 | mesh:D000903 | alanosine | Antibiotics, Antineoplastic | C000082 | D000903 |
mesh:C000082 | mesh:D002614 | alanosine | Chelating Agents | C000082 | D002614 |
Currently, we are not extracting meshv:pharmacologicalAction relationships. Should we? Either as a separate table or in the core ontology as a valid edge type?
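To make the two options concrete, here is a small sketch of both shapes, using the identifier pairs from the result rows above. The function names and the `type` field are illustrative assumptions, not existing nxontology-data code.

```python
from collections import defaultdict

# (source_id, action_id) pairs copied from the result rows above.
rows = [
    ("C000002", "D000894"),
    ("C000006", "D007004"),
    ("C000081", "D000697"),
    ("C000082", "D000903"),
    ("C000082", "D002614"),
]


def as_typed_edges(pairs, edge_type="pharmacologicalAction"):
    """Option 1: edge records tagged with their relationship type."""
    return [{"source": s, "target": t, "type": edge_type} for s, t in pairs]


def actions_by_source(pairs):
    """Option 2: a separate lookup table from source id to action ids."""
    grouped = defaultdict(list)
    for source_id, action_id in pairs:
        grouped[source_id].append(action_id)
    return dict(grouped)


edges = as_typed_edges(rows)
lookup = actions_by_source(rows)
print(lookup["C000082"])
```

Note that a single source such as alanosine can have several actions, so either representation has to allow one-to-many relationships.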
From the June 18, 2015 MeSH RDF release notes:
Users now must specify the language tag @en when searching rdfs:label or any other string literal. See the sample queries page (queries 5 and 6) for examples. One preferred MeSH Heading, Central Nervous System which is D002493, has non-English strings as a proof-of-concept example. This sample will remain in the beta version but may not be included in the production MeSH RDF version.
We already filter out non-English matches in our identifiers query:
nxontology-data/nxontology_data/mesh/queries/identifiers.rq
Lines 14 to 18 in e55c903
But not in our synonyms table.
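If the synonyms were filtered after the query rather than inside it, the check could be as small as the sketch below. The (text, language tag) pair shape and the function name are assumptions, and the non-English entry is a placeholder, not an actual MeSH string.

```python
def english_only(literals):
    """Keep literals tagged "en", including regional variants such as "en-US"."""
    kept = []
    for text, lang in literals:
        tag = (lang or "").lower()
        if tag == "en" or tag.startswith("en-"):
            kept.append(text)
    return kept


# Hypothetical synonym literals as (text, language tag) pairs.
synonyms = [
    ("Central Nervous System", "en"),
    ("placeholder non-English label", "fr"),
    ("untagged label", None),
]
print(english_only(synonyms))
```

This version drops untagged literals too. Whether that is desirable depends on whether every MeSH string literal really carries a language tag.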
Here's a display of the vocabulary subclass graph from 2022 MeSH:
Code to create it:
from nxontology_data.mesh.mesh import MeshLoader
from IPython.display import Image
from networkx.drawing.nx_agraph import to_agraph
vocab = MeshLoader.create_vocab_digraph(rdf)
gviz = to_agraph(vocab)
gviz.layout("dot")
Image(gviz.draw(format="png"))
Code requires pygraphviz, which requires graphviz, which ends up being a problematic dependency on CI: it failed on the self-hosted runner. So I'm just posting this visualization in a GitHub issue and will remove pygraphviz from CI.
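A different technique that avoids the graphviz dependency entirely is to write the DOT text by hand and render it elsewhere. This is a sketch of that alternative, not what the repository does. The class pairs are illustrative stand-ins for the real vocabulary graph, and the function assumes names contain no double quotes.

```python
def to_dot(edges, name="vocab"):
    """Serialize (parent, child) pairs as Graphviz DOT text without importing graphviz."""
    lines = [f"digraph {name} {{"]
    for parent, child in sorted(edges):
        lines.append(f'  "{parent}" -> "{child}";')
    lines.append("}")
    return "\n".join(lines)


# Illustrative subclass pairs; the real ones would come from the vocabulary graph.
example_edges = {("Descriptor", "TopicalDescriptor"), ("Descriptor", "PublicationType")}
dot_text = to_dot(example_edges)
print(dot_text)
```

The resulting text can be saved to a file and rendered on any machine that has the `dot` command installed.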
MeSH includes some external mappings via the following predicates (from docs):
meshv:registryNumber: A property of Concepts. A unique identifier from one of these sources: Enzyme Commission (Example: EC 2.4.2.17; Example for Partial enzyme number: EC 1.4.3.-); Chemical Abstracts Service (CAS) (Example: 7004-12-8); FDA Substance Registration System Unique Identifier (UNII) in 10-character format (Example: R16CO5Y76E); or the value of 0 if no match is available from the previous sources. A single MeSH Concept can only have one Registry Number. Used for Concepts related to Descriptors in the D Category Drugs and Chemicals and for SupplementaryConceptRecords. MUI M0000115 example: 362O9ITL9D.
meshv:relatedRegistryNumber: A property of Concepts. An additional unique identifier for chemicals, which is sometimes followed by a label in parentheses. Multiple Related Registry Numbers are allowed for each Concept. For example, these might be salts and/or stereoisomers of the parent compound. Used for Concepts related to Descriptors in the D Category Drugs and Chemicals and for SupplementaryConceptRecords. MUI M0000115 example: 103-90-2 (Acetaminophen). MUI M0068239 example: 75821-71-5 (Ca salt)
meshv:casn1_label: A property of Concepts. Free-text of the Chemical Abstracts Type N1 Name which is the systematic name used in the Chemical Abstracts Chemical Substance and Formula Indexes. The systematic name is a unique name assigned to a chemical substance to represent its structure. First available in 1995. MUI M0000115 example: Acetamide, N-(4-hydroxyphenyl)-
Here's a query to access these:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX meshv: <http://id.nlm.nih.gov/mesh/vocab#>
SELECT DISTINCT *
FROM <http://id.nlm.nih.gov/mesh>
WHERE {
?concept_uri rdf:type meshv:Concept.
?concept_uri rdfs:label ?concept_label.
?concept_uri meshv:identifier ?concept_id.
VALUES ?predicate_uri {
meshv:registryNumber
meshv:relatedRegistryNumber
meshv:casn1_label
}
?concept_uri ?predicate_uri ?registry_number.
BIND( STRAFTER(STR(?predicate_uri), "mesh/vocab#") AS ?relationship_type )
FILTER (?registry_number != "0")
}
ORDER BY ?concept_uri ?predicate_uri ?registry_number
concept_uri | concept_label | concept_id | predicate_uri | registry_number | relationship_type |
---|---|---|---|---|---|
mesh:M0000001 | Calcimycin | M0000001 | meshv:casn1_label | 4-Benzoxazolecarboxylic acid, 5-(methylamino)-2-((3,9,11-trimethyl-8-(1-methyl-2-oxo-2-(1H-pyrrol-2-yl)ethyl)-1,7-dioxaspiro(5.5)undec-2-yl)methyl)-, (6S-(6alpha(2S*,3S*),8beta(R*),9beta,11alpha))- | casn1_label |
mesh:M0000001 | Calcimycin | M0000001 | meshv:registryNumber | 37H9VM9WZL | registryNumber |
mesh:M0000001 | Calcimycin | M0000001 | meshv:relatedRegistryNumber | 52665-69-7 (Calcimycin) | relatedRegistryNumber |
mesh:M0000002 | Temefos | M0000002 | meshv:casn1_label | Phosphorothioic acid, O,O'-(thiodi-4,1-phenylene) O,O,O',O'-tetramethyl ester | casn1_label |
mesh:M0000002 | Temefos | M0000002 | meshv:registryNumber | ONP3ME32DL | registryNumber |
mesh:M0000002 | Temefos | M0000002 | meshv:relatedRegistryNumber | 3383-96-8 (Temefos) | relatedRegistryNumber |
mesh:M0000011 | Abelson murine leukemia virus | M0000011 | meshv:registryNumber | txid11788 | registryNumber |
mesh:M0000055 | Abrin | M0000055 | meshv:casn1_label | Abrins | casn1_label |
mesh:M0000055 | Abrin | M0000055 | meshv:registryNumber | 1393-62-0 | registryNumber |
mesh:M0000061 | Abscisic Acid | M0000061 | meshv:registryNumber | 72S9A8J5GW | registryNumber |
mesh:M0000061 | Abscisic Acid | M0000061 | meshv:relatedRegistryNumber | 113349-29-4 ((Z,E)-isomer) | relatedRegistryNumber |
One challenge is that registry numbers appear to be local identifiers without any notation of their source.
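Since the source is not recorded, one option is to guess it from the identifier's format. The sketch below is a heuristic under stated assumptions: the patterns follow the examples in the documentation quoted above, the CAS check uses the standard CAS check-digit rule, and reading the `txid` prefix as an NCBI Taxonomy identifier is a guess based on the Abelson murine leukemia virus row. All names are hypothetical.

```python
import re

EC_PATTERN = re.compile(r"^EC \d+(\.(\d+|-)){3}$")
CAS_PATTERN = re.compile(r"^(\d{2,7})-(\d{2})-(\d)$")
UNII_PATTERN = re.compile(r"^[A-Z0-9]{10}$")


def cas_checksum_ok(value: str) -> bool:
    """Validate a CAS number: weighted digit sum modulo 10 equals the check digit."""
    match = CAS_PATTERN.match(value)
    if not match:
        return False
    digits = match.group(1) + match.group(2)
    # Weights count up from 1, starting at the digit next to the check digit.
    total = sum(weight * int(d) for weight, d in enumerate(reversed(digits), start=1))
    return total % 10 == int(match.group(3))


def guess_source(registry_number: str) -> str:
    """Guess the issuing source of a registry number from its format alone."""
    # Related registry numbers may carry a trailing label such as "(Ca salt)".
    value = registry_number.split(" (", 1)[0].strip()
    if EC_PATTERN.match(value):
        return "EC"
    if cas_checksum_ok(value):
        return "CAS"
    if UNII_PATTERN.match(value):
        return "UNII"
    if value.startswith("txid"):
        return "txid (assumed NCBI Taxonomy)"
    return "unknown"


for example in ["37H9VM9WZL", "52665-69-7 (Calcimycin)", "EC 1.4.3.-", "txid11788"]:
    print(example, "->", guess_source(example))
```

Format-based guesses can misfire, so anything classified this way would be better stored alongside the raw value than in place of it.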
It would be nice to have synonyms for each MeSH node (i.e. descriptors / SCRs).
From Concept Structure in MeSH:
Terms in a MeSH record which are strictly synonymous with each other are grouped in a category called a "Concept." (Not to be confused with Supplementary Concept Records.) See the Concept element in MeSH. Each MeSH record consists of one or more Concepts, and each Concept consists in one or more synonymous terms. For example,
Cardiomegaly [Descriptor]
    Cardiomegaly [Concept, Preferred]
        Cardiomegaly [Term, Preferred]
        Enlarged Heart [Term]
        Heart Enlargement [Term]
    Cardiac Hypertrophy [Concept, Narrower]
        Cardiac Hypertrophy [Term, Preferred]
        Heart Hypertrophy [Term]
This Descriptor record consists of two Concepts and five terms. Each Concept has a Preferred Term, which is also said to be the name of the Concept. And each record has a Preferred Concept. The name of the record - the term most often used to refer to the Descriptor - is the Preferred Term of the preferred Concept.
Within each Concept the terms are synonymous with each other. In contrast, the terms in one Concept are not strictly synonymous with terms in another Concept, even in the same record. For example, one concept in a record may be narrower than the Preferred Concept, as in the above example. Also note that the terms in a concept inherit this relationship and so are narrower, for example, than the terms in the other concept. However, all the terms in a record are equivalent for purposes of indexing and searching MEDLINE and so they are still entry terms for the record.
A more complex example, with three Concepts and 12 terms.
AIDS Dementia Complex [Descriptor]
    AIDS Dementia Complex [Concept, Preferred]
        AIDS Dementia Complex [Term, Preferred]
        Acquired-Immune Deficiency Syndrome Dementia Complex [Term]
        AIDS-Related Dementia Complex [Term]
        HIV Dementia [Term]
        Dementia Complex, Acquired Immune Deficiency Syndrome [Term]
        Dementia Complex, AIDS-Related [Term]
    HIV Encephalopathy [Concept, Narrower]
        HIV Encephalopathy [Term, Preferred]
        AIDS Encephalopathy [Term]
        Encephalopathy, HIV [Term, Preferred]
        Encephalopathy, AIDS [Term]
    HIV-1-Associated Cognitive Motor Complex [Concept, Narrower]
        HIV-1-Associated Cognitive Motor Complex [Term, Preferred]
        HIV-1 Cognitive and Motor Complex [Term]
... Note that this three-tiered structure is within a given record, not between separate records. This is in contrast to the MeSH Tree Structures, which are hierarchical in structure, but the relationships are between different Descriptor records. MeSH includes both types of relationships. See "Concepts, Synonyms, and Descriptor Structure" in Introduction to MeSH in XML format.
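For deciding what to export as synonyms, the record / Concept / term tiers described above can be modelled roughly as follows. This is a sketch with hypothetical class and method names, populated with the Cardiomegaly example from the quoted documentation.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    name: str      # the Concept's Preferred Term
    relation: str  # "Preferred" or "Narrower" relative to the record
    terms: list[str] = field(default_factory=list)  # includes the Preferred Term


@dataclass
class Descriptor:
    name: str
    concepts: list[Concept]

    def strict_synonyms(self) -> list[str]:
        """Terms of the Preferred Concept only, excluding the record name."""
        preferred = next(c for c in self.concepts if c.relation == "Preferred")
        return [t for t in preferred.terms if t != self.name]

    def entry_terms(self) -> list[str]:
        """All terms in the record, which MeSH treats as equivalent for indexing."""
        return [t for c in self.concepts for t in c.terms]


cardiomegaly = Descriptor(
    name="Cardiomegaly",
    concepts=[
        Concept("Cardiomegaly", "Preferred", ["Cardiomegaly", "Enlarged Heart", "Heart Enlargement"]),
        Concept("Cardiac Hypertrophy", "Narrower", ["Cardiac Hypertrophy", "Heart Hypertrophy"]),
    ],
)
print(cardiomegaly.strict_synonyms())
print(cardiomegaly.entry_terms())
```

The two methods reflect the choice a synonyms table has to make: only terms strictly synonymous with the record name, or every entry term including those from narrower Concepts.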
Also noting this reference from "Concepts, Synonyms, and Descriptor Structure":
Redefining a Thesaurus: Term-Centric No More
Douglas Johnston, Stuart J Nelson, Jacque-Lynne A Schulman, Allan G Savage, Tammy P Powell
Proceedings of the AMIA Symposium (1998)
PMCID: PMC2232255