
nxontology-data's People

Contributors

bfoltyn, dhimmel, ravwojdyla, trangdata

nxontology-data's Issues

Accessing MeSH NXO

Hey @dhimmel, I was trying to pull the 2021 MeSH NXO like this:

import pandas as pd
from nxontology import NXOntology
url = "https://github.com/related-sciences/nxontology-data/raw/71cf538dc5c258ada880d58663b0205b7b7f8561/001_medical_subject_headings_mesh_desctree.json.gz"
nxo = NXOntology.read_node_link_json(url)

I was a little surprised to find that the node ids are ints and that there isn't a lot of data attached to them:

pd.Series(type(n) for n in nxo.graph.nodes).value_counts()
<class 'int'>    920388

nxo.node_info(1).data
{'name': 'Organisms Category',
 'description': None,
 'pubchem_hnid': 1269010,
 'url': 'http://www.ncbi.nlm.nih.gov/mesh/1000066'}

Is there another way to get the unique ids, class, and tree numbers (for descriptors)?
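One way to see exactly which identifiers and attributes a release file carries is to inspect its node-link JSON directly, without going through nxontology. The following is a minimal stdlib sketch. It assumes the .json.gz file has already been downloaded locally and uses the networkx node-link layout with each node's identifier under the "id" key (the networkx default). The function names are hypothetical.

```python
import gzip
import json
from collections import Counter
from typing import Any


def load_node_link_gz(path: str) -> dict[str, Any]:
    """Read a gzipped node-link JSON file from local disk."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return json.load(f)


def summarize_node_link(data: dict[str, Any]) -> tuple[Counter, Counter]:
    """Tally node identifier types and node attribute keys.

    Assumes node-link layout: data["nodes"] is a list of dicts whose
    "id" key holds the node identifier (networkx's default).
    """
    id_types: Counter = Counter()
    attr_keys: Counter = Counter()
    for node in data["nodes"]:
        id_types[type(node["id"]).__name__] += 1
        # every key other than "id" is a node data attribute
        attr_keys.update(key for key in node if key != "id")
    return id_types, attr_keys
```

If the attribute tally shows only keys like name, description, pubchem_hnid and url, then the file does not carry MeSH unique IDs or tree numbers as node data, whatever loader is used.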

EFO cross-references: classify as exact/close when possible

background in EBISPOT/efo#935

We currently extract database cross-references for EFO using the oboInOwl:hasDbXref predicate. However, MONDO is providing xrefs with greater specificity using the mondo:exactMatch and mondo:closeMatch predicates. Furthermore, there are axioms (with rdf:type owl:Axiom) that annotate oboInOwl:hasDbXref instances with values like MONDO:equivalentTo.

EFO:0000479 is a good example of a class that has all types of xrefs:

  1. oboInOwl:hasDbXref without axioms
  2. oboInOwl:hasDbXref with axioms
  3. mondo:exactMatch and mondo:closeMatch

It would be nice to better understand the relationship between xref types 2 and 3.
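One possible classification scheme is sketched below. It assumes the three kinds of xrefs have already been extracted as plain tuples. When the same (EFO class, xref) pair is asserted under several predicates, it keeps the most specific one. The predicate ranking, the Xref class, and the tuple shapes are all hypothetical. Because the relation between axiom-annotated hasDbXref and mondo:exactMatch/closeMatch is still an open question, the axiom values are kept as raw annotations rather than mapped to exact/close.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Iterable

# Hypothetical ranking (lower wins): prefer the most specific predicate
# when one (EFO class, xref) pair is asserted more than once.
PREDICATE_MATCH = {
    "mondo:exactMatch": ("exact", 0),
    "mondo:closeMatch": ("close", 1),
    "oboInOwl:hasDbXref": ("unspecified", 2),
}


@dataclass
class Xref:
    efo_id: str
    xref: str
    match: str
    # raw owl:Axiom annotation values (e.g. MONDO:equivalentTo), kept for inspection
    axiom_values: set[str] = field(default_factory=set)


def classify_xrefs(
    triples: Iterable[tuple[str, str, str]],
    axioms: Iterable[tuple[str, str, str]],
) -> list[Xref]:
    """triples: (efo_id, predicate, xref); axioms: (efo_id, xref, annotation_value)."""
    best: dict[tuple[str, str], tuple[str, int]] = {}
    for efo_id, predicate, xref in triples:
        match, rank = PREDICATE_MATCH[predicate]
        key = (efo_id, xref)
        if key not in best or rank < best[key][1]:
            best[key] = (match, rank)
    annotations: dict[tuple[str, str], set[str]] = defaultdict(set)
    for efo_id, xref, value in axioms:
        annotations[(efo_id, xref)].add(value)
    return [
        Xref(efo_id, xref, match, set(annotations.get((efo_id, xref), ())))
        for (efo_id, xref), (match, _) in sorted(best.items())
    ]
```

Keeping the axiom values alongside the predicate-based label makes it possible to compare types 2 and 3 on a class like EFO:0000479 before deciding whether an annotation such as MONDO:equivalentTo should upgrade a plain hasDbXref to exact.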

MeSH: include qualifiers as a node property

It would be great for topical descriptor nodes in our MeSH ontologies to have a data attribute/property, such as qualifiers, that lists the allowed qualifiers.

For example, the disease Exostoses, Multiple Hereditary (D005097) has the following qualifiers available, each paired with D005097 through hasDescriptor:

  • diagnostic imaging D005097
  • blood D005097
  • therapy D005097
  • history D005097
  • mortality D005097
  • prevention & control D005097
  • surgery D005097
  • diagnosis D005097
  • classification D005097
  • radiotherapy D005097
  • and some more

See also https://hhs.github.io/meshrdf/descriptor-qualifier-pairs
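A minimal sketch of building such an attribute is shown below. It assumes rows of (descriptor ID, qualifier label) have already been pulled from the allowed descriptor-qualifier pairs. The query that would produce them is not shown, and the function name is hypothetical.

```python
from collections import defaultdict
from typing import Iterable


def qualifiers_by_descriptor(
    pairs: Iterable[tuple[str, str]],
) -> dict[str, list[str]]:
    """Group (descriptor_id, qualifier_label) rows into one sorted list per descriptor."""
    grouped: dict[str, set[str]] = defaultdict(set)
    for descriptor_id, qualifier in pairs:
        grouped[descriptor_id].add(qualifier)  # the set drops duplicate rows
    return {descriptor_id: sorted(q) for descriptor_id, q in grouped.items()}
```

The resulting dict could then be attached with networkx's set_node_attributes, e.g. `networkx.set_node_attributes(nxo.graph, qualifiers, name="qualifiers")`. This assumes the graph's node IDs are descriptor IDs such as D005097.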

How to handle MeSH supplemental concepts that only map to an AllowedDescriptorQualifierPair

Some supplemental concept records (SCRs) in MeSH only have a preferredMappedTo whose object is an AllowedDescriptorQualifierPair rather than the usual TopicalDescriptor.

For example, the disease SCR Familial spinal arachnoiditis has a preferredMappedTo pointing to the AllowedDescriptorQualifierPair Arachnoiditis/congenital. So Arachnoiditis is the parent topical descriptor and congenital is the qualifier.

Currently, these edges are dropped:

except nxontology.exceptions.NodeNotFound:
    # meshv:AllowedDescriptorQualifierPair nodes like D014199Q000031 aren't included as nodes
    pass

We need to investigate whether any of these AllowedDescriptorQualifierPair parents would break our is-a / parent assumption. If they are consistent with a hierarchical conceptual relationship, then I'm thinking we just add an edge from Arachnoiditis to Familial spinal arachnoiditis with an edge property for the congenital qualifier.
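The proposed edge could be derived from the pair identifier itself. A minimal sketch follows. It assumes every pair ID is a descriptor ID immediately followed by a qualifier ID, as in the D014199Q000031 example. The QualifiedEdge type and its qualifier field name are hypothetical. The sketch does not address the open question of whether such edges respect the is-a assumption.

```python
import re
from typing import NamedTuple

# Assumption: pair IDs concatenate a descriptor ID (D...) and a qualifier ID (Q...)
PAIR_ID = re.compile(r"(D\d+)(Q\d+)")


class QualifiedEdge(NamedTuple):
    parent: str     # topical descriptor, e.g. Arachnoiditis
    child: str      # supplemental concept record
    qualifier: str  # edge property, e.g. congenital


def edge_from_pair_mapping(scr_id: str, pair_id: str) -> QualifiedEdge:
    """Turn an SCR -> descriptor/qualifier pair mapping into a qualified parent edge."""
    match = PAIR_ID.fullmatch(pair_id)
    if match is None:
        raise ValueError(f"not a descriptor-qualifier pair id: {pair_id!r}")
    descriptor_id, qualifier_id = match.groups()
    return QualifiedEdge(parent=descriptor_id, child=scr_id, qualifier=qualifier_id)
```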

MeSH: extract pharmacological action relationships

Here's an example MeSH query for pharmacological action relationships:

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX meshv: <http://id.nlm.nih.gov/mesh/vocab#>
SELECT *
FROM <http://id.nlm.nih.gov/mesh>
WHERE {
  ?source_uri meshv:pharmacologicalAction ?action_uri .
  ?source_uri rdfs:label ?source_label.
  ?action_uri rdfs:label ?action_label.
  ?source_uri meshv:identifier ?source_id.
  ?action_uri meshv:identifier ?action_id.
}
ORDER BY ?source_uri ?action_uri

That produces results like

source_uri | action_uri | source_label | action_label | source_id | action_id
mesh:C000002 | mesh:D000894 | bevonium | Anti-Inflammatory Agents, Non-Steroidal | C000002 | D000894
mesh:C000006 | mesh:D007004 | insulin, neutral | Hypoglycemic Agents | C000006 | D007004
mesh:C000081 | mesh:D000697 | 4-methylaminorex | Central Nervous System Stimulants | C000081 | D000697
mesh:C000082 | mesh:D000903 | alanosine | Antibiotics, Antineoplastic | C000082 | D000903
mesh:C000082 | mesh:D002614 | alanosine | Chelating Agents | C000082 | D002614

Currently, we are not extracting meshv:pharmacologicalAction relationships. Should we? Either as a separate table or in the core ontology as a valid edge type?
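If these relationships end up in a separate table, the query output could be flattened to TSV with the standard library alone. This sketch assumes the query is run with SPARQL 1.1 JSON results output, where each binding maps a variable name to an object with "type" and "value" keys. It keeps only the identifier and label variables from the query above. The column order and function name are hypothetical choices.

```python
import csv
import io
from typing import Any

# Hypothetical column order for a separate pharmacological action table
COLUMNS = ["source_id", "source_label", "action_id", "action_label"]


def bindings_to_tsv(results: dict[str, Any]) -> str:
    """Convert SPARQL 1.1 JSON query results into a tab-separated edge table."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(COLUMNS)
    for binding in results["results"]["bindings"]:
        # each bound variable maps to {"type": ..., "value": ...}
        writer.writerow([binding[column]["value"] for column in COLUMNS])
    return buffer.getvalue()
```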

MeSH: should we exclude non-English labels?

From the June 18, 2015 MeSH RDF release notes:

Users now must specify the language tag @en when searching rdfs:label or any other string literal. See the sample queries page (queries 5 and 6) for examples. One preferred MeSH Heading, Central Nervous System which is D002493, has non-English strings as a proof-of-concept example. This sample will remain in the beta version but may not be included in the production MeSH RDF version.

We already filter out non-English matches in our identifiers query:

OPTIONAL {
# meshv:prefLabel is used for meshv:Term
?mesh_uri rdfs:label|meshv:prefLabel ?mesh_label .
FILTER (langMatches(lang(?mesh_label), "EN")) .
}

But not in our synonyms table.
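The most direct fix is probably to add the same langMatches FILTER to the synonyms query. If filtering happens in Python instead, the sketch below mimics basic langMatches behaviour: a case-insensitive match on the whole tag or a prefix followed by "-". It assumes synonym rows arrive as (mesh_id, label, language_tag), for example taken from the "xml:lang" field of SPARQL JSON results. The row shape and function names are hypothetical.

```python
from typing import Iterable


def lang_matches(tag: str, lang_range: str) -> bool:
    """Basic language-range matching in the spirit of SPARQL langMatches (RFC 4647)."""
    tag, lang_range = tag.lower(), lang_range.lower()
    if lang_range == "*":
        return tag != ""  # "*" matches any non-empty tag
    return tag == lang_range or tag.startswith(lang_range + "-")


def english_synonyms(
    rows: Iterable[tuple[str, str, str]],
) -> list[tuple[str, str]]:
    """Keep (mesh_id, label) for rows whose language tag matches "EN"."""
    return [
        (mesh_id, label)
        for mesh_id, label, lang in rows
        if lang_matches(lang, "EN")
    ]
```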

MeSH vocabulary subclass graph

Here's a display of the vocabulary subclass graph from 2022 MeSH:

[Figure: vocabulary subclass graph from 2022 MeSH]

Code to create it:

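from nxontology_data.mesh.mesh import MeshLoader
from IPython.display import Image
from networkx.drawing.nx_agraph import to_agraph

# rdf: the MeSH RDF graph, loaded beforehand (loading code not shown in this issue)
vocab = MeshLoader.create_vocab_digraph(rdf)
# convert the networkx digraph to a pygraphviz AGraph and lay it out top-down
gviz = to_agraph(vocab)
gviz.layout("dot")
# draw() with no path returns the rendered PNG bytes for inline display
Image(gviz.draw(format="png"))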

This code requires pygraphviz, which in turn requires graphviz, a problematic dependency on CI: it failed on the self-hosted runner. So I'm just posting this visualization in a GitHub issue and will remove pygraphviz from CI.
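The issue takes a different route, but one way to keep producing the graph without pygraphviz on CI is to write plain DOT text and render it with `dot -Tpng` on any machine that has graphviz installed. The following is a minimal sketch. The function names are hypothetical, nodes are converted with str(), and isolated nodes are not emitted because only edges are written.

```python
from typing import Iterable


def quote_dot_id(value: str) -> str:
    """Quote a DOT identifier, escaping embedded double quotes."""
    return '"' + value.replace('"', '\\"') + '"'


def edges_to_dot(
    edges: Iterable[tuple[object, object]], graph_name: str = "vocab"
) -> str:
    """Serialize directed edges as DOT text; rendering is left to graphviz."""
    lines = [f"digraph {graph_name} {{"]
    for source, target in edges:
        lines.append(f"  {quote_dot_id(str(source))} -> {quote_dot_id(str(target))};")
    lines.append("}")
    return "\n".join(lines) + "\n"
```

For example, `edges_to_dot(vocab.edges())` could be written to a .dot file as a CI artifact and rendered locally.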

Extract MeSH mappings to external registries / vocabularies

MeSH includes some external mappings via the following predicates (from docs):

  • meshv:registryNumber: A property of Concepts. A unique identifier from one of these sources: Enzyme Commission (Example: EC 2.4.2.17; Example for Partial enzyme number: EC 1.4.3.-); Chemical Abstracts Service (CAS) (Example: 7004-12-8); FDA Substance Registration System Unique Identifier (UNII) in 10-character format (Example: R16CO5Y76E); or the value of 0 if no match is available from the previous sources. A single MeSH Concept can only have one Registry Number. Used for Concepts related to Descriptors in the D Category Drugs and Chemicals and for SupplementaryConceptRecords. MUI M0000115 example: 362O9ITL9D.

  • meshv:relatedRegistryNumber: A property of Concepts. An additional unique identifier for chemicals, which is sometimes followed by a label in parentheses. Multiple Related Registry Numbers are allowed for each Concept. For example, these might be salts and/or stereoisomers of the parent compound. Used for Concepts related to Descriptors in the D Category Drugs and Chemicals and for SupplementaryConceptRecords. MUI M0000115 example: 103-90-2 (Acetaminophen). MUI M0068239 example: 75821-71-5 (Ca salt).

  • meshv:casn1_label: A property of Concepts. Free-text of the Chemical Abstracts Type N1 Name which is the systematic name used in the Chemical Abstracts Chemical Substance and Formula Indexes. The systematic name is a unique name assigned to a chemical substance to represent its structure. First available in 1995. MUI M0000115 example: Acetamide, N-(4-hydroxyphenyl)-

Here's a query to access these:

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX meshv: <http://id.nlm.nih.gov/mesh/vocab#>
SELECT DISTINCT *
FROM <http://id.nlm.nih.gov/mesh>
WHERE { 
  ?concept_uri rdf:type meshv:Concept.
  ?concept_uri rdfs:label ?concept_label.
  ?concept_uri meshv:identifier ?concept_id.
  VALUES ?predicate_uri {
    meshv:registryNumber
    meshv:relatedRegistryNumber
    meshv:casn1_label
  }
  ?concept_uri ?predicate_uri ?registry_number.
  BIND( STRAFTER(STR(?predicate_uri), "mesh/vocab#") AS ?relationship_type )
  FILTER (?registry_number != "0")
}
ORDER BY ?concept_uri ?predicate_uri ?registry_number

That produces results like:

concept_uri | concept_label | concept_id | predicate_uri | registry_number | relationship_type
mesh:M0000001 | Calcimycin | M0000001 | meshv:casn1_label | 4-Benzoxazolecarboxylic acid, 5-(methylamino)-2-((3,9,11-trimethyl-8-(1-methyl-2-oxo-2-(1H-pyrrol-2-yl)ethyl)-1,7-dioxaspiro(5.5)undec-2-yl)methyl)-, (6S-(6alpha(2S*,3S*),8beta(R*),9beta,11alpha))- | casn1_label
mesh:M0000001 | Calcimycin | M0000001 | meshv:registryNumber | 37H9VM9WZL | registryNumber
mesh:M0000001 | Calcimycin | M0000001 | meshv:relatedRegistryNumber | 52665-69-7 (Calcimycin) | relatedRegistryNumber
mesh:M0000002 | Temefos | M0000002 | meshv:casn1_label | Phosphorothioic acid, O,O'-(thiodi-4,1-phenylene) O,O,O',O'-tetramethyl ester | casn1_label
mesh:M0000002 | Temefos | M0000002 | meshv:registryNumber | ONP3ME32DL | registryNumber
mesh:M0000002 | Temefos | M0000002 | meshv:relatedRegistryNumber | 3383-96-8 (Temefos) | relatedRegistryNumber
mesh:M0000011 | Abelson murine leukemia virus | M0000011 | meshv:registryNumber | txid11788 | registryNumber
mesh:M0000055 | Abrin | M0000055 | meshv:casn1_label | Abrins | casn1_label
mesh:M0000055 | Abrin | M0000055 | meshv:registryNumber | 1393-62-0 | registryNumber
mesh:M0000061 | Abscisic Acid | M0000061 | meshv:registryNumber | 72S9A8J5GW | registryNumber
mesh:M0000061 | Abscisic Acid | M0000061 | meshv:relatedRegistryNumber | 113349-29-4 ((Z,E)-isomer) | relatedRegistryNumber

One challenge is that registry numbers appear to be local identifiers with no indication of which registry they come from.
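One workaround is to guess the source registry from the value's format. The sketch below uses the formats given in the predicate docs above: an "EC " prefix, the hyphenated CAS form validated by its check digit, and 10-character UNIIs. It also treats the txid prefix seen for Abelson murine leukemia virus as an NCBI Taxonomy ID. The patterns, precedence, and returned labels are assumptions, and ambiguous values could still be misclassified. The function should only be applied to registryNumber and relatedRegistryNumber values, not to casn1_label.

```python
import re
from typing import Optional

# Assumed formats, based on the examples in the MeSH predicate docs
EC = re.compile(r"EC \d[\d.\-n]*")         # e.g. EC 2.4.2.17, EC 1.4.3.-
CAS = re.compile(r"(\d{2,7})-(\d{2})-(\d)")  # e.g. 7004-12-8
UNII = re.compile(r"[A-Z0-9]{10}")          # e.g. R16CO5Y76E
TAXON = re.compile(r"txid\d+")              # e.g. txid11788


def cas_checksum_ok(number: str) -> bool:
    """Validate a CAS RN check digit: digits weighted 1, 2, 3... from the right, mod 10."""
    match = CAS.fullmatch(number)
    if match is None:
        return False
    body = match.group(1) + match.group(2)
    total = sum(int(d) * i for i, d in enumerate(reversed(body), start=1))
    return total % 10 == int(match.group(3))


def guess_registry(value: str) -> Optional[str]:
    """Return a hypothetical registry label for a registry number, or None."""
    # relatedRegistryNumber values may carry a label, e.g. "52665-69-7 (Calcimycin)"
    number = value.split(" (", 1)[0].strip()
    if EC.fullmatch(number):
        return "EC"
    if cas_checksum_ok(number):
        return "CAS"
    if TAXON.fullmatch(number):
        return "NCBITaxon"
    if UNII.fullmatch(number):
        return "UNII"
    return None
```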

MeSH: add name synonyms from mesh concepts / terms

It would be nice to have synonyms for each MeSH node (i.e. descriptor / SCRs).

Background

From Concept Structure in MeSH:

Terms in a MeSH record which are strictly synonymous with each other are grouped in a category called a "Concept." (Not to be confused with Supplementary Concept Records.) See the Concept element in MeSH. Each MeSH record consists of one or more Concepts, and each Concept consists in one or more synonymous terms. For example,

Cardiomegaly [Descriptor]
     Cardiomegaly                      [Concept, Preferred]
          Cardiomegaly                    [Term, Preferred]
          Enlarged Heart                  [Term]
          Heart Enlargement               [Term]
     Cardiac Hypertrophy               [Concept, Narrower]
          Cardiac Hypertrophy             [Term, Preferred]
          Heart Hypertrophy               [Term]

This Descriptor record consists of two Concepts and five terms. Each Concept has a Preferred Term, which is also said to be the name of the Concept. And each record has a Preferred Concept. The name of the record - the term most often used to refer to the Descriptor - is the Preferred Term of the preferred Concept.

Within each Concept the terms are synonymous with each other. In contrast, the terms in one Concept are not strictly synonymous with terms in another Concept, even in the same record. For example, one concept in a record may be narrower than the Preferred Concept, as in the above example. Also note that the terms in a concept inherit this relationship and so are narrower, for example, than the terms in the other concept. However, all the terms in a record are equivalent for purposes of indexing and searching MEDLINE and so they are still entry terms for the record.

A more complex example, with three Concepts and 12 terms.

AIDS Dementia Complex [Descriptor]
     AIDS Dementia Complex                                   [Concept, Preferred]
          AIDS Dementia Complex                                 [Term, Preferred]
          Acquired-Immune Deficiency Syndrome Dementia Complex  [Term]
          AIDS-Related Dementia Complex                         [Term]
          HIV Dementia                                          [Term]
          Dementia Complex, Acquired Immune Deficiency Syndrome [Term]
          Dementia Complex, AIDS-Related                        [Term]
     HIV Encephalopathy                                       [Concept, Narrower]
          HIV Encephalopathy                                    [Term, Preferred]
          AIDS Encephalopathy                                   [Term]
          Encephalopathy, HIV                                   [Term, Preferred]
          Encephalopathy, AIDS                                  [Term]
     HIV-1-Associated Cognitive Motor Complex                [Concept, Narrower]
          HIV-1-Associated Cognitive Motor Complex              [Term, Preferred]
          HIV-1 Cognitive and Motor Complex                     [Term]

... Note that this three-tiered structure is within a given record, not between separate records. This is in contrast to the MeSH Tree Structures, which are hierarchical in structure, but the relationships are between different Descriptor records. MeSH includes both types of relationships. See "Concepts, Synonyms, and Descriptor Structure" in Introduction to MeSH in XML format.

Also noting this reference from "Concepts, Synonyms, and Descriptor Structure":

Redefining a Thesaurus: Term-Centric No More
Douglas Johnston, Stuart J Nelson, Jacque-Lynne A Schulman, Allan G Savage, Tammy P Powell
Proceedings of the AMIA Symposium (1998)
PMCID: PMC2232255
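The Concept structure suggests two possible synonym policies for a MeSH node. The strict policy uses only terms from the preferred Concept, which are strictly synonymous with the record name. The broader policy uses every entry term in the record, including terms from narrower Concepts. The following is a minimal sketch of both. It uses a hypothetical Concept class rather than any existing nxontology-data type and assumes each record has exactly one preferred Concept.

```python
from dataclasses import dataclass


@dataclass
class Concept:
    name: str           # the Concept's preferred term
    terms: list[str]    # all terms in the Concept, including its preferred term
    is_preferred: bool  # True for the record's preferred Concept


def record_synonyms(concepts: list[Concept], strict: bool = True) -> list[str]:
    """Synonyms for a MeSH record (descriptor or SCR), excluding the record name.

    strict=True: only terms of the preferred Concept.
    strict=False: every entry term in the record, including narrower Concepts.
    """
    # raises StopIteration if no Concept is marked preferred
    preferred = next(c for c in concepts if c.is_preferred)
    selected = [preferred] if strict else concepts
    return [t for c in selected for t in c.terms if t != preferred.name]
```

With the Cardiomegaly record above, the strict policy yields Enlarged Heart and Heart Enlargement. The broader policy also adds Cardiac Hypertrophy and Heart Hypertrophy, which are narrower rather than synonymous.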
