Comments (13)

caufieldjh commented on September 15, 2024

Output of newly added script get_all_transform_stats.sh:

All processed ontologies:               910
All successful JSON transforms:         896
All successful KGX TSV transforms:      888
Transforms with at least one of the following errors:
MISSING_NODE_PROPERTY                   0
MISSING_EDGE_PROPERTY                   0
INVALID_NODE_PROPERTY                   833
INVALID_EDGE_PROPERTY                   796
INVALID_NODE_PROPERTY_VALUE_TYPE        28
INVALID_NODE_PROPERTY_VALUE             833
INVALID_EDGE_PROPERTY_VALUE_TYPE        0
INVALID_EDGE_PROPERTY_VALUE             796
MISSING_CATEGORY                        0
INVALID_CATEGORY                        888
Category 'OntologyClass' is a mixin in the Biolink Model        888
MISSING_EDGE_PREDICATE                  0
INVALID_EDGE_PREDICATE                  495
MISSING_NODE_CURIE_PREFIX               0
DUPLICATE_NODE                          0
MISSING_NODE                            0
INVALID_EDGE_TRIPLE                     0
VALIDATION_SYSTEM_ERROR                 0

The big take-home here is that entities in every successful KGX transform (888 of 888) get assigned biolink:OntologyClass, even though Biolink models OntologyClass as a mixin rather than as a class meant to be used as a type in its own right.

Do we know enough about each ontology to assign a more specific class to nodes?

There are other metaclasses, like [biolink:TaxonomicRank](https://w3id.org/biolink/vocab/TaxonomicRank) - these may still make sense to use in some contexts.
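As a rough illustration of what such a reassignment could look like, here is a minimal sketch that drops mixin categories from a node's category list whenever something more specific is present. The `MIXIN_CATEGORIES` set, the fallback choice, and the function name are assumptions made for this example; none of this is part of bioportal-to-kgx or KGX.

```python
# Hypothetical set of categories to treat as mixins. Only OntologyClass is
# named as a mixin in the validation output above.
MIXIN_CATEGORIES = {"biolink:OntologyClass"}

# Assumed fallback when no non-mixin category remains.
FALLBACK_CATEGORY = "biolink:NamedThing"


def refine_categories(categories):
    """Drop mixin categories; fall back to NamedThing if nothing else is left."""
    kept = [c for c in categories if c not in MIXIN_CATEGORIES]
    return kept if kept else [FALLBACK_CATEGORY]


# A node that already carries a more specific category keeps only that one.
print(refine_categories(["biolink:OntologyClass", "biolink:Cell"]))
# A node with only the mixin falls back to the generic category.
print(refine_categories(["biolink:OntologyClass"]))
```

This only removes the invalid category; it does not by itself discover a more informative type, which would still need per-ontology knowledge or mappings.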

from bioportal-to-kgx.

caufieldjh commented on September 15, 2024

Finding appropriate mappings vs. Biolink is a goal for kgx - that will help to reduce the number of OntologyClass nodes.

caufieldjh commented on September 15, 2024

Completely failed transforms:

| ID | Name | Issue |
| --- | --- | --- |
| NIFSTD | Neuroscience Information Framework (NIF) Standard Ontology | #15 |
| EXACT | An ontology for experimental actions | #15; Small, alpha status, last uploaded 2014 |
| DOID | Human Disease Ontology | #15; Unknown - would really expect this to work |
| ECOCORE | An ontology of core ecological entities | Empty? Error in Bioportal? Last updated Mar 10 2022 |
| ETHIOPIADISEASES | EthiopiaDiseaseList | Empty? Does not render on Bioportal |
| LC-CARRIERS | Library of Congress Carriers Scheme | Empty? In SKOS format; does not render on Bioportal |
| SCDO | Sickle Cell Disease Ontology | #15 |
| FENICS | Functional Epilepsy Nomenclature for Ion Channels | #15; Using webprotege: prefix (unsure if related to transform fail) |
| FOVT | FuTRES Ontology of Vertebrate Traits | #15 |
| PTRANS | Pathogen Transmission Ontology | #15; Does not render on Bioportal |
| TIMEBANK | Timebank Ontology | #15 |
| GSSO | Gender, Sex, and Sexual Orientation Ontology | #15 |
| CST | Cancer Staging Terms | Unknown; Does not render on Bioportal |
| MARC-RELATORS | MARC Code List for Relators | Empty? Does not render on Bioportal |

caufieldjh commented on September 15, 2024

Transforms translating to Obojson but not to KGX TSV:

| ID | Name | Issue |
| --- | --- | --- |
| PDRO | The Prescription of Drugs Ontology | Unknown CURIE prefix: file |
| VICO | Vaccination Informed Consent Ontology | Unknown CURIE prefix: file |
| IXNO | Interaction Ontology | Last updated in 2011; Unknown CURIE prefix: file |
| IDQA | Image and Data Quality Assessment Ontology | Unknown CURIE prefix: file |
| KTAO | Kidney Tissue Atlas Ontology | Unknown CURIE prefix: file |
| GAZ | Gazetteer | Unknown CURIE prefix: file; KG-OBO transforms GAZ w/o issue, see https://kg-hub.berkeleybop.io/kg-obo/gaz/no_version/ |
| CANONT | Upper-Level Cancer Ontology | Last updated in 2012; Unknown CURIE prefix: file |

These generally fail because the OBONamespace is set to a local file path. In at least one case (VICO), the cause is instead a reference to another namespace (GAZ) that begins with file:.
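To see how widespread the `file:` prefix is in a given transform, one option is to tally the CURIE prefixes in the node identifiers. The sketch below assumes a KGX-style tab-separated nodes file with an `id` column; the function name and the sample rows are made up for illustration and do not come from any of the ontologies listed above.

```python
import csv
import io
from collections import Counter


def count_curie_prefixes(nodes_tsv):
    """Count CURIE prefixes found in the `id` column of a nodes TSV."""
    reader = csv.DictReader(nodes_tsv, delimiter="\t")
    prefixes = Counter()
    for row in reader:
        # Everything before the first colon is treated as the prefix.
        prefix, sep, _ = row["id"].partition(":")
        prefixes[prefix if sep else "(no prefix)"] += 1
    return prefixes


# Made-up sample rows; a real run would pass an open file handle instead.
sample = io.StringIO(
    "id\tcategory\tname\n"
    "GAZ:00000001\tbiolink:NamedThing\texample term A\n"
    "file:/tmp/local.owl#Thing1\tbiolink:OntologyClass\texample term B\n"
)
counts = count_curie_prefixes(sample)
print(counts["file"])  # number of node ids using the file: prefix
```

A nonzero count for `file` would point to identifiers derived from a local path rather than a registered namespace.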

caufieldjh commented on September 15, 2024

See #23 for Unknown CURIE prefix: file issue.

caufieldjh commented on September 15, 2024

With issues #15 and #23 resolved, the only remaining problematic transforms are:

  • ECOCORE (use current BioPortal submission)
  • ETHIOPIADISEASES (drop)
  • LC-CARRIERS (drop)
  • CST (drop)
  • MARC-RELATORS (drop)

caufieldjh commented on September 15, 2024

ECOCORE has a new version on BioPortal - can just use this for now:
https://bioportal.bioontology.org/ontologies/ECOCORE/?p=summary

Can drop LC-CARRIERS and MARC-RELATORS.

jvendetti commented on September 15, 2024

Hi Harry.

ETHIOPIADISEASES

The latest submission in our system was corrupt. I recreated/reprocessed the submission so that the ontology is accessible again:

https://bioportal.bioontology.org/ontologies/ETHIOPIADISEASES?p=summary

CST

It looks like the end user uploaded an ontology source file for this entry, but we were never able to load the data into the triplestore, because our code errors out when we try to serialize to RDF/XML format with the following error:

org.semanticweb.owlapi.rdf.rdfxml.renderer.IllegalElementNameException: Illegal Element Name (Element Is Not A QName): http://www.w3.org/2000/01/rdf-schema#comment:

I think this one could probably be dropped for now.

caufieldjh commented on September 15, 2024

Great - thanks @jvendetti !

jvendetti commented on September 15, 2024

Hi @caufieldjh. It turns out that the maintainers of ETHIOPIADISEASES told John that they no longer need this entry in BioPortal. I had originally reprocessed it, but I've now deleted the entry.

caufieldjh commented on September 15, 2024

Great, thanks! One more off the list.

caufieldjh commented on September 15, 2024

Updated statistics, including for types:

*** General ontology counts:
All processed ontologies:       910
All successful JSON transforms: 906
All successful KGX TSV transforms:      903
All transforms with KGX validation logs:        902
All transforms with ROBOT measure reports:      883
All transforms with ROBOT validation reports:   904
Ontologies with failed transforms:      
./transformed/ontologies/ETHIOPIADISEASES
./transformed/ontologies/LC-CARRIERS
./transformed/ontologies/CST
*** Transforms with at least one of the following errors:
MISSING_NODE_PROPERTY   0
MISSING_EDGE_PROPERTY   0
INVALID_NODE_PROPERTY   844
INVALID_EDGE_PROPERTY   807
INVALID_NODE_PROPERTY_VALUE_TYPE        31
INVALID_NODE_PROPERTY_VALUE     844
INVALID_EDGE_PROPERTY_VALUE_TYPE        0
INVALID_EDGE_PROPERTY_VALUE     807
MISSING_CATEGORY        0
INVALID_CATEGORY        902
Category 'OntologyClass' is a mixin in the Biolink Model        902
MISSING_EDGE_PREDICATE  0
INVALID_EDGE_PREDICATE  502
MISSING_NODE_CURIE_PREFIX       0
DUPLICATE_NODE  0
MISSING_NODE    0
INVALID_EDGE_TRIPLE     0
VALIDATION_SYSTEM_ERROR 0
*** Node type counts:
biolink:NamedThing      731
biolink:OntologyClass   903
biolink:BiologicalProcess       76
biolink:Cell    110
biolink:CellularComponent       46
biolink:ChemicalSubstance       119
biolink:Disease 15
biolink:Event   2
biolink:ExposureEvent   3
biolink:Gene    9
biolink:MolecularActivity       49
biolink:NamedThing      731
biolink:OntologyClass   903
biolink:OrganismalEntity        128
biolink:Pathway 6
biolink:PhenotypicFeature       44
biolink:Protein 79
biolink:SequenceFeature 56
biolink:SexQualifier    1
biolink:Source  2
biolink:TaxonomicRank   3
biolink:Unit    2
biolink:AnatomicalEntity        112
*** Edge type counts (i.e., predicate types):
biolink:related_to      376
biolink:subclass_of     899
biolink:part_of 52
biolink:inverseOf       408
biolink:subPropertyOf   449
biolink:has_part        165
biolink:has_participant 99
biolink:has_unit        29
biolink:preceded_by     69
biolink:has_attribute   76
biolink:positively_regulates    35
biolink:negatively_regulates    37

This includes all node types across all ontologies, and a selection of the more common predicate types.
Note that these are largely the result of type assignment by KGX.
As expected, nodes with biolink:NamedThing or biolink:OntologyClass are ubiquitous, suggesting that many may be re-assigned to more informative types.
Though predicate types appear more consistent, there is a long tail of sparsely-used types (not shown) across all ontologies.
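For reference, a tally like the node type counts above could be produced along these lines. This is a sketch under two assumptions that the thread does not confirm: that each count is the number of ontologies containing at least one node of that type, and that multi-valued `category` fields are joined with `|`. The function name and the example data are hypothetical, and this is not the logic of get_all_transform_stats.sh.

```python
from collections import Counter


def ontologies_per_category(categories_by_ontology):
    """For each category, count the ontologies with at least one such node."""
    tally = Counter()
    for node_categories in categories_by_ontology.values():
        seen = set()
        for value in node_categories:
            # Assumed convention: multiple categories are separated by '|'.
            seen.update(c for c in value.split("|") if c)
        # Each ontology contributes at most 1 to any category's count.
        tally.update(seen)
    return tally


# Made-up example: category field values for the nodes of two ontologies.
example = {
    "ONT_A": ["biolink:OntologyClass", "biolink:OntologyClass|biolink:Cell"],
    "ONT_B": ["biolink:NamedThing", "biolink:OntologyClass"],
}
tally = ontologies_per_category(example)
for category, n in sorted(tally.items()):
    print(category, n)
```

Counting per ontology rather than per node keeps very large ontologies from dominating the totals, which fits a report whose largest value is close to the number of transforms.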

caufieldjh commented on September 15, 2024

Closing issue as complete - reopen as needed
