osdu-ontology's Introduction

OSDU Ontology

This is an open source ontology for subsurface energy data, based on the 3rd release of the schema files and standards specified by the Open Subsurface Data Universe (OSDU) community. Please see the documentation for more information about OSDU and how the ontology is designed.

License

The OSDU Ontology is licensed under the Apache License 2.0; see License.

Ontology Files

Load the .ttl file in your favorite ontology editor, e.g. Protege.

Ontology Generator

A tool to convert the OSDU data loading schema to an OWL 2-based ontology, in .ttl format.

Dependencies

Python 3, with the numpy and regex libraries.

Installation

git clone https://github.com/Accenture/OSDU-Ontology.git

Usage

Download the latest OSDU schema from this location: https://community.opengroup.org/osdu/platform/data-flow/data-loading/open-test-data/-/tree/master/rc--3.0.0/3-schema

From a terminal in the osdu-ontology-generator folder:

python3 -m create_ontology --src path_to_full_schema/

To run metric calculation (with reporting in terminal):

python3 -m create_ontology --src path_to_full_schema/ --report_metrics

osdu-ontology's People

Contributors

ana-tudor, neda-abolhassani

osdu-ontology's Issues

property naming convention

There are many properties that don't conform to the lowerCamelCase convention, eg:

totalcostamount
costcurrency

Stale OSDU schema source

I saw recent activity here. I publish the OSDU schemas for each milestone, and the URL mentioned in the README points to a copy.

  • The resources in that copy are based on local file references instead of schema ids
  • The resources to register the schemas in the OSDU core Schema service might be better, and reflect the latest version.
  • OSDU treats "status": "PUBLISHED" as read-only. As a consequence, there are a growing number of higher minor or patch schemas - for the ontology, I would assume only the latest version should be used. Such a list of the latest versions is available in reports, but could be generated as another artefact, easy to consume by the ontology creator.
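
Until such a list is published, the latest version per schema could be picked on the ontology side. This is a minimal sketch, assuming each schema identifier ends in a ":major.minor.patch" suffix; the identifiers below are made up for illustration and the function name is hypothetical.

```python
def latest_versions(kinds):
    """Keep only the highest major.minor.patch for each schema.

    Each kind is assumed to look like 'authority:source:entity:1.2.3'.
    """
    latest = {}
    for kind in kinds:
        base, _, version = kind.rpartition(":")
        # Compare versions numerically, so 1.0.10 sorts above 1.0.9.
        key = tuple(int(part) for part in version.split("."))
        if base not in latest or key > latest[base][0]:
            latest[base] = (key, kind)
    return sorted(kind for _, kind in latest.values())


# Hypothetical schema identifiers, for illustration only.
kinds = [
    "osdu:wks:master-data--Well:1.0.0",
    "osdu:wks:master-data--Well:1.1.0",
    "osdu:wks:master-data--Well:1.0.10",
    "osdu:wks:master-data--Wellbore:1.0.0",
]
selected = latest_versions(kinds)
print(selected)
```

The sketch keeps the single highest version overall. If several major versions need to coexist, the grouping key would have to include the major number as well.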

If you are interested in a collaboration, please submit an issue on the public OSDU GitLab and assign it to me (Thomas Gehrmann [slb] @Gehrmann).

unreadable description

Several properties (osdu:workflowPersona, osdu:workflowUsage) have descriptions that do not parse:

rdfs:comment "<...> that the record is technical assurance value is valid for. " ;
rdfs:comment "<...> that the record is technical assurance value is not valid for. " ;

Even the single comments don't parse properly.
The two comments put together are truly puzzling.

use GeoSPARQL don't invent your own classes

You define a lot of your own classes following GeoJSON, eg

			osdu:AnyCrsGeoJSONPoint
			osdu:AnyCrsGeoJSONLineString
			osdu:AnyCrsGeoJSONPolygon
			osdu:AnyCrsGeoJSONMultiPoint
			osdu:AnyCrsGeoJSONMultiLineString
			osdu:AnyCrsGeoJSONMultiPolygon
			osdu:AnyCrsGeoJSONGeometryCollection
			osdu:AnyCrsGeoJSONFeature
			osdu:AbstractAnyCrsFeatureCollection
			osdu:GeoJSONPoint
			osdu:GeoJSONLineString
			osdu:GeoJSONPolygon
			osdu:GeoJSONMultiPoint
			osdu:GeoJSONMultiLineString
			osdu:GeoJSONMultiPolygon
			osdu:GeoJSONGeometryCollection
			osdu:GeoJSONFeature
			osdu:AbstractFeatureCollection

However, the OGC GeoSPARQL standard defines how to represent all of this in RDF.

  • Geometries are represented as opaque literals with datatypes gmlLiteral or wktLiteral
  • Any OGC CRS (in the EPSG collection but not only) can be used
  • Defines spatial relations such as geo:ehContains, geo:rcc8ntpp (inside), geo:sfContains
  • The standard is widely supported by semantic repositories. Upon seeing the special datatypes, they pass the geo data to special components for geospatial indexing.
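
To illustrate the first point, a GeoJSON geometry can be turned into WKT text and attached as one typed literal. This is only a sketch covering Point, LineString and Polygon; the function names are hypothetical, and it assumes plain GeoJSON (longitude, latitude order), which matches the default CRS of a wktLiteral that carries no CRS IRI. The AnyCrs variants would additionally need the CRS IRI in front of the WKT text.

```python
def _position(pos):
    """Format one coordinate tuple as space-separated numbers."""
    return " ".join(str(value) for value in pos)


def _sequence(positions):
    """Format a list of positions as a parenthesised WKT sequence."""
    return "(" + ", ".join(_position(p) for p in positions) + ")"


def geojson_to_wkt(geometry):
    """Convert a GeoJSON Point, LineString or Polygon to WKT text."""
    kind = geometry["type"]
    coords = geometry["coordinates"]
    if kind == "Point":
        return f"POINT ({_position(coords)})"
    if kind == "LineString":
        return f"LINESTRING {_sequence(coords)}"
    if kind == "Polygon":
        rings = ", ".join(_sequence(ring) for ring in coords)
        return f"POLYGON ({rings})"
    raise ValueError(f"unsupported geometry type: {kind}")


def as_wkt_literal(geometry):
    """Wrap the WKT text as a Turtle literal typed geo:wktLiteral."""
    return f'"{geojson_to_wkt(geometry)}"^^geo:wktLiteral'


point = {"type": "Point", "coordinates": [5.3, 60.4]}
polygon = {
    "type": "Polygon",
    "coordinates": [[[0, 0], [1, 0], [1, 1], [0, 0]]],
}
print(as_wkt_literal(point))
print(as_wkt_literal(polygon))
```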

many "strings" should become "things"

There are 866 data props with rdfs:range xsd:string ("strings").
However, many of them are candidates for converting to object props ("things"):

  1. "ID" props that point to existing classes, but do so indirectly, eg
    activityLevelID activityTemplateID ... consequenceCategoryID consequenceSubCategoryID ...
  2. Props where the target should perhaps be converted to a class, to capture richer info, eg
    acquisitionCompanyID acquisitionSite agency businessActivities attributionAuthority ...
  3. Enumerated props where the target could be converted to skos:Concept in its own ConceptScheme,
    acquisitionTypeID activityCodeID activityLevel activityOutcomeDetailID activityOutcomeID activityTypeID additiveTypeID agreementExternalSystem businessIntentionID ...

Note: in contrast, activityID and agreementExternalID are identifiers inside an object, so they should remain strings.
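
Candidates of the first kind could be shortlisted mechanically. The following is a rough heuristic sketch, not a definitive rule: it proposes a target class when the part of an "...ID" property name before "ID" matches an existing class, and skips the property when that class is its own domain (the identifier case from the note above). The class names and domains in the example are assumptions made up for illustration.

```python
def object_property_candidates(properties, class_names):
    """Suggest a target class for '...ID' properties.

    `properties` maps a property name to its domain class name.
    A property is skipped when the matching class is its own domain,
    since it then identifies the object itself.
    """
    by_lower = {name.lower(): name for name in class_names}
    candidates = {}
    for prop, domain in properties.items():
        if not prop.endswith("ID"):
            continue
        target = by_lower.get(prop[:-2].lower())
        if target is not None and target != domain:
            candidates[prop] = target
    return candidates


# Hypothetical class names and domains, for illustration only.
class_names = ["Activity", "ActivityTemplate", "ConsequenceCategory"]
properties = {
    "activityTemplateID": "Activity",
    "consequenceCategoryID": "Risk",
    "activityID": "Activity",
    "acquisitionSite": "Survey",
}
candidates = object_property_candidates(properties, class_names)
for prop, target in candidates.items():
    print(f"{prop} -> {target}")
```

The output is only a list for manual review; the second and third kinds need human judgement about which targets deserve a class or a concept scheme.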

osdu:bbox is not properly defined

A bounding box is a structure of 2 points or 4 numbers.
But it is defined as a single number:

osdu:bbox rdf:type owl:DatatypeProperty ;
	rdfs:range xsd:decimal ;

Then it is used like this:

osdu:GeoJSONPolygon rdf:type owl:Class ;
	rdfs:subClassOf [
		a owl:Restriction ;
		owl:onProperty osdu:bbox ;
		owl:minQualifiedCardinality "4"^^xsd:nonNegativeInteger ;
		owl:onClass xsd:decimal ;
	] ;

So in instance data you may have this:

<myGeoJSONPolygon> osdu:bbox 1, 2, 3, 4 .

But RDF multivalued properties don't keep order between the values. So if you try to fetch it with SPARQL:

select * {
  <myGeoJSONPolygon> osdu:bbox ?bbox
}

you'll get the coordinates in arbitrary order.
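
One possible fix is to give each edge of the box its own property, so that no ordering is needed. The sketch below models triples as plain Python tuples; the osdu:bboxWest, osdu:bboxSouth, osdu:bboxEast and osdu:bboxNorth property names are hypothetical, and the input is assumed to be a 2D GeoJSON bbox ordered [west, south, east, north].

```python
BBOX_FIELDS = ("West", "South", "East", "North")


def bbox_to_triples(subject, bbox):
    """Emit one triple per named bounding-box edge.

    Assumes a 2D GeoJSON bbox ordered [west, south, east, north].
    The osdu:bbox* property names are hypothetical.
    """
    if len(bbox) != 4:
        raise ValueError("expected exactly four bbox values")
    return [
        (subject, f"osdu:bbox{field}", value)
        for field, value in zip(BBOX_FIELDS, bbox)
    ]


# A graph is a set of triples, so repeated values under one
# property collapse and no position survives.
flat = {("<myGeoJSONPolygon>", "osdu:bbox", v) for v in (1, 2, 1, 2)}
print(len(flat))

named = bbox_to_triples("<myGeoJSONPolygon>", [1, 2, 1, 2])
for triple in named:
    print(triple)
```

Besides losing order, the multivalued form also loses repeated values: a box whose west and east values are equal would end up with fewer than four triples, which the named form avoids.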

reduce the use of acronyms

You use many abbreviations specific to O&G, eg

cCLTopShotDistance rdf:type owl:DatatypeProperty ;
	rdfs:comment "Distance from CCL to Interval Top " ;

An internet search indicates this is "casing collar location" (or maybe "locator"?). IMHO it would be better to spell it out in full: casingCollarLocationToTopShotDistance.

In another case, casingCollarLocatorMD, you spell out "CCL" (which is inconsistent with the previous case) but abbreviate "measured depth". IMHO it would be better to spell it out in full: casingCollarLocatorMeasuredDepth.

doubled restriction

Here's an example of a restriction that is stated twice, which is redundant and not useful:

osdu:AnyCrsGeoJSONLineStringCoordinatesArray rdf:type owl:Class ;
	rdfs:subClassOf osdu:Array ;
	rdfs:subClassOf [
		a owl:Restriction ;
		owl:onProperty osdu:items ;
		owl:minQualifiedCardinality "2"^^xsd:nonNegativeInteger ;
		owl:onClass xsd:decimal ;
	] ;
	rdfs:subClassOf [
		a owl:Restriction ;
		owl:onProperty osdu:items ;
		owl:minQualifiedCardinality "2"^^xsd:nonNegativeInteger ;
		owl:onClass xsd:decimal ;
	] ;
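
If the generator collects restrictions in a list before writing them out, one guard is to drop repeats at that point. This is a minimal sketch that assumes each restriction can be represented as a hashable tuple; the tuple layout and function name are hypothetical and say nothing about how the generator actually stores them.

```python
def unique_restrictions(restrictions):
    """Drop repeated restrictions while keeping first-seen order."""
    # dict keys are unique and preserve insertion order.
    return list(dict.fromkeys(restrictions))


# Hypothetical layout: (property, constraint, value, on_class).
restrictions = [
    ("osdu:items", "owl:minQualifiedCardinality", 2, "xsd:decimal"),
    ("osdu:items", "owl:minQualifiedCardinality", 2, "xsd:decimal"),
]
deduplicated = unique_restrictions(restrictions)
print(deduplicated)
```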

improve the camel casing of abbreviations

Even if you disagree with #12, IMHO it's better to treat abbreviations as "words" and then apply the camelCase convention. Eg
cCLTopShotDistance should become
cclTopShotDistance because it's easier to "parse out" the ccl part.
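
A sketch of that rule, assuming the name is derived from a PascalCase schema spelling such as CCLTopShotDistance (the spelling and the function name are assumptions): split the name into words, treat a run of capitals as one word, then rebuild it in lowerCamelCase.

```python
import re

# A run of capitals before a capitalised word, an ordinary word,
# a trailing run of capitals, or a run of digits.
_WORD = re.compile(r"[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+|\d+")


def to_lower_camel(name):
    """Rebuild a name in lowerCamelCase, treating abbreviations as words."""
    words = _WORD.findall(name)
    if not words:
        return name
    head, *tail = words
    return head.lower() + "".join(word.capitalize() for word in tail)


for schema_name in ("CCLTopShotDistance", "CasingCollarLocatorMD"):
    print(schema_name, "->", to_lower_camel(schema_name))
```

Under this rule a trailing abbreviation is capitalised like any other word, so "MD" becomes "Md".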

`boundingBoxEastBoundLongitude` etc are improperly defined

osdu:boundingBoxEastBoundLongitude rdf:type owl:DatatypeProperty ;
	rdfs:comment "Eastern longitude limit of the bounding box in degrees based on WGS 84 " ;
	rdfs:domain osdu:Extent ;
	rdfs:range gn:Feature ;
	owl:sameAs gn:longitude ;

Several mistakes here:

  • The range should be xsd:decimal, not gn:Feature. This is a data prop, whereas gn:Feature is a named GeoNames object.
  • It cannot be owl:sameAs gn:longitude: you use this on 2 of your props, so they would become owl:sameAs each other.
  • Furthermore, you should use owl:equivalentProperty or rdfs:subPropertyOf for props (in this case I think you want the latter).

duplicated restrictions

osdu:Point includes this restriction repeated 17 (!) times.

rdfs:subClassOf [
		a owl:Restriction ;
		owl:onProperty osdu:observationmeasureddepth ;
		owl:minCardinality "1"^^xsd:nonNegativeInteger ;
	] ;

Repeating a restriction is pointless, so please diagnose what has caused this duplication. It's possible that it occurs on other classes and restrictions as well.
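
To see how widespread the problem is, the repeats could be counted per class. This is a sketch under the assumption that the restrictions of each class are available as hashable tuples; the input below is made up to mirror the osdu:Point case, and the second restriction in it is purely hypothetical.

```python
from collections import Counter


def duplicated_restrictions(class_restrictions):
    """Map each class to the restrictions it states more than once."""
    report = {}
    for cls, restrictions in class_restrictions.items():
        counts = Counter(restrictions)
        repeated = {r: n for r, n in counts.items() if n > 1}
        if repeated:
            report[cls] = repeated
    return report


# Hypothetical layout: (property, constraint, value).
depth = ("osdu:observationmeasureddepth", "owl:minCardinality", 1)
other = ("osdu:items", "owl:minCardinality", 2)
class_restrictions = {
    "osdu:Point": [depth] * 17 + [other],
    "osdu:Array": [other],
}
report = duplicated_restrictions(class_restrictions)
for cls, repeated in report.items():
    for restriction, count in repeated.items():
        print(cls, restriction, count)
```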

reuse QUDT or another UoM ontology

There are a number of classes/props that describe units of measure and their characteristics, eg:

UnitOfMeasure
UnitQuantity
ExternalUnitOfMeasure
ExternalUnitQuantity
baseForConversion
memberUnits

However, these don't capture all the complexity of UOMs, eg dimension vectors, conversion factors, systems of units, etc.

Reuse a well established ontology of UoM, eg QUDT, rather than making your own partial ontology.
