
ontopilot's People

Contributors

jdeck88, ramonawalls, stuckyb


Forkers

jdeck88, tjroamer

ontopilot's Issues

generate base ontology

It should be possible to automatically generate a base ontology using information gathered from the Makefile. This would be a useful feature that would further simplify developing new ontologies. The feature would also be useful for automatically adding new import modules to the base ontology.

Java-only distribution

It would be really cool to distribute ontobuilder in a way that required nothing more than the JRE on users' computers. E.g., it might be possible to compile all of the Jython code to Java classes and distribute the whole package as a jar file.

IRI handling code in ontology.py

Clean up the code in the various methods that handle either string IRIs, OWL API IRI objects, or both. Move redundant type-checking code to a separate function or method.
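One way to centralize the type checks is a single normalizing helper that every method calls first. The sketch below is hypothetical (the name `to_iri` is not from ontopilot). It takes the IRI class and its factory as parameters so it can be exercised without the OWL API; under Jython these would presumably be the OWL API's `IRI` class and `IRI.create`, and the string check would need `basestring` instead of `str` on Jython 2.7.

```python
def to_iri(value, iri_type, create_iri):
    """Normalize `value` to an IRI object.

    An existing IRI object is returned unchanged; a string is converted with
    `create_iri`. Anything else is rejected here, so individual methods no
    longer need to repeat this type-checking logic.
    """
    if isinstance(value, iri_type):
        return value
    if isinstance(value, str):
        return create_iri(value.strip())
    raise TypeError(
        "expected an IRI or a string, got {0}".format(type(value).__name__))
```

Each method would then start with something like `iri = to_iri(iri, IRI, IRI.create)` and work only with IRI objects from that point on.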

individual support?

Should the build system support defining OWL individuals (i.e., instances of classes)? I'm not sure how important this is for ontology development.

default ontology IRI

If no IRI is defined in the project configuration file, a suitable IRI should be generated from the compiled ontology file path on the local file system.
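A minimal sketch of the fallback, assuming a `file:` IRI built from the absolute path of the compiled ontology is acceptable. The function name is hypothetical. Such an IRI is only meaningful on the machine that built the ontology, so it suits local development better than published releases.

```python
from pathlib import Path

def default_ontology_iri(ontology_path):
    """Build a file: IRI from the compiled ontology's location on disk."""
    # resolve() makes the path absolute; as_uri() adds the file:// scheme
    # and percent-encodes characters such as spaces.
    return Path(ontology_path).resolve().as_uri()
```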

sample project

The sample project is now woefully out of sync with the current state and feature set of the build system, so it needs to be comprehensively reviewed and revised.

alternative module extraction methods

Currently, module extraction only uses SyntacticLocalityModuleExtractor, but it might be useful to implement other methods. E.g., single term extraction, or something that more closely approximates the output of OntoFox.

user documentation

The build system has changed in many ways since I wrote the initial user documentation. All of the documentation needs to be reviewed and revised or expanded, as needed.

label resolution 2

Allow the label text in a text definition to differ from the display text. E.g., a text definition might naturally include a plural form, such as "flowers", whereas the associated label is singular.
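The issue does not specify a syntax, so the sketch below assumes a hypothetical markup in which `{label}` references a term directly and `{display text|label}` shows different text while still resolving the term by its label. The lookup always uses the label, never the display text.

```python
import re

# Hypothetical markup: "{label}" or "{display text|label}".
LABEL_REF = re.compile(r"\{([^{}|]+)(?:\|([^{}]+))?\}")

def render_definition(text, label_to_id):
    """Replace label references with their display text.

    Returns the rendered text plus the IDs of the referenced terms, looked
    up by label, so "{flowers|flower}" displays "flowers" but resolves
    the term labeled "flower".
    """
    referenced = []

    def substitute(match):
        display = match.group(1)
        label = match.group(2) or display
        referenced.append(label_to_id[label.strip()])
        return display

    return LABEL_REF.sub(substitute, text), referenced
```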

consistency checking

Add an option to check the consistency of an ontology. This should also happen automatically before generating inferred axioms.

reasoner invalidation

There is currently a problem with the way reasoner instances are managed. If an ontology changes, and the reasoner is running in BUFFERED mode, then the reasoner will fall out of sync with the current state of the ontology. There are at least three options:

  1. Detect ontology changes and then immediately dispose any instantiated reasoner objects. The problem with this solution is that external code might be holding on to reasoner references that would suddenly become invalid.
  2. Detect ontology changes, mark reasoners as invalid, and somehow synchronize them before the next use. This would be ideal in terms of overhead, but probably difficult to implement correctly.
  3. Detect ontology changes and immediately synchronize all reasoner objects. This would likely introduce considerable useless overhead because reasoners would be synchronized even when they are not needed. However, this avoids any difficulty with external code hanging on to reasoner references.
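Option 2 can be approximated by having callers hold a small wrapper instead of the raw reasoner. A change listener only sets a "stale" flag, and the wrapper synchronizes lazily on the next request. The class name is hypothetical; `flush()` is the OWL API `OWLReasoner` method that applies pending changes to a buffering reasoner. Because external code keeps a reference to the wrapper, nothing it holds is ever disposed.

```python
class ReasonerHandle:
    """Hypothetical wrapper that re-synchronizes a BUFFERED reasoner lazily."""

    def __init__(self, reasoner):
        self._reasoner = reasoner
        self._stale = False

    def ontology_changed(self):
        # Called from an ontology change listener: cheap, just marks staleness.
        self._stale = True

    def get(self):
        # Flush pending changes only when the reasoner is actually needed.
        if self._stale:
            self._reasoner.flush()
            self._stale = False
        return self._reasoner
```

Repeated changes between uses cost a single flush, which is the overhead advantage described for option 2.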

Allow commenting out of lines in csv files

It would be a nice feature to be able to include an ontology in the imported_ontologies.csv file but be able to comment it out with a #. That way, I wouldn't have to retype the line for that ontology when I am playing around with including it or not.
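A minimal sketch of comment support using Python's `csv` module: lines whose first non-blank character is `#` are dropped before parsing. The column names in the usage below are hypothetical. Because filtering is done line by line, this assumes no quoted cell spans multiple lines.

```python
import csv

def read_active_rows(lines):
    """Parse CSV lines into dict rows, skipping commented-out lines.

    A line counts as a comment if its first non-blank character is '#'.
    """
    active = (line for line in lines if not line.lstrip().startswith("#"))
    return list(csv.DictReader(active))
```

Typical use would be `with open("imported_ontologies.csv", newline="") as f: rows = read_active_rows(f)`.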

refactor input CSV field retrieval

Similar code for handling field queries in CSV files can be found in OWLOntologyBuilder and ImportModuleBuilder. This functionality should be modularized and separated from those two classes.
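One possible shape for the shared functionality is a row wrapper that both builder classes use for field lookups. This is a hypothetical sketch, not ontopilot's code: it normalizes column names so lookups ignore case and stray whitespace, and it gives required fields one consistent error message.

```python
class TableRow:
    """Hypothetical row wrapper that both builder classes could share."""

    def __init__(self, row, rownum):
        # Normalize column names; csv.DictReader uses the key None for
        # surplus fields, so those are skipped.
        self._fields = {
            key.strip().lower(): (value or "").strip()
            for key, value in row.items()
            if key is not None
        }
        self.rownum = rownum

    def get(self, name, default=""):
        """Return the field's stripped text, or `default` if the column is absent."""
        return self._fields.get(name.strip().lower(), default)

    def require(self, name):
        """Return the field's text, raising if it is missing or empty."""
        value = self.get(name)
        if not value:
            raise ValueError(
                "row {0}: required field '{1}' is empty".format(self.rownum, name))
        return value
```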

auto-generate new project

It would be a very useful feature (I think) if the build system were able to automatically generate a new project with a single command. This would include the project folder structure, configuration file, base ontology, and initial top-level imports CSV file.

discovering source files

Currently, the source files for terms must be explicitly provided as a configuration variable, which also provides the order in which they are to be parsed. This could be changed in two ways:

  1. Automatically discover the source term files by looking for files in a certain directory.
  2. Try to make compilation work regardless of the order in which the source files are parsed. This could be implemented by caching term descriptions with unmet dependencies to be processed later, after all terms without unmet dependencies are defined.

Should we implement either/both of these?
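Option 2 amounts to a worklist: on each pass, define every term whose dependencies are already known, defer the rest, and repeat until nothing is left. If a pass makes no progress, the remaining terms reference something undefined or form a cycle. The sketch assumes each term's dependencies have already been extracted as a set of term IDs; the function name and data layout are hypothetical.

```python
def order_terms(terms, already_defined=()):
    """Return term IDs in an order where dependencies are defined first.

    `terms` maps a term ID to the set of IDs its description references.
    `already_defined` holds IDs available up front (e.g., imported terms).
    """
    defined = set(already_defined)
    pending = dict(terms)
    order = []
    while pending:
        # Terms whose dependencies are all met can be defined on this pass.
        ready = [t for t, deps in pending.items() if deps <= defined]
        if not ready:
            raise ValueError(
                "unresolvable dependencies: {0}".format(sorted(pending)))
        for t in ready:
            order.append(t)
            defined.add(t)
            del pending[t]
    return order
```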

support either in-source or out-of-source builds

There should be enough information in the project configuration file to support either in-source or out-of-source builds. It would be useful, I think, to add a single flag to the configuration file to indicate which approach to use, since some users might have a preference for one or the other.

build an ontology with no terms, only imports

Is this possible? When I try it, I get the following error:

make: *** No rule to make target `/Users/rwalls-iplant/ppo_ingest_app/src/terms/piao_term_definitions.csv', needed by `pdiao.owl'.  Stop.

but I'm not sure if this is because I don't have a piao_term_definitions.csv file or because of some other issue.

build/release file and IRI management

I am working on finalizing a sensible (default) file and IRI management scheme, and I'd appreciate some feedback on a few questions. Here they are, in no particular order.

  1. What files should be included in a release? The options would be a "plain" ontology file (not pre-reasoned, imports not merged), a pre-reasoned ontology file, a "merged" ontology file (imports merged), or a pre-reasoned and merged ontology file.

  2. How should those files be named? Specifically, which file should get the "simple" name (e.g., ppo.owl)? How should the others be named? Examples: ppo-reasoned.owl, ppo-merged-reasoned.owl, ppo-unreasoned.owl, etc.

  3. Should releases use an explicit folder structure? In this scenario, creating a new release would mean creating a new folder, something like "./releases/2017-02-15/", and then placing the generated files inside the folder. The other option would be to depend entirely on tags in the version control system. The problem with the tag option is that it makes older releases much harder to access (from the repository, anyway). E.g., there is no easy way to pull down a specific file from a tagged commit on GitHub. Getting the ontology file from an old release would require either downloading the tag tarball and extracting the file or cloning the repository and checking out a new branch initialized at the tag point. Neither is trivial for someone who doesn't have decent working knowledge of git and/or GitHub. With the explicit folder system, getting previous ontology releases is very easy.

  4. How should IRIs be managed? I've looked at some repositories for other ontology projects, and at least one approach is to give all variations of an ontology file the same IRI, but give each a separate version IRI. So, ppo-reasoned.owl, ppo-merged-reasoned.owl, ppo-unreasoned.owl, etc. would all have the same IRI, but each would get a unique version IRI. Is that an acceptable solution? Should some or all of these files actually have different IRIs besides just the version IRI?

I have looked at how OORT handles some of these. OORT uses the explicit folder structure solution for 3. For 2 and 4, OORT seems to make a number of assumptions about file names, formats, and source organization, and I'd like to avoid those assumptions, if possible.

I need to sort all of this out in order to close out several remaining issue tickets. I also want to refactor part of the main executable and build system to make it more modular, and that will also require working out some of these details. These are all "community of practice" sorts of questions without clear objective best answers, so I didn't want to answer them on my own.

multiple comments?

Should we support defining multiple rdfs:comment annotations for a class or property? Glancing through a few published OBO ontologies suggests that this is very rarely used in practice.

separate configuration file?

Currently, a few key configuration parameters are set in the Makefile:

  1. The base IRI for the compiled ontology.
  2. The term source files and the order in which to parse them.

Might it be better to have a separate configuration file for these items, and anything else that might arise?
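As an illustration of what a separate file could look like, the sketch below uses Python 3's standard `configparser` module with an INI-style layout. The section and option names are hypothetical. Listing the term source files in a single comma-separated option preserves the parse order, which currently matters.

```python
import configparser

# Hypothetical layout for the two Makefile settings listed above.
SAMPLE = """
[Ontology]
ontology_IRI = http://example.org/ppo.owl
termsfiles = src/terms/classes.csv, src/terms/properties.csv
"""

def load_config(text):
    """Return (ontology IRI, term source files in parse order)."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser["Ontology"]
    # configparser lowercases option names by default, so lookups are
    # case-insensitive.
    iri = section.get("ontology_iri")
    files = [p.strip() for p in section.get("termsfiles", "").split(",") if p.strip()]
    return iri, files
```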

specification of import terms files

Currently, the build system uses a process of name matching following an expected file naming convention to derive the names of CSV files that provide the terms to import from external ontologies. This seems unduly complex, and it would probably be better to just add a column to imported_ontologies.csv that gives the name of the import terms file for each external ontology.

support compiling source files in any order

Currently, the order in which source files are compiled matters because the order in which entities are defined can impact the validity of expressions in the source files. I think this could be addressed by deferring the compilation of class expressions until after all entity names have been defined.

property chains

At some point, we might want to add support for defining property chains for object properties.

IRI management

IRI management needs to be partially or completely overhauled. The issue is that ontology compilation can generate multiple versions of an ontology (e.g., -reasoned, -merged, etc.), and they should not all share the same identifier (I don't think). We need to figure out a way to handle this as automatically as possible.

input files read twice

Currently, all input terms files are read twice, once for preprocessing term IDs and labels and once for actually defining new terms. This inefficiency should be eliminated.
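A straightforward fix is to read each file once, cache the rows in memory, and run both passes over the cached rows. The sketch below assumes hypothetical `ID` and `Label` columns and callback names; the trade-off is holding all rows in memory, which should be small for typical term files.

```python
import csv

def load_term_rows(paths):
    """Read each input terms file exactly once, caching its rows in memory."""
    rows = []
    for path in paths:
        with open(path, newline="") as f:
            rows.extend(csv.DictReader(f))
    return rows

def compile_terms(rows, register_label, define_term):
    """Run both passes over the cached rows instead of re-reading files."""
    # Pass 1: record every ID/label pair so later references can resolve.
    for row in rows:
        register_label(row["ID"], row["Label"])
    # Pass 2: define the terms themselves from the same in-memory rows.
    for row in rows:
        define_term(row)
```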

base ontology build target

Add a new build target that updates the import set in the base ontology (i.e., adds import statements for local import modules). The purpose of this would be to support development workflows that use ontobuilder for imports and release management, but not necessarily for terms management.

pure Python build system

It would be nice to have the entire build system implemented in pure Python/Jython. The only remaining piece that is not pure Python is Make. So, the Make functionality would have to be moved to a pure Python implementation.
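The core piece of Make that a Python replacement would need is its timestamp rule: a target is rebuilt if it is missing or older than any of its prerequisites. A minimal sketch of that check (the function name is hypothetical):

```python
import os

def needs_rebuild(target, sources):
    """Make's timestamp rule: rebuild if `target` is missing or any source
    file was modified more recently than it."""
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    return any(os.path.getmtime(src) > target_mtime for src in sources)
```

A missing source file raises `OSError` here, which a Python build system could turn into a clearer message than Make's "No rule to make target" error.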

refactor entity abstraction classes

The classes that provide abstract interfaces for OWL entities (classes and properties) currently have some redundancy due to shared features, such as support for labels, comments, and definitions. There is additional redundancy between the property classes. These classes should be refactored to eliminate redundant code.
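One common way to remove this kind of redundancy is a small inheritance hierarchy: a base class holding the annotation features every entity shares, and an intermediate base for features common to all properties. The class names and the simplified string-based annotation storage below are hypothetical. `IAO:0000115` is the OBO convention for textual definitions.

```python
class AnnotatedEntity:
    """Hypothetical base: label/comment/definition logic written once."""

    def __init__(self, iri):
        self.iri = iri
        self.annotations = []

    def add_annotation(self, prop, value):
        self.annotations.append((prop, value))

    def add_label(self, text):
        self.add_annotation("rdfs:label", text)

    def add_comment(self, text):
        self.add_annotation("rdfs:comment", text)

    def add_definition(self, text):
        self.add_annotation("IAO:0000115", text)  # OBO textual definition


class PropertyBase(AnnotatedEntity):
    """Features shared by object and data properties."""

    def __init__(self, iri):
        super().__init__(iri)
        self.superproperties = []

    def add_superproperty(self, parent_iri):
        self.superproperties.append(parent_iri)


class OntologyClass(AnnotatedEntity):
    pass


class ObjectProperty(PropertyBase):
    pass


class DataProperty(PropertyBase):
    pass
```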

ontology metadata

It would be possible to add some ontology metadata to the configuration file, such as ontology label or contributors. Would this be useful? Or is it just as easy to directly edit the base ontology document?

expand build system to include properties

This could be implemented either with classes and properties in the same source file, or with classes and properties in separate source files, with the command-line interface indicating what kind of terms a given source file contains.

unit testing

Implement unit testing for the core components of the build system.

command completion

Implement command completion for the UI. This could take two forms: 1) tab completion of the command name; 2) allowing the user to enter a partial command name (as long as it is enough to be unambiguous).
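Form 2 reduces to unique-prefix matching, sketched below with a hypothetical command list. An exact match wins even when it is also a prefix of a longer command. The same prefix filter could plausibly back form 1 as well, for example as a completer registered with Python's `readline.set_completer`.

```python
def resolve_command(partial, commands):
    """Return the single command that `partial` unambiguously abbreviates."""
    if partial in commands:
        return partial
    matches = sorted(c for c in commands if c.startswith(partial))
    if len(matches) == 1:
        return matches[0]
    if not matches:
        raise ValueError("unknown command: {0}".format(partial))
    raise ValueError("ambiguous command '{0}': could be {1}".format(
        partial, ", ".join(matches)))
```

With the hypothetical commands `("init", "inspect", "make")`, `m` resolves to `make`, `ini` to `init`, and `in` is rejected as ambiguous.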

release pipeline

Add a build target called "release" that automatically produces a "release" version of the ontology along with associated files.

allow multiple classes and/or class expressions in some CSV fields

Add support for defining multiple equivalence axioms, multiple subclass-of axioms, etc. for a class, and possibly also multiple domains, ranges, or disjointness axioms for properties. These would require supporting multiple Manchester Syntax class expressions in a single CSV cell.
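Supporting several expressions per cell mainly requires a splitter that ignores separators nested inside parentheses, braces, or quoted labels. The sketch assumes a hypothetical `;` separator, which does not otherwise appear in Manchester Syntax class expressions, and assumes labels are quoted with single quotes and do not themselves contain single quotes.

```python
def split_expressions(cell, sep=";"):
    """Split a CSV cell into Manchester Syntax expressions on top-level `sep`.

    Separators inside parentheses, braces, or single-quoted labels are kept.
    Empty pieces (e.g., from a trailing separator) are dropped.
    """
    parts, current, depth, in_quote = [], [], 0, False
    for ch in cell:
        if ch == "'":
            in_quote = not in_quote
        elif not in_quote and ch in "({":
            depth += 1
        elif not in_quote and ch in ")}":
            depth -= 1
        elif ch == sep and depth == 0 and not in_quote:
            parts.append("".join(current).strip())
            current = []
            continue  # do not keep the separator itself
        current.append(ch)
    parts.append("".join(current).strip())
    return [p for p in parts if p]
```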

label resolution 1

Allow labels to be prefixed by an OBO id (e.g., "po:whole plant"). This would allow disambiguation when more than one ontology uses the same label. This will require adding additional data structures to the label manager class.
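The additional data structure could be a mapping from each label to the ontologies (keyed by OBO prefix) that use it. Unqualified labels then resolve only when unambiguous. The class below is a hypothetical sketch, not ontopilot's label manager; prefixes are compared case-insensitively.

```python
class LabelMap:
    """Hypothetical label manager accepting an optional "prefix:" qualifier."""

    def __init__(self):
        # label -> {lowercased OBO prefix: IRI}; one label may occur in
        # several ontologies.
        self._by_label = {}

    def add(self, prefix, label, iri):
        self._by_label.setdefault(label, {})[prefix.lower()] = iri

    def lookup(self, text):
        prefix, sep, rest = text.partition(":")
        if sep:
            key = prefix.strip().lower()
            by_prefix = self._by_label.get(rest.strip(), {})
            if key in by_prefix:
                return by_prefix[key]
        # Unqualified (or unrecognized prefix): the label must be unambiguous.
        by_prefix = self._by_label.get(text.strip(), {})
        if len(by_prefix) == 1:
            return next(iter(by_prefix.values()))
        if not by_prefix:
            raise KeyError(text)
        raise ValueError("ambiguous label '{0}'; qualify it with one of: {1}".format(
            text, ", ".join(sorted(by_prefix))))
```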

build system refactor

Attempting to implement some new features has revealed that the current build system implementation is too monolithic. It needs to be refactored to support discrete build tasks with arbitrary dependency relationships.
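Discrete tasks with arbitrary dependencies form a directed acyclic graph, and a valid execution order is a topological sort of that graph. The sketch below uses Kahn's algorithm; the task names are hypothetical. Python 3.9+ also provides `graphlib.TopologicalSorter` in the standard library for the same purpose.

```python
from collections import deque

def task_order(deps):
    """Return an execution order for build tasks (Kahn's algorithm).

    `deps` maps each task name to the tasks it depends on. Raises
    ValueError if the dependencies contain a cycle.
    """
    tasks = set(deps) | {d for ds in deps.values() for d in ds}
    indegree = {t: 0 for t in tasks}
    dependents = {t: [] for t in tasks}
    for task, ds in deps.items():
        for d in ds:
            indegree[task] += 1
            dependents[d].append(task)
    # Sorting keeps the order deterministic when several tasks are ready.
    ready = deque(sorted(t for t in tasks if indegree[t] == 0))
    order = []
    while ready:
        t = ready.popleft()
        order.append(t)
        for nxt in sorted(dependents[t]):
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)
    if len(order) != len(tasks):
        raise ValueError("dependency cycle among build tasks")
    return order
```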

"full" ontology build

The build system needs to support compiling an ontology in which all imported terms are included directly in the ontology (i.e., rather than using OWL import statements).

domains for properties

Domain values for object and data properties should support class expressions. Currently, only simple, non-anonymous classes are supported.

input format support

The code currently only supports CSV files as input. We could consider expanding this to other, richer input formats. The OpenOffice/LibreOffice OpenDocument format would be a good place to start.
