ontopilot's Issues
generate base ontology
It should be possible to automatically generate a base ontology using information gathered from the Makefile. This would be a useful feature that would further simplify developing new ontologies. The feature would also be useful for automatically adding new import modules to the base ontology.
Java-only distribution
It would be really cool to distribute ontobuilder in a way that required nothing more than the JRE on users' computers. E.g., it might be possible to compile all of the Jython code to Java classes and distribute the whole package as a jar file.
extract import modules sources from ontology sources
Write new code to extract import module sources from CSV ontology sources, with terms drawn from class definitions (formal and textual) and superclass relationships.
IRI handling code in ontology.py
Clean up the code in the various methods that handle either string IRIs, OWL API IRI objects, or both. Move redundant type-checking code to a separate function or method.
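Here is a minimal sketch of the consolidated type check in plain Python 3. The `IRI` class is a stand-in so the example runs without Java. Under Jython, the real class would be the OWL API's `org.semanticweb.owlapi.model.IRI`, which provides `IRI.create()` and `toString()`. The helper names `to_iri` and `to_iri_str` are hypothetical.

```python
class IRI(object):
    """Stand-in for the OWL API's IRI class so this sketch runs without Java."""

    def __init__(self, iri_str):
        self._iri_str = iri_str

    @staticmethod
    def create(iri_str):
        return IRI(iri_str)

    def toString(self):
        return self._iri_str


def to_iri(value):
    """Return an IRI object, accepting either a string IRI or an IRI object."""
    if isinstance(value, IRI):
        return value
    if isinstance(value, str):
        return IRI.create(value)
    raise TypeError(
        'Expected a string or IRI object, got {0}.'.format(type(value).__name__)
    )


def to_iri_str(value):
    """Return a string IRI, accepting either a string IRI or an IRI object."""
    return to_iri(value).toString()
```

Each method in ontology.py could then call `to_iri()` once on its arguments, instead of repeating its own `isinstance` checks.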
individual support?
Should the build system support defining OWL individuals (i.e., instances of classes)? I'm not sure how important this is for ontology development.
disjoint with for classes
Add support for disjointness axioms for classes.
default ontology IRI
If no IRI is defined in the project configuration file, a suitable IRI should be generated from the compiled ontology file path on the local file system.
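This is a sketch of one way to do it. It assumes the configured IRI arrives as an empty string or `None` when unset, and the function name `default_ontology_iri` is hypothetical. Python's `pathlib.Path.as_uri()` builds a `file:` IRI from an absolute path.

```python
import os
from pathlib import Path


def default_ontology_iri(configured_iri, ontology_path):
    """Return the configured IRI, or a file: IRI derived from the compiled file's path."""
    if configured_iri:
        return configured_iri
    # as_uri() requires an absolute path; abspath() makes the path absolute
    # and normalizes it without resolving symbolic links.
    return Path(os.path.abspath(ontology_path)).as_uri()
```

For example, on a POSIX system, an ontology compiled to `/tmp/build/ppo.owl` with no configured IRI would get `file:///tmp/build/ppo.owl`.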
sample project
The sample project is now woefully out of sync with the current state and feature set of the build system, so it needs to be comprehensively reviewed and revised.
alternative module extraction methods
Currently, module extraction only uses SyntacticLocalityModuleExtractor, but it might be useful to implement other methods. E.g., single term extraction, or something that more closely approximates the output of OntoFox.
user documentation
The build system has changed in many ways since I wrote the initial user documentation. All of the documentation needs to be reviewed and revised or expanded, as needed.
label resolution 2
Allow the label text in a text definition to differ from the display text. E.g., a text definition might naturally include a plural form, such as "flowers", whereas the associated label is singular.
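This sketch shows one possible syntax. It assumes (hypothetically) that label references are written in curly braces and that `{display text|label}` separates the two parts. The function returns the expanded text plus the resolved term IDs. The name `expand_label_refs` and the dictionary-based label lookup are illustrative only.

```python
import re

# Hypothetical syntax: {label} or {display text|label}.
LABEL_REF = re.compile(r'\{([^{}|]+)(?:\|([^{}|]+))?\}')


def expand_label_refs(text, label_to_id):
    """Replace label references with their display text.

    Returns the expanded text and the IDs of the referenced terms, in order.
    Raises KeyError if a label cannot be resolved.
    """
    term_ids = []

    def replace(match):
        display = match.group(1).strip()
        # Without a '|', the display text is also the label.
        label = (match.group(2) or match.group(1)).strip()
        term_ids.append(label_to_id[label])
        return display

    return LABEL_REF.sub(replace, text), term_ids
```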
consistency checking
Add an option to check the consistency of an ontology. This should also happen automatically before generating inferred axioms.
reasoner invalidation
There is currently a problem with the way reasoner instances are managed. If an ontology changes, and the reasoner is running in BUFFERED mode, then the reasoner will fall out of sync with the current state of the ontology. There are at least three options:
- Detect ontology changes and then immediately dispose any instantiated reasoner objects. The problem with this solution is that external code might be holding on to reasoner references that would suddenly become invalid.
- Detect ontology changes, mark reasoners as invalid, and somehow synchronize them before the next use. This would be ideal in terms of overhead, but probably difficult to implement correctly.
- Detect ontology changes and immediately synchronize all reasoner objects. This would likely introduce considerable unnecessary overhead, because reasoners would be synchronized even when they are not needed. However, this avoids any difficulty with external code holding on to reasoner references.
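Here is a minimal sketch of the second option. It assumes every reasoner is created through a manager object and that reasoners expose a `flush()` method, as OWL API reasoners in BUFFERED mode do. The `ReasonerManager` class and its method names are hypothetical. In practice, `ontology_changed()` would be called from an ontology change listener.

```python
class ReasonerManager(object):
    """Hands out reasoners and synchronizes them lazily after ontology changes."""

    def __init__(self, reasoner_factory):
        self._factory = reasoner_factory
        self._reasoners = {}   # name -> reasoner instance
        self._stale = set()    # names of reasoners needing a flush()

    def ontology_changed(self):
        """Mark every existing reasoner as out of sync; no work is done yet."""
        self._stale.update(self._reasoners)

    def get_reasoner(self, name):
        """Return the named reasoner, synchronizing it first if it is stale."""
        if name not in self._reasoners:
            self._reasoners[name] = self._factory(name)
        elif name in self._stale:
            self._reasoners[name].flush()
        self._stale.discard(name)
        return self._reasoners[name]
```

The catch is the one noted above: synchronization only happens when callers go through `get_reasoner()` each time. External code that caches a reasoner reference could still see stale results.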
Allow commenting out of lines in csv files
It would be a nice feature to be able to include an ontology in the imported_ontologies.csv file but be able to comment it out with a #. That way, I wouldn't have to retype the line for that ontology when I am playing around with including it or not.
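This sketch filters lines with Python's `csv` module. Before parsing, it drops any line whose first non-blank character is `#`. One caveat of filtering raw lines: a line inside a quoted, multi-line field that happens to start with `#` would also be dropped.

```python
import csv


def read_csv_rows(lines):
    """Return a DictReader over lines, skipping lines commented out with '#'."""
    uncommented = (line for line in lines if not line.lstrip().startswith('#'))
    return csv.DictReader(uncommented)
```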
refactor input CSV field retrieval
Similar code for handling field queries in CSV files can be found in OWLOntologyBuilder and ImportModuleBuilder. This functionality should be modularized and separated from those two classes.
non-OBO terms in definition label expansions
Extend the definition label expansion feature so it also supports non-OBO entities.
auto-generate new project
It would be a very useful feature (I think) if the build system were able to automatically generate a new project with a single command. This would include the project folder structure, configuration file, base ontology, and initial top-level imports CSV file.
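Here is a rough sketch of the scaffolding step. Every directory name and file name is a placeholder, as is the configuration file format; none of them are the build system's actual conventions. The base ontology is written as a minimal OWL 2 functional-syntax document that declares only the ontology IRI.

```python
import os

# Hypothetical project layout; these names are placeholders.
PROJECT_DIRS = ['src', 'src/entities', 'src/imports', 'imports', 'ontology']


def project_files(ontology_iri):
    """Return {relative path: initial contents} for the new project's files."""
    return {
        # Hypothetical configuration format with a single setting.
        'project.conf': 'ontology_iri = {0}\n'.format(ontology_iri),
        # Minimal OWL 2 functional-syntax document declaring the ontology IRI.
        'src/ontology-base.owl': 'Ontology(<{0}>)\n'.format(ontology_iri),
        # Top-level imports file containing only a (hypothetical) header row.
        'src/imports/imported_ontologies.csv': 'Name,IRI,Termsfile\n',
    }


def create_project(root, ontology_iri):
    """Create a new project skeleton under root without overwriting any files."""
    files = {os.path.join(root, *rel.split('/')): text
             for rel, text in project_files(ontology_iri).items()}
    # Check everything first so a failure leaves no partially created project.
    existing = sorted(path for path in files if os.path.exists(path))
    if existing:
        raise FileExistsError('Refusing to overwrite: ' + ', '.join(existing))
    for rel_dir in PROJECT_DIRS:
        os.makedirs(os.path.join(root, *rel_dir.split('/')), exist_ok=True)
    for path, text in files.items():
        with open(path, 'w') as fout:
            fout.write(text)
```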
error reporting when processing deferred axioms
Need better error reporting when exceptions are encountered while processing deferred entity axioms.
arbitrary annotations
Would be good to support arbitrary annotations for entities.
pre-reason ontologies
The build system needs to be able to generate pre-reasoned ontologies.
import full ontologies
Add the ability to import full ontologies from the top-level imports source file.
discovering source files
Currently, the source files for terms must be explicitly provided as a configuration variable, which also provides the order in which they are to be parsed. This could be changed in two ways:
- Automatically discover the source term files by looking for files in a certain directory.
- Try to make compilation work regardless of the order in which the source files are parsed. This could be implemented by caching term descriptions with unmet dependencies to be processed later, after all terms without unmet dependencies are defined.
Should we implement either/both of these?
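This sketches the second approach as a simple worklist. It assumes each term has already been reduced to the set of names it depends on, such as the parent classes referenced in its definition. The function name is hypothetical, as is the `already_defined` parameter (for names supplied by imports). Terms with unmet dependencies are retried on later passes, and a pass that makes no progress signals missing or circular references.

```python
def order_by_dependencies(terms, already_defined=()):
    """Return term names ordered so that each term follows its dependencies.

    terms maps each term name to the names it depends on.
    """
    order = []
    defined = set(already_defined)
    pending = dict(terms)
    while pending:
        # Terms whose dependencies are all defined can be processed now;
        # the rest stay deferred until a later pass.
        ready = [name for name, deps in pending.items() if set(deps) <= defined]
        if not ready:
            raise ValueError('Unmet dependencies for: ' + ', '.join(sorted(pending)))
        for name in ready:
            order.append(name)
            defined.add(name)
            del pending[name]
    return order
```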
make input column names case insensitive
There is no fundamental reason for expected column names to depend on casing, so the software should be flexible enough to accept any column name casing.
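Here is a minimal sketch using Python's `csv` module. Column names are stripped and lower-cased once, when the header is read, so lookups use lower-case names. The function name is hypothetical. Rejecting headers that differ only by case is an assumed design choice.

```python
import csv


def read_csv_case_insensitive(lines):
    """Yield each data row as a dict keyed by lower-cased, stripped column names."""
    reader = csv.reader(lines)
    header_row = next(reader, None)
    if header_row is None:  # empty input
        return
    header = [name.strip().lower() for name in header_row]
    if len(set(header)) != len(header):
        raise ValueError('Column names must be unique, ignoring case.')
    for values in reader:
        if values:  # csv.reader yields [] for blank lines
            yield dict(zip(header, values))
```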
support either in-source or out-of-source builds
There should be enough information in the project configuration file to support either in-source or out-of-source builds. It would be useful, I think, to add a single flag to the configuration file to indicate which approach to use, since some users might have a preference for one or the other.
build an ontology with no terms, only imports
Is this possible? When I try it, I get the following error:
make: *** No rule to make target `/Users/rwalls-iplant/ppo_ingest_app/src/terms/piao_term_definitions.csv', needed by `pdiao.owl'.  Stop.
but I'm not sure if this is because I don't have a piao_terms_definitions file or because of some other issue.
build/release file and IRI management
I am working on finalizing a sensible (default) file and IRI management scheme, and I'd appreciate some feedback on a few questions. Here they are, in no particular order.
1. What files should be included in a release? The options would be a "plain" ontology file (not pre-reasoned, imports not merged), a pre-reasoned ontology file, a "merged" ontology file (imports merged), or a pre-reasoned and merged ontology file.
2. How should those files be named? Specifically, which file should get the "simple" name (e.g., ppo.owl)? How should the others be named? Examples: ppo-reasoned.owl, ppo-merged-reasoned.owl, ppo-unreasoned.owl, etc.
3. Should releases use an explicit folder structure? In this scenario, creating a new release would mean creating a new folder, something like "./releases/2017-02-15/", and then placing the generated files inside the folder. The other option would be to depend entirely on tags in the version control system. The problem with the tag option is that it makes older releases much harder to access (from the repository, anyway). E.g., there is no easy way to pull down a specific file from a tagged commit on GitHub. Getting the ontology file from an old release would require either downloading the tag tarball and extracting the file or cloning the repository and checking out a new branch initialized at the tag point. Neither is trivial for someone who doesn't have decent working knowledge of git and/or GitHub. With the explicit folder system, getting previous ontology releases is very easy.
4. How should IRIs be managed? I've looked at some repositories for other ontology projects, and at least one approach is to give all variations of an ontology file the same IRI, but give each a separate version IRI. So, ppo-reasoned.owl, ppo-merged-reasoned.owl, ppo-unreasoned.owl, etc. would all have the same IRI, but each would get a unique version IRI. Is that an acceptable solution? Should some or all of these files actually have different IRIs besides just the version IRI?
I have looked at how OORT handles some of these. OORT uses the explicit folder structure solution for 3. For 2 and 4, OORT seems to make a number of assumptions about file names, formats, and source organization, and I'd like to avoid those assumptions, if possible.
I need to sort all of this out in order to close out several remaining issue tickets. I also want to refactor part of the main executable and build system to make it more modular, and that will also require working out some of these details. These are all "community of practice" sorts of questions without clear objective best answers, so I didn't want to answer them on my own.
design patterns support
It would be nice to support functionality similar to what OntoRat provides.
multiple comments?
Should we support defining multiple rdfs:comment annotations for a class or property? Glancing through a few published OBO ontologies suggests that this is very rarely used in practice.
separate configuration file?
Currently, a few key configuration parameters are set in the Makefile:
- The base IRI for the compiled ontology.
- The term source files and the order in which to parse them.
Might it be better to have a separate configuration file for these items, and anything else that might arise?
specification of import terms files
Currently, the build system uses a process of name matching following an expected file naming convention to derive the names of CSV files that provide the terms to import from external ontologies. This seems unduly complex, and it would probably be better to just add a column to imported_ontologies.csv that gives the name of the import terms file for each external ontology.
support compiling source files in any order
Currently, the order in which source files are compiled matters because the order in which entities are defined can impact the validity of expressions in the source files. I think this could be addressed by deferring the compilation of class expressions until after all entity names have been defined.
property chains
At some point, we might want to add support for defining property chains for object properties.
IRI management
IRI management needs to be partially or completely overhauled. The issue is that ontology compilation can generate multiple versions of an ontology (e.g., -reasoned, -merged, etc.), and I don't think they should all share the same identifier. We need to figure out a way to handle this as automatically as possible.
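Here is a sketch of one possible scheme. Each variant gets its own ontology IRI derived from its file name. Its version IRI adds a dated release folder, like the "./releases/2017-02-15/" layout discussed in the release-management issue. The function name, the `-variant` suffix convention, and the IRI layout are all hypothetical, not a settled design.

```python
def variant_iris(base_iri, ontology_name, variant, release_date):
    """Return (ontology IRI, version IRI) for one build variant.

    variant: '' for the plain ontology, or a suffix such as 'reasoned' or 'merged'.
    release_date: a date string such as '2017-02-15'.
    """
    base = base_iri.rstrip('/')
    filename = ontology_name + ('-' + variant if variant else '') + '.owl'
    # Each variant gets a distinct identifier based on its file name...
    ontology_iri = '{0}/{1}'.format(base, filename)
    # ...and a version IRI that points into a dated release folder.
    version_iri = '{0}/releases/{1}/{2}'.format(base, release_date, filename)
    return ontology_iri, version_iri
```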
input files read twice
Currently, all input terms files are read twice, once for preprocessing term IDs and labels and once for actually defining new terms. This inefficiency should be eliminated.
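This sketch reads each file once, caches its rows, and runs both passes over the cached rows. The callback names `preprocess_row` and `define_row` are hypothetical placeholders for the existing ID/label preprocessing step and the term-definition step.

```python
import csv


def load_term_files(paths):
    """Read each terms file exactly once and cache its rows in memory."""
    cached = {}
    for path in paths:
        with open(path, newline='') as fin:
            cached[path] = list(csv.DictReader(fin))
    return cached


def build_from_cache(cached, preprocess_row, define_row):
    """Run both processing passes over the cached rows instead of rereading files."""
    # Pass 1: register term IDs and labels from every file.
    for rows in cached.values():
        for row in rows:
            preprocess_row(row)
    # Pass 2: define the terms, now that all IDs and labels are known.
    for rows in cached.values():
        for row in rows:
            define_row(row)
```

This trades the second round of file I/O for holding all rows in memory, which is likely acceptable for terms files of typical size.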
base ontology build target
Add a new build target that updates the import set in the base ontology (i.e., adds import statements for local import modules). The purpose of this would be to support development workflows that use ontobuilder for imports and release management, but not necessarily for terms management.
pure Python build system
It would be nice to have the entire build system implemented in pure Python/Jython. The only remaining piece that is not pure Python is Make. So, the Make functionality would have to be moved to a pure Python implementation.
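The core of what Make does here can be sketched as a timestamp check: a target needs rebuilding if it is missing or older than any of its sources. The function name is hypothetical. A full replacement would also need to order dependent targets.

```python
import os


def is_outdated(target, sources):
    """Return True if target is missing or older than any of its sources.

    A missing source raises FileNotFoundError, roughly analogous to Make's
    "No rule to make target" error.
    """
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    return any(os.path.getmtime(src) > target_mtime for src in sources)
```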
refactor entity abstraction classes
The classes that provide abstract interfaces for OWL entities (classes and properties) currently have some redundancy due to shared features, such as support for labels, comments, and definitions. There is additional redundancy between the property classes. These classes should be refactored to eliminate redundant code.
ontology metadata
It would be possible to add some ontology metadata to the configuration file, such as the ontology label or contributors. Would this be useful? Or is it just as easy to directly edit the base ontology document?
expand build system to include properties
Could be implemented with classes/properties in same source file, or with classes and properties in separate source files with the command-line interface indicating what kind of terms are in a given source file.
unit testing
Implement unit testing for the core components of the build system.
command completion
Implement command completion for the UI. This could take two forms: 1) tab completion of the command name; 2) allowing the user to enter a partial command name (as long as it is enough to be unambiguous).
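Here is a sketch of the second form (unambiguous prefixes). The function name is hypothetical. Exact matches take priority, so a command whose name is a prefix of another command remains reachable.

```python
def resolve_command(prefix, commands):
    """Return the command matching prefix, accepting any unambiguous prefix."""
    if prefix in commands:
        return prefix
    matches = sorted(cmd for cmd in commands if cmd.startswith(prefix))
    if len(matches) == 1:
        return matches[0]
    if not matches:
        raise ValueError('Unknown command: ' + prefix)
    raise ValueError('Ambiguous command "{0}": {1}'.format(prefix, ', '.join(matches)))
```

In CPython, the same prefix filter could also back tab completion through the standard `readline` module's `set_completer()`.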
release pipeline
Add a build target called "release" that automatically produces a "release" version of the ontology along with associated files.
subclass and equivalentclass for class expressions
When parsing Manchester Syntax class definitions, we need to support using the resulting class expressions in either subclass or equivalent class axioms.
allow multiple classes and/or class expressions in some CSV fields
Add support for defining multiple equivalency axioms, multiple subclass axioms, etc. for a class, and possibly also multiple domains, ranges, or disjointness axioms for properties. These would require supporting multiple Manchester Syntax class expressions in a single CSV cell.
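This sketch splits one cell into several expressions. It assumes (hypothetically) that expressions are separated by `;`, and it ignores any delimiter that appears inside a single-quoted label such as `'whole plant'`. The function name is hypothetical.

```python
def split_expressions(cell, delimiter=';'):
    """Split a CSV cell into separate class expressions."""
    expressions = []
    current = []
    in_quotes = False
    for char in cell:
        if char == "'":
            in_quotes = not in_quotes
        if char == delimiter and not in_quotes:
            # A delimiter outside quotes ends the current expression.
            expressions.append(''.join(current).strip())
            current = []
        else:
            current.append(char)
    expressions.append(''.join(current).strip())
    # Drop empty entries, e.g. from an empty cell or a trailing delimiter.
    return [expr for expr in expressions if expr]
```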
label resolution 1
Allow labels to be prefixed by an OBO id (e.g., "po:whole plant"). This would allow disambiguation when more than one ontology uses the same label. This will require adding additional data structures to the label manager class.
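This sketches the additional data structure. It assumes labels are indexed first by label text and then by ontology prefix. A plain label resolves only if exactly one ontology uses it, while a prefixed label such as "po:whole plant" selects one ontology's term. The `LabelMap` class is hypothetical and stands in for the existing label manager.

```python
class LabelMap(object):
    """Resolves labels to term IRIs, optionally qualified by an ontology prefix."""

    def __init__(self):
        # label text -> {lower-case ontology prefix: term IRI}
        self._by_label = {}

    def add(self, prefix, label, iri):
        self._by_label.setdefault(label, {})[prefix.lower()] = iri

    def lookup(self, text):
        """Resolve "prefix:label" or a plain label to a term IRI."""
        prefix, sep, label = text.partition(':')
        if sep:
            by_prefix = self._by_label.get(label.strip(), {})
            if prefix.strip().lower() in by_prefix:
                return by_prefix[prefix.strip().lower()]
        # Not a known prefixed label, so treat the whole text as a plain label.
        candidates = self._by_label.get(text, {})
        if not candidates:
            raise KeyError('Unknown label: ' + text)
        if len(candidates) > 1:
            raise ValueError('Ambiguous label "{0}"; use one of the prefixes: {1}'.format(
                text, ', '.join(sorted(candidates))))
        return next(iter(candidates.values()))
```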
build system refactor
Attempting to implement some new features has revealed that the current build system implementation is too monolithic. It needs to be refactored to support discrete build tasks with arbitrary dependency relationships.
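This sketch resolves arbitrary task dependencies with a depth-first topological sort. It assumes each task is identified by name and that the dependency map is known up front. The function name is hypothetical. It returns the tasks needed for a target in a valid execution order and rejects dependency cycles.

```python
def build_order(target, dependencies):
    """Return the tasks needed to build target, each listed after its dependencies.

    dependencies maps a task name to the names of the tasks it depends on.
    """
    order = []
    done = set()
    in_progress = set()

    def visit(task):
        if task in done:
            return
        if task in in_progress:
            raise ValueError('Dependency cycle involving task: ' + task)
        in_progress.add(task)
        for dep in dependencies.get(task, ()):
            visit(dep)
        in_progress.discard(task)
        done.add(task)
        order.append(task)  # appended only after all of its dependencies

    visit(target)
    return order
```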
"full" ontology build
The build system needs to support compiling an ontology in which all imported terms are included directly in the ontology (i.e., rather than using OWL import statements).
domains for properties
Domain values for object and data properties should support class expressions. Currently, only simple, non-anonymous classes are supported.
input format support
The code currently only supports CSV files as input. We could consider expanding this to other, richer input formats. The OpenOffice/LibreOffice OpenDocument format would be a good place to start.
remove owltools dependencies
Write a Jython version of the module build system to remove owltools dependencies.