kbss-cvut / termit

An advanced SKOS terminology manager linking concepts to their definitions in documents

License: GNU General Public License v3.0

Java 95.17% Shell 0.19% HTML 4.46% Dockerfile 0.04% Ruby 0.13%

java skos rdf ufo rest

termit's People

Contributors

Stargazers

Watchers

Forkers

datagov-cz michalmed aahmadai psiotwo alanbuzek holecekm filip-kopecky mighantos

termit's Issues

Migrate to Spring Boot

As a developer, I want to migrate the project configuration to Spring Boot.

This will allow easier configuration w.r.t. virtualized environments like Docker.

Term snapshots returned by the TermIt API are missing some attributes

The term snapshot endpoint should return the following attributes in the properties element:
http://onto.fel.cvut.cz/ontologies/slovnik/slovník-datového-modelu-dtm/pojem/je-reálným-objektem
http://purl.org/dc/terms/references
http://www.w3.org/2004/02/skos/core#notation
http://www.w3.org/2004/02/skos/core#example

Here is an example of a term in the current data and in a version where the two should be identical data-wise:

Term removal does not delete `hasTopConcept` statements

If a term is a root term of a vocabulary and is removed, the skos:hasTopConcept referencing the term from the glossary remains in the repository.
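
The expected cleanup can be sketched as follows. This is only an illustration on an in-memory set of triples; `Triple` and `GlossaryStore` are hypothetical names and do not reflect TermIt's actual persistence layer. The point is that removing a term should also remove statements in which the term is the object, such as skos:hasTopConcept.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical in-memory stand-in for the repository (assumption, not TermIt code).
final class Triple {
    final String subject;
    final String predicate;
    final String object;

    Triple(String subject, String predicate, String object) {
        this.subject = subject;
        this.predicate = predicate;
        this.object = object;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Triple)) return false;
        Triple t = (Triple) o;
        return subject.equals(t.subject) && predicate.equals(t.predicate) && object.equals(t.object);
    }

    @Override
    public int hashCode() {
        return Objects.hash(subject, predicate, object);
    }
}

final class GlossaryStore {
    static final String HAS_TOP_CONCEPT = "http://www.w3.org/2004/02/skos/core#hasTopConcept";

    private final Set<Triple> triples = new HashSet<>();

    void add(String s, String p, String o) {
        triples.add(new Triple(s, p, o));
    }

    boolean contains(String s, String p, String o) {
        return triples.contains(new Triple(s, p, o));
    }

    // Removes the term's own statements and every statement that points at the term,
    // which includes skos:hasTopConcept from the glossary.
    void removeTerm(String termIri) {
        triples.removeIf(t -> t.subject.equals(termIri) || t.object.equals(termIri));
    }
}
```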

Import of vocabulary does not generate default document

When a vocabulary is imported, TermIt fails to generate a document for such vocabulary. In contrast, when a vocabulary is created (not through import, but through create vocabulary form), a document is generated for it.

Integration with Keycloak

In order to facilitate compatibility with the SGoV assembly line, TermIt has to be able to use Keycloak as an authorization service.

However, to retain backwards compatibility, it also has to be able to run without it, using its internal authentication mechanisms for secure access to the application.

Note that this issue involves backend as well as frontend of TermIt.

Configure Docker Compose to preserve logs

Currently, the Docker Compose setup logs only to standard output, so the output is lost on restart. As a system admin, I need to be able to examine logs from before the last restart.

Allow vocabulary context IRI to be different from vocabulary IRI

In order to support the new architecture of the SGoV Assembly Line, identifiers of vocabulary contexts need not coincide with the identifiers of the vocabularies they contain.

TermIt needs to adapt to this change. Also, since vocabularies (and thus their contexts) may be created externally by the assembly line, TermIt must be able to update whatever information it holds as to the contexts vocabularies are stored in.
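
One way to picture the required mapping is a small lookup that can be refreshed at any time. The class and method names below are hypothetical, and the fallback to the vocabulary IRI is an assumption based on the earlier convention in which context and vocabulary identifiers coincided.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper (assumption, not TermIt's API): maps vocabulary IRIs to context IRIs.
final class VocabularyContextMapper {
    private final Map<String, String> contexts = new ConcurrentHashMap<>();

    // Replaces the held mapping, e.g. after vocabularies were created or moved externally.
    void update(Map<String, String> vocabularyToContext) {
        contexts.clear();
        contexts.putAll(vocabularyToContext);
    }

    // Falls back to the vocabulary IRI itself when no explicit context is known (assumed default).
    String getContext(String vocabularyIri) {
        return contexts.getOrDefault(vocabularyIri, vocabularyIri);
    }
}
```

A caller would refresh the mapping whenever the repository content may have changed externally, rather than caching it for the lifetime of the application.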

Document update fails due to JSON-LD deserialization exception

The following exception is thrown when attempting to update a document:

cz.cvut.kbss.jsonld.exception.AmbiguousTargetTypeException: Object with types [http://onto.fel.cvut.cz/ontologies/slovník/agendový/popis-dat/pojem/zdroj, http://onto.fel.cvut.cz/ontologies/slovník/agendový/popis-dat/pojem/dokument] matches multiple equivalent target classes: [class cz.cvut.kbss.termit.dto.listing.DocumentDto, class cz.cvut.kbss.termit.model.resource.Document]
	at cz.cvut.kbss.jsonld.deserialization.util.TargetClassResolver.ambiguousTargetType(TargetClassResolver.java:133)
	at cz.cvut.kbss.jsonld.deserialization.util.TargetClassResolver.selectFinalTargetClass(TargetClassResolver.java:105)
	at cz.cvut.kbss.jsonld.deserialization.util.TargetClassResolver.getTargetClass(TargetClassResolver.java:82)
	at cz.cvut.kbss.jsonld.deserialization.expanded.Deserializer.resolveTargetClass(Deserializer.java:51)
	at cz.cvut.kbss.jsonld.deserialization.expanded.ObjectDeserializer.openObject(ObjectDeserializer.java:79)
	at cz.cvut.kbss.jsonld.deserialization.expanded.ObjectDeserializer.processValue(ObjectDeserializer.java:60)
	at cz.cvut.kbss.jsonld.deserialization.expanded.ExpandedJsonLdDeserializer.deserialize(ExpandedJsonLdDeserializer.java:61)
	at cz.cvut.kbss.jsonld.jackson.deserialization.JacksonJsonLdDeserializer.deserialize(JacksonJsonLdDeserializer.java:85)
	at cz.cvut.kbss.jsonld.jackson.deserialization.JacksonJsonLdDeserializer.deserializeWithType(JacksonJsonLdDeserializer.java:120)

Endpoint: rest/resources/document

Allow opening a set of vocabularies for editing

To facilitate collaborative creation and maintenance of multiple vocabularies, TermIt must be able to open only a selected set of vocabularies for editing and treat any other vocabularies as read-only. This should be session-based, so that multiple requests from the same user can work with the same set of vocabularies.

All vocabulary contexts are available for editing by default (this will ensure compatibility with the current behavior).

An API for opening a set of vocabularies (or rather a set of vocabulary contexts) has to be added.
Information about which contexts are open for editing by a user is stored in a session (server-side or client-side (token)).
List of vocabularies contains only the vocabularies open for editing.
References to vocabularies outside of the specified set (e.g., when a term from another vocabulary is referenced via a SKOS relationship) are read-only. I.e., they are accessible, but editing such vocabularies (the terms they contain) is forbidden.
termit-ui must be able to parse this set of vocabularies from a URL and set up the working context accordingly.
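
The rules above could be captured by a small per-session holder along these lines. This is a sketch under assumptions: `EditableContexts` is a hypothetical name, and treating an empty selection as "everything is editable" is one possible way to keep the current default behavior.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

// Hypothetical per-session holder of contexts open for editing (assumption, not TermIt code).
final class EditableContexts {
    // An empty set means no selection was made, so every context stays editable.
    private final Set<String> open = new HashSet<>();

    void openForEditing(Collection<String> contextIris) {
        open.clear();
        open.addAll(contextIris);
    }

    boolean isEditable(String contextIri) {
        return open.isEmpty() || open.contains(contextIri);
    }

    // Intended to be called before any modification; reading is never restricted here.
    void requireEditable(String contextIri) {
        if (!isEditable(contextIri)) {
            throw new IllegalStateException("Context " + contextIri + " is read-only in this session.");
        }
    }
}
```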

Rewrite retrieval of vocabulary content history

The current implementation of vocabulary content history retrieval is extremely inefficient, as it retrieves all change records related to the repository. There can be thousands of them, so loading takes minutes and megabytes of data are sent to the client, which only needs the changes grouped per day (added/edited each day).
This should be rewritten so that the backend immediately returns the aggregated changes.
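
The shape of the aggregated result can be sketched as below. `ChangeRecord` and `ChangeAggregator` are hypothetical names, and grouping by UTC day is an assumption. In a real implementation the grouping would more likely be pushed into the repository query so the individual records never leave the database; the sketch only shows what the client would receive.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical simplified change record (assumption, not TermIt's model).
final class ChangeRecord {
    enum Kind { ADDED, EDITED }

    final Instant timestamp;
    final Kind kind;

    ChangeRecord(Instant timestamp, Kind kind) {
        this.timestamp = timestamp;
        this.kind = kind;
    }
}

final class ChangeAggregator {
    // Counts changes per day (UTC) and per kind; days are sorted chronologically.
    static Map<LocalDate, Map<ChangeRecord.Kind, Long>> perDay(List<ChangeRecord> records) {
        return records.stream().collect(Collectors.groupingBy(
                r -> r.timestamp.atZone(ZoneOffset.UTC).toLocalDate(),
                TreeMap::new,
                Collectors.groupingBy(r -> r.kind, Collectors.counting())));
    }
}
```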

Ensure TermIt ontology is in a separate context in the repository

As a developer, I want to keep the TermIt ontology in a separate context (RDF graph) in the repository, so that it can be updated automatically (#227).
Currently, some of the existing deployments have the ontology in the default context, which makes the automated updates difficult (additions are fine, removals would be hard). If the ontology were in a dedicated context, we could just replace the context completely.
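
The benefit of a dedicated context can be illustrated with a toy in-memory store. `ContextStore` is a hypothetical name and statements are plain strings here. Replacing a whole context covers additions and removals in one step, whereas doing the same to the default context would also wipe unrelated data.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory model of named contexts (assumption, not a real repository API).
final class ContextStore {
    private final Map<String, Set<String>> contexts = new HashMap<>();

    void add(String context, String statement) {
        contexts.computeIfAbsent(context, k -> new HashSet<>()).add(statement);
    }

    Set<String> get(String context) {
        return contexts.getOrDefault(context, Set.of());
    }

    // Drops whatever the context held and stores the new content instead,
    // so statements removed from the ontology disappear without any diffing.
    void replaceContext(String context, Set<String> newStatements) {
        contexts.put(context, new HashSet<>(newStatements));
    }
}
```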

Automatic update of ontology in repository

As a developer, I sometimes make changes to the TermIt ontology (occasionally, even changes to the popis dat (data description) ontology happen). These changes may influence the inference results or behavior of the application. As installations of TermIt are created that are not managed by the development team, there needs to be a mechanism of automatically updating these ontologies in the main application repository, so that when a new version of TermIt is deployed, the ontologies in the repository are up-to-date.

Return datetime values as ISO string in JSON

When using plain JSON, datetime values using Java 8 datetime API (Instant in particular) are serialized as decimal numbers by Jackson. Instead, they should be serialized as ISO 8601 strings. This will ensure, among other things, consistency with the representation in JSON-LD.
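
The difference between the two representations can be shown with the standard library alone. The class below is a hypothetical illustration and does not use Jackson; with Jackson and its `JavaTimeModule` registered, the usual switch is to disable `SerializationFeature.WRITE_DATES_AS_TIMESTAMPS`, though how TermIt configures its mapper is not shown here.

```java
import java.time.Instant;
import java.util.Locale;

// Hypothetical illustration of the two textual forms of an Instant (not Jackson code).
final class InstantFormats {
    // Numeric form: epoch seconds with a nanosecond fraction (valid for non-negative epochs).
    static String asDecimal(Instant instant) {
        return String.format(Locale.ROOT, "%d.%09d", instant.getEpochSecond(), instant.getNano());
    }

    // ISO 8601 form in UTC, as produced by Instant.toString().
    static String asIso(Instant instant) {
        return instant.toString();
    }
}
```

The ISO form round-trips through `Instant.parse`, so clients consuming plain JSON and JSON-LD can share one parsing path.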

Provide REST API documentation per instance

Currently, the REST API documentation is maintained manually at SwaggerHub. However, this is quite inefficient for two reasons:

Because it is maintained manually and separately from the source code, it is often outdated.
Testing the API is difficult because different instances would require different versions on SwaggerHub.

Instead, the documentation should be a part of each deployment of TermIt so that it can be directly tested. Moreover, the documentation of the endpoints would be specified directly in code. Springdoc OpenAPI could be used for this purpose.

Support working with repository containing multiple copies of the same vocabulary

Follow-up to #163 and #164: a repository may contain several copies of the same vocabulary, of which one is canonical and the others are working copies. Each user may open one of the working copies for editing.
TermIt has to be able to determine the correct context of the vocabulary and any other related vocabularies (vocabularies containing terms SKOS-related to the terms from the edited vocabulary).

Possibly problematic areas:

SKOS export (for inferred skos:exactMatch and skos:relatedMatch, it is not known whether they are inferred from statements in someone else's workspace)
- Solution: SKOS export in instance with possible workspaces will only contain asserted statements about skos:exactMatch and skos:relatedMatch

TODOs:

Harmonize code with current development head
Optimize retrieval of vocabulary repository contexts

Repeated annotation of large files is slow

When text analysis is invoked on an already annotated larger file (approx. 1 MB) containing many term occurrences, processing its results can take minutes. This makes it practically unusable: the user cannot tell whether it is normal for the application to show Please wait... for several minutes and may leave or attempt to refresh the page.

Analysis of repeated annotation of the metropolitan plan shows the following times:

Invocation of text analysis: 8.5s
Resolution of occurrences in the file: 47s
Saving occurrences: 5min 31s

The goal should be to get at least under a minute altogether, preferably even better.
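
Adding up the measured phases shows where the time goes: the total is 386.5 s (about 6 min 27 s), and saving the occurrences accounts for roughly 86% of it. The small helper below reproduces that arithmetic; `AnnotationTimings` is a hypothetical name used only for this calculation.

```java
import java.time.Duration;
import java.util.List;

// Hypothetical helper reproducing the arithmetic on the measured phase durations.
final class AnnotationTimings {
    static Duration total(List<Duration> phases) {
        return phases.stream().reduce(Duration.ZERO, Duration::plus);
    }

    // Fraction of the total time spent in one phase, between 0 and 1.
    static double share(Duration phase, Duration total) {
        return (double) phase.toMillis() / total.toMillis();
    }
}
```

With the numbers above, `share` for the saving phase is 331 s / 386.5 s, so reaching the one-minute goal depends mainly on speeding up how occurrences are saved.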

Allow configuring types language

As a TermIt administrator, I want to be able to specify a file containing the definition of types users can use to classify terms.

Currently, the types (based on the UFO ontology) are loaded from a file that is packed into the application archive. As a result, any change to the types language requires rebuilding the project. Instead, it should at least be possible to specify the path to the language file as a parameter on startup, with the built-in file used as a default when no custom one is provided.
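
The fallback described above might look roughly like this. `TypesLanguageLoader` and the classpath location `/languages/types.ttl` are hypothetical; the actual file name and configuration parameter are not given in this issue.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical loader (assumption, not TermIt code): custom file if configured, bundled file otherwise.
final class TypesLanguageLoader {
    // Assumed classpath location of the built-in types language file.
    static final String BUILT_IN = "/languages/types.ttl";

    static InputStream open(String configuredPath) throws IOException {
        if (configuredPath != null && !configuredPath.isBlank()) {
            // A path given on startup takes precedence over the bundled file.
            return Files.newInputStream(Path.of(configuredPath));
        }
        InputStream in = TypesLanguageLoader.class.getResourceAsStream(BUILT_IN);
        if (in == null) {
            throw new IOException("Built-in types language not found at " + BUILT_IN);
        }
        return in;
    }
}
```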

This is motivated by attempts to incorporate TermIt into the SGoV assembly line, which uses a different language to stereotype terms.

Replace aspects with Spring application events

Following migration to JOPA 2.0.0(-SNAPSHOT), AspectJ is no longer required to work with the object model. However, we are currently using Aspects to notify certain components of selected events. This prevents the removal of AspectJ Maven plugin from the build configuration.

We should replace the Spring aspects with application events and remove AspectJ altogether.
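
The publish/listen pattern that would replace the aspects can be sketched without any framework. The classes below are hypothetical stand-ins: in Spring, the publishing side corresponds to `ApplicationEventPublisher.publishEvent` and the listening side to methods annotated with `@EventListener`. The event name is invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical event type (assumption); carries only what listeners need.
final class VocabularyContentModified {
    final String vocabularyIri;

    VocabularyContentModified(String vocabularyIri) {
        this.vocabularyIri = vocabularyIri;
    }
}

// Minimal stand-in for an application event publisher (not Spring's implementation).
final class SimpleEventPublisher {
    private final List<Consumer<Object>> listeners = new ArrayList<>();

    // Registers a listener that is only invoked for events of the given type.
    <E> void subscribe(Class<E> type, Consumer<E> listener) {
        listeners.add(event -> {
            if (type.isInstance(event)) {
                listener.accept(type.cast(event));
            }
        });
    }

    // Delivers the event synchronously to all matching listeners.
    void publish(Object event) {
        listeners.forEach(l -> l.accept(event));
    }
}
```

Compared with aspects, the component that triggers the change publishes the event explicitly, so no bytecode weaving is needed at build time.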

Improve build performance

As a developer, I want the TermIt build to be faster. The tests take too much time, which slows development down considerably (PRs, Jenkins build before deployment, local test build).