p3-datatxt-stanbol's People

Contributors

gmega, mainini, retog, westei

p3-datatxt-stanbol's Issues

ClassCastException in the FiseTranslator

The trace

Caused by: java.lang.ClassCastException: org.apache.clerezza.rdf.core.UriRef cannot be cast to java.io.Serializable
    at eu.spaziodati.datatxt.stanbol.enhancer.engines.translators.FiseTranslator.translate(FiseTranslator.java:47)
    at eu.spaziodati.datatxt.stanbol.enhancer.engines.DatatxtNexEngine.computeEnhancements(DatatxtNexEngine.java:160)

This is caused by the logging call in the following code snippet:

public void translate(Pair<UriRef, MGraph> item, EnhancementEngine engine, String text, DatatxtResponse datatxtResponse) {
    LOG.info(String.format("DatatxtAnnotator: Enhance ContentItem with FISE Annotations: ContentItem=%s, " +
            "DatatxtResponse=%s", (item != null ? item.getKey() : item), GSON.toJson(datatxtResponse)));
    Language lang = datatxtResponse.lang != null ? new Language(datatxtResponse.lang) : null;

I am not sure why this tries to cast the UriRef (probably the result of item.getKey()) to Serializable, but removing the logging definitely solves the issue. I guess it has something to do with the inline if and generic type erasure.

In any case this logging is problematic: regardless of the LOG level, GSON.toJson(datatxtResponse) is always called, which I assume is an expensive operation.
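
One way to address both points is to defer building the message until the logger has confirmed that the level is enabled, and to keep the two branches of the inline if at the same type. The following is a minimal sketch, not the project's code: it uses java.util.logging so that it is self-contained (the logging framework behind LOG is not visible in the snippet), Map.Entry stands in for Pair<UriRef, MGraph>, and toJson(..) is a hypothetical stand-in for GSON.toJson(..). Whether the same-typed conditional also avoids the ClassCastException in the real engine is an assumption that would need to be checked there.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.logging.Level;
import java.util.logging.Logger;

public class LazyLoggingSketch {
    private static final Logger LOG = Logger.getLogger(LazyLoggingSketch.class.getName());

    // Counts how often the stand-in serializer actually runs.
    public static final AtomicInteger SERIALIZATIONS = new AtomicInteger();

    // Hypothetical stand-in for GSON.toJson(datatxtResponse).
    static String toJson(Object response) {
        SERIALIZATIONS.incrementAndGet();
        return "{\"response\":\"" + response + "\"}";
    }

    // Map.Entry plays the role of Pair<UriRef, MGraph> in this sketch.
    public static void translate(Map.Entry<String, String> item, Object response) {
        // Both branches have the key's type, so no mixed-type conditional is involved.
        final String key = (item != null) ? item.getKey() : null;
        // The Supplier only runs if INFO is loggable, so toJson(..) is skipped otherwise.
        LOG.info(() -> String.format(
                "Enhance ContentItem with FISE annotations: ContentItem=%s, DatatxtResponse=%s",
                key, toJson(response)));
    }

    // Demo helper: logs once at the given logger level and reports the serializer calls.
    public static int serializationsAt(Level level) {
        LOG.setUseParentHandlers(false); // keep the demo quiet
        LOG.setLevel(level);
        SERIALIZATIONS.set(0);
        translate(new SimpleEntry<>("urn:content-item:1", "graph"), "payload");
        return SERIALIZATIONS.get();
    }

    public static void main(String[] args) {
        System.out.println(serializationsAt(Level.WARNING)); // serializer skipped
        System.out.println(serializationsAt(Level.INFO));    // serializer runs
    }
}
```

With other logging APIs the same effect is usually achieved by wrapping the call in an "is this level enabled" check.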

Supported Languages

When sending a text in an unsupported language, the engine currently fails with:

Caused by: eu.spaziodati.datatxt.stanbol.enhancer.engines.client.DatatxtException: Unmanaged language'es'
    at eu.spaziodati.datatxt.stanbol.enhancer.engines.client.DatatxtClient.performRequest(DatatxtClient.java:186)

This is not what a Stanbol Enhancement Engine is supposed to do. Instead, an engine should check for supported languages within the canEnhance(..) method and refuse to enhance content in an unsupported language. What it MUST NOT do is fail in the computeEnhancements(..) method because it accepted a request for an unsupported language.

There are the following possibilities to solve this:

  1. Having a service where dataTXT returns the supported languages. Use this list to implement canEnhance(..) so that content in unsupported languages is refused.
  2. Same as (1), but with a hard-coded list of supported languages. Based on the documentation at [1], dataTXT supports de | en | fr | it | pt.
  3. Checking the error code of the response. If the error is for an unsupported language, throw a special exception (e.g. an UnmanagedLanguageException). This exception can then be caught by the engine and silently ignored. In this case canEnhance(..) would accept content in any language, but the engine would not "crash" the enhancement chain in case of an unsupported one.

As I do not see a service that can be used for (1), and I do not want to require code changes to the engine whenever dataTXT adds additional languages, I will go for option (3) to fix this issue.
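
A rough sketch of what option (3) could look like is given here. All types are hypothetical stand-ins defined locally so that the example compiles on its own: the real DatatxtException and DatatxtClient are not shown in this issue, UnmanagedLanguageException does not exist yet, and the Client interface and the boolean return value are only there for illustration.

```java
public class UnmanagedLanguageSketch {

    // Stand-in for the engine's DatatxtException (unchecked here for brevity).
    public static class DatatxtException extends RuntimeException {
        public DatatxtException(String message) {
            super(message);
        }
    }

    // Proposed special case, raised when the service rejects the language.
    public static class UnmanagedLanguageException extends DatatxtException {
        public UnmanagedLanguageException(String lang) {
            super("Unmanaged language '" + lang + "'");
        }
    }

    // Hypothetical minimal view of the dataTXT client.
    public interface Client {
        String annotate(String text, String lang);
    }

    // Returns true if enhancements were produced, false if the content was skipped.
    public static boolean computeEnhancements(Client client, String text, String lang) {
        try {
            client.annotate(text, lang);
            return true;
        } catch (UnmanagedLanguageException e) {
            // Unsupported language: skip quietly instead of failing the whole chain.
            return false;
        }
        // Any other DatatxtException still propagates to the caller.
    }

    public static void main(String[] args) {
        Client englishOnly = (text, lang) -> {
            if (!"en".equals(lang)) {
                throw new UnmanagedLanguageException(lang);
            }
            return "annotations";
        };
        System.out.println(computeEnhancements(englishOnly, "hello", "en"));
        System.out.println(computeEnhancements(englishOnly, "hola", "es"));
    }
}
```

Only the language-specific exception is swallowed, so genuine service failures remain visible.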

However, I would strongly prefer a solution that can already decline requests in canEnhance(..), as this would avoid calls to the dataTXT service.
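
For comparison, declining early could look roughly like the sketch below, which uses the hard-coded list from option (2). It is only an illustration: the two integer constants are local stand-ins for the ones defined by Stanbol's EnhancementEngine interface, the method takes a plain language code instead of a ContentItem, and declining content whose language is unknown is an assumption rather than something the issue prescribes.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

public class LanguageGateSketch {
    // Local stand-ins for EnhancementEngine.CANNOT_ENHANCE / ENHANCE_ASYNC.
    public static final int CANNOT_ENHANCE = 0;
    public static final int ENHANCE_ASYNC = 2;

    // Languages listed in the dataTXT documentation referenced as [1].
    public static final Set<String> SUPPORTED = Collections.unmodifiableSet(
            new HashSet<>(Arrays.asList("de", "en", "fr", "it", "pt")));

    // Simplified canEnhance(..): takes the detected language code directly.
    public static int canEnhance(String language) {
        if (language == null) {
            return CANNOT_ENHANCE; // assumption: unknown language is declined
        }
        String code = language.toLowerCase(Locale.ROOT);
        return SUPPORTED.contains(code) ? ENHANCE_ASYNC : CANNOT_ENHANCE;
    }

    public static void main(String[] args) {
        System.out.println(canEnhance("en"));
        System.out.println(canEnhance("es"));
    }
}
```

The drawback noted above still applies: the set has to be updated by hand whenever dataTXT adds a language.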

[1] https://dandelion.eu/docs/api/datatxt/nex/v1/
