Comments (5)

afs commented on June 14, 2024

Hi @LorenzBuehmann,

I vaguely (it was a very long time ago!) recall this coming up before. A difference now is that the only site this affects is likely to be Wikidata (and then, only for now).

Here is an MVCE:

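    import org.apache.jena.sparql.exec.RowSet;
    import org.apache.jena.sparql.exec.RowSetOps;
    import org.apache.jena.sparql.exec.http.QueryExecHTTP;
    import org.apache.jena.sparql.exec.http.QuerySendMode;

    public class WikidataEncodingMVCE {
        public static void main(String... args) {
            // 'ç' is U+00E7: qs sends it as a raw character, qsx sends the SPARQL escape instead.
            String qs = "SELECT ?x { BIND('Curaçao' As ?x) }";
            String qsx = "SELECT ?x { BIND('Cura\\u00E7ao' As ?x) }";

            RowSet rowSet = QueryExecHTTP
                    .service("https://query.wikidata.org/sparql")
                    // Swap in one of the other send modes to compare the results.
                    //.sendMode(QuerySendMode.asPostForm)
                    //.sendMode(QuerySendMode.asPost)
                    .sendMode(QuerySendMode.asGetAlways)
                    .queryString(qs)   // replace qs with qsx to test the escaped form
                    .select();
            RowSetOps.out(rowSet);
        }
    }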

After checking, the corruption happens on the receiving side of the request; qsx works in all three cases.

The three different sendModes give three different results.

  • asGetAlways works
  • asPost is corrupted in a way that looks like UTF-8 read as ISO-8859-?
  • asPostForm is corrupted differently; I'm not sure how, and that one might be Jena.

I don't know why ISO-8859 is being used if their servers are Linux, where the system default would be UTF-8. It hints that this is a choice in the Blazegraph code.
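
To illustrate the kind of corruption described for asPost, here is a minimal sketch that encodes a string as UTF-8 and then decodes the same bytes as ISO-8859-1. It only reproduces the general mechanism locally; the class and method names are made up for illustration, and it is an assumption that this matches what happens on the Wikidata side.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of Jena or Blazegraph.
class MojibakeDemo {
    // Encode as UTF-8 (what the client sends), then decode as ISO-8859-1
    // (what a receiver does if it assumes the wrong charset).
    static String misdecode(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // U+00E7 becomes the two bytes C3 A7, which ISO-8859-1 reads as two characters.
        System.out.println(misdecode("Cura\u00E7ao")); // prints "CuraÃ§ao"
    }
}
```

Each non-ASCII character turns into two or more Latin-1 characters, which is the typical signature of UTF-8 being read as ISO-8859-1.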

from jena.

LorenzBuehmann commented on June 14, 2024

Hi @afs

Yes, as I expected, this is a limitation of the Wikidata backend, or at least of their server setup. I was just confused by the different behaviour of Jena 4.1.0 vs the latest versions, and then I remembered that you changed the HTTP API in use.

See https://github.com/blazegraph/database/blob/master/bigdata-core/bigdata-sails/src/java/com/bigdata/rdf/sail/webapp/QueryServlet.java#L921

static private String getQueryString(final HttpServletRequest req)
        throws IOException {
    if (RESTServlet.hasMimeType(req, MIME_SPARQL_QUERY)) {
        // return the body of the POST, see trac 711
        return readFully(req.getReader());
    }
    return req.getParameter(ATTR_QUERY) != null ? req
            .getParameter(ATTR_QUERY) : (String) req
            .getAttribute(ATTR_QUERY);
}

Unfortunately, they rely on the old HTTP API, and the HttpServletRequest falls back to ISO-8859-1 by default if no encoding is specified in the HTTP request. You can't change the default encoding, afaik. The only fix would be to set the encoding on the request object, i.e.

req.setCharacterEncoding("UTF-8");
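
The fallback behaviour described above can be simulated without a servlet container. The sketch below is not the servlet implementation: `BodyDecoding`, `charsetOf` and `decode` are hypothetical helpers that pick the charset from a Content-Type header value and fall back to ISO-8859-1 when none is declared, which is the assumption being illustrated.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, only simulating the default-charset fallback.
class BodyDecoding {
    // Return the charset named in a Content-Type value, or the fallback if absent or unknown.
    static Charset charsetOf(String contentType, Charset fallback) {
        if (contentType == null)
            return fallback;
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.regionMatches(true, 0, "charset=", 0, 8)) {
                String name = p.substring(8).trim().replace("\"", "");
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    // Illegal or unsupported charset name: use the fallback.
                    return fallback;
                }
            }
        }
        return fallback;
    }

    // Decode a request body, assuming ISO-8859-1 when no charset is declared.
    static String decode(byte[] body, String contentType) {
        return new String(body, charsetOf(contentType, StandardCharsets.ISO_8859_1));
    }
}
```

With the UTF-8 bytes of the query as the body, a Content-Type of `application/sparql-query` yields the corrupted string, while `application/sparql-query; charset=utf-8` yields the original one.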

So I'm not sure how to continue. We'll raise an issue on Blazegraph, but I don't think the fix will even make it to the Wikidata setup, as they would have to rebuild and redeploy Blazegraph.

Alternatively, they could set the default encoding in their Jetty server, if that is possible.


Regarding POST Form, via curl it works:

curl -X POST --data "query=SELECT ?x { BIND('Curaçao' As ?x) }" https://query.wikidata.org/sparql

For Jena, I guess we can close this issue here (once you have an idea about the POST form issue you mentioned) and at least keep it for reference and documentation as a known limitation. It might affect other users as well.

LorenzBuehmann commented on June 14, 2024

@afs a follow-up issue/question (we could also open another issue for better reference):

Wikidata people argued to use POST form because it works ...

We tried to set the SERVICE request mode via Fuseki assembler config:

ja:context [ ja:cxtName "arq:httpServiceSendMode" ;  ja:cxtValue "asGetWithLimitForm" ] ;

This fails: in the Service::chooseQuerySendMode method, Context::get tries to return an object of the expected type, which in this case is QuerySendMode, and casting a String to that type fails.

A quick fix would work around the limitation by handling at least the two different types of the context value: i) a String coming from an assembler config, or ii) a QuerySendMode coming from, for example, a Java API setup:

private static QuerySendMode chooseQuerySendMode(String serviceURL, Context context, QuerySendMode dftValue) {
    if ( context == null )
        return dftValue;
    Object querySendMode = context.<Object>get(httpServiceSendMode, dftValue);
    if (querySendMode instanceof String) { // handle string type from assembler config
        return QuerySendMode.valueOf((String) querySendMode);
    } else if (querySendMode instanceof QuerySendMode) { // handle enum type from Java API
        return (QuerySendMode) querySendMode;
    }
    // handle null value and other non-supported types
    return context.get(httpServiceSendMode, dftValue);
}
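
As a standalone illustration of the same idea with explicit error handling, here is a sketch that does not depend on Jena. `SendMode` is a hypothetical stand-in for QuerySendMode, listing only the mode names mentioned in this thread, and `chooseSendMode` is an assumed helper, not Jena code.

```java
// Hypothetical sketch, independent of Jena's Context and QuerySendMode.
class SendModeParsing {
    // Stand-in enum; the names mirror the send modes mentioned in this thread.
    enum SendMode { asGetAlways, asGetWithLimitForm, asPost, asPostForm }

    // Accept either an enum value (Java API) or its name as a String (assembler config).
    static SendMode chooseSendMode(Object value, SendMode dftValue) {
        if (value == null)
            return dftValue;
        if (value instanceof SendMode)
            return (SendMode) value;
        if (value instanceof String) {
            String name = ((String) value).trim();
            try {
                return SendMode.valueOf(name);
            } catch (IllegalArgumentException e) {
                // Report the bad name instead of a bare enum lookup failure.
                throw new IllegalArgumentException("Unrecognized send mode: '" + name + "'", e);
            }
        }
        throw new IllegalArgumentException(
                "Unsupported send mode type: " + value.getClass().getName());
    }
}
```

Whether an unrecognized value should raise an error or fall back to the default with a warning is a design choice; this sketch raises so that a misconfigured assembler file is noticed early.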

afs commented on June 14, 2024

Separate issue and PR please!

With error handling.

afs commented on June 14, 2024

We might as well put back the "charset=utf8". It didn't seem to cause problems.
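
For reference, a minimal sketch of a direct POST that declares its charset, using the JDK's java.net.http API. This is not Jena's actual request-building code: the `SparqlPostRequest` class is hypothetical, and the parameter is written as `charset=utf-8` here as an assumption about the exact spelling.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch; only builds the request, it does not send it.
class SparqlPostRequest {
    static HttpRequest build(String endpoint, String query) {
        return HttpRequest.newBuilder(URI.create(endpoint))
                // Declare the body charset so the receiver need not guess.
                .header("Content-Type", "application/sparql-query; charset=utf-8")
                // Encode the body with the same charset that the header declares.
                .POST(HttpRequest.BodyPublishers.ofString(query, StandardCharsets.UTF_8))
                .build();
    }
}
```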

I noticed another problem: GET and POST+form are not %-encoding characters outside printable ASCII.
Everything works, including Wikidata, but strictly speaking it is wrong.

A fix is quite easy - PR #1269.
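
To show what percent-encoding a non-ASCII character looks like, here is a small sketch using the JDK's URLEncoder. It is not the code from the PR; `FormEncoding` and `queryParam` are hypothetical names used only for illustration.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of form-style encoding for a "query" parameter.
class FormEncoding {
    // Encode the value as application/x-www-form-urlencoded, using UTF-8 bytes.
    static String queryParam(String query) {
        return "query=" + URLEncoder.encode(query, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // U+00E7 is sent as the UTF-8 bytes C3 A7, each written as %XX.
        System.out.println(queryParam("Cura\u00E7ao")); // prints "query=Cura%C3%A7ao"
    }
}
```

Note that URLEncoder follows the HTML form rules, so a space becomes `+` rather than `%20`.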
