Comments (5)
Hi @LorenzBuehmann ,
I vaguely (it was a very long time ago!) recall this coming up before. A difference now is that the only site this affects is likely to be wikidata (and then, only for now).
Here is an MVCE:
public static void main(String...args) {
    // U00E7
    String qs = "SELECT ?x { BIND('Curaçao' As ?x) }";
    // Same query, with the character written as a SPARQL unicode escape.
    String qsx = "SELECT ?x { BIND('Cura\\u00E7ao' As ?x) }";
    RowSet rowSet = QueryExecHTTP
            .service("https://query.wikidata.org/sparql")
            //.sendMode(QuerySendMode.asPostForm)
            //.sendMode(QuerySendMode.asPost)
            .sendMode(QuerySendMode.asGetAlways)
            .queryString(qs)
            .select();
    RowSetOps.out(rowSet);
}
After checking, the corruption happens when the request is received; qsx works in all three cases.
The three different sendModes give three different results:
- asGetAlways works.
- asPost is corrupted in a way that looks like UTF-8 read as ISO-8859-?
- asPostForm is a different corruption; not sure what, and that might be Jena.
I don't know why ISO-8859 is being used if their servers are Linux (system default). It hints it is a choice in the Blazegraph code.
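To illustrate the kind of corruption suspected for asPost, here is a minimal sketch that encodes the literal as UTF-8 and then decodes the same bytes as ISO-8859-1. The class and method names are hypothetical, and ISO-8859-1 is an assumption about which ISO-8859 variant is involved; this only reproduces the effect locally and says nothing about where in the server stack it happens.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of Jena or Blazegraph.
class MojibakeDemo {

    // Encode the string as UTF-8, then decode those same bytes as ISO-8859-1.
    // Each UTF-8 byte becomes one character, so a two-byte character such as
    // U+00E7 (bytes C3 A7) turns into the two characters U+00C3 U+00A7.
    static String utf8ReadAsLatin1(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String original = "Cura\u00E7ao";
        String corrupted = utf8ReadAsLatin1(original);
        // The corrupted form is one character longer than the original.
        System.out.println(original.length() + " -> " + corrupted.length());
        System.out.println(corrupted);
    }
}
```

If the result returned for asPost shows "Ã§" where "ç" was sent, that would be consistent with this reading of the bytes.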
from jena.
Hi @afs
Yes, as I expected: a limitation of the Wikidata backend, or at least of their server setup. I was just confused by the different behaviour of Jena 4.1.0 vs the latest versions, and then I remembered that you changed the HTTP API used.
static private String getQueryString(final HttpServletRequest req)
        throws IOException {
    if (RESTServlet.hasMimeType(req, MIME_SPARQL_QUERY)) {
        // return the body of the POST, see trac 711
        return readFully(req.getReader());
    }
    return req.getParameter(ATTR_QUERY) != null ? req
            .getParameter(ATTR_QUERY) : (String) req
            .getAttribute(ATTR_QUERY);
}
Unfortunately they rely on the old HTTP API, and the HttpServletRequest sticks to ISO-8859-1 by default if no encoding is specified in the HTTP request; you can't change the default encoding, afaik. The only fix would be to set the encoding on the request object, i.e.
    req.setCharacterEncoding("UTF-8");
So not sure how to continue, we'll raise an issue on Blazegraph, but I don't think that fix will even make it to Wikidata setup as they would have to rebuild and redeploy Blazegraph.
Or they would set the default encoding in their Jetty server if possible.
Regarding POST form: via curl it works:
curl -X POST --data "query=SELECT ?x { BIND('Curaçao' As ?x) }" https://query.wikidata.org/sparql
For Jena I guess we can close this issue (once you got an idea on the POST form issue you mentioned) here and at least have it for reference and documentation as a known limitation. Might affect other users as well.
from jena.
@afs a follow up issue/question (we could also open another issue for better reference)
Wikidata people argued to use POST form because it works ...
We tried to set the SERVICE request mode via the Fuseki assembler config:
ja:context [ ja:cxtName "arq:httpServiceSendMode" ; ja:cxtValue "asGetWithLimitForm" ] ;
This indeed fails: Context::get tries to return an object of the expected type, which in the Service::chooseQuerySendMode method is QuerySendMode, and casting a String to that type fails.
A quick fix would work around the limitation and handle at least the two different types of the context value, i.e. i) a String coming from an assembler config or ii) a QuerySendMode coming from, say, a Java API setup:
private static QuerySendMode chooseQuerySendMode(String serviceURL, Context context, QuerySendMode dftValue) {
    if ( context == null )
        return dftValue;
    Object querySendMode = context.<Object>get(httpServiceSendMode, dftValue);
    if (querySendMode instanceof String) { // handle string type from assembler config
        return QuerySendMode.valueOf((String) querySendMode);
    } else if (querySendMode instanceof QuerySendMode) { // handle enum type from Java API
        return (QuerySendMode) querySendMode;
    }
    // handle null value and other non-supported types
    return context.get(httpServiceSendMode, dftValue);
}
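For reference, the same idea can be sketched without any Jena dependency. The enum below is a hypothetical stand-in that only mirrors the mode names mentioned in this thread, and the class and method names are made up for illustration; it is not the Jena code. The difference from the quick fix above is that unknown names and unsupported types are reported explicitly instead of ending in a failed cast.

```java
// Hypothetical stand-alone sketch; SendMode is a stand-in for Jena's QuerySendMode.
class SendModeResolver {

    enum SendMode { asGetAlways, asGetWithLimitForm, asPost, asPostForm }

    // Accept either an enum value (set via Java code) or its name as a String
    // (as it would arrive from a configuration file); fall back to dftValue on null.
    static SendMode resolve(Object value, SendMode dftValue) {
        if (value == null)
            return dftValue;
        if (value instanceof SendMode)
            return (SendMode) value;
        if (value instanceof String) {
            String name = ((String) value).trim();
            try {
                return SendMode.valueOf(name);
            } catch (IllegalArgumentException ex) {
                // valueOf rejects unknown names; rethrow with a clearer message.
                throw new IllegalArgumentException("Unknown send mode: '" + name + "'", ex);
            }
        }
        throw new IllegalArgumentException(
                "Unsupported send mode type: " + value.getClass().getName());
    }
}
```

With this shape, resolve("asGetWithLimitForm", SendMode.asPost) yields the enum constant, while a misspelled name fails with a message naming the offending value.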
from jena.
Separate issue and PR please!
With error handling.
from jena.
We might as well put back the "charset=utf8". It didn't seem to cause problems.
I noticed another problem: GET and POST+form are not %-encoding characters outside printable ASCII.
Everything works, including Wikidata, but strictly it is wrong.
A fix is quite easy - PR #1269.
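As an illustration of what strict encoding of a form parameter looks like, here is a minimal sketch using only the JDK (the Charset overload of URLEncoder.encode needs Java 10 or later). The class and method names are hypothetical and this is not the code from the PR; it only shows that the non-ASCII character ends up as %-escaped UTF-8 bytes.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not the Jena implementation.
class FormEncodingDemo {

    // Build an application/x-www-form-urlencoded body for a "query" parameter.
    // URLEncoder escapes everything outside its unreserved ASCII set, writing
    // non-ASCII characters as the %XX form of their UTF-8 bytes.
    static String formBody(String query) {
        return "query=" + URLEncoder.encode(query, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(formBody("SELECT ?x { BIND('Cura\u00E7ao' AS ?x) }"));
    }
}
```

The resulting body is pure ASCII, so it no longer depends on which charset the receiving server assumes for the raw bytes.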
from jena.