Comments (6)
Why are these StreamRDF... classes in ...riot/system/ and not in .../riot/writer/stream/?
Different meanings of "stream".
org.apache.jena.riot.system.stream is in support of the stream manager - that is IO streams.
org.apache.jena.riot.system includes StreamRDF - a stream of triples/quads/prefixes.
from jena.
RIOT has its own tokenizer and parsers - the combination is x2 to x4 faster. The tokenizer is the performance bottleneck.
The fastest parsers in Jena run at up to 1 million triples/second on binary RDF Thrift. RDF Protobuf is slightly less than 10% slower (making protobuf work for open-ended streams of input seems to create an extra object, and at 1 microsecond per triple this is observable).
The performance of Turtle and N-Triples is approximately 240 kTPS and 400 kTPS respectively. The only difference is that the N-Triples grammar parser is much simpler than all the "if"s needed for Turtle.
All these are a minimum of x4 faster than JavaCC.
All parsing performance is sensitive to the hardware used, so these figures are relative. (They are from an old Core i5 with a SATA SSD, which has been used consistently for measurements over time.)
Java has to convert to Java chars at some point which is a copy. In fact, it is faster to convert large buffers using Java built-in UTF-8 handling than to try to do one less copy but of each RDF term. Java checks all input for validity of UTF-8.
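To make the two strategies concrete, here is a minimal sketch using only the JDK. It is not RIOT's tokenizer: the class name `BulkUtf8Decode`, the method names, and the space-separated input are assumptions made for illustration. It only shows the shape of "decode the whole buffer once, then split" versus "split the bytes, then decode each term", and makes no timing claim of its own.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class; not part of Jena.
class BulkUtf8Decode {

    // Strategy 1: decode the whole buffer with the built-in UTF-8 decoder,
    // then split the resulting chars. The strict decoder rejects invalid UTF-8.
    static List<String> decodeThenSplit(byte[] input) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        CharBuffer chars;
        try {
            chars = decoder.decode(ByteBuffer.wrap(input));
        } catch (CharacterCodingException e) {
            throw new IllegalArgumentException("Input is not valid UTF-8", e);
        }
        List<String> terms = new ArrayList<>();
        int start = 0;
        for (int i = 0; i <= chars.length(); i++) {
            if (i == chars.length() || chars.charAt(i) == ' ') {
                if (i > start) {
                    terms.add(chars.subSequence(start, i).toString());
                }
                start = i + 1;
            }
        }
        return terms;
    }

    // Strategy 2: split on the raw bytes and decode each term separately.
    // Splitting on 0x20 is safe because UTF-8 continuation bytes are >= 0x80.
    // Note: new String(...) replaces malformed bytes instead of reporting them.
    static List<String> splitThenDecode(byte[] input) {
        List<String> terms = new ArrayList<>();
        int start = 0;
        for (int i = 0; i <= input.length; i++) {
            if (i == input.length || input[i] == ' ') {
                if (i > start) {
                    terms.add(new String(input, start, i - start, StandardCharsets.UTF_8));
                }
                start = i + 1;
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        byte[] line = "<http://example/s> <http://example/p> \"caf\u00e9\" ."
                .getBytes(StandardCharsets.UTF_8);
        System.out.println(decodeThenSplit(line));
        System.out.println(splitThenDecode(line));
    }
}
```

Both methods return the same terms for valid input. Any speed difference would have to be measured on real data and hardware.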
If you'd like to improve the tokenizer and provide a PR, that would be great.
@AtesComp Could you provide a test case to illustrate the issue with StreamRDFWriter.getWriterStream? There is a lot of rdf-transform that may be influencing the issue.
Thanks, Andy. Of course, I should have written .../riot/io/stream, or just .../riot/stream, instead of .../riot/writer/stream. I was just locked onto writing out RDF export files. The system directory just didn't click with me.
As for the real issue, I'm noting that the OpenRefine 3.5.2 version uses an older 3.x Jena ARQ that is squashing my dependency on the 4.5.0 Jena ARQ Maven release. I looked at many ways to force it to use the newer jars, but to no avail. Since my code is just a lowly red-headed stepchild extension to OpenRefine, I don't have much say in the matter. I'm fairly sure the getWriterStream issue is due to the older jar.
However, the OpenRefine 3.6-SNAPSHOT is up-to-date! So, now, if I can just get them to make an official release, all will be good. Well, mostly. The Jena documentation needs updating. I can live with it for now.
So this issue "StreamRDFWriter getWriterStream()" can be closed?