Comments (4)
My work on an alternative to GraphMem is progressing slowly.
So I decided to contribute a few improvements to the existing GraphMem on the way.
Please review my PR.
from jena.
There's a straightforward way to improve the speed of GraphMem
for Graph#add
by nearly 50% and Graph#delete
by 10-35%. This change has almost no impact on any other operation. However, it uses an additional array for the hashCodes in HashCommon
. As a result, the memory consumption of the indexing structures (excluding the space for the triples) would increase by 31-51%.
I propose not to implement this for the deprecated GraphMem
, but instead to create a successor GraphMem2
with the following changes:
- Speed up
Graph#add
andGraph#delete
accepting the given memory cost - Remove support for
Iterator#remove
to reduce complexity
(other implementations likeDatasetGraphInMemory
don't support this either) - Switch to term-equality as proposed in issue#1867 mostly by:
- Removing calls to
org.apache.jena.graph.Node#sameValueAs
- Replacing usages of
org.apache.jena.graph.Node#getIndexingValue
with the Node itself
- Removing calls to
What do you say?
from jena.
Yes - term-quality.
A GraphMem2
sounds like a good plan; just the switch to term-equality means we need to migrate somehow. Jena5 is likely later this year.
No support Iterator#remove
is generally a good idea. It is at best an "optional extra" not a core part of iterators IMO.
The extra space usage - this I am in two minds about. Faster is better but so is having an storage that optimizes for space. It doesn't sound if it is so fundamental as to be irreversible.
from jena.
Thanks, that's the information I needed to proceed.
Adding and removing the array with the hashes is straightforward. It might also be quite simple to have two variants of the graph store. That way, users can choose between the fast variant or the low memory one.
Speaking of which:
There are two fields tracking changes for potential concurrent modifications:
org.apache.jena.mem.ArrayBunch#changes
org.apache.jena.mem.HashCommon#changes
The one in ArrayBunch
is volatile, but the one in HashCommon
is not. ArrayBunch
is totally fine with integer overflow, since it checks for equality when verifying:
if (changes != initialChanges) throw new ConcurrentModificationException();
HashCommon, however, wouldn't handle an integer overflow correctly:
if (changes > initialChanges) throw new ConcurrentModificationException();
The document header of java.util.HashMap
states:
` *
Note that the fail-fast behavior of an iterator cannot be guaranteed
- as it is, generally speaking, impossible to make any hard guarantees in the
- presence of unsynchronized concurrent modification. Fail-fast iterators
- throw {@code ConcurrentModificationException} on a best-effort basis.
- Therefore, it would be wrong to write a program that depended on this
- exception for its correctness: the fail-fast behavior of iterators
- should be used only to detect bugs.`
Tracking the changes here might be the safest possible way. However, HashCommon
should then also make changes
volatile and compare for equality rather than using a greater than comparison.
On the other hand, if we removed both fields and checked org.apache.jena.mem.HashCommon#size
and org.apache.jena.mem.ArrayBunch#size
instead, we would save memory (since there are many instances of both classes) and avoid incrementing a second variable on #add
and #remove
operations. The iterators should still help detect concurrent modifications in almost any scenario. Only if there are an equal number of #add
and #remove
calls, so that the total size of the graph is not affected, could there be undetected concurrent modifications while executing an iterator. This seems unlikely enough to me that it would prevent anyone from detecting a concurrency bug.
What is your opinion in this?
from jena.
Related Issues (20)
- Incorrect JoinClassifier results with unbound values.
- Fix broken Fuseki when using a context path in the URL
- Update the lexical space and value space of rdf:XMLLiteral to comply with RDF 1.1 HOT 2
- Remove commons-cli dependency from jena-core
- Update various @Deprecation to include "forRemoval"
- The CORS filter has references to Jetty code.
- Make jena-fuseki-core independent of Eclipse Jetty
- Lookup script name "javascript"
- Fuseki WAR file fails to start HOT 1
- vite-plugin-istanbul 6.0.1+ not compatible with Fuseki UI build. HOT 1
- moving/renaming `:jena-ontapi` vocabularies.
- [Fuseki] Reloadable configuration files HOT 5
- `mvn clean install` fails HOT 8
- RDFXML: can't parse rdf:XMLLiteral HOT 17
- Spread out the days dependabot PRs are generated
- OntModel enhancements HOT 3
- Fuseki query endpoint stopped working on jena-fuseki-main upgrade HOT 6
- jena-benchmark-jmh module defunct HOT 10
- Remove <dependencyPath> from POMs where it is unnecessary
- Clean warnings in jena-ontapi
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from jena.