Comments (4)
The problem is exacerbated by the use of the Hamster hashing mechanism as part of the default Repository/Graph implementation, which does not return the values entered into the hash in a repeatable order. Hamster was added because the former version was not functional-friendly and led to problems when updating the repository. However, a Repository can be implemented using any backing store, as long as it implements the contract. Resurrecting a Hash-based Repository would be feasible, and this would maintain ordering.
On top of that, if the dataset is normalized using the rdf-normalize gem, it will always output quads in a sorted order. A trivial implementation might look like the following:
require 'rdf'
require 'rdf/turtle'
require 'rdf/normalize'
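# `file` is the path to the Turtle document to load
graph = RDF::Graph.new { |g| RDF::Turtle::Reader.open(file) { |r| g << r } }
# Normalize the statements so they are emitted in a sorted, repeatable order
statements = RDF::Normalize.new(graph.statements).extend(RDF::Enumerable)
RDF::Turtle::Writer.dump(statements, STDOUT)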
Of course, a gem which implemented an order-preserving Repository might be more generally useful, and would be fairly straightforward, but normalizing the graph is typically going to be important, unless you know that the source of the data is stable (e.g., re-serializing).
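To illustrate why a sorted output is repeatable, here is a minimal plain-Ruby sketch that does not use rdf-normalize at all. It simply sorts N-Triples lines lexically, which only gives a stable result when the data contains no blank nodes (blank node labels would need to be canonicalized first, which is what a real normalization algorithm handles). The helper name `stable_ntriples` and the example.org IRIs are made up for this example.

```ruby
# Hypothetical helper, not part of RDF.rb or rdf-normalize.
# Returns N-Triples lines in a deterministic order by stripping whitespace,
# dropping blank lines and comments, removing duplicates, and sorting.
# Assumption: the input contains no blank nodes.
def stable_ntriples(lines)
  lines.map(&:strip)
       .reject { |line| line.empty? || line.start_with?('#') }
       .uniq
       .sort
end

a = [
  '<http://example.org/s2> <http://example.org/p> "b" .',
  '<http://example.org/s1> <http://example.org/p> "a" .'
]
b = a.reverse # same statements, different input order

puts stable_ntriples(a)
puts stable_ntriples(a) == stable_ntriples(b)
```

Whatever order the statements arrive in, the sorted list is the same, so a diff between two runs only shows real changes.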
from rdf.
This is a problem that the IPLD community has been wrestling with because they want consistent content hashes. Here is their proposal for JSON: https://github.com/ipld/specs/blob/master/block-layer/codecs/dag-json.md
I think another problem you are going to run into is namespace prefixes. The prefix doesn't have to be the same for two documents to be semantically identical.
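A small plain-Ruby sketch of the prefix point, assuming a hypothetical `expand` helper and made-up prefix maps (none of this is RDF.rb API): two documents can pick different prefixes for the same namespace, and the terms are still identical once expanded.

```ruby
# Hypothetical helper: expand a prefixed name such as "dc:title" into a
# full IRI using a document's own prefix map.
def expand(pname, prefixes)
  prefix, local = pname.split(':', 2)
  base = prefixes.fetch(prefix) { raise KeyError, "unknown prefix: #{prefix}" }
  "<#{base}#{local}>"
end

# Two documents binding different prefixes to the same namespace
doc_a = { 'dc'      => 'http://purl.org/dc/terms/' }
doc_b = { 'dcterms' => 'http://purl.org/dc/terms/' }

iri_a = expand('dc:title', doc_a)
iri_b = expand('dcterms:title', doc_b)

puts iri_a
puts iri_a == iri_b
```

So a textual comparison of the two serializations would report a difference, while a comparison of the expanded IRIs would not.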
Thanks @jcoyne, I am concerned about a consistent output from a single library on a single document. I know of the consistent hashing problem and it is a far more complex one than the one I am having.
More off-topic, NS prefixes can be normalised with https://vocab.org/vann/#preferredNamespacePrefix (as long as they do not clash within a single document). If two RDF documents need to be compared, https://jena.apache.org/documentation/javadoc/jena/org/apache/jena/graph/Graph.html#isIsomorphicWith-org.apache.jena.graph.Graph- or an equivalent (for RDF.rb, see http://rdf.greggkellogg.net/yard/RDF/Isomorphic.html#isomorphic_with%3F-instance_method) is the only true way to do it, especially given that RDF documents may be serialised to N-Triples, RDF/XML, etc.
See https://rubygems.org/gems/rdf-ordered-repo, which uses native hashes which preserve insert order. It may not be as performant, although there are probably some improvements that could be made by better leveraging transactions.
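As a rough illustration of why native hashes help here (a toy sketch, not the rdf-ordered-repo implementation, and `OrderedStore` is a made-up name): a Ruby Hash iterates in insertion order, so using statements as keys gives both de-duplication and a repeatable order.

```ruby
# Hypothetical toy store; statements are plain arrays here, not RDF::Statement.
class OrderedStore
  include Enumerable

  def initialize
    @statements = {} # a Ruby Hash iterates in insertion order
  end

  # Adds a statement; a duplicate keeps the position of its first insert.
  def <<(statement)
    @statements[statement] ||= true
    self
  end

  def each(&block)
    @statements.each_key(&block)
  end
end

store = OrderedStore.new
store << %w[s2 p o] << %w[s1 p o] << %w[s2 p o]

p store.to_a
```

Enumerating the store returns the two distinct statements in the order they were first added, regardless of how many times it is run.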