

gkellogg commented on June 18, 2024

The problem is exacerbated by the use of the Hamster hashing mechanism in the default Repository/Graph implementation, which does not enumerate entries in a repeatable order (in particular, not in the order they were inserted). Hamster was adopted because the former implementation was not functional-friendly, which led to problems when updating the repository. However, a Repository can be implemented using any backing store, as long as it implements the contract. Resurrecting a Hash-based Repository would be feasible, and, since Ruby Hashes preserve insertion order, this would maintain ordering.
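To illustrate the ordering point, here is a minimal sketch of an insertion-ordered statement store backed by a plain Ruby Hash. `OrderedStore` is a hypothetical class, not part of RDF.rb: it only mimics a few Repository-like operations (`<<`, `delete`, `include?`, `each`) over statements represented as plain arrays, whereas a real implementation would have to satisfy the full RDF::Repository contract.

```ruby
# Hypothetical sketch: NOT RDF.rb's RDF::Repository, just an illustration
# of how a plain Ruby Hash keeps statements in insertion order.
class OrderedStore
  include Enumerable

  def initialize
    # statement => true; Ruby Hashes enumerate keys in insertion order
    @statements = {}
  end

  # Adds a statement (e.g. [subject, predicate, object, graph_name]).
  # Re-adding an existing statement keeps its original position.
  def insert(statement)
    # Freeze a copy so later mutation by the caller cannot corrupt the key
    @statements[statement.dup.freeze] = true
    self
  end
  alias_method :<<, :insert

  def delete(statement)
    @statements.delete(statement)
    self
  end

  def include?(statement)
    @statements.key?(statement)
  end

  def size
    @statements.size
  end

  # Yields statements in the order they were first inserted
  def each(&block)
    return enum_for(:each) unless block
    @statements.each_key(&block)
    self
  end
end

store = OrderedStore.new
store << %w[ex:b ex:p ex:o] << %w[ex:a ex:p ex:o] << %w[ex:b ex:p ex:o]
store.to_a # => [["ex:b", "ex:p", "ex:o"], ["ex:a", "ex:p", "ex:o"]]
```

Note that deleting a statement and inserting it again moves it to the end, which is simply how Ruby Hashes behave; whether that is the right semantics for a Repository is a design choice.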

On top of that, if the dataset is normalized using the rdf-normalize gem, the quads will always be output in sorted order. A trivial implementation might look like the following:

require 'rdf'
require 'rdf/turtle'
require 'rdf/normalize'

# `file` holds the path to the Turtle input document
graph = RDF::Graph.new { |g| RDF::Turtle::Reader.open(file) { |r| g << r } }
# Canonicalize the graph, so blank nodes get stable, canonical labels
statements = RDF::Normalize.new(graph).extend(RDF::Enumerable)
RDF::Turtle::Writer.dump(statements, STDOUT)

Of course, a gem implementing an order-preserving Repository might be more generally useful, and would be fairly straightforward to write, but normalizing the graph is typically still going to be important unless you know that the source of the data is stable (e.g., when re-serializing).
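Sorting alone gives a stable order only when there are no blank nodes. The sketch below assumes statements are already serialized one per line as N-Triples; `naive_canonical` is a hypothetical helper, not an rdf-normalize API. It shows that reordering ground triples does not change the result, but that two isomorphic graphs with different blank node labels still compare unequal, which is the part that canonicalization (as done by rdf-normalize) takes care of.

```ruby
# Hypothetical helper: sorts N-Triples lines into a deterministic order.
# This is NOT RDF dataset canonicalization; blank node labels are left
# untouched, so isomorphic graphs with different labels still differ.
def naive_canonical(lines)
  lines.map(&:strip).reject(&:empty?).uniq.sort.join("\n") + "\n"
end

a = [
  '<http://example.org/s> <http://example.org/p> "1" .',
  '<http://example.org/a> <http://example.org/p> "2" .'
]
b = a.reverse

# Same ground triples in a different order: identical output
naive_canonical(a) == naive_canonical(b) # => true

c = ['_:b0 <http://example.org/p> "x" .']
d = ['_:x9 <http://example.org/p> "x" .']

# Isomorphic graphs, but the blank node labels differ
naive_canonical(c) == naive_canonical(d) # => false
```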


jcoyne commented on June 18, 2024

This is a problem that the IPLD community has been wrestling with because they want consistent content hashes. Here is their proposal for JSON: https://github.com/ipld/specs/blob/master/block-layer/codecs/dag-json.md

I think another problem you are going to run into is namespace prefixes. The prefix doesn't have to be the same for two documents to be semantically identical.
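To make the prefix point concrete, here is a sketch that expands prefixed names against each document's own prefix map before comparing. `expand` and `expand_triples` are hypothetical helpers and deliberately simplistic (no base IRI resolution, no escaping, no full Turtle prefixed-name grammar); in practice a real parser such as RDF::Turtle::Reader already yields full IRIs.

```ruby
# Hypothetical helper: expands a prefixed name ("ex:thing") to a full IRI
# using the document's prefix map; unknown prefixes are returned as-is.
# Simplified: ignores base IRIs, escapes and the full PN_LOCAL grammar.
def expand(term, prefixes)
  prefix, local = term.split(":", 2)
  ns = prefixes[prefix]
  ns && local ? "#{ns}#{local}" : term
end

# Expands every term, then sorts so statement order does not matter
def expand_triples(triples, prefixes)
  triples.map { |t| t.map { |term| expand(term, prefixes) } }.sort
end

doc1 = { prefixes: { "ex" => "http://example.org/" },
         triples: [%w[ex:alice ex:knows ex:bob]] }
doc2 = { prefixes: { "eg" => "http://example.org/" },
         triples: [%w[eg:alice eg:knows eg:bob]] }

# Compared textually, the documents differ only by prefix...
doc1[:triples] == doc2[:triples] # => false
# ...but once expanded they contain the same triples
expand_triples(doc1[:triples], doc1[:prefixes]) ==
  expand_triples(doc2[:triples], doc2[:prefixes]) # => true
```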


berezovskyi commented on June 18, 2024

Thanks @jcoyne, but I am only concerned with consistent output from a single library on a single document. I am aware of the consistent hashing problem, and it is a far more complex one than the one I am facing.

More off-topic: namespace prefixes can be normalised with https://vocab.org/vann/#preferredNamespacePrefix (as long as they do not clash within a single document). If two RDF documents need to be compared, https://jena.apache.org/documentation/javadoc/jena/org/apache/jena/graph/Graph.html#isIsomorphicWith-org.apache.jena.graph.Graph- or an equivalent (I did not find one in RDF.rb itself, but see http://rdf.greggkellogg.net/yard/RDF/Isomorphic.html#isomorphic_with%3F-instance_method) is the only reliable way to do it, especially given that the same RDF data may be serialised as N-Triples, RDF/XML, etc.
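For intuition about what an isomorphism check does, here is a brute-force sketch: triples are arrays of strings, terms beginning with `_:` are treated as blank nodes, and every bijection between the two graphs' blank nodes is tried. `isomorphic?` and `blank_nodes` are hypothetical names; the search is exponential in the number of blank nodes, so it is only suitable for tiny graphs, and the library methods linked above are the practical choice.

```ruby
require "set"

# Hypothetical helper: collects the distinct blank node labels ("_:...")
def blank_nodes(triples)
  triples.flatten.select { |t| t.start_with?("_:") }.uniq
end

# Brute-force graph isomorphism: tries every bijection between the blank
# nodes of g1 and g2 and checks whether the relabelled g1 equals g2.
def isomorphic?(g1, g2)
  s1 = g1.to_set
  s2 = g2.to_set
  return false unless s1.size == s2.size

  b1 = blank_nodes(s1.to_a)
  b2 = blank_nodes(s2.to_a)
  return false unless b1.size == b2.size

  # With no blank nodes, permutation yields a single empty mapping,
  # so ground graphs are simply compared as sets.
  b2.permutation.any? do |perm|
    mapping = b1.zip(perm).to_h
    s1.map { |t| t.map { |term| mapping.fetch(term, term) } }.to_set == s2
  end
end

g1 = [["_:a", "ex:knows", "_:b"], ["_:b", "ex:name", '"Bob"']]
g2 = [["_:x", "ex:name", '"Bob"'], ["_:y", "ex:knows", "_:x"]]
isomorphic?(g1, g2) # => true
```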


gkellogg commented on June 18, 2024

See https://rubygems.org/gems/rdf-ordered-repo, which uses native Ruby hashes, which preserve insertion order. It may not be as performant, although performance could probably be improved by better leveraging transactions.
