oxigraph's Introduction

Oxigraph

Oxigraph is a graph database implementing the SPARQL standard.

Its goal is to provide a compliant, safe, and fast graph database based on the RocksDB key-value store. It is written in Rust. It also provides a set of utility functions for reading, writing, and processing RDF files.

Oxigraph is in heavy development and SPARQL query evaluation has not been optimized yet. The development roadmap is tracked using GitHub milestones. Oxigraph's internal design is described on the wiki.

Oxigraph implements the following specifications:

It is split into multiple parts:

Also, some parts of Oxigraph are available as standalone Rust crates:

  • oxrdf, data structures encoding RDF basic concepts (the oxigraph::model module).
  • oxrdfio, a unified parser and serializer API for RDF formats (the oxigraph::io module). It itself relies on:
    • oxttl, N-Triples, N-Quads, Turtle, TriG and N3 parsing and serialization.
    • oxrdfxml, RDF/XML parsing and serialization.
  • spargebra, a SPARQL parser.
  • sparesults, parsers and serializers for SPARQL result formats.
  • sparopt, a SPARQL optimizer.
  • oxsdatatypes, an implementation of some XML Schema datatypes.

The library layers in Oxigraph. The elements above depend on the elements below: Oxigraph libraries architecture diagram

A preliminary benchmark is provided. There is also a document describing Oxigraph technical architecture.

When cloning this codebase, don't forget to clone the submodules: use git clone --recursive https://github.com/oxigraph/oxigraph.git to clone the repository including its submodules, or git submodule update --init to add the submodules to an already cloned repository.

Help

Feel free to use GitHub discussions or the Gitter chat to ask questions or talk about Oxigraph. Bug reports are also very welcome.

If you need advanced support or are willing to pay to get some extra features, feel free to reach out to Tpt.

License

This project is licensed under either of

at your option.

Contribution

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in Oxigraph by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.

Sponsors

And others. Many thanks to them!

oxigraph's People

Contributors

benediktseidl, danbri, dependabot-preview[bot], dependabot[bot], dwhitney, edmondchuc, etiennept, hobofan, jeremiahpslewis, jeswr, maxlath, nyurik, pchampin, theduke, tpltnt, tpt, vemonet, vtermanis, yamdan, yarikoptic

oxigraph's Issues

Support JSON-LD as a data ingest format

Currently, users can POST data into the database using a variety of formats. It would be great to support JSON-LD.

I'm interested in implementing this feature. I'm not sure where the best place for the core parser library would be: rio or sophia. Sophia has started some work on an initial JSON-LD serializer in the sophia_jsonld crate.

See also: pchampin/sophia_rs#16

Consider having async stores and SPARQL evaluation

It would be nice to allow stores and SPARQL evaluation to be asynchronous. I see three use cases:

  • Implementing a web-compatible store based on IndexedDB, which has only an async API.
  • Allowing distributed storage with systems like TiKV.
  • Allowing async HTTP clients inside SPARQL queries (useful to avoid blocking the evaluation thread or to provide a web implementation using fetch).

Oxigraph server - authentication support

Hi, fantastic project!

It would be awesome to have authentication (and perhaps authorisation) support for the SPARQL and graph store endpoints in the Oxigraph server. For example, you can have the store as a public facing read-only SPARQL endpoint while the SPARQL update endpoint is protected with basic auth.

I would love to contribute to this project. I don't have any experience in Rust (but this project has inspired me to learn it). I mainly work in Python.

Some options to implement basic auth:

  • NGINX (easiest but probably not user-friendly in terms of deployment)
  • Bundle a Python Flask/FastAPI proxy with a Next.js frontend in the Docker image of Oxigraph to handle and manage basic auth
    • I can help with this, and implement role-based access control and create an admin dashboard similar to RDF4J and GraphDB
  • Implement auth in Oxigraph server (in Rust) and use a Next.js frontend for the admin dashboard.

The chosen option will obviously affect future feature implementations such as authorisation.

If we go with the latter two options, we can use a file-based JSON (or RDF) store. This makes it very easy to deploy and modify things for sys admins (following GraphDB's approach).

Let me know your thoughts.

Distribute on crates.io

I understand that this is a work-in-progress project, but I think the earlier people can experiment, the better! Maybe just release it with a pre-1.0 version.

Also, it would put the documentation on Docs.rs, which could be useful. I think a release could also attract more contributors.

Precompute some statistics about the graph

I first just want to say thanks for your work! I've now loaded a dataset with 36 million triples into Oxigraph:
image

I was wondering if Oxigraph could (perhaps while loading) keep some statistics about the number of triples in the entire graph and each named graph.

This might help queries like the one shown return faster.
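A minimal sketch of the idea, assuming a hypothetical in-memory `CountingStore` (not Oxigraph's actual storage layer): counters for the whole dataset and for each graph are updated on every insertion and removal, so a count query could be answered without scanning the triples.

```rust
use std::collections::{HashMap, HashSet};

/// A quad reduced to strings for illustration; the graph is `None` for the default graph.
pub type Quad = (String, String, String, Option<String>);

/// Hypothetical store that keeps triple counts up to date on every write.
#[derive(Default)]
pub struct CountingStore {
    quads: HashSet<Quad>,
    total: u64,
    per_graph: HashMap<Option<String>, u64>,
}

impl CountingStore {
    /// Inserts a quad; counters change only if the quad was not already present.
    pub fn insert(&mut self, quad: Quad) -> bool {
        let graph = quad.3.clone();
        if !self.quads.insert(quad) {
            return false;
        }
        self.total += 1;
        *self.per_graph.entry(graph).or_insert(0) += 1;
        true
    }

    /// Removes a quad; counters change only if the quad was present.
    pub fn remove(&mut self, quad: &Quad) -> bool {
        if !self.quads.remove(quad) {
            return false;
        }
        self.total -= 1;
        let remaining = match self.per_graph.get_mut(&quad.3) {
            Some(count) => {
                *count -= 1;
                *count
            }
            None => return true,
        };
        if remaining == 0 {
            self.per_graph.remove(&quad.3);
        }
        true
    }

    /// Number of quads in the whole dataset, read from the counter.
    pub fn len(&self) -> u64 {
        self.total
    }

    /// Number of triples in one graph (`None` is the default graph).
    pub fn graph_len(&self, graph: Option<&str>) -> u64 {
        self.per_graph
            .get(&graph.map(str::to_owned))
            .copied()
            .unwrap_or(0)
    }
}
```

In a persistent store the counters would have to be written in the same atomic batch as the quads themselves, otherwise they could drift after a crash.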

problem when compiling master branch of server

following the instructions for compiling the server I get the following error

` Compiling parking v2.0.0
error[E0658]: use of unstable library feature 'checked_duration_since'
--> C:\Users\Pedro.cargo\registry\src\github.com-1ecc6299db9ec823\parking-2.0.0\src\lib.rs:144:32
|
144 | .park(Some(instant.saturating_duration_since(Instant::now())))
| ^^^^^^^^^^^^^^^^^^^^^^^^^
|
= note: for more information, see rust-lang/rust#58402

error: aborting due to previous error

For more information about this error, try rustc --explain E0658.
error: Could not compile parking.
warning: build failed, waiting for other jobs to finish...
error: build failed

`

Any suggestions on resolution?

Allow arbitrary blank node identifiers

Currently, all blank node identifiers in Oxigraph are randomly generated 128-bit numbers.

It would be nice to allow insertion methods to use their own identifiers, as other RDF libraries like RDF/JS or RDFLib do.

The blank node identifiers generated by Oxigraph or by the parsers would still be random 128-bit ids.
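One way this could look, as a sketch only: a hypothetical `BlankNodeId` enum (the name and variants are assumptions, not Oxigraph's API) that stores generated ids compactly as a `u128` and falls back to a string for caller-provided labels.

```rust
/// Hypothetical blank node identifier: either a compact 128-bit number
/// or an arbitrary caller-provided label.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub enum BlankNodeId {
    Numeric(u128),
    Named(String),
}

impl BlankNodeId {
    /// Builds an identifier from a label. Labels that are exactly the
    /// 32-digit lowercase hex form of a u128 are stored compactly, so the
    /// label always round-trips unchanged through `label()`.
    pub fn from_label(label: &str) -> Self {
        let is_canonical_hex = label.len() == 32
            && label.bytes().all(|b| matches!(b, b'0'..=b'9' | b'a'..=b'f'));
        if is_canonical_hex {
            if let Ok(id) = u128::from_str_radix(label, 16) {
                return BlankNodeId::Numeric(id);
            }
        }
        BlankNodeId::Named(label.to_owned())
    }

    /// Returns the textual label of the identifier.
    pub fn label(&self) -> String {
        match self {
            BlankNodeId::Numeric(id) => format!("{:032x}", id),
            BlankNodeId::Named(name) => name.clone(),
        }
    }
}
```

Restricting the compact form to the canonical 32-digit encoding means two different labels never map to the same identifier.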

Quads inserted to SledStore using `transaction()` seems to not add the named graph to the store

Hello! I've been playing with oxigraph and it's a lot of fun, but I've run into an issue with getting all the graphs in the system with store.named_graphs().

When I insert a quad into a SledStore using insert(), and then check the named graphs with named_graphs() I get the newly added named graph.

However, if I use the transaction() function and perform the same operation, I then do not see the newly added graph in named_graphs().

It's possible I'm doing something wrong, and I'm happy to learn!

For an example, here are two tests that show these cases:

use oxigraph::SledStore as Store;
use oxigraph::model::{NamedOrBlankNode, Quad, NamedNode};
use oxigraph::store::sled::{SledConflictableTransactionError, SledTransaction};
use std::convert::Infallible;

#[test]
fn test_regular_insert() {
    let store = Store::new().expect("Couldn't make Store");

    // Insert a triple into named graph `<www.example.com>`
    let _ = store.insert(Quad::new(
        NamedNode::new_unchecked("www.example.com/A"),
        oxigraph::model::vocab::rdf::TYPE,
        NamedNode::new_unchecked("www.example.com/B"),
        NamedNode::new_unchecked("www.example.com")
    ).as_ref());

    let q: Vec<_> = store.iter().filter_map(|q| q.ok()).collect();
    println!("{:?}", q);
    assert_eq!(1, q.len());
    let g: Vec<_> = store.named_graphs().filter_map(|g| g.ok()).collect();
    println!("{:?}", g);
    assert_eq!(1, g.len());
    assert_eq!(vec![NamedOrBlankNode::NamedNode(NamedNode::new_unchecked("www.example.com"))], g);
}

#[test]
fn test_transaction_insert() {
    let store = Store::new().expect("Couldn't make Store");

    // Insert a triple into named graph `<www.example.com>`
    let _ = store.transaction(|transaction: SledTransaction| {
        let _ = transaction.insert(Quad::new(
            NamedNode::new_unchecked("www.example.com/A"),
            oxigraph::model::vocab::rdf::TYPE,
            NamedNode::new_unchecked("www.example.com/B"),
            NamedNode::new_unchecked("www.example.com")
        ).as_ref());

        Ok(()) as Result<(), SledConflictableTransactionError<Infallible>>
    });

    let q: Vec<_> = store.iter().filter_map(|q| q.ok()).collect();
    println!("{:?}", q);
    assert_eq!(1, q.len());
    let g: Vec<_> = store.named_graphs().filter_map(|g| g.ok()).collect();
    println!("{:?}", g);
    // These asserts fail
    assert_eq!(1, g.len());
    assert_eq!(vec![NamedOrBlankNode::NamedNode(NamedNode::new_unchecked("www.example.com"))], g);
}

Thanks for the help!

A better name?

Rudf is not a very nice name. It might be better to pick a new one.

I have no good idea yet. I was maybe thinking about FastR (pronounced faster, acronym-like for fast relations or fast RDF) but it sounds a lot like a library for the R programming language.

@dwhitney Do you have any idea/opinion about it? It would be much welcome

Why rocksdb?

Rocksdb is a memory hog. Cdb and dgraph started with rocksdb but when they found memory and other issues they replaced rocksdb with their own implementation of key value store.

Add default Content Type to respond to queries

When I ran a SPARQL query without an Accept header, it didn't give a response. I ran the following commands to connect to a 0.2.2 Oxigraph instance running in docker:

$ curl -X POST -H 'Content-Type:application/sparql-query' --data 'SELECT * WHERE { ?s ?p ?o } LIMIT 10' http://localhost:7878/query
No suitable ContentEncoding found
$ curl -X POST -H 'Content-Type:application/sparql-query' -H 'Accept: application/sparql-results+json' --data 'SELECT * WHERE { ?s ?p ?o } LIMIT 10' http://localhost:7878/query
{"head":{"vars":["o","p","s"]},"results":{"bindings":[]}}

The first query is actually the first bullet point in the Usage section of the server README, and the second query is the "Make a query" command under the "Run the web server section".

I think we should provide a default serialization if the user doesn't specify an Accept header. If you want to keep the behavior that it errors without one, it would be great to update that section of docs.
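The suggested behaviour could be sketched as follows. This is an illustration, not the server's code: `ResultFormat` and `negotiate` are hypothetical names, q-values are ignored for brevity, and only a few SELECT result media types are listed.

```rust
/// Result formats for SELECT queries (a subset, for illustration).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ResultFormat {
    Json,
    Xml,
    Csv,
    Tsv,
}

fn format_for_media_type(media_type: &str) -> Option<ResultFormat> {
    match media_type {
        "application/sparql-results+json" | "application/json" => Some(ResultFormat::Json),
        "application/sparql-results+xml" | "application/xml" => Some(ResultFormat::Xml),
        "text/csv" => Some(ResultFormat::Csv),
        "text/tab-separated-values" => Some(ResultFormat::Tsv),
        _ => None,
    }
}

/// Picks a format from an optional Accept header value.
/// A missing or empty header, or a `*/*` entry, selects `default`;
/// `None` means nothing acceptable was found (an HTTP 406 case).
pub fn negotiate(accept: Option<&str>, default: ResultFormat) -> Option<ResultFormat> {
    let accept = match accept {
        Some(value) if !value.trim().is_empty() => value,
        _ => return Some(default),
    };
    for item in accept.split(',') {
        // Drop parameters such as `;q=0.9` and normalize the case.
        let media_type = item
            .split(';')
            .next()
            .unwrap_or("")
            .trim()
            .to_ascii_lowercase();
        if media_type == "*/*" {
            return Some(default);
        }
        if let Some(format) = format_for_media_type(&media_type) {
            return Some(format);
        }
    }
    None
}
```

With this shape, a request without an Accept header gets the default serialization, while `Accept: text/turtle` on a SELECT query is still rejected because Turtle cannot carry a bindings table.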

Also, when I made the request with the Accept header set to text/turtle, I get the same error.

Thanks for all your work on this project!

docker image

Dear Thomas,

To make it easier for people unfamiliar with the project dependencies to try oxigraph, I wrote a Dockerfile to generate images, and published those as maxlath/oxigraph and maxlath/oxigraph-wikibase.

If you're interested, I could give ownership of the repo to the oxigraph organization and republish the images as oxigraph/oxigraph and oxigraph/oxigraph-wikibase, what do you think?

Au plaisir

Add String garbage collector

Currently, big strings are stored in a special key-value group to avoid overloading the indexes. However, strings are not removed even if all triples using them are removed. We should remove strings that are not used anymore.
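One possible approach is reference counting, sketched below with a hypothetical `StringTable` (the issue does not prescribe a technique; a periodic mark-and-sweep pass would be another option). Each stored string carries the number of terms that use it and is dropped when that number reaches zero.

```rust
use std::collections::HashMap;

/// Hypothetical table of big strings, keyed by id, with a usage count.
#[derive(Default)]
pub struct StringTable {
    entries: HashMap<u64, (String, usize)>, // id -> (value, reference count)
}

impl StringTable {
    /// Registers one more use of the string, storing it on first use.
    pub fn acquire(&mut self, id: u64, value: &str) {
        let entry = self
            .entries
            .entry(id)
            .or_insert_with(|| (value.to_owned(), 0));
        entry.1 += 1;
    }

    /// Registers that one use is gone; frees the string after the last one.
    pub fn release(&mut self, id: u64) {
        let unused = match self.entries.get_mut(&id) {
            Some(entry) => {
                entry.1 -= 1;
                entry.1 == 0
            }
            None => false,
        };
        if unused {
            self.entries.remove(&id);
        }
    }

    pub fn get(&self, id: u64) -> Option<&str> {
        self.entries.get(&id).map(|entry| entry.0.as_str())
    }

    pub fn len(&self) -> usize {
        self.entries.len()
    }
}
```

The cost of this design is an extra write per term on every triple insertion and removal, which is the trade-off against a background sweep.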

Bug in use of prefixes in SPARQL queries

I found a bug when using prefixes in queries: essentially, if you use prefixes, the query doesn't find anything. I've created a branch called prefix-bug in my fork that is current with master, but adds a test to illustrate the bug. It uses the two queries below, which should produce the same results, but the test fails because of the bug.

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name ?mbox
WHERE
  { 
    ?s <foaf:name> ?name .
    ?s <foaf:mbox> ?mbox .
  }

SELECT ?name ?mbox
WHERE
  { 
    ?s <http://xmlns.com/foaf/0.1/name> ?name .
    ?s <http://xmlns.com/foaf/0.1/mbox> ?mbox .
  }
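One thing worth checking in the first query: in the SPARQL grammar, text inside angle brackets is an IRI reference taken as written, so `<foaf:name>` denotes an IRI whose scheme is `foaf`, while the prefixed-name form is `foaf:name` without brackets. The sketch below illustrates that distinction with a hypothetical `resolve_term` helper; it ignores relative IRI resolution and is not Oxigraph's parser.

```rust
use std::collections::HashMap;

/// Resolves a SPARQL term written either as `<iri>` or as `prefix:local`.
/// Returns `None` for an undeclared prefix or an unrecognized form.
pub fn resolve_term(term: &str, prefixes: &HashMap<&str, &str>) -> Option<String> {
    if let Some(inner) = term.strip_prefix('<').and_then(|t| t.strip_suffix('>')) {
        // IRI reference: PREFIX declarations do not apply inside angle brackets.
        return Some(inner.to_owned());
    }
    // Prefixed name: look up the namespace and append the local part.
    let (prefix, local) = term.split_once(':')?;
    let namespace = prefixes.get(prefix)?;
    Some(format!("{}{}", namespace, local))
}
```

Under this reading, `foaf:name` expands to `http://xmlns.com/foaf/0.1/name`, whereas `<foaf:name>` stays `foaf:name` and would not match the stored predicate.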

flexible alternatives to load_graph and load_dataset

Here is the current type signature for load_graph in MemoryStore. It is mirrored in RocksDbStore and SledStore. The type signature for load_dataset is similar.

Please consider this API as it may be a better fit:

pub fn alternate_load_graph<'a, E>(
    &self,
    triples: impl IntoIterator<Item = Result<Triple, E>>,
    to_graph_name: impl Into<GraphNameRef<'a>>,
) -> Result<(), E>;

A few upsides to the suggested api:

  • It should make the implementation simpler.
  • Stores no longer need to care about parsing. (separation of concerns)
  • Scratches a personal itch of mine (I need to reify tuples before inserting them into the graph so I can't directly pass the serialized document.)
  • Allows use of arbitrary graph serialization formats, not just GraphFormat::NTriples, GraphFormat::Turtle and GraphFormat::RdfXml
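To make the proposal concrete, here is a self-contained sketch on a toy store. Everything except the method name is an assumption: `MemoryStoreSketch` is hypothetical, triples are plain string tuples, the graph name is an `Option<&str>` instead of `GraphNameRef`, and the method takes `&mut self` to keep the example free of locking.

```rust
use std::collections::HashSet;

/// A triple reduced to strings for illustration.
pub type Triple = (String, String, String);

/// Hypothetical in-memory store holding quads as string tuples.
#[derive(Default)]
pub struct MemoryStoreSketch {
    quads: HashSet<(String, String, String, Option<String>)>,
}

impl MemoryStoreSketch {
    /// Loads already-parsed triples into `to_graph_name` (`None` is the
    /// default graph). The store never sees the serialization format:
    /// any parser producing `Result<Triple, E>` items can feed it.
    pub fn alternate_load_graph<E>(
        &mut self,
        triples: impl IntoIterator<Item = Result<Triple, E>>,
        to_graph_name: Option<&str>,
    ) -> Result<(), E> {
        // Collect first so that a failing input leaves the store untouched.
        let parsed = triples.into_iter().collect::<Result<Vec<_>, E>>()?;
        for (subject, predicate, object) in parsed {
            self.quads
                .insert((subject, predicate, object, to_graph_name.map(str::to_owned)));
        }
        Ok(())
    }

    pub fn len(&self) -> usize {
        self.quads.len()
    }
}
```

Buffering the whole input before inserting is a design choice of this sketch; a streaming variant would use less memory but would need a transaction to stay all-or-nothing.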

Consider using a faster hash function

We currently hash IRIs and strings with MD5. We do not need a cryptographic hash, even a weak one, so migrating to something like SipHash might give us some performance improvements.

It would also be nice to migrate to 64-bit ids instead of 128-bit hashes, but that would require synchronization around the id generator to avoid generating two ids for the same string. So, I am not sure it's worth it.
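As a rough illustration of hashing a string to a fixed-size id without a cryptographic hash, the sketch below uses the standard library's `DefaultHasher`. Two caveats: its algorithm is documented as unspecified (it is currently a SipHash variant), so ids persisted to disk would need a pinned SipHash implementation instead; and `string_id` is a hypothetical helper, not an Oxigraph function.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hashes a string to a 64-bit id with the standard library's default hasher.
/// `DefaultHasher::new()` always starts from the same state, so the result is
/// stable within one build, but not guaranteed across Rust releases.
pub fn string_id(value: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    hasher.finish()
}
```

Because the id is derived from the string itself, no synchronized id generator is needed, at the price of a small collision probability that a counter-based scheme would not have.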

SERVICE clauses

Hey, I've been playing with your library the past couple of days. It's really great - thanks for the work!

I'd like to use SERVICE clauses to fetch remote data. I realize that this is a large architectural change. Do you have any thoughts on what you'd like to do to make this work? I've been programming Rust for about a week, so I'm a bit out of my element, but I'm an experienced programmer. If you want to outline it, I could take a pass at implementing it.

Thanks!

Think about caching queries

The query from issue #94 takes a while (18 seconds) to run on 36M triples which take up 5.7 GB on disk. When I run it again, it takes a similar time (17 seconds).

Adding an option to cache query responses might help. Some other options with this:

  • Using the parsed version of the query to create the cache key
  • Possibly caching intermediate steps (e.g. a small set of frequently accessed triples)

Of course, caching adds a whole level of complexity and creates issues with invalidation and the consistency of the system. But it is probably good to think about for the future!
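A minimal sketch of a response cache along these lines, under loud assumptions: `QueryCache` is hypothetical, results are plain strings, whitespace normalization stands in for "the parsed version of the query" (it would be wrong for whitespace inside string literals), and every write is expected to call `invalidate_all`.

```rust
use std::collections::HashMap;

/// Hypothetical cache of serialized query results.
#[derive(Default)]
pub struct QueryCache {
    entries: HashMap<String, String>,
    hits: u64,
}

/// Crude stand-in for a key derived from the parsed query:
/// collapses runs of whitespace so trivially different texts share a key.
fn cache_key(query: &str) -> String {
    query.split_whitespace().collect::<Vec<_>>().join(" ")
}

impl QueryCache {
    /// Returns the cached result, or evaluates the query and stores the result.
    pub fn get_or_compute(
        &mut self,
        query: &str,
        evaluate: impl FnOnce(&str) -> String,
    ) -> String {
        let key = cache_key(query);
        if let Some(cached) = self.entries.get(&key) {
            let result = cached.clone();
            self.hits += 1;
            return result;
        }
        let result = evaluate(query);
        self.entries.insert(key, result.clone());
        result
    }

    /// Coarse invalidation: any update to the store drops every entry.
    pub fn invalidate_all(&mut self) {
        self.entries.clear();
    }

    pub fn hits(&self) -> u64 {
        self.hits
    }
}
```

Dropping everything on each write keeps the cache trivially consistent; finer-grained invalidation is where the complexity mentioned above would come in.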

Add `union-default-graph` parameter

The following issue occurs on Oxigraph 0.2.2, with a new empty data directory, running in Docker using the command:

docker run --init --rm -d -v /mnt/data/oxi2:/data -p 8878:8878 oxigraph/oxigraph -f /data -b 0.0.0.0:8878

Here's what happened:

$ http POST http://localhost:8878/store 'Content-Type:application/n-triples' < ./example.nt
HTTP/1.1 201 Created
content-length: 0
date: Fri, 09 Apr 2021 04:03:15 GMT
location: http://localhost:8878/store/6368cd9e9885cb8fb9e127437a3f61af
server: Oxigraph/0.2.2



$ http GET http://localhost:8878/store/6368cd9e9885cb8fb9e127437a3f61af 'Accept: application/n-triples'
HTTP/1.1 200 OK
content-length: 723
content-type: application/n-triples
date: Fri, 09 Apr 2021 04:03:27 GMT
server: Oxigraph/0.2.2

<http://example.org/#spiderman> <http://example.org/text> "This is a multi-line\nliteral with many quotes (\"\"\"\"\")\nand two apostrophes ('')." .
<http://en.wikipedia.org/wiki/Helium> <http://example.org/elements/specificGravity> "0.0001663"^^<http://www.w3.org/2001/XMLSchema#double> .
<http://en.wikipedia.org/wiki/Helium> <http://example.org/elements/atomicNumber> "2"^^<http://www.w3.org/2001/XMLSchema#integer> .
<http://example.org/show/218> <http://example.org/show/localName> "That Seventies Show"@en .
<http://example.org/show/218> <http://example.org/show/localName> "Cette Série des Années Septante"@fr-be .
<http://example.org/show/218> <http://www.w3.org/2000/01/rdf-schema#label> "That Seventies Show" .


$ echo 'SELECT * WHERE { ?s ?p ?o } LIMIT 10' | http -v POST http://localhost:8878/query 'Accept: application/sparql-results+json' 'Content-Type: application/sparql-query'
POST /query HTTP/1.1
Accept: application/sparql-results+json
Accept-Encoding: gzip, deflate
Connection: keep-alive
Content-Length: 37
Content-Type: application/sparql-query
Host: localhost:8878
User-Agent: HTTPie/2.3.0

SELECT * WHERE { ?s ?p ?o } LIMIT 10


HTTP/1.1 200 OK
content-length: 57
content-type: application/sparql-results+json
date: Fri, 09 Apr 2021 04:03:38 GMT
server: Oxigraph/0.2.2

{
    "head": {
        "vars": [
            "o",
            "p",
            "s"
        ]
    },
    "results": {
        "bindings": []
    }
}

I started out with several named graphs and although I could get their names with the query below, I wasn't able to get any triples from any of them, so I then tried to create the minimum reproducible example.

SELECT DISTINCT ?g 
WHERE { 
  GRAPH ?g {?s a ?o}
} 

Hopefully you can reproduce this, or point out if I'm making any stupid mistakes. Let me know if I can provide any more information. Thanks!
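One possible reading of the output above, assuming the POST to /store created a new named graph (as the Location header suggests): in SPARQL, a pattern outside a GRAPH clause is matched against the default graph only, so triples that live in named graphs are invisible to it unless the default graph is defined as the union of all graphs. The sketch below shows the effect of such a switch on a toy quad list; `Quad` and `match_default_graph` are hypothetical, and the flag is named after the issue title.

```rust
/// A quad reduced to strings; `graph` is `None` for the default graph.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Quad {
    pub subject: String,
    pub predicate: String,
    pub object: String,
    pub graph: Option<String>,
}

/// Returns the quads visible to a pattern like `?s ?p ?o` written outside
/// any GRAPH clause. With `union_default_graph` set, the default graph is
/// treated as the union of every graph in the dataset.
pub fn match_default_graph(quads: &[Quad], union_default_graph: bool) -> Vec<&Quad> {
    quads
        .iter()
        .filter(|quad| union_default_graph || quad.graph.is_none())
        .collect()
}
```

With only named-graph data loaded, the first mode returns no bindings, which matches the empty result shown above, and the second mode returns every triple.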

WebAssembly issues

There were two issues when making this work with WebAssembly

  1. UUID - the uuid library requires that you enable the wasm-bindgen (which I'm using) or the stdweb features to work with WebAssembly. I'm not sure if I can do this with my own Cargo.toml file. It may be possible but I could not figure out how, so I branched master and made the update myself.

  2. chrono/time aren't supported in WebAssembly. I had to remove chrono from the application entirely because the Utc::now function doesn't work. This was fairly simple to do because it's only used in two places, but those uses are necessary, so some replacement is needed. When working with dates in WebAssembly, you can use js_sys::Date::now.

Add a transaction system

Oxigraph should expose a transaction system that provides ACID guarantees for operations consisting of multiple triple additions and/or removals. RocksDB supports transactions, so it should not be too hard to implement.
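The atomicity part of such an API could be sketched as below. This is a toy in-memory model with hypothetical `Store` and `Transaction` types; it buffers changes and applies them only if the closure succeeds, and it does not address isolation or durability, which the underlying key-value store would have to provide.

```rust
use std::collections::HashSet;

/// A triple reduced to strings for illustration.
pub type Triple = (String, String, String);

#[derive(Default)]
pub struct Store {
    triples: HashSet<Triple>,
}

enum Change {
    Insert(Triple),
    Remove(Triple),
}

/// Write buffer handed to the transaction body.
#[derive(Default)]
pub struct Transaction {
    changes: Vec<Change>,
}

impl Transaction {
    pub fn insert(&mut self, triple: Triple) {
        self.changes.push(Change::Insert(triple));
    }

    pub fn remove(&mut self, triple: Triple) {
        self.changes.push(Change::Remove(triple));
    }
}

impl Store {
    /// Runs `body` against a write buffer. The buffered changes are applied,
    /// in order, only if `body` returns `Ok`; on `Err` the store is untouched.
    pub fn transaction<T, E>(
        &mut self,
        body: impl FnOnce(&mut Transaction) -> Result<T, E>,
    ) -> Result<T, E> {
        let mut transaction = Transaction::default();
        let value = body(&mut transaction)?;
        for change in transaction.changes {
            match change {
                Change::Insert(triple) => {
                    self.triples.insert(triple);
                }
                Change::Remove(triple) => {
                    self.triples.remove(&triple);
                }
            }
        }
        Ok(value)
    }

    pub fn len(&self) -> usize {
        self.triples.len()
    }
}
```

A limitation of this shape is that the body cannot read its own pending writes; supporting that would require layering reads over the buffer.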

Is something like Blazegraph GAS searches feasible in the future?

So this is more to satisfy my own curiosity by asking people I am sure understand this far better than I.

First, I'm really enjoying tinkering with Oxigraph in my workflow. I currently use Blazegraph, but on the arrival of text search, which I see as an active goal, I'd be ready to drop this in for it.

I do, however, also use these queries: https://github.com/blazegraph/database/wiki/RDF_GAS_API

I'm curious, between the pull request for external functions and, perhaps more so, the talk of RDF* support...
Would similar queries potentially be possible at some point?

I've only started to read about RDF*, but I wonder if it alone would enable something similar to these GAS queries to be done.

Thanks for any time and perspective on this..

Doug

(this likely would have been better for the "discussion" section, but doesn't look like you use them)

language tagged literals may be invalid

No check is performed on the language tag passed to Literal::new_language_tagged_literal. The tag may be non-compliant with BCP47, making the literal invalid per the RDF spec.

You might want to use https://crates.io/crates/language-tag.

Note that this would change the API, as Literal::new_language_tagged_literal should now return a Result. In that case, an unchecked version of this function would also be nice.
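The API change could take roughly the following shape. This is a sketch with hypothetical types, and the check is a simplified well-formedness test (hyphen-separated ASCII subtags of 1 to 8 characters, the first one alphabetic), not full BCP47 validation, which a dedicated crate would provide.

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct LanguageTaggedLiteral {
    value: String,
    language: String,
}

/// Error carrying the rejected tag.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct LanguageTagError(pub String);

/// Simplified shape check, NOT a complete BCP47 validator.
fn is_plausible_language_tag(tag: &str) -> bool {
    let mut subtags = tag.split('-');
    let first_ok = subtags.next().map_or(false, |subtag| {
        (1..=8).contains(&subtag.len()) && subtag.bytes().all(|b| b.is_ascii_alphabetic())
    });
    first_ok
        && subtags.all(|subtag| {
            (1..=8).contains(&subtag.len()) && subtag.bytes().all(|b| b.is_ascii_alphanumeric())
        })
}

impl LanguageTaggedLiteral {
    /// Checked constructor: rejects implausible tags and stores the tag in
    /// lowercase, since language tags compare case-insensitively.
    pub fn new(
        value: impl Into<String>,
        language: impl Into<String>,
    ) -> Result<Self, LanguageTagError> {
        let language = language.into();
        if !is_plausible_language_tag(&language) {
            return Err(LanguageTagError(language));
        }
        Ok(Self::new_unchecked(value, language.to_ascii_lowercase()))
    }

    /// Unchecked constructor for callers that already validated the tag.
    pub fn new_unchecked(value: impl Into<String>, language: impl Into<String>) -> Self {
        Self {
            value: value.into(),
            language: language.into(),
        }
    }

    pub fn value(&self) -> &str {
        &self.value
    }

    pub fn language(&self) -> &str {
        &self.language
    }
}
```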

Optimize in-memory storage

The current storage uses a RwLock<HashMap> internally. This means that all reads are blocked while a write is in progress, leading to significant contention in concurrent use cases. It would be nice to migrate to something better like evmap.
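The blocking behaviour can be observed with the standard library alone. In the sketch below (the helper name is hypothetical), a write guard is held and `try_read` is used to probe the lock without blocking; it reports failure for as long as the writer is active.

```rust
use std::collections::HashMap;
use std::sync::RwLock;

/// Returns whether a reader could enter while a writer holds the lock.
pub fn read_possible_during_write(store: &RwLock<HashMap<u64, String>>) -> bool {
    let mut guard = store.write().unwrap();
    guard.insert(1, "value".to_owned());
    // `try_read` never blocks; it returns an error while the lock is held
    // exclusively, which is exactly when a plain `read` would have to wait.
    let readable = store.try_read().is_ok();
    drop(guard);
    readable
}
```

A structure like evmap avoids this by letting readers keep using an older copy of the map while the writer prepares the next one, trading some memory and read freshness for lock-free reads.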

`oxigraph::Error` does not implement `std::error::Error`

Everything is in the title...

This hinders interoperability with other libraries.

I was about to propose a PR, but then I realized that oxigraph::Error is just a pub use of failure::Error so I could not just add an impl std::error::Error for Error block... :-/

SPARQL parser speed should be improved

The query parsing seems to take a significant amount of the time spent executing small queries according to some quick profiling. It seems to be caused by some naively written rules like:

AdditiveExpression -> Expression =
    a:MultiplicativeExpression _ s: $('+' / '-') _ b:AdditiveExpression {  ... } / MultiplicativeExpression

If there is no + or - in the expression, the left-side argument is parsed twice. Combined with the other rules, this leads to an exponential number of redundant terminal rule evaluations.

At the same time, we might consider migrating to the latest version of peg or to another library like nom.
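The redundant work described above can be reproduced with a toy recognizer, sketched here under stated assumptions: the four precedence levels and their operators are invented for the illustration, the terminal is a run of digits, and the code is hand-written rather than generated by peg. The naive rule shape re-parses the operand after the first alternative fails, so each level doubles the work; the left-factored shape parses the operand once and then looks for an optional tail.

```rust
/// One invented operator per precedence level; level 1 binds tightest.
const OPERATORS: [u8; 4] = [b'*', b'+', b'<', b'&'];

/// Toy recognizer that counts how often the terminal rule is evaluated.
/// Levels passed to `naive` and `factored` must not exceed `OPERATORS.len()`.
pub struct Recognizer<'a> {
    input: &'a [u8],
    pos: usize,
    pub terminal_evaluations: usize,
}

impl<'a> Recognizer<'a> {
    pub fn new(input: &'a str) -> Self {
        Self { input: input.as_bytes(), pos: 0, terminal_evaluations: 0 }
    }

    /// Number of input bytes consumed so far.
    pub fn consumed(&self) -> usize {
        self.pos
    }

    /// Terminal rule: one or more ASCII digits.
    fn digits(&mut self) -> bool {
        self.terminal_evaluations += 1;
        let start = self.pos;
        while matches!(self.input.get(self.pos), Some(b) if b.is_ascii_digit()) {
            self.pos += 1;
        }
        self.pos > start
    }

    fn operator(&mut self, level: usize) -> bool {
        if self.input.get(self.pos) == Some(&OPERATORS[level - 1]) {
            self.pos += 1;
            true
        } else {
            false
        }
    }

    /// Naive rule shape: `lower op same / lower`.
    pub fn naive(&mut self, level: usize) -> bool {
        if level == 0 {
            return self.digits();
        }
        let start = self.pos;
        if self.naive(level - 1) && self.operator(level) && self.naive(level) {
            return true;
        }
        self.pos = start; // backtrack, then parse the operand again from scratch
        self.naive(level - 1)
    }

    /// Left-factored rule shape: `lower (op same)?`.
    pub fn factored(&mut self, level: usize) -> bool {
        if level == 0 {
            return self.digits();
        }
        if !self.factored(level - 1) {
            return false;
        }
        let after_operand = self.pos;
        if self.operator(level) && self.factored(level) {
            return true;
        }
        self.pos = after_operand; // keep the operand, drop any dangling operator
        true
    }
}
```

On an input with a single operand and no operator, the naive shape evaluates the terminal rule 2^k times for k levels, while the factored shape evaluates it once.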

Find a good internal representation for blank nodes

Currently, blank node ids are stored as strings and allocated by an incrementing counter. This is not very nice for multiple reasons:

  1. The string encoding of blank node ids may be quite space-consuming, especially if we read files that use a lot of big ids.
  2. We allocate new blank nodes with an incrementing counter, which does not check whether a blank node with the same id already exists in an existing graph.

A possible idea would be to use UUIDs as blank node ids and store them in the new u128 Rust datatype. It would allow a simple and distributed creation of new blank node ids.

The SPARQL `query` should override the `options`, not the other way around

The documentation of QueryOption states:

[the values set by with_default_graph and with_named_graph] override the FROM and FROM_NAMED elements of the evaluated query.

I think this is wrong, as it breaks the general expectation that a local configuration should have precedence over a global configuration. In my view, the query string is "more local" than the QueryOptions. I expect that the latter could be set once and for all and reused for several queries in a given application or module.
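The two precedence policies can be compared side by side in a small sketch. The names are hypothetical, graph names are plain strings, and an empty slice is assumed to mean "not specified"; only the default-graph part of the dataset is modelled.

```rust
/// Which source of default graphs wins when both are specified.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Precedence {
    /// Behaviour described in the quoted documentation.
    OptionsOverrideQuery,
    /// Behaviour proposed in this issue.
    QueryOverridesOptions,
}

/// Picks the default graphs used for evaluation, given the graphs named by
/// the query's FROM clauses and those configured on the options.
pub fn effective_default_graphs<'a>(
    from_clauses: &'a [&'a str],
    option_graphs: &'a [&'a str],
    precedence: Precedence,
) -> &'a [&'a str] {
    let (preferred, fallback) = match precedence {
        Precedence::OptionsOverrideQuery => (option_graphs, from_clauses),
        Precedence::QueryOverridesOptions => (from_clauses, option_graphs),
    };
    if preferred.is_empty() {
        fallback
    } else {
        preferred
    }
}
```

Under the proposed policy, options act as application-wide defaults that an individual query can still narrow with its own FROM clauses.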

Oxigraph server Compatibility with `SPARQLWrapper`

I've set up an entrypoint with the web server and tried to connect via SPARQLWrapper as follows:

from SPARQLWrapper import SPARQLWrapper, JSON
endpoint = 'http://localhost:7878/query'
sparql = SPARQLWrapper(endpoint)
sparql.setTimeout(300)
#sparql.setReturnFormat(JSON)
sparql.setQuery('SELECT * WHERE { ?s ?p ?o} LIMIT 2')
results = sparql.query()
print(results)

But the server returns an error:

QueryBadFormed: QueryBadFormed: a bad request has been sent to the endpoint, probably the sparql query is bad formed. 

Response:
b'Unexpected parameter: format'

Rewrite the query evaluator

The current query evaluator is pull-based and quite naive.
We could maybe explore query compilation. This SIGMOD'18 paper seems quite interesting on this topic.
If we do that, we would have at least a simple Rust interpreter to support WebAssembly and simple installation, and maybe an optional JIT/compiler based on cranelift or LLVM.

Add full text search support to Oxigraph

It would be nice to have some kind of full text search support in Oxigraph.
On the query level, it should probably be exposed as a special SERVICE to avoid having to tweak SPARQL grammar similarly to blazegraph.

On the implementation side, I would be more inclined to write our own search system using existing components like the Snowball stemmer, in order to keep a single storage system (Sled/RocksDB). It would make things like transactions much simpler to implement.

Experiment with a Sled-based storage

Sled is a key-value store written in Rust that seems to offer promising performance. It would be nice to experiment with using it as an alternative to RocksDB, to allow building Oxigraph without a dependency on a C++ compiler.

Build of Oxigraph for JS is failing

The build of Oxigraph for JS started to fail on February 19th. I am not sure what the cause of this problem is; both failing and passing builds are using Rust 1.50 stable and wasm-pack 1.9.1.

I have reported this issue to wasm-pack.
