Quadstore is a LevelDB-backed RDF graph database for Node.js and the browser with native support for quads and querying across named graphs, RDF/JS interfaces and SPARQL queries.
- Introduction
- Status
- Usage
- Storage
- Data model and return values
- Quadstore class
- Custom indexes
- Quadstore.prototype.open
- Quadstore.prototype.close
- Quadstore.prototype.get
- Range matching
- Quadstore.prototype.put
- Quadstore.prototype.multiPut
- Quadstore.prototype.del
- Quadstore.prototype.multiDel
- Quadstore.prototype.patch
- Quadstore.prototype.multiPatch
- Quadstore.prototype.getStream
- Quadstore.prototype.putStream
- Quadstore.prototype.delStream
- Quadstore.prototype.sparql
- Quadstore.prototype.sparqlStream
- Quadstore.prototype.match
- Quadstore.prototype.import
- Quadstore.prototype.remove
- Quadstore.prototype.removeMatches
- Blank nodes and quad scoping
- Browser usage
- Performance
- License
In the context of knowledge representation, a statement can often be represented as a 3-dimensional (subject, predicate, object) tuple, normally referred to as a triple.
subject predicate object
BOB KNOWS ALICE
BOB KNOWS PAUL
A set of statements / triples can also be thought of as a graph:
                                             ┌────────┐
                KNOWS (predicate)            │ ALICE  │
     ┌──────────────────────────────────────▶│(object)│
     │                                       └────────┘
┌─────────┐
│   BOB   │
│(subject)│
└─────────┘                                  ┌────────┐
     │                                       │  PAUL  │
     └──────────────────────────────────────▶│(object)│
                KNOWS (predicate)            └────────┘
A quad is a triple with an additional term, usually called graph or context.
(subject, predicate, object, graph)
On a semantic level, the graph term identifies the graph to which a triple belongs. Each identifier can then be used as the subject or object of additional triples, facilitating the representation of metadata such as provenance and temporal validity.
subject predicate object graph
BOB KNOWS ALICE GRAPH-1
BOB KNOWS PAUL GRAPH-2
GRAPH-1 SOURCE FACEBOOK
GRAPH-2 SOURCE LINKEDIN
Quadstore heavily borrows from LevelGraph's approach to storing tuples, maintaining multiple indexes, each of which deals with a different permutation of quad terms. In that sense, Quadstore is an alternative to LevelGraph that strikes a different compromise between expressiveness and performance, opting to support quads natively while working to minimize the performance penalty that comes with the fourth term.
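The multi-index approach can be illustrated with a minimal sketch in which every quad is written once per index, with its terms reordered according to that index's permutation. The helper names (indexName, quadToKeys), the NUL separator and the key layout are hypothetical; quadstore's actual key encoding is different.

```javascript
// Hypothetical illustration of LevelGraph-style permutation indexes.
// The key format below is an assumption, not quadstore's on-disk encoding.
const SEP = '\u0000';

// The six default indexes listed in the opts.indexes section.
const INDEXES = [
  ['subject', 'predicate', 'object', 'graph'],
  ['object', 'graph', 'subject', 'predicate'],
  ['graph', 'subject', 'predicate', 'object'],
  ['object', 'subject', 'predicate', 'graph'],
  ['predicate', 'object', 'graph', 'subject'],
  ['graph', 'predicate', 'object', 'subject'],
];

// Builds a short index name such as "SPOG" from a term permutation.
function indexName(index) {
  return index.map((term) => term[0].toUpperCase()).join('');
}

// Produces one key per index; each key lists the quad's terms in that index's
// order, so a prefix scan finds every quad sharing the same leading terms.
function quadToKeys(quad, indexes = INDEXES) {
  return indexes.map((index) =>
    [indexName(index), ...index.map((term) => quad[term])].join(SEP));
}

const keys = quadToKeys({
  subject: 'ex:bob', predicate: 'ex:knows', object: 'ex:alice', graph: 'ex:graph-1',
});
console.log(keys.length); // 6: one write per index for every stored quad
```

Because each stored quad produces one key per index, write volume grows linearly with the number of indexes; this is the trade-off behind the import figures in the Performance section.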
Active, under development.
See CHANGELOG.md.
Current version(s): version 8.0.0, available on NPM under the tag latest.
We're currently working on the following features:
- expanding support for SPARQL queries;
- general performance improvements.
We're also evaluating the following features for future developments:
- RDF* (see also these slides)
- quadstore uses Semantic Versioning; pre-releases are tagged accordingly;
- the production branch mirrors what is available under the latest tag on NPM;
- the master branch is the active, development branch;
- requires Node.js >= 10.0.0.
quadstore can work with any storage backend that implements the AbstractLevelDOWN interface. An incomplete list of available backends can be found at level/awesome#stores.
Our test suite focuses on the following backends:
- leveldown for persistent storage using LevelDB;
- rocksdb for persistent storage using RocksDB;
- memdown for volatile in-memory storage using red-black trees.
Except for those related to the RDF/JS stream interfaces, quadstore's API is promise-based and all methods return objects that include both the actual query results and the relevant metadata.
Objects returned by quadstore's APIs have the type property set to one of the following values:
- "VOID": when there's no data returned by the database, such as with the put method or INSERT DATA SPARQL queries;
- "QUADS": when a query returns a collection of quads;
- "BOOLEAN": when a query returns a boolean result;
- "BINDINGS": when a query returns a collection of bindings;
- "APPROXIMATE_SIZE": when a query returns an approximate count of how many matching items are present.
For those methods that return objects with the type property set to either "QUADS" or "BINDINGS", quadstore provides query results either in streaming mode or in non-streaming mode.
Streaming methods such as getStream and searchStream return objects with the iterator property set to an instance of AsyncIterator, an implementation of a subset of the stream.Readable interface. This instance emits either quads or bindings, depending on the value of the type property.
Non-streaming methods such as get and search return objects with the items property set to an array of either quads or bindings, depending on the value of the type property.
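Code that needs to handle both modes can normalize them into an array. The sketch below is a hypothetical helper (collectResults); it assumes only that a streaming result's iterator emits 'data', 'error' and 'end' events, and uses a plain EventEmitter as a stand-in for an AsyncIterator.

```javascript
// Minimal sketch: normalizes streaming and non-streaming results into an array.
// collectResults is hypothetical and not part of quadstore's API.
const { EventEmitter } = require('events');

function collectResults(result) {
  if (Array.isArray(result.items)) {
    // Non-streaming methods already provide an array.
    return Promise.resolve(result.items);
  }
  return new Promise((resolve, reject) => {
    const collected = [];
    result.iterator.on('data', (item) => collected.push(item));
    result.iterator.on('error', reject);
    result.iterator.on('end', () => resolve(collected));
  });
}

// Usage with a stand-in iterator instead of a real AsyncIterator.
const fakeIterator = new EventEmitter();
const pending = collectResults({ type: 'QUADS', iterator: fakeIterator });
fakeIterator.emit('data', 'quad-1');
fakeIterator.emit('data', 'quad-2');
fakeIterator.emit('end');
pending.then((items) => console.log(items.length)); // 2
```

Collecting a stream into an array gives up the memory benefits of streaming, so this is only sensible for results known to be small.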
Quads are returned as, and expected to be, instances of the RDF/JS Quad interface as produced by the implementation of the RDF/JS DataFactory interface passed to the Quadstore constructor.
Bindings are returned as, and expected to be, maps of variable names (including ?) to instances of the RDF/JS Term interface as produced by the same implementation of the RDF/JS DataFactory interface.
Matching patterns, such as those used in the get and getStream methods, are expected to be maps of term names to instances of the RDF/JS Term interface.
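As an illustration of how such a pattern selects quads, here is a minimal sketch using plain objects in place of DataFactory-produced terms. termEquals approximates RDF/JS Term#equals() and matchesPattern is hypothetical; neither is part of quadstore's API, and the sketch ignores range terms.

```javascript
// Approximates RDF/JS Term#equals(): same termType and value, plus language
// and datatype for literals.
function termEquals(a, b) {
  if (a.termType !== b.termType || a.value !== b.value) return false;
  if (a.termType === 'Literal') {
    return a.language === b.language && a.datatype.value === b.datatype.value;
  }
  return true;
}

// Hypothetical matcher: terms omitted from the pattern act as wildcards.
function matchesPattern(quad, pattern) {
  return Object.entries(pattern).every(([name, term]) => termEquals(quad[name], term));
}

const namedNode = (value) => ({ termType: 'NamedNode', value });
const quad = {
  subject: namedNode('ex://bob'),
  predicate: namedNode('ex://knows'),
  object: namedNode('ex://alice'),
  graph: namedNode('ex://g'),
};
console.log(matchesPattern(quad, { graph: namedNode('ex://g') })); // true
```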
The backend of a quadstore can be accessed with the db property, to perform additional storage operations independently of quads.
In order to perform write operations atomically with quad storage, the put, multiPut, del, multiDel, patch and multiPatch methods accept a preWrite option which defines a procedure to augment the batch, as in the following example:
await store.put(dataFactory.quad(/* ... */), {
preWrite: batch => batch.put('my.key', Buffer.from('my.value'))
});
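The atomicity comes from the quad's index entries and the preWrite operations being queued on the same chained batch. The toy model below illustrates that flow over a Map; ToyChainedBatch and putQuad are hypothetical stand-ins, not quadstore internals or an AbstractLevelDOWN implementation.

```javascript
// Toy model of the preWrite hook. ToyChainedBatch mimics the put/del/write
// shape of a chained batch over a Map; putQuad stands in for a store's write path.
class ToyChainedBatch {
  constructor(db) {
    this.db = db;
    this.ops = [];
  }
  put(key, value) {
    this.ops.push({ type: 'put', key, value });
    return this;
  }
  del(key) {
    this.ops.push({ type: 'del', key });
    return this;
  }
  // Applies every queued operation in one synchronous pass; nothing is
  // visible in the Map before write() runs.
  async write() {
    for (const op of this.ops) {
      if (op.type === 'put') this.db.set(op.key, op.value);
      else this.db.delete(op.key);
    }
  }
}

async function putQuad(db, indexKeys, opts = {}) {
  const batch = new ToyChainedBatch(db);
  for (const key of indexKeys) batch.put(key, '');
  // preWrite receives the same batch, so its operations commit with the index keys.
  if (typeof opts.preWrite === 'function') opts.preWrite(batch);
  await batch.write();
}

const db = new Map();
putQuad(db, ['SPOG:bob:knows:alice:g'], {
  preWrite: (batch) => batch.put('my.key', Buffer.from('my.value')),
}).then(() => console.log(db.has('my.key'))); // true
```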
const Quadstore = require('quadstore').Quadstore;
const store = new Quadstore(opts);
Instantiates a new store. Supported properties for the opts argument are:
The opts.backend option must be an instance of a LevelDB-compatible backend, i.e. one implementing the AbstractLevelDOWN interface. See storage backends.
The opts.comunica option must be an implementation of Comunica's ActorInitSparql interface.
Comunica is a meta query engine using which query engines can be created. It does this by providing a set of modules that can be wired together in a flexible manner. [...] Its primary goal is executing SPARQL queries over one or more interfaces.
The Quadstore instance will use the provided ActorInitSparql implementation to run most SPARQL queries.
A custom configuration of the Comunica framework optimized for bundle size and dependency count is available at quadstore-comunica and can be used as follows:
import {Quadstore} from 'quadstore';
import {newEngine} from 'quadstore-comunica';
const store = new Quadstore({
/* other options... */
comunica: newEngine(),
});
Many thanks to Comunica's contributors for sharing such a wonderful project with the global community.
The opts.dataFactory option must be an implementation of the RDF/JS DataFactory interface. Some of the available implementations:
- rdf-data-factory (default)
- @rdfjs/data-model
- N3.DataFactory
If left undefined, quadstore will automatically instantiate one using rdf-data-factory.
The opts.indexes option allows users to configure which indexes will be used by the store. If not set, the store will default to the following indexes:
[
['subject', 'predicate', 'object', 'graph'],
['object', 'graph', 'subject', 'predicate'],
['graph', 'subject', 'predicate', 'object'],
['object', 'subject', 'predicate', 'graph'],
['predicate', 'object', 'graph', 'subject'],
['graph', 'predicate', 'object', 'subject'],
];
This option, if present, must be set to an array of term arrays, each of which must represent one of the 24 possible permutations of the four terms subject, predicate, object and graph. Partial indexes are not supported.
The store will automatically select which index(es) to use for a given query based on the available indexes and the query itself. If no suitable index is found for a given query, the store will throw an error.
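With four terms there are 4! = 24 possible permutations. One common way to pick an index, sketched below under the assumption that an index is usable when its leading terms are exactly the terms the pattern binds (so matches form one contiguous key range), is a linear scan over the configured indexes. selectIndex is hypothetical and may differ from quadstore's actual selection logic.

```javascript
// Hypothetical index selection; not quadstore's query planner.
const TERM_NAMES = ['subject', 'predicate', 'object', 'graph'];

function selectIndex(indexes, pattern) {
  // Terms the pattern binds to a concrete value.
  const bound = TERM_NAMES.filter((name) => pattern[name] != null);
  for (const index of indexes) {
    // Usable if the first bound.length terms of the index are the bound terms.
    const prefix = index.slice(0, bound.length);
    if (prefix.every((name) => bound.includes(name))) return index;
  }
  // As documented above, a missing suitable index results in an error.
  throw new Error(`No index can serve a pattern on: ${bound.join(', ')}`);
}

const defaultIndexes = [
  ['subject', 'predicate', 'object', 'graph'],
  ['object', 'graph', 'subject', 'predicate'],
  ['graph', 'subject', 'predicate', 'object'],
  ['object', 'subject', 'predicate', 'graph'],
  ['predicate', 'object', 'graph', 'subject'],
  ['graph', 'predicate', 'object', 'subject'],
];

// Returns ['object', 'subject', 'predicate', 'graph'].
const chosen = selectIndex(defaultIndexes, { subject: 'ex:s', object: 'ex:o' });
console.log(chosen.join(','));
```

Under this rule, each of the 16 combinations of bound and unbound terms has a matching index among the six defaults.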
Quadstore can also be configured with a prefixes object that defines a reversible mapping of IRIs to abbreviated forms, with the intention of reducing storage costs when common HTTP prefixes are known in advance.
The prefixes object defines a bijection using two functions, expandTerm and compactIri, both of which take a string parameter and return a string, as in the following example:
opts.prefixes = {
expandTerm: term => term.replace(/^ex:/, 'http://example.com/'),
compactIri: iri => iri.replace(/^http:\/\/example\.com\//, 'ex:'),
}
This will replace the IRI http://example.com/a with ex:a in storage.
This method opens the store and throws if the open operation fails for any reason.
This method closes the store and throws if the close operation fails for any reason.
const pattern = {graph: dataFactory.namedNode('ex://g')};
const { items } = await store.get(pattern);
Returns an object whose items property is an array of all quads within the store matching the specified terms.
This method also accepts an optional opts parameter with the following properties:
- opts.defaultGraphMode: this can be set to either "default" or "union" and allows clients to specify whether the default graph used in queries should be the actual default graph or the union of all graphs present in the database.
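The difference between the two modes can be sketched over an in-memory array of quads. quadsInDefaultGraph is a hypothetical helper and the terms are plain objects; this illustrates the semantics described above, not how quadstore evaluates queries.

```javascript
// Hypothetical helper illustrating opts.defaultGraphMode semantics.
function quadsInDefaultGraph(quads, defaultGraphMode) {
  if (defaultGraphMode === 'union') {
    // "union": the default graph is the union of all graphs, so every quad qualifies.
    return quads;
  }
  if (defaultGraphMode === 'default') {
    // "default": only quads stored in the actual default graph qualify.
    return quads.filter((quad) => quad.graph.termType === 'DefaultGraph');
  }
  throw new Error(`Unknown defaultGraphMode: ${defaultGraphMode}`);
}

const namedNode = (value) => ({ termType: 'NamedNode', value });
const defaultGraph = { termType: 'DefaultGraph', value: '' };
const quads = [
  { subject: namedNode('ex://bob'), predicate: namedNode('ex://knows'),
    object: namedNode('ex://alice'), graph: defaultGraph },
  { subject: namedNode('ex://bob'), predicate: namedNode('ex://knows'),
    object: namedNode('ex://paul'), graph: namedNode('ex://g') },
];
console.log(quadsInDefaultGraph(quads, 'default').length); // 1
console.log(quadsInDefaultGraph(quads, 'union').length);   // 2
```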
quadstore supports range-based matching in addition to value-based matching. Ranges can be defined using the gt, gte, lt and lte properties:
const pattern = {
object: {
termType: 'Range',
gt: dataFactory.literal('7', dataFactory.namedNode('http://www.w3.org/2001/XMLSchema#integer'))
}
};
const { items } = await store.get(pattern);
Values for literal terms with the following numeric datatypes are matched against their numerical values rather than their literal representations:
http://www.w3.org/2001/XMLSchema#integer
http://www.w3.org/2001/XMLSchema#decimal
http://www.w3.org/2001/XMLSchema#double
http://www.w3.org/2001/XMLSchema#nonPositiveInteger
http://www.w3.org/2001/XMLSchema#negativeInteger
http://www.w3.org/2001/XMLSchema#long
http://www.w3.org/2001/XMLSchema#int
http://www.w3.org/2001/XMLSchema#short
http://www.w3.org/2001/XMLSchema#byte
http://www.w3.org/2001/XMLSchema#nonNegativeInteger
http://www.w3.org/2001/XMLSchema#unsignedLong
http://www.w3.org/2001/XMLSchema#unsignedInt
http://www.w3.org/2001/XMLSchema#unsignedShort
http://www.w3.org/2001/XMLSchema#unsignedByte
http://www.w3.org/2001/XMLSchema#positiveInteger
This is also the case for terms with the following date/time datatypes:
http://www.w3.org/2001/XMLSchema#dateTime
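Comparing by value matters because lexical order and numeric order disagree: as strings, "10" sorts before "7". The sketch below shows value-based range checks over plain-object literals; comparable, inRange and literal are hypothetical helpers, not quadstore's implementation, and the dateTime branch assumes values carry an explicit timezone.

```javascript
// Hypothetical range check over plain-object literals.
const XSD = 'http://www.w3.org/2001/XMLSchema#';
const NUMERIC_TYPES = new Set([
  'integer', 'decimal', 'double', 'nonPositiveInteger', 'negativeInteger', 'long',
  'int', 'short', 'byte', 'nonNegativeInteger', 'unsignedLong', 'unsignedInt',
  'unsignedShort', 'unsignedByte', 'positiveInteger',
].map((name) => XSD + name));

// Stand-in for dataFactory.literal(value, dataFactory.namedNode(XSD + type)).
const literal = (value, type) =>
  ({ termType: 'Literal', value, datatype: { termType: 'NamedNode', value: XSD + type } });

// Maps a literal to a value that compares correctly with < and >.
function comparable(term) {
  const datatype = term.datatype.value;
  if (NUMERIC_TYPES.has(datatype)) return Number(term.value); // may lose precision for huge integers
  if (datatype === XSD + 'dateTime') return Date.parse(term.value); // ms since epoch
  return term.value; // other literals: lexical comparison
}

function inRange(range, term) {
  const v = comparable(term);
  if (range.gt !== undefined && !(v > comparable(range.gt))) return false;
  if (range.gte !== undefined && !(v >= comparable(range.gte))) return false;
  if (range.lt !== undefined && !(v < comparable(range.lt))) return false;
  if (range.lte !== undefined && !(v <= comparable(range.lte))) return false;
  return true;
}

const range = { termType: 'Range', gt: literal('7', 'integer') };
console.log(inRange(range, literal('10', 'integer'))); // true
console.log('10' > '7');                               // false: lexical order differs
```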
await store.put(dataFactory.quad(/* ... */));
Stores a new quad. Does not throw or return an error if the quad already exists.
This method also accepts an optional opts parameter with the following properties:
- opts.preWrite: this can be set to a function which accepts a chainedBatch and performs additional backend operations atomically with the put operation. See Access to the backend for more information.
- opts.scope: this can be set to a Scope instance as returned by initScope() and loadScope(). If set, blank node labels will be changed to prevent blank node collisions. See Blank nodes and quad scoping.
await store.multiPut([
dataFactory.quad(/* ... */),
dataFactory.quad(/* ... */),
]);
Stores new quads. Does not throw or return an error if any of the quads already exist.
This method also accepts an optional opts parameter with the following properties:
- opts.preWrite: this can be set to a function which accepts a chainedBatch and performs additional backend operations atomically with the multiPut operation. See Access to the backend for more information.
- opts.scope: this can be set to a Scope instance as returned by initScope() and loadScope(). If set, blank node labels will be changed to prevent blank node collisions. See Blank nodes and quad scoping.
This method deletes a single quad. It does not throw or return an error if the specified quad is not present in the store.
await store.del(dataFactory.quad(/* ... */));
This method also accepts an optional opts parameter with the following properties:
- opts.preWrite: this can be set to a function which accepts a chainedBatch and performs additional backend operations atomically with the del operation. See Access to the backend for more information.
This method deletes multiple quads. It does not throw or return an error if the specified quads are not present in the store.
await store.multiDel([
dataFactory.quad(/* ... */),
dataFactory.quad(/* ... */),
]);
This method also accepts an optional opts parameter with the following properties:
- opts.preWrite: this can be set to a function which accepts a chainedBatch and performs additional backend operations atomically with the multiDel operation. See Access to the backend for more information.
This method deletes one quad and inserts another quad in a single operation. It does not throw or return an error if the quad to delete is not present in the store or the quad to insert is already present.
await store.patch(
dataFactory.quad(/* ... */), // will be deleted
dataFactory.quad(/* ... */), // will be inserted
);
This method also accepts an optional opts parameter with the following properties:
- opts.preWrite: this can be set to a function which accepts a chainedBatch and performs additional backend operations atomically with the patch operation. See Access to the backend for more information.
This method deletes and inserts quads in a single operation. It does not throw or return an error if the quads to delete are not present in the store or the quads to insert are already present.
// will be deleted
const oldQuads = [
dataFactory.quad(/* ... */),
dataFactory.quad(/* ... */),
];
// will be inserted
const newQuads = [
dataFactory.quad(/* ... */),
dataFactory.quad(/* ... */),
dataFactory.quad(/* ... */),
];
await store.multiPatch(oldQuads, newQuads);
This method also accepts an optional opts parameter with the following properties:
- opts.preWrite: this can be set to a function which accepts a chainedBatch and performs additional backend operations atomically with the multiPatch operation. See Access to the backend for more information.
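The all-or-nothing behaviour of multiPatch can be modelled with a Set of serialized quads: changes are staged on a copy and swapped in at the end. multiPatchInMemory is hypothetical, and applying deletions before insertions (so that a quad listed in both arrays ends up present) is an assumption of this sketch rather than documented quadstore behaviour.

```javascript
// Toy model of multiPatch semantics; not quadstore's implementation.
function multiPatchInMemory(current, oldQuads, newQuads) {
  const next = new Set(current); // stage changes on a copy
  for (const quad of oldQuads) next.delete(quad); // no error if absent
  for (const quad of newQuads) next.add(quad);    // no error if already present
  return next; // caller swaps this in, so the patch is applied all at once
}

let quadSet = new Set(['bob knows alice', 'bob knows paul']);
quadSet = multiPatchInMemory(quadSet, ['bob knows paul', 'bob knows carol'], ['bob knows dave']);
// quadSet now holds 'bob knows alice' and 'bob knows dave'
console.log(quadSet.size); // 2
```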
const pattern = {graph: dataFactory.namedNode('ex://g')};
const { iterator } = await store.getStream(pattern);
This method supports range matching; see Quadstore.prototype.get().
This method also accepts an optional opts parameter with the following properties:
- opts.defaultGraphMode: this can be set to either "default" or "union" and allows clients to specify whether the default graph used in queries should be the actual default graph or the union of all graphs present in the database.
await store.putStream(readableStream);
Imports all quads coming through the specified stream.Readable into the store.
This method also accepts an optional opts parameter with the following properties:
- opts.scope: this can be set to a Scope instance as returned by initScope() and loadScope(). If set, blank node labels will be changed to prevent blank node collisions. See Blank nodes and quad scoping.
await store.delStream(readableStream);
Deletes all quads coming through the specified stream.Readable from the store.
The sparql() method provides support for non-streaming SPARQL queries.
Objects returned by sparql() have their type property set to different values depending on each specific query:
- SELECT queries will result in objects having their type property set to "BINDINGS";
- CONSTRUCT queries will result in objects having their type property set to "QUADS";
- UPDATE queries such as INSERT DATA, DELETE DATA and INSERT/DELETE WHERE will result in objects having their type property set to either "VOID" or "BOOLEAN".
const { type, items } = await store.sparql(`
SELECT * WHERE { ?s <ex://knows> <ex://alice> . }
`);
The sparql() method also accepts an optional opts parameter with the following properties:
- opts.defaultGraphMode: this can be set to either "default" or "union" and allows clients to specify whether the default graph used in queries should be the actual default graph or the union of all graphs present in the database.
We're using the rdf-test-suite package to validate our support for SPARQL queries against official test suites published by the W3C.
We're currently testing against the following manifests:
- SPARQL 1.0: 277/438 tests passing (npm run test-rdf:sparql10);
- SPARQL 1.1: 249/271 tests passing (npm run test-rdf:sparql11, limited to the SPARQL 1.1 Query spec).
The sparqlStream() method provides support for streaming SPARQL queries. Objects returned by sparqlStream() have their type property set to different values depending on each specific query, as for sparql().
sparqlStream() also accepts the same options as sparql().
const { iterator } = await store.sparqlStream(`
SELECT * WHERE { ?s <ex://knows> <ex://alice> . }
`);
See Quadstore.prototype.sparql().
const subject = dataFactory.namedNode('http://example.com/subject');
const graph = dataFactory.namedNode('http://example.com/graph');
store.match(subject, null, null, graph)
.on('error', (err) => {})
.on('data', (quad) => {
// Quad is produced using dataFactory.quad()
})
.on('end', () => {});
Implementation of the RDF/JS Source#match method. Supports range-based matching.
This method also accepts an optional opts parameter with the following properties:
- opts.defaultGraphMode: this can be set to either "default" or "union" and allows clients to specify whether the default graph used in queries should be the actual default graph or the union of all graphs present in the database.
// readableStream: a stream.Readable of Quad instances, e.g. the output of an RDF parser
store.import(readableStream)
.on('error', (err) => {})
.on('end', () => {});
Implementation of the RDF/JS Sink#import method.
// readableStream: a stream.Readable of the Quad instances to remove
store.remove(readableStream)
.on('error', (err) => {})
.on('end', () => {});
Implementation of the RDF/JS Store#remove method.
const subject = dataFactory.namedNode('http://example.com/subject');
const graph = dataFactory.namedNode('http://example.com/graph');
store.removeMatches(subject, null, null, graph)
.on('error', (err) => {})
.on('end', () => {});
Implementation of the RDF/JS Store#removeMatches method.
Blank nodes are defined as existential variables in that they merely indicate the existence of an entity rather than act as references to the entity itself.
While the semantics of blank nodes can be rather confusing, one of the most practical consequences of their definition is that two blank nodes having the same label may not refer to the same entity unless both nodes come from the same logical set of quads.
As an example, here are two JSON-LD documents converted to N-Quads using the JSON-LD playground:
{
"@id": "http://example.com/bob",
"foaf:knows": {
"foaf:name": "Alice"
}
}
<http://example.com/bob> <foaf:knows> _:b0 .
_:b0 <foaf:name> "Alice" .
{
"@id": "http://example.com/alice",
"foaf:knows": {
"foaf:name": "Bob"
}
}
<http://example.com/alice> <foaf:knows> _:b0 .
_:b0 <foaf:name> "Bob" .
The N-Quads equivalent of both documents contains a blank node with the b0 label. However, although the label is the same, these blank nodes indicate the existence of two different entities. Intuitively, we can say that a blank node is scoped to the logical grouping of quads that contains it, be it a single quad, a document or a stream.
As quadstore treats all write operations as if they were happening within the same scope, importing these two sets of quads would result in a collision of two unrelated blank nodes, leading to a corrupted dataset.
A good way to address these issues is to skolemize all blank nodes into IRIs / named nodes. However, this is not always possible or practical.
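Where skolemization is practical, a minimal sketch replaces each blank node label with a freshly minted IRI, reusing the same IRI for repeated occurrences within one document. The /.well-known/genid/ path follows the RDF 1.1 Concepts recommendation for Skolem IRIs; the example.com authority and the skolemize helper are hypothetical placeholders.

```javascript
const crypto = require('crypto');

// Placeholder authority: a real deployment would use a domain it controls.
const GENID_BASE = 'https://example.com/.well-known/genid/';

// Hypothetical helper: skolemizes one logical set of quads (e.g. one document).
function skolemize(quads) {
  const mapping = new Map(); // blank node label -> Skolem IRI, scoped to this call
  const skolemizeTerm = (term) => {
    if (term.termType !== 'BlankNode') return term;
    if (!mapping.has(term.value)) {
      const id = crypto.randomBytes(16).toString('hex');
      mapping.set(term.value, { termType: 'NamedNode', value: GENID_BASE + id });
    }
    return mapping.get(term.value);
  };
  return quads.map((quad) => ({
    subject: skolemizeTerm(quad.subject),
    predicate: quad.predicate,
    object: skolemizeTerm(quad.object),
    graph: skolemizeTerm(quad.graph),
  }));
}

const nn = (value) => ({ termType: 'NamedNode', value });
const b0 = { termType: 'BlankNode', value: 'b0' };
const dg = { termType: 'DefaultGraph', value: '' };
const bobDoc = skolemize([
  { subject: nn('http://example.com/bob'), predicate: nn('foaf:knows'), object: b0, graph: dg },
  { subject: b0, predicate: nn('foaf:name'), object: { termType: 'Literal', value: 'Alice' }, graph: dg },
]);
const aliceDoc = skolemize([
  { subject: nn('http://example.com/alice'), predicate: nn('foaf:knows'), object: b0, graph: dg },
  { subject: b0, predicate: nn('foaf:name'), object: { termType: 'Literal', value: 'Bob' }, graph: dg },
]);
console.log(bobDoc[0].object.value === bobDoc[1].subject.value);  // true: same document
console.log(bobDoc[0].object.value === aliceDoc[0].object.value); // false: different documents
```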
The initScope() method returns a Scope instance which can be passed to the put, multiPut and putStream methods.
When doing so, quadstore will replace each occurrence of a given blank node with a different blank node having a randomly-generated label, preventing blank node collisions.
Each Scope instance keeps an internal cache of mappings between previously encountered blank nodes and their replacements, so that it is able to always return the same replacement blank node for a given label. Each new mapping is atomically persisted to the store together with its originating quad, leading each scope to be incrementally persisted to the store consistently with each successful put and multiPut operation. This allows scopes to be re-used even across process restarts via the loadScope() method.
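The relabelling performed by a scope can be modelled in memory as a per-scope cache from original labels to random replacements. ToyScope and its methods are hypothetical; unlike quadstore's Scope, this model keeps mappings only in memory and does not persist them alongside the quads.

```javascript
const crypto = require('crypto');

// Hypothetical in-memory model of blank node scoping.
class ToyScope {
  constructor() {
    this.id = crypto.randomBytes(8).toString('hex');
    this.replacements = new Map(); // original label -> replacement blank node
  }
  relabelTerm(term) {
    if (term.termType !== 'BlankNode') return term;
    if (!this.replacements.has(term.value)) {
      const label = `b${crypto.randomBytes(8).toString('hex')}`;
      this.replacements.set(term.value, { termType: 'BlankNode', value: label });
    }
    return this.replacements.get(term.value); // same label, same replacement
  }
  relabelQuad(quad) {
    return {
      subject: this.relabelTerm(quad.subject),
      predicate: quad.predicate,
      object: this.relabelTerm(quad.object),
      graph: this.relabelTerm(quad.graph),
    };
  }
}

const b0 = { termType: 'BlankNode', value: 'b0' };
const knows = { termType: 'NamedNode', value: 'foaf:knows' };
const foafName = { termType: 'NamedNode', value: 'foaf:name' };
const defaultGraph = { termType: 'DefaultGraph', value: '' };

// One scope per imported document.
const bobScope = new ToyScope();
const aliceScope = new ToyScope();
const bobKnows = bobScope.relabelQuad({ subject: { termType: 'NamedNode', value: 'http://example.com/bob' },
  predicate: knows, object: b0, graph: defaultGraph });
const bobName = bobScope.relabelQuad({ subject: b0, predicate: foafName,
  object: { termType: 'Literal', value: 'Alice' }, graph: defaultGraph });
const aliceKnows = aliceScope.relabelQuad({ subject: { termType: 'NamedNode', value: 'http://example.com/alice' },
  predicate: knows, object: b0, graph: defaultGraph });
console.log(bobKnows.object === bobName.subject);                 // true: same scope
console.log(bobKnows.object.value === aliceKnows.object.value);   // false: different scopes
```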
Initializes a new, empty scope.
const scope = await store.initScope();
await store.put(quad, { scope });
await store.multiPut(quads, { scope });
await store.putStream(stream, { scope });
Each Scope instance has an .id property that acts as its unique identifier.
The loadScope() method can be used to re-hydrate a scope through its .id:
const scope = await store.initScope();
/* store scope.id somewhere */
/* later, read the previously-stored scope.id into scopeId */
const sameScope = await store.loadScope(scopeId);
Deletes all mappings of a given scope from the store.
const scope = await store.initScope();
/* ... */
await store.deleteScope(scope.id);
Deletes all mappings of all scopes from the store.
await store.deleteAllScopes();
The level-js backend for LevelDB offers support for browser-side persistent storage.
quadstore can be bundled for browser-side usage via Webpack, preferably using version 4.x. The reference repository is meant to help in getting to a working Webpack configuration and also hosts a pre-built bundle with everything that is required to use quadstore in the browser.
Rollup, ES modules and tree-shaking are not supported (yet).
The performance profile of quadstore is strongly influenced by its design choices in terms of atomicity. As all update operations are implemented through AbstractLevelDOWN#batch operations that atomically update all indexes, they are performed in a manner that closely approximates batch random updates.
The testing platform is a 2018 MacBook Pro (Intel Core i7 2.6 GHz, SSD storage) running Node v14.0.0.
Sequential reads iterating through quads in any given index run at ~340k quads per second.
node dist/perf/read.js
Our reference benchmark for import performance is the level-bench batch-put benchmark, which scores ~200k updates per second when run as follows:
node level-bench.js run batch-put leveldown --concurrency 1 --chained true --batchSize 10 --valueSize 256
We test import performance by importing the 21million.rdf file or a subset of it.
node dist/perf/loadfile.js /path/to/21million.rdf
With the default six indexes and the leveldown backend, import performance clocks in at ~20k quads per second when importing quads one-by-one, with a density of ~6.5k quads per MB. Due to the six indexes, this translates to ~120k batched update operations per second, ~0.6 times the reference target.
We track the computational cost of handling get() and getStream() queries (setting up iterators, etc.) by running a benchmark based on a SPARQL query that results in a high number of concatenated join operations, each producing a single quad.
node dist/perf/search.js
Quadstore is currently able to process ~5k join operations per second.
MIT. See LICENSE.md.