
4store's Introduction

4store

4store is an efficient, scalable and stable RDF database.

4store was designed by Steve Harris and developed at Garlik to underpin their Semantic Web applications. It has provided the base platform for those applications for around three years, at times holding and running queries over databases of 15GT (15 billion triples) while supporting a Web application used by thousands of people.

Getting started

In this section:

  1. Installing prerequisites.
  2. Installing 4store.
  3. Running 4store.
  4. Installing frontend tools only.
  5. Other installation hints.

Installing prerequisites

To install Raptor (RDF parser) and Rasqal (SPARQL parser):

# install a 64-bit Raptor from freshly extracted source
./configure --libdir=/usr/local/lib64 && make
sudo make install

# similarly for 64-bit Rasqal
./configure "--enable-query-languages=laqrs sparql rdql" \
 --libdir=/usr/local/lib64 && make
sudo make install

# make sure pkg-config can find the newly installed libraries
export PKG_CONFIG_PATH=/usr/local/lib64/pkgconfig:$PKG_CONFIG_PATH

# make sure /etc/ld.so.conf.d/ includes /usr/local/lib64, then refresh
# the dynamic linker cache
sudo ldconfig

Installing 4store

./autogen.sh
./configure
make
sudo make install

Running 4store

/usr/local/bin/4s-boss -D
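Once the backends are running and 4s-httpd is serving a KB over HTTP, the store can be queried with any HTTP client. A minimal Python sketch; the port, the /sparql/ endpoint path, and a running server are assumptions, so adjust for your setup:

```python
import urllib.parse
import urllib.request

# assumption: 4s-httpd serving a KB on localhost port 8080
ENDPOINT = "http://localhost:8080/sparql/"

def sparql_select(query):
    """Send a SPARQL query via GET and return the raw JSON result text."""
    url = ENDPOINT + "?" + urllib.parse.urlencode({"query": query})
    req = urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"})
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")

query = "SELECT * WHERE { ?s ?p ?o } LIMIT 10"
# print(sparql_select(query))  # uncomment once a server is running
```

The GET-with-query-parameter form shown here is the simplest way to exercise the endpoint; POST works too for larger queries.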

Installing frontend tools only

To install just the frontend tools on non-cluster frontends:

# prerequisites for the frontend tools (Red Hat/Fedora-style systems)
sudo yum install pcre-devel avahi avahi-tools avahi-devel

# src/common
(cd src/common && make)

# src/frontend
(cd src/frontend && make && make install)

Other installation hints

Make sure /var/lib/4store/ exists (in a cluster, it only needs to exist on backend nodes) and that the user or users who will create new KBs have permission to write to this directory.

For clusters (or to test cluster tools on a single machine) the frontend must have a file /etc/4s-cluster, which lists all machines in the cluster.
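For illustration, a hypothetical /etc/4s-cluster might look like the following (the hostnames here are made up; check the cluster documentation for the exact format your version expects):

```
frontend.example.org
backend1.example.org
backend2.example.org
```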

To avoid problems running out of Avahi DBUS connections, modify /etc/dbus-1/system.d/avahi-dbus.conf to:

  • Increase max_connections_per_user to 200 or so
  • Increase max_match_rules_per_connection to 512 or so (optional)

4store's People

Contributors

arekinath, berezovskyi, clockwerx, dajobe, danny4927, davechallis, galpin, jaredjennings, jen140, kgardas, leifwarner, mildred, mischat, msalvadores, njh, presbrey, prusnak, rafl, robsyme, stevenc99, swh, tialaramex, wwaites


4store's Issues

new feature: offset and limit support combined with grouping+order

At the moment we do not have support for queries like

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>

SELECT DISTINCT ?child (COUNT(?sub_child) as ?c)
FROM <http://example.com/group-by.ttl>
WHERE { 
       ?child a owl:Class .
       ?child rdfs:subClassOf <http://foaf.qdos.com/0> .
       OPTIONAL { ?sub_child rdfs:subClassOf ?child . }
} GROUP BY ?child OFFSET 10 LIMIT 10

This is an important requirement for us in BioPortal and I am working on that. I'll be releasing that code into the master branch shortly.
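Until that code lands, one workaround is to fetch the grouped results without OFFSET/LIMIT and page on the client side. A rough Python sketch of the idea, using made-up data rather than real 4store results:

```python
from collections import Counter

# rows as (child, sub_child) pairs, as if fetched without OFFSET/LIMIT;
# the data here is invented purely for illustration
rows = [("c%d" % (i % 25), "s%d" % i) for i in range(100)]

# GROUP BY ?child with COUNT(?sub_child)
counts = Counter(child for child, _ in rows)

# impose a deterministic order, then apply OFFSET 10 LIMIT 10 client-side
grouped = sorted(counts.items())
offset, limit = 10, 10
page = grouped[offset:offset + limit]

print(page)
```

This obviously transfers the whole result set first, so it is only a stopgap for modest group counts, not a substitute for server-side paging.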

Compile errors in frontend/query.c

I'm trying to build 4store master on 64-bit Ubuntu against Raptor master and Rasqal master (and librdf master if that makes any difference).

The configure script ends saying it's using Raptor 2.0.1 and Rasqal 0.9.26.

But when I run make I eventually get

query.o: In function `fs_query_execute':
/usr/local/src/4store/src/frontend/query.c:363: undefined reference to `rasqal_world_set_log_handler'
/usr/local/src/4store/src/frontend/query.c:510: undefined reference to `rasqal_query_get_having_condition'

And it then exits with an error code.

I've tried rolling Rasqal back to a stable version (I tried each tag back to 0.9.22 -- before that, 4store complains that it needs a newer Rasqal) and also rolling 4store back through the last handful of tags.

Any idea what I can try?

FILTER sometimes really slow

Traversing graphs takes too much time: FILTER is slow.

The following query takes an exceptionally long time:

CONSTRUCT { ?r1334571120r5958r30398 ?p ?o }
WHERE {
  <http://lobid.org/resource/TT002447758> <http://purl.org/vocab/frbr/core#exemplar> ?r1334571120r5958r30669 .
  ?r1334571120r5958r30669 <http://purl.org/vocab/frbr/core#owner> ?r1334571120r5958r30672 .
  ?r1334571120r5958r30672 ?p ?o .
  FILTER (?p=<http://www.w3.org/1999/02/22-rdf-syntax-ns#type> || ?p=<http://xmlns.com/foaf/0.1/name>)

  ?r1334571120r5958r30672 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> ?t .
  FILTER (?t=<http://xmlns.com/foaf/0.1/Organization> || ?t=<http://xmlns.com/foaf/0.1/Agent>)
}

The problem is the last FILTER. Even if you shorten that FILTER to a single disjunct (i.e. FILTER (?t=<http://xmlns.com/foaf/0.1/Organization>)) it takes around 40 seconds. With the last FILTER statement removed, the query is answered quickly.

Repeated triples when using CONSTRUCT

As requested by Manuel, I'm opening a new issue for repeated triples seen during CONSTRUCT.

Using the stripped down test triples below:

<http://admin.company.com/tenancies/customer1/collections> <http://www.w3.org/1999/02/22-rdf-syntax-ns#_10> <http://admin.company.com/tenancies/customer1/collections/11> .
<http://admin.company.com/tenancies/customer1/collections> <http://www.w3.org/1999/02/22-rdf-syntax-ns#_11> <http://admin.company.com/tenancies/customer1/collections/12> .
<http://admin.company.com/tenancies/customer1/collections> <http://www.w3.org/1999/02/22-rdf-syntax-ns#_12> <http://admin.company.com/tenancies/customer1/collections/13> .
<http://admin.company.com/tenancies/customer1/collections> <http://www.w3.org/1999/02/22-rdf-syntax-ns#_13> <http://admin.company.com/tenancies/customer1/collections/14> .
<http://admin.company.com/tenancies/customer1/collections> <http://www.w3.org/1999/02/22-rdf-syntax-ns#_14> <http://admin.company.com/tenancies/customer1/collections/15> .
<http://admin.company.com/tenancies/customer1/collections> <http://www.w3.org/1999/02/22-rdf-syntax-ns#_1> <http://admin.company.com/tenancies/customer1/collections/1> .
<http://admin.company.com/tenancies/customer1/collections> <http://www.w3.org/1999/02/22-rdf-syntax-ns#_2> <http://admin.company.com/tenancies/customer1/collections/2> .
<http://admin.company.com/tenancies/customer1/collections> <http://www.w3.org/1999/02/22-rdf-syntax-ns#_3> <http://admin.company.com/tenancies/customer1/collections/3> .
<http://admin.company.com/tenancies/customer1/collections> <http://www.w3.org/1999/02/22-rdf-syntax-ns#_4> <http://admin.company.com/tenancies/customer1/collections/4> .
<http://admin.company.com/tenancies/customer1/collections> <http://www.w3.org/1999/02/22-rdf-syntax-ns#_5> <http://admin.company.com/tenancies/customer1/collections/5> .
<http://admin.company.com/tenancies/customer1/collections> <http://www.w3.org/1999/02/22-rdf-syntax-ns#_6> <http://admin.company.com/tenancies/customer1/collections/6> .
<http://admin.company.com/tenancies/customer1/collections> <http://www.w3.org/1999/02/22-rdf-syntax-ns#_7> <http://admin.company.com/tenancies/customer1/collections/7> .
<http://admin.company.com/tenancies/customer1/collections> <http://www.w3.org/1999/02/22-rdf-syntax-ns#_8> <http://admin.company.com/tenancies/customer1/collections/9> .
<http://admin.company.com/tenancies/customer1/collections> <http://www.w3.org/1999/02/22-rdf-syntax-ns#_9> <http://admin.company.com/tenancies/customer1/collections/10> .
<http://admin.company.com/tenancies/customer1/collections> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/1999/02/22-rdf-syntax-ns#Seq> .
<http://admin.company.com/tenancies/customer1> <http://appname.company.com/schema#collections> <http://admin.company.com/tenancies/customer1/collections> .

and the following query:

CONSTRUCT {
  <http://admin.company.com/tenancies/customer1> ?p ?o .
  ?o ?p2 ?o2 .
}
WHERE {
  <http://admin.company.com/tenancies/customer1> ?p ?o .
  OPTIONAL {
    ?o ?p2 ?o2 .
  }
}

the expected number of triples returned is 16; however, 30 are being returned.

An explain outputs the following:

execute: triple(uri<http://admin.company.com/tenancies/customer1>, variable(p), variable(o)) DISTINCT LIMIT 998
mmmms (_,_[d772c8df1933543a],?,?) -> 1
1 bindings (2)
execute: triple(variable(o), variable(p2), variable(o2)) DISTINCT LIMIT 998
mmmms (_,?[d5b19b1e8220ff0a],?,?) -> 15
15 bindings (45)

It appears that the following triple is emitted multiple times:

<http://admin.company.com/tenancies/customer1> <http://appname.company.com/schema#collections> <http://admin.company.com/tenancies/customer1/collections> .

Phil.
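Until the bug is fixed, duplicates like this can be stripped client-side, much like the rapper | sort | uniq pipeline people use elsewhere. A small Python sketch over N-Triples text (the sample triples are made up):

```python
def dedupe_ntriples(text):
    """Remove repeated triples from N-Triples output, preserving order."""
    seen = set()
    out = []
    for line in text.splitlines():
        line = line.strip()
        if line and line not in seen:
            seen.add(line)
            out.append(line)
    return "\n".join(out)

sample = (
    "<http://a> <http://p> <http://b> .\n"
    "<http://a> <http://p> <http://b> .\n"  # duplicate triple
    "<http://b> <http://q> \"x\" .\n"
)
print(dedupe_ntriples(sample))
```

This works line-by-line, so it assumes canonical one-triple-per-line N-Triples output rather than arbitrary Turtle.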

Duplicates in select result set after delete

Summary: Select query returns deleted data.

Steps to reproduce:

  1. Execute insert query:
    INSERT DATA { GRAPH <http://example/g> { <http://example/s> <http://example/p> "value" } }
  2. Execute delete query:
    DELETE DATA { GRAPH <http://example/g> { <http://example/s> <http://example/p> "value" } }
  3. Execute insert query again:
    INSERT DATA { GRAPH <http://example/g> { <http://example/s> <http://example/p> "value" } }
  4. Execute select query:
    SELECT * FROM <http://example/g> WHERE { ?s ?p ?o } LIMIT 10

Expected query result:

<sparql xmlns="http://www.w3.org/2005/sparql-results#">
    <head>
        <variable name="s"/>
        <variable name="p"/>
        <variable name="o"/>
    </head>
    <results>
         <result>
              <binding name="s">
                  <uri>http://example/s</uri>
              </binding>
              <binding name="p">
                  <uri>http://example/p</uri>
              </binding>
              <binding name="o">
                  <literal>value</literal>
              </binding>
        </result>
    </results>
</sparql>

Actual query result:

<sparql xmlns="http://www.w3.org/2005/sparql-results#">
    <head>
        <variable name="s"/>
        <variable name="p"/>
        <variable name="o"/>
    </head>
    <results>
         <result>
              <binding name="s">
                  <uri>http://example/s</uri>
              </binding>
              <binding name="p">
                  <uri>http://example/p</uri>
              </binding>
              <binding name="o">
                  <literal>value</literal>
              </binding>
        </result>
        <result>
              <binding name="s">
                  <uri>http://example/s</uri>
              </binding>
              <binding name="p">
                  <uri>http://example/p</uri>
              </binding>
              <binding name="o">
                  <literal>value</literal>
              </binding>
        </result>
    </results>
</sparql>

Discussion in GoogleGroup: https://groups.google.com/forum/?fromgroups#!topic/4store-support/mZ_3_9TXjAQ

Predicate URI corrupted

Strange one this, I've loaded a data set and noticed that

http://purl.org/dc/terms/title

is coming back as

http://aurl.org/dc/terms/title

I then checked what predicates are contained in the kb with

SELECT DISTINCT ?p WHERE { ?s ?p ?o } ORDER BY ?p

and there are several that are incorrect, all purl ones, here they are:

<http://aurl.org/dc/terms/created>
<http://aurl.org/dc/terms/creator>
<http://aurl.org/dc/terms/title>
<http://aurl.org/dc/terms/updated>
<http://aurl.org/vocab/changeset/schema#Removal>
<http://aurl.org/vocab/changeset/schema#addition>
<http://aurl.org/vocab/changeset/schema#changeReason>
<http://aurl.org/vocab/lifecycle/schema#state>
<http://aurl.org/vocab/lifecycle/schema#states>
<http://aurl.org/vocab/multi-tenant-configuration/schema#createdAt>

any idea what could be going on?

query-backend.c fs_bind issues

This is related to the bug that I tried to fix last week. I could not fix it then but now I think I have found some related 'holes' in the query-backend.c fs_bind function.

fs_bind unique and sorting of rids

The sorting and unique operation here

https://github.com/garlik/4store/blob/master/src/backend/query-backend.c#L184

is inconsistent with the pair index traversing that we do here ...

https://github.com/garlik/4store/blob/master/src/backend/query-backend.c#L410

and here ...

https://github.com/garlik/4store/blob/master/src/backend/query-backend.c#L470

We cannot sort and unique the rids vector if later on we are going to traverse by pairs.

Potential fix for this error

https://github.com/ncbo/4store/blob/ncbo_integration_branch/src/backend/query-backend.c#L186
(do not run sort/uniq when doing pair traversing)

fsp_bind_limit_many

Another potential issue is in fsp_bind_limit_many, where subjects are sent only to the appropriate segments. See

https://github.com/garlik/4store/blob/master/src/common/4s-client.c#L1491

If the backend branch receiving the bind is either this one ...

https://github.com/garlik/4store/blob/master/src/backend/query-backend.c#L410

or this one

https://github.com/garlik/4store/blob/master/src/backend/query-backend.c#L470

then we can have a potential binding error. The search is again by pairs, but we did not take that into account when filtering subjects by segments in 4s-client.c.

Potential fix for this issue:
https://github.com/ncbo/4store/blob/ncbo_integration_branch/src/common/4s-client.c#L1506
(aligned the memcpy for predicates with subjects)

This is a tricky one, I do not really know how to address the other CASE branch.

Shell scripts are written in bash but have sh shebang

The included shell scripts have the sh shebang line but don't actually run (or at least 4s-cluster-create and 4s-ssh-all-parallel don't) under sh. They're probably working for some people because on many systems sh is actually a symlink to bash.

The problem sh has with 4s-cluster-create is on line 2.
The problem sh has with 4s-ssh-all-parallel is on line 8.

There may be others -- I've fixed my local ones by changing the shebang line to bash rather than sh.

Possible bug in function binding_row_compare in frontend/query-datatypes.c

In the function binding_row_compare (in frontend/query-datatypes.c), rows can incorrectly be considered duplicates, resulting in up to 50% of results being discarded in certain queries (notably UNIONs, and CONSTRUCTs where the DISTINCT keyword is present).

Two rows of bindings are compared using:

For each bound variable:
const fs_rid b1v = table_value(b1, i, p1);
const fs_rid b2v = table_value(b2, i, p2);

if (b1v == FS_RID_NULL || b2v == FS_RID_NULL) {
    continue;
}

if (b1v > b2v) {
    return 1;
}
if (b1v < b2v) {
    return -1;
}

Then finally (assuming nothing was returned):
return 0;

So if we have the bindings (e.g. from a union):
?a  ?b    ?c
1   NULL  2
1   3     NULL

When we compare the two rows, the comparison function deems them equal:
1. For ?a: compare b1v to b2v (b1v=1, b2v=1), find them equal, check other variables
2. For ?b: find that b1v is NULL, so check other variables
3. For ?c: find that b2v is NULL, so check other variables
4. No other variables to check, so return 0

If the distinct keyword is present, then one of the two rows will be discarded by the fs_binding_uniq function.

A suggested fix would be:

if (b1v == FS_RID_NULL) {
    if (b2v == FS_RID_NULL) {
        /* Both bindings are NULL, assume equality, check rest of vars */
        continue;
    }

    /* b1v is NULL, b2v isn't, assume NULL < b2v  */
    return -1;
}

if (b2v == FS_RID_NULL) {
    /* b2v is NULL, b1v isn't, assume b1v > NULL */
    return 1;
}

if (b1v > b2v) {
    return 1;
}
if (b1v < b2v) {
    return -1;
}

memset size mismatch in fs_binding_free?

Just a minor thing I noticed; I don't think this should really have any adverse effect, but in query-datatypes.c, the fs_binding is created with:

fs_binding *b = calloc(FS_BINDING_MAX_VARS+1, sizeof(fs_binding));

https://github.com/garlik/4store/blob/master/src/frontend/query-datatypes.c#L60

When fs_binding_free is called, it calls memset with the size of a single binding, rather than the size actually allocated:

memset(b, 0, sizeof(fs_binding));
free(b);

https://github.com/garlik/4store/blob/master/src/frontend/query-datatypes.c#L79

Should this be:

memset(b, 0, (FS_BINDING_MAX_VARS+1) * sizeof(fs_binding));

instead?

failure to start 'default' 4store service (Ubuntu 11.10)

Is this simply a failure to create a 'default' KB or to specify a KB on the service command?

[ dlweber@BMIR-X247-MBP-Ubuntu:0 scripts ]$ sudo service 4store start

  • Starting 4s-backend
    4store[4455]: lock.c:38 failed to open metadata file /var/lib/4store/default/metadata.nt for locking: No such file or directory
    4store[4455]: 1: /usr/bin/4s-backend() [0x411201]
    4store[4455]: 2: /usr/bin/4s-backend() [0x40392b]
    4store[4455]: 3: /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xed) [0x7f61c978a30d]
    4store[4455]: 4: /usr/bin/4s-backend() [0x403979]
    [ dlweber@BMIR-X247-MBP-Ubuntu:0 scripts ]$ ll /var/lib/4store/
    total 4.0K
    drwxr-xr-x 4 root root 4.0K 2012-04-19 20:41 mappings/
    [ dlweber@BMIR-X247-MBP-Ubuntu:0 scripts ]$ uname -a
    Linux BMIR-X247-MBP-Ubuntu 3.0.0-17-generic #30-Ubuntu SMP Thu Mar 8 20:45:39 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
    [ dlweber@BMIR-X247-MBP-Ubuntu:0 scripts ]$ cat /etc/lsb-release
    DISTRIB_ID=Ubuntu
    DISTRIB_RELEASE=11.10
    DISTRIB_CODENAME=oneiric
    DISTRIB_DESCRIPTION="Ubuntu 11.10"
    [ dlweber@BMIR-X247-MBP-Ubuntu:0 scripts ]$ cat /etc/4store.conf
    [default]
    unsafe = true # enable LOAD etc. (default is disabled)
    cors = true # enable CORS (default is disabled)

[mappings]
port = 8888 # HTTP port number (default is 8080)
default-graph = true # default graph = union of named graphs (default)
soft-limit = 0 # disable soft limit
opt-level = 3 # enable all optimisations (default)

inconsistent xsd:dateTime with timezones - Rasqal VS Raptor

Raptor and Rasqal seem to behave differently when parsing xsd:dateTime objects.

A KB with only one assertion:

@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
<http://s>  <http://date>  "2012-01-31T15:32:16-08:00"^^xsd:dateTime .

if we execute SELECT * WHERE { ?s ?p ?o } we get the right result:

<http://s>  <http://date>   "2012-01-31T15:32:16-08:00"^^<http://www.w3.org/2001/XMLSchema#dateTime>

The issue comes when executing the following query:

PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT * WHERE { ?s ?p "2012-01-31T15:32:16-08:00"^^xsd:dateTime }

It returns no results, because Rasqal appears to transform the lexical form of the date into 2012-01-31T15:32:16Z, replacing the -08:00 offset with the Z (UTC) designator without adjusting the time.

Is there any way we can tell Rasqal not to do this transformation? Dajobe might be able to help with this one.
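For reference, the two lexical forms denote different instants, so dropping the -08:00 offset and appending Z changes the value rather than normalising it. A quick illustration with Python's datetime (nothing 4store- or Rasqal-specific):

```python
from datetime import datetime, timezone, timedelta

# the literal as written: 2012-01-31T15:32:16-08:00
original = datetime(2012, 1, 31, 15, 32, 16,
                    tzinfo=timezone(timedelta(hours=-8)))

# what the parser appears to produce: same clock time, retagged as UTC
truncated = datetime(2012, 1, 31, 15, 32, 16, tzinfo=timezone.utc)

# a correct normalisation to UTC preserves the instant by shifting the time
normalised = original.astimezone(timezone.utc)

print(original == truncated)   # False: the instant has changed
print(original == normalised)  # True: same instant, 2012-01-31T23:32:16Z
print(normalised.isoformat())
```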

Build failure -- Frontend

I attempted to build from git mainline but received this error. Then I went back to 1.1.4, and the error repeats.

query.o: In function `fs_query_init':
/.../4store-v1.1.4/src/frontend/query.c:248: undefined reference to `rasqal_world_set_warning_level'
query.o: In function `fs_query_execute':
/.../4store-v1.1.4/src/frontend/query.c:372: undefined reference to `rasqal_world_set_log_handler'
/.../4store-v1.1.4/src/frontend/query.c:522: undefined reference to `rasqal_query_get_having_condition'
collect2: ld returned 1 exit status
make[3]: *** [4s-query] Error 1
make[3]: Leaving directory `/.../4store-v1.1.4/src/frontend'
make[2]: *** [all-recursive] Error 1
make[2]: Leaving directory `/.../4store-v1.1.4/src'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/.../4store-v1.1.4'
make: *** [all] Error 2

make error

Hello,
I am trying to use 4store-v1.1.3. In the 4store directory I run ./configure and it works, but when I run "make"
I get an error:

results.c:947: warning: ‘res’ may be used uninitialized in this function
mv -f .deps/results.Tpo .deps/results.Po
fatal: No names found, cannot describe anything.
gcc -DHAVE_CONFIG_H -I. -I../.. -std=gnu99 -fno-strict-aliasing -Wall -g -O2 -I./ -I../ -DGIT_REV=""v1.1.3"" -I/usr/include/glib-2.0 -I/usr/lib/glib-2.0/include -I/usr/local/include/raptor2 -I/usr/local/include/rasqal -I/usr/local/include/raptor2 -I/usr/include/libxml2 pcre-config --cflags -g -O2 -MT query-data.o -MD -MP -MF .deps/query-data.Tpo -c -o query-data.o query-data.c
mv -f .deps/query-data.Tpo .deps/query-data.Po
fatal: No names found, cannot describe anything.

I uninstalled rasqal with apt-get remove rasqal-0.9.26 and then reinstalled it, but I still get the same error.

Thanks for your help.

DISTINCT with ORDER BY isn't uniqing

Load the following Turtle document in 4store:

@prefix : <http://example.com/> .
:thing1 :property1 [ :weight 0.7 ; :property2 :thing2 ] .
:thing1 :property1 [ :weight 0.8 ; :property2 :thing2 ] .
:thing2 :property3 :thing3 .
:thing2 :property3 :thing4 .

Execute the following SPARQL query:

select distinct ?d where {?b <http://example.com/weight> ?weight . ?b <http://example.com/property2> ?c . ?c <http://example.com/property3> ?d} order by desc(?weight)

Expected results (one row per distinct binding for ?d; the order is actually undefined, as both bindings have the same ordering value):

Actual results:

From https://groups.google.com/d/msg/4store-support/-/5_e-DvMfGcEJ

UNION bug

The graph:

@base <http://example.org/> .
@prefix ex: <http://example.org/terms#> .
<a1> a ex:Class1 ;
  ex:prop <x1> .
<a2> a ex:Class2 ;
  ex:prop <y> .
<y> ex:source <x2> .

when queried with:

prefix ex: <http://example.org/terms#>
select ?a where {
   { ?a a ex:Class1 }
 UNION
   { ?a a ex:Class2 } .
   { ?a ex:prop <http://example.org/x2> }
 UNION
   { ?a ex:prop ?b . ?b ex:source <http://example.org/x2> } .
}

returns both http://example.org/a1 and http://example.org/a2. Redland's roqet returns the correct result, http://example.org/a2.

[ submitted by David Brooks ]

4s-info and 4s-httpd can't connect to backend

After failing to build 4store master (see my other issue) I've tried to build the raptor1 branch on 64-bit Ubuntu with Ubuntu's own Raptor, Rasqal, librdf libraries. The configure script shows that it has found Raptor 1.4.21 and Rasqal 0.9.17.

This builds just fine. I install it and make my backend, then run the backend:

$ 4s-backend -D observations
4store[8456]: 4s-server.c:441 4store backend v1.0.6-1-g7bc9ea2UMAC for kb observations on port 6734

So that seems to be running, and netstat shows that it is indeed listening on 6734:

$ sudo netstat --listen | grep 6734
tcp6 0 0 [::]:6734 [::]:* LISTEN

But when I try to run 4s-httpd:

$ 4s-httpd -p 8000 -D observations
4store[8483]: httpd.c:1531 4store HTTP daemon v1.0.6-1-g7bc9ea2 started on port 8000
4store[8484]: httpd.c:1542 couldn't connect to “observations”
4store[8483]: httpd.c:1596 child 8484 exited with return code 3
4store[8483]: 1: 4s-httpd() [0x40928c]
4store[8483]: 2: /lib/libc.so.6(__libc_start_main+0xfd) [0x7fed75d85c4d]
4store[8483]: 3: 4s-httpd() [0x405d99]
4store[8485]: httpd.c:1542 couldn't connect to “observations”

That continues. Also:

$ 4s-info observations noop
4store[8501]: 4s-info.c:57 couldn't connect to “observations”

What could be wrong?

aggregate (count) + filter + isLiteral issue

In a not very big KB this query works ...

SELECT DISTINCT (count(?property) as ?c) WHERE {
?s ?property ?o .
?s a <http://crime.rkbexplorer.com/id/ReportedCrime> .
}

but these other queries do not give anything back:

SELECT DISTINCT (count(?property) as ?c) WHERE {
?s ?property ?o .
?s a <http://crime.rkbexplorer.com/id/ReportedCrime> .
FILTER (isURI(?o)) .
}

or

SELECT DISTINCT (count(?property) as ?c) WHERE {
?s ?property ?o .
?s a <http://crime.rkbexplorer.com/id/ReportedCrime> .
FILTER (isLiteral(?o)) .
}

and for instance ....

SELECT DISTINCT ?property ?o WHERE {
?s ?property ?o .
?s a <http://crime.rkbexplorer.com/id/ReportedCrime> .
FILTER (isURI(?o)) .
}

returns lots of solutions. So the bug must be in the combination of counting and filtering.

But weirdly this one works ...

SELECT DISTINCT (count(?property) as ?c) WHERE {
?s ?property ?o .
?s a <http://crime.rkbexplorer.com/id/ReportedCrime> .
FILTER (?o = <http://data.ordnancesurvey.co.uk/id/postcodeunit/LN50HW>) .
}

complexity warning contains corrupt json

I'm using today's git head of 4store loaded with the dataset from http://skipforward.opendfki.de/wiki/DBTropes

curl -H "Accept: application/sparql-results+json,text/boolean" "http://localhost:9991/sparql/?query=PREFIX+skip%3A+%3Chttp%3A%2F%2Fskipforward.net%2Fskipforward%2Fresource%2Fseeder%2Fskipinions%2F%3E%0APREFIX+dbt%3A+%3Chttp%3A%2F%2Fdbtropes.org%2Font%2F%3E%0APREFIX+rdfs%3A+%3Chttp%3A%2F%2Fwww.w3.org%2F2000%2F01%2Frdf-schema%23%3E%0A%0A++++++SELECT+DISTINCT+%3Fmovie+WHERE+%7B%0A++++++++%3Fmovie+a+dbt%3ATVTItem%3B%0A++++++++++rdfs%3Alabel+%3Flabel+.%0A++++++++FILTER+%28regex%28%3Flabel%2C+%22The%5C+Lord%22%29%29%0A++++++%7D%0A++"

returns

{"head":{"vars":["movie"]},
 "results": {
  "bindings":[
   {"movie":{"type":"uri","value":"http://dbtropes.org/resource/Film/TheLordOfTheRings"}},
   {"movie":{"type":"uri","value":"http://dbtropes.org/resource/Main/TheLordOfTheRings"}}
  ]
 },
 "warnings": ["hit complexity limit 4 times, increasing soft limit may give more results""parser warning: Unknown SPARQL string escape \\  in \"The\\ Lord\"\" on line 8"]
}

The adjacent double quotes in the middle of the warnings array (two strings with no separating comma) make this invalid JSON.
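The breakage is easy to confirm by feeding the warnings array to a JSON parser; two string literals with no separating comma are not valid JSON. A small Python demonstration with shortened warning text:

```python
import json

# shape of what 4store emits: two strings with no comma between them
broken = '["hit complexity limit 4 times""parser warning: bad escape"]'
# shape of what it should emit
fixed = '["hit complexity limit 4 times", "parser warning: bad escape"]'

try:
    json.loads(broken)
    parsed_ok = True
except json.JSONDecodeError:
    parsed_ok = False

print(parsed_ok)           # False: adjacent strings are rejected
print(json.loads(fixed))   # two separate warning strings
```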

Bad Content-Length causes 100% CPU (4s-httpd)

A POST request containing a bad Content-Length results in an infinite loop and maximum CPU usage by 4s-httpd.

This can be reproduced using the following script:

#!/usr/bin/env python
import httplib, urllib

server = "localhost:8866"
query = "SELECT ?s WHERE { ?s ?p ?o } LIMIT 10"
path = "/sparql/" # "/query/" or "/data/"

headers = {
    'Content-Length': "1000", # content length > len(query)
    'Content-Type': 'application/x-www-form-urlencoded'
    }

params = urllib.urlencode({'query':query})
conn = httplib.HTTPConnection(server)
conn.request("POST", "/data/", params, headers)
response = conn.getresponse()

# 4s-httpd CPU usage will jump to 100%

Excessive CPU usage of 4s-httpd on SPARQL Update

As reported in various threads on the mailing lists, write queries generally seem to push the CPU use of 4s-httpd to excessively high levels. The behaviour has been confirmed for various types of update, including INSERT, DELETE, and POST. The issue is present in version 1.1.3.

Matching Literals with type xsd:string

Hi

I found strange behaviour when I write the type xsd:string in an INSERT DATA query:

  • a literal without a type in the SPARQL query does not match the stored xsd:string literal (and vice versa)
  • sometimes the results are duplicated, with or without DISTINCT (I don't know how to reproduce this bug simply)

I can reproduce the first bug:

PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX a: <http://example.com/test/a/>
PREFIX b: <http://example.com/test/b/>
INSERT DATA {
GRAPH <http://example.com/test> {
a:A b:Name "Test"^^xsd:string .
a:A b:Name "Test2" .
}}

select * where { GRAPH <http://example.com/test> {?a ?b "Test".}}
=> ERROR: 0 results

select * where { GRAPH <http://example.com/test> {?a ?b "Test2"^^xsd:string.}}
=> ERROR: 0 results

Bye,
Karima

"Count(*) as count" always equal 0

Hi,

Maybe not a bug...

SELECT count(*) AS count WHERE {?a ?b ?c} => always 0

Quick Fix :
SELECT count(?a) AS count WHERE {?a ?b ?c}

bye,
karima

The functions-rand test fails with recent version of rasqal

The functions-rand test is failing with recent versions of rasqal, but only when run in the full suite:

Query: SELECT ?s (RAND() AS ?r) WHERE { ?s ?p ?o. } LIMIT 1
400 Parser error
This is a 4store SPARQL server v1.1.3-37-gbd7c6c1

parser error: syntax error, unexpected $end, expecting integer literal on line 1
parser error: rasqal_new_typed_literal failed on line 1

The problem seems to be related to the 'LIMIT' directive.

Running the test on its own passes:

jenkins@star:~/jobs/4store/workspace/tests$ ./httpd/run.pl functions-rand
4store[32182]: backend-setup.c:185 erased files for KB http_test_jenkins
4store[32182]: backend-setup.c:310 created RDF metadata for KB http_test_jenkins
4s-httpd running on PID 32191
[PASS] functions-rand
Tests completed: passed 1/1 (0 fails)

As does running the test manually:

jenkins@star:~/jobs/4store/workspace/tests$ ../src/frontend/4s-query -f text demo 'SELECT ?s (RAND() AS ?r) WHERE { ?s ?p ?o. } LIMIT 1'
?s  ?r
<file:///var/lib/jenkins/jobs/4store/workspace/data/i>  0.50757992415155528e0

Running more than one test fails:

4store[1565]: httpd.c:491 starting import http://example.com/numbers.ttl (1800 bytes)
4store[1565]: httpd.c:613 finished import http://example.com/numbers.ttl
4store[1565]: httpd.c:693 deleted model <http://example.com/numbers.ttl>
4store[1565]: httpd.c:292 HTTP error, returning status 200 deleted successfully
[PASS] functions-ceil
4store[1565]: httpd.c:491 starting import http://example.com/numbers.ttl (1800 bytes)
4store[1565]: httpd.c:613 finished import http://example.com/numbers.ttl
4store[1565]: httpd.c:292 HTTP error, returning status 400 Parser error
4store[1565]: httpd.c:292 HTTP error, returning status 400 Parser error
4store[1565]: httpd.c:693 deleted model <http://example.com/numbers.ttl>
4store[1565]: httpd.c:292 HTTP error, returning status 200 deleted successfully
[FAIL] functions-rand
Tests completed: passed 1/2 (1 fails)

JSON response is blank for ASK queries

This could possibly be a mistake on my part, but I am having trouble getting a valid JSON response for ASK queries.

For a request that would return "true", I expect:

{"head":{"vars":[]},
 "results": {
  "bindings":[{"boolean": true}]
 }}

but instead I get:

{"head":{"vars":[]},
 "results": {
  "bindings":[]
 }}

The result is valid when using XML:

<?xml version="1.0"?>
<sparql xmlns="http://www.w3.org/2005/sparql-results#">
  <head/>
  <boolean>true</boolean>
</sparql>

4s-httpd server demands a write-capable link from the backends

The 4s-httpd server demands a write-capable link from the backends. As a result, when one of the cluster data-nodes fails (crashes/dies) and the DB becomes read-only, you cannot use 4s-httpd.

What I see in the syslog is:
Jun 12 15:25:08 fourstore 4store[1065]: 1: /usr/bin/4s-httpd() [0x43672b]
Jun 12 15:25:08 fourstore 4store[1065]: 2: /usr/bin/4s-httpd() [0x406909]
Jun 12 15:25:08 fourstore 4store[1065]: 3: /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xed) [0x7f5e4808930d]
Jun 12 15:25:08 fourstore 4store[1065]: 4: /usr/bin/4s-httpd() [0x406de5]
Jun 12 15:25:08 fourstore 4store[1065]: httpd.c:1641 couldn't connect to “4scluster”
Jun 12 15:25:08 fourstore 4store[1064]: httpd.c:1690 child 1065 exited with return code 3
Jun 12 15:25:08 fourstore 4store[1064]: 1: /usr/bin/4s-httpd() [0x4067e2]
Jun 12 15:25:08 fourstore 4store[1064]: 2: /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xed) [0x7f5e4808930d]
Jun 12 15:25:08 fourstore 4store[1064]: 3: /usr/bin/4s-httpd() [0x406de5]
Jun 12 15:25:08 fourstore 4store[1232]: 4s-client.c:261 kb=4scluster waiting for more backend nodes
Jun 12 15:28:08 fourstore 4store[1232]: 4s-client.c:283 kb=4scluster not enough primary nodes, segments 1, 3, missing
Jun 12 15:28:08 fourstore 4store[1232]: 1: /usr/bin/4s-httpd() [0x43672b]
Jun 12 15:28:08 fourstore 4store[1232]: 2: /usr/bin/4s-httpd() [0x406909]
Jun 12 15:28:08 fourstore 4store[1232]: 3: /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xed) [0x7f5e4808930d]
Jun 12 15:28:08 fourstore 4store[1232]: 4: /usr/bin/4s-httpd() [0x406de5]
Jun 12 15:28:08 fourstore 4store[1232]: httpd.c:1641 couldn't connect to “4scluster”
Jun 12 15:28:08 fourstore 4store[1064]: httpd.c:1690 child 1232 exited with return code 3
Jun 12 15:28:08 fourstore 4store[1064]: 1: /usr/bin/4s-httpd() [0x4067e2]
Jun 12 15:28:08 fourstore 4store[1064]: 2: /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xed) [0x7f5e4808930d]
Jun 12 15:28:08 fourstore 4store[1064]: 3: /usr/bin/4s-httpd() [0x406de5]
Jun 12 15:28:08 fourstore 4store[1238]: 4s-client.c:261 kb=4scluster waiting for more backend nodes.

At the same time, querying the DB produces the following result:
4s-query 4scluster 'SELECT * WHERE { ?s ?p ?o } LIMIT 10'
4store[1244]: 4s-client.c:433 kb=4scluster write error for segment 1: Bad file descriptor
4store[1244]: 4s-client.c:438 kb=4scluster switching to backup segment 1 for queries
4store[1244]: 4s-client.c:433 kb=4scluster write error for segment 3: Bad file descriptor
4store[1244]: 4s-client.c:438 kb=4scluster switching to backup segment 3 for queries

........data here........

The OS is Ubuntu 11.10 Server 64-bit. 4store was installed from the repository package.

repeated triples returned from describe or select when resources hold mutual references to each other

I reported this on the 4store-support group but thought it would be a good idea to create an issue here for it.

We're seeing a query that should return only 900 triples return more than 300,000. However, when the output is passed through:

rapper -i rdfxml -o ntriples test.xml | sort | uniq

we get 900 unique triples. Something is causing the triples to be repeated many times. In our case we think the trigger is pairs of resources that reference each other.

A simple test case demonstrates this. Take these two triples:

<http://example.com/resource1> <http://foo/has> <http://example.com/resource2> .
<http://example.com/resource2> <http://foo/has> <http://example.com/resource1> .

Import them into a 4store KB and then run the following query:

DESCRIBE ?resource1 ?resource2 WHERE {
?resource1 <http://foo/has> ?resource2 .
}

Four triples are returned, of which two are unique:

<?xml version="1.0" encoding="utf-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xml:base="local:local">
  <rdf:Description rdf:about="http://example.com/resource1">
    <ns0:has xmlns:ns0="http://foo/" rdf:resource="http://example.com/resource2"/>
  </rdf:Description>
  <rdf:Description rdf:about="http://example.com/resource2">
    <ns0:has xmlns:ns0="http://foo/" rdf:resource="http://example.com/resource1"/>
  </rdf:Description>
  <rdf:Description rdf:about="http://example.com/resource2">
    <ns0:has xmlns:ns0="http://foo/" rdf:resource="http://example.com/resource1"/>
  </rdf:Description>
  <rdf:Description rdf:about="http://example.com/resource1">
    <ns0:has xmlns:ns0="http://foo/" rdf:resource="http://example.com/resource2"/>
  </rdf:Description>
</rdf:RDF>

Converted back to N-Triples:

<http://example.com/resource1> <http://foo/has> <http://example.com/resource2> .
<http://example.com/resource2> <http://foo/has> <http://example.com/resource1> .
<http://example.com/resource2> <http://foo/has> <http://example.com/resource1> .
<http://example.com/resource1> <http://foo/has> <http://example.com/resource2> .

I'm not sure whether it is relevant that the second time the triples are listed they appear in reverse order, but I thought it was interesting.

--Phil

Numeric types ignored in sparql modify data requests (INSERT DATA or DELETE DATA)

Hi,
I have described the problem here http://groups.google.com/group/4store-support/browse_thread/thread/9e6c52d92fbfdb3d

To reproduce the bug.

  1. start with empty backend
  2. execute the following sparql command over the update endpoint:
INSERT DATA { GRAPH <http://example.com/G> { 
  <http://example.com/s> <http://example.com/p1> "some string literal"^^<http://www.w3.org/2001/XMLSchema#string> . 
  <http://example.com/s> <http://example.com/p2> "123"^^<http://www.w3.org/2001/XMLSchema#int> .
}}

which turns into the following curl request:

curl -i -d 'update=INSERT+DATA+%7B+GRAPH+%3Chttp%3A%2F%2Fexample.com%2FG%3E+%7B%0A%3Chttp%3A%2F%2Fexample.com%2Fs%3E+%3Chttp%3A%2F%2Fexample.com%2Fp1%3E+%22some+string+literal%22%5E%5E%3Chttp%3A%2F%2Fwww.w3.org%2F2001%2FXMLSchema%23string%3E+.%0A%3Chttp%3A%2F%2Fexample.com%2Fs%3E+%3Chttp%3A%2F%2Fexample.com%2Fp2%3E+%22123%22%5E%5E%3Chttp%3A%2F%2Fwww.w3.org%2F2001%2FXMLSchema%23int%3E+.%0A%7D%7D' http://127.0.0.1:8080/update/
  3. select all triples with the following SPARQL query submitted to the test endpoint:
SELECT * WHERE {
 ?s ?p ?o
} 
  4. the output file will look like the following (note the missing object of the first result!)
<?xml version="1.0"?>
<sparql xmlns="http://www.w3.org/2005/sparql-results#">
  <head>
    <variable name="s"/>
    <variable name="p"/>
    <variable name="o"/>
  </head>
  <results>
    <result>
      <binding name="s"><uri>http://example.com/s</uri></binding>
      <binding name="p"><uri>http://example.com/p2</uri></binding>
    </result>
    <result>
      <binding name="s"><uri>http://example.com/s</uri></binding>
      <binding name="p"><uri>http://example.com/p1</uri></binding>
      <binding name="o"><literal datatype="http://www.w3.org/2001/XMLSchema#string">some string literal</literal></binding>
    </result>
  </results>
</sparql>

Similarly, DELETE DATA does not work, even when the data is added using the /data endpoint and the integer-literal property value exists.

Thanks,
Derek

Unencoded unicode character in SPARQL XML result

My 4store database contains a literal with a Unicode 0x1 character in it. When one of my SPARQL results contains this literal, the character is returned as-is, i.e. it is not encoded as an XML entity. This means that the SPARQL XML is invalid, as 0x1 is not allowed in XML documents, and my XML parser falls over.

See (line 2186):

http://eculture2.cs.vu.nl:5020/sparql/?query=SELECT+%3Fb+%3Fv0%0AWHERE+%7B%0A%3Fb+%3Chttp%3A%2F%2Fwww.w3.org%2F1999%2F02%2F22-rdf-syntax-ns%23type%3E+%3Chttp%3A%2F%2Fwww4.wiwiss.fu-berlin.de%2Fsider%2Fresource%2Fsider%2Fdrugs%3E+.%0AOPTIONAL+%7B%0A%3Fb+%3Chttp%3A%2F%2Fwww4.wiwiss.fu-berlin.de%2Fsider%2Fresource%2Fsider%2FdrugName%3E+%3Fv0+.%0A%7D%0A%7D+OFFSET+8000+LIMIT+1000
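XML 1.0 disallows most C0 control characters (only tab, LF and CR are legal), even as character references, so such a literal cannot appear in a valid SPARQL XML result at all. A client-side workaround sketch (not a 4store fix) that strips the illegal characters before parsing:

```python
import re

# Characters forbidden in XML 1.0: C0 controls except \t, \n, \r.
_XML10_ILLEGAL = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")

def sanitize_for_xml10(text: str) -> str:
    """Drop characters that may not appear in an XML 1.0 document."""
    return _XML10_ILLEGAL.sub("", text)

print(sanitize_for_xml10("drug\x01name"))  # -> drugname
```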

Build fails with undefined reference to uuid_generate in linking.

I check out the source, run ./autogen.sh; ./configure --prefix=/usr; make, and after a while it fails with:

/bin/sh ../../libtool --tag=CC --mode=link gcc -std=gnu99 -fno-strict-aliasing -Wall -g -O2 -I./ -I../ -DGIT_REV=""v1.1.4-227-g8c5681f"" -I/usr/include/glib-2.0 -I/usr/lib/glib-2.0/include -I/usr/include/raptor2 -I/usr/include/rasqal -I/usr/include/raptor2 -I/usr/include/libxml2 pcre-config --cflags -g -O2 -o 4s-query 4s-query.o query.o results.o query-data.o query-datatypes.o query-cache.o filter.o filter-datatypes.o order.o group.o optimiser.o decimal.o ../common/lib4sintl.a ../common/libsort.a ../libs/mt19937-64/libmt64.a -lraptor2 -lrasqal -lraptor2 -lavahi-glib -lglib-2.0 -lavahi-common -lavahi-client -lncurses -lreadline -lglib-2.0 pcre-config --libs
libtool: link: gcc -std=gnu99 -fno-strict-aliasing -Wall -g -O2 -I./ -I../ -DGIT_REV="v1.1.4-227-g8c5681f" -I/usr/include/glib-2.0 -I/usr/lib/glib-2.0/include -I/usr/include/raptor2 -I/usr/include/rasqal -I/usr/include/raptor2 -I/usr/include/libxml2 -g -O2 -o 4s-query 4s-query.o query.o results.o query-data.o query-datatypes.o query-cache.o filter.o filter-datatypes.o order.o group.o optimiser.o decimal.o ../common/lib4sintl.a ../common/libsort.a ../libs/mt19937-64/libmt64.a -lrasqal -lraptor2 -lavahi-glib -lavahi-common -lavahi-client -lncurses -lreadline -lglib-2.0 -lpcre
/usr/bin/ld: filter.o: undefined reference to symbol 'uuid_generate@@UUID_1.0'
/usr/bin/ld: note: 'uuid_generate@@UUID_1.0' is defined in DSO /usr/lib/libuuid.so.1 so try adding it to the linker command line
/usr/lib/libuuid.so.1: could not read symbols: Invalid operation
collect2: error: ld returned 1 exit status
make[3]: *** [4s-query] Error 1
make[3]: Leaving directory `/tmp/4store/src/frontend'

xsd int/integer

I noticed some weird xsd int/integer behavior in 4store. Given the following two RDF statements:

   @base <http://test.com/ints#> .
   @prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
    <integer> <pred> "99"^^xsd:integer  .
    <int> <pred> "99"^^xsd:int .

This query:

  SELECT * WHERE { ?s <http://test.com/pred> '99'^^xsd:integer . }

Returns correctly:

 ?s
 <http://test.com/integer>

This other query:

  SELECT * WHERE { ?s <http://test.com/pred> ?o .  FILTER (?o = 99) }

Returns

 ?s ?o
 <http://test.com/integer>  99
 <http://test.com/int>  "99"^^<http://www.w3.org/2001/XMLSchema#int>

Which is maybe wrong but not a big deal.

The following is the one that worries me a bit more ....

  SELECT * WHERE { ?s <http://test.com/pred> "99"^^xsd:int }

returns ...

 ?s 
 <http://test.com/integer>

In my opinion this is a bug, because the query should return either both results (if matching is not strict) or only <http://test.com/int> (if it is strict).

From a practical point of view, the ideal behavior for me would be to treat xsd:int and xsd:integer as synonyms, but I know that it is more complex than it sounds.
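The distinction at play is SPARQL's term equality (exact lexical form and datatype IRI, which triple-pattern matching uses) versus value equality (which FILTER's `=` uses, and under which "99"^^xsd:int and "99"^^xsd:integer compare equal). A toy sketch modeling a literal as a (lexical form, datatype IRI) pair:

```python
XSD = "http://www.w3.org/2001/XMLSchema#"

# Model an RDF literal as a (lexical form, datatype IRI) pair.
a = ("99", XSD + "int")
b = ("99", XSD + "integer")

term_equal = a == b                   # different datatype IRIs -> False
value_equal = int(a[0]) == int(b[0])  # same numeric value      -> True
print(term_equal, value_equal)  # -> False True
```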

I have patched frontend/query.c in the branch https://github.com/garlik/4store/tree/xsd_int_integer

And at least the query

  SELECT * WHERE { ?s <http://test.com/pred> "99"^^xsd:int }

... now returns correct results.

See patch here 06d0346

I'll be adding a test and if you guys agree I'll merge with master.

Test failures on Arch Linux

Running 'make test' with git master, and no avahi service running, all the tests fail except for 'integrity'. With avahi running, only 'functions-hash' and 'simple-as' fail.
Should I worry about this?
In comparison, the release tarball of 1.1.4 fails on the test 'optional-no-lhs'.

Versions:
Rasqal 0.9.29
Raptor 2.0.8
Avahi 0.6.31

An example gcc invocation during the build:
gcc -DHAVE_CONFIG_H -I. -I../.. -Wall -g -std=gnu99 -I.. -DGIT_REV="\"v1.1.4-236-g340aa14\"" -I/usr/include/glib-2.0 -I/usr/lib/glib-2.0/include -march=native -mtune=native -O2 -pipe -fstack-protector --param=ssp-buffer-size=4 -D_FORTIFY_SOURCE=2 -MT passwd.o -MD -MP -MF .deps/passwd.Tpo -c -o passwd.o passwd.c

Gist of the config.log: https://gist.github.com/3008513

filter OR on predicates - wrong results

Queries like the one below confuse conjunctive and disjunctive behavior in query-backend.c. This has been reported on the support mailing list.

The following query, run against the FOAF file in data, returns only one row:

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT *
WHERE { <mailto:[email protected]> ?p ?o . 
FILTER (?p = <http://xmlns.com/foaf/0.1/knows> || ?p = <http://xmlns.com/foaf/0.1/homepage>) }

?p  ?o
<http://xmlns.com/foaf/0.1/homepage>    <http://inanna.ecs.soton.ac.uk/>

The correct output is:

<http://xmlns.com/foaf/0.1/homepage>    <http://inanna.ecs.soton.ac.uk/>
<http://xmlns.com/foaf/0.1/knows>   <local:stripes>
<http://xmlns.com/foaf/0.1/knows>   <local:nick>
<http://xmlns.com/foaf/0.1/knows>   <local:dajobe>
<http://xmlns.com/foaf/0.1/knows>   <local:libby>
<http://xmlns.com/foaf/0.1/knows>   <local:jo>

FILTER ( "2010-03-09"^^xsd:dateTime < "2010-03-10"^^xsd:dateTime ) is false
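That FILTER should evaluate to true, since 2010-03-09 precedes 2010-03-10; a quick sanity check of the expected ordering outside SPARQL:

```python
from datetime import datetime

# The two lexical forms parse to comparable timestamps;
# the earlier date should compare as less-than.
d1 = datetime.fromisoformat("2010-03-09")
d2 = datetime.fromisoformat("2010-03-10")
print(d1 < d2)  # -> True
```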

NEW FEATURE - ACLs for graphs

Access Control List (ACL) for graphs - early draft.

I need to have some sort of access control lists for graphs. The requirements for this are:

  • Graphs by default are public to all users.
  • Graphs that are part of an ACL are only accessible to users in the ACL.
  • Admin users can access both private and public graphs.
  • ACLs can be modified as graphs are added/removed, without having to restart KBs.
  • Users are identified by an HTTP parameter (apikey).
  • 4s doesn't need to authenticate the apikey; that is supposed to be done by a proxy.

Following this simple spec I have implemented a first version released here:
https://github.com/garlik/4store/tree/graph_acl

An example configuration file would look like:

[ADMIN]
admin_users=token_admin_1;token_admin_2
allow_no_apikey=no
blacklist_apikeys=token_rotten1;token_rotten2

[ACL]
http://example.org/alice=token_user_1;token_user_2
http://example.org/one=token_user_2;
http://example.org/bob=token_user_3;

This initial version gets the ACL list via a config file (KB level). Even though this fits my use case, it might not be suitable for others, e.g. cases where there are thousands of graphs with long ACLs.
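For illustration, the config format above can be read with a stock INI parser; a sketch (restricting the delimiter to "=", since the [ACL] keys are graph URIs containing ":"):

```python
import configparser

# Parse the ACL config sketched above. delimiters must exclude ":" so
# graph-URI keys survive; optionxform=str keeps them case-sensitive.
cfg = configparser.ConfigParser(delimiters=("=",))
cfg.optionxform = str
cfg.read_string("""\
[ADMIN]
admin_users=token_admin_1;token_admin_2
allow_no_apikey=no

[ACL]
http://example.org/alice=token_user_1;token_user_2
http://example.org/one=token_user_2;
""")

acl = {graph: [t for t in tokens.split(";") if t]
       for graph, tokens in cfg["ACL"].items()}
print(acl["http://example.org/alice"])  # -> ['token_user_1', 'token_user_2']
```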

Implementation details

The apikey is received in the httpd process. A list of graphs that this apikey cannot access is obtained and passed to query-cache.c via fs_query.

In query-cache.c the function fs_bind_cache_wrapper is now a wrapper for fs_bind_cache_wrapper_intl. The fs_rid ***results returned by fs_bind_cache_wrapper_intl are filtered according to q->inv_acl. inv_acl is a fs_rid_set that contains the graphs that are NOT accessible to the query; inv_acl stands for Inverse Access Control List.

If inv_acl is empty or null (all graphs are readable), there is no performance penalty. If inv_acl has elements, there is a small penalty: the cycles it takes to traverse the results and remove the discarded rows. There is no physical removal or memory reallocation of the results; the code simply shifts rows and changes the length of the fs_rid_vector.

Implementing the filtering at this level makes the most of the bind cache.
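A minimal sketch of that filtering strategy (plain Python standing in for the fs_rid_vector handling; the column layout is an assumption): rows whose graph is in the inverse ACL are dropped by shifting the kept rows left in place and shrinking the logical length, with no reallocation.

```python
def filter_by_inv_acl(rows, model_col, inv_acl):
    """Drop rows whose graph (model) is in inv_acl, in place."""
    kept = 0
    for row in rows:
        if row[model_col] not in inv_acl:
            rows[kept] = row  # shift kept rows left; no new allocation
            kept += 1
    del rows[kept:]           # shrink the logical length
    return rows

rows = [("g1", "s1"), ("g2", "s2"), ("g1", "s3")]
print(filter_by_inv_acl(rows, 0, {"g2"}))  # -> [('g1', 's1'), ('g1', 's3')]
```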

ACL modification via httpd

httpd, via the path configacl, accepts POST requests to change ACLs. Only admin users can make this request. If the request is accepted, the new configuration is persisted and reloaded. This is wrapped in a mutex, config_mutex, to avoid race conditions; the lock is held only while the config file is modified or the inverse ACL is processed.

This needs to be reimplemented at some point. Integration via 4s-admin commands would be neat.

TODO

  • Clean up httpd.c: create separate files to handle all the ACL logic. Keeping it all in httpd.c is just confusing.
  • Tests.
  • Make fs_bind_cache_wrapper discard rows in a single loop.
  • The acl-list.ini path is hardcoded. Ugly.
  • Move the configuration to /etc/4store.conf and use per-KB blocks.
  • 4s-admin commands to edit ACLs, instead of an HTTP POST of the full config file, which is very ugly and potentially dangerous.
  • Add the apikey to the query log.
  • Wildcard configuration in the config file.
  • A flag to enable/disable all the ACLs.

All ideas/comments to improve this feature are welcome.

Solution modifiers ignored in sub-selects

The following returns ten rows from 4store, whereas I would expect it to return four. Redland's rasqal/roqet returns four rows.

select ?s ?p ?o where {
  select ?s ?p ?o where {
    ?s ?p ?o
    } limit 4
  } limit 10

Issue with INSERT WHERE

I am trying to insert values using INSERT WHERE. The query is as follows,

INSERT {
<http://some.com#test2> rdf:type <http://dbpedia.org/ontology/Floor> .
?URI1257846444363706 <http://dbpedia.org/ontology/TestFloor> <http://some.com#test2> .
<http://some.com#test2> <http://dbpedia.org/ontology/floorNo> 'B11' .
}
WHERE {
?URI1257846444278864 rdf:type <http://dbpedia.org/ontology/Campus> .
?URI1257846444278864 <http://dbpedia.org/ontology/hasCampusCode> 'ABC' .
?URI1257846444363706 rdf:type <http://dbpedia.org/ontology/Building> .
?URI1257846444278864 <http://dbpedia.org/ontology/Building> ?URI1257846444363706 .
?URI1257846444363706 <http://dbpedia.org/ontology/hasBuildingCode> 'XYZ' .
}

When I run the above query the triples in INSERT are not inserted in the store (I tried retrieving the triples using a SELECT query, but it returns no results). I have checked all the triples in WHERE clause and all of them do exist in the store.

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT * {
?URI1257846444278864 rdf:type <http://dbpedia.org/ontology/Campus> .
?URI1257846444278864 <http://dbpedia.org/ontology/hasCampusCode> 'ABC' .
?URI1257846444363706 rdf:type <http://dbpedia.org/ontology/Building> .
?URI1257846444278864 <http://dbpedia.org/ontology/Building> ?URI1257846444363706 .
?URI1257846444363706 <http://dbpedia.org/ontology/hasBuildingCode> 'XYZ' .
}

Results:

http://some.com/Ontology/2012.owl#ranfa1087b9-6cee-4433-a4d3-816e9b1af208 http://some.com/Ontology/2012.owl#ran1224548700931885

But the same INSERT WHERE with just one variable works fine:

INSERT {
<http://some.com#test2> rdf:type <http://dbpedia.org/ontology/Floor> .
?URI1257846444363706 <http://dbpedia.org/ontology/TestFloor> <http://some.com#test2> .
<http://some.com#test2> <http://dbpedia.org/ontology/floorNo> 'B11' .
}
WHERE {
?URI1257846444363706 rdf:type <http://dbpedia.org/ontology/Building> .
?URI1257846444363706 <http://dbpedia.org/ontology/hasBuildingCode> 'XYZ' .
}

Is something wrong with the first INSERT WHERE?

The google group thread for the above is here -> https://groups.google.com/group/4store-support/browse_thread/thread/94f3a1edd26ab5ec

log directory hardcoded

In src/http/httpd.c, the path to the log dir is hard-coded:

char *filename = g_strdup_printf("/var/log/4store/query-%s.log", kb_name);

Please make this configurable; I have to patch it manually to install under $HOME.
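A sketch of the requested behavior (the FS_LOG_DIR variable name is hypothetical, not an existing 4store option): honor an environment override and fall back to the current hardcoded default.

```python
import os

def query_log_path(kb_name: str) -> str:
    """Build the query-log path, honoring a hypothetical FS_LOG_DIR
    override and defaulting to the current hardcoded directory."""
    log_dir = os.environ.get("FS_LOG_DIR", "/var/log/4store")
    return f"{log_dir}/query-{kb_name}.log"

print(query_log_path("demo_kb"))
```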

DELETE with FILTER

Reported by Samuel Morello, http://groups.google.com/group/4store-support/browse_thread/thread/e46a0345a0ee82f5?hl=en_US

DELETE { ?s ?p ?o } WHERE { ?s ?p ?o . ?s <http://www.w3.org/2000/01/rdf-schema#label> ?o . FILTER ( langMatches(lang(?o), 'FR') ) }

This has no effect, but the following deletes all the triples with an rdfs:label predicate:

DELETE { ?s <http://www.w3.org/2000/01/rdf-schema#label> ?o } WHERE { ?s <http://www.w3.org/2000/01/rdf-schema#label> ?o . FILTER ( langMatches(lang(?o), 'FR') ) }
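For reference, langMatches performs case-insensitive basic language-range matching (per RFC 4647), so the range 'FR' matches both "fr" and "fr-CA" tags; a minimal sketch:

```python
def lang_matches(tag: str, rng: str) -> bool:
    """Basic language-range filtering: case-insensitive match of the
    range against the tag on subtag boundaries; "*" matches any
    non-empty tag."""
    if rng == "*":
        return tag != ""
    tag, rng = tag.lower(), rng.lower()
    return tag == rng or tag.startswith(rng + "-")

print(lang_matches("fr", "FR"), lang_matches("fr-CA", "fr"))  # -> True True
```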

invalid URIs when projecting var with an expression

A query that uses an expression to rename a variable (?s as ?term):

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>  
SELECT DISTINCT (?s as ?term) ?synonym WHERE { 
   GRAPH <http://bioportal.bioontology.org/ontologies/NIF> {
       ?s <http://www.geneontology.org/formats/oboInOwl#hasExactSynonym> ?o .
       ?o rdfs:label ?synonym .
   }
} LIMIT 10

returns invalid URIs in the first column:

"http://purl.obolibrary.org/obo/PR_000024163"@_:b9  "UPase"@EN
"http://purl.org/obo/owl/GO#GO_0042660"@_:b9    "up regulation of cell fate specification"@EN
"http://purl.obolibrary.org/obo/PR_000006312"@_:b9  "doublecortin-like and CAM kinase-like 3"@EN
"http://purl.obolibrary.org/obo/PR_000027234"@_:b7  "Proto-oncogene c-Src"@EN

Without using the expression it works just fine.
