ebay / akutan

A distributed knowledge graph store

License: Apache License 2.0

Makefile 0.44% Dockerfile 0.13% Shell 0.20% Go 99.23%

go graph rdf sparql

akutan's People

superfell ongardie sathish-io michaelbernstein isgasho dav009 tomzhang influx6 samuell ongardie-ebay sprinterzzj rogerspy watsonso mbrukman raykroeker hhy5277 almoslmi sujeetv lexsf reynoldsm88 zhouyonglong feifeiiiiiiiiiii th3architect keevol vicever maniacs-db shaunstanislauslau 0xflotus lifool lifeisstrange iamsingularity hmzzrcs kustomzone karthik-cbe tobym linuxerwang skymysky crakeyboy lotapp shammishailaj blueskychina adelowo b-xiang mewbak kxion awesomegolang boubou818 roberthorlings bonedaddy chenrui2014 shanba carrewei hannson gaohuan2015 wh-gd ethanlovequeen gavinljj daniel-007 distributed-systems-hub m00zh33 andydodo backwardn ai-learn-use forkkit asanchez75 xyuan volodymyrss ming-fork youlei5898y lanxingmo zzszmyf cesar456 lzpfmh linus5 lambert764 databill86 db4u zzti jnkhunter joshgay lxngoddess5321 amarjitghuman isabella232 huangweiboy2 linsicai dilshan23 521hellogithub socioprophet vishalbelsare sunatthegilddotcom ramch22 b-street worldup matteo-grella demonoid81 bearerpipelinetest aahmadai wmudge logicaltrojan jnbdz

akutan's Issues

Remove old queryFacts and insertFacts paths

There's a large amount of cruft in the API server to support the old way of doing queries and inserts. It might also be confusing to newcomers.

Adopt SPARQL's literal value format

The current format for literals is based on SPARQL, but is not SPARQL. It uses a slightly different format so that the underlying concrete type can be set (e.g. 10 instead of "10"^^xsd:int). Instead, it should take the type from the type specifier and allow some way for custom schemas to map their types to the xsd types, so you can write "10"^^uom:inch and still get an int64 value.
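A minimal Go sketch of the idea, not the project's implementation: the type is read from the `^^` specifier and looked up in a table that maps datatype names to concrete kinds. The `datatypeKinds` table, the `parseTypedLiteral` function, and the use of Go's `strconv.Unquote` for the quoted part (its escape rules differ slightly from SPARQL's) are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// datatypeKinds maps a datatype name to the concrete kind used for storage.
// This table is hypothetical; "uom:inch" stands in for a custom schema type
// that has been mapped onto an integer.
var datatypeKinds = map[string]string{
	"xsd:int":     "int64",
	"xsd:integer": "int64",
	"xsd:double":  "float64",
	"xsd:string":  "string",
	"uom:inch":    "int64",
}

// parseTypedLiteral parses a literal of the form "lexical"^^prefix:type and
// returns a Go value whose concrete type is chosen by the type specifier.
func parseTypedLiteral(s string) (interface{}, error) {
	idx := strings.LastIndex(s, "^^")
	if idx < 0 {
		return nil, fmt.Errorf("literal %q has no ^^ type specifier", s)
	}
	lexical, err := strconv.Unquote(s[:idx])
	if err != nil {
		return nil, fmt.Errorf("literal %q: bad quoted value: %v", s, err)
	}
	datatype := s[idx+2:]
	kind, ok := datatypeKinds[datatype]
	if !ok {
		return nil, fmt.Errorf("unknown datatype %q", datatype)
	}
	switch kind {
	case "int64":
		n, err := strconv.ParseInt(lexical, 10, 64)
		if err != nil {
			return nil, err
		}
		return n, nil
	case "float64":
		f, err := strconv.ParseFloat(lexical, 64)
		if err != nil {
			return nil, err
		}
		return f, nil
	default:
		return lexical, nil
	}
}

func main() {
	for _, lit := range []string{`"10"^^xsd:int`, `"10"^^uom:inch`, `"1.5"^^xsd:double`} {
		v, err := parseTypedLiteral(lit)
		fmt.Printf("%s -> %T %v (err=%v)\n", lit, v, v, err)
	}
}
```

Keeping the mapping in a table means a custom schema only has to register its type names; the parser itself does not change.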

Can't see how to resolve views.proto for Ubuntu 18.04

Very excited to try this....

However, I can't seem to get past some of the dependencies on Ubuntu 18.04.

I'm not sure how to resolve "views.proto":

go install vendor/golang.org/x/tools/cmd/goimports
go install vendor/honnef.co/go/tools/cmd/staticcheck
PATH=/home/fils/src/git/beam/bin:/home/fils/.cargo/bin:/home/fils/bin:/home/fils/.cargo/bin:/home/fils/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/usr/lib/jvm/java-10-oracle/bin:/usr/lib/jvm/java-10-oracle/db/bin:/usr/local/go/bin:/usr/local/go/bin:/home/fils/src/go/bin:/home/fils/.cargo/bin:/home/fils/src/flutter/bin:/home/fils/.local/bin:/usr/local/android-studio/bin:/usr/lib/dart/bin:/home/fils/.pub-cache/bin:/usr/lib/jvm/java-10-oracle/bin:/usr/lib/jvm/java-10-oracle/db/bin:/usr/local/go/bin:/usr/local/go/bin:/home/fils/src/go/bin:/home/fils/.cargo/bin:/home/fils/src/flutter/bin:/home/fils/.local/bin:/usr/local/android-studio/bin:/usr/lib/dart/bin:/home/fils/.pub-cache/bin protoc --gogoslick_out=plugins=grpc:src/github.com/ebay/beam/rpc -Isrc:src/vendor:src/github.com/ebay/beam/rpc views.proto
views.proto: No such file or directory
Makefile:54: recipe for target 'src/github.com/ebay/beam/rpc/views.pb.go' failed
make: *** [src/github.com/ebay/beam/rpc/views.pb.go] Error 1

Log prefix truncation

When using the logspec client, the cluster needs to regularly calculate a safe truncation point, and tell the log store to truncate the prefix of the log to that index.

The API server can consult with all the views to determine the smallest safe point across the cluster. Either the API tier can then issue the delete, or for more complex deployments where a single log is used across multiple DCs, and each DC has its own beam cluster, you'd probably want this truncation to be managed by a separate control plane.
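As a rough Go sketch of the "smallest safe point across the cluster" calculation: assume each view can report the highest log index it no longer needs, and take the minimum over all views. The function name and the shape of the input are hypothetical; how views actually report this is not described here.

```go
package main

import "fmt"

// safeTruncationIndex returns the smallest safe index reported by any view.
// Truncating the log prefix up to this index cannot remove entries that a
// view still needs. ok is false when no view has reported, in which case
// nothing should be truncated.
func safeTruncationIndex(viewSafeIndexes []uint64) (index uint64, ok bool) {
	if len(viewSafeIndexes) == 0 {
		return 0, false
	}
	lowest := viewSafeIndexes[0]
	for _, idx := range viewSafeIndexes[1:] {
		if idx < lowest {
			lowest = idx
		}
	}
	return lowest, true
}

func main() {
	// Hypothetical safe indexes reported by three views.
	idx, ok := safeTruncationIndex([]uint64{1200, 950, 1100})
	fmt.Println(idx, ok)
}
```

In the multi-DC case the same calculation would need the reports from every cluster sharing the log, which is one reason to move it into a separate control plane.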

Adopt RDF/Sparql's entity format

Entities are specified by using <entity> or prefix:entity. When a prefix is used, no URI is associated with it, and the prefix itself is used for sorting etc. This should be updated to match Sparql, to allow URIs to be associated with prefixes, and to handle sorting correctly. How this ends up encoded in the eventual KV store key needs some thought as well.
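A small Go sketch of SPARQL-style prefix handling, under the assumption that prefixes are held in a simple map from prefix name to base URI. `expandEntity` is a hypothetical helper, not part of the existing code. Expanding to the full URI before comparing means entities sort by URI rather than by whichever prefix happened to be used.

```go
package main

import (
	"fmt"
	"strings"
)

// expandEntity turns prefix:entity into <baseURI + entity> using the given
// prefix table. Entities already written as <...> are returned unchanged.
func expandEntity(prefixes map[string]string, entity string) (string, error) {
	if strings.HasPrefix(entity, "<") && strings.HasSuffix(entity, ">") {
		return entity, nil
	}
	i := strings.Index(entity, ":")
	if i < 0 {
		return "", fmt.Errorf("entity %q is neither <entity> nor prefix:entity", entity)
	}
	base, ok := prefixes[entity[:i]]
	if !ok {
		return "", fmt.Errorf("no URI registered for prefix %q", entity[:i])
	}
	return "<" + base + entity[i+1:] + ">", nil
}

func main() {
	prefixes := map[string]string{
		"rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
	}
	for _, e := range []string{"rdf:type", "<http://example.com/thing>"} {
		full, err := expandEntity(prefixes, e)
		fmt.Println(full, err)
	}
}
```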

Hard to use beam, need more details

I must say beam is hard to use; more documentation for users is strongly recommended.

On this point, dgraph is much better, because I know how to deploy my own distributed dgraph system. Loading data through its gRPC interface is not fast, but I think beam has the same problem.

What's more, I can't even find a way to load large RDF files, which is too bad.

use of vendored package not allowed

Hi,
While building akutan with make build, I am facing the following error.

go install github.com/ebay/akutan/...
src/vendor/golang.org/x/net/http2/frame.go:17:2: use of vendored package not allowed
src/vendor/google.golang.org/grpc/internal/transport/controlbuf.go:28:2: use of vendored package not allowed
src/vendor/golang.org/x/net/http2/transport.go:33:2: use of vendored package not allowed
/usr/local/go/src/vendor/golang.org/x/text/secure/bidirule/bidirule.go:15:2: use of vendored package not allowed
/usr/local/go/src/vendor/golang.org/x/net/idna/idna10.0.0.go:27:2: use of vendored package not allowed
src/vendor/golang.org/x/net/http2/frame.go:18:2: use of vendored package not allowed
make: *** [Makefile:89: build] Error 1

Any help appreciated.

Is there a command line tool to import data?

My RDF file is 60+ GB, and inserting it over gRPC is going to take a long time. Is there a command line tool for this?

Logo?

I have added Beam to my encyclopedia of databases:

https://dbdb.io/db/beam

Do you have a logo that I can include? Thanks!

-- Andy

prod deployment docs

I am still confused about how to deploy my own beam cluster, and I have no time to work through the project structure. Can anyone tell me how to deploy a production beam server? Are there any docs?

Originally posted by @shanghai-Jerry in #33 (comment)

Error when running "make run"

Running "bin/plank" shows "plank server started at localhost:20011". However, running "make run" produces errors:
22:23:03 txview-00 | WARN[2019-05-08 14:23:03.055084 UTC]src/github.com/ebay/beam/blog/logspecclient/client.go:222 github.com/ebay/beam/blog/logspecclient.(Log).Read() Retrying RPC=Read error="rpc error: code = Canceled desc = grpc: the client connection is closing" server="tcp://node24:20011"
22:23:03 txview-00 | INFO[2019-05-08 14:23:03.055528 UTC]src/github.com/ebay/beam/blog/logspecclient/client.go:455 github.com/ebay/beam/blog/logspecclient.(Log).connectAnyLocked.func1() Logspec client connecting to server="tcp://node24:20011"
22:23:10 hashsp-00 | WARN[2019-05-08 14:23:10.884702 UTC]src/github.com/ebay/beam/blog/logspecclient/client.go:222 github.com/ebay/beam/blog/logspecclient.(Log).Read() Retrying RPC=Read error="rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial tcp 218.93.250.18:20011: i/o timeout"" server="tcp://node24:20011"
22:23:10 hashsp-00 | INFO[2019-05-08 14:23:10.885279 UTC]src/github.com/ebay/beam/blog/logspecclient/client.go:455 github.com/ebay/beam/blog/logspecclient.(Log).connectAnyLocked.func1() Logspec client connecting to server="tcp://node24:20011"
22:23:10 hashsp-01 | WARN[2019-05-08 14:23:10.907590 UTC]src/github.com/ebay/beam/blog/logspecclient/client.go:222 github.com/ebay/beam/blog/logspecclient.(*Log).Read() Retrying RPC=Read error="rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial tcp 218.93.250.18:20011: i/o timeout"" server="tcp://node24:20011"

What should I do?

Is akutan still under active development and maintenance?

Hi there,

I want to know whether eBay/akutan is still under active development or maintenance.

thanks

Local mode

Can all functions be used in local mode?

don't assume all predicates are transitive.

Currently the query engine assumes that all predicates are transitive, unless it knows the target object is a literal. This is a pretty expensive default. It would be better to treat only predicates explicitly declared as transitive as transitive. The owl:TransitiveProperty predicate seems like the best thing to use to indicate that.
The query rewriter could be updated to fetch this property for all the predicates used in the query, and then pass this info along with the rest of the query structure.
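The rewriter step could look roughly like the following Go sketch. `queryLine` is a hypothetical, simplified stand-in for a parsed query line, and `declared` is assumed to be the set of predicates that the rewriter found to be declared as owl:TransitiveProperty; neither reflects the actual query structures.

```go
package main

import "fmt"

// queryLine is a hypothetical, simplified stand-in for one line of a query.
type queryLine struct {
	Subject, Predicate, Object string
	Transitive                 bool
}

// markTransitive returns a copy of lines in which Transitive is set only for
// predicates present in declared. Every other predicate is treated as
// non-transitive, which is the cheaper default.
func markTransitive(lines []queryLine, declared map[string]bool) []queryLine {
	out := make([]queryLine, len(lines))
	for i, l := range lines {
		l.Transitive = declared[l.Predicate]
		out[i] = l
	}
	return out
}

func main() {
	// Hypothetical predicates: only ex:locatedIn has been declared transitive.
	declared := map[string]bool{"ex:locatedIn": true}
	lines := []queryLine{
		{Subject: "?place", Predicate: "ex:locatedIn", Object: "ex:Europe"},
		{Subject: "?place", Predicate: "ex:color", Object: "?c"},
	}
	for _, l := range markTransitive(lines, declared) {
		fmt.Printf("%s transitive=%v\n", l.Predicate, l.Transitive)
	}
}
```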

Implement backup/restore

One could imagine using a carousel (see carousel client tool) for the heavy lifting of a backup mechanism, but what would the restore process look like?

logspec: redirect needs scope

When the server returns a redirect reply, it's not currently defined whether that's supposed to affect the current request type, all request types, or some subset. This could get us into trouble. For example, consider a client that was issuing Appends() as well as a Read() for an early prefix of the log. If the servers redirected Appends to a leader but old reads to a follower, the client could be bounced back and forth.

db-scan -keys no longer works

If you run db-scan with -keys, it doesn't pretty print the keys any more.

Improve KGObject encoding

KGObject's encoding is inherited from the earlier prototypes, where debugging was more important than performance or space used. A number of fields in the key are encoded as a 19-character ASCII number rather than as an 8-byte binary value.

Does this support deleting edges based on the type of edge?

KGObject's encoding of strings needs a null terminator

When a literal string is encoded into a KGObject value, there's a separator between the end of the string and the language ID, but as the separator is not 0x00 this throws off sorting. (for example "Bob" & "Bob's house" aren't ordered correctly). The separator should be changed to be a null instead. This separator is not used to determine the length of the string, so there's no escaping considerations to be concerned about.

Using beam as a library, or at least allow importing its packages

Unless I'm missing something (a canonical package name maybe?), beam's structure makes it really really hard to go get any of its packages.

Ideally I'd be really interested in being able to use beam as a library in another service, skipping the gRPC server part, clustering, and optionally even rocksdb.

But even if that's not possible, being able to import and use util/grpc/client to connect to a beam server would be a big improvement over having to re-generate the gRPC stuff in the client service.

Would there be any chances of getting any of these or accepting PRs for them or other ways of getting to the end goal (other than keeping our own forks)?

Move the beam packages to the top level
^ or introduce a canonical url we can import that points to the nested directory.
Get the protoc, genny, and other generated files committed.
Replace the custom dep tool with go mod (sarama needs to use the new package, and cheggaaa/pb needs to have their mod fix pr merged and the new version imported and everything else seems fine).

ps This is an amazing project and bql feels very nice and easy to use, thank you for releasing this publicly. :)

SPARQL 1.2 W3C Community Group

Hi everyone,

Great to see your work with beam, looks very interesting and we are really interested in the discussions you are having about aligning with RDF/SPARQL. I'm referring to this document for example. I think you might be a valuable contributor to a W3C Community Group we just recently launched after a meetup in Berlin in February. It's about defining what could become SPARQL 1.2 and later maybe 2.0.

This is a W3C Community Group, which is much less formal and more open than a formal W3C group to release a standard. The ones driving it right now are all people that work with SPARQL for many years and some of them implement it as well, so it is very hands-on.

We are interested in making SPARQL easier to use and adding features many of us are missing right now. Having people like you involved sounds like a great extension to a possible new standard in the future.

Feel free to close this issue immediately, I just wanted to make sure that you are aware of what is going on there.

You can find a list of collected ideas so far in the GitHub repository.

Log service

There should be a server implementation of the logspec service. Beam was moving from using Kafka to using this abstract log service definition for its log. The Kafka client needs some work for production usage.

How to delete Beam's existing Disk Views before restarting the cluster

You'll need to clear out Beam's Disk Views' data before restarting the cluster.

Implement delete

The Update RFC details an approach for delete, but this is not currently implemented.

change optional match syntax in query

Query supports an optional match operator; however, the syntax is subtle (a trailing ? on the predicate), and the part of the query that is optional is determined by that query line's subject & object variables. This gets more confusing if there are multiple optional matches. This should be moved to SPARQL's optional match format, where it's much clearer both that there is an optional match and what the optional match is made up of.

godoc.org import paths broken

Beam requires a GOPATH set to the root of the repo, but godoc.org doesn't seem to get that.

We have to use https://godoc.org/github.com/eBay/beam/src/github.com/ebay/beam to get to the docs, rather than https://godoc.org/github.com/eBay/beam.
facts, msg, tools, and util are directories that don't contain any go files, and godoc.org doesn't appear to discover them on its own. We might be able to work around this by adding doc.go files to those directories.
Our import paths on godoc.org are all broken, so links between package types don't work. I don't know what we can do about that without moving away from GOPATH.

Control Plane

There's no control plane. Given the architecture, small or test clusters can usually be managed by hand, but ideally there's a control plane that can manage adding / removing views, deployments, backup, etc. The Control Plane document discusses this in more detail.