trellis-ldp / trellis

Trellis is a platform for building scalable Linked Data applications

Home Page: https://www.trellisldp.org

License: Apache License 2.0

Shell 0.07% Java 99.78% Mustache 0.15%

ldp rdf linked-data trellisldp rest-api microprofile

trellis's Introduction

Trellis Linked Data Server

A scalable platform for building linked data applications.

Trellis is a rock-solid, enterprise-ready linked data server. The quickest way to get started with Trellis is to use a pre-built docker container.

Trellis is built on existing Web standards. It is modular, extensible and fast.

All source code is open source and licensed under the Apache License 2.0. Contributions are always welcome.

Docker Containers

Docker containers for Trellis are published on Docker Hub. Container environments are published with every commit to the main branch and are available for all stable releases. More details are available on the Trellis Wiki.

Docker pull command

docker pull trellisldp/trellis-triplestore

Or, for the PostgreSQL-based persistence layer

docker pull trellisldp/trellis-postgresql

Building Trellis

In most cases, you won't need to compile Trellis. Released components are available on Maven Central, and the deployable application can be downloaded directly from the Trellis website. However, if you want to build the latest snapshot, you will need, at the very least, to have Java 11+ available. The software can be built with Maven using this command:

./mvnw install

Related projects

py-ldnlib: A Python3 library for linked data notifications
static-ldp: A PHP application that serves static files as LDP resources
camel-ldp-recipes: Integration workflows built with Apache Camel

trellis's Issues

Migrate the various messaging clients into the core trellis repository

This includes trellis-ldp/trellis-jms, trellis-ldp/trellis-amqp and trellis-ldp/trellis-kafka

Add an OSGi-based deployment option

It should be possible to deploy the Trellis triplestore-based implementation entirely in an OSGi container. This would involve writing up an OSGi-based wiring (e.g. Blueprint) and including some PaxExam tests.

Ideally, this will end up as a deployable kar file that one can just drop into an existing Karaf instance.

Add a Resource::isDeleted method

After a Resource is deleted, subsequent requests for the resource can result in either a 404 Not Found or a 410 Gone response, depending on what the ResourceService implementation supports. The 404 Not Found case is simple: a ResourceService::get request returns an empty Optional. However, for the 410 Gone case, the code currently goes through some contortions that involve checking for <> a ldp:Resource, trellis:DeletedResource triples. While this works, it is not a very clean design. It would be much better for the Resource interface to just include a method such as:

default Boolean isDeleted() {
    return false;
}

This way, if a ResourceService implementation has some special rules around deleted resources, those rules can be restricted to that part of the code (as opposed to leaking into the HTTP layer, as it does now).
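A minimal sketch of how such a default method could replace the triple inspection when choosing between 404 and 410. The Resource interface below is a hypothetical, cut-down stand-in for the real Trellis interface, and DeletedStatus and statusFor are illustrative names that do not exist in Trellis.

```java
import java.util.Optional;

// Hypothetical, cut-down stand-in for the Trellis Resource interface.
interface Resource {
    String getIdentifier();

    // Proposed default: implementations without special deletion rules inherit this.
    default Boolean isDeleted() {
        return false;
    }
}

// Illustrative helper (not part of Trellis) showing the decision the HTTP layer would make.
final class DeletedStatus {
    private DeletedStatus() { }

    // Maps a lookup result onto the status codes described in the issue.
    static int statusFor(Optional<Resource> lookup) {
        if (!lookup.isPresent()) {
            return 404; // Not Found: the service returned an empty Optional
        }
        return lookup.get().isDeleted() ? 410 : 200; // Gone vs. OK
    }
}
```

With this shape, an implementation that tracks deleted resources only needs to override isDeleted(); the caller never looks at the resource's triples.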

Consolidate karaf features files

Create a new module: trellis-karaf containing the consolidated features.xml files from each subproject.

Add CDI support to HTTP (and other) classes

CDI offers a standard Java EE-based dependency injection framework that could greatly simplify object creation in Trellis.

Move buildtools into this project

The build tools are currently part of an external project. It would be more convenient if they were part of this project.

`ResourceService::getIdentifierSupplier` vs. `IdentifierService`

ResourceService features Supplier<String> getIdentifierSupplier(), but IdentifierService features Supplier<String> getSupplier() and two other similar methods.

Is this an oversight in some sense, or are the semantics of the two distinct? If we can explicitly lay out those semantics, I'll enrich the Javadocs appropriately.

Deleting an ACL resource should actually be a replace operation

An HTTP DELETE operation on an ACL resource issues a delete event for the resource, but this should really be an update event, since the resource itself hasn't been deleted.

Add support for asynchronous HTTP

Now that o.t.api.ResourceService::create returns a Future, it should be possible to enable asynchronous HTTP write operations. At the very least, using AsyncResponse could help speed things up.

Rework Memento-related interfaces

The current Resource interface contains a method for retrieving mementos: getMementos. While this can certainly work (it works fine for the kafka-based implementation), it means that the resource subsystem is tied to the versioning subsystem, which may not necessarily be the case. At least, I can think of cases where the two could easily be separated. As such, I would like to propose removing the Resource::getMementos method, and adding something similar to the ResourceService interface.

Trellis Prefix

The TRELLIS_PREFIX value is currently set as trellis:. This leads to root containers with only that value. While Jena allows this as a valid IRI, not all commons-rdf implementations do. It would be much better to start using trellis:data as a root container. This will likely involve removing the TRELLIS_PREFIX constant and adding two new constants:

TRELLIS_SCHEME = "trellis:"
TRELLIS_DATA = TRELLIS_SCHEME + "data"

Ignore server-managed triples on PUT and POST

The LDP specification allows servers to ignore server-managed triples on PUT and POST. It would be convenient for Trellis to ignore ldp:contains triples on mutating requests, which would make it possible to retrieve a container resource, make changes locally and then PUT that resource back to the server (and to do so without having to filter out ldp:contains triples).
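The filtering step could look roughly like the sketch below. It assumes a simplified string-based triple (SimpleTriple) in place of the Commons RDF types Trellis actually uses, and ServerManagedFilter is a hypothetical name chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an RDF triple; Trellis itself works with Commons RDF types.
final class SimpleTriple {
    final String subject;
    final String predicate;
    final String object;

    SimpleTriple(String subject, String predicate, String object) {
        this.subject = subject;
        this.predicate = predicate;
        this.object = object;
    }
}

// Illustrative helper (not part of Trellis).
final class ServerManagedFilter {
    static final String LDP_CONTAINS = "http://www.w3.org/ns/ldp#contains";

    private ServerManagedFilter() { }

    // Returns the client-supplied triples minus any ldp:contains statements.
    static List<SimpleTriple> withoutContainment(List<SimpleTriple> incoming) {
        List<SimpleTriple> kept = new ArrayList<>();
        for (SimpleTriple triple : incoming) {
            if (!LDP_CONTAINS.equals(triple.predicate)) {
                kept.add(triple);
            }
        }
        return kept;
    }
}
```

Applied before persistence on PUT and POST, a filter like this would let a client send back a container representation unchanged.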

Add support for WebSub

WebSub is a W3C recommendation and it would be very easy to add support for this to Trellis (as a "publisher"). This would involve generating an additional Link header in responses.

The WebSub specification requires two Link headers in responses from publishers (e.g. for the resource /container/resource and the hub at /pubsubhub/url):

Link: <https://example.com/pubsubhub/url>; rel="hub"
Link: <https://example.com/container/resource>; rel="self"

It would likely be easiest to add this as a JAX-RS filter, which would keep it entirely separate from all of the existing implementation code. The implementation would likely be quite similar to the current CacheControlFilter -- the constructor would accept the location of the WebSub hub and that would be used to build the header in the filter method.

The second part of this (which could be part of a second ticket) would be to use a JAX-RS HTTP client to send the WebSub hub notifications of the resource changes. These notifications will take the form of a POST operation with the form values: hub.mode="publish" and hub.url=(the URL of the resource that was updated)
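As a rough illustration of both parts, the sketch below builds the two Link header values and the form-encoded publish body using only the standard library. WebSubSketch and its method names are hypothetical; wiring these values into a JAX-RS filter or HTTP client is left out.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Illustrative helper (not part of Trellis).
final class WebSubSketch {
    private WebSubSketch() { }

    // The two Link header values a publisher includes in its responses.
    static List<String> linkHeaders(String hubUrl, String resourceUrl) {
        return Arrays.asList(
                "<" + hubUrl + ">; rel=\"hub\"",
                "<" + resourceUrl + ">; rel=\"self\"");
    }

    // Form-encoded body for the POST that tells the hub a resource has changed.
    static String publishBody(String resourceUrl) {
        return "hub.mode=publish&hub.url="
                + URLEncoder.encode(resourceUrl, StandardCharsets.UTF_8);
    }
}
```

The body would be sent with a Content-Type of application/x-www-form-urlencoded, which is why the resource URL is percent-encoded.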

Support random access to Binary resources via the BinaryService

The BinaryService interface offers a single method for fetching resources:

Optional<InputStream> get(IRI);

In the case where a client requests a partial resource (via Range header requests), the HTTP layer currently requests the entire Binary as an InputStream, and it then drops any non-relevant segments. While this works, it is not efficient for very, very large binaries which may be partitioned into multiple blocks across different storage locations. It would be much more flexible to extend this method into something like:

Optional<InputStream> get(IRI, Range...);

Where Range could be a type such as a pair of Integers. Or perhaps there is an appropriate built-in type or something from a Commons library. Or it may be necessary to add a new type to the Trellis API (or better: generalize an existing type, such as o.t.api.VersionRange).

I am suggesting here a variable number of Range objects since the HTTP specification for range requests supports the possibility of multiple ranges.
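A minimal sketch of the idea, under several assumptions: ByteRange is a hypothetical range type with inclusive offsets, InMemoryBinary stands in for a real BinaryService, identifiers are omitted, and only a single range is handled.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.util.Arrays;

// Hypothetical range type with inclusive byte offsets, mirroring HTTP byte ranges.
final class ByteRange {
    final int from;
    final int to;

    ByteRange(int from, int to) {
        if (from < 0 || to < from) {
            throw new IllegalArgumentException("Invalid range: " + from + "-" + to);
        }
        this.from = from;
        this.to = to;
    }
}

// Illustrative in-memory store; a real service would seek within its backing storage.
final class InMemoryBinary {
    private final byte[] content;

    InMemoryBinary(byte[] content) {
        this.content = content.clone();
    }

    // Copies only the requested bytes, clamping the range to the content length.
    byte[] slice(ByteRange range) {
        int start = Math.min(range.from, content.length);
        int end = Math.min(range.to, content.length - 1) + 1;
        return Arrays.copyOfRange(content, start, end);
    }

    // Streams the requested segment only, rather than the whole binary.
    InputStream get(ByteRange range) {
        return new ByteArrayInputStream(slice(range));
    }
}
```

The point of the design is that the storage layer decides how to reach the requested bytes, so the HTTP layer no longer has to read and discard the leading part of a large binary.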

Superclass for `*HandlerTest`?

There are *HandlerTests for all the various PutHandler, DeleteHandler, etc. I suspect (might be wrong!) that we could abstract over them and pull up some of the tests using some generics tricks, and this issue is a note to self to try it.

Add support for acl:origin

https://www.w3.org/wiki/WebAccessControl#Cors_User_Agents

Add Authorization tests

There are a lot of unit tests for authorization workflows, but there should also be end-to-end testing of WebAC and JWT authorization.

It appears that the security context is not being set when presented with a valid JWT token. See: https://groups.google.com/d/msg/trellis-ldp/3NCF4vIt788/HbGwIl8oAQAJ

Add a WebSocket-based notification system

The SOLID specification recommends using WebSockets for notifications, and I think this would be a good way to provide an embedded message broker for Trellis.

There are a few outstanding questions to resolve w/r/t implementation. First, SOLID suggests making websocket endpoints available on each resource. This seems like a very interesting idea, as the notifications can be scoped to a particular resource or container of resources.

1. If a websocket is initiated at resource /foo, should notifications from all contained resources be made available (i.e. recursively), or just those that relate to that resource/container?
2. If 1. is answered in the negative, is there a resource location where all notifications can be accessed?

Add a deployable application to the Trellis repository

This application would make use of the various service implementations in the repository.

Add session information to mutating ResourceService methods

The mutating ResourceService methods (::create, ::replace and ::delete) will cause events to be emitted. Before the recent refactor in #31, this data had been available in the Audit-related triples, but now that data is only included in the ::add method. By including a Session value (or similar) in the mutating methods, it would be possible to get access to these data.

This would also be an opportunity to remove the use of the default graph in these methods, the purpose of which relates only to providing data to the event producer. This likely will require adding a field (e.g. baseURL) to the Session type. This will also lead to some cleaner code in the HTTP layer.

Upgrade to latest commons-rdf

Commons-RDF 0.5.0 is currently being voted on. Once released, the trellis code should be updated to use this. With commons-rdf-api/0.5.0, it will be possible to start offering OSGi deployment support in Trellis.

HTTP/2 over TLS support requires ALPN dependency

To make trellis-app work with Dropwizard's h2 (HTTP/2 over TLS) configuration on JDK 9 or JDK 10, I added a dependency:

compile group: 'org.eclipse.jetty', name: 'jetty-alpn-java-server', version: '9.4.8.v20171121'

This version is built for JDK 9, but it works on JDK 10.

Add support for JPMS

I am experimenting with supporting the module system with trellis. I will use this as a tracking task.
One can view the progress here

Here is what I have discovered so far:

Error occurred during initialization of boot layer
java.lang.module.FindException: Unable to derive module descriptor for /home/christopher/.gradle/caches/modules-2/files-2.1/org.apache.geronimo.specs/geronimo-annotation_1.2_spec/1.0-alpha-1/804747c40f1145ae9cc13cb9e927fca82e6e3c1b/geronimo-annotation_1.2_spec-1.0-alpha-1.jar
Caused by: java.lang.IllegalArgumentException: geronimo.annotation.1.2.spec: Invalid module name: '1' is not a Java identifier

This seems to be because of an "illegal" artifactId ("geronimo-annotation_1.2_spec" has a "."). This is a dependency of Apache Tamaya. The module name is derived from the jar if there is not an automatic module name in the manifest. Not sure how to move forward with this, but it is a problem for all Geronimo specs. Maybe file an issue upstream?

Gradle jar tasks seem to break module-info resolution. Removing them allows the build to proceed.

The ServiceMix bundle wrapper for javax.inject does not resolve as a module. A quick fix is to add javax.inject as a dependency and add it as a requirement instead. It is not clear what this will do in OSGi.

AS message produced without provenance / type

I am working on a new camel-kafka-elasticsearch integration and have noticed a possible issue related to the triplestore resource service event implementation. For some reason (as yet unknown) the event type (e.g. https://www.w3.org/ns/activitystreams#Create) is empty.

Here is an example AS message sourced from Kafka for reference:

{
  "@context" : "https://www.w3.org/ns/activitystreams",
  "id" : "urn:uuid:525573b5-f20f-489d-99dd-33668d3534bb",
  "type" : [ ],
  "object" : {
    "id" : "http://trellis:8080/ldp-test-6b56d629-3150-4ff5-8f69-1efa672f60fb",
    "type" : [ "http://www.w3.org/ns/ldp#RDFSource", "http://www.w3.org/ns/oa#TimeState", "http://www.w3.org/ns/activitystreams#Application", "http://xmlns.com/foaf/0.1/Person", "http://www.w3.org/ns/oa#Choice", "http://www.w3.org/ns/oa#HttpRequestState", "http://purl.org/dc/dcmitype/Sound", "http://www.w3.org/ns/oa#SpecificResource", "http://www.w3.org/ns/oa#TextualBody", "http://www.w3.org/ns/oa#Annotation", "http://www.w3.org/ns/oa#TextPositionSelector", "http://www.w3.org/ns/oa#FragmentSelector", "http://www.w3.org/ns/oa#CssStyle" ]
  },
  "published" : "2018-03-07T07:15:42.159590Z"
}

The target types are populated. The triple is created. There could be a simple explanation; I will keep looking at it.

Clarify semantics of Future<Boolean> in ResourceService responses

Mutating requests in the ResourceService respond with a Future<Boolean>. The HTTP layer currently handles these responses in the following way:

Create:

Future<true> => 201 Created
Future<false> => 500 Server Error
RuntimeException => 500 Server Error

Replace/Delete:

Future<true> => 204 No Content
Future<false> => 500 Server Error
RuntimeException => 500 Server Error

The Future<true> and RuntimeException cases seem correct, but it is unclear to me whether a Future<false> ought to return a 5xx error. Perhaps a 4xx error would be more appropriate. If so, does a Future<false> indicate some form of conflict (e.g. 409) or is it a generic 400 error?
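The current mapping can be summarized in a short sketch. MutationStatus and its methods are hypothetical names, and treating an interrupted wait as a 500 is an assumption made here, not something the issue states.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;

// Illustrative helper (not part of Trellis) that mirrors the mapping listed above.
final class MutationStatus {
    private MutationStatus() { }

    static int forCreate(Future<Boolean> result) {
        return resolve(result, 201); // 201 Created on success
    }

    static int forReplaceOrDelete(Future<Boolean> result) {
        return resolve(result, 204); // 204 No Content on success
    }

    private static int resolve(Future<Boolean> result, int successStatus) {
        try {
            // A false (or null) value currently falls through to 500; this is the open question.
            return Boolean.TRUE.equals(result.get()) ? successStatus : 500;
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt(); // preserve the interrupt for callers
            return 500; // assumption: treated like any other server-side failure
        } catch (ExecutionException | RuntimeException ex) {
            return 500;
        }
    }
}
```

If a false value were later reinterpreted as a client error, only the single `: 500` branch in resolve would need to change.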

Replace VersionRange with commons lang equivalent

Rather than defining our own VersionRange type, it would be far better to use a Range<Instant> type from commons-lang.

Related to #16

Response headers in Create-on-PUT

At present, it is possible to create resources with HTTP PUT. As with all HTTP PUT operations in Trellis, the response code is 204 No Content, but for resource creation operations, it seems that responding with 201 Created would be more accurate.

Also, if the response is 201 Created, the Content-Location: header (with the value of the resource location) ought to be included.

building membership and containment messages with TriplestoreResourceService?

From what I can gather, the TripleStoreResourceService is publishing notifications to a single configured topic (default "trellis"). This is different than the FileResourceService that has an EventProducer to build containment and membership messages that are published to distinct topics read by the async processor.

Not clear to me with TripleStoreResourceService how this works. In brief evaluation today, there does not seem to be a "built-in" mechanism to do this in trellis-app yet. Is this accurate? Can you explain the design intent for this briefly? Thanks!

Move the functional tests from trellis-app to trellis-test

The functional tests could more easily be reused across projects if they are refactored into abstract classes in the trellis-test module.

Simplify constraint service interface

The current constraint service interface accepts a baseUrl in the ::constrainedBy method, but in practice this is just the internal data IRI. That is, this argument could be removed without changing how the service functions.

Security principal is set before authentication filters are run

The WebAC and Agent-related JAX-RS filters are annotated with @PreMatching and so they are run before a security principal is set through the authentication filters. These annotations should be removed.

Related to #48

Add versioning support to triplestore-based implementation

The triplestore-based resource service does not currently support versioning, but it should.

Semantics of `Future<Boolean>` for persistence

If a persist call (that returns Future<Boolean>) completes unexpectedly, should the result be true or false? In my example in hand, the cause is an InterruptedException, which does not indicate failure of persistence but failure of the thread monitoring persistence, since the actual persistence is happening elsewhere on the network.

IOW, if the backend cannot give us definite information about the completion, what should it report forward?

I don't want to get into Future<Optional<Boolean>> weirdness. But maybe an enum Completion {Success, Failure, Unknown} or the like and return Future<Completion>? Or maybe there's a better way to say the same thing within the Java concurrency APIs…
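One way the enum idea could look, as a sketch only. The constant names follow Java convention rather than the spelling above, PersistenceOutcome and await are hypothetical names, and mapping an ExecutionException to FAILURE is an assumption.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;

// Three-valued outcome, so "we could not find out" is distinct from "it failed".
enum Completion { SUCCESS, FAILURE, UNKNOWN }

// Illustrative helper (not part of Trellis).
final class PersistenceOutcome {
    private PersistenceOutcome() { }

    static Completion await(Future<Boolean> persist) {
        try {
            return Boolean.TRUE.equals(persist.get()) ? Completion.SUCCESS : Completion.FAILURE;
        } catch (InterruptedException ex) {
            // The monitoring thread was interrupted; the remote write may still succeed.
            Thread.currentThread().interrupt();
            return Completion.UNKNOWN;
        } catch (ExecutionException ex) {
            // Assumption: an exception raised by the persistence task itself means failure.
            return Completion.FAILURE;
        }
    }
}
```

A caller can then decide per outcome, for example retrying or re-checking the backend on UNKNOWN instead of reporting an error.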

trellis-binary directory is empty

It looks like trellis-binary did not get copied into the new repository structure.

Add coveralls reporting

Use this plugin https://github.com/kt3k/coveralls-gradle-plugin, noting the required configuration for multi-module builds.

`MutableDataService::create` vs `::replace`

If ::replace is called with an identifier for a resource that does not already exist, should ::replace return without trying to mutate anything (and presumably with a false-valued Future)?

I think so, but wanted to make sure.

Reconsider Commons RDF

I'd like to understand the value being added to Trellis by the use of Commons RDF. Using it introduces a huge number of short-lived objects, the project isn't very responsive, and the only implementation that we actually use so far is commons-rdf-jena. Is the expectation that people writing new modules for Trellis might want to use RDF4J or some more obscure RDF framework?

Add support for ldp:PreferMinimalContainer

This Prefer header is not currently supported, but it ought to be.

Add pax-exam testing

Pax Exam should be used to test the Karaf features.

Rename Exception class

The current RuntimeRepositoryException should be renamed to RuntimeTrellisException.

audit info inclusion

Does Trellis take the stance of adding audit info to a response unless told otherwise, or only adding it if told to (by a Prefer header)? I know I could trace out the code and see what it currently does, but I want to make sure I understand why.

Create a simple "reference implementation"

The main purpose of this would be to exercise the interfaces and service layers in the context of a simple, single-node application. Much of the kafka-based implementation could be re-used here (i.e. file-based persistence but without zk and kafka).

Support multiple range request segments

The HTTP specification on range requests allows for multiple, non-contiguous segments in a request. E.g.

Range: bytes=1-100,301-400

The Trellis HTTP layer currently only supports a single range (i.e. in the header parsing logic) though the underlying APIs could support an arbitrary number of ranges. It could be useful to support range requests that include multiple ranges.
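A sketch of what multi-range header parsing might look like. RangeHeaderParser is a hypothetical name, and this version deliberately handles only the explicit first-last form shown above; open-ended (500-) and suffix (-500) ranges are rejected by returning an empty list.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Illustrative helper (not part of Trellis).
final class RangeHeaderParser {
    private RangeHeaderParser() { }

    // Parses "bytes=first-last[,first-last...]" into {first, last} pairs.
    // Returns an empty list if any part of the header is not in that form.
    static List<long[]> parse(String header) {
        if (header == null || !header.startsWith("bytes=")) {
            return Collections.emptyList();
        }
        List<long[]> ranges = new ArrayList<>();
        for (String part : header.substring("bytes=".length()).split(",")) {
            String[] bounds = part.trim().split("-");
            if (bounds.length != 2 || bounds[0].isEmpty() || bounds[1].isEmpty()) {
                return Collections.emptyList();
            }
            try {
                long first = Long.parseLong(bounds[0]);
                long last = Long.parseLong(bounds[1]);
                if (first > last) {
                    return Collections.emptyList();
                }
                ranges.add(new long[] {first, last});
            } catch (NumberFormatException ex) {
                return Collections.emptyList();
            }
        }
        return ranges;
    }
}
```

For the header above, parse yields two pairs, {1, 100} and {301, 400}, which could then be passed on to whatever range-aware API the binary layer exposes.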

LDP-NR always returns RDF after PATCH

After PATCHing the description of an LDP-NR, the responses are always RDF.

Add documentation for subprojects

The various subprojects have little to no documentation in their respective README files.

Clean up app configuration

Remove unnecessary hierarchy and add structure where needed (json-ld section).

Relax constraints on rdf:type

At present, the constraints module forbids setting any rdf:type triples where the type is in the LDP type domain. The idea here was to encourage clients to use Link headers for setting the resource types.

However, this seems to be rather heavy-handed. I would rather remove any sort of "type restrictions"; that is, users should be able to put whatever RDF in a resource they want to, even if that RDF makes no sense -- it shouldn't be up to Trellis to enforce such things.

That is, if a client wants to create an ldp:BasicContainer resource (via link headers) that contains the triple <> rdf:type ldp:DirectContainer, so be it. The LDP spec is clear on this point: the Link header always wins.

A middle position would be to allow setting rdf:type triples in the LDP domain but only if that type is the same as the resource's interaction model or some subtype thereof. My counter-argument to that is just that doing so will add more complexity to the code and I really don't want Trellis to be in the business of policing rdf:type semantics.
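To make the cost of that middle position concrete, here is a sketch of the check it would require. LdpTypeCheck and isAllowed are hypothetical names; the parent map follows the class hierarchy in the LDP vocabulary, and types are compared as plain IRI strings.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative helper (not part of Trellis).
final class LdpTypeCheck {
    static final String LDP = "http://www.w3.org/ns/ldp#";

    // Child type -> parent type, following the LDP class hierarchy.
    private static final Map<String, String> PARENT = new HashMap<>();

    static {
        PARENT.put(LDP + "RDFSource", LDP + "Resource");
        PARENT.put(LDP + "NonRDFSource", LDP + "Resource");
        PARENT.put(LDP + "Container", LDP + "RDFSource");
        PARENT.put(LDP + "BasicContainer", LDP + "Container");
        PARENT.put(LDP + "DirectContainer", LDP + "Container");
        PARENT.put(LDP + "IndirectContainer", LDP + "Container");
    }

    private LdpTypeCheck() { }

    // True if rdfType is outside the LDP domain, equals the interaction model,
    // or is a subtype of it.
    static boolean isAllowed(String rdfType, String interactionModel) {
        if (!rdfType.startsWith(LDP)) {
            return true; // non-LDP types are never restricted
        }
        for (String type = rdfType; type != null; type = PARENT.get(type)) {
            if (type.equals(interactionModel)) {
                return true;
            }
        }
        return false;
    }
}
```

Even this small version needs a maintained type hierarchy and an extra pass over every rdf:type triple, which is the added complexity the counter-argument refers to.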