eclipse-edc / connector
EDC core services including data plane and control plane
License: Apache License 2.0
Looking at the DemoS3FlowController I wonder how it can be sure that the required fields are in the DataAddress properties (e.g. bucketName). I assume that providers and consumers can have different versions of the extensions. How does a consumer know that it is supposed to send the bucket name and object key separately and not in one URL?
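For illustration, a minimal sketch of the kind of guard I would expect such a flow controller to perform (the property names and the validator class are my assumptions, not the actual extension contract):

import java.util.Map;

// Hypothetical guard clause for an S3 flow controller: fail fast if the consumer
// did not supply the properties this particular extension expects.
final class S3DataAddressValidator {

    static void requireS3Properties(Map<String, String> dataAddressProperties) {
        for (String key : new String[]{"bucketName", "objectKey", "region"}) {
            String value = dataAddressProperties.get(key);
            if (value == null || value.isBlank()) {
                throw new IllegalArgumentException("DataAddress is missing required property: " + key);
            }
        }
    }
}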
Move out the runtime code (e.g. ConnectorRuntime.java) to the core:bootstrap module and make it overridable.
Introduce bill-of-material (BOM) build files, much like the IDS BOM or the Ion BOM.
The launcher's build.gradle.kts then just references BOMs, resulting in a very simplistic basic launcher.
The upside of that is that we don't need to duplicate the *Runtime.java files, and a launcher becomes just a Gradle build file.
A NOTICE file must be included, see
https://www.eclipse.org/projects/handbook/#legaldoc-repo
https://www.eclipse.org/projects/handbook/#legaldoc-notice
Dominik Pinsel [email protected], Daimler TSS GmbH, legal info/Impressum
The readme is still talking about microsoft/data-application-gx and not about eclipse-dataspace connector.
The code base should be reorganised to improve structure, readability and modularity. An initial proposal and comments have been made, see Restructuring.md and Comments.md respectively.
It looks like policy evaluation is currently done on the provider side and performed by checking JWT token claims against stored policy (please correct me if I am wrong here).
Could you please provide an example of such JWT tokens and an example of their evaluation against the corresponding policy?
What are the mechanisms to enforce policy on the client side (e.g. delete the data after some period)?
Is there a way to enforce the "One up, one down" case, to ensure that different consumers access different subsets of data?
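To make the question concrete, here is a purely illustrative sketch of what I imagine by "checking JWT claims against a stored policy"; the claim name, the policy shape and the (omitted) signature verification are all assumptions on my side, not the actual EDC implementation:

import java.util.Map;
import java.util.Set;

// Hypothetical example: a policy that only permits callers from certain regions,
// evaluated against the claims of an already-verified JWT.
final class RegionPolicyExample {

    static boolean isAllowed(Map<String, Object> verifiedJwtClaims, Set<String> allowedRegions) {
        Object region = verifiedJwtClaims.get("region"); // assumed claim name
        return region instanceof String && allowedRegions.contains(region);
    }

    public static void main(String[] args) {
        Map<String, Object> claims = Map.of("sub", "urn:connector:consumer", "region", "eu");
        System.out.println(isAllowed(claims, Set.of("eu"))); // prints: true
    }
}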
We should set a fixed Java language version. Best would be to stick to the latest LTS, i.e. Java 11.
This applies to Gradle builds and Dockerfile creation.
Dominik Pinsel [email protected], Daimler TSS GmbH, legal info/Impressum
Currently, StatusChecker takes a typed ProvisionedResource:
boolean isComplete(T provisionedResource);
StatusCheckerRegistry also dispatches based on the ProvisionedResource type. This has the unintended side-effect of coupling transfer status checking to resource provisioning. For example, if an extender wants to implement a DataFlowController and supply a StatusChecker, they must also implement a ProvisionedResource, a Provisioner and a ManifestGenerator.
For some cases, a StatusChecker will need access to a ProvisionedResource, for example, to check if data has arrived in a target bucket. However, it may be the case that only the TransferProcess id is necessary for a StatusChecker implementation.
To decouple status checking, I plan on changing the registry to dispatch based on DataAddress type and then pass the transfer process id and a collection of ProvisionedResources to the StatusChecker. For cases when no provisioned resources are created, the collection will be empty.
In addition to making life easier for extension implementors, this will have the other benefit of treating status checking the same for managed and unmanaged resources.
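As a sketch of the proposed shape (type and method names here are illustrative, not final):

import java.util.List;

// Stand-in for the real SPI type, only to keep this sketch self-contained.
interface ProvisionedResource {}

// Illustrative only: the registry dispatches on the DataAddress type, and the checker
// receives the transfer process id plus whatever resources were provisioned
// (an empty list for unmanaged transfers).
interface StatusChecker {
    boolean isComplete(String transferProcessId, List<ProvisionedResource> provisionedResources);
}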
Define the data asset model and fill the data catalog with data and metadata
I think having a logo helps the project a lot. We can use it in the documentation, so that readers can recognize the project more easily. It also helps the project to look more professional and makes it easier to sell our product to people who are not involved yet.
Additionally, the image can be used in various documents or presentations, which again helps the project to look more professional.
I think getting a logo should not be that complicated. Maybe some of us know designers in our companies or other talented colleagues who can help us out here. What do you think?
Dominik Pinsel [email protected], Daimler TSS GmbH, legal info/Impressum
Will it be possible to support multiple extensions of the same type (e.g. multiple Vault types)?
Sovereign identities can foster a sovereign data exchange.
Besides the DAPS proposed by IDS today, we'd like to prepare the EDC for the integration of decentralized identifier / self-sovereign identity solutions.
How are metadata records supposed to get into the MetadataStore? For now, there is no API for that; only some test values get stored during Nifi extension initialization.
My build server is behind a firewall and needs to go through a proxy server. gradlew -> gradle-wrapper.jar apparently does not have a way to specify a proxy server, and the build is stuck at:
Downloading https://services.gradle.org/distributions/gradle-7.1.1-bin.zip
It would be helpful to have an API specification (e.g. Swagger) for the consumer/provider controllers.
Suggest including these files in every jar, even if we just distribute them as part of the fat jar inside the Docker container.
see https://www.eclipse.org/projects/handbook/#legaldoc-distribution
Something like this inside the build.gradle allprojects section should solve it:
tasks.jar {
    metaInf {
        from("${rootProject.projectDir.path}/LICENSE")
    }
}
Dominik Pinsel [email protected], Daimler TSS GmbH, legal info/Impressum
Would be nice to have something providing an API description either at build time or at runtime. For example, it could be a Swagger plugin that adds an additional endpoint, like /swagger.json.
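Just to illustrate the runtime variant (controller name, path and the hard-coded document are placeholders; in practice the document would be generated by a Swagger/OpenAPI plugin at build time or by scanning the JAX-RS resources at startup):

import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.Produces;
import javax.ws.rs.core.MediaType;

// Hypothetical controller that serves an OpenAPI document at /swagger.json.
@Path("/swagger.json")
public class ApiDescriptionController {

    @GET
    @Produces(MediaType.APPLICATION_JSON)
    public String getApiDescription() {
        // Hard-coded stub; a real implementation would return the generated specification.
        return "{ \"openapi\": \"3.0.1\", \"info\": { \"title\": \"EDC connector API\", \"version\": \"0.0.1\" } }";
    }
}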
The directory/module "spi" should only contain interfaces (and the classes used to define these interfaces) that define the types of service provider interfaces (SPIs) which are loaded by java.util.ServiceLoader.
Currently these are:
and their base interfaces.
The following packages contain interfaces which are not directly loaded by java.util.ServiceLoader, but by an extension/spi implementation of org.eclipse.dataspaceconnector.system.ServiceExtension:
Suggestion
Helmut Pfister [email protected], Daimler TSS GmbH, legal info/Impressum
We will have multiple associated services in our repo next to the connector, e.g. registration service for DID functionality.
I suggest updating the folder structure in a way that makes it easier to understand which code belongs to which service.
Proposal:
/docs overall docs
/common shared libs, like the service loading framework
/services all services that form the connector ecosystem
e.g.
/services/connector the connector itself :-)
/services/did-registration registration service that crawls ION network
/services/catalog ...
each service can have the following subfolders
.../core core business functionality of the service
.../launcher one or more bundles / launchers
.../scripts e.g. helm charts
.../extensions extensions that can get bundled
.../interfaces (or spi) extension interfaces
What do you think?
Moritz Keppler [email protected], Daimler TSS GmbH, legal info/Impressum
Suggest replacing "client" with "consumer" in the extensions code, as this term is used in the IDS and Gaia-X documents.
Also feels more natural to have the pair "provider & consumer" instead of "provider & client".
Dominik Pinsel [email protected], Daimler TSS GmbH, legal info/Impressum
The EDC setting keys should be standardized and implement a namespace feature. Dot notation is used already, so we should probably reserve the first segment for that purpose. The issue is some settings use "dataspaceconnector" while others use "edc". For example:
public class CosmosTransferProcessStoreExtension implements ServiceExtension {
@EdcSetting
private static final String COSMOS_DBNAME_SETTING = "edc.cosmos.database.name";
@EdcSetting
private static final String COSMOS_PARTITION_KEY_SETTING = "dataspaceconnector.cosmos.partitionkey";
private static final String DEFAULT_PARTITION_KEY = "dataspaceconnector";
private static final String CONTAINER_NAME = "transferprocess";
I would vote for brevity, and hence "edc".
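With a reserved namespace, the settings above would then look something like this (key names are only an example, not a decided convention):

// Illustrative only: one reserved top-level segment ("edc"), followed by the
// extension name and the concrete setting.
public final class CosmosSettingsExample {
    public static final String COSMOS_DBNAME_SETTING = "edc.cosmos.database.name";
    public static final String COSMOS_PARTITION_KEY_SETTING = "edc.cosmos.partitionkey";

    private CosmosSettingsExample() {
    }
}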
Hello!
I believe there is a missing @JsonAnySetter annotation in GenericDataCatalogEntry.Builder; it should be as follows:
@JsonPOJOBuilder(withPrefix = "")
public static class Builder {
    private final GenericDataCatalogEntry lookup;

    private Builder() {
        lookup = new GenericDataCatalogEntry();
    }

    @JsonCreator
    public static Builder newInstance() {
        return new Builder();
    }

    @JsonAnySetter // Otherwise, deserialization won't work
    public Builder property(String key, String value) {
        lookup.properties.put(key, value);
        return this;
    }

    public GenericDataCatalogEntry build() {
        return lookup;
    }
}
Being able to control and log the usage of data requires identities for every kind of workload that comes in touch with it.
SPIFFE/SPIRE could be a technical solution. Let's use this issue to discuss what a potential integration would look like.
Moritz Keppler [email protected], Daimler TSS GmbH, legal info/Impressum
Why are JWT tokens passed in the request body and not in the header as regular bearer tokens?
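For reference, this is what I mean by a regular bearer token (the endpoint and payload are made up for the example):

import java.net.URI;
import java.net.http.HttpRequest;

public final class BearerTokenExample {

    // Builds a request that carries the JWT in the standard Authorization header
    // instead of embedding it in the request body.
    static HttpRequest buildRequest(String jwt, String payload) {
        return HttpRequest.newBuilder()
                .uri(URI.create("https://provider.example.com/api/transfer")) // placeholder endpoint
                .header("Authorization", "Bearer " + jwt)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(payload))
                .build();
    }
}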
In order to make it easier to welcome new community members (devs, companies, ...) we should create an easy-to-understand onboarding experience.
I propose a multi-step tutorial: a basic launcher (launchers/basic) and a /hello-world endpoint.
Accompanying this I'll record an Onboarding Video (see the respective issue) explaining the different moving parts of the connector code base.
A CONTRIBUTING.md file should be provided stating how potential contributors can help with the project.
Denis Neuling [email protected], Daimler TSS GmbH, legal info/Impressum
In order to become more scalable a short video introduction to the different moving parts of the connector will be recorded.
Most notably this will involve the following areas, not necessarily in that order: Vault, Monitor, TransferProcessManager, Provisioner, DataFlowController, Policies, etc.
We should publish that video to Youtube unless the Eclipse Foundation has special requirements. @mspiekermann can you follow up on this with EF?
In case multiple transport protocols are supported by connectors, how is the transport protocol for a transfer agreed upon between the communicating parties? Can the consumer or provider set priorities (e.g. S3 > FTP)? Is the transport protocol negotiation reflected in the APIs as of now? Is there a must-have set of transport protocols which has to be implemented by every connector on the network to ensure communication (e.g. FTP as an ultimate fallback)?
In order for the CI build to run through, we need to add the following repository secrets:
AZ_STORAGE_KEY: for tests against Azure Storage accounts
AZ_STORAGE_SAS: for tests against Azure Storage accounts
COSMOS_KEY: for tests against Azure CosmosDB
S3_ACCESS_KEY_ID: for tests involving an S3 bucket
S3_SECRET_ACCESS_KEY: for tests involving an S3 bucket
RUN_INTEGRATION_TEST: this is just true or false to determine whether to run integration tests or not
Note: only repo owners can add secrets.
At the latest with our first release we should provide a SECURITY.md describing how vulnerabilities can be reported.
See:
https://www.eclipse.org/security/policy.php
https://gitlab.eclipse.org/eclipse/dash/org.eclipse.dash.handbook/-/issues/150
Example:
https://github.com/eclipse/rdf4j/blob/main/SECURITY.md
Moritz Keppler [email protected], Daimler TSS GmbH, legal info/Impressum
Write an extension for basic logging and monitoring
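A minimal sketch of what such an extension could provide (the Monitor interface and the console implementation are simplified here and not the actual EDC SPI):

// Simplified logging abstraction; the real SPI may offer more levels and varargs.
interface Monitor {
    void info(String message);
    void severe(String message, Throwable error);
}

// A trivial console-based implementation that the extension could register at boot.
final class ConsoleMonitor implements Monitor {
    @Override
    public void info(String message) {
        System.out.println("[INFO] " + message);
    }

    @Override
    public void severe(String message, Throwable error) {
        System.err.println("[SEVERE] " + message);
        if (error != null) {
            error.printStackTrace();
        }
    }
}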
...maybe /provision?
Why is the managedResources flag set to true by default? I can implement a FlowController that just moves files from one existing S3 bucket to another, without creating additional resources, and the corresponding TransferProcess will still be marked as having managed resources.
Also, having that flag set during DataRequest instance creation means the decision about whether to provide additional resources is made independently of a particular FlowController implementation. As I see it (please correct me), the decision should be made by the FlowController implementation, so the rest of the system remains unaware of what resources were created to perform the task.
We need to include a copyright header in every source file, see
https://www.eclipse.org/projects/handbook/#ip-copyright-headers
/********************************************************************************
* Copyright (c) {year} {owner}[ and others]
*
* This program and the accompanying materials are made available under the
* terms of the Apache License, Version 2.0 which is available at
* https://www.apache.org/licenses/LICENSE-2.0
*
* SPDX-License-Identifier: Apache-2.0
*
 * Contributors:
* {name} - {description}
********************************************************************************/
Dominik Pinsel [email protected], Daimler TSS GmbH, legal info/Impressum
In DemoUiApiController I noticed the dataspaceconnector:s3 constant being used to configure the destination type.
Shouldn't there be a complete list of such constants?
In case a Provider does not support the requested type (e.g. does not support S3), shouldn't there be a procedure for the Consumer to propose another protocol?
Initial task:
Let's see if we can come up with a consistent and intuitive way to express dependencies (i.e. the requires() and provides() declarations) that works across all modules.
There are a few suggestions:
edc:<feature>[:<subfeature>], e.g. edc:policy-registry or edc:communication:http-client
We should also think about defining constants for that in the interface class, e.g. in PolicyRegistry.java
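As a sketch of the constant idea (field name and value are just an example, not a decided convention):

// Illustrative only: each SPI interface could expose the feature name it represents,
// so modules reference the constant instead of repeating the raw string.
public interface PolicyRegistry {

    String FEATURE = "edc:policy-registry";

    // ... actual registry methods omitted ...
}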
This is the corresponding issue for the defined offer interfaces in the draft pull request #38
The connector should be able to answer IDS self-description requests for itself. It is not necessary to support description requests of artifacts and resources yet.
I also assume for the implementation
We at Daimler will start with the implementation today. It would be nice if a committer could create a new branch in this repository (e.g. feature/41-ids-self-description), so that we can push our changes upstream.
Feel free to comment on our suggestions above. We appreciate any comments.
Dominik Pinsel [email protected], Daimler TSS GmbH, legal info/Impressum
People are using the Issues list to ask for more detail on the project roadmap and implementation questions.
I propose we do three things:
Move design and roadmap discussions to a specific chat channel (Teams or Slack). We could repurpose the "General" Teams channel for this.
Create a separate channel for people to post questions.
Only use issues to file bug reports, feature requests, etc.
@mspiekermann, I assigned this to you so we can discuss it in the next committers meeting.
Hi,
Is it part of the roadmap to integrate DCAT-AP to increase semantic interoperability with existing European data catalogues?
Best regards,
Pierre
In order to ensure a consistent code style let's create a CheckStyle configuration that is enforced as a pre-PullRequest check on Github.
Why CheckStyle? Because it's open-source, widely used, highly configurable, has plugins for most Java IDEs, Gradle and Maven, and also has a GitHub Action (actually several).
Checkstyle is configured using an XML file, which will be committed here. The initial config will be based on Google's Style.
Once this issue is done, devs can simply reference the CheckStyle config via HTTP in their IDE.
PS: As a convenience I will also create a SaveActions config that reformats code so that it conforms to the Checkstyle configuration.
Will this project be moved to the eclipse organization on GitHub?
Implement the framework as an extension. In a first step provide a contract that just allows everything as a response.
Is a rough roadmap or backlog (Jira?) available for the project?