eclipse-edc / connector
EDC core services including data plane and control plane
License: Apache License 2.0
Looking at the DemoS3FlowController I wonder how it can be sure that the required fields are in the DataAddress properties (e.g. bucketName). I assume that providers and consumers can have different versions of the extensions. How does a consumer know that it is supposed to send the bucket name and object key separately and not in one URL?
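For illustration, a minimal sketch of the kind of guard I would expect such a flow controller to perform (the property names and the validator class are my assumptions, not the actual extension contract):

import java.util.Map;

// Hypothetical guard clause for an S3 flow controller: fail fast if the consumer
// did not supply the properties this particular extension expects.
final class S3DataAddressValidator {

    static void requireS3Properties(Map<String, String> dataAddressProperties) {
        for (String key : new String[]{"bucketName", "objectKey", "region"}) {
            String value = dataAddressProperties.get(key);
            if (value == null || value.isBlank()) {
                throw new IllegalArgumentException("DataAddress is missing required property: " + key);
            }
        }
    }
}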
Move out the runtime code (e.g. ConnectorRuntime.java) to the core:bootstrap module and make it overridable.
Introduce bill-of-material (BOM) build files, much like the IDS BOM or the Ion BOM.
The launcher's build.gradle.kts then just references BOMs, resulting in a very simplistic basic launcher.
The upside of that is that we don't need to duplicate the *Runtime.java files, and a launcher becomes just a Gradle build file.
A NOTICE file must be included, see
https://www.eclipse.org/projects/handbook/#legaldoc-repo
https://www.eclipse.org/projects/handbook/#legaldoc-notice
Dominik Pinsel [email protected], Daimler TSS GmbH, legal info/Impressum
The readme is still talking about microsoft/data-application-gx and not about eclipse-dataspace connector.
The code base should be reorganised to improve structure, readability and modularity. An initial proposal and comments have been made, see Restructuring.md and Comments.md respectively.
It looks like policy evaluation is currently done on the provider side and performed by checking JWT token claims against stored policy (please correct me if I am wrong here).
Could you please provide an example of such JWT tokens and an example of their evaluation against the corresponding policy?
What are the mechanisms to enforce policy on the client side (e.g. delete the data after some period)?
Is there a way to enforce the "One up, one down" case, to ensure that different consumers access different subsets of data?
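To make the question concrete, here is a purely illustrative sketch of what I imagine by "checking JWT claims against a stored policy"; the claim name, the policy shape and the (omitted) signature verification are all assumptions on my side, not the actual EDC implementation:

import java.util.Map;
import java.util.Set;

// Hypothetical example: a policy that only permits callers from certain regions,
// evaluated against the claims of an already-verified JWT.
final class RegionPolicyExample {

    static boolean isAllowed(Map<String, Object> verifiedJwtClaims, Set<String> allowedRegions) {
        Object region = verifiedJwtClaims.get("region"); // assumed claim name
        return region instanceof String && allowedRegions.contains(region);
    }

    public static void main(String[] args) {
        Map<String, Object> claims = Map.of("sub", "urn:connector:consumer", "region", "eu");
        System.out.println(isAllowed(claims, Set.of("eu"))); // prints: true
    }
}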
We should set a fixed Java language version. Best would be to stick to the latest LTS, i.e. Java 11.
This applies to Gradle builds and Dockerfile creation.
Dominik Pinsel [email protected], Daimler TSS GmbH, legal info/Impressum
Currently, StatusChecker takes a typed ProvisionedResource:
boolean isComplete(T provisionedResource);
StatusCheckerRegistry also dispatches based on the ProvisionedResource type. This has the unintended side-effect of coupling transfer status checking to resource provisioning. For example, if an extender wants to implement a DataFlowController and supply a StatusChecker, they must also implement a ProvisionedResource, a Provisioner and a ManifestGenerator.
For some cases, a StatusChecker will need access to a ProvisionedResource, for example, to check if data has arrived in a target bucket. However, it may be the case that only the TransferProcess id is necessary for a StatusChecker implementation.
To decouple status checking, I plan on changing the registry to dispatch based on DataAddress type and then pass the transfer process id and a collection of ProvisionedResources to the StatusChecker. For cases when no provisioned resources are created, the collection will be empty.
In addition to making life easier for extension implementors, this will have the other benefit of treating status checking the same for managed and unmanaged resources.
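As a sketch of the proposed shape (type and method names here are illustrative, not final):

import java.util.List;

// Stand-in for the real SPI type, only to keep this sketch self-contained.
interface ProvisionedResource {}

// Illustrative only: the registry dispatches on the DataAddress type, and the checker
// receives the transfer process id plus whatever resources were provisioned
// (an empty list for unmanaged transfers).
interface StatusChecker {
    boolean isComplete(String transferProcessId, List<ProvisionedResource> provisionedResources);
}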
Define the data asset model and fill the data catalog with data and metadata
I think having a logo helps the project a lot. We can use it in the documentation, so that readers can recognize the project more easily. It also helps the project to look more professional and makes it easier to sell our product to people who are not involved yet.
Additionally, the image can be used in various documents or presentations, which again helps the project to look more professional.
I think getting a logo should not be that complicated. Maybe some of us know designers in our companies or other talented colleagues who can help us out here. What do you think?
Dominik Pinsel [email protected], Daimler TSS GmbH, legal info/Impressum
Will it be possible to support multiple extensions of the same type (e.g. multiple Vault types)?
Sovereign identities can foster a sovereign data exchange.
Besides the DAPS proposed by IDS today, we'd like to prepare the EDC for the integration of decentralized identifier / self-sovereign identity solutions.
How are metadata records supposed to get into the MetadataStore? For now, there is no API for that; only some test values get stored during Nifi extension initialization.
My build server is behind a firewall and needs to go through a proxy server. gradlew -> gradle-wrapper.jar apparently does not have a way to specify a proxy server, and the build is stuck at:
Downloading https://services.gradle.org/distributions/gradle-7.1.1-bin.zip
It would be helpful to have an API specification (e.g. Swagger) for the consumer/provider controllers.
Suggest including these files in every jar, even if we just distribute them as part of the fat jar inside the Docker container.
see https://www.eclipse.org/projects/handbook/#legaldoc-distribution
Something like this inside the build.gradle allprojects section should solve it:
tasks.jar {
    metaInf {
        from("${rootProject.projectDir.path}/LICENSE")
    }
}
Dominik Pinsel [email protected], Daimler TSS GmbH, legal info/Impressum
Would be nice to have something providing an API description either at build time or at runtime. For example, it could be a Swagger plugin that adds an additional endpoint, like /swagger.json.
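Just to illustrate the runtime variant (controller name, path and the hard-coded document are placeholders; in practice the document would be generated by a Swagger/OpenAPI plugin at build time or by scanning the JAX-RS resources at startup):

import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.Produces;
import javax.ws.rs.core.MediaType;

// Hypothetical controller that serves an OpenAPI document at /swagger.json.
@Path("/swagger.json")
public class ApiDescriptionController {

    @GET
    @Produces(MediaType.APPLICATION_JSON)
    public String getApiDescription() {
        // Hard-coded stub; a real implementation would return the generated specification.
        return "{ \"openapi\": \"3.0.1\", \"info\": { \"title\": \"EDC connector API\", \"version\": \"0.0.1\" } }";
    }
}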
The directory/module "spi" should only contain interfaces (and the classes used to define these interfaces) that define the types of service provider interfaces (SPIs) which are loaded by java.util.ServiceLoader.
Currently these are:
and their base interfaces.
The following packages contain interfaces which are not directly loaded by java.util.ServiceLoader, but by an extension/spi implementation of org.eclipse.dataspaceconnector.system.ServiceExtension:
Suggestion
Helmut Pfister [email protected], Daimler TSS GmbH, legal info/Impressum
We will have multiple associated services in our repo next to the connector, e.g. registration service for DID functionality.
I suggest updating the folder structure in a way that makes it easier to understand which code belongs to which service.
Proposal:
/docs overall docs
/common shared libs, like the service loading framework
/services all services that form the connector ecosystem
e.g.
/services/connector the connector itself :-)
/services/did-registration registration service that crawls ION network
/services/catalog ...
each service can have the following subfolders
.../core core business functionality of the service
.../launcher one or more bundles / launchers
.../scripts e.g. helm charts
.../extensions extensions that can get bundled
.../interfaces (or spi) extension interfaces
What do you think?
Moritz Keppler [email protected], Daimler TSS GmbH, legal info/Impressum
Suggest replacing "client" with "consumer" in the extensions code, as this term is used in the IDS and Gaia-X documents.
Also feels more natural to have the pair "provider & consumer" instead of "provider & client".
Dominik Pinsel [email protected], Daimler TSS GmbH, legal info/Impressum
The EDC setting keys should be standardized and implement a namespace feature. Dot notation is used already, so we should probably reserve the first segment for that purpose. The issue is some settings use "dataspaceconnector" while others use "edc". For example:
public class CosmosTransferProcessStoreExtension implements ServiceExtension {
@EdcSetting
private static final String COSMOS_DBNAME_SETTING = "edc.cosmos.database.name";
@EdcSetting
private static final String COSMOS_PARTITION_KEY_SETTING = "dataspaceconnector.cosmos.partitionkey";
private static final String DEFAULT_PARTITION_KEY = "dataspaceconnector";
private static final String CONTAINER_NAME = "transferprocess";
I would vote for brevity, and hence "edc".
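With a reserved namespace, the settings above would then look something like this (key names are only an example, not a decided convention):

// Illustrative only: one reserved top-level segment ("edc"), followed by the
// extension name and the concrete setting.
public final class CosmosSettingsExample {
    public static final String COSMOS_DBNAME_SETTING = "edc.cosmos.database.name";
    public static final String COSMOS_PARTITION_KEY_SETTING = "edc.cosmos.partitionkey";

    private CosmosSettingsExample() {
    }
}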
Hello!
I believe there is a missing @JsonAnySetter annotation in GenericDataCatalogEntry.Builder; it should be as follows:
@JsonPOJOBuilder(withPrefix = "")
public static class Builder {
    private final GenericDataCatalogEntry lookup;

    private Builder() {
        lookup = new GenericDataCatalogEntry();
    }

    @JsonCreator
    public static Builder newInstance() {
        return new Builder();
    }

    @JsonAnySetter // Otherwise, deserialization won't work
    public Builder property(String key, String value) {
        lookup.properties.put(key, value);
        return this;
    }

    public GenericDataCatalogEntry build() {
        return lookup;
    }
}
Being able to control and log the usage of data requires identities for every kind of workload that comes in touch with it.
SPIFFE/SPIRE could be a technical solution. Let's use this issue to discuss what a potential integration would look like.
Moritz Keppler [email protected], Daimler TSS GmbH, legal info/Impressum
Why are JWT tokens passed in the request body and not in the header as regular bearer tokens?
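For reference, this is what I mean by a regular bearer token (the endpoint and payload are made up for the example):

import java.net.URI;
import java.net.http.HttpRequest;

public final class BearerTokenExample {

    // Builds a request that carries the JWT in the standard Authorization header
    // instead of embedding it in the request body.
    static HttpRequest buildRequest(String jwt, String payload) {
        return HttpRequest.newBuilder()
                .uri(URI.create("https://provider.example.com/api/transfer")) // placeholder endpoint
                .header("Authorization", "Bearer " + jwt)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(payload))
                .build();
    }
}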
In order to make it easier to welcome new community members (devs, companies, ...) we should create an easy-to-understand onboarding experience.
I propose a multi-step tutorial: a basic launcher (launchers/basic) and a /hello-world endpoint.
Accompanying this I'll record an Onboarding Video (see the respective issue) explaining the different moving parts of the connector code base.
A CONTRIBUTING.md file should be provided stating how potential contributors can help with the project.
Denis Neuling [email protected], Daimler TSS GmbH, legal info/Impressum
In order to become more scalable a short video introduction to the different moving parts of the connector will be recorded.
Most notably this will involve the following areas, not necessarily in that order: Vault, Monitor, TransferProcessManager, Provisioner, DataFlowController, Policies, etc.
We should publish that video to Youtube unless the Eclipse Foundation has special requirements. @mspiekermann can you follow up on this with EF?
In case multiple transport protocols are supported by connectors, how is the transport protocol for a transfer agreed upon between the communicating parties? Can the consumer or provider set priorities (e.g. S3 > FTP)? Is the transport protocol negotiation reflected in the APIs as of now? Is there a must-have set of transport protocols which has to be implemented by every connector on the network to ensure communication (e.g. FTP as an ultimate fallback)?
In order for the CI build to run through, we need to add the following repository secrets:
AZ_STORAGE_KEY: for tests against Azure Storage accounts
AZ_STORAGE_SAS: for tests against Azure Storage accounts
COSMOS_KEY: for tests against Azure CosmosDB
S3_ACCESS_KEY_ID: for tests involving an S3 bucket
S3_SECRET_ACCESS_KEY: for tests involving an S3 bucket
RUN_INTEGRATION_TEST: this is just true or false to determine whether to run integration tests or not
Note: only repo owners can add secrets.
At the latest with our first release we should provide a SECURITY.md describing how vulnerabilities can be reported.
See:
https://www.eclipse.org/security/policy.php
https://gitlab.eclipse.org/eclipse/dash/org.eclipse.dash.handbook/-/issues/150
Example:
https://github.com/eclipse/rdf4j/blob/main/SECURITY.md
Moritz Keppler [email protected], Daimler TSS GmbH, legal info/Impressum
Write an extension for basic logging and monitoring
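A minimal sketch of what such an extension could provide (the Monitor interface and the console implementation are simplified here and not the actual EDC SPI):

// Simplified logging abstraction; the real SPI may offer more levels and varargs.
interface Monitor {
    void info(String message);
    void severe(String message, Throwable error);
}

// A trivial console-based implementation that the extension could register at boot.
final class ConsoleMonitor implements Monitor {
    @Override
    public void info(String message) {
        System.out.println("[INFO] " + message);
    }

    @Override
    public void severe(String message, Throwable error) {
        System.err.println("[SEVERE] " + message);
        if (error != null) {
            error.printStackTrace();
        }
    }
}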
...maybe /provision?
Why is the managedResources flag set to true by default? I can implement a FlowController that just moves files from one existing S3 bucket to another, without creating additional resources, and the corresponding TransferProcess will still be marked as having managed resources.
Also, having that flag set during DataRequest instance creation means the decision about whether to provide additional resources is made independently of a particular FlowController implementation. As I see it (please correct me), the decision should be made by the FlowController implementation, so the rest of the system remains unaware of what resources were created to perform the task.
We need to include a copyright header in every source file, see
https://www.eclipse.org/projects/handbook/#ip-copyright-headers
/********************************************************************************
* Copyright (c) {year} {owner}[ and others]
*
* This program and the accompanying materials are made available under the
* terms of the Apache License, Version 2.0 which is available at
* https://www.apache.org/licenses/LICENSE-2.0
*
* SPDX-License-Identifier: Apache-2.0
*
 * Contributors:
* {name} - {description}
********************************************************************************/
Dominik Pinsel [email protected], Daimler TSS GmbH, legal info/Impressum
In DemoUiApiController I noticed the dataspaceconnector:s3 constant being used to configure the destination type.
Shouldn't there be a complete list of such constants?
In case a Provider does not support the requested type (e.g. does not support S3), shouldn't there be a procedure for the Consumer to propose another protocol?
Initial task:
Let's see if we can come up with a consistent and intuitive way to express dependencies (i.e. the requires() and provides() declarations) that works across all modules.
There are a few suggestions:
edc:<feature>[:<subfeature>], e.g. edc:policy-registry or edc:communication:http-client
We should also think about defining constants for that in the interface class, e.g. in PolicyRegistry.java
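As a sketch of the constant idea (field name and value are just an example, not a decided convention):

// Illustrative only: each SPI interface could expose the feature name it represents,
// so modules reference the constant instead of repeating the raw string.
public interface PolicyRegistry {

    String FEATURE = "edc:policy-registry";

    // ... actual registry methods omitted ...
}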
This is the corresponding issue for the defined offer interfaces in the draft pull request #38
The connector should be able to answer IDS self-description requests for itself. It is not necessary to support description requests of artifacts and resources yet.
I also assume for the implementation
We at Daimler will start with the implementation today. It would be nice if a committer could create a new branch in this repository (e.g. feature/41-ids-self-description), so that we can push our changes upstream.
Feel free to comment on our suggestions above. We appreciate any comments.
Dominik Pinsel [email protected], Daimler TSS GmbH, legal info/Impressum
People are using the Issues list to ask for more detail on the project roadmap and implementation questions.
I propose we do three things:
Move design and roadmap discussions to a specific chat channel (Teams or Slack). We could repurpose the "General" Teams channel for this.
Create a separate channel for people to post questions.
Only use issues to file bug reports, feature requests, etc.
@mspiekermann, I assigned this to you so we can discuss it in the next committers meeting.
Hi,
Is it part of the roadmap to integrate DCAT-AP to increase semantic interoperability with existing European data catalogues?
Best regards,
Pierre
In order to ensure a consistent code style let's create a CheckStyle configuration that is enforced as a pre-PullRequest check on Github.
Why CheckStyle? Because it's open-source, widely used, highly configurable, has plugins for most Java IDEs, Gradle and Maven, and also has a GitHub Action (actually several).
Checkstyle is configured using an XML file, which will be committed here. The initial config will be based on Google's Style.
Once this issue is done, devs can simply reference the CheckStyle config via HTTP in their IDE.
PS: As a convenience I will also create a SaveActions config that reformats code so that it conforms to the Checkstyle configuration.
Will this project be moved to the eclipse organization on GitHub?
Implement the framework as an extension. In a first step provide a contract that just allows everything as a response.
Is a rough roadmap or backlog (Jira?) available for the project?