
data-mesh-pattern's People

Contributors

avinashsingh77, caldeirav, eformat, taneem-ibrahim


data-mesh-pattern's Issues

๐Ÿ› [bug] - Issue with fybrik-dev installing opa et.al

๐Ÿ“ Description

[... of the issue you're seeing in the content / tech demo exercises]
During the Supply chain Builds step it has us create our argocd app of apps.

Problem is that when we run it
(

)

we get an error with the security context not matching any constraints:

```
pods "opa-5867777fb9-" is forbidden: unable to validate against any
security context constraint: [provider "anyuid": Forbidden: not usable
by user or serviceaccount, provider "pipelines-scc": Forbidden: not
usable by user or serviceaccount,
spec.initContainers[0].securityContext.runAsUser: Invalid value:
1000810000: must be in the ranges: [1000860000, 1000869999],
spec.containers[0].securityContext.runAsUser: Invalid value: 1000810000:
must be in the ranges: [1000860000, 1000869999],
spec.containers[1].securityContext.runAsUser: Invalid value: 1000810000:
must be in the ranges: [1000860000, 1000869999], provider "restricted":
Forbidden: not usable by user or serviceaccount, provider
"container-build": Forbidden: not usable by user or serviceaccount,
provider "nonroot-v2": Forbidden: not usable by user or serviceaccount,
provider "nonroot": Forbidden: not usable by user or serviceaccount,
provider "hostmount-anyuid": Forbidden: not usable by user or
serviceaccount, provider "machine-api-termination-handler": Forbidden:
not usable by user or serviceaccount, provider "hostnetwork-v2":
Forbidden: not usable by user or serviceaccount, provider "hostnetwork":
Forbidden: not usable by user or serviceaccount, provider "hostaccess":
Forbidden: not usable by user or serviceaccount, provider
"node-exporter": Forbidden: not usable by user or serviceaccount,
provider "privileged": Forbidden: not usable by user or serviceaccount]
```

๐Ÿ› [bug] - fix jhub single user profiles for ODH/RHODS and Spark

๐Ÿ“ Description

the JHUB single user profiles need integrating with ODH/RHODS config.

๐Ÿšถ Steps to reproduce

choosing a Spark based JHUB image does not spin up a cluster for the user.

๐Ÿง™โ€โ™€๏ธ Suggested solution

the code exists for a odh/custom jhub deployment. need to see if this can be made to work with rhods using config.

https://github.com/opendatahub-io-contrib/jupyterhub-singleuser-profiles

๐Ÿ› [bug] - Build failure for git-sync

๐Ÿ“ Description

Cloning "https://gitlab-ce.apps.osc-cl4.apps.os-climate.org/osclimate-datamesh/data-mesh-pattern" ...
Commit: efb9821cee326adb0256eaa715d14ab17deb4bae (UPDATE - project rename)
Author: Derek Dinosaur [email protected]
Date: Tue Jun 13 07:40:45 2023 +0000
time="2023-06-21T15:57:03Z" level=info msg="Not using native diff for overlay, this may cause degraded performance for building images: kernel has CONFIG_OVERLAY_FS_REDIRECT_DIR enabled"
I0621 15:57:03.856489 1 defaults.go:102] Defaulting to storage driver "overlay" with options [mountopt=metacopy=on].
Caching blobs under "/var/cache/blobs".
Pulling image registry.access.redhat.com/ubi8/ubi:8.7-1112 ...
Trying to pull registry.access.redhat.com/ubi8/ubi:8.7-1112...
Getting image source signatures
Copying blob sha256:6208c5a2e205726f3a2cd42a392c5e4f05256850d13197a711000c4021ede87b
Copying config sha256:768688a189716f9aef8d33a9eef4209f57dc2e66e9cb5fc3b8862940f314b9bc
Writing manifest to image destination
Storing signatures
Adding transient rw bind mount for /run/secrets/rhsm
[1/2] STEP 1/11: FROM registry.access.redhat.com/ubi8/go-toolset:1.17.12-3 AS builder
Trying to pull registry.access.redhat.com/ubi8/go-toolset:1.17.12-3...
Getting image source signatures
Copying blob sha256:7e3624512448126fd29504b9af9bc034538918c54f0988fb08c03ff7a3a9a4cb
Copying blob sha256:e0dc1b5a4801cf6fec23830d5fcea4b3fac076b9680999c49935e5b50a17e63b
Copying blob sha256:db0f4cd412505c5cc2f31cf3c65db80f84d8656c4bfa9ef627a6f532c0459fc4
Copying blob sha256:354c079828fae509c4f8e4ccb59199d275f17b0f26b1d7223fd64733788edf32
Copying blob sha256:26f52032c311fbc800e08f09294173c94c35c8fcd36ed2d43ee3255bda598373
Copying config sha256:068b656b38eb7ca9715019ba440d0cd2dade3154390e13b6397d4601a8bdce66
Writing manifest to image destination
Storing signatures
[1/2] STEP 2/11: ARG ARG_OS=linux
--> ef8de5d13a9
[1/2] STEP 3/11: ARG ARG_ARCH=amd64
--> f7bf97ebc3e
[1/2] STEP 4/11: ARG ARG_BIN=git-sync
--> 010315e264e
[1/2] STEP 5/11: ARG TARGETOS=linux
--> d643f9978f3
[1/2] STEP 6/11: ARG TARGETARCH=amd64
--> decd079af01
[1/2] STEP 7/11: WORKDIR /workspace
--> 5fe2777e29d
[1/2] STEP 8/11: RUN git clone https://github.com/kubernetes/git-sync.git /workspace
Cloning into '/workspace'...
/workspace/.git: Permission denied
error: build error: error building at STEP "RUN git clone https://github.com/kubernetes/git-sync.git /workspace": error while running runtime: exit status 1

๐Ÿ› [bug] - OpenMetadata Keycloak - Roles + Logout

๐Ÿ“ Description

Integration for Keycloak based login was added in #49

Two issues need some more work:

(1) No Backchannel logout mechanism in Openmetadata for KC. The frontchannel configured in the example did not seem to work (ie. session remains in KC post logout)

https://github.com/open-metadata/openmetadata-demo/blob/main/keycloak-sso/config/data-sec.json

likely this needs fixing in OMD itself.

(2) The Team / Roles - seem to be managed in the app - i.e. Admin is set using the env.var AUTHORIZER_ADMIN_PRINCIPALS and the default roles in KC client have no effect:

  roles:
    - name: DataConsumer
      composite: false
      clientRole: true
    - name: Admin
      composite: false
      clientRole: true
    - name: DataSteward
      composite: false
      clientRole: true

It would be nice if these roles could be managed in Keycloak instead.
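For reference, a minimal sketch of how admin access is currently driven by the application rather than by Keycloak roles (the value shown is illustrative, not taken from this repo):

```
# Sketch only: admin principals are set on the OpenMetadata server deployment
# via an environment variable, so the Keycloak client roles above are ignored.
env:
  - name: AUTHORIZER_ADMIN_PRINCIPALS
    value: "[admin]"          # illustrative value
```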

Review lakeFS / DVC for data versioning

Pachyderm provides only limited capabilities for managing data versioning, in particular:

  • No actual integration with the git repository for the data product
  • No interface allowing a client to visually inspect the data and the data versioning process
  • No integration with query capabilities

This issue is to review other open source projects, such as lakeFS and DVC, as potential replacements.

Integrate Kepler with observability and data mesh

Integrate Kepler (https://sustainable-computing.io/) with the data mesh pattern to generate power-consumption data at the pod level and leverage it as optimization data for AIOps use cases.

This will include:

  • Kepler deployment as part of the pattern (likely leveraging Helm charts)
  • Integration with cluster Prometheus metrics (see the sketch below)
  • Integration with the observability layer via distributed tracing
  • Persist Kepler data long-term in the lakehouse (on container storage) with a real-time pipeline

Note: ideally we filter to workload data only (no persistence of control-plane data), given how much storage this will generate, and provide a way to start/stop the collection.
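For the cluster Prometheus integration item above, a minimal sketch of what the wiring could look like, assuming Kepler's exporter Service carries the usual labels (namespace, labels, and port name are all assumptions):

```
# Hypothetical ServiceMonitor for scraping Kepler metrics into the cluster
# Prometheus; adjust names and labels to the actual Kepler chart.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: kepler-exporter
  namespace: kepler
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: kepler-exporter
  endpoints:
    - port: http
      interval: 30s
```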

๐Ÿ› [bug] - osc-ingest-tools 0.4.3 vs. SqlAlchemy 2.0

๐Ÿ“ Description

First off, the Elyra Dockerfile likely needs osc-ingest-tools to do anything interesting when it comes to building data pipelines. Alas, os-climate/osc-ingest-tools#46 cites that osc-ingest-tools uses code that's deprecated in SqlAlchemy 2.0. @HeatherAck

🚶 Steps to reproduce

When there is a Data Mesh pattern available in one of the OS-Climate clusters, I'll create a recipe for reproduction. This issue is just book-keeping at this point. @redmikhail

🧙‍♀️ Suggested solution

We need an updated osc-ingest-tools library (@erikerlandson) and an updated Elyra Dockerfile referencing that updated library.

๐Ÿ› [bug] - OpenMetaData integration with Airflow ingestion

๐Ÿ“ Description

Deployed Airflow data ingestion wit new version of OpenMetadata.. Able to create a pipeline

๐Ÿšถ Steps to reproduce

... Able to create a pipeline , but not able to deploy data ingestion pipeline to Airflow

๐Ÿง™โ€โ™€๏ธ Suggested solution

There is a issue with latest version of Metadata . opened support ticket in OpenMetaData slack waiting for resolution from OpenMetaData community
https://app.slack.com/client/T02BVTLN3G8

Prototype upstream pattern / downstream integration starting with Trino configuration

The pattern should take care of how downstream deployments evolve and get updated over time, allow evolution without breaking changes, and safeguard downstream-specific configurations (such as connectors).

For a start, we want to test in the context of OS-Climate how specific Trino configuration can be driven over time, and use this as a way to start exploring the relationship between upstream and downstream.

Integrate data mesh over platform observability

Integrate with OpenTelemetry to export metrics, logs, and traces from the platform (as well as potentially Kepler) into data mesh ingestion. For this we focus on the technical stacks our engineering team will use long term for metrics, logs, and traces collection in the platform:

  • Metrics: Prometheus / Thanos
  • Logs: Loki / Vector
  • Traces: Jaeger / OpenTelemetry

The proposed approach would create a single delivery layer for the metrics, logs, and traces collected and stored (potentially via ingestion through Trino / Iceberg).
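As a rough illustration of that single delivery layer, a minimal OpenTelemetry Collector configuration sketch (the receivers, endpoints, and pipeline wiring are assumptions for discussion, not the pattern's actual config):

```
# Sketch only: receive platform telemetry over OTLP, scrape Prometheus-style
# metrics, and forward everything to one mesh ingestion endpoint.
receivers:
  otlp:
    protocols:
      grpc: {}
  prometheus:
    config:
      scrape_configs:
        - job_name: platform
          static_configs:
            - targets: ["thanos-querier:9091"]   # hypothetical target
exporters:
  otlp:
    endpoint: mesh-ingestion:4317                # hypothetical endpoint
    tls:
      insecure: true
service:
  pipelines:
    metrics:
      receivers: [otlp, prometheus]
      exporters: [otlp]
    logs:
      receivers: [otlp]
      exporters: [otlp]
    traces:
      receivers: [otlp]
      exporters: [otlp]
```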

Set up BigChainDB and Trino connector (via MongoDB)

BigChainDB (https://www.bigchaindb.com/) allows developers and enterprises to deploy blockchain proofs-of-concept, platforms, and applications with a blockchain database, supporting a wide range of industries and use cases. In particular, it is used in GAIA-X and CatenaX for building decentralized data exchange secured by tokenization.

This issue is to support a PoC that deploys BigChainDB on our cluster, creates a digital record (https://www.bigchaindb.com/developers/guide/tutorial-piece-of-art/), and then queries the data via the MongoDB connector (https://docs.bigchaindb.com/en/latest/query.html).

Rule engine for data transformation

Maintenance of data taxonomies should ideally be done in some kind of standard format, with the ability to build rules for data equivalence between different data formats. This would be particularly useful for ESG taxonomy mapping. Without the ability to maintain mappings in a one-dimensional format, a lot of maintenance is required for cross-mappings, for example:

https://github.com/OS-SFT/Taxonomy-Mappings-Library

This issue is to investigate a better way to maintain mappings in order to support the taxonomy equivalence project run within OS-Climate.

@MichaelTiemannOSC

Dynamic Query Routing for Trino

Trino users may utilize various clients such as the CLI, JupyterHub, SQL editors, custom reporting tools, and other JDBC-based apps to connect to Trino from a wide range of locations over an enterprise network. Implementations of the data mesh pattern at scale will therefore typically run multiple Trino clusters to avoid a single point of failure, to scale out, and potentially to optimize network routing closer to the query location. This can be achieved with dynamic query routing; Goldman Sachs has implemented a solution using Envoy proxies to support this type of distributed Trino deployment.

https://developer.gs.com/blog/posts/enabling-highly-available-trino-clusters-at-goldman-sachs

We should review this architecture and determine if and how we could support similar deployment models with our pattern, in order to provide an out-of-the-box high availability approach.

๐Ÿ› [bug] - Cosmetic - Hyperlinks pointing to incorrect/different endpoints

๐Ÿ“ Description

The following URLs as mentioned in https://github.com/opendatahub-io-contrib/data-mesh-pattern#data-mesh-pattern are pointing to incorrect endpoints:
How to Move Beyond a Monolithic Data Lake to a Distributed Data Mesh -- Zhamak Dehghani, Thoughtworks

Data Mesh Principles and Logical Architecture -- Zhamak Dehghani, Thoughtworks

🚶 Steps to reproduce

Click on either of the following two links in https://github.com/opendatahub-io-contrib/data-mesh-pattern#data-mesh-pattern:
How to Move Beyond a Monolithic Data Lake to a Distributed Data Mesh -- Zhamak Dehghani, Thoughtworks

Data Mesh Principles and Logical Architecture -- Zhamak Dehghani, Thoughtworks

🧙‍♀️ Suggested solution

[How to Move Beyond a Monolithic Data Lake to a Distributed Data Mesh] should (most probably) be linked to https://martinfowler.com/articles/data-monolith-to-mesh.html

[Data Mesh Principles and Logical Architecture] should (most probably) be linked to https://martinfowler.com/articles/data-mesh-principles.html

Trino Connector - Mechanism for drop-in Connector Configuration

📝 Description

Currently the helm chart supports separate Hive/S3 catalog deployments. You can define multiple Hive catalogs in the catalogs list:

https://github.com/opendatahub-io-contrib/data-mesh-pattern/blob/main/gitops/trino/chart/trino/values.yaml#L80-L85

catalogs:
  # Hive Demo Catalog
  - name: demo
    enabled: true
    replicaCountHive: 1
    replicaCountDb: 1

and have different connection secrets, e.g. DEMO_*:

https://github.com/opendatahub-io-contrib/data-mesh-pattern/blob/main/gitops/trino/chart/trino/templates/hive-db-secret.yaml#LL15C1-L19C60

  database-password: <{{ $cat.name | upper }}_HIVE_DB_PASSWORD>
  database-host: {{ $cat.name }}-hive-db
  database-port: "5432"
  database-name: <{{ $cat.name | upper }}_HIVE_DB_NAME>
  database-user: <{{ $cat.name | upper }}_HIVE_DB_USERNAME>

Different approaches to this problem exist.

For example, the upstream chart uses a simple ConfigMap for this:

https://github.com/trinodb/charts/blob/main/charts/trino/templates/configmap-catalog.yaml#L12-L17

And in Operate First / OS-Climate this is done using kustomize overlays:

https://github.com/operate-first/apps/blob/master/kfdefs/overlays/osc/osc-cl2/trino/configs/catalogs/oef_openclimate.properties

We would like to document and extend a mechanism to support a broader range of connectors; a sketch of one option follows the questions below.

  • Should connectors be separate from the Helm chart?
  • For example, could they live exclusively as a set of connector kustomize overlays?
  • Or, with Helm, could all the connector properties be injected via separate connector values.yaml files?
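As a starting point for discussion, a minimal sketch of the ConfigMap-style drop-in (catalog name, metastore URI, and S3 endpoint are placeholders, not values from this repo):

```
# Hypothetical drop-in catalog ConfigMap, modelled loosely on the upstream
# chart's configmap-catalog.yaml; every value below is illustrative.
apiVersion: v1
kind: ConfigMap
metadata:
  name: trino-catalogs
data:
  demo.properties: |
    connector.name=hive
    hive.metastore.uri=thrift://demo-hive-metastore:9083
    hive.s3.endpoint=http://minio:9000
    hive.s3.path-style-access=true
```

Keeping catalogs outside the chart (whether as a ConfigMap like this or as per-catalog kustomize overlays) would let downstream deployments add connectors without forking the Helm chart.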

๐Ÿ› [issue] - single rhods-notebooks project, can we cater for multiple data science teams ?

๐Ÿ“ Description

RHODS places all of the users and their notebook pods in one namespace rhods-notebooks

https://access.redhat.com/documentation/en-us/red_hat_openshift_data_science/1/html-single/managing_users_and_user_resources

in a single OpenShift cluster it would be nice to be able to multi-tenant teams so that users notebooks are not visible to everyone who has access to the rhods-notebooks project.

๐Ÿง™โ€โ™€๏ธ Suggested solution

In the original code base, we could deploy an instance of upstream odh jupyterhub per-team i.e. multiple jupyterhub instances - thus allowing this type of separation.

👑 [exercise] - remove old non-data-mesh examples

📝 Description

The rainforest demo examples need removing and/or changing to target the data mesh instead.

🥤 Additional Info

See the docs/4-aiml-demos folder for user demos and examples.

✅ A/Cs

  • Big Picture Updated (if applicable)
  • Facilitator notes updated (if applicable)
  • Exercise peer reviewed / tested with one other region member
  • Addition of new exercise does not affect previous exercise (maintain modularity)

👑 [exercise] - On-board SAMEPATH

  • Module:

📝 Description

The SAMEPATH datasets (https://samepath.shinyapps.io/samepath/#dataAccess) consist of many tables from NGFS, UNIPRI, GECO, and other public sources related to sustainable finance. We want to demonstrate the ease with which we can federate this data from primary sources, maintain the data as it is updated (usually annually), and serve it as the future data source for the SAMEPATH visualization (R-Shiny) tools.

🥤 Additional Info

...link to the docs where the exercise should be added or links to blogs etc. that form the basis of the exercise.

✅ A/Cs

  • Big Picture Updated (if applicable)
  • Facilitator notes updated (if applicable)
  • Exercise peer reviewed / tested with one other region member
  • Addition of new exercise does not affect previous exercise (maintain modularity)

Self-signed certificate issue when connecting with Trino

We have a certificate issue when running a query against Trino while passing the self-signed certificate at https://github.com/opendatahub-io-contrib/data-mesh-pattern/blob/main/supply-chain/trino/trino-certs/ca.crt:

Code to reproduce:

import os
from sqlalchemy import create_engine, text

certificate_path = '../../ca.crt'
engine = create_engine(
    'trino://' + os.environ['TRINO_USER'] + ':' + os.environ['TRINO_PASSWD']
    + '@' + os.environ['TRINO_HOST'] + ':' + os.environ['TRINO_PORT'] + '/'
    + ingest_catalog,  # ingest_catalog is assumed to be defined earlier in the notebook
    connect_args={'verify': certificate_path},
)

with engine.connect() as connection:
    result = connection.execute(text('show catalogs'))
    for row in result:
        print(row)

Error:

SSLCertVerificationError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self signed certificate (_ssl.c:1131)

During handling of the above exception, another exception occurred:

MaxRetryError: HTTPSConnectionPool(host='trino-service.daintree-dev.svc.cluster.local', port=8443): Max retries exceeded with url: /v1/statement (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self signed certificate (_ssl.c:1131)')))

๐Ÿ› [bug] - pachyderm cannot start

๐Ÿ“ Description

Similar to #79 , pachyderm containers are unable to start as it appears to be looking for an operator group that isnt there:

        failed to populate resolver cache from source
        operatorgroup-unavailable/pachyderm: found 0 operatorgroups in namespace
        pachyderm: expected 1```
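If the namespace really is missing its OperatorGroup, a minimal sketch of the resource OLM expects (namespace and target list are assumptions to be checked against the actual install):

```
# Hypothetical OperatorGroup for the pachyderm namespace; OLM's resolver needs
# exactly one OperatorGroup in the namespace where the operator is installed.
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: pachyderm
  namespace: pachyderm
spec:
  targetNamespaces:
    - pachyderm
```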


๐Ÿ› [bug] - Elyra-tflow container fails to build

๐Ÿ“ Description

During the build of all the containers the elyra-tflow container fails to build:

Cloning "https://gitlab-ce.apps.osc-cl4.apps.os-climate.org/osclimate-datamesh/data-mesh-pattern" ...
	Commit:	efb9821cee326adb0256eaa715d14ab17deb4bae (UPDATE - project rename)
	Author:	Derek Dinosaur <[email protected]>
	Date:	Tue Jun 13 07:40:45 2023 +0000
Replaced Dockerfile FROM image elyra-base:0.2.1
time="2023-06-26T20:43:48Z" level=info msg="Not using native diff for overlay, this may cause degraded performance for building images: kernel has CONFIG_OVERLAY_FS_REDIRECT_DIR enabled"
I0626 20:43:48.530450       1 defaults.go:102] Defaulting to storage driver "overlay" with options [mountopt=metacopy=on].
Caching blobs under "/var/cache/blobs".

Pulling image image-registry.openshift-image-registry.svc:5000/osclimate-datamesh-ci-cd/elyra-base@sha256:c03982d4db4e361d5302a4d8c0632bc07bda9bc8b6ebb1b9029bcd8393bcb3ea ...
Trying to pull image-registry.openshift-image-registry.svc:5000/osclimate-datamesh-ci-cd/elyra-base@sha256:c03982d4db4e361d5302a4d8c0632bc07bda9bc8b6ebb1b9029bcd8393bcb3ea...
Getting image source signatures
Copying blob sha256:028bdc977650c08fcf7a2bb4a7abefaead71ff8a84a55ed5940b6dbc7e466045
Copying blob sha256:af38327575d72c979478aaddf6a33ad9cf561844588f5db47e85c4ee721012ec
Copying blob sha256:819ccd5eb87778d75c516f3a542ae6a3d2367498bd7062a701cb2237995f6cb5
Copying blob sha256:a439c75b0a4f2699983da35fc5e15fd9809bc37f694f54717020886cffc0548b
Copying blob sha256:0c673eb68f88b60abc0cba5ef8ddb9c256eaf627bfd49eb7e09a2369bb2e5db0
Copying blob sha256:c37fd7de0840b4031b29e532b9c694c59a63983ae93162a2e6476882cd075b21
Copying blob sha256:bf105214519e48fd5c21e598563e367f6f3b7c30996d1745a99428752c0ad1ae
Copying blob sha256:0cdbf2b404cc6f9f91c9f46d490f467080c4b5d8ee3b5d4c925e02a340e8d10b
Copying blob sha256:f2316205fe7bc7979d3019254716646bf2f786c1825faa1c1ed39f7420174b25
Copying blob sha256:68057c5053360a1a580bb505ba567d6f4c771d07fe959a30c547d4e276bc0467
Copying blob sha256:988a562fbd90b733eb253c56d63a830afed36df0e609418700caccd23a245fdc
Copying blob sha256:90cf9451d289c16ed981d2a646cfc979874f0eff05ea2e86edfefac87ff0b2e6
Copying blob sha256:ebb3898343c60b4a8d79aed8a93654dc73a0f980ea1bf7e30018bd449d4f611b
Copying blob sha256:059ceb835a667820ab78d7d6fb48b9e7fbb769ce612281ba189bed25ce0a99db
Copying blob sha256:f31e46de923b1250ab065453646dcff2466749a2e9549ea289b038cfa3fefe36
Copying blob sha256:9ff9b64097f0280c8b0ecd3a2a801bf474d0aa3fc160350fd699c1d929e0241b
Copying blob sha256:90c508cf12e1e5825e29e1eec796188af045440ffa6d697f35279a813b004b9b
Copying blob sha256:acab339ca1e8ed7aefa2b4c271176a7787663685bf8759f5ce69b40e4bd7ef86
Copying blob sha256:acab339ca1e8ed7aefa2b4c271176a7787663685bf8759f5ce69b40e4bd7ef86
Copying blob sha256:8fde022b6648ce49357f0c7620a96ba04104c5ad0e9029078ff878cfc37021bb
Copying blob sha256:90367ac5959ea0a29369bb20aff6c90903326a1fa703befc629d1cdf024fc99a
Copying blob sha256:ae97caea9fa3345a096a09d1df0fa8b68a31cdd398c4402748e0b548fe2f25ff
Copying blob sha256:acab339ca1e8ed7aefa2b4c271176a7787663685bf8759f5ce69b40e4bd7ef86
Copying blob sha256:acab339ca1e8ed7aefa2b4c271176a7787663685bf8759f5ce69b40e4bd7ef86
Copying blob sha256:8a441ef86887ecc2a66703d73d2b86538a75edfa38a90cc19d73bb7aaa4aa8cd
Copying blob sha256:23946ae671303d0c6cc4870accc51fe43463e7993e122f7b08082dc2a9726a0f
Copying blob sha256:073c6e194011062bd49b1ccda1819f15aa368590829afae3e0263759cf4dacba
Copying blob sha256:46e601ccae7c5a32545b6b6db733b3d2db5b6581b915520edbcbf262a2b79110
Copying blob sha256:4a269fdc289ca6b7833584bede177c80aa91f2706dea33bc2b94398a3e83d9d0
Copying blob sha256:12ef46b74f05917750404a8de7565168740216fa44be5b19d5d75273a3ec0c86
Copying blob sha256:b2ec48efaf35963d699ae8446e20120869f9fb1ca34ee70f64b82a6050e627f7
Copying blob sha256:c48957ebd2d09b52f4d564cbd5914b1b9e94939f21142f6041db41d0e62fab74
Copying blob sha256:08c9d67bcd774940f73a67eb036be8a756d8eab9b2e4c43bc4e0bcdf17cdaea3
Copying blob sha256:c23f73eb778d14742f04e1238227b8efc4fd1ce51d17a98100744e912e752901
Copying blob sha256:5e0654a3c30dd59ab31f6531ae1a8ad9a8368c5cb6368550e0de2e7c66f9b3b9
Copying blob sha256:237741efa6248120129716d660cc7fece732ea172110784949b97a96e681cb62
Copying blob sha256:58787dd3cb793f5983c0aaa6b70341c30a41a1bb60fc1a5f6f1cd9061ee2edc0
Copying blob sha256:4f5aa417a25f646d2d39642577d4580eedd0fe809c857932aeabd3bb22587bb9
Copying blob sha256:f009e2fceca5421f4769b12a3dd42777940ee1e6e8f17c8c5b77b5e248b9b7d2
Copying blob sha256:2fb528adb3814ee51b07a0165956060c4d0703d454a18f08c6430ed667ed5853
Copying blob sha256:cef676ff822d33c5bdc8cc17a6af24ce425f2353463b189c7ee1a637c2d012ea
Copying blob sha256:f7ea4b46629aacb0aacbf8fe8197fb924a48c9e8875d9f9721565b4a7374549a
Copying blob sha256:d1473e2d5c4be6a885eba43606bbe79229239b92427436391a8cf9edb977e357
Copying blob sha256:d3cda3d33521c0cd44da393733605297f341d7e36a42850e945d122578533ded
Copying config sha256:4a7596a0ebbeb7ba5f97a2ca3d310d6ec4b0842fa024310ec3e235517d45d4dd
Writing manifest to image destination
Storing signatures
Adding transient rw bind mount for /run/secrets/rhsm
STEP 1/8: FROM image-registry.openshift-image-registry.svc:5000/osclimate-datamesh-ci-cd/elyra-base@sha256:c03982d4db4e361d5302a4d8c0632bc07bda9bc8b6ebb1b9029bcd8393bcb3ea
STEP 2/8: USER root
--> 8ab6947207f
STEP 3/8: RUN /opt/app-root/bin/pip3 install jinja2==3.1.2
Looking in indexes: https://nexus-osclimate-datamesh-ci-cd.apps.osc-cl4.apps.os-climate.org/repository/pypi/simple
Requirement already satisfied: jinja2==3.1.2 in /opt/app-root/lib/python3.8/site-packages (3.1.2)
Requirement already satisfied: MarkupSafe>=2.0 in /opt/app-root/lib/python3.8/site-packages (from jinja2==3.1.2) (2.1.1)
WARNING: You are using pip version 21.2.3; however, version 23.1.2 is available.
You should consider upgrading via the '/opt/app-root/bin/python3.8 -m pip install --upgrade pip' command.
--> fbfab4021e8
STEP 4/8: RUN /opt/app-root/bin/pip3 install certifi
Looking in indexes: https://nexus-osclimate-datamesh-ci-cd.apps.osc-cl4.apps.os-climate.org/repository/pypi/simple
Requirement already satisfied: certifi in /opt/app-root/lib/python3.8/site-packages (2022.9.24)
WARNING: You are using pip version 21.2.3; however, version 23.1.2 is available.
You should consider upgrading via the '/opt/app-root/bin/python3.8 -m pip install --upgrade pip' command.
--> 1c77c833b0e
STEP 5/8: RUN /opt/app-root/bin/pip3 install matplotlib numpy pandas scipy scikit-learn tensorflow minio
Looking in indexes: https://nexus-osclimate-datamesh-ci-cd.apps.osc-cl4.apps.os-climate.org/repository/pypi/simple
Collecting matplotlib
  Downloading https://nexus-osclimate-datamesh-ci-cd.apps.osc-cl4.apps.os-climate.org/repository/pypi/packages/matplotlib/3.7.1/matplotlib-3.7.1-cp38-cp38-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (9.2 MB)
Collecting numpy
  Downloading https://nexus-osclimate-datamesh-ci-cd.apps.osc-cl4.apps.os-climate.org/repository/pypi/packages/numpy/1.24.4/numpy-1.24.4-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (17.3 MB)
Collecting pandas
  Downloading https://nexus-osclimate-datamesh-ci-cd.apps.osc-cl4.apps.os-climate.org/repository/pypi/packages/pandas/2.0.2/pandas-2.0.2-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (12.3 MB)
Collecting scipy
  Downloading https://nexus-osclimate-datamesh-ci-cd.apps.osc-cl4.apps.os-climate.org/repository/pypi/packages/scipy/1.10.1/scipy-1.10.1-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (34.5 MB)
Collecting scikit-learn
  Downloading https://nexus-osclimate-datamesh-ci-cd.apps.osc-cl4.apps.os-climate.org/repository/pypi/packages/scikit-learn/1.2.2/scikit_learn-1.2.2-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (9.8 MB)
Collecting tensorflow
  Downloading https://nexus-osclimate-datamesh-ci-cd.apps.osc-cl4.apps.os-climate.org/repository/pypi/packages/tensorflow/2.12.0/tensorflow-2.12.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (585.9 MB)
ERROR: THESE PACKAGES DO NOT MATCH THE HASHES FROM THE REQUIREMENTS FILE. If you have updated the package versions, please update the hashes. Otherwise, examine the package contents carefully; someone may have tampered with them.
    tensorflow from https://nexus-osclimate-datamesh-ci-cd.apps.osc-cl4.apps.os-climate.org/repository/pypi/packages/tensorflow/2.12.0/tensorflow-2.12.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl#sha256=23850332f1f9f778d697c9dba63ca52be72cb73363e75ad358f07ddafef63c01:
        Expected sha256 23850332f1f9f778d697c9dba63ca52be72cb73363e75ad358f07ddafef63c01
             Got        2ecfc624220e0e36c414dc6889ab365f02f50a9edc3f230dcebbd4955cbf62fa

WARNING: You are using pip version 21.2.3; however, version 23.1.2 is available.
You should consider upgrading via the '/opt/app-root/bin/python3.8 -m pip install --upgrade pip' command.
error: build error: error building at STEP "RUN /opt/app-root/bin/pip3 install matplotlib numpy pandas scipy scikit-learn tensorflow minio": error while running runtime: exit status 1

👑 [exercise] - Google Data Commons POC

  • Module:

๐Ÿ“ Description

The Google Data Commons (https://datacommons.org/) has over 1 trillion datapoints of all kinds, organized in a knowledge graph and available via BigQuery. Some of this data is directly useful to climate and sustainable finance analysis, and some of this data could be useful when linked to corporate ownership (via entity matching).

Here are datasets federated by Google's Data Commons that relate to the topic Environment: https://docs.datacommons.org/datasets/Environment.html

Here is a narrowing of that data that relates to the topic Emissions within the US (based on EPA GHGRP): https://datacommons.org/tools/map#%26sv%3DAnnual_Emissions_CarbonDioxide_NonBiogenic%26pc%3D0%26denom%3DCount_Person%26pd%3Dcountry%2FUSA%26ept%3DState%26ppt%3DEpaReportingFacility

The goal of this exercise is to demonstrate our ability to federate a tiny but meaningful slice of Google's Data Commons data into the Data Mesh and to expose that data within OS-Climate's Data Exchange. The data should be chosen so that a meaningful "so what?" question can be answered, but the overall point of the exercise is to assess the ease with which the Data Mesh can enable data analysts to be maximally productive and effective when asking and answering climate and sustainable finance questions.
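One plausible route (an assumption, not a decided approach) is Trino's BigQuery connector, which would let the mesh federate a slice of Data Commons directly; a sketch of such a catalog drop-in, with placeholder project and credentials:

```
# Hypothetical BigQuery catalog for federating Data Commons via Trino;
# the project id and credentials path are placeholders.
apiVersion: v1
kind: ConfigMap
metadata:
  name: trino-bigquery-catalog
data:
  datacommons.properties: |
    connector.name=bigquery
    bigquery.project-id=my-gcp-project
    bigquery.credentials-file=/etc/trino/bigquery/service-account.json
```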

🥤 Additional Info

Please feel free to flesh out and/or ask further questions.

✅ A/Cs

  • Big Picture Updated (if applicable)
  • Facilitator notes updated (if applicable)
  • Exercise peer reviewed / tested with one other region member
  • Addition of new exercise does not affect previous exercise (maintain modularity)
