
veda-data-airflow's Introduction

veda-data-airflow

This repo houses function code and deployment code for producing cloud-optimized data products and STAC metadata for interfaces such as https://github.com/NASA-IMPACT/delta-ui.

Project layout

  • dags: Contains the Directed Acyclic Graphs which constitute Airflow state machines. This includes the Python code for running each task as well as the Python definitions of the structure of these DAGs
  • pipeline_tasks: Contains util functions used in Python DAGs
  • data: Contains JSON files which define ingests of collections and items
  • docker_tasks: Contains definitions of tasks which we want to run in Docker containers, either because these tasks have special, unique dependencies or for the sake of performance (e.g. using multiprocessing)
  • infrastructure: Contains the Terraform modules necessary to deploy all resources to AWS
  • custom policies: Contains custom policies for the MWAA environment execution role
  • scripts: Contains bash and Python scripts useful for deploying and for running ingests

Fetching Submodules

First time setting up the repo: git submodule update --init --recursive

Afterwards: git submodule update --recursive --remote

Requirements

Docker

See get-docker

Terraform

See terraform-getting-started

AWS CLI

See getting-started-install

Deployment

This project uses Terraform modules to deploy Apache Airflow and related AWS resources using Amazon's managed Airflow provider.

Make sure that environment variables are set

[.env.example](./.env.example) contains the environment variables which are necessary to deploy. Copy this file and update its contents with actual values. The deploy script will `source` and use this file during deployment when provided through the command line:

# Copy .env.example to a new file
$ cp .env.example .env
# Fill in values for the environment variables

# Init terraform modules
$ bash ./scripts/deploy.sh .env <<< init

# Deploy
$ bash ./scripts/deploy.sh .env <<< deploy

Note: Be careful not to check in .env (or whatever you called your env file) when committing work.

Currently, the client id and domain of an existing Cognito user pool programmatic client must be supplied in configuration as VEDA_CLIENT_ID and VEDA_COGNITO_DOMAIN (the veda-auth project can be used to deploy a Cognito user pool and client). To dispense auth tokens via the workflows API swagger docs, an administrator must add the ingest API lambda URL to the allowed callbacks of the Cognito client.

Gitflow Model

VEDA pipeline gitflow

License

This project is licensed under Apache 2, see the LICENSE file for more details.


veda-data-airflow's Issues

Implement OpenTelemetry in Workflows API

What

Observability of the STAC and Raster APIs serves to manage telemetry data such as traces, metrics, and logs. Currently, default Lambda logs and metrics are sent to CloudWatch.

The AWS Distro for OpenTelemetry (ADOT) or the OpenTelemetry Lambda Layer can be used to implement observability in a way compatible with VEDA backend services. Telemetry data can be exported to AWS X-Ray, CloudWatch, or Amazon Managed Service for Prometheus; an appropriate exporter must be evaluated and selected.
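
As a rough illustration, here is a minimal sketch of instrumenting a FastAPI-based workflows API with the OpenTelemetry Python SDK, exporting spans over OTLP to a local collector such as the ADOT Lambda layer. The FastAPI assumption and the service name are illustrative, not the project's actual setup:

# Minimal sketch: OpenTelemetry tracing for a FastAPI app, exporting via
# OTLP to a collector layer/sidecar (endpoint defaults to localhost:4317).
from fastapi import FastAPI
from opentelemetry import trace
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.instrumentation.fastapi import FastAPIInstrumentor

resource = Resource.create({"service.name": "veda-workflows-api"})  # assumed name
provider = TracerProvider(resource=resource)
provider.add_span_processor(BatchSpanProcessor(OTLPSpanExporter()))
trace.set_tracer_provider(provider)

app = FastAPI()
FastAPIInstrumentor.instrument_app(app)  # one span per request, correlated with logs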

PI Objective

Objective 2: Production Services: CI Updates and Monitoring

Acceptance Criteria

  • Workflow API is instrumented and correlated logs, metrics, and traces are sent to an appropriate backend (CloudWatch, AWS X-Ray, etc.)

Fix item stac record metadata issues

Description

The item STAC metadata has some invalid fields:

  • the proj fields (e.g. proj:epsg) should be integers, not floats
  • single_datetime vs start_datetime/end_datetime format mismatch
  • all datetimes should conform to the stac-api spec
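
As a hedged sketch (not the repo's actual fix), the first issue could be addressed by coercing item properties before publishing; the field names follow the STAC projection extension:

# Coerce proj fields to the integer types the projection extension expects,
# e.g. "proj:epsg": 4326.0 -> 4326. Purely illustrative.
def normalize_proj_fields(properties: dict) -> dict:
    if properties.get("proj:epsg") is not None:
        properties["proj:epsg"] = int(properties["proj:epsg"])
    if properties.get("proj:shape") is not None:
        properties["proj:shape"] = [int(v) for v in properties["proj:shape"]]
    return properties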

Acceptance Criteria

  • The item stac records no longer have these issues

Resources

- rio_stac version should fix the first issue
https://github.com/US-GHG-Center/ghgc-data-airflow/blob/537f43da4f319a8f656a1a19f7265dfe183bc5dd/docker_tasks/build_stac/requirements.txt#L7-L8
see https://github.com/developmentseed/rio-stac/blob/main/CHANGES.md#080-2023-05-26
(it was an ingestor issue)

From the STAC item spec on datetime: "REQUIRED. The searchable date and time of the assets, which must be in UTC. It is formatted according to RFC 3339, section 5.6. null is allowed, but requires start_datetime and end_datetime from common metadata to be set."

Convert Workflows APIs ingestion logic into modular Airflow DAGs

What

The existing Workflows API includes ingestion logic in the API itself. To generalize the Workflows API for different ingestion processes and stakeholders (i.e. MAAP and VEDA), the specific ingestion logic can be converted to modular Airflow DAGs. This flexibility will enable more user-friendly ingestion workflows in the future.

PI Objective

Objective 3: Improve Ingestion Workflows

Success Criteria

  • Workflow API is converted to modular DAGs and deployed to veda-data-airflow dev

Reconcile validations with ingest-api

What

The workflows API duplicates validations that are executed by the ingest API. We need to make sure these have the same effect and/or are removed as not needed in workflows.

Note: this is just a reminder to confirm that we didn't migrate any of our legacy validation bugs to the new workflows API. Success could be as simple as confirming that the validators in the workflows API are functionally the same as the recently corrected validators in veda-backend/ingest-api.

Moreover, the current workflows API schema base model does not include the renders or providers fields and will fail when run with those properties. Either these fields should be included in the workflows model, or downstream schema validation should be left to the ingest API.
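
A minimal sketch of the first option, assuming the workflows schema uses pydantic (the model and field defaults here are illustrative): accept the fields and pass anything unknown through, deferring strict validation to ingest-api:

# Hypothetical workflows-api collection model: include renders/providers and
# tolerate extra fields rather than failing validation.
from typing import Optional
from pydantic import BaseModel

class WorkflowCollection(BaseModel):
    id: str
    renders: Optional[dict] = None
    providers: Optional[list] = None

    class Config:
        extra = "allow"  # leave strict validation to the ingest API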

AC

  • validations are deduplicated if needed, preferring to validate in ingest-api

Add workflow to verify asset availability

Given that we have version control on all inputs to the step functions/lambdas which define ingests, it should be possible to write a small integration test which verifies that all imagery loaded to S3 is reflected in the database.

(This is an issue migrated over from veda-data-pipelines)
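
A hedged sketch of such a test, assuming a pytest-style check that every COG under an ingest prefix has a corresponding STAC item (the bucket, prefix, collection, STAC endpoint, and item-id convention are illustrative):

# List .tif objects under the prefix and assert a STAC item exists for each.
import boto3
import requests

STAC_URL = "https://dev-stac.delta-backend.com"  # assumed STAC endpoint

def test_all_s3_assets_have_stac_items():
    s3 = boto3.client("s3")
    pages = s3.get_paginator("list_objects_v2").paginate(
        Bucket="veda-data-store-staging", Prefix="co2-diff/"
    )
    keys = [
        obj["Key"]
        for page in pages
        for obj in page.get("Contents", [])
        if obj["Key"].endswith(".tif")
    ]
    for key in keys:
        item_id = key.rsplit("/", 1)[-1].removesuffix(".tif")
        resp = requests.get(f"{STAC_URL}/collections/co2-diff/items/{item_id}")
        assert resp.status_code == 200, f"Missing STAC item for {key}"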

Troubleshoot and fix /list-workflows and /discovery-executions

GET /list-workflows, and GET /discovery-executions with a workflow_execution_id that doesn't exist, return an Internal Server Error.

/list-workflows error:

"Unhandled exception: 2 validation errors:\n  {'loc': ('response',), 'msg': 'value is not a valid dict', 'type': 'type_error.dict'}\n  {'loc': ('response',), 'msg': 'value is not a valid dict', 'type': 'type_error.dict'}\n",

/discovery-executions error:

Exception: Failed to find dag run id: 9ec62382-deca-436b-8038-b3e835cd837e_be3d675f-2f8b-43c4-904c-d113c04477f5

For the discovery-executions exception, if the dag run id cannot be found, we should throw a more helpful error message.
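
A minimal sketch of the friendlier error, assuming a FastAPI handler and a hypothetical get_dag_run lookup helper:

# Return a 404 with a clear message instead of an unhandled exception when
# the dag run id is unknown. get_dag_run is a hypothetical helper.
from fastapi import FastAPI, HTTPException

app = FastAPI()

@app.get("/discovery-executions/{workflow_execution_id}")
def discovery_execution(workflow_execution_id: str):
    dag_run = get_dag_run(workflow_execution_id)  # hypothetical lookup
    if dag_run is None:
        raise HTTPException(
            status_code=404,
            detail=f"No dag run found for id '{workflow_execution_id}'",
        )
    return dag_run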

Create DAG using stactools to build and publish STAC items

Using stactools can give data providers another way to describe ingestion steps, which supports our goal of making ingestion more user-configurable. Stactools packages can also be re-used in other contexts, which makes it easier for advanced users to bring VEDA data into their own STAC catalogs.

A similar approach has been used in ASDI, but for VEDA we would need to adapt these pipelines to use Airflow.
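
A hedged sketch of what such a DAG could look like; `stactools.package` stands in for a real stactools package (most expose a create_item helper by convention), and the publish step is a placeholder, not the actual VEDA ingest call:

# Illustrative Airflow TaskFlow DAG delegating item creation to a stactools
# package; a sketch, not an implementation of the actual VEDA pipeline.
import json
import pendulum
from airflow.decorators import dag, task

@dag(schedule=None, start_date=pendulum.datetime(2024, 1, 1), catchup=False)
def stactools_build_stac():
    @task
    def build_item(asset_href: str) -> dict:
        from stactools.package import stac  # hypothetical stactools package
        return stac.create_item(asset_href).to_dict()

    @task
    def publish(item: dict) -> None:
        print(json.dumps(item))  # placeholder for the ingest-api POST

    publish(build_item("s3://veda-data-store-staging/example.tif"))

stactools_build_stac()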

Related: https://github.com/NASA-IMPACT/veda-architecture/issues/283

Test environment SM2A deployment for VEDA

What

The existing veda MWAA will eventually be deployed as a self-managed Airflow service (aka SM2A). This will offer stability, speed, and additional features compared to AWS MWAA. Before SM2A can be deployed, existing task definition DAGs must be reconfigured.

PI Objective

Objective 3: Improve Ingestion Workflows

Acceptance Criteria

  • Reconfigured task definitions are reviewed by the team
  • Test SM2A is deployed in UAH

Continuous ingest of HLS dataset

Description

The goal is to continually update the HLS dataset as additional data is uploaded to the DAAC, without having to run any manual functions.

Acceptance Criteria

  1. Given new data has been added to the HLS dataset at the DAAC within parameters that have previously been configured within the VEDA Dashboard, when I visit the VEDA Dashboard then that new data is available for viewing

Event Driven Ingest of daily datasets

The fireline researchers are storing updated datasets in S3 multiple times a day. These updates will need to be ingested by using Airflow to trigger the discover DAG. Amazon EventBridge can be used to facilitate these events.
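
A hedged sketch of the wiring (the endpoint, DAG id, and auth handling are assumptions; MWAA in particular requires its own token exchange, omitted here): a Lambda subscribed to the S3 event triggers the discover DAG through Airflow's stable REST API:

# Illustrative Lambda handler: forward the uploaded object's location to a
# new dagRun of the discover DAG via the Airflow REST API.
import json
import os
import urllib.request

AIRFLOW_API = os.environ["AIRFLOW_API"]  # e.g. https://<airflow-host>/api/v1

def handler(event, context):
    record = event["Records"][0]
    body = json.dumps({
        "conf": {
            "bucket": record["s3"]["bucket"]["name"],
            "key": record["s3"]["object"]["key"],
        }
    }).encode()
    req = urllib.request.Request(
        f"{AIRFLOW_API}/dags/veda_discover/dagRuns",  # assumed DAG id
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:  # auth header omitted
        return resp.status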

Acceptance Criteria

  • Uploading a file to S3 triggers the discover DAG

Update env vars to work with `veda-deploy`

Description

The last PR, which added the workflows-api to the repo, introduced a bunch of new environment variables. Now that we're planning to use veda-deploy for deployment, we need to be mindful that dependent env vars are output via CfnOutput from the dependencies and that the format of the environment variables is correct.

A lot of these new env vars are already exported from the other projects and have a certain format. Some can be derived from the existing env vars.

We'll need to update those.

Here's the list:

data_access_role_arn="${DATA_ACCESS_ROLE_ARN}" // VEDA_DATA_ACCESS_ROLE_ARN
workflow_root_path="${WORKFLOW_ROOT_PATH}" // add new in secrets
ingest_url="${INGEST_URL}" // VEDA_STAC_INGESTOR_API_URL
raster_url="${RASTER_URL}" // VEDA_RASTER_URL
stac_url="${STAC_URL}"  // VEDA_STAC_URL
cloudfront_id="${CLOUDFRONT_ID}" // secrets
jwks_url="${JWKS_URL}" // can be derived from VEDA_COGNITO_APP_SECRET
cognito_userpool_id="${COGNITO_USERPOOL_ID}" // can be derived from VEDA_COGNITO_APP_SECRET
cognito_client_id="${COGNITO_CLIENT_ID}"    //  can be derived from VEDA_COGNITO_APP_SECRET
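
For the derived values, note that the JWKS URL follows a standard Cognito convention once the region and user pool id are known, so it can be computed rather than stored; a minimal sketch (the example values are illustrative):

# Standard Cognito JWKS location for a user pool.
def jwks_url(region: str, userpool_id: str) -> str:
    return (
        f"https://cognito-idp.{region}.amazonaws.com/"
        f"{userpool_id}/.well-known/jwks.json"
    )

# e.g. jwks_url("us-west-2", "us-west-2_XXXXXXXXX")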

Fix /list-workflows endpoint

Context

In some stacks, like mcp-test, when you try to use the /list-workflows endpoint, you get an Internal Server Error. The logs indicate that the function isn't returning a dict:

{'loc': ('response',), 'msg': 'value is not a valid dict', 'type': 'type_error.dict'}

Upon investigation, the function was returning an encoded dict and the decoded stderr said:

Error: Failed to load all files. For details, run `airflow dags list-import-errors`

Acceptance Criteria

  • Endpoint is fixed
  • Error messaging is added if there is an error so it's easier to debug/handle

Support for test ingestion run

Description

Some of the datasets to be ingested have a huge number of data files (for example, CMIP6); triggering an ingest for such a dataset will trigger ingest for all the files in the collection.

Although this can be limited by using filename_regex, it isn't always easy to find a pattern that will include a certain number of files.

So, to make this process easier, add support for a test_run value in the dataset ingestion definition, which can be either a bool (in which case, ingestion is triggered for a predetermined number of items, e.g. 10) or an int (in which case, ingest is triggered for that many items).

That way users can run the workflow for a subset of the dataset, check that everything looks good, and if so, rerun the ingest without the test_run key to trigger the full ingest.
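
A minimal sketch of that interpretation (the default sample size of 10 is the example above, not a decided value):

# Cap discovered items according to an optional test_run value: True -> a
# predetermined sample, int -> that many items, absent/False -> everything.
DEFAULT_SAMPLE = 10

def limit_items(items: list, test_run=None) -> list:
    if test_run is True:
        return items[:DEFAULT_SAMPLE]
    if isinstance(test_run, int) and not isinstance(test_run, bool) and test_run > 0:
        return items[:test_run]
    return items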

Acceptance Criteria

  • Dataset ingestion definition supports a test run for a subset of the dataset

Bug: Deployment doesn't push latest changes in the ecs tasks

Description

The ECS task images (e.g. build_stac) are only updated on deployment if there are changes in the Dockerfile or handler.py, which isn't necessarily correct because the tasks depend on other files too.

This is a limitation in the base terraform module used, since the triggers are only those two files.

Add app name and stage to airflow UI title

What

Add a title displaying the veda instance name and stage to the Airflow UI. It looks like a simple environment configuration change:

https://airflow.apache.org/docs/apache-airflow/stable/howto/customize-ui.html#customizing-dag-ui-header-and-airflow-page-titles
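
Per those docs, the page title comes from the webserver.instance_name config option; a hedged sketch of setting it through the standard Airflow config environment variable (the naming scheme shown is an assumption):

# Airflow reads AIRFLOW__WEBSERVER__INSTANCE_NAME at startup; in MWAA the
# same option can be supplied via airflow configuration options.
import os

os.environ["AIRFLOW__WEBSERVER__INSTANCE_NAME"] = "veda-pipeline [dev]"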

Why

It is unsafe to rely on the UUID in the URL alone to determine which VEDA instance an Airflow system updates.

AC

  • it is clear from the browser which veda instance a given Airflow UI is tied to

Include ingestor dataset publish workflows in veda-data-airflow

What

The /dataset/* endpoints in veda-stac-ingestor currently employ circular logic (e.g. stac-ingestor -> airflow -> stac-ingestor). To resolve this, the dataset/* endpoints will be moved to veda-data-airflow.

This work will include porting the dataset endpoints to veda-data-airflow by creating Lambda and API gateway constructs using Terraform.

PI Objective

Acceptance Criteria

  • Runtime and construct for dataset endpoints are ported to veda-data-airflow and tested (this will require NASA-IMPACT/veda-backend#294 to be completed first)

Pass discovery output to next step via s3

Motivation

The max size of the payload that can be passed between states in a step function is 256KB. Sometimes, when the number of items discovered is too large (s3-discovery lambda), the payload size exceeds the threshold, resulting in the cancellation of the state machine.

Workaround

The workaround we've been using up until now is to use the filename_regex key to divide the total items into chunks and run separate workflows for each chunk.
E.g.:

    {
        "collection": "co2-diff",
        "prefix": "co2-diff/",
        "bucket": "veda-data-store-staging",
        "filename_regex": "^(.*)2015.*.tif$",
        "discovery": "s3"
    },
    {
        "collection": "co2-diff",
        "prefix": "co2-diff/",
        "bucket": "veda-data-store-staging",
        "filename_regex": "^(.*)2016.*.tif$",
        "discovery": "s3"
    },

instead of

    {
        "collection": "co2-diff",
        "prefix": "co2-diff/",
        "bucket": "veda-data-store-staging",
        "filename_regex": "^*.tif$",
        "discovery": "s3"
    },

Solution

Rather than passing the payload directly to another state, we could write the payload to an S3 bucket, pass only the URL of the object, and have the next state read the object from the S3 bucket.
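
A minimal sketch of that pattern (the bucket and key scheme are illustrative):

# Persist an oversized payload to S3 and pass only its URI between states;
# the next state loads it back before processing.
import json
import uuid
import boto3

s3 = boto3.client("s3")
BUCKET = "veda-pipeline-intermediate"  # assumed bucket

def offload_payload(payload: dict) -> dict:
    key = f"discovery-output/{uuid.uuid4()}.json"
    s3.put_object(Bucket=BUCKET, Key=key, Body=json.dumps(payload).encode())
    return {"payload_s3_uri": f"s3://{BUCKET}/{key}"}

def load_payload(event: dict) -> dict:
    bucket, key = event["payload_s3_uri"].removeprefix("s3://").split("/", 1)
    obj = s3.get_object(Bucket=bucket, Key=key)
    return json.loads(obj["Body"].read())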

Container override not referencing correct task definition container

What

When running an ingest in the dev MWAA, the veda_ingest_raster DAG fails at the build_stac process. The error shown is:

An error occurred (InvalidParameterException) when calling the RunTask operation: Override for container named veda-pipeline-dev-veda-build_stac is not a container in the TaskDefinition.

This error occurs because the DAG files being uploaded to S3 do not align with the task definition container name in ECS. This may be a result of the way the MWAA module's DAG upload to S3 is triggered.
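
For reference, a hedged sketch of the constraint: the name inside containerOverrides passed to RunTask must exactly match a container name registered in the task definition (all names below are illustrative):

# boto3 RunTask call; the override name must equal the container name in the
# TaskDefinition, or ECS raises InvalidParameterException as seen above.
import boto3

ecs = boto3.client("ecs")
ecs.run_task(
    cluster="veda-pipeline-dev",                    # assumed cluster
    taskDefinition="veda-pipeline-dev-build-stac",  # assumed family
    overrides={
        "containerOverrides": [
            {
                "name": "veda-pipeline-dev-veda-build_stac",  # must match
                "command": ["python", "handler.py"],
            }
        ]
    },
)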

Acceptance Criteria

  • The veda_ingest_raster DAG is able to run without errors, and changes to the DAG trigger an upload to the S3 bucket.

Cognito landing page errors

In production, when attempting to authorize through the docs page, I get redirected to a Cognito error page at https://veda-auth-stack-production.auth.us-west-2.amazoncognito.com/error?error=Required+parameters+missing. There might be a missing configuration value, but I can't figure out what it might be.

The same flow on the ingest API, which uses the same auth flow and client ID, brings me to the expected Cognito page, where I can log in successfully. The dev environment also worked for me, so I think this is environment-specific configuration.

cc: @botanical @anayeaye

Implement workflows-api auth login from swagger

What

Currently administrators must manually post a username and password to a token endpoint in the veda-backend ingest API and copy-paste the token from the response for workflows operations. Update this auth flow to follow the more standard pattern of redirecting to the auth provider for secure username and password form entry, then redirecting back to the swagger docs.

AC

  • token auth urls are configurable by veda environment
  • cognito user pool updated to allow callback to veda data airflow workflows api docs
  • admins in the cognito user pool associated with backend stack can login via swagger docs
  • admins can successfully use authenticated endpoints like collections/ and ingestions/
  • authenticated Airflow pipeline triggers like discover-items are attempted, with the success or error response documented (there are networking issues that may block successful discovery pipelines; these should not block the success of this issue)

ADR discussing status tracking

It is not uncommon for even far simpler ETL and async processing pipelines than the one we're aiming to create to have robust status tracking, ingest identification, and awareness/record keeping around the versions of all software used during ingest. It would be useful for us to consider our options for keeping track of this kind of information. Perhaps a DynamoDB table that can track status and other relevant information? Maybe it could be used by the ingestor API as well, to clarify system behavior to third parties.

Throw exception in DAG if status not 2xx

This issue was encountered when we were trying to ingest fresh collections with the veda_discover DAG. The DAG operation was a SUCCESS but no items were ingested by the ingestor API.

Further investigation through CloudWatch logs revealed that the POST /ingestions endpoint was returning a 422 response because it didn't have access to the S3 bucket it was supposed to have. Airflow ignored this and continued as a success.

Add an exception handler in Airflow that fails the job if the response status is outside the 2xx range.
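
A minimal sketch of the handler, assuming the DAG task posts with requests (the URL and payload are illustrative):

# Fail the Airflow task on any non-2xx response instead of continuing.
import requests

def submit_ingestion(ingest_url: str, payload: dict) -> dict:
    resp = requests.post(f"{ingest_url}/ingestions", json=payload, timeout=30)
    if not 200 <= resp.status_code < 300:
        # Raising makes the Airflow task (and therefore the DAG run) fail.
        raise Exception(
            f"Ingestion failed with status {resp.status_code}: {resp.text}"
        )
    return resp.json()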

AC

  • Add exception handler to throw error if status is 4xx or 5xx

Connect to HLS Dataset outside of VEDA

Description

Currently, we have had to upload some small portion of HLS Data directly into VEDA. We would like to connect directly to the data stored on another DAAC, so that we don't have data duplicated in multiple systems.

This ticket is intended to be separate from the work to continuously update data based on additions to the HLS dataset. So, theoretically, it would be a snapshot in time and have to be manually updated until that other ticket is complete.

Acceptance Criteria

  1. Given I am viewing the VEDA Dashboard, when I view the HLS dataset then I see the full data available from the DAAC (which DAAC?) within the map explorer

Add networking configuration to workflows API lambda

What

Currently, our workflows API lambda is not configured with any VPC.

Acceptance Criteria

  • terraform configuration added to set up networking for workflows API lambda
  • testing is done to prove that deployment is successful and lambda is configured with the correct VPC and subnet IDs

Comparison of data ingest using Airflow vs Legacy (Step Functions)

Description

To validate the ingests done using [1] the new Airflow-based pipeline, this issue runs the ingestion using both [1] and [2] the legacy Step Functions-based pipeline.

The ingest is initiated via the veda-stac-ingestor API endpoint /dataset/publish with the same inputs for both, except the collection id, as can be seen below:

For [2], the input was:

{
  "collection": "lis-global-da-tws-trend",
  "title": "Terrestrial Water Storage Trend - LIS 10km Global DA",
  "description": "Gridded trend in terrestrial water storage (theil-sen slope estimation in mm yr-1) from 10km global LIS with assimilation",
  "license": "CC0-1.0",
  "is_periodic": false,
  "time_density": null,
  "spatial_extent": {
    "xmin": -179.95,
    "ymin": -59.45,
    "xmax": 179.95,
    "ymax": 83.55
  },
  "temporal_extent": {
    "startdate": "2003-01-01T00:00:00Z",
    "enddate": "2021-12-31T23:59:59Z"
  },
  "sample_files": [
    "s3://veda-data-store-staging/EIS/COG/LIS_GLOBAL_DA/DA_Trends/DATWS_STL_based_trend.cog.tif"
  ],
  "discovery_items": [
    {
      "collection": "lis-global-da-tws-trend-airflow",
      "discovery": "s3",
      "cogify": false,
      "upload": false,
      "dry_run": false,
      "prefix": "EIS/COG/LIS_GLOBAL_DA/DA_Trends/",
      "bucket": "veda-data-store-staging",
      "filename_regex": "(.*)DATWS_STL_based_trend.cog.tif$",
      "start_datetime": "2003-01-01T00:00:00Z",
      "end_datetime": "2021-12-31T23:59:59Z"
    }
  ]
}

Similarly, for [1], the input was:

{
  "collection": "lis-global-da-tws-trend-airflow",
  "title": "Terrestrial Water Storage Trend - LIS 10km Global DA",
  "description": "Gridded trend in terrestrial water storage (theil-sen slope estimation in mm yr-1) from 10km global LIS with assimilation",
  "license": "CC0-1.0",
  "is_periodic": false,
  "time_density": null,
  "spatial_extent": {
    "xmin": -179.95,
    "ymin": -59.45,
    "xmax": 179.95,
    "ymax": 83.55
  },
  "temporal_extent": {
    "startdate": "2003-01-01T00:00:00Z",
    "enddate": "2021-12-31T23:59:59Z"
  },
  "sample_files": [
    "s3://veda-data-store-staging/EIS/COG/LIS_GLOBAL_DA/DA_Trends/DATWS_STL_based_trend.cog.tif"
  ],
  "discovery_items": [
    {
      "collection": "lis-global-da-tws-trend-airflow",
      "discovery": "s3",
      "cogify": false,
      "upload": false,
      "dry_run": false,
      "prefix": "EIS/COG/LIS_GLOBAL_DA/DA_Trends/",
      "bucket": "veda-data-store-staging",
      "filename_regex": "(.*)DATWS_STL_based_trend.cog.tif$",
      "start_datetime": "2003-01-01T00:00:00Z",
      "end_datetime": "2021-12-31T23:59:59Z"
    }
  ]
}

After the ingestion run was done, the STAC records for both were compared; they look like the following:

Collection

[1]

{
  "id": "lis-global-da-tws-trend-airflow",
  "type": "Collection",
  "links": [
    {
      "rel": "items",
      "type": "application/geo+json",
      "href": "https://dev-stac.delta-backend.com/collections/lis-global-da-tws-trend-airflow/items"
    },
    {
      "rel": "parent",
      "type": "application/json",
      "href": "https://dev-stac.delta-backend.com/"
    },
    {
      "rel": "root",
      "type": "application/json",
      "href": "https://dev-stac.delta-backend.com/"
    },
    {
      "rel": "self",
      "type": "application/json",
      "href": "https://dev-stac.delta-backend.com/collections/lis-global-da-tws-trend-airflow"
    }
  ],
  "title": "Terrestrial Water Storage Trend - LIS 10km Global DA",
  "assets": null,
  "extent": {
    "spatial": {
      "bbox": [
        [
          -179.9500000157243, -59.98224871364589, 179.9973980503783,
          89.9999999874719
        ]
      ]
    },
    "temporal": {
      "interval": [["2003-01-01 00:00:00+00", "2003-01-01 00:00:00+00"]]
    }
  },
  "license": "CC0-1.0",
  "keywords": null,
  "providers": null,
  "summaries": {
    "datetime": ["2003-01-01T00:00:00Z"],
    "cog_default": {
      "max": 101.29833221435547,
      "min": -555
    }
  },
  "description": "Gridded trend in terrestrial water storage (theil-sen slope estimation in mm yr-1) from 10km global LIS with assimilation",
  "item_assets": {
    "cog_default": {
      "type": "image/tiff; application=geotiff; profile=cloud-optimized",
      "roles": ["data", "layer"],
      "title": "Default COG Layer",
      "description": "Cloud optimized default layer to display on map"
    }
  },
  "stac_version": "1.0.0",
  "stac_extensions": null,
  "dashboard:is_periodic": false,
  "dashboard:time_density": null
}

[2]

{
  "id": "lis-global-da-tws-trend",
  "type": "Collection",
  "links": [
    {
      "rel": "items",
      "type": "application/geo+json",
      "href": "https://dev-stac.delta-backend.com/collections/lis-global-da-tws-trend/items"
    },
    {
      "rel": "parent",
      "type": "application/json",
      "href": "https://dev-stac.delta-backend.com/"
    },
    {
      "rel": "root",
      "type": "application/json",
      "href": "https://dev-stac.delta-backend.com/"
    },
    {
      "rel": "self",
      "type": "application/json",
      "href": "https://dev-stac.delta-backend.com/collections/lis-global-da-tws-trend"
    }
  ],
  "title": "Terrestrial Water Storage Trend - LIS 10km Global DA",
  "assets": null,
  "extent": {
    "spatial": {
      "bbox": [
        [
          -179.9500000157243, -59.98224871364589, 179.9973980503783,
          89.9999999874719
        ]
      ]
    },
    "temporal": {
      "interval": [["2003-01-01 00:00:00+00", "2003-01-01 00:00:00+00"]]
    }
  },
  "license": "CC0-1.0",
  "keywords": null,
  "providers": null,
  "summaries": {
    "datetime": ["2003-01-01T00:00:00Z"],
    "cog_default": {
      "max": 101.29833221435547,
      "min": -555
    }
  },
  "description": "Gridded trend in terrestrial water storage (theil-sen slope estimation in mm yr-1) from 10km global LIS with assimilation",
  "item_assets": {
    "cog_default": {
      "type": "image/tiff; application=geotiff; profile=cloud-optimized",
      "roles": ["data", "layer"],
      "title": "Default COG Layer",
      "description": "Cloud optimized default layer to display on map"
    }
  },
  "stac_version": "1.0.0",
  "stac_extensions": null,
  "dashboard:is_periodic": false,
  "dashboard:time_density": null
}

Items

[1]

{
  "type": "FeatureCollection",
  "context": {
    "limit": 10,
    "matched": 0,
    "returned": 1
  },
  "features": [
    {
      "id": "DATWS_STL_based_trend.cog",
      "bbox": [
        -179.9500000157243, -59.98224871364589, 179.9973980503783,
        89.9999999874719
      ],
      "type": "Feature",
      "links": [
        {
          "rel": "collection",
          "type": "application/json",
          "href": "https://dev-stac.delta-backend.com/collections/lis-global-da-tws-trend-airflow"
        },
        {
          "rel": "parent",
          "type": "application/json",
          "href": "https://dev-stac.delta-backend.com/collections/lis-global-da-tws-trend-airflow"
        },
        {
          "rel": "root",
          "type": "application/json",
          "href": "https://dev-stac.delta-backend.com/"
        },
        {
          "rel": "self",
          "type": "application/geo+json",
          "href": "https://dev-stac.delta-backend.com/collections/lis-global-da-tws-trend-airflow/items/DATWS_STL_based_trend.cog"
        }
      ],
      "assets": {
        "cog_default": {
          "href": "s3://veda-data-store-staging/EIS/COG/LIS_GLOBAL_DA/DA_Trends/DATWS_STL_based_trend.cog.tif",
          "type": "image/tiff; application=geotiff; profile=cloud-optimized",
          "roles": ["data", "layer"],
          "title": "Default COG Layer",
          "description": "Cloud optimized default layer to display on map",
          "raster:bands": [
            {
              "scale": 1.0,
              "nodata": 0.0,
              "offset": 0.0,
              "sampling": "area",
              "data_type": "float64",
              "histogram": {
                "max": 101.29833221435547,
                "min": -555.0,
                "count": 11.0,
                "buckets": [
                  7843.0, 0.0, 2.0, 13.0, 24.0, 77.0, 353.0, 1228.0, 118651.0,
                  9.0
                ]
              },
              "statistics": {
                "mean": -36.01088186359726,
                "stddev": 133.02156258224915,
                "maximum": 101.29833221435547,
                "minimum": -555.0,
                "valid_percent": 29.319745316159253
              }
            }
          ]
        }
      },
      "geometry": {
        "type": "Polygon",
        "coordinates": [
          [
            [-179.9500000157243, -59.98224871364589],
            [179.9973980503783, -59.98224871364589],
            [179.9973980503783, 89.9999999874719],
            [-179.9500000157243, 89.9999999874719],
            [-179.9500000157243, -59.98224871364589]
          ]
        ]
      },
      "collection": "lis-global-da-tws-trend-airflow",
      "properties": {
        "proj:bbox": [
          -179.9500000157243, -59.98224871364589, 179.9973980503783,
          89.9999999874719
        ],
        "proj:epsg": 4326.0,
        "proj:shape": [1500.0, 3600.0],
        "end_datetime": "2021-12-31T23:59:59+00:00",
        "proj:geometry": {
          "type": "Polygon",
          "coordinates": [
            [
              [-179.9500000157243, -59.98224871364589],
              [179.9973980503783, -59.98224871364589],
              [179.9973980503783, 89.9999999874719],
              [-179.9500000157243, 89.9999999874719],
              [-179.9500000157243, -59.98224871364589]
            ]
          ]
        },
        "proj:transform": [
          0.09998538835169517, 0.0, -179.9500000157243, 0.0,
          -0.09998816580074518, 89.9999999874719, 0.0, 0.0, 1.0
        ],
        "start_datetime": "2003-01-01T00:00:00+00:00"
      },
      "stac_version": "1.0.0",
      "stac_extensions": [
        "https://stac-extensions.github.io/projection/v1.0.0/schema.json",
        "https://stac-extensions.github.io/raster/v1.1.0/schema.json"
      ]
    }
  ],
  "links": [
    {
      "rel": "items",
      "type": "application/geo+json",
      "href": "https://dev-stac.delta-backend.com/collections/lis-global-da-tws-trend-airflow/items"
    },
    {
      "rel": "parent",
      "type": "application/json",
      "href": "https://dev-stac.delta-backend.com/"
    },
    {
      "rel": "root",
      "type": "application/json",
      "href": "https://dev-stac.delta-backend.com/"
    },
    {
      "rel": "self",
      "type": "application/json",
      "href": "https://dev-stac.delta-backend.com/collections/lis-global-da-tws-trend-airflow"
    }
  ]
}

[2]

{
  "type": "FeatureCollection",
  "context": {
    "limit": 10,
    "matched": 1,
    "returned": 1
  },
  "features": [
    {
      "id": "DATWS_STL_based_trend.cog",
      "bbox": [
        -179.9500000157243, -59.98224871364589, 179.9973980503783,
        89.9999999874719
      ],
      "type": "Feature",
      "links": [
        {
          "rel": "collection",
          "type": "application/json",
          "href": "https://dev-stac.delta-backend.com/collections/lis-global-da-tws-trend"
        },
        {
          "rel": "parent",
          "type": "application/json",
          "href": "https://dev-stac.delta-backend.com/collections/lis-global-da-tws-trend"
        },
        {
          "rel": "root",
          "type": "application/json",
          "href": "https://dev-stac.delta-backend.com/"
        },
        {
          "rel": "self",
          "type": "application/geo+json",
          "href": "https://dev-stac.delta-backend.com/collections/lis-global-da-tws-trend/items/DATWS_STL_based_trend.cog"
        }
      ],
      "assets": {
        "cog_default": {
          "href": "s3://veda-data-store-staging/EIS/COG/LIS_GLOBAL_DA/DA_Trends/DATWS_STL_based_trend.cog.tif",
          "type": "image/tiff; application=geotiff; profile=cloud-optimized",
          "roles": ["data", "layer"],
          "title": "Default COG Layer",
          "description": "Cloud optimized default layer to display on map",
          "raster:bands": [
            {
              "scale": 1.0,
              "nodata": 0.0,
              "offset": 0.0,
              "sampling": "area",
              "data_type": "float64",
              "histogram": {
                "max": 101.29833221435547,
                "min": -555.0,
                "count": 11.0,
                "buckets": [
                  7843.0, 0.0, 2.0, 13.0, 24.0, 77.0, 353.0, 1228.0, 118651.0,
                  9.0
                ]
              },
              "statistics": {
                "mean": -36.01088186359726,
                "stddev": 133.02156258224915,
                "maximum": 101.29833221435547,
                "minimum": -555.0,
                "valid_percent": 29.319745316159253
              }
            }
          ]
        }
      },
      "geometry": {
        "type": "Polygon",
        "coordinates": [
          [
            [-179.9500000157243, -59.98224871364589],
            [179.9973980503783, -59.98224871364589],
            [179.9973980503783, 89.9999999874719],
            [-179.9500000157243, 89.9999999874719],
            [-179.9500000157243, -59.98224871364589]
          ]
        ]
      },
      "collection": "lis-global-da-tws-trend",
      "properties": {
        "proj:bbox": [
          -179.9500000157243, -59.98224871364589, 179.9973980503783,
          89.9999999874719
        ],
        "proj:epsg": 4326.0,
        "proj:shape": [1500.0, 3600.0],
        "end_datetime": "2021-12-31T23:59:59+00:00",
        "proj:geometry": {
          "type": "Polygon",
          "coordinates": [
            [
              [-179.9500000157243, -59.98224871364589],
              [179.9973980503783, -59.98224871364589],
              [179.9973980503783, 89.9999999874719],
              [-179.9500000157243, 89.9999999874719],
              [-179.9500000157243, -59.98224871364589]
            ]
          ]
        },
        "proj:transform": [
          0.09998538835169517, 0.0, -179.9500000157243, 0.0,
          -0.09998816580074518, 89.9999999874719, 0.0, 0.0, 1.0
        ],
        "start_datetime": "2003-01-01T00:00:00+00:00"
      },
      "stac_version": "1.0.0",
      "stac_extensions": [
        "https://stac-extensions.github.io/projection/v1.0.0/schema.json",
        "https://stac-extensions.github.io/raster/v1.1.0/schema.json"
      ]
    }
  ],
  "links": [
    {
      "rel": "items",
      "type": "application/geo+json",
      "href": "https://dev-stac.delta-backend.com/collections/lis-global-da-tws-trend/items"
    },
    {
      "rel": "parent",
      "type": "application/json",
      "href": "https://dev-stac.delta-backend.com/"
    },
    {
      "rel": "root",
      "type": "application/json",
      "href": "https://dev-stac.delta-backend.com/"
    },
    {
      "rel": "self",
      "type": "application/json",
      "href": "https://dev-stac.delta-backend.com/collections/lis-global-da-tws-trend"
    }
  ]
}

Comparison

On comparison, the STAC records look exactly the same for ingests from both systems [1] and [2].

Note: I did notice a discrepancy where the context["matched"] value is wrong for the Airflow ingestion, but that's an auto-generated value and not because of any ingestion faults, right @anayeaye?

PI Objective

https://github.com/NASA-IMPACT/veda-architecture/issues/164
