Code Monkey home page Code Monkey logo

pgstac's Introduction

PostgreSQL schema and functions for Spatio Temporal Asset Catalog (STAC)

Test Package version


PGStac

PGDatabase Schema and Functions for Storing and Accessing STAC collections and items in PostgreSQL

STAC Client that uses PGStac available in STAC-FastAPI

PGStac requires Postgresql>=13 and PostGIS>=3. Best performance will be had using PostGIS>=3.1.

PGStac Settings

PGStac installs everything into the pgstac schema in the database. You will need to make sure that this schema is set up in the search_path for the database.

There are additional variables that control the settings used for calculating and displaying context (total row count) for a search, as well as a variable to set the filter language (cql-json or cql-json2). The context is "off" by default, and the default filter language is set to "cql2-json".

Variables can be set either by passing them in via the connection options using your connection library, setting them in the pgstac_settings table or by setting them on the Role that is used to log in to the database.

Example for updating the pgstac_settings table with a new value:

INSERT INTO pgstac_settings (name, value)
VALUES
    ('default-filter-lang', 'cql-json'),
    ('context', 'on')

ON CONFLICT ON CONSTRAINT pgstac_settings_pkey DO UPDATE SET value = excluded.value;

Alternatively, update the role:

ALTER ROLE <username> SET SEARCH_PATH to pgstac, public;
ALTER ROLE <username> SET pgstac.context TO <'on','off','auto'>;
ALTER ROLE <username> SET pgstac.context_estimated_count TO '<number of estimated rows when in auto mode that when an estimated count is less than will trigger a full count>';
ALTER ROLE <username> SET pgstac.context_estimated_cost TO '<estimated query cost from explain when in auto mode that when an estimated cost is less than will trigger a full count>';
ALTER ROLE <username> SET pgstac.context_stats_ttl TO '<an interval string ie "1 day" after which pgstac search will force recalculation of it's estimates>>';

PyPGStac

PGStac includes a Python utility for bulk data loading and managing migrations.

PyPGStac is available on PyPI

pip install pypgstac

Or can be built locally using Poetry

git clone https://github.com/stac-utils/pgstac
cd pgstac/pypgstac
poetry build pypgstac
pip install dist/pypgstac-[version]-py3-none-any.whl
pypgstac --help
Usage: pypgstac [OPTIONS] COMMAND [ARGS]...

Options:
  --install-completion  Install completion for the current shell.
  --show-completion     Show completion for the current shell, to copy it or
                        customize the installation.

  --help                Show this message and exit.

Commands:
  initversion  Get initial version.
  load         Load STAC data into a pgstac database.
  migrate      Migrate a pgstac database.
  pgready      Wait for a pgstac database to accept connections.
  version      Get version from a pgstac database.

PyPGStac will get the database connection settings from the standard PG environment variables:

  • PGHOST=0.0.0.0
  • PGPORT=5432
  • PGUSER=username
  • PGDATABASE=postgis
  • PGPASSWORD=asupersecretpassword

It can also take a DSN database url "postgresql://..." via the --dsn flag.

Migrations

PyPGStac has a utility to help apply migrations to an existing PGStac instance to bring it up to date.

There are two types of migrations:

  • Base migrations install PGStac into a database with no current PGStac installation. These migrations follow the file pattern "pgstac.[version].sql"
  • Incremental migrations are used to move PGStac from one version to the next. These migrations follow the file pattern "pgstac.[version].[fromversion].sql"

Migrations are stored in pypgstac/pypgstac/migration`s and are distributed with the PyPGStac package.

Running Migrations

PyPGStac has a utility for checking the version of an existing PGStac database and applying the appropriate migrations in the correct order. It can also be used to setup a database from scratch.

To create an initial PGStac database or bring an existing one up to date, check you have the pypgstac version installed you want to migrate to and run:

pypgstac migrate

Bulk Data Loading

A python utility is included which allows to load data from any source openable by smart-open using python in a memory efficient streaming manner using PostgreSQL copy. There are options for collections and items and can be used either as a command line or a library.

To load an ndjson of items directly using copy (will fail on any duplicate ids but is the fastest option to load new data you know will not conflict)

pypgstac load items

To load skipping any records that conflict with existing data

pypgstac load items --method insert_ignore

To upsert any records, adding anything new and replacing anything with the same id

pypgstac load items --method upsert

Contribution & Development

PGStac uses a dockerized development environment. You can set this up using:

scripts/setup

To bring up the development database:

scripts/server

To run tests, use:

scripts/test

To rebuild docker images:

scripts/update

To drop into a console, use

scripts/console

To drop into a psql console on the database container, use:

scripts/console --db

To run migrations on the development database, use

scripts/migrate

To stage code and configurations and create template migrations for a version release, use

scripts/stageversion [version or increment type]

Examples:

scripts/stageversion 0.2.8
scripts/stageversion patch # if current version is 0.2.7 will bump to 0.2.8
scripts/stageversion minor # if current version is 0.2.7 will bump to 0.3.0
scripts/stageversion major # if current version is 0.2.7 will bump to 1.0.0

This will create a base migration for the new version and will create incremental migrations between any existing base migrations. The incremental migrations that are automatically generated by this script will have the extension ".staged" on the file. You must manually review (and make any modifications necessary) this file and remove the ".staged" extension to enable the migration.

Making Changes to SQL

All changes to SQL should only be made in the /sql directory. SQL Files will be run in alphabetical order.

Adding Tests

PGStac uses PGTap to test SQL. Tests can be found in tests/pgtap.sql and are run using scripts/test

Release Process

  1. Make sure all your code is added and committed
  2. Create a PR against the main branch
  3. Once the PR has been merged, start the release process.
  4. Use scripts/stagerelease as documented in migrations section above making sure to rename any files ending in ".staged" in the migrations section
  5. Add details for release to the CHANGELOG
  6. Add/Commit any changes
  7. Run tests scripts/test
  8. Create a git tag git tag v0.2.8 using new version number
  9. Push the git tag git push origin v0.2.8
  10. The CI process will push pypgstac to PyPi, create a docker image on ghcr.io, and create a release on github.

pgstac's People

Contributors

bitner avatar lossyrob avatar vincentsarago avatar drnextgis avatar zstatmanweil avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.