
Raster Foundry

Getting Started

Requirements

  • AWS CLI 1.10+
  • AWS Account (to store artifacts, secrets)
  • jabba for managing Java versions
  • Rollbar Account (error reporting -- optional)

tl;dr:

  • export AWS_PROFILE=raster-foundry
  • export RF_SETTINGS_BUCKET=...
  • jabba use (if the version from .jabbarc isn't available, jabba install it first)
  • ./scripts/bootstrap
  • ./scripts/update
  • ./scripts/server

Setting Up AWS Account

A few setup tasks are necessary before starting development in order to provision Raster Foundry. Raster Foundry depends heavily on AWS resources and uses them to manage secrets, containers, and artifacts in development. If you are only doing local development, the primary resources used are S3 buckets that store secrets.

In the AWS account you need to create a few buckets for the following:

  • A bucket to house raw data (e.g. geotiffs, JPEG2000, ingest definitions, etc.)
  • A config bucket that stores secrets for development and/or other environments, as well as an exported database dump for development data
  • A bucket to house processed data (e.g. thumbnails, processed raster RDDs)

The names of the buckets are not important, but they should be memorable and easy to parse for your own sake. On your host machine you need to set up an AWS profile for the account with the S3 buckets. For instance, to set up an AWS profile called raster-foundry with the AWS CLI, the following command would be used:

$ aws configure --profile raster-foundry

You will be prompted for an access key and secret key.
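The three buckets can then be created with the AWS CLI. A minimal sketch (the bucket names below are placeholders; substitute your own):

$ aws s3 mb s3://my-rf-raw-data --profile raster-foundry
$ aws s3 mb s3://my-rf-development-config --profile raster-foundry
$ aws s3 mb s3://my-rf-processed-data --profile raster-foundry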

Setting Development Environment Variables

The .env.template file is a template for the environment variables that get injected into running containers during development. After filling in the sensitive information (replacing all PLACEHOLDER values with values appropriate for your AWS setup), copy the file into the AWS config bucket created above. During provisioning this file is copied to the development environment and injected into containers with docker-compose.
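For example (the object key is an assumption; use whatever path your provisioning setup expects):

$ cp .env.template .env
# ... edit .env, replacing all PLACEHOLDER values ...
$ aws s3 cp .env s3://my-rf-development-config/.env --profile raster-foundry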

In addition to setting up an AWS account, you must register for an Auth0 account to produce the secrets used in the .env file. You will need to set up an Auth0 application and copy over its client IDs, domain, and secret.

Additionally, if you want to exercise token management in the application, you need to create a management API application to handle generating refresh tokens for users via the management API. This is not necessary for most functionality and can be deferred until later if you desire.

The last things to set up in Auth0 are the allowed callback URLs and logout URLs. These need to be edited to allow interaction from local development at localhost:9091 and localhost:9100.
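For example, the relevant fields in the Auth0 application settings might look like this (a sketch assuming plain HTTP for local development):

Allowed Callback URLs: http://localhost:9091, http://localhost:9100
Allowed Logout URLs:   http://localhost:9091, http://localhost:9100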

Development

Raster Foundry follows the approach outlined here ("Scripts to Rule Them All") to have a mostly consistent development experience. We deviate in a few specific ways:

  • We don't pin or require a specific Java version. The application will eventually run in a jdk8 container; having jabba is helpful for reproducing issues that occur on some Java versions but not others, but largely the Java version does not make a difference at this point.
  • We expect the user to install jabba on their host, instead of running everything in containers. Users can choose to run everything in containers, but that's not how the development environment is organized by default.

Almost all interaction with consoles and servers can be managed via calls to scripts located in ./scripts. Default values for the S3 config and data buckets, as well as the AWS profile, will be used if they are not set with environment variables. Before running scripts/bootstrap, inject these into your shell environment:

export RF_AWS_PROFILE=raster-foundry
export RF_SETTINGS_BUCKET=rasterfoundry-development-config-us-east-1

After exporting your environment settings, you are ready to get started:

$ ./scripts/bootstrap
$ ./scripts/update
$ ./scripts/server

The servers should come up successfully.

Then, kill your servers. To load the database with sample data, run:

$ ./scripts/load_development_data --download

This fetches a database dump and some development images from S3, so you can follow the same testing instructions as other developers. The script also applies any outstanding migrations not yet present in the development database.

Migrations

Database migrations are managed using Flyway. You can run Flyway commands with scripts/migrate. Some commands you can run are:

  • scripts/migrate migrate: apply outstanding migrations
  • scripts/migrate repair: reconcile the checksums of applied migrations in the database with what's present on disk

There is no command to revert migrations.

The workflow for creating a new migration (sketched below) is:

  • Write a migration in db/src/main/resources/Vxx__migration_name.sql
  • ./scripts/migrate migrate
  • Verify the schema changes in PostgreSQL with ./scripts/psql
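A minimal sketch of that workflow (the version number, file name, and table are hypothetical examples only; use the next available version number):

$ cat > db/src/main/resources/V99__add_example_table.sql <<'SQL'
-- Hypothetical example migration: create a small table.
CREATE TABLE example_notes (
    id   uuid PRIMARY KEY,
    note text NOT NULL
);
SQL
$ ./scripts/migrate migrate
$ ./scripts/psql

Inside the psql console, \d example_notes should then show the new table.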

Ports

The Vagrant configuration maps the following host ports to services running in the virtual machines. Ports can be overridden for individual developers using environment variables.

Service                     Port   Environment Variable
Nginx (API)                 9100   RF_PORT_9100
Application Server (Akka)   9000   RF_PORT_9000
Tile Server (http4s)        8081   RF_PORT_8081
Application Server (JMX)    9010   RF_PORT_9010
Tile Server (JMX)           9030   RF_PORT_9030
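For example, to remap the Nginx port on your host (a sketch; any free port works):

$ export RF_PORT_9100=9101   # serve the API through host port 9101 instead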

Scripts

Helper and development scripts are located in the ./scripts directory at the root of this project. These scripts are designed to encapsulate and perform commonly used actions such as starting a development server, accessing a development console, or running tests.

Script Name             Purpose
bootstrap               Pulls/builds necessary containers
update                  Runs migrations, installs dependencies, etc.
server                  Starts a development server
console                 Gives access to a running container via docker-compose run
psql                    Drops you into a psql console
test                    Runs tests and linters for the project
cibuild                 Invoked by the CI server; makes use of test
cipublish               Publishes container images to container image repositories
load_development_data   Loads data for development purposes from S3
rsync-back              Performs a one-way rsync from the VM to the host
process-upload          Processes an upload in development
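For example, to open a shell in a running container (a sketch assuming the script forwards its arguments to docker-compose run; the service name here is hypothetical -- use one defined in your docker-compose configuration):

$ ./scripts/console api /bin/bash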

Testing

Run all the tests:

$ ./scripts/test

Processing Imagery

In staging and production, a batch job is automatically kicked off for processing after a successful upload. In development, you need to process the upload manually, which you can do like so:

$ ./scripts/process-upload <upload_id>


raster-foundry's Issues

No indication that we failed to sign the upload URL

If the "sign request" endpoint fails (for whatever reason), Evaporate fails silently. There is no indication in the UI that we were unable to sign the upload request URL.

May want to consider adding this at the same time we add upload progress indicators.

Create File Upload and Layer Metadata workflow

After the initial upload screen, we need to implement a complex workflow and save metadata to Django.

  • Currently in Pane 1 we begin the upload immediately (assuming large files). We should evaluate whether this is the desired functionality.
  • Note that the progress bar does not show until the third pane.
  • Pane 2 implies that we can cancel uploads using the "x" button. If the file has already made it to S3, this will presumably delete it.
  • Is the "Total Images" input actually useful or needed? It seems this can be determined from the number of uploaded files (though perhaps it is there to deal with zip files).

(Screenshots of the three upload panes.)

Implement UUID file names for S3

To avoid name collisions in S3, we should not rely on the incoming file names. We should use a UUID for each file name, probably prepended with the user's ID number, e.g.:

    <uid>-<uuidv4>
    201-f47ac10b-58cc-4372-a567-0e02b2c3d479

Since we cannot assume that the file name is a secret, we will need to be careful to enforce uniqueness.

By prepending the user ID, we can ensure that even a malicious user will not be able to overwrite another user's data. The front-end JavaScript code can generate the file name. Our signing URL should check that the file name the user asked us to sign for upload is properly prefixed with that user's ID, and it should fail to sign the request if that is not the case.
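A minimal sketch of the proposed key format (shell shown only for illustration; the actual generation would happen in the front-end JavaScript, and USER_ID is a hypothetical variable):

$ echo "${USER_ID}-$(uuidgen)"   # USER_ID is hypothetical; uuidgen emits a v4 UUID on most systems
201-f47ac10b-58cc-4372-a567-0e02b2c3d479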

This will require updating the image model in Django to have a metadata file name (the original file name from the user) as well as the S3 file name (the UUID). We may want to allow the metadata file name to be changed.

Account Management

Users need to be able to register an account and have a tracked, dedicated amount of space for it. When a user registers, they are given an API key that they need to be able to sign in and view. This API key will allow them to interact with the API server.

Outstanding questions:

  • What should be the limit for a free account? My thought is 5 GB.

Create layer selector component

Create a simple reusable React component for layers. This component is needed on the "My Imports", "Public Catalog", and "Favorites" tabs on the Library page.


Implement REST API

Now that we've experimented with DRF ViewSets and sketched out the URL structure, actually implement and test the API.

See #5

Layer API

The Layer API server will have the following endpoints. Interaction with any endpoint will require API key verification:

Create Layer

Given a list of images on S3 and additional metadata, tile those images into EFS and create a tile service endpoint. This endpoint returns success=true if the job is submitted successfully, or success=false with an error if not.

Tooling will be built to allow imports into S3 from another non-public S3 bucket, the local file system, or other storage as an intermediate step.

Example request:

{
    // REQUIRED: Name of the mosaic being created.
    // MUST BE UNIQUE TO ACCOUNT.
    "name" : "nepal-imagery",

    // REQUIRED: List of S3 URIs to images.
    // Must be from a publicly accessible or Raster Foundry owned bucket.
    // The order of the images is the priority of the images in the mosaic, with
    // images higher in the list being higher priority than images lower in the list.
    // So if a pixel is covered by more than one image, the image higher in the list
    // will have its pixel shown.
    // Currently, these images have to be in RGB or RGBA format. If RGB,
    // pixels with value (0, 0, 0) are considered to be No Data.
    "images" : [
        "http://hotosm-oam.s3.amazonaws.com/356f564e3a0dc9d15553c17cf4583f21-9.tif",
        "http://hotosm-oam.s3.amazonaws.com/356f564e3a0dc9d15553c17cf4583f21-24.tif",
        "http://hotosm-oam.s3.amazonaws.com/356f564e3a0dc9d15553c17cf4583f21-20.tif"
    ],

    // OPTIONAL. Default: null. A text description of the layer. The
    // description can contain any legal character. Implementations SHOULD NOT
    // interpret the description as HTML.
    "description" : "Nepal imagery from Digital Globe"
}

Example responses:

{
    "success": true
}
{
    "success": false,
    "error": "Layer with the name 'nepal-imagery' already exists."
}
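A hedged sketch of calling this endpoint with curl, with the example request above saved as create-layer.json (the route and authentication header are assumptions; the issue does not specify them):

$ curl -X POST "https://rasterfoundry.com/api/layers" \
    -H "Authorization: <api-key>" \
    -H "Content-Type: application/json" \
    -d @create-layer.json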

Layer metadata endpoint

Given a layer name, this endpoint returns the layer's status: either processing, failed with an error message, or created, in which case the response is JSON containing the metadata in TileJSON format with additional fields.

Example Request:

{
    "name": "nepal-imagery"
}

Example Responses:

{
    "name": "nepal-imagery",

    // One of: ["created", "failed", "processing"]
    "state": "processing"
}
{
    "name": "nepal-imagery",
    "state": "failed",
    "error" : "Unable to create mosaic due to account limits. In order to build this mosaic, please upgrade your account."
}
{
    "state": "created",

    // Date the layer was created
    "created" : "2015-07-25T00:32:22-04:00",

    // Location of a simple Leaflet web viewer
    "viewer" : "http://rasterfoundry.com/username/nepal-imagery/index.html",

    // The rest is taken from TileJSON spec: https://github.com/mapbox/tilejson-spec/tree/master/2.1.0
    // I have removed some optional elements of that spec that may be added later.

    // REQUIRED. A semver.org style version number. Describes the version of
    // the TileJSON spec that is implemented by this JSON object.
    "tilejson": "2.1.0",

    // The name of the layer (provided by the client in Layer Creation API call)
    "name": "nepal-imagery",

    // OPTIONAL. Default: null. A text description of the layer. The
    // description can contain any legal character. Implementations SHOULD NOT
    // interpret the description as HTML.
    "description": "Nepal imagery from Digital Globe",

    // REQUIRED. An array of tile endpoints. {z}, {x} and {y}, if present,
    // are replaced with the corresponding integers. If multiple endpoints are specified, clients
    // may use any combination of endpoints. All endpoints MUST return the same
    // content for the same URL. The array MUST contain at least one endpoint.
    "tiles": [
        "http://rasterfoundry.com/username/nepal-imagery/{z}/{x}/{y}.png"
    ],

    // An integer specifying the minimum zoom level.
    "minzoom": 0,

    // An integer specifying the maximum zoom level. MUST be >= minzoom.
    "maxzoom": 11,

    // The maximum extent of available map tiles. Bounds MUST define an area
    // covered by all zoom levels. The bounds are represented in WGS:84
    // latitude and longitude values, in the order left, bottom, right, top.
    // Values may be integers or floating point numbers.
    "bounds": [ -180, -85.05112877980659, 180, 85.0511287798066 ],

    // The first value is the longitude, the second is latitude (both in
    // WGS:84 values), the third value is the zoom level as an integer.
    // Longitude and latitude MUST be within the specified bounds.
    // The zoom level MUST be between minzoom and maxzoom.
    // Implementations can use this value to set the default location. If the
    // value is null, implementations may use their own algorithm for
    // determining a default location.
    "center": [ -76.275329586789, 39.153492567373, 8 ]
}

Delete Layer

This endpoint allows the user to delete a layer from the account.

Example request:

{
    "name": "nepal-imagery"
}

Example response:

{
    "success": true
}

Upload directly to Amazon S3

Given that we expect file uploads for this project to be large, and we want to house raw input on Amazon S3, it may make sense to route user uploads directly to Amazon S3. In order to make this happen, we need to retrieve a signed URL from S3 via our backend, and thread it through to the JavaScript frontend.

A more detailed description, along with Python implementation details, can be found here. That's more to understand the process, though. Eventually, I think we'd want to lean on a JavaScript library like EvaporateJS that solves this problem for us, along with a custom Django route to provide the signed URL.

The main thing we trade off here is the ability to do upload validation within Django. That may or may not end up being acceptable.

Style password change templates

The password change templates from registration redux need to be modified to match the style of the other RF registration components.

Endpoint to receive image verification status

As outlined in #76, we will use Lambda to perform unzipping and image verification. After that is complete, a message should be posted to an SNS topic. That topic is responsible for posting a message to our application so that we can track all verified images before submitting the layer for geoprocessing.

We need an endpoint that can receive an SNS message containing a raster image ID and status. The structure of that request will need to be determined. Additional tables or other changes to the database structure may be needed to support this.

Cannot reset password

Though a password reset link is generated, our front end does not display the proper forms, so the user never has an opportunity to reset the password.

Related to #63

Foundry command line client

A Python (debatable) client for the Raster Foundry APIs.

Requires configuration with an API key, set either through an environment variable, a ~/.foundry config file, or the like.

Commands

foundry ls

This command will list the names of a user's image layers.

foundry show [layer-name]

This command will describe the layer. If the layer has a processing status, show the user that fact, with a percentage complete if available through the API.

Otherwise, show the metadata described in #5.

foundry import [layer-name] [image uris]*

Import the set of images described in the URIs as a layer in the user's Raster Foundry account.
URIs can be local files or S3 URIs (to start).

foundry rm [layer-name]

Deletes a layer from Raster Foundry.

How will zip files be handled?

The prototype assumes zip files can be uploaded to S3.

  • Will raster processor work with zips directly?
  • Should we unzip them and then upload them to S3?
  • Can we unzip directly in S3?
  • Should we drop this functionality?


Validate user uploads

Create an endpoint to associate file uploads with a registered user, to prevent unregistered users from uploading.

Investigate validation worker options

Because layer images are uploaded directly from the client to S3, and booting up a Spark cluster to begin processing is expensive, we should investigate ways to have a validation service determine whether the files are worth further processing. The test may be as simple as detecting whether GDAL can read the file. Since we're allowing zipped files directly from the client, we could also piggyback on the service to expand, validate, and rewrite the zip files prior to processing.

Possible flow: an upload to S3 triggers an AWS Lambda job, which tries to read (or unzips and reads) the images, then sends or queues a message that processing should begin (processing service) or that there are errors (app server).
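A minimal sketch of the "can GDAL read it" check, relying on gdalinfo's exit status (this assumes GDAL is available in the Lambda environment; the file name is illustrative):

$ gdalinfo upload.tif > /dev/null 2>&1 && echo "readable" || echo "not a valid raster"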

Setup Jenkins jobs

This includes jobs to build the project and run unit tests for the develop branch and pull requests.
