
Raster Foundry

Getting Started

Requirements

  • AWS CLI 1.10+
  • AWS Account (to store artifacts, secrets)
  • jabba for managing Java versions
  • Rollbar Account (error reporting -- optional)

tl;dr:

  • export AWS_PROFILE=raster-foundry
  • export RF_SETTINGS_BUCKET=...
  • jabba use (if the version from .jabbarc isn't available, jabba install it first)
  • ./scripts/bootstrap
  • ./scripts/update
  • ./scripts/server

Setting Up AWS Account

A few setup tasks are necessary before starting development in order to provision Raster Foundry. Raster Foundry depends heavily on AWS resources and uses them to manage secrets, containers, and artifacts in development. If you are only doing local development, the primary resources used are S3 buckets that store secrets.

In the AWS account you need to create a few buckets for the following:

  • A bucket to house raw data (e.g. geotiffs, JPEG2000, ingest definitions, etc.)
  • A config bucket that stores secrets for development and/or other environments, as well as an exported database dump for development data
  • A bucket to house processed data (e.g. thumbnails, processed raster RDDs)

The names of the buckets are not important, but they should be memorable and easy to parse for your own sake. On your host machine you need to set up an AWS profile for the account with the S3 buckets. For instance, to set up an AWS profile called raster-foundry with the AWS CLI, the following command would be used:

$ aws configure --profile raster-foundry

You will be prompted for an access key and secret key.
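The three buckets can then be created with the AWS CLI. A minimal sketch (the bucket names below are placeholders; substitute your own):

$ aws s3 mb s3://my-rf-raw-data --profile raster-foundry
$ aws s3 mb s3://my-rf-development-config --profile raster-foundry
$ aws s3 mb s3://my-rf-processed-data --profile raster-foundry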

Setting Development Environment Variables

The .env.template file is a template for the environment variables that get injected into running containers during development. After filling in the sensitive information (replacing all PLACEHOLDER values with values appropriate for your AWS setup), copy the file into the AWS config bucket created above. During provisioning this file is copied to the development environment and injected into containers with docker-compose.
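For example (the object key is an assumption; use whatever path your provisioning setup expects):

$ cp .env.template .env
# ... edit .env, replacing all PLACEHOLDER values ...
$ aws s3 cp .env s3://my-rf-development-config/.env --profile raster-foundry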

In addition to setting up an AWS account, you must register for an Auth0 account to produce the secrets used in the .env file. You will need to set up an Auth0 application and copy over its client IDs, domain, and secret.

Additionally, if you want to exercise token management in the application, you need to create a management API application to handle generating refresh tokens for users via the management API. This is not necessary for most functionality and can be deferred until later if you desire.

The last things to set up in Auth0 are the allowed callback URLs and logout URLs. These need to be edited to allow interaction from local development at localhost:9091 and localhost:9100.
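For example, the relevant fields in the Auth0 application settings might look like this (a sketch assuming plain HTTP for local development):

Allowed Callback URLs: http://localhost:9091, http://localhost:9100
Allowed Logout URLs:   http://localhost:9091, http://localhost:9100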

Development

Raster Foundry follows the approach outlined here ("Scripts to Rule Them All") to have a mostly consistent development experience. We deviate in a few specific ways:

  • We don't pin or require a specific Java version. The application will eventually run in a jdk8 container; having jabba is helpful for reproducing issues that occur on some Java versions but not others, but largely the Java version does not make a difference at this point.
  • We expect the user to install jabba on their host, instead of running everything in containers. Users can choose to run everything in containers, but that's not how the development environment is organized by default.

Almost all interaction with consoles and servers can be managed via calls to scripts located in ./scripts. Default values for the S3 config and data buckets, as well as the AWS profile, will be used if they are not set with environment variables. Before running scripts/bootstrap, inject these into your shell environment:

export RF_AWS_PROFILE=raster-foundry
export RF_SETTINGS_BUCKET=rasterfoundry-development-config-us-east-1

After exporting your environment settings, you are ready to get started:

$ ./scripts/bootstrap
$ ./scripts/update
$ ./scripts/server

The servers should come up successfully.

Then, kill your servers. To load the database with sample data, run:

$ ./scripts/load_development_data --download

This fetches a database dump and some development images from S3, so you can follow the same testing instructions as other developers. The script also applies any outstanding migrations not yet present in the development database.

Migrations

Database migrations are managed using Flyway. You can run Flyway commands with scripts/migrate. Some commands you can run are:

  • scripts/migrate migrate: apply outstanding migrations
  • scripts/migrate repair: reconcile the checksums of applied migrations in the database with what's present on disk

There is no command to revert migrations.

The workflow for creating a new migration (sketched below) is:

  • Write a migration in db/src/main/resources/Vxx__migration_name.sql
  • ./scripts/migrate migrate
  • Verify the schema changes in PostgreSQL with ./scripts/psql
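A minimal sketch of that workflow (the version number, file name, and table are hypothetical examples only; use the next available version number):

$ cat > db/src/main/resources/V99__add_example_table.sql <<'SQL'
-- Hypothetical example migration: create a small table.
CREATE TABLE example_notes (
    id   uuid PRIMARY KEY,
    note text NOT NULL
);
SQL
$ ./scripts/migrate migrate
$ ./scripts/psql

Inside the psql console, \d example_notes should then show the new table.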

Ports

The Vagrant configuration maps the following host ports to services running in the virtual machines. Ports can be overridden for individual developers using environment variables.

Service                     Port   Environment Variable
Nginx (API)                 9100   RF_PORT_9100
Application Server (Akka)   9000   RF_PORT_9000
Tile Server (http4s)        8081   RF_PORT_8081
Application Server (JMX)    9010   RF_PORT_9010
Tile Server (JMX)           9030   RF_PORT_9030
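For example, to remap the Nginx port on your host (a sketch; any free port works):

$ export RF_PORT_9100=9101   # serve the API through host port 9101 instead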

Scripts

Helper and development scripts are located in the ./scripts directory at the root of this project. These scripts are designed to encapsulate and perform commonly used actions such as starting a development server, accessing a development console, or running tests.

Script Name             Purpose
bootstrap               Pulls/builds necessary containers
update                  Runs migrations, installs dependencies, etc.
server                  Starts a development server
console                 Gives access to a running container via docker-compose run
psql                    Drops you into a psql console
test                    Runs tests and linters for the project
cibuild                 Invoked by the CI server; makes use of test
cipublish               Publishes container images to container image repositories
load_development_data   Loads data for development purposes from S3
rsync-back              Performs a one-way rsync from the VM to the host
process-upload          Processes an upload in development
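For example, to open a shell in a running container (a sketch assuming the script forwards its arguments to docker-compose run; the service name here is hypothetical -- use one defined in your docker-compose configuration):

$ ./scripts/console api /bin/bash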

Testing

Run all the tests:

$ ./scripts/test

Processing Imagery

In staging and production, a batch job is automatically kicked off for processing after a successful upload. In development, you need to process the upload manually, which you can do like so:

$ ./scripts/process-upload <upload_id>


raster-foundry's Issues

No indication that we failed to sign the upload URL

If the "sign request" endpoint fails (for whatever reason), Evaporate fails silently. There is no indication in the UI that we were unable to sign the upload request URL.

May want to consider adding this at the same time we add upload progress indicators.

Create File Upload and Layer Metadata workflow

After the initial upload screen, we need to implement a complex workflow and save metadata to Django.

  • Currently in Pane 1 we begin the upload immediately (assuming large files). We should evaluate whether this is the desired functionality.
  • Note that the progress bar does not show until the third pane.
  • Pane 2 implies that we can cancel uploads using the "x" button. If the file has already made it to S3, this will presumably delete it.
  • Is the "Total Images" input actually useful or needed? It seems this can be determined from the number of uploaded files (though perhaps it is there to deal with zip files).

(Screenshots of the three upload panes.)

Implement UUID file names for S3

To avoid name collisions in S3, we should not rely on the incoming file names. We should use a UUID for each file name, probably prepended with the user's ID number, e.g.:

    <uid>-<uuidv4>
    201-f47ac10b-58cc-4372-a567-0e02b2c3d479

Since we cannot assume that the file name is a secret, we will need to be careful to enforce uniqueness.

By prepending the user ID, we can ensure that even a malicious user will not be able to overwrite another user's data. The front-end JavaScript code can generate the file name. Our signing URL should check that the file name the user asked us to sign for upload is properly prefixed with that user's ID, and it should fail to sign the request if that is not the case.
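A minimal sketch of the proposed key format (shell shown only for illustration; the actual generation would happen in the front-end JavaScript, and USER_ID is a hypothetical variable):

$ echo "${USER_ID}-$(uuidgen)"   # USER_ID is hypothetical; uuidgen emits a v4 UUID on most systems
201-f47ac10b-58cc-4372-a567-0e02b2c3d479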

This will require updating the image model in Django to have a metadata file name (the original file name from the user) as well as the S3 file name (the UUID). We may want to allow the metadata file name to be changed.

Account Management

Users need to be able to register an account and have a tracked, dedicated amount of space for it. When a user registers, they are given an API key that they need to be able to sign in and view. This API key will allow them to interact with the API server.

Outstanding questions:

  • What should be the limit for a free account? My thought is 5 GB.

Create layer selector component

Create a simple reusable React component for layers. This component is needed on the "My Imports", "Public Catalog", and "Favorites" tabs on the Library page.


Implement REST API

Now that we've experimented with DRF ViewSets and sketched out the URL structure, actually implement and test the API.

See #5

Layer API

The Layer API server will have the following endpoints. Interaction with any endpoint will require API key verification:

Create Layer

Given a list of images on S3 and additional metadata, tile those images into EFS and create a tile service endpoint. This endpoint returns success=true if the job is submitted successfully, or success=false with an error if not.

Tooling will be built to allow imports into S3 from another non-public S3 bucket, the local file system, or other storage as an intermediate step.

Example request:

{
    // REQUIRED: Name of the mosaic being created.
    // MUST BE UNIQUE TO ACCOUNT.
    "name" : "nepal-imagery",

    // REQUIRED: List of S3 URIs to images.
    // Must be from a publicly accessible or Raster Foundry owned bucket.
    // The order of the images is the priority of the images in the mosaic, with
    // images higher in the list being higher priority than images lower in the list.
    // So if a pixel is covered by more than one image, the image higher in the list
    // will have its pixel shown.
    // Currently, these images have to be in RGB or RGBA format. If RGB,
    // pixels with value (0, 0, 0) are considered to be No Data.
    "images" : [
        "http://hotosm-oam.s3.amazonaws.com/356f564e3a0dc9d15553c17cf4583f21-9.tif",
        "http://hotosm-oam.s3.amazonaws.com/356f564e3a0dc9d15553c17cf4583f21-24.tif",
        "http://hotosm-oam.s3.amazonaws.com/356f564e3a0dc9d15553c17cf4583f21-20.tif"
    ],

    // OPTIONAL. Default: null. A text description of the layer. The
    // description can contain any legal character. Implementations SHOULD NOT
    // interpret the description as HTML.
    "description" : "Nepal imagery from Digital Globe"
}

Example responses:

{
    "success": true
}
{
    "success": false,
    "error": "Layer with the name 'nepal-imagery' already exists."
}
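A hedged sketch of calling this endpoint with curl, with the example request above saved as create-layer.json (the route and authentication header are assumptions; the issue does not specify them):

$ curl -X POST "https://rasterfoundry.com/api/layers" \
    -H "Authorization: <api-key>" \
    -H "Content-Type: application/json" \
    -d @create-layer.json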

Layer metadata endpoint

Given a layer name, this endpoint returns the layer's status: either processing, failed with an error message, or created, in which case the response is JSON containing the metadata in TileJSON format with additional fields.

Example Request:

{
    "name": "nepal-imagery"
}

Example Responses:

{
    "name": "nepal-imagery",

    // One of: ["created", "failed", "processing"]
    "state": "processing"
}
{
    "name": "nepal-imagery",
    "state": "failed",
    "error" : "Unable to create mosaic due to account limits. In order to build this mosaic, please upgrade your account."
}
{
    "state": "created",

    // Date the layer was created
    "created" : "2015-07-25T00:32:22-04:00",

    // Location of a simple Leaflet web viewer
    "viewer" : "http://rasterfoundry.com/username/nepal-imagery/index.html",

    // The rest is taken from TileJSON spec: https://github.com/mapbox/tilejson-spec/tree/master/2.1.0
    // I have removed some optional elements of that spec that may be added later.

    // REQUIRED. A semver.org style version number. Describes the version of
    // the TileJSON spec that is implemented by this JSON object.
    "tilejson": "2.1.0",

    // The name of the layer (provided by the client in Layer Creation API call)
    "name": "nepal-imagery",

    // OPTIONAL. Default: null. A text description of the layer. The
    // description can contain any legal character. Implementations SHOULD NOT
    // interpret the description as HTML.
    "description": "Nepal imagery from Digital Globe",

    // REQUIRED. An array of tile endpoints. {z}, {x} and {y}, if present,
    // are replaced with the corresponding integers. If multiple endpoints are specified, clients
    // may use any combination of endpoints. All endpoints MUST return the same
    // content for the same URL. The array MUST contain at least one endpoint.
    "tiles": [
        "http://rasterfoundry.com/username/nepal-imagery/{z}/{x}/{y}.png"
    ],

    // An integer specifying the minimum zoom level.
    "minzoom": 0,

    // An integer specifying the maximum zoom level. MUST be >= minzoom.
    "maxzoom": 11,

    // The maximum extent of available map tiles. Bounds MUST define an area
    // covered by all zoom levels. The bounds are represented in WGS:84
    // latitude and longitude values, in the order left, bottom, right, top.
    // Values may be integers or floating point numbers.
    "bounds": [ -180, -85.05112877980659, 180, 85.0511287798066 ],

    // The first value is the longitude, the second is latitude (both in
    // WGS:84 values), the third value is the zoom level as an integer.
    // Longitude and latitude MUST be within the specified bounds.
    // The zoom level MUST be between minzoom and maxzoom.
    // Implementations can use this value to set the default location. If the
    // value is null, implementations may use their own algorithm for
    // determining a default location.
    "center": [ -76.275329586789, 39.153492567373, 8 ]
}

Delete Layer

This endpoint allows the user to delete a layer from the account.

Example request:

{
    "name": "nepal-imagery"
}

Example response:

{
    "success": true
}

Upload directly to Amazon S3

Given that we expect file uploads for this project to be large, and we want to house raw input on Amazon S3, it may make sense to route user uploads directly to Amazon S3. In order to make this happen, we need to retrieve a signed URL from S3 via our backend, and thread it through to the JavaScript frontend.

A more detailed description, along with Python implementation details, can be found here. That's more to understand the process, though. Eventually, I think we'd want to lean on a JavaScript library like EvaporateJS that solves this problem for us, along with a custom Django route to provide the signed URL.

The main thing we trade off here is the ability to do upload validation within Django. That may or may not end up being acceptable.

Style password change templates

The password change templates from registration redux need to be modified to match the style of the other RF registration components.

Endpoint to receive image verification status

As outlined in #76, we will use Lambda to perform unzipping and image verification. After that is complete, a message should be posted to an SNS topic. That topic is responsible for posting a message to our application so that we can track all verified images before submitting the layer for geoprocessing.

We need an endpoint that can receive an SNS message containing a raster image ID and status. The structure of that request will need to be determined. Additional tables or other changes to the database structure may be needed to support this.

Cannot reset password

Though a password reset link is generated, our front end does not display the proper forms, so the user never has an opportunity to reset the password.

Related to #63

Foundry command line client

A Python (debatable) client for the Raster Foundry APIs.

Requires configuration with an API key, set either through an environment variable, a ~/.foundry config file, or the like.

Commands

foundry ls

This command will list the names of a user's image layers.

foundry show [layer-name]

This command will describe the layer. If the layer has a processing status, show the user that fact, with a percentage complete if available through the API.

Otherwise, show the metadata described in #5.

foundry import [layer-name] [image uris]*

Import the set of images described in the URIs as a layer in the user's Raster Foundry account.
URIs can be local files or S3 URIs (to start).

foundry rm [layer-name]

Deletes a layer from Raster Foundry.

How will zip files be handled?

The prototype assumes zip files can be uploaded to S3.

  • Will raster processor work with zips directly?
  • Should we unzip them and then upload them to S3?
  • Can we unzip directly in S3?
  • Should we drop this functionality?


Validate user uploads

Create an endpoint to associate file uploads with a registered user, to prevent unregistered users from uploading.

Investigate validation worker options

Because layer images are uploaded directly from the client to S3, and booting up a Spark cluster to begin processing is expensive, we should investigate ways to have a validation service determine whether the files are worth further processing. The test may be as simple as detecting whether GDAL can read the file. Since we're allowing zipped files directly from the client, we could also piggyback on the service to expand, validate, and rewrite the zip files prior to processing.

Possible flow: an upload to S3 triggers an AWS Lambda job, which tries to read (or unzips and reads) the images, then sends or queues a message that processing should begin (processing service) or that there are errors (app server).
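A minimal sketch of the "can GDAL read it" check, relying on gdalinfo's exit status (this assumes GDAL is available in the Lambda environment; the file name is illustrative):

$ gdalinfo upload.tif > /dev/null 2>&1 && echo "readable" || echo "not a valid raster"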

Setup Jenkins jobs

This includes jobs to build the project and run unit tests for the develop branch and pull requests.
