geospatial-etl's Introduction

Extract Transform Load (ETL) Python tools for Geospatial data

This repository includes multiple tools to perform Extract-Transform-Load (ETL) operations on Geospatial data, implemented as components for the WINGS semantic workflow system. WINGS is a system that helps scientists design and run experiments represented as workflows. For more information, visit the WINGS website.

The tools are implemented as Python scripts that can also be run from the command line on Linux systems. Internally, the Python scripts call GDAL tools. Each script has its own inputs and outputs and, in some cases, parameters that configure the data processing operation.
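
As an illustration of this pattern, a Python script can wrap a GDAL command-line tool with the standard `subprocess` module. This is a minimal sketch rather than code from this repository; the helper name `run_gdal_tool` is hypothetical, and it assumes the GDAL tools are installed and on the `PATH`.

```python
import shutil
import subprocess


def run_gdal_tool(args):
    """Run a GDAL command-line tool and return its standard output.

    `args` is the full argument list, for example ["gdalinfo", "dem.tif"].
    Hypothetical helper for illustration; not part of this repository.
    """
    # Fail early with a clear message if the tool is not installed.
    if shutil.which(args[0]) is None:
        raise FileNotFoundError(f"{args[0]} not found on PATH; is GDAL installed?")
    # check=True raises CalledProcessError if the tool exits with an error.
    completed = subprocess.run(args, check=True, capture_output=True, text=True)
    return completed.stdout
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.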

Application example: Preparing a Digital Elevation Model

Scientific models in Geosciences use spatial datasets such as Shapefiles and Digital Elevation Models (DEMs) as inputs; however, most of these datasets need to be prepared before they can be fed into the models. Therefore, tools to perform Extract-Transform-Load (ETL) operations on Geospatial data are fundamental to completing most experiments. Figure 1 shows the common usage of models in Geosciences.

Figure 1. Common usage of models in Geosciences.

As an example, we use some of the ETL tools in this repository to prepare a DEM for further analysis in hydrological models. The data preparation includes merging two raster tiles, reprojecting the result to a local reference system, and then clipping it to a specific area of interest defined by a polygon in a Shapefile. Figure 2 illustrates the merging and clipping of two DEM tiles in the region of Aweil Centre in South Sudan.

Figure 2. DEM tiles before and after the data preparation workflow.

These tools are combined to form a complete workflow in the WINGS system, as shown in Figure 3. Light blue boxes at the top represent raw datasets, green boxes are parameters such as the coordinate reference system to use, yellow boxes are processing components, and dark blue boxes are output datasets.

Figure 3. Workflow for DEM preparation in WINGS.

The resulting DEM is ready for use in multiple Geosciences models as well as in conventional GIS software such as QGIS.
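
The structure of this workflow (datasets, parameters, components, and outputs) can be sketched as plain Python data. This is only an illustration of how the three steps chain together, not how WINGS represents workflows internally; the `Step` class, the function name, and the intermediate file names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Step:
    """One processing component with its inputs, parameters, and output."""
    name: str
    inputs: tuple
    output: str
    parameters: dict = field(default_factory=dict)


def dem_workflow(tile_a, tile_b, area_shapefile, source_epsg, target_epsg):
    """Describe the merge, reproject, and clip steps in execution order.

    Intermediate file names are placeholders chosen for this sketch.
    """
    return [
        Step("merge", (tile_a, tile_b), "merged.tif"),
        Step(
            "reproject",
            ("merged.tif",),
            "reprojected.tif",
            {"source_epsg": source_epsg, "target_epsg": target_epsg},
        ),
        Step("clip", ("reprojected.tif", area_shapefile), "dem_ready.tif"),
    ]
```

Each step consumes the output of the previous one, which is what makes the ordering of the three components matter.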


Description of software components

Below you will find brief descriptions of each component, including its inputs, parameters, and outputs.

Receives a GeoTIFF raster file and clips it using a vector clip file.

Inputs

  • A raster file in GeoTIFF format.
  • A vector file in shapefile format.

Parameters

  • None

Outputs

  • A raster file in GeoTIFF format.
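
One way to perform this kind of clip with GDAL is `gdalwarp` with its `-cutline` and `-crop_to_cutline` options. The sketch below only builds the argument list; the function name is hypothetical, the optional `nodata` argument is an addition for illustration (the component itself takes no parameters), and the repository's script may call GDAL differently.

```python
def clip_raster_command(raster_path, clip_shapefile, output_path, nodata=None):
    """Build a gdalwarp call that clips a GeoTIFF to the polygons in a shapefile."""
    command = ["gdalwarp", "-of", "GTiff", "-cutline", clip_shapefile, "-crop_to_cutline"]
    if nodata is not None:
        # Optional (assumption of this sketch): value for pixels outside the polygon.
        command += ["-dstnodata", str(nodata)]
    # gdalwarp expects the source file first and the destination file last.
    command += [raster_path, output_path]
    return command
```

The returned list can be passed to `subprocess.run(command, check=True)` on a system where GDAL is installed.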

Receives a vector file in shapefile format and returns a shapefile containing only the elements that match a given field-value pair.

Inputs

  • A vector file in shapefile format.

Parameters

  • Field name.
  • Field value.

Outputs

  • A filtered vector file in shapefile format.
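
A filter like this can be expressed with the `-where` option of `ogr2ogr`. The following is a minimal sketch under two assumptions: the field holds text values, and values containing a single quote are rejected rather than escaped. The function name is hypothetical and the repository's script may differ.

```python
def filter_shapefile_command(input_shp, output_shp, field_name, field_value):
    """Build an ogr2ogr call that keeps features where field_name equals field_value."""
    value = str(field_value)
    if "'" in value:
        # Simplification for this sketch: avoid having to escape quotes.
        raise ValueError("field values containing single quotes are not supported")
    where = f"{field_name} = '{value}'"
    # ogr2ogr expects the destination first and the source last.
    return ["ogr2ogr", "-f", "ESRI Shapefile", "-where", where, output_shp, input_shp]
```

Note that `ogr2ogr` takes the output path before the input path, which is the reverse of `gdalwarp`.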

Converts a GeoJSON file into a shapefile.

Inputs

  • A vector file in GeoJSON format.

Parameters

  • EPSG code of the input GeoJSON as defined in the EPSG collection.

Outputs

  • A vector file in shapefile format.

Converts a KML file into a shapefile.

Inputs

  • A vector file in KML format.

Parameters

  • EPSG code of the input KML as defined in the EPSG collection.

Outputs

  • A vector file in shapefile format.
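
Both conversions above (GeoJSON to shapefile and KML to shapefile) can be sketched with a single `ogr2ogr` call, since `ogr2ogr` detects the input format from the file itself. The function name is hypothetical, and using `-a_srs` to attach the EPSG code is an assumption about how the parameter is applied, not a description of the repository's script.

```python
def to_shapefile_command(input_vector, output_shp, epsg_code):
    """Build an ogr2ogr call that converts a GeoJSON or KML file to a shapefile."""
    return [
        "ogr2ogr",
        "-f", "ESRI Shapefile",
        # -a_srs assigns the reference system to the output without reprojecting.
        "-a_srs", f"EPSG:{int(epsg_code)}",
        output_shp,
        input_vector,
    ]
```

Converting the code with `int()` rejects malformed values such as `"EPSG:4326"` early, before GDAL is invoked.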

Merges two raster files in GeoTIFF format into one.

Inputs

  • Two raster files in GeoTIFF format.

Parameters

  • None

Outputs

  • A raster file in GeoTIFF format.
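
GDAL ships a `gdal_merge.py` utility that mosaics several rasters into one. The sketch below builds such a call; the function name is hypothetical, and it assumes the utility is available on the `PATH` under that name, which may vary between installations.

```python
def merge_rasters_command(output_path, *tile_paths):
    """Build a gdal_merge.py call that mosaics two or more GeoTIFF tiles."""
    if len(tile_paths) < 2:
        raise ValueError("at least two input rasters are required")
    # -o names the output file; -of selects the output format.
    return ["gdal_merge.py", "-o", output_path, "-of", "GTiff", *tile_paths]
```

The component described above takes exactly two tiles; accepting more is a generalization made only for this example.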

Reprojects a GeoJSON file to another coordinate reference system. GeoJSON files are expected to use geographic coordinates in the WGS 84 datum. See the GeoJSON file format reference.

Inputs

  • A vector file in GeoJSON format.

Parameters

  • EPSG code of the input GeoJSON as defined in the EPSG collection.
  • EPSG code of the output GeoJSON as defined in the EPSG collection.

Outputs

  • A vector file in GeoJSON format.
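
Vector reprojection of this kind maps naturally onto the `-s_srs` and `-t_srs` options of `ogr2ogr`. This is a hedged sketch with a hypothetical function name; the EPSG codes in the usage line below are arbitrary examples, not values taken from this repository.

```python
def reproject_geojson_command(input_geojson, output_geojson, source_epsg, target_epsg):
    """Build an ogr2ogr call that reprojects a GeoJSON file between two EPSG codes."""
    return [
        "ogr2ogr",
        "-f", "GeoJSON",
        "-s_srs", f"EPSG:{int(source_epsg)}",  # reference system of the input
        "-t_srs", f"EPSG:{int(target_epsg)}",  # reference system of the output
        output_geojson,
        input_geojson,
    ]
```

For example, `reproject_geojson_command("in.geojson", "out.geojson", 4326, 32635)` describes a reprojection from WGS 84 geographic coordinates to a UTM zone.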

Reprojects a GeoTIFF file to another coordinate reference system.

Inputs

  • A raster file in GeoTIFF format.

Parameters

  • EPSG code of the input GeoTIFF as defined in the EPSG collection.
  • EPSG code of the output GeoTIFF as defined in the EPSG collection.

Outputs

  • A raster file in GeoTIFF format.
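
For rasters, the equivalent GDAL tool is `gdalwarp`, which also lets the caller choose a resampling method with `-r`. The sketch below is illustrative only: the function name and the `resampling` argument are additions not described by this component, and only three of the methods `gdalwarp` supports are listed.

```python
# A subset of the resampling methods accepted by gdalwarp's -r option.
RESAMPLING_METHODS = ("near", "bilinear", "cubic")


def reproject_raster_command(input_tif, output_tif, source_epsg, target_epsg,
                             resampling="near"):
    """Build a gdalwarp call that reprojects a GeoTIFF between two EPSG codes."""
    if resampling not in RESAMPLING_METHODS:
        raise ValueError(f"unsupported resampling method: {resampling}")
    return [
        "gdalwarp",
        "-of", "GTiff",
        "-s_srs", f"EPSG:{int(source_epsg)}",
        "-t_srs", f"EPSG:{int(target_epsg)}",
        "-r", resampling,
        input_tif,   # gdalwarp takes the source first...
        output_tif,  # ...and the destination last
    ]
```

Nearest-neighbour (`near`) is the `gdalwarp` default; for continuous data such as elevation, `bilinear` or `cubic` may give smoother results.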

Reprojects a shapefile to another coordinate reference system.

Inputs

  • A vector file in shapefile format.

Parameters

  • EPSG code of the input shapefile as defined in the EPSG collection.
  • EPSG code of the output shapefile as defined in the EPSG collection.

Outputs

  • A vector file in shapefile format.

Acknowledgements

Juan Carrillo was financially supported for this project by the Mitacs Globalink Research Award and the University of Waterloo Machine Learning Lab. Juan Carrillo gives special thanks to Mark Crowley, Daniel Garijo, and Yolanda Gil for their mentoring and contributions during this research project. Last but not least, thanks to the administrative staff at Mitacs and the Information Sciences Institute at the University of Southern California for their kind help with multiple requirements.
