Code Monkey home page Code Monkey logo

spatialjoin's Introduction

SpatialJoin

Perform Spatial Join on BigData using Hadoop MapReduce

Sample Density Run

In this example, we are joining a set of points from HDFS with a small set of polygons that is loaded from the distributed cache.

Create polygon sample data into local file system:

$ awk -f data.awk | awk -f polygons.awk > /tmp/polygons.txt

Create points sample data into HDFS:

$ awk -f data.awk | hadoop fs -put - points.txt

Build, package and run the tool:

$ mvn -PDensityTool clean package
$ hadoop jar target/SpatialJoin-1.0-SNAPSHOT-job.jar points.txt file:///tmp/polygons.txt output

Save the summary output into local file system:

$ hadoop fs -cat output/part-* > /tmp/relate.txt

Convert polygon set into a shapefile using ogr2ogr so it can be viewed in ArcMap and related with the above file for visualization:

$ awk -f featurecollection.awk /tmp/polygons.txt > /tmp/fc.json
$ ogr2ogr -f "ESRI Shapefile" /tmp/polygons /tmp/fc.json

Spatial Join ArcMap

Sample Airport/Route Run

In this example of a map side spatial join, I have a list of airport code pairs forming flight routes between two cities. In addition, I have a list of the airport code, city and location in lat/lon format. The task at hand it count the number of times an airport is overflown by the flight routes.

Place the data into HDFS:

$ hadoop fs -put data/airports.csv airports.csv
$ hadoop fs -put data/routes.csv routes.csv

Build, package and run the tool:

$ mvn -PJoinMapTool clean package
$ hadoop jar target/SpatialJoin-1.0-SNAPSHOT-job.jar routes.csv output

See the output:

$ hadoop fs -cat output/part-00000 | sort -n -r --key=2 | head -10

Sample Polygon/Polygon Spatial Join

In this example of reduce side spatial join, we will join two polygon sets and return the intersection set.

Create the two polygon sets:

$ awk -f data.awk | hadoop fs -put - data1.txt
$ awk -f data.awk | hadoop fs -put - data2.txt

Build, package and run the tool - here I am specifying a cell size of 5 units:

$ mvn -PJoinRedTool clean package
$ hadoop jar target/SpatialJoin-1.0-SNAPSHOT-job.jar -D com.esri.size=5 data1.txt data2.txt output

To view the result (as long it is small :-), I am using the Esri-Leaflet project.

Create a folder under your web application and extract the data into that folder as GeoJSON formatted files.

$ hadoop fs -cat data1.txt | awk -f geojson.awk -v N=data1 > data1.js
$ hadoop fs -cat data2.txt | awk -f geojson.awk -v N=data2 > data2.js
$ hadoop fs -cat output/part-* | awk -f union.awk > union.js

Copy the file esri-leaflet.js and index.html in the webapp folder to that folder and launch your browser and view the result at http://localhost/myfolder/index.html

Esri Leaflet

Unit Testing

I highly recommend that you look at the test folder - There are two types of unit test cases in there. one that tests the mapper by itself, the reducer by itself and the combination of mapper and reducer. The other type is one that launches a mini cluster that launches a name node, a data node, a job tracker and a task tracker all in one VM. This enable the test case to write some well known data at startup time into HDFS and read the result of the MapReduce job back from HDFS to assert the values.

spatialjoin's People

Contributors

mraad avatar

Watchers

James Cloos avatar sivanantham chellam avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.