SpatialJoin

Perform Spatial Join on BigData using Hadoop MapReduce

Sample Density Run

In this example, we are joining a set of points from HDFS with a small set of polygons that is loaded from the distributed cache.

Create polygon sample data into local file system:

$ awk -f data.awk | awk -f polygons.awk > /tmp/polygons.txt

Create points sample data into HDFS:

$ awk -f data.awk | hadoop fs -put - points.txt

Build, package and run the tool:

$ mvn -PDensityTool clean package
$ hadoop jar target/SpatialJoin-1.0-SNAPSHOT-job.jar points.txt file:///tmp/polygons.txt output

Save the summary output into local file system:

$ hadoop fs -cat output/part-* > /tmp/relate.txt

Convert polygon set into a shapefile using ogr2ogr so it can be viewed in ArcMap and related with the above file for visualization:

$ awk -f featurecollection.awk /tmp/polygons.txt > /tmp/fc.json
$ ogr2ogr -f "ESRI Shapefile" /tmp/polygons /tmp/fc.json

Sample Airport/Route Run

In this example of a map side spatial join, I have a list of airport code pairs forming flight routes between two cities. In addition, I have a list of the airport code, city and location in lat/lon format. The task at hand it count the number of times an airport is overflown by the flight routes.

Place the data into HDFS:

$ hadoop fs -put data/airports.csv airports.csv
$ hadoop fs -put data/routes.csv routes.csv

Build, package and run the tool:

$ mvn -PJoinMapTool clean package
$ hadoop jar target/SpatialJoin-1.0-SNAPSHOT-job.jar routes.csv output

See the output:

$ hadoop fs -cat output/part-00000 | sort -n -r --key=2 | head -10

Sample Polygon/Polygon Spatial Join

In this example of reduce side spatial join, we will join two polygon sets and return the intersection set.

Create the two polygon sets:

$ awk -f data.awk | hadoop fs -put - data1.txt
$ awk -f data.awk | hadoop fs -put - data2.txt

Build, package and run the tool - here I am specifying a cell size of 5 units:

$ mvn -PJoinRedTool clean package
$ hadoop jar target/SpatialJoin-1.0-SNAPSHOT-job.jar -D com.esri.size=5 data1.txt data2.txt output

To view the result (as long it is small :-), I am using the Esri-Leaflet project.

Create a folder under your web application and extract the data into that folder as GeoJSON formatted files.

$ hadoop fs -cat data1.txt | awk -f geojson.awk -v N=data1 > data1.js
$ hadoop fs -cat data2.txt | awk -f geojson.awk -v N=data2 > data2.js
$ hadoop fs -cat output/part-* | awk -f union.awk > union.js

Copy the file esri-leaflet.js and index.html in the webapp folder to that folder and launch your browser and view the result at http://localhost/myfolder/index.html

Unit Testing

I highly recommend that you look at the test folder - There are two types of unit test cases in there. one that tests the mapper by itself, the reducer by itself and the combination of mapper and reducer. The other type is one that launches a mini cluster that launches a name node, a data node, a job tracker and a task tracker all in one VM. This enable the test case to write some well known data at startup time into HDFS and read the result of the MapReduce job back from HDFS to assert the values.

siva1987c / spatialjoin Goto Github PK

spatialjoin's Introduction

SpatialJoin

Sample Density Run

Sample Airport/Route Run

Sample Polygon/Polygon Spatial Join

Unit Testing

spatialjoin's People

Contributors

Watchers

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent