
biggis-landuse

Land use update detection based on Geotrellis and Spark

Quick and dirty usage example

# first, we compile everything and produce a fat jar
# which contains all the dependencies
mvn package

# now we can run the example app
java -cp target/biggis-landuse-0.0.8-SNAPSHOT.jar \
  biggis.landuse.spark.examples.GeotiffToPyramid \
  /path/to/raster.tif \
  new_layer_name \
  /path/to/catalog-dir

GettingStarted Example

The code for this example is located in src/main/scala/biggis.landuse.spark.examples/GettingStarted.scala

# based on https://github.com/geotrellis/geotrellis-landsat-tutorial
# download the example data from geotrellis-landsat-tutorial
# into data/geotrellis-landsat-tutorial
wget http://landsat-pds.s3.amazonaws.com/L8/107/035/LC81070352015218LGN00/LC81070352015218LGN00_B3.TIF
wget http://landsat-pds.s3.amazonaws.com/L8/107/035/LC81070352015218LGN00/LC81070352015218LGN00_B4.TIF
wget http://landsat-pds.s3.amazonaws.com/L8/107/035/LC81070352015218LGN00/LC81070352015218LGN00_B5.TIF
wget http://landsat-pds.s3.amazonaws.com/L8/107/035/LC81070352015218LGN00/LC81070352015218LGN00_BQA.TIF
wget http://landsat-pds.s3.amazonaws.com/L8/107/035/LC81070352015218LGN00/LC81070352015218LGN00_MTL.txt

Using an IDE

We strongly recommend using an IDE for Scala development, in particular IntelliJ IDEA, which has better support for Scala than Eclipse.

For IDE builds, please select the Maven profile IDE before running; this avoids the provided dependency scope, which is necessary only for cluster builds.

Since Geotrellis uses Apache Spark for processing, we need to set the spark.master property first.

  • For local debugging, the easiest option is to set the VM command line argument -Dspark.master=local[*], as sketched below.
  • Another option for local debugging, closer to a cluster setup, is to run Geotrellis in a Docker container as implemented in biggis-spark. In this case, use -Dspark.master=spark://localhost:7077.
  • The third option is to use a real cluster, which can run on the same Docker-based infrastructure from biggis-spark.
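
A minimal sketch of the first option, assuming the application creates the SparkContext itself (an illustration, not the project's actual Utils code; SparkConf picks up spark.* system properties such as -Dspark.master automatically):

import org.apache.spark.{SparkConf, SparkContext}

// -Dspark.master is read from the system properties by SparkConf;
// setIfMissing merely provides a fallback for local debugging
val sparkConf = new SparkConf()
  .setAppName("biggis-landuse-example")
  .setIfMissing("spark.master", "local[*]")

val sc = new SparkContext(sparkConf)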

Geotrellis always works with a "catalog", which is basically a directory either in the local filesystem or in HDFS. You might want to use target/geotrellis-catalog during development. This way, the catalog will be deleted when running mvn clean and won't be included in the git repository.
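
A minimal sketch of opening such a catalog for writing (hedged: the exact HadoopAttributeStore constructor may differ between Geotrellis versions):

import org.apache.hadoop.fs.Path
import geotrellis.spark.io.hadoop.{HadoopAttributeStore, HadoopLayerWriter}

// the catalog is just a directory; during development we keep it under target/
val catalogPath = new Path("target/geotrellis-catalog")
val attributeStore = HadoopAttributeStore(catalogPath, sc.hadoopConfiguration)
val writer = HadoopLayerWriter(catalogPath, attributeStore)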


biggis-landuse's Issues

ToDo: Get rid of hardcoded spark jars

In biggis.landuse.spark.examples.Utils.scala (originally using Spark 1.6.2) it was necessary to initialize the SparkContext with the proper JARs, which was hardcoded:

def initSparkClusterContext: SparkContext = {
  // TODO: get rid of the hardcoded JAR
  sparkConf.setJars(Seq("hdfs:///jobs/landuse-example/biggis-landuse-0.0.7-SNAPSHOT.jar"))

Spark 2.0 (we are using Spark 2.2) introduced SparkSession as a container for the SparkContext:

SparkSession.builder
  .config("spark.jars", "hdfs:///jobs/landuse-example/biggis-landuse-0.0.7-SNAPSHOT.jar")
  .getOrCreate()

To avoid hardcoded Spark JARs, "spark.jars" has to be set externally, e.g. in the JSON job description submitted to the Spark master.

Otherwise the following error occurs:

java.lang.ClassCastException: cannot assign instance of scala.collection.immutable.List$SerializationProxy to field org.apache.spark.rdd.RDD.org$apache$spark$rdd$RDD$$dependencies_ of type scala.collection.Seq in instance of org.apache.spark.rdd.MapPartitionsRDD
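
A hedged sketch of setting "spark.jars" from the outside via a system property instead of hardcoding it (hypothetical code, not the repository's implementation):

import org.apache.spark.sql.SparkSession

// hypothetical: launch with -Dspark.jars=hdfs:///jobs/... ;
// the option is only applied when it was actually provided
val builder = SparkSession.builder.appName("landuse-example")
sys.props.get("spark.jars").foreach(jars => builder.config("spark.jars", jars))
val spark = builder.getOrCreate()
val sc = spark.sparkContext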

Maven build fails due to reflection issue in Java 10

biggis-landuse is developed with Scala for JDK 1.8; it is not compatible with Java 10.

If the Maven build fails with a reflection issue, it might be due to a wrong Java version. Please install Java 8:

sudo apt-get install openjdk-8-jdk

and set it to a higher priority than Java 10 (adjust the JVM path if you use OpenJDK instead of the Oracle JDK):

sudo update-alternatives --install /usr/bin/java java /usr/lib/jvm/java-8-oracle/jre/bin/java 1181

or change the default interactively:

sudo update-alternatives --config java

See: https://www.digitalocean.com/community/tutorials/how-to-install-java-with-apt-on-ubuntu-18-04

MultibandGeotiffTilingExample invalid hdfs path

MultibandGeotiffTilingExample uses a wrong HDFS path when running in a Docker container that was not started in cluster mode (i.e. started via docker-compose).

When importing

  • hdfs:///landuse-demo/landuse/dop ("hdfs://" + "/landuse-demo/landuse/dop")

the path is incorrectly truncated to

  • hdfs:/landuse-demo/landuse/dop ("hdfs:" + "/landuse-demo/landuse/dop")

This is similar to the already closed issue #15, in which serializing the Hadoop config was necessary:

// the Hadoop configuration must be wrapped to be serializable for the executors
implicit val conf: Configuration = sc.hadoopConfiguration
val serConf = new SerializableConfiguration(conf)

Unfortunately, MultibandGeotiffTilingExample uses the SparkContext directly for hadoopMultibandGeoTiffRDD, not the (to-be-serialized) hadoopConfiguration:

sc.hadoopMultibandGeoTiffRDD
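
A hedged workaround sketch: qualify the path through Hadoop's Path/FileSystem API instead of concatenating strings, which avoids the hdfs:/ truncation (hypothetical code, not from the repository):

import org.apache.hadoop.fs.Path

// resolve the URI against the configured default filesystem
val raw = new Path("hdfs:///landuse-demo/landuse/dop")
val fs = raw.getFileSystem(sc.hadoopConfiguration)
val qualified = fs.makeQualified(raw) // e.g. hdfs://namenode:8020/landuse-demo/landuse/dop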

ERROR org.apache.spark.executor.Executor ... scala.MatchError: Some() at geotrellis.raster.io.geotiff.reader.GeoTiffCSParser.getEllipsoidInfo

When I try:

java -cp target/biggis-landuse-0.0.1-SNAPSHOT.jar \
  biggis.landuse.spark.examples.GeotiffToPyramid \
  ./data/DOP_RGBI_T2.tif \
  new_layer \
  ./data/pyramid

I get the following error message:

15:58:50 ERROR org.apache.spark.executor.Executor - Exception in task 0.0 in stage 0.0 (TID 0)
scala.MatchError: Some() (of class scala.Some)
at geotrellis.raster.io.geotiff.reader.GeoTiffCSParser.getEllipsoidInfo(GeoTiffCSParser.scala:570)
at geotrellis.raster.io.geotiff.reader.GeoTiffCSParser.createGeoTiffCSParameters(GeoTiffCSParser.scala:172)
at geotrellis.raster.io.geotiff.reader.GeoTiffCSParser.geoTiffCSParameters$lzycompute(GeoTiffCSParser.scala:78)
at geotrellis.raster.io.geotiff.reader.GeoTiffCSParser.geoTiffCSParameters(GeoTiffCSParser.scala:78)
at geotrellis.raster.io.geotiff.reader.GeoTiffCSParser.model(GeoTiffCSParser.scala:80)
at geotrellis.raster.io.geotiff.tags.TiffTags.crs$lzycompute(TiffTags.scala:207)
at geotrellis.raster.io.geotiff.tags.TiffTags.crs(TiffTags.scala:205)
at geotrellis.raster.io.geotiff.reader.GeoTiffReader$.readGeoTiffInfo(GeoTiffReader.scala:315)
at geotrellis.raster.io.geotiff.reader.GeoTiffReader$.readSingleband(GeoTiffReader.scala:67)
at geotrellis.raster.io.geotiff.reader.GeoTiffReader$.readSingleband(GeoTiffReader.scala:61)
at geotrellis.raster.io.geotiff.SinglebandGeoTiff$.apply(SinglebandGeoTiff.scala:40)
at geotrellis.spark.io.hadoop.formats.GeotiffInputFormat.read(GeotiffInputFormat.scala:28)
at geotrellis.spark.io.hadoop.formats.BinaryFileInputFormat$$anonfun$createRecordReader$1.apply(BinaryFileInputFormat.scala:34)
at geotrellis.spark.io.hadoop.formats.BinaryFileInputFormat$$anonfun$createRecordReader$1.apply(BinaryFileInputFormat.scala:34)
at geotrellis.spark.io.hadoop.formats.BinaryFileRecordReader.initialize(BinaryFileInputFormat.scala:18)
at org.apache.spark.rdd.NewHadoopRDD$$anon$1.<init>(NewHadoopRDD.scala:158)
at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:129)
at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:64)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

Maven build fails recently with maven-surefire-plugin 2.x

Tests fail with the maven-surefire-plugin in recent builds (affects all branches, even old snapshots). The failure affects version 2.18.1 (used in pom.xml) as well as all versions up to 2.22.1. It seems to be fixed in 3.0.0-M1:

https://mvnrepository.com/artifact/org.apache.maven.plugins/maven-surefire-plugin/3.0.0-M1

<!-- https://mvnrepository.com/artifact/org.apache.maven.plugins/maven-surefire-plugin -->
<!-- note: surefire is a build plugin, so this belongs under <build><plugins>, not <dependencies> -->
<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-surefire-plugin</artifactId>
    <version>3.0.0-M1</version>
</plugin>

Might be related to:
https://issues.apache.org/jira/projects/SUREFIRE/issues/SUREFIRE-1574?filter=allopenissues

No configuration setting found for key 'akka.version'

When I try:

java -cp target/biggis-landuse-0.0.1-SNAPSHOT.jar \
  biggis.landuse.spark.examples.GeotiffToPyramid \
  ./data/DOP_RGBI_T2.tif \
  new_layer \
  ./data/pyramid

I get the following error message:

09:25:35 INFO  org.apache.spark.util.Utils                                   - Successfully started service 'sparkDriver' on port 54655.
09:25:35 ERROR org.apache.spark.SparkContext                                 - Error initializing SparkContext.
com.typesafe.config.ConfigException$Missing: No configuration setting found for key 'akka.version'
        at com.typesafe.config.impl.SimpleConfig.findKey(SimpleConfig.java:124)
        at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:145)
        at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:151)
        at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:159)
        at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:164)
        at com.typesafe.config.impl.SimpleConfig.getString(SimpleConfig.java:206)
        at akka.actor.ActorSystem$Settings.<init>(ActorSystem.scala:169)
        at akka.actor.ActorSystemImpl.<init>(ActorSystem.scala:505)
        at akka.actor.ActorSystem$.apply(ActorSystem.scala:142)
        at akka.actor.ActorSystem$.apply(ActorSystem.scala:119)
        at org.apache.spark.util.AkkaUtils$.org$apache$spark$util$AkkaUtils$$doCreateActorSystem(AkkaUtils.scala:121)
        at org.apache.spark.util.AkkaUtils$$anonfun$1.apply(AkkaUtils.scala:53)
        at org.apache.spark.util.AkkaUtils$$anonfun$1.apply(AkkaUtils.scala:52)
        at org.apache.spark.util.Utils$$anonfun$startServiceOnPort$1.apply$mcVI$sp(Utils.scala:2024)
        at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:160)
        at org.apache.spark.util.Utils$.startServiceOnPort(Utils.scala:2015)
        at org.apache.spark.util.AkkaUtils$.createActorSystem(AkkaUtils.scala:55)
        at org.apache.spark.SparkEnv$.create(SparkEnv.scala:266)
        at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:193)
        at org.apache.spark.SparkContext.createSparkEnv(SparkContext.scala:288)
        at org.apache.spark.SparkContext.<init>(SparkContext.scala:457)
        at biggis.landuse.spark.examples.GeotiffToPyramid$.apply(GeotiffToPyramid.scala:55)
        at biggis.landuse.spark.examples.GeotiffToPyramid$.main(GeotiffToPyramid.scala:41)
        at biggis.landuse.spark.examples.GeotiffToPyramid.main(GeotiffToPyramid.scala)

Raster tile to pixels and back

While writing a paper about the BigGIS architecture, I realized that for the "pixelization" operation we actually don't want the (ts, lat, lon) coordinates, because they are more complicated to use when converting pixels back to a tile.
Instead, I propose to use (sfc, offset) coordinates.

  • Each tile has a space-filling curve index sfc = HASH_SFC(gridx, gridy, ts) generated by geotrellis.
  • Each pixel within a tile has an offset = row * width + column

What do you think?
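
A minimal sketch of the proposed addressing (hypothetical; PixelRef is an illustrative name, and sfc stands for the space-filling curve index generated by geotrellis):

// a pixel is addressed by its tile's SFC index plus a flat offset within the tile
case class PixelRef(sfc: Long, offset: Int)

// converting between (row, column) within a tile and the flat offset
def toOffset(row: Int, col: Int, width: Int): Int = row * width + col
def fromOffset(offset: Int, width: Int): (Int, Int) = (offset / width, offset % width)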

LayerToGeotiff/MultibandLayerToGeotiff invalid hdfs path

When exporting a GeoTiff using an HDFS path, e.g.

  • hdfs:///landuse-demo/landuse/out ("hdfs://" + "/landuse-demo/landuse/out")

the path is incorrectly truncated to

  • hdfs:/landuse-demo/landuse/out ("hdfs:" + "/landuse-demo/landuse/out")

by GeoTiff.write(filename) or MultibandGeoTiff.write(filename), causing

java.io.FileNotFoundException: hdfs:/landuse-demo/landuse/out/result.tif (No such file or directory)

A possible reason is that GeoTiff.write uses standard java.io, which writes to the local filesystem.

A similar issue when reading/writing JSON files in HDFS could be solved by using

// the Hadoop config is accessible from the SparkContext
implicit val fs: FileSystem = FileSystem.get(sc.hadoopConfiguration)

and then replacing java.io.FileWriter with an OutputStreamWriter over a BufferedOutputStream and fs.create(new Path(filename)):

  • val bw = new java.io.BufferedWriter(new java.io.FileWriter(new java.io.File(fileNameJson)))

changed to:

  • val bw = new java.io.BufferedWriter(new java.io.OutputStreamWriter(new java.io.BufferedOutputStream(fs.create(new Path(fileNameJson)))))

The problem is that the java.io call is hidden inside GeoTiff.write, which only accepts a String as filename. There must be an alternative using org.apache.hadoop.fs in Geotrellis!
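
A possible workaround sketch, assuming GeoTiffWriter.write has an overload that returns the encoded bytes (signatures differ between Geotrellis versions, so treat this as an assumption):

import org.apache.hadoop.fs.{FileSystem, Path}
import geotrellis.raster.io.geotiff.writer.GeoTiffWriter

// encode the GeoTiff in memory, then stream the bytes through
// org.apache.hadoop.fs instead of GeoTiff.write's java.io
val fs = FileSystem.get(sc.hadoopConfiguration)
val out = fs.create(new Path("hdfs:///landuse-demo/landuse/out/result.tif"))
try out.write(GeoTiffWriter.write(tiff)) finally out.close()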

TODO: Changing / Creating new zoom level (Upsampling / Downsampling)

We need to be able to change resolution / zoom levels when merging (mosaicking or layer stacking) layers with different zoom levels. The current approach selects the highest zoom level common to all layers, but it fails if there is no common zoom level.
It would also be nice to be able to set the resolution to a specific value (also between two zoom levels).

We will try to use ZoomResample for this:

https://github.com/locationtech/geotrellis/blob/master/spark/src/test/scala/geotrellis/spark/resample/ZoomResampleSpec.scala

https://github.com/locationtech/geotrellis/blob/master/spark/src/main/scala/geotrellis/spark/resample/ZoomResample.scala

Another approach is to use Regrid:

https://github.com/locationtech/geotrellis/blob/master/spark/src/test/scala/geotrellis/spark/regrid/RegridSpec.scala

https://github.com/locationtech/geotrellis/blob/master/spark/src/main/scala/geotrellis/spark/regrid/Regrid.scala
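
A minimal usage sketch of the ZoomResample approach (hedged: the call shape follows ZoomResampleSpec, and the layer names are hypothetical):

import geotrellis.spark.resample.ZoomResample

// upsample a tile layer from zoom 12 to zoom 14 so it can be merged
// with a higher-resolution layer even without a common zoom level
val layerAtZoom14 = ZoomResample(layerAtZoom12, 12, 14)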

Read GeoJSON in Cluster fails with invalid hdfs path

UtilsShape.readGeoJSONMultiPolygonLongAttribute fails in the cluster due to a truncated HDFS path in

val collection = GeoJson.fromFile[WithCrs[JsonFeatureCollection]](geojsonName)

see #15 LayerToGeotiff/MultibandLayerToGeotiff invalid hdfs path
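
A hedged workaround sketch, analogous to #15: read the file content through the Hadoop FileSystem API and parse the string, instead of GeoJson.fromFile which goes through java.io (assumes GeoJson.parse accepts a JSON string):

import org.apache.hadoop.fs.{FileSystem, Path}
import geotrellis.vector.io.json.GeoJson

// read the GeoJSON from HDFS ourselves, then parse it from the string
val fs = FileSystem.get(sc.hadoopConfiguration)
val in = fs.open(new Path(geojsonName))
val json = try scala.io.Source.fromInputStream(in).mkString finally in.close()
val collection = GeoJson.parse[WithCrs[JsonFeatureCollection]](json)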

Hadoop Layer Writer - Directory already exists

The following code was working fine with Geotrellis 0.10.3 and Spark 1.6.2:

// Create the writer that we will use to store the tiles in the local catalog.
val writer = HadoopLayerWriter(catalogPathHdfs, attributeStore)
val layerId = LayerId(layerName, zoom)
[..]
logger debug "Writing reprojected tiles using space filling curve"
writer.write(layerId, reprojected, ZCurveKeyIndexMethod)

Using Geotrellis 1.0.0 and Spark 2.1.0, I now get the following failure:

10:29:29 INFO  org.apache.spark.storage.BlockManagerInfo                     - Removed broadcast_7_piece0 on 10.0.75.1:53190 in memory (size: 6.9 KB, free: 4.1 GB)
Exception in thread "main" geotrellis.spark.io.package$LayerWriteError: Failed to write Layer(name = "layer_label", zoom = 17)
                at geotrellis.spark.io.hadoop.HadoopLayerWriter._write(HadoopLayerWriter.scala:63)
                at geotrellis.spark.io.hadoop.HadoopLayerWriter._write(HadoopLayerWriter.scala:36)
                at geotrellis.spark.io.LayerWriter$class.write(LayerWriter.scala:59)
                at geotrellis.spark.io.hadoop.HadoopLayerWriter.write(HadoopLayerWriter.scala:36)
                at biggis.landuse.spark.examples.MultibandGeotiffTilingExample$.apply(MultibandGeotiffTilingExample.scala:76)
                at biggis.landuse.spark.examples.WorkflowExample$.apply(WorkflowExample.scala:52)
                at biggis.landuse.spark.examples.WorkflowExample$.main(WorkflowExample.scala:15)
                at biggis.landuse.spark.examples.WorkflowExample.main(WorkflowExample.scala)
Caused by: java.lang.Exception: Directory already exists: target/geotrellis-catalog/layer_label/17
                at geotrellis.spark.io.hadoop.HadoopRDDWriter$.write(HadoopRDDWriter.scala:85)
                at geotrellis.spark.io.hadoop.HadoopLayerWriter._write(HadoopLayerWriter.scala:65)
                ... 7 more

The write fails after creating the layer (zoom level 17 and the partitions inside exist, and the layer metadata exists as well). I cleaned the geotrellis-catalog beforehand (to avoid data conflicts between versions).
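
A hedged workaround sketch: delete the pre-existing layer before writing (the exact HadoopLayerDeleter constructor varies between Geotrellis versions, so check the signature for yours):

import geotrellis.spark.io.hadoop.HadoopLayerDeleter

// remove the stale layer from the catalog before writing again
if (attributeStore.layerExists(layerId))
  HadoopLayerDeleter(attributeStore).delete(layerId)

writer.write(layerId, reprojected, ZCurveKeyIndexMethod)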
