tilezen / joerd
Joerd can be used to download, merge and generate tiles from digital elevation data.
License: MIT License
Baldur to the rescue!
At the moment, the files downloaded are those which intersect the region(s) of interest, and the tiles generated are those which intersect the region(s) of interest. However, given that a tile might overhang the edge of the region of interest, the files which are downloaded should encompass all tiles which might be generated.
Loop over the tiles first, and union their bounding boxes to provide the regions input for the download stage.
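The union step might look like this (a minimal sketch in pure Python, assuming standard web-mercator z/x/y tile addressing; not Joerd's actual code):

```python
import math

def tile_bounds(z, x, y):
    """Lat/lon bounding box (w, s, e, n) of a web-mercator tile."""
    n = 2 ** z
    def lon(x):
        return x / n * 360.0 - 180.0
    def lat(y):
        return math.degrees(math.atan(math.sinh(math.pi * (1 - 2 * y / n))))
    return (lon(x), lat(y + 1), lon(x + 1), lat(y))

def union_bounds(tiles):
    """Union the bounding boxes of all tiles to generate, so the
    download region encompasses every tile, including overhangs."""
    boxes = [tile_bounds(z, x, y) for z, x, y in tiles]
    return (min(b[0] for b in boxes), min(b[1] for b in boxes),
            max(b[2] for b in boxes), max(b[3] for b in boxes))
```

The resulting box would then be fed to the download stage as its region input.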
From the USGS NED site:
1-meter – This dataset was introduced in 2015 with limited coverage of the U.S.,
but will be expanding as new DEMs from 3DEP quality level 2 or better lidar data
are acquired. Horizontal coordinates are referenced to the Universal Transverse
Mercator projection.
Note the different projection.
The current design assumes that bounding boxes for each raster tile can be determined without downloading the tile - usually from coordinates embedded in the tile name. This is mostly useful because there are many very large datasets (e.g: SRTM, NED) and it would be onerous to have to download all of them if you only wanted to render a small part of the world.
However, this assumption means that it's more difficult to write data sources which don't adhere to this pattern (e.g: Great Lakes).
A possible fix is to introduce a new phase to enumerate / index all the tiles within a source, which could involve downloading the source, or a metadata file, which would provide the bounds.
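For sources that do encode coordinates in the tile name, the bbox lookup is straightforward; a sketch for SRTM-style names (illustrative, not Joerd's actual parser), returning None for sources like the Great Lakes that would need the new enumerate/index phase:

```python
import re

def srtm_bbox(filename):
    """Bounding box (w, s, e, n) from an SRTM-style tile name such as
    'N38W123.hgt' -- each tile covers a 1x1 degree cell."""
    m = re.match(r'([NS])(\d{2})([EW])(\d{3})', filename)
    if not m:
        return None  # name carries no coordinates; needs an index phase
    lat = int(m.group(2)) * (1 if m.group(1) == 'N' else -1)
    lon = int(m.group(4)) * (1 if m.group(3) == 'E' else -1)
    return (lon, lat, lon + 1, lat + 1)
```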
To run this across multiple processes we will need:
Upon receiving a job from the queue (which should be a large section of tiles, to amortize download costs), the server needs to run the download, buildvrt and generate jobs with the region configured as per the incoming request.
An issue that seems to keep happening with GMTED data is that the server will drop the connection mid-file and subsequent requests will get an "empty response" for some period of time. This appears to be some sort of TCP-level rate-limiting, but is currently undetectable to us until we try to build the VRT.
What should happen: each download should be verified after it completes, so a truncated file is detected immediately rather than at VRT build time, and failed requests should be retried with a backoff (urlgrabber?). These files can take a while to fetch, so it's worth caching them.
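A minimal sketch of the retry-with-backoff behaviour, with the actual HTTP call injected as `fetch` so it can be swapped for urlgrabber or anything else (all names here are illustrative):

```python
import time

def fetch_with_retry(fetch, url, attempts=5, base_delay=1.0, sleep=time.sleep):
    """Retry a flaky download with exponential backoff. `fetch` is
    whatever function performs the HTTP request; a dropped connection
    or "empty response" should raise, triggering a retry."""
    for attempt in range(attempts):
        try:
            return fetch(url)
        except IOError:
            if attempt == attempts - 1:
                raise
            # back off, giving TCP-level rate-limiting time to clear
            sleep(base_delay * (2 ** attempt))
```

The growing delay matters here because the observed behaviour looks like rate-limiting: immediate retries would likely keep getting empty responses.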
NED contains many files which overlap each other, some even in the same year. For example, overlapping chunks near SF;
All of the above have the same bounding box, although the first one is the only one which covers the whole bounding box without large regions of NODATA. It would be good to have some way of detecting this, so that the other files aren't downloaded and don't have the opportunity to make stitching the data back up any harder than it has to be.
We have the following problem near the coast:
In this example, the red data is something like GMTED or SRTM for which we have no bathymetry, and the blue data is something of lower resolution but with bathymetry, such as ETOPO1. The problem occurs because the NODATA (represented by the empty circle) can't interpolate into the sea - and we wouldn't want it to, as we have no data there. But the lower resolution dataset fills in with something that will probably be a shallower gradient, given the wider spacing between data points. This leads to a "halo" around the land, and it's particularly bad near cliffs. It gets worse still when there's an offset mis-match between the data sets and the coastline is at different positions.
There are many ways to fix this. Some random thoughts:
There will be other ways that I've missed. Thoughts?
The logger path at the moment is relative to $CWD, but it should be relative to the location of the config file, as this means fewer surprises when running Joerd from arbitrary locations.
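The resolution step is small; a sketch (function name is illustrative):

```python
import os

def resolve_logger_path(config_path, logger_path):
    """Interpret a relative logger path relative to the config file's
    directory rather than $CWD; absolute paths pass through unchanged."""
    if os.path.isabs(logger_path):
        return logger_path
    config_dir = os.path.dirname(os.path.abspath(config_path))
    return os.path.normpath(os.path.join(config_dir, logger_path))
```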
From the Tangram team: a request to add normal map files for hill shading. This would be more efficient to compute once on the server than re-computing over and over in the client.
32-bit PNG, 3 channels of normal map info, 1 channel of 8-bit quantized elevation (not for showing actual height, but a rough indicator of where the pixel falls relative to sea level, for styling).
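Deriving the three normal channels from a heightmap is cheap on the server; a sketch using central differences (pure Python, names illustrative; a real implementation would vectorize this and account for ground spacing per latitude):

```python
import math

def normal_at(height, x, y, spacing=1.0):
    """Surface normal from a heightmap (list of rows) via central
    differences; returns a unit (nx, ny, nz) vector."""
    h, w = len(height), len(height[0])
    dzdx = (height[y][min(x + 1, w - 1)] - height[y][max(x - 1, 0)]) / (2 * spacing)
    dzdy = (height[min(y + 1, h - 1)][x] - height[max(y - 1, 0)][x]) / (2 * spacing)
    nx, ny, nz = -dzdx, -dzdy, 1.0
    length = math.sqrt(nx * nx + ny * ny + nz * nz)
    return (nx / length, ny / length, nz / length)

def pack_normal(n):
    """Quantize a unit normal into three 8-bit channels ([-1,1] -> [0,255])."""
    return tuple(int(round((c + 1.0) * 127.5)) for c in n)
```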
Larger machines will have spare RAM (looks like at least 4GB spare on the dev cluster machines). Using that for cache can only help.
Using the Lanczos filter everywhere not only takes a lot of time, but also leads to artefacts at or near the edge of datasets at lower zooms. Instead, the filter function to use should be a function of the scale. To do this, we need to pass in the scale of the current tile (min scale, x & y?) to the function which returns the filter.
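A sketch of what scale-dependent filter selection could look like; the thresholds and the `max_native_zoom` parameter are assumptions for illustration, not measured values:

```python
def filter_for_scale(zoom, max_native_zoom=15):
    """Pick a resampling filter as a function of scale: Lanczos only
    near the source's native resolution, cheaper filters with smaller
    kernels (hence fewer edge artefacts) when heavily downsampling."""
    undersample = max_native_zoom - zoom
    if undersample <= 0:
        return 'lanczos'
    elif undersample <= 4:
        return 'cubic'
    else:
        return 'average'
```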
Instead of following the standard, well-understood and expected scheme used by almost all tiled mercator sources in existence, our TIFs should be tiles at zoom N rather than retina tiles at N-1.

From @glw:
A list was put together on Wikipedia here:
https://en.wikipedia.org/wiki/National_Lidar_Dataset_%28United_States%29
If you do do an update, can you start with Illinois? :)
https://clearinghouse.isgs.illinois.edu/data/elevation/illinois-height-modernization-ilhmp-lidar-data
At the moment, an out-of-space error is fatal - an exception will be thrown, and nothing will be generated.
Instead, after downloading each file, the download code should ensure there's enough space to move the file into place by deleting files which aren't needed any more. The main work here is to figure out which files aren't needed any more (or, conversely, which files are needed), but we already know some of this as we build the download list before running.
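A sketch of the "free space before moving into place" step, assuming we can compute a `deletable` list of files no longer needed by any pending job (that computation is the main work mentioned above):

```python
import os
import shutil

def ensure_space(target_dir, needed_bytes, deletable):
    """Free enough space in target_dir to move a downloaded file into
    place, deleting no-longer-needed files (oldest first). Returns
    whether enough space is now available."""
    for path in sorted(deletable, key=os.path.getmtime):
        if shutil.disk_usage(target_dir).free >= needed_bytes:
            return True
        os.remove(path)
    return shutil.disk_usage(target_dir).free >= needed_bytes
```

If this still returns False, the out-of-space condition is genuine and failing loudly remains the right behaviour.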
For example, NED_FTP_SERVER and NED_BASE_PATH in the NED source.
This allows them to be reconfigured if stuff changes (or we discover a mirror!), and also should allow us to start adding tests for these sorts of things.
At the moment 1 job = 1 message in the queue. This leads to situations where jobs can download a lot of data just to do a single output tile, and then download a completely different set of data for the next job. This makes caching less effective and reduces throughput.
When render jobs are added to the queue, the enqueueing process knows which source files they depend upon, so jobs can be grouped together by the set of dependencies they have and batched into arrays of jobs in each message. SQS has a limit on the size of each message, which will curtail the benefits of grouping everything together, but also allow a certain degree of granularity.
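A sketch of the grouping-and-batching step, assuming each job is a dict recording its dependencies in a 'sources' list; 256 KiB is SQS's documented maximum message size:

```python
import json
from collections import defaultdict

MAX_MESSAGE_BYTES = 256 * 1024  # SQS per-message size limit

def batch_jobs(jobs):
    """Group render jobs by the set of source files they depend on,
    then split each group into messages under the SQS size limit."""
    groups = defaultdict(list)
    for job in jobs:
        groups[frozenset(job['sources'])].append(job)
    messages = []
    for group in groups.values():
        batch = []
        for job in group:
            candidate = batch + [job]
            # flush the batch when adding this job would exceed the limit
            if batch and len(json.dumps(candidate)) > MAX_MESSAGE_BYTES:
                messages.append(batch)
                batch = [job]
            else:
                batch = candidate
        if batch:
            messages.append(batch)
    return messages
```

Every job in a message then shares one set of downloads, which is exactly what makes the cache effective.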
Geoscience Australia provides a 5 m Digital Elevation Model (DEM) of Australia derived from LiDAR. It's licensed under CC-BY 4.0, and the dataset covers 245,000 km² of "Australia's populated coastal zone; floodplain surveys within the Murray Darling Basin, and individual surveys of major and minor population centres".
http://www.cencoos.org/data/parameters/bathymetry
ftp://coast.noaa.gov/pub/DigitalCoast/lidar1_z/geoid12a/data/2612/
This project merged recently collected topographic, bathymetric, and acoustic elevation data along the entire California coastline, from approximately the 10 m elevation contour out to California's 3 mile state waters boundary.
Topographic lidar data came from the 2009-2011 CA Coastal Conservancy Lidar Project (Oct 2009 - Aug 2011); coverage extends landward 500 m from the shoreline, along the entire California coastline.
Bathymetric lidar data came from the 2009-2010 U.S. Army Corps of Engineers JALBTCX lidar collected for the California Coastal Mapping Project.
The multibeam acoustic data were high resolution CSMP survey data combined with NGDC data.
Data sources such as NED are far too detailed to use at low zooms. Further, having a "whole USA" tile would require downloading all of NED, which is a non-trivial amount of data, most of which would be averaged away.
Each source should have a "valid zoom range" (or scale might be more appropriate), against which requests are intersected before downloading.
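The intersection itself is a one-liner per source; a sketch (the zoom ranges here are illustrative, not measured for any real source):

```python
def intersect_zoom_range(source_range, request_range):
    """Intersect a source's valid (min, max) zoom range with the
    requested range; returns None when the source shouldn't be used,
    so its files are never downloaded for that request."""
    lo = max(source_range[0], request_range[0])
    hi = min(source_range[1], request_range[1])
    return (lo, hi) if lo <= hi else None
```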
This N38.0 x W122.5 tile (ftp://rockyftp.cr.usgs.gov/vdelivery/Datasets/Staged/NED/19/IMG/ned19_n38x00_w122x50_ca_sanfrancisco_topobathy_2010.zip) fails to download. I don't think this is a code problem in Joerd, as it also failed to download with wget. Either it's a temporary server failure, or something weirder is going on. Needs investigation.
We're using the Python bindings to GDAL, so it's possible to build the VRT directly in Python without needing to shell out to gdalbuildvrt. The question is how much logic in gdalbuildvrt we would need to re-implement.
I'm currently using Valhalla as a fallback for working out the ground height below buildings in Polygon City (another Mapzen project) and having more accurate height data in urban environments would be very, very useful for us (as it would with my own project, ViziCities) – SRTM just doesn't cut it for the accuracy we need, both horizontally and vertically.
Let me know if this isn't helpful here – I saw that you wanted suggestions for open elevation datasets to use and assumed this was the best place. I have plenty more to suggest if it's helpful.
The Environment Agency in the UK recently released the first tranche of their LIDAR elevation data for around 72% of the UK – including DSM and DTM ASCII files at horizontal resolutions of 25cm, 50cm, 1m, and 2m. This data has been released under an Open Government License, which is compatible with the CC-BY 4.0 license.
There's a lot of data and currently the only way to download it is manually through a portal. The data is also more verbose than it needs to be due to unnecessary decimal accuracy and the EA are looking to reduce this overhead (and the file-size) in the near future.
There are 2 potential approaches to automate the download of the data:
NED is a dataset at various scales in different parts of the USA - some at 1/9th arc second, some at 1/3rd arc second, and some at 1 arc second. We should pull in this data and apply it such that we get the best resolution in the areas which have it.
In the terrarium output, in addition to supporting GeoTIFFs, we should also output 16-bit greyscale PNGs. These aren't widely supported, but should be readable with pngtoy. Note that PNGs only support unsigned channels, so we'll need to add an offset (32,768?) to handle the bathymetry depths.
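The offset encoding could be as simple as the following (the 32,768 offset is the suggestion above, chosen so the deepest bathymetry still fits in an unsigned 16-bit sample):

```python
def encode_elevation(meters, offset=32768):
    """Encode a (possibly negative, i.e. bathymetric) elevation into an
    unsigned 16-bit PNG sample by adding an offset, since PNG channels
    are unsigned. Clamps to the representable range."""
    v = int(round(meters)) + offset
    return max(0, min(65535, v))

def decode_elevation(sample, offset=32768):
    """Invert the encoding on the client side."""
    return sample - offset
```

With a 32,768 offset the range is -32,768 m to +32,767 m, comfortably covering both the deepest ocean trenches and the highest peaks at 1 m resolution.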
@lossyrob did some benchmarking of the different compression options available to GDAL here: http://openterrain.tumblr.com/post/117553678231/gdal-compression-options-against-ned-data
For floating point data, the conclusion was that DEFLATE (with the floating point predictor) produces the smallest files at only a small time premium.
Context is this comment from @rmarianski.
Possible issues: we sometimes need to get 2 files out of the zip - is there a callback per file that we can use? Also, (IIRC - this might be bad information) zip files have the directory block at the end of the file - does this mean we'd need to buffer it all in RAM? If so, how large is the largest file we're likely to download?
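On the directory-at-the-end question: yes, the central directory sits at the end of a zip, so the archive must be seekable - buffering in memory (or spooling to disk for large files) works. A sketch of pulling two members out of a downloaded zip (member names and the `dest_open` callback are illustrative):

```python
import io
import zipfile

def extract_members(zip_bytes, names, dest_open):
    """Extract several members from a downloaded zip held in memory.
    dest_open(name) returns a writable object for each member."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        for name in names:
            with zf.open(name) as src:
                dest_open(name).write(src.read())
```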
Team Tangram has requested raw 16-bit elevations, as a 2-channel PNG (two channels of 8 bits each).
/cc @bcamper @blair1618
Joerd pulls in the 1/9th arc second NED, but this doesn't have full coverage of the USA (yet?). The 1/3rd arc second has some issues around the coastline, but is probably still better than SRTM.
At the moment, each job will download the data it needs individually. This is great for simplicity, but at smaller job sizes it means duplicating a lot of downloads. Instead, it would be better to download (or mirror) the set of source files to S3 in one batch step, then process them from/to S3 in a second batch step. This would avoid stress on the source servers as well as providing higher bandwidth access and lower latency job processing from within EC2.
Clone dev S3 bucket so we have a stable set of tiles isolated from dev for California demo.
<Error>
<Code>AccessDenied</Code>
<Message>Access Denied</Message>
<RequestId>30A9CFAE12A5F5FE</RequestId>
<HostId>
XHlEHd5bfdzTKIg50O1jeS1skJ2hZlt5mhLlqus3ltuCse7dAEm39icGGl//eZoMJnEBKn9/vug=
</HostId>
</Error>
The original Skadi scripts compress the HGT files with gzip, and Joerd should do the same.
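A minimal sketch of the compression step using Python's stdlib gzip module:

```python
import gzip
import shutil

def gzip_file(src_path, dst_path=None):
    """Compress an HGT file with gzip, as the original Skadi scripts do."""
    dst_path = dst_path or src_path + '.gz'
    with open(src_path, 'rb') as src, gzip.open(dst_path, 'wb') as dst:
        shutil.copyfileobj(src, dst)
    return dst_path
```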
GeoTIFFs are currently Int16, which means that the limit on vertical resolution is 1 m. Higher resolution datasets (i.e. LiDAR) effectively lose resolution when converted to Int16. (NED is distributed as Float32, so this may already be occurring.)
That's tile terrarium/7/33/44. The effect is visible at zooms 6 and 7, but fixed at 8, which suggests a problem with one of the data sources overlaid at zoom 8 and above. Possibly a piece of data is being included which shouldn't be (a bbox parsing error?), or possibly it's an error in the upstream data.
Needs investigation.
The scaling of heights needs to be refined.
Because the tile spans so much north-south, some cos(lat) per cell might be useful?
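For reference, the per-row ground resolution with the cos(lat) factor, using the standard web-mercator ground-resolution formula:

```python
import math

def meters_per_pixel(zoom, lat_deg, tile_size=256):
    """Ground resolution of a web-mercator pixel at a given latitude.
    The cos(lat) factor is why a single scale for a tile spanning a
    large north-south extent misrepresents heights at its edges."""
    earth_circumference = 40075016.686  # metres at the equator
    return (earth_circumference * math.cos(math.radians(lat_deg))
            / (tile_size * 2 ** zoom))
```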
Files at rest in the store can/should be compressed. At the moment, the SRTM and NED files are large and not compressed, which means they can be slow to download. Another option, and one which would be automatically supported by GDAL, would be converting these files from a non-compressed format such as HGT or HFA/IMG to one which supports built-in compression such as GeoTIFF.
Currently "Area":
$ gdalinfo 0.tif
Driver: GTiff/GeoTIFF
Files: 0.tif
Size is 512, 512
Coordinate System is:
PROJCS["WGS 84 / Pseudo-Mercator",
GEOGCS["WGS 84",
DATUM["WGS_1984",
SPHEROID["WGS 84",6378137,298.257223563,
AUTHORITY["EPSG","7030"]],
AUTHORITY["EPSG","6326"]],
PRIMEM["Greenwich",0],
UNIT["degree",0.0174532925199433],
AUTHORITY["EPSG","4326"]],
PROJECTION["Mercator_1SP"],
PARAMETER["central_meridian",0],
PARAMETER["scale_factor",1],
PARAMETER["false_easting",0],
PARAMETER["false_northing",0],
UNIT["metre",1,
AUTHORITY["EPSG","9001"]],
EXTENSION["PROJ4","+proj=merc +a=6378137 +b=6378137 +lat_ts=0.0 +lon_0=0.0 +x_0=0.0 +y_0=0 +k=1.0 +units=m +nadgrids=@null +wktext +no_defs"],
AUTHORITY["EPSG","3857"]]
Origin = (-20037508.339999999850988,20037508.339999999850988)
Pixel Size = (78271.516953124999418,-78271.516953124999418)
Metadata:
AREA_OR_POINT=Area
Image Structure Metadata:
COMPRESSION=LZW
INTERLEAVE=BAND
Corner Coordinates:
Upper Left (-20037508.340,20037508.340) (180d 0' 0.00"W, 85d 3' 4.06"N)
Lower Left (-20037508.340,-20037508.340) (180d 0' 0.00"W, 85d 3' 4.06"S)
Upper Right (20037508.340,20037508.340) (180d 0' 0.00"E, 85d 3' 4.06"N)
Lower Right (20037508.340,-20037508.340) (180d 0' 0.00"E, 85d 3' 4.06"S)
Center ( 0.0000000, 0.0000000) ( 0d 0' 0.01"E, 0d 0' 0.01"N)
Band 1 Block=256x256 Type=Int16, ColorInterp=Gray
NoData Value=-32768
(This is purely a metadata thing; it doesn't require changes to the underlying data.)
The dataset is available here:
http://data.kartverket.no/download/content/digital-terrengmodell-10-m-utm-33
Each job sent to Joerd already has a spatial extent and zoom extent.
We can send one job to the queue for the low zooms, then enqueue a separate job for each zoom 9 tile that isn't empty.
Based on Natural Earth land + minor islands.
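The zoom 9 fan-out could be sketched as follows (pure Python, standard web-mercator tile math; checking each candidate tile for emptiness against the land polygons would be a separate step):

```python
import math

def tiles_for_bbox(w, s, e, n, zoom=9):
    """Web-mercator tiles at `zoom` intersecting a lat/lon bbox; each
    non-empty one would become one enqueued high-zoom job."""
    def tile_x(lon):
        return int((lon + 180.0) / 360.0 * 2 ** zoom)
    def tile_y(lat):
        r = math.radians(lat)
        return int((1 - math.log(math.tan(r) + 1 / math.cos(r)) / math.pi)
                   / 2 * 2 ** zoom)
    return [(zoom, x, y)
            for x in range(tile_x(w), tile_x(e) + 1)
            for y in range(tile_y(n), tile_y(s) + 1)]
```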
When generating derived datasets, I'd like to be able to restrict the areas being processed to those where data is available.
In GDAL-land, I would run gdaltindex against each of the source files to generate a Shapefile (or whatever) with features covering the extent of each source file. This collection of features can then be run through something (e.g. mercantile) to generate a list of tile candidates for processing (rather than running over the whole world).
As a work-around, I'm going to try to use one of the Natural Earth coastline shapes as the input to mercantile to restrict the input dataset.
From @nvkelso:
Canadian Digital Elevation Model Mosaic (CDEM)
The Canadian Digital Elevation Model (CDEM) is part of Natural Resources Canada's altimetry system designed to better meet the users' needs for elevation data and products. The CDEM stems from the existing Canadian Digital Elevation Data (CDED). In these data, elevations can be either ground or reflective surface elevations. A CDEM mosaic can be obtained for a pre-defined or user-defined extent. The coverage and resolution of a mosaic varies according to latitude and to the extent of the requested area. Derived products such as slope, shaded relief and colour shaded relief maps can also be generated on demand.
Additional Information
Date published November 6, 2012
Series Canadian Digital Elevation Model Mosaic (CDEM)
Issue 3.0
Purpose The CDEM plays the same role as contours and relief shading on conventional paper maps. The CDEM serves as key primary data in a range of applications critical to achieving sustainable development. These applications include environmental and ecological impact assessments, water flow and water quality analysis, climate change studies, forest regeneration planning and wildlife habitats. In addition, the CDEM can be used in the generation of three-dimensional graphics displaying terrain slope, profiles and lines of sight. Non-graphic applications include geoid calculations, terrain modelling, flood simulations and telecommunication studies.
Author Government of Canada, Natural Resources Canada, Earth Sciences Sector, Mapping Information Branch, GeoAccess Division
Language This product is available in English and French
Product type Elevation 5
Migrated from: https://github.com/valhalla/skadi/issues/41.
http://www.flake.igb-berlin.de/ep-data.shtml
A binary file, GlobalLakeDepth.dat, is a global gridded lake-depth data file with a spatial resolution (pixel size) of 30 arc seconds (ca. 1 km at the equator). A raster map of the ECOCLIMAP2 data set is used to locate lakes on the globe, i.e. to discriminate between lake pixels and non-lake pixels. The lake depth is given in decimetres.
(Ported from https://github.com/valhalla/skadi/issues/27.)
Open access data available here: http://maps.zh.ch/download/hoehen/2014/dtm/tif/
SRTM has artifacts near the coast - blocks of data which are neither zero nor NODATA, but aren't valid data nevertheless. It seems like we need to bring in another source of data to clip against - an accurate coastline, preferably OSM or NE.