
joerd's People

Contributors

iandees, kevinkreiser, louh, meetar, migurski, nvkelso, rmarianski, rmglennon, zerebubuth


joerd's Issues

Expand bounding box to include all regions.

At the moment, we download the files which intersect the region(s) of interest, and we generate the tiles which intersect the region(s) of interest. However, since a tile may overhang the edge of a region, the files we download should instead encompass every tile which might be generated.

Loop over the tiles first, and union their bounding boxes to provide the regions input for the download stage.
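
A minimal sketch of that ordering, assuming each tile object can report its lat/lng bounds as a (left, bottom, right, top) tuple; the helper names are illustrative, not Joerd's actual API:

def union_bboxes(bboxes):
    # Union a list of (left, bottom, right, top) boxes into one box.
    lefts, bottoms, rights, tops = zip(*bboxes)
    return (min(lefts), min(bottoms), max(rights), max(tops))

def download_region(tiles):
    # The download stage should cover every tile we will generate, not
    # just the original region(s) of interest.
    return union_bboxes([t.latlon_bbox() for t in tiles])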

Add NED 1m data, where available.

From the USGS NED site:

1-meter – This dataset was introduced in 2015 with limited coverage of the U.S.,
but will be expanding as new DEMs from 3DEP quality level 2 or better lidar data
are acquired. Horizontal coordinates are referenced to the Universal Transverse
Mercator projection.

Note the different projection.

Determine bounding boxes from data

The current design assumes that bounding boxes for each raster tile can be determined without downloading the tile - usually from coordinates embedded in the tile name. This is mostly useful because there are many very large datasets (e.g: SRTM, NED) and it would be onerous to have to download all of them if you only wanted to render a small part of the world.

However, this assumption means that it's more difficult to write data sources which don't adhere to this pattern (e.g: Great Lakes).

A possible fix is to introduce a new phase which enumerates / indexes all the tiles within a source; this could involve downloading the source itself, or a metadata file, to provide the bounds.
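
A sketch of what that index phase might record per file once it (or a small metadata file describing it) has been fetched: read the georeferencing with GDAL instead of parsing coordinates out of the file name. The function name is illustrative.

from osgeo import gdal

def raster_bbox(path):
    # Bounding box straight from the raster's geotransform, rather than
    # inferred from the tile name.
    ds = gdal.Open(path)
    ulx, xres, _, uly, _, yres = ds.GetGeoTransform()
    lrx = ulx + ds.RasterXSize * xres
    lry = uly + ds.RasterYSize * yres
    return (min(ulx, lrx), min(uly, lry), max(ulx, lrx), max(uly, lry))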

Clusterise

To run this across multiple processes we will need:

  • A long-running server process,
  • Which listens for jobs on a queue (e.g: SQS),
  • Which stores tiles on a remote filesystem (e.g: S3).

Upon receiving a job from the queue (which should be a large section of tiles, to amortize download costs), the server needs to run the download, buildvrt and generate jobs with the region configured as per the incoming request.
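
A rough sketch of that server process using boto3 and SQS long polling; the queue URL is a placeholder, and download / buildvrt / generate stand in for Joerd's existing job steps:

import json
import boto3

QUEUE_URL = 'https://sqs.us-east-1.amazonaws.com/123456789012/joerd-jobs'  # placeholder
sqs = boto3.client('sqs')

def run_forever():
    while True:
        resp = sqs.receive_message(QueueUrl=QUEUE_URL,
                                   MaxNumberOfMessages=1,
                                   WaitTimeSeconds=20)
        for msg in resp.get('Messages', []):
            job = json.loads(msg['Body'])   # e.g. {"region": ..., "zooms": ...}
            # Run the existing pipeline with the region from the message;
            # generate() is assumed to upload its outputs to S3.
            download(job['region'])
            buildvrt(job['region'])
            generate(job['region'])
            sqs.delete_message(QueueUrl=QUEUE_URL,
                               ReceiptHandle=msg['ReceiptHandle'])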

Make downloads more robust

An issue that seems to keep happening with GMTED data is that the server will drop the connection mid-file and subsequent requests will get an "empty response" for some period of time. This appears to be some sort of TCP-level rate-limiting, but is currently undetectable to us until we try to build the VRT.

What should happen:

  1. Use a more reliable download library (e.g: urlgrabber?)
  2. Integrate retries with back-off
  3. Check download integrity before calling it done (e.g: run gdalinfo on it)
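
A sketch of points 2 and 3 together, assuming plain requests rather than urlgrabber, with gdal.Open standing in for "run gdalinfo on it":

import time
import requests
from osgeo import gdal

def download_with_retries(url, dest, attempts=5):
    for i in range(attempts):
        try:
            with requests.get(url, stream=True, timeout=60) as r:
                r.raise_for_status()
                with open(dest, 'wb') as fh:
                    for chunk in r.iter_content(chunk_size=1 << 20):
                        fh.write(chunk)
            # Integrity check: GDAL must at least be able to open the file.
            if gdal.Open(dest) is not None:
                return True
        except requests.RequestException:
            pass
        time.sleep(2 ** i)  # exponential back-off before the next attempt
    return False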

NED has overlapping parts

NED contains many files which overlap each other, some even in the same year. For example, overlapping chunks near SF:

  • ned19_n38x00_w122x50_ca_sanfrancisco_topobathy_2010
  • ned19_n38x00_w122x50_ca_sanfrancisocoast_2010
  • ned19_n38x00_w122x50_ca_goldengate_2010
  • ned19_n38x00_w122x50_ca_alamedaco_2007
  • ned19_n38x00_w122x50_ca_contracostaco_2008

All of the above have the same bounding box, although the first one is the only one which covers the whole bounding box without large regions of NODATA. It would be good to have some way of detecting this, so that the other files aren't downloaded and don't have the opportunity to make stitching the data back up any harder than it has to be.
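
One way of detecting this might be to measure how much of each candidate file is NODATA over the shared bounding box and keep only the most complete one; a sketch with GDAL and numpy:

import numpy as np
from osgeo import gdal

def nodata_fraction(path):
    # Fraction of band 1 pixels that are NODATA.
    ds = gdal.Open(path)
    band = ds.GetRasterBand(1)
    nodata = band.GetNoDataValue()
    if nodata is None:
        return 0.0
    arr = band.ReadAsArray()
    return float(np.count_nonzero(arr == nodata)) / arr.size

def best_candidate(paths):
    # Prefer the file with the least NODATA; the others could be skipped.
    return min(paths, key=nodata_fraction)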

Coastal resampling

We have the following problem near the coast:

[figure: interpolation badness near the coast]

In this example, the red data is something like GMTED or SRTM, for which we have no bathymetry, and the blue data is something of lower resolution but with bathymetry, such as ETOPO1. The problem occurs because the NODATA (represented by the empty circle) can't interpolate into the sea - and we wouldn't want it to, as we have no data there. But the lower resolution dataset fills in with something that will probably be a shallower gradient, given the wider spacing between its data points. This leads to a "halo" around the land, and it's particularly bad near cliffs. It gets worse still when there's an offset mismatch between the data sets and the coastline sits at different positions in each.

There are many ways to fix this. Some random thoughts:

  1. Interpolate bathymetry and height data separately, and composite them. For example: first interpolate all the land sources with NODATA set to 0 and composite them together; then interpolate all the water sources with NODATA set to 0 and composite those together; then use a water/land mask to choose between the two (see the sketch after this list). This would require a high resolution coastline shapefile.
  2. Backfill at the native resolution. At the moment, we mask each data source with NODATA at its native resolution, then warp / interpolate it to the target projection and composite. The issue arises because we are interpolating a source with NODATA, so we could start by warping ETOPO1 to GMTED resolution, compositing there, then warping to SRTM resolution and compositing there, and so on. The downside is that all the re-warping is likely to blur / produce artefacts in the earlier data sources.
  3. Use a cleverer algorithm for interpolation. See "Improving quality of public domain digital elevation models through data fusion" by Karkee, Steward and Abd Aziz for one way of doing this. Another would be to interpolate over a locally-reconstructed TIN of the "best" data points which aren't NODATA. Either way, this approach would likely produce the best results, but be the most complex to implement.
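
A very rough sketch of the compositing step in option 1, assuming the land and water composites have already been warped onto the same grid and a land mask has been rasterised from a high resolution coastline; all names here are illustrative:

import numpy as np

def composite_land_water(land, water, land_mask):
    # land, water: elevation arrays interpolated separately, with NODATA
    # filled with 0 before interpolation.
    # land_mask: boolean array, True where the coastline says "land".
    return np.where(land_mask, land, water)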

There will be other ways that I've missed. Thoughts?

Logger path should be relative to config

The logger path at the moment is relative to $CWD, but it should be relative to the location of the config file, as this means fewer surprises when running Joerd from arbitrary locations.

Add normal maps to support hillshading

From the Tangram team: a request to add normal map files for hillshading. This would be more efficient to compute once on the server than to re-compute over and over in the client.

A 32-bit PNG: 3 channels of normal map data, plus 1 channel of 8-bit quantized elevation (not for showing actual height, but as a rough indicator of where the pixel falls relative to sea level, for styling).
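
A sketch of how those four channels could be packed, using simple finite differences over the heightmap for the normals; the elevation quantisation range below (-11,000 m to +9,000 m) is an assumption, not a spec:

import numpy as np

def normal_tile(height, x_spacing, y_spacing):
    # Surface normals from finite differences of the heightmap (metres).
    dz_dx = np.gradient(height, axis=1) / x_spacing
    dz_dy = np.gradient(height, axis=0) / y_spacing
    nx, ny, nz = -dz_dx, -dz_dy, np.ones_like(height)
    length = np.sqrt(nx * nx + ny * ny + nz * nz)
    # Pack unit normals into 0..255, where 128 means "no slope".
    r = ((nx / length) * 0.5 + 0.5) * 255
    g = ((ny / length) * 0.5 + 0.5) * 255
    b = ((nz / length) * 0.5 + 0.5) * 255
    # Fourth channel: coarse elevation indicator, not true height.
    a = np.clip((height + 11000.0) / 20000.0 * 255.0, 0, 255)
    return np.dstack([r, g, b, a]).astype(np.uint8)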

Add memcached cache.

Larger machines will have spare RAM (looks like at least 4GB spare on the dev cluster machines). Using that for cache can only help.

Fix Lanczos artefacts

Using the Lanczos filter everywhere not only takes a lot of time, but also leads to artefacts at or near the edge of datasets at lower zooms. Instead, the filter function to use should be a function of the scale. To do this, we need to pass in the scale of the current tile (min scale, x & y?) to the function which returns the filter.
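
Something along these lines is what's being suggested: choose the filter from the tile's scale (metres per pixel), with the actual cut-offs to be tuned. The thresholds below are placeholders; the names are GDAL resampling algorithms.

def filter_for_scale(metres_per_pixel):
    # Cheap filters at coarse scales (low zooms), where Lanczos both costs
    # the most and produces edge artefacts; Lanczos only at fine scales.
    if metres_per_pixel > 1000:
        return 'average'
    elif metres_per_pixel > 100:
        return 'cubic'
    else:
        return 'lanczos'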

TIFs should be 512x512

Instead of following the standard, well-understood and expected scheme used by almost all tiled mercator sources in existence, our TIFs should:

  • Be 512x512.
  • Have internal 256x256 blocks.
  • Be named as if they were "even" tiles at zoom N rather than retina tiles at N-1.

Handle out of disk space gracefully

At the moment, an out-of-space error is fatal - an exception will be thrown, and nothing will be generated.

Instead, after downloading each file, the download code should ensure there's enough space to move the file into place by deleting files which aren't needed any more. The main work here is to figure out which files aren't needed any more (or, conversely, which files are needed), but we already know some of this as we build the download list before running.
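
A sketch of that check, assuming Python 3's shutil.disk_usage and a caller-supplied list of files already known to be expendable:

import os
import shutil

def ensure_space(dest_dir, needed_bytes, expendable_files):
    # Delete files that are no longer needed until there is room to move
    # the downloaded file into place, instead of failing with an exception.
    for path in expendable_files:
        if shutil.disk_usage(dest_dir).free >= needed_bytes:
            return True
        try:
            os.remove(path)
        except OSError:
            pass
    return shutil.disk_usage(dest_dir).free >= needed_bytes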

Move hard-coded URLs to config file

For example, NED_FTP_SERVER and NED_BASE_PATH in the NED source.

This allows them to be reconfigured if stuff changes (or we discover a mirror!), and also should allow us to start adding tests for these sorts of things.

Group similar jobs together.

At the moment 1 job = 1 message in the queue. This leads to situations where jobs can download a lot of data just to do a single output tile, and then download a completely different set of data for the next job. This makes caching less effective and reduces throughput.

When render jobs are added to the queue, the enqueueing process knows which source files they depend upon, so jobs can be grouped together by the set of dependencies they have and batched into arrays of jobs in each message. SQS has a limit on the size of each message, which will curtail the benefits of grouping everything together, but also allow a certain degree of granularity.
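
A sketch of the grouping, assuming each render job can report the set of source files it depends on; the SQS message body limit (256 KB) is what caps each batch:

import json
from collections import defaultdict

SQS_MAX_BYTES = 256 * 1024  # SQS message size limit

def batch_jobs(jobs):
    # Group jobs by their set of source file dependencies, then split each
    # group into batches that stay under the SQS message size limit.
    groups = defaultdict(list)
    for job in jobs:
        groups[frozenset(job['sources'])].append(job)
    for group in groups.values():
        batch = []
        for job in group:
            if batch and len(json.dumps(batch + [job])) > SQS_MAX_BYTES:
                yield batch
                batch = []
            batch.append(job)
        if batch:
            yield batch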

High-Resolution California Bathymetry - Multibeam + LIDAR

http://www.cencoos.org/data/parameters/bathymetry
ftp://coast.noaa.gov/pub/DigitalCoast/lidar1_z/geoid12a/data/2612/

This project merged recently collected topographic, bathymetric, and acoustic elevation data along the entire California coastline, from approximately the 10 meter elevation contour out to California's 3-mile state waters boundary.

The topographic lidar data came from the 2009-2011 CA Coastal Conservancy Lidar Project (Oct 2009 - Aug 2011); coverage extends landward 500 m from the shoreline along the entire California coastline.
The bathymetric lidar data is the 2009-2010 U.S. Army Corps of Engineers JALBTCX lidar collected for the California Coastal Mapping Project.
The multibeam acoustic data are high resolution CSMP survey data combined with NGDC data.

Give each data source a valid zoom range

Data sources such as NED are far too detailed to use at low zooms. Further, having a "whole USA" tile would require downloading all of NED, which is a non-trivial amount of data, most of which would be averaged away.

Each source should have a "valid zoom range" (or scale might be more appropriate), against which requests are intersected before downloading.

NED file fails to download

This N38.0 x W122.5 tile (ftp://rockyftp.cr.usgs.gov/vdelivery/Datasets/Staged/NED/19/IMG/ned19_n38x00_w122x50_ca_sanfrancisco_topobathy_2010.zip) fails to download. I don't think this is a code problem in Joerd, as it also failed to download with wget. Either it's a temporary server failure, or something weirder is going on. Needs investigation.

Don't shell out to gdalbuildvrt

We're using the Python bindings to GDAL, so it should be possible to build the VRT directly in Python without shelling out to gdalbuildvrt. The question is how much of gdalbuildvrt's logic we would need to re-implement.
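
Newer GDAL Python bindings expose this directly as gdal.BuildVRT, so a first pass could be as small as the sketch below; whether it covers all the gdalbuildvrt behaviour we rely on (resolution choice, nodata handling, and so on) is the open question:

from osgeo import gdal

def build_vrt(vrt_path, source_paths, nodata=None):
    # Rough equivalent of shelling out to gdalbuildvrt for the simple case.
    options = gdal.BuildVRTOptions(resolution='highest',
                                   srcNodata=nodata,
                                   VRTNodata=nodata)
    ds = gdal.BuildVRT(vrt_path, source_paths, options=options)
    ds.FlushCache()  # make sure the .vrt file is written out
    return vrt_path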

Open lidar-based DSM and DTM dataset for ~72% of UK

I'm currently using Valhalla as a fallback for working out the ground height below buildings in Polygon City (another Mapzen project) and having more accurate height data in urban environments would be very, very useful for us (as it would with my own project, ViziCities) – SRTM just doesn't cut it for the accuracy we need, both horizontally and vertically.

Let me know if this isn't helpful here – I saw that you wanted suggestions for open elevation datasets to use and assumed this was the best place. I have plenty more to suggest if it's helpful.

The Environment Agency in the UK recently released the first tranche of their LIDAR elevation data for around 72% of the UK – including DSM and DTM ASCII files at horizontal resolutions of 25cm, 50cm, 1m, and 2m. This data has been released under an Open Government License, which is compatible with the CC-BY 4.0 license.

There's a lot of data and currently the only way to download it is manually through a portal. The data is also more verbose than it needs to be due to unnecessary decimal accuracy and the EA are looking to reduce this overhead (and the file-size) in the near future.

There are 2 potential approaches to automate the download of the data:

  • The download links are easy to automate (based on the Ordnance Survey National Grid) and so it would be possible to batch them all up and slowly work through them
  • I have contacts at the Environment Agency who may be able to help us get access to a bulk download or some other bulk export

Add NED data

NED is a dataset at various resolutions in different parts of the USA: 1/9th arc-second, 1/3rd arc-second, or 1 arc-second. We should pull in this data and apply it so that we get the best available resolution in the areas which have it.

Support PNG outputs.

In the terrarium output, in addition to supporting GeoTIFFs, we should also output 16-bit greyscale PNGs. These aren't widely supported, but should be readable with pngtoy. Note that PNGs only support unsigned channels, so we'll need to add an offset (32,768?) to handle the bathymetry depths.
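
A sketch of the offset-and-write step using GDAL's PNG driver, which only supports CreateCopy, so the data goes through an in-memory dataset first. The 32,768 offset is the one suggested above.

import numpy as np
from osgeo import gdal

OFFSET = 32768  # shift signed elevations into the unsigned 16-bit range

def write_png16(elevation, png_path):
    # elevation: 2D array of metres, with bathymetry as negative values.
    data = np.clip(elevation.astype(np.int32) + OFFSET, 0, 65535)
    data = data.astype(np.uint16)
    rows, cols = data.shape
    mem = gdal.GetDriverByName('MEM').Create('', cols, rows, 1,
                                             gdal.GDT_UInt16)
    mem.GetRasterBand(1).WriteArray(data)
    # The PNG driver can't Create() directly, only CreateCopy().
    gdal.GetDriverByName('PNG').CreateCopy(png_path, mem)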

Investigate whether we can stream zip files

Context is this comment from @rmarianski.

Possible issues: we sometimes need to get 2 files out of the zip - is there a callback per file that we can use? Also, (IIRC - this might be bad information) zip files have the directory block at the end of the file - does this mean we'd need to buffer it all in RAM? If so, how large is the largest file we're likely to download?

Use 1/3rd arcsecond NED as well.

Joerd pulls in the 1/9th arcsecond NED, but this doesn't have full coverage of the USA (yet?). The 1/3rd arcsecond has some issues around the coastline, but is probably still better than SRTM.

Batch downloading.

At the moment, each job will download the data it needs individually. This is great for simplicity, but at smaller job sizes it means duplicating a lot of downloads. Instead, it would be better to download (or mirror) the set of source files to S3 in one batch step, then process them from/to S3 in a second batch step. This would avoid stress on the source servers as well as providing higher bandwidth access and lower latency job processing from within EC2.

Missing prod tile at 8/41/98 in all formats

<Error>
<Code>AccessDenied</Code>
<Message>Access Denied</Message>
<RequestId>30A9CFAE12A5F5FE</RequestId>
<HostId>
XHlEHd5bfdzTKIg50O1jeS1skJ2hZlt5mhLlqus3ltuCse7dAEm39icGGl//eZoMJnEBKn9/vug=
</HostId>
</Error>

TIFF: Consider Float32 as the data type

GeoTIFFs are currently Int16, which means that the limit on vertical resolution is 1 m. Higher resolution datasets (e.g. lidar) effectively lose resolution when converted to Int16. (NED is distributed as Float32, so this may already be occurring.)

Glitch in Lake Superior

[screenshot: rendering glitch in Lake Superior]

That's tile terrarium/7/33/44; the effect is visible at zooms 6 and 7, but fixed at 8. This seems to suggest a problem with one of the data sources which is overlaid at zooms 8+. Possibly a piece of data is being included which shouldn't be (a bbox parsing error?), or possibly it's an error in the upstream data.

Needs investigation.

Compress source files at rest

Files at rest in the store can/should be compressed. At the moment, the SRTM and NED files are large and not compressed, which means they can be slow to download. Another option, and one which would be automatically supported by GDAL, would be converting these files from a non-compressed format such as HGT or HFA/IMG to one which supports built-in compression such as GeoTIFF.
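
For the conversion route, gdal.Translate (available in the newer GDAL Python bindings) can rewrite an HGT or HFA/IMG file as a compressed, tiled GeoTIFF in one call; DEFLATE is an arbitrary choice here:

from osgeo import gdal

def compress_to_geotiff(src_path, dst_path):
    # Rewrite an uncompressed source (HGT, HFA/IMG, ...) as a GeoTIFF with
    # built-in compression, which GDAL then reads transparently.
    gdal.Translate(dst_path, src_path, format='GTiff',
                   creationOptions=['COMPRESS=DEFLATE', 'TILED=YES',
                                    'PREDICTOR=2'])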

TIFF: Set AREA_OR_POINT=Point

Currently "Area":

$ gdalinfo 0.tif
Driver: GTiff/GeoTIFF
Files: 0.tif
Size is 512, 512
Coordinate System is:
PROJCS["WGS 84 / Pseudo-Mercator",
    GEOGCS["WGS 84",
        DATUM["WGS_1984",
            SPHEROID["WGS 84",6378137,298.257223563,
                AUTHORITY["EPSG","7030"]],
            AUTHORITY["EPSG","6326"]],
        PRIMEM["Greenwich",0],
        UNIT["degree",0.0174532925199433],
        AUTHORITY["EPSG","4326"]],
    PROJECTION["Mercator_1SP"],
    PARAMETER["central_meridian",0],
    PARAMETER["scale_factor",1],
    PARAMETER["false_easting",0],
    PARAMETER["false_northing",0],
    UNIT["metre",1,
        AUTHORITY["EPSG","9001"]],
    EXTENSION["PROJ4","+proj=merc +a=6378137 +b=6378137 +lat_ts=0.0 +lon_0=0.0 +x_0=0.0 +y_0=0 +k=1.0 +units=m +nadgrids=@null +wktext  +no_defs"],
    AUTHORITY["EPSG","3857"]]
Origin = (-20037508.339999999850988,20037508.339999999850988)
Pixel Size = (78271.516953124999418,-78271.516953124999418)
Metadata:
  AREA_OR_POINT=Area
Image Structure Metadata:
  COMPRESSION=LZW
  INTERLEAVE=BAND
Corner Coordinates:
Upper Left  (-20037508.340,20037508.340) (180d 0' 0.00"W, 85d 3' 4.06"N)
Lower Left  (-20037508.340,-20037508.340) (180d 0' 0.00"W, 85d 3' 4.06"S)
Upper Right (20037508.340,20037508.340) (180d 0' 0.00"E, 85d 3' 4.06"N)
Lower Right (20037508.340,-20037508.340) (180d 0' 0.00"E, 85d 3' 4.06"S)
Center      (   0.0000000,   0.0000000) (  0d 0' 0.01"E,  0d 0' 0.01"N)
Band 1 Block=256x256 Type=Int16, ColorInterp=Gray
  NoData Value=-32768

(This is purely a metadata thing; it doesn't require changes to the underlying data.)
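
Setting it should be a one-liner with the Python bindings, either when the tile is created or after the fact with the file opened in update mode; a sketch:

from osgeo import gdal

def mark_as_point(tif_path):
    # Flip the AREA_OR_POINT metadata item; pixel values are untouched.
    ds = gdal.Open(tif_path, gdal.GA_Update)
    ds.SetMetadataItem('AREA_OR_POINT', 'Point')
    ds.FlushCache()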

Variable max zooms

Each job sent to Joerd has spatial extent and zoom extent already.

We can send one job to the queue for the low zooms; then each zoom 9 tile that isn't empty needs to be specified and enqueued individually.

Whether a tile is empty can be decided based on Natural Earth land + minor islands.

Generate tile indexes (covering shapes) per dataset

When generating derived datasets, I'd like to be able to restrict the areas being processed to those where data is available.

In GDAL-land, I would run gdaltindex against each of the source files to generate a Shapefile (or whatever) with features covering the extent of each source file. This collection of features can then be run through something (e.g. mercantile) to generate a list of tile candidates for processing (rather than running over the whole world).
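
A sketch of that flow in Python, reading each source's bounds with GDAL and feeding them to mercantile to enumerate candidate tiles (skipping the intermediate Shapefile); this assumes the sources are in, or have been converted to, WGS84 lon/lat:

import mercantile
from osgeo import gdal

def candidate_tiles(source_paths, zoom):
    # Tiles which intersect at least one source file, rather than the
    # whole world.
    tiles = set()
    for path in source_paths:
        ds = gdal.Open(path)
        ulx, xres, _, uly, _, yres = ds.GetGeoTransform()
        lrx = ulx + ds.RasterXSize * xres
        lry = uly + ds.RasterYSize * yres
        west, east = min(ulx, lrx), max(ulx, lrx)
        south, north = min(uly, lry), max(uly, lry)
        tiles.update(mercantile.tiles(west, south, east, north, [zoom]))
    return tiles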

As a work-around, I'm going to try to use one of the Natural Earth coastline shapes as the input to mercantile to restrict the input dataset.

Add Canadian DEM data

From @nvkelso:

Canadian Digital Elevation Model Mosaic (CDEM)
The Canadian Digital Elevation Model (CDEM) is part of Natural Resources Canada's altimetry system designed to better meet the users' needs for elevation data and products. The CDEM stems from the existing Canadian Digital Elevation Data (CDED). In these data, elevations can be either ground or reflective surface elevations. A CDEM mosaic can be obtained for a pre-defined or user-defined extent. The coverage and resolution of a mosaic varies according to latitude and to the extent of the requested area. Derived products such as slope, shaded relief and colour shaded relief maps can also be generated on demand.

Additional Information
  • Date published: November 6, 2012
  • Series: Canadian Digital Elevation Model Mosaic (CDEM)
  • Issue: 3.0
  • Purpose: The CDEM plays the same role as contours and relief shading on conventional paper maps. The CDEM serves as key primary data in a range of applications critical to achieving sustainable development. These applications include environmental and ecological impact assessments, water flow and water quality analysis, climate change studies, forest regeneration planning and wildlife habitats. In addition, the CDEM can be used in the generation of three-dimensional graphics displaying terrain slope, profiles and lines of sight. Non-graphic applications include geoid calculations, terrain modelling, flood simulations and telecommunication studies.
  • Author: Government of Canada, Natural Resources Canada, Earth Sciences Sector, Mapping Information Branch, GeoAccess Division
  • Language: This product is available in English and French
  • Product type: Elevation

Migrated from: https://github.com/valhalla/skadi/issues/41.

Fix SRTM coastal blockiness issues

SRTM has artifacts near the coast: blocks of data which are neither zero nor NODATA, but which aren't valid data either. It seems like we need to bring in another source of data to clip against - an accurate coastline, preferably OSM or NE.
