lambgeo / docker-lambda
AWS Lambda friendly GDAL Docker images and AWS Lambda layer
License: MIT License
I didn't see an example for this, but it does work. Is there a reason there isn't an example for building a Lambda container image?
The example Dockerfile in https://github.com/lambgeo/docker-lambda#1-create-dockerfile works for this if you replace the zip commands with:
CMD [ "handler.handler" ]
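For reference, a minimal sketch of what such a container-image Dockerfile might look like. This is an assumption-laden illustration, not the repo's documented recipe: the base tag, the copied file names, and handler.py are all placeholders.

```dockerfile
# Illustrative sketch only: base tag, paths, and handler name are placeholders.
# Assumes the base image already provides the Lambda runtime entrypoint,
# which is what makes the bare CMD work (as the report above implies).
FROM ghcr.io/lambgeo/lambda-gdal:3.5-python3.9

# Install dependencies and copy the function code instead of zipping a layer
COPY requirements.txt handler.py ./
RUN pip install -r requirements.txt

# Replace the README's zip/package commands with the handler CMD
CMD [ "handler.handler" ]
```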
EMR Serverless has its own peculiar set of images that people are free (after EMR 6.9.x) to customize in various ways. I've cribbed the build steps found here to keep things as small as possible and to avoid depending on anaconda/mamba/etc. for building things in production. Perhaps it would be handy to expand the scope of the provided images to support serverless workflows? Perhaps there are other Amazon Linux image types to consider?
A bit of background on the interest here: GeoTrellis RasterSources backed by GDAL bindings now support 3.7.3, and it is increasingly looking like large geospatial workflows backed by Spark make sense to run on managed infrastructure. Cluster management is expensive, time-consuming, and hard to get right.
let's only care about GDAL
Hi, I love the idea of using GDAL as a layer for Lambdas; I have been doing some tests with it lately and it has been working fine. However, I recently started one of my projects locally from scratch and got this error when connecting my app backend (Django) to the database:
psql: symbol lookup error: psql: undefined symbol: PQsetErrorContextVisibility
I am able to access the database container and use psql without a problem; the issue seems to happen from the container that uses the lambgeo/lambda:gdalX.X-py3.7 image.
From what I have found about this error around the internet, it is caused by an outdated client library that does not provide the symbol PQsetErrorContextVisibility, as stated.
Any idea why this is happening? (And why it didn't happen before I switched to a new database?)
let's only maintain 2.4 (legacy) and 3.1
I am trying to use ogr2ogr in a Lambda layer, but it looks like the DXF driver is missing.
Is there any way to get it? I can't figure out how to build a Docker image with DXF vector support (so I could later make a layer package from it). https://gdal.org/drivers/vector/dxf.html#vector-dxf states it is built in, so I figured it should be supported by default.
Running ogrinfo --formats on lambgeo/lambda-gdal:3.3-al2 does not list it, though:
# ogrinfo --formats
Supported Formats:
netCDF -raster,multidimensional raster,vector- (rw+vs): Network Common Data Format
PDS4 -raster,vector- (rw+vs): NASA Planetary Data System 4
VICAR -raster,vector- (rw+v): MIPL VICAR file
JP2OpenJPEG -raster,vector- (rwv): JPEG-2000 driver based on OpenJPEG library
MBTiles -raster,vector- (rw+v): MBTiles
BAG -raster,multidimensional raster,vector- (rw+v): Bathymetry Attributed Grid
ESRI Shapefile -vector- (rw+v): ESRI Shapefile
MapInfo File -vector- (rw+v): MapInfo File
OGR_VRT -vector- (rov): VRT - Virtual Datasource
Memory -vector- (rw+): Memory
GML -vector- (rw+v): Geography Markup Language (GML)
KML -vector- (rw+v): Keyhole Markup Language (KML)
GeoJSON -vector- (rw+v): GeoJSON
GeoJSONSeq -vector- (rw+v): GeoJSON Sequence
ESRIJSON -vector- (rov): ESRIJSON
TopoJSON -vector- (rov): TopoJSON
GPKG -raster,vector- (rw+vs): GeoPackage
SQLite -vector- (rw+v): SQLite / Spatialite
PostgreSQL -vector- (rw+): PostgreSQL/PostGIS
FlatGeobuf -vector- (rw+v): FlatGeobuf
PGDUMP -vector- (w+v): PostgreSQL SQL dump
OGR_PDS -vector- (rov): Planetary Data Systems TABLE
MVT -vector- (rw+v): Mapbox Vector Tiles
Any info is appreciated, and thank you for lambgeo/docker-lambda!
My lambda is in the us-west-1 region & I'm using the layer arn:aws:lambda:us-west-1:524387336408:layer:gdal32-python38-geo:1
I have also created the 2 environment variables GDAL_DATA and PROJ_LIB
The only thing my lambda does is import GDAL via: from osgeo import gdal
When this lambda is executed, I get the below error
{
"errorMessage": "Unable to import module 'handler': No module named 'osgeo'",
"errorType": "Runtime.ImportModuleError",
"stackTrace": []
}
I have tried using Python 3.7 & 3.8 and still see the same error.
Am I missing something? Please help me. Thanks!
https://github.com/RemotePixel/amazonlinux
lambgeo/gdal:{version}
lambgeo/gdal:{version}-python3.6
lambgeo/gdal:{version}-python3.7
lambgeo/gdal:{version}-python3.8
ref #8
ref: OSGeo/gdal#3564
Not quite sure what I'm missing when trying to use the layer ARN arn:aws:lambda:ca-central-1:524387336408:layer:gdal35:3
on ca-central-1, based on this list.
I tried to import numpy and rasterio, which both failed as well. Finally, if I add the aws-data-wrangler layer, I am able to import numpy without error.
docker-lambda/scripts/deploy.py
Lines 39 to 40 in 6621f26
😭
Running os.system("ldd /opt/bin/gdalinfo") in AWS Lambda gives: 👇
linux-vdso.so.1 (0x00007ffe64732000)
libgdal.so => /opt/bin/../lib/libgdal.so (0x00007f66747eb000)
libcrypto.so.10 => /lib64/libcrypto.so.10 (0x00007f6674396000)
libgeos_c.so.1 => /opt/bin/../lib/libgeos_c.so.1 (0x00007f667415c000)
libwebp.so.7 => /opt/bin/../lib/libwebp.so.7 (0x00007f6673ef2000)
libexpat.so.1 => /opt/bin/../lib/libexpat.so.1 (0x00007f6673cc2000)
libopenjp2.so.7 => /opt/bin/../lib/libopenjp2.so.7 (0x00007f6673a6a000)
libnetcdf.so.18 => /opt/bin/../lib/libnetcdf.so.18 (0x00007f6673737000)
libhdf5.so.200 => /opt/bin/../lib/libhdf5.so.200 (0x00007f6673065000)
libmfhdf.so.0 => /opt/bin/../lib/libmfhdf.so.0 (0x00007f6672e3b000)
libdf.so.0 => /opt/bin/../lib/libdf.so.0 (0x00007f6672b8a000)
libjpeg.so.62 => /opt/bin/../lib/libjpeg.so.62 (0x00007f66728f6000)
libgeotiff.so.5 => /opt/bin/../lib/libgeotiff.so.5 (0x00007f66726c2000)
libpng16.so.16 => /opt/bin/../lib/libpng16.so.16 (0x00007f6672491000)
libpq.so.5 => /opt/bin/../lib/libpq.so.5 (0x00007f6672248000)
libzstd.so.1 => /opt/bin/../lib/libzstd.so.1 (0x00007f6671fc0000)
libproj.so.19 => /opt/bin/../lib/libproj.so.19 (0x00007f6671b01000)
libsqlite3.so.0 => /opt/bin/../lib/libsqlite3.so.0 (0x00007f66717f0000)
libtiff.so.5 => /opt/bin/../lib/libtiff.so.5 (0x00007f6671572000)
libdeflate.so.0 => /opt/bin/../lib/libdeflate.so.0 (0x00007f6671365000)
libz.so.1 => /lib64/libz.so.1 (0x00007f6671150000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f6670f32000)
librt.so.1 => /lib64/librt.so.1 (0x00007f6670d2a000)
libdl.so.2 => /lib64/libdl.so.2 (0x00007f6670b26000)
libpcre.so.1 => /lib64/libpcre.so.1 (0x00007f66708c2000)
libcurl.so.4 => not found
libxml2.so.2 => not found
libm.so.6 => /lib64/libm.so.6 (0x00007f6670582000)
libstdc++.so.6 => /lib64/libstdc++.so.6 (0x00007f6670200000)
libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007f666ffea000)
libc.so.6 => /lib64/libc.so.6 (0x00007f666fc3f000)
/lib64/ld-linux-x86-64.so.2 (0x00007f66753a2000)
libcurl.so.4 => not found
libxml2.so.2 => not found
libgeos-3.8.1.so => /opt/lib/libgeos-3.8.1.so (0x00007f666f874000)
libhdf5_hl.so.200 => /opt/lib/libhdf5_hl.so.200 (0x00007f666f652000)
libsz.so.2 => /opt/lib/libsz.so.2 (0x00007f666f43e000)
libcurl.so.4 => not found
libcurl.so.4 => not found
liblzma.so.5 => /var/lang/lib/liblzma.so.5 (0x00007f666f218000)
libssl.so.10 => /lib64/libssl.so.10 (0x00007f666efa9000)
libcurl.so.4 => not found
libgssapi_krb5.so.2 => /lib64/libgssapi_krb5.so.2 (0x00007f666ed5d000)
libkrb5.so.3 => /lib64/libkrb5.so.3 (0x00007f666ea79000)
libcom_err.so.2 => /lib64/libcom_err.so.2 (0x00007f666e875000)
libk5crypto.so.3 => /lib64/libk5crypto.so.3 (0x00007f666e644000)
libkrb5support.so.0 => /lib64/libkrb5support.so.0 (0x00007f666e435000)
libkeyutils.so.1 => /lib64/libkeyutils.so.1 (0x00007f666e231000)
libresolv.so.2 => /lib64/libresolv.so.2 (0x00007f666e01b000)
libselinux.so.1 => /lib64/libselinux.so.1 (0x00007f666ddf4000)
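The repeated "=> not found" lines above are the culprit: libcurl and libxml2 are unresolved. A small hypothetical helper, using only the standard library, can extract the unresolved libraries from captured ldd output inside the handler:

```python
import re

def missing_shared_libs(ldd_output: str) -> list[str]:
    """Return the sorted, de-duplicated names of shared libraries
    that ldd reported as unresolved ("=> not found")."""
    return sorted({m.group(1) for m in re.finditer(r"(\S+)\s*=>\s*not found", ldd_output)})

# Sample lines taken from the ldd output above
sample = (
    "libpq.so.5 => /opt/bin/../lib/libpq.so.5 (0x00007f6672248000)\n"
    "libcurl.so.4 => not found\n"
    "libxml2.so.2 => not found\n"
    "libcurl.so.4 => not found\n"
)
print(missing_shared_libs(sample))  # → ['libcurl.so.4', 'libxml2.so.2']
```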
The Dockerfile directive RUN yum install -y gcc gcc-c++
in https://github.com/lambgeo/docker-lambda/blob/master/dockerfiles/runtimes/python adds 500MB to the image but is unused at runtime.
A more efficient approach would be a multi-stage build that copies only the files necessary at runtime into the resulting image.
Additionally and/or alternatively, && yum clean all && rm -rf /var/cache/yum /var/lib/yum/history
can be appended to the command to reduce the size of the layer.
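A sketch of the multi-stage idea; the image tags and copy paths are illustrative, and the exact artifacts to copy depend on what the build stage actually produces:

```dockerfile
# Build stage: compilers are installed here and never reach the final image
FROM lambgeo/lambda-gdal:3.2-python3.8 AS builder
RUN yum install -y gcc gcc-c++ && yum clean all && rm -rf /var/cache/yum /var/lib/yum/history
# ... compile native dependencies / build wheels here ...

# Runtime stage: copy only what is needed at runtime (path is illustrative)
FROM lambgeo/lambda-gdal:3.2-python3.8
COPY --from=builder /opt/python /opt/python
```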
Hi, I am going around in circles with the error:
ERROR 1: PROJ: SQLite3 version is 3.7.17, whereas at least 3.11 should be used
Yet if I run sqlite directly, /opt/bin/sqlite3 -version reports:
3.33.0 2020-08-14 13:23:32 fca8dc8b578f215a969cd899336378966156154710873e68b3d9ac5881b0ff3f
I'm lost... any idea?
Hi,
I am just a beginner AWS user. I successfully added your GDAL ARN (west-2) to my Lambda function called geostats as a layer, but if I try to import the GDAL package, I get an error:
#import gdal
#from osgeo import gdal
What else do I have to do to make it work?
Not proficient in C++, but I'm running into this problem when trying to run ogr2ogr with Parquet output:
import subprocess
# run() waits for the process to finish; Popen(...).poll() right after launch may return None
return_code = subprocess.run(["ogr2ogr", "-f", "Parquet", "somedestination", "somelocation", "--debug", "ON"], stdout=subprocess.PIPE).returncode
This outputs:
b'GDAL 3.6.4, released 2023/04/17\n'
--
ERROR 1: Unable to find driver `Parquet'.
[ERROR] FileNotFoundError: somedestination
Traceback (most recent call last):
  File "/var/task/epsagon/wrappers/aws_lambda.py", line 137, in _lambda_wrapper
    result = func(*args, **kwargs)
  File "/var/task/application/v1/controller/console/test_gdal.py", line 26, in test
    df = gpd.read_parquet("somedestination")
  File "/mnt/efs/lib/geopandas/io/arrow.py", line 560, in _read_parquet
    table = parquet.read_table(path, columns=columns, filesystem=filesystem, **kwargs)
  File "/mnt/efs/lib/pyarrow/parquet/core.py", line 2926, in read_table
    dataset = _ParquetDatasetV2(
  File "/mnt/efs/lib/pyarrow/parquet/core.py", line 2477, in __init__
    self._dataset = ds.dataset(path_or_paths, filesystem=filesystem,
  File "/mnt/efs/lib/pyarrow/dataset.py", line 762, in dataset
    return _filesystem_dataset(source, **kwargs)
  File "/mnt/efs/lib/pyarrow/dataset.py", line 445, in _filesystem_dataset
    fs, paths_or_selector = _ensure_single_source(source, filesystem)
  File "/mnt/efs/lib/pyarrow/dataset.py", line 421, in _ensure_single_source
    raise FileNotFoundError(path)
(File paths have been replaced.)
So, diving into the CMake flags, I see this: https://github.com/OSGeo/gdal/blob/634f60a4181c9db067a64dbfdd9f2872e4992927/ogr/ogrsf_frmts/generic/ogrregisterall.cpp#L251
but I don't see anything specifically disabling it in the build. For anyone who can read C++: can you tell me whether outputting to Parquet is possible in the version built for this image?
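Short of reading the C++, one way to verify at runtime whether a given driver was compiled in is to capture the ogrinfo --formats listing and look for the driver's short name. A hypothetical helper (the sample listing is abbreviated, not the image's full output):

```python
def driver_supported(formats_output: str, short_name: str) -> bool:
    """Check an `ogrinfo --formats` listing for a driver short name.

    Driver lines in the listing start with the short name followed by a space,
    e.g. "GeoJSON -vector- (rw+v): GeoJSON".
    """
    return any(
        line.strip().startswith(short_name + " ")
        for line in formats_output.splitlines()
    )

# Abbreviated sample of an `ogrinfo --formats` listing
sample = """Supported Formats:
  GeoJSON -vector- (rw+v): GeoJSON
  GPKG -raster,vector- (rw+vs): GeoPackage
"""
print(driver_supported(sample, "Parquet"))  # → False
print(driver_supported(sample, "GPKG"))     # → True
```

Matching on the short name plus a trailing space avoids false positives from longer names (e.g. a check for "GeoJSON" does not match a "GeoJSONSeq" line).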
Using the latest lambgeo/lambda-gdal:3.2-python3.8:
gdalinfo COG.tif
Warning 1: COG.tif: COG.tif:JPEG compression support is not configured
ERROR 1: COG.tif: Cannot open TIFF file due to missing codec.
🤷
Now that GDAL 3.6 is out, would you mind releasing a layer for version 3.6? Thanks!
zip is not included in the ghcr.io/lambgeo/lambda-gdal:3.5-python3.9 image; I had to add:
RUN yum install -y zip
😬
LAMBDA_TASK_ROOT should always be defined in public.ecr.aws/lambda/* images as /var/task. I was wondering why the example lambda packager sets PACKAGE_PREFIX to /var/task instead of using LAMBDA_TASK_ROOT?
Not sure how to document this, but I managed to build a container with GeoPandas and Fiona based on this project. Thanks so much!
A couple of gotchas:
pyproj needs to be pointed at the bundled PROJ data directory (note the os import, missing from my original snippet):
import os
import pyproj
pyproj.datadir.set_data_dir(os.path.join(os.path.dirname(__file__), 'share/proj'))
RUN CFLAGS="-g0 -Wl,--strip-all -I/usr/include:/usr/local/include -L/usr/lib:/usr/local/lib" \
    pip install \
    --no-cache-dir \
    --compile \
    --global-option=build_ext \
    --global-option="-j 4" \
    --no-binary :all: \
    --target ${PACKAGE_PREFIX}/ \
    fiona pandas geopandas shapely pyproj

# Remove tests, docs, and examples
RUN cd $PREFIX && rm -rf **/tests/ **/_testing/ **/doc/ **/examples/
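The pyproj data-directory gotcha above boils down to computing a path relative to the handler module. A tiny hypothetical helper, assuming the package bundles PROJ data under share/proj next to the handler (the same path handed to pyproj.datadir.set_data_dir):

```python
import os

def proj_data_dir(module_file: str) -> str:
    """Return the bundled PROJ data directory next to a module file,
    e.g. the path set via pyproj.datadir.set_data_dir() at startup."""
    return os.path.join(os.path.dirname(os.path.abspath(module_file)), "share", "proj")

print(proj_data_dir("/var/task/handler.py"))  # → /var/task/share/proj
```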
Because GDAL 2.4 and 3.1 are not evolving (as expected), I've removed them from the CI automatic building/publishing. For now I think it's OK to leave those Docker images as they are and only update the 3.2 images.
Newer runtimes (nodejs10.x, nodejs12.x, python3.8, java11, ruby2.7) use the new Amazon Linux image 😭
While RemotePixel/amazonlinux#17 fixed most of it, there is still a huge (~100ms) difference between GDAL 2 and 3.
This was experienced by @kylebarron in developmentseed/cogeo-mosaic-tiler#3:
# GDAL 3.0
bash-4.2# time echo "0 0" | cs2cs +proj=longlat +datum=WGS84 +to +init=epsg:2163
9473741.42 1181205.06 0.00
real 0m0.132s
user 0m0.120s
sys 0m0.010s
# GDAL 2.4
bash-4.2# time echo "0 0" | cs2cs +proj=longlat +datum=WGS84 +to +init=epsg:2163
9473741.42 1181205.06 0.00
real 0m0.008s
user 0m0.000s
sys 0m0.000s
I am using the ARN for a Lambda layer (arn:aws:lambda:us-west-2:524387336408:layer:gdal32:3) and adding it to my Lambda function, which is written in Node.js. I am trying to use ogr2ogr in the function, but I'm getting the following error, most likely related to the GDAL layer:
ogr2ogr: error while loading shared libraries: libpcre.so.0: cannot open shared object file: No such file or directory\n
I tried to set GDAL_DATA and PROJ_LIB as environmental variables on my lambda function (I just went to Environment Variables on the UI of the lambda function and added the two as environmental variables), but I am still getting the same error. Am I doing something wrong?