docker-lambda's People

Contributors

github-actions[bot], hugozaggo, jasongi-ac, philvarner, vincentsarago

docker-lambda's Issues

Image publication for other linux 2 containers

EMR Serverless has its own peculiar set of images that people are free (after EMR 6.9.x) to customize in various ways. I've cribbed the build steps found here to keep things as small as possible and to avoid depending on anaconda/mamba/etc. for building things in production. Perhaps it would be handy to expand the scope of the provided images to support serverless workflows? Perhaps there are other Amazon Linux image types to consider?

A bit of background on the interest here: GeoTrellis RasterSources backed by GDAL bindings now support GDAL 3.7.3, and it increasingly looks like large geospatial workflows backed by Spark make sense to run on managed infrastructure. Cluster management is expensive, time-consuming, and hard to get right.

psql: undefined symbol: PQsetErrorContextVisibility

Hi, I love the idea of using GDAL as a layer for Lambdas. I have been running some tests with it lately and it has been working fine. However, I recently started one of my projects locally from scratch and got this error when connecting my app backend (Django) to the database:

psql: symbol lookup error: psql: undefined symbol: PQsetErrorContextVisibility

I am able to access the database container and use psql without a problem; the issue seems to happen from the container that uses the lambgeo/lambda:gdalX.X-py3.7 image.

From what I have found about this error online, it is caused by an unsupported version that does not provide the symbol PQsetErrorContextVisibility, as stated.

Any idea why this is happening? (And why it wasn't happening before I started using a new database?)
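
PQsetErrorContextVisibility is a libpq function, so one way to test the "unsupported version" theory is to ask the libpq that actually gets loaded whether it exports that symbol. The following is a minimal diagnostic sketch, not a fix: exports_symbol is a hypothetical helper, and the library path in the usage section is an assumption that should be replaced with whatever ldd reports for psql in the affected container.

```python
import ctypes
from typing import Optional


def exports_symbol(library: Optional[str], symbol: str) -> bool:
    """Return True if the shared library exports `symbol`.

    `library` is a path or soname such as "libpq.so.5"; None means
    "the current process", which is handy for a self-check.
    """
    try:
        handle = ctypes.CDLL(library)
    except OSError:
        # The library could not be loaded at all.
        return False
    # ctypes looks the symbol up on attribute access; a missing symbol
    # raises AttributeError, which hasattr() turns into False.
    return hasattr(handle, symbol)


if __name__ == "__main__":
    # Assumed location; adjust to the path that `ldd $(which psql)` shows.
    print(exports_symbol("/opt/lib/libpq.so.5", "PQsetErrorContextVisibility"))
```

If this prints False for the libpq that psql resolves to, but True for the system copy, the image's library directory is probably shadowing the newer libpq that psql was built against.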

Can we get DXF vector drivers?

I am trying to use ogr2ogr from a Lambda layer, but it looks like the DXF driver is missing.
Is there any way to get it? I can't figure out how to build a Docker image with DXF vector support (so I could later make a layer package of it). https://gdal.org/drivers/vector/dxf.html#vector-dxf states it is built in, so I figured it should be supported by default.

However, running ogrinfo --formats on lambgeo/lambda-gdal:3.3-al2 does not list it:

# ogrinfo --formats
Supported Formats:
  netCDF -raster,multidimensional raster,vector- (rw+vs): Network Common Data Format
  PDS4 -raster,vector- (rw+vs): NASA Planetary Data System 4
  VICAR -raster,vector- (rw+v): MIPL VICAR file
  JP2OpenJPEG -raster,vector- (rwv): JPEG-2000 driver based on OpenJPEG library
  MBTiles -raster,vector- (rw+v): MBTiles
  BAG -raster,multidimensional raster,vector- (rw+v): Bathymetry Attributed Grid
  ESRI Shapefile -vector- (rw+v): ESRI Shapefile
  MapInfo File -vector- (rw+v): MapInfo File
  OGR_VRT -vector- (rov): VRT - Virtual Datasource
  Memory -vector- (rw+): Memory
  GML -vector- (rw+v): Geography Markup Language (GML)
  KML -vector- (rw+v): Keyhole Markup Language (KML)
  GeoJSON -vector- (rw+v): GeoJSON
  GeoJSONSeq -vector- (rw+v): GeoJSON Sequence
  ESRIJSON -vector- (rov): ESRIJSON
  TopoJSON -vector- (rov): TopoJSON
  GPKG -raster,vector- (rw+vs): GeoPackage
  SQLite -vector- (rw+v): SQLite / Spatialite
  PostgreSQL -vector- (rw+): PostgreSQL/PostGIS
  FlatGeobuf -vector- (rw+v): FlatGeobuf
  PGDUMP -vector- (w+v): PostgreSQL SQL dump
  OGR_PDS -vector- (rov): Planetary Data Systems TABLE
  MVT -vector- (rw+v): Mapbox Vector Tiles
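
Scanning a listing like this by eye is error-prone, so here is a small sketch that parses the text printed by ogrinfo --formats and checks for a driver by its exact short name. parse_formats and the Driver tuple are hypothetical names for illustration, and the line format is inferred only from the output shown above.

```python
import re
from typing import Dict, NamedTuple


class Driver(NamedTuple):
    kinds: str         # e.g. "raster,vector"
    capabilities: str  # e.g. "rw+v"
    description: str


# Matches lines such as "  GPKG -raster,vector- (rw+vs): GeoPackage".
_LINE = re.compile(
    r"^\s*(?P<name>.+?) -(?P<kinds>[^-]+)- \((?P<caps>[^)]*)\): (?P<desc>.*)$"
)


def parse_formats(text: str) -> Dict[str, Driver]:
    """Map each driver short name to its kinds, capabilities and description."""
    drivers = {}
    for line in text.splitlines():
        m = _LINE.match(line)
        if m:  # the "Supported Formats:" header does not match and is skipped
            drivers[m["name"]] = Driver(m["kinds"], m["caps"], m["desc"])
    return drivers


SAMPLE = """Supported Formats:
  ESRI Shapefile -vector- (rw+v): ESRI Shapefile
  GeoJSON -vector- (rw+v): GeoJSON
  GeoJSONSeq -vector- (rw+v): GeoJSON Sequence
  GPKG -raster,vector- (rw+vs): GeoPackage
"""

drivers = parse_formats(SAMPLE)
print("DXF" in drivers)       # False
print(drivers["GPKG"].kinds)  # raster,vector
```

In a real check, the text would come from something like subprocess.run(["ogrinfo", "--formats"], capture_output=True, text=True).stdout instead of SAMPLE.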

Any info is appreciated, and thank you for lambgeo/docker-lambda!

Invoking lambda always throws "No module named 'osgeo' "

My Lambda is in the us-west-1 region and I'm using the layer arn:aws:lambda:us-west-1:524387336408:layer:gdal32-python38-geo:1.
I have also created the two environment variables GDAL_DATA and PROJ_LIB.
The only thing my Lambda does is import gdal with from osgeo import gdal.

When this Lambda is executed, I get the error below:
{
"errorMessage": "Unable to import module 'handler': No module named 'osgeo'",
"errorType": "Runtime.ImportModuleError",
"stackTrace": []
}

I have tried Python 3.7 and 3.8 and still see the same error.
Am I missing something? Please help me. Thanks
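
Lambda unpacks layers under /opt, and the Python runtimes add /opt/python to the import path, so an ImportModuleError like the one above generally means no osgeo package sits in a directory the runtime searches. Whether this particular layer ships the Python bindings is not stated here, so the sketch below only reports what the runtime can see. locate_package and the diagnostic handler are hypothetical names.

```python
import os
import sys
from typing import Iterable, List, Optional


def locate_package(name: str, search_paths: Optional[Iterable[str]] = None) -> List[str]:
    """Return every search-path directory that contains package or module `name`."""
    paths = sys.path if search_paths is None else search_paths
    hits = []
    for base in paths:
        if not base or not os.path.isdir(base):
            continue
        as_package = os.path.isdir(os.path.join(base, name))
        as_module = os.path.isfile(os.path.join(base, name + ".py"))
        if as_package or as_module:
            hits.append(base)
    return hits


def handler(event, context):
    # Hypothetical diagnostic handler: deploy temporarily, invoke, read the result.
    return {
        "osgeo_found_in": locate_package("osgeo"),
        "opt_contents": sorted(os.listdir("/opt")) if os.path.isdir("/opt") else [],
        "sys_path": list(sys.path),
    }
```

An empty osgeo_found_in list together with an /opt that holds only bin, lib and share would suggest the layer provides the GDAL binaries and libraries but not the Python package, which would then have to be bundled with the function.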

Trouble using lambda layers (python 3.9 / gdal 3.5)

Not quite sure what I'm missing when trying to use the layer ARN arn:aws:lambda:ca-central-1:524387336408:layer:gdal35:3 in ca-central-1, based on this list.

I tried to import numpy and rasterio which both failed as well. Finally, if I add the aws-data-wrangler layer, I am able to import numpy without error.

libcurl and libxml2 are missing in amazonlinux 2

😭
Running os.system("ldd /opt/bin/gdalinfo") in AWS Lambda gives: 👇

        linux-vdso.so.1 (0x00007ffe64732000)
	libgdal.so => /opt/bin/../lib/libgdal.so (0x00007f66747eb000)
	libcrypto.so.10 => /lib64/libcrypto.so.10 (0x00007f6674396000)
	libgeos_c.so.1 => /opt/bin/../lib/libgeos_c.so.1 (0x00007f667415c000)
	libwebp.so.7 => /opt/bin/../lib/libwebp.so.7 (0x00007f6673ef2000)
	libexpat.so.1 => /opt/bin/../lib/libexpat.so.1 (0x00007f6673cc2000)
	libopenjp2.so.7 => /opt/bin/../lib/libopenjp2.so.7 (0x00007f6673a6a000)
	libnetcdf.so.18 => /opt/bin/../lib/libnetcdf.so.18 (0x00007f6673737000)
	libhdf5.so.200 => /opt/bin/../lib/libhdf5.so.200 (0x00007f6673065000)
	libmfhdf.so.0 => /opt/bin/../lib/libmfhdf.so.0 (0x00007f6672e3b000)
	libdf.so.0 => /opt/bin/../lib/libdf.so.0 (0x00007f6672b8a000)
	libjpeg.so.62 => /opt/bin/../lib/libjpeg.so.62 (0x00007f66728f6000)
	libgeotiff.so.5 => /opt/bin/../lib/libgeotiff.so.5 (0x00007f66726c2000)
	libpng16.so.16 => /opt/bin/../lib/libpng16.so.16 (0x00007f6672491000)
	libpq.so.5 => /opt/bin/../lib/libpq.so.5 (0x00007f6672248000)
	libzstd.so.1 => /opt/bin/../lib/libzstd.so.1 (0x00007f6671fc0000)
	libproj.so.19 => /opt/bin/../lib/libproj.so.19 (0x00007f6671b01000)
	libsqlite3.so.0 => /opt/bin/../lib/libsqlite3.so.0 (0x00007f66717f0000)
	libtiff.so.5 => /opt/bin/../lib/libtiff.so.5 (0x00007f6671572000)
	libdeflate.so.0 => /opt/bin/../lib/libdeflate.so.0 (0x00007f6671365000)
	libz.so.1 => /lib64/libz.so.1 (0x00007f6671150000)
	libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f6670f32000)
	librt.so.1 => /lib64/librt.so.1 (0x00007f6670d2a000)
	libdl.so.2 => /lib64/libdl.so.2 (0x00007f6670b26000)
	libpcre.so.1 => /lib64/libpcre.so.1 (0x00007f66708c2000)
	libcurl.so.4 => not found
	libxml2.so.2 => not found
	libm.so.6 => /lib64/libm.so.6 (0x00007f6670582000)
	libstdc++.so.6 => /lib64/libstdc++.so.6 (0x00007f6670200000)
	libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007f666ffea000)
	libc.so.6 => /lib64/libc.so.6 (0x00007f666fc3f000)
	/lib64/ld-linux-x86-64.so.2 (0x00007f66753a2000)
	libcurl.so.4 => not found
	libxml2.so.2 => not found
	libgeos-3.8.1.so => /opt/lib/libgeos-3.8.1.so (0x00007f666f874000)
	libhdf5_hl.so.200 => /opt/lib/libhdf5_hl.so.200 (0x00007f666f652000)
	libsz.so.2 => /opt/lib/libsz.so.2 (0x00007f666f43e000)
	libcurl.so.4 => not found
	libcurl.so.4 => not found
	liblzma.so.5 => /var/lang/lib/liblzma.so.5 (0x00007f666f218000)
	libssl.so.10 => /lib64/libssl.so.10 (0x00007f666efa9000)
	libcurl.so.4 => not found
	libgssapi_krb5.so.2 => /lib64/libgssapi_krb5.so.2 (0x00007f666ed5d000)
	libkrb5.so.3 => /lib64/libkrb5.so.3 (0x00007f666ea79000)
	libcom_err.so.2 => /lib64/libcom_err.so.2 (0x00007f666e875000)
	libk5crypto.so.3 => /lib64/libk5crypto.so.3 (0x00007f666e644000)
	libkrb5support.so.0 => /lib64/libkrb5support.so.0 (0x00007f666e435000)
	libkeyutils.so.1 => /lib64/libkeyutils.so.1 (0x00007f666e231000)
	libresolv.so.2 => /lib64/libresolv.so.2 (0x00007f666e01b000)
	libselinux.so.1 => /lib64/libselinux.so.1 (0x00007f666ddf4000)
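
The unresolved entries are easy to lose in a listing this long. As a rough sketch (missing_libraries is a hypothetical helper, and the parsing assumes the "name => not found" layout shown above), the ldd text can be reduced to a de-duplicated list of missing sonames:

```python
from typing import List


def missing_libraries(ldd_output: str) -> List[str]:
    """Return the sonames that ldd reports as 'not found', de-duplicated, in order."""
    missing = []
    for line in ldd_output.splitlines():
        name, sep, target = line.strip().partition(" => ")
        if sep and target.strip() == "not found" and name not in missing:
            missing.append(name)
    return missing


SAMPLE = """\
    libz.so.1 => /lib64/libz.so.1 (0x00007f6671150000)
    libcurl.so.4 => not found
    libxml2.so.2 => not found
    libcurl.so.4 => not found
"""

print(missing_libraries(SAMPLE))  # ['libcurl.so.4', 'libxml2.so.2']
```

Inside a Lambda function, os.system only prints to the log, so the text would more likely be captured with subprocess.run(["ldd", "/opt/bin/gdalinfo"], capture_output=True, text=True).stdout before being passed in.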

"RUN yum install -y gcc gcc-c++" adds 500MB to image but is unused at runtime

The Dockerfile directive RUN yum install -y gcc gcc-c++ in https://github.com/lambgeo/docker-lambda/blob/master/dockerfiles/runtimes/python adds 500MB to the image, but is unused at runtime.

A more efficient approach would be to use a multi-stage build and copy only the files that are necessary for runtime into the resulting image.

Additionally and/or alternatively, && yum clean all && rm -rf /var/cache/yum /var/lib/yum/history can be added to the command to reduce the size of the layer.

aws gdal arn usage with python lambda

Hi,
I am new to AWS. I successfully added your GDAL ARN (west-2) to my Lambda function, called geostats, as a layer,
but when I try to import the gdal package in either of these ways, I get an error:
#import gdal
#from osgeo import gdal

What else do I have to do to make it work?

Is latest 3.6 compiled with parquet / arrow enabled?

Not proficient in C++, but running into this problem when trying to run ogr2ogr with a parquet output:

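import subprocess

# run() waits for ogr2ogr to exit; Popen(...).poll() returns None while the process is still running
result = subprocess.run(["ogr2ogr", "-f", "Parquet", "somedestination", "somelocation", "--debug", "ON"], stdout=subprocess.PIPE)
return_code = result.returncode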

This outputs:

b'GDAL 3.6.4, released 2023/04/17\n'
--
ERROR 1: Unable to find driver `Parquet'.
[ERROR] FileNotFoundError: somedestination
Traceback (most recent call last):
  File "/var/task/epsagon/wrappers/aws_lambda.py", line 137, in _lambda_wrapper
    result = func(*args, **kwargs)
  File "/var/task/application/v1/controller/console/test_gdal.py", line 26, in test
    df = gpd.read_parquet("somedestination")
  File "/mnt/efs/lib/geopandas/io/arrow.py", line 560, in _read_parquet
    table = parquet.read_table(path, columns=columns, filesystem=filesystem, **kwargs)
  File "/mnt/efs/lib/pyarrow/parquet/core.py", line 2926, in read_table
    dataset = _ParquetDatasetV2(
  File "/mnt/efs/lib/pyarrow/parquet/core.py", line 2477, in __init__
    self._dataset = ds.dataset(path_or_paths, filesystem=filesystem,
  File "/mnt/efs/lib/pyarrow/dataset.py", line 762, in dataset
    return _filesystem_dataset(source, **kwargs)
  File "/mnt/efs/lib/pyarrow/dataset.py", line 445, in _filesystem_dataset
    fs, paths_or_selector = _ensure_single_source(source, filesystem)
  File "/mnt/efs/lib/pyarrow/dataset.py", line 421, in _ensure_single_source
    raise FileNotFoundError(path)

File paths have been replaced.

So, diving into the CMake flags, I see this: https://github.com/OSGeo/gdal/blob/634f60a4181c9db067a64dbfdd9f2872e4992927/ogr/ogrsf_frmts/generic/ogrregisterall.cpp#L251

but I don't see anything specifically disabling it in the build. Can anyone who reads C++ tell me whether outputting to Parquet is possible in the version built for this image?

missing JPEG codec?

using the latest lambgeo/lambda-gdal:3.2-python3.8

gdalinfo COG.tif
Warning 1: COG.tif: COG.tif:JPEG compression support is not configured
ERROR 1: COG.tif: Cannot open TIFF file due to missing codec.

🤷

LAMBDA_TASK_ROOT instead of PACKAGE_PREFIX?

LAMBDA_TASK_ROOT should always be defined as /var/task in public.ecr.aws/lambda/* images. I was wondering why the example Lambda packager also sets PACKAGE_PREFIX to /var/task instead of using LAMBDA_TASK_ROOT?

Geopandas and Fiona success!

Not sure how to document this, but I managed to build a container with Geopandas and Fiona based on this project. Thanks so much!

A couple gotchas:

  1. I had to add the following at the top of my handler file:
import os
import pyproj
pyproj.datadir.set_data_dir(os.path.join(os.path.dirname(__file__), 'share/proj'))
  2. In order to get the final package size below 250MB, I used the following options for pip (based on https://github.com/szelenka/shrink-linalg) and trimmed some fluff:
RUN CFLAGS="-g0 -Wl,--strip-all -I/usr/include:/usr/local/include -L/usr/lib:/usr/local/lib" \
            pip install \
            --no-cache-dir \
            --compile \
            --global-option=build_ext \
            --global-option="-j 4" \
            --no-binary :all: \
            --target ${PACKAGE_PREFIX}/ \
            fiona pandas geopandas shapely pyproj
# Remove tests, docs, and examples
RUN cd $PREFIX && rm -rf **/tests/ **/_testing/ **/doc/ **/examples/
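
One caveat on the cleanup step: a shell-form RUN uses /bin/sh by default, where ** is not recursive unless bash's globstar option is on, so the rm pattern may only reach one directory level. As an alternative sketch, the same trimming can be done recursively from Python. remove_fluff is a hypothetical helper and the directory names are simply the ones from the rm command above.

```python
import shutil
from pathlib import Path
from typing import Iterable, List

FLUFF = ("tests", "_testing", "doc", "examples")


def remove_fluff(root: str, names: Iterable[str] = FLUFF) -> List[str]:
    """Delete directories named in `names` anywhere under `root`; return what was removed."""
    wanted = set(names)
    # Collect the targets first so the tree is not modified while it is being walked.
    targets = [p for p in Path(root).rglob("*") if p.is_dir() and p.name in wanted]
    removed = []
    for path in targets:
        if path.exists():  # may already be gone if a parent directory was removed
            shutil.rmtree(path)
            removed.append(str(path))
    return sorted(removed)
```

It could be called during the image build as remove_fluff(os.environ["PACKAGE_PREFIX"]), assuming that variable points at the pip --target directory. Note that some packages import their own test helpers at runtime, so the result is worth an import check before deploying.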

Only Update GDAL 3.2

Because GDAL 2.4 and 3.1 are no longer evolving (as expected), I've removed them from the CI automatic building/publishing. For now I think it's OK to leave those Docker images as they are and only update the 3.2 images.

GDAL 3 performance issue

While RemotePixel/amazonlinux#17 fixed most of it, there is still a huge (~100ms) difference between GDAL 2 and 3.

This was experienced by @kylebarron in developmentseed/cogeo-mosaic-tiler#3

# GDAL 3.0
bash-4.2# time echo "0 0" | cs2cs +proj=longlat +datum=WGS84 +to +init=epsg:2163
9473741.42	1181205.06 0.00

real	0m0.132s
user	0m0.120s
sys	0m0.010s

# GDAL 2.4
bash-4.2# time echo "0 0" | cs2cs +proj=longlat +datum=WGS84 +to +init=epsg:2163
9473741.42	1181205.06 0.00

real	0m0.008s
user	0m0.000s
sys	0m0.000s
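
A single time measurement mixes process start-up, disk caching and the actual work, so repeating the call gives a steadier comparison between images. The sketch below is one way to do that. time_command is a hypothetical helper, and the Python interpreter is used as a stand-in command so the example runs without cs2cs installed.

```python
import statistics
import subprocess
import sys
import time
from typing import List, Sequence


def time_command(cmd: Sequence[str], stdin_text: str = "", repeats: int = 20) -> List[float]:
    """Run `cmd` `repeats` times and return each run's wall-clock duration in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run(cmd, input=stdin_text, text=True, capture_output=True, check=True)
        durations.append(time.perf_counter() - start)
    return durations


if __name__ == "__main__":
    # Stand-in command; to reproduce the comparison above, use
    # ["cs2cs", "+proj=longlat", "+datum=WGS84", "+to", "+init=epsg:2163"]
    # with stdin_text="0 0\n" inside each image.
    runs = time_command([sys.executable, "-c", "pass"], repeats=5)
    print(f"median {statistics.median(runs) * 1000:.1f} ms over {len(runs)} runs")
```

Comparing the median (and the first run separately, since it includes cold caches) in the GDAL 2.4 and GDAL 3.0 images would show whether the gap is a fixed start-up cost or grows with the work done.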

missing `liblzma.so` in nodejs environment

I am using the ARN for a Lambda layer (arn:aws:lambda:us-west-2:524387336408:layer:gdal32:3) and adding it to my Lambda function, which runs on Node.js. I am trying to use ogr2ogr in the function, but I'm getting the following error, which is most likely related to GDAL:

ogr2ogr: error while loading shared libraries: libpcre.so.0: cannot open shared object file: No such file or directory\n

I tried to set GDAL_DATA and PROJ_LIB as environment variables on my Lambda function (I went to Environment Variables in the Lambda console and added the two there), but I am still getting the same error. Am I doing something wrong?
