
lambda_zip's Introduction

rpkilog.com

rpkilog.com is my open-source, publicly-accessible BGP RPKI history tool. It is valuable for root-cause analysis, troubleshooting, and verification. It runs on AWS, and all the source is available here on GitHub. This project represents only a few dozen hours of work!

Selection of Writings and Presentations


lambda_zip's Issues

Support unmanaged layers

Need to ensure support for layers we don't build & update.

If a layer contains C libraries, etc., lambda_zip currently doesn't have a way to build them. A further complication is that Python's packaging tools don't necessarily work on macOS, where users may want to run lambda_zip. For example, the command below is needed to produce a psycopg2 layer using a pip package which vendors the required PostgreSQL library. This doesn't work on macOS, so Mac users will likely need Docker to build the layer (see the sketch after the command).

pip install \
  --no-deps \
  --platform manylinux2014_x86_64 \
  --python-version 3.9 \
  --only-binary=:psycopg2: \
  --target=${HOME}/tmp/psycopg2_bin_layer/python \
  psycopg2-binary
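
A minimal sketch of the Docker workaround, assuming the AWS SAM build image public.ecr.aws/sam/build-python3.9 is suitable (any Linux image with a compatible pip would do); the helper name and paths are illustrative and not part of lambda_zip today:

import pathlib
import subprocess

def build_layer_in_docker(target_dir: pathlib.Path, package: str = "psycopg2-binary") -> None:
    """Run pip inside an x86_64 Linux container so the wheels match the Lambda runtime."""
    target_dir.mkdir(parents=True, exist_ok=True)
    subprocess.run(
        [
            "docker", "run", "--rm",
            "--platform", "linux/amd64",
            "-v", f"{target_dir.resolve()}:/out",
            "public.ecr.aws/sam/build-python3.9",  # assumption: SAM build image for Python 3.9
            "pip", "install", "--no-deps", "--target", "/out/python", package,
        ],
        check=True,
    )

# e.g. build_layer_in_docker(pathlib.Path.home() / "tmp/psycopg2_bin_layer")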

Re-implement metadata yaml file in ZIP

The metadata YAML file emitted to the ZIP was removed to support de-duplication for layers.

This can be re-added. Additionally, the directory where lambda_zip is being invoked should be added to the metadata. The various methods & functions for gathering & organizing metadata should be refactored, since we now have two groups of metadata fields: those which fit into the Description, and those which are too large and go only into the YAML file. The result should be clearer and easier to understand & maintain. A possible split is sketched below.
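
A minimal sketch of that split, with field names borrowed from the lambda_zip.yml example in the layer-support notes further down; the helper names and the exact field grouping are assumptions:

import yaml  # PyYAML

# Fields small enough to fit in the 256-character Description; everything else
# goes only to the YAML file.
DESCRIPTION_FIELDS = {"branch", "commit", "describe", "detached", "dirty", "untracked"}

def split_metadata(metadata: dict) -> tuple[dict, dict]:
    """Return (description_fields, yaml_only_fields)."""
    desc = {k: v for k, v in metadata.items() if k in DESCRIPTION_FIELDS}
    yaml_only = {k: v for k, v in metadata.items() if k not in DESCRIPTION_FIELDS}
    return desc, yaml_only

def description_text(desc: dict, limit: int = 256) -> str:
    """Render the description-sized fields as YAML and enforce the size limit."""
    text = yaml.safe_dump(desc, sort_keys=True).strip()
    if len(text.encode()) > limit:
        raise ValueError(f"description metadata exceeds {limit} bytes")
    return text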

Support arbitrary shell commands at build time

We can work around a current limitation (the inability to install things with e.g. --platform) by allowing arbitrary shell commands to be invoked at different stages. Some environment variables will need to be prepared for this to work, and the commands should perhaps be allowed to be multi-line shell scripts when desired.

Environment vars needed

  • LAMBDA_SRC_DIR
  • LAMBDA_TMP_DIR
  • LAMBDA_ZIP_FILE
  • LAYER_TMP_DIR
  • LAYER_ZIP_FILE

Execution stages

  • PRE_ZIP: files added to the LAMBDA_TMP_DIR during this stage would be subject to omit behavior
  • POST_ZIP: files may be added to the LAMBDA_ZIP_FILE during this stage and will be included in the hash calculation
  • POST_HASH: files may be added to the LAMBDA_ZIP_FILE after the content hash calculation

Multiline vs single-line

If a script value (string) is multiple lines, it should be written out to a file and the file executed with subprocess.run(). This way, the script could be a shell script, but it could also be Python or something else.

If it's a single line, we just run it with subprocess.run().
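
A minimal sketch of that behavior, assuming the environment variable names listed above; the helper itself and its signature are illustrative:

import os
import stat
import subprocess
import tempfile

def run_stage_script(script: str, env_overrides: dict[str, str]) -> None:
    """Run a configured stage script with LAMBDA_*/LAYER_* variables in its environment."""
    env = {**os.environ, **env_overrides}  # e.g. LAMBDA_SRC_DIR, LAMBDA_TMP_DIR, LAMBDA_ZIP_FILE, ...
    if "\n" in script.strip():
        # Multi-line: write to a temp file and execute it, so a shebang line can
        # select bash, python, or anything else.
        with tempfile.NamedTemporaryFile("w", delete=False, suffix=".script") as fh:
            fh.write(script)
            path = fh.name
        os.chmod(path, os.stat(path).st_mode | stat.S_IXUSR)
        subprocess.run([path], env=env, check=True)
    else:
        # Single line: hand it to the shell directly.
        subprocess.run(script, shell=True, env=env, check=True)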

Example configuration

[lambda_zip]
script_pre_zip = """
#!/bin/bash
pip install \
  --no-deps \
  --python-version 3.9 \
  --platform manylinux2014_x86_64 \
  --only-binary=:psycopg2: \
  --target=${LAMBDA_TMP_DIR} \
  psycopg2-binary
"""

Layer support

Add a feature for storing all dependencies in a layer, which may need less frequent updates than the function code itself.

De-duplication of layer updates is an important consideration for this feature. Creating duplicates would mean a new clean-up chore, and if that chore isn't done, the user may eventually run into the Lambda storage limit of their AWS account (75 GB by default). This could happen quickly if lambda_zip is used by a pipeline with frequent builds.

We use our own SHA-256 digest, which is encoded in the layer description. The digest is computed by reading the contents of all files except .pyc files (a sketch follows the list below). This works around several problems:

  • Using pip install --target <tmpdir> writes the temporary directory path into the .pyc files. Even if we used a non-random temporary directory name, there is no assurance the directory path would be the same when invoked by different users. This could also add complexity to pipelines.
  • Using pip install --target <tmpdir> seems to cause the current unix timestamp to appear in the .pyc file.
  • If we didn't do the above, the resulting .zip file would still have timestamps inside it, which is another problem. deterministic_zip is a Python module which works around that, but we'd still face the .pyc problems.
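
A minimal sketch of such a digest, assuming it hashes each file's path (relative to the layer root) and contents in sorted order while skipping .pyc files; the exact canonicalization lambda_zip uses may differ:

import hashlib
import pathlib

def layer_content_digest(layer_dir: pathlib.Path) -> str:
    """Deterministic digest of a layer build directory, ignoring .pyc files and timestamps."""
    digest = hashlib.sha256()
    for path in sorted(layer_dir.rglob("*")):
        if path.is_dir() or path.suffix == ".pyc":
            continue
        digest.update(str(path.relative_to(layer_dir)).encode())  # stable names & ordering
        digest.update(path.read_bytes())                          # contents only; no mtimes
    return digest.hexdigest()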

Layer versions can have a description of up to 256 characters. That isn't enough for a list of what's inside the archive (e.g. the package versions), and may not always be enough for the metadata we store in lambda_zip.yml (see below); but if we're forced to remove some of that metadata from the archive for SHA-256 comparison purposes anyway, what remains will fit. For example, the sample below is 220 bytes without the comments (close to the 256-byte limit) but 142 bytes after removing the fields marked for omission:

branch: GH-10-pagination-support
commit: f23306e47e3c93e28535581f561f7ed1e400f7fe
describe: f23306e
detached: false
dirty: false
lambda_zip_host: boomer # omit
lambda_zip_timestamp: 1676932215 # omit
lambda_zip_user: jsw # omit
untracked: 0

Lambda versions themselves also have descriptions.

The overall Lambda has a description field and also has tags. One tag can be up to 256 bytes and the max number of tags is 50 (doc link).

We need to call the UpdateFunctionConfiguration API anytime a Lambda needs to refer to a newer layer version anyway, so adding metadata to the Lambda description seems like a good possibility.
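
A minimal sketch of that call using boto3's update_function_configuration(); the function and layer identifiers are placeholders:

import boto3

lambda_client = boto3.client("lambda")

def attach_layer(function_name: str, layer_version_arn: str, description: str) -> None:
    """Point a function at a new layer version and refresh its description metadata."""
    lambda_client.update_function_configuration(
        FunctionName=function_name,
        Layers=[layer_version_arn],     # replaces the function's layer list
        Description=description[:256],  # Lambda function descriptions are capped at 256 characters
    )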

SBOM support

We should be able to generate an SBOM of the packages in the ZIP(s).

We already have code to invoke pip with --report tmpfile.json and consume the JSON, returning it from invoke_pip_install().

cyclonedx-python-lib looks like an easy way to do it.

Including the SBOM output in our ZIP file would be nice.
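
A minimal sketch, assuming the JSON returned from invoke_pip_install() is pip's installation-report format (a top-level "install" list with per-package "metadata"); it emits a bare-bones CycloneDX 1.4 document by hand rather than depending on cyclonedx-python-lib:

def sbom_from_pip_report(report: dict) -> dict:
    """Convert a pip install report into a minimal CycloneDX JSON structure."""
    components = []
    for item in report.get("install", []):
        meta = item.get("metadata", {})
        components.append({
            "type": "library",
            "name": meta.get("name"),
            "version": meta.get("version"),
            "purl": f"pkg:pypi/{meta.get('name')}@{meta.get('version')}",
        })
    return {
        "bomFormat": "CycloneDX",
        "specVersion": "1.4",
        "version": 1,
        "components": components,
    }

The resulting dict could be serialized with json.dumps() and written into the ZIP alongside lambda_zip.yml, or posted to DependencyTrack.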

Option to post it to DependencyTrack would also be nice, but we need to document how we'll deal with layers and project versions within DT.

Refactor code to be more maintainable

Following the addition of initial layer support, the code is pretty ugly. I'll refactor it a bit later. Some plans:

  • Create a superclass with methods common to AwsLambdaLayer and NewAwsLambdaLayerZip (sketched below)
  • When publishing a NewAwsLambdaLayerZip, return an AwsLambdaLayer object, since we can do so
  • Separate out a lot of utility functions into their own source file, importable by all the others
  • Move functions like aws_lambda_update and s3_upload to become methods of a NewAwsLambdaZip or similar class
  • Move things out of __init__.py
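
A rough sketch of that superclass idea; everything except the AwsLambdaLayer and NewAwsLambdaLayerZip names is hypothetical:

import pathlib

class LambdaArtifact:
    """Behavior shared by existing layer versions and freshly built layer ZIPs."""

    def __init__(self, name: str):
        self.name = name

    def content_digest(self) -> str:
        raise NotImplementedError


class AwsLambdaLayer(LambdaArtifact):
    """A layer version that already exists in AWS."""


class NewAwsLambdaLayerZip(LambdaArtifact):
    """A freshly built layer ZIP on disk, not yet published."""

    def __init__(self, name: str, zip_path: pathlib.Path):
        super().__init__(name)
        self.zip_path = zip_path

    def publish(self) -> AwsLambdaLayer:
        """Upload the ZIP and return the resulting AwsLambdaLayer, per the plan above."""
        raise NotImplementedError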
