
lambda_zip's Introduction

rpkilog.com

rpkilog.com is my open-source, publicly-accessible BGP RPKI history tool. It is valuable for root-cause analysis, troubleshooting, and verification. It runs on AWS, and all the source is available here on GitHub. This project represents only a few dozen hours of work!

Selection of Writings and Presentations


lambda_zip's Issues

Support unmanaged layers

Need to ensure support for layers we don't build & update.

If a layer contains C libraries, etc., lambda_zip currently doesn't have a way to build them. A further complication is that Python's packaging tools don't necessarily work on macOS, where users may want to run lambda_zip. For example, the command below is needed to produce a psycopg2 layer using a pip package which vendors the required PostgreSQL library. This doesn't work on macOS, so Mac users will likely need Docker to build the layer (see the sketch after the command).

pip install \
  --no-deps \
  --platform manylinux2014_x86_64 \
  --python-version 3.9 \
  --only-binary=:psycopg2: \
  --target=${HOME}/tmp/psycopg2_bin_layer/python \
  psycopg2-binary
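
A minimal sketch of the Docker workaround, assuming the AWS SAM build image public.ecr.aws/sam/build-python3.9 is suitable (any Linux image with a compatible pip would do); the helper name and paths are illustrative and not part of lambda_zip today:

import pathlib
import subprocess

def build_layer_in_docker(target_dir: pathlib.Path, package: str = "psycopg2-binary") -> None:
    """Run pip inside an x86_64 Linux container so the wheels match the Lambda runtime."""
    target_dir.mkdir(parents=True, exist_ok=True)
    subprocess.run(
        [
            "docker", "run", "--rm",
            "--platform", "linux/amd64",
            "-v", f"{target_dir.resolve()}:/out",
            "public.ecr.aws/sam/build-python3.9",  # assumption: SAM build image for Python 3.9
            "pip", "install", "--no-deps", "--target", "/out/python", package,
        ],
        check=True,
    )

# e.g. build_layer_in_docker(pathlib.Path.home() / "tmp/psycopg2_bin_layer")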

Re-implement metadata yaml file in ZIP

The metadata YAML file emitted to the ZIP was removed to support de-duplication for layers.

This can be re-added. Additionally, the directory where lambda_zip is being invoked should be added to the metadata. The various methods & functions for gathering & organizing metadata should be refactored, since we now have two groups of metadata fields: those which fit into the Description, and those which are too large and go only into the YAML file. The result should be clearer and easier to understand & maintain. A possible split is sketched below.
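
A minimal sketch of that split, with field names borrowed from the lambda_zip.yml example in the layer-support notes further down; the helper names and the exact field grouping are assumptions:

import yaml  # PyYAML

# Fields small enough to fit in the 256-character Description; everything else
# goes only to the YAML file.
DESCRIPTION_FIELDS = {"branch", "commit", "describe", "detached", "dirty", "untracked"}

def split_metadata(metadata: dict) -> tuple[dict, dict]:
    """Return (description_fields, yaml_only_fields)."""
    desc = {k: v for k, v in metadata.items() if k in DESCRIPTION_FIELDS}
    yaml_only = {k: v for k, v in metadata.items() if k not in DESCRIPTION_FIELDS}
    return desc, yaml_only

def description_text(desc: dict, limit: int = 256) -> str:
    """Render the description-sized fields as YAML and enforce the size limit."""
    text = yaml.safe_dump(desc, sort_keys=True).strip()
    if len(text.encode()) > limit:
        raise ValueError(f"description metadata exceeds {limit} bytes")
    return text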

Support arbitrary shell commands at build time

We can work around a current limitation (the inability to install things with e.g. --platform) by allowing arbitrary shell commands to be invoked at different stages. Some environment variables will need to be prepared for this to work, and the commands should perhaps be allowed to be multi-line shell scripts when desired.

Environment vars needed

  • LAMBDA_SRC_DIR
  • LAMBDA_TMP_DIR
  • LAMBDA_ZIP_FILE
  • LAYER_TMP_DIR
  • LAYER_ZIP_FILE

Execution stages

  • PRE_ZIP: files added to the LAMBDA_TMP_DIR during this stage would be subject to omit behavior
  • POST_ZIP: files may be added to the LAMBDA_ZIP_FILE during this stage and will be included in the hash calculation
  • POST_HASH: files may be added to the LAMBDA_ZIP_FILE after the content hash calculation

Multiline vs single-line

If a script value (string) is multiple lines, it should be written out to a file and the file executed with subprocess.run(). This way, the script could be a shell script, but it could also be Python or something else.

If it's a single line, we just run it with subprocess.run().
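
A minimal sketch of that behavior, assuming the environment variable names listed above; the helper itself and its signature are illustrative:

import os
import stat
import subprocess
import tempfile

def run_stage_script(script: str, env_overrides: dict[str, str]) -> None:
    """Run a configured stage script with LAMBDA_*/LAYER_* variables in its environment."""
    env = {**os.environ, **env_overrides}  # e.g. LAMBDA_SRC_DIR, LAMBDA_TMP_DIR, LAMBDA_ZIP_FILE, ...
    if "\n" in script.strip():
        # Multi-line: write to a temp file and execute it, so a shebang line can
        # select bash, python, or anything else.
        with tempfile.NamedTemporaryFile("w", delete=False, suffix=".script") as fh:
            fh.write(script)
            path = fh.name
        os.chmod(path, os.stat(path).st_mode | stat.S_IXUSR)
        subprocess.run([path], env=env, check=True)
    else:
        # Single line: hand it to the shell directly.
        subprocess.run(script, shell=True, env=env, check=True)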

Example configuration

[lambda_zip]
script_pre_zip = """
#!/bin/bash
pip install \
  --no-deps \
  --python-version 3.9 \
  --platform manylinux2014_x86_64 \
  --only-binary=:psycopg2: \
  --target=${LAMBDA_TMP_DIR} \
  psycopg2-binary
"""

Layer support

Add a feature for storing all dependencies in a layer, which may need less frequent updates than the function code itself.

De-duplication of layer updates is an important consideration for this feature. Creating duplicates would mean a new clean-up chore, and if that chore isn't done, the user may eventually run into the Lambda storage limit of their AWS account (75 GB by default). This could happen quickly if lambda_zip is used by a pipeline with frequent builds.

We use our own SHA-256 digest, which is encoded in the layer description. The digest is computed by reading the contents of all files except .pyc files (a sketch follows the list below). This works around several problems:

  • Using pip install --target <tmpdir> writes the temporary directory path into the .pyc files. Even if we used a non-random temporary directory name, there is no assurance the directory path would be the same when invoked by different users. This could also add complexity to pipelines.
  • Using pip install --target <tmpdir> seems to cause the current unix timestamp to appear in the .pyc file.
  • If we didn't do the above, the resulting .zip file would still have timestamps inside it, which is another problem. deterministic_zip is a Python module which works around that, but we'd still face the .pyc problems.
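
A minimal sketch of such a digest, assuming it hashes each file's path (relative to the layer root) and contents in sorted order while skipping .pyc files; the exact canonicalization lambda_zip uses may differ:

import hashlib
import pathlib

def layer_content_digest(layer_dir: pathlib.Path) -> str:
    """Deterministic digest of a layer build directory, ignoring .pyc files and timestamps."""
    digest = hashlib.sha256()
    for path in sorted(layer_dir.rglob("*")):
        if path.is_dir() or path.suffix == ".pyc":
            continue
        digest.update(str(path.relative_to(layer_dir)).encode())  # stable names & ordering
        digest.update(path.read_bytes())                          # contents only; no mtimes
    return digest.hexdigest()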

Layer versions can have a description of up to 256 characters. That isn't enough for a list of what's inside the archive (e.g. the package versions), and may not always be enough for the metadata we store in lambda_zip.yml (see below); but if we're forced to remove some of that metadata from the archive for SHA-256 comparison purposes anyway, what remains will fit. For example, the sample below is 220 bytes without the comments (close to the 256-byte limit) but 142 bytes after removing the fields marked for omission:

branch: GH-10-pagination-support
commit: f23306e47e3c93e28535581f561f7ed1e400f7fe
describe: f23306e
detached: false
dirty: false
lambda_zip_host: boomer # omit
lambda_zip_timestamp: 1676932215 # omit
lambda_zip_user: jsw # omit
untracked: 0

Lambda versions themselves also have descriptions.

The overall Lambda has a description field and also has tags. One tag can be up to 256 bytes and the max number of tags is 50 (doc link).

We need to call the UpdateFunctionConfiguration API anytime a Lambda needs to refer to a newer layer version anyway, so adding metadata to the Lambda description seems like a good possibility.
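
A minimal sketch of that call using boto3's update_function_configuration(); the function and layer identifiers are placeholders:

import boto3

lambda_client = boto3.client("lambda")

def attach_layer(function_name: str, layer_version_arn: str, description: str) -> None:
    """Point a function at a new layer version and refresh its description metadata."""
    lambda_client.update_function_configuration(
        FunctionName=function_name,
        Layers=[layer_version_arn],     # replaces the function's layer list
        Description=description[:256],  # Lambda function descriptions are capped at 256 characters
    )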

SBOM support

We should be able to generate an SBOM of the packages in the ZIP(s).

We already have code to invoke pip with --report tmpfile.json and consume the JSON, returning it from invoke_pip_install().

cyclonedx-python-lib looks like an easy way to do it.

Including the SBOM output in our ZIP file would be nice.
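
A minimal sketch, assuming the JSON returned from invoke_pip_install() is pip's installation-report format (a top-level "install" list with per-package "metadata"); it emits a bare-bones CycloneDX 1.4 document by hand rather than depending on cyclonedx-python-lib:

def sbom_from_pip_report(report: dict) -> dict:
    """Convert a pip install report into a minimal CycloneDX JSON structure."""
    components = []
    for item in report.get("install", []):
        meta = item.get("metadata", {})
        components.append({
            "type": "library",
            "name": meta.get("name"),
            "version": meta.get("version"),
            "purl": f"pkg:pypi/{meta.get('name')}@{meta.get('version')}",
        })
    return {
        "bomFormat": "CycloneDX",
        "specVersion": "1.4",
        "version": 1,
        "components": components,
    }

The resulting dict could be serialized with json.dumps() and written into the ZIP alongside lambda_zip.yml, or posted to DependencyTrack.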

Option to post it to DependencyTrack would also be nice, but we need to document how we'll deal with layers and project versions within DT.

Refactor code to be more maintainable

Following the addition of initial layer support, the code is pretty ugly. I'll refactor it a bit later. Some plans:

  • Create a superclass with methods common to AwsLambdaLayer and NewAwsLambdaLayerZip (sketched below)
  • When publishing a NewAwsLambdaLayerZip, return an AwsLambdaLayer object, since we can do so
  • Separate out a lot of utility functions into their own source file, importable by all the others
  • Move functions like aws_lambda_update and s3_upload to become methods of a NewAwsLambdaZip or similar class
  • Move things out of __init__.py
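
A rough sketch of that superclass idea; everything except the AwsLambdaLayer and NewAwsLambdaLayerZip names is hypothetical:

import pathlib

class LambdaArtifact:
    """Behavior shared by existing layer versions and freshly built layer ZIPs."""

    def __init__(self, name: str):
        self.name = name

    def content_digest(self) -> str:
        raise NotImplementedError


class AwsLambdaLayer(LambdaArtifact):
    """A layer version that already exists in AWS."""


class NewAwsLambdaLayerZip(LambdaArtifact):
    """A freshly built layer ZIP on disk, not yet published."""

    def __init__(self, name: str, zip_path: pathlib.Path):
        super().__init__(name)
        self.zip_path = zip_path

    def publish(self) -> AwsLambdaLayer:
        """Upload the ZIP and return the resulting AwsLambdaLayer, per the plan above."""
        raise NotImplementedError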
