Add a feature for storing all dependencies in a layer which may need less frequent updates.
De-duplication of layer updates is an important consideration for this feature. Creating duplicate layer versions would add a new clean-up chore, and if that chore isn't done, the user may eventually run into the Lambda storage limit of their AWS account (75 GB by default). This could happen quickly if the feature is used by a pipeline with frequent builds.
We use our own SHA-256 digest, which is encoded in the layer description. The digest is created by reading the contents of all files except .pyc files. This is to work around several problems:
- Using pip install --target <tmpdir> writes the temporary directory path into the .pyc files. Even if we used a non-random temporary directory name, there is no assurance the directory path would be the same when invoked by different users. This could also add complexity to pipelines.
- Using pip install --target <tmpdir> seems to cause the current unix timestamp to appear in the .pyc file.
- If we didn't do the above, the resulting .zip file would still have timestamps inside it, which is another problem. deterministic_zip is a Python module which works around that, but we'd still face the .pyc problems.
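The digest described above could be sketched roughly as follows. This is only an illustration, not the actual implementation: the function name tree_digest is hypothetical, and mixing the relative path and file length into the hash is an assumption (the text only says that file contents are read).

```python
import hashlib
from pathlib import Path


def tree_digest(root: str) -> str:
    """Return a SHA-256 hex digest of every file under root except *.pyc files."""
    base = Path(root)
    sha = hashlib.sha256()
    # Sort by POSIX-style relative path so directory traversal order
    # never changes the result.
    files = sorted(
        (p for p in base.rglob("*") if p.is_file() and p.suffix != ".pyc"),
        key=lambda p: p.relative_to(base).as_posix(),
    )
    for path in files:
        data = path.read_bytes()
        # Assumption: the relative path and length are hashed along with the
        # contents, so renamed or re-split files produce a different digest.
        sha.update(path.relative_to(base).as_posix().encode("utf-8"))
        sha.update(b"\0")
        sha.update(len(data).to_bytes(8, "big"))
        sha.update(data)
    return sha.hexdigest()
```

Because only relative paths and non-.pyc contents are hashed, two installs into different temporary directories at different times should yield the same digest as long as the installed sources are identical.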
Layer versions can have a description of up to 256 characters. This isn't enough for a list of what's inside the archive (e.g. the package versions), and may not always be enough for the metadata we store in lambda_zip.yml (see below). However, if we're forced to remove some of that metadata from the archive for SHA-256 comparison purposes anyway, the remainder will fit. For example, the metadata below is 220 bytes without comments (close to the 256-byte limit) but 142 bytes after removing the fields marked for omission:
branch: GH-10-pagination-support
commit: f23306e47e3c93e28535581f561f7ed1e400f7fe
describe: f23306e
detached: false
dirty: false
lambda_zip_host: boomer # omit
lambda_zip_timestamp: 1676932215 # omit
lambda_zip_user: jsw # omit
untracked: 0
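As a rough sketch of the trimming step, the fields marked for omission could be dropped before the metadata is serialized and checked against the description limit. The names OMIT, MAX_DESCRIPTION, format_value and stable_description are hypothetical, and the hand-rolled "key: value" serialization is an assumption made to keep the example free of third-party dependencies.

```python
# Fields that vary per build host/user/time and are marked "# omit" above.
OMIT = {"lambda_zip_host", "lambda_zip_timestamp", "lambda_zip_user"}
MAX_DESCRIPTION = 256  # description limit stated above


def format_value(value) -> str:
    """Render booleans the way YAML does; everything else via str()."""
    if isinstance(value, bool):
        return "true" if value else "false"
    return str(value)


def stable_description(metadata: dict) -> str:
    """Serialize metadata without the volatile fields, in sorted key order."""
    kept = {k: v for k, v in metadata.items() if k not in OMIT}
    text = "\n".join(f"{key}: {format_value(kept[key])}" for key in sorted(kept))
    if len(text) > MAX_DESCRIPTION:
        raise ValueError(f"description is {len(text)} characters, limit is {MAX_DESCRIPTION}")
    return text


example = {
    "branch": "GH-10-pagination-support",
    "commit": "f23306e47e3c93e28535581f561f7ed1e400f7fe",
    "describe": "f23306e",
    "detached": False,
    "dirty": False,
    "lambda_zip_host": "boomer",
    "lambda_zip_timestamp": 1676932215,
    "lambda_zip_user": "jsw",
    "untracked": 0,
}
print(stable_description(example))
```

Two builds of the same commit by different users on different hosts would then produce the same text, which is what makes it usable for comparison.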
Lambda versions themselves also have descriptions.
The overall Lambda has a description field and also has tags. One tag can be up to 256 bytes, and the maximum number of tags is 50 (doc link).
We need to call the UpdateFunctionConfiguration API any time a Lambda needs to refer to a newer layer version anyway, so adding metadata to the Lambda description seems like a good possibility.
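Putting the pieces together, a de-duplicating publish step might look something like the sketch below. It assumes a boto3 Lambda client (for example boto3.client("lambda")) is passed in as client. The helper names and the "sha256=" marker format in the description are hypothetical; the client methods used are list_layer_versions, publish_layer_version and update_function_configuration.

```python
from typing import Optional

DIGEST_PREFIX = "sha256="  # hypothetical marker format inside the description


def find_layer_version(client, layer_name: str, digest: str) -> Optional[str]:
    """Return the ARN of a layer version whose description carries digest, if any."""
    kwargs = {"LayerName": layer_name}
    while True:
        page = client.list_layer_versions(**kwargs)
        for version in page.get("LayerVersions", []):
            if DIGEST_PREFIX + digest in version.get("Description", ""):
                return version["LayerVersionArn"]
        marker = page.get("NextMarker")
        if not marker:
            return None
        kwargs["Marker"] = marker  # fetch the next page of versions


def ensure_layer(client, layer_name: str, digest: str, zip_bytes: bytes) -> str:
    """Publish a new layer version only when no existing one matches digest."""
    arn = find_layer_version(client, layer_name, digest)
    if arn is not None:
        return arn  # duplicate avoided: reuse the existing version
    response = client.publish_layer_version(
        LayerName=layer_name,
        Description=DIGEST_PREFIX + digest,
        Content={"ZipFile": zip_bytes},
    )
    return response["LayerVersionArn"]


def point_function_at_layer(client, function_name: str, layer_arn: str, description: str) -> None:
    """Attach the layer and store metadata in the function description in one call."""
    client.update_function_configuration(
        FunctionName=function_name,
        Layers=[layer_arn],  # note: this replaces the function's whole layer list
        Description=description,
    )
```

Passing the client in as a parameter keeps the logic easy to exercise with a stub, and the single update_function_configuration call reflects the observation above that the description can be updated at the same time as the layer reference.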