cloud-gov / pages-build-container
A container image that builds each site as part of the Pages platform
License: Other
Just happened to notice while porting this to Python: if two repos under the same owner share basically the same name, the logic that determines which files to delete improperly identifies files under the second site. Luckily, the deletion logic won't actually delete those files, but the logs are confusing.
Example:
site 1: jseppi/test-federalist-site
site 2: jseppi/test-federalist-site-2
When publishing jseppi/test-federalist-site, the prefix argument sent to S3's list_files API will also match the objects with keys that are prefixed with jseppi/test-federalist-site-2, since the former is a substring of the latter.
Logs will look like this (notice the -2 in the files being deleted):
...
Deleting: site/jseppi/test-federalist-site/-2/assets/vendor/uswds/fonts/merriweather-bold-webfont.eot... Done (0.015464146s)
Deleting: site/jseppi/test-federalist-site/-2/assets/vendor/uswds/fonts/merriweather-bold-webfont.ttf... Done (0.014940206s)
Deleting: site/jseppi/test-federalist-site/-2/assets/vendor/uswds/fonts/merriweather-bold-webfont.woff... Done (0.017011489s)
...
The path used to actually delete files is reconstructed in a manner that won't delete the second site's files, but it does lead to confusing logs.
cc @jmhooper
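A minimal sketch of the fix (not the builder's actual code): terminating the prefix with a slash makes the substring match impossible, since no sibling site's keys can start with `site/<owner>/<repo>/`.

```python
def site_prefix(owner: str, repo: str) -> str:
    # The trailing slash is what prevents "test-federalist-site-2"
    # from matching the prefix for "test-federalist-site".
    return f"site/{owner}/{repo}/"

def keys_for_site(all_keys, owner, repo):
    """Filter S3 object keys down to the ones belonging to one site."""
    prefix = site_prefix(owner, repo)
    return [k for k in all_keys if k.startswith(prefix)]
```

The same trailing-slash prefix can be passed directly as the `Prefix` argument when listing objects, so the filtering happens server-side.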
I happened to be streaming some logs today when I noticed that successive builds of @davidewarren's site kept uploading a number of video files:
2017-12-01T13:22:22.45-0600 [APP/PROC/WEB/0] OUT [S3_PUBLISHER] 2017-12-01 19:22:22 INFO: New: 1
2017-12-01T13:22:22.45-0600 [APP/PROC/WEB/0] OUT [S3_PUBLISHER] 2017-12-01 19:22:22 INFO: Modified: 14
2017-12-01T13:22:22.45-0600 [APP/PROC/WEB/0] OUT [S3_PUBLISHER] 2017-12-01 19:22:22 INFO: Deleted: 0
2017-12-01T13:22:22.50-0600 [APP/PROC/WEB/0] OUT [S3_PUBLISHER] 2017-12-01 19:22:22 INFO: Uploaded site/davidewarren/pif-website/feed.xml in 0.03s
2017-12-01T13:22:22.52-0600 [APP/PROC/WEB/0] OUT [S3_PUBLISHER] 2017-12-01 19:22:22 INFO: Uploaded site/davidewarren/pif-website/sitemap.xml in 0.02s
2017-12-01T13:22:22.90-0600 [APP/PROC/WEB/0] OUT [S3_PUBLISHER] 2017-12-01 19:22:22 INFO: Uploaded site/davidewarren/pif-website/assets/videos/jeff-meisel-video.webm in 0.38s
2017-12-01T13:22:23.36-0600 [APP/PROC/WEB/0] OUT [S3_PUBLISHER] 2017-12-01 19:22:23 INFO: Uploaded site/davidewarren/pif-website/assets/videos/mikel-maron-video.mp4 in 0.45s
2017-12-01T13:22:23.90-0600 [APP/PROC/WEB/0] OUT [S3_PUBLISHER] 2017-12-01 19:22:23 INFO: Uploaded site/davidewarren/pif-website/assets/videos/maia-hansen-video.mp4 in 0.54s
2017-12-01T13:22:24.32-0600 [APP/PROC/WEB/0] OUT [S3_PUBLISHER] 2017-12-01 19:22:24 INFO: Uploaded site/davidewarren/pif-website/assets/videos/mikel-maron-video.webm in 0.42s
...
Must be something funky in our change detection code.
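One plausible culprit (an assumption, not a confirmed diagnosis): comparing a local MD5 against the S3 ETag fails for large files, because multipart-upload ETags are not plain MD5 digests. A sketch of what such change detection might look like:

```python
import hashlib

def local_md5(path: str, chunk_size: int = 8 * 1024 * 1024) -> str:
    """Compute the MD5 of a local file in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def needs_upload(path: str, s3_etag: str) -> bool:
    """Decide whether a local file differs from the stored S3 object."""
    etag = s3_etag.strip('"')
    if "-" in etag:
        # Multipart-upload ETags end in "-<part count>" and are not MD5
        # digests, so a naive comparison always reports "modified" for
        # large files -- which would explain videos re-uploading every build.
        return True
    return local_md5(path) != etag
```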
I tried to publish a static site, but I get an error that says mv failed, cannot move _site to _site/_site. Looking into the code, this appears to be a problem with the build.sh file. The commands create a folder with mkdir _site, then mv * _site attempts to move that folder into itself, which fails.
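One way build.sh could avoid the self-move is to skip `_site` while moving (a sketch, run here in a scratch directory so nothing real is touched):

```shell
# demo in a scratch directory
cd "$(mktemp -d)"
touch index.html about.html

mkdir -p _site
# move everything except _site itself, avoiding "cannot move _site to _site/_site"
for f in *; do
  [ "$f" = "_site" ] || mv "$f" _site/
done
```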
Use shlex.quote as in #90.
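For context, `shlex.quote` escapes shell metacharacters so user-controlled values (branch names, filenames, config strings) can be interpolated into a shell command without being interpreted by the shell:

```python
import shlex

# Without quoting, the apostrophe and semicolon below would be
# interpreted by the shell; shlex.quote makes the value a single,
# inert argument.
branch = "tom's-branch; rm -rf /"
cmd = "git checkout " + shlex.quote(branch)
```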
Using mypy would be great to make sure methods are getting and returning expected types.
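A purely illustrative example of the kind of annotation mypy would check (the function and its name are assumptions, not existing code):

```python
def parse_max_age(header: str) -> int:
    """Return the max-age value from a Cache-Control header, or 0.

    With annotations like these, mypy can flag callers that pass the
    wrong type or misuse the return value.
    """
    for part in header.split(","):
        part = part.strip()
        if part.startswith("max-age="):
            return int(part.split("=", 1)[1])
    return 0
```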
The garden build should be responsible for just building one repo. In the case of a user creating a new repo from a Federalist starter, the builder currently takes on the extra responsibility of pushing the files from the starter repo to the user's repo before building. Instead, this logic should happen outside of the builder, triggering a normal build.
@davemcorwin commented on Thu Nov 21 2019
Enable Federalist partners to configure Cache-Control headers for their content.
Federalist partners are currently unable to set custom Cache-Control headers for their content, and all content is served with the Federalist default. The current default is cache-control: max-age=60, which means that files are only cached for one minute by browsers and the CDN.
This is not ideal for our partners' users, who could be leveraging the browser cache and CDN for faster responses to cached requests.
We should enable and add documentation about best practices for caching.
This is also a good time to start implementing a more general configuration file for Federalist configuration instead of piggybacking off of the Jekyll _config.yml.
In general, I propose following the APIs used by the Netlify Configuration File and Netlify Headers.
Cache-Control headers by path, e.g.:
Cache-Control: public, max-age=31536000
Cache-Control: no-cache
As JSON is probably going to be more familiar to our users than TOML, I propose a federalist.json file that lives in the root of the project.
// federalist.json
// spec
{
  "headers": {
    "<path>": {
      "<header-name>": "<header-value>",
      ...
    },
    ...
  }
}
// example
{
  "headers": {
    "/*.js": {
      "cache-control": "public, max-age=31536000"
    }
  }
}
During the build, parse federalist.json and have this configuration available.
In S3Publisher, when pushing objects to S3, include any Cache-Control headers that apply to the object based on its path. This may mean including the header information when determining whether an object should be replaced, OR refactoring the logic to just replace all files.
Validate the federalist.json file, with feedback, as part of the build process.
When the container is complete, it can log the number of files produced for the S3 bucket for future audits.
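The path-matching step could look like the sketch below. The config shape mirrors the proposed federalist.json "headers" section; using fnmatch-style globs for patterns like `/*.js` is an assumption here, not a settled design.

```python
from fnmatch import fnmatch

def headers_for(path: str, headers_config: dict) -> dict:
    """Return the headers for the first pattern matching `path`.

    `headers_config` maps glob patterns to header dicts, mirroring the
    proposed federalist.json spec. Returns {} when nothing matches.
    """
    for pattern, headers in headers_config.items():
        if fnmatch(path, pattern):
            return headers
    return {}
```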
It would be good if there was a script to do deployments of this image to staging and prod without having to manually follow all the steps in https://federalist-docs.18f.gov/pages/how-federalist-works/cloud-gov-setup/#pushing-a-federalist-docker-build-image-to-the-registry
I'm thinking of something that would locally spin up a registry backed by S3, take an env (i.e., staging or prod), and run all the steps to build, push, and verify the image. I guess the Clair stuff should also be included in such a script, but I don't yet have a good handle on how that works.
I think we could make a nice docker-compose setup to do this.
When a user builds a site hosted in a private S3 bucket, create a default 404 page for the site.
Currently, the shared bucket uses a default 404 page created for all sites. This change will generate the default 404 page every time a site is built in a private S3 bucket. Adding this action may also allow sites to generate custom 404 pages.
Generate a default 404.html for private bucket builds.
If site/owner/repo/404.html exists in the static site build, we don't generate a default 404.html.
If no 404.html exists, generate 404.html from federalist-404-page/site/<OWNER>/<REPO>/404.html.
More info/context in #79 (comment)
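The check-then-copy behavior described above can be sketched like this (paths and names are illustrative assumptions, not the builder's actual layout):

```python
import shutil
from pathlib import Path

def ensure_404(site_dir: Path, default_404: Path) -> None:
    """Copy the platform default 404 page only when the site build
    didn't produce its own 404.html."""
    target = site_dir / "404.html"
    if not target.exists():
        shutil.copy(default_404, target)
```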
Build container code is instrumented and reporting in our New Relic dashboard.
Not totally sure if this is possible, but adding New Relic to monitor processes and collect errors might be a good idea.
The reasons I'm not totally sure it will work are:
Done: when we do this or decide it's not a good idea.
We aren't passing the CONFIG value when the GENERATOR is hugo as we do when it is jekyll. We should probably do that.
The bash scripts in this repo are, I think, complex enough to warrant using a higher-level language instead. Doing so will likely make testing (which is currently not done at all) easier, which should be a pretty high priority in my opinion.
To Do:
The README and repo description only mention that Jekyll is used to build sites, but since #51 was merged, hugo is also a valid build engine.
We should update all docs to say something like "a static site build engine" in place of "Jekyll" and include a list of supported build engines (only jekyll and hugo for now).
Current LTS version is v8.9.1 (https://nodejs.org/en/download/)
It's currently 1.07GB, which with the current cloud.gov app cap of 2GB leaves less than 1GB of space for the site builds themselves.
We'll soon (maybe by tomorrow) be able to allocate more than 2GB to the cloud.gov app instances, but it would still be good practice to reduce the image size.
clone.sh contains this line:
echo "[clone.sh] Pushing site to $SOURCE_OWNER/$SOURCE_REPO"
(ref here)
which makes the logs look like something is being pushed to the template repo. In fact, the push is done to the site repo. It should be changed to ...site to $OWNER/$REPOSITORY...
Minor, but this line log_output "publish.sh" "$output" logs output as coming from "publish.sh", even though that script has been replaced by the Ruby script "publish.rb".
This regression is in the staging branch, which is not currently deployed to the staging environment.
The regression was likely introduced in #121 due to various Path and /work directory changes.
Logging line showing the offending upload path during the publish step:
Uploading site/jseppi/static-index-test//work/site_repo/_site
level of effort: extremely low
Followed procedure at https://before-you-ship.18f.gov/security/static-analysis/#gemnasium, which previously worked for me, but had no success today with getting Gemnasium scanning working for this repo.
May need to contact Gemnasium support.
When we run our scripts we should make sure that their environment inherits the minimum environment variables necessary for them to do their work.
When this gets resolved, we should open an issue for the execution.
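One way to inherit only the minimum environment is an allowlist when spawning build steps. This is a sketch; the allowlist contents and function name are illustrative assumptions:

```python
import os
import subprocess

def run_step(cmd, allowed=("PATH", "HOME", "LANG")):
    """Run a build step with only an allowlisted environment.

    Variables not on the allowlist (tokens, AWS credentials, etc.) are
    withheld from the child process entirely.
    """
    env = {k: v for k, v in os.environ.items() if k in allowed}
    return subprocess.run(cmd, env=env, check=True,
                          capture_output=True, text=True)
```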
We recently had an issue where a user's site repository contained filenames with smart quotes and smart apostrophes. For example: accomplishing-agency’s_view=public.html
The upload will choke on these somewhere down in boto when it attempts to do an encode:
...
File "/usr/local/lib/python3.6/site-packages/botocore/utils.py", line 332, in percent_encode
input_str = input_str.encode('utf-8')
UnicodeEncodeError: 'utf-8' codec can't encode characters in position 131-133: surrogates not allowed
I don't quite know what that UnicodeEncodeError means, other than it doesn't seem to like the ’ character.
This article might shed some light on the situation: http://lucumr.pocoo.org/2013/7/2/the-updated-guide-to-unicode/
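For what it's worth, "surrogates not allowed" usually means the filename's bytes were decoded with Python's surrogateescape handler (the default for filesystem names that aren't valid UTF-8), leaving lone surrogates in the string that a strict UTF-8 encode then rejects. A sketch of one way to sanitize such names (an assumption, not the builder's actual fix):

```python
def to_safe_utf8(name: str) -> str:
    """Round-trip a filename through surrogateescape.

    Re-encoding with surrogateescape recovers the original raw bytes;
    decoding with "replace" then yields a clean, always-encodable string.
    """
    raw = name.encode("utf-8", "surrogateescape")
    return raw.decode("utf-8", "replace")
```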
The largest block of currently untested code is the main task in tasks/main.py. We should add tests for it.
Hi, I'm noticing in our Federalist builds that our .nvmrc version setting is not being used and build.sh is building the current latest (Node 8 & npm 5.0.0). Unless I'm mistaken about the purpose of these lines, you might need to make them
nvm install "$(cat .nvmrc)"
nvm use "$(cat .nvmrc)"
similar to how your RVM version is installed.
Very strange bug, but it seems that if you have a custom config containing a comment with an apostrophe in it, like
# my custom config
# this is tom's key:
key: 12345
Then the jekyll build will fail. Currently unsure why it fails on the apostrophe given that it is in a commented line.
Create a compatibility layer to allow invoking the existing tasks via command line, mapping arguments to environment variables to allow the rest of the functionality to remain unchanged.
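The compatibility layer might be as small as this (flag names here are illustrative assumptions, not the builder's actual interface):

```python
import argparse

def args_to_env(argv):
    """Translate command-line arguments into the environment-variable
    mapping the existing task code expects, so the tasks themselves
    remain unchanged."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--owner")
    parser.add_argument("--repository")
    parser.add_argument("--branch")
    args = parser.parse_args(argv)
    # Only set variables that were actually supplied on the CLI.
    return {k.upper(): v for k, v in vars(args).items() if v is not None}
```

The returned dict can then be merged into `os.environ` before invoking the existing entry point.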
The new build container only appends custom configs to _config.yml if it exists; the old one would create _config.yml if it didn't exist. Some sites, like https://github.com/18F/fedramp-dashboard/blob/federalist/gulpfile.js, relied on the previous behavior. We need to modify the new builder to behave the same way.
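The old behavior falls out naturally from opening the file in append mode, which creates it when absent (a sketch with assumed names, not the builder's actual code):

```python
from pathlib import Path

def append_custom_config(repo_dir: Path, custom_yaml: str) -> None:
    """Append the custom config, creating _config.yml if it's missing
    (matching the old builder's behavior)."""
    config = repo_dir / "_config.yml"
    with config.open("a") as f:  # mode "a" creates the file if absent
        f.write(custom_yaml.rstrip("\n") + "\n")
```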
Updated based on discussion below:
For dev instances, set an env var (like SKIP_LOGGING) from the web app. This var should tell the build container to skip POSTing logs and build statuses back to the originating web app (which is a local instance that can't be reached).
We should capture exceptions and continue when requests to logging endpoints fail. This will allow builds from dev instances to succeed even though the logs cannot be posted back to our local instances.
The previous build container code did not fail when log requests failed, so this is a regression.
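A sketch of the desired behavior, using only the standard library (the function name, SKIP_LOGGING handling, and return value are illustrative assumptions):

```python
import json
import logging
import urllib.error
import urllib.request

def post_build_log(url: str, payload: dict, skip: bool = False) -> bool:
    """POST build logs back to the originating web app, but never let a
    logging failure fail the build. Returns True only on a successful POST."""
    if skip:
        return False
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    try:
        with urllib.request.urlopen(req, timeout=10):
            return True
    except (urllib.error.URLError, OSError):
        # Capture the exception and continue: dev instances can't be
        # reached, and a failed log POST must not fail the build.
        logging.exception("Could not post build log; continuing build")
        return False
```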
When there is a custom (demo) domain, federalist-builder sets an empty value for BASEURL in the env for the target federalist-garden-build-container.
When BASEURL is empty, we should call hugo with hugo --baseURL '' .... However, the "default" value expression ${BASEURL-"''"} is not correct: the - form only substitutes the default when the variable is unset, so a set-but-empty BASEURL expands to an empty unquoted word that the shell removes entirely. As a result, the next token, --source, is consumed as the base URL value instead of being evaluated as a command flag.
hugo --baseURL ${BASEURL-"''"} --source . --destination ./_site
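The argument-dropping can be demonstrated without hugo by counting positional parameters (a sketch; quoting the expansion is one possible fix):

```shell
BASEURL=""
# ${VAR-default} substitutes only when VAR is *unset*; a set-but-empty
# BASEURL expands to an empty unquoted word that the shell drops, so
# the next flag would be consumed as the --baseURL value.
set -- --baseURL ${BASEURL-"''"} --source .
echo "unquoted arg count: $#"

# Quoting the expansion keeps the empty argument in place:
set -- --baseURL "$BASEURL" --source .
echo "quoted arg count: $#"
```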
As a Federalist developer, I want to see how my code changes affect my test coverage %.
README badge is not reporting any coverage:
Code Climate dashboard also shows no coverage reporting:
Maybe the reporting was tied to my Code Climate account key or something, and it broke after I left the 18F org?
Low
Once the ability to run builds via Cloud Foundry tasks is released, remove the ability to run them any other way.
Right now, this syncs individual files to S3, which could be faster for small changes, but it might be faster overall to just expect and return an archived bundle.
Need to test. I experienced some timeouts locally with the new builder and big sites, like digitalgov.gov.
The current process of launching a build involves having several build container applications running in CF. For each build, an available application is reserved, its environment updated with the arguments required for the build, it is restaged, and the build is run. On completion, the build container continues to run but do nothing until it is reserved for another build.
This has the following consequences:
federalist-builder
Investigate using Cloud Foundry tasks to run individual builds.
This mitigates most, if not all, of the issues listed above. It also provides the ability to track builds from the outside, since they are assigned task ids, and to limit the number of concurrent builds by total node memory.
The build logs produce a scary error if the site does not have a "federalist" script in its package.json. This is because the builder attempts to run the script and swallows the error if it does not exist; a byproduct is a lot of noise about a missing script in the build logs.
We should check to see if the site has a Federalist script before trying to run one.
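The check could be as simple as reading package.json before running anything (a sketch with assumed names):

```python
import json
from pathlib import Path

def has_federalist_script(repo_dir: Path) -> bool:
    """Return True only if package.json exists, parses, and declares a
    "federalist" entry under "scripts"."""
    pkg = repo_dir / "package.json"
    if not pkg.exists():
        return False
    try:
        data = json.loads(pkg.read_text())
    except json.JSONDecodeError:
        return False
    return "federalist" in data.get("scripts", {})
```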
Currently says "Docker Builder" but that's not really what this is.
There isn't any documentation about how a person new to the project would get this docker image set up locally for development.
We should add a section about development to the README.md so that it is easier for folks to contribute.
A user should be able to build their site on Federalist without receiving error messages in their build log that rvm is out of date.
During a client support issue, messages about rvm versions distracted Federalist Support from seeing the actual error blocking the build. We should make sure Federalist is current in supporting the latest Ruby and Go versions.
This is necessary because we are migrating our sites from S3 in E/W to S3 in GovCloud.
To enable us to use one container image for multiple environments, we should set AWS_DEFAULT_REGION as a variable provided to the container environment by the main web app here: https://github.com/18F/federalist/blob/master/api/services/SQS.js#L38
We use a single docker image for all the types of builds (jekyll, hugo, javascript, etc). On every build, we re-install the dependencies which slows down the build time. This could take several minutes to do. Instead, we could use different docker images for each build type. We could pre-install common versions of ruby so rvm only needs to switch versions, not download and install.
We could also add some kind of caching for dependencies (gems, node modules, etc).
This starts to look very much like how Cloud Foundry does build packs.
We believe some error messages that would be helpful to both us and partners for debugging unsuccessful builds are not properly recorded in build logs. This is likely because only stdout is logged, not stderr (ref https://github.com/18F/federalist-garden-build/blob/staging/tasks/main.py#L91).
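A minimal sketch of the fix (not the builder's actual code): merging stderr into stdout with subprocess means error output flows through the same logging path as regular output.

```python
import subprocess

def run_and_log(cmd):
    """Run a command, merging stderr into stdout so error messages
    reach the build logs. Returns (exit code, combined output)."""
    proc = subprocess.run(cmd, stdout=subprocess.PIPE,
                          stderr=subprocess.STDOUT, text=True)
    return proc.returncode, proc.stdout
```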
Consider either using SQS to manage these messages (potentially easier once task failure/success is monitored by federalist-builder), OR configuring the status URL as an env variable in the application while the build token is provided as a build argument, which fits better with a 12-factor app methodology.