pages-build-container's Issues

publish.rb attempts to delete wrong files

I noticed this while porting publish.rb to Python. If two repos under the same owner have names where one is a prefix of the other, the logic that determines which files to delete wrongly picks up files from the second site. Luckily, the deletion logic won't actually delete those files, but the logs are confusing.

Example:
site 1: jseppi/test-federalist-site
site 2: jseppi/test-federalist-site-2

When publishing jseppi/test-federalist-site, the prefix argument sent to S3's list_files API will also match objects whose keys are prefixed with jseppi/test-federalist-site-2, since the former is a prefix of the latter.

Logs will look like this (notice the -2 in the files being deleted):

...
Deleting: site/jseppi/test-federalist-site/-2/assets/vendor/uswds/fonts/merriweather-bold-webfont.eot... Done (0.015464146s)
Deleting: site/jseppi/test-federalist-site/-2/assets/vendor/uswds/fonts/merriweather-bold-webfont.ttf... Done (0.014940206s)
Deleting: site/jseppi/test-federalist-site/-2/assets/vendor/uswds/fonts/merriweather-bold-webfont.woff... Done (0.017011489s)
...

The path to actually delete files is reconstructed in a manner that won't actually delete the second site's files, but it does lead to confusing logs.
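
A minimal sketch of the prefix problem and one possible fix, written in Python rather than taken from publish.rb. The helper name keys_for_site is hypothetical, and the key list stands in for whatever the S3 listing returns. The idea is to treat the site prefix as a directory by ending it with a slash before matching:

```python
def keys_for_site(all_keys, site_prefix):
    """Return only the keys that live under site_prefix, treated as a directory."""
    # A trailing "/" stops "site/a/b" from also matching "site/a/b-2/...".
    dir_prefix = site_prefix.rstrip("/") + "/"
    return [key for key in all_keys if key.startswith(dir_prefix)]


keys = [
    "site/jseppi/test-federalist-site/index.html",
    "site/jseppi/test-federalist-site-2/index.html",
]

# Naive prefix match: picks up both sites.
naive = [k for k in keys if k.startswith("site/jseppi/test-federalist-site")]

# Directory-style prefix match: only the site being published.
fixed = keys_for_site(keys, "site/jseppi/test-federalist-site")
```

The same slash-terminated string could be passed as the prefix to the S3 listing call, so the second site's objects are never returned in the first place.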

cc @jmhooper

Video files always appear to be modified and thus uploaded during Publish

I happened to be streaming some logs today when I noticed that successive builds of @davidewarren's site kept uploading a number of video files:

2017-12-01T13:22:22.45-0600 [APP/PROC/WEB/0] OUT [S3_PUBLISHER] 2017-12-01 19:22:22 INFO: New: 1
2017-12-01T13:22:22.45-0600 [APP/PROC/WEB/0] OUT [S3_PUBLISHER] 2017-12-01 19:22:22 INFO: Modified: 14
2017-12-01T13:22:22.45-0600 [APP/PROC/WEB/0] OUT [S3_PUBLISHER] 2017-12-01 19:22:22 INFO: Deleted: 0
2017-12-01T13:22:22.50-0600 [APP/PROC/WEB/0] OUT [S3_PUBLISHER] 2017-12-01 19:22:22 INFO: Uploaded site/davidewarren/pif-website/feed.xml in 0.03s
2017-12-01T13:22:22.52-0600 [APP/PROC/WEB/0] OUT [S3_PUBLISHER] 2017-12-01 19:22:22 INFO: Uploaded site/davidewarren/pif-website/sitemap.xml in 0.02s
2017-12-01T13:22:22.90-0600 [APP/PROC/WEB/0] OUT [S3_PUBLISHER] 2017-12-01 19:22:22 INFO: Uploaded site/davidewarren/pif-website/assets/videos/jeff-meisel-video.webm in 0.38s
2017-12-01T13:22:23.36-0600 [APP/PROC/WEB/0] OUT [S3_PUBLISHER] 2017-12-01 19:22:23 INFO: Uploaded site/davidewarren/pif-website/assets/videos/mikel-maron-video.mp4 in 0.45s
2017-12-01T13:22:23.90-0600 [APP/PROC/WEB/0] OUT [S3_PUBLISHER] 2017-12-01 19:22:23 INFO: Uploaded site/davidewarren/pif-website/assets/videos/maia-hansen-video.mp4 in 0.54s
2017-12-01T13:22:24.32-0600 [APP/PROC/WEB/0] OUT [S3_PUBLISHER] 2017-12-01 19:22:24 INFO: Uploaded site/davidewarren/pif-website/assets/videos/mikel-maron-video.webm in 0.42s
...

Must be something funky in our change detection code.

Publishing Static Site Fails in Build Step

I tried to publish a static site, but I get an error that says mv failed, cannot move _site to _site/_site. Looking into the code, it looks like this is a problem with the build.sh file. The commands create a folder with mkdir _site, then run mv * _site. Because the * glob also matches the newly created _site folder, mv attempts to move the folder into itself, which fails.
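
A sketch of the intended behavior, assuming the goal is simply "move everything in the repo root into _site, except _site itself". It is written in Python for illustration and is not the actual build.sh logic; move_into_site_dir is a hypothetical name.

```python
import shutil
import tempfile
from pathlib import Path


def move_into_site_dir(repo_dir, site_dirname="_site"):
    """Move every entry of repo_dir into repo_dir/_site, skipping _site itself."""
    repo = Path(repo_dir)
    site = repo / site_dirname
    site.mkdir(exist_ok=True)
    # Snapshot the listing first, since we modify the directory while looping.
    for entry in list(repo.iterdir()):
        if entry.name == site_dirname:
            continue  # the entry that `mv * _site` trips over
        shutil.move(str(entry), str(site / entry.name))
    return site


with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "index.html").write_text("<h1>hi</h1>")
    (Path(tmp) / "assets").mkdir()
    site_dir = move_into_site_dir(tmp)
    moved = sorted(p.name for p in site_dir.iterdir())
```

Note that iterdir() also returns dotfiles such as .git, which the shell glob * skips by default, so a real implementation would need to decide whether those should be moved.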

Integrate mypy

Using mypy would be great to make sure methods are getting and returning expected types.

Move pushing remote repo out of the garden build

The garden build should be responsible for just building one repo. In the case of a user creating a new repo from a Federalist starter, the builder currently takes on the extra responsibility of pushing the files from the starter repo to the user's repo before building. Instead, this logic should happen outside of the builder, triggering a normal build.

Support custom Cache-Control headers

@davemcorwin commented on Thu Nov 21 2019

User Story

Enable Federalist partners to configure Cache-Control headers for their content.

Background (Optional)

Federalist partners are currently unable to set custom Cache-Control headers for their content and all content is served with the Federalist default.

The current default is cache-control: max-age=60, which means that files are only cached for 1 minute by browsers and the CDN.

This is not ideal for our partners' users, who could be getting faster responses from the browser cache and the CDN.

We should enable and add documentation about best practices for caching.

This is also a good time to start implementing a more general configuration file for Federalist configuration instead of piggybacking off of the Jekyll _config.yml.

In general, I propose following the APIs used by Netlify Configuration File and Netlify Headers.

Acceptance Criteria

  • Partners can configure Cache-Control headers by path
  • Add documentation to Federalist Docs regarding caching best practices
    TLDR:
    • Add unique hash digits to your css, js, and image files (like app-67ce7f3483.css)
    • For js, css, and image files, set Cache-Control: public, max-age=31536000
    • For html files, use Cache-Control: no-cache

Level of effort - medium

Implementation outline (if higher than "low" effort):

  • Determine format and name for the federalist configuration file

As json is probably going to be more familiar to our users than toml, I propose a federalist.json file that lives in the root of the project.

  • Determine format for header configuration
    Ex.
// federalist.json
// spec
{
  "headers": [
    {
      "<path>": {
        "<header-name>": "<header-value>",
        ...
      }
    },
    ...
  ]
}

// example
{
  "headers": [
    {
      "/*.js": {
        "cache-control": "public, max-age=31536000"
      }
    }
  ]
}
  • The garden build should read/parse federalist.json and have this configuration available
  • In the S3Publisher, when pushing objects to S3, include any cache-control headers that apply to the object based on its path. This may mean either taking the header information into account when determining whether an object should be replaced, or refactoring the logic to just replace all files.
  • Potentially add validation of the federalist.json file with feedback as part of the build process...
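
The outline above could be sketched roughly as follows. This is only an illustration under several assumptions that the issue does not settle: that "headers" is a list of single-key objects mapping a path pattern to its headers, that patterns are shell-style globs matched with Python's fnmatch (where * also matches "/"), and that the first matching rule wins. The function name headers_for_path is hypothetical.

```python
import json
from fnmatch import fnmatchcase

# Example federalist.json content (shape is an assumption, see above).
CONFIG_TEXT = """
{
  "headers": [
    {"/*.js": {"cache-control": "public, max-age=31536000"}},
    {"/*.html": {"cache-control": "no-cache"}}
  ]
}
"""

# The current Federalist default mentioned in the issue.
DEFAULT_HEADERS = {"cache-control": "max-age=60"}


def headers_for_path(config, path):
    """Return the default headers, overridden by the first rule matching path."""
    headers = dict(DEFAULT_HEADERS)
    for rule in config.get("headers", []):
        for pattern, values in rule.items():
            if fnmatchcase(path, pattern):
                # Normalize header names so lookups are case-insensitive.
                headers.update({k.lower(): v for k, v in values.items()})
                return headers
    return headers


config = json.loads(CONFIG_TEXT)
js_headers = headers_for_path(config, "/assets/app-67ce7f3483.js")
css_headers = headers_for_path(config, "/assets/app.css")
```

Here js_headers carries the long-lived cache-control value, while css_headers falls back to the default because no rule matches it.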

Local script to do deployments of the container

It would be good if there was a script to do deployments of this image to staging and prod without having to manually follow all the steps in https://federalist-docs.18f.gov/pages/how-federalist-works/cloud-gov-setup/#pushing-a-federalist-docker-build-image-to-the-registry

I'm thinking of something that would locally spin up a registry backed by S3, take an env (i.e. staging or prod), and run all the steps to build, push, and verify the image. I guess the Clair stuff should also be included in such a script, but I don't yet have a good handle on how that works.

I think we could make a nice docker-compose setup to do this.

Create a default 404 page for builds with a private bucket

User Story

When a user builds a site hosted in a private S3 bucket, create a default 404 page for the site.

Background (Optional)

Currently, the shared bucket uses a single default 404 page created for all sites. With this change, the default 404 page will be generated every time a site is built in a private S3 bucket. Adding this step may also allow sites to provide custom 404 pages.

Acceptance Criteria

  • federalist-garden-build generates a default 404.html for private bucket builds.
  • If site/owner/repo/404.html exists in static site build, we don't generate a default '404.html'
  • Change made live via deploy of garden-build.

After evaluating, edit this part:

Level of effort - Low/Medium

Implementation outline (if higher than "low" effort):

  • After building a site's static files, check to see if a 404.html exists
  • If no 404.html exists, generate 404.html from federalist-404-page
  • Allow site publishing to go forward as usual.
  • Update federalist website config to point to /site/<OWNER>/<REPO>/404.html
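
The first two steps of the outline could look roughly like this. It is a sketch only: ensure_404_page is a hypothetical name, and DEFAULT_404_HTML is placeholder markup standing in for whatever federalist-404-page actually provides.

```python
import tempfile
from pathlib import Path

# Placeholder content; the real page would come from federalist-404-page.
DEFAULT_404_HTML = "<!doctype html><title>Page not found</title><h1>404</h1>"


def ensure_404_page(site_dir, default_html=DEFAULT_404_HTML):
    """Write a default 404.html unless the build already produced one.

    Returns True if the default page was written, False if one already existed.
    """
    target = Path(site_dir) / "404.html"
    if target.exists():
        return False  # keep the site's own custom 404 page
    target.write_text(default_html, encoding="utf-8")
    return True


with tempfile.TemporaryDirectory() as built_site:
    created_first = ensure_404_page(built_site)   # nothing there yet
    created_second = ensure_404_page(built_site)  # now it exists, left alone
```

Running this check after the build and before publishing means the publish step itself does not need to change.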

Implement New Relic "background task" monitoring

Description of feature or bug

More info/context in #79 (comment)

Definition of done

Build container code is instrumented and reporting in our New Relic dashboard.

After evaluating, edit this part:

Level of effort - low-medium

Implementation outline (if higher than "low" effort):

  1. Add New Relic background task decorator or context manager to overarching tasks (main, clone, build, publish).
  2. Add/read necessary config values from the cf env or vcap services.
  3. Ensure reporting works properly in staging.

Explore/add New Relic process monitoring

Not totally sure if this is possible, but adding New Relic to monitor processes and collect errors might be a good idea.

The reasons I'm not totally sure about it working are

  1. this is not a web app,
  2. the containers are "ephemeral",
  3. the containers sleep after each use, which might end up being kinda funky in NR, and
  4. since sub-commands are basically shell'd out, I'm not sure if NR could "globally" track anything in them.

Done: when we do this or decide it's not a good idea

Use a higher level language instead of bash?

The bash scripts in this repo are, I think, complex enough to warrant using a higher-level language instead. Doing so will likely make testing (which is currently not done at all) easier, which should be a pretty high priority in my opinion.

To Do:

Update README and repo description to reflect that hugo can also be used

The README and repo description only mention that Jekyll is used to build sites, but since #51 was merged, hugo is also a valid build engine.

We should update all docs to say something like "a static site build engine" in place of "Jekyll" and include a list of supported build engines (only jekyll and hugo for now).

Reduce build container image size

It's currently 1.07GB, which with the current cloud.gov app cap of 2GB leaves less than 1GB of space for site builds themselves.

We'll soon (maybe by tomorrow) be able to allocate more than 2GB to the cloud.gov app instances, but it would still be a good practice to reduce the image size.

clone.sh output says it is pushing to the wrong repository

clone.sh contains this line:

echo "[clone.sh] Pushing site to $SOURCE_OWNER/$SOURCE_REPO" (ref here)

which makes the logs look like something is being pushed to the template repo. In fact, the push is done to the site repo. It should be changed to ...site to $OWNER/$REPOSITORY...

Staging bug: work directory is uploaded during publish

This regression is in the staging branch, which is not currently deployed to the staging environment.

Regression was likely introduced in #121 due to various Path and /work directory changes.

Logging line showing offending upload path during publish step:

Uploading site/jseppi/static-index-test//work/site_repo/_site

Research how to manage environment for build scripts

When we run our scripts we should make sure that their environment inherits the minimum environment variables necessary for them to do their work.

When this gets resolved, we should open an issue for the execution.
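
One possible approach, sketched in Python: build the child environment from an explicit allowlist instead of inheriting everything. The allowlist contents, the helper name minimal_env, and the example variables (including AWS_SECRET_ACCESS_KEY as a stand-in for a secret the script should not see) are assumptions for illustration, not a statement of what the build scripts actually need.

```python
import os
import subprocess
import sys

# Hypothetical allowlist; the real minimum set would come out of the research.
ALLOWED_VARS = ("PATH", "HOME", "LANG")


def minimal_env(allowed=ALLOWED_VARS, extra=None, source=None):
    """Copy only the allowed variables from source, then add explicit extras."""
    source = os.environ if source is None else source
    env = {name: source[name] for name in allowed if name in source}
    env.update(extra or {})
    return env


# Stand-in for the container's full environment.
parent = {
    "PATH": os.environ.get("PATH", ""),
    "HOME": "/home/app",
    "AWS_SECRET_ACCESS_KEY": "not-for-build-scripts",
}
child_env = minimal_env(extra={"BASEURL": "/site/owner/repo"}, source=parent)

# The child process only sees what was passed via env=.
result = subprocess.run(
    [sys.executable, "-c", "import os; print(sorted(os.environ))"],
    env=child_env,
    stdout=subprocess.PIPE,
    universal_newlines=True,
    check=True,
)
```

Passing env= to subprocess.run replaces the inherited environment entirely, so anything not in the allowlist or the extras never reaches the script.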

UnicodeEncodeError: surrogates not allowed / Dealing with invalid characters in filenames

We recently had an issue where a user's site repository contained filenames that had smart quotes and smart apostrophes. For example: accomplishing-agency’s_view=public.html

The upload will choke on these, somewhere down in boto when it attempts to do an encode:

...
  File "/usr/local/lib/python3.6/site-packages/botocore/utils.py", line 332, in percent_encode
    input_str = input_str.encode('utf-8')
UnicodeEncodeError: 'utf-8' codec can't encode characters in position 131-133: surrogates not allowed

I don't quite know what that UnicodeEncodeError means other than it doesn't seem to like the character.

This article might shed some light on the situation: http://lucumr.pocoo.org/2013/7/2/the-updated-guide-to-unicode/
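
One likely reading of the error, offered as an assumption rather than a confirmed diagnosis: Python decodes filenames with the "surrogateescape" error handler, so any byte it cannot decode under the process's filesystem encoding becomes a lone surrogate character. If the container runs with an ASCII (C/POSIX) locale, the three UTF-8 bytes of ’ would turn into three surrogates, which matches "position 131-133" in the traceback. Such a string cannot be encoded as UTF-8 again. The sketch below reproduces that situation without touching the filesystem; the helper names are hypothetical.

```python
# Bytes as they would be stored on disk for a name containing a smart apostrophe.
raw_name = "agency’s_view=public.html".encode("utf-8")

# What Python would hand back when listing the directory under an ASCII locale
# (assumption: the container's filesystem encoding is ASCII).
listed_name = raw_name.decode("ascii", errors="surrogateescape")


def is_uploadable(name):
    """Return True if name can be encoded as UTF-8 (no lone surrogates)."""
    try:
        name.encode("utf-8")
    except UnicodeEncodeError:
        return False
    return True


def repair_filename(name):
    """Recover the intended text, assuming the bytes on disk were UTF-8."""
    original_bytes = name.encode("utf-8", errors="surrogateescape")
    return original_bytes.decode("utf-8")


broken = is_uploadable(listed_name)
repaired = repair_filename(listed_name)
```

If this reading is right, running the container with a UTF-8 locale would avoid the surrogates in the first place; otherwise a check like is_uploadable could at least turn the traceback into a clear message naming the offending file.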

Test the `main` task

The largest block of currently untested code is the main task in tasks/main.py. We should add tests for it.

build.sh not using user-defined .nvmrc node version

Hi, I'm noticing in our Federalist builds that our .nvmrc version setting is not being used and build.sh is building with the current latest (node 8 & npm 5.0.0). Unless I'm mistaken about the purpose of these lines, you might need to change them to

nvm install "$(cat .nvmrc)"
nvm use "$(cat .nvmrc)"

similar to how your RVM version is installed.

An apostrophe in a comment of a custom config will break a jekyll build

Very strange bug, but it seems like if you have a custom config that has a comment that has an apostrophe in it, like

# my custom config
# this is tom's key:
key: 12345

Then the jekyll build will fail. Currently unsure why it fails on the apostrophe given that it is in a commented line.

Don't fail Federalist dev instance-launched builds when log/status requests fail

Updated based on discussion below:

For dev instances, set an env var (like SKIP_LOGGING) from the web app. This var should indicate to the build container to skip POSTing logs and build statuses back to the originating web app (which is a local instance that can't be reached).

We should capture exceptions and continue when requests to logging endpoints fail. This will allow builds from dev instances to succeed even though the logs cannot be posted back to our local instances.

The previous build container code did not fail when log requests failed, so this is a regression.
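
A minimal sketch of both behaviors described above: skipping the request when SKIP_LOGGING is set, and swallowing failures otherwise. The function post_safely, the injected send callable, and the localhost URL are hypothetical; in the real container send would be whatever function performs the HTTP POST.

```python
import logging
import os

logger = logging.getLogger("build")


def post_safely(send, url, payload, environ=None):
    """Call send(url, payload) without ever raising.

    Returns True only if the request was attempted and succeeded.
    """
    environ = os.environ if environ is None else environ
    if environ.get("SKIP_LOGGING"):
        return False  # dev instance: the web app cannot be reached anyway
    try:
        send(url, payload)
    except Exception as err:  # broad on purpose: logging must not fail a build
        logger.warning("Could not post to %s: %s", url, err)
        return False
    return True


def unreachable(url, payload):
    """Stand-in for a POST to a local instance that cannot be reached."""
    raise ConnectionError("connection refused")


sent = []
log_url = "http://localhost:1337/log"  # hypothetical endpoint
ok = post_safely(lambda u, p: sent.append(p), log_url, {"output": "hi"}, environ={})
failed = post_safely(unreachable, log_url, {"output": "hi"}, environ={})
skipped = post_safely(unreachable, log_url, {}, environ={"SKIP_LOGGING": "true"})
```

In all three cases the build carries on; only the return value and a warning in the container's own log record what happened.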

BASEURL default value is not working for hugo builds

When there is a custom (demo) domain, federalist-builder sets an empty value for BASEURL that is set in the env for the target federalist-garden-build-container.

When BASEURL is empty, we should call hugo with hugo --baseURL '' .... However, the "default" value expression ${BASEURL-"''"} does not do this: the ${VAR-default} form only substitutes the default when the variable is unset, and here BASEURL is set but empty. The unquoted expansion therefore evaluates to a completely empty string without the single quotation marks, the shell drops it, and the string --source is taken as the base url instead of being evaluated as a command flag.

hugo --baseURL ${BASEURL-"''"} --source . --destination ./_site
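
Since the builder is being ported to Python, one way to sidestep the quoting problem is to build the command as an argument list, where an empty string stays a distinct argument. This is a sketch under that assumption; hugo_command is a hypothetical helper, and shlex.split is used only to approximate how the shell tokenizes the broken command line.

```python
import shlex


def hugo_command(baseurl, source=".", destination="./_site"):
    """Build the hugo argument list; an empty baseurl stays its own argument."""
    return [
        "hugo",
        "--baseURL", baseurl or "",
        "--source", source,
        "--destination", destination,
    ]


# What the shell effectively sees once the empty expansion has been dropped.
shell_words = shlex.split("hugo --baseURL  --source . --destination ./_site")

# What an argument list preserves.
cmd_empty = hugo_command("")
cmd_preview = hugo_command("/preview/owner/repo/branch")
```

In shell_words, "--source" directly follows "--baseURL" and so would be taken as its value, whereas cmd_empty keeps the empty string in between. The list can be handed to subprocess.run as-is, without shell=True.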

Test coverage badge and report are not working

User Story

As a Federalist developer, I want to see how my code changes affect my test coverage %.

Background from James

README badge is not reporting any coverage:

[Screenshot, 2018-06-20 at 10:42:55 AM]

Code Climate dashboard also shows no coverage reporting:

[Screenshot, 2018-06-20 at 10:43:45 AM]

Maybe the reporting was tied to my Code Climate account key or something and it broke after I left the 18F org?

Acceptance Criteria

  • The Code Climate test coverage icon displays a numeric percentage value

Level of Effort

Low

Sync files or use a bundle?

Right now, this syncs individual files to S3, which could be faster for small changes, but it might be faster overall to just expect and return an archived bundle.

Investigate launching builds using CF tasks

The current process of launching a build involves having several build container applications running in CF. For each build, an available application is reserved, its environment updated with the arguments required for the build, it is restaged, and the build is run. On completion, the build container continues to run but do nothing until it is reserved for another build.

This has the following consequences:

  1. Deploying the application can interrupt running builds leading to build failures that are not correctly reported
  2. While restaging should result in an isolated environment, some build failures appear to be caused by previous build failures (this has not been fully investigated)
  3. Arguments for the build are passed as environment variables into the build container, leading to confusion about what is an actual environment variable vs. an argument to the process
  4. Utilization of resources for build container applications that are running but not actually doing anything
  5. Difficulty in tracking the build process from the outside in the case of an unexpected exit
  6. Preset number of build container applications that doesn't vary with demand, capping the number of concurrent builds
  7. Logic must be maintained for running a queue of build container applications and keeping track of which are available, e.g. federalist-builder

Investigate using Cloud Foundry tasks to run individual builds.

  1. Launch a build container application with 0 instances
  2. For each build run a CF task using the 0-instance build container application, passing the arguments as command line arguments instead of environment variables to the launch script

This mitigates most, if not all, of the issues listed above. It also makes it possible to track builds from the outside, since they are just assigned task ids, and it limits the number of concurrent builds by total node memory.

Missing script "federalist" error in the build logs

The build logs produce a scary error if the site does not have a "federalist" script in its package.json. This is because it attempts to run the script and swallows the error if it does not exist. A byproduct of this is a lot of noise about a missing script in the build logs.

We should check to see if the site has a Federalist script before trying to run one.
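
A sketch of such a check, assuming the script is declared under "scripts" in the site's package.json as the issue describes. The helper name has_npm_script is hypothetical.

```python
import json
import tempfile
from pathlib import Path


def has_npm_script(repo_dir, script_name="federalist"):
    """Return True if repo_dir/package.json declares the given npm script."""
    package_json = Path(repo_dir) / "package.json"
    if not package_json.is_file():
        return False
    try:
        package = json.loads(package_json.read_text(encoding="utf-8"))
    except ValueError:
        return False  # malformed package.json: treat as "no script"
    scripts = package.get("scripts") if isinstance(package, dict) else None
    return isinstance(scripts, dict) and script_name in scripts


with tempfile.TemporaryDirectory() as with_script, \
        tempfile.TemporaryDirectory() as without_script:
    (Path(with_script) / "package.json").write_text(
        json.dumps({"scripts": {"federalist": "gulp build"}})
    )
    (Path(without_script) / "package.json").write_text(
        json.dumps({"scripts": {"test": "jest"}})
    )
    found = has_npm_script(with_script)
    missing = has_npm_script(without_script)
```

The build would then only invoke the script when the check returns True, so sites without one produce no error noise in the logs.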

Support latest ruby and go versions

User Story

A user should be able to build their site on Federalist without receiving error messages in their build log saying that rvm is out of date.

Background (Optional)

During a client support issue, messages regarding rvm versions were distracting Federalist Support from seeing the actual error message blocking the build. We should make sure Federalist is current in supporting latest ruby and go versions.

Acceptance Criteria

  • Run a build for a jekyll site with latest ruby version and no warnings are displayed regarding the version in the build log
  • Run a build for a hugo site with latest Golang version and no warnings are displayed regarding the version in the build log
  • @amirbey to post evidence of rvm change

After evaluating, edit this part:

Level of effort - Low

Implementation outline (if higher than "low" effort):

  • Rebuild and redeploy the federalist garden build image

Optimize container for building projects

We use a single docker image for all the types of builds (jekyll, hugo, javascript, etc). On every build, we re-install the dependencies which slows down the build time. This could take several minutes to do. Instead, we could use different docker images for each build type. We could pre-install common versions of ruby so rvm only needs to switch versions, not download and install.

We could also add some kind of caching for dependencies (gems, node modules, etc).

This starts to look very much like how Cloud Foundry does build packs.

Refactor handling of status callbacks

Consider either using SQS to manage these messages (potentially easier once task failure/success is monitored by federalist-builder) or configuring the status URL as an env variable in the application, while the build token is provided as a build argument, which fits better with a 12-factor app methodology.
