helix-pages's Introduction

helix-pages

Helix Pages is the Helix project behind https://*.project-helix.page/

Installation

Clone the current repo and run hlx up.

Deployment and publication

Direct commits to the master branch are blocked; all changes must come via a PR. Once the PR is merged into master, the CI runs hlx deploy and hlx publish for the new code.

Manual publishing

The project requires some extensions of the default VCL provided by Helix. The publish step must include the extensions in vcl/extensions.vcl. To publish, run:

 hlx publish --custom-vcl='vcl/extensions.vcl'

The patched version of the 3 subroutines adds parsing of the host URL to extract the content owner / repo, which overrides the one stored in the Fastly dictionary.

Testing with a new helix-publish version

  1. create a branch in helix-publish
  2. create a branch in helix-pages where the name contains -publish-ci
  3. Helix Pages test builds will use the helix-publish@ci version
  4. When everything works, merge the branch in helix-publish first, then in helix-pages

"Redeploy" the current version

It is sometimes useful to re-deploy the current version. Re-running the CI publish process does not work because hlx deploy uses the current helix-config.yaml and does not find "something new" to deploy. To trigger a fresh build, push an empty commit to master (you need admin privileges). The commit message must follow the semantic release conventions to trigger what is needed. Here is an example:

git commit --allow-empty -m "fix(ci): trigger a new clean release"

Incident management: revert to a previous working version

In case of an incident, you may want to revert the production environment to an earlier version.

Requirements: you need to clone the current project and add a .env file which contains:

HLX_FASTLY_AUTH=<your auth token>
HLX_FASTLY_NAMESPACE=<the helix-pages fastly service id>

Check out a previous working tag:

git checkout <tag>

and then run the publish command:

 hlx publish --custom-vcl='vcl/extensions.vcl'

After a few seconds, you can test a project like https://theblog--adobe.hlx.page/. Note that the browser cache needs to be clean, otherwise you may have false positives.

Tracing

All actions on Runtime and the Fastly service config are instrumented (via CircleCI env vars) with Epsagon tracing instructions in the "Helix Services" app.

How to use with Google Drive

Go to your Google Drive account

Create a new shared folder to hold the website root

  • Click on the button “+ New” > Folder
  • Give the folder a name (e.g., sitename)
  • Double-click the sitename folder to open

Share the folder with helix

Create a domain mount point on GitHub

  • Go to your GitHub home (e.g., https://github.com/username/ )
  • Select Repositories tab and green New button
  • Create a new repository
    • Give it a short name (for the domain)
    • Make it Public
  • Inside repository, create a new fstab file “fstab.yaml” with content:
mountpoints:
  /g: https://drive.google.com/drive/folders/{id}

where the URL is the Google Drive folder URL from above ({id} is the folder ID)

Create a file for each page of the site

  • Use drag & drop or the button “+ New” > Google Docs
  • Change the filename to index (no extension) or a page name (no extension)
  • Be sure to add an image (bug) and make a Heading 1 style title

Type your new website URL into a browser:

  https://{repo}-{username}.hlx.page/g/index.html

If you wish to view a different branch, you can use the following convention:

  https://{branch}--{repo}--{username}.hlx.page/g/index.html
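
The two URL conventions above can be sketched as a small helper. This is only an illustration: `buildPageUrl` and its parameter names are hypothetical and not part of Helix Pages.

```javascript
// Build a Helix Pages URL following the two conventions described above.
// `buildPageUrl` is a hypothetical helper used only for illustration.
function buildPageUrl({ username, repo, branch, path = '/g/index.html' }) {
  // Without a branch the host is {repo}-{username};
  // with a branch it is {branch}--{repo}--{username}.
  const host = branch
    ? `${branch}--${repo}--${username}`
    : `${repo}-${username}`;
  return `https://${host}.hlx.page${path}`;
}

console.log(buildPageUrl({ username: 'username', repo: 'sitename' }));
console.log(buildPageUrl({ username: 'username', repo: 'sitename', branch: 'mybranch' }));
```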

helix-pages's People

Contributors

adobe-bot, auniverseaway, constanceyu, danrocha, davidnuescheler, dependabot[bot], dominique-pfister, greenkeeper[bot], kptdobe, marquiserosier, ramboz, renovate-bot, renovate[bot], rofe, royfielding, semantic-release-bot, snyk-bot, spenhar, stefan-guggisberg, trieloff, tripodsan

helix-pages's Issues

ESI: silent on error

i think this may be the wrong repo to report this issue, but it is easiest for me to explain the issue in this context.

i am including a header.plain.html and footer.plain.html via ESI in the html.htl, which leads to a somewhat questionable 404 behavior of just displaying the path, which then in turn is inlined into the document.

see eg. here: https://helix-home-adobe.project-helix.page/hackathons/5-bsl.html?flush

it would be desirable to have an ESI flag (maybe that exists and i am just unaware) to just silently ignore errors in an ESI and not insert anything into the resulting response. (@tripodsan, @trieloff is there something like that?)

Provide a way to completely clear the cache of a Helix Pages website

For https://github.com/bdelacretaz/helix-example-zero which uses Helix Pages I'm running tests in the CircleCI job that verify the content published at https://helix-example-zero--bdelacretaz.project-helix.page/ - I think it's a great way to validate the whole publishing chain as well as make sure the examples work.

Currently, as per bdelacretaz/helix-example-zero#4 changes to index.md take a few minutes to propagate (even using a cache killer URL parameter) and changes to header/footer.md which are ESI-included take much longer, at least 45 minutes and maybe much more, I haven't seen a change in them since starting my tests.

For those tests to work I'd need a way to completely clear the cache of such a small example Helix Pages website.

Maybe a hlx-flush-cache=<secret> parameter as suggested by @davidnuescheler, with a simple secret to start with.

Support for owner / repo / branch names containing a dot

Try to open:

They both respond with an ERR_CERT_COMMON_NAME_INVALID error. The reason is simple: there is a dot in either the owner, repo or branch name, which is not supported at the certificate level.

Would be great if this could be supported but I have no clue if this would even be possible.

Maybe @trieloff or @tripodsan have an idea.

Resolving as won't fix is perfectly fine, we should just make sure this is documented somewhere.

cc @rofe.

helix-cli local development with fstab / google drive

currently i am getting an error when trying to develop a helix pages project locally that has a reference to a google drive folder.

[hlx] warn: google docs mountpoint needs a configured GOOGLE_DOCS_ROOT but is missing.

Prototype landing pages

Prototype the 3 following pages with helix-pages:

Goal is to make sure this can be easily achieved.
For a clean code version, this is blocked by multiple issues (like css override capabilities) but as for a poc, this is fine.

_logImpl error on theblog repo

Try to open https://theblog--davidnuescheler.hlx.page/posts/creating-adobe-experience-platform-pipeline-with-kafka.html
You get a 500 error.

A quick trace reveals the following error:

$  wsk activation logs 9bd431b4303a42369431b4303a423687
2019-12-17T14:50:57.840Z       stderr: debug Constructing HTML Pipeline
2019-12-17T14:50:57.841Z       stderr: debug Running HTML pipeline
2019-12-17T14:50:57.842Z       stderr: trying to load https://raw.githubusercontent.com/davidnuescheler/theblog/1cd6616c3638cb6c007f649d2d2482fda4be9e3a/fstab.json
2019-12-17T14:50:58.020Z       stderr: resourcePath=/posts/creating-adobe-experience-platform-pipeline-with-kafka
2019-12-17T14:50:58.020Z       stderr: relPath=/posts/creating-adobe-experience-platform-pipeline-with-kafka
2019-12-17T14:50:58.021Z       stderr: fetching Markdown from https://adobeioruntime.net/api/v1/web/helix/helix-services/word2md@latest?rid=2W0bGtNESm8ETv0aWZyVGht3ci6X79w4&src=davidnuescheler%2Ftheblog%2F1cd6616c3638cb6c007f649d2d2482fda4be9e3a&path=%2Fposts%2Fcreating-adobe-experience-platform-pipeline-with-kafka.docx&shareLink=https%3A%2F%2Fadobe.sharepoint.com%2Fsites%2FTheBlog%2FShared%2520Documents%2Ftheblog%3Fcsf%3D1%26e%3D8Znxth
2019-12-17T14:51:01.554Z       stderr: [ERROR] Exception during #04/use:find from /nodejsAction/XXchKBna/html.js:163608:
TypeError: Cannot read property '_logImpl' of undefined
    at debug (/nodejsAction/XXchKBna/html.js:7069:19)
    at embed (/nodejsAction/XXchKBna/html.js:226419:5)
    at map (/nodejsAction/XXchKBna/html.js:226443:7)
    at preorder (/nodejsAction/XXchKBna/html.js:226480:30)
    at bound (/nodejsAction/XXchKBna/html.js:226489:14)
    at Array.map (<anonymous>)
    at preorder (/nodejsAction/XXchKBna/html.js:226483:35)
    at map (/nodejsAction/XXchKBna/html.js:226476:10)
    at find (/nodejsAction/XXchKBna/html.js:226437:3)
    at execFns (/nodejsAction/XXchKBna/html.js:8431:17)

@koraa @tripodsan This seems to be logger related... any idea?

cc @dominique-pfister

[Google Drive] Add integration tests

The todo:

  • Create a helix-pages-tests-googledrive repository that contains some stable sample content leveraging Google Drive integration
  • Write some integration tests that validate that the Google Drive integration works as expected
  • Trigger the tests as part of the current CI

Provide external images source through fastly

With the google docs integration, the images embedded in a document are served directly from the google-usercontent server. This has the disadvantage that they are not cached by fastly. This is especially a problem since the google-usercontent blobs expire after some time.

The idea is to create (or extend an existing) action that can proxy any external image. In order to prevent XSS problems, it should check the accept header of the request and the content-type header of the external resource accordingly.

Suggestion

  • the external image links are rewritten to /${base64(url)}.external.image
  • the url is the full url of the external location
  • the VCL (or helix-dispatch) is extended so that all /.*.external.image requests are forwarded to helix-static, including the accept header.
  • helix-static detects the external image request
  • rejects requests with missing or invalid accept headers (404 or 415)
  • helix-static fetches the external content (abort request, if MAX_BYTES is exceeded).
  • if the content-length of the response is < MAX_BYTES the content is returned accordingly.
  • otherwise a 307 redirect is returned with the external url as location
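
The rewriting and size rules in the list above could look roughly like the following sketch. Everything here is an assumption for illustration: the function names are hypothetical, the issue gives no value for MAX_BYTES, and base64url (rather than plain base64) is used so that the encoded value contains no "/" or "+". The `base64url` encoding requires a reasonably recent Node.js.

```javascript
// Hypothetical limit; the issue names MAX_BYTES without giving a value.
const MAX_BYTES = 1024 * 1024;

// Rewrite an external image URL to a local path: /<encoded url>.external.image
// Assumption: base64url is used so the path segment stays URL-safe.
function toExternalImagePath(url) {
  return `/${Buffer.from(url, 'utf8').toString('base64url')}.external.image`;
}

// Recover the external URL from a rewritten path, or null if it does not match.
function fromExternalImagePath(path) {
  const match = /^\/([A-Za-z0-9_-]+)\.external\.image$/.exec(path);
  return match ? Buffer.from(match[1], 'base64url').toString('utf8') : null;
}

// Decide how to respond once the external content-length is known:
// proxy small images, redirect to the external URL otherwise.
function decideResponse(url, contentLength, maxBytes = MAX_BYTES) {
  if (contentLength < maxBytes) {
    return { statusCode: 200, proxy: true };
  }
  return { statusCode: 307, headers: { location: url } };
}
```

A missing content-length (undefined) falls through to the redirect branch in this sketch, which errs on the side of not proxying content of unknown size.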

Quick Solution

For a quick solution, we add an external_image.js to helix-pages that just checks the accept header and then sends a redirect.
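
A minimal sketch of what such a script might do is shown below. It is not the actual external_image.js: the `accept` and `path` parameter names, the base64url path encoding and the chosen status codes are assumptions, and the accept check is a deliberately simple heuristic.

```javascript
// Hypothetical entry point: `accept` is the request's accept header,
// `path` is the rewritten /<encoded url>.external.image path.
function externalImage({ accept = '', path = '' }) {
  // Simple heuristic: require at least one image/* media type.
  const wantsImage = accept
    .split(',')
    .some((type) => type.trim().toLowerCase().startsWith('image/'));
  if (!wantsImage) {
    return { statusCode: 415, body: 'Unsupported Media Type' };
  }

  // Assumption: the URL is base64url-encoded in the path.
  const match = /^\/([A-Za-z0-9_-]+)\.external\.image$/.exec(path);
  if (!match) {
    return { statusCode: 404, body: 'Not Found' };
  }

  // No proxying in the quick solution: just redirect to the external URL.
  const location = Buffer.from(match[1], 'base64url').toString('utf8');
  return { statusCode: 307, headers: { location } };
}
```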

Discussion

a) Instead of adding extra logic to helix-static, the entire functionality could be implemented in VCL directly.

b) Instead of adding extra logic to helix-static, the entire functionality could be added directly to helix-dispatch

c) Instead of using helix-static, create a new action, specialized in image delivery. this could later contain more transformation functions.

/cc @trieloff @kptdobe @davidnuescheler

Write advanced integration tests

In #8, some basic "smoke" tests have been added. They test:

  • www.hlx.page returns some HTML
  • the subdomain extraction (all combination of <branch>--<repo>--<owner>)

Integration tests still need to be added for helix-pages features like:

  • htdocs overrides (style, head.html...)
  • static file delivery
  • ESI includes (header and footer)

My proposal is to create a helix-pages-tests-content repository which contains some stable sample content that covers the various capabilities offered by Helix Pages like:

  • if you dump a style.css at the root, it is the one delivered
  • if you dump a static.html file, it can be delivered
  • ...

include hash to source location to support removal of index documents

If a document is removed from the original source (eg onedrive) it might not be possible to re-compute the original path, hence it would not be possible to remove it from the index.

including a meta tag with a hash of the source location could solve this problem. eg:

<meta name="x-source-hash" content="agV4ggs82">

Custom Domain Support

In order to support custom domain mapping, it must be possible to override the subdomain -> repository mapping per custom domain.

there are 3 possibilities:

  1. add the lookup table to edge dictionary and modify the VCL lookup code respectively.
  2. move the repository resolver to an openwhisk action.
  3. create an own fastly service with a 'hard wired' VCL that uses the respective repository.

1) and 2) have the disadvantage that the mapping table needs to be maintained somewhere. the simplest would be to have a (text / json) file in the helix-pages repository that gets automatically updated when a new customer is registered. when using 3), we can use fastly to maintain this list.

Then 1) has the advantage that we can use the existing setup and just need to update/generate the VCL when the mapping table is updated. if this table is stored in the repository, this can be done automatically.

And 2) has the advantage that the resolution logic can be performed in a more sophisticated environment. the mapping table (unless stored in a DB) could be deployed inside the action. the mapping table and resolver could be kept in a separate repository. and the resolver action and dispatch action could be joined together with a sequence. this solution is overall more complex and probably more fragile.

With 3), using a dedicated service id for each customer is probably the most robust and might be the best option for upgrading the customer later to helix-proper. however, the management overhead of having many such services just for helix-pages might not justify this.

hybrid) in addition to the above, it is also possible to use the current helix-pages service for the automatic domain mapping, and create new services for the custom domains that bucket customers together. the advantage is a better separation but also brings a more complicated management overhead.

Note: currently, the limit for number of domains per service is 20, which can be negotiated.
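
To make the mapping table of options 1) and 2) concrete, here is a rough sketch of a lookup. The table contents, the field names and `resolveCustomDomain` are all hypothetical; in option 1) the same lookup would be expressed in VCL against an edge dictionary rather than in JavaScript.

```javascript
// Hypothetical mapping table: custom domain -> repository coordinates.
const CUSTOM_DOMAINS = new Map([
  ['www.example.com', { owner: 'example-owner', repo: 'example-repo', ref: 'master' }],
]);

// Look up a custom domain. Returns null when the host is not mapped, so the
// caller can fall back to the regular subdomain -> repository resolution.
function resolveCustomDomain(host, table = CUSTOM_DOMAINS) {
  // Host headers may carry a port and mixed case; normalize before the lookup.
  const key = host.toLowerCase().replace(/:\d+$/, '');
  return table.get(key) || null;
}
```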

[OneDrive] Add integration tests

The todo:

  • Create a helix-pages-tests-onedrive repository that contains some stable sample content leveraging Microsoft OneDrive integration
  • Write some integration tests that validate that the OneDrive integration works as expected
  • Trigger the tests as part of the current CI

Declarative DOM Overrides

I want to make sure that the HTML generated by Helix Pages has the correct wrapping tags, the correct class names and the correct inner tags. I'd like to have a configuration that allows me to specify what MDAST or DOM patterns will be wrapped or amended in which way.

URLs with inexistent repo or org result in errors in the log

Requesting non-existing repo/org combinations with Helix Pages will result in error level log entries. These should be caught and suppressed (or maybe logged at debug level)

To reproduce:

  1. request https://test--my-org.project-helix.page/index.html (example URL mentioned here, which happened to have been crawled by Google Bot)
  2. observe error log entry in Coralogix:
{
   "level"  :  "error" ,
   "ow"  : {
      (...)
  },
   "message"  :  "Unable to resolve branch name 404 failed to fetch git repo info (statusCode: 401, statusMessage: Authorization Required)" ,
   "timestamp"  :  "2019-10-29T04:01:32.127Z" 
}

Expected behavior:
No external action should be able to cause error log messages in our logs

Append the query string to ESI includes to work around the cache (temporary solution)

Using a cache killer (?x=<new value>) in the URL is a good way to work around the strong cache we have on Fastly. The problem is that the ESI includes are always the same, so once header / footer are cached, you need a cache purge on Fastly to get the latest version.
Until we have a proper solution for #17, we could simply append the query string to ESI includes so that they also benefit from the cache killer.
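
As a rough illustration of the idea, the following sketch appends the request's query string to every ESI include in a rendered HTML string. `appendQueryToEsiIncludes` is a hypothetical helper, not existing Helix code, and it only handles double-quoted src attributes.

```javascript
// Append the request query string to the src of each <esi:include> tag.
function appendQueryToEsiIncludes(html, queryString) {
  const query = queryString.replace(/^\?/, '');
  if (!query) return html; // nothing to append
  return html.replace(
    /(<esi:include\s[^>]*?\bsrc=")([^"]*)(")/g,
    (whole, before, src, after) => {
      // Keep an existing query on the include and extend it.
      const separator = src.includes('?') ? '&' : '?';
      return `${before}${src}${separator}${query}${after}`;
    },
  );
}

console.log(appendQueryToEsiIncludes(
  '<esi:include src="/header.plain.html"></esi:include>',
  '?x=123',
));
```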

Setup auto-publish

  • Setup the CI with secrets to deploy and publish
  • Setup restrictive PR rules
  • Don't allow Push to master
  • Auto deploy via CI on branch merge to master

Design 2.0

During last hackathon, @kamendola made some proposals for UI improvements. We should implement them.

@kamendola could you share all the resources necessary for implementation here ? thx.

Extract and productize re-usable code

The project should contain the minimum of custom code - some of the functions created here should be provided by the product like:

Those must be moved somewhere in helix (helix-pipeline) where they can be generically supported and tested...

cc @davidnuescheler

Action Required: Fix Renovate Configuration

There is an error with this repository's Renovate configuration that needs to be fixed. As a precaution, Renovate will stop PRs until it is resolved.

Location: config
Error type: Invalid allowedVersions
Message: The following allowedVersions does not parse as a valid version or range: "<14>"

Theme System for Helix Pages

@ramboz's presentation at the AES Hackathon made me think about adding a theme system to Helix Pages.

Background

Theming systems should not just allow to create static file overlays as we have in Helix Pages right now, but they should be able to change the fundamental structure of the generated HTML to reflect different needs and usage patterns of sites.

While it is possible to build a blog using Helix Pages right now (see theblog), it requires larger amounts of client-side scripting that are non ideal. With a proper theming system, a user can select what kind of site they are building (collection of micro sites, blog post, documentation site, etc) and the HTML will follow the best practices.

It also allows us to establish a developer ecosystem of developers who are creating their own themes as forks of Helix Pages, deployed to Adobe I/O Runtime, that can be swapped in dynamically, without any coding or deployment required from the author.

Requirements

  • users should be able to preview a theme before applying it
  • users should be able to select a theme by placing a THEME file into their repository. The THEME file will point to the theme URL

Theme Rendering Implementation

  • Theme preference can be indicated through a X-Theme cookie. The VCL will use the X-Theme cookie value to resolve the correct action to execute.
  • If no X-Theme cookie has been set, the helix-theme web action will be called. This action reads the THEME file and returns the value in the X-Theme header. The response will be cached for a reasonable amount of time (1 hour to start).
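
The resolution order described above (cookie first, then the helix-theme action) might be sketched like this. The function names are hypothetical, and `lookupTheme` merely stands in for a call to the proposed helix-theme web action; caching is left out.

```javascript
// Extract the X-Theme value from a Cookie header, or null if it is not set.
function getThemeFromCookie(cookieHeader = '') {
  for (const pair of cookieHeader.split(';')) {
    const index = pair.indexOf('=');
    if (index === -1) continue;
    if (pair.slice(0, index).trim() === 'X-Theme') {
      return pair.slice(index + 1).trim();
    }
  }
  return null;
}

// Prefer the cookie; otherwise ask the (hypothetical) theme lookup,
// which would read the THEME file from the repository.
function resolveTheme(cookieHeader, lookupTheme) {
  return getThemeFromCookie(cookieHeader) || lookupTheme();
}
```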

Theme Selection UI

  • web based
  • requires GH authentication and selection of repository
  • theme URL is provided as a URL parameter (enabling "preview in Helix" links on the theme home page)
  • UI drops the correct Cookie and loads the preview in an IFrame
  • On save, the THEME file will be written and the theme cache will be cleared

Write Unit Tests

From day 1, html.pre.js and plain_html.pre.js contain a lot of code already. While a lot has to go away (see #5 and #6), we still need to unit test the pre function to make sure they behave as expected.

helix-pages smoke test service selection is not robust...

the test-service is chosen based on the modulo 3 of the circle CI number.
the problem arises if there are several PR running tests in parallel, the CI number doesn't increase sequentially and the chances are high that a PR uses a service already in use.

I don't know of a better solution, though....maybe base the modulo on the PR number instead?
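
A small sketch of the selection logic shows why parallel runs can collide. The service names are placeholders and `selectTestService` is a hypothetical helper; the numbers are made up for illustration.

```javascript
// Placeholder names; the issue only implies that there are three test services.
const TEST_SERVICES = ['test-service-0', 'test-service-1', 'test-service-2'];

// Pick a service from a number (CI build number today, PR number as proposed).
function selectTestService(number, services = TEST_SERVICES) {
  return services[number % services.length];
}

// Two parallel builds whose numbers differ by a multiple of 3 collide:
console.log(selectTestService(101) === selectTestService(104)); // true

// Consecutive PR numbers map to different services:
console.log(selectTestService(40) === selectTestService(41)); // false
```

Basing the modulo on the PR number only reduces the risk: two open PRs whose numbers differ by a multiple of 3 would still share a service.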

supported / target browsers for helix pages

looking at current browser marketshare on:
http://gs.statcounter.com/browser-market-share/

i think that we should support the top 3 browsers for desktop and mobile,

...as of May 2019:

Desktop

  • Chrome - 69.09%
  • Firefox - 10.01%
  • Safari - 7.25%

Mobile

  • Chrome - 59.43%
  • Safari - 20.81%
  • Samsung Internet - 7.12%

i don't know that we would need to be very specific about the versions of these...

i am definitely open to supporting more browsers if we want to, but initially i don't think we need to make the list much bigger.

Clean up scripts

the following scripts are no longer needed as the respective mechanism is provided by the helix dispatcher:

src/
├── css.js
├── js.js
├── min_js.js
├── plain_html.htl

Prototype docs pages

Support for google docs mounts

For easy authoring it is cool to use a content provider like google docs for certain parts of a webpage. experimentally, this will be implemented in the helix-pages scripts but could eventually be moved to helix proper.

Approach

  1. user shares a google drive folder with the helix bot
  2. user puts a fstab.json in his helix-pages repository, containing a mount root and a url pointing to the shared google drive folder: eg:
{
  "mountpoints": [{
      "root": "/mystuff",
      "url": "https://drive.google.com/drive/u/1/folders/1I_FwT5qXkZTevAeZ9EqUqLaS0RbLFkI2"
    }]
}
  3. the pipeline will load the fstab and use the information to fetch the markdown from a google-docs provider (could be implemented as a runtime action in the future, using the docs/drive API directly)
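
The lookup step of this approach could be sketched as follows, using the fstab.json shape from the example above. `resolveMount` is a hypothetical helper; how the pipeline actually matches mount points is not specified in this issue.

```javascript
// Find the mount point that covers a resource path and compute the path
// relative to the mounted folder. Returns null if no mount point matches.
function resolveMount(resourcePath, fstab) {
  for (const { root, url } of fstab.mountpoints) {
    // Match the root itself or anything below it, but not e.g. /mystuffing.
    if (resourcePath === root || resourcePath.startsWith(`${root}/`)) {
      return { url, relPath: resourcePath.slice(root.length) || '/' };
    }
  }
  return null;
}

const fstab = {
  mountpoints: [{
    root: '/mystuff',
    url: 'https://drive.google.com/drive/u/1/folders/1I_FwT5qXkZTevAeZ9EqUqLaS0RbLFkI2',
  }],
};

console.log(resolveMount('/mystuff/docs/page', fstab));
```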

Support cache invalidation / purge

If you use helix-pages and request https://<repo>-<owner>.project-helix.page, pages get cached while you surf your website. Now, if you change some markdown in the owner/repo content repo, a cache purge needs to be done on Fastly. This is not optimal at all and limits the use of helix-pages.
Ideally: when an md file is changed, the corresponding page would be removed from the cache so that on the next call, it is re-rendered. It is also expected that pages ESI-including the modified page get invalidated.
Note that this might require several enhancements in the various Helix stack layers.

Example project repos

At the Basel hackathon we discussed creating example project repos which get users started quickly and let them progressively discover more Helix (Pages) features:

  • First Example (e.g. https://github.com/bdelacretaz/helix-example-zero)
    • content only: index.md (or use README.md as fallback?)
    • custom header and footer: add header.md & footer.md
    • add html files
    • custom styling: add style.css as override
  • Advanced Example
    • custom code: add src folder with html.htl & html.pre.js
    • separate code and content: add helix-config.yaml
    • proxy strain

static pdf file in content not delivered

Reproduction steps

  1. create a repo with index.md
  2. add file (e.g. test.pdf) next to index.md
  3. Link to ./test.pdf from index.md
  4. Go to https://repo-owner.project-helix.page/index.html
  5. Click link

Expected behavior

PDF gets rendered (or downloaded) by the browser

Actual behavior

The path to the PDF is shown (default behavior if ESI results in 404)

Create Integration Tests "infrastructure"

An important piece of the project is the customisation of the VCL code in order to extract the owner / repo / branch from the subdomain. Unfortunately, VCL code cannot be unit tested and requires some tricky regex.

Goal here is to write IT that validates against the Fastly environment that the different url patterns are recognised as expected. It is easy to construct urls that must return something assuming the underlying Git projects do not disappear:

https://hlxtest-tripodsan.project-helix.page/
https://hlxtest--tripodsan.project-helix.page/
https://hello-helix--stefan-guggisberg.project-helix.page/README.html
https://helix-test-davidnuescheler.project-helix.page/5-bsl.html
https://helix-test--davidnuescheler.project-helix.page/team.html
https://to-helix-pages--helix-test--davidnuescheler.project-helix.page/

We should probably create a stable Git repo for each of the test cases (owner, repo, branch, with and without a dash) that we can test against.
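
To make the expected results for the URLs above explicit, here is a sketch of the extraction in JavaScript. It is not the VCL implementation: `parseHost` is hypothetical, and the single-dash rule (split at the last dash) is an assumption inferred from the examples, which also means it cannot express an owner name containing a dash.

```javascript
// Extract { branch, repo, owner } from a *.project-helix.page host name.
function parseHost(host) {
  const suffix = '.project-helix.page';
  if (!host.endsWith(suffix)) return null;
  const subdomain = host.slice(0, -suffix.length);

  // Double dashes separate the parts unambiguously.
  if (subdomain.includes('--')) {
    const parts = subdomain.split('--');
    if (parts.length === 2) return { branch: null, repo: parts[0], owner: parts[1] };
    if (parts.length === 3) return { branch: parts[0], repo: parts[1], owner: parts[2] };
    return null;
  }

  // Assumption: with single dashes only, the last dash separates repo and owner.
  const dash = subdomain.lastIndexOf('-');
  if (dash <= 0) return null;
  return { branch: null, repo: subdomain.slice(0, dash), owner: subdomain.slice(dash + 1) };
}

console.log(parseHost('to-helix-pages--helix-test--davidnuescheler.project-helix.page'));
```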

idx_json

idx json doesn't work when content is fetched via fstab.

proposing to wrap sections into <main>

looking at some of the dom manipulations and usage of selectors, it may be both semantically correct and more convenient to wrap the sections into a <main> tag, which also seems to be symmetrical to the use of <header> and <footer>
