
pdf-service's Introduction

WeasyPrint in Docker

Stateless HTTP API to convert HTML to PDF


A dockerized HTTP service that generates PDF files from HTML using WeasyPrint. The primary use case is generating documents, such as invoices, from developer-controlled templates. It is not meant to be a general web-page-to-PDF converter. The service expects the input HTML and other resources to be safe and doesn't do any of the hardening or sandboxing that arbitrary inputs would require. Please consult the security section of this document.

Usage

Run the docker image mormahr/pdf-service and POST the HTML to /generate on port 8080.

Consult the API section for details about supported features and how to use them. See the deployment section (security in particular) for best practices in production environments.

docker run --rm -d --name pdf -p 8080:8080 mormahr/pdf-service

curl \
  -H "Content-Type: text/html" \
  --data '<p>Hello World!</p>' \
  http://localhost:8080/generate \
  > hello_world.pdf

docker stop pdf

API

Basic "simple" API without asset support

Make a POST request to /generate with the HTML file you want to render as the body. The response will be the PDF file.

curl \
  -H "Content-Type: text/html" \
  --data '<p>Hello World!</p>' \
  https://pdf.example.com/generate \
  > hello_world.pdf
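The same request can be made from Python with only the standard library. This is a minimal sketch, assuming the service is reachable at http://localhost:8080 as in the Usage example; `build_generate_request` and `generate_pdf` are hypothetical helpers, not part of pdf-service.

```python
import urllib.request

# Assumption: the container was started with `-p 8080:8080` as in the Usage section.
DEFAULT_BASE_URL = "http://localhost:8080"


def build_generate_request(html: str, base_url: str = DEFAULT_BASE_URL) -> urllib.request.Request:
    """Build a POST /generate request that sends the HTML as the raw request body."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/generate",
        data=html.encode("utf-8"),
        headers={"Content-Type": "text/html"},
        method="POST",
    )


def generate_pdf(html: str, base_url: str = DEFAULT_BASE_URL, timeout: float = 30.0) -> bytes:
    """Send the HTML to the service and return the response body (the PDF bytes)."""
    with urllib.request.urlopen(build_generate_request(html, base_url), timeout=timeout) as response:
        return response.read()


# Usage, with the service running:
#   pdf = generate_pdf("<p>Hello World!</p>")
#   with open("hello_world.pdf", "wb") as f:
#       f.write(pdf)
```

Keeping request construction separate from sending makes it easy to reuse the request with other HTTP clients or to inspect it in tests.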

Multipart API

Make a POST request to /generate with a Content-Type of multipart/form-data. Provide your HTML input as index.html and add any other required assets. Assets can be referenced in the HTML by either an absolute URL like /image.png or a relative one like image.png; relative URLs are resolved against /. Omit the leading slash in the multipart/form-data name attribute.

curl \
  -F index.html=@index.html \
  -F image.png=@image.png \
  -F sub-path/image.png=@sub-path/image.png \
  https://pdf.example.com/generate \
  > hello_world.pdf
<!-- index.html -->
<p>With an image:</p>
<img src="/image.png" />
<img src="/sub-path/image.png" />
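A multipart request like the curl example above can also be built with Python's standard library. This is a sketch under stated assumptions: `build_multipart_request` and `field_name_for_src` are hypothetical helpers, each part's content type is guessed from its file extension, and the service is assumed to accept any standard multipart/form-data body. `field_name_for_src` mirrors the rule described above: resolve the reference against / and drop the leading slash.

```python
import mimetypes
import urllib.parse
import urllib.request
import uuid


def field_name_for_src(src: str) -> str:
    """Map an asset reference from the HTML to its multipart field name.

    Relative references are resolved against "/" and the leading slash is dropped,
    so "image.png" and "/image.png" both map to "image.png".
    """
    return urllib.parse.urljoin("/", src).lstrip("/")


def build_multipart_request(files: dict[str, bytes], url: str) -> urllib.request.Request:
    """Build a multipart/form-data POST containing index.html plus any assets.

    `files` maps asset paths (with or without a leading slash) to file contents.
    """
    fields = {field_name_for_src(name): content for name, content in files.items()}
    if "index.html" not in fields:
        raise ValueError("the HTML input must be provided as index.html")

    boundary = uuid.uuid4().hex  # random, so very unlikely to occur inside the file contents
    body = bytearray()
    for name, content in fields.items():
        content_type = mimetypes.guess_type(name)[0] or "application/octet-stream"
        filename = name.rsplit("/", 1)[-1]
        body += f"--{boundary}\r\n".encode("ascii")
        body += (
            f'Content-Disposition: form-data; name="{name}"; filename="{filename}"\r\n'
            f"Content-Type: {content_type}\r\n\r\n"
        ).encode("utf-8")
        body += content + b"\r\n"
    body += f"--{boundary}--\r\n".encode("ascii")  # closing boundary

    return urllib.request.Request(
        url,
        data=bytes(body),
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen` would send the same fields as the curl command, for example `{"index.html": html_bytes, "image.png": png_bytes, "sub-path/image.png": other_png_bytes}`.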

Deployment

Versioning

The Docker image is published as mormahr/pdf-service.

We follow semver as closely as possible, including for visual changes when we detect them. Accordingly, we tag release versions such as :1.1.0. We also provide major (:1) and minor (:1.1) tags that point to the latest minor or patch release in that line.

Images of the current development version are continuously pushed to the :edge tag. We strongly recommend that you use a release version instead of :edge.
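To illustrate how the floating tags behave, here is a small hypothetical Python helper that picks the release a major or minor tag would point to. It assumes plain MAJOR.MINOR.PATCH release versions (no pre-release suffixes, and not :edge) and only models the tagging scheme described above; it does not reflect the project's actual release tooling.

```python
def parse_version(version: str) -> tuple[int, ...]:
    """Split "1.10.2" into (1, 10, 2) so versions compare numerically, not as strings."""
    return tuple(int(part) for part in version.split("."))


def resolve_tag(tag: str, releases: list[str]) -> str:
    """Return the release that a tag like "1", "1.1" or "1.1.0" would point to."""
    prefix = parse_version(tag)
    # Keep releases whose leading components match the tag, e.g. 1.1.x for "1.1".
    matching = [r for r in releases if parse_version(r)[: len(prefix)] == prefix]
    if not matching:
        raise LookupError(f"no release matches tag :{tag}")
    return max(matching, key=parse_version)
```

Comparing integer tuples rather than strings matters once a minor or patch number reaches two digits: 1.10.0 is newer than 1.9.0.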

Licensing

The service code is licensed under the MIT license. WeasyPrint, the underlying PDF generator library, is licensed under the BSD license. The prebuilt production container image contains a variety of licenses, including GPLv2 and GPLv3 code.

Security

Allowing untrusted HTML input is not recommended. Use trusted HTML templates and sanitize user input.

Fetching external assets is currently prohibited. You can provide internal assets with the multipart API.

If your instance is exposed publicly, I recommend putting it behind a reverse proxy that terminates TLS connections and requires authentication. You could use HTTP Basic Auth and pass the pdf-service URL to your client software via an environment variable. This way, the credentials can be embedded in the URL like this: https://API_USER:API_TOKEN@pdf.example.com/generate, where API_USER and API_TOKEN are the credentials you set up in the reverse proxy.
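On the client side, a URL with embedded credentials can be turned into an explicit Basic Auth header instead of relying on the HTTP library to interpret the userinfo part. A minimal Python sketch; `request_with_url_credentials` is a hypothetical helper, and the reverse proxy is assumed to expect standard HTTP Basic Auth.

```python
import base64
import urllib.parse
import urllib.request


def request_with_url_credentials(service_url: str, html: str) -> urllib.request.Request:
    """Turn https://USER:TOKEN@host/generate into a POST with an Authorization header."""
    parts = urllib.parse.urlsplit(service_url)
    headers = {"Content-Type": "text/html"}
    if parts.username is not None:
        user = urllib.parse.unquote(parts.username)
        password = urllib.parse.unquote(parts.password or "")
        token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
        headers["Authorization"] = f"Basic {token}"
    # Drop the "user:password@" part; keep host and port (IPv6 brackets included).
    netloc = parts.netloc.rpartition("@")[2]
    clean_url = urllib.parse.urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))
    return urllib.request.Request(clean_url, data=html.encode("utf-8"), headers=headers, method="POST")
```

In client software, the URL would come from an environment variable (the name PDF_SERVICE_URL is only an example), e.g. `request_with_url_credentials(os.environ["PDF_SERVICE_URL"], html)`, so the credentials never appear in the code.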

Environment variables

  • WORKER_COUNT (default: 4) Sets the worker pool size of the gunicorn server executing pdf_service.

  • HOST If the hostname isn't set on the container, pass it as an environment variable so the service can be identified in Sentry.

  • SENTRY_DSN Enable the Sentry integration and use this DSN to submit data.

  • SENTRY_TRACES_SAMPLE_RATE (0.0 ... 1.0) If the Sentry integration is enabled this controls the tracing sample rate. It defaults to 1.0. Set it to 0.0 to disable tracing.

  • SENTRY_ENVIRONMENT This sets the environment sent to Sentry. Defaults to development.

  • SENTRY_RELEASE This sets the release sent to Sentry. We set this to the current git SHA, and you normally shouldn't need to override it.

  • SENTRY_TAG_* Set a tag to a specific value for all transactions. For example to set the tag test to abc, set the environment variable SENTRY_TAG_TEST=abc.
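The defaults and the SENTRY_TAG_* convention above can be summarized as a few small parsing functions. This is a hypothetical sketch, not the service's actual startup code; it assumes tag names are the lowercased suffix after SENTRY_TAG_, which matches the SENTRY_TAG_TEST example. The resulting values could then be handed to the Sentry Python SDK, for instance via `sentry_sdk.init(dsn=..., traces_sample_rate=..., environment=...)` and `sentry_sdk.set_tag(key, value)`.

```python
import os
from typing import Mapping

SENTRY_TAG_PREFIX = "SENTRY_TAG_"


def worker_count(environ: Mapping[str, str] = os.environ) -> int:
    """WORKER_COUNT, defaulting to 4 gunicorn workers."""
    return int(environ.get("WORKER_COUNT", "4"))


def traces_sample_rate(environ: Mapping[str, str] = os.environ) -> float:
    """SENTRY_TRACES_SAMPLE_RATE within 0.0 ... 1.0, defaulting to 1.0."""
    rate = float(environ.get("SENTRY_TRACES_SAMPLE_RATE", "1.0"))
    if not 0.0 <= rate <= 1.0:  # also rejects NaN
        raise ValueError(f"SENTRY_TRACES_SAMPLE_RATE must be within 0.0 ... 1.0, got {rate}")
    return rate


def sentry_environment(environ: Mapping[str, str] = os.environ) -> str:
    """SENTRY_ENVIRONMENT, defaulting to "development"."""
    return environ.get("SENTRY_ENVIRONMENT", "development")


def sentry_tags(environ: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Collect SENTRY_TAG_* variables: SENTRY_TAG_TEST=abc becomes {"test": "abc"}."""
    return {
        key[len(SENTRY_TAG_PREFIX):].lower(): value
        for key, value in environ.items()
        if key.startswith(SENTRY_TAG_PREFIX) and len(key) > len(SENTRY_TAG_PREFIX)
    }
```

Accepting the environment as a mapping parameter keeps the functions testable without modifying the real process environment.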

Health check

The service has a /health endpoint that will respond with a 200 status code if the service is running. This endpoint is also configured as a docker HEALTHCHECK.
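The Docker HEALTHCHECK covers the container itself, but a deployment script or test suite that starts the container may also want to wait until the service answers before sending requests. A minimal sketch, assuming the service is published on a local port as in the Usage example; `wait_until_healthy` is a hypothetical helper that relies only on /health returning 200.

```python
import http.client
import time
import urllib.request


def wait_until_healthy(base_url: str, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Poll GET <base_url>/health until it returns 200 or about `timeout` seconds elapse."""
    url = base_url.rstrip("/") + "/health"
    deadline = time.monotonic() + timeout
    while True:
        remaining = deadline - time.monotonic()
        try:
            # Bound each attempt so a hanging connection can't outlast the overall deadline by much.
            with urllib.request.urlopen(url, timeout=max(remaining, 0.1)) as response:
                if response.status == 200:
                    return True
        except (OSError, http.client.HTTPException):
            pass  # not ready yet: connection refused, reset, timeout or an HTTP error status
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

For example, after `docker run --rm -d --name pdf -p 8080:8080 mormahr/pdf-service`, calling `wait_until_healthy("http://localhost:8080")` would block until the service is ready or the timeout expires.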

Supported architectures

The docker image supports the linux/amd64 (regular Intel and AMD 64-bit processors on x86_64) and linux/arm64 (Apple Silicon, AWS Graviton, etc.) architectures. Image sizes and other information that varies between architectures are taken from the linux/amd64 variant.

If you need a different architecture, please open an issue with your use-case.

Native Windows docker images are not supported. The linux image can be run on Windows using Docker Desktop.

Development

Set up the development environment

  • Set up a Python venv
  • pip install -r requirements.txt -r requirements-dev.txt (or: pip install -e '.[dev]')
  • Install docker and docker-compose to run tests. Tests run in docker to ensure render output doesn't differ based on platform.

Running

  • Run the development server with python -m pdf_service
  • Run tests with ./test or ./test-watch
    • Tests are executed within docker, to ensure render results are identical to the containerized version. The image contains external dependencies, but code and test files will be mounted from the project source. If you want to rebuild the dev image add --build to the end of the command. This will instruct docker-compose to rebuild the image.

Visual tests with reference images

e2e/data contains reference inputs (*.html) and the corresponding output images (*.png). The e2e test renders the HTML files and compares the output with the reference images to ensure no changes slipped in.

To update reference images or add new test cases run ./regenerate-e2e-references.

pdf-service's Issues

Improve DX

  • docker-compose
    • test
    • test-local (volume from src)
    • toolbox (volume for test files)

Multi Stage Docker Build

To remove the AGPL dev dependency from the final image. A separate stage for dev dependencies will also reduce rebuilds (don't copy source files into it).

Update licensing info

After changing to a multistage build and alpine the licensing situation has changed and the corresponding readme section should be revisited. The production images (without the -testing suffix) should no longer contain AGPL licensed code.

A structured approach would be best: #145

Generic sentry tags

Read all environment variables with SENTRY_TAG_ prefix and apply them instead of SENTRY_ORGANIZATION.

Action Required: Fix Renovate Configuration

There is an error with this repository's Renovate configuration that needs to be fixed. As a precaution, Renovate will stop PRs until it is resolved.

File: .github/renovate.json5
Error type: The renovate configuration file contains some invalid settings
Message: packageRules: You have included an unsupported manager in a package rule. Your list: pip. Supported managers are: (ansible, ansible-galaxy, azure-pipelines, batect, batect-wrapper, bazel, buildkite, bundler, cargo, cdnurl, circleci, cloudbuild, cocoapods, composer, deps-edn, docker-compose, dockerfile, droneci, git-submodules, github-actions, gitlabci, gitlabci-include, gomod, gradle, gradle-lite, gradle-wrapper, helm-requirements, helm-values, helmfile, helmv3, homebrew, html, jenkins, kubernetes, kustomize, leiningen, maven, meteor, mix, nodenv, npm, nuget, nvm, pip_requirements, pip_setup, pipenv, poetry, pre-commit, pub, regex, ruby-version, sbt, setup-cfg, swift, terraform, terraform-version, terragrunt, terragrunt-version, travis).

Worker timeout every 17min

When the pdf-service container is left running for some time without any load, the workers quit with a critical error every 17 minutes and are respawned.

Example log:

weasyprint_1  | [2021-08-29 12:42:14 +0000] [8] [CRITICAL] WORKER TIMEOUT (pid:9)
weasyprint_1  | [2021-08-29 12:42:14 +0000] [8] [CRITICAL] WORKER TIMEOUT (pid:10)
weasyprint_1  | [2021-08-29 12:42:14 +0000] [9] [INFO] Worker exiting (pid: 9)
weasyprint_1  | [2021-08-29 12:42:14 +0000] [8] [CRITICAL] WORKER TIMEOUT (pid:11)
weasyprint_1  | [2021-08-29 12:42:14 +0000] [10] [INFO] Worker exiting (pid: 10)
weasyprint_1  | [2021-08-29 12:42:14 +0000] [11] [INFO] Worker exiting (pid: 11)
weasyprint_1  | [2021-08-29 12:42:14 +0000] [8] [CRITICAL] WORKER TIMEOUT (pid:12)
weasyprint_1  | [2021-08-29 12:42:14 +0000] [12] [INFO] Worker exiting (pid: 12)
weasyprint_1  | [2021-08-29 12:42:14 +0000] [169] [INFO] Booting worker with pid: 169
weasyprint_1  | [2021-08-29 12:42:14 +0000] [170] [INFO] Booting worker with pid: 170
weasyprint_1  | [2021-08-29 12:42:14 +0000] [171] [INFO] Booting worker with pid: 171
weasyprint_1  | [2021-08-29 12:42:15 +0000] [172] [INFO] Booting worker with pid: 172
weasyprint_1  | [2021-08-29 12:59:36 +0000] [8] [CRITICAL] WORKER TIMEOUT (pid:169)
weasyprint_1  | [2021-08-29 12:59:36 +0000] [8] [CRITICAL] WORKER TIMEOUT (pid:170)
weasyprint_1  | [2021-08-29 12:59:36 +0000] [169] [INFO] Worker exiting (pid: 169)
weasyprint_1  | [2021-08-29 12:59:36 +0000] [8] [CRITICAL] WORKER TIMEOUT (pid:171)
weasyprint_1  | [2021-08-29 12:59:36 +0000] [8] [CRITICAL] WORKER TIMEOUT (pid:172)
weasyprint_1  | [2021-08-29 12:59:36 +0000] [171] [INFO] Worker exiting (pid: 171)
weasyprint_1  | [2021-08-29 12:59:36 +0000] [170] [INFO] Worker exiting (pid: 170)
weasyprint_1  | [2021-08-29 12:59:36 +0000] [172] [INFO] Worker exiting (pid: 172)
weasyprint_1  | [2021-08-29 12:59:37 +0000] [329] [INFO] Booting worker with pid: 329
weasyprint_1  | [2021-08-29 12:59:37 +0000] [8] [WARNING] Worker with pid 171 was terminated due to signal 9
weasyprint_1  | [2021-08-29 12:59:37 +0000] [330] [INFO] Booting worker with pid: 330
weasyprint_1  | [2021-08-29 12:59:37 +0000] [331] [INFO] Booting worker with pid: 331
weasyprint_1  | [2021-08-29 12:59:37 +0000] [332] [INFO] Booting worker with pid: 332
weasyprint_1  | [2021-08-29 13:17:56 +0000] [8] [CRITICAL] WORKER TIMEOUT (pid:329)
weasyprint_1  | [2021-08-29 13:17:56 +0000] [8] [CRITICAL] WORKER TIMEOUT (pid:330)
weasyprint_1  | [2021-08-29 13:17:56 +0000] [8] [CRITICAL] WORKER TIMEOUT (pid:331)

Fix documentation typo

"SENTRY_TAG_* Set a tag to a specific value for all transactions. For example to set the tag user to abc, set the environment variable SENTRY_TAG_TEST=abc."

Has to be "the tag test to abc"

Dependency Dashboard

This issue lists Renovate updates and detected dependencies. Read the Dependency Dashboard docs to learn more.

Rate-Limited

These updates are currently rate-limited. Click on a checkbox below to force their creation now.

  • Update dependency MarkupSafe to v2.1.5
  • Update alpine Docker tag to v3.20
  • Update dependency attrs to v23.2.0
  • Update dependency blinker to v1.8.2
  • Update dependency cffi to v1.17.0
  • Update dependency coverage to v7.6.1
  • Update dependency fire to v0.6.0
  • Update dependency itsdangerous to v2.2.0
  • Update dependency pdf2image to v1.17.0
  • Update dependency pluggy to v1.5.0
  • Update dependency pycparser to v2.22
  • Update dependency pydyf to v0.11.0
  • Update dependency pyphen to v0.16.0
  • Update dependency pytest-mock to v3.14.0
  • Update dependency tinycss2 to v1.3.0
  • Update actions/upload-artifact action to v4
  • Update codecov/codecov-action action to v4
  • Update dependency Flask to v3
  • Update dependency attrs to v24
  • Update dependency packaging to v24
  • Update dependency pdfminer.six to v20240706
  • Update dependency pytest to v8
  • Update dependency pytest-cov to v5
  • Update dependency watchdog to v4
  • Update dependency weasyprint to v62
  • Update docker/build-push-action action to v6
  • Update docker/login-action action to v3
  • Update docker/metadata-action action to v5
  • Update docker/setup-buildx-action action to v3
  • Update github/codeql-action action to v3
  • 🔐 Create all rate-limited PRs at once 🔐

Open

These updates have all been created already. Click a checkbox below to force a retry/rebase of any.

Detected dependencies

docker-compose
docker-compose.regenerate-e2e.yml
docker-compose.yml
e2e/docker-compose.yml
dockerfile
Dockerfile
  • python 3.10.4-alpine3.14
  • python 3.10.4-alpine3.14
e2e/Dockerfile
  • alpine 3.18
github-actions
.github/workflows/ci.yml
  • actions/checkout v4
  • docker/setup-buildx-action v2
  • docker/metadata-action v4
  • docker/metadata-action v4
  • docker/login-action v2
  • docker/build-push-action v3
  • docker/build-push-action v3
  • actions/checkout v4
  • docker/setup-buildx-action v2
  • docker/metadata-action v4
  • docker/login-action v2
  • docker/build-push-action v3
  • actions/checkout v4
  • codecov/codecov-action v3
  • actions/checkout v4
  • actions/upload-artifact v3
  • philips-labs/tern-action v1.2.0
  • actions/upload-artifact v3
  • docker/login-action v2
  • docker/metadata-action v4
  • ubuntu 20.04
  • ubuntu 20.04
  • ubuntu 20.04
  • ubuntu 20.04
  • ubuntu 20.04
  • ubuntu 20.04
.github/workflows/codeql-analysis.yml
  • actions/checkout v4
  • github/codeql-action v2
  • github/codeql-action v2
.github/workflows/release.yml
  • docker/login-action v2
  • docker/metadata-action v4
  • ubuntu 20.04
pip_requirements
requirements-dev.txt
  • attrs ==23.1.0
  • chardet ==4.0.0
  • colorama ==0.4.6
  • coverage ==7.3.1
  • cryptography ==41.0.6
  • diffimg ==0.3.0
  • docopt ==0.6.2
  • fire ==0.5.0
  • iniconfig ==2.0.0
  • packaging ==21.3
  • pdf2image ==1.16.3
  • pdfminer.six ==20201018
  • pluggy ==1.3.0
  • py ==1.11.0
  • pyparsing ==2.4.7
  • pytest ==6.2.5
  • pytest-cov ==2.12.1
  • pytest-mock ==3.11.1
  • pytest-watch ==4.2.0
  • sortedcontainers ==2.4.0
  • termcolor ==1.1.0
  • toml ==0.10.2
  • watchdog ==2.3.1
requirements.txt
  • blinker ==1.6.2
  • Brotli ==1.1.0
  • certifi ==2023.7.22
  • cffi ==1.15.1
  • click ==8.1.7
  • cssselect2 ==0.7.0
  • Flask ==2.3.3
  • fonttools ==4.42.1
  • html5lib ==1.1
  • itsdangerous ==2.1.2
  • Jinja2 ==3.1.2
  • MarkupSafe ==2.1.3
  • Pillow ==10.0.1
  • pycparser ==2.21
  • pydyf ==0.8.0
  • pyphen ==0.14.0
  • sentry-sdk ==1.31.0
  • six ==1.16.0
  • tinycss2 ==1.2.1
  • urllib3 ==2.0.7
  • weasyprint ==58.1
  • webencodings ==0.5.1
  • Werkzeug ==2.3.7
  • zopfli ==0.2.2

  • Check this box to trigger a request for Renovate to run again on this repository

Test regression in basic test

When I ported the test to curl, I unintentionally made it a multipart request. The basic request should simply post the HTML as the body.
