
pdiiif's Introduction


Demo

Sample PDF generated with the library

Library API Documentation

pdiiif is a JavaScript library to create PDFs from IIIF Manifests. For the most part, it runs both in browsers and as a Node.js server-side application. When generating a PDF in the browser, almost all communication happens directly between the user's browser and the IIIF APIs referenced from the Manifest. The only exception is the cover page, which by default needs to be generated on the server (see Cover Page Endpoints for more details).

It comes with a small sample web application that demonstrates how to use the library in the browser. A public instance is available at https://pdiiif.jbaiter.de, and the source code is located in the pdiiif-web subdirectory.

A main goal of the library is to be as memory-efficient as possible, by never holding more than a few pages in memory and streaming directly to the user's disk (precise method depends on the environment).

It is also well-suited for embedding in other applications due to its relatively small footprint. For example, the sample web application comes in at ~120KiB gzipped with all dependencies.

In addition to the images on the IIIF Canvases referenced in the manifest, the library can create a hidden text layer from OCR associated with each canvas (ALTO or hOCR referenced from a canvas' seeAlso property).

In order to not sever the connection between the PDF and the original IIIF resources on the Web, every PDF generated by pdiiif includes the IIIF Manifest as a PDF attachment, as well as every OCR file referenced in seeAlso. Additionally, pdiiif can generate the PDFs in a way that also makes them valid ZIP files that contain the manifest and all of the images and OCR files, with almost no storage overhead. (thanks to Ange Albertini and his work on Poc||GTFO for the inspiration!)

Features

  • PDF Page for every single-image Canvas in a Manifest
  • Rendering Canvases with multiple images
  • PDF Table of Contents from IIIF Ranges
  • Cover page with metadata, attribution and licensing information
  • Hidden text layer from ALTO or hOCR OCR
  • Render IIIF layers as PDF "optional content groups" that can be toggled
  • Rendering of IIIF Annotations as PDF annotations
  • Include IIIF Manifest and referenced OCR files as PDF attachments
  • Generate polyglot PDFs that are also ZIP files of all resources

Quickstart

Besides using the public instance at https://pdiiif.jbaiter.de, you can also run the app yourself. The easiest way to do this is with Docker:

$ docker build . -t pdiiif
# SYS_ADMIN capabilities are required (for Puppeteer's headless Chrome instance to generate cover page PDFs)
$ docker run -p 8080:8080 --cap-add=SYS_ADMIN --name pdiiif pdiiif

Cookbook Matrix

The IIIF Cookbook has a matrix of "recipes" with viewer support. Here is an overview of the recipe support in pdiiif:

Basic Recipes (4 of 6 supported)
IIIF Properties (8 of 15 supported)
Structuring Resources (3 of 6 supported)
Image Recipes (6 of 6 supported)
Annotation Recipes (4 of 5 supported)
  • Simple Annotation — Tagging: YES
  • Tagging with an External Resource: NO
  • Annotation with a Non-Rectangular Polygon: YES
  • Simplest Annotation: YES
  • Embedded or referenced Annotations: YES

Structure of the repository

  • ./pdiiif-lib: Contains the library source code
  • ./pdiiif-api: Small Node.js server application that generates the cover pages and can be used as a fallback for browsers that don't support the Native Filesystem API or service workers.
  • ./pdiiif-web: Sample web application (using Svelte) to demonstrate using pdiiif in the browser

Cover Page Endpoints

pdiiif tries to include a cover page with a thumbnail, descriptive metadata, and rights and attribution information. Since typesetting these pages is beyond the scope of what our bespoke PDF generator can provide (most notably, TTF/OTF font retrieval for arbitrary languages/scripts and font subsetting), this cover page currently needs to be generated elsewhere. By default, the library uses a public endpoint at https://pdiiif.jbaiter.de/api/coverpage, which generates a PDF with the default template. The endpoint can be changed with the coverPageEndpoint configuration parameter in the options passed to the convertManifest function.

If you want to customize the template that is being used, you can host the API provided in this repository yourself (see Quickstart) and override the template by mounting your own custom Handlebars template into the image at /opt/pdiiif/pdiiif-api/dist/asses/coverpage.hbs. For a list of available helpers that you can use, refer to handlebars-helpers. Also available are these two custom helpers:

  • qrcode, takes a value and an optional { width, height, padding, color, background, ecl } options object and returns the value encoded as a SVG QR code image
  • sanitize-html, takes an arbitrary HTML string and sanitizes it according to the IIIF HTML rules

If you want to provide your own implementation, make sure that your HTTP endpoint generates a valid PDF and accepts a JSON POST body with the following shape (i.e. does not throw an error when encountering any of these fields):

{
  title: string;
  manifestUrl: string;
  thumbnail?: {
    url: string;
    iiifImageService?: string;
  };
  provider?: {
    label: string;
    homepage?: string;
    logo?: string;
  };
  requiredStatement?: {
    label: string;
    value: string;
  };
  rights?: {
    text: string;
    url?: string;
    logo?: string;
  };
  // [key, value] pairs, with value either single- or multi-valued
  metadata?: Array<[string, string | Array<string>]>;
  pdiiifVersion: string;
}
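
As a rough illustration of the shape above, a self-hosted endpoint could check the incoming JSON body before rendering. The following is a minimal sketch, not code from pdiiif: the names `CoverPageRequest` and `isCoverPageRequest` are hypothetical, and only the three non-optional fields are verified so that unknown or optional fields never cause a rejection.

```typescript
// Hypothetical names: CoverPageRequest and isCoverPageRequest are
// illustrative only and are not part of the pdiiif API.
interface CoverPageRequest {
  title: string;
  manifestUrl: string;
  thumbnail?: { url: string; iiifImageService?: string };
  provider?: { label: string; homepage?: string; logo?: string };
  requiredStatement?: { label: string; value: string };
  rights?: { text: string; url?: string; logo?: string };
  // [key, value] pairs, with value either single- or multi-valued
  metadata?: Array<[string, string | string[]]>;
  pdiiifVersion: string;
}

// Type guard that checks only the required fields. Optional and unknown
// fields are tolerated, matching the "does not throw an error when
// encountering any of these fields" requirement.
function isCoverPageRequest(body: unknown): body is CoverPageRequest {
  if (typeof body !== 'object' || body === null) return false;
  const b = body as Record<string, unknown>;
  return (
    typeof b.title === 'string' &&
    typeof b.manifestUrl === 'string' &&
    typeof b.pdiiifVersion === 'string'
  );
}
```

A handler would call `isCoverPageRequest` on the parsed POST body and respond with an error status if it returns false, otherwise render the PDF.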

pdiiif's People

Contributors

jbaiter, phil-plencner-hl


pdiiif's Issues

can't build newest pull

Hi johannes,

when i pull down the most recent from main, i can't get it to build in docker. this is the error i get:

% docker build . -t pdiiif
[+] Building 85.8s (12/13)                                                                                                                                                                                                                                                                       
 => [internal] load build definition from Dockerfile                                                                                                                                                                                                                                        0.1s
 => => transferring dockerfile: 37B                                                                                                                                                                                                                                                         0.0s
 => [internal] load .dockerignore                                                                                                                                                                                                                                                           0.0s
 => => transferring context: 34B                                                                                                                                                                                                                                                            0.0s
 => [internal] load metadata for docker.io/library/node:16-slim                                                                                                                                                                                                                             1.2s
 => [1/9] FROM docker.io/library/node:16-slim@sha256:158eb4c4873c6ec690d6a556558aa78298ae4016040dfd3c759a23ba5ca20fb6                                                                                                                                                                       0.0s
 => => resolve docker.io/library/node:16-slim@sha256:158eb4c4873c6ec690d6a556558aa78298ae4016040dfd3c759a23ba5ca20fb6                                                                                                                                                                       0.0s
 => [internal] load build context                                                                                                                                                                                                                                                           0.1s
 => => transferring context: 20.28kB                                                                                                                                                                                                                                                        0.1s
 => CACHED [2/9] RUN apt-get update     && apt-get install -y wget gnupg     && wget -q -O - https://dl-ssl.google.com/linux/linux_signing_key.pub | apt-key add -     && sh -c 'echo "deb [arch=amd64] http://dl.google.com/linux/chrome/deb/ stable main" >> /etc/apt/sources.list.d/goo  0.0s
 => CACHED [3/9] RUN wget -q -O - https://unpkg.com/@pnpm/self-installer | node                                                                                                                                                                                                             0.0s
 => CACHED [4/9] WORKDIR /opt/pdiiif                                                                                                                                                                                                                                                        0.0s
 => CACHED [5/9] RUN groupadd -r pptruser && useradd -r -g pptruser -G audio,video pptruser     && mkdir -p /home/pptruser/Downloads     && chown -R pptruser:pptruser /home/pptruser     && chown -R pptruser:pptruser /opt/pdiiif                                                         0.0s
 => CACHED [6/9] COPY --chown=pptruser . .                                                                                                                                                                                                                                                  0.0s
 => CACHED [7/9] WORKDIR /opt/pdiiif                                                                                                                                                                                                                                                        0.0s
 => ERROR [8/9] RUN cd pdiiif-lib &&     pnpm i && pnpm build &&     cd ../pdiiif-web &&     pnpm i && pnpm build &&     cd ../pdiiif-api &&     pnpm i && pnpm run build &&     rm -rf ~/.pnpm-store                                                                                      84.2s
------                                                                                                                                                                                                                                                                                           
 > [8/9] RUN cd pdiiif-lib &&     pnpm i && pnpm build &&     cd ../pdiiif-web &&     pnpm i && pnpm build &&     cd ../pdiiif-api &&     pnpm i && pnpm run build &&     rm -rf ~/.pnpm-store:                                                                                                  
#12 3.093 Scope: all 3 workspace projects                                                                                                                                                                                                                                                        
#12 3.791 ..                                       | Progress: resolved 1, reused 0, downloaded 0, added 0                                                                                                                                                                                       
#12 4.478                                                                                                                                                                                                                                                                                        
#12 4.478    ╭──────────────────────────────────────────────────────────────────╮                                                                                                                                                                                                                
#12 4.478    │                                                                  │
#12 4.478    │                Update available! 6.27.1 → 6.28.0.                │
#12 4.478    │   Changelog: https://github.com/pnpm/pnpm/releases/tag/v6.28.0   │
#12 4.478    │                 Run pnpm add -g pnpm to update.                  │
#12 4.478    │                                                                  │
#12 4.478    │      Follow @pnpmjs for updates: https://twitter.com/pnpmjs      │
#12 4.478    │                                                                  │
#12 4.478    ╰──────────────────────────────────────────────────────────────────╯
#12 4.478 
#12 4.812 ..                                       | Progress: resolved 19, reused 0, downloaded 13, added 0
#12 6.008 ..                                       | Progress: resolved 19, reused 0, downloaded 14, added 0
#12 7.017 ..                                       | Progress: resolved 27, reused 0, downloaded 21, added 0
#12 8.020 ..                                       | Progress: resolved 34, reused 0, downloaded 31, added 0
#12 9.026 ..                                       | Progress: resolved 43, reused 0, downloaded 37, added 0
#12 10.03 ..                                       | Progress: resolved 55, reused 0, downloaded 48, added 0
#12 11.04 ..                                       | Progress: resolved 63, reused 0, downloaded 59, added 0
#12 12.07 ..                                       | Progress: resolved 72, reused 0, downloaded 65, added 0
#12 13.08 ..                                       | Progress: resolved 123, reused 0, downloaded 109, added 0
#12 14.08 ..                                       | Progress: resolved 152, reused 0, downloaded 140, added 0
#12 15.08 ..                                       | Progress: resolved 171, reused 0, downloaded 157, added 0
#12 16.09 ..                                       | Progress: resolved 184, reused 0, downloaded 168, added 0
#12 17.36 ..                                       | Progress: resolved 184, reused 0, downloaded 169, added 0
#12 18.37 ..                                       | Progress: resolved 233, reused 0, downloaded 218, added 0
#12 19.37 ..                                       | Progress: resolved 288, reused 0, downloaded 275, added 0
#12 20.37 ..                                       | Progress: resolved 320, reused 0, downloaded 307, added 0
#12 21.38 ..                                       | Progress: resolved 373, reused 0, downloaded 356, added 0
#12 22.38 ..                                       | Progress: resolved 447, reused 0, downloaded 428, added 0
#12 23.39 ..                                       | Progress: resolved 495, reused 0, downloaded 475, added 0
#12 24.41 ..                                       | Progress: resolved 526, reused 0, downloaded 507, added 0
#12 25.18  WARN  deprecated [email protected]: babel-eslint is now @babel/eslint-parser. This package will no longer receive updates.
#12 25.42 ..                                       | Progress: resolved 573, reused 0, downloaded 542, added 0
#12 26.42 ..                                       | Progress: resolved 612, reused 0, downloaded 579, added 0
#12 27.52 ..                                       | Progress: resolved 652, reused 0, downloaded 620, added 0
#12 28.53 ..                                       | Progress: resolved 725, reused 0, downloaded 695, added 0
#12 29.33  WARN  deprecated [email protected]: some dependency vulnerabilities fixed, support for node < 10 dropped, and newer ECMAScript syntax/features added
#12 29.53 ..                                       | Progress: resolved 780, reused 0, downloaded 751, added 0
#12 30.55 ..                                       | Progress: resolved 832, reused 0, downloaded 796, added 0
#12 31.56 ..                                       | Progress: resolved 887, reused 0, downloaded 854, added 0
#12 32.05 ../pdiiif-api                            |  WARN  deprecated [email protected]
#12 32.59 ..                                       | Progress: resolved 969, reused 0, downloaded 928, added 0
#12 33.47  WARN  deprecated [email protected]: https://github.com/lydell/resolve-url#deprecated
#12 33.52  WARN  deprecated [email protected]: Please see https://github.com/lydell/urix#deprecated
#12 33.59 ..                                       | Progress: resolved 1065, reused 0, downloaded 1034, added 0
#12 34.89 ..                                       | Progress: resolved 1088, reused 0, downloaded 1066, added 0
#12 34.91 ..                                       |    +1071 ++++++++++++++++++++++++++++
#12 35.60 Packages are hard linked from the content-addressable store to the virtual store.
#12 35.60   Content-addressable store is at: /home/pptruser/.pnpm-store/v3
#12 35.60   Virtual store is at:             ../node_modules/.pnpm
#12 35.89 ..                                       | Progress: resolved 1088, reused 0, downloaded 1068, added 55
#12 36.90 ..                                       | Progress: resolved 1088, reused 0, downloaded 1069, added 365
#12 37.90 ..                                       | Progress: resolved 1088, reused 0, downloaded 1069, added 764
#12 38.90 ..                                       | Progress: resolved 1088, reused 0, downloaded 1070, added 1070
#12 39.57 .../[email protected]/node_modules/core-js postinstall$ node -e "try{require('./postinstall')}catch(e){}"
#12 39.65 .../node_modules/svelte-preprocess postinstall$ echo "[svelte-preprocess] Don't forget to install the preprocessors packages that will be used: node-sass/sass, stylus, less, postcss & postcss-load-config, coffeescript, pug, etc..."
#12 39.67 .../node_modules/svelte-preprocess postinstall: [svelte-preprocess] Don't forget to install the preprocessors packages that will be used: node-sass/sass, stylus, less, postcss & postcss-load-config, coffeescript, pug, etc...
#12 39.67 .../node_modules/svelte-preprocess postinstall: Done
#12 39.78 .../[email protected]/node_modules/core-js postinstall: Thank you for using core-js ( https://github.com/zloirock/core-js ) for polyfilling JavaScript standard library!
#12 39.78 .../[email protected]/node_modules/core-js postinstall: The project needs your help! Please consider supporting of core-js on Open Collective or Patreon: 
#12 39.78 .../[email protected]/node_modules/core-js postinstall: > https://opencollective.com/core-js 
#12 39.78 .../[email protected]/node_modules/core-js postinstall: > https://www.patreon.com/zloirock 
#12 39.78 .../[email protected]/node_modules/core-js postinstall: Also, the author of core-js ( https://github.com/zloirock ) is looking for a good job -)
#12 39.79 .../[email protected]/node_modules/core-js postinstall: Done
#12 39.80 .../[email protected]/node_modules/esbuild postinstall$ node install.js
#12 39.90 ..                                       | Progress: resolved 1088, reused 0, downloaded 1071, added 1071, done
#12 39.94 .../[email protected]/node_modules/esbuild postinstall: Done
#12 39.95 .../node_modules/puppeteer install$ node install.js
#12 39.98 .../[email protected]/node_modules/nodemon postinstall$ node bin/postinstall || exit 0
#12 40.21 .../[email protected]/node_modules/nodemon postinstall: Love nodemon? You can now support the project via the open collective:
#12 40.21 .../[email protected]/node_modules/nodemon postinstall:  > https://opencollective.com/nodemon/donate
#12 40.36 .../[email protected]/node_modules/nodemon postinstall: Done
#12 40.39 .../node_modules/puppeteer install: **INFO** Skipping browser download. "PUPPETEER_SKIP_CHROMIUM_DOWNLOAD" environment variable was found.
#12 40.41 .../node_modules/puppeteer install: Done
#12 41.31 
#12 41.31 dependencies:
#12 41.31 + async-mutex 0.3.2
#12 41.31 + cross-fetch 3.1.4
#12 41.31 + dedent-js 1.0.1
#12 41.31 + jsdom 17.0.0
#12 41.31 + lodash 4.17.21
#12 41.31 + manifesto.js 4.2.4
#12 41.31 + p-queue 6.6.2
#12 41.31 + prom-client 14.0.1
#12 41.31 + tslib 2.3.1
#12 41.31 
#12 41.31 devDependencies:
#12 41.31 + @types/common-tags 1.8.1
#12 41.31 + @types/jest 26.0.24
#12 41.31 + @types/jsdom 16.2.13
#12 41.31 + @types/lodash 4.14.177
#12 41.31 + @types/node 10.17.60
#12 41.31 + @types/node-fetch 2.5.12
#12 41.31 + @types/pako 1.0.2
#12 41.31 + @types/tmp 0.2.2
#12 41.31 + @types/wicg-file-system-access 2020.9.4
#12 41.31 + @typescript-eslint/eslint-plugin 4.33.0
#12 41.31 + @typescript-eslint/parser 4.33.0
#12 41.31 + eslint 7.32.0
#12 41.31 + eslint-config-prettier 6.15.0
#12 41.31 + eslint-config-recommended 4.1.0
#12 41.31 + globby 10.0.2
#12 41.31 + jest 26.6.3
#12 41.31 + jest-fetch-mock 3.0.3
#12 41.31 + npm-run-all 4.1.5
#12 41.31 + pdf-lib 1.17.1
#12 41.31 + prettier 2.4.1
#12 41.31 + tmp 0.2.1
#12 41.31 + ts-jest 26.5.6
#12 41.31 + ts-node 8.10.2
#12 41.31 + typedoc 0.22.10
#12 41.31 + typescript 4.5.2
#12 41.31 
#12 41.32  WARN  Issues with peer dependencies found
#12 41.32 pdiiif-api
#12 41.32 └─┬ @wesleytodd/openapi
#12 41.32   └─┬ redoc
#12 41.32     ├── ✕ missing peer react@"^16.8.4 || ^17.0.0"
#12 41.32     ├── ✕ missing peer react-dom@"^16.8.4 || ^17.0.0"
#12 41.32     ├── ✕ missing peer styled-components@"^4.1.1 || ^5.1.1"
#12 41.32     ├── ✕ missing peer mobx@^6.0.4
#12 41.32     ├── ✕ missing peer core-js@^3.1.4
#12 41.32     ├─┬ @redocly/react-dropdown-aria
#12 41.32     │ ├── ✕ missing peer react@"^16.8.4 || ^17.0.0"
#12 41.32     │ ├── ✕ missing peer react-dom@"^16.8.4 || ^17.0.0"
#12 41.32     │ └── ✕ missing peer styled-components@^5.1.1
#12 41.32     ├─┬ mobx-react
#12 41.32     │ ├── ✕ missing peer react@"^16.8.0 || ^17"
#12 41.32     │ ├── ✕ missing peer react-dom@"*"
#12 41.32     │ ├── ✕ missing peer mobx@^6.1.0
#12 41.32     │ └─┬ mobx-react-lite
#12 41.32     │   ├── ✕ missing peer react@"^16.8.0 || ^17"
#12 41.32     │   ├── ✕ missing peer react-dom@"*"
#12 41.32     │   └── ✕ missing peer mobx@^6.1.0
#12 41.32     └─┬ react-tabs
#12 41.32       └── ✕ missing peer react@"^16.3.0 || ^17.0.0-0"
#12 41.32 Peer dependencies that should be installed:
#12 41.32   core-js@^3.1.4
#12 41.32   mobx@">=6.1.0 <7.0.0"
#12 41.32   react-dom@">=16.8.4 <17.0.0 || >=17.0.0 <18.0.0"
#12 41.32   react@">=16.8.4 <17.0.0 || >=17.0.0-0 <17.0.0 || >=17.0.0 <18.0.0"
#12 41.32   styled-components@">=5.1.1 <6.0.0"
#12 41.32 
#12 41.32 pdiiif-lib
#12 41.32 └─┬ jest
#12 41.32   └─┬ @jest/core
#12 41.32     └─┬ jest-config
#12 41.32       ├── ✕ unmet peer ts-node@>=9.0.0: found 8.10.2
#12 41.32       └─┬ @jest/test-sequencer
#12 41.32         └─┬ jest-runner
#12 41.32           └─┬ jest-config
#12 41.32             ├── ✕ unmet peer ts-node@>=9.0.0: found 8.10.2
#12 41.32             └─┬ jest-jasmine2
#12 41.32               └─┬ jest-runtime
#12 41.32                 └─┬ jest-config
#12 41.32                   └── ✕ unmet peer ts-node@>=9.0.0: found 8.10.2
#12 41.32 
#12 42.47 
#12 42.47 > [email protected] build /opt/pdiiif/pdiiif-lib
#12 42.47 > run-s build:*
#12 42.47 
#12 43.94 
#12 43.94 > [email protected] build:esm /opt/pdiiif/pdiiif-lib
#12 43.94 > tsc
#12 43.94 
#12 59.30 
#12 59.30 > [email protected] build:cjs /opt/pdiiif/pdiiif-lib
#12 59.30 > tsc -p tsconfig-cjs.json && echo '{
#12 59.30   "type": "commonjs"
#12 59.30 }' > ./lib/commonjs/package.json
#12 59.30 
#12 73.73 Scope: all 4 workspace projects
#12 75.23 
#12 76.26 
#12 76.26 > [email protected] build /opt/pdiiif/pdiiif-web
#12 76.26 > run-s check compile
#12 76.26 
#12 77.64 
#12 77.64 > [email protected] check /opt/pdiiif/pdiiif-web
#12 77.64 > svelte-check --human && tsc --noEmit
#12 77.64 
#12 78.69 
#12 78.69 ====================================
#12 78.69 Loading svelte-check in workspace: /opt/pdiiif/pdiiif-web
#12 78.69 Getting Svelte diagnostics...
#12 78.69 
#12 83.99 /opt/pdiiif/pdiiif-web/src/App.svelte:6:45
#12 83.99 Error: Module '"pdiiif"' has no exported member 'CancelToken'. (ts)
#12 83.99   import { debounce, without } from 'lodash';
#12 83.99   import { convertManifest, ProgressStatus, CancelToken } from 'pdiiif';
#12 83.99 
#12 83.99 
#12 83.99 
#12 83.99 /opt/pdiiif/pdiiif-web/src/Preview.svelte:43:19
#12 83.99 Hint: 'err' is declared but its value is never read. (ts)
#12 83.99             <strong>{(size / 1024 / 1024).toFixed(2)} MiB</strong>
#12 83.99           {:catch err}
#12 83.99             <strong>{$_('errors.estimate_failure')}</strong>
#12 83.99 
#12 83.99 
#12 84.00 ====================================
#12 84.00 svelte-check found 1 error, 0 warnings, and 1 hint
#12 84.04  ELIFECYCLE  Command failed with exit code 1.
#12 84.06 ERROR: "check" exited with 1.
#12 84.07  ELIFECYCLE  Command failed with exit code 1.
------
executor failed running [/bin/sh -c cd pdiiif-lib &&     pnpm i && pnpm build &&     cd ../pdiiif-web &&     pnpm i && pnpm build &&     cd ../pdiiif-api &&     pnpm i && pnpm run build &&     rm -rf ~/.pnpm-store]: exit code: 1

thanks!

api-server doesn't seem to validate manifestUrl

when i post the demo manifest in the pdflib code ( https://iiif.wellcomecollection.org/presentation/v2/b18035723 ) or my manifest ( https://iiif.lib.harvard.edu/manifests/drs:5678456 ) against the api server, i get this error:

info: Rejected PDF request due to validation errors: Request validation failed {"details":[{"dataPath":".query.canvasNos","keyword":"type","message":"should be array","params":{"type":"array"},"schemaPath":"#/properties/query/properties/canvasNos/type"}],"service":"pdiiif"}

i'm not sure what i'm doing wrong here. i'm using the swagger ui and sending those manifests as the manifestUrl.

Docker installation pnpm

Hi Johannes, thanks for this tool for the whole IIIF community,

I am unsure if this is an issue or if I need to add some settings in Dockerfile.

On the first start, the build fails while installing pnpm.

> Step 10/23 : RUN wget -q -O - https://unpkg.com/@pnpm/self-installer | node
>  ---> Running in ac4d7615dfbf
> Downloading https://registry.npmjs.org/pnpm/-/pnpm-7.18.1.tgz
> GotError [RequestError]: connect ETIMEDOUT 104.16.24.35:443
>     at ClientRequest.<anonymous> ([stdin]:11520:22)
>     at Object.onceWrapper (node:events:642:26)
>     at ClientRequest.emit (node:events:527:28)
>     at TLSSocket.socketErrorListener (node:_http_client:454:9)
>     at TLSSocket.emit (node:events:527:28)
>     at emitErrorNT (node:internal/streams/destroy:157:8)
>     at emitErrorCloseNT (node:internal/streams/destroy:122:3)
>     at processTicksAndRejections (node:internal/process/task_queues:83:21) {
>   code: 'ETIMEDOUT',
>   host: 'registry.npmjs.org',
>   hostname: 'registry.npmjs.org',
>   method: 'GET',
>   path: '/pnpm/-/pnpm-7.18.1.tgz',
>   protocol: 'https:',
>   url: 'https://registry.npmjs.org/pnpm/-/pnpm-7.18.1.tgz'

I can reach that IP (104.16.24.35:443) from my server.
Do you have any ideas on how to solve it?

Page range generation fails due to structures

If a user requests a range of pages for a manifest that contains structures (a.k.a. the table of contents for the generated PDF), then PDF generation will fail if any of the canvases present in the structures are not also included in the requested page range.

The effect is that a "Could not find canvas with id XXX in manifest!" error is thrown in the _addOutline method, which causes PDF generation to fail.
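
One possible way to avoid the error would be to prune the outline before it is written, so that it only refers to canvases that are part of the requested page range. The sketch below is an assumption about how this could look, not the fix used in pdiiif: `OutlineEntry` and `pruneOutline` are hypothetical names, and the library's internal range representation may differ.

```typescript
// Hypothetical, simplified outline entry; pdiiif's internal types may differ.
interface OutlineEntry {
  label: string;
  canvasIds: string[];
  children: OutlineEntry[];
}

// Returns a copy of the outline that only references canvases contained in
// `included`. Entries left without canvases and without children are dropped.
function pruneOutline(
  entries: OutlineEntry[],
  included: Set<string>
): OutlineEntry[] {
  const result: OutlineEntry[] = [];
  for (const entry of entries) {
    const canvasIds = entry.canvasIds.filter((id) => included.has(id));
    const children = pruneOutline(entry.children, included);
    // Keep an entry only if it still points at an exported canvas
    // or has at least one surviving child entry.
    if (canvasIds.length > 0 || children.length > 0) {
      result.push({ label: entry.label, canvasIds, children });
    }
  }
  return result;
}
```

With this approach a parent entry whose own canvases were all filtered out is still kept as long as one of its children survives, which preserves the nesting of the table of contents.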

Steps to reproduce

Visit the demo site and use the sample Book-with-TOC manifest
https://iiif.io/api/cookbook/recipe/0024-book-4-toc/manifest.json - if you select any page range other than 1-6 then it will fail, given that all six pages are referenced in the structures array.

Add IIIF manifest as parameter to url

Discussed in #14

Originally posted by andybuki January 6, 2023
Hi Johannes, I want to ask you about the following scenario:

I add to my application page a button that I call PDF generation.


I know the URL of the IIIF manifest, which I want to pass to the pdiiif application in the next step.

Can you recommend how to integrate it into Manifest URL?


Broken PDF for Harvard Manifest

https://iiif.lib.harvard.edu/manifests/drs:5981093

Results in corruption only sometimes, not always reproducibly.

$ qpdf --check 
....
WARNING: Catholic Church. Heures de Nôtre Dame (use of Troyes and Sens) _ manuscript, [ca. 1470]. MS Richardson 7. Houghton Library, Harvard University, Cambridge, Mass.pdf, stream object 570 0: operation for dictionary attempted on object of type stream: returning false for a key containment request
WARNING: Catholic Church. Heures de Nôtre Dame (use of Troyes and Sens) _ manuscript, [ca. 1470]. MS Richardson 7. Houghton Library, Harvard University, Cambridge, Mass.pdf, stream object 570 0: /Type key should be /Page but is not; overriding
WARNING: Catholic Church. Heures de Nôtre Dame (use of Troyes and Sens) _ manuscript, [ca. 1470]. MS Richardson 7. Houghton Library, Harvard University, Cambridge, Mass.pdf, stream object 570 0: operation for dictionary attempted on object of type stream: ignoring key replacement request
...

Object 570 in the PDF is referenced as a /Page object in multiple places, but upon inspection it's actually the PKZIP header object for the page's image, with the actual page object being Object 568.

The mismatch probably comes from the fact that for some reason the page at Object 566 does not have the expected number of associated XObjects due to a missing image.

This probably happens because we get an error while fetching the info.json for the associated canvas: https://ids.lib.harvard.edu/ids/iiif/5981246/info.json returns 500 (but not always?)

Multiple other images in the manifest also return a HTTP 403 when trying to download them. Are we running into some kind of abuse detection system?

Takeaways:

  • For 500 and 403 errors we should retry a few times with a backoff before we fail
  • The code should not silently ignore image download failures, but notify the user of them
  • The correctness of the pre-calculated object numbers should be preserved by either inserting a placeholder image for these failed images, or simply place empty dummy objects with an empty stream in their place
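
The first two takeaways could be sketched roughly as follows. This is an illustration under stated assumptions, not the library's implementation: `fetchWithRetry` is a hypothetical helper, the fetch function is passed in as a parameter so the sketch stays self-contained, and the attempt count and delays are arbitrary defaults.

```typescript
// Status codes that the takeaways above suggest retrying.
const RETRYABLE_STATUSES = new Set([403, 500]);

// Hypothetical helper: calls fetchFn and retries with exponential backoff
// while the response status is 403 or 500. Throws once maxAttempts is
// exhausted, so that the failure can be reported instead of being ignored.
async function fetchWithRetry<R extends { status: number }>(
  url: string,
  fetchFn: (url: string) => Promise<R>,
  maxAttempts = 4,
  baseDelayMs = 500
): Promise<R> {
  for (let attempt = 1; ; attempt++) {
    const resp = await fetchFn(url);
    if (!RETRYABLE_STATUSES.has(resp.status)) {
      // Success or a non-retryable status: hand it back to the caller.
      return resp;
    }
    if (attempt >= maxAttempts) {
      throw new Error(
        `Giving up on ${url} after ${attempt} attempts (HTTP ${resp.status})`
      );
    }
    // Exponential backoff: baseDelayMs, then 2x, 4x, ...
    const delayMs = baseDelayMs * 2 ** (attempt - 1);
    await new Promise((resolve) => setTimeout(resolve, delayMs));
  }
}
```

In a browser, `fetchFn` could simply be `(u) => fetch(u)`. The thrown error gives the caller a single place to notify the user about a failed image download.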

Update: When debugging this tonight, the manifest timed out a lot, which revealed another bug in the StreamSaver I/O implementation: when the cover page takes a long time to generate, the stream is closed prematurely and the user is left with an empty PDF file.

  • Fix empty PDFs when cover page generation takes too long (timeout, fix keepalive implementation)

Manifests from Vatican Library don't work

Example: https://digi.vatlib.it/iiif/MSS_Arch.Cap.S.Pietro.C.132/manifest.json

Images are discovered, but can't be fetched due to a missing CORS header on the image responses.
Our server-fallback is not triggered, since we only perform the CORS check on the image resource identifier, which is guaranteed to be a resolvable image URL in IIIFv3, but not in IIIFv2.
In the Vatican manifests, the identifier resolves to the JSON-LD of the Image Resource itself, not to the image. Since that JSON-LD response has CORS enabled, we mistakenly assume that the image itself has CORS enabled as well.

What this means for the code: To check for CORS, we have to:

  • Determine if the identifier of an image resource resolves to an actual image by fetching it
  • If it does not, determine an Image URL from an associated IIIF Image API service definition
  • Make a GET request to the image URL (HEAD support is spotty and unreliable, e.g. the Vatican endpoint doesn't give us the info we need) and check if we can access the first few bytes of the response from JavaScript
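The three steps above might look roughly like the sketch below. All names (`imageUrlFromService`, `isReadableImage`, `findCorsReadableImageUrl`) are hypothetical and the response shape is reduced to the few members the check needs, so this is an illustration under those assumptions rather than the library's implementation. It also assumes that a content type starting with `image/` is a good enough signal that a URL resolves to an actual image.

```typescript
interface ImageService { id?: string; '@id'?: string }
interface ImageResource {
  id?: string;
  '@id'?: string;
  service?: ImageService | ImageService[];
}

// Minimal structural types so a fake fetch can be injected for testing.
interface BodyReader {
  read(): Promise<{ done: boolean; value?: Uint8Array }>;
  cancel(): Promise<void>;
}
interface CorsResponse {
  ok: boolean;
  headers: { get(name: string): string | null };
  body: { getReader(): BodyReader } | null;
}
type FetchLike = (url: string) => Promise<CorsResponse>;

// Step 2: derive an image URL from the first Image API service, if any.
// 'full' is the Image API 2.x spelling of the size; 3.0 uses 'max'.
function imageUrlFromService(res: ImageResource, size = 'full'): string | undefined {
  const services = res.service === undefined ? [] : ([] as ImageService[]).concat(res.service);
  for (const svc of services) {
    const base = svc.id ?? svc['@id'];
    if (base) {
      return `${base.replace(/\/+$/, '')}/full/${size}/0/default.jpg`;
    }
  }
  return undefined;
}

// Steps 1 and 3: GET the URL and try to read the first chunk of an image body.
async function isReadableImage(fetchFn: FetchLike, url: string): Promise<boolean> {
  try {
    const resp = await fetchFn(url);
    const contentType = resp.headers.get('content-type') ?? '';
    if (!resp.ok || !contentType.startsWith('image/') || !resp.body) {
      return false;
    }
    const reader = resp.body.getReader();
    const { value } = await reader.read();
    await reader.cancel(); // we only need the first bytes
    return value !== undefined && value.length > 0;
  } catch {
    // In browsers, a request blocked by CORS rejects with a TypeError.
    return false;
  }
}

// Returns a URL that can be read from JavaScript, or undefined if the
// caller should use the server fallback instead.
async function findCorsReadableImageUrl(
  fetchFn: FetchLike,
  res: ImageResource,
): Promise<string | undefined> {
  const candidates = [res.id ?? res['@id'], imageUrlFromService(res)];
  for (const url of candidates) {
    if (url && (await isReadableImage(fetchFn, url))) {
      return url;
    }
  }
  return undefined;
}
```

In a browser, `fetchFn` would be a thin wrapper around `fetch`; injecting it keeps the check testable without network access.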

Failed to generate PDF: navigator.locks is undefined

Hello! I've just tried out your code (finally 1 year after the 2022 conference) and I am running into an issue.

"Failed to generate PDF: navigator.locks is undefined"

This is a 2.1 manifest: https://data.metascripta.org/iiif/VFL_VTL_03177_01.json

Initially I had an @id mismatch and I got a "u is null" error, but then I tested with the validator and got it corrected.
http://presentation-validator.iiif.io/validate?version=2.1&url=https://data.metascripta.org/iiif/VFL_VTL_03177_01.json

Now when I try to generate PDF, I get the error above. It does successfully calculate the PDF size (some 800MB).

Any ideas? Running Docker on Mac OS Monterey (12.3.1).
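
For context: browsers only expose the Web Locks API (`navigator.locks`) in secure contexts, meaning HTTPS or localhost. One possible cause, which is only a guess, is that the Docker instance was opened over plain HTTP on a non-localhost address. A minimal sketch of a guard with an in-page fallback is shown below; `makeLockRunner` and `withLock` are hypothetical names, not part of pdiiif, and the fallback only serializes work within a single page.

```typescript
// Only the part of the Web Locks API this sketch relies on.
interface LockManagerLike {
  request<T>(name: string, callback: () => Promise<T>): Promise<T>;
}

function makeLockRunner(locks: LockManagerLike | undefined) {
  // Fallback state: the tail of the task queue for each lock name.
  const tails = new Map<string, Promise<unknown>>();

  return function withLock<T>(name: string, task: () => Promise<T>): Promise<T> {
    if (locks) {
      return locks.request(name, task);
    }
    // No Web Locks available: chain the task behind the previous one.
    const prev = tails.get(name) ?? Promise.resolve();
    const run = prev.then(() => task());
    // Store a never-rejecting tail so one failed task does not block the queue.
    tails.set(name, run.catch(() => undefined));
    return run;
  };
}
```

A caller would create the runner once, passing `navigator.locks` when it exists and `undefined` otherwise.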


Canvases without associated Image Service, but only an Identifier are not recognized

This PDF is generated but pages are all empty. Loads without problems in Mirador:
https://projectmirador.org/embed/?iiif-content=https://icat.digitaldogsbody.net/iiif/1/manifest

Manifest: https://icat.digitaldogsbody.net/iiif/1/manifest

Problematic Canvas Example:

{
"id": "https://icat.digitaldogsbody.net/iiif/1/canvas/11",
"type": "Canvas",
"height": 1471,
"width": 1000,
"items": [
	{
		"id": "https://icat.digitaldogsbody.net/iiif/1/canvas/11/page",
		"type": "AnnotationPage",
		"items": [
			{
				"id": "https://icat.digitaldogsbody.net/iiif/1/canvas/11/page/image",
				"type": "Annotation",
				"motivation": "painting",
				"body": {
					"id": "https://files.transkribus.eu/Get?id=CUWUBAJLTWELOXAXSQTGJMMM&fileType=view",
					"type": "Image",
					"height": 1471,
					"width": 1000,
					"format": "image/jpg"
				},
				"target": "https://icat.digitaldogsbody.net/iiif/1/canvas/11"
			}
		]
	}
  ]
}
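
A minimal sketch of how such a canvas could be handled, assuming Presentation 3 shaped input: when a painting annotation's body has no Image API service, the body's `id` is used directly as the image URL. The type names and `paintingImageUrls` are hypothetical and only cover the fields shown in the example above.

```typescript
interface ImageBody { id: string; type: string; service?: unknown[] }
interface Annotation {
  motivation?: string | string[];
  body?: ImageBody | ImageBody[];
}
interface AnnotationPage { items?: Annotation[] }
interface Canvas { id: string; items?: AnnotationPage[] }

// Collect the URLs of all images painted onto the canvas, whether or not
// the image body advertises an Image API service.
function paintingImageUrls(canvas: Canvas): string[] {
  const urls: string[] = [];
  for (const page of canvas.items ?? []) {
    for (const anno of page.items ?? []) {
      const motivations = ([] as string[]).concat(anno.motivation ?? []);
      if (motivations.indexOf('painting') === -1) {
        continue;
      }
      const bodies = ([] as ImageBody[]).concat(anno.body ?? []);
      for (const body of bodies) {
        // Without a service, the body id itself is the only image URL we have.
        if (body.type === 'Image' && body.id) {
          urls.push(body.id);
        }
      }
    }
  }
  return urls;
}
```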


Originally posted by @ch-sander in #23 (comment)

Image dimensions

Hi Johannes,

what a fantastic tool! Congrats on a great idea and excellent execution.

One thing I noticed is that in the PDF download the image dimensions seem warped. At least for one of our manifests (https://anet.be/iiif/4de3efcf34bc21682b4a82d35fe32b5facfaf0e5/manifest) the PDF looks rather different than when it is viewed in our viewer (https://anet.be/brocade/imageviewer/universalviewer/uv.html#?manifest=/iiif/4de3efcf34bc21682b4a82d35fe32b5facfaf0e5/manifest&config=/iiif/viewercfg.phtml%3Fcg=uauniversalviewer).

See screenshots: native_viewer, pdf_view

Create PDF not working properly in Docker

I can start the docker container and add a manifest.
On the demo web page (https://pdiiif.jbaiter.de/) I get a PDF back,
but on my Docker server I get the following error:

PDF generation failed: Cannot read properties of undefined (reading 'request')

> index.ab1e0e54.js:22 Failed to render page TypeError: Cannot read properties of undefined (reading 'request')
>     at index.ab1e0e54.js:107:30836
>     at new Promise (<anonymous>)
>     at Object.write (index.ab1e0e54.js:107:30805)
>     at kw.write (index.ab1e0e54.js:27:358)
>     at Fw.write (index.ab1e0e54.js:27:182)
>     at mC._write (index.ab1e0e54.js:98:217)
>     at mC._flush (index.ab1e0e54.js:89:1306)
>     at mC.finalizePdfHeader (index.ab1e0e54.js:87:5797)
>     at mC.renderPage (index.ab1e0e54.js:87:6037)
>     at xC (index.ab1e0e54.js:106:19776)
> error @ index.ab1e0e54.js:22
> xC @ index.ab1e0e54.js:106
> await in xC (async)
> X @ index.ab1e0e54.js:107
> await in X (async)
> _ @ index.ab1e0e54.js:107
> index.ab1e0e54.js:107 TypeError: Cannot read properties of undefined (reading 'request')
>     at index.ab1e0e54.js:107:30836
>     at new Promise (<anonymous>)
>     at Object.write (index.ab1e0e54.js:107:30805)
>     at kw.write (index.ab1e0e54.js:27:358)
>     at Fw.write (index.ab1e0e54.js:27:182)
>     at mC._write (index.ab1e0e54.js:98:217)
>     at mC._flush (index.ab1e0e54.js:89:1306)
>     at mC.finalizePdfHeader (index.ab1e0e54.js:87:5797)
>     at mC.renderPage (index.ab1e0e54.js:87:6037)
>     at xC (index.ab1e0e54.js:106:19776)

When I click the PDF button on your server, it opens a window where I can save my PDF file (works correctly).
On my server it opens an empty pop-up window instead (http://b-lx0201.sbb.spk-berlin.de:8082/streamsaver-mitm.html).

Do you have any ideas?

Adding option for color space 'gray' or 'bitonal'

  • Allow users to set a colorspace supported by the IIIF Image API
  • This should be combined with support for PNG images in the generated PDFs, especially with the bitonal color space this could result in very small PDFs compared to color.
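
As a rough illustration of what such an option would translate to, the IIIF Image API encodes quality and format in the last path segment of the image URL (`{region}/{size}/{rotation}/{quality}.{format}`). The sketch below uses hypothetical helper names and assumes the list of supported qualities has already been read from the service's info.json (`profile[1].qualities` in Image API 2.x, `extraQualities` in 3.0).

```typescript
type Quality = 'default' | 'color' | 'gray' | 'bitonal';
type Format = 'jpg' | 'png';

// Fall back to 'default' when the server does not advertise the requested quality.
function pickQuality(requested: Quality, supported: string[]): Quality {
  if (requested === 'default' || supported.indexOf(requested) !== -1) {
    return requested;
  }
  return 'default';
}

// Build a full-region image URL. The 'max' size keyword assumes
// Image API 2.1 or later; 2.0 servers expect 'full' instead.
function buildImageUrl(
  serviceId: string,
  quality: Quality = 'default',
  format: Format = 'jpg',
  size = 'max',
): string {
  return `${serviceId.replace(/\/+$/, '')}/full/${size}/0/${quality}.${format}`;
}
```

For example, `buildImageUrl(id, pickQuality('bitonal', supported), 'png')` would request a bitonal PNG where the server supports it and a default-quality PNG otherwise.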

Originally posted by @rdf9 in #32

Cannot read properties of undefined (reading 'type')

Hi, after the last update I get the following issue:

You can test it using my manifest: [https://iiif.crossasia.org/s/dllm/madoc/api/manifests/20817/export/3.0]

hub-77f45d8c.js:6 Error while generating cover page TypeError: Cannot read properties of undefined (reading 'type')
    at ab.get (index-1587c411.js:48:81127)
    at i (index-1587c411.js:17:32273)
    at Object.i [as getBestThumbnailAtSize] (index-1587c411.js:17:32900)
    at Pb (index-1587c411.js:50:2214)
    at e1 (index-1587c411.js:101:16175)
    at t1 (index-1587c411.js:101:19039)
    at async x (index-1587c411.js:102:46184)
    at async HTMLButtonElement.N (index-1587c411.js:102:48214)
(anonymous) @ hub-77f45d8c.js:6
hub-77f45d8c.js:6 TypeError: Cannot read properties of undefined (reading 'type')
    at ab.get (index-1587c411.js:48:81127)
    at i (index-1587c411.js:17:32273)
    at Object.i [as getBestThumbnailAtSize] (index-1587c411.js:17:32900)
    at Pb (index-1587c411.js:50:2214)
    at e1 (index-1587c411.js:101:16175)
    at t1 (index-1587c411.js:101:19039)
    at async x (index-1587c411.js:102:46184)
    at async HTMLButtonElement.N (index-1587c411.js:102:48214)

Do you have any ideas?

Provider without logo causes exception

Trying a Presentation 3 Manifest with a provider that does not contain a logo triggers an exception.

logo: provider.logo?.[0].id,
See also the previous line's use of homepage. It looks like these properties are being initialized to empty arrays if no value is provided in the Manifest. Debugger view:
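
The reason the quoted line throws is that `?.` only short-circuits when `logo` itself is null or undefined. For an empty array, `logo[0]` evaluates to `undefined` and the following `.id` access fails. A minimal sketch of a safer accessor, with hypothetical type and function names:

```typescript
interface Resource { id: string }
interface Provider { logo?: Resource[]; homepage?: Resource[] }

// Guard both the array and its first element, so that an empty array
// yields undefined instead of throwing.
function firstId(resources: Resource[] | undefined): string | undefined {
  return resources?.[0]?.id;
}

function describeProvider(provider: Provider): { logo?: string; homepage?: string } {
  return {
    logo: firstId(provider.logo),
    homepage: firstId(provider.homepage),
  };
}
```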

Screen Shot 2022-12-07 at 2 45 10 PM

Sample Manifest: https://collections.library.yale.edu/manifests/15504927
Web page of object: https://collections.library.yale.edu/catalog/15504927

Table of contents is duplicated for nested ranges

If a manifest includes nested ranges in its structures array, the generated PDF will duplicate named links to each of the canvases in the table of contents, instead of correctly displaying a single hierarchy.

Steps to reproduce

Using the demo site with a manifest containing nested ranges in its structures array such as https://iiif.lib.harvard.edu/manifests/drs:4997399 generates a PDF that has both the correct hierarchy and the repeated individual canvases in the table of contents. For comparison, this duplication does not display in the "Index" tab of the Mirador demo site.
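
One way to arrive at a single hierarchy is sketched below. It assumes the ranges have already been normalized to a Presentation 3 style nested shape with plain string labels, and it guesses that the duplicates come from ranges or canvases being listed both at the top level and inside their parent. The cause has not been confirmed, and every name in the sketch is hypothetical.

```typescript
interface CanvasRef { type: 'Canvas'; id: string }
interface Range {
  type: 'Range';
  id: string;
  label: string;
  items: (Range | CanvasRef)[];
}
interface TocEntry { label: string; startCanvas?: string; children: TocEntry[] }

// The first canvas reachable from a range, used as the link target of its entry.
function firstCanvas(range: Range): string | undefined {
  for (const item of range.items) {
    const id = item.type === 'Canvas' ? item.id : firstCanvas(item);
    if (id) {
      return id;
    }
  }
  return undefined;
}

function collectNestedIds(range: Range, out: string[]): void {
  for (const item of range.items) {
    if (item.type === 'Range') {
      out.push(item.id);
      collectNestedIds(item, out);
    }
  }
}

// Drop ranges from the top level that already appear inside another range.
function topLevelRanges(ranges: Range[]): Range[] {
  const nested: string[] = [];
  ranges.forEach((r) => collectNestedIds(r, nested));
  return ranges.filter((r) => nested.indexOf(r.id) === -1);
}

// One entry per range; canvases are link targets only, never separate entries.
function buildToc(ranges: Range[]): TocEntry[] {
  return ranges.map((range) => ({
    label: range.label,
    startCanvas: firstCanvas(range),
    children: buildToc(range.items.filter((i): i is Range => i.type === 'Range')),
  }));
}
```

Calling `buildToc(topLevelRanges(ranges))` then produces each range exactly once, at the depth where it is nested.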

"Fetch is not defined" error when starting docker

When firing up Docker on the latest from main I'm currently seeing:

file:///opt/pdiiif/pdiiif-lib/lib/esm/iiif.js:14
let fetchImpl = fetch;
                ^

ReferenceError: fetch is not defined
    at file:///opt/pdiiif/pdiiif-lib/lib/esm/iiif.js:14:17
    at ModuleJob.run (node:internal/modules/esm/module_job:198:25)
    at async Promise.all (index 0)
    at async ESMLoader.import (node:internal/modules/esm/loader:385:24)
    at async loadESM (node:internal/process/esm_loader:88:5)
    at async handleMainPromise (node:internal/modules/run_main:61:12)

A potential workaround is to update the offending lines in download.ts, iiif.ts, ocr.ts to be something like:

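// Assumes the node-fetch package is installed; skip this import if the file already has it
import nodeFetch from 'node-fetch';

// Fetch for node: fall back to node-fetch when there is no global fetch
let fetchImpl: typeof fetch;
if (typeof fetch === 'undefined') {
  fetchImpl = nodeFetch as typeof fetch;
} else {
  fetchImpl = fetch;
}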

Then things start working for me again.
