mupdf.js's Introduction

MuPDF.js

This is a build of MuPDF for JavaScript and TypeScript, compiled to WebAssembly for speed.

The MuPDF.js library can be used both in browsers and in Node.js.

Features

  • Render PDF pages to images
  • Search PDF file text contents
  • Create and edit PDF annotations
  • Access and fill out PDF forms
  • Edit PDF documents
  • Supports basic CJK (Chinese, Japanese, Korean) fonts

Installing

From the command line, go to the folder you want to work from and run:

npm install mupdf

The mupdf package is only available as an ES module (ESM). Either use the .mjs file extension or change the project type:

npm pkg set type=module

Running

The following example script demonstrates how to load a document and then print out the page count.

Create a file count-pages.mjs:

import * as process from "node:process"
import * as fs from "node:fs"
import * as mupdf from "mupdf"

if (process.argv.length < 3) {
    console.error("usage: node count-pages.mjs file.pdf");
    process.exit(1);
}

const filename = process.argv[2];
const doc = mupdf.Document.openDocument(fs.readFileSync(filename), "application/pdf");
const count = doc.countPages();

console.log(`${filename} has ${count} pages.`);

Run the script:

node count-pages.mjs file.pdf

Using TypeScript

To use TypeScript you need to create a tsconfig.json project file to tell the compiler and Visual Studio Code to use the "nodenext" module resolution:

{
    "compilerOptions": {
        "module": "nodenext"
    }
}

License and Copyright

MuPDF.js is available under Open Source AGPL and commercial license agreements. If you determine you cannot meet the requirements of the AGPL, please contact Artifex for more information regarding a commercial license.

Documentation

For documentation please refer to mupdfjs.readthedocs.io.

Code Examples

Check out the example projects to help you get started. The examples include a simple PDF Viewer that runs mupdf in the browser, several command line scripts, and more!

Getting Started with Local Development

You can build the MuPDF.js library from source by referring to BUILDING.md.

Contributing

To contribute, please open (or help answer!) an Issue on our GitHub board and create a Pull Request (PR) for review. Find us on Discord at #mupdf-js to chat with us directly.

mupdf.js's Issues

API for remove embedded file

We can add embedded files to an annotation. Can we also add an API to remove the embedded file from an annotation?

DocumentWriter.close WASM RuntimeError

The DocumentWriter.close function seems to malfunction and throws a WASM RuntimeError.

This is the code reduced to the minimum:

import * as mupdf from 'mupdf'

const outBuffer = new mupdf.Buffer()
const out = new mupdf.DocumentWriter(outBuffer, ".pdf", "")
out.close()

It throws the following error:

wasm://wasm/0239b566:1


RuntimeError: null function or function signature mismatch
    at wasm://wasm/0239b566:wasm-function[1894]:0x13ff94
    at wasm://wasm/0239b566:wasm-function[1893]:0x13ff49
    at wasm://wasm/0239b566:wasm-function[1985]:0x147d8e
    at wasm://wasm/0239b566:wasm-function[1895]:0x13fff1
    at wasm://wasm/0239b566:wasm-function[3558]:0x2669f9
    at invoke_viiii (file:///.../node_modules/mupdf/dist/mupdf-wasm.js:5784:29)
    at wasm://wasm/0239b566:wasm-function[3548]:0x25ff90
    at wasm://wasm/0239b566:wasm-function[3546]:0x25c738
    at wasm://wasm/0239b566:wasm-function[3567]:0x26cc49
    at wasm://wasm/0239b566:wasm-function[2299]:0x1797b5

Node.js v21.6.2

If I instead try to add a page:

import * as mupdf from 'mupdf'

const outBuffer = new mupdf.Buffer()
const out = new mupdf.DocumentWriter(outBuffer, ".pdf", "")
const dev = out.beginPage([0,0,200,200])
/* draw something on the Device or not, it doesn't matter */
out.endPage()
out.close()

another error is thrown:

wasm://wasm/0239b566:1


RuntimeError: memory access out of bounds
    at wasm://wasm/0239b566:wasm-function[1894]:0x13ffad
    at wasm://wasm/0239b566:wasm-function[1893]:0x13ff49
    at wasm://wasm/0239b566:wasm-function[1985]:0x147ecf
    at wasm://wasm/0239b566:wasm-function[1895]:0x13fff1
    at wasm://wasm/0239b566:wasm-function[3558]:0x2669f9
    at invoke_viiii (file:///.../node_modules/mupdf/dist/mupdf-wasm.js:5784:29)
    at wasm://wasm/0239b566:wasm-function[3548]:0x25ff90
    at wasm://wasm/0239b566:wasm-function[3546]:0x25c738
    at wasm://wasm/0239b566:wasm-function[3567]:0x26cc49
    at wasm://wasm/0239b566:wasm-function[2299]:0x1797b5
    
Node.js v21.6.2

In both cases, outBuffer.getLength() is always 0 after out.close().

When recreating everything from mupdf's muconvert.c in JS, the second error is thrown as well.

REST server should cache open documents

Repeatedly doing fetch() on the same document from a third party server without caching is going to be slower than it needs to be. We should cache the most recently used documents and reuse the same array buffer that has already been fetched.

This caching can be handled in loadDocumentFromUrl, which can resolve to the cached document if it is in the cache.

Ideally we should use the fetch HTTP response headers to check for freshness as well, but that may be overkill for an example server.
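
The caching described above could be sketched as follows. This is a minimal illustration, not the server's actual code: DocumentCache, its size limit, and the injected fetchBytes callback are hypothetical names, and only loadDocumentFromUrl comes from the text. The sketch caches the promise for the fetched bytes so that repeated and concurrent requests for one URL share a single download. Freshness checks via HTTP response headers are left out.

```javascript
// Minimal LRU cache sketch. DocumentCache and fetchBytes are hypothetical
// names used for illustration only.
class DocumentCache {
    constructor(maxEntries = 8) {
        this.maxEntries = maxEntries;
        this.entries = new Map(); // a Map iterates in insertion order: oldest first
    }

    get(url) {
        if (!this.entries.has(url)) return undefined;
        const value = this.entries.get(url);
        // Re-insert to mark this URL as the most recently used.
        this.entries.delete(url);
        this.entries.set(url, value);
        return value;
    }

    set(url, value) {
        this.entries.delete(url);
        this.entries.set(url, value);
        if (this.entries.size > this.maxEntries) {
            // Evict the least recently used entry (the first key in the Map).
            const oldest = this.entries.keys().next().value;
            this.entries.delete(oldest);
        }
    }
}

const cache = new DocumentCache(2); // placeholder size

// fetchBytes is injected so the sketch does not assume an HTTP client.
// In a server it might be: async (url) => (await fetch(url)).arrayBuffer()
function loadDocumentFromUrl(url, fetchBytes) {
    let pending = cache.get(url);
    if (!pending) {
        // Cache the promise itself so concurrent requests share one download.
        pending = new Promise((resolve) => resolve(fetchBytes(url)));
        // Do not keep failed downloads in the cache.
        pending.catch(() => cache.entries.delete(url));
        cache.set(url, pending);
    }
    return pending;
}
```

Storing the promise rather than the resolved buffer means a second request that arrives while the first download is still running does not trigger another fetch.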

Annotations: Add API for "DS"

We can extract required information for color and alignment from a text item from the "DS" property.

Can we expose a get/set API for this?

Annotation: Measure API

Let's support the "Measure" dictionary objects as defined on page 746 of the PDF version 1.7 specification.

"Error: invalid page number: 2"

I was trying to open one of my go-to PDFs for stress testing, but this error seems to occur when opening any PDF in Firefox (Nightly).

Stack trace:

Error: invalid page number: 2                                                    viewer.js:750:11
    2114013 https://mupdf.com/wasm/demo/lib/mupdf-wasm.js:1142
    _emscripten_asm_const_int https://mupdf.com/wasm/demo/lib/mupdf-wasm.js:3820
    invoke_vi https://mupdf.com/wasm/demo/lib/mupdf-wasm.js:6179
    createExportWrapper https://mupdf.com/wasm/demo/lib/mupdf-wasm.js:994
    loadPage https://mupdf.com/wasm/demo/lib/mupdf.js:1392
    getPageSize https://mupdf.com/wasm/demo/worker.js:78
    onmessage https://mupdf.com/wasm/demo/worker.js:39
    open_document_from_file https://mupdf.com/wasm/demo/viewer.js:750
    onchange https://mupdf.com/wasm/demo/index.html:1

Sanitise the simple-viewer example

At the moment the solution works, however it is perhaps over-engineered. Let's try to tidy it up and make it more "simple" to follow :)

OCR

I haven't found any reference to OCR in the mupdf.js docs, but I see that Tesseract is an optional dependency of mupdf. Is there an option to do OCR using mupdf.js?

Add "has" accessors to mupdf.js

It looks like PDFAnnotation in lib/mupdf.js is missing all the hasXxx accessors and the corresponding functions in lib/mupdf.c. Let's add them to improve the API.

Trying mupdf library directly with React

With the following setup we can run a React-based app via a Node server: the Node server provides a REST API to the React client.

[Figure: react-via-node-server]

However, if we just want to run a React-based app directly on top of the MuPDF library:

[Figure: react-direct]

it fails to compile with:

./node_modules/mupdf/lib/mupdf-wasm.js:130:7
Module not found: Can't resolve 'fs'

Could there be some further environmental setup that the wasm library needs to understand?

REST server should catch exceptions from WASM

Each API call should be wrapped in a try/catch so that an exception in the mupdf library results in a user-friendly HTTP status code and error message.

Examples include opening a file that is corrupt or not a PDF file, or loading a page that is out of range.
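
One way to picture such a wrapper is the sketch below. The helper name runApiCall and the status code mapping are assumptions made for illustration and are not part of the example server. The "invalid page number" text matches the error message quoted elsewhere on this page, and WebAssembly.RuntimeError is the standard error class raised when WASM code traps.

```javascript
// Hypothetical helper: run one API call and turn any thrown exception into
// a plain { status, body } result that the HTTP layer can send back.
function runApiCall(action) {
    try {
        return { status: 200, body: action() };
    } catch (err) {
        const message = err instanceof Error ? err.message : String(err);
        let status = 422; // assumed default: the document could not be processed
        if (err instanceof WebAssembly.RuntimeError) {
            status = 500; // a trap inside the WASM module itself
        } else if (/invalid page number/i.test(message)) {
            status = 400; // the caller asked for a page that does not exist
        }
        return { status, body: { error: message } };
    }
}

// Usage with a stand-in for a library call that fails:
const result = runApiCall(() => {
    throw new Error("invalid page number: 2");
});
console.log(result.status, result.body.error);
```

Returning a plain object keeps the sketch independent of any particular web framework; a route handler would copy status and body into its own response object.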

REST server should rate limit fetches to third party server

We do not want our example server to be usable as a DDoS proxy.
For each API request a user makes, we issue a fetch to the third-party server named in the URL.
An attacker can exploit this to launch a DDoS attack using our REST server's bandwidth.

Here are a few possible solutions:

  • Do not issue a new fetch for a file if the same URL is already being fetched.
  • Rate limit how many fetches we do to each domain.
  • Only allow fetching from domains that are on a whitelist.
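
The three ideas in the list above can be combined in a small guard, sketched below under stated assumptions. Every name (ALLOWED_HOSTS, MAX_FETCHES_PER_WINDOW, checkFetchAllowed, guardedFetch) is hypothetical, the whitelist entry and the limits are placeholder values rather than recommendations, and the actual fetch is injected as doFetch so that no particular HTTP client is assumed.

```javascript
// Placeholder policy values, chosen only for illustration.
const ALLOWED_HOSTS = new Set(["mupdf.com"]);
const MAX_FETCHES_PER_WINDOW = 10;
const WINDOW_MS = 60_000;

const recentFetches = new Map(); // hostname -> timestamps of recent fetches
const inFlight = new Map();      // url -> promise for a fetch that is still running

function checkFetchAllowed(url, now = Date.now()) {
    const host = new URL(url).hostname;
    if (!ALLOWED_HOSTS.has(host)) {
        return { allowed: false, reason: "host not on whitelist" };
    }
    // Keep only the timestamps that still fall inside the sliding window.
    const recent = (recentFetches.get(host) ?? []).filter((t) => now - t < WINDOW_MS);
    if (recent.length >= MAX_FETCHES_PER_WINDOW) {
        recentFetches.set(host, recent);
        return { allowed: false, reason: "rate limit exceeded" };
    }
    recent.push(now);
    recentFetches.set(host, recent);
    return { allowed: true };
}

function guardedFetch(url, doFetch, now = Date.now()) {
    // Reuse a fetch that is already running for the same URL.
    if (inFlight.has(url)) return inFlight.get(url);
    const verdict = checkFetchAllowed(url, now);
    if (!verdict.allowed) return Promise.reject(new Error(verdict.reason));
    const pending = new Promise((resolve) => resolve(doFetch(url)))
        .finally(() => inFlight.delete(url)); // forget the URL once the fetch settles
    inFlight.set(url, pending);
    return pending;
}
```

The in-memory maps only protect a single server process. A deployment with several processes would need shared state, which is outside the scope of this sketch.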

simple-viewer search always skips results from the current page

The search function of simple-viewer has some minor flaws.

When I type in the search panel and click Next for the first time, the search results on the current page are always skipped and the results on the next page are displayed directly.

It is expected that the search results on the current page should be displayed first, and when Next is clicked again, the results on the next page will be displayed.
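
The expected behavior can be sketched as a small page-selection function. This is a hypothetical illustration and not the simple-viewer's code: nextPageWithHits, hitCounts (the number of search hits on each page), and isFirstSearch are invented names, and wrapping around to the start of the document is an assumption.

```javascript
// Return the index of the page whose results should be shown after a click
// on Next, or -1 when no page has any hits.
function nextPageWithHits(hitCounts, currentPage, isFirstSearch) {
    const pageCount = hitCounts.length;
    // On the first Next for a query, begin at the current page;
    // on later clicks, begin one page further on.
    const start = isFirstSearch ? currentPage : currentPage + 1;
    for (let offset = 0; offset < pageCount; offset++) {
        const page = (start + offset) % pageCount; // wrap around (assumed)
        if (hitCounts[page] > 0) return page;
    }
    return -1;
}

// Usage: pages 1 and 2 contain hits and the viewer is on page 1.
const hits = [0, 2, 1, 0];
console.log(nextPageWithHits(hits, 1, true));  // first click stays on page 1
console.log(nextPageWithHits(hits, 1, false)); // second click moves to page 2
```

The behavior reported above corresponds to always starting at currentPage + 1, which is why hits on the current page are skipped on the first click.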
