unpdf's Issues

Missing `pdfjs-dist` types

Environment

  • unpdf v0.10.1
  • node v18.19.0

Reproduction

The types should be exported here so we can use them.

import * as PDFJS from './types/src/pdf'
declare function resolvePDFJS(): Promise<typeof PDFJS>
export { resolvePDFJS }

Hence, the currently generated declaration file looks like this:

import * as PDFJS from './types/src/pdf'
declare function resolvePDFJS(): Promise<typeof PDFJS>
export { resolvePDFJS } // no types are included

unpdf/pdfjs type not exported error

Describe the bug

The types are not exported together with unpdf/pdfjs. This prevents typing variables / function params when composing with the library.
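
Until the types are exported, one possible workaround is to derive them from the return type of `resolvePDFJS` itself. The sketch below is self-contained: `fakePdfjs` and the local `resolvePDFJS` are hypothetical stand-ins for the real module (in a real project, `resolvePDFJS` would be imported from the library), and `PDFJSModule`, `PDFDocument`, and `countPages` are names invented for illustration.

```typescript
// Hypothetical stand-in for a module whose declaration file exports only a
// resolver function, not the types of the module it resolves to.
const fakePdfjs = {
  getDocument(data: Uint8Array) {
    // Assumption for illustration: one "page" if any bytes were supplied.
    const doc = { numPages: data.length > 0 ? 1 : 0 };
    return { promise: Promise.resolve(doc) };
  },
};

// Stand-in for the real resolver.
async function resolvePDFJS(): Promise<typeof fakePdfjs> {
  return fakePdfjs;
}

// Derive the module type from the resolver instead of importing it by name.
type PDFJSModule = Awaited<ReturnType<typeof resolvePDFJS>>;
// Derive the document type from the loading task returned by getDocument.
type PDFDocument = Awaited<ReturnType<PDFJSModule["getDocument"]>["promise"]>;

// The derived types can now annotate parameters and variables.
async function countPages(pdfjs: PDFJSModule, data: Uint8Array): Promise<number> {
  const doc: PDFDocument = await pdfjs.getDocument(data).promise;
  return doc.numPages;
}
```

This only removes the need to name the types directly; exporting them from the package would still be the cleaner fix.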

Additional context

No response

Logs

No response

Unpdf can't render pages with images

Environment

Node.js v20.9.0
PNPM v8.10.0
UnPDF v0.10.0

Reproduction

Example Code: CodeSandBox

Describe the bug

When I render a page with unpdf, renderPageAsImage throws an error if the PDF contains an embedded image.

Additional context

No response

Logs

TypeError: r.createCanvas is not a function
    at NodeCanvasFactory._createCanvas (file:///workspaces/workspace/node_modules/.pnpm/[email protected]/node_modules/unpdf/dist/pdfjs.mjs:1:1316062)
    at NodeCanvasFactory.create (file:///workspaces/workspace/node_modules/.pnpm/[email protected]/node_modules/unpdf/dist/pdfjs.mjs:1:1153693)
    at CachedCanvases.getCanvas (file:///workspaces/workspace/node_modules/.pnpm/[email protected]/node_modules/unpdf/dist/pdfjs.mjs:1:1163909)
    at CanvasGraphics.paintInlineImageXObject (file:///workspaces/workspace/node_modules/.pnpm/[email protected]/node_modules/unpdf/dist/pdfjs.mjs:1:1198408)
    at CanvasGraphics.paintImageXObject (file:///workspaces/workspace/node_modules/.pnpm/[email protected]/node_modules/unpdf/dist/pdfjs.mjs:1:1197232)
    at CanvasGraphics.executeOperatorList (file:///workspaces/workspace/node_modules/.pnpm/[email protected]/node_modules/unpdf/dist/pdfjs.mjs:1:1172983)
    at InternalRenderTask._next (file:///workspaces/workspace/node_modules/.pnpm/[email protected]/node_modules/unpdf/dist/pdfjs.mjs:1:1152632)

PDF Generator

Describe the feature

It would be nice to add a PDF generator (for example, by integrating something like pdf-lib).

Additional information

  • Would you be willing to help implement this feature?

Cannot find module '../build/Release/canvas.node'

Environment

System:
OS: macOS 13.3.1
CPU: (12) arm64 Apple M2 Pro
Memory: 89.58 MB / 16.00 GB
Shell: 5.9 - /bin/zsh

Reproduction

Run the exact code snippet from the README for renderPageAsImage.

Describe the bug

node:internal/modules/cjs/loader:1075
  const err = new Error(message);
              ^

Error: Cannot find module '../build/Release/canvas.node'
Require stack:
- /Users/code/node_modules/.pnpm/[email protected]/node_modules/canvas/lib/bindings.js
- /Users/code/node_modules/.pnpm/[email protected]/node_modules/canvas/lib/canvas.js
- /Users/code/node_modules/.pnpm/[email protected]/node_modules/canvas/index.js
    at Module._resolveFilename (node:internal/modules/cjs/loader:1075:15)
    at a._resolveFilename (/Users/powella/Library/pnpm/global/5/.pnpm/[email protected]/node_modules/tsx/dist/cjs/index.cjs:1:1729)
    at Module._load (node:internal/modules/cjs/loader:920:27)
    at Module.require (node:internal/modules/cjs/loader:1141:19)
    at require (node:internal/modules/cjs/helpers:110:18)
    at Object.<anonymous> (/Users/code/node_modules/.pnpm/[email protected]/node_modules/canvas/lib/bindings.js:3:18)
    at Module._compile (node:internal/modules/cjs/loader:1254:14)
    at Object.S (/Users/powella/Library/pnpm/global/5/.pnpm/[email protected]/node_modules/tsx/dist/cjs/index.cjs:1:1292)
    at Module.load (node:internal/modules/cjs/loader:1117:32)
    at Module._load (node:internal/modules/cjs/loader:958:12) {
  code: 'MODULE_NOT_FOUND',
  requireStack: [
    '/Users/code/node_modules/.pnpm/[email protected]/node_modules/canvas/lib/bindings.js',
    '/Users/code/node_modules/.pnpm/[email protected]/node_modules/canvas/lib/canvas.js',
    '/Users/code/node_modules/.pnpm/[email protected]/node_modules/canvas/index.js'
  ]
}

Node.js v18.15.0
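
The require stack suggests that the `canvas` package is installed but its compiled native binding (`canvas.node`) is missing, so loading fails before any rendering starts. As a rough sketch of how calling code might detect this up front, the helper below tries to load an optional module and resolves to `undefined` instead of throwing. `tryLoadOptional` is a hypothetical name and is not part of unpdf or canvas.

```typescript
// Try to load an optional dependency. Resolves to the loaded module, or to
// undefined if loading fails (for example, because the package or its native
// binding is missing).
async function tryLoadOptional(name: string): Promise<unknown> {
  try {
    return await import(name);
  } catch {
    return undefined;
  }
}
```

A caller could run `await tryLoadOptional('canvas')` before rendering and report a clearer message when the result is `undefined`.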

Additional context

No response

Logs

No response

Does not work in BGSW using Plasmo framework

Environment

Framework: Plasmo 0.84.0
Client side Chrome Browser Extension

Reproduction

Can be reproduced by creating a background service worker (BGSW) in the Plasmo framework and importing unpdf. The error message is:

🔴 ERROR | Build failed. To debug, run plasmo dev --verbose.
🔴 ERROR | Failed to resolve 'unpdf/pdfjs' from './node_modules/.pnpm/[email protected]/node_modules/unpdf/dist/index.mjs'

Describe the bug

As described above, unpdf seems to be looking for a pdfjs dependency ('unpdf/pdfjs') that the build cannot resolve.

Additional context

No response

Logs

No response

Struggling to get it to work in a Supabase Edge Function

Environment

Using via esm: https://esm.sh/[email protected]/
Deno version: 1.38.4 (this is a guess, because it's not easy to inspect the Supabase Edge Function environment, but they say they use the latest stable version).

Reproduction

Sorry, it's a bit hard to reproduce since it only fails in the deployed Supabase Edge Function. When I run it locally in my Docker container, it works fine. Here's the relevant code from my edge function, though:

import { configureUnPDF, getResolvedPDFJS } from 'https://esm.sh/[email protected]';
import * as pdfjs from 'https://esm.sh/[email protected]/dist/pdfjs.mjs';

configureUnPDF({
  // deno-lint-ignore require-await
  pdfjs: async () => pdfjs,
});
const resolvedPdfJs = await getResolvedPDFJS();
const { getDocument } = resolvedPdfJs;

export async function convertPdfToText(
  arrayBuffer: ArrayBuffer
): Promise<string> {
  try {
    const data = new Uint8Array(arrayBuffer);

    // Get the document
    const doc = await getDocument(data).promise;
    let allText = '';

    // Iterate through each page of the document
    for (let i = 1; i <= doc.numPages; i++) {
      const page = await doc.getPage(i);
      const textContent = await page.getTextContent();

      // Combine the text items with a space (adjust as needed)
      const pageText = textContent.items
        .map((item) => {
          if ('str' in item) {
            return item.str;
          }
          return '';
        })
        .join(' ');
      allText += pageText + '\n'; // Add a newline after each page's text
    }

    return allText;
  } catch (error) {
    console.error('Error converting PDF to text', error);
    throw error;
  }
}
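
The per-page joining step in the function above can be pulled out into a pure function, which makes it testable without a PDF or a working PDF.js import. This is only a sketch: `TextItemLike` and `joinTextItems` are illustrative names, and the item shape is assumed from the `'str' in item` check in the code above.

```typescript
// Minimal shape assumed for illustration: items with text carry a `str` field;
// other items (such as marked-content markers) do not.
type TextItemLike = { str: string } | { type: string };

// Join the text of one page's items with single spaces, mapping non-text
// items to empty strings, as the loop above does.
function joinTextItems(items: TextItemLike[]): string {
  return items.map((item) => ("str" in item ? item.str : "")).join(" ");
}
```

Because non-text items become empty strings, they still contribute a separator; filtering them out before joining would avoid doubled spaces.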

Describe the bug

The Supabase Edge Functions log consistently shows this error:

event loop error: Error: PDF.js is not available. Please add the package as a dependency.
    at f (https://esm.sh/v135/[email protected]/deno/unpdf.mjs:2:574)
    at async h (https://esm.sh/v135/[email protected]/deno/unpdf.mjs:2:230)
    at async file:///home/runner/work/tl.ai/tl.ai/supabase/functions/process/index.ts:12:23

Originally, I followed the base setup instructions. Then, I tried to use getResolvedPDFJS. Finally, I tried calling configureUnPDF first and pointing pdfjs specifically to the one exported from your package. However, all of these attempts still failed in the production environment.

I'm mainly wondering if I'm not following the instructions correctly for configuring pdfjs. Thanks in advance for your help!

Additional context

No response

Logs

No response
