Node-PDFReader

A PDF reader for Node. Based on PDF.JS.

WARNING

This is super experimental. It's more of a proof of concept. Some terrible things:

  • no test coverage
  • hacked up code
  • sync file operations to first store font files on disk and later read them again (yeah, it's really that awful)
  • no Windows support (due to lack of freetype support in node-canvas)

Overview

Right now you can:

  • Render single or all pages to PNG files
  • Get the text content of single pages

Installation

You need to have node and build tools installed.

If you haven't installed the cairo library, or installed it without freetype support, you can install it by running this script (make sure to change into a directory where you can store the temporary files created while the libraries are built):

$ cd <download-folder or somewhere you can put some temporary build-files>
$ bash <(curl -fsSk https://raw.github.com/jviereck/node-canvas/font/install)

Once that is done, install the dependencies:

$ npm install

This will fetch node-canvas and build it.

Usage

See the example directory. You can run the example from the root directory using:

$ node example/simple.js

This loads the trace-monkey PDF, extracts the text of the first page and dumps it to the console. It then renders the first page using a white background and all the other pages without a background. The resulting PNG files are stored in the example/ directory.

The code of the simple.js file looks like this:

var PDFReader = require('../index').PDFReader;

function errorDumper(err) {
  if (err) {
    console.log('something went wrong :/');
    throw err;
  }
}

var pdf = new PDFReader(__dirname + '/trace.pdf');
pdf.on('error', errorDumper);
pdf.on('ready', function(pdf) {
  // Render a single page.
  pdf.render(1 /* First page */, {
    bg: true,  /* Enable white background */
    output: __dirname + '/page-single.png'
  }, errorDumper);

  // Render all pages.
  pdf.renderAll({
    output: function(pageNum) {
      return __dirname + '/page' + pageNum + '.png';
    }
  }, errorDumper);

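  // Get the text content of a single page (similar to pdf2txt).
  pdf.getContent(1 /* First page */, function(err, content) {
    // Report a failed extraction instead of silently logging nothing useful.
    if (err) {
      return errorDumper(err);
    }
    console.log(content);
  }, errorDumper);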
});
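
The Overview only lists text extraction for single pages. If you want the text of several pages, one option is to call getContent once per page and collect the results. The following is a minimal sketch, not part of node-pdfreader: getContents is a hypothetical helper, and it assumes that getContent(pageNum, callback) calls back with (err, content) as in simple.js. It requests one page at a time and stops at the first error.

```javascript
// Hypothetical helper, not part of node-pdfreader.
// Assumes pdf.getContent(pageNum, callback) calls back with (err, content),
// as in example/simple.js.
function getContents(pdf, pageNums, done) {
  var results = [];

  function next(i) {
    // All requested pages have been read: hand back the collected content.
    if (i >= pageNums.length) {
      return done(null, results);
    }
    pdf.getContent(pageNums[i], function(err, content) {
      if (err) {
        // Stop at the first failing page and report the error.
        return done(err);
      }
      results.push(content);
      next(i + 1);
    });
  }

  next(0);
}
```

Inside the 'ready' handler this could be called as getContents(pdf, [1, 2], function(err, contents) { ... }), where contents holds one entry per requested page, in the order the page numbers were given.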

FAQ

I get the error "Need to compile node-canvas/cairo with font support."

You need to have a version of cairo with freetype2 font support. It's best to first compile and install the freetype2 library and then compile cairo. At the end of running ./configure when building cairo, you should see freetype listed as "yes" among the font backends.

Is there Windows support?

Not for rendering. I just haven't tested the special node-canvas build on Windows, so I've disabled Windows support.

Can you please implement X?

No. I don't want to invest too much time in this project. It's a proof of concept for me. However, I'm happy to help others implement missing features and to accept PRs :)

Is it necessary to compile cairo/freetype/node-canvas just to extract text?

No. This was just the easiest way for me to get something out the door.

Contributors

jviereck, dariusk
