
epub-gen's Introduction

epub-gen - a library to make EPUBs from HTML

Join the chat at https://gitter.im/cyrilis/epub-gen

Generate EPUB books from HTML with a simple API in Node.js.


This library writes temporary HTML files, downloads the images referenced in your HTML, and then generates the EPUB book.

It's very fast, apart from the time spent downloading images from the web.

Usage

Install the library and save it as a dependency (recommended) by running this in your project directory:

npm install epub-gen --save

Then put this in your code:

    const Epub = require("epub-gen");

    new Epub(option [, output]).promise.then(
        () => console.log("Ebook Generated Successfully!"),
        err => console.error("Failed to generate Ebook because of ", err)
    );

Options

  • title: Title of the book

  • author: Name of the author for the book, string or array, eg. "Alice" or ["Alice", "Bob"]

  • publisher: Publisher name (optional)

  • cover: Book cover image (optional), File path (absolute path) or web url, eg. "http://abc.com/book-cover.jpg" or "/User/Alice/images/book-cover.jpg"

  • output: Output path (absolute path). You can also pass the output path as the second argument to the constructor, eg: new Epub(options, output)

  • version: Version of the generated EPUB: 3 for the latest version (http://idpf.org/epub/30) or 2 for the previous version (http://idpf.org/epub/201, for better compatibility with older readers). If not specified, falls back to 3.

  • css: If you really hate our css, you can pass css string to replace our default style. eg: "body{background: #000}"

  • fonts: Array of (absolute) paths to custom fonts to include in the book so they can be used in custom CSS. Ex: if you set fonts: ['/path/to/Merriweather.ttf'] you can use the following in the custom CSS:

    @font-face {
        font-family: "Merriweather";
        font-style: normal;
        font-weight: normal;
        src : url("./fonts/Merriweather.ttf");
    }
    
  • lang: Language of the book as a 2-letter code (optional). If not specified, falls back to en.

  • tocTitle: Title of the table of contents. If not specified, will fallback to Table Of Contents.

  • appendChapterTitles: Automatically add the chapter title at the beginning of each chapter's content. You can disable that by specifying false.

  • customOpfTemplatePath: Optional. For advanced customizations: absolute path to an OPF template.

  • customNcxTocTemplatePath: Optional. For advanced customizations: absolute path to a NCX toc template.

  • customHtmlTocTemplatePath: Optional. For advanced customizations: absolute path to a HTML toc template.

  • content: Book chapters content. It should be an array of objects. eg. [{title: "Chapter 1",data: "<div>..."}, {data: ""},...]

    Within each chapter object:

    • title: optional, Chapter title
    • author: optional, fill this in if chapters have different authors.
    • data: required, HTML string of the chapter content. Image paths should be absolute URLs (starting with "http" or "https") so that they can be downloaded. Local images are also supported; their paths must start with file://
    • excludeFromToc: optional, set to true to leave the chapter out of the table of contents. default: false;
    • beforeToc: optional, set to true to place the chapter before the table of contents, such as a copyright page. default: false;
    • filename: optional, specify filename for each chapter, default: undefined;
  • verbose: specify whether or not to console.log progress messages, default: false.
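Since local images in data need a file:// path, one way to build such a path from an absolute file path is Node's built-in url.pathToFileURL. This is a minimal sketch: imageTag is a hypothetical helper (not part of epub-gen), and the image path is made up for illustration.

```javascript
const { pathToFileURL } = require("url");

// Hypothetical helper (not part of epub-gen): turn an absolute local path
// into an <img> tag whose src uses the file:// scheme.
function imageTag(absolutePath) {
    const src = pathToFileURL(absolutePath).href;
    return `<img src="${src}" />`;
}

// A chapter object whose data references a local image (path is an assumption).
const chapter = {
    title: "Illustrations",
    data: "<p>Alice and the rabbit.</p>" + imageTag("/Users/Alice/images/rabbit.png")
};

console.log(chapter.data);
```

pathToFileURL also percent-encodes characters such as spaces, which hand-built file:// strings tend to miss.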

Output

If you don't pass the output path as the second argument, you must specify it as option.output.


Demo Code:

    const Epub = require("epub-gen");

    const option = {
        title: "Alice's Adventures in Wonderland", // *Required, title of the book.
        author: "Lewis Carroll", // *Required, name of the author.
        publisher: "Macmillan & Co.", // optional
        cover: "http://demo.com/url-to-cover-image.jpg", // Url or File path, both ok.
        content: [
            {
                title: "About the author", // Optional
                author: "John Doe", // Optional
                data: "<h2>Charles Lutwidge Dodgson</h2>"
                +"<div lang=\"en\">Better known by the pen name Lewis Carroll...</div>" // pass html string
            },
            {
                title: "Down the Rabbit Hole",
                data: "<p>Alice was beginning to get very tired...</p>"
            },
            // ...more chapters
        ]
    };

    new Epub(option, "/path/to/book/file/path.epub");

Demo Preview:


From Lewis Carroll "Alice's Adventures in Wonderland", based on text at https://www.cs.cmu.edu/~rgs/alice-table.html and images from http://www.alice-in-wonderland.net/resources/pictures/alices-adventures-in-wonderland.

License

(The MIT License)

Copyright (c) 2015 Cyril Hou <[email protected]>

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the 'Software'), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED 'AS IS', WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

epub-gen's People

Contributors

actuallymentor, athird, cyrilis, dehlen, dependabot[bot], e-adrien, emagnier, gitter-badger, grawlinson, jamesporter, jpittner, jrpelegrina, lvscar, maschad96, pedrosanta, spookmango, wangcheng


epub-gen's Issues

Images with query strings are given the wrong extension

When epub-gen is provided an img with a src with a query string, then the image's extension is incorrectly changed to .bin.

For example:

<img src="http://example.com/image.jpg?w=400">

I believe the bit of the code to blame is here, where the mime package returns:

mime.lookup("http://example.com/image.jpg")       // => "image/jpeg"
mime.lookup("http://example.com/image.jpg?w=400") // => "application/octet-stream"

mime.extension("image/jpeg")                      // => "jpeg"
mime.extension("application/octet-stream")        // => "bin"

We can try to strip the query string at our end, but sometimes query strings are used to contain options for how the image is rendered (a common example being Gravatar).

It would be best if epub-gen stripped the query string when trying to determine the mediaType and extension.
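The suggestion above could look roughly like the sketch below: parse the URL, and use only its path when working out the extension, while still downloading from the full URL. extensionFromUrl is a hypothetical helper written with Node's built-in URL and path modules; it is not epub-gen's actual code, and it returns the raw extension from the URL rather than the mime package's result.

```javascript
const path = require("path");

// Hypothetical helper: derive the file extension from the URL's path only,
// ignoring any query string or fragment.
function extensionFromUrl(src) {
    const { pathname } = new URL(src);
    return path.extname(pathname).slice(1).toLowerCase();
}

console.log(extensionFromUrl("http://example.com/image.jpg"));       // "jpg"
console.log(extensionFromUrl("http://example.com/image.jpg?w=400")); // "jpg"
```

The query string stays on the URL used for the download, so services that rely on it (such as Gravatar) keep working.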

QUESTION: Move on from CoffeeScript?

Hi all,

So something has been crossing my mind over and over... at the time the work on this lib began (2015) the JS ecosystem was quite different from nowadays, and within that context I understand the benefits that CoffeeScript brought, namely, class syntax, arrow functions of sorts, and many others.

But given the latest advances in JS, namely ES6, Promises and all, I wonder whether it would be beneficial from a maintenance standpoint to migrate to ES6 or TypeScript?

I focus on the maintenance standpoint for two main reasons:

  • By using ES6, we would make the development easier, simpler and remove the compilation/build phase.
  • By using TypeScript, we would maintain the compilation/build phase, but would gain type checking.

Personally, I find the usage of CoffeeScript at this point to be more cumbersome (and make the code harder to read) than not, and between the two proposals above I'm still leaning to ES6 at this point, basically for the sake of simplicity. (We can always move on to TypeScript if we strongly feel that need later on.)

Thoughts? @cyrilis?

zip command not running in quiet mode

Whenever an EPUB is generated, the zip command spits out a lot of information to the console.

For example:


  adding: mimetype (stored 0%)

  adding: META-INF/ (stored 0%)
  adding: META-INF/container.xml (deflated 30%)
  adding: OEBPS/ (stored 0%)
  adding: OEBPS/<each chapter>.html (deflated 62%)


When generating multiple EPUB files, it would be beneficial if this output was suppressed.

Error: ENOTDIR, not a directory

Hello,
I'm creating a desktop app that is built on Electron with AngularJS. Before building as a production app, this module works as intended, amazing work. But after packaging it into a release build, it gives the following error: Error: ENOTDIR, not a directory.

This is probably some incompatibility with angular or something and I'd like some help fixing this bug.
The download path provided is:

let option = {
      title: novel.info.name,
      author: novel.info.author,
      cover: novel.info.cover, 
      content: chapters
};
new epubGen(option, "A:\Downloads\My new book.epub");

The issue seems to originate at EPub.generateTempFile (C:\resources\app.asar\node_modules\epub-gen\lib\index.js:234):

generateTempFile() {
      var base, generateDefer, htmlTocPath, ncxTocPath, opfPath, self;
      generateDefer = new Q.defer();
      self = this;
      if (!fs.existsSync(this.options.tempDir)) {
        fs.mkdirSync(this.options.tempDir);
      }
      fs.mkdirSync(this.uuid);        <=========================== ERROR HERE
      fs.mkdirSync(path.resolve(this.uuid, "./OEBPS"));
      ...

I was wondering what uuid is for and if it could lead to fixing the bug.
Kind Regards,
dr-nyt.

Metadata pollution

I'm currently using this to rapidly generate ePub files and I've noticed that the metadata somehow becomes polluted.

If I open a generated ePub in Calibre (or any ePub reader/editor), and look at the metadata, the IDs field shows something like this:

urn:urn:uuid:<PROJECT DIRECTORY>/node_modules/epub-gen/tempDir/246c4f08-e94f-4377-a197-a3fcca958615

Having the folder structure of the project exposed isn't desirable, it would be nice if it showed only the UUID.
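As an illustration of the requested behaviour, the identifier could be built from the last path segment only. This is a sketch under assumptions: identifierFromTempPath is a hypothetical helper, the directory layout is taken from the example above with a made-up project directory, and it is not how epub-gen currently builds the ID.

```javascript
const path = require("path");

// Assumed shape of the temp directory path; "/project" is a placeholder.
const tempPath = "/project/node_modules/epub-gen/tempDir/246c4f08-e94f-4377-a197-a3fcca958615";

// Hypothetical helper: keep only the final path segment (the UUID)
// so the project directory never reaches the book's metadata.
function identifierFromTempPath(p) {
    return "urn:uuid:" + path.basename(p);
}

console.log(identifierFromTempPath(tempPath));
// "urn:uuid:246c4f08-e94f-4377-a197-a3fcca958615"
```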

Content Showing as Part of Content List

Hello
I generated an ePub and everything went well. The only issue is that a "Content" or "Table of Content" entry always shows up as part of the contents, which keeps disrupting our structure.

How can I remove that?
Also, is there a way to not have the Table of Contents page show at all?

Generation fails when deployed to Heroku

Hey, thank you for this neat library. While it works perfectly locally, I seem to have trouble generating epub file when my application is deployed to Heroku.

It seems to always generate the following error code:

Generating Template Files.....
Downloading Images...
Making Cover...
Generating Epub Files...
Zipping temp dir to archive/test.epub
events.js:183
	throw er; // Unhandled 'error' event
	^

Here's the little snippet that I used:

app.post("/epub", (req, res) => {
  let epubURL = req.body.url;
  let epubTitle = req.body.title;
  let epubAuthor = req.body.author;
  let epubSummary = req.body.summary;
  let epubContent = req.body.content;

  const fileName = `archive/${epubTitle}.epub`;
  const option = {
    title: epubTitle,
    author: epubAuthor, 
    content: [{ data: epubSummary }, { data: epubContent }]
  };

  new Epub(option, fileName).promise
    .then(() => {
      console.log("[#] Success => Id: ", epubURL, "\n");
      const file = __dirname + `/${fileName}`;
      res.download(file);
    })
    .catch(err => {
      console.log(err);
    });
});

Any help would be appreciated.

FEATURE: Exclude content from TOC

Hi,

It would be nice to see the features that Nodepub has which allows content to be:
a) excluded from the TOC, and
b) also placed before the TOC
(see: https://www.npmjs.com/package/nodepub#addsection--title-content-excludefromcontents-isfrontmatter- )

This could be by flags added to the content config - e.g:

content: [
            {
                title: "Down the Rabbit Hole",
                data: "<p>Alice was beginning to get very tired...</p>",
                excludeFromContents: false,
                isFrontMatter: false
            },

Thx.

Use contentType for images

Right now, the img mimetype is being looked up using the URL. It would be best to look at the Content-Type header if it is set.

Options such as allowing input of baseURL would be great too.
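For the Content-Type suggestion, a minimal sketch of the selection logic might be as follows. mediaTypeFor is a hypothetical helper; it assumes the response headers are available as a plain object with lower-cased keys (as Node's http module provides) and that the caller supplies the URL-based guess as a fallback.

```javascript
// Hypothetical helper: prefer the response's Content-Type header,
// falling back to a caller-supplied guess based on the URL.
function mediaTypeFor(headers, fallback) {
    const raw = headers["content-type"];
    if (typeof raw !== "string" || raw.trim() === "") {
        return fallback;
    }
    // Drop parameters such as "; charset=binary".
    return raw.split(";")[0].trim().toLowerCase();
}

console.log(mediaTypeFor({ "content-type": "image/png; charset=binary" }, "application/octet-stream")); // "image/png"
console.log(mediaTypeFor({}, "image/jpeg")); // "image/jpeg"
```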

mimetype is empty

The mimetype file is always empty in the generated epub. To be exact, the file is present in the epub archive, but it is 0 bytes in size and does not contain the application/epub+zip string.

ENAMETOOLONG for chapters with long title names

epub-gen fails to render an ePub with an arbitrarily long chapter title. This is because the chapter title gets turned into a slug, and this slug exceeds the system's maximum path length.

A simple fix for this would be to truncate titles to a shorter length. If this is done, it would be nice to feed back to the consumer what these slugs are (ie as the resolution of the Promise). This is especially useful when - for example - trying to construct your own custom table of contents.
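The truncation idea could be sketched like this. shortSlug is a hypothetical helper, its slug rules are a simplification rather than epub-gen's actual slug algorithm, and the default of 80 characters is an arbitrary choice.

```javascript
// Hypothetical helper: build a slug from a title and cap its length.
// maxLength = 80 is an arbitrary default chosen for illustration.
function shortSlug(title, maxLength = 80) {
    const slug = title
        .toLowerCase()
        .replace(/[^a-z0-9]+/g, "-") // collapse anything non-alphanumeric into "-"
        .replace(/^-+|-+$/g, "");    // trim leading and trailing dashes
    // Cut to length, then drop a dash left dangling by the cut.
    return slug.slice(0, maxLength).replace(/-+$/, "");
}

console.log(shortSlug("Down the Rabbit Hole")); // "down-the-rabbit-hole"
console.log(shortSlug("a".repeat(500)).length); // 80
```

Two long titles that share the same first 80 characters would collide, so a real fix would probably also prefix the chapter index.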

Return a stream

Is it possible for the module to return a stream, in addition to a promise?

Generate from Markdown

Writing in Markdown is common practice for technical people and also very easy for non-technical people, so Markdown input would help anyone who wants to build an e-book.

verbose: false outputs some logs

It seems that the verbose flag is not working for some logs or that some logs should depend on this flag. The output below is always showing:

(node:56119) [DEP0066] DeprecationWarning: OutgoingMessage.prototype._headers is deprecated
[Success] cover image downloaded successfully!
Zipping temp dir to /Users/jlijo/Documents/opensource/obooks/books/sql-pocket-guide,-4th-edition/sql-pocket-guide,-4th-edition.epub

FEATURE: Add cover to first page (cover doesn't show on eReaders)

I noticed that while apps like iBooks respect the cover in the meta, many eReaders (kobo) show the first page of the epub as the title page.

What this results in is the table of contents showing as the book thumbnail.

Is this a known issue?

I'm happy to do a PR for this functionality.

“Namespace prefix epub for type on a is not defined”

Hello,

I am trying to use the EPUB3 provision for footnote popups by adding epub:type="noteref" to my footnote declarations.

In ibooks this results in the error in the title (screenshot attached).

I can see in your lib that you declare xmlns:epub="http://www.idpf.org/2007/ops" which should make iBooks respect the epub namespace. I've inspected the epub itself and found the markup to indeed include xmlns:epub=http://www.idpf.org/2007/ops.

I did notice that http://www.idpf.org/2007/ops gives a 404. If this is an issue, the link in your code should be updated to whatever is the current live link.

I am using version 0.0.17 with { version: 3 }.

If you have any idea how to debug this it would be most welcome.

Chapter discontinued after first image

I am trying to make a comic epub out of scanned pages. A chapter consists of many images acting as one page. I have tried to display them like so:

<img src="file:///path/to.file1" /><img src="file:///path/to.file2" /><img src=...

or:

<img src="file:///path/to.file1" /><br />
<img src="file:///path/to.file2" /><br />
<img src=...

But somehow only the code until and including the first image makes it into the epub file.

FEATURE: Add optional filename

Hi,

How about the ability to specify a filename for each content item, e.g.:

content: [
            {
                title: "Down the Rabbit Hole",
                data: "<p>Alice was beginning to get very tired...</p>",
                filename: "chapter2"
            }

Why? Because it's very hard to construct links in the markup that point to other content if you do not know the relevant filename.

(I know, I can work it out from counting the place in the content array and concatenating the title, but that doesn't seem very robust)

Thx.

QUESTION: More robust tests?

Hello,

So, another topic for discussion.

For some time now, I have felt the need to have more robust tests, a more robust test coverage and test workflow for this lib, because I think this will also help this lib to become more robust, reliable.

I don't have a strong opinion on the test stack – so I'm open to comments –, perhaps we could start setting up some groundwork using Mocha, and take it from there, or so.

Thoughts? @cyrilis?

PS: Perhaps we could bundle this improvement with #49 and #56, and, with a more robust lib, start discussing future 0.1/0.x/1.0(?) release roadmaps? Does that make sense?

Images and fonts do not appear on Apple Books

Hello,

ePubs generated by this script work wonderfully on iBooks for Mac OS and other e-readers, but...

...the same ePub viewed through Apple's new Books app for iOS does not display the images; a placeholder is shown instead.

The above is from the Alice in Wonderland ePub generated by epub-gen’s own test.js script. I've found the same issue with my own ePubs. The images display fine in iBooks on Mac OS, and other e-readers, but on Apple's Books app for iOS - only a placeholder is shown.

Also, any custom fonts you specify do not work. Again, they work perfectly elsewhere.

I've run the script to generate an ePub in both version 2 and version 3, and both have the same issue. My iPhone is fully updated.

Any ideas?

FEATURE: Javascript support

If I understand correctly EPUB3 has js support in the spec. Do you plan to allow custom js the same way you have css?

Should be relatively straightforward.

I'm happy to do a pull request if you don't have the time.

EPUB 2 needs XHTML files

Hi @cyrilis,

I've been using epub-gen to generate EPUBs for a project and I ran into some issues. I wanted to support EPUB 2 (to support a broader range of eBook readers), but that standard needs valid XHTML files. The issue I was getting was that custom fonts weren't rendered in some eBook readers (such as iBooks) when I used HTML. Changing that to XHTML worked flawlessly.

What do you think about updating the lib to use XHTML?

Cheers.

Cheerio version uses deprecated packages

The cheerio version you are using emits warnings that CSSselect and CSSwhat are deprecated and that css-select and css-what should be used instead. This comes from the cheerio version in the package.json file. You should update the version of cheerio being used.

QUESTION: Should the lib stop caring about encoding/cleaning/fixing content.data?

Hello,

Following up on @jamesporter #38, I had this question for a while: should it really be responsibility of the lib to be concerned with the data the lib users add on each content, namely on cleaning up/encoding content.data?

In the past there was an effort to make the lib produce valid version 2 and 3 EPUBs, but I feel that concern should only be present in the parts directly to do with the lib, and should be the responsibility of the lib user to provide valid EPUB 2/3 XHTML for their added contents.

This way the issue @jamesporter raised on #38 would not occur too.

So, I would vote to remove this verification/cleanup/encoding from the lib altogether. I can still accept the warning about unsupported tags, but even so, I think it should warn but not alter the content the user added to their EPUB.

@cyrilis @jamesporter thoughts?

Make a nested table of contents

Hey, I was wondering if it is possible to make a table of contents with multiple levels. I want to make something like:

  1. Table of contents
  2. Chapter 1
    2.1 Some Subchapter <---
  3. Chapter 2
  4. Chapter 3
    4.1 Subchapter <---
    4.2 Another Subchapter <---

I could not find any information on this in the docs, so it would be appreciated if someone could point me in the right direction.

Thanks

What is ZIP and how can I install

It's possible that this is a stupid question, but I'm getting this error:

zip -q -X -0 book.epub.zip mimetype "zip" not recognized internal or external command

How can I install ZIP for node.js?

Thanks!

Could not download images

I find that the epub book works fine in iBooks on OS X.
But if I transfer the book to an iPad via AirDrop and open it with the iBooks app, the images are all gone.

When I unzip the epub, there are no image files in it.

Books without images give warning on EPUB check/validator

So, I bumped into this issue: when you have a book with no images, the EPUB check/validator throws a warning complaining about the images directory being empty.

We need to make this directory creation dependent on having images or not; perhaps move the create dir code to the download images method or so.

Anyway, putting this in as a note.

CLI tool

This is a nice lib, and most options can be easily defined on the command line.

epub-gen could be a nice CLI tool for epub users.

Does not parse image urls in css like `background: url('img/url.png');`

I am trying to make a comic epub out of scanned pages. To make sure the image covers the whole page, I tested the following code as chapter data:

<div style="
    position: absolute;
    top: 0;
    left: 0;
    right: 0;
    bottom: 0;
    background-repeat: no-repeat;
    background-position: center;
    background-size: contain;
    background-image: url(file://inserting/my/file.here);
">
</div>

However, this only works because the eBook reader fetches the file from the same location. A quick look into the code reveals that the URL has not been changed and the file was not added.
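Supporting this would first require finding the url(...) references inside CSS. A rough sketch of that step is below; extractCssUrls is a hypothetical helper, and the regular expression only covers simple quoted or unquoted url() values, not every form the CSS grammar allows.

```javascript
// Hypothetical helper: list every url(...) target found in a CSS string.
// Handles single-quoted, double-quoted and unquoted values.
function extractCssUrls(css) {
    const pattern = /url\(\s*(['"]?)(.*?)\1\s*\)/g;
    const urls = [];
    let match;
    while ((match = pattern.exec(css)) !== null) {
        urls.push(match[2]);
    }
    return urls;
}

console.log(extractCssUrls("background: url('img/url.png');"));
console.log(extractCssUrls("background-image: url(file://inserting/my/file.here);"));
```

Each extracted target could then go through the same download-and-rewrite path that img src values use.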

Images not being included, url linked only

According to the documentation, any local image prefixed with file:// should be loaded into the epub. Likewise, remote http and https links should be included.

The module does not do this but leaves the original img src="" intact. This means that images are either 1) loaded through the internet or 2) not portable.

Is this expected behaviour (in which case the readme should be modified) or is this a bug (in which case the images need to be imported and added to the manifest)?

Not working with Grunt

Hi Guys,

Thanks for producing this package.

I'm trying to run this under Grunt, as part of an automated publication platform, but getting no output generated. Specifically:

  • it installs ok
  • it creates node-modules/epub-gen/tempDir/{uuid} ok
  • it creates files in that temp dir ok, and they all look as expected
  • it creates (opens) an output epub file
  • but there it finishes - no errors - no content in the epub (0 bytes)
  • last console output is 'Zipping temp dir to . . . .' - as expected - but no 'Done zipping' message

If I step into the node-modules/epub-gen folder and run npm test, I get all the console messages (right through to 'Done') and good output generated.

I have logged cwd, option.output and the archive and output (fs) objects all the way through good (npm) and bad (grunt) runs, and there is no difference. :(

I have tried it on Windows and Ubuntu with exactly the same results.

Have to say, I'm stumped.

Any ideas?

Custom ids get converted to something else

I'm trying to add custom hyperlinks by ids, but my hyperlinks to ids don't work. I converted the epub file to html and I see that all ids are converted incrementally to calibre_link0, calibre_link1, and so on. How do I prevent this and keep my own ids?

I want to do this because I want to create a table of contents page and set links to specific parts of the book.

Showing different pub on different reader.

The CSS was a bit off and content seems to be missing in one of the readers; the epub looks different on different readers. Is there any way to make this consistent across all readers?

Missing Hard timeout & Retry mechanism when downloading images.

Thanks to the author, I built a daemon on top of epub-gen, but it frequently hangs forever when downloading images over a poor network connection, while awaiting new Epub(option, path).promise.

From a quick look at the related code, epub-gen seems to be missing a hard timeout when piping streams. Could the author please look into this?
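Until the library has its own timeout, a caller-side workaround could wrap the promise. The sketch below uses two hypothetical helpers, withTimeout and retryWithTimeout; neither is part of epub-gen, and the attempt count and time limit are up to the caller.

```javascript
// Reject if `promise` has not settled within `ms` milliseconds.
function withTimeout(promise, ms) {
    let timer;
    const timeout = new Promise((_, reject) => {
        timer = setTimeout(() => reject(new Error(`Timed out after ${ms} ms`)), ms);
    });
    // Whichever settles first wins; always clear the timer afterwards.
    return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Call `task` up to `attempts` times, each attempt limited to `ms` milliseconds.
// Rethrows the last error if every attempt fails.
async function retryWithTimeout(task, attempts, ms) {
    let lastError;
    for (let i = 0; i < attempts; i++) {
        try {
            return await withTimeout(task(), ms);
        } catch (err) {
            lastError = err;
        }
    }
    throw lastError;
}
```

A call might look like retryWithTimeout(() => new Epub(option, path).promise, 3, 60000). Note that this only stops waiting; it does not cancel the download that is still running inside the library, so a proper fix would still need a timeout on the underlying streams.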

Process Hangs if cover is not a string

Summary

If you pass something that is not a string as a cover option, the epub generation process will hang on the "Making Cover..." step.

Steps to Reproduce

  • Pass an empty object {} as options.cover.

Expected Behaviour

Process returns an error saying that options.cover must be a string.

Actual Behaviour

Process hangs.

Proposed Solution

Check the type of the cover option.

if (!(cover instanceof String) && (typeof cover !== 'string')) {
   // handle error
}

Epub to html

Is it possible to convert generated epub file into html?
If so, Please tell me the way to convert it back into the html.

Math expression

Hi,

Thanks for this awesome library.

Is there a way to include math rendering like MathJax or KaTeX inside the generated epub file?

Thanks ;)
