
go-epub's Introduction



⚠️ This project is unmaintained. Please use go-shiori/go-epub or another fork.

Features

  • Documented API
  • Creates valid EPUB 3.0 files
  • Adds an additional EPUB 2.0 table of contents (as seen here) for maximum compatibility
  • Includes support for adding CSS, images, and fonts

For an example of actual usage, see https://github.com/bmaupin/go-docs-epub

Contributions

Contributions are welcome; please see CONTRIBUTING.md for more information.

Development

Clone this repository using Git. Run tests as documented below.

Dependencies are managed using Go modules.

Testing

EPUBCheck

EPUBCheck is a tool that will check an EPUB for validation errors.

If EPUBCheck is installed locally, it will be run alongside the Go tests. To install EPUBCheck:

  1. Make sure you have Java installed on your system

  2. Get the latest version of EPUBCheck from https://github.com/w3c/epubcheck/releases

  3. Download and extract EPUBCheck in the root directory of this project, e.g.

    wget https://github.com/IDPF/epubcheck/releases/download/v4.2.5/epubcheck-4.2.5.zip
    unzip epubcheck-4.2.5.zip
    

If you do not wish to install EPUBCheck locally, you can manually validate the EPUB:

  1. Set doCleanup = false in epub_test.go

  2. Run the tests (see below)

  3. Upload the generated My EPUB.epub file to http://validator.idpf.org/

Run tests

go test


go-epub's Issues

Option to call addImage for media with absolute URLs automatically

I think I'm asking for a feature?

My conclusion is based on behaviour, not on debugging: go-epub will leave the src attribute for images and other media untouched, right? It will not go and call addImage() for them.

Maybe it should? It would be really convenient. I'm assuming for the huge majority of epub use cases you'd want to keep images embedded in the epub.

Thanks!

Transferring this project

Hi everyone,

As I mentioned in #28, I don't actually use this project any more. I've tried to maintain it over the last couple of years but at this point it doesn't make much sense for me to keep working on it with the limited time that I have.

I'd like to handle the transition as smoothly as possible so that it doesn't break for anyone. I've created a new go-epub organisation as my first thought is to transfer this project there. My understanding is that GitHub will automatically create a Git redirect so that it won't break for anyone using this library with the old Git URL.

It's due for a new major release anyway due to the new Go version requirements as part of #66, so I think this would be a good time to make the change.

@owulveryck You're the only other maintainer at the moment so I don't want to just dump this on you without consulting you first. Some options I see:

  • Transfer the project to the new go-epub organisation as-is, leaving its future in your hands. I can stick around as a maintainer long enough to help transfer the project, update the README, etc.
  • Transfer the project and add some more maintainers. @fmartingr has expressed some interest.
  • Archive the project and allow it to evolve naturally. I guess whoever creates a fork and maintains it will become the de-facto maintainer, and if a particular fork stands out I can update the readme and point to it

What do you think? It's been 2 years since I created #28 so there's no urgency to make a decision right away, but I wanted to start the conversation.

Better error handling

Hi!

This library uses the panic function extensively. I think that, in most cases, it should return an error instead of panicking.

For example, I am using this library in a tool that fetches articles from the web, applies a reading mode, and creates an EPUB.
Depending on the HTML structure, some things fail (bad src attributes on images, for example; cf. #45).
We would benefit from proper error handling. The drawback is that it would change the API.

For example,

func (e *Epub) SetCover(internalImagePath string, internalCSSPath string) {
...
}

could return an error if it cannot set the cover:

func (e *Epub) SetCover(internalImagePath string, internalCSSPath string) error {
...
}

I can take the action to post a PR for discussion if you want.
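
As a rough illustration of the idea, here is a minimal sketch of a setter that reports a failure to the caller instead of panicking. The `book` type, the `setCover` method, and the `errImageNotFound` sentinel are hypothetical stand-ins for this example only; they are not the library's actual types or behaviour.

```go
package main

import (
	"errors"
	"fmt"
)

// book is a hypothetical stand-in for the library's Epub type.
type book struct {
	images map[string]string // internal image path -> source
	cover  string
}

// errImageNotFound is a hypothetical sentinel error used only in this sketch.
var errImageNotFound = errors.New("image not found")

// setCover returns an error instead of panicking when the image is unknown,
// so the caller can decide whether to skip the cover or abort.
func (b *book) setCover(internalImagePath string) error {
	if _, ok := b.images[internalImagePath]; !ok {
		return fmt.Errorf("set cover %q: %w", internalImagePath, errImageNotFound)
	}
	b.cover = internalImagePath
	return nil
}

func main() {
	b := &book{images: map[string]string{"../images/cover.png": "cover.png"}}
	if err := b.setCover("../images/missing.png"); err != nil {
		// The program keeps running and can fall back to an EPUB without a cover.
		fmt.Println("recoverable:", err)
	}
}
```

Wrapping with `%w` lets callers test for the sentinel with `errors.Is` while still seeing which path failed.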

Resolve go.uuid breaking API changes

go.uuid has recently introduced breaking changes in their API:

$ go get github.com/bmaupin/go-epub
# github.com/bmaupin/go-epub
../../../go/src/github.com/bmaupin/go-epub/epub.go:134:44: multiple-value uuid.NewV4() in single-value context

Possible solutions:

  • Lock dependencies to a specific tag/commit of go.uuid (e.g. 0633591)
    • I believe this will require checking in the vendor folder so that it will still work for users who are using go get instead of golang/dep
  • Update to the new go.uuid API (missdeer/go-epub@ed07f4e)
  • Switch to another UUID library (e.g. google/uuid)
    • google/uuid might be a more official UUID library (breaking changes handled better), but doesn't yet have a stable API
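
Whichever option is chosen, the compiler error above comes from a constructor that now returns two values. The sketch below shows the calling pattern with a hypothetical `newV4` stand-in built on the standard library, so that it runs without any third-party module; `newIdentifier` is also a made-up name and not part of go-epub.

```go
package main

import (
	"crypto/rand"
	"fmt"
)

// newV4 is a stand-in for a UUID constructor that returns (value, error).
// It builds a random version 4 UUID string using only the standard library.
func newV4() (string, error) {
	var b [16]byte
	if _, err := rand.Read(b[:]); err != nil {
		return "", err
	}
	b[6] = (b[6] & 0x0f) | 0x40 // set the version nibble to 4
	b[8] = (b[8] & 0x3f) | 0x80 // set the RFC 4122 variant bits
	return fmt.Sprintf("%x-%x-%x-%x-%x", b[0:4], b[4:6], b[6:8], b[8:10], b[10:16]), nil
}

// newIdentifier handles both return values instead of assuming a single one.
func newIdentifier() (string, error) {
	id, err := newV4()
	if err != nil {
		return "", fmt.Errorf("generate uuid: %w", err)
	}
	return "urn:uuid:" + id, nil
}

func main() {
	id, err := newIdentifier()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(id)
}
```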

Add EPUBCheck to CI/CD tests

Right now Travis only runs the Go tests via go test, which is great but may not catch an invalid EPUB.

The tests are already configured to run EPUBCheck if it's present. Can we configure the CI/CD to run EPUBCheck? This would make it much easier for contributors to submit PRs and see right away if there are EPUB validation issues.

AddSection title cannot be empty

As per #21, an empty title is invalid.

However, that's currently used as the mechanism to add a section to an EPUB without adding it to the table of contents:

The title is optional; if no title is provided, the section will not be added to the table of contents.

https://pkg.go.dev/github.com/bmaupin/go-epub?utm_source=godoc#Epub.AddSection

Two problems need to be solved:

  1. What should the appropriate behaviour be when an empty title is added?

    This seems to be related to the philosophy of how this library should handle errors... Should we

    1. Throw an error if an empty title is added?
    2. Generate a title if an empty title is added?
    3. Keep the current behaviour but make a note in the documentation that an empty title is invalid?
  2. How should we handle the ability to add a section to the EPUB without adding it to the table of contents? Some possibilities:

    • Keep the current behaviour (conflicts with #17)
    • Remove this functionality
    • Add this functionality in a different way

Fix timestamps in tests

One of the tests failed, likely because it was running on Windows (which is a bit slower), so the timestamp under Got in the <package> was one second off from the timestamp under Expected:

=== RUN   TestEpubWrite
    epub_test.go:137: Package file contents don't match
        Got: <?xml version="1.0" encoding="UTF-8"?>
        <package xmlns="http://www.idpf.org/2007/opf" unique-identifier="pub-id" version="3.0">
          <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
            <dc:identifier id="pub-id">urn:uuid:fb92f2ca-2778-4577-8016-8fb5bf177da8</dc:identifier>
            <dc:title>My title</dc:title>
            <dc:language>en</dc:language>
            <meta property="dcterms:modified">2021-06-15T17:34:48Z</meta>
          </metadata>
          <manifest>
            <item id="nav" href="nav.xhtml" media-type="application/xhtml+xml" properties="nav"></item>
            <item id="ncx" href="toc.ncx" media-type="application/x-dtbncx+xml"></item>
          </manifest>
          <spine toc="ncx"></spine>
        </package>
        
        Expected: <?xml version="1.0" encoding="UTF-8"?>
        <package xmlns="http://www.idpf.org/2007/opf" unique-identifier="pub-id" version="3.0">
          <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
            <dc:identifier id="pub-id">urn:uuid:fb92f2ca-2778-4577-8016-8fb5bf177da8</dc:identifier>
            <dc:title>My title</dc:title>
            <dc:language>en</dc:language>
            <meta property="dcterms:modified">2021-06-15T17:34:49Z</meta>
          </metadata>
          <manifest>
            <item id="nav" href="nav.xhtml" media-type="application/xhtml+xml" properties="nav"></item>
            <item id="ncx" href="toc.ncx" media-type="application/x-dtbncx+xml"></item>
          </manifest>
          <spine toc="ncx"></spine>
        </package>

https://github.com/bmaupin/go-epub/pull/38/checks?check_run_id=2832041267
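
One possible way to make such a comparison deterministic is to let tests pin the clock. The sketch below is only an assumption about how that could look: `now`, `modifiedTimestamp`, and `freezeClock` are hypothetical names, not code from this repository.

```go
package main

import (
	"fmt"
	"time"
)

// now is a hypothetical package-level hook. Production code leaves it as
// time.Now, while tests swap in a fixed clock.
var now = time.Now

// modifiedTimestamp formats the current time in the same layout as the
// dcterms:modified values shown in the test output.
func modifiedTimestamp() string {
	return now().UTC().Format("2006-01-02T15:04:05Z")
}

// freezeClock pins now to a fixed instant and returns a function that
// restores the previous clock.
func freezeClock(rfc3339 string) (restore func(), err error) {
	t, err := time.Parse(time.RFC3339, rfc3339)
	if err != nil {
		return nil, err
	}
	prev := now
	now = func() time.Time { return t }
	return func() { now = prev }, nil
}

func main() {
	restore, err := freezeClock("2021-06-15T17:34:48Z")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	defer restore()
	// Both the generated and the expected value now come from the same instant.
	fmt.Println(modifiedTimestamp())
}
```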

Always clean up tempDir when doCleanup = false

Right now if doCleanup = false in epub_test.go, the files in /tmp aren't deleted. It seems like it would be desirable to always clean those up and instead leave just the test epub in the project directory.

Incorrect manifest ids created

I add an image with AddImage(filename, destpath), but when the image name starts with a digit, I get an error from epubcheck.

For example when I add 07-savepages.png, the resulting line in package.opf in the manifest section is:

<item id="07-savepages.png" href="images/07-savepages.png" media-type="image/png"></item>

07-savepages.png is not a valid XML id, because an XML id must not start with a digit.
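
A minimal sketch of one way to derive a usable id from a filename is shown below. `fixXMLID` is a hypothetical helper, the "id" prefix is an arbitrary choice, and the sketch deliberately restricts itself to ASCII even though XML allows more characters. It also does not guard against two filenames mapping to the same id.

```go
package main

import (
	"fmt"
	"strings"
)

// isASCIILetter reports whether r is in a-z or A-Z.
func isASCIILetter(r rune) bool {
	return (r >= 'a' && r <= 'z') || (r >= 'A' && r <= 'Z')
}

// fixXMLID is a hypothetical helper that turns a filename into a conservative
// XML id: disallowed characters become '_', and a prefix is added when the
// result would not start with a letter or underscore.
func fixXMLID(name string) string {
	var sb strings.Builder
	for _, r := range name {
		if isASCIILetter(r) || (r >= '0' && r <= '9') || r == '.' || r == '-' || r == '_' {
			sb.WriteRune(r)
		} else {
			sb.WriteRune('_')
		}
	}
	id := sb.String()
	// After the loop the string is pure ASCII, so indexing by byte is safe.
	if id == "" || !(isASCIILetter(rune(id[0])) || id[0] == '_') {
		id = "id" + id
	}
	return id
}

func main() {
	fmt.Println(fixXMLID("07-savepages.png"))
	fmt.Println(fixXMLID("cover.png"))
}
```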

Looking for maintainers

@andypillip @gonejack @pgundlach @ystyle @AmandaCameron @1l0 @propan @missdeer @onelio @lucasew

First of all, my apologies for the spam. Secondly, I'm not abandoning this project :)

I created this project because I wanted to learn Go and at the time there weren't any decent EPUB libraries for it so it seemed like something fun to do. I had big plans to use Go, but in spite of all the amazing things about Go I didn't find it very fun to use, and moved on to other languages.

I don't actually use this library any more and I didn't expect it to get so popular, but it's gotten over 100 stars and I thought it might be good to actually have the people using it be more involved, e.g.:

  • Helping review and merge pull requests
    • Mostly I've been making sure that the API stays as stable as possible (or the version number is bumped as needed) and that new changes are covered by tests as best as reasonably possible
  • Making decisions about the direction of the project
    • I've tried to make sure that at the very least this library generates valid EPUBs. Beyond that, is it okay if it allows users to generate EPUBs that aren't valid? Or should every effort possible be made to make sure that doesn't happen? (for example, see #24)
    • At some point this project should probably hit v1, so it would be nice to maybe figure some of this out beforehand

I'm not completely sure how this works, but I think there are a few options:

  • I can add collaborators directly to this repository
  • I can create an organization just for this repository and transfer it there
  • I can transfer this repository to an existing organization
    (I did reach out to gofrs a while back about this last option but never heard anything: gofrs/help-requests#40)

At any rate, anyone who is interested feel free to follow up here. I am going to try to make an effort in the meantime to go through the issue/PR backlog and see if I can make some progress there 🤞

Thanks!

request: Use HTTP Content-Type header value

Not sure if that's acceptable in the EPUB spec, but I'm working on a program to convert various web sources into .epubs for download, and one of the sources I'm using has a .bmp on it, which causes the system to fail.

Another thought might be, when a url is passed to AddImage it could use the returned Content-Type instead of trying to divine it from the file name, at least for images.
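
To make the second suggestion concrete, here is a minimal sketch of extracting the media type from a Content-Type header value with the standard library's mime package. `mediaTypeFromContentType` is a hypothetical helper name; how the result would feed into AddImage is left open, since that depends on the library's internals.

```go
package main

import (
	"fmt"
	"mime"
)

// mediaTypeFromContentType is a hypothetical helper that extracts the bare
// media type (for example "image/bmp") from a Content-Type header value,
// dropping any parameters such as charset.
func mediaTypeFromContentType(value string) (string, error) {
	mediaType, _, err := mime.ParseMediaType(value)
	if err != nil {
		return "", fmt.Errorf("parse Content-Type %q: %w", value, err)
	}
	return mediaType, nil
}

func main() {
	// In a real fetch, the value would come from resp.Header.Get("Content-Type").
	mt, err := mediaTypeFromContentType("image/bmp")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(mt)
}
```

A missing or malformed header yields an error, so a caller could fall back to guessing from the file extension in that case.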

Remove vendor directory

Edit: #5 and then #6 should be resolved first as both will affect this.

The vendor directory was added to the repository to work around issues with go.uuid (see #2). This shouldn't cause any problems, but once golang/dep is officially absorbed into the go toolchain (expected around Go v1.10), the vendor directory should no longer be necessary.

Test fails with EPUBCheck 4.2.5

$ go test
Validating using EPUB version 3.2 rules.
ERROR(RSC-005): My EPUB.epub/EPUB/xhtml/section0002.xhtml(5,12): Error while parsing file: Element "title" must not be empty.

Check finished with errors
Messages: 0 fatals / 1 error / 0 warnings / 0 infos

EPUBCheck completed

--- FAIL: TestEpubValidity (3.64s)
    epub_test.go:652: EPUB validation failed
FAIL
exit status 1
FAIL	github.com/bmaupin/go-epub	4.042s

Add Go 1.17 to tests

Technically we always test with the latest version (1.x), so we'll need to add 1.16.

No thumbnail on MacOS

The original file has no thumbnail.

Dragging origin.epub into Apple Books and then dragging it back out produces export.epub, which does have a thumbnail.

epubs.zip

Invalid reference returned by AddImage on Windows

The last line of addMedia (shown below) should use path instead of filepath, because filepath returns ..\images\.... on Windows, which is invalid for an HTML img src.

https://github.com/bmaupin/go-epub/blob/master/epub.go#L439-L443

func addMedia(source string, internalFilename string, mediaFileFormat string, mediaFolderName string, mediaMap map[string]string) (string, error) {
	err := validateFileSource(source)
	if err != nil {
		return "", &FileRetrievalError{
			Source: source,
			Err:    err,
		}
	}

	if internalFilename == "" {
		// If a filename isn't provided, use the filename from the source
		internalFilename = filepath.Base(source)
		// If that's already used, try to generate a unique filename
		if _, ok := mediaMap[internalFilename]; ok {
			internalFilename = fmt.Sprintf(
				mediaFileFormat,
				len(mediaMap)+1,
				strings.ToLower(filepath.Ext(source)),
			)
		}
	}

	if _, ok := mediaMap[internalFilename]; ok {
		return "", &FilenameAlreadyUsedError{Filename: internalFilename}
	}

	mediaMap[internalFilename] = source

	return filepath.Join(
		"..",
		mediaFolderName,
		internalFilename,
	), nil
}
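
A minimal sketch of the suggested change follows. `internalMediaPath` is a hypothetical helper that isolates only the final return statement; the rest of addMedia is omitted. The path package always joins with forward slashes, regardless of the operating system.

```go
package main

import (
	"fmt"
	"path"
)

// internalMediaPath is a hypothetical helper mirroring the final return of
// addMedia. path.Join uses forward slashes on every platform, which is what
// a reference inside an EPUB document needs.
func internalMediaPath(mediaFolderName, internalFilename string) string {
	return path.Join("..", mediaFolderName, internalFilename)
}

func main() {
	fmt.Println(internalMediaPath("images", "07-savepages.png"))
}
```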

Two mimetype files created when running on a Windows machine

In the writeEpub function in write.go, calling filepath.Join(rootEpubDir, mimetypeFilename) on a Windows machine generates a path joined by a backslash \ instead of the usual forward slash /.

The later call to fs.WalkDir(filesystem, rootEpubDir, addFileToZip) then compares the forward-slash path of mimetype with the backslash version from filepath.Join(rootEpubDir, mimetypeFilename). The comparison fails, so a second mimetype file is added using the deflate method.

Changing path == filepath.Join(rootEpubDir, mimetypeFilename) to filepath.Base(path) == mimetypeFilename would be a quick fix, but there might be a better way to do this.
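
One alternative, sketched below, is to build the comparison value with path.Join, since paths handed to an fs.WalkDir callback are slash-separated on every platform. The constant values, `sampleFS`, and `countMimetypeMatches` are assumptions made for this example and are not taken from write.go.

```go
package main

import (
	"fmt"
	"io/fs"
	"path"
	"testing/fstest"
)

const (
	rootEpubDir      = "epub"     // assumed value for illustration
	mimetypeFilename = "mimetype" // assumed value for illustration
)

// sampleFS builds a small in-memory file tree to walk.
func sampleFS() fs.FS {
	return fstest.MapFS{
		"epub/mimetype":         {Data: []byte("application/epub+zip")},
		"epub/EPUB/package.opf": {Data: []byte("<package/>")},
	}
}

// countMimetypeMatches walks the tree and counts the entries that compare
// equal to the slash-joined mimetype path.
func countMimetypeMatches(fsys fs.FS) (int, error) {
	want := path.Join(rootEpubDir, mimetypeFilename)
	matches := 0
	err := fs.WalkDir(fsys, rootEpubDir, func(p string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if !d.IsDir() && p == want {
			matches++
		}
		return nil
	})
	return matches, err
}

func main() {
	n, err := countMimetypeMatches(sampleFS())
	fmt.Println(n, err)
}
```

Unlike comparing only the base name, this still distinguishes the top-level mimetype file from a file with the same name in a subdirectory.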

Working without filesystem

I am trying to compile my tool into webassembly.

The problem is that this library uses temporary files on the filesystem, which is forbidden by WebAssembly.

It would be nice to add an option to handle the media in an in-memory pseudo filesystem.

This would lead to a new write method that takes an io.Writer as a parameter (and could eventually deprecate the WriteFile method).

I can make a POC when I have time if you are interested (and you can assign this issue to me)
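
As a rough sketch of what writing to an io.Writer could look like, the example below streams a zip archive into memory, with the mimetype entry written first and stored uncompressed. `writeArchive`, `buildInMemory`, and `entryNames` are hypothetical names; the output shows only the archive mechanics and is not a complete, valid EPUB.

```go
package main

import (
	"archive/zip"
	"bytes"
	"fmt"
	"io"
	"sort"
)

// writeArchive is a hypothetical function that writes a zip archive to any
// io.Writer without touching the filesystem.
func writeArchive(w io.Writer, files map[string][]byte) error {
	zw := zip.NewWriter(w)

	// The mimetype entry goes first and is stored without compression.
	mw, err := zw.CreateHeader(&zip.FileHeader{Name: "mimetype", Method: zip.Store})
	if err != nil {
		return err
	}
	if _, err := io.WriteString(mw, "application/epub+zip"); err != nil {
		return err
	}

	// Sort the names so the archive layout is deterministic.
	names := make([]string, 0, len(files))
	for name := range files {
		names = append(names, name)
	}
	sort.Strings(names)

	for _, name := range names {
		fw, err := zw.Create(name)
		if err != nil {
			return err
		}
		if _, err := fw.Write(files[name]); err != nil {
			return err
		}
	}
	return zw.Close()
}

// buildInMemory collects the archive in a byte slice.
func buildInMemory(files map[string][]byte) ([]byte, error) {
	var buf bytes.Buffer
	if err := writeArchive(&buf, files); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// entryNames lists the archive entries in the order they were written.
func entryNames(data []byte) ([]string, error) {
	zr, err := zip.NewReader(bytes.NewReader(data), int64(len(data)))
	if err != nil {
		return nil, err
	}
	names := make([]string, 0, len(zr.File))
	for _, f := range zr.File {
		names = append(names, f.Name)
	}
	return names, nil
}

func main() {
	data, err := buildInMemory(map[string][]byte{"EPUB/nav.xhtml": []byte("<html/>")})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	names, err := entryNames(data)
	fmt.Println(names, err)
}
```

Because the function depends only on io.Writer, the same code could target a bytes.Buffer, an HTTP response, or a file.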
