mammoth-wordpress-plugin's Introduction

=== Mammoth .docx converter ===
Contributors: michaelwilliamson
Donate link: https://liberapay.com/mwilliamson/donate
Tags: docx, html, word, office, paste
Requires at least: 4.0
Tested up to: 6.4.3
Stable tag: 1.21.0
License: BSD 2-clause
License URI: http://opensource.org/licenses/BSD-2-Clause

Mammoth converts semantically marked up .docx documents to simple and clean HTML, allowing pasting from Word documents and Google Docs without the usual mess.

== Description ==

Mammoth is designed to convert .docx documents, such as those created by Microsoft Word, Google Docs and LibreOffice, to HTML. Mammoth aims to produce simple and clean HTML by using semantic information in the document and ignoring other details. For instance, Mammoth converts any paragraph with the style `Heading1` to an `h1` element, rather than attempting to exactly copy the styling (font, text size, colour, etc.) of the heading. This allows you to paste from Word documents without the usual mess.

There's a large mismatch between the structure used by .docx and the structure of HTML, meaning that the conversion is unlikely to be perfect for more complicated documents. Mammoth works best if you only use styles to semantically mark up your document.

The following features are currently supported:

* Headings.

* Lists.

* Tables. The formatting of the table itself, such as borders, is currently ignored, but the formatting of the text is treated the same as in the rest of the document.

* Footnotes and endnotes.

* Images.

* Bold, italics, superscript and subscript.

* Links.

* Text boxes. The contents of the text box are treated as a separate paragraph that appears after the paragraph containing the text box.

= Embedded style maps =

By default, Mammoth maps some common .docx styles to HTML elements. For instance, a paragraph with the style name `Heading 1` is converted to an `h1` element. If you have a document with your own custom styles, you can use an embedded style map to tell Mammoth how those styles should be mapped. For instance, you could convert paragraphs with the style named `WarningHeading` to `h1` elements with `class="warning"` using the style mapping:

    p[style-name='WarningHeading'] => h1.warning:fresh

[An online tool](http://mike.zwobble.org/projects/mammoth/embed-style-map/) can be used to embed style maps into an existing document. Details of [how to write style maps can be found on the mammoth.js documentation](https://github.com/mwilliamson/mammoth.js#writing-style-maps).

A style map to be used for all documents can be set by configuring Mammoth (see below).

= Configuration =

Mammoth can be configured by writing a separate plugin. For instance, [this example plugin](https://github.com/mwilliamson/mammoth-wordpress-plugin/tree/master/examples/options-plugin) adds a custom style map, and uses a document transform to detect paragraphs of monospace text and convert them to paragraphs with the style "Code Block".

As a WordPress plugin, Mammoth uses the JavaScript library mammoth.js to convert documents. Mammoth will use the JavaScript global `MAMMOTH_OPTIONS` whenever it calls mammoth.js, which allows for some customisation. `MAMMOTH_OPTIONS` should be defined as a function that returns an options object. This options object will then be passed in as the `options` argument to `convertToHtml`. The [mammoth.js docs](https://github.com/mwilliamson/mammoth.js) describe the various options available.

The global `MAMMOTH_OPTIONS` will be called with `mammoth` as the first argument. This can be useful if you need to use a function from mammoth.js, such as `mammoth.transforms.getDescendantsOfType`.
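The following is a minimal sketch of a `MAMMOTH_OPTIONS` function that combines a custom style map with a document transform. It is loosely modelled on the description of the example plugin above and is not copied from it. It relies on `mammoth.transforms.paragraph` and `mammoth.transforms.getDescendantsOfType` from the mammoth.js docs. Several details are assumptions: the script is loaded on the editor page by your own plugin, runs expose their font name as `run.font`, and the font list, the `CodeBlock` style ID and the `pre:separator('\n')` mapping are illustrative choices.

```javascript
// Sketch: defined by a separate plugin's script (assumption: loaded on the post editor page).
function MAMMOTH_OPTIONS(mammoth) {
  // Hypothetical list of fonts treated as "monospace".
  var monospaceFonts = ["Courier New", "Consolas"];

  // True if the paragraph has at least one run and every run uses a monospace font.
  function isMonospaceParagraph(paragraph) {
    var runs = mammoth.transforms.getDescendantsOfType(paragraph, "run");
    return runs.length > 0 && runs.every(function (run) {
      return monospaceFonts.indexOf(run.font) !== -1;
    });
  }

  // Give unstyled monospace paragraphs the "Code Block" style; leave everything else alone.
  function transformParagraph(paragraph) {
    if (!paragraph.styleId && isMonospaceParagraph(paragraph)) {
      return Object.assign({}, paragraph, { styleId: "CodeBlock", styleName: "Code Block" });
    }
    return paragraph;
  }

  return {
    styleMap: [
      "p[style-name='WarningHeading'] => h1.warning:fresh",
      // Consecutive code paragraphs collapse into one <pre>, joined by newlines.
      "p[style-name='Code Block'] => pre:separator('\\n')"
    ],
    transformDocument: mammoth.transforms.paragraph(transformParagraph)
  };
}
```

Because the transform only touches paragraphs without a style ID, paragraphs that already have a style in Word keep their usual mapping.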

= FAQs =

[Answers to some frequently asked questions about Mammoth](https://mike.zwobble.org/projects/mammoth/faqs/).

== Installation ==

Install the plugin in the usual way, and you should be able to use the Mammoth .docx converter when adding a post. If you can't see the meta box, make sure that it's selected by taking a look at the "Screen Options" for adding a post.

== Changelog ==

= 1.21.0 =

* Update mammoth.js to 1.7.0. This includes support for documents in the strict format.

= 1.20.0 =

* Update mammoth.js to 1.4.21. This includes improved underline support and image handling.

= 1.19.0 =

* Update mammoth.js to 1.4.18. This includes better support for internal hyperlinks.

= 1.18.0 =

* Update mammoth.js to 1.4.17. This includes better support for numbering, and conversion of symbols to their corresponding Unicode characters.

= 1.17.0 =

* Update mammoth.js to 1.4.13. This includes support for soft hyphens and improved underline support.

= 1.16.0 =

* Improve support for detecting when the Gutenberg editor is active. This should fix compatibility with some other plugins such as Yoast SEO when the Gutenberg editor is disabled.

= 1.15.0 =

* Update mammoth.js to 1.4.9.

= 1.14.0 =

* Improve support when X-Frame-Options is set to "deny".

= 1.13.0 =

* Update mammoth.js to 1.4.8.

= 1.12.0 =

* Add basic Gutenberg support.

* Update mammoth.js to 1.4.7.

= 1.11.0 =

* Fix IE11 support.

= 1.10.0 =

* Add workaround for a bug in tinyMCE in WordPress 4.9.6.

= 1.9.0 =

* Update mammoth.js to 1.4.6. This includes preservation of whitespace in pre elements, and paragraphs in endnotes, footnotes and comments.

= 1.8.0 =

* Update mammoth.js to 1.4.4. This includes better support for reading documents created by Word Online.

= 1.7.0 =

* Update mammoth.js to 1.4.2. This includes improved handling of grouped objects and non-breaking hyphens.

= 1.6.0 =

* Allow MAMMOTH_OPTIONS to override idPrefix.

* Update mammoth.js to 1.4.0. This includes improved handling of hyperlinks, and converts table headers into thead elements.

= 1.5.0 =

* Handle unsuccessful image uploads where the HTTP request succeeds, but WordPress rejects the file. Fixes an issue where documents with EMF images couldn't be imported.

* Update mammoth.js to 1.3.2. This includes a fix for documents where images are referenced by a URI relative to the base URI.

= 1.4.0 =

* Update mammoth.js to 1.3.1. This includes new ways to map styles, such as style name prefixes.

* Improve styling of preview to match the editor.

* Fix a bug where images wouldn't upload on certain server configurations.

* Allow options to be passed to mammoth.js through a MAMMOTH_OPTIONS global variable.

= 1.3.0 =

* Update mammoth.js to 1.2.5. This includes better support for image alt text and boolean run properties (bold, italic, underline and strikethrough).

= 1.2.0 =

* Include wp-image-* class when inserting images. This allows the WordPress editor to correctly identify the image and show appropriate editing options.

* If an image has an alt text description in the original document, set the alt text in the media library when uploading that image.

* If an image has an alt text description in the original document, use it to generate the filename.

* Set image filename extension based on the image content type.

* Show a message while the document is being inserted.

= 1.1.0 =

* Update mammoth.js to 1.1.0. This includes support for merged table cells and content controls, such as bibliographies. This should also improve performance when converting larger documents.

= 1.0.0 =

* Update mammoth.js to 0.3.33. This includes better support for reading documents that use undefined styles, and generates simpler HTML in some cases.

= 0.1.25 =

* Update mammoth.js to 0.3.30. This includes better support for lists made in LibreOffice.

* Fix JavaScript error on admin pages without editors.

= 0.1.24 =

* Update mammoth.js to 0.3.29. This improves support for mc:AlternateContent elements.

= 0.1.23 =

* Update mammoth.js to 0.3.28. This improves support for reading images.

= 0.1.22 =

* Update mammoth.js to 0.3.28-pre.1. Fixes newlines being inserted around inline elements when the editor is in text mode.

= 0.1.21 =

* Update mammoth.js to 0.3.27. Fixes recursive collapsing of HTML elements.

= 0.1.20 =

* Update mammoth.js to 0.3.26. Improves the collapsing of HTML elements, such as allowing collapsing elements generated by different runs.

= 0.1.19 =

* Update mammoth.js to 0.3.25-pre.1. Includes experimental support for embedded style maps.

= 0.1.18 =

* Update mammoth.js to 0.3.23. Includes support for links and images in footnotes and endnotes.

= 0.1.17 =

* Update mammoth.js to 0.3.22. Includes support for strikethrough.

= 0.1.16 =

* Update mammoth.js to 0.3.21. Includes basic support for text boxes.

= 0.1.15 =

* Update mammoth.js to 0.3.18. Includes support for hyperlinks to bookmarks in the same document.

* Add support for CKEditor.

= 0.1.14 =

* Support any post type that supports the WordPress editor.

* Generate consistent footnote and endnote IDs based on the post ID.

* Update mammoth.js to 0.3.15.

= 0.1.13 =

* Update mammoth.js to 0.3.14. Includes support for endnotes.

= 0.1.12 =

* Fix preview rendering on Chrome.

* Update mammoth.js to 0.3.12.

= 0.1.11 =

* Update mammoth.js to 0.3.11. Includes support for superscript and subscript text.

= 0.1.10 =

* Update mammoth.js to 0.3.8. Includes support for line breaks.

= 0.1.9 =

* Remove old script reference.

= 0.1.8 =

* Update to mammoth.js 0.3.5. Includes support for tables.

= 0.1.7 =

* Update to mammoth.js 0.3.2. Includes support for footnotes.

= 0.1.6 =

* Update to mammoth.js 0.2.2

* Pretty print HTML output

* Hide inline image data in raw HTML preview

= 0.1.5 =

* Fix versions

= 0.1.4 =

* Fix readme.txt

= 0.1.3 =

* Update to the latest version of mammoth.js (0.2.1)

= 0.1 =

* Initial release

== Donations ==

If you'd like to say thanks, feel free to [make a donation through Ko-fi](https://ko-fi.com/S6S01MG20).

If you use Mammoth as part of your business, please consider supporting the ongoing maintenance of Mammoth by [making a weekly donation through Liberapay](https://liberapay.com/mwilliamson/donate).

mammoth-wordpress-plugin's People

Contributors

mwilliamson


mammoth-wordpress-plugin's Issues

Problem with custom style map: changing Heading 1 to h2

Hi there!
First, amazing work. I have installed the plugin on my website, but I encountered a problem when using the options-plugin example.
I want to change Heading 1 to h2 by using a custom style map, but it didn't work.
My mammoth-options.js is below:

    function MAMMOTH_OPTIONS(mammoth) {
        var styleMap = [
            "p[style-name^='Heading'] => h2:fresh",
            "p[style-name='Heading 1'] => h2:fresh",
            "p[style-name^='1'] => h2",
            "p[style-name='标题1'] => h2:fresh",
            "p[style-name='标题2'] => h3:fresh",
        ];
        console.log('styleMap load!-7');
    }

The .docx file is attached: demo-en.docx

Can you figure out what the problem is?
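The Configuration section above says `MAMMOTH_OPTIONS` should return an options object. One likely cause of the problem is that the function above builds `styleMap` but never returns it, so mammoth.js receives no style map. Below is a corrected sketch. It keeps the exact-name rules from the report and drops the two prefix rules: `p[style-name^='Heading']` would also match `Heading 2`, `Heading 3` and so on, and `^='1'` is unlikely to match a real style name. It is an assumption that these style names match the ones actually used in demo-en.docx.

```javascript
function MAMMOTH_OPTIONS(mammoth) {
  var styleMap = [
    "p[style-name='Heading 1'] => h2:fresh",
    "p[style-name='标题1'] => h2:fresh",
    "p[style-name='标题2'] => h3:fresh"
  ];
  // Returning the options object is what lets the plugin pass styleMap on to convertToHtml.
  return { styleMap: styleMap };
}
```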

Cannot update to v1.4

I just tried to do the standard WordPress update for this great plugin, but it failed due to "probable incompatible permissions". This didn't happen with the previous version or with other plugins that I updated at the same time. Unfortunately, I can't get any further information on exactly which permissions may have been wrong, and as far as I know they are all standard. Any idea why this update may have caused this?

Numbered lists are not properly converted

Example .docx content with a numbered list:

    1) item 1
    some text
    some text
    2) item 2
    3) item 3
    some text
    4) item 4

Converted into:

    1. item 1
    some text
    some text
    1. item 2
    1. item 3
    some text
    1. item 4

WordPress 4.9.2
PHP 7.0.25

Links are split when they contain special characters

Hello,

When a .docx file has a special character in a link, e.g. comunicación, the parser splits the link into several parts. The link looks fine and actually works, but the code is quite cluttered.

In this example, the resulting code will be similar to this:

    <a href="https://es.wikipedia.org/wiki/Comunicación">comunicaci</a><a href="https://es.wikipedia.org/wiki/Comunicación">ó</a><a href="https://es.wikipedia.org/wiki/Comunicación">n</a>

It's not crucial, but I think it's important that the code is as clean as possible.

Thanks in advance.

Space characters on new lines breaking <pre> blocks

The HTML conversion inserts two space characters as indentation on each new line, which is great for legibility, but in code blocks it breaks Python code when you copy it out. Would it be possible to (perhaps optionally) switch this off inside code, so that the additional spaces aren't displayed in the rendered <pre>?

Sorry.

Also: hi!

Custom post types support

Hi there,

I hope this email finds you well. With regard to your question, I wanted to confirm whether or not the plugin we offer supports custom post types with your awesome plugin.

If you could kindly provide me with more details, I would be happy to look into this for you and get back to you as soon as possible.

Thank you for your time and I look forward to hearing from you soon.

Best regards,

Mihály

Formatting issue in WordPress 5.6

I noticed an issue with formatting when importing Word documents in WordPress 5.6.

When I initially import Word documents, the preview I am presented with looks correct. When I preview the post as a whole after Mammoth converts the Word document to HTML, I notice that none of the underlying HTML that I saw in the HTML view is present outside of <li> and associated tags.

I found a workaround for this, but I still wanted to bring what I presume is a bug to your attention.

After importing the Word document, I was presented with the option "Convert to Blocks" in the WordPress visual editor. After converting the imported HTML to blocks, the formatting from the Word document was retained; without that step, it was lost.

Where do I place the custom mappings?

Hello Michael,

Thanks for responding to my email. I am using the WordPress plugin to import license documents to a WordPress site. Because styling is critical for these documents, I need to use some custom mappings. I get the following warnings using the default settings:

    Heading EULA.docx

    Warning: Unrecognised paragraph style: 'Heading EULA' (Style ID: HeadingEULA)
    Warning: Unrecognised paragraph style: 'Heading Software Title' (Style ID: HeadingSoftwareTitle)
    Warning: Unrecognised run style: 'Preamble' (Style ID: Preamble)
    Warning: Unrecognised paragraph style: 'Compliance Border Above' (Style ID: ComplianceBorderAbove)
    Warning: Unrecognised paragraph style: 'List Style 4' (Style ID: ListStyle4)
    Warning: Unrecognised paragraph style: 'List Style 5' (Style ID: ListStyle5)
    Warning: Unrecognised paragraph style: 'List Style 2' (Style ID: ListStyle2)
    Warning: Unrecognised paragraph style: 'List Style 3' (Style ID: ListStyle3)

I would like to add the following custom mappings to the plugin so that anyone using the converter gets the correct styles (with help from CSS):

    p[style-name='Heading EULA'] => h1.headingeula:fresh
    p[style-name='Heading Software Title'] => h1.headingsoftwaretitle:fresh
    p[style-name='Preamble'] =>
    p[style-name='Compliance Border Above'] =>
    p[style-name='List Style 4'] => ul.liststyle4 > li
    p[style-name='List Style 5'] => ul.liststyle5 > li
    p[style-name='List Style 2'] => ol.liststyle2 > li
    p[style-name='List Style 3'] => ol.liststyle3 > li

I must be missing something: I can see how to write the mappings, but not where to place them in the plugin.

Pierre
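Following the Configuration section above, mappings like these would go in a `MAMMOTH_OPTIONS` function defined by a separate plugin, not in the Mammoth plugin itself. The sketch below makes that assumption. Two further points are assumptions to check. First, the warning reports 'Preamble' as a *run* style, so it likely needs an `r[style-name='Preamble']` selector rather than `p[...]`. Second, the two mappings with an empty right-hand side are left out because their intended output isn't stated.

```javascript
// Sketch: placed in your own plugin's script, which defines the MAMMOTH_OPTIONS global.
function MAMMOTH_OPTIONS(mammoth) {
  return {
    styleMap: [
      "p[style-name='Heading EULA'] => h1.headingeula:fresh",
      "p[style-name='Heading Software Title'] => h1.headingsoftwaretitle:fresh",
      // No :fresh, so consecutive list paragraphs share one <ol>/<ul>.
      "p[style-name='List Style 2'] => ol.liststyle2 > li",
      "p[style-name='List Style 3'] => ol.liststyle3 > li",
      "p[style-name='List Style 4'] => ul.liststyle4 > li",
      "p[style-name='List Style 5'] => ul.liststyle5 > li"
    ]
  };
}
```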

Style applied / interpreted on each character instead of each word

Hello,
I'm seeing a strange phenomenon: every now and then, the styling of a word is interpreted internally per letter.
Even if I reset the styling to "default / standard" and reformat the word, this error remains.
Only when I completely retype the word and reassign the style does it work again.
Of course, you can't be held responsible for bugs in Word, so this is just a suggestion for how to handle the problem anyway.
My idea is:
If letters with identical styles appear consecutively, interpret these letters as one word with a single style.

Would this be possible and could it work, or is there another workaround? Thanks a lot!
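The suggestion above amounts to merging adjacent runs whose formatting is identical. Here is a minimal sketch of that idea. It uses a hypothetical run shape `{ text, bold, italic }`, which is an illustration only and not mammoth.js's actual document model (real runs carry more formatting properties).

```javascript
// Hypothetical run shape: { text: string, bold: boolean, italic: boolean }.
function sameFormatting(a, b) {
  return a.bold === b.bold && a.italic === b.italic;
}

// Merge each run into the previous one when their formatting is identical,
// so "W" + "ord" with the same style becomes a single "Word" run.
function mergeAdjacentRuns(runs) {
  var merged = [];
  runs.forEach(function (run) {
    var last = merged[merged.length - 1];
    if (last && sameFormatting(last, run)) {
      last.text += run.text; // `last` is our own copy, so the input is untouched
    } else {
      merged.push(Object.assign({}, run));
    }
  });
  return merged;
}
```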

Empty links being pulled through by the importer

Not a bug so much as the importer working perfectly and Word being odd, but we're seeing some empty links (no href attribute) being pulled through. In the Word doc they take the form , and when they're imported the name attribute is converted to an id attribute with the post number added to the value as a prefix.

Would it be feasible to add an option to skip links that don't have an href value? We've struggled to identify why these links are being placed in Word in the first place, and it doesn't seem to be straightforward to delete them from a document once they're there.

Problems with default image filename (long import time)

Hi,

I've had a lot of problems with image upload time, which gets longer and longer as the number of images in the Media Library grows, so I investigated. Based on the plugin code, WordPress creates the file word-image.png, but only the first time; the second image's filename is word-image-1.png. The problem appears when there are a lot of images named word-image-{$incremented_number}: by default the plugin tries to create word-image.png, but that name already exists in the library, so WordPress increments the filename and checks whether word-image-1.png exists. If it doesn't, fine. If it does, WordPress checks the next one, and the next one. If we have images named from word-image.png to word-image-1000.png, a thousand AJAX requests are carried out to find the first free filename matching that pattern.

My solution to this problem is very simple: I wrote a simple function to generate a 9-digit filename, and the problem has gone away.
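Below is a sketch of the kind of helper described above, assuming a random 9-digit name is acceptable. `randomImageFilename` is a hypothetical name, not part of the plugin, and the reporter's actual function isn't shown. A random name makes it very unlikely that the first candidate is already taken. A collision is still possible, and in that case WordPress's usual incrementing would apply.

```javascript
// Hypothetical helper: a random 9-digit filename such as "042917365.png".
function randomImageFilename(extension) {
  // Math.random() is in [0, 1), so n is an integer from 0 to 999999999.
  var n = Math.floor(Math.random() * 1e9);
  // Zero-pad so the name is always exactly 9 digits.
  return String(n).padStart(9, "0") + "." + extension;
}
```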
