lonekorean / wordpress-export-to-markdown

Converts a WordPress export XML file into Markdown files.

License: MIT License

JavaScript 100.00%

wordpress-export-to-markdown's Introduction

wordpress-export-to-markdown

Converts a WordPress export file into Markdown files that are compatible with static site generators (Eleventy, Gatsby, Hugo, etc.).

Each post is saved as a separate Markdown file with frontmatter. Images are downloaded and saved.

Quick Start

You'll need:

Node.js installed
Your WordPress export file (be sure to export "All content").

To make things easier, you can rename your WordPress export file to export.xml and drop it into the same directory that you run this script from.

You can run this script immediately in your terminal with npx:

npx wordpress-export-to-markdown

Or you can clone this repo, then from within the repo's directory, install and run:

npm install && node index.js

Either way, the script will start a wizard to configure your options. Answer the questions and off you go!

Command Line

Options can also be configured via the command line. The wizard will skip asking about any such options. For example, the following will give you Jekyll-style output in terms of folder structure and filenames.

Using npx:

npx wordpress-export-to-markdown --post-folders=false --prefix-date=true

Using a locally cloned repo:

node index.js --post-folders=false --prefix-date=true

The wizard will still ask you about any options not specified on the command line. To skip the wizard entirely and use default values for unspecified options, add --wizard=false.

Options

These are the questions asked by the wizard. Command line arguments, along with their default values, are also provided here if you want to use them.

Path to WordPress export file?

Command line: --input=export.xml

The path to your WordPress export file. To make things easier, you can rename your WordPress export file to export.xml and drop it into the same directory that you run this script from.

Path to output folder?

Command line: --output=output

The path to the output directory where Markdown and image files will be saved. If it does not exist, it will be created.

Create year folders?

Command line: --year-folders=false

Whether or not to organize output files into folders by year.

Create month folders?

Command line: --month-folders=false

Whether or not to organize output files into folders by month. You'll probably want to combine this with --year-folders to organize files by year then month.

Create a folder for each post?

Command line: --post-folders=true

Whether or not to save files and images into post folders.

If true, the post slug is used for the folder name and the post's Markdown file is named index.md. Each post folder will have its own /images folder.

/first-post
    /images
        potato.png
    index.md
/second-post
    /images
        carrot.jpg
        celery.jpg
    index.md

If false, the post slug is used to name the post's Markdown file. These files will be side-by-side and images will go into a shared /images folder.

/images
    carrot.jpg
    celery.jpg
    potato.png
first-post.md
second-post.md

Either way, this can be combined with --year-folders and --month-folders, in which case the above output will be organized under the appropriate year and month folders.

Prefix post folders/files with date?

Command line: --prefix-date=false

Whether or not to prepend the post date to the post slug when naming a post's folder or file.

If --post-folders is true, this affects the folder.

/2019-10-14-first-post
    index.md
/2019-10-23-second-post
    index.md

If --post-folders is false, this affects the file.

2019-10-14-first-post.md
2019-10-23-second-post.md
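
The way --post-folders and --prefix-date combine can be sketched in a few lines. This is only an illustration of the naming rules described above, not the script's actual code; `buildPostPath` and the shape of the `post` and `options` objects are assumptions made up for the example.

```javascript
// Hypothetical helper, not the project's actual code: builds the relative
// output path for a post from its slug, its date, and two of the options above.
function buildPostPath(post, options) {
  // Prepend the date to the slug only when prefix-date is enabled.
  const name = options.prefixDate ? `${post.date}-${post.slug}` : post.slug;

  // With post folders, the slug names the folder and the file is index.md;
  // without them, the slug names the Markdown file itself.
  return options.postFolders ? `${name}/index.md` : `${name}.md`;
}

const examplePost = { slug: 'first-post', date: '2019-10-14' };

console.log(buildPostPath(examplePost, { postFolders: true, prefixDate: true }));
// 2019-10-14-first-post/index.md
console.log(buildPostPath(examplePost, { postFolders: false, prefixDate: true }));
// 2019-10-14-first-post.md
```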

Save images attached to posts?

Command line: --save-attached-images=true

Whether or not to download and save images attached to posts. Generally speaking, these are images that were uploaded by using Add Media or Set Featured Image in WordPress. Images are saved into /images.

Save images scraped from post body content?

Command line: --save-scraped-images=true

Whether or not to download and save images scraped from <img> tags in post body content. Images are saved into /images. The <img> tags are updated to point to where the images are saved.

Include custom post types and pages?

Command line: --include-other-types=false

Some WordPress sites make use of a "page" post type and/or custom post types. Set this to true to include these post types in the output. Posts will be organized into post type folders.

Customizing Frontmatter and Other Advanced Settings

You can edit settings.js to configure advanced settings beyond the options above. This includes things like customizing frontmatter, date formatting, throttling image downloads, and more.

You'll need to run the script locally (not using npx) to edit these advanced settings.

Contributing

Please read the contribution guidelines.

wordpress-export-to-markdown's Issues

Error on import

Looks like a great utility. I see it referenced all over.

But I'm getting an error right off the bat. Maybe you can see something obvious:

Something went wrong, execution halted early.
TypeError: Cannot read property 'readFile' of undefined
at Object.parseFilePromise (/Users/apple/Deanonsoftware/wordpress-export-to-markdown/src/parser.js:10:36)
at /Users/apple/Deanonsoftware/wordpress-export-to-markdown/index.js:15:29
at <anonymous>
at process._tickCallback (internal/process/next_tick.js:188:7)

Thx

Dean

Windows control characters in the code

GitHub Windows client strikes again, eh?

/usr/local/bin/wordpress-export-to-markdown --post-folders=true --prefix-date=false --input "export.xml" --output "/mnt/c/zip/tmp/output" --year-folders=false --month-folders=false --save-attached-images=true --save-scraped-images=true --include-other-types=true

/usr/bin/env: ‘node\r’: No such file or directory

cat -v /usr/local/bin/wordpress-export-to-markdown|head -10

#!/usr/bin/env node^M
^M
const compareVersions = require('compare-versions');^M
const path = require('path');^M
const process = require('process');^M
^M
const wizard = require('./src/wizard');^M
const parser = require('./src/parser');^M
const writer = require('./src/writer');^M

Split posts by category

Is it possible to split posts into folders by category?

self-signed certificates are not supported

I was tracing an issue with images going missing (it turns out all WebP images do), along with other things that could be scripted. My local WordPress server could be reached, but this tool would balk at its self-signed certificate.

Add the title of the page as "# header" into the MD

I'm converting a site where the title is displayed as part of the WP template, so I'm losing that info in the MD file.
It would be really useful to add it to the MD as a header on the first line, like this:
# title of the WP post
[Rest of the content]

cleaning up titles

I was surprised to find that my Wordpress XML export is formatted so weirdly:

		<title>
			The Title here!	</title>

The actual title strings are all surrounded by tons of tabs, and newlines. It was easy enough to clean it up with Search & Replace for this script.

But I wonder if the better course of action would be to trim/strip surrounding whitespace altogether.

@superted17 and others, did you deal with the same problem or is it just my Wordpress install?
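
A minimal sketch of the trimming idea, assuming the title arrives as a plain string. `cleanTitle` is a made-up name for this example, and collapsing inner whitespace runs to a single space is an extra choice here, not something the script is known to do.

```javascript
// Hypothetical helper: collapses runs of whitespace (tabs, newlines, spaces)
// into single spaces, then strips whitespace from both ends of the title.
function cleanTitle(rawTitle) {
  return rawTitle.replace(/\s+/g, ' ').trim();
}

console.log(cleanTitle('\n\t\t\tThe Title here!\t'));
// The Title here!
```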

TypeError: node.childNodes[i].getAttribute is not a function

I tried running this and I'm getting the following error:

Something went wrong, execution halted early.
TypeError: node.childNodes[i].getAttribute is not a function
    at Object.replacement (/Users/achristianson/Downloads/wordpress-export-to-markdown/node_modules/turndown-plugin-gfm/lib/turndown-plugin-gfm.cjs.js:61:30)
    at TurndownService.replacementForNode (/Users/achristianson/Downloads/wordpress-export-to-markdown/node_modules/turndown/lib/turndown.cjs.js:877:10)
    at /Users/achristianson/Downloads/wordpress-export-to-markdown/node_modules/turndown/lib/turndown.cjs.js:836:40
    at NodeList.reduce (<anonymous>)
    at TurndownService.process (/Users/achristianson/Downloads/wordpress-export-to-markdown/node_modules/turndown/lib/turndown.cjs.js:829:17)
    at TurndownService.replacementForNode (/Users/achristianson/Downloads/wordpress-export-to-markdown/node_modules/turndown/lib/turndown.cjs.js:872:25)
    at /Users/achristianson/Downloads/wordpress-export-to-markdown/node_modules/turndown/lib/turndown.cjs.js:836:40
    at NodeList.reduce (<anonymous>)
    at TurndownService.process (/Users/achristianson/Downloads/wordpress-export-to-markdown/node_modules/turndown/lib/turndown.cjs.js:829:17)
    at TurndownService.replacementForNode (/Users/achristianson/Downloads/wordpress-export-to-markdown/node_modules/turndown/lib/turndown.cjs.js:872:25)

I'm running node version 14.15.1, though I also tried 12.21.0. I also double-checked the versions of turndown and turndown-plugin-gfm. I checked those projects and don't see any issues related to these errors.

Thoughts?

Path to WordPress export file? (export.xml)

I'm stuck at the first step.

I just exported an XML of posts from Oct 2015 to Dec 2015 for testing.

The downloaded file had a different name, so I renamed it to export.xml and then moved it to the root of the project.

Any suggestions?

I ran cd wordpress-export-to-markdown && npm install && node index.js and it's stuck.

I'm using node v14.15.3 and npm v6.14.9.

emoji is not supported

When a post's content has emoji, the export doesn't work well. The emoji are replaced by '?'. I hope support for emoji can be added.

Handle drafts

Hi,

First off, thanks for the AMAZING utility!

One small thing I've noticed is that if there are drafts in the XML export and prefixdate is set as true, the generated markdown files are named something like Invalid DateTime-my-blog-post.md (because drafts don't have a date by definition).

Maybe we could look into taking another flag to include (or not) drafts, and name any draft posts draft-my-blog-post.md as a fallback for the lack of date.

What do you think?
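
The fallback suggested above could look roughly like this. It is a sketch only: `datePrefix` is a hypothetical name, it uses the built-in Date rather than whatever date library the script uses, and the literal 'draft' is just the fallback proposed in this issue.

```javascript
// Hypothetical helper: returns a YYYY-MM-DD prefix for a parseable date,
// or the literal 'draft' when the date is missing or cannot be parsed.
function datePrefix(rawDate) {
  const parsed = new Date(rawDate);
  if (Number.isNaN(parsed.getTime())) {
    return 'draft'; // assumption: drafts fall back to a fixed prefix
  }
  // toISOString() always reports UTC, so the prefix is the UTC calendar date.
  return parsed.toISOString().slice(0, 10);
}

console.log(`${datePrefix('2019-10-14T08:30:00Z')}-my-blog-post.md`);
// 2019-10-14-my-blog-post.md
console.log(`${datePrefix('')}-my-blog-post.md`);
// draft-my-blog-post.md
```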

property '0' of undefined

Parsing...
1126 posts found.

Something went wrong, execution halted early.
TypeError: Cannot read property '0' of undefined
at C:\Users\herib\AppData\Local\npm-cache_npx\a8913a54bce5e168\node_modules\wordpress-export-to-markdown\src\parser.js:146:79
at Array.filter (<anonymous>)
at collectAttachedImages (C:\Users\herib\AppData\Local\npm-cache_npx\a8913a54bce5e168\node_modules\wordpress-export-to-markdown\src\parser.js:146:4)
at Object.parseFilePromise (C:\Users\herib\AppData\Local\npm-cache_npx\a8913a54bce5e168\node_modules\wordpress-export-to-markdown\src\parser.js:22:18)
at async C:\Users\herib\AppData\Local\npm-cache_npx\a8913a54bce5e168\node_modules\wordpress-export-to-markdown\index.js:23:16

Turn into a npm package

Awesome work! One suggestion: I would love to see this as an npm package. If it were, users could just run npx wordpress-export [args]... and npm would pull the package and run the files. They could also install it on their system for later use.

If you do not have the bandwidth for this, I could look at doing it.

Jason

Custom fields?

Why aren't custom fields exported as frontmatter? ACF seems to be exported as serialized data instead.

SyntaxError: Unexpected token {

Hello!

First of all, thanks for the tool, it would be really great to use it for our site migration. I did export the WP content as xml and followed the README instructions (I run Ubuntu), but I get this error:

/home/succurro/packages/wordpress-export-to-markdown/node_modules/webidl-conversions/lib/index.js:357
    } catch {
            ^

SyntaxError: Unexpected token {
    at createScript (vm.js:80:10)
    at Object.runInThisContext (vm.js:139:10)
    at Module._compile (module.js:616:28)
    at Object.Module._extensions..js (module.js:663:10)
    at Module.load (module.js:565:32)
    at tryModuleLoad (module.js:505:12)
    at Function.Module._load (module.js:497:3)
    at Module.require (module.js:596:17)
    at require (internal/module.js:11:18)
    at Object.<anonymous> (/home/succurro/packages/wordpress-export-to-markdown/node_modules/jsdom/lib/jsdom/browser/Window.js:3:27)

The nodejs and npm installations were successful, I only got 3 warnings which I do not know if they are relevant to this issue so I report them here:

$ npm install
npm WARN [email protected] requires a peer of canvas@^2.5.0 but none was installed.
npm WARN [email protected] requires a peer of bufferutil@^4.0.1 but none was installed.
npm WARN [email protected] requires a peer of utf-8-validate@^5.0.2 but none was installed.

How can I donate?

Thanks so much for your work on this plugin! Do you have a link where I can donate to say thanks?

Add attached image to readme

Br tags

Noticed the export doesn't do anything with some break tags, though I'm not sure exactly what. Some break tags seem to be preserved, but the following example (and many like it) is ending up as one line in a paragraph after import:

<p>Quest, Team Shadetek / Mirage, Brooklyn Anthem / Uproot (2008)<br>
The Bug  / Poison Dart (feat. Warrior Queen) / London Zoo (2008)<br>
Ghislain Poirier / No More Blood feat. Zulu / No Ground Under (2007)<br>
Tanya Stephens / Put It On You / Rebelution (2006)<br>
Lady Saw / Chat To Mi Back  / Walk Out (2007)</p>

Strangely I put that sample into turndown and it came back correctly; and I didn't see any settings in this library's use of turndown that seemed relevant.

--prefix-date instructions are wrong; need to use --prefixdate instead.

Hey,

I was using this again and noticed that the flags you pass in to change some of the default behaviours are wrong, e.g. if you run this from the readme:
npx wordpress-export-to-markdown --post-folders=false --prefix-date=true

It doesn't output folders with the date attached, however, if you use --prefixdate it will work. (without the middle hyphen)

With --prefix-date:
Wrote output\the-clean-coder-my-takeaways\index.md.

With --prefixDate:
Wrote output\2019-04-13-the-clean-coder-my-takeaways\index.md

This is probably the same for every flag of the form --name-other. I think it is due to the way you are using minimist.

Let me know if you need any help.

P.S. There are references to some sort of wizard, but looking at the code, this doesn't exist.
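
One way to make both spellings work is to normalize flag names before looking them up. The sketch below is self-contained and does not use minimist; `normalizeFlags` is a hypothetical helper, and it only handles the --name=value form shown in the readme.

```javascript
// Hypothetical helper, not part of this tool or of minimist: parses
// "--name=value" arguments and strips hyphens from the name, so that
// --prefix-date, --prefixdate, and --prefixDate all map to "prefixdate".
function normalizeFlags(args) {
  const options = {};
  for (const arg of args) {
    const match = /^--([^=]+)=(.*)$/.exec(arg);
    if (!match) continue; // ignore anything that is not --name=value
    const name = match[1].replace(/-/g, '').toLowerCase();
    options[name] = match[2];
  }
  return options;
}

console.log(normalizeFlags(['--post-folders=false', '--prefixDate=true']));
// { postfolders: 'false', prefixdate: 'true' }
```

Values stay as strings here; converting 'true' and 'false' to booleans would be a separate step.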

Unexpected token {

I was looking for a way to migrate from WordPress to Eleventy and found your utility mentioned here https://edspencer.me.uk/posts/2019-10-16-migrating-from-wordpress-to-eleventy/
I saw a similar but not exact closed issue #36

I put my WordPress xml file in the utilities directory then...

$ npx wordpress-export-to-markdown
npx: installed 137 in 15.962s
Unexpected token {
$

I tried with 2 different xml files and without any xml file and got the same result!

I'm using Linux Mint 19.3 based on Ubuntu 18.04

Exporting Thumbnails

Hi Will,
That plugin is amazing, thank you so much for sharing it.
I'm trying to export custom post types from WordPress, but I'm unable to export thumbnails from custom post types; exporting normal "posts" doesn't export the image thumbnail either.
I have a local WordPress environment running and I'm exporting the XML directly from WordPress for each custom post type. From each individual file I can generate the .md files, but none of my custom posts is getting the thumbnail.
Please let me know if you have any suggestions.
Thank you again.

TypeError: postContent.matchAll(...) is not a function or its return value is not iterable

While using this utility I faced the following error

Parsing...
31 posts found.
160 attached images found.

Something went wrong, execution halted early.
TypeError: postContent.matchAll(...) is not a function or its return value is not iterable
    at getItemsOfType.forEach.post (/home/rahul/Jhooq/jhooq-wordpress-export-and-markdown/wordpress-export-to-markdown/src/parser.js:107:35)
    at Array.forEach (<anonymous>)
    at collectScrapedImages (/home/rahul/Jhooq/jhooq-wordpress-export-and-markdown/wordpress-export-to-markdown/src/parser.js:102:31)
    at Object.parseFilePromise (/home/rahul/Jhooq/jhooq-wordpress-export-and-markdown/wordpress-export-to-markdown/src/parser.js:23:18)

Raising this issue here because I am not a Node.js expert, but maybe someone could fix it.

Cannot read property 'filter' of undefined

Hi,
node v12.16.1

node index.js

Parsing...

Something went wrong, execution halted early.
TypeError: Cannot read property 'filter' of undefined
at getItemsOfType (/home/jacobo/wordpress-export-to-markdown/src/parser.js:32:34)
at collectPosts (/home/jacobo/wordpress-export-to-markdown/src/parser.js:39:16)
at Object.parseFilePromise (/home/jacobo/wordpress-export-to-markdown/src/parser.js:16:16)
at async /home/jacobo/wordpress-export-to-markdown/index.js:15:16

cb.apply is not a function

Error:

> npx wordpress-export-to-markdown
npm ERR! cb.apply is not a function

It looks like the trace log contains all relevant information:

0 info it worked if it ends with ok
1 verbose cli [
1 verbose cli   'C:\\Users\\stop_\\scoop\\apps\\nodejs-np\\current\\node.exe',
1 verbose cli   'C:\\Users\\stop_\\AppData\\Roaming\\npm\\node_modules\\npx\\node_modules\\npm\\bin\\npm-cli.js',
1 verbose cli   'install',
1 verbose cli   'wordpress-export-to-markdown@latest',
1 verbose cli   '--global',
1 verbose cli   '--prefix',
1 verbose cli   'C:\\Users\\stop_\\AppData\\Roaming\\npm-cache\\_npx\\6924',
1 verbose cli   '--loglevel',
1 verbose cli   'error',
1 verbose cli   '--json'
1 verbose cli ]
2 info using npm@5.1.0
3 info using node@v18.7.0
4 verbose npm-session 1596d3eca3c78795
5 silly install loadCurrentTree
6 silly install readGlobalPackageData
7 http fetch GET 200 https://registry.npmjs.org/wordpress-export-to-markdown 704ms
8 http fetch GET 200 https://registry.npmjs.org/wordpress-export-to-markdown/-/wordpress-export-to-markdown-2.2.2.tgz 80ms
9 silly pacote tag manifest for wordpress-export-to-markdown@latest fetched in 1060ms
10 verbose stack TypeError: cb.apply is not a function
10 verbose stack     at C:\Users\stop_\AppData\Roaming\npm\node_modules\npx\node_modules\npm\node_modules\graceful-fs\polyfills.js:287:18
10 verbose stack     at FSReqCallback.oncomplete (node:fs:212:5)
11 verbose cwd C:\Users\stop_\Downloads\blog
12 verbose Windows_NT 10.0.22622
13 verbose argv "C:\\Users\\stop_\\scoop\\apps\\nodejs-np\\current\\node.exe" "C:\\Users\\stop_\\AppData\\Roaming\\npm\\node_modules\\npx\\node_modules\\npm\\bin\\npm-cli.js" "install" "wordpress-export-to-markdown@latest" "--global" "--prefix" "C:\\Users\\stop_\\AppData\\Roaming\\npm-cache\\_npx\\6924" "--loglevel" "error" "--json"
14 verbose node v18.7.0
15 verbose npm  v5.1.0
16 error cb.apply is not a function
17 verbose exit [ 1, true ]

getPostPath doesn't take custom dates into consideration

The getPostPath function doesn't take custom dates into consideration. If the format isn't ISO, the folder created will be Invalid DateTime.

I'm submitting a PR, but just want this issue for tracking purposes.

Images still link to old wordpress wp-content directory

First of all, great script, thank you for creating it!

Export went ok, but in the index.md file the image links still point to the old location:

Example:

[![](https://www.domain.com/wp-content/uploads/2019/04/image.jpg)](https://www.domain.com/wp-content/uploads/2019/04/image.jpg)

I would have expected that the link in the .md file points to the images directory which has been created in the posts folder:

[![](./images/image.jpg)](./images/image.jpg)

getting error

hello, just tried your script, but I am getting an error message.

Unable to parse file content.
Error: Invalid character in entity name
Line: 17357
Column: 8 ...

But when going there the code says: wp:meta_value<....

Any idea what went wrong? I already installed the script again just to be up to date. I'm using Node 11; maybe this is the reason.

Cheers Gerhard

Attached Files to Output

I can't figure it out. How do I list the attached files as a list in the output?

Something like this:

attachements:

file.xx

thanks

raw output option?

Hi lonekorean!

Would it be possible to add a "raw output" option, where the script would not translate the contents of the posts to markdown, but would keep the format as in WordPress code editor?

The intent is to create a mirror of the WordPress site as a set of xml files (one per post) that can easily be managed as a git repository for archival purpose.

What say you?
Cheers :)

Something went wrong - parentNode of undefined

I am trying to run this tool, both through npx and locally, and getting the error:
Something went wrong, execution halted early.
TypeError: Cannot read property 'parentNode' of undefined
    at isHeadingRow (C:\Users\sunfire\AppData\Roaming\npm-cache\_npx\15916\node_modules\wordpress-export-to-markdown\node_modules\turndown-plugin-gfm\lib\turndown-plugin-gfm.cjs.js:100:23)
    at Object.filter (C:\Users\sunfire\AppData\Roaming\npm-cache\_npx\15916\node_modules\wordpress-export-to-markdown\node_modules\turndown-plugin-gfm\lib\turndown-plugin-gfm.cjs.js:77:41)
    at filterValue (C:\Users\sunfire\AppData\Roaming\npm-cache\_npx\15916\node_modules\wordpress-export-to-markdown\node_modules\turndown\lib\turndown.cjs.js:416:16)
    at findRule (C:\Users\sunfire\AppData\Roaming\npm-cache\_npx\15916\node_modules\wordpress-export-to-markdown\node_modules\turndown\lib\turndown.cjs.js:404:9)
    at Rules.forNode (C:\Users\sunfire\AppData\Roaming\npm-cache\_npx\15916\node_modules\wordpress-export-to-markdown\node_modules\turndown\lib\turndown.cjs.js:389:17)
    at TurndownService.replacementForNode (C:\Users\sunfire\AppData\Roaming\npm-cache\_npx\15916\node_modules\wordpress-export-to-markdown\node_modules\turndown\lib\turndown.cjs.js:898:25)
    at C:\Users\sunfire\AppData\Roaming\npm-cache\_npx\15916\node_modules\wordpress-export-to-markdown\node_modules\turndown\lib\turndown.cjs.js:863:40
    at NodeList.reduce (<anonymous>)
    at TurndownService.process (C:\Users\sunfire\AppData\Roaming\npm-cache\_npx\15916\node_modules\wordpress-export-to-markdown\node_modules\turndown\lib\turndown.cjs.js:856:17)
    at TurndownService.turndown (C:\Users\sunfire\AppData\Roaming\npm-cache\_npx\15916\node_modules\wordpress-export-to-markdown\node_modules\turndown\lib\turndown.cjs.js:768:26)

I am on version 14+ and from what I can tell, all of my modules are up to date.

I had someone else test and they got the same result.

Would appreciate any guidance on this.

TypeError: postContent.matchAll(...) is not a function (SOLVED)

If you get this error you probably only exported "Posts". Try to export "All".

Thanks for the script!

I leave this issue open in case others have the same issue.

include-other-types is not considered on save-scraped-images

When you are using both include-other-types and save-scraped-images the images are not saved as the collectScrapedImages function uses getItemsOfType(data, 'post').forEach so only posts are processed.

How do we run this? Assuming no knowledge of Node whatsoever

Hi! I have node.js installed and have the node command prompt open. Do I need to clone this repository down first, or is NPM aware of it and I just begin by running the first step?

Thank you!

TypeError: Promise.allSettled is not a function (SOLVED)

If you get this error, please upgrade your node version.

Promise.allSettled is available in node version >= 12.9

I leave this issue open in case others have the same issue.
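
For anyone who wants a clearer message than the TypeError, a version guard is one option. This is a sketch, not something the script is known to include; `isAtLeast` is a hypothetical helper that compares only the major and minor parts of a version string such as process.version.

```javascript
// Hypothetical helper: true when a version string such as "v14.15.1"
// is at least the given major.minor version.
function isAtLeast(version, major, minor) {
  const [actualMajor, actualMinor] = version.replace(/^v/, '').split('.').map(Number);
  return actualMajor > major || (actualMajor === major && actualMinor >= minor);
}

// Promise.allSettled needs Node.js 12.9 or later, as noted above.
if (!isAtLeast(process.version, 12, 9)) {
  console.error(`Node ${process.version} is too old; please upgrade to 12.9 or later.`);
}
```

Checking `typeof Promise.allSettled === 'function'` directly would work as well and avoids parsing version strings.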

Maybe useful | Ways to add custom additional info (Like post_id, author_id etc)

Sometimes we need extra data from WordPress, such as user_id or category_id. The key is the XML file exported from WordPress.
So you have to:

Edit /wp-admin/includes/export.php, from cPanel or directly on your server or localhost
Search for <item>
Add something like this below <item>...</item>

// for author id
<authorid><?php echo intval( $post->post_author ); ?></authorid>

// for author name		
<author><?php echo wxr_cdata( get_the_author_meta( 'login' ) ); ?></author>

Save, then export again from WordPress (don't forget to refresh the page)
Check your XML file and search for <authorid> to confirm the step was successful
Open your wordpress-export-to-markdown directory
Edit src/parser.js

// Add
function getPostAuthor(post) {
  return post.author[0];
}

// Add
function getPostAuthorId(post) {
  return post.authorid[0];
}

----------

// Edit
// Find frontmatter:
// And Add like this
        frontmatter: {
          id: getPostId(post),
          date: getPostDate(post),
          authorid: getPostAuthorId(post),
          author: getPostAuthor(post),
          slug: getPostSlug(post),
          title: getPostTitle(post),
          categories: getCategories(post),
          tags: getTags(post),
        },

Save and run again, and you will see:

id: "12345"
date: "2022-02-22"
authorid: "11"
author: "AAAAAAAA"
slug: "magic-nexea-partnership-elevates-the-growth-of-startups-to-a-total-of-rm41mil-combined-revenues"
title: "MaGIC-NEXEA partnership elevates the growth of startups to a total of RM41mil combined revenues"
categories: 
  - "AA"
  - "BB"
  - "FF"
tags: 
  - "One"
  - "Two"
coverImage: "xxxxx.jpg"
---

KUALA LUMPUR, MALAYSIA - [Media OutReach](http://www.media-outreach.com/) - 11 February 2022 - NEXEA's Entrepreneurs Programme and Malaysian Global Innovation and Creativity Centre (MaGIC) renewed their partnership to continue helping startups grow their businesses during this challenging time. Despite the pandemic, the Entrepreneurs Programme continues to provide a platform for tech entrepreneurs to connect with peers and like-minded individuals in an opportunity to grow, gain knowledge, and develop themselves for success.

Additionally, for coverImage: "xxxxx.jpg", you will need to install this plugin: https://wordpress.org/plugins/export-media-with-selected-content/
Open WordPress Tools -> Export
Tick the checkbox in the bottom section and export again

One more error...

Sorry for the trouble. Thx for last quick update. I upgraded node.

Now I'm getting a new error:

Parsing...

Something went wrong, execution halted early.
Error: Inappropriately located doctype declaration
Line: 3876
Column: 14
Char: E
at error (/Users/kapsoft/Deanonsoftware/wordpress-export-to-markdown/node_modules/sax/lib/sax.js:651:10)
at strictFail (/Users/kapsoft/Deanonsoftware/wordpress-export-to-markdown/node_modules/sax/lib/sax.js:677:7)
at SAXParser.write (/Users/kapsoft/Deanonsoftware/wordpress-export-to-markdown/node_modules/sax/lib/sax.js:1104:15)
at Parser.exports.Parser.Parser.parseString (/Users/kapsoft/Deanonsoftware/wordpress-export-to-markdown/node_modules/xml2js/lib/parser.js:325:31)
at /Users/kapsoft/Deanonsoftware/wordpress-export-to-markdown/node_modules/xml2js/lib/parser.js:5:59
at internal/util.js:278:30
at new Promise (<anonymous>)
at internal/util.js:277:12
at Parser.exports.Parser.Parser.parseStringPromise (/Users/kapsoft/Deanonsoftware/wordpress-export-to-markdown/node_modules/xml2js/lib/parser.js:338:41)
at Parser.parseStringPromise (/Users/kapsoft/Deanonsoftware/wordpress-export-to-markdown/node_modules/xml2js/lib/parser.js:5:59)

DJK

postContent.matchAll(...) is not a function or its return value is not iterable

I get the error postContent.matchAll(...) is not a function or its return value is not iterable. Looking at the XML, there are only two posts and neither has images. (Most of the XML is pages rather than posts.)

const matches = [...postContent.matchAll(/<img[^>]*src="(.+?\.(?:gif|jpe?g|png))"[^>]*>/gi)];
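
If the error comes from a runtime without String.prototype.matchAll (it was added in Node.js 12), one possible workaround is a plain exec loop over the same pattern. This is a sketch under that assumption; `findImageUrls` is a made-up name and the sample HTML is invented for the example.

```javascript
// Same pattern as quoted in the line above.
const IMG_SRC_PATTERN = /<img[^>]*src="(.+?\.(?:gif|jpe?g|png))"[^>]*>/gi;

// Hypothetical helper: collects image URLs with RegExp.prototype.exec,
// which does not depend on String.prototype.matchAll.
function findImageUrls(postContent) {
  // Use a fresh copy so lastIndex never leaks between calls.
  const pattern = new RegExp(IMG_SRC_PATTERN.source, IMG_SRC_PATTERN.flags);
  const urls = [];
  let match;
  while ((match = pattern.exec(postContent)) !== null) {
    urls.push(match[1]); // capture group 1 is the src value
  }
  return urls;
}

const sampleHtml =
  '<p><img src="https://example.com/a.png" alt=""><img src="https://example.com/b.webp"></p>';
console.log(findImageUrls(sampleHtml));
// [ 'https://example.com/a.png' ]
```

Note that the pattern only lists gif, jpg, jpeg, and png, so the .webp image in the sample is not picked up.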

Not getting any output

I am really excited to use this tool. However, I've been having an issue getting an export: my export.xml file is correctly placed, and the wizard runs when I run the command node index.xml, but no output directory (or content) is created at the end. Any suggestions or help? I've looked at the export.xml file, and there is content, but it doesn't seem to be getting parsed.

Image download delay

In writer.js, function writeImageFilesPromise(posts, config), line 81, the delay increments by 100ms. For large WP sites on slow/overburdened servers, this can result in what amounts to a DoS attack, with exceptions (RequestError: Error: read ECONNRESET) proliferating as the server becomes overloaded.

After modifying the delay to increment by 1000ms instead of 100ms, I successfully downloaded each file, although of course it was slow. Luckily most users won't need to run this software frequently.

Suggest allowing user to specify image download delay or at least raise the default delay to something higher, perhaps 500ms.

Otherwise software worked a charm.
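
A configurable increment could be sketched like this. It is not the code in writer.js; `staggeredDelays`, `wait`, and `runStaggered` are hypothetical names, and starting the first download at 0 ms is an assumption.

```javascript
// Hypothetical helper: start offsets in milliseconds for `count` downloads,
// each one `incrementMs` later than the previous one.
function staggeredDelays(count, incrementMs) {
  return Array.from({ length: count }, (_, index) => index * incrementMs);
}

// Resolves after the given number of milliseconds.
const wait = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical sketch: runs each task (a function returning a value or a
// promise, e.g. an image download) after its staggered delay.
function runStaggered(tasks, incrementMs) {
  const delays = staggeredDelays(tasks.length, incrementMs);
  return Promise.all(tasks.map((task, index) => wait(delays[index]).then(task)));
}

console.log(staggeredDelays(4, 500));
// [ 0, 500, 1000, 1500 ]

// Tiny demo with a 10 ms increment; real downloads would use a larger value.
runStaggered([() => 'a', () => 'b', () => 'c'], 10).then((results) => {
  console.log(results);
});
```

With the increment exposed as a parameter, a slow server can be given 1000 ms while a fast one keeps 100 ms.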

installation: "npm install && node index.js" seems to be an invalid command in Windows

Windows, cloned the repository then followed the readme:

PS C:\MyProjects\GitHub\wordpress-export-to-markdown> npm install && node index.js
At line:1 char:13
+ npm install && node index.js
+             ~~
The token '&&' is not a valid statement separator in this version.
    + CategoryInfo          : ParserError: (:) [], ParentContainsErrorRecordException
    + FullyQualifiedErrorId : InvalidEndOfLine

Tables are not exporting correctly

Images are not downloading

Running this, I am not getting the images in the /images directory. The images directory isn't being created either. This is my command.

node /usr/local/lib/node_modules/wordpress-export-to-markdown/index.js --prefixdate=true --input=../wordpress.xml --output=_posts/

I am also not getting any of the other frontmatter that could be retrieved from a WordPress export. Do you plan on a way to select what metadata we could add as frontmatter?

Does this work on pages?

Hi, does this script work for pages or only posts?

I've run a few export.xml files through it (all content and just pages) and it does not seem to convert them.

Pages with the same name are overwritten

I have a multi-language website. Most of the pages have the same name.

When I do the conversion to markdown, I get only one of the same-named pages.

I would like to have all of them.

<item>
		<title><![CDATA[5G]]></title>
		<link>https://xafir.cat/tecnologia/5g/</link>
		<pubDate>Fri, 16 Jul 2021 09:57:55 +0000</pubDate>
		<guid isPermaLink="false">https://xafir.cat/?page_id=2558</guid>
		<description></description>

<item>
		<title><![CDATA[5G]]></title>
		<link>https://xafir.cat/en/technology/5g/</link>
		<pubDate>Thu, 29 Jul 2021 10:35:15 +0000</pubDate>
		<guid isPermaLink="false">https://xafir.cat/?page_id=3094</guid>
		<description></description>

As you can see, the name is the same. The link is unique.

If the names were tecnologia/5g/index.md and en/technology/5g/index.md there would be no overwriting.
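
Deriving the output path from the link rather than the title could look roughly like this. It is a sketch of the idea in this issue, not an existing option; `pathFromLink` is a hypothetical helper built on the standard URL class.

```javascript
// Hypothetical helper: turns a page's <link> URL into a relative output path,
// so pages with the same title but different URLs do not collide.
function pathFromLink(link) {
  // Drop empty segments produced by leading and trailing slashes.
  const segments = new URL(link).pathname.split('/').filter(Boolean);
  return segments.length > 0 ? `${segments.join('/')}/index.md` : 'index.md';
}

console.log(pathFromLink('https://xafir.cat/tecnologia/5g/'));
// tecnologia/5g/index.md
console.log(pathFromLink('https://xafir.cat/en/technology/5g/'));
// en/technology/5g/index.md
```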

Error: Unable to download image

Host: GoDaddy
WP Version: 5.3.2
WordFence Version: 7.4.2

On my site, it appears that after running the script using the command below, all the posts are downloaded, along with a few images, before this error turns up.

node index.js --input export.xml --yearmonthfolders true --addcontentimages true

Output (IP Redacted):

Unable to download image.
Error: connect ETIMEDOUT <IP>:443
    at TCPConnectWrap.afterConnect [as oncomplete] (net.js:1056:14) {
  errno: 'ETIMEDOUT',
  code: 'ETIMEDOUT',
  syscall: 'connect',
  address: '<IP>',
  port: 443
}
Unable to download image.
Error: connect ETIMEDOUT <IP>:443
    at TCPConnectWrap.afterConnect [as oncomplete] (net.js:1056:14) {
  errno: 'ETIMEDOUT',
  code: 'ETIMEDOUT',
  syscall: 'connect',
  address: '<IP>',
  port: 443
}

Could you add support for downloading video files?

Could you add support for downloading video files?
The video files are included in a 'wp:video' tag, something like:

<!-- wp:video {"id":id} -->
<figure class="wp-block-video"><video controls src="https://domain/filename.mp4"></video></figure>
<!-- /wp:video -->]]>

No export excerpt

Please tell me how to export excerpts, or add a feature for this. Thanks.

Categories and tags

Hey there,

Great script! I particularly like how it grabs all the images.

Will there be any support for tags?

I suspect it would not be too difficult to add if not.

Cheers,
Dave

Handling custom-post?

Sorry for my naive question:

Does the script handle custom-posts somehow?

It doesn't look like it, but I might be wrong or have skipped an argument.
My entire WP Knowledge Base is made of custom posts.
My XML export contains these custom posts.

Here's the Script prompts/logs I have:

Starting wizard...
? Path to WordPress export file? /Users/manuel/Downloads/wordpress-export-to-markdown-master/export.xml
? Path to output folder? /Users/manuel/Documents/WP-KB-MD
? Create year folders? No
? Create month folders? No
? Create a folder for each post? No
? Prefix post folders/files with date? No
? Save images attached to posts? No
? Save images scraped from post body content? No

Parsing...
0 posts found.

Saving posts...
Done, got them all!

… no output as you can see.

webp is not supported

My WebP files were left on the server, and the links still pointed to them.

Something went wrong, execution halted early.

Any suggestion?

nicola@XUBUNTU:~/Desktop/wordpress-export-to-markdown-master$ nodejs index.js

Starting wizard...
? Path to WordPress export file? export.xml
? Path to output folder? output
? Create year folders? No
? Create month folders? No
? Create a folder for each post? Yes
? Prefix post folders/files with date? No
? Save images attached to posts? Yes
? Save images scraped from post body content? Yes

Parsing...
42 posts found.
704 attached images found.

Something went wrong, execution halted early.
TypeError: postContent.matchAll(...) is not a function or its return value is not iterable
    at getItemsOfType.forEach.post (/home/nicola/Desktop/wordpress-export-to-markdown-master/src/parser.js:107:35)
    at Array.forEach (<anonymous>)
    at collectScrapedImages (/home/nicola/Desktop/wordpress-export-to-markdown-master/src/parser.js:102:31)
    at Object.parseFilePromise (/home/nicola/Desktop/wordpress-export-to-markdown-master/src/parser.js:23:18)

Thank you :)