vb64 / markdown-pdf

Markdown to pdf renderer

License: MIT License

Makefile 19.87% Python 80.13%

markdown markdown-it pdf pymupdf

markdown-pdf's Introduction

Module markdown-pdf

The free, open source Python module markdown-pdf will create a PDF file from your content in markdown format.

When creating a PDF file you can:

Use UTF-8 encoded text in markdown in any language
Embed images used in markdown
Break text into pages in the desired order
Create a table of contents (bookmarks) from markdown headings
Tune the necessary elements using your CSS code
Use different page sizes within a single pdf
Create tables in markdown

The module utilizes the functions of two great libraries.

markdown-it-py to convert markdown to html.
PyMuPDF to convert html to pdf.

Installation

pip install markdown-pdf

Usage

Create a pdf with TOC (bookmarks) from headings up to level 2.

from markdown_pdf import MarkdownPdf

pdf = MarkdownPdf(toc_level=2)

Add the first section to the pdf. The title is not included in the table of contents.

from markdown_pdf import Section

pdf.add_section(Section("# Title\n", toc=False))

Add a second section. In the pdf file it starts on a new page. The title is centered using CSS, included in the table of contents of the pdf file, and an image from the file img/python.png is embedded on the page.

pdf.add_section(
  Section("# Head1\n\n![python](img/python.png)\n\nbody\n"),
  user_css="h1 {text-align:center;}"
)

Add a third section. Two headings of different levels from this section are included in the TOC of the pdf file. The section uses A4 pages in landscape orientation.

pdf.add_section(Section("## Head2\n\n### Head3\n\n", paper_size="A4-L"))

Add a fourth section with a table.

text = """# Section with Table

|TableHeader1|TableHeader2|
|--|--|
|Text1|Text2|
|ListCell|<ul><li>FirstBullet</li><li>SecondBullet</li></ul>|
"""

pdf.add_section(Section(text))

Set the properties of the pdf document.

pdf.meta["title"] = "User Guide"
pdf.meta["author"] = "Vitaly Bogomolov"

Save to file.

pdf.save("guide.pdf")

Settings and options

The Section class defines a portion of markdown data that is processed with a single set of rules. Each new Section starts on a new page.

The Section class can set the following attributes.

toc: whether to include the headings <h1> - <h6> of this section in the TOC. Default is True.
root: the name of the root directory from which the image file paths in markdown start. Default ".".
paper_size: name of paper size, as described here. Default "A4".
borders: size of borders. Default (36, 36, -36, -36).
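
To make the paper_size and borders defaults concrete, here is a minimal sketch of how the printable area could be derived. It assumes page sizes in points (A4 as 595 x 842, the value PyMuPDF reports), that an "-L" suffix swaps width and height, and that the four border values are added component-wise to the page rectangle (x0, y0, x1, y1). The names PAPER_SIZES, page_rect and content_rect are hypothetical helpers for illustration, not part of markdown-pdf.

```python
# Illustrative only: a tiny subset of paper sizes, in points (1/72 inch).
PAPER_SIZES = {"a4": (595, 842), "letter": (612, 792)}


def page_rect(paper_size="A4"):
    """Return (x0, y0, x1, y1) for a paper size name; "-L" means landscape."""
    name = paper_size.lower()
    landscape = name.endswith("-l")
    if landscape:
        name = name[:-2]
    width, height = PAPER_SIZES[name]
    if landscape:
        width, height = height, width
    return (0, 0, width, height)


def content_rect(paper_size="A4", borders=(36, 36, -36, -36)):
    """Shrink the page rectangle by adding each border value to its corner."""
    return tuple(p + b for p, b in zip(page_rect(paper_size), borders))


print(content_rect())        # portrait A4 with the default borders
print(content_rect("A4-L"))  # landscape A4 with the default borders
```

Under these assumptions the default borders leave a 36-point (half-inch) margin on every side of the page.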

The following document properties are available for assignment (dictionary MarkdownPdf.meta) with the default values indicated.

creationDate: current date
modDate: current date
creator: "PyMuPDF library: https://pypi.org/project/PyMuPDF"
producer: ""
title: ""
author: ""
subject: ""
keywords: ""
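
Since the properties live in a plain dictionary, a misspelled key such as "Title" is easy to introduce. The sketch below is one way to guard against that; set_meta and META_KEYS are hypothetical names, and the only thing assumed about the pdf object is that it exposes a meta dictionary as described above.

```python
# Property names as listed above for MarkdownPdf.meta.
META_KEYS = {
    "creationDate", "modDate", "creator", "producer",
    "title", "author", "subject", "keywords",
}


def set_meta(pdf, **values):
    """Assign document properties, rejecting names that are not listed above."""
    unknown = sorted(set(values) - META_KEYS)
    if unknown:
        raise KeyError(f"unknown document properties: {unknown}")
    pdf.meta.update(values)


# Usage with the real library (left as comments so this block runs without it):
# pdf = MarkdownPdf(toc_level=2)
# set_meta(pdf, title="User Guide", author="Vitaly Bogomolov")
```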

Example

As an example, you can download the pdf file created from this md file. This Python script was used to create the PDF file.

markdown-pdf's Issues

External URL images

https://stackoverflow.com/questions/25753730/pandoc-markdown-to-pdf?rq=3

TypeError: '>' not supported between instances of 'str' and 'int'

Version: markdown_pdf 1.2 (pip install markdown-pdf, with Python 3.12.3 on AMD64)

Error while trying to save a PDF:

Traceback (most recent call last):
  File "app.py", line 64, in <module>
    pdf.save("result.pdf")
  File "/home/user/app/.venv/lib/python3.12/site-packages/markdown_pdf/__init__.py", line 82, in save
    if self.toc_level > 0:
       ^^^^^^^^^^^^^^^^^^
TypeError: '>' not supported between instances of 'str' and 'int'

Steps to reproduce:

app.py:

from markdown_pdf import MarkdownPdf

[OTHER CODE]

pdf = MarkdownPdf(result)
pdf.meta["title"] = "Title"
pdf.meta["author"] = "Author"
pdf.save("result.pdf")

$ python app.py
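
The traceback shows self.toc_level holding a str when it is compared with 0, which is consistent with the markdown text having been passed as the first positional argument in MarkdownPdf(result). Below is a minimal sketch of a guard, assuming the first constructor parameter is toc_level (the Usage section passes it by keyword) and that the text itself belongs in a Section. checked_toc_level is a hypothetical helper, not part of the library, and the 0 to 6 range is an assumption based on heading levels h1 to h6.

```python
def checked_toc_level(value):
    """Return value if it looks like a usable TOC depth, else raise early."""
    # bool is a subclass of int, so exclude it explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        raise TypeError(f"toc_level must be an int, got {type(value).__name__}")
    if not 0 <= value <= 6:
        raise ValueError(f"toc_level must be between 0 and 6, got {value}")
    return value


# Likely intended usage (requires markdown-pdf, so left as comments here):
# pdf = MarkdownPdf(toc_level=checked_toc_level(2))
# pdf.add_section(Section(result))
```

With such a check the mistake surfaces at construction time with a readable message instead of later inside save().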

Active hyperlinks

Hyperlinks look correct in the PDF but are not clickable.

Problems with python3.12

Hi! I recently upgraded to python3.12 and tried to install markdown-pdf the usual way with pip (I'm on MacOS by the way).

Unfortunately the installation is stuck for quite some time on this step:

user@device ~ % pip3 install markdown-pdf
Collecting markdown-pdf
  Using cached markdown_pdf-1.1-py3-none-any.whl.metadata (3.3 kB)
Collecting PyMuPDF==1.23.3 (from markdown-pdf)
  Using cached PyMuPDF-1.23.3.tar.gz (60.5 MB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... \

After 5 minutes or so the installation fails with a huge error log that exceeds my console buffer. The last few lines look like this:

                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        File "/opt/homebrew/lib/python3.12/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 152, in prepare_metadata_for_build_wheel
          whl_basename = backend.build_wheel(metadata_directory, config_settings)
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        File "/private/var/folders/3k/4k2883bs10n32h6f0ybvcmt80000gp/T/pip-install-jrpjs54o/pymupdf_b8db2ebb1db84de899638ee47b15d7ee/pipcl.py", line 580, in build_wheel
          items = self._call_fn_build(config_settings)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        File "/private/var/folders/3k/4k2883bs10n32h6f0ybvcmt80000gp/T/pip-install-jrpjs54o/pymupdf_b8db2ebb1db84de899638ee47b15d7ee/pipcl.py", line 732, in _call_fn_build
          ret = self.fn_build()
                ^^^^^^^^^^^^^^^
        File "/private/var/folders/3k/4k2883bs10n32h6f0ybvcmt80000gp/T/pip-install-jrpjs54o/pymupdf_b8db2ebb1db84de899638ee47b15d7ee/setup.py", line 692, in build
          mupdf_build_dir = build_mupdf_unix( mupdf_local, env_extra, build_type)
                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        File "/private/var/folders/3k/4k2883bs10n32h6f0ybvcmt80000gp/T/pip-install-jrpjs54o/pymupdf_b8db2ebb1db84de899638ee47b15d7ee/setup.py", line 928, in build_mupdf_unix
          subprocess.run( command, shell=True, check=True)
        File "/opt/homebrew/Cellar/python@3.12/3.12.3/Frameworks/Python.framework/Versions/3.12/lib/python3.12/subprocess.py", line 571, in run
          raise CalledProcessError(retcode, process.args,
      subprocess.CalledProcessError: Command 'cd /private/var/folders/3k/4k2883bs10n32h6f0ybvcmt80000gp/T/pip-install-jrpjs54o/pymupdf_b8db2ebb1db84de899638ee47b15d7ee/mupdf-1.23.2-source && XCFLAGS=-DTOFU_CJK_EXT /opt/homebrew/Cellar/python@3.12/3.12.3/bin/python3.12 ./scripts/mupdfwrap.py -d build/PyMuPDF-arm64-shared-tesseract-release -b all && echo /private/var/folders/3k/4k2883bs10n32h6f0ybvcmt80000gp/T/pip-install-jrpjs54o/pymupdf_b8db2ebb1db84de899638ee47b15d7ee/mupdf-1.23.2-source/build/PyMuPDF-arm64-shared-tesseract-release: && ls -l /private/var/folders/3k/4k2883bs10n32h6f0ybvcmt80000gp/T/pip-install-jrpjs54o/pymupdf_b8db2ebb1db84de899638ee47b15d7ee/mupdf-1.23.2-source/build/PyMuPDF-arm64-shared-tesseract-release' returned non-zero exit status 1.
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.

The problem does not occur when using other pip packages. I'm happy to provide more logs if needed :)

Adding a header with only a space causes a segmentation fault

Thanks for this very helpful package!

I just noticed a very strange bug that appears to be caused by PyMuPDF.

The following sample code causes a segmentation fault:

from markdown_pdf import MarkdownPdf, Section
pdf = MarkdownPdf(toc_level=2)
pdf.add_section(Section("# "))

Result with gdb:

Starting program: ./venv/bin/python test.py

This GDB supports auto-downloading debuginfo from the following URLs:
  <https://debuginfod.ubuntu.com>
Enable debuginfod for this session? (y or [n]) y
Debuginfod has been enabled.
To make this setting permanent, add 'set debuginfod enabled on' to .gdbinit.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Downloading separate debug info for ./venv/lib/python3.11/site-packages/fitz/_extra.cpython-311-x86_64-linux-gnu.so
Downloading separate debug info for ./venv/lib/python3.11/site-packages/fitz/libmupdf.so.24.1
Downloading separate debug info for ./venv/lib/python3.11/site-packages/fitz/libmupdfcpp.so.24.1
Downloading separate debug info for ./venv/lib/python3.11/site-packages/fitz/_mupdf.so

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff56e5ad2 in ?? () from ./venv/lib/python3.11/site-packages/fitz/libmupdf.so.24.1

The bug appears to be in the fitz package but I'm unsure where to file it.

Cheers
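
Until the underlying crash is fixed, one possible workaround is to strip headings that have no text before handing the markdown to Section. The sketch below is a hedged illustration: drop_empty_headings is a hypothetical helper, and it works line by line without parsing markdown, so it would also drop a line consisting only of "#" characters inside a fenced code block.

```python
def drop_empty_headings(markdown_text):
    """Remove lines that consist only of 1 to 6 '#' characters and whitespace."""
    kept = []
    for line in markdown_text.splitlines(keepends=True):
        stripped = line.strip()
        if stripped and set(stripped) == {"#"} and len(stripped) <= 6:
            continue  # an ATX heading marker with no heading text
        kept.append(line)
    return "".join(kept)


# Usage with the real library (left as comments so this block runs without it):
# pdf.add_section(Section(drop_empty_headings(text)))
```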

A syntax highlighter for code blocks

https://stackoverflow.com/questions/23825317/how-to-convert-markdown-css-pdf/78079072#78079072

Custom CSS

Hi, sorry if I'm missing the obvious here. How would I feed in custom CSS (I really just want to centre img's)? Cheers for any help.
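
Going by the Usage section above, custom CSS is passed as the user_css argument of add_section. Below is a minimal sketch: add_styled_section and CENTER_CSS are hypothetical names, the h1 rule is the one from the README, and the img rule is an untested assumption, since whether a particular CSS rule is honored depends on PyMuPDF's HTML renderer.

```python
# The h1 rule comes from the README; the img rule is an untested assumption.
CENTER_CSS = "h1 {text-align:center;} img {display:block; margin:0 auto;}"


def add_styled_section(pdf, section, css=CENTER_CSS):
    """Add a section to a MarkdownPdf-like object with custom CSS applied."""
    pdf.add_section(section, user_css=css)


# Usage with the real library (left as comments so this block runs without it):
# pdf = MarkdownPdf(toc_level=2)
# add_styled_section(pdf, Section("![python](img/python.png)\n"))
```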

Support highlighting?

When I render an entire markdown document to PDF, code blocks are not highlighted and the content is incomplete: a portion on the right side is cut off.

from markdown_pdf import MarkdownPdf
from markdown_pdf import Section


def markdown_pdf_write(file_fullname, file_contents):
    pdf = MarkdownPdf(toc_level=2)
    pdf.add_section(Section(file_contents, toc=False))

    pdf.save(file_fullname)

Method .save_html()

Tables support

Markdown:

#header1

|TableHeader1|TableHeader2|
|--|--|
|Text1|Details 1|
|ListCell|<ul><li>FirstBullet</li><li>SecondBullet</li></ul>|

Render as:

Must be:

Images in markdown do not get pulled in

I am using markdown-pdf to combine several existing markdown files with embedded images into a single pdf. Viewed on their own, the markdown files display the images correctly (with either a relative or an absolute path). But when I pass them to the library with pdf.add_section, the markdown is converted to a pdf file without the images.

Code:

from markdown_pdf import MarkdownPdf
from markdown_pdf import Section

# create pdf
pdf = MarkdownPdf(toc_level=2)

# add section
pdf.add_section(Section("# Catchment ID 44193\n"))

# add 2nd section from markdown file
with open('./markdown/Intro.md', 'r', newline='', encoding='utf-8-sig') as f:
    md = f.read()
pdf.add_section(Section(md))

# set pdf properties
pdf.meta["title"] = "LOCA Report"
pdf.meta["author"] = "Tyson Broad"

# save pdf
pdf.save("./src/python/md2pdf_test3.pdf")

Image of the local markdown displaying image correctly:

Problematic markdown text attached.
Intro.md
Output PDF attached.
md2pdf_test3.pdf
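
One possible cause, assuming relative image paths are resolved against the Section root attribute (default ".") rather than against the folder of the markdown file: the script runs from a different directory than ./markdown, so the images are not found. A minimal sketch of deriving the root from the markdown file path follows; image_root_for is a hypothetical helper, and this diagnosis was not verified against the attached files.

```python
from pathlib import Path


def image_root_for(md_path):
    """Return the folder of a markdown file, for use as the Section root."""
    return str(Path(md_path).parent)


# Usage with the real library (left as comments so this block runs without it):
# md_path = "./markdown/Intro.md"
# md = Path(md_path).read_text(encoding="utf-8-sig")
# pdf.add_section(Section(md, root=image_root_for(md_path)))
```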

Hyperlinks are not fully converted

When converting markdown like this to pdf: """ze dne 2. 4. 2019. Dostupné z: <https://uohs.gov.cz/cs/verejne-zakazky/sbirky-rozhodnuti/detail-15999.html>""" the full URL is displayed and styled as a hyperlink, but when it wraps onto a second line only the part on the first line is treated as the link. The part on the second line is left out.