jazzcore / python-pdfkit

Wkhtmltopdf python wrapper to convert html to pdf

License: MIT License

Python 99.14% Shell 0.86%

python-pdfkit's Introduction

Python-PDFKit: HTML to PDF wrapper

Python 3 wrapper for the wkhtmltopdf utility, which converts HTML to PDF using WebKit.

This is an adapted version of the Ruby PDFKit library, so big thanks to them!

Deprecation Warning

This library has been deprecated to match the wkhtmltopdf project status.

Installation

Install python-pdfkit:

$ pip install pdfkit

Install wkhtmltopdf:

Debian/Ubuntu:

$ sudo apt-get install wkhtmltopdf

macOS:

$ brew install homebrew/cask/wkhtmltopdf

Warning! The version in the Debian/Ubuntu repositories has reduced functionality (because it is compiled without the wkhtmltopdf Qt patches): features such as outlines, headers, footers, and TOC are unavailable. To use these options, install the static binary from the wkhtmltopdf site, or use this script (written for CI servers running Ubuntu 18.04 Bionic, but it may work on other Ubuntu/Debian versions).

Windows and other options: check the wkhtmltopdf homepage for binary installers.

Usage

For simple tasks:

import pdfkit

pdfkit.from_url('http://google.com', 'out.pdf')
pdfkit.from_file('test.html', 'out.pdf')
pdfkit.from_string('Hello!', 'out.pdf')

You can pass a list with multiple URLs or files:

pdfkit.from_url(['google.com', 'yandex.ru', 'engadget.com'], 'out.pdf')
pdfkit.from_file(['file1.html', 'file2.html'], 'out.pdf')

You can also pass an open file:

with open('file.html') as f:
    pdfkit.from_file(f, 'out.pdf')

If you wish to process the generated PDF further, you can read it into a variable:

# Without output_path, PDF is returned for assigning to a variable
pdf = pdfkit.from_url('http://google.com')

You can specify any wkhtmltopdf option. The leading '--' in option names can be dropped. If an option takes no value, use None, False or '' as the dict value. For repeatable options (including allow, cookie, custom-header, post, postfile, run-script, replace) you may use a list or a tuple. For options that need multiple values (e.g. --custom-header Authorization secret) you may use a 2-tuple (see the example below).

options = {
    'page-size': 'Letter',
    'margin-top': '0.75in',
    'margin-right': '0.75in',
    'margin-bottom': '0.75in',
    'margin-left': '0.75in',
    'encoding': "UTF-8",
    'custom-header': [
        ('Accept-Encoding', 'gzip')
    ],
    'cookie': [
        ('cookie-empty-value', '""'),
        ('cookie-name1', 'cookie-value1'),
        ('cookie-name2', 'cookie-value2'),
    ],
    'no-outline': None
}

pdfkit.from_url('http://google.com', 'out.pdf', options=options)
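
To make the rules above concrete, here is a minimal sketch of how such an options dict could be flattened into command-line arguments. It is an illustration only, not pdfkit's actual implementation: the function name options_to_args is hypothetical, and the sketch assumes that a top-level list or tuple means "repeat the option" while a nested tuple supplies several values for one occurrence.

```python
def options_to_args(options):
    """Flatten an options dict into a flat list of command-line arguments.

    Illustrative sketch only; pdfkit's real translation may differ.
    """
    args = []
    for name, value in options.items():
        # Accept option names with or without the leading '--'.
        flag = name if name.startswith('--') else '--' + name
        # Assumption: a top-level list or tuple repeats the option.
        values = value if isinstance(value, (list, tuple)) else [value]
        for item in values:
            args.append(flag)
            if isinstance(item, (list, tuple)):
                # A nested tuple supplies several values for one occurrence.
                args.extend(str(part) for part in item)
            elif item is not None and item is not False and item != '':
                # None, False and '' mark a flag that takes no value.
                args.append(str(item))
    return args


example = {
    'page-size': 'Letter',
    'no-outline': None,
    'cookie': [('cookie-name1', 'cookie-value1'), ('cookie-name2', 'cookie-value2')],
}
print(options_to_args(example))
```

Seeing the flattened list can help when comparing what you intended against the command that is eventually run.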

By default, PDFKit runs wkhtmltopdf with the quiet option turned on, since in most cases the output is not needed and can cause excessive memory usage and corrupted results. If you need the wkhtmltopdf output, pass verbose=True to the API calls:

pdfkit.from_url('google.com', 'out.pdf', verbose=True)

Due to wkhtmltopdf command syntax, TOC and cover options must be specified separately. If you need the cover before the TOC, use the cover_first option:

toc = {
    'xsl-style-sheet': 'toc.xsl'
}

cover = 'cover.html'

pdfkit.from_file('file.html', options=options, toc=toc, cover=cover)
pdfkit.from_file('file.html', options=options, toc=toc, cover=cover, cover_first=True)

You can specify external CSS files when converting files or strings using the css option.

Warning: this is a workaround for this bug in wkhtmltopdf. You should try the --user-style-sheet option first.

# Single CSS file
css = 'example.css'
pdfkit.from_file('file.html', options=options, css=css)

# Multiple CSS files
css = ['example.css', 'example2.css']
pdfkit.from_file('file.html', options=options, css=css)

You can also pass any options through meta tags in your HTML:

body = """
    <html>
      <head>
        <meta name="pdfkit-page-size" content="Legal"/>
        <meta name="pdfkit-orientation" content="Landscape"/>
      </head>
      Hello World!
    </html>
    """

pdfkit.from_string(body, 'out.pdf')  # with --page-size=Legal and --orientation=Landscape
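
As a rough illustration of the idea, the sketch below collects prefixed meta tags from an HTML string using only the standard library. It is not pdfkit's own parsing code; the names MetaOptionParser and find_meta_options are hypothetical, and only the pdfkit- prefix comes from the text above.

```python
from html.parser import HTMLParser


class MetaOptionParser(HTMLParser):
    """Collect <meta name="<prefix>option" content="value"> pairs."""

    def __init__(self, prefix='pdfkit-'):
        super().__init__()
        self.prefix = prefix
        self.options = {}

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <meta ... /> are routed here as well.
        if tag != 'meta':
            return
        attributes = dict(attrs)
        name = attributes.get('name') or ''
        if name.startswith(self.prefix):
            # Strip the prefix so 'pdfkit-page-size' becomes 'page-size'.
            self.options[name[len(self.prefix):]] = attributes.get('content')


def find_meta_options(html, prefix='pdfkit-'):
    """Return a dict of options found in prefixed meta tags."""
    parser = MetaOptionParser(prefix)
    parser.feed(html)
    return parser.options


sample = """
<html>
  <head>
    <meta name="pdfkit-page-size" content="Legal"/>
    <meta name="pdfkit-orientation" content="Landscape"/>
    <meta name="viewport" content="width=device-width"/>
  </head>
  Hello World!
</html>
"""
print(find_meta_options(sample))
```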

Configuration

Each API call takes an optional configuration parameter. This should be the object returned by the pdfkit.configuration() API call, which takes the configuration options as initial parameters. The available options are:

wkhtmltopdf - the location of the wkhtmltopdf binary. By default pdfkit will attempt to locate this using which (on UNIX type systems) or where (on Windows).
meta_tag_prefix - the prefix for pdfkit specific meta tags - by default this is pdfkit-

Example - for when wkhtmltopdf is not on $PATH:

config = pdfkit.configuration(wkhtmltopdf='/opt/bin/wkhtmltopdf')
pdfkit.from_string(html_string, output_file, configuration=config)

You can also use the configuration() call to check whether wkhtmltopdf is present in $PATH:

try:
    config = pdfkit.configuration()
    pdfkit.from_string(html_string, output_file)
except OSError:
    # wkhtmltopdf is not present in PATH; handle the missing binary here
    pass
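
If you only want to know whether the binary is discoverable before calling pdfkit at all, a standard-library check is one option. This is a sketch that is independent of pdfkit; the helper name find_wkhtmltopdf is hypothetical.

```python
import shutil


def find_wkhtmltopdf(binary_name='wkhtmltopdf'):
    """Return the absolute path of the binary if it is on PATH, else None."""
    return shutil.which(binary_name)


path = find_wkhtmltopdf()
if path is None:
    print('wkhtmltopdf was not found on PATH')
else:
    print('wkhtmltopdf found at', path)
```

The returned path, when present, could then be passed as the wkhtmltopdf argument of pdfkit.configuration().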

Troubleshooting

Debugging issues with PDF generation

If you are struggling to generate a correct PDF, first check the wkhtmltopdf output for clues. You can get it by passing verbose=True to the API calls:

pdfkit.from_url('http://google.com', 'out.pdf', verbose=True)

If you are getting strange results in the PDF, or some option looks like it is being ignored, try running wkhtmltopdf directly to see whether it produces the same result. You can get the CLI command by creating a pdfkit.PDFKit instance directly and then calling its command() method:

import pdfkit

r = pdfkit.PDFKit('html', 'string', verbose=True)
print(' '.join(r.command()))
# try running wkhtmltopdf to create PDF
output = r.to_pdf()

Common errors:

IOError: 'No wkhtmltopdf executable found'

Make sure that wkhtmltopdf is in your $PATH or set via custom configuration (see the Configuration section above). Running where wkhtmltopdf on Windows or which wkhtmltopdf on Linux should return the actual path to the binary.

IOError: 'Command Failed'

This error means that PDFKit was unable to process the input. You can try running the command from the error message directly to see what caused the failure (on some wkhtmltopdf versions this can be caused by segmentation faults).

python-pdfkit's Issues

Bootstrap not rendering using

I followed the instructions under here:

https://github.com/JazzCore/python-pdfkit/wiki/Using-wkhtmltopdf-without-X-server

On an Ubuntu 14.04 AWS server.

The PDF is created successfully, but most of the Bootstrap CSS, such as the jumbotron and the different row colours in the table (i.e. most things except the Bootstrap font), does not render when converting the HTML to PDF, even though the HTML itself renders correctly.

When installing wkhtmltopdf on my own Mac and running wkhtmltopdf, the Bootstrap CSS renders fine, however.

I need the Linux server to work, as I want to auto-generate pdf files, but I'm really unsure as to how to solve this. Any guidance would be much appreciated!

Having issues getting python-pdfkit to work with newest release?

I'm trying to upgrade my wkhtmltopdf package to 0.12.2.1 and not having any luck.

It did work before the update. The reason I'm trying to update is to fix the splitting of content between PDF pages.

I'm running - Ubuntu 64x - Trusty
I started with getting the newest release of wkhtmltopdf http://sourceforge.net/projects/wkhtmltopdf/files/0.12.2.1/wkhtmltox-0.12.2.1_linux-trusty-amd64.deb/download?use_mirror=hivelocity

Out of the box I got a configuration error, so I hard-coded the path simply to get past this.

# -*- coding: utf-8 -*-
import subprocess
import sys

class Configuration(object):
    def __init__(self, wkhtmltopdf='', meta_tag_prefix='pdfkit-'):
        self.meta_tag_prefix = meta_tag_prefix

        self.wkhtmltopdf = wkhtmltopdf

        if not self.wkhtmltopdf:
            if sys.platform == 'win32':
                self.wkhtmltopdf = subprocess.Popen(
                    ['where', 'wkhtmltopdf'], stdout=subprocess.PIPE).communicate()[0].strip()
            else:
                self.wkhtmltopdf = subprocess.Popen(
                    ['which', 'wkhtmltopdf'], stdout=subprocess.PIPE).communicate()[0].strip()

        try:
            #with open(self.wkhtmltopdf) as f:
            with open("/usr/local/bin/wkhtmltopdf") as f:
                pass
        except IOError:
            raise IOError('No wkhtmltopdf executable found: "%s"\n'
                          'If this file exists please check that this process can '
                          'read it. Otherwise please install wkhtmltopdf - '
                          'https://github.com/JazzCore/python-pdfkit/wiki/Installing-wkhtmltopdf' % self.wkhtmltopdf)

Now I'm getting this error and do not know how to proceed. I tried adding shell=True to the Popen above but that didn't work either.

Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/rq/worker.py", line 543, in perform_job
    rv = job.perform()
  File "/usr/local/lib/python2.7/dist-packages/rq/job.py", line 490, in perform
    self._result = self.func(*self.args, **self.kwargs)
  File "/home/worker-1/Desktop/Dropbox/changeaddress/facts/jobs.py", line 864, in job_sharepdfs
    mymovepdf_link = build_mymovepdf(account_uuid, addresschange_uuid)
  File "/home/worker-1/Desktop/Dropbox/changeaddress/facts/jobs.py", line 608, in build_mymovepdf
    s3file = pdfkit.from_string( output.getvalue() , False )
  File "/usr/local/lib/python2.7/dist-packages/pdfkit/api.py", line 68, in from_string
    return r.to_pdf(output_path)
  File "/usr/local/lib/python2.7/dist-packages/pdfkit/pdfkit.py", line 93, in to_pdf
    stderr=subprocess.PIPE)
  File "/usr/lib/python2.7/subprocess.py", line 710, in __init__
    errread, errwrite)
  File "/usr/lib/python2.7/subprocess.py", line 1327, in _execute_child
    raise child_exception
OSError: [Errno 2] No such file or directory

Has anyone had any luck updating this?

Google code is going away!

you'll need to update your script to install the full featured wkhtmltopdf

How to convert page that is loading data?

Hi,

I have page that is populating grid from external resource via rest api. After converting page with pdfkit the table is empty.

Can pdfkit wait until all data is loaded?

wkhtmltopdf support

Hey,
When I was trying to install pdfkit through pip, it gave this error:

No wkhtmltopdf executable found: "" If this file exists please check that this process can read it.

So can't I just use it via pip alone?

Also, I installed wkhtmltopdf using apt-get, but when using pdfkit in Celery to run multiple asynchronous tasks, it gives this error:
QXcbConnection: Could not connect to display
I understand that no Xvfb display is available on the server. Is there any solution I can apply at the Python level itself?

Using xvfbwrapper instead of custom bash script

I think it would be a lot more painless if there were built-in integration with xvfbwrapper (https://github.com/cgoldberg/xvfbwrapper). If there is no DISPLAY environment variable, then try to start Xvfb with the wrapper.

Building option dictionary

Hello, sorry if this is not the right place; I needed an answer.

How is it possible to build an options dictionary with cookies, or any other option that requires more than one value?

options = {
    'page-size': 'Letter',
    'margin-top': '0.75in',
    'margin-right': '0.75in',
    'margin-bottom': '0.75in',
    'margin-left': '0.75in',
    'encoding': "UTF-8",
    'no-outline': None,
    'cookie': 'usuario=1, another_cookie=another_value',
}

The CSS background-image url only works when its absolute

The CSS background-image url only works when it's an absolute path on my machine; when it's relative, the image doesn't appear in the rendered PDF. I found that this seems to be a limitation of wkhtmltopdf. Is that true?

Creating PDF from URL grabs mobile version

I have changed the page-size in options, but this only changes the size of the actual PDF copy. Has anyone had the same issue?

wkhtmltopdf: cannot connect to X server

If you don't have an X server on your VPS, you will get a "cannot connect to X server" error. I think it's a good idea to catch this exception and show information on how to resolve the issue:
http://stackoverflow.com/questions/9604625/wkhtmltopdf-cannot-connect-to-x-server

Pass custom request headers for URL in pdfkit.

I am trying to convert a URL to pdf using pdfkit in python as follows.

import pdfkit
pdfkit.from_url(url, file_path)

I wanted to know whether there is some way to pass custom request headers with this URL, such as setting X-Proxy-REMOTE-USER to some value.

Regards
Rohit
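
The README's usage notes describe a custom-header option that accepts a list of 2-tuples, which looks like the relevant mechanism here. A minimal sketch of such an options dict follows; the header value is a placeholder, and the commented call reuses the url and file_path names from the question.

```python
# Options dict following the README's custom-header form (list of 2-tuples).
options = {
    'custom-header': [
        ('X-Proxy-REMOTE-USER', 'some-user'),  # placeholder value, an assumption
    ],
}

# Hypothetical call, not executed here:
# pdfkit.from_url(url, file_path, options=options)
```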

v0.4.2 release date

When will v0.4.2 be released on PyPI?

Failed on long list of html input

I'm on Mac OS X 10.10.3 with py2.7.6.

I have a folder of 3,000+ html files and pdfkit failed when I run:
pdfkit.from_file(myHtmls, 'out.pdf')

the traceback:

OSError Traceback (most recent call last)
in ()
----> 1 pdfkit.from_file(htmls, '/Users/kakyo/Desktop/out.pdf')

/Library/Python/2.7/site-packages/pdfkit/api.pyc in from_file(input, output_path, options, toc, cover, css, configuration)
44 configuration=configuration)
45
---> 46 return r.to_pdf(output_path)
47
48

/Library/Python/2.7/site-packages/pdfkit/pdfkit.pyc in to_pdf(self, path)
91
92 result = subprocess.Popen(args, stdin=subprocess.PIPE, stdout=subprocess.PIPE,
---> 93 stderr=subprocess.PIPE)
94
95 # If the source is a string then we will pipe it into wkhtmltopdf.

/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/subprocess.pyc in init(self, args, bufsize, executable, stdin, stdout, stderr, preexec_fn, close_fds, shell, cwd, env, universal_newlines, startupinfo, creationflags)
707 p2cread, p2cwrite,
708 c2pread, c2pwrite,
--> 709 errread, errwrite)
710 except Exception:
711 # Preserve original exception in case os.close raises.

/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/subprocess.pyc in _execute_child(self, args, executable, preexec_fn, close_fds, cwd, env, universal_newlines, startupinfo, creationflags, shell, to_close, p2cread, p2cwrite, c2pread, c2pwrite, errread, errwrite)
1324 raise
1325 child_exception = pickle.loads(data)
-> 1326 raise child_exception
1327
1328

OSError: [Errno 7] Argument list too long

This is probably because pdfkit passes the long list of file paths as command-line arguments to wkhtmltopdf, which apparently exceeds the system's limit on argument-list length.

I wonder whether wkhtmltopdf has a file-list option, so that it could accept a single file containing the list of input HTML file paths.
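
One possible workaround, sketched here under assumptions, is to split the list into smaller batches and convert each batch to its own PDF so that no single command line grows too long. The helper name chunked and the batch size are hypothetical, and merging the resulting PDFs back into one file is outside the scope of this sketch.

```python
def chunked(paths, size):
    """Yield consecutive slices of `paths`, each at most `size` items long."""
    if size < 1:
        raise ValueError('size must be at least 1')
    for start in range(0, len(paths), size):
        yield paths[start:start + size]


# Stand-in file names; a real run would use the actual HTML paths.
html_files = ['page%04d.html' % i for i in range(10)]

for index, batch in enumerate(chunked(html_files, 4)):
    output_name = 'out_%03d.pdf' % index
    # Hypothetical call, not executed here:
    # pdfkit.from_file(batch, output_name)
    print(output_name, len(batch))
```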

https://github.com/JazzCore/python-pdfkit/wiki/Using-wkhtmltopdf-without-X-server

In your setup, shouldn't the bash link match the symlink?

/usr/local/bin/wkhtmltopdf INSTEAD OF /usr/bin/wkhtmltopdf

Does not support multiple option values (--common-header)

wkhtmltopdf has an option, --common-header, that takes two values:

wkhtmltopdf --common-header X-Foo Bar

But python-pdfkit does not seem to handle this correctly: when a string is used, it is passed as a single value to the wkhtmltopdf process:

options = { 'common-header': 'X-Foo Bar', ... }

And passing a list is not supported by pdfkit:

options = { 'common-header': [ 'X-Foo', 'Bar' ], ... }

Since the arguments are passed in random order this also leads to random errors from wkhtmltopdf.

Cover and TOC seem to be added in incorrect order.

In the PDFKit.command method both TOC and cover arguments are added if specified. However when both TOC and cover are specified, TOC is added before cover, resulting in the TOC being rendered on the first page and the cover on the second.

Support wkhtmltoimage

Newer versions of wkhtmltopdf include wkhtmltoimage. Would be nice to support that.

how to config low quality?

here is my code

pdf = pdfkit.from_url(url, False)

The size of the PDF file is too large, and sending it to the user takes too long, so I want to lower the quality of the PDF. How do I configure that?
I used

options = {
      'lowquality': True
  }

but error:

Error: Failed loading page http://true (sometimes it will work just to ignore this error with --load-error-handling ignore)
Exit with code 1 due to network error: HostNotFoundError     ] 55%

How can I do this? Thank you.
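
According to the README's usage notes, an option that takes no value should be given None, False or '' as its dict value. The http://true in the error above suggests that True was passed along as text and treated as a page to load, though that is an inference from the message rather than something confirmed from the source. A sketch of the value-less form:

```python
# Value-less flag: use None, False or '' rather than True (per the README usage notes).
options = {
    'lowquality': '',
}

# Hypothetical call, not executed here; `url` is the page being converted:
# pdf = pdfkit.from_url(url, False, options=options)
```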

why convert <td>ce2d842c</td> to c \n e \n 2 \n d \n 8 \n 4 \n 2\n c\n?

I have code like "ce2d842c" in my HTML file. It looks like "ce2d842c" in my browser, but it looks like this: "c
e
2
d
8
4
2
c" in my PDF file, which was converted from the HTML file by pdfkit.from_file().

Number of pages?

Is there a way to get the number of pages? I can't find anything in the documentation or code that shows how to retrieve the page count for a generated PDF.

winerror 6 invalid handle

When I tried running pdfkit under boost::python in an fcgi application run by apache on windows 7, it was throwing the error in the title. I would guess it might fail the same way under pythonw, but didn't test that.

I managed to fix it by adding pipe handlers to stdin and stderr in configuration.py. I changed it to:
if sys.platform == 'win32':
    proc = subprocess.Popen(['where', 'wkhtmltopdf'], stdout=subprocess.PIPE, stderr=subprocess.PIPE, stdin=subprocess.PIPE)
    self.wkhtmltopdf = proc.stdout.read().strip()
I don't know if separating it into two statements is necessary, but since it started working with that I haven't tried changing it back to .communicate()[0].strip()

The error appears to be related to not launching it from a console and the stdin, stderr, and stdout not having handles to default to.

Python3: 'str' object has no attribute 'decode'

Hi,

I am facing the error below while generating a PDF:

'str' object has no attribute 'decode'
at line 40:

self.wkhtmltopdf = self.configuration.wkhtmltopdf.decode('utf-8')

in pdfkit.py

Here is my code:

config = pdfkit.configuration(wkhtmltopdf='/usr/bin/wkhtmltopdf')
pdfkit.from_url('https://micropyramid.com/blog/how-to-create-pdf-files-in-python-using-pdfkit/', 'micro.pdf', configuration=config)

Any help on this.
Thanks

Wide page cuts off and no overflow.

Hey, I've been using this for a specific project I am working on. The HTML I have renders properly in the browser, but when pdfkit converts it, it cuts off the extra information on the page that falls outside the letter format.

Is there any way to fix this? I double checked. Printing the same page results in the same error in a browser. It looks to be fairly common when printing HTML. The HTML is converted from a wide spreadsheet.

Request: image (wkhtmltoimage) support

PDFKit is great for converting a webpage to PDF. Another program in the same (Debian) package as it is wkhtmltoimage, which can turn a webpage into PNG/SVG/etc.

We do this, too, but since pdfkit doesn't support that yet (right?), we basically have to do most of the subprocess guts ourselves. It works, but it's less fun (and more error-prone) than pdfkit.

It would be great if PDFKit supported PNG/SVG through a similar interface so we could simplify our application, and take advantage of everything PDFKit provides for wkhtmltopdf.

Decoding problem with CSS File

I'm getting an error when trying to use CSS. I'm using wkhtmltopdf 0.10.0 rc2.

/usr/bin/python2.7 report/html_to_pdf.py
Traceback (most recent call last):
  File "report/html_to_pdf.py", line 7, in <module>
    pdfkit.from_file('results_report.html', 'results_report.pdf', css=css)
  File "/usr/local/lib/python2.7/dist-packages/pdfkit/api.py", line 46, in from_file
    return r.to_pdf(output_path)
  File "/usr/local/lib/python2.7/dist-packages/pdfkit/pdfkit.py", line 101, in to_pdf
    input = self.source.to_s().encode('utf-8')
UnicodeDecodeError: 'ascii' codec can't decode byte 0xce in position 2149: ordinal not in range(128)

My code is:

import pdfkit

css = 'report.css'

pdfkit.from_file('results_report.html', 'results_report.pdf', css=css)

File size management?

I used this library for the first time. If I do not provide any CSS, the file size is 27.9 KB, but if I provide a CSS file, the PDF file size rises to 6.3 MB. The CSS file itself is only 30 KB. Why is there such a difference? Is there a way to manage it, or a workaround?

Project still actively maintained?

I noticed that the last commit was nearly 2 years ago and there have been PR's sitting for a little while now.

@JazzCore is this project still actively maintained?

Would it make sense to fork this and continue development elsewhere?

"You will need to run whktmltopdf within a "virutal" X server" error when being executed within crontab job

I am getting an exception when my script is being executed by crontab job:

You will need to run whktmltopdf within a "virutal" X server.
Go to the link above for more information
https://github.com/JazzCore/python-pdfkit/wiki/Using-wkhtmltopdf-without-X-server
Use exit() or Ctrl-D (i.e. EOF) to exit

If I run the script manually, it works and saves the HTML to PDF. So of course I followed the steps described at the URL above, but that doesn't help when the script is executed from crontab (root user).

Debian 7

incompatibility with gevent

Hi,

The background is that this is a Flask app running on Gunicorn with Gevent worker (so gevent monkey patch is applied app wide automatically). And here is the trimmed down version of the error shown:

  File "/home/woozyking/venv/local/lib/python2.7/site-packages/pdfkit/api.py", line 68, in from_string
    return r.to_pdf(output_path)
  File "/home/woozyking/venv/local/lib/python2.7/site-packages/pdfkit/pdfkit.py", line 106, in to_pdf
    stdout, stderr = result.communicate(input=input)
  File "/usr/lib/python2.7/subprocess.py", line 799, in communicate
    return self._communicate(input)
  File "/usr/lib/python2.7/subprocess.py", line 1401, in _communicate
    stdout, stderr = self._communicate_with_poll(input)
  File "/usr/lib/python2.7/subprocess.py", line 1431, in _communicate_with_poll
    poller = select.poll()
AttributeError: 'module' object has no attribute 'poll'

Based on this discussion in gevent's Google group, it seems that gevent doesn't implement the poll interface.

What would be the most rational way to deal with this issue? I'll be glad to submit a PR if I can get a bit more background on why subprocess was used in the first place (is it for content buffer or such? just guessing)

woozyking

Edit: I just realized that subprocess is used to basically call the actual wkhtmltopdf library. But is there any other way we can avoid the use of communicate with poll?

Input path must be byte string, cannot be unicode string

When using pdfkit.from_file(), the input parameter must be a byte string. If it's a unicode string, you get an error similar to this:

IOError: wkhtmltopdf reported an error:
Error: Failed loading page file::/ (sometimes it will work just to ignore this error with --load-error-handling ignore)
Error: Failed loading page file:///E:/ (sometimes it will work just to ignore this error with --load-error-handling ignore)
Error: Failed loading page file:///E:/ (sometimes it will work just to ignore this error with --load-error-handling ignore)
Error: Failed loading page file:///E:/ (sometimes it will work just to ignore this error with --load-error-handling ignore)
Error: Failed loading page file:///E:/ (sometimes it will work just to ignore this error with --load-error-handling ignore)
Error: Failed loading page file:///E:/ (sometimes it will work just to ignore this error with --load-error-handling ignore)
Error: Failed loading page file:///E:/ (sometimes it will work just to ignore this error with --load-error-handling ignore)
Error: Failed loading page file:///E:/ (sometimes it will work just to ignore this error with --load-error-handling ignore)
Error: Failed loading page http://o (sometimes it will work just to ignore this error with --load-error-handling ignore)
Error: Failed loading page http://o (sometimes it will work just to ignore this error with --load-error-handling ignore)
Error: Failed loading page http://o (sometimes it will work just to ignore this error with --load-error-handling ignore)
[…]

So you can see right away, somewhere a URI is split into its characters.

The immediate issue is this line (pdfkit/pdfkit.py:107):

if isinstance(self.source.source, str):

Which should be changed to:

if isinstance(self.source.source, basestring):

This should probably be fixed in a couple more places. I see more occurrences of this, and isinstance(x, str) is almost never what you want.
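
The character-by-character URIs in the error output are what you would expect when a single string is iterated as if it were a list of sources. The sketch below shows the guard in Python 3 terms, where str is the text type; on Python 2 the check would be against basestring as suggested above. The helper name as_source_list is hypothetical.

```python
def as_source_list(source):
    """Wrap a single path in a list; pass real sequences through as lists."""
    if isinstance(source, (str, bytes)):
        # Without this guard, iterating over the string yields characters.
        return [source]
    return list(source)


print(as_source_list('E:/report.html'))
print(as_source_list(['a.html', 'b.html']))
print(list('E:/'))  # what happens when a string is iterated directly
```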

OSError: [Errno 8] Exec format error

Hello,

Please look at the below error trace.

In [1]: import pdfkit

In [2]: pdfkit.from_string('a', 'a.pdf')
---------------------------------------------------------------------------
OSError                                   Traceback (most recent call last)
 in ()
----> 1 pdfkit.from_string('a', 'a.pdf')

/root/.virtualenvs/v1/lib/python2.7/site-packages/pdfkit/api.pyc in from_string(input, output_path, options, toc, cover, css, configuration)
     66                configuration=configuration)
     67 
---> 68     return r.to_pdf(output_path)
     69 
     70 

/root/.virtualenvs/v1/lib/python2.7/site-packages/pdfkit/pdfkit.pyc in to_pdf(self, path)
     91 
     92         result = subprocess.Popen(args, stdin=subprocess.PIPE, stdout=subprocess.PIPE,
---> 93                                   stderr=subprocess.PIPE)
     94 
     95         # If the source is a string then we will pipe it into wkhtmltopdf.

/usr/lib/python2.7/subprocess.pyc in __init__(self, args, bufsize, executable, stdin, stdout, stderr, preexec_fn, close_fds, shell, cwd, env, universal_newlines, startupinfo, creationflags)
    708                                 p2cread, p2cwrite,
    709                                 c2pread, c2pwrite,
--> 710                                 errread, errwrite)
    711         except Exception:
    712             # Preserve original exception in case os.close raises.

/usr/lib/python2.7/subprocess.pyc in _execute_child(self, args, executable, preexec_fn, close_fds, cwd, env, universal_newlines, startupinfo, creationflags, shell, to_close, p2cread, p2cwrite, c2pread, c2pwrite, errread, errwrite)
   1325                         raise
   1326                 child_exception = pickle.loads(data)
-> 1327                 raise child_exception
   1328 
   1329 

OSError: [Errno 8] Exec format error

I'm not sure whether this is an issue or a fault on my side. It was working on my local system (installed long ago) but is not working on a DigitalOcean server.

I followed instructions from this page - http://fedir.github.io/web/blog/2013/09/25/install-wkhtmltopdf-on-ubuntu/

Please let me know if I missed anything.

Header/Footer HTML Path

When using wkhtmltopdf in command line, I can specify a local HTML file to use as the footer (or header):

wkhtmltopdf --footer-html .\footer.html .\content.html output.pdf

When I pass that reference in the options variable:

options = {"footer-html":"footer.html"}

... PDFKit never works for me. Passing a url (mentioned here) does work:

options = {"footer-html":"http://google.com"}

Two questions:

Is it possible to pass a reference to a local HTML file?
If so, where does the file need to be relative to? I have tried relative and absolute paths, neither seem to work.

Thanks!

WKHtmltoPdf "specified in incorrect location" error

Good morning,

I've been working with python-pdfkit to generate pdfs on a Django app and everything was running perfectly on our Windows builds, but we had to upload the project to a Linux VPS (using Ubuntu 12.04). I have been working on installing a correct wkhtmltopdf build that works correctly with everything, and I have been able to use it from the command line, but running it from Python fails.

I have been able to extract the definitive command issued from the Python code and the arguments seem to be ordered "randomly" (actually, on the hashed order from Python), which halts wkhtmltopdf because it cannot work with unordered arguments.

support for header/footer

Is it possible to set a header/footer with HTML code too? If not, can we have it? :)

Python 3 - Error with "decode('utf-8')

Hello,

There's an error in Python 3 with the line (in the init method of PDFKit)

self.wkhtmltopdf = self.configuration.wkhtmltopdf.decode('utf-8')
'str' object has no attribute 'decode'

We had to remove the call of the decode function to make it work.
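
Rather than removing the call outright, one alternative is to decode only when the value is actually bytes, which would cover both the Python 2 style byte string and the Python 3 str case. This is a sketch of that idea, not the fix that was applied in the library; the helper name to_text is hypothetical.

```python
def to_text(value, encoding='utf-8'):
    """Return `value` as text, decoding only if it is a bytes object."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


print(to_text(b'/usr/bin/wkhtmltopdf'))
print(to_text('/usr/bin/wkhtmltopdf'))
```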

set url for from_string parsing

Is it possible to set a URL for from_string parsing? I'm using pdfkit to parse the output from my (custom) CMS system, obviously using from_string, but when trying to include images using relative links (e.g. img src="/images/logo.png") it tries to load from file:///images/logo.png. I'd like to somehow tell it that all relative links are relative to http://server.com/path/to/request, instead of the alternative of hacky regex replacements on all links before passing the string. It seems to do the same when the page has links to JavaScript files. It didn't warn about CSS inclusions, but I'd bet they're not recognized properly either.

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xfe in position 28: invalid start byte

Latest release has introduced a new bug (this same code worked on 0.5.0)

File "/usr/local/lib/python3.5/dist-packages/pdfkit/api.py", line 72, in from_string
   return r.to_pdf(output_path)

 File "/usr/local/lib/python3.5/dist-packages/pdfkit/pdfkit.py", line 146, in to_pdf
   if 'cannot connect to X server' in stderr.decode('utf-8'):

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xfe in position 28: invalid start byte
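
Since the decoded text is only searched for a marker string, one tolerant approach is to decode with errors='replace' so that stray bytes cannot raise. This is a sketch of that approach, not the change made in the library; the helper name stderr_contains is hypothetical.

```python
def stderr_contains(stderr_bytes, needle):
    """Check for `needle` in process output, tolerating invalid UTF-8 bytes."""
    # Undecodable bytes become U+FFFD instead of raising UnicodeDecodeError.
    text = stderr_bytes.decode('utf-8', errors='replace')
    return needle in text


sample = b'\xfe\xff wkhtmltopdf: cannot connect to X server'
print(stderr_contains(sample, 'cannot connect to X server'))
```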

Different behaviour on localhost and server

Hello, I'm having an issue with pdfkit.

I'm using this to create a PDF, attaching to email and sending it with MandrillApp
this is the code I use to generate the PDF:
pdf = pdfkit.from_url(link, False)

That works on localhost, but when I upload it to my server it doesn't work and raises an exception ("from_url() missing 1 required positional argument: 'output_path'",), so I added None as the first argument and got another exception ("invalid literal for int() with base 10: 'None'",), so I put an integer as the first argument:
pdf = pdfkit.from_url(0, link, False)

It works, generates the PDF and sends it by mail, but when I try to open the PDF, the reader says it's not a valid PDF file. I only managed to open the generated PDF with Okular (KDE's default reader).

What can I do to have this working as in localhost on the server?
(attached a screenshot of the code)

file object support

I'd like to add the ability to pass in a file-like object instead of a file name for writing to.

Before I start I have some questions:

Currently, if the output_path parameter of the API calls is False, the stdout of wkhtmltopdf is sent to the stdout of the Python process. Correct?
I'm curious what the use case is for this. It doesn't seem to be a useful feature in a library - is it for debugging?

Keeping this feature complicates support for file-like objects - would you object to me removing it?

subprocess.Popen objects are used for IPC. However, the communicate method is only called if self.source.isString() or (self.source.isFile() and self.css) (refactoring the logic slightly).
Why is this? I am guessing that when the source is a URL we don't want to block while wkhtmltopdf blocks on the network. But what about the case when self.source.isFile() and not self.css?

what about this block:

if '--quiet' not in args:
    while True:
        if result.poll() is not None:
            break
        out = result.stdout.read(1).decode('utf-8')
        if out != '':
            sys.stdout.write(out)
            sys.stdout.flush()

Why have you done things this way? I'm guessing that you're only reading a piece at a time so that output appears on the Python process's stdout as it is produced by wkhtmltopdf.

Also there will only be something produced here when wkhtmltopdf is passed '-' as an output file, right?
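
For illustration, the loop quoted above stops as soon as poll() reports that the process has exited, which may leave bytes unread in the pipe. A minimal sketch of the same byte-at-a-time streaming that reads until EOF instead; `stream_output` is a hypothetical helper, and the child process is a Python stand-in rather than wkhtmltopdf:

```python
import io
import subprocess
import sys


def stream_output(cmd, sink):
    """Run cmd and copy its stdout to sink one byte at a time until EOF."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE)
    while True:
        chunk = proc.stdout.read(1)
        if chunk == b'':
            # EOF: the child closed its stdout, so nothing is left unread
            break
        sink.write(chunk)
        sink.flush()
    proc.stdout.close()
    return proc.wait()


# Stand-in child process; a real call would run wkhtmltopdf instead
child = [sys.executable, '-c', 'print("Loading pages (1/6)")']
sink = io.BytesIO()
exit_code = stream_output(child, sink)
print(sink.getvalue().decode('utf-8'))
```

Because the sink is any object with write() and flush(), the same loop could feed a file-like object rather than the Python process's stdout.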

stdout, stderr = result.communicate(input=input) hangs and takes a long time to load

I have an issue with using PDFKIT whereby for some strange reason generating a simple PDF takes > 17s which honestly shouldn't be the case.

Further diagnosis reveals that the WKHTMLTOPDF process runs at 99% - 100% CPU utilization for the period of time that the PDF is being generated.

Using CPROFILE for Python, I can narrow down to the following that seems to be the root cause (this runs for 17s+):

69 17.196 0.249 17.196 0.249 {built-in method poll}

Some environment details are as follows:

Host OS: VMWare ESXi 5.1
Guest OS: Ubuntu 14.04.3 LTS 64 bit
Guest CPU: 4 x Cores Intel Xeon E5506
Guest RAM: 4GB
WKHTMLTOPDF Version: 0.12.2.1 (with patched qt) 64 bit

I have further narrowed it down to stdout, stderr = result.communicate(input=input) which is present in pdfkit.py which seems to be the line that is causing the slowness.

Does anyone have any idea what this "built-in method poll" item is and why result.communicate may be contributing to slow PDF generation and high CPU usage?

Thanks.
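
A likely reading of the profile is that communicate() simply blocks until the child process exits, and on POSIX that waiting is attributed to the built-in poll method. A minimal sketch under that assumption, using a Python child that sleeps as a stand-in for a slow wkhtmltopdf run:

```python
import subprocess
import sys
import time

# Stand-in for a slow wkhtmltopdf run: a child that takes about 0.3 s
child = [sys.executable, '-c', 'import time; time.sleep(0.3)']

start = time.monotonic()
proc = subprocess.Popen(child, stdin=subprocess.PIPE,
                        stdout=subprocess.PIPE, stderr=subprocess.PIPE)
stdout, stderr = proc.communicate(input=b'')
elapsed = time.monotonic() - start

# communicate() returns only after the child exits,
# so elapsed tracks the child's own runtime
print('communicate() blocked for %.2f s' % elapsed)
```

If that reading holds, the 17 s would be spent inside wkhtmltopdf itself, and timing the same conversion from the command line would show whether the Python wrapper adds anything.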

libpng error: IDAT: CRC error

I'm getting an error generating a PDF from HTML using pdfkit with <img> tags that use interlaced PNGs (that are transparent). The only error output I get is

libpng error: IDAT: CRC error

but nothing more verbose than this. The PDF is generated, but interlaced images are missing, while the non-interlaced (non-transparent) images load. Any ideas?

How to set custom headers and a proxy?

Hi!
I want to set custom HTTP headers and a proxy. I tried the code below, but it fails.

#!/usr/bin/env python
# encoding: utf-8


import pdfkit
import requests

url = 'http://www.baidu.com'
headers = {'User-Agent':'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/48.0.2564.82 Chrome/48.0.2564.82 Safari/537.36'}
proxy ={'http': '113.119.82.69:9000'}
options = {
        'custom-header':headers,
        'proxy': proxy
        }


pdfkit.from_url(url,'out.pdf',options=options)
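
The README describes repeatable options such as custom-header as taking a list, with a 2-tuple for options that need two values. A minimal sketch along those lines, assuming the proxy option takes a single string; `build_options` is a hypothetical helper, and the proxy URL format is an assumption:

```python
def build_options(headers, proxy):
    """Convert a headers dict and a proxy string into a pdfkit options dict."""
    return {
        # Repeatable option: one (name, value) 2-tuple per header
        'custom-header': list(headers.items()),
        # Passed as a single string rather than a dict
        'proxy': proxy,
    }


headers = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64)'}
options = build_options(headers, 'http://113.119.82.69:9000')
print(options)

# With pdfkit and wkhtmltopdf installed, the call would then be:
# pdfkit.from_url('http://www.baidu.com', 'out.pdf', options=options)
```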

Add /test

I have packaged pdfkit as python-pdfkit for Fedora

https://admin.fedoraproject.org/pkgdb/package/python-pdfkit/

Can you please add some test files to the source so we can run tests during the packaging process for the next releases?

Add backwards compatibility to Python 2.5 and Python 2.6

I'm currently working on a project which still uses Python 2.5, and, unfortunately, we will be stuck on it for some time. But we also really needed to change our PDF generator, and we decided to try python-pdfkit because of wkhtmltopdf's performance.

I created a fork of the project and made some (very ugly) workarounds in order to be able to use it on Python 2.5. We're currently using my fork on the project, but it would be nice if we could install the package from PyPI.

I'm not sure if you are interested in that, but I could try to find another (more elegant) way to add backwards compatibility with versions 2.5 and 2.6 of Python.
What do you think?

UnicodeDecodeError: 'ascii' codec can't decode . . .

When I try to use python-pdfkit with certain HTML content that has certain characters in it, it fails with one of these errors if the html content is loaded into memory:

File ". . . /pdfkit.py", line 100, in to_pdf
    input = self.source.to_s().encode('utf-8')
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 64: ordinal not in range(128)

File ". . ./pdfkit.py", line 102, in to_pdf
    input = self.source.source.read().encode('utf-8')
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 64: ordinal not in range(128)

But, python pdfkit works just fine if it is provided with just a filename, and so does wkhtmltopdf.

I think that python pdfkit is doing something unsafe with strings; perhaps it should assume that the input is just bytes.
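
The tracebacks are consistent with calling .encode('utf-8') on a Python 2 byte string, which first decodes it implicitly with the ASCII codec. A minimal sketch of the "treat input as bytes" idea, where `to_utf8_bytes` is a hypothetical helper and not pdfkit's actual code:

```python
def to_utf8_bytes(source):
    """Return source as bytes, encoding only when it is text."""
    if isinstance(source, bytes):
        # Already bytes: pass through untouched, no implicit decode
        return source
    return source.encode('utf-8')


# Byte 0xc2 from the traceback survives unchanged when the input is bytes
raw_html = b'<p>caf\xc2\xa0</p>'
text_html = u'<p>caf\u00a0</p>'
print(to_utf8_bytes(raw_html) == to_utf8_bytes(text_html))
```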

python-pdfkit error demo.zip

Installing xvfb

I'm always getting:
FATAL -> Failed to fork.

when running:
apt-get install xvfb

any thoughts?

Documentation error.

The cookies parameter should be cookie.

I looked over the source and saw that you iterate over the options object and just convert the entries, without changing property names (which is OK), but wkhtmltopdf's parameter is cookie.

Thanks!

Allow numbers to be passed as options

The 'options' dictionary lets you specify things to pass to wkhtmltopdf, but it only allows strings, even for values which are numbers.

It would be great if pdfkit automatically called str() on these values, so we could say things like options={'javascript-delay': 1000} without needing to string-quote our ints.
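
A minimal sketch of the requested behaviour, done on the caller's side before the dict is handed to pdfkit; `stringify_options` is a hypothetical helper and not part of the library:

```python
def stringify_options(options):
    """Return a copy of options with int and float values converted to str."""
    result = {}
    for name, value in options.items():
        # bool is a subclass of int; leave True/False alone so flag
        # options keep their meaning
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            value = str(value)
        result[name] = value
    return result


options = stringify_options({'javascript-delay': 1000, 'zoom': 1.5, 'quiet': ''})
print(options)
```

The result can then be passed as options=options, so numeric values no longer need to be quoted by hand.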

Zoom issue

Zoom is not working in the options dict:

        options = {
            'page-size': 'A4',
            'orientation': 'Portrait',
            'margin-top': '0.75in',
            'margin-right': '0.75in',
            'margin-bottom': '0.75in',
            'margin-left': '0.75in',
            'encoding': 'UTF-8',
            'quiet': '',
            'zoom': '1.5'  # a float does not work either
        }

        pdf = pdfkit.from_string(text, False, options=options)

Any suggestions?

Proxy support?

Hi there!

I can see this repo is alive. Do you plan to add proxy support?