
pikepdf's Introduction

pikepdf

pikepdf is a Python library for reading and writing PDF files.


pikepdf is based on QPDF, a powerful PDF manipulation and repair library.

Python + QPDF = "py" + "qpdf" = "pyqpdf", which looks like a dyslexia test. Say it out loud, and it sounds like "pikepdf".

# Elegant, Pythonic API
import pikepdf

with pikepdf.open('input.pdf') as pdf:
    num_pages = len(pdf.pages)
    del pdf.pages[-1]  # drop the last page
    pdf.save('output.pdf')

To install:

pip install pikepdf

For users who want to build from source, see installation.

pikepdf is documented and actively maintained. Binary wheels are available for all common platforms, both x86-64 and ARM64/Apple Silicon. For information on the latest changes, see the release notes.

Commercial support is available.

Features

This library is similar to pypdf (formerly PyPDF2): it provides low-level access to PDF features and allows editing and content transformation of existing PDFs. Some knowledge of the PDF specification may be helpful. It cannot render a PDF to an image.

Feature | pikepdf | pypdf (PyPDF2)
Editing, manipulation and transformation of existing PDFs | |
Based on an existing, mature PDF library | QPDF |
Implementation | C++ and Python | Python
PDF versions supported | 1.1 to 1.7 | 1.1 to 1.7
Save and load password protected (encrypted) PDFs | ✔ (except public key) | ✔ (except public key)
Creates linearized ("fast web view") PDFs | |
Test suite coverage | |
Creates PDFs that pass PDF validation tests | |
Modifies PDF/A without breaking PDF/A compliance | |
PDF XMP metadata editing | | read-only
Integrates with Jupyter and IPython notebooks for rapid development | |

Testimonials

"I decided to try writing a quick Python program with pikepdf to automate [something] and it 'just worked'." –Jay Berkenbilt, creator of QPDF

"Thanks for creating a great pdf library, I tested out several and this is the one that was best able to work with whatever I threw at it." –@cfcurtis

In Production

  • OCRmyPDF uses pikepdf to graft OCR text layers onto existing PDFs, to examine the contents of input PDFs, and to optimize PDFs.

  • PDF Arranger is a small Python application that provides a graphical user interface to rotate, crop and rearrange PDFs.

  • PDFStitcher is a utility for stitching PDF pages into a single document (i.e. N-up or page imposition).

License

pikepdf is licensed under the Mozilla Public License 2.0 (MPL-2.0), which can be found in the LICENSE file. By using, distributing, or contributing to this project, you agree to the terms and conditions of this license. MPL 2.0 permits you to combine the software with other work, including commercial and closed source software, but asks you to publish source-level modifications you make to pikepdf itself.

Some components of the project may be under other license agreements, as indicated in their SPDX license header or the .dep5/reuse file.

pikepdf's People

Contributors

cherryblossom000, dean0x7d, dependabot[bot], dreua, dulacp, jbarlow83, jberkenbilt, jugmac00, kloczek, knobix, kraptor, lamby, lucas-c, m-holger, mara004, martinthoma, merll, mgorny, micparke, mstarzyk, pabloalexis611, qulogic, sameersismail, sjahu, stephengroat, sylvaincorlay, tkoeppe, willangley, wjakob, yoshitakanaraoka


pikepdf's Issues

docinfo fails in threads

docinfo doesn't work at all in a thread. The following code can demonstrate the problem.

from concurrent.futures import ThreadPoolExecutor
from io import BytesIO
from urllib.request import urlopen
import sys
import threading

import pikepdf

print(f'sys.version = {sys.version.replace(chr(10), "")}')
print(f'pikepdf.__version__ = {pikepdf.__version__}')
print(f'pikepdf.libqpdf_version__ = {pikepdf.__libqpdf_version__}')

pdf_bytes = urlopen('https://www.fda.gov/downloads/drugs/guidances/ucm353925.pdf').read()


def get_docinfo(pdf_bytes):
    thread_name = threading.current_thread().name
    pdf = pikepdf.open(BytesIO(pdf_bytes))
    print(f'{thread_name}: got pdf {pdf}')
    docinfo = pdf.docinfo  # GETS STUCK HERE IN THREAD.
    print(f'{thread_name}: got docinfo')
    docinfo = dict(docinfo)
    return docinfo


local_docinfo = get_docinfo(pdf_bytes)

executor = ThreadPoolExecutor(max_workers=1)
threaded_docinfos = list(executor.map(get_docinfo, [pdf_bytes]))

print('Finished.')

The output is:

sys.version = 3.7.2 (default, Dec 25 2018, 03:50:46) [GCC 7.3.0]
pikepdf.__version__ = 1.0.5
pikepdf.libqpdf_version__ = 8.3.0
MainThread: got pdf <pikepdf.Pdf description='<_io.BytesIO object at 0x7ff01c6b38e0>'>
MainThread: got docinfo
ThreadPoolExecutor-0_0: got pdf <pikepdf.Pdf description='<_io.BytesIO object at 0x7ff01c6b38e0>'>

It then gets stuck.

[Bug / Feature Request] Cannot close files once opened

Summary

When you call pikepdf.open, it opens an OS file handle for the file. You can see the open handle with lsof:

In [1]: import pikepdf

In [2]: p = pikepdf.open('pdf.pdf')

creates the handle:

python  17331 checkroth   12r      REG                8,1   433994 4750331 /home/checkroth/pdf.pdf

The Problem

There is no way to close a file with pikepdf after opening it. This is likely an edge case, but a high-usage system that is not frequently restarted will accumulate open file handles until it eventually fails with an OSError (Too many open files).

A temporary fix

You can manually delete the reference to a pikepdf.Pdf instance. From the above code,

In [3]: del p

closes the file handle as well.

The Feature Request

It would be great to have a Pdf.close() function that handles this safely within pikepdf.

Unable to view pdf with pdf or pdf.pages in jupyter notebook

First, I'm really excited to use this, thank you for your work on it!

I am simply following the first page of the tutorial, and would like to be able to see the pdf. My output is not an image of the pdf, but instead a dead link that says "View PDF". Clicking on this either does nothing (in IE), or opens a new empty tab called untitled with an impossibly long string in the URL (in Chrome).

Even a one page pdf will do this. This happens whether I type 'pdf' to see the whole thing, or pdf.pages[0] to see the first page (or any page).

Any advice for this? Thank you!

Random failure in test_random_dates

This failed on some Fedora 29 systems but not others (and not on Fedora 30). Since this is a Hypothesis run, some randomness is not unexpected:

_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
year = 1, month = 1, day = 1, hour = 0, mins = 0, sec = 0
    @given(
        integers(-9999, 9999),
        integers(0, 99),
        integers(0, 99),
        integers(0, 99),
        integers(0, 99),
        integers(0, 99),
    )
    def test_random_dates(year, month, day, hour, mins, sec):
        date_args = year, month, day, hour, mins, sec
        xmp = '{:04d}-{:02d}-{:02d}T{:02d}:{:02d}:{:02d}'.format(*date_args)
        docinfo = '{:04d}{:02d}{:02d}{:02d}{:02d}{:02d}'.format(*date_args)
    
        try:
            converted = DateConverter.docinfo_from_xmp(xmp)
        except ValueError:
            pass
        else:
>           assert converted == docinfo
E           AssertionError: assert '10101000000' == '00010101000000'
E             - 10101000000
E             + 00010101000000
E             ? +++
tests/test_metadata.py:271: AssertionError
---------------------------------- Hypothesis ----------------------------------
Falsifying example: test_random_dates(year=1, month=1, day=1, hour=0, mins=0, sec=0)
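
The converted value is missing the leading zeros of the year. One plausible cause, offered as an assumption rather than a diagnosis, is a platform `strftime('%Y')` that does not zero-pad years below 1000. The sketch below formats each field explicitly instead; `docinfo_date` is a hypothetical helper, not pikepdf's `DateConverter`.

```python
from datetime import datetime


def docinfo_date(dt: datetime) -> str:
    """Format a datetime as YYYYMMDDHHMMSS with explicit zero padding."""
    # Explicit width specifiers do not depend on the C library's strftime.
    return (
        f"{dt.year:04d}{dt.month:02d}{dt.day:02d}"
        f"{dt.hour:02d}{dt.minute:02d}{dt.second:02d}"
    )


print(docinfo_date(datetime(1, 1, 1, 0, 0, 0)))  # prints 00010101000000
```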

[Feature Request] Page Screenshot

First of all, thank you for this super polished lib.

I want to request a feature. It would be nice to be able to convert a page to an image.

Failing to open qpdf library on FreeBSD

Python 3.6.5 (default, Jun 23 2018, 01:15:38) 
[GCC 4.2.1 Compatible FreeBSD Clang 4.0.0 (tags/RELEASE_400/final 297347)] on freebsd11
Type "help", "copyright", "credits" or "license" for more information.
>>> from pikepdf import Pdf
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/pikepdf/__init__.py", line 12, in <module>
    from . import _qpdf
ImportError: /usr/local/lib/python3.6/site-packages/pikepdf/_qpdf.so: Undefined symbol "_ZN16QPDFObjectHandle12getDictAsMapEv"

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.6/site-packages/pikepdf/__init__.py", line 14, in <module>
    raise ImportError("pikepdf's extension library failed to import")
ImportError: pikepdf's extension library failed to import

....

qpdf --version
qpdf version 8.1.0
Run qpdf --copyright to see copyright and license information.

[feature request] Saving with file encryption

First of all, I want to say that basing a Python library on qpdf is a great idea. Pretty much all PDF libraries in Python lack at least one of the features that qpdf has readily available.

One such feature is saving with PDF encryption. I have yet to find a library that supports more than 128-bit RC4. My workaround so far is to generate a PDF in reportlab or modify it in pypdf and then call qpdf's binary on a tempfile from Python to encrypt it. This approach, however, turns out to be pretty fragile, as qpdf's command-line interface is prone to change with version bumps.

I was wondering whether implementing this feature is planned for pikepdf. It would be great to finally have a python interface to proper aes encryption for PDFs.

Exception in pikepdf findall

I installed the latest version of ocrmypdf (8.0.1-1) and pikepdf (1.0.5-1) in Manjaro (and also tried to use the versions from pypi) and receive the following exception:

 ERROR - Traceback (most recent call last):
  File "/usr/lib/python3.7/site-packages/ruffus/task.py", line 748, in run_pooled_job_without_exceptions
    register_cleanup, touch_files_only)
  File "/usr/lib/python3.7/site-packages/ruffus/task.py", line 566, in job_wrapper_io_files
    ret_val = user_defined_work_func(*params)
  File "/usr/lib/python3.7/site-packages/ocrmypdf/_pipeline.py", line 867, in metadata_fixup
    not_copied = set(meta_original.keys()) - set(meta.keys())
  File "/usr/lib/python3.7/_collections_abc.py", line 720, in __iter__
    yield from self._mapping
  File "/usr/lib/python3.7/site-packages/pikepdf/models/metadata.py", line 496, in __iter__
    for node, attrib, _val, _parents in self._get_elements():
  File "/usr/lib/python3.7/site-packages/pikepdf/models/metadata.py", line 462, in _get_elements
    for rdfdesc in rdf.findall('rdf:Description[@rdf:about=""]', self.NS):
AttributeError: 'NoneType' object has no attribute 'findall'

After a downgrade to version 7.3.1 of ocrmypdf (including pikepdf 0.3.7) everything looks good.

Used ocrmypdf command:
ocrmypdf 1.pdf 2.pdf -l deu

docinfo from incomplete PDF

Currently I can get the docinfo from a complete PDF using pikepdf.open(BytesIO(pdf_bytes)).docinfo.

I have a bytes object with PDF data in it that I have downloaded. I'm naively assuming that the docinfo is toward the top of the file rather than, say, toward the bottom. Downloading fewer bytes is better.

I'd like an interface that takes a bytes object holding a possibly incomplete PDF: one with a valid start but no valid end of file. It should return the full docinfo object if available. If the docinfo is incomplete or missing from the input, it should raise a specific exception instead.

Is this feasible? Thanks very much. Again, this will prevent a needless download of the full file. I am not interested in the data contained in the rest of the PDF.
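
One point that bears on feasibility: a conventional (non-linearized) PDF is located from its tail, where `startxref` gives the offset of the cross-reference data and `%%EOF` ends the file, and the trailer found there is what references the Info dictionary. A truncated download may therefore lack what a parser needs to find docinfo at all. A rough standard-library sketch for checking whether those tail markers are present; `has_pdf_tail` and its 1024-byte window are illustrative choices, not pikepdf API.

```python
def has_pdf_tail(data: bytes, window: int = 1024) -> bool:
    """Return True if the last `window` bytes hold startxref ... %%EOF."""
    tail = data[-window:]
    eof = tail.rfind(b"%%EOF")
    if eof == -1:
        return False  # end-of-file marker missing: likely truncated
    # startxref must appear before the final %%EOF marker.
    return tail.rfind(b"startxref", 0, eof) != -1
```

A `False` result suggests the bytes were cut off before the end of the file; a `True` result does not prove the file is otherwise valid.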

Using form related qpdf flags

Looking at the qpdf manual, I see there are options for --generate-appearances and --flatten-annotations as of v8.3. Is it possible to replicate this or use these flags directly in pikepdf?

Thanks!

object_stream_mode=pikepdf.ObjectStreamMode.disable does not work

I run this to save file:
pdf.save("DECOMPRESSED mytest.pdf", object_stream_mode=pikepdf.ObjectStreamMode.disable, qdf=True)
but still get object streams like this:
4 0 obj
<<
/Type /ObjStm
/Length 4108
/N 9
/First 121

stream
5 0
6 235
7 322
8 409
9 776
10 1588
11 1950
12 2828
13 3196
%% Object stream: object 5, index 0; original object ID: 238
<< .........

There doesn't seem to be a difference between .preserve and .disable

Thanks

Documentation out of date

The documentation is labeled 0.2.2, but 0.3 is out and works differently.

For example: pikepdf.Pdf.open vs pikepdf.open

Copying multiple pages from other PDFs doesn't work

If I generate a new PDF and try to add a page from another PDF multiple times, it fails.
More specifically, it works with 3 pages, but fails (with a segmentation fault) with 4 or more. An example:

import pikepdf


def concatenate(number):
    print('concatenating same page', number, 'times')
    output_pdf = pikepdf.Pdf.new()
    for i in range(number):
        print(i)
        pdf_page = pikepdf.Pdf.open('/tmp/drawing.pdf')
        output_pdf.pages.extend(pdf_page.pages)
        
    output_pdf.save('/tmp/test.pdf')
    print('done')

concatenate(3)  #works
concatenate(4)  #fails

where drawing.pdf is a single-page pdf

MPL-2.0 exhibit B

I am trying to understand what is meant by your removal of Exhibit B from the copies of the MPL that are included in the pikepdf repository.

Firstly, "Exhibit" means that the text is an example, so removing it from the license is not any kind of declaration about your code.

Secondly, you have removed it from the license without removing references to it from the license. So the license does not really make sense anymore. For example, it still says "If You choose to distribute Source Code Form that is Incompatible With Secondary Licenses under the terms of this version of the License, the notice described in Exhibit B of this License must be attached."

Thirdly, I cannot tell what "We exclude Exhibit B, so pikepdf is compatible with secondary licenses." means. Do you mean that you exclude the requirement to attach the notice described in Exhibit B?

Could you say what your intention is here, please? Thanks.

[Feature Request] Overlay/Underlay Support via qpdf 8.4

I am trying to use pikepdf to generate a series of bibliographic overlays for already-published papers. I am able to use pikepdf to open the files and handle other tasks but not to do overlay within pikepdf. Qpdf 8.4 provides built-in support for overlay/underlay of PDFs.

http://qpdf.sourceforge.net/files/qpdf-manual.html#ref.overlay-underlay

But unfortunately I cannot get to it from pikepdf.

My request is that the qpdf overlay/underlay functionality be added to pikepdf or failing that, that the underlying features be added to the next iteration of the internal interface so that they can be used.

Thanks.

repositioning content

One of the features listed in the docs is the ability to reposition content, but I'm not seeing any docs or tests dealing with that specific topic. Would you mind adding some documentation on how to get started with that? Thanks.

"operation for Name object attempted on object of wrong type"

I would like to know which version of qpdf you are using, because I hit this problem when merging some PDFs. It is described in qpdf/qpdf#74, but I think it has been fixed since version 7.0.0.

My code is:

new_pdf = Pdf.new()
for filename in self.filepaths:
    pdf = Pdf.open(filename, password="")
    for page in pdf.pages:
        new_pdf.pages.append(page)

new_pdf.save(outfilepath)

Unable to build on macOS with qpdf 9.0.0

I ran into an unusual problem when attempting to update qpdf to version 9.0.0 for Homebrew. Since ocrmypdf in the Homebrew repo requires qpdf, the test case for ocrmypdf ran. It produced a linking error message stating the compiled qpdf library for pikepdf could not be loaded. (It was attempting to reference version 21 of libqpdf instead of version 26.)

In an attempt to resolve the problem, I bumped the revision of ocrmypdf and updated its Python dependencies, which included pikepdf from v1.6.1 to v1.6.2. Unfortunately, Homebrew's CI ran into a compiler error in ocrmypdf caused by pikepdf. I have duplicated the error when attempting to build pikepdf locally using qpdf 9.0.0 installed from the Homebrew PR listed above.

I'm not sure how to go about attempting to fix the compiling issue with pikepdf. Any help would be appreciated.


My local setup

$ pipdeptree
defusedxml==0.6.0
lxml==4.4.1
pipdeptree==0.13.2
  - pip [required: >=6.0.0, installed: 19.2.3]
pybind11==2.3.0
setuptools==41.2.0
wheel==0.33.6
$
$ qpdf --version
qpdf version 9.0.0
Run qpdf --copyright to see copyright and license information.
$
$ ls -al /usr/local/opt/qpdf/lib/
total 3604
-r--r--r--  1 user admin 1111728 Sep  1 08:36 libqpdf.26.dylib
-r--r--r--  1 user admin 2568392 Sep  1 08:36 libqpdf.a
lrwxr-xr-x  1 user admin      16 Sep  1 08:36 libqpdf.dylib -> libqpdf.26.dylib
drwxr-xr-x  3 user admin     102 Sep  1 08:36 pkgconfig

Error message

clang ... -c src/qpdf/object.cpp ...
  src/qpdf/object.cpp:810:10: error: no matching member function for call to 'def'
          .def("handle_object", &QPDFObjectHandle::ParserCallbacks::handleObject)
          ~^~~

Full build log

$ git clone https://github.com/pikepdf/pikepdf.git
$ cd pikepdf
$ git checkout v1.6.2
$ pip install .
Processing /Users/user/work/pikepdf
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
    Preparing wheel metadata: started
    Preparing wheel metadata: finished with status 'done'
Requirement already satisfied: lxml>=4.0 in /Users/user/work/ve/lib/python3.7/site-packages (from pikepdf==1.6.2) (4.4.1)
Building wheels for collected packages: pikepdf
  Building wheel for pikepdf (PEP 517): started
  Building wheel for pikepdf (PEP 517): finished with status 'error'
  ERROR: Command errored out with exit status 1:
   command: /Users/user/work/ve/bin/python3.7 /Users/user/work/ve/lib/python3.7/site-packages/pip/_vendor/pep517/_in_process.py build_wheel /var/folders/kj/4283qf996816pvl74ls38j2r0000gn/T/tmpd4kqxxdu
       cwd: /private/var/folders/kj/4283qf996816pvl74ls38j2r0000gn/T/pip-req-build-d22uvpm8
  Complete output (55 lines):
  running bdist_wheel
  running build
  running build_py
  creating build
  creating build/lib.macosx-10.12-x86_64-3.7
  creating build/lib.macosx-10.12-x86_64-3.7/pikepdf
  copying src/pikepdf/__init__.py -> build/lib.macosx-10.12-x86_64-3.7/pikepdf
  copying src/pikepdf/_cpphelpers.py -> build/lib.macosx-10.12-x86_64-3.7/pikepdf
  copying src/pikepdf/_methods.py -> build/lib.macosx-10.12-x86_64-3.7/pikepdf
  copying src/pikepdf/_version.py -> build/lib.macosx-10.12-x86_64-3.7/pikepdf
  copying src/pikepdf/codec.py -> build/lib.macosx-10.12-x86_64-3.7/pikepdf
  copying src/pikepdf/objects.py -> build/lib.macosx-10.12-x86_64-3.7/pikepdf
  creating build/lib.macosx-10.12-x86_64-3.7/pikepdf/models
  copying src/pikepdf/models/__init__.py -> build/lib.macosx-10.12-x86_64-3.7/pikepdf/models
  copying src/pikepdf/models/encryption.py -> build/lib.macosx-10.12-x86_64-3.7/pikepdf/models
  copying src/pikepdf/models/image.py -> build/lib.macosx-10.12-x86_64-3.7/pikepdf/models
  copying src/pikepdf/models/matrix.py -> build/lib.macosx-10.12-x86_64-3.7/pikepdf/models
  copying src/pikepdf/models/metadata.py -> build/lib.macosx-10.12-x86_64-3.7/pikepdf/models
  running build_ext
  creating var
  creating var/folders
  creating var/folders/kj
  creating var/folders/kj/4283qf996816pvl74ls38j2r0000gn
  creating var/folders/kj/4283qf996816pvl74ls38j2r0000gn/T
  clang -Wno-unused-result -Wsign-compare -Wunreachable-code -fno-common -dynamic -DNDEBUG -g -fwrapv -O3 -Wall -I/usr/local/Cellar/python/3.7.4/Frameworks/Python.framework/Versions/3.7/include/python3.7m -c /var/folders/kj/4283qf996816pvl74ls38j2r0000gn/T/tmpctz05xfq.cpp -o var/folders/kj/4283qf996816pvl74ls38j2r0000gn/T/tmpctz05xfq.o -std=c++14
  clang -Wno-unused-result -Wsign-compare -Wunreachable-code -fno-common -dynamic -DNDEBUG -g -fwrapv -O3 -Wall -I/usr/local/Cellar/python/3.7.4/Frameworks/Python.framework/Versions/3.7/include/python3.7m -c /var/folders/kj/4283qf996816pvl74ls38j2r0000gn/T/tmpahrs8nxu.cpp -o var/folders/kj/4283qf996816pvl74ls38j2r0000gn/T/tmpahrs8nxu.o -fvisibility=hidden
  building 'pikepdf._qpdf' extension
  creating build/temp.macosx-10.12-x86_64-3.7
  creating build/temp.macosx-10.12-x86_64-3.7/src
  creating build/temp.macosx-10.12-x86_64-3.7/src/qpdf
  clang -Wno-unused-result -Wsign-compare -Wunreachable-code -fno-common -dynamic -DNDEBUG -g -fwrapv -O3 -Wall -I/Users/user/work/ve/bin/../include/site/python3.7 -I/Users/user/work/ve/bin/../include/site/python3.7 -I/usr/local/Cellar/python/3.7.4/Frameworks/Python.framework/Versions/3.7/include/python3.7m -c src/qpdf/annotation.cpp -o build/temp.macosx-10.12-x86_64-3.7/src/qpdf/annotation.o -stdlib=libc++ -mmacosx-version-min=10.7 -DVERSION_INFO="1.6.2" -std=c++14 -fvisibility=hidden
  clang -Wno-unused-result -Wsign-compare -Wunreachable-code -fno-common -dynamic -DNDEBUG -g -fwrapv -O3 -Wall -I/Users/user/work/ve/bin/../include/site/python3.7 -I/Users/user/work/ve/bin/../include/site/python3.7 -I/usr/local/Cellar/python/3.7.4/Frameworks/Python.framework/Versions/3.7/include/python3.7m -c src/qpdf/object.cpp -o build/temp.macosx-10.12-x86_64-3.7/src/qpdf/object.o -stdlib=libc++ -mmacosx-version-min=10.7 -DVERSION_INFO="1.6.2" -std=c++14 -fvisibility=hidden
  src/qpdf/object.cpp:810:10: error: no matching member function for call to 'def'
          .def("handle_object", &QPDFObjectHandle::ParserCallbacks::handleObject)
          ~^~~
  /Users/user/work/ve/bin/../include/site/python3.7/pybind11/pybind11.h:1111:13: note: candidate template ignored: couldn't infer template argument 'Func'
      class_ &def(const char *name_, Func&& f, const Extra&... extra) {
              ^
  /Users/user/work/ve/bin/../include/site/python3.7/pybind11/pybind11.h:1129:13: note: candidate template ignored: could not match 'op_' against 'char const[14]'
      class_ &def(const detail::op_ &op, const Extra&... extra) {
              ^
  /Users/user/work/ve/bin/../include/site/python3.7/pybind11/pybind11.h:1141:13: note: candidate template ignored: could not match 'constructor' against 'char const[14]'
      class_ &def(const detail::initimpl::constructor &init, const Extra&... extra) {
              ^
  /Users/user/work/ve/bin/../include/site/python3.7/pybind11/pybind11.h:1147:13: note: candidate template ignored: could not match 'alias_constructor' against 'char const[14]'
      class_ &def(const detail::initimpl::alias_constructor &init, const Extra&... extra) {
              ^
  /Users/user/work/ve/bin/../include/site/python3.7/pybind11/pybind11.h:1153:13: note: candidate template ignored: could not match 'factory' against 'char const[14]'
      class_ &def(detail::initimpl::factory &&init, const Extra&... extra) {
              ^
  /Users/user/work/ve/bin/../include/site/python3.7/pybind11/pybind11.h:1159:13: note: candidate template ignored: could not match 'pickle_factory' against 'char const[14]'
      class_ &def(detail::initimpl::pickle_factory &&pf, const Extra &...extra) {
              ^
  1 error generated.
  error: command 'clang' failed with exit status 1
  ----------------------------------------
  ERROR: Failed building wheel for pikepdf
  Running setup.py clean for pikepdf
Failed to build pikepdf
ERROR: Could not build wheels for pikepdf which use PEP 517 and cannot be installed directly

Segmentation fault when using with PyTorch

I installed pikepdf using pip and pytorch-cpu using conda. Importing both the libraries results in a segmentation fault.

>>> import torch
>>> import pikepdf
Segmentation fault (core dumped)

The order doesn't matter:

>>> import pikepdf
>>> import torch
Segmentation fault (core dumped)

Using Python 3.6.8 on Ubuntu 18.04

Raised an issue on PyTorch GitHub too at pytorch/pytorch#26092

Rarely crash on some PDF

Hi,

Great library. I just want to report some rare crashes (>20 out of 700k PDFs); not a big deal. I don't know whether it's a bug or whether exceptions are expected for extremely damaged files.

    pdf = pikepdf.open(input_file)
  File "C:\Users.......\lib\site-packages\pikepdf\__init__.py", line 41, in open
    return Pdf.open(*args, **kwargs)
pikepdf._qpdf.PdfError: C:/.......\my_pdf.pdf: unable to find trailer dictionary while recovering damaged file

If it helps, I can send the PDFs by email. These are corrupted PDFs that were generated before 2000 :)

issues installing with pip

Hi all,

I'm trying to install pikepdf on a Windows 7 VM with Python 3.6.7. I am getting an error when I run "pip install pikepdf" where it seems to be looking for a file that doesn't exist, qpdf/Constants.h.

Any ideas?

Thanks!!

Collecting pikepdf
  Using cached https://files.pythonhosted.org/packages/ea/8d/01f66685772025ac82c
63c759b032d5df2a0714a7f59292c393a084cb42e/pikepdf-0.3.7.tar.gz
Installing collected packages: pikepdf
  Running setup.py install for pikepdf ... error
    Complete output from command c:\users\w7_py3\appdata\local\programs\python\p
ython36-32\python.exe -u -c "import setuptools, tokenize;__file__='C:\\Users\\w7
_py3\\AppData\\Local\\Temp\\pip-install-rrjwxkig\\pikepdf\\setup.py';f=getattr(t
okenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();e
xec(compile(code, __file__, 'exec'))" install --record C:\Users\w7_py3\AppData\L
ocal\Temp\pip-record-a8_2vko2\install-record.txt --single-version-externally-man
aged --compile:
    running install
    running build
    running build_py
    creating build
    creating build\lib.win32-3.6
    creating build\lib.win32-3.6\pikepdf
    copying src\pikepdf\objects.py -> build\lib.win32-3.6\pikepdf
    copying src\pikepdf\_cpphelpers.py -> build\lib.win32-3.6\pikepdf
    copying src\pikepdf\_methods.py -> build\lib.win32-3.6\pikepdf
    copying src\pikepdf\__init__.py -> build\lib.win32-3.6\pikepdf
    creating build\lib.win32-3.6\pikepdf\models
    copying src\pikepdf\models\image.py -> build\lib.win32-3.6\pikepdf\models
    copying src\pikepdf\models\matrix.py -> build\lib.win32-3.6\pikepdf\models
    copying src\pikepdf\models\__init__.py -> build\lib.win32-3.6\pikepdf\models

    running build_ext
    building 'pikepdf._qpdf' extension
    creating build\temp.win32-3.6
    creating build\temp.win32-3.6\Release
    creating build\temp.win32-3.6\Release\src
    creating build\temp.win32-3.6\Release\src\qpdf
    C:\Program Files (x86)\Microsoft Visual Studio\2017\BuildTools\VC\Tools\MSVC
\14.16.27023\bin\HostX86\x86\cl.exe /c /nologo /Ox /W3 /GL /DNDEBUG /MD -Ic:\use
rs\w7_py3\appdata\local\programs\python\python36-32\Include -IC:\Users\w7_py3\Ap
pData\Roaming\Python\Python36\Include -Ic:\users\w7_py3\appdata\local\programs\p
ython\python36-32\include -Ic:\users\w7_py3\appdata\local\programs\python\python
36-32\include "-IC:\Program Files (x86)\Microsoft Visual Studio\2017\BuildTools\
VC\Tools\MSVC\14.16.27023\include" "-IC:\Program Files (x86)\Windows Kits\NETFXS
DK\4.6.1\include\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.1713
4.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17134.0\shared"
 "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17134.0\um" "-IC:\Progra
m Files (x86)\Windows Kits\10\include\10.0.17134.0\winrt" "-IC:\Program Files (x
86)\Windows Kits\10\include\10.0.17134.0\cppwinrt" /EHsc /Tpsrc/qpdf\object.cpp
/Fobuild\temp.win32-3.6\Release\src/qpdf\object.obj /EHsc /DVERSION_INFO=\"0.3.7
\"
    object.cpp
    src/qpdf\object.cpp(14): fatal error C1083: Cannot open include file: 'qpdf/
Constants.h': No such file or directory
    error: command 'C:\\Program Files (x86)\\Microsoft Visual Studio\\2017\\Buil
dTools\\VC\\Tools\\MSVC\\14.16.27023\\bin\\HostX86\\x86\\cl.exe' failed with exi
t status 2

    ----------------------------------------
Command "c:\users\w7_py3\appdata\local\programs\python\python36-32\python.exe -u
 -c "import setuptools, tokenize;__file__='C:\\Users\\w7_py3\\AppData\\Local\\Te
mp\\pip-install-rrjwxkig\\pikepdf\\setup.py';f=getattr(tokenize, 'open', open)(_
_file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file
__, 'exec'))" install --record C:\Users\w7_py3\AppData\Local\Temp\pip-record-a8_
2vko2\install-record.txt --single-version-externally-managed --compile" failed w
ith error code 1 in C:\Users\w7_py3\AppData\Local\Temp\pip-install-rrjwxkig\pike
pdf\

[Feature Request] - document object parsing

Hi
Parsing a PDF in QDF mode is relatively easy because it's all text e.g. it's easy to identify that this is a data table cell:

112 0 obj
<< /K 72 0 R /P 60 0 R /S /TD
/A 147 0 R>>
endobj

and it's straightforward to change it to a /TH or to find the /A object to add a /Headers attribute.
My question is: is it possible to do this using pikepdf? Could you please provide a code example in the documentation?
Thank you very much!

E lxml.etree.C14NError: Relative namespace UR is invalid here : adobe

I'm seeing some tests fail with this error on Debian unstable:

autopkgtest [11:12:45]: test test-suite: [-----------------------
============================= test session starts ==============================
platform linux -- Python 3.7.2rc1, pytest-3.10.1, py-1.7.0, pluggy-0.8.0
hypothesis profile 'default' -> database=DirectoryBasedExampleDatabase('/tmp/autopkgtest.l0uCgh/build.Tlo/src/.hypothesis/examples')
rootdir: /tmp/autopkgtest.l0uCgh/build.Tlo/src, inifile: setup.cfg
plugins: xdist-1.24.1, timeout-1.3.3, helpers-namespace-2017.11.11, forked-0.2, hypothesis-3.71.11
collected 151 items

tests/test_dictionary.py ..                                              [  1%]
tests/test_formxobject.py .                                              [  1%]
tests/test_image_access.py .................x.s                          [ 15%]
tests/test_ipython.py ..                                                 [ 16%]
tests/test_metadata.py ....FFFFFFFF.FFFFFFFFFFFF..FF.                    [ 36%]
tests/test_object.py ....................................                [ 60%]
tests/test_pages.py .................                                    [ 71%]
tests/test_parsers.py ..s......                                          [ 77%]
tests/test_pdf.py ...................                                    [ 90%]
tests/test_pdfa.py ss                                                    [ 91%]
tests/test_private_pdfs.py s                                             [ 92%]
tests/test_refcount.py .....                                             [ 95%]
tests/test_sanity.py .......                                             [100%]

=================================== FAILURES ===================================
__________________________ test_add_new_xmp_and_mark ___________________________

trivial = <pikepdf.Pdf description='/tmp/autopkgtest.l0uCgh/build.Tlo/src/tests/resources/pal-1bit-trivial.pdf'>

    def test_add_new_xmp_and_mark(trivial):
        with trivial.open_metadata(
            set_pikepdf_as_editor=False, update_docinfo=False
        ) as xmp_view:
>           assert not xmp_view

tests/test_metadata.py:110:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
/usr/lib/python3/dist-packages/pikepdf/models/metadata.py:302: in __exit__
    self._apply_changes()
/usr/lib/python3/dist-packages/pikepdf/models/metadata.py:353: in _apply_changes
    xml = self._get_xml_bytes()
/usr/lib/python3/dist-packages/pikepdf/models/metadata.py:338: in _get_xml_bytes
    self._xmp.write_c14n(data)
src/lxml/etree.pyx:2374: in lxml.etree._ElementTree.write_c14n
    ???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

>   ???
E   lxml.etree.C14NError: Relative namespace UR is invalid here : adobe

src/lxml/serializer.pxi:856: C14NError
_____________________________ test_update_docinfo ______________________________

vera = <pikepdf.Pdf description='/tmp/autopkgtest.l0uCgh/build.Tlo/src/tests/resources/veraPDF test suite 6-2-10-t02-pass-a.pdf'>

    def test_update_docinfo(vera):
        with vera.open_metadata(set_pikepdf_as_editor=False, update_docinfo=True) as xmp:
>           pass

tests/test_metadata.py:127:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
/usr/lib/python3/dist-packages/pikepdf/models/metadata.py:302: in __exit__
    self._apply_changes()
/usr/lib/python3/dist-packages/pikepdf/models/metadata.py:353: in _apply_changes
    xml = self._get_xml_bytes()
/usr/lib/python3/dist-packages/pikepdf/models/metadata.py:338: in _get_xml_bytes
    self._xmp.write_c14n(data)
src/lxml/etree.pyx:2374: in lxml.etree._ElementTree.write_c14n
    ???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

>   ???
E   lxml.etree.C14NError: Relative namespace UR is invalid here : adobe

src/lxml/serializer.pxi:856: C14NError
__________________________ test_roundtrip[filename0] ___________________________

filename = PosixPath('/tmp/autopkgtest.l0uCgh/build.Tlo/src/tests/resources/pal-1bit-rgb.pdf')

    @pytest.mark.parametrize('filename', list((Path(__file__).parent / 'resources').glob('*.pdf')))
    def test_roundtrip(filename):
        try:
            pdf = Pdf.open(filename)
        except PasswordError:
            return
        with pdf.open_metadata() as xmp:
            for k in xmp.keys():
                if not 'Date' in k:
>                   xmp[k] = 'A'

tests/test_metadata.py:148:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
/usr/lib/python3/dist-packages/pikepdf/models/metadata.py:302: in __exit__
    self._apply_changes()
/usr/lib/python3/dist-packages/pikepdf/models/metadata.py:353: in _apply_changes
    xml = self._get_xml_bytes()
/usr/lib/python3/dist-packages/pikepdf/models/metadata.py:338: in _get_xml_bytes
    self._xmp.write_c14n(data)
src/lxml/etree.pyx:2374: in lxml.etree._ElementTree.write_c14n
    ???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

>   ???
E   lxml.etree.C14NError: Relative namespace UR is invalid here : adobe

src/lxml/serializer.pxi:856: C14NError
__________________________ test_roundtrip[filename1] ___________________________

filename = PosixPath('/tmp/autopkgtest.l0uCgh/build.Tlo/src/tests/resources/cmyk-jpeg.pdf')

    @pytest.mark.parametrize('filename', list((Path(__file__).parent / 'resources').glob('*.pdf')))
    def test_roundtrip(filename):
        try:
            pdf = Pdf.open(filename)
        except PasswordError:
            return
        with pdf.open_metadata() as xmp:
            for k in xmp.keys():
                if not 'Date' in k:
>                   xmp[k] = 'A'

tests/test_metadata.py:148:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
/usr/lib/python3/dist-packages/pikepdf/models/metadata.py:302: in __exit__
    self._apply_changes()
/usr/lib/python3/dist-packages/pikepdf/models/metadata.py:353: in _apply_changes
    xml = self._get_xml_bytes()
/usr/lib/python3/dist-packages/pikepdf/models/metadata.py:338: in _get_xml_bytes
    self._xmp.write_c14n(data)
src/lxml/etree.pyx:2374: in lxml.etree._ElementTree.write_c14n
    ???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

>   ???
E   lxml.etree.C14NError: Relative namespace UR is invalid here : adobe

src/lxml/serializer.pxi:856: C14NError
__________________________ test_roundtrip[filename2] ___________________________

filename = PosixPath('/tmp/autopkgtest.l0uCgh/build.Tlo/src/tests/resources/pal.pdf')

    @pytest.mark.parametrize('filename', list((Path(__file__).parent / 'resources').glob('*.pdf')))
    def test_roundtrip(filename):
        try:
            pdf = Pdf.open(filename)
        except PasswordError:
            return
        with pdf.open_metadata() as xmp:
            for k in xmp.keys():
                if not 'Date' in k:
>                   xmp[k] = 'A'

tests/test_metadata.py:148:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
/usr/lib/python3/dist-packages/pikepdf/models/metadata.py:302: in __exit__
    self._apply_changes()
/usr/lib/python3/dist-packages/pikepdf/models/metadata.py:353: in _apply_changes
    xml = self._get_xml_bytes()
/usr/lib/python3/dist-packages/pikepdf/models/metadata.py:338: in _get_xml_bytes
    self._xmp.write_c14n(data)
src/lxml/etree.pyx:2374: in lxml.etree._ElementTree.write_c14n
    ???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

>   ???
E   lxml.etree.C14NError: Relative namespace UR is invalid here : adobe

src/lxml/serializer.pxi:856: C14NError
__________________________ test_roundtrip[filename3] ___________________________

filename = PosixPath('/tmp/autopkgtest.l0uCgh/build.Tlo/src/tests/resources/sandwich.pdf')

    @pytest.mark.parametrize('filename', list((Path(__file__).parent / 'resources').glob('*.pdf')))
    def test_roundtrip(filename):
        try:
            pdf = Pdf.open(filename)
        except PasswordError:
            return
        with pdf.open_metadata() as xmp:
            for k in xmp.keys():
                if not 'Date' in k:
>                   xmp[k] = 'A'

tests/test_metadata.py:148:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
/usr/lib/python3/dist-packages/pikepdf/models/metadata.py:302: in __exit__
    self._apply_changes()
/usr/lib/python3/dist-packages/pikepdf/models/metadata.py:353: in _apply_changes
    xml = self._get_xml_bytes()
/usr/lib/python3/dist-packages/pikepdf/models/metadata.py:338: in _get_xml_bytes
    self._xmp.write_c14n(data)
src/lxml/etree.pyx:2374: in lxml.etree._ElementTree.write_c14n
    ???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

>   ???
E   lxml.etree.C14NError: Relative namespace UR is invalid here : adobe

src/lxml/serializer.pxi:856: C14NError
__________________________ test_roundtrip[filename4] ___________________________

filename = PosixPath('/tmp/autopkgtest.l0uCgh/build.Tlo/src/tests/resources/pike-jp2.pdf')

    @pytest.mark.parametrize('filename', list((Path(__file__).parent / 'resources').glob('*.pdf')))
    def test_roundtrip(filename):
        try:
            pdf = Pdf.open(filename)
        except PasswordError:
            return
        with pdf.open_metadata() as xmp:
            for k in xmp.keys():
                if not 'Date' in k:
>                   xmp[k] = 'A'

tests/test_metadata.py:148:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
/usr/lib/python3/dist-packages/pikepdf/models/metadata.py:302: in __exit__
    self._apply_changes()
/usr/lib/python3/dist-packages/pikepdf/models/metadata.py:353: in _apply_changes
    xml = self._get_xml_bytes()
/usr/lib/python3/dist-packages/pikepdf/models/metadata.py:338: in _get_xml_bytes
    self._xmp.write_c14n(data)
src/lxml/etree.pyx:2374: in lxml.etree._ElementTree.write_c14n
    ???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

>   ???
E   lxml.etree.C14NError: Relative namespace UR is invalid here : adobe

src/lxml/serializer.pxi:856: C14NError
__________________________ test_roundtrip[filename5] ___________________________

filename = PosixPath('/tmp/autopkgtest.l0uCgh/build.Tlo/src/tests/resources/congress.pdf')

    @pytest.mark.parametrize('filename', list((Path(__file__).parent / 'resources').glob('*.pdf')))
    def test_roundtrip(filename):
        try:
            pdf = Pdf.open(filename)
        except PasswordError:
            return
        with pdf.open_metadata() as xmp:
            for k in xmp.keys():
                if not 'Date' in k:
>                   xmp[k] = 'A'

tests/test_metadata.py:148:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
/usr/lib/python3/dist-packages/pikepdf/models/metadata.py:302: in __exit__
    self._apply_changes()
/usr/lib/python3/dist-packages/pikepdf/models/metadata.py:353: in _apply_changes
    xml = self._get_xml_bytes()
/usr/lib/python3/dist-packages/pikepdf/models/metadata.py:338: in _get_xml_bytes
    self._xmp.write_c14n(data)
src/lxml/etree.pyx:2374: in lxml.etree._ElementTree.write_c14n
    ???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

>   ???
E   lxml.etree.C14NError: Relative namespace UR is invalid here : adobe

src/lxml/serializer.pxi:856: C14NError
__________________________ test_roundtrip[filename7] ___________________________

filename = PosixPath('/tmp/autopkgtest.l0uCgh/build.Tlo/src/tests/resources/fourpages.pdf')

    @pytest.mark.parametrize('filename', list((Path(__file__).parent / 'resources').glob('*.pdf')))
    def test_roundtrip(filename):
        try:
            pdf = Pdf.open(filename)
        except PasswordError:
            return
        with pdf.open_metadata() as xmp:
            for k in xmp.keys():
                if not 'Date' in k:
>                   xmp[k] = 'A'

tests/test_metadata.py:148:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
/usr/lib/python3/dist-packages/pikepdf/models/metadata.py:302: in __exit__
    self._apply_changes()
/usr/lib/python3/dist-packages/pikepdf/models/metadata.py:353: in _apply_changes
    xml = self._get_xml_bytes()
/usr/lib/python3/dist-packages/pikepdf/models/metadata.py:338: in _get_xml_bytes
    self._xmp.write_c14n(data)
src/lxml/etree.pyx:2374: in lxml.etree._ElementTree.write_c14n
    ???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

>   ???
E   lxml.etree.C14NError: Relative namespace UR is invalid here : adobe

src/lxml/serializer.pxi:856: C14NError
__________________________ test_roundtrip[filename8] ___________________________

filename = PosixPath('/tmp/autopkgtest.l0uCgh/build.Tlo/src/tests/resources/pal-1bit-trivial.pdf')

    @pytest.mark.parametrize('filename', list((Path(__file__).parent / 'resources').glob('*.pdf')))
    def test_roundtrip(filename):
        try:
            pdf = Pdf.open(filename)
        except PasswordError:
            return
        with pdf.open_metadata() as xmp:
            for k in xmp.keys():
                if not 'Date' in k:
>                   xmp[k] = 'A'

tests/test_metadata.py:148:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
/usr/lib/python3/dist-packages/pikepdf/models/metadata.py:302: in __exit__
    self._apply_changes()
/usr/lib/python3/dist-packages/pikepdf/models/metadata.py:353: in _apply_changes
    xml = self._get_xml_bytes()
/usr/lib/python3/dist-packages/pikepdf/models/metadata.py:338: in _get_xml_bytes
    self._xmp.write_c14n(data)
src/lxml/etree.pyx:2374: in lxml.etree._ElementTree.write_c14n
    ???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

>   ???
E   lxml.etree.C14NError: Relative namespace UR is invalid here : adobe

src/lxml/serializer.pxi:856: C14NError
__________________________ test_roundtrip[filename9] ___________________________

filename = PosixPath('/tmp/autopkgtest.l0uCgh/build.Tlo/src/tests/resources/veraPDF test suite 6-2-10-t02-pass-a.pdf')

    @pytest.mark.parametrize('filename', list((Path(__file__).parent / 'resources').glob('*.pdf')))
    def test_roundtrip(filename):
        try:
            pdf = Pdf.open(filename)
        except PasswordError:
            return
        with pdf.open_metadata() as xmp:
            for k in xmp.keys():
                if not 'Date' in k:
>                   xmp[k] = 'A'

tests/test_metadata.py:148:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
/usr/lib/python3/dist-packages/pikepdf/models/metadata.py:302: in __exit__
    self._apply_changes()
/usr/lib/python3/dist-packages/pikepdf/models/metadata.py:353: in _apply_changes
    xml = self._get_xml_bytes()
/usr/lib/python3/dist-packages/pikepdf/models/metadata.py:338: in _get_xml_bytes
    self._xmp.write_c14n(data)
src/lxml/etree.pyx:2374: in lxml.etree._ElementTree.write_c14n
    ???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

>   ???
E   lxml.etree.C14NError: Relative namespace UR is invalid here : adobe

src/lxml/serializer.pxi:856: C14NError
__________________________ test_roundtrip[filename10] __________________________

filename = PosixPath('/tmp/autopkgtest.l0uCgh/build.Tlo/src/tests/resources/image-mono-inline.pdf')

    @pytest.mark.parametrize('filename', list((Path(__file__).parent / 'resources').glob('*.pdf')))
    def test_roundtrip(filename):
        try:
            pdf = Pdf.open(filename)
        except PasswordError:
            return
        with pdf.open_metadata() as xmp:
            for k in xmp.keys():
                if not 'Date' in k:
>                   xmp[k] = 'A'

tests/test_metadata.py:148:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
/usr/lib/python3/dist-packages/pikepdf/models/metadata.py:302: in __exit__
    self._apply_changes()
/usr/lib/python3/dist-packages/pikepdf/models/metadata.py:353: in _apply_changes
    xml = self._get_xml_bytes()
/usr/lib/python3/dist-packages/pikepdf/models/metadata.py:338: in _get_xml_bytes
    self._xmp.write_c14n(data)
src/lxml/etree.pyx:2374: in lxml.etree._ElementTree.write_c14n
    ???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

>   ???
E   lxml.etree.C14NError: Relative namespace UR is invalid here : adobe

src/lxml/serializer.pxi:856: C14NError
__________________________ test_roundtrip[filename11] __________________________

filename = PosixPath('/tmp/autopkgtest.l0uCgh/build.Tlo/src/tests/resources/formxobject.pdf')

    @pytest.mark.parametrize('filename', list((Path(__file__).parent / 'resources').glob('*.pdf')))
    def test_roundtrip(filename):
        try:
            pdf = Pdf.open(filename)
        except PasswordError:
            return
        with pdf.open_metadata() as xmp:
            for k in xmp.keys():
                if not 'Date' in k:
>                   xmp[k] = 'A'

tests/test_metadata.py:148:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
/usr/lib/python3/dist-packages/pikepdf/models/metadata.py:302: in __exit__
    self._apply_changes()
/usr/lib/python3/dist-packages/pikepdf/models/metadata.py:353: in _apply_changes
    xml = self._get_xml_bytes()
/usr/lib/python3/dist-packages/pikepdf/models/metadata.py:338: in _get_xml_bytes
    self._xmp.write_c14n(data)
src/lxml/etree.pyx:2374: in lxml.etree._ElementTree.write_c14n
    ???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

>   ???
E   lxml.etree.C14NError: Relative namespace UR is invalid here : adobe

src/lxml/serializer.pxi:856: C14NError
__________________________ test_roundtrip[filename12] __________________________

filename = PosixPath('/tmp/autopkgtest.l0uCgh/build.Tlo/src/tests/resources/graph.pdf')

    @pytest.mark.parametrize('filename', list((Path(__file__).parent / 'resources').glob('*.pdf')))
    def test_roundtrip(filename):
        try:
            pdf = Pdf.open(filename)
        except PasswordError:
            return
        with pdf.open_metadata() as xmp:
            for k in xmp.keys():
                if not 'Date' in k:
>                   xmp[k] = 'A'

tests/test_metadata.py:148:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
/usr/lib/python3/dist-packages/pikepdf/models/metadata.py:302: in __exit__
    self._apply_changes()
/usr/lib/python3/dist-packages/pikepdf/models/metadata.py:353: in _apply_changes
    xml = self._get_xml_bytes()
/usr/lib/python3/dist-packages/pikepdf/models/metadata.py:338: in _get_xml_bytes
    self._xmp.write_c14n(data)
src/lxml/etree.pyx:2374: in lxml.etree._ElementTree.write_c14n
    ???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

>   ???
E   lxml.etree.C14NError: Relative namespace UR is invalid here : adobe

src/lxml/serializer.pxi:856: C14NError
__________________________ test_roundtrip[filename13] __________________________

filename = PosixPath('/tmp/autopkgtest.l0uCgh/build.Tlo/src/tests/resources/congress-gray.pdf')

    @pytest.mark.parametrize('filename', list((Path(__file__).parent / 'resources').glob('*.pdf')))
    def test_roundtrip(filename):
        try:
            pdf = Pdf.open(filename)
        except PasswordError:
            return
        with pdf.open_metadata() as xmp:
            for k in xmp.keys():
                if not 'Date' in k:
>                   xmp[k] = 'A'

tests/test_metadata.py:148:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
/usr/lib/python3/dist-packages/pikepdf/models/metadata.py:302: in __exit__
    self._apply_changes()
/usr/lib/python3/dist-packages/pikepdf/models/metadata.py:353: in _apply_changes
    xml = self._get_xml_bytes()
/usr/lib/python3/dist-packages/pikepdf/models/metadata.py:338: in _get_xml_bytes
    self._xmp.write_c14n(data)
src/lxml/etree.pyx:2374: in lxml.etree._ElementTree.write_c14n
    ???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

>   ???
E   lxml.etree.C14NError: Relative namespace UR is invalid here : adobe

src/lxml/serializer.pxi:856: C14NError
__________________________ test_roundtrip[filename14] __________________________

filename = PosixPath('/tmp/autopkgtest.l0uCgh/build.Tlo/src/tests/resources/veraPDF test suite 6-2-3-3-t01-fail-c.pdf')

    @pytest.mark.parametrize('filename', list((Path(__file__).parent / 'resources').glob('*.pdf')))
    def test_roundtrip(filename):
        try:
            pdf = Pdf.open(filename)
        except PasswordError:
            return
        with pdf.open_metadata() as xmp:
            for k in xmp.keys():
                if not 'Date' in k:
>                   xmp[k] = 'A'

tests/test_metadata.py:148:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
/usr/lib/python3/dist-packages/pikepdf/models/metadata.py:302: in __exit__
    self._apply_changes()
/usr/lib/python3/dist-packages/pikepdf/models/metadata.py:353: in _apply_changes
    xml = self._get_xml_bytes()
/usr/lib/python3/dist-packages/pikepdf/models/metadata.py:338: in _get_xml_bytes
    self._xmp.write_c14n(data)
src/lxml/etree.pyx:2374: in lxml.etree._ElementTree.write_c14n
    ???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

>   ???
E   lxml.etree.C14NError: Relative namespace UR is invalid here : adobe

src/lxml/serializer.pxi:856: C14NError
_____________________________ test_build_metadata ______________________________

trivial = <pikepdf.Pdf description='/tmp/autopkgtest.l0uCgh/build.Tlo/src/tests/resources/pal-1bit-trivial.pdf'>
graph = <pikepdf.Pdf description='/tmp/autopkgtest.l0uCgh/build.Tlo/src/tests/resources/graph.pdf'>
outdir = PosixPath('/tmp/pytest-of-spwhitton/pytest-0/test_build_metadata0')

    def test_build_metadata(trivial, graph, outdir):
        with trivial.open_metadata(set_pikepdf_as_editor=False) as xmp:
>           xmp.load_from_docinfo(graph.docinfo)

tests/test_metadata.py:154:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
/usr/lib/python3/dist-packages/pikepdf/models/metadata.py:302: in __exit__
    self._apply_changes()
/usr/lib/python3/dist-packages/pikepdf/models/metadata.py:353: in _apply_changes
    xml = self._get_xml_bytes()
/usr/lib/python3/dist-packages/pikepdf/models/metadata.py:338: in _get_xml_bytes
    self._xmp.write_c14n(data)
src/lxml/etree.pyx:2374: in lxml.etree._ElementTree.write_c14n
    ???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

>   ???
E   lxml.etree.C14NError: Relative namespace UR is invalid here : adobe

src/lxml/serializer.pxi:856: C14NError
_________________________ test_python_xmp_validate_add _________________________

trivial = <pikepdf.Pdf description='/tmp/autopkgtest.l0uCgh/build.Tlo/src/tests/resources/pal-1bit-trivial.pdf'>

    def test_python_xmp_validate_add(trivial):
        with trivial.open_metadata() as xmp:
            xmp['dc:creator'] = ['Bob', 'Doug']
            xmp['dc:title'] = 'Title'
>           xmp['dc:publisher'] = {'Mackenzie'}

tests/test_metadata.py:171:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
/usr/lib/python3/dist-packages/pikepdf/models/metadata.py:302: in __exit__
    self._apply_changes()
/usr/lib/python3/dist-packages/pikepdf/models/metadata.py:353: in _apply_changes
    xml = self._get_xml_bytes()
/usr/lib/python3/dist-packages/pikepdf/models/metadata.py:338: in _get_xml_bytes
    self._xmp.write_c14n(data)
src/lxml/etree.pyx:2374: in lxml.etree._ElementTree.write_c14n
    ???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

>   ???
E   lxml.etree.C14NError: Relative namespace UR is invalid here : adobe

src/lxml/serializer.pxi:856: C14NError
_____________________ test_python_xmp_validate_change_list _____________________

graph = <pikepdf.Pdf description='/tmp/autopkgtest.l0uCgh/build.Tlo/src/tests/resources/graph.pdf'>

    def test_python_xmp_validate_change_list(graph):
        with graph.open_metadata() as xmp:
            assert 'dc:creator' in xmp
>           xmp['dc:creator'] = ['Dobby', 'Kreacher']

tests/test_metadata.py:191:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
/usr/lib/python3/dist-packages/pikepdf/models/metadata.py:302: in __exit__
    self._apply_changes()
/usr/lib/python3/dist-packages/pikepdf/models/metadata.py:353: in _apply_changes
    xml = self._get_xml_bytes()
/usr/lib/python3/dist-packages/pikepdf/models/metadata.py:338: in _get_xml_bytes
    self._xmp.write_c14n(data)
src/lxml/etree.pyx:2374: in lxml.etree._ElementTree.write_c14n
    ???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

>   ???
E   lxml.etree.C14NError: Relative namespace UR is invalid here : adobe

src/lxml/serializer.pxi:856: C14NError
_______________________ test_python_xmp_validate_change ________________________

sandwich = <pikepdf.Pdf description='/tmp/autopkgtest.l0uCgh/build.Tlo/src/tests/resources/sandwich.pdf'>

    def test_python_xmp_validate_change(sandwich):
        with sandwich.open_metadata() as xmp:
            assert 'xmp:CreatorTool' in xmp
            xmp['xmp:CreatorTool'] = 'Creator'  # Exists as a xml tag text
>           xmp['pdf:Producer'] = 'Producer'  # Exists as a tag node

tests/test_metadata.py:207:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
/usr/lib/python3/dist-packages/pikepdf/models/metadata.py:302: in __exit__
    self._apply_changes()
/usr/lib/python3/dist-packages/pikepdf/models/metadata.py:353: in _apply_changes
    xml = self._get_xml_bytes()
/usr/lib/python3/dist-packages/pikepdf/models/metadata.py:338: in _get_xml_bytes
    self._xmp.write_c14n(data)
src/lxml/etree.pyx:2374: in lxml.etree._ElementTree.write_c14n
    ???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

>   ???
E   lxml.etree.C14NError: Relative namespace UR is invalid here : adobe

src/lxml/serializer.pxi:856: C14NError
___________________________ test_bad_char_rejection ____________________________

trivial = <pikepdf.Pdf description='/tmp/autopkgtest.l0uCgh/build.Tlo/src/tests/resources/pal-1bit-trivial.pdf'>

    def test_bad_char_rejection(trivial):
        with trivial.open_metadata() as xmp:
            xmp['dc:description'] = 'Bad characters \x00 \x01 \x02'
>           xmp['dc:creator'] = ['\ue001bad', '\ufff0bad']

tests/test_metadata.py:243:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
/usr/lib/python3/dist-packages/pikepdf/models/metadata.py:302: in __exit__
    self._apply_changes()
/usr/lib/python3/dist-packages/pikepdf/models/metadata.py:353: in _apply_changes
    xml = self._get_xml_bytes()
/usr/lib/python3/dist-packages/pikepdf/models/metadata.py:338: in _get_xml_bytes
    self._xmp.write_c14n(data)
src/lxml/etree.pyx:2374: in lxml.etree._ElementTree.write_c14n
    ???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

>   ???
E   lxml.etree.C14NError: Relative namespace UR is invalid here : adobe

src/lxml/serializer.pxi:856: C14NError
_________________________________ test_xpacket _________________________________

sandwich = <pikepdf.Pdf description='/tmp/autopkgtest.l0uCgh/build.Tlo/src/tests/resources/sandwich.pdf'>

    def test_xpacket(sandwich):
        xmpstr1 = sandwich.Root.Metadata.read_bytes()
        xpacket_begin = b'<?xpacket begin='
        xpacket_end = b'<?xpacket end='
        assert xmpstr1.startswith(xpacket_begin)

        with sandwich.open_metadata() as xmp:
>           xmp['dc:creator'] = 'Foo'

tests/test_metadata.py:254:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
/usr/lib/python3/dist-packages/pikepdf/models/metadata.py:302: in __exit__
    self._apply_changes()
/usr/lib/python3/dist-packages/pikepdf/models/metadata.py:353: in _apply_changes
    xml = self._get_xml_bytes()
/usr/lib/python3/dist-packages/pikepdf/models/metadata.py:338: in _get_xml_bytes
    self._xmp.write_c14n(data)
src/lxml/etree.pyx:2374: in lxml.etree._ElementTree.write_c14n
    ???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

>   ???
E   lxml.etree.C14NError: Relative namespace UR is invalid here : adobe

src/lxml/serializer.pxi:856: C14NError
========= 22 failed, 123 passed, 5 skipped, 1 xfailed in 7.35 seconds ==========
autopkgtest [11:12:54]: test test-suite: -----------------------]
autopkgtest [11:12:54]: test test-suite:  - - - - - - - - - - results - - - - - - - - - -
test-suite           FAIL non-zero exit status 1
autopkgtest [11:12:54]: @@@@@@@@@@@@@@@@@@@@ summary
test-suite           FAIL non-zero exit status 1

E: Autopkgtest run failed.

page.page_contents_add stalls forever

I'm trying to combine some files and add a watermark to some pages. Here is what my code looks like:

    output_stream = io.BytesIO()

    merger = Pdf.new()

    for idx, input_file in enumerate(input_files):
        pdf = Pdf.open(input_file)

        if idx == 0:
            watermark_file = Pdf.open('watermark.pdf')
            watermark = watermark_file.pages[0]

            for pdf_page in pdf.pages:
                pdf_page.page_contents_add(contents=watermark.Contents.read_bytes())

        merger.pages.extend(pdf.pages)
        input_streams.append(f)

    merger.save(output_stream)
    value = output_stream.getvalue()

    for f in input_streams:
        f.close()

    output_stream.close()

    return value

The important part here is pdf_page.page_contents_add(contents=watermark.Contents.read_bytes()). When execution reaches this call, it stalls indefinitely and I have to shut down and restart my application. I've tried a few variations on this. Does anyone have any ideas?

segfault on pikepdf.open(bin_stream) for some PDFs

Failed with segfault:

f = open("Accessible EPUB 3.pdf", "rb")
doc = pikepdf.open(f)
for page in doc.pages:
    page.Rotate = 180
doc.save('test-rotated.pdf')

OK:

doc = pikepdf.open("Accessible EPUB 3.pdf")
for page in doc.pages:
    page.Rotate = 180
doc.save('test-rotated.pdf')

Other tests:
binary stream open (first snippet above) on tests/resources/pal-1bit-trivial.pdf -- OK (even though result is just a blank page)
my ebook collection |> "PathLike" open (second) snippet |> binary stream open snippet -- OK
my ebook collection |> binary stream open -- failed
my ebook collection |> various qpdf cli commands -- OK
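Since opening by path succeeds where opening the binary stream fails, one possible workaround is to copy the stream to a temporary file first and open that file by path. This is only a sketch: the helper name `spool_to_tempfile` is hypothetical, and it sidesteps the crash rather than explaining it.

```python
import shutil
import tempfile
from pathlib import Path


def spool_to_tempfile(stream, suffix=".pdf"):
    """Copy a binary stream to a named temporary file and return its path.

    The caller is responsible for deleting the file afterwards.
    """
    with tempfile.NamedTemporaryFile(suffix=suffix, delete=False) as tmp:
        shutil.copyfileobj(stream, tmp)
    return Path(tmp.name)


# Usage (not executed here; requires pikepdf and the input file):
#   with open("Accessible EPUB 3.pdf", "rb") as f:
#       path = spool_to_tempfile(f)
#   doc = pikepdf.open(str(path))
#   ...
#   path.unlink()
```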

On another note, the front page mentions "Commercial support is available" but I can't find any info about this anywhere. Do you have a pointer on this?

Thanks

Provenance of new image files

Hello,

Kindly clarify the licensing/copyright status of the following new files:

  • docs/images/pike-cartoon.png
  • docs/images/pikemen.jpg
  • tests/resources/cmyk-jpeg.pdf

Thank you.

linearize

How do I linearize an existing pdf?

install failure on armhf

On a Raspberry Pi with an ARMv7 CPU:
Linux version 4.4.43-v7+ (gcc version 4.9.2 (Raspbian 4.9.2-10) )

pip install pikepdf
Collecting pikepdf
  Using cached https://files.pythonhosted.org/packages/30/bd/7afb368ea0872e64208e9940c729afde2c27eff7ffe1bbb28a64b1bb5340/pikepdf-0.3.2.tar.gz
    Complete output from command python setup.py egg_info:
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/pip-install-burtbex3/pikepdf/setup.py", line 113, in <module>
        readme = f.read()
      File "/usr/lib/python3.5/encodings/ascii.py", line 26, in decode
        return codecs.ascii_decode(input, self.errors)[0]
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 2736: ordinal not in range(128)

    ----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-install-burtbex3/pikepdf/
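The traceback shows the README being decoded with the ASCII codec, which suggests that setup.py opened it without an explicit encoding and the locale default was ASCII. Byte 0xe2 is the lead byte of many UTF-8 encoded symbols. The sketch below reproduces the failure with a throwaway file and shows one likely fix, passing `encoding="utf-8"`. The file name and contents are made up for illustration and are not taken from the pikepdf sources.

```python
import tempfile
from pathlib import Path


def read_text_utf8(path):
    """Read a text file as UTF-8 regardless of the system locale."""
    with open(path, encoding="utf-8") as f:
        return f.read()


with tempfile.TemporaryDirectory() as tmp:
    readme = Path(tmp) / "README.md"
    # U+2714 (check mark) is encoded in UTF-8 as the bytes E2 9C 94.
    readme.write_bytes("Editing \u2714".encode("utf-8"))

    # Decoding as ASCII fails on the 0xe2 byte, as in the traceback above.
    try:
        readme.read_text(encoding="ascii")
        ascii_ok = True
    except UnicodeDecodeError:
        ascii_ok = False

    # Decoding explicitly as UTF-8 succeeds.
    text = read_text_utf8(readme)
```

Until a fixed release is available, setting a UTF-8 locale before running pip (for example via the `LC_ALL` environment variable) may also work around the problem.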

Backport to 2.7.x

I've done a super-quick backport to 2.7 but I haven't exercised very many test cases and I don't want to contribute poor-quality code....

PR of interest?

Segmentation Fault when importing both pikepdf and fastText

This originally happened when I used ocrmypdf, but I dug down a little and found that it is caused by pikepdf in conjunction with fastText (https://github.com/facebookresearch/fastText).
I also opened an issue in the fastText repo (facebookresearch/fastText#701).

OS: Ubuntu 16.04.4 LTS (4.15.0-42-generic, x86_64 GNU/Linux)
Python version: Python 3.5.2 (default, Nov 12 2018, 13:43:14)
gcc version: gcc (Ubuntu 5.5.0-12ubuntu1~16.04) 5.5.0 20171010
pikepdf version: 0.9.1
fastText version: 0.8.22

Steps to reproduce:

  1. Install pikepdf:
sudo pip3 install pikepdf
  2. Install fastText with Python bindings:
mkdir /tmp/fasttext/
cd /tmp/fasttext/
git clone https://github.com/facebookresearch/fastText.git
cd fastText
sudo pip3 install .
  3. Import both. To get the segfault stack trace, put this in a file called test.py:
import pikepdf
import fastText

and run:

gdb python3

and then:

run test.py

and then:

backtrace

And this is printed out:

(gdb) run test.py
Starting program: /usr/bin/python3 test.py
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff3aa4796 in pybind11::detail::make_new_python_type (rec=...) at /usr/local/include/python3.5/pybind11/detail/class.h:568
568	    auto heap_type = (PyHeapTypeObject *) metaclass->tp_alloc(metaclass, 0);
(gdb) backtrace
#0  0x00007ffff3aa4796 in pybind11::detail::make_new_python_type (rec=...) at /usr/local/include/python3.5/pybind11/detail/class.h:568
#1  0x00007ffff3ab034a in pybind11::detail::generic_type::initialize (this=this@entry=0x7fffffffa780, rec=...) at /usr/local/include/python3.5/pybind11/pybind11.h:895
#2  0x00007ffff3ab0a11 in pybind11::class_<fasttext::Args>::class_<>(pybind11::handle, char const*) (this=0x7fffffffa780, scope=..., name=0x7ffff3adad68 "args")
    at /usr/local/include/python3.5/pybind11/pybind11.h:1065
#3  0x00007ffff3a93881 in pybind11_init_fasttext_pybind (m=...) at python/fastText/pybind/fasttext_pybind.cc:53
#4  0x00007ffff3a95e22 in PyInit_fasttext_pybind () at python/fastText/pybind/fasttext_pybind.cc:51
#5  0x000000000061091e in _PyImport_LoadDynamicModuleWithSpec () at ../Python/importdl.c:150
#6  0x0000000000610e08 in _imp_create_dynamic_impl.isra.12 (file=0x0, spec=
    <ModuleSpec(loader_state=None, origin='/usr/local/lib/python3.5/dist-packages/fasttext_pybind.cpython-35m-x86_64-linux-gnu.so', name='fasttext_pybind', submodule_search_locations=None, _set_fileattr=True, loader=<ExtensionFileLoader(path='/usr/local/lib/python3.5/dist-packages/fasttext_pybind.cpython-35m-x86_64-linux-gnu.so', name='fasttext_pybind') at remote 0x7ffff4ba1978>, _cached=None) at remote 0x7ffff4ba1780>) at ../Python/import.c:2027
#7  _imp_create_dynamic () at ../Python/clinic/import.c.h:282
#8  0x00000000004ea1c6 in PyCFunction_Call () at ../Objects/methodobject.c:109
#9  0x000000000053d353 in ext_do_call (nk=<optimized out>, na=0, flags=<optimized out>, pp_stack=0x7fffffffabe8, func=
    <built-in method create_dynamic of module object at remote 0x7ffff7f88728>) at ../Python/ceval.c:5031
#10 PyEval_EvalFrameEx () at ../Python/ceval.c:3275
#11 0x000000000053fc97 in _PyEval_EvalCodeWithName () at ../Python/ceval.c:4018
#12 0x000000000053bc93 in fast_function (nk=<optimized out>, na=<optimized out>, n=<optimized out>, pp_stack=0x7fffffffadf0, func=<optimized out>)
    at ../Python/ceval.c:4813
#13 call_function (oparg=<optimized out>, pp_stack=0x7fffffffadf0) at ../Python/ceval.c:4730
#14 PyEval_EvalFrameEx () at ../Python/ceval.c:3236
#15 0x000000000053b294 in fast_function (nk=<optimized out>, na=<optimized out>, n=<optimized out>, pp_stack=0x7fffffffaf20, func=<optimized out>)
    at ../Python/ceval.c:4803
#16 call_function (oparg=<optimized out>, pp_stack=0x7fffffffaf20) at ../Python/ceval.c:4730
#17 PyEval_EvalFrameEx () at ../Python/ceval.c:3236
#18 0x000000000053b294 in fast_function (nk=<optimized out>, na=<optimized out>, n=<optimized out>, pp_stack=0x7fffffffb050, func=<optimized out>)
    at ../Python/ceval.c:4803
#19 call_function (oparg=<optimized out>, pp_stack=0x7fffffffb050) at ../Python/ceval.c:4730
#20 PyEval_EvalFrameEx () at ../Python/ceval.c:3236
#21 0x000000000053b294 in fast_function (nk=<optimized out>, na=<optimized out>, n=<optimized out>, pp_stack=0x7fffffffb180, func=<optimized out>)
    at ../Python/ceval.c:4803
#22 call_function (oparg=<optimized out>, pp_stack=0x7fffffffb180) at ../Python/ceval.c:4730
#23 PyEval_EvalFrameEx () at ../Python/ceval.c:3236
#24 0x000000000053b294 in fast_function (nk=<optimized out>, na=<optimized out>, n=<optimized out>, pp_stack=0x7fffffffb2b0, func=<optimized out>)
    at ../Python/ceval.c:4803
#25 call_function (oparg=<optimized out>, pp_stack=0x7fffffffb2b0) at ../Python/ceval.c:4730
#26 PyEval_EvalFrameEx () at ../Python/ceval.c:3236
#27 0x0000000000540b0b in _PyEval_EvalCodeWithName (qualname=0x0, name=0x0, closure=0x0, kwdefs=0x0, defcount=0, defs=0x0, kwcount=0, kws=<optimized out>, 
    argcount=<optimized out>, args=<optimized out>, locals=<optimized out>, globals=<optimized out>, _co=<code at remote 0x7ffff7f8b420>) at ../Python/ceval.c:4018
#28 PyEval_EvalCodeEx () at ../Python/ceval.c:4039
#29 0x00000000004ec2e3 in function_call.lto_priv () at ../Objects/funcobject.c:627
#30 0x00000000005c20e7 in PyObject_Call () at ../Objects/abstract.c:2165
#31 0x00000000005c2f1a in _PyObject_CallMethodIdObjArgs () at ../Objects/abstract.c:2423
#32 0x0000000000525c08 in PyImport_ImportModuleLevelObject () at ../Python/import.c:1595
#33 0x0000000000549ae8 in builtin___import__.lto_priv.1901 () at ../Python/bltinmodule.c:213
#34 0x00000000004ea137 in PyCFunction_Call () at ../Objects/methodobject.c:98
#35 0x00000000005c20e7 in PyObject_Call () at ../Objects/abstract.c:2165
#36 0x0000000000534870 in PyEval_CallObjectWithKeywords () at ../Python/ceval.c:4580
#37 0x0000000000539bfb in PyEval_EvalFrameEx () at ../Python/ceval.c:2801
#38 0x000000000053fc97 in _PyEval_EvalCodeWithName () at ../Python/ceval.c:4018
#39 0x00000000005409bf in PyEval_EvalCodeEx () at ../Python/ceval.c:4039
#40 PyEval_EvalCode (co=<optimized out>, globals=<optimized out>, locals=<optimized out>) at ../Python/ceval.c:777
#41 0x000000000054a328 in builtin_exec_impl.isra.11 (
    locals={'__builtins__': {'ConnectionRefusedError': <type at remote 0xa137c0>, 'IOError': <type at remote 0xa2fe00>, 'PermissionError': <type at remote 0xa12c60>, 'ArithmeticError': <type at remote 0xa2e700>, 'UserWarning': <type at remote 0xa05f20>, 'float': <type at remote 0xa3f300>, 'globals': <built-in method globals of module object at remote 0x7ffff7fe35e8>, 'ImportWarning': <type at remote 0xa06260>, 'sorted': <built-in method sorted of module object at remote 0x7ffff7fe35e8>, 'locals': <built-in method locals of module object at remote 0x7ffff7fe35e8>, 'repr': <built-in method repr of module object at remote 0x7ffff7fe35e8>, 'UnicodeTranslateError': <type at remote 0xa02700>, 'NotImplemented': <NotImplementedType at remote 0xa40030>, 'max': <built-in method max of module object at remote 0x7ffff7fe35e8>, 'SystemError': <type at remote 0xa2ed00>, 'iter': <built-in method iter of module object at remote 0x7ffff7fe35e8>, 'str': <type at remote 0xa3a440>, 'abs': <built-in method abs of module object at remote 0x7...(truncated), 
    globals={'__builtins__': {'ConnectionRefusedError': <type at remote 0xa137c0>, 'IOError': <type at remote 0xa2fe00>, 'PermissionError': <type at remote 0xa12c60>, 'ArithmeticError': <type at remote 0xa2e700>, 'UserWarning': <type at remote 0xa05f20>, 'float': <type at remote 0xa3f300>, 'globals': <built-in method globals of module object at remote 0x7ffff7fe35e8>, 'ImportWarning': <type at remote 0xa06260>, 'sorted': <built-in method sorted of module object at remote 0x7ffff7fe35e8>, 'locals': <built-in method locals of module object at remote 0x7ffff7fe35e8>, 'repr': <built-in method repr of module object at remote 0x7ffff7fe35e8>, 'UnicodeTranslateError': <type at remote 0xa02700>, 'NotImplemented': <NotImplementedType at remote 0xa40030>, 'max': <built-in method max of module object at remote 0x7ffff7fe35e8>, 'SystemError': <type at remote 0xa2ed00>, 'iter': <built-in method iter of module object at remote 0x7ffff7fe35e8>, 'str': <type at remote 0xa3a440>, 'abs': <built-in method abs of module object at remote 0x7...(truncated), source=<code at remote 0x7ffff4b950c0>) at ../Python/bltinmodule.c:957
#42 builtin_exec.lto_priv () at ../Python/clinic/bltinmodule.c.h:275
#43 0x00000000004ea1c6 in PyCFunction_Call () at ../Objects/methodobject.c:109
#44 0x000000000053d353 in ext_do_call (nk=<optimized out>, na=0, flags=<optimized out>, pp_stack=0x7fffffffbac8, 
    func=<built-in method exec of module object at remote 0x7ffff7fe35e8>) at ../Python/ceval.c:5031
#45 PyEval_EvalFrameEx () at ../Python/ceval.c:3275
#46 0x000000000053fc97 in _PyEval_EvalCodeWithName () at ../Python/ceval.c:4018
#47 0x000000000053bc93 in fast_function (nk=<optimized out>, na=<optimized out>, n=<optimized out>, pp_stack=0x7fffffffbcd0, func=<optimized out>)
    at ../Python/ceval.c:4813
#48 call_function (oparg=<optimized out>, pp_stack=0x7fffffffbcd0) at ../Python/ceval.c:4730
#49 PyEval_EvalFrameEx () at ../Python/ceval.c:3236
#50 0x000000000053b294 in fast_function (nk=<optimized out>, na=<optimized out>, n=<optimized out>, pp_stack=0x7fffffffbe00, func=<optimized out>)
    at ../Python/ceval.c:4803
#51 call_function (oparg=<optimized out>, pp_stack=0x7fffffffbe00) at ../Python/ceval.c:4730
#52 PyEval_EvalFrameEx () at ../Python/ceval.c:3236
#53 0x000000000053b294 in fast_function (nk=<optimized out>, na=<optimized out>, n=<optimized out>, pp_stack=0x7fffffffbf30, func=<optimized out>)
    at ../Python/ceval.c:4803
#54 call_function (oparg=<optimized out>, pp_stack=0x7fffffffbf30) at ../Python/ceval.c:4730
#55 PyEval_EvalFrameEx () at ../Python/ceval.c:3236
#56 0x000000000053b294 in fast_function (nk=<optimized out>, na=<optimized out>, n=<optimized out>, pp_stack=0x7fffffffc060, func=<optimized out>)
    at ../Python/ceval.c:4803
#57 call_function (oparg=<optimized out>, pp_stack=0x7fffffffc060) at ../Python/ceval.c:4730
#58 PyEval_EvalFrameEx () at ../Python/ceval.c:3236
#59 0x0000000000540b0b in _PyEval_EvalCodeWithName (qualname=0x0, name=0x0, closure=0x0, kwdefs=0x0, defcount=0, defs=0x0, kwcount=0, kws=<optimized out>, 
    argcount=<optimized out>, args=<optimized out>, locals=<optimized out>, globals=<optimized out>, _co=<code at remote 0x7ffff7f8b420>) at ../Python/ceval.c:4018
#60 PyEval_EvalCodeEx () at ../Python/ceval.c:4039
#61 0x00000000004ec2e3 in function_call.lto_priv () at ../Objects/funcobject.c:627
#62 0x00000000005c20e7 in PyObject_Call () at ../Objects/abstract.c:2165
#63 0x00000000005c2f1a in _PyObject_CallMethodIdObjArgs () at ../Objects/abstract.c:2423
#64 0x0000000000525c08 in PyImport_ImportModuleLevelObject () at ../Python/import.c:1595
#65 0x0000000000549ae8 in builtin___import__.lto_priv.1901 () at ../Python/bltinmodule.c:213
#66 0x00000000004ea137 in PyCFunction_Call () at ../Objects/methodobject.c:98
#67 0x00000000005c20e7 in PyObject_Call () at ../Objects/abstract.c:2165
#68 0x0000000000534870 in PyEval_CallObjectWithKeywords () at ../Python/ceval.c:4580
#69 0x0000000000539bfb in PyEval_EvalFrameEx () at ../Python/ceval.c:2801
#70 0x000000000053fc97 in _PyEval_EvalCodeWithName () at ../Python/ceval.c:4018
#71 0x00000000005409bf in PyEval_EvalCodeEx () at ../Python/ceval.c:4039
#72 PyEval_EvalCode (co=<optimized out>, globals=<optimized out>, locals=<optimized out>) at ../Python/ceval.c:777
#73 0x000000000054a328 in builtin_exec_impl.isra.11 (
    locals={'__builtins__': {'ConnectionRefusedError': <type at remote 0xa137c0>, 'IOError': <type at remote 0xa2fe00>, 'PermissionError': <type at remote 0xa12c60>, 'ArithmeticError': <type at remote 0xa2e700>, 'UserWarning': <type at remote 0xa05f20>, 'float': <type at remote 0xa3f300>, 'globals': <built-in method globals of module object at remote 0x7ffff7fe35e8>, 'ImportWarning': <type at remote 0xa06260>, 'sorted': <built-in method sorted of module object at remote 0x7ffff7fe35e8>, 'locals': <built-in method locals of module object at remote 0x7ffff7fe35e8>, 'repr': <built-in method repr of module object at remote 0x7ffff7fe35e8>, 'UnicodeTranslateError': <type at remote 0xa02700>, 'NotImplemented': <NotImplementedType at remote 0xa40030>, 'max': <built-in method max of module object at remote 0x7ffff7fe35e8>, 'SystemError': <type at remote 0xa2ed00>, 'iter': <built-in method iter of module object at remote 0x7ffff7fe35e8>, 'str': <type at remote 0xa3a440>, 'abs': <built-in method abs of module object at remote 0x7...(truncated), 
    globals={'__builtins__': {'ConnectionRefusedError': <type at remote 0xa137c0>, 'IOError': <type at remote 0xa2fe00>, 'PermissionError': <type at remote 0xa12c60>, 'ArithmeticError': <type at remote 0xa2e700>, 'UserWarning': <type at remote 0xa05f20>, 'float': <type at remote 0xa3f300>, 'globals': <built-in method globals of module object at remote 0x7ffff7fe35e8>, 'ImportWarning': <type at remote 0xa06260>, 'sorted': <built-in method sorted of module object at remote 0x7ffff7fe35e8>, 'locals': <built-in method locals of module object at remote 0x7ffff7fe35e8>, 'repr': <built-in method repr of module object at remote 0x7ffff7fe35e8>, 'UnicodeTranslateError': <type at remote 0xa02700>, 'NotImplemented': <NotImplementedType at remote 0xa40030>, 'max': <built-in method max of module object at remote 0x7ffff7fe35e8>, 'SystemError': <type at remote 0xa2ed00>, 'iter': <built-in method iter of module object at remote 0x7ffff7fe35e8>, 'str': <type at remote 0xa3a440>, 'abs': <built-in method abs of module object at remote 0x7...(truncated), source=<code at remote 0x7ffff6826780>) at ../Python/bltinmodule.c:957
#74 builtin_exec.lto_priv () at ../Python/clinic/bltinmodule.c.h:275
#75 0x00000000004ea1c6 in PyCFunction_Call () at ../Objects/methodobject.c:109
#76 0x000000000053d353 in ext_do_call (nk=<optimized out>, na=0, flags=<optimized out>, pp_stack=0x7fffffffc878, 
    func=<built-in method exec of module object at remote 0x7ffff7fe35e8>) at ../Python/ceval.c:5031
#77 PyEval_EvalFrameEx () at ../Python/ceval.c:3275
#78 0x000000000053fc97 in _PyEval_EvalCodeWithName () at ../Python/ceval.c:4018
#79 0x000000000053bc93 in fast_function (nk=<optimized out>, na=<optimized out>, n=<optimized out>, pp_stack=0x7fffffffca80, func=<optimized out>)
    at ../Python/ceval.c:4813
#80 call_function (oparg=<optimized out>, pp_stack=0x7fffffffca80) at ../Python/ceval.c:4730
#81 PyEval_EvalFrameEx () at ../Python/ceval.c:3236
#82 0x000000000053b294 in fast_function (nk=<optimized out>, na=<optimized out>, n=<optimized out>, pp_stack=0x7fffffffcbb0, func=<optimized out>)
    at ../Python/ceval.c:4803
#83 call_function (oparg=<optimized out>, pp_stack=0x7fffffffcbb0) at ../Python/ceval.c:4730
#84 PyEval_EvalFrameEx () at ../Python/ceval.c:3236
#85 0x000000000053b294 in fast_function (nk=<optimized out>, na=<optimized out>, n=<optimized out>, pp_stack=0x7fffffffcce0, func=<optimized out>)
    at ../Python/ceval.c:4803
#86 call_function (oparg=<optimized out>, pp_stack=0x7fffffffcce0) at ../Python/ceval.c:4730
#87 PyEval_EvalFrameEx () at ../Python/ceval.c:3236
#88 0x000000000053b294 in fast_function (nk=<optimized out>, na=<optimized out>, n=<optimized out>, pp_stack=0x7fffffffce10, func=<optimized out>)
    at ../Python/ceval.c:4803
#89 call_function (oparg=<optimized out>, pp_stack=0x7fffffffce10) at ../Python/ceval.c:4730
#90 PyEval_EvalFrameEx () at ../Python/ceval.c:3236
#91 0x0000000000540b0b in _PyEval_EvalCodeWithName (qualname=0x0, name=0x0, closure=0x0, kwdefs=0x0, defcount=0, defs=0x0, kwcount=0, kws=<optimized out>, 
    argcount=<optimized out>, args=<optimized out>, locals=<optimized out>, globals=<optimized out>, _co=<code at remote 0x7ffff7f8b420>) at ../Python/ceval.c:4018
#92 PyEval_EvalCodeEx () at ../Python/ceval.c:4039
#93 0x00000000004ec2e3 in function_call.lto_priv () at ../Objects/funcobject.c:627
#94 0x00000000005c20e7 in PyObject_Call () at ../Objects/abstract.c:2165
#95 0x00000000005c2f1a in _PyObject_CallMethodIdObjArgs () at ../Objects/abstract.c:2423
#96 0x0000000000525c08 in PyImport_ImportModuleLevelObject () at ../Python/import.c:1595
#97 0x0000000000549ae8 in builtin___import__.lto_priv.1901 () at ../Python/bltinmodule.c:213
#98 0x00000000004ea137 in PyCFunction_Call () at ../Objects/methodobject.c:98
#99 0x00000000005c20e7 in PyObject_Call () at ../Objects/abstract.c:2165
#100 0x0000000000534870 in PyEval_CallObjectWithKeywords () at ../Python/ceval.c:4580
#101 0x0000000000539bfb in PyEval_EvalFrameEx () at ../Python/ceval.c:2801
#102 0x000000000053fc97 in _PyEval_EvalCodeWithName () at ../Python/ceval.c:4018
#103 0x00000000005409bf in PyEval_EvalCodeEx () at ../Python/ceval.c:4039
#104 PyEval_EvalCode (co=<optimized out>, globals=<optimized out>, locals=<optimized out>) at ../Python/ceval.c:777
#105 0x000000000060cb42 in run_mod () at ../Python/pythonrun.c:976
#106 0x000000000060efea in PyRun_FileExFlags () at ../Python/pythonrun.c:929
#107 0x000000000060f7dc in PyRun_SimpleFileExFlags () at ../Python/pythonrun.c:396
#108 0x0000000000640256 in run_file (p_cf=0x7fffffffd5c0, filename=0xa751c0 L"test.py", fp=0xb7f730) at ../Modules/main.c:318
#109 Py_Main () at ../Modules/main.c:768
#110 0x00000000004d0001 in main () at ../Programs/python.c:65
#111 0x00007ffff7810830 in __libc_start_main (main=0x4cff20 <main>, argc=2, argv=0x7fffffffd7d8, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, 
    stack_end=0x7fffffffd7c8) at ../csu/libc-start.c:291
#112 0x00000000005d6999 in _start ()
(gdb) 

If the import is done in the other possible order:

import fastText
import pikepdf

A different stacktrace appears:

(gdb) run test.py 
Starting program: /usr/bin/python3 test.py
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7ffff316c700 (LWP 11412)]
[New Thread 0x7ffff296b700 (LWP 11413)]
[New Thread 0x7fffee16a700 (LWP 11414)]
[New Thread 0x7fffeb969700 (LWP 11415)]
[New Thread 0x7fffe9168700 (LWP 11416)]
[New Thread 0x7fffe6967700 (LWP 11417)]
[New Thread 0x7fffe4166700 (LWP 11418)]

Thread 1 "python3" received signal SIGSEGV, Segmentation fault.
0x00007fffde7c82c0 in pybind11::detail::make_new_python_type (rec=...) at /opt/_internal/cpython-3.5.6/include/python3.5m/pybind11/detail/class.h:568
568	/opt/_internal/cpython-3.5.6/include/python3.5m/pybind11/detail/class.h: No such file or directory.
(gdb) backtrace
#0  0x00007fffde7c82c0 in pybind11::detail::make_new_python_type (rec=...) at /opt/_internal/cpython-3.5.6/include/python3.5m/pybind11/detail/class.h:568
#1  0x00007fffde7cda38 in pybind11::detail::generic_type::initialize (this=0x7fffffffb000, rec=...)
    at /opt/_internal/cpython-3.5.6/include/python3.5m/pybind11/pybind11.h:895
#2  0x00007fffde81e83a in pybind11::class_<qpdf_object_stream_e>::class_<> (name=0x7fffde823943 "ObjectStreamMode", scope=..., this=0x7fffffffb000)
    at /opt/_internal/cpython-3.5.6/include/python3.5m/pybind11/pybind11.h:1065
#3  pybind11::enum_<qpdf_object_stream_e>::enum_<>(pybind11::handle const&, char const*) (this=0x7fffffffb000, scope=..., name=0x7fffde823943 "ObjectStreamMode")
    at /opt/_internal/cpython-3.5.6/include/python3.5m/pybind11/pybind11.h:1368
#4  0x00007fffde80c9f3 in pybind11_init__qpdf (m=...) at src/qpdf/qpdf.cpp:233
#5  0x00007fffde8106bf in PyInit__qpdf () at src/qpdf/qpdf.cpp:210
#6  0x000000000061091e in _PyImport_LoadDynamicModuleWithSpec () at ../Python/importdl.c:150
#7  0x0000000000610e08 in _imp_create_dynamic_impl.isra.12 (file=0x0, spec=
    <ModuleSpec(loader_state=None, origin='/usr/local/lib/python3.5/dist-packages/pikepdf/_qpdf.cpython-35m-x86_64-linux-gnu.so', submodule_search_locations=None, _set_fileattr=True, name='pikepdf._qpdf', _cached=None, loader=<ExtensionFileLoader(name='pikepdf._qpdf', path='/usr/local/lib/python3.5/dist-packages/pikepdf/_qpdf.cpython-35m-x86_64-linux-gnu.so') at remote 0x7fffdeaea198>) at remote 0x7fffdeaea048>) at ../Python/import.c:2027
#8  _imp_create_dynamic () at ../Python/clinic/import.c.h:282
#9  0x00000000004ea1c6 in PyCFunction_Call () at ../Objects/methodobject.c:109
#10 0x000000000053d353 in ext_do_call (nk=<optimized out>, na=0, flags=<optimized out>, pp_stack=0x7fffffffb2b8, 
    func=<built-in method create_dynamic of module object at remote 0x7ffff7f88728>) at ../Python/ceval.c:5031
#11 PyEval_EvalFrameEx () at ../Python/ceval.c:3275
#12 0x000000000053fc97 in _PyEval_EvalCodeWithName () at ../Python/ceval.c:4018
#13 0x000000000053bc93 in fast_function (nk=<optimized out>, na=<optimized out>, n=<optimized out>, pp_stack=0x7fffffffb4c0, func=<optimized out>)
    at ../Python/ceval.c:4813
#14 call_function (oparg=<optimized out>, pp_stack=0x7fffffffb4c0) at ../Python/ceval.c:4730
#15 PyEval_EvalFrameEx () at ../Python/ceval.c:3236
#16 0x000000000053b294 in fast_function (nk=<optimized out>, na=<optimized out>, n=<optimized out>, pp_stack=0x7fffffffb5f0, func=<optimized out>)
    at ../Python/ceval.c:4803
#17 call_function (oparg=<optimized out>, pp_stack=0x7fffffffb5f0) at ../Python/ceval.c:4730
#18 PyEval_EvalFrameEx () at ../Python/ceval.c:3236
#19 0x000000000053b294 in fast_function (nk=<optimized out>, na=<optimized out>, n=<optimized out>, pp_stack=0x7fffffffb720, func=<optimized out>)
    at ../Python/ceval.c:4803
#20 call_function (oparg=<optimized out>, pp_stack=0x7fffffffb720) at ../Python/ceval.c:4730
#21 PyEval_EvalFrameEx () at ../Python/ceval.c:3236
#22 0x000000000053b294 in fast_function (nk=<optimized out>, na=<optimized out>, n=<optimized out>, pp_stack=0x7fffffffb850, func=<optimized out>)
    at ../Python/ceval.c:4803
#23 call_function (oparg=<optimized out>, pp_stack=0x7fffffffb850) at ../Python/ceval.c:4730
#24 PyEval_EvalFrameEx () at ../Python/ceval.c:3236
#25 0x000000000053b294 in fast_function (nk=<optimized out>, na=<optimized out>, n=<optimized out>, pp_stack=0x7fffffffb980, func=<optimized out>)
    at ../Python/ceval.c:4803
#26 call_function (oparg=<optimized out>, pp_stack=0x7fffffffb980) at ../Python/ceval.c:4730
#27 PyEval_EvalFrameEx () at ../Python/ceval.c:3236
#28 0x0000000000540b0b in _PyEval_EvalCodeWithName (qualname=0x0, name=0x0, closure=0x0, kwdefs=0x0, defcount=0, defs=0x0, kwcount=0, kws=<optimized out>, 
    argcount=<optimized out>, args=<optimized out>, locals=<optimized out>, globals=<optimized out>, _co=<code at remote 0x7ffff7f8b420>) at ../Python/ceval.c:4018
#29 PyEval_EvalCodeEx () at ../Python/ceval.c:4039
#30 0x00000000004ec2e3 in function_call.lto_priv () at ../Objects/funcobject.c:627
#31 0x00000000005c20e7 in PyObject_Call () at ../Objects/abstract.c:2165
#32 0x00000000005c2f1a in _PyObject_CallMethodIdObjArgs () at ../Objects/abstract.c:2423
#33 0x0000000000525c08 in PyImport_ImportModuleLevelObject () at ../Python/import.c:1595
#34 0x0000000000549ae8 in builtin___import__.lto_priv.1901 () at ../Python/bltinmodule.c:213
#35 0x00000000004ea137 in PyCFunction_Call () at ../Objects/methodobject.c:98
#36 0x000000000053d353 in ext_do_call (nk=<optimized out>, na=0, flags=<optimized out>, pp_stack=0x7fffffffbe58, 
    func=<built-in method __import__ of module object at remote 0x7ffff7fe35e8>) at ../Python/ceval.c:5031
#37 PyEval_EvalFrameEx () at ../Python/ceval.c:3275
#38 0x000000000053fc97 in _PyEval_EvalCodeWithName () at ../Python/ceval.c:4018
#39 0x000000000053bc93 in fast_function (nk=<optimized out>, na=<optimized out>, n=<optimized out>, pp_stack=0x7fffffffc060, func=<optimized out>)
    at ../Python/ceval.c:4813
#40 call_function (oparg=<optimized out>, pp_stack=0x7fffffffc060) at ../Python/ceval.c:4730
#41 PyEval_EvalFrameEx () at ../Python/ceval.c:3236
#42 0x0000000000540b0b in _PyEval_EvalCodeWithName (qualname=0x0, name=0x0, closure=0x0, kwdefs=0x0, defcount=0, defs=0x0, kwcount=0, kws=<optimized out>, 
    argcount=<optimized out>, args=<optimized out>, locals=<optimized out>, globals=<optimized out>, _co=<code at remote 0x7ffff7f8b5d0>) at ../Python/ceval.c:4018
#43 PyEval_EvalCodeEx () at ../Python/ceval.c:4039
#44 0x00000000004ec2e3 in function_call.lto_priv () at ../Objects/funcobject.c:627
#45 0x00000000005c20e7 in PyObject_Call () at ../Objects/abstract.c:2165
#46 0x00000000005c2f1a in _PyObject_CallMethodIdObjArgs () at ../Objects/abstract.c:2423
#47 0x0000000000525bd6 in PyImport_ImportModuleLevelObject () at ../Python/import.c:1667
#48 0x0000000000549ae8 in builtin___import__.lto_priv.1901 () at ../Python/bltinmodule.c:213
#49 0x00000000004ea137 in PyCFunction_Call () at ../Objects/methodobject.c:98
#50 0x00000000005c20e7 in PyObject_Call () at ../Objects/abstract.c:2165
#51 0x0000000000534870 in PyEval_CallObjectWithKeywords () at ../Python/ceval.c:4580
#52 0x0000000000539bfb in PyEval_EvalFrameEx () at ../Python/ceval.c:2801
#53 0x000000000053fc97 in _PyEval_EvalCodeWithName () at ../Python/ceval.c:4018
#54 0x00000000005409bf in PyEval_EvalCodeEx () at ../Python/ceval.c:4039
#55 PyEval_EvalCode (co=<optimized out>, globals=<optimized out>, locals=<optimized out>) at ../Python/ceval.c:777
#56 0x000000000054a328 in builtin_exec_impl.isra.11 (
    locals={'DistributionNotFound': <type at remote 0x10856a8>, '__path__': ['/usr/local/lib/python3.5/dist-packages/pikepdf'], '__cached__': '/usr/local/lib/python3.5/dist-packages/pikepdf/__pycache__/__init__.cpython-35.pyc', '__package__': 'pikepdf', '__name__': 'pikepdf', '__spec__': <ModuleSpec(_initializing=True, loader_state=None, origin='/usr/local/lib/python3.5/dist-packages/pikepdf/__init__.py', submodule_search_locations=[...], _set_fileattr=True, name='pikepdf', _cached='/usr/local/lib/python3.5/dist-packages/pikepdf/__pycache__/__init__.cpython-35.pyc', loader=<SourceFileLoader(name='pikepdf', path='/usr/local/lib/python3.5/dist-packages/pikepdf/__init__.py') at remote 0x7ffff6849278>) at remote 0x7ffff6852668>, '__file__': '/usr/local/lib/python3.5/dist-packages/pikepdf/__init__.py', '__builtins__': {'BaseException': <type at remote 0xa35f80>, 'hash': <built-in method hash of module object at remote 0x7ffff7fe35e8>, 'getattr': <built-in method getattr of module object at remote 0x7ffff7fe35e8>, 'hasattr': ...(truncated), 
    globals={'DistributionNotFound': <type at remote 0x10856a8>, '__path__': ['/usr/local/lib/python3.5/dist-packages/pikepdf'], '__cached__': '/usr/local/lib/python3.5/dist-packages/pikepdf/__pycache__/__init__.cpython-35.pyc', '__package__': 'pikepdf', '__name__': 'pikepdf', '__spec__': <ModuleSpec(_initializing=True, loader_state=None, origin='/usr/local/lib/python3.5/dist-packages/pikepdf/__init__.py', submodule_search_locations=[...], _set_fileattr=True, name='pikepdf', _cached='/usr/local/lib/python3.5/dist-packages/pikepdf/__pycache__/__init__.cpython-35.pyc', loader=<SourceFileLoader(name='pikepdf', path='/usr/local/lib/python3.5/dist-packages/pikepdf/__init__.py') at remote 0x7ffff6849278>) at remote 0x7ffff6852668>, '__file__': '/usr/local/lib/python3.5/dist-packages/pikepdf/__init__.py', '__builtins__': {'BaseException': <type at remote 0xa35f80>, 'hash': <built-in method hash of module object at remote 0x7ffff7fe35e8>, 'getattr': <built-in method getattr of module object at remote 0x7ffff7fe35e8>, 'hasattr': ...(truncated), source=<code at remote 0x7ffff6850b70>) at ../Python/bltinmodule.c:957
#57 builtin_exec.lto_priv () at ../Python/clinic/bltinmodule.c.h:275
#58 0x00000000004ea1c6 in PyCFunction_Call () at ../Objects/methodobject.c:109
#59 0x000000000053d353 in ext_do_call (nk=<optimized out>, na=0, flags=<optimized out>, pp_stack=0x7fffffffc878, 
    func=<built-in method exec of module object at remote 0x7ffff7fe35e8>) at ../Python/ceval.c:5031
#60 PyEval_EvalFrameEx () at ../Python/ceval.c:3275
#61 0x000000000053fc97 in _PyEval_EvalCodeWithName () at ../Python/ceval.c:4018
#62 0x000000000053bc93 in fast_function (nk=<optimized out>, na=<optimized out>, n=<optimized out>, pp_stack=0x7fffffffca80, func=<optimized out>)
    at ../Python/ceval.c:4813
#63 call_function (oparg=<optimized out>, pp_stack=0x7fffffffca80) at ../Python/ceval.c:4730
#64 PyEval_EvalFrameEx () at ../Python/ceval.c:3236
#65 0x000000000053b294 in fast_function (nk=<optimized out>, na=<optimized out>, n=<optimized out>, pp_stack=0x7fffffffcbb0, func=<optimized out>)
    at ../Python/ceval.c:4803
#66 call_function (oparg=<optimized out>, pp_stack=0x7fffffffcbb0) at ../Python/ceval.c:4730
#67 PyEval_EvalFrameEx () at ../Python/ceval.c:3236
#68 0x000000000053b294 in fast_function (nk=<optimized out>, na=<optimized out>, n=<optimized out>, pp_stack=0x7fffffffcce0, func=<optimized out>)
    at ../Python/ceval.c:4803
#69 call_function (oparg=<optimized out>, pp_stack=0x7fffffffcce0) at ../Python/ceval.c:4730
#70 PyEval_EvalFrameEx () at ../Python/ceval.c:3236
#71 0x000000000053b294 in fast_function (nk=<optimized out>, na=<optimized out>, n=<optimized out>, pp_stack=0x7fffffffce10, func=<optimized out>)
    at ../Python/ceval.c:4803
#72 call_function (oparg=<optimized out>, pp_stack=0x7fffffffce10) at ../Python/ceval.c:4730
#73 PyEval_EvalFrameEx () at ../Python/ceval.c:3236
#74 0x0000000000540b0b in _PyEval_EvalCodeWithName (qualname=0x0, name=0x0, closure=0x0, kwdefs=0x0, defcount=0, defs=0x0, kwcount=0, kws=<optimized out>, 
    argcount=<optimized out>, args=<optimized out>, locals=<optimized out>, globals=<optimized out>, _co=<code at remote 0x7ffff7f8b420>) at ../Python/ceval.c:4018
#75 PyEval_EvalCodeEx () at ../Python/ceval.c:4039
#76 0x00000000004ec2e3 in function_call.lto_priv () at ../Objects/funcobject.c:627
#77 0x00000000005c20e7 in PyObject_Call () at ../Objects/abstract.c:2165
#78 0x00000000005c2f1a in _PyObject_CallMethodIdObjArgs () at ../Objects/abstract.c:2423
#79 0x0000000000525c08 in PyImport_ImportModuleLevelObject () at ../Python/import.c:1595
#80 0x0000000000549ae8 in builtin___import__.lto_priv.1901 () at ../Python/bltinmodule.c:213
#81 0x00000000004ea137 in PyCFunction_Call () at ../Objects/methodobject.c:98
#82 0x00000000005c20e7 in PyObject_Call () at ../Objects/abstract.c:2165
#83 0x0000000000534870 in PyEval_CallObjectWithKeywords () at ../Python/ceval.c:4580
#84 0x0000000000539bfb in PyEval_EvalFrameEx () at ../Python/ceval.c:2801
#85 0x000000000053fc97 in _PyEval_EvalCodeWithName () at ../Python/ceval.c:4018
#86 0x00000000005409bf in PyEval_EvalCodeEx () at ../Python/ceval.c:4039
#87 PyEval_EvalCode (co=<optimized out>, globals=<optimized out>, locals=<optimized out>) at ../Python/ceval.c:777
#88 0x000000000060cb42 in run_mod () at ../Python/pythonrun.c:976
#89 0x000000000060efea in PyRun_FileExFlags () at ../Python/pythonrun.c:929
#90 0x000000000060f7dc in PyRun_SimpleFileExFlags () at ../Python/pythonrun.c:396
#91 0x0000000000640256 in run_file (p_cf=0x7fffffffd5c0, filename=0xa751c0 L"test.py", fp=0xb11040) at ../Modules/main.c:318
#92 Py_Main () at ../Modules/main.c:768
#93 0x00000000004d0001 in main () at ../Programs/python.c:65
#94 0x00007ffff7810830 in __libc_start_main (main=0x4cff20 <main>, argc=2, argv=0x7fffffffd7d8, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, 
    stack_end=0x7fffffffd7c8) at ../csu/libc-start.c:291
#95 0x00000000005d6999 in _start ()
(gdb) 

Any ideas or workarounds? Thanks in advance.
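
The reproduction steps above can also be run without gdb. The sketch below imports a list of modules, in order, in a fresh interpreter and returns the child's exit code, so both import orders can be compared without crashing the calling process. The function name import_exit_code is made up for this example; the demonstration call uses standard-library modules so that it runs anywhere.

```python
import subprocess
import sys


def import_exit_code(modules):
    """Import the given modules, in order, in a fresh interpreter.

    Returns the child's exit code: 0 on success, 1 if an ordinary Python
    exception such as ImportError was raised, and a negative number on
    POSIX systems if a signal killed the process (-11 for SIGSEGV on Linux).
    """
    code = "; ".join("import " + name for name in modules)
    # -X faulthandler makes CPython print a Python-level traceback on a crash.
    result = subprocess.run([sys.executable, "-X", "faulthandler", "-c", code])
    return result.returncode


# Stand-in demonstration with standard-library modules.
demo_code = import_exit_code(["json", "os"])

# For the case reported here, one would compare (not executed in this sketch):
#   import_exit_code(["pikepdf", "fastText"])
#   import_exit_code(["fastText", "pikepdf"])
```

If only one order crashes, importing the modules in the other order may serve as a stopgap; in this report both orders crash, so the sketch is mainly useful for checking whether a given combination of installed versions is affected.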

Roadmap to 1.0

  • Deprecate (?) .metadata and add .docinfo and .xmp
  • Support python-xmp-toolkit for metadata access
  • PDF attachments should be removable or attachment API should be marked experimental
  • Add page to Form XObject, for page merging
  • Make pikepdf.String a little more friendly, perhaps, since it's irritating
  • Use .Name more consistently

Probably deferring:

  • Maybe bookmarks?
  • Change _methods.py to do direct subclassing, it's pretty much the same thing

Apache license in licenses/

Hello,

So far as I can tell there is no Apache-licensed code in pikepdf. So it is a bit confusing to have a copy of the Apache license in the licenses/ subdirectory. Maybe it is leftover from something else and could be removed?

Thanks.

[Feature request] JBIG2Decode

Hi,
first of all, thank you for your work!
I'm having trouble extracting images from a PDF (scanned by Google), and I suppose the problem arises because of '/Filter': '/JBIG2Decode'.
I would be really grateful if you could add this feature, or if you could suggest how to solve this error.

Thanks!

The XObject looks like this:

{'/BitsPerComponent': 1,
 '/ColorSpace': '/DeviceGray',
 '/Filter': '/JBIG2Decode',
 '/Height': 204,
 '/Mask': IndirectObject(4385, 0),
 '/Subtype': '/Image',
 '/Type': '/XObject',
 '/Width': 1034}

And this is the error I get when calling the "as_pil_image()" function:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/pikepdf/models/image.py", line 539, in as_pil_image
    im = self._extract_transcoded()
  File "/usr/local/lib/python3.7/site-packages/pikepdf/models/image.py", line 419, in _extract_transcoded
    data = self.read_bytes()
  File "/usr/local/lib/python3.7/site-packages/pikepdf/models/image.py", line 517, in read_bytes
    return self.obj.read_bytes()
pikepdf._qpdf.PdfError: /Users/aversa/PycharmProjects/C08/pdf_folder/128_Martin_1660_anno.pdf (offset 44470964): getStreamData called on unfilterable stream
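
Until the filter is supported, a caller could check an image's /Filter entry before attempting extraction, so that unsupported images are skipped or reported instead of raising. The following is a minimal sketch that works on a plain dictionary shaped like the XObject above. `stream_filters`, `can_extract` and `UNSUPPORTED_FILTERS` are hypothetical names for illustration, not part of pikepdf, and the sketch makes no claim about which filters any particular pikepdf version can decode.

```python
# Hypothetical helpers: these names are illustrative and not pikepdf API.
UNSUPPORTED_FILTERS = {"/JBIG2Decode"}  # assumption: filters the caller cannot decode


def stream_filters(xobject):
    """Return the stream's filter names as a list of strings.

    In a PDF, /Filter may be absent, a single name, or an array of names.
    """
    value = xobject.get("/Filter")
    if value is None:
        return []
    if isinstance(value, (list, tuple)):
        return [str(name) for name in value]
    return [str(value)]


def can_extract(xobject):
    """Return True when none of the stream's filters is in the unsupported set."""
    return not any(f in UNSUPPORTED_FILTERS for f in stream_filters(xobject))


# A plain dict stands in for the image XObject reported in this issue.
xobject = {
    "/BitsPerComponent": 1,
    "/ColorSpace": "/DeviceGray",
    "/Filter": "/JBIG2Decode",
    "/Height": 204,
    "/Subtype": "/Image",
    "/Type": "/XObject",
    "/Width": 1034,
}
print(stream_filters(xobject))  # ['/JBIG2Decode']
print(can_extract(xobject))     # False
```

Whether real pikepdf objects can be passed to these helpers unchanged is an assumption that would need checking against the installed version.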

the example fails with `del` [resolution needed: add pages.remove(p=page_number)]

The example in the README fails for me:

$ python
Python 3.7.1 (default, Oct 22 2018, 10:41:28) 
[GCC 8.2.1 20180831] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pikepdf
>>> pdf = pikepdf.open('input.pdf')
>>> num_pages = len(pdf.pages)
>>> num_pages
8
>>> del pdf.pages[-1]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: __delitem__(): incompatible function arguments. The following argument types are supported:
    1. (self: pikepdf._qpdf.PageList, arg0: int) -> None
    2. (self: pikepdf._qpdf.PageList, arg0: slice) -> None

Invoked with: <pikepdf._qpdf.PageList object at 0x7fd21bb38378>, -1
>>>

Update: it doesn't like the negative indexing. With positive indexes it works:

>>> del pdf.pages[1]
>>> pdf.save('output.pdf')

Edit: one more error:

del pdf.pages.p(page_number)
       ^
SyntaxError: can't delete function call
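
Since positive indexes work, one possible workaround on affected versions is to convert a negative index to its non-negative equivalent before deleting. This is a minimal sketch; `normalize_index` is a hypothetical helper, not a pikepdf function, and a plain list stands in for `pdf.pages`.

```python
def normalize_index(index, length):
    """Convert a possibly negative index into the equivalent non-negative one."""
    if index < 0:
        index += length
    if not 0 <= index < length:
        raise IndexError("page index out of range")
    return index


# A plain list stands in for pdf.pages in this sketch.
pages = ["page1", "page2", "page3"]
del pages[normalize_index(-1, len(pages))]
print(pages)  # ['page1', 'page2']
```

With pikepdf, the equivalent call would presumably be `del pdf.pages[normalize_index(-1, len(pdf.pages))]`, which only relies on the non-negative integer indexing reported to work above.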

Bug in version 1.1.0 with Anaconda

Hi,

Since the library was upgraded to 1.1.0, this bug has popped up and causes a Python crash in an Anaconda environment.

pdf = pikepdf.open("/path/to/file.pdf")
pdf.save("/another/path/to/file.pdf")

could not create 'C:\\ProgramData\\Anaconda3\\lib\\site-packages\\sqlalchemy\\engine\\__pycache__\\reflection.cpython-37.pyc': PermissionError(13, 'Access denied')

However, the library works perfectly up to version 1.0.5

Thanks

Cannot save pdf with latest version

I just want to report that with pikepdf 1.1.0 and Python 3.6, saving a PDF with the pikepdf.save or Pdf.save commands does not work for some reason. When I reverted to 1.0.5, it worked with no problems.

pdf.open_metadata().get('dc:title', '') is unexpectedly None

For any PDF, I would expect pdf.open_metadata().get('dc:title', 'default') to return either the correct title or "default". I would never expect it to return None, because then what's the point of get? For this particular example, I never get the default; I get None:

>>> from urllib.request import urlopen
>>> from io import BytesIO
>>> import pikepdf

>>> url = 'http://papers.www2017.com.au.s3-website-ap-southeast-2.amazonaws.com/proceedings/p1143.pdf'
>>> pdf_bytes = urlopen(url).read()
>>> pdf = pikepdf.open(BytesIO(pdf_bytes))
>>> print(pdf.open_metadata().get('dc:title', 'default'))
None
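
Regardless of the cause inside pikepdf, a caller can treat None the same as a missing key. The sketch below uses a plain dict as a stand-in for the metadata object and assumes, purely for illustration, that the key is present with a None value; with a standard dict, that is the one case where `get` returns None despite a default. `get_or_default` is a hypothetical helper, not pikepdf API.

```python
def get_or_default(mapping, key, default):
    """Like mapping.get(key, default), but also replace a None value with default."""
    value = mapping.get(key, default)
    return default if value is None else value


# Stand-in for the metadata object: the key exists but its value is None.
metadata = {"dc:title": None, "dc:creator": "someone"}

print(metadata.get("dc:title", "default"))              # None
print(get_or_default(metadata, "dc:title", "default"))  # default
```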

Provenance of tests/resources/enron*.pdf

Hello,

I can't find where the files matching tests/resources/enron*.pdf came from. The file debian/copyright (in this repository; not Debian's actual copyright file) points to enrondata.org, but the download link on that site returns a 404. Further, that site is about a collection of e-mails, but neither of the enron*.pdf files is an e-mail message.

Unless you've more information, it is probably best for me to filter out these files for Debian and disable the tests. Let me know.

Thanks.
