README for pdfsizeopt

pdfsizeopt is a program for converting large PDF files to small ones, without decreasing visual quality or removing interactive features (such as hyperlinks). More specifically, pdfsizeopt is a free, cross-platform command-line application (for Linux, Windows, macOS and Unix) and a collection of best practices to optimize the size of PDF files, with focus on PDFs created from TeX and LaTeX documents. pdfsizeopt is written in Python, so it is a bit slow, but it offloads some of the heavy work to its faster (C, C++ and Java) dependencies.

Doesn't pdfsizeopt work with your PDF? Report the issue here: https://github.com/pts/pdfsizeopt/issues

Send donations to the author of pdfsizeopt: https://flattr.com/submit/auto?user_id=pts&url=https://github.com/pts/pdfsizeopt

Getting started: how to run pdfsizeopt

If this is your first time trying pdfsizeopt, follow these instructions. (This section was updated on 2023-02-15.)

It's easy to install and run pdfsizeopt on modern Linux and Windows systems with an x86 processor. If you have such a system, jump directly to one of the following sections (Installation instructions and usage on Linux or Installation instructions and usage on Windows). It will take less than 5 minutes.

It's easy to install and run pdfsizeopt on a Mac (both Intel x86 processors and ARM processors with Apple Silicon are supported). If you have such a system, jump directly to the section Installation instructions and usage on macOS (not using Docker). It will take less than 5 minutes.

Alternatively (but not recommended because it's slower), it's possible to run pdfsizeopt within Docker on the following systems: Linux amd64, macOS 64-bit Intel x86 (amd64, x86_64), macOS 64-bit ARM (Apple Silicon, e.g. M1 or M2 chip). To do so, install Docker first, then jump directly to the section Installation instructions and usage with Docker on Linux and macOS. That last step will take less than 5 minutes.

If you are using an operating system other than Linux, Windows or macOS (on a computer with Intel processor), the easiest way to try pdfsizeopt is to borrow a friend's computer running Linux, Windows or macOS, or to rent a Linux VM in the cloud. Running pdfsizeopt on other kinds of systems is difficult because some of its required dependencies are old versions (e.g. Python 2.4--2.7, Ghostscript 9.05), so you'll have to compile the right versions of the dependencies first, which may take several hours and lots of frustrating trial and error even for experienced hackers.

It's technically possible to port pdfsizeopt to other systems (and make it easy to install), but the author of pdfsizeopt doesn't have the free time to create and maintain such a port. As an FYI, see #154 about porting to Apple Silicon.

Installation instructions and usage on Linux

There is no installer; you need to run some commands on the command line to download and install pdfsizeopt. pdfsizeopt is a command-line-only application; there is no GUI.

To install pdfsizeopt on a Linux system (with architecture i386 or amd64), open a terminal window and run these commands (without the leading $):

  $ mkdir ~/pdfsizeopt
  $ cd ~/pdfsizeopt
  $ wget -O pdfsizeopt_libexec_linux.tar.gz https://github.com/pts/pdfsizeopt/releases/download/2023-04-18/pdfsizeopt_libexec_linux-v9.tar.gz
  $ tar xzvf pdfsizeopt_libexec_linux.tar.gz
  $ rm -f    pdfsizeopt_libexec_linux.tar.gz
  $ wget -O pdfsizeopt.single https://raw.githubusercontent.com/pts/pdfsizeopt/master/pdfsizeopt.single
  $ chmod +x pdfsizeopt.single
  $ ln -s pdfsizeopt.single pdfsizeopt

To optimize a PDF, run the following command:

  ~/pdfsizeopt/pdfsizeopt input.pdf output.pdf

If the input PDF has many images or large images, pdfsizeopt can be very slow. You can speed it up by disabling pngout, the slowest image optimization method, like this:

  ~/pdfsizeopt/pdfsizeopt --use-pngout=no input.pdf output.pdf

pdfsizeopt creates lots of temporary files (psotmp.*) in the output directory, but it also cleans up after itself.

It's possible to optimize a PDF outside the current directory. To do that, specify the pathname (including the directory name) on the command line.
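
When many PDFs need optimizing, the command lines above can be generated by a small script. The following is a minimal Python 3 sketch and not part of pdfsizeopt: the helper names are hypothetical, and the `.pso.pdf` output suffix is an assumption chosen to mirror the deptest.pso.pdf naming used elsewhere in this README. It only builds argument lists; nothing is executed.

```python
import os


def build_command(input_pdf, output_pdf, use_pngout=True,
                  program=os.path.expanduser("~/pdfsizeopt/pdfsizeopt")):
    """Return the pdfsizeopt argument list for one input/output pair."""
    argv = [program]
    if not use_pngout:
        # Same flag as in the command above: skip the slowest image optimizer.
        argv.append("--use-pngout=no")
    argv.extend([input_pdf, output_pdf])
    return argv


def plan_directory(directory, filenames, use_pngout=True, suffix=".pso.pdf"):
    """Build one command per *.pdf name; `suffix` is an illustrative choice."""
    commands = []
    for name in sorted(filenames):
        if not name.lower().endswith(".pdf") or name.endswith(suffix):
            continue  # Skip non-PDF files and already optimized outputs.
        stem = name[:-len(".pdf")]
        commands.append(build_command(
            os.path.join(directory, name),
            os.path.join(directory, stem + suffix),
            use_pngout=use_pngout))
    return commands
```

To actually run the planned commands, each argument list can be passed to subprocess.run(argv, check=True), e.g. for every entry of plan_directory(".", os.listdir("."), use_pngout=False).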

Please note that the commands above download all dependencies (including Python and Ghostscript) as well. It's possible to install some of the dependencies with your package manager, but these steps are considered alternative and more complicated, and thus are not covered here.

Please note that pdfsizeopt works perfectly on any x86 and amd64 Linux system. There is no restriction on the libc, Linux distribution etc. because pdfsizeopt uses only its statically linked x86 executables, and it doesn't use any external commands (other than pdfsizeopt, pdfsizeopt.single and pdfsizeopt_libexec/*) on the system. pdfsizeopt also works perfectly on x86 FreeBSD systems with the Linux emulation layer enabled.

To avoid typing ~/pdfsizeopt/pdfsizeopt, add "$HOME/pdfsizeopt" to your PATH (probably in your ~/.bashrc), open a new terminal window, and the command pdfsizeopt will work from any directory.

You can also put pdfsizeopt in a directory other than ~/pdfsizeopt if you like.

Additionally, you can install some extra image optimizers (see more in the Image optimizers section below):

  $ cd ~/pdfsizeopt
  $ wget -O pdfsizeopt_libexec_extraimgopt_linux-v3.tar.gz https://github.com/pts/pdfsizeopt/releases/download/2017-01-24/pdfsizeopt_libexec_extraimgopt_linux-v3.tar.gz
  $ tar xzvf pdfsizeopt_libexec_extraimgopt_linux-v3.tar.gz
  $ rm -f    pdfsizeopt_libexec_extraimgopt_linux-v3.tar.gz

Installation instructions and usage on Windows

There is no installer; you need to run some commands on the command line (black Command Prompt window) to download and install pdfsizeopt. pdfsizeopt is a command-line-only application; there is no GUI.

Create folder C:\pdfsizeopt, download https://github.com/pts/pdfsizeopt/releases/download/2023-04-18/pdfsizeopt_win32exec-v9.zip , and extract its contents to the folder C:\pdfsizeopt, so that the file C:\pdfsizeopt\pdfsizeopt.exe exists.

Download https://raw.githubusercontent.com/pts/pdfsizeopt/master/pdfsizeopt.single and save it to C:\pdfsizeopt, as C:\pdfsizeopt\pdfsizeopt.single .

To optimize a PDF, run the following command:

  C:\pdfsizeopt\pdfsizeopt input.pdf output.pdf

Type this command on the command line, which is a black Command Prompt window. You can open it with Start menu / Run / cmd.exe, or by finding Command Prompt in the Start menu.

(Press Tab to get filename completion while typing.)

Since you have to type the input filename as a full pathname, it's recommended to create a directory with a short name (e.g. C:\pdfs), and copy the input PDF there first.

If the input PDF has many images or large images, pdfsizeopt can be very slow. You can speed it up by disabling pngout, the slowest image optimization method, like this:

  C:\pdfsizeopt\pdfsizeopt --use-pngout=no input.pdf output.pdf

To avoid typing C:\pdfsizeopt\pdfsizeopt, add C:\pdfsizeopt to (the end of) the system PATH, open a new Command Prompt window, and the command pdfsizeopt will work from any directory.

Depending on your environment, filenames with accented characters may not work in the Windows version of pdfsizeopt. To play it safe, make sure the names of your input and output files consist only of letters, numbers, underscore (_), dash (-), dot (.) and plus (+). The backslash (\) and the slash (/) are both OK as the directory separator.

Spaces in filenames and pathnames should work, but you need to put double quotes (") around the name.

Filenames with some punctuation characters (such as double quote ("), question mark (?) and asterisk (*)) and nonprintable characters (such as newline) will not work on Windows. This is because Windows doesn't support these characters ([\x00..\x1f"*:<>?|\x7f]) in filenames at all, and it uses / and \ as directory separator.
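
The conservative character set described above can be checked before calling pdfsizeopt. This is a minimal Python 3 sketch with a hypothetical helper name; allowing an optional drive prefix such as C: is an assumption added here so that full pathnames like the ones in this section pass the check.

```python
import re

# Optional drive prefix (e.g. C:), then only the characters the text lists as
# safe, plus backslash and slash as directory separators.
_CONSERVATIVE_PATH = re.compile(r"\A(?:[A-Za-z]:)?[A-Za-z0-9_+.\\/-]+\Z")


def is_conservative_windows_path(path):
    """Return True if `path` sticks to the conservative character set."""
    return _CONSERVATIVE_PATH.match(path) is not None
```

A False result does not necessarily mean that pdfsizeopt will fail (for example, names with spaces should work when wrapped in double quotes); it only flags names that fall outside the play-it-safe set.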

You can also put pdfsizeopt in a directory other than C:\pdfsizeopt , but it won't work if there is whitespace or there are accented characters in any of the folder names.

Please note that pdfsizeopt works perfectly in Wine (tested with wine-1.2 on Ubuntu Lucid and wine-1.6.2 on Ubuntu Trusty), but it's a bit slower than running it natively (as a Linux or Unix program).

Installation instructions and usage with Docker on Linux and macOS

These instructions work on the following systems: Linux amd64, macOS 64-bit Intel x86 (amd64, x86_64), macOS 64-bit ARM (Apple Silicon, e.g. M1 or M2 chip). The version of Linux or macOS doesn't matter (old systems such as macOS Leopard 10.5 also work), as long as it has Docker installed and working.

The programs in the Docker image ptspts/pdfsizeopt are compiled for Linux i386 (32-bit Intel x86), and these binaries happen to work on all platforms mentioned above, even with Apple Silicon. (Tested on 2023-02-21.)

There is no installer; you need to run some commands on the command line to download and install pdfsizeopt. pdfsizeopt is a command-line-only application; there is no GUI.

First, check that you have Docker installed properly by running this command and checking for the OK at the end:

  docker version && echo OK

If you don't get OK because the docker command was not found, then Docker is not installed on your computer. Installation instructions (on 2023-02-22):

  • To install Docker on Linux, you have two options: Docker Engine (https://docs.docker.com/engine/install/ , within the Server section) or Docker Desktop (https://docs.docker.com/desktop/install/linux-install/). Either of them works.

  • To install Docker on macOS, install Docker Desktop (https://docs.docker.com/desktop/install/mac-install/).

    Then (on macOS), add the docker command to your PATH by running the following command (copy-paste it, don't type, to avoid typos):

      (echo; echo 'export PATH="/Applications/Docker.app/Contents/Resources/bin:$PATH"') >>~/.profile
    

    Then (on macOS), close the Terminal app, and open it again (so that changes to ~/.profile take effect).

  • After the installation, retry the docker version command above.

Remove any previous Docker images of pdfsizeopt:

  docker image rm ptspts/pdfsizeopt

Do a test optimization run, which exercises all dependencies of pdfsizeopt:

  curl -L -o deptest.pdf https://github.com/pts/pdfsizeopt/raw/master/deptest/deptest.pdf
  docker run -v "$PWD:/workdir" -u "$(id -u):$(id -g)" --rm -it ptspts/pdfsizeopt pdfsizeopt deptest.pdf

If you get a (harmless) warning message like

  WARNING: The requested image's platform (linux/amd64) does not match the detected host platform (linux/arm64/v8) and no specific platform was requested

, and you don't want to get it again, then add --platform linux/amd64 after the -it:

  docker run -v "$PWD:/workdir" -u "$(id -u):$(id -g)" --rm -it --platform linux/amd64 ptspts/pdfsizeopt pdfsizeopt deptest.pdf

To optimize a PDF, run this command:

  docker run -v "$PWD:/workdir" -u "$(id -u):$(id -g)" --rm -it ptspts/pdfsizeopt pdfsizeopt input.pdf output.pdf

If the input PDF has many images or large images, pdfsizeopt can be very slow. You can speed it up by disabling pngout, the slowest image optimization method, like this:

  docker run -v "$PWD:/workdir" -u "$(id -u):$(id -g)" --rm -it ptspts/pdfsizeopt pdfsizeopt --use-pngout=no input.pdf output.pdf

pdfsizeopt creates lots of temporary files (psotmp.*) in the output directory, but it also cleans up after itself.

It's possible to optimize a PDF outside the current directory. To do that, specify the pathname (including the directory name) on the command line.

To avoid typing a long command, run

  (echo '#! /bin/sh'; echo 'exec docker run -v "$PWD:/workdir" -u "$(id -u):$(id -g)" --rm -it ptspts/pdfsizeopt pdfsizeopt "$@"') >pdfsizeopt && chmod 755 pdfsizeopt

, and then copy the pdfsizeopt script to your PATH, then open a new terminal window, and now this command will also work to optimize a PDF:

  pdfsizeopt input.pdf output.pdf
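
For callers that prefer to build the Docker invocation programmatically, the shell wrapper above can be mirrored in Python. This is a minimal Python 3 sketch, not something shipped with pdfsizeopt; the function name and its keyword parameters are hypothetical, while the docker options are the ones used in the commands of this section.

```python
import os


def docker_argv(pdfsizeopt_args, image="ptspts/pdfsizeopt", workdir=None,
                uid=None, gid=None):
    """Build the `docker run` argument list used throughout this section."""
    workdir = os.getcwd() if workdir is None else workdir
    uid = os.getuid() if uid is None else uid   # Unix only, like `id -u`.
    gid = os.getgid() if gid is None else gid   # Unix only, like `id -g`.
    return ["docker", "run",
            "-v", "%s:/workdir" % workdir,
            "-u", "%d:%d" % (uid, gid),
            "--rm", "-it", image, "pdfsizeopt"] + list(pdfsizeopt_args)
```

The resulting list can be passed to subprocess.call. Because the command includes -it, it is meant to be started from an interactive terminal, just like the shell version.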

Please note that the ptspts/pdfsizeopt Docker image is updated very rarely. To use a more up-to-date version of pdfsizeopt, run these commands to download:

  curl -L -o pdfsizeopt.single https://raw.githubusercontent.com/pts/pdfsizeopt/master/pdfsizeopt.single
  chmod +x pdfsizeopt.single

Then run this command to optimize a PDF:

  docker run -v "$PWD:/workdir" -u "$(id -u):$(id -g)" --rm -it ptspts/pdfsizeopt ./pdfsizeopt.single --use-pngout=no input.pdf output.pdf

If you want to have extra image optimizers included on Linux, use ptspts/pdfsizeopt-with-extraimgopt instead of ptspts/pdfsizeopt in the commands above. Example:

  docker run -v "$PWD:/workdir" -u "$(id -u):$(id -g)" --rm -it ptspts/pdfsizeopt-with-extraimgopt pdfsizeopt --use-image-optimizer=sam2p,jbig2,pngout,zopflipng,optipng,advpng,ECT input.pdf output.pdf

Installation instructions and usage on macOS

These instructions work on Macs running macOS Catalina 10.15 through macOS Ventura 13 (possibly also on older versions back to Mac OS X Snow Leopard 10.6, and on newer versions), with either a 64-bit ARM processor (Apple Silicon) or a 64-bit Intel x86 (x86_64, amd64) processor. The programs are compiled for 64-bit Intel x86 processors, and they work on 64-bit ARM processors as well, using the Rosetta 2 emulation in macOS. These instructions were tested and are known to work on macOS Ventura 13.3, both with a 64-bit Intel x86 (x86_64, amd64) processor and with Apple Silicon (ARM processor).

If you have an older Mac running Mac OS X Leopard 10.5 -- macOS Mojave 10.14, follow the section Installation instructions and usage on older macOS instead.

These instructions are not tested yet. See #154 for progress updates.

There is no installer; you need to run some commands on the command line to download and install pdfsizeopt. pdfsizeopt is a command-line-only application; there is no GUI.

To install pdfsizeopt on a macOS system, open a terminal window and run these commands (without the leading $):

  $ mkdir ~/pdfsizeopt
  $ cd ~/pdfsizeopt
  $ curl -L -o pdfsizeopt_libexec_darwin.tar.gz https://github.com/pts/pdfsizeopt/releases/download/2023-04-18/pdfsizeopt_libexec_darwinc64-v9.tar.gz
  $ tar xzvf pdfsizeopt_libexec_darwin.tar.gz
  $ rm -f    pdfsizeopt_libexec_darwin.tar.gz
  $ curl -L -o pdfsizeopt.single https://raw.githubusercontent.com/pts/pdfsizeopt/master/pdfsizeopt.single
  $ chmod +x pdfsizeopt.single
  $ ln -s pdfsizeopt.single pdfsizeopt

Do a test optimization run, which exercises all dependencies of pdfsizeopt:

  $ curl -L -o deptest.pdf https://github.com/pts/pdfsizeopt/raw/master/deptest/deptest.pdf
  $ ~/pdfsizeopt/pdfsizeopt deptest.pdf

... and open (view) deptest.pdf and the corresponding optimized deptest.pso.pdf .

To optimize a PDF, run the following command:

  ~/pdfsizeopt/pdfsizeopt input.pdf output.pdf

If the input PDF has many images or large images, pdfsizeopt can be very slow. You can speed it up by disabling pngout, the slowest image optimization method, like this:

  ~/pdfsizeopt/pdfsizeopt --use-pngout=no input.pdf output.pdf

Also, if you have a 32-bit Mac, then the pngout bundled with pdfsizeopt won't work (because it needs a 64-bit Mac), so you have to force --use-pngout=no . See the section Image optimizers for alternatives to pngout.

pdfsizeopt creates lots of temporary files (psotmp.*) in the output directory, but it also cleans up after itself.

It's possible to optimize a PDF outside the current directory. To do that, specify the pathname (including the directory name) on the command line.

Please note that the commands above download most dependencies (including Ghostscript, but excluding Python) as well. Everything should work as instructed above, out of the box. If you are experiencing problems, please report an issue on https://github.com/pts/pdfsizeopt/issues .

To avoid typing ~/pdfsizeopt/pdfsizeopt, add "$HOME/pdfsizeopt" to your PATH (probably in your ~/.bashrc), open a new terminal window, and the command pdfsizeopt will work from any directory.

You can also put pdfsizeopt in a directory other than ~/pdfsizeopt if you like.

Installation instructions and usage on older macOS

These instructions should work on older Macs running Mac OS X Leopard 10.5 -- macOS Mojave 10.14, and having a 32-bit or 64-bit Intel x86 processor. The programs are compiled for 32-bit Intel x86 (i386) processor (and also work on a 64-bit Intel processor with macOS Mojave 10.14 or earlier), except for the pngout tool, which needs at least Mac OS X Snow Leopard 10.6 and a 64-bit Intel processor.

There is no installer; you need to run some commands on the command line to download and install pdfsizeopt. pdfsizeopt is a command-line-only application; there is no GUI.

To install pdfsizeopt on an older macOS system, open a terminal window and run these commands (without the leading $):

  $ mkdir ~/pdfsizeopt
  $ cd ~/pdfsizeopt
  $ curl -L -o pdfsizeopt_libexec_darwin.tar.gz https://github.com/pts/pdfsizeopt/releases/download/2023-04-18/pdfsizeopt_libexec_darwin-v9.tar.gz
  $ tar xzvf pdfsizeopt_libexec_darwin.tar.gz
  $ rm -f    pdfsizeopt_libexec_darwin.tar.gz
  $ curl -L -o pdfsizeopt.single https://raw.githubusercontent.com/pts/pdfsizeopt/master/pdfsizeopt.single
  $ chmod +x pdfsizeopt.single
  $ ln -s pdfsizeopt.single pdfsizeopt

Do a test optimization run, which exercises all dependencies of pdfsizeopt:

  $ curl -L -o deptest.pdf https://github.com/pts/pdfsizeopt/raw/master/deptest/deptest.pdf
  $ ~/pdfsizeopt/pdfsizeopt deptest.pdf

... and open (view) deptest.pdf and the corresponding optimized deptest.pso.pdf .

To optimize a PDF, run the following command:

  ~/pdfsizeopt/pdfsizeopt input.pdf output.pdf

If the input PDF has many images or large images, pdfsizeopt can be very slow. You can speed it up by disabling pngout, the slowest image optimization method, like this:

  ~/pdfsizeopt/pdfsizeopt --use-pngout=no input.pdf output.pdf

Also, if you have a Mac with a 32-bit Intel x86 processor, then the pngout bundled with pdfsizeopt won't work (because it needs a 64-bit processor), so you have to force --use-pngout=no . See the section Image optimizers for alternatives to pngout.

pdfsizeopt creates lots of temporary files (psotmp.*) in the output directory, but it also cleans up after itself.

It's possible to optimize a PDF outside the current directory. To do that, specify the pathname (including the directory name) on the command line.

Please note that the commands above download most dependencies (including Ghostscript, but excluding Python) as well. Everything should work as instructed above, out of the box. If you are experiencing problems, please report an issue on https://github.com/pts/pdfsizeopt/issues .

To avoid typing ~/pdfsizeopt/pdfsizeopt, add "$HOME/pdfsizeopt" to your PATH (probably in your ~/.bashrc), open a new terminal window, and the command pdfsizeopt will work from any directory.

You can also put pdfsizeopt in a directory other than ~/pdfsizeopt if you like.

Installation instructions and usage on FreeBSD

There is no installer; you need to run some commands on the command line to download and install pdfsizeopt. pdfsizeopt is a command-line-only application; there is no GUI.

pdfsizeopt works perfectly on x86 FreeBSD systems with the Linux emulation layer enabled. So, enable the Linux emulation layer on your FreeBSD system, and then follow the Installation instructions and usage on Linux.

Alternatively, you can follow the Installation instructions and usage on generic Unix, but that needs much more work on your part (and it's inconvenient and error-prone), because you need to install many dependencies separately, possibly compiling some of them from source.

Installation instructions and usage on generic Unix

Doing this is increasingly hard in 2023, because pdfsizeopt needs Python 2.4--2.7 and Ghostscript 9.05, both very old, and thus hard to install to a modern system.

There is no installer; you need to run some commands on the command line to download and install pdfsizeopt. pdfsizeopt is a command-line-only application; there is no GUI.

pdfsizeopt is a Python script. It works with Python 2.4, 2.5, 2.6 and 2.7 (but it doesn't work with Python 3.x). So please install Python first.

Create a new directory named pdfsizeopt, and download this file into it: https://raw.githubusercontent.com/pts/pdfsizeopt/master/pdfsizeopt.single

Rename it to pdfsizeopt and make it executable by running the following commands (without the leading $):

  $ cd pdfsizeopt
  $ mv pdfsizeopt.single pdfsizeopt
  $ chmod +x pdfsizeopt

If your Python executable is not /usr/bin/python, then edit the first line (starting with #!) in the pdfsizeopt script accordingly.

Try it with:

  $ ./pdfsizeopt --version
  info: This is pdfsizeopt ZIP rUNKNOWN size=105366.

pdfsizeopt has many dependencies. For full functionality, you need all of them. Install as many as you can, and put them to the PATH.

Dependencies:

  • Python (command: python). Version 2.4, 2.5, 2.6 and 2.7 work (3.x doesn't work).
  • Ghostscript (command: gs): Version 9.05 is recommended, 8.50 should also work, and some early 9.x versions such as 9.14.1 also work. The most recent versions don't work, especially for font optimization.
  • jbig2 (command: jbig2): Install from source: https://github.com/pts/pdfsizeopt-jbig2 If you are unable to install, use pdfsizeopt --use-jbig2=no .
  • pngout (command: pngout): Download binaries from here: http://www.jonof.id.au/kenutils Source code is not available. If you are unable to install, use pdfsizeopt --use-pngout=no .
  • imgdataopt (command: imgdataopt): Install from source: https://github.com/pts/imgdataopt To make pdfsizeopt able to use it, copy the imgdataopt program file as sam2p (e.g. /usr/local/bin/sam2p) to your PATH. If you are unable to install it, use pdfsizeopt --do-optimize-images=no . Some Linux distributions have sam2p binaries, but they tend to be too old. Alternatively, sam2p >=0.49.3 + png22pnm also works instead of imgdataopt, but imgdataopt is easier to install.
  • The Multivalent PDF compressor (written in Java) is an optional dependency of pdfsizeopt, turned off by default. Don't bother installing it.
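
Before the first run, it can help to see which of the commands listed above are on the PATH and which fallback flag applies to each missing one. The following is a minimal Python 3 sketch under stated assumptions: the function and table names are hypothetical, and the lookup is injectable so that it can be exercised without the tools installed. It only checks that a command exists, not that its version is suitable (for example, a python command may turn out to be Python 3).

```python
import shutil

# Command name -> flag the list above suggests when the tool is unavailable
# (None means the text offers no fallback flag for it).
FALLBACK_FLAGS = {
    "python": None,
    "gs": None,
    "jbig2": "--use-jbig2=no",
    "pngout": "--use-pngout=no",
    "sam2p": "--do-optimize-images=no",
}


def missing_dependencies(which=shutil.which):
    """Return {command: fallback flag or None} for commands not on PATH."""
    return {cmd: flag for cmd, flag in FALLBACK_FLAGS.items()
            if which(cmd) is None}
```

Calling missing_dependencies() with no arguments uses the real PATH; an empty dictionary means every listed command was found.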

After installation, use pdfsizeopt as:

  $ ./pdfsizeopt input.pdf output.pdf

You can add the directory containing pdfsizeopt to the PATH, so the command pdfsizeopt will work from any directory.

Image optimizers

pdfsizeopt can use the following external tools to make images embedded in PDF files smaller:

  • sam2p (used by default, cannot be disabled)
  • jbig2 (used by default, disable with --use-jbig2=no)
  • pngout (used by default, disable with --use-pngout=no)
  • zopflipng (not enabled by default)
  • optipng (not enabled by default)
  • advpng (not enabled by default)
  • ECT (not enabled by default)

To enable or disable any image optimizer, specify all image optimizers you want to be enabled like this: --use-image-optimizer=optipng,jbig2 . This will also disable the default pngout.

You can also specify custom image optimizer command patterns by specifying separate, additional --use-image-optimizer= flags, like this:

  --use-image-optimizer="optipng %(sourcefnq)s -o6 -fix -force %(optipng_gray_flags)s-out %(targetfnq)s"

You always have to specify %(targetfnq)s in the command pattern.
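
The placeholders in a command pattern use the same syntax as Python's % string formatting with a dictionary. The sketch below (Python 3) shows how such a pattern expands; that pdfsizeopt substitutes the values exactly this way is an assumption based on the placeholder syntax, and the function name and the example values are hypothetical.

```python
def expand_pattern(pattern, values):
    """Expand %(name)s placeholders with Python's % operator."""
    if "%(targetfnq)s" not in pattern:
        raise ValueError("command pattern must contain %(targetfnq)s")
    return pattern % values


PATTERN = ("optipng %(sourcefnq)s -o6 -fix -force "
           "%(optipng_gray_flags)s-out %(targetfnq)s")

# Hypothetical values; the real ones are chosen by pdfsizeopt at run time.
EXAMPLE_VALUES = {
    "sourcefnq": "in.png",
    "targetfnq": "out.png",
    "optipng_gray_flags": "",
}
```

With these example values, expand_pattern(PATTERN, EXAMPLE_VALUES) returns optipng in.png -o6 -fix -force -out out.png .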

Specify --do-debug-image-optimizers=yes to see which image optimizers are enabled (and their full command-line) for the current run.

At startup, pdfsizeopt checks that the requested image optimizers are available (as program files), and fails if some of them are missing. To ignore those which are missing, specify --do-require-image-optimizers=no .

It's your (the user's) responsibility to install the image optimizers and add them to the PATH. If you follow the installation instructions for Windows and Linux above, the default image optimizers (sam2p, jbig2 and pngout) will be installed for you. For Linux, there are also installation instructions above for extra image optimizers (zopflipng, optipng, advpng and ECT).

Troubleshooting

1. pdfsizeopt fails for some fonts.

Specify --do-unify-fonts=no and --do-regenerate-all-fonts=no .

If it still fails, specify --do-optimize-fonts=no .

In either case, please report it on https://github.com/pts/pdfsizeopt/issues
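
The two-step fallback above can be automated. This is a minimal Python 3 sketch with hypothetical names; it assumes that --do-optimize-fonts=no is tried on its own in the last step (the text does not say whether it should be combined with the earlier flags), and it takes the command runner as a parameter.

```python
FONT_FALLBACKS = [
    [],                                                   # Default settings.
    ["--do-unify-fonts=no", "--do-regenerate-all-fonts=no"],
    ["--do-optimize-fonts=no"],
]


def first_working_flags(run, input_pdf, output_pdf, program="pdfsizeopt"):
    """Try each flag set in order; return the first for which run() is true.

    `run` takes an argument list and returns True on success. Returns None
    if every attempt fails.
    """
    for flags in FONT_FALLBACKS:
        if run([program] + flags + [input_pdf, output_pdf]):
            return flags
    return None
```

A possible runner is lambda argv: subprocess.call(argv) == 0 . Whichever flag set ends up working, the failure with the default settings is still worth reporting.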

2. pdfsizeopt fails for some images.

Specify --do-optimize-images=no .

Please report it on https://github.com/pts/pdfsizeopt/issues

3. pdfsizeopt is too slow processing images.

Specify --use-pngout=no . This disables pngout, which is the slowest optimization step for images.

4. pdfsizeopt fails without creating the output PDF.

Please report it on https://github.com/pts/pdfsizeopt/issues , attaching the input PDF file and the console output of pdfsizeopt. Your report is very much appreciated.

If pdfsizeopt exits with an uncaught exception, it may leave some temporary files (psotmp.*) behind in the current directory. You can remove these files.
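
To find such leftovers, the psotmp.* pattern can be matched with a few lines of Python 3. This is a minimal sketch with a hypothetical function name; it only lists the files, so you can review them (and make sure no pdfsizeopt run is still in progress) before deleting anything.

```python
import glob
import os


def leftover_temp_files(directory="."):
    """Return sorted paths of psotmp.* files found in `directory`."""
    return sorted(glob.glob(os.path.join(glob.escape(directory), "psotmp.*")))
```

Each returned path can then be removed with os.remove.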

Please note that pdfsizeopt is not resilient in processing corrupt PDF files (i.e. those which are not compliant to the PDF standard). So if pdfsizeopt fails, then the reason may be a bug in pdfsizeopt or a corrupt PDF input file. Nevertheless, please report an issue (see above).

5. The output PDF of pdfsizeopt doesn't look the same as the input PDF.

Please report it on https://github.com/pts/pdfsizeopt/issues , attaching the input PDF file and the output PDF file (.pso.pdf) and the console output of pdfsizeopt. Your report is very much appreciated.

6. pdfsizeopt is unable to find some input files on Windows.

This may happen if the filename or the full pathname contains any character other than the ASCII letters (a-z and A-Z), digits (0-9), underscore (_), ASCII dash (-), plus (+), dot (.), backslash (\) or slash (/). Typically these characters don't work:

  • spaces and tabs: This is easy to fix, just wrap the filename in double quotes ("), the usual way.

  • double quotes ("): This can't happen, because filenames on Windows are not allowed to contain double quotes. If you need to pass a non-filename argument with a double quote in it to pdfsizeopt, do this: wrap the argument in double quotes ("), replace all double quotes (") with \", and (in parallel to the previous replacement) replace a sequence of backslashes (\) and a double quote (") immediately following them by duplicating the backslashes and replacing the double quote (") with \". This sounds complicated, but this is the usual way for other programs as well, see https://stackoverflow.com/a/4094897/97248 .

  • newlines and other non-space whitespace: This won't work, the Windows Command Prompt (cmd.exe) doesn't allow these characters in command-line arguments. Also Windows doesn't allow them in filenames.

  • accented characters (such as á and ő). These characters won't work (or may work for only some characters, depending on the active code page) in the PDF filename specified on the command line, or in the full pathname of pdfsizeopt (so don't install pdfsizeopt to C:\bőr, it won't work).

    Accented characters (outside the active code page) will not work in the full pathname of pdfsizeopt (such as C:\bőr\pdfsizeopt.exe). That's because Python is unable to call external programs (os.system, os.popen, os.spawnl and subprocess.call) with accented characters in their name, because it uses the single-byte API.

  • anything which is not ASCII printable (code between 33 and 126, inclusive): If not covered above, this may not work. See the description of accented characters.
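
The double-quote rule from the list above can be written down as a short function. This is a minimal Python 3 sketch with a hypothetical function name. It also doubles backslashes at the very end of the argument, a step the text does not spell out but which is needed so that they do not escape the closing double quote.

```python
import re


def quote_windows_argument(arg):
    """Quote one argument following the double-quote rule described above."""
    # A run of backslashes directly before a double quote is doubled, and the
    # double quote itself becomes \".
    escaped = re.sub(r'(\\*)"', lambda m: m.group(1) * 2 + '\\"', arg)
    # Not spelled out in the text: backslashes at the very end are doubled
    # too, so that they do not escape the closing double quote.
    escaped = re.sub(r'(\\+)\Z', lambda m: m.group(1) * 2, escaped)
    return '"' + escaped + '"'
```

For example, the argument say "hi" becomes "say \"hi\"" . Characters that are special to cmd.exe itself are outside the scope of this sketch.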

If some filenames still don't work, the workarounds are:

  • renaming or copying the file (and folders) in Windows Explorer, and passing the renamed file to pdfsizeopt
  • using pdfsizeopt on a Unix system (e.g. Linux, FreeBSD, macOS) instead

Accented characters in the PDF filename could be made to work in the following way (as a future improvement to pdfsizeopt):

  • pdfsizeopt.exe should call the 16-bit API (GetCommandLineW) instead of the single-byte API (GetCommandLineA) to get the arguments

  • pdfsizeopt.exe should escape the non-ASCII characters in the arguments (e.g. as U+12AB)

  • pdfsizeopt.exe should run pdfsizeopt.single like this:

    .../pdfsizeopt_win32exec/pdfsizeopt_python.exe .../pdfsizeopt.single --args-u+ ...

  • pdfsizeopt Python code should recognize --args-u+, and when finding the filename, it should convert it to unicode (by keeping ASCII except for U+12AB), and it should pass the unicode-typed value to open(...). Such an open(...) works in Python 2.6 on Windows.

  • When displaying filenames, pdfsizeopt Python code should still display the ASCII with the U+12AB escaping. Thus the win32console module is not needed. Thus filenames will be displayed legibly but incorrectly (not copy-pasteably) in the Command Prompt window.

  • No escaping is needed in command lines of helper programs (e.g. gs, sam2p), because it's all ASCII, because filenames are autogenerated temporary file names, which are all ASCII, and the path to pdfsizeopt itself is required to be ASCII.
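
The U+12AB escaping proposed in the list above could look like the following. This is only a Python 3 sketch of an idea that pdfsizeopt does not implement: the function names are hypothetical, and the fixed width of four uppercase hexadecimal digits (which limits it to characters up to U+FFFF) is an assumption taken from the U+12AB example.

```python
import re


def escape_non_ascii(name):
    """Replace each non-ASCII character with U+ and four hex digits."""
    out = []
    for ch in name:
        code = ord(ch)
        if code < 128:
            out.append(ch)
        elif code <= 0xFFFF:
            out.append("U+%04X" % code)
        else:
            raise ValueError("character above U+FFFF is not supported here")
    return "".join(out)


def unescape_non_ascii(escaped):
    """Reverse escape_non_ascii(); the input is assumed to be its output."""
    return re.sub(r"U\+([0-9A-F]{4})",
                  lambda m: chr(int(m.group(1), 16)), escaped)
```

With this sketch, bőr.pdf is escaped as bU+0151r.pdf and unescaped back to bőr.pdf . A name that already contains literal text of the form U+ followed by four hexadecimal digits would be changed by unescaping, so a real implementation would need a rule for that case.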

Accented characters in the pathname of pdfsizeopt.single can be made to work this way (as a future improvement to pdfsizeopt):

  • First implement the handling of accented characters in the PDF filename, as described above.

  • pdfsizeopt.exe should use wgetcwd to get the current directory.

  • pdfsizeopt.exe should use wchdir to change to the directory of pdfsizeopt.single .

  • pdfsizeopt.exe should prepend the directories pdfsizeopt_win32exec and pdfsizeopt_win32exec/pdfsizeopt_gswin to the PATH, using wputenv.

  • pdfsizeopt.exe should run pdfsizeopt.single like this:

      pdfsizeopt_python.exe pdfsizeopt.single --args-u+ --cwd=... ...
    

    , where the value of --cwd= is the escaped (U+12AB) version of the result of wgetcwd.

  • pdfsizeopt Python code should prepend the value of --cwd=... to the input filename if it's relative.

  • pdfsizeopt Python code shouldn't modify the PATH if --cwd=... is present. (Does this environment variable propagation work in Python 2.6? Let's try!)

  • It's still true that no escaping is needed in command lines of external programs (e.g. gs, sam2p), because it's all ASCII, because temporary file names are all ASCII, and the path to pdfsizeopt itself is required to be ASCII. Escaping is needed if the pathname of the temporary directory (TEMP variable) needs escaping.

7. Error on Windows: The application failed to initialize properly (0xc0000034). Click on OK to terminate the application.

This error has happened on a Windows XP system. The solution: download msvcr90.dll (or find it somewhere already on your system), and copy it into pdfsizeopt_win32exec (next to python26.dll). Any version of msvcr90.dll will work:

  • msvcr90.dll 9.0.21022.8 (655872 bytes)
  • msvcr90.dll 9.0.30729.6161 (653136 bytes)
  • msvcr90.dll 9.0.30729.9247 (653968 bytes)

8. Error on Windows: The system cannot execute the specified command.

This error has happened on a Windows XP system when the file Microsoft.VC90.CRT.manifest was missing from the pdfsizeopt_win32exec directory. The solution: reinstall pdfsizeopt; the directory pdfsizeopt_win32exec in the newest version has that file.

9. Ghostscript errors with Type1CParser and Type1CConverter

Please install pdfsizeopt by following the installation instructions on https://github.com/pts/pdfsizeopt . By doing so, pdfsizeopt will use Ghostscript 9.05 bundled with it, and it will work.

More documentation

pdfsizeopt's Issues

AssertionError: found 1 duplicate font objs in GS output

Dear @pts,

I just discovered another file here that I processed (also) with Acrobat Pro 8 in the past and pdfsizeopt has problems with it:

info: This is pdfsizeopt rUNKNOWN size=378552.
info: prepending to PATH: /home/rbrito/Downloads/pdfsizeopt
info: loading PDF from: 10.1.1.121.6177.opt.pdf
info: loaded PDF of 78920 bytes
info: using Ghostscript /usr/bin/gs: GPL Ghostscript 9.21 (2017-03-16)
info: decompressing 48 bytes with Ghostscript /Filter/FlateDecode/DecodeParms <</Columns 4/Predictor 12>>
info: decompressing 170 bytes with Ghostscript /Filter/FlateDecode/DecodeParms <</Columns 5/Predictor 12>>
info: found 101 obj offsets and 4 obj streams in xref stream
info: separated to 95 objs + xref + trailer
info: parsed 95 objs
info: found 2 Type1 fonts loaded
info: writing Type1CConverter (6859 font bytes) to: psotmp.22773.conv.tmp.ps
info: executing Type1CConverter with Ghostscript: gs -q -P- -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -dPDFSETTINGS=/printer -dColorConversionStrategy=/LeaveColorUnchanged -sOutputFile=psotmp.22773.conv.tmp.pdf -f psotmp.22773.conv.tmp.ps
Type1CConverter: using interpreter GPL Ghostscript 921 20170316
Type1CConverter: converting font /FKBKGI+mtsy to /Obj0000000057
warning: using glyphshow for unencoded glyph: /.notdef
Type1CConverter: converting font /FKBKGI+rmtmi to /Obj0000000061
warning: using glyphshow for unencoded glyph: /.notdef
Type1CConverter: all OK
info: loading PDF from: psotmp.22773.conv.tmp.pdf
info: loaded PDF of 6523 bytes
info: separated to 23 objs + xref + trailer
info: parsed 23 objs
error: duplicate font /Obj0000000057 obj old=57 new=17
info: found 3 fonts in GS output
Traceback (most recent call last):
  File "/home/rbrito/Downloads/pdfsizeopt/pdfsizeopt", line 41, in <module>
    sys.exit(main.main(sys.argv, script_dir=script_dir))
  File "/home/rbrito/Downloads/pdfsizeopt/lib/pdfsizeopt/main.py", line 9092, in main
    pdf.ConvertType1FontsToType1C()
  File "/home/rbrito/Downloads/pdfsizeopt/lib/pdfsizeopt/main.py", line 5626, in ConvertType1FontsToType1C
    TMP_PREFIX + 'conv.tmp.ps', TMP_PREFIX + 'conv.tmp.pdf')
  File "/home/rbrito/Downloads/pdfsizeopt/lib/pdfsizeopt/main.py", line 5452, in GenerateType1CFontsFromType1
    do_obj_num_from_font_name=True, where='in GS output')
  File "/home/rbrito/Downloads/pdfsizeopt/lib/pdfsizeopt/main.py", line 5367, in GetFonts
    'found %d duplicate font objs %s' % (duplicate_count, where))
AssertionError: found 1 duplicate font objs in GS output

I'm attaching here the corresponding file:

10.1.1.121.6177.opt.pdf

Oh, just like the previous issue, which I filed a few minutes ago, the problem can be worked around if I specify --do-optimize-fonts=no.

Thanks,

Rogério.

/First and obj offsets 1 too high in generated /Type /ObjStm

I ran pdfsizeopt (commit 02450cf) on a PDF file created by pdfTeX and both atril (a poppler-based PDF viewer) and diffpdf (also poppler-based) refuse to open the resulting file.

I'm uploading both the file before and after processing with: ~/Downloads/pdfsizeopt/pdfsizeopt --use-pngout=no --use-multivalent=no E2000.pdf

E2000.pdf
E2000.pso.pdf

mupdf and gv (via ghostscript) can show the resulting file.

If there is anything else that I can provide, please let me know.
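For reference, the layout this issue is about can be sketched in Python. In a PDF object stream, the decoded data starts with N pairs of integers (object number, byte offset), followed by the objects themselves; /First is the byte offset of the first object, and each offset in the pairs is relative to /First. The helper name build_obj_stream below is hypothetical and is not pdfsizeopt's actual code; it only illustrates where an off-by-one error in /First or in the offsets would come from.

```python
def build_obj_stream(objs):
    """Builds the decoded data of a PDF object stream (/Type /ObjStm).

    Hypothetical helper for illustration, not taken from pdfsizeopt.

    Args:
      objs: List of (obj_num, serialized_obj_bytes) pairs.
    Returns:
      (data, first, n), where first is the value for /First and n for /N.
    """
    pairs = []
    bodies = []
    offset = 0  # Offset of the current object, relative to /First.
    for obj_num, body in objs:
        pairs.append(b'%d %d' % (obj_num, offset))
        bodies.append(body)
        offset += len(body) + 1  # +1 for the newline between objects.
    header = b' '.join(pairs) + b'\n'
    data = header + b'\n'.join(bodies)
    # /First is the length of the header: the first object starts right
    # after it, at relative offset 0.
    return data, len(header), len(objs)


data, first, n = build_obj_stream([(10, b'<</A 1>>'), (11, b'[1 2]')])
print(data, first, n)
```

A reader locates object i at data[first + offset_i:], so if /First or the offsets are 1 too high, strict parsers start reading one byte into each object.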

add lossless optimizations for JPEG images embedded into the PDF

  1. EXIF tags etc. can be removed by Python code, see Removing JPEG metadata (e.g. comments, JFIF, Exif etc.) manually in info.txt for details. Don't use jpegtran -copy none, because it would keep some unnecessary metadata.

  2. Smaller Huffman tables can be generated. jpegtran -optimize can do this. mozjpeg's jpegtran doesn't create any smaller files (this is not true, double check the size difference). This is equivalent to the lossless mode of jpegoptim and imgopt.

  3. This is only true for PDF 1.2 and earlier: jpgcrush and jpegrescan cannot be used, because they create progressive JPEG output, which PDF doesn't support. FYI: if mozjpeg's jpegtran is used, then it should be invoked with -revert, otherwise it enables -progressive by default.

  4. For research, try jpgcrush, jpegrescan and mozjpeg's jpegtran -optimize (all of these create progressive JPEG). Chances are that mozjpeg always produces the smallest output, so the other two don't have to be invoked by pdfsizeopt. Also upgrade the PDF version number to at least 1.3.
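As an illustration of item 1, here is a minimal Python sketch of removing metadata segments from a JPEG file without touching the image data. The function name strip_jpeg_segments and the default set of dropped markers (APP1 for Exif/XMP, APP13 for Photoshop data, COM for comments) are assumptions made for this example; info.txt is the authoritative list of what is safe to remove, and some segments (e.g. the Adobe APP14 marker) can affect how colors are decoded, so they are kept here.

```python
import struct

SOS = 0xDA  # Start of scan: entropy-coded image data follows.


def strip_jpeg_segments(data, drop_markers=(0xE1, 0xED, 0xFE)):
    """Returns JPEG data with the given marker segments removed.

    Hypothetical helper for illustration. Only segments before the first
    SOS marker are examined; everything from SOS on is copied unchanged.
    Fill bytes (repeated 0xFF) before a marker are not supported.
    """
    if data[:2] != b'\xff\xd8':
        raise ValueError('not a JPEG file: SOI marker missing')
    output = [data[:2]]
    i = 2
    while True:
        if len(data) < i + 4 or data[i:i + 1] != b'\xff':
            raise ValueError('bad or truncated JPEG segment at %d' % i)
        # Each segment is: 0xFF, marker byte, 2-byte big-endian size.
        # The size includes the 2 size bytes, but not the marker.
        marker, size = struct.unpack('>BH', data[i + 1:i + 4])
        if marker == SOS:
            output.append(data[i:])
            break
        if size < 2 or i + 2 + size > len(data):
            raise ValueError('bad JPEG segment size at %d' % i)
        if marker not in drop_markers:
            output.append(data[i:i + 2 + size])
        i += 2 + size
    return b''.join(output)
```

Because the scan data is copied byte for byte, this is lossless for the pixels; it can be combined with a Huffman table optimizer such as jpegtran -optimize.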

Resuming work on pdfsizeopt?

Hi, @pts.

I see that the repository on Google was closed. That being said, I thought that, perhaps, one might want to resume work on pdfsizeopt.

I would like to contribute, but I would also like to use some features that are only present in newer versions of Python. What should I do?

Thanks,
Rogério Brito.

Missing image with /Filter/JBIG2Decode

I reduced a file that generates a blank page (but it has OCR'ed text) to what I believe is the bare minimum and, since I believe this is a different problem from the one reported before, I am attaching it.

I have other, unrelated documents which seem to have the same behavior.

I am sorry if this is a duplicate of anything that I sent you earlier.

p467.pdf

requirements.txt

What packages do you need to import to use this? Please provide a requirements.txt

Problem parsing PDF file

Dear @pts,

I found a file that seems to give pdfsizeopt some headaches when parsing it. If I preprocess the file with ghostscript, qpdf or with pdftk, then pdfsizeopt can optimize it. Otherwise, it crashes with an assertion error.

Since the file is most likely "sensitive", I am sending you a link instead, but opening this issue, so that you can better organize yourself with the issue tracker.

Thanks,

Rogério.

improve Flate compression with Zopfli, ECT and advzip

This applies not to images embedded in the PDF, but to all calls to zlib.compress in main.py. Some of the Flate compressors are very slow, so we probably need some caching which persists across invocations of pdfsizeopt.
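One possible shape for such a cache, as a minimal Python 3 sketch: compressed output is stored in a directory, keyed by a hash of the uncompressed input. The function name cached_flate_compress, the cache_dir layout and the tag parameter are all assumptions made for this example; a slow compressor such as Zopfli would be plugged in through the compress argument, with zlib.compress used here only as a stand-in.

```python
import hashlib
import os
import zlib


def cached_flate_compress(data, cache_dir, compress=None, tag=b'zlib9'):
    """Compresses data, reusing earlier results stored in cache_dir.

    Hypothetical helper for illustration. tag identifies the compressor
    and its settings, so that outputs of different compressors don't mix.
    """
    if compress is None:
        compress = lambda d: zlib.compress(d, 9)
    key = hashlib.sha256(tag + b'\0' + data).hexdigest()
    path = os.path.join(cache_dir, key + '.flate')
    try:
        with open(path, 'rb') as f:
            return f.read()  # Cache hit: skip the slow compressor.
    except OSError:
        pass  # Cache miss: fall through and compress.
    compressed = compress(data)
    os.makedirs(cache_dir, exist_ok=True)
    # Write to a temporary file first, then rename it, so that a
    # concurrent or interrupted run never leaves a partial cache entry.
    tmp_path = '%s.tmp.%d' % (path, os.getpid())
    with open(tmp_path, 'wb') as f:
        f.write(compressed)
    os.replace(tmp_path, path)
    return compressed
```

Keying by content rather than by file name means the cache also helps when the same font or stream appears in several PDFs.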

Missing or black images in optimized PDFs

Hi.

I have many files (I can send them privately) that show two (related/unrelated?) problems:

  • the images go completely missing
  • the images get black

Here is a link to one file that shows the problem: http://cs.stanford.edu/people/nick/compdocs/Practical_HI_Examples.pdf

For a second PDF file, I inserted some asserts, and the output below shows that the PS file (and the resulting PNG file) generated by Ghostscript is already black:

info: This is pdfsizeopt rUNKNOWN size=349935.
info: prepending to PATH: /home/rbrito/Downloads/pdfsizeopt
info: loading PDF from: parallel-sieve-of-erathostenes.pdf
info: loaded PDF of 241349 bytes
info: separated to 95 objs + xref + trailer
info: found 0 Type1 fonts loaded
info: found 0 Type1C fonts loaded
info: will optimize image XObject 58; orig width=312 height=237 colorspace=/Indexed/DeviceGray bpc=8 filter=None dp=0 size=74362 gs_device=pnggray
info: optimizing 1 images of 74362 bytes in total
info: writing ImageRenderer (74387 image bytes) to: psotmp.9124.conv.pnggray.tmp.ps
info: using Ghostscript gs: GPL Ghostscript 9.21 (2017-03-16)
info: executing ImageRenderer with Ghostscript: gs -q -P- -dNOPAUSE -dBATCH -sDEVICE=pnggray -sOutputFile='psotmp.9124.img-%04d.pnggray.tmp.png' -f psotmp.9124.conv.pnggray.tmp.ps
ImageRenderer: rendering image XObject 58 width=312 height=237 bpc=8 colorspace=[/Indexed /DeviceGray] filter=null decodeparms=null device=pnggray
ImageRenderer: all OK
Traceback (most recent call last):
  File "/home/rbrito/Downloads/pdfsizeopt/pdfsizeopt", line 37, in <module>
    sys.exit(main.main(sys.argv))
  File "/home/rbrito/Downloads/pdfsizeopt/lib/pdfsizeopt/main.py", line 8609, in main
    pdf.OptimizeImages(use_pngout=use_pngout, use_optipng=use_optipng, use_jbig2=use_jbig2)
  File "/home/rbrito/Downloads/pdfsizeopt/lib/pdfsizeopt/main.py", line 6710, in OptimizeImages
    assert False
AssertionError

(This is a patched version to use optipng, but I am not using it here).

I'm attaching the PDF, and the respective PS and PNG files.

parallel-sieve-of-erathostenes.pdf
psotmp.9124.conv.pnggray.tmp.ps.gz
psotmp 9124 img-0001 pnggray tmp

Thanks in advance,

Rogério Brito.

Problem when calling Multivalent on files

It seems that calling Multivalent got broken sometime in the last few days.

If I use the same p040.pdf that I uploaded to the last issue, but now enabling Multivalent, then I get the following error:

(...)
info: loading image from: psotmp.19025.img-12.sam2p-np.pdf
info: loading PDF from: psotmp.19025.img-12.sam2p-np.pdf
info: loaded PDF of 27097 bytes
info: separated to 5 objs + xref + trailer
info: parsed 5 objs
info: loaded PNG IDAT of 26018 bytes and PLTE of 171 bytes
info: executing image converter sam2p_pr: sam2p -c zip:15:9 -- psotmp.19025.img-12.parse.png psotmp.19025.img-12.sam2p-pr.png
This is sam2p .
Available Loaders: PS PDF JAI PNG JPEG TIFF PNM BMP GIF LBM XPM PCX TGA.
Available Appliers: XWD Meta Empty BMP PNG TIFF6 TIFF6-JAI JPEG-JAI JPEG PNM GIF89a+LZW XPM PSL1C PSL23+PDF PSL2+PDF-JAI P-TrOpBb.
sam2p: Notice: PNM: loaded alpha, but no transparent pixels
sam2p: Notice: job: read InputFile: psotmp.19025.img-12.parse.png
sam2p: Notice: applyProfile: applied OutputRule #12
sam2p: Notice: job: written OutputFile: psotmp.19025.img-12.sam2p-pr.png
Success.
info: loading image from: psotmp.19025.img-12.sam2p-pr.png
info: loaded PNG IDAT of 24173 bytes and PLTE of 171 bytes
info: optimized image XObject 12 file_name=psotmp.19025.img-12.sam2p-pr.png size=24569 (60%) methods=sam2p_pr:24569,sam2p_np:26372,#orig:40949,parse:40949
info: saved 16380 bytes (40%) on optimizable images
info: optimized 5 streams, kept 5 #orig
info: eliminated 3 unused objs in 3 classes
info: saving PDF with 14 objs with Multivalent to: p040.psom.pdf
info: writing Multivalent input PDF: psotmp.19025.conv.mi.tmp.pdf
info: generated object stream of 1097 bytes in 8 objects (23%)
Traceback (most recent call last):
  File "/home/rbrito/Downloads/pdfsizeopt/pdfsizeopt", line 41, in <module>
    sys.exit(main.main(sys.argv, script_dir=script_dir))
  File "/home/rbrito/Downloads/pdfsizeopt/lib/pdfsizeopt/main.py", line 8941, in main
    not f.do_decompress_most_streams))
  File "/home/rbrito/Downloads/pdfsizeopt/lib/pdfsizeopt/main.py", line 8495, in Save
    multivalent_compress_command=multivalent_compress_command)
  File "/home/rbrito/Downloads/pdfsizeopt/lib/pdfsizeopt/main.py", line 8382, in _RunMultivalent
    may_obj_heads_contain_comments=may_obj_heads_contain_comments)
  File "/home/rbrito/Downloads/pdfsizeopt/lib/pdfsizeopt/main.py", line 5017, in AppendSerializedPdf
    pdf_obj.Set('Subtype', 'ImagE')
  File "/home/rbrito/Downloads/pdfsizeopt/lib/pdfsizeopt/main.py", line 1478, in Set
    value = self.ParseSimpleValue(value)
  File "/home/rbrito/Downloads/pdfsizeopt/lib/pdfsizeopt/main.py", line 1807, in ParseSimpleValue
    raise PdfTokenParseError('Syntax error in %r' % data)
pdfsizeopt.main.PdfTokenParseError: Syntax error in 'ImagE'

The command line in use here was:

~/Downloads/pdfsizeopt/pdfsizeopt --use-pngout=no --use-multivalent=yes p040.pdf

For convenience, here goes p040.pdf again.

p040.pdf

Thanks a lot for the work on pdfsizeopt, BTW.

/Filter/Fl

pdfsizeopt doesn't recognize /Filter/Fl in <</Filter/Fl/First 24/Length 102/N 4/Type/ObjStm>>stream. It should recognize it as /Filter/FlateDecode. Also /Filter/A85 should be recognized as /Filter/ASCII85Decode.

There are some PDFs in the wild which have abbreviated filter names. They are not valid PDF, so pdfsizeopt should unabbreviate the filter names.

% Contrary to the published PDF (1.3) specification, Acrobat Reader
% accepts abbreviated filter names everywhere, not just for in-line images,
% and some applications (notably htmldoc) rely on this.
/unabbrevfilterdict mark
  /AHx /ASCIIHexDecode  /A85 /ASCII85Decode  /CCF /CCITTFaxDecode
  /DCT /DCTDecode  /Fl /FlateDecode  /LZW /LZWDecode  /RL /RunLengthDecode
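The same mapping can be sketched in Python. The table below contains only the abbreviations listed in the snippet above; the names UNABBREVIATED_FILTERS and unabbreviate_filter_name are hypothetical and not taken from pdfsizeopt's main.py.

```python
# Abbreviated filter name -> full filter name, without the leading slash.
UNABBREVIATED_FILTERS = {
    'AHx': 'ASCIIHexDecode',
    'A85': 'ASCII85Decode',
    'CCF': 'CCITTFaxDecode',
    'DCT': 'DCTDecode',
    'Fl': 'FlateDecode',
    'LZW': 'LZWDecode',
    'RL': 'RunLengthDecode',
}


def unabbreviate_filter_name(name):
    """Returns the full form of a PDF filter name such as '/Fl'.

    Names which are not known abbreviations are returned unchanged.
    """
    if not name.startswith('/'):
        raise ValueError('PDF name expected, got: %r' % (name,))
    return '/' + UNABBREVIATED_FILTERS.get(name[1:], name[1:])


print([unabbreviate_filter_name(n) for n in ('/A85', '/Fl', '/JBIG2Decode')])
```

For a /Filter array, the function would be applied to each element in turn.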

pdfsizeopt can't find Ghostscript if the PDF is specified as an UNC path on Windows

I'm working with Emacs under Windows 8.1 in a network. The call of pdfsizeopt from inside Emacs is something like:
pdfsizeopt.exe XXX.pdf XXX.pdf
but the filename plus path is
//Sbs2011/path/to/file/XXX.pdf
Under Emacs this results in the error that pdfsizeopt is not able to find a working Ghostscript, like I described here: #15

I asked for help on the AUCTeX mailing list today and, after bisecting my whole .emacs, I finally discovered that it depends on the path! If the file resides in a UNC path, Emacs runs pdfsizeopt, but pdfsizeopt hangs when trying to locate Ghostscript!

Is this even a bug? And if so, in which software? Windows, Emacs, pdfsizeopt?

More blank images when processing PDF files

Hi, @pts,

It seems that pdfsizeopt has problems processing at least one file that I just bought from O'Reilly promotion with humblebundle: https://www.humblebundle.com/books/data-science-books

The page in question is the following:
p040.pdf

I'm using pdfsizeopt --use-pngout=no --use-multivalent=no p040.pdf for compression, BTW.

I will be without Internet access during the next week, but I can try to provide any extra information that you may need.

I have other files that currently present problems with pdfsizeopt, but I will send them by alternate means.

on Windows: ValueError: popen() arg 3 must be -1

Hello,
I have a problem trying to compress a pdf file created with PDFSharp.

These are the console messages:

info: This is pdfsizeopt ZIP rUNKNOWN size=111340.
info: prepending to PATH: C:\pdfsizeopt\pdfsizeopt_win32exec
info: loading PDF from: C:\pdfsizeopt\pdf_in\_TRASF_20170502\3489189.pdf
info: loaded PDF of 90961 bytes
info: found 25 obj offsets and 1 obj streams in xref stream
info: separated to 23 objs + xref + trailer
info: found 0 Type1 fonts loaded
info: found 0 Type1C fonts loaded
info: will optimize image XObject 6; orig width=1670 height=2325 colorspace=/DeviceGray bpc=1 filter=[/JBIG2Decode] dp=0 size=29101 gs_device=pngmono
info: will optimize image XObject 11; orig width=1671 height=2335 colorspace=/DeviceGray bpc=1 filter=[/JBIG2Decode] dp=0 size=8876 gs_device=pngmono
info: will optimize image XObject 16; orig width=1671 height=2327 colorspace=/DeviceGray bpc=1 filter=[/JBIG2Decode] dp=0 size=28280 gs_device=pngmono
info: will optimize image XObject 21; orig width=1671 height=2338 colorspace=/DeviceGray bpc=1 filter=[/JBIG2Decode] dp=0 size=22849 gs_device=pngmono
info: optimizing 4 images of 89106 bytes in total
info: writing ImageRenderer (88949 image bytes) to: C:\pdfsizeopt\pdf_out\psotmp.5608.conv.pngmono.tmp.ps
info: using Ghostscript "C:\pdfsizeopt\pdfsizeopt_win32exec\pdfsizeopt_gswin\gswin32c.exe": GPL Ghostscript 9.21 (2017-03-16)
info: executing ImageRenderer with Ghostscript: "C:\pdfsizeopt\pdfsizeopt_win32exec\pdfsizeopt_gswin\gswin32c.exe" -q -dNOPAUSE -dBATCH -sDEVICE=pngmono -sOutputFile=C:\pdfsizeopt\pdf_out\psotmp.5608.img-%04d.pngmono.tmp.png -f C:\pdfsizeopt\pdf_out\psotmp.5608.conv.pngmono.tmp.ps
Traceback (most recent call last):
  File "C:\pdfsizeopt\pdfsizeopt_win32exec\python26.zip\runpy.py", line 122, in _run_module_as_main
  File "C:\pdfsizeopt\pdfsizeopt_win32exec\python26.zip\runpy.py", line 34, in _run_code
  File "C:\pdfsizeopt\pdfsizeopt.single\__main__.py", line 1, in <module>
  File "C:\pdfsizeopt\pdfsizeopt.single\mainrun.py", line 10, in <module>
  File "C:\pdfsizeopt\pdfsizeopt.single\pdfsizeopt\main.py", line 8597, in main
  File "C:\pdfsizeopt\pdfsizeopt.single\pdfsizeopt\main.py", line 6726, in OptimizeImages
  File "C:\pdfsizeopt\pdfsizeopt.single\pdfsizeopt\main.py", line 6431, in RenderImages
ValueError: popen() arg 3 must be -1

I don't have problems with pdfs created by other applications.
Could someone help me, please?
Thank you!

missing glyph named /pedal.*

[5a1e317a]
[pts/pdfsizeopt-jbig2@26cade6b]
[pts/sam2p@c5c3b1cb]
[pts/tif22pnm@8abf77d7]
[pngout Mar 19 2015]
[Multivalent.jar downloaded April 2010]
[gs 9.20]

This large PDF file, generated by a recent lilypond developer version, seems to convert fine (BTW, this works much better than testing pdfsizeopt a few years ago, which then created far more artifacts), except for one font glitch. After conversion with

pdfsizeopt.py --use-pngout=true --use-jbig2=true --use-multivalent=true notation.pdf

there are square boxes instead of asterisks for the pedal end markers, cf. logical pages 13 and 680 of the document (= numbered pages 1 and 668).

sam2p deprecated on Ubuntu

There is no sam2p package on Ubuntu 16.10.

It may be good to use other software as a replacement, or to have sam2p bundled with pdfsizeopt.

As I saw no installation instructions, I downloaded this repo and started the software as:

./pdfsizeopt mypdf.pdf

KeyError in old pdfsizeopt with encrypted PDF

Hi, @pts.

While using the pdfsizeopt revision 339f049 (and previous ones as well), I get the following crash with a file that was (supposedly, as per pdfinfo) created with pdfTeX:

info: This is pdfsizeopt rUNKNOWN size=349896.
info: prepending to PATH: /home/rbrito/Downloads/pdfsizeopt
info: loading PDF from: futex.pdf
info: loaded PDF of 244437 bytes
info: found 220 obj offsets and 2 obj streams in xref stream
info: separated to 218 objs + xref + trailer
warning: cannot parse obj 95: pdfsizeopt.main.PdfIndirectLengthError: missing obj for indirect /Length 108 0 R at ofs=18098
warning: cannot parse obj 110: pdfsizeopt.main.PdfIndirectLengthError: missing obj for indirect /Length 111 0 R at ofs=18846
warning: cannot parse obj 112: pdfsizeopt.main.PdfIndirectLengthError: missing obj for indirect /Length 121 0 R at ofs=26599
warning: cannot parse obj 123: pdfsizeopt.main.PdfIndirectLengthError: missing obj for indirect /Length 124 0 R at ofs=27359
warning: cannot parse obj 126: pdfsizeopt.main.PdfIndirectLengthError: missing obj for indirect /Length 139 0 R at ofs=34875
warning: cannot parse obj 141: pdfsizeopt.main.PdfIndirectLengthError: missing obj for indirect /Length 142 0 R at ofs=35567
info: found 11 Type1 fonts loaded

Please, note the warnings above.

(...)

info: optimized Type1 font XObject 206,205: new size=6118 (40%)
info: optimized Type1 font XObject 208,207: new size=8523 (43%)
info: optimized Type1 font XObject 210,209: new size=4973 (37%)
info: optimized Type1 font XObject 212,211: new size=1889 (29%)
Traceback (most recent call last):
  File "/home/rbrito/Downloads/pdfsizeopt/pdfsizeopt", line 37, in <module>
    sys.exit(main.main(sys.argv))
  File "/home/rbrito/Downloads/pdfsizeopt/lib/pdfsizeopt/main.py", line 8603, in main
    do_regenerate_all_fonts=do_regenerate_all_fonts)
  File "/home/rbrito/Downloads/pdfsizeopt/lib/pdfsizeopt/main.py", line 6014, in OptimizeType1CFonts
    type1c_objs = self.GetFonts(font_type='Type1C')
  File "/home/rbrito/Downloads/pdfsizeopt/lib/pdfsizeopt/main.py", line 4811, in GetFonts
    font_obj = self.objs[font_obj_num]
KeyError: 110
$

The problematic file can be found at: https://www.akkadia.org/drepper/futex.pdf

I don't know if this is a compliant file to the standard or not, though.

Thanks,

Rogério.

a blank image because its /SMask image doesn't have /ColorSpace /DeviceGray

(The image in p058.pso.pdf is blank only in Evince. It appears in Chrome 61 and Ghostscript.)

I just discovered another file that, after being processed with pdfsizeopt, gives a blank image where it is not supposed to. Since I don't know if this is the same bug as the one I reported previously, I am attaching it here.

p058.pdf

If I interrupt the script with an assert False right before the comment # Optimize images. around line 7044 of main.py, the PNG file that I get is, indeed, a blank one, which I attach here for your reference:

psotmp 10341 img-21 parse

This is from a book from O'Reilly. I can try to find an URL for this and, if I am successful, I can post it here. Otherwise, I can send you a copy of this, if you so wish.

ValueError: Char not allowed in PostScript name: '{'

Hi, @pts.

Here is a crash with a potentially non-compliant file, also generated by pdfTeX:

(...)

Type1CParser: using interpreter GPL Ghostscript 921 20170316
Type1CParser: all OK
info: parsed 18 Type1C fonts
Traceback (most recent call last):
  File "/home/rbrito/Downloads/pdfsizeopt/pdfsizeopt", line 37, in <module>
    sys.exit(main.main(sys.argv))
  File "/home/rbrito/Downloads/pdfsizeopt/lib/pdfsizeopt/main.py", line 8603, in main
    do_regenerate_all_fonts=do_regenerate_all_fonts)
  File "/home/rbrito/Downloads/pdfsizeopt/lib/pdfsizeopt/main.py", line 6099, in OptimizeType1CFonts
    do_keep_font_optionals=do_keep_font_optionals)
  File "/home/rbrito/Downloads/pdfsizeopt/lib/pdfsizeopt/main.py", line 5982, in _ProcessType1CFonts
    do_double_check_type1c_output=do_double_check_type1c_output)
  File "/home/rbrito/Downloads/pdfsizeopt/lib/pdfsizeopt/main.py", line 5648, in SerializeType1CFonts
    AppendSerializedPs(parsed_fonts[obj_num], output)
  File "/home/rbrito/Downloads/pdfsizeopt/lib/pdfsizeopt/main.py", line 5623, in AppendSerializedPs
    AppendSerializedPs(value[item], output)
  File "/home/rbrito/Downloads/pdfsizeopt/lib/pdfsizeopt/main.py", line 5635, in AppendSerializedPs
    value = pdf_to_ps_name(value)
  File "/home/rbrito/Downloads/pdfsizeopt/lib/pdfsizeopt/main.py", line 1830, in PdfToPsName
    'Char not allowed in PostScript name: %r' % match.group())
ValueError: Char not allowed in PostScript name: '{'

I don't remember where I got the file, but as I don't believe that the file is that secret, I am attaching it here. I would give it a low priority, though...

Thanks,

Rogério.
dft-fft.pdf

Type1CConverter fails for PFB fonts

Dear @pts,

I have one file here that I processed many years ago with Adobe's Acrobat 8 Pro, and pdfsizeopt is having some difficulty processing it, because there is some problem with Ghostscript processing fonts:

(...)
info: loaded PDF of 2872506 bytes
info: using Ghostscript /usr/bin/gs: GPL Ghostscript 9.21 (2017-03-16)
info: decompressing 52 bytes with Ghostscript /Filter/FlateDecode/DecodeParms <</Columns 4/Predictor 12>>
info: decompressing 3802 bytes with Ghostscript /Filter/FlateDecode/DecodeParms <</Columns 5/Predictor 12>>
info: found 8395 obj offsets and 361 obj streams in xref stream
info: separated to 8032 objs + xref + trailer
info: parsed 8032 objs
info: found 3 Type1 fonts loaded
info: writing Type1CConverter (35528 font bytes) to: psotmp.22395.conv.tmp.ps
info: executing Type1CConverter with Ghostscript: gs -q -P- -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -dPDFSETTINGS=/printer -dColorConversionStrategy=/LeaveColorUnchanged -sOutputFile=psotmp.22395.conv.tmp.pdf -f psotmp.22395.conv.tmp.ps
Type1CConverter: using interpreter GPL Ghostscript 921 20170316
Error: /syntaxerror in (bin obj seq, type=128, elements=1, size=59650, non-zero unused field)
Operand stack:
   --nostringval--   --nostringval--   --nostringval--
Execution stack:
   %interp_exit   .runexec2   --nostringval--   --nostringval--   --nostringval--   2   %stopped_push   --nostringval--   --nostringval--   --nostringval--   false   1   %stopped_push   1999   1   3   %oparray_pop   1998   1   3   %oparray_pop   1982   1   3   %oparray_pop   1868   1   3   %oparray_pop   --nostringval--   %errorexec_pop   .runexec2   --nostringval--   --nostringval--   --nostringval--   2   %stopped_push   --nostringval--   --nostringval--
Dictionary stack:
   --dict:1212/1684(ro)(G)--   --dict:0/20(G)--   --dict:99/200(L)--   --dict:0/9(L)--
Current allocation mode is local
Last OS error: No such file or directory
Current file position is 31090
GPL Ghostscript 9.21: Unrecoverable error, exit code 1
info: Type1CConverter failed, status=0x100
Traceback (most recent call last):
  File "/home/rbrito/Downloads/pdfsizeopt/pdfsizeopt", line 41, in <module>
    sys.exit(main.main(sys.argv, script_dir=script_dir))
  File "/home/rbrito/Downloads/pdfsizeopt/lib/pdfsizeopt/main.py", line 9092, in main
    pdf.ConvertType1FontsToType1C()
  File "/home/rbrito/Downloads/pdfsizeopt/lib/pdfsizeopt/main.py", line 5626, in ConvertType1FontsToType1C
    TMP_PREFIX + 'conv.tmp.ps', TMP_PREFIX + 'conv.tmp.pdf')
  File "/home/rbrito/Downloads/pdfsizeopt/lib/pdfsizeopt/main.py", line 5442, in GenerateType1CFontsFromType1
    assert False, 'Type1CConverter failed (status)'
AssertionError: Type1CConverter failed (status)

I'm using an up-to-date git clone. I am sending you a link to the problematic file privately.

Thanks in advance,

Rogério.

Type1CConverter and Type1CGenerator with fonts with more than 256 glyphs

Hi,
I wrote a small batch script to run pdfsizeopt. Unfortunately, I only got these 2 files:
psotmp.9664.conv.tmp.pdf
psotmp.9664.conv.tmp.ps
Here is the logfile:

B:\>B:\pdfsizeopt\pdfsizeopt B:\pdfs\input.pdf B:\pdfs\output.pdf  
Type1CConverter: using interpreter GPL Ghostscript 902 20110330
Type1CConverter: converting font /FFOFLU+TimesNewRomanPSMT to /Obj0000000196
Type1CConverter: converting font /JUQGEC+LMRoman10-Regular to /Obj0000006127
warning: using glyphshow for unencoded glyph: /.notdef
warning: using glyphshow for unencoded glyph: /Tdieresis
warning: using glyphshow for unencoded glyph: /Tdotbelow
warning: using glyphshow for unencoded glyph: /Theta
warning: using glyphshow for unencoded glyph: /Thorn
warning: using glyphshow for unencoded glyph: /Tilde
warning: using glyphshow for unencoded glyph: /Tildecomb
warning: using glyphshow for unencoded glyph: /Tlinebelow
warning: using glyphshow for unencoded glyph: /Ttilde
warning: using glyphshow for unencoded glyph: /U
warning: using glyphshow for unencoded glyph: /Uacute
warning: using glyphshow for unencoded glyph: /Ubreve
warning: using glyphshow for unencoded glyph: /Ubrevebelowinverted
warning: using glyphshow for unencoded glyph: /Ucaron
warning: using glyphshow for unencoded glyph: /Ucircumflex
warning: using glyphshow for unencoded glyph: /Udblacute
warning: using glyphshow for unencoded glyph: /Udblgrave
warning: using glyphshow for unencoded glyph: /Udieresis
warning: using glyphshow for unencoded glyph: /Udieresisacute
warning: using glyphshow for unencoded glyph: /Udieresiscaron
warning: using glyphshow for unencoded glyph: /Udieresisgrave
warning: using glyphshow for unencoded glyph: /Udotbelow
warning: using glyphshow for unencoded glyph: /Ugrave
warning: using glyphshow for unencoded glyph: /Uhookabove
warning: using glyphshow for unencoded glyph: /Uhorn
warning: using glyphshow for unencoded glyph: /Uhornacute
warning: using glyphshow for unencoded glyph: /Uhorndotbelow
warning: using glyphshow for unencoded glyph: /Uhorngrave
warning: using glyphshow for unencoded glyph: /Uhornhookabove
warning: using glyphshow for unencoded glyph: /Uhorntilde
warning: using glyphshow for unencoded glyph: /Uhungarumlaut
warning: using glyphshow for unencoded glyph: /Umacron
warning: using glyphshow for unencoded glyph: /Uogonek
warning: using glyphshow for unencoded glyph: /Upsilon
warning: using glyphshow for unencoded glyph: /Uring
warning: using glyphshow for unencoded glyph: /Utilde
warning: using glyphshow for unencoded glyph: /V
warning: using glyphshow for unencoded glyph: /W
warning: using glyphshow for unencoded glyph: /Wacute
warning: using glyphshow for unencoded glyph: /Wcircumflex
warning: using glyphshow for unencoded glyph: /Wdieresis
warning: using glyphshow for unencoded glyph: /Wgrave
warning: using glyphshow for unencoded glyph: /X
warning: using glyphshow for unencoded glyph: /Xi
warning: using glyphshow for unencoded glyph: /Y
warning: using glyphshow for unencoded glyph: /Yacute
warning: using glyphshow for unencoded glyph: /Ycircumflex
warning: using glyphshow for unencoded glyph: /Ydieresis
warning: using glyphshow for unencoded glyph: /Ydotbelow
warning: using glyphshow for unencoded glyph: /Ygrave
warning: using glyphshow for unencoded glyph: /Yhookabove
warning: using glyphshow for unencoded glyph: /Ytilde
warning: using glyphshow for unencoded glyph: /Z
warning: using glyphshow for unencoded glyph: /Zacute
warning: using glyphshow for unencoded glyph: /Zcaron
warning: using glyphshow for unencoded glyph: /Zdotaccent
warning: using glyphshow for unencoded glyph: /Zdotbelow
warning: using glyphshow for unencoded glyph: /Zeta
warning: using glyphshow for unencoded glyph: /aacute
warning: using glyphshow for unencoded glyph: /abreve
warning: using glyphshow for unencoded glyph: /abreveacute
warning: using glyphshow for unencoded glyph: /abrevedotbelow
warning: using glyphshow for unencoded glyph: /abrevegrave
warning: using glyphshow for unencoded glyph: /abrevehookabove
warning: using glyphshow for unencoded glyph: /abrevetilde
warning: using glyphshow for unencoded glyph: /acaron
warning: using glyphshow for unencoded glyph: /acircumflex
warning: using glyphshow for unencoded glyph: /acircumflexacute
warning: using glyphshow for unencoded glyph: /acircumflexdotbelow
warning: using glyphshow for unencoded glyph: /acircumflexgrave
warning: using glyphshow for unencoded glyph: /acircumflexhookabove
warning: using glyphshow for unencoded glyph: /acircumflextilde
warning: using glyphshow for unencoded glyph: /acute
warning: using glyphshow for unencoded glyph: /acute.dup
warning: using glyphshow for unencoded glyph: /acute.ts1
warning: using glyphshow for unencoded glyph: /acutecomb
warning: using glyphshow for unencoded glyph: /adblgrave
warning: using glyphshow for unencoded glyph: /adieresis
warning: using glyphshow for unencoded glyph: /adotbelow
warning: using glyphshow for unencoded glyph: /ae
warning: using glyphshow for unencoded glyph: /ae.dup
warning: using glyphshow for unencoded glyph: /aeacute
warning: using glyphshow for unencoded glyph: /agrave
warning: using glyphshow for unencoded glyph: /ahookabove
warning: using glyphshow for unencoded glyph: /amacron
warning: using glyphshow for unencoded glyph: /ampersand
warning: using glyphshow for unencoded glyph: /anglearc
warning: using glyphshow for unencoded glyph: /angleleft
warning: using glyphshow for unencoded glyph: /angleright
warning: using glyphshow for unencoded glyph: /aogonek
warning: using glyphshow for unencoded glyph: /aogonekacute
warning: using glyphshow for unencoded glyph: /aring
warning: using glyphshow for unencoded glyph: /aringacute
warning: using glyphshow for unencoded glyph: /arrowdown
warning: using glyphshow for unencoded glyph: /arrowleft
warning: using glyphshow for unencoded glyph: /arrowright
warning: using glyphshow for unencoded glyph: /arrowup
warning: using glyphshow for unencoded glyph: /asciicircum
warning: using glyphshow for unencoded glyph: /asciitilde
warning: using glyphshow for unencoded glyph: /asterisk
warning: using glyphshow for unencoded glyph: /asteriskmath
warning: using glyphshow for unencoded glyph: /at
warning: using glyphshow for unencoded glyph: /atilde
warning: using glyphshow for unencoded glyph: /backslash
warning: using glyphshow for unencoded glyph: /baht
warning: using glyphshow for unencoded glyph: /bar
warning: using glyphshow for unencoded glyph: /bigcircle
warning: using glyphshow for unencoded glyph: /blanksymbol
warning: using glyphshow for unencoded glyph: /born
warning: using glyphshow for unencoded glyph: /braceleft
warning: using glyphshow for unencoded glyph: /braceright
warning: using glyphshow for unencoded glyph: /bracketleft
warning: using glyphshow for unencoded glyph: /bracketright
warning: using glyphshow for unencoded glyph: /breve
warning: using glyphshow for unencoded glyph: /breve.ts1
warning: using glyphshow for unencoded glyph: /breveacute
warning: using glyphshow for unencoded glyph: /brevebelow
warning: using glyphshow for unencoded glyph: /brevebelowcomb
warning: using glyphshow for unencoded glyph: /brevebelowinverted
warning: using glyphshow for unencoded glyph: /brevebelowinvertedcomb
warning: using glyphshow for unencoded glyph: /brevecomb
warning: using glyphshow for unencoded glyph: /brevegrave
warning: using glyphshow for unencoded glyph: /brevehookabove
warning: using glyphshow for unencoded glyph: /breveinverted
warning: using glyphshow for unencoded glyph: /breveinvertedcomb
warning: using glyphshow for unencoded glyph: /brevetilde
warning: using glyphshow for unencoded glyph: /brokenbar
warning: using glyphshow for unencoded glyph: /bullet
warning: using glyphshow for unencoded glyph: /cacute
warning: using glyphshow for unencoded glyph: /caron
warning: using glyphshow for unencoded glyph: /caron.ts1
warning: using glyphshow for unencoded glyph: /caroncomb
warning: using glyphshow for unencoded glyph: /ccaron
warning: using glyphshow for unencoded glyph: /ccedilla
warning: using glyphshow for unencoded glyph: /ccircumflex
warning: using glyphshow for unencoded glyph: /cdotaccent
warning: using glyphshow for unencoded glyph: /cedilla
warning: using glyphshow for unencoded glyph: /cedilla.dup
warning: using glyphshow for unencoded glyph: /cent
warning: using glyphshow for unencoded glyph: /cent.oldstyle
warning: using glyphshow for unencoded glyph: /centigrade
warning: using glyphshow for unencoded glyph: /circumflex
warning: using glyphshow for unencoded glyph: /circumflex.dup
warning: using glyphshow for unencoded glyph: /circumflexacute
warning: using glyphshow for unencoded glyph: /circumflexcomb
warning: using glyphshow for unencoded glyph: /circumflexgrave
warning: using glyphshow for unencoded glyph: /circumflexhookabove
warning: using glyphshow for unencoded glyph: /circumflextilde
warning: using glyphshow for unencoded glyph: /colon
warning: using glyphshow for unencoded glyph: /colonmonetary
warning: using glyphshow for unencoded glyph: /commaaccent
warning: using glyphshow for unencoded glyph: /commaaccentcomb
warning: using glyphshow for unencoded glyph: /copyleft
warning: using glyphshow for unencoded glyph: /copyright
warning: using glyphshow for unencoded glyph: /currency
warning: using glyphshow for unencoded glyph: /cwm
warning: using glyphshow for unencoded glyph: /cwmascender
warning: using glyphshow for unencoded glyph: /cwmcapital
warning: using glyphshow for unencoded glyph: /dagger
warning: using glyphshow for unencoded glyph: /daggerdbl
warning: using glyphshow for unencoded glyph: /dblGrave
warning: using glyphshow for unencoded glyph: /dblGravecomb
warning: using glyphshow for unencoded glyph: /dblbracketleft
warning: using glyphshow for unencoded glyph: /dblbracketright
warning: using glyphshow for unencoded glyph: /dblgrave
warning: using glyphshow for unencoded glyph: /dblgrave.ts1
warning: using glyphshow for unencoded glyph: /dblgravecomb
warning: using glyphshow for unencoded glyph: /dblverticalbar
warning: using glyphshow for unencoded glyph: /dcaron
warning: using glyphshow for unencoded glyph: /dcroat
warning: using glyphshow for unencoded glyph: /ddotbelow
warning: using glyphshow for unencoded glyph: /degree
warning: using glyphshow for unencoded glyph: /diameter
warning: using glyphshow for unencoded glyph: /died
warning: using glyphshow for unencoded glyph: /dieresis
warning: using glyphshow for unencoded glyph: /dieresis.dup
warning: using glyphshow for unencoded glyph: /dieresis.ts1
warning: using glyphshow for unencoded glyph: /dieresisacute
warning: using glyphshow for unencoded glyph: /dieresiscaron
warning: using glyphshow for unencoded glyph: /dieresiscomb
warning: using glyphshow for unencoded glyph: /dieresisgrave
warning: using glyphshow for unencoded glyph: /discount
warning: using glyphshow for unencoded glyph: /divide
warning: using glyphshow for unencoded glyph: /divorced
warning: using glyphshow for unencoded glyph: /dlinebelow
warning: using glyphshow for unencoded glyph: /dollar
warning: using glyphshow for unencoded glyph: /dollar.oldstyle
warning: using glyphshow for unencoded glyph: /dong
warning: using glyphshow for unencoded glyph: /dotaccent
warning: using glyphshow for unencoded glyph: /dotaccentcomb
warning: using glyphshow for unencoded glyph: /dotbelow
warning: using glyphshow for unencoded glyph: /dotbelowcomb
warning: using glyphshow for unencoded glyph: /dotlessi
warning: using glyphshow for unencoded glyph: /dotlessj
warning: using glyphshow for unencoded glyph: /dotlessj.dup
warning: using glyphshow for unencoded glyph: /eacute
warning: using glyphshow for unencoded glyph: /ebreve
warning: using glyphshow for unencoded glyph: /ecaron
warning: using glyphshow for unencoded glyph: /ecircumflex
warning: using glyphshow for unencoded glyph: /ecircumflexacute
warning: using glyphshow for unencoded glyph: /ecircumflexdotbelow
warning: using glyphshow for unencoded glyph: /ecircumflexgrave
warning: using glyphshow for unencoded glyph: /ecircumflexhookabove
warning: using glyphshow for unencoded glyph: /ecircumflextilde
warning: using glyphshow for unencoded glyph: /edblgrave
warning: using glyphshow for unencoded glyph: /edieresis
warning: using glyphshow for unencoded glyph: /edotaccent
warning: using glyphshow for unencoded glyph: /edotbelow
warning: using glyphshow for unencoded glyph: /egrave
warning: using glyphshow for unencoded glyph: /ehookabove
warning: using glyphshow for unencoded glyph: /eight
warning: using glyphshow for unencoded glyph: /eight.oldstyle
warning: using glyphshow for unencoded glyph: /eight.prop
warning: using glyphshow for unencoded glyph: /eight.taboldstyle
warning: using glyphshow for unencoded glyph: /ellipsis
warning: using glyphshow for unencoded glyph: /emacron
warning: using glyphshow for unencoded glyph: /emdash
warning: using glyphshow for unencoded glyph: /endash
warning: using glyphshow for unencoded glyph: /eng
warning: using glyphshow for unencoded glyph: /eogonek
warning: using glyphshow for unencoded glyph: /eogonekacute
warning: using glyphshow for unencoded glyph: /equal
warning: using glyphshow for unencoded glyph: /ereversed
warning: using glyphshow for unencoded glyph: /estimated
warning: using glyphshow for unencoded glyph: /eth
warning: using glyphshow for unencoded glyph: /etilde
warning: using glyphshow for unencoded glyph: /eturned
warning: using glyphshow for unencoded glyph: /exclam
warning: using glyphshow for unencoded glyph: /exclamdown
warning: using glyphshow for unencoded glyph: /f_k
warning: using glyphshow for unencoded glyph: /ff
warning: using glyphshow for unencoded glyph: /ffi
warning: using glyphshow for unencoded glyph: /ffl
warning: using glyphshow for unencoded glyph: /fi
warning: using glyphshow for unencoded glyph: /five
warning: using glyphshow for unencoded glyph: /five.oldstyle
warning: using glyphshow for unencoded glyph: /five.prop
warning: using glyphshow for unencoded glyph: /five.taboldstyle
warning: using glyphshow for unencoded glyph: /fl
warning: using glyphshow for unencoded glyph: /florin
warning: using glyphshow for unencoded glyph: /four
warning: using glyphshow for unencoded glyph: /four.oldstyle
warning: using glyphshow for unencoded glyph: /four.prop
warning: using glyphshow for unencoded glyph: /four.taboldstyle
warning: using glyphshow for unencoded glyph: /fraction
warning: using glyphshow for unencoded glyph: /fraction.alt
warning: using glyphshow for unencoded glyph: /gacute
warning: using glyphshow for unencoded glyph: /gbreve
warning: using glyphshow for unencoded glyph: /gcaron
warning: using glyphshow for unencoded glyph: /gcedilla
warning: using glyphshow for unencoded glyph: /gcircumflex
warning: using glyphshow for unencoded glyph: /gcommaaccent
warning: using glyphshow for unencoded glyph: /gdotaccent
warning: using glyphshow for unencoded glyph: /germandbls
warning: using glyphshow for unencoded glyph: /germandbls.dup
warning: using glyphshow for unencoded glyph: /gnaborretni
warning: using glyphshow for unencoded glyph: /grave
warning: using glyphshow for unencoded glyph: /grave.ts1
warning: using glyphshow for unencoded glyph: /gravecomb
warning: using glyphshow for unencoded glyph: /greater
warning: using glyphshow for unencoded glyph: /guarani
warning: using glyphshow for unencoded glyph: /guillemotleft
warning: using glyphshow for unencoded glyph: /guillemotright
warning: using glyphshow for unencoded glyph: /guilsinglleft
warning: using glyphshow for unencoded glyph: /guilsinglright
warning: using glyphshow for unencoded glyph: /hbar
warning: using glyphshow for unencoded glyph: /hbrevebelow
warning: using glyphshow for unencoded glyph: /hcircumflex
warning: using glyphshow for unencoded glyph: /hdieresis
warning: using glyphshow for unencoded glyph: /hdotbelow
warning: using glyphshow for unencoded glyph: /hookabove
warning: using glyphshow for unencoded glyph: /hookabovecomb
warning: using glyphshow for unencoded glyph: /htilde
warning: using glyphshow for unencoded glyph: /hungarumlaut
warning: using glyphshow for unencoded glyph: /hungarumlaut.ts1
warning: using glyphshow for unencoded glyph: /hungarumlautcomb
warning: using glyphshow for unencoded glyph: /hyphen.alt
warning: using glyphshow for unencoded glyph: /hyphen.dup
warning: using glyphshow for unencoded glyph: /hyphen.prop
warning: using glyphshow for unencoded glyph: /hyphendbl
warning: using glyphshow for unencoded glyph: /hyphendbl.alt
warning: using glyphshow for unencoded glyph: /iacute
warning: using glyphshow for unencoded glyph: /ibreve
warning: using glyphshow for unencoded glyph: /icaron
warning: using glyphshow for unencoded glyph: /icircumflex
warning: using glyphshow for unencoded glyph: /idblgrave
warning: using glyphshow for unencoded glyph: /idieresis
warning: using glyphshow for unencoded glyph: /idieresisacute
warning: using glyphshow for unencoded glyph: /idotbelow
warning: using glyphshow for unencoded glyph: /igrave
warning: using glyphshow for unencoded glyph: /ihookabove
warning: using glyphshow for unencoded glyph: /ij
warning: using glyphshow for unencoded glyph: /imacron
warning: using glyphshow for unencoded glyph: /imacron.alt
warning: using glyphshow for unencoded glyph: /infinity
warning: using glyphshow for unencoded glyph: /interrobang
warning: using glyphshow for unencoded glyph: /iogonek
warning: using glyphshow for unencoded glyph: /iogonekacute
warning: using glyphshow for unencoded glyph: /itilde
warning: using glyphshow for unencoded glyph: /j
warning: using glyphshow for unencoded glyph: /jacute
warning: using glyphshow for unencoded glyph: /jcaron
warning: using glyphshow for unencoded glyph: /jcircumflex
warning: using glyphshow for unencoded glyph: /k
warning: using glyphshow for unencoded glyph: /kcedilla
warning: using glyphshow for unencoded glyph: /kcommaaccent
warning: using glyphshow for unencoded glyph: /lacute
warning: using glyphshow for unencoded glyph: /lcaron
warning: using glyphshow for unencoded glyph: /lcedilla
warning: using glyphshow for unencoded glyph: /lcommaaccent
warning: using glyphshow for unencoded glyph: /ldot
warning: using glyphshow for unencoded glyph: /ldotbelow
warning: using glyphshow for unencoded glyph: /ldotbelowmacron
warning: using glyphshow for unencoded glyph: /leaf
warning: using glyphshow for unencoded glyph: /less
warning: using glyphshow for unencoded glyph: /linebelow
warning: using glyphshow for unencoded glyph: /linebelowcomb
warning: using glyphshow for unencoded glyph: /lira
warning: using glyphshow for unencoded glyph: /logicalnot
warning: using glyphshow for unencoded glyph: /longs
warning: using glyphshow for unencoded glyph: /lslash
warning: using glyphshow for unencoded glyph: /ltilde
warning: using glyphshow for unencoded glyph: /m
warning: using glyphshow for unencoded glyph: /macron
warning: using glyphshow for unencoded glyph: /macron.alt
warning: using glyphshow for unencoded glyph: /macron.dup
warning: using glyphshow for unencoded glyph: /macron.ts1
warning: using glyphshow for unencoded glyph: /macronbelow
warning: using glyphshow for unencoded glyph: /macronbelowcomb
warning: using glyphshow for unencoded glyph: /macroncomb
warning: using glyphshow for unencoded glyph: /married
warning: using glyphshow for unencoded glyph: /mdotbelow
warning: using glyphshow for unencoded glyph: /mho
warning: using glyphshow for unencoded glyph: /minus
warning: using glyphshow for unencoded glyph: /mu
warning: using glyphshow for unencoded glyph: /multiply
warning: using glyphshow for unencoded glyph: /musicalnote
warning: using glyphshow for unencoded glyph: /nacute
warning: using glyphshow for unencoded glyph: /naira
warning: using glyphshow for unencoded glyph: /nbspace
warning: using glyphshow for unencoded glyph: /ncaron
warning: using glyphshow for unencoded glyph: /ncedilla
warning: using glyphshow for unencoded glyph: /ncommaaccent
warning: using glyphshow for unencoded glyph: /ndotaccent
warning: using glyphshow for unencoded glyph: /ndotbelow
warning: using glyphshow for unencoded glyph: /nine
warning: using glyphshow for unencoded glyph: /nine.oldstyle
warning: using glyphshow for unencoded glyph: /nine.prop
warning: using glyphshow for unencoded glyph: /nine.taboldstyle
warning: using glyphshow for unencoded glyph: /ntilde
warning: using glyphshow for unencoded glyph: /numbersign
warning: using glyphshow for unencoded glyph: /numero
warning: using glyphshow for unencoded glyph: /o
warning: using glyphshow for unencoded glyph: /oacute
warning: using glyphshow for unencoded glyph: /obreve
warning: using glyphshow for unencoded glyph: /ocaron
warning: using glyphshow for unencoded glyph: /ocircumflex
warning: using glyphshow for unencoded glyph: /ocircumflexacute
warning: using glyphshow for unencoded glyph: /ocircumflexdotbelow
warning: using glyphshow for unencoded glyph: /ocircumflexgrave
warning: using glyphshow for unencoded glyph: /ocircumflexhookabove
warning: using glyphshow for unencoded glyph: /ocircumflextilde
warning: using glyphshow for unencoded glyph: /odblacute
warning: using glyphshow for unencoded glyph: /odblgrave
warning: using glyphshow for unencoded glyph: /odieresis
warning: using glyphshow for unencoded glyph: /odotbelow
warning: using glyphshow for unencoded glyph: /oe
warning: using glyphshow for unencoded glyph: /oe.dup
warning: using glyphshow for unencoded glyph: /ogonek
warning: using glyphshow for unencoded glyph: /ograve
warning: using glyphshow for unencoded glyph: /ohm
warning: using glyphshow for unencoded glyph: /ohookabove
warning: using glyphshow for unencoded glyph: /ohorn
warning: using glyphshow for unencoded glyph: /ohornacute
warning: using glyphshow for unencoded glyph: /ohorndotbelow
warning: using glyphshow for unencoded glyph: /ohorngrave
warning: using glyphshow for unencoded glyph: /ohornhookabove
warning: using glyphshow for unencoded glyph: /ohorntilde
warning: using glyphshow for unencoded glyph: /ohungarumlaut
warning: using glyphshow for unencoded glyph: /omacron
warning: using glyphshow for unencoded glyph: /one
warning: using glyphshow for unencoded glyph: /one.oldstyle
warning: using glyphshow for unencoded glyph: /one.prop
warning: using glyphshow for unencoded glyph: /one.superior
warning: using glyphshow for unencoded glyph: /one.taboldstyle
warning: using glyphshow for unencoded glyph: /onehalf
warning: using glyphshow for unencoded glyph: /onequarter
warning: using glyphshow for unencoded glyph: /oogonek
warning: using glyphshow for unencoded glyph: /oogonekacute
warning: using glyphshow for unencoded glyph: /openbullet
warning: using glyphshow for unencoded glyph: /ordfeminine
warning: using glyphshow for unencoded glyph: /ordmasculine
warning: using glyphshow for unencoded glyph: /orogate
warning: using glyphshow for unencoded glyph: /oslash
warning: using glyphshow for unencoded glyph: /oslash.dup
warning: using glyphshow for unencoded glyph: /oslashacute
warning: using glyphshow for unencoded glyph: /otilde
warning: using glyphshow for unencoded glyph: /p
warning: using glyphshow for unencoded glyph: /paragraph
warning: using glyphshow for unencoded glyph: /paragraph.alt
warning: using glyphshow for unencoded glyph: /parenleft
warning: using glyphshow for unencoded glyph: /parenright
warning: using glyphshow for unencoded glyph: /percent
warning: using glyphshow for unencoded glyph: /period
warning: using glyphshow for unencoded glyph: /periodcentered
warning: using glyphshow for unencoded glyph: /permyriad
warning: using glyphshow for unencoded glyph: /perthousand
warning: using glyphshow for unencoded glyph: /perthousandzero
warning: using glyphshow for unencoded glyph: /peso
warning: using glyphshow for unencoded glyph: /plus
warning: using glyphshow for unencoded glyph: /plusminus
warning: using glyphshow for unencoded glyph: /published
warning: using glyphshow for unencoded glyph: /q
warning: using glyphshow for unencoded glyph: /question
warning: using glyphshow for unencoded glyph: /questiondown
warning: using glyphshow for unencoded glyph: /quillbracketleft
warning: using glyphshow for unencoded glyph: /quillbracketright
warning: using glyphshow for unencoded glyph: /quotedbl
warning: using glyphshow for unencoded glyph: /quotedblbase
warning: using glyphshow for unencoded glyph: /quotedblbase.cm
warning: using glyphshow for unencoded glyph: /quotedblbase.cs
warning: using glyphshow for unencoded glyph: /quotedblbase.ts1
warning: using glyphshow for unencoded glyph: /quotedblleft
warning: using glyphshow for unencoded glyph: /quotedblleft.cm
warning: using glyphshow for unencoded glyph: /quotedblright
warning: using glyphshow for unencoded glyph: /quotedblright.cm
warning: using glyphshow for unencoded glyph: /quotedblright.cs
warning: using glyphshow for unencoded glyph: /quoteleft
warning: using glyphshow for unencoded glyph: /quoteleft.dup
warning: using glyphshow for unencoded glyph: /quoteright
warning: using glyphshow for unencoded glyph: /quoteright.dup
warning: using glyphshow for unencoded glyph: /quotesinglbase
warning: using glyphshow for unencoded glyph: /quotesinglbase.ts1
warning: using glyphshow for unencoded glyph: /quotesingle
warning: using glyphshow for unencoded glyph: /quotesingle.ts1
warning: using glyphshow for unencoded glyph: /racute
warning: using glyphshow for unencoded glyph: /radical
warning: using glyphshow for unencoded glyph: /rcaron
warning: using glyphshow for unencoded glyph: /rcedilla
warning: using glyphshow for unencoded glyph: /rcommaaccent
warning: using glyphshow for unencoded glyph: /rdblgrave
warning: using glyphshow for unencoded glyph: /rdotaccent
warning: using glyphshow for unencoded glyph: /rdotbelow
warning: using glyphshow for unencoded glyph: /rdotbelowmacron
warning: using glyphshow for unencoded glyph: /recipe
warning: using glyphshow for unencoded glyph: /referencemark
warning: using glyphshow for unencoded glyph: /registered
warning: using glyphshow for unencoded glyph: /registered.alt
warning: using glyphshow for unencoded glyph: /ring
warning: using glyphshow for unencoded glyph: /ringacute
warning: using glyphshow for unencoded glyph: /ringcomb
warning: using glyphshow for unencoded glyph: /ringhalfleft
warning: using glyphshow for unencoded glyph: /ringhalfright
warning: using glyphshow for unencoded glyph: /sacute
warning: using glyphshow for unencoded glyph: /scaron
warning: using glyphshow for unencoded glyph: /scedilla
warning: using glyphshow for unencoded glyph: /schwa
warning: using glyphshow for unencoded glyph: /scircumflex
warning: using glyphshow for unencoded glyph: /scommaaccent
warning: using glyphshow for unencoded glyph: /sdotbelow
warning: using glyphshow for unencoded glyph: /section
warning: using glyphshow for unencoded glyph: /semicolon
warning: using glyphshow for unencoded glyph: /servicemark
warning: using glyphshow for unencoded glyph: /seven
warning: using glyphshow for unencoded glyph: /seven.oldstyle
warning: using glyphshow for unencoded glyph: /seven.prop
warning: using glyphshow for unencoded glyph: /seven.taboldstyle
warning: using glyphshow for unencoded glyph: /sfthyphen
warning: using glyphshow for unencoded glyph: /six
warning: using glyphshow for unencoded glyph: /six.oldstyle
warning: using glyphshow for unencoded glyph: /six.prop
warning: using glyphshow for unencoded glyph: /six.taboldstyle
warning: using glyphshow for unencoded glyph: /slash
warning: using glyphshow for unencoded glyph: /space
warning: using glyphshow for unencoded glyph: /sterling
warning: using glyphshow for unencoded glyph: /suppress
warning: using glyphshow for unencoded glyph: /tcaron
warning: using glyphshow for unencoded glyph: /tcedilla
warning: using glyphshow for unencoded glyph: /tcommaaccent
warning: using glyphshow for unencoded glyph: /tdieresis
warning: using glyphshow for unencoded glyph: /tdotbelow
warning: using glyphshow for unencoded glyph: /thorn
warning: using glyphshow for unencoded glyph: /three
warning: using glyphshow for unencoded glyph: /three.oldstyle
warning: using glyphshow for unencoded glyph: /three.prop
warning: using glyphshow for unencoded glyph: /three.superior
warning: using glyphshow for unencoded glyph: /three.taboldstyle
warning: using glyphshow for unencoded glyph: /threequarters
warning: using glyphshow for unencoded glyph: /threequartersemdash
warning: using glyphshow for unencoded glyph: /tie
warning: using glyphshow for unencoded glyph: /tieaccentcapital
warning: using glyphshow for unencoded glyph: /tieaccentcapital.new
warning: using glyphshow for unencoded glyph: /tieaccentlowercase
warning: using glyphshow for unencoded glyph: /tieaccentlowercase.new
warning: using glyphshow for unencoded glyph: /tilde
warning: using glyphshow for unencoded glyph: /tilde.dup
warning: using glyphshow for unencoded glyph: /tildebelow
warning: using glyphshow for unencoded glyph: /tildebelowcomb
warning: using glyphshow for unencoded glyph: /tildecomb
warning: using glyphshow for unencoded glyph: /tildelow
warning: using glyphshow for unencoded glyph: /tlinebelow
warning: using glyphshow for unencoded glyph: /trademark
warning: using glyphshow for unencoded glyph: /ttilde
warning: using glyphshow for unencoded glyph: /twelveudash
warning: using glyphshow for unencoded glyph: /two
warning: using glyphshow for unencoded glyph: /two.oldstyle
warning: using glyphshow for unencoded glyph: /two.prop
warning: using glyphshow for unencoded glyph: /two.superior
warning: using glyphshow for unencoded glyph: /two.taboldstyle
warning: using glyphshow for unencoded glyph: /uacute
warning: using glyphshow for unencoded glyph: /ubreve
warning: using glyphshow for unencoded glyph: /ubrevebelowinverted
warning: using glyphshow for unencoded glyph: /ucaron
warning: using glyphshow for unencoded glyph: /ucircumflex
warning: using glyphshow for unencoded glyph: /udblacute
warning: using glyphshow for unencoded glyph: /udblgrave
warning: using glyphshow for unencoded glyph: /udieresis
warning: using glyphshow for unencoded glyph: /udieresisacute
warning: using glyphshow for unencoded glyph: /udieresiscaron
warning: using glyphshow for unencoded glyph: /udieresisgrave
warning: using glyphshow for unencoded glyph: /udotbelow
warning: using glyphshow for unencoded glyph: /ugrave
warning: using glyphshow for unencoded glyph: /uhookabove
warning: using glyphshow for unencoded glyph: /uhorn
warning: using glyphshow for unencoded glyph: /uhornacute
warning: using glyphshow for unencoded glyph: /uhorndotbelow
warning: using glyphshow for unencoded glyph: /uhorngrave
warning: using glyphshow for unencoded glyph: /uhornhookabove
warning: using glyphshow for unencoded glyph: /uhorntilde
warning: using glyphshow for unencoded glyph: /uhungarumlaut
warning: using glyphshow for unencoded glyph: /umacron
warning: using glyphshow for unencoded glyph: /underscore
warning: using glyphshow for unencoded glyph: /undertie
warning: using glyphshow for unencoded glyph: /undertieinverted
warning: using glyphshow for unencoded glyph: /uni2010
warning: using glyphshow for unencoded glyph: /uni2011
warning: using glyphshow for unencoded glyph: /uni2423
warning: using glyphshow for unencoded glyph: /uogonek
warning: using glyphshow for unencoded glyph: /uring
warning: using glyphshow for unencoded glyph: /utilde
warning: using glyphshow for unencoded glyph: /v
warning: using glyphshow for unencoded glyph: /varcopyright
warning: using glyphshow for unencoded glyph: /vardotaccent
warning: using glyphshow for unencoded glyph: /varregistered
warning: using glyphshow for unencoded glyph: /w
warning: using glyphshow for unencoded glyph: /wacute
warning: using glyphshow for unencoded glyph: /wcircumflex
warning: using glyphshow for unencoded glyph: /wdieresis
warning: using glyphshow for unencoded glyph: /wgrave
warning: using glyphshow for unencoded glyph: /won
warning: using glyphshow for unencoded glyph: /x
warning: using glyphshow for unencoded glyph: /y
warning: using glyphshow for unencoded glyph: /yacute
warning: using glyphshow for unencoded glyph: /ycircumflex
warning: using glyphshow for unencoded glyph: /ydieresis
warning: using glyphshow for unencoded glyph: /ydotbelow
warning: using glyphshow for unencoded glyph: /yen
warning: using glyphshow for unencoded glyph: /ygrave
warning: using glyphshow for unencoded glyph: /yhookabove
warning: using glyphshow for unencoded glyph: /ytilde
warning: using glyphshow for unencoded glyph: /z
warning: using glyphshow for unencoded glyph: /zacute
warning: using glyphshow for unencoded glyph: /zcaron
warning: using glyphshow for unencoded glyph: /zdotaccent
warning: using glyphshow for unencoded glyph: /zdotbelow
warning: using glyphshow for unencoded glyph: /zero
warning: using glyphshow for unencoded glyph: /zero.oldstyle
warning: using glyphshow for unencoded glyph: /zero.prop
warning: using glyphshow for unencoded glyph: /zero.slash
warning: using glyphshow for unencoded glyph: /zero.taboldstyle
Type1CConverter: converting font /PRYVHD+LMRoman12-Bold to /Obj0000006433
Type1CConverter: converting font /CNEWXW+LMRomanCaps10-Regular to /Obj0000006435
Type1CConverter: converting font /PSVNTQ+LMMathExtension10-Regular to /Obj0000006437
Type1CConverter: converting font /XCFBGN+LMMathItalic12-Regular to /Obj0000006439
Type1CConverter: converting font /BWVMLN+LMMathItalic6-Regular to /Obj0000006441
Type1CConverter: converting font /GSXGNG+LMMathItalic8-Regular to /Obj0000006443
Type1CConverter: converting font /LZRGPS+LMRoman12-Regular to /Obj0000006446
Type1CConverter: converting font /CHGOOK+LMRoman17-Regular to /Obj0000006448
Type1CConverter: converting font /ZMXJCK+LMRoman6-Regular to /Obj0000006450
Type1CConverter: converting font /FATLKW+LMRoman8-Regular to /Obj0000006452
Type1CConverter: converting font /APHGDZ+LMRoman12-Italic to /Obj0000006454
Type1CConverter: converting font /HZJIBN+LMRomanSlant12-Regular to /Obj0000006456
Type1CConverter: converting font /PTMUXV+LMSans10-Bold to /Obj0000006458
Type1CConverter: converting font /BWTMRZ+LMMathSymbols10-Regular to /Obj0000006460
Type1CConverter: converting font /CWURCS+LMMathSymbols6-Regular to /Obj0000006462
Type1CConverter: converting font /KWRJVC+LMMathSymbols8-Regular to /Obj0000006464
Type1CConverter: converting font /KOOAIX+LMMono12-Regular to /Obj0000006466
Type1CConverter: converting font /ULZEWH+MSBM10 to /Obj0000006468
Type1CConverter: all OK
info: This is pdfsizeopt ZIP rUNKNOWN size=105563.
info: loading PDF from: B:\pdfs\input.pdf
info: loaded PDF of 25028325 bytes
info: separated to 6489 objs + xref + trailer
info: found 20 Type1 fonts loaded
info: writing Type1CConverter (438038 font bytes) to: B:\pdfs\psotmp.9664.conv.tmp.ps
info: using Ghostscript "B:\pdfsizeopt\pdfsizeopt_win32exec\pdfsizeopt_gswin\gswin32c.exe": GPL Ghostscript 9.02 (2011-03-30)
info: executing Type1CConverter with Ghostscript: "B:\pdfsizeopt\pdfsizeopt_win32exec\pdfsizeopt_gswin\gswin32c.exe" -q -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -dPDFSETTINGS=/printer -dColorConversionStrategy=/LeaveColorUnchanged -sOutputFile=B:\pdfs\psotmp.9664.conv.tmp.pdf -f B:\pdfs\psotmp.9664.conv.tmp.ps
info: loading PDF from: B:\pdfs\psotmp.9664.conv.tmp.pdf
info: loaded PDF of 173012 bytes
info: separated to 116 objs + xref + trailer
error: duplicate font /Obj0000006127 obj old=6127 new=72
error: duplicate font /Obj0000006127 obj old=6127 new=71
error: duplicate font /Obj0000006127 obj old=6127 new=70
error: duplicate font /Obj0000006127 obj old=6127 new=69
error: duplicate font /Obj0000006127 obj old=6127 new=68
error: duplicate font /Obj0000006127 obj old=6127 new=67
info: found 26 fonts in GS output
Traceback (most recent call last):
  File "B:\pdfsizeopt\pdfsizeopt_win32exec\python26.zip\runpy.py", line 122, in _run_module_as_main
  File "B:\pdfsizeopt\pdfsizeopt_win32exec\python26.zip\runpy.py", line 34, in _run_code
  File "B:\pdfsizeopt\pdfsizeopt.single\__main__.py", line 1, in <module>
  File "B:\pdfsizeopt\pdfsizeopt.single\mainrun.py", line 10, in <module>
  File "B:\pdfsizeopt\pdfsizeopt.single\pdfsizeopt\main.py", line 8240, in main
  File "B:\pdfsizeopt\pdfsizeopt.single\pdfsizeopt\main.py", line 5341, in ConvertType1FontsToType1C
  File "B:\pdfsizeopt\pdfsizeopt.single\pdfsizeopt\main.py", line 5074, in GenerateType1CFontsFromType1
  File "B:\pdfsizeopt\pdfsizeopt.single\pdfsizeopt\main.py", line 5003, in GetFonts
AssertionError: found 6 duplicate font objs in GS output

multivalent problem

I am trying to use pdfsizeopt to shrink a PDF file, and partway through it says:
info: executing Multivalent to optimize PDF: /usr/lib64/java/bin/java -cp ./Multivalent.jar -Djava.awt.headless=true tool.pdf.Compress -nopagepiece -noalt -mon pso.conv.mi.tmp.pdf
java.lang.ClassCastException: multivalent.std.adaptor.pdf.COS$1 cannot be cast to multivalent.std.adaptor.pdf.Dict
at multivalent.std.adaptor.pdf.PDFReader.getCatalog(Unknown Source)
at multivalent.std.adaptor.pdf.PDFWriter.<init>(Unknown Source)
at multivalent.std.adaptor.pdf.PDFWriter.<init>(Unknown Source)
at tool.pdf.Compress.writeUni(Unknown Source)
at tool.pdf.Compress.writeFile(Unknown Source)
at tool.pdf.Compress.main(Unknown Source)
pso.conv.mi.tmp.pdf: java.io.IOException: No document catalog.
info: Multivalent has not created output: pso.conv.mi.tmp-o.pdf
Traceback (most recent call last):
  File "/home/zsd/pdfsizeopt-master/pdfsizeopt", line 30, in <module>
    sys.exit(main.main(sys.argv))
  File "/home/zsd/pdfsizeopt-master/lib/pdfsizeopt/main.py", line 8081, in main
    is_flate_ok=not do_decompress_flate)
  File "/home/zsd/pdfsizeopt-master/lib/pdfsizeopt/main.py", line 7773, in Save
    multivalent_java=multivalent_java)
  File "/home/zsd/pdfsizeopt-master/lib/pdfsizeopt/main.py", line 7695, in _RunMultivalent
    assert False, 'Multivalent failed (no output)'
AssertionError: Multivalent failed (no output)

Any suggestions on how to proceed?

This is with 'info: This is pdfsizeopt rUNKNOWN size=321258.', which I downloaded from GitHub a few minutes ago, and a Multivalent.jar that I have had lying around for a while. I can supply more information if you tell me what else you need to know.

Thanks.

pnmtopng: fatal libpng error: Extra compressed data; adding imgdataopt

The image stream in bad_image_extra_data.pdf indeed contains extra bytes after the image data. The expected behavior would be to truncate those extra bytes. What happens instead is that pdfsizeopt calls sam2p, which calls pnmtopng, which fails with the fatal error "pnmtopng: fatal libpng error: Extra compressed data". This makes sam2p fail, which in turn makes pdfsizeopt fail.

The image viewer qiv also reports the error "Extra compressed data" for the corresponding PNG, but it at least displays the image.
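
The truncation described above as the expected behavior can be sketched in Python as follows. This is only an illustration, not pdfsizeopt's actual code: the function names are hypothetical, and the sketch assumes that the image data has already been decoded to raw samples (no predictor applied) and that each image row starts on a byte boundary, as in PDF image streams.

```python
def expected_image_size(width, height, components, bpc):
    """Return the number of bytes of raw sample data for an image.

    Assumption: each row is padded to a whole number of bytes.
    """
    bits_per_row = width * components * bpc
    bytes_per_row = (bits_per_row + 7) // 8  # Round up to a full byte.
    return bytes_per_row * height


def truncate_image_data(data, width, height, components, bpc):
    """Drop any bytes that follow the expected raw image data.

    Hypothetical helper: raises ValueError if the data is too short,
    because missing samples cannot be repaired by truncation.
    """
    size = expected_image_size(width, height, components, bpc)
    if len(data) < size:
        raise ValueError(
            'image data too short: got %d bytes, expected %d'
            % (len(data), size))
    return data[:size]
```

For a 3x2 image with 1 component and 1 bit per component, each row occupies 1 byte, so truncate_image_data(b'\xa0\x40EXTRA', 3, 2, 1, 1) keeps only the first 2 bytes.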

pdfsizeopt.main.FilterError: Ghostscript decompression with filter '/A85' failed: (...)

Dear @pts,

I just stumbled upon a file that I suspect is malformed but which, if preprocessed with, say, GS or qpdf, gets successfully optimized by pdfsizeopt.
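
The preprocessing workaround mentioned above could be scripted roughly like this. It is a minimal sketch under assumptions: the helper names and file names are hypothetical, qpdf and pdfsizeopt are assumed to be on PATH, and the code targets Python 3 rather than the Python version bundled with pdfsizeopt. qpdf exit status 3 (finished with warnings) is treated as success here, because a malformed input is likely to trigger warnings.

```python
import subprocess


def preprocess_commands(input_pdf, repaired_pdf, output_pdf):
    """Build the two command lines: rewrite with qpdf, then optimize."""
    return [
        ['qpdf', input_pdf, repaired_pdf],
        ['pdfsizeopt', repaired_pdf, output_pdf],
    ]


def run_pipeline(input_pdf, repaired_pdf, output_pdf):
    """Run qpdf first, then pdfsizeopt on the rewritten file."""
    qpdf_cmd, pso_cmd = preprocess_commands(
        input_pdf, repaired_pdf, output_pdf)
    result = subprocess.run(qpdf_cmd)
    # Assumption: 0 means success, 3 means success with warnings.
    if result.returncode not in (0, 3):
        raise RuntimeError('qpdf failed with exit code %d' % result.returncode)
    subprocess.run(pso_cmd, check=True)
```

Calling run_pipeline('input.pdf', 'repaired.pdf', 'output.pdf') would leave the intermediate file repaired.pdf in place so that it can be attached to a bug report if pdfsizeopt still fails.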

OTOH, if it is not preprocessed, it gives me a stacktrace like this:

(...)
info: will optimize image XObject 1428; orig width=1860 height=2760 colorspace=/DeviceGray bpc=1 inv=False filter=/CCITTFaxDecode dp=1 size=42414 gs_device=pngmono
info: will optimize image XObject 1431; orig width=1860 height=2760 colorspace=/DeviceGray bpc=1 inv=False filter=/CCITTFaxDecode dp=1 size=43220 gs_device=pngmono
info: will optimize image XObject 1434; orig width=1860 height=2760 colorspace=/DeviceGray bpc=1 inv=False filter=/CCITTFaxDecode dp=1 size=41308 gs_device=pngmono
info: will optimize image XObject 1437; orig width=1860 height=2760 colorspace=/DeviceGray bpc=1 inv=False filter=/CCITTFaxDecode dp=1 size=41785 gs_device=pngmono
info: will optimize image XObject 1440; orig width=1860 height=2760 colorspace=/DeviceGray bpc=1 inv=False filter=/CCITTFaxDecode dp=1 size=40800 gs_device=pngmono
info: will optimize image XObject 1443; orig width=1860 height=2760 colorspace=/DeviceGray bpc=1 inv=False filter=/CCITTFaxDecode dp=1 size=43269 gs_device=pngmono
info: will optimize image XObject 1446; orig width=1860 height=2760 colorspace=/DeviceGray bpc=1 inv=False filter=/CCITTFaxDecode dp=1 size=30509 gs_device=pngmono
info: will optimize image XObject 3202; orig width=1860 height=2760 colorspace=/DeviceGray bpc=1 inv=False filter=/CCITTFaxDecode dp=1 size=26475 gs_device=pngmono
info: using Ghostscript /usr/bin/gs: GPL Ghostscript 9.21 (2017-03-16)
info: decompressing 62 bytes with Ghostscript /Filter/A85
GPL Ghostscript 9.21: Unrecoverable error, exit code 1
Traceback (most recent call last):
  File "/home/rbrito/Downloads/pdfsizeopt/pdfsizeopt", line 41, in <module>
    sys.exit(main.main(sys.argv, script_dir=script_dir))
  File "/home/rbrito/Downloads/pdfsizeopt/lib/pdfsizeopt/main.py", line 9084, in main
    pdf.OptimizeImages(img_cmd_patterns=img_cmd_patterns)
  File "/home/rbrito/Downloads/pdfsizeopt/lib/pdfsizeopt/main.py", line 6886, in OptimizeImages
    obj.Get('ColorSpace'), objs=self.objs, do_strings=True)
  File "/home/rbrito/Downloads/pdfsizeopt/lib/pdfsizeopt/main.py", line 3480, in ResolveReferences
    data = cls.PDF_REF_RE.sub(Replacement, data)
  File "/home/rbrito/Downloads/pdfsizeopt/lib/pdfsizeopt/main.py", line 3467, in Replacement
    return obj.SerializePdfStringSafe(obj.GetUncompressedStream(objs=objs))
  File "/home/rbrito/Downloads/pdfsizeopt/lib/pdfsizeopt/main.py", line 3389, in GetUncompressedStream
    (filter_value, gs_defilter_cmd, data))
pdfsizeopt.main.FilterError: Ghostscript decompression with filter '/A85' failed: gs -dNODISPLAY -sINFN=psotmp.16222.filter.tmp.bin -q -P- -c '/i INFN(r)file<</CloseSource true /Intent 2/Filter /A85>>/ReusableStreamDecode filter def /o(%stdout)(w)file def/s 4096 string def {i s readstring exch o exch writestring not{exit}if}loop o closefile quit' ('Error: /rangecheck in --.reusablestreamdecode--\nOperand stack:\n   i   --nostringval--   --dict:3/3(L)--   --nostringval--   --dict:3/3(L)--   --dict:3/3(L)--\nExecution stack:\n   %interp_exit   .runexec2   --nostringval--   --nostringval--   --nostringval--   2   %stopped_push   --nostringval--   --nostringval--   --nostringval--   false   1   %stopped_push   .runexec2   --nostringval--   --nostringval--   --nostringval--   2   %stopped_push   --nostringval--   1862   4   3   %oparray_pop   1973   3   3   %oparray_pop   --nostringval--\nDictionary stack:\n   --dict:1210/1684(ro)(G)--   --dict:0/20(G)--   --dict:78/200(L)--\nCurrent allocation mode is local\n')

I'm sending you a link to the file by email.

Thanks,

Rogério.

[REGRESSION][BISECTED] Missing/incorrect characters after processing PDF file with pdfsizeopt

Hi, @pts.

I found a file generated by dvipdfm that, when processed with pdfsizeopt, has missing and/or incorrect symbols.

The file in question is the following: wscad06.pdf

Using git bisect (thanks for the more granular commits!), I was able to discover that the first bad commit is: 26ef82d

The log of my git bisect is here:

git bisect start
# good: [4100e2038d113b9cfea587e2326d0ad287493ca7] added handling of /Encoding longer than 256 glyphs to Type1CGenerator
git bisect good 4100e2038d113b9cfea587e2326d0ad287493ca7
# bad: [cd7b8d619d34e95f04693c1cdd62e498356a3e18] updated pdfsizeopt.single
git bisect bad cd7b8d619d34e95f04693c1cdd62e498356a3e18
# good: [c5affe6ea687626d7147f56cef753f3ca458fade] added support for multiple, configurable image optimizers (--use-image-optimizer=...); this fixes https://github.com/pts/pdfsizeopt/issues/7
git bisect good c5affe6ea687626d7147f56cef753f3ca458fade
# good: [fc149a3d2b3d66519d286eb37fec9d4ce6fccd52] migrated NormalizePdfName to _EscapePdfNamesInHexTokens
git bisect good fc149a3d2b3d66519d286eb37fec9d4ce6fccd52
# good: [bded3ac6c41d3160cf8e964ebcca8d43855dd518] fixed whitespace parsing bug in front of obj definition in ParseSequentially for --stats
git bisect good bded3ac6c41d3160cf8e964ebcca8d43855dd518
# good: [f731e2de35de1c6f460cbe02d3db7151debd9ace] made trailer detection more permissive by allowing no whitespace after the trailer keyword
git bisect good f731e2de35de1c6f460cbe02d3db7151debd9ace
# good: [4d1e659cef3068aff26636a2630fd2789954247b] added removal of /Filter, /DecodeParms and /Length from non-stream objs
git bisect good 4d1e659cef3068aff26636a2630fd2789954247b
# good: [1986a7b6086f63bd4b5bfe9826dd72494d0297ab] bugfix: made Type1CConverter able to load PFB fonts by using .loadfont instead of cvx exec; this fixes https://github.com/pts/pdfsizeopt/issues/48
git bisect good 1986a7b6086f63bd4b5bfe9826dd72494d0297ab
# bad: [26ef82d41348204d93a5421c7750c0ee6df572b5] made Type1CConverter generate /Encoding from /CharStrings, so that all glyphs would be included; this fixes https://github.com/pts/pdfsizeopt/issues/49
git bisect bad 26ef82d41348204d93a5421c7750c0ee6df572b5
# good: [5205c6ba752bd637ced86974ec5b6c3beb891469] updated pdfsizeopt.single
git bisect good 5205c6ba752bd637ced86974ec5b6c3beb891469
# first bad commit: [26ef82d41348204d93a5421c7750c0ee6df572b5] made Type1CConverter generate /Encoding from /CharStrings, so that all glyphs would be included; this fixes https://github.com/pts/pdfsizeopt/issues/49

(You can feed that to git bisect replay, if you want to perform the same steps that I did. I'm also attaching the file below).

git-bisect-log-1507093395.log

Thank you very much,

Rogério.

inversion of black-and-white images

See the attached files: hi.pdf and hi.pso.pdf. hi.pdf has white background. hi.pso.pdf should also have white background, but it's black.

Conversion was done with:

$ pdfsizeopt --use-pngout=no --use-multivalent=no hi.pdf
info: This is pdfsizeopt rUNKNOWN size=352854.
info: prepending to PATH: /home/pts/prg/pdfsizeopt/pdfsizeopt_libexec
info: loading PDF from: hi.pdf
info: loaded PDF of 1234 bytes
warning: problem with xref table: xref table not found at 1027
warning: trying to load objs without the xref table
info: separated to 5 objs + trailer
info: found 0 Type1 fonts loaded
info: found 0 Type1C fonts loaded
info: will optimize image XObject 4; orig width=200 height=100 colorspace=/DeviceGray bpc=1 filter=/CCITTFaxDecode dp=1 size=727 gs_device=pngmono
info: optimizing 1 images of 727 bytes in total
info: writing ImageRenderer (712 image bytes) to: psotmp.15477.conv.pngmono.tmp.ps
info: using Ghostscript gs: GPL Ghostscript 9.05 (2012-02-08)
info: executing ImageRenderer with Ghostscript: gs -q -dNOPAUSE -dBATCH -sDEVICE=pngmono -sOutputFile='psotmp.15477.img-%04d.pngmono.tmp.png' -f psotmp.15477.conv.pngmono.tmp.ps
ImageRenderer: rendering image XObject 4 width=200 height=100 bpc=1 colorspace=/DeviceGray filter=/CCITTFaxDecode decodeparms=<< /Columns 200 /K 0 /BlackIs1 true >> device=pngmono
ImageRenderer: all OK
info: loading image from: psotmp.15477.img-0001.pngmono.tmp.png
info: loaded PNG IDAT of 169 bytes
info: executing image optimizer sam2p_np: sam2p -pdf:2 -c zip:1:9 -s Gray1:Indexed1:Gray2:Indexed2:Rgb1:Gray4:Indexed4:Rgb2:Gray8:Indexed8:Rgb4:Rgb8:stop -- psotmp.15477.img-0001.pngmono.tmp.png psotmp.15477.img-4.sam2p-np.pdf
This is sam2p 0.49.
Available Loaders: PS PDF JAI PNG JPEG TIFF PNM BMP GIF LBM XPM PCX TGA.
Available Appliers: XWD Meta Empty BMP PNG TIFF6 TIFF6-JAI JPEG-JAI JPEG PNM GIF89a+LZW XPM PSL1C PSL23+PDF PSL2+PDF-JAI P-TrOpBb.
sam2p: Notice: PNM: loaded alpha, but no transparent pixels
sam2p: Notice: job: read InputFile: psotmp.15477.img-0001.pngmono.tmp.png
sam2p: Notice: writeTTT: using template: p02
sam2p: Notice: applyProfile: applied OutputRule #0
sam2p: Notice: job: written OutputFile: psotmp.15477.img-4.sam2p-np.pdf
Success.
info: loading image from: psotmp.15477.img-4.sam2p-np.pdf
info: loading PDF from: psotmp.15477.img-4.sam2p-np.pdf
info: loaded PDF of 872 bytes
info: separated to 5 objs + xref + trailer
info: loaded PNG IDAT of 165 bytes
info: executing image optimizer sam2p_pr: sam2p -c zip:15:9 -- psotmp.15477.img-0001.pngmono.tmp.png psotmp.15477.img-4.sam2p-pr.png
This is sam2p 0.49.
Available Loaders: PS PDF JAI PNG JPEG TIFF PNM BMP GIF LBM XPM PCX TGA.
Available Appliers: XWD Meta Empty BMP PNG TIFF6 TIFF6-JAI JPEG-JAI JPEG PNM GIF89a+LZW XPM PSL1C PSL23+PDF PSL2+PDF-JAI P-TrOpBb.
sam2p: Notice: PNM: loaded alpha, but no transparent pixels
sam2p: Notice: job: read InputFile: psotmp.15477.img-0001.pngmono.tmp.png
sam2p: Notice: applyProfile: applied OutputRule #2
sam2p: Notice: job: written OutputFile: psotmp.15477.img-4.sam2p-pr.png
Success.
info: loading image from: psotmp.15477.img-4.sam2p-pr.png
info: loaded PNG IDAT of 201 bytes
info: executing image optimizer jbig2: jbig2 -p psotmp.15477.img-4.sam2p-pr.png >psotmp.15477.img-4.jbig2
info: optimized image XObject 4 file_name=psotmp.15477.img-4.jbig2 size=357 (49%) methods=jbig2:357,sam2p_np:365,gs:429,sam2p_pr:461,#orig:727
info: saved 370 bytes (51%) on optimizable images
info: compressed 1 streams, kept 0 of them uncompressed
info: saving PDF with 5 objs to: hi.pso.pdf
info: trying 3 jobs and using the smallest
info: generated object stream of 160 bytes in 3 objects (33%)
info: job original generated 813 bytes (66%)
info: job xrefstm generated 793 bytes (64%)
info: job nostm generated 856 bytes (69%)
info: jobs result: xrefstm=793 original=813 nostm=856
info: generated 793 bytes (64%)
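
To make the symptom concrete: for a packed 1-bit-per-pixel bitmap, swapping black and white amounts to flipping every bit. The Python sketch below only illustrates that polarity flip; the function name `invert_1bpc` is hypothetical, and it is not a diagnosis of where pdfsizeopt or Ghostscript loses track of /BlackIs1.

```python
def invert_1bpc(data):
    """Flip every bit of a packed 1-bit-per-pixel bitmap (black <-> white)."""
    return bytes(b ^ 0xFF for b in data)


# Two bytes = 16 pixels: four set, eight clear, four set.
row = bytes([0b11110000, 0b00001111])
inverted = invert_1bpc(row)
```

Applying the function twice restores the original data, so an output that looks inverted suggests that the polarity was flipped an odd number of times somewhere in the pipeline.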

Missing/swapped characters after processing with pdfsizeopt

Dear @pts,

I was going through my collection of PDF files and I found one document that presents problems with a single character after being processed with pdfsizeopt.

The document in question is the following:

http://www.css.cornell.edu/faculty/dgr2/teach/R/RIntro_ITC.pdf

The problem occurs on page 113. It also occurs if I invoke pdfsizeopt with the option --do-optimize-fonts=no.

If I simply process the file with qpdf (with qpdf RIntro_ITC.pdf out.pdf), then the output looks fine. If I run pdfsizeopt on this output, the problem is still present.

If I preprocess the file with ghostscript (with ps2pdf -dPDFSETTINGS=/prepress RIntro_ITC.pdf), then I get warnings on its output like the following:

   **** Error: Invalid BaseEncoding name "PDFDocEncoding" ignoring BaseEncoding.
               Output may be incorrect.
   **** Error: Invalid BaseEncoding name "PDFDocEncoding" ignoring BaseEncoding.
               Output may be incorrect.
   **** Error: Invalid BaseEncoding name "PDFDocEncoding" ignoring BaseEncoding.
               Output may be incorrect.
   **** Error: Invalid BaseEncoding name "PDFDocEncoding" ignoring BaseEncoding.
               Output may be incorrect.

The file was created with pdfTeX (incredible how many files I have reported here that are produced by pdfTeX that seem to have problems), which one would believe to be one of the best PDF generators out there...

Since the file may be defective, I don't know the priority that you may attribute to this issue...

pdfsizeopt --stats fails

Following your instructions on macOS, I then tested the '--stats' flag:

./pdfsizeopt --stats deptest.pdf

This is what I get:

info: This is pdfsizeopt ZIP rUNKNOWN size=66098.
info: computing statistics for PDF: deptest.pdf
info: PDF size is 36080 bytes
Traceback (most recent call last):
  File "/usr/local/Cellar/python/2.7.13_1/Frameworks/Python.framework/Versions/2.7/lib/python2.7/runpy.py", line 174, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/local/Cellar/python/2.7.13_1/Frameworks/Python.framework/Versions/2.7/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "./pdfsizeopt.single/__main__.py", line 1, in <module>
  File "./pdfsizeopt.single/m.py", line 6, in <module>
  File "./pdfsizeopt.single/pdfsizeopt/main.py", line 5194, in main
  File "./pdfsizeopt.single/pdfsizeopt/main.py", line 4625, in ComputePdfStatistics
  File "./pdfsizeopt.single/pdfsizeopt/main.py", line 4507, in ParseSequentially
AssertionError

values_dict in lib/pdfsizeopt/main.py lines 6327 and 6329 must be cmd_values_dict

pdfsizeopt throws an error with png optimization:

info: loaded PNG IDAT of 4796 bytes
Traceback (most recent call last):
  File "/home/dd/bin/pdfsizeopt/pdfsizeopt", line 37, in <module>
    sys.exit(main.main(sys.argv))
  File "/home/dd/bin/pdfsizeopt/lib/pdfsizeopt/main.py", line 8955, in main
    pdf.OptimizeImages(img_cmd_patterns=img_cmd_patterns)
  File "/home/dd/bin/pdfsizeopt/lib/pdfsizeopt/main.py", line 7111, in OptimizeImages
    return_none_if_status=return_none_if_status)
  File "/home/dd/bin/pdfsizeopt/lib/pdfsizeopt/main.py", line 6327, in ConvertImage
    values_dict['pngout_gray_flags'] = '-c0 '
NameError: global name 'values_dict' is not defined

The error is solved by changing "values_dict" in lib/pdfsizeopt/main.py lines 6327 and 6329 to "cmd_values_dict"

Calling pdfsizeopt out of Emacs: "info: this Ghostscript does not work"

I've been using pdfsizeopt on Windows and Linux for years. After a fresh installation of the Win 8.1 PC, pdfsizeopt can no longer be called from Emacs! I went to emacs.stackexchange.com, asked, even offered a bounty, no result.

I can call pdfsizeopt from command line, no problem. But called from Emacs, I only get the error message that Ghostscript does not work.

I'd be very glad to get it working again. For the details, would you please have a look at the above mentioned post at emacs.stackexchange.com?

Any help much appreciated!

Another file with a blank page

Dear @pts,

Packt is a publisher that releases for free one ebook each day. One of their recent freebies, "Android Design Patterns", after being processed with pdfsizeopt, gets a blank cover (i.e., page 1).

Unfortunately, extracting the first page with pdftk and processing only that page makes the problem go away, which is why I am unable to attach a reduced file here.

I am sending you a link to the entire file in private, so that you can see what may be wrong with it.

Thanks,

Rogério.

Monochrome book whose pages get inverted after running pdfsizeopt

Hi, @pts.

I have some (actually, a lot of) files where pdfsizeopt misbehaves. The most recent of these is a book with only black-and-white pages, where pdfsizeopt turns the pages from black text on a white background into white text on a black background, which is rather unexpected.

I'm running pdfsizeopt with the following options:

pdfsizeopt --use-pngout=no --use-multivalent=no file.pdf

I suspect that the problem lies with ghostscript extracting the PNG files from the PDF (I pressed Ctrl-Z while the PNG files were being created and fired up the graphical viewer Eye of MATE and the pictures were already with a black background). I'm using everything from Ubuntu's zesty distribution (I can give you the precise versions if you want).

Since the file is copyrighted, I can send it directly to you if you want. In fact, I may send you many other files and the problems that pdfsizeopt has with them.

Thanks,

Rogério Brito.

Merging subsetted fonts

I have a XeLaTeX document that includes the same font multiple times, and I'm trying to merge those subsets. If I have

\documentclass{article}
\begin{document}
sub document page 1
\end{document}

and

\documentclass{article}
\usepackage{pdfpages}
\begin{document}
main document page 1
\includepdf{subdocument}
\end{document}

and compile both with XeLaTeX, the main document has the fonts GJZFOO+CMR10 and EKQAMG+CMR10 (according to the command pdffonts). If I then run
python pdfsizeopt --use-multivalent=no --use-pngout=no --use-jbig2=no pdffonts.pdf
I see (among other output)
info: merged fonts ['/EKQAMG+CMR10', '/GJZFOO+CMR10'], reduced char count from 29 to 16 (55%)
but the resulting document still has both fonts. Am I missing something? Shouldn't the two subsets be merged into one? Or is that actually happening, and I'm misinterpreting the information?

This sounds related to this google code archive issue, which was closed for lack of input.

I've attached the reduced pdf, if that helps things.
pdffonts.pso.pdf
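
For checking which entries reported by pdffonts are subsets of the same base font, a small Python sketch follows. It relies on the PDF convention that a subset font name starts with a tag of six uppercase letters followed by `+`; the function name `group_by_base_font` is made up for this example and is not part of pdfsizeopt.

```python
import re
from collections import defaultdict

# A subset tag is six uppercase letters followed by '+', e.g. "GJZFOO+".
SUBSET_TAG_RE = re.compile(r"^[A-Z]{6}\+")


def group_by_base_font(font_names):
    """Group font names by base name, ignoring '/' and the subset tag."""
    groups = defaultdict(list)
    for name in font_names:
        base = SUBSET_TAG_RE.sub("", name.lstrip("/"))
        groups[base].append(name)
    return dict(groups)


groups = group_by_base_font(["/EKQAMG+CMR10", "/GJZFOO+CMR10"])
```

Any base name that still maps to more than one entry after optimization is a candidate for a subset that was not merged.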

Numerous missing glyphs

I've had trouble with math symbols disappearing from my document. I've trimmed it down to the following, which has many symbols disappearing, not just math symbols. I'm compiling these two documents with XeLaTeX and then running
python relpath/pdfsizeopt --use-pngout=no --use-jbig2=no --use-multivalent=no myfile.pdf
In the output, all letters are gone, the decimal period is gone, scriptsize numbers are gone, and textsize 0123456789 has become qrs76t32uv. What I find strange is that the output pso file is ok if I do either of the following:

  1. Use pdfLaTeX for either or both TeX files
  2. Remove either of the two images

Is there a way I can keep all of the symbols?

This is possibly related to #11 (and indeed using --do-unify-fonts=no gives the correct output), but my purpose in using pdfsizeopt is to remove duplicate fonts from the included pdf.

included.tex.txt
myfile.tex.txt
myfile.pdf
myfile.pso.pdf

Java needed by Multivalent not found

When I try to run pdfsizeopt, I always get the error "Java needed by Multivalent not found. Specify --use-multivalent=no or install Java (JRE) or Avian". Unfortunately using the option "--use-multivalent=no" leads to the exact same error.

It seems that adding the option has no effect. Also, I have a JRE installed on my system: for example, "java -version" outputs "java version 1.8.0_20".

Another pdfTeX text with pdfsizeopt.main.PdfXrefStreamError

Dear @pts,

I just discovered another file that gives pdfsizeopt a hard time. Once again, it is a file that was produced by pdfTeX.

Here is what I get:

info: This is pdfsizeopt rUNKNOWN size=358996.
info: prepending to PATH: /home/rbrito/Downloads/pdfsizeopt
info: loading PDF from: PythonScientific-simple.pdf
info: loaded PDF of 16267367 bytes
info: found 7674 obj offsets and 34 obj streams in xref stream
Traceback (most recent call last):
  File "/home/rbrito/Downloads/pdfsizeopt/pdfsizeopt", line 37, in <module>
    sys.exit(main.main(sys.argv))
  File "/home/rbrito/Downloads/pdfsizeopt/lib/pdfsizeopt/main.py", line 8783, in main
    ).Load(file_name)
  File "/home/rbrito/Downloads/pdfsizeopt/lib/pdfsizeopt/main.py", line 3514, in Load
    do_ignore_generation_numbers=self.do_ignore_generation_numbers)
  File "/home/rbrito/Downloads/pdfsizeopt/lib/pdfsizeopt/main.py", line 3905, in ParseUsingXref
    xref_ofs, xref_obj_num, xref_generation)
  File "/home/rbrito/Downloads/pdfsizeopt/lib/pdfsizeopt/main.py", line 3848, in ParseUsingXrefStream
    compressed_obj_start))
pdfsizeopt.main.PdfXrefStreamError: location mismatch for objstm obj 175: objstm obj 168 has index 7, xref stream has (830, 63)

I'm uploading the file here.

PythonScientific-simple.pdf

Parsing of trailer fails when there is comment after it before startxref

When running pdfsizeopt --stats on a PDF file that contains a comment before startxref [1], it fails with [2].
Such PDFs are created by the iText library. When the comment is removed, the PDF file is parsed correctly.

[1]

trailer
<</Info 3 0 R/ID [<009aac09acfefa7066ed1a0fd41b64e7><6b21544f90dd1ecd382c0b5bf431f9fa>]/Root 7 0 R/Size 8>>
%iText-5.5.8
startxref
17349
%%EOF

I have uploaded the whole PDF file:
demo-compressed-invalid-fixed.pdf

[2]

info: This is pdfsizeopt rUNKNOWN size=321291.
info: computing statistics for PDF: demo-with-comment-after-trailer.pdf
info: PDF size is 17669 bytes
Traceback (most recent call last):
  File "./pdfsizeopt", line 30, in <module>
    sys.exit(main.main(sys.argv))
  File "./lib/pdfsizeopt/main.py", line 8002, in main
    PdfData.ComputePdfStatistics(file_name=args[0])
  File "./lib/pdfsizeopt/main.py", line 7130, in ComputePdfStatistics
    setitem_callback=SetItemCallback)
  File "./lib/pdfsizeopt/main.py", line 6935, in ParseSequentially
    data, start=i, end_ofs_out=end_ofs_out)
  File "./lib/pdfsizeopt/main.py", line 996, in ParseTrailer
    'bad trailer data: %r' % data[start : start + 256])
pdfsizeopt.main.PdfTokenParseError: bad trailer data: 'trailer\n<</Info 3 0 R/ID [<009aac09acfefa7066ed1a0fd41b64e7><6b21544f90dd1ecd382c0b5bf431f9fa>]/Root 7 0 R/Size 8>>\n%iText-5.5.8\nstartxref\n17349\n%%EOF\n'
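
One way to tolerate such a comment is to skip whitespace and `%` comments between the end of the trailer dictionary and the startxref keyword. The Python sketch below shows the idea on the trailer from [1]; the names `STARTXREF_RE` and `find_startxref` are hypothetical, and this is not the actual ParseTrailer code in pdfsizeopt.

```python
import re

# Any run of PDF whitespace and/or '%' comments (a comment ends at end of line).
_WS_OR_COMMENT = rb"(?:[\x00\t\n\x0c\r ]|%[^\r\n]*)*"
STARTXREF_RE = re.compile(
    _WS_OR_COMMENT + rb"startxref[\x00\t\n\x0c\r ]+(\d+)")


def find_startxref(data, trailer_dict_end):
    """Return the startxref offset found after the trailer dictionary."""
    m = STARTXREF_RE.match(data, trailer_dict_end)
    if not m:
        raise ValueError("startxref not found after trailer")
    return int(m.group(1))


tail = (b"trailer\n"
        b"<</Info 3 0 R/ID [<009aac09acfefa7066ed1a0fd41b64e7>"
        b"<6b21544f90dd1ecd382c0b5bf431f9fa>]/Root 7 0 R/Size 8>>\n"
        b"%iText-5.5.8\nstartxref\n17349\n%%EOF\n")
# The trailer dictionary ends right after the last '>>' in this sample.
offset = find_startxref(tail, tail.rindex(b">>") + 2)
```

The same pattern also matches when no comment is present, so it would not change the behavior for trailers that already parse.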

pdfsizeopt fails with /MultipleFontsDefined

bok%3A978-3-642-76235-2.pdf is uploaded here: https://www.dropbox.com/s/fovle3bx2hz0o0o/bok%253A978-3-642-76235-2.pdf?dl=0

WARNING: The ebook is for personal use and was uploaded ONLY for debugging. It is NOT authorized for distribution anywhere else.

% pdfsizeopt bok%3A978-3-642-76235-2.pdf
info: This is pdfsizeopt rUNKNOWN size=329129.
info: using Java for Multivalent: /usr/bin/java
info: loading PDF from: bok%3A978-3-642-76235-2.pdf
info: loaded PDF of 41970242 bytes
info: separated to 4315 objs + xref + trailer
info: found 8 Type1 fonts loaded
info: writing Type1CConverter (286830 font bytes) to: psotmp.4296.conv.tmp.ps
info: using Ghostscript gs: GPL Ghostscript 9.06 (2012-08-08)
info: executing Type1CConverter with Ghostscript: gs -q -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -dPDFSETTINGS=/printer -dColorConversionStrategy=/LeaveColorUnchanged -sOutputFile=psotmp.4296.conv.tmp.pdf -f psotmp.4296.conv.tmp.ps
Type1CConverter: using interpreter GPL Ghostscript 906 20120808
Type1CConverter: converting font /Times-Roman to /Obj0000000015
Type1CConverter: converting font /Helvetica to /Obj0000000029
Type1CConverter: converting font /Times-Italic to /Obj0000000055
Type1CConverter: converting font /Times-Bold to /Obj0000000068
Type1CConverter: converting font /HelveticaBold to /Obj0000000185
Error: /undefined in --get--
Operand stack:
--nostringval-- --dict:2/100(ro)(L)-- invalidfileaccess --dict:30/32(L)-- MultipleFontsDefined
Execution stack:
%interp_exit .runexec2 --nostringval-- --nostringval-- --nostringval-- 2 %stopped_push --nostringval-- --nostringval-- --nostringval-- false 1 %stopped_push 1910 1 3 %oparray_pop 1909 1 3 %oparray_pop 1893 1 3 %oparray_pop 1787 1 3 %oparray_pop --nostringval-- %errorexec_pop .runexec2 --nostringval-- --nostringval-- --nostringval-- 2 %stopped_push --nostringval-- --nostringval-- --nostringval--
Dictionary stack:
--dict:1169/1684(ro)(G)-- --dict:0/20(G)-- --dict:92/200(L)--
Current allocation mode is local
Current file position is 219634
GPL Ghostscript 9.06: Unrecoverable error, exit code 1
info: Type1CConverter failed, status=0x100
Traceback (most recent call last):
  File "/home/gaussfrank/pdfsizeopt.try/pdfsizeopt", line 30, in <module>
    sys.exit(main.main(sys.argv))
  File "/home/gaussfrank/pdfsizeopt.try/lib/pdfsizeopt/main.py", line 8218, in main
    pdf.ConvertType1FontsToType1C()
  File "/home/gaussfrank/pdfsizeopt.try/lib/pdfsizeopt/main.py", line 5341, in ConvertType1FontsToType1C
    TMP_PREFIX + 'conv.tmp.ps', TMP_PREFIX + 'conv.tmp.pdf')
  File "/home/gaussfrank/pdfsizeopt.try/lib/pdfsizeopt/main.py", line 5065, in GenerateType1CFontsFromType1
    assert False, 'Type1CConverter failed (status)'
AssertionError: Type1CConverter failed (status)

read encrypted and password-protected PDFs

This is a feature request. Currently pdfsizeopt fails with the message "encrypted PDF input not supported" when it encounters an encrypted or password-protected PDF input file. It should be able to decrypt and process it instead (possibly using a password specified on the command line).

Don't do this, it does too many unwanted and unreliable transformations: gs -q -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -sOutputFile=unencrypted.pdf -c .setpdfwrite -f encrypted.pdf

Ghostscript has the -sPDFPassword=... flag which may be used, but then we need to apply some tricks (such as pdf2dsc) to read the raw object values and streams.

Some command-lines which avoid unwanted transformations:

  • qpdf --password=YOURPASSWORD-HERE --decrypt input.pdf output.pdf
  • pdftk input.pdf output output.pdf user_pw YOURPASSWORD-HERE
  • pdftk input.pdf output output.pdf user_pw YOURPASSWORD-HERE owner_pw YOURPASSWORD-HERE
  • pdftk input.pdf output output.pdf input_pw YOURPASSWORD-HERE

Adding qpdf to pdfsizeopt_libexec is feasible; it's 1377748 bytes (1.3 MB) when compiled with xstatic g++ -O2.

FYI qpdf generates a random 2nd element of the /ID.
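
Until decryption is built in, the qpdf command line above could be wrapped in a small preprocessing step. This is a minimal Python sketch, assuming qpdf is installed and on the PATH; the function names `build_qpdf_decrypt_cmd` and `decrypt_pdf` are made up for this example and are not part of pdfsizeopt.

```python
import subprocess


def build_qpdf_decrypt_cmd(input_pdf, output_pdf, password=""):
    """Build the qpdf command line quoted above as an argument list."""
    return ["qpdf", "--password=" + password, "--decrypt",
            input_pdf, output_pdf]


def decrypt_pdf(input_pdf, output_pdf, password=""):
    """Run qpdf; raises CalledProcessError if qpdf exits with nonzero status."""
    subprocess.check_call(
        build_qpdf_decrypt_cmd(input_pdf, output_pdf, password))


cmd = build_qpdf_decrypt_cmd("input.pdf", "output.pdf", "YOURPASSWORD-HERE")
```

Calling `decrypt_pdf("input.pdf", "output.pdf", "YOURPASSWORD-HERE")` and then running pdfsizeopt on output.pdf would be the intended usage. Passing the arguments as a list avoids shell quoting problems with unusual passwords or file names.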
