
pdf_tools's Introduction

Introduction

PDF tools built on PyPDF2.
So far only merger.py, a program that merges multiple PDFs, is available.

Details

Visit my blog for details:
Batch-Processing PDF Files with PyPDF: Merging

New PDF Merger Usage (merger.py)

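Before use, edit a few variables in the program:

# This program depends on the IndexMap module, which I wrote myself.
# See https://github.com/Co1lin/Practical-Python-Scripts/tree/master/IndexMap for details.
# Make sure IndexMap.py is in the same directory as this program.
files = IndexMap()	# create an IndexMap instance

# Set the directory that contains the PDF files
files.set_dir('/path/to/dir_of_PDF_files')

# List of regular expressions (the configuration below takes the first number from the left as the index; a run of digits such as 123 counts as the single number 123)
RE_LIST = [
    r'\d+'
]
files.set_patterns(RE_LIST)	# apply the regular expressions to the IndexMap instance
files.construct_map()	# have the IndexMap instance build its mapping

merger = PdfMerger(files, 'output.pdf')	# the string 'output.pdf' sets the file name of the output PDF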
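
To illustrate what the \d+ pattern selects from a file name, here is a minimal sketch that uses only the standard re module. It does not use IndexMap and is not taken from merger.py; the helper name first_number and the sample file names are made up for the example.

```python
import re

def first_number(filename, pattern=r'\d+'):
    """Return the first run of digits in filename as an int, or None if there is none."""
    match = re.search(pattern, filename)
    return int(match.group()) if match else None

# Hypothetical file names, listed out of order on purpose.
names = ['chapter-10.pdf', 'chapter-2.pdf', 'chapter-1.pdf']

# Sort numerically by the extracted index, so that 2 comes before 10.
ordered = sorted(names, key=first_number)
print(ordered)
```

Sorting on the integer value rather than on the file name is what keeps chapter-10.pdf after chapter-2.pdf.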

Old PDF Merger Usage (pdfmerger.py):

Before running pdfmerger.py, set a few important values in the __init__ method:

self.DIR_PATH = the directory that contains your PDF files (do NOT end it with "/")

Fill in the list of regular expressions (the expressions are applied in order, and the result of each one is the input to the next):

RE_LIST = [
    
]

If the first number in your file names, reading from the start, is the index you want (a run of digits such as 123 counts as one number), a single entry \d+ between [ and ] is all you need.
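
The chaining described above, where each expression runs on the result of the previous one, can be sketched as follows. This is an illustration with the standard re module, not the code of pdfmerger.py; the function apply_patterns and the sample file name are hypothetical.

```python
import re

def apply_patterns(text, patterns):
    """Apply each pattern to the previous match; return the final match or None."""
    for pattern in patterns:
        match = re.search(pattern, text)
        if match is None:
            return None
        text = match.group()
    return text

# Hypothetical name in which the first number (2020) is NOT the wanted index.
RE_LIST = [
    r'part\d+',  # first narrow the text down to "part007"
    r'\d+',      # then take the digits from that result
]
result = apply_patterns('2020-notes-part007.pdf', RE_LIST)
print(int(result))
```

A single \d+ would have picked 2020 here; the extra first pattern is what steers the match to the right number.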

The first index that can be extracted from the file names of the PDFs to merge (for example, if xxx-002.pdf is your first file, enter 2):
self.START_INDEX = 1 # 1 by default

A value greater than the total number of PDFs to merge:
self.INF_VALUE = 100000 # usually no need to change unless you have a very large number of PDFs

For more complex regular expression matching, you can change the code in the __get_index method.

Set the output file name and working type:

merger.work('output.pdf', work_type = 3)

The output file is then output.pdf, written to the directory you set earlier (where your original PDF files are).

You can leave the work_type value as it is.


pdf_tools's Issues

OSError: [Errno 24] Too many open files

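When the program is run in a terminal with the work2 merge method, this error is raised once the merge reaches a certain file.

OSError: [Errno 24] Too many open files: 'xxx.pdf'

It runs normally in PyCharm, where the problem does not occur.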
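
Errno 24 means the process reached its limit on open file descriptors, and that limit can differ between a terminal session and an IDE. The issue does not confirm the cause in this program. One possibility (an assumption) is that every input PDF stays open until the merged file is written. A minimal sketch of one workaround under that assumption: read each file fully into memory and close it right away, so that only one descriptor is open at a time. The helper load_into_memory is hypothetical and not part of this repository.

```python
import io

def load_into_memory(paths):
    """Read every file into a BytesIO buffer, closing each file immediately."""
    buffers = []
    for path in paths:
        # The descriptor is released as soon as the with-block exits.
        with open(path, 'rb') as f:
            buffers.append(io.BytesIO(f.read()))
    return buffers
```

PyPDF2's merger accepts file-like objects in append, so the buffers could be passed in place of paths. The trade-off is memory: all inputs are held in RAM at once.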
