Code Monkey home page Code Monkey logo

lipdf's Introduction

lipdf

lipdf is a core go library for filling PDF forms using pdftk

Installation

install pdftk

说明

generate_fdf获取fdf文件并读取

pdf中可交互表单数据格式为FDF(Forms Data Format)

利用pdftk指令generate_fdf可以从PDF文件中导出fdf数据

pdftk in.pdf generate_fdf output out.fdf

导出后的fdf示例数据如下, 其中Fields表示填充字段,Kids表示子填充字段


%FDF-1.2
%âãÏÓ
1 0 obj 
<<
/FDF 
<<
/Fields [
<<
/Kids [
<<
/V /
/T (marital wid)
>> 
<<
/V ()
/T (ident cntry)
>>]
/T (ap)
>>]
>>
>>
endobj 
trailer

<<
/Root 1 0 R
>>
%%EOF

观察fdf数据,可以提取的有用信息如下, 每一层括号表示一个子类


[[marital wid, ident cntry]ap]

因此在 core.fdf.goreadFormFields方法利用栈的对称性可以 获得fdf的有效key集合

package core

import (
	"bufio"
	"container/list"
	"context"
	"fmt"
	"io"
	"io/ioutil"
	"log"
	"os"
	"os/exec"
	"path/filepath"
	"strings"
	"time"
)

// read and parse pdf form field keys
func readFormFields(filePath string) (map[string]struct{}, error) {
	file, err := os.Open(filePath)
	if err != nil {
		return nil, fmt.Errorf("fail to open file:%v", err)
	}
	defer file.Close()

	reader := bufio.NewReader(file)

	l := list.New()
	var line string
	for {
		line, err = reader.ReadString('\n')
		if err != nil {
			break
		}

		if strings.Contains(line, "[")  {
			l.PushBack("[")
		}
		if strings.Contains(line, "]")  {
			l.PushBack("]")
		}

		if strings.HasPrefix(line, "/T")  {
			l.PushBack(strings.TrimSuffix(strings.TrimPrefix(line, "/T ("), ")\n"))
		}
	}
	if err != io.EOF {
		return nil, fmt.Errorf("fail to read file:%v", err)
	}

	if l.Len() < 2 {
		return nil, nil
	}

	//trim first "["
	l.Remove(l.Front())
	//trim last "]"
	l.Remove(l.Back())

	keys := make(map[string]struct{})
	prefixes := make([]string, 0, 1)
	for l.Len() > 0 {
		str := l.Back().Value.(string)
		if str != "]" && str != "[" {

			//prev value
			prev := l.Back().Prev()
			if prev == nil {
				keys[str] = struct{}{}
				break
			}

			//last prefix
			prefix := ""
			if len(prefixes) > 0 {
				prefix = prefixes[len(prefixes)-1]
			}

			prevStr:= prev.Value.(string)
			if prevStr == "]" {
				//just prefix, don't need to add to keys
				if len(prefix) > 0 {
					prefix = fmt.Sprintf("%s.%s", prefix, str)
				}else {
					prefix = str
				}
				prefixes = append(prefixes, prefix)
			}else {

				// add to keys
				k := ""
				if len(prefix) > 0 {
					k = fmt.Sprintf("%s.%s", prefix, str)
				}else {
					k = str
				}
				keys[k] = struct{}{}

				// [[[a]b]c]
				if prevStr == "[" {
					for prevStr == "[" {
						prefixes = prefixes[0:len(prefixes)-1]
						prev = prev.Prev()
						if prev != nil {
							prevStr = prev.Value.(string)
						}else {
							prevStr = ""
						}
					}
				}
			}
		}
		l.Remove(l.Back())
	}

	return keys, nil
}

dump_data_fields_utf8获取dump数据

利用pdftk指令dump_data_fields_utf8可以从PDF文件中导出dump_field数据

pdftk in.pdf dump_data_fields_utf8 output out.fdf

导出的数据示例如下:


---
FieldType: Text
FieldName: ap.info dtl 1
FieldFlags: 8392704
FieldJustification: Left
---
FieldType: Text
FieldName: ap.new dtl 1
FieldFlags: 8392704
FieldJustification: Left

利用pdftk.fill_form填充PDF表单

利用pdftk.fill_form指令,利用生成的fdf文件填充PDF表单

pdftk form.pdf fill_form data.fdf output form.filled.pdf

lipdf's People

Contributors

sunlidea avatar

Stargazers

 avatar

Watchers

 avatar  avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.