ccnmtl / fdfgen

Python port of the PHP forge_fdf library for filling in PDF forms

License: BSD 3-Clause "New" or "Revised" License

Python 100.00%

fdfgen's Introduction

fdfgen

Python port of the PHP forge_fdf library by Sid Steward

PDF forms work with FDF data. I ported a PHP FDF library to Python a while back when I had to do this and released it as fdfgen. I use that to generate an fdf file with the data for the form, then use pdftk to push the fdf into a PDF form and generate the output.

QUICK INSTALL

pip install fdfgen

HOW IT WORKS

You (or a designer) design the form.pdf in Acrobat.
Mark the form fields and take note of the field names. This can be done either through Acrobat or by installing pdftk and running the command
```
 pdftk [pdf name] dump_data_fields
```
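
If the field names need to be collected programmatically, the text printed by `dump_data_fields` can be scanned for them. The following is a minimal sketch, assuming the output consists of `Key: Value` lines grouped into records separated by `---`; the helper name `parse_field_names` and the sample text are illustrative and not part of fdfgen or pdftk.

```python
def parse_field_names(dump_text):
    """Return every FieldName value found in dump_data_fields-style text."""
    names = []
    for line in dump_text.splitlines():
        key, sep, value = line.partition(":")
        # Only lines of the form "FieldName: <name>" are of interest here.
        if sep and key.strip() == "FieldName":
            names.append(value.strip())
    return names


# Hand-written sample in the assumed output format (not real pdftk output).
sample = """---
FieldType: Text
FieldName: name
FieldFlags: 0
---
FieldType: Text
FieldName: telephone
FieldFlags: 0
"""

print(parse_field_names(sample))
```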

Let's say your form has fields "name" and "telephone".

Use fdfgen to create a FDF file:

 #!/usr/bin/env python
 from fdfgen import forge_fdf
 
 fields = [('name', 'John Smith'), ('telephone', '555-1234')]
 fdf = forge_fdf("",fields,[],[],[])
 
 with open("data.fdf", "wb") as fdf_file:
     fdf_file.write(fdf)

Then you run pdftk to merge and flatten:
```
pdftk form.pdf fill_form data.fdf output output.pdf flatten
```
and a filled out, flattened (meaning that there are no longer editable form fields) pdf will be in output.pdf.
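
The merge step can also be driven from Python rather than typed by hand. This is a sketch under the assumption that `pdftk` is on the PATH; the helper names `build_fill_command` and `fill_form` are hypothetical, and only the pdftk arguments shown in the command above are used.

```python
import subprocess


def build_fill_command(form_pdf, fdf_path, output_pdf, flatten=True):
    """Build the pdftk argument list for filling a form from an FDF file."""
    cmd = ["pdftk", form_pdf, "fill_form", fdf_path, "output", output_pdf]
    if flatten:
        # "flatten" removes the editable form fields from the result.
        cmd.append("flatten")
    return cmd


def fill_form(form_pdf, fdf_path, output_pdf, flatten=True):
    """Run pdftk and raise CalledProcessError if it exits with an error."""
    subprocess.run(
        build_fill_command(form_pdf, fdf_path, output_pdf, flatten),
        check=True,
    )


# Show the command that would be run, without invoking pdftk.
print(" ".join(build_fill_command("form.pdf", "data.fdf", "output.pdf")))
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.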

CHANGELOG

0.16.1 -- 2017-11-21 -- Fix TypeError in python 3.6 by Tom Grundy (@caver456)
0.16.0 -- 2017-02-22 -- Allow for different values for each checkbox by [email protected]
0.15.0 -- 2016-09-23 -- Encode field names as UTF-16 fix by Andreas Pelme
0.14.0 -- 2016-08-09 -- Adobe FDF Compatibility added by Cooper Stimson (@6C1)
0.13.0 -- 2016-04-22 -- python 3 bugfix from Julien Enselme
0.12.1 -- 2015-11-01 -- handle alternative checkbox values fix from Bil Bas https://github.com/Spooner
0.12.0 -- 2015-07-29 -- python 3 bugfixes
0.11.0 -- 2013-12-07 -- python 3 port from Evan Fredericksen
0.10.2 -- 2013-06-16 -- minor code refactor and added command line options from Robert Stewart https://github.com/rwjs
0.10.1 -- 2013-04-22 -- unbalanced paren bugfix from Brandon Rhodes
0.10.0 -- 2012-06-14 -- support checkbox fields and parenthesis in strings from Guangcong Luo
0.9.2 -- 2011-01-12 -- merged unicode fix from Sébastien Fievet

RUNNING TESTS:

Create a virtual environment
tox is required to run the tests. You can install the correct version with pip install -r requirements-tests.txt
Run tox to run tests for all Python versions.
To run a specific test and specific Python versions, you may use tox -e py27 -- tests/test_encoding.py

fdfgen's Issues

Hebrew letters not showing up in filled pdf

Hi,
I'm using your package to generate filled pdf form, but I can't get hebrew to show up. The best I got was 􀀀􀀀􀀀􀀀􀀀
Any ideas?
I used the havarat_gemel.pdf and sample_data.txt (please change extension back to py, I couldn't upload a py file) to generate the fdf (same, I named it data.fdf but renamed to data_.pdf to upload, please rename to .fdf) and filled.pdf files using the following commands:

>>> from fdfgen import forge_fdf
>>> from forms.sample_data import data
>>> fdf = forge_fdf("",data ,[],[],[])
>>> fdf_file = open("data.fdf", "wb")

and then:

>pdftk forms/templates/havarat_gemel.pdf fill_form data.fdf output filled.pdf flatten

filled.pdf
havarat_gemel.pdf
sample_data.txt

data_.pdf

Fdf file not being created

Edit: Ignore. Error on my end.

TypeError: write() argument must be str, not bytes python 3.7.1

Here's my code:

    import csv
    from fdfgen import forge_fdf
    import os
    import sys

    sys.path.insert(0, os.getcwd())
    filename_prefix = "NVC"
    csv_file = "nvc.csv"
    pdf_file = "NVC.pdf"
    tmp_file = "tmp.fdf"
    output_folder = './output/'

    def process_csv(file):
        headers = []
        data = []
        csv_data = csv.reader(open(file))
        for i, row in enumerate(csv_data):
            if i == 0:
                headers = row
                continue
            field = []
            for i in range(len(headers)):
                field.append((headers[i], row[i]))
            data.append(field)
        return data

    def form_fill(fields):
        fdf = forge_fdf("",fields,[],[],[])
        fdf_file = open(tmp_file,"wb")
        fdf_file.write(fdf)
        fdf_file.close()
        output_file = '{0}{1} {2}.pdf'.format(output_folder, filename_prefix, fields[1][1])
        cmd = 'pdftk "{0}" fill_form "{1}" output "{2}" dont_ask'.format(pdf_file, tmp_file, output_file)
        os.system(cmd)
        os.remove(tmp_file)

    data = process_csv(csv_file)
    print('Generating Forms:')
    print('-----------------------')
    for i in data:
        if i[0][1] == 'Yes':
            continue
        print('{0} {1} created...'.format(filename_prefix, i[1][1]))
        form_fill(i)

Every time I run this, it gives me the above TypeError. I've tried changing variables, but it doesn't seem to matter: any time I try to write the FDF file, I get that error.
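
For reference, this particular TypeError is what Python 3 raises when bytes are written to a file opened in text mode. The sketch below reproduces it with a stand-in byte string, on the assumption that forge_fdf returns bytes (as the "wb" in the README example suggests). The script posted above already uses "wb", so this may not be the cause in this report; it only illustrates the difference between the two modes.

```python
import os
import tempfile

# Stand-in for the bytes that forge_fdf is assumed to return.
fdf = b"%FDF-1.2\n"

path = os.path.join(tempfile.mkdtemp(), "tmp.fdf")

# Text mode ("w") refuses bytes and raises the TypeError from the title.
try:
    with open(path, "w") as fdf_file:
        fdf_file.write(fdf)
    text_mode_error = None
except TypeError as exc:
    text_mode_error = str(exc)

# Binary mode ("wb") accepts the same bytes unchanged.
with open(path, "wb") as fdf_file:
    fdf_file.write(fdf)

with open(path, "rb") as fdf_file:
    written = fdf_file.read()

print(text_mode_error)
print(written == fdf)
```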

Problems with parentheses

I recently discovered that parentheses in data values passed to forge_fdf() result in invalid FDF files. (For a discussion of the issue, see http://www.hyperborea.org/journal/2005/10/fdf-errors/.)

I was able to fix the problem in fdfgen by changing smart_encode_str() such that it prefixes the parentheses with a backslash, like so:

    def smart_encode_str(s):
        # Prefix "(" and ")" in the UTF-16BE bytes with a backslash so the FDF string stays valid.
        return codecs.BOM_UTF16_BE + unicode(s).encode('utf_16_be').replace('\x00)', '\x00\\)').replace('\x00(', '\x00\\(')

Please consider making this change in your next release.

Thanks,
Sam
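
The same idea can be sketched for Python 3. Note that this is a variant rather than the change proposed above: it walks the encoded bytes and escapes every `(`, `)` and backslash byte, wherever it falls in the UTF-16BE stream. The function name `encode_pdf_text` is hypothetical and is not part of fdfgen.

```python
import codecs


def encode_pdf_text(s):
    """Encode s as BOM-prefixed UTF-16BE with PDF string delimiters escaped."""
    raw = codecs.BOM_UTF16_BE + s.encode("utf_16_be")
    out = bytearray()
    for byte in raw:
        # "(", ")" and "\" are special inside a PDF literal string,
        # so each such byte is prefixed with a backslash (0x5C).
        if byte in b"()\\":
            out.append(0x5C)
        out.append(byte)
    return bytes(out)


print(encode_pdf_text("a(b)"))
```

Working on individual bytes avoids matching a two-byte pattern that straddles two adjacent characters.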

TypeError: unsupported operand type(s) for %: 'bytes' and 'bytes'

When I try to check a box in a pdf, I get this error:

Traceback (most recent call last):
  File "./create_pdf.py", line 141, in <module>
    run_script()
  File "./create_pdf.py", line 129, in run_script
    template_type='pdf' 
  File "./create_pdf.py", line 77, in fillOutPdf
    fdf = forge_fdf("", fields, [], [], [])
  File "/home/tmcothran/Python_Projects/virtualenvs/Document_Generator/lib/python3.4/site-packages/fdfgen/__init__.py", line 101, in forge_fdf
    checkbox_checked_name)))
  File "/home/tmcothran/Python_Projects/virtualenvs/Document_Generator/lib/python3.4/site-packages/fdfgen/__init__.py", line 50, in handle_data_strings
    value = b'/%s' % checkbox_checked_name
TypeError: unsupported operand type(s) for %: 'bytes' and 'bytes'

I tried passing the argument checkbox_checked_name=b"On", and I got this error instead:

File "./create_pdf.py", line 141, in <module>
    run_script()
  File "./create_pdf.py", line 129, in run_script
    template_type='pdf'    
  File "./create_pdf.py", line 77, in fillOutPdf
    fdf = forge_fdf("", fields, [], [], [], checkbox_checked_name=b"On")
  File "/home/tmcothran/Python_Projects/virtualenvs/Document_Generator/lib/python3.4/site-packages/fdfgen/__init__.py", line 101, in forge_fdf
    checkbox_checked_name)))
  File "/home/tmcothran/Python_Projects/virtualenvs/Document_Generator/lib/python3.4/site-packages/fdfgen/__init__.py", line 50, in handle_data_strings
    value = b'/%s' % checkbox_checked_name

I'm very new to programming, and since there's no documentation on the checkboxes, I'm trying to do it from looking at the code and the comments in the code. So I might just be doing it wrong. I'm using Python 3.4.
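
Some background on the error: `%` formatting on bytes objects was only added in Python 3.5 (PEP 461), so an expression like `b'/%s' % ...` cannot work on Python 3.4 regardless of the argument type. A version-independent way to build the same value is plain concatenation. The helper below is a hypothetical sketch, not fdfgen's code.

```python
def checkbox_name_value(name):
    """Build a PDF name such as b'/Yes' from a str or bytes checkbox state."""
    if isinstance(name, str):
        # PDF names used for checkbox states are assumed to be plain ASCII here.
        name = name.encode("ascii")
    # Concatenation works on every Python 3 version, unlike bytes % formatting.
    return b"/" + name


print(checkbox_name_value("Yes"))
print(checkbox_name_value(b"On"))
```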

encoding error with umlauts

Whenever the fields in data strings contain umlauts, an error is thrown, such as
UnicodeDecodeError: 'ascii' codec can't decode byte 0xfe in position 1: ordinal not in range(128)
or similar. Example data fields:
[('id', '12345'), ('lastname', 'Müller'), ('firstname', 'Horst'), ('content', 'Prüfung'), ('title', 'Title with ÄÜÖäüö'), ('date', '21.02.2017')]
Tried out different versions of encoding/decoding, string/unicode, utf8/utf16...
It would be really nice to get this feature back, as creating PDF forms worked perfectly fine in version 0.13.0 before introducing Adobe FDF compatibility and UTF16 encoding.

handle_data_names

I'm wondering if handle_data_names should be converting the value to a string.

Currently it does this:
    for (key, value) in fdf_data_names:
        yield b''.join([b'<<\n/V /', smart_encode_str(value), b'\n/T (',

For example, when it is passed this value (shown in pdb):

/.../fdfgen/__init__.py(68)handle_data_names()
-> yield b''.join([b'<<\n/V /', smart_encode_str(value), b'\n/T (',
(Pdb) value
'Off'
(Pdb)

it generates FDF that looks like this (as seen in Emacs, utf8):
<<
/V /\376\377^@o^@f^@f
/T (\376\377^@U^@s^@ ^@m^@A^@i^@l)
/Clrf 2
/ClrFf 1

but when I use the generated fdf, I get:
linux-3:~/..$ pdftk RoutineLandscaping_100614.pdf fill_form tmp.fdf output ./output/output.pdf
Unhandled Java Exception:
Unhandled Java Exception:
java.lang.NullPointerException
at gnu.gcj.runtime.NameFinder.lookup(libgcj.so.14)
at java.lang.Throwable.getStackTrace(libgcj.so.14)
at java.lang.Throwable.stackTraceString(libgcj.so.14)
at java.lang.Throwable.printStackTrace(libgcj.so.14)
at java.lang.Throwable.printStackTrace(libgcj.so.14)

So, I wonder, since handle_data_strings is there to handle strings, if instead handle_data_names should just pass the values like this:
    for (key, value) in fdf_data_names:
        yield b''.join([b'<<\n/V /', value, b'\n/T (',

doing this, the generated fdf looks like this:

<</V /Off
/T (\376\377^@U^@s^@ ^@m^@A^@i^@l)
/Clrf 2
/ClrFf 1

which works fine with pdftk fill_form.

This is just one example of course. Perhaps I'm passing in a value incorrectly..? but the name and the original php makes me wonder.

Please make a new release and make it available via pypi

Currently the latest release on PyPI is from 2013, and Python 3 support has been added to the repository in the meantime. It would be awesome if https://pypi.python.org/pypi/fdfgen/ could be updated to a later release.
Thanks.

Dependabot couldn't authenticate with https://pypi.python.org/simple/

Dependabot couldn't authenticate with https://pypi.python.org/simple/.

You can provide authentication details in your Dependabot dashboard by clicking into the account menu (in the top right) and selecting 'Config variables'.

View the update logs.

Update version on pypi

Give the current source a new version number (including the support for checkboxes), and upload the version to PyPI (http://pypi.python.org/pypi/fdfgen)

Keep handle_data_names()?

It seems that handle_data_strings() can process both checkboxes and radio buttons without any issues, even though this is the stated purpose of handle_data_names(). If that's the case, would it make sense to simply remove handle_data_names() and give the user one less parameter in forge_fdf() to figure out?

`readonly` does not work

Hey,
I am trying to utilize your project for filling a form, where I want to set some of the entities to readonly.

    fields = {'x1': form_data['x1'],
              'x2': form_data['x2'],
              'x3': form_data['x3']}
    fdf = forge_fdf(fdf_data_strings=fields,
                    fields_readonly=fields.keys())

When opening the generated pdf, I noticed that the readonly properties seem to not have been applied properly, as I can still edit the fields.
Looking at the FDF file yields the following:

%FDF-1.2
%
1 0 obj
<</FDF<</Fields[<</T(x1)/V(2022-04-25)/ClrF 2/SetFf 1>><</T(x2)/V(etklj)/ClrF 2/SetFf 1>><</T(x3)/V(lkjlj)/ClrF 2/SetFf 1>>]
>>
>>
endobj
trailer

<<
/Root 1 0 R
>>
%%EOF

Do you know what the reason for this might be?
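
As background for reading the dump above: in the PDF specification, `/SetFf` sets bits in a field's flags, where bit 1 is ReadOnly, and `/ClrF` clears annotation flags, where bit 2 is Hidden. So the FDF itself does appear to request read-only fields; whether the tool that applies the FDF honors those keys is a separate question that this sketch does not answer. The helper `readonly_requests` is hypothetical, and the regular expression is only meant for the compact layout shown above, not as a general FDF parser. The sample is a shortened, modified copy of the dump in which `x2` has no `/SetFf` entry.

```python
import re

READ_ONLY = 1  # field flag bit 1 (ReadOnly) in the PDF specification


def readonly_requests(fdf_bytes):
    """Map each field name to whether its FDF entry sets the ReadOnly bit."""
    result = {}
    # Each field dictionary is assumed to look like <</T(name)...>>.
    for match in re.finditer(rb"<</T\(([^)]*)\)(.*?)>>", fdf_bytes, re.DOTALL):
        name = match.group(1).decode("latin-1")
        set_ff = re.search(rb"/SetFf (\d+)", match.group(2))
        flags = int(set_ff.group(1)) if set_ff else 0
        result[name] = bool(flags & READ_ONLY)
    return result


sample = (
    b"<</FDF<</Fields[<</T(x1)/V(2022-04-25)/ClrF 2/SetFf 1>>"
    b"<</T(x2)/V(etklj)/ClrF 2>>]\n>>\n>>"
)

print(readonly_requests(sample))
```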
Best regards
Anton

Add a hello world and basic command-line usage: extract info, alter some info, and push it back into the pdf via fdf

Hi,
I came here looking for exactly this use case, as anyone might:
a hello world and basic command-line usage that extracts info, alters some of it, and pushes it back into the PDF via FDF.
With all due respect, why is this basic use case not part of a test suite that shows your tool working? In any case, this use case is so basic that any user wants or needs it as a toolchain "hello world".
So the request: build a command-line demo script and add it to the docs.

By the way, I would do it right now. It would be easier with your help, wouldn't it? First: is it feasible? Second: are there templates for it, and where? Thanks in advance.

Third, a template PDF form is needed for the demo. I could use mine, but that could be embarrassing for others, probably because forms are in themselves human-humiliating techniques. In the end, it turned out to be less embarrassing not to do nothing in front of these barriers.
The result would be a script file that performs a few simple feedback operations on a PDF form. This script should be covered by a test suite, so that every user gets turnkey, out-of-the-box PDF form filling with feedback shipped.
Can you imagine how much user time you could save by freeing every new user from having to work out a personal "hello world" and then keeping it hidden afterwards? That makes the difference.

Regards and Thanks for your tool

TypeError in python 3.6

I have a program (github.com/ncssar/radiolog) that makes various calls to fdfgen. Python 3.6 (but not 3.4) raises a TypeError on startswith if the value is bytes:

Traceback (most recent call last):
  File "radiolog.py", line 3413, in accept
    self.parent.parent.printClueReport(clueData)
  File "radiolog.py", line 1511, in printClueReport
    fdf=forge_fdf("",fields,[],[],[])
  File "C:\Python36\lib\site-packages\fdfgen\__init__.py", line 136, in forge_fdf
    checkbox_checked_name)))
  File "C:\Python36\lib\site-packages\fdfgen\__init__.py", line 75, in handle_data_strings
    value = FDFIdentifier(checkbox_checked_name).value
  File "C:\Python36\lib\site-packages\fdfgen\__init__.py", line 52, in __init__
    if value.startswith('/'):
TypeError: startswith first arg must be bytes or a tuple of bytes, not str

and if I try hardcoding line 52 to bytes instead, I get the opposite TypeError:

Traceback (most recent call last):
  File "radiolog.py", line 3413, in accept
    self.parent.parent.printClueReport(clueData)
  File "radiolog.py", line 1511, in printClueReport
    fdf=forge_fdf("",fields,[],[],[])
  File "C:\Python36\lib\site-packages\fdfgen\__init__.py", line 136, in forge_fdf
    checkbox_checked_name)))
  File "C:\Python36\lib\site-packages\fdfgen\__init__.py", line 77, in handle_data_strings
    value = FDFIdentifier('Off').value
  File "C:\Python36\lib\site-packages\fdfgen\__init__.py", line 52, in __init__
    if value.startswith(b'/'):
TypeError: startswith first arg must be str or a tuple of str, not bytes

I think the solution is just to place the type check and conversion code before the startswith clause instead of after.

Old:

    if value.startswith('/'):
        value = value[1:]

    if isinstance(value, bytes):
        value = value.decode('utf-8')

New: (just swapped the two clauses)

    if isinstance(value, bytes):
        value = value.decode('utf-8')

    if value.startswith('/'):
        value = value[1:]

This works in my program on python 3.6, but, I'm not an expert. Let me know if I should do this as a fork and pull request instead.
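
The reordering can be checked in isolation. The sketch below wraps the "New" clauses in a standalone function; the name `normalize_identifier` is hypothetical, and the rest of the FDFIdentifier class is not reproduced here.

```python
def normalize_identifier(value):
    """Return the identifier as str without a leading slash."""
    # Decode first, so that the startswith() check below always
    # compares str with str, whatever type the caller passed in.
    if isinstance(value, bytes):
        value = value.decode("utf-8")

    if value.startswith("/"):
        value = value[1:]

    return value


for candidate in (b"/Yes", "/On", "Off"):
    print(repr(candidate), "->", repr(normalize_identifier(candidate)))
```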

Thanks. This project has been very helpful!

Missing proper License file

The lack of a proper License file doesn't allow fdfgen to be packaged (in my case for openSUSE).
Picking one published on spdx.org would be a plus.

Hierarchical fields

I needed to fill forms (e.g., US tax forms) with hierarchical fields (e.g., person1.address.city). This doesn't seem to be supported in fdfgen. So I modified the handle_data_strings function to handle inputs such as this:

{'person1': {
    'address': {
        'city': 'NewYork',
        'zip':  '12345'
    }
}}

def handle_data_strings(fdf_data_strings, fields_hidden, fields_readonly,
                        checkbox_checked_name):
    if isinstance(fdf_data_strings, dict):
        fdf_data_strings = fdf_data_strings.items()

    for (key, value) in fdf_data_strings:
        if value is True:
            value = b'/V' + FDFIdentifier(checkbox_checked_name).value
        elif value is False:
            value = b'/V' + FDFIdentifier('Off').value
        elif isinstance(value, FDFIdentifier):
            value = b'/V' + value.value
        elif isinstance(value, list) or isinstance(value, tuple) or isinstance(value, dict):
            value = \
                b'/Kids[\n' \
                + b''.join(handle_data_strings(value, fields_hidden, fields_readonly,
                                               checkbox_checked_name)) \
                + b']'
        else:
            value = b'/V' + b''.join([b'(', smart_encode_str(value), b')'])

        yield b''.join([
            b'<<',
            b'/T(',
            smart_encode_str(key),
            b')',
            value,
            handle_hidden(key, fields_hidden),
            b'',
            handle_readonly(key, fields_readonly),
            b'>>\n',
        ])

I tested it only briefly. So no guarantees. Feel free to integrate it into the code base.
An alternative is to use the somewhat simpler (IMHO) xfdf form data format (also supported by pdftk).

<?xml version="1.0" encoding="UTF-8"?>
<xfdf xmlns="http://ns.adobe.com/xfdf/" xml:space="preserve">
<fields>
<field name="person1">
	<field name="address">
		<field name="city">
			<value>NewYork</value>
		</field>
		<field name="zip">
			<value>12345</value>
		</field>
	</field>
</field>
</fields>
</xfdf>

I wrote a simple hand-coded Python function to generate that and used pdftk to fill the form. I can share the function if anyone is interested.
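
For illustration, one way such a generator could look is sketched below. This is not the function mentioned above, which was not posted; it is a minimal version built on the standard library's `xml.etree.ElementTree`, assuming the input is a nested dict like the one shown earlier. The names `build_xfdf` and `_add_fields` are hypothetical.

```python
import xml.etree.ElementTree as ET

XFDF_NS = "http://ns.adobe.com/xfdf/"


def _add_fields(parent, data):
    """Recursively add <field> elements for a nested dict."""
    for name, value in data.items():
        field = ET.SubElement(parent, "field", {"name": name})
        if isinstance(value, dict):
            # A nested dict becomes child fields (person1 -> address -> city).
            _add_fields(field, value)
        else:
            ET.SubElement(field, "value").text = str(value)


def build_xfdf(data):
    """Return XFDF bytes for a (possibly nested) dict of field values."""
    root = ET.Element("xfdf", {"xmlns": XFDF_NS, "xml:space": "preserve"})
    _add_fields(ET.SubElement(root, "fields"), data)
    declaration = b'<?xml version="1.0" encoding="UTF-8"?>\n'
    return declaration + ET.tostring(root, encoding="utf-8")


xfdf = build_xfdf({"person1": {"address": {"city": "NewYork", "zip": "12345"}}})
print(xfdf.decode("utf-8"))
```

ElementTree takes care of escaping characters such as `&` and `<` in field values, which a hand-written string template would have to do itself.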

forge_fdf generates weird characters

forge_fdf generates weird characters in the keys and values of the data I give it.
Since it changes the keys, the output pdf forms aren't modified.

When manually removing these characters, it works very well.

Characters as seen by notepad++ :

Tagged PDFs

Hi.
I need to create PDFs that are 508 compliant. My current PDF generator supports everything but Tagged PDFs, does fdfgen support Tagged PDFs? I didn't see it in the reference documentation.
Thanks