mlodic / pdfid

License: MIT License

Python 100.00%

pdfid's People

pombredanne iv1t3 vhn0912 dealbreaker973

pdfid's Issues

pdf disarm doesn't work

Hi, I got the following output when I ran the provided test file and printed the result of disarmed_pdf_buffers = disarm_pdfs_by_buffer(filenames, file_buffers) to check its contents:

STARTING DISARM
/JS -> /js
PDFiD 0.2.7 ./Dante.pdf
 PDF Header: %PDF-1.7
 obj                   20
 endobj                20
 stream                 5
 endstream              5
 xref                   1
 trailer                1
 startxref              1
 /Page                  1
 /Encrypt               0
 /ObjStm                0
 /JS                    1
 /JavaScript            0
 /AA                    0
 /OpenAction            0
 /AcroForm              0
 /JBIG2Decode           0
 /RichMedia             0
 /Launch                0
 /EmbeddedFile          0
 /XFA                   0
 /Colors > 2^24         0

{'buffers': []} <-- nothing in the returned buffer

Also, in test 3.1, analyze_pdfs_by_buffer actually loaded the file by its filename instead of analyzing the sanitized buffer, which I believe is not the intended behavior.

Hi, I got the following error when testing the library on a PDF exploit generated by the Metasploit module exploit/windows/fileformat/adobe_pdf_embedded_exe_nojs: NameError: name 'name' is not defined.

I believe that the error was introduced in the following line:

pdfid/pdfid/pdfid.py, line 698 at commit f7674ff:

    filename_dict['%s_hexcode_count' % name] = int(node.getAttribute('HexcodeCount'))

Deleting L697 and L698 will fix the issue.

Error 'NoneType' object has no attribute 'append' since upgrading to 1.1.0

Hello,

I use pdfid in a Django project, but since upgrading to version 1.1.0 I get many 'NoneType' object has no attribute 'append' errors.

This is caused by line 577: disarmed_filebuffers.append(disarmed_pdf_buffer.getvalue())
Does anybody know how to fix this?

Thank you
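The error means that disarmed_filebuffers is None when line 577 runs. Assuming the list arrives as a parameter that defaults to (or is passed as) None, a common Python guard is to create the list on demand. The sketch below shows that pattern only; collect_disarmed is a hypothetical helper modeled on the quoted line, and the BytesIO wrapping stands in for the real disarming step.

```python
from io import BytesIO


def collect_disarmed(file_buffers, disarmed_filebuffers=None):
    """Hypothetical helper: append each disarmed buffer's bytes to a list."""
    if disarmed_filebuffers is None:
        # A fresh list per call avoids both None.append and a shared mutable default.
        disarmed_filebuffers = []
    for data in file_buffers:
        disarmed_pdf_buffer = BytesIO(data)  # stand-in for the actual disarm step
        disarmed_filebuffers.append(disarmed_pdf_buffer.getvalue())
    return disarmed_filebuffers
```

Callers that pass their own list still get it filled in place; callers that omit it get a new list back.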

Unbound variable when scanning PDF with hex characters

If a PDF contains hex-encoded characters (for example, obfuscated JavaScript keywords such as /JavaScript written as /#4AavaScript), the following error is raised:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/worker/venv/lib/python3.10/site-packages/pdfid/pdfid.py", line 1096, in PDFiDMain
    ProcessFile(filename, options, plugins, list_of_dict["reports"], disarmed_buffers["buffers"])
  File "/home/worker/venv/lib/python3.10/site-packages/pdfid/pdfid.py", line 819, in ProcessFile
    PDFID2Dict(xmlDoc, options.nozero, options.force, list_of_dict)
  File "/home/worker/venv/lib/python3.10/site-packages/pdfid/pdfid.py", line 698, in PDFID2Dict
    filename_dict['%s_hexcode_count' % name] = int(node.getAttribute('HexcodeCount'))
NameError: name 'name' is not defined
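For context, in a PDF name object any character may be written as # followed by its two-digit hexadecimal code (ISO 32000-1, section 7.3.5), so /#4AavaScript is simply /JavaScript with the J (0x4A) escaped. The sketch below decodes and counts such escapes. The function names are hypothetical, and treating this count as what pdfid's HexcodeCount attribute measures is an assumption.

```python
import re

# A '#' followed by exactly two hex digits is an escaped character in a PDF name.
_NAME_ESCAPE = re.compile(rb"#([0-9A-Fa-f]{2})")


def decode_pdf_name(raw: bytes) -> bytes:
    """Expand #xx escapes in a PDF name, e.g. b'/#4AavaScript' -> b'/JavaScript'."""
    return _NAME_ESCAPE.sub(lambda m: bytes([int(m.group(1), 16)]), raw)


def count_hex_escapes(raw: bytes) -> int:
    """Number of #xx escapes in a PDF name."""
    return len(_NAME_ESCAPE.findall(raw))
```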

The responsible code is in the function PDFID2Dict, where line 698 references a variable, name, that does not exist within the scope of the function (or anywhere else, for that matter):

pdfid/pdfid/pdfid.py, lines 683 to 720 at commit f7674ff:

    def PDFID2Dict(xmlDoc, nozero, force, list_of_dict):
        filename_dict = {}
        filename_dict['version'] = xmlDoc.documentElement.getAttribute('Version')
        filename_dict['filename'] = xmlDoc.documentElement.getAttribute('Filename')
        if xmlDoc.documentElement.getAttribute('ErrorOccured') == 'True':
            filename_dict['error_occured'] = xmlDoc.documentElement.getAttribute('ErrorMessage')
            return
        if not force and xmlDoc.documentElement.getAttribute('IsPDF') == 'False':
            filename_dict['error_occured'] = ' Not a PDF document\n'
            return
        filename_dict['header'] = xmlDoc.documentElement.getAttribute('Header')
        for node in xmlDoc.documentElement.getElementsByTagName('Keywords')[0].childNodes:
            if not nozero or nozero and int(node.getAttribute('Count')) > 0:
                filename_dict[node.getAttribute('Name')] = int(node.getAttribute('Count'))
                if int(node.getAttribute('HexcodeCount')) > 0:
                    filename_dict['%s_hexcode_count' % name] = int(node.getAttribute('HexcodeCount'))
        if xmlDoc.documentElement.getAttribute('CountEOF') != '':
            filename_dict['eof'] = int(xmlDoc.documentElement.getAttribute('CountEOF'))
        if xmlDoc.documentElement.getAttribute('CountCharsAfterLastEOF') != '':
            filename_dict['after_last_eof'] = int(xmlDoc.documentElement.getAttribute('CountCharsAfterLastEOF'))
        for node in xmlDoc.documentElement.getElementsByTagName('Dates')[0].childNodes:
            filename_dict[node.getAttribute('Value')] = node.getAttribute('Name')
        if xmlDoc.documentElement.getAttribute('TotalEntropy') != '':
            filename_dict['entropy'] = {
                "total": xmlDoc.documentElement.getAttribute('TotalEntropy'),
                "bytes": '%10s bytes' % xmlDoc.documentElement.getAttribute('TotalCount')
            }
        if xmlDoc.documentElement.getAttribute('StreamEntropy') != '':
            filename_dict['entropy_inside_streams'] = {
                "total": xmlDoc.documentElement.getAttribute('StreamEntropy'),
                "bytes": '%10s bytes' % xmlDoc.documentElement.getAttribute('StreamCount')
            }
        if xmlDoc.documentElement.getAttribute('NonStreamEntropy') != '':
            filename_dict['entropy_outside_streams'] = {
                "total": xmlDoc.documentElement.getAttribute('NonStreamEntropy'),
                "bytes": '%10s bytes' % xmlDoc.documentElement.getAttribute('NonStreamCount')
            }
        list_of_dict.append(filename_dict)

I cannot provide a fix since I do not know what name is supposed to be in the first place. If anyone can help, that would be much appreciated. :)
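One plausible reading, given that the line just above it stores the count under node.getAttribute('Name'), is that name was meant to be that keyword name. Below is a minimal sketch of the keyword loop under that assumption. keywords_to_dict is a hypothetical standalone helper, not the library's code, and the sample XML is hand-written to carry the attributes PDFID2Dict reads (the Keyword element name is an assumption). Unlike deleting lines 697 and 698, this keeps the hexcode counts in the output.

```python
from xml.dom import minidom


def keywords_to_dict(xml_doc, nozero=False):
    """Hypothetical fixed keyword loop: key hexcode counts by the keyword's Name."""
    result = {}
    for node in xml_doc.documentElement.getElementsByTagName('Keywords')[0].childNodes:
        if node.nodeType != node.ELEMENT_NODE:
            continue  # defensively skip whitespace text nodes
        name = node.getAttribute('Name')  # the variable line 698 appears to expect
        count = int(node.getAttribute('Count'))
        if not nozero or count > 0:
            result[name] = count
            hexcode_count = int(node.getAttribute('HexcodeCount'))
            if hexcode_count > 0:
                result['%s_hexcode_count' % name] = hexcode_count
    return result


# Usage with a hand-written sample document.
sample = minidom.parseString(
    '<PDFiD><Keywords>'
    '<Keyword Name="/JavaScript" Count="1" HexcodeCount="1"/>'
    '<Keyword Name="/JS" Count="0" HexcodeCount="0"/>'
    '</Keywords></PDFiD>'
)
print(keywords_to_dict(sample, nozero=True))
```

The condition not nozero or nozero and count > 0 in the original simplifies to not nozero or count > 0, which is what the sketch uses.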
