google / dfdewey

License: Apache License 2.0

digital-forensics forensics dfir

dfDewey

dfDewey is a digital forensics string extraction, indexing, and searching tool.

Requirements

bulk_extractor

dfDewey currently requires bulk_extractor for string extraction.

bulk_extractor can be installed from the GIFT PPA.

sudo add-apt-repository ppa:gift/stable
sudo apt update
sudo apt install -y bulk-extractor

bulk_extractor can also be downloaded and built from source here: https://github.com/simsong/bulk_extractor

Note: bulk_extractor v2.0.3 or greater is required.
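The minimum-version requirement can be checked before running anything else. The following is a minimal sketch, not part of dfDewey: it assumes you already have the version text reported by your bulk_extractor installation (the string format `bulk_extractor 2.0.3` used in the example is an assumption), and the helper names `parse_version` and `meets_minimum` are hypothetical.

```python
import re

# Minimum version taken from the note above.
MINIMUM_VERSION = (2, 0, 3)


def parse_version(text):
    """Return the first dotted x.y.z version found in text as a tuple of ints."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.groups())


def meets_minimum(text, minimum=MINIMUM_VERSION):
    """Return True if the version embedded in text is at least `minimum`."""
    # Tuples compare element by element, so (2, 0, 10) > (2, 0, 3).
    return parse_version(text) >= minimum
```

Comparing integer tuples rather than raw strings avoids the usual pitfall where "2.0.10" sorts before "2.0.3".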

dfVFS

dfVFS is required for image parsing. It can be installed from the GIFT PPA.

sudo add-apt-repository ppa:gift/stable
sudo apt update
sudo apt install -y python3-dfvfs

It can also be installed using pip:

pip install -r dfvfs_requirements.txt

Datastores

OpenSearch and PostgreSQL are also required to store extracted data. These can be installed separately or started in Docker using docker-compose.

cd docker
sudo docker-compose up -d

Note: To stop the containers (and purge the stored data) run sudo docker-compose down from the docker directory.

dfDewey will try to connect to datastores on localhost by default. If running datastores on separate servers, copy the config file template dfdewey/config/config_template.py to ~/.dfdeweyrc and adjust the server connection settings in the file. You can also specify a different config file location on the command line using -c.
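Before running dfDewey it can help to confirm that both datastores are reachable. The sketch below is not dfDewey code; it only tests whether a TCP connection can be opened. The ports are assumptions based on the usual defaults for each product (9200 for OpenSearch, 5432 for PostgreSQL), so adjust them to match your own configuration. The function names are hypothetical.

```python
import socket

# Assumed defaults: OpenSearch on 9200, PostgreSQL on 5432, both on localhost.
DEFAULT_DATASTORES = {
    "OpenSearch": ("localhost", 9200),
    "PostgreSQL": ("localhost", 5432),
}


def is_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def check_datastores(datastores=DEFAULT_DATASTORES):
    """Map each datastore name to whether its host and port accept connections."""
    return {
        name: is_reachable(host, port)
        for name, (host, port) in datastores.items()
    }
```

Calling `check_datastores()` returns a dictionary such as `{"OpenSearch": True, "PostgreSQL": False}`, depending on what is actually running. A successful connection shows only that something is listening on the port, not that credentials or settings are correct.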

Installation

python setup.py install

Note: It's recommended to install dfDewey within a virtual environment.

dfdewey's Issues

Highlight search terms in output

Highlight (colour) search terms in the output string.
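One possible approach, sketched here under assumptions rather than taken from dfDewey, is to wrap each match in ANSI escape codes. The function name `highlight`, the choice of red, and the case-insensitive matching are all assumptions made for illustration.

```python
import re

# Standard ANSI escape sequences: bright red foreground and reset.
ANSI_RED = "\033[91m"
ANSI_RESET = "\033[0m"


def highlight(text, term, colour=ANSI_RED):
    """Wrap every case-insensitive occurrence of term in ANSI colour codes."""
    if not term:
        return text
    # re.escape ensures the term is matched literally, not as a regex.
    pattern = re.compile(re.escape(term), re.IGNORECASE)
    return pattern.sub(lambda m: f"{colour}{m.group(0)}{ANSI_RESET}", text)
```

The original casing of each match is preserved because the replacement reuses the matched text. Colour codes should probably be suppressed when output is not a terminal, for example when it is redirected to a file.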

Change the way images are identified

Images are currently identified by path, but when dfDewey is invoked by Turbinia the path will be a loopback device, which is the same for every image processed.

The image hash (MD5) is already used to name the index and database for each image, because bulk_extractor calculates it during processing. Identifying images by hash, however, would mean recalculating the hash on every search run.
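To illustrate the cost in question, here is a minimal sketch of hashing an image file in fixed-size chunks. It is not how dfDewey or bulk_extractor computes the hash; the function name `md5_of_file` and the 1 MiB chunk size are assumptions.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of the file at path, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes (EOF).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Chunked reading keeps memory use constant, but the whole image still has to be read from disk, so the time grows linearly with image size. That is why hashing on every search run is unattractive for large images.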

Refactor image util

Flush Filesystem DB

The filesystem DB should be flushed to disk before indexing begins.

Also add an option to restart indexing even if the image has already been processed, in case of an error.

Have a timestamp at starting point

It would be really handy to have a timestamp written to stdout once the "Attempt to open image" step has finished.

That way, after a run you have the start times as well as the completion times.
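A minimal sketch of what such a log line could look like follows. It does not reflect dfDewey's actual logging; the function name `log_event`, the bracketed format, and the use of UTC are assumptions.

```python
from datetime import datetime, timezone


def log_event(message, now=None):
    """Print message prefixed with an ISO 8601 timestamp and return the line."""
    # `now` can be injected for testing; otherwise the current UTC time is used.
    now = now or datetime.now(timezone.utc)
    line = f"[{now.isoformat(timespec='seconds')}] {message}"
    print(line)
    return line
```

Calling `log_event("Image opened")` right after the image is opened, and again on completion, gives both the start and end times in the same output.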

Unit tests

Write unit tests to cover the refactored code.

FR: Add json output option

Machine-readable output is helpful when dfDewey results need to be processed elsewhere. Add a command line option to output JSON instead of a text table.
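The request could be sketched as a boolean flag that switches the renderer. Everything here is hypothetical and not taken from dfDewey: the `--json` flag name, the `render` function, and the result fields (`offset`, `hit`) are assumptions chosen for illustration.

```python
import argparse
import json


def build_parser():
    """Build a parser with a hypothetical --json switch."""
    parser = argparse.ArgumentParser(description="hypothetical search front end")
    parser.add_argument(
        "--json", action="store_true",
        help="emit JSON instead of a text table")
    return parser


def render(results, as_json=False):
    """Render a list of dicts as JSON or as a simple aligned text table."""
    if as_json:
        return json.dumps(results, indent=2, sort_keys=True)
    if not results:
        return ""
    headers = list(results[0].keys())
    # Each column is as wide as its longest cell, header included.
    widths = [
        max(len(str(h)), *(len(str(row[h])) for row in results))
        for h in headers
    ]
    lines = ["  ".join(str(h).ljust(w) for h, w in zip(headers, widths))]
    for row in results:
        lines.append(
            "  ".join(str(row[h]).ljust(w) for h, w in zip(headers, widths)))
    return "\n".join(lines)
```

A caller would parse the arguments with `build_parser().parse_args()` and pass `args.json` as `as_json`. Keeping both formats behind one function means the search code itself does not need to know which output was requested.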

Consider lowering the amount of RAM given to Elasticsearch

I couldn't start the elasticsearch docker container in a cloud environment (with limited RAM) for the codelab:

2020-05-07T12:28:45.506740748Z # Native memory allocation (mmap) failed to map 34272509952 bytes for committing reserved memory.

I had to revert to more conservative values, such as:

      - ES_JAVA_OPTS=-Xms512m -Xmx512m
