dfdewey's Introduction

dfDewey

dfDewey is a digital forensics string extraction, indexing, and searching tool.

Usage

Requirements

bulk_extractor

dfDewey currently requires bulk_extractor for string extraction.

bulk_extractor can be installed from the GIFT PPA.

sudo add-apt-repository ppa:gift/stable
sudo apt update
sudo apt install -y bulk-extractor

bulk_extractor can also be downloaded and built from source here: https://github.com/simsong/bulk_extractor

Note: bulk_extractor v2.0.3 or greater is required.
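The minimum-version note above can be checked programmatically. The following is a minimal sketch, not part of dfDewey: it assumes you already have a version string from your bulk_extractor install (how you obtain it is left out here), and the helper names `parse_version` and `meets_minimum` are hypothetical.

```python
import re

# Minimum version taken from the note above.
MINIMUM_VERSION = (2, 0, 3)


def parse_version(text):
    """Return the first dotted version number in text as a tuple of ints.

    A missing patch component is treated as 0, so "2.1" becomes (2, 1, 0).
    """
    match = re.search(r"(\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part or 0) for part in match.groups())


def meets_minimum(text, minimum=MINIMUM_VERSION):
    """True if the version found in text is at least the required minimum."""
    return parse_version(text) >= minimum
```

Comparing tuples of integers avoids the pitfall of comparing version strings lexically, where "2.0.10" would sort before "2.0.3".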

dfVFS

dfVFS is required for image parsing. It can be installed from the GIFT PPA.

sudo add-apt-repository ppa:gift/stable
sudo apt update
sudo apt install -y python3-dfvfs

It can also be installed using pip:

pip install -r dfvfs_requirements.txt

Datastores

OpenSearch and PostgreSQL are also required to store extracted data. These can be installed separately or started in Docker using docker-compose.

cd docker
sudo docker-compose up -d

Note: To stop the containers (and purge the stored data), run sudo docker-compose down from the docker directory.

dfDewey will try to connect to datastores on localhost by default. If running datastores on separate servers, copy the config file template dfdewey/config/config_template.py to ~/.dfdeweyrc and adjust the server connection settings in the file. You can also specify a different config file location on the command line using -c.
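The lookup described above could be sketched roughly as follows. This is an illustration only, not dfDewey's actual code: the function name `resolve_config_path` is hypothetical, and the precedence (a path given with -c wins over ~/.dfdeweyrc, which wins over the localhost defaults) is an assumption.

```python
import os

# User config location named in the text above.
DEFAULT_USER_CONFIG = "~/.dfdeweyrc"


def resolve_config_path(cli_path=None, user_config=DEFAULT_USER_CONFIG):
    """Pick the config file to load.

    Returns the path given on the command line if there is one, otherwise
    the user config file if it exists, otherwise None (meaning: fall back
    to the built-in localhost defaults).
    """
    if cli_path:
        return os.path.expanduser(cli_path)
    candidate = os.path.expanduser(user_config)
    if os.path.isfile(candidate):
        return candidate
    return None
```

Returning None rather than raising keeps the default localhost behaviour working on a fresh install where no config file has been created yet.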

Installation

python setup.py install

Note: It's recommended to install dfDewey within a virtual environment.

dfdewey's People

Contributors

dfjxs, jaegeral, rgayon

dfdewey's Issues

Change the way images are identified

Currently, images are identified by path, but when dfDewey is invoked by Turbinia the path will be a loopback device, which is the same for all images processed.

The image hash (MD5) is used to name the index and database for each image since the hash is calculated by bulk_extractor during processing, but identifying images this way would mean calculating the hash for each search run.
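To illustrate why hashing on every search run is a concern, here is a minimal sketch of computing an image's MD5 with Python's standard hashlib. It is not how bulk_extractor or dfDewey computes the hash; the function name `md5_of_image` and the chunk size are assumptions made for the example.

```python
import hashlib


def md5_of_image(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of the file at path.

    The file is read in fixed-size chunks so memory use stays flat, but
    every byte still has to be read once, so the cost grows with the
    size of the image.
    """
    digest = hashlib.md5()
    with open(path, "rb") as image:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: image.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Because the whole image must be read to produce the digest, repeating this for each search against a large disk image would add noticeable latency.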

Flush Filesystem DB

The filesystem DB should be flushed to disk before indexing begins.

Also add an option to restart indexing even if the image has already been processed, in case of an error.

Have a timestamp at starting point

It would be really handy to have a timestamp written to stdout once the "Attempt to open image" step is finished.

That way, after running the tool, you have the start times as well as the completion times.
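One possible shape for this request, as a sketch only: a small helper that prefixes a status message with a UTC timestamp. The name `stamped` and the message format are assumptions, not existing dfDewey behaviour.

```python
from datetime import datetime, timezone


def stamped(message, now=None):
    """Prefix message with an ISO 8601 UTC timestamp.

    now can be passed in for testing; by default the current time is used.
    """
    now = now or datetime.now(timezone.utc)
    return f"[{now.isoformat(timespec='seconds')}] {message}"
```

Calling `print(stamped("Image opened, processing started"))` right after the image is opened would record the start time next to the completion time the tool already reports.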

Unit tests

Write unit tests to cover the refactored code.

FR: Add json output option

If you want to process dfDewey output elsewhere, machine-readable output is helpful. Add a command line option to output JSON instead of a text table.
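A rough sketch of what such an option could look like, using only the standard library. The `--json` flag name, the `render` helper, and the row fields in the usage note are all hypothetical; dfDewey's real output columns are not shown in this document.

```python
import argparse
import json


def build_parser():
    """Build a parser with a hypothetical --json switch."""
    parser = argparse.ArgumentParser(description="Hypothetical search front end")
    parser.add_argument(
        "--json", action="store_true",
        help="emit JSON instead of a text table")
    return parser


def render(rows, as_json=False):
    """Render rows (a list of dicts sharing the same keys).

    Returns a JSON document when as_json is set, otherwise a plain
    left-aligned text table with a header line.
    """
    if as_json:
        return json.dumps(rows, indent=2, sort_keys=True)
    if not rows:
        return ""
    headers = list(rows[0])
    # Each column is as wide as its header or its longest value.
    widths = {h: max(len(h), *(len(str(r[h])) for r in rows)) for h in headers}
    lines = ["  ".join(h.ljust(widths[h]) for h in headers)]
    lines += ["  ".join(str(r[h]).ljust(widths[h]) for h in headers) for r in rows]
    return "\n".join(lines)
```

For example, `render([{"term": "foo", "hits": 3}], as_json=build_parser().parse_args(["--json"]).json)` returns a JSON array that another tool can parse, while omitting the flag yields the text table.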

Consider lowering the amount of RAM given to Elasticsearch

I couldn't start the Elasticsearch Docker container in a cloud environment (with limited RAM) for the codelab:

2020-05-07T12:28:45.506740748Z # Native memory allocation (mmap) failed to map 34272509952 bytes for committing reserved memory.

I had to revert to more conservative values, such as:

      - ES_JAVA_OPTS=-Xms512m -Xmx512m
