Dupin

But it is in matters beyond the limits of mere rule that the skill of the analyst is evinced. He makes in silence a host of observations and inferences....

— Edgar Allan Poe, The Murders in the Rue Morgue

Dupin is a tool to help discover secrets in Git repositories.

It is designed to be used as a tool for regularly scanning an organisation's public Git repositories, notifying a nominated email address when it finds anything that looks suspicious.

Quickstart

Install Dupin from source with pip install <path-to-dupin> (a virtualenv is recommended).

For these examples we'll use ~/.dupin as our root directory; you can use any location that makes sense for you.

ROOT=~/.dupin
# sets up a directory for Dupin to store its repositories and results
dupin setup --root $ROOT

# stores a list of your organisation's public repos
dupin update-repos --root $ROOT organisation-name
# if you get rate limit errors you'll need to provide a Github
# token with the --token argument

# scans all repositories in the list for secrets, then logs and shows the results
dupin auto-scan-all --root $ROOT
# this logs what it finds in the $ROOT/results directory and the
# details to the console
# it's also possible to email reports, more details below and in the
# config section

Installation

Dupin is an installable Python package, but it is not hosted in public Python repositories. You can clone the source code and then use pip to install Dupin. This will also install its dependencies.

As ever, it's better to install Dupin into a virtual environment. This prevents Dupin's dependencies from creating problems with other Python software on your machine.

git clone git@github.com:guardian/dupin.git

# install from the cloned directory
# via a virtualenv, or globally (may require sudo)
pip install ./dupin

You should then be able to run dupin.

AWS

This repository includes a CloudFormation template which creates an EC2 instance that runs Dupin on a schedule. If you have an AWS account this is the easiest way to run Dupin.

Usage

Dupin offers several commands. The main ones are described below; check the program's main file for full details.

Note: many of these commands interact with Dupin's directory structure. More information about the layout Dupin uses to store data is available below, in the Directory structure section.

Global arguments

These arguments apply to many/all of Dupin's commands.

--root

Sets the root directory for Dupin's directory structure.

--config

Sets the location of Dupin's config file. By default, the config is read from ROOT/config if a root is provided.

You may instead provide a custom location. This should point to a yaml file that contains Dupin's config.
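The lookup described above can be sketched in a few lines of Python. This is only an illustration, not Dupin's actual code: the function name resolve_config_path is hypothetical, and the sketch assumes that an explicit --config value takes precedence over the ROOT/config default.

```python
from pathlib import Path
from typing import Optional


def resolve_config_path(root: Optional[str], config: Optional[str]) -> Optional[Path]:
    """Return the config file location, or None if neither option was given.

    Assumption: an explicit --config value wins over the ROOT/config default.
    """
    if config is not None:
        return Path(config).expanduser()
    if root is not None:
        return Path(root).expanduser() / "config"
    return None
```

With only a root of /tmp/dupin this yields /tmp/dupin/config; passing a config location as well returns that location instead.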

setup

The setup command initialises Dupin's directory structure. If you're using any of the features Dupin offers that depend on the data it has stored (likely) you'll need to run this command first.

Examples:

dupin setup --root ~/.dupin

update-repos

This command looks up an organisation's public repositories on Github and writes them to a file.

Examples:

# provide args via a config file at ~/.dupin/config
dupin update-repos --root ~/.dupin
# provide args explicitly
dupin update-repos myorg --token abcdef
# save the list of repositories in a provided location
dupin update-repos --file /tmp/organisation-repos.txt

--file

By default, the list is written to ROOT/repository-urls (you'll need to provide a --root argument to take advantage of this). Use --file to specify an alternative location.

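auto-scan-all

This command scans every repository in the list for secrets. It logs what it finds in the ROOT/results directory and prints the details to the console.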

Examples:

dupin --root ~/.dupin auto-scan-all
# instruct Dupin to send notification emails (requires config)
dupin --root ~/.dupin auto-scan-all --notify

--notify

This flag tells Dupin to send notification emails. Doing so will require additional configuration. Since this configuration is non-trivial, you should provide it in a config file, rather than as arguments to Dupin.

More information on configuring Dupin to send email is available below, under Configuration, specifically the SMTP section.

Directory structure

Dupin creates a directory structure for storing its results as follows.

 root
 ├── config
 ├── repository-urls
 ├── repositories
 │   ├── example.git
 │   │   ├── ...etc contents of example repo
 │   │   └── .git
 │   └── example-2.git
 │       ├── ...etc contents of example-2 repo
 │       └── .git
 └── results
     ├── .git
     ├── example-2
     └── example

config

You may provide a config file that saves passing lots of arguments to all of Dupin's commands. By default, Dupin looks in ROOT/config for this file.

repository-urls

This file contains a list of repository URLs, one per line. This is what Dupin uses to determine what to scan.

You can edit the list yourself, or generate it using Dupin's update-repos command.
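If you want to inspect or post-process this file yourself, reading it is straightforward. The sketch below assumes the one-URL-per-line format described above; the function name read_repository_urls is hypothetical, and skipping blank lines is an assumption rather than documented Dupin behaviour.

```python
from pathlib import Path
from typing import List


def read_repository_urls(path: Path) -> List[str]:
    """Read one repository URL per line, ignoring blank lines."""
    urls = []
    for line in path.read_text(encoding="utf-8").splitlines():
        url = line.strip()
        if url:  # assumption: empty lines carry no meaning
            urls.append(url)
    return urls
```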

repositories

This is where Dupin stores a local copy of the repositories it scans. If Dupin finds a new repository while scanning it will clone a copy to this location. If the repo already exists it will update it before scanning.
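The clone-or-update behaviour can be pictured roughly as follows. This is a sketch under stated assumptions, not Dupin's implementation: sync_command is a hypothetical helper, the local directory name is assumed to be the last segment of the URL (matching names such as example.git in the layout above), and the choice of a fast-forward-only pull is illustrative.

```python
from pathlib import Path
from typing import List


def sync_command(url: str, repositories_dir: Path) -> List[str]:
    """Build the git command that brings a local copy up to date.

    Hypothetical helper: clone when the checkout is missing, otherwise pull.
    """
    # Assumption: the directory is named after the last URL segment.
    name = url.rstrip("/").rsplit("/", 1)[-1]
    destination = repositories_dir / name
    if destination.is_dir():
        # Existing checkout: update it in place.
        return ["git", "-C", str(destination), "pull", "--ff-only"]
    # New repository: clone it into the repositories directory.
    return ["git", "clone", url, str(destination)]
```

The returned list can be passed to subprocess.run(..., check=True) to execute the command.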

results

The results directory is a Git repository that contains the history of Dupin's scans. This history is also used to determine what has changed since the last scan: when notifying, Dupin emails details of those changes.

Configuration

You can provide a config file to set some parameters for Dupin without needing to pass them every time. This also lets you keep secrets away from the git repository.

If you provide a --root argument to Dupin it will attempt to read the config from a file in that root called config. Alternatively, you can specify the config file location with the --config argument.

 root
 ├── config       <- default location for config
 ├── repository-urls
 ├── repositories
 │   └── ...etc
 └── results
     └── ...etc

Here's an example configuration file. The file should be written using YAML. Look at config.py for more info about how this works.

github_token: xxxxxxxx-github-token-xxxxxxxx
organisation_name: your-organisation
notification_email: [email protected]
smtp:
  host: smtp-server.example.com
  # example host for AWS
  # host: email-smtp.eu-west-1.amazonaws.com
  from: [email protected]
  username: username
  password: password

Most of these settings can be provided as arguments to Dupin instead of as configuration, but it's generally simpler and safer to put them in a config file. In particular, the auto-scan-all command reads its arguments from the configuration for simplicity, and the SMTP settings can only be provided from config.
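One way to picture how arguments and configuration combine is the sketch below. It is an illustration only: merge_settings is a hypothetical name, and the precedence shown (a command-line argument overrides the config file, unset arguments fall back to it) is an assumption, so check config.py for the real behaviour.

```python
from typing import Any, Dict, Optional


def merge_settings(config: Dict[str, Any], cli_args: Dict[str, Optional[Any]]) -> Dict[str, Any]:
    """Combine config-file values with command-line arguments.

    Assumption: an argument given on the command line overrides the config
    file; arguments left unset (None) fall back to the config value.
    """
    merged = dict(config)  # copy so the loaded config is not mutated
    for key, value in cli_args.items():
        if value is not None:
            merged[key] = value
    return merged
```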

Github token

This is used when Dupin fetches the list of organisation repositories. Dupin searches public repositories so in theory this token isn't required. In practice, if your organisation has a large number of repositories you'll hit Github's rate limit while Dupin runs through the pagination. If this happens you'll need to provide authentication so you are given a higher rate limit.
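To see why pagination and rate limits interact, here is a rough sketch of walking an organisation's public repositories through the GitHub REST API using only the standard library. It is not Dupin's code: the function names are hypothetical, and yielding each repository's clone_url is an assumption about what ends up in the list. Each page costs one request, and unauthenticated requests get a much lower rate limit than authenticated ones.

```python
import json
import re
import urllib.request
from typing import Iterator, Optional


def next_page_url(link_header: Optional[str]) -> Optional[str]:
    """Extract the rel="next" URL from a GitHub Link header, if present."""
    if not link_header:
        return None
    match = re.search(r'<([^>]+)>;\s*rel="next"', link_header)
    return match.group(1) if match else None


def public_repo_urls(organisation: str, token: Optional[str] = None) -> Iterator[str]:
    """Yield clone URLs for an organisation's public repositories."""
    url = f"https://api.github.com/orgs/{organisation}/repos?type=public&per_page=100"
    while url:
        request = urllib.request.Request(url, headers={"Accept": "application/vnd.github+json"})
        if token:
            # Authenticated requests are given a higher rate limit.
            request.add_header("Authorization", f"Bearer {token}")
        with urllib.request.urlopen(request) as response:
            for repo in json.load(response):
                yield repo["clone_url"]
            # Follow the Link header until there is no further page.
            url = next_page_url(response.headers.get("Link"))
```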

Organisation name

This tells Dupin which organisation to use when it creates its list of repositories that should be scanned.

Notification email

Dupin uses this as the "to" address when it emails updates about your organisation's secrets.

SMTP

If no SMTP host is provided, Dupin will attempt to send email using localhost. If your machine doesn't have a mail server running locally this will fail. Even if it does, you're probably better off using a real mailserver. The following settings allow you to configure the way Dupin sends emails.

Host

The hostname of the SMTP server to use.

From

Tells Dupin what to use as the "from" address for notification emails.

Username & password

These settings are used to authenticate the SMTP connection. You'll get these when you configure your mailserver.
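To show how these settings typically fit together, here is a minimal sketch using Python's standard smtplib. It is not Dupin's implementation: the function names, the subject line, the default port of 587 and the use of STARTTLS before logging in are all assumptions made for illustration.

```python
import smtplib
from email.message import EmailMessage
from typing import Optional


def build_notification(sender: str, recipient: str, body: str) -> EmailMessage:
    """Assemble a plain-text notification email."""
    message = EmailMessage()
    message["From"] = sender      # the smtp "from" setting
    message["To"] = recipient     # the notification_email setting
    message["Subject"] = "Dupin scan results"  # hypothetical subject line
    message.set_content(body)
    return message


def send_notification(
    message: EmailMessage,
    host: str = "localhost",
    port: int = 587,  # assumption: a common mail submission port
    username: Optional[str] = None,
    password: Optional[str] = None,
) -> None:
    """Send the message, authenticating only when credentials are given."""
    with smtplib.SMTP(host, port) as server:
        if username and password:
            server.starttls()  # encrypt before sending credentials
            server.login(username, password)
        server.send_message(message)
```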

Contributors

adamnfish
