But it is in matters beyond the limits of mere rule that the skill of the analyst is evinced. He makes in silence a host of observations and inferences....
— Edgar Allan Poe, The Murders in the Rue Morgue
Dupin is a tool to help discover secrets in Git repositories.
It is designed for regularly scanning an organisation's public Git repositories and notifying a nominated email address when it finds anything that looks suspicious.
Install Dupin from source with pip install <path-to-dupin> (virtualenv is recommended).
For these examples we'll use ~/.dupin as our root directory, but you can use anything that makes sense for you.
ROOT=~/.dupin
# sets up a directory for Dupin to store its repositories and results
dupin setup --root $ROOT
# stores a list of your organisation's public repos
dupin update-repos --root $ROOT organisation-name
# if you get rate limit errors you'll need to provide a GitHub
# token with the --token argument
# scan all repositories in the list for secrets, then log and show the results
dupin auto-scan-all --root $ROOT
# this logs what it finds in the $ROOT/results directory and the
# details to the console
# it's also possible to email reports, more details below and in the
# config section
Dupin is an installable Python package, but it is not hosted in public Python repositories. You can clone the source code and then use pip to install Dupin. This will also install its dependencies.
As ever, it's better to install Dupin into a virtual environment. This prevents Dupin's dependencies from creating problems with other Python software on your machine.
git clone [email protected]:guardian/dupin.git
# via a virtualenv, or globally (may require sudo)
pip install ./dupin
You should then be able to run dupin.
This repository includes a CloudFormation template which creates an EC2 instance that runs Dupin on a schedule. If you have an AWS account this is the easiest way to run Dupin.
Dupin offers several commands. Check the program's main file for full details. The main commands are described below.
Note: many of these commands interact with Dupin's directory structure. More information about the layout Dupin uses to store data is available below, in the Directory structure section.
These arguments apply to many/all of Dupin's commands.
The --root argument sets the root directory for Dupin's directory structure.
The --config argument sets the location of Dupin's config file. By default, the config is read from ROOT/config if a root is provided. You may instead provide a custom location. This should point to a YAML file that contains Dupin's config.
The setup command initialises Dupin's directory structure. If you're using any of Dupin's features that depend on stored data (which is likely), you'll need to run this command first.
Examples:
dupin setup --root ~/.dupin
The update-repos command looks up an organisation's public repositories on GitHub and writes them to a file.
Examples:
# provide args via a config file at ~/.dupin/config
dupin update-repos --root ~/.dupin
# provide args explicitly
dupin update-repos myorg --token abcdef
# save the list of repositories in a provided location
dupin update-repos --file /tmp/organisation-repos.txt
By default update-repos writes to ROOT/repository-urls (you'll need to provide a --root argument to take advantage of this). You can specify an alternative file with the --file argument.
Examples:
dupin --root ~/.dupin auto-scan-all
# instruct Dupin to send notification emails (requires config)
dupin --root ~/.dupin auto-scan-all --notify
The --notify flag tells Dupin to send notification emails. Doing so requires additional configuration. Since this configuration is non-trivial, you should provide it in a config file rather than as arguments to Dupin.
More information on configuring Dupin to send email is available below, under Configuration, specifically the SMTP settings.
Dupin creates a directory structure for storing its results as follows.
root
├── config
├── repository-urls
├── repositories
│   ├── example.git
│   │   ├── ...etc contents of example repo
│   │   └── .git
│   └── example-2.git
│       ├── ...etc contents of example-2 repo
│       └── .git
└── results
    ├── .git
    ├── example-2
    └── example
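If you write scripts around Dupin's data, it can help to derive every location from the root in one place. The following is a minimal sketch of the layout shown above, not Dupin's own code. The names DupinLayout, layout_for, local_clone_path and results_path are hypothetical.

```python
from pathlib import Path
from typing import NamedTuple


class DupinLayout(NamedTuple):
    """Hypothetical container for the paths in Dupin's directory structure."""
    config: Path
    repository_urls: Path
    repositories: Path
    results: Path


def layout_for(root: str) -> DupinLayout:
    """Derive each documented location from the root directory."""
    base = Path(root).expanduser()
    return DupinLayout(
        config=base / "config",
        repository_urls=base / "repository-urls",
        repositories=base / "repositories",
        results=base / "results",
    )


def local_clone_path(layout: DupinLayout, name: str) -> Path:
    """Local copy of a repository, e.g. repositories/example.git."""
    return layout.repositories / f"{name}.git"


def results_path(layout: DupinLayout, name: str) -> Path:
    """Scan results for a repository, e.g. results/example."""
    return layout.results / name
```

For example, layout_for("~/.dupin").results gives the directory that holds the scan history.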
You may provide a config file to avoid passing lots of arguments to each of Dupin's commands. By default, Dupin looks for this file at ROOT/config.
The repository-urls file contains a list of repository URLs, one per line. This is what Dupin uses to determine what to scan.
You can edit the list yourself, or generate it using Dupin's update-repos command.
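If you want to inspect or post-process the list yourself, the one-URL-per-line format is easy to read. This is a small sketch rather than Dupin's own parser. The function names are hypothetical, and skipping blank lines is an assumption.

```python
from typing import Iterable, List


def parse_repository_urls(lines: Iterable[str]) -> List[str]:
    """Return one URL per non-empty line, preserving order.

    Assumption: blank lines are ignored and surrounding whitespace is
    not significant.
    """
    urls = []
    for line in lines:
        url = line.strip()
        if url:
            urls.append(url)
    return urls


def read_repository_urls(path: str) -> List[str]:
    """Read a repository-urls file from disk."""
    with open(path, encoding="utf-8") as handle:
        return parse_repository_urls(handle)
```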
The repositories directory is where Dupin stores a local copy of each repository it scans. If Dupin finds a new repository while scanning, it clones a copy to this location. If the local copy already exists, Dupin updates it before scanning.
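The clone-or-update decision can be sketched as below. This illustrates the idea and is not Dupin's implementation. The function name sync_command is hypothetical, and two things are assumed: that the local directory is named after the last segment of the URL, and that "update" means a fast-forward git pull.

```python
from pathlib import Path
from typing import List


def sync_command(repo_url: str, repositories_dir: str) -> List[str]:
    """Return the git command that brings the local copy up to date.

    Assumptions (not taken from Dupin's source): the local directory is
    named after the last URL segment, e.g. .../example.git becomes
    repositories/example.git, and updating is done with `git pull`.
    """
    name = repo_url.rstrip("/").rsplit("/", 1)[-1]
    if not name.endswith(".git"):
        name += ".git"
    target = Path(repositories_dir) / name
    if target.is_dir():
        # A local copy already exists, so update it in place.
        return ["git", "-C", str(target), "pull", "--ff-only"]
    # No local copy yet, so clone a fresh one.
    return ["git", "clone", repo_url, str(target)]
```

The function only builds the command. You could pass the result to subprocess.run if you want to execute it.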
The results directory is a Git repository that contains the history of Dupin's scans. When notifications are enabled, Dupin also uses this history to determine what has changed, and it emails the details of those changes.
You can provide a config file to set some parameters for Dupin without needing to pass them every time. This also lets you keep secrets away from the git repository.
If you provide a --root argument to Dupin it will attempt to read the config from a file in that root called config. Alternatively, you can specify the config file location with the --config argument.
root
├── config <- default location for config
├── repository-urls
├── repositories
│   └── ...etc
└── results
    └── ...etc
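The lookup described above can be sketched as follows. This is an illustration only. The function name resolve_config_path is hypothetical, and the assumption that an explicit --config takes precedence over ROOT/config is not confirmed by Dupin's source.

```python
from pathlib import Path
from typing import Optional


def resolve_config_path(root: Optional[str], config: Optional[str]) -> Optional[Path]:
    """Pick the config file location from the --root and --config values.

    Assumption: an explicit --config wins over the ROOT/config default.
    Returns None when neither argument is given.
    """
    if config is not None:
        return Path(config).expanduser()
    if root is not None:
        return Path(root).expanduser() / "config"
    return None
```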
Here's an example configuration file. The file should be written using YAML. Look at config.py for more info about how this works.
github_token: xxxxxxxx-github-token-xxxxxxxx
organisation_name: your-organisation
notification_email: [email protected]
smtp:
  host: smtp-server.example.com
  # example host for AWS
  # host: email-smtp.eu-west-1.amazonaws.com
  from: [email protected]
  username: username
  password: password
Most of these settings can be provided as arguments to Dupin instead of as configuration, but it's generally simpler and safer to put them in a config file. In particular, the auto-scan-all command reads its arguments from the configuration for simplicity, and the SMTP settings can only be provided from config.
The github_token setting is used when Dupin fetches the list of organisation repositories. Dupin searches public repositories so in theory this token isn't required. In practice, if your organisation has a large number of repositories you'll hit GitHub's rate limit while Dupin runs through the pagination. If this happens you'll need to provide authentication so you are given a higher rate limit.
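To see why pagination and the token interact, here is a minimal sketch of listing an organisation's public repositories through the GitHub REST API using only the standard library. It is not Dupin's implementation. The function names and the User-Agent value are hypothetical, and the sketch assumes the API's Link response header is used to find the next page.

```python
import json
import re
import urllib.request
from typing import Iterator, Optional

API_ROOT = "https://api.github.com"


def build_request(url: str, token: Optional[str] = None) -> urllib.request.Request:
    """Build an API request, authenticated only when a token is given."""
    headers = {
        "Accept": "application/vnd.github+json",
        "User-Agent": "dupin-example",  # hypothetical client name
    }
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(url, headers=headers)


def next_page_url(link_header: Optional[str]) -> Optional[str]:
    """Extract the rel="next" URL from a Link response header, if any."""
    if not link_header:
        return None
    match = re.search(r'<([^>]+)>;\s*rel="next"', link_header)
    return match.group(1) if match else None


def iter_public_repo_urls(organisation: str, token: Optional[str] = None) -> Iterator[str]:
    """Yield clone URLs page by page; each page costs one API request."""
    url = f"{API_ROOT}/orgs/{organisation}/repos?type=public&per_page=100"
    while url:
        with urllib.request.urlopen(build_request(url, token)) as response:
            for repo in json.load(response):
                yield repo["clone_url"]
            url = next_page_url(response.headers.get("Link"))
```

Every page is a separate request, which is how a large organisation can use up the unauthenticated allowance.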
The organisation_name setting tells Dupin which organisation to use when it creates its list of repositories that should be scanned.
Dupin uses notification_email as the "to" address when it emails updates about your organisation's secrets.
If no SMTP host is provided, Dupin will attempt to send an email using localhost. If your machine doesn't have a mail server running locally this will fail. Even if it does, you're probably better off using a real mailserver. The following settings allow you to configure the way Dupin sends emails.
The host setting is the hostname of the SMTP server to use.
The from setting tells Dupin what to use as the "from" address for notification emails.
The username and password settings are used to authenticate the SMTP connection. You'll get these when you configure your mailserver.
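To show how these settings fit together, here is a minimal sketch that sends a notification with Python's standard smtplib. It is not Dupin's own mailer. The function names, the subject line, the default port 587 and the choice to use STARTTLS only when credentials are supplied are all assumptions.

```python
import smtplib
from email.message import EmailMessage
from typing import Optional


def build_notification(smtp_from: str, notification_email: str, body: str) -> EmailMessage:
    """Assemble a plain-text email from the "from" and "to" settings."""
    message = EmailMessage()
    message["From"] = smtp_from
    message["To"] = notification_email
    message["Subject"] = "Dupin scan results"  # hypothetical subject line
    message.set_content(body)
    return message


def send_notification(
    message: EmailMessage,
    host: str = "localhost",
    username: Optional[str] = None,
    password: Optional[str] = None,
    port: int = 587,  # assumption: the usual mail submission port
) -> None:
    """Send the message through the configured SMTP server."""
    with smtplib.SMTP(host, port) as server:
        if username and password:
            # Encrypt the connection before sending credentials.
            server.starttls()
            server.login(username, password)
        server.send_message(message)
```

A call such as build_notification("dupin@example.com", "security@example.com", "2 new findings") produces a message that send_notification can deliver. Both addresses are placeholders.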