
fingerprint-securedrop

Build Status Coverage Status Gitter License: AGPL v3

This repository is a work-in-progress to implement an end-to-end data collection and analysis pipeline to help tackle the problem of website fingerprinting attacks in the context of Tor Hidden Services [1]. It is designed as a single system that carries out everything from data collection to feature generation to model training and analysis, with the intention of helping us evaluate and develop defenses to be implemented in the SecureDrop whistleblower submission system.

If you are a researcher interested in this problem, we encourage you to collaborate with us in our Gitter chatroom and via our mailing list. Feel free to get in touch personally as well.

The pipeline works as follows:

  • sorter.py scrapes Hidden Service directories, and visits every .onion URL it finds. It groups sites into two classes: SecureDrop and non-monitored.
  • crawler.py fetches sites from these classes and records the raw Tor cells.
  • features.py generates features based on these raw Tor cells (see the sketch just after this list).
  • The model training, classification, and presentation of results (graph generation) code is still in development.
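
To give a rough sense of what the feature-generation step consumes, here is a minimal, illustrative sketch that computes a few classic website-fingerprinting features from a cell trace. The (timestamp, direction) record format and the feature names are assumptions made for this example, not the actual features.py implementation.

# Illustrative only: a toy feature extractor over a Tor cell trace.
# Each record is assumed to be a (timestamp_seconds, direction) pair,
# with direction +1 for outgoing cells and -1 for incoming cells.
from typing import Dict, List, Tuple

def extract_features(cells: List[Tuple[float, int]]) -> Dict[str, float]:
    timestamps = [t for t, _ in cells]
    outgoing = sum(1 for _, d in cells if d > 0)
    return {
        "total_cells": len(cells),                      # overall trace length
        "outgoing_cells": outgoing,                     # cells sent by the client
        "incoming_cells": len(cells) - outgoing,        # cells received
        "duration": max(timestamps) - min(timestamps),  # seconds from first to last cell
    }

# A tiny synthetic trace: one cell out, two cells back, one more out.
print(extract_features([(0.00, 1), (0.05, -1), (0.06, -1), (0.30, 1)]))

The real pipeline derives its features from the recorded Tor cells described above; this toy version only shows the general shape of the transformation.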

Our hope is that later we will be able to make this code more composable. There has already been some effort in that direction, and it should be pretty easy to use at least the sorter and crawler if you're interested in monitoring a site besides SecureDrop.

Getting Started

Dependencies

  • Ansible >= 2.0
  • Vagrant
  • VirtualBox

Provisioning a local VM

cd fingerprint-securedrop
vagrant up
vagrant ssh
cd /opt/fingerprint-securedrop/fpsd

Running the Sorter

./sorter.py

To follow the sorter log while it's running, run less +F logging/sorter-latest.log. If you're not using the database, data is written to timestamped files, with logging/class-data-latest.pickle symlinked to the latest data. Otherwise, run psql and poke around the hs_history table.
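
If you just want to peek at that pickled output, a couple of lines of Python are enough. The structure of the pickled object isn't documented here, so this sketch only loads and prints it:

import pickle

# The symlink always points at the newest pickle written by the sorter.
with open("logging/class-data-latest.pickle", "rb") as f:
    class_data = pickle.load(f)

print(type(class_data))
print(class_data)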

Running the Crawler

./crawler.py

To follow the crawler log while it's running, run less +F logging/crawler-latest.log, and to follow the raw Tor cell log, run less +F /var/log/tor_cell_seq.log. You can also check out the traces it's collecting as it runs (cd logging/batch-latest), or look at the frontpage traces and other related tables (see the Database Design section).

A systemd unit is also provided to run the crawler on repeat: simply run sudo systemctl start crawler.
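
Since this is an ordinary systemd unit, the usual tooling applies: for example, sudo systemctl status crawler shows whether the service is running, and sudo systemctl stop crawler halts it.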

Using PostgreSQL for data storage and queries

The data collection programs (the sorter and the crawler) are integrated with a PostgreSQL database. When the use_database option is set to True in the [sorter] section of fpsd/config.ini, the sorter saves its sorted onion addresses in the database. When the use_database option is set to True in the [crawler] section of fpsd/config.ini, the crawler grabs onions from the database, connects to them, records traces, and stores those traces back in the database. You can also use a remote database by configuring the [database] section of fpsd/config.ini.
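
As a rough illustration, the relevant parts of fpsd/config.ini would look something like the following. The section names and the use_database option come straight from the description above; the keys shown under [database] are placeholders, so check the shipped config.ini for the real option names.

[sorter]
use_database = True

[crawler]
use_database = True

[database]
; Placeholder keys for pointing at a remote database; actual option names may differ.
host = db.example.com
port = 5432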

By default, a strong database password is generated for you automatically, written to /tmp/passwordfile on the Ansible controller, and saved to a PGPASSFILE, ~{{ ansible_user }}/.pgpass, on the remote host. If you want to set your own password, we recommend setting the PGPASSWORD Ansible var before provisioning; as a precaution, re-provisioning will never overwrite an existing PGPASSFILE, though you can overwrite it yourself if you wish to re-configure your database settings. Environment variables are also set such that you should be able to simply issue the command psql to authenticate to the database and begin an interactive session.

Database Design

We store the raw data in the raw schema and the derived features in the features schema. The sorter writes to raw.hs_history, inserting one row per sorted onion address. The crawler reads from raw.hs_history and writes one row per crawl session to raw.crawls, one row per trace to raw.frontpage_examples, and one row per cell in the trace to raw.frontpage_traces.
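
As a quick sanity check on what has been collected so far, something like the following can be run inside the VM. It relies on the environment-based authentication described above; psycopg2 is assumed to be available, and only the table names are taken from this README.

import psycopg2  # assumed to be installed; any PostgreSQL driver would do

# No explicit credentials: the PGPASSFILE and environment variables set up
# during provisioning are used instead.
conn = psycopg2.connect("")
with conn, conn.cursor() as cur:
    for table in ("raw.hs_history", "raw.crawls",
                  "raw.frontpage_examples", "raw.frontpage_traces"):
        cur.execute("SELECT count(*) FROM " + table)
        print(table, cur.fetchone()[0])
conn.close()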

The current design of the database is shown in the following figure:
