
Introduction

A Deep Analysis of the Law Enforcement Impact on Darknet Markets

Abstract

Since its first appearance in 2009, the term "Deep Web" has designated the parts of the World Wide Web that are not indexed by standard search engines. With the development of a free and open-source anonymity network called Tor, a whole digital world was born and has been growing ever since. Making the most of the anonymity provided to them, Tor users have over time developed a complex infrastructure in this Deep Web that makes the discussion, advertisement and purchase of any service or item deemed illegal by local authorities accessible to all.

With this new kind of distribution, law enforcement had to adapt in order to regulate these illicit markets. On 5 and 6 November 2014, an international law enforcement operation targeting darknet markets and other hidden services operating on the Tor network was launched: Operation Onymous. The operation involved the police forces of 17 countries; more than 400 sites were closed and 17 arrests were made.

While the anonymity factor remains intact, tools have been developed to scrape and archive most services available on the Tor network, from forums and marketplaces to search engines, messaging services, etc. This project tries to get an overview of the impact of large raids such as Operation Onymous on darknet usage.

Research questions

During this project, we address several research questions regarding the impact of Operation Onymous on the darknet market Agora:

  • How was the market impacted in terms of volume and product categories?
  • How did prices evolve globally?
  • How were import/export flows impacted?
  • How did vendors' habits and operational security evolve?

Data story

  • A data story website presenting our findings can be found here.
  • The notebook we relied on to produce it is Final_Notebook.ipynb

Dataset Description

  • Description
    The archive mostly contains scraped HTML pages from the many marketplaces, forums and other services (e.g. the Grams search engine) that were active during the period mentioned in the title. This raw data is organized first by service, then by date (meaning that for every service, one can go to a specific date and see a list of HTML pages). Every archive is unambiguous about the format of the platform it represents, so standard formatting can be expected (e.g. item, profile, forum thread, list of items, etc.). However, it is expected to be highly incomplete and will most likely present inconsistencies. All directories are compressed with tar.gz. The whole archive is about 60 GiB compressed and estimated to be about 1 TiB fully uncompressed.
  • Data Management and Parsing
    Given the enormous size of this archive, a large amount of processing work is expected in order to filter out the HTML formatting. Extracted data will most likely be placed into several Pandas DataFrames before being processed and prepared for statistical work.
  • Data Enrichment and Processing
    We will rely on online resources such as the description of the dataset, as well as tools from provided and related papers. As mentioned in the source description, the incompleteness of the dataset will require a thorough study of the semantics behind the data as well as the use of suitable tools and methods.
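As a rough sketch of the parsing step above, the snippet below extracts listing fields into a Pandas DataFrame. The field names and HTML patterns are hypothetical (Agora's real markup differs between scrape dates), so each pattern would need tuning against the actual pages:

```python
import re
import pandas as pd

# Hypothetical field patterns -- Agora's real markup differs between scrape
# dates, so in practice each pattern must be tuned against the actual pages.
FIELD_PATTERNS = {
    "title":      re.compile(r'<h1 class="title">(.*?)</h1>', re.S),
    "price_btc":  re.compile(r'([\d.]+)\s*BTC'),
    "vendor":     re.compile(r'/vendor/([\w-]+)'),
    "ships_from": re.compile(r'Ships [Ff]rom:\s*([A-Za-z ]+)'),
}

def parse_listing(html):
    """Extract one row of listing fields from a raw page; missing fields become None."""
    row = {}
    for field, pattern in FIELD_PATTERNS.items():
        match = pattern.search(html)
        row[field] = match.group(1).strip() if match else None
    return row

def listings_to_frame(pages):
    """Build a DataFrame from an iterable of raw HTML listing pages."""
    return pd.DataFrame(parse_listing(page) for page in pages)
```

Keeping the per-field patterns in one dictionary makes it easy to adapt the extractor when a scrape date uses a different page template.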
Data structure tree
data/
├── agora
│   └── YYYY-MM-DD
│       ├── cat                         # One directory of listing pages per category
│       │   ├── cat_name_hash
│       │   │   ├── page_0.html         # Contains title, ships from, ships to, price in BTC, vendor_name, rating
│       │   │   ├── [...]
│       │   │   └── page_N.html
│       │   └── [...]
│       ├── p                           # All listing pages
│       │   ├── listing_0_hash.html
│       │   ├── [...]
│       │   └── listing_N_hash.html
│       └── vendor                      # All vendor profile pages
│           ├── vendor_0_name.html
│           ├── [...]
│           └── vendor_N_name.html
└── agora-forum
    └── YYYY-MM-DD
        ├── index.php
        │   ├── board,n.items_offset.html
        │   ├── [...]                   # Each file lists the topics of a board (title, authors, n_views, n_replies)
        │   └── board,N.10650.html
        └── index.php?action=stats      # Contains number of posts, replies, and other global stats
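Under this layout, navigating the snapshots is straightforward. A minimal sketch, assuming the archive has been unpacked into a data/ directory (paths are illustrative):

```python
from pathlib import Path

def scraped_dates(root, service="agora"):
    """Return the sorted YYYY-MM-DD snapshot directories for a service."""
    base = Path(root) / service
    return sorted(d.name for d in base.iterdir() if d.is_dir())

def listing_pages(root, service, date):
    """Return all listing pages stored under <service>/<date>/p/."""
    return sorted((Path(root) / service / date / "p").glob("*.html"))
```

For example, `scraped_dates("data")` yields the available Agora snapshot dates, and `listing_pages("data", "agora", "2014-11-05")` the listing files for one of them.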

Data Inconsistency

Because the data comes from automatic scraping, a process prone to failure, it is inconsistent and parts of it are unusable. Usually the web scraping failed, leaving files that are incomplete at best, if not missing entirely. We were forced to discard many scraped dates because of this, to avoid plots leading to wrong conclusions.
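One possible heuristic for this kind of filtering is to drop snapshot dates whose page count falls far below the median; the 50% threshold below is our assumption, not a property of the dataset:

```python
import statistics

def usable_dates(page_counts, min_fraction=0.5):
    """Filter out snapshot dates that look like failed or partial scrapes.

    page_counts: dict mapping date -> number of successfully scraped pages.
    Dates whose count is below min_fraction of the median are dropped;
    the threshold is a judgment call, not derived from the data itself.
    """
    median = statistics.median(page_counts.values())
    return sorted(d for d, n in page_counts.items() if n >= min_fraction * median)
```

A date with only a handful of pages, where neighbouring dates have hundreds, almost certainly reflects a scraper failure rather than a real market collapse, which is exactly the kind of artefact that would distort the plots.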

Conclusion

We have seen that Operation Onymous did not have a huge impact on the market, which returned to normal shortly after the operation. However, it led to interesting changes on the vendors' side: small suppliers tended to quit the market while the bigger ones seemed to grow. One could then ask whether the operation was a success, since it reduced the number of vendors but did not disturb the bigger ones, who are supposedly the hardest to arrest. Either way, during this project we managed to extract information from a huge amount of data and to build a data story out of it.

Further Research

The darknet is a very interesting source of data, and one could imagine continuing this project with other research questions to highlight the impact of Operation Onymous or similar operations, typically by analysing the impact they had on other markets or by taking into account external parameters that could influence the market. Conducting other data analysis projects on drug consumption and weapon trafficking, and merging the results with darknet exchanges, could also bring an interesting point of view on the subject.

Contributions

  • Arthur: Forum analysis, Grams pages parsing, vendors analysis
  • François: Data story page, products analysis
  • Florine: Data story texts, poster, presentation ?
  • Quentin: Agora web pages parsing, product price analysis
