Multi-Key Vector Experiments

Contact: Fares Meghdouri - [email protected]

This repository contains the scripts used in our paper "Cross-Layer Profiling of Encrypted Network Data for Anomaly Detection", accepted at the 7th IEEE International Conference on Data Science and Advanced Analytics (DSAA 2020).

Repository Tree

.
├── IPsec                                 # IPsec version scripts
│   ├── FlowSpecifications                ## json specification files used with Go-flows (1)
│   │   ├── AGM_d.json                    ### AGM aggregation by destination host
│   │   ├── AGM_s.json                    ### AGM aggregation by source host
│   │   └── TA.json                       ### Time Activity
│   ├── join.py                           ## Join feature vectors and construct the IPsec version (2)
│   ├── ML_paper.py                       ## Parameter tuning and classification based on RF
│   └── train_test_split                  ## Split the dataset into training and testing sets
├── TLS                                   # TLS version scripts
│   ├── FlowSpecifications                ## json specification files used with Go-flows
│   │   ├── AGM_d.json                    ### AGM aggregation by destination host
│   │   ├── AGM_s.json                    ### AGM aggregation by source host
│   │   ├── CAIA_Consensus.json           ### CAIA and Consensus
│   │   └── TA.json                       ### Time Activity
│   ├── join.py                           ## Join feature vectors and construct the TLS version
│   ├── ML_paper.py                       ## Parameter tuning and classification based on RF
│   └── train_test_split                  ## Split the dataset into training and testing sets
├── Labeling                              # Directory that contains labeling scripts
│   └── ...
├── freq_tables                           # Construct frequency tables for all features and perform OHE
├── LICENSE                               # License file
└── README.md                             # This file

Requirements

  • Python 3

    • Pandas
    • Numpy
    • sklearn
    • scipy
    • evolutionary_search
  • Go-flows

Datasets

Steps for reproducibility

Extraction

The first step is converting the PCAP files of each dataset into CSV files containing network flows. To this end, we use Go-flows together with the 'Flow Specification' files (1). For each specification, we extract flows using the following command:

meghdouri@TUWien:~$ go-flows run features {flowSpec.json} export csv {outputFile.csv} source libpcap {sourcePCAP.pcap}

We repeat this step for both TLS and IPsec variants and for all specifications.
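Since the extraction has to be repeated for every specification, it can also be scripted. The following is a minimal Python sketch, assuming hypothetical specification and PCAP file names, that invokes the same go-flows command once per specification:

import subprocess

# Hypothetical file names; replace them with your own specifications and capture file.
specs = ["FlowSpecifications/AGM_s.json", "FlowSpecifications/AGM_d.json", "FlowSpecifications/TA.json"]
pcap = "dataset.pcap"

for spec in specs:
    out = spec.rsplit("/", 1)[-1].replace(".json", ".csv")
    # Same go-flows invocation as above, producing one CSV per specification.
    subprocess.run(["go-flows", "run", "features", spec,
                    "export", "csv", out,
                    "source", "libpcap", pcap], check=True)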

Construction

The join.py (2) script groups the previously extracted feature vectors together. Make sure to change the file names inside the script to your own ones, then run it without any arguments.

meghdouri@TUWien:~$ python join.py
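Conceptually, the script merges the per-specification CSV files on a common key. A minimal pandas sketch of that kind of join is shown below; the file names and the key column are assumptions, the actual ones are defined inside join.py:

import pandas as pd

# Hypothetical file and key names; the real ones are configured inside join.py.
frames = [pd.read_csv(f) for f in ("AGM_s.csv", "AGM_d.csv", "TA.csv")]
joined = frames[0]
for frame in frames[1:]:
    joined = joined.merge(frame, on="hostKey", how="inner")
joined.to_csv("joined_features.csv", index=False)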

Labeling

To label the data, you can either use your own scripts or the scripts provided under {source}/Labeling/X_labeling.py. The script only needs to be run with the correct source files (input: raw CSV data, output: labeled CSV data). Note that if you use your own scripts, they must produce the two columns added in this step: Attack and Label.
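If you write your own labeling script, the only requirement is that it adds those two columns. A minimal pandas sketch, assuming a hypothetical ground-truth file and key column, could look like this:

import pandas as pd

data = pd.read_csv("joined_features.csv")

# Hypothetical ground truth: a CSV mapping a key (e.g. a host) to an attack name.
truth = pd.read_csv("ground_truth.csv")          # assumed columns: hostKey, Attack
data = data.merge(truth, on="hostKey", how="left")
data["Attack"] = data["Attack"].fillna("Normal")
data["Label"] = (data["Attack"] != "Normal").astype(int)   # 0 = benign, 1 = attack
data.to_csv("labeled.csv", index=False)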

One-Hot Encoding (optional)

freq_tables.py allows extracting frequency tables for statistical analysis and performing one-hot encoding (OHE). The script takes the raw labeled data and converts all features chosen by the user into binary dummies. For further instructions on how to use the script, run python freq_tables.py.
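As a rough illustration of the idea (not the script itself), a frequency table and a one-hot encoding of a single categorical feature can be produced with pandas as follows; the feature and file names are placeholders:

import pandas as pd

data = pd.read_csv("labeled.csv")

# Frequency table of a hypothetical categorical feature.
print(data["protocolIdentifier"].value_counts())

# One-hot encoding: replace the feature with binary dummy columns.
data = pd.get_dummies(data, columns=["protocolIdentifier"], prefix="proto")
data.to_csv("labeled_ohe.csv", index=False)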

Split

The train_test_split.py script both removes irrelevant features (such as IP addresses) and splits the data into 70% training and 30% testing (the proportions can be changed in the script). The input is the labeled data; the output is two files, a training set and a test set.
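A minimal equivalent with scikit-learn is sketched below; the dropped columns and file names are assumptions, the actual list is set inside the script:

import pandas as pd
from sklearn.model_selection import train_test_split

data = pd.read_csv("labeled.csv")

# Drop features that identify hosts rather than describe behaviour (hypothetical column names).
data = data.drop(columns=["sourceIPAddress", "destinationIPAddress"], errors="ignore")

# 70/30 split, stratified on the binary label so both sets keep the class balance.
train, test = train_test_split(data, test_size=0.3, stratify=data["Label"], random_state=42)
train.to_csv("train.csv", index=False)
test.to_csv("test.csv", index=False)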

Analysis

The last step is the analysis, where the prepared data is fed into the machine learning pipeline.

The ML_paper.py script contains a complete analysis framework for preprocessing, tuning, training and class prediction based on the two provided files (training and testing). Run it with the correct input file names and it will output the following:

  • feature_importance_without_tuning_.csv
  • feature_importance_without_tuning_after_pca.csv
  • feature_selection_with_tuning_.csv
  • feature_selection_with_tuning_after_pca.csv
  • pca_results.csv
  • RF_classification_report.txt
  • RF_DT_best_parameters.txt
  • testing_performance.csv
  • training_performance.csv

The purpose of each file is described by its name; the last two files contain the original labels, the predicted labels and the attack name for the test and training sets respectively.
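For orientation, a stripped-down version of that pipeline, a random forest trained on the training set and evaluated on the test set, might look as follows. The file and column names are assumptions; ML_paper.py additionally performs PCA, feature selection and hyper-parameter tuning:

import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report

train = pd.read_csv("train.csv")
test = pd.read_csv("test.csv")

# "Attack" holds the attack name, "Label" the binary class (hypothetical column names).
X_train = train.drop(columns=["Attack", "Label"])
y_train = train["Label"]
X_test = test.drop(columns=["Attack", "Label"])
y_test = test["Label"]

clf = RandomForestClassifier(n_estimators=100, n_jobs=-1, random_state=42)
clf.fit(X_train, y_train)
pred = clf.predict(X_test)

with open("RF_classification_report.txt", "w") as f:
    f.write(classification_report(y_test, pred))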

Additional configurations that were not needed for the paper can be set inside the scripts (this page will be updated with further instructions based on your feedback).
