ACID - Adaptive Clustering based Intrusion Detection

PyTorch Implementation + Code Templates


The approach implemented here aims to maximize the performance of machine learning classifiers by training them on optimal low-dimensional embeddings learned by a deep learning-based clustering network.

This repository is only intended to help researchers understand and learn to use the approach presented in our paper: Adaptive Clustering-based Malicious Traffic Classification at the Network Edge. Note that the neural network architectures and the hyper-parameters used here may need to change depending on the task at hand.

In this basic implementation, only feed-forward networks are employed.

Getting Started

These instructions will get you a copy of the project up and running on your local machine for development and testing purposes.

Cloning the repository

Clone the project into a local directory of your choice:

git clone https://github.com/Mobile-Intelligence-Lab/ACID.git [LOCAL_DIRECTORY_PATH]

Installing dependencies

Use pip to install the dependencies listed in requirements.txt:

pip install -r requirements.txt

Usage

(Default architecture)

from core.models.network import AdaptiveClustering
from torch.optim import Adam
import numpy as np
...
model = AdaptiveClustering(
    encoder_dims=[100, 50, 10],  # neurons per hidden layer (for the MLP encoder)
    n_kernels=NUM_CLASSES,  # number of kernels (cluster centers)
    kernel_size=EMBEDDING_DIMENSION  # dimension of the learned embeddings
)

# Training...
optimizer = None
for _ in range(NUM_EPOCHS):
    for x, y in data_loader:
        model.zero_grad()
        outputs = model(x, y)

        # Create the optimizer once, after the first forward pass
        if optimizer is None:
            optimizer = Adam(model.parameters(), lr=LEARNING_RATE)

        loss = model.loss()
        loss.backward()
        optimizer.step()
...

# Inference...
outputs = model(x).max(dim=1).indices.squeeze().tolist()

# Embeddings...
encoded_repr = np.stack([model.sub_nets[output_class].encoder(x[i]).tolist()
                         for i, output_class in enumerate(outputs)]).squeeze()

# Cluster centers...
cluster_centers = np.asarray([model.sub_nets[i].kernel_weights.squeeze().tolist()
                              for i in range(model.n_kernels_)])
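
The embeddings and cluster centers extracted above can be used in several ways. As a minimal sketch, the snippet below assigns each embedding to its nearest center using plain Euclidean distance. This is an assumption made for illustration: the similarity measure used inside AC-Net may differ, and `assign_to_nearest_center` and the toy values are hypothetical and not part of this repository.

```python
import math


def assign_to_nearest_center(embeddings, centers):
    """Return, for each embedding, the index of its closest center.

    Hypothetical helper: uses Euclidean distance, which is an assumption
    and not necessarily the similarity measure used by AC-Net.
    """
    labels = []
    for point in embeddings:
        distances = [math.dist(point, center) for center in centers]
        labels.append(distances.index(min(distances)))
    return labels


# Toy stand-ins for encoded_repr.tolist() and cluster_centers.tolist()
toy_embeddings = [[0.1, 0.2], [0.9, 1.1], [0.0, -0.1]]
toy_centers = [[0.0, 0.0], [1.0, 1.0]]
toy_labels = assign_to_nearest_center(toy_embeddings, toy_centers)
```

With real data, the same function could be called on `encoded_repr.tolist()` and `cluster_centers.tolist()`, provided both have the same embedding dimension.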

(Clustering experiments)

  • Adaptive Clustering Network:

python demo/clustering/acnet.py [-h]

  • Baseline methods:
    • k-Means

      python demo/clustering/kmeans.py [-h]

    • DBSCAN

      python demo/clustering/dbscan.py [-h]

    • Spectral Clustering

      python demo/clustering/spectral.py [-h]
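
For readers unfamiliar with the k-Means baseline, the following is a minimal pure-Python sketch of Lloyd's algorithm. It is not the code in demo/clustering/kmeans.py; the function name `kmeans`, its parameters, and the toy data are assumptions made for illustration only.

```python
import math
import random


def kmeans(points, k, n_iters=100, seed=0):
    """Cluster `points` (a list of equal-length lists) into `k` groups."""
    rng = random.Random(seed)
    # Initialise the centers with k randomly chosen input points
    centers = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(n_iters):
        # Assignment step: each point joins its closest center
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centers[j]))
            for p in points
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        # Update step: move each center to the mean of its members
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster is empty
                centers[j] = [sum(coord) / len(members) for coord in zip(*members)]
    return labels, centers


# Two well-separated toy groups (hypothetical data)
toy_points = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
toy_labels, toy_centers = kmeans(toy_points, k=2)
```

Because k-Means only separates clusters with linear boundaries between centers, it is expected to struggle on datasets such as concentric circles, which is what makes it a useful point of comparison.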

Clustering Results

Five artificially generated datasets were used to evaluate how well this approach adapts to different data configurations. Because the approach produces latent representations of the inputs, it is straightforward to visualize each step of the transformation from the raw inputs to the final latent representations obtained by AC-Net.

The final embeddings obtained on the artificial datasets used in the paper are plotted below.

The source code for reproducing these results, as well as those of the baseline methods, can be found here.
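
To give a sense of what such artificial data looks like, here is a minimal sketch that generates noisy concentric circles using only the standard library. It is not the generator used by the demo scripts; the function name `make_concentric_circles`, its parameters, and the default noise level are assumptions made for illustration.

```python
import math
import random


def make_concentric_circles(n_per_circle, radii, noise=0.05, seed=0):
    """Sample 2-D points on circles centered at the origin.

    Returns (points, labels), where labels[i] is the index of the
    circle that points[i] was drawn from.
    """
    rng = random.Random(seed)
    points, labels = [], []
    for label, radius in enumerate(radii):
        for _ in range(n_per_circle):
            angle = rng.uniform(0.0, 2.0 * math.pi)
            # Perturb the radius with Gaussian noise
            r = radius + rng.gauss(0.0, noise)
            points.append([r * math.cos(angle), r * math.sin(angle)])
            labels.append(label)
    return points, labels


# Two circles with 200 points each (hypothetical sizes and radii)
points, labels = make_concentric_circles(200, radii=[0.5, 1.0])
```

Passing five radii instead of two gives a five-circle variant of the same layout.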

Visualization of how AC-Net affects the inputs

Concentric circles (Complexity level: Medium)

python demo/clustering/acnet.py --2-circles

2 Circles

python demo/clustering/acnet.py --5-circles

5 Circles

Interleaved boundaries (Complexity level: Medium)

python demo/clustering/acnet.py --2-moons

2 Moons

Linear boundaries (Complexity level: Low)

python demo/clustering/acnet.py --blobs

Blobs

Intertwined boundaries (Complexity level: High)

python demo/clustering/acnet.py --sine-cosine

Sine / Cosine

Network Intrusion Detection

As practical use cases, AC-Net was applied to three network intrusion detection datasets: KDD Cup'99 [1], ISCX-IDS 2012 [2], and CSE-CIC-IDS 2018 [3].

ACID performance on CSE-CIC-IDS 2018

(Dataset extended with "payload features" as mentioned in the paper)

A notebook including a step-by-step tutorial is provided here: .ipynb | .html (with traces).

IDS 2018

Citation

@inproceedings{,
  author = {Alec F. Diallo and Paul Patras},
  title = {Adaptive Clustering-based Malicious Traffic Classification at the Network Edge},
  booktitle = "IEEE INFOCOM 2021",
  year = {2021},
  month = {05}
}

References

[1] http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html

[2] https://www.unb.ca/cic/datasets/ids.html

[3] https://www.unb.ca/cic/datasets/ids-2018.html
