sinanw / ml-clustering-intrusion-detection

License: MIT License

Jupyter Notebook 100.00%
artificial-intelligence clustering-analysis cybersecurity intrusion-detection machine-learning network-security

ML Clustering - Intrusion Detection

This project applies unsupervised machine learning to intrusion analysis and detection on a network traffic dataset. The dataset is first cleaned and preprocessed, PCA is applied for dimensionality reduction, and the KMeans algorithm is then used to cluster and analyze the data.

Data Set (KDD Cup 1999)

The dataset used in this demo is the SA subset of KDD Cup 1999, as provided by the sklearn library:

  • The original KDD Cup '99 dataset was created to produce a large training set for supervised learning algorithms. As a result, it contains a large proportion of abnormal data, which is unrealistic in practice and unsuitable for unsupervised anomaly detection.
  • For this reason, we used the transformed SA version from sklearn, which is obtained by selecting all the normal data and a small proportion of the abnormal data.
  • The original dataset is labeled with a class attribute, but this label was ignored because this is an unsupervised machine learning problem.
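As a rough illustration of the points above, the SA subset can be fetched through sklearn and the label set aside. This is a minimal sketch, not the notebook's code: the helper name `load_sa_unlabeled` is hypothetical, and using the 10% sample (`percent10=True`) is an assumption made to keep the download small.

```python
from sklearn.datasets import fetch_kddcup99


def load_sa_unlabeled(percent10=True):
    """Fetch the SA subset and return only the feature table.

    Assumption: percent10=True (the 10% sample) keeps the download small;
    the original notebooks may load the full subset instead.
    """
    # subset="SA" keeps all normal records plus a small share of attacks.
    bunch = fetch_kddcup99(subset="SA", percent10=percent10, as_frame=True)
    features = bunch.data  # pandas DataFrame holding the traffic features
    # bunch.target holds the class label; it is deliberately not returned,
    # mirroring the unsupervised setup described above.
    return features
```

Calling `load_sa_unlabeled()` downloads the data on first use and caches it in the sklearn data directory.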

Data Clustering Details

The project is implemented in three steps that mirror the main phases of data processing and analysis.

  • Each step has a corresponding notebook inside notebooks.
  • Intermediate data files are stored as the output of one phase and the input to the next.
  • The data files were not uploaded to this repository because of upload size constraints. However, the whole analysis is easily reproducible.

PHASE 1 - Data Cleaning

Corresponding notebook: data-cleaning.ipynb

Implemented data cleaning tasks:

  1. Loading the dataset from the sklearn library.
  2. Exploring dataset summary and statistics.
  3. Decoding byte string objects.
  4. Dropping irrelevant columns.
  5. Checking null values.
  6. Checking the cleaned version of the dataset.
  7. Storing the cleaned dataset to a csv file.
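The cleaning tasks above could look roughly like the following. This is a sketch on a tiny made-up sample rather than the real data: the helper `clean_frame` is hypothetical, and treating `num_outbound_cmds` as the irrelevant column is an assumption, since the README does not say which columns were dropped.

```python
import pandas as pd


def clean_frame(df, drop_columns=()):
    """Decode byte strings, drop the chosen columns, and count nulls."""
    cleaned = df.copy()
    for col in cleaned.columns:
        # Byte strings are stored in object columns.
        if cleaned[col].dtype == object:
            cleaned[col] = cleaned[col].map(
                lambda v: v.decode("utf-8") if isinstance(v, bytes) else v
            )
    cleaned = cleaned.drop(columns=list(drop_columns))
    null_counts = cleaned.isnull().sum()  # per-column count of missing values
    return cleaned, null_counts


# Made-up rows standing in for the real dataset.
sample = pd.DataFrame({
    "protocol_type": [b"tcp", b"udp", b"tcp"],
    "service": [b"http", b"domain_u", b"smtp"],
    "src_bytes": [181, 44, 1022],
    "num_outbound_cmds": [0, 0, 0],
})

# Assumption: this column is the one treated as irrelevant.
cleaned, null_counts = clean_frame(sample, drop_columns=["num_outbound_cmds"])

# With no path, to_csv returns the CSV text; pass a file path to write to disk.
csv_text = cleaned.to_csv(index=False)
```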

PHASE 2 - Data Processing

Corresponding notebook: data-preprocessing.ipynb

Implemented data processing and transformation tasks:

  1. Loading dataset file into pandas DataFrame.
  2. Exploring dataset summary and statistics.
  3. Exploring categorical features and combining less-frequent values.
  4. Encoding categorical features using One-Hot Encoding.
  5. Standardizing the data using StandardScaler.
  6. Checking the processed dataset and storing it to a csv file.
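A sketch of the transformation tasks above, again on made-up rows. The helper `combine_rare`, the `min_count` threshold, and the `"other"` label are all hypothetical; scaling every column, including the one-hot indicators, is also an assumption about how the notebook proceeds.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler


def combine_rare(series, min_count, other_label="other"):
    """Replace values seen fewer than min_count times with a single label."""
    counts = series.value_counts()
    rare = counts[counts < min_count].index
    return series.where(~series.isin(rare), other_label)


# Made-up rows standing in for the cleaned dataset.
df = pd.DataFrame({
    "service": ["http", "http", "http", "smtp", "smtp", "ftp", "telnet"],
    "src_bytes": [181.0, 239.0, 235.0, 1022.0, 980.0, 44.0, 12.0],
})

# Combine less-frequent categorical values ("ftp" and "telnet" appear once).
df["service"] = combine_rare(df["service"], min_count=2)

# One-hot encode the categorical feature.
encoded = pd.get_dummies(df, columns=["service"], dtype=float)

# Standardize each column to zero mean and unit variance.
scaled = pd.DataFrame(
    StandardScaler().fit_transform(encoded),
    columns=encoded.columns,
)
```

Combining rare values before encoding keeps the number of one-hot columns small, which matters for a feature like `service` that has many infrequent values.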

PHASE 3 - Data Clustering and Analysis

Corresponding notebook: data-analysis.ipynb

Implemented data analysis tasks:

  1. Loading dataset file into pandas DataFrame.
  2. Implementing dimensionality reduction using Principal Component Analysis (PCA).
  3. Selecting the best k value for the KMeans algorithm using the elbow method.
  4. Implementing KMeans algorithm for data clustering.
  5. Performing anomaly detection based on KMeans clusters.
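The analysis tasks above can be sketched end to end on synthetic data. Several choices here are assumptions, since the README does not specify them: keeping enough principal components to explain 95% of the variance, the range of k scanned for the elbow curve, and flagging points whose distance to their assigned centroid exceeds the 99th percentile. The notebook may use a different anomaly rule.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Synthetic stand-in: two dense "normal" groups plus five distant points.
normal_a = rng.normal(loc=0.0, scale=0.5, size=(200, 6))
normal_b = rng.normal(loc=4.0, scale=0.5, size=(200, 6))
outliers = rng.normal(loc=12.0, scale=0.5, size=(5, 6))
X = np.vstack([normal_a, normal_b, outliers])

# Dimensionality reduction: keep enough components for 95% of the variance
# (assumed threshold).
pca = PCA(n_components=0.95, svd_solver="full")
X_reduced = pca.fit_transform(X)

# Elbow method: record the inertia (within-cluster sum of squares) for
# each candidate k, then look for the point where the curve flattens.
inertias = {}
for candidate_k in range(1, 7):
    km = KMeans(n_clusters=candidate_k, n_init=10, random_state=0)
    inertias[candidate_k] = km.fit(X_reduced).inertia_

# Hypothetical choice for this toy data; in practice k is read off the curve.
k = 2
model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X_reduced)

# transform() gives the distance to every centroid; the minimum is the
# distance to the assigned one.
distances = model.transform(X_reduced).min(axis=1)

# Assumed rule: flag points farther from their centroid than the
# 99th percentile of all distances.
threshold = np.percentile(distances, 99)
is_anomaly = distances > threshold
```

A percentile threshold always flags a fixed share of the data, so it is best treated as a way to rank candidates for review rather than as a definitive detector.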
