Malware Analysis

This repository contains code for analyzing malware with four machine learning models: Convolutional Neural Networks (CNN), Generative Adversarial Networks (GAN), Support Vector Machines (SVM), and Random Forest. CNN and GAN are deep learning models, while SVM and Random Forest are classical machine learning methods. The analysis covers both detection and classification of malware files.

Generative Adversarial Networks (GAN)

Figure 1: GAN Architecture

Generative Adversarial Networks (GANs) are deep learning models used to generate synthetic data, and a GAN is one of the models used for malware analysis in this repository. The code for this model can be found in the GAN folder. The idea is to use a GAN-based algorithm to generate adversarial malware examples that are able to bypass black-box, machine-learning-based detection models. Figure 1 shows the training architecture of the adversarial malware generator.
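As an illustration only, assuming a MalGAN-style design (which this README does not confirm), the generator works on binary feature vectors (for example, presence of API calls) and is only allowed to switch features on, so the original malicious functionality is preserved. The NumPy sketch below shows that constraint; the feature sizes, the single-layer generator, and the name `generate_adversarial` are hypothetical, and the weights are random rather than trained against a substitute detector.

```python
import numpy as np

rng = np.random.default_rng(0)

N_FEATURES = 16   # hypothetical: length of the binary feature vector (e.g. API-call presence)
NOISE_DIM = 8     # hypothetical: size of the noise vector fed to the generator

# Hypothetical single-layer generator weights; in a real setup these would be
# learned by backpropagating through a substitute detector.
W = rng.normal(scale=0.5, size=(N_FEATURES + NOISE_DIM, N_FEATURES))
b = np.zeros(N_FEATURES)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def generate_adversarial(malware, noise):
    """Map (malware features, noise) to an adversarial binary feature vector.

    The elementwise maximum can only switch features on, so every feature
    present in the original sample is kept.
    """
    x = np.concatenate([malware, noise], axis=-1)
    probs = sigmoid(x @ W + b)                    # generator output in (0, 1)
    added = (probs > 0.5).astype(malware.dtype)   # binarise before querying the detector
    return np.maximum(malware, added)


malware_batch = rng.integers(0, 2, size=(4, N_FEATURES)).astype(np.float64)
noise_batch = rng.uniform(size=(4, NOISE_DIM))
adv = generate_adversarial(malware_batch, noise_batch)
```

The elementwise `maximum` is the design choice that matters: removing a feature could break the malware, so the generator is restricted to additions.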

Convolutional Neural Networks (CNN)

Figure 2: CNN Architecture

Convolutional Neural Networks (CNNs) are deep learning models widely used for image classification, and a CNN is another model used for malware analysis in this repository. The CNN used in this project consists of several convolutional layers, followed by max pooling layers and fully connected layers. It is trained with backpropagation and gradient descent to minimize the cross-entropy loss. The code for this model can be found in the CNN folder.
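A minimal NumPy sketch of the forward pass described above (convolution, max pooling, a fully connected layer, and the cross-entropy loss), for illustration only. The 16x16 grayscale input, the four 3x3 filters, the ReLU activation, the two-class output, and all function names are assumptions; the README does not state how malware files are turned into images, and the backpropagation step is not shown.

```python
import numpy as np

rng = np.random.default_rng(0)


def conv2d(image, kernels):
    """Valid 2-D cross-correlation of one (H, W) image with (K, kh, kw) kernels."""
    k, kh, kw = kernels.shape
    h, w = image.shape
    out = np.zeros((k, h - kh + 1, w - kw + 1))
    for i in range(h - kh + 1):
        for j in range(w - kw + 1):
            patch = image[i:i + kh, j:j + kw]
            # One output value per kernel: sum of elementwise products with the patch.
            out[:, i, j] = np.tensordot(kernels, patch, axes=([1, 2], [0, 1]))
    return out


def max_pool(fmap, size=2):
    """Non-overlapping max pooling over each (H, W) feature map."""
    k, h, w = fmap.shape
    h2, w2 = h // size, w // size
    blocks = fmap[:, :h2 * size, :w2 * size].reshape(k, h2, size, w2, size)
    return blocks.max(axis=(2, 4))


def cross_entropy(logits, label):
    """Softmax cross-entropy for one sample, computed stably via log-sum-exp."""
    shifted = logits - logits.max()
    log_probs = shifted - np.log(np.exp(shifted).sum())
    return -log_probs[label]


# Hypothetical input: a 16x16 grayscale "malware image" and random, untrained weights.
image = rng.uniform(size=(16, 16))
kernels = rng.normal(scale=0.1, size=(4, 3, 3))    # 4 convolutional filters
W_fc = rng.normal(scale=0.1, size=(4 * 7 * 7, 2))  # fully connected layer -> 2 classes

fmap = np.maximum(0.0, conv2d(image, kernels))  # conv + ReLU: (4, 14, 14)
features = max_pool(fmap).reshape(-1)           # max pool: (4, 7, 7) -> 196 values
logits = features @ W_fc                         # class scores: benign vs. malware
loss = cross_entropy(logits, label=1)            # label 1 = malware (assumed encoding)
```

Training would compute the gradient of `loss` with respect to `kernels` and `W_fc` and step against it; a framework with automatic differentiation is the usual way to do that.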

Random Forest

Figure 3: Random Forest Architecture

The Random Forest model used for this project is an ensemble of decision trees, each trained on a random subset of the dataset. It generates a prediction by aggregating the predictions of the individual trees, for example by majority vote or by averaging their class probabilities.
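To make the bootstrap-and-vote idea concrete, here is a toy NumPy sketch, not the repository's implementation. It swaps in depth-1 trees (decision stumps) for full decision trees to stay short; each stump is fit on a bootstrap sample using a random subset of features, and predictions are combined by majority vote. The function names, `n_trees=31`, `max_features=3`, and the synthetic data are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)


def fit_stump(X, y, feature_ids):
    """Fit a depth-1 tree: the (feature, threshold, polarity) with least training error."""
    best_err, best = np.inf, None
    for f in feature_ids:
        for t in np.unique(X[:, f]):
            above = (X[:, f] >= t).astype(int)
            for polarity in (1, 0):
                pred = above if polarity else 1 - above
                err = np.mean(pred != y)
                if err < best_err:
                    best_err, best = err, (f, t, polarity)
    return best


def predict_stump(stump, X):
    f, t, polarity = stump
    above = (X[:, f] >= t).astype(int)
    return above if polarity else 1 - above


def fit_forest(X, y, n_trees=31, max_features=3):
    n, d = X.shape
    forest = []
    for _ in range(n_trees):
        rows = rng.integers(0, n, size=n)                        # bootstrap sample
        # A stump has one split, so drawing features per tree equals drawing per split.
        feats = rng.choice(d, size=max_features, replace=False)  # random feature subset
        forest.append(fit_stump(X[rows], y[rows], feats))
    return forest


def predict_forest(forest, X):
    votes = np.stack([predict_stump(s, X) for s in forest])  # shape (n_trees, n_samples)
    return (votes.mean(axis=0) > 0.5).astype(int)            # majority vote (odd n_trees)


# Toy data (hypothetical): 5 features, label depends only on feature 0.
X = rng.normal(size=(200, 5))
y = (X[:, 0] > 0).astype(int)
forest = fit_forest(X, y)
accuracy = np.mean(predict_forest(forest, X) == y)
```

Trees that never see the informative feature vote close to randomly, but the majority vote lets the trees that do see it dominate, which is the point of aggregating many decorrelated trees.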

Support Vector Machine (SVM)

Figure 4: SVM Architecture

The DL-SVM classifier uses three models for malware classification: MLP-SVM, GRU-SVM, and CNN-SVM. Each combines a neural network with an SVM classifier: MLP-SVM uses a multilayer perceptron (MLP), GRU-SVM uses a gated recurrent unit (GRU) network, and CNN-SVM uses a convolutional neural network. In all three models, the dataset is divided into training and testing sets, and the model is trained on the training set. It is then evaluated on the testing set using metrics such as accuracy, precision, and recall.
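The pipeline above (split, extract neural features, train an SVM on them, report accuracy, precision, and recall) can be sketched in NumPy under several assumptions. A single fixed, untrained tanh layer stands in for the MLP, GRU, or CNN; the SVM is a linear one trained by subgradient descent on the hinge loss; the data, the 70/30 split, and all names and hyperparameters are hypothetical. In an actual DL-SVM model the network and SVM layer would typically be trained together.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data standing in for malware feature vectors (hypothetical).
X = rng.normal(size=(300, 10))
y = np.where(X[:, 0] - X[:, 1] > 0, 1, -1)  # SVM labels in {-1, +1}; +1 = malware

# 70/30 train/test split.
perm = rng.permutation(len(X))
train, test = perm[:210], perm[210:]

# Hypothetical feature extractor: one fixed, untrained tanh layer standing in
# for the neural-network part of a DL-SVM model.
W_h = rng.normal(scale=0.3, size=(10, 32))


def extract(X):
    return np.tanh(X @ W_h)


def train_linear_svm(H, y, lam=0.01, lr=0.05, epochs=500):
    """Minimise lam/2 * ||w||^2 + mean(hinge loss) by full-batch subgradient descent."""
    w, b = np.zeros(H.shape[1]), 0.0
    for _ in range(epochs):
        margins = y * (H @ w + b)
        viol = margins < 1  # samples inside the margin or misclassified
        grad_w = lam * w - (y[viol][:, None] * H[viol]).sum(axis=0) / len(y)
        grad_b = -y[viol].sum() / len(y)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def accuracy_precision_recall(y_true, y_pred, positive=1):
    tp = np.sum((y_pred == positive) & (y_true == positive))
    fp = np.sum((y_pred == positive) & (y_true != positive))
    fn = np.sum((y_pred != positive) & (y_true == positive))
    accuracy = np.mean(y_pred == y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


w, b = train_linear_svm(extract(X[train]), y[train])
y_pred = np.where(extract(X[test]) @ w + b >= 0, 1, -1)
accuracy, precision, recall = accuracy_precision_recall(y[test], y_pred)
```

Precision answers "of the files flagged as malware, how many were malware", and recall answers "of the malware files, how many were flagged"; both are computed only on the held-out test indices.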

Results

GAN Results

The existing malware samples are modified by adding noise and adjusting certain parameters. These samples are then tested against various detection models to test the models' capabilities. Hyperparameters such as the learning rate (LR) and the optimizer can also be changed to better understand how the model behaves.
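One way to quantify "tested against various models" is the detection rate: the fraction of samples each detector still labels as malware, before and after modification. The README does not name the metric it uses, so this is an assumption. The sketch below is hypothetical throughout (`detection_rates`, the toy data, and the rule-based detector), with the detectors passed in as plain predict functions.

```python
import numpy as np


def detection_rates(detectors, original, adversarial):
    """Return {name: (rate on original samples, rate on adversarial samples)}.

    `detectors` maps a name to a predict function that returns 1 for "malware".
    A drop from the first rate to the second means the samples evade that detector.
    """
    results = {}
    for name, predict in detectors.items():
        before = float(np.mean(predict(original) == 1))
        after = float(np.mean(predict(adversarial) == 1))
        results[name] = (before, after)
    return results


# Toy binary feature vectors (hypothetical): 4 samples, 4 features.
original = np.array([[1, 0, 0, 1], [1, 1, 0, 0], [0, 1, 0, 1], [1, 0, 0, 0]])
adversarial = original.copy()
adversarial[:2, 2] = 1  # feature 2 switched on for the first two samples

# Toy rule-based detector: flags a sample unless "benign-looking" feature 2 is present.
toy = {"toy_rule": lambda X: (X[:, 2] == 0).astype(int)}
rates = detection_rates(toy, original, adversarial)
print(rates)  # {'toy_rule': (1.0, 0.5)}
```

With scikit-learn models, a fitted estimator's `predict` method can be passed directly, e.g. `{"RandomForest": rf.predict}` for a hypothetical fitted classifier `rf`.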

Index (plot colors):
Blue: Random Forest
Pink: Logistic Regression
Yellow: Decision Tree
White: Multilayer Perceptron

Detector Loss

Generator Loss

CNN Results

Accuracy of the model

SVM Results

Accuracy of the 3 models

Random Forest Results

Accuracy in percentages

yash-bhootda / malware-analysis