This repository contains code for analyzing malware using four different machine learning approaches: Convolutional Neural Networks (CNN), Generative Adversarial Networks (GAN), Support Vector Machines (SVM), and Random Forest. The analysis includes both detection and classification of malware files.
Figure 1 : GAN Architecture
A Generative Adversarial Network (GAN) is a deep learning model used for generating synthetic data, and is one of the models used for malware analysis in this repository. The code for this model can be found in the GAN folder. The idea is to use a GAN-based algorithm to generate adversarial malware examples that are able to bypass black-box, machine-learning-based detection models. Figure 1 shows the training architecture of the adversarial malware generator.
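As a rough illustration of how generated output can be turned into an adversarial sample and scored against a black-box detector, here is a minimal sketch. It assumes binary feature vectors (1 = feature present) and that the generator may only add features, never remove them. The function names, the `threshold` parameter, and `toy_detector` are hypothetical and are not taken from the code in the GAN folder.

```python
import random

def combine_with_generator_output(malware_features, generator_output, threshold=0.5):
    """Binarise the generator output and OR it with the original features.

    Features already present in the original sample are never removed,
    so the modified sample can only gain features (assumption of this sketch).
    """
    if len(malware_features) != len(generator_output):
        raise ValueError("feature vectors must have the same length")
    return [
        1 if (orig == 1 or gen >= threshold) else 0
        for orig, gen in zip(malware_features, generator_output)
    ]

def evasion_rate(detector, adversarial_samples):
    """Fraction of adversarial samples that the detector labels benign (0)."""
    if not adversarial_samples:
        return 0.0
    missed = sum(1 for sample in adversarial_samples if detector(sample) == 0)
    return missed / len(adversarial_samples)

def toy_detector(sample):
    """Hypothetical stand-in for a black-box detector.

    Flags a sample as malware (1) when fewer than half of its features are set.
    """
    return 1 if sum(sample) < len(sample) / 2 else 0

if __name__ == "__main__":
    original = [1, 0, 0, 0, 1, 0]
    rng = random.Random(0)
    # Random numbers stand in for the output of a trained generator.
    fake_generator_output = [rng.random() for _ in original]
    adversarial = combine_with_generator_output(original, fake_generator_output)
    print(adversarial, evasion_rate(toy_detector, [adversarial]))
```

In a real training loop the detector's labels would be fed back to update the generator; that part is omitted here.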
Figure 2: CNN Architecture
A Convolutional Neural Network (CNN) is a deep learning model commonly used for image classification, and is another model used for malware analysis in this repository. The CNN model used for this project consists of several convolutional layers, followed by max pooling layers and fully connected layers. The model is trained on the dataset using backpropagation and gradient descent to minimize the cross-entropy loss. The code for this model can be found in the CNN folder.
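The building blocks named above (convolution, max pooling, cross-entropy loss) can be sketched in plain Python. This is only an illustration of the operations on a single-channel input; it is not the model in the CNN folder, which would normally be built with a deep learning framework. All function names are hypothetical.

```python
import math

def conv2d(image, kernel):
    """'Valid' 2-D cross-correlation of a single-channel image with one kernel."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

def max_pool2d(feature_map, size=2):
    """Non-overlapping max pooling; rows/columns that do not fill a window are dropped."""
    out_h = len(feature_map) // size
    out_w = len(feature_map[0]) // size
    return [
        [
            max(feature_map[i * size + di][j * size + dj]
                for di in range(size) for dj in range(size))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

def softmax(logits):
    """Numerically stable softmax over a list of class scores."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, target_index):
    """Cross-entropy loss for one sample given raw class scores."""
    return -math.log(softmax(logits)[target_index])

if __name__ == "__main__":
    image = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
    features = conv2d(image, [[1, 0], [0, 1]])   # 3x3 feature map
    pooled = max_pool2d(features)                # 1x1 after 2x2 pooling
    print(features, pooled, cross_entropy([2.0, 0.5], 0))
```

Training would repeat this forward pass over the dataset and adjust the kernels and fully connected weights by gradient descent on the loss.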
Figure 3: Random Forest Architecture
The Random Forest model used for this project consists of multiple decision trees, each trained on a subset of the dataset. The model generates its predictions by aggregating the predictions of the individual decision trees.
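The two ideas in this description, training each tree on a resampled subset and aggregating the trees by vote, can be sketched as follows. To stay short, each "tree" here is a one-split stump on a randomly chosen feature, which is a simplification of a full decision tree. All names are hypothetical, and in practice a library implementation such as scikit-learn's `RandomForestClassifier` would be used instead.

```python
import random
from collections import Counter

def bootstrap_sample(rows, labels, rng):
    """Draw len(rows) samples with replacement."""
    idx = [rng.randrange(len(rows)) for _ in rows]
    return [rows[i] for i in idx], [labels[i] for i in idx]

def fit_stump(rows, labels, rng):
    """Fit a one-split tree on a randomly chosen feature (simplified tree)."""
    feature = rng.randrange(len(rows[0]))
    best = None
    for threshold in sorted({row[feature] for row in rows}):
        left = [y for row, y in zip(rows, labels) if row[feature] <= threshold]
        right = [y for row, y in zip(rows, labels) if row[feature] > threshold]
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0] if right else left_label
        errors = (sum(y != left_label for y in left)
                  + sum(y != right_label for y in right))
        if best is None or errors < best[0]:
            best = (errors, threshold, left_label, right_label)
    _, threshold, left_label, right_label = best
    return lambda row: left_label if row[feature] <= threshold else right_label

def fit_forest(rows, labels, n_trees=25, seed=0):
    """Train n_trees stumps, each on its own bootstrap sample."""
    rng = random.Random(seed)
    return [fit_stump(*bootstrap_sample(rows, labels, rng), rng)
            for _ in range(n_trees)]

def predict(forest, row):
    """Aggregate the per-tree predictions by majority vote."""
    votes = Counter(tree(row) for tree in forest)
    return votes.most_common(1)[0][0]

if __name__ == "__main__":
    # Toy data: label 0 = benign, 1 = malware (illustrative values only).
    rows = [[0.1, 0.2], [0.2, 0.1], [0.15, 0.3], [0.9, 0.8], [0.8, 0.9], [0.85, 0.7]]
    labels = [0, 0, 0, 1, 1, 1]
    forest = fit_forest(rows, labels)
    print(predict(forest, [0.0, 0.0]), predict(forest, [1.0, 1.0]))
```

Using an odd number of trees avoids tied votes in the two-class case.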
Figure 4: SVM Architecture
In the DL-SVM classifier, we use three models for malware classification: MLP-SVM, GRU-SVM, and CNN-SVM. MLP-SVM combines a multilayer perceptron (MLP) neural network with an SVM classifier; GRU-SVM and CNN-SVM do the same with a gated recurrent unit (GRU) network and a CNN, respectively. In all three models, the dataset is divided into training and testing sets, and the model is trained using the training set. The model is then evaluated on the testing set using metrics such as accuracy, precision, and recall.
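The split-then-evaluate procedure described above can be sketched without any particular model. The helper names, the 80/20 split, and the choice of which label counts as "positive" are assumptions of this example, not settings taken from the repository.

```python
import random

def train_test_split(samples, labels, test_fraction=0.2, seed=0):
    """Shuffle indices and split samples/labels into train and test parts."""
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)
    n_test = round(len(indices) * test_fraction)
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    def pick(seq, idx):
        return [seq[i] for i in idx]

    return (pick(samples, train_idx), pick(samples, test_idx),
            pick(labels, train_idx), pick(labels, test_idx))

def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision and recall, treating `positive` as the target class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    return {
        "accuracy": correct / len(pairs) if pairs else 0.0,
        # Undefined ratios (no predicted or no actual positives) are reported as 0.0.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }

if __name__ == "__main__":
    X = list(range(10))
    y = [i % 2 for i in X]
    X_train, X_test, y_train, y_test = train_test_split(X, y)
    # A trained model's predictions on X_test would replace this placeholder.
    placeholder_predictions = [1 for _ in X_test]
    print(binary_metrics(y_test, placeholder_predictions))
```

For a multi-class malware family problem, the same function can be called once per family by passing that family's label as `positive`.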
The existing malware samples are modified by adding noise and certain parameters. These samples are then tested against various models to evaluate each model's capabilities. Hyperparameters such as the learning rate (LR) and the optimizer can also be changed to better understand how the model behaves.
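One way to picture this experiment is shown below. The sketch assumes features scaled to the range [0, 1] and Gaussian noise; the actual noise type, the parameter values, and the names `add_gaussian_noise` and `experiment_grid` are illustrative assumptions rather than the repository's settings.

```python
import random

def add_gaussian_noise(features, sigma=0.05, seed=None):
    """Return a noisy copy of `features`, clipped back into [0, 1]."""
    rng = random.Random(seed)
    return [min(1.0, max(0.0, x + rng.gauss(0.0, sigma))) for x in features]

# Hypothetical grid of settings to try; values are illustrative only.
experiment_grid = [
    {"sigma": sigma, "learning_rate": lr, "optimizer": opt}
    for sigma in (0.0, 0.05, 0.1)
    for lr in (1e-2, 1e-3)
    for opt in ("sgd", "adam")
]

if __name__ == "__main__":
    sample = [0.0, 0.25, 0.5, 1.0]
    for config in experiment_grid[:3]:
        noisy = add_gaussian_noise(sample, sigma=config["sigma"], seed=0)
        # A model trained with config["learning_rate"] and config["optimizer"]
        # would be evaluated on `noisy` here.
        print(config, noisy)
```

Fixing the seed keeps the perturbed samples identical across runs, so differences in accuracy can be attributed to the model settings rather than to the noise draw.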
Index:
Blue: Random Forest
Pink: Logistic Regression
Yellow: Decision Tree
White: Multilayer Perceptron
Detector Loss
Generator Loss
Accuracy of the model
Accuracy of the 3 models
Accuracy in percentages