COMS W4995 Applied ML Team 22 Final Project: MALWARE DETECTION ON WINDOWS MACHINES USING MACHINE LEARNING APPROACHES

This project investigates machine learning approaches to detect the presence of malware on Windows machines.

Data Source: https://www.kaggle.com/competitions/microsoft-malware-prediction/data

Required Packages

  • Pandas
  • Scikit-Learn
  • Matplotlib
  • Seaborn
  • Numpy
  • Tensorflow
  • Keras-tuner
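Before opening the notebooks, it can help to confirm that the packages above are importable. The following is a minimal sketch, not part of the project itself: the helper name `missing_modules` is hypothetical, and the import names (for example `sklearn` for Scikit-Learn and `keras_tuner` for Keras-tuner) are the usual module names for these packages. The list above does not mention an XGBoost package, so it is left out here even though the project contains an xgboost.ipynb notebook.

```python
import importlib.util

# Import names differ from the package names for some entries
# (Scikit-Learn imports as "sklearn", Keras-tuner as "keras_tuner").
REQUIRED_MODULES = [
    "pandas",
    "sklearn",
    "matplotlib",
    "seaborn",
    "numpy",
    "tensorflow",
    "keras_tuner",
]


def missing_modules(modules=REQUIRED_MODULES):
    """Return the names from `modules` that cannot be found for import."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


print("Missing packages:", missing_modules() or "none")
```

Any name reported as missing would need to be installed in the environment that runs the notebooks.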

Execution Instructions

Data Preparation

Note: you may choose to skip the following steps and use the dev_small.csv that we included in our submission.

  1. Download the dataset from Kaggle using the link above. Then, place the train.csv file under the data/ directory in this project.
  2. From the project's root directory (aml_project_team_22/), run the sampling script with python3 sampling.py. This generates the small dataset that is used throughout the later parts of the project.
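The contents of sampling.py are not shown here, so the following is only an illustrative sketch of one way to draw a small uniform sample from a CSV file that is too large to load comfortably. It uses reservoir sampling with the standard library; the function name `sample_csv`, its parameters, and the sample size are assumptions, and the project's actual script may work differently.

```python
import csv
import random


def sample_csv(src_path, dst_path, n_rows, seed=0):
    """Write a uniform random sample of up to n_rows data rows to dst_path.

    Reservoir sampling keeps at most n_rows rows in memory, so the
    source file is streamed once and never loaded in full.
    Returns the number of data rows written.
    """
    rng = random.Random(seed)
    reservoir = []
    with open(src_path, newline="") as src:
        reader = csv.reader(src)
        header = next(reader)  # keep the column names
        for i, row in enumerate(reader):
            if i < n_rows:
                # Fill the reservoir with the first n_rows rows.
                reservoir.append(row)
            else:
                # Keep row i with probability n_rows / (i + 1).
                j = rng.randint(0, i)
                if j < n_rows:
                    reservoir[j] = row
    with open(dst_path, "w", newline="") as dst:
        writer = csv.writer(dst)
        writer.writerow(header)
        writer.writerows(reservoir)
    return len(reservoir)
```

A call such as `sample_csv("data/train.csv", "dev_small.csv", n_rows=100_000)` would produce a reduced file; the output location and the row count in this call are hypothetical.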

Exploratory Data Analysis

We created the following Jupyter notebooks for Exploratory Data Analysis (located under the EDA directory):

  • ead_final.ipynb: This notebook covers the full Exploratory Data Analysis process, from data cleaning to the final model selection. For clarity, it includes only a subset of the visualizations.

  • data_visualization.ipynb: This notebook contains all the figures we created during data visualization. It also includes additional analysis of the feature characteristics.

These notebooks are independent of each other and can be executed in any order.

Model Training

We trained four models and compared their classification performance on the dataset. Each of the following notebooks (located under the models directory) corresponds to one of the models we trained:

  • knn.ipynb
  • random_forest.ipynb
  • xgboost.ipynb
  • neural_network.ipynb

These notebooks are independent of each other and can be executed in any order.
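As a rough illustration of how several classifiers can be compared on the same held-out split, the sketch below fits two of the model families named above with Scikit-Learn. It is not taken from the notebooks: the synthetic data stands in for the cleaned feature matrix and binary malware label, the hyperparameters are arbitrary, the function name `compare_models` is hypothetical, and ROC AUC is assumed as the metric only for the sake of the example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier


def compare_models(X, y, seed=0):
    """Fit each candidate on a train split and return its ROC AUC on a held-out split."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed
    )
    candidates = {
        "knn": KNeighborsClassifier(n_neighbors=5),
        "random_forest": RandomForestClassifier(n_estimators=100, random_state=seed),
    }
    scores = {}
    for name, model in candidates.items():
        model.fit(X_train, y_train)
        # ROC AUC is computed from the predicted probability of the positive class.
        proba = model.predict_proba(X_test)[:, 1]
        scores[name] = roc_auc_score(y_test, proba)
    return scores


# Synthetic stand-in data (an assumption; the notebooks use the sampled dataset).
X, y = make_classification(n_samples=500, n_features=20, random_state=0)
scores = compare_models(X, y)
for name, auc in scores.items():
    print(f"{name}: ROC AUC = {auc:.3f}")
```

Using one shared split for every candidate keeps the scores directly comparable; the notebooks may instead use cross-validation or a different split.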
