margaret-simple-cnn-using-bigdl-and-spark's Introduction

Malware Classification

For this project, we use the data from the Microsoft Malware Classification Challenge, which consists of nearly half a terabyte of uncompressed data. There are 9 classes of malware, and unlike the documents from P1, each instance of malware belongs to exactly one of the following families:

  1. Ramnit
  2. Lollipop
  3. Kelihos_ver3
  4. Vundo
  5. Simda
  6. Tracur
  7. Kelihos_ver1
  8. Obfuscator.ACY
  9. Gatak
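
For labelling, the nine families can be kept in a simple lookup table. The sketch below assumes the class labels are the integers 1 through 9 in the order listed above; the names `MALWARE_FAMILIES` and `family_name` are illustrative and are not taken from the project's code.

```python
# Assumed mapping: class label (1-9) -> malware family, in the order listed above.
MALWARE_FAMILIES = {
    1: "Ramnit",
    2: "Lollipop",
    3: "Kelihos_ver3",
    4: "Vundo",
    5: "Simda",
    6: "Tracur",
    7: "Kelihos_ver1",
    8: "Obfuscator.ACY",
    9: "Gatak",
}


def family_name(label):
    """Return the family name for a 1-based class label (int or numeric string)."""
    return MALWARE_FAMILIES[int(label)]
```

Because each instance has exactly one family, a single integer label per document is enough; no multi-label encoding is needed.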

Getting Started

All the documents are in hexadecimal format, in their own files (one file per document); these files are located here: https://storage.googleapis.com/uga-dsp/project2/data/bytes/
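
As a rough illustration of reading one of these hexadecimal documents, the sketch below turns the text of a file into a list of byte values. It assumes each line starts with an address token followed by two-character hexadecimal bytes, and that unknown bytes appear as `??`; this layout is an assumption about the data and is not stated above. The function name `parse_bytes_text` is hypothetical.

```python
def parse_bytes_text(text):
    """Convert the text of one hexadecimal document into a list of ints (0-255)."""
    values = []
    for line in text.splitlines():
        tokens = line.split()
        # Assumption: the first token on each line is an address, not data.
        for token in tokens[1:]:
            if len(token) != 2:
                continue
            try:
                values.append(int(token, 16))
            except ValueError:
                # Skips placeholders such as "??" and any other non-hex token.
                continue
    return values
```

With Spark, a function like this could be applied to each file's contents after loading them, for example with `SparkContext.wholeTextFiles`, although how the project actually loads the data is not described here.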

Prerequisites

The following software must be installed:

BigDL  
Python  
Spark  
Java  

Installing

sudo apt-get install default-jdk  
sudo apt-get install python-dev python-setuptools     
sudo apt-get install zip gcc    
sudo easy_install pip    
pip install pyspark  
pip install BigDL    
sh instance_startup.sh   
sh python_package.sh   

Deployment

BigDL currently supports only Python 2.7, 3.5, and 3.6. For local mode, BigDL can be installed directly with pip. Cluster mode requires installing BigDL without pip; the BigDL repo describes that procedure in detail.
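
Since the supported Python versions are restrictive, a small guard at the top of a driver script can fail early on an unsupported interpreter. This is a minimal sketch; the names `SUPPORTED_PYTHON_VERSIONS` and `is_supported_python` are illustrative, and the version list is simply the one stated above.

```python
import sys

# Python versions named above as supported by BigDL.
SUPPORTED_PYTHON_VERSIONS = {(2, 7), (3, 5), (3, 6)}


def is_supported_python(version_info=None):
    """Return True if the (major, minor) version is in the supported set."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) in SUPPORTED_PYTHON_VERSIONS
```

A script could call `is_supported_python()` before importing BigDL and exit with a clear message when it returns `False`.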

Repo Link: https://github.com/intel-analytics/BigDL/

BigDL Installation without pip: https://github.com/intel-analytics/BigDL/blob/master/docs/docs/PythonUserGuide/install-without-pip.md

A virtual environment is created containing BigDL, Spark, and Python along with their dependent packages; it can be zipped and added as an archive when submitting a job to the cluster. This saves installation time, because every worker needs the same environment and dependent packages. Scripts for creating the environment and installing all the necessary packages can be found at: https://github.com/intel-analytics/BigDL/tree/master/pyspark/python_package

These scripts have been customized for this project and are available in the scripts directory.

To deploy, use 'scripts/python_submit_yarn.sh', which adds the zipped virtual environment to the archives when the job is submitted to the cluster.
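
To make the archive mechanism concrete, the sketch below assembles a `spark-submit` argument list that ships a zipped environment to YARN and points the driver and executors at its Python interpreter. It is not the content of 'scripts/python_submit_yarn.sh'. The file names `venv.zip` and `train.py`, the alias `venv`, and the function name are hypothetical, and the interpreter path assumes the archive's top level is the environment root. The BigDL jar and Python zip that a real submission also needs are left out.

```python
def build_yarn_submit_command(archive, script, env_alias="venv"):
    """Build a spark-submit argument list that ships a zipped environment to YARN."""
    # YARN unpacks "archive#alias" into a directory named alias on each node.
    # Assumption: bin/python sits directly under the archive's top level.
    python_path = "{}/bin/python".format(env_alias)
    return [
        "spark-submit",
        "--master", "yarn",
        "--deploy-mode", "cluster",
        "--archives", "{}#{}".format(archive, env_alias),
        "--conf", "spark.yarn.appMasterEnv.PYSPARK_PYTHON=" + python_path,
        "--conf", "spark.executorEnv.PYSPARK_PYTHON=" + python_path,
        script,
    ]
```

For example, `build_yarn_submit_command("venv.zip", "train.py")` returns a list that can be passed to `subprocess.call`. Building the command as a list avoids shell quoting problems.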

Built With

Contributors

Please read CONTRIBUTORS.md for details on our code of conduct

Authors

See also the list of contributors who participated in this project.

License

This project is licensed under the MIT License - see the LICENSE.md file for details

Acknowledgments

  • The model was first tested on MNIST data to check how BigDL works
  • The CNN skeleton code was taken from the BigDL repo

Contributors

nihalsoans91, vamsi3309
