
qub-csc3002-project's Introduction

CSC3002: Malware Analysis using Machine Learning

This repository contains my final year project at Queen's University Belfast.

Project Brief

An important factor in risk assessment is the categorization of malware and its behavior. Note that a high number of new malware types does not necessarily imply high risk, since malware such as adware does not constitute a high risk. Conversely, a low number of new signature variants does not indicate a low risk, since a new malware signature may relate to a rootkit. Malware programs are often categorized by propagation, infection mechanism, self-defense (concealment/evasion) or payload (criminal software functionality).

When malware is correctly categorized, it enables an assessment of the risk associated with particular types of malware attacks, thereby enabling Security Operation Centers (SOCs) to focus on the highest current threat. Many SOCs have adopted malware categorization, but categorizing malware according to type, family and strain is a difficult task and may be impossible to achieve fully. The result is that 66 different AV scanners (VirusTotal) often produce different results, adding to the confusion and impairing the ability to assess malware attacks. This project therefore investigates new methods of malware classification that will improve the ability to assess the risk of malware. A dynamic runtime dataset (PE file execution) will be mined using unsupervised/clustering algorithms to identify new methods of malware categorization based on API call structure, which will hopefully provide insight into malware risk assessment.
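As a rough illustration of what "API call structure" could mean as a feature, the sketch below counts how often each API call appears in a single execution report. It assumes a Cuckoo-style JSON report in which calls are listed under `behavior` > `processes` > `calls`, each with an `api` field; that layout and the function name `api_call_counts` are assumptions for illustration, not the project's actual parser.

```python
from collections import Counter


def api_call_counts(report):
    """Count API calls in one execution report (hypothetical helper).

    Assumes a Cuckoo-style layout:
    report["behavior"]["processes"][i]["calls"][j]["api"].
    Missing sections are treated as empty rather than raising.
    """
    counts = Counter()
    for process in report.get("behavior", {}).get("processes", []):
        for call in process.get("calls", []):
            api = call.get("api")
            if api:
                counts[api] += 1
    return counts


# Tiny hand-made report, used only to show the expected input shape.
sample_report = {
    "behavior": {
        "processes": [
            {"calls": [{"api": "NtCreateFile"}, {"api": "NtWriteFile"}]},
            {"calls": [{"api": "NtCreateFile"}]},
        ]
    }
}
features = api_call_counts(sample_report)
```

The resulting counts could then be turned into a fixed-length vector per sample before clustering; which calls to keep is a question for the literature review.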

The project will involve:

  • Study current publications about dynamic malware analysis techniques
  • Establish a run-time environment (such as Cuckoo) that can be used to create a program execution trace dataset
  • Write a parser to extract features from the dataset. A literature review is required to determine which features are likely to work best for machine learning.
  • Use machine learning clustering algorithms to categorize malware into clusters that correlate with its risk, family, structure, etc.
  • Repeat the data mining for multiple malware families/categories to determine the optimal category definition.
  • Develop and implement an algorithm for measuring agreement/difference between existing labels and the new label sets (novel labelling).
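One standard way to measure agreement between two labellings is the Rand index: the fraction of sample pairs that both labellings treat the same way (both put the pair together, or both keep it apart). The sketch below is only one possible choice, not necessarily the algorithm this project uses; the function name `rand_index` and the example labels are made up for illustration.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Return the Rand index between two labellings of the same samples.

    1.0 means the two labellings group every pair of samples identically;
    lower values mean more pairs are grouped differently.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both labellings must cover the same samples")
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        return 1.0  # zero or one sample: nothing to disagree about
    agreeing = 0
    for i, j in pairs:
        together_a = labels_a[i] == labels_a[j]
        together_b = labels_b[i] == labels_b[j]
        if together_a == together_b:
            agreeing += 1
    return agreeing / float(len(pairs))


# Hypothetical existing labels versus hypothetical cluster ids.
existing = ["adware", "adware", "rootkit", "rootkit"]
clusters = [1, 1, 0, 0]
score = rand_index(existing, clusters)
```

Because only pair membership is compared, the label names themselves do not need to match between the two sets.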

Usage

To run this project you need the following installed:

  • Python (2.7)
  • R (3.5.3)
    • Boruta
    • e1071
    • caret

To Do List

In order of tasks

  • Add benign files and expand the model to output whether it considers a sample malicious or not
  • Test with the decision tree algorithm
  • Make the product version able to accept an exe file

Notes

qub-csc3002-project's Issues

Error: On Startup

Traceback (most recent call last):
  File "main.pyw", line 240, in <module>
    app = Application(master = root)
  File "main.pyw", line 234, in __init__
    self.createWidgets()
  File "main.pyw", line 223, in createWidgets
    if socket.gethostbyname(configuration.CUCKOO_SERVER) == "127.0.0.1":
socket.gaierror: [Errno 11001] getaddrinfo failed
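The traceback shows `socket.gethostbyname` raising `socket.gaierror` when the configured Cuckoo server name cannot be resolved. A minimal sketch of one way to guard that check follows; the helper name `is_local_server` and its `resolve` parameter are hypothetical, and treating an unresolvable host as "not local" is an assumption about the desired behavior.

```python
import socket


def is_local_server(hostname, resolve=socket.gethostbyname):
    """Return True if hostname resolves to 127.0.0.1 (hypothetical helper).

    A name that cannot be resolved is reported as not local instead of
    letting socket.gaierror crash the application on startup.
    The resolve parameter exists only so the lookup can be swapped in tests.
    """
    try:
        return resolve(hostname) == "127.0.0.1"
    except socket.gaierror:
        return False
```

In `createWidgets`, the condition could then read `if is_local_server(configuration.CUCKOO_SERVER):`, with a message shown to the user when the server name is invalid.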

Error Handling: No Report passed for ProcessSample

An error is thrown when no report is passed back from the API (perhaps it doesn't respond in time?):
Error No JSON object could be decoded
An error is also thrown when no report exists:

Traceback (most recent call last):
  File "C:\Users\thomaspickup\iCloudDrive\Documents\University\CSC3002\Assignment\CSC3002-Project\Application\app_modules\processSample.py", line 98, in analyze
    if 'md5' in report['target']['file']:
KeyError: 'target'

with the following JSON returned:

{
  "message": "Report not found"
}

Possible resubmit?
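A minimal sketch of handling the "Report not found" response without a `KeyError`: the helper below returns `None` when the `target` section is missing, so the caller can decide whether to retry or resubmit. The function name `extract_md5` is hypothetical; the key names come from the traceback above.

```python
def extract_md5(report):
    """Return the sample's md5 from a report, or None if it is missing.

    A response such as {"message": "Report not found"} has no "target"
    key, so this returns None instead of raising KeyError.
    """
    if not isinstance(report, dict):
        return None
    target = report.get("target")
    if not isinstance(target, dict):
        return None
    file_info = target.get("file")
    if not isinstance(file_info, dict):
        return None
    return file_info.get("md5")


# The response body shown in this issue.
missing = extract_md5({"message": "Report not found"})
```

A `None` result here would be the point at which to wait and poll again, or to resubmit the sample.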

Error Handling No JSON Object could be decoded

File "C:\Users\thomaspickup\iCloudDrive\Documents\University\CSC3002\Assignment\CSC3002-Project\Application\app_modules\processSample.py", line 18, in submitSample
    current_status = r.json()["task"]["status"]
  File "C:\Python27\lib\site-packages\requests\models.py", line 866, in json
    return complexjson.loads(self.text, **kwargs)
  File "C:\Python27\lib\json\__init__.py", line 339, in loads
    return _default_decoder.decode(s)
  File "C:\Python27\lib\json\decoder.py", line 364, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "C:\Python27\lib\json\decoder.py", line 382, in raw_decode
    raise ValueError("No JSON object could be decoded")
ValueError: No JSON object could be decoded
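The `ValueError` comes from decoding a response body that is not valid JSON (for example an empty body). A minimal sketch of a defensive parse is shown below; the helper name `parse_task_status` is hypothetical, and it works on the raw response text rather than on the `requests` response object.

```python
import json


def parse_task_status(body):
    """Return the task status from a response body, or None if unavailable.

    Catches ValueError, which json.loads raises for bodies that are not
    valid JSON (json.JSONDecodeError on Python 3 is a ValueError subclass).
    """
    try:
        payload = json.loads(body)
    except ValueError:
        return None
    if not isinstance(payload, dict):
        return None
    task = payload.get("task")
    if not isinstance(task, dict):
        return None
    return task.get("status")
```

In `submitSample`, something like `parse_task_status(r.text)` could replace the direct `r.json()["task"]["status"]` lookup, with a `None` result treated as "status not yet known".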
