asm2vec

This is an unofficial implementation of the asm2vec model as a standalone Python package. The model is described in the original paper (IEEE S&P 2019): Asm2Vec: Boosting Static Representation Robustness for Binary Clone Search against Code Obfuscation and Compiler Optimization.

Requirements

This implementation is written for Python 3.7; Python 3.7 or later is recommended. The only dependency of this package is numpy, which can be installed as follows:

python3 -m pip install numpy

How to use

Import

To install the package, clone the repository:

git clone https://github.com/lancern/asm2vec.git

Add the following line to your .bashrc file so that the Python interpreter can find asm2vec on its module search path:

export PYTHONPATH="path/to/asm2vec:$PYTHONPATH"

Replace path/to/asm2vec with the directory into which you cloned asm2vec. Then run the following command to apply the change to the current shell:

source ~/.bashrc

Alternatively, add the following snippet to any Python source file that uses asm2vec, before the import statement, so that the interpreter can find the package:

import sys
sys.path.append('path/to/asm2vec')

In your Python code, import the package with a statement of the following form:

import asm2vec.<module-name>
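
If editing PYTHONPATH is inconvenient, the search-path setup described above can be wrapped in a small helper that also reports whether the package was found. The following is a minimal sketch and not part of asm2vec: the helper name ensure_importable is hypothetical, and path/to/asm2vec remains a placeholder for your clone directory.

```python
import importlib.util
import sys


def ensure_importable(package, search_dir):
    """Return True if `package` can be found, adding `search_dir` to sys.path if needed.

    Hypothetical helper for illustration; it is not part of asm2vec.
    """
    if importlib.util.find_spec(package) is not None:
        return True  # already on the search path
    if search_dir not in sys.path:
        sys.path.append(search_dir)
    # Make sure the import system notices the newly added directory.
    importlib.invalidate_caches()
    return importlib.util.find_spec(package) is not None


if __name__ == '__main__':
    # 'path/to/asm2vec' is a placeholder: use the directory you cloned into.
    if ensure_importable('asm2vec', 'path/to/asm2vec'):
        print('asm2vec can be imported')
    else:
        print('asm2vec not found; check the clone path')
```

Checking with importlib.util.find_spec before importing gives a clear message when the clone path is wrong, instead of an ImportError later in the program.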

Define CFGs And Training

There are two ways to define the binary program that is passed to the asm2vec model. The first is to build the control flow graph (CFG) manually, as shown below:

from asm2vec.asm import BasicBlock
from asm2vec.asm import Function
from asm2vec.asm import parse_instruction

block1 = BasicBlock()
block1.add_instruction(parse_instruction('mov eax, ebx'))
block1.add_instruction(parse_instruction('jmp _loc'))

block2 = BasicBlock()
block2.add_instruction(parse_instruction('xor eax, eax'))
block2.add_instruction(parse_instruction('ret'))

block1.add_successor(block2)

block3 = BasicBlock()
block3.add_instruction(parse_instruction('sub eax, [ebp]'))

f1 = Function(block1, 'some_func')
f2 = Function(block3, 'another_func')

# block4 is omitted here for brevity; define it like the blocks above
f3 = Function(block4, 'estimate_func')

You can then train a model with the following code:

from asm2vec.model import Asm2Vec

model = Asm2Vec(d=200)
train_repo = model.make_function_repo([f1, f2, f3])
model.train(train_repo)

The second approach is to use the parse module provided by asm2vec to build CFGs automatically from an assembly source file:

from asm2vec.parse import parse_fp

with open('source.asm', 'r') as fp:
    funcs = parse_fp(fp)

You can then train a model with the following code:

from asm2vec.model import Asm2Vec

model = Asm2Vec(d=200)
train_repo = model.make_function_repo(funcs)
model.train(train_repo)

Estimation

You can use the asm2vec.model.Asm2Vec.to_vec method to convert a function into its vector representation.
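
Function vectors are commonly compared with cosine similarity when searching for clones. The sketch below is illustrative only: cosine_similarity is a hypothetical helper, not part of asm2vec, and the short lists stand in for real function vectors so that the snippet runs without the package.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError('vectors must have the same length')
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError('cosine similarity is undefined for a zero vector')
    return dot / (norm_u * norm_v)


# Placeholder vectors; in practice these would come from to_vec.
vec_a = [1.0, 2.0, 3.0]
vec_b = [2.0, 4.0, 6.0]   # same direction as vec_a
vec_c = [3.0, 0.0, -1.0]  # orthogonal to vec_a

print(cosine_similarity(vec_a, vec_b))  # close to 1: same direction
print(cosine_similarity(vec_a, vec_c))  # 0: orthogonal
```

A value near 1 suggests the two functions are similar, while a value near 0 suggests they are unrelated.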

Serialization

The implementation supports serialization of many of its internal data structures, so you can save the internal state of a trained model to disk for future use.

You can serialize two data structures to primitive data: the function repository and the model memento.

To be finished.

Hyper Parameters

The constructor of the asm2vec.model.Asm2Vec class accepts keyword arguments that set the hyper parameters of the model. The following table lists all available hyper parameters:

| Parameter Name | Type | Meaning | Default Value |
|---|---|---|---|
| d | int | The dimension of the token vectors. | 200 |
| initial_alpha | float | The initial learning rate. | 0.05 |
| alpha_update_interval | int | The number of tokens processed between learning rate updates. | 10000 |
| rnd_walks | int | The number of random walks performed to sequentialize a function. | 3 |
| neg_samples | int | The number of samples taken during negative sampling. | 25 |
| iteration | int | The number of training iterations. (Reserved for future use; not implemented yet.) | 1 |
| jobs | int | The number of tasks executed concurrently during training. | 4 |
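
Because the hyper parameters are plain keyword arguments, they can be collected in a dictionary and unpacked into the constructor. The sketch below uses only the names and defaults from the table above; the helper make_hyper_params is hypothetical and not part of asm2vec, and the constructor call is left as a comment so that the snippet runs without the package.

```python
# Names and default values copied from the table above.
DEFAULT_HYPER_PARAMS = {
    'd': 200,
    'initial_alpha': 0.05,
    'alpha_update_interval': 10000,
    'rnd_walks': 3,
    'neg_samples': 25,
    'iteration': 1,
    'jobs': 4,
}


def make_hyper_params(**overrides):
    """Return the defaults with `overrides` applied, rejecting unknown names.

    Hypothetical helper for illustration; it is not part of asm2vec.
    """
    unknown = set(overrides) - set(DEFAULT_HYPER_PARAMS)
    if unknown:
        raise ValueError('unknown hyper parameter(s): ' + ', '.join(sorted(unknown)))
    params = dict(DEFAULT_HYPER_PARAMS)
    params.update(overrides)
    return params


params = make_hyper_params(d=100, rnd_walks=5)
print(params)

# With asm2vec on the search path, the dictionary can be unpacked directly:
# from asm2vec.model import Asm2Vec
# model = Asm2Vec(**params)
```

Rejecting unknown names early catches misspelled parameters before a long training run starts.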

Notes

For simplicity, Selective Callee Expansion is not implemented in this early version. You have to perform it manually before passing CFGs to asm2vec.

Contributors

lancern, oalieno
