Code Monkey home page Code Monkey logo

syntia's Introduction

Syntia is a program synthesis based framework for deobfuscation. It uses instruction traces as an blackbox oracle to produce random input and output pairs. From these I/O pairs, the synthesizer learns the code's underlying semantic.

The framework is based on our paper:

@inproceedings{blazytko2017syntia,
    author = {Blazytko, Tim and Contag, Moritz and Aschermann, Cornelius and Holz, Thorsten},
    title = {{Syntia: Synthesizing the Semantics of Obfuscated Code}},
    year = {2017},
    booktitle = {USENIX Security Symposium} 
}

Usage

The scripts demonstrate the usage of the framework.

Symbolic execution

To symbolically execute an instruction trace of an obfuscated expressions, use

python2 scripts/symbolic_execution.py samples/tigress_mba_trace.bin x86_64

In this example, the expression is obfuscated via Mixed Boolean-Arithmetic (MBA). The final result is stored in EAX.

Random Sampling

random_sampling.py generates random I/O pairs for a piece of code. Its output is a JSON file. To sample 20 times, use

python2 scripts/random_sampling.py samples/tigress_mba_trace.bin x86_64 20 mba_sampling.json

It can be specified if memory and/or register locations are inputs/outputs.

Program Synthesis for Obfuscated Code

sample_synthesis uses the I/O samples and synthesizes the semantics of each input. It is possible to synthesize only specific outputs (e.g., EAX):

{
 "output": {
     "name": "EAX", 
     "number": 0, 
     "size": 32
 }, 
 "top_non_terminal": {
     "expression": {
         "infix": "((u32 * u32) + (u32 * 1))"
     }, 
     "reward": 1.0
 }, 
 "top_terminal": {
     "expression": {
         "infix": "((mem_0x2 * mem_0x0) + (mem_0x4 * 1))"
     }, 
     "reward": 1.0
 }, 
 "successful": "yes", 
 "result": {
     "final_expression": {
         "infix": "((mem_0x2 * mem_0x0) + (mem_0x4 * 1))", 
         "simplified": "((mem_0x2 * mem_0x0) + (mem_0x4 * 1))"
     }
 }
}

The MBA-obfuscated expressions is equivalent to (mem_0x2 * mem_0x0) + mem_0x4, where mem_i corresponds to the i-th memory read.

Manual I/O Generation

If random sampling does not work, I/O pairs can be crafted with other methods, e.g., by changing and observing values in a debugger. We define each input and output as follows:

{
    "inputs": {
        "0": {
            "location": "mem0", 
            "size": "0x4"
        }, 
        "1": {
            "location": "mem1", 
            "size": "0x4"
        }
    },
    "outputs": {
        "0": {
            "location": "EAX", 
            "size": "0x4"
        }
    }, 
    "samples": [["0x2","0x6", "0xFFFFFFF9"],
                ["0x14e","0x213","0xFFFFFc2d"],
                ["0x3ed","0x2710","0xFFFFFBC8"]
                ]
}

Each list in samples defines the observed I/O pairs in one sampling step. Before synthesis, we use the script transform_manual_sampling_io_pairs.py to transform it into the same output form as the results of random_sampling.py.

python2 scripts/transform_manual_sampling_io_pairs.py manually_crafted.json sampling.json

The, we can synthesize it as usual and obtain

{
    "0": {
        "output": {
            "name": "EAX", 
            "number": 0, 
            "size": 32
        }, 
        "top_non_terminal": {
            "expression": {
                "infix": "(~ ((u32 + u32) ^ (u32 & u32)))"
            }, 
            "reward": 1.0
        }, 
        "top_terminal": {
            "expression": {
                "infix": "(~ ((mem0 + mem0) ^ (mem0 & mem0)))"
            }, 
            "reward": 1.0
        }, 
        "successful": "yes", 
        "result": {
            "final_expression": {
                "infix": "(~ ((mem0 + mem0) ^ (mem0 & mem0)))", 
                "simplified": "~(2*mem0 ^ mem0)"
            }
        }
    }
}

General Program Synthesis

mcts_synthesis_multi_core.py shows a basic usage of the synthesis algorithm. It can be used to test the synthesis of different expressions (which can be defined in oracle). Furthermore, it allows to test the synthesis behavior for different configuration parameters.

Structure

Syntia's code is structured in three parts: symbolic execution of obfuscated code, generating I/O pairs from binary code and the program synthesizer.

symbolic_execution

A wrapper around Miasm's symbolic execution engine. We use it to symbolically execute pieces of obfuscated code.

kadabra

Kadabra is our a blanked execution framework which is built on top of Unicorn Engine. Besides others, it supports instruction tracing, enforcing execution paths and tracing memory modifications.

assembly_oracle

The assembly oracle utilizes binary code as a black box and generates I/O pairs for the synthesizer. It is built upon Kadabra.

mcts

It is the the core of Syntia: Monte Carlo Tree Search based program synthesis. Given I/O pairs from the assembly oracle, the synthesizer finds semantically equivalent non-obfuscated code.

utils

Provides basic functionality that is used across the different subprojects. Furthermore, it contains some code that illustrates the parsing and usage of the random sampling results for program synthesis. .....

Setup

Dependencies

The file install_deps.sh provides the build process of our dependencies. Major pars of our framework can be used without all dependencies. In particular, we use

Docker

We provide a Docker container that contains all dependencies (but not Syntia itself). To build it, use the following commands:

# build docker container
docker build -t <name of container> <directory with docker file>

# run docker container interactively
docker run -it <container name> /bin/bash

The containers superuser password is root.

Contact

tim DOT blazytko AT rub DOT de

syntia's People

Contributors

mrphrazer avatar techvoltage avatar ntddk avatar

Watchers

 avatar James Cloos avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.