FlowMatrix

A query-style tool for GPU-assisted offline information-flow analysis.

Paper | Presentation | Slides

Cite

Kaihang Ji, Jun Zeng, Yuancheng Jiang, Zhenkai Liang, Zheng Leong Chua, Prateek Saxena, and Abhik Roychoudhury. FlowMatrix: GPU-Assisted Information-Flow Analysis through Matrix-Based Representation. In USENIX Security Symposium, 2022.

Dependencies

Installation

Build with make from the project root directory:

$ make -j

Usage

$ ./bin/QueryCLI <path_to_database>

1. Examples

A database containing two examples is available from Release.

To load this database:

$ ./bin/QueryCLI ./examples.db

Example: [sample1] data flows of two 'mov' instructions. Check sample1 for more details.

FMQuery> WorkOn sample1

FMQuery> Query INSTR 1 TO INSTR 2

Example: [sample2] data flows between buffer read and buffer write. Check sample2 for more details.

FMQuery> WorkOn sample2

FMQuery> Query SYSCALL read TO SYSCALL write

2. Advanced System Call Query

Query data flows among all common source syscalls to all common sink syscalls:

FMQuery> Query SYSCALL read,recv*,mmap TO SYSCALL write,send*

This may print multiple records because it performs an n-to-m query. You can narrow down the query range by first checking the system call trace:

FMQuery> SyscallPrintAll
...
[40]@119916  read(0)              0x22a = read(0x8, 0x7fff9479ae20, 0x400)
...
[47]@184288  write(1)             0x22a = write(0x8, 0x7fff9479ae20, 0x22a)
...

"119916" and "184288" are the instruction indices of the read and write syscalls, respectively. You can therefore perform the query on a subset of the trace:

FMQuery> QueryInRange (119916,184288) SYSCALL read,recv*,mmap TO SYSCALL write,send*

Or, just query the data flow between these two syscall instructions:

FMQuery> Query INSTR 119916 TO INSTR 184288    
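
When a trace contains many syscalls, the range can be derived from the SyscallPrintAll output with a small script. The following is a minimal sketch, not part of FlowMatrix: it assumes every record follows the "[entry]@instruction_index  name(...)" layout of the two sample lines above, and the helper names parse_syscalls and query_in_range are hypothetical.

```python
import re

# Layout assumed from the two sample lines above:
# "[entry]@instruction_index  syscall_name(...)"
SYSCALL_LINE = re.compile(r"^\[(\d+)\]@(\d+)\s+(\w+)")


def parse_syscalls(text):
    """Return (entry, instr_index, name) tuples for every line that matches."""
    records = []
    for line in text.splitlines():
        match = SYSCALL_LINE.match(line.strip())
        if match:
            entry, instr_index, name = match.groups()
            records.append((int(entry), int(instr_index), name))
    return records


def query_in_range(records, source, sink):
    """Build a QueryInRange command from the first source to the last sink."""
    start = min(idx for _, idx, name in records if name == source)
    end = max(idx for _, idx, name in records if name == sink)
    return f"QueryInRange ({start},{end}) SYSCALL {source} TO SYSCALL {sink}"


trace = """
[40]@119916  read(0)              0x22a = read(0x8, 0x7fff9479ae20, 0x400)
[47]@184288  write(1)             0x22a = write(0x8, 0x7fff9479ae20, 0x22a)
"""
records = parse_syscalls(trace)
command = query_in_range(records, "read", "write")
print(command)
```

The printed command can be pasted at the FMQuery> prompt. Lines such as "..." that do not match the assumed layout are skipped.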

3. Advanced Instruction Query

System call queries can be combined with instruction queries.

For example, query data flows from all common source syscalls to all return instructions (usually for checking whether RIP is influenced by any source syscall):

FMQuery> Query SYSCALL read,recv* TO INSTR ret

You can also print the instruction trace around an index for context (for example, the 999th instruction in sample2 is a RET instruction):

FMQuery> TracePrint 999
   996:       488d0490       lea      rax, qword ptr [rax + rdx*4]
   997:       898ff4020000   mov      dword ptr [rdi + 0x2f4], ecx
   998:       48898708030000 mov      qword ptr [rdi + 0x308], rax
==>999:       c3             ret
  1000:       488d0521892200 lea      rax, qword ptr [rip + 0x228921]
  1001:       48890542892200 mov      qword ptr [rip + 0x228942], rax
  1002:       488d0523dfffff lea      rax, qword ptr [rip - 0x20dd]
  1003:       4889054c8c2200 mov      qword ptr [rip + 0x228c4c], rax
  1004:       488d0585902200 lea      rax, qword ptr [rip + 0x229085]
  1005:       488905468c2200 mov      qword ptr [rip + 0x228c46], rax

For details about the data flows of a specific instruction, use the "PrintOne" command:

FMQuery> PrintOne 999
0x7fff9479b288 -> RIP[0]  0x7fff9479b288
0x7fff9479b289 -> RIP[1]  0x7fff9479b289
0x7fff9479b28a -> RIP[2]  0x7fff9479b28a
0x7fff9479b28b -> RIP[3]  0x7fff9479b28b
0x7fff9479b28c -> RIP[4]  0x7fff9479b28c
0x7fff9479b28d -> RIP[5]  0x7fff9479b28d
0x7fff9479b28e -> RIP[6]  0x7fff9479b28e
0x7fff9479b28f -> RIP[7]  0x7fff9479b28f
RSP[0] -> RSP[0]
RSP[1] -> RSP[1]  RSP[2]  RSP[3]  RSP[4]  RSP[5]  RSP[6]  RSP[7]
RSP[2] -> RSP[2]  RSP[3]  RSP[4]  RSP[5]  RSP[6]  RSP[7]
RSP[3] -> RSP[3]  RSP[4]  RSP[5]  RSP[6]  RSP[7]
RSP[4] -> RSP[4]  RSP[5]  RSP[6]  RSP[7]
RSP[5] -> RSP[5]  RSP[6]  RSP[7]
RSP[6] -> RSP[6]  RSP[7]
RSP[7] -> RSP[7]
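
For post-processing, the PrintOne output can be loaded into a dictionary. The sketch below is an illustration only: it assumes each line reads "source -> destination destination ...", as the sample output above suggests, and the helper names parse_flows and sources_of are hypothetical.

```python
def parse_flows(text):
    """Map each source location to the list of destinations it flows into."""
    flows = {}
    for line in text.splitlines():
        if "->" not in line:
            continue  # skip blank lines and anything that is not a flow record
        source, _, targets = line.partition("->")
        flows[source.strip()] = targets.split()
    return flows


def sources_of(flows, prefix):
    """Sources with at least one destination starting with prefix, e.g. 'RIP['."""
    return sorted(
        src for src, dsts in flows.items()
        if any(dst.startswith(prefix) for dst in dsts)
    )


# A few lines copied from the PrintOne output above.
output = """
0x7fff9479b288 -> RIP[0]  0x7fff9479b288
0x7fff9479b289 -> RIP[1]  0x7fff9479b289
RSP[6] -> RSP[6]  RSP[7]
RSP[7] -> RSP[7]
"""
flows = parse_flows(output)
rip_sources = sources_of(flows, "RIP[")
print(rip_sources)
```

In this excerpt, only the two stack bytes appear as sources of RIP, which matches a RET instruction loading the return address from the stack.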

4. Forward analysis and Backward analysis

In some cases, analysts want to know the data flow at a certain location.

For example, in our case study, we wanted to know whether the return value of a specified function affects the final sink. A normal query does not meet this need, because a RET instruction does not include the return value (the RAX register) in its variable state, as shown above. In this case, we need a backward query:

FMQuery> Query INSTR <RET_INSTR_IDX> BACKWARD INSTR <SINK_INSTR_IDX>
...
RAX[0] -> xxx
...

*<RET_INSTR_IDX> and <SINK_INSTR_IDX> are the indices of a RET instruction and the final sink instruction respectively. Please refer to our paper for details of this case study.

Forward analysis is likewise powerful and frequently used in our experiments, including the case studies. Here we give an example of out-of-bounds (OOB) write analysis.

Scenario (OOB write): A buffer overflow occurs when the program copies data from an input file with memcpy().

FMQuery> Query SYSCALL read[2] FORWARD INSTR call[53]   # memcpy()    
...
RAX -> ... RDX ...
...

Here, the return value (RAX) of the read syscall is the number of bytes read. It affects the third argument (RDX) of memcpy(), which is the number of bytes to copy. Next, we verify the data flows from the read syscall to the memcpy() destination buffer and confirm the OOB write:

FMQuery> Query SYSCALL read[2] FORWARD INSTR ret[49]   # return from memcpy()    
...
0x7fff9479b048 -> 0x7faebe7e53c0 ...
0x7fff9479b049 -> 0x7faebe7e53c1 ...   # Overflow!
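
The bounds check behind the "Overflow!" annotation can be sketched as follows. This is an illustration under stated assumptions, not FlowMatrix functionality: the two address pairs come from the output above, but the buffer start and size are hypothetical values chosen so that the second destination falls just past the end of the buffer. The helper name out_of_bounds_writes is also hypothetical.

```python
def out_of_bounds_writes(flows, buf_start, buf_size):
    """Return (source, destination) pairs whose destination is outside the buffer."""
    buf_end = buf_start + buf_size  # exclusive upper bound
    return [
        (src, dst) for src, dst in flows
        if not (buf_start <= dst < buf_end)
    ]


# (source, destination) address pairs taken from the query output above.
flows = [
    (0x7FFF9479B048, 0x7FAEBE7E53C0),
    (0x7FFF9479B049, 0x7FAEBE7E53C1),
]

# HYPOTHETICAL destination buffer: 0x11 bytes starting at 0x7faebe7e53b0,
# so its last valid byte is 0x7faebe7e53c0.
hits = out_of_bounds_writes(flows, buf_start=0x7FAEBE7E53B0, buf_size=0x11)
for src, dst in hits:
    print(f"{src:#x} -> {dst:#x} is outside the buffer")
```

In practice, the real buffer bounds would have to come from the program under analysis, for example from its allocation site.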
...

Query Usage

Query Usage:

Query <Vars1Type> <Vars1> <Direction> <Vars2Type> <Vars2>
Arguments:
    Vars1Type, Vars2Type: {S(yscall), I(nstr), R(egister), M(emory)}.
    Vars1, Vars2: 
        (if Type is SYSCALL) : {SyscallName, SyscallName[id]}. E.g., "read", "read[1]", "read,recv*", "recv*"
        (if Type is INSTR)   : {InstrIndexInTrace, Instr}. E.g., "1", "ret", "push,pop", "mov*rax*"
        (if Type is REGISTER): {RegisterName}. E.g., "rip", "xmm1,xmm2,xmm3,xmm4"
        (if Type is MEMORY)  : {MemAddr}. E.g., "0x800000", "0x800000-0x801000", "0x0-0x1,0x4-0x5,0x8"
    Direction: {T(o), F(orward), B(ackward)}
        TO      : Query dataflows from Vars1 to Vars2
        FORWARD : Perform forward analysis. Query dataflows from Vars1 to positions of Vars2
        BACKWARD: Perform backward analysis. Query dataflows from positions of Vars1 to Vars2
*All IDs start from 1. '*' is the wildcard for advanced matching.

QueryInRange Usage:

QueryInRange <Range> <Vars1Type> <Vars1> <Direction> <Vars2Type> <Vars2>
Arguments:
    Range: (src,dest).
    (The other arguments are the same as for the Query command.)
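
When queries are generated by scripts, a small helper can assemble the command string following the grammar above. This is a minimal sketch, not part of FlowMatrix: the function name build_query is hypothetical, and it emits the full type and direction keywords used in this README rather than their one-letter forms.

```python
TYPES = ("SYSCALL", "INSTR", "REGISTER", "MEMORY")
DIRECTIONS = ("TO", "FORWARD", "BACKWARD")


def build_query(type1, vars1, direction, type2, vars2, in_range=None):
    """Assemble a Query command, or QueryInRange when in_range=(src, dest)."""
    if type1 not in TYPES or type2 not in TYPES:
        raise ValueError(f"variable type must be one of {TYPES}")
    if direction not in DIRECTIONS:
        raise ValueError(f"direction must be one of {DIRECTIONS}")
    if not vars1 or not vars2:
        raise ValueError("each side needs at least one variable")
    body = f"{type1} {','.join(vars1)} {direction} {type2} {','.join(vars2)}"
    if in_range is None:
        return f"Query {body}"
    start, end = in_range
    if start > end:
        raise ValueError("range start must not exceed range end")
    return f"QueryInRange ({start},{end}) {body}"


simple = build_query("SYSCALL", ["read", "recv*"], "TO", "INSTR", ["ret"])
ranged = build_query(
    "SYSCALL", ["read", "recv*", "mmap"], "TO", "SYSCALL", ["write", "send*"],
    in_range=(119916, 184288),
)
print(simple)
print(ranged)
```

The two printed commands reproduce the syscall-to-ret query and the ranged syscall query shown earlier in this README.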

Usage for all commands

 FMQuery> help
 - help
        This help message
 - exit
        Quit the session
 - WhichDB
        Show which database is worked on.
 - ListTraces
        List all traces in database.
 - WorkOn <string>
        Select which trace to work on.
 - TraceSize
        Show size of the current trace.
 - PrepareQuery <int> <int>
        Pre-compute intermedia query results for later query on range (start, end).
 - Query <string> <string> <string> <string> <string>
        Perform data flow query. E.g. Query SYSCALL read,recv TOWARDS REG rip
 - QueryInRange <string> <string> <string> <string> <string> <string>
        Perform data flow query with-in a sub range
 - MultiplyTwo <int> <int>
        Multiply specified two instrutions.
 - TracePrint <int>
        Print one instruction from trace by instruction index
 - TracePrintRange <int> <int>
        Print a list of instructions.
 - PrintOne <int>
        Print data flows of one instruction by index.
 - SyscallPrintAll
        List all syscalls in trace.
 - SyscallPrint <int>
        Print one syscall in trace by its index.
 - LastResult
        Print the raw matrix of the last one-to-one query.
 - Usage <string>
        Show usage of the specified command.
 - Settings
        (menu)
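
To give some intuition for the matrix-based representation and for what multiplying two instructions means, here is a toy sketch in plain Python. It is an illustration under assumptions, not FlowMatrix's implementation: the two mov instructions are hypothetical, the matrices use the convention M[dst][src] = 1 when src flows into dst, and real traces would use sparse matrices on the GPU rather than nested lists.

```python
LOCATIONS = ["rax", "rbx", "rcx"]  # toy variable state


def flow_matrix(writes):
    """writes maps each written location to the set of locations it reads.

    Locations that are not written keep their own value (identity row).
    """
    n = len(LOCATIONS)
    matrix = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    for dst, srcs in writes.items():
        row = LOCATIONS.index(dst)
        matrix[row] = [1 if LOCATIONS[j] in srcs else 0 for j in range(n)]
    return matrix


def compose(second, first):
    """Boolean matrix product: flows of `first` followed by `second`."""
    n = len(LOCATIONS)
    return [
        [int(any(second[i][k] and first[k][j] for k in range(n))) for j in range(n)]
        for i in range(n)
    ]


mov1 = flow_matrix({"rbx": {"rax"}})  # hypothetical: mov rbx, rax
mov2 = flow_matrix({"rcx": {"rbx"}})  # hypothetical: mov rcx, rbx
combined = compose(mov2, mov1)

# Column 0 is rax as a source: after both instructions it reaches every location.
for name, row in zip(LOCATIONS, combined):
    print(name, row)
```

The combined matrix summarizes the two-instruction sequence in one step, which is the property that lets a query over a long trace be answered by multiplying per-instruction matrices.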

flowmatrix's Issues

How to build my own flow matrix information database?

Hi! Could you answer a couple of questions? Thank you very much!
The db file contains the flow matrix information for a binary program. If I want to generate a db file for my own binary program, how should I build it?
Also, can the instruction flow matrix information in one program's db file be reused for other binary programs that contain the same instructions?

How to use FlowMatrix in virtual machine?

I tried to set up this project on VMware Workstation and got the following error:

[screenshot of the error not preserved]

Is it because the Nvidia GPU cannot be used inside the virtual machine?
Does FlowMatrix require a GPU to run normally? Is there a CPU-only version?

Run sample error

Hi!
When I run the example, I hit this problem.
How should I go about getting a usable database?
[screenshot of the error not preserved]

Makefile Error

Hello,

I got this error while running the Makefile. What is TraceGenerator and TraceLoader? Where can I find or generate them?

g01f22@g01f22:~/Documents/FlowMatrix$ make -j
mkdir -p ./bin
(cd ./libFlowMatrix/ && make -j)
make[1]: Entering directory '/home/g01f22/Documents/FlowMatrix/libFlowMatrix'
make[1]: Nothing to be done for 'test'.
make[1]: Leaving directory '/home/g01f22/Documents/FlowMatrix/libFlowMatrix'
(cd ./FMDynamic/ && make -j)
make[1]: Entering directory '/home/g01f22/Documents/FlowMatrix/FMDynamic'
g++ -o bin/TraceLoader -O3 -g -Wall -Wextra -std=c++17 src/TraceLoader//TraceLoader.o DBCore.o FMRule.o FMQuery.o ..//trace/FMtrace//lib//libfmtrace.a ..//libFlowMatrix/lib/libfm.a ..//FMCuSparseLinAlg//lib//libfmlinalg.a -L/usr/local/cuda-11.3/bin/../lib64 -lcudart -lcusparse -lcapstone -lprotobuf -lsqlite3 -pthread
g++ -o bin/TraceGenerator -O3 -g -Wall -Wextra -std=c++17 src/TraceLoader//TraceGenerator.o DBCore.o FMRule.o FMQuery.o ..//trace/FMtrace//lib//libfmtrace.a ..//libFlowMatrix/lib/libfm.a ..//FMCuSparseLinAlg//lib//libfmlinalg.a -L/usr/local/cuda-11.3/bin/../lib64 -lcudart -lcusparse -lcapstone -lprotobuf -lsqlite3 -pthread
g++ -o bin/QueryCLI -O3 -g -Wall -Wextra -std=c++17 src/QueryCLI//QueryCLI.o DBCore.o FMRule.o FMQuery.o ..//trace/FMtrace//lib//libfmtrace.a ..//libFlowMatrix/lib/libfm.a ..//FMCuSparseLinAlg//lib//libfmlinalg.a -L/usr/local/cuda-11.3/bin/../lib64 -lcudart -lcusparse -lcapstone -lprotobuf -lsqlite3 -pthread
g++ -o bin/Worker -O3 -g -Wall -Wextra -std=c++17 src/QueryCLI//Worker.o DBCore.o FMRule.o FMQuery.o ..//trace/FMtrace//lib//libfmtrace.a ..//libFlowMatrix/lib/libfm.a ..//FMCuSparseLinAlg//lib//libfmlinalg.a -L/usr/local/cuda-11.3/bin/../lib64 -lcudart -lcusparse -lcapstone -lprotobuf -lsqlite3 -pthread
/usr/bin/ld: cannot find -lsqlite3
collect2: error: ld returned 1 exit status
make[1]: *** [Makefile:70: bin/Worker] Error 1
make[1]: *** Waiting for unfinished jobs....
/usr/bin/ld: cannot find -lsqlite3
collect2: error: ld returned 1 exit status
make[1]: *** [Makefile:67: bin/TraceGenerator] Error 1
/usr/bin/ld: cannot find -lsqlite3
collect2: error: ld returned 1 exit status
make[1]: *** [Makefile:64: bin/TraceLoader] Error 1
/usr/bin/ld: cannot find -lsqlite3
collect2: error: ld returned 1 exit status
make[1]: *** [Makefile:61: bin/QueryCLI] Error 1
make[1]: Leaving directory '/home/g01f22/Documents/FlowMatrix/FMDynamic'
make: *** [Makefile:14: FMDynamic] Error 2
g01f22@g01f22:~/Documents/FlowMatrix$

Make error

Hello! I am very interested in your work.
When I build the project with make, the following error message appears:

/usr/bin/ld: src/TraceLoader//TraceLoader.o: in function `acorn_obj::TaintRule::TaintRule()':
/home/kaihang/project/FlowMatrix/FMDynamic/..//libFlowMatrix/include/acorns_obj.pb.h:1115: undefined reference to `acorn_obj::TaintRule::TaintRule(google::protobuf::Arena*)'
collect2: error: ld returned 1 exit status

How should I fix this error? Thank you!

Cannot find TaintInduce online

Hi,

I cannot find any library named TaintInduce other than this one. It is a Python package, and some of its dependencies (such as squirrel-framework) cannot be found online either. Could you please provide a link to the TaintInduce library that you used?

Makefile Error

When I try to compile using make -j

I get this:

g1@csep072178g1:~/FlowMatrix$ make -j
mkdir -p ./bin
(cd ./libFlowMatrix/ && make -j)
make[1]: Entering directory '/home/g1/FlowMatrix/libFlowMatrix'
(cd ./../trace/FMtrace/ && make)
protoc *.proto --cpp_out=./src
make[2]: Entering directory '/home/g1/FlowMatrix/trace/FMtrace'
g++ -O3 -Wall -Wextra -std=c++17 -fPIC -I./include -c src/trace.cpp
g++ -O3 -Wall -Wextra -std=c++17 -fPIC -I./include -c src/common.cpp
g++ -O3 -Wall -Wextra -std=c++17 -fPIC -I./include -c src/FMTraceReader.cpp
g++ -c -O3 -g -Wall -Wextra -std=c++17 -fPIC -I./include -I. -I./../trace/FMtrace//include src//acorns_obj.pb.cc -lcapstone -lprotobuf
g++ -O3 -g -Wall -Wextra -std=c++17 -fPIC -I./include -I. -I./../trace/FMtrace//include -c src/isa.cpp
g++ -O3 -g -Wall -Wextra -std=c++17 -fPIC -I./include -I. -I./../trace/FMtrace//include -c src/ProtoRuleDB.cpp
g++ -O3 -g -Wall -Wextra -std=c++17 -fPIC -I./include -I. -I./../trace/FMtrace//include -c src/FMCommon.cpp
g++ -O3 -g -Wall -Wextra -std=c++17 -fPIC -I./include -I. -I./../trace/FMtrace//include -c src/matrix_base.cpp
g++ -O3 -g -Wall -Wextra -std=c++17 -fPIC -I./include -I. -I./../trace/FMtrace//include -c src/VState.cpp
In file included from src//acorns_obj.pb.cc:4:
./include/acorns_obj.pb.h:11:10: fatal error: google/protobuf/port_def.inc: No such file or directory
   11 | #include "google/protobuf/port_def.inc"
      |          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
compilation terminated.
In file included from ./include/ProtoRuleDB.hpp:4,
                 from src/ProtoRuleDB.cpp:1:
./include/acorns_obj.pb.h:11:10: fatal error: google/protobuf/port_def.inc: No such file or directory
   11 | #include "google/protobuf/port_def.inc"
      |          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
compilation terminated.
make[1]: *** [Makefile:48: acorns_obj.pb.o] Error 1
make[1]: *** Waiting for unfinished jobs....
make[1]: *** [Makefile:70: ProtoRuleDB.o] Error 1
g++ -shared -Wl,-soname,libfmtrace.so -o ./lib/libfmtrace.so -O3 -Wall -Wextra -std=c++17 -fPIC -lcapstone trace.o common.o
ar rv ./lib/libfmtrace.a trace.o common.o
g++ -o bin/FMTraceReader -O3 -Wall -Wextra -std=c++17 -fPIC FMTraceReader.o trace.o common.o -lcapstone
ar: creating ./lib/libfmtrace.a
a - trace.o
a - common.o
ranlib ./lib/libfmtrace.a
make[2]: Leaving directory '/home/g1/FlowMatrix/trace/FMtrace'
make[1]: Leaving directory '/home/g1/FlowMatrix/libFlowMatrix'
make: *** [Makefile:13: FMDynamic] Error 2

I tried editing the Makefile to adjust PROTOBUF_PATH, since I thought that was the problem, but it did not help. I suspect the issue has to do with protobuf itself. Where should the /google/protobuf directory be located? When I searched, I found the port_def.inc file at /protobuf/src/google/protobuf/port_def.inc
