bcov's Introduction

bcov

A tool for efficient binary-level coverage analysis. bcov statically instruments x86-64 ELF binaries without compiler support. It features probe pruning, precise CFG analyses, and sophisticated instrumentation techniques. We summarized this research in a 2-min teaser video.

Resources

  • Details are available in our ESEC/FSE'20 paper. You can find a slightly expanded pre-print here.
  • We have a post on the llvm-dev mailing list. It briefly introduces bcov and discusses potential future work.
  • This blog post elaborates on the availability of function definitions in stripped binaries.
  • You can find our ESEC/FSE'20 talk here. The slide deck (with transcript) is also available.
  • A sample set of binaries that we patched using bcov is available here. The complete set of benchmarks is available in our archival repository.

Software prerequisites

The following software packages must be available:

  • capstone branch next commit #c3b4ce1901
  • unicorn branch master commit #536c4e77c4

Later versions of both frameworks should work in principle but have not been tested yet. The script install.sh can be used for installation.

Research reproducibility

We provide a Dockerfile which installs bcov and runs a coverage analysis experiment. Please check out the supplemental artifacts for more details.

Usage

The tool supports the following operation modes, which are set using the option --mode (or simply -m):

  • patch. Patch a given binary.
  • report. Report coverage given a patched binary and a coverage data file.
  • dump. Dump various program graphs for a given function. For example, dump the CFG and dominator trees.

The following command patches a binary:

bcov -m patch -p any -v 5 -i perl -o perl.any

The instrumentation policy can be set to any, which refers to the any-node policy, or to all, which refers to the leaf-node policy.

Coverage data can be dumped by injecting libbcov-rt.so using the LD_PRELOAD mechanism. For example, you can try the sample binary perl.any, which can be found in the artifacts repository:

export BCOV_OPTIONS="coverage_dir=$PWD"   # sets the directory for dumping coverage data. Defaults to $PWD
export LD_PRELOAD="[full-path-to-bcov-rt]/libbcov-rt.so"
./perl.any -e 'print "Hello, bcov!\n"'
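
Exporting LD_PRELOAD in a shell means every command started from that shell afterwards inherits it. As a minimal sketch of an alternative, the Python snippet below sets both variables only for the one child process. The helper names bcov_env and run_patched are hypothetical and not part of bcov; only LD_PRELOAD, BCOV_OPTIONS, and the coverage_dir option are taken from the commands above.

```python
import os
import subprocess


def bcov_env(runtime_lib, coverage_dir=None, base_env=None):
    """Return a copy of the environment with the bcov runtime preloaded.

    Note: any LD_PRELOAD value already present in the base environment
    is overwritten, which is a simplification of this sketch.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["LD_PRELOAD"] = os.path.abspath(runtime_lib)
    if coverage_dir is not None:
        # Same key=value form as the BCOV_OPTIONS export shown above.
        env["BCOV_OPTIONS"] = f"coverage_dir={os.path.abspath(coverage_dir)}"
    return env


def run_patched(argv, runtime_lib, coverage_dir=None):
    """Run a patched binary with the modified environment (hypothetical helper)."""
    return subprocess.run(argv, env=bcov_env(runtime_lib, coverage_dir), check=False)
```

For instance, run_patched(["./perl.any", "-e", 'print "Hello, bcov!\n"'], "[full-path-to-bcov-rt]/libbcov-rt.so") would run the sample binary once the placeholder path is filled in. Either way, the patched binary runs with the runtime library preloaded.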

This will produce a dump file with the extension '.bcov' in your current directory. This file can be supplied to bcov for coverage reporting:

bcov -m report -p any -i ./perl -d perl.any.1588260679.1816.bcov > report.out
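
Because the dump file name changes from run to run, it can be convenient to look up the newest dump instead of typing the name by hand. The following Python sketch does that by modification time, so it does not depend on how the file name is composed. The helpers latest_dump and report_command are hypothetical names; the bcov arguments are copied from the report command above.

```python
from pathlib import Path


def latest_dump(coverage_dir=".", prefix="perl.any"):
    """Return the most recently modified '<prefix>*.bcov' file, or None."""
    dumps = Path(coverage_dir).glob(f"{prefix}*.bcov")
    return max(dumps, key=lambda p: p.stat().st_mtime, default=None)


def report_command(original_binary, dump_file, policy="any"):
    """Build the argument list for 'bcov -m report' as used above."""
    return ["bcov", "-m", "report", "-p", policy,
            "-i", str(original_binary), "-d", str(dump_file)]
```

The returned list can be passed to subprocess.run with stdout redirected to report.out.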

Currently, bcov cannot persist analysis results to disk. Therefore, the original binary must be re-analyzed to report coverage. Coverage is reported for each basic block in the file report.out. Each line lists:

  • BB address
  • BB instruction count
  • is covered
  • is fallthrough (i.e., does not terminate with a branch)

Also, a coverage summary is reported for each function. For example, it shows the basic block and instruction coverage ratios.
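
To illustrate how the two ratios relate to the per-block data, here is a small Python sketch that computes them from records holding the four fields listed above. It is the straightforward definition of basic block and instruction coverage, not necessarily how bcov computes its summary. The BasicBlock type and coverage_ratios function are hypothetical, and parsing report.out into such records is left out because the exact text layout of the file is not described here.

```python
from typing import Iterable, NamedTuple, Tuple


class BasicBlock(NamedTuple):
    address: int        # BB address
    insn_count: int     # BB instruction count
    covered: bool       # is covered
    fallthrough: bool   # is fallthrough


def coverage_ratios(blocks: Iterable[BasicBlock]) -> Tuple[float, float]:
    """Return (basic block coverage, instruction coverage) as fractions."""
    blocks = list(blocks)
    total_insns = sum(b.insn_count for b in blocks)
    if not blocks or total_insns == 0:
        return 0.0, 0.0
    covered = [b for b in blocks if b.covered]
    bb_ratio = len(covered) / len(blocks)
    insn_ratio = sum(b.insn_count for b in covered) / total_insns
    return bb_ratio, insn_ratio
```

The two ratios differ whenever covered and uncovered blocks have different sizes, which is why both are worth reporting.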

For a given function, it is possible to selectively dump various program graphs like the CFG and superblock dominator graph. For example, consider function S_search_const in perl:

bcov -m dump -f "S_search_const" -i ./perl

This command will dump the following files:

  • func_421d90.cfg.dot. The CFG of the function.
  • func_421d90.rev.cfg.dot. Similar to the CFG but with all edges reversed.
  • func_421d90.pre.dom.dot. Predominator tree.
  • func_421d90.post.dom.dot. Postdominator tree.
  • func_421d90.sb.dom.dot. Superblock dominator graph.

Graphs are dumped in the standard DOT format and can be viewed using a dot viewer like xdot. Please refer to this blog post for additional details.
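
If an interactive viewer is not available, the dumped files can also be rendered to images with the Graphviz dot command. The Python sketch below assumes Graphviz is installed and that the dumped files match func_*.dot, as in the list above. The helper names dot_command and render_all are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path


def dot_command(dot_file, fmt="svg"):
    """Build a Graphviz command that renders one DOT file next to its source."""
    src = Path(dot_file)
    # 'func_421d90.cfg.dot' becomes 'func_421d90.cfg.svg'.
    return ["dot", f"-T{fmt}", str(src), "-o", str(src.with_suffix(f".{fmt}"))]


def render_all(directory=".", fmt="svg"):
    """Render every func_*.dot file in a directory; return the output paths."""
    if shutil.which("dot") is None:
        raise RuntimeError("Graphviz 'dot' was not found on PATH")
    outputs = []
    for src in sorted(Path(directory).glob("func_*.dot")):
        cmd = dot_command(src, fmt)
        subprocess.run(cmd, check=True)
        outputs.append(Path(cmd[-1]))
    return outputs
```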

Citing

For citation in an academic work please use:

@inproceedings{BenKhadra:FSE2020,
address = {Virtual Event, USA},
author = {{Ben Khadra}, M. Ammar and Stoffel, Dominik and Kunz, Wolfgang},
booktitle = {ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering - ESEC/FSE'20},
doi = {10.1145/3368089.3409694},
pages = {1153--1164},
publisher = {ACM Press},
title = {{Efficient Binary-Level Coverage Analysis}},
year = {2020},
month = {nov},
day = {6--13}    
}

License

This software is distributed under the MIT license. See LICENSE.txt for details.

bcov's People

Contributors

abenkhadra

bcov's Issues

Aborting application. Reason: Fatal log at [xxx/bcov/src/graph/DominatorTree.cpp:210]

Hi,

Here is a crash I have encountered during the binary analysis using bcov. The binary test2 is compiled with gcc -w -O1 -fno-inline test.c -o test2 and the compiler version is gcc version 13.0.0 20220528 (experimental) (GCC).

$ bcov -m dump -f "func_1" -i test2
Aborted (core dumped)

$ cat bcov.log
I | call graph built successfully
I | elf section <.gcc_except_table> not found
I | fde refers to a nonstatic function @ 401020
I | eh_frame function count 2019 while static function count 2025
F | Check failed: [ordered_vertices().size() > 2] 
W | Aborting application. Reason: Fatal log at [/home/haoxin/disk-dut/research/github/bcov/packages/bcov/src/graph/DominatorTree.cpp:210]

Here is the binary I used:
test2.zip

Thanks!

Support more than one RW PT_LOAD (LLD's default layout)

LLD enables -z relro by default. Since https://reviews.llvm.org/D58892, LLD uses the segment layout R RX RW(relro) RW(non-relro).

bcov currently crashes when patching an LLD-linked executable due to:

     DCHECK(regions[ElfModule::Impl::kDataRegionIdx].size() >
            regions[ElfModule::Impl::kRelRoRegionIdx].size());

A larger issue is that kDataRegionIdx and its friends assume a particular layout of program headers. It'd be nice to not assume a particular layout.
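
To check whether a given binary has the layout in question, one can count its writable PT_LOAD segments directly from the program headers. The Python sketch below is independent of bcov's implementation. It assumes a little-endian ELF64 file and uses the standard ELF64 header and program header field offsets. The function names are hypothetical, and the PN_XNUM escape for very large program header counts is not handled.

```python
import struct

PT_LOAD = 1
PF_X, PF_W, PF_R = 1, 2, 4


def load_segments(data: bytes):
    """Return (p_flags, p_vaddr, p_memsz) for each PT_LOAD program header."""
    if data[:4] != b"\x7fELF" or data[4] != 2 or data[5] != 1:
        raise ValueError("expected a little-endian ELF64 file")
    (e_phoff,) = struct.unpack_from("<Q", data, 0x20)
    e_phentsize, e_phnum = struct.unpack_from("<HH", data, 0x36)
    segments = []
    for i in range(e_phnum):
        # Elf64_Phdr: type, flags, offset, vaddr, paddr, filesz, memsz, align
        (p_type, p_flags, _off, p_vaddr, _paddr,
         _filesz, p_memsz, _align) = struct.unpack_from(
            "<IIQQQQQQ", data, e_phoff + i * e_phentsize)
        if p_type == PT_LOAD:
            segments.append((p_flags, p_vaddr, p_memsz))
    return segments


def count_rw_loads(data: bytes) -> int:
    """Count PT_LOAD segments that are readable and writable but not executable."""
    return sum(1 for flags, _, _ in load_segments(data)
               if flags & PF_R and flags & PF_W and not flags & PF_X)
```

A result greater than one from count_rw_loads(open(path, "rb").read()) indicates the R RX RW RW style of layout described above.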

I'd recommend testing the following 4 layouts for good platform portability:

  • -fuse-ld=bfd -z noseparate-code (traditional layout)
  • -fuse-ld=bfd -z separate-code (default on Linux x86 since binutils 2.31; the 2018-02-27 commit "ld: Add --enable-separate-code" made -z separate-code the default on Linux)
  • -fuse-ld=lld -z noseparate-code (default since LLD 10.0.0 https://reviews.llvm.org/D67482)
  • -fuse-ld=lld -z separate-code (older LLD layout)

Many binutils packages default to -z relro now. It'd be good to test -z norelro as well.
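
The combinations above form a small test matrix: two linkers, two code layouts, and relro on or off. A Python sketch that enumerates the corresponding compiler invocations could look as follows. The source file name, output naming scheme, and the link_variants helper are hypothetical; the linker options are the ones named in this issue, passed through the compiler driver with -Wl.

```python
from itertools import product

LINKERS = ("bfd", "lld")
CODE_LAYOUTS = ("noseparate-code", "separate-code")
RELRO = ("relro", "norelro")


def link_variants(source="test.c", compiler="gcc"):
    """Return one compiler command per (linker, code layout, relro) combination."""
    variants = []
    for linker, layout, relro in product(LINKERS, CODE_LAYOUTS, RELRO):
        output = f"test.{linker}.{layout}.{relro}"
        variants.append([
            compiler, f"-fuse-ld={linker}",
            f"-Wl,-z,{layout}", f"-Wl,-z,{relro}",
            source, "-o", output,
        ])
    return variants
```

Each resulting binary could then be patched with bcov -m patch to see which layouts are handled.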

Is it possible to get the marker information in the binary code?

Hi,

Thanks for your nice tool! May I ask something about CFG reported by bcov?

In my current work, I inserted marker functions into every block of the source code and then compiled it to a binary. I expected to see these markers in the dot file of the CFG produced by bcov, but they do not appear. Is it possible to show them?

For example, consider the following simple experiment:

$ objdump -S test1 | grep marker
(only some of the matches are shown)
  411405:	e8 28 1a ff ff       	callq  402e32 <marker_98>
  411432:	e8 3a 1a ff ff       	callq  402e71 <marker_101>
  411696:	e8 44 18 ff ff       	callq  402edf <marker_106>
  411753:	e8 f5 17 ff ff       	callq  402f4d <marker_111>
  41179f:	e8 01 18 ff ff       	callq  402fa5 <marker_115>
  4119cc:	e8 b4 b4 ff ff       	callq  40ce85 <marker_99999>
  411a42:	e8 f8 15 ff ff       	callq  40303f <marker_122>
  411c5d:	e8 e5 14 ff ff       	callq  403147 <marker_134>
  411ca3:	e8 f2 b1 ff ff       	callq  40ce9a <marker_88888>
  411cb9:	e8 e1 14 ff ff       	callq  40319f <marker_138>
  411e50:	e8 26 14 ff ff       	callq  40327b <marker_148>
  411e85:	e8 49 14 ff ff       	callq  4032d3 <marker_152>
  412139:	e8 17 13 ff ff       	callq  403455 <marker_170>
  412197:	e8 13 ad ff ff       	callq  40ceaf <marker_start>
  41252c:	e8 93 a9 ff ff       	callq  40cec4 <marker_end>

But in the dot file generated by bcov, none of the marker functions appear.

$ bcov -m dump -f "main" -i ./test1
$ cat func_412181.cfg.dot | grep marker
(nothing)
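
As a side note for matching call sites by address rather than by symbol name, the target of each 5-byte call in the objdump listing above follows from its rel32 operand: target = address of the call + 5 + signed 32-bit displacement. The Python sketch below shows this calculation. The helper name call_target is hypothetical, and whether bcov's dot output contains such addresses is not established here.

```python
def call_target(address: int, encoding: bytes) -> int:
    """Compute the target of an x86-64 'call rel32' (opcode e8) instruction."""
    if len(encoding) != 5 or encoding[0] != 0xE8:
        raise ValueError("expected a 5-byte 'call rel32' instruction")
    # The displacement is a little-endian signed 32-bit value, relative to
    # the address of the next instruction.
    rel32 = int.from_bytes(encoding[1:], "little", signed=True)
    return (address + len(encoding) + rel32) & 0xFFFFFFFFFFFFFFFF


# First call site from the listing: 411405: e8 28 1a ff ff
print(hex(call_target(0x411405, bytes.fromhex("e8281affff"))))  # 0x402e32
```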

Here is the binary I used:
test1.zip

So, is it possible to make those markers show up in the dot file under the current design of bcov? Or did I miss anything here? (I realized that bcov only reports markers in the main function but ignores the ones in the bodies of called functions, e.g., func_1 in this case. Is it possible to build a whole CFG that includes the bodies of the called functions?)

Thank you so much for your time and looking forward to your reply!

Best regards,
Haoxin

Check failed: [data != nullptr] (Aborted)

Steps to reproduce:

curl https://binaries.cockroachdb.com/cockroach-v21.2.3.linux-amd64.tgz | tar -xz && sudo cp -i cockroach-v21.2.3.linux-amd64/cockroach /usr/local/bin/

bcov -m patch -p any -v 10 -i /usr/local/bin/cockroach -o cockroach.any
Aborted
tail bcov.log
F | Check failed: [data != nullptr]
W | Aborting application. Reason: Fatal log at [/home/vagrant/bcov-artifacts/packages/bcov/src/elf/ElfModule.cpp:11c]

bcov -m patch / Error: "weird! no probes identified!"

$ bcov -m patch -p any -v 5 -i ./foo -o ./foo_patched

weird! no probes identified!

Further info provided by rizin

fd       3
file     foo
size     0xbb6728
humansz  11.7M
mode     r-x
format   elf64
iorw     false
block    0x100
type     EXEC (Executable file)
arch     x86
cpu      N/A
baddr    0x00400000
binsz    0x00bb5f65
bintype  elf
bits     64
class    ELF64
compiler GCC: (Ubuntu 4.8.4-2ubuntu1~14.04.3) 4.8.4
dbg_file N/A
endian   LE
hdr.csum N/A
guid     N/A
intrp    /lib64/ld-linux-x86-64.so.2
laddr    0x00000000
lang     c++
machine  AMD x86-64 architecture
maxopsz  16
minopsz  1
os       linux
cc       N/A
pcalign  0
relro    partial
rpath    $ORIGIN:$ORIGIN/A:$ORIGIN/../A
subsys   linux
stripped true
crypto   false
havecode true
va       true
sanitiz  false
static   false
linenum  false
lsyms    false
canary   true
PIE      false
RELROCS  false
NX       true

bcov -m patch / Error: string offset exceeds section size

$ bcov -m patch -p any -v 5 -i ./foo -o ./foo_patched

terminate called after throwing an instance of 'std::range_error'
  what():  string offset 2425393159 exceeds section size

Further info provided by rizin

fd       3
file     foo
size     0xbb8728
humansz  11.7M
mode     r-x
format   elf64
iorw     false
block    0x100
type     EXEC (Executable file)
arch     x86
baddr    0x400000
binsz    12287845
bintype  elf
bits     64
canary   true
class    ELF64
compiler GCC: (Ubuntu 4.8.4-2ubuntu1~14.04.3) 4.8.4
crypto   false
endian   little
havecode true
intrp    /lib64/ld-linux-x86-64.so.2
laddr    0x0
lang     c++
linenum  false
lsyms    false
machine  AMD x86-64 architecture
maxopsz  16
minopsz  1
nx       true
os       linux
pcalign  0
pic      false
relocs   false
relro    partial
rpath    $ORIGIN:$ORIGIN/foo:$ORIGIN/../foo
sanitiz  false
static   false
stripped true
subsys   linux
va       true
