Code Monkey home page Code Monkey logo

macaw's Introduction

This is the main repository for the Macaw binary analysis framework with two key goals: binary code discovery and symbolic execution of machine code. This framework is implemented to offer extensible support for architectures (i.e., library clients can add their own architectures and opt in to the architecture support they need).

Overview

The code discovery algorithm is based on forced execution and is able to discovery code from one or more entry points. Symbols are optional but can significantly improve the quality of the results. Stripped binaries can pose a challenge for macaw (especially static stripped binaries). Macaw provides support for lifting discovered machine code into an IR suitable for symbolic execution via the Crucible library.

Currently, macaw supports:

  • x86-64
  • PowerPC (32 and 64 bit)
  • ARM (32 bit)
  • RISC-V

Repository Structure

The Macaw libraries are:

  • macaw-base -- The core architecture-independent operations and algorithms.
  • macaw-symbolic -- Library that provides symbolic simulation of Macaw programs via Crucible.
  • macaw-x86 -- Provides definitions enabling Macaw to be used on X86_64 programs.
  • macaw-x86-symbolic -- Adds Macaw-symbolic extensions needed to support x86.
  • macaw-semmc -- Contains the architecture-independent components of the translation from semmc semantics into macaw IR. This provides the shared infrastructure for all of our backends; this will include the Template Haskell function to create a state transformer function from learned semantics files provided by the semmc library.
  • macaw-arm -- Enables macaw for ARM (32-bit) binaries by reading the semantics files generated by semmc and using Template Haskell to generate a function that transforms machine states according to the learned semantics.
  • macaw-arm-symbolic -- Enables macaw/crucible symbolic simulation for ARM (32-bit) architectures.
  • macaw-ppc -- Enables macaw for PPC (32-bit and 64-bit) binaries by reading the semantics files generated by semmc and using Template Haskell to generate a function that transforms machine states according to the learned semantics.
  • macaw-ppc-symbolic -- Enables macaw/crucible symbolic simulation for PPC architectures
  • macaw-riscv -- Enables macaw for RISC-V (RV32GC and RV64GC variants) binaries.
  • macaw-refinement -- Enables additional architecture-independent refinement of code discovery. This can enable discovery of more functionality than is revealed by the analysis in macaw-base.

The libraries that make up Macaw are released under the BSD license.

These Macaw core libraries depend on a number of different supporting libraries, including:

  • elf-edit -- loading and parsing of ELF binary files
  • galois-dwarf -- retrieval of Dwarf debugging information from binary files
  • flexdis86 -- disassembly and semantics for x86 architectures
  • dismantle -- disassembly for ARM and PPC architectures
  • semmc -- semantics definitions for ARM and PPC architectures
  • crucible -- Symbolic execution and analysis
  • what4 -- Symbolic representation for the crucible backend
  • parameterized-utils -- utilities for working with parameterized types

Building

Preparation

Dependencies for building Macaw that are not obtained from Hackage are supported via Git submodules:

$ git submodule update --init

Preparing Softfloat for RISC-V Backend

The RISC-V backend depends on softfloat-hs, which in turn depends on the softfloat library. Macaw's build system will automatically build softfloat, but the softfloat-hs repo must be recursively cloned to enable this. If you are not building macaw-riscv you can skip this step. To recursively clone softfloat-hs, run:

$ cd deps/softfloat-hs
$ git submodule update --init --recursive

Building with Cabal

The Macaw libraries can be individually built or collectively built with Cabal:

$ ln -s cabal.project.dist cabal.project
$ cabal configure
$ cabal build all

To build a single library, either specify that library name instaed of all, or change to that library's subdirectory before building:

$ cabal build macaw-refinement

or

$ cd refinement
$ cabal build

Notes on Freeze Files

We use the cabal.project.freeze.ghc-* files to constrain dependency versions in CI. To build with a known-working set of Hackage dependencies:

ln -s cabal.GHC-<VER>.config cabal.project.freeze

These freeze files were generated using the scripts/regenerate-freeze-files.sh script. Note that at present, these configuration files assume a Unix-like operating system, as we do not currently test Windows on CI. If you would like to use these configuration files on Windows, you will need to make some manual changes to remove certain packages and flags:

regex-posix
tasty +unix
unix
unix-compat

Note that if any of the macaw packages fail to build without the freeze files, it is a bug in the dependency version bounds specified in the .cabal files that should be reported (https://github.com/GaloisInc/macaw/issues).

License

This code is made available under the BSD3 license and without any support.

macaw's People

Contributors

andreistefanescu avatar bboston7 avatar benjaminselfridge avatar chameco avatar dagit avatar danmatichuk avatar dmjio avatar dmwit avatar felixonmars avatar glguy avatar jldodds avatar joehendrix avatar kquick avatar langston-barrett avatar lcasburn avatar lisanna-dettwyler avatar lukemaurer avatar ntc2 avatar ptival avatar robdockins avatar ryanglscott avatar scogginsnl avatar simonjwinwood avatar staslyakhov avatar thebendavis avatar travitch avatar yav avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

macaw's Issues

Build error due to -Werror and incomplete pattern match

Incomplete pattern match:

Building library for macaw-x86-0.0.1..
[11 of 15] Compiling Data.Macaw.X86.Getters ( src/Data/Macaw/X86/Getters.hs, /Users/tommd/dev/reopt/dist-newstyle/build/x86_64-osx/ghc-8.2.1/macaw-x86-0.0.1/build/Data/Macaw/X86/Getters.o )

src/Data/Macaw/X86/Getters.hs:182:3: warning: [-Wincomplete-patterns]
    Pattern match(es) are non-exhaustive
    In a case alternative: Patterns not matched: (F.Mem256 _)
    |
182 |   case v of
    |   ^^^^^^^^^...

`macaw-aarch32` classifies `b` instruction as jump rather than tail call

Given this program:

#include <stdio.h>

void main() {
  printf("%d\n", 42);
}

If you compile it like so:

$ armv7m-linux-musleabi-gcc -O2 -nostartfiles -no-pie test.c -o test.exe

You'll end up with the following assembly:

$ armv7m-linux-musleabi-objdump -d test.exe

test.exe:     file format elf32-littlearm


Disassembly of section .plt:

00010198 <.plt>:
   10198:       e52de004        push    {lr}            ; (str lr, [sp, #-4]!)
   1019c:       e59fe004        ldr     lr, [pc, #4]    ; 101a8 <.plt+0x10>
   101a0:       e08fe00e        add     lr, pc, lr
   101a4:       e5bef008        ldr     pc, [lr, #8]!
   101a8:       00010e58        .word   0x00010e58

000101ac <printf@plt>:
   101ac:       e28fc600        add     ip, pc, #0, 12
   101b0:       e28cca10        add     ip, ip, #16, 20 ; 0x10000
   101b4:       e5bcfe58        ldr     pc, [ip, #3672]!        ; 0xe58

Disassembly of section .text:

000101b8 <main>:
   101b8:       e59f0008        ldr     r0, [pc, #8]    ; 101c8 <main+0x10>
   101bc:       e3a0102a        mov     r1, #42 ; 0x2a
   101c0:       e08f0000        add     r0, pc, r0
   101c4:       eafffff8        b       101ac <printf@plt>
   101c8:       00000004        .word   0x00000004

Notice that main tail-calls to printf@plt using a b instruction. When macaw-aarch32 runs code discovery on this program, however, it classifies this as a jump rather than a tail call. Here is a test harness to demonstrate this happening:

{-# LANGUAGE DataKinds #-}
{-# LANGUAGE FlexibleContexts #-}
{-# LANGUAGE GADTs #-}
{-# LANGUAGE ImplicitParams #-}
{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE ScopedTypeVariables #-}
{-# LANGUAGE TypeApplications #-}
{-# LANGUAGE TypeOperators #-}
module Main (main) where

import qualified Data.ByteString as BS
import qualified Data.ElfEdit as Elf
import           Data.Parameterized.Some ( Some(..) )
import qualified Prettyprinter as PP

import           Data.Macaw.AArch32.Symbolic ()
import qualified Data.Macaw.ARM as MA
import qualified Data.Macaw.Symbolic.Testing as MST

main :: IO ()
main = mkTest "test.exe"

mkTest :: FilePath -> IO ()
mkTest exePath = do
  bytes <- BS.readFile exePath
  case Elf.decodeElfHeaderInfo bytes of
    Left (_, msg) -> fail ("Error parsing ELF header from file '" ++ show exePath ++ "': " ++ msg)
    Right (Elf.SomeElf ehi) -> do
      case Elf.headerClass (Elf.header ehi) of
        Elf.ELFCLASS32 -> do
          (_mem, funInfos) <- MST.runDiscovery ehi MST.toAddrSymMap MA.arm_linux_info
          print $ PP.vcat $ map (\(Some x) -> PP.pretty x) funInfos
        Elf.ELFCLASS64 -> fail "64 bit ARM is not supported"
$ runghc-9.0.2 Main.hs
function main @ 0x101b8
0x101ac:
  ;  _R13 = stack_frame + 0
  ;  __PendingInterrupt = __AssertionFailure
  ;  __PendingPhysicalSError = __AssertionFailure
  ;  __Sleeping = __AssertionFailure
  ;  __UndefinedBehavior = __AssertionFailure
  ;  __UnpredictableBehavior = __AssertionFailure
  ;  PSTATE_IT in (8 0 0)
  ;  PSTATE_T in (1 0 0)
  ;  PSTATE_nRW in (1 1 1)
  ;  _R1 in (32 42 42)
  # 0x101ac 0x101ac: ADR_A1(xxxxxxxx.xxxxxxxx.10001111.xxxx0010) Rd 12, cond 14, imm12 1536
  # 0x101ac: ADR_A1(xxxxxxxx.xxxxxxxx.10001111.xxxx0010) Rd 12, cond 14, imm12 1536
  # 0x101ac: {PSTATE_T => 0x0 :: [1]
             ;_PC => 0x101b0
             ;_R12 => 0x101b4 :: [32]
             ;__AssertionFailure => false
             ;__BranchTaken => false
             ;__UndefinedBehavior => false
             ;__UnpredictableBehavior => false
             ;() => ()_0}
  # 0x101b0 0x101b0: ADD_i_A1(xxxxxxxx.xxxxxxxx.1000xxxx.xxxx0010) Rd 12, Rn 12, S 0, cond 14, imm12 2576
  # 0x101b0: ADD_i_A1(xxxxxxxx.xxxxxxxx.1000xxxx.xxxx0010) Rd 12, Rn 12, S 0, cond 14, imm12 2576
  # 0x101b0: {PSTATE_T => 0x0 :: [1]
             ;_PC => 0x101b4
             ;_R12 => 0x201b4 :: [32]
             ;__AssertionFailure => false
             ;__BranchTaken => false
             ;__UndefinedBehavior => false
             ;__UnpredictableBehavior => false
             ;() => ()_0}
  # 0x101b4 0x101b4: LDR_i_A1_pre(xxxxxxxx.xxxxxxxx.x011xxxx.xxxx0101) P 1, Rn 12, Rt 15, U 1, W 1, cond 14, imm12 3672
  # 0x101b4: LDR_i_A1_pre(xxxxxxxx.xxxxxxxx.x011xxxx.xxxx0101) P 1, Rn 12, Rt 15, U 1, W 1, cond 14, imm12 3672
  r248 := read_mem (0x2100c :: [32]) (bvle4)
  r249 := (bv_and r248 (0x1 :: [32]))
  r250 := (trunc r249 1)
  r251 := (eq r250 (0x1 :: [1]))
  r252 := (mux r251 (0x1 :: [1]) (0x0 :: [1]))
  # 0x101b4: {PSTATE_T => r252
             ;_PC => r248
             ;_R12 => 0x2100c :: [32]
             ;__AssertionFailure => false
             ;__BranchTaken => true
             ;__UndefinedBehavior => false
             ;__UnpredictableBehavior => false
             ;() => ()_0}
  tail_call
    { PSTATE_IT = 0x0 :: [8]
    , PSTATE_T = r252
    , PSTATE_nRW = 0x1 :: [1]
    , _PC = r248
    , _R12 = 0x2100c :: [32]
    , __AssertionFailure = false
    , __BranchTaken = true
    , __PendingInterrupt = false
    , __PendingPhysicalSError = false
    , __Sleeping = false
    , __UndefinedBehavior = false
    , __UnpredictableBehavior = false
    }
0x101b8:
  ;  _R13 = stack_frame + 0
  # 0x101b8 0x101b8: LDR_l_A1(xxxxxxxx.xxxxxxxx.x0x11111.xxxx010x) P 1, Rt 0, U 1, W 0, cond 14, imm12 8
  # 0x101b8: LDR_l_A1(xxxxxxxx.xxxxxxxx.x0x11111.xxxx010x) P 1, Rt 0, U 1, W 0, cond 14, imm12 8
  r122 := read_mem (0x101c8 :: [32]) (bvle4)
  # 0x101b8: {PSTATE_T => 0x0 :: [1]
             ;_PC => 0x101bc
             ;_R0 => r122
             ;__AssertionFailure => false
             ;__BranchTaken => false
             ;__UndefinedBehavior => false
             ;__UnpredictableBehavior => false
             ;() => ()_0}
  # 0x101bc 0x101bc: MOV_i_A1(xxxxxxxx.xxxxxxxx.1010OOOO.xxxx0011) Rd 1, S 0, cond 14, imm12 42, QuasiMask0 QuasiMask"("4): 0
  # 0x101bc: MOV_i_A1(xxxxxxxx.xxxxxxxx.1010OOOO.xxxx0011) Rd 1, S 0, cond 14, imm12 42, QuasiMask0 QuasiMask"("4): 0
  # 0x101bc: {PSTATE_T => 0x0 :: [1]
             ;_PC => 0x101c0
             ;_R1 => 0x2a :: [32]
             ;__AssertionFailure => false
             ;__BranchTaken => false
             ;__UndefinedBehavior => false
             ;__UnpredictableBehavior => false
             ;() => ()_0}
  # 0x101c0 0x101c0: ADD_r_A1(xxx0xxxx.xxxxxxxx.1000xxxx.xxxx0000) Rd 0, Rm 0, Rn 15, S 0, cond 14, imm5 0, type1 0
  # 0x101c0: ADD_r_A1(xxx0xxxx.xxxxxxxx.1000xxxx.xxxx0000) Rd 0, Rm 0, Rn 15, S 0, cond 14, imm5 0, type1 0
  r166 := (uext r122 65)
  r167 := (bv_add r166 (0x101c8 :: [65]))
  r168 := (trunc r167 32)
  # 0x101c0: {PSTATE_T => 0x0 :: [1]
             ;_PC => 0x101c4
             ;_R0 => r168
             ;__AssertionFailure => false
             ;__BranchTaken => false
             ;__UndefinedBehavior => false
             ;__UnpredictableBehavior => false
             ;() => ()_0}
  # 0x101c4 0x101c4: B_A1(xxxxxxxx.xxxxxxxx.xxxxxxxx.xxxx1010) cond 14, imm24 16777208
  # 0x101c4: B_A1(xxxxxxxx.xxxxxxxx.xxxxxxxx.xxxx1010) cond 14, imm24 16777208
  # 0x101c4: {_PC => 0x101ac
             ;__AssertionFailure => false
             ;__BranchTaken => true
             ;__UndefinedBehavior => false
             ;__UnpredictableBehavior => false}
  jump 0x101ac
    { PSTATE_IT = 0x0 :: [8]
    , PSTATE_T = 0x0 :: [1]
    , PSTATE_nRW = 0x1 :: [1]
    , _PC = 0x101ac
    , _R0 = r168
    , _R1 = 0x2a :: [32]
    , __AssertionFailure = false
    , __BranchTaken = true
    , __PendingInterrupt = false
    , __PendingPhysicalSError = false
    , __Sleeping = false
    , __UndefinedBehavior = false
    , __UnpredictableBehavior = false
    }

Notice the jump 0x101ac bit. While classifying it as a jump is not technically wrong, failing to classify it as a tail call means that it does not participate in LookupFunctionHandle, which makes it difficult to perform an action when invoking printf@plt.

Additional CI targets

I'd like to get macaw-ppc, macaw-ppc-symbolic, macaw-aarch32-symbolic, and macaw-refinement under CI so that they don't get accidentally missed in updates.

macaw-aarch32{-symbolic} test suite failures on GHC 9.0

While working towards making a branch that builds with GHC 9.0, I discovered that the test suites in macaw-aarch32 and macaw-aarch32-symbolic will fail with arithmetic overflow exceptions. The culprit is this line in macaw-semmc:

BVSar sz l r -> binopbv (\l' r' -> shiftR (toSigned sz l') (fromIntegral (toSigned sz r'))) sz l r

Several tests will invoke signed right shifts by an amount equal to 0xffffffff. In Haskell terms, this is shiftR x (-1). The documentation for shiftR states that shifting by a negative value is undefined behavior. While GHC 8.10.4 seems to be OK with negative shift amounts:

$ /opt/ghc/8.10.4/bin/ghci                                                                       
GHCi, version 8.10.4: https://www.haskell.org/ghc/  :? for help                                  
Loaded GHCi configuration from /home/rscott/.ghci                                                
λ> import Data.Bits                                                                              
λ> shiftR (1 :: Integer) (-1 :: Int)                                                             
2                                                                                                
λ> shiftR (2 :: Integer) (-1 :: Int)                                                             
4

GHC 9.0.1 is stricter and will always throw an arithmetic overflow exception:

$ /opt/ghc/9.0.1/bin/ghci                                                                        
GHCi, version 9.0.1: https://www.haskell.org/ghc/  :? for help                                   
Loaded GHCi configuration from /home/rscott/.ghci                                                
λ> import Data.Bits                                                                              
λ> shiftR (1 :: Integer) (-1 :: Int)                                                             
*** Exception: arithmetic overflow

`macaw-symbolic`: Lazily initialize global memory

Currently, macaw-symbolic's memory model creates an SMT array to back global memory and then immediately adds assertions relating to every byte in the memory. Doing all this work upfront is expensive, especially for large binaries. What's more, most binaries don't actually need to make use of the entirety of global memory, so many of these assertions will go unused.

Rather than asserting everything upfront, a more scalable approach would be to split the global memory up into smaller regions and only add assertions for each region on demand. The MemPtrTable data type already has much of the information it needs to do this, since it contains an IntervalMap that is divided up into MemChunks. We'd want to be even more fine-grained than that, however, since binaries will often have large MemChunks that can incur bad overhead when symbolically initialized. In order to track this state, we may need to track additional state in MacawSimulatorState.

In addition to lazily populating the symbolic bytes, another optimization opportunity is to avoid using the SMT array entirely when performing concrete reads from read-only memory. Such memory can never be changed during simulation, so there's no point using a (comparatively expensive) SMT array to track it.

`mkGlobalPointerValidityPred` reverses interval bounds

In this code:

case range of
IM.IntervalCO lo hi -> do
let w = memRepr mpt
lobv <- WI.bvLit sym w (BV.mkBV w (toInteger lo))
hibv <- WI.bvLit sym w (BV.mkBV w (toInteger hi))
lob <- WI.bvUlt sym lobv off
hib <- WI.bvUle sym off hibv
WI.andPred sym lob hib

IntervalCO indicates that the lower bound should be included and the upper bound should be excluded. The what4 formulas, on the other hand, reverse this order. bvUlt sym lobv off excludes the lower bound, and bvUle sym off hibv includes the upper bound. All other cases mix up the order in similar fashion.

Remove unneeded CPP

There is some unnecessary CPP in the Discovery module (see

#define USE_REWRITER
import Data.Macaw.AbsDomain.AbsState
import qualified Data.Macaw.AbsDomain.JumpBounds as Jmp
import Data.Macaw.AbsDomain.Refine
import qualified Data.Macaw.AbsDomain.StridedInterval as SI
import Data.Macaw.Architecture.Info
import Data.Macaw.CFG
import Data.Macaw.CFG.DemandSet
#ifdef USE_REWRITER
import Data.Macaw.CFG.Rewriter
#endif
)

macaw-ppc is broken after some recent macaw-base changes

Tracking down all of the issues currently at play. So far:

  • Additional type parameters to AbsProcessorState
  • mkInitialRegsForBlock and postCallAbsState has been removed from ArchitectureInfo
  • The type of postArchTermStmtAbsState has changed
  • New functions in ArchitectureInfo: extractBlockPrecond, initialBlockRegs, and archCallParams

LLVM 9 jump table size detection.

From @travitch, we may need to revisit jump table detection for LLVM 9. We currently use the range check to determine the size of the jump table, but LLVM 9 may omit them if it detects that an out-of-range access would trigger a UB.

Per the LLVM changes:

===============

LLVM will now omit range checks for jump tables when lowering switches with unreachable default destination. For example, the switch dispatch in the C++ code below

int g(int);
enum e { A, B, C, D, E };
int f(e x, int y, int z) {
switch(x) {
case A: return g(y);
case B: return g(z);
case C: return g(y+z);
case D: return g(x-z);
case E: return g(x+z);
}
}

will result in the following x86_64 machine code when compiled with Clang. This is because falling off the end of a non-void function is undefined behaviour in C++, and the end of the function therefore being treated as unreachable:

_Z1f1eii:
mov eax, edi
jmp qword ptr [8*rax + .LJTI0_0]

Make a test case for #477

PR #477 fixes the handling of call detection in thumb mode. This was a bit tricky because the ARM semantics do some masking and shifting of the return address to ensure the low bit is set. The change looks through that to detect returns properly. We need to add a test case to the test suite for it

Untangle the `macaw-loader` submodule knot

Currently the macaw repo depends on the macaw-loader repo as a submodule dependency. However, the macaw-loader repo also depends on the macaw repo as a submodule. This makes coordinating submodule bumps between the repos challenging, as they sometimes have to be updated simultaneously.

I'd like to propose that we find a way to make this situation more tolerable. Some possible ideas for doing so:

  1. Cut the submodule dependency knot by removing macaw's dependency on macaw-loader, thereby removing the macaw-loader submodule dependency from macaw. This would require changing some code in macaw to avoid the use of macaw-loader, however. In some cases, this would impact the public-facing API, such as in macaw-ppc and macaw-refinement.
  2. Bring the macaw-loader packages into the macaw repo. This would involve some amount of churn for downstream libraries, but it could be done.
  3. Something else?

Make `genArchVals` total

The GenArchInfo class can return Nothing, which is very annoying at use sites. The original design of this class was to let us specify this constraint for architectures that did not have symbolic backends defined yet. That is probably not a great choice overall. It is un-ergonomic, and we just define the symbolic backends with the normal backends anyway.

Inconsistent module names for ARM/AArch32

As demonstrated by the following import list, both x86_64 and PPC *-symbolic packages have modules with the same prefixes as the non-*-symbolic packages, whereas ARM does not:

import           Data.Macaw.ARM ( ARM )
import           Data.Macaw.AArch32.Symbolic ()
import           Data.Macaw.PPC ( PPC32, PPC64 )
import           Data.Macaw.PPC.Symbolic ()
import           Data.Macaw.X86 ( X86_64 )
import           Data.Macaw.X86.Symbolic ()

Add support for conditional jump tables

ARM supports conditional execution of most instructions. It also supports writing directly to the pc register. This means that conditional loads can (and are) used for jump table dispatch: ldrls pc, [pc, r3, lsl #2]. This pattern is used to combine a bounds check and dispatch in a single instruction.

There is no way to represent a conditional jump table dispatch in macaw currently. We would need to either:

  1. Introduce a new conditional jump table construct
  2. Make this an architecture-specific terminator

Note that (2) is not necessarily simple, as the logic for analyzing the jump table targets in the classifier needs to be available to resolve the abstract values referenced in the instruction. This means that the classifier logic will need to be made extensible.

Simulation failure on AArch32 binary using string literal

Consider this C program, appropriately decorated for use in macaw's test suite:

// bug.c
int __attribute__((noinline)) test_string_lit() {
  char x[5] = "ABCD";
  return x[0] == 'A';
}

void _start() {
  test_string_lit();
}

Compile this program with the an AArch32 version of musl-gcc:

$ armv7m-linux-musleabi-gcc -nostdlib -static -fno-stack-protector -no-pie bug.c -o bug.exe
$ armv7m-linux-musleabi-objdump -d bug.exe 

bug.exe:     file format elf32-littlearm


Disassembly of section .text:

00010074 <test_string_lit>:
   10074:       e52db004        push    {fp}            ; (str fp, [sp, #-4]!)
   10078:       e28db000        add     fp, sp, #0
   1007c:       e24dd00c        sub     sp, sp, #12
   10080:       e59f2038        ldr     r2, [pc, #56]   ; 100c0 <test_string_lit+0x4c>
   10084:       e08f2002        add     r2, pc, r2
   10088:       e24b300c        sub     r3, fp, #12
   1008c:       e8920003        ldm     r2, {r0, r1}
   10090:       e5830000        str     r0, [r3]
   10094:       e2833004        add     r3, r3, #4
   10098:       e5c31000        strb    r1, [r3]
   1009c:       e55b300c        ldrb    r3, [fp, #-12]
   100a0:       e3530041        cmp     r3, #65 ; 0x41
   100a4:       03a03001        moveq   r3, #1
   100a8:       13a03000        movne   r3, #0
   100ac:       e20330ff        and     r3, r3, #255    ; 0xff
   100b0:       e1a00003        mov     r0, r3
   100b4:       e28bd000        add     sp, fp, #0
   100b8:       e49db004        pop     {fp}            ; (ldr fp, [sp], #4)
   100bc:       e12fff1e        bx      lr
   100c0:       0000004c        .word   0x0000004c

000100c4 <_start>:
   100c4:       e92d4800        push    {fp, lr}
   100c8:       e28db004        add     fp, sp, #4
   100cc:       ebffffe8        bl      10074 <test_string_lit>
   100d0:       e1a00000        nop                     ; (mov r0, r0)
   100d4:       e8bd8800        pop     {fp, pc}

Now run this test harness on bug.exe (the code was extracted from macaw-aarch32-symbolic's test suite):

{-# LANGUAGE DataKinds #-}
{-# LANGUAGE FlexibleContexts #-}
{-# LANGUAGE GADTs #-}
{-# LANGUAGE ImplicitParams #-}
{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE ScopedTypeVariables #-}
{-# LANGUAGE TypeApplications #-}
{-# LANGUAGE TypeOperators #-}
module Main (main) where

import qualified Data.ByteString as BS
import qualified Data.ByteString.Char8 as BS8
import qualified Data.ElfEdit as Elf
import qualified Data.Foldable as F
import           Data.Maybe ( mapMaybe )
import qualified Data.Parameterized.Classes as PC
import qualified Data.Parameterized.Nonce as PN
import           Data.Parameterized.Some ( Some(..) )
import           Data.Proxy ( Proxy(..) )

import qualified Language.ASL.Globals as ASL
import qualified Data.Macaw.Architecture.Info as MAI

import           Data.Macaw.AArch32.Symbolic ()
import qualified Data.Macaw.ARM as MA
import qualified Data.Macaw.ARM.ARMReg as MAR
import qualified Data.Macaw.Discovery as M
import qualified Data.Macaw.Symbolic as MS
import qualified Data.Macaw.Symbolic.Testing as MST
import qualified What4.Config as WC
import qualified What4.Expr as WE
import qualified What4.Expr.Builder as WEB
import qualified What4.Interface as WI
import qualified What4.ProblemFeatures as WPF
import qualified What4.Solver as WS

import qualified Lang.Crucible.Backend as CB
import qualified Lang.Crucible.Backend.Online as CBO
import qualified Lang.Crucible.Simulator as CS
import qualified Lang.Crucible.LLVM.MemModel as LLVM

main :: IO ()
main = do
  -- These are pass/fail in that the assertions in the "pass" set are true (and
  -- the solver returns Unsat), while the assertions in the "fail" set are false
  -- (and the solver returns Sat).
  let passRes = MST.SimulationResult MST.Unsat
  mkSymExTest passRes "bug.exe"

hasTestPrefix :: Some (M.DiscoveryFunInfo arch) -> Maybe (BS8.ByteString, Some (M.DiscoveryFunInfo arch))
hasTestPrefix (Some dfi) = do
  bsname <- M.discoveredFunSymbol dfi
  if BS8.pack "test_" `BS8.isPrefixOf` bsname
    then return (bsname, Some dfi)
    else Nothing

mkSymExTest :: MST.SimulationResult -> FilePath -> IO ()
mkSymExTest expected exePath = do
  bytes <- BS.readFile exePath
  case Elf.decodeElfHeaderInfo bytes of
    Left (_, msg) -> fail ("Error parsing ELF header from file '" ++ show exePath ++ "': " ++ msg)
    Right (Elf.SomeElf ehi) -> do
      case Elf.headerClass (Elf.header ehi) of
        Elf.ELFCLASS32 ->
          symExTestSized expected ehi MA.arm_linux_info
        Elf.ELFCLASS64 -> fail "64 bit ARM is not supported"

symExTestSized :: MST.SimulationResult
               -> Elf.ElfHeaderInfo 32
               -> MAI.ArchitectureInfo MA.ARM
               -> IO ()
symExTestSized expected ehi archInfo = do
   (mem, funInfos) <- MST.runDiscovery ehi MST.toAddrSymMap archInfo
   let testEntryPoints = mapMaybe hasTestPrefix funInfos
   F.forM_ testEntryPoints $ \(name, Some dfi) -> do
     putStrLn ("Testing " ++ BS8.unpack name ++ " at " ++ show (M.discoveredFunAddr dfi))
     Some (gen :: PN.NonceGenerator IO t) <- PN.newIONonceGenerator
     sym <- WEB.newExprBuilder WEB.FloatRealRepr WE.EmptyExprBuilderState gen
     CBO.withYicesOnlineBackend sym CBO.NoUnsatFeatures WPF.noFeatures $ \bak -> do
       -- We are using the z3 backend to discharge proof obligations, so
       -- we need to add its options to the backend configuration
       let solver = WS.z3Adapter
       let backendConf = WI.getConfiguration sym
       WC.extendConfig (WS.solver_adapter_config_options solver) backendConf

       execFeatures <- MST.defaultExecFeatures (MST.SomeOnlineBackend bak)
       let Just archVals = MS.archVals (Proxy @MA.ARM) Nothing
       let extract = armResultExtractor archVals
       let ?memOpts = LLVM.defaultMemOptions
       simRes <- MST.simulateAndVerify solver WS.defaultLogData bak execFeatures archInfo archVals mem extract dfi
       if expected == simRes
         then putStrLn "Test passed"
         else fail $ "Expected " ++ show expected ++ ", but got " ++ show simRes

-- | ARM functions with a single scalar return value return it in %r0
--
-- Since all test functions must return a value to assert as true, this is
-- straightforward to extract
armResultExtractor :: ( CB.IsSymInterface sym
                      )
                   => MS.ArchVals MA.ARM
                   -> MST.ResultExtractor sym MA.ARM
armResultExtractor archVals = MST.ResultExtractor $ \regs _sp _mem k -> do
  let re = MS.lookupReg archVals regs (MAR.ARMGlobalBV (ASL.knownGlobalRef @"_R0"))
  k PC.knownRepr (CS.regValue re)

If you run this, it will unexpectedly fail:

$ runghc Bug.hs
Testing test_string_lit at 0x10074
Bug.hs: user error (Expected SimulationResult Unsat, but got SimulationResult Sat)

Note that if you change the array to be of a smaller length, e.g.,

  char x[2] = "A";

Then the test will pass.

@travitch suspects that the issue may arise from macaw-symbolic's interval logic here.

Code discovery in stripped static binaries is not effective

Note that this only applies to binaries compiled against a typical C standard library. Custom binaries with a simple _start are not necessarily affected. In a static binary, the entry point is a special _start function provided by libc. This function performs some amount of initialization before loading the address of main into a register and passing it to further startup code. That code eventually (indirectly) jumps to the address of main to kick off the execution of the user code. The current function pointer detection heuristics do not detect this case, meaning that macaw typically fails to find main in stripped static binaries.

To address this issue (and the implied issue where other functions called only indirectly), we will want to implement an interprocedural demand analysis that discovers the memory locations and registers that correspond to values eventually used as function pointers. Then, any writes to those locations will trigger an expansion of the discovery frontier.

Because this analysis could be computationally expensive, it could live in macaw-refinement or as a wrapper around the base analysis.

Likely infinite loop in resolveFuns

The resolveFuns function in the Discovery model has a suspicious branch in it. The following path seems like it should result in an infinite loop:

  1. The call to minViewWithKey checks to see if there are any unexplored functions left
  2. If there are, it extracts the address and reason (note: it also throws away the rest of the residual map)
  3. If the address has already been discovered (because there is an entry in funInfo), resolveFuns calls itself with the original DiscoveryState

It seems like that final call should be an infinite loop because the address hasn't been removed from unexploredFunctions (since the smaller map was thrown away in step 2)

Consolidate static/dynamic symbol table logic with `elf-edit`

Data.Macaw.Memory.ElfLoader has two very intricate functions for computing the static symbol table:

-- | Find and get symbol table entries from binary.
elfStaticSymbolTable :: Integral (ElfWordType w)
=> Elf.ElfHeader w
-> BS.ByteString -- ^ File contents
-> V.Vector (Elf.Shdr Word32 (ElfWordType w)) -- ^ Section header table
-> Maybe (V.Vector (Elf.SymtabEntry BS.ByteString (ElfWordType w)))
elfStaticSymbolTable hdr contents shdrs =
case V.toList (V.filter (\shdr -> Elf.SHT_SYMTAB == Elf.shdrType shdr) shdrs) of
[] -> Nothing
symtabShdr:_ -> do
let cl = Elf.headerClass hdr
let dta = Elf.headerData hdr
let symtabBuf = shdrData contents symtabShdr
let strtabShdr = shdrs V.! fromIntegral (Elf.shdrLink symtabShdr)
let strtab = shdrData contents strtabShdr
case Elf.decodeSymtab cl dta strtab symtabBuf of
Left _ -> Nothing
Right v -> Just v

And the dynamic symbol table:

dynamicEntries
| Elf.headerType (Elf.header elf) == Elf.ET_DYN
, [symtab] <- V.toList $ V.filter (\s -> Elf.shdrType s == Elf.SHT_DYNSYM) shdrs
, strtabIdx <- Elf.shdrLink symtab
, Just strtab <- shdrs V.!? fromIntegral strtabIdx
, symtabData <- shdrData contents symtab
, strtabData <- shdrData contents strtab
, Right entries <-Elf.decodeSymtab cl dta strtabData symtabData =
V.toList $ V.imap (\i s -> (i, s)) entries
| otherwise =
[]

It turns out, however, that elfStaticSymbolTable is essentially identical to decodeHeaderSymtab from elf-edit. We should remove macaw's code in favor of decodeHeaderSymtab. Moreover, the dynamic symbol table logic in macaw really ought to belong in elf-edit as well, so that code should be migrated over to elf-edit for others' benefit.

Add the SMT solvers to the Github Actions cache

The solver hosts (particularly yices) go down regularly and interrupt CI - we should be able to just cache them. It might be easiest to install all of the solvers to a distinguished root (e.g., /solver/bin) and add that directory to the cache.

breakages in non-Travis CI built code

Several of the packages in this repo appear to no longer build (e.g., macaw-ppc, macaw-arm, etc) and are not currently directly being built by any CI. Perhaps we would benefit from adding macaw and all its packages of interest to fryingpan?

Consolidate `macaw`'s `FloatInfo` types with `what4`'s

macaw defines FloatInfo and FloatInfoRepr data types:

data FloatInfo
= HalfFloat -- ^ 16 bit binary IEE754*
| SingleFloat -- ^ 32 bit binary IEE754
| DoubleFloat -- ^ 64 bit binary IEE754
| QuadFloat -- ^ 128 bit binary IEE754
| X86_80Float -- ^ X86 80-bit extended floats
type HalfFloat = 'HalfFloat
type SingleFloat = 'SingleFloat
type DoubleFloat = 'DoubleFloat
type QuadFloat = 'QuadFloat
type X86_80Float = 'X86_80Float
data FloatInfoRepr (fi :: FloatInfo) where
HalfFloatRepr :: FloatInfoRepr HalfFloat
SingleFloatRepr :: FloatInfoRepr SingleFloat
DoubleFloatRepr :: FloatInfoRepr DoubleFloat
QuadFloatRepr :: FloatInfoRepr QuadFloat
X86_80FloatRepr :: FloatInfoRepr X86_80Float

These are almost identical to the ones in what4 here. We should try to consolidate these if possible.

macaw-semmc hard codes endianness

The Template Haskell code generator in macaw-semmc hard-codes memory operations to big endian. It needs to be configurable to support ARM

Mis-classifications of jump tables

I ran into some examples compiled with gcc that seemed relatively straightforward (i.e., not due to gcc eliding bounds checks), but that macaw currently misclassifies. There are some PowerPC ones, too, but I figured I'd start here. My tr/jump-table-tests branch (https://github.com/GaloisInc/macaw/tree/tr/jump-table-tests/x86/tests/x64) has three extra test cases. One succeeds (compiled using clang). Two compiled with gcc fail.

  1. The indirect branch in the optimized one (test-switch-jump-table.gcc.nostdlib.opt.x86_64.exe) is mis-classified as a tail call
  2. The classification of the branch in the other (test-switch-jump-table.nostdlib.x86_64.exe.expected) fails with the error
      Unclassified block at 0x40016b
        Branch: IP is not an mux:
      let r59 := (bv_add rbp_0 (0xfffffffffffffff8 :: [64]))
          r60 := read_mem r59 (bvle8)
          r61 := (bv_mul (0x4 :: [64]) r60)
          r62 := (bv_add r61 (0x4002ac))
          r63 := read_mem r62 (bvle4)
          r65 := (sext r63 64)
          r71 := (bv_add (0x4002ac) r65)
       in r71
        Pattern match failure in do expression at src/Data/Macaw/Discovery.hs:1179:3-25
        Return: Pattern match failure in do expression at src/Data/Macaw/Discovery.hs:1212:5-19
        Jump: Jump value r71 is not a valid address.
        Jump table: Absolute jump table: Pattern match failure in do expression at src/Data/Macaw/Discovery.hs:567:3-48
        Jump table: Relative jump table: Upper bounds failed:
      let r59 := (bv_add rbp_0 (0xfffffffffffffff8 :: [64]))
          r60 := read_mem r59 (bvle8)
       in r60
      rsp = stack_frame - 8
      rbp = rsp
      rdi in (64 0 4)
      x87top in (3 7 7)
        PLT stub: Pattern match failure in do expression at src/Data/Macaw/Discovery.hs:1280:5-24
        Tail call: Expected stack height of 0

It is actually a relative indirect jump:

  400144:       55                      push   %rbp
  400145:       48 89 e5                mov    %rsp,%rbp
  400148:       48 89 7d f8             mov    %rdi,-0x8(%rbp)
  40014c:       48 89 75 f0             mov    %rsi,-0x10(%rbp)
  400150:       48 89 55 e8             mov    %rdx,-0x18(%rbp)
  400154:       48 89 4d e0             mov    %rcx,-0x20(%rbp)
  400158:       4c 89 45 d8             mov    %r8,-0x28(%rbp)
  40015c:       4c 89 4d d0             mov    %r9,-0x30(%rbp)
  400160:       48 83 7d f8 04          cmpq   $0x4,-0x8(%rbp)
  400165:       0f 87 93 00 00 00       ja     4001fe <mod_5+0xba>
  40016b:       48 8b 45 f8             mov    -0x8(%rbp),%rax
  40016f:       48 8d 14 85 00 00 00    lea    0x0(,%rax,4),%rdx
  400176:       00
  400177:       48 8d 05 2e 01 00 00    lea    0x12e(%rip),%rax        # 4002ac <_start+0x24>
  40017e:       8b 04 02                mov    (%rdx,%rax,1),%eax
  400181:       48 63 d0                movslq %eax,%rdx
  400184:       48 8d 05 21 01 00 00    lea    0x121(%rip),%rax        # 4002ac <_start+0x24>
  40018b:       48 01 d0                add    %rdx,%rax
  40018e:       ff e0                   jmpq   *%rax

It is basically adding the address of the table to the value it reads from the table and jumping to the result. The match failure of the relative jump table case comes from the call to Jmp.unsignedUpperBound in extractJumpTableSlices.

@joehendrix would you expect this to work?

`macaw-symbolic`: Consider using `crucible`'s `TailCall` for `ParsedCall`s in tail position

Currently, the addMacawParsedTermStmt case for ParsedCall looks like this:

addMacawParsedTermStmt blockLabelMap externalResolutions thisAddr tstmt = do
archFns <- translateFns <$> get
crucGenArchConstraints archFns $ do
case tstmt of
M.ParsedCall regs mret -> do
curRegs <- createRegStruct regs
let tps = typeCtxToCrucible $ fmapFC M.typeRepr $ crucGenRegAssignment archFns
fh <- evalMacawStmt (MacawLookupFunctionHandle (crucArchRegTypes archFns) curRegs)
newRegs <- evalAtom $ CR.Call fh (Ctx.singleton curRegs) (C.StructRepr tps)
case mret of
Just nextAddr -> do
setMachineRegs newRegs
addTermStmt $ CR.Jump (parsedBlockLabel blockLabelMap nextAddr)
Nothing ->
addTermStmt $ CR.Return newRegs

Note that when mret is Nothing, we add a crucible Return statement. This is perhaps not the most optimal way to do it, however, as mret being Nothing signifies a tail call. It might be better to use crucible's TailCall here instead:

diff --git a/symbolic/src/Data/Macaw/Symbolic/CrucGen.hs b/symbolic/src/Data/Macaw/Symbolic/CrucGen.hs
index 8e4d48d6..f7f8cf51 100644
--- a/symbolic/src/Data/Macaw/Symbolic/CrucGen.hs
+++ b/symbolic/src/Data/Macaw/Symbolic/CrucGen.hs
@@ -1450,14 +1450,15 @@ addMacawParsedTermStmt blockLabelMap externalResolutions thisAddr tstmt = do
     M.ParsedCall regs mret -> do
       curRegs <- createRegStruct regs
       let tps = typeCtxToCrucible $ fmapFC M.typeRepr $ crucGenRegAssignment archFns
+          tpsRepr = C.StructRepr tps
       fh <- evalMacawStmt (MacawLookupFunctionHandle (crucArchRegTypes archFns) curRegs)
-      newRegs <- evalAtom $ CR.Call fh (Ctx.singleton curRegs) (C.StructRepr tps)
       case mret of
         Just nextAddr -> do
+          newRegs <- evalAtom $ CR.Call fh (Ctx.singleton curRegs) tpsRepr
           setMachineRegs newRegs
           addTermStmt $ CR.Jump (parsedBlockLabel blockLabelMap nextAddr)
         Nothing ->
-          addTermStmt $ CR.Return newRegs
+          addTermStmt $ CR.TailCall fh (Ctx.singleton tpsRepr) (Ctx.singleton curRegs)
     M.ParsedJump regs nextAddr -> do
       setMachineRegs =<< createRegStruct regs
       addTermStmt $ CR.Jump (parsedBlockLabel blockLabelMap nextAddr)

I haven't been able to observe a specific bug that results from not using a TailCall here, but it is likelier to be the correct thing to do.

FunctionArgs demandSet

We computing function arguments, we currently silently drop reads of register values that are supposed to be changed in a function (see calculateOnePred and transferDemands).

We should at least emit a warning when we encounter this, because it seems unexpected that a function would read a register with an undefined value.

Support binaries that invoke functions from shared libraries

We would like to have macaw be able to support binaries that invoke functions from shared libraries. This would require (but is not limited to) doing the following:

  1. Updating this case in addMacawParsedTermStmt in macaw:

    M.PLTStub{} ->
    error "Do not support translating PLT stubs"

  2. Update LookupFunctionHandle in macaw-symbolic to have some understanding of PLT entries (see https://refspecs.linuxfoundation.org/ELF/zSeries/lzsabi0_zSeries/x2251.html), as calls to shared library functions are done indirectly though the PLT.

Incorrect tail call classification

I have a case that macaw classifies as a tail call that is actually an intra-procedural jump.

The program is

#include <stdint.h>

int64_t select(int64_t a, int64_t b)
{
  int64_t res = 5;
  void* targets[] = {&&case1, &&case2, &&case3};

  int idx = 0;
  if(a > 10)
    idx = 1;
  if(a < -5)
    idx = 2;
  goto *targets[idx];

  case1:
  res = res + b;
  case2:
  res = res - b;
  case3:
  res = res * b;


  return res;
}

void _start()
{
  int64_t r = select(3, 4);
  return;
}

The machine code (compiled with clang and -O1) is:

0000000000401000 <select>:
  401000:	31 c0                	xor    %eax,%eax
  401002:	48 83 ff 0a          	cmp    $0xa,%rdi
  401006:	0f 9f c0             	setg   %al
  401009:	48 83 ff fb          	cmp    $0xfffffffffffffffb,%rdi
  40100d:	b9 02 00 00 00       	mov    $0x2,%ecx
  401012:	48 0f 4d c8          	cmovge %rax,%rcx
  401016:	ff 24 cd 00 20 40 00 	jmpq   *0x402000(,%rcx,8)
  40101d:	48 8d 46 05          	lea    0x5(%rsi),%rax
  401021:	eb 0f                	jmp    401032 <select+0x32>
  401023:	b8 05 00 00 00       	mov    $0x5,%eax
  401028:	48 0f af c6          	imul   %rsi,%rax
  40102c:	c3                   	retq
  40102d:	b8 05 00 00 00       	mov    $0x5,%eax
  401032:	48 29 f0             	sub    %rsi,%rax
  401035:	48 0f af c6          	imul   %rsi,%rax
  401039:	c3                   	retq
  40103a:	66 0f 1f 44 00 00    	nopw   0x0(%rax,%rax,1)

Macaw says that the jump at 0x401016 is a tail call. It does not identify the target blocks as additional entry points to functions. In case it is helpful, the full generated macaw IR is (entry point function omitted):

function 0x401000
0x401000:
  ;  rsp = stack_frame + 0
  # 0x0x401000 xor    eax,eax
  r29 := undef :: [bool]
  # 0x401000: {rip => 0x0x401002
              ;rax => 0x0 :: [64]
              ;cf => false
              ;pf => true
              ;af => r29
              ;zf => true
              ;sf => false
              ;of => false}
  # 0x0x401002 cmp    rdi,0xa
  r30 := (ssbb_overflows rdi_0 (0xa :: [64]) false)
  r31 := (trunc rdi_0 4)
  r32 := (bv_ult r31 (0xa :: [4]))
  r33 := (bv_ult rdi_0 (0xa :: [64]))
  r34 := (bv_add rdi_0 (0xfffffffffffffff6 :: [64]))
  r35 := (bv_slt r34 (0x0 :: [64]))
  r36 := (eq rdi_0 (0xa :: [64]))
  r37 := (trunc r34 8)
  r38 := (even_parity r37)
  # 0x401002: {rip => 0x0x401006
              ;cf => r33
              ;pf => r38
              ;af => r32
              ;zf => r36
              ;sf => r35
              ;of => r30}
  # 0x0x401006 setg   al
  r39 := (not r36)
  r40 := (eq r35 r30)
  r41 := (and r39 r40)
  r42 := (mux r41 (0x1 :: [8]) (0x0 :: [8]))
  r43 := (uext r42 64)
  # 0x401006: {rip => 0x0x401009
              ;rax => r43}
  # 0x0x401009 cmp    rdi,-0x5
  r44 := (ssbb_overflows rdi_0 (0xfffffffffffffffb :: [64]) false)
  r45 := (bv_ult r31 (0xb :: [4]))
  r46 := (bv_ult rdi_0 (0xfffffffffffffffb :: [64]))
  r47 := (bv_add rdi_0 (0x5 :: [64]))
  r48 := (bv_slt r47 (0x0 :: [64]))
  r49 := (eq rdi_0 (0xfffffffffffffffb :: [64]))
  r50 := (trunc r47 8)
  r51 := (even_parity r50)
  # 0x401009: {rip => 0x0x40100d
              ;cf => r46
              ;pf => r51
              ;af => r45
              ;zf => r49
              ;sf => r48
              ;of => r44}
  # 0x0x40100d mov    ecx,0x2
  # 0x40100d: {rip => 0x0x401012
              ;rcx => 0x2 :: [64]}
  # 0x0x401012 cmovge rcx,rax
  r52 := (eq r48 r44)
  r53 := (mux r52 r43 (0x2 :: [64]))
  # 0x401012: {rip => 0x0x401016
              ;rcx => r53}
  # 0x0x401016 jmp    QWORD PTR [rcx*80x402000]
  r54 := (bv_mul (0x8 :: [64]) r53)
  r55 := (bv_add r54 (0x402000 :: [64]))
  r56 := read_mem r55 (bvle8)
  # 0x401016: {rip => r56}
  tail_call
    { rip = r56
    , rax = r43
    , rcx = r53
    , cf = r46
    , pf = r51
    , af = r45
    , zf = r49
    , sf = r48
    , df = false
    , of = r44
    , x87top = 0x7 :: [3]
    }

Extend mkGlobalpointerValidityPred to perform bounds checks in memory initialized with newGlobalMemory

When global memory is initialized with Data.Macaw.Symbolic.Memory.newGlobalMemory, it is placed in a new block with a nonzero id. As such, the validity predicate mkGlobalPointerValidityPred returns does not generate any assertions for accesses to this global memory. This combined with the very large allocation performed by newGlobalMemory means that the resulting memory model permits wildly out of bounds accesses to global memory, as well as writes to read only memory when accessed via a global variable. We can mitigate this by adding an additional parameter to mkGlobalPointerValidityPred that takes the base pointer of the global memory region and performs additional bounds checks for pointers in that region, just as the function currently does for those in block 0.

`macaw-x86`: Handle sign-extended operands in `push`'s semantics

In macaw-x86's semantics for the push instruction, we currently have the following:

def_push :: InstructionDef
def_push =
defUnary "push" $ \_ii val -> do
Some (HasRepSize rep v) <-
case val of
F.SegmentValue _ ->
fail "push does not support segment selector."
F.WordReg r ->
Some . HasRepSize WordRepVal <$> get (reg16Loc r)
F.DWordReg r ->
Some . HasRepSize DWordRepVal <$> get (reg32Loc r)
F.QWordReg r ->
Some . HasRepSize QWordRepVal <$> get (reg64Loc r)
F.Mem16 ar ->
Some . HasRepSize WordRepVal <$> (get =<< getBV16Addr ar)
F.Mem32 ar ->
Some . HasRepSize DWordRepVal <$> (get =<< getBV32Addr ar)
F.Mem64 ar ->
Some . HasRepSize QWordRepVal <$> (get =<< getBV64Addr ar)
F.ByteImm w ->
pure $ Some $ HasRepSize ByteRepVal $ bvLit n8 (toInteger w)
F.WordImm w ->
pure $ Some $ HasRepSize WordRepVal $ bvLit n16 (toInteger w)
F.DWordImm i ->
pure $ Some $ HasRepSize DWordRepVal $ getImm32 i
F.QWordImm (F.UImm64Concrete w) ->
pure $ Some $ HasRepSize QWordRepVal $ bvLit n64 (toInteger w)
_ -> fail $ "Unexpected argument to push"

This is missing cases for sign-extended operand sizes such as ByteSignedImm. This likely does not cause any issues at the moment, but it will cause issues if GaloisInc/flexdis86#35 is implemented. We should fill out these missing cases.

macaw-ppc warning cleanup

There are a number of warnings in macaw-ppc, specifically in the PPC32 and PPC64 semantics modules. These warnings indicate that some operands to some opcodes are not being referenced, which is likely problematic. These need to be investigated and fixed.

Clean up submodules

There are some submodules under /deps and some under /submodules after the merge of macaw-semmc - we should put them all under one of the two.

Error parsing system symbol: Identifier must start with a letter.

Function x86RegName in module Data.Macaw.X86.Symbolic calls What4.Symbol.systemSymbol on a bunch of strings that begin with "!", such as "!ip".

x86RegName :: M.X86Reg tp -> C.SolverSymbol

However, function systemSymbol requires its input to start with a letter (the "!" must occur somewhere in the string, but not at the start) as explained in the source. Otherwise it calls error.

https://github.com/GaloisInc/crucible/blob/2db0648fe0c7485071f992fb4a79600789134f9b/what4/src/What4/Symbol.hs#L82

The invalid symbol names in Data.Macaw.X86.Symbolic cause the test suite for package macaw-x86-symbolic to fail with calls to error.

`macaw-symbolic`: Replace `mkName` with `safeSymbol`

macaw-symbolic currently defines a partial function named mkName:

mkName :: String -> IO SolverSymbol
mkName x = case userSymbol x of
Right v -> return v
Left err ->
fail $ unlines
[ "[bug] " ++ show x ++ " is not a valid identifier?"
, "*** " ++ show err
]

However, this doesn't need to be partial. In fact, what4 defines a total version of this function named safeSymbol. We should replace mkName with safeSymbol.

`macaw-x86`: Redundant case in `(.&.)`?

When compiling macaw-x86 with GHC 9.0.2, I discovered this warning that did not appear when compiling with earlier GHC versions:

[ 9 of 20] Compiling Data.Macaw.X86.Monad ( src/Data/Macaw/X86/Monad.hs, /home/rscott/Documents/Hacking/Haskell/macaw/dist-newstyle/build/x86_64-linux/ghc-9.0.2/macaw-x86-0.3.1/build/Data/Macaw/X86/Monad.o, /home/rscott/Documents/Hacking/Haskell/macaw/dist-newstyle/build/x86_64-linux/ghc-9.0.2/macaw-x86-0.3.1/build/Data/Macaw/X86/Monad.dyn_o )

src/Data/Macaw/X86/Monad.hs:926:5: warning: [-Woverlapping-patterns]
    Pattern match is redundant
    In an equation for ‘.&.’:
        .&. x y | isJust (asUnsignedBVLit x),
                  Just (BVOr _ y1 y2) <- asApp y,
                  isJust (asUnsignedBVLit y2) = ...
    |
926 |   | isJust (asUnsignedBVLit x)
    |     ^^^^^^^^^^^^^^^^^^^^^^^^^^

Upon further inspection, GHC is right to complain. The definition of (.&.) has these two cases, with the former appearing before the latter:

-- Make literal the second argument (simplifies later cases)
| isJust (asUnsignedBVLit x) = assert (isNothing (asUnsignedBVLit y)) $ y .&. x

-- x .&. (y1 .|. y2) = (y1 .&. x) .|. (y2 .&. x) -- Only apply when x and y2 is a lit.
| isJust (asUnsignedBVLit x)
, Just (BVOr _ y1 y2) <- asApp y
, isJust (asUnsignedBVLit y2) =
(y1 .&. x) .|. (y2 .&. x)

And indeed, there is no way for the latter case can ever be reached. I was tempted to just remove the latter case, but since someone took the time to document it, I wonder if the coexistence of these two cases in an oversight. I decided to open this issue just to make sure.

Support R_X86_64_COPY

Elf relocations of type R_X86_64_COPY are used in dynamic binaries that reference variables defined in shared libraries. We should add support for these, but it may be a bit tricky.

The variables are stored in the binary's .bss section, and so the relocation target is within the .bss section for the binary. These relocations can be used to tell us that address is used to store a given global variable, and this information may be of use in LLVM generation or other binaries transformations.

A couple references that describe this are https://www.cs.stevens.edu/~jschauma/631/elf.html and https://stac47.github.io/c/relocation/elf/tutorial/2018/03/01/understanding-relocation-elf.html.

In addition to adding support, test cases should be written to validate the understanding.

Syntax error in test suite

From a fresh checkout, just after getting the Git submodules:

dtc➜~/Projects/macaw(master)» stack test                                                                                                                                                                                                                                                                          [13:17:59]
crucible-0.4: unregistering (missing dependencies: parameterized-utils, tasty, tasty-hunit, tasty-quickcheck, what4)
crucible-llvm-0.1: unregistering (missing dependencies: crucible, parameterized-utils, what4)
elf-edit-0.30: unregistering (missing dependencies: tasty, tasty-hunit, tasty-quickcheck)
flexdis86-0.1.2: unregistering (missing dependencies: elf-edit, tasty, tasty-hunit)
macaw-base-0.3: unregistering (missing dependencies: elf-edit, parameterized-utils)
macaw-symbolic-0.0.1: unregistering (missing dependencies: crucible, crucible-llvm, macaw-base, parameterized-utils, what4)
macaw-x86-0.0.1: unregistering (missing dependencies: elf-edit, flexdis86, macaw-base, parameterized-utils, tasty, tasty-hunit)
macaw-x86-symbolic-0.0.1: unregistering (missing dependencies: crucible, crucible-llvm, elf-edit, flexdis86, macaw-base, macaw-symbolic, macaw-x86, parameterize...)
parameterized-utils-1.0.0: unregistering (missing dependencies: tasty, tasty-ant-xml, tasty-hunit, tasty-quickcheck)
what4-0.4.0: unregistering (missing dependencies: parameterized-utils)
clock-0.7.2: using precompiled package
tasty-1.0.1.1: configure
tasty-1.0.1.1: build
tasty-1.0.1.1: copy/register
tasty-ant-xml-1.1.3: configure
tasty-ant-xml-1.1.3: build
tasty-quickcheck-0.9.2: configure
tasty-quickcheck-0.9.2: build
tasty-hunit-0.10.0.1: configure
tasty-hunit-0.10.0.1: build
tasty-ant-xml-1.1.3: copy/register
tasty-quickcheck-0.9.2: copy/register
tasty-hunit-0.10.0.1: copy/register
elf-edit-0.30: configure (lib + test)
elf-edit-0.30: build (lib + test)
parameterized-utils-1.0.0: configure (lib + test)
parameterized-utils-1.0.0: build (lib + test)
parameterized-utils-1.0.0: copy/register
parameterized-utils-1.0.0: test (suite: parameterizedTests)
what4-0.4.0: configure (lib)
parameterized-utils-1.0.0: Test suite parameterizedTests passed
what4-0.4.0: build (lib)
elf-edit-0.30: copy/register
elf-edit-0.30: test (suite: test-elf)
flexdis86-0.1.2: configure (lib + exe + test)
elf-edit-0.30: Test suite test-elf passed
macaw-base-0.3: configure (lib)
flexdis86-0.1.2: build (lib + exe + test)
macaw-base-0.3: build (lib)
flexdis86-0.1.2: copy/register
flexdis86-0.1.2: test (suite: flexdis86-tests)
flexdis86-0.1.2: Test suite flexdis86-tests passed
macaw-base-0.3: copy/register
macaw-x86-0.0.1: configure (lib + test)
macaw-x86-0.0.1: build (lib + test)
what4-0.4.0: copy/register
crucible-0.4: configure (lib + test)
crucible-0.4: build (lib + test)
macaw-x86-0.0.1: copy/register
macaw-x86-0.0.1: test (suite: macaw-x86-tests)
macaw-x86-0.0.1: Test suite macaw-x86-tests passed
crucible-0.4: copy/register
crucible-0.4: test (suite: absint-tests)
crucible-llvm-0.1: configure (lib)
crucible-0.4: Test suite absint-tests passed
crucible-llvm-0.1: build (lib)
crucible-llvm-0.1: copy/register
macaw-symbolic-0.0.1: configure (lib)
macaw-symbolic-0.0.1: build (lib)
macaw-symbolic-0.0.1: copy/register
macaw-x86-symbolic-0.0.1: configure (lib + test)
macaw-x86-symbolic-0.0.1: build (lib + test)
Log files have been written to: /home/dtc/Projects/macaw/.stack-work/logs/
Progress: 20/21
--  While building custom Setup.hs for package macaw-x86-symbolic-0.0.1 using:
      /home/dtc/.stack/setup-exe-cache/x86_64-linux-tinfo6/Cabal-simple_mPHDZzAJ_2.0.1.0_ghc-8.2.2 --builddir=.stack-work/dist/x86_64-linux-tinfo6/Cabal-2.0.1.0 build lib:macaw-x86-symbolic test:macaw-x86-symbolic-tests --ghc-options " -ddump-hi -ddump-to-file -fdiagnostics-color=always"
    Process exited with code: ExitFailure 1
    Logs have been written to: /home/dtc/Projects/macaw/.stack-work/logs/macaw-x86-symbolic-0.0.1.log

    Configuring macaw-x86-symbolic-0.0.1...
    Warning: To use the 'default-language' field the package needs to specify at
    least 'cabal-version: >= 1.10'.
    Preprocessing library for macaw-x86-symbolic-0.0.1..
    Building library for macaw-x86-symbolic-0.0.1..
    [1 of 2] Compiling Data.Macaw.X86.Crucible ( src/Data/Macaw/X86/Crucible.hs, .stack-work/dist/x86_64-linux-tinfo6/Cabal-2.0.1.0/build/Data/Macaw/X86/Crucible.o )
    [2 of 2] Compiling Data.Macaw.X86.Symbolic ( src/Data/Macaw/X86/Symbolic.hs, .stack-work/dist/x86_64-linux-tinfo6/Cabal-2.0.1.0/build/Data/Macaw/X86/Symbolic.o )
    Preprocessing test suite 'macaw-x86-symbolic-tests' for macaw-x86-symbolic-0.0.1..
    Building test suite 'macaw-x86-symbolic-tests' for macaw-x86-symbolic-0.0.1..
    [1 of 1] Compiling Main             ( tests/Main.hs, .stack-work/dist/x86_64-linux-tinfo6/Cabal-2.0.1.0/build/macaw-x86-symbolic-tests/macaw-x86-symbolic-tests-tmp/Main.o )
    
    /home/dtc/Projects/macaw/x86_symbolic/tests/Main.hs:70:33: error: parse error on input ‘}’
       |
    70 |                                 }
    

`Missing function definition for: USAT*` warnings when compiling `macaw-aarch32`

When compiling macaw-aarch32, I see the following warnings (example):

[ 9 of 11] Compiling Data.Macaw.ARM.Semantics.ARMSemantics ( src/Data/Macaw/ARM/Semantics/ARMSemantics.hs, /runner/_work/macaw/macaw/dist-newstyle/build/x86_64-linux/ghc-8.6.5/macaw-aarch32-0.1.0.0/build/Data/Macaw/ARM/Semantics/ARMSemantics.o )
Missing function definition for: USAT_A1_ASR
Missing function definition for: USAT_A1_LSL
Missing function definition for: USAT16_A1
Missing function definition for: USAT_T1_ASR
Missing function definition for: USAT_T1_LSL
Missing function definition for: USAT16_T1
[10 of 11] Compiling Data.Macaw.ARM.Semantics.ThumbSemantics ( src/Data/Macaw/ARM/Semantics/ThumbSemantics.hs, /runner/_work/macaw/macaw/dist-newstyle/build/x86_64-linux/ghc-8.6.5/macaw-aarch32-0.1.0.0/build/Data/Macaw/ARM/Semantics/ThumbSemantics.o )
Missing function definition for: USAT_A1_ASR
Missing function definition for: USAT_A1_LSL
Missing function definition for: USAT16_A1
Missing function definition for: USAT_T1_ASR
Missing function definition for: USAT_T1_LSL
Missing function definition for: USAT16_T1

Is this cause for concern? If not, can these warnings be suppressed?

I'm compiling on a 64-bit Linux, if that's relevant.

Implement simulator semantics for aarch32-symbolic backend

This follows #162. That PR implements almost all of the backend except for semantics for the low-level floating point operations.

There are a number of issues implementing that right now:

  1. The semantics of those low-level operations (according to the ISA manual) need access to the FPSCR
  2. At the spot we implement the simulation-time semantics, the value of the FPSCR is not really accessible
  3. Even if it was, if an instruction changed the floating point mode and performed a floating point operation, the change would not be visible anyway

Some of the steps to fix this are:

  1. Change asl-translator to track the FPSCR (it is currently in the untracked set)
  2. We need to implement all of the primitive floating point operations (e.g., FPAdd) in ASL as overrides. Each of those operations needs to explicitly read the FPSCR and pass it to a wrapper function that takes an extra argument (e.g., FPAddFPSCR). These might need new names because there will be semmc functions with the FPAdd family of names (i.e., the ASL implementations)
  3. We need to update the signatures of the primitive AArch32 functions to include the FPSCR as an explicit argument
  4. We can then easily implement the low-level primitives with access to the correct FPSCR value

GHC 8.8 support

GHC 8.8 builds currently fail with the following errors.

Current best practice seems to be to use Lens' instead of Simple Lens, which requires an additional language extension to continue working on 8.8.

The other error is MonadFail related.

Building library for macaw-base-0.3.15..
[ 1 of 29] Compiling Data.Macaw.AbsDomain.StridedInterval ( src/Data/Macaw/AbsDomain/StridedInterval.hs, dist/build/Data/Macaw/AbsDomain/StridedInterval.o )
[ 2 of 29] Compiling Data.Macaw.DebugLogging ( src/Data/Macaw/DebugLogging.hs, dist/build/Data/Macaw/DebugLogging.o )
[ 3 of 29] Compiling Data.Macaw.Dwarf ( src/Data/Macaw/Dwarf.hs, dist/build/Data/Macaw/Dwarf.o )

src/Data/Macaw/Dwarf.hs:108:3: error:
    ‘fail’ is not a (visible) method of class ‘Monad’
    |
108 |   fail msg = WarnT $ throwError msg
    |   ^^^^
[ 4 of 29] Compiling Data.Macaw.Memory.Permissions ( src/Data/Macaw/Memory/Permissions.hs, dist/build/Data/Macaw/Memory/Permissions.o )
[ 5 of 29] Compiling Data.Macaw.Memory ( src/Data/Macaw/Memory.hs, dist/build/Data/Macaw/Memory.o )
[ 6 of 29] Compiling Data.Macaw.Memory.LoadCommon ( src/Data/Macaw/Memory/LoadCommon.hs, dist/build/Data/Macaw/Memory/LoadCommon.o )
[ 7 of 29] Compiling Data.Macaw.Memory.Symbols ( src/Data/Macaw/Memory/Symbols.hs, dist/build/Data/Macaw/Memory/Symbols.o )
[ 8 of 29] Compiling Data.Macaw.Memory.ElfLoader ( src/Data/Macaw/Memory/ElfLoader.hs, dist/build/Data/Macaw/Memory/ElfLoader.o )
[ 9 of 29] Compiling Data.Macaw.Types ( src/Data/Macaw/Types.hs, dist/build/Data/Macaw/Types.o )
[10 of 29] Compiling Data.Macaw.AbsDomain.CallParams ( src/Data/Macaw/AbsDomain/CallParams.hs, dist/build/Data/Macaw/AbsDomain/CallParams.o )
[11 of 29] Compiling Data.Macaw.Utils.Pretty ( src/Data/Macaw/Utils/Pretty.hs, dist/build/Data/Macaw/Utils/Pretty.o )
[12 of 29] Compiling Data.Macaw.CFG.App ( src/Data/Macaw/CFG/App.hs, dist/build/Data/Macaw/CFG/App.o )
[13 of 29] Compiling Data.Macaw.CFG.AssignRhs ( src/Data/Macaw/CFG/AssignRhs.hs, dist/build/Data/Macaw/CFG/AssignRhs.o )
[14 of 29] Compiling Data.Macaw.CFG.Core ( src/Data/Macaw/CFG/Core.hs, dist/build/Data/Macaw/CFG/Core.o )

src/Data/Macaw/CFG/Core.hs:568:15: error:
    • The type synonym ‘Lens’ should have 4 arguments, but has been given none
    • In the type signature:
        boundValue :: forall r f tp.
                      OrdF r => r tp -> Simple Lens (RegState r f) (f tp)
    |
568 | boundValue :: forall r f tp
    |               ^^^^^^^^^^^^^...

src/Data/Macaw/CFG/Core.hs:646:10: error:
    • The type synonym ‘Lens’ should have 4 arguments, but has been given none
    • In the type signature:
        curIP :: RegisterInfo r =>
                 Simple Lens (RegState r f) (f (BVType (RegAddrWidth r)))
    |
646 | curIP :: RegisterInfo r
    |          ^^^^^^^^^^^^^^...

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.