b2r2's People

Contributors

0xdkay, 18jminkim, a-mehdi, abbasly, aliahmed36, cheonhoojeon, cyclon2, daramg, dohki, enkomio, hestati63, jchoi2022, jidoc01, khsdo95, kimdora, ksalpha, luxroot, ly-xthunder, mickowale, sangkilc, sijung07, silee-k, soomin-kim, sugeunji, tathanhdinh

b2r2's Issues

Error in parse EAT

Describe the bug
While loading a particular file with BinHandler, the following error occurs:

C:\b2r2-dlls>ipy32 b2r2.py
./test/DeviceUxRes.dll
./test/DeviceUxRes.dll
Traceback (most recent call last):
  File "b2r2.py", line 30, in <module>
IndexError: Index was outside the bounds of the array.

To Reproduce

import clr
import os, sys

sys.path.append(os.path.abspath(r'./build_release/'))
clr.AddReferenceToFile(r'B2R2.Core.dll')
clr.AddReferenceToFile(r'B2R2.FrontEnd.Core.dll')
clr.AddReferenceToFile(r'B2R2.FrontEnd.Library.dll')

from B2R2 import *
from B2R2.FrontEnd import *


def search(dirname):
	result = []
	
	filenames = os.listdir(dirname)
	for filename in filenames:
		full_filename = os.path.join(dirname, filename)
		ext = os.path.splitext(full_filename)[-1]
		if ext == '.exe' or ext == ".dll": 
			result.append(full_filename)
	return result



result = search(r'./test/')
for name in result:
	try:
		print name
		handler = BinHandler.Init(ISA.OfString("i386"),name)
		for s in handler.FileInfo.GetSections():
				print "   %s:%s [%s]"%(hex(s.Address),hex(s.Address+s.Size),s.Name)
	except:
		print name
		raise
		#pass

If the code above worked correctly, it would print the information for each section, but instead an error occurs.

(The file that triggers the problem can be downloaded here.)

Additional context

The problem occurs in parseEAT in PEHelper:

let offset = edt.ExportAddressTableRVA |> getRawOffset headers

Here, edt.ExportAddressTableRVA is 0, and the following part of getRawOffset

let sHdr = headers.SectionHeaders.[idx]

accesses index -1, which causes the error.

As shown in the figure below, this file has an Export Directory and a Name, but all of the other values are 0. In this case, I think the fix is to skip parsing the EAT.
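A minimal Python sketch of that guard, for illustration only. `ExportDirectory`, `parse_eat`, `rva_to_offset`, and `read_u32` are hypothetical names, not B2R2's PEHelper API; the point is simply that an RVA of 0 is never handed to the RVA-to-offset translation.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ExportDirectory:
    # Hypothetical subset of the PE export directory fields used here.
    name_rva: int
    export_address_table_rva: int
    number_of_functions: int


def parse_eat(edt: ExportDirectory,
              rva_to_offset: Callable[[int], int],
              read_u32: Callable[[int], int]) -> List[int]:
    """Return the function RVAs stored in the export address table."""
    # A directory that carries only a Name (every other field zero) has no
    # table to read, so return early instead of translating RVA 0.
    if edt.export_address_table_rva == 0 or edt.number_of_functions == 0:
        return []
    base = rva_to_offset(edt.export_address_table_rva)
    # Each EAT entry is a 4-byte RVA.
    return [read_u32(base + 4 * i) for i in range(edt.number_of_functions)]
```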

Environment (please complete the following information):

  • OS: Windows 7
  • .NET Core version: 2.2.104
  • B2R2 version: latest from github

AND instruction not correctly lifted

Describe the bug
The following x86 instruction:

and eax, 0xFFFF0000

is lifted to the following statements:

T_0:I32 := (EAX & 0xFFFF0000:I32)
EAX := T_0:I32
OF := 0x0:I1
CF := 0x0:I1
AF := Undefined expression (AF is undefined.)

To Reproduce
Consider the following F# program:

let handler = 
	BinHandler.Init(
		ISA.OfString "x86", 
		ArchOperationMode.NoMode, 
		FileFormat.RawBinary, 
		Addr.MinValue, 
		[|0x25uy; 0x00uy; 0x00uy; 0xFFuy; 0xFFuy|] // and eax, 0xFFFF0000
	)

let instruction = BinHandler.ParseInstr handler 0UL
let statements = BinHandler.LiftInstr handler instruction

let statementString = LowUIR.Pp.stmtToString statements.[5]
Console.WriteLine(statementString)

It produces the following output:

AF := Undefined expression (AF is undefined.)	

Expected behavior
A correct LowUIR statement should be generated.

Environment (please complete the following information):

  • OS: Windows
  • B2R2 version: 0.1.0

Additional context
The test was done by using .NET Framework 4.7.

Does it take three to four seconds for each instruction to be decompiled? Is there any way to improve the efficiency?

I have deployed B2R2 successfully and used the following example to process instructions and output their IR.

open B2R2
open B2R2.FrontEnd
[<EntryPoint>]
let main argv =
  let isa = ISA.OfString "amd64"
  let bytes = [| 0x65uy; 0xffuy; 0x15uy; 0x10uy; 0x00uy; 0x00uy; 0x00uy |]
  let handler = BinHandler.Init (isa, bytes)
  let ins = BinHandler.ParseInstr handler 0UL
  ins.Translate handler.TranslationContext |> printfn "%A"
  0

However, I have observed that it takes around three to four seconds to process each instruction.

Considering that an average ELF file typically contains tens of thousands of instructions, and if we need to process hundreds of thousands of such ELF files, the computational time required becomes substantial.

Even with the utilization of multi-threading, it remains difficult to effectively address this issue.

Therefore, I would like to inquire if there are any recommended methods to solve this problem.

[Question] Any plan for supporting an SMT solver?

Is there any plan in the B2R2 roadmap to implement an SMT solver?

This feature would be interesting to have in order to "compete" with other binary analysis projects (like Triton, Angr, ...).

I think I found a B2R2 LowUIR bug.

Describe the bug
Hello, Professor.
I think I found a bug, so I'm reporting it here.

https://developer.arm.com/documentation/ddi0596/2021-12/Base-Instructions/BL--Branch-with-Link-

According to the ARM documentation above, the machine code 0x94000007 means a branch to the current address plus 7 * 4 = 28 (0x1C).
However, when I lift the binary to B2R2 LowUIR, I get 2EDB80: Jmp (PC + [0x2EDB9C:I64]:I64).
Since the branch should go to 0x2EDB80 + 0x1C = 0x2EDB9C, I think the following is the correct representation:
2EDB80: Jmp (PC + 0x1C:I64):I64

To Reproduce
open B2R2
open B2R2.FrontEnd

[<EntryPoint>]
let main argv =
  let address = 0x2EDB80UL
  let isa = ISA.OfString "aarch64"
  let hdl = BinHandler.Init (isa, "default_flutter_3.so")
  let ins = BinHandler.ParseInstr hdl address
  let statements = BinHandler.LiftInstr hdl ins
  // Print every lifted statement as LowUIR, prefixed with the instruction address.
  statements |> Seq.iter (fun stmt ->
    let stmtStr = B2R2.BinIR.LowUIR.Pp.stmtToString stmt
    printfn "%X: %s" address stmtStr)
  0

Below is a link to the .so file.
https://github.com/sanghwa95/tmp/blob/main/default_flutter_3.so

Environment (please complete the following information):

  • OS: Windows11
  • .NET Core version: 7.0.201
  • B2R2 version: 0.6.1

Additional context
I modified the code that prints the AST so that it prints LowUIR instead.

Updates?

Hi,

B2R2 is my preferred binary analysis framework, so I'm a bit sad to see that the last update is almost a year old. Any plans to release a new version?

Bug in GetSymbols

Describe the bug
When retrieving symbol addresses from a file with static symbols, GetSymbols returns only the offset from the section rather than the actual address.

To Reproduce
Steps to reproduce the behavior:
You can see this by running the code below.

import clr
import os, sys

sys.path.append(os.path.abspath(r'./build/'))
clr.AddReferenceToFile(r'B2R2.Core.dll')
clr.AddReferenceToFile(r'B2R2.BinFile.dll')
clr.AddReferenceToFile(r'B2R2.FrontEnd.Core.dll')
clr.AddReferenceToFile(r'B2R2.FrontEnd.Library.dll')

from B2R2 import *
from B2R2.FrontEnd import *
import time


def search(result,dirname):
	filenames = os.listdir(dirname)
	for filename in filenames:
		full_filename = os.path.join(dirname, filename)
		if os.path.isdir(full_filename):
			search(result,full_filename)
		else:
			ext = os.path.splitext(full_filename)[-1]
			if ext == '.so' or ext == ".o": 
				result.append(full_filename)
	return result
def main():
	result = search([],r'libc_test\0ab7d53ae2c1f7d4a2e3535d422ea434\a\extracted')
	#result = search([],r'libc\0ab7d53ae2c1f7d4a2e3535d422ea434\lib\aarch64-linux-gnu')
	
	NameList_bss = ['global_max_fast','save_arena'] #bss
	NameList_data = ['main_arena'] #main_arena
	
	NameList = NameList_bss + NameList_data
	
	print len(result)
	for name in result:
		handler = BinHandler.Init(ISA.DefaultISA, name)
		for s in handler.FileInfo.GetSymbols():
			if s.Kind == BinFile.SymbolKind.ObjectType:   
				for name in NameList:
					if s.Name == name:
						print name, s
			
if __name__ == "__main__":
	main()

Expected behavior
I think the addresses should be retrieved as the values below.

However, I get 0, 0, and 8 respectively, which are missing the section base.
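For context, the ELF specification defines `st_value` differently by file type: in relocatable objects (ET_REL, typically `.o` files) it is an offset into the section named by `st_shndx`, while in executables and shared objects it is already a virtual address. A minimal Python sketch of that rule, assuming the caller supplies a hypothetical `section_base` map from section index to the address where each section is placed; `Symbol` and `symbol_address` are illustrative names, not B2R2's BinFile API.

```python
from dataclasses import dataclass
from typing import Dict

ET_REL = 1              # e_type of relocatable object files
SHN_UNDEF = 0           # undefined symbol
SHN_LORESERVE = 0xFF00  # start of reserved indices (SHN_ABS, SHN_COMMON, ...)


@dataclass
class Symbol:
    name: str
    value: int   # st_value
    shndx: int   # st_shndx: index of the section that defines the symbol


def symbol_address(sym: Symbol, e_type: int, section_base: Dict[int, int]) -> int:
    # Outside relocatable files, st_value is already a virtual address.
    if e_type != ET_REL:
        return sym.value
    # Undefined, absolute, and common symbols have no section base to add.
    if sym.shndx == SHN_UNDEF or sym.shndx >= SHN_LORESERVE:
        return sym.value
    # In ET_REL files, st_value is an offset from the start of section st_shndx.
    return section_base[sym.shndx] + sym.value
```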

Environment (please complete the following information):

  • OS: Windows 7
  • .NET Core version: 2.2.204
  • B2R2 version latest from github

Additional context
The file used can be downloaded here.

Opcode and Operands accessibility

I want to write a custom pattern based on the Opcode and Operands of IntelInstruction, but they are fields of an internal type, so I cannot access them from my library. Can I access those fields? Would there be any problem if I made the InsInfo type public?

[<NoComparison; CustomEquality>]
type InsInfo = internal {
  /// Prefixes.
  Prefixes        : Prefix
  /// REX Prefix.
  REXPrefix       : REXPrefix
  /// VEX information.
  VEXInfo         : VEXInfo option
  /// Opcode.
  Opcode          : Opcode
  /// Operands.
  Operands        : Operands
  /// Instruction size information.
  InsSize         : InsSize
}

repz lifting question

Hi,

I noticed that if we consider the following instruction:

0x40102E:    F3 A4                          repz movsb

it is lifted to the following statements:

-------------ISMark (40102E, 2)-------------
if(ECX = 0x0:I32) then Jmp (Exit, 0) else Jmp (Continue, 1)
-------------LMark (Continue)-------------
[EDI] := [ESI]:I8
ESI := (ite (DF) ((ESI - 0x1:I32)) ((ESI + 0x1:I32)))
EDI := (ite (DF) ((EDI - 0x1:I32)) ((EDI + 0x1:I32)))
ECX := (ECX - 0x1:I32)
EIP := 0x40102E:I32
-------------LMark (Exit)-------------
EIP := 0x401030:I32
-------------IEMark (401030)-------------

This is conceptually correct, but a bit odd: it is as if we were executing something like

jmp $

until some condition happens (in this case ECX = 0). Would it be better to have something like this one?

-------------ISMark (40102E, 2)-------------
-------------LMark (If)-------------
if(ECX = 0x0:I32) then Jmp (Exit, 0) else Jmp (Continue, 1)
-------------LMark (Continue)-------------
[EDI] := [ESI]:I8
ESI := (ite (DF) ((ESI - 0x1:I32)) ((ESI + 0x1:I32)))
EDI := (ite (DF) ((EDI - 0x1:I32)) ((EDI + 0x1:I32)))
ECX := (ECX - 0x1:I32)
Jmp (If, _)
-------------LMark (Exit)-------------
-------------IEMark (401030)-------------
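To make the proposed shape concrete, here is a small Python model of `repz movsb` that follows the second listing: test ECX at the top of the loop, run the body, branch back, and fall through on exit. The byte-addressed `bytearray` memory and 32-bit register wrap-around are assumptions made for illustration.

```python
def rep_movsb(mem: bytearray, esi: int, edi: int, ecx: int, df: bool):
    """Copy ECX bytes from [ESI] to [EDI]; return the final (ESI, EDI, ECX)."""
    step = -1 if df else 1
    while ecx != 0:                      # LMark (If): exit once ECX reaches zero
        mem[edi] = mem[esi]              # [EDI] := [ESI]:I8
        esi = (esi + step) & 0xFFFFFFFF  # ESI moves forward or backward per DF
        edi = (edi + step) & 0xFFFFFFFF
        ecx = (ecx - 1) & 0xFFFFFFFF
    return esi, edi, ecx                 # LMark (Exit): continue at the next instruction


mem = bytearray(b"abc\x00\x00\x00")
print(rep_movsb(mem, 0, 3, 3, False), mem)  # (3, 6, 0) bytearray(b'abcabc')
```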

Incorrect disassembly

Describe the bug
Incorrect disassembly
(Although amd64 is described as an example, there are similar bugs in x86)

To Reproduce

  1. Run this code
import clr
import os, sys

clr.AddReferenceToFile(r'B2R2.Core.dll')
clr.AddReferenceToFile(r'B2R2.FrontEnd.Core.dll')
clr.AddReferenceToFile(r'B2R2.FrontEnd.Library.dll')

from B2R2 import *
from B2R2.FrontEnd import *


##Expected behavior : fnstcw word ptr [ds:rdx]
isa = ISA.OfString("amd64")
binary = ByteArray.ofHexString('d93a')
handler = BinHandler.Init(isa, binary)
ins = handler.ParseInstr(handler, 0)
print ins.Disasm() #fstcw dword ptr [rdx]

##Expected behavior : fdivr qword ptr [rdi]
isa = ISA.OfString("amd64")
binary = ByteArray.ofHexString('dc3f')
handler = BinHandler.Init(isa, binary)
ins = handler.ParseInstr(handler, 0)
print ins.Disasm() #fdivr dword ptr [rdi]

##Expected behavior : fsub qword ptr [rdi]
isa = ISA.OfString("amd64")
binary = ByteArray.ofHexString('dc27')
handler = BinHandler.Init(isa, binary)
ins = handler.ParseInstr(handler, 0)
print ins.Disasm() #fsub dword ptr [rdi]

##Expected behavior : fcomp qword ptr [r9]
isa = ISA.OfString("amd64")
binary = ByteArray.ofHexString('41dc19')
handler = BinHandler.Init(isa, binary)
ins = handler.ParseInstr(handler, 0)
print ins.Disasm() #fcomp dword ptr [r9]

##Expected behavior : out dx, al
isa = ISA.OfString("amd64")
binary = ByteArray.ofHexString('ee')
handler = BinHandler.Init(isa, binary)
ins = handler.ParseInstr(handler, 0)
print ins.Disasm() #out al, dx

##Expected behavior : in eax,dx
isa = ISA.OfString("amd64")
binary = ByteArray.ofHexString('4fed')
handler = BinHandler.Init(isa, binary)
ins = handler.ParseInstr(handler, 0)
print ins.Disasm() #in rax,dx

##Expected behavior : mov eax, ss
isa = ISA.OfString("amd64")
binary = ByteArray.ofHexString('8cd0')
handler = BinHandler.Init(isa, binary)
ins = handler.ParseInstr(handler, 0)
print ins.Disasm() #mov ax, ss

  2. Result

fstcw dword ptr [rdx]
fdivr dword ptr [rdi]
fsub dword ptr [rdi]
fcomp dword ptr [r9]
out al, dx
in rax, dx
mov ax, ss

Expected behavior

fnstcw word ptr [ds:rdx]
fdivr qword ptr [ds:rdi]
fsub qword ptr [ds:rdi]
fcomp qword ptr [ds:r9]
out dx, al
in eax, dx
mov eax, ss
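For the four x87 cases, the mnemonic and the memory operand size both come from the opcode byte together with the ModRM `reg` field (D9 /7 is FNSTCW m2byte; DC /3, /4, /7 are FCOMP, FSUB, FDIVR with m64fp operands). A minimal Python sketch limited to the encodings in this report; `decode_x87_mem` is a hypothetical helper and ignores SIB, displacements, and the `ds:` segment prefix shown in the expected output.

```python
# (opcode, ModRM.reg) -> (mnemonic, memory operand size), only for this report.
X87_MEM_FORMS = {
    (0xD9, 7): ("fnstcw", "word"),
    (0xDC, 3): ("fcomp", "qword"),
    (0xDC, 4): ("fsub", "qword"),
    (0xDC, 7): ("fdivr", "qword"),
}
REG64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi",
         "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15"]


def decode_x87_mem(code: bytes) -> str:
    i, rex_b = 0, 0
    if 0x40 <= code[0] <= 0x4F:          # optional REX prefix
        rex_b = code[0] & 1              # REX.B extends ModRM.rm
        i = 1
    opcode, modrm = code[i], code[i + 1]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 0 or rm in (4, 5):         # SIB / RIP-relative forms not handled
        raise ValueError("sketch only handles plain [reg] operands")
    mnemonic, size = X87_MEM_FORMS[(opcode, reg)]
    return f"{mnemonic} {size} ptr [{REG64[rm | (rex_b << 3)]}]"


for hexstr in ("d93a", "dc3f", "dc27", "41dc19"):
    print(decode_x87_mem(bytes.fromhex(hexstr)))
```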

Environment (please complete the following information):

  • OS: Windows 7 x64
  • .NET Core version: 2.2.104
  • B2R2 version : 0.2.0 (latest release)

Does the B2R2 support PPC32 or Sparc?

I have successfully built B2R2 and tested lifting on the amd64, ARM, and MIPS architectures. However, when I change the "isa" parameter to "ppc32" or "sparc", I get errors. Here are the examples and the error messages.

CODE:

open B2R2
open B2R2.FrontEnd

[<EntryPoint>]
let main argv =
  let isa = ISA.OfString "ppc32"
  let bytes = [| 0x48uy; 0x00uy; 0x00uy; 0x64uy|]
  let hdl = BinHandle.Init (isa, bytes)
  let ins = BinHandle.ParseInstr hdl 0UL
  ins.Translate hdl.TranslationContext |> printfn "%A"
  0

or

open B2R2
open B2R2.FrontEnd

[<EntryPoint>]
let main argv =
  let isa = ISA.OfString "sparc"
  let bytes = [| 0x90uy; 0x10uy; 0x20uy; 0x0cuy|]
  let hdl = BinHandle.Init (isa, bytes)
  let ins = BinHandle.ParseInstr hdl 0UL
  ins.Translate hdl.TranslationContext |> printfn "%A"
  0

The same error messages are shown as follows:
ERROR MESSAGE

Unhandled exception. B2R2.InvalidISAException: Exception of type 'B2R2.InvalidISAException' was thrown.
   at B2R2.ISA.OfString(String s)
   at Program.main(String[] argv) in C:\Program Files\dotnet\test1\Program.fs:line 9

I suspect that B2R2 does not support these two architectures, which is why it throws this error.
However, I noticed that the GitHub source code contains code for handling the PPC and SPARC architectures at the following paths:
src/FrontEnd/BinLifter/PPC32
src/FrontEnd/BinLifter/Sparc64

So now I don't know how to solve this problem. Can anyone please help me?

I want to lift .so file but an error occurs

Describe the bug
I want to lift a .so file extracted from an Android APK file.
But an error occurs.
Other common .so files cause the same issue.
How can I lift a .so file?

To Reproduce
Steps to reproduce the behavior:

  1. I used F# script below

open B2R2
open B2R2.FrontEnd

[<EntryPoint>]
let main argv =
  let isa = ISA.OfString "armv8a64"
  let handler = BinHandler.Init (isa, "libapp.so")
  let ins = BinHandler.ParseInstr handler 0UL
  ins.Translate handler.TranslationContext |> printfn "%A"
  0

  2. $ dotnet build
  3. $ dotnet run
  4. The build succeeds, but an error occurs as shown below.

Screenshots
[screenshot of the error]

Environment (please complete the following information):

  • OS: Windows11
  • .NET Core version: 7.0.201
  • B2R2 version 0.6.1

Disassembly of relative operands

Currently, B2R2 uses the current instruction address as base to present the relative target of a jmp or call, for example 48 31 d8 eb fd will be disassembled into

0x0        48 31 d8        xor rax, rbx
0x3        eb fd           jmp -0x1 ; 0x2

The value -0x1 means the target lies 0x1 bytes before the current instruction (0x3 - 0x1 = 0x2), which is correct.

But Intel assembly says that the offset should be computed w.r.t the address of the next instruction, so the result should be displayed as:

0x0        48 31 d8        xor rax, rbx
0x3        eb fd           jmp -0x3 ; 0x2

(since 0xfd = -0x3).

This is just a question: is this B2R2's intended default behavior? If so, it is not a bug; B2R2's assembly syntax simply differs from Intel's.
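Both displays describe the same target; they differ only in the reference point. A short Python sketch of the computation for `eb fd` at 0x3 (`rel8_target` is an illustrative helper):

```python
def rel8_target(addr: int, size: int, rel8: int) -> int:
    """Branch target: the rel8 displacement is added to the next instruction's address."""
    disp = rel8 - 0x100 if rel8 & 0x80 else rel8   # sign-extend the 8-bit displacement
    return addr + size + disp


addr, size, rel8 = 0x3, 2, 0xFD                    # eb fd
target = rel8_target(addr, size, rel8)
print(hex(target))               # 0x2
print(target - addr)             # -1: offset from the current instruction (B2R2's display)
print(target - (addr + size))    # -3: the raw displacement (Intel's reference point)
```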

Behavior change of FormatDetector.Detect()

Describe the bug
B2R2 0.1.x used to return a tuple of FileFormat and ISA from the FormatDetector.Detect() method.
However, in 0.2.x the method was changed to return only a FileFormat. Is this intentional? If so, where can I find a replacement for ISA detection?

I have also looked through the docs page. It claims that FormatDetector.Detect() returns a tuple of (B2R2.FileFormat, B2R2.ISA), but the method signature shown there returns only a FileFormat (val detect: file:string -> FileFormat).

To Reproduce
In a C# project, copy and paste this code.
Go to the NuGet package manager and downgrade/upgrade B2R2 package, and see which method is allowed per version.

// B2R2 0.1.x
(FileFormat format, ISA isa) = FormatDetector.Detect(targetFile);
// B2R2 0.3.0 - where did `ISA isa` go?
FileFormat format = FormatDetector.Detect(targetFile);

I believe it is easily reproducible in F#, too.

Expected behavior
If the change was intentional, the docs page should be corrected.
If the ISA detection was not dropped, the docs page should let users know about the alternative API.

Screenshots
With B2R2.FrontEnd 0.1.1: the first form is allowed.
[screenshot]

With B2R2.FrontEnd 0.3.0: the second form is allowed.
[screenshot]

The docs page has a contradictory description and method signature.
[screenshot]

Environment (please complete the following information):

  • OS: Linux Mint 19.2, Windows 10 v1909
  • .NET Core version: 3.0.101
  • B2R2 version 0.3.0
  • Language : C#

Additional context
I originally considered adopting LowUIR in my project, but had to drop the idea because the project is tightly integrated with VEX IR. So for now, I mainly use B2R2 as a binary format and ISA detector. Few other tools can detect this kind of binary information, apart from libmagic (whose output requires string parsing).

P.S. Thank you for making the results of the B2R2 project available for everyone to use.

EDIT: Fixed a few typos.

[Question] Question on analyzing the ARMv8 aarch32 binary

Hi, I was trying to analyze the binary of ARMv8 aarch32 using the following code.

let bytes = File.ReadAllBytes("xxx.bin")
let isa = ISA.OfString "armv8a32"
let hdl = BinHandle.Init (isa, bytes) //line 15

However, I got the following result.
[screenshot of the error]

Does the B2R2 support the analysis on ARMv8 aarch32 binary?

Incorrect disassembly

(many thanks for this new framework, I'm just discovering it)

Describe the bug
Incorrect disassembly of instruction

44 0f 20 c0    mov rax, cr8

To Reproduce

  1. Run this piece of code
open B2R2
open B2R2.FrontEnd

[<EntryPoint>]
let main argv =
  let isa = ISA.OfString "amd64"
  let bytes = [| 0x44uy; 0x0fuy; 0x20uy; 0xc0uy |]
  let handler = BinHandler.Init (isa, bytes)
  let ins = BinHandler.ParseInstr handler 0UL
  ins
  |> fun i -> i.Disasm ()
  |> printfn "%s"
  0
  2. See output
mov eax, cr0

Expected behavior

mov rax, cr8
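The `cr8` comes from REX.R: prefix 0x44 sets REX.R, which supplies bit 3 of the control-register number taken from ModRM.reg. In 64-bit mode, `MOV r64, CRn` always uses a 64-bit general-purpose register, so the destination is `rax` rather than `eax`. A minimal Python sketch of that decoding (`decode_mov_from_cr` is a hypothetical helper that handles only `[REX] 0F 20 /r`):

```python
REG64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi",
         "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15"]


def decode_mov_from_cr(code: bytes) -> str:
    """Decode [REX] 0F 20 /r (MOV r64, CRn) in 64-bit mode."""
    i, rex = 0, 0
    if 0x40 <= code[0] <= 0x4F:          # optional REX prefix
        rex, i = code[0], 1
    if code[i:i + 2] != b"\x0f\x20":
        raise ValueError("not MOV r64, CRn")
    modrm = code[i + 2]
    # The mod field is ignored: this instruction always operates on registers.
    cr = ((modrm >> 3) & 7) | (((rex >> 2) & 1) << 3)   # REX.R extends the CR number
    gpr = (modrm & 7) | ((rex & 1) << 3)                # REX.B extends the GPR
    # Operand size is fixed at 64 bits in 64-bit mode, so REX.W does not matter.
    return f"mov {REG64[gpr]}, cr{cr}"


print(decode_mov_from_cr(bytes.fromhex("440f20c0")))  # mov rax, cr8
```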

Environment (please complete the following information):

  • OS: Windows 7 x64
  • .NET Core version: 2.2.104
  • B2R2 version: latest from Github

Symbolic Execution component #question

In section VI.C of the paper there is an interesting discussion on the implementation of a Symbolic Execution prototype. Is this code available? Having something for doing symbolic execution would be very interesting (something like the Sandbox concept in Miasm).

I see an Eval module in the code, but it is marked as Obsolete in a comment. I also see two interesting interfaces, IAbstractInterpreter and DomainInterface, but no implementation is provided. Do you have any plans for implementing them? In my opinion, this is an area where we could contribute.

Thanks

[Question] Assembling LowUIR to Assembly (x86/x64/...)

Hi,

I was wondering if you have planned a feature to compile LowUIR code to Assembly (x86/x64/...). I'm aware that is a complex feature but it would be of invaluable importance for the analysis.

My use case is the creation of a deobfuscator that is able to clean a strongly obfuscated binary. An example of an obfuscated binary is the malware Emotet (https://cert.grnet.gr/en/blog/reverse-engineering-emotet/), which recently came back. I can deobfuscate the binary by modifying the LowUIR code. What is missing is the binary rebuilding phase, i.e., translating LowUIR back to binary code (x86).

At this time, most of the deobfuscators are based on LLVM, which is capable of rebuilding the binary.

pcmpistri lifting causes an infinite loop

Describe the bug
Lifting the Intel instruction PCMPISTRI causes an infinite loop.

To Reproduce
The following code triggers the infinite loop.

let handler = BinHandler.Init(isa, ArchOperationMode.NoMode, false, 0UL, [|0x066uy; 0x00fuy; 0x03auy; 0x063uy; 0x0c1uy; 0x041uy|])
let stmts = 
  match BinHandler.LiftBBlock handler 0UL with
  | Ok (stmts, addr) -> stmts
  | Error (stmts, addr) -> stmts
Console.WriteLine("This code is never reached")

Function genOutput seems to be the cause of the infinite loop.

Expected behavior
Valid LowUIR statements are generated

Environment (please complete the following information):

  • OS: Windows
  • .NET Core version: 3.1
  • B2R2 version 0.3.1

Additional context
The Intel instruction used for the test is: pcmpistri xmm0, xmm1, 41h

[BinHandler] Error in section reading

Describe the bug
I have a PE with the following sections:

Sec name | Virtual Size | Virtual Address | Raw Size | Raw Address
.text | 0x12A56 | 0x401000 | 0x12C00 | 0x400
.data | 0x1278 | 0x414000 | 0x200 | 0x13000
.reloc | 0x980 | 0x416000 | 0xA00 | 0x13200

If I try to read the content of the data section I receive the following error:

System.ArgumentException: 'Invalid address or size is given.
Parameter name: ReadBytes'

To Reproduce
Steps to reproduce the behavior:

  1. Consider the following source code:
        handler.FileInfo.GetSections()
        |> Seq.iter(fun section ->
            Console.Write("Read all data from section: {0} => ", section.Name)
            handler.ReadBytes(section.Address, int32 section.Size) |> ignore
            Console.WriteLine("DONE")
        )
  2. See error
Read all data from section: .text => DONE
Read all data from section: .data =>

System.ArgumentException: 'Invalid address or size is given.
Parameter name: ReadBytes'

Expected behavior
The full content of the section is read.

Environment (please complete the following information):

  • OS: Windows
  • B2R2 version 0.2.0

Additional context
This problem seems to be caused by a range-validity check that was not present in previous versions.
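The PE format specifies that when a section's VirtualSize exceeds its SizeOfRawData, the rest of the section is zero-filled in memory; the .data section here (0x1278 virtual vs. 0x200 raw) is such a case. A minimal Python sketch of a read that honors this; `read_section` is a hypothetical helper over the raw file image, not B2R2's `ReadBytes`.

```python
def read_section(image: bytes, raw_addr: int, raw_size: int, virtual_size: int) -> bytes:
    """Return the in-memory contents of a PE section (VirtualSize bytes)."""
    # Only min(raw_size, virtual_size) bytes are backed by the file; raw data
    # beyond VirtualSize is just file-alignment padding.
    backed = image[raw_addr:raw_addr + min(raw_size, virtual_size)]
    # Everything past the file-backed part is zero-initialized.
    return backed + bytes(virtual_size - len(backed))


image = b"A" * 0x13200                       # stand-in file image
data = read_section(image, 0x13000, 0x200, 0x1278)
print(len(data) == 0x1278, data[:0x200] == b"A" * 0x200)  # True True
```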

Improve function detection algorithm

Hi,
I use this awesome framework regularly, and I would like to suggest a possible improvement to the function identification algorithm. In particular, I see that the code is not able to identify functions that are reached only through indirect calls (I read the code of the function foldStmts in BinCorpus).

I came across a malware sample that uses static objects. In C++, static object constructors are called from the _initterm function. This function walks a table of function pointers used to initialize the static objects. Since this table is in the .rdata section, the calls are indirect. Here is an excerpt of the code that calls every function in the table:

.text:004217B1                 mov     ecx, eax
.text:004217B3                 call    ds:___guard_check_icall_fptr
.text:004217B9                 call    [ebp+function_pointer]

A possible solution would be to apply the algorithm described in https://mistakenot.net/papers/eurosp-2017.pdf, in particular section 3.2.2 Unreachable/Indirectly Called Functions.

An example of this malware is: 8bf390bbf31d99d50d719ea8f413338ce5a1e1b3dac4ec58468a707cb7b8e914 (https://www.virustotal.com/gui/file/8bf390bbf31d99d50d719ea8f413338ce5a1e1b3dac4ec58468a707cb7b8e914/detection).
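As a much simpler illustration (not the algorithm from the cited paper), a common heuristic is to scan data sections such as .rdata for aligned pointer-sized values that fall inside the code section and treat them as candidate function entries, to be validated afterwards (e.g. by disassembling from each target). A Python sketch assuming a 32-bit PE; `code_pointers` and all addresses below are hypothetical.

```python
import struct


def code_pointers(data: bytes, data_va: int, text_start: int, text_end: int):
    """Yield (slot_va, target) for each aligned 4-byte value pointing into [text_start, text_end)."""
    for off in range(0, len(data) - 3, 4):
        (value,) = struct.unpack_from("<I", data, off)   # little-endian 32-bit pointer
        if text_start <= value < text_end:
            yield data_va + off, value


rdata = struct.pack("<III", 0x401000, 0x12345678, 0x4017B1)
print([(hex(a), hex(t)) for a, t in code_pointers(rdata, 0x41A000, 0x401000, 0x413A56)])
```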

Relative calls lifting

Describe the bug
The LowIR of relative calls may not be correct.

To Reproduce
The instruction

e8 e9 fa ff ff        call -0x517

at the concrete address 0 is lifted into

-------------ISMark (0, 5)-------------
T_0:I64 := 0xFFFFFFFFFFFFFAEE:I64
T_1:I64 := 0x5:I64
RSP := (RSP - 0x8:I64)
[RSP] := T_1:I64
RIP := T_0:I64
-------------IEMark (5)-------------

Expected behavior
The instruction is a relative call, so the target PC should be relative to the next instruction, i.e. RIP := RIP - 0x517 + 5 (where 5 is the size of the instruction). Moreover, the value being pushed into the stack should be calculated relatively with RIP, i.e. [RSP] = RIP + 5.
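The concrete values in the listing follow from the same formula the issue states: the rel32 displacement is added to the address of the next instruction, and that next address is also the pushed return address. A Python sketch (`call_rel32` is illustrative) that reproduces T_0 and T_1 for address 0; a symbolic lifting would simply keep `addr` as RIP in both expressions.

```python
MASK64 = (1 << 64) - 1


def call_rel32(addr: int, code: bytes):
    """Return (target, return_address) for `E8 rel32` located at addr."""
    if len(code) < 5 or code[0] != 0xE8:
        raise ValueError("not a CALL rel32")
    rel32 = int.from_bytes(code[1:5], "little", signed=True)
    next_ip = addr + 5                       # the call instruction is 5 bytes long
    target = (next_ip + rel32) & MASK64      # displacement is relative to next_ip
    return target, next_ip & MASK64          # next_ip is also the value pushed on the stack


target, ret = call_rel32(0, bytes.fromhex("e8e9faffff"))
print(hex(target), hex(ret))  # 0xfffffffffffffaee 0x5
```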

Environment:

  • Windows 7 64 bit
  • .NET Core version: 2.2.200
  • B2R2 version: master:HEAD

Additional context
PR #5 is a dirty fix where B2R2 lifts this instruction to:

-------------ISMark (0, 5)-------------
T_0:I64 := (RIP + 0xFFFFFFFFFFFFFAF3:I64)
T_1:I64 := (RIP + 0x5:I64)
RSP := (RSP - 0x8:I64)
[RSP] := T_1:I64
RIP := T_0:I64
-------------IEMark (5)-------------

Thoughts
B2R2 currently computes the target directly from the "concrete" address of the instruction and its containing basic block, so there is no need for a symbolic PC.

For this call instruction, B2R2 pushes the concrete return address onto the stack (e.g. [RSP] := 0x5) and also computes the next concrete PC.

PR #5 makes B2R2 symbolize the PC, giving [RSP] := PC + 0x5. I think this would be the "more expected" behavior in situations where we need a symbolic PC (e.g. when we need to formally prove something that must hold regardless of the value of PC).

PS. This issue is just my personal point of view and may not be a bug, but I cannot set the label myself, sorry.

Error in parse export for DLL profapi.dll

Describe the bug
When a BinHandler is created, the code that parses the export table raises an exception. The problematic file is C:\Windows\SysWOW64\profapi.dll (MD5: 2545F5AA189DF907BC4A6E000E231A6D).

To Reproduce
Steps to reproduce the behavior:

  1. Run code: BinHandler.Init(ISA.OfString "x86", @"C:\Windows\SysWOW64\profapi.dll")
  2. See error in PEHelper.getRawOffset, exception message: System.IndexOutOfRangeException: 'Index was outside the bounds of the array.'

Environment (please complete the following information):

  • OS: Windows
  • B2R2 version: 0.1.1 (2019-03-22)

Additional context
The DLL contains an uncommon export directory table; one example export is:

Ordinal    Function RVA   Name Ordinal   Name RVA   Name
00000066   00004560       N/A            N/A        N/A

This will raise an exception at the following code:
let sHdr = headers.SectionHeaders.[idx]
where idx has value -1.

Error in getRawOffset

Describe the bug
getRawOffset does not work correctly while parsing a particular file.

Traceback (most recent call last):
  File "b2r2.py", line 31, in <module>
IndexError: Index was outside the bounds of the array.

To Reproduce

The code is the same as in my previous issue; only the file is different.

Download

Environment (please complete the following information):

  • OS: Windows7
  • .NET Core version: 2.2.204
  • B2R2 version: latest from github

Additional context
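This bug stems less from B2R2 itself than from a quirk of the PE format (the VirtualSize of each section is not used when a PE file is loaded, so the file runs fine even when VirtualSize has an odd value like this) and from .NET Core's implementation of GetContainingSectionIndex.

The image above shows .NET Core's GetContainingSectionIndex implementation. This file has the odd property that the VA of every section is 0, so the method returns -1 and B2R2 does not work correctly.

Since this can affect B2R2 users, how about implementing the lookup in B2R2 itself, by checking whether the RVA to be converted lies between the VirtualAddress of the current section (i) and that of the next section (i+1)?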
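A Python sketch of the proposed lookup, under the assumption that VirtualSize cannot be trusted: section i is taken to cover [VirtualAddress_i, VirtualAddress_i+1), and only the last section falls back to max(VirtualSize, SizeOfRawData). `SectionHeader`, `containing_section`, and `rva_to_offset` are hypothetical names; the key difference from indexing with -1 is that a miss is reported explicitly.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SectionHeader:
    virtual_address: int
    virtual_size: int
    raw_size: int
    raw_pointer: int


def containing_section(sections: List[SectionHeader], rva: int) -> Optional[int]:
    """Return the index of the section containing rva, or None."""
    order = sorted(range(len(sections)), key=lambda i: sections[i].virtual_address)
    for pos, i in enumerate(order):
        start = sections[i].virtual_address
        if pos + 1 < len(order):
            end = sections[order[pos + 1]].virtual_address   # next section's VA
        else:
            end = start + max(sections[i].virtual_size, sections[i].raw_size)
        if start <= rva < end:
            return i
    return None


def rva_to_offset(sections: List[SectionHeader], rva: int) -> int:
    idx = containing_section(sections, rva)
    if idx is None:
        raise ValueError(f"RVA {rva:#x} is not inside any section")
    s = sections[idx]
    return rva - s.virtual_address + s.raw_pointer
```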

How to update the binary handler content?

Hi,

I'm creating a simple emulator for LowUIR and during my test I need to modify the content of the Binary Handler (as in a self modifying program).

As far as I understood the only way in doing this is by using the BinHandler.UpdateCode function, that return a new handler. The function accepts an array of bytes that represent the full content of the new handler. This is a bit annoying if I have to modify just a couple of bytes (but I understand that you want to maintain BinHandler immutable).

My current strategy is:

  • Convert the VA to a raw offset inside the buffer
  • Read the handler buffer and modify the given bytes at the computed offset
  • Create a handler with the updated buffer

Is there any other easier way of doing this or what I described is the proper way?
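The strategy above amounts to a copy-on-write patch of the buffer. A minimal Python sketch, assuming a flat buffer mapped at `base_va` (as with a raw binary); for PE or ELF files the VA-to-offset step would go through the section table instead. `patch` is a hypothetical helper, and its result is what would then be passed to BinHandler.UpdateCode.

```python
def patch(buffer: bytes, base_va: int, va: int, new_bytes: bytes) -> bytes:
    """Return a copy of buffer with new_bytes written at virtual address va."""
    offset = va - base_va                       # VA -> offset in a flat mapping
    if offset < 0 or offset + len(new_bytes) > len(buffer):
        raise ValueError("patch lies outside the buffer")
    out = bytearray(buffer)                     # the original buffer stays untouched
    out[offset:offset + len(new_bytes)] = new_bytes
    return bytes(out)


code = b"\x90" * 4
print(patch(code, 0x1000, 0x1001, b"\xcc\xcc").hex(), code.hex())  # 90cccc90 90909090
```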

ARMv7: incorrect disassembly of VCVT

Describe the bug
The sequence of bytes "E00AB7EE" (little-endian) should decode to

"vcvt.f64.f32 d0, s1"

But the ARM32 decoder will decode to

"vcvt.f64.f32 s0, s1"

It seems that the helper "getRegAL" is incorrect for VCVT.

To Reproduce

Try to decode "E00AB7EE" for ARMv7.

Expected behavior

The sequence of bytes should decode to

"vcvt.f64.f32 d0, s1".
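The expected result follows from the field layout of VCVT between double and single precision (encoding A1: cond | 11101 D 110111 Vd 101 sz 1 1 M 0 Vm). With sz = 0 the destination is a double register numbered D:Vd and the source a single register numbered Vm:M; with sz = 1 the roles swap (Vd:D and M:Vm). A Python sketch of that decoding, written from the Arm reference manual's description and not taken from B2R2's getRegAL:

```python
def decode_vcvt_ds(raw: bytes) -> str:
    """Decode VCVT between double and single precision (A1 encoding)."""
    word = int.from_bytes(raw, "little")
    # Fixed bits: 27..23 = 11101, 21..16 = 110111, 11..9 = 101, 7..6 = 11, 4 = 0.
    if (word & 0x0FBF0ED0) != 0x0EB70AC0:
        raise ValueError("not VCVT between double and single precision")
    d = (word >> 22) & 1
    vd = (word >> 12) & 0xF
    sz = (word >> 8) & 1
    m = (word >> 5) & 1
    vm = word & 0xF
    if sz == 0:   # F64.F32: double destination D:Vd, single source Vm:M
        return f"vcvt.f64.f32 d{(d << 4) | vd}, s{(vm << 1) | m}"
    # F32.F64: single destination Vd:D, double source M:Vm
    return f"vcvt.f32.f64 s{(vd << 1) | d}, d{(m << 4) | vm}"


print(decode_vcvt_ds(bytes.fromhex("E00AB7EE")))  # vcvt.f64.f32 d0, s1
```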

Error when import function does not exist in wasm binary

Hello, I'm looking at the wasm binary parsing module, and there seems to be a bug.

In WebAssembly, imported functions are optional.

You can see an example. This example is in the WebAssembly text format, and it can be converted directly to a wasm binary using wat2wasm.

However, please look at the code. If there are no imported functions, the lengths of the two arrays passed to Array.map2 differ.

// src/FrontEnd/BinFile/WasmParser.fs

  let importedFuncs, impSecOff =
    match wm.ImportSection with
    | Some sec ->
      match sec.Contents with
      | Some conts ->
        conts.Elements
        |> Array.filter (fun ie ->
          match ie.Desc with
          | ImpFunc _ -> true
          | _ -> false
        ), sec.Offset
      | None -> [||], 0
    | None -> [||], 0 // importedFuncs can be [||].
  let lastIdx =
    let len = Array.length importedFuncs
    if len = 0 then 0u // lastIdx can be 0u.
    else uint32 (len - 1)
  let impFuncsIdxMap = // And here is a bug!
    importedFuncs
    |> Array.map2 (fun idx ifun ->
      makeFuncIdxInfo impSecOff idx ifun.Offset) [| 0u .. lastIdx |]

When importedFuncs is [||], lastIdx is 0u. And in impFuncsIdxMap, [| 0u .. lastIdx |] will be [| 0u |]. So there is an error like

Error: The arrays have different lengths.
array1.Length = 1, array2.Length = 0 (Parameter 'array1')

I think we can fix like below.

  let impFuncsIdxMap =
    if Array.isEmpty importedFuncs then [||]
    else
      importedFuncs
      |> Array.map2 (fun idx ifun ->
        makeFuncIdxInfo impSecOff idx ifun.Offset) [| 0u .. lastIdx |]
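The mismatch comes from deriving the index range from `len - 1`, which cannot express an empty range. Pairing each element with its own index avoids the special case altogether (in F#, `Array.mapi` plays this role). A Python sketch of both behaviors; `make_info` stands in for `makeFuncIdxInfo` and is hypothetical.

```python
from typing import Callable, List, Tuple


def buggy_index_range(n: int) -> List[int]:
    # Mirrors [| 0u .. lastIdx |] with lastIdx = 0u when there are no imports.
    last_idx = 0 if n == 0 else n - 1
    return list(range(0, last_idx + 1))


def map_imported_funcs(offsets: List[int], imp_sec_off: int,
                       make_info: Callable[[int, int, int], Tuple]) -> List[Tuple]:
    # enumerate yields exactly one index per element, so an empty import list
    # yields an empty map instead of pairing [0] with [].
    return [make_info(imp_sec_off, idx, off) for idx, off in enumerate(offsets)]


print(buggy_index_range(0))                              # [0]
print(map_imported_funcs([], 0, lambda s, i, o: (s, i, o)))  # []
```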

x86/64 lifting loop instruction

Describe the bug
The loop instruction contains unnecessary internal (intra) jumps as discussed in #15.

To Reproduce

0x19146D:    E2 FB                          loop -0x3 ; 0x19146A

By lifting the above instruction, we get:

-------------ISMark (19146D, 2)-------------
T_710:I32 := ECX
-------------LMark (Loop)-------------
T_710:I32 := (T_710:I32 - 0x1:I32)
if(T_710:I32 != 0x0:I32) then Jmp (Continue, 127) else Jmp (End, 128)
-------------LMark (Continue)-------------
EIP := (EIP + 0x19146A:I32)
Jmp (Loop, 126)
-------------LMark (End)-------------
-------------IEMark (19146F)-------------

Expected behavior
The loop instruction in x86 should be handled the same way as the jcc instructions, but currently loop is treated in a special way.
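LOOP decrements ECX first (without changing flags) and branches when the result is nonzero, so one conditional branch with no intra-instruction label is enough, just like jcc. A Python sketch of that semantics for the instruction above (`loop_rel8` is illustrative; 32-bit mode assumed):

```python
MASK32 = (1 << 32) - 1


def loop_rel8(addr: int, size: int, rel8: int, ecx: int):
    """Return (new_ecx, next_eip) after executing LOOP rel8 at addr."""
    ecx = (ecx - 1) & MASK32                     # decrement first; ECX = 0 wraps around
    next_ip = (addr + size) & MASK32
    disp = rel8 - 0x100 if rel8 & 0x80 else rel8
    target = (next_ip + disp) & MASK32           # relative to the next instruction
    return ecx, (target if ecx != 0 else next_ip)


print([hex(v) for v in loop_rel8(0x19146D, 2, 0xFB, 2)])  # ['0x1', '0x19146a']
```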

Additional context

See #15 for more discussion.

Improve Segmentation Semantics for x86

Describe the bug
LowIR for instructions accessing segmented memory is "probably" incorrect, e.g.

2e 8b 02    mov eax, cs:[edx]

To Reproduce

  1. Run this piece of code
open B2R2
open B2R2.FrontEnd

[<EntryPoint>]
let main argv =
  let isa = ISA.OfString "x86"
  let bytes = [| 0x2euy; 0x8buy; 0x02uy; |]
  let handler = BinHandler.Init (isa, bytes)
  let ins = BinHandler.ParseInstr handler 0UL
  ins
  |> fun i -> i.Translate handler.TranslationContext
  |> printfn "%A"
  0
  2. See output
[|ISMark (0UL,3u);
  Put
    (Var (32,8,"EAX",IntelRegisterSet<1, 0, 0, 0>),
     Load
       (Little,32,
        BinOp
          (ADD,32,Var (32,1793,"CSBase",IntelRegisterSet<8000000, 0, 0, 0>),
           Var (32,11,"EDX",IntelRegisterSet<8, 0, 0, 0>),
           {HasLoad = false;
            VarInfo = IntelRegisterSet<8000008, 0, 0, 0>;
            TempVarInfo = set [];},None),
        {HasLoad = true;
         VarInfo = IntelRegisterSet<8000008, 0, 0, 0>;
         TempVarInfo = set [];},None)); IEMark 3UL|]

Expected behavior
In Intel's segmented memory model, the semantics of cs:[edx] is not simply a load from [cs + edx].
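A rough Python sketch of what a more faithful model would have to capture, based on the general Intel SDM rules: in 32-bit protected mode the offset is checked against the segment limit and then added to the segment base; in 64-bit mode the CS, DS, ES, and SS bases are treated as zero and limits are not checked, while FS and GS keep their bases. Type, privilege, and expand-down checks (e.g. CS must be a readable code segment) are deliberately left out, and `linear_address` and `SegmentFault` are hypothetical names.

```python
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1
FLAT_IN_LONG_MODE = {"cs", "ds", "es", "ss"}


class SegmentFault(Exception):
    """Stands in for the #GP/#SS fault raised on a limit violation."""


def linear_address(seg: str, base: int, limit: int, offset: int, size: int,
                   long_mode: bool = False) -> int:
    if long_mode:
        # 64-bit mode: CS/DS/ES/SS bases are treated as 0 and limits are not
        # checked; FS and GS still contribute their base.
        seg_base = 0 if seg in FLAT_IN_LONG_MODE else base
        return (seg_base + offset) & MASK64
    # 32-bit protected mode, expand-up segment: every accessed byte must lie
    # within [0, limit].
    if offset + size - 1 > limit:
        raise SegmentFault(f"{seg}:{offset:#x} exceeds limit {limit:#x}")
    return (base + offset) & MASK32
```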

Additional context
I'm reading the paper, but it does not discuss how memory is modeled.

InterJmp pretty-printing

Describe the bug
Currently there's no difference between Put to a PC register and an InterJmp statement. That is, both show the same pp: EIP := .... We should clearly distinguish two statements.

Additional context
See discussion in #15.
