
Comments (5)

Qwokka commented on September 16, 2024

Hey @Riddle1001

This is always a hard problem. As you've pointed out, the usual techniques for narrowing down code typically rely on a "known" value you can search for.

In general, the best advice I can give is to work backwards from what you're trying to do. For example, if you're attempting to find the source of some value that is sent via WebSockets, you may want to start by hooking WebSocket.prototype.send() and calling console.trace() whenever it's called. The nice thing about this is that it lists not only the JavaScript call stack but the WASM call stack as well.
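A minimal sketch of the hook described above, meant to be pasted into the browser console. The names `hookSend` and `onSend` are hypothetical and not part of Cetus or any browser API; only `WebSocket.prototype.send` and `console.trace` are real.

```javascript
// Wraps ctor.prototype.send so every call is reported before the
// original send runs. Returns a function that removes the hook again.
function hookSend(ctor, onSend = () => console.trace("WebSocket.send called")) {
  const original = ctor.prototype.send;
  ctor.prototype.send = function (...args) {
    onSend.apply(this, args);          // report first, while the caller's stack is live
    return original.apply(this, args); // then forward to the real implementation
  };
  return () => {
    ctor.prototype.send = original;
  };
}

// In a browser console (assumes the page uses the global WebSocket):
//   const unhook = hookSend(WebSocket);
//   ...trigger the action in the game and read the traces...
//   unhook();
```

Taking the constructor as a parameter keeps the sketch usable with a stand-in class, and the returned function restores the original method once the traces have been collected.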

It may also be worth looking for strings or other data values that are relevant to the operation you're looking for. For example, when looking for encryption and decryption, you can more often than not find strings ("AES256", for example) that are used by the code doing the encrypting. This way, even though you're not looking for a specific "output" value, you can still use watchpoints to narrow down the code.
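As a rough sketch of that kind of search, the helper below lists every offset at which an ASCII string occurs in a byte array. `findAscii` is a hypothetical name, and the usage comment assumes you have a reference to the module's `WebAssembly.Memory` object in a variable called `memory`; how you obtain it depends on the page.

```javascript
// Returns every offset at which the ASCII string `needle` occurs in `bytes`.
function findAscii(bytes, needle) {
  const pattern = Array.from(needle, (c) => c.charCodeAt(0));
  const hits = [];
  if (pattern.length === 0) return hits;
  for (let i = 0; i + pattern.length <= bytes.length; i++) {
    let j = 0;
    while (j < pattern.length && bytes[i + j] === pattern[j]) j++;
    if (j === pattern.length) hits.push(i);
  }
  return hits;
}

// Assumed usage against linear memory (the variable `memory` is hypothetical):
//   const hits = findAscii(new Uint8Array(memory.buffer), "AES256");
```

When the input is a view over linear memory, each returned offset is a memory address, which is the kind of value a watchpoint needs.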

Even with some knowledge of which functions are relevant, it's often still a reverse engineering task. I'm not really up to date on the state of the art in WASM reverse engineering, but I do see some support for WASM in Ghidra plugins. The best advice I can give here is that the more you narrow it down to the relevant functions, the less actual reverse engineering you have to do.

Also, while I really don't mind people creating GitHub issues with questions, you (and anyone else reading this) are also welcome to email me at [email protected] with questions of this sort.

Jack

from cetus.

Riddle1001 commented on September 16, 2024

Thank you for your insightful response. I was unaware that console.trace could also record WASM calls, which is a nice piece of information to know.

I'm currently working with a game that's built in Unity and I've converted it into WebAssembly Text Format (WAT), which is proving to be challenging to comprehend. Notably, the absence of strings is confusing. As I'm relatively new to both WASM/WAT and Unity, I'm uncertain whether this lack of strings is a standard characteristic of a WAT file or merely a byproduct of Unity's efficiency techniques or obfuscation methods, or even just how WASM/WAT works.

Based on my online research specifically about WAT, it seems that strings could potentially be located within the data section, although other elements might be stored there as well. However, simply by examining the data section, it's not immediately apparent how to transform it into legible text. Below is a brief excerpt from the data section:

  (data (;15913;) (i32.const 2940276) "\80:\04")
  (data (;15914;) (i32.const 2940296) "\f1\875\00[\00\00\00P[$\00'\00\00\00\c0\5c$\00\00^$")
  (data (;15915;) (i32.const 2940348) "\c0\a5\09")
  (data (;15916;) (i32.const 2940368) "\07\885\00R\00\00\00p_$")
  (data (;15917;) (i32.const 2940388) "\c0`$")
  (data (;15918;) (i32.const 2940421) "\a5\09")
  (data (;15919;) (i32.const 2940440) "!\885\00a\00\00\00\10b$")
  (data (;15920;) (i32.const 2940460) "\a0c$")
  (data (;15921;) (i32.const 2940492) "\c03\04")

Thanks to another extension, I was able to change script files to detour WebSocket.send. This game is kinda popular, so there's a lot of data moving around. The game doesn't stop sending data even when other players aren't in sight. It's hard to understand, though, because they're not using simple JSON. Instead, they seem to be using something like enums or binary - I'm not really sure. I also don't see any function that makes the data readable, or anything inside the WebSocket's receive block that provides hints, so that is probably done server-side as well.

I'm taking a look at the Ghidra plugin now, and it seems very nice.

Thanks for your reply!


Riddle1001 commented on September 16, 2024

I also tried the console.trace suggestion, and I do see something like game.wasm:32934848, but I'm a little confused as to what the number represents. Usually it's a line number, but this number is much larger than the WAT file's line count.


Qwokka commented on September 16, 2024

> I'm currently working with a game that's built in Unity and I've converted it into WebAssembly Text Format (WAT), which is proving to be challenging to comprehend. Notably, the absence of strings is confusing. As I'm relatively new to both WASM/WAT and Unity, I'm uncertain whether this lack of strings is a standard characteristic of a WAT file or merely a byproduct of Unity's efficiency techniques or obfuscation methods, or even just how WASM/WAT works.
>
> Based on my online research specifically about WAT, it seems that strings could potentially be located within the data section

You're correct: the DATA segment of the binary contains the pre-initialized data. This typically includes constant strings, but not dynamically allocated strings. To give an example in C:

char dataString[] = "data string";

int main() {
    char stackString[] = "stack string";
}

In this example, I'd expect dataString to be in the DATA segment and stackString to be dynamically constructed by the function main. That second case is a fair amount more annoying to identify, as you can probably tell.
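Strings of the first kind can often be surfaced by scanning for runs of printable bytes, similar in spirit to the Unix `strings` utility. The sketch below does that over any `Uint8Array`, such as the contents of the .wasm file or a snapshot of linear memory. `extractStrings` and `minLength` are hypothetical names, and the scan only recognizes plain ASCII, so UTF-16 text and strings that are built at run time will not show up.

```javascript
// Collects runs of printable ASCII (0x20..0x7e) that are at least
// `minLength` bytes long, together with the offset where each run starts.
function extractStrings(bytes, minLength = 4) {
  const decoder = new TextDecoder();
  const found = [];
  let start = -1;
  // Iterate one step past the end so a run touching the last byte is flushed.
  for (let i = 0; i <= bytes.length; i++) {
    const printable = i < bytes.length && bytes[i] >= 0x20 && bytes[i] <= 0x7e;
    if (printable) {
      if (start < 0) start = i;
    } else if (start >= 0) {
      if (i - start >= minLength) {
        found.push({ offset: start, text: decoder.decode(bytes.subarray(start, i)) });
      }
      start = -1;
    }
  }
  return found;
}
```

A higher `minLength` cuts down on false positives from binary data that happens to look like text.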

Regarding the examples you gave from the data segment, these could unfortunately be a lot of things. They could be constant integers, or really anything else that you might expect to see in the .data segment of an ELF or EXE file.
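One way to get a feel for such segments is to decode the WAT string escapes back into raw bytes and then view those bytes as little-endian 32-bit integers. This is only a sketch: `decodeWatString` and `toU32LE` are hypothetical helpers, the `\u{...}` escape is not handled, and reading the bytes as 32-bit integers is just one guess at the layout.

```javascript
// Decodes the contents of a WAT string literal (without the surrounding
// quotes) into bytes. Supports \hh hex escapes and the simple escapes
// \t \n \r \" \' \\ ; anything else after a backslash is rejected.
function decodeWatString(literal) {
  const simple = { t: 0x09, n: 0x0a, r: 0x0d, '"': 0x22, "'": 0x27, "\\": 0x5c };
  const encoder = new TextEncoder();
  const bytes = [];
  for (let i = 0; i < literal.length; ) {
    if (literal[i] !== "\\") {
      // Unescaped characters are stored as their UTF-8 encoding.
      const ch = String.fromCodePoint(literal.codePointAt(i));
      bytes.push(...encoder.encode(ch));
      i += ch.length;
      continue;
    }
    const hex = literal.slice(i + 1, i + 3);
    if (/^[0-9a-fA-F]{2}$/.test(hex)) {
      bytes.push(parseInt(hex, 16));
      i += 3;
    } else if (simple[literal[i + 1]] !== undefined) {
      bytes.push(simple[literal[i + 1]]);
      i += 2;
    } else {
      throw new Error(`unsupported escape at index ${i}`);
    }
  }
  return Uint8Array.from(bytes);
}

// Interprets each complete group of 4 bytes as a little-endian u32;
// leftover bytes at the end are ignored.
function toU32LE(bytes) {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const values = [];
  for (let off = 0; off + 4 <= bytes.length; off += 4) {
    values.push(view.getUint32(off, true));
  }
  return values;
}

// String.raw keeps the backslashes exactly as they appear in the WAT file.
const segment = decodeWatString(String.raw`\07\885\00R\00\00\00p_$`);
console.log(toU32LE(segment).map((v) => "0x" + v.toString(16)));
```

If the decoded values look like addresses that fall inside the module's memory, that is a hint the segment holds pointers rather than text, but it is only a hint.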

Reverse engineering tools like Ghidra help a lot since they can often map a data address to the code that references it. When this fails, watchpoints can be helpful to figure out what a particular data address is used for.

> I also tried the console.trace suggestion, and I do see something like game.wasm:32934848, but a little confused as to what the number represents as usually it's a line number, but the number is much larger than the WAT file line count

Yeah, this is confusing. This is the offset of the instruction in the binary itself. So if you were to open game.wasm in a hex editor and go to offset 32934848, you would find the bytes that define the next instruction in the call stack. It helps to use a disassembler that lists the offset of each instruction in the disassembly. I'm not sure if wasm2wat does this, but I do know wasm-objdump from https://github.com/WebAssembly/wabt does.
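To tie such an offset back to a function without a hex editor, the sketch below walks the sections of a .wasm binary and reports which entry of the code section contains a given byte offset. `readLeb`, `findFunctionAt`, and `toHexOffset` are hypothetical helpers. The sketch assumes the offset is counted from the start of the file, and it returns the position within the code section only; the full function index also counts imported functions, which are not parsed here.

```javascript
// Reads an unsigned LEB128 integer (up to 32 bits) starting at `pos`.
function readLeb(bytes, pos) {
  let value = 0;
  let shift = 0;
  for (;;) {
    const b = bytes[pos++];
    value |= (b & 0x7f) << shift;
    if ((b & 0x80) === 0) break;
    shift += 7;
  }
  return { value: value >>> 0, next: pos };
}

// Returns the index, within the code section, of the function body that
// contains `offset`, or null if the offset lies outside every body.
function findFunctionAt(bytes, offset) {
  let pos = 8; // skip the 4-byte magic and the 4-byte version
  while (pos < bytes.length) {
    const id = bytes[pos];
    const size = readLeb(bytes, pos + 1);
    if (id === 10) { // section id 10 is the code section
      const count = readLeb(bytes, size.next);
      let entry = count.next;
      for (let i = 0; i < count.value; i++) {
        const bodySize = readLeb(bytes, entry);
        const bodyEnd = bodySize.next + bodySize.value;
        if (offset >= entry && offset < bodyEnd) return i;
        entry = bodyEnd;
      }
      return null;
    }
    pos = size.next + size.value; // jump to the next section
  }
  return null;
}

// Formats a decimal offset as hexadecimal, which is handy when the
// disassembly you are searching lists its offsets in hex.
function toHexOffset(offset) {
  return "0x" + offset.toString(16);
}
```

In Node.js the input could come from `new Uint8Array(fs.readFileSync("game.wasm"))`; the file name is taken from the trace above.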


Riddle1001 commented on September 16, 2024

Thank you for your help and recommendations. I'm going to experiment with all of this for a while and try out your suggestions.

