simonlindholm / asm-differ
Assembly diff script
License: The Unlicense
When running diff.py with Python 3.12.1, I get a number of warnings, and it is no longer able to correctly parse the map file.
/diff.py CARDRenameAsync
/home/cameron/Sources/smb-decomp/./diff.py:1240: SyntaxWarning: invalid escape sequence '\S'
+ "(\S+)",
/home/cameron/Sources/smb-decomp/./diff.py:3200: SyntaxWarning: invalid escape sequence '\.'
if source_line and re.fullmatch(".*\.c(?:pp)?:\d+", source_line):
/home/cameron/Sources/smb-decomp/./diff.py:1024: DeprecationWarning: ast.Num is deprecated and will be removed in Python 3.14; use ast.Constant instead
if isinstance(node, ast.Num): # <number>
/home/cameron/Sources/smb-decomp/./diff.py:1025: DeprecationWarning: Attribute n is deprecated and will be removed in Python 3.14; use value instead
return node.n
Not able to find function in map file.
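Both kinds of warning have well-known causes: unrecognized backslash escapes in ordinary string literals raise SyntaxWarning in Python 3.12, and ast.Num and its n attribute are deprecated in favor of ast.Constant and value. A minimal sketch of the kind of change that silences them, assuming patterns like the ones quoted above (the helper name eval_number is hypothetical and not taken from diff.py). Whether this also resolves the map file failure is not established by the report.

```python
import ast
import re
from typing import Union

# A raw string keeps the backslashes intact, so "\." and "\d" reach the regex
# engine unchanged and Python emits no SyntaxWarning.
SOURCE_LINE_RE = re.compile(r".*\.c(?:pp)?:\d+")


def eval_number(node: ast.AST) -> Union[int, float]:
    """Return the value of a numeric literal node (hypothetical helper)."""
    # ast.Constant replaces the deprecated ast.Num; .value replaces .n.
    # The exact type check excludes bool, which is a subclass of int.
    if isinstance(node, ast.Constant) and type(node.value) in (int, float):
        return node.value
    raise ValueError("expected a numeric literal")
```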
(Source: https://decomp.me/scratch/xtJVv)
The target has 0x1, but the current version of the assembly displays 1. I'd expect it to not count as an actual difference.
Originally reported as decompme/decomp.me#816
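One way to treat 0x1 and 1 as equal is to compare operands by numeric value whenever both sides are plain integer literals. The following is only a sketch under that assumption; parse_imm and operands_equal are hypothetical names, not functions from diff.py.

```python
import re
from typing import Optional

# A whole operand that is a hex or decimal literal with an optional sign.
_IMM_RE = re.compile(r"([+-]?)(0[xX][0-9a-fA-F]+|[0-9]+)")


def parse_imm(token: str) -> Optional[int]:
    """Return the integer value of a plain literal, or None if it is not one."""
    m = _IMM_RE.fullmatch(token.strip())
    if m is None:
        return None
    sign, digits = m.groups()
    base = 16 if digits[:2].lower() == "0x" else 10
    value = int(digits, base)
    return -value if sign == "-" else value


def operands_equal(a: str, b: str) -> bool:
    """Compare numerically when both operands are literals, else textually."""
    va, vb = parse_imm(a), parse_imm(b)
    if va is not None and vb is not None:
        return va == vb
    return a.strip() == b.strip()
```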
<skipped> is confusing
if both LHS and RHS match the pattern
branchlikely .target
x
x
.target:
it's worth not highlighting diffs for the first x, because it's just an artifact of an assembler optimization; the real diff is at the second x. We used to do this unconditionally, because IDO only ever emits branch likelies of this form, but with GCC it needs actual pattern matching.
It sometimes lines blocks up against each other that are quite far away. It feels like it might be operating in a divide-and-conquer fashion that doesn't work well for asm diffing. Something like Levenshtein distance might work better.
Using jal's as markers to line things up against might work, but is kinda specific.
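For illustration, a Levenshtein-style alignment over instruction mnemonics could look like the sketch below. It is a plain textbook dynamic program with unit costs, written under the assumption that each line has already been reduced to one token. It does not describe how diff.py aligns lines, and the function name align is hypothetical.

```python
from typing import List, Optional, Tuple

Pair = Tuple[Optional[str], Optional[str]]


def align(a: List[str], b: List[str]) -> List[Pair]:
    """Align two token sequences with minimum edit distance (unit costs)."""
    n, m = len(a), len(b)
    # dist[i][j] is the edit distance between a[:i] and b[:j].
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i
    for j in range(1, m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # a[i-1] has no partner
                dist[i][j - 1] + 1,         # b[j-1] has no partner
                dist[i - 1][j - 1] + cost,  # pair them up
            )
    # Walk back from the end to recover one optimal alignment.
    pairs: List[Pair] = []
    i, j = n, m
    while i > 0 or j > 0:
        cost = 0 if i > 0 and j > 0 and a[i - 1] == b[j - 1] else 1
        if i > 0 and j > 0 and dist[i][j] == dist[i - 1][j - 1] + cost:
            pairs.append((a[i - 1], b[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            pairs.append((a[i - 1], None))
            i -= 1
        else:
            pairs.append((None, b[j - 1]))
            j -= 1
    pairs.reverse()
    return pairs
```

Giving matching jal lines a lower cost than other matches would be one way to fold the marker idea into the same table, though that weighting is untested here.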
https://decomp.me/scratch/3ZUPF
Technically there are deletions here, but imo it's safe to ignore extra nops for scoring purposes
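A rough sketch of what ignoring extra nops for scoring could mean, assuming each side is a list of mnemonics and using difflib from the standard library. The penalty of one point per differing line is an arbitrary choice for the example, and the function name score is hypothetical. Dropping every nop also hides nops that matter, so a real implementation would likely need to be more selective.

```python
import difflib
from typing import List


def score(target: List[str], current: List[str], ignore_nops: bool = True) -> int:
    """Count differing lines between two mnemonic lists (0 means a match)."""
    if ignore_nops:
        target = [m for m in target if m != "nop"]
        current = [m for m in current if m != "nop"]
    matcher = difflib.SequenceMatcher(a=target, b=current, autojunk=False)
    penalty = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # Charge for the longer side of each differing block.
            penalty += max(i2 - i1, j2 - j1)
    return penalty
```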
It's much easier to package/work with the software if this file is committed rather than installing somewhat random version of dependencies (within version bounds). I can of course generate and keep one on my side but if there's a known-working blessed one already used by the developer that basically always works, that'd be best! Thank you.
https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
My environment sets the LESS variable to LESS=-RXF.
One particular component of this, -F or --quit-if-one-screen, causes less to exit when it's not necessary to page things. I find this very useful and would like to keep it this way.
However, it completely breaks the functionality of diff.py -w when enabled, causing it to always exit.
Considering asm-differ is using less in this rather unconventional manner, I think it's worth fixing in asm-differ. One simple solution is to unset the LESS environment variable from within asm-differ; another is to look for an option in the less man page that disables the -F behavior.
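The first suggestion can be sketched as follows: copy the environment, drop LESS, and hand the copy to the pager process. The function names and the choice of flags passed to less are assumptions for the example, not what diff.py actually does.

```python
import os
import subprocess
from typing import Dict, Mapping, Optional


def pager_env(base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the environment without the user's LESS options."""
    env = dict(os.environ if base is None else base)
    env.pop("LESS", None)  # no error if the variable was never set
    return env


def run_pager(text: str) -> None:
    """Pipe text into less with only explicitly chosen flags (assumed: -R)."""
    proc = subprocess.Popen(["less", "-R"], stdin=subprocess.PIPE, env=pager_env())
    proc.communicate(text.encode())
```

Copying the environment rather than editing os.environ keeps the user's setting intact for any other child process.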
See the commit message of c1acb7a. These occur in branches and jal's in MIPS objdump output, which we probably need to special-case (?).
I'm wanting to add PS3 support to decomp.me, but the maintainers said I need to check if asm-differ supports it. I tried to figure it out myself but don't know how to use it correctly.
here's the command I used:
python3 ../asm-differ/diff.py -f EBOOT.asm start
output:
Traceback (most recent call last):
  File "C:\cygwin\home\693982\asm-differ\diff.py", line 3664, in <module>
    main()
  File "C:\cygwin\home\693982\asm-differ\diff.py", line 3551, in main
    diff_settings.apply(settings, args) # type: ignore
    ^^^^^^^^^^^^^^^^^^^
AttributeError: module 'diff_settings' has no attribute 'apply'
Dunno if I'm using it wrong, or if asm-differ doesn't support the instruction set. If it doesn't, I would like to know how to add support.
Some options such as stack difference sensitivity would also be useful here
I.e. compare mtime. The current message is potentially misleading and can hide build system brokenness.
(we could alternatively make -B
I guess)
.o files contain information about exported symbols, so why not.
Allow accepting only one input and just displaying it back with no diff calculation
The following asm snippet seems to cause an exception when used within decomp.me, where I'm trying to implement MSVC support. When the function is changed to no longer be part of a class, the exception disappears and the diff works:
i386-pc-msdosdjgpp-objdump --disassemble --disassemble-zeroes --line-numbers --start-address=0 -m i386 --no-show-raw-insn /mnt/d/Code/teststraw.o
/mnt/d/Code/teststraw.o: file format coff-go32
Disassembly of section .text:
00000000 <?Get_From@Straw@@EAEXPAV1@@Z>:
0: push %esi
1: mov 0x8(%esp),%esi
5: push %edi
6: mov %ecx,%edi
8: cmp %esi,0x4(%edi)
b: je 3e <?Get_From@Straw@@EAEXPAV1@@Z+0x3e>
d: test %esi,%esi
f: je 26 <?Get_From@Straw@@EAEXPAV1@@Z+0x26>
11: mov 0x8(%esi),%ecx
14: test %ecx,%ecx
16: je 26 <?Get_From@Straw@@EAEXPAV1@@Z+0x26>
18: mov (%ecx),%eax
1a: push $0x0
1c: call *0x4(%eax)
1f: movl $0x0,0x8(%esi)
26: mov 0x4(%edi),%eax
29: test %eax,%eax
2b: je 34 <?Get_From@Straw@@EAEXPAV1@@Z+0x34>
2d: movl $0x0,0x8(%eax)
34: test %esi,%esi
36: mov %esi,0x4(%edi)
39: je 3e <?Get_From@Straw@@EAEXPAV1@@Z+0x3e>
3b: mov %edi,0x8(%esi)
3e: pop %edi
3f: pop %esi
40: ret $0x4
43: nop
44: nop
45: nop
46: nop
47: nop
48: nop
49: nop
4a: nop
4b: nop
4c: nop
4d: nop
4e: nop
4f: nop
Motivated by decomp.me.
In #52 I added the JsonFormatter for returning the diff results as a JSON blob. This is fine for CLI use, but decomp.me imports diff.py directly and uses it as a library. If we're worried about diff performance in decomp.me, diff.py could expose a function for getting the diff result as a dict, rather than as a serialized str.
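The shape of such an API might be sketched like this, with the dict-building step separated from serialization so that a library caller can skip the JSON round trip. Both function names and the row layout are hypothetical and do not reflect the real JsonFormatter output.

```python
import json
from typing import Any, Dict, List, Optional, Tuple

Row = Tuple[Optional[str], Optional[str]]


def diff_as_dict(rows: List[Row]) -> Dict[str, Any]:
    """Build the result as plain Python data (hypothetical layout)."""
    return {"rows": [{"base": base, "current": cur} for base, cur in rows]}


def diff_as_json(rows: List[Row]) -> str:
    """CLI path: serialize the same structure to a JSON string."""
    return json.dumps(diff_as_dict(rows))
```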
I know I could pipe the diff to ansi2html, but it would be nice if this was built in.
We can parse symbols to know when to stop.
I'm running into some issues with data making decompiling SH2 annoying. With SH2, there's only 8-bit immediates, so most data is loaded with pc-relative instructions, and the data is interspersed with the function. This scratch is an example of the sort of issues: https://decomp.me/scratch/SUNET
Lines 18-20 in the asm are actually a jump table.
The 0xB0 in the switch is at line 5A in the asm.
Lines 5C-5E in the asm are the pointer to D_800A7734.
Longer functions have this issue worse since the pc-relative offset is limited, so there will be a block of instructions and a jump, a block of data, the next block of instructions and a jump, a block of data, etc.
I've been thinking about different ways to solve this issue and was wondering if anyone has suggestions. Do any other architectures have this sort of problem?
I've thought about:
Replace objdump with a better disassembler for the SH2 case? My disassembler has GNU as-compatible output but it's written in Rust, so that would be a significant dependency.
Add more parsing to asm-differ to try to make the objdump output better? I'm not sure exactly how much additional parsing is needed, but it seems like it would basically be re-implementing a disassembler for certain patterns.
(Not really asm-differ related) Allow linker arguments in decomp.me so that the pointers can be set to the right locations? This would help with cases like struct->offset where the offset and the base pointer are added together by gcc.
A bit hacky, but this is useful for seeing differences:
elif tag == 'replace':
    if line1.split("\t")[0] == line2.split("\t")[0] and '%' in line1 and '%' in line2 and 'rodata' in line2:
        line1 = f'{original1}'
        line2 = f'{original2}'
    else:
When CodeWarrior is passed the -sym dwarf-2 flag, a full path is output for the source file name, which includes a colon as part of the drive name and therefore breaks this regex:
Line 1958 in b0ef0c5
Would allowing colons here break other things?
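The regex on the referenced line is not quoted here, so the following is only a sketch of one colon-tolerant approach: treat everything up to the last colon as the path and require the remainder to be a line number. The function name split_source_line is hypothetical, and suffixes that objdump may append after the line number are not handled.

```python
import re
from typing import Optional, Tuple

# Greedy ".+" pushes the split to the LAST colon, so a drive prefix such as
# "C:" stays inside the path part.
_SOURCE_LINE_RE = re.compile(r"(.+):([0-9]+)")


def split_source_line(text: str) -> Optional[Tuple[str, int]]:
    """Split 'path:line' into (path, line), or return None if it does not fit."""
    m = _SOURCE_LINE_RE.fullmatch(text)
    if m is None:
        return None
    return m.group(1), int(m.group(2))
```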
Mentioned here: decompme/decomp.me#420 (check for blr)
-w, can use https://pythonhosted.org/watchdog/. Depends on moving everything to Python
Implementing our own less equivalent would make it possible to:
Would need to support {,page }{up,down}, g, G, q, and probably (annoyingly) search (/, n, N); I don't think the rest of less is useful.
decompme/decomp.me#394 (comment)
I've been told I should report this here. Basically I'm told the output matches the input 100% while it still differs in exactly one spot.
I assume it is an issue with score computation only as it does seem to recognize the difference otherwise.
See this scratch: https://decomp.me/scratch/RbmNx
(Note: the header in question is a preprocessed context of bstring.h from the Hewlett Packard C++ STL.)
Diff error: Error running asm-differ: failed to find address immediate for line 'add %al,(%eax)'
In watch mode it could also show how/if this changed from last time
Could also split number of diffs in insertions/deletions/regalloc/immediates
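A sketch of how such a breakdown could be computed from aligned line pairs, assuming MIPS-style registers written as $name and one instruction per string. The category names, the regexes, and the function names are all assumptions for the example and are not taken from diff.py.

```python
import re
from collections import Counter
from typing import Iterable, Optional, Tuple

REG_RE = re.compile(r"\$\w+")  # assumes "$name" registers
# Integer literals not glued to a register or identifier.
IMM_RE = re.compile(r"(?<![\w$])-?(?:0x[0-9a-fA-F]+|[0-9]+)\b")


def classify(base: Optional[str], current: Optional[str]) -> str:
    """Put one aligned pair of lines into a coarse difference category."""
    if base is None:
        return "insertion"
    if current is None:
        return "deletion"
    if base == current:
        return "match"
    if base.split()[0] != current.split()[0]:
        return "instruction"
    if REG_RE.sub("<reg>", base) == REG_RE.sub("<reg>", current):
        return "regalloc"  # only register names differ
    if IMM_RE.sub("<imm>", base) == IMM_RE.sub("<imm>", current):
        return "immediate"  # only integer literals differ
    return "other"


def summarize(pairs: Iterable[Tuple[Optional[str], Optional[str]]]) -> Counter:
    """Count how many aligned pairs fall into each category."""
    return Counter(classify(base, cur) for base, cur in pairs)
```

In watch mode, comparing the Counter from the previous run against the current one would show how each category changed.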