Comments (33)

jreiser commented on August 18, 2024

upx/upx#582 Rebuilding LZMA decompression stub

from publishaotcompressed.

kevingosse commented on August 18, 2024

In stub/src/arch/amd64/lzma_d.S there is this comment:

// ELFMAINX has already done this for us:
//      pushq %rbp; push %rbx  // C callable
//      pushq ldst
//      pushq dst
//      addq src,lsrc; push lsrc  // &input_eof

That doesn't sound right? If src is added to lsrc, it becomes a pointer to the end of the input buffer, instead of the size. I understand that it's needed for input_eof, but shouldn't the code restore its old value afterwards?

subq src, lsrc

If I do that, I'm getting a consistent value of inSize. Gonna let it run for a while to see if it still crashes.
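
To illustrate why a missing subq would matter, here is a toy model of the arithmetic in Python. The addresses and the size are sample values borrowed from the debug logs later in this thread; the function names are hypothetical, and this models only the pointer arithmetic, not the actual stub code.

```python
# Toy model of `addq src,lsrc`: after the add, lsrc holds an end pointer,
# not a length. All names and values here are illustrative assumptions.

def end_pointer(src: int, lsrc: int) -> int:
    """What lsrc contains after `addq src,lsrc` (i.e. &input_eof)."""
    return src + lsrc

def restored_size(src: int, lsrc_after_add: int) -> int:
    """What `subq src,lsrc` recovers: the original length."""
    return lsrc_after_add - src

SIZE = 0x3C746  # sample compressed size
# Two sample buffer addresses, standing in for two runs under ASLR.
runs = [0x7FB8FFF411D4, 0x7FC9FFFF91D4]

for src in runs:
    eof = end_pointer(src, SIZE)
    # Without the subq, the "size" seen downstream depends on the address.
    print(hex(src), "un-restored:", hex(eof), "restored:", hex(restored_size(src, eof)))
```

In this model the un-restored value changes with the load address, while the restored value is the same in every run, which would be consistent with inSize only becoming stable once the subtraction is added.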

MichalStrehovsky commented on August 18, 2024

I wonder if capturing an strace like in upx/upx#385 would tell us whether it's during decompression, or after the uncompressed code starts running. Or if you can capture a coredump, we could look at where the instruction pointer is pointing (is it still pointing to the memory-mapped bytes from the executable, or to some dynamically allocated piece of code?).

kevingosse commented on August 18, 2024

I wonder if capturing an strace like in upx/upx#385 would tell us whether it's during decompression

I captured one but I didn't feel like it told me much. I'll capture a new one and paste it here.

Or if you can capture a coredump, we could look at where the instruction pointer is pointing

I spent a lot of time on this, without success. It doesn't "crash" strictly speaking, but rather aborts with a call to exit(127). I tried to catch it with a debugger, but the syscall is done manually from assembly, so I can't even set a breakpoint on the libc syscall: https://github.com/upx/upx/blob/devel/src/stub/src/amd64-linux.elf-entry.S#L227-L230

die:
        push $127; pop %arg1
        push $ __NR_exit; pop %rax
        syscall

I tried to tweak UPX to modify those instructions and trigger a "real" crash, but the assembly gets compiled into a .h file: https://github.com/upx/upx/blob/devel/src/stub/amd64-linux.elf-entry.h. The way that file is generated is not documented. It looks like it's happening in https://github.com/upx/upx/blob/devel/src/stub/Makefile but I couldn't get it to work.

Edit: I edited those instructions directly in the compiled binary to trigger a crash, but I keep getting an exit code 127, so I guess it comes from elsewhere 🤔

kevingosse commented on August 18, 2024

execve("/root/git/dd-trace-dotnet/shared/bin/monitoring-home/linux-musl-x64/dd-dotnet", ["/root/git/dd-trace-dotnet/shared"...], 0x7ffd4499e580 /* 27 vars */) = 0
open("/proc/self/exe", O_RDONLY)        = 3
mmap(NULL, 3055, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f92fec78000
mprotect(0x7f92fec78000, 3055, PROT_READ|PROT_EXEC) = 0
readlink("/proc/self/exe", "/root/git/dd-trace-dotnet/shared"..., 4095) = 77
mmap(0x7f92fec79000, 16331040, PROT_NONE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f92fec79000
mmap(0x7f92fec79000, 3112280, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f92fec79000
mprotect(0x7f92fec79000, 3112280, PROT_READ) = 0
mmap(0x7f92fef71000, 5688441, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0x2f8000) = 0x7f92fef71000
mprotect(0x7f92fef71000, 5688441, PROT_READ|PROT_EXEC) = 0
mmap(0x7f92ff4de000, 5023028, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0x865000) = 0x7f92ff4de000
mprotect(0x7f92ff4de000, 5023028, PROT_READ) = 0
mmap(0x7f92ff9a9000, 2368728, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0xd2f000) = 0x7f92ff9a9000
exit(127)                               = ?
+++ exited with 127 +++

kevingosse commented on August 18, 2024

I'm going crazy 😅 I spent hours trying to get more information about where that exit(127) comes from but:

  • When launching the app with gdb, it never crashes despite 10,000+ runs
  • I tried using bpftrace, but I see no exit syscall (despite strace showing exit(127))

I'm out of ideas for now.

jreiser commented on August 18, 2024

  1. Which version of UPX? Run "upx --version", or "strings ./my_app-after-compression", or "vi ./my_app-after-compression" and search for "UPX".
  2. Does the problem happen only with LZMA compression? Compress again without using "--lzma", and run the output.
  3. What is the output from "readelf --segments ./my_app-before-compression"? The sizes of the [PT_]LOAD segments should match the lengths of the calls to mmap(, length, ,,,).

jreiser commented on August 18, 2024

  1. Can the compressed app be de-compressed offline [offline: not as part of execution of the compressed app itself] by UPX? upx -d -f -o ./my-app-decompressed ./my-app-compressed

kevingosse commented on August 18, 2024

Which version of UPX? Run "upx --version", or "strings ./my_app-after-compression", or "vi ./my_app-after-compression" and search for "UPX".

upx 4.0.2
NRV data compression library 0.84
UCL data compression library 1.03
zlib data compression library 1.2.13.1-motley
LZMA SDK version 4.43

I also tried with a version manually built from the devel branch and the issue is still there.

Does the problem happen only with LZMA compression? Compress again without using "--lzma", and run the output.

I have no definite proof but without --lzma I haven't been able to reproduce the issue after 100,000 runs. It takes ~1000 runs to reproduce the issue with lzma.

What is the output from "readelf --segments ./my_app-before-compression"? The sizes of the [PT_]LOAD segments should match the lengths of the calls to mmap(, length, ,,,).

Elf file type is DYN (Shared object file)
Entry point 0x2f93a0
There are 12 program headers, starting at offset 64

Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  PHDR           0x0000000000000040 0x0000000000000040 0x0000000000000040
                 0x00000000000002a0 0x00000000000002a0  R      0x8
  INTERP         0x00000000000002e0 0x00000000000002e0 0x00000000000002e0
                 0x0000000000000019 0x0000000000000019  R      0x1
      [Requesting program interpreter: /lib/ld-musl-x86_64.so.1]
  LOAD           0x0000000000000000 0x0000000000000000 0x0000000000000000
                 0x00000000002f7d58 0x00000000002f7d58  R      0x1000
  LOAD           0x00000000002f8000 0x00000000002f8000 0x00000000002f8000
                 0x000000000056cc81 0x000000000056cc81  R E    0x1000
  LOAD           0x0000000000865000 0x0000000000865000 0x0000000000865000
                 0x00000000004ca534 0x00000000004ca534  R      0x1000
  LOAD           0x0000000000d2f960 0x0000000000d30960 0x0000000000d30960
                 0x0000000000241b78 0x00000000002627c0  RW     0x1000
  DYNAMIC        0x0000000000d30508 0x0000000000d31508 0x0000000000d31508
                 0x00000000000001d0 0x00000000000001d0  RW     0x8
  NOTE           0x00000000000002fc 0x00000000000002fc 0x00000000000002fc
                 0x0000000000000024 0x0000000000000024  R      0x4
  TLS            0x0000000000d2f960 0x0000000000d30960 0x0000000000d30960
                 0x00000000000000e8 0x0000000000000111  R      0x8
  GNU_EH_FRAME   0x0000000000bb8ef8 0x0000000000bb8ef8 0x0000000000bb8ef8
                 0x0000000000041a9c 0x0000000000041a9c  R      0x4
  GNU_STACK      0x0000000000000000 0x0000000000000000 0x0000000000000000
                 0x0000000000000000 0x0000000000000000  RW     0x10
  GNU_RELRO      0x0000000000d2f960 0x0000000000d30960 0x0000000000d30960
                 0x00000000000016a0 0x00000000000016a0  R      0x1

 Section to Segment mapping:
  Segment Sections...
   00
   01     .interp
   02     .interp .note.gnu.build-id .gnu.hash .dynsym .dynstr .rela.dyn .rela.plt
   03     .init .plt .plt.got .text __managedcode __unbox .fini
   04     .rodata .dotnet_eh_table .eh_frame_hdr .eh_frame
   05     .tdata .init_array .ctors .dtors .data.rel.ro .dynamic .got .data __modules .bss
   06     .dynamic
   07     .note.gnu.build-id
   08     .tdata .tbss
   09     .eh_frame_hdr
   10
   11     .tdata .init_array .ctors .dtors .data.rel.ro .dynamic .got

And the corresponding strace output (not the same as the previous one because I recompiled for the tests):

execve("/root/git/dd-trace-dotnet/tracer/src/Datadog.Trace.Tools.dd_dotnet/bin/Release/net7.0/linux-musl-x64/publish/dd-dotnet", ["/root/git/dd-trace-dotnet/tracer"...], 0x7ffef5f64d30 /* 26 vars */) = 0
open("/proc/self/exe", O_RDONLY)        = 3
mmap(NULL, 3055, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fcefedb9000
mprotect(0x7fcefedb9000, 3055, PROT_READ|PROT_EXEC) = 0
readlink("/proc/self/exe", "/root/git/dd-trace-dotnet/tracer"..., 4095) = 118
mmap(0x7fcefedba000, 16331040, PROT_NONE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7fcefedba000
mmap(0x7fcefedba000, 3112280, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7fcefedba000
mprotect(0x7fcefedba000, 3112280, PROT_READ) = 0
mmap(0x7fceff0b2000, 5688449, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0x2f8000) = 0x7fceff0b2000
mprotect(0x7fceff0b2000, 5688449, PROT_READ|PROT_EXEC) = 0
mmap(0x7fceff61f000, 5023028, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0x865000) = 0x7fceff61f000
exit(127)                               = ?
+++ exited with 127 +++

Starting after readlink("/proc/self/exe", the last 3 mmap calls seem correct. However, I don't know what the first one maps to (mmap(0x7fcefedba000, 16331040, PROT_NONE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0)); I see no section of that size.
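
For what it's worth, the mmap lengths can be checked against the readelf output with a few lines of Python. The segment values below are copied from the program headers above; reading the 16331040-byte PROT_NONE mapping as a reservation covering all LOAD segments is an interpretation of the numbers, not something confirmed from the UPX source, and the helper names are hypothetical.

```python
# LOAD segments copied from the readelf output: (offset, vaddr, filesz, memsz)
PAGE = 0x1000
loads = [
    (0x000000, 0x000000, 0x2F7D58, 0x2F7D58),
    (0x2F8000, 0x2F8000, 0x56CC81, 0x56CC81),
    (0x865000, 0x865000, 0x4CA534, 0x4CA534),
    (0xD2F960, 0xD30960, 0x241B78, 0x2627C0),
]

def mmap_length(offset: int, filesz: int) -> int:
    """Length needed when the mapping starts at the page containing `offset`."""
    return (offset % PAGE) + filesz

def total_span(segments) -> int:
    """Bytes from the lowest vaddr to the end of the highest segment in memory."""
    lo = min(vaddr for _, vaddr, _, _ in segments)
    hi = max(vaddr + memsz for _, vaddr, _, memsz in segments)
    return hi - lo

lengths = [mmap_length(off, fsz) for off, _, fsz, _ in loads]
print(lengths)            # per-segment mmap lengths
print(total_span(loads))  # compare with the initial PROT_NONE mapping
```

The per-segment lengths come out as 3112280, 5688449, 5023028 and 2368728, matching the mmap calls in the traces, and the total span is 16331040. That suggests the first mapping reserves the address range for the whole image rather than corresponding to any single section.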

Can the compressed app be de-compressed offline [offline: not as part of execution of the compressed app itself] by UPX? upx -d -f -o ./my-app-decompressed ./my-app-compressed

Yes, and I haven't been able to reproduce the crash with the decompressed app.

kevingosse commented on August 18, 2024

strace output in a successful run:

execve("/root/git/dd-trace-dotnet/tracer/src/Datadog.Trace.Tools.dd_dotnet/bin/Release/net7.0/linux-musl-x64/publish/dd-dotnet", ["/root/git/dd-trace-dotnet/tracer"...], 0x7ffd86da5cf0 /* 26 vars */) = 0
open("/proc/self/exe", O_RDONLY)        = 3
mmap(NULL, 3055, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fdc03514000
mprotect(0x7fdc03514000, 3055, PROT_READ|PROT_EXEC) = 0
readlink("/proc/self/exe", "/root/git/dd-trace-dotnet/tracer"..., 4095) = 118
mmap(0x7fdc03515000, 16331040, PROT_NONE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7fdc03515000
mmap(0x7fdc03515000, 3112280, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7fdc03515000
mprotect(0x7fdc03515000, 3112280, PROT_READ) = 0
mmap(0x7fdc0380d000, 5688449, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0x2f8000) = 0x7fdc0380d000
mprotect(0x7fdc0380d000, 5688449, PROT_READ|PROT_EXEC) = 0
mmap(0x7fdc03d7a000, 5023028, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0x865000) = 0x7fdc03d7a000
mprotect(0x7fdc03d7a000, 5023028, PROT_READ) = 0
mmap(0x7fdc04245000, 2368728, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0xd2f000) = 0x7fdc04245000
mprotect(0x7fdc04245000, 2368728, PROT_READ|PROT_WRITE) = 0
mmap(0x7fdc04488000, 131360, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7fdc04488000
open("/lib/ld-musl-x86_64.so.1", O_RDONLY) = 4
[skipping everything else]

strace output without lzma:

execve("/root/git/dd-trace-dotnet/tracer/src/Datadog.Trace.Tools.dd_dotnet/bin/Release/net7.0/linux-musl-x64/publish/dd-dotnet-no-lzma", ["/root/git/dd-trace-dotnet/tracer"...], 0x7fffee355280 /* 26 vars */) = 0
open("/proc/self/exe", O_RDONLY)        = 3
mmap(NULL, 3055, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f5e71732000
mprotect(0x7f5e71732000, 3055, PROT_READ|PROT_EXEC) = 0
readlink("/proc/self/exe", "/root/git/dd-trace-dotnet/tracer"..., 4095) = 126
mmap(0x7f5e71733000, 16331040, PROT_NONE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f5e71733000
mmap(0x7f5e71733000, 3112280, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f5e71733000
mprotect(0x7f5e71733000, 3112280, PROT_READ) = 0
mmap(0x7f5e71a2b000, 5688449, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0x2f8000) = 0x7f5e71a2b000
mprotect(0x7f5e71a2b000, 5688449, PROT_READ|PROT_EXEC) = 0
mmap(0x7f5e71f98000, 5023028, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0x865000) = 0x7f5e71f98000
mprotect(0x7f5e71f98000, 5023028, PROT_READ) = 0
mmap(0x7f5e72463000, 2368728, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0xd2f000) = 0x7f5e72463000
mprotect(0x7f5e72463000, 2368728, PROT_READ|PROT_WRITE) = 0
mmap(0x7f5e726a6000, 131360, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f5e726a6000
open("/lib/ld-musl-x86_64.so.1", O_RDONLY) = 4

kevingosse commented on August 18, 2024

I had a breakthrough: I finally understand why I wasn't able to capture the error in gdb. I noticed earlier that the crash never occurs when ASLR is disabled, and it turns out gdb disables ASLR by default: https://visualgdb.com/gdbreference/commands/set_disable-randomization

Now I have a memory dump. Hopefully it can shed some light on what's happening.
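
Since the failure shows up only about once per thousand runs, and only with ASLR enabled, a small driver loop is handy for reproducing it outside the debugger. This is a minimal sketch, assuming the binary can be run repeatedly without side effects; `run_until_exit_code` is a hypothetical helper. Under gdb, `set disable-randomization off` keeps ASLR enabled for the debuggee.

```python
import subprocess

def run_until_exit_code(cmd, wanted=127, max_runs=100_000):
    """Run `cmd` repeatedly; return the 1-based number of the first run
    whose exit status equals `wanted`, or None if it never happens."""
    for attempt in range(1, max_runs + 1):
        result = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        if result.returncode == wanted:
            return attempt
    return None
```

For example, `run_until_exit_code(["./dd-dotnet"])` (the path is a placeholder) would return the number of the first run that exited with 127.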

kevingosse commented on August 18, 2024

For what it's worth, the disassembly right before the exit syscall (catch syscall exit in gdb) is:

   0x00007f8afef5c159:  pop    %rdi
   0x00007f8afef5c15a:  pop    %rsi
   0x00007f8afef5c15b:  push   $0xb
   0x00007f8afef5c15d:  pop    %rax
   0x00007f8afef5c15e:  jmp    *-0x8(%r14)
   0x00007f8afef5c162:  mov    $0x9,%al
   0x00007f8afef5c164:  mov    %rcx,%r10
   0x00007f8afef5c167:  movzbl %al,%eax
   0x00007f8afef5c16a:  syscall
=> 0x00007f8afef5c16c:  cmp    $0xfffffffffffff000,%rax

IMO it looks very similar to this:

        pop %arg1  # ADRU: unfolded upx_main etc.
        pop %arg2  # LENU
        push $__NR_munmap; pop %rax
        jmp *-NBPW(%r14)  # goto: syscall; pop %rdx; ret

mmap: .globl mmap
        movb $ __NR_mmap,%al
        movq %arg4,%sys4
sysgo:  # NOTE: kernel demands 4th arg in %sys4, NOT %arg4
        movzbl %al,%eax
        syscall

It's hard to tell if we come from mmap: or from the jmp sysgo in read::

read: .globl read
        movb $ __NR_read,%al; 5: jmp sysgo

In any case, I'm still struggling to understand how this translates into an exit(127) syscall.
rax is ffffffffffffffda (-38), which could indicate that a syscall returned ENOSYS:

#define ENOSYS          38      /* Invalid system call number */

Edit: Never mind, by unwinding the stack manually, I can see that we come from exit:, and it was called with the argument 127. That makes sense. But then we could come from any place that calls ERR_LAB. What I don't understand is that I tried using a custom build of UPX where I changed the error code in ERR_LAB and yet my process still crashed with a code 127. Maybe I messed up somewhere.

jreiser commented on August 18, 2024

Some of the relevant source is in src/stub/src/amd64-linux.elf-fold.S. Label sysgo is a tail merge of many syscalls. Most of them are part of a daisy-chained sequence of 'jmp', which makes 3-byte subsequences of the daisy chain identical, and hence more compressible without having to use a filter. For example,

write: .globl write
        mov $__NR_write,%al; 5: jmp 5f
read: .globl read
        movb $ __NR_read,%al; 5: jmp sysgo

There are also calls to exit(127) in src/stub/src/amd64-linux.elf-main.c, via err_exit(2); etc. To save space, the non-DEBUG compilation turns them all into exit(127).

jreiser commented on August 18, 2024

Comparing the output of strace from a failing run

    <<snip>>
mmap(0x7fceff0b2000, 5688449, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0x2f8000) = 0x7fceff0b2000
mprotect(0x7fceff0b2000, 5688449, PROT_READ|PROT_EXEC) = 0
mmap(0x7fceff61f000, 5023028, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0x865000) = 0x7fceff61f000
exit(127)                               = ?

with output from a successful run

      <<snip>>
mmap(0x7fdc0380d000, 5688449, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0x2f8000) = 0x7fdc0380d000
mprotect(0x7fdc0380d000, 5688449, PROT_READ|PROT_EXEC) = 0
mmap(0x7fdc03d7a000, 5023028, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0x865000) = 0x7fdc03d7a000
mprotect(0x7fdc03d7a000, 5023028, PROT_READ) = 0
     <<snip>>

shows that this particular failure is somewhere between the mmap(ADDR1, 5023028,,,,) and the mprotect(ADDR1, 5023028, ), where ADDR1 varies due to stack location. The fixed end of the stack is always randomized (even with ASLR off), and mmap(0, ,,,) is allocated in descending order below the maximum variable end of the stack, again with random guard pages between consecutive mmap(0, ,,,) calls. The order of the blocks is the same, but the stack location varies, and the space between blocks also varies.
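
The same comparison can be automated. The sketch below compares two strace outputs by syscall name only, so that addresses randomized by ASLR do not produce spurious differences. The function names are hypothetical, and the regular expression assumes the plain `name(args) = result` line format shown in the traces above.

```python
import re

# Matches the syscall name at the start of an strace line, e.g. "mmap(".
SYSCALL = re.compile(r"^(\w+)\(")

def syscall_names(trace: str):
    """Extract the sequence of syscall names from strace output."""
    names = []
    for line in trace.splitlines():
        match = SYSCALL.match(line.strip())
        if match:
            names.append(match.group(1))
    return names

def first_divergence(failing: str, passing: str):
    """Return (index, failing_call, passing_call) at the first mismatch,
    or None if one sequence is a prefix of the other."""
    pairs = zip(syscall_names(failing), syscall_names(passing))
    for index, (bad, good) in enumerate(pairs):
        if bad != good:
            return index, bad, good
    return None
```

Applied to the two excerpts above, the first mismatch is the exit call in the failing run against the mprotect call in the successful run.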

jreiser commented on August 18, 2024

I also tried with a version manually built from the devel branch and the issue is still there.

Since you report that a manual build succeeded, I invite you to take the next step. In src/stub/src/amd64-linux.elf-main.c, change

#if 1  //{  save space

to

#if 0 //{  save space

which will save debugging time. Then the rebuild goes:

$ (pushd src/stub; make)   ## requires upx-stubtools
$ (pushd src; make)

and you should then be able to distinguish all the

                err_exit(2);
                err_exit(3);
            err_exit(4);
            err_exit(5);
                err_exit(7);
                err_exit(9);
            err_exit(18);
            err_exit(19);

in amd64-linux.elf-main.c.

kevingosse commented on August 18, 2024

$ (pushd src/stub; make) ## requires upx-stubtools

Thanks, that's the part I was missing.

unpackExtent xi=(0xd4 0x7fb8fff41100)  xo=(0x350 0x7ffe80083c68)  f_exp=0x7fb90034b1ff  f_unf=0x0
xread x.size=0xd4  x.buf=0x7fb8fff41100  buf=0x7ffe80083ba0  count=0xc
xread done
h.sz_unc=0x350  h.sz_cpr=0xc8  h.b_method=0xe
upx_main1  .e_entry=0x2fa350  p_reloc=0x7ffe80083c60  *p_reloc=0x7fb8fff41000  PAGE_MASK=0xfffffffffffff000
do_xmap reloc=0x7fb8fefac000

auxv_up 3  0x7fb8fefac040
  33  0x7ffe800f9000
  51  0x6f0
  16  0x1f8bfbff
  6  0x1000
  17  0x64
  3  0x7fb8fff41040
LOAD p_offset=0x0  p_vaddr=0x0  p_filesz=0x2f81f8  p_memsz=0x2f81f8  p_flags=0x4  prot=0x1

auxv_up 3  0x7fb8fefac040
  33  0x7ffe800f9000
  51  0x6f0
  16  0x1f8bfbff
  6  0x1000
  17  0x64
  3  0x7fb8fefac040

auxv_up 5  0xe
  33  0x7ffe800f9000
  51  0x6f0
  16  0x1f8bfbff
  6  0x1000
  17  0x64
  3  0x7fb8fefac040
  4  0x38
  5  0x3

auxv_up 4  0x38
  33  0x7ffe800f9000
  51  0x6f0
  16  0x1f8bfbff
  6  0x1000
  17  0x64
  3  0x7fb8fefac040
  4  0x38
mmap addr=0x7fb8fefac000  mlen=0x2f81f8  offset=0x0  lo_frag=0x0  prot=0x1
unpackExtent xi=(0x40a0f4 0x7fb8fff41100)  xo=(0x2f81f8 0x7fb8fefac000)  f_exp=0x7fb90034b1ff  f_unf=0x7fb8fefaa005
xread x.size=0x40a0f4  x.buf=0x7fb8fff41100  buf=0x7ffe80083ad0  count=0xc
xread done
h.sz_unc=0x350  h.sz_cpr=0xc8  h.b_method=0xe
unpackExtent xi=(0x40a020 0x7fb8fff411d4)  xo=(0x2f7ea8 0x7fb8fefac350)  f_exp=0x7fb90034b1ff  f_unf=0x7fb8fefaa005
xread x.size=0x40a020  x.buf=0x7fb8fff411d4  buf=0x7ffe80083ad0  count=0xc
xread done
h.sz_unc=0x2f7ea8  h.sz_cpr=0x3c746  h.b_method=0xe
make_hatch 0x7ffe80083d18 0xfffffffffefac000 0xfff  0x4
hatch=0x0
Pprotect addr=0x7fb8fefac000  len=0x2f81f8  prot=0x1
LOAD p_offset=0x2f9000  p_vaddr=0x2f9000  p_filesz=0x56fe91  p_memsz=0x56fe91  p_flags=0x5  prot=0x5
mmap addr=0x7fb8ff2a5000  mlen=0x56fe91  offset=0x2f9000  lo_frag=0x0  prot=0x5
unpackExtent xi=(0x3cd8ce 0x7fb8fff7d926)  xo=(0x56fe91 0x7fb8ff2a5000)  f_exp=0x7fb90034b1ff  f_unf=0x7fb8fefaa005
xread x.size=0x3cd8ce  x.buf=0x7fb8fff7d926  buf=0x7ffe80083ad0  count=0xc
xread done
h.sz_unc=0x56fe91  h.sz_cpr=0x1cbcc7  h.b_method=0xe
j=0xffffffffffe3433b  out_len=0x0  &h=0x7ffe80083ad0
err_exit 7

kevingosse commented on August 18, 2024

Another failure for comparison:

unpackExtent xi=(0xd4 0x7fc9ffff9100)  xo=(0x350 0x7ffc51573d78)  f_exp=0x7fca004031ff  f_unf=0x0
xread x.size=0xd4  x.buf=0x7fc9ffff9100  buf=0x7ffc51573cb0  count=0xc
xread done
h.sz_unc=0x350  h.sz_cpr=0xc8  h.b_method=0xe
upx_main1  .e_entry=0x2fa350  p_reloc=0x7ffc51573d70  *p_reloc=0x7fc9ffff9000  PAGE_MASK=0xfffffffffffff000
do_xmap reloc=0x7fc9ff064000

auxv_up 3  0x7fc9ff064040
  33  0x7ffc515f6000
  51  0x6f0
  16  0x1f8bfbff
  6  0x1000
  17  0x64
  3  0x7fc9ffff9040
LOAD p_offset=0x0  p_vaddr=0x0  p_filesz=0x2f81f8  p_memsz=0x2f81f8  p_flags=0x4  prot=0x1

auxv_up 3  0x7fc9ff064040
  33  0x7ffc515f6000
  51  0x6f0
  16  0x1f8bfbff
  6  0x1000
  17  0x64
  3  0x7fc9ff064040

auxv_up 5  0xe
  33  0x7ffc515f6000
  51  0x6f0
  16  0x1f8bfbff
  6  0x1000
  17  0x64
  3  0x7fc9ff064040
  4  0x38
  5  0x3

auxv_up 4  0x38
  33  0x7ffc515f6000
  51  0x6f0
  16  0x1f8bfbff
  6  0x1000
  17  0x64
  3  0x7fc9ff064040
  4  0x38
mmap addr=0x7fc9ff064000  mlen=0x2f81f8  offset=0x0  lo_frag=0x0  prot=0x1
unpackExtent xi=(0x40a0f4 0x7fc9ffff9100)  xo=(0x2f81f8 0x7fc9ff064000)  f_exp=0x7fca004031ff  f_unf=0x7fc9ff062005
xread x.size=0x40a0f4  x.buf=0x7fc9ffff9100  buf=0x7ffc51573be0  count=0xc
xread done
h.sz_unc=0x350  h.sz_cpr=0xc8  h.b_method=0xe
unpackExtent xi=(0x40a020 0x7fc9ffff91d4)  xo=(0x2f7ea8 0x7fc9ff064350)  f_exp=0x7fca004031ff  f_unf=0x7fc9ff062005
xread x.size=0x40a020  x.buf=0x7fc9ffff91d4  buf=0x7ffc51573be0  count=0xc
xread done
h.sz_unc=0x2f7ea8  h.sz_cpr=0x3c746  h.b_method=0xe
j=0xfffffffffffc38bc  out_len=0x0  &h=0x7ffc51573be0
err_exit 7

j is always negative, but the exact value changes from one run to another. I'm trying to understand where that value comes from. I'm guessing from this bit in amd64-expand.S:

eof:
        pop %rax  // MATCH_53 dst_orig
        sub %rax,%rdi  // dst -= original dst
        pop %rax  // MATCH_52 src_EOF
        pop %rcx  // MATCH_51 &dstlen
        movl %edi,(%rcx)  // actual length used at dst  XXX: 4GB
        sub %rsi,%rax  // src -= eof;  // return 0: good; else: bad
        ret

But tracing those values back is going to be tough 😅

edit: Actually the third run returned j=0xffffffffffe3433b (like the first run) and the fourth run returned j=0xfffffffffffc38bc (like the second run). So the value of j changes but it's not completely random.

jreiser commented on August 18, 2024

At eof: the de-compression for the current [PT_]LOAD has finished. The code at eof: is checking, "Did the de-compressor finish at the right place?" Before starting the de-compression of the current PT_LOAD, the expected EndOfFile (end of compressed input) was calculated as the sum of the starting address of compressed data plus the known length of compressed data. This value src_EOF is saved on the stack, with code tracking tag MATCH_52 to help you find the place in the source which pushed it. Register %rsi has the actual EOF value when the de-compressor finished. If the subtraction does not produce 0, then that's an error. [We also know the expected length of output from the de-compressor, and the actual length that was produced; but the code does not check those. See MATCH_51.]
So one thing to try is to be sure that the expected input EOF never changes while the de-compressor is running. Find the PUSH of MATCH_52, place a hardware watch point on that stack location (4 or 8 bytes) after the PUSH, and hope for no changes until eof:. Similarly, the expected length of de-compressed output at (%rcx) should not change, and also should match the actual length of de-compressed output (but is not checked here). If expectations are not met, then the lzma de-compressor has a bug, or the input to de-compression was clobbered.

jreiser commented on August 18, 2024

h.sz_unc=0x2f7ea8  h.sz_cpr=0x3c746  h.b_method=0xe
j=0xfffffffffffc38bc  out_len=0x0  &h=0x7ffc51573be0

The value of j is -0x3c744 which is only 2 away from the size of the compressed input h.sz_cpr = 0x3c746. The out_len is zero. So the first 2 bytes consumed by the de-compressor were incorrect header values, and the de-compressor quit without producing any de-compressed bytes at all.
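
The conversion can be double-checked with a short Python sketch. The (j, h.sz_cpr) pairs are copied from the two failing debug logs in this thread; `to_signed64` is a hypothetical helper name.

```python
def to_signed64(value: int) -> int:
    """Interpret an unsigned 64-bit value as two's-complement signed."""
    return value - (1 << 64) if value & (1 << 63) else value

# (j, h.sz_cpr) pairs copied from the two failing logs
failures = [
    (0xFFFFFFFFFFFC38BC, 0x3C746),
    (0xFFFFFFFFFFE3433B, 0x1CBCC7),
]

for j, sz_cpr in failures:
    signed = to_signed64(j)
    # sz_cpr + signed is how far |j| falls short of the compressed size
    print(hex(signed), "differs from -sz_cpr by", sz_cpr + signed)
```

In both logs the magnitude of j is exactly 2 less than h.sz_cpr, even though the failures occur in different PT_LOAD segments.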

kevingosse commented on August 18, 2024

Find the PUSH of MATCH_52, place a hardware watch point on that stack location (4 or 8 bytes) after the PUSH, and hope for no changes until eof:

Unfortunately, since it takes ~1000 runs to reproduce the issue, it's a bit difficult to do 🤔

The value of j is -0x3c744 which is only 2 away from the size of the compressed input h.sz_cpr = 0x3c746. The out_len is zero. So the first 2 bytes consumed by the de-compressor were incorrect header values, and the de-compressor quit without producing any de-compressed bytes at all.

Interesting. Thanks to the debug output I should be able to locate the buffer in the coredump. Then I should be able to check what the header looks like. I'll give it a try tomorrow.

kevingosse commented on August 18, 2024

I'm puzzled, the content of xi->buf looks identical in the working and crashing cases (I initially wrote that they were offset by 12 bytes, but that's because I didn't realize xread was updating the xi->buf pointer).

0x1a 0x3 0x0 0x79 0x83 0xbf 0xb1 0x1f 0xd1 0x75 0x5f 0xa3 0xd1 0x53 0x53 0x90 0xc8 0x98 0x37 0x21 0x58 0x5 0xc1 0x91 0x16 0xa7 0x68 0x46 0x6f 0x40 0x8a 0xd4 0x42 0x2 0x2e 0x1b 0x10 0x46 0xe3 0x14 0xb6 0xb2 0xed 0xaf 0x7e 0x2a 0xa2 0x99 0x97 0x1e 

So it looks like the LZMA extraction method fails despite the content being identical. Given that it fails only with ASLR, I assume there's some subtle alignment-dependent error in the assembly code, but it's going to be tricky to pinpoint...

jreiser commented on August 18, 2024

It may be helpful to print the layout of the address space (the contents of /proc/self/maps) at the time of the failure. This might give some clue about adjacency or overlap. A failure rate of 1 in 1000 also suggests a cache coherency issue, but x86* is supposed to have automatic cache coherence between data cache and instruction cache, except for actual filling of the instruction decode prefetch buffer, which is 8 bytes wide (or 16 bytes for AVX2 ?) What is the actual hardware? For instance:

$ </proc/cpuinfo sed 5q
processor	: 0
vendor_id	: GenuineIntel
cpu family	: 6
model		: 94
model name	: Intel(R) Core(TM) i5-6500 CPU @ 3.20GHz

kevingosse commented on August 18, 2024

On my machine:

processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 154
model name      : 12th Gen Intel(R) Core(TM) i9-12900H

But we first detected the failures in the CI, which almost certainly has a different CPU. I'll try to get that information later if that's relevant. For now I'm going to capture the content of /proc/self/maps.

A failure rate of 1 in 1000 also suggests a cache coherency issue

Isn't the code single-threaded? Can there be cache coherency issues in a single-threaded context?

jreiser commented on August 18, 2024

The de-compressor is single-threaded. One of Intel's engineering feats throughout the entire history of x86* is automatic cache coherency among all processors on a die.
Several years ago my apps ran on a fleet of 10,000 machines with 90% CPU utilization 22 hrs/day. (The other 2 hours were refreshes of the local copy of the database). About 2 machines per year failed: the datapath between RAM and CPU randomly dropped bit 2 (positional value 4). We could tell because the result of strcpy was not equal to the input in several places of the dumps.

kevingosse commented on August 18, 2024

Here are the maps for 3 different failures:

7f88fef7c000-7f88fef7e000 r-xp 00000000 00:00 0
7f88fef7e000-7f88ff277000 r--p 00000000 00:00 0
7f88ff277000-7f88ff7e7000 rw-p 00000000 00:00 0
7f88ff7e7000-7f88fff13000 ---p 00000000 00:00 0
7f88fff13000-7f890031f000 r-xp 00000000 08:10 298702                     /home/kgosse/git/dd-trace-dotnet/shared/bin/monitoring-home/linux-x64/dd-dotnet-debug
7ffec0160000-7ffec0181000 rw-p 00000000 00:00 0                          [stack]
7ffec01d4000-7ffec01d8000 r--p 00000000 00:00 0                          [vvar]
7ffec01d8000-7ffec01da000 r-xp 00000000 00:00 0                          [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  [vsyscall]

7f21fec8e000-7f21fec90000 r-xp 00000000 00:00 0
7f21fec90000-7f21fef89000 r--p 00000000 00:00 0
7f21fef89000-7f21ff4f9000 r-xp 00000000 00:00 0
7f21ff4f9000-7f21ff9c0000 r--p 00000000 00:00 0
7f21ff9c0000-7f21ff9c1000 ---p 00000000 00:00 0
7f21ff9c1000-7f21ffc04000 rw-p 00000000 00:00 0
7f21ffc04000-7f21ffc25000 ---p 00000000 00:00 0
7f21ffc25000-7f2200031000 r-xp 00000000 08:10 298702                     /home/kgosse/git/dd-trace-dotnet/shared/bin/monitoring-home/linux-x64/dd-dotnet-debug
7ffea8be3000-7ffea8c04000 rw-p 00000000 00:00 0                          [stack]
7ffea8d7e000-7ffea8d82000 r--p 00000000 00:00 0                          [vvar]
7ffea8d82000-7ffea8d84000 r-xp 00000000 00:00 0                          [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  [vsyscall]

7fa7fef52000-7fa7fef54000 r-xp 00000000 00:00 0
7fa7fef54000-7fa7ff24d000 r--p 00000000 00:00 0
7fa7ff24d000-7fa7ff7bd000 rw-p 00000000 00:00 0
7fa7ff7bd000-7fa7ffee9000 ---p 00000000 00:00 0
7fa7ffee9000-7fa8002f5000 r-xp 00000000 08:10 298702                     /home/kgosse/git/dd-trace-dotnet/shared/bin/monitoring-home/linux-x64/dd-dotnet-debug
7ffc1d77f000-7ffc1d7a0000 rw-p 00000000 00:00 0                          [stack]
7ffc1d7e6000-7ffc1d7ea000 r--p 00000000 00:00 0                          [vvar]
7ffc1d7ea000-7ffc1d7ec000 r-xp 00000000 00:00 0                          [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  [vsyscall]

jreiser commented on August 18, 2024

I have looked on the Actions page https://github.com/MichalStrehovsky/PublishAotCompressed/actions/runs/6194793251 but cannot identify a file dd-dotnet-debug. Please give a URL, or email it to me as an attachment. It should be quick to find my address.

kevingosse commented on August 18, 2024

@jreiser The crashing app is not part of that repository. I sent you a mail to the address I found in the source code, with the file attached.

jreiser commented on August 18, 2024

@kevingosse Email and attachment received; thank you! Looking...

kevingosse commented on August 18, 2024

Do you have a simple way of regenerating the lzma_d_cs.S and lzma_d_cf.S files? I assume they come from upx-lzma-sdk. It looks to me like upx-lzma-sdk is stateless, and I consider the "same code with same data produces different results" hypothesis a last resort. Therefore, my favorite theory right now is that the bit in lzma_d.S that initializes CLzmaDecoderState somehow ends up producing a different value (there are some parts that are apparently dependent on alignment, so maybe? andq $~0<<6,%rbx // 64-byte align).
The simplest way to verify that theory would be to dump the content of the CLzmaDecoderState struct at the beginning of LzmaDecode, but it's hard to do without recompiling upx-lzma-sdk (I mean, in theory I could add some assembly code in lzma_d.S to dump the value, but I hope there's a better way).
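
For reference, here is a small C sketch of what the quoted andq $~0<<6,%rbx mask computes: it rounds an address down to the previous 64-byte boundary, so the result depends on the address it starts from. The function name align_down_64 is made up for illustration and does not appear in the UPX sources.

```c
#include <stdint.h>

/* Hypothetical helper: clears the low 6 bits of an address,
 * i.e. rounds it down to a multiple of 64, like the andq mask. */
static uintptr_t align_down_64(uintptr_t p) {
    return p & (~(uintptr_t)0 << 6);
}
```

Two start addresses that differ by less than 64 bytes can therefore land on different boundaries, which is why an alignment-dependent setup could in principle vary between runs.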

from publishaotcompressed.

jreiser avatar jreiser commented on August 18, 2024

The source begins at src/stub/src/c/lzma_d_c.c and winds through vendor/lzma-src. See also src/stub/src/c/Makevars.lzma. See also src/stub/src/arch/*/.../lzma_d_c[fs].d. I once explained this in a comment on some issue here on GitHub, but I cannot find it now. (There is no convenient way to download all issues from GitHub.) The comment had text something like "not a shell script" and "thinking is required".

from publishaotcompressed.

kevingosse avatar kevingosse commented on August 18, 2024

I haven't seen any difference in the state. However, I've been able to pinpoint the failure to RC_TEST:

#define RC_TEST { if (Buffer == BufferLim) return LZMA_RESULT_INPUT_OVERRUN; }

I'm trying to narrow it down, but I was very surprised to see that inSize varies wildly from one execution to the other. Is that expected?
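
To make the check concrete, here is a minimal C sketch of a byte reader guarded the same way as RC_TEST. The reader struct, the reader_* helpers, and the result codes are hypothetical names for illustration, not the upx-lzma-sdk definitions; only the equality test between the read pointer and the limit mirrors the macro.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result codes, named after the one in the macro. */
enum { RESULT_OK = 0, RESULT_INPUT_OVERRUN = 1 };

/* Hypothetical, stripped-down reader: just the two pointers the check compares. */
typedef struct {
    const uint8_t *buffer;     /* next input byte */
    const uint8_t *buffer_lim; /* one past the last input byte */
} reader;

/* The limit is derived from the start pointer plus a byte count,
 * so it is only meaningful if in_size really is a size. */
static void reader_init(reader *r, const uint8_t *in, size_t in_size) {
    r->buffer = in;
    r->buffer_lim = in + in_size;
}

/* Same shape as RC_TEST: bail out when the read pointer reaches the limit. */
static int reader_next(reader *r, uint8_t *out) {
    if (r->buffer == r->buffer_lim)
        return RESULT_INPUT_OVERRUN;
    *out = *r->buffer++;
    return RESULT_OK;
}

/* Tries to read `wanted` bytes; returns how many were delivered
 * before the overrun check fired. */
static size_t reader_drain(const uint8_t *in, size_t in_size, size_t wanted) {
    reader r;
    uint8_t byte;
    size_t delivered = 0;
    reader_init(&r, in, in_size);
    while (delivered < wanted && reader_next(&r, &byte) == RESULT_OK)
        delivered++;
    return delivered;
}
```

In this sketch the point at which the overrun is reported is determined entirely by in_size, so a size that changes from run to run would move that point as well.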

from publishaotcompressed.

jreiser avatar jreiser commented on August 18, 2024

inSize is the compressed size of each [PT_LOAD]. For one of the cases recently, the sizes are

  unc     cpr
2f7ea8   3c746
56fe91  1cbcc7
4c6978  19f86d
241c98   6236f

where unc is the uncompressed size of the PT_LOAD (can be seen in readelf --segments dd-dotnet-debug.uncpr), and cpr is the compressed size. The first 0x350 bytes, which are part of the first PT_LOAD, are separate because they contain the Elf64_Ehdr and _Phdrs, which are used to manage the re-construction of the rest at run time.

from publishaotcompressed.

kevingosse avatar kevingosse commented on August 18, 2024

inSize is the compressed size of each [PT_LOAD]

So yeah, it shouldn't change from one execution to another.

I'm currently at ~10k executions without a crash, so I'm fairly confident this was the issue. I'll let it run a few hours to be 100% sure.

Edit: I did 60k more executions and no crash.

from publishaotcompressed.
