Code Monkey home page Code Monkey logo

Comments (4)

kyz avatar kyz commented on May 28, 2024

Looking back at LZXFMT.DOC from CABSDK.EXE, you're right, it does include extra == 0 logic in the pseudocode, exactly as seen on the page you linked. I must have misread it. It's also hiding in the LZX DELTA specification logic, where the logic would run verbatim_bits = readbits(0); aligned_bits = 0;

And yes, there's a discrepancy with the java implementation, which has the logic you see in lzxd.c:

    int extra = extra_bits[position_slot];
    match_offset = position_base[position_slot];
    if      (extra > 3)  match_offset += readbits(extra - 3) << 3) + aligned_tree.readsym();
    else if (extra == 3) match_offset += aligned_tree.readsym();
    else if (extra != 0) match_offset += readbits(extra);
    else                 match_offset = 1;

I have thousands of cab files with extra == 0 cases. cabextract, EXTRACT.EXE (from 1997) and EXPAND.EXE (from the 2000s) all agree on the decoded result. The smallest one is 305 bytes and I've printed it in base64 below. If I change lzxd.c to follow the specification, it starts to get the result wrong.

$ base64 < 0625.cab 
TVNDRgAAAAAxAQAAAAAAACwAAAAAAAAAAwEBAAEAAAAAAAAASQAAAAEAAxX+AgAAAAAAAAAA6yJg
pwAAYXJyb3dfcmwuY3VyAPRpnfLgAP4CW4CAjQAg6C+NlADBAAADAAA0NAC2DoTSCjtLYidxLIHm
/yd1Iv9iT8QEXLoAIAAAZgDAjGEKCt9fOIXvSta7Fp3AUuICkRAKQpAQCuiY/U39/f/9gGgAAAAA
AAAMjICDBu4GEfv94vczBnjmSNJfZX1w+llcmmvGkPP/+LZ928xgk3+1jbc89tyWC07HWE9tZ+jf
uV2n6FtW7WQkrGciPVHl/IxMDt/omBEgIwAAAAAAAAIA/5EAoGYTN/eyUZSI1IJoddEsEqqlNLTO
WRfA5RFRJcvSQoqniAJRjqSUsII=
$ wine extract.exe /e 0625.cab 
Microsoft (R) Diamond Extraction Tool - Version (32) 1.00.0601 (03/18/97)
Copyright (c) Microsoft Corp 1994-1997. All rights reserved.

 Cabinet 0625.cab

Extracting arrow_rl.cur
$ md5sum arrow_rl.cur 
c8f980bf103d1c781ed71b25e46e7219  arrow_rl.cur
$ WINEPREFIX=$HOME/.wine64 WINEARCH=win64 wine expand.exe 0625.cab file.out
Microsoft (R) File Expansion Utility  Version 6.1.7600.16385
Copyright (c) Microsoft Corporation. All rights reserved.

Adding Z:\tmp\file.out to Extraction Queue

Expanding Files ....

Expanding Files Complete ...
$ md5sum file.out 
c8f980bf103d1c781ed71b25e46e7219  file.out

$ cabextract -t 0625.cab 
Testing cabinet: 0625.cab
  arrow_rl.cur  OK                             c8f980bf103d1c781ed71b25e46e7219

All done, no errors.
$ ./cabextract-to-spec -t 0625.cab
Testing cabinet: 0625.cab
  arrow_rl.cur  OK                             2188d750947c1fbb24dba4080dd7b51d

from libmspack.

magcius avatar magcius commented on May 28, 2024

That is bizarre to me, because it implies they're trying to code a 1 match_offset but are using a position_slot other than 3, which would already have 0 extra bits. But it seems like that's the correct behavior regardless?

from libmspack.

kyz avatar kyz commented on May 28, 2024

Thanks for your recommendation to look at it. I rewrote the decoding from scratch in 202ab41 which does correctly decode files, including the example above. I'm not entirely sure where the discrepancy lies, but there is no longer a special case for extra == 0, and it does work. Thanks!

from libmspack.

magcius avatar magcius commented on May 28, 2024

Nice, that's very similar to what I do now, except I handle the match_offset == 3 case explicitly in a separate test, which means I never have to test for extra > 0. It also means I can fold the -2 into the position_base_table, and omit the first four elements as well.

https://github.com/magcius/noclip.website/blob/99bdbc8742d44772eb25a4269aa3fa25e4ee2a76/src/Common/Compression/LZX.ts#L314-L346

from libmspack.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.