
xorsum's Introduction

xorsum

[Logo: XOR symbol at the upper-left corner, plus sign at the bottom-right corner]

Algorithm

It uses the XOR cipher to compute a checksum digest. It splits the data into chunks whose length equals the digest size (the last chunk is padded with 0s), and XORs all chunks together into a single chunk, which is returned as output.

This isn't a good hash function. It lacks the Avalanche Effect, because flipping 1 input bit flips exactly 1 output bit.
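The folding described above can be sketched in Rust as follows. This is a minimal illustration, not the crate's actual code: the name `xor_fold` and its signature are assumptions made for this example.

```rust
/// XOR-folds `data` into a digest of `len` bytes.
/// Hypothetical helper for illustration; not the crate's real API.
fn xor_fold(data: &[u8], len: usize) -> Vec<u8> {
    // The digest starts as all zeros.
    let mut digest = vec![0u8; len];
    if len == 0 {
        return digest; // `chunks(0)` would panic, so bail out early
    }
    // Input byte `i` is XORed into digest byte `i % len`.
    // A short final chunk leaves the remaining digest bytes untouched,
    // which is equivalent to padding that chunk with zeros.
    for chunk in data.chunks(len) {
        for (d, b) in digest.iter_mut().zip(chunk) {
            *d ^= b;
        }
    }
    digest
}

fn main() {
    let a = xor_fold(b"aaaa", 4);
    assert_eq!(a, [0x61, 0x61, 0x61, 0x61]);

    // 'a' = 0x61 and 'c' = 0x63 differ in exactly one bit.
    let b = xor_fold(b"aaac", 4);
    let flipped: u32 = a.iter().zip(&b).map(|(x, y)| (x ^ y).count_ones()).sum();
    // One flipped input bit gives one flipped output bit: no avalanche.
    assert_eq!(flipped, 1);
}
```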

Program

The raw digest size is 8 bytes by default, but can be set to any valid usize value with the --length option. The printed digest is 16 characters long by default, because each byte expands to 2 hexadecimal digits.
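As a quick check of the size arithmetic, here is a small sketch of hexadecimal expansion. The helper `to_hex` is a hypothetical name used only here, and the program's own formatting code may differ.

```rust
/// Formats raw digest bytes as lowercase hexadecimal, 2 digits per byte.
/// Hypothetical helper for illustration only.
fn to_hex(digest: &[u8]) -> String {
    digest.iter().map(|b| format!("{b:02x}")).collect()
}

fn main() {
    // 8 raw bytes, as in the default digest size.
    let raw: [u8; 8] = [0x61, 0x61, 0x61, 0x61, 0, 0, 0, 0];
    let printed = to_hex(&raw);
    assert_eq!(printed.len(), 2 * raw.len()); // 16 characters
    println!("{printed}"); // 6161616100000000
}
```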

Why 8B?

That was a somewhat arbitrary decision. I chose 8 because it's the geometric mean of 4 and 16, the digest sizes (in bytes) of CRC32 and MD5, respectively. 8B is easier to implement than 16B when a constant size is desired, because it fits in a uint64_t.

The initialization-vector is hardcoded to be 0.

Name and behavior are heavily influenced by cksum, md5sum, and b3sum.

Usage

To install the latest release from the crates.io registry:

cargo install xorsum

This isn't guaranteed to be the latest version, but it will never throw compilation errors.

To install the latest dev crate from GH:

cargo install --git https://github.com/Rudxain/xorsum.git

This is the most recent version, but compilation isn't guaranteed, semver may be broken, and --help may not reflect actual program behavior.

To get already-compiled non-dev executables, go to GH releases. The *.elf files are only compatible with GNU-Linux x64, and the *.exe files only with Windows x64. These aren't setup/installer programs; they are the same executables cargo would install, so you should run them from a terminal, not click them.

For a Llamalab Automate implementation, visit XOR hasher.

Argument "syntax":

xorsum [OPTIONS] [FILE]...

For ℹinfo about options, run:

xorsum --help

Examples

Regular use

# let's create an empty file named "a"
echo -n > a
xorsum --length 4 a
# output will be "00000000 a" (without quotes)

# write "aaaa" to this file and rehash it
echo -n aaaa > a
xorsum a -l 4
#out: "61616161 a"
# because "61" is the hex value of the UTF-8 char "a"

# same result when using stdin
echo -n aaaa | xorsum -l4
#61616161 -

xorsum a --brief #`-l 8` is implicit
#6161616100000000

Note: echo -n behaves differently depending on the OS and binary version. It might include a line ending such as \n (LF) or \r\n (CR-LF). The outputs shown in the example are the (usually desired) result of NOT including an EOL.

PowerShell will ignore -n because echo is an alias of Write-Output and therefore can't recognize -n. Write-Host -NoNewline can't be piped nor redirected, so it's not a good alternative.

Emulating the 🏔Avalanche Effect (AE)

--length doesn't truncate the output:

xorsum some_big_file -bl 3 #"00ff55"
xorsum some_big_file -bl 2 #"69aa" NOT "00ff"

As you can see, -l can return very different hashes from the same input. This property can be exploited to emulate the Avalanche Effect (to some extent).
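To see why a shorter digest is not a prefix of a longer one, consider this sketch. It reuses the hypothetical `xor_fold` helper (an assumption of this example, not the program's real function): with a different length, every input byte lands on a different digest position.

```rust
/// XOR-folds `data` into `len` bytes (hypothetical helper, for illustration).
fn xor_fold(data: &[u8], len: usize) -> Vec<u8> {
    let mut digest = vec![0u8; len];
    if len == 0 {
        return digest;
    }
    for chunk in data.chunks(len) {
        for (d, b) in digest.iter_mut().zip(chunk) {
            *d ^= b;
        }
    }
    digest
}

fn main() {
    let data = b"abcdefg";
    let d3 = xor_fold(data, 3); // chunks: "abc", "def", "g"
    let d2 = xor_fold(data, 2); // chunks: "ab", "cd", "ef", "g"

    assert_eq!(d3, [0x62, 0x07, 0x05]);
    assert_eq!(d2, [0x00, 0x60]);
    // The 2-byte digest is not the first 2 bytes of the 3-byte digest.
    assert_ne!(&d3[..2], &d2[..]);
}
```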

Finding corrupted bytes

If you have 2 copies of a file and 1 is corrupted, you can attempt to "🔺️triangulate" the index of a corrupted byte, without manually searching the entire file. This is useful when dealing with big raw-binary files.

xorsum a b
#6c741b7863326b2c a
#6c74187863326b2c b
# the 0-based index is 2 when using `-l 8`
# mathematically, i mod 8 = 2

xorsum a b -l 3
#3d5a0a a
#3d590a b
# i mod 3 = 1

xorsum a b -l 2
#7f12 a
#7c12 b
# i mod 2 = 0

# you can repeat this process with different `-l` values, to solve it easier.
# IIRC, using primes gives you more info about the index

There are programs (like diff) that compare bytes for you, and are more efficient and user-friendly. But if you are into math puzzles, this is a good way to pass the time by solving systems of linear modular equations 🤓.
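If you'd rather let the computer solve the puzzle, a brute-force sketch follows. It assumes exactly one corrupted byte. The function name `candidate_indices` and the file size of 100 bytes are made up for illustration.

```rust
/// Returns every index below `limit` that satisfies all
/// `(modulus, residue)` constraints. Every modulus must be nonzero.
fn candidate_indices(constraints: &[(usize, usize)], limit: usize) -> Vec<usize> {
    (0..limit)
        .filter(|i| constraints.iter().all(|&(m, r)| i % m == r))
        .collect()
}

fn main() {
    // Read off the example above: i mod 8 = 2, i mod 3 = 1, i mod 2 = 0.
    let constraints = [(8, 2), (3, 1), (2, 0)];
    // 100 is an assumed file size, chosen only for this example.
    let hits = candidate_indices(&constraints, 100);
    println!("{hits:?}"); // [10, 34, 58, 82]
}
```

The candidates repeat every 24 bytes, because lcm(8, 3, 2) = 24. The `-l 2` result adds nothing here, since i mod 8 = 2 already implies that i is even. That is why pairwise-coprime lengths narrow the search faster.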

Personal thoughts

I was surprised I couldn't find any implementation of a checksum algorithm completely based on XOR, so I posted this for the sake of completeness, and because I'm learning Rust. I also made this for low-power devices, despite using the std lib, and only compiling to x64 (this will probably change in the future, so don't worry).

⚠DISCLAIMER

  1. DO NOT SHARE HASHES OF PRIVATE DATA. You might be leaking sensitive information. Small hashes and bigger files tend to be safer, because the sbox will (probably) have enough bytes to "mix well".
  2. This program is not production-ready. The version should be 0.x.y to reflect the incompleteness of the code. I'm sorry for the inconvenience and potential confusion.

xorsum's People

Contributors

measter, rudxain

Forkers

measter

xorsum's Issues

Support "anti" options as positional

The cmd

xorsum -a lo -A UP

should output a lowercase hex digest for the file "lo" and an uppercase hex digest for "UP", instead of the current behavior, which applies whichever capitalization option was specified last.

This gives more freedom to users, and makes the program "more correct"

Allow arbitrary digest sizes

Like b3sum does, this should allow users to set the digest size to any number of bytes in the interval [0, 256]

Better error handling

  • Avoid crash if file doesn't exist, and print error message to stderr
  • Avoid crash if path is a directory
  • Avoid crash if there's no permission to read a file
  • Handle possible std{in, out, err} stream errors
  • Avoid panic if --length <n> cannot be parsed to usize (handled by clap)
  • Return meaningful exit codes

Merge `case` and `code` groups

I think there should be a single --code parameter that's internally an enum that only accepts hex-UP, hex-lo, hex-UP-alt, hex-lo-alt, raw, and possibly b64 (in the future). Maybe some "CSV" modes that add whitespace or hyphens between groups of nibbles.

This is more future-proof, and avoids invalid or useless OPTION combinations. It may also simplify the experience for users.

This would definitely require a new major semver bump
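A rough sketch of what such an enum could look like, assuming the option values are spelled exactly as listed above. The type name `Code`, the variant names, and the error type are all hypothetical. The meaning of the "alt" variants is not defined here, so the sketch only covers parsing.

```rust
use std::str::FromStr;

/// Hypothetical output encodings selectable through `--code`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Code {
    HexUp,
    HexLo,
    HexUpAlt,
    HexLoAlt,
    Raw,
}

impl FromStr for Code {
    type Err = String;

    /// Accepts only the exact spellings proposed in this issue.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "hex-UP" => Ok(Code::HexUp),
            "hex-lo" => Ok(Code::HexLo),
            "hex-UP-alt" => Ok(Code::HexUpAlt),
            "hex-lo-alt" => Ok(Code::HexLoAlt),
            "raw" => Ok(Code::Raw),
            other => Err(format!("unknown code: {other}")),
        }
    }
}

fn main() {
    assert_eq!("hex-lo".parse::<Code>(), Ok(Code::HexLo));
    // Anything outside the closed set is rejected up front.
    assert!("hex".parse::<Code>().is_err());
}
```

With a single closed set of values, contradictory combinations cannot be expressed at all.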

Add `--check` support

This is essential (not fundamental) functionality for checksum programs. It should be capable of verifying and comparing hash values read from one or more files.

Multithreading

This algorithm is embarrassingly parallel: chunks can be XORed in any order. We should take advantage of that.
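One possible shape for this, as a sketch only: since XOR is associative and commutative, each thread can fold its own part of the input, and the partial digests are combined with one more XOR. The names `xor_fold` and `xor_fold_parallel` are hypothetical. The sketch works on an in-memory slice, whereas the real program reads files and stdin. It assumes Rust 1.73 or later for `usize::div_ceil`.

```rust
use std::thread;

/// Sequential XOR fold (hypothetical helper, for illustration).
fn xor_fold(data: &[u8], len: usize) -> Vec<u8> {
    let mut digest = vec![0u8; len];
    if len == 0 {
        return digest;
    }
    for chunk in data.chunks(len) {
        for (d, b) in digest.iter_mut().zip(chunk) {
            *d ^= b;
        }
    }
    digest
}

/// Folds `data` on up to `workers` threads, then XORs the partial digests.
fn xor_fold_parallel(data: &[u8], len: usize, workers: usize) -> Vec<u8> {
    if len == 0 || data.is_empty() {
        return vec![0u8; len];
    }
    // Each part must span a whole number of `len`-byte chunks, so that
    // every worker sees its bytes at the same offset modulo `len`.
    let chunks_total = data.len().div_ceil(len);
    let part_len = chunks_total.div_ceil(workers.max(1)) * len;

    thread::scope(|s| {
        let handles: Vec<_> = data
            .chunks(part_len)
            .map(|part| s.spawn(move || xor_fold(part, len)))
            .collect();

        let mut digest = vec![0u8; len];
        for handle in handles {
            let partial = handle.join().expect("worker thread panicked");
            for (d, p) in digest.iter_mut().zip(&partial) {
                *d ^= p;
            }
        }
        digest
    })
}

fn main() {
    let data: Vec<u8> = (0..=255u8).cycle().take(1000).collect();
    // The parallel result must match the sequential one.
    assert_eq!(xor_fold_parallel(&data, 3, 4), xor_fold(&data, 3));
}
```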

Init `sbox` within `xor_hasher`

That vector is always initialized to all 0s before being passed to that fn, so it makes sense to create it there and return it after mutating, instead of passing a mutable reference and manually resetting with fill
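A sketch of the proposed shape, under assumptions: the real signature of `xor_hasher` isn't shown here, so the generic `Read` parameter, the 4096-byte buffer, and the `io::Result` return type are guesses made for illustration.

```rust
use std::io::{self, ErrorKind, Read};

/// Hypothetical reshaped `xor_hasher`: it creates the zeroed `sbox`
/// itself and returns it, so callers never have to reset a buffer.
fn xor_hasher<R: Read>(mut reader: R, len: usize) -> io::Result<Vec<u8>> {
    let mut sbox = vec![0u8; len];
    if len == 0 {
        return Ok(sbox);
    }
    let mut buf = [0u8; 4096];
    let mut pos = 0usize; // stream position modulo `len`
    loop {
        let n = match reader.read(&mut buf) {
            Ok(0) => break, // end of input
            Ok(n) => n,
            Err(e) if e.kind() == ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        };
        for &b in &buf[..n] {
            sbox[pos] ^= b;
            pos = (pos + 1) % len;
        }
    }
    Ok(sbox)
}

fn main() -> io::Result<()> {
    // `&[u8]` implements `Read`, which is handy for a quick check.
    let digest = xor_hasher(&b"aaaa"[..], 8)?;
    assert_eq!(digest, [0x61, 0x61, 0x61, 0x61, 0, 0, 0, 0]);
    Ok(())
}
```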

Fix duped algorithm

There are 3 instances of the XOR cipher in the code. This redundancy is unacceptable

Add a "quirky" mode

Activating this mode would change the behavior to be inconsistent with core-utils, but with the bonus of having #13 and access to #15.

It should be activated by using the --quirky flag, no shorthand allowed (to enforce being explicit)

Support custom IV

Currently, the default Initialization Vector is 0, but users should be capable of changing it
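A minimal sketch of how a custom IV could work, assuming the IV has the same length as the digest. That is a design assumption of this example; the issue does not specify it. The function name `xor_fold_with_iv` is hypothetical.

```rust
/// XOR-folds `data` into a digest that starts as `iv` instead of zeros.
/// The digest length is taken from the IV (an assumption of this sketch).
fn xor_fold_with_iv(data: &[u8], iv: &[u8]) -> Vec<u8> {
    let mut digest = iv.to_vec();
    if digest.is_empty() {
        return digest;
    }
    for chunk in data.chunks(digest.len()) {
        for (d, b) in digest.iter_mut().zip(chunk) {
            *d ^= b;
        }
    }
    digest
}

fn main() {
    // An all-zero IV reproduces the current behavior.
    assert_eq!(xor_fold_with_iv(b"aaaa", &[0; 4]), [0x61; 4]);
    // A nonzero IV is XORed into the result.
    assert_eq!(xor_fold_with_iv(b"aaaa", &[0x61; 4]), [0; 4]);
}
```

Because XOR is its own inverse, anyone who knows the IV can strip it from the digest, so a custom IV would not add secrecy by itself.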

Better OS interoperability

Do the same thing as b3sum to ensure checkfiles can be shared between DOS and Unix users. This is necessary before implementing #6

Inconsistent `stdout` locks

The program doesn't lock stdout when in raw mode, but it does lock in hex mode. For the sake of consistency, it should be the same regardless of the mode.

This isn't important now, but it will be important when implementing #4

🐢🐌Too slow

I did some benchmarks, and this is CONSIDERABLY SLOWER than MD5 and Blake2. This is unacceptable. How come such a simple algorithm is slower than a single-threaded (I guess, didn't check) MD5?

Add 🌈easter 🥚eggs

  • The first one to add would be
xorsum --hell
# out: "I can't go to hell. I'm all out of vacation days."

As an Undertale reference. Because typing --hell is a common typo when requesting help from a CLI program.

  • Another related one would be:
xorsum --quirky -l 6 hell
# "666666666666 hell"

Regardless if the file exists or not, and regardless of its contents. The length of sixes corresponds to the specified value of -l. To get the ACTUAL hash, the user must be explicit about it:

xorsum --quirky -l 6 ./hell
  • An old friend:
xorsum --hello
# "world!"
  • Rickroll lol:
xorsum --rick
# "We're no strangers to love..."
  • A Spamton reference:
xorsum --heaven
# "[Heaven], are you WATCHING?"
  • My Hero Academia:
xorsum --quirky --stress
# "Doofenshmirtz Glowup moment"

Because of the meme about re-destro.

  • Umm... what?
xorsum --quirky --quirky
# "Very quirky indeed..."
  • This one would cause an infinite loop:
xorsum --quirky xorsum
