ddgst's Introduction

dd86k

Hi there, I'm dd!

I'm interested in software engineering, system administration, telecommunications, and technical documentation.

My main programming language is the D programming language. I also know a good amount of C, C#, PHP, and JavaScript.

Available on: GitHub, GitLab, and Codeberg.

A more complete portfolio is available online.

Active Projects

Project      Links                      Description
alicedbg     GitHub, GitLab, Codeberg   Debugger toolkit and shell
aliceserver  GitHub, GitLab, Codeberg   Debugger server implementing DAP
ddhx         GitHub, GitLab             Hex viewer
ddgst        GitHub, GitLab             Hashing multithreaded utility
binco        GitHub                     Binary-text encoder/decoder
sha3-d       GitHub                     Keccak-f[1600,24] (SHA-3) implementation
blake2-d     GitHub                     BLAKE2 implementation (s and b variants)
lateterm     GitHub, GitLab             WordPress "DOS" theme
npp_vs2012   GitHub                     Notepad++ "VS2012" theme

ddgst's Issues

Optimal block read size

Under a few operating systems, there are various ways to obtain the optimal block size for file read and write operations.

On POSIX systems, fstat reports the preferred block size for file I/O in the st_blksize field of the stat structure.

For Windows, a StackOverflow answer suggests calling GetDiskFreeSpace and multiplying the values returned through its lpBytesPerSector and lpSectorsPerCluster output parameters. The product should be the filesystem's cluster size.

Typical block sizes should be around 64 KiB for Linux systems and 4 KiB for Windows.

Of course, benchmarks should be made available.
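
As a rough illustration of the POSIX side, here is a minimal sketch in Python (not the project's D code). The function names and the 4096-byte fallback are assumptions made for this example; Python exposes st_blksize only on platforms that provide it, so the sketch falls back when the attribute is missing.

```python
import os

def optimal_block_size(path, fallback=4096):
    """Return the preferred I/O block size reported for path.

    Uses st_blksize where the platform provides it (POSIX). The fallback
    value is an arbitrary assumption for platforms without that field.
    """
    st = os.stat(path)
    size = getattr(st, "st_blksize", 0)
    return size if size > 0 else fallback

def read_in_blocks(path):
    """Yield the file's contents in chunks of the preferred block size."""
    block = optimal_block_size(path)
    with open(path, "rb") as f:
        while chunk := f.read(block):
            yield chunk
```

Whether reading at exactly this size is actually fastest is the part that benchmarks would have to confirm.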

Check list items prefixed with asterisk (`*`) cannot be read

Some check files have an asterisk character (*) prefixed to the file entries. This seems to be an oddity of some utilities, which add the character to mark the file as binary. It can safely be removed (it does not act as a glob pattern).

Another issue is the comment lines. Here they are meant solely for humans to read, and I cannot do anything about them.

simplewall-3.6.7.sha256:

1e2078cd7b9934534787f04b3e4611832ddeec0853f1d50b6b454cd5dd770587 *simplewall-3.6.7-bin.zip
864418c6a03719bf98715fd6a7a91013e55de79951dada12e918481913d27b22 *simplewall-3.6.7-setup.exe
#32-bit
ab150e6555b6fdea99d10b2ef9e6d75351c170b9efee61cfbb48d65128b6618a *simplewall.exe
#64-bit
7b320a968557541d0bd6ad06aebedddb004a6f74cb0080f62ac072ff8dd73afd *simplewall.exe
#arm64
73183255fd65caac0c8579a577430ae8012146f70f643ad67cfe50bcc814ab53 *simplewall.exe
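
One possible way to handle both cases is sketched below in Python. The function name parse_check_line is hypothetical, and the choice to skip lines starting with "#" is an assumption of this sketch, not what the utility does today.

```python
def parse_check_line(line):
    """Parse one check-file line into (digest, filename), or return None.

    Returns None for blank lines and for lines starting with '#'.
    A leading '*' on the file name (binary-mode marker) is dropped.
    """
    line = line.rstrip("\r\n")
    if not line.strip() or line.lstrip().startswith("#"):
        return None
    digest, sep, rest = line.partition(" ")
    if not sep or not rest:
        return None
    # After the digest comes either " name" (text mode) or "*name" (binary mode).
    if rest[0] in (" ", "*"):
        rest = rest[1:]
    return digest, rest
```

Only a single marker character is removed, so a file whose name really begins with an asterisk still survives when the entry is written in text mode (two spaces before the name).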

Trimmed filenames on check

error: 'mobian-installer-pinephone-phosh-20210516.img.g': Cannot open file `mobian-installer-pinephone-phosh-20210516.img.g' in mode `rb' (No such file or directory)

Improve CLI actions

I've grown tired of ddh md5 file. There's nothing wrong with it: as with git, md5 sits where an action would go. But md5 isn't an action; ddh is the utility performing the hashing action. This may even confuse me at times.

Besides confusion, the other factor is flexibility. The current syntax allows only one selected hash per invocation (though optional) and forces it to come first on the command line, which makes invocations like ddh file md5 even more confusing.

That's why the obvious syntax, ddh --md5 file, seems like a more interesting option to me (for 2.0?), though losing the existing syntax may trouble some (or me, who knows).

Pros:

  • Flexibility - May allow for multiple hashes to be used per invocation.
  • Clarity - Clear notation that hash is a type of option, not action.
  • Parsing - CLI will be free of forcing the hash type as the first argument.

Cons:

  • Losing syntax - Losing the current syntax may be a negative thing for some.
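
To make the proposed syntax concrete, here is a small sketch using Python's argparse rather than the project's D code. The HASHES tuple is an illustrative subset and build_parser is a hypothetical name; the point is only that hash selection becomes a repeatable option.

```python
import argparse

# Illustrative subset of hash names; the real list would be longer.
HASHES = ("md5", "sha1", "sha256")

def build_parser():
    """Build a parser where each hash is an option, not a leading action."""
    parser = argparse.ArgumentParser(prog="ddh")
    for name in HASHES:
        # Every hash flag appends its name to the same list, so several
        # hashes can be requested in one invocation.
        parser.add_argument(f"--{name}", dest="hashes",
                            action="append_const", const=name,
                            help=f"compute {name}")
    parser.add_argument("files", nargs="*", help="files to hash")
    return parser
```

With this layout, build_parser().parse_args(["--md5", "--sha256", "file.bin"]) collects both hash names, and the hash option no longer has to come first.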

--duplicates: Find duplicate files

I'm aware there are similar tools that do this, but I'd still like this for this utility.

Usage: --duplicates FOLDER (May assume "." by default?)

On find, both full paths are printed:

* Duplicate found
1: C:\abc\file1.bin
2: C:\Users\dd\file2.bin
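
A minimal sketch of how such a mode could work, written in Python for illustration. The function names and the use of SHA-256 are assumptions of this example; the output format follows the one shown above.

```python
import hashlib
import os
from collections import defaultdict

def file_digest(path, block_size=65536):
    """Hash one file in fixed-size blocks and return the hex digest."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(block_size):
            h.update(chunk)
    return h.hexdigest()

def find_duplicates(folder="."):
    """Return lists of full paths whose contents hash to the same digest."""
    by_digest = defaultdict(list)
    for root, _dirs, files in os.walk(folder):
        for name in files:
            path = os.path.abspath(os.path.join(root, name))
            by_digest[file_digest(path)].append(path)
    return [sorted(paths) for paths in by_digest.values() if len(paths) > 1]

def print_duplicates(groups):
    """Print each group of duplicates with numbered full paths."""
    for paths in groups:
        print("* Duplicate found")
        for index, path in enumerate(paths, 1):
            print(f"{index}: {path}")
```

Grouping by file size before hashing would avoid reading files that cannot match, but that refinement is left out to keep the sketch short.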

Consider supporting SipHash

A few people pitched the idea of supporting SipHash.

As interesting as SipHash is, it is not designed as a general-purpose hashing function, since its goal is to be used as a pseudorandom/MAC function:

As a secure pseudorandom function (a.k.a. keyed hash function), SipHash can also be used as a secure message authentication code (MAC). But SipHash is not a hash in the sense of general-purpose key-less hash function such as BLAKE3 or SHA-3. SipHash should therefore always be used with a secret key in order to be secure.

This means that implementing SipHash with a keyless default would defeat its security and make the implementation pointless, because ddh is meant to stay close to sha512sum(1), openssl dgst, b3sum, and the like.

Consider --parallel

With std.parallelism, this might be a very interesting feature, but it should stay opt-in: I may also use std.parallelism (or something else) for BLAKE2sp/BLAKE2bp, which have yet to be implemented, so combining b2 with --parallel might slow everything down.
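
The opt-in behaviour could look roughly like the following Python sketch, which uses a thread pool as a stand-in for std.parallelism. The names hash_file and hash_files and the parallel parameter are hypothetical.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

def hash_file(path, algorithm="sha256", block_size=65536):
    """Hash one file and return (path, hex digest)."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        while chunk := f.read(block_size):
            h.update(chunk)
    return path, h.hexdigest()

def hash_files(paths, parallel=False, workers=None):
    """Hash several files, one file per worker only when parallel is set."""
    if not parallel:
        return [hash_file(p) for p in paths]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in input order, so output stays stable.
        return list(pool.map(hash_file, paths))
```

Parallelism here is per file, which is separate from the internal parallelism of a tree or multi-lane hash; keeping it behind a switch lets the two be combined or not.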

Murmurhash3 empty sums are not printed

Issue

The Murmurhash3 finish() function returns a variable-length ubyte[] digest, which simply prints as an empty string if the digest's length is 0. Only std.digest.murmurhash does this.

Result: "" on empty files with default seed.

Expected: "00000000" on empty files with default seed.

Solution

Either I copy whatever is in the result to a static buffer and then call toHexString, or I do custom formatting. This will affect Ddh.toHex and Ddh.toBase64.
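
The static-buffer idea can be sketched as follows in Python. The function name to_fixed_hex and the size parameter are hypothetical; the sketch assumes the expected digest size is known from the hash type.

```python
def to_fixed_hex(digest, size):
    """Format digest as hex, always covering exactly `size` bytes.

    The digest is copied into a zero-filled buffer of the expected size,
    so an empty digest still prints as a run of zeros.
    """
    buffer = bytearray(size)          # zero-filled, fixed size
    count = min(len(digest), size)
    buffer[:count] = digest[:count]
    return buffer.hex()
```

For a 4-byte hash, to_fixed_hex(b"", 4) gives "00000000", which matches the expected output described above.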

User-Supplied Key

This one should be easy. Depending on the instantiated object type (the instance is converted to BLAKE2s256Digest, BLAKE2b512Digest, or the future BLAKE3_256Digest according to the current Hash type), a user-supplied key can be passed to the DDH instance using --key (similar to b2sum). All of these definitions have a key function, so a switch-case-style function will do fine.
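
For illustration, the dispatch could resemble this Python sketch built on hashlib's keyed BLAKE2 support. The function name keyed_digest and the algorithm labels are assumptions of this example, not the utility's API.

```python
import hashlib

def keyed_digest(data, algorithm="blake2b", key=b""):
    """Hash data, passing the key only to algorithms that accept one."""
    if algorithm == "blake2b":
        h = hashlib.blake2b(key=key, digest_size=64)
    elif algorithm == "blake2s":
        h = hashlib.blake2s(key=key, digest_size=32)
    else:
        if key:
            raise ValueError(f"{algorithm} does not accept a key")
        h = hashlib.new(algorithm)
    h.update(data)
    return h.hexdigest()
```

Rejecting a key for hashes that cannot use it seems safer than ignoring it silently, though that is a design choice rather than something stated in the issue.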

List: Invalid column order

$ ddh list
Alias         Name          Tag
CRC-32        crc32         CRC32
CRC-64-ISO    crc64iso      CRC64ISO
CRC-64-ECMA   crc64ecma     CRC64ECMA
MD5-128       md5           MD5
RIPEMD-160    ripemd160     RIPEMD160
SHA-1-160     sha1          SHA1
SHA-2-224     sha224        SHA224
SHA-2-256     sha256        SHA256
SHA-2-384     sha384        SHA384
SHA-2-512     sha512        SHA512
SHA-3-224     sha3-224      SHA3_224
SHA-3-256     sha3-256      SHA3_256
SHA-3-384     sha3-384      SHA3_384
SHA-3-512     sha3-512      SHA3_512
SHAKE-128     shake128      SHAKE128
SHAKE-256     shake256      SHAKE256

Should be Name, Alias, then Tag.

Additional aliases for the same hash type

Currently, the option for RIPEMD-160 is --ripemd160. That is a little long, and OpenSSL does have this set to --rmd160.

Another example comes from b2sum: --blake2b512 could gain a shorter --b2b512 alias.

Proposition:

  • Add --rmd160 for --ripemd160.
  • Add --b2s256 for --blake2s256.
  • Add --b2b512 for --blake2b512.
  • Add --mm3a for --murmur3a, maybe.
  • Add --mm3c for --murmur3c, maybe.
  • Add --mm3f for --murmur3f, maybe.
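
One simple way to support these is a lookup table that maps each alias to its canonical name before the usual option handling. The sketch below is in Python, and canonical_name is a hypothetical helper.

```python
# Alias table taken from the proposition above.
ALIASES = {
    "rmd160": "ripemd160",
    "b2s256": "blake2s256",
    "b2b512": "blake2b512",
    "mm3a": "murmur3a",
    "mm3c": "murmur3c",
    "mm3f": "murmur3f",
}

def canonical_name(option):
    """Map an option such as '--rmd160' to its canonical hash name."""
    name = option.lstrip("-").lower()
    return ALIASES.get(name, name)
```

Names that are not aliases pass through unchanged, so existing options keep working.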

Benchmark mode

I believe a benchmark mode would be highly beneficial for general knowledge: it would test the speeds of the various hash/checksum implementations on different processors.

Proposal:

  • CLI opt of --benchmark
  • Use OOP API to avoid inflating binary size and avoid template functions
    • OOP and Template have the same benchmark results, so I'm not worried
    • Don't forget scoped allocation
  • Go through entire hash list
    • At least this tests the most common configurations
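
A rough sketch of such a loop, in Python with hashlib standing in for the project's hash list. The benchmark function, its default algorithm tuple, and the buffer sizes are all assumptions chosen for illustration.

```python
import hashlib
import time

def benchmark(algorithms=("md5", "sha1", "sha256", "sha512"),
              size=1 << 20, rounds=8):
    """Return a dict mapping algorithm name to throughput in MiB/s."""
    data = bytes(size)                      # one reusable zero-filled buffer
    results = {}
    for name in algorithms:
        h = hashlib.new(name)
        start = time.perf_counter()
        for _ in range(rounds):
            h.update(data)
        h.digest()                          # include finalization in the timing
        elapsed = max(time.perf_counter() - start, 1e-9)
        results[name] = (size * rounds) / (1 << 20) / elapsed
    return results
```

Hashing an in-memory buffer keeps disk speed out of the measurement, so the numbers reflect the implementations and the processor only.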

Threaded read-hash structure

  • Waiting on BLAKE2sp and BLAKE2bp variants
  • Waiting on BLAKE3
  • Others: Simple message queue

(read+hashing at the same time, messaging system?)
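
The simple message queue idea could be sketched like this in Python: one thread reads blocks while the caller hashes them. The function name hash_file_threaded and the depth parameter are hypothetical.

```python
import hashlib
import queue
import threading

def hash_file_threaded(path, algorithm="sha256", block_size=65536, depth=4):
    """Hash a file while a separate thread reads it, linked by a queue."""
    blocks = queue.Queue(maxsize=depth)     # bounded, so the reader cannot run far ahead

    def reader():
        try:
            with open(path, "rb") as f:
                while chunk := f.read(block_size):
                    blocks.put(chunk)
        finally:
            blocks.put(None)                # sentinel: no more data, even on error

    thread = threading.Thread(target=reader)
    thread.start()
    h = hashlib.new(algorithm)
    while (chunk := blocks.get()) is not None:
        h.update(chunk)
    thread.join()
    return h.hexdigest()
```

This sketch does not report read errors back to the caller; a fuller version would pass the exception through the queue as well.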

Comparisons of checksums do not work

dd@craptop:/media/dd/DATA/USER/DESKTOP/GAMES/ROMs/PS2$ ddgst --crc32 -A 21cf5560 Champions\ of\ Norrath\ \(USA\).iso 
warning: Entry 'Champions of Norrath (USA).iso' is different
dd@craptop:/media/dd/DATA/USER/DESKTOP/GAMES/ROMs/PS2$ ddgst --crc32 Champions\ of\ Norrath\ \(USA\).iso 
21cf5560  Champions of Norrath (USA).iso

Making BSD/GNU tag names mainlined

Currently the --tag switch outputs OpenSSL compatible tag names like SHA2-512, but GNU coreutils and BSD utilities will use SHA512.

At least in v2.0.1-3-g93c8ed3, both styles are supported when reading check files, but otherwise only the OpenSSL style is used on output.

Though now the question is, do I only add --tag2 for BSD style or make --tag for BSD style and move OpenSSL to --tag2...
