tlsh's Introduction

TLSH

Trend Micro Locality Sensitive Hash lib in Golang

Based on https://github.com/trendmicro/tlsh

See paper here: https://github.com/trendmicro/tlsh/blob/master/TLSH_CTC_final.pdf

TLSH is a fuzzy matching library. Given a byte stream with a minimum length of 256 bytes, TLSH generates a hash value which can be used for similarity comparisons. Similar objects will have similar hash values which allows for the detection of similar objects by comparing their hash values. Note that the byte stream should have a sufficient amount of complexity. For example, a byte stream of identical bytes will not generate a hash value.
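The two input constraints above can be checked before hashing. The following is a minimal sketch, not part of this package: `precheck`, `errTooShort` and `errUniform` are hypothetical names, and the "identical bytes" test only covers the one example given above. It is a necessary condition, not the library's actual complexity check.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
)

// minLen is the minimum input length stated in the README.
const minLen = 256

// Hypothetical error values used only in this sketch.
var (
	errTooShort = errors.New("input shorter than 256 bytes")
	errUniform  = errors.New("input consists of a single repeated byte")
)

// precheck rejects inputs that are too short or that consist of one
// repeated byte. Passing this check does not guarantee that a hash
// can be generated; it only filters the obvious failures.
func precheck(data []byte) error {
	if len(data) < minLen {
		return errTooShort
	}
	// data[:1] is safe here because len(data) >= minLen > 0.
	if bytes.Count(data, data[:1]) == len(data) {
		return errUniform
	}
	return nil
}

func main() {
	fmt.Println(precheck([]byte("too short")))
	fmt.Println(precheck(bytes.Repeat([]byte{0x41}, 512)))
	fmt.Println(precheck(bytes.Repeat([]byte("some varied text "), 32)))
}
```

Running such a check first lets a caller distinguish "input unsuitable" from other hashing errors.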

The computed hash is 35 bytes long (output as 70 hexadecimal characters). The first 3 bytes capture information about the file as a whole (length, ...), while the last 32 bytes capture information about incremental parts of the file.
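To make the layout concrete, here is a small sketch that decodes a 70-character hex digest and splits it into the 3-byte header and the 32-byte body. It uses only the standard library; `splitDigest` and the constants are hypothetical helpers for illustration, not functions of this package, and the digest in `main` is a placeholder rather than a real TLSH value.

```go
package main

import (
	"encoding/hex"
	"fmt"
	"strings"
)

const (
	headerLen = 3  // whole-file information (length, ...)
	bodyLen   = 32 // information about incremental parts of the file
	digestLen = headerLen + bodyLen
)

// splitDigest decodes a hex digest and returns its header and body parts.
func splitDigest(s string) (header, body []byte, err error) {
	raw, err := hex.DecodeString(s)
	if err != nil {
		return nil, nil, fmt.Errorf("decode digest: %w", err)
	}
	if len(raw) != digestLen {
		return nil, nil, fmt.Errorf("digest is %d bytes, want %d", len(raw), digestLen)
	}
	return raw[:headerLen], raw[headerLen:], nil
}

func main() {
	// Placeholder input: 70 hex characters, not a real TLSH digest.
	digest := strings.Repeat("ab", digestLen)
	header, body, err := splitDigest(digest)
	if err != nil {
		panic(err)
	}
	fmt.Printf("header: %x (%d bytes)\n", header, len(header))
	fmt.Printf("body:   %x (%d bytes)\n", body, len(body))
}
```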

tlsh's People

Contributors

glaslos, kicdu, melehin, python333

tlsh's Issues

FuzzyReader

Just a thought, but shouldn't FuzzyReader rather be a composite of io.Reader and io.ByteReader?

Hashing Options

TLSH_BUCKETS: selects whether 128 or 256 buckets are used; more buckets is better
TLSH_CHECKSUM_1B: determines the checksum length; a longer checksum means fewer collisions
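One way these two options could surface in Go is as a small configuration struct. Everything below is hypothetical: the `Options` type, its field names and `DigestSize` do not exist in this package. The size formula assumes 2 bits per bucket plus two header bytes besides the checksum, which reproduces the 35-byte digest described in the introduction for 128 buckets and a 1-byte checksum.

```go
package main

import (
	"errors"
	"fmt"
)

// Options is a hypothetical configuration type for this sketch.
type Options struct {
	Buckets     int // 128 or 256
	ChecksumLen int // checksum length in bytes
}

// Validate rejects bucket counts other than 128 and 256 and
// non-positive checksum lengths.
func (o Options) Validate() error {
	if o.Buckets != 128 && o.Buckets != 256 {
		return errors.New("buckets must be 128 or 256")
	}
	if o.ChecksumLen < 1 {
		return errors.New("checksum length must be at least 1 byte")
	}
	return nil
}

// DigestSize returns the digest length in bytes under the assumed layout:
// checksum bytes, two further header bytes, and 2 bits per bucket.
func (o Options) DigestSize() int {
	return o.ChecksumLen + 2 + o.Buckets/4
}

func main() {
	for _, o := range []Options{
		{Buckets: 128, ChecksumLen: 1},
		{Buckets: 256, ChecksumLen: 1},
	} {
		if err := o.Validate(); err != nil {
			panic(err)
		}
		fmt.Printf("%d buckets, %d-byte checksum: %d-byte digest\n",
			o.Buckets, o.ChecksumLen, o.DigestSize())
	}
}
```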

benchcmp bench_1.8.3.test bench_1.9.test

Not sure where the extra allocation in BenchmarkHash-4 is coming from.
benchcmp bench_1.8.3.test bench_1.9.test

benchmark                     old ns/op     new ns/op     delta
BenchmarkPearson-4            0.32          0.29          -9.38%
BenchmarkFillBuckets-4        7545          7416          -1.71%
BenchmarkQuartilePoints-4     2128          1792          -15.79%
BenchmarkHash-4               12653         12651         -0.02%
BenchmarkModDiff-4            0.29          0.29          +0.00%
BenchmarkDigestDistance-4     36.4          36.1          -0.82%
BenchmarkDiffTotal-4          44.7          44.3          -0.89%

benchmark                     old allocs     new allocs     delta
BenchmarkPearson-4            0              0              +0.00%
BenchmarkFillBuckets-4        3              3              +0.00%
BenchmarkQuartilePoints-4     0              0              +0.00%
BenchmarkHash-4               7              8              +14.29%
BenchmarkModDiff-4            0              0              +0.00%
BenchmarkDigestDistance-4     0              0              +0.00%
BenchmarkDiffTotal-4          0              0              +0.00%

benchmark                     old bytes     new bytes     delta
BenchmarkPearson-4            0             0             +0.00%
BenchmarkFillBuckets-4        4197          4197          +0.00%
BenchmarkQuartilePoints-4     0             0             +0.00%
BenchmarkHash-4               4317          4376          +1.37%
BenchmarkModDiff-4            0             0             +0.00%
BenchmarkDigestDistance-4     0             0             +0.00%
BenchmarkDiffTotal-4          0             0             +0.00%

Add documentation

  • Comparison with C implementation
  • How to use the package
  • Some Performance numbers
  • Comparison with SSDEEP

Refactor the API to make it play nicer with other hashes

The current API appears to take a reader and read the entire file itself. This is inefficient compared to other hashes in Go, which offer a Writer interface. With a Writer interface, the caller can feed the same buffers to multiple hashes at once, so the file is read a single time and fanned out to all of them. This matters most for slow readers (e.g. zip file members), for which repeated reading of the same file is too expensive.

Please consider refactoring the API to be more similar to existing hashes and to comply with the standard Hash interface:

https://pkg.go.dev/hash#Hash

I am not that familiar with this hash or if this is even possible for this specific algorithm so please feel free to dismiss this FR :-)
