ssdeep's Introduction

SSDEEP

Golang implementation based on the paper and implementation by Jesse Kornblum.

See the example in the app directory for usage.

Tools

For CPU profiling: apt install graphviz

For benchmark comparison: go install golang.org/x/perf/cmd/benchstat@latest

ssdeep's People

Contributors

davidt99, glaslos, kung-foo, neilpa, wanglei-coder

ssdeep's Issues

Verify Distance Calculation

Verify that the distance function is correct - looking at the code, it seems that the implementation is incomplete.

panic: runtime error: integer divide by zero

Hi,

I'm trying to compute the hash of a very small file (28 bytes) and I get the panic below.
Note that this happens only if I force execution with --force:

panic: runtime error: integer divide by zero

github.com/glaslos/ssdeep.(*ssdeepState).processByte(0xc4202f7798, 0x63)
/../src/github.com/glaslos/ssdeep/ssdeep.go:107 +0x2ee
github.com/glaslos/ssdeep.(*ssdeepState).process(0xc4202f7798, 0xc4202f7848)
/../src/github.com/glaslos/ssdeep/ssdeep.go:132 +0xbf
github.com/glaslos/ssdeep.FuzzyReader(0x6fc620, 0xc42040cc20, 0x1c, 0x200, 0xc42024bce0, 0x1c, 0xc42024bc80)
/../src/github.com/glaslos/ssdeep/ssdeep.go:154 +0x2e4
....

It looks like a zero check on blockSize is missing before the modulo on line 107:
106 rh := int(state.rollingState.rollSum())
107 if rh%state.blockSize == (state.blockSize - 1) {
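A minimal sketch of the kind of guard the report asks for. The function name atBlockBoundary and the error value are hypothetical and not part of the library; the sketch only shows the boundary condition quoted above with a check that keeps the modulo from panicking when the block size is zero.

```go
package main

import (
	"errors"
	"fmt"
)

// errZeroBlockSize is a hypothetical error used only in this sketch.
var errZeroBlockSize = errors.New("block size must be greater than zero")

// atBlockBoundary reports whether the rolling hash value rh marks a block
// boundary for the given block size. It uses the same condition as the
// quoted line (rh % blockSize == blockSize-1) but rejects a zero or
// negative block size instead of letting the modulo panic.
func atBlockBoundary(rh, blockSize int) (bool, error) {
	if blockSize <= 0 {
		return false, errZeroBlockSize
	}
	return rh%blockSize == blockSize-1, nil
}

func main() {
	for _, bs := range []int{0, 3} {
		ok, err := atBlockBoundary(5, bs)
		fmt.Printf("blockSize=%d boundary=%v err=%v\n", bs, ok, err)
	}
}
```

Whether the right fix is to return an error or to clamp the block size to a minimum is a design decision for the maintainers; the sketch only demonstrates the check itself.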

Unclear error 'Apples != Grapes' from HashDistance() when comparing a 1344-byte file with a file of 193 to 384 bytes

STEPS

  1. Create a file 1344 bytes long.
  2. Create a file between 193 and 384 bytes long.
  3. Try to calculate HashDistance() for these files:
package main

import (
	"fmt"

	"github.com/glaslos/ssdeep"
)

func main() {
	// Hash the 1344-byte file.
	deepMaingo := ssdeep.NewSSDEEP()
	deepMaingoStr := deepMaingo.Fuzzy("1344text.txt")
	fmt.Printf("'%s'\n", deepMaingoStr)

	// Hash the 193-byte file.
	deep192 := ssdeep.NewSSDEEP()
	deep192Str := deep192.Fuzzy("193.txt")
	fmt.Printf("'%s'\n", deep192Str)

	// Compare the two hashes.
	distance, err := ssdeep.HashDistance(deepMaingoStr, deep192Str)
	fmt.Printf("Distance: '%d' Error: '%s'\n", distance, err)
}

ACTUAL RESULT

block size: 24, file size: 1344, n/bs: 56
'24:tT0T0T0T0T0T0T0T0T0T0T0T0T0T0T0T0T0T0T0T0T0T0T0T0T0T0T0T0T0T0T04:tT0T0T0T0T0T0T0T0T0T0T0T0T0T0T0Y,"1344text.txt"'
block size: 6, file size: 193, n/bs: 32
'6:OWGukOWGukOWGukOWGukOWGukOWGukOWGukOWGukOWGukOWGukOWGukOWGukOWGP:MMMMMMMMMMMMULrOvwOvwOO,"193.txt"'
Distance: '0' Error: 'Apples != Grapes'
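The two hashes above carry block sizes 24 and 6. In the reference ssdeep algorithm, two hashes can only be compared when their block sizes are equal or one is exactly twice the other, so this pair is likely what triggers the error. The following is a sketch of a check that reports the mismatch in plain terms; the function names blockSizeOf and checkComparable are hypothetical and are not part of the library.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// blockSizeOf extracts the leading block size from a hash of the form
// "blocksize:hash1:hash2".
func blockSizeOf(hash string) (int, error) {
	parts := strings.SplitN(hash, ":", 2)
	if len(parts) != 2 {
		return 0, fmt.Errorf("malformed hash %q: missing ':'", hash)
	}
	bs, err := strconv.Atoi(parts[0])
	if err != nil || bs <= 0 {
		return 0, fmt.Errorf("malformed hash %q: bad block size %q", hash, parts[0])
	}
	return bs, nil
}

// checkComparable returns nil when the block sizes of the two hashes are
// equal or differ by a factor of two, and a descriptive error otherwise.
func checkComparable(h1, h2 string) error {
	b1, err := blockSizeOf(h1)
	if err != nil {
		return err
	}
	b2, err := blockSizeOf(h2)
	if err != nil {
		return err
	}
	if b1 == b2 || b1 == 2*b2 || b2 == 2*b1 {
		return nil
	}
	return fmt.Errorf("block sizes %d and %d are not comparable: "+
		"they must be equal or differ by a factor of two", b1, b2)
}

func main() {
	fmt.Println(checkComparable("24:abc:def", "6:abc:def"))
	fmt.Println(checkComparable("6:abc:def", "3:abc:def"))
}
```

An error message along these lines tells the caller why no distance was computed, which is the clarity the report asks for.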

HashDistance() returns unclear error 'Apples != Grapes' for the hash of a 192-byte file

STEPS

  1. Create a file 192 bytes long:
123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012
  2. Call HashDistance() with the hash of some other file:
package main

import (
	"fmt"

	"github.com/glaslos/ssdeep"
)

func main() {
	// Hash an arbitrary file to compare against.
	deepMaingo := ssdeep.NewSSDEEP()
	deepMaingoStr := deepMaingo.Fuzzy("main.go")
	fmt.Printf("deepMaingoStr: '%s'\n", deepMaingoStr)

	// Hash the 192-byte file.
	deep192 := ssdeep.NewSSDEEP()
	deep192Str := deep192.Fuzzy("192.txt")
	fmt.Printf("deep192Str: '%s'\n", deep192Str)

	// Compare the two hashes.
	distance, err := ssdeep.HashDistance(deepMaingoStr, deep192Str)
	fmt.Printf("Distance: '%d' Error: '%s'\n", distance, err)
}

ACTUAL RESULT
Error 'Apples != Grapes' is unclear:

block size: 3, file size: 192, n/bs: 64
deep192Str: '3:OWGukOWGukOWGukOWGukOWGukOWGukOWGukOWGukOWGukOWGukOWGukOWGukOWGU:OWGukOWGukOWGukOWGukOWGukOWGukOw,"192.txt"'
Distance: '0' Error: 'Apples != Grapes'
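The block sizes logged in these reports (3 for 192 bytes, 6 for 193 bytes, 24 for 1344 bytes) are consistent with the initial block size rule of the reference algorithm: start at 3 and double while the block size times 64 is smaller than the input length. A sketch of that rule follows; the names are hypothetical, and the library's own code may differ in detail.

```go
package main

import "fmt"

const (
	minBlockSize  = 3  // smallest block size in the reference algorithm
	spamSumLength = 64 // maximum length of the first hash part
)

// initialBlockSize returns the starting block size for an input of n bytes:
// the smallest value of 3 * 2^k for which blockSize * 64 >= n.
func initialBlockSize(n int) int {
	bs := minBlockSize
	for bs*spamSumLength < n {
		bs *= 2
	}
	return bs
}

func main() {
	for _, n := range []int{192, 193, 384, 385, 1344} {
		fmt.Printf("file size: %d, block size: %d\n", n, initialBlockSize(n))
	}
}
```

Under this rule every file from 193 to 384 bytes gets block size 6, while a 1344-byte file gets 24, which is why that whole range of sizes behaves the same way in the report above.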

Integrity testing

Add more tests to verify the implementation matches the reference implementation.

Make `Force` a configurable option

Hi, is it possible to make Force (line 184 of ssdeep.go) a configurable option? I sometimes need to handle slightly shorter input, but the current behavior returns the error ErrFileTooSmall.

I prefer this:

input := "hello, world"
hash := ssdeep.FuzzyBytes([]byte(input), true /** force continue **/)

Or is there any better way to handle this situation?
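One possible shape for such an option, sketched with hypothetical names only: an options struct carried per call rather than a package-level switch. The threshold minInputSize, the error value, and checkInputSize are assumptions made for illustration and do not describe the library's actual API or limits.

```go
package main

import (
	"errors"
	"fmt"
)

// minInputSize is an assumed threshold for this sketch; the library's
// actual limit may differ.
const minInputSize = 4096

// errInputTooSmall is a hypothetical stand-in for ErrFileTooSmall.
var errInputTooSmall = errors.New("input too small to produce meaningful results")

// options holds per-call settings, so callers can opt in to hashing short
// input without changing global state.
type options struct {
	force bool // hash the input even if it is below minInputSize
}

// checkInputSize decides whether hashing may proceed for an input of n bytes.
func checkInputSize(n int, opts options) error {
	if n < minInputSize && !opts.force {
		return errInputTooSmall
	}
	return nil
}

func main() {
	input := []byte("hello, world")
	fmt.Println(checkInputSize(len(input), options{}))
	fmt.Println(checkInputSize(len(input), options{force: true}))
}
```

A per-call option avoids surprises when several goroutines hash inputs with different requirements, at the cost of a wider function signature.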

Hash mismatch with reference implementation

Hi, thanks for the pure Go library for ssdeep. I was curious whether you have compared results with github.com/dutchcoders/gossdeep. I just did a quick test and the results seem to be slightly different:

"12288:+AxbaNI5pVxkQw3iNjQzTgLNh7EMusK8NiftLV1rq:+HITCchFHtNiftvq" (expected)
"12288:+AxbaNI5pVxkQw3iNjQzTgLNh7EMusK8NiftLV1rqs:+HITCchFHtNiftvqs" (actual)

The first one is from @dutchcoders, the second is from this library.
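When chasing a mismatch like this, it can help to compare the hashes segment by segment instead of by eye. The helper below is a debugging sketch, not part of either library; the name compareHashes is hypothetical. It splits each hash into block size, first part, and second part, and describes how each differing segment differs.

```go
package main

import (
	"fmt"
	"strings"
)

// compareHashes splits two hashes of the form "blocksize:hash1:hash2" and
// returns one description per segment that differs.
func compareHashes(expected, actual string) ([]string, error) {
	e := strings.SplitN(expected, ":", 3)
	a := strings.SplitN(actual, ":", 3)
	if len(e) != 3 || len(a) != 3 {
		return nil, fmt.Errorf("both hashes must have the form blocksize:hash1:hash2")
	}
	names := []string{"block size", "first hash", "second hash"}
	var diffs []string
	for i, name := range names {
		switch {
		case e[i] == a[i]:
			// Segment matches; nothing to report.
		case strings.HasPrefix(a[i], e[i]):
			diffs = append(diffs, fmt.Sprintf("%s: actual has extra suffix %q", name, a[i][len(e[i]):]))
		case strings.HasPrefix(e[i], a[i]):
			diffs = append(diffs, fmt.Sprintf("%s: actual is missing suffix %q", name, e[i][len(a[i]):]))
		default:
			diffs = append(diffs, fmt.Sprintf("%s: %q != %q", name, e[i], a[i]))
		}
	}
	return diffs, nil
}

func main() {
	expected := "12288:+AxbaNI5pVxkQw3iNjQzTgLNh7EMusK8NiftLV1rq:+HITCchFHtNiftvq"
	actual := "12288:+AxbaNI5pVxkQw3iNjQzTgLNh7EMusK8NiftLV1rqs:+HITCchFHtNiftvqs"
	diffs, err := compareHashes(expected, actual)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, d := range diffs {
		fmt.Println(d)
	}
}
```

For the pair quoted above, both hash parts differ only by one extra trailing character in the actual output, which points at how the final block is handled rather than at the bulk of the computation.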

ssdeep output parity on small files

While using ssdeep 2.13 from https://ssdeep-project.github.io/ssdeep/index.html, I receive the following output on small files:

ssdeep,1.1--blocksize:hash:hash,filename
3:I3VOCdKHObbERXsvPUZdIK9LKL9v:IltdBEx5Iv,"/etc/apt/apt.conf"
ssdeep: Did not process files large enough to produce meaningful results

However, when using github.com/glaslos/ssdeep, I receive the following output on small files:

did not process files large enough to produce meaningful results

Can we have an option to also return the fuzzy hash for small files, to reproduce the ssdeep output even if the result is unreliable?

Stable Release

This issue tracks what needs to be done in order to have a stable release.
@glaslos feel free to reject or add items, or to discuss this with me. We are close to having this implementation usable in production.

What we need to do:

  • Create a stateless API for fuzzy file and fuzzy bytes. I feel like the SSDEEP struct doesn't have much purpose.
  • Better documentation.
  • More testing for integrity.
  • Verify that the distance function is correct - looking at the code, it seems that the implementation is incomplete.

Hang if file length is less than 192 bytes

STEPS

  1. Create a file of 191 bytes:
12345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901
  2. Try to execute this code:
deepSig := ssdeep.NewSSDEEP()
sigStr := deepSig.Fuzzy("191.txt")

ACTUAL RESULT

  • program hangs with high CPU usage

Incorrect Hash Outputted on Certain Files

Thanks for the great work on this library! When I run this implementation on a malicious file, I get a slightly different result from the one shown in VirusTotal. I've also compiled the official SSDEEP implementation, and it produces the same result as VirusTotal.

This Implementation: 96:o8kUse54dWD+Kmu2+GOWemu2+GOWemu2+GOWemuDJvNSt+pV2NLiOw4GdlopXh1:o45AgJUEpV2NLW4GdlakpZ8Oda
Virustotal: 96:o8kUse54dWD+Kmu2+GOWemu2+GOWemu2+GOWemuDJvNSt+pV2NLiOw4GdlopXh1r:o45AgJUEpV2NLW4GdlakpZ8Oda

The subtle difference is that the first part of the hash is missing an 'r' at the end of it. I have been debugging this for about two hours, but I can't see any obvious bug occurring, so I won't be able to submit a PR at this time.

I suspect that it might be the way that the blockSize variable is calculated, but that's just a hunch. I tried a bunch of stuff to see if I could fix it but none of it worked.

Attached Zip with password of "infected"
5403252175699968.zip This is a malicious file so please do not execute it. (Malicious VBA script)

Panic for "integer divide by zero" for empty file

STEPS

  1. Create empty file empty.txt.
  2. Try to execute the following code:
deepSig := ssdeep.NewSSDEEP()
sigStr := deepSig.Fuzzy("empty.txt")

ACTUAL RESULT

panic: runtime error: integer divide by zero
goroutine 1 [running]:
github.com/glaslos/ssdeep.(*SSDEEP).Fuzzy(0xc042035e70, 0x4d68ce, 0xb, 0x0, 0x0)
	/GoPath/src/github.com/glaslos/ssdeep/ssdeep.go:137 +0x3f0

Stateless API

Create stateless API for fuzzy file and fuzzy bytes.
Investigate if the SSDEEP struct should still be used.
