Golang implementation based on the paper and implementation by Jesse Kornblum.
See the example in the app directory for the usage.
For CPU profiling: apt install graphviz
For benchmark comparison: go install golang.org/x/perf/cmd/benchstat@latest
SSDEEP hash lib in Golang
License: Other
Verify that the distance function is correct - looking at the code, it seems that the implementation is incomplete.
Hi,
I'm trying to compute the hash of a very small file (size 28) and I get this panic error:
Note that this happens only if I force the execution with --force
panic: runtime error: integer divide by zero
github.com/glaslos/ssdeep.(*ssdeepState).processByte(0xc4202f7798, 0x63)
/../src/github.com/glaslos/ssdeep/ssdeep.go:107 +0x2ee
github.com/glaslos/ssdeep.(*ssdeepState).process(0xc4202f7798, 0xc4202f7848)
/../src/github.com/glaslos/ssdeep/ssdeep.go:132 +0xbf
github.com/glaslos/ssdeep.FuzzyReader(0x6fc620, 0xc42040cc20, 0x1c, 0x200, 0xc42024bce0, 0x1c, 0xc42024bc80)
/../src/github.com/glaslos/ssdeep/ssdeep.go:154 +0x2e4
....
It looks like a 0 check is missing
106 rh := int(state.rollingState.rollSum())
107 if rh%state.blockSize == (state.blockSize - 1) {
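A minimal sketch of the kind of zero check the report asks for. The helper name isBlockBoundary and its signature are hypothetical and not part of the library; the sketch only shows the modulo test from line 107 guarded against a zero block size.

```go
package main

import (
	"errors"
	"fmt"
)

// errInvalidBlockSize is a hypothetical error value used only in this sketch.
var errInvalidBlockSize = errors.New("block size must be greater than zero")

// isBlockBoundary (hypothetical helper) mirrors the check on line 107 but
// refuses to take the modulo when blockSize is zero or negative, returning
// an error instead of panicking.
func isBlockBoundary(rh, blockSize int) (bool, error) {
	if blockSize <= 0 {
		return false, errInvalidBlockSize
	}
	return rh%blockSize == blockSize-1, nil
}

func main() {
	// A zero block size now yields an error instead of a runtime panic.
	ok, err := isBlockBoundary(99, 0)
	fmt.Println(ok, err)

	// A valid block size behaves like the original expression.
	ok, err = isBlockBoundary(5, 3)
	fmt.Println(ok, err)
}
```

Whether the caller should return the error or fall back to a minimum block size is a design choice for the maintainers.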
Please add a go.mod file and release a new version so the package can be built.
type ssdeepState struct {
	rollingState rollingState
	blockSize    int64
	hashString1  string
	hashString2  string
	blockHash1   uint32
	blockHash2   uint32
}
STEPS
package main

import (
	"fmt"
	"github.com/glaslos/ssdeep"
)

func main() {
	deepMaingo := ssdeep.NewSSDEEP()
	deepMaingoStr := deepMaingo.Fuzzy("1344text.txt")
	fmt.Printf("'%s'\n", deepMaingoStr)
	deep192 := ssdeep.NewSSDEEP()
	deep192Str := deep192.Fuzzy("193.txt")
	fmt.Printf("'%s'\n", deep192Str)
	distance, err := ssdeep.HashDistance(deepMaingoStr, deep192Str)
	fmt.Printf("Distance: '%d' Error: '%s'\n", distance, err)
}
ACTUAL RESULT
block size: 24, file size: 1344, n/bs: 56
'24:tT0T0T0T0T0T0T0T0T0T0T0T0T0T0T0T0T0T0T0T0T0T0T0T0T0T0T0T0T0T0T04:tT0T0T0T0T0T0T0T0T0T0T0T0T0T0T0Y,"1344text.txt"'
block size: 6, file size: 193, n/bs: 32
'6:OWGukOWGukOWGukOWGukOWGukOWGukOWGukOWGukOWGukOWGukOWGukOWGukOWGP:MMMMMMMMMMMMULrOvwOvwOO,"193.txt"'
Distance: '0' Error: 'Apples != Grapes'
STEPS
123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012
package main

import (
	"fmt"
	"github.com/glaslos/ssdeep"
)

func main() {
	deepMaingo := ssdeep.NewSSDEEP()
	deepMaingoStr := deepMaingo.Fuzzy("main.go")
	fmt.Printf("deepMaingoStr: '%s'\n", deepMaingoStr)
	deep192 := ssdeep.NewSSDEEP()
	deep192Str := deep192.Fuzzy("192.txt")
	fmt.Printf("deep192Str : '%s'\n", deep192Str)
	distance, err := ssdeep.HashDistance(deepMaingoStr, deep192Str)
	fmt.Printf("Distance: '%d' Error: '%s'\n", distance, err)
}
ACTUAL RESULT
Error 'Apples != Grapes' is unclear:
block size: 3, file size: 192, n/bs: 64
deep192Str: '3:OWGukOWGukOWGukOWGukOWGukOWGukOWGukOWGukOWGukOWGukOWGukOWGukOWGU:OWGukOWGukOWGukOWGukOWGukOWGukOw,"192.txt"'
Distance: '0' Error: 'Apples != Grapes'
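One way the error could be made clearer, sketched under an assumption: that 'Apples != Grapes' is returned when the two hashes have incompatible block sizes (24 versus 6 in the first report). In the reference ssdeep, two hashes are only comparable when their block sizes are equal or differ by a factor of two. The function names blockSizeOf and checkComparable are hypothetical and not part of the library.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// blockSizeOf extracts the leading block size from a hash of the form
// "blocksize:hash1:hash2". Hypothetical helper for this sketch.
func blockSizeOf(hash string) (int, error) {
	prefix, _, found := strings.Cut(hash, ":")
	if !found {
		return 0, fmt.Errorf("malformed hash %q: missing ':'", hash)
	}
	bs, err := strconv.Atoi(prefix)
	if err != nil || bs <= 0 {
		return 0, fmt.Errorf("malformed hash %q: invalid block size %q", hash, prefix)
	}
	return bs, nil
}

// checkComparable (hypothetical) returns nil when the block sizes are equal
// or differ by a factor of two, and a descriptive error otherwise.
func checkComparable(h1, h2 string) error {
	b1, err := blockSizeOf(h1)
	if err != nil {
		return err
	}
	b2, err := blockSizeOf(h2)
	if err != nil {
		return err
	}
	if b1 == b2 || b1 == 2*b2 || b2 == 2*b1 {
		return nil
	}
	return fmt.Errorf("incomparable block sizes %d and %d: they must be equal or differ by a factor of two", b1, b2)
}

func main() {
	// Shortened placeholder hashes; only the block size prefix matters here.
	fmt.Println(checkComparable("24:abc:def", "6:abc:def"))
	fmt.Println(checkComparable("6:abc:def", "3:abc:def"))
}
```

An error message that names both block sizes tells the caller why the comparison was refused, which 'Apples != Grapes' does not.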
Why 4096? If the file size is larger than 4096, the block size may be larger than 48.
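For context, a minimal sketch of how the initial block size follows from the input size, assuming the rule used by the reference spamsum/ssdeep implementation: start at 3 and double while block size times 64 is smaller than the input length. The names initialBlockSize, minBlockSize and spamsumLength are chosen for this sketch and are not taken from this library. The reference implementation may additionally halve the block size afterwards if the resulting digest is too short; that step is omitted here.

```go
package main

import "fmt"

const (
	minBlockSize  = 3  // smallest block size in the reference implementation
	spamsumLength = 64 // maximum length of the first hash part
)

// initialBlockSize (name assumed for this sketch) doubles the block size
// until blockSize*spamsumLength is at least the input length n.
func initialBlockSize(n int) int {
	bs := minBlockSize
	for bs*spamsumLength < n {
		bs *= 2
	}
	return bs
}

func main() {
	// The first three sizes are the file sizes quoted in the reports above.
	for _, n := range []int{192, 193, 1344, 4096} {
		fmt.Printf("file size: %d, block size: %d\n", n, initialBlockSize(n))
	}
}
```

Under this rule the sizes 192, 193 and 1344 give block sizes 3, 6 and 24, matching the debug output quoted in the reports, and an input of 4096 bytes already starts at block size 96.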
Add more tests to verify the implementation matches the reference implementation.
Hi, is it possible to make Force at line 184 in the file ssdeep.go a configurable option? Sometimes I need to handle some slightly shorter input, but now this behavior throws the error ErrFileTooSmall.
I prefer this:
input := "hello, world"
hash := ssdeep.FuzzyBytes([]byte(input), true /** force continue **/)
Or is there any better way to handle this situation?
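A rough sketch of how such a switch could look if it were passed as an option rather than a positional boolean. Everything here is hypothetical: the options struct, checkInputSize, errInputTooSmall and the 4096-byte threshold are assumptions made for illustration and do not describe the library's actual API.

```go
package main

import (
	"errors"
	"fmt"
)

// minInputSize is an assumed threshold for this sketch only.
const minInputSize = 4096

// errInputTooSmall is a hypothetical stand-in for ErrFileTooSmall.
var errInputTooSmall = errors.New("input too small to produce meaningful results")

// options is a hypothetical settings struct; force disables the size check.
type options struct {
	force bool
}

// checkInputSize (hypothetical) rejects short inputs unless force is set.
// Empty input is always rejected because no block size can be derived.
func checkInputSize(n int, opts options) error {
	if n == 0 {
		return errors.New("empty input")
	}
	if n < minInputSize && !opts.force {
		return errInputTooSmall
	}
	return nil
}

func main() {
	input := []byte("hello, world")
	fmt.Println(checkInputSize(len(input), options{}))
	fmt.Println(checkInputSize(len(input), options{force: true}))
}
```

A struct leaves room for further settings later without changing every call site, which a bare boolean parameter does not.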
Hi, thanks for the pure Go library for ssdeep. I was curious whether you have compared results with github.com/dutchcoders/gossdeep. I just did a quick test and it seems like the results are slightly different:
"12288:+AxbaNI5pVxkQw3iNjQzTgLNh7EMusK8NiftLV1rq:+HITCchFHtNiftvq" (expected)
"12288:+AxbaNI5pVxkQw3iNjQzTgLNh7EMusK8NiftLV1rqs:+HITCchFHtNiftvqs" (actual)
First one is from @dutchcoders, second is from yours.
Create a Hash-compatible object for ongoing hashing.
The current version on pkg.go doesn't have this problem, but a clone of this repo does.
While using ssdeep 2.13 from https://ssdeep-project.github.io/ssdeep/index.html, I receive the following output on small files:
ssdeep,1.1--blocksize:hash:hash,filename
3:I3VOCdKHObbERXsvPUZdIK9LKL9v:IltdBEx5Iv,"/etc/apt/apt.conf"
ssdeep: Did not process files large enough to produce meaningful results
However, when using github.com/glaslos/ssdeep, I receive the following output on small files:
did not process files large enough to produce meaningful results
Can we have an option to also return the fuzzy hash on small files to reproduce ssdeep output, regardless if it results in unreliable output?
This issue tracks what needs to be done in order to have a stable release.
@glaslos feel free to reject or add items, or to discuss this with me. We are close to having this implementation usable for production.
What we need to do:
SSDEEP struct doesn't have much purpose.
STEPS
12345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901
deepSig := ssdeep.NewSSDEEP()
sigStr := deepSig.Fuzzy("191.txt")
ACTUAL RESULT
Thanks for the great work on this library! One of the issues that I'm seeing is that when I run this implementation on a malicious file, I'm seeing slightly different results than what I see in VirusTotal. I've also compiled the official SSDEEP implementation, and they also show the same result as what VT shows.
This Implementation: 96:o8kUse54dWD+Kmu2+GOWemu2+GOWemu2+GOWemuDJvNSt+pV2NLiOw4GdlopXh1:o45AgJUEpV2NLW4GdlakpZ8Oda
Virustotal: 96:o8kUse54dWD+Kmu2+GOWemu2+GOWemu2+GOWemuDJvNSt+pV2NLiOw4GdlopXh1r:o45AgJUEpV2NLW4GdlakpZ8Oda
The subtle difference is that the first part of the hash is missing an 'r' at the end of it. I have been debugging this for about two hours, but I can't see any obvious bug occurring, so I won't be able to submit a PR at this time.
I suspect that it might be the way that the blockSize variable is calculated, but that's just a hunch. I tried a bunch of stuff to see if I could fix it but none of it worked.
Attached Zip with password of "infected"
5403252175699968.zip This is a malicious file so please do not execute it. (Malicious VBA script)
STEPS
deepSig := ssdeep.NewSSDEEP()
sigStr := deepSig.Fuzzy("empty.txt")
ACTUAL RESULT
panic: runtime error: integer divide by zero
goroutine 1 [running]:
github.com/glaslos/ssdeep.(*SSDEEP).Fuzzy(0xc042035e70, 0x4d68ce, 0xb, 0x0, 0x0)
/GoPath/src/github.com/glaslos/ssdeep/ssdeep.go:137 +0x3f0
Create stateless API for fuzzy file and fuzzy bytes.
Investigate if the SSDEEP struct should still be used.