
tlsh's People

Contributors

anthraxx, carbureted, cgull, chuncheng, daniels-cysiv, ddeka2910, dkapps, dortegau, glaslos, hydradragonantivirus, jaysonpryde, jonjoliver, lamby, lunar-debian, mapreri, mknjc, mrpolyonymous, rafiot, robertlayton, russbaz, scott4man, sergiosvieira, sschuberth, tkeiser-z, vichargrave, yan99ui

tlsh's Issues

Python library broken

a1c58fd breaks the python module:

building 'tlsh' extension
x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fno-strict-aliasing -Wdate-time -D_FORTIFY_SOURCE=2 -g -fstack-protector-strong -Wformat -Werror=format-security -fPIC -I/home/raphael/gits/AIL-framework/tlsh/include -I/usr/include/python2.7 -c tlshmodule.cpp -o build/temp.linux-x86_64-2.7/tlshmodule.o
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
tlshmodule.cpp: In function ‘PyObject* Tlsh_fromTlshStr(tlsh_TlshObject*, PyObject*)’:
tlshmodule.cpp:226:16: error: ‘TLSH_STRING_LEN’ was not declared in this scope
     if (len != TLSH_STRING_LEN) {
                ^
tlshmodule.cpp: In function ‘PyObject* Tlsh_update(tlsh_TlshObject*, PyObject*)’:
tlshmodule.cpp:253:31: error: ‘MIN_DATA_LENGTH’ was not declared in this scope
     if (self->required_data < MIN_DATA_LENGTH) {
                               ^
tlshmodule.cpp: In function ‘PyObject* Tlsh_final(tlsh_TlshObject*)’:
tlshmodule.cpp:269:31: error: ‘MIN_DATA_LENGTH’ was not declared in this scope
     if (self->required_data < MIN_DATA_LENGTH) {
                               ^
tlshmodule.cpp: In function ‘PyObject* Tlsh_hexdigest(tlsh_TlshObject*)’:
tlshmodule.cpp:281:15: error: ‘TLSH_STRING_LEN’ was not declared in this scope
     char hash[TLSH_STRING_LEN + 1];
               ^
tlshmodule.cpp:287:24: error: ‘hash’ was not declared in this scope
     self->tlsh.getHash(hash, TLSH_STRING_LEN + 1);
                        ^
tlshmodule.cpp: In function ‘PyObject* Tlsh_diff(tlsh_TlshObject*, PyObject*)’:
tlshmodule.cpp:321:18: error: ‘TLSH_STRING_LEN’ was not declared in this scope
       if (len != TLSH_STRING_LEN) {
                  ^
error: command 'x86_64-linux-gnu-gcc' failed with exit status 1

Please tag releases

I would like to bring up #11 again to highlight the need for release tags for every stable release.
If releases are not tagged, Linux distribution package maintainers will not be aware of updates, and old versions of tlsh end up being shipped to all users.

E.g., Debian Buster is now in freeze and will ship with tlsh 3.4.4:
https://packages.debian.org/buster/tlsh-tools

allow experimentation of the parameters in the LSH function

Pages 2, 3 and 4 of https://github.com/trendmicro/tlsh/blob/master/TLSH_CTC_final.pdf contain various "magic" parameter settings.

One value (a distance of 6 for the case of one digest having a bucket value of 0 and the second digest having a value of 3 in the same bucket) was justified using the binomial distribution in section III of the paper.

So that everyone knows: these values were initially chosen by optimizing ROC curves and AUCs.
I am preparing a paper that repeats these experiments.

It would be useful if researchers could experiment with the various distance values via a command-line parameter.
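
To make the request concrete, here is a minimal sketch of what a tunable bucket distance could look like. It is not the library's code: the function names and the `far_penalty` parameter are hypothetical, and scoring smaller differences as the difference itself is an assumption. Only the value 6 for the 0-versus-3 case comes from the text above.

```python
def bucket_distance(a: int, b: int, far_penalty: int = 6) -> int:
    """Distance contribution of one pair of bucket values (each in 0..3)."""
    if not (0 <= a <= 3 and 0 <= b <= 3):
        raise ValueError("bucket values must be in 0..3")
    diff = abs(a - b)
    # A difference of 3 (0 versus 3) gets the tunable penalty; the issue cites 6.
    # Assumption: any smaller difference is scored as the difference itself.
    return far_penalty if diff == 3 else diff


def body_distance(x, y, far_penalty: int = 6) -> int:
    """Sum the per-bucket distances of two equally long bucket sequences."""
    if len(x) != len(y):
        raise ValueError("bucket sequences must have the same length")
    return sum(bucket_distance(a, b, far_penalty) for a, b in zip(x, y))


# Compare the same pair of toy bucket sequences under two penalty settings.
for penalty in (6, 4):
    print(penalty, body_distance([0, 1, 2, 3], [3, 1, 0, 3], far_penalty=penalty))
```

Exposing the penalty as a parameter is what would let an experiment sweep it and recompute ROC curves for each setting.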

tlsh segfault on large directory

TLSH version: 3.4.5 compact hash, 1 byte checksum

Directory contains 230,781 files. The system has 64GB of memory.

$ ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 20
file size (blocks, -f) unlimited
pending signals (-i) 516130
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 999999
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 516130
virtual memory (kbytes, -v) 9999999999
file locks (-x) unlimited

$ ./bin/tlsh_unittest -r /mnt/data
Segmentation fault

strace info:

mmap(NULL, 852165308416, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Cannot allocate memory)
brk(0xc66ab4c000)                       = 0x1b22000
mmap(NULL, 852165439488, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Cannot allocate memory)
mmap(NULL, 134217728, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x7f403d0fd000
munmap(0x7f403d0fd000, 49295360)        = 0
munmap(0x7f4044000000, 17813504)        = 0
mprotect(0x7f4040000000, 135168, PROT_READ|PROT_WRITE) = 0
mmap(NULL, 852165308416, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Cannot allocate memory)

gdb info:

Program received signal SIGSEGV, Segmentation fault.
0x0000000000401bfa in recursive_read_files_from_dir(char*, FileName*, int, int*) ()

Questions:

  1. Can this error condition be handled gracefully?
  2. Can the memory limit be raised?

all tests failing on build

Any idea why the tests are failing or how to easily debug it? Are the tests working on your machine?

Running tests...
Test project /build/tlsh-3.4.1/build/release
    Start 1: tlsh_unittest_len
1/2 Test #1: tlsh_unittest_len ................***Failed    0.31 sec
    Start 2: tlsh_unittest_xlen
2/2 Test #2: tlsh_unittest_xlen ...............***Failed    0.17 sec

0% tests passed, 2 tests failed out of 2

Total Test time (real) =   0.48 sec

The following tests FAILED:
      1 - tlsh_unittest_len (Failed)
      2 - tlsh_unittest_xlen (Failed)
Errors while running CTest
Makefile:72: recipe for target 'test' failed
make: *** [test] Error 8

Python TypeError on example?

I cloned the repo, built and installed the Python extension and tried the example.

The README says:

For example, tlsh.hash(str(os.urandom(256))), should always generate a hash.

Now, I wrote this tiny script:

import tlsh
import os

hash = tlsh.hash(str(os.urandom(256)))
print(hash)

And instead of "always generating a hash", it always generates this sweet TypeError for me:

Traceback (most recent call last):
  File "test.py", line 4, in <module>
    hash = tlsh.hash(str(os.urandom(256)))
TypeError: a bytes-like object is required, not 'str' 

Removing the str works and generates a hash for me, but yeah, this isn't exactly how the README says it should work. Perhaps someone forgot to update the readme?

Runtime is Python 3.6.8 (64-Bit) on Ubuntu 18.04.
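
For anyone hitting the same error, a small sketch of the difference under Python 3. The helper `as_bytes` is hypothetical (not part of the extension), and the import is guarded because the module may not be installed; `tlsh.hash` is the call quoted from the README above.

```python
import os


def as_bytes(data, encoding="utf-8"):
    """Return bytes-like input as bytes; encode str input first."""
    if isinstance(data, str):
        return data.encode(encoding)
    return bytes(data)


raw = os.urandom(256)   # bytes
wrapped = str(raw)      # Python 3: the repr text "b'...'", which is a str
print(type(raw).__name__, type(wrapped).__name__)

try:
    import tlsh  # the extension discussed in this issue; may be absent
except ImportError:
    tlsh = None

if tlsh is not None:
    # Passing bytes avoids the TypeError reported above.
    print(tlsh.hash(raw))
```

Under Python 2, `str` and `bytes` were the same type, which would explain why the README example was written that way.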

Python Extension Testing Error

When I run ./python_test.sh, it gives the following error:

python ../py_ext/tlsh_digest.py -force example_data/small.txt > tmp/py_small.tlsh
diff tmp/py_small.tlsh exp/small.tlsh_EXP
diff: exp/small.tlsh_EXP: No such file or directory
error: diff tmp/py_small.tlsh exp/small.tlsh_EXP

It seems that the testing validation file in exp/ is missing.

TLSH_CHECKSUM_LEN MIN_DATA_LENGTH

How do I configure parameters like these? I found many parameters whose values I do not know. It would help to provide a single header file for configuring them.

python3.5.0 incompatibility

Would it be possible to make this library functional for python3?
The python2 tests are running fine, however if I invoke it with python 3.5.0, I get a ValueError:

python3:

# python test.py ../Testing/example_data/2005NISSE.txt ../Testing/example_data/1english-only.txt
tlsh.hash hex1 06D29517F780237185070293B60E36FAB735C0F833D66460688DA22D6756E751B7BAEB
tlsh.hash hex2 E951784702042376169012B1BA5A76EAF36092FC3311A595B4856235278F9F973763EF
tlsh.diff(hex1, hex2) 427
tlsh.diff(hex2, hex1) 427
tlsh.Tlsh hex1 06D29517F780237185070293B60E36FAB735C0F833D66460688DA22D6756E751B7BAEB
tlsh.Tlsh hex2 E951784702042376169012B1BA5A76EAF36092FC3311A595B4856235278F9F973763EF
h1.diff(h2) 427
h2.diff(h1) 427
h1.diff(hex2) 427
h2.diff(hex1) 427
Traceback (most recent call last):
  File "test.py", line 39, in <module>
    h3.fromTlshStr(hex2)
ValueError: argument is not a TLSH hex string
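
While debugging this, it can help to check the argument independently of the extension. The sketch below is a hypothetical helper, not part of tlsh. It assumes a digest is 70 hex characters, which is simply the length of the digests printed above, and it accepts both str and bytes so that a str/bytes mismatch (one possible cause under Python 3, not confirmed here) can be ruled out.

```python
import string

HEX_DIGITS = set(string.hexdigits)


def looks_like_tlsh_hex(value, expected_len=70):
    """Return True if value is a hex string of the expected length."""
    if isinstance(value, (bytes, bytearray)):
        try:
            value = value.decode("ascii")
        except UnicodeDecodeError:
            return False
    if not isinstance(value, str):
        return False
    return len(value) == expected_len and all(c in HEX_DIGITS for c in value)


hex2 = "E951784702042376169012B1BA5A76EAF36092FC3311A595B4856235278F9F973763EF"
print(looks_like_tlsh_hex(hex2))                  # checked as str
print(looks_like_tlsh_hex(hex2.encode("ascii")))  # checked as bytes
```

If both checks pass but fromTlshStr still raises, the problem is more likely in how the extension parses its argument than in the string itself.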

How to compute the distance for two hash strings C++

Using the C++ library, I have calculated two hashes and assigned them to strings like this:

std::string hash1 = "5582932E7B4443F206C202A16A4F6CDFE32AD5B9722E11542859C15D236FE35C3BFAD9";
std::string hash2 = "C782932E7B4443F205C203A16A4B6CDFE32AD4BDB23A11546859C15D236BE35C3BFAD9";

What methods does Tlsh offer to find the distance score between two hashes? Could anyone help me with this?

install error. remove dependency on GNUInstallDirs

Lala reported the following error

[piranha@ers-tools Library-tlsh]$ ./make.sh
rm -rf build
cmake -DTLSH_CHECKSUM_1B=1 ../..
-- The C compiler identification is GNU
-- The CXX compiler identification is GNU
-- Check for working C compiler: /usr/bin/gcc
-- Check for working C compiler: /usr/bin/gcc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
CMake Error at src/CMakeLists.txt:74 (include):
include could not find load file:

GNUInstallDirs

CMake Error at src/CMakeLists.txt:78 (install):
install TARGETS given no ARCHIVE DESTINATION for static library target
"tlsh".

CMake Error at src/CMakeLists.txt:81 (install):
install FILES given no DESTINATION!

Error in building Python Ext in Mojave

Hi,

I'm having a problem building the project on my machine. Below are a description of my environment and the error I get after executing python setup.py build.

  • conda 4.6.14
  • conda 3.17.8
  • python 3.7.3.final.0
  • osx-64
  • mojave 10.14.4

running build
running build_ext
building 'tlsh' extension
gcc -Wno-unused-result -Wsign-compare -Wunreachable-code -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -I/anaconda3/include -arch x86_64 -I/anaconda3/include -arch x86_64 -I/Users/romcabral/Documents/trend/TEMP/tlsh/include -I/anaconda3/include/python3.7m -c tlshmodule.cpp -o build/temp.macosx-10.7-x86_64-3.7/tlshmodule.o -DBUCKETS_128
warning: include path for stdlibc++ headers not found; pass '-stdlib=libc++' on
the command line to use the libc++ standard library instead
[-Wstdlibcxx-not-found]
tlshmodule.cpp:148:5: warning: suggest braces around initialization of subobject
[-Wmissing-braces]
PyObject_HEAD_INIT(NULL)
^~~~~~~~~~~~~~~~~~~~~~~~
/anaconda3/include/python3.7m/object.h:87:5: note: expanded from macro
'PyObject_HEAD_INIT'
1, type },
^~~~~~~
2 warnings generated.
gcc -Wno-unused-result -Wsign-compare -Wunreachable-code -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -I/anaconda3/include -arch x86_64 -I/anaconda3/include -arch x86_64 -I/Users/romcabral/Documents/trend/TEMP/tlsh/include -I/anaconda3/include/python3.7m -c /Users/romcabral/Documents/trend/TEMP/tlsh/src/tlsh.cpp -o build/temp.macosx-10.7-x86_64-3.7/Users/romcabral/Documents/trend/TEMP/tlsh/src/tlsh.o -DBUCKETS_128
warning: include path for stdlibc++ headers not found; pass '-stdlib=libc++' on
the command line to use the libc++ standard library instead
[-Wstdlibcxx-not-found]
1 warning generated.
gcc -Wno-unused-result -Wsign-compare -Wunreachable-code -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -I/anaconda3/include -arch x86_64 -I/anaconda3/include -arch x86_64 -I/Users/romcabral/Documents/trend/TEMP/tlsh/include -I/anaconda3/include/python3.7m -c /Users/romcabral/Documents/trend/TEMP/tlsh/src/tlsh_impl.cpp -o build/temp.macosx-10.7-x86_64-3.7/Users/romcabral/Documents/trend/TEMP/tlsh/src/tlsh_impl.o -DBUCKETS_128
warning: include path for stdlibc++ headers not found; pass '-stdlib=libc++' on
the command line to use the libc++ standard library instead
[-Wstdlibcxx-not-found]
/Users/romcabral/Documents/trend/TEMP/tlsh/src/tlsh_impl.cpp:351:19: warning:
unused variable 'r' [-Wunused-variable]
unsigned char r;
^
2 warnings generated.
gcc -Wno-unused-result -Wsign-compare -Wunreachable-code -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -I/anaconda3/include -arch x86_64 -I/anaconda3/include -arch x86_64 -I/Users/romcabral/Documents/trend/TEMP/tlsh/include -I/anaconda3/include/python3.7m -c /Users/romcabral/Documents/trend/TEMP/tlsh/src/tlsh_util.cpp -o build/temp.macosx-10.7-x86_64-3.7/Users/romcabral/Documents/trend/TEMP/tlsh/src/tlsh_util.o -DBUCKETS_128
warning: include path for stdlibc++ headers not found; pass '-stdlib=libc++' on
the command line to use the libc++ standard library instead
[-Wstdlibcxx-not-found]
1 warning generated.
g++ -bundle -undefined dynamic_lookup -L/anaconda3/lib -arch x86_64 -L/anaconda3/lib -arch x86_64 -arch x86_64 build/temp.macosx-10.7-x86_64-3.7/tlshmodule.o build/temp.macosx-10.7-x86_64-3.7/Users/romcabral/Documents/trend/TEMP/tlsh/src/tlsh.o build/temp.macosx-10.7-x86_64-3.7/Users/romcabral/Documents/trend/TEMP/tlsh/src/tlsh_impl.o build/temp.macosx-10.7-x86_64-3.7/Users/romcabral/Documents/trend/TEMP/tlsh/src/tlsh_util.o -o build/lib.macosx-10.7-x86_64-3.7/tlsh.cpython-37m-darwin.so
clang: warning: libstdc++ is deprecated; move to libc++ with a minimum deployment target of OS X 10.9 [-Wdeprecated]
ld: library not found for -lstdc++
clang: error: linker command failed with exit code 1 (use -v to see invocation)
error: command 'g++' failed with exit status 1

About hash string header bytes

Hi, the hash begins with 6 hex characters (half-bytes) holding the checksum, Lvalue and quartile fields: 3 bytes in total.
I have 2 questions:
1. Why is the checksum a byte array? It is just a single byte.
2. Why are the header bytes swapped? Historical reasons?

Regards
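
To illustrate the question, a minimal sketch that splits the first 6 hex characters into three header bytes and shows what a nibble swap of one byte looks like. The function names are hypothetical, and the field order (checksum, Lvalue, quartile) follows the description in this issue. Whether each field is stored swapped in the real digest is not confirmed here.

```python
def swap_nibbles(byte: int) -> int:
    """Exchange the high and low 4 bits of a single byte."""
    return ((byte & 0x0F) << 4) | ((byte & 0xF0) >> 4)


def split_header(digest_hex: str):
    """Return the three raw header bytes taken from the first 6 hex chars."""
    header = bytes.fromhex(digest_hex[:6])
    if len(header) != 3:
        raise ValueError("need at least 6 hex characters")
    checksum, lvalue, quartile = header  # assumed field order
    return checksum, lvalue, quartile


fields = split_header("5582932E7B44")
print([f"0x{b:02X}" for b in fields])
print([f"0x{swap_nibbles(b):02X}" for b in fields])
```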

missing git release tags

Hey, first: thanks for this great tool.

I wanted to ask if it would be possible to push git tags for released versions. It seems that you do proper versioning of the tool (according to the Changelog and CMakeLists.txt) but never push the git tags to those commits "releasing" a new version.

It would make my life a lot easier when packaging your software for a distribution and keeping track of newly released updates. I would highly appreciate it if you pushed the tags and kept them in sync in the future.

sincerely,
anthraxx

calculate triplets

Hi!

In the article TLSH_CTC_final.pdf I read about triplet selection:
"We selected 6 triplets of the 10 possible..."
A B C
A B D
A B E
A C E
A D E
where A is the first byte in the sliding window...

However, when I started to analyze how the digest body is created, I found that in the update function the bytes are chosen in reverse order. For example, take the first 5 bytes of a typical ELF file (0x7F 0x45 0x4C 0x46 0x01).
The update function uses these trigrams:
0x01 0x46 0x4C (E D C)
0x01 0x46 0x45 (E D B)
0x01 0x4c 0x45 (E C B)
0x01 0x4c 0x7f (E C A)
0x01 0x46 0x7f (E D A)
0x01 0x45 0x7f (E B A)

Can you explain whether the code works correctly, or have I misunderstood something?
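
The observation can be reproduced with a short sketch. This is not the library's update function. It only enumerates the six trigrams in the order listed in this issue, using hypothetical names, with offsets counted back from the newest byte of a 5-byte window (0 = E, 4 = A).

```python
# Offsets back from the newest byte, in the order reported in this issue.
TRIPLET_OFFSETS = [
    (0, 1, 2),  # E D C
    (0, 1, 3),  # E D B
    (0, 2, 3),  # E C B
    (0, 2, 4),  # E C A
    (0, 1, 4),  # E D A
    (0, 3, 4),  # E B A
]


def window_triplets(window: bytes):
    """Return the six trigrams for a 5-byte window given oldest byte first."""
    if len(window) != 5:
        raise ValueError("window must contain exactly 5 bytes")
    newest_first = window[::-1]
    return [tuple(newest_first[i] for i in offsets) for offsets in TRIPLET_OFFSETS]


elf_start = bytes([0x7F, 0x45, 0x4C, 0x46, 0x01])
for triplet in window_triplets(elf_start):
    print(" ".join(f"0x{b:02X}" for b in triplet))
```

Written this way, every trigram contains the newest byte E plus two of the four earlier bytes, which gives exactly six combinations.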

very slow when you do a -r tlsh on a large directory structure

Have an option to output TLSH digests as the directories are processed.
Avoid the qsort().
The qsort is there for good reason: directories processed on different operating systems must be traversed in the same order to get reproducible output.

Having an option to process more quickly, in a way that may not be reproducible, is OK
(the part that will differ between operating systems is the order of the files; the TLSH digest will be the same for each file).
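
The trade-off can be sketched in a few lines. This is an illustration in Python with a hypothetical function name, not the tool's C++ implementation: a generator yields paths as the walk proceeds, and a flag chooses between a sorted, reproducible order and whatever order the OS returns.

```python
import os


def iter_files(root, reproducible=True):
    """Yield file paths under root while walking, without building a full list."""
    for dirpath, dirnames, filenames in os.walk(root):
        if reproducible:
            # Sorting in place fixes both the descent order and the file order
            # within each directory, one directory at a time.
            dirnames.sort()
            filenames.sort()
        for name in filenames:
            yield os.path.join(dirpath, name)
```

Because paths are yielded one by one, a digest could be printed for each file as soon as it is read. Sorting per directory keeps the order deterministic without one global sort over every file name.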

support for input less than 256 bytes

Does tlsh only support input longer than 256 bytes? As 256 bytes is a fairly large number of characters, does it support short sentences like "hello world, hello cat, hello dog" as the input string?
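
For reference, a quick sketch of how far the example sentence falls short of the limit mentioned above. The helper name is hypothetical, and the 256-byte threshold is taken from this issue rather than read from the library.

```python
MIN_BYTES = 256  # threshold quoted in this issue


def long_enough(text: str, min_bytes: int = MIN_BYTES, encoding: str = "utf-8") -> bool:
    """Return True if the encoded text reaches the minimum byte length."""
    return len(text.encode(encoding)) >= min_bytes


sample = "hello world, hello cat, hello dog"
print(len(sample.encode("utf-8")), long_enough(sample))
```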

undefined references in tlsh_unittest.cpp

Building fails at:
Scanning dependencies of target tlsh
[ 5%] Building CXX object src/CMakeFiles/tlsh.dir/tlsh.cpp.o
[ 10%] Building CXX object src/CMakeFiles/tlsh.dir/tlsh_impl.cpp.o
[ 15%] Building CXX object src/CMakeFiles/tlsh.dir/tlsh_util.cpp.o
[ 20%] Building CXX object src/CMakeFiles/tlsh.dir/input_desc.cpp.o
[ 25%] Building CXX object src/CMakeFiles/tlsh.dir/shared_file_functions.cpp.o
[ 30%] Linking CXX static library ../../../lib/libtlsh.a
[ 30%] Built target tlsh
Scanning dependencies of target tlsh_unittest
[ 35%] Building CXX object test/CMakeFiles/tlsh_unittest.dir/tlsh_unittest.cpp.o
[ 40%] Linking CXX executable ../../../bin/tlsh_unittest
/opt/binutils-2.32/bin/ld: CMakeFiles/tlsh_unittest.dir/tlsh_unittest.cpp.o: in function `trendLSH_ut(char*, char*, char*, int, int, char*, char*, int, bool, int, int, int, int, char*)':
tlsh_unittest.cpp:(.text+0x39b): undefined reference to `operator new(unsigned long)'
/opt/binutils-2.32/bin/ld: tlsh_unittest.cpp:(.text+0x42f): undefined reference to `operator delete(void*, unsigned long)'
/opt/binutils-2.32/bin/ld: tlsh_unittest.cpp:(.text+0x45b): undefined reference to `operator delete(void*, unsigned long)'
/opt/binutils-2.32/bin/ld: tlsh_unittest.cpp:(.text+0x8f0): undefined reference to `operator delete(void*, unsigned long)'
/opt/binutils-2.32/bin/ld: CMakeFiles/tlsh_unittest.dir/tlsh_unittest.cpp.o: in function `trendLSH_ut(char*, char*, char*, int, int, char*, char*, int, bool, int, int, int, int, char*) [clone .cold]':

I couldn't find a solution. The CMake version is 3.10.2.

Thanks!

Clustering

Hey,
Sorry for my ignorance, but is there any way to use TLSH for clustering similar files?
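
One simple possibility, sketched under stated assumptions: any pairwise distance can drive a greedy threshold clustering. The function below is hypothetical and generic, not something shipped with TLSH. With the Python extension one might pass a digest distance such as the tlsh.diff call that appears in other issues on this page. The demo uses plain numbers so it runs without the module.

```python
def greedy_clusters(items, distance, threshold):
    """Assign each item to the first cluster whose representative is close enough."""
    clusters = []  # each cluster is a list; its first element is the representative
    for item in items:
        for cluster in clusters:
            if distance(cluster[0], item) <= threshold:
                cluster.append(item)
                break
        else:
            # No existing representative is within the threshold.
            clusters.append([item])
    return clusters


# Stand-in data and distance, only to show the mechanics.
print(greedy_clusters([1, 2, 10, 11, 30], lambda a, b: abs(a - b), threshold=3))
```

The result depends on input order and on the chosen threshold, so this is a starting point rather than a tuned method.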

check for negative pointer

In utils/rand_tags.cpp, the pointer ndistinct_tags is checked for being negative or zero:

static void rhtml_contents(std::string &htmls, int *ntags, int *ndistinct_tags)
{
       if ((*ntags <= 0) && (ndistinct_tags <= 0))

Perhaps it should be compared with == NULL instead?

add tlsh_pattern - a program that matches against a pattern file

This program should read a pattern file with the following columns:
col 1: pattern number
col 2: nitems in group
col 3: TLSH
col 4: radius
col 5: pattern label

Input options should match the tlsh program:
usage: tlsh_pattern [-xlen] [-force] -pat pattern_file -f file
     : tlsh_pattern [-xlen] [-force] -pat pattern_file -d digest
     : tlsh_pattern [-xlen] [-force] -pat pattern_file -r dir
     : tlsh_pattern [-xlen] [-force] -pat pattern_file -l listfile
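
A rough sketch of the matching logic, to pin down the intended behaviour. Several details are assumptions not stated above: tab-separated columns, an integer radius, and returning the closest pattern whose radius covers the digest. The names are hypothetical and the distance function is passed in, so the sketch does not depend on any particular TLSH binding.

```python
from typing import Callable, Iterable, NamedTuple, Optional


class Pattern(NamedTuple):
    number: int
    nitems: int
    tlsh: str
    radius: int
    label: str


def parse_pattern_line(line: str, sep: str = "\t") -> Pattern:
    """Parse one pattern-file row with the five columns listed above."""
    number, nitems, digest, radius, label = line.rstrip("\n").split(sep, 4)
    return Pattern(int(number), int(nitems), digest, int(radius), label)


def match(digest: str, patterns: Iterable[Pattern],
          distance: Callable[[str, str], int]) -> Optional[Pattern]:
    """Return the closest pattern whose radius covers the digest, or None."""
    best, best_distance = None, None
    for pattern in patterns:
        d = distance(digest, pattern.tlsh)
        if d <= pattern.radius and (best_distance is None or d < best_distance):
            best, best_distance = pattern, d
    return best


# Toy digests and a toy distance (count of differing characters) for the demo.
rows = ["1\t12\tAAAA\t1\tfamily one", "2\t3\tBBBB\t2\tfamily two"]
patterns = [parse_pattern_line(row) for row in rows]
toy_distance = lambda a, b: sum(x != y for x, y in zip(a, b))
print(match("AAAB", patterns, toy_distance))
```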

generating digest from a file list

As far as I know, tlsh currently only supports generating digests for a set of files under a directory, using commands like tlsh -r <dir>.
I was hoping there would be an option to generate digests for the files listed in a specific file listfile, with something like tlsh -l <listfile>, since my target files are spread across different directories and many other irrelevant files also exist in those directories.
(It seems that the current -l option is only used for comparison, not for generating digests.)
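
Until such an option exists, the behaviour I have in mind could be approximated with a short script. This is a sketch with hypothetical names: the list file is assumed to hold one path per line, and the digest function is passed in as a parameter (with the Python extension that could be tlsh.hash).

```python
def read_list_file(path):
    """Return the non-empty lines of a list file, one file path per line."""
    with open(path, encoding="utf-8") as fh:
        return [line.strip() for line in fh if line.strip()]


def digest_listed_files(listfile, digest_fn):
    """Map each listed path to digest_fn applied to that file's raw bytes."""
    results = {}
    for path in read_list_file(listfile):
        with open(path, "rb") as fh:
            results[path] = digest_fn(fh.read())
    return results
```

Usage would be along the lines of `digest_listed_files("listfile", tlsh.hash)`, printing each path with its digest.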

publish tlsh java code to maven central

Please publish the tlsh Java version to the Maven central repo. It's free.

Many corporate rules (ours included) make it impossible to connect to custom repos like bintray.com, not to mention that having a custom repo in the POM slows down the download process for all other artifacts.

Test failure

I'm getting a test failure as of commit b319aed: both tlsh_unittest_len and tlsh_unittest_xlen fail.

I've removed the > /dev/null after the diff invocations in test.sh to get more details. Below is the resulting LastTest.log file generated by make test:

Start testing: Oct 15 00:00 CEST
----------------------------------------------------------
1/2 Testing: tlsh_unittest_len
1/2 Test: tlsh_unittest_len
Command: "/tmp/nix-build-tlsh-3.4.1.drv-0/source/Testing/test.sh"
Directory: /tmp/nix-build-tlsh-3.4.1.drv-0/build/Testing
"tlsh_unittest_len" start time: Oct 15 00:00 CEST
Output:
----------------------------------------------------------
HASH is 128
CHKSUM is 1
Running, not considering len, ...

test 1

../bin/tlsh_unittest -r ../Testing/example_data > tmp/example_data.out
passed

test 2

../bin/tlsh_unittest -r ../Testing/example_data -c ../Testing/example_data/website_course_descriptors06-07.txt > tmp/example_data.scores
passed

test 3

../bin/tlsh_unittest -l tmp/example_data.out -c ../Testing/example_data/website_course_descriptors06-07.txt > tmp/example_data.scores.2
passed

test 4

../bin/tlsh_unittest -xref -r ../Testing/example_data tmp/example_data.xref.scores
passed

test 5

../bin/tlsh_unittest -T 201 -l tmp/example_data.out -c ../Testing/example_data/website_course_descriptors06-07.txt > tmp/example_data.scores.2.T-201
passed

Running, considering len, ...

test 1

../bin/tlsh_unittest -r ../Testing/example_data > tmp/example_data.out
passed

test 2

../bin/tlsh_unittest -xlen -r ../Testing/example_data -c ../Testing/example_data/website_course_descriptors06-07.txt > tmp/example_data.scores
passed

test 3

../bin/tlsh_unittest -xlen -l tmp/example_data.out -c ../Testing/example_data/website_course_descriptors06-07.txt > tmp/example_data.scores.2
passed

test 4

../bin/tlsh_unittest -xref -xlen -r ../Testing/example_data tmp/example_data.xref.scores
passed

test 5

../bin/tlsh_unittest -T 201 -xlen -l tmp/example_data.out -c ../Testing/example_data/website_course_descriptors06-07.txt > tmp/example_data.scores.2.T-201
passed
Running simple_unittest
10c10
< hash4 = 9A1124198C869A5A4F0F9380A9AE92F2B9278F42089EA34272885F0FB2D34E6911444C

---
> hash4 = 301124198C869A5A4F0F9380A9AE92F2B9278F42089EA34272885F0FB2D34E6911444C
error: diff tmp/simple_unittest.out exp/simple_unittest_EXP
<end of output>
Test time =   0.23 sec
----------------------------------------------------------
Test Failed.
"tlsh_unittest_len" end time: Oct 15 00:00 CEST
"tlsh_unittest_len" time elapsed: 00:00:00
----------------------------------------------------------

2/2 Testing: tlsh_unittest_xlen
2/2 Test: tlsh_unittest_xlen
Command: "/tmp/nix-build-tlsh-3.4.1.drv-0/source/Testing/test.sh" "-xlen"
Directory: /tmp/nix-build-tlsh-3.4.1.drv-0/build/Testing
"tlsh_unittest_xlen" start time: Oct 15 00:00 CEST
Output:
----------------------------------------------------------
HASH is 128
CHKSUM is 1
Running, not considering len, ...

test 1

../bin/tlsh_unittest -r ../Testing/example_data > tmp/example_data.out
passed

test 2

../bin/tlsh_unittest -r ../Testing/example_data -c ../Testing/example_data/website_course_descriptors06-07.txt > tmp/example_data.scores
passed

test 3

../bin/tlsh_unittest -l tmp/example_data.out -c ../Testing/example_data/website_course_descriptors06-07.txt > tmp/example_data.scores.2
passed

test 4

../bin/tlsh_unittest -xref -r ../Testing/example_data tmp/example_data.xref.scores
passed

test 5

../bin/tlsh_unittest -T 201 -l tmp/example_data.out -c ../Testing/example_data/website_course_descriptors06-07.txt > tmp/example_data.scores.2.T-201
passed

Running, considering len, ...

test 1

../bin/tlsh_unittest -r ../Testing/example_data > tmp/example_data.out
passed

test 2

../bin/tlsh_unittest -xlen -r ../Testing/example_data -c ../Testing/example_data/website_course_descriptors06-07.txt > tmp/example_data.scores
passed

test 3

../bin/tlsh_unittest -xlen -l tmp/example_data.out -c ../Testing/example_data/website_course_descriptors06-07.txt > tmp/example_data.scores.2
passed

test 4

../bin/tlsh_unittest -xref -xlen -r ../Testing/example_data tmp/example_data.xref.scores
passed

test 5

../bin/tlsh_unittest -T 201 -xlen -l tmp/example_data.out -c ../Testing/example_data/website_course_descriptors06-07.txt > tmp/example_data.scores.2.T-201
passed
Running simple_unittest
10c10
< hash4 = 9A1124198C869A5A4F0F9380A9AE92F2B9278F42089EA34272885F0FB2D34E6911444C

---
> hash4 = 301124198C869A5A4F0F9380A9AE92F2B9278F42089EA34272885F0FB2D34E6911444C
error: diff tmp/simple_unittest.out exp/simple_unittest_EXP
<end of output>
Test time =   0.22 sec
----------------------------------------------------------
Test Failed.
"tlsh_unittest_xlen" end time: Oct 15 00:00 CEST
"tlsh_unittest_xlen" time elapsed: 00:00:00
----------------------------------------------------------

End testing: Oct 15 00:00 CEST

Any idea what's wrong?

Python type error

I am using Python 3.4.2 on Ubuntu 14.10 and I am getting the following error when executing this script:

import tlsh

data = "hello"
print(tlsh.hash(data))
Traceback (most recent call last):
  File "script.py", line 4, in <module>
    print(tlsh.hash(data))
TypeError: must be impossible<bad format char>, not str

Could you explain how to fix this issue?

how to use -force option in python

I read that TLSH version 3.5.1 has a force option that uses a string length of 50 instead of 256. I am working with shorter texts, so this option would be very useful, but I cannot figure out how to use it in the Python API. Any ideas?

Building Python ext in Windows: vcvarsall.bat

Currently getting this issue:

C:\Working\git\tlsh>cd py_ext

C:\Working\git\tlsh\py_ext>python setup.py build
running build
running build_ext
building 'tlsh' extension
error: Unable to find vcvarsall.bat

Just following the tips here: https://blogs.msdn.microsoft.com/pythonengineering/2016/04/11/unable-to-find-vcvarsall-bat/. If there is something I can do in future I'll try to contribute, but it looks unlikely in the short term with other deadlines to reach.

Will return to my Linux terminal for now.

Java port 404

It seems that the repository with the Java port (linked from README.md) no longer exists.

Using C++ library with Visual Studio 2017

Good Morning,
I have built the C++ library using Visual Studio 2017. The build was successful and produced tlsh.dll. In my project I included the header files from the include directory, linked the library files, and moved the DLL to my project folder. When I rebuilt my project, which uses your library,
I got a missing header file error. On closer inspection, the error is raised in the tlsh.h header file at line 63: compilation falls through to the else branch, which includes the "version.h" header file. I can see a "win_version.h" header file in the windows directory, but version.h is missing. Or is it not correctly selecting the "win_version.h" include?

#ifdef WINDOWS
#include "win_version.h"
#else
#include "version.h"
#endif

Am I missing something? Could you give me some insights and example code for using this project in my C++ application?

Thank you,
Visweswaran N

Issue creating python extension of tlsh library

Hi,

I am following the steps mentioned in the README to get tlsh on my CentOS 6.5 machine.
I installed cmake, and make.sh ran without errors.

But when I run python setup.py build to build the Python extension, it gives the errors below.

running build
running build_ext
building 'tlsh' extension
gcc -pthread -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/data/tlsh-master/include -I/usr/local/include/python3.4m -c tlshmodule.cpp -o build/temp.linux-x86_64-3.4/tlshmodule.o
cc1plus: warning: command line option "-Wstrict-prototypes" is valid for Ada/C/ObjC but not for C++
tlshmodule.cpp:169: warning: missing braces around initializer for ‘PyObject’
tlshmodule.cpp: In function ‘PyObject* Tlsh_fromTlshStr(tlsh_TlshObject*, PyObject*)’:
tlshmodule.cpp:226: error: ‘TLSH_STRING_LEN’ was not declared in this scope
tlshmodule.cpp: In function ‘PyObject* Tlsh_update(tlsh_TlshObject*, PyObject*)’:
tlshmodule.cpp:253: error: ‘MIN_DATA_LENGTH’ was not declared in this scope
tlshmodule.cpp: In function ‘PyObject* Tlsh_final(tlsh_TlshObject*)’:
tlshmodule.cpp:269: error: ‘MIN_DATA_LENGTH’ was not declared in this scope
tlshmodule.cpp: In function ‘PyObject* Tlsh_hexdigest(tlsh_TlshObject*)’:
tlshmodule.cpp:281: error: ‘TLSH_STRING_LEN’ was not declared in this scope
tlshmodule.cpp:287: error: ‘hash’ was not declared in this scope
tlshmodule.cpp: In function ‘PyObject* Tlsh_diff(tlsh_TlshObject*, PyObject*)’:
tlshmodule.cpp:321: error: ‘TLSH_STRING_LEN’ was not declared in this scope
error: command 'gcc' failed with exit status 1

Kindly assist if I am missing some dependencies or something else.
Thanks in advance.

Can't import 'python-tlsh'

After installing 'python-tlsh', I can't import 'tlsh' in Python.

How I installed 'python-tlsh':

$ dpkg -L python-tlsh
/.
/usr
/usr/share
/usr/share/doc
/usr/share/doc/python-tlsh
/usr/share/doc/python-tlsh/copyright
/usr/share/doc/python-tlsh/changelog.Debian.gz
/usr/lib
/usr/lib/python2.7
/usr/lib/python2.7/dist-packages
/usr/lib/python2.7/dist-packages/tlsh.x86_64-linux-gnu.so
/usr/lib/python2.7/dist-packages/tlsh-0.2.0.egg-info

$ python -c 'import tlsh'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
ImportError: No module named tlsh

How can I import 'python-tlsh'?

Add R package to repo

I'm almost done with an R port/interface for this very clean and easy-to-use C++ implementation.

If desired, I can also add it to this repo (since it houses many other ports/interfaces). If it is desired, just let me know how you'd like to do it (as a copy or a remote, PR process, etc.).

thx,

-boB

Java version of TLSH

Dear developers,

I've made available a Java version of the TLSH algorithm, which you can find at https://github.com/triplecheck/TLSH

In essence, the relevant code that performs the hashing is found inside a single Java source code file at https://github.com/triplecheck/TLSH/blob/master/sources/TLSH.java

Please verify if due credits are provided and/or if any correction is necessary. The main() method contains usage examples. To apply in other Java projects one just needs to copy TLSH.java into the target project.

Performance-wise, in our experiments the code produced an average of 200 million hashes/day for a subset of the files in our archive (Pentium i7, 4 cores).

-- Nuno
