trendmicro / tlsh
License: Other
Commit a1c58fd breaks the Python module:
building 'tlsh' extension
x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fno-strict-aliasing -Wdate-time -D_FORTIFY_SOURCE=2 -g -fstack-protector-strong -Wformat -Werror=format-security -fPIC -I/home/raphael/gits/AIL-framework/tlsh/include -I/usr/include/python2.7 -c tlshmodule.cpp -o build/temp.linux-x86_64-2.7/tlshmodule.o
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
tlshmodule.cpp: In function ‘PyObject* Tlsh_fromTlshStr(tlsh_TlshObject*, PyObject*)’:
tlshmodule.cpp:226:16: error: ‘TLSH_STRING_LEN’ was not declared in this scope
if (len != TLSH_STRING_LEN) {
^
tlshmodule.cpp: In function ‘PyObject* Tlsh_update(tlsh_TlshObject*, PyObject*)’:
tlshmodule.cpp:253:31: error: ‘MIN_DATA_LENGTH’ was not declared in this scope
if (self->required_data < MIN_DATA_LENGTH) {
^
tlshmodule.cpp: In function ‘PyObject* Tlsh_final(tlsh_TlshObject*)’:
tlshmodule.cpp:269:31: error: ‘MIN_DATA_LENGTH’ was not declared in this scope
if (self->required_data < MIN_DATA_LENGTH) {
^
tlshmodule.cpp: In function ‘PyObject* Tlsh_hexdigest(tlsh_TlshObject*)’:
tlshmodule.cpp:281:15: error: ‘TLSH_STRING_LEN’ was not declared in this scope
char hash[TLSH_STRING_LEN + 1];
^
tlshmodule.cpp:287:24: error: ‘hash’ was not declared in this scope
self->tlsh.getHash(hash, TLSH_STRING_LEN + 1);
^
tlshmodule.cpp: In function ‘PyObject* Tlsh_diff(tlsh_TlshObject*, PyObject*)’:
tlshmodule.cpp:321:18: error: ‘TLSH_STRING_LEN’ was not declared in this scope
if (len != TLSH_STRING_LEN) {
^
error: command 'x86_64-linux-gnu-gcc' failed with exit status 1
See https://gcc.gnu.org/onlinedocs/gcc/Link-Options.html for the link options that should be used.
There should be no need to set
"LD_PRELOAD=${CMAKE_SOURCE_DIR}/lib/libtlsh.so.0"
in the Testing directory.
I would like to bring up #11 again to highlight the need for release tags for every stable release.
If you don't tag your releases linux distribution package maintainers will not be aware of updates and this results in old versions of tlsh being shipped to all users.
E.g., Debian Buster is now in freeze and will ship with tlsh 3.4.4:
https://packages.debian.org/buster/tlsh-tools
Pages 2, 3 and 4 of https://github.com/trendmicro/tlsh/blob/master/TLSH_CTC_final.pdf have various "magic" parameter settings.
One value (a distance of 6 for the case of one digest having a bucket value of 0 and the second digest having a value of 3 in the same bucket) was justified using the binomial distribution in section III of the paper.
So that everyone knows: these values were initially adopted by optimizing ROC curves and AUCs.
I am preparing a paper that repeats these experiments.
It would be useful if researchers could experiment with the various distance values as a command-line parameter.
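To make that request concrete, the Python sketch below computes a body distance between two digests with the 0-versus-3 penalty exposed as a parameter instead of a constant. It assumes each body byte packs four 2-bit bucket codes; the names pair_distance, body_distance and max_diff_penalty are hypothetical and not part of tlsh.

```python
def pair_distance(a: int, b: int, max_diff_penalty: int = 6) -> int:
    """Distance between two 2-bit bucket codes (each 0..3)."""
    d = abs(a - b)
    # Opposite extremes (0 vs 3) get the tunable "magic" penalty; other gaps count as-is.
    return max_diff_penalty if d == 3 else d


def body_distance(code1: bytes, code2: bytes, max_diff_penalty: int = 6) -> int:
    """Sum pair_distance over every 2-bit field of two equal-length code bodies."""
    if len(code1) != len(code2):
        raise ValueError("code bodies must have the same length")
    total = 0
    for x, y in zip(code1, code2):
        # Assumption: four 2-bit bucket codes per byte, at bit offsets 0, 2, 4, 6.
        for shift in (0, 2, 4, 6):
            total += pair_distance((x >> shift) & 3, (y >> shift) & 3, max_diff_penalty)
    return total
```

body_distance(bytes([0x00]), bytes([0xFF])) compares four buckets that are all 0 against four that are all 3, so it sums 4 x 6 = 24 with the default penalty; a command-line flag could pass its value straight through as max_diff_penalty.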
It would be great if tlsh could be installed with brew install tlsh on both Linux and Mac.
TLSH version: 3.4.5 compact hash, 1 byte checksum
Directory contains 230,781 files. The system has 64GB of memory.
$ ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 20
file size (blocks, -f) unlimited
pending signals (-i) 516130
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 999999
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 516130
virtual memory (kbytes, -v) 9999999999
file locks (-x) unlimited
$ ./bin/tlsh_unittest -r /mnt/data
Segmentation fault
strace info:
mmap(NULL, 852165308416, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Cannot allocate memory)
brk(0xc66ab4c000) = 0x1b22000
mmap(NULL, 852165439488, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Cannot allocate memory)
mmap(NULL, 134217728, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x7f403d0fd000
munmap(0x7f403d0fd000, 49295360) = 0
munmap(0x7f4044000000, 17813504) = 0
mprotect(0x7f4040000000, 135168, PROT_READ|PROT_WRITE) = 0
mmap(NULL, 852165308416, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Cannot allocate memory)
gdb info:
Program received signal SIGSEGV, Segmentation fault.
0x0000000000401bfa in recursive_read_files_from_dir(char*, FileName*, int, int*) ()
Questions:
The library is small, and the use of the dynamic library has caused questions and complaints due to the need to include this library in the export path.
The dynamic library should be available as an option.
Any idea why the tests are failing or how to easily debug it? Are the tests working on your machine?
Running tests...
Test project /build/tlsh-3.4.1/build/release
Start 1: tlsh_unittest_len
1/2 Test #1: tlsh_unittest_len ................***Failed 0.31 sec
Start 2: tlsh_unittest_xlen
2/2 Test #2: tlsh_unittest_xlen ...............***Failed 0.17 sec
0% tests passed, 2 tests failed out of 2
Total Test time (real) = 0.48 sec
The following tests FAILED:
1 - tlsh_unittest_len (Failed)
2 - tlsh_unittest_xlen (Failed)
Errors while running CTest
Makefile:72: recipe for target 'test' failed
make: *** [test] Error 8
I cloned the repo, built and installed the Python extension and tried the example.
The README says:
For example, tlsh.hash(str(os.urandom(256))), should always generate a hash.
Now, I wrote this tiny script:
import tlsh
import os
hash = tlsh.hash(str(os.urandom(256)))
print(hash)
And instead of "always generating a hash", it always generates this sweet TypeError for me:
Traceback (most recent call last):
File "test.py", line 4, in <module>
hash = tlsh.hash(str(os.urandom(256)))
TypeError: a bytes-like object is required, not 'str'
Removing the str works and generates a hash for me, but yeah, this isn't exactly how the README says it should work. Perhaps someone forgot to update the readme?
Runtime is Python 3.6.8 (64-Bit) on Ubuntu 18.04.
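The difference comes from Python 3 separating text from bytes. The sketch below, which does not import tlsh itself, shows that os.urandom already returns bytes, while str() on those bytes yields their printable repr rather than the raw data; readme_input is a hypothetical helper name.

```python
import os


def readme_input(n: int = 256) -> bytes:
    # os.urandom already returns bytes in Python 3, which is what the reporter
    # found tlsh.hash accepts once str() is removed.
    return os.urandom(n)


data = readme_input()
# In Python 3, str() on a bytes object gives its repr text such as "b'\x8f...'",
# not the raw bytes. In Python 2, str and bytes were the same type, which is
# presumably why the README example worked there.
as_text = str(data)
```

With the extension installed, print(tlsh.hash(readme_input())) corresponds to the variant the reporter found to work.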
When I run the command ./python_test.sh, it gives the following error:
python ../py_ext/tlsh_digest.py -force example_data/small.txt > tmp/py_small.tlsh
diff tmp/py_small.tlsh exp/small.tlsh_EXP
diff: exp/small.tlsh_EXP: No such file or directory
error: diff tmp/py_small.tlsh exp/small.tlsh_EXP
It seems that the testing validation file in exp/ is missing.
so that we can have options under which the regression tests will fail
How do I configure the argv parameters like this? I find many of them whose values I do not know. You should provide one header file for configuring them.
Allow the library access functions that access
Would it be possible to make this library functional for python3?
The python2 tests are running fine, however if I invoke it with python 3.5.0, I get a ValueError:
python3:
# python test.py ../Testing/example_data/2005NISSE.txt ../Testing/example_data/1english-only.txt
tlsh.hash hex1 06D29517F780237185070293B60E36FAB735C0F833D66460688DA22D6756E751B7BAEB
tlsh.hash hex2 E951784702042376169012B1BA5A76EAF36092FC3311A595B4856235278F9F973763EF
tlsh.diff(hex1, hex2) 427
tlsh.diff(hex2, hex1) 427
tlsh.Tlsh hex1 06D29517F780237185070293B60E36FAB735C0F833D66460688DA22D6756E751B7BAEB
tlsh.Tlsh hex2 E951784702042376169012B1BA5A76EAF36092FC3311A595B4856235278F9F973763EF
h1.diff(h2) 427
h2.diff(h1) 427
h1.diff(hex2) 427
h2.diff(hex1) 427
Traceback (most recent call last):
File "test.py", line 39, in <module>
h3.fromTlshStr(hex2)
ValueError: argument is not a TLSH hex string
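As a first diagnostic, the sketch below checks hex2 on the Python side before it reaches the extension. The default of 70 characters is the length of the hex digests printed in this report, and the length comparison mirrors the if (len != TLSH_STRING_LEN) test visible in the tlshmodule.cpp build errors; looks_like_tlsh_hex is a hypothetical helper, not the extension's own validation.

```python
import string

HEX_DIGITS = set(string.hexdigits)


def looks_like_tlsh_hex(value, expected_len: int = 70) -> bool:
    """Rough check that value is a hex digest of the expected length (str or bytes)."""
    if isinstance(value, (bytes, bytearray)):
        try:
            value = value.decode("ascii")
        except UnicodeDecodeError:
            return False
    if not isinstance(value, str):
        return False
    # Same idea as the C extension's length test, plus a hex-digit check.
    return len(value) == expected_len and all(c in HEX_DIGITS for c in value)
```

If this returns True for hex2 under Python 3, the problem is more likely in how the extension parses its argument than in the digest string itself.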
Using the C++ library, I have calculated two hashes and assigned them to strings like this:
std::string hash1 = "5582932E7B4443F206C202A16A4F6CDFE32AD5B9722E11542859C15D236FE35C3BFAD9";
std::string hash2 = "C782932E7B4443F205C203A16A4B6CDFE32AD4BDB23A11546859C15D236BE35C3BFAD9";
Which methods does Tlsh offer to find the distance score between two hashes? Could anyone help me with this issue?
When passing a Python bytearray to the hashing function, the following error is thrown:
TypeError: argument 1 must be read-only bytes-like object, not bytearray
This prevents me from using bytearrays.
Bytearrays offer advantages, so it might be worth supporting them:
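Until writable buffers are accepted, one workaround is to copy the bytearray into an immutable bytes object before hashing. The helper name as_readonly_bytes is hypothetical; note that bytes(buf) copies the data, which is presumably the cost this request wants to avoid.

```python
def as_readonly_bytes(buf) -> bytes:
    """Return an immutable bytes view of a bytes-like object, copying only if needed."""
    if isinstance(buf, bytes):
        return buf          # already read-only, no copy needed
    return bytes(buf)       # copies bytearray / memoryview contents
```

Usage would then be tlsh.hash(as_readonly_bytes(my_bytearray)).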
Lala reported the following error:
[piranha@ers-tools Library-tlsh]$ ./make.sh
rm -rf build
cmake -DTLSH_CHECKSUM_1B=1 ../..
-- The C compiler identification is GNU
-- The CXX compiler identification is GNU
-- Check for working C compiler: /usr/bin/gcc
-- Check for working C compiler: /usr/bin/gcc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
CMake Error at src/CMakeLists.txt:74 (include):
include could not find load file:
GNUInstallDirs
CMake Error at src/CMakeLists.txt:78 (install):
install TARGETS given no ARCHIVE DESTINATION for static library target
"tlsh".
CMake Error at src/CMakeLists.txt:81 (install):
install FILES given no DESTINATION!
The Python scripts mix spaces and tabs, which does not work with Python 3.
See for example
https://stackoverflow.com/questions/36063679/python-3-allows-mixing-spaces-and-tabs
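Affected files can be found with a small checker like the Python sketch below, which reports lines whose indentation contains a tab and lines indented only with spaces; a file with both kinds is a candidate for Python 3 indentation errors. indentation_kinds and mixes_tabs_and_spaces are hypothetical names.

```python
from typing import List, Tuple


def indentation_kinds(text: str) -> Tuple[List[int], List[int]]:
    """Return 1-based line numbers whose indentation contains a tab, and those indented only with spaces."""
    tab_lines, space_lines = [], []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # blank lines carry no indentation meaning
        indent = line[: len(line) - len(line.lstrip(" \t"))]
        if "\t" in indent:
            tab_lines.append(lineno)
        elif indent:
            space_lines.append(lineno)
    return tab_lines, space_lines


def mixes_tabs_and_spaces(text: str) -> bool:
    """True if some lines are indented with tabs and others with spaces."""
    tab_lines, space_lines = indentation_kinds(text)
    return bool(tab_lines) and bool(space_lines)
```

When fixing a flagged file, expanding only the leading whitespace with str.expandtabs(8) keeps Python 2's 8-column tab stops, so the block structure Python 2 saw is preserved.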
hi,
I'm having problems building the project on my machine. Below are a description of my environment and the error I get after executing python setup.py build.
running build
running build_ext
building 'tlsh' extension
gcc -Wno-unused-result -Wsign-compare -Wunreachable-code -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -I/anaconda3/include -arch x86_64 -I/anaconda3/include -arch x86_64 -I/Users/romcabral/Documents/trend/TEMP/tlsh/include -I/anaconda3/include/python3.7m -c tlshmodule.cpp -o build/temp.macosx-10.7-x86_64-3.7/tlshmodule.o -DBUCKETS_128
warning: include path for stdlibc++ headers not found; pass '-stdlib=libc++' on
the command line to use the libc++ standard library instead
[-Wstdlibcxx-not-found]
tlshmodule.cpp:148:5: warning: suggest braces around initialization of subobject
[-Wmissing-braces]
PyObject_HEAD_INIT(NULL)
^~~~~~~~~~~~~~~~~~~~~~~~
/anaconda3/include/python3.7m/object.h:87:5: note: expanded from macro
'PyObject_HEAD_INIT'
1, type },
^~~~~~~
2 warnings generated.
gcc -Wno-unused-result -Wsign-compare -Wunreachable-code -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -I/anaconda3/include -arch x86_64 -I/anaconda3/include -arch x86_64 -I/Users/romcabral/Documents/trend/TEMP/tlsh/include -I/anaconda3/include/python3.7m -c /Users/romcabral/Documents/trend/TEMP/tlsh/src/tlsh.cpp -o build/temp.macosx-10.7-x86_64-3.7/Users/romcabral/Documents/trend/TEMP/tlsh/src/tlsh.o -DBUCKETS_128
warning: include path for stdlibc++ headers not found; pass '-stdlib=libc++' on
the command line to use the libc++ standard library instead
[-Wstdlibcxx-not-found]
1 warning generated.
gcc -Wno-unused-result -Wsign-compare -Wunreachable-code -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -I/anaconda3/include -arch x86_64 -I/anaconda3/include -arch x86_64 -I/Users/romcabral/Documents/trend/TEMP/tlsh/include -I/anaconda3/include/python3.7m -c /Users/romcabral/Documents/trend/TEMP/tlsh/src/tlsh_impl.cpp -o build/temp.macosx-10.7-x86_64-3.7/Users/romcabral/Documents/trend/TEMP/tlsh/src/tlsh_impl.o -DBUCKETS_128
warning: include path for stdlibc++ headers not found; pass '-stdlib=libc++' on
the command line to use the libc++ standard library instead
[-Wstdlibcxx-not-found]
/Users/romcabral/Documents/trend/TEMP/tlsh/src/tlsh_impl.cpp:351:19: warning:
unused variable 'r' [-Wunused-variable]
unsigned char r;
^
2 warnings generated.
gcc -Wno-unused-result -Wsign-compare -Wunreachable-code -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -I/anaconda3/include -arch x86_64 -I/anaconda3/include -arch x86_64 -I/Users/romcabral/Documents/trend/TEMP/tlsh/include -I/anaconda3/include/python3.7m -c /Users/romcabral/Documents/trend/TEMP/tlsh/src/tlsh_util.cpp -o build/temp.macosx-10.7-x86_64-3.7/Users/romcabral/Documents/trend/TEMP/tlsh/src/tlsh_util.o -DBUCKETS_128
warning: include path for stdlibc++ headers not found; pass '-stdlib=libc++' on
the command line to use the libc++ standard library instead
[-Wstdlibcxx-not-found]
1 warning generated.
g++ -bundle -undefined dynamic_lookup -L/anaconda3/lib -arch x86_64 -L/anaconda3/lib -arch x86_64 -arch x86_64 build/temp.macosx-10.7-x86_64-3.7/tlshmodule.o build/temp.macosx-10.7-x86_64-3.7/Users/romcabral/Documents/trend/TEMP/tlsh/src/tlsh.o build/temp.macosx-10.7-x86_64-3.7/Users/romcabral/Documents/trend/TEMP/tlsh/src/tlsh_impl.o build/temp.macosx-10.7-x86_64-3.7/Users/romcabral/Documents/trend/TEMP/tlsh/src/tlsh_util.o -o build/lib.macosx-10.7-x86_64-3.7/tlsh.cpython-37m-darwin.so
clang: warning: libstdc++ is deprecated; move to libc++ with a minimum deployment target of OS X 10.9 [-Wdeprecated]
ld: library not found for -lstdc++
clang: error: linker command failed with exit code 1 (use -v to see invocation)
error: command 'g++' failed with exit status 1
I am interested in calculating the hash value of short strings that are less than 256 bytes or 50 characters, in order to compare short words. How can I do that?
Hi, the hash header has 6 half-byte characters: checksum, Lvalue and Quartile, i.e. 3 bytes.
I have 2 questions:
1. Why is the checksum a byte array? It's just a single byte.
2. Why are the header bytes swapped? Historical reasons?
Regards
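For reference, the swap in question 2 can be undone as in the Python sketch below, assuming the 1-byte-checksum layout the question describes and that each of the three header bytes is written with its two hex digits exchanged. swap_nibbles and header_bytes are hypothetical names.

```python
from typing import Tuple


def swap_nibbles(b: int) -> int:
    """Exchange the high and low 4 bits of one byte (0x82 -> 0x28)."""
    return ((b & 0x0F) << 4) | ((b & 0xF0) >> 4)


def header_bytes(hex_digest: str) -> Tuple[int, int, int]:
    """Decode the first 6 hex characters into (checksum, Lvalue, quartile byte),
    undoing the per-byte nibble swap."""
    raw = bytes.fromhex(hex_digest[:6])
    checksum, lvalue, quartiles = (swap_nibbles(b) for b in raw)
    return checksum, lvalue, quartiles
```

Applying swap_nibbles twice returns the original byte, so the same function serves for both encoding and decoding.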
allow the window size to vary between 4 and 8
allow for a variety of random projection functions
see https://en.wikipedia.org/wiki/Locality-sensitive_hashing#Random_projection for some discussion and references
measure the performance of various window sizes
make sure that adding this code does not make tlsh too slow
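A variable window could be prototyped with a generator like the Python sketch below, which yields every run of window_size consecutive bytes, oldest byte first, and rejects sizes outside the requested 4 to 8 range. sliding_windows is a hypothetical name; choosing which triplets to take from each window, or which random projection to apply, is left open.

```python
from typing import Iterator, Tuple


def sliding_windows(data: bytes, window_size: int = 5) -> Iterator[Tuple[int, ...]]:
    """Yield each full window of window_size consecutive bytes, oldest byte first."""
    if not 4 <= window_size <= 8:
        raise ValueError("window_size must be between 4 and 8")
    for end in range(window_size, len(data) + 1):
        yield tuple(data[end - window_size:end])
```

For a window of w bytes there are math.comb(w, 3) possible triplets: 10 for w = 5, matching the "10 possible" mentioned in the TLSH paper, and 56 for w = 8.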
Hey, first: thanks for this great tool.
I wanted to ask if it would be possible to push git tags for released versions. It seems that you do proper versioning of the tool (according to the Changelog and CMakeLists.txt) but never push git tags for the commits that "release" a new version.
It would make my life a lot easier packaging your software for a distribution and keeping track of newly released updates. I would highly appreciate it if you pushed the tags and also kept them in sync in the future.
sincerely,
anthraxx
Hi!
In the article TLSH_CTC_final.pdf I read about the triplet selection:
"We selected 6 triplets of the 10 possible..."
A B C
A B D
A B E
A C E
A D E
where A is the first byte in the sliding window...
However, when I started to analyze how the digest body is created, I found that in the update function the bytes were chosen in reverse order. For example, take the first 5 bytes of a typical ELF file (0x7F 0x45 0x4C 0x46 0x01).
The update function uses these trigrams:
0x01 0x46 0x4C (E D C)
0x01 0x46 0x45 (E D B)
0x01 0x4c 0x45 (E C B)
0x01 0x4c 0x7f (E C A)
0x01 0x46 0x7f (E D A)
0x01 0x45 0x7f (E B A)
Can you explain whether the code works correctly or maybe I misunderstood something?
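One way to reconcile the two lists is to note that the labels depend on which end of the window is called A. The Python check below uses only the six trigrams listed in this report and relabels them so that A denotes the newest byte; all five triplets quoted from the paper then appear, together with ACD. This is our reading of the report, not a statement from the authors.

```python
# Trigrams observed in the update function, labelled with A = oldest byte (from the report).
OBSERVED = ["EDC", "EDB", "ECB", "ECA", "EDA", "EBA"]

# Reverse the labels so that A denotes the newest byte of the window.
REVERSE = str.maketrans("ABCDE", "EDCBA")


def relabel(trigram: str) -> str:
    """Rename the bytes of a trigram and list them in alphabetical order."""
    return "".join(sorted(trigram.translate(REVERSE)))


relabelled = sorted(relabel(t) for t in OBSERVED)
```

Under that labelling, every trigram includes the newest byte, which matches each listed paper triplet starting with A.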
Have an option to output TLSH as you process the directories
Avoid the qsort()
The qsort is there for a good reason: when you process a directory on different OSes, it makes sure that the directory is processed in the same order, and hence the output is reproducible.
Having an option to process more quickly in a way that may not be reproducible is OK.
(The part that will differ between OSes is the order of the files; the TLSH digest will be the same for each file.)
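The trade-off can be sketched in Python: walking the tree with os.walk and yielding paths one at a time lets a caller print each digest as soon as it is computed, while sorting directory and file names keeps the order independent of the OS. This illustrates the idea rather than the tlsh C++ tool; iter_files and reproducible are hypothetical names, and the order is sorted per directory rather than by one global qsort.

```python
import os
from typing import Iterator


def iter_files(root: str, reproducible: bool = True) -> Iterator[str]:
    """Yield file paths under root one at a time instead of collecting them all first.

    With reproducible=True, names are sorted within each directory so the order
    does not depend on the OS or file system; with False, the OS order is used.
    """
    for dirpath, dirnames, filenames in os.walk(root):
        if reproducible:
            dirnames.sort()   # in-place sort controls the order os.walk descends
            filenames.sort()
        for name in filenames:
            yield os.path.join(dirpath, name)
```

Because nothing is buffered, memory use does not grow with the number of files in the tree.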
I can't find the version-specific tlsh solution files that the setup instructions mention.
How can I use it on Windows?
Does tlsh only support input longer than 256 bytes? As 256 bytes is a fairly large number of characters, does it support short sentences like "hello world, hello cat, hello dog" as the input string?
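As a quick pre-check, the Python sketch below compares an input's length with the two thresholds quoted in these issues: 256 bytes by default and 50 bytes with the force option mentioned for version 3.5.1. The constant and function names are hypothetical, and treating the boundary as inclusive is an assumption. Length is only the first hurdle: as we understand it, tlsh can still decline to produce a digest when the input has too little byte variety.

```python
MIN_LENGTH = 256        # default minimum quoted in these issues (assumption: inclusive)
MIN_FORCE_LENGTH = 50   # minimum with the 3.5.1 force option, as quoted in these issues


def long_enough(data: bytes, force: bool = False) -> bool:
    """True if data meets the quoted minimum length for the chosen mode."""
    return len(data) >= (MIN_FORCE_LENGTH if force else MIN_LENGTH)
```

The example sentence "hello world, hello cat, hello dog" is 33 bytes, so it falls below both thresholds.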
I would like to point out that an identifier like "_TLSH_H" does not fit the naming conventions of the C++ language standard.
Would you like to adjust your selection for unique names?
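The C++ standard reserves for the implementation any identifier that contains a double underscore or that begins with an underscore followed by an uppercase letter, which is why _TLSH_H is a problem. The Python check below encodes just those two rules; is_reserved_cpp_identifier is a hypothetical helper, and a guard such as TLSH_H would pass it.

```python
import re

_LEADING_UNDERSCORE_UPPER = re.compile(r"^_[A-Z]")


def is_reserved_cpp_identifier(name: str) -> bool:
    """True if name is reserved to the implementation in every C++ scope."""
    return "__" in name or bool(_LEADING_UNDERSCORE_UPPER.match(name))
```

Names that start with a single underscore followed by a lowercase letter are reserved only in the global namespace, so this check deliberately does not flag them.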
Building fails at:
Scanning dependencies of target tlsh
[ 5%] Building CXX object src/CMakeFiles/tlsh.dir/tlsh.cpp.o
[ 10%] Building CXX object src/CMakeFiles/tlsh.dir/tlsh_impl.cpp.o
[ 15%] Building CXX object src/CMakeFiles/tlsh.dir/tlsh_util.cpp.o
[ 20%] Building CXX object src/CMakeFiles/tlsh.dir/input_desc.cpp.o
[ 25%] Building CXX object src/CMakeFiles/tlsh.dir/shared_file_functions.cpp.o
[ 30%] Linking CXX static library ../../../lib/libtlsh.a
[ 30%] Built target tlsh
Scanning dependencies of target tlsh_unittest
[ 35%] Building CXX object test/CMakeFiles/tlsh_unittest.dir/tlsh_unittest.cpp.o
[ 40%] Linking CXX executable ../../../bin/tlsh_unittest
/opt/binutils-2.32/bin/ld: CMakeFiles/tlsh_unittest.dir/tlsh_unittest.cpp.o: in function `trendLSH_ut(char*, char*, char*, int, int, char*, char*, int, bool, int, int, int, int, char*)':
tlsh_unittest.cpp:(.text+0x39b): undefined reference to `operator new(unsigned long)'
/opt/binutils-2.32/bin/ld: tlsh_unittest.cpp:(.text+0x42f): undefined reference to `operator delete(void*, unsigned long)'
/opt/binutils-2.32/bin/ld: tlsh_unittest.cpp:(.text+0x45b): undefined reference to `operator delete(void*, unsigned long)'
/opt/binutils-2.32/bin/ld: tlsh_unittest.cpp:(.text+0x8f0): undefined reference to `operator delete(void*, unsigned long)'
/opt/binutils-2.32/bin/ld: CMakeFiles/tlsh_unittest.dir/tlsh_unittest.cpp.o: in function `trendLSH_ut(char*, char*, char*, int, int, char*, char*, int, bool, int, int, int, int, char*) [clone .cold]':
I couldn't find a solution. CMake version: 3.10.2
Thanks!
Hey
Sorry for my ignorance, but is there any way to use TLSH for clustering similar files?
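One simple approach, sketched below in Python, is threshold-based single-linkage clustering: any two files whose digest distance is at or below a chosen threshold end up in the same cluster. threshold_clusters is a hypothetical helper; with the Python extension, tlsh.diff on two hex digests could be passed as distance, and the threshold has to be chosen for your data.

```python
from typing import Callable, Dict, List, Sequence


def threshold_clusters(items: Sequence[str],
                       distance: Callable[[str, str], int],
                       threshold: int) -> List[List[int]]:
    """Group item indices so that items within threshold of each other share a cluster."""
    parent = list(range(len(items)))  # union-find forest over item indices

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            if distance(items[i], items[j]) <= threshold:
                parent[find(i)] = find(j)  # merge the two clusters

    groups: Dict[int, List[int]] = {}
    for i in range(len(items)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())
```

It compares every pair, so the cost grows quadratically with the number of files.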
In utils/rand_tags.cpp, the pointer ndistinct_tags is checked for being negative or zero:
static void rhtml_contents(std::string &htmls, int *ntags, int *ndistinct_tags)
{
if ((*ntags <= 0) && (ndistinct_tags <= 0))
Perhaps it should be compared with == NULL instead?
For example, calling
tlsh.final(buffer, buflen)
tlsh.getHash()
will give a different hash from calling
tlsh.final(buffer, buflen)
tlsh.final(buffer, buflen)
tlsh.getHash()
The second sequence of calls should give a warning.
To show this problem, see order_bug.cpp.
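The guard requested here can be illustrated in Python with a wrapper that counts final() calls and emits a warning on the second one. WarnOnRepeatedFinal is a hypothetical class; it assumes only that the wrapped object exposes a final() method (presumably a tlsh.Tlsh instance in the Python extension) and forwards every other attribute unchanged.

```python
import warnings


class WarnOnRepeatedFinal:
    """Hypothetical wrapper that warns when final() is called more than once."""

    def __init__(self, inner):
        self._inner = inner
        self._final_calls = 0

    def final(self, *args, **kwargs):
        self._final_calls += 1
        if self._final_calls > 1:
            warnings.warn("final() called more than once; the resulting hash may differ",
                          RuntimeWarning, stacklevel=2)
        return self._inner.final(*args, **kwargs)

    def __getattr__(self, name):
        # Only reached for attributes not defined on the wrapper itself.
        return getattr(self._inner, name)
```

The C++ library could apply the same idea with a flag set inside final() and checked on entry.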
This program should read a pattern file with these columns:
col 1: pattern number
col 2: nitems in group
col 3: TLSH
col 4: radius
col 5: pattern label
Input options should match the tlsh program:
usage: tlsh_pattern [-xlen] [-force] -pat pattern_file -f file
: tlsh_pattern [-xlen] [-force] -pat pattern_file -d digest
: tlsh_pattern [-xlen] [-force] -pat pattern_file -r dir
: tlsh_pattern [-xlen] [-force] -pat pattern_file -l listfile
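A parser for the pattern file described above might look like the Python sketch below. The format details are assumptions: columns are whitespace-separated, blank lines and lines starting with # are skipped, the radius is an integer, and the label (column 5) may contain spaces. Pattern and parse_pattern_file are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Pattern:
    number: int
    nitems: int
    tlsh: str
    radius: int
    label: str


def parse_pattern_file(text: str) -> List[Pattern]:
    """Parse the five-column pattern format, one pattern per line."""
    patterns = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip() or line.lstrip().startswith("#"):
            continue
        cols = line.split(None, 4)  # at most 5 fields, so the label keeps its spaces
        if len(cols) != 5:
            raise ValueError(f"line {lineno}: expected 5 columns, got {len(cols)}")
        number, nitems, digest, radius, label = cols
        patterns.append(Pattern(int(number), int(nitems), digest, int(radius), label.strip()))
    return patterns
```

A matching step would then compare each input digest against every pattern's TLSH and report patterns whose distance falls within the radius.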
As far as I know, the current tlsh only supports generating digests for a bunch of files under a directory, using commands like tlsh -r <dir>.
I was hoping there would be an option to generate digests of the files listed in a specific file listfile, with something like tlsh -l <listfile>, since my target files are spread across different directories and many other irrelevant files also exist in those directories.
(It seems that the current -l option is only used for comparison, not for generating digests.)
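Reading such a list file is straightforward; the Python sketch below assumes one path per line, skips blank lines, and warns on stderr about entries that are not regular files. paths_from_listfile is a hypothetical helper and says nothing about how tlsh's own -l option is implemented.

```python
import os
import sys
from typing import Iterator, TextIO


def paths_from_listfile(listfile: TextIO) -> Iterator[str]:
    """Yield existing file paths from a list file, one path per line."""
    for line in listfile:
        path = line.rstrip("\r\n")
        if not path.strip():
            continue  # ignore blank lines
        if os.path.isfile(path):
            yield path
        else:
            print(f"warning: skipping {path!r}: not a regular file", file=sys.stderr)
```

Each yielded path could then be hashed exactly as the -r mode hashes files found in a directory.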
Adding a timing unittest program
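A timing program can start from a harness like the Python sketch below, which runs a callable several times with time.perf_counter and reports the best and mean wall-clock time per call. time_calls is a hypothetical name; with the Python extension installed, time_calls(lambda: tlsh.hash(data)) would time hashing of some data buffer.

```python
import time
from typing import Callable, Tuple


def time_calls(fn: Callable[[], object], repeat: int = 5) -> Tuple[float, float]:
    """Run fn repeat times; return (best, mean) seconds per call."""
    if repeat < 1:
        raise ValueError("repeat must be at least 1")
    samples = []
    for _ in range(repeat):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return min(samples), sum(samples) / len(samples)
```

Reporting the best time alongside the mean makes the numbers less sensitive to background load on the test machine.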
Please publish the tlsh Java version to the Maven central repo. It's free.
Many corporate rules (ours included) make it impossible to connect to custom repo's like bintray.com, not to mention having a custom repo in the POM slows down the download process for all other artifacts.
I'm getting a test failure as of commit b319aed: both tlsh_unittest_len and tlsh_unittest_xlen fail.
I've removed the > /dev/null after the diff invocations in test.sh to get more details. Below is the resulting LastTest.log file generated by make test:
Start testing: Oct 15 00:00 CEST
----------------------------------------------------------
1/2 Testing: tlsh_unittest_len
1/2 Test: tlsh_unittest_len
Command: "/tmp/nix-build-tlsh-3.4.1.drv-0/source/Testing/test.sh"
Directory: /tmp/nix-build-tlsh-3.4.1.drv-0/build/Testing
"tlsh_unittest_len" start time: Oct 15 00:00 CEST
Output:
----------------------------------------------------------
HASH is 128
CHKSUM is 1
Running, not considering len, ...
test 1
../bin/tlsh_unittest -r ../Testing/example_data > tmp/example_data.out
passed
test 2
../bin/tlsh_unittest -r ../Testing/example_data -c ../Testing/example_data/website_course_descriptors06-07.txt > tmp/example_data.scores
passed
test 3
../bin/tlsh_unittest -l tmp/example_data.out -c ../Testing/example_data/website_course_descriptors06-07.txt > tmp/example_data.scores.2
passed
test 4
../bin/tlsh_unittest -xref -r ../Testing/example_data tmp/example_data.xref.scores
passed
test 5
../bin/tlsh_unittest -T 201 -l tmp/example_data.out -c ../Testing/example_data/website_course_descriptors06-07.txt > tmp/example_data.scores.2.T-201
passed
Running, considering len, ...
test 1
../bin/tlsh_unittest -r ../Testing/example_data > tmp/example_data.out
passed
test 2
../bin/tlsh_unittest -xlen -r ../Testing/example_data -c ../Testing/example_data/website_course_descriptors06-07.txt > tmp/example_data.scores
passed
test 3
../bin/tlsh_unittest -xlen -l tmp/example_data.out -c ../Testing/example_data/website_course_descriptors06-07.txt > tmp/example_data.scores.2
passed
test 4
../bin/tlsh_unittest -xref -xlen -r ../Testing/example_data tmp/example_data.xref.scores
passed
test 5
../bin/tlsh_unittest -T 201 -xlen -l tmp/example_data.out -c ../Testing/example_data/website_course_descriptors06-07.txt > tmp/example_data.scores.2.T-201
passed
Running simple_unittest
10c10
< hash4 = 9A1124198C869A5A4F0F9380A9AE92F2B9278F42089EA34272885F0FB2D34E6911444C
---
> hash4 = 301124198C869A5A4F0F9380A9AE92F2B9278F42089EA34272885F0FB2D34E6911444C
error: diff tmp/simple_unittest.out exp/simple_unittest_EXP
<end of output>
Test time = 0.23 sec
----------------------------------------------------------
Test Failed.
"tlsh_unittest_len" end time: Oct 15 00:00 CEST
"tlsh_unittest_len" time elapsed: 00:00:00
----------------------------------------------------------
2/2 Testing: tlsh_unittest_xlen
2/2 Test: tlsh_unittest_xlen
Command: "/tmp/nix-build-tlsh-3.4.1.drv-0/source/Testing/test.sh" "-xlen"
Directory: /tmp/nix-build-tlsh-3.4.1.drv-0/build/Testing
"tlsh_unittest_xlen" start time: Oct 15 00:00 CEST
Output:
----------------------------------------------------------
HASH is 128
CHKSUM is 1
Running, not considering len, ...
test 1
../bin/tlsh_unittest -r ../Testing/example_data > tmp/example_data.out
passed
test 2
../bin/tlsh_unittest -r ../Testing/example_data -c ../Testing/example_data/website_course_descriptors06-07.txt > tmp/example_data.scores
passed
test 3
../bin/tlsh_unittest -l tmp/example_data.out -c ../Testing/example_data/website_course_descriptors06-07.txt > tmp/example_data.scores.2
passed
test 4
../bin/tlsh_unittest -xref -r ../Testing/example_data tmp/example_data.xref.scores
passed
test 5
../bin/tlsh_unittest -T 201 -l tmp/example_data.out -c ../Testing/example_data/website_course_descriptors06-07.txt > tmp/example_data.scores.2.T-201
passed
Running, considering len, ...
test 1
../bin/tlsh_unittest -r ../Testing/example_data > tmp/example_data.out
passed
test 2
../bin/tlsh_unittest -xlen -r ../Testing/example_data -c ../Testing/example_data/website_course_descriptors06-07.txt > tmp/example_data.scores
passed
test 3
../bin/tlsh_unittest -xlen -l tmp/example_data.out -c ../Testing/example_data/website_course_descriptors06-07.txt > tmp/example_data.scores.2
passed
test 4
../bin/tlsh_unittest -xref -xlen -r ../Testing/example_data tmp/example_data.xref.scores
passed
test 5
../bin/tlsh_unittest -T 201 -xlen -l tmp/example_data.out -c ../Testing/example_data/website_course_descriptors06-07.txt > tmp/example_data.scores.2.T-201
passed
Running simple_unittest
10c10
< hash4 = 9A1124198C869A5A4F0F9380A9AE92F2B9278F42089EA34272885F0FB2D34E6911444C
---
> hash4 = 301124198C869A5A4F0F9380A9AE92F2B9278F42089EA34272885F0FB2D34E6911444C
error: diff tmp/simple_unittest.out exp/simple_unittest_EXP
<end of output>
Test time = 0.22 sec
----------------------------------------------------------
Test Failed.
"tlsh_unittest_xlen" end time: Oct 15 00:00 CEST
"tlsh_unittest_xlen" time elapsed: 00:00:00
----------------------------------------------------------
End testing: Oct 15 00:00 CEST
Any idea what's wrong?
I am using Python 3.4.2 on Ubuntu 14.10 and I am getting the following error when executing this script:
import tlsh
data = "hello"
print(tlsh.hash(data))
Traceback (most recent call last):
File "script.py", line 4, in <module>
print(tlsh.hash(data))
TypeError: must be impossible<bad format char>, not str
Could you explain how to fix this issue?
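In Python 3, str is Unicode text, and the error indicates that the extension expects bytes. A minimal sketch, assuming tlsh.hash accepts bytes (as another report on this page found when dropping str()), is to encode the text explicitly; to_hash_input is a hypothetical helper name.

```python
def to_hash_input(text: str, encoding: str = "utf-8") -> bytes:
    """Encode text explicitly; Python 3 str objects are Unicode, not raw bytes."""
    return text.encode(encoding)
```

Note that "hello" encodes to only 5 bytes, well below the 256-byte minimum quoted elsewhere on this page, so even after encoding tlsh is unlikely to produce a digest for it.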
I read that TLSH version 3.5.1 has a force option to use a minimum string length of 50 instead of 256. I am working with shorter texts, and this option would be very useful. But I cannot figure out how to use it in the Python API. Any ideas?
Currently getting this issue:
C:\Working\git\tlsh>cd py_ext
C:\Working\git\tlsh\py_ext>python setup.py build
running build
running build_ext
building 'tlsh' extension
error: Unable to find vcvarsall.bat
I'm just following the tips here:
https://blogs.msdn.microsoft.com/pythonengineering/2016/04/11/unable-to-find-vcvarsall-bat/
If there is something I can do in the future, I'll try to contribute, but it looks unlikely in the short term with other deadlines to meet.
Will return to my Linux terminal for now.
It seems that the repository with the Java port (linked from README.md) no longer exists.
Hi,
CMake fails on d5e149a with:
CMake Warning (dev) at CMakeLists.txt:155 (add_subdirectory):
The source directory
F:/external/tlsh/Windows
does not contain a CMakeLists.txt file.
Good Morning,
I have built the C++ library using Visual Studio 2017. The build was successful and I got tlsh.dll. In my project I have included the header files from the include directory, linked the library files and moved the DLL to my project folder. When I rebuilt my project, which uses your library, I got an error (missing header file).
Upon inspecting further, I see that the error fires in the tlsh.h header file on line 63: it jumps to the else branch, which includes the "version.h" header file. I can see the "win_version.h" header file in the Windows directory, but the version.h header file is missing, or it is not properly jumping to include "win_version.h":
#ifdef WINDOWS
#include "win_version.h"
#else
#include "version.h"
#endif
Am I missing something? Could you give me some insights and example code to use this project for my C++ application.
Thank you,
Visweswaran N
Hi ,
I am following the steps mentioned in the README to get tlsh on my CentOS 6.5 machine.
I installed cmake and ran make.sh successfully.
But when I try to execute python setup.py build to build the Python extension, it gives the errors below.
running build
running build_ext
building 'tlsh' extension
gcc -pthread -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/data/tlsh-master/include -I/usr/local/include/python3.4m -c tlshmodule.cpp -o build/temp.linux-x86_64-3.4/tlshmodule.o
cc1plus: warning: command line option "-Wstrict-prototypes" is valid for Ada/C/ObjC but not for C++
tlshmodule.cpp:169: warning: missing braces around initializer for ‘PyObject’
tlshmodule.cpp: In function ‘PyObject* Tlsh_fromTlshStr(tlsh_TlshObject*, PyObject*)’:
tlshmodule.cpp:226: error: ‘TLSH_STRING_LEN’ was not declared in this scope
tlshmodule.cpp: In function ‘PyObject* Tlsh_update(tlsh_TlshObject*, PyObject*)’:
tlshmodule.cpp:253: error: ‘MIN_DATA_LENGTH’ was not declared in this scope
tlshmodule.cpp: In function ‘PyObject* Tlsh_final(tlsh_TlshObject*)’:
tlshmodule.cpp:269: error: ‘MIN_DATA_LENGTH’ was not declared in this scope
tlshmodule.cpp: In function ‘PyObject* Tlsh_hexdigest(tlsh_TlshObject*)’:
tlshmodule.cpp:281: error: ‘TLSH_STRING_LEN’ was not declared in this scope
tlshmodule.cpp:287: error: ‘hash’ was not declared in this scope
tlshmodule.cpp: In function ‘PyObject* Tlsh_diff(tlsh_TlshObject*, PyObject*)’:
tlshmodule.cpp:321: error: ‘TLSH_STRING_LEN’ was not declared in this scope
error: command 'gcc' failed with exit status 1
Kindly assist if I am missing some dependencies or something.
Thanks in advance.
After installing 'python-tlsh', I can't import 'tlsh' in Python.
How I installed 'python-tlsh':
$ dpkg -L python-tlsh
/.
/usr
/usr/share
/usr/share/doc
/usr/share/doc/python-tlsh
/usr/share/doc/python-tlsh/copyright
/usr/share/doc/python-tlsh/changelog.Debian.gz
/usr/lib
/usr/lib/python2.7
/usr/lib/python2.7/dist-packages
/usr/lib/python2.7/dist-packages/tlsh.x86_64-linux-gnu.so
/usr/lib/python2.7/dist-packages/tlsh-0.2.0.egg-info
$ python -c 'import tlsh'
Traceback (most recent call last):
File "<string>", line 1, in <module>
ImportError: No module named tlsh
How can I import 'python-tlsh'?
tlsh-3.4.5/utils/rand_tags.cpp:369:42: warning: ordered comparison of pointer with integer zero [-Wextra]
The source code is:
if ((*ntags <= 0) && (ndistinct_tags <= 0))
return;
Perhaps better code would be:
if ((*ntags <= 0) || (*ndistinct_tags <= 0))
return;
I'm almost done with an R port/interface for this very clean and easy-to-use C++ implementation.
If desired, I can also add it to this repo (since it houses many other ports/interfaces). If it is desired, just let me know how you'd like me to do it (as a copy or a remote, PR process, etc.).
thx,
-boB
Dear developers,
I've made available a Java version of the TLSH algorithm that you find at https://github.com/triplecheck/TLSH
In essence, the relevant code that performs the hashing is found inside a single Java source code file at https://github.com/triplecheck/TLSH/blob/master/sources/TLSH.java
Please verify if due credits are provided and/or if any correction is necessary. The main() method contains usage examples. To apply in other Java projects one just needs to copy TLSH.java into the target project.
Performance-wise, in our experiments the code produced an average of 200 million hashes/day for a subset of the files in our archive (Pentium i7, 4 cores).
-- Nuno