dongdongshe / neuzz
neural network assisted fuzzer
License: Other
I asked this in a closed question, but would rather start a new discussion.
So I've successfully run NEUZZ, thanks to your help! Now I've got the crash information, the bitmaps, and the seeds.
How do I view each of them in an interpretable manner? The crash information is in a one-dimensional array, while the bitmaps and seeds are in an ELF file.
WARNING: tensorflow:From nn.py:18: The name tf.set_random_seed is deprecated. Please use tf.compat.v1.set_random_seed instead.
Your paper is well written, and the idea is innovative.
I have some questions about the Mutation and Retraining part of Section V, IMPLEMENTATION.
I'm trying to run neuzz on a new program and am currently setting it up.
I've compiled it using the gcc command given and currently have a folder named example with example.c and its compiled version example in it. I can't seem to run afl on the example for this step:
Collect the training data by running AFL on the binary for a while(about an hour), then copy the queue folder to neuzz_in.
I'm running neuzz in a Linux environment.
How do I generate more training data on the programs (readelf, etc.) and then view it?
After switching to a GPU-based Azure machine, I've run into another problem I can't fix.
This is what my readelf folder looks like now
Whenever I run the two modules separately, this error is returned:
This is the output in the other terminal:
Hello, when I run NEUZZ it gets stuck like this (the Python module prints: connected by neuzz execution moduel ('127.0.0.1', 56218)):
num_index 4096 7505 small 2048 medium 4096 large 7505
mutation len: 7506
Checking CPU scaling governor...
You have 8 CPU cores and 13 runnable tasks (utilization: 162%).
System under apparent load, performance may be spotty.
Checking CPU core loadout...
Found a free CPU core, binding to #1.
Setting up output directories...Spinning up the fork server...
Do you have any suggestions?
This is more a question about understanding the program. After running both
python nn.py ./readelf -a
./neuzz -i neuzz_in -o seeds -l 7506 ./readelf -a @@
The program runs successfully. What does the accuracy reported during and after each epoch concretely represent? I couldn't work it out from your paper.
In line 1726 in function dry_run, the code is
else if(fault = FAULT_TMOUT){
which I believe it should be
else if(fault == FAULT_TMOUT){
Still, it does not seem to have affected the overall execution. Perhaps that is because no generated input led to a timeout?
It looks like Neuzz will currently crash if no new edges are uncovered during a particular round, because new_seed_list will be empty.
Backtrace:
Epoch 100/100
1/2 [===========>..................] - ETA: 0s - batch: 0.0000e+00 - size: 8.0000 - loss: 0.2662 - accur_1: 0.78382.8247524899999983e-05
3/2 [====================================] - 0s 2ms/step - batch: 1.0000 - size: 10.6667 - loss: 0.2772 - accur_1: 0.7724
#######debug1
Traceback (most recent call last):
File "nn.py", line 417, in <module>
setup_server()
File "nn.py", line 411, in setup_server
gen_grad(data)
File "nn.py", line 392, in gen_grad
gen_mutate2(model, 500, data[:5] == b"train")
File "nn.py", line 316, in gen_mutate2
rand_seed1 = [new_seed_list[i] for i in np.random.choice(len(new_seed_list), edge_num, replace=True)]
File "mtrand.pyx", line 894, in numpy.random.mtrand.RandomState.choice
ValueError: a must be greater than 0 unless no samples are taken
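The traceback above comes from calling np.random.choice with a population size of 0. A minimal sketch of one possible guard follows; it is not the project's fix. The helper name pick_seeds and the fallback_seed_list parameter are hypothetical, and falling back to an older seed list is an assumption about what a reasonable recovery would be.

```python
import numpy as np

def pick_seeds(new_seed_list, fallback_seed_list, edge_num, rng=None):
    """Sample edge_num seeds with replacement.

    Hypothetical helper: if no new seeds were produced this round,
    sample from fallback_seed_list instead of crashing.
    """
    if rng is None:
        rng = np.random.default_rng()
    pool = new_seed_list if len(new_seed_list) > 0 else fallback_seed_list
    if len(pool) == 0:
        # Nothing to sample from at all: fail with a clear message.
        raise ValueError("no seeds available to sample from")
    idx = rng.choice(len(pool), edge_num, replace=True)
    return [pool[i] for i in idx]

# An empty new_seed_list no longer raises; the fallback pool is used.
picked = pick_seeds([], ["seed_a", "seed_b"], edge_num=5)
```

Whether sampling from older seeds is the right behaviour for a round with no new coverage is a design question for the maintainers; the sketch only shows that the empty case can be detected before the sampling call.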
I wanted to ask whether there is a way to use neuzz on binaries for which you do not have the source code.
Thank you.
Hi!
I have been reading your paper and code recently, and they're really good. But I have some difficulty understanding the following code in get_adv2() in nn.py:
adv_list = []
loss = layer_list[-2][1].output[:, f]
grads = K.gradients(loss, model.input)[0]
iterate = K.function([model.input], [loss, grads])
What does 'loss' mean here? Does it mean the specific loss of the f-th output neuron?
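Read literally, the quoted lines take column f of a layer's output and differentiate it with respect to the model input, so `loss` there is the value of one output unit rather than a training loss. The NumPy sketch below illustrates that idea on a toy single-layer sigmoid model. The model, its sizes, and the function name output_neuron_gradient are assumptions for illustration and are not taken from nn.py.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def output_neuron_gradient(W, b, x, f):
    """Return (value, gradient) of output neuron f with respect to input x.

    Toy model (an assumption, not NEUZZ's network): y = sigmoid(x @ W + b).
    """
    y_f = sigmoid(x @ W[:, f] + b[f])
    # Chain rule: d sigmoid(z) / dx = sigmoid(z) * (1 - sigmoid(z)) * dz/dx
    grad = y_f * (1.0 - y_f) * W[:, f]
    return y_f, grad

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 4))   # 8 input bytes, 4 output neurons
b = rng.normal(size=4)
x = rng.random(8)             # input bytes scaled to [0, 1]

value, grad = output_neuron_gradient(W, b, x, f=2)
# Input positions ranked by how strongly they influence neuron f.
top_bytes = np.argsort(-np.abs(grad))[:3]
```

In this toy setting, the positions in top_bytes are the input bytes whose change moves output neuron 2 the most, which is one way such a gradient can be used to choose mutation locations.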
If fl1 is two bytes or less, splice_seed will loop infinitely, because (l_diff - f_diff) >= 2 will never be true. To demonstrate the issue I pulled out the splice_seed function into its own file (attached) and then ran:
$ dd if=/dev/zero of=file1 bs=1 count=2 # Create a two-byte file
$ for i in `seq 2 100` ; do dd if=/dev/urandom of=file$i bs=1 count=$[ $RANDOM % 521 ] ; done # Create a bunch of other files with random data
$ python3 splice.py file1 file{2..100}
3 splice.py file1 file* | head
0 0
0 1
0 1
0 1
[...]
This does actually come up in practice, as I found when trying to reproduce the harfbuzz results:
moyix@isabella:~/git/neuzz/programs/harfbuzz$ ls -Sl seeds/ | tail
-rw------- 1 moyix moyix 41 Oct 17 17:39 id_0_000696
-rw------- 1 moyix moyix 30 Oct 17 17:47 id_0_001100
-rw------- 1 moyix moyix 16 Oct 17 17:44 id_0_000968
-rw------- 1 moyix moyix 15 Oct 17 17:53 id_0_001270
-rw------- 1 moyix moyix 8 Oct 17 18:36 id_1_001848_cov
-rw------- 1 moyix moyix 7 Oct 17 18:06 id_0_001567
-rw------- 1 moyix moyix 6 Oct 17 18:36 id_1_001849
-rw------- 1 moyix moyix 4 Oct 17 19:35 id_1_002991
-rw------- 1 moyix moyix 3 Oct 17 19:38 id_1_003024_cov
-rw------- 1 moyix moyix 2 Oct 17 21:01 id_2_003989
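One way to avoid the endless loop described above is to reject seeds that can never satisfy the condition and to cap the number of retries. The sketch below is a rewrite for illustration, not the splice_seed code from nn.py: it works on bytes objects rather than file paths, and the names find_diff_range, try_splice, splice_with_retry and the max_attempts parameter are hypothetical.

```python
import random

def find_diff_range(a: bytes, b: bytes):
    """Return (first, last) differing index within the common length, or None."""
    n = min(len(a), len(b))
    diffs = [i for i in range(n) if a[i] != b[i]]
    if not diffs:
        return None
    return diffs[0], diffs[-1]

def try_splice(a: bytes, b: bytes, rng):
    """Splice a and b at a random point between their first and last difference."""
    diff = find_diff_range(a, b)
    if diff is None:
        return None
    f_diff, l_diff = diff
    if l_diff - f_diff < 2:
        return None  # no interior position to splice at
    splice_at = f_diff + rng.randint(1, l_diff - f_diff - 1)
    return a[:splice_at] + b[splice_at:]

def splice_with_retry(seed: bytes, candidates, max_attempts=32, rng=None):
    """Try up to max_attempts partners; return None instead of looping forever."""
    if rng is None:
        rng = random.Random()
    # With two bytes or fewer, (l_diff - f_diff) >= 2 can never hold.
    if len(seed) <= 2 or not candidates:
        return None
    for _ in range(max_attempts):
        out = try_splice(seed, rng.choice(candidates), rng)
        if out is not None:
            return out
    return None

# A two-byte seed is skipped immediately rather than retried indefinitely.
result = splice_with_retry(b"\x00\x00", [b"abcdef", b"ghijkl"])
```

Returning None leaves it to the caller to decide whether to skip the seed or apply a different mutation.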
Dear Dongdong She,
I got the following errors when building. How should I handle them?
$ gcc -O3 -funroll-loops ./neuzz.c -o neuzz
./neuzz.c: In function ‘copy_seeds’:
./neuzz.c:1820:26: warning: ‘%s’ directive writing up to 255 bytes into a region of size 127 [-Wformat-overflow=]
1820 | sprintf(src, "%s/%s", in_dir, de->d_name);
| ^~
In file included from /usr/include/stdio.h:867,
from ./neuzz.c:3:
/usr/include/x86_64-linux-gnu/bits/stdio2.h:36:10: note: ‘__builtin___sprintf_chk’ output 2 or more bytes (assuming 257) into a destination of size 128
36 | return __builtin___sprintf_chk (__s, __USE_FORTIFY_LEVEL - 1,
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
37 | __bos (__s), __fmt, __va_arg_pack ());
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
./neuzz.c:1821:26: warning: ‘%s’ directive writing up to 255 bytes into a region of size 127 [-Wformat-overflow=]
1821 | sprintf(dst, "%s/%s", out_dir, de->d_name);
| ^~
In file included from /usr/include/stdio.h:867,
from ./neuzz.c:3:
/usr/include/x86_64-linux-gnu/bits/stdio2.h:36:10: note: ‘__builtin___sprintf_chk’ output 2 or more bytes (assuming 257) into a destination of size 128
36 | return __builtin___sprintf_chk (__s, __USE_FORTIFY_LEVEL - 1,
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
37 | __bos (__s), __fmt, __va_arg_pack ());
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
My environment is as follows:
I'm trying to run my new program on neuzz.
Ubuntu 18.0.4. My folders look like this inside of the terminal with neuzz_in having all the training examples as a result of running afl-fuzz on example.exe (compiled program).
When I run
python ./example.exe -a
and
./neuzz -i neuzz_in -o seeds -l 7506 ./example.exe -a @@
I get this error:
In the other terminal, I get this error:
I tried to launch a fuzzing campaign on tiff2pdf expecting to find vulnerabilities there, but ended up finding vulnerabilities in the fuzzer itself.
In fact, I was not able to fuzz tiff2pdf at all (with an initial corpus of 216 files whose sizes are around 200 bytes), since the fuzzer crashed with a segmentation fault, as you can see at [6].
The crash is at [1], where a negative (and therefore, once converted to unsigned, far too large) length is passed to memcpy. Why does it happen? At line [2] the input is indeed sanitized by ignoring differences that are smaller than or equal to 2.
However, len is of type size_t whereas del_loc is of type int, therefore len-del_loc is unsigned and thus it fails to check for locations that are higher than len.
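The wrap-around can be reproduced with plain integer arithmetic. The sketch below assumes a 64-bit size_t and emulates the conversion with a mask; as_size_t and safe_copy_len are hypothetical helpers, and the values 100 and 200 are made up for illustration. The value -748451 is the signed reading of rdx shown in the register dump at [6].

```python
MASK64 = (1 << 64) - 1  # assumes size_t is 64 bits wide

def as_size_t(value: int) -> int:
    """Reinterpret a possibly negative integer as a 64-bit unsigned value."""
    return value & MASK64

def safe_copy_len(length: int, del_loc: int):
    """Return the byte count after del_loc, or None if del_loc is out of range."""
    if del_loc < 0 or del_loc >= length:
        return None
    return length - del_loc

# A location past the end of the buffer: the unsigned difference is huge,
# so a test of the form "difference <= 2" does not reject it.
unsigned_diff = as_size_t(100 - 200)
passes_small_check = unsigned_diff <= 2   # False, so the copy would proceed

# The negative length from the register dump, seen as an unsigned count.
wrapped = as_size_t(-748451)
print(hex(wrapped))  # 0xfffffffffff4945d
```

Checking the location against the length before subtracting, as safe_copy_len does, avoids relying on the sign of the difference.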
The reason the location is higher than the length in the first place is another bug, namely an uninitialized memory error.
The location table is allocated as int loc[10000]; but is left uninitialized. Then, the input is parsed at [3]. Unfortunately, at [4], loc is expected to have 1024 or more entries, so garbage is read if there are too few.
Moreover, trying to launch neuzz on the supplied testset would produce a mess in the directory, as shown at [5]. I haven't analyzed the cause of this yet. Note that I followed exactly the described steps to reproduce the results.
Kindly let me know if you require further information or help.
Best,
Andy Nguyen from ETH Zurich
[1] https://github.com/Dongdongshe/neuzz/blob/master/neuzz.c#L1318
[2] https://github.com/Dongdongshe/neuzz/blob/master/neuzz.c#L1312
[3] https://github.com/Dongdongshe/neuzz/blob/master/neuzz.c#L1871
[4] https://github.com/Dongdongshe/neuzz/blob/master/neuzz.c#L1310
[5] https://imgur.com/a/vvKB7HP
[6] Stack backtrace
Program received signal SIGSEGV, Segmentation fault.
__memmove_avx_unaligned_erms ()
at ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:522
522 ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S: No such file or directory.
(gdb) bt
#0 __memmove_avx_unaligned_erms ()
at ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:522
#1 0x00005555555589a9 in gen_mutate ()
#2 0x000055555555ea05 in fuzz_lop.constprop ()
#3 0x000055555555f667 in start_fuzz ()
#4 0x0000555555555f5e in main ()
(gdb) ir
rax 0x5555558198c3 93824995137731
rbx 0x555555795a60 93824994597472
rcx 0x555555761047 93824994381895
rdx 0xfffffffffff4945d -748451
rsi 0x555555817c0a 93824995130378
rdi 0x5555558198c3 93824995137731
rbp 0x33 0x33
rsp 0x7fffffffdea8 0x7fffffffdea8
r8 0x9 9
r9 0x555555762d00 93824994389248
r10 0x5555558166a4 93824995124900
r11 0x555555818e09 93824995134985
r12 0x55555578be20 93824994557472
r13 0xb2b08e9c 2997915292
r14 0x801 2049
r15 0xa67 2663
rip 0x7ffff7b72e4b 0x7ffff7b72e4b <__memmove_avx_unaligned_erms+891>
eflags 0x10286 [ PF SF IF RF ]
cs 0x33 51
ss 0x2b 43
ds 0x0 0
es 0x0 0
---Type <return> to continue, or q <return> to quit---
fs 0x0 0
gs 0x0 0
(gdb)
Hello, I want to ask about the handling of crashes. How did you deal with these crashes? Are there any tools that can be used for reference? Thank you!
Hi, Dongdong!
I have been reading your NEUZZ paper recently and it is really well written. I have some questions about the details in this paper.
"Furthermore, we only consider the edges that have been activated at least once in the training data."
"Intuitively, in our setting, the goal of gradient-based guidance is to find inputs that will change the output of the final
layer neurons corresponding to different edges from 0 to 1"
The goal of NEUZZ is to find as many new edges in the target program as possible. But when you build the NN model, you only consider the edges that have been activated at least once in the training data, then select some output neurons and compute gradients to guide future mutation, and the final goal is to "change the output of the final layer neurons corresponding to different edges from 0 to 1". Since the output neurons of the final layer represent edges already found by the training data, what is the point of trying to change a specific output neuron from 0 to 1? (I mean, the edge represented by this neuron has already been found by some input in the training data, so why does NEUZZ try to find it again?) Why don't we also consider the edges that have not been activated in the training data, and try to change the output of the final layer neurons corresponding to these edges from 0 to 1? Wouldn't that mean we have found inputs that trigger new edges not activated by the training data?
What does "unexplored edges" mean here? In the source code these edges are randomly chosen at every iteration. How does that ensure that these edges are the "unexplored edges"?
Thanks a lot!
I'm currently trying to run neuzz on the readelf program and keep getting this error: Unable to execute programs/readelf/readelf
Ubuntu 18.0.4
Installed tensorflow and keras using pip
Then I built neuzz using this line:
gcc -O3 -funroll-loops ./neuzz.c -o neuzz
Now, when I open two terminals and run
python nn.py ./readelf -a
in one of them, and
./neuzz -i neuzz_in -o seeds -l 7506 ./readelf -a @@
in another, I get an error that says that the readelf file is not executable. I've checked the properties of it, and it says that it is executable.