
grammar-mutator's Introduction

Grammar Mutator - AFL++


A grammar-based custom mutator for AFL++ to handle highly-structured inputs.

Overview

We developed a grammar mutator to enhance AFL++ so that it can handle highly-structured inputs, such as JSON, Ruby, etc. The grammar mutator leverages the ideas of the F1 fuzzer and Nautilus for test case generation and mutation. In summary, this repository includes:

  • Tree-based mutation: rules mutation, random mutation, random recursive mutation, splicing mutation
  • Tree-based trimming: subtree trimming, recursive trimming
  • An ANTLR4 shim for parsing fuzzing test cases at runtime
  • Documentation on how to build the grammar mutator, specify custom grammars, and use the grammar mutator
  • Comprehensive test cases for unit testing
  • Sample grammar files and a script to convert Nautilus's Python grammar files

For more details about tree-based mutation, trimming, and grammar-based fuzzing, please refer to the Nautilus paper.
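To make the tree-based approach concrete, here is a minimal sketch of grammar-driven generation in the spirit of F1/Nautilus: expand the start symbol by randomly picking alternatives until a depth budget runs out. The toy grammar and `generate` function below are illustrative only, not code from this repository:

```python
import random

# Toy grammar in the same shape as this project's JSON grammar files:
# each non-terminal maps to a list of alternatives, and each alternative
# is a list of terminals and <non-terminals>.
GRAMMAR = {
    "<start>": [["[", "<value>", "]"]],
    "<value>": [["1"], ["true"], ["<value>", ",", "<value>"]],
}

def generate(symbol, max_depth):
    """Randomly expand `symbol`; once the depth budget is exhausted,
    fall back to the first alternative (assumed non-recursive here)."""
    if symbol not in GRAMMAR:
        return symbol  # terminal: emit as-is
    alts = GRAMMAR[symbol]
    alt = alts[0] if max_depth <= 0 else random.choice(alts)
    return "".join(generate(s, max_depth - 1) for s in alt)

print(generate("<start>", max_depth=8))
```

A larger depth budget yields deeper trees and therefore more complex strings, which mirrors the `max_size` knob of the real grammar_generator.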

A fuzzing writeup on Apache that uses the AFL++ Grammar Mutator can be found here: https://securitylab.github.com/research/fuzzing-apache-1

Getting Started

Prerequisites

Before getting started, the following tools/packages should be installed:

sudo apt install valgrind uuid-dev default-jre python3
wget https://www.antlr.org/download/antlr-4.8-complete.jar
sudo cp -f antlr-4.8-complete.jar /usr/local/lib

If you do not leave the JAR file in the Grammar-Mutator directory or copy it to /usr/local/lib, you must specify its location via ANTLR_JAR_LOCATION=... in the make command.

Note that the grammar mutator is based on the latest custom mutator APIs in AFL++, so please use the latest dev or stable branch of AFL++.

git clone https://github.com/AFLplusplus/AFLplusplus.git
cd AFLplusplus
make distrib
sudo make install

Building the Grammar Mutator

Next you need to build the grammar mutator. To specify the grammar file, e.g. Ruby, use the GRAMMAR_FILE environment variable. There are several grammar files in the grammars directory, such as json.json, ruby.json, and http.json. Please refer to customizing-grammars.md for more details about the input grammar file. Note that pull requests with new grammars are welcome! :-)

make GRAMMAR_FILE=grammars/ruby.json

Note that the shared library and grammar generator are named after the specified grammar file, so you can have multiple grammars generated. The grammar name is derived from the filename with everything after the first underscore, dash, or dot cut off; hence ruby.json results in ruby, and grammar_generator-ruby and libgrammarmutator-ruby.so will be created. You can specify your own name by passing GRAMMAR_FILENAME=yourname as a make option.
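The naming rule can be expressed in a few lines; the following is a hypothetical re-implementation for illustration, not the build system's actual logic:

```python
import re

def grammar_name(filename):
    """Derive the grammar name the way described above: take the
    basename and cut everything after the first '_', '-', or '.'."""
    base = filename.rsplit("/", 1)[-1]
    return re.split(r"[_\-.]", base, maxsplit=1)[0]

print(grammar_name("grammars/ruby.json"))  # ruby
```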

Now you should see two symbolic links, libgrammarmutator-ruby.so and grammar_generator-ruby, under the root directory. The actual files are located in the src directory.

If you would like to fork the project and fix bugs or contribute to the project, you can take a look at building-grammar-mutator.md for full building instructions.

Instrumenting Fuzzing Targets

You can refer to sample-fuzzing-targets.md to build the example fuzzing targets.

Seeds

Before fuzzing the real program, you need to prepare input fuzzing seeds. You can either:

  • Generate seeds for a given grammar
  • Use existing seeds

Using Existing Seeds

You can feed your own fuzzing seeds to the fuzzer; they do not need to match your input grammar file. Assume the grammar mutator is built with grammars/ruby.json, a simplified Ruby grammar that does not cover all Ruby syntax. In this case, parsing errors will definitely occur. On a parsing error, the grammar mutator does not terminate but saves the erroneous portion as a terminal node in the tree, so that we do not lose too much information from the original test case.

To use, e.g., the test cases of the mruby project as input fuzzing seeds, just pass -i mruby/test/t to afl-fuzz when running the fuzzer (assuming mruby has been checked out with git clone https://github.com/mruby/mruby.git in the current directory).

Using Generated Seeds

grammar_generator can be used to generate input fuzzing seeds and corresponding tree files, following the grammar file that you specified when compiling the grammar mutator (i.e., GRAMMAR_FILE). You can control the number of generated seeds and the maximal size of the corresponding trees. Usually, the larger the tree size, the more complex the corresponding input seed.

# Usage
# ./grammar_generator-$GRAMMAR <max_num> <max_size> <seed_output_dir> <tree_output_dir> [<random seed>]
#
# <random seed> is optional
# e.g.:
./grammar_generator-ruby 100 1000 ./seeds ./trees

Afterwards copy the trees folder with that exact name to the output directory that you will use with afl-fuzz (e.g. -o out -S default):

mkdir -p out/default
cp -r trees out/default

Note that if you use multiple fuzzers (-M/-S sync mode) then you have to do this for all fuzzer instances, e.g. when the fuzzer instances are named fuzzer1 to fuzzer8:

for i in 1 2 3 4 5 6 7 8; do
  mkdir -p out/fuzzer$i
  cp -r trees out/fuzzer$i/
done

Fuzzing the Target with the Grammar Mutator!

Let's start running the fuzzer. The following example command uses the Ruby grammar (from grammars/ruby.json), where the mruby project has been cloned into the root Grammar-Mutator directory.

The default memory limit for the child process in afl-fuzz is 75 MB. This may not be enough for some test cases, so it is recommended to increase it to 128 MB by adding the option -m 128.

export AFL_CUSTOM_MUTATOR_LIBRARY=./libgrammarmutator-ruby.so
export AFL_CUSTOM_MUTATOR_ONLY=1
afl-fuzz -m 128 -i seeds -o out -- /path/to/target @@

You may notice that the fuzzer seems stuck for a while at the beginning of fuzzing. One reason for this delay is the large max_size (i.e., 1000) we chose, which results in large test cases that increase the loading time. Another reason is the costly parsing in the grammar mutator: since the input seeds are strings, the grammar mutator must first parse them into tree representations, which is expensive. The large max_size passed to grammar_generator-$GRAMMAR does help generate deeply nested trees, but it further increases the parsing overhead.

Changing the Default Configurations

Apart from the deterministic rules mutation, users can change the default number of mutations for the following three types by setting the related environment variables:

  • RANDOM_MUTATION_STEPS: the number of random mutations
  • RANDOM_RECURSIVE_MUTATION_STEPS: the number of random recursive mutations
  • SPLICING_MUTATION_STEPS: the number of splicing mutations

By default, each of these three mutation counts is 1000. Increase them as needed. :)

export RANDOM_MUTATION_STEPS=10000
export RANDOM_RECURSIVE_MUTATION_STEPS=10000
export SPLICING_MUTATION_STEPS=10000
export AFL_CUSTOM_MUTATOR_LIBRARY=./libgrammarmutator-ruby.so
export AFL_CUSTOM_MUTATOR_ONLY=1
afl-fuzz -m 128 -i seeds -o out -- /path/to/target @@

Contact & Contributions

We welcome any questions and contributions! Feel free to open an issue or submit a pull request!

grammar-mutator's People

Contributors

0x7fancy, andreafioraldi, h1994st, hexcoder-, patricklopdrup, realmadsci, vanhauser-thc


grammar-mutator's Issues

Segmentation fault when dealing with hex-ASCII conversion

Hi, I ran into some problems when trying to generate a hex corpus and use it in a fuzzing run.

The version I use is the AFL++ 4.01a release and the latest Grammar-Mutator from the stable branch. The fuzz target is compiled with afl-gcc-fast.

I'm trying to generate seeds based on the grammar shown below, following the solution in issue #29.

{
    "<start>": [["hex: ", "<hex>", "<hex2>"]],
    "<hex>": [["\u0087"], ["\u005a"]], 
    "<hex2>":[["\u0000"], ["\u0001"], ["\u0002"], ["\u0003"], ["\u0004"], ["\u0005"], ["\u0006"], ["\u0007"],
              ["\u0008"], ["\u0009"], ["\u000a"], ["\u000b"], ["\u000c"], ["\u000d"], ["\u000e"], ["\u000f"],
              ["\u0010"], ["\u0011"], ["\u0012"], ["\u0013"], ["\u0014"], ["\u0015"], ["\u0016"], ["\u0017"],
              ["\u0018"], ["\u0019"], ["\u001a"], ["\u001b"], ["\u001c"], ["\u001d"], ["\u001e"], ["\u001f"]]
}

I can successfully build the grammar mutator without any error.

Seeds can be generated using the grammar generator. I tested a few of them and they seem to be what I expected.

But when running afl-fuzz for the target, it will cause a segmentation fault before going into the fuzzing interface.

[*] Attempting dry run with 'id:000099,time:0,execs:0,orig:0'...
    len = 7, map size = 172, exec speed = 25 us
[!] WARNING: No new instrumentation output, test case may be useless.
[+] All test cases processed.
[!] WARNING: Some test cases look useless. Consider using a smaller set.
[!] WARNING: You have lots of input files; try starting small.
[+] Here are some useful stats:

    Test case count : 1 favored, 1 variable, 98 ignored, 100 total
       Bitmap range : 172 to 172 bits (average: 172.00 bits)
        Exec timing : 31 to 112 us (average: 28 us)

[*] No -t option specified, so I'll use an exec timeout of 20 ms.
[+] All set and ready to roll!
Segmentation fault

When I replaced "<hex>": [["\u0087"], ["\u005a"]], with "<hex>": [["\u001f"], ["\u001f"]] (some smaller numbers) in the grammar, the fuzzer works fine.

Can someone help me with this problem? Any help is much appreciated.

Let me know if any other information is needed.

Grammar mutator issue : _pick_non_term_node

Hello.

When running the Grammar Mutator on a target, a problem occurs right before AFL++ starts fuzzing.

Here is the log:

mic@mic-System-Product-Name:~/Documents/AFLplusplus$ ./afl-fuzz -m 128 -d -i testcases/others/js/ -o myouts4 -- /home/mic/Documents/jerryscript/build/bin/jerry @@
[+] Loaded environment variable AFL_CUSTOM_MUTATOR_ONLY with value 1
[+] Loaded environment variable AFL_CUSTOM_MUTATOR_LIBRARY with value /home/mic/Documents/AFLplusplus/custom_mutators/grammar_mutator/grammar_mutator/libgrammarmutator-javascript.so
afl-fuzz++4.00c based on afl by Michal Zalewski and a large online community
[+] afl++ is maintained by Marc "van Hauser" Heuse, Heiko "hexcoder" Eißfeldt, Andrea Fioraldi and Dominik Maier
[+] afl++ is open source, get it at https://github.com/AFLplusplus/AFLplusplus
[+] NOTE: This is v3.x which changes defaults and behaviours - see README.md
[+] No -M/-S set, autoconfiguring for "-S default"
[*] Getting to work...
[+] Using exponential power schedule (FAST)
[+] Enabled testcache with 50 MB
[*] Checking core_pattern...
[*] Checking CPU scaling governor...
[+] You have 24 CPU cores and 2 runnable tasks (utilization: 8%).
[+] Try parallel jobs - see docs/parallel_fuzzing.md.
[*] Setting up output directories...
[+] Output directory exists but deemed OK to reuse.
[*] Deleting old session data...
[+] Output dir cleanup successful.
[*] Checking CPU core loadout...
[+] Found a free CPU core, try binding to #0.
[*] Loading custom mutator library from '/home/mic/Documents/AFLplusplus/custom_mutators/grammar_mutator/grammar_mutator/libgrammarmutator-javascript.so'...
[*] optional symbol 'afl_custom_post_process' not found.
[*] optional symbol 'afl_custom_havoc_mutation' not found.
[*] optional symbol 'afl_custom_havoc_mutation_probability' not found.
[*] Symbol 'afl_custom_describe' not found.
[+] Custom mutator '/home/mic/Documents/AFLplusplus/custom_mutators/grammar_mutator/grammar_mutator/libgrammarmutator-javascript.so' installed successfully.
[*] Scanning 'testcases/others/js/'...
[+] Loaded a total of 1 seeds.
[*] Creating hard links for all input files...
[*] Validating target binary...
[*] Spinning up the fork server...
[+] All right - fork server is up.
[*] Target map size: 65536
[*] No auto-generated dictionary tokens to reuse.
[*] Attempting dry run with 'id:000000,time:0,execs:0,orig:small_script.js'...
    len = 20, map size = 1386, exec speed = 174 us
[+] All test cases processed.
[+] Here are some useful stats:

    Test case count : 1 favored, 0 variable, 0 ignored, 1 total
       Bitmap range : 1386 to 1386 bits (average: 1386.00 bits)
        Exec timing : 174 to 174 us (average: 174 us)

[*] No -t option specified, so I'll use an exec timeout of 20 ms.
[+] All set and ready to roll!
_pick_non_term_node returns NULL: No such file or directory

_pick_non_term_node returns NULL: No such file or directory

Flags:

export RANDOM_MUTATION_STEPS=10000
export RANDOM_RECURSIVE_MUTATION_STEPS=10000
export SPLICING_MUTATION_STEPS=10000
export AFL_CUSTOM_MUTATOR_LIBRARY=./libgrammarmutator-javascript.so
export AFL_CUSTOM_MUTATOR_ONLY=1

Ubuntu 20.04
AFL++ 4.00

Any ideas?

Trimmed data returned by custom mutator is larger than original data

Error message:

[-] PROGRAM ABORT : Trimmed data returned by custom mutator is larger than original data
         Location : trim_case_custom(), src/afl-fuzz-mutators.c:287

The trimming strategies in the grammar mutator aim at reducing the tree size (i.e., the total number of non-terminal nodes). However, this does not guarantee the corresponding string is relatively small. For example, in JSON:

  • An input buffer is "\r-10", which is 4 bytes and has 16 non-terminal nodes.
  • The output trimmed buffer is "false", which is 5 bytes and has 6 non-terminal nodes.

Potential solution: we can allow the execution of the target, even if the trimmed data returned by the custom mutator is larger than the original data. This moves the responsibility of checking the trimming "size" to the custom mutator instead of the fuzzer.

(Need to think more on the solution)
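To make the conflict concrete, here is a toy Python model of the acceptance check (hypothetical; AFL++'s real check in trim_case_custom() compares byte lengths only):

```python
def accept_trim(orig_str, orig_nodes, trimmed_str, trimmed_nodes):
    """Model of the conflict: the mutator wants fewer tree nodes,
    while afl-fuzz insists the serialized string must not grow."""
    tree_shrank = trimmed_nodes < orig_nodes
    bytes_shrank = len(trimmed_str) <= len(orig_str)
    return tree_shrank and bytes_shrank

# The JSON example above: 16 nodes / 4 bytes trimmed to 6 nodes / 5 bytes.
# The tree shrank but the string grew, so afl-fuzz aborts.
print(accept_trim("\r-10", 16, "false", 6))  # False
```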

Issue with recursive javascript grammar

Hi @h1994st

I'm trying to use Nautilus grammars. The Ruby grammar works just fine, but with the JavaScript grammar I get an error.

cityoflight@v8:~/Grammar-Mutator$ make -j8 GRAMMAR_FILE=grammars/javascript.json
Found antlr-4.8-complete: /usr/local/lib/antlr-4.8-complete.jar
Selected grammar name: javascript (from /home/cityoflight/Grammar-Mutator/grammars/javascript.json)
python3 grammars/f1_c_gen.py /home/cityoflight/Grammar-Mutator/grammars/javascript.json /home/cityoflight/Grammar-Mutator
python3 grammars/f1_c_gen.py /home/cityoflight/Grammar-Mutator/grammars/javascript.json /home/cityoflight/Grammar-Mutator
make[1]: Entering directory '/home/cityoflight/Grammar-Mutator/third_party'
make[2]: Entering directory '/home/cityoflight/Grammar-Mutator/third_party/Cyan4973_xxHash'
make[2]: Nothing to be done for 'all'.
make[2]: Leaving directory '/home/cityoflight/Grammar-Mutator/third_party/Cyan4973_xxHash'
make[2]: Entering directory '/home/cityoflight/Grammar-Mutator/third_party/rxi_map'
make[2]: Nothing to be done for 'all'.
make[2]: Leaving directory '/home/cityoflight/Grammar-Mutator/third_party/rxi_map'
make[2]: Entering directory '/home/cityoflight/Grammar-Mutator/third_party/antlr4-cpp-runtime'
make[2]: Nothing to be done for 'all'.
make[2]: Leaving directory '/home/cityoflight/Grammar-Mutator/third_party/antlr4-cpp-runtime'
make[1]: Leaving directory '/home/cityoflight/Grammar-Mutator/third_party'

^CTraceback (most recent call last):
  File "grammars/f1_c_gen.py", line 646, in <module>
Traceback (most recent call last):
  File "grammars/f1_c_gen.py", line 646, in <module>
    main(json.load(fp), sys.argv[2])
  File "grammars/f1_c_gen.py", line 632, in main
    fuzz_hdr, fuzz_src = CFuzzer(c_grammar).fuzz_src()
  File "grammars/f1_c_gen.py", line 317, in __init__
    main(json.load(fp), sys.argv[2])
  File "grammars/f1_c_gen.py", line 632, in main
    super().__init__(grammar)
  File "grammars/f1_c_gen.py", line 262, in __init__
    self.compute_rule_recursion()
  File "grammars/f1_c_gen.py", line 310, in compute_rule_recursion
    fuzz_hdr, fuzz_src = CFuzzer(c_grammar).fuzz_src()
  File "grammars/f1_c_gen.py", line 317, in __init__
    self.rule_recursion[n] = self.is_rule_recursive(n, rule, set())
  File "grammars/f1_c_gen.py", line 285, in is_rule_recursive
    v = self.is_rule_recursive(rname, trule, seen | {rn})
  File "grammars/f1_c_gen.py", line 285, in is_rule_recursive
    super().__init__(grammar)
  File "grammars/f1_c_gen.py", line 262, in __init__
    v = self.is_rule_recursive(rname, trule, seen | {rn})
  File "grammars/f1_c_gen.py", line 285, in is_rule_recursive
    self.compute_rule_recursion()
  File "grammars/f1_c_gen.py", line 310, in compute_rule_recursion
    v = self.is_rule_recursive(rname, trule, seen | {rn})
  [Previous line repeated 16 more times]
KeyboardInterrupt
    self.rule_recursion[n] = self.is_rule_recursive(n, rule, set())
  File "grammars/f1_c_gen.py", line 285, in is_rule_recursive
    v = self.is_rule_recursive(rname, trule, seen | {rn})
  File "grammars/f1_c_gen.py", line 285, in is_rule_recursive
    v = self.is_rule_recursive(rname, trule, seen | {rn})
  File "grammars/f1_c_gen.py", line 285, in is_rule_recursive
    v = self.is_rule_recursive(rname, trule, seen | {rn})
  [Previous line repeated 15 more times]
  File "grammars/f1_c_gen.py", line 276, in is_rule_recursive
    for token in rule:
KeyboardInterrupt
make: *** [GNUmakefile:102: include/f1_c_fuzz.h] Interrupt
make: *** [GNUmakefile:102: src/f1_c_fuzz.c] Interrupt

I have to stop it with Ctrl-C. Changing ctx.rule(u'PROGRAM',u'{STATEMENT}\n{PROGRAM}') to ctx.rule(u'PROGRAM',u'{STATEMENT}\n') still gives the same error.

I don't know whether the issue is in the JavaScript Nautilus grammar file or in the generator grammars/f1_c_gen.py. If the recursion issue is in the grammar, what kind of pattern should I avoid?
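The traceback shows is_rule_recursive re-exploring the same rules over and over, which explodes on a grammar as interconnected as JavaScript's. A bounded reachability check, one DFS per rule with a visited set, sidesteps that. The sketch below is a hypothetical alternative, not a patch to f1_c_gen.py:

```python
def recursive_rules(grammar):
    """Return the non-terminals that can derive themselves.
    `grammar` maps "<name>" -> list of alternatives (lists of symbols)."""
    # Direct successors: which non-terminals each rule mentions.
    succ = {
        name: {tok for alt in alts for tok in alt if tok in grammar}
        for name, alts in grammar.items()
    }

    def reaches(start):
        # One DFS per rule; `seen` bounds the work to O(rules) per start.
        seen, stack = set(), list(succ[start])
        while stack:
            sym = stack.pop()
            if sym == start:
                return True
            if sym not in seen:
                seen.add(sym)
                stack.extend(succ[sym])
        return False

    return {name for name in grammar if reaches(name)}
```

On the <entry>/<stmt1>/<NODE> grammar from the next issue below, only <stmt1> is recursive, and the check runs in linear time instead of blowing up.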

optimized syntax '+' cause 'random_recursive_mutation' error

Going further, I found a way to mitigate this.

Based on the above issue, we create a simpler test case, test.json:

{
    "<entry>": [["I ", "<stmt1>", "like C++\n"]],
    "<stmt1>": [["<NODE>", "<stmt1>"], []],
    "<NODE>": [["very "]]
}

translated to test.g4:

grammar test;
entry: 'I ' stmt1 'like C++\n' EOF
     ;
stmt1: 
     | NODE stmt1
     ;
NODE : 'very '
     ;

and the input 40960_very.txt:

I very very ...(*40956)... very very like C++

running with antlr4-parse:
[screenshot: antlr4-parse output]

From the perspective of antlr4, we can use the + syntax to describe test.g4 and ignore this prefix matching, as follows in test.g4:

grammar test;
entry: 'I ' stmt1 'like C++\n' EOF
     ;
stmt1: 
     | (NODE)+
     ;
NODE : 'very '
     ;

running again with antlr4-parse:
[screenshot: antlr4-parse output]

So I made a patch implementing the above idea; please refer to 0x7Fancy@6eae7d1.

I have only implemented the optimization of head recursion and tail recursion here, which is simple and easy to understand. Intermediate recursion, I think, can be rewritten as head/tail recursion in the JSON grammar.

Of course, this is just a mitigation: when mutation generates a sufficiently complex syntax tree, it may still cause antlr4 to get stuck in parsing.

Originally posted by @0x7Fancy in #17 (comment)

Readme and copyright note

After the GSoC deadline and before we make it public, make sure to include copyright headers like this:

/*
   american fuzzy lop++ - grammar mutator
   --------------------------------------

   Written by Shengtuo Hu

   Copyright 2020 AFLplusplus Project. All rights reserved.

   Licensed under the Apache License, Version 2.0 (the "License");
   you may not use this file except in compliance with the License.
   You may obtain a copy of the License at:

     http://www.apache.org/licenses/LICENSE-2.0

   A grammar-based custom mutator written for GSoC '20.

 */

In the readme, provide a bit of context (this is a GSoC project, etc.), a brief discussion about compatible grammars (ANTLR, JSON atm), and a simple how-to, maybe with a screenshot.

Use of ANTLR grammar instead of json-formatted grammar

Hi,

Is there a way to produce the generator and mutator from an ANTLR grammar instead of a JSON-formatted grammar? I know that the ANTLR grammar is generated and used in intermediate steps, so there should be a way to initiate the process directly from ANTLR, for example:

make GRAMMAR_FILE=grammars/ruby.g4

Thank you.

Idea list

Let's collect some ideas on how to improve the grammar mutator.
I am not an expert on this, so some ideas might be impossible, make no sense, or even make things worse.

  • Use the dictionary with the grammar (-x + LTO AUTODICT feature)
  • Increase the tree depth with every new cycle without finds (example on how to pass this to the mutator is in examples/honggfuzz/honggfuzz.c)
  • ... ?

Also:
Document for each mutation which mutation strategies were used, and whether it results in a new path, crash, or hang; record these somewhere (fopen("a") ... fwrite() ... fclose() would be fine enough). Then learn which types are more effective than others and try to improve them: maybe weighting, maybe changing how unsuccessful techniques work, etc. (And of course put this feature behind an #ifdef TESTING or something like that.)

pinging @h1994st @andreafioraldi @eqv for more ideas

A question about convert the ASCII

Hi, I had a problem when using this tool.
I want to use some numbers (e.g. 1, 2, 3, ...) as seed content and then send them to a network server for fuzzing. But I found that when I use a socket to send them, the numbers are converted to ASCII. I want the numbers to stay in hex form (e.g. 1 -> 01, 30 -> 30).
If you could tell me how, thank you very much.

SEGV in afl_custom_fuzz_count

I'm trying to fuzz mruby using the test cases in mruby/test/t/ (and not the test cases generated with grammar_generator) to test the ANTLR shim, and I get:

==1141== Memcheck, a memory error detector
==1141== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==1141== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
==1141== Command: afl-fuzz -i in -o out3 -m none -- ./bin/mruby @@
==1141== 
==1141== Conditional jump or move depends on uninitialised value(s)
==1141==    at 0x12775F: bind_to_free_cpu (afl-fuzz-init.c:215)
==1141==    by 0x10F6D4: main (afl-fuzz.c:1091)
==1141== 
==1141== Invalid read of size 4
==1141==    at 0x6D29781: afl_custom_fuzz_count (grammar_mutator.c:304)
==1141==    by 0x13AFD0: fuzz_one_original (afl-fuzz-one.c:1679)
==1141==    by 0x138004: fuzz_one (afl-fuzz-one.c:4893)
==1141==    by 0x10FC14: main (afl-fuzz.c:1437)
==1141==  Address 0x4 is not stack'd, malloc'd or (recently) free'd
==1141== 
==1141== 
==1141== Process terminating with default action of signal 11 (SIGSEGV)
==1141==  Access not within mapped region at address 0x4
==1141==    at 0x6D29781: afl_custom_fuzz_count (grammar_mutator.c:304)
==1141==    by 0x13AFD0: fuzz_one_original (afl-fuzz-one.c:1679)
==1141==    by 0x138004: fuzz_one (afl-fuzz-one.c:4893)
==1141==    by 0x10FC14: main (afl-fuzz.c:1437)
==1141==  If you believe this happened as a result of a stack
==1141==  overflow in your program's main thread (unlikely but
==1141==  possible), you can try to increase the size of the
==1141==  main thread stack using the --main-stacksize= flag.
==1141==  The main thread stack size used in this run was 8388608.
==1141== 
==1141== HEAP SUMMARY:
==1141==     in use at exit: 4,641,445 bytes in 44,091 blocks
==1141==   total heap usage: 76,448 allocs, 32,357 frees, 9,150,336 bytes allocated
==1141== 
==1141== LEAK SUMMARY:
==1141==    definitely lost: 0 bytes in 0 blocks
==1141==    indirectly lost: 0 bytes in 0 blocks
==1141==      possibly lost: 16,896 bytes in 2 blocks
==1141==    still reachable: 4,624,549 bytes in 44,089 blocks
==1141==         suppressed: 0 bytes in 0 blocks
==1141== Rerun with --leak-check=full to see details of leaked memory
==1141== 
==1141== For counts of detected and suppressed errors, rerun with: -v
==1141== Use --track-origins=yes to see where uninitialised values come from
==1141== ERROR SUMMARY: 3 errors from 2 contexts (suppressed: 0 from 0)

incorrect rule index deduction from ANTLR

During input parsing, the shim creates nodes using
node = node_create_with_rule_id(non_terminal_node->getRuleIndex(), non_terminal_node->getAltNumber() - 1);
However, in my tests the antlr4::ParserRuleContext node's getAltNumber() returns 0 on OUTER recursive grammar nodes. Therefore all nodes up to the inner one will have an invalid rule_id.

For example, for this G4 grammar:
A: B | A B
B: "MYTOKEN"
entry: A

The input "MYTOKEN MYTOKEN MYTOKEN" will be parsed as
entry -> A -> A -> A -> B
         |    \-> B
         \-> B

The last A will have rule_id = 0, the previous ones have rule_id = MAXUINT.
While incidentally this specific case does not screw up the fuzzer behavior, when there are various recursive expansions it is a major issue.

A question about data length

Hi, I had a problem when using the Grammar-Mutator.
I want to define a long hex sequence in my setup file, like this:
[image]

But when I use it in afl++, the program hits a bug, like this:
[image]

I changed the code to printf "ret". Normally "ret" should be 1, but in my program "ret" is 0, like this:
[image]

As "ret" is 0, the program will not run in Grammar-Mutator mode. Can you help me? Thank you very much.

Issue with parallel build

make -j8 fails with

make[2]: Entering directory '/home/andrea/Grammar-Mutator/third_party/antlr4-cpp-runtime'
make[2]: *** No rule to make target 'antlr4-cpp-runtime-src/runtime/src/ANTLRErrorListener.o', needed by 'libantlr4-runtime.a'.  Stop.
make[2]: *** Waiting for unfinished jobs....

Feedback

I used the mruby example and got this just when starting up:

mutation error: No such file or directory

[-] PROGRAM ABORT : Error in custom_fuzz. Size returned: 0
         Location : fuzz_one_original(), src/afl-fuzz-one.c:1747

It should all be there:

AFL_CUSTOM_MUTATOR_ONLY=1
AFL_CUSTOM_MUTATOR_LIBRARY=/prg/Grammar-Mutator/trunk/src/libgrammarmutator.so
afl-fuzz -i in -o out -- mruby/bin/mruby @@
ls out/trees/
...
id:000070,time:0,orig:70  id:000156,time:0,orig:156  id:000242,time:0,orig:242
id:000071,time:0,orig:71  id:000157,time:0,orig:157  id:000243,time:0,orig:243
id:000072,time:0,orig:72  id:000158,time:0,orig:158  id:000244,time:0,orig:244
id:000073,time:0,orig:73  id:000159,time:0,orig:159  id:000245,time:0,orig:245
...

more feedback:

  • IMHO the GRAMMAR_FILE env var should always be required. having a JSON default is not helpful.

  • ./grammar_generator 123 100 1000 /tmp/seeds /tmp/trees -> not found; it is src/grammar_generator.
    Better copy the grammar_generator and the .so to the project root when done compiling, maybe even with the grammar type in their filenames?

export export AFL_CUSTOM_MUTATOR_LIBRARY=/path/to/libgrammarmutator.so -> double export, also again below

Don't point -o to /tmp; this is not best practice. Just leave the paths off so the examples work in the current directory.

Inconsistency between compilations

Hi!

I was testing the project with this simple grammar rules:

{
	"<START>": [
		["<?php", "<FUZZ>", "\n?>"]
	],
	"<FUZZ>": [
		["\ntry {\n try {\n", "<DEFCLASS>", "\n} catch (Exception $e){}} catch(Error $e){}", "<FUZZ>"],[]
	],
	"<DEFCLASS>" : [
		["class ", "<CLASSNAME>", " {\n\t", "<CLASSBODY>","}\n"],[]
	],
	"<CLASSNAME>" : [
		["Class01"],
		["Class02"],
		["Class03"],
		["Class04"],
		["Class05"]
	],
	"<CLASSBODY>" : [
		["test01;\n"],
		["test02;\n"],
		["test03;\n"]
	]
}

Every time I compile (make -j$(nproc) GRAMMAR_FILE=grammars/phpexcept.json) and test the rules with the generator, I obtain different results: sometimes it only picks one rule, other times a small chain of rules:

 psyconauta@insulaalchimia ᐓ  ~/Grammar-Mutator |stable⚡ ᐓ   ./grammar_generator-phpexcept 100 1000 ./seeds ./trees
 psyconauta@insulaalchimia ᐓ  ~/Grammar-Mutator |stable⚡ ᐓ   cat seeds/1
class Class03 {
	test01;
}
psyconauta@insulaalchimia ᐓ  ~/Grammar-Mutator |stable⚡ ᐓ   make clean && make -j$(nproc) GRAMMAR_FILE=grammars/phpexcept.json && ./grammar_generator-phpexcept 100 1000 ./seeds ./trees
psyconauta@insulaalchimia ᐓ  ~/Grammar-Mutator |stable⚡ ᐓ   cat seeds/1
<?php
try {
 try {
class Class04 {
	test03;

}

} catch (Exception $e){}} catch(Error $e){}
?>
psyconauta@insulaalchimia ᐓ  ~/Grammar-Mutator |stable⚡ ᐓ   make clean && make -j$(nproc) GRAMMAR_FILE=grammars/phpexcept.json && ./grammar_generator-phpexcept 100 1000 ./seeds ./trees
psyconauta@insulaalchimia ᐓ  ~/Grammar-Mutator |stable⚡ ᐓ   cat seeds/1
Class01

I am not sure whether I am creating the rules in a wrong way (but checking the documentation and the ruby example, it looks fine to me).

UnicodeEncodeError

Hi,

I'm trying to use the tool to generate JavaScript test cases. However, when I run make GRAMMAR_FILE=grammars/javascript.json
I get the following result: UnicodeEncodeError: 'latin-1' codec can't encode character '\u2421' in position 0: ordinal not in range(256)

Then if I try the same for ruby.json, the previously failed execution has somehow corrupted the .jar file: Error: Invalid or corrupt jarfile /usr/local/lib

Any help?

Progress Report

Hi Shengtuo,

please put your weekly progress reports in this issue.
Besides, I have not seen any progress in the last 5 days. Do you need assistance?

Test compilation error

When compiling with ENABLE_TESTING=ON, I get the following issue:

<some_path>/Grammar-Mutator/include/custom_mutator.h:26:9: error: empty struct has size 0 in C, size 1 in C++ [-Werror,-Wextern-c-compat]
typedef struct afl {
        ^
1 error generated.
make[2]: *** [tests/CMakeFiles/test_custom_mutator.dir/test_custom_mutator.cpp.o] Error 1
make[1]: *** [tests/CMakeFiles/test_custom_mutator.dir/all] Error 2
make: *** [all] Error 2

I think we need to suppress this error somehow in the code.

BTW I use the default Xcode compiler.

Grammar Mutator crashes due to null pointer dereference on write_tree_to_file

A crash happens when writing some trees to file:

881 ret = write(fd, tree->ser_buf, tree->ser_len);
(gdb) bt
#0 0x00007ffff6a3a40a in write_tree_to_file (tree=0x0, filename=0x555555650788 "fuzzer/custom/trees/id:000110,sync:main-pexploit,src:000001,+cov") at tree.c:881
#1 0x00007ffff6a38335 in afl_custom_queue_new_entry (data=0x55555564f6e0, filename_new_queue=0x55555a6e7720 "fuzzer/custom/queue/id:000110,sync:main-pexploit,src:000001,+cov", filename_orig_queue=0x55555565ac00 "fuzzer/custom/queue/id:000028,time:0,orig:34")
at grammar_mutator.c:563
#2 0x00005555555926e6 in add_to_queue (afl=0x7ffff7655010, fname=0x55555a6e7720 "fuzzer/custom/queue/id:000110,sync:main-pexploit,src:000001,+cov", len=, passed_det=) at src/afl-fuzz-queue.c:473
#3 0x0000555555562c91 in save_if_interesting (afl=afl@entry=0x7ffff7655010, mem=mem@entry=0x7ffff7ffb000, len=109, fault=0 '\000') at src/afl-fuzz-bitmap.c:516
#4 0x000055555556ded3 in sync_fuzzers (afl=) at src/afl-fuzz-run.c:667
#5 0x000055555555f578 in main (argc=, argv_orig=, envp=) at src/afl-fuzz.c:2037
0x00007ffff6a3a40a in write_tree_to_file (tree=0x0, filename=0x555555650a58 "fuzzer/custom/trees/id:000110,sync:main-pexploit,src:000001,+cov") at tree.c:881
881 ret = write(fd, tree->ser_buf, tree->ser_len);
(gdb) p fd
$4 = 14
(gdb) p tree->ser_buf
Cannot access memory at address 0x20
(gdb) p tree
$5 = (tree_t *) 0x0

As we can see, 'tree' here is NULL, and yet tree->ser_buf is dereferenced:

(gdb) l
876 return;
877
878 }
879
880 // Write the data
881 ret = write(fd, tree->ser_buf, tree->ser_len);
882 if (unlikely(ret < 0)) {
883
884 perror("Unable to write (write_tree_to_file)");
885 return;
(gdb) bt
#0 0x00007ffff6a3a40a in write_tree_to_file (tree=0x0, filename=0x555555650a58 "fuzzer/custom/trees/id:000110,sync:main-pexploit,src:000001,+cov") at tree.c:881
#1 0x00007ffff6a38335 in afl_custom_queue_new_entry (data=0x55555564f9b0, filename_new_queue=0x55555577e7f0 "fuzzer/custom/queue/id:000110,sync:main-pexploit,src:000001,+cov", filename_orig_queue=0x55555565af10 "fuzzer/custom/queue/id:000028,time:0,orig:34")
at grammar_mutator.c:563
#2 0x00005555555926e6 in add_to_queue (afl=0x7ffff7655010, fname=0x55555577e7f0 "fuzzer/custom/queue/id:000110,sync:main-pexploit,src:000001,+cov", len=<optimized out>, passed_det=<optimized out>) at src/afl-fuzz-queue.c:473
#3 0x0000555555562c91 in save_if_interesting (afl=afl@entry=0x7ffff7655010, mem=mem@entry=0x7ffff7ffb000, len=109, fault=0 '\000') at src/afl-fuzz-bitmap.c:516
#4 0x000055555556ded3 in sync_fuzzers (afl=<optimized out>) at src/afl-fuzz-run.c:667
#5 0x000055555555f578 in main (argc=<optimized out>, argv_orig=<optimized out>, envp=<optimized out>) at src/afl-fuzz.c:2037

My setup:
AFL_IMPORT_FIRST=1 AFL_NO_AFFINITY=1 AFL_MAP_SIZE=137344 afl-fuzz -x dict.dict -i ~/Grammar-Mutator/seeds/ -o fuzzer -S custom ./program
The other secondary instances (and the main instance) are not using the grammar mutator.
fuzzer/custom/trees/ exists and is populated as described in the manual.
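A minimal guard (a sketch with a simplified, hypothetical tree type, not the project's actual code) would make write_tree_to_file bail out instead of dereferencing a NULL tree:

```c
#include <stdio.h>

/* Simplified stand-ins for the real tree_t / write_tree_to_file. */
typedef struct {
  char  *ser_buf;
  size_t ser_len;
} tree_t;

/* Returns -1 when there is no tree for this filename, so the caller
 * (afl_custom_queue_new_entry) can skip the entry instead of crashing. */
static int write_tree_checked(tree_t *tree, const char *filename) {
  if (!tree) {
    fprintf(stderr, "write_tree_to_file: no tree for %s\n", filename);
    return -1;
  }
  /* ... open(filename), write(fd, tree->ser_buf, tree->ser_len) ... */
  return 0;
}
```

The real question remains why the tree is missing for a synced entry in the first place; the guard only prevents the crash.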

Long recursive calls cause afl to segfault

As discussed in #14, the following grammar causes AFL to segfault (maybe only on startup?): https://paste.pr0.tips/rm

This is caused by extremely deep recursion:

#764 0x00007fffeed762f8 in antlr4::atn::ParserATNSimulator::closureCheckingStopState(std::shared_ptr<antlr4::atn::ATNConfig> const&, antlr4::atn::ATNConfigSet*, std::unordered_set<std::shared_ptr<antlr4::atn::ATNConfig>, antlr4::atn::ATNConfig::Hasher, antlr4::atn::ATNConfig::Comparer, std::allocator<std::shared_ptr<antlr4::atn::ATNConfig> > >&, bool, bool, int, bool) () from /home/jrogers/Grammar-Mutator/libgrammarmutator-http.so
#765 0x00007fffeed7a67a in antlr4::atn::ParserATNSimulator::closure_(std::shared_ptr<antlr4::atn::ATNConfig> const&, antlr4::atn::ATNConfigSet*, std::unordered_set<std::shared_ptr<antlr4::atn::ATNConfig>, antlr4::atn::ATNConfig::Hasher, antlr4::atn::ATNConfig::Comparer, std::allocator<std::shared_ptr<antlr4::atn::ATNConfig> > >&, bool, bool, int, bool) () from /home/jrogers/Grammar-Mutator/libgrammarmutator-http.so
#766 0x00007fffeed763a5 in antlr4::atn::ParserATNSimulator::closureCheckingStopState(std::shared_ptr<antlr4::atn::ATNConfig> const&, antlr4::atn::ATNConfigSet*, std::unordered_set<std::shared_ptr<antlr4::atn::ATNConfig>, antlr4::atn::ATNConfig::Hasher, antlr4::atn::ATNConfig::Comparer, std::allocator<std::shared_ptr<antlr4::atn::ATNConfig> > >&, bool, bool, int, bool) () from /home/jrogers/Grammar-Mutator/libgrammarmutator-http.so
#767 0x00007fffeed7a67a in antlr4::atn::ParserATNSimulator::closure_(std::shared_ptr<antlr4::atn::ATNConfig> const&, antlr4::atn::ATNConfigSet*, std::unordered_set<std::shared_ptr<antlr4::atn::ATNConfig>, antlr4::atn::ATNConfig::Hasher, antlr4::atn::ATNConfig::Comparer, std::allocator<std::shared_ptr<antlr4::atn::ATNConfig> > >&, bool, bool, int, bool) () from /home/jrogers/Grammar-Mutator/libgrammarmutator-http.so
#768 0x00007fffeed763a5 in antlr4::atn::ParserATNSimulator::closureCheckingStopState(std::shared_ptr<antlr4::atn::ATNConfig> const&, antlr4::atn::ATNConfigSet*, std::unordered_set<std::shared_ptr<antlr4::atn::ATNConfig>, antlr4::atn::ATNConfig::Hasher, antlr4::atn::ATNConfig::Comparer, std::allocator<std::shared_ptr<antlr4::atn::ATNConfig> > >&, bool, bool, int, bool) () from /home/jrogers/Grammar-Mutator/libgrammarmutator-http.so
#769 0x00007fffeed762f8 in antlr4::atn::ParserATNSimulator::closureCheckingStopState(std::shared_ptr<antlr4::atn::ATNConfig> const&, antlr4::atn::ATNConfigSet*, std::unordered_set<std::shared_ptr<antlr4::atn::ATNConfig>, antlr4::atn::ATNConfig::Hasher, antlr4::atn::ATNConfig::Comparer, std::allocator<std::shared_ptr<antlr4::atn::ATNConfig> > >&, bool, bool, int, bool) () from /home/jrogers/Grammar-Mutator/libgrammarmutator-http.so
#770 0x00007fffeed7a67a in antlr4::atn::ParserATNSimulator::closure_(std::shared_ptr<antlr4::atn::ATNConfig> const&, antlr4::atn::ATNConfigSet*, std::unordered_set<std::shared_ptr<antlr4::atn::ATNConfig>, antlr4::atn::ATNConfig::Hasher, antlr4::atn::ATNConfig::Comparer, std::allocator<std::shared_ptr<antlr4::atn::ATNConfig> > >&, bool, bool, int, bool) () from /home/jrogers/Grammar-Mutator/libgrammarmutator-http.so
#771 0x00007fffeed763a5 in antlr4::atn::ParserATNSimulator::closureCheckingStopState(std::shared_ptr<antlr4::atn::ATNConfig> const&, antlr4::atn::ATNConfigSet*, std::unordered_set<std::shared_ptr<antlr4::atn::ATNConfig>, antlr4::atn::ATNConfig::Hasher, antlr4::atn::ATNConfig::Comparer, std::allocator<std::shared_ptr<antlr4::atn::ATNConfig> > >&, bool, bool, int, bool) () from /home/jrogers/Grammar-Mutator/libgrammarmutator-http.so
#772 0x00007fffeed7a67a in antlr4::atn::ParserATNSimulator::closure_(std::shared_ptr<antlr4::atn::ATNConfig> const&, antlr4::atn::ATNConfigSet*, std::unordered_set<std::shared_ptr<antlr4::atn::ATNConfig>, antlr4::atn::ATNConfig::Hasher, antlr4::atn::ATNConfig::Comparer, std::allocator<std::shared_ptr<antlr4::atn::ATNConfig> > >&, bool, bool, int, bool) () from /home/jrogers/Grammar-Mutator/libgrammarmutator-http.so
#773 0x00007fffeed763a5 in antlr4::atn::ParserATNSimulator::closureCheckingStopState(std::shared_ptr<antlr4::atn::ATNConfig> const&, antlr4::atn::ATNConfigSet*, std::unordered_set<std::shared_ptr<antlr4::atn::ATNConfig>, antlr4::atn::ATNConfig::Hasher, antlr4::atn::ATNConfig::Comparer, std::allocator<std::shared_ptr<antlr4::atn::ATNConfig> > >&, bool, bool, int, bool) () from /home/jrogers/Grammar-Mutator/libgrammarmutator-http.so
#774 0x00007fffeed762f8 in antlr4::atn::ParserATNSimulator::closureCheckingStopState(std::shared_ptr<antlr4::atn::ATNConfig> const&, antlr4::atn::ATNConfigSet*, std::unordered_set<std::shared_ptr<antlr4::atn::ATNConfig>, antlr4::atn::ATNConfig::Hasher, antlr4::atn::ATNConfig::Comparer, std::allocator<std::shared_ptr<antlr4::atn::ATNConfig> > >&, bool, bool, int, bool) () from /home/jrogers/Grammar-Mutator/libgrammarmutator-http.so
#775 0x00007fffeed7a67a in antlr4::atn::ParserATNSimulator::closure_(std::shared_ptr<antlr4::atn::ATNConfig> const&, antlr4::atn::ATNConfigSet*, std::unordered_set<std::shared_ptr<antlr4::atn::ATNConfig>, antlr4::atn::ATNConfig::Hasher, antlr4::atn::ATNConfig::Comparer, std::allocator<std::shared_ptr<antlr4::atn::ATNConfig> > >&, bool, bool, int, bool) () from /home/jrogers/Grammar-Mutator/libgrammarmutator-http.so
#776 0x00007fffeed763a5 in antlr4::atn::ParserATNSimulator::closureCheckingStopState(std::shared_ptr<antlr4::atn::ATNConfig> const&, antlr4::atn::ATNConfigSet*, std::unordered_set<std::shared_ptr<antlr4::atn::ATNConfig>, antlr4::atn::ATNConfig::Hasher, antlr4::atn::ATNConfig::Comparer, std::allocator<std::shared_ptr<antlr4::atn::ATNConfig> > >&, bool, bool, int, bool) () from /home/jrogers/Grammar-Mutator/libgrammarmutator-http.so
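A common workaround (assuming the segfault is a stack overflow inside the ANTLR parser, as the hundreds of repeated closure_/closureCheckingStopState frames suggest) is to raise the stack size limit in the shell that launches afl-fuzz:

```shell
# Raise the stack limit for this shell and its children before fuzzing.
ulimit -s unlimited 2>/dev/null || ulimit -s 65536   # 64 MB fallback
# then launch afl-fuzz from this same shell as usual
```

This only buys headroom; a grammar that recurses this deeply may still need its recursion bounded.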

core dump

afl-fuzz core dumps in the grammar mutator with "Program received signal SIGSEGV, Segmentation fault.":

#0  0x00007ffff7fb6f56 in afl_custom_trim (data=0x555555647690, out_buf=0x7fffffffc638) at /prg/Grammar-Mutator/branches/dev/src/grammar_mutator.cpp:114
#1  0x0000555555562a20 in trim_case_custom (mutator=0x5555556459f0, in_buf=0x7ffff7ffb000 "30E-0", q=0x5555556d9f20, afl=0x5555555c0400) at src/afl-fuzz-mutators.c:277
#2  trim_case (afl=0x5555555c0400, q=0x5555556d9f20, in_buf=0x7ffff7ffb000 "30E-0") at src/afl-fuzz-run.c:629
#3  0x000055555558465d in fuzz_one_original (afl=0x5555555c0400) at src/afl-fuzz-one.c:526
#4  0x000055555555c82e in fuzz_one (afl=0x5555555c0400) at src/afl-fuzz-one.c:4731
#5  main (argc=<optimized out>, argv_orig=<optimized out>, envp=<optimized out>) at src/afl-fuzz.c:1278

The command line was:

# env|grep AFL
AFL_CUSTOM_MUTATOR_ONLY=1
AFL_CUSTOM_MUTATOR_LIBRARY=/prg/Grammar-Mutator/branches/dev/build/src/libgrammarmutator.so
# afl-fuzz -i in -o out -- ../../json-parser/test_json @@

Memory leaks in `splicing_mutation`

As indicated by CI results, there are 5 memory leaks in splicing_mutation (see below).

Would it be better to make the build fail with an error when memory leaks are encountered?

==4365== 8 bytes in 1 blocks are indirectly lost in loss record 1 of 5
==4365==    at 0x483DD99: calloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==4365==    by 0x48ED894: node_init_subnodes (in /home/runner/work/Grammar-Mutator/Grammar-Mutator/src/libgrammarmutator-ruby.so)
==4365==    by 0x48EDB6F: node_clone (in /home/runner/work/Grammar-Mutator/Grammar-Mutator/src/libgrammarmutator-ruby.so)
==4365==    by 0x48EE647: tree_clone (in /home/runner/work/Grammar-Mutator/Grammar-Mutator/src/libgrammarmutator-ruby.so)
==4365==    by 0x48EEE9C: splicing_mutation (in /home/runner/work/Grammar-Mutator/Grammar-Mutator/src/libgrammarmutator-ruby.so)
==4365==    by 0x117A81: TreeMutationTest_SplicingMutation_Test::TestBody() (in /home/runner/work/Grammar-Mutator/Grammar-Mutator/tests/test_tree_mutation)
==4365==    by 0x14AD80: void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) (in /home/runner/work/Grammar-Mutator/Grammar-Mutator/tests/test_tree_mutation)
==4365==    by 0x13E369: testing::Test::Run() [clone .part.0] (in /home/runner/work/Grammar-Mutator/Grammar-Mutator/tests/test_tree_mutation)
==4365==    by 0x13E801: testing::TestInfo::Run() [clone .part.0] (in /home/runner/work/Grammar-Mutator/Grammar-Mutator/tests/test_tree_mutation)
==4365==    by 0x13EA0D: testing::TestSuite::Run() [clone .part.0] (in /home/runner/work/Grammar-Mutator/Grammar-Mutator/tests/test_tree_mutation)
==4365==    by 0x13FACC: testing::internal::UnitTestImpl::RunAllTests() (in /home/runner/work/Grammar-Mutator/Grammar-Mutator/tests/test_tree_mutation)
==4365==    by 0x140037: testing::UnitTest::Run() (in /home/runner/work/Grammar-Mutator/Grammar-Mutator/tests/test_tree_mutation)
==4365== 
==4365== 64 bytes in 1 blocks are indirectly lost in loss record 2 of 5
==4365==    at 0x483B723: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==4365==    by 0x483E017: realloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==4365==    by 0x48EDA66: node_set_val (in /home/runner/work/Grammar-Mutator/Grammar-Mutator/src/libgrammarmutator-ruby.so)
==4365==    by 0x48EDB45: node_clone (in /home/runner/work/Grammar-Mutator/Grammar-Mutator/src/libgrammarmutator-ruby.so)
==4365==    by 0x48EDB8B: node_clone (in /home/runner/work/Grammar-Mutator/Grammar-Mutator/src/libgrammarmutator-ruby.so)
==4365==    by 0x48EE647: tree_clone (in /home/runner/work/Grammar-Mutator/Grammar-Mutator/src/libgrammarmutator-ruby.so)
==4365==    by 0x48EEE9C: splicing_mutation (in /home/runner/work/Grammar-Mutator/Grammar-Mutator/src/libgrammarmutator-ruby.so)
==4365==    by 0x117A81: TreeMutationTest_SplicingMutation_Test::TestBody() (in /home/runner/work/Grammar-Mutator/Grammar-Mutator/tests/test_tree_mutation)
==4365==    by 0x14AD80: void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) (in /home/runner/work/Grammar-Mutator/Grammar-Mutator/tests/test_tree_mutation)
==4365==    by 0x13E369: testing::Test::Run() [clone .part.0] (in /home/runner/work/Grammar-Mutator/Grammar-Mutator/tests/test_tree_mutation)
==4365==    by 0x13E801: testing::TestInfo::Run() [clone .part.0] (in /home/runner/work/Grammar-Mutator/Grammar-Mutator/tests/test_tree_mutation)
==4365==    by 0x13EA0D: testing::TestSuite::Run() [clone .part.0] (in /home/runner/work/Grammar-Mutator/Grammar-Mutator/tests/test_tree_mutation)
==4365== 
==4365== 72 bytes in 1 blocks are indirectly lost in loss record 3 of 5
==4365==    at 0x483DD99: calloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==4365==    by 0x48ED7A9: node_create (in /home/runner/work/Grammar-Mutator/Grammar-Mutator/src/libgrammarmutator-ruby.so)
==4365==    by 0x48EDB20: node_clone (in /home/runner/work/Grammar-Mutator/Grammar-Mutator/src/libgrammarmutator-ruby.so)
==4365==    by 0x48EE647: tree_clone (in /home/runner/work/Grammar-Mutator/Grammar-Mutator/src/libgrammarmutator-ruby.so)
==4365==    by 0x48EEE9C: splicing_mutation (in /home/runner/work/Grammar-Mutator/Grammar-Mutator/src/libgrammarmutator-ruby.so)
==4365==    by 0x117A81: TreeMutationTest_SplicingMutation_Test::TestBody() (in /home/runner/work/Grammar-Mutator/Grammar-Mutator/tests/test_tree_mutation)
==4365==    by 0x14AD80: void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) (in /home/runner/work/Grammar-Mutator/Grammar-Mutator/tests/test_tree_mutation)
==4365==    by 0x13E369: testing::Test::Run() [clone .part.0] (in /home/runner/work/Grammar-Mutator/Grammar-Mutator/tests/test_tree_mutation)
==4365==    by 0x13E801: testing::TestInfo::Run() [clone .part.0] (in /home/runner/work/Grammar-Mutator/Grammar-Mutator/tests/test_tree_mutation)
==4365==    by 0x13EA0D: testing::TestSuite::Run() [clone .part.0] (in /home/runner/work/Grammar-Mutator/Grammar-Mutator/tests/test_tree_mutation)
==4365==    by 0x13FACC: testing::internal::UnitTestImpl::RunAllTests() (in /home/runner/work/Grammar-Mutator/Grammar-Mutator/tests/test_tree_mutation)
==4365==    by 0x140037: testing::UnitTest::Run() (in /home/runner/work/Grammar-Mutator/Grammar-Mutator/tests/test_tree_mutation)
==4365== 
==4365== 72 bytes in 1 blocks are indirectly lost in loss record 4 of 5
==4365==    at 0x483DD99: calloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==4365==    by 0x48ED7A9: node_create (in /home/runner/work/Grammar-Mutator/Grammar-Mutator/src/libgrammarmutator-ruby.so)
==4365==    by 0x48EDB20: node_clone (in /home/runner/work/Grammar-Mutator/Grammar-Mutator/src/libgrammarmutator-ruby.so)
==4365==    by 0x48EDB8B: node_clone (in /home/runner/work/Grammar-Mutator/Grammar-Mutator/src/libgrammarmutator-ruby.so)
==4365==    by 0x48EE647: tree_clone (in /home/runner/work/Grammar-Mutator/Grammar-Mutator/src/libgrammarmutator-ruby.so)
==4365==    by 0x48EEE9C: splicing_mutation (in /home/runner/work/Grammar-Mutator/Grammar-Mutator/src/libgrammarmutator-ruby.so)
==4365==    by 0x117A81: TreeMutationTest_SplicingMutation_Test::TestBody() (in /home/runner/work/Grammar-Mutator/Grammar-Mutator/tests/test_tree_mutation)
==4365==    by 0x14AD80: void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) (in /home/runner/work/Grammar-Mutator/Grammar-Mutator/tests/test_tree_mutation)
==4365==    by 0x13E369: testing::Test::Run() [clone .part.0] (in /home/runner/work/Grammar-Mutator/Grammar-Mutator/tests/test_tree_mutation)
==4365==    by 0x13E801: testing::TestInfo::Run() [clone .part.0] (in /home/runner/work/Grammar-Mutator/Grammar-Mutator/tests/test_tree_mutation)
==4365==    by 0x13EA0D: testing::TestSuite::Run() [clone .part.0] (in /home/runner/work/Grammar-Mutator/Grammar-Mutator/tests/test_tree_mutation)
==4365==    by 0x13FACC: testing::internal::UnitTestImpl::RunAllTests() (in /home/runner/work/Grammar-Mutator/Grammar-Mutator/tests/test_tree_mutation)
==4365== 
==4365== 288 (72 direct, 216 indirect) bytes in 1 blocks are definitely lost in loss record 5 of 5
==4365==    at 0x483DD99: calloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==4365==    by 0x48EE63C: tree_clone (in /home/runner/work/Grammar-Mutator/Grammar-Mutator/src/libgrammarmutator-ruby.so)
==4365==    by 0x48EEE9C: splicing_mutation (in /home/runner/work/Grammar-Mutator/Grammar-Mutator/src/libgrammarmutator-ruby.so)
==4365==    by 0x117A81: TreeMutationTest_SplicingMutation_Test::TestBody() (in /home/runner/work/Grammar-Mutator/Grammar-Mutator/tests/test_tree_mutation)
==4365==    by 0x14AD80: void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) (in /home/runner/work/Grammar-Mutator/Grammar-Mutator/tests/test_tree_mutation)
==4365==    by 0x13E369: testing::Test::Run() [clone .part.0] (in /home/runner/work/Grammar-Mutator/Grammar-Mutator/tests/test_tree_mutation)
==4365==    by 0x13E801: testing::TestInfo::Run() [clone .part.0] (in /home/runner/work/Grammar-Mutator/Grammar-Mutator/tests/test_tree_mutation)
==4365==    by 0x13EA0D: testing::TestSuite::Run() [clone .part.0] (in /home/runner/work/Grammar-Mutator/Grammar-Mutator/tests/test_tree_mutation)
==4365==    by 0x13FACC: testing::internal::UnitTestImpl::RunAllTests() (in /home/runner/work/Grammar-Mutator/Grammar-Mutator/tests/test_tree_mutation)
==4365==    by 0x140037: testing::UnitTest::Run() (in /home/runner/work/Grammar-Mutator/Grammar-Mutator/tests/test_tree_mutation)
==4365==    by 0x116E23: main (in /home/runner/work/Grammar-Mutator/Grammar-Mutator/tests/test_tree_mutation)
==4365==

Statistics

In this issue I will collect statistics from my tests over the next 8 weeks :)

Is it possible to automatically eliminate indirect left-recursion?

I generated the Lua grammar file using the following command:

python3 nautilus_py_grammar_to_json.py ./nautilus_py_grammars/lua.py ./lua.json

But compiling it produces the following error:

make GRAMMAR_FILE=grammars/lua.json

error(119): /home/eqqie/Fuzz/Grammar-Mutator/grammars/Grammar.g4::: The following sets of rules are mutually left-recursive [node_FUNCTIONCALL, node_EXPR]

Even though I can manually eliminate such simple indirect left-recursion, it is a big headache for complex grammars. Is it currently possible to do this automatically? 😂
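For reference, direct left recursion can be removed mechanically by introducing a tail rule; the sketch below uses a made-up <EXPR>/<TERM> pair, not the actual Lua grammar. A left-recursive form such as "<EXPR>": [["<EXPR>", "+", "<TERM>"], ["<TERM>"]] becomes:

```json
{
  "<EXPR>": [["<TERM>", "<EXPR_TAIL>"]],
  "<EXPR_TAIL>": [[], ["+", "<TERM>", "<EXPR_TAIL>"]],
  "<TERM>": [["1"], ["a"]]
}
```

Indirect left recursion additionally requires inlining the mutually recursive rules (e.g. node_FUNCTIONCALL into node_EXPR) until the recursion becomes direct, which is exactly the part that is painful to do by hand.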

How to add extras dynamically during fuzzing

As we know, vanilla AFL adds discovered tokens to the auto_extras directory in the queue and uses them as a "dynamic dictionary".
I would like to implement this functionality in the grammar mutator as well, so that when the fuzzer extracts strings during execution, they can be passed to this mutator.
Are there any plans for this feature? Otherwise I have to repeatedly modify the grammar JSON file and rebuild the .so library.

Wasteful rebuilding of non-terminal trees

tree_get_non_terminal_nodes() always recalculates non_terminal_node_list, and it is called for every new sample in afl_custom_fuzz_count(). The elements are then popped in afl_custom_fuzz() before being passed to the mutator.
Instead, we could keep the non-terminal list in the serialized tree format and update it incrementally per mutation, so it does not have to be recalculated for every sample.
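One way to sketch this (hypothetical API, just to illustrate the caching idea) is a dirty flag on the tree: mutations mark the cached list stale, and the list is rebuilt only when it is actually stale:

```c
#include <stddef.h>

typedef struct {
  size_t non_terminal_count;  /* cached result */
  int    list_dirty;          /* set by any mutation */
  size_t recompute_calls;     /* instrumentation for this sketch only */
} tree_cache_t;

static void recompute_list(tree_cache_t *t) {
  /* ... walk the tree and rebuild the non-terminal node list ... */
  t->recompute_calls++;
  t->list_dirty = 0;
}

/* afl_custom_fuzz_count() would call this instead of always rebuilding. */
static size_t get_non_terminal_count(tree_cache_t *t) {
  if (t->list_dirty) recompute_list(t);
  return t->non_terminal_count;
}
```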

`tree_from_buf` hangs when parsing a small test case

Environment

Ubuntu 20.04.1 on amd64, Grammar-Mutator commit cbe5e32752773945e0142fac9f1b7a0ccb5dcdff, and AFL++ version 4.01a.

Description

The grammar mutator takes an exceptionally long time to generate a tree from a given input (even when that input is empty) when using the custom grammar file attached below. More specifically, it hangs in the function tree_from_buf; a complete backtrace is included below.
I suspect this has to do with the grammar file, since I haven't been able to reproduce the bug with other grammars. But since the generated test case is the simplest possible one according to the grammar, I am unsure what might be causing it.

This is concerning because a fuzzing campaign using this grammar can only be started from a few (at most) inputs, and when interrupted the campaign cannot be resumed since the time cost becomes prohibitive.

How to reproduce

  • Build Grammar-Mutator using the file attached below with make ENABLE_DEBUG=1 GRAMMAR_FILE=python.json
  • We don't need a specific instrumented binary, since the issue occurs before fuzzing starts, so I am going to use echo as an example
  • Generate a simple test case: grammar_generator-python 1 50 in out/default/trees. In my case the generated input was:
a=1
b=1.203213
foo='123213'
bar=b'123\x01'
1=a

which is one of the simplest test cases possible according to the grammar. The corresponding tree file (base64-encoded) is:

AQAAAAAAAAACAAAAAAAAAAIAAAAAAAAAAQAAAAAAAAAAAAAAAAAAAAAAAAArAAAAYT0xCmI9MS4y
MDMyMTMKZm9vPScxMjMyMTMnCmJhcj1iJzEyM1x4MDEnCgMAAAAAAAAAAgAAAAAAAAAKAAAAAAAA
AAEAAAAAAAAACwAAAAAAAAACAAAAAAAAAAwAAAAAAAAAAQAAAAAAAAAOAAAAAAAAAAIAAAAAAAAA
DwAAAAAAAAABAAAAAAAAAC8AAAAAAAAAAQAAAAAAAAAzAAAAAAAAAAEAAAAAAAAANAAAAAAAAAAB
AAAAAAAAADUAAAAAAAAAAQAAAAAAAAA2AAAAAAAAAAEAAAAAAAAAOQAAAAAAAAABAAAAAAAAADoA
AAAAAAAAAQAAAAAAAAA7AAAAAAAAAAEAAAAAAAAAPAAAAAAAAAABAAAAAAAAAD0AAAAAAAAAAQAA
AAAAAAA+AAAAAAAAAAEAAAAAAAAAPwAAAAAAAAABAAAAAAAAAEEAAAAAAAAAAgAAAAAAAABDAAAA
AQAAAAEAAAAAAAAAXAAAAAEAAAABAAAAAAAAAAAAAAAAAAAAAAAAAAEAAAAxQAAAAAAAAAAAAAAA
AAAAAA0AAAAAAAAAAgAAAAAAAAAAAAAAAAAAAAAAAAABAAAAPQ8AAAAAAAAAAQAAAAAAAAAvAAAA
AAAAAAEAAAAAAAAAMwAAAAAAAAABAAAAAAAAADQAAAAAAAAAAQAAAAAAAAA1AAAAAAAAAAEAAAAA
AAAANgAAAAAAAAABAAAAAAAAADkAAAAAAAAAAQAAAAAAAAA6AAAAAAAAAAEAAAAAAAAAOwAAAAAA
AAABAAAAAAAAADwAAAAAAAAAAQAAAAAAAAA9AAAAAAAAAAEAAAAAAAAAPgAAAAAAAAABAAAAAAAA
AD8AAAAAAAAAAQAAAAAAAABBAAAAAAAAAAIAAAAAAAAAQwAAAAAAAAABAAAAAAAAAFgAAAAAAAAA
AQAAAAAAAAAAAAAAAAAAAAAAAAABAAAAYUAAAAAAAAAAAAAAAAAAAABaAAAAAAAAAAEAAAAAAAAA
AAAAAAAAAAAAAAAAAQAAAApaAAAAAAAAAAEAAAAAAAAAAAAAAAAAAAAAAAAAAQAAAAo=
  • Start fuzzing: AFL_CUSTOM_MUTATOR_ONLY=1 AFL_CUSTOM_MUTATOR_LIBRARY=libgrammarmutator-python.so afl-fuzz -i in -o out -- ./echo @@

The grammar file is:

# python.json
{
	"<START>": [["<defs>", "<program>"]],
	"<defs>": [["a=1\nb=1.203213\nfoo='123213'\nbar=b'123\\x01'\n"]],
	"<program>": [["<stmt>", "<NEWLINE>"], ["<stmt>", "<NEWLINE>", "<program>"]],
	
	"<decorator>": [["@", "<dotted_name>", "(", "<arglist>", ")", "<NEWLINE>"],
		["@", "<dotted_name>", "<NEWLINE>"]],
	"<decorators>": [["<decorator>"], ["<decorator>", "<decorators>"]],
	"<decorated>": [["<decorators>", "<classdef>"], ["<decorators>", "<funcdef>"]],
	"<funcdef>": [["def ", "<NAME>", "<parameters>", "->", "<test>", ":", "<suite>"],
		["def ", "<NAME>", "<parameters>", ":", "<suite>"]],
	"<parameters>": [["(", ")"], ["(", "<typedargslist>", ")"]],
	"<typedargslist>": [
		["<NAME>", ", ", "<typedargslist>"], ["<NAME>"], ["*", "<NAME>"], ["**", "<NAME>"], ["*", "<NAME>", ", ", "**", "<NAME>"], []
	],
	
	"<stmt>": [["<simple_stmt>"], ["<compound_stmt>"]],
	"<simple_stmt>": [["<small_stmt>", "<NEWLINE>"], ["<small_stmt>", ";", "<simple_stmt>"]],
	"<small_stmt>": [["<expr_stmt>"], ["<del_stmt>"], ["<pass_stmt>"], ["<flow_stmt>"],
		["<import_stmt>"], ["<global_stmt>"], ["<nonlocal_stmt>"], ["<assert_stmt>"]],
	"<aux_equals_sequence>": [["=", "<testlist_star_expr>"], ["=", "<yield_expr>"],
		["=", "<testlist_star_expr>", "<aux_equals_sequence>"], ["=", "<yield_expr>", "<aux_equals_sequence>"]],
	"<expr_stmt>": [["<testlist_star_expr>", "<augassign>", "<yield_expr>"],
		["<testlist_star_expr>", "<augassign>", "<testlist>"],
		["<testlist_star_expr>", "<aux_equals_sequence>"]],
	"<testlist_star_expr>": [["<test>"], ["<star_expr>"], ["<star_expr>", ",", "<testlist_star_expr>"],
		["<test>", ",", "<testlist_star_expr>"]],
	"<augassign>": [["+="], ["-="], ["*="], ["/="], ["%="],  ["&="],  ["|="],  ["^="], 
		["<<="],  [">>="],  ["**="],  ["//="]],
	"<del_stmt>": [["del ", "<exprlist>"]],
	"<pass_stmt>": [["pass"]],
	"<flow_stmt>": [["<break_stmt>"], ["<continue_stmt>"], ["<return_stmt>"], ["<raise_stmt>"], ["<yield_stmt>"]],
	"<break_stmt>": [["break"]],
	"<continue_stmt>": [["continue"]],
	"<return_stmt>": [["return ", "<testlist>"], ["return"]],
	"<yield_stmt>": [["<yield_expr>"]],
	"<raise_stmt>": [["raise"], ["raise ", "<test>"], ["raise ", "<test>", " from ", "<test>"]],
	"<import_stmt>": [["<import_name>"]],
	"<import_name>": [["import ", "<dotted_import_as_names>"]],
	"<dotted_import_as_name>": [["<dotted_import_name>"], ["<dotted_import_name>", " as ", "<NAME>"]],
	"<dotted_import_as_names>": [["<dotted_import_as_name>"], ["<dotted_import_as_name>", ", ", "<dotted_import_as_names>"]],
	"<aux_trailing_dots>": [[".", "<NAME>", "<aux_trailing_dots>"], [".", "<NAME>"]],
	"<dotted_import_name>": [["<IMODULE_NAME>"], ["<IMODULE_NAME>", "<aux_trailing_dots>"]],
	"<dotted_name>": [["<NAME>"], ["<IMODULE_NAME>"], ["<NAME>", "<aux_trailing_dots>"], ["<IMODULE_NAME>", "<aux_trailing_dots>"]],
	"<global_stmt>": [["global ", "<NAME>"]],
	"<nonlocal_stmt>": [["nonlocal ", "<NAME>"]],
	"<assert_stmt>": [["assert ", "<test>"]],
	"<compound_stmt>": [["<if_stmt>"], ["<while_stmt>"], ["<for_stmt>"], 
		["<try_stmt>"], ["<with_stmt>"], ["<funcdef>"], ["<classdef>"], ["<decorated>"]],
	"<aux_elif_stmts>": [[], ["elif ", "<test>", ":", "<suite>", "<aux_elif_stmts>"]],
	"<if_stmt>": [["if ", "<test>", ":", "<suite>", "<aux_elif_stmts>"], ["if ", "<test>", ":", "<suite>", "<aux_elif_stmts>", "else", ":", "<suite>"]],
	"<while_stmt>": [["while ", "<test>", ":", "<suite>"], ["while ", "<test>", ":", "<suite>", "else", ":", "<suite>"]],
	"<for_stmt>": [["for ", "<exprlist>", " in ", "<testlist>", " : ", "<suite>"],
		["for ", "<exprlist>", " in ", "<testlist>", ":", "<suite>", "else", ":", "<suite>"]],
	"<aux_except_seq>": [["<except_clause>", ":", "<suite>"], ["<except_clause>", ":", "<suite>", "<aux_except_seq>"]],
	"<try_stmt>": [["try", ":", "<suite>", "<aux_except_seq>", "else", ":", "<suite>"],
		["try", ":", "<suite>", "<aux_except_seq>", "finally", ":", "<suite>"],
		["try", ":", "<suite>", "<aux_except_seq>", "else", ":", "<suite>", "finally", ":", "<suite>"],
		["try", ":", "<suite>", "finally", ":", "<suite>"]],
	"<with_stmt>": [["with ", "<with_item>", ":", "<suite>"]],
	"<with_item>": [["<test>"], ["<test>", " as ", "<expr>"]],
	"<except_clause>": [["except"], ["except ", "<test>"], ["except ", "<test>", " as ", "<NAME>"]],
	"<aux_stmt_seq>": [["<stmt>"], ["<stmt>", "<aux_stmt_seq>"]],
	"<suite>": [["<simple_stmt>"], ["<NEWLINE>", "arrancalasuite", "<NEWLINE>", "<aux_stmt_seq>", "<NEWLINE>", "terminalasuite", "<NEWLINE>"]],
	"<test>": [["<or_test>", " if ", "<or_test>", " else ", "<test>"],
		["<or_test>"], ["<lambdef>"]],
	"<test_nocond>": [["<or_test>"], ["<lambdef_nocond>"]],
	"<lambdef>": [["lambda ", "<typedargslist>", ":", "<test>"], ["lambda ", ":", "<test>"]],
	"<lambdef_nocond>": [["lambda ", "<typedargslist>", ":", "<test_nocond>"], ["lambda ", ":", "<test_nocond>"]],
	"<or_test>": [["<and_test>"], ["<and_test>", " or ", "<or_test>"]],
	"<and_test>": [["<not_test>"], ["<not_test>", " and ", "<and_test>"]],
	"<not_test>": [["not ", "<not_test>"], ["<comparison>"]],
	"<comparison>": [["<expr>"], ["<expr>", "<comp_op>", "<comparison>"]],
	"<comp_op>": [["<"], [">"], ["=="], [">="], ["<="], ["!="], [" in "], [" not in "], [" is "], [" is not "]],
	"<star_expr>": [["*", "<expr>"]],
	"<expr>": [["<xor_expr>"], ["<xor_expr>", "|", "<expr>"]],
	"<xor_expr>": [["<and_expr>"], ["<and_expr>", "^", "<xor_expr>"]],
	"<and_expr>": [["<shift_expr>"], ["<shift_expr>", "&", "<and_expr>"]],
	"<shift_expr>": [["<arith_expr>"], ["<arith_expr>", "<<", "<shift_expr>"], ["<arith_expr>", ">>", "<shift_expr>"]],
	"<arith_expr>": [["<term>"], ["<term>", "+", "<arith_expr>"], ["<term>", "-", "<arith_expr>"]],
	"<term>": [["<factor>"],
		["<factor>", "*", "<term>"],
		["<factor>", "/", "<term>"],
		["<factor>", "%", "<term>"],
		["<factor>", "//", "<term>"]],
	"<factor>": [["+", "<factor>"],
		["-", "<factor>"],
		["~", "<factor>"],
		["<power>"]],
	"<aux_trailer_seq>": [[], ["<trailer>", "<aux_trailer_seq>"]],
	"<power>": [["<atom>", "<aux_trailer_seq>"],
		["<atom>", "<aux_trailer_seq>", "**", "<factor>"]],
	"<aux_string_seq>": [["<STRING>"], ["<STRING>", "<aux_string_seq>"]],
	"<atom>": [["<NAME>"], ["<NUMBER>"], ["<aux_string_seq>"], ["..."], ["None"], ["True"], ["False"],
		["[]"], ["{}"], ["[", "<testlist_comp>", "]"], ["{", "<dictorsetmaker>", "}"], ["()"],
		["(", "<yield_expr>", ")"], ["(", "<testlist_comp>", ")"]],
	"<aux_test_star_seq>": [[], [",", "<test>", "<aux_test_star_seq"], [",", "<star_expr>", "<aux_test_star_seq>"]],
	"<testlist_comp>": [["<test>", "<comp_for>"], ["<star_expr>", "<comp_for>"],
		["<test>", "<aux_test_star_seq>"], ["<star_expr>", "<aux_test_star_seq>"]],
	"<trailer>": [["()"], ["(", "<arglist>", ")"], ["[]"], ["[", "<subscriptlist>", "]"], [".", "<NAME>"]],
	"<subscriptlist>": [["<subscript>"], ["<subscript>", ",", "<subscriptlist>"]],
	"<subscript>": [["<test>"], [":"], ["<test>", ":"], [":", "<test>"], [":", "<sliceop>"], ["<test>", ":", "<test>"], [":", "<test>", "<sliceop>"], ["<test>", ":", "<sliceop>"], ["<test>", ":", "<test>", "<sliceop>"]],
	"<sliceop>": [[":"], [":", "<test>"]],
	"<exprlist>": [["<expr>"], ["<star_expr>"], ["<expr>", ",", "<exprlist>"], ["<star_expr>", ",", "<exprlist>"]],
	"<testlist>": [["<test>"], ["<test>", ",", "<testlist>"]],
	"<aux_test_seq>": [[], [",", "<test>", ":", "<test>", "<aux_test_seq>"]],
	"<aux_test_seq2>": [[], [",", "<test>", "<aux_test_seq>"]],
	"<dictorsetmaker>": [["<test>", ":", "<test>", " ", "<comp_for>"],
		["<test>", ":", "<test>", " ", "<aux_test_seq>"],
		["<test>", " ", "<comp_for>"],
		["<test>", " ", "<aux_test_seq2>"]],
	
	"<classdef>": [["class ", "<NAME>", "(", "<arglist>", ")", ":", "<suite>"], ["class ", "<NAME>", "()", ":", "<suite>"], ["class ", "<NAME>", ":", "<suite>"]],
	
	"<aux_arugment_seq>": [[], ["<argument>", ",", "<aux_arugment_seq>"]],
	"<arglist>": [["<aux_arugment_seq>", "**", "<test>"], ["<aux_arugment_seq>","<argument>"], ["<aux_arugment_seq>", "*", "<test>", ",", "<aux_arugment_seq>", "**", "<test>"], ["<aux_arugment_seq>", "*", "<test>", ",", "<aux_arugment_seq>"]],
	"<argument>": [["<NAME>"], ["<NAME>", "<comp_for>"], ["<NAME>", "=", "<test>"]],
	"<comp_iter>": [["<comp_for>"], ["<comp_if>"]],
	"<comp_for>": [["for ", "<exprlist>", " in ", "<or_test>"], ["for ", "<exprlist>", " in ", "<or_test>", "<comp_iter>"]],
	"<comp_if>": [["if ", "<test_nocond>"], ["if ", "<test_nocond> ", "<comp_iter>"]],
	
	"<yield_expr>": [["yield"], ["yield ", "<yield_arg>"]],
	"<yield_arg>": [["from ", "<test>"], ["from ", "<testlist>"]],
	"<NAME>": [["a"], ["b"], ["foo"], ["bar"], ["foobar"]],
	"<IMODULE_NAME>": [["math"]],
	"<NEWLINE>": [["\n"]],
	"<STRING>": [["'asd'"], ["(0x1000 * 'a')"], ["'aaaaaaaaaaaabbbbbbbbbbbbbbbb'"], ["'zarakatunga'"]],
	"<NUMBER>": [["0"], ["1"], ["0.1"], ["1/2"], ["-1"], ["11111111111"], ["1e100"], ["1e-100"], ["1e999999"], ["3.14"]]
}

I ran afl-fuzz under gdb, let it run for some minutes, and interrupted it to check where it was hanging; I got this backtrace (the same one after several tries):

[...]
#61 0x00007fffeebd69dc in antlr4::atn::LL1Analyzer::_LOOK (this=0x7fffffffb340, s=0x555555637860, stopState=0x0, ctx=std::shared_ptr<antlr4::atn::PredictionContext> (use count 5, weak count 0) = {...}, look=..., lookBusy=std::unordered_set with 864043 elements = {...}, calledRuleStack=..., seeThruPreds=true, addEOF=true) at antlr4-cpp-runtime-src/runtime/src/atn/LL1Analyzer.cpp:136
#62 0x00007fffeebd6b2b in antlr4::atn::LL1Analyzer::_LOOK (this=0x7fffffffb340, s=0x555555637920, stopState=0x0, ctx=std::shared_ptr<antlr4::atn::PredictionContext> (use count 5, weak count 0) = {...}, look=..., lookBusy=std::unordered_set with 864043 elements = {...}, calledRuleStack=..., seeThruPreds=true, addEOF=true) at antlr4-cpp-runtime-src/runtime/src/atn/LL1Analyzer.cpp:145
#63 0x00007fffeebd6b2b in antlr4::atn::LL1Analyzer::_LOOK (this=0x7fffffffb340, s=0x555555634240, stopState=0x0, ctx=std::shared_ptr<antlr4::atn::PredictionContext> (use count 5, weak count 0) = {...}, look=..., lookBusy=std::unordered_set with 864043 elements = {...}, calledRuleStack=..., seeThruPreds=true, addEOF=true) at antlr4-cpp-runtime-src/runtime/src/atn/LL1Analyzer.cpp:145
#64 0x00007fffeebd69dc in antlr4::atn::LL1Analyzer::_LOOK (this=0x7fffffffb340, s=0x5555556394d0, stopState=0x0, ctx=std::shared_ptr<antlr4::atn::PredictionContext> (empty) = {...}, look=..., lookBusy=std::unordered_set with 864043 elements = {...}, calledRuleStack=..., seeThruPreds=true, addEOF=true) at antlr4-cpp-runtime-src/runtime/src/atn/LL1Analyzer.cpp:136
#65 0x00007fffeebd6b2b in antlr4::atn::LL1Analyzer::_LOOK (this=0x7fffffffb340, s=0x555555639770, stopState=0x0, ctx=std::shared_ptr<antlr4::atn::PredictionContext> (empty) = {...}, look=..., lookBusy=std::unordered_set with 864043 elements = {...}, calledRuleStack=..., seeThruPreds=true, addEOF=true) at antlr4-cpp-runtime-src/runtime/src/atn/LL1Analyzer.cpp:145
#66 0x00007fffeebd620b in antlr4::atn::LL1Analyzer::LOOK (this=0x7fffffffb340, s=0x555555639770, stopState=0x0, ctx=0x0) at antlr4-cpp-runtime-src/runtime/src/atn/LL1Analyzer.cpp:67
#67 0x00007fffeebd60d7 in antlr4::atn::LL1Analyzer::LOOK (this=0x7fffffffb340, s=0x555555639770, ctx=0x0) at antlr4-cpp-runtime-src/runtime/src/atn/LL1Analyzer.cpp:57
#68 0x00007fffeeba7aef in antlr4::atn::ATN::nextTokens (this=0x7fffeed46600 <GrammarParser::_atn>, s=0x555555639770, ctx=0x0) at antlr4-cpp-runtime-src/runtime/src/atn/ATN.cpp:86
#69 0x00007fffeeba7bd1 in antlr4::atn::ATN::nextTokens (this=0x7fffeed46600 <GrammarParser::_atn>, s=0x555555639770) at antlr4-cpp-runtime-src/runtime/src/atn/ATN.cpp:94
#70 0x00007fffeec45bb3 in antlr4::DefaultErrorStrategy::sync (this=0x555555664940, recognizer=0x7fffffffb750) at antlr4-cpp-runtime-src/runtime/src/DefaultErrorStrategy.cpp:103
#71 0x00007fffeeb2f354 in GrammarParser::node_program (this=0x7fffffffb750) at generated/GrammarParser.cpp:233
#72 0x00007fffeeb2eaff in GrammarParser::node_START (this=0x7fffffffb750) at generated/GrammarParser.cpp:132
#73 0x00007fffeeb2e668 in GrammarParser::entry (this=0x7fffffffb750) at generated/GrammarParser.cpp:75
#74 0x00007fffeeb226e6 in tree_from_buf (data_buf=0x7ffff7ffb000 "a=1\nb=1.203213\nfoo='123213'\nbar=b'123\\x01'\n1=a\n\n", data_size=48) at antlr4_shim.cpp:95
#75 0x00007fffeeb1c408 in load_tree_from_test_case (filename=0x5555556042c0 "out/default/queue/id:000000,time:0,execs:0,orig:0") at tree.c:861
#76 0x00007fffeeb1968f in afl_custom_queue_get (data=0x555555665d20, filename=0x5555556042c0 "out/default/queue/id:000000,time:0,execs:0,orig:0") at grammar_mutator.c:216
#77 0x00007fffeeb1a306 in afl_custom_queue_new_entry (data=0x555555665d20, filename_new_queue=0x5555556042c0 "out/default/queue/id:000000,time:0,execs:0,orig:0", filename_orig_queue=0x0) at grammar_mutator.c:681
#78 0x000055555557fb62 in run_afl_custom_queue_new_entry (afl=0x7ffff7552010, q=0x5555556610e0, fname=0x5555556042c0 "out/default/queue/id:000000,time:0,execs:0,orig:0", mother_fname=0x0) at src/afl-fuzz-mutators.c:41
#79 0x000055555559350f in pivot_inputs (afl=0x7ffff7552010) at src/afl-fuzz-init.c:1358
#80 0x000055555558b4d9 in main (argc=8, argv_orig=0x7fffffffdf18, envp=0x7fffffffdf60) at src/afl-fuzz.c:1832

Only part of the trace is included; from that point upward it just repeats the last three calls.

Please let me know if I can provide any more info, or if you've got any hint as to what might be the problem.

Replace C++ with C

Current C++ parts:

  • Chunk store relies on
  • Some parts rely on std::string, which can be replaced by C char[] and related functions

TBD:

  • GoogleTest is implemented in C++. Other candidates for a C testing framework: CUnit, CMocka

Exception in the ANTLR shim

After a few minutes of fuzzing mruby using the test cases from test/t/:

terminate called after throwing an instance of 'std::range_error'
  what():  wstring_convert::from_bytes
==13251== 
==13251== Process terminating with default action of signal 6 (SIGABRT)
==13251==    at 0x4E7AF47: raise (raise.c:51)
==13251==    by 0x4E7C8B0: abort (abort.c:79)
==13251==    by 0x71F2256: ??? (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.28)
==13251==    by 0x71FD605: ??? (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.28)
==13251==    by 0x71FD670: std::terminate() (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.28)
==13251==    by 0x71FD904: __cxa_throw (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.28)
==13251==    by 0x71F4C0B: std::__throw_range_error(char const*) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.28)
==13251==    by 0x6D94479: std::__cxx11::wstring_convert<std::codecvt_utf8<char32_t, 1114111ul, (std::codecvt_mode)0>, char32_t, std::allocator<char32_t>, std::allocator<char> >::from_bytes(char const*, char const*) (locale_conv.h:324)
==13251==    by 0x6D94032: antlrcpp::utf8_to_utf32[abi:cxx11](char const*, char const*) (StringUtils.h:43)
==13251==    by 0x6D934D4: antlr4::ANTLRInputStream::load(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) (ANTLRInputStream.cpp:40)
==13251==    by 0x6D9322B: antlr4::ANTLRInputStream::ANTLRInputStream(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) (ANTLRInputStream.cpp:22)
==13251==    by 0x6D932CA: antlr4::ANTLRInputStream::ANTLRInputStream(char const*, unsigned long) (ANTLRInputStream.cpp:26)
==13251== 
==13251== HEAP SUMMARY:
==13251==     in use at exit: 1,082,303,564 bytes in 18,122,690 blocks
==13251==   total heap usage: 59,484,413 allocs, 41,361,723 frees, 3,887,947,618 bytes allocated
==13251== 
==13251== LEAK SUMMARY:
==13251==    definitely lost: 0 bytes in 0 blocks
==13251==    indirectly lost: 0 bytes in 0 blocks
==13251==      possibly lost: 69,776 bytes in 4 blocks
==13251==    still reachable: 1,082,233,788 bytes in 18,122,686 blocks
==13251==                       of which reachable via heuristic:
==13251==                         stdstring          : 52 bytes in 1 blocks
==13251==         suppressed: 0 bytes in 0 blocks
==13251== Rerun with --leak-check=full to see details of leaked memory
==13251== 
==13251== For counts of detected and suppressed errors, rerun with: -v
==13251== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
Aborted

JSON-to-g4 conversion with only parser rules causes syntax errors

In my experimental environment, I found that converting the JSON grammar to g4 with only parser rules causes syntax errors; these parsing errors can lose a large amount of mutated data.

I made a minimal test case, lex.json:

{
    "<A>": [["<NUMBER>", "<STRING>", "\n"]],
    "<NUMBER>": [["10"], ["99"]],
    "<STRING>": [["(", "<HEXSTRING>", ")"]],
    "<HEXSTRING>": [["<CHAR>", "<HEXSTRING>"], []],
    "<CHAR>": [
            ["0"], ["1"], ["2"], ["3"], ["4"], ["5"], ["6"], ["7"],
            ["8"], ["9"], ["a"], ["b"], ["c"], ["d"], ["e"], ["f"]
    ]
}
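For reference, this JSON grammar format maps each nonterminal to a list of alternatives, where each alternative is a sequence of nonterminals and literal strings. A minimal Python sketch of randomly deriving an input from the grammar above (an illustration only, not part of the mutator):

```python
import random

GRAMMAR = {
    "<A>": [["<NUMBER>", "<STRING>", "\n"]],
    "<NUMBER>": [["10"], ["99"]],
    "<STRING>": [["(", "<HEXSTRING>", ")"]],
    "<HEXSTRING>": [["<CHAR>", "<HEXSTRING>"], []],
    "<CHAR>": [[c] for c in "0123456789abcdef"],
}

def derive(symbol="<A>", depth=0, max_depth=20):
    # Pick a random alternative; near the depth limit, pick the
    # shortest alternative to force termination of recursive rules.
    alts = GRAMMAR[symbol]
    alt = random.choice(alts) if depth < max_depth else min(alts, key=len)
    out = []
    for tok in alt:
        if tok in GRAMMAR:
            out.append(derive(tok, depth + 1, max_depth))
        else:
            out.append(tok)  # literal terminal
    return "".join(out)

print(derive())  # prints a random derivation such as 10(af03) plus a newline
```

Every derivation has the shape NUMBER, an open parenthesis, zero or more hex characters, a close parenthesis, and a newline.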

Building Grammar-Mutator with it generates the following Grammar.g4:

grammar Grammar;
entry
    : node_A EOF
    ;
node_A
    : node_NUMBER node_STRING '\n'
    ;
node_NUMBER
    : '10'
    | '99'
    ;
node_STRING
    : '(' node_HEXSTRING ')'
    ;
node_HEXSTRING
    : 
    | node_CHAR node_HEXSTRING
    ;
node_CHAR
    : '0'
    | '1'
    | '2'
    | '3'
    | '4'
    | '5'
    | '6'
    | '7'
    | '8'
    | '9'
    | 'a'
    | 'b'
    | 'c'
    | 'd'
    | 'e'
    | 'f'
    ;

We prepared input data seed1 / seed2 and used antlr4-parse to test:

[Screenshot (2024-01-18): antlr4-parse output showing the input 10(10) parsed incorrectly]

Why is 10(10) parsed incorrectly? Because ANTLR4 works in two stages: lexer and parser. During the lexer stage, node_NUMBER: 10 is recognized as a token, so in the parser stage the input is seen as node_NUMBER ( node_NUMBER ), and an error occurs.
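To illustrate the lexer-stage behavior, here is a toy maximal-munch lexer in Python (an illustration of the mechanism, not ANTLR itself), with token rules analogous to the generated grammar:

```python
import re

# Token definitions analogous to the generated grammar once NUMBER is
# promoted to a lexer token: '10'/'99' also match inside the
# parenthesized hex body, because the lexer has no parsing context.
TOKENS = [
    ("NUMBER", re.compile(r"10|99")),
    ("LPAREN", re.compile(r"\(")),
    ("RPAREN", re.compile(r"\)")),
    ("CHAR", re.compile(r"[0-9a-f]")),
    ("NL", re.compile(r"\n")),
]

def lex(text):
    pos, out = 0, []
    while pos < len(text):
        for name, rx in TOKENS:
            m = rx.match(text, pos)
            if m:
                out.append((name, m.group()))
                pos = m.end()
                break
        else:
            raise SyntaxError(f"no token at {pos}")
    return out

print(lex("10(10)\n"))
# The '10' inside the parentheses is lexed as NUMBER, not as two CHAR
# tokens, which is why the parser then sees node_NUMBER ( node_NUMBER ).
```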

In an ANTLR4 grammar, lexer rules begin with an uppercase letter and parser rules begin with a lowercase letter, so we should tell ANTLR4 explicitly which rules are lexical. Patched Grammar_patch.g4:

grammar Grammar_patch;
entry
    : node_A EOF
    ;
node_A
    : node_NUMBER Node_STRING '\n'
    ;
node_NUMBER
    : '10'
    | '99'
    ;
Node_STRING
    : '(' Node_HEXSTRING ')'
    ;
Node_HEXSTRING
    : 
    | Node_CHAR Node_HEXSTRING
    ;
Node_CHAR
    : '0'
    | '1'
    | '2'
    | '3'
    | '4'
    | '5'
    | '6'
    | '7'
    | '8'
    | '9'
    | 'a'
    | 'b'
    | 'c'
    | 'd'
    | 'e'
    | 'f'
    ;

Testing again:

[Screenshot (2024-01-18): antlr4-parse output with the patched grammar, including a warning that a rule can match the empty string]

The warning tells us that the rule can match the empty string, which may cause ANTLR4 backtracking issues, but we can easily address it by marking the rule as fragment Node_HEXSTRING.

Maybe we can optimize the JSON-to-g4 generation code to distinguish between lexer and parser rules?
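One possible direction for the converter (a sketch only; the lexical set and the to_g4_rules helper are hypothetical, though the node_ prefix follows the generated grammar): let the grammar author designate which nonterminals are lexical, and capitalize those rule names so ANTLR4 treats them as lexer rules:

```python
def to_g4_rules(grammar, lexical):
    """Render JSON-grammar rules as ANTLR4 rules, capitalizing the
    names of user-designated lexical symbols (hypothetical 'lexical'
    set) and keeping the rest as lowercase parser rules."""
    def name(sym):
        base = "node_" + sym.strip("<>")
        return base[0].upper() + base[1:] if sym in lexical else base

    lines = []
    for sym, alts in grammar.items():
        # Nonterminals become rule references; strings become g4 literals.
        rendered = [" ".join(name(t) if t in grammar else repr(t) for t in alt)
                    for alt in alts]
        lines.append(f"{name(sym)}\n    : " + "\n    | ".join(rendered) + "\n    ;")
    return "\n".join(lines)

grammar = {
    "<A>": [["<NUMBER>", "<STRING>", "\n"]],
    "<NUMBER>": [["10"], ["99"]],
    "<STRING>": [["(", "<HEXSTRING>", ")"]],
    "<HEXSTRING>": [["<CHAR>", "<HEXSTRING>"], []],
    "<CHAR>": [[c] for c in "0123456789abcdef"],
}
print(to_g4_rules(grammar, lexical={"<STRING>", "<HEXSTRING>", "<CHAR>"}))
```

With the lexical set above, this reproduces the shape of Grammar_patch.g4 (node_NUMBER stays a parser rule, Node_STRING and below become lexer rules); the entry rule is omitted from the sketch.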
