ise-uiuc / FreeFuzz
Free Lunch for Testing: Fuzzing Deep-Learning Libraries from Open Source (ICSE'22)
Hi. I ran into trouble when I tried FreeFuzz on my machine: I can't reproduce the experiment. While the experiment in the paper spent 7.3 hours on all PyTorch APIs, mine spent 33.5 hours on torch.nn.functional.* alone (fewer than 100 APIs). I only ran CPU mode, and my machine's processor is at least comparable to the one used in the paper.
I emailed Anjiang first, and he suspected the slowdown was due to overly large tensor shapes. I tried to find a configuration option to bound tensor shapes as he suggested, but couldn't find one. I would therefore appreciate it if you could let me know how to configure the tensor size (or any settings needed to run FreeFuzz the way you did in the paper). Thank you :)
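In the meantime, a minimal workaround sketch, assuming FreeFuzz exposes no built-in cap: clamp mutated tensor shapes before executing a test. The names `clamp_shape` and `MAX_DIM` below are hypothetical, not part of FreeFuzz.

```python
MAX_DIM = 64  # hypothetical cap on any single tensor dimension

def clamp_shape(shape, max_dim=MAX_DIM):
    """Clamp each dimension of a mutated tensor shape to keep runs fast."""
    return [min(int(d), max_dim) for d in shape]

clamp_shape([3, 1024, 4096])  # -> [3, 64, 64]
```

Applying this inside the shape-mutation step would bound the cost of each generated test without changing the mutation strategies themselves.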
The logic of the write_API_signature function in process_data.py is incorrect: it fails to build the signature collection properly.
Hello, congrats on the paper, and thanks for the great work on FreeFuzz, @Anjiang-Wei, @YangChenyuan, @dengyinlin!
What are the steps for using gcov to obtain C++ code coverage for the tests generated by FreeFuzz? The paper was right to point out the importance of coverage trend analysis, and it would be amazing if the process could be documented in the repository. (:
Hi!
First of all, thank you for your contribution. After reading the paper, I want to know why, given that there is a random mutation strategy, the database mutation strategy is still needed. In my opinion, shouldn't the coverage of the random mutation strategy be better than that of the database mutation strategy? If only a single mutation strategy is used, which of the three is most effective? Is it type mutation?
Executing cd src && python FreeFuzz.py --conf demo_torch.conf
yields the following error:
File "/home/shangyit/projects/FreeFuzz/src/classes/torch_api.py", line 14, in TorchArgument
torch.complex32, torch.complex64, torch.complex128, torch.bool
AttributeError: module 'torch' has no attribute 'complex32'
The torch developers temporarily disabled this field in release 1.11.0 (the newest stable). See this.
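A minimal sketch of a version-tolerant fix, assuming the dtype list in torch_api.py can be built dynamically. `collect_dtypes` and the stand-in namespace below are illustrative, not FreeFuzz code:

```python
import types

def collect_dtypes(mod, names):
    """Keep only the dtype attributes that exist on this build of the library."""
    return [getattr(mod, n) for n in names if hasattr(mod, n)]

# Stand-in namespace simulating PyTorch 1.11, where complex32 is absent:
fake_torch = types.SimpleNamespace(complex64="c64", complex128="c128", bool="b")
collect_dtypes(fake_torch, ["complex32", "complex64", "complex128", "bool"])
# -> ['c64', 'c128', 'b']
```

With real torch, `collect_dtypes(torch, [...])` would skip complex32 on 1.11.0 and include it again on versions that restore it.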
Hi,
I am wondering how you relate GCOV to TensorFlow or PyTorch, because I am also currently experimenting with code coverage for deep learning networks, but I can't get operator coverage (or line coverage) for a TensorFlow network model after calling it from a C++ program.
I tried calling the encapsulated network model from a C++ program and then running GCOV on that program, but the results were not associated with the library source code.
#include <iostream>
#include "Python.h"

// call the Python program that runs the model
// (defined before main so no forward declaration is needed)
void callModel() {
    Py_Initialize();
    const char *init_call =
        "import sys\n"
        "import os\n"
        "sys.path.append('/'.join(os.getcwd().split('/')[:-1]))\n";
    PyRun_SimpleString(init_call);
    PyObject *pModule = PyImport_ImportModule("torch_.lenet");
    if (pModule != nullptr) {
        PyObject *pFunc = PyObject_GetAttrString(pModule, "go");
        PyObject_CallObject(pFunc, nullptr);
    } else {
        std::cout << "Fail" << std::endl;
    }
    Py_Finalize();
}

int main() {
    callModel();
    return 0;
}
g++ -fprofile-arcs -ftest-coverage main.cpp -o main
./main
gcov -n main.cpp
gcovr -v --html-details main.html
In run_code() (tf_library.py), the current code wraps exec(code) in a try block in order to catch unexpected exceptions:
try:
exec(code)
MARK_DONE_FLAG = True
except Exception as e:
error = str(e)
However, the target code has already been wrapped in its own try block by generate_code(), so any exception raised during execution is caught inside the target code and recorded in the results dictionary. In this situation, error and MARK_DONE_FLAG keep their default values (None, True), the check in test_with_oracle() cannot detect the failure, and invalid code is put into the "success" directory:
results, error, MARK_DONE_FLAG = self.run_code(code)
if not MARK_DONE_FLAG:
self.write_to_dir(join(self.output[oracle], "potential-bug"), api.api, code)
elif error == None:
self.write_to_dir(join(self.output[oracle], "success"), api.api, code)
else:
self.write_to_dir(join(self.output[oracle], "fail"), api.api, code)
code = (
    "try:\n"
    "    unrunnable\n"
    "except Exception as e:\n"
    "    results['err'] = str(e)"
)
try:
exec(code)
MARK_DONE_FLAG = True
except Exception as e:
error = str(e)
If we directly substitute the target code in run_code() with the invalid code shown above, then as long as the code contains no syntax error, execution ends normally and error does not capture the exception. In FreeFuzz's outputs, there are indeed tests that pass invalid arguments to TF APIs and still get past the checker.
In tf_library.py, we can determine the execution result by checking whether results['err'] exists, instead of checking the value of error.
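The suggested fix could look like the following sketch. The function name and directory strings mirror the issue's description; the actual FreeFuzz internals may differ:

```python
def classify_result(results, error, mark_done_flag):
    """Pick the output directory for a generated test, also consulting
    results['err'], where the generated code's inner try/except records
    exceptions that the outer try around exec() never sees."""
    if not mark_done_flag:
        return "potential-bug"  # exec() of the whole snippet itself failed
    if error is None and "err" not in results:
        return "success"
    return "fail"

classify_result({"err": "invalid argument"}, None, True)  # -> 'fail'
```

With this check, a test whose inner exception landed in results['err'] goes to "fail" rather than "success", which is exactly the case the issue describes.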
Hi,
In this section https://github.com/ise-uiuc/FreeFuzz/tree/main/src/instrumentation/torch, in subsection (3), you mention that after configuring our MongoDB in write_tools.py, we should run code in which PyTorch APIs are invoked. I fully understand how this works; my confusion is about where the APIs in "torch.nn.functional.txt", "torch.nn.txt", and "torch.txt" come from. How did you extract these APIs?
Also, another question: when I arbitrarily run an API, say torch.randn(), and torch.randn() is not in the text files ("torch.nn.functional.txt", "torch.nn.txt", "torch.txt"), what happens? Will I fail to get the value space of randn()?
Thanks,
Nima.
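As a guess at why an unlisted API would yield no value space: the instrumentation presumably only hooks the APIs named in those text files, so calls to anything else are never recorded. A minimal sketch of that idea, with `wrap_api` and `RECORDED` as illustrative names rather than FreeFuzz's own:

```python
import functools

RECORDED = {}  # api name -> list of (args, kwargs) tuples seen at runtime

def wrap_api(api_name, fn):
    """Wrap one listed API so that every call records its arguments."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        RECORDED.setdefault(api_name, []).append((args, kwargs))
        return fn(*args, **kwargs)
    return wrapper

# Only wrapped (i.e. listed) APIs contribute to the value space:
randn = wrap_api("torch.randn", lambda *shape: "tensor")
randn(2, 3)
RECORDED["torch.randn"]  # -> [((2, 3), {})]
```

Under this assumption, an API missing from the text files is simply never wrapped, so no value space for it ever reaches the database.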