ise-uiuc / FreeFuzz
Free Lunch for Testing: Fuzzing Deep-Learning Libraries from Open Source (ICSE'22)
Hi. I ran into trouble when I tried FreeFuzz on my machine: I can't reproduce the experiment. While the experiment in the paper spent 7.3 hours on all PyTorch APIs, mine spent 33.5 hours on torch.nn.functional.* alone (fewer than 100 APIs). I only ran CPU mode, and my machine's processor is at least comparable to the one used in the paper.
I emailed Anjiang first, and he suspected the slowdown was due to overly large tensor shapes. I tried to find a configuration option to bound tensor shapes as he suggested, but couldn't find one. I would therefore appreciate it if you could let me know how to configure the tensor size (or any settings needed to run FreeFuzz the way you did in the paper). Thank you :)
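In the meantime, a minimal workaround sketch, assuming FreeFuzz exposes no built-in cap: clamp mutated tensor shapes before executing a test. The names `clamp_shape` and `MAX_DIM` below are hypothetical, not part of FreeFuzz.

```python
MAX_DIM = 64  # hypothetical cap on any single tensor dimension

def clamp_shape(shape, max_dim=MAX_DIM):
    """Clamp each dimension of a mutated tensor shape to keep runs fast."""
    return [min(int(d), max_dim) for d in shape]

clamp_shape([3, 1024, 4096])  # -> [3, 64, 64]
```

Applying this inside the shape-mutation step would bound the cost of each generated test without changing the mutation strategies themselves.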
The logic of the write_API_signature function in process_data.py is incorrect: it fails to build the signature collection properly.
Hello, congrats on the paper, and thanks for the great work on FreeFuzz, @Anjiang-Wei, @YangChenyuan, @dengyinlin!
What are the steps for using gcov to obtain C++ code coverage for the tests generated by FreeFuzz? The paper was right to point out the importance of coverage trend analysis, and it would be amazing if the process could be documented in the repository. (:
Hi!
First of all, thank you for your contribution. After reading the paper, I want to know why, given that there is a random mutation strategy, the database mutation strategy is still needed. In my opinion, shouldn't the coverage of the random mutation strategy be better than that of the database mutation strategy? If only a single mutation strategy is used, which of the three is most effective? Is it type mutation?
Executing cd src && python FreeFuzz.py --conf demo_torch.conf
yields the following error:
File "/home/shangyit/projects/FreeFuzz/src/classes/torch_api.py", line 14, in TorchArgument
torch.complex32, torch.complex64, torch.complex128, torch.bool
AttributeError: module 'torch' has no attribute 'complex32'
The torch developers temporarily disabled this field in release 1.11.0 (the newest stable). See this.
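A minimal sketch of a version-tolerant fix, assuming the dtype list in torch_api.py can be built dynamically. `collect_dtypes` and the stand-in namespace below are illustrative, not FreeFuzz code:

```python
import types

def collect_dtypes(mod, names):
    """Keep only the dtype attributes that exist on this build of the library."""
    return [getattr(mod, n) for n in names if hasattr(mod, n)]

# Stand-in namespace simulating PyTorch 1.11, where complex32 is absent:
fake_torch = types.SimpleNamespace(complex64="c64", complex128="c128", bool="b")
collect_dtypes(fake_torch, ["complex32", "complex64", "complex128", "bool"])
# -> ['c64', 'c128', 'b']
```

With real torch, `collect_dtypes(torch, [...])` would skip complex32 on 1.11.0 and include it again on versions that restore it.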
Hi,
I am wondering how you relate GCOV to TensorFlow or PyTorch, because I am also currently experimenting with code coverage for deep learning networks, but I can't get operator coverage (or line coverage) for a TensorFlow network model after calling it from a C++ program.
I tried calling the encapsulated network model from a C++ program and then running GCOV on that program, but the results were not associated with the library source code.
#include <iostream>
#include "Python.h"

// call the Python program that runs the model
// (defined before main so no forward declaration is needed)
void callModel() {
    Py_Initialize();
    const char *init_call =
        "import sys\n"
        "import os\n"
        "sys.path.append('/'.join(os.getcwd().split('/')[:-1]))\n";
    PyRun_SimpleString(init_call);
    PyObject *pModule = PyImport_ImportModule("torch_.lenet");
    if (pModule != nullptr) {
        PyObject *pFunc = PyObject_GetAttrString(pModule, "go");
        PyObject_CallObject(pFunc, nullptr);
    } else {
        std::cout << "Fail" << std::endl;
    }
    Py_Finalize();
}

int main() {
    callModel();
    return 0;
}
g++ -fprofile-arcs -ftest-coverage main.cpp -o main
./main
gcov -n main.cpp
gcovr -v --html-details main.html
In run_code() (tf_library.py), the current code wraps exec(code) in a try block in order to catch unexpected exceptions:
try:
exec(code)
MARK_DONE_FLAG = True
except Exception as e:
error = str(e)
However, the target code has already been wrapped in its own try block by generate_code(), so any exception raised during execution is caught inside the target code and recorded in the results dictionary. In this situation, error and MARK_DONE_FLAG keep their default values (None, True), the check in test_with_oracle() cannot detect the failure, and invalid code is put into the "success" directory:
results, error, MARK_DONE_FLAG = self.run_code(code)
if not MARK_DONE_FLAG:
self.write_to_dir(join(self.output[oracle], "potential-bug"), api.api, code)
elif error == None:
self.write_to_dir(join(self.output[oracle], "success"), api.api, code)
else:
self.write_to_dir(join(self.output[oracle], "fail"), api.api, code)
code = (
    "try:\n"
    "    unrunnable\n"
    "except Exception as e:\n"
    "    results['err'] = str(e)"
)
try:
exec(code)
MARK_DONE_FLAG = True
except Exception as e:
error = str(e)
If we directly substitute the target code in run_code() with the invalid code shown above, then as long as the code contains no syntax error, execution ends normally and error does not capture the exception. In FreeFuzz's outputs, there are indeed tests that pass invalid arguments to TF APIs and still get past the checker.
In tf_library.py, we can determine the execution result by checking whether results['err'] exists, instead of checking the value of error.
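The suggested fix could look like the following sketch. The function name and directory strings mirror the issue's description; the actual FreeFuzz internals may differ:

```python
def classify_result(results, error, mark_done_flag):
    """Pick the output directory for a generated test, also consulting
    results['err'], where the generated code's inner try/except records
    exceptions that the outer try around exec() never sees."""
    if not mark_done_flag:
        return "potential-bug"  # exec() of the whole snippet itself failed
    if error is None and "err" not in results:
        return "success"
    return "fail"

classify_result({"err": "invalid argument"}, None, True)  # -> 'fail'
```

With this check, a test whose inner exception landed in results['err'] goes to "fail" rather than "success", which is exactly the case the issue describes.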
Hi,
In this section https://github.com/ise-uiuc/FreeFuzz/tree/main/src/instrumentation/torch, in subsection (3), you mention that after configuring our MongoDB in write_tools.py, we should run code in which PyTorch APIs are invoked. I fully understand how this works; my confusion is about where the APIs in "torch.nn.functional.txt", "torch.nn.txt", and "torch.txt" come from. How did you extract these APIs?
Also, another question: when I arbitrarily run an API, say torch.randn(), and torch.randn() is not in the text files ("torch.nn.functional.txt", "torch.nn.txt", "torch.txt"), what happens? Will I fail to get the value space of randn()?
Thanks,
Nima.
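As a guess at why an unlisted API would yield no value space: the instrumentation presumably only hooks the APIs named in those text files, so calls to anything else are never recorded. A minimal sketch of that idea, with `wrap_api` and `RECORDED` as illustrative names rather than FreeFuzz's own:

```python
import functools

RECORDED = {}  # api name -> list of (args, kwargs) tuples seen at runtime

def wrap_api(api_name, fn):
    """Wrap one listed API so that every call records its arguments."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        RECORDED.setdefault(api_name, []).append((args, kwargs))
        return fn(*args, **kwargs)
    return wrapper

# Only wrapped (i.e. listed) APIs contribute to the value space:
randn = wrap_api("torch.randn", lambda *shape: "tensor")
randn(2, 3)
RECORDED["torch.randn"]  # -> [((2, 3), {})]
```

Under this assumption, an API missing from the text files is simply never wrapped, so no value space for it ever reaches the database.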