
oss-fuzz-gen's Introduction

A Framework for Fuzz Target Generation and Evaluation

This framework generates fuzz targets for real-world C/C++ projects with various Large Language Models (LLMs) and benchmarks them via the OSS-Fuzz platform.

More details are available in the blog post AI-Powered Fuzzing: Breaking the Bug Hunting Barrier.

Currently supported models are:

  • Vertex AI code-bison
  • Vertex AI code-bison-32k
  • Gemini Pro
  • OpenAI GPT-3.5-turbo
  • OpenAI GPT-4

Generated fuzz targets are evaluated with four metrics against the most up-to-date data from the production environment:

  • Compilability
  • Runtime crashes
  • Runtime coverage
  • Runtime line coverage diff against existing human-written fuzz targets in OSS-Fuzz.

Here is a sample experiment result from 31 January 2024. The experiment included 1,300+ benchmarks from 297 open-source projects.


Overall, the framework successfully leverages LLMs to generate valid fuzz targets (i.e., targets that produce a non-zero coverage increase) for 160 C/C++ projects. The maximum line coverage increase is 29% over the existing human-written targets.

Note that these reports are not public as they may contain undisclosed vulnerabilities.

Usage

Check our detailed usage guide for instructions on how to run this framework and generate reports based on the results.

Collaborations

Interested in research or open-source community collaborations? Please feel free to create an issue or email us: [email protected].

Bugs Discovered

So far, we have reported 6 new bugs/vulnerabilities found by fuzz targets automatically generated by this framework:

| Project | Bug | LLM | Prompt template |
|---|---|---|---|
| cJSON | OOB read | Vertex AI | default |
| libplist | OOB read | Vertex AI | default |
| hunspell | OOB read | Vertex AI | default |
| zstd | OOB write | Vertex AI | default |
| Undisclosed | Stack buffer underflow | Vertex AI | default |
| Undisclosed | Use of uninitialised memory | Vertex AI | default |

These bugs could only have been discovered with newly generated targets. They were not reachable with existing OSS-Fuzz targets.

Current top coverage improvements by project

| Project | Coverage increase % * |
|---|---|
| tinyxml2 | 29.84 |
| inih | 29.67 |
| lodepng | 26.21 |
| libarchive | 23.39 |
| cmark | 21.61 |
| fribidi | 18.20 |
| lighttpd | 17.56 |
| libmodbus | 16.59 |
| valijson | 16.21 |
| libiec61850 | 13.53 |
| hiredis | 13.50 |
| cmake | 12.62 |
| pugixml | 12.43 |
| meshoptimizer | 12.23 |
| libusb | 11.12 |
| json | 10.84 |

* Coverage percentages are calculated with the total lines of source code compiled during the OSS-Fuzz build of the entire project as the denominator.

Citing This Work

Please click on the 'Cite this repository' button located on the right-hand side of this GitHub page for citation details.

oss-fuzz-gen's People

Contributors

another-rex, arthurscchan, ch1hyun, cjx10, davidkorczynski, dependabot[bot], donggeliu, eltociear, erfanio, fdt622, inferno-chromium, jonathanmetzman, marklee131, marktefftech, mihaimaruseac, oliverchang, scottbrenner, trashvisor


oss-fuzz-gen's Issues

Capture more corner cases in benchmark name parsing

For example:
output-jsonnet-jsonnet::internal:: in 2024-02-11-64-dg-comparison

The name is incomplete because the namespace contains a `(` (from `(anonymous namespace)`), and our parser mistook it for the `(` that opens the parameter list:

jsonnet::internal::(anonymous namespace)::Interpreter::builtinExtVar(jsonnet::internal::LocationRange const&, std::__1::vector<jsonnet::internal::(anonymous namespace)::Value, std::__1::allocator<jsonnet::internal::(anonymous namespace)::Value> > const&)
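
A minimal sketch of a more robust split, assuming we receive the demangled signature as a single string: scan from the right for the `(` that balances the final `)`, instead of taking the first `(`:

def split_name_and_params(signature: str) -> tuple[str, str]:
  """Split a demangled C++ signature into (qualified name, parameter list).

  Scanning from the right finds the '(' that opens the final, balanced
  parameter list, so '(anonymous namespace)' in the qualified name (or in
  a parameter type) no longer confuses the split. Trailing qualifiers
  such as 'const' after the ')' are not handled in this sketch.
  """
  depth = 0
  for i in range(len(signature) - 1, -1, -1):
    if signature[i] == ')':
      depth += 1
    elif signature[i] == '(':
      depth -= 1
      if depth == 0:
        return signature[:i].rstrip(), signature[i:]
  return signature, ''  # no parameter list found

For the jsonnet signature above, this returns `jsonnet::internal::(anonymous namespace)::Interpreter::builtinExtVar` as the qualified name.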

Remove function name usage.

Currently, both the function signature field and the function name field store the function signature, which is used as the unique function identifier.

We can simplify the code by only keeping the function signature.

Related:
#64 (comment)

Handle VertexAI error response 'Text too long'

The error is caused by a 'Text too long' response. Here is the message:
https://pantheon.corp.google.com/logs/query;aroundTime=2024-02-13T12:58:00.000Z;cursorTimestamp=2024-02-13T13:00:23.372992698Z;duration=PT15M;query=resource.type%3D%22k8s_container%22%0Aresource.labels.project_id%3D%22oss-fuzz%22%0Aresource.labels.location%3D%22us-central1%22%0Aresource.labels.cluster_name%3D%22llm-experiment%22%0Aresource.labels.namespace_name%3D%22default%22%0Alabels.k8s-pod%2Fbatch_kubernetes_io%2Fcontroller-uid%3D%228f6b14ca-f7a2-4653-85f1-90e542b95e79%22%20severity%3E%3DWARNING%0Atimestamp%3D%222024-02-13T13:00:23.372992698Z%22%0AinsertId%3D%225lnkmvgn576p21g1%22?project=oss-fuzz

The related logs before/after the message:
https://pantheon.corp.google.com/logs/query;query=resource.type%3D%22k8s_container%22%0Aresource.labels.project_id%3D%22oss-fuzz%22%0Aresource.labels.location%3D%22us-central1%22%0Aresource.labels.cluster_name%3D%22llm-experiment%22%0Aresource.labels.namespace_name%3D%22default%22%0Alabels.k8s-pod%2Fbatch_kubernetes_io%2Fcontroller-uid%3D%228f6b14ca-f7a2-4653-85f1-90e542b95e79%22%20severity%3E%3DWARNING;cursorTimestamp=2024-02-13T13:00:23.372992698Z;aroundTime=2024-02-13T12:58:00.000Z;duration=PT15M?e=-13802955&mods=logs_tg_prod&project=oss-fuzz

Handling this requires two tasks:

  1. Investigate its root cause. This appears to happen in the code-fixing step. We need to understand what the prompt was and why it was overlong (e.g., did the error parser include too much text?).
  2. Capture this error and log it so that the experiment won't break because of it.
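
A minimal sketch of task 2, assuming the Vertex AI client surfaces the failure as an exception whose message contains 'Text too long' (the exact exception type depends on the SDK in use):

import logging

TEXT_TOO_LONG_MARKER = 'Text too long'

def query_model(model, prompt: str):
  # Query the model; log (rather than raise) overlong-prompt failures so
  # the rest of the experiment can continue.
  try:
    return model.query(prompt)  # hypothetical model interface
  except Exception as e:
    if TEXT_TOO_LONG_MARKER in str(e):
      logging.warning('Prompt rejected as too long (%d chars): %s',
                      len(prompt), e)
      return None
    raise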

Distinguish `benchmark` and `project`

We need unambiguous names:
Each benchmark should be a function, and a project may have multiple benchmarks/functions.

For example, our current benchmark.yaml should be project.yaml.
This will involve other renaming/refactoring modifications.

Fix false negative 'function used in fuzz targets'

Our current code automatically fails an LLM-generated fuzz target if it does not contain the function under test.
However, the current pattern matching is naive and has false negatives (i.e., the function is used correctly but not recognized by us), so it rejects valid fuzz targets.

This relates to our function name parsing regex:

names = re.findall(r'.*?\s*([\w:<>+*~]+)\s*\([^\(]*\)', function_signature)
if names:
  # Normalize names.
  return re.sub(r'[^\w:]', '-', names[-1])

A quick temporary solution is to check the function names with special characters stripped.
Later we can make better use of data from FI (Fuzz Introspector).
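
A minimal sketch of that temporary solution, assuming a containment check against the generated source is enough (function_used and its arguments are hypothetical names):

import re

def function_used(function_name: str, target_source: str) -> bool:
  # Compare the bare function name (special characters stripped) against
  # the identifiers that appear in the generated fuzz target.
  bare = re.sub(r'[^\w]', '', function_name.split('::')[-1])
  identifiers = set(re.findall(r'\w+', target_source))
  return bare in identifiers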

Initial `gcloud` authentication failure in GKE experiments

Some initial cloud build requests failed due to gcloud authentication errors (example1, example2). Some related observations and guesses:

  1. This has recurred on multiple GKE experiments, yet I failed to reproduce it in local experiments. Maybe this is because we have to authenticate gcloud manually before local experiments?
  2. The number of initial experiments affected by this seems random: sometimes only the first one, sometimes multiple. This could be due to parallelism in experiments.
  3. This error disappears after a while in each experiment.
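
If the failure really is transient, a minimal mitigation sketch is to retry the first cloud build command with a delay (the retry budget and the auth-error check are assumptions):

import subprocess
import time

def run_with_retry(cmd: list[str], attempts: int = 3, delay: float = 30.0):
  # Retry a command whose stderr suggests a transient auth error.
  for i in range(attempts):
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode == 0:
      return result
    if 'auth' not in result.stderr.lower() or i == attempts - 1:
      result.check_returncode()  # raises CalledProcessError
    time.sleep(delay)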

Coverage diffs: handle templates properly.

It's likely that we aren't handling template instantiations properly in our textcov diffing, leading to some inflated coverage numbers for certain C++ projects that use templates extensively.
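
A minimal sketch of one possible fix: collapse template argument lists in demangled names before diffing, so all instantiations of a template count as one function (this naive version would mishandle operator< and operator>, which a real fix must special-case):

def normalize_template_name(function_name: str) -> str:
  # Collapse 'parse<int>' and 'parse<double>' into 'parse<>'.
  out = []
  depth = 0
  for ch in function_name:
    if ch == '<':
      depth += 1
      if depth == 1:
        out.append('<>')
    elif ch == '>':
      depth = max(0, depth - 1)
    elif depth == 0:
      out.append(ch)
  return ''.join(out)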

Refactor prompt generation and evaluation loop.

Currently our prompt generation is tied to our template format here: https://github.com/google/oss-fuzz-gen/tree/main/prompts/template_xml

We should make it easier and more flexible for others to test different prompt generation strategies by allowing these custom prompts to be Python modules that look something like the following:

def generate(benchmark: Benchmark) -> str:
  ...

i.e. the module would be expected to define a generate function which produces a full prompt to pass to the LLM.

Similarly, we should also make our generation/evaluation loop more configurable, e.g. extract the logic here:

model.prompt_path = model.prepare_generate_prompt(

into a driver.py that can be similarly replaced:

def evaluate(model: models.LLM, benchmark: Benchmark, prompt_generator: Module):
  prompt = prompt_generator.generate(benchmark)
  targets = generate_targets(model, prompt)
  results = evaluate_targets(targets)
  ...

And tying this all together, the resulting invocations would look something like:

./run_all_experiments --driver /path/to/driver.py --prompt_generator prompts/custom_generator.py
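
A minimal sketch of how run_all_experiments could load those pluggable modules (the load_module helper is hypothetical; importlib does the real work):

import importlib.util

def load_module(path: str, name: str):
  spec = importlib.util.spec_from_file_location(name, path)
  module = importlib.util.module_from_spec(spec)
  spec.loader.exec_module(module)
  return module

prompt_generator = load_module('prompts/custom_generator.py', 'prompt_generator')
driver = load_module('/path/to/driver.py', 'driver')
# driver.evaluate(model, benchmark, prompt_generator)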

Re-select `comparison` benchmark-set from the new benchmarks.

This requires looking into the failures and selecting benchmarks from recent results based on the following guidelines:

  1. < 30 benchmarks in total. This ensures we can finish the experiment quickly, particularly for PR experiments.
  2. <= 2 benchmarks from one project, to include more projects.
  3. <= 1 benchmark with a 100% build rate or > 10% coverage increase from one project, with ~10 such benchmarks in total. Well-performing benchmarks are for regression checks; no point in having too many.
  4. <= 1 benchmark failed with the same error from the same project, to track improvements.
  5. Avoid the same general error from different projects (e.g., `size_t` or other standard-library symbols being undefined).
  6. Interesting projects to include:
  • tinyxml2
  • icu
  • avahi
  • Projects with complex function signatures (e.g., #89, guetzli, abseil-cpp, cppitertools's operator*, etc.)
  7. Some interesting failures to include in the comparison:
  • Incorrect path in #include <...> / undefined function.
  • Missing build failure.
  • Function not called.
  • Incorrect function usage.
  • Needing the definitions of data types used in the function under test.
  • Incorrect usage of the function under test.
  8. <= 1 relatively well-performing benchmark from a project (e.g., 60% build rate, 1% coverage increase).

Given there are many benchmarks to select from, please document all selected benchmarks with justification here for future reference.
Feel free to also document interesting benchmarks that were not selected, or that you are unsure about.

If more benchmarks are needed, use https://llm-exp.oss-fuzz.com/Result-reports/ochang-2024-01-25/sort.html.
However, the names there are not as convenient as the ones above, and some benchmarks might no longer be available.
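
The mechanical parts of guidelines 1-3 could be enforced with a small helper. A rough sketch, where the Benchmark fields (project, build_rate, coverage_diff) are assumptions about the result data, and the judgment calls in guidelines 4-8 still apply on top:

from collections import defaultdict

def select(benchmarks, total_cap=30, per_project_cap=2):
  selected = []
  per_project = defaultdict(int)
  good_per_project = defaultdict(int)
  for b in benchmarks:
    if len(selected) >= total_cap or per_project[b.project] >= per_project_cap:
      continue
    well_performing = b.build_rate == 1.0 or b.coverage_diff > 0.10
    if well_performing and good_per_project[b.project] >= 1:
      continue  # at most one well-performing benchmark per project
    selected.append(b)
    per_project[b.project] += 1
    good_per_project[b.project] += int(well_performing)
  return selected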

API for providing project context

  • Existing fuzz targets (path, binary name)
  • Function implementations
  • Data structure definitions
  • Usages of functions and data structures.
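
A hypothetical sketch of what such an API could return; none of these names exist in the codebase yet:

from dataclasses import dataclass, field

@dataclass
class ProjectContext:
  existing_targets: list[tuple[str, str]] = field(default_factory=list)  # (path, binary name)
  function_implementations: dict[str, str] = field(default_factory=dict)  # signature -> source
  type_definitions: dict[str, str] = field(default_factory=dict)  # type name -> definition
  usages: dict[str, list[str]] = field(default_factory=dict)  # symbol -> sample call sites

def get_context(project: str) -> ProjectContext:
  ...  # to be backed by static analysis of the project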

Convert functions in `benchmark.yaml` into a more structured form.

For example,
from:

functions:
  - int main2(int argc, char **argv)
  - int scanopt_usage(scanopt_t *scanner, FILE *fp, const char *usage)
project: flex
target_path: /src/fuzz-main.c

to:

"functions":
- "name": "main2"
  "param_names":
  - "argc"
  - "argv"
  "param_types":
  - "int"
  - "char **"
  "return_type": "int"
  "signature": "int main2(int argc, char ** argv)"
- "name": "scanopt_usage"
  "param_names":
  - "scanner"
  - "fp"
  - "usage"
  "param_types":
  - "char **"
  - "struct._IO_FILE *"
  - "char *"
  "return_type": "int"
  "signature": "int scanopt_usage(char ** scanner, struct _IO_FILE * fp, char * usage)"

project: flex
target_path: /src/fuzz-main.c
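
A minimal sketch of loading the structured form with PyYAML (FunctionSpec is a hypothetical name):

from dataclasses import dataclass

import yaml

@dataclass
class FunctionSpec:
  name: str
  param_names: list[str]
  param_types: list[str]
  return_type: str
  signature: str

def load_functions(path: str) -> list[FunctionSpec]:
  with open(path) as f:
    data = yaml.safe_load(f)
  return [FunctionSpec(**fn) for fn in data['functions']]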

Instructions for generating targets that read from files

I had some success manually including this in prompts for certain benchmarks to improve the quality of the fuzzing:

IMPORTANT: If the solution needs to load from a file, you need to create a temporary file called "/tmp/input", write data to it from FuzzedDataProvider, and use that as the file input. Example:
<code>
FILE *handle = fopen("/tmp/input", "w");
std::vector<uint8_t> contents = stream.ConsumeRemainingBytes(); // or stream.ConsumeBytes
fwrite(contents.data(), contents.size(), 1, handle);
fclose(handle);

int result = ParseFile("/tmp/input");
</code>

Need to evaluate this against more targets.
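
A minimal sketch of wiring that instruction into prompt generation; the needs_file_input signal is an assumption (it could come from benchmark metadata or a heuristic on the function under test):

FILE_INPUT_INSTRUCTION = '''
IMPORTANT: If the solution needs to load from a file, you need to create a
temporary file called "/tmp/input", write data to it from FuzzedDataProvider,
and use that as the file input.
'''

def finalize_prompt(prompt: str, needs_file_input: bool) -> str:
  if needs_file_input:
    return prompt + FILE_INPUT_INSTRUCTION
  return prompt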

Mitigate null terminator issues with LLVMFuzzerTestOneInput

Some of the currently generated targets incorrectly assume that the data parameter is null-terminated, leading to false-positive overflows when the data is passed to functions expecting null-terminated strings.

We need to better detect these, and experiment with including instructions in our prompts to avoid this.
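
For the detection half, a rough heuristic sketch: flag generated targets that pass the raw data pointer to C string functions without null-terminating it first (the sink list is illustrative, not exhaustive):

import re

C_STRING_SINKS = re.compile(
    r'\b(strlen|strcpy|strcat|strstr|strchr|atoi|sscanf)\s*\([^;]*\bdata\b')

def assumes_null_termination(target_source: str) -> bool:
  return C_STRING_SINKS.search(target_source) is not None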

More informative logging

  1. Add a logging module that stores more data in each log's jsonPayload (e.g., project, function/benchmark, LLM response id (1-10), code-fixing id (0-5), etc.). FuzzBench reference 1, reference 2.
  2. Replace the current logging and print calls with it.

Example view: (screenshot omitted)

This is useful to help us identify error messages to reproduce/fix them.
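
A minimal sketch of item 1, assuming logs are emitted as JSON lines and picked up into jsonPayload by the GKE logging agent:

import json
import logging

class ExperimentLogger(logging.LoggerAdapter):
  # Attach experiment coordinates to every log record.
  def process(self, msg, kwargs):
    return json.dumps({'message': msg, **self.extra}), kwargs

logger = ExperimentLogger(logging.getLogger('experiment'), {
    'project': 'flex',
    'benchmark': 'scanopt_usage',
    'response_id': 3,  # LLM response id (1-10)
    'fix_id': 0,       # code-fixing id (0-5)
})
logger.warning('build failed')  # one JSON line carrying all fields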

Extract project context (function impls, data structure defs, usages) and include them in prompts

Having this context will likely significantly improve the quality of targets generated.

The prompts should include:

  • function source code
  • function xrefs (sample callsites and usages)
  • data structure definitions, including which header path they're defined in.

The header file where a function prototype is declared is hard to extract accurately. To work around this, we may be able to include the function prototype in the target itself without knowing where the header is. The header files that contain the relevant type definitions used in the prototype will still need to be included.
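
A minimal sketch of that workaround: declare the prototype directly at the top of the generated target instead of guessing its header (the extern "C" toggle is an assumption about how C functions would be handled):

def inline_prototype(target_source: str, signature: str, is_c_function: bool) -> str:
  decl = f'extern "C" {signature};' if is_c_function else f'{signature};'
  return decl + '\n\n' + target_source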

Capture gcloud crashes and re-submit experiment

https://pantheon.corp.google.com/logs/query;query=resource.type%3D%22k8s_container%22%0Aresource.labels.project_id%3D%22oss-fuzz%22%0Aresource.labels.location%3D%22us-central1%22%0Aresource.labels.cluster_name%3D%22llm-experiment%22%0Aresource.labels.namespace_name%3D%22default%22%0Alabels.k8s-pod%2Fbatch_kubernetes_io%2Fcontroller-uid%3D%2266b327fe-6fdb-4a63-9c00-3346f27a5f9b%22%20severity%3E%3DWARNING;cursorTimestamp=2024-02-09T02:57:11.312515248Z;startTime=2024-02-09T02:33:01.547Z?e=-13802955&mods=logs_tg_prod&project=oss-fuzz

This is likely another cause of missing error messages in the code-fixing prompt.

We need to:

  1. Capture this error.
  2. Log the corresponding instance (i.e., project, benchmark, sample id).
  3. Re-submit the experiment.
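
A minimal sketch of the three tasks, with GcloudCrash and run_experiment as hypothetical names for the crash signature and the submission entry point:

import logging

class GcloudCrash(RuntimeError):
  # Hypothetical wrapper for the crash signature in the logs above.
  pass

def submit_with_resubmit(project, benchmark, sample_id, run_experiment,
                         max_resubmits=2):
  for _ in range(max_resubmits + 1):
    try:
      return run_experiment(project, benchmark, sample_id)  # task 3
    except GcloudCrash as e:
      # Tasks 1 and 2: capture the error and log the instance.
      logging.error('gcloud crashed (project=%s, benchmark=%s, sample=%s): %s',
                    project, benchmark, sample_id, e)
  logging.error('Giving up after %d re-submissions.', max_resubmits)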
