isu-paal / deepdfa

Replication package for "Dataflow Analysis-Inspired Deep Learning for Efficient Vulnerability Detection", ICSE 2024.

License: MIT License

Python 95.01% Shell 4.83% Dockerfile 0.16%

deepdfa's Issues

Tensor size mismatch while concatenating tokenizer embedding and flowgnn_embed

When I run bash scripts/msr_train_combined.sh 1 MSR, there is a size mismatch between the tokenizer embedding and the FlowGNN embedding at line 18 of linevul_model.py: x = torch.cat((x, flowgnn_embed), dim=1).

The error message is:

File "~/data/data/LineVul/linevul/linevul_model.py", line 18, in forward
    x = torch.cat((x, flowgnn_embed), dim=1)
RuntimeError: Sizes of tensors must match except in dimension 1. Expected size 8 but got size 16 for tensor number 1 in the list.

When I change the batch size from 16 to 8, the error becomes: Expected size 4 but got size 8.

The detailed error report is:

Traceback (most recent call last):
  File "linevul_main.py", line 668, in <module>
    main()
  File "linevul_main.py", line 641, in main
    train(args, train_dataset, model, tokenizer, eval_dataset, flowgnn_dataset)
  File "linevul_main.py", line 199, in train
    loss, logits = model(input_ids=inputs_ids, labels=labels, graphs=graphs)
  File "~/data/data/LineVul/venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "~/data/data/LineVul/venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "~/data/data/LineVul/venv/lib/python3.8/site-packages/torch/nn/parallel/data_parallel.py", line 185, in forward
    outputs = self.parallel_apply(replicas, inputs, module_kwargs)
  File "~/data/data/LineVul/venv/lib/python3.8/site-packages/torch/nn/parallel/data_parallel.py", line 200, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File "~/data/data/LineVul/venv/lib/python3.8/site-packages/torch/nn/parallel/parallel_apply.py", line 110, in parallel_apply
    output.reraise()
  File "~/data/data/LineVul/venv/lib/python3.8/site-packages/torch/_utils.py", line 694, in reraise
    raise exception
RuntimeError: Caught RuntimeError in replica 0 on device 0.
Original Traceback (most recent call last):
  File "~/data/data/LineVul/venv/lib/python3.8/site-packages/torch/nn/parallel/parallel_apply.py", line 85, in _worker
    output = module(*input, **kwargs)
  File "~/data/data/LineVul/venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "~/data/data/LineVul/venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "~/data/data/LineVul/linevul/linevul_model.py", line 62, in forward
    logits = self.classifier(outputs, flowgnn_embed)
  File "~/data/data/LineVul/venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "~/data/data/LineVul/venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "~/data/data/LineVul/linevul/linevul_model.py", line 18, in forward
    x = torch.cat((x, flowgnn_embed), dim=1)
RuntimeError: Sizes of tensors must match except in dimension 1. Expected size 8 but got size 16 for tensor number 1 in the list.
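
One possible reading of this traceback, offered as an assumption rather than a confirmed diagnosis: the call goes through torch.nn.DataParallel (data_parallel.py appears in the trace), which splits tensor arguments along the batch dimension across replicas. If the graph batch reaches each replica unsplit, two replicas would see 8 token rows against 16 graph embeddings for a batch of 16, and 4 against 8 for a batch of 8, which matches both reported messages. The sketch below models only that arithmetic in plain Python. The function names and the two-replica count are hypothetical and do not come from the repository.

```python
import math
from typing import List


def per_replica_rows(batch_size: int, n_replicas: int) -> List[int]:
    """Row counts after splitting a batch into roughly equal chunks along dim 0."""
    chunk = math.ceil(batch_size / n_replicas)
    sizes = []
    remaining = batch_size
    while remaining > 0:
        sizes.append(min(chunk, remaining))
        remaining -= chunk
    return sizes


def describe_mismatch(batch_size: int, n_replicas: int) -> str:
    """Compare the rows one replica receives with an unsplit graph batch."""
    token_rows = per_replica_rows(batch_size, n_replicas)[0]
    graph_rows = batch_size  # assumption: the graph batch is NOT split per replica
    if token_rows == graph_rows:
        return "shapes agree"
    return f"Expected size {token_rows} but got size {graph_rows}"


print(describe_mismatch(16, 2))
print(describe_mismatch(8, 2))
print(describe_mismatch(16, 1))
```

If this reading is right, the mismatch should disappear when only one GPU is visible to the process (for example by setting CUDA_VISIBLE_DEVICES to a single device), which would be a cheap way to test the assumption.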

'NoneType' object is not subscriptable

When I run python -u sastvd/scripts/dbize.py, a 'NoneType' error is raised at line 205 of DDFA/sastvd/helpers/evaluate.py: before_graph = feature_extraction(filepath_before)[0].

The error message is:

Traceback (most recent call last):
  File "sastvd/scripts/dbize.py", line 26, in <module>
    dep_add_lines = ivde.get_dep_add_lines_bigvul()
  File "/DDFA/sastvd/helpers/evaluate.py", line 282, in get_dep_add_lines_bigvul
    return get_dep_add_lines("bigvul", cache, sample)
  File "/DDFA/sastvd/helpers/evaluate.py", line 205, in get_dep_add_lines
    before_graph = feature_extraction(filepath_before)[0]
TypeError: 'NoneType' object is not subscriptable

After reviewing the function calls in evaluate.py, I noticed that get_dep_add_lines_bigvul() calls get_dep_add_lines("bigvul", cache, sample), but these arguments do not match the parameters of get_dep_add_lines(filepath_before, filepath_after, added_lines) defined in the same file. The number of parameters is the same, but the mismatch appears to cause the function to return None. Is this a parameter-passing issue, and if so, what is the correct way to pass the parameters?
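
Note that the traceback raises the TypeError on feature_extraction(filepath_before)[0], so the value that is None is the return value of feature_extraction, not that of get_dep_add_lines. A minimal sketch of guarding that call follows. The feature_extraction stub here is hypothetical: it only mimics a function that returns None when its input file is missing, and is not the repository's implementation.

```python
from pathlib import Path
from typing import Optional, Tuple


def feature_extraction(filepath: str) -> Optional[Tuple[list, list]]:
    """Hypothetical stand-in: returns (nodes, edges), or None if the path is missing."""
    if not Path(filepath).exists():
        return None
    return ([], [])


def first_graph_or_none(filepath: str) -> Optional[list]:
    """Return the first element of the extraction result, or None with a message."""
    result = feature_extraction(filepath)
    if result is None:
        # Indexing None would raise "'NoneType' object is not subscriptable".
        print(f"skipping {filepath}: feature_extraction returned None")
        return None
    return result[0]


first_graph_or_none("/nonexistent/path/before.c")
```

Logging the skipped path this way can help reveal which input files have no extracted graph, which may point to a missing or failed preprocessing step rather than to a problem with the arguments.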

Questions About Dataset Preparation Scripts

Hi! DeepDFA is a very interesting model, and I have managed to reproduce its performance on the pre-processed graph data. It seems that models based solely on dataflow features have strong potential for detecting vulnerabilities compared to text- or semantics-based transformers, which are largely unaware of code dependencies.

However, I encountered some issues when I tried to run the script that preprocesses the BigVul dataset (DDFA/sastvd/scripts/prepare.py).

DeepDFA/DDFA/sastvd/scripts/prepare.py

Lines 7 to 12 in 070ddb9

    
           def bigvul(): 
        
               """Run preperation scripts for BigVul dataset.""" 
        
               print(svdd.bigvul(sample=args.sample)) 
        
               ivde.get_dep_add_lines("bigvul", sample=args.sample) 
        
               svdglove.generate_glove("bigvul", sample=args.sample) 
        
               svdd2v.generate_d2v("bigvul", sample=args.sample)

On line 10, the call fails because get_dep_add_lines() got an unexpected keyword argument "sample". I also could not find any module named svdglove or svdd2v in the project.

Looking forward to your reply. Thanks!
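
The "unexpected keyword argument" error can be checked before calling by inspecting the function's signature. The sketch below uses a hypothetical stub with the parameter names quoted in the earlier issue (filepath_before, filepath_after, added_lines). Whether the real function in evaluate.py has exactly this signature is an assumption.

```python
import inspect


def get_dep_add_lines_stub(filepath_before, filepath_after, added_lines):
    """Hypothetical stub with the parameter names quoted in the issue above."""
    return None


def accepts_keyword(func, name: str) -> bool:
    """True if func can be called with `name` passed as a keyword argument."""
    params = inspect.signature(func).parameters
    for param in params.values():
        if param.kind is inspect.Parameter.VAR_KEYWORD:
            return True  # **kwargs accepts any keyword
    if name not in params:
        return False
    return params[name].kind in (
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
        inspect.Parameter.KEYWORD_ONLY,
    )


print(accepts_keyword(get_dep_add_lines_stub, "sample"))
print(accepts_keyword(get_dep_add_lines_stub, "added_lines"))
```

A check like this only confirms that the script and the helper disagree about the interface. It does not say which side is out of date.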

Which version of Joern did you use for the paper?
