deepdfa's Issues

Tensor size mismatch while concatenating tokenizer embedding and flowgnn_embed

When I run bash scripts/msr_train_combined.sh 1 MSR, the tokenizer embedding and the FlowGNN embedding have mismatched sizes at line 18 of linevul_model.py: x = torch.cat((x, flowgnn_embed), dim=1).

The error message is:

File "~/data/data/LineVul/linevul/linevul_model.py", line 18, in forward
    x = torch.cat((x, flowgnn_embed), dim=1)
RuntimeError: Sizes of tensors must match except in dimension 1. Expected size 8 but got size 16 for tensor number 1 in the list.

When I change the batch size from 16 to 8, the error becomes: Expected size 4 but got size 8.

The detailed error report is:

Traceback (most recent call last):
  File "linevul_main.py", line 668, in <module>
    main()
  File "linevul_main.py", line 641, in main
    train(args, train_dataset, model, tokenizer, eval_dataset, flowgnn_dataset)
  File "linevul_main.py", line 199, in train
    loss, logits = model(input_ids=inputs_ids, labels=labels, graphs=graphs)
  File "~/data/data/LineVul/venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "~/data/data/LineVul/venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "~/data/data/LineVul/venv/lib/python3.8/site-packages/torch/nn/parallel/data_parallel.py", line 185, in forward
    outputs = self.parallel_apply(replicas, inputs, module_kwargs)
  File "~/data/data/LineVul/venv/lib/python3.8/site-packages/torch/nn/parallel/data_parallel.py", line 200, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File "~/data/data/LineVul/venv/lib/python3.8/site-packages/torch/nn/parallel/parallel_apply.py", line 110, in parallel_apply
    output.reraise()
  File "~/data/data/LineVul/venv/lib/python3.8/site-packages/torch/_utils.py", line 694, in reraise
    raise exception
RuntimeError: Caught RuntimeError in replica 0 on device 0.
Original Traceback (most recent call last):
  File "~/data/data/LineVul/venv/lib/python3.8/site-packages/torch/nn/parallel/parallel_apply.py", line 85, in _worker
    output = module(*input, **kwargs)
  File "~/data/data/LineVul/venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "~/data/data/LineVul/venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "~/data/data/LineVul/linevul/linevul_model.py", line 62, in forward
    logits = self.classifier(outputs, flowgnn_embed)
  File "~/data/data/LineVul/venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "~/data/data/LineVul/venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "~/data/data/LineVul/linevul/linevul_model.py", line 18, in forward
    x = torch.cat((x, flowgnn_embed), dim=1)
RuntimeError: Sizes of tensors must match except in dimension 1. Expected size 8 but got size 16 for tensor number 1 in the list.
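
The traceback passes through torch/nn/parallel/data_parallel.py, and the expected size is half the batch size in both runs. One possible reading, which is an assumption and not confirmed anywhere in this thread, is that two GPUs are visible: DataParallel splits tensor inputs along the batch dimension but hands non-tensor arguments to each replica whole, so each replica could end up with half of the token rows and all of the graph rows. The pure-Python sketch below only reproduces that arithmetic; scatter_rows and concat_features are hypothetical helpers, not code from the repository.

```python
def scatter_rows(batch, num_replicas):
    """Split a batch (a list of rows) into contiguous chunks, one per replica."""
    chunk = -(-len(batch) // num_replicas)  # ceiling division
    return [batch[i:i + chunk] for i in range(0, len(batch), chunk)]


def concat_features(x, flowgnn_embed):
    """Mimic a dim=1 concatenation: both inputs need the same number of rows."""
    if len(x) != len(flowgnn_embed):
        raise RuntimeError(
            f"Expected size {len(x)} but got size {len(flowgnn_embed)}"
        )
    return [a + b for a, b in zip(x, flowgnn_embed)]


batch_size, num_replicas = 16, 2  # assumption: two visible GPUs
token_rows = [[0.0] * 4 for _ in range(batch_size)]  # stands in for x
graph_rows = [[1.0] * 2 for _ in range(batch_size)]  # stands in for flowgnn_embed

# Only the token rows are split across replicas in this scenario.
token_chunks = scatter_rows(token_rows, num_replicas)

try:
    concat_features(token_chunks[0], graph_rows)  # 8 rows against 16 rows
    mismatch = None
except RuntimeError as err:
    mismatch = str(err)

# Splitting the graph rows the same way makes the row counts agree.
graph_chunks = scatter_rows(graph_rows, num_replicas)
merged = concat_features(token_chunks[0], graph_chunks[0])
```

If this reading is right, running with a single visible GPU (for example by setting CUDA_VISIBLE_DEVICES to one device) should make the mismatch disappear, which would be a quick way to test it.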

'NoneType' object is not subscriptable

When I run python -u sastvd/scripts/dbize.py, a 'NoneType' error is raised at line 205 of DDFA/sastvd/helpers/evaluate.py: before_graph = feature_extraction(filepath_before)[0].

The error message is:

Traceback (most recent call last):
  File "sastvd/scripts/dbize.py", line 26, in <module>
    dep_add_lines = ivde.get_dep_add_lines_bigvul()
  File "/DDFA/sastvd/helpers/evaluate.py", line 282, in get_dep_add_lines_bigvul
    return get_dep_add_lines("bigvul", cache, sample)
  File "/DDFA/sastvd/helpers/evaluate.py", line 205, in get_dep_add_lines
    before_graph = feature_extraction(filepath_before)[0]
TypeError: 'NoneType' object is not subscriptable

After reviewing the calls in evaluate.py, I noticed that get_dep_add_lines_bigvul() calls get_dep_add_lines("bigvul", cache, sample), but these arguments do not match the parameters of get_dep_add_lines(filepath_before, filepath_after, added_lines), which is defined in the same file. The number of arguments is the same, but their meanings differ, and I suspect this mismatch is why feature_extraction(filepath_before) returns None. Is this a parameter-passing bug, and what is the correct way to pass the parameters?
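
To illustrate the failure mode described above: indexing the result of a function that returned None raises exactly this TypeError. The sketch below uses a hypothetical stand-in, feature_extraction_stub, which returns None for anything that is not a .c path; the real feature_extraction may behave differently, so treat this as an assumption about the mechanism and not as the project's code.

```python
from typing import List, Optional, Tuple


def feature_extraction_stub(filepath: str) -> Optional[Tuple[List[str], List[str]]]:
    """Hypothetical stand-in: returns None when the path cannot be processed."""
    if not filepath.endswith(".c"):
        return None
    return (["node"], ["edge"])


def first_graph_or_none(filepath: str) -> Optional[List[str]]:
    """Check for None before indexing instead of failing with a TypeError."""
    result = feature_extraction_stub(filepath)
    if result is None:
        return None
    return result[0]


# A dataset name bound to the file-path parameter yields None, and None[0] fails.
try:
    feature_extraction_stub("bigvul")[0]
    message = ""
except TypeError as err:
    message = str(err)

guarded = first_graph_or_none("bigvul")   # no exception, just None
graph = first_graph_or_none("example.c")  # first element of the returned tuple
```

A guard like this does not fix the underlying call; it only turns the crash into a value that the caller can log or skip.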

Questions About Dataset Preparation Scripts

Hi! DeepDFA is a very interesting model, and I have managed to reproduce its performance on the pre-processed graph data. It seems that models based solely on dataflow features have strong potential for detecting vulnerabilities compared to text- or semantics-based transformers, which are largely unaware of code dependencies.

However, I ran into some issues when I tried to run the script that preprocesses the BigVul dataset (DDFA/sastvd/scripts/prepare.py).

def bigvul():
    """Run preperation scripts for BigVul dataset."""
    print(svdd.bigvul(sample=args.sample))
    ivde.get_dep_add_lines("bigvul", sample=args.sample)
    svdglove.generate_glove("bigvul", sample=args.sample)
    svdd2v.generate_d2v("bigvul", sample=args.sample)

On line 10, get_dep_add_lines() fails because it got an unexpected keyword argument "sample". I also could not find any module named svdglove or svdd2v in the project.
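
One way to check an "unexpected keyword argument" error before calling is to inspect the function's signature. The sketch below is a minimal illustration: get_dep_add_lines_example is a hypothetical stand-in with a three-parameter signature (the one reported in another issue on this page), and accepts_keyword is a helper introduced here, not part of the project.

```python
import inspect


def get_dep_add_lines_example(filepath_before, filepath_after, added_lines):
    """Hypothetical stand-in; note that it has no 'sample' parameter."""
    return filepath_before, filepath_after, added_lines


def accepts_keyword(func, name):
    """Return True if func can be called with name=... as a keyword argument."""
    params = inspect.signature(func).parameters
    keyword_kinds = (
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
        inspect.Parameter.KEYWORD_ONLY,
    )
    if name in params and params[name].kind in keyword_kinds:
        return True
    # A **kwargs parameter accepts any keyword.
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())


has_sample = accepts_keyword(get_dep_add_lines_example, "sample")
has_added_lines = accepts_keyword(get_dep_add_lines_example, "added_lines")

# Calling with the unsupported keyword reproduces the kind of error reported.
try:
    get_dep_add_lines_example("bigvul", sample=True)
    error_text = ""
except TypeError as err:
    error_text = str(err)
```

Printing inspect.signature(ivde.get_dep_add_lines) in the actual checkout would show which parameters the installed version really expects.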

Looking forward to your reply. Thanks!
