Issues from nougatca/spt-code

The class BartForClassificationAndGeneration's forward function in bart.py

The code snippets below are simplified to show only the control flow around set_model_mode(): the first task, TASK_CODE_AST_PREDICTION, sets the mode to MODEL_MODE_CLS, and the other two tasks set it to MODEL_MODE_GEN.

When I ran the program, the two tasks that use MODEL_MODE_GEN worked well. However, MODEL_MODE_CLS, which is set by TASK_CODE_AST_PREDICTION, caused a runtime error raised from the forward() function of the class BartForClassificationAndGeneration. In that function, the IF condition (CODE LINK) handles only MODEL_MODE_GEN, for which it calls forward_gen() without any error; any other mode falls through to the else branch and raises ValueError.

pre_train.py

def pre_train():
    for task in tasks:
        if task == enums.TASK_CODE_AST_PREDICTION:
            model.set_model_mode(enums.MODEL_MODE_CLS)
        elif task == enums.TASK_MASS:
            model.set_model_mode(enums.MODEL_MODE_GEN)
        elif task == enums.TASK_METHOD_NAME_PREDICTION:
            model.set_model_mode(enums.MODEL_MODE_GEN)
    return model, (code_vocab, ast_vocab, nl_vocab)

bart.py

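class BartForClassificationAndGeneration(BartForConditionalGeneration):
    def forward(self, input_ids=None, **kwargs):
        # Simplified: only the generation mode is handled in this branch.
        if self.mode == enums.MODEL_MODE_GEN:
            return self.forward_gen(input_ids=input_ids, **kwargs)
        else:
            # MODEL_MODE_CLS ends up here (bart.py, line 85 in the traceback).
            raise ValueError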
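
The control flow described above can be reproduced in isolation. The following is a minimal sketch, not the repository's code: the class ModeDispatchSketch, the helper try_forward, and the string values of the two mode constants are hypothetical stand-ins, and only the branch structure follows the simplified snippet. It also attaches a message to the ValueError, which the traceback in this report does not have, so that the unsupported mode is visible when the error is raised.

```python
# Hypothetical stand-ins for enums.MODEL_MODE_GEN / enums.MODEL_MODE_CLS.
MODEL_MODE_GEN = "gen"
MODEL_MODE_CLS = "cls"


class ModeDispatchSketch:
    """Toy model that mirrors only the branch structure of forward()."""

    def __init__(self, mode=MODEL_MODE_GEN):
        self.mode = mode

    def set_model_mode(self, mode):
        self.mode = mode

    def forward_gen(self, input_ids):
        # Placeholder for the real generation forward pass.
        return {"mode": self.mode, "n_tokens": len(input_ids)}

    def forward(self, input_ids):
        if self.mode == MODEL_MODE_GEN:
            return self.forward_gen(input_ids)
        # Any other mode, including MODEL_MODE_CLS, ends up here.
        raise ValueError(f"unsupported model mode: {self.mode!r}")


def try_forward(model, input_ids):
    """Return the forward result, or the error text if forward() raises."""
    try:
        return model.forward(input_ids)
    except ValueError as err:
        return str(err)


if __name__ == "__main__":
    model = ModeDispatchSketch()
    for mode in (MODEL_MODE_GEN, MODEL_MODE_CLS):
        model.set_model_mode(mode)
        print(mode, "->", try_forward(model, [1, 2, 3]))
```

With a single GEN branch, switching to the CLS mode can only reach the raise statement, which matches the behavior reported for the cap task.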

Environment:

ubuntu 22.04.4 LTS, python 3.8.17, and conda 23.5.2

Command:

python main.py --do-pre-train --pre-train-tasks cap --batch-size 16 --eval-batch-size 32 --cuda-visible-devices 0 --fp16 --model-name pre_train

Error message:

Running configurations initialized successfully
----------------------------------------------------------------------------------------------------
Start pre-training task: cap
  0%|          | 0/1770 [00:00<?, ?it/s]Traceback (most recent call last):
  File "main.py", line 100, in <module>
    main(main_args)
  File "main.py", line 22, in main
    model, vocab = pre_train(args=args)
  File "/home/[email protected]/Documents/0-research-spt-code/spt-code/sources/pre_train.py", line 218, in pre_train
    cap_result = trainer.train()
  File "/my-conda-path/envs/spt-code/lib/python3.8/site-packages/transformers/trainer.py", line 1624, in train
    return inner_training_loop(
  File "/my-conda-path/envs/spt-code/lib/python3.8/site-packages/transformers/trainer.py", line 1961, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "/my-conda-path/envs/spt-code/lib/python3.8/site-packages/transformers/trainer.py", line 2902, in training_step
    loss = self.compute_loss(model, inputs)
  File "/my-conda-path/envs/spt-code/lib/python3.8/site-packages/transformers/trainer.py", line 2925, in compute_loss
    outputs = model(**inputs)
  File "/my-conda-path/envs/spt-code/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/my-conda-path/envs/spt-code/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/my-conda-path/envs/spt-code/lib/python3.8/site-packages/accelerate/utils/operations.py", line 822, in forward
    return model_forward(*args, **kwargs)
  File "/my-conda-path/envs/spt-code/lib/python3.8/site-packages/accelerate/utils/operations.py", line 810, in __call__
    return convert_to_fp32(self.model_forward(*args, **kwargs))
  File "/my-conda-path/envs/spt-code/lib/python3.8/site-packages/torch/amp/autocast_mode.py", line 16, in decorate_autocast
    return func(*args, **kwargs)
  File "/home/[email protected]/Documents/0-research-spt-code/spt-code/sources/models/bart.py", line 85, in forward
    raise ValueError
ValueError

As shown below, only one location in the source files sets the model mode to MODEL_MODE_CLS; every other call to set_model_mode() uses MODEL_MODE_GEN.

/my-program-path/sources $ grep -r set_model_mode . --include=*.py
./downstream_tasks/completion.py:    model.set_model_mode(enums.MODEL_MODE_GEN)
./downstream_tasks/translation.py:    model.set_model_mode(enums.MODEL_MODE_GEN)
./downstream_tasks/summarization.py:    model.set_model_mode(enums.MODEL_MODE_GEN)
./downstream_tasks/search.py:    model.set_model_mode(enums.MODEL_MODE_GEN)
./downstream_tasks/bug_fix.py:    model.set_model_mode(enums.MODEL_MODE_GEN)
./models/bart.py:            self.set_model_mode(mode)
./models/bart.py:    def set_model_mode(self, mode):
./pre_train.py:            model.set_model_mode(enums.MODEL_MODE_CLS)
./pre_train.py:            model.set_model_mode(enums.MODEL_MODE_GEN)
./pre_train.py:            model.set_model_mode(enums.MODEL_MODE_GEN)

I want to understand why TASK_CODE_AST_PREDICTION changes the mode to MODEL_MODE_CLS in pre_train.py (line 153). Is there something I missed? The other two pre-training tasks worked well.

Bug

Hello, I found that with the source code you provide, calling
nl = extract_nl_from_code(source=source, root=root, lang=lang, name=name)
to get the names of the methods called in a program returns results that are almost all wrong. How can I correct this and get the right names?
Some examples of the wrong output:
['wip', 'wip.compareAn', 'f (q.is', ' ', ' if', ' ', ' ', ' ', ' QueueDr']
['er.requireNonN', 'urn to', 't().toObserv', 'e()', 'ons.listSo', 'n)).flatMapIter', '>iden']
['er.requireNonN', 'urn to', 't().toObserv', 'e()', 'ons.listSo', 'n)).flatMapIter', '>iden']
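
The fragments above look like correct names cut at shifted positions. One possible cause, which is an assumption and is not confirmed anywhere in this report, is an offset mismatch: parsers such as tree-sitter report node positions as byte offsets, and using those offsets to index a Python str gives shifted slices as soon as the source contains non-ASCII characters. The sketch below only illustrates that effect; the helper slice_by_bytes and the sample source string are hypothetical and are not part of spt-code.

```python
def slice_by_bytes(source: str, start_byte: int, end_byte: int) -> str:
    """Slice `source` using byte offsets into its UTF-8 encoding."""
    return source.encode("utf-8")[start_byte:end_byte].decode("utf-8")


# Hypothetical source line with non-ASCII characters before the call.
source = 'String s = "日本語"; wip.compareAndSet(0, 1);'
name = "wip.compareAndSet"

# Byte offsets of the call name, as a byte-oriented parser would report them.
data = source.encode("utf-8")
start_byte = data.index(name.encode("utf-8"))
end_byte = start_byte + len(name.encode("utf-8"))

# Byte offsets misused as character indices: the slice is shifted.
wrong = source[start_byte:end_byte]
# Byte offsets applied to the encoded bytes: the slice is the call name.
right = slice_by_bytes(source, start_byte, end_byte)

if __name__ == "__main__":
    print("wrong:", repr(wrong))
    print("right:", repr(right))
```

If the extracted names become correct after slicing the encoded bytes instead of the string, the offsets are the likely culprit; if not, the cause lies elsewhere, for example in a source string that was modified after parsing.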

Questions about the baselines in the paper

Hi,

This is great and interesting work!
I am curious whether you have had a chance to compare with the model proposed in CodeT5. They evaluated on datasets similar to yours.

Thanks.

How to get the original functions for the tokenized function code in 'dataset_saved/classcial_summarization'

Hi, thanks for sharing your great work!

I have downloaded your tokenized data in 'dataset_saved/classcial_summarization'. How can I map the tokenized code back to the original functions from https://github.com/EdinburghNLP/code-docstring-corpus?

It would be very helpful if you could provide some guidelines for recovering the original functions.

Thanks a lot!
