
dtt-multi-branch's People

Contributors

kaijuml, marco-roberti

dtt-multi-branch's Issues

Issues when running dependency parsing script

After successfully running the script for POS-tagging, I am facing issues with the dependency parsing part. When running the respective command python3 dep_parsing_spacy.py --orig wikibio/train_output.txt --dest wikibio/train_deprel.txt --format sent, I get the following error:

Loading sentences...
[OK] (526575 sentences in 14.364 seconds)
Loading SpaCy parser...
[OK] (1.922 seconds)
Using 32 processes, starting now.
Parsing sentences:   0%|                                                                                                          | 0/526575 [00:00<?, ?it/s]
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/multiprocessing/pool.py", line 121, in worker
    result = (True, func(*args, **kwds))
  File "/opt/conda/lib/python3.7/multiprocessing/pool.py", line 44, in mapstar
    return list(map(*args))
  File "dep_parsing_spacy.py", line 94, in _deal_with_one_instance
    ret = nlp.parser(Doc(nlp.vocab, sentence))
AttributeError: 'English' object has no attribute 'parser'
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "dep_parsing_spacy.py", line 111, in <module>
    total=len(sentences)
  File "dep_parsing_spacy.py", line 104, in <listcomp>
    processed_sentences = [item for item in tqdm.tqdm(
  File "/opt/conda/lib/python3.7/site-packages/tqdm/std.py", line 1178, in __iter__
    for obj in iterable:
  File "/opt/conda/lib/python3.7/multiprocessing/pool.py", line 325, in <genexpr>
    return (item for chunk in result for item in chunk)
  File "/opt/conda/lib/python3.7/multiprocessing/pool.py", line 748, in next
    raise value
AttributeError: 'English' object has no attribute 'parser'

I am using the default spaCy version 3.0.6, but maybe some other version is needed.
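
For reference, here is a minimal sketch of how the failing call could be adapted, assuming the error is caused by spaCy 3.x having removed the nlp.parser shortcut that existed in 2.x; the model name and sample sentence below are placeholders, not taken from the repo:

import spacy
from spacy.tokens import Doc

nlp = spacy.load("en_core_web_sm")  # any English pipeline that includes a parser

# In spaCy 3.x, nlp() accepts a pre-built Doc, so a pre-tokenized sentence can be
# wrapped in a Doc and run through the whole pipeline, parser included.
words = ["walter", "extra", "is", "a", "german", "pilot", "."]
doc = nlp(Doc(nlp.vocab, words=words))
print([(tok.text, tok.dep_, tok.head.i) for tok in doc])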

Unable to successfully run POS-tagging cmd

I am trying to run the POS-tagging command python3 pos_tagging.py --do_train --do_tagging train --gpus 0 1 --dataset_folder wikibio listed in the README of the data folder, but it doesn't complete successfully. I get the following error:

Using the following devices: [1,2]
Using the following environment variables, please edit the script if needed
CUDA_VISIBLE_DEVICES=1,2
Using the following arguments, please edit the script if needed
--data_dir ./pos --model_type bert --labels ./pos/labels.txt --model_name_or_path bert-base-uncased --output_dir ./pos/trained --max_seq_length 256 --num_train_epochs 3 --per_gpu_train_batch_size 32 --save_steps 750
Traceback (most recent call last):
  File "run_ner.py", line 70, in <module>
    (),
  File "run_ner.py", line 68, in <genexpr>
    for conf in (BertConfig, RobertaConfig, DistilBertConfig, CamembertConfig, XLMRobertaConfig)
AttributeError: type object 'BertConfig' has no attribute 'pretrained_config_archive_map'
Loading examples from train
Traceback (most recent call last):
  File "run_ner.py", line 70, in <module>
    (),
  File "run_ner.py", line 68, in <genexpr>
    for conf in (BertConfig, RobertaConfig, DistilBertConfig, CamembertConfig, XLMRobertaConfig)
AttributeError: type object 'BertConfig' has no attribute 'pretrained_config_archive_map'
Traceback (most recent call last):
  File "pos_tagging.py", line 225, in <module>
    do_tagging(args.pos_folder, args.dataset_folder, args.do_tagging, gpus, args.max_seq_length, args.split_size)
  File "pos_tagging.py", line 163, in do_tagging
    run_script(examples, pos_folder, dest, gpus, max_seq_length)
  File "pos_tagging.py", line 139, in run_script
    open(orig, mode="r", encoding='utf8') as origfile:
FileNotFoundError: [Errno 2] No such file or directory: './pos/trained/test_predictions.txt'

The first error happens when running the run_ner.py script, and I think it is related to the version of the transformers package being used. As I have tested, when the version is >= 3.x.x, the error AttributeError: type object 'BertConfig' has no attribute 'pretrained_config_archive_map' is thrown. When a lower version of the package is used, say 2.0.0, the error ImportError: cannot import name 'CamembertConfig' from 'transformers' (/opt/conda/lib/python3.7/site-packages/transformers/__init__.py) is thrown instead. Maybe a specific version of the package should be pinned with which neither error happens.
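
As a quick sanity check (just a sketch based on the two errors above, not a known-good pin), one can verify whether an installed transformers release satisfies both requirements of this run_ner.py:

import transformers
from transformers import BertConfig

print(transformers.__version__)
# run_ner.py relies on this attribute, which newer transformers releases dropped
print(hasattr(BertConfig, "pretrained_config_archive_map"))
try:
    # ...and it also imports CamembertConfig, which very old releases don't provide
    from transformers import CamembertConfig
    print("CamembertConfig importable")
except ImportError:
    print("CamembertConfig missing")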

The second and more important error says that the file ./pos/trained/test_predictions.txt is not found. Where should I find this file? Do I need to get it from somewhere else? Thanks in advance :)

About loading models in pos_tagging.py

I have a question about the implementation of Part-of-Speech tagging.
The following command will tag the POS.

python3 pos_tagging.py --do_train --do_tagging train --gpus 0 1 --dataset_folder wikibio

--do_train will load the pre-trained model bert-base-uncased, perform fine-tuning, and save the model in ./pos/trained.
But why does --do_tagging load --model_name_or_path bert-base-uncased in run_script instead of loading the stored model in ./pos/trained?

cmd = " ".join([
    f'CUDA_VISIBLE_DEVICES={gpus}',
    'python run_ner.py',
    f'--data_dir {pos_folder}/',
    '--model_type bert',
    f'--labels {os.path.join(pos_folder, "labels.txt")}',
    '--model_name_or_path bert-base-uncased',
    f'--output_dir {os.path.join(pos_folder, "trained")}',
    f'--max_seq_length {max_seq_length}',
    '--do_predict',
    '--per_gpu_eval_batch_size 64'
])
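
For comparison, here is a hedged sketch of what I expected instead (not a confirmed fix), with the predict run pointed at the checkpoint saved by --do_train; the values assigned to pos_folder, gpus and max_seq_length are only assumed examples:

import os

pos_folder = "./pos"      # assumed, mirroring the logged --data_dir ./pos
gpus = "0,1"              # assumed example value
max_seq_length = 256      # assumed, mirroring the logged --max_seq_length 256

cmd = " ".join([
    f'CUDA_VISIBLE_DEVICES={gpus}',
    'python run_ner.py',
    f'--data_dir {pos_folder}/',
    '--model_type bert',
    f'--labels {os.path.join(pos_folder, "labels.txt")}',
    # load the fine-tuned model saved by --do_train instead of bert-base-uncased
    f'--model_name_or_path {os.path.join(pos_folder, "trained")}',
    f'--output_dir {os.path.join(pos_folder, "trained")}',
    f'--max_seq_length {max_seq_length}',
    '--do_predict',
    '--per_gpu_eval_batch_size 64'
])
print(cmd)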

Error when running format weight script | broken download link | assertion error during the preprocessing step

I am trying to run the format weights script and I get the following errors.

Initially, I can't download the file given by the wget https://datacloud.di.unito.it/index.php/s/KPr9HnbMyNWqRdj/download command because I get a 404 Not Found error.

Moreover, I thought of slightly changing the next command so that it uses the preprocessed files in data/wikibio instead of data/download. The changed command looks like this: python3 data/format_weights.py --orig data/wikibio --dest train_weights.txt --strategy thresholds --thresholds 0.4 --normalize --weight_regularization 1 --eos_weights 1 0, and when I execute it, I get the error below:

Writting formatted wieghts to: /workspace/src/repos/dtt-multi-branch/train_weights.txt
Reading orig file. Can take up to a minute.
WARNING: path is data/wikibio but format is txt (by default).
Traceback (most recent call last):
  File "data/format_weights.py", line 240, in <module>
    args.orig, func=lambda x,y: (x, float(y)))
  File "data/format_weights.py", line 239, in <listcomp>
    sent for sent in TaggedFileIterable.from_filename(
  File "/workspace/src/repos/dtt-multi-branch/data/utils.py", line 124, in __getitem__
    return next(self)
  File "/workspace/src/repos/dtt-multi-branch/data/utils.py", line 134, in __next__
    return next(self._iterable)
  File "/opt/conda/lib/python3.7/site-packages/more_itertools/more.py", line 2670, in __next__
    item = next(self._source)
  File "/workspace/src/repos/dtt-multi-branch/data/utils.py", line 173, in read_file
    with open(path, mode='r', encoding='utf8') as f:
IsADirectoryError: [Errno 21] Is a directory: 'data/wikibio'

I am not sure what the next course of action should be.
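
A small sketch of what the traceback boils down to (the path is just the one from my command; the directory listing is only illustrative):

import os

orig = "data/wikibio"
# TaggedFileIterable.from_filename eventually calls open() on --orig,
# so --orig has to be a single weights file rather than a directory.
if os.path.isdir(orig):
    print(f"{orig} is a directory; its contents are: {os.listdir(orig)}")
else:
    with open(orig, mode="r", encoding="utf8") as f:
        print(f.readline())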

'eos_weights' args missing

When you run
python3 data/format_weights.py --orig download --dest train_weights.txt --strategy thresholds --thresholds 0.4 --normalize --weight_regularization 1

the argument --eos_weights seems to be required. What value do you suggest?

Thank you
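
Not an official recommendation, but for reference the command quoted in another issue above passes two values for the flag:

python3 data/format_weights.py --orig download --dest train_weights.txt --strategy thresholds --thresholds 0.4 --normalize --weight_regularization 1 --eos_weights 1 0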

POS tagging part: file not found

After running python3 pos_tagging.py --do_train --do_tagging train --gpus 0 1 --dataset_folder wikibio,
I get the error No such file or directory: './pos/trained/test_predictions.txt'.

Typo in sentence scoring script

When running the sentence scoring script, there is a typo in one of the import statements here. Since we are already inside the data folder when running the command, the import should rather be from utils import FileIterable, TaggedFileIterable in order to avoid import errors.
