psunlpgroup / summ-n
Code for ACL 2022 Paper "SUMM^N: A Multi-Stage Summarization Framework for Long Input Dialogues and Documents"
License: MIT License
Thanks for sharing the code. It was very helpful.
I tried replicating the results for the SummScreen FD dataset without making any changes to the code or the config.
I got the following results:
F_measure: [23.54, 4.42, 20.88] Recall: [20.59, 3.84, 18.22] Precision: [32.85, 6.24, 29.31]
The results in the paper match my ROUGE precision scores much more closely than my ROUGE-F1 scores. Can you verify this, or suggest a solution if I am doing something wrong?
Thanks!
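For readers comparing the three rows of numbers above, here is a minimal sketch of how precision, recall and F-measure relate for a single summary pair. It is an illustration only: it uses plain whitespace tokens with no stemming, the function name `rouge_n` is made up for this example, and it is not the scorer used by the repository or the paper.

```python
from collections import Counter


def rouge_n(reference: str, candidate: str, n: int = 1):
    """Return (precision, recall, f1) for n-gram overlap on whitespace tokens."""

    def ngrams(text):
        tokens = text.lower().split()
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    ref, cand = ngrams(reference), ngrams(candidate)
    # Clipped overlap: each n-gram counts at most as often as it occurs in both.
    overlap = sum((ref & cand).values())
    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    f1 = 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# A candidate shorter than the reference: every token matches (high precision),
# but half of the reference is missing (lower recall).
p, r, f = rouge_n("the cat sat on the mat", "the cat sat")
```

Two things follow from the definition. A summary shorter than its reference tends to score higher on precision than on recall, which is the pattern in the figures reported above. And when scores are averaged over a test set, the mean F-measure generally differs from the F-measure computed from the mean precision and mean recall, so the three reported rows need not satisfy the formula exactly.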
I encounter the following problem at run.py line 77:
/workspace/shared/apps/anaconda3/envs/summarization/bin/python: Error while finding module specification for 'examples.roberta.multiprocessing_bpe_encoder' (ModuleNotFoundError: No module named 'examples')
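The message above is what `python -m` prints when it cannot resolve a dotted module name. A quick way to check this from the same interpreter and working directory is sketched below. The helper name `module_available` is hypothetical, and the assumption that `examples` should resolve to the `examples` directory of a fairseq source checkout is inferred from the module name, not confirmed by the report.

```python
import importlib.util


def module_available(dotted_name: str) -> bool:
    """Return True if the interpreter can locate the module, False otherwise."""
    try:
        return importlib.util.find_spec(dotted_name) is not None
    except ImportError:
        # Raised (as ModuleNotFoundError) when a parent package such as
        # 'examples' cannot be imported at all.
        return False


found = module_available("examples.roberta.multiprocessing_bpe_encoder")
print("multiprocessing_bpe_encoder importable:", found)
```

If this prints `False`, the directory that contains the `examples` package is not on `sys.path` for that interpreter.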
In the paper you present a human evaluation comparing your model against HMNet. Did you run HMNet yourself to get the outputs or did you find them somewhere?
Could you share both your system outputs as well as HMNet's on AMI and ICSI?
The code is broken in a number of places, making it impossible to reproduce results:
There is no way to set args.mode in run.py other than editing it into the script. Passing --mode train to run.py does not work, because that argument will be stored in args.train.mode, not args.mode.
args.checkpoint_dir for inference.
In gen_summary/inference.py:38, you assign to self.bart.cfg.dataset.batch_size_valid = bsz. I'm not sure what this is supposed to do, but the bart object comes from fairseq and has no cfg member.
run.py does not actually use the output from stage 1 and will just create the same input/output as stage 1 again.
For others who are also trying to reproduce:
PYTHONPATH. You must move it out of the main fairseq dir, or the fairseq logging module will override the Python builtin.
pip install it, you will need to edit the scripts to change imports.
Hi author, I have a problem with the Summ-N framework in the very first stage of training. I configured the environment as required. When I go to the Summ-N-main directory and run the script for the first dataset, AMI, with bash scripts/run_AMI.sh, run.py always enters test mode and returns an error. Shouldn't the default mode be training mode? If I assign "train" to the args.mode variable in run.py, it does enter training mode, but then I get the following error: OSError: Model file not found: ./output/AMI/stage_1/trainer_output/checkpoints/checkpoint_best.pt. This means the model weights file is missing, but I'm not sure why the .pt file is missing. When I open the checkpoints folder, it contains no files. I'd be grateful if the author could answer this question, thanks!
In line 111 of run.py:
os.system(f"bash models/train_summarizor.sh {data_folder} {trainer_output_folder} {args.train.cuda_devices}")
The above command is run in stage 2. However, the BART_PATH argument in train_summarizor.sh is still the default ./bart.large.cnn/model.pt. Shouldn't it be the path to the checkpoint_best.pt from stage 1's output?
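To make the alternative in this question concrete, the sketch below picks the starting checkpoint by stage. It only illustrates what the question proposes. Whether later stages are meant to start from the previous stage's weights or from the pretrained BART model is for the authors to confirm. The function `initial_checkpoint` is hypothetical, and the `stage_N/trainer_output/checkpoints/checkpoint_best.pt` layout is taken from paths quoted elsewhere on this page.

```python
import os


def initial_checkpoint(stage: int, output_root: str,
                       pretrained: str = "./bart.large.cnn/model.pt") -> str:
    """Return the checkpoint a given stage would start from under this proposal."""
    if stage <= 1:
        # The first stage has no predecessor, so it starts from pretrained BART.
        return pretrained
    # Later stages would start from the best checkpoint of the previous stage.
    return os.path.join(output_root, f"stage_{stage - 1}", "trainer_output",
                        "checkpoints", "checkpoint_best.pt")


stage_2_start = initial_checkpoint(2, "./output/AMI")
```

The returned path would then have to reach train_summarizor.sh in place of the default BART_PATH, for example as an extra positional argument, which the script would need to be edited to accept.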
Hi,
I can't reproduce this repository because the segmentor_core.py file imports a module I cannot find, ThirdParty.rouge.rouge.rouge_score:
import nltk
from rouge import Rouge
from ThirdParty.rouge.rouge.rouge_score import *
from utils.tools import download_nltk
I checked the AnyROUGE repository, but there is no such module.
Can you look into this issue?
Thank you so much in advance!
The segmentor_core.py file imports "ThirdParty.rouge.rouge.rouge_score", but that module is not found in ThirdParty.
Hi, I am glad you released some of the prediction results. They really help me follow your work.
However, I have one question: what is the order of the prediction results? Can you tell me which test file corresponds to each output?
While reproducing the repository, I could not configure ThirdParty/ROUGE, which leaves the code broken. Could you assist me with this?
Hello, I encountered the following issue while trying to reproduce your code:
[nltk_data] Downloading package punkt to /home/isaac/nltk_data...
[nltk_data] Package punkt is already up-to-date!
[nltk_data] Downloading package punkt to /home/isaac/nltk_data...
[nltk_data] Package punkt is already up-to-date!
Finish loading stage 0 dataset!
Train size: 97
Val size: 20
Test size: 20
Start target matching of Stage 1. This may take several minutes.
637it [00:00, 1936066.41it/s]
122it [00:00, 1336044.62it/s]
139it [00:00, 1362168.82it/s]
Finish loading stage 1 dataset!
Train size: 637
Val size: 122
Test size: 139
2023-06-06 03:29:21 | INFO | fairseq.file_utils | Archive name 'stage_1/trainer_output' was not found in archive name list. We assumed 'stage_1/trainer_output' was a path or URL but couldn't find any file associated to this path or URL.
Traceback (most recent call last):
  File "run.py", line 91, in <module>
    summary_generator = SummaryGenerator(args, split_source, fine_grained=False, test_mode=True)
  File "/home/isaac/Summ-N/models/gen_summary/inference.py", line 18, in __init__
    self.stage_cfg.trainer_output_folder,
  File "/home/isaac/.conda/envs/summ-n/lib/python3.7/site-packages/fairseq/models/bart/model.py", line 122, in from_pretrained
    **kwargs,
  File "/home/isaac/.conda/envs/summ-n/lib/python3.7/site-packages/fairseq/hub_utils.py", line 55, in from_pretrained
    kwargs["data"] = os.path.abspath(os.path.join(model_path, data_name_or_path))
  File "/home/isaac/.conda/envs/summ-n/lib/python3.7/posixpath.py", line 80, in join
    a = os.fspath(a)
TypeError: expected str, bytes or os.PathLike object, not NoneType
scripts/run_AMI.sh: 19: --cuda-devices: not found
The error seems to occur in the call to BARTModel.from_pretrained in inference.py, which raises TypeError: expected str, bytes or os.PathLike object, not NoneType:
self.bart = BARTModel.from_pretrained(
    self.stage_cfg.trainer_output_folder,
    checkpoint_file='checkpoints/checkpoint_best.pt',
    data_name_or_path="./bin",
)
Question 1: Do you have any idea what might have caused this? How can I resolve it? Should I modify the data_name_or_path variable?
Question 2: What does the error scripts/run_AMI.sh: 19: --cuda-devices: not found mean?
Question 3: Why does this message appear? “2023-06-06 03:29:21 | INFO | fairseq.file_utils | Archive name 'stage_1/trainer_output' was not found in archive name list. We assumed 'stage_1/trainer_output' was a path or URL but couldn't find any file associated to this path or URL.”
Thank you very much!
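One observation on the traceback above: the failing call is os.path.join(model_path, data_name_or_path) inside fairseq's hub_utils, and the TypeError shows that its first argument was None. That comes right after fairseq logged that it could not find any file for 'stage_1/trainer_output'. A plausible reading, though not confirmed here, is that the model folder or its checkpoint does not exist relative to the working directory. The sketch below checks both before the model is loaded. The helper name `check_bart_inputs` is hypothetical and is not part of the repository or of fairseq.

```python
import os


def check_bart_inputs(trainer_output_folder,
                      checkpoint_file="checkpoints/checkpoint_best.pt"):
    """Return a list of readable problems; an empty list means both paths exist."""
    problems = []
    if trainer_output_folder is None:
        problems.append("trainer_output_folder is None")
    elif not os.path.isdir(trainer_output_folder):
        problems.append("model folder not found: "
                        + os.path.abspath(trainer_output_folder))
    else:
        checkpoint = os.path.join(trainer_output_folder, checkpoint_file)
        if not os.path.isfile(checkpoint):
            problems.append("checkpoint not found: " + os.path.abspath(checkpoint))
    return problems


# The last frame of the traceback is what os.path.join raises for a None argument.
try:
    os.path.join(None, "./bin")
except TypeError as exc:
    join_error = str(exc)

for problem in check_bart_inputs("stage_1/trainer_output"):
    print(problem)
```

Running the check from the directory where run.py is launched prints the absolute path that was actually looked up, which makes a wrong relative path or a missing checkpoint_best.pt easy to spot.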
Hello, I saw that the required fairseq version is listed as 1.10.0, but I could not find 1.10.0 among the released versions of fairseq. Should it be version 0.10.0? Thanks!