yale-lily / ConvoSumm
License: Creative Commons Attribution Share Alike 4.0 International
Thanks for the awesome work! I have several questions about the email dataset in data_processed/email/vanilla.
Hi~
Thank you very much for your code.
But when I tested meeting summarization, I couldn't reproduce the results in your paper. Can you provide the hyperparameters? Thanks!
Hi~ thanks for your awesome work, but I ran into a problem with the results.
When I used your code, I found that my results were lower than yours. I don't know which setting was wrong.
The main hyper-parameters and results are as follows:
The other settings are the same as yours.
I am looking forward to your reply!
Hi, I am trying to replicate your project. As a first step, I am trying to download the model checkpoints using the AWS CLI. I am using AWS for the first time and I am getting the error "fatal error: An error occurred (AccessDenied) when calling the ListObjectsV2 operation: Access Denied". Can you help me solve this? I created a Free Tier account with AWS.
Are the model checkpoints also available to download via a Google Drive link?
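One thing worth trying while waiting for a reply: AccessDenied on ListObjectsV2 from a fresh account is often just the CLI signing the request with credentials the public bucket does not recognize. The sketch below builds an anonymous-download command using the AWS CLI's --no-sign-request flag; the bucket URI is a placeholder, not the repo's actual bucket.

```python
# Hedged sketch: public S3 buckets often reject *signed* requests from
# unrelated accounts; retrying anonymously with --no-sign-request can fix
# AccessDenied on ListObjectsV2. The bucket URI below is a placeholder.
import shlex

bucket = "s3://example-bucket/checkpoints/"  # placeholder, not the real bucket
cmd = ["aws", "s3", "sync", bucket, "./checkpoints", "--no-sign-request"]
print(shlex.join(cmd))  # the shell command to run
```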
I am trying to use scripts/prep.sh and scripts/inference.py to load the /reddit_vanilla_actual/checkpoint_best.pt BART checkpoint for inference. I have been having many issues, mostly related to package versions and the extended 2048 source positions.
Environment:
pytorch 1.7.1 py3.8_cuda10.2.89_cudnn7.6.5_0 pytorch
I tried installing fairseq from source to access the examples module, but then I saw you had your own copy of fairseq in this repo, so I installed your version according to the instructions here:
cd fairseq
pip install --editable ./
python setup.py build develop
I binarized val.source and val.target, and am running inference as follows:
python scripts/inference.py /home/aadelucia/ConvoSumm/checkpoints/reddit_vanilla_actual checkpoint_best.pt /home/aadelucia/ConvoSumm/alexandra_test/data_processed /home/aadelucia/ConvoSumm/alexandra_test/data/val.source /home/aadelucia/ConvoSumm/alexandra_test/inference_output.txt 4 1 80 120 1 2048 ./misc/encoder.json ./misc/vocab.bpe
And I get the following error:
Traceback (most recent call last):
File "scripts/inference.py", line 42, in <module>
hypotheses_batch = bart.sample(slines, beam=beam, lenpen=lenpen, min_len=min_len, no_repeat_ngram_size=3)
File "/home/aadelucia/ConvoSumm/code/fairseq/fairseq/hub_utils.py", line 132, in sample
batched_hypos = self.generate(tokenized_sentences, beam, verbose, **kwargs)
File "/home/aadelucia/ConvoSumm/code/fairseq/fairseq/models/bart/hub_interface.py", line 108, in generate
return super().generate(
File "/home/aadelucia/ConvoSumm/code/fairseq/fairseq/hub_utils.py", line 171, in generate
for batch in self._build_batches(tokenized_sentences, skip_invalid_size_inputs):
File "/home/aadelucia/ConvoSumm/code/fairseq/fairseq/hub_utils.py", line 258, in _build_batches
batch_iterator = self.task.get_batch_iterator(
File "/home/aadelucia/ConvoSumm/code/fairseq/fairseq/tasks/fairseq_task.py", line 244, in get_batch_iterator
batch_sampler = dataset.batch_by_size(
File "/home/aadelucia/ConvoSumm/code/fairseq/fairseq/data/fairseq_dataset.py", line 145, in batch_by_size
return data_utils.batch_by_size(
File "/home/aadelucia/ConvoSumm/code/fairseq/fairseq/data/data_utils.py", line 337, in batch_by_size
return batch_by_size_vec(
File "fairseq/data/data_utils_fast.pyx", line 20, in fairseq.data.data_utils_fast.batch_by_size_vec
File "fairseq/data/data_utils_fast.pyx", line 27, in fairseq.data.data_utils_fast.batch_by_size_vec
AssertionError: Sentences lengths should not exceed max_tokens=1024
Am I using the wrong version of a package? Is there something extra needed for this to work?
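For the assertion itself: fairseq's batcher refuses any tokenized sentence longer than max_tokens, which here is still at BART's default 1024 even though the inputs were prepared for 2048 source positions. Raising max_tokens in the generation setup is one route; another, sketched below under the assumption that truncation is acceptable, is to clip each tokenized source to the extended limit before calling bart.sample (truncate_ids is a hypothetical helper, not part of fairseq):

```python
# Hedged workaround sketch (not the authors' code): clip each tokenized
# source so it never exceeds the model's extended source positions, while
# preserving the final (EOS) token as fairseq expects.
MAX_POSITIONS = 2048  # the extended source length used in this repo

def truncate_ids(token_ids, max_positions=MAX_POSITIONS):
    """Keep at most max_positions ids, preserving the last token."""
    if len(token_ids) <= max_positions:
        return token_ids
    return token_ids[: max_positions - 1] + token_ids[-1:]

too_long = list(range(3000))
print(len(truncate_ids(too_long)))  # 2048
```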
Hi,
When using the run.sh script in inference mode and loading one of the pretrained models from the README file, I keep getting this error:
Traceback (most recent call last):
File "./scripts/summarization.py", line 432, in <module>
main(args)
File "./scripts/summarization.py", line 346, in main
model = Summarizer.load_from_checkpoint(args.from_pretrained, args)
File "/users/oncescu/miniconda3/envs/longformer/lib/python3.7/site-packages/pytorch_lightning/core/saving.py", line 169, in load_from_checkpoint
model = cls._load_model_state(checkpoint, *args, **kwargs)
File "/users/oncescu/miniconda3/envs/longformer/lib/python3.7/site-packages/pytorch_lightning/core/saving.py", line 208, in _load_model_state
model = cls(*cls_args, **cls_kwargs)
TypeError: __init__() takes 2 positional arguments but 3 were given
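This TypeError is consistent with a version mismatch: pytorch_lightning's load_from_checkpoint forwards extra positional arguments into the model's __init__, so calling it as load_from_checkpoint(args.from_pretrained, args) hands a one-argument __init__ two arguments. A minimal stand-in reproduction of that shape (this Summarizer is a toy class, not the repo's, and the cause is an assumption):

```python
# Stand-in reproduction (an assumption about the cause): an __init__ that
# takes one argument receives an extra positional one, matching the
# "takes 2 positional arguments but 3 were given" failure above.
class Summarizer:
    def __init__(self, params):
        self.params = params

try:
    Summarizer({"lr": 3e-5}, "extra")  # mimics load_from_checkpoint(path, args)
except TypeError as err:
    message = str(err)

model = Summarizer({"lr": 3e-5})  # the one-positional form initializes fine
print(message)
```

Dropping the extra positional argument, or pinning pytorch_lightning to the version the scripts were written against, is the usual way this gets resolved.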
Hi Alexander,
Thanks for publishing the dataset. After downloading the data, do you know how to split it into train/valid/test for a fair comparison with your experimental results?
Tyson
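Until the authors publish their exact partition, a deterministic seeded split at least keeps one's own runs comparable. This sketch assumes an 80/10/10 ratio and a seed of 42; it is not the paper's official split:

```python
# Hedged sketch (not the authors' official partition): a seeded,
# deterministic 80/10/10 split so results are reproducible across runs.
import random

def split_dataset(items, seed=42, ratios=(0.8, 0.1, 0.1)):
    rng = random.Random(seed)       # local RNG keeps the split deterministic
    shuffled = items[:]
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_train = int(n * ratios[0])
    n_valid = int(n * ratios[1])
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_valid],
            shuffled[n_train + n_valid:])

train, valid, test = split_dataset(list(range(100)))
print(len(train), len(valid), len(test))  # 80 10 10
```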
Hi,
Thanks for your nice work on conversation summarization. I am planning to use the Argument classifier module for my own work. I have two questions regarding that: