Comments (7)
@ShibataGenjiro Thank you for the detailed bug report. The longformer does not support `token_type_ids`, so you need to set the `--no_use_token_type_ids` option. I've pushed a change that automatically enables this option for the longformer. I also opened huggingface/transformers#9111 to make the huggingface/transformers documentation clearer about the fact that the longformer does not support `token_type_ids`. Let me know how your training run goes. If you end up training the longformer on CNN/DM, I'd appreciate it if you opened a pull request with a link to the model weights file so it can be added to the library.
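For reference, a full extractive training command with the flag set explicitly might look like the sketch below. The flag names other than `--no_use_token_type_ids` are assumptions about this repository's CLI; run `python main.py --help` to confirm the options your version supports:

```shell
# Hypothetical TransformerSum extractive run with a longformer encoder.
# --no_use_token_type_ids disables the segment-embedding input, which
# the longformer does not have; newer versions set this automatically.
python main.py \
    --mode extractive \
    --model_name_or_path allenai/longformer-base-4096 \
    --data_path ./cnn_dm_processed \
    --no_use_token_type_ids \
    --batch_size 1
```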
@HHousen Thank you very much. The longformer can be trained now. (But training is very slow, because I set the `batch_size` to 1 on a single 3090 GPU. If I set a larger `batch_size`, an OOM error occurs.) Anyway, you said that the longformer does not use `token_type_ids` (`segment_ids` in BERTSUM, I think). Does this mean that the longformer only uses token embeddings and position embeddings as input? (While BERTSUM uses token embeddings, segment embeddings, and position embeddings.)
@ShibataGenjiro Correct, the longformer only uses token embeddings and position embeddings, while BERT uses token embeddings, segment embeddings, and position embeddings. This is because the longformer is based on RoBERTa, which is an improved version of BERT. Regarding the OOM issue, you can try setting `--gradient_checkpointing` to reduce memory consumption at the expense of a slower backward pass.
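The difference in inputs can be sketched numerically. Below is a minimal toy example (random matrices and an illustrative hidden size, not real model weights) showing that a BERT-style input representation sums three embedding lookups per token, while a RoBERTa/longformer-style one sums only two:

```python
import numpy as np

# Toy sketch (random matrices, not real model weights): BERT-style
# inputs sum THREE embedding tables per token, while RoBERTa/
# longformer-style inputs sum only TWO (no segment/token-type term).
hidden = 8                                      # illustrative hidden size
vocab, max_pos, n_types = 100, 16, 2
rng = np.random.default_rng(0)
tok_emb = rng.normal(size=(vocab, hidden))      # token embeddings
pos_emb = rng.normal(size=(max_pos, hidden))    # position embeddings
type_emb = rng.normal(size=(n_types, hidden))   # segment (token-type) embeddings

token_ids = np.array([5, 7, 9])                 # three example tokens
positions = np.arange(len(token_ids))
token_type_ids = np.zeros(len(token_ids), dtype=int)  # all in segment 0

# BERT/BERTSUM: token + position + segment embeddings
bert_input = tok_emb[token_ids] + pos_emb[positions] + type_emb[token_type_ids]
# RoBERTa/longformer: token + position embeddings only
longformer_input = tok_emb[token_ids] + pos_emb[positions]
```

The segment term is the only difference between the two representations, which is why the longformer has nothing useful to do with a `token_type_ids` argument.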
@HHousen OK, I will try. Thank you for your patience in explaining!^^
> @HHousen OK, I will try. Thank you for your patience in explaining!^^
No problem 😄.
Hi @HHousen, I am having the exact same set of errors when doing abstractive summarization. Is abstractive summarization with the CNN/DM dataset not supported with the longformer? I checked the changes that you made in 0729e1f08135a81f2a12062a248eb9ab557a0f6f, but they do not seem to carry over to abstractive summarization. Also, the option `--no_use_token_type_ids` does not seem to be a valid option for abstractive mode.
@thechargedneutron A seq2seq (text-to-text) model is needed for abstractive summarization (like T5, BART, etc.). The longformer is just an encoder; it does not have a decoder. However, the LED (Longformer Encoder-Decoder) exists for this exact purpose. Here is the huggingface/transformers documentation.
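As a hedged sketch, an abstractive run backed by the LED might look like the following. The flag names and checkpoint are assumptions (the `allenai/led-base-16384` checkpoint exists on the huggingface hub, but verify the CLI options against `python main.py --help`):

```shell
# Hypothetical abstractive run: the LED pairs a longformer-style
# encoder with a decoder, so it can generate summaries, unlike the
# encoder-only longformer.
python main.py \
    --mode abstractive \
    --model_name_or_path allenai/led-base-16384 \
    --batch_size 1
```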