Comments (4)
Hi @colanim ,
Only the encoder is pre-trained in a bidirectional manner, while the decoder is left-to-right, which is controlled by the attention mask matrix. So the fine-tuning process is the same as inference in terms of decoding.
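For readers following along, here is a minimal sketch of what such an attention mask could look like. It is an illustration only, not the code from the repo: the function name `seq2seq_attention_mask` and the 0/1 list-of-lists layout are assumptions made for this example.

```python
def seq2seq_attention_mask(src_len, tgt_len):
    """Build a square 0/1 mask of size (src_len + tgt_len).

    mask[i][j] == 1 means position i is allowed to attend to position j.
    Hypothetical helper for illustration; not taken from the unilm repo.
    """
    total = src_len + tgt_len
    mask = [[0] * total for _ in range(total)]
    for i in range(total):
        for j in range(total):
            if j < src_len:
                # Every position (source or target) can see the whole source,
                # so the source side is encoded bidirectionally.
                mask[i][j] = 1
            elif i >= src_len and j <= i:
                # Target positions see only themselves and earlier target
                # tokens, which gives left-to-right decoding.
                mask[i][j] = 1
    return mask


# Example: 3 source (article) tokens followed by 2 target (summary) tokens.
for row in seq2seq_attention_mask(src_len=3, tgt_len=2):
    print(row)
```

Source rows never attend to target columns, so the encoder side cannot peek at the summary, while each target row only opens up one more target column than the row before it.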
-li
from unilm.
Thanks for the quick response, @donglixp!
So if I understood correctly, for abstractive summarization there are 3 tasks:
- left-to-right LM on the summary part for the decoder
- bidirectional LM on the article part for the encoder
- extractive task based on the first token
Is that right?
Because the source side is already given, during fine-tuning we only compute the generation loss for the decoder, which is similar to previous seq2seq models. In the paper, we added an extractive loss on the encoder side, but we didn't use it in the repo's example. The released checkpoint can achieve better results even without the extractive loss.
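As a rough illustration of "generation loss for the decoder only", the sketch below averages the negative log-likelihood over summary positions and ignores article positions. It is a simplification under stated assumptions: `generation_loss`, `token_log_probs`, and `is_target` are hypothetical names, and the repo's actual loss computation may differ in its details.

```python
import math


def generation_loss(token_log_probs, is_target):
    """Average negative log-likelihood over target (summary) positions only.

    token_log_probs: log-probability the model assigned to each gold token.
    is_target: True for summary tokens, False for article (source) tokens.
    Hypothetical helper for illustration; not taken from the unilm repo.
    """
    picked = [-lp for lp, t in zip(token_log_probs, is_target) if t]
    if not picked:
        # No target tokens in this example, so there is nothing to penalize.
        return 0.0
    return sum(picked) / len(picked)


# Two article tokens (ignored) followed by two summary tokens (scored).
log_probs = [math.log(0.9), math.log(0.8), math.log(0.5), math.log(0.25)]
is_target = [False, False, True, True]
loss = generation_loss(log_probs, is_target)
print(loss)
```

Changing the log-probabilities of the two article tokens leaves `loss` unchanged, which is the point: the source side contributes no loss term.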
OK, so in the actual code there is only one loss: the generation loss for the decoder (i.e., left-to-right LM on the summary).
Thank you very much for your answers!