Thanks for open-sourcing the code ! After reading your paper, I have

❓ Question : Training - Evaluating discrepancy in Abstractive Summarization

microsoft commented on July 20, 2024

from unilm.

Comments (4)

donglixp commented on July 20, 2024

Hi @colanim,

Only the encoder is pre-trained in a bidirectional manner, while the decoder is left-to-right, which is controlled by the attention mask matrix. So the fine-tuning process is the same as inference in terms of decoding.

-li
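
To make the role of the attention mask concrete, here is a minimal sketch of a sequence-to-sequence mask of the kind described above: source (article) positions attend to each other in both directions, while target (summary) positions attend to the whole source plus only the target tokens up to and including themselves. The function name `seq2seq_attention_mask` and the 0/1 list-of-lists layout are assumptions made for illustration; this is not the repo's actual implementation.

```python
def seq2seq_attention_mask(src_len, tgt_len):
    """Build a square 0/1 mask over [source tokens | target tokens].

    mask[i][j] == 1 means position i is allowed to attend to position j.
    Hypothetical helper for illustration only.
    """
    total = src_len + tgt_len
    mask = [[0] * total for _ in range(total)]
    for i in range(total):
        for j in range(total):
            if j < src_len:
                # Every position, source or target, sees the full source.
                mask[i][j] = 1
            elif i >= src_len and j <= i:
                # Target positions see earlier target tokens and themselves.
                mask[i][j] = 1
    return mask


# Example: 3 article tokens followed by 2 summary tokens.
for row in seq2seq_attention_mask(src_len=3, tgt_len=2):
    print(row)
```

The first three rows contain zeros in the target columns, so the source never sees the summary. The last two rows form a lower-triangular pattern over the target columns, which is what makes the decoding side left-to-right in both fine-tuning and inference.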

astariul commented on July 20, 2024

Thanks for the quick response @donglixp!

So if I understood correctly, for abstractive summarization there are 3 tasks:

Left-to-right LM on the summary part, for the decoder
Bidirectional LM on the article part, for the encoder
Extractive task based on the first token

Is that right?

donglixp commented on July 20, 2024

Because the source side is given, during fine-tuning we only compute the generation loss for the decoder, which is similar to previous seq2seq models. In the paper, we added an extractive loss on the encoder side, but we didn't use it in the repo's example. The released checkpoint can achieve better results even without the extractive loss.
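
As a rough illustration of computing the generation loss for the decoder only, the sketch below averages the negative log-likelihood over summary positions and ignores article positions entirely. The function name `generation_loss`, its inputs, and the choice to average over summary tokens are assumptions for illustration; the thread does not say how the repo normalizes the loss.

```python
import math


def generation_loss(correct_token_log_probs, is_summary_token):
    """Mean negative log-likelihood over summary (target) positions.

    correct_token_log_probs[i]: log-probability the model gave to the
        reference token at position i (hypothetical input format).
    is_summary_token[i]: 1 if position i belongs to the summary, else 0.
    """
    total = 0.0
    count = 0
    for log_prob, is_target in zip(correct_token_log_probs, is_summary_token):
        if is_target:
            # Only summary tokens contribute; source tokens are skipped.
            total -= log_prob
            count += 1
    return total / count if count else 0.0


# Two article positions followed by two summary positions.
log_probs = [math.log(0.9), math.log(0.5), math.log(0.25), math.log(0.5)]
mask = [0, 0, 1, 1]
loss = generation_loss(log_probs, mask)
```

Here the first two log-probabilities have no effect on `loss`, because the source side is given rather than predicted.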

astariul commented on July 20, 2024

OK, so in the actual code there is only one loss: the generation loss for the decoder (that is, left-to-right LM on the summary).

Thank you very much for your answers!
