xcfcode / plm_annotator

Code for our ACL 2021 paper: Language Model as an Annotator: Exploring DialoGPT for Dialogue Summarization

Python 96.09% Shell 2.17% C++ 0.49% Cuda 1.12% Lua 0.13%

plm_annotator's Introduction

Hi there 👋 I'm Xiachong Feng.

plm_annotator's Issues

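Question about the Topic Segmentation code

Hello! Thank you very much for providing the open-source code. While reading the topic_segmentation code, I noticed that you filter out the first and second utterances. What is the purpose of this?

PLM_annotator/annotator/annotate.py

Lines 175 to 176 in 2018eb8

    if index == 0 or index == 1:  # do not consider 1st and 2nd
        continue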

Question about step 1 (get loss)

Two questions:

Processing with python get_loss.py -d ami is slow.

Each sample takes about 1~2 minutes to process. Is this the normal speed?

In step 2, python recover_word_loss.py -d [my own dataset] raises an error:

    Load train_loss.json finished, Data size:100
    Load valid_loss.json finished, Data size:30
    Load test_loss.json finished, Data size:30
    Traceback (most recent call last):
      File "recover_word_loss.py", line 90, in <module>
        process(train_datas, dataset, "train")
      File "recover_word_loss.py", line 70, in process
        res.append(process_one(data))
      File "recover_word_loss.py", line 62, in process_one
        words, losses = recover_word_level(subwords, losses) # recover word-level losses
      File "recover_word_loss.py", line 49, in recover_word_level
        assert len(dialogue.split()) == len(word_level_losses.split())
    AssertionError

My guess is that the datasets may need a fixed JSON format? I used the format [["summary1", "dialogues1"], ... ["summary_N_", "dialogues_N_"]],
where each summary is " summary "
and each dialogue is "utterance1\n utterance2\n ... ".
If there are strict format requirements, please let me know.

Thank you for your work. I hope you can resolve my problem soon; I would be very grateful!
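
For readers hitting the same assertion, here is a minimal sketch of how subword-level losses can be merged back to word level, and why the word count must match dialogue.split(). It is not the repository's recover_word_level; the function name merge_subword_losses, the averaging rule, and the "Ġ" word-start marker (the GPT-2 style byte-level BPE convention) are assumptions made for illustration.

```python
from typing import List, Tuple

# Assumption: GPT-2 style byte-level BPE marks a token that starts a new word with "Ġ".
WORD_START = "Ġ"


def merge_subword_losses(
    subwords: List[str], losses: List[float]
) -> Tuple[List[str], List[float]]:
    """Merge subword tokens into words and average the losses of each word's pieces."""
    if len(subwords) != len(losses):
        raise ValueError("subwords and losses must have the same length")
    words: List[str] = []
    sums: List[float] = []
    counts: List[int] = []
    for tok, loss in zip(subwords, losses):
        has_marker = tok.startswith(WORD_START)
        piece = tok[len(WORD_START):] if has_marker else tok
        if has_marker or not words:
            # A marked token (or the very first token) opens a new word.
            words.append(piece)
            sums.append(loss)
            counts.append(1)
        else:
            # An unmarked token continues the current word.
            words[-1] += piece
            sums[-1] += loss
            counts[-1] += 1
    return words, [s / c for s, c in zip(sums, counts)]


dialogue = "Hello there summarization"
subwords = ["Hello", "Ġthere", "Ġsum", "mar", "ization"]
losses = [1.0, 2.0, 3.0, 6.0, 9.0]
words, word_losses = merge_subword_losses(subwords, losses)
# The same kind of check as the failing assertion: one loss per whitespace-separated word.
assert len(dialogue.split()) == len(word_losses)
```

If the input text contains characters that change how the tokenizer splits words (for example newlines or repeated spaces), the merged word count can differ from len(dialogue.split()), which is one plausible way such an assertion could fail.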

Re-implementation results

Hi,
I tried to reproduce the results on SAMSum following the README.md. However, the ROUGE scores I got are only:
rouge-1: 55.71 55.75 53.04
rouge-2: 29.65 29.20 27.85
rouge-l: 51.50 51.57 49.71
ROUGE 1-2-L F: 53.04-27.85-49.71

Using the generated summaries you provided, I can reproduce exactly the scores reported in your paper:
rouge-1: 57.90 54.90 53.70
rouge-2: 31.22 29.53 28.79
rouge-l: 53.72 51.54 50.81
ROUGE 1-2-L F: 53.70-28.79-50.81

I tested with the best checkpoint, which was not updated after the first several epochs.
Is this variance acceptable? Are there any details I may have missed?

Thank you very much!

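Could you share the source code?

I am quite interested in the fact-aware approach of your other paper, Incorporating Commonsense Knowledge into Abstractive Dialogue Summarization via Heterogeneous Graph Networks, but I could not find its source code. Would you mind sharing it?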

Code for Data Preprocessing

Hello, could you release the code that produces the annotations for keyword extraction, redundancy detection, and topic segmentation? We would like to process some new datasets.

Thanks!

Max sentence length shouldn't exceed 800

I downloaded your model and put it in ckpt to reproduce your results, but it fails with the error "max sentence length shouldn't exceed 800". I used the data downloaded from here and ran bpe.sh and binarise.sh to preprocess it. Is there anything wrong with these steps? I would appreciate a reply.
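
One way to narrow down an error like this is to look for over-length examples in the preprocessed data. The sketch below assumes the BPE-encoded file holds one example per line with space-separated tokens, and takes the limit of 800 from the error message; the helper name find_long_lines is hypothetical and not part of the repository.

```python
from typing import Iterable, List, Tuple


def find_long_lines(lines: Iterable[str], max_len: int = 800) -> List[Tuple[int, int]]:
    """Return (line_number, token_count) for every line with more than max_len tokens."""
    too_long = []
    for line_no, line in enumerate(lines, start=1):
        n_tokens = len(line.split())  # assumption: tokens are separated by whitespace
        if n_tokens > max_len:
            too_long.append((line_no, n_tokens))
    return too_long


# Toy input: the second line has 801 tokens, the others are short.
sample = ["1 2 3", " ".join(["7"] * 801), ""]
print(find_long_lines(sample))  # [(2, 801)]
```

To check a real file, pass an open file handle instead of the toy list, for example find_long_lines(open(path, encoding="utf-8")), where path points at the BPE output.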

Special Tokens Representation

Hello! Do these special tokens get dedicated representations during BPE tokenization, or are they treated as normal words? I ask because I could not find any of ##KEY##, [TS], or [RDD] in the encoder.json provided on your Google Drive.
I would appreciate a reply. @xcfcode
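
The check described in this question can be sketched as follows, assuming encoder.json is a JSON object mapping token strings to integer ids (the usual GPT-2 BPE layout). The toy vocabulary and the helper name missing_from_vocab are illustrative assumptions, not the repository's files or code.

```python
import json
from typing import Dict, Iterable, List


def missing_from_vocab(vocab: Dict[str, int], tokens: Iterable[str]) -> List[str]:
    """Return the tokens that have no single entry of their own in the vocabulary."""
    return [tok for tok in tokens if tok not in vocab]


# Toy stand-in for encoder.json; a real file would be read with json.load(open(path)).
toy_encoder_json = '{"hello": 0, "Ġworld": 1, "[": 2, "TS": 3, "]": 4}'
vocab = json.loads(toy_encoder_json)

special_tokens = ["##KEY##", "[TS]", "[RDD]"]
print(missing_from_vocab(vocab, special_tokens))
```

A token with no entry of its own is generally split into smaller pieces by a BPE tokenizer, so it is encoded like ordinary text rather than as one dedicated symbol.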

Rouge Score

Hello.
Could you please tell me whether the ROUGE scores in your paper are recall, precision, or F-measure? And for the BART baseline, did you do any preprocessing, such as deleting \n\r or adding special tokens? Thanks.
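
For context on the recall / precision / F-measure distinction, here is a simplified ROUGE-1 sketch based on clipped unigram overlap. It lowercases and splits on whitespace only, so it will not match the numbers produced by official ROUGE tooling (which applies its own tokenization and options); the function name rouge_1 and the example sentences are illustrative assumptions.

```python
from collections import Counter
from typing import Tuple


def rouge_1(candidate: str, reference: str) -> Tuple[float, float, float]:
    """Return (precision, recall, f1) from clipped unigram overlap."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # each word counted at most min(cand, ref) times
    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


p, r, f = rouge_1("amanda baked cookies", "amanda baked cookies for jerry")
print(f"P={p:.2f} R={r:.2f} F={f:.2f}")  # P=1.00 R=0.60 F=0.75
```

A short candidate can reach high precision with low recall, as in this example, which is why it matters which of the three values a reported score refers to.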
