The evidentiality_qa from akariasai

evidentiality_qa's Introduction

Hi there 👋

I am a fifth-year PhD student at the Paul G. Allen School of Computer Science & Engineering, University of Washington, advised by Hannaneh Hajishirzi. I work on Natural Language Processing as part of UW NLP, with a focus on retrieval-augmented language models. Before UW, I obtained a B.E. degree in Electrical Engineering and Computer Science (EEIC) from The University of Tokyo.

Contact

Personal Website: akariasai.github.io/
Twitter: @AkariAsai
Email: my_first_name[at]cs.washington.edu

evidentiality_qa's Issues

Pretrained weights

Hey,

This looks like a super cool model. Are you planning on releasing the fine-tuned weights for the model as well?

Question about the "top-20 recall" score in Table 1

Hi.

Thanks for your great work.

Your Table 1 reports that the "top-20 recall" score for WoW is 96.2. Does this score refer to the DPR retrieved results that you provided in the README?

I was wondering how you calculated the score. Is there a script you used to calculate it?

I ran the evaluation with both the KILT script and a script of my own, and the top-20 recall I get is only around 65.55.

Below is my script. wow-dev-kilt.jsonl is from the KILT GitHub page.

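import json

# Retrieved results: a JSON list with one entry per question.
with open('wow_dev.json', 'r', encoding='utf-8') as f:
    guess_data = json.load(f)

# Gold KILT annotations: one JSON object per line.
gold_data = []
with open('wow-dev-kilt.jsonl', 'r', encoding='utf-8') as f:
    for line in f:
        gold_data.append(json.loads(line))

recall = 0
for d1, d2 in zip(guess_data, gold_data):
    # Both files must list the questions in the same order.
    assert d1['question'] == d2['input']
    # Rank the retrieved contexts by score, highest first.
    # Every context in the file is considered; no explicit top-20 cut is applied.
    d1['ctxs'].sort(key=lambda x: x.get('score'), reverse=True)
    guess_titles = [ctx['title'] for ctx in d1['ctxs']]
    # Only the first provenance of the first output is used as the gold title.
    gold_title = d2['output'][0]['provenance'][0]['title']
    if gold_title in guess_titles:
        recall += 1
print(recall / len(guess_data))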

I saw a similar thing with FEVER.
Thanks.
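
For comparison, a page-level recall@k can also be computed by counting a hit whenever any gold provenance title, from any output, appears among the top k retrieved titles. The sketch below is only an illustration under that assumption. It is not the authors' evaluation and not the KILT script, and whether this choice explains the gap to 96.2 is unknown. The helper names gold_titles and recall_at_k and the toy data are hypothetical.

```python
def gold_titles(gold_entry):
    """Collect every provenance title from every output of a KILT-style entry."""
    titles = set()
    for output in gold_entry.get("output", []):
        # Some outputs may carry no provenance; treat that as an empty list.
        for prov in output.get("provenance", []):
            if "title" in prov:
                titles.add(prov["title"])
    return titles


def recall_at_k(guess_data, gold_data, k=20):
    """Fraction of questions whose top-k retrieved titles contain any gold title."""
    hits = 0
    for guess, gold in zip(guess_data, gold_data):
        # Rank by score; float() guards against scores stored as strings.
        ranked = sorted(guess["ctxs"], key=lambda c: float(c["score"]), reverse=True)
        top_titles = {ctx["title"] for ctx in ranked[:k]}
        if top_titles & gold_titles(gold):
            hits += 1
    return hits / len(guess_data) if guess_data else 0.0


# Tiny hand-made example (hypothetical data, not taken from WoW).
toy_guess = [
    {"question": "q1", "ctxs": [{"title": "A", "score": "0.9"},
                                {"title": "B", "score": "0.5"}]},
    {"question": "q2", "ctxs": [{"title": "X", "score": "0.8"}]},
]
toy_gold = [
    {"input": "q1", "output": [{"provenance": [{"title": "C"}]},
                               {"provenance": [{"title": "B"}]}]},
    {"input": "q2", "output": [{"provenance": [{"title": "Y"}]}]},
]

print(recall_at_k(toy_guess, toy_gold, k=20))
```

In the toy data, q1 is a miss if only the first provenance of the first output ("C") is checked, but a hit once all provenance titles are considered, so the two conventions can give different numbers on the same retrieval output.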

Questions about evidentiality prediction

Hi,

Thanks for your exciting work.

I am trying to reproduce the evidentiality prediction results. What is the accuracy/F1 that you achieved for the evidentiality classifier? And how many positive/negative pairs do you mine from each dataset?

I searched for this information in your paper but couldn't find it. Please let me know if I missed anything. Thanks!
