
learning_to_retrieve_reasoning_paths's Introduction

Learning to Retrieve Reasoning Paths over Wikipedia Graph for Question Answering

This is the official implementation of the following paper:
Akari Asai, Kazuma Hashimoto, Hannaneh Hajishirzi, Richard Socher, Caiming Xiong. Learning to Retrieve Reasoning Paths over Wikipedia Graph for Question Answering. In: Proceedings of ICLR. 2020

In the paper, we introduce a graph-based retriever-reader framework that learns to retrieve reasoning paths (a reasoning path = a chain of multiple paragraphs to answer multi-hop questions) from English Wikipedia using its graphical structure, and further verify and extract answers from the selected reasoning paths. Our experimental results show state-of-the-art results across three diverse open-domain QA datasets: HotpotQA (full wiki), Natural Questions Open, SQuAD Open.

Acknowledgements: To implement our BERT-based modules, we used the huggingface's transformers library. The implementation of TF-IDF based document ranker and splitter started from the DrQA and document-qa repositories. Huge thanks to the contributors of those amazing repositories!

Quick Links

  0. Quick Run on HotpotQA
  1. Installation
  2. Train
  3. Evaluation
  4. Interactive Demo
  5. Others
  6. Citation and Contact

0. Quick Run on HotpotQA

We provide quick_start_hotpot.sh, with which you can easily set up and run evaluation on HotpotQA full wiki (on the first 100 questions).

The script will

  1. download our trained models and evaluation data (See Installation for the details),
  2. run the whole pipeline on the evaluation data (See Evaluation), and
  3. calculate the QA scores and supporting facts scores.

The evaluation will give us the following results:

{'em': 0.6, 'f1': 0.7468968253968253, 'prec': 0.754030303030303, 'recall': 0.7651666666666667, 'sp_em': 0.49, 'sp_f1': 0.7769365079365077, 'sp_prec': 0.8275, 'sp_recall': 0.7488333333333332, 'joint_em': 0.33, 'joint_f1': 0.6249458756180065, 'joint_prec': 0.6706212121212122, 'joint_recall': 0.6154999999999999}
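For reference, the joint scores above combine the answer and supporting-fact scores per question. A minimal sketch of that combination, following the official HotpotQA evaluation convention as we understand it (per-question values; the corpus-level numbers are the averages):

def joint_metrics(em, prec, recall, sp_em, sp_prec, sp_recall):
    # Joint precision/recall are the products of the answer and supporting-fact values.
    joint_prec = prec * sp_prec
    joint_recall = recall * sp_recall
    if joint_prec + joint_recall > 0:
        joint_f1 = 2 * joint_prec * joint_recall / (joint_prec + joint_recall)
    else:
        joint_f1 = 0.0
    joint_em = em * sp_em
    return joint_em, joint_f1, joint_prec, joint_recall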

Wanna try your own open-domain questions? See the Interactive Demo! Once you have run quick_start_hotpot.sh, you can easily switch to demo mode by changing a few options in the command.

1. Installation

Requirements

Our framework requires Python 3.5 or higher. We do not support Python 2.X.

It also requires pytorch-pretrained-bert (version 0.6.2) and PyTorch version 1.0 or higher. The other dependencies are listed in requirements.txt.
We are planning to migrate from pytorch-pretrained-bert to transformers soon.

Set up

Run the following commands to clone the repository and install our framework:

git clone https://github.com/AkariAsai/learning_to_retrieve_reasoning_paths.git
cd learning_to_retrieve_reasoning_paths
pip install -r requirements.txt
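After installing, you can sanity-check the core dependency versions with a short Python snippet (a sketch; pkg_resources ships with setuptools):

import sys
import pkg_resources
import torch

print("Python:", sys.version.split()[0])   # expect 3.5 or higher
print("PyTorch:", torch.__version__)       # expect 1.0 or higher
print("pytorch-pretrained-bert:",
      pkg_resources.get_distribution("pytorch-pretrained-bert").version)  # expect 0.6.2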

Downloading trained models

All the trained models used in our paper for the three datasets are available on Google Drive.

Alternatively, you can download a zip file containing all models by using gdown.

e.g., download HotpotQA models

mkdir models
cd models
gdown https://drive.google.com/uc?id=1ra37xtEXSROG_f90XxR4kgElGJWUHQyM
unzip hotpot_models.zip
rm hotpot_models.zip
cd ..

Note: the zip file for the HotpotQA models is about 4GB, and once it is extracted, the total size of the models is more than 8GB (including the introductory-paragraph-only Wikipedia database). The nq_models.zip includes the full Wikipedia database, which is around 30GB once extracted.

Downloading data

for training

  • You can download all of the training datasets from here (google drive).
  • We create (1) data to train the graph-based retriever, and (2) data to train the reader, by augmenting the publicly available machine reading comprehension datasets (HotpotQA, SQuAD and Natural Questions). See Section 3.1.2 and Section 3.2 of our paper for the details of the process.

for evaluation

  • Following previous work such as DrQA and qa-hard-em, we convert the original machine reading comprehension datasets into sets of question-answer pairs. You can download our preprocessed data from here.

  • For HotpotQA, we only use question-answer pairs as input, but we also need the original HotpotQA development set (either fullwiki or distractor) from HotpotQA's website to run the supporting fact evaluation.

mkdir data
cd data
mkdir hotpot
cd hotpot
gdown https://drive.google.com/uc?id=1m_7ZJtWQsZ8qDqtItDTWYlsEHDeVHbPt # download preprocessed full wiki data
wget http://curtis.ml.cmu.edu/datasets/hotpot/hotpot_dev_fullwiki_v1.json # download the original full wiki data for sp evaluation. 
cd ../..
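If you prefer to rebuild the question-answer pairs (Format A, described under Dataset format below) from the original HotpotQA dev file yourself, a minimal sketch looks like the following; the standard HotpotQA fields _id, question and answer are assumed, and the output filename is arbitrary (our released preprocessed file may differ in details):

import json

# Convert the original HotpotQA dev set into simple question-answer pairs (jsonlines).
with open("data/hotpot/hotpot_dev_fullwiki_v1.json") as f:
    hotpot_dev = json.load(f)

with open("data/hotpot/hotpot_qa_pairs.jsonl", "w") as out:
    for example in hotpot_dev:
        record = {
            "id": example["_id"],
            "question": example["question"],
            "answer": [example["answer"]],
        }
        out.write(json.dumps(record) + "\n")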

2. Train

In this work, we use a two-stage training approach, which lets you train the retriever and reader independently and easily switch to new reader models. The details of the training process are described in the README files in graph_retriever, reader and sequential_sentence_selector.

You can download our pre-trained models from the link mentioned above.

3. Evaluation

After downloading the TF-IDF retriever and training the graph retriever and reader models (or downloading our pre-trained models), you can test the performance of the entire system.

HotpotQA

If you set up using quick_start_hotpot.sh, you can run the full evaluation by changing the --eval_file_path option from data/hotpot/hotpot_fullwiki_first_100.jsonl to data/hotpot/hotpot_fullwiki_data.jsonl.

python eval_main.py \
--eval_file_path data/hotpot/hotpot_fullwiki_data.jsonl \
--eval_file_path_sp data/hotpot/hotpot_dev_distractor_v1.json \
--graph_retriever_path models/hotpot_models/graph_retriever_path/pytorch_model.bin \
--reader_path models/hotpot_models/reader \
--sequential_sentence_selector_path models/hotpot_models/sequential_sentence_selector/pytorch_model.bin \
--tfidf_path models/hotpot_models/tfidf_retriever/wiki_open_full_new_db_intro_only-tfidf-ngram=2-hash=16777216-tokenizer=simple.npz \
--db_path models/hotpot_models/wiki_db/wiki_abst_only_hotpotqa_w_original_title.db \
--bert_model_sequential_sentence_selector bert-large-uncased --do_lower_case \
--tfidf_limit 500 --eval_batch_size 4 --pruning_by_links --beam_graph_retriever 8 \
--beam_sequential_sentence_selector 8 --max_para_num 2000 --sp_eval

The evaluation will give us the following results (equivalent to our reported results):

{'em': 0.6049966239027684, 'f1': 0.7330873757783022, 'prec': 0.7613180885780131, 'recall': 0.7421444532461545, 'sp_em': 0.49169480081026334, 'sp_f1': 0.7605390258327606, 'sp_prec': 0.8103758721584524, 'sp_recall': 0.7325846435805953, 'joint_em': 0.35827143821742063, 'joint_f1': 0.6143774960171196, 'joint_prec': 0.679462464277477, 'joint_recall': 0.5987834193329556}

SQuAD Open

python eval_main.py \
--eval_file_path data/squad/squad_open_domain_data.jsonl \
--graph_retriever_path models/squad_models/selector/pytorch_model.bin \
--reader_path models/squad_models/reader \
--tfidf_path DrQA/data/wikipedia/docs-tfidf-ngram=2-hash=16777216-tokenizer=simple.npz \
--db_path DrQA/data/wikipedia/docs.db \
--bert_model bert-base-uncased --do_lower_case \
--tfidf_limit 50 --eval_batch_size 4 \
--beam_graph_retriever 8 --max_para_num 2000 --use_full_article 

Natural Questions

python eval_main.py \
--eval_file_path data/nq_open_domain_data.jsonl \
--graph_retriever_path models/nq/selector/pytorch_model.bin --reader_path models/nq/reader/ \
--tfidf_path models/nq_models/tfidf_retriever/wiki_20181220_nq_hyper_linked-tfidf-ngram=2-hash=16777216-tokenizer=simple.npz \
--db_path models/nq_models/wiki_db/wiki_20181220_nq_hyper_linked.db \
--bert_model bert-base-uncased --do_lower_case --tfidf_limit 20 --eval_batch_size 4 --pruning_by_links \
--beam_graph_retriever 8 --max_para_num 2000 --use_full_article 

(optional) Using TagMe for initial retrieval

As mentioned in Appendix B.7 in our paper, you can optionally use an entity linking system (TagMe) for the initial retrieval.

To use TagMe,

  1. register to get an API key, and
  2. set the API key via the --tagme_api_key option, and set the --tagme option to true.

python eval_main.py \
--eval_file_path data/nq_open_domain_data.jsonl \
--graph_retriever_path models/nq/selector/pytorch_model.bin --reader_path models/nq/reader/ \
--tfidf_path models/nq_models/tfidf_retriever/wiki_20181220_nq_hyper_linked-tfidf-ngram=2-hash=16777216-tokenizer=simple.npz \
--db_path models/nq_models/wiki_db/wiki_20181220_nq_hyper_linked.db \
--bert_model bert-base-uncased --do_lower_case --tfidf_limit 20 --eval_batch_size 4 --pruning_by_links --beam 8 --max_para_num 2000 --use_full_article --tagme --tagme_api_key YOUR_API_KEY

The implementation of the two-step TF-IDF retrieval module (article retrieval --> paragraph-level re-ranking) for Natural Questions is currently in progress, so it might give slightly lower scores than the results reported in our paper. We'll fix this soon.
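For intuition only, here is a generic sketch of the two-step idea (article-level TF-IDF retrieval followed by paragraph-level re-ranking) using scikit-learn; this is not the DrQA-based retriever used in this repo, and the data structures are illustrative:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def two_step_retrieve(question, articles, k_articles=10, k_paragraphs=20):
    # articles: dict mapping article title -> list of paragraph strings.
    titles = list(articles.keys())

    # Step 1: rank whole articles by TF-IDF similarity to the question.
    article_texts = [" ".join(paras) for paras in articles.values()]
    vec = TfidfVectorizer(ngram_range=(1, 2)).fit(article_texts + [question])
    scores = cosine_similarity(vec.transform([question]), vec.transform(article_texts))[0]
    top_titles = [titles[i] for i in scores.argsort()[::-1][:k_articles]]

    # Step 2: re-rank the individual paragraphs of the retrieved articles.
    paras = [(t, p) for t in top_titles for p in articles[t]]
    para_texts = [p for _, p in paras]
    pvec = TfidfVectorizer(ngram_range=(1, 2)).fit(para_texts + [question])
    pscores = cosine_similarity(pvec.transform([question]), pvec.transform(para_texts))[0]
    return [paras[i] for i in pscores.argsort()[::-1][:k_paragraphs]]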

4. Interactive demo

You can run an interactive demo and ask open-domain questions. Our model answers the question along with supporting facts.

If you set up using the quick_start_hotpot.sh script, you can run the demo by changing the script name from eval_main.py to demo.py and removing the --eval_file_path and --eval_file_path_sp options.

e.g.,

python demo.py \
--graph_retriever_path models/hotpot_models/graph_retriever_path/pytorch_model.bin \
--reader_path models/hotpot_models/reader \
--sequential_sentence_selector_path models/hotpot_models/sequential_sentence_selector/pytorch_model.bin \
--tfidf_path models/hotpot_models/tfidf_retriever/wiki_open_full_new_db_intro_only-tfidf-ngram=2-hash=16777216-tokenizer=simple.npz \
--db_path models/hotpot_models/wiki_db/wiki_abst_only_hotpotqa_w_original_title.db \
--do_lower_case --beam 4 --quiet --max_para_num 200 \
--tfidf_limit 20 --pruning_by_links

An output example is as follows:

#### Reader results ####
[
    {
        "q_id": "DEMO_0",
        "question": "Bordan Tkachuk was the CEO of a company that provides what sort of products?",
        "answer": "IT products and services",
        "context": [
            "Cintas_0",
            "Bordan Tkachuk_0",
            "Viglen_0"
        ]
    }
]

#### Supporting facts ####
[
    {
        "q_id": "DEMO_0",
        "supporting facts": {
            "Viglen_0": [
                [0, "Viglen Ltd provides IT products and services, including storage systems, servers, workstations and data/voice communications equipment and services."
                ]
            ],
            "Bordan Tkachuk_0": [
                [0, "Bordan Tkachuk ( ) is a British business executive, the former CEO of Viglen, also known from his appearances on the BBC-produced British version of \"The Apprentice,\" interviewing for his boss Lord Sugar."
                ]
            ]
        }
    }
]

5. Others

Distant supervision & negative examples data generation

In this work, we augment the original MRC data with negative examples and distant supervision examples to make our retriever and reader robust to inference-time noise. Our experimental results show that this training strategy gives significant performance improvements.

All of the training data is available here (google drive).

We are planning to release our code for augmenting training data with negative and distant supervision examples, to guide future research in open-domain QA. Please stay tuned!
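In the meantime, purely as an illustration of the idea (not our actual augmentation code), adding TF-IDF-based negatives to a training example might look roughly like this; tfidf_top_titles, negative_titles and num_negatives are hypothetical names:

import random

def add_negatives(example, tfidf_top_titles, num_negatives=10):
    # example: a training data point whose gold reasoning path is stored under "short_gold".
    # tfidf_top_titles: titles retrieved by TF-IDF for the question (hypothetical input).
    gold = set(example["short_gold"])
    negatives = [t for t in tfidf_top_titles if t not in gold]
    random.shuffle(negatives)
    # Mix a fixed number of distractor titles into the example so the
    # retriever learns to skip or terminate under inference-time noise.
    example["negative_titles"] = negatives[:num_negatives]
    return example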

Dataset format

For quick experiments and detailed human analysis, we save intermediate results at each step: the original question-answer pairs (Format A), the TF-IDF retriever output (Format B), and the graph-based retriever output (Format C).

Format A (eval data, the input of TF-IDF retriever)

For the evaluation pipeline, our initial input is a simple jsonlines format where each line contains one example with id = [str], question = [str] and answer = List[str] (or answers = List[str] for datasets where multiple answers are annotated per question).

For SQuAD Open and HotpotQA fullwiki, you can download the preprocessed format A files from here.

e.g., HotpotQA fullwiki dev

{
"id": "5ab3b0bf5542992ade7c6e39", 
"question": "What year did Guns N Roses perform a promo for a movie starring Arnold Schwarzenegger 
as a former New York Police detective?", 
"answer": ["1999"]
}

e.g., SQuAD Open dev

{
"id": "56beace93aeaaa14008c91e0", 
"question": "What venue did Super Bowl 50 take place in?", 
"answers": ["Levi's Stadium", "Levi's Stadium", 
"Levi's Stadium in the San Francisco Bay Area at Santa Clara"]
}
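Reading Format A back in is straightforward; a minimal sketch that also handles the answers variant:

import json

def load_format_a(path):
    examples = []
    with open(path) as f:
        for line in f:
            ex = json.loads(line)
            # Some datasets annotate multiple answers and use the "answers" key instead.
            gold = ex.get("answer", ex.get("answers", []))
            examples.append({"id": ex["id"], "question": ex["question"], "answer": gold})
    return examples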

Format B (TF-IDF retriever output)

For the TF-IDF results, we store the data as a list of JSON objects, and each data point contains the following fields.

  • q_id = [str]
  • question = [str]
  • answer = List[str]
  • context = Dict[str, str]: Top $N$ paragraphs which are ranked high by our TF-IDF retriever.
  • all_linked_para_title_dic = Dict[str, List[str]]: Hyper-linked paragraphs' titles from paragraphs in context.
  • all_linked_paras_dic = Dict[str, str]: the paragraphs of the hyper-linked paragraphs.

For training data, we have additional items that are used as ground-truth reasoning paths.

  • short_gold = List[str]
  • redundant_gold = List[str]
  • all_redundant_gold = List[List[str]]

e.g., HotpotQA fullwiki dev

{
"question": "Were Scott Derrickson and Ed Wood of the same nationality?",
"q_id": "5ab3b0bf5542992ade7c6e39", 
"context": 
    {"Scott Derrickson_0": "Scott Derrickson (born July 16, 1966) is an American director, ....", 
     "Ed Wood_0": "...", ...}, 
"all_linked_para_title_dic":
    {"Scott Derrickson_0": ["Los Angeles_0", "California_0", "Horror film_0", ...]},
"all_linked_paras_dic": 
    {"Los Angeles_0": "Los Angeles, officially the City of Los Angeles and often known by its initials L.A., is ...", ...}, 
"short_gold": [], 
"redundant_gold": [],
"all_redundant_gold": []
}
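A minimal sketch of how the Wikipedia subgraph implied by one Format B entry can be assembled (field names as described above; edges point from a TF-IDF-retrieved paragraph to the paragraphs it hyperlinks to):

def build_wiki_graph(entry):
    # entry: one Format B data point (a dict).
    nodes = dict(entry["context"])                       # title -> paragraph text
    nodes.update(entry.get("all_linked_paras_dic", {}))  # add the hyperlinked paragraphs
    edges = {
        title: entry["all_linked_para_title_dic"].get(title, [])
        for title in entry["context"]
    }
    return nodes, edges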

Format C (Graph-based retriever output)

The graph-based retriever's output is a list of JSON objects as follows:

  • q_id = [str]
  • titles = List[str]: a sequence of titles (the top-1 reasoning path)
  • topk_titles = List[List[str]]: k sequences of titles (the top k reasoning paths).
  • context = Dict[str, str]: the paragraphs which are included in top reasoning paths.
{
"q_id": "5a713ea95542994082a3e6e4",
"titles": ["Alvaro Mexia_0", "Boruca_0"],
"topk_titles": [
    ["Alvaro Mexia_0", "Boruca_0"], 
    ["Alvaro Mexia_0", "Indigenous peoples of Florida_0"], 
    ["Alvaro Mexia_0"], 
    ["List of Ambassadors of Spain to the United States_0", "Boruca_0"], 
    ["Alvaro Mexia_0", "St. Augustine, Florida_0"], 
    ["Alvaro Mexia_0", "Cape Canaveral, Florida_0"], 
    ["Alvaro Mexia_0", "Florida_0"], 
    ["Parque de la Bombilla (Mexico City)_0", "Alvaro Mexia_0", "Boruca_0"]],
"context": {
    "Alvaro Mexia_0": "Alvaro Mexia was a 17th-century Spanish explorer and cartographer of the east coast of Florida....",       "Boruca_0": "The Boruca (also known as the Brunca or the Brunka) are an indigenous people living in Costa Rica"}
}
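To turn a Format C entry into reader input, one can concatenate the paragraphs along each retrieved reasoning path; a minimal sketch (field names as above):

def reasoning_path_texts(entry):
    # entry: one Format C data point. Returns one concatenated passage per top-k path.
    passages = []
    for path in entry["topk_titles"]:
        paragraphs = [entry["context"][title] for title in path if title in entry["context"]]
        passages.append(" ".join(paragraphs))
    return passages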

Citation and Contact

If you find this codebase useful or use it in your work, please cite our paper:

@inproceedings{asai2020learning,
title={Learning to Retrieve Reasoning Paths over Wikipedia Graph for Question Answering},
author={Akari Asai and Kazuma Hashimoto and Hannaneh Hajishirzi and Richard Socher and Caiming Xiong},
booktitle={International Conference on Learning Representations},
year={2020}
}

Please contact Akari Asai (@AkariAsai, akari[at]cs.washington.edu) for questions and suggestions.


learning_to_retrieve_reasoning_paths's Issues

How to train and evaluate the models in HotpotQA distractor setting?

Hi, thanks for your great work!

I'm currently trying to reproduce your results in the HotpotQA distractor setting, but I am facing some technical difficulties.
I apologize in advance if these are dumb questions, but it would be very helpful if you could answer them:

  1. The 'hotpot_train_order_sensitive.json' file
    The README file in the graph_retriever folder specifies that 'hotpot_train_order_sensitive.json' is used for training in the HotpotQA distractor setting, but I can't find this file in the train_data folder you released. Is there any way to download this particular file, or can I create a file in this format from the original HotpotQA training set?

  2. Sentence selector
    I read in your paper that the graph retriever in the HotpotQA distractor setting is different from the one in the full-wiki setting, but both settings share the same reader model. I'm curious whether the sentence selector model is separate (like the graph retriever) or shared (like the reader) across the distractor/full-wiki settings. Also, if the sentence selector for the distractor setting is different from that of the full-wiki setting, I wonder how I can get the training data for the distractor setting. (It seems that the training data you released contains only one pair of dev/train files for the sentence selector.)

  3. Preprocessing of the HotpotQA distractor dataset
    It seems that, in order to run your model (for evaluation), the user needs a preprocessed dataset.
    I see that the preprocessed HotpotQA full-wiki data is available, but I am not sure I have access to the HotpotQA distractor dataset. Is there any way for me to get preprocessed HotpotQA distractor data (by downloading it or preprocessing it myself)?

  4. Evaluation in the distractor setting
    It seems that the evaluation code for the QA/SP tasks basically assumes the open-domain scenario.
    How can I evaluate the model in the closed (distractor) setting, as you did in your paper?

Thanks for your reading and attention :) @hassyGo @AkariAsai

Fine-tuning on own documents?

Hi - what would be the recommended approach for fine-tuning (not fully retraining) the model on one's own documents?

Thank you

Small typo in the paper

In Figure 2, the third input to the recurrent network in the lower part should be "H" instead of "D"; is that a typo?

Some details regarding generating NQ trainset for the reader model

Hi @AkariAsai. Thank you for this great work.

I'd like to understand more clearly how the NQ trainset for the reader model is generated.
In the comment linked below, you said that you removed all the tables and list elements from the NQ's original preprocessed HTML data.
#9 (comment)

I'm curious how you handled the case where a list element contains an answer and a paragraph contains that list (like the following example):
https://github.com/google-research-datasets/natural-questions/blob/master/toy_example.md

eg. <p>Google was founded in 1998 By:<ul><li>Larry</li><li>Sergey</li></ul></p>

How to evaluate the pretrained graph retriever model?

I downloaded the pretrained model. I want to evaluate the graph retriever on HotpotQA. Should I just pass 'models/hotpot_models/graph_retriever' as the output_dir? And can I use the pretrained model to test on the HotpotQA distractor setting, or do I need to train a new model for it?

The error when training the graph_retriever in the HotpotQA

Thanks for your great work!
When I run run_graph_retriever.py (in the graph_retriever folder) to train the graph-based recurrent retriever model on the HotpotQA training data (the files in hotpotqa_new_selector_train_data_db_2017_10_12_fix.zip), I get the following error:

Traceback (most recent call last):
File "run_graph_retriever.py", line 546, in
main()
File "run_graph_retriever.py", line 264, in main
train_examples = processor.get_train_examples(graph_retriever_config)
File "/DATA/sunzhanchen/learning_to_retrieve_reasoning_paths-master/graph_retriever/utils.py", line 200, in get_train_examples
examples += self._create_examples(file_name, graph_retriever_config, "train")
File "/DATA/sunzhanchen/learning_to_retrieve_reasoning_paths-master/graph_retriever/utils.py", line 429, in _create_examples
assert t in context
AssertionError

Is there a problem with the training dataset, or something else?
Thank you again!

Preprocessing of HotpotQA

Hi,

Thank you for your work and sharing your code!

I have some questions about the file you provide, "HotpotQA reader train data". Could you please point to or share the preprocessing code that produces the "answer starts", since the original HotpotQA training data doesn't include them? Also, in that file, are all yes/no questions regarded as "is_impossible=True"?

question about wikipedia data

Hi, thanks for sharing the code. Great work!

I have a quick question: Where can I find your preprocessed Wikipedia paragraphs and Wikipedia graph?

What do output_masks do?

Hi @AkariAsai ,

Thanks for the great repo!

I'm trying to adapt your model to a new dataset. I noticed there is an output_masks variable in the convert_examples_to_features function in graph_retriever/utils.py; may I check what exactly the output_masks do? How should I set the masking for positive and negative reasoning paths?

Also, may I clarify how you set the gold labels for each RNN step, and for the negative paths?

Thanks!

Why are some document titles missing?

Thank you for the amazing repo.

I am curious why some titles are missing from the TF-IDF index. It seems that during evaluation we get multiple warnings like these:

Oranjegekte_0 is missing
James Gunn_0 is missing
..

I assume this means that some document titles are not found in the database. Is that normal? Could you explain?

Thanks!

negative documents construction for graph retriever of hotpotQA fullwiki

Hello AkariAsai, thank you for the great work! After going through the graph retriever code, I found that the principle of negative-document construction for the graph retriever seems to be: TF-IDF documents first, then the hyperlinked negatives. My question is: hyperlinked negative docs are added from all_linked_paras_dic, but the keys of all_linked_paras_dic are all TF-IDF-retrieved titles, so the most important part, the hyperlinked negative docs along the gold path, may not be included for training?

How to evaluate the supporting facts in the HotPotQA experiment?

Hello, this work is amazing, but I am curious about HotpotQA's supporting facts experiment.

If the reasoning is based on Wikipedia paragraphs retrieved via hyperlinks, then how is the supporting-fact accuracy of the HotpotQA data ("from which sentences does the answer come") calculated?

Thank you for your reply!

demo.py arg error about NQ

Hi Akari,
Thanks for the great repo. I ran into an error when trying to use the demo with the Natural Questions model. I followed the tip above about switching eval_main.py to demo.py in the NQ evaluation example.

I am running

python demo.py \
--graph_retriever_path models/nq/selector/pytorch_model.bin \
--reader_path models/nq/reader/ \
--tfidf_path models/nq_models/tfidf_retriever/wiki_20181220_nq_hyper_linked-tfidf-ngram=2-hash=16777216-tokenizer=simple.npz \
--db_path models/nq_models/wiki_db/wiki_20181220_nq_hyper_linked.db \
--bert_model bert-base-uncased --do_lower_case --tfidf_limit 20 --eval_batch_size 4 --pruning_by_links \
--beam_graph_retriever 8 --max_para_num 2000 --use_full_article 

And got the errors:

  • demo.py: error: ambiguous option: --bert_model could match --bert_model_graph_retriever, --bert_model_sequential_sentence_selector.

  • demo.py: error: unrecognized arguments: --use_full_article (i assumed it was the bert_model_sequential_sentence_selector)

  • FileNotFoundError: [Errno 2] No such file or directory: 'models/nq/selector/pytorch_model.bin' (I removed the --use_full_article). I think this should be models/nq_models/graph_retriever/pytorch_model.bin?

  • --reader_path models/nq/reader/ seems not correct? I think this should be models/nq_models/reader

What the TF-IDF retriever data output mean

Thanks for the good work. Just to be sure I understand the paper and implementation correctly:

  1. Does the graph retriever extract paragraphs from any source other than the TF-IDF retrieval output during training and inference?
  2. Going by the TF-IDF output format
{
"question": "Were Scott Derrickson and Ed Wood of the same nationality?",
"q_id": "5ab3b0bf5542992ade7c6e39",
"context":
    {"Scott Derrickson_0": "Scott Derrickson (born July 16, 1966) is an American director, ....",
     "Ed Wood_0": "...", ...},
"all_linked_para_title_dic":
    {"Scott Derrickson_0": ["Los Angeles_0", "California_0", "Horror film_0", ...]},
"all_linked_paras_dic":
    {"Los Angeles_0": "Los Angeles, officially the City of Los Angeles and often known by its initials L.A., is ...", ...},
"short_gold": [],
"redundant_gold": [],
"all_redundant_gold": []
}

Am I correct to say that
C_1 = context, and
C_2 = any of the all_linked_para_title_dic paragraphs, as extracted by the graph-based retriever?

Am I also correct to say that this data format only works for questions that can be answered in at most 2 hops?

Sorry if my questions are too basic

The hyperparameters for training the bert-base reader ?

Hi, thanks for your contribution. Would you mind sharing the hyperparameters for training the bert-base reader?
It seems that the hyperparameters with a mini-batch size of 128 mentioned in the paper are for BERT-wwm-large, and I can't reproduce the results using the command provided in the reader dir: I obtained an EM of 50.53 and an F1 of 63.17.

Thank you very much!

Evaluation input for retriever

Hi,

Thanks for sharing the code. I'm wondering which file I should use if I want to get the output of the graph retriever using hotpot_dev_fullwiki_v1.json. It looks like the code only takes input in SQuAD 2.0 format. Please let me know where I can download the processed hotpot_dev_fullwiki data.

Thanks

What is the problem?

After I ran the provided script file, the result returned was as follows: {'em': 0.2, 'f1': 0.2976111111111111, 'prec': 0.30498268398268397, 'recall': 0.3068333333333333, 'sp_em': 0.02, 'sp_f1': 0.10866666666666668, 'sp_prec': 0.11333333333333334, 'sp_recall': 0.1075, 'joint_em': 0.01, 'joint_f1': 0.057762626262626265, 'joint_prec': 0.0596111111111111, 'joint_recall': 0.06208333333333333}

A problem about total tranining steps of reader

Hi, I really appreciate your contribution.

I have a question about the training steps setting here:

https://github.com/AkariAsai/learning_to_retrieve_reasoning_paths/blob/master/reader/run_reader_confidence.py#L209-L210

num_train_optimization_steps = int(
            len(train_examples) / args.train_batch_size / args.gradient_accumulation_steps) * args.num_train_epochs

I think len(train_examples) should be replaced with len(train_features), since the total length of the dataset is the number of processed features, which is much larger than the number of initial examples.
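For clarity, the change proposed here would look roughly like this (a sketch; variable names as in the snippet above):

num_train_optimization_steps = int(
            len(train_features) / args.train_batch_size / args.gradient_accumulation_steps) * args.num_train_epochs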

hotpot model zip file corrupted?

Hello, I haven't been able to unzip the HotpotQA model zip, and I've tried a few different methods. It seems to be corrupted in some way. Has anybody else had a problem unzipping it?
The SQuAD models don't have this problem.

`database is locked` while evaluation

Hi, I am trying to run eval_main.py on the NQ data with the following command:

python eval_main.py \
--eval_file_path nq.jsonl \
--graph_retriever_path models/nq_models/graph_retriever/pytorch_model.bin \
--reader_path models/nq_models/reader \
--tfidf_path models/nq_models/tfidf_retriever/wiki_20181220_nq_hyper_linked-tfidf-ngram=2-hash=16777216-tokenizer=simple.npz \
--db_path models/nq_models/wiki_db/wiki_20181220_nq_hyper_linked.db \
--bert_model_sequential_sentence_selector bert-base-uncased --do_lower_case --tfidf_limit 20 --eval_batch_size 4 --pruning_by_links \
--beam_graph_retriever 4 --max_para_num 100

And I got this error:

  File "eval_main.py", line 57, in <module>
    main()
  File "eval_main.py", line 24, in main
    tfidf_retrieval_output, selector_output, reader_output = odqa.eval()
  File "/home/bill/learning_to_retrieve_reasoning_paths/eval_odqa.py", line 303, in eval
    tfidf_retrieval_output = self.retrieve(eval_questions)
  File "/home/bill/learning_to_retrieve_reasoning_paths/eval_odqa.py", line 237, in retrieve
    eval_q["id"], eval_q["question"], self.args)
  File "/home/bill/learning_to_retrieve_reasoning_paths/pipeline/tfidf_retriever.py", line 126, in get_abstract_tfidf
    context = self.load_abstract_para_text(doc_names)
  File "/home/bill/learning_to_retrieve_reasoning_paths/pipeline/tfidf_retriever.py", line 45, in load_abstract_para_text
    para_title_text_pairs = load_para_collections_from_tfidf_id_intro_only(doc_name, self.db)
  File "/home/bill/learning_to_retrieve_reasoning_paths/retriever/utils.py", line 213, in load_para_collections_from_tfidf_id_intro_only
    if db.get_doc_text(tfidf_id) is None:
  File "/home/bill/learning_to_retrieve_reasoning_paths/retriever/doc_db.py", line 42, in get_doc_text
    (doc_id,)
sqlite3.OperationalError: database is locked
Question:   0%|               

Thanks!

Training data construction for reader verifier

Hello and thanks again!
I'm trying to reproduce the reader (SQuAD 2.0-like) part. If I'm not wrong, the reader is also a path re-ranker that helps select the best path containing the answer and supporting sentences. About this I have two questions: (1) How are the negative paths (is_impossible=True) constructed, by TF-IDF or by the upstream retriever? (2) What if a negative path contains part of the supporting sentences, or even the answer (e.g., for a comparison question)? Is is_impossible still set to True?

Minor fix in demo.py

Hi,

Thank you for your amazing work. While running demo.py, I encountered a simple bug.
Line 44:

tfidf_retrieval_output += self.tfidf_retriever.get_abstract_tfidf('DEMO_{}'.format(i), question, self.args.tfidf_limit)

should be

tfidf_retrieval_output += self.tfidf_retriever.get_abstract_tfidf('DEMO_{}'.format(i), question, self.args)

as get_abstract_tfidf(...) expects args as the last argument.
