SQuAD 1.0 & 2.0 |
https://rajpurkar.github.io/SQuAD-explorer/ |
Stanford Question Answering Dataset (SQuAD) is a reading comprehension dataset, consisting of questions posed by crowdworkers on a set of Wikipedia articles, where the answer to every question is a segment of text, or span, from the corresponding reading passage, or the question might be unanswerable. |
Who-Did-What |
https://tticnlp.github.io/who_did_what/leaderBoard.html |
We have constructed a new "Who-did-What" dataset of over 200,000 fill-in-the-gap (cloze) multiple choice reading comprehension problems constructed from the LDC English Gigaword newswire corpus. |
MS MARCO |
http://www.msmarco.org/ |
Microsoft MAchine Reading COmprehension Dataset |
RACE |
http://www.qizhexie.com/data/RACE_leaderboard |
The RACE dataset is a large-scale ReAding Comprehension dataset collected from English Examinations that are created for middle school and high school students. |
MovieQA |
http://movieqa.cs.toronto.edu/leaderboard/ |
We introduce the MovieQA dataset, which aims to evaluate automatic story comprehension from both video and text. Each question comes with a set of five highly plausible answers, only one of which is correct. The questions can be answered using multiple sources of information: movie clips, plots, subtitles, and, for a subset, scripts and DVS. |
HotpotQA |
https://hotpotqa.github.io/ |
A Dataset for Diverse, Explainable Multi-hop Question Answering. HotpotQA is a question answering dataset featuring natural, multi-hop questions, with strong supervision for supporting facts to enable more explainable question answering systems. |
CoQA |
https://stanfordnlp.github.io/coqa/ |
A Conversational Question Answering Challenge. CoQA is a large-scale dataset for building conversational question answering systems. |
DREAM |
https://dataset.org/dream/ |
A Challenge Dataset and Models for Dialogue-Based Reading Comprehension. DREAM is a multiple-choice Dialogue-based REAding comprehension exaMination dataset. In contrast to existing reading comprehension datasets, DREAM is the first to focus on in-depth multi-turn multi-party dialogue understanding. |
QuAC |
http://quac.ai/ |
Question Answering in Context is a dataset for modeling, understanding, and participating in information seeking dialog. Data instances consist of an interactive dialog between two crowd workers: (1) a student who poses a sequence of freeform questions to learn as much as possible about a hidden Wikipedia text, and (2) a teacher who answers the questions by providing short excerpts (spans) from the text. QuAC introduces challenges not found in existing machine comprehension datasets: its questions are often more open-ended, unanswerable, or only meaningful within the dialog context. |
ReCoRD |
https://sheng-z.github.io/ReCoRD-explorer/ |
Reading Comprehension with Commonsense Reasoning Dataset (ReCoRD) is a large-scale reading comprehension dataset that requires commonsense reasoning. ReCoRD consists of queries automatically generated from CNN/Daily Mail news articles; the answer to each query is a text span from a summarizing passage of the corresponding news article. The goal of ReCoRD is to evaluate a machine's ability to perform commonsense reasoning in reading comprehension. ReCoRD is pronounced as [ˈrɛkərd]. |
Cosmos QA |
https://wilburone.github.io/cosmos/ |
Cosmos QA is a large-scale dataset of 35.6K problems that require commonsense-based reading comprehension, formulated as multiple-choice questions. It focuses on reading between the lines over a diverse collection of people's everyday narratives, asking questions about the likely causes or effects of events that require reasoning beyond the exact text spans in the context. |