webquery-dataset's Introduction

WebQuery

This dataset is released as part of the paper "Automated Query Generation for Evidence Collection from Web Search Engines". It's geared towards the development of systems for automated evidence collection from the web, with the purpose of assisting fact-checking.

The WebQuery dataset is composed of 390 claims extracted from news articles. Each claim is manually annotated with a search query used to find verifying evidence in the web, as well as a target URL.

Data Format

Each sample contains 6 fields:

claim_id: The ID of the claim.
claim_text: Text of the claim
target_search_query: Human-generated search query
target_url: URL containing the evidence necessary to verify the claim
target_url_clean: The target URL, stripped of protocol and query parameters
evidence_snippet: The evidence sentence extracted from the target URL

Most of the experiments in the related paper were performed using 4-fold cross-validation. Thus, we publish the data splits for each of the folds in the k0-k3 directories. Each directory contains a train, dev and test set. We also share the full dataset in a single file in all_data.tsv.

Please, cite the following paper if you use the dataset or the models in this repository:

coming soon

Recommend Projects

nestorp / webquery-dataset Goto Github PK

webquery-dataset's Introduction

WebQuery

Data Format

webquery-dataset's People

Contributors

Watchers

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent