claimreview-data

CC BY-NC-SA 4.0

This repository contains a dataset of claims and their corresponding fact-checks.

The data is updated automatically every day with newly collected ClaimReviews.

You can download the latest version from the release section, or browse all the versions.

Collection

The data collection is performed in 6 steps:

  1. Collection of candidate ClaimReview URLs: using DataCommons and the Google Fact-Check API, we get all the URLs where ClaimReviews are published
  2. Collection of ClaimReviews from fact-checkers: we recollect the ClaimReviews directly from the fact-checkers' websites
  3. Validation and Cleaning: we fix and clean the metadata
  4. Ratings Mapping: we normalise the labels to credible, mostly credible, uncertain, mostly non-credible, non-credible and unknown
  5. Occurrences Extraction and Unshortening: we extract the URLs where the claims occur and we resolve the links that use shortening services (e.g., bit.ly) or archives (e.g., archive.is)
  6. Misinformation Database and Snapshot: we build the output files described below
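
The ratings mapping in step 4 can be illustrated with a minimal sketch. It assumes that each ClaimReview carries a schema.org-style reviewRating object with ratingValue, bestRating, worstRating and alternateName. The lookup table, the thresholds and the function names are hypothetical and are not taken from claimreview_collector_full.

```python
# Hypothetical table from textual verdicts to the normalised labels.
TEXT_TO_LABEL = {
    "true": "credible",
    "mostly true": "mostly credible",
    "half true": "uncertain",
    "mixture": "uncertain",
    "mostly false": "mostly non-credible",
    "false": "non-credible",
}


def label_from_score(score: float) -> str:
    """Map a score in [0, 1] to a label. The thresholds are illustrative."""
    if score >= 0.8:
        return "credible"
    if score >= 0.6:
        return "mostly credible"
    if score > 0.4:
        return "uncertain"
    if score > 0.2:
        return "mostly non-credible"
    return "non-credible"


def normalise_rating(review_rating: dict) -> str:
    """Return one of the six normalised labels for a reviewRating object."""
    # Prefer the numeric rating when the full scale is available.
    try:
        value = float(review_rating["ratingValue"])
        best = float(review_rating["bestRating"])
        worst = float(review_rating["worstRating"])
    except (KeyError, TypeError, ValueError):
        value = best = worst = None

    if (
        value is not None
        and best != worst
        and min(best, worst) <= value <= max(best, worst)
    ):
        # Rescale so that worstRating maps to 0 and bestRating maps to 1.
        return label_from_score((value - worst) / (best - worst))

    # Otherwise fall back to the textual verdict, if it is a known one.
    name = str(review_rating.get("alternateName") or "").strip().lower()
    return TEXT_TO_LABEL.get(name, "unknown")
```

In this sketch, anything that has neither a usable numeric scale nor a recognised textual verdict ends up as unknown, so unmapped labels stay visible instead of being guessed.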

The collection process is run by claimreview_collector_full from the MisinfoMe project.
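
The unshortening part of step 5 could look roughly like the sketch below. It only covers shortening services, not archives, and it is not the code used by claimreview_collector_full: the list of shortener hosts and all function names are assumptions made for illustration.

```python
from urllib.parse import urlsplit
from urllib.request import Request, urlopen

# Hypothetical list of shortening services; a real list would be longer.
SHORTENER_HOSTS = {"bit.ly", "t.co", "tinyurl.com"}


def is_shortened(url: str) -> bool:
    """Tell whether the URL points to one of the known shortening services."""
    host = (urlsplit(url).hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    return host in SHORTENER_HOSTS


def follow_redirects(url: str, timeout: float = 10.0) -> str:
    """Ask the server where the URL leads; urllib follows the redirects."""
    request = Request(url, method="HEAD")
    with urlopen(request, timeout=timeout) as response:
        return response.geturl()


def unshorten(url: str, follow=follow_redirects) -> str:
    """Resolve a shortened URL, keeping the original one on failure."""
    if not is_shortened(url):
        return url
    try:
        return follow(url)
    except OSError:
        # Network errors, HTTP errors and timeouts all derive from OSError.
        return url
```

The follow parameter is there so that the network call can be replaced, for example with a cached resolver, or with a stub such as `unshorten("https://bit.ly/abc", follow=lambda u: "https://example.org/full")` when testing.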

Output files

Each archive contains the following files:

  • ifcn_sources.json: details of the fact-checkers present in the dataset, such as website, country and language. They also cover IFCN compliance: date of issue, expiration, and adherence to each of the skills (e.g. transparency of sourcing, transparency of methodology).
  • claim_reviews_raw.json: the ClaimReviews collected in the first step from DataCommons and Google Fact-Check Tools. This file is bigger than the final cleaned dataset, because some ClaimReviews are lost to recollection issues, but its data is uncleaned (e.g., the appearance and firstAppearance fields).
  • claim_reviews_recollected.json: the dataset recollected from the fact-checkers' websites in the second step.
  • claim_reviews.json: the final cleaned dataset with ratings mapping and unshortening.
  • claim_labels_mapping.json: statistics on how labels have been translated.
  • disagreeing_reviews.json: cases where the same URL has received multiple disagreeing ratings.
  • not_ifcn_sources.json: a list of domains that published ClaimReviews but are not in the IFCN list.
  • links_all_full.json: details of the reviewed URLs. For each URL that has been reviewed, we show the reviews and the normalised ratings.
  • stats.json: statistics of data collection.
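
To make the idea behind disagreeing_reviews.json concrete, here is a minimal sketch that groups reviews by reviewed URL and keeps the URLs with more than one distinct normalised rating. The record layout (the "url" and "label" keys), the choice to ignore unknown ratings and the function name are assumptions; the actual schema of the output files may differ.

```python
from collections import defaultdict


def find_disagreements(reviews):
    """Return {url: sorted labels} for URLs with conflicting ratings.

    `reviews` is an iterable of dicts with hypothetical "url" and "label"
    keys, where "label" is one of the normalised ratings.
    """
    labels_by_url = defaultdict(set)
    for review in reviews:
        label = review["label"]
        # Assumption: an unknown rating does not contradict anything.
        if label != "unknown":
            labels_by_url[review["url"]].add(label)

    return {
        url: sorted(labels)
        for url, labels in labels_by_url.items()
        if len(labels) > 1
    }


sample = [
    {"url": "https://example.org/a", "label": "credible"},
    {"url": "https://example.org/a", "label": "non-credible"},
    {"url": "https://example.org/b", "label": "credible"},
    {"url": "https://example.org/b", "label": "unknown"},
]
disagreements = find_disagreements(sample)
```

With this sample, only https://example.org/a is reported, because its two reviews carry different labels.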

This data is currently used by MisinfoMe.

License

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
