martinomensio / claimreview-data

claimreview-data

This repository contains a dataset of claims and their corresponding fact-checks.

The data is automatically updated every day with newly published ClaimReviews.

You can download the latest version from the release section, or browse all the versions.

Collection

The data collection is performed in 6 steps:

1. Collection of candidate ClaimReview URLs: using DataCommons and the Google Fact-Check API, we get all the URLs where ClaimReviews are published
2. Collection of ClaimReviews from fact-checkers: we re-collect the ClaimReviews directly from the fact-checkers' websites
3. Validation and Cleaning: we fix and clean the metadata
4. Ratings Mapping: we normalise the labels to credible, mostly credible, uncertain, mostly non-credible, non-credible and unknown
5. Occurrences Extraction and Unshortening: we extract the URLs where the claims occur and resolve the links that go through shortening services (e.g., bit.ly) or archives (e.g., archive.is)
6. Misinformation Database and Snapshot: we build the output files described below
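
To make the Ratings Mapping step more concrete, here is a minimal sketch of how a textual rating could be normalised to the six labels listed above. The lookup table and the function name normalise_label are hypothetical: the real mapping used by the collector is not shown in this README, so the entries below are illustrative assumptions only.

```python
# Hypothetical lookup table: the real mapping used by the collector is not
# documented here, so these entries are illustrative assumptions only.
LABEL_MAP = {
    "true": "credible",
    "mostly true": "mostly credible",
    "mixture": "uncertain",
    "mostly false": "mostly non-credible",
    "false": "non-credible",
}


def normalise_label(raw_label):
    """Map a fact-checker's textual rating to one of the six normalised labels."""
    if raw_label is None:
        return "unknown"
    # Lowercase and collapse whitespace so "  Mostly   TRUE " matches "mostly true".
    key = " ".join(raw_label.lower().split())
    # Anything not in the table falls back to "unknown".
    return LABEL_MAP.get(key, "unknown")
```

Falling back to unknown for unrecognised ratings keeps the output restricted to the six labels, at the cost of hiding ratings that the table does not cover yet.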

The collection process is run by claimreview_collector_full from the MisinfoMe project.
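
The unshortening part of the collection can be pictured as following redirects until a URL stops redirecting. The sketch below is not the collector's actual code. It assumes a hypothetical callable get_redirect that returns the redirect target of a URL (or None), so the logic can be exercised offline; the example URLs are made up, and resolving archive links is not covered.

```python
from urllib.parse import urljoin


def unshorten(url, get_redirect, max_hops=10):
    """Follow redirects from `url` until a final destination is reached.

    `get_redirect(url)` is a hypothetical callable returning the redirect
    target (e.g. a Location header value) or None when there is no redirect.
    """
    seen = {url}
    for _ in range(max_hops):
        target = get_redirect(url)
        if not target:
            return url  # no further redirect: this is the final URL
        target = urljoin(url, target)  # the target may be a relative path
        if target in seen:
            return url  # redirect loop: stop at the current URL
        seen.add(target)
        url = target
    return url  # hop limit reached


# Offline stand-in for the network, using made-up example URLs.
FAKE_REDIRECTS = {
    "https://bit.ly/abc": "https://example.org/r",
    "https://example.org/r": "/article",
}
resolved = unshorten("https://bit.ly/abc", FAKE_REDIRECTS.get)
```

In a real run, get_redirect would typically be backed by an HTTP request that does not follow redirects automatically; the hop limit and the loop check guard against links that never settle.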

Output files

Each archive contains the following files:

ifcn_sources.json: details of the fact-checkers present in the dataset, such as website, country and language. They also cover IFCN compliance: date of issue, expiration date, and adherence to each of the assessed criteria (e.g. transparency of sourcing, transparency of methodology).
claim_reviews_raw.json: the ClaimReviews collected in the first step from DataCommons and Google Fact-Check Tools. This file is larger than the final cleaned dataset, because of recollection issues, but its data is uncleaned (e.g. the appearance and firstAppearance fields).
claim_reviews_recollected.json: the recollected dataset.
claim_reviews.json: the final cleaned dataset with ratings mapping and unshortening.
claim_labels_mapping.json: statistics on how labels have been translated.
disagreeing_reviews.json: cases where the same URL has received multiple disagreeing ratings.
not_ifcn_sources.json: a list of domains that published ClaimReviews but are not in the IFCN list.
links_all_full.json: details of the reviewed URLs. For each URL that has been reviewed, we show the reviews and the normalised ratings.
stats.json: statistics of data collection.
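
As an illustration of what disagreeing_reviews.json captures, the sketch below groups reviews by reviewed URL and keeps the URLs that received more than one distinct normalised label. The field names url and label are assumptions made for this example, not the documented schema of the output files, and the project may use a stricter notion of disagreement than "any two different labels".

```python
from collections import defaultdict


def find_disagreements(reviews):
    """Return {url: sorted labels} for URLs rated with more than one label.

    Each review is assumed (hypothetically) to be a dict with "url" and
    "label" keys; the real files may use different field names.
    """
    labels_by_url = defaultdict(set)
    for review in reviews:
        labels_by_url[review["url"]].add(review["label"])
    return {
        url: sorted(labels)
        for url, labels in labels_by_url.items()
        if len(labels) > 1
    }


# Made-up sample data for illustration.
sample = [
    {"url": "https://example.org/a", "label": "credible"},
    {"url": "https://example.org/a", "label": "non-credible"},
    {"url": "https://example.org/b", "label": "credible"},
    {"url": "https://example.org/b", "label": "credible"},
]
disagreements = find_disagreements(sample)
```

Only the first URL in the sample is reported, since the second one received the same label twice.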

This data is currently used by MisinfoMe.

License

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

claimreview-data's Issues

No update since 2023_07_05

TODO: check logs and see what's wrong
