
2018-12-fake-news-top-50's Introduction

Analysis of fake news sites and viral posts, 2016 - 2018

This repository contains data and analysis supporting the BuzzFeed News article "In Spite Of Its Efforts, Facebook Is Still The Home Of Hugely Viral Fake News," published December 28, 2018. Please read that article, which contains important context and methodological details, before proceeding.

Data

Over the past four years, BuzzFeed News has maintained lists of sites that publish completely fabricated stories. As we encounter new sites and debunk their content, we add them to the list. To produce this story, we used this list as well as some sites brought to our attention by the fact-checking website Lead Stories.

The data/sites_2016.csv, data/sites_2017.csv, and data/sites_2018.csv files contain lists of fake-news domains we included in our analyses for each of those years.

The data/top_2018.csv file contains information about the top fake news articles of 2018 (by Facebook engagement) published by our 2018 list of fake news sites. It contains the following columns:

  • title: The title of the article.
  • url: The URL of the article.
  • fb_engagement: The number of Facebook engagements the article received. (See note below.)
  • published_date: The date the article was published.
  • category: The article's main theme, as categorized by BuzzFeed News. Category may be Crime, Politics, Medical, Music, Sports, Business, or uncategorized.
  • source: The data source, if other than BuzzSumo. (See note below.)
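As a minimal sketch of how these columns might be read, the snippet below uses only the Python standard library. The function names are hypothetical and are not taken from the repository's notebook. It assumes that fb_engagement is stored as a plain integer string and that a blank category should be counted as uncategorized; neither assumption has been checked against the file.

```python
import csv
from collections import Counter


def load_top_articles(lines):
    """Parse rows from an iterable of CSV text lines (for example, an open file).

    Returns a list of dicts sorted by fb_engagement, highest first.
    """
    rows = []
    for row in csv.DictReader(lines):
        # Assumption: fb_engagement is a plain integer string such as "12345".
        row["fb_engagement"] = int(row["fb_engagement"])
        rows.append(row)
    rows.sort(key=lambda r: r["fb_engagement"], reverse=True)
    return rows


def engagement_by_category(rows):
    """Sum Facebook engagement per category."""
    totals = Counter()
    for row in rows:
        # Assumption: a blank category is treated as "uncategorized".
        category = row["category"] or "uncategorized"
        totals[category] += row["fb_engagement"]
    return totals


# Possible usage, run from the repository root:
#
#     with open("data/top_2018.csv", newline="", encoding="utf-8") as f:
#         articles = load_top_articles(f)
#     print(engagement_by_category(articles).most_common())
```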

Some of these data were used in BuzzFeed News' 2017 analysis of fake news.

Note on Facebook engagement data

Most of the engagement data comes from BuzzSumo.

Nine stories were not picked up by our BuzzSumo search but were brought to our attention by our contacts at Trendolizer. For seven of those stories, Trendolizer used the Facebook Graph API to retrieve engagement counts.

For the remaining two stories, the domain had been blocked by Facebook's servers, but only after the stories had gone viral and generated large engagement counts. For these URLs, BuzzSumo and the Facebook API both inaccurately returned engagement counts of 0. Trendolizer archived the data before Facebook blocked the domain, and so we used their counts.

Trendolizer retrieves Facebook engagement data by scraping the displayed number on the "like" button of a Facebook post. Because the like button rounds down large numbers for display, these data are slight undercounts.
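To illustrate why a rounded-down display produces an undercount, here is a hypothetical sketch. It is not Trendolizer's code, and the display format it assumes (strings such as "1.2K" or "3M") is a guess rather than a documented format. Given a displayed string, it returns the smallest count that could produce it and the first count that would display as something larger.

```python
from decimal import Decimal

# Assumption: large counts are abbreviated with these suffixes.
_SUFFIXES = {"K": 1_000, "M": 1_000_000}


def displayed_count_bounds(text):
    """Return (lower, upper_exclusive) for a rounded-down display string."""
    text = text.strip().upper().replace(",", "")
    multiplier = _SUFFIXES.get(text[-1:])
    if multiplier is None:
        # No suffix: the displayed number is taken to be exact.
        exact = int(text)
        return exact, exact + 1
    number = Decimal(text[:-1])
    # The number of digits shown after the decimal point sets the rounding step.
    decimals = max(0, -number.as_tuple().exponent)
    step = Decimal(multiplier) / (10 ** decimals)
    lower = number * multiplier
    return int(lower), int(lower + step)


# Under these assumptions, a displayed "1.2K" stands for any true count from
# 1200 up to, but not including, 1300. Recording 1200 is therefore an undercount
# unless the true value is exactly 1200.
```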

Analysis

The analysis is contained in notebooks/analysis.ipynb. The code there, written in Python, compares the 2018 site list to its 2017 and 2016 counterparts. It also looks for trends over time and calculates the most popular domains and categories.
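The comparisons described above could be sketched roughly as follows. This is not the notebook's code. The helper names are hypothetical, and the sketch assumes each site list has a header row with a column named domain, which may not match the actual files. The sketch compares two years' site lists with set operations and ranks domains by total Facebook engagement.

```python
import csv
from collections import Counter
from urllib.parse import urlparse


def read_domains(lines, column="domain"):
    """Read a set of lowercased domains from CSV text lines.

    Assumption: the file has a header row containing `column`.
    """
    domains = set()
    for row in csv.DictReader(lines):
        value = (row.get(column) or "").strip().lower()
        if value:
            domains.add(value)
    return domains


def compare_site_lists(current, previous):
    """Split two domain sets into new, carried-over, and dropped domains."""
    return {
        "new": current - previous,
        "carried_over": current & previous,
        "dropped": previous - current,
    }


def top_domains(rows, n=10):
    """Rank domains by summed fb_engagement across article rows."""
    totals = Counter()
    for row in rows:
        host = urlparse(row["url"]).netloc.lower()
        if host.startswith("www."):
            host = host[4:]  # treat www.example.com and example.com as one site
        totals[host] += int(row["fb_engagement"])
    return totals.most_common(n)


# Possible usage, run from the repository root:
#
#     with open("data/sites_2018.csv", newline="", encoding="utf-8") as f:
#         sites_2018 = read_domains(f)
#     with open("data/sites_2017.csv", newline="", encoding="utf-8") as f:
#         sites_2017 = read_domains(f)
#     print(compare_site_lists(sites_2018, sites_2017))
```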

Licensing

All code in this repository is available under the MIT License. The data files in the data/ directory are available under the Creative Commons Attribution 4.0 International (CC BY 4.0) license.

Feedback/Questions

Contact Scott Pham at [email protected].

Looking for more from BuzzFeed News? Click here for a list of our open-sourced projects, data, and code.
