gdeltpyr's Introduction


gdeltPyR

gdeltPyR is a Python-based framework to access and analyze Global Database of Events, Language, and Tone (GDELT) 1.0 or 2.0 data in Python Pandas or R dataframes. A user can enter a single date, a date range (two strings), or a list of individual dates and get back a tidy data set ready for scientific or data-driven exploration.

gdeltPyR retrieves GDELT data (version 1.0 or version 2.0) via parallel HTTP GET requests and is an alternative to accessing GDELT data via Google BigQuery. Because the requests run in parallel, the more cores you have, the less time it takes to pull more data. Moreover, the more RAM you have, the more data you can pull. Finally, for RAM-limited workflows, create a pipeline that pulls data, writes it to disk, and flushes memory before the next pull.
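The pull, write, and flush pipeline for RAM-limited workflows could be sketched roughly as follows. This is a minimal illustration, not part of gdeltPyR: `pull_to_disk`, `daterange`, and the `fetch` callable are hypothetical names, and `fetch` is assumed to return an object with a pandas-style `to_csv` method.

```python
import gc
from datetime import date, timedelta
from pathlib import Path


def daterange(start, end):
    """Yield each date from start to end, inclusive."""
    for offset in range((end - start).days + 1):
        yield start + timedelta(days=offset)


def pull_to_disk(fetch, start, end, out_dir):
    """Fetch one day at a time, write it to CSV, then release the memory.

    `fetch` is a hypothetical callable: it receives a date string such as
    '2016 Nov 01' and returns an object with a pandas-style `to_csv` method.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for day in daterange(start, end):
        frame = fetch(day.strftime("%Y %b %d"))
        path = out_dir / f"gdelt_{day:%Y%m%d}.csv"
        frame.to_csv(path, index=False)
        written.append(path)
        del frame     # drop the only reference to this day's data
        gc.collect()  # encourage prompt release before the next pull
    return written
```

With gdeltPyR, `fetch` could plausibly be something like `lambda d: gd.Search(d, table='events', coverage=True)`, where `gd` is a `gdelt.gdelt` object. Only one day of data is held in memory at a time, so peak RAM use is bounded by the largest single day rather than by the whole range.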

The GDELT Project advertises itself as the largest, most comprehensive, and highest resolution open database of human society ever created. It monitors print, broadcast, and web news media in over 100 languages from across every country in the world to keep continually updated on breaking developments anywhere on the planet. Its historical archives stretch back to January 1, 1979, and it tracks the world’s breaking events and reactions in near-realtime, as both the GDELT Event and Global Knowledge Graph datasets update every 15 minutes. Visit the GDELT website to learn more about the project.

New Features (0.1.10)

  1. Added geodataframe output; can be easily converted into a shapefile or choropleth visualization.
  2. Added continuous integration testing for Windows, OSX, and Linux (Ubuntu)
  3. Normalized columns output; export data with SQL ready columns (no special characters, all lowercase)
  4. Added a choice between the native-english and translated-to-english datasets from GDELT v2.
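import gdelt

# Version 2 queries
gd = gdelt.gdelt(version=2)

# Single date, events table, geopandas output with normalized column names
events = gd.Search(['2017 May 23'], table='events', output='gpd', normcols=True, coverage=False)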
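To illustrate what "SQL ready" column names (no special characters, all lowercase) might look like, here is a small sketch. It is an assumption about the general idea, not the logic gdeltPyR runs when `normcols=True`; `normalize_column` is a hypothetical helper and the sample column names are only illustrative.

```python
import re


def normalize_column(name):
    """Lowercase a column name and drop everything except letters, digits, and underscores."""
    return re.sub(r"[^0-9a-z_]", "", name.lower())


# Illustrative column names; the last one shows a special character being removed.
columns = ["GLOBALEVENTID", "Actor1Code", "ActionGeo_Lat", "V2.1DATE"]
print([normalize_column(c) for c in columns])
```

Names cleaned this way can be used as SQL column identifiers without quoting, which is the practical benefit the feature describes.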

Coming Soon (version 0.1.11, as of 29 May 2017)

Installation

gdeltPyR can be installed via pip

pip install gdelt

Basic Examples

GDELT 1.0 Queries

import gdelt

# Version 1 queries
gd1 = gdelt.gdelt(version=1)

# pull single day, gkg table
results = gd1.Search('2016 Nov 01', table='gkg')
print(len(results))

# pull events table for a date range; no output argument, so a pandas dataframe is returned
results = gd1.Search(['2016 Oct 31', '2016 Nov 2'], coverage=True, table='events')
print(len(results))

GDELT 2.0 Queries

# Version 2 queries
gd2 = gdelt.gdelt(version=2)

# Single 15 minute interval pull, output to json format with mentions table
results = gd2.Search('2016 Nov 1',table='mentions',output='json')
print(len(results))

# Full day pull, output to pandas dataframe, events table
results = gd2.Search(['2016 11 01'],table='events',coverage=True)
print(len(results))
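The examples above pass dates in several spellings ('2016 Nov 1', '2016 11 01', '2016 Oct 31'). If you build date strings programmatically, it can help to validate them before querying. The sketch below is a hypothetical standard-library helper, not gdeltPyR's own parser, and it only assumes the two spellings shown in this README.

```python
from datetime import datetime

# The two spellings used in this README: month abbreviation or month number.
DATE_FORMATS = ("%Y %b %d", "%Y %m %d")


def parse_query_date(text):
    """Return a datetime.date for a 'YYYY Mon DD' or 'YYYY MM DD' string."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date string: {text!r}")


for text in ("2016 Nov 1", "2016 11 01", "2016 Oct 31"):
    print(text, "->", parse_query_date(text).isoformat())
```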

Output Options

gdeltPyR can output results directly into several formats which include:

  • pandas dataframe
  • csv
  • json
  • geopandas dataframe (as of version 0.1.10)
  • GeoJSON (coming soon version 0.1.11)
  • Shapefile (coming soon version 0.1.11)

Performance on 4 core, MacOS Sierra 10.12 with 16GB of RAM:

  • 900,000 by 61 (rows x columns) pandas dataframe returned in 36 seconds
    • data is a merged pandas dataframe of GDELT 2.0 events database data
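The general pattern behind a merged result, fetching many files concurrently and combining them into one data set, can be sketched as below. This uses a thread pool from the standard library purely for illustration; gdeltPyR's actual parallel implementation may differ. `parallel_pull` and `fetch` are hypothetical names, and `fetch` is assumed to return a list of rows for one URL.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import chain


def parallel_pull(fetch, urls, max_workers=4):
    """Apply `fetch` to every URL concurrently and merge the row lists in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as `urls`
        chunks = list(pool.map(fetch, urls))
    return list(chain.from_iterable(chunks))
```

Since each download is independent, the work spreads across workers, and the merge step is a simple concatenation of per-file results.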

gdeltPyR Parameters

gdeltPyR provides access to 1.0 and 2.0 data. The following parameters guide the query syntax:

Name | Description | Input Possibilities/Examples
version (integer) | Selects the version of GDELT data to query; defaults to version 2. | 1 or 2
date (string or list of strings) | Dates to query. | "2016 10 23" or "2016 Oct 23"
coverage (bool) | For GDELT 2.0, pulls every 15 minute interval in the dates passed in the 'date' parameter. When False or None (the default), gdeltPyR pulls the latest 15 minute interval for the current day or the last 15 minute interval for a historic day. | True or False or None
translation (bool) | For GDELT 2.0, selects whether the native-english or the translated-to-english dataset is downloaded. | True or False
table (string) | The specific GDELT table to pull. The default table is the 'events' table. See the GDELT documentation page for more information. | 'events' or 'mentions' or 'gkg'
output (string) | The output type for the results. | 'json' or 'csv' or 'gpd'

These parameter values can be mixed and matched to return the data you want. The coverage parameter is used with GDELT version 2; when set to True, gdeltPyR will query all available 15 minute intervals for the dates passed. For the current day, the query will return the most recent 15 minute interval.
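To make the scale of a full-coverage pull concrete, the sketch below enumerates the quarter-hour intervals in a single day. It is illustrative only: `intervals_for_day` is a hypothetical helper, and the `YYYYMMDDHHMMSS` stamp format is an assumption about how GDELT 2.0 updates are labeled rather than something stated in this README.

```python
from datetime import date, datetime, timedelta


def intervals_for_day(day):
    """Return the start time of each 15 minute interval in `day` (96 per day)."""
    start = datetime(day.year, day.month, day.day)
    return [start + timedelta(minutes=15 * i) for i in range(96)]


stamps = [t.strftime("%Y%m%d%H%M%S") for t in intervals_for_day(date(2016, 11, 1))]
print(len(stamps), stamps[0], stamps[-1])
```

A full historic day therefore corresponds to up to 96 separate interval pulls per table, compared with a single interval when coverage is False.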

Facts

  • GDELT 1.0 is a daily dataset
    • 1.0 only has 'events' and 'gkg' tables
    • 1.0 posts the previous day's data at 6AM EST the next day (e.g., Monday's data will be available at 6AM EST on Tuesday)
  • GDELT 2.0 is updated every 15 minutes
    • 2.0 has 'events','gkg', and 'mentions' tables
    • 2.0 has a distinction between native english and translated-to-english news
    • 2.0 has more columns
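The table availability listed above can be encoded as a small pre-flight check, so that an unsupported combination (for example, 'mentions' with version 1) is caught before any data is requested. This is a hypothetical helper built only from the facts in this list; it is not part of gdeltPyR.

```python
# Tables available in each GDELT version, per the facts listed above.
VALID_TABLES = {
    1: {"events", "gkg"},
    2: {"events", "gkg", "mentions"},
}


def check_table(version, table):
    """Return `table` if it exists in the given GDELT version, else raise ValueError."""
    if version not in VALID_TABLES:
        raise ValueError(f"unknown GDELT version: {version!r}")
    if table not in VALID_TABLES[version]:
        allowed = ", ".join(sorted(VALID_TABLES[version]))
        raise ValueError(f"GDELT {version}.0 has no {table!r} table; choose from: {allowed}")
    return table
```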

Known Issues

  • None


Contributors

linwoodc3, pietermarsman
