Forecasting Future World Events with Neural Networks

This is the repository for "Forecasting Future World Events with Neural Networks"
by Andy Zou, Tristan Xiao, Ryan Jia, Joe Kwon, Mantas Mazeika, Richard Li, Dawn Song, Jacob Steinhardt, Owain Evans, and Dan Hendrycks.

Introduction

Forecasting future world events is a challenging but valuable task. Forecasts of climate, geopolitical conflict, pandemics and economic indicators help shape policy and decision making. In these domains, the judgment of expert humans contributes to the best forecasts. Given advances in language modeling, can these forecasts be automated? To this end, we introduce Autocast, a dataset containing thousands of forecasting questions and an accompanying news corpus. Questions are taken from forecasting tournaments, ensuring high quality, real-world importance, and diversity. The news corpus is organized by date, allowing us to precisely simulate the conditions under which humans made past forecasts (avoiding leakage from the future). We test language models on our forecasting task and find that performance is far below a human expert baseline. However, performance improves with increased model size and incorporation of relevant information from the news corpus. In sum, Autocast poses a novel challenge for large language models and improved performance could bring large practical benefits.

Autocast Dataset

The latest version of the Autocast dataset can be downloaded here. For more details on how to use the Autocast dataset and news articles, please refer to our short demonstration in usage.ipynb.

Each question has the following fields:

{
  "id":                "unique identifier (str)",
  "question":          "question body (str)",
  "background":        "question context/details (str)",
  "qtype":             "question type (str)",
  "status":            "question status (str)",
  "choices":           "choices or possible ranges (List or Dict)",
  "answer":            "question resolution (str or float)",
  "crowd":             "human crowd forecasts over time (List)",
  "publish_time":      "publish timestamp (str)",
  "close_time":        "close timestamp (str)",
  "prediction_count":  "number of crowd predictions (int)",
  "forecaster_count":  "number of crowd forecasters (int)",
  "tags":              "question category (List)",
  "source_links":      "source links from comments (List)"
}
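
As a rough illustration of working with these fields, the sketch below checks that a question record carries every field listed in the schema and tallies questions by `qtype`. The helper names (`missing_fields`, `count_by_qtype`) and all values in the sample record, including the `"t/f"` type label and the timestamp format, are hypothetical placeholders rather than values taken from the dataset; see usage.ipynb for the authors' own demonstration.

```python
import json
from collections import Counter

# Field names copied from the schema listed above.
EXPECTED_FIELDS = {
    "id", "question", "background", "qtype", "status", "choices", "answer",
    "crowd", "publish_time", "close_time", "prediction_count",
    "forecaster_count", "tags", "source_links",
}


def missing_fields(record):
    """Return the schema fields absent from one question record, sorted by name."""
    return sorted(EXPECTED_FIELDS - set(record))


def count_by_qtype(questions):
    """Count how many questions there are of each question type."""
    return Counter(q.get("qtype") for q in questions)


# Hypothetical sample record: every value here is made up for illustration.
SAMPLE_JSON = """
[
  {
    "id": "example-1",
    "question": "Will event X happen before date Y?",
    "background": "Illustrative context only.",
    "qtype": "t/f",
    "status": "Resolved",
    "choices": ["yes", "no"],
    "answer": "no",
    "crowd": [],
    "publish_time": "2021-01-01",
    "close_time": "2021-06-01",
    "prediction_count": 0,
    "forecaster_count": 0,
    "tags": ["example"],
    "source_links": []
  }
]
"""

questions = json.loads(SAMPLE_JSON)
for q in questions:
    gaps = missing_fields(q)
    if gaps:
        print(f"{q.get('id')}: missing {gaps}")

print(count_by_qtype(questions))
```

To run this against the real data, replace `SAMPLE_JSON` with the contents of the downloaded question file; the exact file name and layout are not stated here, so check the download before relying on them.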

We obtained permission from Metaculus to host the dataset on GitHub for research purposes only.

IntervalQA Dataset

Motivated by the difficulty of forecasting numbers across orders of magnitude (e.g. global cases of COVID-19 in 2022), we also curate IntervalQA, a dataset of numerical questions and metrics for calibration.

Download the IntervalQA dataset here.
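
One simple way to think about calibration for numerical questions is interval coverage: if a model outputs intervals meant to contain the answer, say, 80% of the time, then roughly 80% of the true answers should fall inside them. The sketch below computes that empirical coverage. It is only an illustration under that assumption and is not necessarily the metric shipped with IntervalQA; the function name `interval_coverage` and the sample numbers are hypothetical.

```python
def interval_coverage(intervals, truths):
    """Fraction of true values that fall inside their predicted [low, high] interval."""
    if len(intervals) != len(truths):
        raise ValueError("intervals and truths must have the same length")
    if not truths:
        raise ValueError("need at least one prediction")
    hits = sum(1 for (low, high), y in zip(intervals, truths) if low <= y <= high)
    return hits / len(truths)


# Made-up predictions spanning several orders of magnitude.
predicted = [(1e3, 1e5), (10, 100), (1e6, 1e8), (0.1, 1.0)]
actual = [5e4, 250, 3e7, 0.5]

coverage = interval_coverage(predicted, actual)
print(f"empirical coverage: {coverage:.2f}")  # 3 of 4 answers are covered -> 0.75
```

Comparing the empirical coverage with the nominal confidence level shows whether the intervals are too narrow (coverage below nominal) or too wide (coverage above nominal).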

Citation

If you find this useful in your research, please consider citing:

@article{zouforecasting2022,
  title={Forecasting Future World Events with Neural Networks},
  author={Andy Zou and Tristan Xiao and Ryan Jia and Joe Kwon and Mantas Mazeika and Richard Li and Dawn Song and Jacob Steinhardt and Owain Evans and Dan Hendrycks},
  journal={NeurIPS},
  year={2022}
}
