#Nieman Lab Predictions#

###Intro###

This is a cheeky little experiment in making predictions for the future using a corpus of predictions from last year and a Markov chain generator. I did this to write a post for Nieman Lab's 2015 predictions -- an unedited version is included in the repository -- but why should I have all the fun?

###Instructions###

Making your own predictions is pretty easy. The repository includes the corpus and a pre-parsed SQLite database of that corpus to get you from zero to predictions in no time.

If you're on a Mac or Linux machine, you're good to go. If you're on Windows, you'll need to make sure you have Python installed (it does not come standard).

If you have Python, and you want to use the pre-made database to generate your predictions, simply clone the repository, open up a terminal, navigate into the directory where the code is and run this:

    python markov.py gen predictions 10

The relevant parts are:

  • gen, which generates the Markov chains
  • predictions, which is the database that holds the data needed to generate them
  • 10, which is the number of Markov chains to generate. You can change that 10 to 1 or 100 or 23 or whatever number you want.

If you don't like what you're getting, or you want to try your hand at generating your own, you'll need to create a new database first. You do that with this command:

    python markov.py parse newdatabasename 3 predictions.txt

The relevant parts here are:

  • parse, which parses the corpus into the required chunks
  • newdatabasename, which can be whatever you want it to be; it's the database name you'll then use in the gen command
  • 3, which is the depth that the Markov chain generator will use to predict the next word. In practice, I've found 3 to be the Goldilocks zone of depth. Depth 2 is unintelligibly crazy, and 4 just copies existing sentences. Experiment as you wish and see if you agree.
  • predictions.txt, which is the name of the text corpus used to generate all this.
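If you're curious what depth is doing, here's a minimal sketch of the general technique in Python 3. It is not the code in markov.py: the function names `build_chain` and `generate` are made up for illustration, and it assumes depth means the number of preceding words used to pick the next one.

```python
import random
from collections import defaultdict


def build_chain(text, depth=3):
    """Map each run of `depth` consecutive words to the words that follow it.

    Hypothetical helper for illustration; markov.py may chunk differently.
    """
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - depth):
        state = tuple(words[i:i + depth])
        chain[state].append(words[i + depth])
    return chain


def generate(chain, max_words=30, rng=random):
    """Walk the chain from a random starting state until it dead-ends."""
    state = rng.choice(list(chain.keys()))
    output = list(state)
    while len(output) < max_words:
        followers = chain.get(state)
        if not followers:
            break  # nothing in the corpus ever followed this state
        output.append(rng.choice(followers))
        # Slide the window forward by one word.
        state = tuple(output[-len(state):])
    return " ".join(output)


corpus = "a b c d a b c e"
chain = build_chain(corpus, depth=3)
sentence = generate(chain, max_words=10)
```

The bigger the depth, the fewer choices each state has, which is why a high depth tends to hand you back sentences straight out of the corpus and a low one wanders off into nonsense.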

After your parse command runs -- it'll take a little bit because there are 38,000 words in the corpus -- you'll need to run the gen command above, substituting your new database name (for example, python markov.py gen newdatabasename 10).
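Parsing once and saving the result is what lets gen skip the slow step afterward. Here's a rough sketch of that idea using Python 3's built-in sqlite3 module. The table name `chain`, its columns and both function names are assumptions made up for this example; the actual layout of the database that markov.py writes may be quite different.

```python
import sqlite3


def parse_to_db(conn, text, depth=3):
    """Store every (state, next_word) pair from the text in a table.

    Hypothetical schema for illustration only.
    """
    conn.execute("CREATE TABLE IF NOT EXISTS chain (state TEXT, next_word TEXT)")
    words = text.split()
    rows = [
        (" ".join(words[i:i + depth]), words[i + depth])
        for i in range(len(words) - depth)
    ]
    conn.executemany("INSERT INTO chain (state, next_word) VALUES (?, ?)", rows)
    conn.commit()
    return len(rows)


def next_words(conn, state):
    """Return every word recorded as following the given state."""
    cursor = conn.execute(
        "SELECT next_word FROM chain WHERE state = ? ORDER BY rowid", (state,)
    )
    return [row[0] for row in cursor]


# ":memory:" keeps the example self-contained; a filename would persist it.
conn = sqlite3.connect(":memory:")
count = parse_to_db(conn, "the future of news is the future of mobile", depth=2)
followers = next_words(conn, "future of")
```

With a table like this, generating a chain is just a loop of lookups: grab the current state, pick one of its followers at random, slide the window forward, repeat.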
