WordSwarm

Create an animated word cloud that helps visualize the variation of word frequency in a time-series of 'articles'. [Check out some example WordSwarms on YouTube](https://www.youtube.com/playlist?list=PLW9BhSPL6nfRFieaT0mqaAiZhA1ZFRzWX)

Directory Structure

./1-Preprocessing/

Programs in here are used for scraping data sources, calculating word frequency, and outputting a CSV file for further use.

  • Scraper_\*.py An article-source-specific script that scrapes the data and creates a binary file of the article text and dates for use by Processor.py.

  • Processor.py Reads a binary file of article text and dates and outputs a CSV file of word frequency or count in each date bin. Expects a file in the current directory named article_data.out in the binary format provided by the example scraper.

  • WordBlacklist.py A list of common words to exclude from the output CSV file.

  • text2ngram.exe The Windows command-line executable used to calculate the nGrams for a block of text (i.e. all article text within a certain date range).
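As a rough illustration of what the preprocessing stage computes for one block of text, here is a minimal pure-Python sketch. It is not the code in Processor.py and it does not call text2ngram.exe: it counts single words only, and the function name `word_frequencies`, the tiny `BLACKLIST` set, and the choice to divide by the total number of words in the block are all assumptions made for the example. The parameter names mirror the `topN`, `minFreq`, and `absCount` options described under Usage.

```python
import re
from collections import Counter

# Hypothetical stand-in for the word list kept in the blacklist file.
BLACKLIST = {"the", "a", "of", "and", "in", "to"}


def word_frequencies(text, top_n=500, min_freq=2, abs_count=False,
                     blacklist=BLACKLIST):
    """Return (word, value) pairs for one block of text, most frequent first.

    value is the raw count when abs_count is True, otherwise the count
    divided by the total number of words in the block (an assumption).
    """
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in blacklist)
    total = len(words)
    results = []
    for word, n in counts.most_common():
        if n < min_freq:
            break  # most_common() is sorted, so every later word is rarer
        results.append((word, n if abs_count else n / total))
    return results[:top_n]


if __name__ == "__main__":
    sample = "the cell and the gene; cell biology of the cell"
    print(word_frequencies(sample))
    print(word_frequencies(sample, abs_count=True))
```

With the sample sentence, only "cell" appears at least twice after blacklisted words are dropped, so it is the only word returned.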

./2-WordSwarm/

Contains the program that actually generates the animated WordSwarm.

  • wordSwarm.py This is the actual program used to display the WordSwarm animation. Run python wordSwarm.py -h for usage instructions.

  • /framework/ This contains all of the framework files borrowed from PyBox2D and PyGame. Not much should need to be changed here. (The screen resolution can be changed here if needed, but beware of scaling issues.)

./3-PostProcessing/

The programs in here are used to convert the frames saved by WordSwarm using the 'output frame' argument into a video file.

  • frames2video.bat Converts the individual frames created by wordSwarm.py into an H.264 video file. Assumes the default save argument wordSwarm.py -s was used. Uses the included Windows binary of the open-source program ffmpeg, which is also available from http://www.ffmpeg.org/
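For readers not on Windows, the frame-to-video step can be approximated by calling ffmpeg directly. The sketch below only builds the argument list; it is not a transcription of frames2video.bat. The frame filename pattern and the frame rate are hypothetical placeholders (check where wordSwarm.py -s actually writes its PNG files), while the output name wordSwarmOut.mp4 is the one mentioned under Usage.

```python
def build_ffmpeg_command(frame_pattern="frames/%05d.png", fps=30,
                         output="wordSwarmOut.mp4"):
    """Build an ffmpeg argument list that encodes numbered PNGs as H.264.

    frame_pattern and fps are assumed values for illustration only.
    """
    return [
        "ffmpeg",
        "-framerate", str(fps),   # input frame rate of the image sequence
        "-i", frame_pattern,      # printf-style pattern for the numbered frames
        "-c:v", "libx264",        # H.264 encoder
        "-pix_fmt", "yuv420p",    # pixel format most players can decode
        output,
    ]


if __name__ == "__main__":
    # Print the command rather than running it, so the sketch works
    # even where ffmpeg is not installed.
    print(" ".join(build_ffmpeg_command()))
```

To actually run the conversion, the returned list can be passed to `subprocess.run(cmd, check=True)` once ffmpeg is on the PATH.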

Dependencies

Currently developed and tested only on Windows. The Beautiful Soup, Pyglet, PyGame, and PyBox2D modules can be installed using the WinPython Control Panel following these instructions

Usage

The following example describes how to create a WordSwarm using AAAS Science Magazine as an example corpus.

  1. Install WinPython and all the dependencies listed above
  2. Open an instance of WinPython Command Prompt.exe
    • Change the directory > cd <WordSwarm Directory>/1-Preprocessing/
  3. Run a scraper to parse source data into a format for later nGram calculations with the following command: > python Scraper_Science.py
    • This will create a binary (pickle) file of all the articles and timestamps: 1-Preprocessing\article_data.out
  4. Run the processor > python Processor.py which will take the previously created binary file, copy text into bins for each date range, run the nGram calculation in each date range, then write a CSV file to 2-WordSwarm\nGramOut<Date_Time>.csv. You can modify the beginning of Processor.py with the following options:
    • win = 90 This sets the width of each date-range to 90 days
    • lap = 30 This sets each date-range to increment by 30 days
    • topN = 500 The final CSV file will have the top 500 results
    • minFreq = 2 A word must show up at least twice in a date-range in order to be counted as a frequent word
    • absCount = False Word frequency is calculated relative to the number of words in each date-range, as opposed to reporting the absolute word count.
  5. Change the directory > cd <WordSwarm Directory>/2-WordSwarm/
  6. Run WordSwarm > python wordSwarm.py -i nGramOut00000000000000.csv -s -m 15 -t "My First WordSwarm!" in which the arguments mean:
    • -i nGramOut00000000000000.csv specifies the CSV file from which to create the WordSwarm.
    • -s specifies that you want each frame to be saved as a PNG file for later processing into a movie
    • -m 15 is a relative term that determines how large the largest word displayed on the screen will be.
    • -t "My First WordSwarm!" is the title that will be displayed in the top left corner of the WordSwarm
    • See wordSwarm.py for more commands to change word color, start date, and end date.
  7. Change the directory > cd <WordSwarm Directory>/3-PostProcessing/
  8. Convert the frames into a movie > frames2video.bat
    • This will create an output video file wordSwarmOut.mp4
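The interaction between the `win` and `lap` options in step 4 can be sketched as overlapping date windows. This is a minimal illustration under stated assumptions, not the logic in Processor.py: the function names `date_windows` and `bin_articles` are hypothetical, the windows are assumed to start at the earliest article date, and the window end is treated as exclusive.

```python
from datetime import date, timedelta


def date_windows(first, last, win=90, lap=30):
    """Yield (start, end) pairs: windows `win` days wide, advancing `lap` days."""
    start = first
    while start <= last:
        yield start, start + timedelta(days=win)
        start += timedelta(days=lap)


def bin_articles(articles, win=90, lap=30):
    """Group (date, text) pairs into overlapping windows (end date exclusive)."""
    if not articles:
        return []
    dates = [d for d, _ in articles]
    bins = []
    for start, end in date_windows(min(dates), max(dates), win, lap):
        texts = [t for d, t in articles if start <= d < end]
        bins.append((start, " ".join(texts)))
    return bins


if __name__ == "__main__":
    articles = [
        (date(2020, 1, 1), "a"),
        (date(2020, 2, 15), "b"),
        (date(2020, 4, 10), "c"),
    ]
    for start, text in bin_articles(articles):
        print(start, repr(text))
```

Because `lap` is smaller than `win`, consecutive windows overlap and an article can contribute to several bins. That overlap would smooth the frequency changes from one bin to the next.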

Congratulations on creating your first WordSwarm!

Contributors

thisismikekane
