Bookworm 📚

Most novels are, in some way, a description of a social network. Bookworm ingests novels, builds a solid version of their implicit character network and spits out an intuitively understandable and deeply analysable graph.

Navigation

  • bookworm for the code itself.
  • notebooks for example usage in Jupyter notebook form, with a load of interwoven description of how the thing actually works. Start here.
  • data for a description of how to get hold of data so that you can run bookworm yourself.

Usage

Command Line Usage

The bookworm('path/to/book.txt') function wraps the steps described under Detailed API Usage into one simple command, so the entire analysis can be run from the command line:

python run_bookworm.py --path 'path/to/book.txt'
  • Add --d3 to format the output for a d3.js force-directed graph
  • Add --threshold n, where n is an integer, to set the minimum character interaction strength for inclusion in the output (default 2)
  • Add --output_file 'path/to/file' to specify where the .json or .csv output should be written
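To make the effect of --threshold and --d3 concrete, here is a minimal sketch of a filter-and-format step. It assumes the interactions arrive as (source, target, value) triples and that the d3.js output uses the nodes/links layout of the common d3 force-directed examples; to_d3_json is a hypothetical helper, not part of bookworm, and bookworm's real output schema may differ.

```python
import json


def to_d3_json(interactions, threshold=2):
    """Hypothetical helper: filter interactions and format them for d3.js.

    interactions: iterable of (source, target, value) triples.
    The nodes/links layout is assumed from the common d3 force-directed
    examples; bookworm's real --d3 output may differ.
    """
    # Mirror --threshold: keep links whose strength is at least the threshold
    kept = [(s, t, v) for s, t, v in interactions if v >= threshold]
    # Each character appearing in a surviving link becomes a node
    names = sorted({name for s, t, _ in kept for name in (s, t)})
    graph = {
        "nodes": [{"id": name} for name in names],
        "links": [{"source": s, "target": t, "value": v} for s, t, v in kept],
    }
    return json.dumps(graph, indent=2)


if __name__ == "__main__":
    interactions = [("Elizabeth", "Darcy", 12), ("Elizabeth", "Jane", 9), ("Darcy", "Wickham", 1)]
    # The Darcy-Wickham link falls below the default threshold of 2 and is dropped
    print(to_d3_json(interactions))
```

Raising the threshold prunes weak, possibly incidental co-occurrences, which tends to make a force-directed layout far less cluttered.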

Detailed API Usage

Start by loading a book:

book = load_book('path/to/book.txt')

Split the book into individual sentences, sequences of n words, or sequences of n characters by running, respectively:

sequences = get_sentence_sequences(book)
sequences = get_word_sequences(book, n=50)
sequences = get_character_sequences(book, n=200)
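For intuition about what each splitter produces, here are hypothetical stand-ins for the three functions. They assume a naive regex sentence splitter and non-overlapping chunks; bookworm's own implementations may tokenize differently or use overlapping windows.

```python
import re


# Hypothetical stand-ins for bookworm's splitters, for illustration only.

def get_sentence_sequences(book):
    # Naive rule (an assumption): a sentence ends at ., ! or ? followed by whitespace
    return [s for s in re.split(r"(?<=[.!?])\s+", book.strip()) if s]


def get_word_sequences(book, n=50):
    # Non-overlapping chunks of n whitespace-separated words
    words = book.split()
    return [" ".join(words[i:i + n]) for i in range(0, len(words), n)]


def get_character_sequences(book, n=200):
    # Non-overlapping chunks of n raw characters
    return [book[i:i + n] for i in range(0, len(book), n)]


if __name__ == "__main__":
    book = "Elizabeth smiled. Darcy did not! Jane laughed?"
    print(get_sentence_sequences(book))
    # ['Elizabeth smiled.', 'Darcy did not!', 'Jane laughed?']
    print(get_word_sequences(book, n=3))
    # ['Elizabeth smiled. Darcy', 'did not! Jane', 'laughed?']
```

The choice of n matters: short sequences only count characters mentioned close together, while long ones also pick up looser associations.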

Either load a manually written list of character names or automatically extract a list of 'plausible' character names, using respectively:

characters = load_characters('path/to/character_list.csv')
characters = extract_character_names(book)
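One way to find 'plausible' names is to count capitalised words that do not open a sentence. The sketch below is an illustrative heuristic under that assumption, not bookworm's actual extraction method, and its min_count parameter is hypothetical.

```python
import re
from collections import Counter


def extract_character_names(book, min_count=2):
    """Hypothetical heuristic, not bookworm's actual method.

    Treats capitalised words that do not open a sentence as candidate
    names and keeps those seen at least min_count times.
    """
    counts = Counter()
    for sentence in re.split(r"(?<=[.!?])\s+", book):
        words = re.findall(r"[A-Za-z']+", sentence)
        # Skip the first word: it is capitalised whether or not it is a name
        for word in words[1:]:
            if word[0].isupper():
                counts[word] += 1
    # Dropping rare candidates filters out many one-off proper nouns
    return sorted(name for name, count in counts.items() if count >= min_count)


if __name__ == "__main__":
    book = "Then Darcy spoke to Elizabeth. Later Elizabeth met Darcy. Perhaps Jane knew."
    print(extract_character_names(book))  # ['Darcy', 'Elizabeth']
```

Capitalised non-names such as "I", titles, or place names will slip through a heuristic like this, which is why a manually curated list loaded with load_characters can give cleaner results.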

Find instances of each character in each sequence with find_connections(), count their co-occurrences with calculate_cooccurence(), and transform the result into a more easily interpretable format with get_interaction_df():

df = find_connections(sequences, characters)
cooccurence = calculate_cooccurence(df)
interaction_df = get_interaction_df(cooccurence, characters)
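The core idea behind these three steps can be sketched with plain Python lists standing in for DataFrames: build a 0/1 presence matrix of sequences by characters, multiply it by its own transpose to count co-occurrences, then flatten the upper triangle into weighted edges. The function bodies below are assumptions for illustration; bookworm's versions operate on pandas objects and may match names differently.

```python
from itertools import combinations


# Hypothetical list-based stand-ins; bookworm's versions work with pandas objects.

def find_connections(sequences, characters):
    # 0/1 presence matrix: rows are sequences, columns are characters.
    # Plain substring matching (an assumption), so "Jane" also matches "Janet".
    return [[int(name in seq) for name in characters] for seq in sequences]


def calculate_cooccurence(df):
    # cooccurence[i][j] = number of sequences mentioning both characters i and j,
    # i.e. the matrix product df.T @ df for a 0/1 matrix
    k = len(df[0]) if df else 0
    return [[sum(row[i] * row[j] for row in df) for j in range(k)] for i in range(k)]


def get_interaction_df(cooccurence, characters):
    # Read each unordered pair once (upper triangle) and drop pairs that never meet
    return [
        (characters[i], characters[j], cooccurence[i][j])
        for i, j in combinations(range(len(cooccurence)), 2)
        if cooccurence[i][j] > 0
    ]


if __name__ == "__main__":
    sequences = ["Elizabeth and Darcy argue", "Jane writes to Elizabeth", "Darcy and Elizabeth dance"]
    characters = ["Elizabeth", "Darcy", "Jane"]
    df = find_connections(sequences, characters)
    cooccurence = calculate_cooccurence(df)
    print(get_interaction_df(cooccurence, characters))
    # [('Elizabeth', 'Darcy', 2), ('Elizabeth', 'Jane', 1)]
```

The diagonal of the co-occurrence matrix simply counts how many sequences mention each character, which is why only the upper triangle is turned into edges.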

The resulting dataframe can easily be transformed into a networkx graph:

import networkx as nx

# networkx 2.0 renamed from_pandas_dataframe() to from_pandas_edgelist()
G = nx.from_pandas_edgelist(interaction_df,
                            source='source',
                            target='target')

From there, the full range of networkx graph analysis (centrality measures, community detection, shortest paths and so on) is available. See the project's Jupyter notebooks and the networkx documentation for more details.
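As one small example of such analysis, the sketch below ranks characters by weighted degree (the sum of their interaction strengths), a rough measure of how central each is to the story. It works directly on (source, target, value) triples so it runs without networkx, and weighted_degree is a hypothetical helper; on a networkx graph whose edges store their weights in a 'value' attribute, G.degree(weight='value') reports the same numbers.

```python
from collections import Counter


def weighted_degree(interactions):
    # Hypothetical helper: sum each character's interaction strengths
    # over every (source, target, value) edge they touch
    strength = Counter()
    for source, target, value in interactions:
        strength[source] += value
        strength[target] += value
    # Most connected characters first
    return strength.most_common()


if __name__ == "__main__":
    interactions = [("Elizabeth", "Darcy", 2), ("Elizabeth", "Jane", 1)]
    print(weighted_degree(interactions))  # [('Elizabeth', 3), ('Darcy', 2), ('Jane', 1)]
```

Weighted degree favours characters who appear alongside many others; measures such as betweenness centrality instead highlight characters who bridge otherwise separate groups.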

Slides

I presented a bunch of this stuff at

Contributors

harrisonpim
