Comments (5)

swizzard avatar swizzard commented on September 24, 2024 3

from 2017.

moonmilk avatar moonmilk commented on September 24, 2024 2

This part is so beautiful.
image

aparrish avatar aparrish commented on September 24, 2024 1

I'm a day late, but I've posted the source code and a new version of the output. For the new version, I decided to ignore punctuation tokens when calculating the vectors for each novel. The result has fewer commas, and the variation is a bit more interesting IMO!
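For anyone curious what "ignore punctuation tokens" could look like, here's a rough sketch. This is not the posted source code: the function name `drop_punctuation` and the rule (keep a token only if it contains at least one letter or digit) are assumptions made for illustration.

```python
def drop_punctuation(tokens):
    """Keep only tokens that contain at least one letter or digit.

    Assumption: a "punctuation token" is any token made up entirely of
    non-alphanumeric characters, such as "," or "--". Words with internal
    punctuation, like "don't", are kept.
    """
    return [tok for tok in tokens if any(ch.isalnum() for ch in tok)]


tokens = ["Call", "me", "Ishmael", ".", ",", "--", "don't"]
print(drop_punctuation(tokens))
```

With a filter like this applied before the vectors are looked up, a token such as "," never enters the per-novel arrays, so it can't pull the average toward itself.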

aparrish avatar aparrish commented on September 24, 2024

Some progress! I present: The Average Novel.

I'm still working with Project Gutenberg files from the April 2010 DVD ISO (downloadable here) and Leonard Richardson's 47000_metadata.json. Steps:

(1) Fetch every text in PG that was labelled as fiction, parse each one into sentences, and use gensim's Word2Vec module to calculate 100-dimensional word embeddings from the resulting sentences.
(2) Create an array of word embeddings for every text (by looking up each word in the embedding) and normalize the length of these arrays to 50,000 (leaving ~11k arrays of shape (50000, 100)).
(3) Sum the arrays for every length-normalized text and divide by the number of texts.
(4) For each vector in the resulting array, find the word with the closest embedding.
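Steps (2) through (4) can be sketched in plain Python on toy data. This is a minimal illustration and not the posted source code. The assumptions: a tiny hand-made 2-dimensional `TOY_EMBEDDING` stands in for the Word2Vec model from step (1), length normalization is done by nearest-index resampling (the steps above don't say how the arrays were normalized), and "closest" means highest cosine similarity. All function names are hypothetical.

```python
import math

# Assumption: a tiny hand-made embedding standing in for the trained model.
TOY_EMBEDDING = {
    "whale": [1.0, 0.0],
    "sea": [0.0, 1.0],
    "ship": [0.7, 0.7],
}


def embed_text(tokens, embedding):
    """Step 2a: look up each known word, giving one vector per token."""
    return [embedding[tok] for tok in tokens if tok in embedding]


def resample(vectors, target_len):
    """Step 2b: stretch or shrink to target_len by nearest-index sampling.

    Assumption: the original normalization method is not stated.
    Requires a non-empty list of vectors.
    """
    n = len(vectors)
    return [vectors[i * n // target_len] for i in range(target_len)]


def average_arrays(arrays):
    """Step 3: element-wise mean across all length-normalized texts."""
    count = len(arrays)
    return [
        [sum(column) / count for column in zip(*position)]
        for position in zip(*arrays)
    ]


def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def nearest_word(vector, embedding):
    """Step 4: the vocabulary word whose embedding is most similar."""
    return max(embedding, key=lambda word: cosine(vector, embedding[word]))


def average_novel(texts, embedding, target_len):
    """Run steps 2-4 over a list of tokenized texts."""
    arrays = [resample(embed_text(t, embedding), target_len) for t in texts]
    return [nearest_word(v, embedding) for v in average_arrays(arrays)]


texts = [
    ["whale", "sea"],
    ["sea", "sea", "whale", "whale"],
]
result = average_novel(texts, TOY_EMBEDDING, target_len=2)
print(result)  # ['ship', 'ship']
```

In this toy run, each position averages one "whale" vector with one "sea" vector, and the word nearest to that midpoint is "ship", a word that neither text contains at that position. That is the same washing-out effect on a small scale: the average lands between the inputs rather than on any of them.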

You can see the results here.

I guess I secretly hoped that this technique would reveal, average face-like, the Narrative Ur-text underlying all storytelling. But the result is pretty much what I actually expected: all of the structural variation gets lost in the wash. (The "Produced" and "Proofreaders" tokens at the top are obviously remnants of PG credits and boilerplate that weren't caught by the filtering tools I'm using; the "," token just happens to have been the vector most central to the average, which I guess kinda makes sense given how Word2Vec works. Not sure what all those pachyderms are doing in there though.)

I'm planning to continue experimenting with this technique, but wanted to share this progress in case further experiments extend past the deadline.

aparrish avatar aparrish commented on September 24, 2024

going to post the source code for this soon, stay tuned!
