Smmry Impl

A final project for one of my seminars. I saw this algorithm (or something similar) used by the TL;DR bots on Reddit and wanted to implement it. I do not claim any rights to the algorithm; it is clearly described on the Smmry.com website.

Implementation

The algorithm is slightly more technical when actually put into code, so here is how I implemented it.

  1. Load the text into an in-memory buffer and do some initial cleanup.
  2. Chop the entire thing into sentences, making sure that a period following an abbreviation is not treated as the end of a sentence.
  3. Load the sentences into a linked list with their lengths and positions in the text.
  4. Go through the sentences and strip out plurals, replacing them with their singular forms.
  • If a word is not among the most common irregular plurals and doesn't end in s, move on.
  • Otherwise, use the common plural rules.
  5. Go through the sentences and replace words with their synonyms.
  6. Create a hashmap and go through each sentence, hashing every word and storing the number of occurrences.
  7. For each sentence in our linked list, assign it a score that is the sum of the frequencies of all its words.
  8. Return the specified number of top-scoring sentences.
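
The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the project's actual code: the names `split_sentences`, `singularize`, `words_of` and `summarize` are hypothetical, the abbreviation and irregular-plural tables are tiny placeholders, the synonym step is left out, and a plain list and `Counter` stand in for the linked list and hashmap. It also assumes the result is returned in score order, with ties broken by position in the text.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny lookup tables (assumptions, not the project's data).
ABBREVIATIONS = {"mr", "mrs", "dr", "prof", "e.g", "i.e", "vs"}
IRREGULAR_PLURALS = {"children": "child", "men": "man", "women": "woman",
                     "mice": "mouse", "feet": "foot", "teeth": "tooth"}


def split_sentences(text):
    """Split after '.', '!' or '?', unless the '.' ends a known abbreviation."""
    sentences, current = [], []
    for token in text.split():
        current.append(token)
        if token[-1] in ".!?":
            if token[-1] == "." and token[:-1].lower() in ABBREVIATIONS:
                continue  # e.g. "Dr." does not end the sentence
            sentences.append(" ".join(current))
            current = []
    if current:  # trailing text without closing punctuation
        sentences.append(" ".join(current))
    return sentences


def singularize(word):
    """Crude singular form: irregular table first, then common suffix rules."""
    if word in IRREGULAR_PLURALS:
        return IRREGULAR_PLURALS[word]
    if len(word) < 4 or not word.endswith("s") or word.endswith("ss"):
        return word  # not treated as a plural, move on
    if word.endswith("ies") and len(word) > 4:
        return word[:-3] + "y"
    if word.endswith(("ches", "shes", "xes", "sses")):
        return word[:-2]
    return word[:-1]


def words_of(sentence):
    """Lowercase the sentence, extract words, and singularize each one."""
    return [singularize(w) for w in re.findall(r"[a-z']+", sentence.lower())]


def summarize(text, count):
    """Return the `count` sentences with the highest summed word frequency."""
    sentences = split_sentences(text)
    tokenized = [words_of(s) for s in sentences]
    # Word frequencies over the whole text (the "hashmap" step).
    freq = Counter(w for words in tokenized for w in words)
    # Score of a sentence = sum of the frequencies of its words.
    scored = [(sum(freq[w] for w in words), idx)
              for idx, words in enumerate(tokenized)]
    top = sorted(scored, key=lambda pair: (-pair[0], pair[1]))[:count]
    return [sentences[idx] for _, idx in top]
```

Because the score is a plain sum, longer sentences tend to rank higher, which matches the scoring rule as described in the list.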

How to run

cd smmry_impl
cmake .
make
./summ <file_to_summarize> <number_of_lines>

Example

As an example, we are going to summarize the Wikipedia article on the wood stork (the abstract section at the top).

  1. First, copy the abstract into a file using your favorite text editor. I am going to copy it into the file example.txt.
  2. Run the summarizer:
./summ example.txt 3
  3. The summary is printed to the console; here it consists of the 3 most important sentences.

Predators of the wood stork include raccoons (which predate on chicks), northern crested caracaras, which prey on eggs, and other birds of prey, which feed on eggs and chicks. During the breeding season, which is initiated when the water levels decline and can occur anytime between November and August, a single clutch of three to five eggs is laid. They fledge 60 to 65 days after hatching, although only about 31% of nests fledge a chick in any given year, with most chicks dying during their first two weeks, despite being watched by an adult during that time.

Notice

Again, this is an educational project and this is not my algorithm. I simply implemented it, and I keep the reference around because it is an elegant algorithm.

Contributors

thistleknot