Code Monkey home page Code Monkey logo

ts-textrank's Introduction

Tests

ts-textrank

ts-textrank is a Typescript implementation of the TextRank algorithm.

Install

Using npm:

$ npm install ts-textrank

Using yarn:

$ yarn add ts-textrank

Usage

  • Create a config object
  • Create a summarizer with your config
  • Call summarizer.summarize to extract most relevant senteces from an input text
import { SorensenDiceSimilarity, DefaultTextParser, ConsoleLogger, RelativeSummarizerConfig, Summarizer, NullLogger, Sentence } from "ts-textrank";

//Only one similarity function implemented at this moment.
//More could come in future versions.
const sim = new SorensenDiceSimilarity()

//Only one text parser available a this moment
const parser = new DefaultTextParser()

//Do you want logging?
const logger = new ConsoleLogger()

//You can implement LoggerInterface for different behavior,
//or if you don't want logging, use this:
//const logger = new NullLogger()

//Set the summary length as a percentage of full text length
const ratio = .25 

//Damping factor. See "How it works" for more info.
const d = .85

//How do you want summary sentences to be sorted?
//Get sentences in the order that they appear in text:
const sorting = SORT_BY.OCCURRENCE
//Or sort them by relevance:
//const sorting = SORT_BY.SCORE
const config = new RelativeSummarizerConfig(ratio, sim, parser, d, sorting)

//Or, if you want a fixed number of sentences:
//const number = 5
//const config = new AbsoluteSummarizerConfig(number, sim, parser, d, sorting)    

const summarizer = new Summarizer(config, logger)

//Language is used for stopword removal.
//See https://github.com/fergiemcdowall/stopword for supported languages
const lang = "en"

const text = "...Text to summarize..."
//summary will be an array of sentences summarizing text
const summary = summarizer.summarize(text, lang)

How it works

TextRank algorithm was introduced by Rada Mihalcea and Paul Tarau in their paper "TextRank: Bringing Order into Texts" in 2004. It applies the same principle that Google's PageRank used to discover relevant web pages.

The idea is to split a text into sentences, and then calculate a score for each sentence in terms of its similarity to the other sentences. TextRank treats sentences having common words as a link between them (like hyperlinks between web pages). Then, it applies a weight to that link based on how many words the sentences have in common. ts-textrank uses Sorensen-Dice Similarity for this.

The sentences with the higher score will be those that share the most words with the rest and can be used as a summary of the whole text.

Damping factor

Original PageRank algorithm included a damping factor to represent the probability of a user clicking random links on a page. In this context, the authors have kept it and fixed it to a value of .85, but it can be modified if needed for better results in specific cases.

ts-textrank's People

Contributors

meszaroszoltan avatar nachobrito avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar

ts-textrank's Issues

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.