keyworder

Description

This program extracts the smallest set of words needed to understand a text. It removes duplicate words, similar words, short words, and proper nouns. The words that remain form the foundational vocabulary of the text.
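As a rough illustration of the two simplest filters (duplicates and short words), here is a minimal sketch. It is not taken from the project's source: the names `tokenize`, `filterWords`, and `minWordLen` are hypothetical, and the four-letter cut-off is an assumption, chosen only because the sample output further down contains no word shorter than four letters.

```go
package main

import (
	"strings"
	"unicode"
)

// minWordLen is an assumed cut-off; the README does not define "short".
const minWordLen = 4

// tokenize lowercases the text and splits it into words,
// treating anything that is not a letter or an apostrophe as a separator.
func tokenize(text string) []string {
	isSeparator := func(r rune) bool {
		return !unicode.IsLetter(r) && r != '\''
	}
	return strings.FieldsFunc(strings.ToLower(text), isSeparator)
}

// filterWords drops words shorter than minWordLen and any word already seen,
// keeping the survivors in first-seen order.
func filterWords(words []string) []string {
	seen := make(map[string]bool)
	var kept []string
	for _, w := range words {
		if len([]rune(w)) < minWordLen || seen[w] {
			continue
		}
		seen[w] = true
		kept = append(kept, w)
	}
	return kept
}
```

Calling `filterWords(tokenize(text))` on a sentence returns each sufficiently long word once. Proper-noun and similar-word removal would be further passes over the same list.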

Usage

> go build main.go textUtils.go jaro.go
> .\main.exe input.txt

Issues

  • The word similarity algorithm sometimes deletes words that look alike but have different meanings
  • Proper noun detection sometimes flags ordinary words that happen to start a sentence
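One possible way to reduce the sentence-start false positives is to count a capitalised word as a proper noun candidate only when it is capitalised somewhere other than at the start of a sentence. The sketch below illustrates that idea under stated assumptions. It is not the project's implementation: `properNounCandidates`, `endsSentence`, and the small `abbreviations` set are all hypothetical.

```go
package main

import (
	"strings"
	"unicode"
	"unicode/utf8"
)

// abbreviations is an assumed, deliberately tiny list of tokens whose
// trailing period does not end a sentence.
var abbreviations = map[string]bool{"Mr.": true, "Mrs.": true, "Ms.": true, "Dr.": true}

// endsSentence reports whether a whitespace-delimited token closes a sentence.
func endsSentence(field string) bool {
	trimmed := strings.TrimRight(field, "\"')")
	if trimmed == "" || abbreviations[trimmed] {
		return false
	}
	switch trimmed[len(trimmed)-1] {
	case '.', '!', '?':
		return true
	}
	return false
}

// properNounCandidates returns the lowercase form of every word that is
// capitalised in a position other than the start of a sentence.
func properNounCandidates(text string) map[string]bool {
	candidates := make(map[string]bool)
	sentenceStart := true
	for _, field := range strings.Fields(text) {
		// Strip surrounding punctuation so "Drive." becomes "Drive".
		word := strings.TrimFunc(field, func(r rune) bool { return !unicode.IsLetter(r) })
		if word != "" {
			first, _ := utf8.DecodeRuneInString(word)
			if unicode.IsUpper(first) && !sentenceStart {
				candidates[strings.ToLower(word)] = true
			}
			sentenceStart = false
		}
		if endsSentence(field) {
			sentenceStart = true
		}
	}
	return candidates
}
```

This heuristic has gaps of its own: a proper noun that only ever appears at the start of a sentence is missed, and the pronoun "I" is flagged, although a short-word filter would drop it anyway.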

Example (167 words):

It was on the corner of the street that he noticed the first sign of something peculiar -- a cat reading a map. For a second, Mr. Dursley didn't realize what he had seen -- then he jerked his head around to look again. There was a tabby cat standing on the corner of Privet Drive, but there wasn't a map in sight. What could he have been thinking of? It must have been a trick of the light. Mr. Dursley blinked and stared at the cat. It stared back. As Mr. Dursley drove around the corner and up the road, he watched the cat in his mirror. It was now reading the sign that said Privet Drive -- no, looking at the sign; cats couldn't read maps or signs. Mr. Dursley gave himself a little shake and put the cat out of his mind. As he drove toward town he thought of nothing except a large order of drills he was hoping to get that day.

Output (56 words):

noticed except first reading then standing have said maps street realize drove toward nothing back second sight town large tabby shake around look again wasn't himself thinking watched order corner jerked could that something cats been light mind drills blinked looking signs peculiar seen must gave hoping didn trick road little thought head stared mirror read

Example of extracted similar words

packages   package
shape      shapes
shape      shaped
boil       boils
leapt      leap
objects    object
punishment punishments
stacked    stack
toilets    toilet
furiously  furious
tight      tighter
tight      tightly
drone      droned
brave      bravery
brave      braver
brave      bravely
crush      crushed
doughnuts  doughnut
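The build command lists a `jaro.go` file, which suggests that similarity is measured with Jaro similarity. The project's actual code is not shown here, so the following is only a sketch of the textbook Jaro measure. The `similar` helper and its 0.9 threshold are assumptions for illustration.

```go
package main

// jaro returns the Jaro similarity of a and b, from 0 (nothing in common)
// to 1 (identical).
func jaro(a, b string) float64 {
	r1, r2 := []rune(a), []rune(b)
	if len(r1) == 0 && len(r2) == 0 {
		return 1
	}
	if len(r1) == 0 || len(r2) == 0 {
		return 0
	}
	// Characters match only if they are equal and no further apart than window.
	longest := len(r1)
	if len(r2) > longest {
		longest = len(r2)
	}
	window := longest/2 - 1
	if window < 0 {
		window = 0
	}
	matched1 := make([]bool, len(r1))
	matched2 := make([]bool, len(r2))
	matches := 0
	for i := range r1 {
		lo, hi := i-window, i+window+1
		if lo < 0 {
			lo = 0
		}
		if hi > len(r2) {
			hi = len(r2)
		}
		for j := lo; j < hi; j++ {
			if !matched2[j] && r1[i] == r2[j] {
				matched1[i], matched2[j] = true, true
				matches++
				break
			}
		}
	}
	if matches == 0 {
		return 0
	}
	// Count matched characters that appear in a different order in a and b.
	outOfOrder, j := 0, 0
	for i := range r1 {
		if !matched1[i] {
			continue
		}
		for !matched2[j] {
			j++
		}
		if r1[i] != r2[j] {
			outOfOrder++
		}
		j++
	}
	m := float64(matches)
	transpositions := float64(outOfOrder) / 2
	return (m/float64(len(r1)) + m/float64(len(r2)) + (m-transpositions)/m) / 3
}

// similar applies an assumed threshold of 0.9; the project's real value is unknown.
func similar(a, b string) bool {
	return jaro(a, b) >= 0.9
}
```

A purely character-based measure like this cannot tell a plural from an unrelated word that happens to share most of its letters, which is consistent with the first item under Issues.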

The program also removes duplicate words, short words, and proper nouns such as Dursley, Grunnings, and Privet Drive.

Contributors

odhranroche
