compromise
modest natural language processing
npm install compromise

compromise tries its best.

Welcome to v12! - Release Notes here 👍

.match():

compromise makes it simple to interpret and match text:

let doc = nlp(entireNovel)

doc.if('the #Adjective of times').text()
// "it was the blurst of times??"
if (doc.has('^simon says #Verb+')) {
  return doc.match('#Verb .*').text() //'fire the lazer ..'
}

.verbs():

conjugate and negate verbs in any tense:

let doc = nlp('she sells seashells by the seashore.')
doc.verbs().toPastTense()
doc.text()
// 'she sold seashells by the seashore.'

.nouns():

transform nouns to plural and possessive forms:

let doc = nlp('the purple dinosaur')
doc.nouns().toPlural()
doc.text()
// 'the purple dinosaurs'

.numbers():

interpret plaintext numbers

nlp.extend(require('compromise-numbers'))

let doc = nlp('ninety five thousand and fifty two')
doc.numbers().add(2)
doc.text()
// 'ninety five thousand and fifty four'

.topics():

grab subjects in a text:

let doc = nlp(buddyHolly)
doc
  .people()
  .if('mary')
  .json()
// [{text:'Mary Tyler Moore'}]

doc = nlp(freshPrince)
doc
  .places()
  .first()
  .text()
// 'West Philadelphia'

doc = nlp('the opera about richard nixon visiting china')
doc.topics().json()
// [
//   { text: 'richard nixon' },
//   { text: 'china' }
// ]

.contractions():

work with contracted and implicit words:

let doc = nlp("we're not gonna take it, no we ain't gonna take it.")

// match an implicit term
doc.has('going') // true

// transform
doc.contractions().expand()
doc.text()
// 'we are not going to take it, no we are not going to take it.'

Use it on the client-side:

<script src="https://unpkg.com/compromise"></script>
<script src="https://unpkg.com/compromise-numbers"></script>
<script>
  nlp.extend(compromiseNumbers)

  var doc = nlp('two bottles of beer')
  doc.numbers().minus(1)
  document.body.innerHTML = doc.text()
  // 'one bottle of beer'
</script>

or as an es-module:

import nlp from 'compromise'

var doc = nlp('London is calling')
doc.verbs().toNegative()
// 'London is not calling'

compromise is 170kb (minified).

it's pretty fast, and can run on keypress.

it works mainly by conjugating many forms of a basic word list.

The final lexicon is ~14,000 words.

you can read more about how it works, here.
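
To give a feel for how conjugating a basic word list grows a lexicon, here is a toy sketch in plain JavaScript. It is not compromise's real code or data: the word list, the tag names used as keys, and the helpers conjugate and buildLexicon are all made up for illustration, and the rules only cover the simplest regular verbs.

```javascript
// Toy illustration only: not compromise's real conjugation rules or data.
const baseVerbs = ['walk', 'jump', 'bake']

// naive regular-verb conjugation (ignores irregulars and consonant doubling)
function conjugate(infinitive) {
  const stem = infinitive.endsWith('e') ? infinitive.slice(0, -1) : infinitive
  return {
    Infinitive: infinitive,
    PresentTense: infinitive + 's',
    PastTense: stem + 'ed',
    Gerund: stem + 'ing',
  }
}

// expand every base word into a flat { word: tag } lookup
function buildLexicon(verbs) {
  const lexicon = {}
  for (const verb of verbs) {
    for (const [tag, form] of Object.entries(conjugate(verb))) {
      lexicon[form] = tag
    }
  }
  return lexicon
}

const lexicon = buildLexicon(baseVerbs)
console.log(Object.keys(lexicon).length) // 12
console.log(lexicon.baked) // 'PastTense'
```

Three base words become twelve entries here, which is the general shape of the idea: a short hand-written list, multiplied by a set of rules.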

.extend():

set a custom interpretation of your own words:

let myWords = {
  kermit: 'FirstName',
  fozzie: 'FirstName',
}
let doc = nlp(muppetText, myWords)
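
For a longer word list, an object of this shape can be generated rather than typed out. A small sketch in plain JavaScript; the firstNames array is made up, and nothing here calls compromise itself:

```javascript
// build a { word: 'Tag' } object from a plain array of words
// (the names below are hypothetical sample data)
const firstNames = ['kermit', 'fozzie', 'gonzo']
const myWords = Object.fromEntries(firstNames.map(name => [name, 'FirstName']))

console.log(myWords)
// { kermit: 'FirstName', fozzie: 'FirstName', gonzo: 'FirstName' }
```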

or make more changes with a compromise-plugin.

const nlp = require('compromise')

nlp.extend((Doc, world) => {
  // add new tags
  world.addTags({
    Character: {
      isA: 'Person',
      notA: 'Adjective',
    },
  })

  // add or change words in the lexicon
  world.addWords({
    kermit: 'Character',
    gonzo: 'Character',
  })

  // add methods to run after the tagger
  world.postProcess(doc => {
    doc.match('light the lights').tag('#Verb . #Plural')
  })

  // add a whole new method
  Doc.prototype.kermitVoice = function() {
    this.sentences().prepend('well,')
    this.match('i [(am|was)]').prepend('um,')
    return this
  }
})

API:

Constructor

(these methods are on the nlp object)

  • .tokenize() - parse text without running POS-tagging
  • .extend() - mix in a compromise-plugin
  • .load() - re-generate a Doc object from .export() results
  • .verbose() - log our decision-making for debugging
  • .version() - current semver version of the library
Utils
  • .all() - return the whole original document ('zoom out')
  • .found [getter] - is this document non-empty? (true if it has any matches)
  • .parent() - return the previous result
  • .parents() - return all of the previous results
  • .tagger() - (re-)run the part-of-speech tagger on this document
  • .wordCount() - count the # of terms in the document
  • .length [getter] - count the # of characters in the document (string length)
  • .clone() - deep-copy the document, so that no references remain
  • .cache({}) - freeze the current state of the document, for speed-purposes
  • .uncache() - un-freezes the current state of the document, so it may be transformed
Accessors
Match

(all match methods use the match-syntax.)

  • .match('') - return a new Doc, with this one as a parent
  • .not('') - return all results except for this
  • .matchOne('') - return only the first match
  • .if('') - return each current phrase, only if it contains this match ('only')
  • .ifNo('') - Filter-out any current phrases that have this match ('notIf')
  • .has('') - Return a boolean if this match exists
  • .lookBehind('') - search through earlier terms, in the sentence
  • .lookAhead('') - search through following terms, in the sentence
  • .before('') - return all terms before a match, in each phrase
  • .after('') - return all terms after a match, in each phrase
  • .lookup([]) - quick find for an array of string matches
Case
Whitespace
  • .pre('') - add this punctuation or whitespace before each match
  • .post('') - add this punctuation or whitespace after each match
  • .trim() - remove start and end whitespace
  • .hyphenate() - connect words with hyphen, and remove whitespace
  • .dehyphenate() - remove hyphens between words, and set whitespace
  • .toQuotations() - add quotation marks around these matches
  • .toParentheses() - add brackets around these matches
Tag
  • .tag('') - Give all terms the given tag
  • .tagSafe('') - Only apply tag to terms if it is consistent with current tags
  • .unTag('') - Remove this tag from the given terms
  • .canBe('') - return only the terms that can be this tag
Loops
  • .map(fn) - run each phrase through a function, and create a new document
  • .forEach(fn) - run a function on each phrase, as an individual document
  • .filter(fn) - return only the phrases that return true
  • .find(fn) - return a document with only the first phrase that matches
  • .some(fn) - return true or false if there is one matching phrase
  • .random(fn) - sample a subset of the results
Insert
Transform
Output
Selections
Subsets

Plugins:

These are some helpful extensions:

Adjectives

npm install compromise-adjectives

Dates

npm install compromise-dates

Numbers

npm install compromise-numbers

Ngrams

npm install compromise-ngrams

Output

npm install compromise-output

  • .hash() - generate an md5 hash from the document+tags
  • .html({}) - generate sanitized html from the document
Paragraphs

npm install compromise-paragraphs

this plugin creates a wrapper around the default sentence objects.

Sentences

npm install compromise-sentences

Syllables

npm install compromise-syllables

  • .syllables() - split each term by its typical pronunciation

Docs:

Tutorials:
3rd party:
Talks:
Some fun Applications:

Limitations:

  • slash-support: We currently split slashes up as different words, like we do for hyphens, so things like this don't work:
    nlp('the koala eats/shoots/leaves').has('koala leaves') // false

  • inter-sentence match: By default, sentences are the top-level abstraction. Inter-sentence, or multi-sentence matches aren't supported:
    nlp("that's it. Back to Winnipeg!").has('it back') // false

  • nested match syntax: the beauty (and danger) of regex is that you can recurse indefinitely. Our match syntax is much weaker. Things like this are not (yet) possible:
    doc.match('(modern (major|minor))? general')
    Complex matches must be achieved with successive .match() statements.

  • dependency parsing: Proper sentence transformation requires understanding the syntax tree of a sentence, which we don't currently do. We should! Help wanted with this.
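
For the nested-match limitation, one possible workaround is to flatten the pattern by hand and try each flat version in turn. This is a different approach from chaining .match() calls, and only a sketch: optionalPrefix is a hypothetical helper, not something compromise provides, and all it does is build pattern strings.

```javascript
// hypothetical helper: '(prefix)? rest' becomes two flat patterns
function optionalPrefix(prefix, rest) {
  return [prefix + ' ' + rest, rest]
}

// '(modern (major|minor))? general', without the nesting
const patterns = optionalPrefix('modern (major|minor)', 'general')
console.log(patterns)
// [ 'modern (major|minor) general', 'general' ]

// assumption: doc is a compromise document
// const found = patterns.some(pattern => doc.has(pattern))
```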

FAQ

    ☂️ Isn't javascript too...

      yeah it is!
      it wasn't built to compete with NLTK, and may not fit every project.
      string processing is synchronous too, and parallelizing node processes is weird.
      See here for information about speed & performance, and here for project motivations

    💃 Can it run on my arduino-watch?

      Only if it's water-proof!
      Read quick start for running compromise in workers, mobile apps, and all sorts of funny environments.

    🌎 Compromise in other Languages?

      we've got work-in-progress forks for German and French, in the same philosophy.
      and need some help.

    ✨ Partial builds?

      compromise isn't easily tree-shaken.
      the tagging methods are competitive, and greedy, so it's not recommended to pull things out.
      It's recommended to run the library fully.

See Also:

  • naturalNode - fancier statistical nlp in javascript
  • superScript - clever conversation engine in js
  • nodeBox linguistics - conjugation, inflection in javascript
  • reText - very impressive text utilities in javascript
  • jsPos - javascript build of the time-tested Brill-tagger
  • spaCy - speedy, multilingual tagger in C/python

MIT

