spencermountain / compromise
modest natural-language processing
Home Page: http://compromise.cool
License: MIT License
Is it possible to add an exception for the following regex?
/c\.(\ ?[0-9]+)/
Right now I'm using a small script to pre-process the text that I want to analyze with nlp_compromise. The current solution that I am using looks like this:
raw = raw.replace(/c\.(\ ?[0-9]+)/g, 'circa $1');
Basically, any c. YEAR will be replaced by circa YEAR, so nlp doesn't trip over that c. While c. alone might not be significant enough to add to the abbreviations list, this expression matches c. NUMBER, which I think is unambiguous enough. What do you think? Is there a way to add this or other similar, case-specific abbreviations?
(I am not proposing to replace c. with circa; that is just my workaround. I am proposing to add, if possible, an exception for c. YEAR so that sentences are not broken at that period.)
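To illustrate the kind of exception being requested, here is a minimal sketch (not the library's API; the helper name is made up) of a check that only treats "c." as an abbreviation when digits follow:

```javascript
// Hypothetical helper: decide whether the period in "c." ends a sentence.
// Only treat "c." as an abbreviation when a number follows (e.g. "c. 1850").
function isCircaAbbreviation(text, dotIndex) {
  // look at the "c." plus whatever comes after the period
  return /^c\.\s?[0-9]/.test(text.slice(dotIndex - 1));
}

console.log(isCircaAbbreviation('built c. 1850 by monks', 7)); // true
console.log(isCircaAbbreviation('etc. and so on', 3));         // false
```

The regex requires a digit after the period, which is what makes the pattern unambiguous enough to special-case.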
I'd consider these words common enough to be included in the lexicon.
Title says it all.
Tried on both my home and work computers; same error.
Also, it looks like the version on npm is still 0.0.7.
Uncaught TypeError: Cannot read property 'match' of undefined
if (w.match(/^(over|under|out|-|un|re|en).{4}/)) {
  var attempt = w.replace(/^(over|under|out|.*?-|un|re|en)/, '')
  return parts_of_speech[lexicon[attempt]]
}
Scenario: using the library for natural language processing in a calendar assistant. It doesn't recognise "schedule" as a verb.
Would it be possible to pass in some configuration when instantiating the library, e.g. an array of verbs, nouns etc., to let users inject extra words?
In my case I might extend the verbs by passing in an array of my own:
["schedule"]
That seems to work for me if I hack the code and add 'schedule' to the list of verbs...but I don't grok grammar well enough to know if it's completely correct (it becomes an infinitive verb, VBP)
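A sketch of the configuration hook being requested (this API does not exist in the library; the names here are hypothetical): merge user-supplied words into the lexicon before tagging.

```javascript
// Stand-in for the built-in word -> POS-tag lexicon.
var lexicon = { walk: 'VB', give: 'VB' };

// Hypothetical extension point: copy user words into the lexicon.
function extendLexicon(extraWords) {
  Object.keys(extraWords).forEach(function (word) {
    lexicon[word] = extraWords[word];
  });
  return lexicon;
}

extendLexicon({ schedule: 'VBP' }); // tag "schedule" as a present-tense verb
console.log(lexicon.schedule);      // "VBP"
```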
Example: nlp.sentences('How are you! That is great.') returns one sentence not two.
The README mentions it, but I don't see it in the exports nor does the current version published to npm have it.
I have yet to go over the code, but that specific example does not yield a result.
see lexidates:
res.dayS = '\b('.concat(Object.keys(res.days).join('|'), ')\b');
When a string becomes a regex in JavaScript, you must double-escape anything with special regex meaning. So \b should be \\b here - see my original code...
If you want to use it as above, you need to pass it through an escaping function first, e.g. Mozilla's:
function escapeRegExp(string) {
  return string.replace(/([.*+?^=!:${}()|\[\]\/\\])/g, "\\$1");
}
or see dojo's .string utilities ...
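A quick demonstration of why the doubling matters: in a string literal, '\b' is the backspace character, not a word boundary, so the constructed RegExp silently never matches.

```javascript
var days = ['monday', 'tuesday'];

// '\b' in a string is backspace (U+0008), so this pattern contains
// literal backspace characters and can never match normal text.
var broken = new RegExp('\b(' + days.join('|') + ')\b');

// '\\b' survives string parsing as the two characters \b,
// which the RegExp engine reads as a word boundary.
var fixed = new RegExp('\\b(' + days.join('|') + ')\\b');

console.log(broken.test('on monday')); // false
console.log(fixed.test('on monday'));  // true
```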
Very nice library.
When playing with some text pulled from a web article, I noticed that sentence boundary detection does not always work.
For example, the text below is not split into sentences correctly (note there is no space after the periods).
The man who tried to kill former Pope John Paul II 33 years ago showed up at the Vatican on Saturday to put white roses on his tomb and said he wanted to meet Pope Francis.Mehmet Ali Agca, a Turk, left John Paul critically injured after firing several shots in the failed assassination attempt in St. Peter's Square on May 13, 1981.The former pope forgave Agca, once a member of a Turkish far right group known as the Grey Wolves, and went to meet him in 1983 in the Rome prison where he had been sentenced to life imprisonment for the attack.Agca called the Italian daily la Repubblica on Saturday to announce he had arrived in the Vatican, his first visit since the assassination attempt and exactly 31 years after John Paul met him in prison.The visit was confirmed to Reuters by Father Ciro Benedettini, the Vatican's deputy spokesman, who said Agca stood for a few moments in silent meditation over the tomb in St. Peter's Basilica before leaving two bunches of white roses.Agca, 56, was pardoned by Italy in 2000 and extradited to Turkey where he was imprisoned for the 1979 murder of a journalist and other crimes. He was released from jail in 2010.The attack against John Paul, who died in 2005, has remained clouded by unanswered questions over who may have been behind it. An Italian investigative parliamentary commission said in 2006 it was "beyond reasonable doubt" that it was masterminded by leaders of the former Soviet Union.The Vatican on Saturday gave a cool response to Agca's request to meet with Pope Francis. "He has put his flowers on John Paul's tomb; I think that is enough," Vatican spokesman father Federico Lombardi told la Repubblica.
var text = "She was dead. He was ill."
nlp.sentences(text)
// returns only ["She was dead."]
I think it's because the abbreviation regex is picking up ill. as an abbreviation rather than the end of the sentence.
Similarly, nlp.sentences("It was Sunday. He attended mass.") only returns ["It was Sunday."] too.
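A sketch of a stricter check (illustrative, not the library's code): only skip a period when the preceding token is in a known abbreviation list, so short words like "ill" and "mass" still end sentences.

```javascript
// A small whitelist of known abbreviations (illustrative subset).
var abbreviations = ['mr', 'mrs', 'dr', 'st', 'etc', 'vs'];

// A period ends the sentence unless the token before it is a known abbreviation.
function endsSentence(token) {
  var word = token.replace(/\.$/, '').toLowerCase();
  return abbreviations.indexOf(word) === -1;
}

console.log(endsSentence('ill.')); // true  -> split the sentence here
console.log(endsSentence('Dr.'));  // false -> keep reading
```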
In the default mode [without {dont_combine: true}] it would be nice to have phrasal verbs recognized, as they can have a totally new meaning. For example:
My grandfather likes to look back on his childhood.
`look back`
[taken from http://www.englisch-hilfen.de/grammar/phrasal_verbs.htm]
Hm, the last commit does not work properly because in pluralize_rules we have rules for both singular-to-plural AND plural-to-plural, while in singularize_rules there are only plural-to-singular rules (???)
In general I am working on a factory method called "dictionary", based on the "words" and "rules", which our database can auto-translate into several languages, covering the ngrams and metrics etc. And I want to write code against this mode.
For example, in file client_side/nlp.js:5652 (release 1.1.0):
uncountable_nouns = uncountables.reduce(function(h, a) {
h[a] = true
return h
}, {})
ReferenceError: uncountable_nouns is not defined
But it is stated otherwise at https://github.com/spencermountain/nlp_comprimise#named-entity-recognition
We are using nlp_compromise to parse requests for data pulls. In many cases, a product or retailer will get parsed in an undesirable fashion, i.e. "Stop & Shop" will not be thought of as a noun.
Is it possible today, or would it be possible, to allow double-quotes to group words together and default them to a particular part of speech, like NN?
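A sketch of the requested behaviour (hypothetical, no such API exists today): join double-quoted phrases into a single token so the tagger can treat them as one noun.

```javascript
// Hypothetical pre-processing step: collapse double-quoted phrases
// into one underscore-joined token (which could then default to NN).
function groupQuotedPhrases(text) {
  return text.replace(/"([^"]+)"/g, function (match, phrase) {
    return phrase.replace(/ /g, '_');
  });
}

console.log(groupQuotedPhrases('I went to "Stop & Shop" today'));
// I went to Stop_&_Shop today
```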
Regarding a7f1e68 -> src/parents/noun/conjugate/inflect.js, please see e.g.
http://stackoverflow.com/questions/5717126/var-or-no-var-in-javascripts-for-in-loop
It is usually considered better practice to use var in each for loop...
Personal opinion, just saying.
Hey there,
when I initially compared the data, it ignored the prepositions (IN) due to a typo, and our db splits them into pre- and postpositions. I've got that sorted now, and found some prepositions which are listed in other categories:
"before": is also CC,
"round": is also JJ,
"apart": is also RB (but can be preposition "apart from this" OR postposition "this apart")
And we list some prepositions which are not in the array yet (I did NOT check other categories):
[
{en: 'a'},
{en: 'an'},
{en: 'abaft'},
{en: 'abeam'},
{en: 'aboard'},
{en: 'absent'},
{en: 'afore'},
{en: 'alongside'},
{en: 'amidst'},
{en: 'amongst'},
{en: 'anenst'},
{en: 'apropos'},
{en: 'apud'},
{en: 'aside'},
{en: 'astride'},
{en: 'athwart'},
{en: 'atop'},
{en: 'barring'},
{en: 'beneath'},
{en: 'beside'},
{en: 'beyond'},
{en: 'but'},
{en: 'chez'},
{en: 'circa'},
{en: 'concerning'},
{en: 'excluding'},
{en: 'failing'},
{en: 'following'},
{en: 'for'},
{en: 'forenenst'},
{en: 'given'},
{en: 'including'},
{en: 'inside'},
{en: 'like'},
{en: 'mid'},
{en: 'midst'},
{en: 'minus'},
{en: 'modulo'},
{en: 'near'},
{en: 'next'},
{en: 'notwithstanding'},
{en: 'opposite'},
{en: 'outside'},
{en: 'pace'},
{en: 'past'},
{en: 'plus'},
{en: 'pro'},
{en: 'qua'},
{en: 'regarding'},
{en: 'sans'},
{en: 'save'},
{en: 'times'},
{en: 'toward'},
{en: 'underneath'},
{en: 'unto'},
{en: 'worth'},
{en: 'together', description: 'questionable'},
{en: 'vis-à-vis', description: 'questionable'},
{en: 'thru', description: 'informal', meta: {entitySubstitution: ['en']}},
{en: 'thruout', description: 'informal', meta: {entitySubstitution: ['en']}},
{en: 'till', description: 'same as "until", wikipedia: "with prosodic restrictions"'},
{en: 'versus', description: 'NAB conflict: commonly abbreviated as "vs.", or (law or sports) as "v."'},
{en: 'vice', description: 'used as "in place of"'},
{en: 'with', description: 'sometimes written as "w/"'},
{en: 'w/', meta: {entitySubstitution: ['en']}},
{en: 'within', description: 'sometimes written as "w/in" or "w/i"'},
{en: 'w/in', meta: {entitySubstitution: ['en']}},
{en: 'w/i', meta: {entitySubstitution: ['en']}},
{en: 'without', description: 'sometimes written as "w/o"'},
{en: 'w/o', meta: {entitySubstitution: ['en']}},
{en: 'o\'', description: 'apocopic form of "of"', meta: {entitySubstitution: ['en']}}
]
btw - a nice one: https://www.youtube.com/watch?t=108&v=MHX-CiJBVy0
I think maybe this is not working correctly, but since it seems broken for a whole class of verbs, maybe I'm missing something...
nlp.verb('study').to_past()
"studyed"
nlp.verb('apply').to_past()
"applyed"
I'm writing an AngularJS module for this - https://github.com/Kroid/angular-nlp-compromise - in case someone needs it.
I'm terrible with GitHub, and I'll probably screw stuff up trying to do this myself. But anyway, these need adding. Thanks!
Are you using your own internal dictionary / algorithms to do the transformations etc.? If so, and I believe this is the case, there is something off with the conjugation of the verb "load":
{ infinitive: 'loa',
present: 'loads',
past: 'loaded',
gerund: 'loading',
doer: 'loaer',
future: 'will loa' }
Now if I try something else, like "to load":
{ infinitive: 'to load',
present: 'to loads',
past: 'to loaded',
gerund: 'to loading',
doer: 'to loader',
future: 'will to load' }
This doesn't seem right either. Am I doing something wrong with the entry of the string word(s) - some form I am missing? Or is this a corner case in the algorithm, perhaps? Thought I'd at least report it 📦
Otherwise, great solution: much thanks!
for example:
[Orig]
They are based on different physical effects use to guarantee a stable grasping between a gripper and the object to be grasped.
[negate]
They are not based on different physical didn't effects use to doesn't guarantee a stable grasping between a gripper and the object to be grasped.
Maybe negating just the first verb would be sufficient.
For normalizing the input: how about normalizing all typographic characters, like curly and special quotes, to their plain equivalents?
Maybe useful, e.g. for turning O’Reilly into O'Reilly etc.
see http://practicaltypography.com/straight-and-curly-quotes.html
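A minimal sketch of such a normalization step (the character ranges are the Unicode curly-quote code points; this is not the library's code):

```javascript
// Replace typographic quotes with their plain ASCII equivalents.
function normalizeQuotes(str) {
  return str
    .replace(/[\u2018\u2019\u201A\u201B]/g, "'")  // curly/low single quotes
    .replace(/[\u201C\u201D\u201E\u201F]/g, '"'); // curly/low double quotes
}

console.log(normalizeQuotes('O\u2019Reilly said \u201Chello\u201D'));
// O'Reilly said "hello"
```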
And a note to myself: maybe it would be useful to write a "preprocess" test, checking that everything in .js and .min.js ("expanded") is the same.
Hey,
sorry for opening a new one (trying to separate the different questions) :
This is about possessive pronouns (PP), which were not covered in the initial comparison (same reason: they split into different categories in our db - I've made the db compatible now; when you look for PP it will join all 3 categories). I am pasting the question I just added to the code (not sure if there are better expressions for the cases):
// TODO - this covers more than the original :
// possessive pronouns (should) have 3 forms :
// as a possessive (adjective) determiner pronoun (my) OR
// as a possessive (noun) pronoun (mine) OR
// as a reflexive pronoun (myself)
What do you think?
btw: some changes you proposed in contributing.md have also made it into the fork.
E.g. it now has JSDoc documentation (WIP, standard template for now) ...
nlp.pos("he's eating a veggie burger").sentences[0].negate().text();
'he's isn't eating a veggie burger'
adding a quick fix...
First off, this is such a great project!
Do you have any thoughts on returning an array of dates from a parsed sentence? Or more advanced logic like ranges?
It looks like this does a lot of what I've done in https://github.com/silentrob/normalizer but I suspect much faster (basic normalization and commonwealth => american conversion).
I also have some code that deals with numbers and parsing math expressions here: https://github.com/silentrob/superscript/blob/master/lib/math.js.
On version 1.1.3, if you try typing in something like:
require("nlp_compromise").britishize("color");
> "color"
require("nlp_compromise").britishize("favorite");
> "favorite"
require("nlp_compromise").britishize("internationalization");
> "internationalization"
It just returns whatever the input is.
The americanize function works perfectly fine, though.
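The expected behaviour, sketched (the word pairs below are examples for illustration, not the library's real data file): britishize should apply the inverse of the americanize mapping.

```javascript
// Illustrative american -> british spelling pairs.
var american_to_british = {
  color: 'colour',
  favorite: 'favourite',
  internationalization: 'internationalisation'
};

// Look the word up in the mapping; fall through to the input otherwise.
function britishize(word) {
  return american_to_british[word] || word;
}

console.log(britishize('color'));    // "colour"
console.log(britishize('favorite')); // "favourite"
```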
Below is the stack trace --
npm http GET https://registry.npmjs.org/nlp_comprimise
npm http 304 https://registry.npmjs.org/nlp_comprimise
npm http GET https://registry.npmjs.org/nlp_comprimise/-/nlp_comprimise-0.0.3.tgz
npm http 404 https://registry.npmjs.org/nlp_comprimise/-/nlp_comprimise-0.0.3.tgz
npm ERR! fetch failed https://registry.npmjs.org/nlp_comprimise/-/nlp_comprimise-0.0.3.tgz
npm ERR! Error: 404 Not Found
npm ERR! at WriteStream.<anonymous> (/usr/local/Cellar/node/0.10.25/lib/node_modules/npm/lib/utils/fetch.js:57:12)
npm ERR! at WriteStream.EventEmitter.emit (events.js:117:20)
npm ERR! at fs.js:1596:14
npm ERR! at /usr/local/Cellar/node/0.10.25/lib/node_modules/npm/node_modules/graceful-fs/graceful-fs.js:103:5
npm ERR! at Object.oncomplete (fs.js:107:15)
npm ERR! If you need help, you may report this *entire* log,
npm ERR! including the npm and node versions, at:
npm ERR! <http://github.com/isaacs/npm/issues>
npm ERR! System Darwin 13.0.0
npm ERR! command "/usr/local/Cellar/node/0.10.25/bin/node" "/usr/local/bin/npm" "install" "nlp_comprimise" "--save"
npm ERR! cwd /Users/WS/nlp/natural
npm ERR! node -v v0.10.25
npm ERR! npm -v 1.3.24
npm ERR!
npm ERR! Additional logging details can be found in:
npm ERR! /Users/WS/nlp/natural/npm-debug.log
npm ERR! not ok code 0
From Kroid/angular-nlp-compromise#1 :
nlp.spot("joe carter loves toronto");
From docs:
nlp.spot("joe carter loves toronto")
// ["joe carter", "toronto"]
I checked it from the Chrome console on the example page http://rawgit.com/spencermountain/nlp_compromise/master/client_side/cute_demo/index.html
Please note that he's and she's become ['he', 'is'] and ['she', 'is'], but they could also be ['he', 'has'] and ['she', 'has'] (stackexchange).
• How about `it's`?
• Shouldn't the negative contractions be handled here too?
"cannot": ["can", "not"] is the only one.
But how about stuff like
"shouldn't": ["should", "not"]
This would affect logic_negate, I assume.
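A sketch of the suggested mapping (the word list is illustrative, not the library's data):

```javascript
// Negative contractions expanded to their two-word forms.
var negativeContractions = {
  "shouldn't": ['should', 'not'],
  "wouldn't":  ['would', 'not'],
  "couldn't":  ['could', 'not'],
  "isn't":     ['is', 'not'],
  "cannot":    ['can', 'not']
};

// Expand a contraction if we know it; otherwise return the word as-is.
function splitContraction(word) {
  return negativeContractions[word.toLowerCase()] || [word];
}

console.log(splitContraction("shouldn't")); // [ 'should', 'not' ]
console.log(splitContraction('dogs'));      // [ 'dogs' ]
```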
Using this from node gives an error:
Error: Cannot find module './dates'
Steps to reproduce:
nlp = require('nlp_compromise');
Amazing library, thanks guys!
Please add your lib to the bower registry: http://bower.io/ http://bower.io/docs/creating-packages/
Heya,
This is a wonderful library. I'm hoping to use it to extract dates in a project, but I noticed that January consistently fails to be extracted properly in tests. I'm wondering if this is a subtle bug with indexes / accidental type coercion of 0 to false in date_extractor.coffee.
I'd be happy to help you track down the issue if you have trouble.
Here is an example I just tried on master:
nlp.value("Today is January 7, 2015").date()
{ month: null,
day: 7,
year: 2015,
to_day: null,
to_year: 2015,
to_month: null }
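The suspected bug, isolated (a sketch, not the extractor's actual code): months are 0-indexed, so January is 0, and a plain truthiness test treats it as "no month found".

```javascript
// Buggy pattern: 0 is falsy, so January falls through to null.
function pickMonthBuggy(month) {
  return month ? month : null;
}

// Fixed: test for null/undefined explicitly so 0 survives.
function pickMonthFixed(month) {
  return month != null ? month : null;
}

console.log(pickMonthBuggy(0)); // null -> January is lost
console.log(pickMonthFixed(0)); // 0    -> January survives
```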
"Twenty five" should register as one number;
"sixteen one" should not.
Hey there,
again: this is not an issue.
The changes recently done are totally fine, but let me explain why I made (or am planning to make) certain changes in the fork https://github.com/redaktor/nlp_compromise
As a European I would love this project to be as multilingual as possible ;)
The changes have these goals:
• for contributing: be fully self-explanatory and readable
• for transport: be browser-friendly and thus very small
• completely separate data / language logic / project logic
Three new files in src/data:
dictionary.js - the file where we can contribute multilingual words in the categories, like in the readme.
dictionary_rules.js (tba) - the file where we can contribute multilingual rules.
_build.js - builds the data modules for one/some/all languages. This could also be the first grunt step. It will generate or overwrite a folder like 'en'. Check it out: node _build -l
Basically I am planning to let the build script generate a customized client-side file and additional AMD browser modules.
See for instance the module.exports lines: there are more than 30 of them, but they are useless in the browser, and apart from that I'd optimize the compression for the browser a bit further.
I am also trying to avoid further duplicates. For example, in phrasal verbs: some verbs are already in the verb data module and some adjectives are already in the adj. module ...
When it is complete:
• each module, e.g. in /parents, should only be a little bit of 'project logic'
• our database can auto-translate
• I could attach our web interface to encourage translators even more ;)
Hey,
please see https://github.com/spencermountain/nlp_compromise/blob/master/src/parents/noun/index.js
referenced_by uses the var posessives (typo?). It is defined in the scope of the module and is
{
"his": "he",
"her": "she",
"hers": "she",
"their": "they",
"them": "they",
"its": "it"
}
while reference_to uses the var possessives, defined in the scope of the function, which is just
{
"his":"he",
"her":"she",
"their":"they"
}
Shouldn't both be the same and maybe
{
mine: 'i',
yours: 'you',
his: 'he',
her: 'she',
its: 'it',
our: 'we',
their: 'they',
them: 'they'
}
?
Hey there,
contributing from my fork doesn't make sense because the structure will soon change to 'only the 3 dictionary files and a factory'.
However, let me ask some performance questions.
Maybe I missed something hidden in the code, but several 'autoclosure' (IIFE) functions run every time a module is required.
Let's take an example: the conjugation of verbs, which is used quite often.
I'll use simple console.log calls to demonstrate it.
Put some logs in the conjugate function:
the.conjugate = function() {
  console.log('BEWARE! conjugate is conjugating');
  verb_conjugate = require('./conjugate');
  var conjugated = verb_conjugate(the.word);
  console.log('conjugate result', conjugated);
  return conjugated; // verb_conjugate(the.word);
}
and in the 'autoclosure' form function
the.form = (function() {
  console.log('BEWARE! the.form is conjugating');
  verb_conjugate = require('./conjugate');
  // don't choose infinitive if infinitive == present
  var order = [
    'past',
    'present',
    'gerund',
    'infinitive'
  ];
  var forms = verb_conjugate(the.word);
  console.log('forms result', forms);
  for (var i = 0; i < order.length; i++) {
    if (forms[order[i]] === the.word) {
      return order[i];
    }
  }
})()
When I do
console.log( nlp.verb('last') );
it will conjugate (even though nothing asked for a conjugation), and when I do
console.log( nlp.verb('last').conjugate() );
it will conjugate twice.
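One way to avoid the double work, sketched as generic memoization (the names are illustrative, not the library's API): conjugate each word at most once and let both .form and .conjugate() share the cached result.

```javascript
// Cache the result of an expensive single-argument function per word.
function memoize(fn) {
  var cache = {};
  return function (word) {
    if (!(word in cache)) {
      cache[word] = fn(word);
    }
    return cache[word];
  };
}

var calls = 0;
var conjugate = memoize(function (word) {
  calls += 1; // stands in for the expensive rule matching
  return { past: word + 'ed' };
});

conjugate('last');
conjugate('last');
console.log(calls); // 1
```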
Sentence:
joe carter plays patiently in toronto
Steps to reproduce:
Result:
joe carter didn't playe patiently in toronto
Currently there is no way to use nlp.ngram() to perform a simple word-frequency calculation (i.e. ngrams of size one). Setting the max_size option to 1 produces ngrams of size 2; setting max_size to 0 gives the same result. I suspect these two lines are responsible: https://github.com/spencermountain/nlp_compromise/blob/master/src/methods/tokenization/ngram.js#L11 (where max_size is incremented - why?) and https://github.com/spencermountain/nlp_compromise/blob/master/src/methods/tokenization/ngram.js#L6 (where, since 0 is falsy, max_size is assigned the value 5).
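The two suspected problems, isolated in a sketch (not the module's actual code): `requested || 5` turns a requested 0 into 5, and an off-by-one increment shifts every requested size up by one.

```javascript
// Reproduces the suspected behaviour of the two lines in ngram.js.
function resolveMaxSizeBuggy(requested) {
  var max_size = requested || 5; // 0 and undefined both become 5
  return max_size + 1;           // the questionable increment
}

// One possible fix: default only when the option is truly absent.
function resolveMaxSizeFixed(requested) {
  return requested === undefined ? 5 : requested;
}

console.log(resolveMaxSizeBuggy(1)); // 2 -> bigrams instead of unigrams
console.log(resolveMaxSizeFixed(1)); // 1
```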
Pretty self-explanatory: if there are multiple whitespace characters between words, sentence detection collapses these characters together.
Hey, I am quite new to Meteor and to computer science generally. I am a linguist trying to learn computational linguistics, and I guess I am still struggling. I was wondering how it would be possible to use this on my own corpus? Let's say that I have a list of sentences, and whenever I choose a sentence I want to see its properties. Would that be possible?
I'm new to nlp and am weak on my grammar, so maybe I'm barking up the wrong trees.
I'm using nlp_compromise
to switch the verb-tense in sentences from past to present, or present to past, using nlp.verb(vb).to_present()
or nlp.verb(vb).to_past()
as required.
It's working great for the most part, except when I try to swap the tense of "They are friends" or "They were friends".
Is there some other way I should be going about this, am I using the wrong tools, or is this something that can be extended with some new rules?
Hi,
in date_extractor.js, lines 24 to 35, the replace regex turns dates in the format "Feb. 14, 1969" into "February14, 1969" (no space between the month and the day), leading the parser to skip the date and match only the year.
Fixed by surrounding the replaced month names with spaces:
text = text.replace(/ Feb\.? /g, ' February ');
text = text.replace(/ Mar\.? /g, ' March ');
text = text.replace(/ Apr\.? /g, ' April ');
text = text.replace(/ Jun\.? /g, ' June ');
text = text.replace(/ Jul\.? /g, ' July ');
text = text.replace(/ Aug\.? /g, ' August ');
text = text.replace(/ Sep\.? /g, ' September ');
text = text.replace(/ Oct\.? /g, ' October ');
text = text.replace(/ Nov\.? /g, ' November ');
text = text.replace(/ Dec\.? /g, ' December ');
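The same fix can be written table-driven (a sketch, under the same assumption that the abbreviations appear space-delimited): keeping the surrounding spaces means "Feb. 14" expands to "February 14", not "February14".

```javascript
// Abbreviation -> full month name lookup table.
var months = {
  Feb: 'February', Mar: 'March', Apr: 'April', Jun: 'June',
  Jul: 'July', Aug: 'August', Sep: 'September',
  Oct: 'October', Nov: 'November', Dec: 'December'
};

// Replace each abbreviated month, preserving the spaces around it.
function expandMonthAbbreviations(text) {
  return text.replace(/ (Feb|Mar|Apr|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\.? /g,
    function (match, abbr) { return ' ' + months[abbr] + ' '; });
}

console.log(expandMonthAbbreviations('Born on Feb. 14, 1969'));
// Born on February 14, 1969
```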
I looked through a variety of files but haven't found either a list or a method where I can append known people/organization names for recognition via .spot -- does this exist and I'm just missing it?
@spencermountain
Please see this demo http://expresso-app.org/tutorial ...
I made the same demo with your nice project, more or less lazily, by porting the "python metrics logic" to .js.
The advantages are: .js only and it updates on keypress ... Think of a better http://www.hemingwayapp.com ;))
I will work on it later today. I also pointed the author of expresso to your project.
The method could be contributed either as a .metrics() function at the "root level" used in a demo, or as a standalone demo. Just tell me if you are interested by writing to @redaktor (I'll close this directly).
Thank you for starting to produce this missing JavaScript puzzle piece!
Hey,
I just committed nearly the last changes to the fork
https://github.com/redaktor/nlp_compromise
before I can open a pull request.
I still need to eliminate:
• the 'hardcoded' dups in lexicon generation
• the last 37/1360(?) failing tests
The lexicon will then be at least 10% smaller, and I really think that, starting with this structure, language-dependent contributing can become easy.
Just mentioning it because I saw you were recently active ...
Hi,
I'm getting this error when trying to parse certain strings.
I've put in a hack for the function to always return null, as I'm not using date extraction, but it's not a real fix.
/node_modules/nlp_compromise/src/parents/value/coffeejs/date_extractor.js:224
h[k] = arr[places[k]];
^
TypeError: Cannot read property '1' of null
at /node_modules/nlp_compromise/src/parents/value/coffeejs/date_extractor.js:224:21
at Array.reduce (native)
at Object.regexes.process (/node_modules/nlp_compromise/src/parents/value/coffeejs/date_extractor.js:223:36)
at main (/node_modules/nlp_compromise/src/parents/value/coffeejs/date_extractor.js:334:20)
at the.date (/node_modules/nlp_compromise/src/parents/value/index.js:13:11)
at /node_modules/nlp_compromise/src/parents/value/index.js:38:11
at new Value (/node_modules/nlp_compromise/src/parents/value/index.js:45:4)
at Object.parents.value (/node_modules/nlp_compromise/src/parents/parents.js:22:10)
at /node_modules/nlp_compromise/src/pos.js:366:47
at Array.map (native)