winkjs / wink-nlp

Developer friendly Natural Language Processing ✨

License: MIT License

JavaScript 99.76% TypeScript 0.24%

natural-language-processing nlp tokenize sbd sentence-boundary-detection negation-handling sentiment-analysis pos-tagging ner named-entity-extraction

wink-nlp's Issues

Trouble using the word "day" in customEntities pattern

I'm trying to use learnCustomEntities to match various patterns. It's working great for most of them, but for some reason I'm having trouble with it picking up the word "day". Here's a brief sample:

const types = [
{ name: 'today', patterns: [ 'day', 'day is it', 'have the date', 'day it is'] },
{ name: 'where', patterns: [ 'wheres', 'where', 'get to','find', 'place' ] },
];

nlp.learnCustomEntities( types, {matchValue: false, usePOS: true, useEntity: true } );
const doc = nlp.readDoc( text );
const type = doc.customEntities().out( its.detail );

A text string like "how do you get to the taco shop" returns "where", but a text string like "what day is it" returns nothing - it seems like none of the phrases including the word "day" get picked up.

Any thoughts? Am I doing something wrong here?

[Question type] Is pattern matching fully explained in documentation?

Document: https://winkjs.org/wink-nlp/custom-entities.html

Statements "Each option is separated by a vertical pipe character as in [NOUN|PROPN]" and "An option may be empty as in the case of the first two sets of options — [|DET] and [|ADJ]" suggest that the symbol "|" can be used as OR operator between 2 options, and if at beginning, the option can be absent.

Are there any more special symbols we can use to achieve certain pattern matching, besides "^" for escaping?

Is it possible for me to train my own models?

Hello there, is it possible to train my own, say, massive model?
If so, how should I go about it?
What type of data would I feed it, and how?

I'm just getting started in NLP.

non-breaking space does not appear in token stream

const winkNLP = require( 'wink-nlp' );
const model = require( 'wink-eng-lite-web-model' );
const nlp = winkNLP( model );

const text = 'Hello\u00a0World';
const doc = nlp.readDoc(text);

doc.sentences().each(sentence => {
  sentence.tokens().each(token => {
    console.log([token.out(nlp.its.precedingSpaces), token.out()])
  })
})

This does not return \u00a0 in either precedingSpaces or token.out().

Custom entity Matching sequence not working

  var text = `Go grocery shopping on Sept 21st`;
   
  const patterns = [
    { name: 'TEST', patterns: ['[CARDINAL|ORDINAL] Sept [CARDINAL|ORDINAL]'] },
  ];
  nlp.learnCustomEntities( patterns );

  var doc = nlp.readDoc(text);

When creating a pattern that contains CARDINAL or ORDINAL, it prioritizes the CARDINAL/ORDINAL over the custom entity. In this example the output is {value: '21st', type: 'ORDINAL'} instead of the intended {value: 'Sept 21st', type: 'TEST'}

I get this anytime I make any pattern with Cardinal or Ordinal in it.

NOTE: I am not sure why "Sept" isn't picked up as a date in the first place. Only the short form "Sep" works, even though "Sept" is also widely used. That is the main reason I am trying to work around it with a custom pattern.

Creating Special Entities

I'd like to manually create Special Entities in my text, based on matching a string, but there doesn't seem to be a way to use a custom function to do this. So, instead, I'm creating patterns that contain the strings I'm looking for, then training the model with these strings before loading the text.

Weirdly, some of my patterns are causing the following error:

    if ( fsm[ state ][ otherwise ] ) {
                     ^

TypeError: Cannot read properties of undefined (reading ' otherwise')
at Object.recognize (...\node_modules\wink-nlp\src\automaton.js:408:26)

Here's an example of a string I'm using in a pattern that causes this:

'at night - the scn tells the brain to make more melatonin so you get drowsy'

Named-Entity Recognition and Proper Nouns?

Hey all,

Firstly this is an amazing library, thank you for doing such great work.

I have two separate but related questions/comments:

How does named-entity recognition work (not custom)? From reading the docs, it looks like it's supposed to work out of the box with the ner pipeline and then printing out .entities(). However this seems to only support the following: date, ordinal, cardinal, money, percent, time, duration, hashtag, emoji, emoticon, email, url and mention. This is well-documented, but I was surprised to find it didn't support tagging e.g. Person or Organization names, where NER has a lot of value. As-is, I can't use the ner pipeline since I'm really only looking to filter out Proper Nouns/Names, which I'd imagine is a common use-case. Is this on the roadmap for future support, or is there something I'm missing here?

I would expect to be able to parse the sentence "John Doe and Susan Smith work at Coca-Cola" and see three (five?) entities in it. Instead there are none.

On the subject of proper nouns, the implementation seems to naively tag the first word of most sentences as a proper noun (unless it's overruled by another specific part-of-speech), only because it's capitalized. Some examples I've found from just cursory testing:

Sentence is:  Main menu
Tokens:  (2) ['PROPN', 'NOUN']

Sentence is:  Contents
Tokens:  ['PROPN']

Sentence is:  Current events
Tokens:  (2) ['PROPN', 'NOUN']

Sentence is:  Following Vogel's firing after the season, he said, "I'm not sure what his issue was with me."
Tokens:  (23) ['PROPN', 'PROPN', 'PART', 'NOUN' ...

Sentence is:  Fourth triple-double season and career triple-doubles record
Tokens:  (11) ['PROPN', 'ADJ', 'PUNCT', 'ADJ', 'NOUN', ...

I can simply guard against this in my own code of course, but it does seem like something that can be improved in the model. Looking forward to hearing your take on this feedback!

(For context, I'm using wink-eng-lite-web-model)

How does this package relate to wink-sentiment, wink-post, etc.?

I've been using various wink packages for some time, and they're great!

I just realised that this package existed today, and now I'm wondering if I should have been using it instead of the individual packages. It doesn't seem this is listed on your website. Is it new?

Any advice on how you see the whole family of libraries relating to each other?

Custom entities, POS and disambiguating

Consider the following phrase:

"This afternoon I plan to go to the bookstore and buy a book on Go".

In the example, go is used as a verb and as a noun (programming language).

I'd like to detect the programming language Go. I noticed that when I tag the phrase with POS, the first go is tagged as VB and the second as NNP.

Is there a way to tell the learnCustomEntities method that I want to detect the word Go only when it's an NNP?

const text = 'This afternoon I plan to go to the bookstore and buy a book on Go.';
const patterns = [
  { name: 'go', patterns: [ 'Go&NNP' ] },
];
nlp.learnCustomEntities(patterns);
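As a possible workaround while the pattern question is open, the word can be filtered by its tag after tagging. This is a minimal sketch in plain JavaScript, not a wink-nlp feature: `findByPos` is a hypothetical helper, and the two arrays are hard-coded stand-ins for what `doc.tokens().out()` and `doc.tokens().out( its.pos )` would return. It assumes the proper-noun tag is `PROPN`, as in the `printTokens()` output quoted in other issues here, rather than `NNP`.

```javascript
// Hypothetical helper: return the positions where `word` appears with the POS tag `pos`.
// `tokens` and `tags` are parallel arrays of equal length.
function findByPos( tokens, tags, word, pos ) {
  const hits = [];
  tokens.forEach( ( token, i ) => {
    // Compare the word case-insensitively, but require an exact tag match.
    if ( token.toLowerCase() === word.toLowerCase() && tags[ i ] === pos ) {
      hits.push( i );
    }
  } );
  return hits;
}

// Hard-coded sample data standing in for real tagger output (an assumption).
const sampleTokens = [ 'I', 'plan', 'to', 'go', 'and', 'buy', 'a', 'book', 'on', 'Go', '.' ];
const sampleTags = [ 'PRON', 'VERB', 'PART', 'VERB', 'CCONJ', 'VERB', 'DET', 'NOUN', 'ADP', 'PROPN', 'PUNCT' ];

// Only the second "Go" (index 9) is tagged as a proper noun.
console.log( findByPos( sampleTokens, sampleTags, 'go', 'PROPN' ) );
```

The result depends entirely on the tagger output, so a mis-tagged "Go" would be missed.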

Typescript imports don't work correctly

I seem to only be able to import with the following:

import * as winkNlp from "wink-nlp";
import * as winkEngLiteWebModel from "wink-eng-lite-web-model";
const nlp = winkNlp(winkEngLiteWebModel); // throws: TS2349: This expression is not callable.

Side note: It would really help if your documentation didn't assume that the reader has already imported and set up wink-nlp.

Invalid value of sentiment

In sentence "The abandoned amusement park was now a haunting place, with decaying rides and an eerie atmosphere." I get "-1.3877787807814457e-17" sentiment. It is less than -1.
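Note that the reported value is written in scientific notation: `e-17` means "times 10 to the power of -17", so it is a very small negative number close to zero, not a number below -1. It looks like ordinary floating-point rounding residue, though that is an assumption about the cause. A minimal sketch of how such values could be snapped to zero before display; `snapToZero` and its tolerance are hypothetical, not part of wink-nlp:

```javascript
// Hypothetical helper: treat magnitudes below a small tolerance as exactly zero.
function snapToZero( value, epsilon = 1e-9 ) {
  return Math.abs( value ) < epsilon ? 0 : value;
}

const reported = -1.3877787807814457e-17;

console.log( reported < -1 );          // false: the value is far above -1
console.log( snapToZero( reported ) ); // 0
console.log( snapToZero( -0.25 ) );    // -0.25: real scores pass through unchanged
```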

Typescript support missing for `wink-eng-lite-model`

I can't import the model using TypeScript.

Add its.lemma property to its helper

what is the default behaviour of `learnCustomEntities`?

I had a lot of trouble using learnCustomEntities until I discovered the second config parameter with its usePOS: true setting.

https://winkjs.org/wink-nlp/learn-custom-entities.html

matchValue - the docs show both true and false as the default. Something is not right there.
usePOS - The docs say usePOS defaults to true, but does it actually default to false? I was not able to match NOUN until I explicitly set usePOS: true in the config, so I think the default is false. In either case, can we also document the config options on the custom entities page? I was very confused following these examples: https://winkjs.org/wink-nlp/custom-entities.html (I actually stumbled across the config options in the source code rather than reading about them in the docs)

Custom entities are a great feature; it would be great to make them easier to understand and use. Thanks :)

Pos tagging for imperative sentence is inconsistent

Hi,
I ran into a corner case with POS tagging for imperative sentences like:
Suppose I tell you that it is true.
If I run this sentence on its own, it works as expected:

import winkNLP from 'wink-nlp';
import model from 'wink-eng-lite-web-model';
const nlp = winkNLP(model);
nlp.readDoc('Suppose I tell you that it is true.').printTokens();

token      p-spaces   prefix  suffix  shape   case    nerHint type     normal/pos
———————————————————————————————————————————————————————————————————————————————————————
Suppose           0   Su      ose     Xxxxx   3       0       word     suppose / VERB
I                 1   I       I       X       2       0       word     i / PRON
tell              1   te      ell     xxxx    1       0       word     tell / VERB
you               1   yo      you     xxx     1       0       word     you / PRON
that              1   th      hat     xxxx    1       0       word     that / SCONJ
it                1   it      it      xx      1       0       word     it / PRON
is                1   is      is      xx      1       0       word     is / AUX
true              1   tr      rue     xxxx    1       0       word     true / ADJ
.                 0   .       .       .       0       0       punctuat . / PUNCT

If I run it after reading another sentence first, the POS of "Suppose" changes to PROPN:

nlp.readDoc('I watch TV every day.').printTokens();
nlp.readDoc('Suppose I tell you that it is true.').printTokens();

token      p-spaces   prefix  suffix  shape   case    nerHint type     normal/pos
———————————————————————————————————————————————————————————————————————————————————————
I                 0   I       I       X       2       0       word     i / PRON
watch             1   wa      tch     xxxx    1       0       word     watch / VERB
TV                1   TV      TV      XX      2       0       word     tv / NOUN
every             1   ev      ery     xxxx    1       0       word     every / DET
day               1   da      day     xxx     1       0       word     day / NOUN
.                 0   .       .       .       0       0       punctuat . / PUNCT

total number of tokens: 6

token      p-spaces   prefix  suffix  shape   case    nerHint type     normal/pos
———————————————————————————————————————————————————————————————————————————————————————
Suppose           0   Su      ose     Xxxxx   3       0       word     suppose / PROPN
I                 1   I       I       X       2       0       word     i / PRON
tell              1   te      ell     xxxx    1       0       word     tell / VERB
you               1   yo      you     xxx     1       0       word     you / PRON
that              1   th      hat     xxxx    1       0       word     that / SCONJ
it                1   it      it      xx      1       0       word     it / PRON
is                1   is      is      xx      1       0       word     is / AUX
true              1   tr      rue     xxxx    1       0       word     true / ADJ
.                 0   .       .       .       0       0       punctuat . / PUNCT

The problem occurs only with certain sentences or words; I haven't figured out which yet. For example:

 nlp.readDoc('I like playing football').printTokens();
 nlp.readDoc('Suppose I tell you that it is true.').printTokens();

produces the correct response:
Suppose           0   Su      ose     Xxxxx   3       0       word     suppose / VERB

Could it be related to caching? Also, is there an easy way to disable the cache, or to make the library parse a sentence in isolation without loading the model again?

versions of packages:
"wink-eng-lite-web-model": "^1.8.0",
"wink-nlp": "^2.3.0",

typescript definitions not found

Hello,

I'm getting the following linting errors in nextjs:

'BM25VectorizerConfig' is declared but its value is never read.ts(6133)
Could not find a declaration file for module 'wink-nlp/utilities/bm25-vectorizer'. '/home/robert/Documents/dank-edge/node_modules/.pnpm/[email protected]/node_modules/wink-nlp/utilities/bm25-vectorizer.js' implicitly has an 'any' type.
  Try `npm i --save-dev @types/wink-nlp` if it exists or add a new declaration (.d.ts) file containing `declare module 'wink-nlp/utilities/bm25-vectorizer';`

I see that the types for wink-nlp/utilities/bm25-vectorizer exist but for some reason they are not getting picked up. I have

    "esModuleInterop": true,
    "allowSyntheticDefaultImports": true,

In my typescript config.

Edit: it looks like a workaround is importing winknlp first, even though I'm not using it.

import winkNLP from "wink-nlp";
import BM25VectorizerConfig from "wink-nlp/utilities/bm25-vectorizer";

Question: Text Summarization

Hello again, Sanjaya.

Sorry to bother you with this.

While I am fairly certain that we had a brief discussion about Wink's ability to summarize unstructured text, I've lost track of where that discussion might live. I seem to recall that you also provided an example.

Do you have any sense for where that discussion might live and, if not, can you point me to examples of such summarization?

Thanks very much.

Cordially,

Paul

readDoc() breaks with long string of numbers

While parsing some text found in the wild, I noticed that wink-nlp breaks when parsing a long string of comma-separated numbers.
No error is thrown; winkNlp simply never returns from the .readDoc call.

Here is code that breaks:

const text = '47,47,32,67,111,112,121,114,105,103,104,116,32,74,111,121,101,110,116,44,32,73,110,99,46,32,97,110,100,32, '
const nlp = winkNlp(model, []);
const doc = nlp.readDoc(text.toLowerCase());

The same code with a long string of letters has no issue:

const text = 'abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz'
const nlp = winkNlp(model, []);
const doc = nlp.readDoc(text.toLowerCase());

Add key sentences detection image

Ways to highlight most interesting phrases in a text using sentiment analysis

I'm looking at ways to highlight most interesting phrases in a text using sentiment analysis and the first way I thought I could do it is by identifying phrases which have the most scored words.

Let's take this text as an example:

Modern card issuer Marqeta has announced its partnership with FNBO to expand its partner ecosystem and allow customers to launch modern credit card programmes. The new collaboration aims to modernise the credit card offering and meet clients’ demands, offering a more flexible and reliable digital experience. FNBO and Marqeta will allow companies to easily launch credit cards using the latter’s APIs and embed the card experience within their app ecosystem. The Marqeta platform benefits from a self-service dashboard to update credit products according to clients’ needs, while client companies can instantly extend credit applications, decisions, and onboard accounts, among other features. First National Bank of Omaha is a subsidiary of First National of Nebraska, with offices in Nebraska, Colorado, Illinois, Kansas, and Texas, among others. It is a six-generation privately own BaaS. Marqeta is headquartered in California and is certified to operate and offer its flexible payment solutions in 36 countries globally.

The bolded phrase has the most scored words, therefore it's highlighted as an interesting text to read.
My question is whether there are better ways to do this, because sometimes the highlighted text may seem out of context.

A second question: the text above has
{score: 8, normalizedScore: 0.3121}
Which of the two properties better indicates the text's sentiment?

Preserving blank space at the end of sentence

I would like wink-nlp to support preserving blank space at the end of a sentence when using doc.sentences().out(). A few other SBD libraries do this, and it's useful to have. Thanks.
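Until something like this is supported, the trailing whitespace can be recovered from the original text. A minimal sketch in plain JavaScript, assuming each sentence string returned by the library occurs verbatim and in order in the source text; `withTrailingSpace` is a hypothetical helper, and the `sentences` array is a hard-coded stand-in for `doc.sentences().out()`:

```javascript
// Hypothetical helper: re-attach the whitespace that follows each sentence in `text`.
// Assumes every sentence occurs verbatim in `text`, in the same order.
function withTrailingSpace( text, sentences ) {
  const result = [];
  let cursor = 0;
  for ( const sentence of sentences ) {
    const start = text.indexOf( sentence, cursor );
    if ( start === -1 ) {
      // Sentence not found verbatim: keep it unchanged rather than guess.
      result.push( sentence );
      continue;
    }
    let end = start + sentence.length;
    // Extend the slice over any whitespace that directly follows the sentence.
    while ( end < text.length && /\s/.test( text[ end ] ) ) end += 1;
    result.push( text.slice( start, end ) );
    cursor = end;
  }
  return result;
}

const text = 'Hello there.  How are you?\nFine. ';
const sentences = [ 'Hello there.', 'How are you?', 'Fine.' ];

// Joining the pieces reproduces the original text, whitespace included.
console.log( withTrailingSpace( text, sentences ).join( '' ) === text ); // true
```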

Add wiki timeline image

Is it possible to unmark marked text?

Hey :)
I'm not sure if this is the right place, but I'm gonna post here because the markup method appears on items, and isn't a helper method.

I see it is possible to wrap an item using markup. However, once the item is marked up, I don't see a way to unmark it.
I tried item.markup("", ""), but I assume that doesn't work because the mark isn't part of the item.

Is it possible to overwrite the doc string and then re-mark where I want to? doc.text = doc.out().replaceAll('<mark>', '').

Is there another method I'm missing?
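One stopgap that does not depend on any library feature is to strip the markers from the marked-up string after it has been produced. This is a minimal sketch in plain JavaScript; `stripMarkers` is a hypothetical helper, the sample string is made up, and it only cleans the output string, not the marks stored on the document. Reading the text again with `nlp.readDoc()` to get a fresh, unmarked document may be another option.

```javascript
// Hypothetical helper: remove every occurrence of a known marker pair from a string.
function stripMarkers( markedUp, beginMarker, endMarker ) {
  return markedUp
    .split( beginMarker ).join( '' )
    .split( endMarker ).join( '' );
}

// Made-up sample of marked-up output.
const markedUp = 'Profits jumped <mark>76%</mark> to <mark>$1.13 billion</mark>.';

console.log( stripMarkers( markedUp, '<mark>', '</mark>' ) );
// Profits jumped 76% to $1.13 billion.
```

This only works if the markers cannot also occur in the underlying text.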

Thanks for reading.

Add Spanish Model - How to Help

I run a language learning software company. We would benefit greatly from quality in-browser (and nodejs) nlp tools. How can we get started adding support for Spanish, French, German, and other languages?

Is it possible to set up a winkModel (using the `wink-eng-lite-web-model`) in a Worker?

I'm trying to set up a web worker to handle some nlp stuff.
The project I'm working on requires me to use the web-model.

I've tried a few different approaches, but I always end up getting this error:

Pointing to this section of code:

I've tried importing wink in different ways, but I don't think that's the issue. I've also configured esbuild to bundle it for platform: "browser", but that hasn't changed the error.
I'm not sure if this issue is related to wink, but any help would be appreciated.

This worker code should reproduce the error:

self.onmessage = async function (e) {
	const model = require("wink-eng-lite-web-model");
	const wink = require("wink-nlp");
	const winkModel = wink(model);

	self.postMessage(e.data);
};

Is the data sent across the network to perform NLP operations when integrated in the browser?

Hi Team,
Is the data sent across the network to perform NLP operations when integrated in the browser?

Regards,
Hemasundara.

Testing error

The following test is failing:

const winkNLP = require('wink-nlp')
    , model = require('wink-eng-lite-model')
    , nlp = winkNLP(model)
;
let en_text = 'Dear Sonia, good morning! For elaboration I ask the following information: As for the admission I clarify: There is no device'
    , doc = nlp.readDoc(en_text)
;
console.log(doc.sentences().out());
console.log(doc.entities().out());
console.log(doc.tokens().out());

With the error:

/Volumes/Work/Developer/Web/rdr_playground/node_modules/wink-eng-lite-model/src/pos-updater.js:1
var updater=function(a,b,c){var d=b.cache;for(let e=0;e<a.length;e+=1){const f=a[e][2],g=a[e][0];0>f?c[g]=Math.abs(f):d.isMemberPOS(b.tokens[4*g],f)&&(c[g]=f)}};module.exports=updater;
                                                                                                                        ^

TypeError: Cannot read property 'isMemberPOS' of undefined
    at updater (/Volumes/Work/Developer/Web/rdr_playground/node_modules/wink-eng-lite-model/src/pos-updater.js:1:121)
    at Object.readDoc (/Volumes/Work/Developer/Web/rdr_playground/node_modules/wink-nlp/src/wink-nlp.js:272:7)
    at Object.<anonymous> (/Volumes/Work/Developer/Web/rdr_playground/test.js:17:17)
    at Module._compile (internal/modules/cjs/loader.js:1128:30)
    at Object.Module._extensions..js (internal/modules/cjs/loader.js:1167:10)
    at Module.load (internal/modules/cjs/loader.js:983:32)
    at Function.Module._load (internal/modules/cjs/loader.js:891:14)
    at Function.executeUserEntryPoint [as runMain] (internal/modules/run_main.js:71:12)
    at internal/main/run_main_module.js:17:47

copy-edit examples compiler's comments

Questions about wink-distance

Hello Sanjaya,

I've just stumbled upon your wink.js and am very intrigued. But please note that I am a relative noob in the ML/NLP world; trying to learn.

I have a few questions about the use of wink-distance to detect similarities between raw, unstructured texts:

Do you think wink-distance is suitable for such a task?
Am I right that one must create a "bag of words" from the raw text as proper input to wink-distance?
In wink-distance/test/bow-cosine-spec.js is { a: { the: 2, dog: 1, chased: 1, cat: 1 }, b: { the: 2, dog: 1, chased: 1, cat: 1 } } an example of a "bow?"
If so, does wink.js offer a way to create a "bow?"
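To make the "bag of words" idea concrete, here is a minimal sketch in plain JavaScript that builds a word-count object of the same shape as the one in the spec file and compares two of them with cosine similarity. The names `bagOfWords` and `cosineSimilarity` and the simple regex tokenizer are assumptions for illustration; they are not the wink-distance or wink-nlp implementation.

```javascript
// Hypothetical helper: count word occurrences in lower-cased text.
function bagOfWords( text ) {
  // A prototype-free object avoids clashes with keys like "constructor".
  const bow = Object.create( null );
  const words = text.toLowerCase().match( /[a-z0-9']+/g ) || [];
  for ( const word of words ) bow[ word ] = ( bow[ word ] || 0 ) + 1;
  return bow;
}

// Cosine similarity between two word-count objects: dot product over the
// product of the vector lengths. Returns 0 if either bag is empty.
function cosineSimilarity( a, b ) {
  const has = ( obj, key ) => Object.prototype.hasOwnProperty.call( obj, key );
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for ( const key of Object.keys( a ) ) {
    normA += a[ key ] * a[ key ];
    if ( has( b, key ) ) dot += a[ key ] * b[ key ];
  }
  for ( const key of Object.keys( b ) ) normB += b[ key ] * b[ key ];
  if ( normA === 0 || normB === 0 ) return 0;
  return dot / Math.sqrt( normA * normB );
}

const first = bagOfWords( 'The dog chased the cat' );   // the: 2, dog: 1, chased: 1, cat: 1
const second = bagOfWords( 'The cat chased the mouse' );

console.log( cosineSimilarity( first, first ) );  // 1 for identical bags
console.log( cosineSimilarity( first, second ) ); // between 0 and 1
```

A distance can be derived as 1 minus the similarity, so identical texts have distance 0.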

Thanks very much for your help.

Cordially,

Paul

Add its.stem property to its helper

Rename .document() as .parentDocument()

Question - How to design so that I can feed language grammar as part of conversation?

Cosine similarity to support the upcoming BM25 bowOf() method

It was originally intended to be used with as.bow of winkNLP. It should now also support the bowOf() method of BM25. That method has been coded; integration test cases and documentation still need to be completed.

Thanks @HimanshuMittal01 for helping/pointing out.

Add image to README

Feature request: Dependency parser

🙏 Thanks for 😉 !!

add RunKit example using the web language model

const winkNLP = require( 'wink-nlp' );
const its = require( 'wink-nlp/src/its.js' );
const as = require( 'wink-nlp/src/as.js' );
// Use web model for RunKit.
const model = require( 'wink-eng-lite-web-model' );
const nlp = winkNLP( model )

const text = `Its quarterly profits jumped 76% to $1.13 billion for the three months to December, from $639million of previous year.`;
const doc = nlp.readDoc( text );
// Print tokens.
console.log( doc.tokens().out() );
// Print each token's type.
console.log( doc.tokens().out( its.type ) );
// Print details of each entity.
console.log( doc.entities().out( its.detail ) );
// Markup entities along with their type for highlighting them in the text.
doc.entities().each( ( e ) => {
  e.markup( '<mark>', `<sub style="font-weight:900"> ${e.out(its.type)}</sub></mark>` );
} );
// Render them as HTML via RunKit
doc.out(its.markedUpText);

Question

Hi,

I am about to start looking at BM25 Vectorizer. I recalled our discussion, available at

#31

wherein Sanjaya remarks that both BM25 and Wink's Similarity are language agnostic. If this is true, why do its examples, e.g.,

https://winkjs.org/wink-nlp/bm25-vectorizer.html

show an English model?

Thank you.

PoS markup example on website does not work with typescript

On this webpage: https://winkjs.org/wink-nlp/wink-nlp-in-browsers.html

There is this code:

const winkNLP = require( 'wink-nlp' );
const model = require( 'wink-eng-lite-web-model' );
const nlp = winkNLP( model )
// Acquire "its" and "as" helpers from nlp.
const its = nlp.its;
const as = nlp.as;

const text = `Its quarterly profits jumped 76% to $1.13 billion for the three months to December, from $639million of previous year.`;
const doc = nlp.readDoc( text );

doc.entities().each((e) => e.markup());
document.getElementById("result").innerHTML = doc.out(its.markedUpText);

I've adapted it to typescript to get the following:

import winkNLP from 'wink-nlp';
import model from 'wink-eng-lite-web-model';
const nlp = winkNLP( model )
// Acquire "its" and "as" helpers from nlp.
const its = nlp.its;
const as = nlp.as;

const text = `Its quarterly profits jumped 76% to $1.13 billion for the three months to December, from $639million of previous year.`;
const doc = nlp.readDoc( text );

doc.entities().each((e) => e.markup());
(document.getElementById("result") as HTMLElement).innerHTML = doc.out(its.markedUpText);

And I am seeing this error:

(method) ItemEntity.markup(beginMarker: string, endMarker: string): void
Expected 2 arguments, but got 0.ts(2554)
index.d.ts(186, 12): An argument for 'beginMarker' was not provided.

Is it possible the typescript definition of ItemEntity.markup isn't quite correct? When I run the code in nodejs without typescript, everything runs just fine.

Custom entities with regex pattern

I'm ingesting some data from Slack. Slack formats its messages with some custom templates. For example:

Custom emojis are surrounded in colons :slightly_smiling_face: and :green_checkmark:.

At-mentions are re-encoded similar to this: <@U024BE7LH> <#C024BE7LR>.

More examples and docs here: https://api.slack.com/reference/surfaces/formatting#retrieving-messages

How can I have wink tag these known regular elements? It seems like learnCustomEntities is the correct path but it needs to take a regex pattern.
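One option that does not depend on learnCustomEntities accepting a regex is to pull these elements out with plain regular expressions before or alongside the NLP pass. This is a minimal sketch in plain JavaScript; the pattern names and `tagSlackElements` are hypothetical, and the expressions are simplified from the examples above. They do not cover every Slack format (for instance mentions with extra fields), and the emoji pattern can misfire on text such as times written with colons.

```javascript
// Simplified patterns derived from the examples in this issue (assumptions).
const SLACK_PATTERNS = [
  { name: 'slack-emoji', regex: /:[a-z0-9_+-]+:/g },
  { name: 'slack-user', regex: /<@[A-Z0-9]+>/g },
  { name: 'slack-channel', regex: /<#[A-Z0-9]+>/g }
];

// Hypothetical helper: return every match with its type, value and position,
// ordered by where it occurs in the text.
function tagSlackElements( text ) {
  const found = [];
  for ( const { name, regex } of SLACK_PATTERNS ) {
    for ( const match of text.matchAll( regex ) ) {
      found.push( { type: name, value: match[ 0 ], index: match.index } );
    }
  }
  return found.sort( ( a, b ) => a.index - b.index );
}

const message = 'Done :green_checkmark: thanks <@U024BE7LH> see <#C024BE7LR>';
console.log( tagSlackElements( message ) );
```

The returned positions could then be used to mask or annotate those spans before the text is handed to the tokenizer.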

Typescript support

Are there any plans to write Typescript declarations files for the wink modules? I use Typescript for all JS projects these days, and in order to use wink properly I've had to hack up some declaration files (I know there's the any type, but ew).

I'd be happy to contribute what I've written so far, but I can't promise it's complete or correct. It has been serving me well so far though :)

esModuleInterop must be true for TypeScript projects

Please add to the documentation that TypeScript projects require esModuleInterop set to true in order to import the wink-nlp module.
See: https://www.typescriptlang.org/tsconfig#esModuleInterop
It's not obvious for full TypeScript libs.

Your project is cool. 🥇

add showcases under the Documentation heading in README

Importing and creating custom models

Hi, @sanjayaksaxena!

As I understand it, you are a maintainer of this npm package. I'd like to thank you, because it's awesome and supports TS types out of the box, unlike many other NLP packages.

I'd like to ask a question about importing custom models and converting other models (like BERT or GPT). For example, I'd like to take a custom BERT model from these repositories. I am very interested in using models that support Russian (Cyrillic).

I have already downloaded and opened wink-eng-lite-model and found that it's a bit different from other models.

Is there any option for that, such as a model conversion guideline?

Typescript Types are not accurate

The docs say we should use the embeddings like this:

import wink from "wink-nlp";
import model from "wink-eng-lite-web-model";
import vectors from "wink-embeddings-sg-100d";

const nlp = wink(model, [], vectors);

But in the types wink doesn't take a 3rd argument:

wink(theModel: Model, pipe?: string[] | undefined): WinkMethods

I can see that it's correct in the code, but because of this mismatch we can't use it in TypeScript:

var nlp = function ( theModel, pipe = null, wordEmbeddings = null ) {

Additionally, most JS examples online do not match the types provided.

Different result for pos tagging if using tokens.out and printTokens functions

Hi. For some reason, in the sentence "They both give you a nice dashboard" I receive different POS tagging results from doc.printTokens() and from the tokens' out() calls.
The verb "give" is correctly tagged as VERB by printTokens, but incorrectly tagged as NOUN when using tokens.out:

import winkNLP from 'wink-nlp';
import model from 'wink-eng-lite-web-model';

const nlp = winkNLP(model);
const its = nlp.its;

const sentence = 'They both give you a nice dashboard';
const doc = nlp.readDoc(sentence);
doc.tokens().each((item) => {
  console.log(item.out(its.lemma), item.out(its.pos));
});
doc.printTokens();


they PRON
both DET
give NOUN - should be the VERB. incorrect POS tagging
you PRON
a DET
nice ADJ
dashboard NOUN


token      p-spaces   prefix  suffix  shape   case    nerHint type     normal/pos
———————————————————————————————————————————————————————————————————————————————————————
They              0   Th      hey     Xxxx    3       0       word     they / PRON
both              1   bo      oth     xxxx    1       0       word     both / DET
give              1   gi      ive     xxxx    1       0       word     give / VERB - correct POS tagging
you               1   yo      you     xxx     1       0       word     you / PRON
a                 1   a       a       x       1       0       word     a / DET
nice              1   ni      ice     xxxx    1       0       word     nice / ADJ
dashboard         1   da      ard     xxxx    1       0       word     dashboard / NOUN

versions of packages:
"wink-eng-lite-web-model": "^1.8.0",
"wink-nlp": "^2.3.0",

Portuguese support

Hi! This is looking good!

Can we expect out-of-the-box support for other languages such as, in my specific case, Brazilian Portuguese?

And how can we add in-house support for a language of our choice?

Thanks!

Add context aware word cloud image

Question: Spanish models

Hi,
so reading the documentation it seems like there are only two English models as of right now. Browsing through the repo issues it seems like there was Portuguese support in the works.

Is Spanish anywhere in the road map?
Is there any way I could help contribute to that? (not an expert on the topic, though)

Thanks!

[Question] Can winkjs expand contractions and abbreviations ?

I see from the documentation that we can detect whether a token is a contraction or an abbreviation, but is there a way to expand either of them?