wink-nlp's People

Contributors

dependabot[bot], itpiligrim, pimpale, prtksxna, rachnachakraborty, rawsh, sanjayaksaxena, searleser97

wink-nlp's Issues

what is the default behaviour of `learnCustomEntities`?

I had a lot of trouble using learnCustomEntities until I discovered the second config parameter with its usePos: true setting.

https://winkjs.org/wink-nlp/learn-custom-entities.html

  1. matchValue - the docs show that both true and false are the default. Something is not right there.

  2. usePos - The docs say usePos defaults to true, but does it actually default to false? I was not able to match NOUN until I explicitly used the config usePos: true. I think the default is false. In either case, can we also document the config options on the learn custom entities page here? I was very confused following these examples: https://winkjs.org/wink-nlp/custom-entities.html (I actually stumbled across the config options in the source code rather than reading about it in the docs)

Custom entities are a great feature; it would be great to make them easier to understand and use. Thanks :)

Custom entity Matching sequence not working

  var text = `Go grocery shopping on Sept 21st`;
   
  const patterns = [
    { name: 'TEST', patterns: ['[CARDINAL|ORDINAL] Sept [CARDINAL|ORDINAL]'] },
  ];
  nlp.learnCustomEntities( patterns );

  var doc = nlp.readDoc(text);

When creating a pattern that contains CARDINAL or ORDINAL, it prioritizes the CARDINAL/ORDINAL over the custom entity. In this example the output is {value: '21st', type: 'ORDINAL'} instead of the intended {value: 'Sept 21st', type: 'TEST'}

I get this any time I make a pattern with CARDINAL or ORDINAL in it.

NOTE: In this particular case I am not sure why Sept isn't picked up as a date in the first place; only the short form Sep works, even though Sept is also widely used. That is the main reason I am trying to work around it with a custom pattern.
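
One possible workaround, separate from custom entities, is to normalize "Sept" to "Sep" before the text reaches readDoc, since the post reports that the short form "Sep" is recognized. This is only a sketch: normalizeSept is a hypothetical helper and not part of wink-nlp.

```javascript
// Hypothetical helper (not part of wink-nlp): rewrite the "Sept" abbreviation
// to "Sep", which the post above reports is recognized as a date.
function normalizeSept( text ) {
  // \b keeps longer words such as "September" untouched; an optional
  // trailing period ("Sept.") is dropped together with the "t".
  return text.replace( /\bSept\b\.?/g, 'Sep' );
}

const septText = 'Go grocery shopping on Sept 21st';
console.log( normalizeSept( septText ) ); // Go grocery shopping on Sep 21st
```

The normalized string would then be passed to nlp.readDoc. Character offsets in the normalized text shift by one or two characters per replacement compared with the original.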

readDoc() breaks with long string of numbers

While parsing some text found in the wild, I noticed that winkjs breaks when parsing a long string of comma-separated numbers.
No error is thrown; winkNlp simply never returns from the .readDoc call.

Here is code that breaks:

const text = '47,47,32,67,111,112,121,114,105,103,104,116,32,74,111,121,101,110,116,44,32,73,110,99,46,32,97,110,100,32, '
const nlp = winkNlp(model, []);
const doc = nlp.readDoc(text.toLowerCase());

The same code with a long string of letters has no issue:

const text = 'abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz'
const nlp = winkNlp(model, []);
const doc = nlp.readDoc(text.toLowerCase());
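
Until the cause is known, one way to avoid passing such input to readDoc is to detect long runs of comma-separated numbers first. The sketch below is only an assumption about a possible guard: hasLongNumberRun and the threshold of 20 items are hypothetical and not part of wink-nlp.

```javascript
// Hypothetical guard (not part of wink-nlp): report whether the text contains
// at least `minItems` consecutive "number," items.
function hasLongNumberRun( text, minItems = 20 ) {
  const run = new RegExp( '(?:\\d+\\s*,\\s*){' + minItems + ',}' );
  return run.test( text );
}

const numbers = '47,47,32,67,111,112,121,114,105,103,104,116,32,74,111,121,101,110,116,44,32,73,110,99,46,32,97,110,100,32, ';
const letters = 'abcdefghijklmnopqrstuvwxyz'.repeat( 5 );

console.log( hasLongNumberRun( numbers ) ); // true
console.log( hasLongNumberRun( letters ) ); // false
```

A caller could skip, truncate, or split the offending span before calling readDoc; which of these is appropriate depends on the application.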

Typescript Types are not accurate

It seems the docs say we should use the embeddings like this:

import wink from "wink-nlp";
import model from "wink-eng-lite-web-model";
import vectors from "wink-embeddings-sg-100d";

const nlp = wink(model, [], vectors);

But in the types wink doesn't take a 3rd argument:

wink(theModel: Model, pipe?: string[] | undefined): WinkMethods

I can see that it is correct in the code, but because of this mismatch we can't use it in TypeScript:

var nlp = function ( theModel, pipe = null, wordEmbeddings = null ) {

Additionally, most JS examples online do not match the types provided.

Ways to highlight most interesting phrases in a text using sentiment analysis

I'm looking at ways to highlight most interesting phrases in a text using sentiment analysis and the first way I thought I could do it is by identifying phrases which have the most scored words.

Let's take this text as an example:

Modern card issuer Marqeta has announced its partnership with FNBO to expand its partner ecosystem and allow customers to launch modern credit card programmes. The new collaboration aims to modernise the credit card offering and meet clients’ demands, offering a more flexible and reliable digital experience. FNBO and Marqeta will allow companies to easily launch credit cards using the latter’s APIs and embed the card experience within their app ecosystem. The Marqeta platform benefits from a self-service dashboard to update credit products according to clients’ needs, while client companies can instantly extend credit applications, decisions, and onboard accounts, among other features. First National Bank of Omaha is a subsidiary of First National of Nebraska, with offices in Nebraska, Colorado, Illinois, Kansas, and Texas, among others. It is a six-generation privately owned BaaS. Marqeta is headquartered in California and is certified to operate and offer its flexible payment solutions in 36 countries globally.

The bolded phrase has the most scored words, therefore it's highlighted as an interesting text to read.
My question is whether there are better ways to do this, because sometimes the highlighted text may seem out of context.

And a second question, the text above has
{score: 8, normalizedScore: 0.3121}
Which of the two properties better indicates the text's sentiment?

Portuguese support

Hi! This is looking good!

Can we expect out-of-the-box support for other languages such as, in my specific case, Brazilian Portuguese?

And how can we add in-house support for a language of our choice?

Thanks!

Trouble using the word "day" in customEntities pattern

I'm trying to use learnCustomEntities to match various patterns. It's working great for most of them, but for some reason I'm having trouble with it picking up the word "day". Here's a brief sample:

const types = [
{ name: 'today', patterns: [ 'day', 'day is it', 'have the date', 'day it is'] },
{ name: 'where', patterns: [ 'wheres', 'where', 'get to','find', 'place' ] },
];

nlp.learnCustomEntities( types, {matchValue: false, usePOS: true, useEntity: true } );
const doc = nlp.readDoc( text );
const type = doc.customEntities().out( its.detail );

A text string like "how do you get to the taco shop" returns "where", but a text string like "what day is it" returns nothing - it seems like none of the phrases including the word "day" get picked up.

Any thoughts? Am I doing something wrong here?

Creating Special Entities

I'd like to manually create Special Entities in my text, based on matching a string, but there doesn't seem to be a way to use a custom function to do this. So, instead, I'm creating patterns that contain the strings I'm looking for, then training the model with these strings before loading the text.

Weirdly, some of my patterns are causing the following error:

    if ( fsm[ state ][ otherwise ] ) {
                     ^

TypeError: Cannot read properties of undefined (reading ' otherwise')
at Object.recognize (...\node_modules\wink-nlp\src\automaton.js:408:26)

Here's an example of a string I'm using in a pattern that causes this:

'at night - the scn tells the brain to make more melatonin so you get drowsy'

Is it possible to set up a winkModel (using the `wink-eng-lite-web-model`) in a Worker?

I'm trying to set up a web worker to handle some nlp stuff.
The project I'm working on requires me to use the web-model.

I've tried a few different approaches, but I always end up getting this error:
[image: error screenshot]

Pointing to this section of code:

I've tried importing wink in different ways, but I don't think that's the issue. I've also configured esbuild to bundle it for platform: "browser", but that hasn't changed the error.
I'm not sure if this issue is related to wink, but any help would be appreciated.

This worker code should reproduce the error:

self.onmessage = async function (e) {
	const model = require("wink-eng-lite-web-model");
	const wink = require("wink-nlp");
	const winkModel = wink(model);

	self.postMessage(e.data);
};

How does this package relate to wink-sentiment, wink-post, etc.?

I've been using various wink packages for some time, and they're great!

I just realised that this package existed today, and now I'm wondering if I should have been using it instead of the individual packages. It doesn't seem this is listed on your website. Is it new?

Any advice on how you see the whole family of libraries relating to each other?

Questions about wink-distance

Hello Sanjaya,

I've just stumbled upon your wink.js and am very intrigued. But please note that I am a relative noob in the ML/NLP world; trying to learn.

I have a few questions about the use of wink-distance to detect similarities between raw, unstructured texts:

  • Do you think wink-distance is suitable for such a task?
  • Am I right that one must create a "bag of words" from the raw text as proper input to wink-distance?
  • In wink-distance/test/bow-cosine-spec.js is { a: { the: 2, dog: 1, chased: 1, cat: 1 }, b: { the: 2, dog: 1, chased: 1, cat: 1 } } an example of a "bow?"
  • If so, does wink.js offer a way to create a "bow?"

Thanks very much for your help.

Cordially,

Paul
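
For readers with the same question, the object quoted in the third bullet can be built from raw text in a few lines. The sketch below is a plain JavaScript illustration, not wink-distance's or wink-nlp's own implementation: bagOfWords and cosineSimilarity are hypothetical helpers, and the tokenization is a simple regular expression chosen for brevity.

```javascript
// Build a bag of words: an object mapping each lowercased token to its count.
function bagOfWords( text ) {
  // A prototype-less object avoids clashes with names such as "constructor".
  const bow = Object.create( null );
  const tokens = text.toLowerCase().match( /[a-z0-9']+/g ) || [];
  for ( const t of tokens ) bow[ t ] = ( bow[ t ] || 0 ) + 1;
  return bow;
}

// Cosine similarity between two bags of words:
// 1 means identical word proportions, 0 means no shared words.
function cosineSimilarity( a, b ) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for ( const k in a ) {
    normA += a[ k ] * a[ k ];
    if ( k in b ) dot += a[ k ] * b[ k ];
  }
  for ( const k in b ) normB += b[ k ] * b[ k ];
  if ( normA === 0 || normB === 0 ) return 0;
  return dot / Math.sqrt( normA * normB );
}

const bowA = bagOfWords( 'The dog chased the cat' );
const bowB = bagOfWords( 'The cat chased the dog' );
console.log( bowA.the, bowA.dog, bowA.chased, bowA.cat ); // 2 1 1 1
console.log( cosineSimilarity( bowA, bowB ) );            // 1
```

Because a bag of words ignores word order, the two sentences above come out as identical; a distance can be derived as 1 minus the similarity.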

Typescript support

Are there any plans to write Typescript declarations files for the wink modules? I use Typescript for all JS projects these days, and in order to use wink properly I've had to hack up some declaration files (I know there's the any type, but ew).

I'd be happy to contribute what I've written so far, but I can't promise it's complete or correct. It has been serving me well so far though :)

add RunKit example using the web language model

const winkNLP = require( 'wink-nlp' );
const its = require( 'wink-nlp/src/its.js' );
const as = require( 'wink-nlp/src/as.js' );
// Use web model for RunKit.
const model = require( 'wink-eng-lite-web-model' );
const nlp = winkNLP( model )

const text = `Its quarterly profits jumped 76% to $1.13 billion for the three months to December, from $639million of previous year.`;
const doc = nlp.readDoc( text );
// Print tokens.
console.log( doc.tokens().out() );
// Print each token's type.
console.log( doc.tokens().out( its.type ) );
// Print details of each entity.
console.log( doc.entities().out( its.detail ) );
// Markup entities along with their type for highlighting them in the text.
doc.entities().each( ( e ) => {
  e.markup( '<mark>', `<sub style="font-weight:900"> ${e.out(its.type)}</sub></mark>` );
} );
// Render them as HTML via RunKit
doc.out(its.markedUpText);


Importing and creating custom models

Hi, @sanjayaksaxena!

As I understand it, you are a maintainer of this npm package. I'd like to thank you, because it's awesome and supports TS types out of the box, unlike most other NLP packages.

I'd like to ask about importing custom models and converting other models (like BERT or GPT). For example, I'd like to take a custom BERT model from these repositories. I am very interested in using models that support Russian (Cyrillic).

I have already downloaded and opened wink-eng-lite-model and found that it's a bit different from other models.

Is there anything like a model conversion guideline for that?

Question: Spanish models

Hi,
so reading the documentation it seems like there are only two English models as of right now. Browsing through the repo issues it seems like there was Portuguese support in the works.

Is Spanish anywhere in the road map?
Is there any way I could help contribute to that? (not an expert on the topic, though)

Thanks!

Question: Text Summarization

Hello again, Sanjaya.

Sorry to bother you with this.

While I am fairly certain that we had a brief discussion about Wink's ability to summarize unstructured text, I've lost track of where that discussion might live. I seem to recall that you also provided an example.

Do you have any sense for where that discussion might live and, if not, can you point me to examples of such summarization?

Thanks very much.

Cordially,

Paul

Is it possible for me to train my own models?

Hello there, is it possible to train my own, say, massive model?
What should I do if this is the case?
How would I feed it data, and what type of data would it need?

I'm just getting started in NLP.

[Question type] Is pattern matching fully explained in documentation?

Document: https://winkjs.org/wink-nlp/custom-entities.html

The statements "Each option is separated by a vertical pipe character as in [NOUN|PROPN]" and "An option may be empty as in the case of the first two sets of options — [|DET] and [|ADJ]" suggest that the symbol "|" acts as an OR operator between two options and that, when it appears at the beginning, the option may be absent.

Are there any more special symbols we can use to achieve certain pattern matching, besides "^" for escaping?
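
To make the reading of the two quoted statements concrete, the sketch below expands a pattern into every sequence it would accept under that reading. It is an illustration only: expandPattern is a hypothetical helper, it is not how wink-nlp matches patterns internally, and it ignores "^" escaping.

```javascript
// Hypothetical helper: expand a pattern such as '[|DET] [|ADJ] [NOUN|PROPN]'
// into all concrete sequences, treating "|" as OR and an empty option as
// "this element may be absent".
function expandPattern( pattern ) {
  // Each whitespace-separated element is either a literal or a bracketed
  // list of options.
  const elements = pattern.trim().split( /\s+/ ).map( ( el ) =>
    ( el.startsWith( '[' ) && el.endsWith( ']' ) ) ? el.slice( 1, -1 ).split( '|' ) : [ el ]
  );
  // Build the Cartesian product of the options, skipping empty ones.
  let sequences = [ [] ];
  for ( const options of elements ) {
    const next = [];
    for ( const seq of sequences ) {
      for ( const opt of options ) {
        next.push( opt === '' ? seq : seq.concat( opt ) );
      }
    }
    sequences = next;
  }
  return sequences.map( ( seq ) => seq.join( ' ' ) );
}

const expanded = expandPattern( '[|DET] [|ADJ] [NOUN|PROPN]' );
console.log( expanded.length ); // 8
console.log( expanded );
```

The eight results range from a bare NOUN or PROPN up to DET ADJ NOUN and DET ADJ PROPN.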

Typescript imports don't work correctly

I seem to only be able to import with the following:

import * as winkNlp from "wink-nlp";
import * as winkEngLiteWebModel from "wink-eng-lite-web-model";
const nlp = winkNlp(winkEngLiteWebModel); // throws: TS2349: This expression is not callable.

Side note: It would really help if your documentation didn't assume that the reader has already imported and set up wink-nlp.

Named-Entity Recognition and Proper Nouns?

Hey all,

Firstly this is an amazing library, thank you for doing such great work.

I have two separate but related questions/comments:

  1. How does named-entity recognition work (not custom)? From reading the docs, it looks like it's supposed to work out of the box with the ner pipeline and then printing out .entities(). However this seems to only support the following: date, ordinal, cardinal, money, percent, time, duration, hashtag, emoji, emoticon, email, url and mention. This is well-documented, but I was surprised to find it didn't support tagging e.g. Person or Organization names, where NER has a lot of value. As-is, I can't use the ner pipeline since I'm really only looking to filter out Proper Nouns/Names, which I'd imagine is a common use-case. Is this on the roadmap for future support, or is there something I'm missing here?

I would expect to be able to parse the sentence "John Doe and Susan Smith work at Coca-Cola" and see three (five?) entities in it. Instead there are none.

  2. On the subject of proper nouns, the implementation seems to naively tag the first word of most sentences as a proper noun (unless it's overruled by another specific part-of-speech), only because it's capitalized. Some examples I've found from just cursory testing:
Sentence is:  Main menu
Tokens:  (2) ['PROPN', 'NOUN']
Sentence is:  Contents
Tokens:  ['PROPN']
Sentence is:  Current events
Tokens:  (2) ['PROPN', 'NOUN']
Sentence is:  Following Vogel's firing after the season, he said, "I'm not sure what his issue was with me."
Tokens:  (23) ['PROPN', 'PROPN', 'PART', 'NOUN' ...
Sentence is:  Fourth triple-double season and career triple-doubles record
Tokens:  (11) ['PROPN', 'ADJ', 'PUNCT', 'ADJ', 'NOUN', ...

I can simply guard against this in my own code of course, but it does seem like something that can be improved in the model. Looking forward to hearing your take on this feedback!

(For context, I'm using wink-eng-lite-web-model)

non-breaking space does not appear in token stream

const winkNLP = require( 'wink-nlp' );
const model = require( 'wink-eng-lite-web-model' );
const nlp = winkNLP( model );

const text = 'Hello\u00a0World';
const doc = nlp.readDoc(text);

doc.sentences().each(sentence => {
  sentence.tokens().each(token => {
    console.log([token.out(nlp.its.precedingSpaces), token.out()])
  })
})

This does not return \u00a0 as either precedingSpaces or token.out().
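
For context, JavaScript's \s character class matches U+00A0, so any tokenizer that treats whitespace uniformly can lose the distinction. If the positions matter, they can be recorded from the raw string before tokenizing. The sketch below is plain JavaScript and makes no claim about how wink-nlp handles the character; findNbspOffsets is a hypothetical helper.

```javascript
const nbspText = 'Hello\u00a0World';

// Hypothetical helper: return the offset of every non-breaking space (U+00A0).
function findNbspOffsets( s ) {
  const offsets = [];
  for ( let i = 0; i < s.length; i += 1 ) {
    if ( s.charCodeAt( i ) === 0x00a0 ) offsets.push( i );
  }
  return offsets;
}

console.log( findNbspOffsets( nbspText ) );  // [ 5 ]
console.log( /\s/.test( '\u00a0' ) );        // true
```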

Custom entities with regex pattern

I'm ingesting some data from Slack. Slack formats its messages with some custom templates. For example:

Custom emojis are surrounded in colons :slightly_smiling_face: and :green_checkmark:.

At-mentions are re-encoded similar to this: <@U024BE7LH> <#C024BE7LR>.

More examples and docs here: https://api.slack.com/reference/surfaces/formatting#retrieving-messages

How can I have wink tag these known regular elements? It seems like learnCustomEntities is the correct path but it needs to take a regex pattern.
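
Independently of what learnCustomEntities accepts, the elements quoted above can be located with plain regular expressions before or alongside NLP processing. The sketch below is an assumption-laden illustration: SLACK_PATTERNS and findSlackElements are hypothetical names, and the expressions cover only the exact shapes shown in this post (for example, they do not handle a mention that carries a label).

```javascript
// Hypothetical patterns for the Slack constructs quoted in the post.
const SLACK_PATTERNS = [
  { name: 'slack-emoji', regex: /:[a-z0-9_+-]+:/g },
  { name: 'slack-user', regex: /<@U[A-Z0-9]+>/g },
  { name: 'slack-channel', regex: /<#C[A-Z0-9]+>/g }
];

// Return every match with its type and character span, ordered by position.
function findSlackElements( text ) {
  const found = [];
  for ( const { name, regex } of SLACK_PATTERNS ) {
    for ( const m of text.matchAll( regex ) ) {
      found.push( { type: name, value: m[ 0 ], start: m.index, end: m.index + m[ 0 ].length } );
    }
  }
  return found.sort( ( x, y ) => x.start - y.start );
}

const slackMsg = 'Done :green_checkmark: thanks <@U024BE7LH>, see <#C024BE7LR>';
const slackElements = findSlackElements( slackMsg );
console.log( slackElements.map( ( e ) => e.type + ' ' + e.value ) );
```

Note that the emoji expression also matches colon-delimited digits such as the middle of "10:30:45", so real input may need a stricter rule.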

Question

Hi,

I am about to start looking at BM25 Vectorizer. I recalled our discussion, available at

#31

wherein Sanjaya remarks that both BM25 and Wink's Similarity are language agnostic. If this is true, why do the examples, e.g.,

https://winkjs.org/wink-nlp/bm25-vectorizer.html

show an English model?

Thank you.

Custom entities, POS and disambiguating

Consider the following phrase:

"This afternoon I plan to go to the bookstore and buy a book on Go".

In the example, go is used as a verb and as a noun (programming language).

I'd like to detect the programming language Go. I noticed that when I tag the phrase with POS, the first go is tagged as VB and the second as NNP.

Is there a way to tell the learnCustomEntities method that I only want to detect the word Go when it's an NNP?

const text = 'This afternoon I plan to go to the bookstore and buy a book on Go.';
const patterns = [
  { name: 'go', patterns: [ 'Go&NNP' ] },
];
nlp.learnCustomEntities(patterns);

Add Spanish Model - How to Help

I run a language learning software company. We would benefit greatly from quality in-browser (and nodejs) nlp tools. How can we get started adding support for Spanish, French, German, and other languages?

PoS markup example on website does not work with typescript

On this webpage: https://winkjs.org/wink-nlp/wink-nlp-in-browsers.html

There is this code:

const winkNLP = require( 'wink-nlp' );
const model = require( 'wink-eng-lite-web-model' );
const nlp = winkNLP( model )
// Acquire "its" and "as" helpers from nlp.
const its = nlp.its;
const as = nlp.as;

const text = `Its quarterly profits jumped 76% to $1.13 billion for the three months to December, from $639million of previous year.`;
const doc = nlp.readDoc( text );

doc.entities().each((e) => e.markup());
document.getElementById("result").innerHTML = doc.out(its.markedUpText);

I've adapted it to typescript to get the following:

import winkNLP from 'wink-nlp';
import model from 'wink-eng-lite-web-model';
const nlp = winkNLP( model )
// Acquire "its" and "as" helpers from nlp.
const its = nlp.its;
const as = nlp.as;

const text = `Its quarterly profits jumped 76% to $1.13 billion for the three months to December, from $639million of previous year.`;
const doc = nlp.readDoc( text );

doc.entities().each((e) => e.markup());
(document.getElementById("result") as HTMLElement).innerHTML = doc.out(its.markedUpText);

And I am seeing this error:

(method) ItemEntity.markup(beginMarker: string, endMarker: string): void
Expected 2 arguments, but got 0.ts(2554)
index.d.ts(186, 12): An argument for 'beginMarker' was not provided.

Is it possible the typescript definition of ItemEntity.markup isn't quite correct? When I run the code in nodejs without typescript, everything runs just fine.

Is it possible to unmark marked text?

Hey :)
I'm not sure if this is the right place, but I'm gonna post here because the markup method appears on items, and isn't a helper method.

I see it is possible to wrap an item using markup. However, once the item is marked up, I don't see a way to unmark it.
I tried item.markup("", ""), but I assume that doesn't work because the mark isn't part of the item.

Is it possible to overwrite the doc string and then re-mark where I want to? For example, doc.text = doc.out().replaceAll('<mark>', '').

Is there another method I'm missing?

Thanks for reading.
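
If only the rendered string matters, the markers can be stripped from the output after the fact. The sketch below works purely on a string and does not change any state inside the document object; stripMarkers is a hypothetical helper, and whether wink-nlp offers its own way to reset markup is not confirmed here.

```javascript
// Hypothetical helper: remove a known pair of markers from marked-up text.
// split/join is used instead of replaceAll so it also runs on older runtimes.
function stripMarkers( markedUpText, beginMarker, endMarker ) {
  return markedUpText.split( beginMarker ).join( '' ).split( endMarker ).join( '' );
}

const markedUp = 'Its profits jumped <mark>76%</mark> to <mark>$1.13 billion</mark>.';
console.log( stripMarkers( markedUp, '<mark>', '</mark>' ) );
// Its profits jumped 76% to $1.13 billion.
```

Both the opening and the closing marker need to be removed; stripping only '<mark>' would leave the closing tags in the text.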

Invalid value of sentiment

For the sentence "The abandoned amusement park was now a haunting place, with decaying rides and an eerie atmosphere." I get a sentiment of "-1.3877787807814457e-17". It is less than -1.
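
For reference, the reported value is written in scientific notation: the suffix e-17 means "times 10 to the power of -17", so the number is a tiny negative value very close to zero, not a value below -1. The sketch below checks this in plain JavaScript. Values of this size commonly arise from floating-point rounding, although whether that is the cause in this case is an assumption.

```javascript
const sentiment = -1.3877787807814457e-17;

console.log( sentiment < -1 );                         // false
console.log( Math.abs( sentiment ) < Number.EPSILON ); // true

// A familiar example of floating-point rounding leaving a residue of similar size:
const residue = 0.1 + 0.2 - 0.3;
console.log( residue );                                // 5.551115123125783e-17
```

Treating any score whose absolute value is below a small tolerance as 0 is a common way to handle such residues.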

Testing error

The following test is failing:

const winkNLP = require('wink-nlp')
    , model = require('wink-eng-lite-model')
    , nlp = winkNLP(model)
;
let en_text = 'Dear Sonia, good morning! For elaboration I ask the following information: As for the admission I clarify: There is no device'
    , doc = nlp.readDoc(en_text)
;
console.log(doc.sentences().out());
console.log(doc.entities().out());
console.log(doc.tokens().out());

With the error:

/Volumes/Work/Developer/Web/rdr_playground/node_modules/wink-eng-lite-model/src/pos-updater.js:1
var updater=function(a,b,c){var d=b.cache;for(let e=0;e<a.length;e+=1){const f=a[e][2],g=a[e][0];0>f?c[g]=Math.abs(f):d.isMemberPOS(b.tokens[4*g],f)&&(c[g]=f)}};module.exports=updater;
                                                                                                                        ^

TypeError: Cannot read property 'isMemberPOS' of undefined
    at updater (/Volumes/Work/Developer/Web/rdr_playground/node_modules/wink-eng-lite-model/src/pos-updater.js:1:121)
    at Object.readDoc (/Volumes/Work/Developer/Web/rdr_playground/node_modules/wink-nlp/src/wink-nlp.js:272:7)
    at Object.<anonymous> (/Volumes/Work/Developer/Web/rdr_playground/test.js:17:17)
    at Module._compile (internal/modules/cjs/loader.js:1128:30)
    at Object.Module._extensions..js (internal/modules/cjs/loader.js:1167:10)
    at Module.load (internal/modules/cjs/loader.js:983:32)
    at Function.Module._load (internal/modules/cjs/loader.js:891:14)
    at Function.executeUserEntryPoint [as runMain] (internal/modules/run_main.js:71:12)
    at internal/main/run_main_module.js:17:47

typescript definitions not found

Hello,

I'm getting the following linting errors in nextjs:

'BM25VectorizerConfig' is declared but its value is never read.ts(6133)
Could not find a declaration file for module 'wink-nlp/utilities/bm25-vectorizer'. '/home/robert/Documents/dank-edge/node_modules/.pnpm/[email protected]/node_modules/wink-nlp/utilities/bm25-vectorizer.js' implicitly has an 'any' type.
  Try `npm i --save-dev @types/wink-nlp` if it exists or add a new declaration (.d.ts) file containing `declare module 'wink-nlp/utilities/bm25-vectorizer';`

I see that the types for wink-nlp/utilities/bm25-vectorizer exist but for some reason they are not getting picked up. I have

    "esModuleInterop": true,
    "allowSyntheticDefaultImports": true,

in my TypeScript config.

Edit: it looks like a workaround is importing winknlp first, even though I'm not using it.

import winkNLP from "wink-nlp";
import BM25VectorizerConfig from "wink-nlp/utilities/bm25-vectorizer";
