
fasttext.js's Introduction

What is FastText and FastText.js

FastText is a library for efficient learning of word representations and sentence classification, developed by Facebook Inc. FastText.js is a JavaScript library that wraps FastText so that it runs smoothly within Node.js.

What's New

Added support for the Apple M1 processor.

FastText.js APIs

This version of FastText.js comes with the following JavaScript APIs:

FastText.new(options)
FastText.load()
FastText.loadnn() [NEW API]
FastText.word2vec() [NEW API]
FastText.train()
FastText.test()
FastText.predict(string)
FastText.nn() [NEW API]

How to Install

git clone https://github.com/loretoparisi/fasttext.js.git
cd fasttext.js
npm install

Install via NPM

FastText.js is available as an npm module here. To add the package to your project:

npm install --save fasttext.js

Install via Docker

Build the Docker image:

docker build -t fasttext.js .

This will build the latest FastText Linux binaries from source into lib/bin/linux.

Now, running the image on the Docker host and binding port 3000, it is possible to run the server example:

docker build -t fasttext.js .
docker run --rm -it -p 3000:3000 fasttext.js node fasttext.js/examples/server.js 

To serve a different custom model, mount a volume and pass the MODEL environment variable:

docker run -v /models/:/models --rm -it -p 3000:3000 -e MODEL=/models/my_model.bin fasttext.js node fasttext.js/examples/server.js 

WASM

We provide a WASM-compiled binary of FastText to be used natively in Node.js. Please check the /wasm folder for the library and examples. 🆕
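A minimal usage sketch, assuming the loadWASM and getWordVector calls reported in the issues below; the model path is hypothetical:

var FastText = require('fasttext.js');

var ft = new FastText({
    loadModel: './my_model.bin' // hypothetical path to a trained model
});

ft.loadWASM()
.then(done => {
    // getWordVector appears in the user reports below; assumed synchronous
    var vec = ft.getWordVector("hello");
    console.log(vec);
})
.catch(error => {
    console.error(error);
});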

How to Use

Train

You can specify all the train parameters supported by fastText as a JSON object:

train: {
    // number of concurrent threads
    thread: 8,
    // verbosity level [2]
    verbose: 4,
    // number of negatives sampled [5]
    neg: 5,
    // loss function {ns, hs, softmax} [ns]
    loss: process.env.TRAIN_LOSS || 'ns',
    // learning rate [0.05]
    lr: process.env.TRAIN_LR || 0.05,
    // change the rate of updates for the learning rate [100]
    lrUpdateRate: 100,
    // max length of word ngram [1]
    wordNgrams: process.env.TRAIN_NGRAM || 1,
    // minimal number of word occurrences
    minCount: 1,
    // minimal number of label occurrences
    minCountLabel: 1,
    // size of word vectors [100]
    dim: process.env.TRAIN_DIM || 100,
    // size of the context window [5]
    ws: process.env.TRAIN_WS || 5,
    //  number of epochs [5]
    epoch: process.env.TRAIN_EPOCH || 5,
    // number of buckets [2000000]
    bucket: 2000000,
    // min length of char ngram [3]
    minn: process.env.TRAIN_MINN || 3,
    // max length of char ngram [6]
    maxn: process.env.TRAIN_MAXN || 6,
    // sampling threshold [0.0001]
    t: 0.0001,
    // load pre trained word vectors from unsupervised model
    pretrainedVectors: process.env.WORD2VEC || ''
    }
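For instance, a sketch that passes a subset of these options through the constructor; the file paths are hypothetical:

var FastText = require('fasttext.js');

var fastText = new FastText({
    trainFile: './my_train.txt',  // hypothetical labeled training set
    serializeTo: './my_model',    // model output path, no extension
    train: {
        dim: 50,    // size of word vectors
        epoch: 10,  // number of epochs
        lr: 0.1,    // learning rate
        thread: 4   // number of concurrent threads
    }
});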

Train Supervised

To train the model you must specify the training set as trainFile and the file where the model must be serialized as serializeTo. All the FastText supervised options are supported; see here for more details about training options. Note that serializeTo does not need to include the file extension: a .bin extension will be automatically added. You can use the pretrainedVectors option to load an unsupervised pre-trained model; please use the word2vec API to train this model.

var fastText = new FastText({
    serializeTo: './band_model',
    trainFile: './band_train.txt'
});
fastText.train()
.then(done=> {
    console.log("train done.");
})
.catch(error => {
    console.error(error);
})

Train Unsupervised

To train an unsupervised model use the word2vec API. You can choose the word representation to train with the word2vec.model parameter, set to skipgram or cbow, and use the train parameters as usual:

fastText.word2vec()
    .then(done => {
        console.log("word2vec done.");
    })
    .catch(error => {
        console.error("Train error", error);
    })
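A configuration sketch; it assumes, per the description above, that the representation is selected with a word2vec.model option set to skipgram or cbow (the paths are hypothetical):

var fastText = new FastText({
    trainFile: './my_corpus.txt',  // hypothetical unlabeled text corpus
    serializeTo: './my_word2vec',
    word2vec: {
        model: 'skipgram'          // or 'cbow'
    },
    train: {
        dim: 100,                  // size of word vectors
        epoch: 5
    }
});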

Track Progress

It is possible to track the training progress using the trainCallback callback in the options:

{
    trainCallback: function(res) {
        console.log( "\t"+JSON.stringify(res) );
    }
}

This will print out the training progress in the following format:

{
    "progress":17.2,
    "words":174796,
    "lr":0.041382,
    "loss":1.232538,
    "eta":"0h0m",
    "eta_msec":0
}

This is the parsed JSON representation of the fastText output:

Progress:  17.2% words/sec/thread:  174796 lr:  0.041382 loss:  1.232538 ETA:   0h 0m

where eta_msec represents the ETA (estimated time of arrival) in milliseconds.
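Putting it together, a sketch that wires trainCallback into a supervised training run; the file paths are hypothetical:

var fastText = new FastText({
    trainFile: './band_train.txt',
    serializeTo: './band_model',
    trainCallback: function(res) {
        // res carries progress, words, lr, loss, eta and eta_msec as shown above
        console.log("\t" + JSON.stringify(res));
    }
});
fastText.train()
.then(done => {
    console.log("train done.");
})
.catch(error => {
    console.error(error);
});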

A typical run fires trainCallback several times:

{"progress":0.6,"loss":4.103271,"lr":0.0498,"words":21895,"eta":"0h9m","eta_msec":540000}
{"progress":0.6,"loss":3.927083,"lr":0.049695,"words":21895,"eta":"0h6m","eta_msec":360000}
{"progress":0.6,"loss":3.927083,"lr":0.049695,"words":21895,"eta":"0h6m","eta_msec":360000}
{"progress":0.6,"loss":3.676603,"lr":0.049611,"words":26813,"eta":"0h5m","eta_msec":300000}
{"progress":0.6,"loss":3.676603,"lr":0.049611,"words":26813,"eta":"0h5m","eta_msec":300000}
{"progress":0.6,"loss":3.345654,"lr":0.04949,"words":33691,"eta":"0h4m","eta_msec":240000}
{"progress":1.2,"loss":3.345654,"lr":0.04949,"words":39604,"eta":"0h4m","eta_msec":240000}

until the progress reaches 100%:

{"progress":99,"loss":0.532964,"lr":0.000072,"words":159556,"eta":"0h0m","eta_msec":0}
{"progress":99,"loss":0.532964,"lr":0.000072,"words":159556,"eta":"0h0m","eta_msec":0}
{"progress":99,"loss":0.532392,"lr":-0.000002,"words":159482,"eta":"0h0m","eta_msec":0}
{"progress":100,"loss":0.532392,"lr":0,"words":159406,"eta":"0h0m","eta_msec":0}
{"progress":100,"loss":0.532392,"lr":0,"words":159406,"eta":"0h0m","eta_msec":0}

NOTE: some approximation errors may occur in the output values.

Test

To test your model you must specify the test set file as testFile and the model file to be loaded as loadModel. Optionally you can specify the precision and recall at k (P@k and R@k) by passing the object test: { precisionRecall: k }.

var fastText = new FastText({
    loadModel: './band_model.bin',
    testFile:  './band_test.txt'
});
fastText.test()
.then(evaluation=> {
    console.log("test done.",evaluation);
})
.catch(error => {
    console.error(error);
})

The evaluation will contain the precision P, recall R and number of samples N as a JSON object { P: '0.278', R: '0.278' }. If train is called just before test, the evaluation will also contain the number of words W and the number of labels L, as in { W: 524, L: 2, N: '18', P: '0.278', R: '0.278' }:

fastText.train()
.then(done=> {
    return fastText.test();
})
.then(evaluation=> {
    console.log("train & test done.",evaluation);
})
.catch(error => {
    console.error("train error",error);
})

Test-Labels

🆕 The testLabels API evaluates the F1-score, recall and precision for each label in the model.

var fastText = new FastText({
    loadModel: './band_model.bin',
    testFile:  './band_test.txt'
});
fastText.testLabels()
.then(evaluation=> {
    console.log("test-labels:",evaluation);
})
.catch(error => {
    console.error(error);
})

This will print out the values F1, P and R for each label.

[
  {
    "ham": {
      "F1": "0.172414",
      "P": "0.094340",
      "R": "1.000000"
    }
  },
  {
    "spam": {
      "F1": "0.950495",
      "P": "0.905660",
      "R": "1.000000"
    }
  }
]

Predict

To run inference on new data and predict the label you must specify the model file to be loaded as loadModel. You can then call the load method once, and predict(string) to classify a string. Optionally you can specify the k most likely labels to print for each line as predict: { mostlikely: k }.

var sample="Our Twitter run by the band and crew to give you an inside look into our lives on the road. Get #FutureHearts now: http://smarturl.it/futurehearts";
fastText.load()
.then(done => {
    return fastText.predict(sample);
})
.then(labels=> {
    console.log("TEXT:", sample, "\nPREDICT:",labels );
    sample="LBi Software provides precisely engineered, customer-focused #HRTECH solutions. Our flagship solution, LBi HR HelpDesk, is a SaaS #HR Case Management product.";
    return fastText.predict(sample);
})
.then(labels=> {
    console.log("TEXT:", sample, "\nPREDICT:",labels );
    fastText.unload();
})
.catch(error => {
    console.error(error);
});

Multi-Label

A multi-label example has multiple label columns, each marked by the __label__ prefix:

__label__sauce __label__cheese How much does potato starch affect a cheese sauce recipe?

FastText.js will automatically handle multiple labels in the dataset for training and testing. Please run the multilabel.js example to try it out:

cd examples/
node multilabel.js

The train() and test() will print out:

N	3000
P@1	0.000333
R@1	0.000144
Number of examples: 3000
exec:fasttext end.
exec:fasttext exit.
{ N: '3000', P: '0.000333', R: '0.000144' }

while the testLabels API will print out the labels array:

[ '__label__equipment',
  '__label__cast',
  '__label__oven',
  '__label__sauce',
  '__label__indian',
  '__label__breakfast',
  '__label__chili',
  '__label__spicy',
  '__label__bread',
  '__label__eggs',
  '__label__baking',
   ...

and the per-label scores:

F1-Score : 0.175182  Precision : 0.096000  Recall : 1.000000   __label__baking
F1-Score : 0.150432  Precision : 0.081333  Recall : 1.000000   __label__food-safety
F1-Score : --------  Precision : --------  Recall : 0.000000   __label__substitutions
F1-Score : --------  Precision : --------  Recall : 0.000000   __label__equipment
F1-Score : --------  Precision : --------  Recall : 0.000000   __label__bread
F1-Score : --------  Precision : --------  Recall : 0.000000   __label__chicken
F1-Score : --------  Precision : --------  Recall : 0.000000   __label__storage-method
F1-Score : --------  Precision : --------  Recall : 0.000000   __label__eggs
F1-Score : --------  Precision : --------  Recall : 0.000000   __label__meat
F1-Score : --------  Precision : --------  Recall : 0.000000   __label__sauce
F1-Score : --------  Precision : --------  Recall : 0.000000   __label__cake
F1-Score : --------  Precision : --------  Recall : 0.000000   __label__flavor
F1-Score : --------  Precision : --------  Recall : 0.000000   __label__freezing
...

Nearest Neighbor

To get the nearest neighbor words for a given term use the nn API:

var text = "your query term"; // term to look up
fastText.loadnn()
.then(done => {
    return fastText.nn(text);
})
.then(labels => {
    console.log("Nearest Neighbor\n", JSON.stringify(labels, null, 2));
})
.catch(error => {
    console.error("nn error", error);
});

Tools

We provide some support tools and scripts.

Confusion Matrix

To evaluate the Confusion Matrix of a model, please use the confusion.sh script provided in the tools/ folder. The script requires sklearn and matplotlib installed on the system:

$ cd tools/
$ ./confusion.sh 
Usage: ./confusion.sh DATASET_FILE MODEL_FILE [LABEL_COLUMS, def:1] [LABEL_NORMALIZED, default|normalize, def:default] [PLOT, 0|1, def:0]

You must specify the dataset file path that was used to train the model and the model file path. If the label column is not the first column, please specify the LABEL_COLUMN column index. If the dataset must be normalized, because it has a different label prefix or none at all, please use the value normalize:

cd examples/
node train
cd ..
cd tools/
./confusion.sh ../examples/dataset/sms.tsv ../examples/models/sms_model.bin 1 normalize 1

If the dataset already has the fastText label prefix, i.e. __label__, please set the parameter LABEL_NORMALIZED to default:

./confusion.sh ../examples/dataset/sms.tsv ../examples/models/sms_model.bin 1 default

The script will calculate the predictions against the dataset and build the confusion matrix using sklearn:

./confusion.sh ../examples/dataset/sms.tsv ../examples/models/sms_model.bin 1
Platform is Darwin
Normalizing dataset ../examples/dataset/sms.tsv...
Splitting 1 label colums...
Calculating predictions...
Calculating confusion matrix...
Test labels:1324 (sample)
['spam']
labels:{'ham', 'spam'}
Predicted labels:1324 (sample)
['ham']
Accuracy: 0.756797583082
[[1002    0]
 [ 322    0]]
Confusion matrix
[[ 1.  0.]
 [ 1.  0.]]

and visualize it using matplotlib:

[screenshot: confusion matrix plot]

Examples

The examples folder contains several usage examples of FastText.js.

Train

$ cd examples/
$ node train.js 
train [ 'supervised',
  '-input',
  '/var/folders/_b/szqwdfn979n4fdg7f2j875_r0000gn/T/trainfile.csv',
  '-output',
  './data/band_model',
  '-dim',
  10,
  '-lr',
  0.1,
  '-wordNgrams',
  2,
  '-minCount',
  1,
  '-bucket',
  10000000,
  '-epoch',
  5,
  '-thread',
  4 ]
Read 0M words
Number of words:  517
Number of labels: 2
Progress: 100.0%  words/sec/thread: 1853435  lr: 0.000000  loss: 0.681683  eta: 0h0m -14m 
exec:fasttext end.
exec:fasttext exit.
train done.
task:fasttext pid:41311 terminated due to receipt of signal:null

Test

$ cd examples/
$ node test.js 
test [ 'test',
  './data/band_model.bin',
  '/var/folders/_b/szqwdfn979n4fdg7f2j875_r0000gn/T/trainfile.csv',
  1 ]
Number of examples: 18
exec:fasttext end.
exec:fasttext exit.
test done.
task:fasttext pid:41321 terminated due to receipt of signal:null

Predict

$ cd examples/
$ node predict.js 
TEXT: our twitter run by the band and crew to give you an inside look into our lives on the road .  get #futurehearts now  http //smarturl . it/futurehearts PREDICT: BAND
TEXT: lbi software provides precisely engineered ,  customer-focused #hrtech solutions .  our flagship solution ,  lbi hr helpdesk ,  is a saas #hr case management product .  PREDICT: ORGANIZATION

Run a Prediction Server

We can run a model and serve predictions via a simple Node HTTP API. First, download the example hosted models:

cd examples/models
./models.sh

We now run the server.js example, which creates an HTTP server. Export the MODEL env var pointing to the local pretrained model, and an optional PORT where the server will listen (the default PORT is 3000):

$ cd examples/
$ export MODEL=models/lid.176.ftz
$ node server.js 
model loaded
server is listening on 3000

Try the server API by passing a text parameter:

http://localhost:3000/?text=LBi%20Software%20provides%20precisely%20engineered,%20customer-focused%20#HRTECH

http://localhost:3000/?text=Our%20Twitter%20run%20by%20the%20band%20and%20crew%20to%20give%20you%20an%20inside%20look%20into%20our%20lives%20on%20the%20road.%20Get%20#FutureHearts

The server API responds in JSON format:

{
	"response_time": 0.001,
	"predict": [{
			"label": "BAND",
			"score": "0.5"
		},
		{
			"label": "ORGANIZATION",
			"score": "0.498047"
		}
	]
}

Run a Prediction CLI

To run a prediction command line interface, set the MODEL env var to the model file to run and use the example cli script:

cd examples/
export MODEL=models/langid_model.bin
node cli
Loading model...
model loaded.
Welcome to FastText.js CLI
Type exit or CTRL-Z to exit
> hello how are you?
> [ { label: 'EN', score: '0.988627' },
{ label: 'BN', score: '0.000935369' } ]
> das is seher gut!
> [ { label: 'DE', score: '0.513201' },
{ label: 'EN', score: '0.411016' } ]
rien ne va plus
> [ { label: 'FR', score: '0.951547' },
{ label: 'EO', score: '0.00760891' } ]
exit
> model unloaded.

Language Identification Server

In this example we use the compressed fastText language identification model (176 languages) that we host.

cd examples/
export MODEL=./data/lid.176.ftz 
export PORT=9001
node server

and then

http://localhost:9001/?text=%EB%9E%84%EB%9E%84%EB%9D%BC%20%EC%B0%A8%EC%B0%A8%EC%B0%A8%20%EB%9E%84%EB%9E%84%EB%9D%BC\n%EB%9E%84%EB%9E%84%EB%9D%BC%20%EC%B0%A8%EC%B0%A8%EC%B0%A8%20%EC%9E%A5%EC%9C%A4%EC%A0%95%20%ED%8A%B8%EC%9C%84%EC%8A%A4%ED%8A%B8%20%EC%B6%A4%EC%9D%84%20%EC%B6%A5%EC%8B%9C%EB%8B%A4

that will be correctly detected as KO:

{
	"response_time": 0.001,
	"predict": [{
			"label": "KO",
			"score": "1"
		},
		{
			"label": "TR",
			"score": "1.95313E-08"
		}
	]
}

Train Language Identification Model

We train a language identification model from scratch. Please see the examples/models folder.

Training set and Test set format

The trainFile and testFile are TSV or CSV files where the first column is the label and the second column is the text sample. FastText.js will try to normalize the dataset to the FastText format using the FastText.prepareDataset method. You do not have to call this method explicitly; FastText.js will do it for you. For more info see here.
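For example, a hypothetical two-row TSV training file (label, a tab, then the text), using the ham/spam labels of the SMS example dataset:

ham	See you at the cinema tonight then
spam	WINNER! You have been selected for a free prize, call now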

Datasets

We host some example datasets in order to train, test and predict FastText models on the fly. For more info on how to download and work with the datasets, please see the examples/datasets folder.

Models

We host some example pretrained models. For more info on how to download and work with the pretrained models, please see the examples/models folder.

Supported Platforms

In this release FastText.js comes with precompiled binaries for Linux, macOS and Windows in the lib/bin/ folder. The Windows version is a 64-bit build; it requires the Visual C++ Redistributable for Visual Studio 2015 components. See here for more info about the Windows version.

External Binary

To use an external binary version use the bin option to specify the executable absolute path:

var fastText = new FastText({
    bin: '/usr/local/bin/fasttext',
    loadModel: DATA_ROOT + '/sms_model.bin' // must specify filename and extension
});

An executable not found in path error will be thrown if no executable is found at the specified path.
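Since the check happens when the instance is created (see the Windows issue below), a defensive sketch simply wraps the constructor; the paths are hypothetical:

var fastText;
try {
    fastText = new FastText({
        bin: '/usr/local/bin/fasttext', // hypothetical absolute path
        loadModel: './sms_model.bin'
    });
} catch (error) {
    // thrown when no executable exists at the given path
    console.error("FastText binary not found:", error);
}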

How It Works

Precompiled binaries run FastText natively. A Node child_process spawn forks a new FastText native process that runs at OS speed; FastText.js manages the state, the errors and the output of the process and exposes them to the JavaScript API.
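The following is a simplified illustration of this mechanism, not the library's actual implementation; it spawns a fastText binary with child_process and forwards its output, assuming a fasttext executable in the PATH and a hypothetical model file:

var spawn = require('child_process').spawn;

// "predict <model> -" makes fastText read samples from stdin
var task = spawn('fasttext', ['predict', './band_model.bin', '-']);

task.stdout.on('data', function(data) {
    console.log('prediction:', data.toString());
});
task.stderr.on('data', function(data) {
    console.error(data.toString());
});
task.on('exit', function(code) {
    console.log('fasttext exited with code', code);
});

task.stdin.write('hello how are you?\n');
task.stdin.end();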

Disclaimer

For more info about FastText and FastText license see here.

Acknowledgments

I thank the following devs who helped me improve FastText.js.

fasttext.js's People

Contributors

j-waters, kanghyeongmin, liquid36, loretoparisi, microaijp, tk-pietsch, woozyking


fasttext.js's Issues

Fail to load WASM error

Heya, I am currently facing this error when calling FastText.loadWASM():

TypeError: Failed to parse URL from /home/brendan/internal-sourcing-tool/node_modules/fasttext.js/lib/wasm/fasttext_wasm.wasm
/home/brendan/internal-sourcing-tool/node_modules/fasttext.js/lib/wasm/fasttext_wasm.js:230
      throw ex;
      ^

RuntimeError: abort(TypeError: Failed to parse URL from /home/brendan/internal-sourcing-tool/node_modules/fasttext.js/lib/wasm/fasttext_wasm.wasm) at Error
    at jsStackTrace (/home/brendan/internal-sourcing-tool/node_modules/fasttext.js/lib/wasm/fasttext_wasm.js:1937:19)
    at stackTrace (/home/brendan/internal-sourcing-tool/node_modules/fasttext.js/lib/wasm/fasttext_wasm.js:1954:16)
    at process.abort (/home/brendan/internal-sourcing-tool/node_modules/fasttext.js/lib/wasm/fasttext_wasm.js:1653:44)
    at process.emit (node:events:513:28)
    at emit (node:internal/process/promises:149:20)
    at processPromiseRejections (node:internal/process/promises:283:27)
    at process.processTicksAndRejections (node:internal/process/task_queues:96:32)
    at process.abort (/home/brendan/internal-sourcing-tool/node_modules/fasttext.js/lib/wasm/fasttext_wasm.js:1659:11)
    at process.emit (node:events:513:28)
    at emit (node:internal/process/promises:149:20)
    at processPromiseRejections (node:internal/process/promises:283:27)
    at process.processTicksAndRejections (node:internal/process/task_queues:96:32)

Node.js v18.15.0

Below is my code snippet, following the example given in this issue:

async function main() {
    const modelPath = path.resolve(__dirname, "../model");
    console.log(modelPath + `/300-dim-10-epoch.bin`);
    let FastText = require("fasttext.js");
    const ft =  new FastText({
        loadModel: modelPath + `/300-dim-10-epoch.bin`
    })
    try {
        await ft.loadWASM();
        const vec = ft.getWordVector("hello");
        console.log(vec);
    }catch(err){
        console.log(err);
    }
}

main();

There are no issues with my model directory, and I tried running the snippet with both the npm package and the git clone installation method.

ERROR: Cannot run server

Steps to reproduce

  1. Clone project
  2. Run train
    $ node examples/train.js
  3. Run server
    $ node examples/server.js

Expected behaviour

Server starting with below message

model loaded
server is listening on 3000

Actual behaviour

Fail to start server with below message

events.js:182
      throw er; // Unhandled 'error' event
      ^

Error: write EPIPE
    at exports._errnoException (util.js:1024:11)
    at WriteWrap.afterWrite [as oncomplete] (net.js:851:14)

I will fix and make a PR.

ERROR: Cannot install via Docker

Steps to reproduce

  1. Clone project
  2. Build the docker image
    $ docker build -t fasttext.js .
  3. Run server by docker
    $ docker run --rm -it -p 3000:3000 fasttext.js node fasttext.js/examples/server.js

Expected behaviour

Server starting on docker with below message

model loaded
server is listening on 3000

Actual behaviour

Fail to start server on docker with below message

events.js:183
      throw er; // Unhandled 'error' event
      ^

Error: write EPIPE
    at _errnoException (util.js:1022:11)
    at WriteWrap.afterWrite [as oncomplete] (net.js:880:14)

npm module: loadSentence is not a function

Given a script, after installing fasttext.js with npm:

import FastText from 'fasttext.js';

const ft = new FastText({
        loadModel: '/wiki.simple.vec'
});
console.log(ft);
ft.loadSentence();

My output is

FastText {
  samplesCallbacks: {},
  dataCallbacks: Deque { _capacity: 16, _length: 0, _front: 0 },
  dataErrorCallbacks: Deque { _capacity: 16, _length: 0, _front: 0 },
  onExitCallbacks: Deque { _capacity: 16, _length: 0, _front: 0 },
  dataAppendCallback: null,
  onErrorDataAppendCallback: null,
  _options: {
    bin: '.../node_modules/fasttext.js/lib/bin/darwin/fasttext',
    child: { detached: false },
    debug: false,
    preprocess: true,
    trainFile: '',
    testFile: '',
    serializeTo: '',
    loadModel: '/wiki.simple.vec',
    train: {
      wordNgrams: 2,
      minCount: 1,
      minCountLabel: 1,
      minn: 3,
      maxn: 6,
      t: 0.0001,
      bucket: 10000000,
      dim: 10,
      lr: 0.1,
      ws: 5,
      loss: 'ns',
      lrUpdateRate: 100,
      epoch: 5,
      thread: 5
    },
    trainIncremental: false,
    test: { precisionRecall: 1, verbosity: 2 },
    predict: { mostlikely: 2, verbosity: 2, normalize: true }
  },
  exec: [Function (anonymous)],
  send: [Function (anonymous)],
  sendEOF: [Function (anonymous)],
  learn: [Function (anonymous)]
}
file:///.../test.js:7
ft.loadSentence();
   ^

TypeError: ft.loadSentence is not a function
    at file:///.../test.js:7:4
    at ModuleJob.run (node:internal/modules/esm/module_job:197:25)
    at async Promise.all (index 0)
    at async ESMLoader.import (node:internal/modules/esm/loader:337:24)
    at async loadESM (node:internal/process/esm_loader:88:5)
    at async handleMainPromise (node:internal/modules/run_main:61:12)

I do see that loadSentence is added to the prototype in the lib/index.js file on GitHub, but it's not in the node_modules version.


Executable not found in path on Windows

Hi !

Node version: 12.13.1
OS: Windows 10

I'm trying to create a simple fastText model for text classification, but the lib is not working correctly on Windows. As soon as I add this code:

const fastText = new FastText({
      serializeTo: './models/fastText',
      trainFile: './trainingData.txt'
});

I have the error:

Error: executable not found in path
    at new FastText (C:\Users\madelpech\[...]\node_modules\fasttext.js\lib\index.js:104:23)

I have looked to the code and in the index.js file line 99 you are calling this._options.bin = Util.GetBinFolder('fasttext');. I have checked the binaries and it appears that the Windows binary is not named fasttext but fastText.exe instead. This should be the issue here.

Can you have a look and tell me if I am right?

Thanks!

bug in FastText.prototype.nn()

fasttext.js/lib/index.js

FastText.prototype.nn = function (data) {
: 
  self.dataAppendCallback = onDataCallback;
}

↓

FastText.prototype.nn = function (data) {
: 
  self.dataAppendCallback = onDataAppendCallback;
}

INFO: Could not find files for the given pattern(s).

I'm trying to run the module on Windows. When I try to initialize it like so:

const FastText = require('fasttext.js');

const fastText = new FastText({
    serializeTo: './model',
    trainFile: './train.txt'
});

I receive the error INFO: Could not find files for the given pattern(s).
Yes, I definitely have a train.txt file in the same folder as index.js (where I execute this code). I even tried using path.join(__dirname, '/train.txt'), but I got the same result.

Using in the browser with WASM

Are the uploaded binaries for WASM usable in the browser? I tried following https://fasttext.cc/docs/en/webassembly-module.html#build-a-webpage-that-uses-fasttext to load fasttext using the provided binaries but I got import errors:

<!DOCTYPE html>
<html>
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1, minimum-scale=1.0, maximum-scale=1.0, user-scalable=no">
</head>
<body>
    <script type="module">
        import {FastText, addOnPostRun} from "./fasttext.js";

        addOnPostRun(() => {
            let ft = new FastText();
            console.log(ft);
        });

    </script>
</body>
</html>

Uncaught SyntaxError: import not found: addOnPostRun in the browser.

Nearest Neighbor sometimes returns only 1 result

Using nearest neighbor, every once in a while locally (and always when hosted on beanstalk with docker), the result set will only contain a single result. Running the same request again may return the full result set.

I identified the issue as this line self.dataAppendCallback = onDataCallback; where it should be onAppendDataCallback instead.

I cannot do a pull request at the moment, but may be able to in the future.

Load into Memory

Dear @loretoparisi
I installed your fasttext.js in order to solve the memory problem that we discussed in facebookresearch/fastText#276 (comment).

Now when I run node fasttext_predict.js it takes about 5 seconds to load the module:

"use strict";

(function() {

var DATA_ROOT='./data';

var FastText = require('./fasttext.js/lib/index');
var fastText = new FastText({
    loadModel: DATA_ROOT + '/model_gender.bin' // must specify filename and ext
});

var sample="Bashar Al Masri";
fastText.load()
.then(done => {
    return fastText.predict(sample);
})
.then(labels=> {
    console.log("TEXT:", sample, "\nPREDICT:",labels );
    sample="Hisahm al mjude";
    return fastText.predict(sample);
})
.then(labels=> {
    console.log("TEXT:", sample, "\nPREDICT:",labels );
   fastText.unload();
})
.catch(error => {
    console.error("predict error",error);
});

}).call(this);

It returns the prediction to stdout and exits, due to fastText.unload(). Now I need to call this file as "node fasttext_predict.js UserName" from any place, passing some args [UserName] to it, and have it return the result directly to stdout, since you said it would be loaded into memory, so that I can get the result from the PHP webserver.

It is the same problem as with the C++ loading: I need it to run in the background!
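Not an official answer, but one way to keep the model resident is to load it once and read input lines in a loop instead of exiting, similar to the examples/cli script; a sketch reusing the fastText instance from the snippet above:

var readline = require('readline');
var rl = readline.createInterface({ input: process.stdin });

fastText.load()
.then(done => {
    rl.on('line', function(name) {
        // predict each incoming name with the already-loaded model
        fastText.predict(name)
        .then(labels => console.log(name, labels));
    });
    rl.on('close', function() {
        fastText.unload(); // unload only when stdin ends
    });
})
.catch(error => {
    console.error("predict error", error);
});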

Node installing problem

Hello :)

Once I want to run the server file with node server.js, I get this error:


events.js:183
      throw er; // Unhandled 'error' event
      ^

Error: write EPIPE
    at _errnoException (util.js:1022:11)
    at WriteWrap.afterWrite [as oncomplete] (net.js:880:14)

Please advise.
node -v
v8.11.1
+++++++++++++++++++
npm -v
5.6.0



Trying to get in touch regarding a security issue

Hey there!

I'd like to report a security issue but cannot find contact instructions on your repository.

If not a hassle, might you kindly add a SECURITY.md file with an email, or another contact method? GitHub recommends this best practice to ensure security issues are responsibly disclosed, and it would serve as a simple instruction for security researchers in the future.

Thank you for your consideration, and I look forward to hearing from you!

(cc @huntr-helper)

Move repo to a GitHub *org*

Have you considered moving to an org repo?

I have github.com/fasttext. We can do steps like this:

  1. I can make you admin.

  2. I remove myself as admin.

  3. You can transfer this repo there.

  4. You can rename this repo js.

Then it will be at github.com/fasttext/js.

Other related tools can go in other repos, for example fastText in an AWS Lambda.

Is there any plan for StarSpace :)

Dear loretoparisi,

I loved your FastText project, and I believe you could aim at StarSpace and load it into memory :) Is there any plan to simplify StarSpace, since I feel it is not as clear as fastText? There are no clear docs and examples.

So, I have big faith in your style, and I hope to see a step from you in this regard.

With Respect

npm?

Firstly, bravo and thank you for this.

I think that for this lib to gain momentum, which will ultimately benefit all users, it would be good to put it on npm.

Is it in the plans? Is there any reason not to?

calculate distance feature?

Maybe add an API to calculate vector distance:

fasttext print-word-vectors trainresult.bin < queries.txt

ไฝๅฎ… -0.3543 -0.36086 -0.1972 -0.48346 -0.4279 0.084653 -0.74038 -0.77876 -0.69068 -0.42149 0.41304 0.9636 -0.11907 -0.081701 0.27681 -0.15278 -0.17322 -0.27368 -0.69611 0.42335 0.11701 -0.43995 0.1868 0.38824 0.42387 0.46397 0.38974 -0.59129 0.69363 0.26292 -0.36955 -0.27438 1.0732 0.0046569 -0.39709 0.44935 0.67039 -0.39564 -0.080179 0.0036072 -0.48187 -0.66577 0.27598 -0.54607 1.0294 -0.29769 0.52144 -0.044384 0.15926 -1.0104 0.80332 -0.60356 0.40641 -0.039965 0.41868 -0.0072699 0.069652 -0.12544 -0.30716 0.21804 -0.36222 -0.51133 -0.24029 -0.7333 0.26404 -0.30949 -0.17224 -0.52331 -1.1139 -0.26803 0.4566 0.28051 -0.50781 0.26043 0.11501 0.17622 -0.1344 -0.46 0.00035005 0.13337 0.50925 -0.82658 0.32135 -0.33323 0.75423 -0.60863 0.42117 0.35665 -0.17826 -0.82987 0.53353 -0.12717 -0.46963 0.15568 0.4642 -0.16868 -0.18377 0.65137 -0.0067536 1.4116

别墅 0.00094935 0.0073073 -0.00094808 -0.0010876 0.0012463 0.0014312 -0.0026107 0.0041731 0.0024454 -0.00093893 0.0045996 0.00050681 -0.00040101 0.0015428 0.0065499 -0.0007207 -0.0022505 -0.0046939 0.0039677 0.0047148 -0.0031379 0.0042863 -0.0056759 -0.0031934 0.0037867 0.006272 0.0050499 -0.0022674 0.0062237 0.00062629 0.0033722 -0.0027245 0.0016423 -0.0037467 -0.00014838 -0.0048198 0.0043823 0.002268 -0.00093589 -0.0034395 -0.0021894 0.0013966 -0.0010953 -0.00073448 0.0012601 0.00037782 -0.0012559 -0.00079777 0.0022461 -0.00085852 -0.001242 0.0039883 0.0017836 -0.00036524 -0.0013768 -0.0036831 0.0023176 0.0027225 0.0010305 0.0020299 0.00057907 -3.4135e-05 0.0029027 -0.00064469 7.3418e-05 -0.0051284 -0.0001829 -0.004983 -0.0024 -0.002313 -2.4026e-05 0.0068082 -0.0062092 0.0045259 -0.0023891 0.0015408 0.00077602 -0.0024638 0.0056508 0.0036942 -0.00089141 -0.0031128 -0.0040772 -0.00063497 -0.006542 -0.0016326 0.002223 -0.0040703 -3.8115e-05 -0.0020506 -0.003437 0.0037226 -0.0062743 0.00098213 0.00030893 0.0013302 -0.002533 0.0038249 -0.0050515 0.0025223
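FastText.js exposes no distance API; a plain JavaScript sketch of cosine distance over two vectors like the ones printed above:

// cosine distance between two equal-length vectors
function cosineDistance(a, b) {
    var dot = 0, na = 0, nb = 0;
    for (var i = 0; i < a.length; i++) {
        dot += a[i] * b[i];
        na += a[i] * a[i];
        nb += b[i] * b[i];
    }
    return 1 - dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// usage with two hypothetical (truncated) word vectors
var v1 = [-0.3543, -0.36086, -0.1972];
var v2 = [0.00094935, 0.0073073, -0.00094808];
console.log(cosineDistance(v1, v2));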

Question : usage of pretrainedVectors?

When I try to train a text classifier using pretrainedVectors: './wiki.da.vec', the resulting model has a size of 2.79 GB and training takes more time than usual.

var fastText = new FastText({
        serializeTo: './output_model',
        trainFile: './input.txt',
        debug: true,
        train: {
            dim: 300,
            pretrainedVectors: './wiki.da.vec'
        }
    });

I did the same with the fasttext Python library:
model = fasttext.train_supervised('input.txt', dim=300, pretrainedVectors='wiki.da.vec',verbose=4)

and I get a model of size 300+ MB that gives proper classification.

Did I miss anything while using fasttext.js?

async/await version

This would be easy to do with promisify in newer versions of Node.

const {promisify} = require('util');
exports.exists = promisify(train);
...

Personally I far prefer async/await to the callback syntax.
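Note that the train, test, load and predict calls shown in this README already return promises, so they can be awaited directly; a sketch with hypothetical paths:

async function run() {
    var fastText = new FastText({
        trainFile: './band_train.txt',
        testFile: './band_test.txt',
        serializeTo: './band_model'
    });
    await fastText.train();
    var evaluation = await fastText.test();
    console.log("train & test done.", evaluation);
}

run().catch(console.error);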

Is this library outdated?

It looks like the last commit was last year, and I am unable to run this library. My Node version is 21.5.0, on Windows 10, and anything I try gives me this error:

TypeError: fetch failed
C:\Users\d.malugin\Desktop\2Di\tgCrowler\node_modules\fasttext.js\lib\wasm\fasttext_wasm.js:230
      throw ex;
      ^

RuntimeError: abort(TypeError: fetch failed) at Error
    at jsStackTrace (C:\Users\d.malugin\Desktop\2Di\tgCrowler\node_modules\fasttext.js\lib\wasm\fasttext_wasm.js:1937:19)
    at stackTrace (C:\Users\d.malugin\Desktop\2Di\tgCrowler\node_modules\fasttext.js\lib\wasm\fasttext_wasm.js:1954:16)
    at process.abort (C:\Users\d.malugin\Desktop\2Di\tgCrowler\node_modules\fasttext.js\lib\wasm\fasttext_wasm.js:1653:44)
    at process.emit (node:events:519:28)
    at emit (node:internal/process/promises:150:20)
    at processPromiseRejections (node:internal/process/promises:284:27)
    at process.processTicksAndRejections (node:internal/process/task_queues:96:32)
    at process.abort (C:\Users\d.malugin\Desktop\2Di\tgCrowler\node_modules\fasttext.js\lib\wasm\fasttext_wasm.js:1659:11)
    at process.emit (node:events:519:28)
    at emit (node:internal/process/promises:150:20)
    at processPromiseRejections (node:internal/process/promises:284:27)
    at process.processTicksAndRejections (node:internal/process/task_queues:96:32)

So the question is: is this library still supported?

Different prediction for the same keyword same model

Dear author,

Thank you for this wonderful Node add-on. I have a strange problem.

When I test direct predictions with fastText, I mean without Node, I have no problem.

But when I pass the same keyword to the Node server, each time I get a different label with a different accuracy.

wget -qO- http://localhost:3030/?text=beshoo

Each time I send this URL I get a different label.

Regards

I cannot run this project on Apple M1 silicon

This is Train.js:

[screenshot: Train.js]

This is 100_normal_stemming_train.tsv:

[screenshot: training file]

This is the running result:

[screenshot: run output]

When I try to test the dataset with the model 100_normal_stemming.bin trained by the fasttext Python package, it shows as below:

[screenshot: Python test output]

Time for scale up, fastText.js with Redis and clustering mode

Node.js Cluster Module
Node.js implements the core cluster module, allowing applications to run on more than one core.

With the cluster module, a parent/master process can fork any number of child/worker processes and communicate with them by sending messages over IPC.

Moreover, we could add a caching layer with Redis, which is a more complex version of Memcached.

Redis serves and modifies data in the server's main memory, so the system quickly retrieves the data it needs. It also reduces the time to open web pages and makes your site faster. Redis helps improve load performance.

Using Redis we can store a cache using SET and GET; besides that, Redis can also work with complex data types like Lists, Sets, ordered data structures, and so forth.

And we can use Node.js clustering with PM2.

PM2 is a production process manager for Node.js applications with a built-in load balancer. It allows you to keep applications alive forever, to reload them without downtime and to facilitate common system admin tasks. One of its nicer features is automatic use of Node's Cluster API. PM2 gives your application the ability to run as multiple processes, without any code modifications.

I hope this may interest you to add these wonderful methods to this wonderful API.

Source: https://goo.gl/MMDe3m
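A sketch of the caching idea with the redis client's GET and SET; this is not part of FastText.js, the key layout is hypothetical, and fastText is assumed to be an already-loaded instance:

var redis = require('redis');
var client = redis.createClient();

function cachedPredict(text) {
    return new Promise(function(resolve, reject) {
        client.get('predict:' + text, function(err, cached) {
            if (cached) return resolve(JSON.parse(cached)); // cache hit
            fastText.predict(text)
            .then(function(labels) {
                client.set('predict:' + text, JSON.stringify(labels)); // fill cache
                resolve(labels);
            })
            .catch(reject);
        });
    });
}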
