

go-vsm


Vector Space Model implementation in Go.

This package provides document search based on the algebraic Vector Space Model, using TF-IDF as the weighting scheme.
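As a rough illustration of TF-IDF weighting, the sketch below computes term weights for the three sample sentences used later in this README. It is not taken from go-vsm: the `tokenize` and `tfidf` helpers are hypothetical, and the textbook variant shown here (raw term count times `log10(N/df)`) is an assumption, since the exact formula the package uses is not stated here.

```go
package main

import (
	"fmt"
	"math"
	"strings"
	"unicode"
)

// tokenize lowercases a sentence and splits it on every rune that is
// neither a letter nor a digit. Hypothetical helper, not part of go-vsm.
func tokenize(s string) []string {
	return strings.FieldsFunc(strings.ToLower(s), func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r)
	})
}

// tfidf returns one term->weight map per document, assuming
// weight = tf * log10(N/df), where tf is the raw count of the term in
// the document and df is the number of documents containing the term.
func tfidf(docs []string) []map[string]float64 {
	df := map[string]int{}
	counts := make([]map[string]int, len(docs))
	for i, d := range docs {
		counts[i] = map[string]int{}
		for _, t := range tokenize(d) {
			counts[i][t]++
		}
		for t := range counts[i] {
			df[t]++
		}
	}

	n := float64(len(docs))
	weights := make([]map[string]float64, len(docs))
	for i, c := range counts {
		weights[i] = map[string]float64{}
		for t, tf := range c {
			weights[i][t] = float64(tf) * math.Log10(n/float64(df[t]))
		}
	}
	return weights
}

func main() {
	w := tfidf([]string{
		"Shipment of gold damaged in a fire.",
		"Delivery of silver arrived in a silver truck.",
		"Shipment of gold arrived in a truck.",
	})
	fmt.Printf("silver in d2: %.4f\n", w[1]["silver"]) // 0.9542
	fmt.Printf("gold in d1:   %.4f\n", w[0]["gold"])   // 0.1761
	fmt.Printf("of in d1:     %.4f\n", w[0]["of"])     // 0.0000
}
```

Terms that appear in every document, such as "of", get a weight of zero under this variant, so they do not influence the ranking.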

Usage

import "github.com/quan-to/go-vsm/vsm"

Construct a VSM object and use its methods for training:

vs := vsm.New(nil)

docs := []vsm.Document{
        {
                Sentence: "Shipment of gold damaged in a fire.",
                Class:    "d1",
        },
        {
                Sentence: "Delivery of silver arrived in a silver truck.",
                Class:    "d2",
        },
        {
                Sentence: "Shipment of gold arrived in a truck.",
                Class:    "d3",
        },
        // ...
}


// Static training
for _, doc := range docs {
        if err := vs.StaticTraining(doc); err != nil {
                // Error occurred during training.
        }
}

Dynamic Training

Static training is executed once, and for most cases it's enough:

docs := []vsm.Document{
        {
                Sentence: "Shipment of gold damaged in a fire.",
                Class:    "d1",
        },
        {
                Sentence: "Shipment of gold arrived in a truck.",
                Class:    "d3",
        },
}

vs := vsm.New(nil)

for _, doc := range docs {
        if err := vs.StaticTraining(doc); err != nil {
                fmt.Println(err)
        }
}

But if you've got a stream of data and need a more reactive behaviour for the training process, the dynamic training might be the best choice:

docCh := make(chan vsm.Document)

go func() {
        defer close(docCh)

        // Loads document from some source dynamically
        // and sends it to the training channel.
        docCh <- vsm.Document{
                Sentence: "Delivery of silver arrived in a silver truck.",
                Class:    "d2",
        }
}()

trainCh := vs.DynamicTraining(context.Background(), docCh)

// Checks whether an error occurred during the training process.
// The loop ends when trainCh is closed, i.e. all training data was consumed.
for res := range trainCh {
        if res.Err != nil {
                // Handles error.
        }
}
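To make the shape of this channel-based pattern easier to follow, here is a self-contained sketch that runs without the library. Everything in it is a hypothetical stand-in: `Document`, `Result`, `train` and `run` are illustration-only names, the "empty sentence" error is made up, and none of it claims to mirror how go-vsm implements DynamicTraining internally.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// Document and Result are hypothetical stand-ins for illustration only;
// they are not the go-vsm types.
type Document struct{ Sentence, Class string }

type Result struct {
	Doc Document
	Err error
}

// train consumes documents until in is closed or ctx is cancelled. It
// emits one Result per document and closes the returned channel when done.
func train(ctx context.Context, in <-chan Document) <-chan Result {
	out := make(chan Result)
	go func() {
		defer close(out)
		for {
			select {
			case <-ctx.Done():
				return
			case doc, ok := <-in:
				if !ok {
					return // producer closed the channel
				}
				res := Result{Doc: doc}
				if doc.Sentence == "" {
					res.Err = errors.New("empty sentence") // made-up failure case
				}
				select {
				case out <- res:
				case <-ctx.Done():
					return
				}
			}
		}
	}()
	return out
}

// run streams docs through train and counts successes and failures.
func run(docs []Document) (trained, failed int) {
	docCh := make(chan Document)
	go func() {
		defer close(docCh)
		for _, d := range docs {
			docCh <- d
		}
	}()
	for res := range train(context.Background(), docCh) {
		if res.Err != nil {
			failed++
			continue
		}
		trained++
	}
	return trained, failed
}

func main() {
	trained, failed := run([]Document{
		{Sentence: "Delivery of silver arrived in a silver truck.", Class: "d2"},
		{Class: "empty"},
	})
	fmt.Printf("trained=%d failed=%d\n", trained, failed) // trained=1 failed=1
}
```

Closing the input channel is what lets the consumer loop finish, which is why the producer goroutine defers the close.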

Search

Search applies the Vector Space Model, comparing the angle between the query vector and each document vector:

doc, err := vs.Search("gold silver truck.")

fmt.Println(doc.Class, err)
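The angle comparison described above is commonly expressed as cosine similarity. The sketch below shows that idea on sparse term vectors; it is an assumption about the general technique rather than go-vsm's actual code, the `cosine` helper is hypothetical, and the weights are hand-picked for readability instead of being real TF-IDF values.

```go
package main

import (
	"fmt"
	"math"
)

// cosine returns the cosine of the angle between two sparse vectors
// stored as term->weight maps. A value of 1 means the vectors point in
// the same direction, 0 means they share no terms. Hypothetical helper.
func cosine(a, b map[string]float64) float64 {
	var dot, normA, normB float64
	for term, x := range a {
		dot += x * b[term] // missing terms read as 0
		normA += x * x
	}
	for _, y := range b {
		normB += y * y
	}
	if normA == 0 || normB == 0 {
		return 0 // an empty vector has no direction
	}
	return dot / (math.Sqrt(normA) * math.Sqrt(normB))
}

func main() {
	// Hand-picked weights, for illustration only.
	query := map[string]float64{"gold": 1, "truck": 1}
	docs := []struct {
		class string
		vec   map[string]float64
	}{
		{"d1", map[string]float64{"gold": 1, "fire": 1}},
		{"d3", map[string]float64{"gold": 1, "truck": 1, "arrived": 1}},
	}

	best, bestScore := "", -1.0
	for _, d := range docs {
		score := cosine(query, d.vec)
		fmt.Printf("%s: %.4f\n", d.class, score) // d1: 0.5000, then d3: 0.8165
		if score > bestScore {
			best, bestScore = d.class, score
		}
	}
	fmt.Println("best match:", best) // best match: d3
}
```

Because the score depends only on direction, a long document and a short one with the same term proportions rank equally against a query.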

Testing

Go to the vsm folder and run:

go test -v -cover

The package also supports running tests from a file:

go test -v -fromfile

The -fromfile flag runs the tests against the testdata/training.json file.

If you want to specify another testing file:

go test -v -fromfile -filename="training-2.json"

The -filename flag should point to a file inside the testdata folder. See the training.json file for details on its format.

LICENSE - MIT

see LICENSE
