n_gram's Introduction

Info

A simple tool for training n-gram language models. Inspired by this course.

Usage

use n_gram::*;

fn main() {
    // Initializing model
    let config = Config::default();
    let mut model = Model::new(config);

    // Load and tokenize the corpus (`tiny_corpus` requires the `corpus` feature)
    let corpus = tiny_corpus()
        .iter()
        .map(|t| sos(eos(tokenize(t.to_owned()))))
        .collect::<Vec<_>>();

    model.train(corpus);

    // Now you are ready to generate something
    let mut tokens = sos(tokenize("The quick".to_owned()));
    let max = 10; // max number of generated tokens
    model.generate(&mut tokens, max);

    // Save model (saving and loading require the `saveload` feature)
    model.save("model.json").unwrap();

    // Reset model
    model.reset();

    // Load model back
    model.load("model.json").unwrap();
}
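
The `tokenize`, `sos`, and `eos` helpers used above are not described in this README. As a rough illustration of what such preprocessing could look like, here is a minimal sketch that splits text on whitespace and wraps it in the `__sos__` / `__eos__` markers visible in the generated examples. The function names `tokenize_sketch`, `add_sos`, and `add_eos` are hypothetical, and the crate's real tokenizer may split text differently.

```rust
// Marker tokens as they appear in the generated samples in this README.
const SOS: &str = "__sos__";
const EOS: &str = "__eos__";

/// Hypothetical tokenizer: splits on whitespace only (an assumption,
/// not necessarily what `n_gram::tokenize` does).
fn tokenize_sketch(text: &str) -> Vec<String> {
    text.split_whitespace().map(str::to_owned).collect()
}

/// Prepends the start-of-sequence marker.
fn add_sos(mut tokens: Vec<String>) -> Vec<String> {
    tokens.insert(0, SOS.to_owned());
    tokens
}

/// Appends the end-of-sequence marker.
fn add_eos(mut tokens: Vec<String>) -> Vec<String> {
    tokens.push(EOS.to_owned());
    tokens
}

fn main() {
    // Same nesting order as in the usage snippet: sos(eos(tokenize(..))).
    let tokens = add_sos(add_eos(tokenize_sketch("The quick brown fox")));
    println!("{:?}", tokens);
}
```

Marking sequence boundaries this way lets a model learn which tokens tend to start a text and when a text should stop.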

Examples

I've trained a trigram model on 20,000 samples from the Tiny Stories dataset. Here are some examples of generated text:

  • "__sos__ Once upon a time a mom, a dad, a big sister, and a little girl below shouted, "Look Mama! A talking cloud!" The little girl opened her hand, and the monkey happily ate it all in one day. She was so kind he said yes and showed him the pin. "I poked you with this. It is a storm. The waves were so tall and wide, it seemed like something was calling her to come to an end eventually. They all had an incredible songbird inside. Billy was happy and excited. __eos__"
  • "__sos__ Once upon a time there was a light girl with a basket. She then sent the basket to the washing machine. While the laundry was all hung up, Daisy and her family were getting ready to fly it, it suddenly flew away! The lion felt bad for being rude. He said, "It's my p leasure. It's important to remember to forgive. __eos__"
  • "__sos__ Once upon a time a family lived in a stream with many stones on the ground, it glistened in the sunshine. From that day forth they were always with her and learn with her and waved goodbye to Mommy. The bus driver was happy and flew away happily. Timmy felt proud of their pictures. __eos__"
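
To make the idea behind a trigram model concrete, the sketch below counts how often each token follows a two-token context and then extends a prompt from those counts. It is not this crate's implementation: the names `train_trigram` and `generate_greedy` are hypothetical, and generation here always takes the most frequent continuation (greedy, with ties broken alphabetically) so that the example stays deterministic and needs no external crates. A real generator would more likely sample from the counts.

```rust
use std::collections::HashMap;

/// Maps a two-token context to counts of the tokens that followed it.
type Counts = HashMap<(String, String), HashMap<String, usize>>;

/// Counts every (w1, w2) -> w3 occurrence in a tokenized corpus.
fn train_trigram(corpus: &[Vec<String>]) -> Counts {
    let mut counts = Counts::new();
    for sentence in corpus {
        for w in sentence.windows(3) {
            *counts
                .entry((w[0].clone(), w[1].clone()))
                .or_default()
                .entry(w[2].clone())
                .or_default() += 1;
        }
    }
    counts
}

/// Appends up to `max` tokens, each time choosing the most frequent
/// continuation of the last two tokens. Stops early on an unseen context.
fn generate_greedy(counts: &Counts, tokens: &mut Vec<String>, max: usize) {
    for _ in 0..max {
        let n = tokens.len();
        if n < 2 {
            break; // a trigram model needs two tokens of context
        }
        let context = (tokens[n - 2].clone(), tokens[n - 1].clone());
        let followers = match counts.get(&context) {
            Some(f) => f,
            None => break,
        };
        // Highest count wins; on a tie the alphabetically smaller token wins.
        let best = followers
            .iter()
            .max_by(|a, b| a.1.cmp(b.1).then_with(|| b.0.cmp(a.0)))
            .map(|(tok, _)| tok.clone());
        match best {
            Some(tok) => tokens.push(tok),
            None => break,
        }
    }
}

fn main() {
    let corpus: Vec<Vec<String>> = [
        "the quick brown fox",
        "the quick brown dog",
        "the quick brown fox",
    ]
    .iter()
    .map(|s| s.split_whitespace().map(str::to_owned).collect())
    .collect();

    let counts = train_trigram(&corpus);
    let mut tokens = vec!["the".to_owned(), "quick".to_owned()];
    generate_greedy(&counts, &mut tokens, 10);
    println!("{}", tokens.join(" ")); // prints "the quick brown fox"
}
```

Because each token depends only on the previous two, the output is locally fluent but drifts between topics, which matches the character of the samples above.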

Installation

cargo add n_gram

If you want to save & load your models:

cargo add n_gram --features=saveload

If you want to load the tiny corpus for training:

cargo add n_gram --features=corpus

Contributors

georgiyozhegov
