nanosearch

A tiny search engine.

Suitable for in-browser use, this provides n-gram based search results.
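
To give a feel for what "n-gram based" can mean, here is a minimal sketch of edge n-grams (word prefixes). The helper names, the minimum/maximum gram lengths, and the choice of edge n-grams at all are assumptions made for illustration; the library's actual tokenizer may work differently.

```javascript
// Hypothetical helpers for illustration; not nanosearch's documented API.

// Return the "edge" n-grams (prefixes) of a word, from minLength up to maxLength.
function edgeNgrams(word, minLength = 3, maxLength = 6) {
  const grams = [];
  const longest = Math.min(word.length, maxLength);
  for (let size = minLength; size <= longest; size++) {
    grams.push(word.slice(0, size));
  }
  return grams;
}

// Lowercase the text, replace non-alphanumerics with spaces, split into words,
// then expand every word into its edge n-grams.
function toNgrams(text) {
  const words = text
    .toLowerCase()
    .replace(/[^a-z0-9\s]/g, " ")
    .split(/\s+/)
    .filter(Boolean);
  return words.flatMap((word) => edgeNgrams(word));
}

console.log(edgeNgrams("dogs")); // [ 'dog', 'dogs' ]
console.log(toNgrams("my dog")); // [ 'dog' ]
```

Under these assumptions, "dog" and "dogs" share the gram "dog", which is one plausible way a query for "my dog" could also match a document containing "Dogs".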

Quickstart

import { SearchEngine } from '@toastdriven/nanosearch';

// Create a search engine.
const engine = new SearchEngine();

// Index some documents.
// First parameter is the unique document ID, second is the document text.
engine.add("abc", "The dog is a 'hot dog'.");
engine.add("def", "Dogs > Cats");
engine.add("ghi", "the quick brown fox jumps over the lazy dog");
engine.add("jkl", "Am I lazy, or just work smart?");

// Then, you can let the user search on the engine...
let myDogResults = engine.search("my dog");
myDogResults.count(); // 3

for(let res of myDogResults.iterator()) {
  console.log(res.docId); // ex: "def"
  console.log(res.score); // ex: 0.2727272727272727
}

// ...including limiting results (to just one)...
let lazyResults = engine.search("lazy");
let topResult = lazyResults.at(0);
console.log(topResult);

// ...or making pages of ten results!
let dogResults = engine.search("dogs");
let pageOne = dogResults.slice(0, 10);
let pageTwo = dogResults.slice(10, 20);
console.log(pageOne);
console.log(pageTwo);

Installation

$ npm install @toastdriven/nanosearch

Requirements

  • ES6 (or similar translation/polyfill)

Tests

$ git clone git@github.com:toastdriven/nanosearch.git
$ cd nanosearch
$ npm install
$ npm test

Docs

$ git clone git@github.com:toastdriven/nanosearch.git
$ cd nanosearch
$ npm install
$ ./node_modules/.bin/jsdoc -r -d ~/Desktop/out --package package.json --readme README.md src

License

New BSD

nanosearch's Issues

LocalStorage and/or IndexedDB support

Being able to reuse the index between page refreshes would be ideal, via some kind of local storage within the user's browser.

LocalStorage is easy, but has size restrictions. IndexedDB lacks those size restrictions, but I've never poked at it.
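
A rough sketch of what the LocalStorage route might look like. Everything here is hypothetical: nanosearch does not currently expose persistence hooks, and the index is assumed to be a plain JSON-serializable object. The `storage` argument is anything with the Web Storage interface (`getItem`/`setItem`), such as `window.localStorage`.

```javascript
// Hypothetical helpers; the index is assumed to be JSON-serializable.

// Save the index, returning false if the browser refuses (e.g. quota exceeded).
function saveIndex(storage, key, index) {
  try {
    storage.setItem(key, JSON.stringify(index));
    return true;
  } catch (err) {
    // setItem throws when the storage size limit is hit.
    return false;
  }
}

// Load the index, returning null if nothing is stored or the data is corrupt.
function loadIndex(storage, key) {
  const raw = storage.getItem(key);
  if (raw === null) return null;
  try {
    return JSON.parse(raw);
  } catch {
    return null;
  }
}

// A tiny in-memory stand-in so the sketch also runs outside a browser.
function createMemoryStorage() {
  const data = new Map();
  return {
    getItem: (key) => (data.has(key) ? data.get(key) : null),
    setItem: (key, value) => { data.set(key, String(value)); },
  };
}

const storage = createMemoryStorage(); // in a browser: window.localStorage
saveIndex(storage, "nanosearch-index", { dog: { abc: 2, ghi: 1 } });
console.log(loadIndex(storage, "nanosearch-index")); // { dog: { abc: 2, ghi: 1 } }
```

Returning `false` on a failed save keeps the size restriction visible to the caller, who could then fall back to rebuilding the index on each page load.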

More/better tests

I cheaped out on this due to time (there's only so much you can do in a single lunch hour!), so there are really only some integration-y tests right now. Actual unit tests for each method of the library would be good.

Version the index

We should include a version number in the index itself, to allow for future handling/upgrades.
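
One possible shape for this, sketched under assumptions: the `version`/`terms` envelope, the idea that today's unversioned index counts as "version 0", and the function names are all hypothetical and not part of the current code.

```javascript
// Hypothetical format: field names and version numbers are assumptions.
const CURRENT_VERSION = 1;

// Each entry upgrades an index from version N to version N + 1.
const migrations = {
  // Version 0: a bare { term: { docId: count } } object with no envelope.
  0: (legacy) => ({ version: 1, terms: legacy }),
};

// Treat anything without an integer `version` field as a legacy (version 0) index.
function readVersion(index) {
  return Number.isInteger(index.version) ? index.version : 0;
}

// Apply migrations one step at a time until the index is current.
function upgradeIndex(index) {
  let current = index;
  let version = readVersion(current);
  while (version < CURRENT_VERSION) {
    current = migrations[version](current);
    version = readVersion(current);
  }
  if (version > CURRENT_VERSION) {
    throw new Error(`Index version ${version} is newer than supported version ${CURRENT_VERSION}`);
  }
  return current;
}

console.log(upgradeIndex({ dog: { abc: 2 } }));
// { version: 1, terms: { dog: { abc: 2 } } }
```

Stepping through migrations one version at a time means a future format change only needs one new entry in `migrations`, rather than a conversion path from every older version.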

Handle Unicode better

This goes hand-in-hand with #1. We want to be able to support Unicode, so this should get included in that rewrite.
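
As a starting point, a Unicode-aware preprocessing step could lean on built-in normalization and Unicode property escapes. This is only a sketch: the function name is hypothetical, and the choice of NFKC normalization and of which character classes count as "word" characters are assumptions.

```javascript
// Hypothetical replacement for an ASCII-only lowercase/replace/split pipeline.
function preprocessUnicode(text) {
  return text
    .normalize("NFKC")                 // fold equivalent forms (e.g. "e" + combining accent)
    .toLowerCase()
    .split(/[^\p{L}\p{M}\p{N}]+/u)     // split on anything that is not a letter, mark, or number
    .filter(Boolean);                  // drop empty strings from leading/trailing separators
}

console.log(preprocessUnicode("Crème brûlée, naïve Straße!"));
// [ 'crème', 'brûlée', 'naïve', 'straße' ]
```

Normalizing first means that visually identical strings typed in different ways (precomposed versus combining characters) end up as the same term in the index.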

Store term positions

A big deficiency with the existing code is that we're currently only storing the number of times a term appears in a document.

Ideally, we'd be storing a list of the positions of the terms instead. This would allow for better scoring (e.g. "how close are the terms") & better querying (e.g. exact matches).

Unfortunately, this is non-trivial to implement (& backward-incompatible):

  • Rewrite the preprocessor's process code to iterate over the document body & generate a list, instead of effectively just .toLowerCase().replace().split()
  • Change the preprocessor's process API to emit a list of [[word, position], [word, position], ...] values
  • Change the tokenizer's tokenize API to include the position as well
  • Change the index to store lists of positions instead of just the count
  • Change scoring to check the length of the lists instead of just the count
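
The steps above could be sketched roughly as follows. This is an illustration under assumptions, not the planned implementation: the function names and the Map-based index are hypothetical, "position" is taken to mean the word's ordinal within the document, and the n-gram tokenizer step is left out for brevity.

```javascript
// Hypothetical sketch; names and data layout are assumptions.

// Emit [word, position] pairs, where position is the word's ordinal in the text.
function processWithPositions(text) {
  const words = text.toLowerCase().match(/[a-z0-9]+/g) ?? [];
  return words.map((word, position) => [word, position]);
}

// Store a list of positions per term per document, instead of a bare count.
function addDocument(index, docId, text) {
  for (const [word, position] of processWithPositions(text)) {
    if (!index.has(word)) index.set(word, new Map());
    const postings = index.get(word);
    if (!postings.has(docId)) postings.set(docId, []);
    postings.get(docId).push(position);
  }
}

// The old count is recoverable as the length of the position list.
function termCount(index, word, docId) {
  return index.get(word)?.get(docId)?.length ?? 0;
}

// True when `second` appears immediately after `first` somewhere in the document.
function isAdjacent(index, docId, first, second) {
  const firstPositions = index.get(first)?.get(docId) ?? [];
  const secondPositions = new Set(index.get(second)?.get(docId) ?? []);
  return firstPositions.some((position) => secondPositions.has(position + 1));
}

const index = new Map();
addDocument(index, "ghi", "the quick brown fox jumps over the lazy dog");
console.log(termCount(index, "the", "ghi"));           // 2
console.log(isAdjacent(index, "ghi", "lazy", "dog"));  // true
console.log(isAdjacent(index, "ghi", "quick", "fox")); // false
```

An adjacency check like `isAdjacent` is the kind of building block that exact-phrase queries and "how close are the terms" scoring would need, and it is not possible with counts alone.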
