
elasticlunr.js's People

Contributors

aaroncraig10e, benpickles, cvan, darkle, daveallie, deerawan, eiriksm, gitter-badger, gregglind, hackjutsu, jakiestfu, jaylett, kant, kix, kkirsche, lvivier, mihaivalentin, nikolas, nolanlawson, olivernn, pborreli, piranna, richardpoole, roark, rushton, samuelmeuli, shrmnk, srenauld, tony-jacobs, weixsong

elasticlunr.js's Issues

Faceting / aggregation

Do you think faceting (or aggregation), as in Solr or ES, is feasible in such a lightweight search engine?
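One way to approximate faceting without touching the library is to count facet values over the hits after a search. The sketch below assumes the result list has the `[{ref, score}]` shape shown in other issues on this page; `countFacets` and the `docsByRef` lookup are hypothetical names, not part of elasticlunr.

```javascript
// Count how many hits fall into each value of a facet field.
// `results` is assumed to be an array of {ref, score} objects.
// `docsByRef` is a hypothetical lookup from ref to the original document.
function countFacets(results, docsByRef, field) {
  const counts = Object.create(null); // no inherited keys such as "constructor"
  for (const hit of results) {
    const doc = docsByRef[hit.ref];
    if (!doc || doc[field] === undefined) continue; // skip docs without the field
    const values = Array.isArray(doc[field]) ? doc[field] : [doc[field]];
    for (const value of values) {
      counts[value] = (counts[value] || 0) + 1;
    }
  }
  return counts;
}

// Example data (made up for illustration).
const facetDocs = {
  1: { category: 'news' },
  2: { category: 'howto' },
  3: { category: 'news' }
};
const facetHits = [{ ref: 1, score: 0.9 }, { ref: 3, score: 0.4 }, { ref: 2, score: 0.2 }];
const facetCounts = countFacets(facetHits, facetDocs, 'category');
// facetCounts.news is 2 and facetCounts.howto is 1
```

This only counts over the hits of one query, so it costs one pass over the result list per facet field.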

Incorrect partial matching

Is it possible for elasticlunr to match partial words? For example, if we run the example app and type in 'problem' there are lots of results. But I would expect to get the same results when I type 'prob', 'probl' and 'proble' because the partial word does match content in the index. Is this configurable?

In any case, it doesn't seem to work correctly. I have a data set with many pages matching the word 'celebrate'; the results returned include matches for 'celebrates', 'celebrated' and 'celebration'. When I type in 'celebr' I also get those matches, as if partial matching does work, but when using 'celeb' or 'celebra' I get no matches. Why is this?

I wanted to try elasticlunr because it creates a smaller index than lunr.js, although lunr.js allows partial matching which is what I need for my app, which returns new results on every character typed. Does having the smaller index sacrifice partial matching?

Getting an error of TypeError: Cannot call method 'getDocFreq' of undefined

Hi, I'm getting an odd error when I try to run the example from the elasticlunr documentation in node.js.
Running the following code in node:

var http = require('http');

var elasticlunr = require("elasticlunr");

var server = http.createServer();

var index = elasticlunr(function () {
    this.addField('title');
    this.addField('body');
    this.setRef('id');
});

var doc1 = {
    "id": 1,
    "title": "Oracle released its latest database Oracle 12g",
    "body": "Yestaday Oracle has released its new database Oracle 12g, this would make more money for this company and lead to a nice profit report of annual year."
}

var doc2 = {
    "id": 2,
    "title": "Oracle released its profit report of 2015",
    "body": "As expected, Oracle released its profit report of 2015, during the good sales of database and hardware, Oracle's profit of 2015 reached 12.5 Billion."
}

index.addDoc(doc1);
index.addDoc(doc2);

index.search("Oracle database profit", {
  fields: {
      title: {boost: 2},
      body: {boost: 1}
  }
});

server.listen(process.env.PORT || 3000, process.env.IP || "0.0.0.0", function(){
  var addr = server.address();
  console.log("Chat server listening at", addr.address + ":" + addr.port);
});

I get the following error message:

/home/ubuntu/workspace/node_modules/elasticlunr/elasticlunr.js:731                                                                         
  var df = this.index[field].getDocFreq(term);                                                                                             
                             ^                                                                                                             
TypeError: Cannot call method 'getDocFreq' of undefined                                                                                    
    at elasticlunr.Index.idf (/home/ubuntu/workspace/node_modules/elasticlunr/elasticlunr.js:731:30)                                       
    at elasticlunr.Index.computeSquaredWeight (/home/ubuntu/workspace/node_modules/elasticlunr/elasticlunr.js:898:22)                      
    at Array.forEach (native)                                                                                                              
    at elasticlunr.Index.computeSquaredWeight (/home/ubuntu/workspace/node_modules/elasticlunr/elasticlunr.js:894:15)                      
    at elasticlunr.Index.search (/home/ubuntu/workspace/node_modules/elasticlunr/elasticlunr.js:779:28)                                    
    at Object.<anonymous> (/home/ubuntu/workspace/server.js:29:7)                                                                          
    at Module._compile (module.js:456:26)                                                                                                  
    at Object.Module._extensions..js (module.js:474:10)                                                                                    
    at Module.load (module.js:356:32)                                                                                                      
    at Function.Module._load (module.js:312:12) 

Any help would be appreciated.

8. Save & Load Index (NODE JS ONLY)

You should mention that the section 8 file can only be used on node.

  1. Save & Load Index

You just need to build index only one time offline, and then save the index to a JSON file, for future usage, you just need to load the index file.

Save the index to JSON file as followings:

var elasticlunr = require('./elasticlunr.js'),
fs = require('fs');

fs = require('fs'); is not available off-server, so this is "node.js only".

In a pinch, however, one can run

  questions.forEach(function (question) {
    idx.addDoc(question);
  });

  console.log(JSON.stringify(idx));

and then expand the output in the console and copy/paste it MANUALLY into the example_index.json file.
That works, but is limited in length. Other solutions that push such content to the clipboard are also available and could be included as a convenience to the developer:

https://clipboardjs.com/

A "COPY JSON IDX to CLIPBOARD" button would be a fine example.

In a pinch you can also just create a <textarea> and populate it at design time.

Bulk Add

I'm loading up a database with ~30K documents and adding them via a for loop takes ~3 seconds on a desktop (which would be crazy slow on a phone). I was thinking of adding an addDocs(Array) function that would use setTimeout to load everything in a non-blocking manner. However, I was hoping there might be a faster method....
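A minimal sketch of the chunked approach described above. It does not make indexing faster; it only splits the work so the page stays responsive. `addDocsInChunks` is a hypothetical helper, and the only thing it assumes about the index is an `addDoc(doc)` method. The `schedule` parameter is an addition for testability and defaults to `setTimeout`.

```javascript
// Add documents in chunks, yielding to the event loop between chunks.
// `index` only needs an addDoc(doc) method.
// `schedule` (optional) decides how the next chunk is deferred.
function addDocsInChunks(index, docs, chunkSize, onDone, schedule) {
  const defer = schedule || function (fn) { setTimeout(fn, 0); };
  const size = Math.max(1, chunkSize | 0); // guard against 0 or negative sizes
  let position = 0;

  function addNextChunk() {
    const end = Math.min(position + size, docs.length);
    for (; position < end; position++) {
      index.addDoc(docs[position]);
    }
    if (position < docs.length) {
      defer(addNextChunk); // let pending events run before continuing
    } else if (onDone) {
      onDone(position);    // report how many documents were added
    }
  }

  addNextChunk();
}
```

Usage would look like `addDocsInChunks(index, allDocs, 500, function (n) { console.log(n + ' docs indexed'); })`; searches issued before `onDone` fires only see the documents added so far.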

Having external document store instead of default

Hi guys,

Great job with this module.

Right now, document store is in memory and we are loading index from files all at once.

What I am working on is adding an external document store for the same purpose, so that I can add lots of indexes without running out of memory.

Here is my concern:

Right now we are adding all the links in this format:
{"link": {attr key value}}

That is great for one site, but I want to add multiple sites to elasticlunr on the server side and share a common front-end across them. When searching, I don't want to fetch content from abc.com (for example), even if the keyword matches, because the user is searching on pqr.com.

What would you suggest as an optimal schema for this, keeping code changes in the module to a minimum?

How to use persistent storage?

Hi,
I want to know whether there is any way to use persistent storage for the index, so that when I restart my app later I do not have to build the index again.

Thanks
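A sketch of one way to persist the index in a browser, building on the `JSON.stringify(idx)` trick quoted in the "Save & Load Index" issue above. `saveIndex` and `loadIndexData` are hypothetical helpers; the storage object is assumed to offer `setItem`/`getItem` the way `localStorage` does, and storage size limits still apply.

```javascript
// Serialize the index and store it under `key`.
// `storage` is any object with setItem/getItem (localStorage has this shape).
function saveIndex(storage, key, index) {
  storage.setItem(key, JSON.stringify(index));
}

// Return the parsed index data, or null when nothing has been saved yet.
function loadIndexData(storage, key) {
  const raw = storage.getItem(key);
  return raw === null || raw === undefined ? null : JSON.parse(raw);
}

// Intended use (not executed here; buildIndex is a hypothetical function
// that creates the index from scratch, and elasticlunr.Index.load is the
// load step from section 8 of the README):
//   var data = loadIndexData(localStorage, 'search-index');
//   var index = data ? elasticlunr.Index.load(data) : buildIndex();
//   if (!data) saveIndex(localStorage, 'search-index', index);
```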

"Token Expandation" causes JavaScript error

Hi, when I enable Token Expandation by adding the expand: true option for a search on the index, I get the following error message in chrome:
"Uncaught TypeError: Cannot read property 'length' of undefined" (elasticlunr.js:948)

Looking at the source, this would be the following code:

elasticlunr.Index.prototype.coordNorm = function (scores, docTokens, n) {
  for (var doc in scores) {
    var tokens = docTokens[doc].length;
    scores[doc] = scores[doc] * tokens / n;
  }

  return scores;
};

And indeed, docTokens[doc] is undefined. The reason is, that docTokens itself is already an empty object. It is defined in the fieldSearch function, but in case of a penalty nothing is added.

For me it looks like fieldSearchStats should always be called in fieldSearch, not only if there is no penalty - but I am not sure about the intention of this code.

Am I missing something? Do I need to add anything to the initialisation of elasticlunr?

do not understand boolean model

Below is my configuration. I am getting the same results with boolean as 'AND' as when I use 'OR'. For example if I search for 'title1 region2', I will get all results that match title1 and region2, even if a result that matches title1 falls within different region than region2. Maybe I just don't understand what the boolean logic setting is supposed to do.

{
    fields: {
        title: {
            boost: 2
        },
        description: {
            boost: 2
        },
        region_id: {
            boost: 1
        },
        id: {
            boost: 1
        }
    },
    boolean: "AND"
}

index.js removeDoc

in lib/index.js:

line no: 225

var docRef = doc[this._ref];

should be

var docRef = doc[this.ref];

Due to this bug, documents are getting removed from the documentStore but still remain in the index.

example is broken

GET http://elasticlunr.com/example/assets/bootstrap/js/jquery-1.11.3.min.js 
index.html:233 GET http://elasticlunr.com/example/assets/bootstrap/js/bootstrap.min.js 
index.html:129 Uncaught TypeError: Cannot set property 'href' of null(anonymous function) @ index.html:129(anonymous function) @ index.html:130
require.js:34 GET http://elasticlunr.com/example/elasticlunr.js h.load @ require.js:34i.load @ require.js:29$.load @ require.js:18$.fetch @ require.js:17$.check @ require.js:19$.enable @ require.js:23i.enable @ require.js:27(anonymous function) @ require.js:23(anonymous function) @ require.js:8v @ require.js:7$.enable @ require.js:22$.init @ require.js:17(anonymous function) @ require.js:26
require.js:8 Uncaught Error: Script error for: elasticlunr.js
http://requirejs.org/docs/errors.html#scripterrorC @ require.js:8i.onScriptError @ require.js:29

Stop sharing Google Analytics account with lunrjs.com

When this project was forked from lunr.js the index.html used for the homepage (and possibly others) copied the google analytics snippet.

Please remove all google analytics snippets that include the account number used by Lunr.

Any i18n support?

It seems elasticlunr doesn't support indexing Chinese by default.

In case I am missing something in the configuration, I would like to open this issue for discussion.

Does elasticlunr support exact match?

Just curious to know whether elasticlunr.js supports exact match, or whether there is any workaround to mimic it.

By exact match I mean support for searching for a phrase like "elasticlunrjs is great".
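One possible workaround, sketched under the assumption that the original documents are still available by ref: run the normal search first, then keep only the hits whose field text contains the phrase verbatim. `filterByPhrase` and `docsByRef` are hypothetical names, not library API.

```javascript
// Keep only hits whose `field` contains `phrase` as a literal,
// case-insensitive substring.
// `results` is assumed to be an array of {ref, score} objects.
function filterByPhrase(results, docsByRef, field, phrase) {
  const needle = phrase.toLowerCase();
  return results.filter(function (hit) {
    const doc = docsByRef[hit.ref];
    const text = doc && typeof doc[field] === 'string' ? doc[field] : '';
    return text.toLowerCase().indexOf(needle) !== -1;
  });
}
```

Because the filter runs on raw text, it is not affected by stemming or stop-word removal, but it also means the phrase has to appear exactly as typed.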

No results when only entering first couple of letters

This is obviously me not understanding the algorithms in place, but when you search for the first few letters of a word in an index, you get no results when it seems like you should.

var index = elasticlunr(function(){
    this.addField('name');
    this.setRef('id');
});

index.addDoc({id: 1, name: 'Elvis Presley'});
index.addDoc({id: 2, name: 'Queen Victoria'});
index.addDoc({id: 3, name: 'Plato'});
index.addDoc({id: 4, name: 'Angelina Jolie'});
index.addDoc({id: 5, name: 'Abraham Lincoln'});

// no results
index.search('p');
index.search('pl');
index.search('pla');
index.search('plat');

//result
index.search('plato');

Was hoping if someone could comment on this.
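Other issues on this page mention an `expand: true` search option for prefix matching. As an illustration of the idea only (this is not elasticlunr's implementation), a prefix can be expanded into the full terms it could stand for before searching. `expandPrefix` and the vocabulary list are hypothetical.

```javascript
// Return every vocabulary term that starts with the typed prefix.
function expandPrefix(vocabulary, prefix) {
  const wanted = prefix.toLowerCase();
  if (wanted === '') return []; // an empty prefix would match everything
  return vocabulary.filter(function (term) {
    return term.toLowerCase().startsWith(wanted);
  });
}

// Terms taken from the names in the example above.
const nameTerms = ['elvis', 'presley', 'queen', 'victoria', 'plato',
                   'angelina', 'jolie', 'abraham', 'lincoln'];
const plMatches = expandPrefix(nameTerms, 'pl'); // ['plato']
const pMatches = expandPrefix(nameTerms, 'p');   // ['presley', 'plato']
```

Each expanded term could then be searched as a normal whole word.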

Does not match on expected terms

Certain terms do not produce matches as expected.

For instance, given the following docs:

    ;([{
      id: 'a',
      title: 'Mr. Green kills Colonel Mustard',
      body: 'Mr. Green killed Colonel Mustard in the study with the candlestick. Mr. Green is not a very nice fellow.',
      wordCount: 19
    },{
      id: 'b',
      title: 'Plumb waters green plant ',
      body: 'Professor Plumb has a green plant in his study',
      wordCount: 9
    },{
      id: 'c',
      title: 'Scarlett helps Professor',
      body: 'Miss Scarlett watered Professor Plumbs green plant while he was away from his office last week.',
      wordCount: 16
    },{
      id: 'd',
      title: 'title',
      body: 'handsome',
    },{
      id: 'e',
      title: 'title abc',
      body: 'hand',
    }]).forEach(function (doc) { idx.addDoc(doc); });

A search on the term candlestick does not produce a hit.

I'm guessing this has to do with the stemmer, as in Elasticsearch I've had mixed results using the Porter stemmer.

We are using this library now for a new project, so I am happy to work on a fix to this and send a PR. Just wanted to post the issue here for anyone else having the same issue.

lunr-languages?

The README.md mentions using elasticlunr.js to index other language documents with lunr-languages, but the repository seems to be gone now. Can I still index, say, French documents, or is it English-only for now?

expand doesn't work with bool: 'AND'

Looks like prefix search (expand: true) doesn't work when bool: 'AND'.

I think the behavior should be that AND should apply to the user's tokens, not the expanded ones.

Search behaves differently on Safari, is it supported?

I have a search example which works in Chrome but not Safari. If you look at the results variable in the Safari screenshot you will see that the ref number goes from 0 upwards and the score is NaN. The exact same query in chrome shows ref fields with values I expect and scores that make sense.

Are there known problems with Safari? The search string "was" works in Chrome. Safari doesn't fail on all searches; I've noticed that the string "mas" happens to work.

elasticlunr_chrome
elasticlunr_safari

one character off

Is there a way to produce search results when a user may have misspelled a word by one character? For example 'afganistan' instead of 'afghanistan'?
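A sketch of how one-character tolerance could be handled outside the library: compare the typed word against a list of known terms using edit (Levenshtein) distance and search for the close ones instead. `editDistance`, `closeTerms` and the vocabulary list are hypothetical; nothing here is elasticlunr API.

```javascript
// Classic dynamic-programming Levenshtein distance between two strings.
function editDistance(a, b) {
  let previous = [];
  for (let j = 0; j <= b.length; j++) previous.push(j);
  for (let i = 1; i <= a.length; i++) {
    const current = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      current.push(Math.min(
        previous[j] + 1,        // delete a character from a
        current[j - 1] + 1,     // insert a character into a
        previous[j - 1] + cost  // substitute (or keep) a character
      ));
    }
    previous = current;
  }
  return previous[b.length];
}

// Vocabulary terms within `maxDistance` edits of the typed term.
function closeTerms(vocabulary, term, maxDistance) {
  return vocabulary.filter(function (candidate) {
    return editDistance(candidate, term) <= maxDistance;
  });
}

const countryTerms = ['afghanistan', 'albania', 'algeria'];
const suggestions = closeTerms(countryTerms, 'afganistan', 1); // ['afghanistan']
```

Scanning the whole vocabulary is linear in its size, which is likely fine for a client-side index but worth measuring on large term lists.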

Word association dictionary

Awesome work!

Would it be possible to add a mechanism to support word association? This could be a map that each implementation would need to configure and maintain manually, building up a rich word association map specific to the domain of the use case. It would introduce an additional dimension of usefulness beyond explicit token matching.

Of course, I could just create this outside of the elasticlunr library and have the word association map augment the search input accordingly.
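The outside-the-library variant could look roughly like this. `expandQuery` and the association map are hypothetical; the expanded string would simply be passed to the normal search call.

```javascript
// Append associated words to the query, keeping the original word order
// and dropping duplicates.
// `associations` maps a lowercase word to an array of related words.
function expandQuery(query, associations) {
  const expanded = [];
  for (const word of query.split(/\s+/).filter(Boolean)) {
    expanded.push(word);
    const key = word.toLowerCase();
    if (Object.prototype.hasOwnProperty.call(associations, key)) {
      for (const related of associations[key]) expanded.push(related);
    }
  }
  return Array.from(new Set(expanded)).join(' ');
}

const transportWords = { car: ['automobile', 'vehicle'] };
const augmented = expandQuery('fast car', transportWords);
// augmented is 'fast car automobile vehicle'
```

With an "OR" style search the extra words widen the result set; with "AND" they would narrow it, so the boolean setting matters here.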

Diacritic free search

Hi,

Could you give me some pointers on indexing/searching words while ignoring diacritics? For example, I want Gödel and Godel to match, and the same for Şarap and Sarap.

And thanks for this great library and the documentation.
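One common approach, sketched here as a standalone helper: decompose each string with Unicode NFD normalization and drop the combining marks, then apply the same function to both the indexed text and the query. `stripDiacritics` is a hypothetical name; how it would be hooked into elasticlunr's pipeline is left open.

```javascript
// Remove combining diacritical marks (U+0300 to U+036F) after
// decomposing accented characters into base letter + mark.
function stripDiacritics(text) {
  return text.normalize('NFD').replace(/[\u0300-\u036f]/g, '');
}

const plainGodel = stripDiacritics('G\u00f6del'); // 'Godel'
const plainSarap = stripDiacritics('\u015earap'); // 'Sarap'
```

Letters that have no canonical decomposition (for example "ø" or "ł") are left unchanged and would need an explicit mapping table.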

enabling/disabling DocumentStore on per-field basis

Have you considered enabling/disabling the DocumentStore option on per-field basis? I'm adding elasticlunr to my blog now and was thinking of indexing the entire post content (while my blog is small at least), but I wouldn't need that field stored in the DocumentStore. I would like the other fields in the DocumentStore, though, so I can display links to found posts.

How is JQuery referenced in the example's html page?

I see you load jQuery with require as '_' in app.js.

How does one call it within the index.html page, where jQuery is not defined?
I am not familiar with require.

I would like to add some other jQuery things but I am struggling.
Thanks

Add multi-field search

Right now Elasticlunr will only search all fields with the same query string. It would be beneficial if different query strings could be set per field. For example:

index.search({
    tag: "faq",
    body: "Oracle"
  }, {
    fields: {
      tag: {boost: 2},
      body: {boost: 1}
    },
    bool: "AND"
  });

That is, search both tag and body fields, but with different query terms. This would correspond to the Lucene query syntax tag:faq AND body:Oracle.

AFAIC this would not need any modifications to the index and if the first argument to search could be an object instead of a simple String, existing code using the search would also be backwards compatible.

Example incorrectly compares value of field not length

in App.js

 $('input').bind('keyup', debounce(function () {
    if ($(this).val() < 2) return

should be

 $('input').bind('keyup', debounce(function () {
    if ($(this).val().length < 2) return

Otherwise the demo will NEVER search if the term entered is < 2 (like: 1.9999999999999999999).
You can see that the live demo example is broken in this way.
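The difference between the two checks comes from JavaScript's string-to-number coercion, which can be seen in isolation. The two function names below are made up for the comparison; the conditions are the ones quoted above.

```javascript
// Original condition: compares the string itself with the number 2.
function shouldSkipOriginal(value) {
  return value < 2;
}

// Proposed condition: compares the number of typed characters with 2.
function shouldSkipFixed(value) {
  return value.length < 2;
}

// '1.9' is coerced to the number 1.9, so the original check skips the search
// even though three characters were typed.
const skipsNumeric = shouldSkipOriginal('1.9');  // true
// 'a' is coerced to NaN and every comparison with NaN is false,
// so a one-letter term is not skipped by the original check.
const skipsOneLetter = shouldSkipOriginal('a');  // false
// The fixed check depends only on length.
const fixedOneLetter = shouldSkipFixed('a');     // true
const fixedNumeric = shouldSkipFixed('1.9');     // false
```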

Search multiple fields into one query

Hello,

I came to Elasticlunr from Lunr because I was interested in the functionality for searching by field, and I wasn't disappointed: it works really well. Elasticlunr, like the original Lunr search engine, is a great piece of work, and thanks for developing it. Still, for the website I'm working on, I would love to implement an "advanced search" page where the user can search for different strings in multiple fields in the same query, to find the item which fulfills all conditions. Not many fields, since it is a client-side search engine, but Elasticlunr works so fast that it looks like it could certainly handle one query with three different parameters for different fields.

For now, if I understand correctly, we can only query the same words across fields, so in the example below, "Oracle database profit" would be searched in title and body.

index.search("Oracle database profit", {
    fields: {
        title: {boost: 2},
        body: {boost: 1}
    },
    boolean: "OR"
});

And it would be great if we could search for THE item with "Database" in its title, and "Oracle profit" in its body, for instance. Having one function to handle that case is the issue that I want to raise, if it hasn't been already dismissed as technically impossible.

How to implement this with the already built-in functions

Certainly, there must be a way of doing that with the functions already implemented. Here we leave the general issue to broach one difficulty that I have implementing this (I think it may be of interest to other Elasticlunr users, if somebody could help me solve it and share the solution). I was thinking of two distinct queries (query1 = "database" in title, query2 = "Oracle profit" in body), storing the results in arrays, and then comparing the two arrays of results (or more, if searching in more fields) to find the values they have in common.

My issue is that, as a beginner in JS, I can't get the fieldSearch function to work. The final goal is, for each field selected in a form, to call a function which searches for the string in that particular field, and then to compare the three arrays of results to find the matching values, and therefore the item which fulfills all the conditions of the search.

I have this, which is supposed to take the query and the field selected by the user in a dropdown list, pass them as parameters to the fieldSearch function, and display the result of the search.

$(document).ready(function () {
  $('button#search').on('click', function () {
    var query = $("input#keyword-search").val();
    var SelectedField = $('#optgroupDublinCore option:selected').text();
    var queryTokens = elasticlunr.index.search.pipeline.run(elasticlunr.tokenizer(query));
    var result = index.fieldSearch(queryTokens, SelectedField);
    var resultdiv = $('#results');
    resultdiv.append('<p class="">Found ' + result.length + ' result(s)</p>');
    for (var item in result) {
      var ref = result[item].ref;
      var searchitem = '<div class="result"><p><a href="{{ site.baseurl    }}' + store[ref].link + '">' + store[ref].title + '</a> by ' + store[ref].author + 'type :' + store[ref].type + '</p></div>';
      alert(searchitem);
      resultdiv.append(searchitem);
    }
  });
});

But I get that error "Uncaught type error : cannot read property run of undefined". Obviously it had to do with the queryTokens variable, and the keyword "this" in this.search.pipeline.run(elasticlunr.tokenizer(query));, which I have replaced by "elasticlunr.index.search", assuming that it was the object referred by "this".

If someone can help me with this, I would really appreciate it! It could also be of use to others, perhaps as an example of implementation in the docs, since multiple-field queries are an interesting feature.
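The "two queries, then compare the arrays" idea can be sketched without calling fieldSearch directly: run one ordinary `index.search` per field (restricting `fields` in the config, as the examples on this page do) and intersect the result lists by ref. `intersectResults` is a hypothetical helper, and summing the scores is just one possible way to combine them.

```javascript
// Keep only the refs that appear in every result list, summing their scores.
// Each list is assumed to be an array of {ref, score} objects.
function intersectResults(resultLists) {
  if (resultLists.length === 0) return [];
  const totals = new Map(); // ref (as string) -> {ref, count, score}
  resultLists.forEach(function (results) {
    const seen = new Set(); // count each ref at most once per list
    results.forEach(function (hit) {
      const key = String(hit.ref);
      if (seen.has(key)) return;
      seen.add(key);
      const entry = totals.get(key) || { ref: hit.ref, count: 0, score: 0 };
      entry.count += 1;
      entry.score += hit.score;
      totals.set(key, entry);
    });
  });
  return Array.from(totals.values())
    .filter(function (entry) { return entry.count === resultLists.length; })
    .map(function (entry) { return { ref: entry.ref, score: entry.score }; })
    .sort(function (a, b) { return b.score - a.score; });
}

// Intended use (not executed here):
//   var byTitle = index.search('database', { fields: { title: { boost: 1 } } });
//   var byBody  = index.search('Oracle profit', { fields: { body: { boost: 1 } } });
//   var both    = intersectResults([byTitle, byBody]);
```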

publish latest version to npm

Currently the latest version in the npm repository is 0.9.5, while on GitHub it is 0.9.6.

Could you please publish the latest version to npm?

Searching fields using boolean AND doesn't yield any results

Hi,

I'm coming from lunrjs because I'm interested in your field searching feature.

My index json looks similar to this:

{
  "index": [{
    "name": "my name is A",
    "array1": [
      "array1 value1",
      "array1 value2"
    ],
    "array2": [
      "array2 value1",
      "array2 value2"
    ],
    "id": "nameA"
  }, {
    "name": "my name is B",
    "array1": [
      "array1 value4",
      "array1 value10"
    ],
    "array2": [
      "array2 value1",
      "array2 value5"
    ],
    "id": "nameB"
  }]
}

What I need to be able to do is search this index using queries such as this (I know the syntax is not correct):

q = idx.search(+name:"my name is B")
q.length == 1
q.results == ["nameB"]
q = idx.search(+array1:"array1 value1")
q.length == 1
q.results == ["nameA"]
q = idx.search(+array1:"array1 value1" +array2:"array2 value1")
q.length == 2
q.results == ["nameA", "nameB"]

The problem is that when I use my real json file and run the searches above I don't get the expected results.

  1. I've tried the approach below, but it doesn't work and returns zero results. It's as if there is a problem matching the exact string "my name is B".
q = idx.search("my name is B", {
    fields: {
        name: {}
    },
    bool: "AND"
});
q.length
  2. Similar problem to (1). I want to find the exact string "array1 value1" in the field "array1", but here too the results are zero when I used AND.
q = idx.search("array1 value1", {
    fields: {
        array1: {}
    }
});
q.length == 1
  3. Here I want to find the exact string "array1 value1" in the field "array1" AND "array2 value1" in the field "array2".
q = idx.search("array1 value1 array2 value1", {
    fields: {
        array1: {},
        array2: {}
    }
});
q.length == 2

I hope you understand what I mean and can help me find a solution to these issues. Thanks.

Update to newest lunr version

Update the plugin to the newest lunr version.
There are some new methods which are used by lunr languages. (e.g. generateStopWordFilter)

Searching a multiple word term in a field

Excellent work and thanks for making it available. I have one issue with regards to search terms that consist of multiple words, i.e. compound terms. For instance, how would I search for

'public transport' OR car

without getting results that match either public or transport rather than the whole term? Is this currently possible?

Thanks

Index on all fields

Is it possible to index on all fields? For example if I want to store documents that might not always have the same fields, but I want to be able to search across any or all fields.

Highlighting feature

It would be great to have such a feature, in any form: either highlighted matched words (configurable as to which elements/tags to wrap them in) or just positional information for matched tokens (so that highlighting itself could be done outside of elasticlunr.js).
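Until something like this exists in the library, highlighting can be approximated outside it by wrapping the query words in the displayed text. This is only a sketch: `highlight` is a hypothetical helper, it assumes the text is plain (not HTML) and trusted, and it matches whole words literally, so stemmed variants of a word are not covered.

```javascript
// Escape characters that have a special meaning in regular expressions.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Wrap every whole-word, case-insensitive occurrence of the given terms
// in <tagName>...</tagName> (defaults to <mark>).
function highlight(text, terms, tagName) {
  const tag = tagName || 'mark';
  const parts = terms.filter(Boolean).map(escapeRegExp);
  if (parts.length === 0) return text;
  const pattern = new RegExp('\\b(' + parts.join('|') + ')\\b', 'gi');
  return text.replace(pattern, '<' + tag + '>$1</' + tag + '>');
}

const marked = highlight('Oracle released its latest database', ['oracle', 'database']);
// marked is '<mark>Oracle</mark> released its latest <mark>database</mark>'
```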

Fuzzy search?

Is there fuzzy search capability in elasticlunr?

Thanks,

John

Fuzzy searching

Thanks for this search engine, I've added it to a mini web app I built and it works quite well. I was wondering if it's possible to do more "fuzzy" searches on the data that's indexed. For example, I log last accessed date for my documents and update it. If I could do the following search using the indexer, that would help me a lot:

Find all documents that have access date greater than [timestamp]

Or, similar example in different context. Say I have a bunch of documents, each with a sentence like the following:

99 bottles of beer on the wall
98 bottles of beer on the wall
...
10 bottles of beer on the wall

And I want to perform the following query:

Find all documents where [## bottles of beer on the wall] > 50

Always throw user configuration parse failed, will use default configuration

Hi,

First of all, thanks for elasticlunr, great job!

Second, I'm using elasticlunr on the backend, in a node process. My index is typical, based exactly on the example from the elasticlunr main page.

I always get the warning "user configuration parse failed, will use default configuration", whatever I use as the search config: an empty string, an empty object, nothing at all. The warning is always emitted.
I'm using node version 6.x.

Regards,
PS

Placeholder in Search

Is it possible to use a placeholder in the search term?
Or, alternatively, to enable partial matching?
Search term: "be"
Found results:

  • "havebeen"
  • "unbearable"

sorting by both relevance and 2nd numeric field like timestamp

I've looked through the docs for Elasticsearch, and there it looks like a secondary sorting field can be added. Is that also possible with elasticlunr? I'd like to sort results by both relevance and a timestamp. I could do this manually once I have the result set, but if it's already built in, even better.
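The manual version mentioned above could be as small as the following sketch. `sortByScoreThenTimestamp` and `docsByRef` are hypothetical names; the hits are assumed to be `{ref, score}` objects and the timestamp a number stored on the original document.

```javascript
// Sort by score (highest first); break exact ties by a numeric field
// on the original document (newest first).
function sortByScoreThenTimestamp(results, docsByRef, field) {
  function timestampOf(hit) {
    const doc = docsByRef[hit.ref];
    return doc && typeof doc[field] === 'number' ? doc[field] : 0;
  }
  return results.slice().sort(function (a, b) {
    if (b.score !== a.score) return b.score - a.score;
    return timestampOf(b) - timestampOf(a);
  });
}
```

Relevance scores rarely tie exactly, so in practice it may make sense to round the scores into buckets first and let the timestamp decide within a bucket.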

Problem with the stemmer

There seems to be a problem with the stemmer.
The following
query = 'Things to do'
will be stemmed to

console.log(elunr.stemmer(elunr.trimmer(query)));
'Things to do'

It has nothing to do with capitalization; the same happens with 'things to do'.
The s should be removed from things.
A look at the online Porter stemmer shows the correct result:
http://9ol.es/porter_js_demo.html

Boost complete matches more than partial matches

Is it possible to configure elasticlunr so that complete matches are boosted more than partial matches?

Example:

Search query: "Flight Change"
Current results:

  • "Flight disruption - Involuntary Rebooking - Schedule Change"
  • "Voluntary Rebooking - Change Flight Date"
  • "Voluntary Rebooking - Urgent Rebooking / Urgent Flight Change"

But I want to have results in order like this:

  • "Voluntary Rebooking - Urgent Rebooking / Urgent Flight Change"
  • "Voluntary Rebooking - Change Flight Date"
  • "Flight disruption - Involuntary Rebooking - Schedule Change"

Is it possible to do with elasticlunr?
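If no configuration turns out to do this, one option is to re-rank the returned titles after the search. The sketch below is an assumption-laden illustration, not library behaviour: it puts titles containing the query as an in-order phrase first, then orders the rest by the smallest run of words that contains every query term. All function names are hypothetical, tokens are compared literally (no stemming), and the tokenizer only handles ASCII letters and digits.

```javascript
// Lowercase and split on anything that is not an ASCII letter or digit.
function tokenize(text) {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
}

// True when the terms occur next to each other, in order.
function containsPhrase(tokens, terms) {
  for (let i = 0; i + terms.length <= tokens.length; i++) {
    if (terms.every(function (term, k) { return tokens[i + k] === term; })) return true;
  }
  return false;
}

// Length of the shortest run of tokens containing every term,
// or Infinity when a term is missing.
function smallestSpan(tokens, terms) {
  let best = Infinity;
  for (let start = 0; start < tokens.length; start++) {
    const missing = new Set(terms);
    for (let end = start; end < tokens.length; end++) {
      missing.delete(tokens[end]);
      if (missing.size === 0) {
        best = Math.min(best, end - start + 1);
        break;
      }
    }
  }
  return best;
}

// Phrase matches get rank 0; everything else is ranked by span length.
function rankByCompleteness(titles, query) {
  const terms = tokenize(query);
  return titles
    .map(function (title) {
      const tokens = tokenize(title);
      const rank = containsPhrase(tokens, terms) ? 0 : smallestSpan(tokens, terms);
      return { title: title, rank: rank };
    })
    .sort(function (a, b) { return a.rank === b.rank ? 0 : (a.rank < b.rank ? -1 : 1); })
    .map(function (entry) { return entry.title; });
}
```

Applied to the three titles above with the query "Flight Change", this yields the desired order: the "Urgent Flight Change" title first, "Change Flight Date" second, and the "Schedule Change" title last.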
