Comments (8)
Status update time!
Turns out there were a lot more things that could be done: low-hanging fruit and improvements. In particular, I've also put the `elastic` and `lunr` back into `elasticlunr`.
The branch I've been working on, which is much closer to being a PR candidate, now has the following changes made to it:
Indexing:
- Each field is now a separate entity, complete with the ability to add/edit pipeline elements, define whether the field's full-text input needs to be stored and whether the TF/IDF elements need to be kept. Goes a long way to addressing #103, #43
- Term positions are stored (can be enabled/disabled). This allows the user to highlight documents. Featured mention in #91
- Batch addition, updates and removals! As the costliest operation in index management is having to recalculate the IDF (it depends on the # of terms in the index), people adding or modifying a lot of documents may find this useful.
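
To illustrate why batching helps, here is a minimal sketch. It is not the fork's actual code: `TinyIndex`, `addDoc`, `addDocs` and the IDF formula `1 + ln(N / (df + 1))` are all hypothetical stand-ins. The point is only that IDF depends on index-wide counts, so adding documents one at a time recomputes it after every document, while a batch recomputes it once.

```javascript
// Hypothetical sketch: NOT elasticlunr's internals, only an illustration
// of why a batch API saves IDF recalculations.

// Assumed IDF formula: 1 + ln(totalDocs / (docFreq + 1)).
function computeIdf(docFreq, totalDocs) {
  const idf = {};
  for (const term of Object.keys(docFreq)) {
    idf[term] = 1 + Math.log(totalDocs / (docFreq[term] + 1));
  }
  return idf;
}

class TinyIndex {
  constructor() {
    this.docFreq = {};          // term -> number of documents containing it
    this.totalDocs = 0;
    this.idf = {};
    this.idfRecalculations = 0; // counter used to compare both strategies
  }

  // Update the raw counts only; IDF is refreshed separately.
  _indexTokens(tokens) {
    this.totalDocs += 1;
    for (const term of new Set(tokens)) {
      this.docFreq[term] = (this.docFreq[term] || 0) + 1;
    }
  }

  _refreshIdf() {
    this.idf = computeIdf(this.docFreq, this.totalDocs);
    this.idfRecalculations += 1;
  }

  // One document at a time: IDF is recomputed after every addition.
  addDoc(tokens) {
    this._indexTokens(tokens);
    this._refreshIdf();
  }

  // Batch: counts are updated for all documents, IDF is recomputed once.
  addDocs(batch) {
    for (const tokens of batch) {
      this._indexTokens(tokens);
    }
    this._refreshIdf();
  }
}

const docs = [
  ["fast", "search"],
  ["search", "index"],
  ["index", "size"],
];

const oneByOne = new TinyIndex();
docs.forEach((tokens) => oneByOne.addDoc(tokens));

const batched = new TinyIndex();
batched.addDocs(docs);

console.log(oneByOne.idfRecalculations, batched.idfRecalculations);
```

Both indexes end up with the same IDF values; only the number of recalculations differs.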
Search
- The `search` function now supports three dialects, so that people can move to this library without rewriting most of their queries: the one present in `lunr`, the one in `elasticlunr`, and a subset of the elasticsearch DSL. Everything maps back to the elasticsearch DSL, which should be preferred for advanced queries. There are a couple of examples in the tests. In particular, this goes a long way to fixing #86
- Field-length norm is now a factor in scoring. This should go a long way to boost relevance
- Every part of the search now has a firmly-defined external API
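
As a rough illustration of what a field-length norm does to a score, here is a minimal sketch. The formula `1 / sqrt(tokenCount)` and the `sqrt(termFreq) * idf * norm` combination are assumptions in the spirit of Lucene's classic TF/IDF similarity, not necessarily what the branch implements, and `fieldLengthNorm` and `scoreField` are hypothetical names.

```javascript
// Hypothetical sketch of field-length normalisation.
// Assumption: norm = 1 / sqrt(number of tokens in the field).
function fieldLengthNorm(tokenCount) {
  return tokenCount > 0 ? 1 / Math.sqrt(tokenCount) : 0;
}

// Assumed scoring shape: sqrt(tf) * idf * norm.
function scoreField(termFreq, idf, tokenCount) {
  return Math.sqrt(termFreq) * idf * fieldLengthNorm(tokenCount);
}

// The same term, appearing once, in a 4-token title and in a 100-token body.
const shortTitle = scoreField(1, 1.5, 4);
const longBody = scoreField(1, 1.5, 100);

console.log(shortTitle > longBody); // the match in the shorter field scores higher
```

A match in a short field counts for more than the same match in a long one, which is why the norm tends to help relevance.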
Storage
- I went through and tried to improve the way an index is serialized. The new format is more than just a plain `toJSON()` of the index. This also opens the way for space-saving optimizations. The old format is still fully importable
Tests
Basically everything is reworked. A ton of test cases were testing the internal state of objects more than whether a given feature worked. This is probably the biggest change.
A couple of tests were added to cover new components (the new index and DSL) or third-party library support (`lunr-languages`).
My next step is documentation and benchmarks.
from elasticlunr.js.
@srenauld, @weixsong - given that this PR is merged, is there anyplace I can look up usage? current docs are out of date.
With default options, an index is now more than 2x the size of the one produced by lunr.js
ElasticLunr seemed very appealing due to ability to store fields & being able to do an 'AND' search on all terms - but without usage info, it's hard to use.
@srenauld , thanks very much for the contribution.
@srenauld How's this effort going?
@smurrayatwork on this, I've done very little due to a PR being frozen for months. On other fronts, I almost have a plug-in for this library for `nextjs`, which may or may not be useful to some.
@raghur Hello, and sorry about the delay!
I'm in the process of working on a side branch for this purpose specifically. The version currently available on `npm` is still 0.9.5 (i.e. before the changes). As such, all the old documentation is up to date, and so are all the specifics.
The branch I am working on (intermittently, admittedly. I've had a ton of things coming my way recently) is over here. I wired together a small example leveraging the gatsby plugin to both enforce backward-compatibility (wouldn't want to wreck a plugin by accident) and to showcase the use.
Regarding the index size, if you are using the new version, I'd be curious to see what you are indexing. This is the kind of feedback that is extremely valuable, as I can change the index format in the new version while still remaining backward-compatible (the old was a straight `JSON.stringify()` of the inner state - the new is a bit smarter). As nobody is currently using the new format (AFAIK), I'd be super interested to know what you are indexing and what your results are.
@srenauld - thanks for responding. No problem at all.
TBH, I went through some of your changes and while I absolutely LIKE the features you list, I ran into a few issues (hence the reason I picked master instead of a released version at NPM). Having a working example is great for starters and exactly the thing I was missing picking up master here!
Content I'm indexing is from my blog - json is here - https://blog.rraghur.in/index.json. It isn't a large blog by any means and index was coming up to about 4.8MB (Lunr produced 2.3 - 3MB) I think.
code to build index is here https://gitlab.com/rraghur/rraghur.gitlab.io/blob/elasticlunr/build-index.js
Also, options such as not storing documents (I didn't want that for the full-text blog content) weren't working, and I was just running into too many problems. So I finally wrote the comment here and switched back to lunr.
I'll try your branch and gatsby example early next week...
An option like `storeDocuments` to control whether fields are persisted to the index would be VERY useful for me. But it doesn't seem to work in v0.9.5. Does anything like it exist?