
bluge's Introduction

Bluge

modern text indexing in go - blugelabs.com

Features

  • Supported field types:
    • Text, Numeric, Date, Geo Point
  • Supported query types:
    • Term, Phrase, Match, Match Phrase, Prefix
    • Conjunction, Disjunction, Boolean
    • Numeric Range, Date Range
  • BM25 Similarity/Scoring with pluggable interfaces
  • Search result match highlighting
  • Extendable Aggregations:
    • Bucketing
      • Terms
      • Numeric Range
      • Date Range
    • Metrics
      • Min/Max/Count/Sum
      • Avg/Weighted Avg
      • Cardinality Estimation (HyperLogLog++)
      • Quantile Approximation (T-Digest)

Indexing

    config := bluge.DefaultConfig(path)
    writer, err := bluge.OpenWriter(config)
    if err != nil {
        log.Fatalf("error opening writer: %v", err)
    }
    defer writer.Close()

    doc := bluge.NewDocument("example").
        AddField(bluge.NewTextField("name", "bluge"))

    err = writer.Update(doc.ID(), doc)
    if err != nil {
        log.Fatalf("error updating document: %v", err)
    }
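
Many documents can also be indexed in one round trip by grouping them into a batch. A minimal sketch, using the batch API that also appears in the issues further down this page:

    batch := bluge.NewBatch()
    for _, name := range []string{"bluge", "bleve"} {
        d := bluge.NewDocument(name).
            AddField(bluge.NewTextField("name", name))
        batch.Update(d.ID(), d)
    }
    if err := writer.Batch(batch); err != nil {
        log.Fatalf("error executing batch: %v", err)
    }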

Querying

    reader, err := writer.Reader()
    if err != nil {
        log.Fatalf("error getting index reader: %v", err)
    }
    defer reader.Close()

    query := bluge.NewMatchQuery("bluge").SetField("name")
    request := bluge.NewTopNSearch(10, query).
        WithStandardAggregations()
    documentMatchIterator, err := reader.Search(context.Background(), request)
    if err != nil {
        log.Fatalf("error executing search: %v", err)
    }
    match, err := documentMatchIterator.Next()
    for err == nil && match != nil {
        err = match.VisitStoredFields(func(field string, value []byte) bool {
            if field == "_id" {
                fmt.Printf("match: %s\n", string(value))
            }
            return true
        })
        if err != nil {
            log.Fatalf("error loading stored fields: %v", err)
        }
        match, err = documentMatchIterator.Next()
    }
    if err != nil {
        log.Fatalf("error iterator document matches: %v", err)
    }
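
The standard aggregations requested above (hit count, maximum score, duration) are computed while the iterator is drained. Below is a minimal sketch of reading them afterwards, assuming the iterator exposes the results through an Aggregations() accessor (verify the exact accessor names against the godoc for your version). Custom aggregations can also be attached with AddAggregation, as shown in the test case later on this page.

    // Assumed accessors: Aggregations(), Count(), Duration().
    aggs := documentMatchIterator.Aggregations()
    fmt.Printf("total hits: %d (took %s)\n", aggs.Count(), aggs.Duration())

    // A custom aggregation, attached to the request before searching:
    // request.AddAggregation("max_score",
    //     aggregations.MaxStartingAt(search.DocumentScore(), 0))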


License

Apache License Version 2.0

bluge's People

Contributors

michaeljs1990, mschoch, rubiojr, tmm1, voldyman

bluge's Issues

make it possible to use bluge with mmap disabled

The current implementation always uses mmap to open segments from disk. There should be a new option to make it possible to use regular file I/O instead.

Today, when a segment is loaded this section of code is invoked:

https://github.com/blugelabs/bluge/blob/master/index/directory_fs.go#L141-L165

It opens the file with an operating system shared lock, mmaps the file, prepares a close func over these, and builds a segment.Data using NewDataBytes(), passing the mmap'd data.

Instead, we should have a configurable function which can decide at runtime whether to mmap the file or not. Initially, we would have two implementations, always mmap and never mmap, with no runtime logic, and a bluge top-level config option to toggle between them. In the future, applications may even provide this function directly (there is an open issue on the Bleve repo requesting this).

When the function tells us to open without mmap, we simply open the file with an operating system shared lock, and then call segment.NewDataFile() instead.
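
A hypothetical sketch of what such a hook could look like (the names and signature below are illustrative only, not an existing Bluge API):

// Illustrative only: one possible shape for the configurable load behavior.
// An OpenMode decides, per segment file, whether it should be mmap'd.
type OpenMode func(path string, size int64) bool

func AlwaysMmap(path string, size int64) bool { return true }
func NeverMmap(path string, size int64) bool  { return false }

// The directory's load path would then branch on the decision, calling
// segment.NewDataBytes() with the mmap'd bytes in one case and
// segment.NewDataFile() with the plain file handle in the other.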

Export fields in BaseSearch?

It would be nice to have BaseSearch's fields exported, so that other packages adding searchers can build on it, since it implements a lot of the interface boilerplate. Is there any reason to hide the BaseSearch fields from other packages?

How do I access stored fields other than _id?

Hi! I just started learning this tool a few hours ago and could not find a way to access stored fields.
I have a slice of objects. It is quite large, but for this example I have included only two. I can easily get the _id field, but the name field is empty when I try to get it with match.DocValues("name"), and non-existent when I try to print all stored fields.

Am I overlooking something here?

Code:

package main

import (
	"context"
	"fmt"
	"log"

	"github.com/blugelabs/bluge"
)

type city struct {
	id   string
	name string
}

func main() {
	cities := []city{
		{id: "BER", name: "berlin"},
		{id: "MOW", name: "moscow"},
	}
	cfg := cityBluge(cities)
	findCities(cfg, "ber")
}

func cityBluge(cities []city) *bluge.Config {
	config := bluge.DefaultConfig("cities")
	writer, err := bluge.OpenWriter(config)
	if err != nil {
		log.Fatalf("error opening writer: %v", err)
	}
	defer writer.Close()

	batch := bluge.NewBatch()

	for _, c := range cities {
		doc := bluge.NewDocument(c.id)
		doc.AddField(bluge.NewTextField("name", c.name))
		// batch.Insert(doc)
		batch.Update(doc.ID(), doc)
	}

	err = writer.Batch(batch)
	if err != nil {
		log.Fatalf("error updating document: %v", err)
	}
	return &config
}

func findCities(cfg *bluge.Config, city string) {
	reader, err := bluge.OpenReader(*cfg)
	if err != nil {
		log.Fatalf("error getting index reader: %v", err)
	}
	defer reader.Close()

	query := bluge.NewPrefixQuery(city).SetField("name")
	request := bluge.NewTopNSearch(10, query)
	dmi, err := reader.Search(context.Background(), request)
	if err != nil {
		log.Fatalf("error executing search: %v", err)
	}
	match, err := dmi.Next()
	for err == nil && match != nil {
		bb := match.DocValues("name")
		fmt.Printf("%v\n", bb)
		err = match.VisitStoredFields(func(field string, value []byte) bool {
			fmt.Printf("New match: %s, %s\n", field, string(value))
			return true
		})
		if err != nil {
			log.Fatalf("error loading stored fields: %v", err)
		}
		match, err = dmi.Next()
	}
	if err != nil {
		log.Fatalf("error iterator document matches: %v", err)
	}
}

Output:

user@host:~/bluge-demo$ go run main.go 
[]
New match: _id, BER
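
For reference, Bluge text fields do not store their values by default. A minimal sketch of the likely fix, assuming the name field simply needs to be marked as stored via the StoreValue() field option:

// Assumption: StoreValue() marks the field's value for storage, so that
// VisitStoredFields can return it alongside _id.
doc := bluge.NewDocument(c.id)
doc.AddField(bluge.NewTextField("name", c.name).StoreValue())
batch.Update(doc.ID(), doc)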

review new API inconsistencies

In some places we have NewXYZ(); in other places we have XYZ().

Sometimes there is tension between type XYZ ... and func XYZ() ....

Need to make choices and be consistent.

rectangle spatial index

A great project! Is there any plan to add a rectangular spatial index, such as a NewGeoRectField, for geo queries over lines or polygons?
:)

In memory directory causes nil pointer dereference

The in-memory directory implementation causes a nil pointer dereference because the Load method returns a nil closer.

stacktrace:

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0x114c2ee]

goroutine 12 [running]:
github.com/blugelabs/bluge/index.(*closeOnLastRefCounter).DecRef(0xc0002e3ec0, 0x109145a, 0x8000000000000000)
	/Users/voldyman/go/pkg/mod/github.com/blugelabs/[email protected]/index/segment_plugin.go:113 +0xae
github.com/blugelabs/bluge/index.(*Snapshot).decRef(0xc00049e200, 0x3, 0x1bd995)
	/Users/voldyman/go/pkg/mod/github.com/blugelabs/[email protected]/index/snapshot.go:77 +0xb3
github.com/blugelabs/bluge/index.(*Snapshot).Close(...)
	/Users/voldyman/go/pkg/mod/github.com/blugelabs/[email protected]/index/snapshot.go:89
github.com/blugelabs/bluge/index.(*Writer).mergerLoop(0xc00003cd80, 0xc0002ec9c0, 0xc0000503c0)
	/Users/voldyman/go/pkg/mod/github.com/blugelabs/[email protected]/index/merge.go:75 +0x4c5
created by github.com/blugelabs/bluge/index.OpenWriter
	/Users/voldyman/go/pkg/mod/github.com/blugelabs/[email protected]/index/writer.go:131 +0x8cd
exit status 2
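
One possible direction for a fix (a sketch only, not the actual change): have the in-memory directory return a no-op closer instead of nil, so the reference-counting code always has something to close.

// Hypothetical no-op closer the in-memory Load could return instead of nil.
type noopCloser struct{}

func (noopCloser) Close() error { return nil }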

reimplement force merge

Force merge was added to bleve after the fork.

Unfortunately it looks like a redesign may be needed to fit this into the current bluge index.

Adding an aggregation to AllMatches search causes panic

When you add an aggregation to an AllMatches search, it panics due to an assignment to a nil map.

test case

func TestAllMatchesWithAggregation(t *testing.T) {
	query := NewMatchQuery("bluge").SetField("name")
	request := NewAllMatches(query)

	request.AddAggregation("score", aggregations.MaxStartingAt(search.DocumentScore(), 0))
}

result:

╰─➤  go test -run AllMatches                                                                  1 ↵
--- FAIL: TestAllMatchesWithAggregation (0.00s)
panic: assignment to entry in nil map [recovered]
	panic: assignment to entry in nil map

goroutine 6 [running]:
testing.tRunner.func1.1(0x1265fc0, 0x12f63b0)
	/usr/local/Cellar/go/1.15.2/libexec/src/testing/testing.go:1076 +0x30d
testing.tRunner.func1(0xc000001380)
	/usr/local/Cellar/go/1.15.2/libexec/src/testing/testing.go:1079 +0x41a
panic(0x1265fc0, 0x12f63b0)
	/usr/local/Cellar/go/1.15.2/libexec/src/runtime/panic.go:969 +0x175
github.com/blugelabs/bluge/search.Aggregations.Add(...)
	/Users/voldyman/dev/bluge/search/aggregations.go:29
github.com/blugelabs/bluge.(*AllMatches).AddAggregation(...)
	/Users/voldyman/dev/bluge/search.go:200
github.com/blugelabs/bluge.TestAllMatchesWithAggregation(0xc000001380)
	/Users/voldyman/dev/bluge/search_test.go:1270 +0x105
testing.tRunner(0xc000001380, 0x12bb3c0)
	/usr/local/Cellar/go/1.15.2/libexec/src/testing/testing.go:1127 +0xef
created by testing.(*T).Run
	/usr/local/Cellar/go/1.15.2/libexec/src/testing/testing.go:1178 +0x386
exit status 2
FAIL	github.com/blugelabs/bluge	0.080s

Indexeddb store advice

I would like to make Bluge work inside a browser, with all of the Go code compiled to WASM.

Fortunately, there is a great library that gives the developer a file system backed by IndexedDB:

https://github.com/hack-pad/hackpadfs

which uses

https://github.com/hack-pad/go-indexeddb

Can you advise on what would need to be done in Bluge?

I suspect the segment API repo is where the work would happen, with a different Go build tag for js/wasm, but perhaps I am wrong.

Also, let me know whether this is something you would be interested in having in Bluge, or whether I need to maintain a forked repo, etc.

The GUI I am using is gioui, by the way, which I am really liking. It builds for web, desktop, and mobile and can use the Bluge libraries, although I have not fully tested it with Bluge on mobile yet.

I just want to get Bluge working with the WASM build of gioui.

Here are the code and examples:

https://github.com/gioui

InMemoryOnly returns err: unable to find a usable snapshot

I would expect the following code to work, but I get:
unable to open reader: error opening index: unable to find a usable snapshot
when I run bluge.OpenReader(config).


import (
	"context"
	"fmt"
	"log"
	"testing"

	"github.com/blugelabs/bluge"
)

func TestCanRunSimpleSearch(t *testing.T) {

	config := bluge.InMemoryOnlyConfig()

	writer, err := bluge.OpenWriter(config)
	if err != nil {
		log.Fatalf("error opening writer: %v", err)
	}

	doc := bluge.NewDocument("town:1")
	doc.AddField(bluge.NewTextField("en", "Denia, Alicante"))

	err = writer.Insert(doc)
	if err != nil {
		log.Fatalf("error updating document: %v", err)
	}

	err = writer.Close()
	if err != nil {
		log.Fatalf("error closing writer: %v", err)
	}

	reader, err := bluge.OpenReader(config)
	if err != nil {
		log.Fatalf("unable to open reader: %v", err)
	}

	defer func() {
		err = reader.Close()
		if err != nil {
			log.Fatalf("error closing reader: %v", err)
		}
	}()

	query := bluge.NewMatchQuery("denia")
	query.SetField("en")

	req := bluge.NewTopNSearch(5, query)

	dmi, err := reader.Search(context.Background(), req)
	if err != nil {
		log.Fatalf("error executing search: %v", err)
	}

	next, err := dmi.Next()
	for err == nil && next != nil {
		err = next.VisitStoredFields(func(field string, value []byte) bool {
			if field == "_id" {
				fmt.Println(string(value))
			}
			return true
		})
		if err != nil {
			log.Fatalf("error accessing stored fields: %v", err)
		}
		next, err = dmi.Next()
	}
	if err != nil {
		log.Fatalf("error iterating results: %v", err)
	}
}
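
A variant that does work here (a sketch, under the assumption that an InMemoryOnly index lives only as long as its writer, so the reader has to come from writer.Reader() rather than bluge.OpenReader):

	// Assumption: with InMemoryOnlyConfig nothing is persisted anywhere that
	// OpenReader could find, so obtain a reader from the still-open writer.
	reader, err := writer.Reader()
	if err != nil {
		log.Fatalf("unable to get reader: %v", err)
	}
	defer reader.Close()
	// ... search as above, and close the writer only after searching.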

review segment/type version loading

The snapshot file format records the type/version of all segments; this should be used to correctly load each segment. Review that this is actually the case.

Real-time reader does not observe changes to the index

Hi there! Coming from bleve and was surprised by the following behavior:

If I open a writer, and then simultaneously open a "near real-time reader" (with writer.Reader()) and start writing to the index, searches via the reader are unable to pick up results for anything that is written after the reader was opened. Is this the expected behavior or should the reader be able to observe future updates?

If this is the expected behavior, is it more common to keep a long-lived reader open and swap it out for a new one when the index is updated, or is the best practice to acquire a reader only when you need to perform a search and then discard it?

review config API

Based on feedback from @steveyen, who finds the WithXYZ() convention inconsistent.

I will document the larger rationale here and seek wider input...

How to get a single field value without iterating all of them

From the documentation, the shown way to get a field value is to use something like the following on a returned document.

        err = match.VisitStoredFields(func(field string, value []byte) bool {
            if field == "_id" {
                fmt.Printf("match: %s\n", string(value))
            }
            return true
        })

However, in some cases I only want to pull a single value from the document. I tried a combination of doc.LoadDocumentValues and doc.DocValues, but it doesn't seem that I can retrieve a single value from a given document at present. This is currently using the in-memory index.
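
One workaround (a sketch, assuming the visitor's boolean return value controls whether iteration continues, as suggested by the return true in the example above): stop visiting as soon as the field of interest has been seen.

        // Capture a single stored field and stop early; returning false is
        // assumed to end the visit, while returning true continues it.
        var name []byte
        err = match.VisitStoredFields(func(field string, value []byte) bool {
            if field == "name" { // illustrative field name
                name = append([]byte(nil), value...) // copy: value may be reused
                return false
            }
            return true
        })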

reimplement index builder

The index builder functionality was added to bleve after the bluge fork.

It makes sense to rethink this issue. It now seems like the IndexBuilder maps more cleanly onto an IndexWriter that has simply been configured to behave slightly differently, possibly with some new config options or methods.

Race condition in InMemory usage

I encountered this multiple times while trying to load 2k documents into the in-memory index.

stack trace


fatal error: concurrent map writes

goroutine 13 [running]:
runtime.throw(0x12467fc, 0x15)
	/usr/local/Cellar/go/1.15.6/libexec/src/runtime/panic.go:1116 +0x72 fp=0xc0009cfa70 sp=0xc0009cfa40 pc=0x1034112
runtime.mapassign_fast64(0x1202ec0, 0xc000074a20, 0x1430, 0xc0002f2900)
	/usr/local/Cellar/go/1.15.6/libexec/src/runtime/map_fast64.go:101 +0x33e fp=0xc0009cfab0 sp=0xc0009cfa70 pc=0x10124de
github.com/blugelabs/bluge/index.(*InMemoryDirectory).Persist(0xc00000e0c8, 0x1243273, 0x4, 0x1430, 0x29db9958, 0xc000f0c0a0, 0xc0002f2900, 0x1282400, 0xc000f0c0a0)
	/Users/shekher/go/src/github.com/blugelabs/bluge/index/directory_mem.go:70 +0xd0 fp=0xc0009cfb00 sp=0xc0009cfab0 pc=0x1153f10
github.com/blugelabs/bluge/index.(*Writer).merge(0xc000053200, 0xc000e780a0, 0xa, 0xa, 0xc000f0c050, 0xa, 0xa, 0x1430, 0xa, 0x1418e01, ...)
	/Users/shekher/go/src/github.com/blugelabs/bluge/index/merge.go:368 +0x135 fp=0xc0009cfb78 sp=0xc0009cfb00 pc=0x1158af5
github.com/blugelabs/bluge/index.(*Writer).executeMergeTask(0xc000053200, 0xc0002f2a20, 0xc000e7c220, 0xc000f0c000, 0xc000e7c0c0)
	/Users/shekher/go/src/github.com/blugelabs/bluge/index/merge.go:144 +0x58e fp=0xc0009cfd80 sp=0xc0009cfb78 pc=0x11576ee
github.com/blugelabs/bluge/index.(*Writer).planMergeAtSnapshot(0xc000053200, 0xc0002f2a20, 0xc000526300, 0xa, 0x4c4b40, 0x4024000000000000, 0xa, 0x7d0, 0x4000000000000000, 0x0, ...)
	/Users/shekher/go/src/github.com/blugelabs/bluge/index/merge.go:118 +0x32e fp=0xc0009cfe40 sp=0xc0009cfd80 pc=0x11570ae
github.com/blugelabs/bluge/index.(*Writer).mergerLoop(0xc000053200, 0xc0002f2a20, 0xc000066420)
	/Users/shekher/go/src/github.com/blugelabs/bluge/index/merge.go:56 +0x293 fp=0xc0009cffc8 sp=0xc0009cfe40 pc=0x1156a93
runtime.goexit()
	/usr/local/Cellar/go/1.15.6/libexec/src/runtime/asm_amd64.s:1374 +0x1 fp=0xc0009cffd0 sp=0xc0009cffc8 pc=0x1067361
created by github.com/blugelabs/bluge/index.OpenWriter
	/Users/shekher/go/src/github.com/blugelabs/bluge/index/writer.go:131 +0x8cd

goroutine 1 [runnable]:
github.com/blugelabs/bluge/index.(*Writer).prepareSegment(0xc000053200, 0xc00064cbd0, 0xc000f08020, 0x1, 0x1, 0x0, 0x0, 0x0, 0x0)
	/Users/shekher/go/src/github.com/blugelabs/bluge/index/writer.go:324 +0x345
github.com/blugelabs/bluge/index.(*Writer).Batch(0xc000053200, 0xc00042c000, 0x0, 0x0)
	/Users/shekher/go/src/github.com/blugelabs/bluge/index/writer.go:272 +0x2a6
github.com/blugelabs/bluge.(*Writer).Batch(...)
	/Users/shekher/go/src/github.com/blugelabs/bluge/writer.go:63
main.(*Index).IndexDocuments(0xc0001bc3c0, 0xc00074dd50, 0x1, 0x1, 0xc0003fc000, 0x1280e40)
	/Users/shekher/workplace/shekher/src/Shekher/proj/index.go:31 +0x6b
main.createInMemIndex(0xc00010e200, 0xc0000747b0, 0xc0001dfea8)
	/Users/shekher/workplace/shekher/src/Shekher/proj/main.go:63 +0x426
main.main()
	/Users/shekher/workplace/shekher/src/Shekher/proj/main.go:15 +0x47

goroutine 7 [select]:
github.com/blugelabs/bluge/index.analysisWorker(0xc0002f28a0, 0xc0002f2900)
	/Users/shekher/go/src/github.com/blugelabs/bluge/index/writer.go:549 +0xcf
github.com/blugelabs/bluge/index.OpenWriter.func1()
	/Users/shekher/go/src/github.com/blugelabs/bluge/index/writer.go:69 +0x45
created by github.com/blugelabs/bluge/index.defaultConfig.func2
	/Users/shekher/go/src/github.com/blugelabs/bluge/index/config.go:205 +0x33

goroutine 8 [select]:
github.com/blugelabs/bluge/index.analysisWorker(0xc0002f28a0, 0xc0002f2900)
	/Users/shekher/go/src/github.com/blugelabs/bluge/index/writer.go:549 +0xcf
github.com/blugelabs/bluge/index.OpenWriter.func1()
	/Users/shekher/go/src/github.com/blugelabs/bluge/index/writer.go:69 +0x45
created by github.com/blugelabs/bluge/index.defaultConfig.func2
	/Users/shekher/go/src/github.com/blugelabs/bluge/index/config.go:205 +0x33

goroutine 9 [select]:
github.com/blugelabs/bluge/index.analysisWorker(0xc0002f28a0, 0xc0002f2900)
	/Users/shekher/go/src/github.com/blugelabs/bluge/index/writer.go:549 +0xcf
github.com/blugelabs/bluge/index.OpenWriter.func1()
	/Users/shekher/go/src/github.com/blugelabs/bluge/index/writer.go:69 +0x45
created by github.com/blugelabs/bluge/index.defaultConfig.func2
	/Users/shekher/go/src/github.com/blugelabs/bluge/index/config.go:205 +0x33

goroutine 10 [select]:
github.com/blugelabs/bluge/index.analysisWorker(0xc0002f28a0, 0xc0002f2900)
	/Users/shekher/go/src/github.com/blugelabs/bluge/index/writer.go:549 +0xcf
github.com/blugelabs/bluge/index.OpenWriter.func1()
	/Users/shekher/go/src/github.com/blugelabs/bluge/index/writer.go:69 +0x45
created by github.com/blugelabs/bluge/index.defaultConfig.func2
	/Users/shekher/go/src/github.com/blugelabs/bluge/index/config.go:205 +0x33

goroutine 11 [runnable]:
github.com/blugelabs/bluge/index.(*Writer).introducePersist(0xc000053200, 0xc000f08130, 0x2820)
	/Users/shekher/go/src/github.com/blugelabs/bluge/index/introducer.go:181 +0x69a
github.com/blugelabs/bluge/index.(*Writer).introducerLoop(0xc000053200, 0xc0002f2960, 0xc0002f29c0, 0xc0002f2a20, 0xc0000663c0, 0x2820)
	/Users/shekher/go/src/github.com/blugelabs/bluge/index/introducer.go:74 +0x2ef
created by github.com/blugelabs/bluge/index.OpenWriter
	/Users/shekher/go/src/github.com/blugelabs/bluge/index/writer.go:127 +0x7fd

goroutine 12 [select]:
github.com/blugelabs/bluge/index.(*Writer).prepareIntroducePersist(0xc000053200, 0xc0002f29c0, 0xc0003b66f0, 0x1, 0x1, 0x0, 0x0)
	/Users/shekher/go/src/github.com/blugelabs/bluge/index/persister.go:372 +0x3e5
github.com/blugelabs/bluge/index.(*Writer).persistSnapshotDirect(0xc000053200, 0xc0002f29c0, 0xc0003fc200, 0xc0003fc200, 0x5fe51000)
	/Users/shekher/go/src/github.com/blugelabs/bluge/index/persister.go:325 +0x3ba
github.com/blugelabs/bluge/index.(*Writer).persistSnapshot(0xc000053200, 0xc0002f2a20, 0xc0002f29c0, 0xc0003fc200, 0x0, 0x0)
	/Users/shekher/go/src/github.com/blugelabs/bluge/index/persister.go:233 +0x65
github.com/blugelabs/bluge/index.(*Writer).persisterLoop(0xc000053200, 0xc0002f2a20, 0xc0002f29c0, 0xc0000663c0, 0xc000066420, 0x281e)
	/Users/shekher/go/src/github.com/blugelabs/bluge/index/persister.go:81 +0x4ed
created by github.com/blugelabs/bluge/index.OpenWriter
	/Users/shekher/go/src/github.com/blugelabs/bluge/index/writer.go:129 +0x877

remove unnecessary query types

Several query types duplicate functionality. Simplify things by removing these duplicates (see the sketch after this list for how Boolean already covers the first two):

  • Disjunction (functionality provided by Boolean)
  • Conjunction (functionality provided by Boolean)
  • Phrase (same as match phrase with no analyzer)
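
For example, a disjunction of term queries can already be expressed as a Boolean query with should clauses (a sketch using the builder-style query API; the exact method names are assumed from the existing Boolean query type):

q := bluge.NewBooleanQuery()
q.AddShould(bluge.NewTermQuery("bluge").SetField("name"))
q.AddShould(bluge.NewTermQuery("bleve").SetField("name"))
// A conjunction is the same shape, using AddMust instead of AddShould.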

Aggregations only work with TopNSearch?

If I use bluge.NewAllMatches and add aggregations, no data is ever returned in the buckets.

This reproduces the failure in the test suite:

diff --git a/test/aggregations_test.go b/test/aggregations_test.go
index bddc664..41a3cdd 100644
--- a/test/aggregations_test.go
+++ b/test/aggregations_test.go
@@ -223,7 +223,7 @@ func aggregationsTests() []*RequestVerify {
        return []*RequestVerify{
                {
                        Comment: "category inventory, by type",
-                       Request: bluge.NewTopNSearch(0,
+                       Request: bluge.NewAllMatches(
                                bluge.NewTermQuery("inventory").
                                        SetField("category")),
                        Aggregations: search.Aggregations{

Is this expected behavior?

OpenWriter is failing after removing all entries

When I delete all entries and try to open the writer, it fails with the following errors:
 	error loading snapshot epoch: 18: error peeking snapshot format version 18: EOF from writer.go
 	error opening index: existing snapshots found, but none could be loaded, exiting

This error does not happen if I leave at least one entry.

config := bluge.DefaultConfig("index-data")
writer, err := bluge.OpenWriter(config)
if err != nil {
	log.Println(err)
	return
}

doc := bluge.NewDocument("a1")
doc.AddField(bluge.NewTextField("TestKey1", "TestKey Data1"))
if err = writer.Update(doc.ID(), doc); err != nil {
	log.Println(err)
	return
}

doc = bluge.NewDocument("a2")
doc.AddField(bluge.NewTextField("TestKey2", "TestKey Data2"))
if err = writer.Update(doc.ID(), doc); err != nil {
	log.Println(err)
	return
}

writer.Close()

writer, err = bluge.OpenWriter(config)
if err != nil {
	log.Println(err)
	return
}

if err = writer.Delete(bluge.Identifier("a1")); err != nil {
	log.Println(err)
	return
}

if err = writer.Delete(bluge.Identifier("a2")); err != nil {  // If I don't remove this second entry, everything works fine
	log.Println(err)
	return
}

writer.Close()

writer, err = bluge.OpenWriter(config)
if err != nil {
	log.Println(err) 
	return
}
writer.Close()

Investigate how we can support indices being stored in non-local filesystems

For large indices, it may be beneficial to store the index in an object store like S3 and have multiple workers process the search queries. Currently we could implement a new Directory interface that fetches the files from S3 when Load is called, but it would save a ton of bandwidth if we could retrieve the necessary parts from these index files at query time. This would currently involve implementing a new segment.Data, with the appropriate ReaderAt methods that fetch the data from the cloud.

It would be useful to see whether this pattern can be applied, and to evaluate whether any of the code/interfaces should be changed (for example, Data.NewFromFile could really benefit from a Data.NewFromReaderAt in this case).
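
As a rough illustration of the ReaderAt-style access this would need, here is a hypothetical io.ReaderAt that fetches byte ranges over HTTP from an object store (illustrative only; the real integration point would be a segment.Data built over it, as described above):

package remote

import (
	"fmt"
	"io"
	"net/http"
)

// rangeReaderAt is a hypothetical io.ReaderAt that issues HTTP Range requests
// against an object-store URL, fetching only the bytes a segment read needs.
type rangeReaderAt struct {
	url    string
	client *http.Client
}

func (r *rangeReaderAt) ReadAt(p []byte, off int64) (int, error) {
	req, err := http.NewRequest(http.MethodGet, r.url, nil)
	if err != nil {
		return 0, err
	}
	req.Header.Set("Range", fmt.Sprintf("bytes=%d-%d", off, off+int64(len(p))-1))
	resp, err := r.client.Do(req)
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	n, err := io.ReadFull(resp.Body, p)
	if err == io.ErrUnexpectedEOF {
		// Short read at the end of the object; ReaderAt requires a non-nil error here.
		return n, io.EOF
	}
	return n, err
}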
