
bluge's Introduction

Bluge

modern text indexing in go - blugelabs.com

Features

  • Supported field types:
    • Text, Numeric, Date, Geo Point
  • Supported query types:
    • Term, Phrase, Match, Match Phrase, Prefix
    • Conjunction, Disjunction, Boolean
    • Numeric Range, Date Range
  • BM25 Similarity/Scoring with pluggable interfaces
  • Search result match highlighting
  • Extendable Aggregations:
    • Bucketing
      • Terms
      • Numeric Range
      • Date Range
    • Metrics
      • Min/Max/Count/Sum
      • Avg/Weighted Avg
      • Cardinality Estimation (HyperLogLog++)
      • Quantile Approximation (T-Digest)

Indexing

    config := bluge.DefaultConfig(path)
    writer, err := bluge.OpenWriter(config)
    if err != nil {
        log.Fatalf("error opening writer: %v", err)
    }
    defer writer.Close()

    doc := bluge.NewDocument("example").
        AddField(bluge.NewTextField("name", "bluge"))

    err = writer.Update(doc.ID(), doc)
    if err != nil {
        log.Fatalf("error updating document: %v", err)
    }
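
Many documents can also be indexed in one round trip by grouping them into a batch. A minimal sketch, using the batch API that also appears in the issues further down this page:

    batch := bluge.NewBatch()
    for _, name := range []string{"bluge", "bleve"} {
        d := bluge.NewDocument(name).
            AddField(bluge.NewTextField("name", name))
        batch.Update(d.ID(), d)
    }
    if err := writer.Batch(batch); err != nil {
        log.Fatalf("error executing batch: %v", err)
    }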

Querying

    reader, err := writer.Reader()
    if err != nil {
        log.Fatalf("error getting index reader: %v", err)
    }
    defer reader.Close()

    query := bluge.NewMatchQuery("bluge").SetField("name")
    request := bluge.NewTopNSearch(10, query).
        WithStandardAggregations()
    documentMatchIterator, err := reader.Search(context.Background(), request)
    if err != nil {
        log.Fatalf("error executing search: %v", err)
    }
    match, err := documentMatchIterator.Next()
    for err == nil && match != nil {
        err = match.VisitStoredFields(func(field string, value []byte) bool {
            if field == "_id" {
                fmt.Printf("match: %s\n", string(value))
            }
            return true
        })
        if err != nil {
            log.Fatalf("error loading stored fields: %v", err)
        }
        match, err = documentMatchIterator.Next()
    }
    if err != nil {
        log.Fatalf("error iterator document matches: %v", err)
    }
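
The standard aggregations requested above (hit count, maximum score, duration) are computed while the iterator is drained. Below is a minimal sketch of reading them afterwards, assuming the iterator exposes the results through an Aggregations() accessor (verify the exact accessor names against the godoc for your version). Custom aggregations can also be attached with AddAggregation, as shown in the test case later on this page.

    // Assumed accessors: Aggregations(), Count(), Duration().
    aggs := documentMatchIterator.Aggregations()
    fmt.Printf("total hits: %d (took %s)\n", aggs.Count(), aggs.Duration())

    // A custom aggregation, attached to the request before searching:
    // request.AddAggregation("max_score",
    //     aggregations.MaxStartingAt(search.DocumentScore(), 0))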


License

Apache License Version 2.0

bluge's People

Contributors

michaeljs1990, mschoch, rubiojr, tmm1, voldyman

bluge's Issues

make it possible to use bluge with mmap disabled

The current implementation always uses mmap to open segments from disk. There should be a new option to make it possible to use regular file I/O instead.

Today, when a segment is loaded this section of code is invoked:

https://github.com/blugelabs/bluge/blob/master/index/directory_fs.go#L141-L165

It opens the file with an operating system shared lock, mmaps the file, prepares a close func over these, and builds a segment.Data using NewDataBytes(), passing the mmap'd data.

Instead, we should have a configurable function which can decide at runtime whether to mmap the file or not. Initially, we would have two implementations, always mmap and never mmap, with no runtime logic, and a bluge top-level config option to toggle between them. In the future, applications may even provide this function directly (there is an open issue on the Bleve repo requesting this).

When the function tells us to open without mmap, we simply open the file with an operating system shared lock, and then call segment.NewDataFile() instead.
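
A hypothetical sketch of what such a hook could look like (the names and signature below are illustrative only, not an existing Bluge API):

// Illustrative only: one possible shape for the configurable load behavior.
// An OpenMode decides, per segment file, whether it should be mmap'd.
type OpenMode func(path string, size int64) bool

func AlwaysMmap(path string, size int64) bool { return true }
func NeverMmap(path string, size int64) bool  { return false }

// The directory's load path would then branch on the decision, calling
// segment.NewDataBytes() with the mmap'd bytes in one case and
// segment.NewDataFile() with the plain file handle in the other.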

Export fields in BaseSearch?

It would be nice to have BaseSearch's fields exported, so that other packages adding searchers can build on it, since it implements a lot of the interface boilerplate. Is there any reason to hide the BaseSearch fields from other packages?

How do I access stored fields other than _id?

Hi! I just started learning this tool a few hours ago and could not find a way to access stored fields.
I have a slice of objects. It is quite large, but for this example I have included only two. I can easily get the _id field, but the name field is empty when I try to get it with match.DocValues("name"), and non-existent when I try to print all stored fields.

Am I overlooking something here?

Code:

package main

import (
	"context"
	"fmt"
	"log"

	"github.com/blugelabs/bluge"
)

type city struct {
	id   string
	name string
}

func main() {
	cities := []city{
		{id: "BER", name: "berlin"},
		{id: "MOW", name: "moscow"},
	}
	cfg := cityBluge(cities)
	findCities(cfg, "ber")
}

func cityBluge(cities []city) *bluge.Config {
	config := bluge.DefaultConfig("cities")
	writer, err := bluge.OpenWriter(config)
	if err != nil {
		log.Fatalf("error opening writer: %v", err)
	}
	defer writer.Close()

	batch := bluge.NewBatch()

	for _, c := range cities {
		doc := bluge.NewDocument(c.id)
		doc.AddField(bluge.NewTextField("name", c.name))
		// batch.Insert(doc)
		batch.Update(doc.ID(), doc)
	}

	err = writer.Batch(batch)
	if err != nil {
		log.Fatalf("error updating document: %v", err)
	}
	return &config
}

func findCities(cfg *bluge.Config, city string) {
	reader, err := bluge.OpenReader(*cfg)
	if err != nil {
		log.Fatalf("error getting index reader: %v", err)
	}
	defer reader.Close()

	query := bluge.NewPrefixQuery(city).SetField("name")
	request := bluge.NewTopNSearch(10, query)
	dmi, err := reader.Search(context.Background(), request)
	if err != nil {
		log.Fatalf("error executing search: %v", err)
	}
	match, err := dmi.Next()
	for err == nil && match != nil {
		bb := match.DocValues("name")
		fmt.Printf("%v\n", bb)
		err = match.VisitStoredFields(func(field string, value []byte) bool {
			fmt.Printf("New match: %s, %s\n", field, string(value))
			return true
		})
		if err != nil {
			log.Fatalf("error loading stored fields: %v", err)
		}
		match, err = dmi.Next()
	}
	if err != nil {
		log.Fatalf("error iterator document matches: %v", err)
	}
}

Output:

user@host:~/bluge-demo$ go run main.go 
[]
New match: _id, BER
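
For reference, Bluge text fields do not store their values by default. A minimal sketch of the likely fix, assuming the name field simply needs to be marked as stored via the StoreValue() field option:

// Assumption: StoreValue() marks the field's value for storage, so that
// VisitStoredFields can return it alongside _id.
doc := bluge.NewDocument(c.id)
doc.AddField(bluge.NewTextField("name", c.name).StoreValue())
batch.Update(doc.ID(), doc)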

review new API inconsistencies

In some places we have NewXYZ(); in other places we have XYZ().

Sometimes there is tension between type XYZ ... and func XYZ() ....

Need to make choices and be consistent.

rectangle spatial index

A great project! Is there any plan to add a rectangular spatial index, such as a NewGeoRectField, for geo queries over lines or polygons?
:)

In memory directory causes nil pointer dereference

The in-memory directory implementation causes a nil pointer dereference because the Load method returns a nil closer.

stacktrace:

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0x114c2ee]

goroutine 12 [running]:
github.com/blugelabs/bluge/index.(*closeOnLastRefCounter).DecRef(0xc0002e3ec0, 0x109145a, 0x8000000000000000)
	/Users/voldyman/go/pkg/mod/github.com/blugelabs/[email protected]/index/segment_plugin.go:113 +0xae
github.com/blugelabs/bluge/index.(*Snapshot).decRef(0xc00049e200, 0x3, 0x1bd995)
	/Users/voldyman/go/pkg/mod/github.com/blugelabs/[email protected]/index/snapshot.go:77 +0xb3
github.com/blugelabs/bluge/index.(*Snapshot).Close(...)
	/Users/voldyman/go/pkg/mod/github.com/blugelabs/[email protected]/index/snapshot.go:89
github.com/blugelabs/bluge/index.(*Writer).mergerLoop(0xc00003cd80, 0xc0002ec9c0, 0xc0000503c0)
	/Users/voldyman/go/pkg/mod/github.com/blugelabs/[email protected]/index/merge.go:75 +0x4c5
created by github.com/blugelabs/bluge/index.OpenWriter
	/Users/voldyman/go/pkg/mod/github.com/blugelabs/[email protected]/index/writer.go:131 +0x8cd
exit status 2
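
One possible direction for a fix (a sketch only, not the actual change): have the in-memory directory return a no-op closer instead of nil, so the reference-counting code always has something to close.

// Hypothetical no-op closer the in-memory Load could return instead of nil.
type noopCloser struct{}

func (noopCloser) Close() error { return nil }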

reimplement force merge

Force merge was added to bleve after the fork.

Unfortunately it looks like a redesign may be needed to fit this into the current bluge index.

Adding an aggregation to AllMatches search causes panic

When you add an aggregation to an AllMatches search, it panics due to an assignment to a nil map.

test case

func TestAllMatchesWithAggregation(t *testing.T) {
	query := NewMatchQuery("bluge").SetField("name")
	request := NewAllMatches(query)

	request.AddAggregation("score", aggregations.MaxStartingAt(search.DocumentScore(), 0))
}

result:

╰─➤  go test -run AllMatches                                                                  1 ↵
--- FAIL: TestAllMatchesWithAggregation (0.00s)
panic: assignment to entry in nil map [recovered]
	panic: assignment to entry in nil map

goroutine 6 [running]:
testing.tRunner.func1.1(0x1265fc0, 0x12f63b0)
	/usr/local/Cellar/go/1.15.2/libexec/src/testing/testing.go:1076 +0x30d
testing.tRunner.func1(0xc000001380)
	/usr/local/Cellar/go/1.15.2/libexec/src/testing/testing.go:1079 +0x41a
panic(0x1265fc0, 0x12f63b0)
	/usr/local/Cellar/go/1.15.2/libexec/src/runtime/panic.go:969 +0x175
github.com/blugelabs/bluge/search.Aggregations.Add(...)
	/Users/voldyman/dev/bluge/search/aggregations.go:29
github.com/blugelabs/bluge.(*AllMatches).AddAggregation(...)
	/Users/voldyman/dev/bluge/search.go:200
github.com/blugelabs/bluge.TestAllMatchesWithAggregation(0xc000001380)
	/Users/voldyman/dev/bluge/search_test.go:1270 +0x105
testing.tRunner(0xc000001380, 0x12bb3c0)
	/usr/local/Cellar/go/1.15.2/libexec/src/testing/testing.go:1127 +0xef
created by testing.(*T).Run
	/usr/local/Cellar/go/1.15.2/libexec/src/testing/testing.go:1178 +0x386
exit status 2
FAIL	github.com/blugelabs/bluge	0.080s

Indexeddb store advice

I would like to make Bluge work inside a browser, with all of the Go code compiled to WASM.

Fortunately, there is a great library that gives the developer a file system backed by IndexedDB:

https://github.com/hack-pad/hackpadfs

which uses

https://github.com/hack-pad/go-indexeddb

Can you advise on what would need to be done in Bluge?

I suspect the segment API repo is where the work would happen, with a different Go build tag for js/wasm, but perhaps I am wrong.

Also, let me know whether this is something you would be interested in having in Bluge, or whether I need to maintain a forked repo, etc.

The GUI I am using is gioui, by the way, which I am really liking. It builds for web, desktop, and mobile and can use the Bluge libraries, although I have not fully tested it with Bluge on mobile yet.

I just want to get Bluge working with the WASM build of gioui.

Here are the code and examples:

https://github.com/gioui

InMemoryOnly returns err: unable to find a usable snapshot

I would expect the following code to work, but I get:
unable to open reader: error opening index: unable to find a usable snapshot
when I run bluge.OpenReader(config).


import (
	"context"
	"fmt"
	"log"
	"testing"

	"github.com/blugelabs/bluge"
)

func TestCanRunSimpleSearch(t *testing.T) {

	config := bluge.InMemoryOnlyConfig()

	writer, err := bluge.OpenWriter(config)
	if err != nil {
		log.Fatalf("error opening writer: %v", err)
	}

	doc := bluge.NewDocument("town:1")
	doc.AddField(bluge.NewTextField("en", "Denia, Alicante"))

	err = writer.Insert(doc)
	if err != nil {
		log.Fatalf("error updating document: %v", err)
	}

	err = writer.Close()
	if err != nil {
		log.Fatalf("error closing writer: %v", err)
	}

	reader, err := bluge.OpenReader(config)
	if err != nil {
		log.Fatalf("unable to open reader: %v", err)
	}

	defer func() {
		err = reader.Close()
		if err != nil {
			log.Fatalf("error closing reader: %v", err)
		}
	}()

	query := bluge.NewMatchQuery("denia")
	query.SetField("en")

	req := bluge.NewTopNSearch(5, query)

	dmi, err := reader.Search(context.Background(), req)
	if err != nil {
		log.Fatalf("error executing search: %v", err)
	}

	next, err := dmi.Next()
	for err == nil && next != nil {
		err = next.VisitStoredFields(func(field string, value []byte) bool {
			if field == "_id" {
				fmt.Println(string(value))
			}
			return true
		})
		if err != nil {
			log.Fatalf("error accessing stored fields: %v", err)
		}
		next, err = dmi.Next()
	}
	if err != nil {
		log.Fatalf("error iterating results: %v", err)
	}
}
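
A variant that does work here (a sketch, under the assumption that an InMemoryOnly index lives only as long as its writer, so the reader has to come from writer.Reader() rather than bluge.OpenReader):

	// Assumption: with InMemoryOnlyConfig nothing is persisted anywhere that
	// OpenReader could find, so obtain a reader from the still-open writer.
	reader, err := writer.Reader()
	if err != nil {
		log.Fatalf("unable to get reader: %v", err)
	}
	defer reader.Close()
	// ... search as above, and close the writer only after searching.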

review segment/type version loading

The snapshot file format records the type/version of all segments; this should be used to correctly load each segment. Review that this is actually the case.

Real-time reader does not observe changes to the index

Hi there! Coming from bleve and was surprised by the following behavior:

If I open a writer, and then simultaneously open a "near real-time reader" (with writer.Reader()) and start writing to the index, searches via the reader are unable to pick up results for anything that is written after the reader was opened. Is this the expected behavior or should the reader be able to observe future updates?

If this is the expected behavior, is it more common to keep a long-lived reader open and swap it out for a new one when the index is updated, or is the best practice to acquire a reader only when you need to perform a search and then discard it?

review config API

Based on feedback from @steveyen, who finds the WithXYZ() convention inconsistent.

I will document the larger rationale here and seek wider input...

How to get a single field value without iterating all of them

From the documentation, the shown way to get a field value is to use something like the following on a returned document.

        err = match.VisitStoredFields(func(field string, value []byte) bool {
            if field == "_id" {
                fmt.Printf("match: %s\n", string(value))
            }
            return true
        })

However, in some cases I only want to pull a single value from the document. I tried a combination of doc.LoadDocumentValues and doc.DocValues, but it doesn't seem that I can retrieve a single value from a given document at present. This is currently using the in-memory index.
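
One workaround (a sketch, assuming the visitor's boolean return value controls whether iteration continues, as suggested by the return true in the example above): stop visiting as soon as the field of interest has been seen.

        // Capture a single stored field and stop early; returning false is
        // assumed to end the visit, while returning true continues it.
        var name []byte
        err = match.VisitStoredFields(func(field string, value []byte) bool {
            if field == "name" { // illustrative field name
                name = append([]byte(nil), value...) // copy: value may be reused
                return false
            }
            return true
        })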

reimplement index builder

The index builder functionality was added to bleve after the bluge fork.

It makes sense to rethink this issue. It now seems like the IndexBuilder maps more cleanly onto an IndexWriter that has simply been configured to behave slightly differently, possibly with some new config options or methods.

Race condition in InMemory usage

I encountered this multiple times while trying to load 2k documents into the in-memory index.

stack trace


fatal error: concurrent map writes

goroutine 13 [running]:
runtime.throw(0x12467fc, 0x15)
	/usr/local/Cellar/go/1.15.6/libexec/src/runtime/panic.go:1116 +0x72 fp=0xc0009cfa70 sp=0xc0009cfa40 pc=0x1034112
runtime.mapassign_fast64(0x1202ec0, 0xc000074a20, 0x1430, 0xc0002f2900)
	/usr/local/Cellar/go/1.15.6/libexec/src/runtime/map_fast64.go:101 +0x33e fp=0xc0009cfab0 sp=0xc0009cfa70 pc=0x10124de
github.com/blugelabs/bluge/index.(*InMemoryDirectory).Persist(0xc00000e0c8, 0x1243273, 0x4, 0x1430, 0x29db9958, 0xc000f0c0a0, 0xc0002f2900, 0x1282400, 0xc000f0c0a0)
	/Users/shekher/go/src/github.com/blugelabs/bluge/index/directory_mem.go:70 +0xd0 fp=0xc0009cfb00 sp=0xc0009cfab0 pc=0x1153f10
github.com/blugelabs/bluge/index.(*Writer).merge(0xc000053200, 0xc000e780a0, 0xa, 0xa, 0xc000f0c050, 0xa, 0xa, 0x1430, 0xa, 0x1418e01, ...)
	/Users/shekher/go/src/github.com/blugelabs/bluge/index/merge.go:368 +0x135 fp=0xc0009cfb78 sp=0xc0009cfb00 pc=0x1158af5
github.com/blugelabs/bluge/index.(*Writer).executeMergeTask(0xc000053200, 0xc0002f2a20, 0xc000e7c220, 0xc000f0c000, 0xc000e7c0c0)
	/Users/shekher/go/src/github.com/blugelabs/bluge/index/merge.go:144 +0x58e fp=0xc0009cfd80 sp=0xc0009cfb78 pc=0x11576ee
github.com/blugelabs/bluge/index.(*Writer).planMergeAtSnapshot(0xc000053200, 0xc0002f2a20, 0xc000526300, 0xa, 0x4c4b40, 0x4024000000000000, 0xa, 0x7d0, 0x4000000000000000, 0x0, ...)
	/Users/shekher/go/src/github.com/blugelabs/bluge/index/merge.go:118 +0x32e fp=0xc0009cfe40 sp=0xc0009cfd80 pc=0x11570ae
github.com/blugelabs/bluge/index.(*Writer).mergerLoop(0xc000053200, 0xc0002f2a20, 0xc000066420)
	/Users/shekher/go/src/github.com/blugelabs/bluge/index/merge.go:56 +0x293 fp=0xc0009cffc8 sp=0xc0009cfe40 pc=0x1156a93
runtime.goexit()
	/usr/local/Cellar/go/1.15.6/libexec/src/runtime/asm_amd64.s:1374 +0x1 fp=0xc0009cffd0 sp=0xc0009cffc8 pc=0x1067361
created by github.com/blugelabs/bluge/index.OpenWriter
	/Users/shekher/go/src/github.com/blugelabs/bluge/index/writer.go:131 +0x8cd

goroutine 1 [runnable]:
github.com/blugelabs/bluge/index.(*Writer).prepareSegment(0xc000053200, 0xc00064cbd0, 0xc000f08020, 0x1, 0x1, 0x0, 0x0, 0x0, 0x0)
	/Users/shekher/go/src/github.com/blugelabs/bluge/index/writer.go:324 +0x345
github.com/blugelabs/bluge/index.(*Writer).Batch(0xc000053200, 0xc00042c000, 0x0, 0x0)
	/Users/shekher/go/src/github.com/blugelabs/bluge/index/writer.go:272 +0x2a6
github.com/blugelabs/bluge.(*Writer).Batch(...)
	/Users/shekher/go/src/github.com/blugelabs/bluge/writer.go:63
main.(*Index).IndexDocuments(0xc0001bc3c0, 0xc00074dd50, 0x1, 0x1, 0xc0003fc000, 0x1280e40)
	/Users/shekher/workplace/shekher/src/Shekher/proj/index.go:31 +0x6b
main.createInMemIndex(0xc00010e200, 0xc0000747b0, 0xc0001dfea8)
	/Users/shekher/workplace/shekher/src/Shekher/proj/main.go:63 +0x426
main.main()
	/Users/shekher/workplace/shekher/src/Shekher/proj/main.go:15 +0x47

goroutine 7 [select]:
github.com/blugelabs/bluge/index.analysisWorker(0xc0002f28a0, 0xc0002f2900)
	/Users/shekher/go/src/github.com/blugelabs/bluge/index/writer.go:549 +0xcf
github.com/blugelabs/bluge/index.OpenWriter.func1()
	/Users/shekher/go/src/github.com/blugelabs/bluge/index/writer.go:69 +0x45
created by github.com/blugelabs/bluge/index.defaultConfig.func2
	/Users/shekher/go/src/github.com/blugelabs/bluge/index/config.go:205 +0x33

goroutine 8 [select]:
github.com/blugelabs/bluge/index.analysisWorker(0xc0002f28a0, 0xc0002f2900)
	/Users/shekher/go/src/github.com/blugelabs/bluge/index/writer.go:549 +0xcf
github.com/blugelabs/bluge/index.OpenWriter.func1()
	/Users/shekher/go/src/github.com/blugelabs/bluge/index/writer.go:69 +0x45
created by github.com/blugelabs/bluge/index.defaultConfig.func2
	/Users/shekher/go/src/github.com/blugelabs/bluge/index/config.go:205 +0x33

goroutine 9 [select]:
github.com/blugelabs/bluge/index.analysisWorker(0xc0002f28a0, 0xc0002f2900)
	/Users/shekher/go/src/github.com/blugelabs/bluge/index/writer.go:549 +0xcf
github.com/blugelabs/bluge/index.OpenWriter.func1()
	/Users/shekher/go/src/github.com/blugelabs/bluge/index/writer.go:69 +0x45
created by github.com/blugelabs/bluge/index.defaultConfig.func2
	/Users/shekher/go/src/github.com/blugelabs/bluge/index/config.go:205 +0x33

goroutine 10 [select]:
github.com/blugelabs/bluge/index.analysisWorker(0xc0002f28a0, 0xc0002f2900)
	/Users/shekher/go/src/github.com/blugelabs/bluge/index/writer.go:549 +0xcf
github.com/blugelabs/bluge/index.OpenWriter.func1()
	/Users/shekher/go/src/github.com/blugelabs/bluge/index/writer.go:69 +0x45
created by github.com/blugelabs/bluge/index.defaultConfig.func2
	/Users/shekher/go/src/github.com/blugelabs/bluge/index/config.go:205 +0x33

goroutine 11 [runnable]:
github.com/blugelabs/bluge/index.(*Writer).introducePersist(0xc000053200, 0xc000f08130, 0x2820)
	/Users/shekher/go/src/github.com/blugelabs/bluge/index/introducer.go:181 +0x69a
github.com/blugelabs/bluge/index.(*Writer).introducerLoop(0xc000053200, 0xc0002f2960, 0xc0002f29c0, 0xc0002f2a20, 0xc0000663c0, 0x2820)
	/Users/shekher/go/src/github.com/blugelabs/bluge/index/introducer.go:74 +0x2ef
created by github.com/blugelabs/bluge/index.OpenWriter
	/Users/shekher/go/src/github.com/blugelabs/bluge/index/writer.go:127 +0x7fd

goroutine 12 [select]:
github.com/blugelabs/bluge/index.(*Writer).prepareIntroducePersist(0xc000053200, 0xc0002f29c0, 0xc0003b66f0, 0x1, 0x1, 0x0, 0x0)
	/Users/shekher/go/src/github.com/blugelabs/bluge/index/persister.go:372 +0x3e5
github.com/blugelabs/bluge/index.(*Writer).persistSnapshotDirect(0xc000053200, 0xc0002f29c0, 0xc0003fc200, 0xc0003fc200, 0x5fe51000)
	/Users/shekher/go/src/github.com/blugelabs/bluge/index/persister.go:325 +0x3ba
github.com/blugelabs/bluge/index.(*Writer).persistSnapshot(0xc000053200, 0xc0002f2a20, 0xc0002f29c0, 0xc0003fc200, 0x0, 0x0)
	/Users/shekher/go/src/github.com/blugelabs/bluge/index/persister.go:233 +0x65
github.com/blugelabs/bluge/index.(*Writer).persisterLoop(0xc000053200, 0xc0002f2a20, 0xc0002f29c0, 0xc0000663c0, 0xc000066420, 0x281e)
	/Users/shekher/go/src/github.com/blugelabs/bluge/index/persister.go:81 +0x4ed
created by github.com/blugelabs/bluge/index.OpenWriter
	/Users/shekher/go/src/github.com/blugelabs/bluge/index/writer.go:129 +0x877

remove unnecessary query types

Several query types duplicate functionality. Simplify things by removing these duplicates (see the sketch after this list for how Boolean already covers the first two):

  • Disjunction (functionality provided by Boolean)
  • Conjunction (functionality provided by Boolean)
  • Phrase (same as match phrase with no analyzer)
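
For example, a disjunction of term queries can already be expressed as a Boolean query with should clauses (a sketch using the builder-style query API; the exact method names are assumed from the existing Boolean query type):

q := bluge.NewBooleanQuery()
q.AddShould(bluge.NewTermQuery("bluge").SetField("name"))
q.AddShould(bluge.NewTermQuery("bleve").SetField("name"))
// A conjunction is the same shape, using AddMust instead of AddShould.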

Aggregations only work with TopNSearch?

If I use bluge.NewAllMatches and add aggregations, no data is ever returned in the buckets.

This reproduces the failure in the test suite:

diff --git a/test/aggregations_test.go b/test/aggregations_test.go
index bddc664..41a3cdd 100644
--- a/test/aggregations_test.go
+++ b/test/aggregations_test.go
@@ -223,7 +223,7 @@ func aggregationsTests() []*RequestVerify {
        return []*RequestVerify{
                {
                        Comment: "category inventory, by type",
-                       Request: bluge.NewTopNSearch(0,
+                       Request: bluge.NewAllMatches(
                                bluge.NewTermQuery("inventory").
                                        SetField("category")),
                        Aggregations: search.Aggregations{

Is this expected behavior?

OpenWriter is failing after removing all entries

When I delete all entries and try to open the writer, it fails with the following errors:
 	error loading snapshot epoch: 18: error peeking snapshot format version 18: EOF from writer.go
 	error opening index: existing snapshots found, but none could be loaded, exiting

This error does not happen if I leave at least one entry.

config := bluge.DefaultConfig("index-data")
writer, err := bluge.OpenWriter(config)
if err != nil {
	log.Println(err)
	return
}

doc := bluge.NewDocument("a1")
doc.AddField(bluge.NewTextField("TestKey1", "TestKey Data1"))
if err = writer.Update(doc.ID(), doc); err != nil {
	log.Println(err)
	return
}

doc = bluge.NewDocument("a2")
doc.AddField(bluge.NewTextField("TestKey2", "TestKey Data2"))
if err = writer.Update(doc.ID(), doc); err != nil {
	log.Println(err)
	return
}

writer.Close()

writer, err = bluge.OpenWriter(config)
if err != nil {
	log.Println(err)
	return
}

if err = writer.Delete(bluge.Identifier("a1")); err != nil {
	log.Println(err)
	return
}

if err = writer.Delete(bluge.Identifier("a2")); err != nil {  // If I don't remove this second entry, everything works fine
	log.Println(err)
	return
}

writer.Close()

writer, err = bluge.OpenWriter(config)
if err != nil {
	log.Println(err) 
	return
}
writer.Close()

Investigate how we can support indices being stored in non-local filesystems

For large indices, it may be beneficial to store the index in an object store like S3 and have multiple workers process the search queries. Currently we could implement a new Directory interface that fetches the files from S3 when Load is called, but it would save a ton of bandwidth if we could retrieve the necessary parts from these index files at query time. This would currently involve implementing a new segment.Data, with the appropriate ReaderAt methods that fetch the data from the cloud.

It would be useful to see whether this pattern can be applied, and to evaluate whether any of the code/interfaces should be changed (for example, Data.NewFromFile could really benefit from a Data.NewFromReaderAt in this case).
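
As a rough illustration of the ReaderAt-style access this would need, here is a hypothetical io.ReaderAt that fetches byte ranges over HTTP from an object store (illustrative only; the real integration point would be a segment.Data built over it, as described above):

package remote

import (
	"fmt"
	"io"
	"net/http"
)

// rangeReaderAt is a hypothetical io.ReaderAt that issues HTTP Range requests
// against an object-store URL, fetching only the bytes a segment read needs.
type rangeReaderAt struct {
	url    string
	client *http.Client
}

func (r *rangeReaderAt) ReadAt(p []byte, off int64) (int, error) {
	req, err := http.NewRequest(http.MethodGet, r.url, nil)
	if err != nil {
		return 0, err
	}
	req.Header.Set("Range", fmt.Sprintf("bytes=%d-%d", off, off+int64(len(p))-1))
	resp, err := r.client.Do(req)
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	n, err := io.ReadFull(resp.Body, p)
	if err == io.ErrUnexpectedEOF {
		// Short read at the end of the object; ReaderAt requires a non-nil error here.
		return n, io.EOF
	}
	return n, err
}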
