mosuka / bayard
A full-text search and indexing server written in Rust.
License: MIT License
On Ubuntu 20.04, I tried to run bayard. After executing sudo apt install rustc, I ran cargo install bayard following https://bayard-search.github.io/bayard/install.html, but it failed due to missing packages.
In the end, running sudo apt install librust-openssl-dev cmake golang completed the setup.
Initialize TokenizerManager from JSON.
Implement storage to store Raft logs using RocksDB.
I was able to get the server running, thanks to the clear instructions.
Once the server is running, how do I query it from a web page?
I am new to gRPC, so I am looking for suggestions on how to query it and consume the results.
Thank you - Bayard team
Hi,
I’ve not used gRPC before, but I’d like to give it a shot. As far as docs go, where should I look for info on setting up gRPC calls to the Bayard API?
Cheers
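Since Bayard exposes an HTTP gateway in front of its gRPC index server, one low-friction way to consume results from a web page is to go through that gateway rather than raw gRPC. A minimal Python sketch, assuming a gateway on localhost:8000 and the /index/search endpoint shape used in the curl examples later on this page (paths and parameter names vary between Bayard versions, so adjust to the one you run):

```python
import json
import urllib.parse
import urllib.request

def build_search_url(base_url: str, query: str, from_: int = 0, limit: int = 10) -> str:
    # Assemble the gateway search URL; parameter names follow the curl
    # examples elsewhere on this page and may differ between versions.
    params = urllib.parse.urlencode({"query": query, "from": from_, "limit": limit})
    return f"{base_url}/index/search?{params}"

def search(base_url: str, query: str, **kwargs) -> dict:
    # Issue the request and decode the JSON response body.
    with urllib.request.urlopen(build_search_url(base_url, query, **kwargs)) as resp:
        return json.loads(resp.read())
```

From a browser you would make the equivalent fetch() call; the gateway returns plain JSON, so no gRPC tooling is needed on the client side.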
See build log:
$ cargo install bayard --version 0.8.5
Updating crates.io index
Installing bayard v0.8.5
Compiling libc v0.2.81
Compiling autocfg v1.0.1
Compiling cfg-if v0.1.10
Compiling proc-macro2 v1.0.24
Compiling unicode-xid v0.2.1
Compiling syn v1.0.54
Compiling lazy_static v1.4.0
Compiling log v0.4.11
Compiling memchr v2.3.4
Compiling cfg-if v1.0.0
Compiling futures-core v0.3.8
Compiling bitflags v1.2.1
Compiling slab v0.4.2
Compiling cc v1.0.66
Compiling futures-io v0.3.8
Compiling pkg-config v0.3.19
Compiling once_cell v1.5.2
Compiling serde_derive v1.0.118
Compiling pin-project-lite v0.1.11
Compiling byteorder v1.3.4
Compiling fnv v1.0.7
Compiling serde v1.0.118
Compiling proc-macro-hack v0.5.19
Compiling getrandom v0.1.15
Compiling futures-sink v0.3.8
Compiling proc-macro-nested v0.1.6
Compiling itoa v0.4.6
Compiling bytes v0.5.6
Compiling pin-utils v0.1.0
Compiling encoding_index_tests v0.1.4
Compiling ppv-lite86 v0.2.10
Compiling pin-project-internal v0.4.27
Compiling ryu v1.0.5
Compiling version_check v0.9.2
Compiling pin-project-lite v0.2.0
Compiling hashbrown v0.9.1
Compiling maybe-uninit v2.0.0
Compiling adler v0.2.3
Compiling matches v0.1.8
Compiling foreign-types-shared v0.1.1
Compiling openssl v0.10.31
Compiling tinyvec_macros v0.1.0
Compiling scopeguard v1.1.0
Compiling crc32fast v1.2.1
Compiling native-tls v0.2.6
Compiling httparse v1.3.4
Compiling percent-encoding v2.1.0
Compiling openssl-probe v0.1.2
Compiling try-lock v0.2.3
Compiling const_fn v0.4.4
Compiling httpdate v0.3.2
Compiling yada v0.4.0
Compiling encoding_rs v0.8.26
error[E0658]: `while` is not allowed in a `const`
--> /home/user/.cargo/registry/src/github.com-1ecc6299db9ec823/yada-0.4.0/src/builder.rs:250:5
|
250 | / while i < next_unused.len() - 1 {
251 | | next_unused[i] = (i + 1) as u8;
252 | | i += 1;
253 | | }
| |_____^
|
= note: see issue #52000 <https://github.com/rust-lang/rust/issues/52000> for more information
error[E0658]: `while` is not allowed in a `const`
--> /home/user/.cargo/registry/src/github.com-1ecc6299db9ec823/yada-0.4.0/src/builder.rs:259:5
|
259 | / while i < prev_unused.len() {
260 | | prev_unused[i] = (i - 1) as u8;
261 | | i += 1;
262 | | }
| |_____^
|
= note: see issue #52000 <https://github.com/rust-lang/rust/issues/52000> for more information
error: aborting due to 2 previous errors
For more information about this error, try `rustc --explain E0658`.
error: could not compile `yada`.
To learn more, run the command again with --verbose.
warning: build failed, waiting for other jobs to finish...
error: failed to compile `bayard v0.8.5`, intermediate artifacts can be found at `/tmp/cargo-installmpvaBs`
Caused by:
build failed
Updating the toolchain helped:
$ rustc --version
rustc 1.48.0 (7eac88abb 2020-11-16)
It does not return a correct search result when the word contains the "Turkish i" character.
Index documents
./bin/bayard set 1 '{ "text" : "quıt" }'
./bin/bayard set 2 '{ "text" : "quit" }'
./bin/bayard set 3 '{ "text" : "QUIT" }'
./bin/bayard set 4 '{ "text" : "QUİT" }'
Search for "quit":
./bin/bayard search text:"quit"
Actual Results
[
{"id":["3"],"text":["QUIT"]},
{"id":["2"],"text":["quit"]}
]
Expected Results
[
{"id":["4"],"text":["QUİT"]},
{"id":["3"],"text":["QUIT"]},
{"id":["2"],"text":["quit"]},
{"id":["1"],"text":["quıt"]}
]
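The mismatch above is consistent with how Unicode case mapping treats the Turkish dotted/dotless i: a plain lowercasing tokenizer turns İ into "i" plus U+0307 (COMBINING DOT ABOVE) and leaves dotless ı as a distinct base letter, so neither variant matches the term "quit". A Python sketch of the Unicode behavior (illustrative only, not Bayard's actual tokenizer) showing that a mark-stripping filter recovers the İ case, while ı still needs an explicit ı→i mapping such as Lucene-style ASCII folding:

```python
import unicodedata

def fold(term: str) -> str:
    # Lowercase, then strip combining marks: İ lowercases to "i" + U+0307,
    # so removing marks (category Mn) collapses it to plain "i".
    decomposed = unicodedata.normalize("NFD", term.lower())
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")

# Plain lowercasing does not unify the variants with "quit":
assert "QUİT".lower() != "quit"   # leftover combining dot above
assert "quıt".lower() == "quıt"   # dotless ı is a distinct base letter

# Mark-stripping recovers the İ case ...
assert fold("QUİT") == "quit"
# ... but dotless ı still needs an explicit ı -> i mapping:
assert fold("quıt") != "quit"
```

This is why the expected results above require a tokenizer filter beyond simple lowercasing.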
I was wondering whether sharding is on the current roadmap?
Support CORS on REST server #90
Bayard looks great! I'm interested in transactional full-text search.
What can happen in a concurrent scenario like the following?
| client1 | client2 |
| --- | --- |
| A) bulk-set [{id:1},{id:2}] | X) bulk-set [{id:5},{id:6}] |
| B) bulk-set [{id:3},{id:4}] | Y) bulk-set [{id:3},{id:4}] |
| C) commit | Z) rollback |
Among other things:
- What happens to documents 5 and 6 if C "happens before" Z but "happens after" X?
- Does Y overrule the documents in B if C "happens before" Z but "happens after" Y?
I feel like there is room in the "market" for log indexing (that can be used by Kibana) that isn't as heavy as Elasticsearch. Do you have any interest in making the bayard API respond to Kibana's Elasticsearch-format API requests and serve Elasticsearch-format API responses?
One of Solr's killer features is dynamic fields (like *_t). I also saw that your project blast is essentially able to do something similar (by guessing the type of a field). Any plans to do something similar in bayard?
I wish I could have a changefeed from CockroachDB populate a search index in an idempotent way, using the updated timestamp for automatic merge resolution of documents with the same id.
Would you be willing to add a feature like external document versioning (as in Elasticsearch)?
Edit 1: So if a document with id 1 exists in the index at version 5, any set/bulk-set for document id 1 at version 5 or less is ignored, but version 6 and above replaces the indexed value.
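The semantics proposed in Edit 1 can be sketched as a version-gated upsert. A toy in-memory model in Python (not Bayard code; names are made up for illustration), loosely mirroring Elasticsearch's external versioning:

```python
def versioned_set(index: dict, doc_id: str, version: int, body: dict) -> bool:
    """Apply a set only if `version` is newer than the stored one.

    Returns True if the document was (re)indexed, False if the write
    was ignored as stale.
    """
    current = index.get(doc_id)
    if current is not None and version <= current["version"]:
        return False  # stale or duplicate write: ignore
    index[doc_id] = {"version": version, "body": body}
    return True

index = {}
assert versioned_set(index, "1", 5, {"text": "first"})       # indexed
assert not versioned_set(index, "1", 5, {"text": "replay"})  # same version: ignored
assert not versioned_set(index, "1", 4, {"text": "older"})   # older: ignored
assert versioned_set(index, "1", 6, {"text": "newer"})       # newer: replaces
```

Because stale and duplicate writes are no-ops, replaying a changefeed becomes idempotent, which is exactly the property needed for the CockroachDB use case above.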
I wonder what the storage layer is? I think it's possible to make this run in web browsers as a service compiled to WASM, depending on the storage being used. It would not need Raft, just IndexedDB-like storage - probably a Rust implementation that works on the client and the web. The WASM service worker could then be called via the protos of the current API.
This would be pretty useful for our project, GetCourageNow, on GitHub.
Make the text analyzers configurable.
Hi,
In a cluster scenario, do the HTTP gateway and gRPC run on all nodes or only on the master node?
It would be cool to have a Slack channel, or I can combine all my questions into one thread.
Bayard looks good.
Thanks
Support TLS on REST server #91
Hi. I've been trying out bayard and it's great so far. One thing I've noticed is that faceted search doesn't seem to be working. I'm using this docker image; it is tagged as v0.4.0 and was pushed 2 months ago. Looking at CHANGES.md, faceted search was implemented in v0.3.0, so I would assume it's already available in the docker container.
Perhaps my understanding of faceted search is wrong. Here's a minimal example:
data.jsonl (inspired from tantivy examples):
{"_id": "1", "name": "Cat", "category": ["/Felidae/Felinae/Felis"]}
{"_id": "2", "name": "Canada lynx", "category": ["/Felidae/Felinae/Lynx"]}
{"_id": "3", "name": "Cheetah", "category": ["/Felidae/Felinae/Acinonyx"]}
{"_id": "4", "name": "Tiger", "category": ["/Felidae/Pantherinae/Panthera"]}
{"_id": "5", "name": "Lion", "category": ["/Felidae/Pantherinae/Panthera"]}
{"_id": "6", "name": "Jaguar", "category": ["/Felidae/Pantherinae/Panthera"]}
{"_id": "7", "name": "Sunda clouded leopard", "category": ["/Felidae/Pantherinae/Neofelis"]}
{"_id": "8", "name": "Fossa", "category": ["/Eupleridae/Cryptoprocta"]}
schema.json:
[
{
"name": "_id",
"type": "text",
"options": {
"indexing": {
"record": "basic",
"tokenizer": "raw"
},
"stored": true
}
},
{
"name": "name",
"type": "text",
"options": {
"indexing": {
"record": "position",
"tokenizer": "en_stem"
},
"stored": false
}
},
{
"name": "category",
"type": "hierarchical_facet"
}
]
Then, through the web api, I request the following:
curl -X GET 'http://localhost:8000/index/search?query=cat&from=0&limit=10&facet_field=category&facet_prefix=/Felidae/Felinae'
which results in
{
"count": 1,
"docs": [
{
"fields": {
"_id": [
"1"
],
"category": [
"/Felidae/Felinae/Felis"
]
},
"score": 2.016771
}
],
"facet": {
"category": {
"/Felidae/Felinae/Felis": 1
}
}
}
This is what I expect because I'm searching in the correct category. However, searching in a different category will yield the same document:
curl -X GET 'http://localhost:8000/index/search?query=cat&from=0&limit=10&facet_field=category&facet_prefix=/Eupleridae'
{
"count": 1,
"docs": [
{
"fields": {
"_id": [
"1"
],
"category": [
"/Felidae/Felinae/Felis"
]
},
"score": 2.016771
}
],
"facet": {
"category": {}
}
}
I would expect 0 documents to be returned, since no element has the name "cat" in the category "/Eupleridae".
I also noticed that "facet" is filled differently but I'm not sure how to interpret that.
This is just a minimal example. I've had more data and queried for terms that exist in a category, but still other elements were returned. Am I misunderstanding faceted search, using bayard wrong, using an unreleased feature, or is this indeed a bug?
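For comparison, here is a toy Python model (not Bayard or tantivy internals) of the semantics the report expects: a facet_prefix should act as a filter on the hits, so a document whose facet paths all fall outside the prefix is dropped rather than returned:

```python
def filter_by_facet(docs, facet_field, facet_prefix):
    # Keep only hits with at least one facet path under the prefix.
    return [
        doc for doc in docs
        if any(path.startswith(facet_prefix) for path in doc.get(facet_field, []))
    ]

hits = [{"_id": "1", "name": "Cat", "category": ["/Felidae/Felinae/Felis"]}]

# A matching prefix keeps the hit:
assert filter_by_facet(hits, "category", "/Felidae/Felinae") == hits
# A non-matching prefix should yield zero documents (the expected behavior):
assert filter_by_facet(hits, "category", "/Eupleridae") == []
```

The responses above suggest the server counts facets over the hits but does not filter by them, which would explain document 1 appearing under the /Eupleridae query with an empty facet map.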
Hi, we're currently using Elasticsearch for https://deces.machid.io (using its powerful Levenshtein fuzzy search).
We use bulk search for a record-linkage use case (useful for clinical studies).
We would be enthusiastic to study a port from Elasticsearch to bayard, but bayard is missing this key feature.
Do you plan to have such a bulk search API?
If so, well done!
Executing GET /v1/documents/{id} against the bayard-rest Docker image returns a 500 error (failed to get client for node: id=0).
POST /v1/documents/1 and GET /v1/commit complete normally.
Since I can retrieve the data by running the bayard-cli command equivalent to bayard-rest's /v1/documents/{id}, I suspect this is a bug in bayard-rest.
Sorry to trouble you, but could you please take a look? 🙏
$ git clone https://github.com/bayard-search/bayard.git; cd bayard
$ # Modify docker-compose.yml as shown below
$ docker-compose up -d
$ curl -X PUT \
--header 'Content-Type: application/json' \
--data-binary @./examples/doc_ja.json \
'http://localhost:8000/v1/documents/1' # HTTP Status 200 OK
$ curl -X GET 'http://localhost:8000/v1/commit' # HTTP Status 200 OK
$ curl -X GET 'http://localhost:8000/v1/documents/1' # HTTP Status 500 NG
failed to get client for node: id=0
$ docker run --net=bayard_default --rm --name bayard-cli bayardsearch/bayard-cli:latest get 1 --server=bayard:5000 # OK
{"_id":["1"],"description":["検索エンジン(けんさくエンジン、英: search engine)は、狭義にはインターネットに存在する情報(ウェブページ、ウェブサイト、画像ファイル、ネットニュースなど)を検索する機能およびそのプログラム。インターネットの普及初期には、検索としての機能のみを提供していたウェブサイトそのものを検索エンジンと呼んだが、現在では様々なサービスが加わったポータルサイト化が進んだため、検索をサービスの一つとして提供するウェブサイトを単に検索サイトと呼ぶことはなくなっている。広義には、インターネットに限定せず情報を検索するシステム全般を含む。"],"name":["検索エンジン"],"url":["https://ja.wikipedia.org/wiki/%E6%A4%9C%E7%B4%A2%E3%82%A8%E3%83%B3%E3%82%B8%E3%83%B3"]}
I wanted to try Japanese full-text search, so I modified docker-compose.yml as follows.
version: '3'
services:
bayard:
container_name: bayard
image: bayardsearch/bayard:latest
entrypoint: bayard
volumes:
- ./examples/schema_ja.json:/etc/bayard/schema_ja.json
command:
- '--host=bayard'
- '--raft-port=7000'
- '--index-port=5000'
- '--metrics-port=9000'
- '--data-directory=/tmp/bayard'
- '--tokenizer-file=/etc/bayard/tokenizer.json'
- '--schema-file=/etc/bayard/schema_ja.json'
- '1'
ports:
- "15000:5000"
- "17000:7000"
- "19000:9000"
bayard-rest:
container_name: bayard-rest
image: bayardsearch/bayard-rest:latest
entrypoint: bayard-rest
ports:
- "18000:8000"
command:
- '--host=0.0.0.0'
- '--port=8000'
- '--index-address=bayard:5000'
depends_on:
- bayard
Expected result:
Can't find schema.json or it's not readable. Default location (`./etc/schema.json`) could be overridden with the `-s, --schema-file` command-line parameter
Actual result:
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Os { code: 2, kind: NotFound, message: "No such file or directory" }', src/libcore/result.rs:1084:5
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace.
It seems the documents are all in a single index. Do you plan to support multiple indexes, so that users can add documents to and search from separate indexes?
Thanks for building this great piece of software!
Hi there, currently I'm trying to use Bayard as the index server for my side project.
But whenever Bayard encounters an illegal query (e.g. A - B), it stops and never replies to any further queries, so I have to restart the server to make things work again.
(I followed the instructions in the documentation to build a 3-node Bayard cluster.)
I traced the source code and found that it may be caused by these lines in bayard-server/src/index/server.rs.
I've tried modifying these two unwrap() calls to other Rust error-handling methods (e.g. match ... { Ok(_) => ..., Err(_) => ... }), but it seems the sender still gets stuck in the middle. (I checked that the requests were normally sent to the server by changing debug! to println!.)
Hope to have a solution here, thanks. Otherwise I'll have to create a filter to prevent illegal queries.
Bayard perfectly fits my needs except for supporting multiple indexes 😭
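The failure mode described above (one bad query wedges the whole serving loop) is what an unwrap() panic inside a request handler tends to produce. As a language-neutral illustration (a Python toy, not the bayard-server code; parse_query is a stand-in), the usual fix is to turn the parse failure into an error response so later requests keep being served:

```python
class QueryParseError(ValueError):
    pass

def parse_query(raw: str) -> str:
    # Stand-in parser: reject queries containing a bare " - " operator,
    # loosely mimicking illegal queries like "A - B" that trip the server.
    if " - " in raw or raw.strip().endswith("-"):
        raise QueryParseError(f"illegal query: {raw!r}")
    return raw.strip()

def serve(requests, search):
    responses = []
    for raw in requests:
        try:
            query = parse_query(raw)
        except QueryParseError as err:
            # Answer with an error instead of panicking:
            # subsequent requests keep being served.
            responses.append({"error": str(err)})
            continue
        responses.append({"hits": search(query)})
    return responses
```

In the Rust server, the equivalent change is to match on the Result and send an error reply on Err instead of calling unwrap(); if the channel sender still stalls after that, the hang is likely elsewhere in the request pipeline.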
It would be great if bayard supported percolate queries.