bayard's People

Contributors

barrotsteindev, dependabot[bot], eko, fulmicoton, gitter-badger, hhatto, iyesin, kenoss, klausondrag, messense, mosuka, msmakhlouf, robatipoor

bayard's Issues

client endpoint access

I was able to get the server running, thanks to the clear instructions.
Once the server is running, how do I query it from a web page?
I am new to gRPC, so I am looking for suggestions on how to send queries and consume the results.
Thank you, Bayard team.

gRPC doc

Hi,

I’ve not used gRPC before but I’d like to give it a shot. As far as docs go, where do I look for info on setting up gRPC calls to the Bayard API?

Cheers

Documentation mentions Rust >= 1.39.0 but the build fails with 1.45.2

See build log:

$ cargo install bayard --version 0.8.5
    Updating crates.io index
  Installing bayard v0.8.5
   Compiling libc v0.2.81
   Compiling autocfg v1.0.1
   Compiling cfg-if v0.1.10
   Compiling proc-macro2 v1.0.24
   Compiling unicode-xid v0.2.1
   Compiling syn v1.0.54
   Compiling lazy_static v1.4.0
   Compiling log v0.4.11
   Compiling memchr v2.3.4
   Compiling cfg-if v1.0.0
   Compiling futures-core v0.3.8
   Compiling bitflags v1.2.1
   Compiling slab v0.4.2
   Compiling cc v1.0.66
   Compiling futures-io v0.3.8
   Compiling pkg-config v0.3.19
   Compiling once_cell v1.5.2
   Compiling serde_derive v1.0.118
   Compiling pin-project-lite v0.1.11
   Compiling byteorder v1.3.4
   Compiling fnv v1.0.7
   Compiling serde v1.0.118
   Compiling proc-macro-hack v0.5.19
   Compiling getrandom v0.1.15
   Compiling futures-sink v0.3.8
   Compiling proc-macro-nested v0.1.6
   Compiling itoa v0.4.6
   Compiling bytes v0.5.6
   Compiling pin-utils v0.1.0
   Compiling encoding_index_tests v0.1.4
   Compiling ppv-lite86 v0.2.10
   Compiling pin-project-internal v0.4.27
   Compiling ryu v1.0.5
   Compiling version_check v0.9.2
   Compiling pin-project-lite v0.2.0
   Compiling hashbrown v0.9.1
   Compiling maybe-uninit v2.0.0
   Compiling adler v0.2.3
   Compiling matches v0.1.8
   Compiling foreign-types-shared v0.1.1
   Compiling openssl v0.10.31
   Compiling tinyvec_macros v0.1.0
   Compiling scopeguard v1.1.0
   Compiling crc32fast v1.2.1
   Compiling native-tls v0.2.6
   Compiling httparse v1.3.4
   Compiling percent-encoding v2.1.0
   Compiling openssl-probe v0.1.2
   Compiling try-lock v0.2.3
   Compiling const_fn v0.4.4
   Compiling httpdate v0.3.2
   Compiling yada v0.4.0
   Compiling encoding_rs v0.8.26
error[E0658]: `while` is not allowed in a `const`
   --> /home/user/.cargo/registry/src/github.com-1ecc6299db9ec823/yada-0.4.0/src/builder.rs:250:5
    |
250 | /     while i < next_unused.len() - 1 {
251 | |         next_unused[i] = (i + 1) as u8;
252 | |         i += 1;
253 | |     }
    | |_____^
    |
    = note: see issue #52000 <https://github.com/rust-lang/rust/issues/52000> for more information

error[E0658]: `while` is not allowed in a `const`
   --> /home/user/.cargo/registry/src/github.com-1ecc6299db9ec823/yada-0.4.0/src/builder.rs:259:5
    |
259 | /     while i < prev_unused.len() {
260 | |         prev_unused[i] = (i - 1) as u8;
261 | |         i += 1;
262 | |     }
    | |_____^
    |
    = note: see issue #52000 <https://github.com/rust-lang/rust/issues/52000> for more information

error: aborting due to 2 previous errors

For more information about this error, try `rustc --explain E0658`.
error: could not compile `yada`.

To learn more, run the command again with --verbose.
warning: build failed, waiting for other jobs to finish...
error: failed to compile `bayard v0.8.5`, intermediate artifacts can be found at `/tmp/cargo-installmpvaBs`

Caused by:
  build failed

Updating to

$ rustc --version
rustc 1.48.0 (7eac88abb 2020-11-16)

helped.
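
For context, the `E0658` errors in the log come from `while` loops inside a `const` context, which stable Rust accepts from 1.46 onward. That is consistent with 1.45.2 failing and 1.48.0 working. Below is a minimal sketch of the same kind of construct; the name `successor_table` and the array size are illustrative assumptions, not code taken from `yada`.

```rust
// Illustrative only: a `while` loop evaluated at compile time.
// Stable compilers older than 1.46 reject this with E0658.
const fn successor_table() -> [u8; 8] {
    let mut table = [0u8; 8];
    let mut i = 0;
    while i < table.len() - 1 {
        table[i] = (i + 1) as u8;
        i += 1;
    }
    table
}

// Evaluated by the compiler, so the loop does not run at program start.
const TABLE: [u8; 8] = successor_table();

fn main() {
    println!("{:?}", TABLE);
}
```

If the documented minimum Rust version is meant to cover transitive dependencies, it would need to be at least the version that accepts this construct.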

Full Text returns wrong results for Turkish

Search does not return the correct results when a word contains a Turkish dotted or dotless "i" character (İ, ı).

Index documents

./bin/bayard set 1 '{ "text" : "quıt" }'
./bin/bayard set 2 '{ "text" : "quit" }'
./bin/bayard set 3 '{ "text" : "QUIT" }'
./bin/bayard set 4 '{ "text" : "QUİT" }'

Search for "quit":

./bin/bayard search text:"quit"

Actual Results

[
  {"id":["3"],"text":["QUIT"]},
  {"id":["2"],"text":["quit"]}
]

Expected Results

[
  {"id":["4"],"text":["QUİT"]},
  {"id":["3"],"text":["QUIT"]},
  {"id":["2"],"text":["quit"]},
  {"id":["1"],"text":["quıt"]}
]
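
One plausible cause, stated here as an assumption rather than a diagnosis of Bayard's tokenizer, is that default Unicode lowercasing does not fold the Turkish letters onto ASCII `i`: `İ` (U+0130) lowercases to `i` followed by a combining dot (U+0307), and `ı` (U+0131) is already lowercase. The sketch below shows that behaviour with Rust's standard library, plus a hypothetical `fold_turkish_i` helper under which all four sample documents would normalise to `quit`.

```rust
/// Hypothetical helper (not part of Bayard): map both Turkish i variants
/// onto ASCII `i`, then apply the default lowercase mapping.
fn fold_turkish_i(input: &str) -> String {
    input
        .chars()
        .map(|c| match c {
            'İ' | 'ı' => 'i', // U+0130 and U+0131
            other => other,
        })
        .collect::<String>()
        .to_lowercase()
}

fn main() {
    // Compare default lowercasing with the folded form for each sample word.
    for word in ["quıt", "quit", "QUIT", "QUİT"].iter() {
        println!(
            "{:?}: default = {:?}, folded = {:?}",
            word,
            word.to_lowercase(),
            fold_turkish_i(word)
        );
    }
}
```

Folding like this trades Turkish-correct casing for recall: it treats `ı` and `i` as the same letter, which is what the expected results above ask for but may not suit every index.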

Sharding

Was wondering whether sharding is on the current roadmap?

Consistency guarantees

Bayard looks great! I'm interested in transactional full-text search.

What can happen in a concurrent scenario like the following?

client1                          client2
A) bulk-set [{id:1},{id:2}]      X) bulk-set [{id:5},{id:6}]
B) bulk-set [{id:3},{id:4}]      Y) bulk-set [{id:3},{id:4}]
C) commit                        Z) rollback

Among other things:

  1. Can client1 commit documents 5 and 6 if C "happens before" Z but "happens after" X?
  2. Do documents in Y overrule documents in B if C "happens before" Z but "happens after" Y?
  3. If client1 and client2 are talking to different bayard nodes, is the behavior different than talking to the same bayard node?
  4. Are client transactions isolated from each other even if they are talking to the same bayard node?

Elasticsearch + Kibana compatibility layer

I feel like there is room in the "market" for log indexing (that can be used by Kibana) that isn't as heavy as Elasticsearch. Do you have any interest in making the bayard API respond to Kibana Elasticsearch format API requests and serve Elasticsearch API format responses?

Support for dynamic fields?

One of Solr's killer features is dynamic fields (like *_t). I also saw that your project blast is essentially able to do something similar (by guessing the type of a field). Any plans to do something similar in bayard?

WASM for client side

I wonder what the storage layer is?

I think it's possible to make this run in web browsers as a service compiled to WASM.

It depends on the storage being used, though.
It would not need Raft, just an IndexedDB-like storage.

Probably a Rust implementation that works on both the client and the web. The WASM service worker could then be called via the protos of the current API.

This would be pretty useful for our project at GetCourageNow on GitHub.

Cluster

Hi,

In a cluster scenario, do the HTTP gateway and gRPC run on all nodes or only on the master node?

It would be cool to have a Slack channel; otherwise I can combine all my questions into one thread.

Bayard looks good.

Thanks

Faceted search not working with docker image v0.4.0

Hi. I've been trying out bayard and it's great so far. One thing I've noticed is that faceted search doesn't seem to be working. I'm using this Docker image. It is tagged v0.4.0 and was pushed 2 months ago. Looking at CHANGES.md, it looks like faceted search was implemented in v0.3.0, so I would assume it's already available in the Docker container.

Perhaps my understanding of faceted search is wrong. Here's a minimal example:
data.jsonl (inspired by the tantivy examples):

{"_id": "1", "name": "Cat", "category": ["/Felidae/Felinae/Felis"]}
{"_id": "2", "name": "Canada lynx", "category": ["/Felidae/Felinae/Lynx"]}
{"_id": "3", "name": "Cheetah", "category": ["/Felidae/Felinae/Acinonyx"]}
{"_id": "4", "name": "Tiger", "category": ["/Felidae/Pantherinae/Panthera"]}
{"_id": "5", "name": "Lion", "category": ["/Felidae/Pantherinae/Panthera"]}
{"_id": "6", "name": "Jaguar", "category": ["/Felidae/Pantherinae/Panthera"]}
{"_id": "7", "name": "Sunda clouded leopard", "category": ["/Felidae/Pantherinae/Neofelis"]}
{"_id": "8", "name": "Fossa", "category": ["/Eupleridae/Cryptoprocta"]}

schema.json:

[
  {
    "name": "_id",
    "type": "text",
    "options": {
      "indexing": {
        "record": "basic",
        "tokenizer": "raw"
      },
      "stored": true
    }
  },
  {
    "name": "name",
    "type": "text",
    "options": {
      "indexing": {
        "record": "position",
        "tokenizer": "en_stem"
      },
      "stored": false
    }
  },
  {
    "name": "category",
    "type": "hierarchical_facet"
  }
]

Then, through the web api, I request the following:

curl -X GET 'http://localhost:8000/index/search?query=cat&from=0&limit=10&facet_field=category&facet_prefix=/Felidae/Felinae'

which results in

{
  "count": 1,
  "docs": [
    {
      "fields": {
        "_id": [
          "1"
        ],
        "category": [
          "/Felidae/Felinae/Felis"
        ]
      },
      "score": 2.016771
    }
  ],
  "facet": {
    "category": {
      "/Felidae/Felinae/Felis": 1
    }
  }
}

This is what I expect because I'm searching in the correct category. However, searching in a different category will yield the same document:

curl -X GET 'http://localhost:8000/index/search?query=cat&from=0&limit=10&facet_field=category&facet_prefix=/Eupleridae'
{
  "count": 1,
  "docs": [
    {
      "fields": {
        "_id": [
          "1"
        ],
        "category": [
          "/Felidae/Felinae/Felis"
        ]
      },
      "score": 2.016771
    }
  ],
  "facet": {
    "category": {}
  }
}

I would expect 0 documents to be returned, since no element has the name "cat" in the category "/Eupleridae".

I also noticed that "facet" is filled differently but I'm not sure how to interpret that.

This is just a minimal example. I've tried it with more data and queried for terms that exist in a category, but elements from other categories were still returned. Am I misunderstanding faceted search, am I using bayard wrong, am I using an unreleased feature, or is this indeed a bug?
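
One possible reading, offered as an assumption rather than a confirmed description of Bayard: `facet_prefix` may only control which facet values are counted among the documents matching `query`, without restricting the documents themselves. That would fit both responses above, where the hit list is identical and only `facet` changes. The sketch below contrasts the two operations on a toy model; `Doc`, `is_under`, `count_facets` and `filter_by_facet` are hypothetical names, not Bayard or tantivy APIs.

```rust
use std::collections::BTreeMap;

/// Toy document: (id, hierarchical facet path). Not Bayard's data model.
type Doc = (&'static str, &'static str);

/// True when `path` equals `prefix` or lies below it in the hierarchy.
fn is_under(path: &str, prefix: &str) -> bool {
    path == prefix
        || path
            .strip_prefix(prefix)
            .map_or(false, |rest| rest.starts_with('/'))
}

/// Counting: tally facet values under `prefix` among the hits.
/// The hit list itself is left untouched.
fn count_facets(hits: &[Doc], prefix: &str) -> BTreeMap<String, usize> {
    let mut counts = BTreeMap::new();
    for (_, path) in hits {
        if is_under(path, prefix) {
            *counts.entry(path.to_string()).or_insert(0) += 1;
        }
    }
    counts
}

/// Filtering: keep only the hits whose facet lies under `prefix`.
fn filter_by_facet(hits: &[Doc], prefix: &str) -> Vec<Doc> {
    hits.iter()
        .copied()
        .filter(|(_, path)| is_under(path, prefix))
        .collect()
}

fn main() {
    // The single hit for `query=cat` from the report above.
    let hits: Vec<Doc> = vec![("1", "/Felidae/Felinae/Felis")];
    println!("counted:  {:?}", count_facets(&hits, "/Eupleridae"));
    println!("filtered: {:?}", filter_by_facet(&hits, "/Eupleridae"));
}
```

Under this reading, getting zero documents for `/Eupleridae` would require a filtering step (or a query clause on the facet field) in addition to the facet count.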

Bulk search API

Hi, we're currently using Elasticsearch for https://deces.machid.io (using powerful Levenshtein fuzzy search).

We use bulk search for a record linkage use case (useful for clinical studies).
We would be enthusiastic about studying a port from Elasticsearch to bayard, but bayard is missing this key feature.

Do you plan to add such a bulk search API?

500 error occurs in bayard-rest on Docker

Overview

Calling /v1/documents/{id} (GET) on the bayard-rest Docker image returns a 500 error (failed to get client for node: id=0).

/v1/documents/1 (POST) and /v1/commit (GET) complete normally.

Running the equivalent of bayard-rest's /v1/documents/{id} with bayard-cli retrieves the data, so I suspect this is a bug in bayard-rest.

Sorry for the trouble, but I would appreciate it if you could look into this 🙏

Environment

  • CentOS 7
  • macOS Catalina

Steps

$ git clone https://github.com/bayard-search/bayard.git; cd bayard
$ # Modify docker-compose.yml as shown below
$ docker-compose up -d
$ curl -X PUT \
    --header 'Content-Type: application/json' \
    --data-binary @./examples/doc_ja.json \
    'http://localhost:8000/v1/documents/1' # HTTP Status 200 OK
$ curl -X GET 'http://localhost:8000/v1/commit' # HTTP Status 200 OK
$ curl -X GET 'http://localhost:8000/v1/documents/1' # HTTP Status 500 NG
failed to get client for node: id=0

$ docker run --net=bayard_default --rm --name bayard-cli bayardsearch/bayard-cli:latest get 1 --server=bayard:5000 # OK
{"_id":["1"],"description":["検索エンジン(けんさくエンジン、英: search engine)は、狭義にはインターネットに存在する情報(ウェブページ、ウェブサイト、画像ファイル、ネットニュースなど)を検索する機能およびそのプログラム。インターネットの普及初期には、検索としての機能のみを提供していたウェブサイトそのものを検索エンジンと呼んだが、現在では様々なサービスが加わったポータルサイト化が進んだため、検索をサービスの一つとして提供するウェブサイトを単に検索サイトと呼ぶことはなくなっている。広義には、インターネットに限定せず情報を検索するシステム全般を含む。"],"name":["検索エンジン"],"url":["https://ja.wikipedia.org/wiki/%E6%A4%9C%E7%B4%A2%E3%82%A8%E3%83%B3%E3%82%B8%E3%83%B3"]}

I modified docker-compose.yml as follows because I wanted to try Japanese full-text search.

version: '3'
services:
  bayard:
    container_name: bayard
    image: bayardsearch/bayard:latest
    entrypoint: bayard
    volumes:
      - ./examples/schema_ja.json:/etc/bayard/schema_ja.json
    command:
      - '--host=bayard'
      - '--raft-port=7000'
      - '--index-port=5000'
      - '--metrics-port=9000'
      - '--data-directory=/tmp/bayard'
      - '--tokenizer-file=/etc/bayard/tokenizer.json'
      - '--schema-file=/etc/bayard/schema_ja.json'
      - '1'
    ports:
      - "15000:5000"
      - "17000:7000"
      - "19000:9000"

  bayard-rest:
    container_name: bayard-rest
    image: bayardsearch/bayard-rest:latest
    entrypoint: bayard-rest
    ports:
      - "18000:8000"
    command:
      - '--host=0.0.0.0'
      - '--port=8000'
      - '--index-address=bayard:5000'
    depends_on:
      - bayard

Cryptic panic message if schema.json was not found

Expected result:

Can't find schema.json or it's not readable. The default location (`./etc/schema.json`) can be overridden with the `-s, --schema-file` command-line parameter

Actual result:

thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Os { code: 2, kind: NotFound, message: "No such file or directory" }', src/libcore/result.rs:1084:5
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace.
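
A minimal sketch of how such a message could be produced, assuming the panic comes from an `unwrap()` on a file read (which the `Result::unwrap()` text suggests, though the exact call site is not shown here). The function name `read_schema` is hypothetical and the message wording is only an example.

```rust
use std::fs;
use std::path::Path;

/// Read the schema file, turning an I/O failure into a readable message
/// instead of panicking through `unwrap()`.
fn read_schema(path: &Path) -> Result<String, String> {
    fs::read_to_string(path).map_err(|err| {
        format!(
            "Can't read schema file `{}`: {}. Override the location with `-s, --schema-file`.",
            path.display(),
            err
        )
    })
}

fn main() {
    match read_schema(Path::new("./etc/schema.json")) {
        Ok(schema) => println!("loaded {} bytes of schema", schema.len()),
        // A real binary would also exit with a non-zero status here.
        Err(message) => eprintln!("{}", message),
    }
}
```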

Support multiple indexes

It seems the documents are all in a single index. Do you plan to support multiple indexes, so that users can add documents to and search from separate indexes?

Thanks for building this great piece of software!

unwrap() causing errors

Hi there, I'm currently trying to use Bayard as the index server for my side project.
However, I've found repeatedly that when Bayard encounters an illegal query (e.g. A - B),
it stops and never replies to any further queries, so I have to restart the server to make things work again.
(I followed the instructions in the documentation to build a 3-node Bayard cluster.)

[image]
I traced the source code and found that this may be caused by these lines in bayard-server/src/index/server.rs:

[image]

[image]

[image]

I've tried changing these two unwrap() calls to other Rust error handling methods (e.g. match ... { Ok(..) => ..., Err(..) => ... }).
But it seems that the sender is still stuck in the middle. (I verified that the requests reach the server normally by changing debug! to println! in [image].)

I hope there is a solution for this; otherwise I'll have to create a filter to prevent illegal queries. Thanks!
Bayard perfectly fits my needs, except that it lacks support for multiple indexes 😭
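
The general pattern the report is reaching for can be sketched as follows. This is not Bayard's code: it assumes a worker that receives requests over a channel, and `parse_query`, `serve` and `ask` are hypothetical names. The point is that every request gets exactly one reply, whether the query parsed or not, so a bad query can neither panic the worker nor leave the caller waiting.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical stand-in for a query parser; rejects a bare `-` term.
fn parse_query(query: &str) -> Result<Vec<String>, String> {
    let terms: Vec<String> = query.split_whitespace().map(str::to_string).collect();
    if terms.is_empty() || terms.iter().any(|t| t == "-") {
        return Err(format!("illegal query: {:?}", query));
    }
    Ok(terms)
}

/// Each request carries its own reply channel.
type Request = (String, mpsc::Sender<Result<usize, String>>);

/// Worker loop: reply to every request, with a result or with an error.
fn serve(requests: mpsc::Receiver<Request>) {
    for (query, reply) in requests {
        let response = parse_query(&query).map(|terms| terms.len());
        // Ignore send failures: the caller may already have gone away.
        let _ = reply.send(response);
    }
}

/// Hypothetical client helper: send one query and wait for its reply.
fn ask(worker: &mpsc::Sender<Request>, query: &str) -> Result<usize, String> {
    let (reply_tx, reply_rx) = mpsc::channel();
    worker
        .send((query.to_string(), reply_tx))
        .map_err(|e| e.to_string())?;
    reply_rx.recv().map_err(|e| e.to_string())?
}

fn main() {
    let (tx, rx) = mpsc::channel();
    let worker = thread::spawn(move || serve(rx));
    println!("{:?}", ask(&tx, "A - B")); // rejected, but the worker keeps running
    println!("{:?}", ask(&tx, "A B")); // still answered afterwards
    drop(tx); // closing the channel lets the worker loop end
    worker.join().unwrap();
}
```

If replacing `unwrap()` with `match` still leaves callers stuck, one thing worth checking is whether the `Err` branch returns without sending anything back on the reply path.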
