quickwit's Introduction

tuziben.me

quickwit's Issues

compatible with ES client

When querying Quickwit with the Golang Elasticsearch client, the request fails with the error message "the client noticed that the server is not Elasticsearch and we do not support this unknown product".

That's because some Elasticsearch clients check whether the response headers contain X-Elastic-Product with the value Elasticsearch.

Source code in Elasticsearch clients
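
The check described above can be sketched as follows. This is a simplified Python illustration, not the code of any actual client: the function name is_elasticsearch_response is hypothetical, and only the header name X-Elastic-Product and the value Elasticsearch come from the description above.

```python
def is_elasticsearch_response(headers: dict) -> bool:
    """Return True if the response headers identify the server as Elasticsearch.

    Hypothetical helper that mirrors the product check described above.
    """
    # HTTP header names are case-insensitive, so normalize them before the lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    return normalized.get("x-elastic-product") == "Elasticsearch"


# A response without the header fails the check, which is the situation
# that produces the "unknown product" error on the client side.
print(is_elasticsearch_response({"Content-Type": "application/json"}))
print(is_elasticsearch_response({"X-Elastic-Product": "Elasticsearch"}))
```

Under this reading, a server passes the check only if its responses carry X-Elastic-Product: Elasticsearch.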

quickwit query modes

doc_mapping:
  store_source: false
  timestamp_field: timestamp
  mode: dynamic
  field_mappings:
    - name: timestamp
      type: datetime
      input_formats:
        - iso8601
      output_format: rfc3339
      stored: true
      indexed: true
      fast: true
      precision: milliseconds

    - name: request_url
      type: text
      tokenizer: default
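      # Default tokenizer without positions: phrase queries such as request_url:"a b c" are not supported.
      # Only request_url:a AND request_url:b AND request_url:c can be used instead.
      # Side effect: this filters documents that contain a, b and c, but it is not an exact match for the phrase a b c.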

    - name: request_url_fast
      type: text
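      # Intended to behave like a keyword field with term queries in ES
      # (this likely needs tokenizer: raw, since text fields are tokenized by default)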

    - name: request_url_position
      type: text
      tokenizer: default
      record: position
      # Tokenized, and supports phrase queries

    - name: redirect_url_position_fast
      type: text
      tokenizer: default
      record: position
      fast: true
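
The difference between combining terms with AND and running a phrase query (which needs record: position) can be illustrated with a small Python sketch. It assumes a simplified whitespace tokenizer in place of Quickwit's default tokenizer, and the helper names are hypothetical.

```python
def tokenize(text: str) -> list[str]:
    # Simplified stand-in for a tokenizer: lowercase and split on whitespace.
    return text.lower().split()


def matches_all_terms(doc: str, terms: list[str]) -> bool:
    # Same idea as field:a AND field:b AND field:c:
    # every term must occur somewhere in the document, in any order.
    doc_tokens = set(tokenize(doc))
    return all(term in doc_tokens for term in terms)


def matches_phrase(doc: str, phrase: str) -> bool:
    # A phrase query relies on token positions:
    # the terms must be adjacent and in the same order.
    doc_tokens = tokenize(doc)
    phrase_tokens = tokenize(phrase)
    n = len(phrase_tokens)
    if n == 0:
        return False
    return any(
        doc_tokens[i:i + n] == phrase_tokens
        for i in range(len(doc_tokens) - n + 1)
    )


doc = "c x b a"
print(matches_all_terms(doc, ["a", "b", "c"]))  # True: all terms present
print(matches_phrase(doc, "a b c"))             # False: not adjacent, wrong order
print(matches_phrase("x a b c", "a b c"))       # True: exact phrase
```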

elasticsearch query

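In Elasticsearch, how should a text field be configured so that search meets the following requirement:
Given the phrase small women laptop backpack, a search for "laptop small" should also match it.

In Elasticsearch, you can combine a suitable analyzer with the right query type to meet this requirement. One option is a shingle (word-level n-gram) filter, which indexes adjacent word pairs alongside the individual words. Note that a match query does not depend on word order, so a reversed query can already match through the individual words.

Below is an example mapping with analyzer settings, followed by an example query: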

PUT your_index
{
  "mappings": {
    "properties": {
      "your_text_field": {
        "type": "text",
        "analyzer": "custom_analyzer"
      }
    }
  },
  "settings": {
    "analysis": {
      "filter": {
        "custom_shingle": {
          "type": "shingle",
          "min_shingle_size": 2,
          "max_shingle_size": 2
        }
      },
      "analyzer": {
        "custom_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "custom_shingle"]
        }
      }
    }
  }
}

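In this example:

  • custom_shingle is a shingle filter that combines adjacent words into two-word shingles (word bigrams).

  • custom_analyzer is a custom analyzer that uses the standard tokenizer and then applies two filters:

    • lowercase: converts all tokens to lowercase so that matching is case-insensitive.
    • custom_shingle: applies the shingle filter to produce the two-word shingles.

With this setup, "small women laptop backpack" is indexed as the shingles "small women", "women laptop" and "laptop backpack", plus the single words small, women, laptop and backpack, because the shingle filter also outputs unigrams by default (output_unigrams: true).

At query time, the search text is processed by the same analyzer. Here is an example query: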

GET your_index/_search
{
  "query": {
    "match": {
      "your_text_field": "laptop small"
    }
  }
}

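This query analyzes the search text with the same analyzer. The shingle "laptop small" does not occur in the document, so the hit comes from the single words laptop and small, which the shingle filter also emits by default; a match query combines its terms with OR and ignores their order.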
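
To see which tokens a query for "laptop small" shares with the indexed phrase, here is a small Python sketch that emulates the analyzer. It assumes a whitespace split in place of the standard tokenizer, and shingle_tokens is a hypothetical helper, not an Elasticsearch API. The output_unigrams flag mirrors the shingle filter option of the same name, which defaults to true.

```python
def shingle_tokens(text: str, size: int = 2, output_unigrams: bool = True) -> list[str]:
    # Simplified stand-in for: standard tokenizer + lowercase + shingle filter.
    words = text.lower().split()
    tokens = list(words) if output_unigrams else []
    # Shingles are built from adjacent words only, in document order.
    tokens += [" ".join(words[i:i + size]) for i in range(len(words) - size + 1)]
    return tokens


indexed = set(shingle_tokens("small women laptop backpack"))
query = set(shingle_tokens("laptop small"))

# Tokens shared by the query and the document: only the single words overlap.
print(sorted(indexed & query))  # ['laptop', 'small']

# With unigrams disabled, the reversed query shares no token with the document.
indexed_shingles_only = set(shingle_tokens("small women laptop backpack", output_unigrams=False))
query_shingles_only = set(shingle_tokens("laptop small", output_unigrams=False))
print(sorted(indexed_shingles_only & query_shingles_only))  # []
```

The second result shows that the shingles alone would not match a reversed word order; the single-word tokens do.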

Quickwit index mapping config

GitHub archive dataset

version: 0.6

index_id: gh-archive

doc_mapping:
  field_mappings:
    - name: id
      type: text
      tokenizer: raw
    - name: type
      type: text
      fast: true
      tokenizer: raw
    - name: public
      type: bool
      fast: true
    - name: payload
      type: object
      field_mappings:
        - name: pull_request
          type: object
          field_mappings:
            - name: body              # enable phrase queries on the body field, e.g. 'who get this'
              type: text
              tokenizer: default
              record: position
    - name: org
      type: json
      tokenizer: default
    - name: repo
      type: json
      tokenizer: default
    - name: actor
      type: json
      tokenizer: default
    - name: other
      type: json
      tokenizer: default
    - name: created_at
      type: datetime
      fast: true
      input_formats:
        - rfc3339
      fast_precision: seconds
  timestamp_field: created_at

indexing_settings:
  commit_timeout_secs: 10
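
As an illustration of how the phrase-enabled body field might be queried, the Python sketch below only builds a search URL and sends nothing. The endpoint shape /api/v1/<index_id>/search, the query and max_hits parameters, and the address 127.0.0.1:7280 are assumptions about a local Quickwit node, and build_search_url is a hypothetical helper.

```python
from urllib.parse import urlencode


def build_search_url(base_url: str, index_id: str, query: str, max_hits: int = 10) -> str:
    # Assumed endpoint shape: <base_url>/api/v1/<index_id>/search
    params = urlencode({"query": query, "max_hits": max_hits})
    return f"{base_url}/api/v1/{index_id}/search?{params}"


# Phrase query on the nested body field; the mapping enables it via record: position.
url = build_search_url(
    "http://127.0.0.1:7280",  # assumed local node address
    "gh-archive",
    'payload.pull_request.body:"who get this"',
)
print(url)
```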

lucene OR

elb_status_code:(403 or 201)

Lowercase or: OK
Uppercase OR: not OK

search gh-archive-* error

Describe the bug

{
"message": "index ID pattern deepflow%2A is invalid. patterns must match the following regular expression: ^[a-zA-Z\\*][a-zA-Z0-9-_\\.\\*]{0,254}$"
}
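
The rejected pattern deepflow%2A contains %2A, the percent-encoded form of *, which suggests the index pattern reached validation without being URL-decoded. The Python sketch below checks both forms against the regular expression quoted in the error message; is_valid_index_pattern is a hypothetical helper, not Quickwit code.

```python
import re
from urllib.parse import unquote

# Regular expression copied from the error message (JSON escaping removed).
INDEX_ID_PATTERN = re.compile(r"^[a-zA-Z\*][a-zA-Z0-9-_\.\*]{0,254}$")


def is_valid_index_pattern(pattern: str) -> bool:
    # True if the pattern satisfies the expression from the error message.
    return INDEX_ID_PATTERN.match(pattern) is not None


raw = "deepflow%2A"
print(is_valid_index_pattern(raw))           # False: '%' is not an allowed character
print(is_valid_index_pattern(unquote(raw)))  # True: decoded to "deepflow*"
```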
