tuziben / quickwit
This project forked from quickwit-oss/quickwit
Sub-second search & analytics engine on cloud storage
Home Page: https://quickwit.io
License: Other
When querying Quickwit with the Golang client, we get the error message "the client noticed that the server is not Elasticsearch and we do not support this unknown product".
That's because some Elasticsearch clients check whether the response headers contain X-Elastic-Product with the value Elasticsearch.
Source code in Elasticsearch clients
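The check described above can be sketched as follows. This is a minimal illustration, not the code of any real client: the function name check_product_header and the two sample header dictionaries are hypothetical. Only the header name X-Elastic-Product and the expected value Elasticsearch come from the description above.

```python
# Illustrative sketch of a product check; names here are hypothetical.
EXPECTED_PRODUCT = "Elasticsearch"


def check_product_header(headers: dict) -> bool:
    """Return True when X-Elastic-Product is present and equals 'Elasticsearch'."""
    # HTTP header names are case-insensitive, so normalize before the lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    return normalized.get("x-elastic-product") == EXPECTED_PRODUCT


# Assumed sample responses: one without the header, one with it.
headers_without_product = {"Content-Type": "application/json"}
headers_with_product = {
    "content-type": "application/json",
    "X-Elastic-Product": "Elasticsearch",
}

print(check_product_header(headers_without_product))
print(check_product_header(headers_with_product))
```

A server that does not send the header fails this kind of check, which is the situation in which the client reports the "not Elasticsearch" error.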
doc_mapping:
  store_source: false
  timestamp_field: timestamp
  mode: dynamic
  field_mappings:
    - name: timestamp
      type: datetime
      input_formats:
        - iso8601
      output_format: rfc3339
      stored: true
      indexed: true
      fast: true
      precision: milliseconds
    - name: request_url
      type: text
      tokenizer: default
      # Default tokenizer; phrase queries such as request_url:"a b c" are not supported.
      # Terms can only be combined as request_url:a AND request_url:b AND request_url:c.
      # Side effect: this filters documents that contain a, b, and c, but it is not an exact match of the phrase a b c.
    - name: request_url_fast
      type: text
      # Counterpart of an Elasticsearch keyword field queried with a term query.
    - name: request_url_position
      type: text
      tokenizer: default
      record: position
      # Tokenized, and supports phrase queries.
    - name: redirect_url_position_fast
      type: text
      tokenizer: default
      record: position
      fast: true
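The difference noted in the comments above, between combining terms with AND and matching a phrase, can be illustrated with a toy positional index. This is a simplified sketch and not how Quickwit is implemented: the whitespace tokenizer, the function names, and the sample documents are all assumptions made for illustration.

```python
from collections import defaultdict


def tokenize(text):
    # Simplified stand-in for a default tokenizer: lowercase, split on whitespace.
    return text.lower().split()


def build_index(docs):
    # Maps term -> doc_id -> list of token positions (what "record: position" keeps).
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, term in enumerate(tokenize(text)):
            index[term][doc_id].append(pos)
    return index


def match_all_terms(index, query):
    # Like field:a AND field:b AND field:c; positions are ignored.
    doc_sets = [set(index.get(term, {})) for term in tokenize(query)]
    return set.intersection(*doc_sets) if doc_sets else set()


def match_phrase(index, query):
    # Like field:"a b c"; terms must appear at consecutive positions.
    terms = tokenize(query)
    hits = set()
    for doc_id in match_all_terms(index, query):
        starts = index[terms[0]][doc_id]
        if any(all(s + i in index[t][doc_id] for i, t in enumerate(terms)) for s in starts):
            hits.add(doc_id)
    return hits


# Hypothetical documents: doc 2 contains a, b and c, but not as the phrase "a b c".
docs = {1: "a b c", 2: "c x b y a"}
index = build_index(docs)
print(sorted(match_all_terms(index, "a b c")))
print(sorted(match_phrase(index, "a b c")))
```

The AND query returns both documents, while the phrase query needs the stored positions to keep only the document where the terms are adjacent and in order.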
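In Elasticsearch, how should a text field be configured so that search meets the following requirement:
Given the phrase "small women laptop backpack", a search for "laptop small" should also match it.
In Elasticsearch, you can meet this requirement with a suitable analyzer and query type. The answer below uses a shingle filter, which emits word-level n-grams in addition to the single words.
The following is an example mapping with analyzer settings, followed by an example query: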
PUT your_index
{
  "mappings": {
    "properties": {
      "your_text_field": {
        "type": "text",
        "analyzer": "custom_analyzer"
      }
    }
  },
  "settings": {
    "analysis": {
      "filter": {
        "custom_shingle": {
          "type": "shingle",
          "min_shingle_size": 2,
          "max_shingle_size": 2
        }
      },
      "analyzer": {
        "custom_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "custom_shingle"]
        }
      }
    }
  }
}
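In this example:
custom_shingle is a shingle filter that combines adjacent words into two-word shingles (bigrams). Because its output_unigrams option defaults to true, the single words are kept as well.
custom_analyzer is a custom analyzer that uses the standard tokenizer and then applies two filters:
lowercase: converts all words to lowercase so that matching is case-insensitive.
custom_shingle: applies the shingle filter described above.
With this setup, "small women laptop backpack" is analyzed at index time into the single words "small", "women", "laptop", "backpack" plus the bigrams "small women", "women laptop", "laptop backpack".
At query time, the search text is processed by the same analyzer. The following is an example query: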
GET your_index/_search
{
  "query": {
    "match": {
      "your_text_field": "laptop small"
    }
  }
}
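This query analyzes the search text with the same analyzer, so "laptop small" becomes the tokens "laptop", "small", and "laptop small". The bigram "laptop small" does not occur in the indexed document, so the hit comes from the single words "laptop" and "small", which the match query combines with OR by default.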
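To see which tokens the document and the query share, the analysis can be approximated in a few lines. This is a rough sketch, not the Elasticsearch implementation: it splits on whitespace instead of using the standard tokenizer, and the function name analyze is hypothetical. The output_unigrams parameter mirrors the shingle filter option of the same name, which defaults to true.

```python
def analyze(text, shingle_size=2, output_unigrams=True):
    # Approximation of: whitespace split + lowercase + shingle filter.
    words = text.lower().split()
    tokens = list(words) if output_unigrams else []
    for i in range(len(words) - shingle_size + 1):
        tokens.append(" ".join(words[i:i + shingle_size]))
    return tokens


doc_tokens = analyze("small women laptop backpack")
query_tokens = analyze("laptop small")

# Tokens present in both the indexed document and the query.
shared = sorted(set(doc_tokens) & set(query_tokens))
print(shared)

# With unigrams disabled, only bigrams remain and nothing overlaps.
shared_bigrams_only = sorted(
    set(analyze("small women laptop backpack", output_unigrams=False))
    & set(analyze("laptop small", output_unigrams=False))
)
print(shared_bigrams_only)
```

Under these assumptions the overlap consists of the single words only, which suggests that a reversed-order query matches because of the unigrams rather than the bigrams.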
GitHub archive data set
version: 0.6
index_id: gh-archive
doc_mapping:
  field_mappings:
    - name: id
      type: text
      tokenizer: raw
    - name: type
      type: text
      fast: true
      tokenizer: raw
    - name: public
      type: bool
      fast: true
    - name: payload
      type: object
      field_mappings:
        - name: pull_request
          type: object
          field_mappings:
            - name: body # enables phrase queries on the body field, like 'who get this'
              type: text
              tokenizer: default
              record: position
    - name: org
      type: json
      tokenizer: default
    - name: repo
      type: json
      tokenizer: default
    - name: actor
      type: json
      tokenizer: default
    - name: other
      type: json
      tokenizer: default
    - name: created_at
      type: datetime
      fast: true
      input_formats:
        - rfc3339
      fast_precision: seconds
  timestamp_field: created_at
indexing_settings:
  commit_timeout_secs: 10
elb_status_code:(403 or 201)
or OK
OR not ok
Describe the bug
A clear and concise description of what the bug is.
{
  "message": "index ID pattern deepflow%2A is invalid. patterns must match the following regular expression: ^[a-zA-Z\\*][a-zA-Z0-9-_\\.\\*]{0,254}$"
}
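One plausible reading of this error, offered as an assumption rather than a confirmed diagnosis, is that the pattern reached validation while still percent-encoded: %2A is the URL encoding of *, and % is not an allowed character. The sketch below tests the regular expression from the message against both forms. The helper name is_valid_index_pattern is hypothetical.

```python
import re
from urllib.parse import unquote

# Regular expression copied from the error message (JSON escaping removed).
INDEX_ID_PATTERN = re.compile(r"^[a-zA-Z\*][a-zA-Z0-9-_\.\*]{0,254}$")


def is_valid_index_pattern(pattern):
    # True when the whole string matches the index ID pattern rule.
    return INDEX_ID_PATTERN.fullmatch(pattern) is not None


encoded = "deepflow%2A"
decoded = unquote(encoded)  # percent-decoding turns %2A into *

print(encoded, is_valid_index_pattern(encoded))
print(decoded, is_valid_index_pattern(decoded))
```

If this reading is right, the encoded form is rejected while the decoded form deepflow* passes.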
Steps to reproduce (if applicable)
Steps to reproduce the behavior:
Expected behavior
A clear and concise description of what you expected to happen.
Configuration:
Please provide:
quickwit --version