
querqy-elasticsearch's Introduction


โš ๏ธ IMPORTANT: Querqy 5.5 for Solr introduces some breaking changes that will affect you if you are upgrading from an older version and if

  • you are using Info Logging, or
  • rely on the debug output format, or
  • you are using a custom rewriter implementation

See here for the release notes: https://querqy.org/docs/querqy/release-notes.html#major-changes-in-querqy-for-solr-5-5-1

Querqy

Querqy is a framework for query preprocessing in Java-based search engines.

This is the repository for querqy-core, querqy-lucene and querqy-solr. Repositories for further Querqy integrations can be found at:

Documentation and 'Getting started'

Visit docs.querqy.org/querqy/ for detailed documentation.

Please make sure you read the release notes!

Check out Querqy.org for related projects that help you speed up search software development.

Developer channel: Join #querqy on the Relevance & Matching Tech Slack space

License

Querqy is licensed under the Apache License, Version 2.

Contributing

Please read our developer guidelines before contributing.

querqy-elasticsearch's People

Contributors

dependabot[bot], dobestler, johannesdaniel, renekrie


querqy-elasticsearch's Issues

GET request for rewriter

Hi everybody,
Currently it's only possible to create (PUT) and update a rewriter (including the rules).
It would be great to add a GET in order to actually see the configured rewriters (especially the rules that have been added to a rewriter).

What do you guys think?

Thanks in advance!

Info Logging fails with querqy-elasticsearch-1.7.es892.0

  1. Installed Elasticsearch 8.9.2 locally on macOS (Intel)
  2. Installed querqy-elasticsearch-1.7.es892.0 plugin
  3. Disabled Security Settings in config/elasticsearch.yml:
xpack.security.enabled: false
xpack.security.enrollment.enabled: false
xpack.security.http.ssl:
  enabled: false
  keystore.path: certs/http.p12
xpack.security.transport.ssl:
  enabled: false
  verification_mode: certificate
  keystore.path: certs/transport.p12
  truststore.path: certs/transport.p12
  4. Ran it with JVM[Oracle Corporation/OpenJDK 64-Bit Server VM/21.0.1/21.0.1+12-29] (ES bundled JDK)
  5. Uploaded a rewriter with Info Logging enabled
  6. Issued a search request with Info Logging enabled
curl -X GET "localhost:9200/car/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query": {
    "querqy": {
      "matching_query" : {
        "query": "infologging"
      },
      "query_fields": [ "model"],
      "rewriters": ["common_rules_infologging"],
      "info_logging": {
        "id": "REQ-ID-0043",
        "type": "DETAIL"
      }
    }
  }
}
'
  7. Error response:
[2024-04-09T16:46:01,674][DEBUG][o.e.a.s.TransportSearchAction] [#####.local] All shards failed for phase: [query]org.elasticsearch.index.query.QueryShardException: failed to create query: Invalid type definition for type `querqy.rewrite.logging.ActionLog`: Failed to construct BeanSerializer for [simple type, class querqy.rewrite.logging.ActionLog]: (java.lang.IllegalArgumentException) Cannot access public java.lang.String querqy.rewrite.logging.ActionLog.getMessage() (from class querqy.rewrite.logging.ActionLog; failed to set access: access denied ("java.lang.reflect.ReflectPermission" "suppressAccessChecks") (through reference chain: java.util.LinkedList[0])
	at [email protected]/org.elasticsearch.index.query.SearchExecutionContext.toQuery(SearchExecutionContext.java:454)
	at [email protected]/org.elasticsearch.search.SearchService.parseSource(SearchService.java:1259)
	at [email protected]/org.elasticsearch.search.SearchService.createContext(SearchService.java:1040)
	at [email protected]/org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:666)
	at [email protected]/org.elasticsearch.search.SearchService.lambda$executeQueryPhase$2(SearchService.java:543)

Taking a hunch: some "sensitive reflection operation" within the info logging code is not wrapped in a privileged block.
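If the hunch is right, the usual pattern in Elasticsearch plugins is to run the offending call inside `AccessController.doPrivileged` and to grant the needed permission (here `java.lang.reflect.ReflectPermission "suppressAccessChecks"`) in the plugin's `plugin-security.policy`. The stdlib-only sketch below only shows the shape of such a wrapper around a reflective getter call; in querqy-elasticsearch the wrapped call would presumably be the Jackson serialization of the `ActionLog` list, and the class and method names here are hypothetical.

```java
import java.lang.reflect.Method;
import java.security.AccessController;
import java.security.PrivilegedAction;

class PrivilegedReflection {

    // Calls a public no-arg method reflectively inside a privileged block. The
    // permission check's stack walk stops at this frame, so only this code (not
    // its callers) needs to hold the permission. AccessController is deprecated
    // for removal in recent JDKs, hence the annotation.
    @SuppressWarnings("removal")
    static Object invokeNoArg(Object target, String methodName) {
        return AccessController.doPrivileged((PrivilegedAction<Object>) () -> {
            try {
                Method method = target.getClass().getMethod(methodName);
                // This is the call that needs ReflectPermission("suppressAccessChecks")
                // when a SecurityManager is active.
                method.setAccessible(true);
                return method.invoke(target);
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException("reflective call failed: " + methodName, e);
            }
        });
    }

    public static void main(String[] args) {
        System.out.println(invokeNoArg("infologging", "length")); // prints 11
    }
}
```

The wrapper alone is not sufficient inside Elasticsearch: the permission still has to be granted to the plugin's code base in its policy file.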

UPDATE
Interestingly, the request with "type": "REWRITER_ID" works just fine.

curl -X GET "localhost:9200/car/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query": {
    "querqy": {
      "matching_query" : {
        "query": "infologging"
      },
      "query_fields": [ "model"],
      "rewriters": ["common_rules_infologging"],
      "info_logging": {
        "id": "REQ-ID-0043",
        "type": "REWRITER_ID"
      }
    }
  }
}
'
REWRITER_ID[ QUERQY ] {"id":"REQ-ID-0043","msg":["common_rules_infologging"]}

Wrong Info Logging with wildcard replace rule

Having a replace rule with wildcards:
fahrrad* => velo$1

(Velo is the Swiss German word for Fahrrad, borrowed from French, so Fahrradschloss becomes Veloschloss.)

leads to these wrong info logs:
"fahrradschloss" =>
[2023-04-17T14:32:10,732][INFO ][q.e.i.Log4jSink ] [esp1-dataHorse-3]DETAIL[ QUERQY ] {"id":"86769c95-9c63-43ac-8002-6e29ff07d3c6","msg":{"replace_rules_ruleset":[{"APPLIED_RULES":["fahrradschloss => []"]}]}}

"fahrradschloss abus" =>
[2023-04-17T14:31:30,699][INFO ][q.e.i.Log4jSink ] [esp1-dataHorse-5]DETAIL[ QUERQY ] {"id":"61f6cb32-9af0-4218-bc26-d9e27c8d37a5","msg":{"replace_rules_ruleset":[{"APPLIED_RULES":["fahrradschloss => [abus]"]}]}}

Hypothesis: info logging does not seem to handle wildcard replace rules correctly.
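For reference, the rewrite the rule is meant to perform can be sketched as follows, assuming (as the example implies) that `*` matches the rest of the term and that `$1` in the output stands for that matched part. This is a hypothetical stand-alone illustration, not Querqy's replace rewriter; it only shows that the logged entry for "fahrradschloss" would presumably be expected to reflect "veloschloss" rather than an empty output.

```java
class WildcardReplace {

    // Applies a replace rule whose input may end in a single "*" wildcard.
    // "$1" in the replacement stands for the text matched by "*".
    // Returns null when the term does not match the rule input.
    static String apply(String term, String input, String replacement) {
        if (!input.endsWith("*")) {
            return term.equals(input) ? replacement : null;
        }
        String prefix = input.substring(0, input.length() - 1);
        if (!term.startsWith(prefix)) {
            return null;
        }
        String wildcardPart = term.substring(prefix.length());
        return replacement.replace("$1", wildcardPart);
    }

    public static void main(String[] args) {
        System.out.println(apply("fahrradschloss", "fahrrad*", "velo$1")); // prints veloschloss
    }
}
```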

Error installing the plugin on Elasticsearch 8.11.4

I'm trying to install the plugin on Elasticsearch 8.11.4 and get this error:


docker run -it docker.elastic.co/elasticsearch/elasticsearch:8.11.4 bash
elasticsearch@5ce2800c67e0:~$ /usr/share/elasticsearch/bin/elasticsearch-plugin install --batch https://github.com/querqy/querqy-elasticsearch/archive/refs/tags/querqy-elasticsearch-1.7.es8114.0.zip
-> Installing https://github.com/querqy/querqy-elasticsearch/archive/refs/tags/querqy-elasticsearch-1.7.es8114.0.zip
-> Downloading https://github.com/querqy/querqy-elasticsearch/archive/refs/tags/querqy-elasticsearch-1.7.es8114.0.zip
-> Failed installing https://github.com/querqy/querqy-elasticsearch/archive/refs/tags/querqy-elasticsearch-1.7.es8114.0.zip
-> Rolling back https://github.com/querqy/querqy-elasticsearch/archive/refs/tags/querqy-elasticsearch-1.7.es8114.0.zip
-> Rolled back https://github.com/querqy/querqy-elasticsearch/archive/refs/tags/querqy-elasticsearch-1.7.es8114.0.zip
Exception in thread "main" java.lang.IllegalStateException: Plugin [.installing-12740214086509187779] is missing a descriptor properties file.
	at org.elasticsearch.plugins.PluginDescriptor.readFromProperties(PluginDescriptor.java:200)
	at org.elasticsearch.plugins.cli.InstallPluginAction.loadPluginInfo(InstallPluginAction.java:858)
	at org.elasticsearch.plugins.cli.InstallPluginAction.installPlugin(InstallPluginAction.java:918)
	at org.elasticsearch.plugins.cli.InstallPluginAction.execute(InstallPluginAction.java:254)
	at org.elasticsearch.plugins.cli.InstallPluginCommand.execute(InstallPluginCommand.java:89)
	at org.elasticsearch.common.cli.EnvironmentAwareCommand.execute(EnvironmentAwareCommand.java:54)
	at org.elasticsearch.cli.Command.mainWithoutErrorHandling(Command.java:85)
	at org.elasticsearch.cli.MultiCommand.execute(MultiCommand.java:94)
	at org.elasticsearch.cli.Command.mainWithoutErrorHandling(Command.java:85)
	at org.elasticsearch.cli.Command.main(Command.java:50)
	at org.elasticsearch.launcher.CliToolLauncher.main(CliToolLauncher.java:64)
elasticsearch@5ce2800c67e0:~$
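The URL passed to `elasticsearch-plugin` points to GitHub's `archive/refs/tags/` endpoint, which serves a source-code archive of the tag with the repository files under a top-level folder. As far as I know, Elasticsearch expects the descriptor file (`plugin-descriptor.properties`, or `stable-plugin-descriptor.properties` for stable plugins) at the root of the plugin zip, which would match the error. A minimal stdlib sketch for checking a downloaded zip before installing it (class and method names are hypothetical):

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Path;
import java.util.Collection;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

class PluginZipCheck {

    // Descriptor file names accepted at the archive root (assumption: classic and stable plugins).
    static final Set<String> DESCRIPTORS =
            Set.of("plugin-descriptor.properties", "stable-plugin-descriptor.properties");

    // True if one of the entry names is a descriptor file at the top level of the zip.
    static boolean hasRootDescriptor(Collection<String> entryNames) {
        return entryNames.stream().anyMatch(DESCRIPTORS::contains);
    }

    // Lists all entry names of a zip file on disk.
    static List<String> entryNames(Path zip) {
        try (ZipFile zipFile = new ZipFile(zip.toFile())) {
            return zipFile.stream().map(ZipEntry::getName).collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        if (args.length != 1) {
            System.err.println("usage: PluginZipCheck <plugin.zip>");
            return;
        }
        List<String> names = entryNames(Path.of(args[0]));
        System.out.println(hasRootDescriptor(names)
                ? "descriptor found at archive root"
                : "no descriptor at archive root; first entries: "
                        + names.subList(0, Math.min(5, names.size())));
    }
}
```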

Question: Filter rule for all search terms

The key ingredient for the rules to work is the search term. Is there a way to set up rules that are executed for all search terms?

Use case: I have 10K products, and on a conditional basis (by adding/removing rewriters) I want to exclude 100 products by skuId or exclude a few categories.

Publish to Maven Central

We will approach this issue as follows:

  • change groupId to org.querqy (#3)
  • move repo to github.com/querqy
  • publish to Maven Central under org.querqy

Bump json-smart version

Ensure json-smart v2.4.8 is used.

We currently pull in an earlier version via the dependency path querqy-lucene -> querqy-core -> jsonpath:2.4.0; the latter is the version provided by Solr. Possible solution: exclude json-smart (or jsonpath) from the dependency and override it with the latest version.

Improve .querqy index settings

  • allow setting the number of replicas for the .querqy index via config, defaulting to 1 replica
  • .querqy index mapping: store the config map as a string to avoid conflicting automatically assigned field types
  • config should only be stored but not indexed

[querqy] unknown field [multi_match]

We rely heavily on multi_match with field boosts. Is there a similar option?

{
    "querqy": {
        "multi_match": {
            "fields": [
                "title^60",
                "description^60",
                "id^40",
                "name^30",
                "category^25",
                "short_description^10"
            ],
            "operator": "and",
            "query": "taner",
            "type": "most_fields"
        },
        "rewriters": [
            "replace",
            "common_rules"
        ]
    }
}
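In the querqy query format used elsewhere on this page (see the rewriter-filter issue below), field boosts go into `query_fields` using the same `field^boost` notation, and the user input goes into `matching_query`. A hypothetical helper that carries the fields of the multi_match clause above over into that shape might look like this; how `"operator": "and"` and `"type": "most_fields"` map onto querqy's own parameters is not covered by this sketch.

```java
import java.util.List;
import java.util.stream.Collectors;

class MultiMatchToQuerqy {

    // Minimal JSON string quoting: escapes backslashes and double quotes only.
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    static String quoteAll(List<String> values) {
        return values.stream().map(MultiMatchToQuerqy::quote).collect(Collectors.joining(", "));
    }

    // Builds a querqy query body, carrying the multi_match field boosts over as query_fields.
    // "operator" and "type" from multi_match are NOT translated here.
    static String toQuerqyQuery(String query, List<String> boostedFields, List<String> rewriters) {
        return "{\"query\": {\"querqy\": {"
                + "\"matching_query\": {\"query\": " + quote(query) + "}, "
                + "\"query_fields\": [" + quoteAll(boostedFields) + "], "
                + "\"rewriters\": [" + quoteAll(rewriters) + "]}}}";
    }

    public static void main(String[] args) {
        System.out.println(toQuerqyQuery("taner",
                List.of("title^60", "description^60", "id^40", "name^30",
                        "category^25", "short_description^10"),
                List.of("replace", "common_rules")));
    }
}
```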

Elasticsearch 7.17.11

Hello,

we're currently looking into running some Elasticsearch updates on the 7.x branch. I saw there's Querqy 1.5 for Elasticsearch 7.17.2 available. Is there any chance you could build and publish a version built against 7.17.11 (the current latest 7.17, released yesterday)?

Thanks in advance.

Implement info logging

Implement the ES equivalent of info logging (https://docs.querqy.org/querqy/solr-plugin-configuration.html#info-logging).

The goal is to track information emitted by rewriters. The existing implementation at Querqy Core level provides an InfoLogging framework, and the Common Rules Rewriter already produces information about which rules were applied. As part of the general info logging framework, this information can be sent to a Sink.

In Solr, the only Sink implementation returns the log messages as part of the search request response. This option is not available in ES as we cannot manipulate the response. The idea is to create a Sink implementation that simply logs the messages using Java logging (as usual in ES). We should provide some request parameter that will be passed through and appended to the log message so that the message can be related to a query or to a request id.
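The idea above can be sketched as a sink that writes one log line per message and includes the request id passed in with the request. The `InfoLogSink` interface, the class names and the message shape (modeled loosely on the DETAIL lines quoted in the issues above) are assumptions for illustration, not Querqy's actual Sink API; following the text, the sketch uses `java.util.logging`.

```java
import java.util.logging.Logger;

// Hypothetical sink abstraction for this sketch; not Querqy's actual Sink interface.
interface InfoLogSink {
    void log(String requestId, String rewriterId, String jsonMessage);
}

class LoggerInfoLogSink implements InfoLogSink {

    private static final Logger LOGGER = Logger.getLogger(LoggerInfoLogSink.class.getName());

    // Escapes backslashes and double quotes so the ids stay valid JSON strings.
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    // Builds {"id":"<requestId>","msg":{"<rewriterId>":<jsonMessage>}}.
    // jsonMessage is assumed to be valid JSON already.
    static String format(String requestId, String rewriterId, String jsonMessage) {
        return "{\"id\":\"" + escape(requestId) + "\",\"msg\":{\""
                + escape(rewriterId) + "\":" + jsonMessage + "}}";
    }

    @Override
    public void log(String requestId, String rewriterId, String jsonMessage) {
        LOGGER.info(format(requestId, rewriterId, jsonMessage));
    }

    public static void main(String[] args) {
        new LoggerInfoLogSink().log("REQ-ID-0043", "common_rules",
                "[{\"APPLIED_RULES\":[\"notebook => laptop\"]}]");
    }
}
```

Keeping the request id inside the JSON payload, rather than only in the log prefix, makes the line easy to correlate when log shipping parses the message body.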

Rewriter config should allow size > 32k

For rewriter configs > 32k we get the following exception:

{ "error":{ "root_cause":[ { "type":"illegal_argument_exception", "reason":"DocValuesField \"config\" is too large, must be <= 32766" } ], "type":"illegal_argument_exception", "reason":"DocValuesField \"config\" is too large, must be <= 32766" } }

Let's either disable DocValues in the mapping or split configs.

Workaround for CommonRules rewriter: split into multiple rewriters.
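For the split-into-multiple-rewriters workaround, a greedy packing of whole rules keeps each part under the limit. This hypothetical sketch assumes the rules have already been split into self-contained blocks (an input line plus its instructions) and that blank lines are acceptable separators between rules. Since the stored config contains more than the rules text, a budget somewhat below 32766 bytes is presumably safer.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class RuleConfigSplitter {

    // Limit taken from the DocValuesField error message above.
    static final int DOC_VALUES_LIMIT = 32766;

    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Greedily packs whole rule blocks into groups whose UTF-8 size, including the
    // blank-line separators, stays <= maxBytes. Each group would become the rules
    // of one rewriter.
    static List<String> split(List<String> ruleBlocks, int maxBytes) {
        List<String> groups = new ArrayList<>();
        String current = "";
        for (String block : ruleBlocks) {
            if (utf8Length(block) > maxBytes) {
                throw new IllegalArgumentException("a single rule block exceeds " + maxBytes + " bytes");
            }
            String candidate = current.isEmpty() ? block : current + "\n\n" + block;
            if (utf8Length(candidate) <= maxBytes) {
                current = candidate;
            } else {
                groups.add(current);
                current = block;
            }
        }
        if (!current.isEmpty()) {
            groups.add(current);
        }
        return groups;
    }
}
```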

Rewriter Filter with dot in name

Hi everyone

While using the parameters to filter the selected rewriters, we encountered an issue. An example index with settings is given below. It seems that a dot in the rewriter name leads to unexpected behaviour. If we upload two identical rewriters whose names differ only in a dot (common_rules.test) versus an underscore (common_rules_test), the filter does not have the same effect. With the underscore, the filter is applied as expected; with the dot, the filter is apparently ignored. Other functionality besides the filter seems to work in both cases. The example was done with querqy version 1.4.es721.0.

Best regards

PUT /test
{
  "mappings": {
    "properties": {
      "title": { "type": "text" },
      "brand": { "type": "text" },
      "shortSummary": { "type": "text" }
    }
  }
}

PUT /test/_doc/1
{
  "title": "Notebook",
  "brand": "HP",
  "shortSummary": "very slim"
}
PUT /test/_doc/2
{
  "title": "Laptop",
  "brand": "Apple",
  "shortSummary": "very slim"
}

POST test/_search
{
  "query": {
    "match": {
      "title": "Notebook"
    }
  }
}

PUT  /_querqy/rewriter/common_rules_test
{
    "class": "querqy.elasticsearch.rewriter.SimpleCommonRulesRewriterFactory",
    "config": {
        "rules" : "notebook => \nSYNONYM: laptop \n UP(1000): laptop \n @{ \n _id: \"ID1\" \n, priority: 5, \n group: [\"hardware\"], \n tenant: [\"t1\", \"t3\"] \n }@"
    }
}

PUT  /_querqy/rewriter/common_rules.test
{
    "class": "querqy.elasticsearch.rewriter.SimpleCommonRulesRewriterFactory",
    "config": {
        "rules" : "notebook => \nSYNONYM: laptop \n UP(1000): laptop \n @{ \n _id: \"ID1\" \n, priority: 5, \n group: [\"hardware\"], \n tenant: [\"t1\", \"t3\"] \n }@"
    }
}

#Filter is not applied correctly, synonym is applied, notebook and laptop are returned
POST test/_search
{
   "query": {
       "querqy": {
           "matching_query": {
               "query": "notebook"
           },
           "query_fields": [
               "title^3.0", "brand^2.1", "shortSummary"
           ],
           "rewriters": [
             {
               "name": "common_rules.test",
               "params": {
                 "criteria": {
                   "filter": "$[?(\"ThisIsJustATest\" in @.group)]"
                 }
               }
             }]
      }
  }
}

#Filter is applied correctly, synonym is not applied, only notebook is returned
POST test/_search
{
   "query": {
       "querqy": {
           "matching_query": {
               "query": "notebook" 
           },
           "query_fields": [
               "title^3.0", "brand^2.1", "shortSummary"
           ],
           "rewriters": [
             {
               "name": "common_rules_test",
               "params": {
                 "criteria": {
                   "filter": "$[?(\"ThisIsJustATest\" in @.group)]"
                 }
               }
             }]
      }
  }
}

Info logging should send log message only once per shard

When we use more than one shard, the new info logging (#11) creates 2 identical log messages per shard (for >= 2 shards), as the query is parsed twice: once for the query phase and once for the fetch phase. We should ensure that the message is logged only once.
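One way to get there is to remember which (request id, shard) pairs have already been logged and skip repeats. A minimal, hypothetical sketch, assuming the request id passed via `info_logging` and the shard id are both available where the query is parsed:

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

class LogOncePerShard {

    // Identifies one request on one shard.
    private record Key(String requestId, int shardId) {}

    private final Set<Key> alreadyLogged = ConcurrentHashMap.newKeySet();

    // Returns true only for the first call per (requestId, shardId); later calls,
    // e.g. when the query is parsed again for the fetch phase, return false.
    // Entries are never evicted here; a real implementation would need to clean
    // up once the request has finished.
    boolean shouldLog(String requestId, int shardId) {
        return alreadyLogged.add(new Key(requestId, shardId));
    }
}
```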
