Comments (5)
I can verify that this is happening, but this does not appear to be a flatcurve issue. It is, indeed, the snowball filter that causes two terms (the original term and the stemmed term) to be passed into flatcurve as an "OR" search.
Additional search terms I tested: "running" and "smashes". I see these queries:
May 23 22:51:29 imap(user)<139><ZS5ErrXftrJ/AAAB>: Debug: fts-flatcurve(imaptest): Query (body:running* OR body:run*) matches=0 uids=
May 23 22:51:29 imap(user)<139><ZS5ErrXftrJ/AAAB>: Debug: fts-flatcurve(imaptest): Query (body:smash* OR body:smashes*) matches=0 uids=
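For anyone who wants to pull the terms out of debug lines like the two above, here is a minimal Python sketch. It assumes the log format is exactly what is shown in this thread ("Query (...) matches="); the helper name query_terms and both regular expressions are made up for illustration and are not part of Dovecot or flatcurve.

```python
import re

# Matches one "field:term*" clause inside a logged query (assumed format).
CLAUSE = re.compile(r"(\w+):([^\s()*]+)\*")


def query_terms(log_line):
    """Return (field, term) pairs from the 'Query (...)' part of a debug line."""
    match = re.search(r"Query \((.*)\) matches=", log_line)
    if match is None:
        return []  # not a flatcurve query line
    return CLAUSE.findall(match.group(1))


line = (
    "May 23 22:51:29 imap(user)<139><ZS5ErrXftrJ/AAAB>: Debug: "
    "fts-flatcurve(imaptest): Query (body:running* OR body:run*) "
    "matches=0 uids="
)
print(query_terms(line))
```

Run against the first log line, this lists the original term "running" and the stemmed term "run" as two separate body clauses.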
I will verify with the team whether this is the expected behavior of snowball.
Flatcurve has no way of knowing that the terms are related to each other. It also makes little sense for flatcurve to manually optimize these queries by filtering out redundant search terms: that would be quite complicated code, and Xapian index queries are fast enough that it would not make any difference. So if this is truly the expected behavior of snowball, the current behavior is fine.
from dovecot-fts-flatcurve.
What are your Dovecot fts_filters settings?
It sort of looks like there is something weird going on with snowball filtering (or something related), since the first term is the stemmed version of the original query, without the English suffix ("ing"). https://doc.dovecot.org/settings/plugin/fts-plugin/#plugin_setting-fts-fts_filters
from dovecot-fts-flatcurve.
hi @slusarz
"What are your Dovecot fts_filters settings?"
currently, similar to the settings I'd been using with my prior fts_solr install,
...
mail_plugins = virtual acl fts fts_flatcurve
plugin {
  fts = flatcurve
  fts_enforced = yes
  fts_autoindex = yes
  fts_autoindex_max_recent_msgs = 999
  fts_autoindex_exclude = \Junk
  fts_autoindex_exclude2 = \Trash
  fts_filters = normalizer-icu lowercase snowball stopwords
  fts_filters_en = lowercase snowball english-possessive stopwords
  fts_languages = en es de fr it pt
  fts_language_config = /usr/share/libexttextcat/fpdb.conf
  fts_tokenizers = generic email-address
  fts_tokenizer_generic = algorithm=simple
  # BUG: wait for dovecot 2.3.19 ...
  # https://github.com/slusarz/dovecot-fts-flatcurve/issues/22
  # fts_header_excludes = *
  # fts_header_includes = From To Cc Bcc Subject Message-ID
}
...
from dovecot-fts-flatcurve.
FWIW, while troubleshooting Tika searching, I ran a body search for the term "mairzy" and noted this in the logs:
Debug: fts-flatcurve(INBOX): Query (body:mairzy* OR body:mairzi*) matches=0 uids=
Notice that the OR'd query is not just a redundant truncation but an unwanted variant: mairzy*, which would return the desired results, vs. mairzi*, which would not.
from dovecot-fts-flatcurve.
So I think you answered your own question - the snowball filter does potentially provide independent terms that WILL result in different search results.
So maybe the snowball filter could be improved to not pass these terms to the FTS backend if they are redundant. But regardless, the snowball filter is not a flatcurve component, it's a core component, so this is the wrong place to be discussing that improvement. So I'm going to close this ticket (since, even with the redundant queries, flatcurve is returning the correct results).
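To make the redundant-vs-independent distinction concrete, here is a small Python sketch run over the three term pairs logged in this thread. It assumes the trailing * means prefix matching, so a term is "covered" when another term in the same OR is a prefix of it. The function classify is hypothetical and is not code from flatcurve or from the snowball filter; real queries can also have nesting and multiple fields, which this toy ignores.

```python
def classify(terms):
    """Label each prefix term as independent or covered by a shorter term."""
    result = {}
    for term in terms:
        # Another term that is a prefix of this one matches a superset of it.
        covering = [t for t in terms if t != term and term.startswith(t)]
        if covering:
            result[term] = f"covered by {min(covering, key=len)}*"
        else:
            result[term] = "independent"
    return result


# Term pairs taken from the debug log lines quoted in this thread.
for pair in (["running", "run"], ["smash", "smashes"], ["mairzy", "mairzi"]):
    print(pair, classify(pair))
```

Under that assumption, "running" and "smashes" are covered by their stems, while "mairzy" and "mairzi" are each independent, because neither is a prefix of the other.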
from dovecot-fts-flatcurve.