Comments (4)
I need to find some time to create some test scripts for this, so haven't yet been able to triage...
I will note that if this statement is true: 'it doesn't find "attachment" because it scans the mail without decoding the PDF.' -- then this is a bug in Dovecot core, not flatcurve. Dovecot core code (i.e. the core "maybe" matching) should see the exact same text from a decoded attachment that flatcurve does, so if that's not working correctly it will need to be fixed there.
from dovecot-fts-flatcurve.
I hope this is helpful: I created a test in my fork: edieterich@ae73576
All the FTS plugins that come with Dovecot 2.3 pass the test, Flatcurve fails. Here's the test run: https://github.com/edieterich/dovecot-fts-flatcurve/actions/runs/3836744310
2023-01-04T09:39:35.9613970Z Testing GitHub Issue #38 using Solr
2023-01-04T09:39:38.0055653Z 1 test groups: 0 failed, 0 skipped due to missing capabilities
2023-01-04T09:39:38.0056124Z base protocol: 0/5 individual commands failed
2023-01-04T09:39:38.0056465Z extensions: 0/0 individual commands failed
2023-01-04T09:39:38.0071773Z
2023-01-04T09:39:38.0072108Z Testing GitHub Issue #38 using Lucene
2023-01-04T09:39:39.5179508Z 1 test groups: 0 failed, 0 skipped due to missing capabilities
2023-01-04T09:39:39.5180465Z base protocol: 0/5 individual commands failed
2023-01-04T09:39:39.5181480Z extensions: 0/0 individual commands failed
2023-01-04T09:39:39.5188413Z
2023-01-04T09:39:39.5192999Z Testing GitHub Issue #38 using Squat
2023-01-04T09:39:41.0278305Z 1 test groups: 0 failed, 0 skipped due to missing capabilities
2023-01-04T09:39:41.0280452Z base protocol: 0/5 individual commands failed
2023-01-04T09:39:41.0281061Z extensions: 0/0 individual commands failed
2023-01-04T09:39:41.0281386Z
2023-01-04T09:39:41.0281625Z Testing GitHub Issue #38 using Flatcurve
2023-01-04T09:39:42.5530073Z *** Test issue-38 command 4/5 (line 9)
2023-01-04T09:39:42.5530968Z - failed: Missing 1 untagged replies (1 mismatches)
2023-01-04T09:39:42.5531629Z - first unexpanded: search 1
2023-01-04T09:39:42.5532214Z - first expanded: search 1
2023-01-04T09:39:42.5532787Z - best match: SEARCH
2023-01-04T09:39:42.5533603Z - Command: search or body attachment header reply-to attachment
2023-01-04T09:39:42.5533993Z
2023-01-04T09:39:42.5543942Z 1 test groups: 1 failed, 0 skipped due to missing capabilities
2023-01-04T09:39:42.5544380Z base protocol: 1/5 individual commands failed
2023-01-04T09:39:42.5544727Z extensions: 0/0 individual commands failed
2023-01-04T09:39:42.5549769Z ERROR: Failed test (/dovecot/imaptest/issue-38/issue-38)!
I had to patch Squat to make it not run into a "NO [SERVERBUG]" failure.
There appears to be some minimum search term length in fts_lucene, so I'm searching for "bodybody" instead of "body" as in my original description to get a match.
Thank you for your assistance in generating tests!
I've pushed a much smaller commit that isolates the issue. Currently, a single test fails in that branch:
ok search or body attachment header x-foo test2
* search 1
Here is what is needed to trigger the failing result:
- It MUST be an OR search
- The matching OR clause MUST be in decoded text (i.e. attachment data that is decoded via Tika or decode2text.sh)
- The non-matching OR clause MUST be in a non-indexed header
The issue is that the flatcurve query correctly finds the term in the indexed attachment text, but it also runs a header search on the non-indexed headers. This is the one search category that causes "maybe" matches, since flatcurve itself can't verify which header a term is located in unless it is one of the indexed headers (to, from, cc, etc.). Here, this part of the query returns no results ... but the ENTIRE query is marked as a "maybe" search due to current search limitations. Thus, the message is marked as a maybe match and passed back to FTS core code. That code does not do any attachment decoding when doing a manual search (decoding in real time for every non-FTS-indexed search would crush resource use), so it finds neither the body term nor the header term and returns no match.
AND searches are not affected because the FTS core code breaks up the query before passing it to flatcurve. The body search is thus returned as a real match and the header search as a maybe match, and FTS core can correctly do a manual check of the latter since it has access to header data.
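To make the OR vs. AND difference concrete, here is a rough Python model of the behavior described above. It is only a sketch under assumptions: the `Message` and `Clause` types, the `INDEXED_HEADERS` set, and all function names are made up for illustration and are not flatcurve or Dovecot APIs.

```python
from dataclasses import dataclass

# Assumption: illustrative subset; the real indexed-header list may differ.
INDEXED_HEADERS = {"to", "from", "cc", "bcc", "subject"}


@dataclass
class Message:
    headers: dict                 # lower-cased header name -> value
    body: str                     # raw body text, no attachment decoding
    decoded_attachment: str = ""  # text only the FTS index has seen


@dataclass(frozen=True)
class Clause:
    kind: str         # "body" or "header"
    term: str
    header: str = ""  # header name, for kind == "header"


def is_maybe(c):
    # The backend cannot tell which non-indexed header a term came from.
    return c.kind == "header" and c.header.lower() not in INDEXED_HEADERS


def index_hit(msg, c):
    # What the FTS index can answer, including decoded attachment text.
    if c.kind == "body":
        return c.term in msg.body or c.term in msg.decoded_attachment
    if is_maybe(c):
        return any(c.term in v for v in msg.headers.values())
    return c.term in msg.headers.get(c.header.lower(), "")


def manual_hit(msg, c):
    # Core's recheck: sees raw headers and body, but never decodes attachments.
    if c.kind == "body":
        return c.term in msg.body
    return c.term in msg.headers.get(c.header.lower(), "")


def search_or(msg, clauses):
    # The whole OR goes to the backend as one query; a single maybe clause
    # marks the entire result as maybe, so every clause is rechecked manually.
    if not any(index_hit(msg, c) for c in clauses):
        return False
    if any(is_maybe(c) for c in clauses):
        return any(manual_hit(msg, c) for c in clauses)
    return True


def search_and(msg, clauses):
    # Clauses are sent one at a time; only maybe clauses are rechecked.
    for c in clauses:
        if not index_hit(msg, c):
            return False
        if is_maybe(c) and not manual_hit(msg, c):
            return False
    return True


msg = Message(headers={"x-foo": "test1"}, body="see the PDF",
              decoded_attachment="attachment")
body = Clause("body", "attachment")

# Should match (the body clause is true), but the maybe recheck loses it.
print(search_or(msg, [body, Clause("header", "test2", "x-foo")]))
# AND with a header that really matches is handled correctly.
print(search_and(msg, [body, Clause("header", "test1", "x-foo")]))
```

In this model the OR search only goes wrong when the sole true clause lives in decoded attachment text, which matches the three trigger conditions listed earlier.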
The solution here is tricky and will take some thinking. Either we manually separate ALL OR searches internally within flatcurve, or we flag non-indexed header searches and run those queries separately from the rest of the search string.
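As a rough illustration of the first option (separating OR clauses), the sketch below compares a combined evaluation with a per-clause one. It works on precomputed per-clause results rather than real queries; `ClauseResult`, `combined_or`, and `split_or` are hypothetical names, and this is not necessarily how the eventual fix is implemented.

```python
from typing import NamedTuple


class ClauseResult(NamedTuple):
    index_hit: bool   # the FTS index returned the message for this clause
    maybe: bool       # the backend cannot confirm the hit on its own
    manual_hit: bool  # what a recheck on raw (undecoded) data would find


def combined_or(results):
    # One maybe clause taints the whole OR: every clause gets rechecked.
    if not any(r.index_hit for r in results):
        return False
    if any(r.maybe for r in results):
        return any(r.manual_hit for r in results)
    return True


def split_or(results):
    # Each clause stands alone: definite hits are accepted as-is,
    # and only maybe hits need the manual recheck.
    for r in results:
        if r.index_hit and (not r.maybe or r.manual_hit):
            return True
    return False


# Term found only in decoded attachment text: definite index hit,
# but invisible to a manual recheck.
attachment = ClauseResult(index_hit=True, maybe=False, manual_hit=False)
# Non-indexed header clause that matches nothing.
header = ClauseResult(index_hit=False, maybe=True, manual_hit=False)

print(combined_or([attachment, header]))
print(split_or([attachment, header]))
```

The trade-off in this sketch is that splitting costs one backend query per OR clause, whereas flagging only the non-indexed header clauses would keep the rest of the search string as a single query.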
Fixed by MR #41