Comments (4)

slusarz commented on June 15, 2024

I need to find some time to create some test scripts for this, so I haven't been able to triage it yet...

I will note that if this statement is true: 'it doesn't find "attachment" because it scans the mail without decoding the PDF.' -- this would be a bug in Dovecot core, not flatcurve. Dovecot core code (i.e., core "maybe" matching) should see exactly the same text from a decoded attachment that flatcurve would, so if that's not working correctly it will need to be fixed there.

from dovecot-fts-flatcurve.

edieterich commented on June 15, 2024

I hope this is helpful. I created a test in my fork: edieterich@ae73576

All the FTS plugins that ship with Dovecot 2.3 pass the test; Flatcurve fails. Here's the test run: https://github.com/edieterich/dovecot-fts-flatcurve/actions/runs/3836744310

2023-01-04T09:39:35.9613970Z Testing GitHub Issue #38 using Solr
2023-01-04T09:39:38.0055653Z 1 test groups: 0 failed, 0 skipped due to missing capabilities
2023-01-04T09:39:38.0056124Z base protocol: 0/5 individual commands failed
2023-01-04T09:39:38.0056465Z extensions: 0/0 individual commands failed
2023-01-04T09:39:38.0071773Z
2023-01-04T09:39:38.0072108Z Testing GitHub Issue #38 using Lucene
2023-01-04T09:39:39.5179508Z 1 test groups: 0 failed, 0 skipped due to missing capabilities
2023-01-04T09:39:39.5180465Z base protocol: 0/5 individual commands failed
2023-01-04T09:39:39.5181480Z extensions: 0/0 individual commands failed
2023-01-04T09:39:39.5188413Z
2023-01-04T09:39:39.5192999Z Testing GitHub Issue #38 using Squat
2023-01-04T09:39:41.0278305Z 1 test groups: 0 failed, 0 skipped due to missing capabilities
2023-01-04T09:39:41.0280452Z base protocol: 0/5 individual commands failed
2023-01-04T09:39:41.0281061Z extensions: 0/0 individual commands failed
2023-01-04T09:39:41.0281386Z
2023-01-04T09:39:41.0281625Z Testing GitHub Issue #38 using Flatcurve
2023-01-04T09:39:42.5530073Z *** Test issue-38 command 4/5 (line 9)
2023-01-04T09:39:42.5530968Z  - failed: Missing 1 untagged replies (1 mismatches)
2023-01-04T09:39:42.5531629Z  - first unexpanded: search 1
2023-01-04T09:39:42.5532214Z  - first expanded: search 1
2023-01-04T09:39:42.5532787Z  - best match: SEARCH
2023-01-04T09:39:42.5533603Z  - Command: search or body attachment header reply-to attachment
2023-01-04T09:39:42.5533993Z
2023-01-04T09:39:42.5543942Z 1 test groups: 1 failed, 0 skipped due to missing capabilities
2023-01-04T09:39:42.5544380Z base protocol: 1/5 individual commands failed
2023-01-04T09:39:42.5544727Z extensions: 0/0 individual commands failed
2023-01-04T09:39:42.5549769Z ERROR: Failed test (/dovecot/imaptest/issue-38/issue-38)!

I had to patch Squat to keep it from running into a "NO [SERVERBUG]" failure.

There appears to be a minimum search term length in fts_lucene, so I'm searching for "bodybody" instead of "body" (as in my original description) to get a match.

slusarz commented on June 15, 2024

Thank you for your assistance in generating tests!

I've pushed a much smaller commit that isolates the issue. Currently, a single test fails in that branch:

ok search or body attachment header x-foo test2
* search 1

Here is what is needed to trigger the failing result:

  • It MUST be an OR search
  • The matching OR clause MUST be in decoded text (i.e. attachment data that is decoded via Tika or decode2text.sh)
  • The non-matching OR clause MUST be in a non-indexed header

The issue is that the flatcurve query correctly finds the term in the indexed attachment text, but it also performs a header search on the non-indexed header. This is the one search category that produces "maybe" matches, since flatcurve itself can't verify which header a term is located in unless it is one of the indexed headers (to, from, cc, etc.). Here, that part of the query returns no results, but the ENTIRE query is marked as a "maybe" search due to current search limitations.

The message is therefore marked as a maybe match and passed back to the FTS core code. That code does no attachment decoding during a manual search (decoding in real time for every non-FTS-indexed search would be crushing in resource use), so it finds neither the body term nor the header term and returns no match.
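
The failure path described above can be reproduced in a toy model. This is a minimal Python sketch under stated assumptions: `Message`, `fts_clause`, `core_verify`, `search_or_combined` and the `INDEXED_HEADERS` set are hypothetical and do not correspond to Dovecot or flatcurve code, and "the index" is modelled as a plain substring check. It shows one combined OR query being flagged "maybe" as a whole, then rejected by a manual check that cannot see decoded attachment text.

```python
from dataclasses import dataclass

# Hypothetical toy model; none of these names come from Dovecot or flatcurve.
INDEXED_HEADERS = {"from", "to", "cc", "bcc", "subject"}  # assumed set

@dataclass
class Message:
    headers: dict          # lowercase header name -> value
    body: str              # plain-text body part
    attachment_text: str   # text only obtainable by decoding (Tika/decode2text.sh)

def fts_clause(msg, clause):
    """Index lookup for one clause: returns (hit, maybe)."""
    kind, name, term = clause
    term = term.lower()
    if kind == "body":
        # The index holds the decoded attachment text alongside the body.
        return term in f"{msg.body} {msg.attachment_text}".lower(), False
    if name in INDEXED_HEADERS:
        return term in msg.headers.get(name, "").lower(), False
    # Non-indexed header: the index only knows "some header has the term".
    return any(term in v.lower() for v in msg.headers.values()), True

def core_verify(msg, clause):
    """Manual check by FTS core: exact headers, but no attachment decoding."""
    kind, name, term = clause
    term = term.lower()
    if kind == "body":
        return term in msg.body.lower()
    return term in msg.headers.get(name, "").lower()

def search_or_combined(msg, clauses):
    """Behaviour described above: one OR query, flagged 'maybe' as a whole."""
    results = [fts_clause(msg, c) for c in clauses]
    hit = any(h for h, _ in results)
    maybe = any(m for _, m in results)
    if not hit:
        return False
    if maybe:
        # The whole OR is re-checked manually; decoded text is unavailable here.
        return any(core_verify(msg, c) for c in clauses)
    return True

msg = Message(headers={"from": "a@example.com", "x-foo": "test1"},
              body="see attached", attachment_text="attachment contents")
query = [("body", None, "attachment"), ("header", "x-foo", "test2")]
print(search_or_combined(msg, query))  # False, although the attachment matches
```

The index does find "attachment", but because the non-indexed `x-foo` clause taints the whole OR as "maybe", the final answer comes from `core_verify`, which only sees the undecoded body.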

AND searches are not affected because the FTS core code breaks up the query before passing it to flatcurve. The body search is returned as a real match and the header search as a maybe match, and the FTS core correctly runs a manual query for the latter, since it has access to the header data.
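
Why splitting rescues AND queries can be sketched the same way. This is a hypothetical Python model, not Dovecot's actual query splitting: `Message`, `resolve_clause`, `search_and` and `INDEXED_HEADERS` are illustrative names, and substring checks stand in for the index. Each clause is resolved on its own, so only the non-indexed header clause is ever "maybe", and its manual check needs headers only.

```python
from dataclasses import dataclass

# Hypothetical toy model; none of these names come from Dovecot or flatcurve.
INDEXED_HEADERS = {"from", "to", "cc", "bcc", "subject"}  # assumed set

@dataclass
class Message:
    headers: dict         # lowercase header name -> value
    body: str             # plain-text body part
    attachment_text: str  # only visible to the index after decoding

def resolve_clause(msg, kind, name, term):
    """Resolve ONE clause on its own, as happens after an AND query is split."""
    term = term.lower()
    if kind == "body":
        # Definite FTS answer: the index also holds the decoded attachment text.
        return term in f"{msg.body} {msg.attachment_text}".lower()
    exact = term in msg.headers.get(name, "").lower()
    if name in INDEXED_HEADERS:
        return exact  # definite FTS answer for an indexed header
    # Non-indexed header: the index only knows "some header has the term",
    # so this clause alone is a maybe; FTS core then checks the exact header,
    # which needs no attachment decoding.
    if not any(term in v.lower() for v in msg.headers.values()):
        return False
    return exact

def search_and(msg, clauses):
    return all(resolve_clause(msg, *c) for c in clauses)

msg = Message({"from": "a@example.com", "x-foo": "test1"},
              "see attached", "attachment contents")
print(search_and(msg, [("body", None, "attachment"), ("header", "x-foo", "test1")]))  # True
```

The body clause keeps its definite answer from the index instead of being re-checked against undecoded text, which is exactly what the combined OR query loses.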

The solution here is tricky and will take some thinking. Either we manually separate ALL OR searches internally within flatcurve, or we flag non-indexed header searches and run those queries separately from the rest of the search string.
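
The second option (flagging non-indexed header searches) might look roughly like this. It is a minimal Python sketch under assumptions, not the change that eventually landed: `Message`, `is_maybe_clause`, `fts_definite`, `core_verify_header`, `search_or_split` and `INDEXED_HEADERS` are all hypothetical. Clauses the index can answer exactly are evaluated first; only the flagged header clauses fall through to a manual check, so they can no longer veto a definite hit.

```python
from dataclasses import dataclass

# Hypothetical toy model; none of these names come from Dovecot or flatcurve.
INDEXED_HEADERS = {"from", "to", "cc", "bcc", "subject"}  # assumed set

@dataclass
class Message:
    headers: dict         # lowercase header name -> value
    body: str             # plain-text body part
    attachment_text: str  # only visible to the index after decoding

def is_maybe_clause(clause):
    """Flag clauses the index cannot answer exactly (non-indexed headers)."""
    kind, name, _ = clause
    return kind == "header" and name not in INDEXED_HEADERS

def fts_definite(msg, clause):
    """FTS lookup for clauses the index can answer exactly."""
    kind, name, term = clause
    if kind == "body":
        haystack = f"{msg.body} {msg.attachment_text}"
    else:
        haystack = msg.headers.get(name, "")
    return term.lower() in haystack.lower()

def core_verify_header(msg, clause):
    """Manual check of one non-indexed header clause (no decoding needed)."""
    _, name, term = clause
    return term.lower() in msg.headers.get(name, "").lower()

def search_or_split(msg, clauses):
    definite = [c for c in clauses if not is_maybe_clause(c)]
    flagged = [c for c in clauses if is_maybe_clause(c)]
    # A definite hit satisfies the whole OR; the flagged part cannot veto it.
    if any(fts_definite(msg, c) for c in definite):
        return True
    # Otherwise only the flagged header clauses go to manual verification.
    return any(core_verify_header(msg, c) for c in flagged)

msg = Message({"from": "a@example.com", "x-foo": "test1"},
              "see attached", "attachment contents")
print(search_or_split(msg, [("body", None, "attachment"), ("header", "x-foo", "test2")]))  # True
```

The design choice is that "maybe" status attaches to individual clauses rather than to the whole query, which preserves OR semantics without asking the core to decode attachments.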

slusarz commented on June 15, 2024

Fixed by MR #41
