Allowing double quotes (luqum issue, closed)

stevesmit commented on August 9, 2024

Allowing double quotes

Comments (13)

alexgarel commented on August 9, 2024

You'll have to make your own parser.py file (PLY is not very flexible on that point, see issue 49).

You can change PHRASE_RE, but I imagine you would then have to verify that there are as many " at the beginning as at the end. So you may be better off duplicating most of it! Adding a DPHRASE_RE that copies PHRASE_RE but uses double quotes, plus a t_DPHRASE modeled on t_PHRASE, is probably the best way to go.
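To make the "same number of quotes on both sides" concern concrete, here is a hypothetical alternative (not what this thread ends up doing): Python's `re` can require the closing run of quotes to equal the opening run with a named backreference, `(?P=name)`. `QUOTED_RE` and its `quote` group are illustrative names, not luqum code.

```python
import re

# Hypothetical single pattern for both "..." and ""..."" (not luqum's PHRASE_RE):
# the opening run of one or two quotes is captured as `quote`, and (?P=quote)
# demands exactly the same run at the end.
QUOTED_RE = re.compile(r'''
(?P<phrase>
  (?P<quote>"{1,2})   # one or two opening quotes
  (?:                 # repeating
    [^\\"]            # - a char which is not escape or quote
    |                 # OR
    \\.               # - an escaped char
  )*
  (?P=quote)          # the same number of closing quotes
)''', re.VERBOSE)

for text in ['"foo bar"', '""foo bar""', '""foo bar"']:
    m = QUOTED_RE.fullmatch(text)
    print(text, "->", m.group("quote") if m else None)
# "foo bar" -> "
# ""foo bar"" -> ""
# ""foo bar" -> None
```

Because the regex engine backtracks from two opening quotes to one, a bare `""` still matches, as an empty single-quoted phrase.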

alexgarel commented on August 9, 2024

Hi @stevesmit, where does this double double quote come from?
Is it normally supported by Lucene? As far as I know, it is not. See https://lucene.apache.org/core/2_9_4/queryparsersyntax.html

As far as I understand, the parser is right here; it's the expression you should fix.
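If the expression is what should change, one option is to rewrite doubled quotes into ordinary ones before the string ever reaches luqum. A minimal sketch with the standard `re` module; `normalize_double_quotes` is a hypothetical helper, not part of luqum, and it assumes the phrase body never contains a bare `"`.

```python
import re

# Hypothetical pre-processing step (not part of luqum): ""some text"" -> "some text"
DOUBLED_PHRASE = re.compile(r'''
  ""              # opening double quotes
  (?!\s)          # not followed by whitespace, so an empty "" before AND/OR is left alone
  (               # group 1: the phrase body, at least one char
    (?:
      [^\\"]      # - a char which is not escape or quote
      |           # OR
      \\.         # - an escaped char
    )+
  )
  ""              # closing double quotes
''', re.VERBOSE)


def normalize_double_quotes(query: str) -> str:
    """Replace each doubled-quote phrase with a standard single-quote phrase."""
    return DOUBLED_PHRASE.sub(r'"\1"', query)


print(normalize_double_quotes('title:""foo bar"" AND body:baz'))
# title:"foo bar" AND body:baz
```

The normalized string can then go to luqum's parser as usual (for example `parser.parse(...)` from `luqum.parser`). This keeps luqum untouched, at the cost of losing the distinction between single and doubled quotes in the parsed tree.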

stevesmit commented on August 9, 2024

It comes from a company's proprietary query syntax, which is very Lucene-like, I suppose. Fair enough. Any thoughts on editing the grammar specification to allow parsing such expressions? I don't mind (and would have to!) editing a bit of source on my side.

stevesmit commented on August 9, 2024

Alright, I tried that, but unfortunately it still doesn't parse (same output as before). I added the following two pieces of code to parser.py, as you suggested:

DPHRASE_RE = r'''
(?P<phrase>  # phrase
  ""         # opening double quotes
  (?:        # repeating
    [^\\"]   # - a char which is not escape or end of phrase
    |        # OR
    \\.      # - an escaped char
  )*
  ""         # closing double quotes
)'''

@lex.TOKEN(DPHRASE_RE)
def t_DPHRASE(t):
    m = re.match(DPHRASE_RE, t.value, re.VERBOSE)
    value = m.group("phrase")
    t.value = Phrase(value)
    return t

Is there any other code that needs to be edited to take account of this change?

alexgarel commented on August 9, 2024

Maybe add DPHRASE to precedence, before PHRASE?

alexgarel commented on August 9, 2024

Also add DPHRASE to tokens.

stevesmit commented on August 9, 2024

Alright, I did that, and got the following warnings when loading the library:

WARNING: Token 'DPHRASE' defined, but not used
WARNING: Token 'SEPARATOR' defined, but not used
WARNING: There are 2 unused tokens
Generating LALR tables
WARNING: 11 shift/reduce conflicts

Output is still the same as before :/
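A likely reason the output never changes lies in the lexer rather than the grammar. Per the PLY documentation, token rules defined as functions are added to the master regular expression in the order they appear in the file, and a regex alternation takes the first alternative that matches, not the longest. PLY's `precedence` table only steers how the parser resolves shift/reduce conflicts; it does not change which token the lexer emits. The sketch below mimics that lexer with plain `re`. It assumes the stock PHRASE body is the single-quote version of DPHRASE_RE, and `TERM`/`WS` are hypothetical, simplified stand-ins for luqum's real rules.

```python
import re

# Assumed stock PHRASE body: DPHRASE_RE with single quotes (inner group name dropped)
PHRASE = r'"(?:[^\\"]|\\.)*"'
# DPHRASE body from the comment above (inner group name dropped)
DPHRASE = r'""(?:[^\\"]|\\.)*""'
# Hypothetical, simplified stand-ins for luqum's other token rules
TERM = r'[^\s"]+'
WS = r'\s+'


def tokenize(text, rules):
    """Join the rules into one alternation, in the given order, like PLY's master regex."""
    master = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in rules))
    tokens, pos = [], 0
    while pos < len(text):
        m = master.match(text, pos)
        if m is None:
            raise ValueError(f"illegal character {text[pos]!r} at {pos}")
        if m.lastgroup != "WS":  # drop whitespace tokens
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


# t_PHRASE defined above t_DPHRASE: the empty phrase "" wins at position 0
print(tokenize('""foo bar""', [("PHRASE", PHRASE), ("DPHRASE", DPHRASE), ("TERM", TERM), ("WS", WS)]))
# [('PHRASE', '""'), ('TERM', 'foo'), ('TERM', 'bar'), ('PHRASE', '""')]

# t_DPHRASE defined above t_PHRASE
print(tokenize('""foo bar""', [("DPHRASE", DPHRASE), ("PHRASE", PHRASE), ("TERM", TERM), ("WS", WS)]))
# [('DPHRASE', '""foo bar""')]
```

Under these assumptions there are two ways out: define t_DPHRASE above t_PHRASE in parser.py (not tried in this thread), or narrow PHRASE_RE so it no longer matches the leading `""`.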

alexgarel commented on August 9, 2024

Yes sorry, you have to write a rule:

def p_double_quoting(p):
    'unary_expression : DPHRASE'
    p[0] = p[1]

stevesmit commented on August 9, 2024

Alright, did that; now I get these warnings when importing the library:

WARNING: Token 'SEPARATOR' defined, but not used
WARNING: There are 1 unused tokens
Generating LALR tables
WARNING: 11 shift/reduce conflicts

And I still get the same output when parsing. Is there anywhere I need to reference the rule I've added?

alexgarel commented on August 9, 2024

Yes, sorry, but you should also try it yourself ;-) Just mimic what's done for PHRASE and report back here when you're done!

So yes, you have to add it here:

def p_phrase_or_term(p):
    '''phrase_or_term : TERM
                      | PHRASE
                      | DPHRASE'''
    p[0] = p[1]

Also you may want to add it to p_proximity:

def p_proximity(p):
    '''unary_expression : PHRASE APPROX
                        | DPHRASE APPROX'''
    p[0] = Proximity(p[1], p[2])

stevesmit commented on August 9, 2024

Unfortunately still getting the exact same output as before after trying that.

alexgarel commented on August 9, 2024

Maybe you have to change PHRASE_RE so that it does not match "" alone? Or at least not "" followed by some char.

So maybe:

PHRASE_RE = r'''
(?P<phrase>  # phrase
  "          # opening quote
  (?:        # repeating
    [^\\"]   # - a char which is not escape or end of phrase
    |        # OR
    \\.      # - an escaped char
  )+
  "          # closing quote
  |          # or
  ""(?!\w)   # empty quote but no (word) char after
)'''
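The revised pattern can be checked outside PLY by trying the two rules by hand in file order (PHRASE first, then DPHRASE), which is how PLY's master regex picks a token. Both patterns are copied from this thread; `first_token` is a hypothetical helper, and it assumes t_PHRASE sits above t_DPHRASE in parser.py.

```python
import re

# Revised PHRASE_RE from the comment above
PHRASE_RE = r'''
(?P<phrase>  # phrase
  "          # opening quote
  (?:        # repeating
    [^\\"]   # - a char which is not escape or end of phrase
    |        # OR
    \\.      # - an escaped char
  )+
  "          # closing quote
  |          # or
  ""(?!\w)   # empty quote but no (word) char after
)'''

# DPHRASE_RE from stevesmit's earlier comment
DPHRASE_RE = r'''
(?P<phrase>  # phrase
  ""         # opening double quotes
  (?:        # repeating
    [^\\"]   # - a char which is not escape or end of phrase
    |        # OR
    \\.      # - an escaped char
  )*
  ""         # closing double quotes
)'''


def first_token(text):
    """Return (token_name, value) for the first rule matching at position 0, in file order."""
    for name, pattern in (("PHRASE", PHRASE_RE), ("DPHRASE", DPHRASE_RE)):
        m = re.match(pattern, text, re.VERBOSE)
        if m:
            return name, m.group("phrase")
    return None


print(first_token('""foo bar""'))  # ('DPHRASE', '""foo bar""')
print(first_token('"" AND x'))     # ('PHRASE', '""')
```

The lookahead only rejects word characters, so a doubled phrase that starts with a space or with punctuation such as `(` will still lex as an empty PHRASE followed by other tokens.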

stevesmit commented on August 9, 2024

@alexgarel That seems to have done the trick! Thanks very much.
