
Comments (13)

alexgarel commented on August 9, 2024

You'll have to make your own parser.py file (PLY is not very flexible on that point, see issue 49).

You can change PHRASE_RE, but I imagine you would have to verify that there are as many " at the beginning as at the end, so you would end up duplicating most of it! The best way to go is probably to add a DPHRASE_RE that copies PHRASE_RE but with doubled quotes, plus a t_DPHRASE modeled on t_PHRASE.

from luqum.

alexgarel commented on August 9, 2024

Hi @stevesmit, where does this doubled double quote come from?
Is it normally supported by Lucene? As far as I can tell, it's not. See https://lucene.apache.org/core/2_9_4/queryparsersyntax.html

From what I understand, the parser is right in this case; it's the expression you should fix.

stevesmit commented on August 9, 2024

It comes from a company's proprietary query syntax which is very Lucene-like, I suppose. Fair enough - any thoughts on editing the grammar specification to allow parsing such expressions? I don't mind (and would have to!) editing a bit of source on my side.

stevesmit commented on August 9, 2024

Alright, I tried that, but unfortunately it still doesn't parse (same output as before). I added the following two pieces of code to parser.py, as you mentioned:

DPHRASE_RE = r'''
(?P<phrase>  # phrase
  ""          # opening double quotes
  (?:        # repeating
    [^\\"]   # - a char which is not escape or end of phrase
    |        # OR
    \\.      # - an escaped char
  )*
  ""          # closing double quotes
)'''
@lex.TOKEN(DPHRASE_RE)
def t_DPHRASE(t):
    m = re.match(DPHRASE_RE, t.value, re.VERBOSE)
    value = m.group("phrase")
    t.value = Phrase(value)
    return t

Is there any other code that needs to be edited to take account of this change?
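
One way to narrow this down is to try the pattern on its own with `re`, outside the lexer. The sketch below is only a sanity check under assumptions: the helper `match_dphrase` and the sample inputs are hypothetical, not taken from luqum or from this thread.

```python
import re

# Pattern copied from the comment above; only the comment alignment differs.
DPHRASE_RE = r'''
(?P<phrase>  # phrase
  ""         # opening double quotes
  (?:        # repeating
    [^\\"]   # - a char which is not escape or end of phrase
    |        # OR
    \\.      # - an escaped char
  )*
  ""         # closing double quotes
)'''


def match_dphrase(text):
    """Return what DPHRASE_RE matches at the start of `text`, or None.

    Hypothetical helper for illustration; it is not part of luqum.
    """
    m = re.match(DPHRASE_RE, text, re.VERBOSE)
    return m.group("phrase") if m else None


print(match_dphrase('""foo bar""'))  # prints ""foo bar""
print(match_dphrase('"foo bar"'))    # prints None
```

If the pattern matches in isolation like this, the regex itself is probably not the problem, and it is worth looking at how the token is registered and at which token rule gets to match first.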

alexgarel commented on August 9, 2024

Maybe add DPHRASE to precedence, before PHRASE?

alexgarel commented on August 9, 2024

Also add DPHRASE to tokens.

stevesmit commented on August 9, 2024

Alright I did that, and I got the following notice when loading the library:

WARNING: Token 'DPHRASE' defined, but not used
WARNING: Token 'SEPARATOR' defined, but not used
WARNING: There are 2 unused tokens
Generating LALR tables
WARNING: 11 shift/reduce conflicts

Output is still the same as before :/

alexgarel commented on August 9, 2024

Yes sorry, you have to write a rule:

def p_double_quoting(p):
    'unary_expression : DPHRASE'
    p[0] = p[1]

stevesmit commented on August 9, 2024

Alright, I did that, and now I get this warning when importing the library:

WARNING: Token 'SEPARATOR' defined, but not used
WARNING: There are 1 unused tokens
Generating LALR tables
WARNING: 11 shift/reduce conflicts

And still get the same output when parsing. Is there anywhere I need to point to this rule that I've added?

alexgarel commented on August 9, 2024

Yes, sorry. You should also try things yourself ;-) Just mimic what's done for PHRASE and report back here when you're done!

So yes, you have to add it here:

def p_phrase_or_term(p):
    '''phrase_or_term : TERM
                      | PHRASE
                      | DPHRASE'''
    p[0] = p[1]

Also you may want to add it to p_proximity:

def p_proximity(p):
    '''unary_expression : PHRASE APPROX
                        | DPHRASE APPROX'''
    p[0] = Proximity(p[1], p[2])

stevesmit commented on August 9, 2024

Unfortunately still getting the exact same output as before after trying that.

alexgarel commented on August 9, 2024

Maybe you have to change PHRASE_RE so that it does not match "" alone? Or at least so that it does not match "" when it is followed by a word character.

So maybe:

PHRASE_RE = r'''
(?P<phrase>  # phrase
  "          # opening quote
  (?:        # repeating
    [^\\"]   # - a char which is not escape or end of phrase
    |        # OR
    \\.      # - an escaped char
  )+
  "          # closing quote
  |          # or
  ""(?!\w)   # empty quote but no char after
)'''
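
To see why this change could help, the two patterns can be compared with `re` alone. This is a sketch under assumptions: `ORIGINAL_PHRASE_RE` is inferred from the DPHRASE_RE shown earlier in the thread (same shape, single quotes) rather than copied from luqum, and `phrase_at_start` and the sample inputs are hypothetical.

```python
import re

# ASSUMPTION: the original pattern, inferred from the thread, not copied from luqum.
ORIGINAL_PHRASE_RE = r'''
(?P<phrase>  # phrase
  "          # opening quote
  (?:        # repeating
    [^\\"]   # - a char which is not escape or end of phrase
    |        # OR
    \\.      # - an escaped char
  )*
  "          # closing quote
)'''

# The modified pattern proposed in the comment above.
MODIFIED_PHRASE_RE = r'''
(?P<phrase>  # phrase
  "          # opening quote
  (?:        # repeating, at least once
    [^\\"]   # - a char which is not escape or end of phrase
    |        # OR
    \\.      # - an escaped char
  )+
  "          # closing quote
  |          # or
  ""(?!\w)   # empty quote but no word char after
)'''


def phrase_at_start(pattern, text):
    """Return what `pattern` matches at the start of `text`, or None (hypothetical helper)."""
    m = re.match(pattern, text, re.VERBOSE)
    return m.group("phrase") if m else None


sample = '""foo bar""'
print(phrase_at_start(ORIGINAL_PHRASE_RE, sample))       # prints ""
print(phrase_at_start(MODIFIED_PHRASE_RE, sample))       # prints None
print(phrase_at_start(MODIFIED_PHRASE_RE, '"foo bar"'))  # prints "foo bar"
print(phrase_at_start(MODIFIED_PHRASE_RE, '"" AND x'))   # prints ""
```

Assuming the lexer tries PHRASE before DPHRASE, the original pattern would consume the leading "" as an empty phrase, so DPHRASE never gets a chance. The modified pattern declines at that position and leaves the input to DPHRASE, while ordinary phrases and a standalone "" still match.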

stevesmit commented on August 9, 2024

@alexgarel That seems to have done the trick! Thanks very much.
