Comments (13)
You'll have to make your own parser.py file (PLY is not very flexible on that point, see issue 49).
You could change PHRASE_RE, but I imagine you would have to verify that there are as many " at the beginning as at the end. So you may be better off duplicating most of it! Adding a DPHRASE_RE that copies PHRASE_RE but with double quotes, and a t_DPHRASE similar to t_PHRASE, is probably the best way to go.
from luqum.
Hi @stevesmit, where does this doubled double quote come from?
Is it normally supported by Lucene? As far as I know, it is not. See https://lucene.apache.org/core/2_9_4/queryparsersyntax.html
As far as I understand, the parser is right here; it's the expression you should fix.
from luqum.
It comes from a company's proprietary query syntax which is very Lucene-like, I suppose. Fair enough - any thoughts on editing the grammar specification to allow parsing such expressions? I don't mind (and would have to!) editing a bit of source on my side.
from luqum.
Alright, I tried that, but unfortunately it doesn't parse it (still giving the same output as before). I added the following two pieces of code to parser.py, as you mentioned:
DPHRASE_RE = r'''
(?P<phrase>  # phrase
  ""         # opening double quotes
  (?:        # repeating
    [^\\"]   # - a char which is not escape or end of phrase
    |        # OR
    \\.      # - an escaped char
  )*
  ""         # closing double quotes
)'''

@lex.TOKEN(DPHRASE_RE)
def t_DPHRASE(t):
    m = re.match(DPHRASE_RE, t.value, re.VERBOSE)
    value = m.group("phrase")
    t.value = Phrase(value)
    return t
Is there any other code that needs to be edited to take account of this change?
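For what it's worth, the new pattern can be checked on its own with the standard re module, outside of PLY. This is only a sketch: match_dphrase is a hypothetical helper and the sample strings are made up for illustration. If it behaves as below, the regex itself is probably fine, and whatever keeps the output unchanged is more likely in the token list, the grammar rules, or the matching order.

```python
import re

# Same pattern as above, compiled in verbose mode
# (the t_DPHRASE code above also passes re.VERBOSE).
DPHRASE_RE = r'''
(?P<phrase>  # phrase
  ""         # opening double quotes
  (?:        # repeating
    [^\\"]   # - a char which is not escape or end of phrase
    |        # OR
    \\.      # - an escaped char
  )*
  ""         # closing double quotes
)'''

_dphrase = re.compile(DPHRASE_RE, re.VERBOSE)


def match_dphrase(text):
    """Hypothetical helper: return the matched phrase at the start of text, or None."""
    m = _dphrase.match(text)
    return m.group("phrase") if m else None


print(match_dphrase('""foo bar""'))  # the whole doubled-quote phrase
print(match_dphrase('"foo bar"'))    # None: only one opening quote
```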
from luqum.
Maybe add DPHRASE in precedence, before PHRASE?
from luqum.
Also add DPHRASE in tokens.
from luqum.
Alright I did that, and I got the following notice when loading the library:
WARNING: Token 'DPHRASE' defined, but not used
WARNING: Token 'SEPARATOR' defined, but not used
WARNING: There are 2 unused tokens
Generating LALR tables
WARNING: 11 shift/reduce conflicts
Output is still the same as before :/
from luqum.
Yes sorry, you have to write a rule:
def p_double_quoting(p):
    'unary_expression : DPHRASE'
    p[0] = p[1]
from luqum.
Alright did that, now got this warning when importing the library:
WARNING: Token 'SEPARATOR' defined, but not used
WARNING: There are 1 unused tokens
Generating LALR tables
WARNING: 11 shift/reduce conflicts
And still get the same output when parsing. Is there anywhere I need to point to this rule that I've added?
from luqum.
Yes, sorry. You should also try things yourself ;-) Just mimic what's done for PHRASE and report back here when you're done!
So yes, you have to add it there:
def p_phrase_or_term(p):
    '''phrase_or_term : TERM
                      | PHRASE
                      | DPHRASE'''
    p[0] = p[1]
Also, you may want to add it to p_proximity:
def p_proximity(p):
    '''unary_expression : PHRASE APPROX
                        | DPHRASE APPROX'''
    p[0] = Proximity(p[1], p[2])
from luqum.
Unfortunately still getting the exact same output as before after trying that.
from luqum.
Maybe you have to change PHRASE_RE so that it does not match "" alone? Or at least not "" followed by some char.
So maybe:
PHRASE_RE = r'''
(?P<phrase>  # phrase
  "          # opening quote
  (?:        # repeating
    [^\\"]   # - a char which is not escape or end of phrase
    |        # OR
    \\.      # - an escaped char
  )+
  "          # closing quote
  |          # or
  ""(?!\w)   # empty quote but no char after
)'''
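To see why this could matter, here is a rough sketch using only the standard re module. It assumes PHRASE is tried before DPHRASE (PLY tries function-defined tokens in the order they are defined, and I'm assuming t_PHRASE comes first in your parser.py). first_token is a hypothetical helper, the OLD_/NEW_ names are just labels for this comparison, and the DPHRASE group is renamed to dphrase so both patterns fit in one alternation.

```python
import re

# PHRASE pattern before the change: zero or more chars between the quotes.
OLD_PHRASE_RE = r'''
(?P<phrase>
  "
  (?: [^\\"] | \\. )*
  "
)'''

# PHRASE pattern after the change: at least one char between the quotes,
# or an empty "" that is not directly followed by a word character.
NEW_PHRASE_RE = r'''
(?P<phrase>
  "
  (?: [^\\"] | \\. )+
  "
  |
  ""(?!\w)
)'''

DPHRASE_RE = r'''
(?P<dphrase>
  ""
  (?: [^\\"] | \\. )*
  ""
)'''


def first_token(phrase_re, text):
    """Hypothetical helper: try PHRASE first, then DPHRASE, at the start of text."""
    master = re.compile("(?:%s)|(?:%s)" % (phrase_re, DPHRASE_RE), re.VERBOSE)
    m = master.match(text)
    if m is None:
        return None
    name = m.lastgroup  # name of the alternative that matched
    return name.upper(), m.group(name)


print(first_token(OLD_PHRASE_RE, '""foo bar""'))  # ('PHRASE', '""')
print(first_token(NEW_PHRASE_RE, '""foo bar""'))  # ('DPHRASE', '""foo bar""')
print(first_token(NEW_PHRASE_RE, '"foo bar"'))    # ('PHRASE', '"foo bar"')
```

With the old pattern, the leading "" is consumed as an empty PHRASE, so DPHRASE never gets a chance. With the new one, PHRASE refuses "" when a word character follows, and DPHRASE can match the whole thing.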
from luqum.
@alexgarel That seems to have done the trick! Thanks very much.
from luqum.