invenio-query-parser's Introduction

Invenio-Query-Parser

About

Search query parser supporting Invenio and SPIRES search syntax.

Installation

Invenio-Query-Parser is on PyPI so all you need is:

pip install invenio-query-parser

Documentation

Documentation is readable at http://invenio-query-parser.readthedocs.io or it can be built using Sphinx:

pip install invenio-query-parser[docs]
python setup.py build_sphinx

Testing

Running the test suite is as simple as:

python setup.py test

invenio-query-parser's People

Contributors

alizeepace, jalavik, jirikuncar, lnielsen, osso, panos512, samihiltunen, tiborsimko

invenio-query-parser's Issues

ES query parser inconsistency comparing to Q on empty string and no parameters

I don't know whether this is a bug or intended design, but the IQ function from invenio_query_parser.contrib.elasticsearch and the standard Q function from elasticsearch_dsl.query behave differently when called without parameters and when called with an empty string.

from elasticsearch_dsl.query import Q
Q()
# MatchAll()

from invenio_query_parser.contrib.elasticsearch import IQ
IQ()
# TypeError: invenio_query() missing 1 required positional argument: 'pattern'

and

from elasticsearch_dsl.query import Q
Q('') 
# ** elasticsearch_dsl.exceptions.UnknownDslObject: DSL class `` does not exist in query.

from invenio_query_parser.contrib.elasticsearch import IQ
IQ('')
# MatchAll()

Would it make sense to make IQ behave the same way as Q?
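
One possible way to reconcile the two behaviours is a thin wrapper that treats a missing pattern and a blank pattern alike. This is only a sketch: query_or_match_all and fake_parse are hypothetical names, and the plain dict stands in for whatever match-all object the caller prefers (real code would more likely return Q() and pass IQ as the parser).

```python
def query_or_match_all(parse, pattern=None):
    """Delegate to `parse`, treating a missing or blank pattern as match-all."""
    if pattern is None or not pattern.strip():
        # Plain-dict form of the Elasticsearch match_all query (stand-in for Q()).
        return {"match_all": {}}
    return parse(pattern)


def fake_parse(pattern):
    """Stand-in parser for illustration only; real code would pass IQ instead."""
    return {"query_string": {"query": pattern}}


no_args = query_or_match_all(fake_parse)
empty = query_or_match_all(fake_parse, "")
real = query_or_match_all(fake_parse, "author:ellis")
```

With this wrapper both the no-argument call and the empty-string call produce a match-all query, which is the lenient reading of the question above. The strict alternative would be to raise an error in both cases.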

spires: journal searching

Originally by hoc on 2011-08-09

find j Phys.Rev.,D41,2330 [works]
http://inspirebeta.net/search?ln=en&ln=en&p=find+j+Phys.Rev.%2CD41%2C2330

find j Phys.Rev., D41,2330 [does not work]
http://inspirebeta.net/search?ln=en&ln=en&p=find+j+Phys.Rev.%2C+D41%2C2330

This whitespace rule is far too strict. Whitespace following punctuation should be ignored:
([,.:])\s+ -> $1

As a follow-on, if we display publications in the following form:
Phys.Rev. D41 (1990) 2330
why can't people search on them in this form? It seems like an obvious thing they'd try, without having to learn another form for searching.
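
A minimal sketch of both ideas in Python. The first function applies the substitution rule quoted above (Python writes the back-reference as \1 rather than $1). The second function and its DISPLAY_FORM pattern are hypothetical: they assume the displayed form is always "journal volume (year) page", which will not hold for every journal.

```python
import re


def normalize_punctuation_whitespace(query):
    """Drop whitespace that follows a comma, period, or colon."""
    return re.sub(r"([,.:])\s+", r"\1", query)


# Hypothetical pattern for the displayed form "Phys.Rev. D41 (1990) 2330".
DISPLAY_FORM = re.compile(
    r"^(?P<journal>.+?)\s+(?P<volume>[A-Z]?\d+)\s+\((?P<year>\d{4})\)\s+(?P<page>\S+)$"
)


def display_to_search_form(reference):
    """Turn 'Journal Vol (Year) Page' into 'Journal,Vol,Page'; else return as is."""
    match = DISPLAY_FORM.match(reference.strip())
    if match is None:
        return reference
    return "{journal},{volume},{page}".format(**match.groupdict())


normalized = normalize_punctuation_whitespace("find j Phys.Rev., D41,2330")
converted = display_to_search_form("Phys.Rev. D41 (1990) 2330")
```

Note that applying the whitespace rule to a whole query would also rewrite author terms such as "Ellis, J", so in practice it would probably have to be limited to the value of a journal search.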

SPIRES query "find foo witten"

Originally on 2012-08-24

Currently, when users use SPIRES search syntax, find foo witten is being interpreted as a Boolean search for the word foo and the word witten when foo was not recognised as being one of valid SPIRES syntax operators.

When one typed find foo witten into SPIRES, the system replied "Your search term was not valid for this database. Please try your search again.", which was much more reasonable behaviour.

We may want to fix this on INSPIRE too; otherwise users may be confused when they see some hits just because foo matches a small subset of real records.
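
The stricter behaviour could be sketched as a check that runs before parsing. Everything named here is hypothetical: the keyword set is only an illustrative subset taken from the queries quoted in these issues, and InvalidSpiresQuery and check_spires_query are not part of invenio-query-parser.

```python
# Illustrative subset only; the real list of SPIRES keywords is much longer.
KNOWN_SPIRES_KEYWORDS = {
    "a", "j", "d", "date", "title", "cn", "collaboration", "keyword", "name",
}


class InvalidSpiresQuery(ValueError):
    """Raised when the term after `find` is not a known SPIRES keyword."""


def check_spires_query(query):
    """Return the query unchanged, or raise if `find` is followed by an unknown keyword."""
    tokens = query.split()
    if not tokens or tokens[0].lower() not in ("find", "fin"):
        return query  # not SPIRES syntax; leave it to the Invenio parser
    first_term = tokens[1].lstrip("(").lower() if len(tokens) > 1 else ""
    if first_term not in KNOWN_SPIRES_KEYWORDS:
        raise InvalidSpiresQuery(
            "Your search term was not valid for this database. "
            "Please try your search again."
        )
    return query
```

Here find a witten passes through untouched, while find foo witten raises instead of silently becoming a Boolean search for foo and witten.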

parser: is not working with special character

It is impossible to search for terms containing special characters, such as "Gård".

Some pointers for the resolution:

In [9]: from invenio.modules.search.api import parser

In [13]: import pypeg2

In [14]: pypeg2.parse("toto", parser(), whitespace="")
Out[14]: Main(Query([SimpleQuery(ValueQuery(Value(SimpleValue('toto'))))]))

In [15]: pypeg2.parse("Gård", parser(), whitespace="")
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-15-64494a74f7c6> in <module>()
----> 1 pypeg2.parse("Gård", parser(), whitespace="") ...

In [16]: from __future__ import unicode_literals, print_function

In [17]: pypeg2.parse("Gård", parser(), whitespace="")
Out[17]: Main(Query([SimpleQuery(ValueQuery(Value(SimpleValue(u'G\xe5rd'))))]))

From the pyPEG documentation:

Caveat: pyPEG 2.x is written for Python 3. You can use it with Python 2.7 with the following import (you don't need that for Python 3):

from __future__ import unicode_literals, print_function

originally submitted by @PXke as inveniosoftware/invenio#3296
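
Besides the __future__ import, a caller can make sure the query is a text string before it reaches the parser. This is a minimal sketch: ensure_text is a hypothetical helper, and UTF-8 is an assumed input encoding.

```python
def ensure_text(query, encoding="utf-8"):
    """Return `query` as a text string, decoding it first if it is bytes."""
    if isinstance(query, bytes):
        # Assumption: byte strings arriving from the web layer are UTF-8.
        return query.decode(encoding)
    return query


raw = "Gård".encode("utf-8")
text = ensure_text(raw)
```

The result can then be handed to pypeg2.parse as in the session above. On Python 2, bytes is an alias of str, so plain string literals are decoded as well.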

Missing operators <, >, >=, <=

The DSL is currently missing support for the operators <, >, <=, >=. When these are added, tests for these operators should be enabled in test_walkers.py. The tests were added in PR #42.
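
For the Elasticsearch backend, the four operators map naturally onto the bounds of a range query (lt, lte, gt, gte). The sketch below builds the plain-dict form of such a query. RANGE_OPERATORS and range_clause are hypothetical names, not part of the existing walkers.

```python
# Comparison operator -> bound name used by the Elasticsearch range query.
RANGE_OPERATORS = {"<": "lt", "<=": "lte", ">": "gt", ">=": "gte"}


def range_clause(field, operator, value):
    """Build a range query dict for one comparison such as year > 2009."""
    try:
        bound = RANGE_OPERATORS[operator]
    except KeyError:
        raise ValueError("unsupported operator: {0!r}".format(operator))
    return {"range": {field: {bound: value}}}


after_2009 = range_clause("year", ">", 2009)
```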

Create "docs" folder

  • create docs folder
  • enable hook for Read the Docs
  • add test for strict sphinx build

RFC search: introduce new operator => (inclusive "date after" search)

@invenio-developers commented on Wed Apr 30 2014

Originally by annetteh on 2011-07-22

The search "date after year1" does not exclude year1.
"fin a ellis and date > 2009" gives the same result as "author:ellis date:2009->2011"


@invenio-developers commented on Wed Apr 30 2014

Originally by valkyrie (@valkyriesavage) on 2011-08-04

I am looking at this ticket, and Joe is sitting across the room from me. We were discussing the possibility of adding a new operator to invenio syntax that will indicate an open interval instead of a closed interval. Thoughts on this:

Introduce new operator =>
( '=' implies closed interval (at least in my head), so maybe => becomes the new -> )

2009->2011 is (2009, 2011) , the open interval not including either 2009 or 2011
2009=>2011 is [2009, 2011] , the closed interval including both 2009 and 2011

Does anyone have thoughts on this? It seems strange to change the meaning of the good ol' arrow operator, but we didn't have any other terrific ideas for a new operator that didn't make the old one confusing.


@invenio-developers commented on Wed Apr 30 2014

Originally by hoc on 2011-08-04

Seems like it might cause some confusion. I suspect anyone searching "2009->2011" would think it would include both 2009 and 2011.

The problem here is people searching "date > 1999" want to get everything from 2000 on but not 1999. Why not handle dates with the same syntax used for topcite and do this search with the command "date 2000+"? That way the arrow operator could keep its meaning and people would have an intuitive way to search on dates using something they're already familiar with for citations.


@jirikuncar commented on Wed Mar 25 2015

Introduce new operator => (inveniosoftware/invenio#755 (comment))

@tiborsimko @jalavik shall we make an RFC from this issue?


@tiborsimko commented on Thu Mar 26 2015

@jirikuncar We can transform this very issue into an RFC here? Saves electrons :)

Some quick thoughts from me:

  • If "date after" is the only use case, then I like Heath's proposal for "2000+", since it would preserve the behaviour of the good old arrow operator, and people are used to type N+ for citation count queries or author count queries already.
  • If there are some use case for non-numeric alphabetic queries (e.g. "subject:hep->iii"), then we should ponder how these would read with inclusive/exclusive operators.
  • Depending on the wanted milestone, we can take inspiration from Elasticsearch syntax here?

@jirikuncar commented on Mon Mar 30 2015

  • updated:<=2015-01-01
  • numberofauthors:>100
  • year:>2009 and year:<2015

@kaplun commented on Mon Mar 30 2015

👍
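
To make the proposals in this thread concrete, here is a small sketch that splits the suggested terms into their parts. The regular expression, the "=" placeholder for "no operator", and both function names are assumptions for illustration. The arrow is treated as inclusive, as the comments above expect, and N+ means "N or later".

```python
import re

# keyword, optional comparison operator, value (e.g. "updated:<=2015-01-01")
TERM = re.compile(r"^(?P<keyword>\w+):(?P<operator><=|>=|<|>)?(?P<value>.+)$")


def parse_term(term):
    """Split 'keyword:[operator]value' into a (keyword, operator, value) tuple."""
    match = TERM.match(term)
    if match is None:
        raise ValueError("not a keyword term: {0!r}".format(term))
    # "=" is a placeholder chosen here for terms without a comparison operator.
    return match.group("keyword"), match.group("operator") or "=", match.group("value")


def parse_range_value(value):
    """Return inclusive (lower, upper) bounds; None means unbounded."""
    if "->" in value:
        lower, upper = value.split("->", 1)
        return lower, upper
    if value.endswith("+"):
        return value[:-1], None
    return value, value


updated = parse_term("updated:<=2015-01-01")
authors = parse_term("numberofauthors:>100")
span = parse_range_value("2009->2011")
open_ended = parse_range_value("2000+")
```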

spires: syntax converter parenthesis bug

In inspirehep/invenio#71 @tsgit reports:
[...]
these 2 searches
find collaboration LIGO and a whiting, b f and a Weiss, r
find collaboration LIGO and (a whiting, b f and a Weiss, r)
return vastly different results (90 vs 152 records)

this is because
find collaboration LIGO and (a whiting, b f and a Weiss, r)
is incorrectly converted to
collaboration:LIGO and ((author:"whiting, b* f*") and (author:"Weiss, r)*" or exactauthor:"Weiss, r *" or exactauthor:"Weiss, r" or author:"Weiss, r), *")

where you see the closing paren becomes part of the author search term. Invenio search then ignores all parens, since they are mismatched; the warning message is

"Search syntax misunderstood. Ignoring all parentheses in the query. If this doesn't help, please check your search and try again."

so now we have a boolean or search for Weiss, r which finds additional records. The underlying problem is the spires-to-invenio syntax conversion's handling of the parenthesized expression.
[...]
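
One way to keep the closing parenthesis out of the author term is to split it off before the term is expanded and re-attach it afterwards. The helper below is a hypothetical sketch, not the converter's actual code, and it does not handle parentheses inside quoted values.

```python
def split_unbalanced_closing(term):
    """Separate closing parentheses that have no opening partner in `term`."""
    depth = 0
    for index, char in enumerate(term):
        if char == "(":
            depth += 1
        elif char == ")":
            if depth == 0:
                # Everything from here on closes a group opened outside the term.
                return term[:index].rstrip(), term[index:]
            depth -= 1
    return term, ""


author, trailing = split_unbalanced_closing("Weiss, r)")
# Expand only the bare author term, then put the parenthesis back.
converted = '(author:"{0}"){1}'.format(author, trailing)
```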

spires: `whois` operator

Originally on 2012-01-30

In SPIRES query mode, a search for whois ellis should expand into find name ellis in the HEPNames collection. This is easy to do: the whois operator apparently cannot be combined in Boolean queries and the like, so a simple query replacement will do. The one complication to remember is that the searched collection must be switched to HEPNames regardless of the collection the user started the query from.

Original issue: inveniosoftware/invenio#877

find d 2012 and p england

So: I really think we should no longer work on INSPIRE-related search engine functionalities for Invenio 1.x. So I am moving inveniosoftware/invenio#718 to this repository.

find d 2012 and p england is currently interpreted by query parser as:

AndOp(KeywordOp(Keyword('year'), Value('2012')), KeywordOp(Keyword('year'), Value('p england')))

Is this the correct interpretation? @tsgit, @tiborsimko?

setup: pypeg2 dependency

invenio-query-parser depends on a personal clone of pypeg2:

-e git+https://github.com/osso/pypeg@a67f6ba75da71a8ee9694caaa6a4716354409fbb#egg=pypeg2

Recently, the pypeg2 author was quick to reply when we asked to put the package on PyPI, see:

Could we depend on that? If yes, it would be nice to have only PyPI dependency for the release. If not, what about sending patches upstream?

P.S. I have not looked at @Osso's changes in detail, but looking from the helicopter, they are mostly dealing with testing only...

parser: query parser does not parse all possible combinations of NOT operator correctly

The NOT operator is not parsed as NotOp.

OR NOT

NOT is parsed as a Value.

Example:
test AND (author:"Alexander, S" OR NOT author:"Costa, M S")

gives

AndOp(ValueQuery(Value('test')), AndOp(OrOp(KeywordOp(Keyword('author'), DoubleQuotedValue('Alexander, S')), ValueQuery(Value('NOT'))), KeywordOp(Keyword('author'), DoubleQuotedValue('Costa, M S'))))

AND (NOT ...

NOT is parsed as a Value.

Example:
test AND (NOT author:"Alexander, S")

gives

AndOp(ValueQuery(Value('test')), AndOp(ValueQuery(Value('NOT')), KeywordOp(Keyword('author'), DoubleQuotedValue('Alexander, S'))))

Only AND NOT works

NOT is then properly parsed as NotOp

Example:

test AND (author:"Alexander, S" AND NOT author:"Costa, M S")

gives

AndOp(ValueQuery(Value('test')), AndOp(KeywordOp(Keyword('author'), DoubleQuotedValue('Alexander, S')), NotOp(KeywordOp(Keyword('author'), DoubleQuotedValue('Costa, M S')))))

If some operations, such as OR NOT, are not supported, then the server should return an error. It should not parse NOT sometimes as a Value and sometimes as a NotOp.

previously reported as inveniosoftware/invenio#3366
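
To illustrate the expected behaviour, here is a small hand-written recursive-descent parser in which NOT is a unary prefix operator wherever a term may start. It is not the pypeg2 grammar used by invenio-query-parser: the tuple-based tree and all function names are assumptions, and implicit AND between adjacent words is deliberately not supported.

```python
import re

# Parentheses, keyword:"quoted value", "quoted value", or a bare word.
TOKEN = re.compile(r'\s*(\(|\)|\w+:"[^"]*"|"[^"]*"|[^\s()]+)')


def tokenize(query):
    query, tokens, pos = query.strip(), [], 0
    while pos < len(query):
        match = TOKEN.match(query, pos)
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse(query):
    tree, rest = parse_or(tokenize(query))
    if rest:
        raise ValueError("unexpected token: {0!r}".format(rest[0]))
    return tree


def parse_or(tokens):
    left, tokens = parse_and(tokens)
    while tokens and tokens[0].upper() == "OR":
        right, tokens = parse_and(tokens[1:])
        left = ("OrOp", left, right)
    return left, tokens


def parse_and(tokens):
    left, tokens = parse_not(tokens)
    while tokens and tokens[0].upper() == "AND":
        right, tokens = parse_not(tokens[1:])
        left = ("AndOp", left, right)
    return left, tokens


def parse_not(tokens):
    # NOT is accepted at the start of any operand, after AND, OR, or "(".
    if tokens and tokens[0].upper() == "NOT":
        operand, tokens = parse_not(tokens[1:])
        return ("NotOp", operand), tokens
    return parse_atom(tokens)


def parse_atom(tokens):
    if not tokens:
        raise ValueError("unexpected end of query")
    head, rest = tokens[0], tokens[1:]
    if head == "(":
        tree, rest = parse_or(rest)
        if not rest or rest[0] != ")":
            raise ValueError("missing closing parenthesis")
        return tree, rest[1:]
    if head == ")" or head.upper() in ("AND", "OR"):
        raise ValueError("unexpected token: {0!r}".format(head))
    return ("Value", head), rest


tree = parse('test AND (author:"Alexander, S" OR NOT author:"Costa, M S")')
```

In this sketch all three examples from the issue yield a NotOp node, and malformed input raises an error rather than turning NOT into a Value.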

tests: simplify doctest execution

The following cookiecutter change:

inveniosoftware/cookiecutter-invenio-module#98

should be propagated to this Invenio module.

Namely, in run-tests.sh, Sphinx is invoked for the doctests after the pytest run:

$ tail -3 ./\{\{\ cookiecutter.project_shortname\ \}\}/run-tests.sh
sphinx-build -qnNW docs docs/_build/html && python setup.py test && sphinx-build -qnNW -b doctest docs docs/_build/doctest

This sometimes led to problems on Travis CI with the second sphinx-build run due
to "disappearing" dependencies after the example application was tested.

A solution that worked for invenio-marc21 (see
inveniosoftware/invenio-marc21#49 (comment))
and that was integrated in cookiecutter (see
inveniosoftware/cookiecutter-invenio-module#98) was to
run doctest execution in pytest, removing the second sphinx-build invocation.

This both solved Travis CI build failures and simplified test suite execution.

Note that this change may require amending the code, tests, etc. so that things
are executed within the Flask application context (see
inveniosoftware/invenio-marc21@09e98fc).

spires: `whereis` operator

Originally by hoc on 2012-01-30

In SPIRES query mode, a search for whereis harvard should expand into a "harvard" search in the INST collection. This is easy to do: the whereis operator apparently cannot be combined in Boolean queries and the like, so a simple query replacement will do. The one complication to remember is that the searched collection must be switched to INST regardless of the collection the user started the query from.

Original issue: inveniosoftware/invenio#888
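
Since whereis and the whois operator from the earlier issue are both plain replacements plus a collection switch, one table-driven sketch can cover them. REWRITES, rewrite_lookup_query and the starting collection name "HEP" are hypothetical. Whether the whereis argument should be quoted in the rewritten query is left open here.

```python
# operator -> (query template, collection to search instead)
REWRITES = {
    "whois": ("find name {0}", "HEPNames"),
    "whereis": ("{0}", "INST"),
}


def rewrite_lookup_query(query, collection):
    """Return (query, collection), rewritten if the query starts with a lookup operator."""
    operator, _, argument = query.strip().partition(" ")
    argument = argument.strip()
    rule = REWRITES.get(operator.lower())
    if rule is None or not argument:
        return query, collection
    template, target_collection = rule
    # The target collection wins regardless of where the user started.
    return template.format(argument), target_collection


whois = rewrite_lookup_query("whois ellis", "HEP")
whereis = rewrite_lookup_query("whereis harvard", "HEP")
```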

SPIRES query "find a Ellis, Jonathan Richard"

SPIRES used to do complex runtime query expansion for first names in author queries. (Hence INSPIRE doing the same.)

When I talked to @Osso a few months ago, the query parser did not yet handle this situation. Hence this issue, so that we do not forget to address it.

For reference, here is runtime query expansion for "find a Ellis, Jonathan Richard":

Search stage 1: search_pattern_parenthesised() searched
'(author:"Ellis, Jonathan* Richard*" or exactauthor:"Ellis, J
Richard" or exactauthor:"Ellis, Jo Richard" or
exactauthor:"Ellis, Jon Richard" or exactauthor:"Ellis, Jona
Richard" or exactauthor:"Ellis, Jonat Richard" or
exactauthor:"Ellis, Jonath Richard" or exactauthor:"Ellis,
Jonatha Richard")'.

Search stage 1: search_pattern_parenthesised() returned ['+',
'author:"ellis, jonathan* richard*" | exactauthor:"ellis, j
richard" | exactauthor:"ellis, jo richard" | exactauthor:"ellis,
jon richard" | exactauthor:"ellis, jona richard" |
exactauthor:"ellis, jonat richard" | exactauthor:"ellis, jonath
richard" | exactauthor:"ellis, jonatha richard"'].

Search stage 1: basic search units are: [['+', 'ellis, jonathan*
richard*', 'author', 'a'], ['|', 'ellis, j richard',
'exactauthor', 'a'], ['|', 'ellis, jo richard', 'exactauthor',
'a'], ['|', 'ellis, jon richard', 'exactauthor', 'a'], ['|',
'ellis, jona richard', 'exactauthor', 'a'], ['|', 'ellis, jonat
richard', 'exactauthor', 'a'], ['|', 'ellis, jonath richard',
'exactauthor', 'a'], ['|', 'ellis, jonatha richard',
'exactauthor', 'a']]
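
The pattern visible in the log above (one wildcard author clause, plus one exactauthor clause per proper prefix of the first given name) can be reproduced with a short function. This sketch only mirrors this one logged expansion. expand_author is a hypothetical name, and the real INSPIRE rules for initials, hyphens and the like are more involved.

```python
def expand_author(name):
    """Expand 'Surname, First Other' following the pattern of the logged query."""
    surname, _, given = name.partition(",")
    surname = surname.strip()
    given_names = given.split()
    if not given_names:
        return 'author:"{0}"'.format(surname)
    first, rest = given_names[0], given_names[1:]
    # Wildcard clause: every given name may be continued.
    wildcard = " ".join(part + "*" for part in given_names)
    clauses = ['author:"{0}, {1}"'.format(surname, wildcard)]
    # One exact clause per proper prefix of the first given name.
    for length in range(1, len(first)):
        shortened = " ".join([first[:length]] + rest)
        clauses.append('exactauthor:"{0}, {1}"'.format(surname, shortened))
    return "(" + " or ".join(clauses) + ")"


expanded = expand_author("Ellis, Jonathan Richard")
```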

spires: problem with author search

Originally by hoc on 2012-05-15 as inveniosoftware/invenio#1062

There's a problem with searching via initials versus full-names.

find a pi, s y not a Pi, So Young
in SPIRES this has (correctly) a zero result, because all
Pi, So Young should be indexed to Pi, S Y
so this search would only find a non-zero result if there were, say,
Pi, Shan Yu (and there isn't, all S.Y. Pi are So Young Pi).

Looking at something where dashes aren't an issue:
find a crewther, r j not a crewther, rodney james
should find zero but instead finds all the papers not listing his full name.

Compare this with the well-functioning situation for single initials:
find a quigg, c not a quigg, chris
finds zero because there's only one person who signs "Chris Quigg" or "C. Quigg".

find a quigg, c not a quigg, curtis
gets 113 of his 248 papers, it just weeds out the ones he's signed as "Chris Quigg"

use "greedy" parsing when no keywords

For searches where there are no keywords (like a free text search for title or abstract), currently the invenio-query-parser will break each of the individual values by spaces and create a big query where each value is separated by the AND boolean operator.

For Elasticsearch to do a better job at ranking, it is recommended to pass the whole string together with a list of fields to match against.

This was implemented outside of the query parser for INSPIRE (see inspirehep/invenio-search@f2c9a07), but it might be better for the query parser to support it directly.
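
A minimal sketch of the idea, assuming a multi_match query is the desired target. The function name, the default field list and the simple "word followed by a colon" keyword test are all assumptions, and this is not the code from the INSPIRE commit referenced above.

```python
import re

# A word immediately followed by a colon is taken to be a keyword term.
KEYWORD_TERM = re.compile(r"\b\w+:")


def free_text_query(pattern, fields=("title", "abstract")):
    """Return a multi_match body for keyword-free patterns, else None."""
    if KEYWORD_TERM.search(pattern):
        return None  # leave keyword queries to the regular parser
    return {"multi_match": {"query": pattern, "fields": list(fields)}}


greedy = free_text_query("higgs boson decay")
```

The whole string is passed through unsplit, so Elasticsearch can rank on the full phrase. A colon inside a quoted value would be misread as a keyword by this simple test.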

SPIRES: search mode when terms contain quoted colons

@tiborsimko commented on Wed Apr 30 2014

Originally on 2011-09-28

The SPIRES search syntax mode does not work well for keyword searches
containing terms that include colons. The SQPP/SPIRES syntax parser is
probably confused by the presence of the colon, even though (i) the
colon is quoted and (ii) the term before the colon is not a valid logical field.
The Invenio search syntax mode works well here. Compare:

  keyword:"coupling: Yukawa"      ... good, 4117 records found
  find keyword "coupling: Yukawa" ... bad, 3 records found
  find keyword coupling: Yukawa   ... bad, 0 records found
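
The two conditions named above can be sketched as a guard applied before a colon is treated as a field separator. VALID_FIELDS is an illustrative subset, and split_keyword is a hypothetical helper rather than the parser's own logic.

```python
# Illustrative subset of logical field names.
VALID_FIELDS = {"keyword", "author", "title"}


def split_keyword(term, valid_fields=VALID_FIELDS):
    """Return (field, value), or (None, term) if the colon is not a field separator."""
    colon = term.find(":")
    quote = term.find('"')
    # (i) A colon that appears after an opening quote belongs to the value.
    if colon == -1 or (quote != -1 and quote < colon):
        return None, term
    field = term[:colon].strip()
    # (ii) The text before the colon must be a known logical field.
    if field.lower() not in valid_fields:
        return None, term
    return field, term[colon + 1:].strip()


invenio_style = split_keyword('keyword:"coupling: Yukawa"')
quoted_value = split_keyword('"coupling: Yukawa"')
bare_value = split_keyword("coupling: Yukawa")
```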

spires: check title search

Originally on 2011-06-22 as inveniosoftware/invenio#700

If I type

find (title top or t or top-quark or top-antitop) and (a abachi or abbott or abazov) and cn d0

it returns 71 papers (I am hoping to get all 80) so this is close.

If I type

find (title top or t or top-quark or top-antitop or t-tbar) and (a abachi or abbott or abazov) and cn d0

just adding t-tbar to try to find the missing 9 papers, then I get only 6 records returned.

Problem is clear from here:

http://inspirebeta.net/search?ln=en&ln=en&verbose=9&p=find+%28title+top+or+t+or+top-quark+or+top-antitop+or+t-bar%29+and+%28a+abachi+or+abbott+or+abazov%29+and+cn+d0&action_search=Search&sf=&so=d&rm=&rg=25&sc=0&of=hb

Where s_p_p completely mangles the ands and ors. I'm guessing this is because the user used "t" as a search term (top quark) and this is confusing things...
