
inspire-query-parser's Introduction

INSPIRE-QueryParser

https://travis-ci.org/inspirehep/inspire-query-parser.svg?branch=master https://coveralls.io/repos/github/inspirehep/inspire-query-parser/badge.svg?branch=master

About

A PEG-based query parser for INSPIRE.

inspire-query-parser's People

Contributors

ammirate, chris-asl, drjova, harunurhan, iulianav, jacquerie, michamos, mjedr, monaawi, nooraangelva, pazembrz, szymonlopaciuk, vbalbp

inspire-query-parser's Issues

report number queries with *

Search should also support report number queries containing *, e.g. r atlas-conf-*. This was supported in a previous version.
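A minimal sketch of how the visitor could turn an r query into an ES query, switching to a wildcard query when the value contains *. The field name report_numbers.value and the function name are assumptions for illustration, and whether atlas-conf-* matches upper-case stored values depends on how that field is analyzed or normalized in the mapping.

```python
# Hypothetical field name: report numbers are assumed to live in "report_numbers.value".
REPORT_NUMBER_FIELD = "report_numbers.value"


def report_number_query(value):
    """Build an ES query for `r <value>`, honouring '*' wildcards (hypothetical helper)."""
    if "*" in value:
        # ES wildcard query: '*' matches any sequence of characters.
        return {"wildcard": {REPORT_NUMBER_FIELD: value}}
    # No wildcard present: a plain match query is enough.
    return {"match": {REPORT_NUMBER_FIELD: value}}
```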

es-visitor: date bugfixing

Date queries in ES currently don't work, because some records are missing some of the date fields that we query on:

"range": {
    "date_field1": {
        "gt": "2017"
    },
    "date_field2": {
        "gt": "2017"
    },
    ...
}

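Instead, we need to generate a bool.should [range1, range2, ...] query, with one range clause per date field, so that a record matches as long as any of its date fields satisfies the condition.
This applies to the >, >=, <, <= operators.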
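A minimal sketch of the bool.should chain for the comparison operators, assuming the hypothetical field names from the example above (the real list is elided there). Each field gets its own range clause, and minimum_should_match makes the "at least one clause" requirement explicit.

```python
# Hypothetical field names, as in the example above; the real list is elided there.
DATE_FIELDS = ["date_field1", "date_field2"]

# Map query-syntax comparison operators to ES range parameters.
ES_RANGE_OPERATORS = {">": "gt", ">=": "gte", "<": "lt", "<=": "lte"}


def date_comparison_query(operator, value, fields=DATE_FIELDS):
    """Turn e.g. `d > 2017` into one range clause per date field, OR-ed together."""
    es_op = ES_RANGE_OPERATORS[operator]
    return {
        "bool": {
            "should": [{"range": {field: {es_op: value}}} for field in fields],
            # A record matches if any single date field satisfies the condition.
            "minimum_should_match": 1,
        }
    }
```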

The tricky part is "equality", which comes from queries such as d 2016 or d = 2017-08.
These also have to be expressed as range queries (d 2016 should cover the whole of 2016, d = 2017-08 the whole of August 2017).
For this, it will be helpful to look into visitor_utils.
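One way to express equality, sketched under assumptions: treat a partial date as the half-open interval [value, start of the next period) and wrap one range clause per field in the same bool.should chain. The field names are the hypothetical ones from the example above, and next_period is an illustrative helper, not the existing visitor_utils code. The range query's format parameter tells ES how to read partial dates such as 2016.

```python
from datetime import date, timedelta

# Hypothetical field names, as in the example above; the real list is elided there.
DATE_FIELDS = ["date_field1", "date_field2"]


def next_period(value):
    """Start of the period right after `value` (YYYY, YYYY-MM or YYYY-MM-DD)."""
    parts = [int(part) for part in value.split("-")]
    if len(parts) == 1:  # year: 2016 -> 2017
        return "%04d" % (parts[0] + 1)
    if len(parts) == 2:  # month: 2017-08 -> 2017-09, 2017-12 -> 2018-01
        year, month = parts
        if month == 12:
            year, month = year + 1, 1
        else:
            month += 1
        return "%04d-%02d" % (year, month)
    # day: 2017-08-31 -> 2017-09-01
    return (date(*parts) + timedelta(days=1)).isoformat()


def date_equality_query(value, fields=DATE_FIELDS):
    """Express `d <value>` as [value, next period) on any of the date fields."""
    upper = next_period(value)
    return {
        "bool": {
            "should": [
                {
                    "range": {
                        field: {
                            "gte": value,
                            "lt": upper,
                            # How ES should parse partial dates like "2016" or "2017-08".
                            "format": "yyyy||yyyy-MM||yyyy-MM-dd",
                        }
                    }
                }
                for field in fields
            ],
            "minimum_should_match": 1,
        }
    }
```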

TODO

  • range queries with bool.should
  • range queries for equality
  • wildcard queries
  • partial value
  • exact value

Do some stripping in case of failed parsing

For a badly formatted query such as t: t: /electroweak bosons/, we need to do some stripping, i.e. remove the : characters before passing the query to Elasticsearch; otherwise it is a badly formatted query for the Elasticsearch query parser as well.

The query that's being generated is:

{
    "query": {
        "bool": {
            "filter": [
                {
                    "match": {
                        "_collections": "literature"
                    }
                }
            ],
            "minimum_should_match": "0<1",
            "must": [
                {
                    "query_string": {
                        "query": "t: t: /electroweak bosons/",
                        "default_field": "_all"
                    }
                }
            ]
        }
    },
    "from": 0,
    "size": 25
}

That's a simple fix.
We could also introduce error handling during our parsing step.
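A minimal sketch of the "simple fix": strip the colons before building the query_string fallback, keeping the rest of the generated query's shape shown above. The function names are hypothetical. Removing every colon is deliberately blunt; it would also drop colons that are legitimately part of a value, so it only makes sense on the failed-parsing path.

```python
import re


def strip_colons(query_text):
    """Remove every ':' and collapse the whitespace left behind."""
    return re.sub(r"\s+", " ", query_text.replace(":", " ")).strip()


def fallback_query(query_text, collection="literature", size=25):
    """query_string fallback shaped like the generated query above, with ':' stripped."""
    return {
        "query": {
            "bool": {
                "filter": [{"match": {"_collections": collection}}],
                "must": [
                    {
                        "query_string": {
                            "query": strip_colons(query_text),
                            "default_field": "_all",
                        }
                    }
                ],
            }
        },
        "from": 0,
        "size": size,
    }
```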

global: support 'irn' queries

@michamos informed me of a new keyword we need to support: irn.
An example query is find irn 5988462; this keyword searches SPIRES identifiers.

These are stored in external_system_identifiers, in the entries where schema = SPIRES.
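A sketch of the query an irn search could produce. It assumes external_system_identifiers is mapped as a nested field (so that schema and value are matched within the same entry) and that the identifier is stored in the form the user types; the issue confirms neither, and irn_query is a hypothetical name.

```python
def irn_query(value):
    """Query for `find irn <value>`: a SPIRES entry in external_system_identifiers."""
    return {
        # Assumes a nested mapping, so both conditions apply to the same list entry.
        "nested": {
            "path": "external_system_identifiers",
            "query": {
                "bool": {
                    "must": [
                        {"match": {"external_system_identifiers.schema": "SPIRES"}},
                        {"match": {"external_system_identifiers.value": value}},
                    ]
                }
            },
        }
    }
```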

support type code queries

Type code should support the following codes:
tc b : should map to book
tc bookchapter : should map to book chapter
tc c : should map to article

tc note : should map to note
tc proceedings : should map to proceedings
tc t : should map to thesis

@michamos can you please check whether the above is correct?

Do we also need to support the following?
tc core : when the schema contains core=true
tc p : when the schema contains refereed=true

Is there a mapping for the existing codes
i | Introductory and l | Lectures?

Is there a mapping for the new document types?

  • activity report
  • conference paper
  • report
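The mapping proposed in this issue could be kept as a lookup table in the visitor. A sketch, assuming the target fields are named document_type, core and refereed (the last two as mentioned above) and that the codes listed above are confirmed; type_code_query is a hypothetical name, and unmapped codes such as i and l are rejected until the open questions are settled.

```python
# Mapping proposed in this issue (pending confirmation); field names are assumptions.
TYPE_CODE_TO_DOCUMENT_TYPE = {
    "b": "book",
    "bookchapter": "book chapter",
    "c": "article",
    "note": "note",
    "proceedings": "proceedings",
    "t": "thesis",
}

# Codes that would map to boolean flags rather than a document type (open question).
TYPE_CODE_TO_FLAG = {"core": "core", "p": "refereed"}


def type_code_query(code):
    """Translate `tc <code>` into an ES query, or raise for unmapped codes."""
    code = code.lower()
    if code in TYPE_CODE_TO_FLAG:
        return {"term": {TYPE_CODE_TO_FLAG[code]: True}}
    if code in TYPE_CODE_TO_DOCUMENT_TYPE:
        return {"match": {"document_type": TYPE_CODE_TO_DOCUMENT_TYPE[code]}}
    raise ValueError("unsupported type code: %r" % code)
```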

maximum recursion depth exceeded with query [foo and 'bar']

Given the query foo and "bar", the parser crashes with the aforementioned error.

Output is:

...
  File "/Users/chris/development/inspire/inspire-query-parser/inspire_query_parser/parser.py", line 439, in parse
    t, right_operand = parser.parse(text_after_bool_op, cls.grammar[2])
  File "build/bdist.macosx-10.12-x86_64/egg/pypeg2/__init__.py", line 792, in parse
  File "build/bdist.macosx-10.12-x86_64/egg/pypeg2/__init__.py", line 1102, in _parse
  File "build/bdist.macosx-10.12-x86_64/egg/pypeg2/__init__.py", line 885, in _parse
...

This sequence of frames repeats until the recursion limit is hit.
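Until the grammar itself is fixed, the error handling during parsing mentioned in the previous issue could catch the runaway recursion and fall back to a plain query. A sketch with hypothetical names; parse and fallback stand for whatever parser entry point and fallback builder are used. RecursionError exists from Python 3.5 on; on Python 2 the same condition surfaces as RuntimeError.

```python
def parse_with_fallback(parse, text, fallback):
    """Call parse(text); if it recurses without end, return fallback(text) instead."""
    try:
        return parse(text)
    except RecursionError:
        # The stack has already unwound here, so building the fallback is safe.
        return fallback(text)
```

This only contains the crash; queries like foo and "bar" would still need a grammar fix to be parsed properly.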

es-visitor: make 'analyze_wildcard' parameter default to True

The analyze_wildcard flag parameter in _generate_query_string_query was introduced in case we wanted to disable wildcard analysis.

It seems that everywhere it is used in the codebase, it is set to True, which is actually the default behaviour of Elasticsearch (see docs).

For now, it seems better to make the parameter default to True and, in that case, not generate "analyze_wildcard": True in that method.
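A sketch of the proposed behaviour, using a hypothetical stand-in for _generate_query_string_query (the real signature is not shown here). The flag is emitted only when wildcard analysis is explicitly disabled. This relies on the premise stated above that Elasticsearch already analyzes wildcards by default; if the Elasticsearch version in use behaves differently, the flag would have to be emitted unconditionally instead.

```python
def generate_query_string_query(value, fields=None, analyze_wildcard=True):
    """Hypothetical stand-in: build a query_string query, omitting the default flag."""
    query = {"query": value}
    if fields:
        query["fields"] = list(fields)
    if not analyze_wildcard:
        # Only the non-default setting needs to be sent explicitly.
        query["analyze_wildcard"] = False
    return {"query_string": query}
```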

es-visitor: exclude supervisors from author search

We need to exclude the supervisor role from author search: the query a arkani-hamed matches record 1426739, where he is a supervisor, not an author.

To be done after the first version of author search is merged, and to be merged after the demo.
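A sketch of an author query that skips supervisor entries. It assumes authors is a nested field whose entries carry full_name and an inspire_roles list; none of these names are confirmed in this issue, and author_query_excluding_supervisors is hypothetical. Putting must_not inside the nested query is the point of the design: the role check then applies to the same author entry that matched the name, not to some other author of the record.

```python
def author_query_excluding_supervisors(name):
    """Match `a <name>` only on author entries that do not carry the supervisor role."""
    return {
        "nested": {
            "path": "authors",
            "query": {
                "bool": {
                    "must": [{"match": {"authors.full_name": name}}],
                    # Evaluated per author entry because it sits inside the nested query.
                    "must_not": [{"term": {"authors.inspire_roles": "supervisor"}}],
                }
            },
        }
    }
```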
