lucene's Introduction

lucene

Parse, modify and stringify lucene queries.

Installation | Try It | Usage | Grammar | History


Installation

npm install --save lucene
-or-
yarn add lucene

Usage

const lucene = require('lucene');

const ast = lucene.parse('name:frank OR job:engineer');
console.log(ast);
// {
//   left: {
//     field: 'name',
//     term: 'frank'
//   },
//   operator: 'OR',
//   right: {
//     field: 'job',
//     term: 'engineer'
//   }
// }

console.log(lucene.toString(ast));
// name:frank OR job:engineer

Grammar

The parser is auto-generated from a PEG implementation in JavaScript called PEG.js.

To test or modify the grammar without going through the generated parser, try out PEG.js online. It is a handy way to test arbitrary queries, inspect the results, or debug a parser problem for a given piece of data.

History

This project is based on thoward/lucene-query-parser.js and its forks (most notably xomyaq/lucene-queryparser). It was forked to allow broader changes to the API surface area and project structure, and to add capabilities.

lucene's People

Contributors

annelhote, bastien, bripkens, camerondavison, dependabot[bot], i11v, ichenlei, ironykins, jlbernal, lotti, lucecc, mahnunchik, matthiasg, nicolashenry, thoward, verocca, vsetka

lucene's Issues

Support for lower case operators

I'm using the library for free-text search. If I parse:
a OR b

the tree is built with "operator": "OR", like this:

{
   "left": {
      "field": "<implicit>",
      "fieldLocation": null,
      "term": "a",
      "quoted": false,
      "regex": false,
      "termLocation": {
         "start": {
            "offset": 0,
            "line": 1,
            "column": 1
         },
         "end": {
            "offset": 2,
            "line": 1,
            "column": 3
         }
      },
      "similarity": null,
      "boost": null,
      "prefix": null
   },
   "operator": "OR",
   "right": {
      "field": "<implicit>",
      "fieldLocation": null,
      "term": "b",
      "quoted": false,
      "regex": false,
      "termLocation": {
         "start": {
            "offset": 5,
            "line": 1,
            "column": 6
         },
         "end": {
            "offset": 6,
            "line": 1,
            "column": 7
         }
      },
      "similarity": null,
      "boost": null,
      "prefix": null
   }
}

However, if I parse a or b, the operator is implicit ("operator": "<implicit>"), as this tree shows:

{
   "left": {
      "field": "<implicit>",
      "fieldLocation": null,
      "term": "a",
      "quoted": false,
      "regex": false,
      "termLocation": {
         "start": {
            "offset": 0,
            "line": 1,
            "column": 1
         },
         "end": {
            "offset": 2,
            "line": 1,
            "column": 3
         }
      },
      "similarity": null,
      "boost": null,
      "prefix": null
   },
   "operator": "<implicit>",
   "right": {
      "left": {
         "field": "<implicit>",
         "fieldLocation": null,
         "term": "or",
         "quoted": false,
         "regex": false,
         "termLocation": {
            "start": {
               "offset": 2,
               "line": 1,
               "column": 3
            },
            "end": {
               "offset": 5,
               "line": 1,
               "column": 6
            }
         },
         "similarity": null,
         "boost": null,
         "prefix": null
      },
      "operator": "<implicit>",
      "right": {
         "field": "<implicit>",
         "fieldLocation": null,
         "term": "b",
         "quoted": false,
         "regex": false,
         "termLocation": {
            "start": {
     

It would be nice to have a way to extend the grammar, and therefore the parser, to accept the lowercase equivalents of the operators.
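
Until the grammar accepts lowercase operators, one possible workaround is to normalize the query text before parsing. The sketch below is a pre-processing step and not part of the library: `normalizeOperators` is a hypothetical helper that uppercases standalone `and`, `or`, and `not` tokens outside double-quoted phrases. It assumes those words are never meant as literal search terms, and it does not protect regex literals.

```javascript
// Hypothetical helper (not part of the lucene package): uppercase bare
// and/or/not tokens so the parser treats them as operators.
function normalizeOperators(query) {
  // Alternative 1 captures a double-quoted phrase so it is copied through untouched.
  // Alternative 2 captures and/or/not only when delimited by whitespace or the
  // start/end of the string, so a field such as `or:foo` is left alone.
  const pattern = /("(?:[^"\\]|\\.)*")|(?<=^|\s)(and|or|not)(?=\s|$)/gi;
  return query.replace(pattern, (match, quoted, operator) =>
    quoted !== undefined ? quoted : operator.toUpperCase()
  );
}

console.log(normalizeOperators('a or b'));         // a OR b
console.log(normalizeOperators('"a or b" and c')); // "a or b" AND c
```

The normalized string would then be passed to lucene.parse in place of the raw user input.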

Can't parse range query string with colon symbol

Try to parse this Lucene string:

"creation_date:[2017-06-09T10:18:33Z TO 2017-06-09T10:18:33Z]"

and you'll get this:

{
  "message": "Expected \".\", \"TO\", [^: \\t\\r\\n\\f{}()\"\\/\\^~[\\]] or whitespace but \":\" found.",
  "expected": [
    {
      "type": "literal",
      "value": ".",
      "description": "\".\""
    },
    {
      "type": "literal",
      "value": "TO",
      "description": "\"TO\""
    },
    {
      "type": "class",
      "value": "[^: \\t\\r\\n\\f{}()\"\\/\\^~[\\]]",
      "description": "[^: \\t\\r\\n\\f{}()\"\\/\\^~[\\]]"
    },
    {
      "type": "other",
      "description": "whitespace"
    }
  ],
  "found": ":",
  "offset": 28,
  "line": 1,
  "column": 29,
  "name": "SyntaxError"
}

escaping tilde

https://runkit.com/embed/kx7k2fbprecw

> lucene.parse('foo~bar:"hello"')
> {
  "left": {
    "boost": null,
    "field": "<implicit>",
    "prefix": null,
    "quoted": false,
    "similarity": 0.5,
    "term": "foo"
  },
  "operator": "<implicit>",
  "right": {
    "boost": null,
    "field": "bar",
    "prefix": null,
    "proximity": null,
    "quoted": true,
    "term": "hello"
  }
}

I'm having trouble escaping the tilde in the field name. Escaping seems to work for some other special characters. Any suggestions?

Named field fuzzy search?

I'm getting a syntax error when trying to do age:~30

Is there any way to do a fuzzy search on a named field? If not, could it be added?

"AND NOT" is mishandled

An extra space between "AND" and "NOT" results in an incorrect AST:

Correct:

'datacenter:"dca1" AND NOT @reserved.collector.filename:"executor"'

Mangled:

'datacenter:"dca1" AND  NOT @reserved.collector.filename:"executor"'
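
As a stopgap on the caller's side, whitespace runs can be collapsed before the query reaches the parser. The sketch below is an assumption, not the library's fix: `collapseWhitespace` is a hypothetical helper that replaces each run of whitespace outside double-quoted phrases with a single space. It does not account for backslash-escaped spaces outside quotes.

```javascript
// Hypothetical helper (not part of the lucene package).
function collapseWhitespace(query) {
  // Quoted phrases are captured and copied through unchanged;
  // any other whitespace run becomes one space.
  const pattern = /("(?:[^"\\]|\\.)*")|\s+/g;
  return query
    .replace(pattern, (match, quoted) => (quoted !== undefined ? quoted : ' '))
    .trim();
}

const mangled = 'datacenter:"dca1" AND  NOT @reserved.collector.filename:"executor"';
console.log(collapseWhitespace(mangled));
// datacenter:"dca1" AND NOT @reserved.collector.filename:"executor"
```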

Anti-slashes are not properly handled

Hello.

I'm trying to use a field name with a space in it. It seems that I can make this work with the Java Lucene SyntaxParser but not with your library (please ignore the fact that using a space in a field name is probably a very bad idea ;) ).

Here is what I do in Java:
image

And here are a couple of tests that give inconsistent results, AFAIK:

image

Here is the code to reproduce:

var lucene = require("lucene")
var ast = lucene.parse('name:"hello there" AND (tags.tag one:(a OR c) AND tags.tag2:b)');
console.log(lucene.toString(ast), "as expected 1");

ast = lucene.parse("name:\"hello there\" AND (tags.tag one:(a OR c) AND tags.tag2:b)");
console.log(lucene.toString(ast), "as expected 2");

ast = lucene.parse("name:\"hello there\" AND (tags.tag\ one:(a OR c) AND tags.tag2:b)");
console.log(lucene.toString(ast), "not sure what was to expect here but feels weird to have lost the antislash");

ast = lucene.parse("name:\"hello there\" AND (tags.tag\\ one:(a OR c) AND tags.tag2:b)");
console.log(lucene.toString(ast), "if we lost the antislash previously we should have had one antislash here I suppose?");

ast = lucene.parse('name:"hello there" AND (tags.tag\ one:(a OR c) AND tags.tag2:b)');
console.log(lucene.toString(ast), "I was definitely expecting the antislash to remain here");

ast = lucene.parse('name:"hello there" AND (tags.tag\\ one:(a OR c) AND tags.tag2:b)');
console.log(lucene.toString(ast), "And here suddenly I have two anti slashes");
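
Part of the inconsistency can be explained by JavaScript itself, before the query ever reaches the parser. In an ordinary string literal, a backslash followed by a space is not a recognized escape sequence, so the backslash is dropped, whereas a doubled backslash yields one literal backslash. The snippet below only illustrates that language behavior; it makes no claim about how the parser treats an escaped space.

```javascript
const dropped = 'tags.tag\ one';        // "\ " is not an escape: the backslash is discarded
const kept = 'tags.tag\\ one';          // "\\" produces a single backslash
const raw = String.raw`tags.tag\ one`;  // String.raw keeps the backslash as typed

console.log(dropped === 'tags.tag one');   // true
console.log(kept === raw);                 // true
console.log(kept.length - dropped.length); // 1
```

By that reading, the third and fifth calls above hand the parser the same text as the first two, and the fourth and sixth hand it a single backslash before the space.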

Grouping order for query like "𝑎 AND 𝑏 AND 𝑐"

It seems the query: "𝑎 AND 𝑏 AND 𝑐" is by default grouped as "𝑎 AND (𝑏 AND 𝑐)".

Would it be unreasonable to expect it be grouped instead as "(𝑎 AND 𝑏) AND 𝑐"?

I'm creating a filter based on this library, and I stumbled upon a particular query that makes me think that the latter might be more natural.

E.g.: For this data:

const data = [
  { /* 0 */ name: 'C-3PO', species: 'Droid', height: 1.7526, misc: {} },
  { /* 1 */ name: 'R2-D2', species: 'Droid', height: 1.1, misc: {} },
  { /* 2 */ name: 'Anakin Skywalker', species: 'Human', height: 1.9 },
  { /* 3 */ name: 'Obi-Wan Kenobi', species: 'Human', height: 1.8, misc: {} },
  { /* 4 */ name: 'Han Solo', species: 'Human', height: 1.8, misc: {} },
  { /* 5 */ name: 'Princess Leia', species: 'Human', height: 1.5, misc: {} },
];

If I query:

an AND NOT wan AND NOT han

I expect the result to be

{ /* 2 */ name: 'Anakin Skywalker', ... }

right?

But that happens only when the query is specifically formatted as:

(an AND NOT wan) AND NOT han

To elaborate step-by-step:

Case 1: 'an AND NOT wan AND NOT han'

Query split as

{
left: 'an', 
operator: 'AND NOT', 
right: 'wan AND NOT han'
}
  1. Parse left side 'an' = 3 results: [Anakin, Obi-Wan, Han Solo]

  2. Parse right side: 'wan AND NOT han'

    Query split as:

    {
      left: 'wan', 
      operator: 'AND NOT', 
      right: 'han'
    }
    1. Parse left side 'wan' = 1 result: [Obi-Wan]

    2. Parse right side 'han' = 1 result: [Han Solo]

    3. Apply operator AND NOT

      [Obi-Wan] AND NOT [Han Solo] 
      

      = 1 results: [Obi-Wan]

  3. Apply operator AND NOT

    [Anakin, Obi-Wan, Han Solo] AND NOT [Obi-Wan]
    

    = 2 results: [Anakin, Han Solo]

End Result: [Anakin, Han Solo]

Case 2: '(an AND NOT wan) AND NOT (han)'

Query split as:

{
  left: 'an AND NOT wan', 
  operator: 'AND NOT', 
  right: 'han'
}
  1. Parse left side 'an AND NOT wan'

    Query split as:

    {
      left: 'an', 
      operator: 'AND NOT', 
      right: 'wan'
    }
    1. Parse left side 'an' => 3 results: [Anakin, Obi-Wan, Han Solo]

    2. Parse right side 'wan' => 1 results: [Obi-Wan]

    3. Apply operator AND NOT

      [Anakin, Obi-Wan, Han Solo] AND NOT [Obi-Wan] 
      

      = 2 results: [Anakin, Han Solo]

  2. Parse right side 'han' = 1 results: [Han Solo]

  3. Apply operator AND NOT

    [Anakin, Han Solo] AND NOT [Han Solo]
    

    = 1 results: [Anakin]

End Result: [Anakin]

So, as you can see only Case 2 gives the expected result.

Unless my expectations or my algorithm are flawed, in which case I'd appreciate a correction.
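
The two cases can be reproduced with a small self-contained sketch. Everything in it is an assumption for illustration: the AST nodes are simplified stand-ins with only `term`, `left`, `operator`, `right`, and `parenthesized`, and `evaluate` and `toLeftAssociative` are hypothetical helpers rather than library functions. The rotation treats every operator in a chain as having the same precedence, which is the grouping this issue asks about.

```javascript
// Simplified stand-in for the issue's data; only the name is needed here.
const data = [
  { name: 'C-3PO' }, { name: 'R2-D2' }, { name: 'Anakin Skywalker' },
  { name: 'Obi-Wan Kenobi' }, { name: 'Han Solo' }, { name: 'Princess Leia' },
];

// Hypothetical evaluator: a term matches when the name contains it (case-insensitive).
function evaluate(node, rows) {
  if (node.operator === undefined) {
    const needle = String(node.term).toLowerCase();
    return rows.filter((row) => row.name.toLowerCase().includes(needle));
  }
  const left = evaluate(node.left, rows);
  const right = evaluate(node.right, rows);
  switch (node.operator) {
    case 'AND':
      return left.filter((row) => right.includes(row));
    case 'AND NOT':
      return left.filter((row) => !right.includes(row));
    case 'OR':
      return rows.filter((row) => left.includes(row) || right.includes(row));
    default:
      throw new Error(`Unsupported operator: ${node.operator}`);
  }
}

// Hypothetical rewrite: rotate a right-nested chain into a left-associative one,
// leaving explicitly parenthesized groups intact.
function toLeftAssociative(node) {
  if (!node || node.operator === undefined) return node; // term node
  let current = {
    left: toLeftAssociative(node.left),
    operator: node.operator,
    right: node.right,
  };
  while (current.right.operator !== undefined && !current.right.parenthesized) {
    const inner = current.right;
    current = {
      left: {
        left: current.left,
        operator: current.operator,
        right: toLeftAssociative(inner.left),
      },
      operator: inner.operator,
      right: inner.right,
    };
  }
  current.right = toLeftAssociative(current.right);
  if (node.parenthesized) current.parenthesized = true;
  return current;
}

// an AND NOT wan AND NOT han, nested to the right as in Case 1.
const rightNested = {
  left: { term: 'an' },
  operator: 'AND NOT',
  right: { left: { term: 'wan' }, operator: 'AND NOT', right: { term: 'han' } },
};

const names = (rows) => rows.map((row) => row.name);
console.log(names(evaluate(rightNested, data)));
// [ 'Anakin Skywalker', 'Han Solo' ]
console.log(names(evaluate(toLeftAssociative(rightNested), data)));
// [ 'Anakin Skywalker' ]
```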

`/` in the query results in parse error

https://runkit.com/embed/j4erp5p1jqly

var lucene = require("lucene")
lucene.parse('field:test/')

results in

peg$SyntaxError: Expected "!", "&&", "(", "+", "-", ".", "AND NOT", "AND", "NOT", "OR NOT", "OR", "[", "\"", "\\", "^", "{", "||", "~", [^: \t\r\n\x0C{}()"/\^~[\]], end of input, or whitespace but "/" found.

but according to https://lucene.apache.org/core/2_9_4/queryparsersyntax.html#Escaping%20Special%20Characters, / does not need to be escaped.

I'm not sure whether this is a bug in the parser or invalid Lucene syntax.
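
If the slash does have to be escaped for this parser, a small escaping helper can be applied to user-supplied term text before it is spliced into a query. The sketch below assumes backslash escaping in the style of the Java `QueryParser.escape` method, which in newer Lucene versions also covers `/` because slashes delimit regular-expression queries there. `escapeTerm` is a hypothetical name, and whether this parser accepts every escaped character is not confirmed here.

```javascript
// Hypothetical helper (not part of the lucene package): prefix each
// Lucene special character with a backslash.
function escapeTerm(value) {
  return String(value).replace(/[\\+\-!():^[\]"{}~*?|&\/]/g, '\\$&');
}

console.log(escapeTerm('test/'));            // test\/
console.log(`field:${escapeTerm('test/')}`); // field:test\/
```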

Whitespace after opening parenthesis is breaking parser

Consider this query:

lucene.parse('foo AND ( bar OR baz)');

// Output: SyntaxError 
Line 1, column 11: Expected "!", "&&", "+", "-", "AND NOT", "AND", "NOT", "OR NOT", "OR", "||", or whitespace but "b" found.

A SyntaxError is thrown because of the whitespace just after the opening parenthesis.
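
Until the grammar tolerates this, one caller-side workaround is to strip whitespace just inside parentheses before parsing. The sketch below is an assumption, not the library's behavior: `trimInsideParens` is a hypothetical helper that leaves double-quoted phrases untouched and does not account for backslash-escaped parentheses.

```javascript
// Hypothetical helper (not part of the lucene package).
function trimInsideParens(query) {
  // Quoted phrases are copied through; "( " becomes "(" and " )" becomes ")".
  const pattern = /("(?:[^"\\]|\\.)*")|\(\s+|\s+\)/g;
  return query.replace(pattern, (match, quoted) =>
    quoted !== undefined ? quoted : match.trim()
  );
}

console.log(trimInsideParens('foo AND ( bar OR baz)'));
// foo AND (bar OR baz)
```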

Grammar potentially incorrectly parses fields with whitespaces before terms

Firstly, many thanks for the great library!

I have been testing it a lot and found what I believe to be a small issue. Parsing a field with one or more spaces after the colon followed by another term incorrectly groups the term with the field.

Example: color:     red parses as

{
   "left": {
      "field": "color",
      "term": "red",
      ...
   }
}

I would expect it to parse as a syntax error or two separate terms. I am by no means an expert so I could be mistaken.

ES6 modules

Would you consider using ES6 modules?

Thanks :)

Problem stringifying the AST with a parenthesized negated expression

If a parenthesized expression has a start (no left-hand expression), the parentheses are not placed correctly when stringifying the AST.

Example:

const { parse, toString } = require('lucene')

toString(parse('my.prop:value1 AND (NOT _exists_:other.prop OR other.prop:value2)'))
// Result is -> "my.prop:value1 AND NOT (_exists_:other.prop OR other.prop:value2)"

At a glance, the fix should be simple: check whether parenthesized is set when concatenating start, and make sure start is not set when adding an opening parenthesis for a parenthesized left-hand expression.

Date rounding is reported as an error

Hello,

First of all, thanks for providing this great library (very handy in many cases)!
I have noticed that date rounding is reported as an error.

Consider the following valid Lucene query:

dateModified_date:[NOW/YEAR TO NOW]

The slash after "NOW" is reported as unexpected:

Line 1, column 23: Expected ".", "TO", "\\", [^ \t\r\n\x0C{}()"/\^~[\]], or whitespace but "/" found.

We get the very same result when using the PEG grammar defined in this repository with PEG.js online.

Thanks for your attention

Malformed return queries from the toString operation

When submitting a query like name:(-frank), the "toString" operation returns name:(-frank, without the closing parenthesis.
Example:

const lucene = require('lucene');

const ast = lucene.parse('name:(-frank)');
console.log(ast);

// {
// left:
// { left:
// { field: '',
// term: 'frank',
// quoted: false,
// similarity: null,
// boost: null,
// prefix: '-' },
// parenthesized: true,
// field: 'name'
// }
// }

console.log(lucene.toString(ast));
// name:(-frank


I found that if I modify the code in the toString.js file as below, it works. What do you think?
.....
if (ast.left) {
    if (ast.parenthesized) {
        result += '(';
    }
    result += toString(ast.left);

    if (ast.parenthesized && !ast.right) {
        result += ')';
    }
}

......

Falsy numbers as term values lead to invalid queries in toString()

When a node has term: 0, the query returned by toString() lacks a value:

> const lucene = require("lucene");
> lucene.toString({
... "left": {
..... "field": "field",
..... "fieldLocation": {
....... "start": {
......... "offset": 0,
......... "line": 1,
......... "column": 1
......... },
....... "end": {
......... "offset": 5,
......... "line": 1,
......... "column": 6
......... }
....... },
..... "term": 0,  // <-----
..... "quoted": false,
..... "regex": false,
..... "termLocation": {
....... "start": {
......... "offset": 6,
......... "line": 1,
......... "column": 7
......... },
....... "end": {
......... "offset": 7,
......... "line": 1,
......... "column": 8
......... }
....... },
..... "similarity": null,
..... "boost": null,
..... "prefix": null
..... }
... });
'field:'

Changing "term": 0 to "term": "0" fixes the problem, returning 'field:0'.

I think this is due to checking falsy values in these places:

if (ast.term || (ast.term === '' && ast.quoted)) {

if (ast.term_min) {

Of course, a workaround is to always use strings as term values, but allowing numbers and returning an invalid query like this is very confusing.
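
The difference between a truthiness check and an explicit presence check can be shown in isolation. In the sketch below, `hasTermOriginal` mirrors the first condition quoted above, while `hasTerm` is a hypothetical replacement that only treats null, undefined, and an unquoted empty string as "no term". Neither function is taken from the library's source beyond that quoted condition.

```javascript
// Mirrors the condition quoted in the issue: a numeric 0 is falsy, so it is skipped.
function hasTermOriginal(ast) {
  return Boolean(ast.term || (ast.term === '' && ast.quoted));
}

// Hypothetical replacement: test for presence explicitly instead of truthiness.
function hasTerm(ast) {
  if (ast.term === null || ast.term === undefined) return false;
  if (ast.term === '') return Boolean(ast.quoted);
  return true;
}

console.log(hasTermOriginal({ term: 0 })); // false
console.log(hasTerm({ term: 0 }));         // true
```

The same explicit comparison would apply to the term_min check.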
