Code Monkey home page Code Monkey logo

sql's Introduction

OpenSearch SQL

OpenSearch enables you to extract insights out of OpenSearch using the familiar SQL or Piped Processing Language (PPL) query syntax. Use aggregations, group by, and where clauses to investigate your data. Read your data as JSON documents or CSV tables so you have the flexibility to use the format that works best for you.

The following projects are related to the SQL plugin, but stored in the different repos. Please refer to links below for details. This document will focus on the SQL plugin for OpenSearch.

Code Summary

SQL Engine

Test and build [![SQL CI][sql-ci-badge]][sql-ci-link]
Code coverage codecov
Distribution build tests OpenSearch IT tests OpenSearch IT code
Backward compatibility tests BWC tests

Repository Checks

DCO Checker Developer certificate of origin
Link Checker Link Checker

Issues

good first issues open
features open
enhancements open
bugs open
untriaged open
nolabel open

Highlights

Besides basic filtering and aggregation, OpenSearch SQL also supports complex queries, such as querying semi-structured data, JOINs, set operations, sub-queries etc. Beyond the standard functions, OpenSearch functions are provided for better analytics and visualization. Please check our documentation for more details.

Recently we have been actively improving our query engine primarily for better correctness and extensibility. Behind the scene, the new enhanced engine has already supported both SQL and Piped Processing Language. Please find more details in SQL Engine V2 - Release Notes.

Documentation

Please refer to the SQL Language Reference Manual, Piped Processing Language (PPL) Reference Manual, OpenSearch SQL/PPL Engine Development Manual and Technical Documentation for detailed information on installing and configuring plugin.

Forum

For additional help with the plugin, including questions about opening an issue, visit the OpenSearch Forum.

Contributing

See developer guide and how to contribute to this project.

Attribution

This project is based on the Apache 2.0-licensed elasticsearch-sql project. Thank you eliranmoyal, shi-yuan, ansjsun and everyone else who contributed great code to that project. Read this for more details Attributions.

Code of Conduct

This project has adopted an Open Source Code of Conduct.

Security

If you discover a potential security issue in this project we ask that you notify OpenSearch Security directly via email to [email protected]. Please do not create a public GitHub issue.

License

See the LICENSE file for our project's licensing. We will ask you to confirm the licensing of your contribution.

Copyright

Copyright OpenSearch Contributors. See NOTICE for details.

sql's People

Contributors

abbashus avatar acarbonetto avatar arsen-es avatar chloe-zh avatar dai-chen avatar davidcui1225 avatar dblock avatar dependabot[bot] avatar derek-ho avatar forestmvey avatar gabefernandez310 avatar galkk avatar jackiehanyang avatar jngz-es avatar jordanw-bq avatar joshuali925 avatar kavithacm avatar margarit-h avatar matthewryanwells avatar mitchellgale avatar opensearch-trigger-bot[bot] avatar penghuo avatar peterzhuamazon avatar ps48 avatar rupal-bq avatar seankao-az avatar vamsi-amazon avatar ykmr1224 avatar yury-fridlyand avatar zhongnansu avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

sql's Issues

[BUG] Highlight.js used is EOL

Describe the bug
A EOL warning is displayed

To Reproduce
Steps to reproduce the behavior:

  1. Open Query Workbench
  2. Click "Explain"
  3. View the developer console in the browser and see the message "Version 9 of Highlight.js has reached EOL and is no longer supported. Please upgrade or ask whatever dependency you are using to upgrade. https://github.com/highlightjs/highlight.js/issues/2877"

Expected behavior
No EOL messages

Plugins
All OpenSearch Plugins

Host/Environment (please complete the following information):

  • OS: [e.g. iOS] MacOS
  • Version [e.g. 22] Chrome Version 90.0.4430.212 (Official Build) (x86_64)
    (OpenSearch 1.0-RC1)

Data type conversion improvement

Is your feature request related to a problem? Please describe.
Currently the type check is very strict and thus not very user friendly. We need to support implicit casting and make it consistent with explicit casting. For example: if A can be converted to B implicitly, this should be done by explicit CAST function too.

  1. opendistro-for-elasticsearch/sql#1056: Comparison between date and string
  2. opendistro-for-elasticsearch/sql#1061: Comparison between bool and string
  3. opendistro-for-elasticsearch/sql#510: Comparison between date and long?

Cast function in PPL: opendistro-for-elasticsearch/sql#924

More SQL types supported:

  1. DECIMAL
  2. SIGNED INT
  3. CHAR
  4. etc

Describe the solution you'd like
Relax the type check by well defined conversion rules.

Describe alternatives you've considered
Follow the strict data type grammar, such as TIMESTAMP(...).

[Feature] Support querying a data stream

Is your feature request related to a problem? Please describe.

A data stream is an index abstraction very similar to an index alias. Under the hood, it points to multiple backing indices and allows searches using a single named resource. SQL/PPL already supports querying an index alias by resolving the index expression to its corresponding concrete indices.

But when trying to query a data stream, the index expression resolution doesnโ€™t include backing indices of a data stream and results in the following error.

POST /_opensearch/_sql
{
  "query": "SELECT * FROM my-data-stream LIMIT 10"
}

{
  "error": {
    "reason": "Error occurred in OpenSearch engine: no such index [my-data-stream]",
    "details": "org.opensearch.index.IndexNotFoundException: no such index [my-data-stream]\nFor more details, please send request for Json format to see the raw response from OpenSearch engine.",
    "type": "IndexNotFoundException"
  },
  "status": 404
}

Describe the solution you'd like

Index expression resolution should include backing indices of a data stream to make the query behaviour of data streams consistent with index aliases.

Additional context

This change builds on top of the following feature to support data streams in OpenSearch:
opensearch-project/OpenSearch#675

Disable the DELETE operation by default.

User manual: https://github.com/opensearch-project/sql/blob/main/docs/user/dml/delete.rst

1.1 Description

For some reason, the DELETE statement was not removed from NLPchina-ES-SQL codebase before ODFE launch. This is the only DML supported and makes SQL plugin non readonly.

Requests from community:

  1. opendistro-for-elasticsearch/sql#1003
  2. opendistro-for-elasticsearch/sql#1008

1.2 Proposal & Impact

  • Option 1 Disable completely: Users wonโ€™t be able to delete docs via SQL endpoint. From then on, they can only do this by DSL.
  • Option 2 Disable by setting: Add a new setting to disable DELETE by default, however user with permission can change the setting to enable DELETE.

OpenSearch renaming in doctest log

Describe the bug
One of the prerequisite of the doc test is to start a local cluster, currently the doctest is logging this task as :doctest:startES. Not sure if there is any other similar legacy naming in doc test.

To Reproduce
Steps to reproduce the behavior:

  1. run ./gradlew :doctest:doctest or ./gradlew build
  2. See error when doctest is started

Expected behavior
startOpenSearch or startOS

Plugins
SQL

Host/Environment (please complete the following information):

  • OS: MacOS
  • Version: Catalina 10.15.7

Additional context
Add any other context about the problem here.

Index expression improvement

Is your feature request related to a problem? Please describe.
Index expression is supposed to be defined clearly in both SQL and PPL.

  1. opendistro-for-elasticsearch/sql#842: Comma-separated index names in PPL
  2. opendistro-for-elasticsearch/sql#169: Multi-index in SQL
  3. opendistro-for-elasticsearch/sql#221: Conflicts with comma join

Describe the solution you'd like
A clear and concise description of what you want to happen.

Describe alternatives you've considered
A clear and concise description of any alternative solutions or features you've considered.

Additional context
Add any other context or screenshots about the feature request here.

Why aggregation still use ElasticsearchQueryRequest instead of ElasticsearchScrollRequest

ElasticsearchQueryRequest has a size limit(default is 200) which make aggregation did not work well. Apparently you know that cause I see the comment in class of ElasticsearchNodeClient:

  /**
   * TODO: Scroll doesn't work for aggregation. Support aggregation later.
   */
  @Override
  public ElasticsearchResponse search(ElasticsearchRequest request) {
    return request.search(
        req -> client.search(req).actionGet(),
        req -> client.searchScroll(req).actionGet()
    );
  }

And I see ElasticsearchScrollRequest is already there, but ElasticsearchIndexScan still use ElasticsearchQueryRequest in its constructor function:

  /**
   * Todo.
   */
  public ElasticsearchIndexScan(ElasticsearchClient client,
                                Settings settings, String indexName,
                                ElasticsearchExprValueFactory exprValueFactory) {
    this.client = client;
    this.request = new ElasticsearchQueryRequest(indexName,
            settings.getSettingValue(Settings.Key.QUERY_SIZE_LIMIT), exprValueFactory);
  }

I am wondering why.

Add support for distinct aggregation

Is your feature request related to a problem? Please describe.
A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]

Describe the solution you'd like
Add support for the following aggregations. Note that COUNT(DISTINCT) is highest priority and others are optional. MySQL reference: https://dev.mysql.com/doc/refman/8.0/en/aggregate-functions.html#function_count

  1. COUNT(DISTINCT expr)
  2. AVG(DISTINCT expr)
  3. SUM(DISTINCT expr)
  4. MIN(DISTINCT expr)
  5. MAX(DISTINCT expr)

PPL may also needs to support count distinct. Reference: https://docs.splunk.com/Documentation/SCS/current/SearchReference/Aggregatefunctions#distinct_count.28.26lt.3Bvalue.26gt.3B.29_or_dc.28.26lt.3Bvalue.26gt.3B.29

Describe alternatives you've considered
N/A

[BUG] Non-paginated results returned silently if query not supported by cursor

Describe the bug
If the query is not supported by cursor to paginate, such as GROUP BY, full result set is returned silently instead of error with clear message.

To Reproduce

  1. Create any index
  2. Enable cursor setting manually or merge PR #75
  3. Run a GROUP BY query
  4. See the empty response as below
curl -H 'Content-Type: application/json' -XPOST "http://localhost:9200/_plugins/_sql" -d'
{
   "query": "SELECT state.keyword, count(1) FROM accounts GROUP BY state",
   "fetch_size": 10
}'
{
  "schema": [
    {
      "name": "state.keyword",
      "type": "double"
    },
    {
      "name": "COUNT(1)",
      "type": "double"
    }
  ],
  "total": 51,
  "datarows": [
    [
      "TX",
      0
    ],
    [
      "MD",
      0
    ],
    [
      "ID",
    ......
  ],
  "size": 51,
  "status": 200
}

Expected behavior
Expect an error thrown with clear message saying that the type of query is not supported for pagination yet.

Host/Environment (please complete the following information):

  • OS: [e.g. iOS] Mac
  • Version [e.g. 22] OpenSearch beta-1 (or current develop branch)

[BUG] Tableau integration with OpenDistro using JDBC driver

Describe the bug
I am trying to connect to a remote OpenDistro (v1.13.2) using Tableau (v2021.1) via the JDBC driver (opendistro-sql-jdbc-1.13.0.0.jar). Tableau is able list the indexes and the fields in each index. It is able to associate the data type of each field appropriately (STRING vs DOUBLE etc). However, when I try to create a Tableau chart by dragging/dropping index fields into the Tableau column/row boxes, I get the following error:

2021-05-06 13:16:03.900 -0500 (,,,,2,77) grpc-default-executor-6 : INFO com.tableau.connect.service.ProtocolService - Running query
SELECT kibana_sample_data_flights.Carrier AS Carrier
FROM kibana_sample_data_flights
GROUP BY kibana_sample_data_flights.Carrier
2021-05-06 13:16:03.900 -0500 (,,,,2,77) grpc-default-executor-6 : INFO com.tableau.connect.grpc.GrpcProtocolService - End local request 77 /runQuery.
2021-05-06 13:16:03.911 -0500 (,,,,2,77) pool-3-thread-4 : INFO com.tableausoftware.jdbc.FetchSizeHelpers - Setting max result buffer size to 322122547 bytes, 30% of the max heap size.
2021-05-06 13:16:03.911 -0500 (,,,,2,77) pool-3-thread-4 : INFO com.tableausoftware.jdbc.FetchSizeHelpers - Setting adaptive fetch size to 2000.
2021-05-06 13:16:03.911 -0500 (,,,,2,77) pool-3-thread-4 : INFO com.tableausoftware.data.ProtobufJDBCResultSet - BindingInfo{columnName='Carrier', columnTypeName='DOUBLE', columnLabel='Carrier', columnType=8, precision=15, scale=0, columnDisplaySize=25, protobufType=DOUBLE, nullable=unknown}
2021-05-06 13:16:03.911 -0500 (,,,,2,77) pool-3-thread-4 : ERROR com.tableau.connect.service.QueryTask - Query for protocol 2 exiting with error class java.sql.SQLDataException
2021-05-06 13:16:03.911 -0500 (,,,,2,78) grpc-default-executor-6 : ERROR com.tableau.connect.grpc.GrpcProtocolService - Failed in fetchResults. SQLState=null, ErrorCode=0.
java.sql.SQLDataException: Can not parse Logstash Airways as a Double

To Reproduce
Steps to reproduce the behavior:

  1. Open up Tableau 2021.1
  2. Select "Other Databases (JDBC)".
  3. In the "Other Databases (JDBC) dialog type:
  • URL: jdbc:elasticsearch://https://[remote-opendistro-server-dns-or-ip]:9200/
  • Dialect: you can select any one but I selected MySQL
  • Username: [userid for your local OpenDistro]
  • Password: [password for your local OpenDistro]
  • Properties file: [path to your properties file]. Create a properties file with the following contents:
    • auth=BASIC
    • hostnameVerification=true
    • user=xxxxx
    • password=xxxxx
    • trustSelfSigned=true
    • useSSL=true
    • trustStoreLocation=[path to jks file]
    • trustStorePassword=xxxxx
  1. Click on "Sign In" button

  2. In the next screen select the desired table/index. I selected the kibana_sample_data_flights index. You should then see the field names from the index displayed.

  3. Click on the "Sheet 1" tab to create a new chart.

  4. Drag/Drop a string field into the "Columns/Rows" boxes. I dragged the "Carrier" field into the Columns box You should then get an error like:
    An error occurred while communicating with the Other Databases (JDBC) data source 'kibana_sample_data_flights (jdbc:elasticsearch://https://wd14efkode01.aci.awscloud:9200/)'
    Bad Connection: Tableau could not connect to the data source.
    Error Code: FAB9A2C5
    Can not parse Logstash Airways as a Double
    SELECT kibana_sample_data_flights.Carrier AS Carrier
    FROM kibana_sample_data_flights
    GROUP BY kibana_sample_data_flights.Carrier

  5. Check the Tableau logs and you will see that the error is:
    java.sql.SQLDataException: Can not parse Logstash Airways as a Double
    at com.amazon.opendistroforelasticsearch.jdbc.types.TypeHelper.stringConversionException(TypeHelper.java:33) ~[?:?]
    at com.amazon.opendistroforelasticsearch.jdbc.types.DoubleType.asDouble(DoubleType.java:50) ~[?:?]
    at com.amazon.opendistroforelasticsearch.jdbc.types.DoubleType.fromValue(DoubleType.java:38) ~[?:?]
    at com.amazon.opendistroforelasticsearch.jdbc.types.DoubleType.fromValue(DoubleType.java:22) ~[?:?]
    at com.amazon.opendistroforelasticsearch.jdbc.types.BaseTypeConverter.convert(BaseTypeConverter.java:60) ~[?:?]
    at com.amazon.opendistroforelasticsearch.jdbc.ResultSetImpl.getObjectX(ResultSetImpl.java:589) ~[?:?]
    at com.amazon.opendistroforelasticsearch.jdbc.ResultSetImpl.getObjectX(ResultSetImpl.java:583) ~[?:?]
    at com.amazon.opendistroforelasticsearch.jdbc.ResultSetImpl.getDoubleX(ResultSetImpl.java:305) ~[?:?]
    at com.amazon.opendistroforelasticsearch.jdbc.ResultSetImpl.getDouble(ResultSetImpl.java:299) ~[?:?]
    at com.tableausoftware.data.ProtobufJDBCResultSet.addValue(ProtobufJDBCResultSet.java:451) ~[jdbcserver.jar:20211.0.17]
    at com.tableausoftware.data.ProtobufJDBCResultSet.addRow(ProtobufJDBCResultSet.java:410) ~[jdbcserver.jar:20211.0.17]
    at com.tableausoftware.data.ProtobufJDBCResultSet.fetchRows(ProtobufJDBCResultSet.java:230) ~[jdbcserver.jar:20211.0.17]
    at com.tableausoftware.data.ProtobufJDBCResultSet.nextBlockImpl(ProtobufJDBCResultSet.java:282) ~[jdbcserver.jar:20211.0.17]
    at com.tableausoftware.data.ProtobufResultSet.nextBlock(ProtobufResultSet.java:48) ~[jdbcserver.jar:20211.0.17]
    at com.tableau.connect.service.QueryTask.readData(QueryTask.java:139) ~[jdbcserver.jar:20211.0.17]
    at com.tableau.connect.service.QueryTask.call(QueryTask.java:102) ~[jdbcserver.jar:20211.0.17]
    at com.tableau.connect.service.QueryTask.call(QueryTask.java:56) ~[jdbcserver.jar:20211.0.17]
    at java.util.concurrent.FutureTask.run(Unknown Source) ~[?:?]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) ~[?:?]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) ~[?:?]
    at java.lang.Thread.run(Unknown Source) [?:?]
    Caused by: java.lang.NumberFormatException: For input string: "Logstash Airways"
    at jdk.internal.math.FloatingDecimal.readJavaFormatString(Unknown Source) ~[?:?]
    at jdk.internal.math.FloatingDecimal.parseDouble(Unknown Source) ~[?:?]
    at java.lang.Double.parseDouble(Unknown Source) ~[?:?]
    at java.lang.Double.valueOf(Unknown Source) ~[?:?]
    at com.amazon.opendistroforelasticsearch.jdbc.types.DoubleType.asDouble(DoubleType.java:48) ~[?:?]
    ... 19 more

Or for query issues:

  1. Create the index with test documents by 'POST ...'
  2. Run the query 'SELECT ...'
  3. See error

Expected behavior
I expect that the integration between Tableau/OpenDistro via JDBC driver works for the remote OpenDistro like it does for the local Opendistro.

Plugins
Please list all plugins currently enabled.

Screenshots
If applicable, add screenshots to help explain your problem.

Host/Environment (please complete the following information):

  • OS: [RedHat Linux]
  • Version [Red Hat Enterprise Linux Server 7.9 (Maipo)]
  • Kernel: Linux 3.10.0-1160.25.1.el7.x86_64
  • OpenDistro v1.13.2
  • Tableau Desktop For Mac v2021.1

Additional context
Add any other context about the problem here.

Replace the icon with OpenSearch new icon image in ODBC installers

OpenSearch RC 1 release

Get SQL/PPL, JDBC, ODBC, CLI and Query Workbench ready for OpenSearch RC 1 release, including:

  1. Build against OpenSearch 1.x branch (1.0 later) instead of beta1 or rc1 tag.
  2. Version bump and rename artifacts to xxx-1.0.0.0-rc1.zip
  3. Add release notes
  4. Merge to main branch and cut release branch from there
  5. Tag and publish release

Plugin Settings Backwards Compatibility

  1. Add support backwards compatibility for SQL plugin settings
  2. Currently PPL settings can be only changed via Elasticsearch _cluster/settings, we need to enable PPL settings in plugin settings using the similar endpoint with SQL settings. Related issue: opendistro-for-elasticsearch/sql#981
  3. Change the settings to consistently in _plugins/_query/settings endpoint and in plugins.sql/ppl/query.* naming for SQL/PPL/common settings

OpenSearch function support

Is your feature request related to a problem? Please describe.
Currently both SQL (new engine) and PPL doesn't support any OpenSearch functions. We should prioritize this, especially the most common full text search and aggregate functions.

Here are the feature requests from community:

  1. opendistro-for-elasticsearch/sql#1093: Percentile function support in PPL stats
  2. opendistro-for-elasticsearch/sql#1075: Geo function support in SQL
  3. opendistro-for-elasticsearch/sql#878: Geo point support
  4. opendistro-for-elasticsearch/sql#530: Histogram
  5. opendistro-for-elasticsearch/sql#313: Geo point support

Geo point: opendistro-for-elasticsearch/sql#1064
Meta fields: opendistro-for-elasticsearch/sql#1038
Aggregate params: opendistro-for-elasticsearch/sql#615
Array data: opendistro-for-elasticsearch/sql#618

Describe the solution you'd like
Expose OpenSearch functions in SQL/PPL.

Describe alternatives you've considered
N/A

Additional context
Add any other context or screenshots about the feature request here.

[BUG] Simple SELECT queries fail to be anonymized

Describe the bug
QueryDataAnonymizer throws exception and fail to handle simple SELECT queries.

To Reproduce
Steps to reproduce the behavior:

  1. Use test index accounts
  2. Run the query SELECT * FROM accounts GROUP BY state HAVING count(1) > 10
  3. See error

Expected behavior
Expect identifier names in the original query be anonymized.

Host/Environment (please complete the following information):

  • OS: [e.g. iOS] MacOS
  • Version [e.g. 22] OpenSearch SQL 1.0.0.0-beta1

Additional context

Error logs as below:

[2021-04-27T14:34:54,532][WARN ][c.a.o.s.l.u.QueryDataAnonymizer] Caught an exception when anonymizing sensitive data
[2021-04-27T14:34:54,534][INFO ][c.a.o.s.l.p.RestSqlAction] Incoming request /_opensearch/_sql?format=jdbc: SELECT * FROM accounts GROUP BY state HAVING count(1) > 10;

JOIN support improvement

Is your feature request related to a problem? Please describe.

There is limited support for JOIN queries which needs improvement. In particular, post processing such as filtering and aggregation after inner/outer join is most wanted. This capability is required by PPL lookup command.

Here are the feature requests from community:

  1. opendistro-for-elasticsearch/sql#505: Left join
  2. opendistro-for-elasticsearch/sql#351: Multi-join
  3. opendistro-for-elasticsearch/sql#301: Aggregate after join
  4. opendistro-for-elasticsearch/sql#289: Select function after join
  5. opendistro-for-elasticsearch/sql#236: Aggregate after join
  6. opendistro-for-elasticsearch/sql#221: Comma join
  7. opendistro-for-elasticsearch/sql#124: Left join with WHERE
  8. opendistro-for-elasticsearch/sql#110: Aggregate after join
  9. opendistro-for-elasticsearch/sql#987: Multi-join

Describe the solution you'd like
N/A

Describe alternatives you've considered
N/A

Additional context
Add any other context or screenshots about the feature request here.

[BUG] Legacy settings are rejected by RestQuerySettingAction

Describe the bug
A clear and concise description of what the bug is.

To Reproduce
Steps to reproduce the behavior:
Send request from OpenSearch Dashboards dev tools:

PUT _plugins/_query/settings
{
  "transient" : {
    "opendistro.sql.enabled" : "true"
  }
}

Got error:

{
  "error" : {
    "root_cause" : [
      {
        "type" : "action_request_validation_exception",
        "reason" : "Validation Failed: 1: no settings to update;"
      }
    ],
    "type" : "action_request_validation_exception",
    "reason" : "Validation Failed: 1: no settings to update;"
  },
  "status" : 400
}

Expected behavior
Setting should be able to update with either legacy setting opendistro.sql.*, opendistro.ppl.*, opendistro.query.* or the new setting plugins.sql.*, plugins.ppl.*, plugins.query.*

Plugins
Please list all plugins currently enabled.
SQL

Host/Environment (please complete the following information):

  • OS: [e.g. iOS] MacOS
  • Version [e.g. 22] Catalina

Additional context
Add any other context about the problem here.

[BUG] Query creates unexpected behaviour and cannot be explained

Describe the bug
Some queries will execute and yet cannot be explained. Query result was unexpected.

To Reproduce
Steps to reproduce the behavior:

  1. Go to Query Workbench
  2. Run SHOW tables LIKE .% (note the .)
  3. Click the Run button
  4. See the results (I have two less here than with SHOW tables LIKE % - why? ๐Ÿคทโ€โ™‚๏ธ Doesn't seem to follow the wildcard pattern)
  5. Click 'Explain'
  6. The results will be "This query is not explainable."

Expected behavior
Any run-able query can be explained.

Plugins
All OpenSearch Plugins.

Host/Environment (please complete the following information):

  • OS: MacOs
  • Version Chrome Version 90.0.4430.212 (Official Build) (x86_64)
  • OpenSearch 1.0 RC-1

Add support for stats aggregate function

Is your feature request related to a problem? Please describe.
Currently stats aggregate function support is missing in both SQL and PPL.

Describe the solution you'd like
Add support for the following stats aggregate function in SQL. MySQL reference: https://dev.mysql.com/doc/refman/8.0/en/aggregate-functions.html#function_std

1.STD
2.STDDEV
3.STDDEV_POP
4.STDDEV_SAMP
5.VAR_POP
6.VAR_SAMP
7.VARIANCE

PPL also needs to support some of them. Reference: https://docs.splunk.com/Documentation/SCS/current/SearchReference/Aggregatefunctions#stdev.28.26lt.3Bvalue.26gt.3B.29

Note that these functions may be optimized by pushing down to OpenSearch stats/extended_stats aggregation API: https://docs-beta.opensearch.org/docs/opensearch/metric-agg/#stats-extended_stats-matrix_stats. This can be part of #44 so this task can focus on basic implementation without pushdown.

Describe alternatives you've considered
N/A

Set operation improvement

Is your feature request related to a problem? Please describe.
Improvements on set operations such as union and intersection by migrating to the new SQL engine.

Here are the feature requests from community:

  1. opendistro-for-elasticsearch/sql#388: Old set operation only works with JSON format.
  2. opendistro-for-elasticsearch/sql#371: Union multi indices
  3. opendistro-for-elasticsearch/sql#114: Explain issue

Describe the solution you'd like
N/A

Describe alternatives you've considered
N/A

Additional context
Add any other context or screenshots about the feature request here.

Disable the legacy SQL/PPL Setting

1.1 Descriptions
The following SQL plugin settings may be deprecated:

1.2 Proposal & Impacts

  • [A] Enable and remove the setting: Users cannot disable new engine: should be fine because new engine is already turned on in ODFE 1.13.
  • [B] Disable and remove all the settings: No semantic check when queries fall back.
  • [C] Make JDBC default and remove the setting: This was added long time ago when default response format switched from JSON to JDBC (table). Should be fine to remove this directly.

Open question: Should JSON formatter be deprecated? Currently queries are handled by old engine if format=json present. This may confuse users if the old engine behaves different due to missing feature.

[BUG] Why nested has a limit of 3?

Search for "size" : 3, in the text below.

Why this limit? It appears returning only three result top.

I tried to look in code but I couldn't find where it is defined.
Can anyone point it to me?

SQL query:

POST /_opendistro/_sql
{
  "query" : """
        SELECT e.name AS employeeName,
               p.name AS projectName
        FROM employees_nested AS e,
             e.projects AS p
        WHERE p.name LIKE '%security%'
        """
}

Explain:

{
  "from" : 0,
  "size" : 200,
  "query" : {
    "bool" : {
      "filter" : [
        {
          "bool" : {
            "must" : [
              {
                "nested" : {
                  "query" : {
                    "wildcard" : {
                      "projects.name" : {
                        "wildcard" : "*security*",
                        "boost" : 1.0
                      }
                    }
                  },
                  "path" : "projects",
                  "ignore_unmapped" : false,
                  "score_mode" : "none",
                  "boost" : 1.0,
                  "inner_hits" : {
                    "ignore_unmapped" : false,
                    "from" : 0,
                    "size" : 3,
                    "version" : false,
                    "seq_no_primary_term" : false,
                    "explain" : false,
                    "track_scores" : false,
                    "_source" : {
                      "includes" : [
                        "projects.name"
                      ],
                      "excludes" : [ ]
                    }
                  }
                }
              }
            ],
            "adjust_pure_negative" : true,
            "boost" : 1.0
          }
        }
      ],
      "adjust_pure_negative" : true,
      "boost" : 1.0
    }
  },
  "_source" : {
    "includes" : [
      "name"
    ],
    "excludes" : [ ]
  }
}

Enable scalar function aggregation in PPL

Is your feature request related to a problem? Please describe.

opendistro-for-elasticsearch/sql#1079

Describe the solution you'd like
Actually the core engine already has this capability. What is needed is just to turn on this in ANTLR grammar in PPL.

Describe alternatives you've considered
A clear and concise description of any alternative solutions or features you've considered.

Additional context
Add any other context or screenshots about the feature request here.

Nested field query improvement

Is your feature request related to a problem? Please describe.
Nested field query is supported by following PartiQL specification: https://github.com/opensearch-project/sql/blob/main/docs/user/beyond/partiql.rst. There are limitations regarding the nested levels and query types that needs improvement.

  1. opendistro-for-elasticsearch/sql#1055: Comparison between nested field
  2. opendistro-for-elasticsearch/sql#963: ?
  3. opendistro-for-elasticsearch/sql#797: Nested field involved in join
  4. opendistro-for-elasticsearch/sql#399: Select *
  5. opendistro-for-elasticsearch/sql#398: Aggregate over nested field
  6. opendistro-for-elasticsearch/sql#245: Nested field involved in join
  7. opendistro-for-elasticsearch/sql#155: Aggregate over nested field
  8. opendistro-for-elasticsearch/sql#152: Deep nested field
  9. opendistro-for-elasticsearch/sql#87: Filter nested field

Describe the solution you'd like
N/A

Describe alternatives you've considered
N/A

Additional context
Add any other context or screenshots about the feature request here.

Subquery support improvement

Is your feature request related to a problem? Please describe.
Currently there are only limited support for subqueries in FROM and WHERE clause: https://github.com/opensearch-project/sql/blob/main/docs/user/dql/complex.rst#subquery.

Here are the feature requests from community:

  1. [Low] opendistro-for-elasticsearch/sql#1109: Subqueries in SELECT clause
  2. [Medium] opendistro-for-elasticsearch/sql#1111: JOIN between subqueries

Describe the solution you'd like
N/A

Describe alternatives you've considered
N/A

Additional context
Add any other context or screenshots about the feature request here.

Ensure backwards compatibility with ODFE

Coming from opensearch-plugins#12, enable upgrade to OpenSearch, while supporting backward compatibility.

  • Rename Namespaces (e.g. com.amazon.opendistroforelasticsearch to org.opensearch)
  • Rename Classes, Methods, Variables
  • Rename Remaining Identifiers (e.g. Opendistro to OpenSearch).
  • Rename, and Backwards Compatibility for Rest APIs
  • Rename, and Backwards Compatibility for Settings
  • Rename, and Backwards Compatibility for Indices
  • Run in a backwards compatible way on top of OpenSearch 1.0 that has joined an ES 7.10.x cluster
  • Run in a backwards compatible way on top of OpenSearch 1.0 that has joined an ODFE 1.13.x cluster
  • Drop in replacement for the Opendistro version 1.13 of the plugin

Supporting IN in new Engine [BUG]

[2021-05-19T01:03:13,199][INFO ][c.a.o.s.l.p.RestSQLQueryAction] [elasticsearch_wqrgrid0660_0] Request is falling back to old SQL engine due to: Failed to parse query due to offending symbol [in] at: '
    SELECT ID FROM crygojxttsi6fvsthw2eaq 
    WHERE (Type in' <--- HERE... More details: Expecting tokens in {'AND', 'IS', 'LIKE', 'NOT', 'OR', 'REGEXP', '*', '/', '%', '+', '-', '=', '>', '<', '!', '.', ')'}```

It looks like the IN is not yet supported in the new engine. 

Some queries that have both `IN` and a comparison between two fields will fail because the old engine doesn't seem to be able to handle the comparison of the two fields, and the new one doesn't handle IN.

I would like to work on a patch, do you know of any specific blocker or macro steps that I should work on? 
A 3-4 liners practical guidance would be great :) 

Thank you!

How to enable Debug for integ-test ES server in elastic search

Running normal tests (without ES RestIntegTestTask) in intellij is very easy. We just import the project and run it it the Debug.

However I cannot find a way to run the underlying ES search server in debug mode.
I tried to add -Dcluster.debug -Dtest.debug, but it doesn't seem opening any port and the test will just run without stopping at breakpoints.

I tried to look at the code of the RestIntegTestTask gradle task (https://github.com/search?q=RestIntegTestTask&type=code), but not much insight from it.

Any advice?

Date time query improvement

Is your feature request related to a problem? Please describe.
Many date functions are already supported though some common ones are still missing (in new engine). Meanwhile time zone support is also high demand.

Here are the feature requests from community:

  1. Functions: opendistro-for-elasticsearch/sql#709
    1.1 opendistro-for-elasticsearch/sql#1073: NOW()
  2. Time zone: opendistro-for-elasticsearch/sql#1062

Describe the solution you'd like
N/A

Describe alternatives you've considered
N/A

Additional context
Add any other context or screenshots about the feature request here.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.