It would be really great, instead of using the REST API of ES, to allow querying ES di

Allow querying ES index directly with CQL about elassandra HOT 14 CLOSED

strapdata commented on May 22, 2024

Allow querying ES index directly with CQL

from elassandra.

Comments (14)

vroyer commented on May 22, 2024

Yes, it could be useful.
You can also use the JDBC sql4es driver to query Elassandra with search and group-by features. Here is a code example:

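import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.Statement;

// Register the sql4es JDBC driver, then connect on the ES transport port (9300).
Class.forName("nl.anchormen.sql4es.jdbc.ESDriver");
Connection con = DriverManager.getConnection("jdbc:sql4es://localhost:9300/twitter?cluster.name=Test%20Cluster");
Statement st = con.createStatement();
// Average size and tweet count per user.
ResultSet rs = st.executeQuery("SELECT user,avg(size),count(*) FROM tweet GROUP BY user");
ResultSetMetaData rsmd = rs.getMetaData();
int nrCols = rsmd.getColumnCount();
// Print every column of every row (JDBC columns are 1-indexed).
while (rs.next()) {
    for (int i = 1; i <= nrCols; i++) {
        System.out.println(rs.getObject(i));
    }
}
// Release resources in reverse order of creation.
rs.close();
st.close();
con.close();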

Of course, there is no connection failover, but it could be added by integrating the Cassandra driver (or a fat Cassandra client) to learn the status of the Cassandra nodes. The JDBC driver would then be fault tolerant.


ddorian commented on May 22, 2024

@vroyer It would be nicer to use the same cql-connection as you normally do with cassandra. This way you lower overhead by not keeping http connections to the nodes.


ddorian commented on May 22, 2024

@vroyer
Couldn't there be a simple function like /_search that takes a JSON string argument and returns a JSON string response? It would just call the internal /_search function and return one row with a single column holding a JSON string, like the normal response. The same could be done with /_msearch.
This should be much simpler to implement on the server side than lucene-index. And since ES returns much more than the "simple rows" that lucene-index does, this should be the way to go.
In the best case, it could also accept partition key(s) in the WHERE clause, so the CQL client can forward to the right node when using a TokenAwareBalancer.

Then each client can write a separate transport adapter if they want to support this interface (example in Python: https://elasticsearch-py.readthedocs.io/en/master/transports.html) that internally issues CQL queries but exposes the Elasticsearch API.

Does that make sense?


vroyer commented on May 22, 2024

Hi,
I'm not sure I understand your need, but if you need JSON REST access to C*, Elassandra can act as a gateway, as described at http://doc.elassandra.io/en/latest/mapping.html#elassandra-as-a-json-rest-gateway
Thanks,
Vincent.

(In reply to ddorian's comment of 6 Oct 2016, 13:07.)


ddorian commented on May 22, 2024

@vroyer I meant the exact opposite: accessing Elasticsearch from Cassandra.
My idea of the simplest implementation would be:
"SELECT es_search(json_string) FROM table WHERE partition_key=X LIMIT 1".
json_string = {index:'', doc_type:'', body:{}, params:{routing, source, etc}}
The es_search() function would just call the /_search function inside Elasticsearch and return one JSON blob, which is the body of the HTTP response that Elasticsearch normally returns.

This way you don't have to keep ES HTTP connections open, and you contact the minimum number of nodes (since ES has no routing-to-server mapping on the client, while Cassandra does).

The same would then apply to /_msearch, /_count, etc.

Does that make sense?
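
To make the proposal concrete, here is a minimal Java sketch of what a client might send. It is an illustration under stated assumptions only: es_search() is the hypothetical function proposed in this comment and does not exist in Cassandra or Elassandra, and the class and method names (EsSearchCql, envelope, statement) are invented for this example.

```java
/**
 * Sketch only: es_search() is the hypothetical CQL function proposed in this
 * thread. It is not an existing Cassandra or Elassandra function.
 */
final class EsSearchCql {

    /**
     * Builds the JSON envelope {index, doc_type, body, params} from the proposal.
     * Assumes index, docType and routing contain no characters that need JSON escaping.
     */
    static String envelope(String index, String docType, String bodyJson, String routing) {
        return "{\"index\":\"" + index + "\","
             + "\"doc_type\":\"" + docType + "\","
             + "\"body\":" + bodyJson + ","
             + "\"params\":{\"routing\":\"" + routing + "\"}}";
    }

    /**
     * CQL text with two bind markers: the JSON envelope and the partition key.
     * Binding the partition key is what would let a token-aware driver choose the node.
     */
    static String statement(String table, String partitionKeyColumn) {
        return "SELECT es_search(?) FROM " + table
             + " WHERE " + partitionKeyColumn + " = ? LIMIT 1";
    }
}
```

A client would prepare the statement once and bind the envelope and the partition key value on each call, then read the single JSON column of the single returned row.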


ddorian commented on May 22, 2024

@vroyer What do you think about #14 (comment)


vroyer commented on May 22, 2024

I'd prefer https://github.com/Anchormen/sql4es with failover+LB features from the C* driver (ES query in CQL won't work with regular reporting tools).

See:
https://docs.datastax.com/en/drivers/java/3.0/com/datastax/driver/core/Cluster.html#register-com.datastax.driver.core.Host.StateListener-
https://docs.datastax.com/en/drivers/java/3.0/com/datastax/driver/core/LatencyTracker.html

https://github.com/elastic/elasticsearch/blob/2.4/core/src/main/java/org/elasticsearch/client/transport/TransportClientNodesService.java => extend NodeSampler with C* driver monitoring features to provide HA+LB...

(In reply to ddorian's comment of 8 Nov 2016, 11:36.)


ddorian commented on May 22, 2024

@vroyer My whole reason was to use fewer resources/connections to the nodes when using it in your application. I don't want to learn sql4es, because it will suck compared to the ES query language (missing features, less flexibility, etc.).
Reporting can use the existing ES API.


vroyer commented on May 22, 2024

Do you mean that the Elasticsearch client API consumes more resources than the CQL driver (due to connection pooling)?

(In reply to ddorian's comment of 9 Nov 2016, 10:26.)


ddorian commented on May 22, 2024

@vroyer I mean that you will need two connections to each node, one for ES and one for CQL. And while you can _route on ES, you go to a random node, and that node knows how to REALLY _route to the right node. With CQL you can _route to the right node in one network hop (makes sense?).

I'm saying that the ES API is slower/less efficient because of the protocol (HTTP, JSON) and because the client has no metadata to route to the right node.
For example, in this case you could change the JSON input/output of Elasticsearch to msgpack in CQL. And you keep only one connection/driver in your app for talking to both ES and CQL.
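
The hop-count argument can be illustrated with a toy model. The Java sketch below is not how Cassandra or Elasticsearch compute placement: it assumes one token per node, derives tokens from String.hashCode() instead of a real partitioner, and ignores replication. RoutingHops and its methods are hypothetical names for this example.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy token ring: one token per node, no replication, hashCode() as the partitioner. */
final class RoutingHops {
    private final TreeMap<Integer, String> ring = new TreeMap<>();

    RoutingHops(List<String> nodes) {
        for (String node : nodes) {
            ring.put(token(node), node);
        }
    }

    /** Non-negative token for a node name or partition key. */
    static int token(String value) {
        return value.hashCode() & Integer.MAX_VALUE;
    }

    /** The node owning a key: first node clockwise from the key's token, wrapping around. */
    String owner(String partitionKey) {
        Map.Entry<Integer, String> entry = ring.ceilingEntry(token(partitionKey));
        return (entry != null ? entry : ring.firstEntry()).getValue();
    }

    /** 1 hop if the client contacted the owner directly, 2 if the contacted node must forward. */
    int hops(String contactedNode, String partitionKey) {
        return contactedNode.equals(owner(partitionKey)) ? 1 : 2;
    }
}
```

A client holding the token map can call owner() itself and always pay one hop. A client without it picks a node at random and pays two hops whenever that node is not the owner.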


vroyer commented on May 22, 2024

Yes, I agree, but the idea is to mix the two drivers to help the ES client driver connect to an available node.

Instead of doing connection pooling, the ES driver could get this information from the CQL driver (CQL uses server-to-client notifications whereas ES uses periodic polling), and for ES searches with routing, the ES driver could find the right node from the CQL driver's token map.

Also, the ES driver uses a binary protocol on 9300/tcp.
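
A rough Java sketch of that idea follows: a registry that is updated by node up/down notifications and that an ES client could consult when choosing a node. It does not use the DataStax driver API. LiveNodeRegistry, onUp, onDown, and pick are hypothetical names, and in a real integration the callbacks would be wired to the CQL driver's host state listener.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentSkipListSet;
import java.util.concurrent.atomic.AtomicInteger;

/** Sketch: live-node list fed by push notifications instead of periodic polling. */
final class LiveNodeRegistry {
    private final Set<String> live = new ConcurrentSkipListSet<>();
    private final AtomicInteger next = new AtomicInteger();

    /** Called when the CQL driver reports a node as available. */
    void onUp(String address) {
        live.add(address);
    }

    /** Called when the CQL driver reports a node as unavailable. */
    void onDown(String address) {
        live.remove(address);
    }

    /** Round-robin over the nodes currently known to be up. */
    String pick() {
        List<String> snapshot = new ArrayList<>(live);
        if (snapshot.isEmpty()) {
            throw new IllegalStateException("no live node");
        }
        return snapshot.get(Math.floorMod(next.getAndIncrement(), snapshot.size()));
    }
}
```

The ES client would call pick() before each request, so a node reported down by the CQL driver stops receiving ES traffic without waiting for the next polling interval.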

(In reply to ddorian's comment of 9 Nov 2016, 13:18.)


ddorian commented on May 22, 2024

I think 9300 has only a Java client and is only used for inter-node communication. They even built a REST API client for Java.

My idea would take:

1. Adding some functions to Cassandra that internally call the ES API (like search(), msearch(), etc.).
2. Either using the CQL client directly, or writing an adapter/transport (for each language/driver) to make use of that connection (in Python: https://elasticsearch-py.readthedocs.io/en/master/transports.html).

If using 2, you would just have to change the transport/adapter; ES would internally use CQL + msgpack and keep the same API on the client.

I don't know how your idea could be developed, though. Does it require more or less developer time on the server or the client?
With your idea, ES would still make an HTTP connection + JSON serialization on port 9200 (or you would need to write custom serializers, like the Java client, for each language).
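
Option 2 could look roughly like the Java sketch below: an adapter that keeps an ES-style search() call on the client but sends it through a CQL connection. Everything here is an assumption made for illustration: the search() CQL function is the hypothetical one from step 1, CqlSearchTransport is an invented class, and the BiFunction stands in for whatever a real CQL driver session would provide. JSON is used where the comment suggests msgpack, to keep the example dependency-free.

```java
import java.util.function.BiFunction;

/** Sketch: ES-style search() on the client, carried over a CQL connection. */
final class CqlSearchTransport {
    // Hypothetical hook: (CQL text, bound JSON argument) -> first column of the first row.
    private final BiFunction<String, String, String> cqlExecutor;
    private final String table;

    CqlSearchTransport(String table, BiFunction<String, String, String> cqlExecutor) {
        this.table = table;
        this.cqlExecutor = cqlExecutor;
    }

    /**
     * Wraps the ES request body and sends it via the hypothetical search() CQL function.
     * Assumes index contains no characters that need JSON escaping.
     */
    String search(String index, String bodyJson) {
        String request = "{\"index\":\"" + index + "\",\"body\":" + bodyJson + "}";
        return cqlExecutor.apply("SELECT search(?) FROM " + table + " LIMIT 1", request);
    }
}
```

Application code would keep calling search() as before; only the transport behind it changes, which is the point of the adapter approach.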


hkroger commented on May 22, 2024

Basically, the approach would be similar to this:
https://github.com/Stratio/cassandra-lucene-index

SELECT * FROM tweets WHERE expr(tweets_index, '{ filter: {type: "range", field: "time", lower: "2014/04/25", upper: "2014/05/01"} }');


vroyer commented on May 22, 2024

This feature is now supported in the Enterprise version of Elassandra, as described in the documentation. It also provides ES aggregation support from Apache Spark, as explained here.
