Comments (14)

vroyer commented on May 22, 2024

Yes, it could be useful.
You can also use the sql4es JDBC driver to query Elassandra with search and GROUP BY features. Here is an example:

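import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.Statement;

// Register the sql4es JDBC driver.
Class.forName("nl.anchormen.sql4es.jdbc.ESDriver");
// try-with-resources closes the result set, statement and connection even if the query fails.
try (Connection con = DriverManager.getConnection("jdbc:sql4es://localhost:9300/twitter?cluster.name=Test%20Cluster");
     Statement st = con.createStatement();
     ResultSet rs = st.executeQuery("SELECT user,avg(size),count(*) FROM tweet GROUP BY user")) {
    ResultSetMetaData rsmd = rs.getMetaData();
    int nrCols = rsmd.getColumnCount();
    while (rs.next()) {
        // JDBC column indexes start at 1.
        for (int i = 1; i <= nrCols; i++) {
            System.out.println(rs.getObject(i));
        }
    }
}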

Of course, there is no connection failover, but it could be added by integrating the Cassandra driver (or a fat Cassandra client) to track the status of the Cassandra nodes. The JDBC driver would then be fault tolerant.
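
A minimal sketch of what such failover could look like on the client, assuming the list of live hosts comes from somewhere else (for example, the Cassandra driver's node status tracking). `FailoverConnector`, `Connector` and the host list are hypothetical names introduced for illustration; only the `jdbc:sql4es://host:9300/...` URL shape is taken from the example above.

```java
import java.sql.SQLException;
import java.util.List;

/** Hypothetical helper: tries each candidate node in turn until one accepts a connection. */
final class FailoverConnector {

    /** Abstraction over the real connect call, so the sketch can be exercised without a cluster. */
    interface Connector<C> {
        C connect(String jdbcUrl) throws SQLException;
    }

    /** Builds the sql4es URL for one host, following the URL shape used in the example above. */
    static String urlFor(String host, String index, String clusterName) {
        return "jdbc:sql4es://" + host + ":9300/" + index
                + "?cluster.name=" + clusterName.replace(" ", "%20");
    }

    /** Returns the first successful connection, or rethrows the last failure if every node is down. */
    static <C> C connectToAny(List<String> liveHosts, String index, String clusterName,
                              Connector<C> connector) throws SQLException {
        SQLException last = new SQLException("no candidate hosts supplied");
        for (String host : liveHosts) {
            try {
                return connector.connect(urlFor(host, index, clusterName));
            } catch (SQLException e) {
                last = e; // remember the failure and move on to the next node
            }
        }
        throw last;
    }
}
```

With a real driver on the classpath, `DriverManager::getConnection` could be passed as the connector. This only covers failover at connect time; reconnecting an already open connection is not shown.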

from elassandra.

ddorian commented on May 22, 2024

@vroyer It would be nicer to use the same cql-connection as you normally do with cassandra. This way you lower overhead by not keeping http connections to the nodes.

ddorian commented on May 22, 2024

@vroyer
Couldn't there be a simple function like /_search that takes a JSON string argument and returns a JSON string response? It would just call the internal /_search function and return 1 row with only 1 column, containing the same JSON string as the normal response. The same could be done with /_msearch.
This should be much simpler to implement on the server side than lucene-index. And since ES returns much more than the "simple rows" that lucene-index does, this should be the way to go.
In the best case, it could also accept partition key(s) in the "where" clause, so the CQL client can forward the query to the right node when using a TokenAwareBalancer.

Then each client can write a separate transport adapter if it wants to support this interface (example in Python: https://elasticsearch-py.readthedocs.io/en/master/transports.html) that internally issues CQL queries but exposes the Elasticsearch API.

Makes sense?

vroyer commented on May 22, 2024

Hi,
I'm not sure I understand your need, but if you need JSON REST access to C*, Elassandra can act as a gateway, as described at http://doc.elassandra.io/en/latest/mapping.html#elassandra-as-a-json-rest-gateway
Thanks.
Vincent.

In reply to ddorian, 6 Oct 2016 at 13:07.

ddorian commented on May 22, 2024

@vroyer I meant the exact opposite: accessing Elasticsearch from Cassandra.
My idea of the simplest implementation would be:
"SELECT es_search(json_string) FROM table WHERE partition_key=X LIMIT 1".
json_string = {index:'', doc_type:'', body:{}, params:{routing, source, etc}}
The es_search() function would just call the /_search function inside Elasticsearch and return 1 JSON blob, which is the body of the HTTP response that Elasticsearch normally returns.

This way you don't have to keep ES HTTP connections open, and you contact the minimum number of nodes (since ES has no routing-to-server mapping on the client, while Cassandra does).

And then the same thing for /_msearch, /_count etc.

Makes sense?
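
To make the proposal concrete, here is a rough client-side sketch of how the statement could be assembled. `es_search` is the function proposed in this comment and does not exist in Cassandra as far as this thread shows; the class `EsSearchCql` and its methods are hypothetical. The `params` member of the envelope is left out for brevity, and `index` and `docType` are assumed to contain no quote characters.

```java
/** Sketch of the client side of the proposed (hypothetical) es_search() CQL function. */
final class EsSearchCql {

    /** Wraps index, type and an already serialized JSON query body into the proposed envelope. */
    static String envelope(String index, String docType, String bodyJson) {
        return "{\"index\":\"" + index + "\",\"doc_type\":\"" + docType
                + "\",\"body\":" + bodyJson + "}";
    }

    /** Escapes a value for use inside a single-quoted CQL string literal. */
    static String cqlLiteral(String value) {
        return "'" + value.replace("'", "''") + "'";
    }

    /** Builds the statement from the comment above; es_search is NOT an existing CQL function. */
    static String statement(String table, String partitionColumn, String partitionValue,
                            String envelopeJson) {
        return "SELECT es_search(" + cqlLiteral(envelopeJson) + ") FROM " + table
                + " WHERE " + partitionColumn + " = " + cqlLiteral(partitionValue)
                + " LIMIT 1";
    }
}
```

In real code the values would more likely be passed as bind parameters than inlined as literals; the string form is used here only to show the shape of the query.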

ddorian commented on May 22, 2024

@vroyer What do you think about #14 (comment)?

vroyer commented on May 22, 2024

I'd prefer https://github.com/Anchormen/sql4es with failover+LB features from the C* driver (ES query in CQL won't work with regular reporting tools).

See:
https://docs.datastax.com/en/drivers/java/3.0/com/datastax/driver/core/Cluster.html#register-com.datastax.driver.core.Host.StateListener-
https://docs.datastax.com/en/drivers/java/3.0/com/datastax/driver/core/LatencyTracker.html

The NodeSampler in https://github.com/elastic/elasticsearch/blob/2.4/core/src/main/java/org/elasticsearch/client/transport/TransportClientNodesService.java could be extended with the C* driver's monitoring features to provide HA+LB.

In reply to ddorian, 8 Nov 2016 at 11:36.

ddorian commented on May 22, 2024

@vroyer My whole reason was to use fewer resources/connections to the nodes when using it in your application. I don't want to learn sql4es, because it will suck compared to es-query-lang (missing features, less flexibility etc).
Reporting can use the existing es-api.

vroyer commented on May 22, 2024

Do you mean that the Elasticsearch client API consumes more resources than the CQL driver (due to connection pooling)?

In reply to ddorian, 9 Nov 2016 at 10:26.

ddorian commented on May 22, 2024

@vroyer I mean that you will have 2 connections to each node, one for ES and one for CQL. And while you can _route on ES, the request goes to a random node, and that node is the one that knows how to REALLY _route to the right node. With CQL, the client can _route to the right node in 1 network hop (makes sense?).

I'm saying that the ES API is slower/less efficient because of the protocol (HTTP, JSON) and because the client has no metadata to route to the right node.
For example, in this case you could change the JSON input/output of Elasticsearch to msgpack in CQL. And you keep only 1 connection/driver in your app for talking to both ES and CQL.
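
The "1 network hop" point can be illustrated with a toy token ring, which is roughly the metadata a token-aware CQL client keeps. This is a simplified sketch, not the driver's implementation: `TokenRing` is a hypothetical class, replication is ignored, and `String.hashCode` stands in for the real partitioner (Cassandra's default partitioner uses Murmur3).

```java
import java.util.Map;
import java.util.TreeMap;

/** Toy token ring: each node owns the token range that ends at its own token. */
final class TokenRing {
    private final TreeMap<Long, String> ring = new TreeMap<>();

    void addNode(long token, String host) {
        ring.put(token, host);
    }

    /** Returns the owner of a token: the first node token >= the given token, wrapping around. */
    String ownerOf(long token) {
        if (ring.isEmpty()) {
            throw new IllegalStateException("ring has no nodes");
        }
        Map.Entry<Long, String> e = ring.ceilingEntry(token);
        return (e != null ? e : ring.firstEntry()).getValue();
    }

    /** Stand-in for the partitioner; an assumption made only to keep the sketch self-contained. */
    static long tokenOf(String partitionKey) {
        return partitionKey.hashCode();
    }
}
```

A client holding this map can send a query for a given partition key straight to `ownerOf(tokenOf(key))`, whereas a client without it has to pick any node and let that node forward the request.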

vroyer commented on May 22, 2024

Yes, I agree, but the idea is to mix the 2 drivers to help the ES client driver connect to an available node.

Instead of doing connection pooling, the ES driver could get this information from the CQL driver (CQL uses server-to-client notifications whereas ES uses periodic polling), and for ES searches with routing, the ES driver could find the right node from the CQL driver's token map.

And the ES driver uses a binary protocol on 9300/tcp.
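
A minimal sketch of the push-based part of this idea, assuming node up/down events arrive from the CQL driver. `LiveNodeRegistry` is a hypothetical class; its `onUp`/`onDown` names echo the `Host.StateListener` callbacks linked earlier in the thread, but here they take plain host names rather than driver objects, and the wiring to either driver is not shown.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.Set;
import java.util.concurrent.ConcurrentSkipListSet;
import java.util.concurrent.atomic.AtomicInteger;

/** Keeps the set of nodes believed to be up, updated by events instead of periodic polling. */
final class LiveNodeRegistry {
    private final Set<String> live = new ConcurrentSkipListSet<>();
    private final AtomicInteger cursor = new AtomicInteger();

    /** Called when the CQL side reports that a node came up. */
    void onUp(String host) {
        live.add(host);
    }

    /** Called when the CQL side reports that a node went down. */
    void onDown(String host) {
        live.remove(host);
    }

    /** Snapshot of the live nodes in sorted order. */
    List<String> liveNodes() {
        return new ArrayList<>(live);
    }

    /** Round-robin choice among live nodes; empty when no node is known to be up. */
    Optional<String> next() {
        List<String> snapshot = liveNodes();
        if (snapshot.isEmpty()) {
            return Optional.empty();
        }
        int i = Math.floorMod(cursor.getAndIncrement(), snapshot.size());
        return Optional.of(snapshot.get(i));
    }
}
```

The ES client would then ask the registry for `next()` before each request instead of sampling nodes itself, which gives both the availability and the load-balancing behavior discussed above.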

Le 9 nov. 2016 à 13:18, ddorian [email protected] a écrit :

@vroyer https://github.com/vroyer I mean that you will have to each node 2 connections, for es and for cql. And while you can _route on es, you go to random-node, and that node knows how to REALLY _route to the right node. While with cql you can _route to the right node with 1 network hop (makes sense?).

I'm saying that es-api is slower/less-efficient because of protocol(http,json) and not having metadata on the client to route to the right node.
Example in this case you can change json input/output of elastic to msgpack in cql. And you keep only 1 connection/driver on your app for talking to both es/cql.


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub #14 (comment), or mute the thread https://github.com/notifications/unsubscribe-auth/AJzHmZoEJs7QLSMH_lTlpJtlQEWu_SUeks5q8boogaJpZM4H_-P6.

ddorian commented on May 22, 2024

I think 9300 has only a Java client and is only used for inter-node communication. They even built a REST API client for Java.

My idea would take:

  1. adding some functions to Cassandra that internally call the ES API (like search(), msearch() etc).
  2. either using the CQL client directly, OR writing an adapter/transport (for each language/driver) to make use of that connection (in Python: https://elasticsearch-py.readthedocs.io/en/master/transports.html)

With option 2, you would just have to change the transport/adapter; ES would use CQL + msgpack internally and keep the same API on the client.

I don't know how your idea can be developed, though. Does it require more or less developer time on the server or on the client?
In your idea, ES would still make an HTTP connection + JSON serialization on port 9200 (or you would need to write custom serializers, like the Java client, for each language).
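
Option 2 above can be sketched as a small transport abstraction: the search API seen by the application stays the same while the wire protocol is swapped. Everything here is hypothetical, including `SearchTransport`, `CqlSearchTransport` and the server-side `search()` CQL function from option 1; the CQL executor is passed in as a plain function so no driver is needed to follow the sketch, and msgpack encoding is not shown.

```java
import java.util.function.Function;

/** Hypothetical client-side abstraction: same search API, pluggable wire protocol. */
interface SearchTransport {
    String search(String table, String queryJson);
}

/** Sends searches as CQL calls to the proposed (not existing) search() function. */
final class CqlSearchTransport implements SearchTransport {
    private final Function<String, String> cqlExecutor;

    /** The executor runs one CQL statement and returns the single JSON column of the result. */
    CqlSearchTransport(Function<String, String> cqlExecutor) {
        this.cqlExecutor = cqlExecutor;
    }

    @Override
    public String search(String table, String queryJson) {
        // Escape single quotes so the JSON can be embedded as a CQL string literal.
        String literal = "'" + queryJson.replace("'", "''") + "'";
        return cqlExecutor.apply("SELECT search(" + literal + ") FROM " + table + " LIMIT 1");
    }
}
```

An HTTP-based implementation of `SearchTransport` could sit next to this one, so application code written against the interface would not change when the transport does.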

hkroger commented on May 22, 2024

Basically the approach would be similar to this:
https://github.com/Stratio/cassandra-lucene-index

SELECT * FROM tweets WHERE expr(tweets_index, '{ filter: {type: "range", field: "time", lower: "2014/04/25", upper: "2014/05/01"} }');

vroyer commented on May 22, 2024

This feature is now supported in the Enterprise version of Elassandra, as described in the documentation. It also provides ES aggregation support from Apache Spark, as explained here.
