I'm sorry to open a github issue, didn't find any other ways to ask a question and did


Indexing data injected from Cassandra [QUESTION] about elassandra HOT 10 CLOSED

strapdata commented on May 23, 2024

Indexing data injected from Cassandra [QUESTION]

from elassandra.

Comments (10)

vroyer commented on May 23, 2024

Hi,
You can build your Elasticsearch mapping from your CQL schema with the discover feature introduced by Elassandra (see http://doc.elassandra.io/en/latest/mapping.html#bidirectionnal-mapping ). It creates a mapping for your Cassandra table. By default, text columns are not analyzed (each value is treated as a single term), but if you want to index an analyzed column, you can mix automatic discovery and explicit mapping as shown in the doc (use "^(?!(foo|bar)).*" if you want automatic discovery for all columns but foo and bar).
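
To see which columns a discover pattern would pick up, the regular expression can be tried outside Elassandra. The sketch below is only an illustration in Python: it assumes the pattern is matched against the whole column name (as Java's `matches` would do), and the column names and the stricter `DISCOVER_EXACT` variant are made up for the example. Note that the pattern quoted above also skips any column whose name merely starts with foo or bar.

```python
import re

# Pattern quoted in the comment above: a negative lookahead on the column name.
DISCOVER_ALL_BUT_FOO_BAR = r"^(?!(foo|bar)).*"
# Hypothetical stricter variant (not from the docs): reject only the exact names.
DISCOVER_EXACT = r"^(?!(foo|bar)$).*"


def discovered(pattern, columns):
    """Return the columns whose full name matches the pattern."""
    regex = re.compile(pattern)
    return [c for c in columns if regex.fullmatch(c)]


# Hypothetical column names, chosen to show the prefix effect.
columns = ["foo", "bar", "foobar", "name", "size"]
print(discovered(DISCOVER_ALL_BUT_FOO_BAR, columns))
print(discovered(DISCOVER_EXACT, columns))
```

With the first pattern, `foobar` is excluded along with `foo` and `bar`; with the anchored variant only the two exact names are excluded.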

To create your index and the mapping at the same time:

curl -XPUT 'http://localhost:9200/my_index' -d'{
   "settings":{ "keyspace":"my_keyspace"},
   "mappings":{  "my_table": { "discover":".*" } }
}'
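
For scripted setups, the same request can be assembled in Python. This is a minimal sketch using only the standard library; the helper name `create_index_request` is hypothetical, the body uses the "mappings" key as the later examples in this thread do, and the request is only built here, not sent.

```python
import json
import urllib.request


def create_index_request(base_url, index, keyspace, table, discover=".*"):
    """Build (but do not send) the PUT request that creates an index
    whose mapping is discovered from the CQL table."""
    body = {
        "settings": {"keyspace": keyspace},
        "mappings": {table: {"discover": discover}},
    }
    return urllib.request.Request(
        url=f"{base_url}/{index}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


req = create_index_request("http://localhost:9200", "my_index", "my_keyspace", "my_table")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
# To actually send it against a running node: urllib.request.urlopen(req)
```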

If your table already contains data when you create the first index, Cassandra triggers an index rebuild (as for Cassandra secondary indices) and the compaction manager then builds the index. Because compaction is a slow background process, this can take quite a long time on a huge volume of data (use nodetool compactionstats to monitor it), so I would recommend that you:
1-create your CQL schema
2-create your Elasticsearch mapping from the CQL schema with a discover
3-load your data

Finally, any CQL insert/update/delete on your indexed Cassandra table updates your Elasticsearch index (as with a standard Cassandra secondary index). Removing an Elasticsearch index does not remove the underlying Cassandra data; for that, you need to truncate the underlying table.

Hope this helps.
Vincent.


mishka2000 commented on May 23, 2024

Many thanks, that surely helped. Maybe it is worth adding this simple example to the Elassandra documentation page? I bet many people want to evaluate Elassandra and don't have a lot of time to learn how to set up a testing environment with it.

My CSV file captures some filesystem information and is formatted as follows (the first column is a running counter for the _id field):

1,2015-06-16 09:38:28,/srv,0,Directory
2,2015-06-16 09:38:28,/srv/www,0,Directory
3,2010-05-05 17:04:57,/srv/www/htdocs,0,Directory
4,2010-05-05 17:04:57,/srv/www/cgi-bin,0,Directory

I had to perform just one command to create Cassandra tables and ElasticSearch indices for each column:

curl -XPUT 'http://localhost:9200/vm_index' -d '{
    "settings": { "keyspace":"vm" },
    "mappings": {
        "fs1" : {
            "properties" : {
                "time"       : {"type":"date", "format":"y-M-d H:m:s", "cql_collection":"singleton", "index":"analyzed"},
                "name"       : {"type":"string", "cql_collection":"singleton", "index":"analyzed"},
                "size"       : {"type":"long", "cql_collection":"singleton", "index":"analyzed"},
                "type"       : {"type":"string", "cql_collection":"singleton", "index":"analyzed"}
            }
        }
    }
}'

After that I’ve used Brian’s cassandra-loader to bulk load the CSV file into Cassandra:

./cassandra-loader -f ./csv.txt -host localhost -dateFormat "y-M-d H:m:s" -schema 'vm.fs1("_id", "time", "name", "size", "type")'
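
To make the column mapping explicit, here is a small Python sketch of how each CSV line lines up with the columns named in the -schema argument. It is not part of cassandra-loader; the strptime format "%Y-%m-%d %H:%M:%S" is assumed to be equivalent to "y-M-d H:m:s" for the sample rows shown above, and `parse_rows` is a hypothetical helper.

```python
import csv
import io
from datetime import datetime

# Same order as the -schema argument passed to cassandra-loader.
COLUMNS = ["_id", "time", "name", "size", "type"]

# First two sample lines from the CSV file shown above.
SAMPLE = """\
1,2015-06-16 09:38:28,/srv,0,Directory
2,2015-06-16 09:38:28,/srv/www,0,Directory
"""


def parse_rows(text):
    """Yield one dict per CSV line, converting time and size to native types."""
    for fields in csv.reader(io.StringIO(text)):
        row = dict(zip(COLUMNS, fields))
        row["time"] = datetime.strptime(row["time"], "%Y-%m-%d %H:%M:%S")
        row["size"] = int(row["size"])
        yield row


rows = list(parse_rows(SAMPLE))
for row in rows:
    print(row)
```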

After that I was able to perform full-text search, for example on “name” field:

curl -XGET 'http://localhost:9200/vm_index/_search?pretty=true' -d '{ "query" : { "wildcard" : {"name" : "*platform*"}}}'

The only problem I still have is that the wildcard search above chokes on forward slashes. :(


mishka2000 commented on May 23, 2024

For the last problem - I have to use Path Hierarchy Tokenizer.. :)
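
A likely reason the wildcard search failed is that a wildcard query runs against individual indexed terms, and the default analyzer for an analyzed string field splits a path into separate words, so no single term contains a slash. A path hierarchy tokenizer instead indexes every ancestor path as its own term. The Python function below is only a simplified emulation of that behaviour for illustration, not Elasticsearch's implementation.

```python
def path_hierarchy_tokens(path, delimiter="/"):
    """Emulate the terms a path-hierarchy tokenizer emits for one path:
    one term per ancestor directory, ending with the full path."""
    tokens = []
    parts = path.split(delimiter)
    for i in range(1, len(parts) + 1):
        prefix = delimiter.join(parts[:i])
        if prefix:  # skip the empty prefix produced by a leading delimiter
            tokens.append(prefix)
    return tokens


print(path_hierarchy_tokens("/srv/www/htdocs"))
```

With terms like these in the index, a search for everything under /srv/www can be an exact term match on "/srv/www" rather than a wildcard.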


timd112 commented on May 23, 2024

I have a question regarding DELETION of data from Cassandra and how it affects the Elasticsearch indexes. I thought this might be a good place to ask, but I can create another issue if you'd like.

When I create a Cassandra table with a primary key containing a clustering column, and delete one of the rows in Cassandra, the corresponding entry in the Elasticsearch index is not removed. So I can still query the index, but the _source field is empty because the row is no longer in Cassandra. This also affects rows that are removed by TTL expiration.

Is this the intended behavior?

When I delete the row from a Cassandra table with a simple primary key, the Elasticsearch index entry is removed (as I would expect).

Create keyspace and tables:

CREATE KEYSPACE test_ks WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '3'}  AND durable_writes = true;

CREATE TABLE test_ks.cass_es_test (
  uid text PRIMARY KEY,
  content text
);
INSERT INTO test_ks.cass_es_test (uid, content) VALUES ('uid1', 'test contents');

CREATE TABLE test_ks.cass_es_test_cluster_key (
  uid text,
  content text,
  PRIMARY KEY (uid, content)
);
INSERT INTO test_ks.cass_es_test_cluster_key (uid, content) VALUES ('uid2', 'test contents cluster key');

Create ES Index:

curl -XPUT 'http://localhost:9200/es_test' -d'
{
  "settings":{ "keyspace":"test_ks"},
  "mappings":{  
     "cass_es_test": { "discover": ".*"},
     "cass_es_test_cluster_key": { "discover": ".*"}
   }
 }'

Query ES Index:

[tdriscoll@xxxxxxx ~]$ curl -XGET 'http://localhost:9200/es_test/_search?pretty'
{
  "took" : 15,
  "timed_out" : false,
  "_shards" : {
    "total" : 3,
    "successful" : 3,
    "failed" : 0
  },
  "hits" : {
    "total" : 2,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "es_test",
      "_type" : "cass_es_test_cluster_key",
      "_id" : "[\"uid2\",\"test contents cluster key\"]",
      "_score" : 1.0,
      "_source":{"uid":"uid2","content":"test contents cluster key"}
    }, {
      "_index" : "es_test",
      "_type" : "cass_es_test",
      "_id" : "uid1",
      "_score" : 1.0,
      "_source":{"uid":"uid1","content":"test contents"}
    } ]
  }
}

Delete Cassandra Rows:

DELETE FROM test_ks.cass_es_test WHERE uid='uid1';
DELETE FROM test_ks.cass_es_test_cluster_key WHERE uid='uid2';

Query ES:

[tdriscoll@xxxxxxxxxxx ~]$ curl -XGET 'http://localhost:9200/es_test/_search?pretty'
{
  "took" : 13,
  "timed_out" : false,
  "_shards" : {
    "total" : 3,
    "successful" : 3,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "es_test",
      "_type" : "cass_es_test_cluster_key",
      "_id" : "[\"uid2\",\"test contents cluster key\"]",
      "_score" : 1.0
    } ]
  }
}


vroyer commented on May 23, 2024

Hi,

There is an issue with deletes on a compound key.
It requires a delete by query in the Elasticsearch layer, because you need to delete all the wide rows associated with a partition key, and this is not yet implemented.
I will try to add it quickly.
Thanks.
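
The search output earlier in the thread suggests why a single delete is not enough: for a compound key, the document _id is a JSON array of all primary key columns, so a CQL DELETE that names only the partition key does not identify one _id. The Python sketch below illustrates this with a toy in-memory dict standing in for the index; the _id format is inferred from that output, and `doc_id`, `delete_partition` and the sample rows are hypothetical, not Elassandra code.

```python
import json


def doc_id(primary_key_values):
    """Build the _id seen in the search output: the bare value for a simple
    key, a compact JSON array for a compound key."""
    if len(primary_key_values) == 1:
        return primary_key_values[0]
    return json.dumps(primary_key_values, separators=(",", ":"))


# Toy stand-in for the index: _id -> indexed partition key column.
index = {
    doc_id(["uid2", "row a"]): {"uid": "uid2"},
    doc_id(["uid2", "row b"]): {"uid": "uid2"},
    doc_id(["uid3", "row c"]): {"uid": "uid3"},
}


def delete_partition(index, uid):
    """Remove every document of one partition, as a delete by query on the
    partition key column would; return the removed _ids."""
    doomed = [i for i, source in index.items() if source["uid"] == uid]
    for i in doomed:
        del index[i]
    return doomed


removed = delete_partition(index, "uid2")
print(removed)
print(list(index))
```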

On 11 Oct 2016 at 23:43, Tim Driscoll wrote:


timd112 commented on May 23, 2024

It appears there may be a similar issue with partitioned indexes. I truncated the underlying table, but the indexes still remained (source content was gone, as expected)

 curl -XPUT "http://<host>:9200/test_20161021" -d '{
  "settings": {
      "keyspace":"test_ks",
      "index.partition_function":"toDayIndex test_{0,date,yyyyMMdd} timestamp_utc"
  },
  "mappings": {
      "cass_es_test" : { 
        "properties" : {
          "content":    { "type": "string", "cql_collection": "singleton" },
          "timestamp_utc":    { "type": "date", "cql_collection": "singleton" }
        }
      }
  }
}'
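
To illustrate what the partition function above is expected to do, here is a small Python sketch that derives a per-day index name from a row's timestamp_utc value. It assumes the pattern test_{0,date,yyyyMMdd} simply formats the date as year, month and day, and that the formatting is done in UTC; neither point is confirmed in this thread, and `day_index_name` is a hypothetical helper.

```python
from datetime import datetime, timezone


def day_index_name(prefix, timestamp_utc):
    """Return the per-day index name for a timezone-aware timestamp,
    mirroring the pattern test_{0,date,yyyyMMdd} (UTC is an assumption)."""
    return f"{prefix}{timestamp_utc.astimezone(timezone.utc):%Y%m%d}"


# A row written on 21 Oct 2016 would land in the index created above.
name = day_index_name("test_", datetime(2016, 10, 21, 14, 49, tzinfo=timezone.utc))
print(name)
```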


ddorian commented on May 23, 2024

@timd112 that's already documented in the docs.


timd112 commented on May 23, 2024

Yeah, I see mention of deleting the old indexes manually. However, it seems a little inconsistent. If I delete an entry from Cassandra, the entry is removed from the index. But if the table is truncated (or presumably TTL'd), the item is still in the index, with its data missing.

This is probably a separate issue though. Also, if this particular scenario is documented, I've completely missed it and apologize.

On Fri, Oct 21, 2016 at 7:57 AM, ddorian wrote:


vroyer commented on May 23, 2024

Yes, truncating a Cassandra table does not trigger any update at the Elasticsearch layer.
It would require a delete by query to remove all documents where _type = , but that's not yet implemented (you should see the message "truncateBlocking at [{}], not implemented" in the logs).

Vincent.

On 21 Oct 2016 at 16:49, Tim Driscoll wrote:


vroyer commented on May 23, 2024

Elassandra 2.4.2-5+ now supports CQL TRUNCATE. When you truncate a Cassandra table, it removes the associated documents from the Elasticsearch index (it runs a deleteByQuery).
