Comments (10)
Hi,
You can build your Elasticsearch mapping from your CQL schema with the discover feature introduced by Elassandra (see http://doc.elassandra.io/en/latest/mapping.html#bidirectionnal-mapping). It creates a mapping for your Cassandra table. By default, text columns are not analyzed (each value is treated as a single word), but if you want to index an analyzed column, you can mix automatic discovery and explicit mapping as shown in the doc (use "^(?!(foo|bar)).*" if you want automatic discovery for all columns except foo and bar).
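To see which columns the quoted pattern would select, here is a minimal Python sketch. It assumes the pattern is matched against the whole column name and that Python's `re` behaves like Java's regex engine for this expression; the column names are hypothetical. Note that the negative lookahead as written also skips any column that merely starts with foo or bar; the `$`-anchored variant is my own addition, not something from the Elassandra docs.

```python
import re

# Pattern quoted in the comment above: discover every column except foo and bar.
EXCLUDE_PREFIX = re.compile(r"^(?!(foo|bar)).*")
# Variant anchored with $ (an assumption, not from the docs): only the exact
# names foo and bar are skipped.
EXCLUDE_EXACT = re.compile(r"^(?!(foo|bar)$).*")


def discovered(columns, pattern):
    """Return the columns whose whole name matches the discover pattern."""
    return [name for name in columns if pattern.fullmatch(name)]


# Hypothetical column names, chosen to show the prefix effect.
columns = ["foo", "bar", "baz", "food", "name", "barcode"]
print(discovered(columns, EXCLUDE_PREFIX))  # ['baz', 'name']
print(discovered(columns, EXCLUDE_EXACT))   # ['baz', 'food', 'name', 'barcode']
```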
To create your index and the mapping at the same time:
curl -XPUT 'http://localhost:9200/my_index' -d'{
"settings":{ "keyspace":"my_keyspace"},
"mappings":{ "my_table": { "discover":".*" } }
}'
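The same request can be sketched in Python with only the standard library. The helper name `build_create_index_request` and the Content-Type header are my own assumptions; the index, keyspace and table names are the placeholders from the command above, and the body uses the `mappings` key (plural), as in the other examples in this thread. The sketch only builds the request, it does not send it.

```python
import json
import urllib.request


def build_create_index_request(index, keyspace, table, discover=".*",
                               host="http://localhost:9200"):
    """Build (without sending) a PUT request that creates an index with discovery."""
    body = {
        "settings": {"keyspace": keyspace},
        "mappings": {table: {"discover": discover}},
    }
    return urllib.request.Request(
        url=f"{host}/{index}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


request = build_create_index_request("my_index", "my_keyspace", "my_table")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
# To send it to a running node: urllib.request.urlopen(request)
```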
If your table already contains data when you create the first index, Cassandra triggers an index rebuild (as for Cassandra secondary indexes) and the compaction manager then builds the index. Because compaction is a slow background process, this can take quite a long time with a huge volume of data (use nodetool compactionstats to monitor it), so I would recommend the following order:
1-create your CQL schema.
2-create your Elasticsearch mapping from the CQL schema with a discover.
3-load your data.
Finally, any CQL insert/update/delete in your indexed Cassandra table will update your Elasticsearch index (as with a standard Cassandra secondary index). If you remove an Elasticsearch index, the underlying Cassandra data is not removed; to remove it, you need to truncate the underlying table.
Hope this helps.
Vincent.
from elassandra.
Many thanks, that surely helped. Maybe it is worth adding this simple example to the Elassandra documentation page? I bet many people want to evaluate Elassandra and don't have a lot of time to learn how to set up a testing environment with it.
My CSV file captures some filesystem information and is formatted as follows (the first column is a running counter for the _id field):
1,2015-06-16 09:38:28,/srv,0,Directory
2,2015-06-16 09:38:28,/srv/www,0,Directory
3,2010-05-05 17:04:57,/srv/www/htdocs,0,Directory
4,2010-05-05 17:04:57,/srv/www/cgi-bin,0,Directory
A single command was enough to create the Cassandra table and the Elasticsearch index, with a mapping for each column:
curl -XPUT 'http://localhost:9200/vm_index' -d '{
"settings": { "keyspace":"vm" },
"mappings": {
"fs1" : {
"properties" : {
"time" : {"type":"date", "format":"y-M-d H:m:s", "cql_collection":"singleton", "index":"analyzed"},
"name" : {"type":"string", "cql_collection":"singleton", "index":"analyzed"},
"size" : {"type":"long", "cql_collection":"singleton", "index":"analyzed"},
"type" : {"type":"string", "cql_collection":"singleton", "index":"analyzed"}
}
}
}
}'
After that I used Brian's cassandra-loader to bulk-load the CSV file into Cassandra:
./cassandra-loader -f ./csv.txt -host localhost -dateFormat "y-M-d H:m:s" -schema 'vm.fs1("_id", "time", "name", "size", "type")'
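Before loading, it can help to check that every CSV line parses into the types the mapping expects. A minimal sketch, assuming the Java date pattern "y-M-d H:m:s" corresponds to Python's "%Y-%m-%d %H:%M:%S" for this data; the helper name `parse_rows` is my own, and the rows are the four sample lines shown earlier.

```python
import csv
import io
from datetime import datetime

# The four sample lines from the CSV file shown above.
SAMPLE = """\
1,2015-06-16 09:38:28,/srv,0,Directory
2,2015-06-16 09:38:28,/srv/www,0,Directory
3,2010-05-05 17:04:57,/srv/www/htdocs,0,Directory
4,2010-05-05 17:04:57,/srv/www/cgi-bin,0,Directory
"""

# Same column order as the -schema argument passed to cassandra-loader.
FIELDS = ["_id", "time", "name", "size", "type"]
# Assumed Python equivalent of the Java pattern "y-M-d H:m:s".
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def parse_rows(text):
    """Parse CSV text into dicts with a datetime 'time' and an integer 'size'."""
    rows = []
    for record in csv.DictReader(io.StringIO(text), fieldnames=FIELDS):
        record["time"] = datetime.strptime(record["time"], TIME_FORMAT)
        record["size"] = int(record["size"])
        rows.append(record)
    return rows


rows = parse_rows(SAMPLE)
print(len(rows), rows[2]["name"], rows[2]["time"].isoformat())
# 4 /srv/www/htdocs 2010-05-05T17:04:57
```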
After that I was able to perform full-text search, for example on “name” field:
curl -XGET 'http://localhost:9200/vm_index/_search?pretty=true' -d '{ "query" : { "wildcard" : {"name" : "*platform*"}}}'
The only problem I still have is that the wildcard search above chokes on forward slashes. :(
For the last problem, I have to use the Path Hierarchy Tokenizer. :)
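For anyone wondering what that tokenizer does with the paths in the CSV, here is a rough Python emulation for simple slash-separated paths. It is only an illustration of the tokens, not Elasticsearch code; the analyzer name `path_analyzer` is hypothetical, and the settings fragment is an assumption about how one might declare it, not something taken from this thread.

```python
def path_hierarchy_tokens(path, delimiter="/"):
    """Emulate path_hierarchy on a simple path: emit every ancestor prefix."""
    tokens = []
    prefix = ""
    for i, part in enumerate(path.split(delimiter)):
        prefix = part if i == 0 else prefix + delimiter + part
        if part:  # skip the empty piece produced by a leading delimiter
            tokens.append(prefix)
    return tokens


# Hypothetical analysis settings declaring a custom analyzer on that tokenizer.
ANALYSIS_SETTINGS = {
    "analysis": {
        "analyzer": {
            "path_analyzer": {"type": "custom", "tokenizer": "path_hierarchy"}
        }
    }
}

print(path_hierarchy_tokens("/srv/www/htdocs"))
# ['/srv', '/srv/www', '/srv/www/htdocs']
```

With tokens like these, an exact term such as /srv/www matches every file below that directory, which avoids wildcard patterns containing slashes.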
I have a question regarding DELETION of data from Cassandra and how it affects the Elasticsearch indexes. I thought this may be a good place to ask, but I can create another issue if you'd like.
When I create a Cassandra table with a primary key containing a clustering column and delete one of the rows in Cassandra, the corresponding entry in the Elasticsearch index is not removed. So I can still query the index, but the _source field is empty because the row is no longer in Cassandra. This also affects rows that are dropped/removed due to TTL expiration.
Is this the intended behavior?
When I delete the row from a Cassandra table with a simple primary key, the entry in the Elasticsearch index is removed (as I would expect).
Create keyspace and tables:
CREATE KEYSPACE test_ks WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '3'} AND durable_writes = true;
CREATE TABLE test_ks.cass_es_test (
uid text PRIMARY KEY,
content text
);
INSERT INTO test_ks.cass_es_test (uid, content) VALUES ('uid1', 'test contents');
CREATE TABLE test_ks.cass_es_test_cluster_key (
uid text,
content text,
PRIMARY KEY (uid, content)
);
INSERT INTO test_ks.cass_es_test_cluster_key (uid, content) VALUES ('uid2', 'test contents cluster key');
Create ES Index:
curl -XPUT 'http://localhost:9200/es_test' -d'
{
"settings":{ "keyspace":"test_ks"},
"mappings":{
"cass_es_test": { "discover": ".*"},
"cass_es_test_cluster_key": { "discover": ".*"}
}
}'
Query ES Index:
[tdriscoll@xxxxxxx ~]$ curl -XGET 'http://localhost:9200/es_test/_search?pretty'
{
"took" : 15,
"timed_out" : false,
"_shards" : {
"total" : 3,
"successful" : 3,
"failed" : 0
},
"hits" : {
"total" : 2,
"max_score" : 1.0,
"hits" : [ {
"_index" : "es_test",
"_type" : "cass_es_test_cluster_key",
"_id" : "[\"uid2\",\"test contents cluster key\"]",
"_score" : 1.0,
"_source":{"uid":"uid2","content":"test contents cluster key"}
}, {
"_index" : "es_test",
"_type" : "cass_es_test",
"_id" : "uid1",
"_score" : 1.0,
"_source":{"uid":"uid1","content":"test contents"}
} ]
}
}
Delete Cassandra Rows:
DELETE FROM test_ks.cass_es_test WHERE uid='uid1';
DELETE FROM test_ks.cass_es_test_cluster_key WHERE uid='uid2';
Query ES:
[tdriscoll@xxxxxxxxxxx ~]$ curl -XGET 'http://localhost:9200/es_test/_search?pretty'
{
"took" : 13,
"timed_out" : false,
"_shards" : {
"total" : 3,
"successful" : 3,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 1.0,
"hits" : [ {
"_index" : "es_test",
"_type" : "cass_es_test_cluster_key",
"_id" : "[\"uid2\",\"test contents cluster key\"]",
"_score" : 1.0
} ]
}
}
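Judging only from the two responses above, the _id of a document seems to be the plain key value for a single-column primary key and a compact JSON array of the key columns for a compound key. The sketch below reproduces that format for text key columns; it is inferred from the output in this comment, not from Elassandra's code, and the helper name `document_id` is hypothetical.

```python
import json


def document_id(primary_key_values):
    """Build an _id in the format observed above (text key columns only)."""
    values = list(primary_key_values)
    if len(values) == 1:
        return values[0]  # single-column key: the value itself
    # compound key: compact JSON array, no spaces after the commas
    return json.dumps(values, separators=(",", ":"))


print(document_id(["uid1"]))
# uid1
print(document_id(["uid2", "test contents cluster key"]))
# ["uid2","test contents cluster key"]
```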
Hi,
There is an issue with deletes on compound keys.
It requires a delete by query in the Elasticsearch layer, because you need to delete all wide rows associated with a partition key, and this is not yet implemented.
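A toy Python model may make the difference clearer. It is not Elassandra code: the index is just a dict from _id to document, the rows are hypothetical, and both helper functions are made up for illustration. A CQL DELETE that names only the partition key gives no complete _id to look up, so every document of that partition has to be found by a query on the key column.

```python
# Toy stand-in for an index: _id -> document. The rows are hypothetical.
index = {
    '["uid2","first row"]': {"uid": "uid2", "content": "first row"},
    '["uid2","second row"]': {"uid": "uid2", "content": "second row"},
    '["uid3","other row"]': {"uid": "uid3", "content": "other row"},
}


def delete_by_id(index, doc_id):
    """Remove one document by exact _id; report whether anything was removed."""
    return index.pop(doc_id, None) is not None


def delete_by_query(index, field, value):
    """Remove every document whose field equals value; return how many."""
    matching = [doc_id for doc_id, doc in index.items() if doc.get(field) == value]
    for doc_id in matching:
        del index[doc_id]
    return len(matching)


# The partition key alone is not a valid _id for a compound key.
print(delete_by_id(index, "uid2"))            # False
# A query on the partition key column removes all wide rows of the partition.
print(delete_by_query(index, "uid", "uid2"))  # 2
print(sorted(index))                          # ['["uid3","other row"]']
```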
I will try to add it quickly.
Thanks.
It appears there may be a similar issue with partitioned indexes. I truncated the underlying table, but the index entries remained (the source content was gone, as expected):
curl -XPUT "http://<host>:9200/test_20161021" -d '{
"settings": {
"keyspace":"test_ks",
"index.partition_function":"toDayIndex test_{0,date,yyyyMMdd} timestamp_utc"
},
"mappings": {
"cass_es_test" : {
"properties" : {
"content": { "type": "string", "cql_collection": "singleton" },
"timestamp_utc": { "type": "date", "cql_collection": "singleton" }
}
}
}
}'
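As far as I understand the partition function above, it has three space-separated parts: a function name, a Java MessageFormat-style pattern, and the field that feeds the pattern. The sketch below shows how such a pattern could resolve to a daily index name. The helper names are hypothetical, only the yyyy, MM and dd letters are translated, and which timezone Elassandra uses is not something this sketch knows; it simply formats the datetime it is given.

```python
import re
from datetime import datetime, timezone

# Only the pattern letters used in this thread are translated (an assumption).
JAVA_TO_STRFTIME = {"yyyy": "%Y", "MM": "%m", "dd": "%d"}


def parse_partition_function(spec):
    """Split 'name pattern field' into its three parts."""
    name, pattern, field = spec.split(" ")
    return name, pattern, field


def index_name(pattern, timestamp):
    """Render a '{0,date,...}' placeholder with the given timestamp."""
    def render(match):
        fmt = match.group(1)
        for java, python in JAVA_TO_STRFTIME.items():
            fmt = fmt.replace(java, python)
        return timestamp.strftime(fmt)
    return re.sub(r"\{0,date,([^}]*)\}", render, pattern)


spec = "toDayIndex test_{0,date,yyyyMMdd} timestamp_utc"
name, pattern, field = parse_partition_function(spec)
# Hypothetical value of the timestamp_utc column.
print(index_name(pattern, datetime(2016, 10, 21, 14, 49, tzinfo=timezone.utc)))
# test_20161021
```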
@timd112 that's already documented in the docs.
Yeah, I see the mention of deleting the old indexes manually. However, it seems a little inconsistent. If I delete an entry from Cassandra, the entry is removed from the index. However, if the table is truncated (or presumably TTL'd), the item is still in the index but the data is missing. This is probably a separate issue though. Also, if this particular scenario is documented, I've completely missed it and apologize.
Yes, truncating a Cassandra table does not involve any update at the Elasticsearch layer.
It would involve a delete by query to remove all documents where _type = , but that's not yet implemented (you should have a message in the logs: "truncateBlocking at [{}], not implemented").
Vincent.
Elassandra 2.4.2-5+ now supports CQL TRUNCATE. When truncating a Cassandra table, it removes the associated documents from the Elasticsearch index (it runs a deleteByQuery).