cassandra-learn

Adding notes on cassandra best practices.

Notes

CQL does not support aggregation queries like max, min, avg
CQL does not support group by, having queries
CQL does not support joins
CQL does not support OR queries
CQL does not support wildcard queries
CQL does not support union, intersection queries
Table columns cannot be filtered without creating the index
Greater than (>) and less than (<) are only supported on clustering column.
CQL is not suitable for analytic purposes because it has so many limitations

Keyspace

Cassandra keyspace is a namespace that defines how data is replicated on nodes.
Typically a cluster has one keyspace per application
When creating a keyspace, using the replication strategy class SimpleStrategy for development, in production use NetworkTopologyStrategy

Basics

Get keyspaces info:

SELECT *
FROM system_schema.keyspaces;

Get tables info:

SELECT *
FROM system_schema.tables
WHERE keyspace_name = 'keyspace name';

Get table info:

SELECT *
FROM system_schema.columns
WHERE keyspace_name = 'keyspace name'
AND table_name = 'table name';

Getting number of partitions

Use nodetool to get the Number of partitions:

$ make sh
$ cd /opt/bitnami/cassandra/bin
$ nodetool tablestats hotel.hotels_by_poi;

Output:

Total number of tables: 45
----------------
Keyspace : hotel
        Read Count: 0
        Read Latency: NaN ms
        Write Count: 1
        Write Latency: 3.026 ms
        Pending Flushes: 0
                Table: hotels_by_poi
                SSTable count: 1
                Old SSTable count: 0
                Space used (live): 5086
                Space used (total): 5086
                Space used by snapshots (total): 0
                Off heap memory used (total): 34
                SSTable Compression Ratio: 1.0476190476190477
                Number of partitions (estimate): 1
                Memtable cell count: 0
                Memtable data size: 0
                Memtable off heap memory used: 0
                Memtable switch count: 1
                Local read count: 0
                Local read latency: NaN ms
                Local write count: 1
                Local write latency: NaN ms
                Pending flushes: 0
                Percent repaired: 0.0
                Bytes repaired: 0.000KiB
                Bytes unrepaired: 0.041KiB
                Bytes pending repair: 0.000KiB
                Bloom filter false positives: 0
                Bloom filter false ratio: 0.00000
                Bloom filter space used: 16
                Bloom filter off heap memory used: 8
                Index summary off heap memory used: 18
                Compression metadata off heap memory used: 8
                Compacted partition minimum bytes: 36
                Compacted partition maximum bytes: 42
                Compacted partition mean bytes: 42
                Average live cells per slice (last five minutes): NaN
                Maximum live cells per slice (last five minutes): 0
                Average tombstones per slice (last five minutes): NaN
                Maximum tombstones per slice (last five minutes): 0
                Dropped Mutations: 0
                Droppable tombstone ratio: 0.00000

----------------

Check partition ranges

$ nodetool ring

Checking how the data is stored in Cassandra

Reference here. Cassandra used to have the cassandra-cli that shows how the data is stored. Before doing this, create a table and insert some data first.

# Flush data to disk.
$ /opt/bitnami/cassandra/bin/nodetool flush

# Go to the directory where the dump is produced
$ cd /bitnami/cassandra/data/data/hotel/hotels_by_poi-a52429b08bdc11eda5eb23123d1849fe
$ ls
'?'        nb-1-big-CompressionInfo.db   nb-1-big-Digest.crc32   nb-1-big-Index.db        nb-1-big-Summary.db
 backups   nb-1-big-Data.db              nb-1-big-Filter.db      nb-1-big-Statistics.db   nb-1-big-TOC.txt


# Run sstabledump
$ /opt/bitnami/cassandra/tools/bin/sstabledump nb-1-big-Data.db

Output:

[
  {
    "partition" : {
      "key" : [ "sunway" ],
      "position" : 0
    },
    "rows" : [
      {
        "type" : "row",
        "position" : 20,
        "clustering" : [ "1" ],
        "liveness_info" : { "tstamp" : "2023-01-04T03:06:25.710079Z" },
        "cells" : [
          { "name" : "name", "value" : "sunway" },
          { "name" : "phone", "value" : "1234" }
        ]
      }
    ]
  }
]

Use Expand ON

To show the data in better order.

Pagination in Cassandra

There is no offset in Casssandra, and querying data from different partitions might require a different strategy. Pagination in cassandra can be done using pageState.

Questions

how to handle pagination (limit/offset) in Cassandra, when there is no offset?
how to handle pagination across different partition
what is the best strategy when deciding on the partition key (use timestamp for bucketing, but then you need a way to track the partition keys, which can probably be another table)
what is static column?
best practice for materialized view? can it be used to handle denormalization?
how to deal with atomic insert (lightweight transaction) across different tables?

alextanhongpin / cassandra-learn Goto Github PK

cassandra-learn's Introduction

cassandra-learn

Notes

Basics

Getting number of partitions

Check partition ranges

Checking how the data is stored in Cassandra

Use Expand ON

Pagination in Cassandra

Questions

cassandra-learn's People

Contributors

Watchers

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent