Code Monkey home page Code Monkey logo

cassandra-learn's Introduction

cassandra-learn

Adding notes on cassandra best practices.

Notes

  • CQL does not support aggregation queries like max, min, avg
  • CQL does not support group by, having queries
  • CQL does not support joins
  • CQL does not support OR queries
  • CQL does not support wildcard queries
  • CQL does not support union, intersection queries
  • Table columns cannot be filtered without creating the index
  • Greater than (>) and less than (<) are only supported on clustering column.
  • CQL is not suitable for analytic purposes because it has so many limitations

Keyspace

  • Cassandra keyspace is a namespace that defines how data is replicated on nodes.
  • Typically a cluster has one keyspace per application
  • When creating a keyspace, using the replication strategy class SimpleStrategy for development, in production use NetworkTopologyStrategy

Basics

Get keyspaces info:

SELECT *
FROM system_schema.keyspaces;

Get tables info:

SELECT *
FROM system_schema.tables
WHERE keyspace_name = 'keyspace name';

Get table info:

SELECT *
FROM system_schema.columns
WHERE keyspace_name = 'keyspace name'
AND table_name = 'table name';

Getting number of partitions

Use nodetool to get the Number of partitions:

$ make sh
$ cd /opt/bitnami/cassandra/bin
$ nodetool tablestats hotel.hotels_by_poi;

Output:

Total number of tables: 45
----------------
Keyspace : hotel
        Read Count: 0
        Read Latency: NaN ms
        Write Count: 1
        Write Latency: 3.026 ms
        Pending Flushes: 0
                Table: hotels_by_poi
                SSTable count: 1
                Old SSTable count: 0
                Space used (live): 5086
                Space used (total): 5086
                Space used by snapshots (total): 0
                Off heap memory used (total): 34
                SSTable Compression Ratio: 1.0476190476190477
                Number of partitions (estimate): 1
                Memtable cell count: 0
                Memtable data size: 0
                Memtable off heap memory used: 0
                Memtable switch count: 1
                Local read count: 0
                Local read latency: NaN ms
                Local write count: 1
                Local write latency: NaN ms
                Pending flushes: 0
                Percent repaired: 0.0
                Bytes repaired: 0.000KiB
                Bytes unrepaired: 0.041KiB
                Bytes pending repair: 0.000KiB
                Bloom filter false positives: 0
                Bloom filter false ratio: 0.00000
                Bloom filter space used: 16
                Bloom filter off heap memory used: 8
                Index summary off heap memory used: 18
                Compression metadata off heap memory used: 8
                Compacted partition minimum bytes: 36
                Compacted partition maximum bytes: 42
                Compacted partition mean bytes: 42
                Average live cells per slice (last five minutes): NaN
                Maximum live cells per slice (last five minutes): 0
                Average tombstones per slice (last five minutes): NaN
                Maximum tombstones per slice (last five minutes): 0
                Dropped Mutations: 0
                Droppable tombstone ratio: 0.00000

----------------

Check partition ranges

$ nodetool ring

Checking how the data is stored in Cassandra

Reference here. Cassandra used to have the cassandra-cli that shows how the data is stored. Before doing this, create a table and insert some data first.

# Flush data to disk.
$ /opt/bitnami/cassandra/bin/nodetool flush

# Go to the directory where the dump is produced
$ cd /bitnami/cassandra/data/data/hotel/hotels_by_poi-a52429b08bdc11eda5eb23123d1849fe
$ ls
'?'        nb-1-big-CompressionInfo.db   nb-1-big-Digest.crc32   nb-1-big-Index.db        nb-1-big-Summary.db
 backups   nb-1-big-Data.db              nb-1-big-Filter.db      nb-1-big-Statistics.db   nb-1-big-TOC.txt


# Run sstabledump
$ /opt/bitnami/cassandra/tools/bin/sstabledump nb-1-big-Data.db

Output:

[
  {
    "partition" : {
      "key" : [ "sunway" ],
      "position" : 0
    },
    "rows" : [
      {
        "type" : "row",
        "position" : 20,
        "clustering" : [ "1" ],
        "liveness_info" : { "tstamp" : "2023-01-04T03:06:25.710079Z" },
        "cells" : [
          { "name" : "name", "value" : "sunway" },
          { "name" : "phone", "value" : "1234" }
        ]
      }
    ]
  }
]

Use Expand ON

To show the data in better order.

Pagination in Cassandra

There is no offset in Casssandra, and querying data from different partitions might require a different strategy. Pagination in cassandra can be done using pageState.

Questions

  • how to handle pagination (limit/offset) in Cassandra, when there is no offset?
  • how to handle pagination across different partition
  • what is the best strategy when deciding on the partition key (use timestamp for bucketing, but then you need a way to track the partition keys, which can probably be another table)
  • what is static column?
  • best practice for materialized view? can it be used to handle denormalization?
  • how to deal with atomic insert (lightweight transaction) across different tables?

cassandra-learn's People

Contributors

alextanhongpin avatar

Watchers

James Cloos avatar  avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.