Welcome to the Apache Cassandra® Data Modeling workshop! In this two-hour workshop, we show the methodology for building an effective data model with the distributed NoSQL database Apache Cassandra®.
Using Astra DB, the cloud-based Cassandra-as-a-Service platform delivered by DataStax, we will cover the process every developer follows when building an application: list the use cases and build an effective data model.
It doesn't matter if you join our workshop live or prefer to work at your own pace, we have you covered. In this repository, you'll find everything you need for this workshop:
- Objectives
- Frequently Asked Questions
- Materials for the Session
- Create Your Astra DB Instance
- Tables with Single-Row and Multi-Row Partitions
- Dynamic Bucketing
- Working with Data Types
- KDM Data Modeling Tool
- Sensor Data Modeling
- Homework
- What's NEXT
1️⃣ Understand how data is distributed and organized in Apache Cassandra®
2️⃣ Learn how primary, partition, and clustering keys are defined in Apache Cassandra®
3️⃣ Become familiar with CQL data types in Apache Cassandra®
4️⃣ Learn about the data modeling methodology for Apache Cassandra®
🚀 Have fun with an interactive session
1️⃣ Can I run this workshop on my computer?
There is nothing preventing you from running the workshop on your own machine. If you do so, you will need the following:
- git installed on your local system
2️⃣ What other prerequisites are required?
- You will need enough "real estate" on screen: we will ask you to open a few windows, and this will not fit on mobile phones (tablets should be OK)
- You will need an Astra account: don't worry, we'll work through that below
- As this is an "Intermediate level" workshop, we expect you to know what Java and Spring are.
3️⃣ Do I need to pay for anything for this workshop?
No. All tools and services we provide here are FREE, not only during the session but also afterwards.
4️⃣ Will I get a certificate if I attend this workshop?
Attending the session is not enough. You need to complete the homework detailed below and you will get a nice badge that you can share on LinkedIn or anywhere else.
Astra DB is the simplest way to run Cassandra with zero operations at all: just push the button and get your cluster. No credit card required; the free tier includes 40M read/write operations and about 80 GB of storage monthly, sufficient to run small production workloads. If you use up your credits, the database will simply pause, with no charge.
Following the Database creation guide, create a database. Right-click the button and choose Open in a new tab.
| Field | Value |
|---|---|
| Database Name | `workshops` |
| Keyspace Name | `sensor_data` |
| Regions | Select GOOGLE CLOUD, then an area close to you, then a region with no LOCKER 🔒 icons: those are the regions you can use for free. |
ℹ️ Note: If you already have a database `workshops`, simply add a keyspace `sensor_data` using the `Add Keyspace` button on the bottom right-hand corner of the DB dashboard page.
While the database is being created, you will also get a Security token: save it somewhere safe, as it will be needed later in other workshops (in particular the string starting with `AstraCS:...`).
⚠️ Important: The instructor will show you on screen how to create a token but will have to destroy the token immediately afterwards for security reasons.
The status will change from `Pending` to `Active` when the database is ready; this will only take 2-3 minutes. You will also receive an email when it is ready.
A GitHub account may be required to run this hands-on lab in Gitpod.
✅ Part 1: Tables with Single-Row Partitions
✅ Part 2: Tables with Multi-Row Partitions
Consider the table that supports the query Find all sensors in a specified network:
CREATE TABLE sensors_by_network_2 (
network TEXT,
sensor TEXT,
PRIMARY KEY ((network), sensor)
);
Assume that a network may have none to millions of sensors. With dynamic bucketing, we can introduce artificial buckets to store sensors. A network with a few sensors may only need one bucket. A network with many sensors may need many buckets. Once buckets belonging to a particular network get filled with sensors, we can dynamically assign new buckets to store new sensors of this network.
📘 Implement dynamic bucketing in Astra DB
-- Table to manage buckets
CREATE TABLE buckets_by_network (
network TEXT,
bucket TIMEUUID,
PRIMARY KEY ((network), bucket)
) WITH CLUSTERING ORDER BY (bucket DESC);
-- Table to store sensors
CREATE TABLE sensors_by_bucket (
bucket TIMEUUID,
sensor TEXT,
PRIMARY KEY ((bucket), sensor)
);
-- Sample data
INSERT INTO buckets_by_network (network, bucket) VALUES ('forest-net', 49171ffe-0d12-11ed-861d-0242ac120002);
INSERT INTO buckets_by_network (network, bucket) VALUES ('forest-net', 74a13ede-0d12-11ed-861d-0242ac120002);
INSERT INTO sensors_by_bucket (bucket, sensor) VALUES (49171ffe-0d12-11ed-861d-0242ac120002, 's1001');
INSERT INTO sensors_by_bucket (bucket, sensor) VALUES (49171ffe-0d12-11ed-861d-0242ac120002, 's1002');
INSERT INTO sensors_by_bucket (bucket, sensor) VALUES (74a13ede-0d12-11ed-861d-0242ac120002, 's1003');
📘 Add a new sensor to a network
- Get the latest bucket (the first row, thanks to the descending clustering order).
SELECT bucket FROM buckets_by_network WHERE network = 'forest-net' LIMIT 1;
- Check the number of sensors in the bucket.
SELECT COUNT(*) AS sensors
FROM sensors_by_bucket WHERE bucket = 74a13ede-0d12-11ed-861d-0242ac120002;
- Depending on the sensors-per-bucket threshold, insert a new sensor into the existing bucket, or create a new bucket and insert into the new bucket.
INSERT INTO sensors_by_bucket (bucket, sensor) VALUES (74a13ede-0d12-11ed-861d-0242ac120002, 's1004');
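The three steps above can be sketched, purely as an illustration, in Python with an in-memory stand-in for the two tables (the threshold constant and the `add_sensor` helper are hypothetical names, not part of the workshop schema):

```python
# Illustrative sketch of the add-sensor flow; dictionaries stand in for
# the buckets_by_network and sensors_by_bucket tables.
import uuid

MAX_SENSORS_PER_BUCKET = 3  # example threshold, chosen for the sketch

buckets_by_network = {}  # network -> list of buckets, newest first
sensors_by_bucket = {}   # bucket -> set of sensors

def add_sensor(network, sensor):
    """Insert into the latest bucket, or open a new bucket when it is full."""
    buckets = buckets_by_network.setdefault(network, [])
    # Step 1: get the latest bucket (LIMIT 1 with DESC clustering order)
    latest = buckets[0] if buckets else None
    # Step 2: check how many sensors it holds (SELECT COUNT(*))
    if latest is None or len(sensors_by_bucket[latest]) >= MAX_SENSORS_PER_BUCKET:
        latest = uuid.uuid1()      # TIMEUUID-like: time-ordered
        buckets.insert(0, latest)  # keep newest bucket first
        sensors_by_bucket[latest] = set()
    # Step 3: insert the sensor into the chosen bucket
    sensors_by_bucket[latest].add(sensor)
    return latest

for s in ['s1001', 's1002', 's1003', 's1004']:
    add_sensor('forest-net', s)

print(len(buckets_by_network['forest-net']))  # 2: the fourth sensor opened a new bucket
```

With a threshold of three sensors per bucket, the fourth insert triggers the creation of a second bucket, exactly as in the sample data above.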
📘 Retrieve sensors in a specified network
- Retrieve the buckets
SELECT bucket FROM buckets_by_network WHERE network = 'forest-net';
- Retrieve the sensors
SELECT sensor
FROM sensors_by_bucket
WHERE bucket IN (74a13ede-0d12-11ed-861d-0242ac120002, 49171ffe-0d12-11ed-861d-0242ac120002);
📘 Command to execute
// Definition
CREATE TABLE IF NOT EXISTS table_with_list (
uid uuid,
items list<text>,
PRIMARY KEY (uid)
);
// Insert
INSERT INTO table_with_list(uid,items)
VALUES (c7133017-6409-4d7a-9479-07a5c1e79306, ['a', 'b', 'c']);
// Replace
UPDATE table_with_list SET items = ['d', 'e']
WHERE uid = c7133017-6409-4d7a-9479-07a5c1e79306;
// Show result
SELECT * FROM table_with_list;
// Append to list
UPDATE table_with_list SET items = items + ['f']
WHERE uid = c7133017-6409-4d7a-9479-07a5c1e79306;
// Replace one element by index (not available in Astra, as it requires a read before the write)
UPDATE table_with_list SET items[0] = 'g'
WHERE uid = c7133017-6409-4d7a-9479-07a5c1e79306;
📘 Command to execute
// Definition
CREATE TABLE IF NOT EXISTS table_with_set (
uid uuid,
animals set<text>,
PRIMARY KEY (uid)
);
// Insert
INSERT INTO table_with_set(uid,animals)
VALUES (87fad746-4adf-4107-9858-df8643564186, {'spider', 'cat', 'dog'});
// Replace
UPDATE table_with_set SET animals = {'pangolin', 'bat'}
WHERE uid = 87fad746-4adf-4107-9858-df8643564186;
// Show result
SELECT * FROM table_with_set;
// Append to Set
UPDATE table_with_set SET animals = animals + {'sheep'}
WHERE uid = 87fad746-4adf-4107-9858-df8643564186;
📘 Command to execute
// Definition
CREATE TABLE IF NOT EXISTS table_with_map (
uid text,
dictionary map<text, text>,
PRIMARY KEY (uid)
);
// Insert
INSERT INTO table_with_map(uid, dictionary)
VALUES ('fr_en', {'fromage':'cheese', 'vin':'wine', 'pain':'bread'});
// Replace
UPDATE table_with_map SET dictionary = {'saucisse': 'sausage'}
WHERE uid = 'fr_en';
// Show result
SELECT * FROM table_with_map;
// Append to Map
UPDATE table_with_map SET dictionary = dictionary + {'frites':'fries'}
WHERE uid = 'fr_en';
📘 Command to execute
// Definition
CREATE TYPE IF NOT EXISTS udt_address (
street text,
city text,
state text
);
// Use the UDT in a table
CREATE TABLE IF NOT EXISTS table_with_udt (
uid text,
address udt_address,
PRIMARY KEY (uid)
);
// Insert (note: no quotes on field names like street)
INSERT INTO table_with_udt(uid, address)
VALUES ('superman', {street:'daily planet',city:'metropolis',state:'CA'});
// Replace
UPDATE table_with_udt
SET address = {street:'pingouin alley',city:'antarctica',state:'melting'}
WHERE uid = 'superman';
// Replace a single field
UPDATE table_with_udt
SET address.state = 'melt'
WHERE uid = 'superman';
📘 Command to execute
// Definition
CREATE TABLE IF NOT EXISTS table_with_counters (
handle text,
following counter,
followers counter,
notifications counter,
PRIMARY KEY (handle)
);
// You have a new follower
UPDATE table_with_counters SET followers = followers + 1
WHERE handle = 'clunven';
// Some counters are... null
SELECT * from table_with_counters;
// Initialize to 0 by adding 0 (SET counter = 0 is not valid; counters only accept increments)
UPDATE table_with_counters
SET following = following + 0, notifications = notifications + 0
WHERE handle = 'clunven';
// Following someone
UPDATE table_with_counters SET following = following + 1
WHERE handle = 'clunven';
// You have a new message
UPDATE table_with_counters SET notifications = notifications + 1
WHERE handle = 'clunven';
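Counters only accept relative deltas because addition is commutative: increments arriving at different replicas in any order converge to the same value, which absolute assignment could not guarantee. A tiny illustrative sketch (plain Python, not Cassandra code):

```python
# Commutativity is what makes increment-only counters safe:
# summing the same deltas in any order yields the same total.
import random

deltas = [+1, +1, -1, +1]  # e.g. followers gained and lost over time

total_in_order = sum(deltas)
shuffled = deltas[:]
random.shuffle(shuffled)   # replicas may apply deltas in a different order
total_shuffled = sum(shuffled)

print(total_in_order == total_shuffled)  # True
```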
✅ Download the project XML file.
✅ Open the KDM tool.
✅ Import the project by selecting `Import Project` from the menu and specifying the file `kdm_sensor_data.xml`.
- Complete Working with Data Types. Take a screenshot of the CQL Console showing the rows in tables `table_with_udt` and `table_with_counters` before and after executing the DELETE statements.
- Complete the mini-course Time Series Data Modeling. Take a screenshot of the final screen of the practice lab, with the console output at the right.
- Submit your homework and be awarded a nice verifiable badge!
We've just scratched the surface of what you can do using Astra DB, built on Apache Cassandra.
Go take a look at DataStax for Developers to see what else is possible. There's plenty to dig into!
Congratulations: you made it to the end of today's workshop.
... and see you at our next workshop!
Sincerely yours, The DataStax Developers