Cassandra Learning Notes

info

In progress

01 Installing Cassandra on Mac OS

Install Python

Mac OS X has a copy of Python preinstalled, but this makes sure you get the newest version.

brew install python

Install cql

To use cqlsh, the Cassandra query language shell, you need to install cql:

pip3 install cql

Install Cassandra

This installs Apache Cassandra:

brew install cassandra

Test the installation:

cqlsh

# output:
# /usr/local/Cellar/cassandra/4.1.0/libexec/bin/cqlsh.py:473: DeprecationWarning: Legacy execution parameters will be removed in 4.0. Consider using execution profiles.
# /usr/local/Cellar/cassandra/4.1.0/libexec/bin/cqlsh.py:503: DeprecationWarning: Setting the consistency level at the session level will be removed in 4.0. Consider using execution profiles and setting the desired consitency level to the EXEC_PROFILE_DEFAULT profile.
# Connected to Test Cluster at 127.0.0.1:9042
# [cqlsh 6.1.0 | Cassandra 4.1.0 | CQL spec 3.4.6 | Native protocol v5]
# Use HELP for help.
# cqlsh>

# exit the cqlsh shell (either command works)
exit
quit

Starting/Stopping Cassandra

brew services start cassandra

brew services stop cassandra

Cassandra file locations

  • Properties: /usr/local/etc/cassandra
  • Logs: /usr/local/var/log/cassandra
  • Data: /usr/local/var/lib/cassandra/data

Spring Boot application.properties

# Cassandra
spring.data.cassandra.keyspace-name=ned_learning
spring.data.cassandra.contact-points=127.0.0.1
# set up local data center
spring.data.cassandra.local-datacenter=datacenter1
spring.data.cassandra.port=9042
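
The keyspace named in spring.data.cassandra.keyspace-name has to exist before the application can connect. A minimal sketch for creating it on a local single-node cluster (SimpleStrategy with a replication factor of 1 is only appropriate for local development):

cqlsh>
CREATE KEYSPACE IF NOT EXISTS ned_learning
WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};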

02 Data Model

Internal Structure

Cassandra's data model is column-oriented. Unlike a relational database, you do not have to define all of a table's columns up front, and each row can even contain columns with different names.

Cassandra's data model is made up of keyspaces (similar to a database in a relational system), column families (similar to tables), keys, and columns.

Don't picture a column family as a relational table; picture it as a nested sorted map. This makes Cassandra's data model much easier to understand and design:

Map<RowKey, SortedMap<ColumnKey, ColumnValue>>
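
As a concrete illustration (a hypothetical table, not used in the examples below), a partition key plus a clustering column maps directly onto that nested structure:

cqlsh>
CREATE TABLE events_by_user (
    user_id text,           -- RowKey: the partition key, the outer map key
    event_time timestamp,   -- ColumnKey: clustering column, kept sorted inside the partition
    payload text,           -- ColumnValue
    PRIMARY KEY ((user_id), event_time)
);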

03 Data Partitioning

info

"Partitioning is a method of splitting and storing a single logical dataset in multiple databases. By distributing the data among multiple machines, a cluster of database systems can store larger datasets and handle additional requests."

How Sharding Works by Jeeyoung Kim

Primary Key Definition 主键定义

The concept of primary keys is more complex in Cassandra than in traditional databases like MySQL. In Cassandra, the primary key consists of 2 parts:

  • A mandatory partition key.
  • An optional set of clustering columns.

Consider the following table:

Table Users | Legend: p - Partition-Key, c - Clustering Column

country (p) | user_email (c) | first_name | last_name | age
----------------------------------------------------------------
US | john@email.com | John | Wick | 55
UK | peter@email.com | Peter | Clark | 65
UK | bob@email.com | Bob | Sandler | 23
UK | alice@email.com | Alice | Brown | 26

Together, the columns user_email and country make up the primary key.
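
The statements in this section assume the learn_cassandra keyspace already exists. If it does not, a minimal sketch for creating it on a local single-node cluster:

cqlsh>
CREATE KEYSPACE IF NOT EXISTS learn_cassandra
WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};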

The country column is the partition key (p). The CREATE-statement for the table looks like this:

cqlsh>
CREATE TABLE learn_cassandra.users_by_country (
    country text,
    user_email text,
    first_name text,
    last_name text,
    age smallint,
    PRIMARY KEY ((country), user_email)
);

The first group of the primary key defines the partition key. All other elements of the primary key are clustering columns.

-- Let's fill the table with some data:
cqlsh>
INSERT INTO learn_cassandra.users_by_country (country,user_email,first_name,last_name,age)
VALUES('US', 'john@email.com', 'John','Wick',55);

INSERT INTO learn_cassandra.users_by_country (country,user_email,first_name,last_name,age)
VALUES('UK', 'peter@email.com', 'Peter','Clark',65);

INSERT INTO learn_cassandra.users_by_country (country,user_email,first_name,last_name,age)
VALUES('UK', 'bob@email.com', 'Bob','Sandler',23);

INSERT INTO learn_cassandra.users_by_country (country,user_email,first_name,last_name,age)
VALUES('UK', 'alice@email.com', 'Alice','Brown',26);

Partitioning is the foundation for scalability, and it is based on the partition key. In this example, partitions are created based on country: all rows with country US are placed in one partition, and all rows with country UK are stored in another.

In the context of partitioning, the words partition and shard can be used interchangeably.

Partitions are created and filled based on partition key values, and they are used to distribute data to different nodes. Distributing data across nodes is what gives you scalability: you read and write data to and from different nodes by partition key.

The distribution of data is a crucial point to understand when designing applications that store data based on partitions. It may take a while to get fully accustomed to this concept, especially if you are used to relational databases.
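
One way to see this grouping is to ask for the partition token of each row with the built-in token() function; rows with the same country hash to the same token and therefore live in the same partition:

cqlsh>
SELECT token(country), country, user_email FROM learn_cassandra.users_by_country;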

info

What does horizontal scaling mean?

Horizontal scaling means you can increase throughput by adding more nodes. If your data is distributed across more servers, more CPU, memory, and network capacity is available.

You might ask: why do you even need user_email in the primary key?

The answer is that the primary key defines which columns are used to identify rows, so it must include every column that is required to identify a row uniquely. Using only country would not do that: users from the same country could not be told apart, which is why user_email must also be part of the primary key.
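
Within a partition, the clustering column then pinpoints a single row, so a fully qualified lookup uses both parts of the primary key:

cqlsh>
SELECT * FROM learn_cassandra.users_by_country WHERE country='UK' AND user_email='alice@email.com';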

Partitioning Strategies 分区策略

The partition key is vital for distributing data evenly between nodes and essential when reading the data. The schema defined above is designed to be queried by country, because country is the partition key.

-- A query that selects rows by country performs well
cqlsh>
SELECT * FROM learn_cassandra.users_by_country WHERE country='US';

In your cqlsh shell, a request is sent to only a single Cassandra node by default. This is called a consistency level of one, which enables excellent performance and scalability.

If you access Cassandra differently, the default consistency level might not be one.
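
In cqlsh you can inspect and change the consistency level for the current session with the CONSISTENCY command, for example:

cqlsh>
-- show the consistency level used by this session
CONSISTENCY;
-- require a quorum of replicas for subsequent reads and writes in this session
CONSISTENCY QUORUM;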

info

What does consistency level of one mean?

A consistency level of one means that only a single node is asked to return the data. With this approach, you lose strong consistency guarantees and instead get eventual consistency.

Bad Query Example

Let's create another table. This one has its partition key defined only by the user_email column:

cqlsh>
CREATE TABLE learn_cassandra.users_by_email (
    user_email text,
    country text,
    first_name text,
    last_name text,
    age smallint,
    PRIMARY KEY (user_email)
);

-- Now let’s fill this table with some records
cqlsh>
INSERT INTO learn_cassandra.users_by_email (user_email, country,first_name,last_name,age)
VALUES('john@email.com', 'US', 'John','Wick',55);

INSERT INTO learn_cassandra.users_by_email (user_email,country,first_name,last_name,age)
VALUES('peter@email.com', 'UK', 'Peter','Clark',65);

INSERT INTO learn_cassandra.users_by_email (user_email,country,first_name,last_name,age)
VALUES('bob@email.com', 'UK', 'Bob','Sandler',23);

INSERT INTO learn_cassandra.users_by_email (user_email,country,first_name,last_name,age)
VALUES('alice@email.com', 'UK', 'Alice','Brown',26);

This time, each row is put in its own partition.

This is not bad, per se. If you want to optimize for getting data by email only, it's a good idea:

cqlsh>
SELECT * FROM learn_cassandra.users_by_email WHERE user_email='alice@email.com';

If you set up your table with a partition key for user_email and want to get all users by age, you would need to get the data from all partitions because the partitions were created by user_email.

Talking to all nodes is expensive and can cause performance issues on a large cluster.

Cassandra tries to avoid harmful queries: if you want to filter by a column that is not a partition key, you have to tell Cassandra explicitly that you want to filter by a non-partition-key column.

cqlsh>
SELECT * FROM learn_cassandra.users_by_email WHERE age=26 ALLOW FILTERING;

Without ALLOW FILTERING, the query would not be executed, to prevent harm to the cluster from accidentally running expensive queries.

Executing queries without conditions (such as without a WHERE clause), or with conditions that don't use the partition key, is costly and should be avoided to prevent performance bottlenecks.

If you can, partition by a value like country. Each country typically contains many users, and because the total number of countries is far smaller than the number of users, you can iterate over all available countries, send one query per country, and collect the results in your application.

In terms of scalability, it's worse to just select all rows: with the table partitioned by user_email, all the data has to be collected by a single coordinator in one request, and querying each email address individually instead would require an enormous number of separate requests.
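
A sketch of the per-country approach described above: one bounded query per known country instead of a single unbounded scan through one coordinator, with the application merging the result sets:

cqlsh>
SELECT * FROM learn_cassandra.users_by_country WHERE country='US';
SELECT * FROM learn_cassandra.users_by_country WHERE country='UK';
-- ...one query per remaining country, issued and merged by the application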

info

When user_email is the partition key, every email address defines its own partition. That means a query without any filter (for example SELECT * FROM table;), intended to fetch all rows, forces Cassandra to retrieve data from every partition that contains data.

Because each user_email is its own partition, such a query requires a single coordinator node to contact every node in the cluster to assemble the full table. With a lot of data this becomes very inefficient, because:

  • Network overhead: the coordinator must exchange a large amount of traffic with every node in the cluster.
  • Memory and CPU consumption: the coordinator has to wait for and process the responses from every partition's node, which can cause resource usage to spike.
  • Read bottleneck: every read flows through a single coordinator node, which becomes a bottleneck for scaling the system.

By contrast, if the table is partitioned by country and the list of countries is limited, queries can be issued per country, spreading the load across different coordinator nodes. Spreading requests this way reduces the load on any single node and allows better horizontal scaling, because the work of serving the queries is shared among many nodes.

04 Data Replication

Replication Strategy

Cassandra stores replicas on multiple nodes to ensure reliability and fault tolerance. The replication strategy determines which nodes the replicas are placed on.

  • The total number of replicas is controlled by the replication factor. A replication factor of 1 means there is a single copy of each row, stored on exactly one node. A replication factor of 2 means two copies of each row, each stored on a different node.
  • All replicas are equally important; there is no primary or master replica.
  • As a general rule, the replication factor should not exceed the number of nodes in the cluster. You can, however, increase the replication factor first and add the nodes afterwards. If the replication factor exceeds the number of nodes, writes are rejected; reads are not affected.

Cassandra ships with two replication strategies:

  • SimpleStrategy: for a single datacenter only. If there is any chance you will ever have multiple datacenters, use NetworkTopologyStrategy.
  • NetworkTopologyStrategy: strongly recommended for most deployments because it makes later expansion much easier.

When deciding how many replicas to configure per datacenter, the two main considerations are:

  • serving reads locally, without cross-datacenter latency
  • how to handle hardware failures

For multiple datacenters, the two most common replica configurations are:

  • Two replicas per datacenter:
    • With this configuration, each datacenter can still serve local reads at consistency level ONE even if a single node fails.
  • Three replicas per datacenter:
    • With this configuration, each datacenter can still serve local reads at consistency level LOCAL_QUORUM even if a single node fails.
    • Even if two nodes fail, local reads at consistency level ONE are still possible.

Hands-on Lab

# start cqlsh
cqlsh

-- create a keyspace with SimpleStrategy
CREATE KEYSPACE simple_keyspace_1
WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};

-- create a keyspace with NetworkTopologyStrategy and a replication factor of 1 for DC-West
CREATE KEYSPACE production_keyspace_1
WITH replication = {'class': 'NetworkTopologyStrategy', 'DC-West': 1};

-- create a keyspace with NetworkTopologyStrategy and 1 replica in each datacenter
CREATE KEYSPACE production_keyspace_2
WITH replication = {'class': 'NetworkTopologyStrategy', 'DC-West': 1, 'DC-East': 1};

-- alter properties of an existing keyspace
ALTER KEYSPACE production_keyspace_2
WITH replication = {'class': 'NetworkTopologyStrategy', 'DC-West': 3, 'DC-East': 5};

How many replicas do you need?

Having 3 replicas per datacenter is a good starting point for relatively small clusters. As the number of nodes in a datacenter grows, a higher replication factor may become a better choice.

The number of replicas affects consistency, availability, latency and throughput. Increasing the replication factor improves availability because more replica failures can be tolerated. A larger set of replicas can also serve more concurrent requests and deliver better response times.
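
To check which strategy and replication factors a keyspace currently uses, cqlsh can print its definition, for example for the keyspace created in the lab above:

cqlsh>
DESCRIBE KEYSPACE production_keyspace_2;

Note that after increasing a replication factor, existing data is not copied to the new replicas automatically; a repair (for example with nodetool repair) is usually needed so the additional replicas receive the data.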

Consistency Level for Writes

Level | Description | Usage
----------------------------------------------------------------
ANY | A write must succeed on at least one node. Even if all replica nodes for the partition are down, the write can still succeed after a hinted handoff. Data written while all replicas were down cannot be read until at least one of those replicas has recovered. | Minimal consistency guarantee: a write never fails. Provides the lowest consistency and the highest availability of all levels.
ALL | A write must be written to the commit log and memtable on all replica nodes for the partition. | Provides the highest consistency and the lowest availability of all levels.
EACH_QUORUM | A write must be written to the commit log and memtable on a quorum of replica nodes in each datacenter. | Used in multi-datacenter clusters to maintain the same level of consistency in every datacenter. For example, with two datacenters, the write fails if one datacenter is down or a quorum of its replicas cannot be reached.
LOCAL_ONE | A write must succeed on at least one replica node in the local datacenter. | Used in multi-datacenter clusters when an acknowledgement from a single replica is enough but no cross-datacenter traffic is wanted.
LOCAL_QUORUM | A write must succeed on a quorum of replica nodes in the local datacenter, avoiding cross-datacenter traffic. Not usable with SimpleStrategy. | Used to keep data consistent within the local datacenter.
LOCAL_SERIAL | A write must be conditionally written to a quorum of replica nodes in the local datacenter. | Used to achieve linearizable consistency for lightweight transactions and to prevent unconditional write races.
ONE | A write must succeed on at least one replica node. | Satisfies the needs of most users. The coordinator usually contacts the replica node closest to it.
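
For example, to require acknowledgement from a local quorum of replicas for a write in cqlsh (the inserted row is made up for illustration):

cqlsh>
CONSISTENCY LOCAL_QUORUM;
INSERT INTO learn_cassandra.users_by_country (country, user_email, first_name, last_name, age)
VALUES ('DE', 'maria@email.com', 'Maria', 'Schmidt', 40);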

Consistency Level for Reads

Level | Description | Usage
----------------------------------------------------------------
ALL | Queries all replicas and returns the data with the most recent timestamp once every replica has responded. The read fails if any replica is slow to respond. | Provides the highest consistency and the lowest availability of all levels.
EACH_QUORUM | Queries a quorum of replicas in each datacenter and returns the data with the most recent timestamp. | Same as LOCAL_QUORUM.
LOCAL_SERIAL | Same as SERIAL, but restricted to the local datacenter. | Same as SERIAL.
LOCAL_QUORUM | Queries a quorum of replicas in the local datacenter and returns the data with the most recent timestamp; avoids cross-datacenter traffic. | Used mainly with strategies other than SimpleStrategy.
LOCAL_ONE | Returns data from the replica closest to the coordinator in the local datacenter. | Same usage as LOCAL_ONE for writes.
ONE | Returns the result from the closest replica, as determined by the snitch. By default a read repair runs in the background to make the remaining replicas consistent. | Provides the highest availability, but the result returned is not guaranteed to be the most recent.
QUORUM | Queries a quorum of replicas across all datacenters and returns the data with the most recent timestamp. | Guarantees strong consistency, though possibly at the cost of some latency.
SERIAL | Reads the current data, including uncommitted lightweight-transaction data; if an uncommitted transaction is encountered during the read, it is committed as part of the read. | Used to read the latest value of a column that is updated with lightweight transactions.
TWO | Returns the most recent data from the two closest replicas. | Similar to ONE.
THREE | Returns the most recent data from the three closest replicas. | Similar to TWO.
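
Read consistency is set the same way; the chosen level applies to the SELECT statements that follow in the session:

cqlsh>
CONSISTENCY QUORUM;
SELECT * FROM learn_cassandra.users_by_country WHERE country='UK';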

Reference