Apache Cassandra Notes: Configuring a Multi-Node, Multi-Datacenter Cluster and Day-to-Day Operations

Date: 2019-09-19 21:56:53

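Cassandra uses a decentralized cluster architecture: there is no central node as in a traditional cluster, every node is a peer, and cluster membership information is maintained through the Gossip protocol. So that a node can discover the others at startup, seed nodes (seeds) must be specified. Each node first contacts the seed nodes, obtains the list of other nodes from them, and then communicates with those nodes. More than one seed node can be specified; they are configured through the seeds property in conf/cassandra.yaml.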
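
The role of the seeds can be illustrated with a toy model. This is only a sketch of the idea described above, not Cassandra's actual Gossip implementation: the function name `simulate_seed_discovery` is made up for illustration, and each starting node simply swaps its peer list once with every seed that is already running.

```python
def simulate_seed_discovery(seeds, start_order):
    """Toy model: every starting node swaps peer lists with each seed that is already up."""
    known = {}  # node name -> set of nodes it currently knows about
    for node in start_order:
        known[node] = {node}
        for seed in seeds:
            if seed != node and seed in known:
                # Both sides end up with the union of what they knew.
                merged = known[node] | known[seed]
                known[node] = set(merged)
                known[seed] = set(merged)
    return known


# Host names and seed choice follow this article; seeds are started first.
views = simulate_seed_discovery(
    seeds=["db03", "db05"],
    start_order=["db03", "db05", "db04", "db06"],
)
for node in sorted(views):
    print(node, sorted(views[node]))
```

In this single pass the seeds and the last node to start end up knowing all four nodes, while db04 has not yet heard of db06. In a real cluster the periodic gossip rounds fill in that kind of gap.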

Environment

The host information was given in a table:
[Image: host information table]
JDK 8 is already installed on all nodes:

[root@db03 ~]# java -version
java version "1.8.0_212"
Java(TM) SE Runtime Environment (build 1.8.0_212-b10)
Java HotSpot(TM) 64-Bit Server VM (build 25.212-b10, mixed mode)

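Installing Cassandra

Cassandra is installed here from the binary RPM packages. Create a yum repository file on each node with the following content: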

[root@db03 ~]# vi /etc/yum.repos.d/cass.repo
[cassandra]
name=Apache Cassandra
baseurl=https://www.apache.org/dist/cassandra/redhat/311x/
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://www.apache.org/dist/cassandra/KEYS

Then install the package on each node with yum:

[root@db03 ~]# yum -y install cassandra

Editing the Cassandra Configuration File

Change the configuration file on each node as follows:

[root@db03 ~]# vi /etc/cassandra/default.conf/cassandra.yaml
cluster_name: 'TCS01'
num_tokens: 256
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "192.168.120.83,192.168.120.85"
listen_address: 192.168.120.83
endpoint_snitch: GossipingPropertyFileSnitch
start_rpc: true
rpc_address: 192.168.120.83

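On db04, db05, and db06, change listen_address and rpc_address to the node's own IP address; keep all other parameters the same as on db03.

endpoint_snitch: a cluster that spans datacenters needs a datacenter-aware snitch, and this setup uses GossipingPropertyFileSnitch. With SimpleSnitch, all nodes join a single datacenter.

Setting the Datacenter Name for Each Node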

Edit the cassandra-rackdc.properties file and set the dc parameter:
```
[root@db03 ~]# vi /etc/cassandra/default.conf/cassandra-rackdc.properties
dc=dc1
rack=rack1
```
According to the plan above, db03 and db04 belong to dc1, while db05 and db06 belong to dc2, so set dc=dc2 on db05 and db06.
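
With four nodes it is easy to put the wrong dc value on a host, so the intended file content can be generated from the plan. The following is a minimal sketch: `DC_BY_HOST` and `render_rackdc` are hypothetical names, and using rack1 on every host is an assumption, since the article only shows the rack value for db03.

```python
# Host-to-datacenter plan taken from this article.
DC_BY_HOST = {"db03": "dc1", "db04": "dc1", "db05": "dc2", "db06": "dc2"}


def render_rackdc(host, rack="rack1"):
    """Return the cassandra-rackdc.properties content for one host.

    The default rack name is an assumption; only db03's value is shown in the article.
    """
    return f"dc={DC_BY_HOST[host]}\nrack={rack}\n"


for host in sorted(DC_BY_HOST):
    print(f"# {host}")
    print(render_rackdc(host), end="")
```

The printed fragment for each host can then be compared with, or copied into, /etc/cassandra/default.conf/cassandra-rackdc.properties on that host.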

Starting the Cassandra Service

Start the seed nodes first, then the remaining non-seed nodes.

Start the seed nodes

[root@db03 ~]# systemctl enable cassandra
[root@db03 ~]# systemctl start cassandra
[root@db05 ~]# systemctl enable cassandra
[root@db05 ~]# systemctl start cassandra

Start the non-seed nodes

[root@db04 ~]# systemctl enable cassandra
[root@db04 ~]# systemctl start cassandra
[root@db06 ~]# systemctl enable cassandra
[root@db06 ~]# systemctl start cassandra

Verifying Node Status

Cassandra provides the nodetool command, which shows the status of the nodes in the cluster:

[root@db03 ~]# nodetool status

[Image: nodetool status output]

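Managing Keyspaces

A keyspace is an object that holds column families and user-defined types. It plays the role of a database in an RDBMS: it contains column families, indexes, and user-defined types, and it carries the replication settings, namely the replication strategy, datacenter awareness, and the replication factor.

List the keyspaces that exist by default: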

[root@db03 ~]# cqlsh 192.168.120.83
Connected to TCS01 at 192.168.120.83:9042.
[cqlsh 5.0.1 | Cassandra 3.11.4 | CQL spec 3.4.4 | Native protocol v4]
Use HELP for help.
cqlsh> desc keyspaces;

system_traces  system_schema  system_auth  system  system_distributed

Create a keyspace:

cqlsh> CREATE KEYSPACE spacewalk WITH replication = {'class':'SimpleStrategy', 'replication_factor' : 4};
cqlsh> desc keyspaces;

system_schema  system_auth  spacewalk  system  system_distributed  system_traces

cqlsh>
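
The statement above uses SimpleStrategy, which does not take datacenters into account. For a cluster that spans dc1 and dc2, NetworkTopologyStrategy with one replication factor per datacenter is a common alternative. The sketch below only builds such a statement as a string; `create_keyspace_cql` is a hypothetical helper, and two replicas per datacenter is an assumed value, picked because each datacenter here has two nodes.

```python
def create_keyspace_cql(name, replicas_per_dc):
    """Build a CREATE KEYSPACE statement that uses NetworkTopologyStrategy."""
    options = ["'class': 'NetworkTopologyStrategy'"]
    # One entry per datacenter; names must match the dc values in cassandra-rackdc.properties.
    options += [f"'{dc}': {rf}" for dc, rf in sorted(replicas_per_dc.items())]
    return f"CREATE KEYSPACE {name} WITH replication = {{{', '.join(options)}}};"


# Replication factors are assumptions for illustration, not values from the article.
print(create_keyspace_cql("spacewalk", {"dc1": 2, "dc2": 2}))
```

The printed statement can be pasted into cqlsh in place of the SimpleStrategy version.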

To drop a user-defined keyspace, use the following command:

cqlsh> drop keyspace spacewalk;

Managing Tables

Create a table in the spacewalk keyspace and import data into it:

Create the table

cqlsh:spacewalk> desc tables;
rhnpackagecapability

Import data

cqlsh:spacewalk> copy rhnpackagecapability(id,name,version,created,modified) from '/tmp/d.csv' with delimiter=',' and header=false;

[Image: COPY import output]

Drop the table
```
cqlsh:spacewalk> drop table rhnpackagecapability;
```
Troubleshooting

Data imports can fail with all sorts of errors. Here are the two problems I ran into:

Error 1 (field larger than field limit)

<stdin>:1:Failed to import 5000 rows: Error - field larger than field limit (131072),  given up after 1 attempts

Create a cqlshrc file:

[root@db03 ~]# cp /etc/cassandra/default.conf/cqlshrc.example  ~/.cassandra/cqlshrc
[root@db03 ~]# vi ~/.cassandra/cqlshrc
[csv]
; Increase field_size_limit (the default is 131072)
field_size_limit = 13107200000
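
The wording of this error matches the one raised by Python's csv module, whose default field limit is also 131072, and cqlsh itself is written in Python. Assuming that is where the limit comes from, the behaviour can be reproduced outside Cassandra with a minimal sketch. `parse_one_field` is a hypothetical helper, and the new limit used here is deliberately smaller than the cqlshrc value above so that it also fits platforms with a 32-bit C long.

```python
import csv
import io


def parse_one_field(length):
    """Parse a one-row CSV whose single field has `length` characters."""
    return list(csv.reader(io.StringIO("x" * length + "\n")))


default_limit = csv.field_size_limit()  # current limit of the csv module
try:
    parse_one_field(default_limit + 1)
    message = "no error"
except csv.Error as exc:
    message = str(exc)
print(message)

# Raise the limit, then parse the same oversized field again.
csv.field_size_limit(default_limit * 100)
rows = parse_one_field(default_limit + 1)
print(len(rows), len(rows[0][0]))
```

The first parse fails because the field is one character over the limit; after the limit is raised, the same input parses into a single row.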

Error 2 (batch too large)

Failed to import 20 rows: InvalidRequest - Error from server: code=2200 [Invalid query] message="Batch too large",  will retry later, attempt 1 of 5

Edit cassandra.yaml and raise the batch_size_fail_threshold_in_kb parameter, for example to 5120. Then append maxbatchsize=1 and minbatchsize=1 to the copy command:

cqlsh> copy mykeysp01.rhnpackagerepodata(id,primary_xml,filelist,other,created,modified) from '/u02/tmp/rhnpackagerepodata.csv' with maxbatchsize=1 and minbatchsize=1;

[Image: COPY import output]
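
Before choosing a value for batch_size_fail_threshold_in_kb, it can help to see how large the biggest rows in the CSV file are. The sketch below is a rough aid only: `rows_over_threshold` is a hypothetical helper, and the UTF-8 size of the CSV fields is just an approximation of the size Cassandra measures for a batch.

```python
import csv
import io


def rows_over_threshold(csv_text, threshold_kb):
    """Return (row_number, size_in_bytes) for rows whose fields exceed threshold_kb in total."""
    limit_bytes = threshold_kb * 1024
    oversized = []
    for number, row in enumerate(csv.reader(io.StringIO(csv_text)), start=1):
        # Sum the UTF-8 size of all fields in the row (separators are ignored).
        size = sum(len(field.encode("utf-8")) for field in row)
        if size > limit_bytes:
            oversized.append((number, size))
    return oversized


# Made-up sample data: one small row and one row with a 6000-character field.
sample = "1,short\n2," + "x" * 6000 + "\n"
print(rows_over_threshold(sample, 5))
```

For a real file, read its content and pass it in as `csv_text`. If individual fields are larger than the csv module's field limit, that limit has to be raised first, as in Error 1.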