标签:mongodb sharding
这是一种将海量的数据水平扩展的数据库集群系统,数据分别存储在sharding的各个节点上,使用者通过简单的配置就可以很方便地构建一个分布式MongoDB集群。
MongoDB的数据分块成为chunk,每个chunk都是Collection中一段连续的数据记录,通常最大尺寸是200MB,超出则生成新的数据块。
要构建一个MongoDB Sharding Cluster需要以下三个角色:
即存储实际数据的分片,每个Shard可以使一个mongod实例,也可以使一组mongod实例构成的Replica Set,为了实现每个Shard内部的auto-failover,MongoDB官方建立每个Shard为一组Replica Set。
为了将一个特定的collection存储在多个shard中,需要为该collection指定一个shard key,比如{age:1},shard key可以解决该条记录属于哪个chunk,Config Servers就是用来存储:所有shard节点的配置信息、每个chunk的shard key范围、chunk在各shard的分布情况、该集群中所有DB和collection的sharding配置信息。
这是一个前端路由,客户端由此接入,然后询问Config Servers需要在哪个Shard上查询或者保存记录,再连接i相应的Shard进行操作,最后将结构返回给客户端,客户端只需要将原本发给mongod的查询或者更新请求原封不动地发给Routing Process,而不必关系所操作的记录存储在哪个Shard上。
下面在同一台机器上构建一个简单的Sharding Cluster,架构图如下:
、
说明:
Shard Server 1:20000
Shard Server 2: 20001
Config Server : 30000
Route Process: 40000
1、启动Shard Server
[root@localhost ~]# mkdir -p /data/shard/s0
[root@localhost ~]# mkdir -p /data/shard/s1
[root@localhost ~]# mkdir -p /data/shard/log
[root@localhost bin]# ./mongod --shardsvr --port 20000 --dbpath=/data/shard/s0 --fork --logpath=/data/shard/log/s0.log --directoryperdb
about to fork child process, waiting until server is ready for connections.
forked process: 2551
[root@localhost bin]# ./mongod --shardsvr --port 20001 --dbpath=/data/shard/s1 --fork --logpath=/data/shard/log/s1.log --directoryperdb
about to fork child process, waiting until server is ready for connections.
forked process: 2575
2、启动Config Server
[root@localhost bin]# ./mongod --configsvr --port 30000 -dbpath=/data/shard/config --fork --logpath=/data/shard/log/config.log --directoryperdb
about to fork child process, waiting until server is ready for connections.
forked process: 2594
3、启动Route Process
[root@localhost bin]# ./mongos --port 40000 --configdb localhost:30000 --fork --logpath=/data/shard/log/route.log --chunkSize 1
2015-02-11T10:59:07.297+0800 warning: running with 1 config server should be done only for testing purposes and is not recommended for production
about to fork child process, waiting until server is ready for connections.
forked process: 2621
child process started successfully, parent exiting
说明:
mongos启动参数中,chunkSize这一项是用来指定chunk的大小的,单位是MB,默认大小为200MB,为了方便测试Sharding效果,我们把chunkSize指定为1MB。
4、配置Sharding
接下来我们使用MongoDB Shell登录到mongos,添加Shard节点:
[root@localhost bin]# ./mongo admin --port 40000 //此操作需要连接admin库
MongoDB shell version: 2.6.6
connecting to: 127.0.0.1:40000/admin
mongos> db.runCommand({addshard:"localhost:20000"})//添加Shard Server
{ "shardAdded" : "shard0000", "ok" : 1 }
mongos> db.runCommand({addshard:"localhost:20001"})
{ "shardAdded" : "shard0001", "ok" : 1 }
mongos> db.runCommand({enablesharding:"test"}) //设置分片存储的数据库
{ "ok" : 1 }
mongos> db.runCommand({shardcollection:"test.users",key:{_id:1}}) //设置分片的集合名称,且必须指定Shard key,系统会自动创建索引
{ "collectionsharded" : "test.users", "ok" : 1 }
mongos>
5、验证Sharding正常工作
mongos> use test
switched to db test
mongos> for(var i=1;i<=500000;i++) db.users.insert({age:i,name:"xu",addr:"BJ",country:"china"})
WriteResult({ "nInserted" : 1 })
mongos>
mongos> db.users.stats()
{
"sharded" : true, //说明此表已经被shard
"systemFlags" : 1,
"userFlags" : 1,
"ns" : "test.users",
"count" : 500000,
"numExtents" : 16,
"size" : 56000000,
"storageSize" : 75595776,
"totalIndexSize" : 16294768,
"indexSizes" : {
"_id_" : 16294768
},
"avgObjSize" : 112,
"nindexes" : 1,
"nchunks" : 51,
"shards" : {
"shard0000" : { //在此分片实例上约有24.5M数据
"ns" : "test.users",
"count" : 252408,
"size" : 28269696,
"avgObjSize" : 112,
"storageSize" : 37797888,
"numExtents" : 8,
"nindexes" : 1,
"lastExtentSize" : 15290368,
"paddingFactor" : 1,
"systemFlags" : 1,
"userFlags" : 1,
"totalIndexSize" : 8225056,
"indexSizes" : {
"_id_" : 8225056
},
"ok" : 1
},
"shard0001" : { //在此分片上实例上约有23.5M数据
"ns" : "test.users",
"count" : 247592,
"size" : 27730304,
"avgObjSize" : 112,
"storageSize" : 37797888,
"numExtents" : 8,
"nindexes" : 1,
"lastExtentSize" : 15290368,
"paddingFactor" : 1,
"systemFlags" : 1,
"userFlags" : 1,
"totalIndexSize" : 8069712,
"indexSizes" : {
"_id_" : 8069712
},
"ok" : 1
}
},
"ok" : 1
}
6、管理维护Sharding
列出所有的Shard Server:
[root@localhost bin]# ./mongo admin --port 40000
MongoDB shell version: 2.6.6
connecting to: 127.0.0.1:40000/admin
mongos> db.runCommand({listshards:1}) //列出所有的Shard Server
{
"shards" : [
{
"_id" : "shard0000",
"host" : "localhost:20000"
},
{
"_id" : "shard0001",
"host" : "localhost:20001"
}
],
"ok" : 1
}
mongos>
查看Sharding信息:
mongos> printShardingStatus();
--- Sharding Status ---
sharding version: {
"_id" : 1,
"version" : 4,
"minCompatibleVersion" : 4,
"currentVersion" : 5,
"clusterId" : ObjectId("54dac57cd9101f94703b77e3")
}
shards:
{ "_id" : "shard0000", "host" : "localhost:20000" }
{ "_id" : "shard0001", "host" : "localhost:20001" }
databases:
{ "_id" : "admin", "partitioned" : false, "primary" : "config" }
{ "_id" : "test", "partitioned" : true, "primary" : "shard0000" }
test.users
shard key: { "_id" : 1 }
chunks:
shard0001 26
shard0000 25
too many chunks to print, use verbose if you want to force print
mongos>
判断是否是Sharding
mongos> db.runCommand({isdbgrid:1})
{ "isdbgrid" : 1, "hostname" : "localhost.localdomain", "ok" : 1 }
7、对现有的表进行Sharding
刚才我们是对表test.users进行分片了,下面我们将对库中现有的未分片的表test.users_2进行分片处理
表的初始化状态如下,可以看出它没有被分片过:
mongos> db.users_2.stats();
{
"sharded" : false,
"primary" : "config",
"ok" : 0,
"errmsg" : "Collection [admin.users_2] not found."
}
mongos> use admin
switched to db admin
mongos> db.runCommand({shardcollection:"test.users_2",key:{_id:1}})
{ "collectionsharded" : "test.users_2", "ok" : 1 }
mongos>
mongos> use test
switched to db test
mongos> db.users_2.stats();
{
"sharded" : true, //已经分片
"systemFlags" : 1,
"userFlags" : 1,
"ns" : "test.users_2",
"count" : 0,
"numExtents" : 1,
"size" : 0,
"storageSize" : 8192,
"totalIndexSize" : 8176,
"indexSizes" : {
"_id_" : 8176
},
"avgObjSize" : 0,
"nindexes" : 1,
"nchunks" : 1,
"shards" : {
"shard0000" : {
"ns" : "test.users_2",
"count" : 0,
"size" : 0,
"storageSize" : 8192,
"numExtents" : 1,
"nindexes" : 1,
"lastExtentSize" : 8192,
"paddingFactor" : 1,
"systemFlags" : 1,
"userFlags" : 1,
"totalIndexSize" : 8176,
"indexSizes" : {
"_id_" : 8176
},
"ok" : 1
}
},
"ok" : 1
}
mongos>
8、新增Shard Server
刚才演示的是新增分片表,接下来我们演示如何新增Shard Server
启动一个新的Shard Server进程:
[root@localhost bin]# mkdir /data/shard/s2
[root@localhost bin]# ./mongod --shardsvr --port 20002 --dbpath=/data/shard/s2 --fork --logpath=/data/shard/log/s2.log --directoryperdb
about to fork child process, waiting until server is ready for connections.
forked process: 2862
[root@localhost bin]# ./mongo admin --port 40000
MongoDB shell version: 2.6.6
connecting to: 127.0.0.1:40000/admin
mongos> db.runCommand({addshard:"localhost:20002"})
{ "shardAdded" : "shard0002", "ok" : 1 }
mongos> printShardingStatus();
--- Sharding Status ---
sharding version: {
"_id" : 1,
"version" : 4,
"minCompatibleVersion" : 4,
"currentVersion" : 5,
"clusterId" : ObjectId("54dac57cd9101f94703b77e3")
}
shards:
{ "_id" : "shard0000", "host" : "localhost:20000" }
{ "_id" : "shard0001", "host" : "localhost:20001" }
{ "_id" : "shard0002", "host" : "localhost:20002" } //新增的Shard Server
databases:
{ "_id" : "admin", "partitioned" : false, "primary" : "config" }
{ "_id" : "test", "partitioned" : true, "primary" : "shard0000" }
test.users
shard key: { "_id" : 1 }
chunks:
shard0002 2
shard0001 25
shard0000 24
too many chunks to print, use verbose if you want to force print
test.users_2
shard key: { "_id" : 1 }
chunks:
shard0000 1
{ "_id" : { "$minKey" : 1 } } -->> { "_id" : { "$maxKey" : 1 } } on : shard0000 Timestamp(1, 0)
mongos>
查看分片表状态,以验证新Shard Server
mongos> db.users.stats()
{
"sharded" : true,
"systemFlags" : 1,
"userFlags" : 1,
"ns" : "test.users",
"count" : 500000,
"numExtents" : 23,
"size" : 56000000,
"storageSize" : 98103296,
"totalIndexSize" : 16319296,
"indexSizes" : {
"_id_" : 16319296
},
"avgObjSize" : 112,
"nindexes" : 1,
"nchunks" : 51,
"shards" : {
"shard0000" : {
"ns" : "test.users",
"count" : 173572,
"size" : 19440064,
"avgObjSize" : 112,
"storageSize" : 37797888,
"numExtents" : 8,
"nindexes" : 1,
"lastExtentSize" : 15290368,
"paddingFactor" : 1,
"systemFlags" : 1,
"userFlags" : 1,
"totalIndexSize" : 5665968,
"indexSizes" : {
"_id_" : 5665968
},
"ok" : 1
},
"shard0001" : {
"ns" : "test.users",
"count" : 162943,
"size" : 18249616,
"avgObjSize" : 112,
"storageSize" : 37797888,
"numExtents" : 8,
"nindexes" : 1,
"lastExtentSize" : 15290368,
"paddingFactor" : 1,
"systemFlags" : 1,
"userFlags" : 1,
"totalIndexSize" : 5330752,
"indexSizes" : {
"_id_" : 5330752
},
"ok" : 1
},
"shard0002" : { //该分片已经有数据了
"ns" : "test.users",
"count" : 163485,
"size" : 18310320,
"avgObjSize" : 112,
"storageSize" : 22507520,
"numExtents" : 7,
"nindexes" : 1,
"lastExtentSize" : 11325440,
"paddingFactor" : 1,
"systemFlags" : 1,
"userFlags" : 1,
"totalIndexSize" : 5322576,
"indexSizes" : {
"_id_" : 5322576
},
"ok" : 1
}
},
"ok" : 1
}
说明:
我们可以发现,当我们新增Shard Server后数据自动分布到了新Shard上,这是由MongoDB内部自己实现的。
9、移除Shard Server
有些时候由于硬件资源有限,所以我们不得不进行一些回收工作,下面我们就要将刚刚启用的Shard Server回收,系统首先会将在这个即将被移除的Shard Server上的数据先平均分配到其他的Shard Server上,然后最终在将这个Shard Server踢下线,我们需要不停的调用db.runCommand({"removeshard":"localhost:20002"})来观察这个移除操作进行到哪里了:
[root@localhost bin]# ./mongo admin --port 40000
MongoDB shell version: 2.6.6
connecting to: 127.0.0.1:40000/admin
mongos> use admin
switched to db admin
mongos> db.runCommand({"removeshard":"localhost:20002"});
{
"msg" : "draining started successfully",
"state" : "started", //开始状态
"shard" : "shard0002",
"ok" : 1
}
mongos> db.runCommand({"removeshard":"localhost:20002"});
{
"msg" : "draining ongoing",
"state" : "ongoing",
"remaining" : {
"chunks" : NumberLong(13),
"dbs" : NumberLong(0)
},
"ok" : 1
}
mongos> db.runCommand({"removeshard":"localhost:20002"});
{
"msg" : "draining ongoing",
"state" : "ongoing",
"remaining" : {
"chunks" : NumberLong(7),
"dbs" : NumberLong(0)
},
"ok" : 1
}
mongos> db.runCommand({"removeshard":"localhost:20002"});
{
"msg" : "draining ongoing",
"state" : "ongoing",
"remaining" : {
"chunks" : NumberLong(3),
"dbs" : NumberLong(0)
},
"ok" : 1
}
mongos> db.runCommand({"removeshard":"localhost:20002"});
{
"msg" : "draining ongoing",
"state" : "ongoing",
"remaining" : {
"chunks" : NumberLong(2),
"dbs" : NumberLong(0)
},
"ok" : 1
}
mongos> db.runCommand({"removeshard":"localhost:20002"});
{
"msg" : "removeshard completed successfully",
"state" : "completed",
"shard" : "shard0002",
"ok" : 1
}
最终移除后,当我们再次调用db.runCommand({"removeshard":"localhost:20002"})的时候系统会报错,以便通知我们不存在20002这个端口的Shard Server了,因为它已经被移除掉了。
接下来我们看看sharding的状态:
mongos> use admin
switched to db admin
mongos> printShardingStatus();
--- Sharding Status ---
sharding version: {
"_id" : 1,
"version" : 4,
"minCompatibleVersion" : 4,
"currentVersion" : 5,
"clusterId" : ObjectId("54dac57cd9101f94703b77e3")
}
shards:
{ "_id" : "shard0000", "host" : "localhost:20000" }
{ "_id" : "shard0001", "host" : "localhost:20001" }
databases:
{ "_id" : "admin", "partitioned" : false, "primary" : "config" }
{ "_id" : "test", "partitioned" : true, "primary" : "shard0000" }
test.users
shard key: { "_id" : 1 }
chunks:
shard0000 26
shard0001 25
too many chunks to print, use verbose if you want to force print
test.users_2
shard key: { "_id" : 1 }
chunks:
shard0000 1
{ "_id" : { "$minKey" : 1 } } -->> { "_id" : { "$maxKey" : 1 } } on : shard0000 Timestamp(1, 0)
mongos>
发现端口20002的分片已经被删除掉。
第五部分 架构篇 第二十一章 MongoDB Sharding 架构(实践)
标签:mongodb sharding
原文地址:http://blog.csdn.net/xuzheng_java/article/details/43731201