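This series takes an in-depth look at Ceph and its integration with OpenStack:
(1) Installation and deployment
(3) Ceph physical and logical structure
(4) Ceph's basic data structures
(6) Summary of the caching mechanisms of QEMU-KVM and Ceph RBD
(8) Basic performance testing and tuning methods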
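Continuing to put what I have learned into practice, this article covers basic Ceph performance-testing tools and methods.
The test environment is the same as in the article "Basic Ceph operations and common troubleshooting methods" (Ceph 的基本操作和常见故障排除方法).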
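Write performance of a single OSD disk, measured with dd after dropping the page cache:
root@ceph1:~# echo 3 > /proc/sys/vm/drop_caches
root@ceph1:~# dd if=/dev/zero of=/var/lib/ceph/osd/ceph-0/deleteme bs=1G count=1 oflag=direct
The results vary a lot between runs: sometimes about 75 MB/s, sometimes 150 MB/s.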
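Because single runs swing between roughly 75 and 150 MB/s, it helps to repeat the test and look at the spread rather than at one number. A minimal sketch, assuming bash and GNU dd (whose summary line ends in a rate such as `151 MB/s`); `summarize` is a hypothetical helper name, not a Ceph or coreutils tool:

```bash
#!/usr/bin/env bash
# summarize VALUE...: print count, min, max and mean of the numbers given.
# Hypothetical helper for comparing repeated dd runs.
summarize() {
  [ "$#" -gt 0 ] || return 1
  printf '%s\n' "$@" | awk '
    NR == 1 { min = max = $1 + 0 }
    {
      v = $1 + 0
      sum += v
      if (v < min) min = v
      if (v > max) max = v
    }
    END { printf "runs=%d min=%.1f max=%.1f mean=%.1f\n", NR, min, max, sum / NR }'
}

# Example usage on an OSD node (as root; path taken from the dd test above).
# Assumes dd reports its rate in MB/s, i.e. the second-to-last field.
# rates=()
# for run in 1 2 3 4 5; do
#   echo 3 > /proc/sys/vm/drop_caches
#   rates+=("$(dd if=/dev/zero of=/var/lib/ceph/osd/ceph-0/deleteme bs=1G count=1 oflag=direct 2>&1 \
#     | awk '/copied/ { print $(NF - 1) }')")
# done
# summarize "${rates[@]}"
```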
Write to all OSD disks on the node in parallel:
root@ceph1:~# for i in `mount | grep osd | awk '{print $3}'`; do (dd if=/dev/zero of=$i/deleteme bs=1G count=1 oflag=direct &) ; done
Read test on a single OSD, then on all OSDs in parallel:
root@ceph1:~# dd if=/var/lib/ceph/osd/ceph-0/deleteme of=/dev/null bs=2G count=1 iflag=direct
for i in `mount | grep osd | awk '{print $3}'`; do (dd if=$i/deleteme of=/dev/null bs=1G count=1 iflag=direct &); done
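When dd runs on all OSDs in parallel, the node's aggregate throughput is the sum of the per-OSD rates, but the background jobs' reports interleave on the terminal. A sketch that sums the rates from collected dd summaries, assuming GNU dd's output format; `sum_dd_rates` and the log file names in the usage comment are hypothetical:

```bash
#!/usr/bin/env bash
# sum_dd_rates: read GNU dd summary lines ("... copied, 7.1 s, 151 MB/s") on
# stdin and print the number of runs and their total rate in MB/s.
# Hypothetical helper.
sum_dd_rates() {
  awk '/copied/ {
      rate = $(NF - 1) + 0
      if ($NF == "GB/s") rate *= 1000          # dd uses decimal units
      else if ($NF == "kB/s") rate /= 1000
      total += rate; n++
    }
    END { printf "runs=%d total=%.1f MB/s\n", n, total }'
}

# Example usage (as root on an OSD node; log names are hypothetical). The dd
# processes are backgrounded directly, not inside ( ... & ), so that wait can
# block until all of them finish.
# for i in $(mount | grep osd | awk '{print $3}'); do
#   dd if=/dev/zero of="$i/deleteme" bs=1G count=1 oflag=direct 2> "/tmp/dd-$(basename "$i").log" &
# done
# wait
# cat /tmp/dd-*.log | sum_dd_rates
```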
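Run iperf -s -p 6900 on ceph1 and iperf -c ceph1 -p 6900 on ceph2, repeating several times. The bandwidth between the two nodes is about 1 Gbit/s, i.e. roughly 125 MB/s (10^9 bits/s divided by 8); the run below measured 1.08 Gbit/s.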
root@ceph2:~# iperf -c ceph1 -p 6900
------------------------------------------------------------
Client connecting to ceph1, TCP port 6900
TCP window size: 85.0 KByte (default)
------------------------------------------------------------
[  3] local 192.168.56.103 port 41773 connected with 192.168.56.102 port 6900
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec  1.25 GBytes  1.08 Gbits/sec
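Converting the iperf figure to storage units makes it easier to compare with the dd and rados bench results. A small sketch, assuming iperf's Gbits/sec is decimal (10^9 bits per second); `gbit_to_mb` is a hypothetical helper:

```bash
#!/usr/bin/env bash
# gbit_to_mb RATE: convert a network rate in Gbit/s (10^9 bits/s) to MB/s
# (10^6 bytes/s) and MiB/s (2^20 bytes/s). Hypothetical helper.
gbit_to_mb() {
  awk -v g="$1" 'BEGIN {
    bytes = g * 1e9 / 8
    printf "%.1f MB/s %.1f MiB/s\n", bytes / 1e6, bytes / 1048576
  }'
}

# gbit_to_mb 1.08   # the iperf result above
```

For the measured 1.08 Gbit/s this gives 135.0 MB/s (128.7 MiB/s), the ceiling for any traffic that has to cross this link.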
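The syntax of rados bench is: rados bench -p <pool_name> <seconds> <write|seq|rand> -b <block size> -t <concurrent operations> --no-cleanup
-b sets the object size for write runs (4194304 bytes by default) and -t the number of concurrent operations (16 by default). --no-cleanup keeps the objects written by a write run so that later seq and rand runs have data to read.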
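Since seq and rand only read objects left behind by an earlier write run, the three modes are normally run in that order. A hedged sketch of such a sequence, assuming bash; `run_rados_bench` and `_run` are hypothetical helpers, the defaults (4194304-byte objects, 16 concurrent operations) match the output below, and the final cleanup call assumes a rados release whose `cleanup` subcommand removes the default benchmark objects without extra arguments. Setting DRY_RUN=1 only prints the commands:

```bash
#!/usr/bin/env bash
# _run: execute a command, or only print it when DRY_RUN=1 (hypothetical helper).
_run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# run_rados_bench POOL SECONDS [BLOCK_BYTES] [CONCURRENCY]
# The write phase keeps its objects (--no-cleanup) so that the seq and rand
# phases have data to read; the benchmark objects are removed at the end.
run_rados_bench() {
  local pool=$1 secs=$2 bs=${3:-4194304} threads=${4:-16} mode
  _run rados bench -p "$pool" "$secs" write -b "$bs" -t "$threads" --no-cleanup
  for mode in seq rand; do
    _run rados bench -p "$pool" "$secs" "$mode" -t "$threads"
  done
  # Assumption: this rados release accepts "cleanup" without an explicit prefix.
  _run rados -p "$pool" cleanup
}

# DRY_RUN=1 run_rados_bench rbd 10   # print the plan only
# run_rados_bench rbd 10             # run it against pool rbd
```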
Write:
root@ceph1:~# rados bench -p rbd 10 write --no-cleanup
Maintaining 16 concurrent writes of 4194304 bytes for up to 10 seconds or 0 objects
Object prefix: benchmark_data_ceph1_12884
  sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
    0       0         0         0         0         0         -         0
    1      16        16         0         0         0         -         0
...
   12      15        75        60   19.9671         4   3.05943   2.46556
Total time run:         12.135344
Total writes made:      75
Write size:             4194304
Bandwidth (MB/sec):     24.721
Stddev Bandwidth:       13.5647
Max bandwidth (MB/sec): 36
Min bandwidth (MB/sec): 0
Average Latency:        2.57614
Stddev Latency:         0.781915
Max latency:            4.50816
Min latency:            1.04075
Sequential read:
root@ceph1:~# rados bench -p rbd 10 seq
  sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
    0      16        16         0         0         0         -         0
Total time run:        0.601027
Total reads made:      75
Read size:             4194304
Bandwidth (MB/sec):    499.146
Average Latency:       0.123632
Max latency:           0.209325
Min latency:           0.030446
Random read:
root@ceph1:~# rados bench -p rbd 10 rand
  sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
    0       3         3         0         0         0         -         0
    1      16       138       122   477.298       488   0.01702  0.116519
...
   10      16      1242      1226   488.681       448  0.108589  0.129214
Total time run:        10.092985
Total reads made:      1242
Read size:             4194304
Bandwidth (MB/sec):    492.223
Average Latency:       0.129631
Max latency:           0.297213
Min latency:           0.007133
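The Bandwidth (MB/sec) line can be reproduced from the other summary fields as operations × object size ÷ run time, with MB meaning 2^20 bytes: for the write run, 75 × 4194304 bytes ÷ 12.135344 s ≈ 24.721 MB/s. The seq run made exactly 75 reads, the number of objects written earlier, which is consistent with seq stopping once every object had been read and explains its 0.6 s run time. A sketch of the check, assuming the summary layout shown above; `bench_bw` is a hypothetical helper:

```bash
#!/usr/bin/env bash
# bench_bw: recompute rados bench bandwidth (MB/s, MB = 2^20 bytes) from the
# "Total time run", "Total writes/reads made" and "Write/Read size" lines of
# a rados bench summary on stdin. Hypothetical helper.
bench_bw() {
  awk -F ':' '
    /^ *Total time run/             { t = $2 + 0 }
    /^ *Total (writes|reads) made/  { n = $2 + 0 }
    /^ *(Write|Read) size/          { sz = $2 + 0 }
    END { if (t > 0) printf "%.3f\n", n * sz / 1048576 / t }'
}

# Example usage (log file name is hypothetical):
# rados bench -p rbd 10 write --no-cleanup | tee write.log
# bench_bw < write.log
```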
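The syntax of rados load-gen is:
rados -p rbd load-gen \
    --num-objects 50 \
    --min-object-size 4M \
    --max-object-size 4M \
    --max-ops 16 \
    --min-op-len 4M \
    --max-op-len 4M \
    --percent 5 \
    --target-throughput 2000 \
    --run-length 60
where:
--num-objects: number of objects to generate
--min-object-size / --max-object-size: minimum / maximum object size
--max-ops: maximum number of operations
--min-op-len / --max-op-len: minimum / maximum operation length
--percent: percentage of write operations
--target-throughput: target throughput, in MB
--run-length: run time, in seconds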
Running rados -p rbd load-gen --percent 5 on ceph1 gives:
op 291 completed, throughput=4.87MB/sec
READ : oid=obj-guN0CPHE0KfHzH7 off=292818859 len=1213770
READ : oid=obj-amU99KbGcAMJCN2 off=2249445089 len=1063562
READ : oid=obj-XAHy5Gl60ZyRnC0 off=455072688 len=1022740
WRITE : oid=obj-Y0PuqAicUShMStC off=503699731 len=1000260
WRITE : oid=obj-TdnDOuLOB0X9YSf off=2501861957 len=277815
READ : oid=obj-60XnBIFp4CEbEXj off=2021762322 len=1106889
op 294 completed, throughput=4.88MB/sec
Running the same command on the client gives:
op 293 completed, throughput=5MB/sec
54: throughput=4.92MB/sec pending data=0
READ : oid=obj-Q1uD-85wPlEOITm off=2896934498 len=1478411
READ : oid=obj-uqwJvqxZAvOhVEl off=2665994348 len=1666197
READ : oid=obj-AS2Yr0EphkpuwAT off=117521166 len=1146906
READ : oid=obj-P2oaREZHRw4xt-s off=651320312 len=278456
READ : oid=obj-es9s2eMyANDaurQ off=66344631 len=835588
op 296 completed, throughput=4.95MB/sec
op 295 completed, throughput=4.97MB/sec
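Compared with rados bench, rados load-gen can generate a mixed workload, whereas each rados bench run generates only one type of load. However, I do not yet understand why the results differ so much (about 5 MB/s here versus hundreds of MB/s in the rados bench read tests).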
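To see the read/write mix that load-gen actually produced, count the READ and WRITE lines in its output and compare the share of writes with the value passed to --percent. A sketch, assuming each operation is logged on a line starting with READ or WRITE as in the output above; `op_mix` and the log file name are hypothetical. With only a handful of lines the ratio is noisy, so it is best applied to the output of a whole run:

```bash
#!/usr/bin/env bash
# op_mix: count READ and WRITE operations in `rados load-gen` output read
# from stdin and report the observed share of writes. Hypothetical helper.
op_mix() {
  awk '$1 == "READ"  { r++ }
       $1 == "WRITE" { w++ }
       END {
         t = r + w
         if (t) printf "READ=%d WRITE=%d write%%=%.1f\n", r, w, 100 * w / t
       }'
}

# Example usage (log file name is hypothetical):
# rados -p rbd load-gen --percent 5 --run-length 60 > loadgen.log 2>&1
# op_mix < loadgen.log
```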
First, run the following commands on the client to create an RBD image, map it, format it with XFS and mount it:
root@client:/var# rbd create bd2 --size 1024
root@client:/var# rbd info --image bd2
rbd image 'bd2':
        size 1024 MB in 256 objects
        order 22 (4096 kB objects)
        block_name_prefix: rb.0.3841.74b0dc51
        format: 1
root@client:/var# rbd map bd2
root@client:/var# rbd showmapped
id pool  image snap device
1  pool1 bd1   -    /dev/rbd1
2  rbd   bd2   -    /dev/rbd2
root@client:/var# mkfs.xfs /dev/rbd2
log stripe unit (4194304 bytes) is too large (maximum is 256KiB)
log stripe unit adjusted to 32KiB
meta-data=/dev/rbd2              isize=256    agcount=9, agsize=31744 blks
         =                       sectsz=512   attr=2, projid32bit=0
data     =                       bsize=4096   blocks=262144, imaxpct=25
         =                       sunit=1024   swidth=1024 blks
naming   =version 2              bsize=4096   ascii-ci=0
log      =internal log           bsize=4096   blocks=2560, version=2
         =                       sectsz=512   sunit=8 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
root@client:/var# mkdir -p /mnt/ceph-bd2
root@client:/var# mount /dev/rbd2 /mnt/ceph-bd2/
root@client:/var# df -h /mnt/ceph-bd2/
Filesystem      Size  Used Avail Use% Mounted on
/dev/rbd2      1014M   33M  982M   4% /mnt/ceph-bd2
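The rbd info output follows from the image size and the order: each object is 2^order bytes (2^22 = 4 MiB here), so a 1024 MB image maps to 1024 × 2^20 ÷ 2^22 = 256 RADOS objects. A tiny sketch of that arithmetic; `rbd_objects` is a hypothetical helper name:

```bash
#!/usr/bin/env bash
# rbd_objects SIZE_MB ORDER: number of RADOS objects (of 2^ORDER bytes each)
# needed for an RBD image of SIZE_MB megabytes (MB = 2^20 bytes).
# Hypothetical helper.
rbd_objects() {
  local size_mb=$1 order=$2
  local obj_bytes=$(( 1 << order ))
  # Round up so that a partial last object is still counted.
  local objs=$(( (size_mb * 1024 * 1024 + obj_bytes - 1) / obj_bytes ))
  echo "$objs objects of $(( obj_bytes / 1024 )) kB"
}

# rbd_objects 1024 22   # image bd2: size 1024 MB, order 22
```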
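The syntax of rbd bench-write is rbd bench-write <RBD image name>. It accepts the following parameters:
--io-size: size of each I/O in bytes (4096 by default)
--io-threads: number of concurrent I/O threads (16 by default)
--io-total: total number of bytes to write
--io-pattern: seq or rand (seq by default)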
Run the test both on an OSD node of the cluster and on the client:
(1) Test on an OSD node
root@ceph1:~# rbd bench-write bd2 --io-total 171997300
bench-write  io_size 4096  io_threads 16  bytes 171997300  pattern seq
  SEC       OPS   OPS/SEC   BYTES/SEC
    1       280    273.19  2237969.65
    2       574    286.84  2349818.65
...
   71     20456    288.00  2358395.28
   72     20763    288.29  2360852.64
elapsed:    72  ops:    21011  ops/sec:   288.75  bytes/sec: 2363740.27
A fragment of the ceph -w output during this test:
2016-06-05 07:38:38.654553 mon.0 [INF] pgmap v1729: 272 pgs: 8 stale+active+clean, 264 active+clean; 3323 MB data, 6729 MB used, 3488 MB / 10217 MB avail; 1140 kB/s wr, 570 op/s
2016-06-05 07:38:40.670286 mon.0 [INF] pgmap v1730: 272 pgs: 8 stale+active+clean, 264 active+clean; 3323 MB data, 6735 MB used, 3482 MB / 10217 MB avail; 841 kB/s wr, 420 op/s
2016-06-05 07:38:43.656021 mon.0 [INF] pgmap v1731: 272 pgs: 8 stale+active+clean, 264 active+clean; 3323 MB data, 6742 MB used, 3475 MB / 10217 MB avail; 1219 kB/s wr, 609 op/s
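To put the ops/sec figure in bandwidth terms, multiply it by io_size: 288.75 ops/s × 4096 bytes = 1,182,720 bytes/s, about 1.13 MiB/s. That is roughly half of the bytes/sec value rbd bench-write printed, while the write rates in the ceph -w fragment (841 to 1219 kB/s) are close to the computed figure. A one-function sketch of the conversion, with a hypothetical helper name:

```bash
#!/usr/bin/env bash
# iops_to_mibs OPS_PER_SEC IO_SIZE_BYTES: bandwidth in MiB/s (2^20 bytes)
# implied by an IOPS figure. Hypothetical helper.
iops_to_mibs() {
  awk -v ops="$1" -v size="$2" 'BEGIN { printf "%.2f\n", ops * size / 1048576 }'
}

# iops_to_mibs 288.75 4096   # rbd bench-write on the OSD node, io_size 4096
```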
(2) Test on the client
root@client:/home/s1# rbd bench-write bd2 --io-total 171997300
bench-write io_size 4096 io_threads 16 bytes 171997300 pattern seq
SEC OPS OPS/SEC BYTES/SEC
1 263 262.64 2122892.29
2 534 265.89 2041536.83
3 811 269.90 1988839.85
4 1081 269.75 1953255.85
...
91 24857 273.06 1857522.67
92 25126 273.09 1857276.85
elapsed: 92 ops: 25298 ops/sec: 273.06 bytes/sec: 1856564.49
The ceph -w output on the cluster during this test:
2016-06-05 07:31:45.245170 mon.0 [INF] pgmap v1707: 272 pgs: 8 stale+active+clean, 264 active+clean; 3323 MB data, 6728 MB used, 3489 MB / 10217 MB avail; 1106 kB/s wr, 553 op/s
2016-06-05 07:31:48.242721 mon.0 [INF] pgmap v1708: 272 pgs: 8 stale+active+clean, 264 active+clean; 3323 MB data, 6732 MB used, 3485 MB / 10217 MB avail; 1098 kB/s wr, 549 op/s
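ceph -w prints a client I/O summary every few seconds, so averaging several pgmap lines gives a cluster-side figure to set against the client's bench-write numbers. A sketch, assuming the `<N> kB/s wr, <M> op/s` format shown above (lines that report MB/s or read rates are simply ignored); `pgmap_avg` is a hypothetical helper:

```bash
#!/usr/bin/env bash
# pgmap_avg: average the client write rate and op/s from `ceph -w` pgmap
# lines on stdin. Only the "<N> kB/s wr, <M> op/s" form is recognised;
# other lines are ignored. Hypothetical helper.
pgmap_avg() {
  awk 'match($0, /[0-9]+ kB\/s wr, [0-9]+ op\/s/) {
      split(substr($0, RSTART, RLENGTH), f, " ")
      wr += f[1]; ops += f[4]; n++
    }
    END {
      if (n) printf "samples=%d wr=%.1f kB/s ops=%.1f op/s\n", n, wr / n, ops / n
    }'
}

# Example usage while a benchmark is running (ceph -w keeps streaming, so
# stop after a fixed number of lines):
# ceph -w | head -n 30 | pgmap_avg
```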
Install fio with apt-get install fio, then create a fio job file:
root@client:/home/s1# cat write.fio
[write-4M]
description="write test with block size of 4M"
ioengine=rbd
clientname=admin
pool=rbd
rbdname=bd2
iodepth=32
runtime=120
rw=write #write: sequential write, randwrite: random write, read: sequential read, randread: random read
bs=4M
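The comment on the rw line lists four access patterns; to cover all of them, the same job can be generated once per pattern. A sketch, assuming bash; `make_rbd_job` is a hypothetical helper, and the client, pool and image names are copied from write.fio:

```bash
#!/usr/bin/env bash
# make_rbd_job RW [BS] [IODEPTH]: print a fio job file for the rbd engine.
# client/pool/image values are copied from write.fio. Hypothetical helper.
make_rbd_job() {
  local rw=$1 bs=${2:-4M} depth=${3:-32}
  case $rw in
    read|write|randread|randwrite) ;;
    *) echo "unsupported rw mode: $rw" >&2; return 1 ;;
  esac
  cat <<EOF
[${rw}-${bs}]
description="${rw} test with block size of ${bs}"
ioengine=rbd
clientname=admin
pool=rbd
rbdname=bd2
iodepth=${depth}
runtime=120
rw=${rw}
bs=${bs}
EOF
}

# Generate one job file per access pattern:
# for m in write randwrite read randread; do make_rbd_job "$m" > "$m.fio"; done
```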
Running fio fails with an error:
root@client:/home/s1# fio write.fio
fio: engine rbd not loadable
fio: failed to load engine rbd
Bad option <clientname=admin>
Bad option <pool=rbd>
Bad option <rbdname=bd2>
fio: job write-4M dropped
fio: file:ioengines.c:99, func=dlopen, error=rbd: cannot open shared object file: No such file or directory
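This is because the installed fio was built without the librbd IO engine, so it cannot use ioengine=rbd; fio --enghelp confirms that rbd is not among the available engines: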
root@client:/home/s1# fio --enghelp
Available IO engines:
        cpuio
        mmap
        sync
        psync
        vsync
        pvsync
        null
        net
        netsplice
        libaio
        rdma
        posixaio
        falloc
        e4defrag
        splice
        sg
        binject
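Installing librbd with apt-get install librbd-dev does not help: fio still reports the same error, because the rbd engine is only compiled in when librbd is available when fio is built. Following material found online, download the fio source and rebuild it: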
$ git clone git://git.kernel.dk/fio.git
$ cd fio
$ ./configure
[...]
Rados Block Device engine     yes
[...]
$ make
Now rbd also appears in fio's list of IO engines. With the rbd IO engine, fio reads the settings in ceph.conf to connect to the Ceph cluster.
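Before running a job, it is worth checking that the fio binary actually in use lists the engine. A sketch, assuming `fio --enghelp` prints one engine name per line after a header, as in the listing above; `has_engine` is a hypothetical helper:

```bash
#!/usr/bin/env bash
# has_engine NAME: read `fio --enghelp` output on stdin and succeed only if
# NAME appears as an engine. Hypothetical helper.
has_engine() {
  awk -v want="$1" '{ for (i = 1; i <= NF; i++) if ($i == want) found = 1 }
                    END { exit !found }'
}

# Example usage with the rebuilt binary:
# ./fio --enghelp | has_engine rbd || echo "this fio build has no rbd engine" >&2
```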
The fio command and its results:
root@client:/home/s1/fio# ./fio ../write.fio
write-4M: (g=0): rw=write, bs=4M-4M/4M-4M/4M-4M, ioengine=rbd, iodepth=32
fio-2.11-12-g82e6
Starting 1 process
rbd engine: RBD version: 0.1.8
Jobs: 1 (f=1): [W(1)] [100.0% done] [0KB/128.0MB/0KB /s] [0/32/0 iops] [eta 00m:00s]
write-4M: (groupid=0, jobs=1): err= 0: pid=19190: Sat Jun 4 22:30:00 2016
  Description  : ["write test with block size of 4M"]
  write: io=1024.0MB, bw=17397KB/s, iops=4, runt= 60275msec
    slat (usec): min=129, max=54100, avg=1489.10, stdev=4907.83
    clat (msec): min=969, max=15690, avg=7399.86, stdev=1328.55
     lat (msec): min=969, max=15696, avg=7401.35, stdev=1328.67
    clat percentiles (msec):
     |  1.00th=[  971],  5.00th=[ 6325], 10.00th=[ 6325], 20.00th=[ 6521],
     | 30.00th=[ 6718], 40.00th=[ 7439], 50.00th=[ 7439], 60.00th=[ 7635],
     | 70.00th=[ 7832], 80.00th=[ 8291], 90.00th=[ 8356], 95.00th=[ 8356],
     | 99.00th=[14615], 99.50th=[15664], 99.90th=[15664], 99.95th=[15664],
     | 99.99th=[15664]
    bw (KB  /s): min=245760, max=262669, per=100.00%, avg=259334.50, stdev=6250.72
    lat (msec) : 1000=1.17%, >=2000=98.83%
  cpu          : usr=0.24%, sys=0.03%, ctx=50, majf=0, minf=8
  IO depths    : 1=2.3%, 2=5.5%, 4=12.5%, 8=25.0%, 16=50.4%, 32=4.3%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=97.0%, 8=0.0%, 16=0.0%, 32=3.0%, 64=0.0%, >=64=0.0%
     issued    : total=r=0/w=256/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=32

Run status group 0 (all jobs):
  WRITE: io=1024.0MB, aggrb=17396KB/s, minb=17396KB/s, maxb=17396KB/s, mint=60275msec, maxt=60275msec

Disk stats (read/write):
  sda: ios=0/162, merge=0/123, ticks=0/19472, in_queue=19472, util=6.18%
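fio's bw value here is the total io divided by runt, with KB meaning 1024 bytes: 1024 MB = 1048576 KB written in 60.275 s gives about 17397 KB/s. The job stopped after about 60 s rather than the configured runtime=120, presumably because it had written the whole image (io=1024.0MB equals the 1024 MB size of bd2). A sketch of the arithmetic; `fio_kbs` is a hypothetical helper name:

```bash
#!/usr/bin/env bash
# fio_kbs IO_MB RUNTIME_MS: average bandwidth in KB/s (KB = 1024 bytes),
# i.e. total io divided by runtime. Hypothetical helper.
fio_kbs() {
  awk -v mb="$1" -v ms="$2" 'BEGIN { printf "%.0f\n", mb * 1024 / (ms / 1000) }'
}

# fio_kbs 1024 60275   # the iodepth=32 run above
```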
With iodepth=1, the result is:
root@client:/home/s1# fio/fio write.fio.dep1
write-4M: (g=0): rw=write, bs=4M-4M/4M-4M/4M-4M, ioengine=rbd, iodepth=1
fio-2.11-12-g82e6
Starting 1 process
rbd engine: RBD version: 0.1.8
Jobs: 1 (f=1): [W(1)] [100.0% done] [0KB/8192KB/0KB /s] [0/2/0 iops] [eta 00m:00s]
write-4M: (groupid=0, jobs=1): err= 0: pid=19250: Sat Jun 4 22:33:11 2016
  Description  : ["write test with block size of 4M"]
  write: io=1024.0MB, bw=20640KB/s, iops=5, runt= 50802msec
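Little's law (average I/Os in flight = IOPS × average latency) helps read these two runs. With iodepth=32, fio completed 256 writes in 60.275 s (about 4.25 IOPS) with an average latency of 7401.35 ms, so on average about 31 writes were outstanding: the queue was essentially full, and most of each write's 7.4 s was presumably spent queued behind other writes. The iodepth=1 run achieved slightly higher bandwidth (20640 KB/s versus 17397 KB/s), so in this setup the deeper queue added latency without adding throughput. A sketch of the arithmetic; `inflight` is a hypothetical helper name:

```bash
#!/usr/bin/env bash
# inflight IOS RUNTIME_MS AVG_LAT_MS: IOPS and the average number of I/Os in
# flight implied by Little's law (L = lambda * W). Hypothetical helper.
inflight() {
  awk -v n="$1" -v runt="$2" -v lat="$3" 'BEGIN {
    iops = n / (runt / 1000)              # completed I/Os per second
    printf "iops=%.2f inflight=%.1f\n", iops, iops * lat / 1000
  }'
}

# iodepth=32 run: 256 writes (issued w=256), runt=60275 ms, lat avg=7401.35 ms
# inflight 256 60275 7401.35
```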
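I do not yet understand why the fio results differ from the rbd bench-write results above by roughly a factor of 10 (about 17 to 20 MB/s versus about 2 MB/s). Note that fio wrote 4 MB blocks, whereas rbd bench-write used its default io_size of 4096 bytes. The exact cause remains to be investigated.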
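fio can also test an RBD device through libaio, the Linux native asynchronous I/O interface.
The test mode is selected with fio's rw parameter: read (sequential read), write (sequential write), randread (random read) or randwrite (random write).
The parameters in the command below mean:
-filename: the file or block device to test
-direct=1: use O_DIRECT, bypassing the page cache
-iodepth: number of I/Os kept in flight
-thread: run jobs as threads instead of forked processes
-rw: access pattern (see the test modes above)
-ioengine=libaio: use Linux native asynchronous I/O
-bs: block size of each I/O
-size: total amount of data each job transfers
-numjobs: number of parallel jobs
-runtime: maximum run time in seconds
-group_reporting: report the results of all jobs as a group
-name: job name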
root@client:/home/s1# fio/fio -filename=/mnt/ceph-rbd2 -direct=1 -iodepth 1 -thread -rw=randwrite -ioengine=libaio -bs=4M -size=1G -numjobs=1 -runtime=120 -group_reporting -name=read-libaio
read-libaio: (g=0): rw=randwrite, bs=4M-4M/4M-4M/4M-4M, ioengine=libaio, iodepth=1
fio-2.11-12-g82e6
Starting 1 thread
Jobs: 1 (f=1): [w(1)] [100.0% done] [0KB/94302KB/0KB /s] [0/23/0 iops] [eta 00m:00s]
read-libaio: (groupid=0, jobs=1): err= 0: pid=20256: Sun Jun 5 10:00:55 2016
  write: io=1024.0MB, bw=102510KB/s, iops=25, runt= 10229msec
    slat (usec): min=342, max=5202, avg=1768.90, stdev=1176.00
    clat (usec): min=332, max=165391, avg=38165.11, stdev=27987.64
     lat (msec): min=3, max=167, avg=39.94, stdev=28.00
    clat percentiles (msec):
     |  1.00th=[    8],  5.00th=[   18], 10.00th=[   19], 20.00th=[   20],
     | 30.00th=[   22], 40.00th=[   25], 50.00th=[   29], 60.00th=[   31],
     | 70.00th=[   36], 80.00th=[   47], 90.00th=[   83], 95.00th=[  105],
     | 99.00th=[  123], 99.50th=[  131], 99.90th=[  165], 99.95th=[  165],
     | 99.99th=[  165]
    bw (KB  /s): min=32702, max=172032, per=97.55%, avg=99999.10, stdev=36075.23
    lat (usec) : 500=0.39%
    lat (msec) : 4=0.39%, 10=0.39%, 20=21.48%, 50=57.81%, 100=14.45%
    lat (msec) : 250=5.08%
  cpu          : usr=0.62%, sys=3.65%, ctx=316, majf=0, minf=9
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued    : total=r=0/w=256/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  WRITE: io=1024.0MB, aggrb=102510KB/s, minb=102510KB/s, maxb=102510KB/s, mint=10229msec, maxt=10229msec

Disk stats (read/write):
  sda: ios=0/1927, merge=0/1, ticks=0/30276, in_queue=30420, util=98.71%
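Two details of this run suggest checking the target before trusting the numbers: -filename points to /mnt/ceph-rbd2, whereas the image was mounted on /mnt/ceph-bd2, and the Disk stats section reports sda (at 98.71% utilisation) rather than an rbd device. The run may therefore have written a 1 GB file on the local disk instead of to RBD. A hedged pre-flight check, assuming `df -P` output; `backed_by_rbd` and the test file name in the usage comment are hypothetical:

```bash
#!/usr/bin/env bash
# backed_by_rbd: read `df -P PATH` output on stdin and succeed only if the
# filesystem holding PATH sits on a kernel-mapped RBD device (/dev/rbdN).
# Hypothetical helper.
backed_by_rbd() {
  awk 'NR == 2 { dev = $1 }
       END { exit !(dev ~ /^\/dev\/rbd[0-9]+$/) }'
}

# Example usage (mount point from the rbd map/mount steps; the file name
# testfile is hypothetical):
# if df -P /mnt/ceph-bd2 | backed_by_rbd; then
#   fio/fio -filename=/mnt/ceph-bd2/testfile -direct=1 -iodepth 1 -thread \
#     -rw=randwrite -ioengine=libaio -bs=4M -size=1G -numjobs=1 -runtime=120 \
#     -group_reporting -name=randwrite-libaio
# else
#   echo "target is not on an RBD device" >&2
# fi
```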
| Tool | Purpose | Syntax | Notes |
|---|---|---|---|
| dd | Disk read/write performance test | dd if=/dev/zero of=/root/testfile bs=1G count=1 oflag=direct/dsync/sync | https://www.thomas-krenn.com/en/wiki/Linux_I/O_Performance_Tests_using_dd |
| iperf | Network bandwidth test | iperf -s -p 6900 (server), iperf -c ceph1 -p 6900 (client) | https://iperf.fr/ |
| rados bench | RADOS performance test | rados bench -p <pool_name> <seconds> <write\|seq\|rand> -b <block size> -t <concurrent operations> --no-cleanup | |
| rados load-gen | RADOS performance test | rados -p rbd load-gen --num-objects 50 --min-object-size 4M --max-object-size 4M --max-ops 16 --min-op-len 4M --max-op-len 4M --percent 5 --target-throughput 2000 --run-length 60 | Generates a mixed read/write load |
| rbd bench-write | Ceph's built-in RBD performance test | rbd bench-write <RBD image name> | |
| fio + rbd ioengine | fio with the rbd IO engine | See fio --help | |
| fio + libaio | fio with Linux AIO against an RBD device | | |
Summary of results (throughput in MB/s; IOPS in parentheses where measured):

| Operation | dd, one OSD | dd, two OSDs | rados bench | rbd bench-write | ceph tell osd.0 bench | fio + rbd | fio + libaio |
|---|---|---|---|---|---|---|---|
| Sequential write | 165 | 18 | 18 | 1.3 | 40 | 21 (iops 5) | 18 (iops 4) |
| Random write | | | | 1.7 | | 19 (iops 4) | 16 (iops 4) |
| Sequential read | 460 | 130 | 109 | N/A | | 111 (iops 27) | 111 (iops 27) |
| Random read | | | 112 | N/A | | 115 (iops 28) | 128 (iops 31) |
Note: because of limited time and a limited test environment, this article is not finished; the author will keep updating it.
References:
Understanding OpenStack + Ceph (8): Basic performance testing methods
Original article: http://www.cnblogs.com/sammyliu/p/5557666.html