How HBase Chooses a Split Point

Date: 2015-06-07 20:09:57


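Some details of the HBase region split operation. The split steps themselves are covered in plenty of documents, so this post focuses on how the regionserver chooses the split point.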

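First, a recommendation: Hannibal is an open-source tool with a web UI for viewing how HBase regions are distributed. I suggest running Hannibal under daemontools so that it is restarted automatically if it exits unexpectedly; an earlier post of mine describes how to manage a process with daemontools.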

Suppose there is an HBase table like the one shown below, in which one region is considerably larger than the others. That region can be split manually.

[Image: region sizes of the table, with one region much larger than the rest]

HBase's physical storage hierarchy is as follows:

Table       (HBase table)
    Region       (Regions for the table)
         Store          (Store per ColumnFamily for each Region for the table)
              MemStore        (MemStore for each Store for each Region for the table)
              StoreFile       (StoreFiles for each Store for each Region for the table)
                    Block     (Blocks within a StoreFile within a Store for each Region for the table)

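A common split policy is ConstantSizeRegionSplitPolicy. Its threshold, hbase.hregion.max.filesize, refers to the size of a single store (one column family) rather than of the whole region, i.e. the data under this directory:

/<hdfs-dir>/<hbasetable>/<xxx(part of region-id)>/<column-family>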
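
The per-store nature of this threshold can be illustrated with a minimal sketch. It assumes a simplified model in which a region is just a map from column family name to store size in bytes; the class and method names are hypothetical and are not HBase's API, and the 10 GB threshold is only an example value.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Simplified model of the per-store size check; these are not HBase's classes.
final class StoreSizeSplitCheck {

    // Returns true if any single store (column family) is larger than maxFileSize.
    static boolean shouldSplit(Map<String, Long> storeSizes, long maxFileSize) {
        for (long size : storeSizes.values()) {
            if (size > maxFileSize) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        long gb = 1024L * 1024 * 1024;
        long maxFileSize = 10 * gb; // assumed example threshold

        // Hypothetical region: 16 GB in total, but no single store above 10 GB.
        Map<String, Long> region = new LinkedHashMap<>();
        region.put("cf1", 8 * gb);
        region.put("cf2", 8 * gb);
        System.out.println(shouldSplit(region, maxFileSize)); // false

        // Once one store alone grows past the threshold, a split is requested.
        region.put("cf1", 11 * gb);
        System.out.println(shouldSplit(region, maxFileSize)); // true
    }
}
```

In this model a region with two 8 GB column families is left alone, even though the region as a whole is well above the threshold.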

When a memstore is flushed to store files, or when several store files are compacted, the regionserver checks whether a split is needed.

It finds the largest store that contains no reference files, then the largest store file within that store, and uses the middle row key of that store file as the split point.

RegionSplitPolicy.java
// Excerpt: pick the split point proposed by the largest store.
byte[] splitPointFromLargestStore = null;
long largestStoreSize = 0;

for (Store s : stores.values()) {
  byte[] splitPoint = s.getSplitPoint();
  long storeSize = s.getSize();
  // Stores that cannot offer a split point (null) are skipped.
  if (splitPoint != null && largestStoreSize < storeSize) {
    splitPointFromLargestStore = splitPoint;
    largestStoreSize = storeSize;
  }
}

Store.java

public byte[] getSplitPoint() {
    long maxSize = 0L;
    StoreFile largestSf = null;

    for (StoreFile sf : this.storefiles) {
      // A store that holds a reference file cannot be split.
      if (sf.isReference()) {
        assert false : "getSplitPoint() called on a region that can't split!";
        return null;
      }

      StoreFile.Reader reader = sf.getReader();
      if (reader == null) {
        LOG.warn("Storefile " + sf + " Reader is null");
        continue;
      }

      // Remember the largest store file seen so far.
      long size = reader.length();
      if (size > maxSize) {
        maxSize = size;
        largestSf = sf;
      }
    }

    StoreFile.Reader reader = largestSf.getReader();
    if (reader == null) {
      LOG.warn("Storefile " + largestSf + " Reader is null");
      return null;
    }

    // The mid key of the largest store file is the basis for the split point.
    byte[] midkey = reader.midkey();
    // ... (omitted)
}

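So a split does not actually divide the region into two equal halves: the split point is the middle key of only the largest store file in the largest store, which is not necessarily the median of the region's overall data distribution.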
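
A small worked example makes the imbalance concrete. This is only a toy model under stated assumptions: it looks at a single store, treats each store file as a sorted list of row keys, counts every row as one unit of size, and takes the mid key to be the middle row of the file. The class, the method names, and the sample row keys are all hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Toy model of one store; a store file is a sorted list of row keys.
final class SplitPointDemo {

    // Hypothetical store: one 6-row file with high keys, two 5-row files with low keys.
    static List<List<String>> sampleStore() {
        return Arrays.asList(
            Arrays.asList("k20", "k21", "k22", "k23", "k24", "k25"),
            Arrays.asList("k01", "k02", "k03", "k04", "k05"),
            Arrays.asList("k06", "k07", "k08", "k09", "k10"));
    }

    // Picks the largest file in the store and returns its middle row key.
    static String splitPoint(List<List<String>> storeFiles) {
        List<String> largest = storeFiles.get(0);
        for (List<String> file : storeFiles) {
            if (file.size() > largest.size()) {
                largest = file;
            }
        }
        return largest.get(largest.size() / 2);
    }

    // Counts the rows, across all files, that sort strictly before the split point.
    static long rowsBefore(List<List<String>> storeFiles, String split) {
        long count = 0;
        for (List<String> file : storeFiles) {
            for (String row : file) {
                if (row.compareTo(split) < 0) {
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        List<List<String>> store = sampleStore();
        String split = splitPoint(store);
        long before = rowsBefore(store, split);
        long total = store.stream().mapToLong(List::size).sum();
        // prints: split=k23 before=13 atOrAfter=3
        System.out.println("split=" + split + " before=" + before
            + " atOrAfter=" + (total - before));
    }
}
```

Here the largest file covers only the top of the key range, so its middle key k23 leaves 13 of the 16 rows in one daughter region and 3 in the other.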

References:

http://blog.javachen.com/2014/01/16/hbase-region-split-policy.html

http://www.cnblogs.com/niurougan/articles/3975463.html

http://hbase.group.iteye.com/group/topic/40359


Original article: http://www.cnblogs.com/yanghuahui/p/4558910.html
