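I. HBase Basics
1. Concepts
HBase is a column-oriented distributed database built on top of HDFS, used for real-time random access to very large datasets. More precisely, it is a column-family-oriented store. Because tuning and storage are handled at the column-family level, it is best for all members of a column family to share the same access pattern and size characteristics.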
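2. Regions
HBase automatically partitions tables horizontally into regions.
Each region consists of a subset of the table's rows. A region is identified by the table it belongs to, its first row (inclusive), and its last row (exclusive).
Regions are the smallest unit of data distribution across an HBase cluster. In this way, a table too large for a single server is spread over a cluster of servers, with each node managing a subset of the table's regions.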
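The row-range rule above (first row inclusive, last row exclusive) can be sketched in a few lines of Python. This is only a toy model: the start rows and the region-to-server assignment below are made-up values, not anything read from a real cluster.

```python
from bisect import bisect_right

# Hypothetical region boundaries: each region is identified by its start row
# (inclusive) and ends where the next region begins (exclusive).
# An empty start row marks the first region of the table.
REGION_START_ROWS = ["", "g", "n", "t"]
REGION_SERVERS = ["vm7", "vm8", "vm7", "vm8"]  # illustrative assignment only


def find_region(row_key):
    """Return (index, start_row, end_row) of the region holding row_key."""
    idx = bisect_right(REGION_START_ROWS, row_key) - 1
    start = REGION_START_ROWS[idx]
    # The last region has no upper bound, represented here as None.
    end = REGION_START_ROWS[idx + 1] if idx + 1 < len(REGION_START_ROWS) else None
    return idx, start, end


def server_for(row_key):
    """Return the server assumed to host the region for row_key."""
    return REGION_SERVERS[find_region(row_key)[0]]
```

Because the end row is exclusive, a row key equal to a boundary such as "g" belongs to the region that starts at "g", not to the one that ends there.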
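3. Implementation
HBase master: bootstraps a fresh install, assigns regions to registered regionservers, and recovers from regionserver failures. The master is lightly loaded.
regionserver: hosts zero or more regions and serves client read and write requests. It also manages region splits and informs the HBase master about the new daughter regions, so the master can take the parent region offline and replace it with the daughters.
HBase depends on ZooKeeper. By default it manages a ZooKeeper instance that acts as the authority on cluster state. ZooKeeper holds vital information such as the location of the root catalog table and the address of the current cluster master.
Keeping the assignment transaction state in ZooKeeper helps recovery: if a server crashes in the middle of an assignment, the assignment can resume from the state the crashed server left behind.
When a client opens a connection to an HBase cluster, it must at least be given the location of the ZooKeeper ensemble. The client can then navigate the ZooKeeper hierarchy to learn cluster attributes such as server locations.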
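4. Multiple filesystem interface implementations
HBase persists data through the Hadoop filesystem API.
There are several implementations of the filesystem interface: one for the local filesystem, and others for the KFS filesystem, Amazon S3, and HDFS. Most people run HBase on HDFS, as my company currently does.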
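5. HBase in operation
HBase internally keeps special catalog tables named -ROOT- and .META.. They maintain the list, state, and location of all regions currently on the cluster.
-ROOT- table: holds the list of .META. table regions.
.META. table: holds the list of all user-space regions.
Entries in these tables are keyed by region name.
A region name is made up of the table name, the start row, the creation timestamp, and a hash computed over those three parts.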
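As a rough illustration of how such a key could be put together, the sketch below joins the three parts and appends a digest of them. The function name, the separators, and the choice of MD5 are assumptions made for this example; it does not claim to reproduce HBase's exact byte layout.

```python
import hashlib


def toy_region_name(table, start_row, created_ms):
    """Build an illustrative region name from its parts (not HBase's real encoder)."""
    prefix = f"{table},{start_row},{created_ms}"
    # Hash of the table name, start row, and creation timestamp (MD5 is assumed here).
    digest = hashlib.md5(prefix.encode("utf-8")).hexdigest()
    return f"{prefix}.{digest}."
```

Since the start row is part of the name, names sort by table and then by start row, which is what lets a catalog table be searched by row range.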
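6. How clients interact with regionservers
A client connected to the ZooKeeper cluster first looks up the location of -ROOT-. Through -ROOT- it finds the location of the .META. region whose range covers the requested row. It then queries that .META. region to find the user-space region and the node hosting it. From then on, the client talks directly to the regionserver that manages the region.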
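7. Each row operation may need three round trips to remote nodes; caching reduces this cost
Clients cache what they learn while traversing -ROOT- and .META.: the locations as well as the start and end rows of user-space regions. Later requests can then work out where a region is hosted without consulting the .META. table. When an error occurs, that is, when a region has been moved, the client consults .META. again to get the region's new location. If the .META. region has also moved, the client goes back to -ROOT-.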
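The caching behaviour can be sketched with a toy client. The class below is hypothetical: a plain dictionary stands in for what -ROOT- and .META. would return, and a counter stands in for the three lookups (ZooKeeper, -ROOT-, .META.) that an uncached request needs.

```python
class LookupClient:
    """Toy model of region lookup with a client-side location cache."""

    def __init__(self, region_locations):
        # region_locations: dict mapping (start_row, end_row) -> server name.
        # It stands in for the catalog tables; end_row None means "no upper bound".
        self.region_locations = region_locations
        self.cache = {}          # (start_row, end_row) -> server name
        self.remote_calls = 0    # counts simulated remote lookups

    @staticmethod
    def _in_range(row, start, end):
        return start <= row and (end is None or row < end)

    def _lookup_via_catalog(self, row):
        # Three lookups: ZooKeeper for -ROOT-, -ROOT- for .META., .META. for the region.
        self.remote_calls += 3
        for (start, end), server in self.region_locations.items():
            if self._in_range(row, start, end):
                return (start, end), server
        raise KeyError(row)

    def locate(self, row):
        """Return the server for row, using the cache when possible."""
        for (start, end), server in self.cache.items():
            if self._in_range(row, start, end):
                return server
        bounds, server = self._lookup_via_catalog(row)
        self.cache[bounds] = server
        return server

    def invalidate(self, row):
        """Drop the cached entry covering row, for example after the region moved."""
        for bounds in list(self.cache):
            if self._in_range(row, *bounds):
                del self.cache[bounds]
```

Caching the start and end rows, not just the single row that was requested, is what lets one lookup serve every later request that falls in the same region.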
8. regionserver writes
A write arriving at a regionserver is first appended to the commit log and then added to the in-memory memstore. When the memstore fills up, its contents are flushed to the filesystem.
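The order of operations on the write path can be sketched as follows. This is a simplified model under stated assumptions: the class name is invented, Python lists and dicts stand in for the commit log and flush files, and the flush limit is counted in entries rather than bytes.

```python
class ToyRegion:
    """Toy write path: commit log first, then memstore, flush when full."""

    def __init__(self, flush_size=3):
        self.commit_log = []          # stands in for the commit log on HDFS
        self.memstore = {}            # in-memory edits: row -> value
        self.flush_files = []         # flushed snapshots, oldest first
        self.flush_size = flush_size  # hypothetical limit, counted in entries

    def put(self, row, value):
        self.commit_log.append((row, value))  # 1. append to the commit log
        self.memstore[row] = value            # 2. add to the memstore
        if len(self.memstore) >= self.flush_size:
            self.flush()                      # 3. flush once the memstore is full

    def flush(self):
        """Write the memstore out as a new sorted flush file and clear it."""
        if self.memstore:
            self.flush_files.append(dict(sorted(self.memstore.items())))
            self.memstore = {}
```

Appending to the log before touching the memstore is the point of the design: an edit that exists only in memory can still be recovered from the log.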
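9. regionserver failure recovery
The commit log is stored on HDFS, so it remains available even if a regionserver crashes. The master splits the dead regionserver's commit log by region. After reassignment, and before the regions that lived on the dead regionserver are opened and used, each region picks up its own file from the split commit log, which contains the edits that had not yet been persisted. These edits are replayed to bring the region back to the state it was in just before the server failed.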
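A minimal sketch of the split-and-replay idea, assuming each log entry is a (region, row, value) tuple. The entry layout and function names are invented for illustration and do not mirror HBase's log format.

```python
from collections import defaultdict


def split_commit_log(commit_log):
    """Group a dead server's log entries by region, preserving their order."""
    per_region = defaultdict(list)
    for region, row, value in commit_log:
        per_region[region].append((row, value))
    return dict(per_region)


def replay(edits, memstore=None):
    """Re-apply edits in log order so later writes overwrite earlier ones."""
    memstore = {} if memstore is None else memstore
    for row, value in edits:
        memstore[row] = value
    return memstore
```

A region reopened on another server would call `replay` on its own slice of the split log before serving requests.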
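10. regionserver reads
On a read, the region's memstore is consulted first. If the required version is found there, the query is complete. Otherwise the flush files are checked in order from newest to oldest until a version that satisfies the query is found or all flush files have been processed.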
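The lookup order on the read path can be sketched like this. It is a simplified model: dictionaries stand in for the memstore and flush files, and only a single value per row is kept instead of multiple versions.

```python
def read(row, memstore, flush_files):
    """Look up row in the memstore first, then in flush files from newest to oldest.

    flush_files is assumed to be ordered oldest first, so it is scanned in reverse.
    Returns None when no store contains the row.
    """
    if row in memstore:
        return memstore[row]
    for flush_file in reversed(flush_files):
        if row in flush_file:
            return flush_file[row]
    return None
```

Scanning newest first means the search can stop at the first hit, since anything older is superseded.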
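11. regionserver background processes
A background process compacts the flush files once their number reaches a threshold.
A separate process monitors flush file sizes and splits the region as soon as a file grows beyond the configured maximum.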
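Both checks can be sketched as small functions. The names and default threshold are hypothetical, and the compaction here simply merges dictionaries, which is far simpler than what a real store does.

```python
def compact(flush_files):
    """Merge flush files (ordered oldest first) into one; newer values win."""
    merged = {}
    for flush_file in flush_files:
        merged.update(flush_file)
    return [dict(sorted(merged.items()))]


def maybe_compact(flush_files, threshold=3):
    """Compact only once the number of flush files reaches the threshold."""
    if len(flush_files) >= threshold:
        return compact(flush_files)
    return flush_files


def needs_split(file_sizes, max_filesize):
    """A region is split once any flush file grows beyond the configured maximum."""
    return any(size > max_filesize for size in file_sizes)
```

Fewer files after compaction means fewer places to look on a read, which is why the file count rather than the total size triggers it.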
II. Basic environment preparation (see the previous chapter)
1. Machines
IP address    Hostname    Role
10.1.2.208    vm13        master
10.1.2.215    vm7         slave
10.1.2.216    vm8         slave
2. OS version
CentOS release 6.5
3. Synchronize time
4. Disable the firewall
5. Create the hadoop user and the hadoop group
6. Edit hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
10.1.2.214 master
10.1.2.215 slave-one
10.1.2.216 slave-two
10.1.2.208 vm13
10.1.2.197 vm7
10.1.2.198 vm8
7. Raise the file handle limit
8. Prepare the JVM environment
III. HBase installation and configuration
1. Software
hbase-0.94.16.tar.gz
jdk1.7.0_25.tar.gz
zookeeper-3.4.5.tar.gz
2. Unpack the archive
[hadoop@vm13 local]$ ls -ld /usr/local/hbase-0.94.16/
drwxr-xr-x. 11 hadoop hadoop 4096 Jan 13 13:11 /usr/local/hbase-0.94.16/
3. Edit the configuration files
3.1 hbase-env.sh
# JDK installation directory
export JAVA_HOME=/usr/local/jdk1.7.0_25
# Hadoop configuration directory; set only if needed
export HBASE_CLASSPATH=/usr/local/hadoop-1.0.4/conf
# Heap size to use
export HBASE_HEAPSIZE=2000
export HBASE_OPTS="-XX:ThreadStackSize=2048 -XX:+UseConcMarkSweepGC"
# Extra ssh options; the default port is 22
export HBASE_SSH_OPTS="-o ConnectTimeout=1 -o SendEnv=HBASE_CONF_DIR -p 22"
# HBase manages its bundled ZooKeeper by default; turn that off here
export HBASE_MANAGES_ZK=false
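3.2 hbase-site.xml
hbase.rootdir: directory where HBase stores its data
hbase.cluster.distributed: enables HBase distributed mode
hbase.master: the HBase master node
hbase.zookeeper.quorum: the ZooKeeper ensemble nodes; an odd number of nodes is the usual recommendation
hbase.zookeeper.property.dataDir: the ZooKeeper data directory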
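The preference for an odd number of ZooKeeper nodes follows from majority arithmetic: the ensemble stays available only while more than half of its members are up. The helper names below are made up for this short worked example.

```python
def majority(ensemble_size):
    """Smallest number of servers that forms a majority of the ensemble."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size):
    """How many servers can fail while a majority is still available."""
    return ensemble_size - majority(ensemble_size)
```

With three nodes, as configured here (vm13, vm7, vm8), one failure is tolerated. A fourth node would still tolerate only one failure, so an even size adds a machine without adding fault tolerance.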
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
/**
 * Copyright 2010 The Apache Software Foundation
 *
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements. See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership. The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License. You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
-->
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://master:9000/hbase</value>
    <description>The directory shared by RegionServers.</description>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
    <description>The mode the cluster will be in. Possible values are
      false: standalone and pseudo-distributed setups with managed Zookeeper
      true: fully-distributed with unmanaged Zookeeper Quorum (see hbase-env.sh)
    </description>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <!--<value>dcnamenode1,dchadoop1,dchadoop3,dchbase1,dchbase2</value>-->
    <value>vm13,vm7,vm8</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/data0/zookeeper</value>
  </property>
  <property>
    <name>hbase.regionserver.handler.count</name>
    <value>32</value>
    <description>Default : 10. Count of RPC Listener instances spun up on RegionServers.
      Same property is used by the Master for count of master handlers.
    </description>
  </property>
  <!--memStore flush policy : no blocking writes-->
  <property>
    <name>hbase.hregion.memstore.flush.size</name>
    <value>134217728</value>
    <description>Default : 134217728(128MB). Memstore will be flushed to disk if size of the
      memstore exceeds this number of bytes. Value is checked by a thread that runs
      every hbase.server.thread.wakefrequency.
    </description>
  </property>
  <property>
    <name>hbase.regionserver.maxlogs</name>
    <value>64</value>
    <description>Default : 32.</description>
  </property>
  <!--
  <property>
    <name>hbase.regionserver.hlog.blocksize</name>
    <value>67108864</value>
    <description>Default is hdfs block size, e.g. 64MB/128MB</description>
  </property>
  -->
  <property>
    <name>hbase.regionserver.optionalcacheflushinterval</name>
    <value>7200000</value>
    <description>Default : 3600000(1 hour). Maximum amount of time an edit lives in memory
      before being automatically flushed. Default 1 hour. Set it to 0 to disable
      automatic flushing.
    </description>
  </property>
  <!--memStore flush policy : blocking writes-->
  <property>
    <name>hbase.hregion.memstore.block.multiplier</name>
    <value>3</value>
    <description>Default : 2. Block updates if memstore has hbase.hregion.block.memstore
      time hbase.hregion.flush.size bytes. Useful preventing runaway memstore during
      spikes in update traffic. Without an upper-bound, memstore fills such that when
      it flushes the resultant flush files take a long time to compact or split, or
      worse, we OOME.
    </description>
  </property>
  <property>
    <name>hbase.regionserver.global.memstore.lowerLimit</name>
    <value>0.35</value>
    <description>Default : 0.35. When memstores are being forced to flush to make room in
      memory, keep flushing until we hit this mark. Defaults to 35% of heap. This value
      equal to hbase.regionserver.global.memstore.upperLimit causes the minimum possible
      flushing to occur when updates are blocked due to memstore limiting.
    </description>
  </property>
  <property>
    <name>hbase.regionserver.global.memstore.upperLimit</name>
    <value>0.4</value>
    <description>Default : 0.4. Maximum size of all memstores in a region server before new
      updates are blocked and flushes are forced. Defaults to 40% of heap.
    </description>
  </property>
  <property>
    <name>hbase.hstore.blockingStoreFiles</name>
    <value>256</value>
    <description>Default : 7. If more than this number of StoreFiles in any one Store (one
      StoreFile is written per flush of MemStore) then updates are blocked for this
      HRegion until a compaction is completed, or until hbase.hstore.blockingWaitTime
      has been exceeded.
    </description>
  </property>
  <!--split policy-->
  <property>
    <name>hbase.hregion.max.filesize</name>
    <value>4294967296</value>
    <description>Default : 10737418240(10G). Maximum HStoreFile size. If any one of a column
      families' HStoreFiles has grown to exceed this value, the hosting HRegion is
      split in two.
    </description>
  </property>
  <!--compact policy-->
  <property>
    <name>hbase.hregion.majorcompaction</name>
    <value>0</value>
    <description>Default : 1 day. The time (in milliseconds) between 'major' compactions of
      all HStoreFiles in a region. Default: 1 day. Set to 0 to disable automated major
      compactions.
    </description>
  </property>
  <property>
    <name>hbase.hstore.compactionThreshold</name>
    <value>5</value>
    <description>Default : 3. If more than this number of HStoreFiles in any one HStore (one
      HStoreFile is written per flush of memstore) then a compaction is run to rewrite
      all HStoreFiles files as one. Larger numbers put off compaction but when it runs,
      it takes longer to complete.
    </description>
  </property>
</configuration>
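To make the memstore and split settings more concrete, the arithmetic below turns the configured values into sizes. It is a rough sketch that assumes HBASE_HEAPSIZE=2000 from hbase-env.sh is a heap size in MB and that the limits combine exactly as the property descriptions state; the function names are invented for the example.

```python
MIB = 1024 * 1024
GIB = 1024 * MIB

# Values copied from hbase-env.sh and hbase-site.xml in this section.
HEAP_MB = 2000                      # HBASE_HEAPSIZE (assumed to be in MB)
MEMSTORE_FLUSH_SIZE = 134217728     # hbase.hregion.memstore.flush.size
MEMSTORE_BLOCK_MULTIPLIER = 3       # hbase.hregion.memstore.block.multiplier
GLOBAL_MEMSTORE_UPPER = 0.4         # hbase.regionserver.global.memstore.upperLimit
GLOBAL_MEMSTORE_LOWER = 0.35        # hbase.regionserver.global.memstore.lowerLimit
MAX_FILESIZE = 4294967296           # hbase.hregion.max.filesize


def flush_size_mib(flush_size=MEMSTORE_FLUSH_SIZE):
    """Memstore size at which a flush is triggered, in MiB."""
    return flush_size // MIB


def blocking_size_mib(flush_size=MEMSTORE_FLUSH_SIZE, multiplier=MEMSTORE_BLOCK_MULTIPLIER):
    """Memstore size at which updates to a region are blocked, in MiB."""
    return flush_size * multiplier // MIB


def global_memstore_mb(heap_mb, fraction):
    """Share of the heap that all memstores together may use, in MB."""
    return heap_mb * fraction


def max_filesize_gib(max_filesize=MAX_FILESIZE):
    """Store file size beyond which a region is split, in GiB."""
    return max_filesize // GIB
```

Under these assumptions a memstore flushes at 128 MiB and blocks writes at 384 MiB, all memstores together are capped at about 800 MB of the 2000 MB heap with forced flushing continuing down to about 700 MB, and a region splits once a store file passes 4 GiB.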
3.3 regionservers
vm7
vm8
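3.4 Copy the whole modified /usr/local/hbase-0.94.16 directory to vm7 and vm8, and make sure its owner and group are set to hadoop
3.5 Create /data0 beforehand, with owner and group hadoop
3.6 Start HBase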
[hadoop@vm13 conf]$ start-hbase.sh
3.7 Verify that the processes started
[hadoop@vm13 conf]$ jps
5591 HMaster
6564 Jps
[root@vm7 data0]# jps
3391 Jps
2766 HRegionServer
[hadoop@vm8 zookeeper]$ jps
2661 HRegionServer
3258 Jps
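The same check can be scripted. The helper below is a hypothetical sketch that parses jps output already captured as text, one "pid name" pair per line, and reports which expected process names are missing; it does not run jps itself.

```python
def running_java_processes(jps_output):
    """Return the set of process names found in captured jps output."""
    names = set()
    for line in jps_output.splitlines():
        parts = line.split(maxsplit=1)
        # Keep only lines of the form "<pid> <name>".
        if len(parts) == 2 and parts[0].isdigit():
            names.add(parts[1].strip())
    return names


def missing_processes(jps_output, expected):
    """Return the expected process names that do not appear in the output, sorted."""
    return sorted(set(expected) - running_java_processes(jps_output))
```

On the master one would expect HMaster and on each slave HRegionServer, so an empty result from `missing_processes` corresponds to the healthy output shown above.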
IV. ZooKeeper
Original article: http://zouqingyun.blog.51cto.com/782246/1734691