HBase relies on Hadoop's HDFS as its underlying storage, so its architecture closely mirrors Hadoop's Master-Slave model: the HBase Master Server manages all HRegion Servers, but the Master Server itself stores none of HBase's data. A logical HBase Table is divided into Regions, each stored on a particular HRegion Server; one HRegion Server serves many Regions. Physically, each HRegion consists of three parts: the HMemcache, the HLog, and the HStore, which act as the cache, the log, and the persistent store respectively. Walking through an update shows what each part does:
As the flow shows, a committed update is written to two places: the HMemcache and the HLog. The HMemcache is an in-memory cache built for efficiency, ensuring that recently touched data can be read and modified quickly. The HLog is the transaction log that keeps the HMemcache and the HStore in sync: when the HRegion Server periodically issues a Flush Cache command, the data in the HMemcache is persisted to the HStore and the HMemcache is cleared. The caching and synchronization strategy used here is fairly simple; a more sophisticated scheme could borrow ideas from Java's garbage collection mechanism.
When reading Region data, the HMemcache is consulted first; only if the data is not found there is the HStore read.
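To make this flow concrete, here is a minimal, self-contained Java sketch of the write and read path within a single region as described above. SimpleRegion and all of its methods are invented for illustration and are not actual HBase classes:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical, simplified model of one HRegion's write/read path:
// an append-only log (HLog), an in-memory cache (HMemcache), and
// persisted sorted files (HStore).
public class SimpleRegion {
    private final List<String> hlog = new ArrayList<>();              // write-ahead log entries
    private final TreeMap<String, String> memcache = new TreeMap<>(); // recent writes, sorted by row key
    private final List<TreeMap<String, String>> storeFiles = new ArrayList<>(); // flushed "HStore files"

    // An update is first logged, then applied to the in-memory cache.
    public void put(String rowKey, String value) {
        hlog.add(rowKey + "=" + value);   // 1. append to the HLog for durability
        memcache.put(rowKey, value);      // 2. make it visible via the HMemcache
    }

    // Reads consult the HMemcache first and fall back to the store files.
    public String get(String rowKey) {
        String v = memcache.get(rowKey);
        if (v != null) {
            return v;
        }
        // newest store file first, since it holds the most recently flushed data
        for (int i = storeFiles.size() - 1; i >= 0; i--) {
            v = storeFiles.get(i).get(rowKey);
            if (v != null) {
                return v;
            }
        }
        return null;
    }

    // "Flush Cache": persist the HMemcache as a new store file and clear it.
    public void flushCache() {
        if (memcache.isEmpty()) {
            return;
        }
        storeFiles.add(new TreeMap<>(memcache));
        memcache.clear();
        // in the real system a marker also goes into the HLog so replay can skip old entries
    }

    public static void main(String[] args) {
        SimpleRegion region = new SimpleRegion();
        region.put("row1", "a");
        region.flushCache();
        region.put("row1", "b");                  // newer value stays in the cache
        System.out.println(region.get("row1"));   // prints "b" (served from the HMemcache)
    }
}
```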
A few details:
1. Every Flush Cache produces a new HStore file, so the number of files in the HStore keeps growing and eventually hurts performance; once the file count reaches a configured threshold, the files are merged into a single large file.
2. The cache size and the flush interval have to be tuned with both memory consumption and performance impact in mind.
3. Whenever an HRegion Server restarts, it reloads into the HMemcache the HLog entries that had not yet been flushed to the HStore, so an oversized HMemcache directly slows down startup.
4. Data in an HStore file is organized using a B-tree algorithm, which is what enables the fast location and retrieval of Column and Column Family data mentioned earlier.
5. An HRegion can be merged or split, depending on its size; while either operation is in progress, the HRegion is locked and unavailable.
6. The HBase Master Server obtains information about HRegion Servers and Regions through the Meta-info Table. The topmost Meta region is a virtual Region called the Root Region; through the Root Region, each of the actual Regions below it can be located.
7. The client learns from the HBase Master Server which Region Server holds a Region and then talks to that Region Server directly. Region Servers do not communicate with one another; they interact only with the HBase Master Server, which monitors and manages them (a client-side usage sketch follows this list).
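As a usage illustration of point 7, here is a minimal sketch using the classic HBase Java client API (HTable, Put, and Get from org.apache.hadoop.hbase.client, as found in 0.90-era releases). The table name "mytable" and the column family "cf" are made-up examples; the region lookup described above happens inside the client library:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseClientExample {
    public static void main(String[] args) throws Exception {
        // Reads hbase-site.xml from the classpath to find the cluster.
        Configuration conf = HBaseConfiguration.create();

        // "mytable" and the "cf" column family are example names only.
        HTable table = new HTable(conf, "mytable");
        try {
            // Write: the client locates the region serving "row1" and
            // sends the Put directly to that region server.
            Put put = new Put(Bytes.toBytes("row1"));
            put.add(Bytes.toBytes("cf"), Bytes.toBytes("col"), Bytes.toBytes("value"));
            table.put(put);

            // Read: same lookup path, then a Get against the region server.
            Get get = new Get(Bytes.toBytes("row1"));
            Result result = table.get(get);
            System.out.println(Bytes.toString(result.getValue(Bytes.toBytes("cf"), Bytes.toBytes("col"))));
        } finally {
            table.close();
        }
    }
}
```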
There are three major components of the HBase architecture:
The HBaseMaster (analogous to the Bigtable master server)
The HRegionServer (analogous to the Bigtable tablet server)
The HBase client
Each will be discussed in the following sections.
The HBaseMaster is responsible for assigning regions to HRegionServers. The first region to be assigned is the ROOT region which locates all the META regions to be assigned. Each META region maps a number of user regions which comprise the multiple tables that a particular HBase instance serves. Once all the META regions have been assigned, the master will then assign user regions to the HRegionServers, attempting to balance the number of regions served by each HRegionServer.
It also holds a pointer to the HRegionServer that is hosting the ROOT region.
The HBaseMaster also monitors the health of each HRegionServer, and if it detects a HRegionServer is no longer reachable, it will split the HRegionServer's write-ahead log so that there is now one write-ahead log for each region that the HRegionServer was serving. After it has accomplished this, it will reassign the regions that were being served by the unreachable HRegionServer.
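This log-splitting step can be pictured as regrouping one interleaved write-ahead log into per-region logs. The sketch below assumes each log entry is tagged with the name of the region it belongs to; the class and field names are invented for illustration:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of splitting one region server's write-ahead log
// into one log per region, so each region can be reassigned and replayed independently.
public class LogSplitter {
    // A single log entry: which region it belongs to, plus the edit itself.
    record LogEntry(String regionName, String edit) {}

    public static Map<String, List<LogEntry>> split(List<LogEntry> serverLog) {
        Map<String, List<LogEntry>> perRegionLogs = new HashMap<>();
        for (LogEntry entry : serverLog) {
            perRegionLogs
                .computeIfAbsent(entry.regionName(), name -> new ArrayList<>())
                .add(entry);   // preserve the original order within each region
        }
        return perRegionLogs;
    }

    public static void main(String[] args) {
        List<LogEntry> log = List.of(
            new LogEntry("regionA", "put row1=x"),
            new LogEntry("regionB", "put row9=y"),
            new LogEntry("regionA", "put row2=z"));
        System.out.println(split(log).keySet());  // e.g. [regionA, regionB] (order may vary)
    }
}
```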
In addition, the HBaseMaster is also responsible for handling table administrative functions such as on/off-lining of tables, changes to the table schema (adding and removing column families), etc.
Unlike Bigtable, currently, when the HBaseMaster dies, the cluster will shut down. In Bigtable, a Tabletserver can still serve Tablets after its connection to the Master has died. We tie them together, because we do not currently use an external lock-management system like Bigtable. The Bigtable Master allocates tablets and a lock manager (Chubby) guarantees atomic access by Tabletservers to tablets. HBase uses just a single central point for all HRegionServers to access: the HBaseMaster.
The META table stores information about every user region in HBase, which includes an HRegionInfo object containing information such as the start and end row keys, whether the region is on-line or off-line, etc., and the address of the HRegionServer that is currently serving the region. The META table can grow as the number of user regions grows.
The ROOT table is confined to a single region and maps all the regions in the META table. Like the META table, it contains an HRegionInfo object for each META region and the location of the HRegionServer that is serving that META region.
Each row in the ROOT and META tables is approximately 1KB in size. At the default region size of 256MB, this means that the ROOT region can map 2.6 x 10^5 META regions, which in turn map a total of 6.9 x 10^10 user regions, for approximately 1.8 x 10^19 (roughly 2^64) bytes of user data.
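These figures follow from a straightforward back-of-the-envelope calculation, shown below using the 1KB row and 256MB region sizes quoted above:

```java
public class CapacityEstimate {
    public static void main(String[] args) {
        double rowSize = 1024.0;                    // ~1KB per ROOT/META row
        double regionSize = 256.0 * 1024 * 1024;    // 256MB default region size

        double metaRegions = regionSize / rowSize;       // rows the single ROOT region can hold
        double userRegions = metaRegions * metaRegions;  // each META row maps one user region
        double userBytes = userRegions * regionSize;     // total addressable user data

        System.out.printf("META regions: %.1e%n", metaRegions);  // ~2.6e5
        System.out.printf("user regions: %.1e%n", userRegions);  // ~6.9e10
        System.out.printf("user bytes:   %.1e%n", userBytes);    // ~1.8e19 (~2^64)
    }
}
```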
The HRegionServer is responsible for handling client read and write requests. It communicates with the HBaseMaster to get a list of regions to serve and to tell the master that it is alive. Region assignments and other instructions from the master "piggy back" on the heartbeat messages.
When a write request is received, it is first written to a write-ahead log called the HLog. All write requests for every region the region server is serving are written to the same log. Once the request has been written to the HLog, it is stored in an in-memory cache called the Memcache. There is one Memcache for each HStore.
Reads are handled by first checking the Memcache and, if the requested data is not found there, searching the MapFiles for results.
When the Memcache reaches a configurable size, it is flushed to disk, creating a new MapFile, and a marker is written to the HLog so that when the log is replayed, entries from before the last flush can be skipped. A flush may also be triggered to relieve memory pressure on the region server.
Cache flushes happen concurrently with the region server processing read and write requests. Just before the new MapFile is moved into place, reads and writes are suspended until the MapFile has been added to the list of active MapFiles for the HStore.
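The flush marker mentioned above matters at replay time: edits written before the last completed flush are already in a MapFile and can be skipped. Below is a simplified sketch of that replay rule; the classes and sequence ids are invented for illustration:

```java
import java.util.List;

// Hypothetical illustration of HLog replay: edits written before the last
// flush marker are already persisted in a MapFile and can be skipped.
public class LogReplay {
    record HLogEntry(long sequenceId, boolean isFlushMarker, String edit) {}

    public static void replay(List<HLogEntry> log) {
        // Find the sequence id of the last completed flush, if any.
        long lastFlushSeq = -1;
        for (HLogEntry e : log) {
            if (e.isFlushMarker()) {
                lastFlushSeq = e.sequenceId();
            }
        }
        // Only edits written after that flush need to be re-applied to the Memcache.
        for (HLogEntry e : log) {
            if (!e.isFlushMarker() && e.sequenceId() > lastFlushSeq) {
                System.out.println("re-applying: " + e.edit());
            }
        }
    }

    public static void main(String[] args) {
        replay(List.of(
            new HLogEntry(1, false, "put row1=a"),
            new HLogEntry(2, true, "flush"),
            new HLogEntry(3, false, "put row2=b")));  // only this edit is re-applied
    }
}
```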
When the number of MapFiles exceeds a configurable threshold, a minor compaction is performed, which consolidates the most recently written MapFiles. A major compaction is performed periodically, which consolidates all the MapFiles into a single MapFile. The reason for not always performing a major compaction is that the oldest MapFile can be quite large, and reading and merging it with the latest MapFiles, which are much smaller, can be very time consuming due to the amount of I/O involved in reading, merging, and writing the contents of the largest MapFile.
Compactions happen concurrently with the region server processing read and write requests. Just before the new MapFile is moved into place, reads and writes are suspended until the MapFile has been added to the list of active MapFiles for the HStore and the MapFiles that were merged to create the new MapFile have been removed.
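Conceptually, a compaction is just a merge of several sorted files into one, with newer files winning when the same row key appears more than once. The sketch below uses in-memory TreeMaps as stand-ins for MapFiles; the names are invented for illustration:

```java
import java.util.List;
import java.util.TreeMap;

// Hypothetical illustration of a compaction: merge several sorted "MapFiles"
// into one, letting newer files win when the same row key appears in several.
public class Compaction {
    // The list is assumed to be ordered oldest-to-newest.
    public static TreeMap<String, String> compact(List<TreeMap<String, String>> mapFiles) {
        TreeMap<String, String> merged = new TreeMap<>();
        for (TreeMap<String, String> file : mapFiles) {
            merged.putAll(file);   // later (newer) files overwrite older values
        }
        return merged;
    }

    public static void main(String[] args) {
        TreeMap<String, String> oldFile = new TreeMap<>();
        oldFile.put("row1", "old");
        TreeMap<String, String> newFile = new TreeMap<>();
        newFile.put("row1", "new");
        newFile.put("row2", "x");
        System.out.println(compact(List.of(oldFile, newFile)));  // {row1=new, row2=x}
    }
}
```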
When the aggregate size of the MapFiles for an HStore reaches a configurable size (currently 256MB), a region split is requested. Region splits divide the row range of the parent region in half and happen very quickly because the child regions read from the parent's MapFile.
The parent region is taken off-line, the region server records the new child regions in the META region and the master is informed that a split has taken place so that it can assign the children to region servers. Should the split message be lost, the master will discover the split has occurred since it periodically scans the META regions for unassigned regions.
Once the parent region is closed, read and write requests for the region are suspended. The client has a mechanism for detecting a region split and will wait and retry the request when the new children are on-line.
When a compaction is triggered in a child, the data from the parent is copied to the child. When both children have performed a compaction, the parent region is garbage collected.
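A split itself is cheap because no data is rewritten at split time: the parent's row range is cut at a mid key and each child initially just references the parent's MapFiles. A rough sketch of that bookkeeping, with all types and fields invented for illustration:

```java
import java.util.List;

// Hypothetical illustration of a region split: the parent's row range is cut
// at a midpoint, and both children start out referencing the parent's MapFiles
// rather than copying any data.
public class RegionSplit {
    record RegionInfo(String startKey, String endKey, List<String> mapFiles) {}

    public static RegionInfo[] split(RegionInfo parent, String midKey) {
        RegionInfo lower = new RegionInfo(parent.startKey(), midKey, parent.mapFiles());
        RegionInfo upper = new RegionInfo(midKey, parent.endKey(), parent.mapFiles());
        // Real data is only rewritten later: when each child compacts, it copies
        // its half of the parent's data, after which the parent can be removed.
        return new RegionInfo[] { lower, upper };
    }

    public static void main(String[] args) {
        RegionInfo parent = new RegionInfo("a", "z", List.of("mapfile-1"));
        for (RegionInfo child : split(parent, "m")) {
            System.out.println(child.startKey() + " .. " + child.endKey());
        }
    }
}
```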
The HBase client is responsible for finding HRegionServers that are serving the particular row range of interest. On instantiation, the HBase client communicates with the HBaseMaster to find the location of the ROOT region. This is the only communication between the client and the master.
Once the ROOT region is located, the client contacts that region server and scans the ROOT region to find the META region that will contain the location of the user region that contains the desired row range. It then contacts the region server that is serving that META region and scans that META region to determine the location of the user region.
After locating the user region, the client contacts the region server serving that region and issues the read or write request.
This information is cached in the client so that subsequent requests need not go through this process.
Should a region be reassigned either by the master for load balancing or because a region server has died, the client will rescan the META table to determine the new location of the user region. If the META region has been reassigned, the client will rescan the ROOT region to determine the new location of the META region. If the ROOT region has been reassigned, the client will contact the master to determine the new ROOT region location and will locate the user region by repeating the original process described above.
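Putting the lookup together: the client resolves a row to a region server by walking ROOT, then META, then the user region, caching what it learns and redoing the lookup when a cached location turns out to be stale. The sketch below is a simplified illustration of that resolution order; every type and method in it is invented:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of the client-side lookup order:
// ROOT region -> META region -> user region, with a local cache.
public class RegionLocator {
    private final Map<String, String> cache = new HashMap<>(); // (table, row) -> region server address

    public String locate(String table, String row) {
        String cacheKey = table + ":" + row;
        String cached = cache.get(cacheKey);
        if (cached != null) {
            return cached;       // fast path: location learned earlier
        }
        String rootServer = askMasterForRootLocation();               // only call to the master
        String metaServer = scanRootForMetaRegion(rootServer, table, row);
        String userServer = scanMetaForUserRegion(metaServer, table, row);
        cache.put(cacheKey, userServer);
        return userServer;
    }

    // Called when a request fails because the region moved: drop the stale
    // entry and repeat the lookup.
    public String relocate(String table, String row) {
        cache.remove(table + ":" + row);
        return locate(table, row);
    }

    // The three calls below stand in for RPCs; they are placeholders only.
    private String askMasterForRootLocation()                              { return "rootserver:60020"; }
    private String scanRootForMetaRegion(String srv, String t, String r)   { return "metaserver:60020"; }
    private String scanMetaForUserRegion(String srv, String t, String r)   { return "userserver:60020"; }

    public static void main(String[] args) {
        RegionLocator locator = new RegionLocator();
        System.out.println(locator.locate("mytable", "row1"));  // userserver:60020
    }
}
```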
Original source: http://www.cnblogs.com/lvfeilong/p/5646dgfdgfdg.html