标签:
在0.98版本中,默认的compaction算法换成了ExploringCompactionPolicy,之前是RatioBasedCompactionPolicy
ExploringCompactionPolicy继承RatioBasedCompactionPolicy,重写了applyCompactionPolicy方法,applyCompactionPolicy是对minor compaction的选择文件的策略算法。
applyCompactionPolicy方法内容:
public List<StoreFile> applyCompactionPolicy(final List<StoreFile> candidates,
boolean mightBeStuck, boolean mayUseOffPeak, int minFiles, int maxFiles) {
//此ratio为后面算法使用,可设置非高峰时间段的ratio(默认:5.0)从而合并更多的数据
final double currentRatio = mayUseOffPeak
? comConf.getCompactionRatioOffPeak() : comConf.getCompactionRatio();
// Start off choosing nothing.
List<StoreFile> bestSelection = new ArrayList<StoreFile>(0);
List<StoreFile> smallest = mightBeStuck ? new ArrayList<StoreFile>(0) : null;
long bestSize = 0;
long smallestSize = Long.MAX_VALUE;
int opts = 0, optsInRatio = 0, bestStart = -1; // for debug logging
// Consider every starting place.
for (int start = 0; start < candidates.size(); start++) {
// Consider every different sub list permutation in between start and end with min files.
for (int currentEnd = start + minFiles - 1;
currentEnd < candidates.size(); currentEnd++) {
List<StoreFile> potentialMatchFiles = candidates.subList(start, currentEnd + 1);
// Sanity checks
if (potentialMatchFiles.size() < minFiles) {
continue;
}
if (potentialMatchFiles.size() > maxFiles) {
continue;
}
// Compute the total size of files that will
// have to be read if this set of files is compacted.
long size = getTotalStoreSize(potentialMatchFiles);
// Store the smallest set of files. This stored set of files will be used
// if it looks like the algorithm is stuck.
if (mightBeStuck && size < smallestSize) {
smallest = potentialMatchFiles;
smallestSize = size;
}
if (size > comConf.getMaxCompactSize()) {
continue;
}
++opts;
if (size >= comConf.getMinCompactSize()
&& !filesInRatio(potentialMatchFiles, currentRatio)) {
continue;
}
++optsInRatio;
if (isBetterSelection(bestSelection, bestSize, potentialMatchFiles, size, mightBeStuck)) {
bestSelection = potentialMatchFiles;
bestSize = size;
bestStart = start;
}
}
}
if (bestSelection.size() == 0 && mightBeStuck) {
LOG.debug("Exploring compaction algorithm has selected " + smallest.size()
+ " files of size "+ smallestSize + " because the store might be stuck");
return new ArrayList<StoreFile>(smallest);
}
LOG.debug("Exploring compaction algorithm has selected " + bestSelection.size()
+ " files of size " + bestSize + " starting at candidate #" + bestStart +
" after considering " + opts + " permutations with " + optsInRatio + " in ratio");
return new ArrayList<StoreFile>(bestSelection);
从代码得知,主要算法如下:
private boolean filesInRatio(final List<StoreFile> files, final double currentRatio) {
if (files.size() < 2) {
return true;
}
long totalFileSize = getTotalStoreSize(files);
for (StoreFile file : files) {
long singleFileSize = file.getReader().length();
long sumAllOtherFileSizes = totalFileSize - singleFileSize;
if (singleFileSize > sumAllOtherFileSizes * currentRatio) {
return false;
}
}
return true;
} private boolean isBetterSelection(List<StoreFile> bestSelection,
long bestSize, List<StoreFile> selection, long size, boolean mightBeStuck) {
if (mightBeStuck && bestSize > 0 && size > 0) {
// Keep the selection that removes most files for least size. That penaltizes adding
// large files to compaction, but not small files, so we don‘t become totally inefficient
// (might want to tweak that in future). Also, given the current order of looking at
// permutations, prefer earlier files and smaller selection if the difference is small.
final double REPLACE_IF_BETTER_BY = 1.05;
double thresholdQuality = ((double)bestSelection.size() / bestSize) * REPLACE_IF_BETTER_BY;
return thresholdQuality < ((double)selection.size() / size);
}
// Keep if this gets rid of more files. Or the same number of files for less io.
return selection.size() > bestSelection.size()
|| (selection.size() == bestSelection.size() && size < bestSize);
} 主要算法至此结束,下面说说其他细节及其优化部分:
步骤6的ratio默认值是1.2,但是打开了非高峰时间段的优化时,可以有不同的值,非高峰的ratio默认值是5.0,此优化目的是为了在业务低估时可以合并更多的数据,目前此优化只能是天的小说时间段,还不算灵活。
算法中关于mightBeStuck的逻辑部分,这个参数是用来表示是否有可能compaction会被卡住,它的状态是 待选文件数 - 正在做compaction的文件数 + futureFiles(默认值是0,有正在做compaction的文件时是1) >= hbase.hstore.blockingStoreFiles (默认是10,此配置在flush中也会用到,以后分析flush的时候会补充),如果是true时:
mightBeStuck的优化部分,相当于保证在很多的文件数的情况下,也可以选出一个最小解去做compaction,而不用再让文件继续增长下去直到有一个合适的组合出现。
此算法跟RatioBasedCompactionPolicy的区别,简单的说就是RatioBasedCompactionPolicy是简单的从头到尾遍历StoreFile列表,遇到一个符合Ratio条件的序列就选定执行Compaction。而ExploringCompactionPolicy则是从头到尾遍历的同时记录下当前最优,然后从中选择一个全局最优列表。
HBase Compaction算法之ExploringCompactionPolicy
标签:
原文地址:http://my.oschina.net/u/220934/blog/363270