HBase SingleColumnValueFilter: rows without the column are not filtered out

Posted: 2015-03-11 21:11:34


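Problem description

I was filtering a log table by time.

In a normal row there is a timestamp column holding the operation time, and filtering on that column's value returns the log entries for a given time range.

Somehow, though, the log table has picked up some junk rows (not necessarily junk, they simply have no timestamp column).

Filtering for the first day returns 5800 rows that have no operation time (timestamp),

filtering for the second day still returns 5800 rows without one,

and filtering for both days together returns the same 5800.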

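Analysis

The cause is clear: when a row does not contain the column being filtered on, SingleColumnValueFilter treats that row as matching the filter by default.

So the next step is to change the policy SingleColumnValueFilter applies in that case.

A look at the source shows that there is a method for changing it.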
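The behavior described above can be reproduced with a small stand-alone model. This is only a sketch of the row-level decision, not HBase's implementation: the class `MissingColumnModel`, its methods, and the sample values are made up for illustration, and the two range filters are collapsed into a single `start <= timestamp < end` check.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical model of the missing-column policy; not HBase code. */
final class MissingColumnModel {

    /** Returns true if the row would be emitted for start <= timestamp < end. */
    static boolean emitted(Map<String, Long> row, long start, long end, boolean filterIfMissing) {
        Long ts = row.get("timestamp");
        if (ts == null) {
            // Column absent: the row passes unless filterIfMissing is set.
            return !filterIfMissing;
        }
        return ts >= start && ts < end;
    }

    /** Counts the rows that would be emitted for the given range and policy. */
    static long count(List<Map<String, Long>> rows, long start, long end, boolean filterIfMissing) {
        long n = 0;
        for (Map<String, Long> row : rows) {
            if (emitted(row, start, end, filterIfMissing)) {
                n++;
            }
        }
        return n;
    }

    /** Three sample rows: one in "day 1", one in "day 2", one with no timestamp. */
    static List<Map<String, Long>> sampleRows() {
        List<Map<String, Long>> rows = new ArrayList<>();
        Map<String, Long> day1 = new HashMap<>();
        day1.put("timestamp", 150L);
        Map<String, Long> day2 = new HashMap<>();
        day2.put("timestamp", 250L);
        rows.add(day1);
        rows.add(day2);
        rows.add(new HashMap<String, Long>()); // row without a timestamp column
        return rows;
    }

    public static void main(String[] args) {
        List<Map<String, Long>> rows = sampleRows();
        // Default policy: the row without a timestamp shows up in every range.
        System.out.println(count(rows, 100L, 200L, false)); // 2
        System.out.println(count(rows, 200L, 300L, false)); // 2
        System.out.println(count(rows, 100L, 300L, false)); // 3
        // With filterIfMissing = true it is dropped from every range.
        System.out.println(count(rows, 100L, 200L, true));  // 1
        System.out.println(count(rows, 200L, 300L, true));  // 1
        System.out.println(count(rows, 100L, 300L, true));  // 2
    }
}
```

The row without a timestamp is counted in every range under the default policy, which matches the constant 5800 rows seen in the log table.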

Code

The comment at the top of the SingleColumnValueFilter source describes the method (see the paragraph about setFilterIfMissing):

/**
 * This filter is used to filter cells based on value. It takes a {@link CompareFilter.CompareOp}
 * operator (equal, greater, not equal, etc), and either a byte [] value or
 * a ByteArrayComparable.
 * <p>
 * If we have a byte [] value then we just do a lexicographic compare. For
 * example, if passed value is 'b' and cell has 'a' and the compare operator
 * is LESS, then we will filter out this cell (return true).  If this is not
 * sufficient (eg you want to deserialize a long and then compare it to a fixed
 * long value), then you can pass in your own comparator instead.
 * <p>
 * You must also specify a family and qualifier.  Only the value of this column
 * will be tested. When using this filter on a {@link Scan} with specified
 * inputs, the column to be tested should also be added as input (otherwise
 * the filter will regard the column as missing).
 * <p>
 * To prevent the entire row from being emitted if the column is not found
 * on a row, use {@link #setFilterIfMissing}.
 * Otherwise, if the column is found, the entire row will be emitted only if
 * the value passes.  If the value fails, the row will be filtered out.
 * <p>
 * In order to test values of previous versions (timestamps), set
 * {@link #setLatestVersionOnly} to false. The default is true, meaning that
 * only the latest version's value is tested and all previous versions are ignored.
 * <p>
 * To filter based on the value of all scanned columns, use {@link ValueFilter}.
 */
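The comment says a plain byte [] value is compared lexicographically, which matters for a time-range filter. The post does not say how the timestamp values are encoded, so the following is only a sketch under two assumptions: that they are either 8-byte big-endian longs (the layout that `Bytes.toBytes(long)` produces) or decimal strings, and that a hand-written unsigned comparison stands in for HBase's own comparator. The class `LexicographicLongDemo` and its methods are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Hypothetical demo of byte-wise ordering; not HBase code. */
final class LexicographicLongDemo {

    /** 8-byte big-endian encoding of a long (ByteBuffer is big-endian by default). */
    static byte[] encode(long v) {
        return ByteBuffer.allocate(Long.BYTES).putLong(v).array();
    }

    /** Lexicographic comparison that treats each byte as unsigned. */
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff;
            int y = b[i] & 0xff;
            if (x != y) {
                return Integer.compare(x, y);
            }
        }
        return Integer.compare(a.length, b.length);
    }

    public static void main(String[] args) {
        // Non-negative longs: byte order agrees with numeric order.
        System.out.println(compareBytes(encode(1426000000000L), encode(1426086400000L)) < 0); // true

        // Negative longs start with 0xFF bytes and therefore sort after positive ones.
        System.out.println(compareBytes(encode(-1L), encode(1L)) < 0); // false

        // Decimal strings of different lengths do not sort numerically either.
        byte[] nine = "9".getBytes(StandardCharsets.US_ASCII);
        byte[] ten = "10".getBytes(StandardCharsets.US_ASCII);
        System.out.println(compareBytes(nine, ten) < 0); // false
    }
}
```

For non-negative epoch values stored as longs the default comparison is enough. In the other two cases the comment's advice to pass in your own comparator applies.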

Changed code

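import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp;
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
import org.apache.hadoop.hbase.util.Bytes;

// FAMILY, starttime, endtime and filters are defined elsewhere in the surrounding code.
SingleColumnValueFilter f1 = new SingleColumnValueFilter(Bytes.toBytes(FAMILY), Bytes.toBytes("timestamp"), CompareOp.GREATER_OR_EQUAL, Bytes.toBytes(starttime));
SingleColumnValueFilter f2 = new SingleColumnValueFilter(Bytes.toBytes(FAMILY), Bytes.toBytes("timestamp"), CompareOp.LESS, Bytes.toBytes(endtime));
f1.setFilterIfMissing(true); // true: skip rows that lack the column; false: let them through
f2.setFilterIfMissing(true);
filters.add(f1);
filters.add(f2);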

Reflection

At first I planned to subclass the filter and override some of its methods, but this approach seems more flexible.


Original article: http://www.cnblogs.com/erbin/p/4330734.html
