Implementing a Custom Writable Class in Hadoop

Date: 2015-03-12 20:43:41

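Hadoop ships with a set of Writable implementations that cover most needs. In some cases, however, we need to build a new implementation of our own. With a custom Writable we have full control over the binary representation and the sort order.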

To show how to create a custom Writable type, we will write an implementation that represents a pair of strings:

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableComparable;

public class TextPair implements WritableComparable<TextPair> {
    private Text first;
    private Text second;

    // A no-arg constructor is needed so that instances can be created
    // and then populated through readFields().
    public TextPair() {
        set(new Text(), new Text());
    }

    public TextPair(String first, String second) {
        set(new Text(first), new Text(second));
    }

    public TextPair(Text first, Text second) {
        set(first, second);
    }

    public void set(Text first, Text second) {
        this.first = first;
        this.second = second;
    }

    public Text getFirst() {
        return first;
    }

    public Text getSecond() {
        return second;
    }

    // Serialize both fields in a fixed order.
    @Override
    public void write(DataOutput out) throws IOException {
        first.write(out);
        second.write(out);
    }

    // Deserialize the fields in the same order they were written.
    @Override
    public void readFields(DataInput in) throws IOException {
        first.readFields(in);
        second.readFields(in);
    }

    @Override
    public int hashCode() {
        return first.hashCode() * 163 + second.hashCode();
    }

    @Override
    public boolean equals(Object o) {
        if (o instanceof TextPair) {
            TextPair tp = (TextPair) o;
            return first.equals(tp.first) && second.equals(tp.second);
        }
        return false;
    }

    @Override
    public String toString() {
        return first + "\t" + second;
    }

    // Order by the first string, then by the second.
    @Override
    public int compareTo(TextPair tp) {
        int cmp = first.compareTo(tp.first);
        if (cmp != 0) {
            return cmp;
        }
        return second.compareTo(tp.second);
    }
}
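The write/readFields contract used by TextPair can be tried out without a Hadoop installation. The following is a minimal sketch under stated assumptions: `StringPair`, `serialize`, and `deserialize` are hypothetical names that do not appear in the original, and the fields are plain `String` values written with `writeUTF` rather than Hadoop's `Text`, so the byte layout differs from what TextPair actually produces. It only illustrates that the fields must be read back in the order they were written.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** JDK-only stand-in for TextPair (hypothetical): same method shape, String fields. */
final class StringPair {
    private String first = "";
    private String second = "";

    StringPair() { }

    StringPair(String first, String second) {
        this.first = first;
        this.second = second;
    }

    String getFirst() { return first; }

    String getSecond() { return second; }

    // Fields are written in a fixed order...
    void write(DataOutput out) throws IOException {
        out.writeUTF(first);
        out.writeUTF(second);
    }

    // ...and read back in exactly the same order.
    void readFields(DataInput in) throws IOException {
        first = in.readUTF();
        second = in.readUTF();
    }

    /** Serializes a pair into a byte array. */
    static byte[] serialize(StringPair pair) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            pair.write(new DataOutputStream(bytes));
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Creates an empty pair and fills it from the serialized bytes. */
    static StringPair deserialize(byte[] data) {
        try {
            StringPair pair = new StringPair();
            pair.readFields(new DataInputStream(new ByteArrayInputStream(data)));
            return pair;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = serialize(new StringPair("hadoop", "writable"));
        StringPair copy = deserialize(data);
        System.out.println(copy.getFirst() + "\t" + copy.getSecond());
    }
}
```

The empty-then-fill pattern in `deserialize` is the reason the real TextPair keeps a no-argument constructor.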

Implementing a RawComparator for Speed

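There is room for further optimization. When TextPair is used as a key in MapReduce, the keys have to be compared while they are in serialized form. By default, each key is first deserialized into an object and the objects are then compared by calling compareTo, which is inefficient. Can the serialized bytes be compared directly instead? Yes, they can.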

We only need to split the serialized form of TextPair into its member fields and then compare those fields:

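// Requires org.apache.hadoop.io.WritableComparator and
// org.apache.hadoop.io.WritableUtils in addition to the imports used by TextPair.
// To make Hadoop use it for TextPair, register an instance with
// WritableComparator.define(TextPair.class, new Comparator()).
class Comparator extends WritableComparator {
    private static final Text.Comparator TEXT_COMPARATOR = new Text.Comparator();

    public Comparator() {
        super(TextPair.class);
    }

    @Override
    public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        try {
            // Size of the first Text field in each buffer:
            // bytes taken by the vint length prefix + the string length it encodes.
            int firstL1 = WritableUtils.decodeVIntSize(b1[s1]) + readVInt(b1, s1);
            int firstL2 = WritableUtils.decodeVIntSize(b2[s2]) + readVInt(b2, s2);
            // Compare the first fields directly on the raw bytes.
            int cmp = TEXT_COMPARATOR.compare(b1, s1, firstL1, b2, s2, firstL2);
            if (cmp != 0) {
                return cmp;
            }
            // First fields are equal: compare the remaining bytes (the second fields).
            return TEXT_COMPARATOR.compare(b1, s1 + firstL1, l1 - firstL1,
                                           b2, s2 + firstL2, l2 - firstL2);
        } catch (IOException e) {
            throw new IllegalArgumentException(e);
        }
    }
}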
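The idea of comparing serialized records field by field, without rebuilding objects, can be sketched with the JDK alone. This is an illustration under simplifying assumptions, not Hadoop's implementation: `RawPairCompare`, `encode`, and `compareBytes` are hypothetical names, and each field uses a single-byte length prefix (so fields are limited to 127 bytes here) instead of the variable-length integer that `Text` writes.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical JDK-only sketch of raw-byte comparison for a pair of strings. */
final class RawPairCompare {

    /** Encodes two strings as [len][utf8 bytes][len][utf8 bytes]. */
    static byte[] encode(String first, String second) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (String s : new String[] {first, second}) {
            byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
            if (utf8.length > 127) {
                throw new IllegalArgumentException("field too long for this sketch");
            }
            out.write(utf8.length);          // one-byte length prefix (assumption)
            out.write(utf8, 0, utf8.length); // field data
        }
        return out.toByteArray();
    }

    /** Unsigned lexicographic comparison of two byte ranges. */
    static int compareBytes(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        int n = Math.min(l1, l2);
        for (int i = 0; i < n; i++) {
            int a = b1[s1 + i] & 0xff;
            int b = b2[s2 + i] & 0xff;
            if (a != b) {
                return a - b;
            }
        }
        return l1 - l2; // the shorter range sorts first when it is a prefix
    }

    /** Compares two encoded pairs starting at s1 and s2 without creating Strings. */
    static int compare(byte[] b1, int s1, byte[] b2, int s2) {
        int len1 = b1[s1];
        int len2 = b2[s2];
        int cmp = compareBytes(b1, s1 + 1, len1, b2, s2 + 1, len2);
        if (cmp != 0) {
            return cmp;
        }
        // Skip prefix + data of the first field to reach the second field.
        int t1 = s1 + 1 + len1;
        int t2 = s2 + 1 + len2;
        return compareBytes(b1, t1 + 1, b1[t1], b2, t2 + 1, b2[t2]);
    }

    public static void main(String[] args) {
        byte[] x = encode("apple", "pie");
        byte[] y = encode("apple", "tart");
        System.out.println(Integer.signum(compare(x, 0, y, 0)));
    }
}
```

As in the comparator above, the length prefix is what lets the code find the boundary between the two fields without decoding them.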

Custom Comparators

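Sometimes, in addition to the default comparator, you may need custom comparators that produce a different sort order. The example below compares only the first string of the pair. The two compare methods mean the same thing: one works on the serialized bytes and the other on objects, and both order records by the first string alone: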

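    // Both methods belong in a WritableComparator subclass that defines
    // TEXT_COMPARATOR as in the raw comparator shown earlier.

    // Raw version: compare only the first Text field of each serialized pair.
    @Override
    public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        try {
            int firstL1 = WritableUtils.decodeVIntSize(b1[s1]) + readVInt(b1, s1);
            int firstL2 = WritableUtils.decodeVIntSize(b2[s2]) + readVInt(b2, s2);
            return TEXT_COMPARATOR.compare(b1, s1, firstL1, b2, s2, firstL2);
        } catch (IOException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Object version: same ordering, used when the keys are already deserialized.
    @Override
    public int compare(WritableComparable a, WritableComparable b) {
        if (a instanceof TextPair && b instanceof TextPair) {
            return ((TextPair) a).first.compareTo(((TextPair) b).first);
        }
        return super.compare(a, b);
    }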
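The effect of swapping in a first-field-only comparator can be seen with plain `java.util.Comparator`. This is a hedged illustration only: `FirstOnlyOrdering`, `Pair`, `FULL`, and `FIRST_ONLY` are hypothetical names, and `String` ordering stands in for the byte ordering that `Text` uses. It shows that under the first-only ordering, pairs sharing the same first string compare as equal, while the full ordering still separates them by the second string.

```java
import java.util.Comparator;

/** Hypothetical JDK-only sketch contrasting a full ordering with a first-field ordering. */
final class FirstOnlyOrdering {

    /** Minimal immutable pair of strings. */
    static final class Pair {
        final String first;
        final String second;

        Pair(String first, String second) {
            this.first = first;
            this.second = second;
        }
    }

    /** Orders by first, then by second (like TextPair.compareTo). */
    static final Comparator<Pair> FULL =
        Comparator.comparing((Pair p) -> p.first).thenComparing(p -> p.second);

    /** Orders by first only (like the custom comparator above). */
    static final Comparator<Pair> FIRST_ONLY =
        Comparator.comparing((Pair p) -> p.first);

    public static void main(String[] args) {
        Pair p = new Pair("a", "x");
        Pair q = new Pair("a", "y");
        System.out.println(Integer.signum(FULL.compare(p, q)));
        System.out.println(Integer.signum(FIRST_ONLY.compare(p, q)));
    }
}
```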

Original article: http://www.cnblogs.com/archimedes/p/4332080.html
