
Reading custom Writable values from a SequenceFile

Posted: 2016-01-21 12:01:23


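1) Hadoop lets programmers define their own data types. A type used as a key must implement WritableComparable, because keys are sorted; a type used only as a value just needs to implement Writable. The following defines DoubleArrayWritable, a subclass of ArrayWritable: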

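package matrix;

import org.apache.hadoop.io.*;

// Value type holding an array of doubles. ArrayWritable must be told the
// element class so that it can create the elements when deserializing.
public class DoubleArrayWritable extends ArrayWritable {
    // Hadoop instantiates Writables by reflection, so a no-arg constructor is required.
    public DoubleArrayWritable() {
        super(DoubleWritable.class);
    }

    // Converts the elements to a primitive double[], e.g. convert2double(value.get()).
    // Takes Writable[] because, after readFields(), get() returns a plain Writable[]
    // (casting it to DoubleWritable[] would throw ClassCastException); a
    // DoubleWritable[] can still be passed in directly.
    public double[] convert2double(Writable[] w) {
        double[] value = new double[w.length];
        for (int i = 0; i < value.length; i++) {
            value[i] = ((DoubleWritable) w[i]).get();
        }
        return value;
    }
}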
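To see what actually ends up on disk for each value, the following pure-Java sketch reproduces the byte layout that a DoubleArrayWritable is serialized to, assuming the stock ArrayWritable and DoubleWritable implementations: a 4-byte element count followed by one 8-byte big-endian double per element. The class name DoubleArrayCodec is hypothetical and the code has no Hadoop dependency; it is for illustration only, not a replacement for the Hadoop classes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Arrays;

// Hypothetical helper, for illustration only: mirrors (by assumption) the layout
// ArrayWritable.write() produces when every element is a DoubleWritable.
final class DoubleArrayCodec {
    private DoubleArrayCodec() {}

    // Encode: element count, then each double (DataOutput is big-endian).
    static byte[] encode(double[] values) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(values.length);   // what ArrayWritable writes first
            for (double v : values) {
                out.writeDouble(v);        // what each DoubleWritable writes
            }
        }
        return bytes.toByteArray();
    }

    // Decode: the bytes carry no per-element type tag, so the reader must
    // already know that every element is a double.
    static double[] decode(byte[] data) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            double[] values = new double[in.readInt()];
            for (int i = 0; i < values.length; i++) {
                values[i] = in.readDouble();
            }
            return values;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = encode(new double[] {1.5, -2.0, 3.25});
        System.out.println(data.length + " bytes");          // 4 + 3 * 8 = 28
        System.out.println(Arrays.toString(decode(data)));   // [1.5, -2.0, 3.25]
    }
}
```

Because the element type is not stored next to each element, it has to be supplied up front, which is why the DoubleArrayWritable constructor passes DoubleWritable.class to ArrayWritable.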

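2) The following program reads transB.txt, converts each line into a DoubleArrayWritable, and writes it to a SequenceFile with the line number as an IntWritable key.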

package convert;

import java.io.File;
import java.io.IOException;
import java.net.URI;

import matrix.DoubleArrayWritable;   // custom value type from step 1 (a different package)
import org.apache.commons.io.FileUtils;
import org.apache.commons.io.LineIterator;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.SequenceFile;

public class SequenceFileWriteDemo {
    public static void main(String[] args) throws IOException {
        // Output file; a path without a scheme resolves against fs.defaultFS.
        String uri = "/home/hadoop/srcData/bDoubleArraySeq";
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create(uri), conf);
        Path path = new Path(uri);
        IntWritable key = new IntWritable();                  // 1-based line number
        DoubleArrayWritable value = new DoubleArrayWritable();
        SequenceFile.Writer writer = null;
        try {
            // The writer records the key and value class names in the file header.
            writer = SequenceFile.createWriter(fs, conf, path, key.getClass(), value.getClass());

            LineIterator it = FileUtils.lineIterator(new File("/home/hadoop/srcData/transB.txt"), "UTF-8");
            try {
                int i = 0;
                while (it.hasNext()) {
                    String line = it.nextLine();
                    if (line.trim().isEmpty()) {
                        continue;   // skip blank lines, which parseDouble would reject
                    }
                    key.set(++i);
                    // Each line is one row of tab-separated numbers.
                    String[] fields = line.split("\t");
                    DoubleWritable[] row = new DoubleWritable[fields.length];
                    for (int j = 0; j < fields.length; j++) {
                        row[j] = new DoubleWritable(Double.parseDouble(fields[j]));
                    }
                    value.set(row);
                    writer.append(key, value);
                }
            } finally {
                it.close();
            }
        } finally {
            IOUtils.closeStream(writer);
        }
        System.out.println("ok");
    }
}

3) Upload the SequenceFile, then view its contents with:

hadoop fs -text /lz/data/transBSeq

The command fails with:

java.lang.RuntimeException: java.io.IOException: WritableName can't load class:matrix.DoubleArrayWritable
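The error arises because a SequenceFile stores its key and value class names as strings in the file header, and the reader has to load those classes before it can deserialize any record. The hypothetical diagnostic below (pure Java; the name SeqHeaderPeek is illustrative) prints those names without loading them. It assumes the header layout of format version 4 or later: the bytes SEQ, a version byte, then the two class names, each written as a Hadoop variable-length int length followed by UTF-8 bytes. It reads a local copy of the file, for example one fetched with hadoop fs -get.

```java
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical diagnostic, for illustration only: reads the key and value class
// names from a SequenceFile header (assumed version >= 4) without loading them.
final class SeqHeaderPeek {
    private SeqHeaderPeek() {}

    // Returns {keyClassName, valueClassName}.
    static String[] readClassNames(InputStream raw) throws IOException {
        DataInputStream in = new DataInputStream(raw);
        byte[] magic = new byte[3];
        in.readFully(magic);
        if (magic[0] != 'S' || magic[1] != 'E' || magic[2] != 'Q') {
            throw new IOException("not a SequenceFile");
        }
        int version = in.readByte();
        if (version < 4) {
            throw new IOException("unsupported header version " + version);
        }
        return new String[] {readText(in), readText(in)};
    }

    // A string stored as: variable-length int byte count, then UTF-8 bytes.
    private static String readText(DataInputStream in) throws IOException {
        byte[] buf = new byte[readVInt(in)];
        in.readFully(buf);
        return new String(buf, StandardCharsets.UTF_8);
    }

    // Decodes a non-negative value in Hadoop's VInt encoding: -112..127 fit in
    // the first byte; otherwise the first byte tells how many big-endian bytes follow.
    private static int readVInt(DataInputStream in) throws IOException {
        byte first = in.readByte();
        if (first >= -112) {
            if (first < 0) throw new IOException("negative length");
            return first;
        }
        if (first < -120) throw new IOException("negative length");
        int following = -112 - first;
        long value = 0;
        for (int i = 0; i < following; i++) {
            value = (value << 8) | (in.readByte() & 0xFF);
        }
        return (int) value;
    }

    public static void main(String[] args) throws IOException {
        try (InputStream in = new FileInputStream(args[0])) {
            String[] names = readClassNames(in);
            System.out.println("key class:   " + names[0]);
            System.out.println("value class: " + names[1]);
        }
    }
}
```

For the file written in step 2, the value class would be expected to read matrix.DoubleArrayWritable, which is exactly the name the error message reports as unloadable.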

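4) The cause: DoubleArrayWritable is a user-defined class, not part of Hadoop, so Hadoop cannot load it by default. Compile DoubleArrayWritable, package the compiled class (matrix/DoubleArrayWritable.class) into a jar, and add the jar's path to HADOOP_CLASSPATH in hadoop-env.sh on the machine where the command runs (here, the master node). Separate multiple jars with colons (:), as in any Java classpath:

export HADOOP_CLASSPATH=/home/hadoop/DoubleArrayWritable.jar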
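To confirm that a jar actually puts the class within reach, the check below (a hypothetical helper, not part of Hadoop) asks the JVM's class loader to resolve a class name without initializing it. Hadoop resolves the name through its Configuration's class loader, so this is only an approximation of what hadoop fs -text does.

```java
// Hypothetical check, for illustration only: reports whether a class name can be
// resolved by this JVM's class loader, roughly the step that fails in
// "WritableName can't load class".
final class ClassPathCheck {
    private ClassPathCheck() {}

    static boolean isLoadable(String className) {
        try {
            // false = do not run static initializers; we only want to locate the class.
            Class.forName(className, false, ClassPathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String name = args.length > 0 ? args[0] : "matrix.DoubleArrayWritable";
        System.out.println(name + (isLoadable(name) ? " is" : " is NOT") + " on the classpath");
    }
}
```

For example, java -cp /home/hadoop/DoubleArrayWritable.jar:. ClassPathCheck matrix.DoubleArrayWritable should report the class as found only if the jar contains the compiled class under matrix/.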

5) After that, hadoop fs -text /lz/data/transBSeq displays the file's contents.

Reference:

http://www.eveningdrum.com/2014/05/04/hadoop%E4%BD%BF%E7%94%A8%E7%AC%AC%E4%B8%89%E6%96%B9%E4%BE%9D%E8%B5%96jar%E5%8C%85/


Original article: http://www.cnblogs.com/lz3018/p/5147724.html
