Hadoop Input Formats (InputFormat)

Posted: 2014-09-28 11:45:51


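InputFormat is often described as an interface, but in the new API (org.apache.hadoop.mapreduce) it is an abstract class, as the source below shows. It declares two methods: getSplits(), which defines how the job's input is divided into splits, and createRecordReader(), which defines how each split is read as key/value records.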
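To make getSplits() concrete: for file-based input, a split is typically just a (path, start, length) triple, which is what the Javadoc's "input-file-path, start, offset" tuple amounts to, so computing splits is arithmetic on the file length. The following is a minimal plain-Java sketch, not the Hadoop API. The names SplitSketch, LogicalSplit, getSplits and the path /data/input.txt are hypothetical, and the sizing rule (split size = block size clamped between a minimum and maximum, plus a 10% slack so the last split is not tiny) is an assumption modeled on what Hadoop's FileInputFormat does, not something stated in this article.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical plain-Java model of file-based split computation; not Hadoop's API.
public class SplitSketch {

    // A logical split: a byte range of one file. Nothing is copied or cut.
    static final class LogicalSplit {
        final String path;
        final long start;
        final long length;

        LogicalSplit(String path, long start, long length) {
            this.path = path;
            this.start = start;
            this.length = length;
        }

        @Override
        public String toString() {
            return path + ":" + start + "+" + length;
        }
    }

    // Assumption (modeled on FileInputFormat): the last split may be up to 10%
    // larger than splitSize instead of leaving a tiny tail split.
    static final double SPLIT_SLOP = 1.1;

    // Split size = blockSize, clamped to [minSize, maxSize].
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    static List<LogicalSplit> getSplits(String path, long fileLength, long splitSize) {
        List<LogicalSplit> splits = new ArrayList<>();
        long bytesRemaining = fileLength;
        // Emit full-size splits while more than SPLIT_SLOP splits' worth remains.
        while ((double) bytesRemaining / splitSize > SPLIT_SLOP) {
            splits.add(new LogicalSplit(path, fileLength - bytesRemaining, splitSize));
            bytesRemaining -= splitSize;
        }
        // Whatever is left (at most 1.1 * splitSize) becomes the final split.
        if (bytesRemaining != 0) {
            splits.add(new LogicalSplit(path, fileLength - bytesRemaining, bytesRemaining));
        }
        return splits;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long splitSize = computeSplitSize(128 * mb, 1, Long.MAX_VALUE);
        for (LogicalSplit s : getSplits("/data/input.txt", 300 * mb, splitSize)) {
            System.out.println(s.path + " start=" + s.start / mb + "MB length=" + s.length / mb + "MB");
        }
    }
}
```

With a 128 MB split size, a 300 MB file yields three splits of 128, 128 and 44 MB, while a 140 MB file yields a single 140 MB split because 140/128 is within the 1.1 slack. No bytes move; each split only records where its range begins and how long it is.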

```java
package org.apache.hadoop.mapreduce;

import java.io.IOException;
import java.util.List;

public abstract class InputFormat<K, V> {

  /** 
   * Logically split the set of input files for the job.  
   * 
   * <p>Each {@link InputSplit} is then assigned to an individual {@link Mapper}
   * for processing.</p>
   *
   * <p><i>Note</i>: The split is a <i>logical</i> split of the inputs and the
   * input files are not physically split into chunks. For e.g. a split could
   * be <i>&lt;input-file-path, start, offset&gt;</i> tuple. The InputFormat
   * also creates the {@link RecordReader} to read the {@link InputSplit}.
   * 
   * @param context job configuration.
   * @return an array of {@link InputSplit}s for the job.
   */
  public abstract 
    List<InputSplit> getSplits(JobContext context
                               ) throws IOException, InterruptedException;
  
  /**
   * Create a record reader for a given split. The framework will call
   * {@link RecordReader#initialize(InputSplit, TaskAttemptContext)} before
   * the split is used.
   * @param split the split to be read
   * @param context the information about the task
   * @return a new record reader
   * @throws IOException
   * @throws InterruptedException
   */
  public abstract 
    RecordReader<K,V> createRecordReader(InputSplit split,
                                         TaskAttemptContext context
                                        ) throws IOException, 
                                                 InterruptedException;

}
```
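The second method, createRecordReader(), returns a reader that the framework initializes with one split and then drains record by record. Because splits are logical byte ranges, a split boundary can fall in the middle of a line. The sketch below shows one common convention, which as far as we understand is the one Hadoop's line-oriented reader (LineRecordReader) follows: a reader whose split does not begin at byte 0 discards its first, possibly partial, line, and every reader keeps reading any line that starts at or before its split's end, even if the line runs past it. It is plain Java over an in-memory byte array; the method names only mirror RecordReader's initialize / nextKeyValue / getCurrentKey / getCurrentValue, the class name LineSplitReaderSketch and the readAll helper are hypothetical, and only '\n' line endings are handled.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a line-oriented record reader; not Hadoop's RecordReader class.
public class LineSplitReaderSketch {
    private byte[] data;         // the whole "file", held in memory for the example
    private long end;            // first byte past this split: start + length
    private long pos;            // next byte to read
    private long currentKey;     // byte offset of the current line
    private String currentValue; // the current line's text, without '\n'

    // Mirrors RecordReader.initialize(split, context): position the reader on its split.
    public void initialize(byte[] file, long start, long length) {
        data = file;
        end = start + length;
        pos = start;
        if (start != 0) {
            // Not the first split: the line we land in is read by the previous split's reader.
            pos = Math.min(indexOfNewline(pos) + 1, data.length);
        }
    }

    // Index of the next '\n' at or after p, or data.length if there is none.
    private long indexOfNewline(long p) {
        while (p < data.length && data[(int) p] != '\n') {
            p++;
        }
        return p;
    }

    // Mirrors RecordReader.nextKeyValue(): advance to the next record, false when done.
    public boolean nextKeyValue() {
        // A line belongs to this split if it starts at or before 'end', even if it ends beyond it.
        if (pos > end || pos >= data.length) {
            return false;
        }
        long lineEnd = indexOfNewline(pos);
        currentKey = pos;
        currentValue = new String(data, (int) pos, (int) (lineEnd - pos), StandardCharsets.UTF_8);
        pos = Math.min(lineEnd + 1, data.length);
        return true;
    }

    public long getCurrentKey() { return currentKey; }

    public String getCurrentValue() { return currentValue; }

    // Hypothetical helper: collect every record value of one split.
    public static List<String> readAll(byte[] file, long start, long length) {
        LineSplitReaderSketch reader = new LineSplitReaderSketch();
        reader.initialize(file, start, length);
        List<String> lines = new ArrayList<>();
        while (reader.nextKeyValue()) {
            lines.add(reader.getCurrentValue());
        }
        return lines;
    }

    public static void main(String[] args) {
        byte[] file = "aa\nbb\ncc\n".getBytes(StandardCharsets.UTF_8);
        long[][] splits = {{0, 4}, {4, 5}}; // (start, length); byte 4 is inside "bb"
        for (long[] s : splits) {
            LineSplitReaderSketch reader = new LineSplitReaderSketch();
            reader.initialize(file, s[0], s[1]);
            while (reader.nextKeyValue()) {
                System.out.println("split@" + s[0] + " -> (" + reader.getCurrentKey()
                        + ", " + reader.getCurrentValue() + ")");
            }
        }
    }
}
```

Splitting "aa\nbb\ncc\n" at byte 4 (inside "bb"), the first reader returns (0, aa) and (3, bb) and the second returns only (6, cc), so each line is read exactly once. When the boundary falls exactly on a line start, such as byte 3, the first reader still takes "bb" (it starts at the split's end) and the second reader's initial skip discards it, so there is again no duplicate and no lost line.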

Original article: http://www.cnblogs.com/gwgyk/p/3997734.html
