Big Data Learning: Day 10

Posted: 2019-06-15 10:00:33

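Distributed application development: move the computation to the data
Outline:
1. What the client does
Job
2. What the framework does
MapTask
ReduceTask
3. MR semantics:
Records with the same key form one group, and reduce is called once per group
"Same" is established by sorting: after the sort, equal keys sit next to each other
The comparison method in use defines the sort order, so different implementations yield different orderings
Move the computation to the data (the ideal case)
Read data locally, on the node that stores it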
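The grouping semantics in point 3 can be tried out without a cluster. The following is a minimal plain-Java sketch, not Hadoop code: the class GroupBySortSketch and the method reduceByKey are hypothetical names made up for illustration. It sorts the keys with a comparator, treats adjacent keys that compare as equal as one group, and performs one "reduce" (here a count) per group.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration class; it only mimics the sort-then-group idea.
final class GroupBySortSketch {

    /**
     * Sorts the keys with cmp, then counts the keys in each run of
     * adjacent keys that cmp considers equal. One map entry is produced
     * per group, labelled with the first key of that group.
     */
    static Map<String, Integer> reduceByKey(List<String> keys, Comparator<String> cmp) {
        List<String> sorted = new ArrayList<>(keys);
        sorted.sort(cmp); // sorting places equal keys next to each other

        Map<String, Integer> out = new LinkedHashMap<>();
        String groupKey = null;
        int sum = 0;
        for (String k : sorted) {
            if (groupKey == null || cmp.compare(groupKey, k) != 0) {
                if (groupKey != null) {
                    out.put(groupKey, sum); // close the previous group: one "reduce call"
                }
                groupKey = k;
                sum = 0;
            }
            sum += 1; // each key carries an implicit count of 1, as in word count
        }
        if (groupKey != null) {
            out.put(groupKey, sum); // close the last group
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("b", "A", "a", "b", "a");
        System.out.println(reduceByKey(keys, Comparator.<String>naturalOrder()));
        System.out.println(reduceByKey(keys, String.CASE_INSENSITIVE_ORDER));
    }
}
```

With natural ordering the five keys fall into three groups (A, a, b). With String.CASE_INSENSITIVE_ORDER, "A" and "a" compare as equal, so only two groups remain. The input is identical in both runs; only the comparison method changed.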

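import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MyWordCount {
    public static void main(String[] args) throws Exception {
        // Load the default configuration resources from the classpath
        Configuration conf = new Configuration(true);
        // Create a new Job
        Job job = Job.getInstance(conf);
        job.setJarByClass(MyWordCount.class);
        // Specify various job-specific parameters
        job.setJobName("MG_wordcount");
        Path input = new Path("/mg/test/test.text");
        Path output = new Path("/mg/output");
        // The job fails if the output directory already exists, so delete it first
        FileSystem fs = output.getFileSystem(conf);
        if (fs.exists(output)) {
            fs.delete(output, true);
        }
        FileInputFormat.addInputPath(job, input);
        FileOutputFormat.setOutputPath(job, output);
        job.setMapperClass(MyMapper.class);
        job.setReducerClass(MyReducer.class);
        // Declare the output types; the defaults (LongWritable/Text) do not match
        // what MyMapper emits and would cause a type-mismatch failure at runtime
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        // Submit the job, then poll for progress until the job is complete
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}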
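Part of what the client does when it submits the Job is divide the input into splits; the framework then starts one MapTask per split. The sketch below is plain Java that only mirrors the arithmetic, as far as I understand FileInputFormat's default behaviour. The class SplitSketch, its method names, and the 300 MB / 128 MB figures are assumptions chosen for illustration, and the 1.1 slack factor should be checked against the Hadoop version in use.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class; not part of Hadoop.
final class SplitSketch {
    // Assumed slack factor: the final split may be up to 10% larger than splitSize.
    static final double SPLIT_SLOP = 1.1;

    /** Clamps the block size between the configured minimum and maximum split size. */
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    /** Returns {offset, length} pairs that together cover a file of fileLength bytes. */
    static List<long[]> splits(long fileLength, long splitSize) {
        List<long[]> result = new ArrayList<>();
        long remaining = fileLength;
        // Cut full-size splits while clearly more than one split's worth remains
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            result.add(new long[] {fileLength - remaining, splitSize});
            remaining -= splitSize;
        }
        // Whatever is left (possibly slightly more than splitSize) is the last split
        if (remaining != 0) {
            result.add(new long[] {fileLength - remaining, remaining});
        }
        return result;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long splitSize = computeSplitSize(128 * mb, 1L, Long.MAX_VALUE);
        for (long[] s : splits(300 * mb, splitSize)) {
            System.out.println("offset=" + s[0] + " length=" + s[1]);
        }
    }
}
```

Under these assumptions a 300 MB file yields three splits (128 MB, 128 MB, 44 MB), hence three MapTasks. Aligning splits with storage blocks is what lets a MapTask be scheduled on a node that already holds its data.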

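import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MyMapper extends Mapper<Object, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
        // Split the line into tokens; each token becomes an output key
        // (the default delimiters are whitespace; custom ones can be passed to StringTokenizer)
        StringTokenizer itr = new StringTokenizer(value.toString());
        // Loop while tokens remain
        while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            context.write(word, one);
        }
    }
}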
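Which strings end up as keys depends entirely on how StringTokenizer splits the line. Here is a small standalone sketch of that step; TokenizeSketch and its tokens method are hypothetical helpers, and passing null for the delimiters is just this sketch's way of selecting StringTokenizer's defaults (space, tab, newline, carriage return, form feed).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical illustration class; it repeats only the tokenizing step of MyMapper.
final class TokenizeSketch {

    /** Splits a line into tokens; a null delims argument selects the default delimiters. */
    static List<String> tokens(String line, String delims) {
        StringTokenizer itr = (delims == null)
                ? new StringTokenizer(line)
                : new StringTokenizer(line, delims);
        List<String> out = new ArrayList<>();
        while (itr.hasMoreTokens()) {
            out.add(itr.nextToken()); // in MyMapper this token would be written as (token, 1)
        }
        return out;
    }

    public static void main(String[] args) {
        // Runs of whitespace are skipped; no empty tokens are produced
        System.out.println(tokens("hello  hadoop\thello", null));
        // Punctuation is not a default delimiter, so it stays attached to the word
        System.out.println(tokens("Hello, world", null));
        // Custom delimiters: split on commas and semicolons instead
        System.out.println(tokens("hello,hadoop;hello", ",;"));
    }
}
```

The second call shows a common surprise: with the default delimiters, "Hello," (with the comma) and "Hello" would be counted as different words.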

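import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class MyReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        // Called once per key: add up all the counts emitted for this word
        int sum = 0;
        for (IntWritable val : values) {
            sum += val.get();
        }
        result.set(sum);
        context.write(key, result);
    }
}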
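For reduce to see every value of a key in a single call, all records with that key must first reach the same ReduceTask. A minimal sketch of hash-based routing follows, modelled on the formula used by Hadoop's default HashPartitioner as I understand it. PartitionSketch is a hypothetical class, and String.hashCode stands in for Text.hashCode here, so the concrete partition numbers will not match a real job.

```java
// Hypothetical illustration class; not part of Hadoop.
final class PartitionSketch {

    /**
     * Maps a key to a partition in [0, numReduceTasks).
     * Masking with Integer.MAX_VALUE clears the sign bit so that a
     * negative hashCode cannot produce a negative partition number.
     */
    static int partition(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        int reducers = 3; // assumed number of ReduceTasks
        for (String word : new String[] {"hello", "hadoop", "hello"}) {
            System.out.println(word + " -> reducer " + partition(word, reducers));
        }
    }
}
```

Both occurrences of "hello" print the same reducer number, because the result depends only on the key and the number of ReduceTasks.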

Original post: https://www.cnblogs.com/lkoooox/p/11026417.html
