Tags: single-table join
1. Sample data (each line is a child name followed by a parent name):
Tom Lucy
Tom Jack
Jone Lucy
Jone Jack
Lucy Mary
Lucy Ben
Jack Alice
Jack Jesse
Terry Alice
Terry Jesse
Philip Terry
Philip Alma
Mark Terry
Mark Alma
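2. The map code:
// Imports for the whole job class (they go at the top of the source file).
// Logger is assumed to be log4j 1.x, whose getLogger accepts a Class.
import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.TextOutputFormat;
import org.apache.log4j.Logger;

public static class ChildParentMapper extends MapReduceBase implements Mapper<Object, Text, Text, Text> {
    private static Logger logger = Logger.getLogger(ChildParentMapper.class);
    String childname = new String();
    String parientname = new String();
    String flag = new String();// left/right table marker
    @Override
    public void map(Object ikey, Text ivalue, OutputCollector<Text, Text> output, Reporter arg3)
            throws IOException {
        String str[] = ivalue.toString().split(" ");// split the line into child and parent names
        if (str[0].compareTo("child") != 0) {// skip the header row
            childname = str[0];// child name
            parientname = str[1];// parent name
            // Left table: key = parent name, value = left marker + child name + parent name
            flag = "1";
            logger.info(new Text(parientname)+","+ new Text(flag + "+" + childname + "+" + parientname));
            output.collect(new Text(parientname), new Text(flag + "+" + childname + "+" + parientname));
            // Right table: key = child name, value = right marker + child name + parent name
            flag = "2";
            // Log the key that is actually emitted (the child name)
            logger.info(new Text(childname)+","+ new Text(flag + "+" + childname + "+" + parientname));
            output.collect(new Text(childname), new Text(flag + "+" + childname + "+" + parientname));
        }
    }
}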
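Code walkthrough:
Step 1: define three fields:
1. The child name (childname)
2. The parent name (parientname)
3. A marker that tells the left table from the right table (flag)
String childname = new String();
String parientname = new String();
String flag = new String();// left/right table marker
Step 2: split the line to get the child name and the parent name
String str[] = ivalue.toString().split(" ");
childname = str[0];// child name
parientname = str[1];// parent name
Step 3: emit two key/value pairs per input line, one tagged for the left table and one for the right table. Both copies are keyed by the person who links two generations, so each reduce key receives that person's children (tag 1) together with that person's parents (tag 2).
First: <parent name, left-table marker + child name + parent name>
flag = "1";
output.collect(new Text(parientname), new Text(flag + "+" + childname + "+" + parientname));
Second: <child name, right-table marker + child name + parent name>
flag = "2";
output.collect(new Text(childname), new Text(flag + "+" + childname + "+" + parientname));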
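The two emits in step 3 can be tried without a cluster. The following is a minimal, Hadoop-free sketch of that step; the class name MapEmitSketch and the method emit are hypothetical names chosen for illustration, and the whitespace-tolerant split and the guard against malformed lines are additions that the original mapper does not have.

```java
import java.util.ArrayList;
import java.util.List;

// Hadoop-free sketch of the map step: one input line yields two tagged records.
// MapEmitSketch and emit are hypothetical names used only for illustration.
final class MapEmitSketch {

    // Returns {key, value} pairs in the order the mapper would collect them.
    static List<String[]> emit(String line) {
        List<String[]> out = new ArrayList<>();
        // Assumption: tolerate repeated whitespace (the original splits on a single space).
        String[] str = line.trim().split("\\s+");
        if (str.length < 2 || str[0].equals("child")) {
            return out; // skip malformed lines and the header row
        }
        String childname = str[0];
        String parientname = str[1];
        // Left table: keyed by the parent, tagged "1".
        out.add(new String[] {parientname, "1+" + childname + "+" + parientname});
        // Right table: keyed by the child, tagged "2".
        out.add(new String[] {childname, "2+" + childname + "+" + parientname});
        return out;
    }

    public static void main(String[] args) {
        for (String[] kv : emit("Tom Lucy")) {
            System.out.println(kv[0] + "\t" + kv[1]);
        }
    }
}
```

For the input line "Tom Lucy" this produces the key Lucy with value 1+Tom+Lucy, then the key Tom with value 2+Tom+Lucy.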
Step 4: the mapper output, shown sorted by key:
Alice 1+Terry+Alice
Alice 1+Jack+Alice
Alma 1+Mark+Alma
Alma 1+Philip+Alma
Ben 1+Lucy+Ben
Jack 2+Jack+Alice
Jack 1+Tom+Jack
Jack 1+Jone+Jack
Jack 2+Jack+Jesse
Jesse 1+Jack+Jesse
Jesse 1+Terry+Jesse
Jone 2+Jone+Lucy
Jone 2+Jone+Jack
Lucy 1+Tom+Lucy
Lucy 2+Lucy+Ben
Lucy 2+Lucy+Mary
Lucy 1+Jone+Lucy
Mark 2+Mark+Alma
Mark 2+Mark+Terry
Mary 1+Lucy+Mary
Philip 2+Philip+Terry
Philip 2+Philip+Alma
Terry 1+Philip+Terry
Terry 1+Mark+Terry
Terry 2+Terry+Alice
Terry 2+Terry+Jesse
Tom 2+Tom+Lucy
Tom 2+Tom+Jack
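Between map and reduce, Hadoop sorts the records by key and hands each key to the reducer together with all of its values. The sketch below imitates that grouping with a TreeMap so the reducer input can be inspected locally; ShuffleSketch and group are hypothetical names, and the value order inside a key here simply follows input order, which a real job does not guarantee.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Local imitation of the shuffle: tagged map output grouped and sorted by key.
// ShuffleSketch and group are hypothetical names used only for illustration.
final class ShuffleSketch {

    static Map<String, List<String>> group(List<String> lines) {
        Map<String, List<String>> grouped = new TreeMap<>(); // TreeMap keeps keys sorted
        for (String line : lines) {
            String[] str = line.split(" ");
            String child = str[0];
            String parent = str[1];
            // Same two records the mapper emits for every input line.
            grouped.computeIfAbsent(parent, k -> new ArrayList<>()).add("1+" + child + "+" + parent);
            grouped.computeIfAbsent(child, k -> new ArrayList<>()).add("2+" + child + "+" + parent);
        }
        return grouped;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("Tom Lucy", "Jone Lucy", "Lucy Mary", "Lucy Ben");
        // Prints each key followed by the value list its reduce call would see.
        for (Map.Entry<String, List<String>> e : group(lines).entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```

With these four lines, the key Lucy collects both of her children (tag 1) and both of her parents (tag 2), which is exactly the input the reducer needs to pair them.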
3. The reduce code:
public static class ChildParentReduce extends MapReduceBase implements Reducer<Text, Text, Text, Text> {
    private static Logger logger = Logger.getLogger(ChildParentReduce.class);
    private int num = 0;
    @Override
    public void reduce(Text ikey, Iterator<Text> ivalue, OutputCollector<Text, Text> output, Reporter arg3)
            throws IOException {
        if (num == 0) {// emit the header row once per reducer instance
            output.collect(new Text("grandchild"), new Text("grandparient"));
            num++;
        }
        int grandchildnum = 0;// number of grandchildren
        int grandparientnum = 0;// number of grandparents
        // Fixed-size buffers: this assumes at most 100 entries per side for any key
        String[] grandchild = new String[100];
        String[] grandparient = new String[100];
        while (ivalue.hasNext()){
            String[] record = ivalue.next().toString().split("\\+");// split on "+" into three parts
            // left-table record
            if (record[0].compareTo("1") == 0) {
                grandchild[grandchildnum] = record[1];// store the child name
                grandchildnum++;
            }
            // right-table record
            else if (record[0].compareTo("2") == 0) {
                grandparient[grandparientnum] = record[2];// store the parent name
                grandparientnum++;
            }
        }
        if (grandchildnum != 0 && grandparientnum != 0) {
            // Cartesian product: pair every grandchild with every grandparent.
            // i indexes grandchild and j indexes grandparient, so each loop bound matches its array.
            for (int i = 0; i < grandchildnum; i++) {
                for (int j = 0; j < grandparientnum; j++) {
                    logger.info(new Text(grandchild[i])+","+new Text(grandparient[j]));
                    output.collect(new Text(grandchild[i]), new Text(grandparient[j]));
                }
            }
        }
    }
}
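Code walkthrough:
Step 1: if a header row is wanted, emit it before the first record
if (num == 0) {// emit the header row
    output.collect(new Text("grandchild"), new Text("grandparient"));
    num++;
}
Step 2: define four variables: the arrays that hold the grandchildren and the grandparents, and the count of each
int grandchildnum = 0;// number of grandchildren
int grandparientnum = 0;// number of grandparents
String[] grandchild = new String[100];
String[] grandparient = new String[100];
Step 3: parse the value list produced by the map phase
First, the input looks like this. Taking Lucy from the mapper output as the key, the reducer receives:
<Lucy, 1+Tom+Lucy,2+Lucy+Ben,2+Lucy+Mary,1+Jone+Lucy>
Loop over the values:
// left-table record
if (record[0].compareTo("1") == 0) {
    grandchild[grandchildnum] = record[1];// store the child name
    grandchildnum++;
}
Grandchildren: Tom, Jone
// right-table record
else if (record[0].compareTo("2") == 0) {
    grandparient[grandparientnum] = record[2];// store the parent name
    grandparientnum++;
}
Grandparents: Ben, Mary
Then take the Cartesian product to get the grandchild-grandparent pairs. The outer loop must run over the grandchildren and the inner loop over the grandparents, so that each index stays within its own array:
if (grandchildnum != 0 && grandparientnum != 0) {
    // Cartesian product
    for (int i = 0; i < grandchildnum; i++) {
        for (int j = 0; j < grandparientnum; j++) {
            logger.info(new Text(grandchild[i])+","+new Text(grandparient[j]));
            output.collect(new Text(grandchild[i]), new Text(grandparient[j]));
        }
    }
}
Tom,Ben
Tom,Mary
Jone,Ben
Jone,Mary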
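The reduce logic walked through above can also be exercised on its own. The sketch below is one possible variant, not the author's code: it keeps the same tag handling but stores names in lists rather than 100-element arrays, so no key is limited in how many relatives it can hold. ReduceJoinSketch and join are hypothetical names, and the check for exactly three fields is an added assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hadoop-free sketch of the reduce-side join for a single key.
// ReduceJoinSketch and join are hypothetical names used only for illustration.
final class ReduceJoinSketch {

    // values: the tagged records that arrived for one key, e.g. "1+Tom+Lucy".
    static List<String> join(List<String> values) {
        List<String> grandchild = new ArrayList<>();
        List<String> grandparient = new ArrayList<>();
        for (String value : values) {
            String[] record = value.split("\\+");
            if (record.length != 3) {
                continue; // assumption: silently skip records that are not tag+child+parent
            }
            if (record[0].equals("1")) {
                grandchild.add(record[1]);   // left table: the child of the key
            } else if (record[0].equals("2")) {
                grandparient.add(record[2]); // right table: the parent of the key
            }
        }
        // Cartesian product: every grandchild paired with every grandparent.
        List<String> pairs = new ArrayList<>();
        for (String gc : grandchild) {
            for (String gp : grandparient) {
                pairs.add(gc + "," + gp);
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        List<String> lucy = Arrays.asList("1+Tom+Lucy", "2+Lucy+Ben", "2+Lucy+Mary", "1+Jone+Lucy");
        for (String pair : join(lucy)) {
            System.out.println(pair);
        }
    }
}
```

A key that has records from only one side, such as Alma in the sample data, yields an empty list, which matches the grandchildnum != 0 && grandparientnum != 0 guard in the reducer.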
The main method:
public static void main(String[] args) {
    try {
        // HDFS input and output directories; the output directory must not exist before the job runs
        String inputDir = "hdfs://192.168.1.61:9000/home/zhongml/childparent/input";
        String outputDir = "hdfs://192.168.1.61:9000/home/zhongml/childparent/output";
        JobConf con = new JobConf(ChildParent2.class);
        con.setJobName("childparent");
        // Both the map output and the final output are Text/Text pairs
        con.setMapOutputKeyClass(Text.class);
        con.setMapOutputValueClass(Text.class);
        con.setOutputKeyClass(Text.class);
        con.setOutputValueClass(Text.class);
        con.setMapperClass(ChildParentMapper.class);
        con.setReducerClass(ChildParentReduce.class);
        // Plain-text input and output, one record per line
        con.setInputFormat(TextInputFormat.class);
        con.setOutputFormat(TextOutputFormat.class);
        FileInputFormat.setInputPaths(con, new Path(inputDir));
        FileOutputFormat.setOutputPath(con, new Path(outputDir));
        // Submit the job and block until it finishes
        JobClient.runJob(con);
        System.exit(0);
    } catch (IllegalArgumentException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    }
}
}
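Before submitting the job to a cluster, it can help to know what result to expect from the sample data. The following is a local sanity check under stated assumptions, not output captured from a real run: LocalJoinCheck, SAMPLE and run are hypothetical names, and the two maps stand in for the tag 1 and tag 2 records grouped by key.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Local, Hadoop-free simulation of the whole job on the sample data.
// LocalJoinCheck, SAMPLE and run are hypothetical names used only for illustration.
final class LocalJoinCheck {

    static final List<String> SAMPLE = Arrays.asList(
            "Tom Lucy", "Tom Jack", "Jone Lucy", "Jone Jack",
            "Lucy Mary", "Lucy Ben", "Jack Alice", "Jack Jesse",
            "Terry Alice", "Terry Jesse", "Philip Terry", "Philip Alma",
            "Mark Terry", "Mark Alma");

    static List<String> run(List<String> lines) {
        Map<String, List<String>> childrenOf = new TreeMap<>(); // plays the role of tag "1" records
        Map<String, List<String>> parentsOf = new TreeMap<>();  // plays the role of tag "2" records
        for (String line : lines) {
            String[] str = line.split(" ");
            childrenOf.computeIfAbsent(str[1], k -> new ArrayList<>()).add(str[0]);
            parentsOf.computeIfAbsent(str[0], k -> new ArrayList<>()).add(str[1]);
        }
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : childrenOf.entrySet()) {
            List<String> grandparents = parentsOf.get(e.getKey());
            if (grandparents == null) {
                continue; // this person has children but no known parents
            }
            for (String grandchild : e.getValue()) {
                for (String grandparent : grandparents) {
                    result.add(grandchild + "\t" + grandparent);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        for (String pair : run(SAMPLE)) {
            System.out.println(pair);
        }
    }
}
```

On the 14 sample lines this yields 12 pairs: four each through Jack, Lucy and Terry, and none through Alma because her parents are not in the table. The real job additionally writes the header row, and the order of its output lines may differ.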
This article comes from the “钟茂霖博客” blog; please keep this attribution: http://zhongml.blog.51cto.com/4808277/1877330
Case 3: MapReduce single-table join, deriving the grandchild-grandparent table from the child-parent table