
Analysis and Prediction Based on Users' Movie Ratings

Background

  Watching movies is part of everyday life, but everyone's tastes differ: some people like war films, some prefer art-house films, others favour romances, and so on. We have collected some data about users and movies, and the goal is to estimate how a user would rate a particular film, so that we can predict which movies the user is likely to enjoy and recommend them. The processing described here uses word counting and a user-based collaborative filtering algorithm.

Analysis and Prediction Techniques

Analysis tool: MapReduce on Hadoop

Data preprocessing: a word-count style pass filters out part of the duplicate and useless data

Algorithm: user-based collaborative filtering

Data visualization: an ECharts bar chart and parallel coordinates chart

User-Based Collaborative Filtering

  Recommendations for the target user are produced from the opinions of other users: if two users rate some items similarly, they are likely to rate other items similarly as well. A collaborative filtering recommender uses statistical techniques to find the target user's nearest neighbours, predicts the target user's ratings for unrated items from the neighbours' ratings of those items, and returns the items with the highest predicted ratings as recommendations.

Implementation:

  • Collect information that represents the users' interests
  • Nearest-neighbour search: compute the similarity between two users

    Cosine similarity: the similarity between user i and user j

            sim(i, j) = Σ_c R(i, c) · R(j, c)  /  ( √(Σ_c R(i, c)²) · √(Σ_c R(j, c)²) )

    where c ranges over the items both users have rated and R(i, c) is user i's rating of item c.

  • Generate the prediction

    User u's predicted rating for an item can be computed from the ratings given to that item by u's nearest-neighbour set NBS:

            P(u, i) = Σ_{n ∈ NBS} sim(u, n) · R(n, i)  /  Σ_{n ∈ NBS} |sim(u, n)|

    (A standalone sketch of these two steps follows below.)
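The two formulas above can be checked outside of Hadoop. Below is a minimal single-machine Java sketch on toy in-memory data; the class name, helper methods, and toy ratings are illustrative only and are not part of the MapReduce jobs that follow:

import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class UserCFSketch {

    // Cosine similarity over the movies both users have rated,
    // mirroring the up / (sqrt(down1) * sqrt(down2)) computation used later in UserCF1/UserCF2.
    static double cosine(Map<Integer, Double> a, Map<Integer, Double> b) {
        double up = 0, down1 = 0, down2 = 0;
        for (Map.Entry<Integer, Double> e : a.entrySet()) {
            Double rb = b.get(e.getKey());
            if (rb == null) continue;            // only co-rated movies contribute
            double ra = e.getValue();
            up += ra * rb;
            down1 += ra * ra;
            down2 += rb * rb;
        }
        if (down1 == 0 || down2 == 0) return 0;
        return up / (Math.sqrt(down1) * Math.sqrt(down2));
    }

    // P(u, movie) = sum(sim * rating) / sum(|sim|), as in the prediction step (UserCF4).
    static double predict(Map<Integer, Double> target,
                          Iterable<Map<Integer, Double>> neighbours, int movie) {
        double up = 0, down = 0;
        for (Map<Integer, Double> n : neighbours) {
            Double r = n.get(movie);
            if (r == null) continue;             // neighbour has not rated this movie
            double sim = cosine(target, n);
            up += sim * r;
            down += Math.abs(sim);
        }
        return down == 0 ? 0 : up / down;
    }

    public static void main(String[] args) {
        Map<Integer, Double> u1 = new HashMap<>();
        u1.put(1, 4.0); u1.put(2, 5.0);
        Map<Integer, Double> u2 = new HashMap<>();
        u2.put(1, 5.0); u2.put(2, 3.0); u2.put(3, 4.0);

        System.out.println("sim(u1, u2)    = " + cosine(u1, u2));                      // about 0.94
        System.out.println("P(u1, movie 3) = " + predict(u1, Arrays.asList(u2), 3));   // 4.0
    }
}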

Case Study

Recommend movies to users based on basic movie information and the users' ratings of movies.

Data collection: on the order of 100,000 user movie ratings, taken from a recent MovieLens release

http://www.datatang.com/data/44295/

Using the data in movies.dat, a word count over the genre field shows how popular each movie genre is with the general audience:

package org.bigdata.util;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class classify {
    private static class classifyMapper extends 
            Mapper<LongWritable,Text,Text,IntWritable>{
        @Override
        protected void map(LongWritable key, Text value, 
                Mapper<LongWritable, Text, Text, IntWritable>.Context context)
                throws IOException, InterruptedException {
                String[] strs = value.toString().split("::");            
                String[] classes = strs[2].split("\\|");
                for(String str : classes){
                    context.write(new Text(str),new IntWritable(1));
                }
        }
    }
    
    private static class classifyReducer extends 
        Reducer<Text,IntWritable,Text,IntWritable>{
        @Override
        protected void reduce(Text value, Iterable<IntWritable> datas,
                Reducer<Text, IntWritable, Text, IntWritable>.Context context) throws IOException, InterruptedException {
                int count = 0;
                for(IntWritable data : datas){
                    count  = count + data.get();
                }
                context.write(value,new IntWritable(count));
        }
        
    }
    
    public static void main(String[] args) throws Exception{
        Configuration cfg = HadoopCfg.getCfg();
        Job job = Job.getInstance(cfg);
        job.setJobName("classify Count");
        job.setJarByClass(classify.class);
        job.setMapperClass(classifyMapper.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        job.setReducerClass(classifyReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job,new Path("/input/movies.dat"));
        FileOutputFormat.setOutputPath(job,new Path("/output/"));
        System.exit( job.waitForCompletion(true)?0:1);
    }
}
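For reference, assuming the standard MovieLens movies.dat format (the record below is the usual first entry of the 1M dataset and is shown for illustration), each line is split on "::" and the third field on "|", so the mapper emits one (genre, 1) pair per listed genre:

            1::Toy Story (1995)::Animation|Children's|Comedy

            map output:  (Animation, 1)   (Children's, 1)   (Comedy, 1)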

 

Data preprocessing: preprocess the records in ratings.dat and keep only the highly rated entries (the code below keeps ratings of 3 and above)


package org.bigdata.util;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class favor {
    private static class favorMapper extends 
            Mapper<LongWritable,Text,Text,IntWritable>{
        @Override
        protected void map(LongWritable key, Text value, 
                Mapper<LongWritable, Text, Text, IntWritable>.Context context)
                throws IOException, InterruptedException {
                String[] strs = value.toString().split(",");
                // the rating is the third field; keep only records rated 3 or above
                int mvote =  (Float.valueOf(strs[2])).intValue();
                if(mvote >= 3)
                {
                    context.write(new Text(strs[1]+"\t"+strs[0]+"\t"+strs[2]),new IntWritable(1));
                }
        }
    }
    
    private static class favorReducer extends 
        Reducer<Text,IntWritable,Text,IntWritable>{
        @Override
        protected void reduce(Text value, Iterable<IntWritable> datas,
                Reducer<Text, IntWritable, Text, IntWritable>.Context context) throws IOException, InterruptedException {
            int count = 0;
            for(IntWritable data : datas){
                count  = count + data.get();
            }
            context.write(value,new IntWritable(count));
        }
        
    }
    
    public static void main(String[] args) throws Exception{
        Configuration cfg = HadoopCfg.getCfg();
        Job job = Job.getInstance(cfg);
        job.setJobName("favor Count");
        job.setJarByClass(favor.class);
        job.setMapperClass(favorMapper.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        job.setReducerClass(favorReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job,new Path("/select/"));
        FileOutputFormat.setOutputPath(job,new Path("/output/"));
        System.exit( job.waitForCompletion(true)?0:1);
    }
}
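The split(",") above assumes the ratings have already been converted to comma-separated lines of the form userId,movieId,rating,timestamp (the raw MovieLens ratings.dat uses "::" as its separator). For a line such as the following (illustrative values):

            1,1193,5.0,978300760

the mapper emits the key 1193<TAB>1<TAB>5.0 (movieId, userId, rating) with the value 1, and the reducer then counts duplicate records.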

 

Data processing: apply the user-based collaborative filtering algorithm to the filtered data to produce the predictions

Main driver, which runs the six jobs in sequence:

package com;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.util.ToolRunner;

public class UserCF {
    public static void main(String[] args) throws Exception {
        
     ToolRunner.run(new Configuration(), new UserCF1(), args); 
     ToolRunner.run(new Configuration(), new UserCF2(), args); 
     ToolRunner.run(new Configuration(), new UserCF3(), args); 
     ToolRunner.run(new Configuration(), new UserCF4(), args); 
     ToolRunner.run(new Configuration(), new UserCF5(), args); 
     ToolRunner.run(new Configuration(), new UserCF6(), args); 
    }
}

Link together every pair of users who have rated the same movie

package com;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.util.Tool;
import org.bigdata.util.HadoopCfg;

public class UserCF1 extends Configured implements Tool {

    public static class Mapper1 extends
            Mapper<LongWritable, Text, Text, Text> {

        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // take only the user id, movie id, and rating from each line
            String[] values = value.toString().split("\t");
            // the movie id is the key; the value is (user id, rating)
            context.write(new Text(values[1]), new Text(values[0]+"\t"+values[2]));
        }
    }

    public static class Reducer1 extends
            Reducer<Text, Text, Text, Text> {

        public void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            List<String> tmp_list = new ArrayList<String>();
            for(Text tmp:values)
            {
                tmp_list.add(tmp.toString());
            }
            
            for(int i=0;i<tmp_list.size();i++)
            {
                String []tmp1 =tmp_list.get(i).split("\t");
                int tmp11 = (Float.valueOf(tmp1[1])).intValue();
                int down1 = tmp11 * tmp11;
                for(int j=0;j<tmp_list.size();j++)
                {
                    String []tmp2 =tmp_list.get(j).split("\t");
                    int tmp21 = (Float.valueOf(tmp2[1])).intValue();
                    int up = tmp11 * tmp21;
                    int down2 = tmp21 * tmp21;
                    // link every pair of users who rated this movie and emit the partial cosine terms
                    context.write(new Text(tmp1[0]+" "+tmp2[0]), new Text(up+" "+down1+" "+down2));
                }
            }
            
        }
    }

    @Override
    public int run(String[] arg0) throws Exception {
        // TODO Auto-generated method stub
        Configuration conf = HadoopCfg.getCfg();
        Job job = Job.getInstance(conf, "UserCF1");

        job.setJarByClass(UserCF1.class);
        job.setMapperClass(Mapper1.class);
        job.setReducerClass(Reducer1.class);

        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);

        FileInputFormat.addInputPath(job, new Path(
                "/userCF/train"));
        Path table_path = new Path("/userCF/tmp");
        FileSystem.get(conf).delete(table_path, true);
        FileOutputFormat.setOutputPath(job, table_path);
        job.waitForCompletion(true);
        return 0;
    }
}
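To make the intermediate format concrete, take two toy users: u1 rated movie m1 with 4 and movie m2 with 5, while u2 rated m1 with 5 and m2 with 3. For movie m1 Reducer1 emits, among other pairs,

            u1 u2	20 16 25        (up = 4·5, down1 = 4·4, down2 = 5·5)

and for movie m2 it emits u1 u2 → 15 25 9. These per-movie partial terms are summed in the next job.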

Cosine similarity

package com;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.util.Tool;
import org.bigdata.util.HadoopCfg;



public class UserCF2 extends Configured implements Tool {

    public static class Mapper2 extends
            Mapper<LongWritable, Text, Text, Text> {

        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] values = value.toString().split("\t");
            String[] tmp = values[0].split(" ");    // the pair of users
            context.write(new Text(tmp[0]+"\t"+tmp[1]), new Text(values[1]));
        }
    }

    public static class Reducer2 extends
            Reducer<Text, Text, Text, Text> {

        public void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            
            int up=0;
            int down1=0;
            int down2=0;
            float simi=0;
            for(Text tmp:values)
            {
                String[] tmp_list = tmp.toString().split(" ");
                up=up+Integer.parseInt(tmp_list[0]);
                down1=down1+Integer.parseInt(tmp_list[1]);
                down2=down2+Integer.parseInt(tmp_list[2]);
            }
            // cosine similarity: up / (sqrt(down1) * sqrt(down2))
            float down = (float) (Math.sqrt(down1) * Math.sqrt(down2));
            simi = up / down;
            context.write(key, new Text(simi + " si"));   // the "si" suffix marks similarity records for the join in UserCF3
        }
    }

    @Override
    public int run(String[] arg0) throws Exception {
        // TODO Auto-generated method stub
        Configuration conf = HadoopCfg.getCfg();
        Job job = Job.getInstance(conf, "UserCF2");

        job.setJarByClass(UserCF2.class);
        job.setMapperClass(Mapper2.class);
        job.setReducerClass(Reducer2.class);

        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);

        FileInputFormat.addInputPath(job, new Path(
                "/userCF/tmp"));
        Path table_path = new Path("/userCF/simi");
        FileSystem.get(conf).delete(table_path, true);
        FileOutputFormat.setOutputPath(job, table_path);
        job.waitForCompletion(true);
        return 0;
    }
}
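Continuing the toy example from UserCF1, Reducer2 sums the partial terms for the pair (u1, u2) over the two shared movies and applies the cosine formula:

            up = 20 + 15 = 35
            down1 = 16 + 25 = 41,   down2 = 25 + 9 = 34
            sim(u1, u2) = 35 / (√41 · √34) ≈ 0.94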

Join each user's existing ratings with the user-to-user similarities

package com;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.util.Tool;
import org.bigdata.util.HadoopCfg;


public class UserCF3 extends Configured implements Tool {

    public static class Mapper3 extends
            Mapper<LongWritable, Text, Text, Text> {

        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] values = value.toString().split("\t");
            // key: the first user; value: (movie, rating) from the training data, or (neighbour, similarity) from the similarity output
            context.write(new Text(values[0]), new Text(values[1]+"\t"+values[2]));
        }
    }

    public static class Reducer3 extends
            Reducer<Text, Text, Text, Text> {

        public void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            List<String> user_list = new ArrayList<String>();
            List<String> item_list = new ArrayList<String>();
            for(Text tmp:values)
            {
                String []tmp1=tmp.toString().split("\t");
                String []tmp2= tmp1[1].split(" ");
                // decide which input file the record came from: similarity values carry a trailing "si" token and split into two parts
                if(tmp2.length==2)
                {
                    user_list.add(tmp1[0]+"\t"+tmp2[0]);
                }
                else
                {
                    item_list.add(tmp1[0]+"\t"+tmp2[0]);
                }
            }
            // pair every (neighbour, similarity) with every (movie, rating) of this user
            for(int i=0;i<user_list.size();i++)
            {
                String []tmp1 = user_list.get(i).split("\t");
                for(int j=0;j<item_list.size();j++)
                {
                    String []tmp2 = item_list.get(j).split("\t");
                    context.write(new Text(tmp1[0]+" "+tmp2[0]), new Text(tmp1[1]+" "+tmp2[1]));
                }
            }
            
        }
    }

    @Override
    public int run(String[] arg0) throws Exception {
        // TODO Auto-generated method stub
        Configuration conf = HadoopCfg.getCfg();
        Job job = Job.getInstance(conf, "UserCF3");

        job.setJarByClass(UserCF3.class);
        job.setMapperClass(Mapper3.class);
        job.setReducerClass(Reducer3.class);

        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);

        FileInputFormat.addInputPath(job, new Path(
                "/userCF/train"));
        FileInputFormat.addInputPath(job, new Path(
                "/userCF/simi"));
        Path table_path = new Path("/userCF/tmp2");
        
        FileSystem.get(conf).delete(table_path, true);
        FileOutputFormat.setOutputPath(job, table_path);
        job.waitForCompletion(true);
        return 0;
    }
}
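With the toy similarity of roughly 0.94 from above, and supposing u1 also rated movie m3 with 4, one record written to /userCF/tmp2 would look like

            u2 m3	0.94 4

i.e. the key is (neighbour, a movie the other user rated) and the value is (similarity, that user's rating), which is exactly what the prediction job needs to sum.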

Predict each user's rating for every candidate movie

package com;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.util.Tool;
import org.bigdata.util.HadoopCfg;
// predict ratings: a weighted average of the neighbours' ratings, weighted by similarity
public class UserCF4 extends Configured implements Tool {

    public static class Mapper4 extends
            Mapper<LongWritable, Text, Text, Text> {

        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] values = value.toString().split("\t");
            String [] tmp = values[0].split(" ");
            context.write(new Text(tmp[0]+"\t"+tmp[1]), new Text(values[1]));
        }
    }

    public static class Reducer4 extends
            Reducer<Text, Text, Text, Text> {

        public void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            double up=0;double down=0;
            for(Text tmp:values)
            {
                String []tmp1 = tmp.toString().split(" ");
                up = up + Double.parseDouble(tmp1[0])*Double.parseDouble(tmp1[1]);
                down = down +Math.abs(Double.parseDouble(tmp1[0]));
            } 
            double score = up/down;
           context.write(key, new Text(score+""));
        }
    }

    @Override
    public int run(String[] arg0) throws Exception {
        // TODO Auto-generated method stub
        Configuration conf = HadoopCfg.getCfg();
        Job job = Job.getInstance(conf, "UserCF4");

        job.setJarByClass(UserCF4.class);
        job.setMapperClass(Mapper4.class);
        job.setReducerClass(Reducer4.class);

        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);

        FileInputFormat.addInputPath(job, new Path(
                "/userCF/tmp2"));
        Path table_path = new Path("/userCF/score");
        FileSystem.get(conf).delete(table_path, true);
        FileOutputFormat.setOutputPath(job, table_path);
        job.waitForCompletion(true);
        return 0;
    }
}
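Continuing the toy numbers, if the (u2, m3) key gathers contributions from two neighbours with similarities 0.94 and 0.80 who rated m3 with 4 and 5 respectively (illustrative values), Reducer4 computes

            score = (0.94·4 + 0.80·5) / (0.94 + 0.80) = (3.76 + 4.00) / 1.74 ≈ 4.46

which is the predicted rating written to /userCF/score.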

Measure the deviation between the users' actual ratings and the predictions

package com;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.util.Tool;
import org.bigdata.util.HadoopCfg;



public class UserCF5 extends Configured implements Tool {

    public static class Mapper5 extends
            Mapper<LongWritable, Text, Text, Text> {

        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] values = value.toString().split("\t");
    
            context.write(new Text(values[0]+"\t"+values[1]), new Text(values[2]));
        }
    }

    public static class Reducer5 extends
            Reducer<Text, Text, Text, Text> {

        public void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            int i=0;double tmp1=0,tmp2=0;
            for(Text tmp:values)
            {   
                if(i==0)
                {
                    tmp1=Double.parseDouble(tmp.toString());
                }
                else{
                    tmp2=Double.parseDouble(tmp.toString());
                }
                i++;
            } 
            if(i==2)
            {
                context.write(new Text("mae"), new Text(Math.abs(tmp1-tmp2)+""));
                
            }
           
        }
    }

    @Override
    public int run(String[] arg0) throws Exception {
        // TODO Auto-generated method stub
        Configuration conf = HadoopCfg.getCfg();
        Job job = Job.getInstance(conf, "UserCF5");

        job.setJarByClass(UserCF5.class);
        job.setMapperClass(Mapper5.class);
        job.setReducerClass(Reducer5.class);

        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);

        FileInputFormat.addInputPath(job, new Path(
                "/userCF/score/part-r-00000"));
        FileInputFormat.addInputPath(job, new Path(
                "/userCF/test"));
        Path table_path = new Path("/userCF/tmp3");
        FileSystem.get(conf).delete(table_path, true);
        FileOutputFormat.setOutputPath(job, table_path);
        job.waitForCompletion(true);
        return 0;
    }
}

Average the per-pair absolute errors to obtain the overall MAE

package com;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.util.Tool;
import org.bigdata.util.HadoopCfg;



public class UserCF6 extends Configured implements Tool {

    public static class Mapper6 extends
            Mapper<LongWritable, Text, Text, Text> {

        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] values = value.toString().split("\t");
    
            context.write(new Text(values[0]), new Text(values[1]));
        }
    }

    public static class Reducer6 extends
            Reducer<Text, Text, Text, Text> {

        public void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            int num=0;double sum=0;
            for(Text tmp:values)
            {   
                sum=sum + Double.parseDouble(tmp.toString());
                num = num +1;
            }     
           context.write(new Text("mae"), new Text(sum/num+""));
        }
    }

    @Override
    public int run(String[] arg0) throws Exception {
        // TODO Auto-generated method stub
        Configuration conf = HadoopCfg.getCfg();
        Job job = Job.getInstance(conf, "UserCF6");

        job.setJarByClass(UserCF6.class);
        job.setMapperClass(Mapper6.class);
        job.setReducerClass(Reducer6.class);

        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);

        FileInputFormat.addInputPath(job, new Path(
                "/userCF/tmp3"));
        Path table_path = new Path("/userCF/MAE");
        FileSystem.get(conf).delete(table_path, true);
        FileOutputFormat.setOutputPath(job, table_path);
        job.waitForCompletion(true);
        return 0;
    }
}
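Taken together, UserCF5 and UserCF6 compute the mean absolute error over the N (user, movie) pairs that appear both in the prediction output and in the test set:

            MAE = (1/N) · Σ |predicted_i − actual_i|

For example, with absolute errors 0.5, 1.0 and 0.3 over three test pairs, MAE = (0.5 + 1.0 + 0.3) / 3 = 0.6.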

Data visualization: convert the results to JSON and visualize them with an ECharts bar chart and a parallel coordinates chart


http://echarts.baidu.com/demo.html#mix-zoom-on-value

http://echarts.baidu.com/demo.html#parallel-aqi

Conclusions and Insights

  • The data was analyzed and predictions were produced with a user-based collaborative filtering algorithm implemented in Hadoop MapReduce

  • The bar chart shows that comedies, action films, and romance films are the genres the audience likes most

  • The parallel coordinates chart shows that users can be linked through their movie ratings, and that data-driven recommendation helps people find films they like more quickly

  • We live in the era of big data; through analysis and prediction we can better understand ourselves and our needs

Issues

Because the data set is fairly large, processing generated a huge amount of intermediate output, which filled up most of the HDFS storage and prevented the jobs from finishing normally. (Why does this happen? If one person has rated 10,000 movies, then every user who has seen even one of those movies becomes linked to that person, and a predicted rating has to be computed for every movie that user has not seen. As the numbers of users and movies grow, the intermediate data explodes.)

Searching online confirmed that the storage was nearly exhausted, with almost no free space left. The fix is to give the Hadoop DataNodes more disk space: add a disk and list the new directory in the dfs.datanode.data.dir property of hdfs-site.xml (multiple directories are separated by commas), so the DataNode spreads its blocks across all of them.
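A sketch of the relevant hdfs-site.xml entry (the directory paths here are only illustrative; the value is a comma-separated list of local directories, one per disk):

<property>
    <name>dfs.datanode.data.dir</name>
    <!-- existing data directory plus the newly mounted disk -->
    <value>/data1/hdfs/data,/data2/hdfs/data</value>
</property>

After editing the file, restart the DataNode so that it starts writing blocks to the new directory as well.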

http://www.makaidong.com/%E5%8D%9A%E5%AE%A2%E5%9B%AD%E6%8E%92%E8%A1%8C/20013.shtml

http://www.zhihu.com/question/19985195

Original article: http://www.cnblogs.com/sker/p/5605006.html
