码迷,mamicode.com
首页 > 数据库 > 详细

spark-sql分组去重总数统计uv

时间:2018-04-08 21:20:29      阅读:551      评论:0      收藏:0      [点我收藏+]

标签:func   table   pre   oca   cti   select   register   ring   field   

SparkConf sparkConf = new SparkConf();
        sparkConf
                .setAppName("Internal_Func")
                .setMaster("local");

        JavaSparkContext javaSparkContext = new JavaSparkContext(sparkConf);
        SQLContext sqlContext = new SQLContext(javaSparkContext);

        List<String> list = new ArrayList<String>();
        list.add("1,1");
        list.add("2,11");
        list.add("2,111");
        list.add("2,111");
        list.add("3,1111");
        list.add("3,11111");

        JavaRDD<String> rdd_str = javaSparkContext.parallelize(list, 5);

        JavaRDD<Row> rdd_row = rdd_str.map(new Function<String, Row>() {
            @Override
            public Row call(String v1) throws Exception {
                String ary[] = v1.split(",");
                return RowFactory.create(ary[0], Long.parseLong(ary[1]));
            }
        });

        List<StructField> fieldList = new ArrayList<StructField>();
        fieldList.add(DataTypes.createStructField("name", DataTypes.StringType, true));
        fieldList.add(DataTypes.createStructField("sc", DataTypes.LongType, true));
        StructType tmp = DataTypes.createStructType(fieldList);

        DataFrame df = sqlContext.createDataFrame(rdd_row, tmp);
        df.registerTempTable("tmp_sc");

        DataFrame df_agg = sqlContext.sql("select name,count(distinct(sc)) from tmp_sc group by name");//去重后分组求和统计

        df_agg.show();

 

spark-sql分组去重总数统计uv

标签:func   table   pre   oca   cti   select   register   ring   field   

原文地址:https://www.cnblogs.com/zzq-include/p/8747107.html

(0)
(0)
   
举报
评论 一句话评论(0
登录后才能评论!
© 2014 mamicode.com 版权所有  联系我们:gaon5@hotmail.com
迷上了代码!