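The program in this post reads a file from Hadoop HDFS, uses regular expressions to extract the records that match a given format, and loads them into Spark SQL.
If you are not familiar with regular expressions, see the tutorial 正则表达式30分钟入门教程 (Regular Expressions in 30 Minutes).
The file content looks roughly like this: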
CREATE TABLE IF NOT EXISTS `rs_user` (
`id` mediumint(8) unsigned NOT NULL AUTO_INCREMENT,
`uid` mediumint(8) unsigned DEFAULT NULL,
`url` varchar(255) DEFAULT NULL,
`title` varchar(1024) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=gbk AUTO_INCREMENT=59573 ;
INSERT INTO `rs_user` (`id`, `uid`, `url`, `title`) VALUES
(1, 269781, 'http://rs.xidian.edu.cn/forum.php?mod=viewthread&tid=721360', '[体育][其他][2002年亚运会羽毛球男单决赛 陶菲克vs李炫一][rmvb][国语]'),
(2, 256188, 'http://rs.xidian.edu.cn/forum.php?mod=viewthread&tid=721360', '[体育][其他][2002年亚运会羽毛球男单决赛 陶菲克vs李炫一][rmvb][国语]'),
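The data rows of the INSERT statement can be picked out with a single regular expression. What follows is only a minimal sketch of that parsing step, not the pattern used by the original program: the names `User`, `RowParser` and `RowPattern` are hypothetical, and the pattern assumes the file uses plain ASCII single quotes, as MySQL dumps normally do. It also strips the quotes around `url` and `title`, whereas the output shown at the end of this post keeps them.

```scala
// Hypothetical record type mirroring the four columns of rs_user.
case class User(id: Int, uid: Int, url: String, title: String)

object RowParser {
  // One data row: (id, uid, 'url', 'title') optionally followed by "," or ";".
  // The url group is lazy, so it stops at the first quote-comma-quote boundary;
  // the title group is greedy, so the title may itself contain quotes or commas.
  val RowPattern =
    """\((\d+),\s*(\d+),\s*'(.*?)',\s*'(.*)'\)[,;]?\s*""".r

  // Returns Some(User) for a data row and None for every other line
  // (CREATE TABLE, column definitions, the INSERT header, blank lines).
  def parse(line: String): Option[User] = line.trim match {
    case RowPattern(id, uid, url, title) =>
      Some(User(id.toInt, uid.toInt, url, title))
    case _ => None
  }
}
```

Because a Scala `Regex` used as a pattern must match the whole string, schema lines such as PRIMARY KEY (`id`) fall through to `None` and are simply skipped.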
package com.spark.firstApp
import org.apache.spark.SparkContext
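Only the first two lines of the program are shown above. As a rough illustration of the flow described at the start of this post (read from HDFS, parse with a regex, register the result in Spark SQL), here is a sketch and not the author's code. It assumes the Spark 1.3+ `SQLContext` API; the `User` case class, the `RowPattern` regex, the query and the placeholder HDFS path are all assumptions made for illustration. Only the package and the class name `HelloSpark` come from the submit command in this post.

```scala
package com.spark.firstApp

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Hypothetical record type for one row of rs_user.
case class User(id: Int, uid: Int, url: String, title: String)

object HelloSpark {
  // Assumed row layout: (id, uid, 'url', 'title') followed by "," or ";".
  val RowPattern =
    """\((\d+),\s*(\d+),\s*'(.*?)',\s*'(.*)'\)[,;]?\s*""".r

  def main(args: Array[String]): Unit = {
    // The master URL is supplied by spark-submit, so it is not set here.
    val sc = new SparkContext(new SparkConf().setAppName("HelloSpark"))
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._ // brings rdd.toDF() into scope

    // Placeholder path (assumption): pass the real HDFS location as the first argument.
    val path = if (args.nonEmpty) args(0) else "hdfs:///path/to/rs_user.sql"

    // Keep only the lines that look like data rows; all other lines are dropped.
    val users = sc.textFile(path).map(_.trim).collect {
      case RowPattern(id, uid, url, title) =>
        User(id.toInt, uid.toInt, url, title)
    }

    // Expose the parsed rows to Spark SQL as a temporary table and query it.
    users.toDF().registerTempTable("rs_user")
    sqlContext.sql("SELECT title FROM rs_user LIMIT 2")
      .collect()
      .foreach(row => println("title: " + row))

    sc.stop()
  }
}
```

On Spark 2.0 and later, `SparkSession` and `createOrReplaceTempView` are the usual replacements for `SQLContext` and `registerTempTable`.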
Submit the job:
root@Master:/# spark-submit --master spark://192.168.0.10:7077 --class com.spark.firstApp.HelloSpark --executor-memory 100m /root/IdeaProjects/FirstSparkApp/out/artifacts/FirstSparkAppJar/FirstSparkAppJar.jar
Output:
Spark assembly has been built with Hive, including Datanucleus jars on classpath
15/04/15 21:53:56 INFO slf4j.Slf4jLogger: Slf4jLogger started
15/04/15 21:53:56 INFO Remoting: Starting remoting
15/04/15 21:53:57 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@Master:52584]
15/04/15 21:53:57 INFO server.Server: jetty-8.y.z-SNAPSHOT
15/04/15 21:53:57 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:54183
15/04/15 21:54:03 INFO server.Server: jetty-8.y.z-SNAPSHOT
15/04/15 21:54:03 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:4040
15/04/15 21:54:12 WARN util.SizeEstimator: Failed to check whether UseCompressedOops is set; assuming yes
15/04/15 21:54:21 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/04/15 21:54:21 WARN snappy.LoadSnappy: Snappy native library not loaded
15/04/15 21:54:21 INFO mapred.FileInputFormat: Total input paths to process : 1
title: ['[其他][视频][LOL][微笑卷毛1月13号双排三场合集][微笑卷毛解说][mp4]']
title: ['[其他][视频][LOL][SMZ24解说:S5盲僧李青的全场gank之旅_高清][SMZ24解说][mp4]']
Original article: http://blog.csdn.net/a350203223/article/details/45074991