Tags: hadoop
Hadoop Log Analysis System Startup Script

#!/bin/bash
# Root directory of the Flume-collected log data
root_path=/flume
# Output directory for the MapReduce-processed data
process_path=/process
# Hive partition date (derived with the same one-hour offset as file_path,
# otherwise the partition and the input path disagree around midnight)
partition=`date -d "1 hour ago" "+%Y-%m-%d"`
# Path fragment for the previous hour: YYYY-MM-DD/HH
file_path=`date -d "1 hour ago" +"%Y-%m-%d/%H"`

# Run the MapReduce job
# hadoop jar /root/develop/runjar/accesslog.jar hdfs://mycluster $root_path/$file_path $process_path/$file_path
hadoop jar /root/develop/runjar/accesslog.jar hdfs://mycluster /flume/2014-10-15/16 /process/2014-10-15/16

# Load the processed data into Hive
# hive -e "load data inpath '$process_path/$file_path/*' overwrite into table access_log partition(dt='$partition')"
hive -e "load data inpath '/process/2014-10-15/16/*' overwrite into table access_log partition(dt='2014-10-15')"

# Run the Hive query that counts hits per page
hive -e "insert into table access_page_times select cs_uri_stem,count(*) from access_log where dt='2014-10-15' group by cs_uri_stem"

# Export the aggregated results from Hive to MySQL via Sqoop
sqoop export --connect jdbc:mysql://ip:3306/fkdb --username root --password 123456 --table access_page_times --export-dir /user/hive/warehouse/access_page_times --input-fields-terminated-by '\001'
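The script's time handling can be checked in isolation. A minimal sketch (assuming GNU date, which the `-d "1 hour ago"` syntax in the script already requires) pins the clock to a fixed, hypothetical timestamp to show why the input path and the Hive partition should both use the same one-hour offset:

```shell
#!/bin/bash
# Sketch only: demonstrates the "1 hour ago" path arithmetic against a
# fixed base time instead of the live clock, so the result is deterministic.
# Requires GNU date.

base="2014-10-15 00:30"   # hypothetical run just after midnight

# Path fragment for the previous hour crosses the date boundary:
file_path=$(date -d "$base 1 hour ago" +"%Y-%m-%d/%H")
echo "$file_path"    # 2014-10-14/23

# Deriving the partition from the current date would give 2014-10-15 here,
# which no longer matches the input path; the same offset keeps them aligned:
partition=$(date -d "$base 1 hour ago" +"%Y-%m-%d")
echo "$partition"    # 2014-10-14
```

For every hour other than 00, the offset and non-offset dates agree, which is why the mismatch is easy to miss in testing.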
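The script runs each stage unconditionally, so a failed MapReduce job would still be followed by the Hive load and the Sqoop export. One way to harden it is a small wrapper that stops the pipeline at the first failure; this is a sketch, with `echo` placeholders standing in for the real hadoop/hive/sqoop invocations:

```shell
#!/bin/bash
# Sketch: abort the pipeline as soon as one stage fails, so a failed
# MapReduce job never feeds a partial output directory into Hive or Sqoop.
# The echo commands below are placeholders, not the real calls.
set -u

run_step() {
  # Print a banner, run the command, and exit the script on failure.
  local name=$1; shift
  echo "==> $name"
  if ! "$@"; then
    echo "step '$name' failed, aborting" >&2
    exit 1
  fi
}

run_step "mapreduce" echo "hadoop jar accesslog.jar ..."
run_step "hive-load" echo "hive -e 'load data ...'"
run_step "aggregate" echo "hive -e 'insert into table ...'"
run_step "export"    echo "sqoop export ..."
```

Each stage's exit code gates the next, which also makes the failing stage obvious in the log output.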
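Since the script processes the previous hour's directory, it is naturally scheduled hourly. A hypothetical crontab entry (the script path and log path here are assumptions, not from the original post) might run it a few minutes past the hour, after Flume has rolled the previous hour's files:

```shell
# m h dom mon dow command
5 * * * * /root/develop/runjar/run_accesslog.sh >> /var/log/accesslog_etl.log 2>&1
```

Redirecting stdout and stderr into one log file keeps the hadoop/hive/sqoop output from each run in a single place for troubleshooting.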
Original article: http://blog.csdn.net/panguoyuan/article/details/40152915