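Why shell programming matters:
Hadoop programmers usually need to be comfortable with shell scripting, because a shell script is a very convenient way to run programs.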
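Shell script file format:
The file name usually ends in .sh
#!/bin/sh [the first line names the shell that will interpret the rest of the file]
# this is a comment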
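Variables in shell:
(1) Variables do not need to be declared, and no type is specified when they are initialized
(2) Variable names may contain only letters, digits, and underscores, and must not begin with a digit
(3) Categories: temporary variables, and environment variables (set with export)
To display a variable's value, use the echo command with $ in front of the variable name
Example: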
[root@hadoop33 mydata]# more app1.sh
#!/bin/sh
i=10
j=20
k=30
echo "she is $i years old,he is $j years old"
echo "I am $k years old"
[root@hadoop33 mydata]# app1.sh
she is 10 years old,he is 20 years old
I am 30 years old
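As a minimal sketch of the difference between a temporary variable and an environment variable in item (3): a plain assignment stays in the current shell, while an exported variable is passed on to child processes. The variable names below are hypothetical and chosen only for illustration.

```shell
#!/bin/sh
# Hypothetical variable names, for illustration only.
temp_var="only in this shell"            # temporary: not inherited by children
export ENV_VAR="visible to child processes"   # environment: inherited

# Start a child shell and print what it can see of each variable.
child_output=$(sh -c 'echo "temp=[$temp_var] env=[$ENV_VAR]"')
echo "$child_output"
# prints: temp=[] env=[visible to child processes]
```

The single quotes around the child command matter here: they stop the parent shell from expanding the variables before the child runs.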
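Single quotes, double quotes, and backticks in shell:
(1) Single quotes do not expand any variables or commands
(2) Double quotes expand variables, but a bare word inside them (such as date) is not run as a command; backtick substitution still works inside double quotes
(3) Backticks run the enclosed text as a command and replace it with the command's output
Example: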
[root@hadoop33 mydata]# more app2.sh
#!/bin/sh
echo 'JAVA_HOME is $JAVA_HOME , today is date'
echo "JAVA_HOME is $JAVA_HOME , today is date"
echo "JAVA_HOME is $JAVA_HOME , today is `date`"
[root@hadoop33 mydata]# app2.sh
JAVA_HOME is $JAVA_HOME , today is date
JAVA_HOME is /home/hadoop/jdk1.7.0_25x64 , today is date
JAVA_HOME is /home/hadoop/jdk1.7.0_25x64 , today is Wed Jul 20 11:03:50 CST 2016
[root@hadoop11 apache_logs]# more app2.sh
#!/bin/sh
yesterday=`date --date="1 days ago" +%Y-%m-%d`
echo "Yesterday's date:"
echo $yesterday
[root@hadoop11 apache_logs]# app2.sh
Yesterday's date:
2016-07-19
Note: backticks treat the enclosed text as a command line; any variables in it are expanded first, and the result is then run as a command.
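The three quoting rules can be checked side by side. The sketch below uses a hypothetical variable, name, and also shows the $(...) form, which is the standard POSIX equivalent of backticks and is easier to nest.

```shell
#!/bin/sh
name="hadoop"   # hypothetical value, for illustration only

single='user is $name'                                # no expansion at all
double="user is $name"                                # variable is expanded
backtick="user is `echo $name | tr 'a-z' 'A-Z'`"      # command output is substituted
dollar="user is $(echo "$name" | tr 'a-z' 'A-Z')"     # same result as backticks

echo "$single"     # prints: user is $name
echo "$double"     # prints: user is hadoop
echo "$backtick"   # prints: user is HADOOP
echo "$dollar"     # prints: user is HADOOP
```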
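Positional variables in shell:
(1) Arguments passed when the script is run are available in the order they were given
(2) $0 is the name of the script file itself
(3) $1, $2, ... refer to the arguments by position
Example: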
[root@hadoop33 mydata]# more app3.sh
#!/bin/sh
# delete the existing output directory, run the jar, show the result
echo "Deleting the existing output path:"
hadoop fs -rmr $2
echo "Running the jar:"
hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.4.1.jar wordcount $1 $2
echo "Job output:"
hadoop fs -cat $3
[root@hadoop33 mydata]# app3.sh /dir1/ /dir1out/ /dir1out/part-r-00000
Deleting the existing output path:
rmr: DEPRECATED: Please use 'rm -r' instead.
16/07/20 11:30:39 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 0 minutes, Emptier interval = 0 minutes.
Deleted /dir1out
Running the jar:
16/07/20 11:30:41 INFO client.RMProxy: Connecting to ResourceManager at hadoop22/10.187.84.51:8032
16/07/20 11:30:42 INFO input.FileInputFormat: Total input paths to process : 1
16/07/20 11:30:42 INFO mapreduce.JobSubmitter: number of splits:1
16/07/20 11:30:42 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1468805633167_0018
16/07/20 11:30:43 INFO impl.YarnClientImpl: Submitted application application_1468805633167_0018
16/07/20 11:30:43 INFO mapreduce.Job: The url to track the job: http://hadoop22:8088/proxy/application_1468805633167_0018/
16/07/20 11:30:43 INFO mapreduce.Job: Running job: job_1468805633167_0018
16/07/20 11:30:49 INFO mapreduce.Job: Job job_1468805633167_0018 running in uber mode : false
16/07/20 11:30:49 INFO mapreduce.Job: map 0% reduce 0%
16/07/20 11:30:54 INFO mapreduce.Job: map 100% reduce 0%
16/07/20 11:31:00 INFO mapreduce.Job: map 100% reduce 100%
16/07/20 11:31:00 INFO mapreduce.Job: Job job_1468805633167_0018 completed successfully
16/07/20 11:31:00 INFO mapreduce.Job: Counters: 49
File System Counters
FILE: Number of bytes read=47
FILE: Number of bytes written=185823
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=138
HDFS: Number of bytes written=25
HDFS: Number of read operations=6
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=1
Launched reduce tasks=1
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=3193
Total time spent by all reduces in occupied slots (ms)=3075
Total time spent by all map tasks (ms)=3193
Total time spent by all reduce tasks (ms)=3075
Total vcore-seconds taken by all map tasks=3193
Total vcore-seconds taken by all reduce tasks=3075
Total megabyte-seconds taken by all map tasks=3269632
Total megabyte-seconds taken by all reduce tasks=3148800
Map-Reduce Framework
Map input records=4
Map output records=8
Map output bytes=71
Map output materialized bytes=47
Input split bytes=99
Combine input records=8
Combine output records=4
Reduce input groups=4
Reduce shuffle bytes=47
Reduce input records=4
Reduce output records=4
Spilled Records=8
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=68
CPU time spent (ms)=1330
Physical memory (bytes) snapshot=422313984
Virtual memory (bytes) snapshot=1783205888
Total committed heap usage (bytes)=281346048
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=39
File Output Format Counters
Bytes Written=25
Job output:
hello 4
me 1
she 1
you 2
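Positional parameters work the same way inside a function, which makes them easy to try without a Hadoop cluster. The sketch below uses a hypothetical function, show_args, together with two standard special parameters not mentioned above: $# (the number of arguments) and "$@" (all arguments, each kept as a separate word).

```shell
#!/bin/sh
# show_args is a hypothetical helper, for illustration only.
show_args() {
    echo "count=$#"
    echo "first=$1"
    echo "second=$2"
    # "$@" keeps arguments that contain spaces intact
    for arg in "$@"; do
        echo "arg: $arg"
    done
}

show_args /dir1/ "/dir1 out/"
# prints:
# count=2
# first=/dir1/
# second=/dir1 out/
# arg: /dir1/
# arg: /dir1 out/
```

Quoting the parameters (for example "$2" rather than $2) is what keeps a path with spaces from being split into several arguments.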
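Using date in shell:
(1) Display the current time
(2) Formatted output: +%Y-%m-%d [prints the year, month, and day in the given format]
(3) The format +%s prints the number of seconds since 1970-01-01 00:00:00 UTC; a count of seconds like this can be converted back into a calendar date
(4) Print a specific time: --date='2009-01-01 11:11:11'
(5) Print a relative time: --date='3 days ago'
Example: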
[root@hadoop33 mydata]# more app4.sh
#!/bin/sh
function dat()
{
echo "Current time:"
date
echo "Formatted current time:"
date +%Y-%m-%d-%H-%M-%S
echo "Seconds since the epoch:"
date +%s
echo "Specified time:"
date --date="1991-08-18 12:12:00" +%Y-%m-%d-%H-%M-%S
echo "Specified time:"
date --date="1 days ago" +%Y-%m-%d-%H-%M-%S
}
# call the function
dat
[root@hadoop33 mydata]# app4.sh
Current time:
Wed Jul 20 11:47:01 CST 2016
Formatted current time:
2016-07-20-11-47-01
Seconds since the epoch:
1468986421
Specified time:
1991-08-18-12-12-00
Specified time:
2016-07-19-11-47-01
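Item (3) says that a count of seconds can be turned back into a date. One way to sketch this, assuming GNU date (the version found on most Linux systems), is to pass the count prefixed with @; -d is the short form of --date. The function name epoch_to_date is hypothetical, and -u is used so the result does not depend on the machine's time zone.

```shell
#!/bin/sh
# epoch_to_date is a hypothetical helper; it assumes GNU date.
epoch_to_date() {
    # "@N" means N seconds since 1970-01-01 00:00:00 UTC
    date -u -d "@$1" +%Y-%m-%d-%H-%M-%S
}

epoch_to_date 1468986421
# prints: 2016-07-20-03-47-01
```

The value 1468986421 is the one printed by date +%s in the transcript above. The result is in UTC, eight hours behind the 11:47:01 CST shown there.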
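Standard input/output redirection in shell:
By default, a command's standard output is displayed in the terminal. If you do not want it shown there, redirection sends what would have been printed to a file you specify instead.
Redirection operators: > overwrites, >> appends
Example: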
[root@hadoop33 mydata]# date +%Y-%m-%d-%H-%M-%S >> word.txt
[root@hadoop33 mydata]# more word.txt
2016-07-20-11-57-19
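The difference between > and >> is easiest to see by writing to the same file several times. This sketch uses a temporary file created with mktemp so that it does not touch any real data; the variable name out_file is hypothetical.

```shell
#!/bin/sh
out_file=$(mktemp)   # hypothetical scratch file, for illustration only

echo "first"  >  "$out_file"   # file contains: first
echo "second" >> "$out_file"   # file contains: first, second
echo "third"  >  "$out_file"   # overwrite: file contains only third
echo "fourth" >> "$out_file"   # file contains: third, fourth

cat "$out_file"
# prints:
# third
# fourth
```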
Using the crontab scheduler in shell:
(1) Edit with crontab -e. Each entry has 6 columns: minute, hour, day of month, month, day of week, command
(2) List with crontab -l
Example:
[root@hadoop11 ~]# crontab -l
*/5 * * * * date >> /usr/local/mydata/word2.txt
[root@hadoop11 ~]# more /usr/local/mydata/word2.txt
Tue Jul 19 14:05:01 CST 2016
Tue Jul 19 14:10:01 CST 2016
Tue Jul 19 14:15:01 CST 2016
Tue Jul 19 14:20:01 CST 2016
Tue Jul 19 14:25:01 CST 2016
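To make the six-column layout concrete, the sketch below splits the entry from the transcript above into its fields with the shell's read builtin. It only illustrates how the columns line up; it is not how cron itself parses the file. The variable names are hypothetical.

```shell
#!/bin/sh
entry='*/5 * * * * date >> /usr/local/mydata/word2.txt'

# read splits on whitespace; the last variable receives the rest of the line.
read -r minute hour day month weekday command <<EOF
$entry
EOF

echo "minute:       $minute"
echo "hour:         $hour"
echo "day of month: $day"
echo "month:        $month"
echo "day of week:  $weekday"
echo "command:      $command"
```

Here */5 in the minute column means every 5 minutes, which matches the 14:05, 14:10, 14:15 timestamps in word2.txt.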
if tests and for loops in shell:
Syntax (the C-style for ((...)) loop is a bash extension, not part of POSIX sh):
if [ ... ] ;then
...
fi
for ((i=0;i<10;i++))
do
...
done
Example:
[root@hadoop22 mydata]# more app5.sh
#!/bin/sh
# this script tests the use of if and for in shell
function dat()
{
# note the spacing around the brackets in the if test
if [ 3 > 56 ] ; then
echo "this is right"
else
echo "this is false"
fi
# test the for loop
for((i=0;i<10;i++))
do
echo $i
done
}
# call the function
dat
[root@hadoop22 mydata]# app5.sh
this is right
0
1
2
3
4
5
6
7
8
9
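Question: why does this print right? Inside [ ], > is not a comparison operator; the shell treats it as output redirection. The test that actually runs is [ 3 ], which is true because 3 is a non-empty string, and an empty file named 56 is created in the current directory. For a numeric comparison, use -gt: [ 3 -gt 56 ].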
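As a sketch of how the comparison in app5.sh could be written so that it compares numbers, the test below uses -gt (greater than) instead of >. It also shows a while loop that counts the same way as the C-style for loop but runs in any POSIX sh. The function names is_greater and count_up are hypothetical.

```shell
#!/bin/sh
# is_greater and count_up are hypothetical helpers, for illustration only.
is_greater() {
    # -gt compares integers; > inside [ ] would be a redirection
    if [ "$1" -gt "$2" ]; then
        echo "$1 is greater than $2"
    else
        echo "$1 is not greater than $2"
    fi
}

count_up() {
    # POSIX-compatible equivalent of: for ((i=0; i<$1; i++))
    i=0
    while [ "$i" -lt "$1" ]; do
        echo "$i"
        i=$((i + 1))
    done
}

is_greater 3 56    # prints: 3 is not greater than 56
is_greater 56 3    # prints: 56 is greater than 3
count_up 3         # prints 0, 1, 2 on separate lines
```

The related integer operators are -lt, -le, -ge, -eq, and -ne.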
User-defined functions in shell:
Syntax:
function function_name()
{
}
Example:
[root@hadoop33 mydata]# more app4.sh
#!/bin/sh
function dat()
{
echo "Current time:"
date
echo "Formatted current time:"
date +%Y-%m-%d-%H-%M-%S
echo "Seconds since the epoch:"
date +%s
echo "Specified time:"
date --date="1991-08-18 12:12:00" +%Y-%m-%d-%H-%M-%S
echo "Specified time:"
date --date="1 days ago" +%Y-%m-%d-%H-%M-%S
}
# call the function
dat
[root@hadoop33 mydata]# app4.sh
Current time:
Wed Jul 20 12:07:44 CST 2016
Formatted current time:
2016-07-20-12-07-44
Seconds since the epoch:
1468987664
Specified time:
1991-08-18-12-12-00
Specified time:
2016-07-19-12-07-44
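The dat function above takes no arguments. As a small sketch of two things functions are often used for, the example below passes an argument (read inside the function as $1) and uses the function's exit status as a true/false result. The names greet and is_even are hypothetical, and the functions are written in the POSIX form name() { ... }, which works without the function keyword.

```shell
#!/bin/sh
# greet and is_even are hypothetical helpers, for illustration only.
greet() {
    # $1 is the first argument passed to the function
    echo "hello, $1"
}

is_even() {
    # The exit status of the last command is the function's result:
    # 0 (success) when $1 is even, non-zero otherwise.
    [ $(( $1 % 2 )) -eq 0 ]
}

message=$(greet hadoop)   # capture what the function prints
echo "$message"           # prints: hello, hadoop

if is_even 10; then echo "10 is even"; fi      # prints: 10 is even
if ! is_even 7; then echo "7 is odd"; fi       # prints: 7 is odd
```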
If you have questions about any of the above, feel free to leave a comment!
Original article: http://blog.csdn.net/a2011480169/article/details/51968865