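Tags: sed awk
I. Regular expressions
Basic metacharacters (basic regular expressions, BRE):
Character matching:
. matches any single character except a newline
[] character class; most metacharacters lose their special meaning inside [] and need no escaping (the exceptions are a leading ^, a - between two characters, and ])
[^] negated character class: matches any character not listed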
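Repetition:
* matches the preceding character zero or more times
\? zero or one time (a GNU extension in basic regular expressions)
\{m,n\} at least m and at most n times
\{m,\} at least m times; \{m\} exactly m times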
Anchors:
\<,\b start of a word
\>,\b end of a word
^ start of a line
$ end of a line
^$ an empty line
.* any string, including the empty string
Grouping:
\(\)
\1,\2 back-references; \1 is the text matched by the first pair of parentheses
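As a rough illustration of the grouping and repetition syntax above, here is a small sketch in basic-regex mode. The sample strings and the variable names swapped and marked are made up for the demo.

```shell
# Swap two colon-separated words with \( \) groups and \1 \2 back-references.
swapped=$(printf 'hello:world\n' | sed 's/^\([a-z]*\):\([a-z]*\)$/\2:\1/')
echo "$swapped"

# \{3,\} means "three or more": the two-digit run is skipped,
# and the four-digit run is wrapped in <> (& stands for the whole match).
marked=$(printf 'a12b3456c\n' | sed 's/[0-9]\{3,\}/<&>/')
echo "$marked"
```

The first command should print world:hello and the second a12b<3456>c.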
Extended metacharacters (sed -r, extended regular expressions):
These need no backslash escape:
*
?
{m,n}
()
\1,\2
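+ matches the preceding character one or more times
| alternation (matches either side)
In my tests sed does not support lazy (non-greedy) quantifiers; POSIX regular expressions do not define them.
For more detail, see the tutorial "30-minute introduction to regular expressions":
http://www.cnblogs.com/deerchao/archive/2006/08/24/zhengzhe30fengzhongjiaocheng.html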
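Since lazy quantifiers are not available, a common workaround is a negated character class. The sketch below assumes a sed that accepts -E (the portable spelling of the -r used in these notes). The sample string and variable names are made up.

```shell
s='<b>one</b><b>two</b>'

# Greedy: .* runs to the last '>' on the line, so the whole line is one match.
greedy=$(printf '%s\n' "$s" | sed -E 's/<.*>/X/')

# Workaround: [^>]* cannot cross a '>', so the match stops at the first one.
short=$(printf '%s\n' "$s" | sed -E 's/<[^>]*>/X/')

echo "$greedy"
echo "$short"
```

The greedy version collapses the whole line to X, while the negated class replaces only the first tag and leaves Xone</b><b>two</b>.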
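II. Using awk
1. Referencing external (shell) variables
Format: awk -v var="$var" 'BEGIN{print var}'. A variable passed this way can be used in all three places: the BEGIN block, the main body, and the END block.
[root@localhost awk]# time=`date +%s`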
[root@localhost awk]# echo $time
1404716831
[root@localhost awk]# awk -v d=$time 'BEGIN{print d}'
1404716831
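To check that a -v variable really is visible in all three places, a minimal sketch (the variable names tag, t and out and the two input lines are made up):

```shell
tag=demo
out=$(printf 'a\nb\n' | awk -v t="$tag" '
    BEGIN { print "begin:" t }   # set before any input is read
          { print t ":" $0 }     # visible for every record
    END   { print "end:" t }     # still set after the last record
')
echo "$out"
```

This should print begin:demo, then demo:a and demo:b, then end:demo. Quoting "$tag" keeps a value containing spaces in one piece.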
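2. Built-in variables
FS: input field separator. The default is a single space, which splits on runs of blanks and tabs. FS may be a regular expression, as in the example below, which uses a [] character class as the separator.
NF: number of fields in the current record. $0 is the whole record, $1 the first field, and $NF the last field.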
[root@localhost awk]# cat b
oracle:x:500:500::/home/oracle:/bin/bash
[root@localhost awk]# awk 'BEGIN{FS="[:/]"}{for (i=1;i<=NF;i++){print $i}}' b
oracle
x
500
500
home
oracle
bin
bash
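One detail worth checking with a regex FS: adjacent separators such as :: or :/ produce empty fields, which print as blank lines and do not show up in the listing above. A small sketch that makes them visible by bracketing each field (the variable names line and fields are made up):

```shell
line='oracle:x:500:500::/home/oracle:/bin/bash'
fields=$(printf '%s\n' "$line" |
    awk -F'[:/]' '{ printf "NF=%d", NF
                    for (i = 1; i <= NF; i++) printf " [%s]", $i
                    print "" }')
echo "$fields"
```

This should report NF=11, with fields 5, 6 and 9 empty.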
NR: number of the current record, counted across all input files
FNR: number of the current record within the current file
[root@localhost awk]# cat c d
c1
c2
c3
c4
d5
d6
d7
d8
[root@localhost awk]# awk 'BEGIN{printf("%5s%5s%5s\n","data"," NR"," FNR")}{printf("%5s%5s%5s\n",$0,NR,FNR)}' c d
 data   NR  FNR
   c1    1    1
   c2    2    2
   c3    3    3
   c4    4    4
   d5    5    1
   d6    6    2
   d7    7    3
   d8    8    4
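NR and FNR come up often when processing several files, for example in the test NR==FNR. Combined with next, it lets you handle every record of the first file before touching the second.
next: compare the two commands below. Without next, awk reads a record, runs the first {} block, and then goes on to run the second {} block as well, so every line of the first file is printed twice.
With next: when next runs inside the first {} block, awk skips all remaining statements, reads the next record, and starts matching again from the first pattern.
[root@localhost awk]# awk 'NR==FNR{print $0;next}{print $0}' c d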
c1
c2
c3
c4
d5
d6
d7
d8
[root@localhost awk]# awk 'NR==FNR{print $0}{print $0}' c d
c1
c1
c2
c2
c3
c3
c4
c4
d5
d6
d7
d8
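The NR==FNR plus next pattern is most often used to load one file into an array and then filter a second file against it. A sketch of that idiom follows. The file names keep and data, their contents, and the array name seen are all made up for the demo.

```shell
dir=$(mktemp -d)
printf 'c1\nc2\nc3\n'       > "$dir/keep"   # hypothetical list of wanted keys
printf 'c2 x\nd5 y\nc3 z\n' > "$dir/data"   # hypothetical data to filter

# First file: NR==FNR is true, so remember each key, then next skips the rest.
# Second file: NR > FNR, so only the lookup pattern "$1 in seen" runs.
matched=$(awk 'NR == FNR { seen[$1]; next } $1 in seen' "$dir/keep" "$dir/data")
echo "$matched"

rm -r "$dir"
```

Only the lines c2 x and c3 z should come out. One caveat: if the first file is empty, NR==FNR stays true for the second file too, so the test no longer separates the two files.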
OFS: output field separator; the default is a space
ORS: output record separator; the default is a newline
[root@localhost awk]# cat b
oracle:x:500:500::/home/oracle:/bin/bash
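[root@localhost awk]# awk -v FS=":" -v OFS="=" '{print $1,$2,$3}' b
oracle=x=500
[root@localhost awk]# awk -v FS=":" '{print $1,$2,$3}' b
oracle x 500
Here the values of FS and OFS are passed with -v. In the next commands b holds three lines; the pattern 1 is always true, so awk '1' prints every record unchanged.
[root@localhost awk]# awk '1' b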
oracle:x:500:500::/home/oracle:/bin/bash
root:x:500:500::/home/oracle:/bin/bash
hxw168:x:500:500::/home/oracle:/bin/bash
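[root@localhost awk]# awk 'BEGIN{ORS="***";}{print $0}' b
oracle:x:500:500::/home/oracle:/bin/bash***root:x:500:500::/home/oracle:/bin/bash***hxw168:x:500:500::/home/oracle:/bin/bash***
RS: input record separator; the default is a newline
The example below sets RS to ***, which never occurs in the input, so the whole file is read into $0 as a single record. (POSIX leaves the behaviour of a multi-character RS unspecified; gawk treats it as a regular expression.)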
[root@localhost awk]# cat e
[a]
name=1
sex=2
age=3
[b]
address=gd sd
[root@localhost awk]# awk -v RS="***" '{print $0,"-----"NR}' e
[a]
name=1
sex=2
age=3
[b]
address=gd sd -----1
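Two behaviours of the separator variables above are easy to trip over, so here is a small sketch of both. OFS is only applied when awk rebuilds $0 (or when print is given a comma-separated list), and a single-character RS is the portable way to change the record separator. The sample strings and variable names are made up.

```shell
line='oracle:x:500'

# Printing an untouched $0 ignores OFS ...
plain=$(printf '%s\n' "$line" | awk -F: -v OFS='=' '{ print $0 }')

# ... while assigning to any field rebuilds $0 with OFS between the fields.
rebuilt=$(printf '%s\n' "$line" | awk -F: -v OFS='=' '{ $1 = $1; print $0 }')

# A single-character RS splits the input into records at that character.
records=$(printf 'a;b;c' | awk -v RS=';' '{ print NR ": " $0 }')

echo "$plain"
echo "$rebuilt"
echo "$records"
```

The first two commands should print oracle:x:500 and oracle=x=500. The third should number a, b and c as records 1 to 3.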
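3. Arrays
Arrays can be used directly with no prior declaration; an element that has not been assigned evaluates to 0 or to the empty string, depending on context.
split function: splits a string on the given separator, stores the pieces in an array, and returns the number of pieces.
The usual array loop:
for (index in array) # this form visits the elements in no particular order
{
print index, array[index];
}
For example: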
[root@localhost awk]# cat b
oracle:500:home:oracle:/bin/bash
[root@localhost awk]# awk '{i=split($0,a,":");for(x in a){print x,a[x]};}' b
4 oracle
5 /bin/bash
1 oracle
2 500
3 home
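When the order matters, the return value of split can drive a counting loop instead of for (x in a). A minimal sketch on the same sample line (the variable names line, ordered and n are made up):

```shell
line='oracle:500:home:oracle:/bin/bash'
ordered=$(printf '%s\n' "$line" |
    awk '{ n = split($0, a, ":")        # n is the number of pieces
           for (i = 1; i <= n; i++)     # split numbers the pieces from 1
               print i, a[i] }')
echo "$ordered"
```

This should list the five pieces in their original order, from 1 oracle to 5 /bin/bash.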
A common example:
Count the number of SSH logins from each IP address
[root@localhost log]# cat secure.3
Jun 16 10:04:47 localhost sshd[5360]: Accepted password for root from 10.9.11.44 port 7520 ssh2
Jun 16 10:04:47 localhost sshd[5360]: pam_unix(sshd:session): session opened for user root by (uid=0)
Jun 16 12:21:27 localhost sshd[5360]: pam_unix(sshd:session): session closed for user root
Jun 17 16:23:53 localhost sshd[9174]: Accepted password for root from 10.9.11.44 port 6651 ssh2
Jun 17 16:23:53 localhost sshd[9174]: pam_unix(sshd:session): session opened for user root by (uid=0)
Jun 17 19:00:09 localhost sshd[9174]: pam_unix(sshd:session): session closed for user root
Jun 18 09:22:33 localhost sshd[11487]: Accepted password for root from 10.9.11.44 port 58455 ssh2
Jun 18 09:22:33 localhost sshd[11487]: pam_unix(sshd:session): session opened for user root by (uid=0)
Jun 18 18:30:56 localhost sshd[11487]: pam_unix(sshd:session): session closed for user root
Jun 19 15:23:23 localhost sshd[16970]: Accepted password for root from 10.9.11.44 port 48345 ssh2
Jun 19 15:23:23 localhost sshd[16970]: pam_unix(sshd:session): session opened for user root by (uid=0)
Jun 19 18:59:00 localhost sshd[16970]: pam_unix(sshd:session): session closed for user root
Jun 20 09:24:57 localhost sshd[19425]: Accepted password for root from 10.9.11.44 port 5519 ssh2
Jun 20 09:24:57 localhost sshd[19425]: pam_unix(sshd:session): session opened for user root by (uid=0)
Jun 20 19:14:30 localhost sshd[19425]: pam_unix(sshd:session): session closed for user root
Jun 21 08:59:13 localhost sshd[22674]: Accepted password for root from 10.9.11.44 port 4640 ssh2
Jun 21 08:59:13 localhost sshd[22674]: pam_unix(sshd:session): session opened for user root by (uid=0)
Jun 21 15:23:28 localhost sshd[22674]: subsystem request for sftp
Jun 21 18:51:04 localhost sshd[22674]: pam_unix(sshd:session): session closed for user root
[root@localhost log]# cat secure.3 | awk '/Accepted/{a[$(NF-3)]++}END{for(x in a){print x,a[x]}}'
10.9.11.44 6
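With more than one client the for (x in a) loop prints the addresses in no particular order, so it can help to pipe the counts through sort. The sketch below reads the file directly instead of going through cat. The log is a shortened copy of the sample above, and the line from 10.9.11.45 is invented purely to give the sort something to do.

```shell
log=$(mktemp)
# The last line (10.9.11.45) is hypothetical and not part of the original log.
cat > "$log" <<'EOF'
Jun 16 10:04:47 localhost sshd[5360]: Accepted password for root from 10.9.11.44 port 7520 ssh2
Jun 16 10:04:47 localhost sshd[5360]: pam_unix(sshd:session): session opened for user root by (uid=0)
Jun 17 16:23:53 localhost sshd[9174]: Accepted password for root from 10.9.11.44 port 6651 ssh2
Jun 17 17:00:00 localhost sshd[9200]: Accepted password for root from 10.9.11.45 port 6000 ssh2
EOF

# Count per address, then sort numerically on the count, largest first.
counts=$(awk '/Accepted/ { n[$(NF-3)]++ } END { for (ip in n) print ip, n[ip] }' "$log" |
    sort -k2,2nr)
echo "$counts"

rm "$log"
```

On this shortened log the result should be 10.9.11.44 2 followed by 10.9.11.45 1.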
This article comes from the blog “尽管错,让我错到死!”; please keep this attribution: http://hxw168.blog.51cto.com/8718136/1435310