Shell Programming: Regular Expressions

Date: 2015-08-12 19:44:59

Tags: expressions, wildcards, strings, regular expressions, fuzzy matching

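Shell is best suited to batch processing and automation; developing larger programs in it is a headache. Below is a summary of common regular expression usage: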

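I. Regular expressions

1. What is a regular expression?

A regular expression is a syntax for describing character sequences and match patterns. It is used to split, match, search, and replace strings.

In practice it is used mostly for fuzzy matching; splitting, searching, and replacing are somewhat less common.

2. Regular expressions vs. wildcards

Regular expressions match strings inside files, and the match is a containing match: a line matches if the pattern occurs anywhere in it. Commands such as grep, awk, and sed support regular expressions and use them to match text (data).
Wildcards match file names, and the match is an exact match: the pattern must cover the whole name. Commands such as ls, find (with -name), and cp do not take regular expressions, so they can only use the shell's own wildcards, which match file names exactly.

* matches any string of characters, including an empty one

? matches exactly one character

[ ] matches exactly one of the characters inside the brackets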

[root@localhost ~]# touch cangls
[root@localhost ~]# touch cangll
[root@localhost ~]# ls
anaconda-ks.cfg   cangls  cangll  install.log
[root@localhost ~]# ls cangl?
cangls  cangll
[root@localhost ~]# ls cang??
cangls  cangll
[root@localhost ~]# ls cang*
cangls  cangll
[root@localhost ~]# ls cangl[ls]
cangls  cangll

Telling regular expressions and wildcards apart:

[root@localhost ~]# touch abc
[root@localhost ~]# touch abcd
[root@localhost ~]# ls
anaconda-ks.cfg   cangls  cangll  install.log  abc  abcd
[root@localhost ~]# find . -name abc
./abc

# abcd is not listed: this is an exact match, not a containing match

[root@localhost ~]# find . -name "abc?"
./abcd
[root@localhost ~]# find . -name "abc*"
./abc
./abcd
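
The difference between exact (wildcard) matching and containing (regular expression) matching can be checked with a small sketch like the one below. It assumes a throwaway directory created with mktemp; the variable names are made up for this illustration.

```shell
# Throwaway directory so the demo does not touch real files.
workdir=$(mktemp -d)
touch "$workdir/abc" "$workdir/abcd"

# Wildcard: the pattern has to cover the whole file name.
glob_exact=$(cd "$workdir" && find . -name 'abc' | sort | tr '\n' ' ')
glob_star=$(cd "$workdir" && find . -name 'abc*' | sort | tr '\n' ' ')

# Regular expression: the pattern only has to occur somewhere in the line.
regex_contains=$(ls "$workdir" | grep 'abc' | tr '\n' ' ')

echo "find -name 'abc'  -> $glob_exact"
echo "find -name 'abc*' -> $glob_star"
echo "ls | grep 'abc'   -> $regex_contains"
```

Quoting the pattern passed to find keeps the shell from expanding the wildcard before find sees it.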

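3. Basic regular expressions

Regular expressions come in two flavors: basic and extended.

Basic regular expressions:

$: matches the end of a line and is always placed at the end of the pattern; for example, Hello$ matches lines that end with Hello

In Linux, ? and ( ) actually belong to the extended regular expressions.

Details:

1) "*" matches the preceding character zero or more times

"a*": matches every line, including blank lines

"aa*": matches lines that contain at least one a

"aaa*": matches strings that contain at least two consecutive a's

"aaaaa*": matches strings that contain at least four consecutive a's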

Example:

[root@localhost ~]# vim test.txt
a
aa
aaa
aaaa
aaaaa
 
b
bb
bbb
bbbb
bbbbb
ab
aabb
~
:wq
[root@localhost ~]# vi .bashrc
# .bashrc
# User specific aliases and function
alias rm='rm -i'
alias cp='cp -i'
alias mv='mv -i'
alias vi='vim'
alias grep='grep --color=auto'
…
:wq

# Define the aliases in this file

[root@localhost ~]# source .bashrc
# Apply the changes
[root@localhost ~]# grep "a*" test.txt
# A * after a character means that character repeats zero or more times
a
aa
aaa
aaaa
aaaaa    # the a's in the lines above are highlighted in red
 
b
bb
bbb
bbbb
bbbbb
ab
aabb

[root@localhost ~]# grep "aa*" test.txt
# Matches at least one a, followed by any number of a's; blank lines do not match
a
aa
aaa
aaaa
aaaaa
ab
aabb

# Every a that appears is highlighted in red

[root@localhost ~]# grep "aaaaa*" test.txt
aaaa
aaaaa
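
As a quick check of the "*" rules above, the following sketch counts matching lines with grep -c. The sample file is a temporary stand-in for test.txt, and the variable names are hypothetical.

```shell
# Small stand-in for test.txt, including one empty line.
sample=$(mktemp)
printf '%s\n' a aa aaa aaaa aaaaa '' b bb ab aabb > "$sample"

count_star=$(grep -c 'a*' "$sample")      # a* may match zero a's, so every line matches
count_one=$(grep -c 'aa*' "$sample")      # one a, then zero or more a's
count_four=$(grep -c 'aaaaa*' "$sample")  # four a's, then zero or more a's

echo "a*     -> $count_star lines"
echo "aa*    -> $count_one lines"
echo "aaaaa* -> $count_four lines"
```

The last count is 2 rather than 1 because the final a in "aaaaa*" is optional, so aaaa matches as well as aaaaa.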

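2) "." matches any single character except a newline

"s..d": matches words that have exactly two characters between s and d

"s.*d": matches any number of characters between s and d

".*": matches everything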

[root@localhost ~]# vi test.txt
a
aa
aaa
aaaa
aaaaa
 
said
soid
suud
sooooood
 
b
bb
bbb
bbbb
bbbbb
ab
aabb
~
:wq
[root@localhost ~]# grep "s..d" test.txt
said
soid
suud
[root@localhost ~]# grep "s.*d" test.txt
said
soid
suud
sooooood
[root@localhost ~]# grep ".*" test.txt
a
aa
aaa
aaaa
aaaaa
 
said
soid
suud
sooooood
 
b
bb
bbb
bbbb
bbbbb
ab
aabb

# Everything is highlighted in red; blank lines, of course, have no color
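
A minimal sketch of the "." patterns, again on a temporary sample file with hypothetical variable names:

```shell
sample=$(mktemp)
printf '%s\n' said soid suud sooooood sd '' > "$sample"

# Exactly two characters between s and d.
exactly_two=$(grep 's..d' "$sample" | tr '\n' ' ')
# Any number of characters (including none) between s and d.
any_between=$(grep 's.*d' "$sample" | tr '\n' ' ')
# .* matches every line, even the empty one.
everything=$(grep -c '.*' "$sample")

echo "s..d -> $exactly_two"
echo "s.*d -> $any_between"
echo ".*   -> $everything lines"
```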

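3). "^" matches the start of a line, "$" matches the end of a line

"^M": matches lines that begin with an uppercase "M"

"n$": matches lines that end with a lowercase "n"

"^$": matches blank lines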

[root@localhost ~]# vim test.txt
a
aa
aaa
aaaa
aaaaa
 
said
soid
suud
sooooood
 
b
bb
bbb
bbbb
bbbbb
ab
aabb
~
:wq
[root@localhost ~]# grep "^s" test.txt
said
soid
suud
sooooood
[root@localhost ~]# grep "b$" test.txt
b
bb
bbb
bbbb
bbbbb
ab
aabb
[root@localhost ~]# grep "^$" test.txt
 
 
[root@localhost ~]# grep -n "^$" test.txt
6:
11:
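
The anchors can be tried out with a sketch like this one. The sample data and variable names are assumptions made for the illustration.

```shell
sample=$(mktemp)
printf '%s\n' said '' bb ab '' sab > "$sample"

starts_with_s=$(grep '^s' "$sample" | tr '\n' ' ')   # lines that begin with s
ends_with_b=$(grep 'b$' "$sample" | tr '\n' ' ')     # lines that end with b
blank_lines=$(grep -n '^$' "$sample" | tr '\n' ' ')  # line numbers of empty lines

echo "^s -> $starts_with_s"
echo "b\$ -> $ends_with_b"
echo "^\$ -> $blank_lines"
```

Note that "^$" only matches lines that are truly empty; a line containing a single space does not match.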

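4). "[]" matches any one of the characters listed in the brackets, and only one character

"s[ao]id": matches either a or o between the letters s and i

"[0-9]": matches any single digit

"^[a-z]": matches lines that begin with a lowercase letter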

[root@localhost ~]# grep -n "s[ao]id" test.txt
7:said
8:soid
[root@localhost ~]# vim test.txt
a
aa
aaa
aaaa
aaaaa
 
said
soid
suud
sooooood
1234567
890
2aa
cc22
 
b
bb
bbb
bbbb
bbbbb
ab
aabb
~
:wq
[root@localhost ~]# grep "[0-9]" test.txt
1234567
890
2aa
cc22

# Only part of each line matches, but grep still prints the whole line

[root@localhost ~]# grep "^[0-9]" test.txt
1234567
890
2aa
# cc22 is not listed
[root@localhost ~]# grep "[0-9]$" test.txt
1234567
890
cc22

# 2aa does not match

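4). "[^]": matches any one character except those listed in the brackets

"^[^a-z]": matches lines that do not begin with a lowercase letter

"^[^a-zA-Z]": matches lines that do not begin with a letter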

[root@localhost ~]# grep "[a-z]" test.txt
# The line must contain at least one lowercase letter
a
aa
aaa
aaaa
aaaaa
said
soid
suud
sooooood
2aa
cc22
b
bb
bbb
bbbb
bbbbb
ab
aabb
[root@localhost ~]# grep "[^a-z]" test.txt
1234567
890
2aa
cc22
[root@localhost ~]# grep "^[^a-z]" test.txt
1234567
890
2aa

# Note: in some regular expression implementations [A-z] stands for all letters, but in Linux you should write [a-zA-Z] to mean all letters
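
The bracket expressions from the last two sections can be compared side by side with the sketch below. It sets LC_ALL=C because the meaning of ranges such as [a-z] can depend on the locale; that setting, the sample file, and the variable names are choices made for this example, not something the original post prescribes.

```shell
# Ranges like [a-z] can vary with the locale; C gives plain ASCII ordering.
export LC_ALL=C

sample=$(mktemp)
printf '%s\n' said soid suud 1234567 2aa cc22 > "$sample"

a_or_o=$(grep 's[ao]id' "$sample" | tr '\n' ' ')           # a or o between s and i
starts_with_digit=$(grep '^[0-9]' "$sample" | tr '\n' ' ')  # first character is a digit
ends_with_digit=$(grep '[0-9]$' "$sample" | tr '\n' ' ')    # last character is a digit
not_lower_start=$(grep '^[^a-z]' "$sample" | tr '\n' ' ')   # first character is not a-z

echo "s[ao]id -> $a_or_o"
echo "^[0-9]  -> $starts_with_digit"
echo "[0-9]\$  -> $ends_with_digit"
echo "^[^a-z] -> $not_lower_start"
```

The two uses of ^ are different: outside the brackets it anchors the match to the start of the line, inside the brackets it negates the character list.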

5). "\" is the escape character

"\.$": matches lines that end with a literal "."

[root@localhost ~]# vim test.txt
a
aa
aaa
aaaa
aaaaa
 
said
soid
suud
sooooood.
1234567
890
2aa
cc22
 
b
bb
bbb
bbbb
bbbbb
ab
aabb
sab
saabb
~
:wq
[root@localhost ~]# grep ".$" test.txt
a
aa
aaa
aaaa
aaaaa
said
soid
suud
sooooood.
1234567
890
2aa
cc22
b
bb
bbb
bbbb
bbbbb
ab
aabb
sab
saabb

# An unescaped . matches any character, so every non-empty line matches

[root@localhost ~]# grep "\.$" test.txt
sooooood.
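
The effect of the backslash can be seen in a small sketch (temporary sample file, hypothetical variable names):

```shell
sample=$(mktemp)
printf '%s\n' said 'sooooood.' 890 '' > "$sample"

# Unescaped dot: any last character, so all non-empty lines match.
any_last_char=$(grep -c '.$' "$sample")
# Escaped dot: only lines whose last character is a literal period.
literal_dot=$(grep '\.$' "$sample")

echo ".\$  -> $any_last_char lines"
echo "\\.\$ -> $literal_dot"
```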

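6). "\{n\}" means the preceding character occurs exactly n times

"a\{3\}": matches strings in which the letter a occurs three times in a row

"[0-9]\{3\}": matches strings that contain three consecutive digits

[root@localhost ~]# grep "a\{3\}" test.txt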
aaa
aaaa
aaaaa

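# Longer runs of a still match because they contain three consecutive a's; to match exactly three, the pattern has to be delimited more precisely, for example with anchors

[root@localhost ~]# grep "a\{2\}b" test.txt
aabb
saabb

# Both lines contain aa immediately followed by b

[root@localhost ~]# grep "sa\{2\}b" test.txt
saabb

# aabb is not listed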
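
One way to add the delimiters mentioned above is to anchor the pattern at both ends of the line. The sketch below is an illustration on a temporary sample file; the variable names are made up.

```shell
sample=$(mktemp)
printf '%s\n' a aa aaa aaaa aaaaa aabb saabb > "$sample"

# Unanchored: any line containing three consecutive a's.
unanchored=$(grep 'a\{3\}' "$sample" | tr '\n' ' ')
# Anchored at both ends: the line is exactly three a's.
exactly_three=$(grep '^a\{3\}$' "$sample" | tr '\n' ' ')
# Anchored range: the line is two to four a's and nothing else.
two_to_four=$(grep '^a\{2,4\}$' "$sample" | tr '\n' ' ')

echo "a\\{3\\}     -> $unanchored"
echo "^a\\{3\\}\$   -> $exactly_three"
echo "^a\\{2,4\\}\$ -> $two_to_four"
```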

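7). "\{n,\}" means the preceding character occurs at least n times

"^[0-9]\{3,\}[a-z]": matches lines that begin with at least three consecutive digits followed by a lowercase letter

8). "\{n,m\}" matches the preceding character at least n and at most m times

"sa\{1,3\}b": matches at least one and at most three a's between the letters s and b

[root@localhost ~]# grep "sa\{1,2\}b" test.txt
sab
saabb

# A regular expression can be used to match a keyword belonging to a service, and so to check whether that service is running normally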

Other examples:

[0-9]\{4\}-[0-9]\{2\}-[0-9]\{2\}: matches the date format YYYY-MM-DD

[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}: matches an IP address (dotted decimal)

[root@localhost ~]# vim date.txt
2015-08-10
20150810
192.168.3.10
127.0.0.1
123456789
:wq
[root@localhost ~]# grep "[0-9]\{4\}-[0-9]\{2\}-[0-9]\{2\}" date.txt
2015-08-10
[root@localhost ~]# grep "[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}" date.txt
192.168.3.10
127.0.0.1
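
The date and IP patterns can be made stricter by anchoring them, as in the following sketch. It only checks the shape of the text (digit counts and separators), not whether the month, day, or each IP octet is in a valid range; the variable names are hypothetical.

```shell
sample=$(mktemp)
printf '%s\n' 2015-08-10 20150810 192.168.3.10 127.0.0.1 123456789 > "$sample"

# Single quotes keep the backslashes intact for grep.
date_re='^[0-9]\{4\}-[0-9]\{2\}-[0-9]\{2\}$'
ip_re='^[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}$'

dates=$(grep "$date_re" "$sample" | tr '\n' ' ')
ips=$(grep "$ip_re" "$sample" | tr '\n' ' ')

echo "dates -> $dates"
echo "ips   -> $ips"
```

Escaping the dots matters: an unescaped "." would also accept any other character between the digit groups.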

II. Text extraction commands

An example to start with: extracting the ordinary users

[root@localhost ~]# useradd user1
[root@localhost ~]# useradd user2
[root@localhost ~]# grep "/bin/bash" /etc/passwd
root:x:0:0:root:/root:/bin/bash
user1:x:500:500::/home/user1:/bin/bash
user2:x:501:501::/home/user2:/bin/bash
[root@localhost ~]# grep "/bin/bash" /etc/passwd | grep -v "root"
user1:x:500:500::/home/user1:/bin/bash
user2:x:501:501::/home/user2:/bin/bash

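1. The cut field extraction command

Format: [root@localhost ~]# cut [options] filename

Options:

-f column-number: which column to extract

-d delimiter: split columns on the specified delimiter

cut -d ":" -f 1 /etc/passwd

# Note: grep extracts rows, cut extracts columns. cut can split on a delimiter character, but it does not cope well with columns separated by runs of spaces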

Example 1: basic cut usage

[root@localhost ~]#  vi student.txt
ID    Name   gender    Mark
1     furong   F        85
2     fengj    F        60
3     cang     F        70

# The columns are separated by tabs

[root@localhost ~]#  cut -f 2 student.txt
Name
furong
fengj
cang
[root@localhost ~]#  cut -f 2,4 student.txt
Name       Mark
furong      85
fengj         60
cang          70
[root@localhost ~]# grep "/bin/bash" /etc/passwd | grep -v "root"
user1:x:500:500::/home/user1:/bin/bash
user2:x:501:501::/home/user2:/bin/bash
[root@localhost ~]# grep "/bin/bash" /etc/passwd | grep -v "root" | cut -f 1 -d ":"
user1
user2

# Only the two user names are extracted
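
The same pipeline can be tried without touching the real /etc/passwd. The sketch below works on a made-up sample file and anchors both patterns, which is a small variation on the commands above: "/bin/bash$" only matches the shell field at the end of the line, and "^root:" only excludes the root account itself rather than every line that happens to contain the word root.

```shell
# Hypothetical sample in /etc/passwd format.
passwd_sample=$(mktemp)
cat > "$passwd_sample" <<'EOF'
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
user1:x:500:500::/home/user1:/bin/bash
user2:x:501:501::/home/user2:/bin/bash
EOF

# Rows: users whose login shell is bash, except root. Column: field 1.
users=$(grep '/bin/bash$' "$passwd_sample" | grep -v '^root:' | cut -d ':' -f 1 | tr '\n' ' ')

echo "ordinary users -> $users"
```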

Example 2: using cut to look at partition usage

[root@localhost ~]# df
Filesystem    1K-blocks   Used        Available   Use%         Mounted on
/dev/sda5     17906468    1793848     15203004      11%          /
tmpfs        515396      0         515396      0%           /dev/shm
/dev/sda1     198337      26361      161736       15%          /boot
/dev/sda2     2015824     35808      1877616       2%           /home
[root@localhost ~]# df -h
Filesystem   Size       Used        Avail      Use%          Mounted on
/dev/sda5    18G        1.8G        15G       11%           /
tmpfs       504M        0         504M       0%           /dev/shm
/dev/sda1    194M        26M        158M      15%          /boot
/dev/sda2    2.0G        35M        1.8G       2%          /home
# -h: human-readable output
[root@localhost ~]#  df -h | cut -f 5
Filesystem   Size       Used        Avail      Use%          Mounted on
/dev/sda5    18G        1.8G        15G       11%           /
tmpfs       504M        0         504M       0%           /dev/shm
/dev/sda1    194M        26M        158M      15%          /boot
/dev/sda2    2.0G        35M        1.8G       2%          /home
[root@localhost ~]#  df -h | cut -f 1
Filesystem   Size       Used        Avail      Use%          Mounted on
/dev/sda5    18G        1.8G        15G       11%           /
tmpfs       504M        0         504M       0%           /dev/shm
/dev/sda1    194M        26M        158M      15%          /boot
/dev/sda2    2.0G        35M        1.8G       2%          /home

# Why? The columns in this output are separated by spaces, not tabs, so cut finds no delimiter and prints each line unchanged, as if it were a single column
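
Two common workarounds for space-aligned output are sketched below: squeezing runs of spaces with tr -s before calling cut, or letting awk split on whitespace. The sample text stands in for real df -h output, and the variable names are made up for the example.

```shell
# Stand-in for `df -h` output, aligned with spaces.
df_sample=$(mktemp)
cat > "$df_sample" <<'EOF'
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda5        18G  1.8G   15G  11% /
tmpfs           504M     0  504M   0% /dev/shm
EOF

# Option 1: collapse repeated spaces, then cut on a single space.
use_cut=$(tr -s ' ' < "$df_sample" | cut -d ' ' -f 5 | tr '\n' ' ')
# Option 2: awk splits on any run of blanks by default.
use_awk=$(awk '{print $5}' "$df_sample" | tr '\n' ' ')

echo "tr + cut -> $use_cut"
echo "awk      -> $use_awk"
```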

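2. The printf command

printf is not really a text extraction command, but it is used inside awk; it produces formatted output

Format: printf 'output-type output-format' content

Output types:

%ns: outputs a string; n is a number giving the field width in characters

%ni: outputs an integer; n is a number giving the field width in digits

%m.nf: outputs a floating-point number; m is the total field width and n is the number of decimal places. For example, %8.2f prints a field 8 characters wide with 2 decimal places; the decimal point takes one position, leaving 5 for the integer part

Output formats:

\a: sounds the alert (bell)

\b: backspace, that is, the Backspace key

\f: form feed

\n: newline

\r: carriage return

\t: horizontal tab, that is, the Tab key

\v: vertical tab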

[root@localhost ~]# echo 123456
123456
[root@localhost ~]# echo 1 2 3 4 5 6
1 2 3 4 5 6
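[root@localhost ~]# printf 1 2 3 4 5 6
1[root@localhost ~]#

# Only "1" is printed: the first argument is taken as the format, and it contains no output type to consume the other arguments

[root@localhost ~]# printf %s 1 2 3 4 5 6
123456[root@localhost ~]#  printf %s %s %s 1 2 3 4 5 6

# The quotes cannot be omitted: without them only the first %s is the format, and the other two are printed as data

%s%s123456[root@localhost ~]#  printf '%s %s %s' 1 2 3 4 5 6
1 2 34 5 6[root@localhost ~]#  printf '%s\t%s\t%s\t\n' 1 2 3 4 5 6
1   2   3
4   5   6
[root@localhost ~]#  printf student.txt
student.txt
[root@localhost ~]#  cat student.txt | printf
printf: usage: printf [-v var] format [arguments]

# printf does not read data from a pipe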

[root@localhost ~]#  cat student.txt
ID    Name   gender    Mark
1     furong   F        85
2     fengj    F        60
3     cang     F        70
[root@localhost ~]#  printf $(cat student.txt)
ID[root@localhost ~]#  printf '%s' $(cat student.txt)
IDNamegenderMark1furongF852fengjF603cangF70[root@localhost ~]# 
[root@localhost ~]#  printf '%s\t%s\t%s\t%s\n' $(cat student.txt)
ID    Name   gender    Mark
1     furong   F        85
2     fengj    F        60
3     cang     F        70

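Addendum: awk supports both print and printf for output

print: print automatically adds a newline after each output (Linux has no standalone print command by default)

printf: printf is the standard formatted output command and does not add a newline automatically; add one to the format string by hand when it is needed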
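
The behaviour described in this section can be checked with the sketch below. It sets LC_ALL=C so that the decimal point is "." regardless of the system locale; that setting, the sample file, and the variable names are assumptions of this example.

```shell
# Make sure the decimal separator is "." for the %f example.
export LC_ALL=C

# The format is reused until all arguments are consumed: two rows of three.
table=$(printf '%s\t%s\t%s\n' 1 2 3 4 5 6)

# Width and precision: right-aligned, left-aligned, zero-padded, float.
padded=$(printf '[%5s][%-5s][%05d][%8.2f]' ab cd 42 3.14159)

# printf does not read standard input, so file contents are passed
# as arguments through command substitution (split on whitespace).
student=$(mktemp)
printf 'ID\tName\tgender\tMark\n1\tfurong\tF\t85\n2\tfengj\tF\t60\n' > "$student"
rows=$(printf '%s\t%s\t%s\t%s\n' $(cat "$student"))

echo "$table"
echo "$padded"
echo "$rows"
```

The unquoted $(cat ...) is deliberate here, because word splitting is what turns the file into separate arguments; it would break on fields that contain spaces.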

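3. The awk command

1). Format: awk 'condition1 {action1} condition2 {action2} …' filename

Condition (pattern):

Relational expressions are normally used as conditions

x>10 tests whether the variable x is greater than 10

x>=10 tests whether x is greater than or equal to 10

x<=10 tests whether x is less than or equal to 10

Action:

Formatted output

Flow control statements

Functions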

[root@localhost ~]#  vi student.txt
ID    Name   gender    Mark
1     furong   F        85
2     fengj    F        60
3     cang     F        70
[root@localhost ~]# cut -f 2,4 student.txt
Name  Mark
furong 85
fengj 60
cang  70

Example: using awk to select columns 2 and 4

[root@localhost ~]# awk '{printf $2 "\t" $4 "\n"}' student.txt
# With no condition the action runs for every line; $2 and $4 are columns 2 and 4
Name  Mark
furong 85
fengj  60
cang   70

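# Note that \t and \n must be enclosed in double quotes. What awk extracts are called fields: it assigns the second field of the first line to $2, and repeating this for every line produces a column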

Name 
furong
fengj   
cang

[root@localhost ~]# df -h
Filesystem   Size       Used        Avail      Use%          Mounted on
/dev/sda5    18G        1.8G        15G       11%           /
tmpfs       504M        0         504M       0%           /dev/shm
/dev/sda1    194M        26M        158M      15%          /boot
/dev/sda2    2.0G        35M        1.8G       2%          /home
[root@localhost ~]# df -h | grep "/dev/sda5"
/dev/sda5      18G      1.8G       15G         11%        /
[root@localhost ~]# df -h | grep "/dev/sda5" | awk '{print $1 "\t" $5}'
/dev/sda5    11%
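
As a variation on the pipeline above, awk can select the row itself, which makes the separate grep unnecessary. The sketch below runs on sample text standing in for df -h output (the names are hypothetical). It also passes the fields to printf as arguments after a fixed format string, which avoids surprises when a field such as 11% contains a percent sign.

```shell
df_sample=$(mktemp)
cat > "$df_sample" <<'EOF'
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda5        18G  1.8G   15G  11% /
tmpfs           504M     0  504M   0% /dev/shm
/dev/sda1       194M   26M  158M  15% /boot
EOF

# Condition on the first field instead of a separate grep.
root_usage=$(awk '$1 == "/dev/sda5" {print $1 "\t" $5}' "$df_sample")

# Fixed format string; the data is passed as printf arguments.
all_usage=$(awk 'NR > 1 {printf "%s\t%s\n", $1, $5}' "$df_sample")

echo "$root_usage"
echo "$all_usage"
```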

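2). The BEGIN condition

awk 'BEGIN{printf "This is a transcript \n"} {printf $2 "\t" $4 "\n"}' student.txt

# The first action has the condition BEGIN and runs once, before any input is read; the second has no condition and runs for every line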

[root@localhost ~]# cat student.txt
ID    Name   gender    Mark
1     furong   F        85
2     fengj    F        60
3     cang     F        70
[root@localhost ~]# awk 'BEGIN{printf "This is a transcript \n"} {printf $2 "\t" $4 "\n"}' student.txt
This is a transcript
Name  Mark
furong  85
fengj  60
cang   70

3). END

awk 'END{printf "The End \n"} {printf $2 "\t" $4 "\n"}' student.txt

[root@localhost ~]# awk 'END{printf "The End \n"} {printf $2 "\t" $4 "\n"}' student.txt
Name  Mark
furong  85
fengj  60
cang   70
The End
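
BEGIN and END are often combined in one program: BEGIN prints a heading, the main action handles each row, and END prints a summary. The sketch below adds up the Mark column; the NR > 1 condition (skip the header row) and the total variable are additions made for this illustration and do not appear in the examples above.

```shell
student=$(mktemp)
printf 'ID\tName\tgender\tMark\n1\tfurong\tF\t85\n2\tfengj\tF\t60\n3\tcang\tF\t70\n' > "$student"

report=$(awk '
    BEGIN  { print "This is a transcript" }   # runs once, before any input
    NR > 1 { print $2 "\t" $4; total += $4 }  # runs for every row after the header
    END    { print "Total\t" total }          # runs once, after the last row
' "$student")

echo "$report"
```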

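4). The FS built-in variable

# By default awk splits fields on tabs or spaces; to extract fields from the colon-separated passwd file, a different separator is needed

cat /etc/passwd | grep "/bin/bash" | awk 'BEGIN{FS=":"}{printf $1 "\t" $3 "\n"}'

This splits fields on the colon ":"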

[root@localhost ~]# cat /etc/passwd | grep /bin/bash
root:x:0:0:root:/root:/bin/bash
user1:x:500:500::/home/user1:/bin/bash
user2:x:501:501::/home/user2:/bin/bash
[root@localhost ~]# cat /etc/passwd | grep /bin/bash | awk '{FS=":"}{print $1 "\t" $3}'
root:x:0:0:root:/root:/bin/bash
user1    500
user2    501

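# Reason: awk reads and splits a line before it runs the action, so the first line had already been split by the time FS was assigned

[root@localhost ~]# cat /etc/passwd | grep /bin/bash | awk 'BEGIN{FS=":"}{print $1 "\t" $3}'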
root   0
user1 500
user2 501
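
The timing issue with FS can be reproduced on a sample file, as in the sketch below. It also shows the -F command-line option, which sets the separator before any input is read and so behaves like the BEGIN version. The file contents and variable names are made up for the example.

```shell
passwd_sample=$(mktemp)
cat > "$passwd_sample" <<'EOF'
root:x:0:0:root:/root:/bin/bash
user1:x:500:500::/home/user1:/bin/bash
user2:x:501:501::/home/user2:/bin/bash
EOF

# FS assigned in a normal action: too late for the first line.
late=$(awk '{FS=":"} {print $1 "\t" $3}' "$passwd_sample")
# FS assigned in BEGIN: in effect before the first line is split.
early=$(awk 'BEGIN {FS=":"} {print $1 "\t" $3}' "$passwd_sample")
# -F on the command line has the same effect as the BEGIN assignment.
flag=$(awk -F ':' '{print $1 "\t" $3}' "$passwd_sample")

printf 'late:\n%s\n\nearly:\n%s\n' "$late" "$early"
```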

5). Relational operators

cat student.txt | grep -v Name | awk '$4>=70{printf $2 "\n"}'
[root@localhost ~]# cat student.txt
ID    Name   gender    Mark
1     furong   F        85
2     fengj    F        60
3     cang     F        70
[root@localhost ~]#  cat student.txt | grep -v Name
1     furong   F        85
2     fengj    F        60
3     cang     F        70
[root@localhost ~]#  cat student.txt | grep -v Name | awk '$4>=70{print $2}'
furong
cang
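
The header row is filtered out above because "Mark" is not a number: awk then compares it with 70 as a string, and the header passes the test. A sketch of an awk-only alternative uses the built-in record number NR to skip the first line. LC_ALL=C is set here so the string comparison is plain byte order; that setting and the variable names are assumptions of this example.

```shell
export LC_ALL=C

student=$(mktemp)
printf 'ID\tName\tgender\tMark\n1\tfurong\tF\t85\n2\tfengj\tF\t60\n3\tcang\tF\t70\n' > "$student"

# Without a header filter, "Mark" >= 70 is a string comparison and succeeds.
with_header=$(awk '$4 >= 70 {print $2}' "$student" | tr '\n' ' ')
# NR > 1 skips the header, so only numeric rows are compared.
passed=$(awk 'NR > 1 && $4 >= 70 {print $2}' "$student" | tr '\n' ' ')

echo "no filter -> $with_header"
echo "NR > 1    -> $passed"
```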

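4. The sed command

A string replacement command. sed is a lightweight stream editor available on almost all UNIX platforms (including Linux). It is mainly used to select, replace, delete, and add data

Format: sed [options] '[action]' filename

Options:

-n: sed normally prints all of the data to the screen; with this option only the lines handled by an sed command (such as p) are printed

-e: allows several sed commands (actions) to be applied to the input

-i: writes the result of the sed changes back to the file being read instead of printing it to the screen; this modifies the original file

Without -i, sed does not modify the original data

Actions:

a: append; adds one or more lines after the current line

c: change; replaces the original line with the text that follows c

i: insert; adds one or more lines before the current line

p: print; outputs the specified lines

s: substitute; replaces one string with another. The format is "line-range s/old-string/new-string/g" (similar to the substitute format in vim)

d: delete; removes the specified lines from the output without modifying the file itself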

[root@localhost ~]# vi student.txt
ID    Name   gender    Mark
1     furong   F        85
2     fengj    F        60
3     cang     F        70

1). Line operations

sed '2p' student.txt
# Show the second line of the file
sed -n '2p' student.txt
[root@localhost ~]# sed '2p' student.txt
ID    Name   gender    Mark
1     furong   F        85
1     furong   F        85
2     fengj    F        60
3     cang     F        70

# Without -n every line is still printed, so line 2 appears twice

[root@localhost ~]# sed -n '2p' student.txt
1     furong   F        85
[root@localhost ~]# sed ‘2d’ student.txt
ID    Name   gender    Mark
2     fengj    F        60
3     cang     F        70
[root@localhost ~]# sed '2,4d' student.txt
ID    Name   gender    Mark

# Deletes lines 2 through 4 from the output, without modifying the file itself

[root@localhost ~]# sed '2a Life is Short' student.txt
ID    Name   gender    Mark
1     furong   F        85
Life is Short
2     fengj    F        60
3     cang     F        70

# The text is added after the original second line

[root@localhost ~]# sed '2i Life is Short' student.txt
ID    Name   gender    Mark
Life is Short
1     furong   F        85
2     fengj    F        60
3     cang     F        70

# The text is added before the original second line

[root@localhost ~]# sed '2c Life is Short' student.txt
ID    Name   gender    Mark
Life is Short
2     fengj    F        60
3     cang     F        70

# The second line is replaced completely

2). String replacement

Format: sed 'line-number s/old-string/new-string/g' filename

sed '3s/60/99/g' student.txt; replaces 60 with 99 on line 3

sed -i '3s/60/99/g' student.txt; writes the result of the sed operation directly to the file

sed -e 's/fengj//g;s/cang//g' student.txt; replaces both "fengj" and "cang" with nothing

[root@localhost ~]# sed '4s/70/100/g' student.txt
ID    Name   gender    Mark
1     furong   F        85
2     fengj    F        60
3     cang     F        100
[root@localhost ~]# sed -i '4s/70/100/g' student.txt
[root@localhost ~]#
[root@localhost ~]# cat student.txt
ID    Name   gender    Mark
1     furong   F        85
2     fengj    F        60
3     cang     F        100
[root@localhost ~]# sed -e 's/furong//g;s/fengj//g' student.txt
ID    Name   gender    Mark
1            F        85
2            F        60
3     cang     F        100

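# furong and fengj are now empty. Note the line number: when it is omitted, the substitution applies to the whole file. That is fine when the string occurs only once, but if the file contains the same string in several places, use a line number to target the right one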
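
The difference between previewing a change and writing it back with -i can be sketched as follows. The example works on a temporary copy of the student data and uses -i.bak, which keeps a backup of the original next to the edited file; the backup suffix and the variable names are choices made for this illustration.

```shell
student=$(mktemp)
printf 'ID\tName\tgender\tMark\n1\tfurong\tF\t85\n2\tfengj\tF\t60\n3\tcang\tF\t70\n' > "$student"

# -n with p prints only the selected line.
second_line=$(sed -n '2p' "$student")

# Without -i the substitution only affects what is printed.
preview=$(sed '4s/70/100/' "$student" | sed -n '4p')
unchanged=$(sed -n '4p' "$student")

# With -i the file itself is rewritten; .bak keeps the original.
sed -i.bak '4s/70/100/' "$student"
after_edit=$(sed -n '4p' "$student")

echo "preview:   $preview"
echo "unchanged: $unchanged"
echo "after -i:  $after_edit"
```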

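III. Text processing commands

1. The sort command

sort [options] filename

Options:

-f: ignore case

-n: sort numerically; the default is to sort as strings

-r: reverse the sort order; the default is ascending

-t: specify the field separator; by default fields are separated by blanks (spaces or tabs)

-k n[,m]: sort on the given field range, from field n to field m (default: to the end of the line)

Examples:

sort /etc/passwd; sorts the user information file

sort -r /etc/passwd; sorts it in reverse order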
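
The -t, -k, and -n options are usually combined, for example to sort passwd-style lines by UID. The sketch below uses a made-up sample file and sets LC_ALL=C so that the string sort is plain byte order; both are assumptions of this example.

```shell
export LC_ALL=C

passwd_sample=$(mktemp)
cat > "$passwd_sample" <<'EOF'
user1:x:500:500::/home/user1:/bin/bash
games:x:12:100:games:/usr/games:/sbin/nologin
root:x:0:0:root:/root:/bin/bash
sync:x:5:0:sync:/sbin:/bin/sync
EOF

# Field 3 (the UID) compared as a string: "12" sorts before "5".
as_string=$(sort -t ':' -k 3,3 "$passwd_sample" | cut -d ':' -f 1 | tr '\n' ' ')
# Field 3 compared as a number.
as_number=$(sort -t ':' -k 3,3 -n "$passwd_sample" | cut -d ':' -f 1 | tr '\n' ' ')
# Numeric and reversed: highest UID first.
reversed=$(sort -t ':' -k 3,3 -n -r "$passwd_sample" | cut -d ':' -f 1 | tr '\n' ' ')

echo "string  -> $as_string"
echo "numeric -> $as_number"
echo "reverse -> $reversed"
```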

2. The wc command

wc [options] filename

Options:

-l: count lines only

-w: count words only

-m: count characters only; newline characters are included in the count
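
A short sketch of the three counters on a two-line temporary file. Reading from standard input keeps the file name out of the output, and the arithmetic expansion strips any padding that some wc implementations add; the variable names are hypothetical.

```shell
sample=$(mktemp)
printf 'hello world\nfoo\n' > "$sample"

lines=$(( $(wc -l < "$sample") ))  # number of newline characters
words=$(( $(wc -w < "$sample") ))  # whitespace-separated words
chars=$(( $(wc -m < "$sample") ))  # characters, newlines included

echo "lines=$lines words=$words chars=$chars"
```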

Practice often, and regular expressions will become second nature.

This article comes from the "8626774" blog; please keep this attribution: http://8636774.blog.51cto.com/8626774/1684144
