7. An In-Depth Look at grep and egrep

Date: 2014-12-26 18:54:19

Tags: search grep expression regular extended regular expressions

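1. What are grep and egrep?

Let us start with regular expressions. A regular expression is a pattern written with a class of characters, many of which do not stand for their literal meaning but instead express control or wildcard functions. Such characters are called metacharacters: a metacharacter does not represent itself (for example, . does not mean a literal dot; it matches any single character) but provides an additional descriptive function.

grep is a text search tool. It searches the target files line by line according to a text pattern (the search condition) specified by the user, and prints the lines that match. The name grep comes from "Globally search a Regular Expression and Print". The leading e in egrep stands for extended: egrep searches with extended regular expressions, which offer a richer syntax than the basic regular expressions grep uses by default. The sections below look at how each is used.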

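2. What can grep and egrep do?

a. Search a file for specified content and display it;

b. Find out how much of a file matches our criteria;

c. Filter the output of other commands so that only the information we need is shown.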

3. Using grep and egrep

Let us look at grep first. Running man grep on the command line shows the following:

SYNOPSIS
       grep [OPTIONS] PATTERN [FILE...]
       grep [OPTIONS] [-e PATTERN | -f FILE] [FILE...]

DESCRIPTION
       grep searches the named input FILEs (or standard input if no files are named, or if a single hyphen-minus (-) is given as file name) for lines containing a match to the given PATTERN. By default, grep prints the matching lines.

One useful option is --color=auto: with it, the matched text is highlighted in color in the output.

Typing this option on every search is tedious, though, so we can define an alias for it.
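A minimal sketch of such an alias, assuming a Bourne-style shell such as bash. The startup file ~/.bashrc mentioned in the comment is an assumption; the right file depends on your shell.

```shell
# Define the alias for the current shell session.
# To keep it across sessions, the same line could go in ~/.bashrc
# (an assumption; the right startup file depends on your shell).
alias grep='grep --color=auto'

# Show what the alias currently expands to.
grep_alias=$(alias grep)
printf '%s\n' "$grep_alias"
```

With auto, grep colors matches only when the output goes to a terminal, so piping the result into another command is not affected.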

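As mentioned above, grep patterns use metacharacters. Which ones are there? The most commonly used metacharacters are explained below.

Character matching:

.: matches any single character

For example, find the strings in /etc/passwd that consist of ba, then any single character, then h.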
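A minimal sketch of that search. The two passwd-style lines are hypothetical sample data, used in place of the real /etc/passwd so that the result is predictable.

```shell
# Hypothetical sample lines in /etc/passwd format (not from a real system).
sample='root:x:0:0:root:/root:/bin/bash
daemon:x:2:2:daemon:/sbin:/sbin/nologin'

# "ba.h" matches "ba", then any one character, then "h" (here: "bash").
matches=$(printf '%s\n' "$sample" | grep 'ba.h')
printf '%s\n' "$matches"
# prints: root:x:0:0:root:/root:/bin/bash
```

On a real system the equivalent command would be grep 'ba.h' /etc/passwd.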

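[]: matches any single character within the specified range
[0-9] matches any single digit from 0 to 9; [[:digit:]] does the same
[a-z] matches any single lowercase letter from a to z; [[:lower:]] does the same
[A-Z] matches any single uppercase letter from A to Z; [[:upper:]] does the same
[[:space:]] matches a single whitespace character (space, tab, and so on)
[[:punct:]] matches a single punctuation character
[[:alpha:]] matches any single letter
[[:alnum:]] matches any single letter or digit

Given a file test.txt with the following contents: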

[root@linux_basic ~]#cat test.txt
I love you.
Hello,y.ol to/centos
8gentoo90 8df9 A:B:I:P:W
BELIVE in YOU self

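a. Find the lines in test.txt that contain a digit followed by any single character.

b. Find the lines in test.txt that contain any character followed by a lowercase letter.

c. Find the lines in test.txt that contain any character followed by an uppercase letter.

d. Find the lines in test.txt that contain an uppercase letter followed by a whitespace character.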
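One possible set of commands for exercises a to d, sketched under the assumption that test.txt has exactly the four lines shown above. The scratch directory and the variable names are only for illustration.

```shell
# Recreate the sample file in a scratch directory.
cd "$(mktemp -d)" || exit 1
cat > test.txt <<'EOF'
I love you.
Hello,y.ol to/centos
8gentoo90 8df9 A:B:I:P:W
BELIVE in YOU self
EOF

# a. a digit followed by any single character
a=$(grep '[0-9].' test.txt)
# b. any character followed by a lowercase letter (count of matching lines)
b=$(grep -c '.[[:lower:]]' test.txt)
# c. any character followed by an uppercase letter
c=$(grep '.[[:upper:]]' test.txt)
# d. an uppercase letter followed by a whitespace character
d=$(grep '[[:upper:]][[:space:]]' test.txt)

printf 'a:\n%s\n' "$a"                  # line 3 only
printf 'b: %s matching lines\n' "$b"    # all 4 lines
printf 'c:\n%s\n' "$c"                  # lines 3 and 4
printf 'd:\n%s\n' "$d"                  # lines 1 and 4
```

Lines 1 and 2 do not match in c because their only uppercase letter is the first character of the line, so no character precedes it.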

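Note: grep matches greedily, that is, it matches as much as possible.

[^]: matches any single character outside the specified range (negation)

For example, find the lines that contain any single character followed by a character that is not an uppercase letter.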
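A short sketch of the negated bracket expression, again assuming the four-line test.txt from above. The string ABcD is a made-up input that makes the matched pair easy to see.

```shell
# Recreate the sample file in a scratch directory.
cd "$(mktemp -d)" || exit 1
cat > test.txt <<'EOF'
I love you.
Hello,y.ol to/centos
8gentoo90 8df9 A:B:I:P:W
BELIVE in YOU self
EOF

# Count the lines that contain any character followed by a character
# that is not an uppercase letter.
negated=$(grep -c '.[^[:upper:]]' test.txt)
printf '%s\n' "$negated"    # 4: every line has such a pair

# -o shows the matched pair itself: "AB" is rejected because B is
# uppercase, so the first (and only) match is "Bc".
pair=$(printf 'ABcD\n' | grep -o '.[^[:upper:]]')
printf '%s\n' "$pair"       # Bc
```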

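Quantifier metacharacters: specify how many times the preceding character may occur
*: any number of times; the preceding character may occur zero or more times
Example: x*y (x may occur any number of times)
xxy, xyy (only the xy part matches), y
\?: 0 or 1 times; the preceding character is optional
Example: x\?y (x may occur 0 or 1 times)
xy, y, ay (the pattern need not match the whole string; a line is printed if any substring matches)
\{m\}: exactly m times; the preceding character must occur m times
Example: x\{2\}y
xy (no match), xxy, y (no match), xxxxy (the xxy part matches), xyy (no match)
\{m,n\}: the preceding character occurs at least m and at most n times
Example: x\{2,5\}y
xy (no match), y (no match), xxy
\{m,\}: the preceding character occurs at least m times
\{0,n\}: the preceding character occurs at most n times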
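The quantifiers above can be checked against the example strings directly. This is a sketch that assumes GNU grep, since \? in a basic regular expression is a GNU extension; the variable names are only for illustration.

```shell
# Candidate strings from the examples, one per line.
words='xy
y
xxy
xxxxy
xyy
ay'

# x*y and x\?y only need a "y" somewhere, so every line matches.
star=$(printf '%s\n' "$words" | grep -c 'x*y')
optional=$(printf '%s\n' "$words" | grep -c 'x\?y')
printf '%s %s\n' "$star" "$optional"    # 6 6

# x\{2\}y needs "xxy" somewhere in the line.
exactly_two=$(printf '%s\n' "$words" | grep 'x\{2\}y')
printf '%s\n' "$exactly_two"            # xxy and xxxxy

# -o prints just the matched part: the last two x's plus the y.
matched_part=$(printf 'xxxxy\n' | grep -o 'x\{2\}y')
printf '%s\n' "$matched_part"           # xxy

# -x requires the whole line to match, so xxxxy is now rejected.
whole_line=$(printf '%s\n' "$words" | grep -x 'x\{2\}y')
printf '%s\n' "$whole_line"             # xxy
```

The -x run shows the difference between "the line contains a match" and "the line is a match", which is why xxxxy is listed as matching x\{2\}y.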

Given the file test.txt:

[root@linux_basic ~]#cat test.txt
I love you.
Hello,y.ol to/centos
8gentoo90 8df9 A:B:I:P:W
BELIVE in YOU self

.*: any character, any number of times (a string of any length)

[root@linux_basic ~]#grep "to.*" test.txt
Hello,y.ol to/centos
8gentoo90 8df9 A:B:I:P:W
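Adding -o to the same search makes the greedy behavior visible. This is a sketch that assumes the four-line test.txt shown above; the second pattern, to[^ ]*, is added here only for contrast.

```shell
# Recreate the sample file in a scratch directory.
cd "$(mktemp -d)" || exit 1
cat > test.txt <<'EOF'
I love you.
Hello,y.ol to/centos
8gentoo90 8df9 A:B:I:P:W
BELIVE in YOU self
EOF

# .* is greedy: from the first "to" it runs to the end of the line.
greedy=$(grep -o 'to.*' test.txt)
printf '%s\n' "$greedy"
# to/centos
# too90 8df9 A:B:I:P:W

# Restricting the repeated character to "anything but a space"
# stops the match at the end of the current word.
bounded=$(grep -o 'to[^ ]*' test.txt)
printf '%s\n' "$bounded"
# to/centos
# too90
```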

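In the matches above, what if we want to anchor the match to the beginning or end of a line, or to the beginning or end of a word?

Regular expressions provide position anchors for exactly this purpose.

Position anchors:
^: beginning-of-line anchor (the caret)
written at the far left of the pattern
$: end-of-line anchor
written at the far right of the pattern
^$: an empty line
A word is a run of consecutive characters that contains no special characters:
\<: beginning of word, written to the left of the word; \b also works
\<char
\>: end of word, written to the right of the word; \b also works
char\>
\<xxxxx\>: matches xxxxx as a whole word (exact anchoring)

Given the file test.txt: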

[root@linux_basic ~]#cat test.txt
I love you.
Hello,y.ol to/centos
8gentoo90 8df9 A:B:I:P:W
BELIVE in YOU self
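The anchors can be tried against this file. A minimal sketch, assuming GNU grep (the word anchors \< and \> are GNU extensions) and the four-line test.txt above:

```shell
# Recreate the sample file in a scratch directory.
cd "$(mktemp -d)" || exit 1
cat > test.txt <<'EOF'
I love you.
Hello,y.ol to/centos
8gentoo90 8df9 A:B:I:P:W
BELIVE in YOU self
EOF

# ^ anchors the match to the start of the line.
starts_with_I=$(grep '^I' test.txt)
printf '%s\n' "$starts_with_I"     # I love you.

# $ anchors the match to the end of the line.
ends_with_self=$(grep 'self$' test.txt)
printf '%s\n' "$ends_with_self"    # BELIVE in YOU self

# Without anchors, "to" also matches inside "gentoo90".
substring_lines=$(grep -c 'to' test.txt)
printf '%s\n' "$substring_lines"   # 2

# \<to\> matches "to" only as a whole word, as in "to/centos".
word_to=$(grep '\<to\>' test.txt)
printf '%s\n' "$word_to"           # Hello,y.ol to/centos
```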

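Next, what if we want to find strings that occur repeatedly in a file?

This can be done with grouping, as follows.

Grouping: \(\)

Example: \(ab\)* (the parentheses are escaped with \, and ab as a unit may occur any number of times)
The text matched by the pattern inside a group is remembered by the regular expression engine and can be referenced later.

Back-references:
Example: \(ab\(x\)y\).*\(mn\)   With nested parentheses, each right parenthesis pairs with the nearest unmatched left parenthesis to form a group.
Groups are numbered by their left parentheses, counted from left to right, each together with its matching right parenthesis.
\(a\(b\(c\)\)mn\(x\)\).*\1   Here \1 refers to abcmnx, \2 to bc, and \3 to c.
\n: refers to the text matched by the n-th group, not to the group's pattern itself
Example:
\(ab\?c\).*\1   b occurs 0 or 1 times, so the group matches abc or ac; \1 must then repeat whichever of the two the group actually matched.

The contents of test.txt: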

[root@linux_basic ~]#cat test.txt
I lolove youloveve.
Hello,y.ol toto to/centos
8gentooenop90 8df9 gentooenop9 A:B:I:P:W
BELIVE igentooenop9n YOU self
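A few back-reference searches against this file, as a sketch assuming GNU grep. The three-line input for the last command is made up to show that \1 repeats the matched text rather than the pattern.

```shell
# Recreate the sample file in a scratch directory.
cd "$(mktemp -d)" || exit 1
cat > test.txt <<'EOF'
I lolove youloveve.
Hello,y.ol toto to/centos
8gentooenop90 8df9 gentooenop9 A:B:I:P:W
BELIVE igentooenop9n YOU self
EOF

# \(to\)\1 means "to" immediately followed by the same text again.
doubled_to=$(grep -o '\(to\)\1' test.txt)
printf '%s\n' "$doubled_to"       # toto

doubled_ve=$(grep -o '\(ve\)\1' test.txt)
printf '%s\n' "$doubled_ve"       # veve

# The same string twice on one line, with anything in between.
repeated=$(grep '\(gentooenop9\).*\1' test.txt)
printf '%s\n' "$repeated"         # 8gentooenop90 8df9 gentooenop9 A:B:I:P:W

# \1 repeats what the group matched, not the group's pattern:
# "abc-ac" is rejected even though both halves fit ab\?c.
same_text=$(printf '%s\n' 'abc-abc' 'abc-ac' 'ac-ac' | grep -c '\(ab\?c\).*\1')
printf '%s\n' "$same_text"        # 2
```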

grep also has a number of options. The commonly used ones are:

grep command options:
-v: invert the selection
-v, --invert-match
      Invert the sense of matching, to select non-matching lines. (-v is specified by POSIX.)
-o: print only the matched strings, not the lines that contain them
-o, --only-matching
      Print only the matched (non-empty) parts of a matching line, with each such part on a separate output line.
-i: ignore case
-i, --ignore-case
      Ignore case distinctions in both the PATTERN and the input files. (-i is specified by POSIX.)
-E: use extended regular expressions
-E, --extended-regexp
      Interpret PATTERN as an extended regular expression (ERE, see below). (-E is specified by POSIX.)
-A #: print each matching line plus the # lines after it
-A NUM, --after-context=NUM
      Print NUM lines of trailing context after matching lines. Places a line containing a group separator (--) between contiguous groups of matches. With the -o or --only-matching option, this has no effect and a warning is given.

-B #: print each matching line plus the # lines before it
-B NUM, --before-context=NUM
      Print NUM lines of leading context before matching lines. Places a line containing a group separator (--) between contiguous groups of matches. With the -o or --only-matching option, this has no effect and a warning is given.

-C #: print each matching line plus the # lines before and after it
-C NUM, -NUM, --context=NUM
      Print NUM lines of output context. Places a line containing a group separator (--) between contiguous groups of matches. With the -o or --only-matching option, this has no effect and a warning is given.

The file contents are as follows:

[root@linux_basic ~]#cat test.txt
I lolove youloveve.
Hello,y.ol toto to/centos
8gentooenop90 8df9 gentooenop9 A:B:I:P:W
BELIVE igentooenop9n YOU self
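The options can be compared side by side on this file. A minimal sketch, assuming GNU grep for the context options -A, -B and -C; the search strings are chosen only for illustration.

```shell
# Recreate the sample file in a scratch directory.
cd "$(mktemp -d)" || exit 1
cat > test.txt <<'EOF'
I lolove youloveve.
Hello,y.ol toto to/centos
8gentooenop90 8df9 gentooenop9 A:B:I:P:W
BELIVE igentooenop9n YOU self
EOF

# -v: lines that do NOT contain "to".
inverted=$(grep -v 'to' test.txt)
printf '%s\n' "$inverted"    # I lolove youloveve.

# -o: only the matched text, not the whole line.
only=$(grep -o 'toto' test.txt)
printf '%s\n' "$only"        # toto

# -i: "you" also matches "YOU".
nocase=$(grep -i 'you' test.txt)
printf '%s\n' "$nocase"      # lines 1 and 4

# -A 1 / -B 1 / -C 1: one line of context after / before / around.
after=$(grep -A 1 'Hello' test.txt)     # lines 2-3
before=$(grep -B 1 'Hello' test.txt)    # lines 1-2
around=$(grep -C 1 'Hello' test.txt)    # lines 1-3
printf '%s\n' "$around"
```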

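Exercises

1. Display the lines in /proc/meminfo that begin with an uppercase or lowercase S.
2. Display the users in /etc/passwd whose default shell is not /sbin/nologin.
3. Display the users in /etc/passwd whose default shell is /bin/bash.
Going further: of those, display only the user with the largest ID.
4. Find the one-digit or two-digit numbers in /etc/passwd.
5. Display the lines in /boot/grub/grub.conf that begin with at least one whitespace character.

Answers

1. # grep -i '^s' /proc/meminfo
   # grep '^[Ss]' /proc/meminfo

2. # grep -v "/sbin/nologin$" /etc/passwd | cut -d: -f1

3. # grep "/bin/bash$" /etc/passwd | sort -t: -k3 -n | tail -1 | cut -d: -f1

4. # grep "\<[[:digit:]]\{1,2\}\>" /etc/passwd

5. # grep "^[[:space:]]\{1,\}" /boot/grub/grub.conf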
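Because the contents of /etc/passwd differ between systems, the answers to exercises 2 to 4 are easier to check on fixed input. The sketch below uses hypothetical passwd-style lines (the users alice and bob are invented) and assumes GNU grep. It sorts with -k3,3n, which restricts the numeric key to the UID field.

```shell
# Hypothetical sample in /etc/passwd format (not from a real system).
passwd_sample='root:x:0:0:root:/root:/bin/bash
daemon:x:2:2:daemon:/sbin:/sbin/nologin
alice:x:1000:1000::/home/alice:/bin/bash
bob:x:1001:1001::/home/bob:/bin/bash'

# Exercise 2: users whose shell is not /sbin/nologin.
login_users=$(printf '%s\n' "$passwd_sample" | grep -v '/sbin/nologin$' | cut -d: -f1)
printf '%s\n' "$login_users"      # root, alice, bob

# Exercise 3, further step: the bash user with the largest UID.
# Sort numerically on field 3, keep the last line, print field 1.
top_user=$(printf '%s\n' "$passwd_sample" | grep '/bin/bash$' |
  sort -t: -k3,3n | tail -n 1 | cut -d: -f1)
printf '%s\n' "$top_user"         # bob

# Exercise 4 with -o: only stand-alone one- or two-digit numbers.
# 1000 and 1001 are skipped because \< and \> need word boundaries.
small_numbers=$(printf '%s\n' "$passwd_sample" | grep -o '\<[[:digit:]]\{1,2\}\>')
printf '%s\n' "$small_numbers"    # 0, 0, 2, 2 (one per line)
```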

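Next come extended regular expressions. Much of their syntax is the same as that of basic regular expressions, so let us turn to using egrep.

First, the meaning of the metacharacters:

Character matching:
.: matches any single character
[]: matches any single character within the specified range
[^]: matches any single character outside the specified range
Quantifiers:
*: matches the preceding character any number of times
?: matches the preceding character 0 or 1 times
+: matches the preceding character at least once
{m}: matches the preceding character exactly m times
{m,n}: matches the preceding character at least m and at most n times
{m,}: matches the preceding character at least m times
{0,n}: matches the preceding character at most n times

The contents of test.txt are as follows: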

[root@linux_basic ~]#cat test.txt
I lolove youloveve.
Hello,y.ol toto to/centos
8gentooenop90 8df9 gentooenop9 A:B:I:P:W
BELIVE igentooenop9n YOU self
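The extended quantifiers need no backslashes. A minimal sketch against this file, written with grep -E, which accepts the same patterns as egrep; the basic-regex command at the end is added only for comparison.

```shell
# Recreate the sample file in a scratch directory.
cd "$(mktemp -d)" || exit 1
cat > test.txt <<'EOF'
I lolove youloveve.
Hello,y.ol toto to/centos
8gentooenop90 8df9 gentooenop9 A:B:I:P:W
BELIVE igentooenop9n YOU self
EOF

# +: "gent" followed by one or more "o".
plus=$(grep -E -o 'gento+' test.txt)
printf '%s\n' "$plus"             # gentoo, three times

# {2} applied to a group: "to" exactly twice in a row.
ere_count=$(grep -E -c '(to){2}' test.txt)
# The same search as a basic regular expression needs backslashes.
bre_count=$(grep -c '\(to\)\{2\}' test.txt)
printf '%s %s\n' "$ere_count" "$bre_count"    # 1 1
```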

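Position anchors:
^: beginning-of-line anchor
$: end-of-line anchor
\<, \b: beginning-of-word anchor
\>, \b: end-of-word anchor
^$, ^[[:space:]]*$: empty or blank line

Grouping:
()
References to group contents: \1, \2, \3

Alternation:
a|b: a or b

The contents of test.txt are as follows: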

[root@linux_basic ~]#cat test.txt
I lolove youloveve.
Hello,y.ol toto to/centos
8gentooenop90 8df9 gentooenop9 A:B:I:P:W
BELIVE igentooenop9n YOU self
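Alternation, grouping and back-references in extended syntax, sketched against this file. GNU grep is assumed: back-references inside an extended regular expression are an extension rather than part of POSIX ERE.

```shell
# Recreate the sample file in a scratch directory.
cd "$(mktemp -d)" || exit 1
cat > test.txt <<'EOF'
I lolove youloveve.
Hello,y.ol toto to/centos
8gentooenop90 8df9 gentooenop9 A:B:I:P:W
BELIVE igentooenop9n YOU self
EOF

# a|b: lines containing "love" or "self".
either=$(grep -E 'love|self' test.txt)
printf '%s\n' "$either"      # lines 1 and 4

# Grouping limits the alternation: the line must start with
# "I " or "BELIVE ".
anchored=$(grep -E -c '^(I|BELIVE) ' test.txt)
printf '%s\n' "$anchored"    # 2

# A group followed by a back-reference to its matched text.
doubled=$(grep -E -o '(to)\1' test.txt)
printf '%s\n' "$doubled"     # toto
```

Without the parentheses, ^I|BELIVE would mean "I at the start of the line, or BELIVE anywhere", because alternation has the lowest precedence.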

grep -E 'PATTERN' FILE... tells grep to use extended regular expressions

1. Display the default shell of the users root, fedora, or user1 on the current system:
# egrep "^(root|fedora|user1):" /etc/passwd | cut -d: -f7

2. Find the places in /etc/rc.d/init.d/functions where a word is followed by a pair of parentheses "()":
# egrep -o "\<[[:alnum:]]+\>\(\)" /etc/rc.d/init.d/functions
# grep -o "\<[[:alnum:]]\{1,\}\>()" /etc/rc.d/init.d/functions
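Both exercises can be tried on fixed input. The sketch below uses made-up sample data (the user rootkit and the two function definitions are hypothetical) and assumes GNU grep. It uses grep -E in place of egrep, which accepts the same patterns.

```shell
# Hypothetical passwd-style lines; "rootkit" shows why the pattern
# ends with a colon.
passwd_sample='root:x:0:0:root:/root:/bin/bash
rootkit:x:1002:1002::/home/rootkit:/bin/sh
user1:x:1003:1003::/home/user1:/bin/zsh'

# Exercise 1: the shell (field 7) of root, fedora or user1 only.
shells=$(printf '%s\n' "$passwd_sample" |
  grep -E '^(root|fedora|user1):' | cut -d: -f7)
printf '%s\n' "$shells"      # /bin/bash and /bin/zsh

# Hypothetical shell function definitions.
functions_sample='checkpid() {
    echo "checking"
}
daemon() {
    status
}'

# Exercise 2: in ERE a literal "()" must be escaped as \(\) ...
ere=$(printf '%s\n' "$functions_sample" | grep -E -o '\<[[:alnum:]]+\>\(\)')
# ... while in BRE unescaped parentheses are already literal.
bre=$(printf '%s\n' "$functions_sample" | grep -o '\<[[:alnum:]]\{1,\}\>()')
printf '%s\n' "$ere"         # checkpid() and daemon()
```

Note that [[:alnum:]] does not include the underscore, so a name such as my_func() would not be matched by either pattern.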

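In the first command below, . matches any single character, so the match runs from the first non-slash character to the final /:
[root@linux_basic ~]#echo "/etc/rc.d/init.d/" | grep -o -E "[^/].+/$"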
etc/rc.d/init.d/
[root@linux_basic ~]#echo "/etc/rc.d/init.d/" | grep -o -E "[^/]+/?$"
init.d/

This article comes from the "快乐就好" blog. Please keep this attribution: http://wdllife.blog.51cto.com/6615958/1596261
