
Regular Expressions

Date: 2016-11-10 21:55:09


1. Some important definitions:

literal

A literal is any character we use in a search or matching expression. For example, to find ind in windows, the ind is a literal string: each character plays a part in the search, and it is literally the string we want to find.

metacharacter

A metacharacter is one or more special characters that have a unique meaning and are NOT used as literals in the search expression, for example, the character ^ (circumflex or caret) is a metacharacter.

target string

This term describes the string that we will be searching, that is, the string in which we want to find our match or search pattern.

search expression

Most commonly called the regular expression. This term describes the search expression that we will be using to search our target string, that is, the pattern we use to find what we want.

escape sequence

An escape sequence is a way of indicating that we want to use one of our metacharacters as a literal. In a regular expression, an escape sequence is formed by placing the metacharacter \ (backslash) in front of the metacharacter that we want to use as a literal. For example, to find (s) in the target string window(s) we use the search expression \(s\). To find \\file in the target string c:\\file we would need the search expression \\\\file: each of the two \ characters we want to match literally is preceded by an escaping \.
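The two escaping examples above can be tried out directly. This is a minimal sketch that assumes Python's standard re module (the text itself names no particular engine); the variable names are illustrative only. Raw strings (r"...") are used so that Python's own string escaping does not get in the way.

```python
import re

# Escape the parentheses so they match literally instead of forming a group.
paren_match = re.search(r"\(s\)", "window(s)")

# The target contains two literal backslashes: c:\\file
target = r"c:\\file"

# Each literal backslash needs its own escaping backslash: four in total.
backslash_match = re.search(r"\\\\file", target)

# re.escape builds the escaped form of a literal string for us.
escaped = re.escape("(s)")

print(paren_match.group())      # (s)
print(backslash_match.group())  # \\file
print(escaped)                  # \(s\)
```

When the literal text comes from user input, re.escape is usually safer than escaping by hand.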

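escape character (\)

The regular expression language is made up of two basic kinds of characters: literal (ordinary) text characters and metacharacters. The escape character indicates that the character following it is to be treated as a literal rather than as a metacharacter.

In other words, a metacharacter is a character that has a special meaning in a regular expression, and a literal is a character taken at face value.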

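Extended Regular Expressions (EREs) cover essentially everything that Basic Regular Expressions (BREs) can express, so BREs are close to a functional subset of EREs. The syntax is not identical, however: in POSIX BREs, grouping and interval expressions are written \( \) and \{ \}, while EREs use ( ) and { } unescaped and add ?, + and |.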

2. Metacharacters and their meanings:

[ ]

Match anything inside the square brackets for ONE character position, once and only once. For example, [12] means match the target to 1 and if that does not match then match the target to 2 while [0123456789] means match to any character in the range 0 to 9.

In short: match any one of the characters inside the square brackets, exactly once.

- The - (dash) inside square brackets is the 'range separator' and allows us to define a range; in our example above, [0123456789] could be rewritten as [0-9].

You can define more than one range inside a list, for example, [0-9A-C] means check for 0 to 9 and A to C (but not a to c).

NOTE: To test for - inside brackets (as a literal) it must come first or last, that is, [-0-9] will test for - and 0 to 9.

In short: a dash inside square brackets acts as a range operator, and more than one range may be defined in the same list.

The ^ (circumflex or caret) inside square brackets negates the expression (we will see an alternate use for the circumflex/caret outside square brackets later), for example, [^Ff] means anything except upper or lower case F and [^a-z] means everything except lower case a to z.

In short: the caret ([ˈkærət]) inside square brackets negates the expression; outside square brackets it has a different use.
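The bracket rules above (lists, ranges, a literal dash, and negation) can be checked with a short sketch. It assumes Python's re module, and the sample strings are made up for illustration.

```python
import re

# A range: any single digit.
digits = re.findall(r"[0-9]", "a1-b2")

# Two ranges in one list: 0 to 9 and upper-case A to C (not a to c).
mixed = re.findall(r"[0-9A-C]", "a0Bc9D")

# A dash placed first in the list is a literal dash.
with_dash = re.findall(r"[-0-9]", "a1-b2")

# A leading caret negates the list: anything except F or f.
not_f = re.findall(r"[^Ff]", "Fluffy")

print(digits)     # ['1', '2']
print(mixed)      # ['0', 'B', '9']
print(with_dash)  # ['1', '-', '2']
print(not_f)      # ['l', 'u', 'y']
```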

.

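The . (period) means any single character in this position. For example, ton. will find tons, tone, and the tonn in tonneau, but not wanton, because there the ton has no following character. In many engines the period does not match a newline unless a special mode is enabled.

In short: the period matches any one character.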
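A quick way to confirm the behaviour of the period, sketched with Python's re module (an assumption; other engines may treat newlines differently):

```python
import re

# The period consumes exactly one character after "ton".
dot_hits = {w: re.search(r"ton.", w) for w in ["tons", "tone", "tonneau", "wanton"]}
found = {w: (m.group() if m else None) for w, m in dot_hits.items()}

# By default the period does not match a newline in Python...
no_newline = re.search(r"ton.", "ton\n")

# ...unless the DOTALL flag is given.
with_dotall = re.search(r"ton.", "ton\n", re.DOTALL)

print(found)
# {'tons': 'tons', 'tone': 'tone', 'tonneau': 'tonn', 'wanton': None}
```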

?

The ? (question mark) matches when the preceding character occurs 0 or 1 times only, for example, colou?r will find both color (u is found 0 times) and colour (u is found 1 time).

In short: the question mark matches when the preceding character occurs 0 or 1 times.

*

The * (asterisk or star) matches when the preceding character occurs 0 or more times, for example, tre* will find tree (e is found 2 times) and tread (e is found 1 time) and trough (e is found 0 times and thus returns a match only on the tr).

In short: the asterisk matches when the preceding character occurs 0 or more times.

+

The + (plus) matches when the preceding character occurs 1 or more times, for example, tre+ will find tree (e is found 2 times) and tread (e is found 1 time) but NOT trough (0 times).

In short: the plus sign matches when the preceding character occurs 1 or more times.
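The three quantifiers ?, * and + can be compared side by side using the words from the examples above. This sketch assumes Python's re module; the variable names are illustrative.

```python
import re

# ? : the u is optional, so both spellings match.
colour_hits = re.findall(r"colou?r", "color colour")

words = ["tree", "tread", "trough"]

# * : zero or more e characters after "tr", so every word matches.
star_hits = {w: re.search(r"tre*", w).group() for w in words}

# + : at least one e is required, so "trough" does not match at all.
plus_hits = {w: re.search(r"tre+", w) is not None for w in words}

print(colour_hits)  # ['color', 'colour']
print(star_hits)    # {'tree': 'tree', 'tread': 'tre', 'trough': 'tr'}
print(plus_hits)    # {'tree': True, 'tread': True, 'trough': False}
```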

{n}

Matches when the preceding character, or character range, occurs n times exactly, for example, to find a local phone number we could use [0-9]{3}-[0-9]{4} which would find any number of the form 123-4567. Value is enclosed in braces (curly brackets).

Note: The - (dash) in this case, because it is outside the square brackets, is a literal. Louise Rains writes to say that it is invalid to commence a NXX code (the 123) with a zero (which would be permitted in the expression above). In this case the expression [1-9][0-9]{2}-[0-9]{4} would be necessary to find a valid local phone number.

In short: curly brackets (braces) match the preceding character or range an exact number of times. For example, [0-9]{3}-[0-9]{4} matches 123-4567.
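The two phone-number expressions above differ only in whether a leading zero is accepted. A small sketch, assuming Python's re module and using re.fullmatch so that the whole string must fit the pattern:

```python
import re

LOOSE = r"[0-9]{3}-[0-9]{4}"          # any three digits, dash, any four digits
STRICT = r"[1-9][0-9]{2}-[0-9]{4}"    # first digit may not be zero

def is_match(pattern, text):
    """Return True when the entire text fits the pattern."""
    return re.fullmatch(pattern, text) is not None

loose_ok = is_match(LOOSE, "123-4567")
loose_zero = is_match(LOOSE, "012-3456")
strict_ok = is_match(STRICT, "123-4567")
strict_zero = is_match(STRICT, "012-3456")

print(loose_ok, loose_zero)    # True True
print(strict_ok, strict_zero)  # True False
```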

{n,m}

Matches when the preceding character occurs at least n times but not more than m times, for example, ba{2,3}b will find baab and baaab but NOT bab or baaaab. Values are enclosed in braces (curly brackets).

In short: at least n and at most m times.

{n,}

Matches when the preceding character occurs at least n times, for example, ba{2,}b will find 'baab', 'baaab' or 'baaaab' but NOT 'bab'. Values are enclosed in braces (curly brackets).

In short: at least n times.
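The bounded and open-ended forms can be compared on the same candidate strings. A minimal sketch, assuming Python's re module:

```python
import re

candidates = ["bab", "baab", "baaab", "baaaab"]

# {2,3}: between two and three a characters.
bounded = [re.fullmatch(r"ba{2,3}b", s) is not None for s in candidates]

# {2,}: two or more a characters, with no upper limit.
open_ended = [re.fullmatch(r"ba{2,}b", s) is not None for s in candidates]

print(bounded)     # [False, True, True, False]
print(open_ended)  # [False, True, True, True]
```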

$

The $ (dollar) means look only at the end of the target string, for example, fox$ will find a match in 'silver fox' since it appears at the end of the string but not in 'the fox jumped over the moon'.

In short: the dollar sign matches only at the end of the target string.
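The end-of-string anchor can be sketched as follows, assuming Python's re module. One engine-specific detail worth knowing: in Python, $ also matches just before a single newline at the very end of the string.

```python
import re

# "fox" is the last thing in the string, so the anchor is satisfied.
at_end = re.search(r"fox$", "silver fox")

# "fox" appears mid-string, so the anchor fails.
in_middle = re.search(r"fox$", "the fox jumped over the moon")

# Python-specific: $ also matches before a trailing newline.
before_newline = re.search(r"fox$", "silver fox\n")

print(at_end is not None)          # True
print(in_middle is not None)       # False
print(before_newline is not None)  # True
```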

()

The ( (open parenthesis) and ) (close parenthesis) may be used to group (or bind) parts of our search expression together. Officially this is called a subexpression (a.k.a. a submatch or group), and subexpressions may be nested to any depth. Parentheses (subexpressions) also capture the matched element into a variable that may be used as a backreference.

In short: parentheses group parts of the search expression into a subexpression, which can be nested to any depth, and they capture the matched text so that it can be referred to later as a backreference.
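Capturing and backreferences can be illustrated with a doubled-word search. This is a sketch assuming Python's re module, where \1 refers back to the text captured by the first pair of parentheses; the sample sentence is made up.

```python
import re

# (\w+) captures a word; \1 requires the same word to appear again.
doubled = re.search(r"\b(\w+) \1\b", "this is is a test")

# Nested groups are numbered by the position of their opening parenthesis.
nested = re.search(r"((a)(b))", "ab")

print(doubled.group())   # is is
print(doubled.group(1))  # is
print(nested.groups())   # ('ab', 'a', 'b')
```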

|

The | (vertical bar or pipe) is called alternation in techspeak and means find either the left-hand OR the right-hand value. For example, gr(a|e)y will find 'gray' or 'grey': having found the literal characters 'gr', the first alternative (a) is tried, and only if it fails is the second (e) tried. Alternation can be nested within each expression, thus gr((a|e)|i)y will find 'gray', 'grey' and 'griy'.

In short: the vertical bar, or pipe, is called alternation and matches either the value on its left or the value on its right; only one side is used in any given match, and alternations can be nested.
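The alternation examples can be run as written. The sketch assumes Python's re module and uses re.finditer, because re.findall would return only the captured group contents when the pattern contains parentheses.

```python
import re

text = "gray grey griy groy"

# Either a or e between "gr" and "y".
simple = [m.group() for m in re.finditer(r"gr(a|e)y", text)]

# Nested alternation adds i as a third possibility.
nested_alt = [m.group() for m in re.finditer(r"gr((a|e)|i)y", text)]

print(simple)      # ['gray', 'grey']
print(nested_alt)  # ['gray', 'grey', 'griy']
```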

3. Common Extensions and Abbreviations

Backslash Sequences

\d

Match any character in the range 0 - 9 (equivalent of POSIX [[:digit:]])

\D

Match any character NOT in the range 0 - 9 (equivalent of POSIX [^[:digit:]])

\s

Match any whitespace character (space, tab, etc.). Roughly equivalent to POSIX [[:space:]], except that some engines do not treat VT (vertical tab) as whitespace.

\S

Match any character that is NOT whitespace (space, tab, etc.). (equivalent of POSIX [^[:space:]])

\w

Match any word character: 0 - 9, A - Z, a - z and the underscore (equivalent of POSIX [[:alnum:]_]). Punctuation other than the underscore is NOT matched.

\W

Match any character that is NOT a word character, that is, NOT in 0 - 9, A - Z, a - z or the underscore (equivalent of POSIX [^[:alnum:]_])
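The shorthand classes can be checked with a short sketch. It assumes Python's re module, where these classes are Unicode-aware for text patterns by default; the re.ASCII flag is used here to restrict them to the ASCII ranges described above.

```python
import re

digit_hits = re.findall(r"\d", "a1 b2", re.ASCII)
non_digit_hits = re.findall(r"\D", "a1 b2", re.ASCII)

# \s matches the space and the tab.
space_hits = re.findall(r"\s", "a b\tc", re.ASCII)

# \w matches letters, digits and the underscore, but not other punctuation.
word_hits = re.findall(r"\w+", "snake_case, 42!", re.ASCII)
non_word_hits = re.findall(r"\W", "a-b c", re.ASCII)

print(digit_hits)      # ['1', '2']
print(non_digit_hits)  # ['a', ' ', 'b']
print(space_hits)      # [' ', '\t']
print(word_hits)       # ['snake_case', '42']
print(non_word_hits)   # ['-', ' ']
```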

Positional Abbreviations

\b

Word boundary. This matches a position rather than a character: the point at the beginning (\bxx) and/or end (xx\b) of a word. Thus \bton\b will find ton but not tons, but \bton will find tons.

\B

Not word boundary. This matches a position that is NOT at the beginning (\Bxx) and/or end (xx\B) of a word. Thus \Bton\B will find wantons but not tons, but ton\B will find both wantons and tons.
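The boundary examples above can be verified with a small helper. This is a sketch assuming Python's re module; the helper name found is hypothetical.

```python
import re

def found(pattern, text):
    """Return True when the pattern matches anywhere in the text."""
    return re.search(pattern, text) is not None

# \b requires a word boundary at that position.
whole_word = [found(r"\bton\b", w) for w in ["ton", "tons"]]
word_start = found(r"\bton", "tons")

# \B requires that there is NO word boundary at that position.
inside_word = [found(r"\Bton\B", w) for w in ["wantons", "tons"]]
not_at_end = [found(r"ton\B", w) for w in ["wantons", "tons"]]

print(whole_word)   # [True, False]
print(word_start)   # True
print(inside_word)  # [True, False]
print(not_at_end)   # [True, True]
```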

Original source: http://victor2016.blog.51cto.com/6768693/1871534
