【Python: generators, anonymous functions, recursion, importing modules and packages, and regular expressions (re)】

Date: 2017-06-03 11:18:17

Tags: series, hello, conflict, ima, readability, next, ignore, exit, sre

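I. Generators

1. Definition

A generator is a special kind of iterator that is simpler and more elegant to write. The yield keyword is what supplies the generator's __next__() method: it marks the point where execution is suspended and later resumed. A yield expression hands a value back to the caller, and it can also receive a value (passed in with send()) that is assigned to a variable.

In other words, yield is syntactic sugar that implements the iterator protocol for you. Internally it works like a state machine that keeps track of the suspended and resumed states.

What yield does:
1. It effectively gives the function ready-made __iter__ and __next__ methods.
2. return can hand back a value only once, after which the function ends; yield can hand back values many times. Each yield suspends the function, and the next call to next() resumes from the point where it was suspended.

Example: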

def counter(n):
    print('start...')
    i = 0
    while i < n:
        yield i
        i += 1
    print('end...')


g = counter(5)
print(g)
print(next(g))
print(next(g))
print(next(g))
print(next(g))
print(next(g))
# print(next(g))   # would print 'end...' and then raise StopIteration

Output

<generator object counter at 0x...>
start...
0
1
2
3
4
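To make the claim that yield supplies __iter__ and __next__ more concrete, here is a minimal sketch of what the same counter might look like when written by hand as an iterator class. The class name Counter is hypothetical and only serves as an illustration; it is not how CPython implements generators internally.

```python
class Counter:
    """Hand-written iterator that produces 0, 1, ..., n-1, like counter(n)."""

    def __init__(self, n):
        self.n = n
        self.i = 0

    def __iter__(self):
        # An iterator returns itself from __iter__.
        return self

    def __next__(self):
        if self.i >= self.n:
            # A generator raises StopIteration on its own when its body ends.
            raise StopIteration
        value = self.i
        self.i += 1
        return value


def counter(n):
    # The generator version: yield keeps the state (i) between calls.
    i = 0
    while i < n:
        yield i
        i += 1


from_class = list(Counter(5))
from_generator = list(counter(5))
print(from_class == from_generator)
```

Both objects can be passed to list(), for loops, or next(), because both follow the iterator protocol.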

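2. Generator functions

A generator function is defined like an ordinary function, but it hands back results with yield statements instead of return. Each yield produces one result and suspends the function's state, so that the next call continues from the point where it left off.

An ordinary function that hands back a list with return: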

def lay_eggs(num):
    egg_list = []
    for egg in range(num):
        egg_list.append('egg%s' % egg)
    return egg_list

yikuangdan = lay_eggs(10)  # what we get back is a whole basket of eggs
print(yikuangdan)

Output

['egg0', 'egg1', 'egg2', 'egg3', 'egg4', 'egg5', 'egg6', 'egg7', 'egg8', 'egg9']

The generator function version

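def lay_eggs(num):
    for egg in range(num):
        res = 'egg%s' % egg
        yield res       # the key generator syntax
        print('laid an egg')

laomuji = lay_eggs(10)  # what we get back is a hen that lays eggs on demand
print(laomuji)
print(laomuji.__next__())       # iterate: egg0
print(laomuji.__next__())       # egg1
print(laomuji.__next__())       # egg2
egg_l = list(laomuji)           # consumes the remaining eggs
print(egg_l)

Output

<generator object lay_eggs at 0x...>
egg0
laid an egg
egg1
laid an egg
egg2
laid an egg
laid an egg
laid an egg
laid an egg
laid an egg
laid an egg
laid an egg
laid an egg
['egg3', 'egg4', 'egg5', 'egg6', 'egg7', 'egg8', 'egg9']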

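3. Generator expressions

A generator expression looks like a list comprehension, but it returns an object that produces results on demand instead of building the whole result list at once.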
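As a rough illustration of the difference, the sketch below builds the same squares once as a list comprehension and once as a generator expression. The variable names are made up for this example, and the size comparison relies on sys.getsizeof, which measures only the container object itself.

```python
import sys

squares_list = [x * x for x in range(100_000)]   # builds every item up front
squares_gen = (x * x for x in range(100_000))    # builds nothing yet

# The generator object stays small no matter how many items it can produce.
list_is_bigger = sys.getsizeof(squares_list) > sys.getsizeof(squares_gen)

# Items are computed only when requested.
first_three = [next(squares_gen) for _ in range(3)]

# A generator expression can be passed directly to a consuming function.
total = sum(x * x for x in range(5))

print(list_is_bigger, first_three, total)
```

A generator expression can be consumed only once; if the values are needed again, rebuild it or store them in a list.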
```
food=yield food_list
```
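The line above is the expression form of yield, which is what allows a generator to be used as a coroutine. g.send('food1') passes 'food1' into the suspended generator, where it becomes the value of the yield expression and is assigned to food. Execution then continues until the next yield is reached, and the value after that yield (food_list) is returned to the caller as the result of send().

Example

Note: a generator that has just been created cannot be sent a non-None value.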

def eater(name):        # coroutine function
    print('%s ready to eat' % name)
    food_list = []
    while True:
        food = yield food_list           # yield expression
        food_list.append(food)
        print('%s start to eat %s' % (name, food))


g = eater('hexin')
print(g)        # a generator


print(g.send('food1'))  # send a value

Output

<generator object eater at 0x1049030f8>    # the generator object
Traceback (most recent call last):
  File "/Users/hexin/PycharmProjects/py3/day5/2.py", line 71, in <module>
    print(g.send('food1'))
TypeError: can't send non-None value to a just-started generator    # a just-started generator cannot be sent a non-None value

After priming the generator

def eater(name):        # coroutine function
    print('%s ready to eat' % name)
    food_list = []
    while True:
        food = yield food_list           # yield expression
        food_list.append(food)
        print('%s start to eat %s' % (name, food))


g = eater('hexin')
print(g)        # a generator
next(g)         # same as g.send(None); runs the generator up to its first yield

print(g.send('food1'))

Output

<generator object eater at 0x107cde258>
hexin ready to eat
hexin start to eat food1
['food1']

To avoid forgetting the priming step, a decorator can do it for us:

def deco(func):     # priming decorator
    def wrapper(*args, **kwargs):
        res = func(*args, **kwargs)
        next(res)          # same as res.send(None); primes the generator
        return res
    return wrapper

@deco       # apply the priming decorator so the generator is primed as soon as it is created
def eater(name):        # coroutine function
    print('%s ready to eat' % name)
    food_list = []
    while True:
        food = yield food_list           # yield expression
        food_list.append(food)
        print('%s start to eat %s' % (name, food))


g = eater('hexin')
# print(g)        # a generator
# next(g)         # same as g.send(None); no longer needed

print(g.send('food1'))
print(g.send('food2'))
print(g.send('food3'))

Output

hexin ready to eat
hexin start to eat food1
['food1']
hexin start to eat food2
['food1', 'food2']
hexin start to eat food3
['food1', 'food2', 'food3']

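II. Anonymous functions and more built-in functions

1. Syntax

Python uses the lambda keyword to create anonymous functions. "Anonymous" means the function is not defined in the standard way with a def statement.

Syntax:

lambda [arg1[, arg2, ... argN]]: expression

Example:

An ordinary function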

def func(x,y):
    return x+y

print(func)
print(func(1,2))

Output

<function func at 0x102b31f28>
3

The equivalent anonymous function

# anonymous function
f = lambda x, y: x + y
print(f)

print(f(1, 2))

Output

<function <lambda> at 0x107a55f28>
3

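2. Using anonymous functions with built-in functions

Using max, min, zip and sorted

max(arg1, arg2, *args[, key])  # key=keyfunc

salaries = {
    'e': 3000,
    'a': 100000000,
    'w': 10000,
    'y': 2000
}

print(max(salaries))  # iterating a dict yields its keys, so this compares the keys: 'y'
res = zip(salaries.values(), salaries.keys())  # pair each value with its key so the comparison is by value
print(max(res))  # (100000000, 'a')

The same comparison by value, using an anonymous function: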

salaries = {
    'e': 3000,
    'a': 100000000,
    'w': 10000,
    'y': 2000
}


def func(k):
    return salaries[k]

print(max(salaries, key=func))   # pass a named function as the key
print(max(salaries, key=lambda k: salaries[k]))   # the same with an anonymous function: compare by value
print(min(salaries, key=lambda k: salaries[k]))

# print(sorted(salaries, key=lambda x: salaries[x], reverse=True))  # sorted() is ascending by default; reverse=True gives descending order

Output

a
a
y
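The commented-out sorted() call above can be run as follows. This is a small sketch that reuses the same salaries dictionary; the variable names ascending and descending are introduced only for this illustration.

```python
salaries = {
    'e': 3000,
    'a': 100000000,
    'w': 10000,
    'y': 2000,
}

# sorted() iterates over the dict keys; key= decides what is compared.
ascending = sorted(salaries, key=lambda k: salaries[k])
descending = sorted(salaries, key=lambda k: salaries[k], reverse=True)

print(ascending)
print(descending)
```

Unlike max and min, sorted returns a new list holding every key, ordered by the value the key function produces.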

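More built-ins:

map(function, iterable, ...)
Applies function to every element of iterable. In Python 3 the result is a lazy map object (an iterator) rather than a list; wrap it in list() to get a list.

Example:

l = ['a', 'w', 'y']
res = map(lambda x: x + '_12', l)
print(res)
print(list(res))


nums = (2, 4, 9, 10)
res1 = map(lambda x: x ** 2, nums)
print(list(res1))

Output

<map object at 0x108e0bef0>
['a_12', 'w_12', 'y_12']
[4, 16, 81, 100]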

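reduce(function, sequence[, initial]) -> value

reduce calls function cumulatively on the items of sequence, from left to right; function must accept two arguments. If a third argument is given, it is used as the starting value of the accumulation. The result is a single value. In Python 3, reduce is no longer a built-in and must be imported from functools.

from functools import reduce

l = [1, 2, 3, 4, 5]
print(reduce(lambda x, y: x + y, l, 10))  # 10+1+2+3+4+5

Output

25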

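filter(function or None, iterable) -> filter object

filter calls function(item) on each item of iterable in turn and keeps the items for which the result is true (non-zero); items for which it is false (0) are dropped. In Python 3 the result is a lazy filter object, so wrap it in list() to see the items. (In Python 2 it returned a list, tuple or string, depending on the type of the input sequence.)

l = ['a_SB', 'w_SB', 'y', 'egon']

res = filter(lambda x: x.endswith('SB'), l)
print(res)
print(list(res))

Output

<filter object at 0x10bc43ef0>
['a_SB', 'w_SB']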

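III. Recursive calls

1. Definition

Recursion means that a procedure or function calls itself. Any recursive strategy must have a clear termination condition, known as the recursion exit (base case).

2. Thinking recursively

Example:

The factorial function is defined as:
N! = factorial(N) = 1 * 2 * 3 * ... * N

It can also be viewed this way:
factorial(N) = N!
             = N * (N - 1)!
             = N * (N - 1) * (N - 2)!
             = N * (N - 1) * (N - 2) * ... * 3 * 2 * 1
             = N * factorial(N - 1)

This gives us the recursive version of the factorial function: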

def factorial(n):
    if n == 0 or n == 1:
        return 1
    else:
        return n * factorial(n - 1)


print(factorial(6))

This readily gives 6! = 720.

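Every recursive program follows the same basic steps:
1. Initialize the algorithm. A recursive program usually needs a seed value to start from. This can be done by passing arguments to the function, or by providing a non-recursive entry function that sets up the seed value for the recursive computation.
2. Check whether the current value matches the base case. If it does, process it and return a value.
3. Redefine the answer in terms of a smaller or simpler subproblem (or several subproblems).
4. Run the algorithm on the subproblem.
5. Combine the results into the expression for the answer.
6. Return the result.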
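The steps above can be sketched on a second, slightly different problem: summing the numbers in an arbitrarily nested list. The function name nested_sum and the sample data are assumptions made for this illustration only.

```python
def nested_sum(data):
    # Step 2: base case, a plain number needs no further splitting.
    if isinstance(data, (int, float)):
        return data
    # Steps 3-5: each element is a smaller subproblem; run the same
    # algorithm on it and combine the partial results with sum().
    return sum(nested_sum(item) for item in data)


# Step 1: the argument passed in here is the seed value.
total = nested_sum([1, [2, 3], [4, [5, 6]]])
print(total)
```

Each call either hits the base case or hands strictly smaller pieces to itself, which is what guarantees that the recursion terminates.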

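3. Uses

Recursive algorithms are generally used for three kinds of problems:
(1) The data is defined recursively (for example, the Fibonacci function).
(2) The solution is naturally expressed as a recursive algorithm (backtracking).
(3) The data structure is defined recursively (for example, tree traversal and graph search).

Drawbacks of recursion: recursive solutions tend to run less efficiently. During recursive calls the system allocates stack space for the return point and local variables of every level, so too many levels of recursion can easily cause a stack overflow.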
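In CPython the stack-overflow risk shows up as a RecursionError once the interpreter's recursion limit is exceeded. The sketch below is one way to observe this and to sidestep it with a loop; the helper name factorial_iter is hypothetical.

```python
import sys


def factorial(n):
    # Recursive version: one stack frame per level.
    return 1 if n <= 1 else n * factorial(n - 1)


def factorial_iter(n):
    # Iterative version: constant stack depth.
    result = 1
    for k in range(2, n + 1):
        result *= k
    return result


limit = sys.getrecursionlimit()   # commonly 1000, but it is configurable

try:
    factorial(limit + 100)        # deeper than the interpreter allows
    overflowed = False
except RecursionError:
    overflowed = True

print(overflowed, factorial_iter(6))
```

sys.setrecursionlimit() can raise the limit, but rewriting a deep recursion as a loop is usually the safer choice.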

4. Binary search

l = [1, 2, 10, 33, 53, 71, 73, 75, 77, 85, 101, 201, 202, 999, 11111]

def search(find_num, seq):
    if len(seq) == 0:
        print('not exists')
        return
    mid_index = len(seq) // 2
    mid_num = seq[mid_index]
    print(seq, mid_num)
    if find_num > mid_num:
        # in the right half
        seq = seq[mid_index + 1:]
        search(find_num, seq)
    elif find_num < mid_num:
        # in the left half
        seq = seq[:mid_index]
        search(find_num, seq)
    else:
        print('find it')

search(77, l)
search(72, l)
search(-100000, l)

Output

[1, 2, 10, 33, 53, 71, 73, 75, 77, 85, 101, 201, 202, 999, 11111] 75
[77, 85, 101, 201, 202, 999, 11111] 201
[77, 85, 101] 85
[77] 77
find it
[1, 2, 10, 33, 53, 71, 73, 75, 77, 85, 101, 201, 202, 999, 11111] 75
[1, 2, 10, 33, 53, 71, 73] 33
[53, 71, 73] 71
[73] 73
not exists
[1, 2, 10, 33, 53, 71, 73, 75, 77, 85, 101, 201, 202, 999, 11111] 75
[1, 2, 10, 33, 53, 71, 73] 33
[1, 2, 10] 2
[1] 1
not exists
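The recursive version above copies part of the list on every call because of the slicing. As an alternative sketch, not the author's method, the standard-library bisect module performs the same halving on indices without copying; the function name contains is made up for this example.

```python
from bisect import bisect_left


def contains(sorted_seq, target):
    """Return True if target is in sorted_seq, using binary search on indices."""
    pos = bisect_left(sorted_seq, target)   # leftmost position where target could be inserted
    return pos < len(sorted_seq) and sorted_seq[pos] == target


l = [1, 2, 10, 33, 53, 71, 73, 75, 77, 85, 101, 201, 202, 999, 11111]
results = [contains(l, 77), contains(l, 72), contains(l, -100000)]
print(results)
```

Both approaches require the input to be sorted already.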

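IV. Importing modules

1. Definition

A Python module is a Python file ending in .py that contains Python object definitions and Python statements.

Modules let you organize your Python code logically.

Grouping related code into a module makes the code easier to use and easier to understand.

A module can define functions, classes and variables, and it can also contain executable code.

There are built-in modules, user-defined modules and third-party modules.

2. Purpose

The biggest benefit is much better maintainability. Second, you do not have to write code from scratch: once a module is written, it can be used from anywhere else. When writing programs we also frequently use other modules, both those that ship with Python and those from third parties.

Modules also help avoid clashes between function and variable names. Functions and variables with the same name can live in different modules, so when writing our own module we do not need to worry about clashing with names in other modules. Do take care, however, not to shadow the names of built-in functions.

3. import

Once a module is defined, it can be brought in with the import statement. The syntax is:

import module1[, module2[, ... moduleN]]

Example: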

# spam.py
print('from the spam.py')

money = 1000

def read1():
    print('spam->read1->money', money)

def read2():
    print('spam->read2 calling read')
    read1()

def change():
    global money
    money = 0

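Importing the module

import spam  # spam.py is executed only on the first import. The visible effect here is that 'from the spam.py' is printed only once; the other top-level statements run too, they just produce no visible output.

Result:
from the spam.py

Note: a module can contain executable statements as well as function definitions. These statements initialize the module, and they run only the first time the module name is encountered in an import statement. (An import statement may appear anywhere in a program, and the same module may be imported many times. To avoid loading it repeatedly, Python caches the module in memory on the first import; later import statements only add another reference to the already loaded module object and do not re-execute the module's statements.)

What import does:
1. Creates a new namespace.
2. Executes the file's code with that new namespace as its global namespace.
3. Binds the module name spam in the importing file to the namespace produced by spam.py.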
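The caching behaviour can be observed through sys.modules. The sketch below writes a throwaway module to a temporary directory so that it is self-contained; the module name spam_demo is hypothetical and stands in for spam.py.

```python
import importlib
import os
import sys
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Create a tiny stand-in for spam.py.
    with open(os.path.join(tmp, "spam_demo.py"), "w") as f:
        f.write("print('from the spam_demo.py')\nmoney = 1000\n")
    sys.path.insert(0, tmp)
    try:
        import spam_demo                 # first import: the file is executed
        import spam_demo as again        # second import: served from the cache
        same_object = again is sys.modules["spam_demo"]

        first_money = spam_demo.money
        spam_demo.money = 0              # modify the cached module object
        importlib.reload(spam_demo)      # explicitly re-execute the file
        reloaded_money = spam_demo.money
    finally:
        sys.path.remove(tmp)

print(same_object, first_money, reloaded_money)
```

The module's print runs once for the first import and once more for the explicit reload, but not for the second import statement.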

Test 1: money does not clash with spam.money

# test.py
import spam
money = 10
print(spam.money)

'''
Result:
from the spam.py
1000
'''

Test 2: read1 does not clash with spam.read1

# test.py
import spam
def read1():
    print('========')
spam.read1()

'''
Result:
from the spam.py
spam->read1->money 1000
'''

Test 3: the global variable money modified by spam.change() is still the one in spam

# test.py
import spam
money = 1
spam.change()
print(money)

'''
Result:
from the spam.py
1
'''

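as: giving a module an alias

import spam as sm  # sm is an alias for spam
print(sm.money)

Example:

Aliasing an imported module is very useful for writing extensible code. Suppose there are two modules, xmlreader.py and csvreader.py, which both define a function read_data(filename) that reads data from a file, but each for a different input format. Code can then choose the reader module selectively, for example:

if file_format == 'xml':
    import xmlreader as reader
elif file_format == 'csv':
    import csvreader as reader
data = reader.read_data(filename)

Importing several modules on one line

import sys, os, re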

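4. from ... import

from modname import name1[, name2[, ... nameN]]

To import the fibonacci function from the module fib, use:

from fib import fibonacci

This statement does not bring the whole fib module into the current namespace; it only introduces the single name fibonacci from fib into the global symbol table of the module that executes the statement.

What from ... import ... does:

1. Creates a new namespace.
2. Executes the file's code with that new namespace as its global namespace.
3. Binds the requested names from the namespace produced by spam.py directly in the current namespace.

# Test 1: the imported function read1 still looks up the global variable money in spam.py when it runs
# test.py
from spam import read1
money = 1000
read1()
'''
Result:
from the spam.py
spam->read1->money 1000
'''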

# Test 2: the imported function read2 calls read1() when it runs, and still finds read1() in spam.py
# test.py
from spam import read2
def read1():
    print('==========')
read2()

'''
Result:
from the spam.py
spam->read2 calling read
spam->read1->money 1000
'''

However, if the current file defines its own read1 or read2, the imported name is overwritten.

# Test 3: the imported function read1 is overwritten by the read1 defined in the current file
# test.py
from spam import read1
def read1():
    print('==========')
read1()
'''
Result:
from the spam.py
==========
'''

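One point deserves special emphasis: in Python, assigning to a variable does not store a value into a shared location; it only binds a name to an object. For example:

from spam import money, read1
money = 100  # rebinds the name money in the current file to 100
print(money)  # prints the current file's name
read1()  # reads the name money inside spam.py, which is still 1000

'''
from the spam.py
100
spam->read1->money 1000
'''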
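The same binding behaviour works in the other direction: a name obtained with from ... import does not follow later rebinding inside the module. The sketch below simulates spam.py in memory with types.ModuleType so that it runs on its own; building a module this way is an assumption made purely for illustration.

```python
import types

# Build an in-memory stand-in for spam.py.
spam = types.ModuleType("spam")
exec(
    "money = 1000\n"
    "def change():\n"
    "    global money\n"
    "    money = 0\n",
    spam.__dict__,
)

money = spam.money   # roughly what `from spam import money` does: bind a local name
spam.change()        # rebinds the name money inside the module only

print(money, spam.money)
```

The local name still refers to the original object 1000, while spam.money now refers to 0.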

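from ... import ...
Advantage: convenient, no prefix needed
Disadvantage: names can easily clash with those in the current file's namespace

from ... import ... also supports as and importing several names at once

from spam import read1 as read

from spam import (read1,
                  read2,
                  money)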

from spam import *

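This imports every name in spam that does not begin with an underscore (_) into the current namespace. In most cases a Python program should not use this form: with * you do not know which names you are importing, so it can easily overwrite names you have already defined, and readability suffers badly. It is fine in an interactive session.

from spam import *  # import all public names of module spam into the current namespace
print(money)
print(read1)
print(read2)
print(change)

'''
Result:
from the spam.py
1000
<function read1 at 0x1012e8158>
<function read2 at 0x1012e81e0>
<function change at 0x1012e8268>
'''

__all__ controls what * imports (useful when publishing a new version)

Add one line to spam.py:

__all__ = ['money', 'read1']  # now from spam import * in another file imports only the two names listed here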
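The effect of __all__ can be checked without creating files. This sketch registers an in-memory module under the hypothetical name spam_all_demo and runs the star import into an empty namespace; registering a hand-built module in sys.modules is a testing trick assumed here for illustration, not something the article itself does.

```python
import sys
import types

mod = types.ModuleType("spam_all_demo")
exec(
    "__all__ = ['money', 'read1']\n"
    "money = 1000\n"
    "def read1():\n"
    "    return money\n"
    "def read2():\n"
    "    return read1()\n",
    mod.__dict__,
)
sys.modules["spam_all_demo"] = mod   # make it importable by name

namespace = {}
exec("from spam_all_demo import *", namespace)

# Ignore __builtins__, which exec adds to the namespace on its own.
imported = sorted(name for name in namespace if not name.startswith("__"))
print(imported)
```

read2 is public by naming convention, but it is not imported because it is not listed in __all__.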

__name__

When spam.py is run as a script, __name__ == '__main__'
When spam.py is imported as a module, __name__ is the module name

Purpose: lets a .py file run different logic depending on how it is used

# fib.py

def fib(n):    # write Fibonacci series up to n
    a, b = 0, 1
    while b < n:
        print(b, end=' ')
        a, b = b, a+b
    print()

def fib2(n):   # return Fibonacci series up to n
    result = []
    a, b = 0, 1
    while b < n:
        result.append(b)
        a, b = b, a+b
    return result

if __name__ == "__main__":
    import sys
    fib(int(sys.argv[1]))

Running it

# python fib.py <arguments>
python fib.py 50  # on the command line

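5. The module search path

Modules are looked up in this order: modules already loaded in memory -> built-in modules -> modules found on the sys.path directories

The Python interpreter loads some modules automatically at startup; they can be seen in sys.modules. The first time a module (say spam) is imported, Python first checks whether it has already been loaded into memory, and if so uses it directly.

If not, the interpreter looks for a built-in module with that name, and if that also fails it searches the directories listed in sys.path, in order, for spam.py.

Take particular care that your own module names do not duplicate the names of built-in or standard-library modules.

After startup, a Python program may modify sys.path; directories placed at the front are searched before the standard library.

Note: sys.path is searched from left to right, so earlier entries take priority. sys.path may also contain .zip archives and .egg files; Python treats a .zip archive as if it were a directory.

# first build the archive: zip module.zip foo.py bar.py

import sys
sys.path.append('module.zip')
import foo, bar

# a specific location inside the zip's directory structure can also be used
sys.path.append('module.zip/lib/python')

Note: on Windows, write paths as raw strings (with an r prefix). Without it, backslash sequences such as \U are treated as escape sequences and cause a syntax error.

sys.path.insert(0, r'C:\Users\Administrator\PycharmProjects\a')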

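V. Importing packages

1. Definition

A package is a way of organizing Python's module namespace by using dotted module names.

Whether you use the import form or the from ... import form, whenever you see a dot in an import statement (as opposed to in later use), remember immediately: this is import syntax that only applies to packages. Whatever is to the left of a dot must be a package.

A package is a hierarchical directory structure that defines a Python application environment made up of modules, subpackages, subpackages of subpackages, and so on. Put simply, a package is a folder that contains an __init__.py file, whose content may be empty. __init__.py marks the folder as a package. (Since Python 3.3 a folder without __init__.py can also be imported as a namespace package, but regular packages still use __init__.py.)

A package is a collection of modules. A module is a collection of functions and classes that deal with one kind of problem.

Example: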

glance/                   # Top-level package
├── __init__.py           # Initialize the glance package
├── api                   # Subpackage for api
│   ├── __init__.py
│   ├── policy.py
│   └── versions.py
├── cmd                   # Subpackage for cmd
│   ├── __init__.py
│   └── manage.py
└── db                    # Subpackage for db
    ├── __init__.py
    └── models.py

File contents

# policy.py
def get():
    print('from policy.py')

# versions.py
def create_resource(conf):
    print('from version.py: ', conf)

# manage.py
def main():
    print('from manage.py')

# models.py
def register_models(engine):
    print('from models.py: ', engine)

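2. import

Before using a function or class from a module, the module must be imported with an import statement.

When calling a function or class from the module, prefix it with the module name.

When import loads a file, the names in the resulting namespace come from that file. When import loads a package, the names in the resulting namespace also come from a file, namely the __init__.py inside the package; importing a package essentially means importing that file.

Example
Test this in a file located at the same level as the glance package

import glance.db.models
glance.db.models.register_models('mysql')

Whenever an import contains a dot, whatever is to the left of the dot must be a package.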

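3. from ... import ...

If you do not want to use the prefix in your program, use the from ... import ... statement to import the module.

Note that the name after import in a from statement must be a single plain name with no dots, otherwise it is a syntax error. For example, from a import b.c is invalid.

Example

Test this in a file located at the same level as the glance package

from glance.db import models
models.register_models('mysql')

from glance.db.models import register_models
register_models('mysql')

The directory of the file being executed is the first entry in sys.path

import sys
print(sys.path)

Notes:

1. Package imports also come in the two forms import and from ... import ..., but in either form, and wherever they appear, one rule must be followed at import time: whenever the import contains a dot, whatever is to the left of the dot must be a package, otherwise the import is invalid. There can be a whole chain of dots, such as item.subitem.subsubitem, but the rule applies to every one of them.

2. After the import, there is no such restriction when the names are used: the left side of a dot can be a package, module, function or class (all of them can access their own attributes with a dot).

3. Comparing when to use import item and from item import name:
if we want to use name directly, we must use the latter.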
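4. The __init__.py file

With either form, the first time a package or any part of a package is imported, the __init__.py files of the packages along the way are executed in order (this can be verified by printing a line in each package's __init__.py). The file may be empty, but it can also hold code that initializes the package.

5. from glance.api import *

This imports everything from the package api. In fact the statement only imports the names defined in the __init__.py of the package api. We can define __all__ in that file:

# defined in __init__.py
x = 10

def func():
    print('from api.__init__.py')

__all__ = ['x', 'func', 'policy']

Now, running from glance.api import * in a file at the same level as glance imports what is listed in __all__ (versions is still not imported).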

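6. Absolute and relative imports

The top-level package glance is written for others to use, but inside the glance package the modules also need to import one another. There are two ways to do this, absolute imports and relative imports:

Absolute import: start from glance

Relative import: start with . or .. (only usable inside a package, not across unrelated directories)

Example: in glance/api/versions.py we want to import glance/cmd/manage.py

In glance/api/versions.py

# absolute import
from glance.cmd import manage
manage.main()

# relative import
from ..cmd import manage
manage.main()

Testing: be sure to test from a file at the same level as glance

from glance.api import versions

Note: import can be used for built-in or third-party modules (which are already on sys.path), but inside a package you should strictly avoid using a bare import for the package's own submodules (which are not on sys.path). Use an absolute or relative from ... import ... instead. Relative imports within a package can only be written in the from form.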

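7. Importing only the package

Importing just the package name does not import the submodules the package contains. For example:

# in test.py at the same level as glance
import glance
glance.cmd.manage.main()

'''
Result:
AttributeError: module 'glance' has no attribute 'cmd'

'''

Solution:

# glance/__init__.py
from . import cmd

# glance/cmd/__init__.py
from . import manage

Run:

# in test.py at the same level as glance
import glance
glance.cmd.manage.main()

In case you wonder whether __all__ could solve this: it cannot, because __all__ only controls from ... import *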
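The behaviour can be reproduced end to end with a throwaway package. The sketch below creates a reduced copy of the tree in a temporary directory; the package name glance_demo and the helper write are hypothetical, chosen to avoid clashing with any installed package named glance.

```python
import os
import sys
import tempfile


def write(path, text=""):
    # Create parent folders as needed, then write the file.
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w") as f:
        f.write(text)


with tempfile.TemporaryDirectory() as root:
    pkg = os.path.join(root, "glance_demo")
    write(os.path.join(pkg, "__init__.py"))                 # empty: imports no submodules
    write(os.path.join(pkg, "cmd", "__init__.py"))
    write(os.path.join(pkg, "cmd", "manage.py"),
          "def main():\n    return 'from manage.py'\n")
    sys.path.insert(0, root)
    try:
        import glance_demo
        has_cmd_before = hasattr(glance_demo, "cmd")        # submodule not loaded yet

        import glance_demo.cmd.manage                       # loads and binds cmd and manage
        has_cmd_after = hasattr(glance_demo, "cmd")
        result = glance_demo.cmd.manage.main()
    finally:
        sys.path.remove(root)

print(has_cmd_before, has_cmd_after, result)
```

Importing a submodule binds it as an attribute on its parent package, which is exactly what the from . import lines in the __init__.py files arrange ahead of time.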

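VI. Regular expressions (re)

1. What a regular expression is

A regular expression is a way of describing characters or strings by combining symbols that have special meanings. Put differently, a regular expression is a rule that describes a class of things. In Python it is built into the language and provided through the re module. A regular expression pattern is compiled into a series of bytecodes, which are then executed by a matching engine written in C.

2. Common regular expression syntax

(Figure: table of common regular expression syntax; the image is not available in this copy.)

3. Greedy and non-greedy modes

Regular expressions are typically used to find matching strings in text. In Python, quantifiers are greedy by default (in a few languages the default may be non-greedy) and always try to match as many characters as possible; non-greedy quantifiers do the opposite and always try to match as few characters as possible. For example, the regular expression "ab*" applied to "abbbc" finds "abbb", whereas the non-greedy quantifier in "ab*?" finds "a".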
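The example from the paragraph can be checked directly. The second pair of patterns, matching tag-like text, is an extra illustration that is not in the original.

```python
import re

greedy = re.search(r"ab*", "abbbc").group()      # b* takes as many b's as it can
lazy = re.search(r"ab*?", "abbbc").group()       # b*? takes as few as it can

# The difference matters most when something must follow the quantifier.
tags_greedy = re.findall(r"<.*>", "<b>x</b>")
tags_lazy = re.findall(r"<.*?>", "<b>x</b>")

print(greedy, lazy, tags_greedy, tags_lazy)
```

Adding ? after a quantifier (*?, +?, ??, {m,n}?) switches just that quantifier to non-greedy mode.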

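4. The backslash

As in most programming languages, regular expressions use "\" as the escape character, which can lead to a plague of backslashes. To match a literal "\" in text, the regular expression written as an ordinary string literal needs four backslashes, "\\\\": each pair is first turned into a single backslash by the string literal, and the resulting two backslashes are then interpreted by the regular expression engine as one escaped backslash. Python's raw strings solve this neatly: the regular expression in this example can be written as r"\\". Likewise, "\\d", which matches a digit, can be written as r"\d". With raw strings you no longer need to worry about missing a backslash, and the expressions are easier to read.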
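A quick way to convince yourself of the two-level escaping is to compare the strings themselves. The sample text below is made up for this sketch.

```python
import re

text = r"C:\temp"                      # contains exactly one backslash

plain = re.findall("\\\\", text)       # ordinary literal: four backslashes
raw = re.findall(r"\\", text)          # raw literal: two backslashes

same_pattern = "\\\\" == r"\\"         # both literals are the same 2-character string
same_digit = "\\d" == r"\d"

print(len(text), plain == raw, same_pattern, same_digit)
```

In both cases the regular expression engine receives two backslash characters, which it reads as one literal backslash.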

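5. The re module

Python supports regular expressions through the re module. The usual workflow is to compile the string form of the regular expression into a Pattern instance, use the Pattern instance to process text and obtain a match result (a Match instance), and finally use the Match instance to extract information and do further work.

Example:

import re

# compile the regular expression into a Pattern object
pattern = re.compile(r'hello')

# match text with the Pattern; returns None if there is no match
match = pattern.match('hello world!')

if match:
    # use the Match object to get group information
    print(match.group())

Output

hello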

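Match

A Match object is the result of one match and holds a lot of information about it, which can be read through the attributes and methods that Match provides.

Attributes:

string: the text used for matching.
re: the Pattern object used for matching.
pos: the index in the text where the regular expression started searching. It has the same value as the argument of the same name passed to Pattern.match() or Pattern.search().
endpos: the index in the text where the regular expression stopped searching. It has the same value as the argument of the same name passed to Pattern.match() or Pattern.search().
lastindex: the group number of the last captured group. None if no group was captured.
lastgroup: the name of the last captured group. None if that group has no name or no group was captured.

Methods:

group([group1, …]):

Returns the string captured by one or more groups; with several arguments the result is a tuple. group1 can be a number or a name; number 0 stands for the whole matched substring; with no argument it returns group(0). A group that captured nothing returns None; a group that captured several times returns the last captured substring.

groups([default]):

Returns the strings captured by all groups as a tuple, equivalent to calling group(1, 2, … last). default is the value used for groups that captured nothing; it defaults to None.

groupdict([default]):

Returns a dictionary whose keys are the names of the named groups and whose values are the substrings those groups captured; unnamed groups are not included. default has the same meaning as above.

start([group]):

Returns the start index in string of the substring captured by the given group (the index of its first character). group defaults to 0.

end([group]):

Returns the end index in string of the substring captured by the given group (the index of its last character + 1). group defaults to 0.

span([group]):

Returns (start(group), end(group)).

expand(template):

Substitutes the matched groups into template and returns the result. template can refer to groups with \id, \g<id> or \g<name>. \id and \g<id> are equivalent for groups 1 and above, but \10 is read as group 10; to express \1 followed by the character '0', you must write \g<1>0. The whole match cannot be written as \0 (that is an octal escape); use \g<0> instead.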

Example:

import re
m = re.match(r'(\w+) (\w+)(?P<sign>.*)', 'hello world!')

print("m.string:", m.string)
print("m.re:", m.re)
print("m.pos:", m.pos)
print("m.endpos:", m.endpos)
print("m.lastindex:", m.lastindex)
print("m.lastgroup:", m.lastgroup)

print("m.group(1,2):", m.group(1, 2))
print("m.groups():", m.groups())
print("m.groupdict():", m.groupdict())
print("m.start(2):", m.start(2))
print("m.end(2):", m.end(2))
print("m.span(2):", m.span(2))
print(r"m.expand(r'\2 \1\3'):", m.expand(r'\2 \1\3'))

Output

### output ###
# m.string: hello world!
# m.re: re.compile('(\\w+) (\\w+)(?P<sign>.*)')
# m.pos: 0
# m.endpos: 12
# m.lastindex: 3
# m.lastgroup: sign
# m.group(1,2): ('hello', 'world')
# m.groups(): ('hello', 'world', '!')
# m.groupdict(): {'sign': '!'}
# m.start(2): 6
# m.end(2): 11
# m.span(2): (6, 11)
# m.expand(r'\2 \1\3'): world hello!

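Pattern

A Pattern object is a compiled regular expression. The methods it provides are used to match and search text.

A Pattern cannot be instantiated directly; it must be built with re.compile().

Pattern provides several read-only attributes for obtaining information about the expression:

pattern: the expression string used at compile time.
flags: the matching mode used at compile time, as a number.
groups: the number of groups in the expression.
groupindex: a mapping whose keys are the names of the named groups in the expression and whose values are the corresponding group numbers; unnamed groups are not included.

Example:

import re
p = re.compile(r'(\w+) (\w+)(?P<sign>.*)', re.DOTALL)

print("p.pattern:", p.pattern)
print("p.flags:", p.flags)
print("p.groups:", p.groups)
print("p.groupindex:", dict(p.groupindex))

Output

### output ###
# p.pattern: (\w+) (\w+)(?P<sign>.*)
# p.flags: 48    (re.DOTALL is 16; in Python 3 a str pattern also carries re.UNICODE, 32)
# p.groups: 3
# p.groupindex: {'sign': 3}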

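Instance methods [ | re module functions]:

match(string[, pos[, endpos]]) | re.match(pattern, string[, flags]):

This method tries to match pattern starting at index pos of string. If the whole pattern can be matched, it returns a Match object; if the pattern fails to match at some point, or endpos is reached before the pattern is complete, it returns None.

pos and endpos default to 0 and len(string) respectively. re.match() cannot take these two arguments; its flags argument sets the matching mode used when compiling pattern.

Note: this is not a full match. If string still has characters left when pattern is exhausted, the match still succeeds. To require a full match, add the boundary anchor '$' to the end of the expression.

For an example, see the first code sample in section 5 (the re module).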
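The note about partial matching can be illustrated as follows. re.fullmatch is not mentioned in the original article; it is shown here as an alternative to appending '$', on the assumption that Python 3.4 or later is in use.

```python
import re

text = "hello world!"

prefix = re.match(r"hello", text)            # succeeds: the pattern only has to match a prefix
anchored = re.match(r"hello$", text)         # fails: '$' requires the string to end after hello
full = re.fullmatch(r"hello", text)          # fails: the whole string must match
full_ok = re.fullmatch(r"hello world!", text)

print(prefix.group(), anchored, full, full_ok.group())
```

Both the '$' form and fullmatch reject text that has trailing characters the pattern does not cover.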

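search(string[, pos[, endpos]]) | re.search(pattern, string[, flags]):

This method looks for a substring of string that can be matched. It tries to match pattern starting at index pos of string; if the whole pattern can be matched it returns a Match object. If not, it adds 1 to pos and tries again, and if there is still no match when pos reaches endpos it returns None.
pos and endpos default to 0 and len(string) respectively. re.search() cannot take these two arguments; its flags argument sets the matching mode used when compiling pattern.

import re

# compile the regular expression into a Pattern object
pattern = re.compile(r'world')

# use search() to find a matching substring; returns None if there is none
# match() would not succeed in this example
match = pattern.search('hello world!')

if match:
    # use the Match object to get group information
    print(match.group())

### output ###
# world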

split(string[, maxsplit]) | re.split(pattern, string[, maxsplit]):

Splits string at every substring that matches and returns the pieces as a list. maxsplit sets the maximum number of splits; if omitted, the string is split at every match.

import re

p = re.compile(r'\d+')
print(p.split('one1two2three3four4'))

### output ###
# ['one', 'two', 'three', 'four', '']

findall(string[, pos[, endpos]]) | re.findall(pattern, string[, flags]):

Searches string and returns all matching substrings as a list.

import re

p = re.compile(r'\d+')
print(p.findall('one1two2three3four4'))

### output ###
# ['1', '2', '3', '4']

finditer(string[, pos[, endpos]]) | re.finditer(pattern, string[, flags]):

Searches string and returns an iterator that yields each match result (a Match object) in order.

import re

p = re.compile(r'\d+')
for m in p.finditer('one1two2three3four4'):
    print(m.group(), end=' ')

### output ###
# 1 2 3 4

sub(repl, string[, count]) | re.sub(pattern, repl, string[, count]):

Replaces every matching substring of string with repl and returns the resulting string.
When repl is a string, it can refer to groups with \id, \g<id> or \g<name>. The whole match cannot be written as \0; use \g<0> instead.
When repl is a function, it should accept a single argument (the Match object) and return the replacement string (group references in the returned string are not expanded).
count sets the maximum number of replacements; if omitted, all matches are replaced.

import re

p = re.compile(r'(\w+) (\w+)')
s = 'i say, hello world!'

print(p.sub(r'\2 \1', s))

def func(m):
    return m.group(1).title() + ' ' + m.group(2).title()

print(p.sub(func, s))

### output ###
# say i, world hello!
# I Say, Hello World!

subn(repl, string[, count]) | re.subn(pattern, repl, string[, count]):

Returns (sub(repl, string[, count]), number of replacements).

import re

p = re.compile(r'(\w+) (\w+)')
s = 'i say, hello world!'

print(p.subn(r'\2 \1', s))

def func(m):
    return m.group(1).title() + ' ' + m.group(2).title()

print(p.subn(func, s))

### output ###
# ('say i, world hello!', 2)
# ('I Say, Hello World!', 2)

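re.compile(strPattern[, flag]):

This function is the factory for Pattern objects: it compiles a regular expression given as a string into a Pattern object. The second argument, flag, is the matching mode. Several modes can be combined with the bitwise OR operator '|', for example re.I | re.M. The modes can also be given inside the regex string: re.compile('pattern', re.I | re.M) is equivalent to re.compile('(?im)pattern').
The available values are:

re.I (re.IGNORECASE): ignore case (the full name is in parentheses, likewise below)
re.M (re.MULTILINE): multi-line mode; changes the behaviour of '^' and '$' (see the figure above)
re.S (re.DOTALL): dot-matches-all mode; changes the behaviour of '.'
re.L (re.LOCALE): makes the predefined character classes \w \W \b \B \s \S depend on the current locale
re.U (re.UNICODE): makes the predefined character classes \w \W \b \B \s \S \d \D depend on Unicode character properties (in Python 3 this is already the default for str patterns)
re.X (re.VERBOSE): verbose mode. In this mode the regular expression can span several lines, whitespace is ignored, and comments can be added. The following two regular expressions are equivalent:

a = re.compile(r"""\d +  # the integral part
                   \.    # the decimal point
                   \d *  # some fractional digits""", re.X)
b = re.compile(r"\d+\.\d*")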
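The equivalence between passing flags and writing them inline can be checked with a short experiment. The sample text is made up for this sketch.

```python
import re

text = "Hello\nhello"

with_flags = re.findall(r"^hello", text, re.I | re.M)   # flags passed as an argument
inline = re.findall(r"(?im)^hello", text)               # the same flags written in the pattern
no_flags = re.findall(r"^hello", text)                  # case-sensitive, '^' only at the very start

print(with_flags, inline, no_flags)
```

With re.M, '^' also matches right after each newline, and with re.I the capital H is accepted, so both lines are found.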

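re provides many module-level functions for working with regular expressions. Each of them can be replaced by the corresponding method of a Pattern instance; their only advantage is saving one line of re.compile() code, at the cost of not being able to reuse the compiled Pattern object. These functions are described together with the Pattern instance methods. For example, the first example in this section can be shortened to:

m = re.match(r'hello', 'hello world!')
print(m.group())

The re module also provides a function escape(string), which returns string with an escape character added before regular expression metacharacters such as *, + and ?. It is somewhat useful when you need to match a lot of metacharacters literally.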
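A minimal sketch of re.escape in use; the sample strings are invented for the illustration.

```python
import re

needle = "a.b"                       # the dot is meant literally here

escaped = re.escape(needle)          # the metacharacter is now backslash-escaped
literal_hits = re.findall(escaped, "a.b axb")
regex_hits = re.findall(needle, "a.b axb")   # unescaped: '.' matches any character

print(escaped, literal_hits, regex_hits)
```

This is mostly useful when the text to search for comes from user input or another source you do not control.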

6. Examples of common matching patterns

# ================================= matching patterns =================================
# exact one-to-one matching
# 'hello'.replace(old, new)
# 'hello'.find('pattern')

# regular expression matching
import re
# \w and \W
print(re.findall(r'\w', 'hello egon 123'))  # ['h', 'e', 'l', 'l', 'o', 'e', 'g', 'o', 'n', '1', '2', '3']
print(re.findall(r'\W', 'hello egon 123'))  # [' ', ' ']

# \s and \S
print(re.findall(r'\s', 'hello  egon  123'))  # [' ', ' ', ' ', ' ']
print(re.findall(r'\S', 'hello  egon  123'))  # ['h', 'e', 'l', 'l', 'o', 'e', 'g', 'o', 'n', '1', '2', '3']

# \d and \D
print(re.findall(r'\d', 'hello egon 123'))  # ['1', '2', '3']
print(re.findall(r'\D', 'hello egon 123'))  # ['h', 'e', 'l', 'l', 'o', ' ', 'e', 'g', 'o', 'n', ' ']

# \A and \Z
print(re.findall(r'\Ahe', 'hello egon 123'))  # ['he'], \A behaves like ^
print(re.findall(r'123\Z', 'hello egon 123'))  # ['123'], \Z behaves like $

# \n and \t
print(re.findall(r'\n', 'hello egon \n123'))  # ['\n']
print(re.findall(r'\t', 'hello egon\t123'))  # ['\t']

# ^ and $
print(re.findall('^h', 'hello egon 123'))  # ['h']
print(re.findall('3$', 'hello egon 123'))  # ['3']

# repetition: | . | * | ? | .* | .*? | + | {n,m} |
# .
print(re.findall('a.b', 'a1b'))  # ['a1b']
print(re.findall('a.b', 'a\nb'))  # []
print(re.findall('a.b', 'a\nb', re.S))  # ['a\nb']
print(re.findall('a.b', 'a\nb', re.DOTALL))  # ['a\nb'], same as the previous line

# *
print(re.findall('ab*', 'bbbbbbb'))  # []
print(re.findall('ab*', 'a'))  # ['a']
print(re.findall('ab*', 'abbbb'))  # ['abbbb']

# ?
print(re.findall('ab?', 'a'))  # ['a']
print(re.findall('ab?', 'abbb'))  # ['ab']
# match all numbers, including decimals
print(re.findall(r'\d+\.?\d*', "asdfasdf123as1.13dfa12adsf1asdf3"))  # ['123', '1.13', '12', '1', '3']

# .* is greedy by default
print(re.findall('a.*b', 'a1b22222222b'))  # ['a1b22222222b']

# .*? is non-greedy: recommended
print(re.findall('a.*?b', 'a1b22222222b'))  # ['a1b']

# +
print(re.findall('ab+', 'a'))  # []
print(re.findall('ab+', 'abbb'))  # ['abbb']

# {n,m}
print(re.findall('ab{2}', 'abbb'))  # ['abb']
print(re.findall('ab{2,4}', 'abbb'))  # ['abbb']
print(re.findall('ab{1,}', 'abbb'))  # 'ab{1,}' ===> 'ab+', ['abbb']
print(re.findall('ab{0,}', 'abbb'))  # 'ab{0,}' ===> 'ab*', ['abbb']

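# []
print(re.findall('a[1*-]b', 'a1b a*b a-b'))  # characters inside [] are ordinary characters; an unescaped - should go at the start or end of []. Result: ['a1b', 'a*b', 'a-b']
print(re.findall('a[^1*-]b', 'a1b a*b a-b a=b'))  # ^ inside [] means negation, so the result is ['a=b']
print(re.findall('a[0-9]b', 'a1b a*b a-b a=b'))  # 0-9 is a range of digits, so the result is ['a1b']
print(re.findall('a[a-z]b', 'a1b a*b a-b a=b aeb'))  # a-z is the range of lowercase letters, so the result is ['aeb']
print(re.findall('a[a-zA-Z]b', 'a1b a*b a-b a=b aeb aEb'))  # lowercase and uppercase letters, so the result is ['aeb', 'aEb']

# backslash
# print(re.findall('a\\c', r'a\c'))  # as a regex, a\\c would match a\c; but the Python interpreter first unescapes the literal 'a\\c' to a\c before passing it to re, which then sees the invalid escape \c and raises an exception
print(re.findall(r'a\\c', r'a\c'))  # r tells the interpreter to use a raw string: the backslashes in the pattern are passed to re unchanged. Result: ['a\\c']
print(re.findall('a\\\\c', r'a\c'))  # same meaning and same result as the line above: ['a\\c']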

# (): grouping
print(re.findall('ab+', 'ababab123'))  # ['ab', 'ab', 'ab']
print(re.findall('(ab)+123', 'ababab123'))  # ['ab'], the ab from the final ab123
print(re.findall('(?:ab)+123', 'ababab123'))  # findall returns the content of the group rather than the whole match; ?: makes the group non-capturing so the whole match is returned: ['ababab123']

# |
print(re.findall('compan(?:y|ies)', 'Too many companies have gone bankrupt, and the next one is my company'))  # ['companies', 'company']

import re
print(re.findall(r"<(?P<tag_name>\w+)>\w+</(?P=tag_name)>", "<h1>hello</h1>"))  # ['h1']
print(re.search(r"<(?P<tag_name>\w+)>\w+</(?P=tag_name)>", "<h1>hello</h1>").group())  # <h1>hello</h1>
print(re.search(r"<(?P<tag_name>\w+)>\w+</(?P=tag_name)>", "<h1>hello</h1>").groupdict())  # {'tag_name': 'h1'}

print(re.search(r"<(\w+)>\w+</(\w+)>", "<h1>hello</h1>").group())  # <h1>hello</h1>
print(re.search(r"<(\w+)>\w+</\1>", "<h1>hello</h1>").group())  # <h1>hello</h1>

import re

print(re.findall(r'-?\d+\.?\d*', "1-12*(60+(-40.35/5)-(-4*3))"))  # find all numbers: ['1', '-12', '60', '-40.35', '5', '-4', '3']


# With |, the alternative on the left is tried first. The left side matches decimals, but findall reports the content of the group, so a successfully matched decimal is stored as an empty string.
# When the number is not a decimal, (-?\d+) is tried instead, and what it captures is a non-decimal number, that is, an integer.
print(re.findall(r"-?\d+\.\d*|(-?\d+)", "1-2*(60+(-40.35/5)-(-4*3))"))  # find all integers: ['1', '-2', '60', '', '5', '-4', '3']

Original article: http://www.cnblogs.com/smallmars/p/6935269.html
