Python Iterators & Generators

Date: 2015-02-28 17:57:58


I was reading some code, ran into yield, and got lost again. Sigh...

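Iterators:

Python has built-in container types such as list, dict, and tuple. We will call these containers. Any container can be traversed with for ... in ....

So what makes that traversal work?

The iterator object. It is what lets the loop step through the elements one at a time.

An iterator generally has two methods:

① __iter__() ② next() (named __next__() in Python 3)

Let's first look at how for ... in ... actually runs:

It first calls __iter__(), which returns an iterator object. It then calls that iterator's next() method once per pass to fetch each value, until next() raises StopIteration.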

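class Iterators(object):

    def __init__(self):
        self.value = 1

    def __iter__(self):
        return self  # the object is its own iterator

    def next(self):  # Python 2 name of the method
        self.value += 1
        return self.value

    __next__ = next  # Python 3 looks up __next__ instead

a = Iterators()
for value in a:  # next() never raises StopIteration, so this loop never ends
    print(value)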

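In the example above, for ... in first calls __iter__(), which returns an iterator object (here the instance itself), and then calls that object's next() method on every pass. Because this next() never raises StopIteration, the loop prints 2, 3, 4, ... without end.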
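The steps a for ... in loop performs can be written out by hand. The sketch below assumes Python 3, where the method is spelled `__next__` and is reached through the built-in `next()`. `CountUp` and `manual_for` are hypothetical names used only for illustration, and unlike the class above, `CountUp` raises StopIteration so the loop can end.

```python
class CountUp(object):
    """Hypothetical iterator that produces 2, 3, ..., limit."""

    def __init__(self, limit):
        self.value = 1
        self.limit = limit

    def __iter__(self):
        return self  # the object is its own iterator

    def __next__(self):
        if self.value >= self.limit:
            raise StopIteration  # signals the loop to stop
        self.value += 1
        return self.value


def manual_for(iterable):
    """Do by hand roughly what `for item in iterable` does."""
    result = []
    iterator = iter(iterable)      # calls iterable.__iter__()
    while True:
        try:
            item = next(iterator)  # calls iterator.__next__()
        except StopIteration:
            break                  # the for loop swallows this exception
        result.append(item)
    return result


print(manual_for(CountUp(4)))  # [2, 3, 4]
print(list(CountUp(4)))        # [2, 3, 4]
```

Both lines print the same list, which suggests the hand-written loop and the built-in iteration follow the same protocol.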

class Iterators(object):

    def __init__(self):
        self.value = 1

    def __iter__(self):
        return iter([1, 2, 3])  # hand back the list's iterator instead of self

    def next(self):  # no longer called by the for loop
        self.value += 1
        return self.value

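However, once line 7 of the code (the return statement in __iter__) is changed like this, the output is different: the loop prints 1, 2, 3, because the iterator object returned is now iter([1, 2, 3]).

(P.S. iter is a Python built-in function that returns an iterator object.)

So the next() being called is the one that belongs to the iterator over [1, 2, 3], not the one defined in the class, and the output is 1, 2, 3.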
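To check that the class's own method really is bypassed, here is a minimal sketch assuming Python 3 (so the method is `__next__`). `Wrapper` and its `calls` counter are hypothetical additions for illustration; the counter records how often the class's own `__next__` runs.

```python
class Wrapper(object):
    """Hypothetical container that delegates iteration to a list."""

    def __init__(self):
        self.calls = 0  # how many times our own __next__ was used

    def __iter__(self):
        return iter([1, 2, 3])  # the list iterator does the work

    def __next__(self):
        self.calls += 1
        return -1


w = Wrapper()
collected = list(w)  # iterates exactly like a for loop would
print(collected)     # [1, 2, 3]
print(w.calls)       # 0: Wrapper.__next__ was never called
```

The loop only ever talks to whatever object `__iter__` returns, so a `next`/`__next__` defined on the container itself is irrelevant unless `__iter__` returns `self`.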

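That is about all you need to know about iterators.

Generators:

Put simply, a generator is a function that contains yield. The yield statement is a bit like return: it hands a value back to the caller, but the function is suspended rather than finished, and it resumes from that point the next time a value is requested. A small example: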

def generator():
    yield 1
    yield 2


a = generator()
print(next(a))  # same as a.next() in Python 2
print(next(a))

This prints 1, then 2.

Here, a = generator() does not run the function body; it creates a generator object.
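The fact that calling the function does not run its body can be observed directly. The following is a minimal sketch assuming Python 3; the `log` list is a hypothetical helper added only to record which parts of the body have executed.

```python
log = []  # records which parts of the generator body have run


def generator():
    log.append("body started")
    yield 1
    log.append("resumed after first yield")
    yield 2


a = generator()
print(log)          # []: creating the generator ran nothing

first = next(a)     # runs the body up to the first yield
print(first, log)

second = next(a)    # resumes after the first yield, stops at the second
print(second, log)
```

Each `next()` call runs the body only as far as the next yield, which is why generators are often described as lazy.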

A generator is a kind of iterator.

So a generator also has a next() method (__next__() in Python 3) and can be used in for ... in ....
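A quick way to see that a generator satisfies the iterator protocol is shown below, as a small sketch assuming Python 3. It reuses the same two-value `generator` function; the variable names are only for illustration.

```python
def generator():
    yield 1
    yield 2


g = generator()
print(iter(g) is g)            # True: its __iter__ returns itself
print(hasattr(g, "__next__"))  # True: it has the Python 3 next method

values = [v for v in g]        # a for loop drives it until StopIteration
print(values)                  # [1, 2]

leftover = list(g)             # the same generator is now exhausted
print(leftover)                # []
```

Note that a generator object can be consumed only once; calling `generator()` again gives a fresh one.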

from gensim import utils  # as imported in the gensim word2vec module this snippet is quoted from


class Text8Corpus(object):
    """Iterate over sentences from the "text8" corpus, unzipped from http://mattmahoney.net/dc/text8.zip ."""
    def __init__(self, fname, max_sentence_length=1000):
        self.fname = fname
        self.max_sentence_length = max_sentence_length

    def __iter__(self):
        # the entire corpus is one gigantic line -- there are no sentence marks at all
        # so just split the sequence of tokens arbitrarily: 1 sentence = 1000 tokens
        sentence, rest = [], b''
        with utils.smart_open(self.fname) as fin:
            while True:
                text = rest + fin.read(8192)  # avoid loading the entire file (=1 line) into RAM
                if text == rest:  # EOF
                    sentence.extend(rest.split())  # return the last chunk of words, too (may be shorter/longer)
                    if sentence:
                        yield sentence
                    break
                last_token = text.rfind(b' ')  # the last token may have been split in two... keep it for the next iteration
                if last_token >= 0:
                    words, rest = utils.to_unicode(text[:last_token]).split(), text[last_token:].strip()
                else:
                    words, rest = [], text
                sentence.extend(words)
                while len(sentence) >= self.max_sentence_length:
                    yield sentence[:self.max_sentence_length]
                    sentence = sentence[self.max_sentence_length:]

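With that, the problem I ran into is easy to resolve. This is the piece of Word2Vec code that reads the text out of a file. During for ... in ..., the __iter__ method is called first; because it contains yield, calling it returns a generator, and the loop iterates over that. The input file is a sequence of English words separated by spaces. The result is a series of lists, each holding at most 1000 words (max_sentence_length).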
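The overall pattern, an `__iter__` method that contains yield and emits lists of bounded length, can be tried on a small in-memory string. This is a simplified sketch, not the library's implementation: `ChunkedWords` is a hypothetical class, it skips the file reading entirely, and it assumes Python 3.

```python
class ChunkedWords(object):
    """Hypothetical, simplified stand-in: splits a string into word lists."""

    def __init__(self, text, max_sentence_length=3):
        self.text = text
        self.max_sentence_length = max_sentence_length

    def __iter__(self):
        # Because this method contains yield, calling it returns a generator.
        sentence = []
        for word in self.text.split():
            sentence.append(word)
            if len(sentence) == self.max_sentence_length:
                yield sentence
                sentence = []
        if sentence:  # emit the final, possibly shorter, list
            yield sentence


corpus = ChunkedWords("a b c d e f g", max_sentence_length=3)
sentences = list(corpus)
print(sentences)  # [['a', 'b', 'c'], ['d', 'e', 'f'], ['g']]

# Every new loop calls __iter__ again and gets a fresh generator,
# so the corpus object can be traversed more than once.
print(list(corpus) == sentences)  # True
```

Putting the yield inside `__iter__` rather than in a standalone function is what allows the same object to be looped over repeatedly.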

I still don't understand the b'' and last_token parts, though.


Original article: http://www.cnblogs.com/stezqy/p/4305637.html
