
4.1 Implicit Sequences

Posted: 2015-07-17 18:09:01


Some important ideas in big data processing:

  • Implicit representations of streams of sequential data
  • Declarative programming languages to manipulate and transform data
  • Distributed computing

 

Implicit Sequences

A sequence can be represented without each element being stored explicitly in the memory of the computer. That is, we can construct an object that provides access to all of the elements of some sequential dataset without computing all of those elements in advance and storing them. Instead, we compute elements on demand.

Example: The built-in range class represents consecutive integers

  • The range is represented by two values: start and end
  • The length and elements are computed on demand
  • Constant space for arbitrarily long sequences
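A quick check of these claims with the built-in range, offered as a sketch: in CPython a range object stores only a few integers (start, stop, step), so `sys.getsizeof` reports the same size for a short range and a very long one. The exact byte count is an implementation detail, so the example compares sizes instead of printing them.

```python
import sys

small = range(10)
huge = range(10**9)  # a billion integers, none of them stored

# Length and individual elements are computed from the endpoints on demand.
print(len(huge))          # 1000000000
print(huge[123456789])    # 123456789
print(range(3, 12)[3])    # 6

# In CPython the object's size does not grow with the length of the range.
print(sys.getsizeof(small) == sys.getsizeof(huge))  # True
```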

A hand-written Range class that works the same way:

class Range:
    """An implicit sequence of consecutive integers.

    >>> r = Range(3, 12)
    >>> len(r)
    9
    >>> r[3]
    6
    """
    def __init__(self, start, end=None):
        if end is None:
            start, end = 0, start
        self.start = start
        self.end = end

    def __repr__(self):
        return 'Range({0}, {1})'.format(self.start, self.end)

    def __len__(self):
        return max(0, self.end - self.start)

    def __getitem__(self, k):
        if k < 0:
            k += len(self)  # support negative indices such as r[-1]
        if not 0 <= k < len(self):
            raise IndexError('index out of range')
        return self.start + k
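Because Range defines `__len__` and `__getitem__`, Python can already loop over it even though it has no `__iter__`: in that case `iter()` falls back to calling `__getitem__` with 0, 1, 2, ... until `IndexError` is raised. The sketch below repeats a trimmed copy of the class (an assumption made only so the example runs on its own) and shows `list`, `in`, and `sum` working through that fallback.

```python
class Range:
    """Trimmed copy of the Range class above, for a standalone demo."""
    def __init__(self, start, end=None):
        if end is None:
            start, end = 0, start
        self.start = start
        self.end = end

    def __len__(self):
        return max(0, self.end - self.start)

    def __getitem__(self, k):
        if not 0 <= k < len(self):
            raise IndexError('index out of range')
        return self.start + k

r = Range(3, 7)
# No __iter__ is defined, so these all use the __getitem__ fallback protocol.
print(list(r))        # [3, 4, 5, 6]
print(5 in r)         # True
print(sum(Range(4)))  # 6  (0 + 1 + 2 + 3)
```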

 

 

1. Iterables and Iterators

Iterator: a mutable object that tracks a position in a sequence and advances to the next element each time __next__ is called.

Iterable: an object that represents a sequence and returns a new iterator (an object with a __next__ method) each time __iter__ is called.

Note: Python's iterator protocol also expects every iterator to implement __iter__. Unlike an iterable's __iter__, it does not create a new object; it simply returns the iterator itself, together with its state about the current position.
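The same distinction can be seen with a built-in list, as a small sketch: every call to `iter()` on the list (an iterable) returns a fresh iterator with its own position, while calling `iter()` on an iterator returns that very iterator.

```python
counts = [1, 2, 3]           # a list is an iterable

first = iter(counts)         # calls counts.__iter__(): a new iterator
second = iter(counts)        # another, independent iterator

print(next(first))           # 1
print(next(first))           # 2
print(next(second))          # 1  (second keeps its own position)

# An iterator's __iter__ returns the iterator itself, not a copy.
print(iter(first) is first)  # True
print(next(iter(first)))     # 3  (continues from first's position)
```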

class LetterIter:
    """An iterator over letters.

    >>> a_to_c = LetterIter('a', 'c')
    >>> next(a_to_c)
    'a'
    >>> next(a_to_c)
    'b'
    >>> next(a_to_c)
    Traceback (most recent call last):
        ...
    StopIteration
    """
    def __init__(self, start='a', end='e'):
        self.next_letter = start
        self.end = end

    def __iter__(self):
        # An iterator returns itself, so it can also be used in a for loop
        return self

    def __next__(self):
        if self.next_letter >= self.end:
            raise StopIteration
        result = self.next_letter
        self.next_letter = chr(ord(result)+1)
        return result

class Letters:
    """An implicit sequence of letters.

    >>> b_to_k = Letters('b', 'k')
    >>> first_iterator = b_to_k.__iter__()
    >>> next(first_iterator)
    'b'
    >>> next(first_iterator)
    'c'
    >>> second_iterator = iter(b_to_k)
    >>> second_iterator.__next__()
    'b'
    >>> first_iterator.__next__()
    'd'
    >>> first_iterator.__next__()
    'e'
    >>> second_iterator.__next__()
    'c'
    >>> second_iterator.__next__()
    'd'
    """
    def __init__(self, start='a', end='e'):
        self.start = start
        self.end = end

    def __iter__(self):
        return LetterIter(self.start, self.end)
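The practical difference between the two classes shows up when a sequence is traversed twice. A sketch with compact copies of both classes (copied so the block runs on its own, with `__iter__` on LetterIter returning self): the Letters iterable can be listed repeatedly, while a single LetterIter is exhausted after one pass.

```python
class LetterIter:
    """Compact copy of LetterIter for a standalone demo."""
    def __init__(self, start='a', end='e'):
        self.next_letter = start
        self.end = end

    def __iter__(self):
        return self  # an iterator is its own iterator

    def __next__(self):
        if self.next_letter >= self.end:
            raise StopIteration
        result = self.next_letter
        self.next_letter = chr(ord(result) + 1)
        return result

class Letters:
    """Compact copy of Letters: each __iter__ call starts a fresh iterator."""
    def __init__(self, start='a', end='e'):
        self.start = start
        self.end = end

    def __iter__(self):
        return LetterIter(self.start, self.end)

seq = Letters('b', 'e')
print(list(seq))   # ['b', 'c', 'd']
print(list(seq))   # ['b', 'c', 'd']  (the iterable can be traversed again)

it = LetterIter('b', 'e')
print(list(it))    # ['b', 'c', 'd']
print(list(it))    # []  (the iterator is used up after one pass)
```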

 

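1.1 Many built-in Python sequence operations return iterators that compute their results lazily.

map(func, iterable): iterate over func(x) for each x in iterable

filter(func, iterable): iterate over each x in iterable for which func(x) is true

zip(first_iter, second_iter): iterate over co-indexed (x, y) pairs

reversed(sequence): iterate over the elements of sequence in reverse order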
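A small sketch of what "lazily" means here: a function that records its calls (the `square` helper and `calls` list are illustrative) shows that `map` does no work until an element is requested, and then computes only that element.

```python
calls = []

def square(x):
    calls.append(x)          # record each time square is actually called
    return x * x

squares = map(square, [1, 2, 3, 4])
print(calls)                 # []  (map has not called square yet)

print(next(squares))         # 1
print(calls)                 # [1]  (only the first element was computed)

evens = filter(lambda x: x % 2 == 0, range(10))
print(list(zip(evens, 'abc')))    # [(0, 'a'), (2, 'b'), (4, 'c')]
print(list(reversed([1, 2, 3])))  # [3, 2, 1]
```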

 

1.2 For statement

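When a for statement executes, it calls __iter__ on the iterable to obtain an iterator, then calls __next__ on that iterator to obtain each item until StopIteration is raised. The two snippets below produce the same output: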

counts = [1, 2, 3]
for item in counts:
    print(item)
-------------
1
2
3

counts = [1, 2, 3]
items = counts.__iter__()
try:
    while True:
        item = items.__next__()
        print(item)
except StopIteration:
    pass 
-----------------
1
2
3
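To watch this happen on a user-defined object, here is a sketch with a hypothetical `Traced` class that logs each protocol call. To keep it short it is its own iterator: `__iter__` resets the position and returns self.

```python
class Traced:
    """Hypothetical iterable/iterator that logs which protocol methods run."""
    def __init__(self, items):
        self.items = items
        self.log = []

    def __iter__(self):
        self.log.append('__iter__')
        self.position = 0
        return self

    def __next__(self):
        self.log.append('__next__')
        if self.position >= len(self.items):
            raise StopIteration  # tells the for loop to stop
        item = self.items[self.position]
        self.position += 1
        return item

t = Traced([1, 2])
for item in t:
    print(item)       # 1, then 2
# One __iter__ call, then one __next__ per item plus a final one that stops.
print(t.log)          # ['__iter__', '__next__', '__next__', '__next__']
```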

 

 

2. Generators and Generator Functions

A generator function is a function that yields values instead of returning them

A generator is an iterator, created by a generator function

When a generator function is called, it returns a generator that iterates over the values the function yields

def letter_generator(next_letter, end):
    while next_letter < end:
        yield next_letter
        next_letter = chr(ord(next_letter)+1)

>>> s = letter_generator('a', 'z')
>>> next(s)
'a'
>>> next(s)
'b'
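A sketch of how a generator pauses between calls, using the standard `inspect.getgeneratorstate` function: the body does not start until the first `next`, stops at each `yield`, and is closed once the loop finishes.

```python
import inspect

def letter_generator(next_letter, end):
    while next_letter < end:
        yield next_letter
        next_letter = chr(ord(next_letter) + 1)

g = letter_generator('a', 'c')
print(inspect.getgeneratorstate(g))  # GEN_CREATED: the body has not started
print(next(g))                       # a
print(inspect.getgeneratorstate(g))  # GEN_SUSPENDED: paused at the yield
print(next(g))                       # b
print(next(g, 'done'))               # done: the body finished; next returned its default
print(inspect.getgeneratorstate(g))  # GEN_CLOSED
print(iter(g) is g)                  # True: a generator is an iterator
```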

The Letters class from before can then be rewritten as:

class Letters:
    def __init__(self, start='a', end='e'):
        self.start = start
        self.end = end

    def __iter__(self):
        return letter_generator(self.start, self.end)
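A common variation, shown here as an alternative sketch rather than the version above, makes `__iter__` itself a generator function: because the method body contains `yield`, every call returns a new, independent generator, so no separate helper is needed.

```python
class Letters:
    """Variant of Letters whose __iter__ is itself a generator function."""
    def __init__(self, start='a', end='e'):
        self.start = start
        self.end = end

    def __iter__(self):
        # Because this method contains yield, each call returns a new generator.
        letter = self.start
        while letter < self.end:
            yield letter
            letter = chr(ord(letter) + 1)

b_to_f = Letters('b', 'f')
first, second = iter(b_to_f), iter(b_to_f)
print(next(first), next(first), next(second))  # b c b
print(list(b_to_f))                            # ['b', 'c', 'd', 'e']
```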

 

 

3. Stream

A stream is a linked list with an explicit first element and a rest-of-the-list that is computed lazily

(The second argument is a zero-argument function that returns a Stream or Stream.empty; it is called only when the rest of the stream is first needed.)

class Stream:
    """A lazily computed linked list.

    >>> s = Stream(1, lambda: Stream(6-2, lambda: Stream(9)))
    >>> s.first
    1
    >>> s.rest.first
    4
    >>> s.rest
    Stream(4, <...>)
    >>> s.rest.rest.first
    9
    """

    class empty:
        def __repr__(self):
            return 'Stream.empty'

    empty = empty()

    def __init__(self, first, compute_rest=lambda: Stream.empty):
        assert callable(compute_rest), 'compute_rest must be callable.'
        self.first = first
        self._compute_rest = compute_rest

    @property
    def rest(self):
        """Return the rest of the stream, computing it if necessary."""
        if self._compute_rest is not None:
            self._rest = self._compute_rest()
            self._compute_rest = None
        return self._rest

    def __repr__(self):
        return 'Stream({0}, <...>)'.format(repr(self.first))

def first_k(s, k):
    """Return up to k elements of stream s as a list.

    >>> s = Stream(1, lambda: Stream(4, lambda: Stream(9)))
    >>> first_k(s, 2)
    [1, 4]
    >>> first_k(s, 5)
    [1, 4, 9]
    """
    elements = []
    while s is not Stream.empty and k > 0:
        elements.append(s.first)
        s, k = s.rest, k-1
    return elements
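The `rest` property memoizes: `compute_rest` runs at most once, and later accesses return the stored `_rest`. A sketch that checks this with a counting function (the `compute_rest` helper and `calls` list are illustrative), using a compact copy of Stream so the block runs on its own:

```python
class Stream:
    """Compact copy of the Stream class above (repr omitted)."""
    class empty:
        def __repr__(self):
            return 'Stream.empty'
    empty = empty()

    def __init__(self, first, compute_rest=lambda: Stream.empty):
        assert callable(compute_rest), 'compute_rest must be callable.'
        self.first = first
        self._compute_rest = compute_rest

    @property
    def rest(self):
        if self._compute_rest is not None:
            self._rest = self._compute_rest()
            self._compute_rest = None  # drop the function so it never runs again
        return self._rest

calls = []

def compute_rest():
    calls.append(1)  # record each time the rest is actually computed
    return Stream(2)

s = Stream(1, compute_rest)
print(len(calls))          # 0: constructing s computed nothing
print(s.rest.first)        # 2
print(s.rest is s.rest)    # True: the same cached object is returned
print(len(calls))          # 1: compute_rest ran exactly once
```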

 

e.g. integer_stream

def integer_stream(first=1):
    """Return a stream of consecutive integers, starting with first.

    >>> s = integer_stream(3)
    >>> s
    Stream(3, <...>)
    >>> m = map_stream(lambda x: x*x, s)
    >>> first_k(m, 5)
    [9, 16, 25, 36, 49]
    """
    def compute_rest():
        return integer_stream(first+1)
    return Stream(first, compute_rest)
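For comparison, the standard library expresses the same infinite sequence as an iterator with `itertools.count`. A sketch of the contrast: an iterator resumes where it stopped and cannot be rewound, whereas a stream can be re-read from any node that is still referenced.

```python
from itertools import count, islice

c = count(3)                 # 3, 4, 5, ... produced on demand, like integer_stream(3)
print(list(islice(c, 3)))    # [3, 4, 5]
print(list(islice(c, 3)))    # [6, 7, 8]  (the iterator continues; it cannot restart)

squares = map(lambda x: x * x, count(3))
print(list(islice(squares, 5)))  # [9, 16, 25, 36, 49], the same values as the doctest above
```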

 

3.1 Higher-Order Functions on Streams

def map_stream(fn, s):
    """Map a function fn over the elements of a stream s.

    >>> s = integer_stream(3)
    >>> s
    Stream(3, <...>)
    >>> m = map_stream(lambda x: x*x, s)
    >>> first_k(m, 5)
    [9, 16, 25, 36, 49]
    """
    if s is Stream.empty:
        return s
    def compute_rest():
        return map_stream(fn, s.rest)
    return Stream(fn(s.first), compute_rest)

def filter_stream(fn, s):
    """Filter stream s with predicate function fn."""
    if s is Stream.empty:
        return s
    def compute_rest():
        return filter_stream(fn, s.rest)
    if fn(s.first):
        return Stream(s.first, compute_rest)
    else:
        return compute_rest()
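Because map_stream and filter_stream each return a new stream, they compose like their iterator counterparts. A sketch, with compact copies of the definitions above (lambdas stand in for the nested compute_rest functions), that builds the squares of the odd positive integers from an infinite stream and also filters a finite one:

```python
class Stream:
    """Compact copy of Stream (repr omitted) so this sketch runs on its own."""
    class empty:
        def __repr__(self):
            return 'Stream.empty'
    empty = empty()

    def __init__(self, first, compute_rest=lambda: Stream.empty):
        self.first = first
        self._compute_rest = compute_rest

    @property
    def rest(self):
        if self._compute_rest is not None:
            self._rest = self._compute_rest()
            self._compute_rest = None
        return self._rest

def integer_stream(first=1):
    return Stream(first, lambda: integer_stream(first + 1))

def map_stream(fn, s):
    if s is Stream.empty:
        return s
    return Stream(fn(s.first), lambda: map_stream(fn, s.rest))

def filter_stream(fn, s):
    if s is Stream.empty:
        return s
    if fn(s.first):
        return Stream(s.first, lambda: filter_stream(fn, s.rest))
    return filter_stream(fn, s.rest)  # skip elements that fail the predicate

def first_k(s, k):
    elements = []
    while s is not Stream.empty and k > 0:
        elements.append(s.first)
        s, k = s.rest, k - 1
    return elements

# Squares of the odd positive integers, computed only as far as requested.
odd_squares = map_stream(lambda x: x * x,
                         filter_stream(lambda x: x % 2 == 1, integer_stream(1)))
print(first_k(odd_squares, 4))  # [1, 9, 25, 49]

# A finite stream: filtering ends cleanly at Stream.empty.
print(first_k(filter_stream(lambda x: x > 5, Stream(3, lambda: Stream(7))), 3))  # [7]
```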

e.g. a stream of primes

def primes(positives):
    """Return a stream of primes, given a stream of positive integers.

    >>> positives = integer_stream(2)
    >>> first_k(primes(positives), 8)
    [2, 3, 5, 7, 11, 13, 17, 19]
    """
    def not_divisible(x):
        return x % positives.first != 0
    def compute_rest():
        return primes(filter_stream(not_divisible, positives.rest))
    return Stream(positives.first, compute_rest)
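For comparison, a sketch of the same sieve written with generators instead of streams. `primes_gen` is a hypothetical name; each generator expression plays the role of `filter_stream(not_divisible, ...)`. Unlike the stream version, the resulting chain can be traversed only once, and the nesting depth grows by one level per prime found.

```python
from itertools import count, islice

def primes_gen(numbers):
    """Lazily yield primes from an iterator of increasing integers starting at 2."""
    try:
        n = next(numbers)  # the first remaining number is prime
    except StopIteration:
        return             # a finite input has run out, so there are no more primes
    yield n
    # Continue with the remaining numbers, dropping the multiples of n.
    yield from primes_gen(x for x in numbers if x % n != 0)

print(list(islice(primes_gen(count(2)), 8)))  # [2, 3, 5, 7, 11, 13, 17, 19]
```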

 

 

 


Original post: http://www.cnblogs.com/whuyt/p/4654135.html
