Python Notes: Grouping Records Together Based on a Field

Date: 2014-06-08 18:22:29

Tags: python

Grouping Records Together Based on a Field

Problem

You have a sequence of dictionaries or instances and you want to iterate over the data in groups based on the value of a particular field, such as date.

Solution

The itertools.groupby() function is particularly useful for grouping data together like this. To illustrate, suppose you have the following list of dictionaries:

rows = [
    {'address': '5412 N CLARK', 'date': '07/01/2012'},
    {'address': '5148 N CLARK', 'date': '07/04/2012'},
    {'address': '5800 E 58TH', 'date': '07/02/2012'},
    {'address': '2122 N CLARK', 'date': '07/03/2012'},
    {'address': '5645 N RAVENSWOOD', 'date': '07/02/2012'},
    {'address': '1060 W ADDISON', 'date': '07/02/2012'},
    {'address': '4801 N BROADWAY', 'date': '07/01/2012'},
    {'address': '1039 W GRANVILLE', 'date': '07/04/2012'},
]

Now suppose you want to iterate over the data in chunks grouped by date. To do it, first sort by the desired field (in this case, date) and then use itertools.groupby():

from operator import itemgetter
from itertools import groupby

# Sort by the desired field first
rows.sort(key=itemgetter('date'))

# Iterate in groups
for date, items in groupby(rows, key=itemgetter('date')):
    print(date)
    for i in items:
        print('    ', i)

This produces the following output:

    07/01/2012
         {'address': '5412 N CLARK', 'date': '07/01/2012'}
         {'address': '4801 N BROADWAY', 'date': '07/01/2012'}
    07/02/2012
         {'address': '5800 E 58TH', 'date': '07/02/2012'}
         {'address': '5645 N RAVENSWOOD', 'date': '07/02/2012'}
         {'address': '1060 W ADDISON', 'date': '07/02/2012'}
    07/03/2012
         {'address': '2122 N CLARK', 'date': '07/03/2012'}
    07/04/2012
         {'address': '5148 N CLARK', 'date': '07/04/2012'}
         {'address': '1039 W GRANVILLE', 'date': '07/04/2012'}

Discussion

The groupby() function works by scanning a sequence and finding sequential "runs" of identical values (or values returned by the given key function). On each iteration, it returns the value along with an iterator that produces all of the items in a group with the same value.
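The "runs" behavior is easier to see on a plain string than on dictionaries. The following is a minimal sketch; the letters string is a made-up example, not data from the recipe:

```python
from itertools import groupby

# Hypothetical input: a string with repeated, partly non-adjacent letters.
letters = 'AAABBCAA'

# Each group iterator is copied into a list so it can be inspected later.
runs = [(key, list(group)) for key, group in groupby(letters)]

print(runs)
# [('A', ['A', 'A', 'A']), ('B', ['B', 'B']), ('C', ['C']), ('A', ['A', 'A'])]
```

Note that 'A' appears as two separate groups because its occurrences are not all adjacent. Each group iterator also shares the underlying data with groupby() itself, so it stops being usable once the loop moves on to the next group; wrapping it in list() as above keeps the items around.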

An important preliminary step is sorting the data according to the field of interest. Since groupby() only examines consecutive items, failing to sort first won’t group the records as you want. (Sort before grouping.)
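To see what goes wrong without sorting, here is a small sketch that compares the group keys produced from unsorted and sorted input. The records list and the by_date name are illustrative assumptions, chosen so that one date appears in two non-adjacent positions:

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical sample data, deliberately left unsorted.
records = [
    {'address': '5412 N CLARK', 'date': '07/01/2012'},
    {'address': '5800 E 58TH', 'date': '07/02/2012'},
    {'address': '4801 N BROADWAY', 'date': '07/01/2012'},
]

by_date = itemgetter('date')

# Without sorting, every run of equal keys becomes its own group,
# so '07/01/2012' shows up twice.
unsorted_keys = [key for key, _ in groupby(records, key=by_date)]

# sorted() returns a new list and leaves `records` untouched.
sorted_keys = [key for key, _ in groupby(sorted(records, key=by_date), key=by_date)]

print(unsorted_keys)  # ['07/01/2012', '07/02/2012', '07/01/2012']
print(sorted_keys)    # ['07/01/2012', '07/02/2012']
```

Using sorted() instead of list.sort() here is only a design choice for the sketch: it avoids modifying the caller's list, at the cost of a copy.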

If your goal is simply to group the data by date into a larger data structure that allows random access, you may have better luck using defaultdict() to build a multidict, as described in “Mapping Keys to Multiple Values in a Dictionary”. For example:

from collections import defaultdict
rows_by_date = defaultdict(list)
for row in rows:
    rows_by_date[row['date']].append(row)

This allows the records for each date to be accessed easily like this:

>>> for r in rows_by_date['07/01/2012']:
...     print(r)
...
{'address': '5412 N CLARK', 'date': '07/01/2012'}
{'address': '4801 N BROADWAY', 'date': '07/01/2012'}
>>>

For this latter example, it’s not necessary to sort the records first. Thus, if memory is no concern, it may be faster to do this than to first sort the records and iterate using groupby().
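As a rough side-by-side of the two approaches, the sketch below wraps each one in a helper and checks that they yield the same groups. The function names group_with_groupby and group_with_defaultdict, and the sample list, are hypothetical and introduced only for illustration:

```python
from collections import defaultdict
from itertools import groupby
from operator import itemgetter

def group_with_groupby(records, field):
    """Sort a copy of the records, then collect each run into a list."""
    key = itemgetter(field)
    return {k: list(g) for k, g in groupby(sorted(records, key=key), key=key)}

def group_with_defaultdict(records, field):
    """Single pass with no sorting; each group keeps the input order."""
    groups = defaultdict(list)
    for record in records:
        groups[record[field]].append(record)
    return dict(groups)

# Hypothetical sample data.
sample = [
    {'address': '5412 N CLARK', 'date': '07/01/2012'},
    {'address': '5148 N CLARK', 'date': '07/04/2012'},
    {'address': '4801 N BROADWAY', 'date': '07/01/2012'},
]

a = group_with_groupby(sample, 'date')
b = group_with_defaultdict(sample, 'date')
print(a == b)  # True
```

The results match because Python's sort is stable, so records within each group keep their original relative order. The defaultdict version makes a single pass over the data, while the groupby() version pays for a sort first; which one is actually faster on real data is something to measure rather than assume.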

Original article: http://blog.csdn.net/jjjcainiao/article/details/28889515
