Tags: python mapreduce map reduce filter
Here I share a demo of functional programming in Python with map, reduce and filter, written by me (Xiaoqiang).
Assume there are two DB tables: `file_logs`, and `expanded_attrs`, which stores additional columns that extend `file_logs`. For demonstration, assume there is more than one file log for the same (platform_id, client_id) tuple. We need to figure out which one was updated most recently for the tuple (platform_id=1, client_id=1).
Here is the idea:
1. Filter the original file logs down to those for the tuple (platform_id=1, client_id=1).
2. Merge the attributes from the expansion table into the file_logs rows in memory, similar to joining the two tables in a query.
3. Reduce the merged file logs to find the one that was updated most recently.
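The three steps can be sketched as separate, named functions. This is only an illustration under assumptions, not the demo itself: the helper names `select_logs`, `merge_attrs` and `latest` are hypothetical, the sample rows are trimmed copies of the demo data, and the code is written to run on Python 2.6+ as well as Python 3.

```python
from functools import reduce  # builtin in Python 2, must be imported in Python 3

# Trimmed sample rows (only the keys the three steps need).
file_logs = [
    {'file_log_id': '1', 'platform_id': '1', 'client_id': '1'},
    {'file_log_id': '2', 'platform_id': '1', 'client_id': '1'},
    {'file_log_id': '3', 'platform_id': '2', 'client_id': '3'},
]
expanded_attrs = [
    {'file_log_id': '1', 'attr_name': 'last_updated', 'attr_value': '2014-07-14'},
    {'file_log_id': '2', 'attr_name': 'last_updated', 'attr_value': '2014-07-15'},
    {'file_log_id': '3', 'attr_name': 'last_updated', 'attr_value': '2014-07-15'},
]


def select_logs(logs, platform_id, client_id):
    # Step 1: keep only the rows for the requested (platform_id, client_id).
    return list(filter(
        lambda log: log['platform_id'] == platform_id
        and log['client_id'] == client_id,
        logs))


def merge_attrs(log, attrs):
    # Step 2: copy the log row and add one key per matching attribute row.
    merged = dict(log)
    for attr in attrs:
        if attr['file_log_id'] == log['file_log_id']:
            merged[attr['attr_name']] = attr['attr_value']
    return merged


def latest(logs):
    # Step 3: of each pair, keep the row with the later 'last_updated'.
    # ISO dates (YYYY-MM-DD) compare correctly as plain strings.
    return reduce(
        lambda a, b: a if a['last_updated'] > b['last_updated'] else b,
        logs)


selected = select_logs(file_logs, '1', '1')
merged = list(map(lambda log: merge_attrs(log, expanded_attrs), selected))
newest = latest(merged)
```

Because `merge_attrs` returns a copy, the original `file_logs` rows are left unchanged, and `list(map(...))` forces evaluation on Python 3, where `map` is lazy.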
The demo code is shown below (for Python 2.6 or 2.7):
BTW, you are welcome to share a more efficient approach or any issues you find. Thanks. :)
    #!/usr/bin/env python
    """
    Requirement:
    platform_id=1 and client_id=1 are known (pid and cid).
    file_logs and expanded_attrs are arrays of objects;
    expanded_attrs is a table of extra columns that extends file_logs.
    Since file_logs contains more than one row for pid=1, cid=1,
    we need to find the one that was updated most recently.
    """

    file_logs = [
        {
            'file_log_id': '1',
            'platform_id': '1',
            'client_id': '1',
            'file': 'path/to/platform/client/j-1/stdout'
        },
        {
            'file_log_id': '2',
            'platform_id': '1',
            'client_id': '1',
            'file': 'path/to/platform/client/j-2/stdout'
        },
        {
            'file_log_id': '3',
            'platform_id': '2',
            'client_id': '3',
            'file': 'path/to/platform/client/j-3/stdout'
        },
    ]

    expanded_attrs = [
        {'file_log_id': '1', 'attr_name': 'CLICK', 'attr_value': '100'},
        {'file_log_id': '1', 'attr_name': 'SUPPRESSION', 'attr_value': '100'},
        {'file_log_id': '1', 'attr_name': 'last_updated', 'attr_value': '2014-07-14'},
        {'file_log_id': '2', 'attr_name': 'CLICK', 'attr_value': '200'},
        {'file_log_id': '2', 'attr_name': 'SUPPRESSION', 'attr_value': '200'},
        {'file_log_id': '2', 'attr_name': 'last_updated', 'attr_value': '2014-07-15'},
        {'file_log_id': '3', 'attr_name': 'CLICK', 'attr_value': '300'},
        {'file_log_id': '3', 'attr_name': 'SUPPRESSION', 'attr_value': '300'},
        {'file_log_id': '3', 'attr_name': 'last_updated', 'attr_value': '2014-07-15'},
    ]

    platform_id = '1'
    client_id = '1'

    # Step 1: keep only the file logs for (platform_id, client_id).
    target_scope_filelogs = filter(
        lambda x: x['platform_id'] == platform_id and x['client_id'] == client_id,
        file_logs
    )

    # Step 2: fold each log's attribute rows into one dict and merge it
    # into the log in place (relies on Python 2's eager map).
    map(
        lambda x: x.update(reduce(
            lambda xx, xy: xx.update({xy['attr_name']: xy['attr_value']}) is None and xx,
            filter(lambda xx: xx['file_log_id'] == x['file_log_id'], expanded_attrs),
            dict()
        )),
        target_scope_filelogs
    )

    # Step 3: reduce to the log with the greatest 'last_updated'.
    print reduce(lambda x, y: x['last_updated'] > y['last_updated'] and x or y, target_scope_filelogs)
    #> {'file_log_id': '2', 'platform_id': '1', 'last_updated': '2014-07-15', 'SUPPRESSION': '200', 'file': 'path/to/platform/client/j-2/stdout', 'client_id': '1', 'CLICK': '200'}
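On the question of a more efficient approach, here is one possible alternative, offered as a sketch rather than a definitive answer. The demo rescans `expanded_attrs` once per selected log and updates the dicts inside `map`, which works on Python 2 because `map` is eager there; on Python 3 `map` is lazy, `reduce` lives in `functools`, and `print` is a function. The version below groups the attribute rows in a single pass and picks the result with `max`. The names `latest_file_log` and `attrs_by_log` are made up for illustration, and the sample rows are a subset of the demo data.

```python
from collections import defaultdict


def latest_file_log(file_logs, expanded_attrs, platform_id, client_id):
    # One pass over the attribute rows:
    # file_log_id -> {attr_name: attr_value}
    attrs_by_log = defaultdict(dict)
    for attr in expanded_attrs:
        attrs_by_log[attr['file_log_id']][attr['attr_name']] = attr['attr_value']

    # Build merged copies of the matching logs; the inputs stay untouched.
    candidates = []
    for log in file_logs:
        if log['platform_id'] == platform_id and log['client_id'] == client_id:
            merged = dict(log)
            merged.update(attrs_by_log.get(log['file_log_id'], {}))
            candidates.append(merged)

    if not candidates:
        return None
    # Assumption: a log without 'last_updated' counts as oldest (empty string).
    return max(candidates, key=lambda log: log.get('last_updated', ''))


logs = [
    {'file_log_id': '1', 'platform_id': '1', 'client_id': '1'},
    {'file_log_id': '2', 'platform_id': '1', 'client_id': '1'},
    {'file_log_id': '3', 'platform_id': '2', 'client_id': '3'},
]
attrs = [
    {'file_log_id': '1', 'attr_name': 'CLICK', 'attr_value': '100'},
    {'file_log_id': '1', 'attr_name': 'last_updated', 'attr_value': '2014-07-14'},
    {'file_log_id': '2', 'attr_name': 'CLICK', 'attr_value': '200'},
    {'file_log_id': '2', 'attr_name': 'last_updated', 'attr_value': '2014-07-15'},
    {'file_log_id': '3', 'attr_name': 'last_updated', 'attr_value': '2014-07-15'},
]

result = latest_file_log(logs, attrs, '1', '1')
```

The trade-off is that this gives up the pure map/reduce/filter style of the demo in exchange for a single scan of each table and no mutation of the input rows.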
Demo of Python "Map Reduce Filter"
Original article: http://blog.csdn.net/wxqee/article/details/38451609