MapReduce word count in Python (reposted)

Posted: 2014-12-23 22:45:42


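#!/usr/bin/env python
# Generate word.txt: 100000 random lowercase words, one per line.
import random

# 'abc..z'
alphaStr = "".join(map(chr, range(97, 123)))
maxIter = 100000
with open("word.txt", "w") as fp:
    for i in range(maxIter):
        word = ""
        # Each word is 1 to 5 letters long (wordLen avoids shadowing the built-in len).
        wordLen = random.randint(1, 5)
        for j in range(wordLen):
            word += alphaStr[random.randint(0, 25)]
        # Write the finished word once, after the inner loop.
        fp.write(word + "\n")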


cat word.txt | ./wordcount_mapper.py | ./wordcount_reducer.py
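
The pipeline above calls wordcount_mapper.py, which this post does not include. The following is only a minimal sketch of what such a mapper might look like, assuming it emits one tab-separated "word, 1" record per word, which is the format the reducer parses. The helper name map_lines is hypothetical and does not come from the original post.

```python
#!/usr/bin/env python
# filename: wordcount_mapper.py (hypothetical sketch; not part of the original post)
import sys


def map_lines(lines):
    """Yield one 'word<TAB>1' record for every whitespace-separated word."""
    for line in lines:
        for word in line.split():
            yield "%s\t%d" % (word, 1)


if __name__ == "__main__":
    # Read raw text from stdin and emit one record per word on stdout.
    for record in map_lines(sys.stdin):
        print(record)
```

Both scripts need to be executable (chmod +x) for the ./ invocation in the pipeline to work.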

Word count reducer in Python

#!/usr/bin/env python
# filename: wordcount_reducer.py
from operator import itemgetter
import sys

# Accumulate the total count for each word read from stdin.
wordcount = {}
for line in sys.stdin:
    try:
        word, count = line.strip().split("\t", 1)
        count = int(count)
        wordcount[word] = wordcount.get(word, 0) + count
    except ValueError:
        # Skip lines that have no tab or a non-integer count.
        pass

# Print the totals sorted by word.
sorted_wordcount = sorted(wordcount.items(), key=itemgetter(0))
for word, count in sorted_wordcount:
    print("%s\t%s" % (word, count))
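
The reducer above keeps every distinct word in a dictionary, so it works on unsorted input but its memory use grows with the vocabulary. As a possible alternative, and not the original author's method, the sketch below assumes the records arrive already sorted by word (for example by putting sort between the mapper and the reducer) and sums each run of identical words with itertools.groupby. The names parse and reduce_sorted are hypothetical.

```python
#!/usr/bin/env python
# Hypothetical streaming variant of the reducer; assumes input is sorted by word.
import sys
from itertools import groupby


def parse(lines):
    """Yield (word, count) pairs, skipping malformed records."""
    for line in lines:
        parts = line.rstrip("\n").split("\t", 1)
        if len(parts) != 2:
            continue  # no tab separator
        try:
            yield parts[0], int(parts[1])
        except ValueError:
            continue  # count is not an integer


def reduce_sorted(lines):
    """Sum the counts for each run of identical adjacent words."""
    for word, group in groupby(parse(lines), key=lambda pair: pair[0]):
        yield word, sum(count for _, count in group)


if __name__ == "__main__":
    for word, total in reduce_sorted(sys.stdin):
        print("%s\t%d" % (word, total))
```

If the input is not sorted, the same word can appear in several separate runs and will be reported more than once, so this variant only makes sense behind a sort step.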

Original source: http://my.oschina.net/innovation/blog/359748
