python sorted() count() set(list) - deduplication

Posted: 2018-06-01 16:48:57

Tags: print AC == lin punctuation item count set occurrence

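2. Use Python to count how often each word occurs in an English article, return the 10 most frequent words together with their counts, and answer the following questions. (Punctuation may be ignored.)

(1) After creating a file object f, explain the difference between f's readlines and xreadlines methods.

(2) Additional requirement: text inside quotation marks must be counted as a single word. How can this be implemented?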
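For question (1), a minimal Python 3 sketch: `readlines()` reads the whole file at once and returns a list of lines, while Python 2's `xreadlines()` returned an iterator that yields one line at a time. `xreadlines()` was deprecated in Python 2.3 and no longer exists in Python 3, where iterating over the file object itself gives the same lazy behavior. The `io.StringIO` object and its sample text are assumptions that stand in for a real file.

```python
import io

# Stand-in for open('/root/text.txt'); the contents are illustrative only.
sample = "hello world\nhello kitty\n"

# readlines(): the whole file is read at once into a list of lines.
f = io.StringIO(sample)
all_lines = f.readlines()
print(all_lines)

# Lazy alternative (the role xreadlines() played in Python 2):
# iterating over the file object yields one line at a time.
f = io.StringIO(sample)
lazy_lines = []
for line in f:
    lazy_lines.append(line.rstrip("\n"))
print(lazy_lines)
```

For a large file, the loop form keeps only one line in memory at a time, whereas `readlines()` holds every line in the returned list.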

cat /root/text.txt

hello world 2018 xiaowei,good luck
hello kitty 2017 wangleai,ha he
hello kitty ,hasd he
hello kitty ,hasaad hedsfds

# My script

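#!/usr/bin/env python3
# Count word frequencies in /root/text.txt and print the most frequent words.
import re

with open('/root/text.txt') as f:
    openfile = f.read()

def get_list_dict():
    # Split on runs of digits and non-word characters -> ['a', 'b', 'c']
    word_list = re.split(r'[0-9\W]+', openfile)
    # set() removes duplicates so each word is counted only once
    list_no_repeat = set(word_list)
    dict_word = {}
    for each_word in list_no_repeat:
        dict_word[each_word] = word_list.count(each_word)
    # The split can leave an empty string at either end; drop it if present
    dict_word.pop('', None)
    return dict_word

# {'a': 2, 'c': 5, 'b': 1} => [('c', 5), ('a', 2), ('b', 1)]
def sort_dict_get_ten(dict_word):
    list_after_sorted = sorted(dict_word.items(), key=lambda x: x[1], reverse=True)
    print(list_after_sorted)
    # The exercise asks for the top 10; this script prints only the top 3
    for word, count in list_after_sorted[:3]:
        print(word, count)

def main():
    dict_word = get_list_dict()
    sort_dict_get_ten(dict_word)

if __name__ == '__main__':
    main()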

[('hello', 4), ('kitty', 3), ('he', 2), ('good', 1), ('hasd', 1), ('wangleai', 1), ('hasaad', 1), ('xiaowei', 1), ('hedsfds', 1), ('luck', 1), ('world', 1), ('ha', 1)]
hello 4
kitty 3
he 2
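For the additional requirement in question (2), one possible approach (a sketch, not the original author's method) is to tokenize with a regular expression that matches either a double-quoted phrase or a run of letters, then count with `collections.Counter`, whose `most_common(n)` replaces the manual `set()` + `count()` + `sorted()` steps. The names `TOKEN_RE`, `tokenize`, `top_words`, and the sample string are hypothetical. The pattern handles only straight double quotes and ASCII letters, which is an assumption about the input.

```python
import re
from collections import Counter

# Either a double-quoted phrase (captured without the quotes) or a run of letters.
TOKEN_RE = re.compile(r'"([^"]*)"|([A-Za-z]+)')

def tokenize(text):
    """Return the words in text; text inside double quotes is kept as one token."""
    tokens = []
    for quoted, word in TOKEN_RE.findall(text):
        token = quoted if quoted else word
        if token:  # skip empty quoted strings such as ""
            tokens.append(token)
    return tokens

def top_words(text, n=10):
    """Return the n most frequent tokens as (token, count) pairs."""
    return Counter(tokenize(text)).most_common(n)

# Illustrative input only; digits are ignored, quoted phrases stay whole.
sample = 'hello "good luck" hello kitty 2018 "good luck" hello'
print(tokenize(sample))
print(top_words(sample, 2))
```

`Counter` makes a single pass over the tokens, while calling `list.count()` once per distinct word rescans the whole list each time.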


Original article: https://www.cnblogs.com/hixiaowei/p/9122280.html
