Python Usage I Wish I Had Known as a Beginner

Date: 2015-02-04 14:51:01


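I sometimes ask myself how I did not know a simpler way of doing "this" thing in Python 3. When I go looking for an answer, over time I naturally find code that is more concise, more efficient, and less bug-prone. In total (not just in this post), the number of "those" things is larger than I expected, but here is a first batch of non-obvious features that later led me to more efficient, simpler, and more maintainable code.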

Dictionaries

keys() and items() of a dictionary

You can do many interesting operations on a dictionary's keys and items, because in Python 3 these views behave like sets:

    aa = {'mike': 'male', 'kathy': 'female', 'steve': 'male', 'hillary': 'female'}
    bb = {'mike': 'male', 'ben': 'male', 'hillary': 'female'}

    aa.keys() & bb.keys()  # {'mike', 'hillary'} # these are set-like
    aa.keys() - bb.keys()  # {'kathy', 'steve'}

    # If you want to get the common key-value pairs in the two dictionaries
    aa.items() & bb.items()  # {('mike', 'male'), ('hillary', 'female')}

Very concise!
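The same set-like behaviour is handy for comparing two configuration dictionaries. The sketch below is only an illustration: the names `defaults` and `overrides` and their contents are made up for the example. Note that only `keys()` and `items()` are set-like; `values()` is not, and set operations on `items()` need hashable values.

```python
# Hypothetical configuration dictionaries, invented for this example
defaults = {"host": "localhost", "port": 8080, "debug": False}
overrides = {"port": 9090, "debug": True, "verbose": True}

# Keys present in both dictionaries
shared_keys = defaults.keys() & overrides.keys()

# Keys that appear only in overrides (possibly typos or unsupported options)
unknown_keys = overrides.keys() - defaults.keys()

# Keys that appear in exactly one of the two dictionaries
exclusive_keys = defaults.keys() ^ overrides.keys()

# Restrict overrides to the keys that defaults already knows about
known_overrides = {k: overrides[k] for k in shared_keys}

print(shared_keys, unknown_keys, exclusive_keys, known_overrides)
```

Each operation returns an ordinary `set`, so the result can be passed on to any code that expects one.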

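Checking whether a key exists in a dictionary

How many times have you written the following code?

    # ls is assumed to be an iterable of (key, value) pairs
    dictionary = {}
    for k, v in ls:
        if k not in dictionary:
            dictionary[k] = []
        dictionary[k].append(v)

This code is not that bad, but why do you need the if statement every time?

    from collections import defaultdict

    dictionary = defaultdict(list)  # defaults to list
    for k, v in ls:
        dictionary[k].append(v)

This is much clearer, with no redundant if statement obscuring the intent.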
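Here is a small runnable sketch of the grouping pattern, with made-up sample pairs. It also shows one thing to watch for: merely reading a missing key from a `defaultdict` inserts it. If that side effect is unwanted, `dict.setdefault` on a plain dictionary is a possible alternative.

```python
from collections import defaultdict

# Sample (key, value) pairs, invented for this example
pairs = [("fruit", "apple"), ("veg", "carrot"), ("fruit", "pear")]

grouped = defaultdict(list)
for kind, name in pairs:
    grouped[kind].append(name)

# Reading a missing key creates it with the default value
size_before = len(grouped)
_ = grouped["dairy"]
size_after = len(grouped)

# Alternative with a plain dict: setdefault only inserts when we ask it to
plain = {}
for kind, name in pairs:
    plain.setdefault(kind, []).append(name)

print(dict(grouped), size_before, size_after, plain)
```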

Updating a dictionary with another dictionary

    from itertools import chain

    a = {'x': 1, 'y': 2, 'z': 3}
    b = {'y': 5, 's': 10, 'x': 3, 'z': 6}

    # Update a with b
    c = dict(chain(a.items(), b.items()))
    c  # {'y': 5, 's': 10, 'x': 3, 'z': 6}

This looks fine, but it is not very concise. Let's see if we can do better:

    c = a.copy()
    c.update(b)

Clearer and more readable!
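On Python 3.5 and later there is an even shorter spelling using dictionary unpacking. This is an addition to the original post, not something it covers; the sketch reuses the same `a` and `b` values.

```python
a = {"x": 1, "y": 2, "z": 3}
b = {"y": 5, "s": 10, "x": 3, "z": 6}

# Later entries win, so values from b override values from a
merged = {**a, **b}

print(merged)
print(a)  # a itself is left untouched
```

Like `copy()` followed by `update()`, this builds a new dictionary and leaves both inputs unchanged. Python 3.9 added the `a | b` operator with the same meaning.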

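Getting the maximum value from a dictionary

If you want the maximum value in a dictionary, the straightforward way looks like this:

    aa = {k: sum(range(k)) for k in range(10)}
    aa  # {0: 0, 1: 0, 2: 1, 3: 3, 4: 6, 5: 10, 6: 15, 7: 21, 8: 28, 9: 36}
    max(aa.values())  # 36

This works, but if you also need the key, you then have to look the key up from the value. Instead, we can use zip to flatten the dictionary into (value, key) pairs and get both back at once:

    max(zip(aa.values(), aa.keys()))  # (36, 9) => value, key pair

Similarly, if you want to iterate over a dictionary from the largest value to the smallest, you can do this:

    sorted(zip(aa.values(), aa.keys()), reverse=True)  # [(36, 9), (28, 8), (21, 7), (15, 6), (10, 5), (6, 4), (3, 3), (1, 2), (0, 1), (0, 0)]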
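An alternative to the zip trick is to pass a `key` function to `max` and `sorted`. This is not from the original post, just a common variation. One difference worth knowing: the zip version compares keys whenever two values tie, so keys must be orderable, whereas the `key=` version compares only the values.

```python
aa = {k: sum(range(k)) for k in range(10)}

# The key whose value is largest
key_of_max = max(aa, key=aa.get)

# (key, value) pairs ordered from largest value to smallest;
# pairs with equal values keep their original relative order
by_value_desc = sorted(aa.items(), key=lambda kv: kv[1], reverse=True)

print(key_of_max, aa[key_of_max])
print(by_value_desc)
```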

Unpacking an arbitrary number of items from a list

We can use the magic of * to capture an arbitrary number of items into a list:

    def compute_average_salary(person_salary):
        person, *salary = person_salary
        return person, (sum(salary) / float(len(salary)))

    person, average_salary = compute_average_salary(["mike", 40000, 50000, 60000])
    person  # 'mike'
    average_salary  # 50000.0

That is not so interesting, but what if I told you that you can also do the following:

    def compute_average_salary(person_salary_age):
        person, *salary, age = person_salary_age
        return person, (sum(salary) / float(len(salary))), age

    person, average_salary, age = compute_average_salary(["mike", 40000, 50000, 60000, 42])
    age  # 42

Looks pretty neat!
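A few edge cases of starred unpacking are worth checking before relying on it. The sketch below is an illustration rather than part of the original post: the starred name always receives a list, that list may be empty, and unpacking fails if there are not enough items for the non-starred names. The variant of `compute_average_salary` here returns `None` for an empty salary list, which is an assumption made for the example (the versions above would divide by zero).

```python
# The starred target may end up empty
first, *middle, last = [1, 2]

# Any iterable works, and the starred part is always a list
head, *tail = "abc"

# Too few items for the non-starred names raises ValueError
too_short_failed = False
try:
    a, *b, c = [1]
except ValueError:
    too_short_failed = True

def compute_average_salary(person_salary):
    """Variant that tolerates a record with no salary values."""
    person, *salary = person_salary
    average = sum(salary) / len(salary) if salary else None
    return person, average

print(middle, head, tail, too_short_failed)
print(compute_average_salary(["mike"]))
```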

When you have a dictionary with string keys and list values, instead of iterating over the dictionary and processing each value in turn, you can use a flatter representation (a list of lists), like this:

    # Instead of doing this
    for k, v in dictionary.items():
        process(v)

    # we are separating head and the rest, and process the values
    # as a list similar to the above. head becomes the key value
    for head, *rest in ls:
        process(rest)

    # if not very clear, consider the following example
    aa = {k: list(range(k)) for k in range(5)}  # range is lazy in Python 3, so wrap it in list()
    aa  # {0: [], 1: [0], 2: [0, 1], 3: [0, 1, 2], 4: [0, 1, 2, 3]}
    for k, v in aa.items():
        print(sum(v))
    # 0
    # 0
    # 1
    # 3
    # 6

    # Instead
    aa = [[ii] + list(range(jj)) for ii, jj in enumerate(range(5))]
    for head, *rest in aa:
        print(sum(rest))
    # 0
    # 0
    # 1
    # 3
    # 6

You can unpack a list into head, *rest, tail, and so on.
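As a sketch of the head, *rest, tail form inside a loop header, consider rows that start with a label, end with a status, and carry a variable number of readings in between. The row layout and names are invented for this example.

```python
# Hypothetical rows: [day, reading..., status]
rows = [
    ["2015-02-02", 3, 5, 8, "ok"],
    ["2015-02-03", 1, "ok"],
    ["2015-02-04", "empty"],
]

totals = {}
for day, *readings, status in rows:
    # readings is a (possibly empty) list of everything in the middle
    totals[day] = (sum(readings), status)

print(totals)
```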

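Collections as a counter

collections is one of my favorite libraries in Python. If you need data structures beyond the built-in defaults, this is where you should look.

Part of my daily work is counting a large number of words that are not especially important. Some might say you can use the words as dictionary keys and their counts as the values, and before I came across Counter in collections I might have agreed with you (yes, this whole introduction is because of Counter).

Suppose you read the Wikipedia article on the Python language, convert it to a string, and split it into a list of words (preserving order):

    import re

    # python_string is assumed to hold the article text
    word_list = list(map(lambda k: k.lower().strip(), re.split(r'[;,:\(\.\s\)]\s*', python_string)))
    word_list[:10]  # ['python', 'is', 'a', 'widely', 'used', 'general-purpose', 'high-level', 'programming', 'language', '[17][18][19]']

So far so good, but what if you want to count the words in this list:

    from collections import defaultdict  # again, collections!

    dictionary = defaultdict(int)
    for word in word_list:
        dictionary[word] += 1

This is not that bad, but if you have Counter, you will save your time for more meaningful things.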

    from collections import Counter

    counter = Counter(word_list)

    # Getting the most common 10 words
    counter.most_common(10)
    [('the', 164), ('and', 161), ('a', 138), ('python', 138),
     ('of', 131), ('is', 102), ('to', 91), ('in', 88), ('', 56)]

    list(counter.keys())[:10]  # just like a dictionary; keys() is a view in Python 3, so wrap it in list() before slicing
    ['', 'limited', 'all', 'code', 'managed', 'multi-paradigm',
     'exponentiation', 'fromosing', 'dynamic']

Pretty neat, but look at the methods available on Counter:

    dir(counter)  # this listing was produced under Python 2; Python 3 omits names such as __cmp__, has_key, iteritems and viewkeys
    ['__add__', '__and__', '__class__', '__cmp__', '__contains__', '__delattr__', '__delitem__', '__dict__',
     '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__gt__', '__hash__',
     '__init__', '__iter__', '__le__', '__len__', '__lt__', '__missing__', '__module__', '__ne__', '__new__',
     '__or__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__setitem__', '__sizeof__',
     '__str__', '__sub__', '__subclasshook__', '__weakref__', 'clear', 'copy', 'elements', 'fromkeys', 'get',
     'has_key', 'items', 'iteritems', 'iterkeys', 'itervalues', 'keys', 'most_common', 'pop', 'popitem', 'setdefault',
     'subtract', 'update', 'values', 'viewitems', 'viewkeys', 'viewvalues']

Did you see the __add__ and __sub__ methods? Yes, Counter supports addition and subtraction. So if you have a lot of text whose words you want to count, you do not necessarily need Hadoop: you can apply Counter to each piece of text (the map step) and then add the results together (the reduce step). That gives you a MapReduce built on Counter, and you may thank me later.
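The map and reduce steps described above could look roughly like this. It is a minimal single-process sketch with made-up documents and a naive whitespace split; a real pipeline would tokenize more carefully and might spread the map step over several workers.

```python
from collections import Counter

# Tiny stand-in documents, invented for this example
documents = ["the quick brown fox", "the lazy dog", "the fox"]

# Map: one Counter per document
partial_counts = [Counter(doc.split()) for doc in documents]

# Reduce: add the partial Counters together, starting from an empty one
total = sum(partial_counts, Counter())

# Subtraction works too, and drops entries whose count falls to zero or below
without_one_fox = total - Counter(["fox", "dog"])

print(total.most_common(2))
print(without_one_fox)
```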

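Flattening nested lists

The chain function, which collections only exposes as the private alias _chain, lives in itertools and can be used to flatten nested lists:

    from itertools import chain

    ls = [[kk] + list(range(kk)) for kk in range(5)]
    flattened_list = list(chain(*ls))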
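A closely related spelling is `chain.from_iterable`, which takes the outer list directly instead of unpacking it with `*`. The sketch below also shows that chaining flattens only one level of nesting; the `deeper` list is an invented example.

```python
from itertools import chain

nested = [[kk] + list(range(kk)) for kk in range(5)]

# Same result as list(chain(*nested)), without unpacking the outer list
flat = list(chain.from_iterable(nested))

# Only one level is flattened: inner lists of inner lists survive
deeper = [[1, [2, 3]], [4]]
one_level = list(chain.from_iterable(deeper))

print(flat)
print(one_level)
```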

Opening two files at the same time

If you are processing a file (say, line by line) and writing the processed lines into another file, you may be tempted to write it like this:

    with open(input_file_path) as inputfile:
        with open(output_file_path, 'w') as outputfile:
            for line in inputfile:
                outputfile.write(process(line))

Instead, you can open multiple files in the same with statement, like this:

    with open(input_file_path) as inputfile, open(output_file_path, 'w') as outputfile:
        for line in inputfile:
            outputfile.write(process(line))

That is much more concise!
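To try the single with statement end to end, the sketch below writes a small input file into a temporary directory first. The `process` function here is a hypothetical stand-in that upper-cases each line, and the file names are made up.

```python
import os
import tempfile

def process(line):
    # Hypothetical processing step, assumed for this example
    return line.upper()

with tempfile.TemporaryDirectory() as workdir:
    input_file_path = os.path.join(workdir, "in.txt")
    output_file_path = os.path.join(workdir, "out.txt")

    # Prepare a small input file to work on
    with open(input_file_path, "w") as f:
        f.write("first line\nsecond line\n")

    # Both files are opened, and later closed, by one with statement
    with open(input_file_path) as inputfile, open(output_file_path, "w") as outputfile:
        for line in inputfile:
            outputfile.write(process(line))

    with open(output_file_path) as f:
        result = f.read()

print(result)
```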

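Finding Monday from a date

If you have a date that you want to normalize (for example, to the Monday before or after it), you might do the following:

    import datetime

    # some_date is assumed to be a datetime.date or datetime.datetime
    previous_monday = some_date - datetime.timedelta(days=some_date.weekday())
    # Similarly, you could map to next monday as well
    next_monday = some_date + datetime.timedelta(days=-some_date.weekday(), weeks=1)

That is all there is to it.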
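Wrapped in two small helper functions (the function names are chosen for this example), the arithmetic can be checked against concrete dates. Since `weekday()` returns 0 for Monday, a date that already is a Monday maps to itself as the previous Monday and to the Monday one week later as the next one.

```python
import datetime

def previous_monday(some_date):
    """Monday of the week containing some_date (some_date itself if it is a Monday)."""
    return some_date - datetime.timedelta(days=some_date.weekday())

def next_monday(some_date):
    """Monday of the week after the one containing some_date."""
    return some_date + datetime.timedelta(days=-some_date.weekday(), weeks=1)

wednesday = datetime.date(2015, 2, 4)
monday = datetime.date(2015, 2, 2)

print(previous_monday(wednesday), next_monday(wednesday))
print(previous_monday(monday), next_monday(monday))
```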

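Processing HTML

If you crawl a site for fun or for profit, you will constantly run into HTML tags. To parse all kinds of HTML tags, you can use html.parser:

    from html.parser import HTMLParser

    class HTMLStrip(HTMLParser):
        def __init__(self):
            super().__init__()  # initializes the parser state (and calls reset())
            self.ls = []

        def handle_data(self, d):
            self.ls.append(d)

        def get_data(self):
            return ''.join(self.ls)

        @staticmethod
        def strip(snippet):
            html_strip = HTMLStrip()
            html_strip.feed(snippet)
            clean_text = html_strip.get_data()
            return clean_text

    # html_snippet is assumed to hold the HTML string to clean
    snippet = HTMLStrip.strip(html_snippet)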
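The same subclassing approach works for pulling out specific tags rather than stripping them. As a sketch, the hypothetical `LinkCollector` class below overrides `handle_starttag` to gather the `href` of every `<a>` tag; the sample HTML is made up.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collects href values of <a> tags; a hypothetical helper for this example."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)

sample = '<p>See <a href="https://www.python.org">Python</a> and <a href="/docs">docs</a>.</p>'

collector = LinkCollector()
collector.feed(sample)
collector.close()  # flush anything the parser is still buffering

print(collector.links)
```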

If you only want to escape the HTML:

    import html

    escaped_snippet = html.escape(html_snippet)
    # Back to html snippets(this is new in Python 3.4)
    html_snippet = html.unescape(escaped_snippet)
    # and so forth ...
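A quick round trip on a made-up snippet shows what the two functions do. By default `html.escape` also escapes quote characters, which matters when the text ends up inside an attribute value.

```python
import html

raw = '<a href="x">Tom & Jerry</a>'

# &, <, > and (by default) quotes are replaced with character references
escaped = html.escape(raw)

# unescape reverses the transformation
restored = html.unescape(escaped)

print(escaped)
print(restored)
```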

About the author: jasper


Original article: http://my.oschina.net/orgsky/blog/375240
