Python: Dawn Blossoms Plucked at Dusk

Date: 2015-08-20 10:23:15

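Q1: HTTP Error 403: Forbidden

In Python, urllib2.urlopen is commonly used to fetch a page's source, but sometimes the call fails with HTTP Error 403: Forbidden. This usually means the site refuses requests that look like they come from a crawler.

Solution: disguise the request as coming from a browser.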

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import urllib2

url = "http://www.google.com/translate_a/t?client=t&sl=zh-CN&tl=en&q=%E7%94%B7%E5%AD%A9"
# Browser-style request headers
headers = {'User-Agent': 'Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.1.6) Gecko/20091201 Firefox/3.5.6'}
req = urllib2.Request(url=url,headers=headers)
data = urllib2.urlopen(req).read()
print data

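This returns the page source.

Note: if Chinese text in the source comes out garbled, you can use the following (assuming the page is UTF-8 encoded):

print data.decode("UTF-8")

See http://www.geekcome.com/content-10-3101-1.html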
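
For readers on Python 3, where urllib2 was merged into urllib.request, the same idea can be sketched roughly as below. This is a minimal sketch rather than the author's code: the helper names build_request and fetch and the example.com URL are assumptions made for illustration, and the default encoding is a guess that depends on the site.

```python
import urllib.request

# User-Agent string reused from the example above.
BROWSER_UA = ("Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.1.6) "
              "Gecko/20091201 Firefox/3.5.6")


def build_request(url):
    """Return a Request that carries a browser-like User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": BROWSER_UA})


def fetch(url, encoding="utf-8"):
    """Fetch a page and decode it; the encoding is an assumption about the site."""
    with urllib.request.urlopen(build_request(url)) as resp:
        return resp.read().decode(encoding)


# Hypothetical URL, only used to show that the header is attached.
req = build_request("http://example.com/")
print(req.get_header("User-agent"))  # urllib stores header names capitalized this way
```

Calling fetch(url) performs the actual network request; it is left uncalled here so the sketch runs offline.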

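Q2: Reading and writing files

read() reads the whole file at once and is typically used to load the file's contents into a single string. The difference between readline() and readlines() is that readlines() also reads the whole file at once, like read(), but splits it into a list of lines that can be processed with a for ... in ... loop. readline(), by contrast, reads only one line per call and is usually much slower than readlines(). Use readline() only when there is not enough memory to read the whole file at once.

Note: the string returned by f.readline() ends with '\n', and the file position is then at the start of the next line. To drop the trailing newline:

 line = f.readline()
 line = line[:-1]

You can also use line.strip('\n'), which is safer when the final line has no trailing newline.

See http://www.cnblogs.com/kaituorensheng/archive/2012/06/05/2537347.html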

for line in open('poem.txt'):   # iterating the file object yields one line at a time
    print line

f = open('test.txt', 'r')       # open the file for reading
for line in f.readlines():
    print line
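
The reading patterns above can be combined into one memory-friendly idiom: iterate over the file object inside a with block. The following is a minimal Python 3 sketch; the helper name read_lines and the throwaway temp file are assumptions added so the example is self-contained.

```python
import os
import tempfile


def read_lines(path):
    """Return the file's lines without trailing newlines, reading one line at a time."""
    lines = []
    with open(path, "r", encoding="utf-8") as f:  # the with block closes the file
        for line in f:                             # lazy: one line per iteration
            lines.append(line.rstrip("\n"))        # safe even if the last line has no "\n"
    return lines


# Write a small throwaway file so the sketch does not depend on poem.txt.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False,
                                 encoding="utf-8") as tmp:
    tmp.write("first line\nsecond line\nlast line without newline")
    demo_path = tmp.name

demo_lines = read_lines(demo_path)
os.remove(demo_path)
print(demo_lines)
```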

>>> welcome_str = "Welcome you"
>>> welcome_str[0]
'W'
>>> git_list = ["qiwsir","github","io"]
>>> git_list[0]
'qiwsir'


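Q3: Converting between list and str

The biggest difference between list and str is that a list can be modified in place, while a str cannot.

Converting a str to a list with str.split():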

>>> line = "Hello.I am qiwsir.Welcome you."
>>> line.split(".")   # split on the period to get a list
['Hello', 'I am qiwsir', 'Welcome you', '']
>>> line.split(".",1)  # the 1 is maxsplit: "If maxsplit is given, at most maxsplit splits are done."
['Hello', 'I am qiwsir.Welcome you.']
>>> name = "Albert Ainstain"  # a space can also serve as the separator
>>> name.split(" ")
['Albert', 'Ainstain']

join can be regarded as the inverse of split.

Slicing a string taken from a list:

list[0][1:] #remove the first character
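
To make the split/join relationship concrete, here is a short sketch (written for Python 3, reusing the sample string from above). It shows the round trip with the same separator, and how a str can be "edited" by going through a list; the variable names are chosen only for illustration.

```python
line = "Hello.I am qiwsir.Welcome you."

parts = line.split(".")      # str -> list
rejoined = ".".join(parts)   # list -> str, using the same separator

chars = list("abc")          # a str can also be broken into characters
chars[0] = "X"               # lists can be changed in place...
changed = "".join(chars)     # ...and joined back into a new str

print(parts)
print(rejoined == line)
print(changed)
```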

Q4: Element-wise addition of lists

Use the zip function.

zip pairs up the elements of two lists:
x=[1, 2, 3, 4, 5 ]
y=[6, 7, 8, 9, 10]
zip(x, y) gives
[(1, 6), (2, 7), (3, 8), (4, 9), (5, 10)]

list12 = [x+y for x,y in zip(list1,list2)]

To sum the corresponding elements of l1=[[1,2],[3,4],[5,6]]:

>>> l1=[[1,2],[3,4],[5,6]]
>>> list(map(sum,zip(*l1)))
[9, 12]

zip reference: http://www.lfyzjck.com/python-zip/
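
The snippets above can be gathered into one runnable sketch. Note that in Python 3 zip returns a lazy iterator, so it is wrapped in list() to see the pairs. The helper name add_lists is hypothetical.

```python
def add_lists(a, b):
    """Element-wise sum; zip stops at the shorter of the two inputs."""
    return [x + y for x, y in zip(a, b)]


x = [1, 2, 3, 4, 5]
y = [6, 7, 8, 9, 10]

pairs = list(zip(x, y))          # zip is lazy in Python 3, so materialize it
sums = add_lists(x, y)

rows = [[1, 2], [3, 4], [5, 6]]
column_sums = [sum(col) for col in zip(*rows)]  # zip(*rows) transposes the rows

print(pairs)
print(sums)
print(column_sums)
```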

Q5: Checking data types

import types
type(x) is types.IntType # check whether x is an int
type(x) is types.StringType # check whether x is a string

types.ListType == type(text)
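
types.IntType, types.StringType and types.ListType exist only in Python 2. A commonly used alternative that works in both Python 2 and 3 is isinstance, which also accepts subclasses. The following is a small Python 3 sketch; the function describe is a hypothetical helper, not part of any library.

```python
def describe(value):
    """Classify a value with isinstance, which also accepts subclasses."""
    if isinstance(value, bool):      # bool is a subclass of int, so test it first
        return "bool"
    if isinstance(value, int):
        return "int"
    if isinstance(value, str):
        return "str"
    if isinstance(value, (list, tuple)):   # a tuple of types means "any of these"
        return "sequence"
    return type(value).__name__


print(describe(3), describe("3"), describe([3]), describe(True), describe(3.0))
```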

Q6: Ternary operator

true_part if condition else false_part

>>> 1 if True else 0
1
>>> 1 if False else 0
0

Q7: Removing duplicates from a list

http://www.jb51.net/article/55342.htm

ids = [1,4,3,3,4,2,3,4,5,6,1]
news_ids = list(set(ids))
news_ids.sort(key=ids.index)  # restore first-occurrence order; a bare positional argument would be taken as cmp
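
The set-then-sort approach above calls ids.index once per element, which gets slow on long lists. A single-pass alternative can be sketched as follows. This is not the method from the linked article; the helper name dedupe_keep_order is hypothetical, and it assumes the items are hashable.

```python
def dedupe_keep_order(items):
    """Drop repeats while keeping the first occurrence of each item."""
    seen = set()
    result = []
    for item in items:
        if item not in seen:   # items must be hashable for the set lookup
            seen.add(item)
            result.append(item)
    return result


ids = [1, 4, 3, 3, 4, 2, 3, 4, 5, 6, 1]
unique_ids = dedupe_keep_order(ids)
print(unique_ids)
```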

Q8: Function pointers in Python

# Icmp----Generates some ICMP traffic
def matchIcmp(json_data):
    if json_data['network']['icmp']:
        print 'Generates some ICMP traffic'
func_sets=[matchEntropy,matchIcmp,matchIrc,matchHttp,matchSmtp,matchDomain,matchFile,matchRegister,matchMutex,matchApi]
for func in func_sets:
    func(json_data)

# Calling function pointers defined at module level, outside any class
def hwFunc1(x):
    print("waleking's func1")
def hwFunc2(x):
    print("%s" %(x+1))
    print("waleking's func2")
funcSets={"func1":hwFunc1,"func2":hwFunc2}
funcSets["func1"](1)
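
Since the first snippet above refers to several match* functions that are not shown, here is a self-contained sketch of the same pattern. The handler names match_icmp and match_http and the sample report are made up for illustration; they are stand-ins, not the author's actual functions or data.

```python
def match_icmp(report):
    """Return a message if the report shows ICMP traffic, else None."""
    if report.get("network", {}).get("icmp"):
        return "Generates some ICMP traffic"
    return None


def match_http(report):
    """Return a message if the report shows HTTP traffic, else None."""
    if report.get("network", {}).get("http"):
        return "Generates some HTTP traffic"
    return None


# Functions are ordinary objects, so they can be stored in a list and called in a loop.
checks = [match_icmp, match_http]

sample_report = {"network": {"icmp": ["8.8.8.8"], "http": []}}  # made-up data
findings = [msg for msg in (check(sample_report) for check in checks) if msg]
print(findings)
```

Returning the message instead of printing it keeps each check easy to test and lets the caller decide how to report the findings.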

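Q9: TypeError: 'NoneType' object is not iterable

Cause: the value returned by the function that was ultimately called does not match what the caller assigns it to. Typically the function returned None (for example, through a code path with no return statement) while the caller tries to unpack or iterate over the result.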
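
A minimal Python 3 sketch that reproduces the error and shows one possible fix. Both function names are hypothetical; the broken one deliberately omits a return on one path.

```python
def find_pair_broken(values):
    """Bug on purpose: no return on the 'not found' path, so the result is None."""
    if len(values) >= 2:
        return values[0], values[1]


def find_pair_fixed(values):
    """Always return an iterable so callers can unpack or loop safely."""
    if len(values) >= 2:
        return values[0], values[1]
    return ()


try:
    for item in find_pair_broken([1]):   # iterating over None raises TypeError
        print(item)
except TypeError as exc:
    error_message = str(exc)
    print(error_message)

fixed_items = list(find_pair_fixed([1]))  # empty list, no exception
print(fixed_items)
```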

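Q10: Printing dictionary elements in the order they were written

With a plain `dict`, keys are unordered: when iterating over a `dict`, we cannot rely on the order of the keys. (This holds for Python 2 and for Python 3 before 3.7; from 3.7 on, `dict` preserves insertion order.)

To preserve key order, use OrderedDict: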

>>> from collections import OrderedDict
>>> d = dict([('a', 1), ('b', 2), ('c', 3)])
>>> d # dict keys are unordered
{'a': 1, 'c': 3, 'b': 2}
>>> od = OrderedDict([('a', 1), ('b', 2), ('c', 3)])
>>> od # OrderedDict keys keep their order
OrderedDict([('a', 1), ('b', 2), ('c', 3)])

Note that OrderedDict orders keys by insertion, not by sorting the keys themselves:

>>> od = OrderedDict()
>>> od['z'] = 1
>>> od['y'] = 2
>>> od['x'] = 3
>>> od.keys() # returned in the order the keys were inserted
['z', 'y', 'x']

Reference: http://www.liaoxuefeng.com/wiki/001374738125095c955c1e6d8bb493182103fac9270762a000/001411031239400f7181f65f33a4623bc42276a605debf6000
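
The session above is from Python 2. On Python 3 the same example can be sketched as below: keys() returns a view rather than a list, so it is copied with list(), and sorted() is used when sorted key order (rather than insertion order) is what is wanted. The variable names are chosen only for illustration.

```python
from collections import OrderedDict

od = OrderedDict()
od["z"] = 1
od["y"] = 2
od["x"] = 3

insertion_order = list(od.keys())   # keys() is a view in Python 3, so copy it into a list
sorted_order = sorted(od)           # sort explicitly when key order is what you need

od.move_to_end("z")                 # OrderedDict can also reorder existing keys
after_move = list(od)

print(insertion_order, sorted_order, after_move)
```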

Original article: http://www.cnblogs.com/lunachy/p/4744159.html
