Python Web Crawler Basics

Date: 2014-12-10 19:46:40

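Below is the simplest possible example of a Python crawler. It uses the networking library urllib2 (a Python 2 module) and the regular expression library re, and sends a browser-like User-Agent header with the request.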

#!/usr/bin/env python
# -*- coding: utf-8 -*-

# Import the basic networking library
import urllib2
# Import the regular expression module
import re

# Simulate a browser User-Agent
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_0) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.56 Safari/535.11'}
req = urllib2.Request("http://www.cnblogs.com/pengzhong", headers=headers)
html = urllib2.urlopen(req).read()

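# re.findall: the method name is case-sensitive and all lowercase.
# The non-greedy .*? stops each match at the first </a> instead of the last one on the line.
menus = re.findall(r'<a class="menu".*?</a>', html)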

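# Write the matched strings to menu.txt, one per line.
# The with block closes the file automatically (the original f.close lacked parentheses, so it never ran).
with open('menu.txt', 'w') as f:
    for menu in menus:
        f.write(menu + '\n')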
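
The script above runs only on Python 2, because urllib2 does not exist in Python 3. As a rough sketch of the same steps on Python 3, the version below uses urllib.request from the standard library. The function names build_request, extract_menus, and crawl are hypothetical helpers introduced here for illustration, and the UTF-8 decoding is an assumption about the page encoding, not something the original states.

```python
import re
import urllib.request

# Same User-Agent string as in the original example.
HEADERS = {
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_0) "
                  "AppleWebKit/535.11 (KHTML, like Gecko) "
                  "Chrome/17.0.963.56 Safari/535.11"
}

# Non-greedy .*? ends each match at the first </a> that follows the opening tag.
MENU_PATTERN = re.compile(r'<a class="menu".*?</a>')


def build_request(url):
    """Return a Request that carries the browser-like User-Agent header."""
    return urllib.request.Request(url, headers=HEADERS)


def extract_menus(html):
    """Return every <a class="menu">...</a> tag found in an HTML string."""
    return MENU_PATTERN.findall(html)


def crawl(url, out_path="menu.txt"):
    """Fetch url, extract the menu links, and write them one per line."""
    with urllib.request.urlopen(build_request(url)) as resp:
        # Assumption: the page is UTF-8; undecodable bytes are replaced.
        html = resp.read().decode("utf-8", errors="replace")
    menus = extract_menus(html)
    with open(out_path, "w", encoding="utf-8") as f:
        for menu in menus:
            f.write(menu + "\n")
    return menus
```

Calling crawl("http://www.cnblogs.com/pengzhong") would perform the same fetch-and-save as the script above. Keeping the parsing in extract_menus, separate from the network call, means the regular expression can be tried on a literal HTML string without any network access.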


Original source: http://www.cnblogs.com/pengzhong/p/PythonSpiderBase.html
