Python Web Scraping

Date: 2017-07-24 10:03:23


1. Fetch the site you want to scrape with Requests

import requests

r = requests.get('https://www.baidu.com')
print(r.text)  # print the page's HTML source

Note: the Requests library must be installed before you can use it. To install it, run this on the command line:

pip install requests
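The one-line request above does not check whether the request succeeded. A slightly more defensive version might look like the sketch below. The name `fetch_html`, its `timeout` default and the optional `session` parameter are assumptions made for this illustration and are not part of the original post.

```python
import requests


def fetch_html(url, timeout=10.0, session=None):
    """Return the body of `url` as text, raising on HTTP error statuses.

    `fetch_html` and its parameters are hypothetical names for this sketch.
    """
    # Use the caller's session if one is given, otherwise the module-level API.
    http = session if session is not None else requests
    response = http.get(url, timeout=timeout)
    # Turn 4xx/5xx responses into requests.HTTPError instead of returning an error page.
    response.raise_for_status()
    return response.text


if __name__ == "__main__":
    try:
        html = fetch_html("https://www.baidu.com")
    except requests.RequestException as exc:
        print(f"request failed: {exc}")
    else:
        print(html[:200])  # first 200 characters of the page source
```

Passing a `timeout` keeps the script from hanging indefinitely on an unresponsive server, and `raise_for_status()` surfaces HTTP errors early instead of handing an error page to the parser.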

2. Parse the page source with Beautiful Soup

Install:

pip install beautifulsoup4

Parse:

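from bs4 import BeautifulSoup  # import the BeautifulSoup class
import requests                # import the Requests library

r = requests.get('https://www.baidu.com')
html = r.text                              # get the page's HTML source
soup = BeautifulSoup(html, 'html.parser')  # create a BeautifulSoup object; naming the parser avoids a warning
print(soup.a)                              # the first link (<a> tag) on the page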
# ...
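Note that `soup.a` returns only the first `<a>` tag. To collect every link on a page, `find_all` can be used instead. The sketch below runs on a small inline HTML string so that it works without network access. `SAMPLE_HTML` and `extract_links` are names made up for this example.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for a downloaded page.
SAMPLE_HTML = """
<html><body>
  <a href="https://example.com/news">News</a>
  <a href="/about">About</a>
  <a name="top">No href here</a>
</body></html>
"""


def extract_links(html):
    """Return (text, href) pairs for every <a> tag that has an href attribute."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    # href=True skips anchors that carry no href attribute.
    for tag in soup.find_all("a", href=True):
        links.append((tag.get_text(strip=True), tag["href"]))
    return links


if __name__ == "__main__":
    for text, href in extract_links(SAMPLE_HTML):
        print(text, href)
```

In a real scraper, the `html` argument would be the `r.text` obtained from Requests in the code above.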

PS:

For details on using the Requests library, see http://docs.python-requests.org/zh_CN/latest/

For details on using the BeautifulSoup library, see https://www.crummy.com/software/BeautifulSoup/bs4/doc.zh/


Original article: http://www.cnblogs.com/siyu1915/p/7226951.html
