Crawlers, Part 1

Date: 2017-05-10 17:03:35


A first look at crawlers

#!/usr/bin/env python
# encoding: utf-8

from bs4 import BeautifulSoup
import requests


response = requests.get("http://www.autohome.com.cn/news/")
# Use the encoding detected from the body so that response.text is not garbled
response.encoding = response.apparent_encoding

soup = BeautifulSoup(response.text, features="html.parser")  # build the soup object
# find() returns the first matching element, or None if nothing matches
soup_obj = soup.find(id="auto-channel-lazyload-article")

# find_all() returns every matching element as a list
li_list = soup_obj.find_all("li") if soup_obj else []
for i in li_list:
    a = i.find("a")
    if a:
        a_attrs = a.attrs.get("href")  # attrs is the tag's attribute dictionary
        print(a_attrs)
        a_h = a.find("h3")
        print(a_h)
        img = a.find("img")
        print(img)
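
The script above prints whole tags and needs the live site to run. As a rough sketch of the same find / find_all / attrs pattern, the example below parses an inline HTML string instead and pulls out plain values. The sample markup, the example.com URLs, and the helper name extract_articles are assumptions made for illustration; only the container id comes from the original script.

```python
from bs4 import BeautifulSoup

# Hypothetical markup imitating the structure the crawler expects.
SAMPLE_HTML = """
<div id="auto-channel-lazyload-article">
  <ul>
    <li><a href="//example.com/news/1.html"><h3>First headline</h3><img src="//example.com/1.jpg"/></a></li>
    <li>no link in this item</li>
    <li><a href="//example.com/news/2.html"><h3>Second headline</h3></a></li>
  </ul>
</div>
"""


def extract_articles(html):
    """Return one dict per <li> that contains a link (illustrative helper)."""
    soup = BeautifulSoup(html, features="html.parser")
    container = soup.find(id="auto-channel-lazyload-article")
    if container is None:  # find() returns None when nothing matches
        return []
    articles = []
    for li in container.find_all("li"):
        a = li.find("a")
        if not a:
            continue  # skip list items without a link
        h3 = a.find("h3")
        img = a.find("img")
        articles.append({
            "href": a.attrs.get("href"),
            "title": h3.get_text(strip=True) if h3 else None,
            "img": img.attrs.get("src") if img else None,
        })
    return articles


articles = extract_articles(SAMPLE_HTML)
for article in articles:
    print(article)
```

Checking each lookup for None before reading from it keeps the loop from failing on list items that lack a heading or an image.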

requests

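The Python standard library provides modules such as urllib, urllib2, and httplib for making HTTP requests (these are the Python 2 names; in Python 3 they became urllib.request and http.client), but their APIs are clumsy. They were built for a different time and a different web, and they demand a great deal of work, sometimes even method overrides, to accomplish the simplest tasks.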
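
To give a feel for that manual work, here is a minimal sketch that only builds a GET request with the Python 3 standard library and does not send anything. The page parameter, the User-Agent string, and the helper name build_stdlib_request are assumptions made for illustration.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_stdlib_request(base_url, params, user_agent):
    """Build (but do not send) a GET request using only the standard library."""
    # The query string has to be encoded and appended by hand.
    url = base_url + "?" + urlencode(params)
    return Request(url, headers={"User-Agent": user_agent})


# "page" and the User-Agent value are hypothetical, chosen for the demo.
req = build_stdlib_request(
    "http://www.autohome.com.cn/news/", {"page": 2}, "demo-crawler/0.1"
)
print(req.full_url)
print(req.get_method())
# Request stores header names capitalized, hence "User-agent" here.
print(req.get_header("User-agent"))
```

Actually sending it would still require urllib.request.urlopen(req), reading the raw bytes, and decoding them yourself.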

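Requests is an HTTP library written in Python and released under the Apache 2.0 license. It wraps Python's built-in modules in a high-level interface, which makes sending network requests far more pleasant for Python developers. With Requests you can easily perform most of the HTTP operations a browser can.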
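
As a small sketch of that higher-level interface, the example below lets Requests assemble a GET and a form POST without sending them, so it runs offline. The page parameter, the example.com login URL, the form field, and the two helper names are assumptions made for illustration.

```python
import requests


def build_get(url, params):
    """Let Requests encode the query string; nothing is sent."""
    return requests.Request("GET", url, params=params).prepare()


def build_form_post(url, form):
    """Let Requests encode a form body and set the Content-Type header."""
    return requests.Request("POST", url, data=form).prepare()


# "page", the login URL, and the form field are hypothetical demo values.
get_req = build_get("http://www.autohome.com.cn/news/", {"page": 2})
post_req = build_form_post("http://example.com/login", {"user": "demo"})

print(get_req.method, get_req.url)
print(post_req.method, post_req.headers["Content-Type"], post_req.body)
```

In everyday use the one-line forms requests.get(url, params=...) and requests.post(url, data=...) build and send the request in a single call.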


Original article: http://www.cnblogs.com/YingLai/p/6836823.html
