Python: Information Crawling and Extraction with the requests Library

Posted: 2017-12-02 23:25:15

Tags: pre, **kwargs, object, method, header, res, _for, url, head

The Python requests library can fetch information from web pages.

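The seven main methods of the requests library:
request(method, url, **kwargs)  Constructs and sends a request; the base method that the six methods below are built on
  get()  The main method for fetching an HTML page; corresponds to HTTP GET
  head()  Fetches only the HTTP response headers of a page, without the body; corresponds to HTTP HEAD (this is not the <head> section of the HTML document)
  post()  Submits a POST request to a page; corresponds to HTTP POST
  put()  Submits a PUT request to a page; corresponds to HTTP PUT
  patch()  Submits a partial-modification request to a page; corresponds to HTTP PATCH
  delete()  Submits a delete request to a page; corresponds to HTTP DELETE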
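
The relationship between request() and the convenience methods can be seen without touching the network. In the requests source, requests.get(url, **kwargs) simply calls requests.request("get", url, **kwargs), and the other five methods follow the same pattern. The sketch below builds requests without sending them so the resulting method and URL can be inspected. The helper name prepare and the example.com URL are placeholders chosen for illustration, not part of the original article.

```python
import requests


def prepare(method, url, **kwargs):
    """Build a request without sending it (hypothetical helper for illustration).

    Only request-building keyword arguments such as params, data, json and
    headers are accepted here; send-time options like timeout are not.
    """
    return requests.Request(method, url, **kwargs).prepare()


url = "http://example.com/items"  # placeholder URL, never contacted

get_req = prepare("GET", url, params={"page": 2})       # params go into the query string
head_req = prepare("HEAD", url)                         # same URL, headers-only request
post_req = prepare("POST", url, data={"name": "bear"})  # data is form-encoded into the body

print(get_req.method, get_req.url)
print(head_req.method, head_req.url)
print(post_req.method, post_req.body)
```

To actually send one of these, the usual one-liners apply: requests.get(url, params={"page": 2}) or, equivalently, requests.request("GET", url, params={"page": 2}).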

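Five commonly used attributes of the Response object:
  r.status_code  The HTTP status code of the response; 200 means the request succeeded
  r.text  The response body as a string, i.e. the content of the page at the URL
  r.encoding  The encoding of the response body as guessed from the HTTP headers
  r.apparent_encoding  The encoding of the response body as inferred from the content itself
  r.content  The response body as raw bytes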

General flow for fetching a resource with get():
  Check the connection status with r.status_code
  Parse the page content using r.text, r.encoding, r.apparent_encoding, and so on

A general-purpose framework

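import requests

def getHtml(url):
    try:
        r = requests.get(url)  # returns a Response object
        r.raise_for_status()   # raises HTTPError for 4xx/5xx status codes
        r.encoding = r.apparent_encoding  # decode using the content-based guess
        return r.text
    except requests.RequestException:
        # Covers connection errors, timeouts, and HTTP error statuses
        print("Crawl failed")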
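
The framework depends on raise_for_status to turn an error status into an exception that the except branch can catch. It is a method, so it only does its job when called with parentheses; a bare r.raise_for_status merely names the method and checks nothing. The sketch below shows the difference offline, using a hand-built 404 Response (the URL and reason values are made up for illustration) rather than a real request.

```python
import requests

# Hand-built stand-in for a failed response; no network access involved.
r = requests.Response()
r.status_code = 404
r.reason = "Not Found"
r.url = "http://example.com/missing"  # placeholder URL

# Referencing the method without calling it does nothing at all.
unused = r.raise_for_status
print("no exception yet")

# Calling it raises HTTPError, a subclass of requests.RequestException.
caught = None
try:
    r.raise_for_status()
except requests.RequestException as exc:
    caught = exc
    print("caught:", type(exc).__name__, exc.response.status_code)

# A successful status passes through silently and returns None.
ok = requests.Response()
ok.status_code = 200
print(ok.raise_for_status())
```

Catching requests.RequestException rather than using a bare except keeps unrelated errors, such as a KeyboardInterrupt or a typo inside the function, from being reported as a failed crawl.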


Original article: http://www.cnblogs.com/z-bear/p/7955888.html
