
Web Crawler Basics

Date: 2018-01-21 10:56:26


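Web crawlers

A web crawler (also called a web spider or web robot) is a program or script that automatically fetches information from the World Wide Web according to a set of rules.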

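HTTP methods for interacting with a server:

GET: only retrieves information about a resource; it does not add or modify data.

POST: usually submits data to a resource on the server, typically through an HTML form.

PUT: adds a resource (creates it, or replaces it if it already exists).

DELETE: deletes a resource.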
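As a rough illustration of the four methods listed above, the sketch below builds (but never sends) one request object per method, using only the standard library's urllib.request. The URL is a placeholder chosen for this example and does not come from the original article.

```python
from urllib.request import Request

# Placeholder endpoint, assumed for illustration only; nothing is sent.
URL = "http://httpbin.org/anything"

# Build one unsent request object per HTTP method.
built = {verb: Request(URL, method=verb) for verb in ("GET", "POST", "PUT", "DELETE")}

for verb, req in built.items():
    # get_method() reports the verb that would appear on the request line.
    print(req.get_method(), req.full_url)
```

The only thing that differs between these objects is the verb; what the server does with each verb is up to the server.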

The Requests module. Install it with: pip install requests

import requests

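1. GET requests

import requests
# Keys and values of the query string must be quoted strings.
params = {"key1": "hello", "key2": "world"}
url = "http://www.baidu.com"
r = requests.get(url=url, params=params)
# r.url is the final URL, including the encoded query string.
print(r.url)

Output: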

http://www.baidu.com/?key1=hello&key2=world
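To see how params ends up in the URL without touching the network, one option is to prepare the request instead of sending it. This is a minimal sketch that reuses the values from the example above; requests.Request and its prepare() method are part of the Requests API.

```python
import requests

params = {"key1": "hello", "key2": "world"}

# Build the request without sending it, so no network access is needed.
prepared = requests.Request("GET", "http://www.baidu.com", params=params).prepare()

# The dict is URL-encoded and appended as the query string.
print(prepared.url)
```

The prepared URL should match the r.url value printed above, since both are produced by the same encoding step.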

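2. POST requests

import requests
# Form fields to submit; keys and values must be quoted strings.
params = {"key1": "hello", "key2": "world"}
# data= sends the dict as a form-encoded request body.
r = requests.post("http://httpbin.org/post", data=params)
print(r.text)

Output: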

{
  "args": {}, 
  "data": "", 
  "files": {}, 
  "form": {
    "key1": "hello", 
    "key2": "world"
  }, 
  "headers": {
    "Accept": "*/*", 
    "Accept-Encoding": "gzip, deflate", 
    "Connection": "close", 
    "Content-Length": "21", 
    "Content-Type": "application/x-www-form-urlencoded", 
    "Host": "httpbin.org", 
    "User-Agent": "python-requests/2.18.4"
  },
  "json": null,
  "origin": "113.116.146.147",
  "url": "http://httpbin.org/post"
}

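The "form" and "Content-Length" entries in the httpbin output follow from how data= is encoded. The sketch below prepares the same POST request without sending it and inspects the body and headers Requests would put on the wire. The variable name payload is chosen for this example.

```python
import requests

payload = {"key1": "hello", "key2": "world"}

# Prepare (but do not send) the same POST request as in the example above.
prepared = requests.Request("POST", "http://httpbin.org/post", data=payload).prepare()

print(prepared.body)                       # the form-encoded request body
print(prepared.headers["Content-Type"])    # set automatically for dict data
print(prepared.headers["Content-Length"])  # length of the encoded body
```

Passing a dict through data= is what makes the fields appear under "form" rather than "args" or "json" in the echoed response.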
3. The HTTP response

import requests
url = "http://qiushibaike.com/"
r = requests.get(url=url)
print(r.encoding)         # encoding Requests will use to decode the body
print(type(r.text))       # decoded body
print(type(r.content))    # raw body

Output:

UTF-8
<class 'str'>
<class 'bytes'>

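What is the difference between text and content in Requests?

r.text returns the body as str, decoded using r.encoding. Use it for textual data such as HTML or JSON.

r.content returns the body as raw bytes. Use it for images, files, and other binary data.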
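The relationship between the two can be sketched without a network call: r.text is essentially r.content decoded with r.encoding. The variables below are stand-ins for those attributes, and the sample string and filename are assumptions made for this example.

```python
# Stand-in for r.content: the raw bytes a server might send (assumed sample data).
content = "爬虫".encode("utf-8")

# Stand-in for r.encoding, which Requests normally infers from the response headers.
encoding = "utf-8"

# Stand-in for r.text: the same bytes decoded into a str.
text = content.decode(encoding)

print(type(content), len(content))  # length counts bytes
print(type(text), len(text))        # length counts characters

# Binary payloads (images, archives) should be written as bytes, undecoded:
# with open("picture.jpg", "wb") as f:   # hypothetical filename
#     f.write(content)
```

If r.encoding is wrong for a page, r.text comes out garbled while r.content is unaffected, which is why binary downloads should always use content.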

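4. Other commonly used attributes

import requests
# Send a browser-like User-Agent; header names and values must be quoted strings.
header = {"User-Agent": "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.75 Safari/537.36"}
r = requests.get("https://www.qiushibaike.com/", headers=header)
# print(r.text)
print(r.request)        # the PreparedRequest that was sent
print(r.headers)        # response headers
print(r.cookies)        # cookies set by the server
print(r.url)            # final URL
print(r.status_code)    # HTTP status code

Output: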

<PreparedRequest [GET]>
{'Server': 'openresty', 'Date': 'Sun, 21 Jan 2018 01:08:11 GMT', 'Content-Type': 'text/html; charset=UTF-8', 'Content-Length': '18094', 'Connection': 'keep-alive', 'Content-Encoding': 'gzip', 'Set-Cookie': '_xsrf=2|acc1cc58|fb495aec5628f018bc13a85be6a76a81|1516496891; Path=/', 'Vary': 'User-Agent, Accept-Encoding', 'Etag': '"560c073021ccc9e765bb6f4e4b4182594d4664ec"'}
<RequestsCookieJar[<Cookie _xsrf=2|acc1cc58|fb495aec5628f018bc13a85be6a76a81|1516496891 for www.qiushibaike.com/>]>
https://www.qiushibaike.com/
200
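The PreparedRequest printed by r.request can also be built and inspected offline. This minimal sketch prepares the same GET request with the custom User-Agent and checks what would be sent; it makes no network call.

```python
import requests

# User-Agent string taken from the example above.
header = {
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/49.0.2623.75 Safari/537.36"
}

# Prepare the request without sending it, then inspect what would go out.
prepared = requests.Request("GET", "https://www.qiushibaike.com/", headers=header).prepare()

print(prepared.method)
print(prepared.url)
# Header lookup on a prepared request is case-insensitive.
print(prepared.headers["user-agent"])
```

Inspecting the prepared request this way is a cheap check that a custom header is spelled correctly before running a crawl.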

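Session objects in Requests

s = requests.Session()

requests.session() is an older alias that returns the same kind of object; both spellings work under Python 2 and Python 3.

All state belonging to one session (cookies, for example) is kept in s, so later requests only need to go through s.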

s.get(url)
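What "kept in s" means can be sketched offline with Session.prepare_request, which merges the session's stored headers and cookies into a request without sending it. The User-Agent string, cookie name and value, and URL below are hypothetical values chosen for this example.

```python
import requests

s = requests.Session()

# State stored on the session is applied to every request made through it.
s.headers.update({"User-Agent": "demo-crawler/0.1"})  # hypothetical UA string
s.cookies.set("token", "abc123")                      # hypothetical cookie

# Merge the session state into a request without sending it.
req = requests.Request("GET", "http://httpbin.org/get")
prepared = s.prepare_request(req)

print(prepared.headers["User-Agent"])
print(prepared.headers.get("Cookie"))
```

In a real crawl, cookies set by a server response (a login cookie, for instance) are stored in s.cookies automatically and sent back on later s.get or s.post calls.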

 


Original article: https://www.cnblogs.com/pythonlx/p/8323461.html
