
Getting Started with Web Scraping in Python 3

Date: 2019-05-12 11:14:55


import urllib.request

url = "http://www.iciba.com/publish"

# Browser-like request headers. Accept-Encoding is deliberately left out here.
headers = {
    "Host": "www.iciba.com",
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:66.0) Gecko/20100101 Firefox/66.0",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "zh-CN,zh;q=0.8,zh-TW;q=0.7,zh-HK;q=0.5,en-US;q=0.3,en;q=0.2",
    # "Accept-Encoding": "gzip, deflate",
}

request = urllib.request.Request(url=url, headers=headers)
response = urllib.request.urlopen(request)

# Decode the raw response body as UTF-8, the default for bytes.decode().
print(response.read().decode())

 

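Running this raises an error:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte

Solution: the response body is gzip-compressed and was never decompressed. Every gzip stream begins with the bytes 0x1f 0x8b, and 0x8b cannot start a UTF-8 character, so decoding the compressed bytes directly fails at position 1. The body has to be decompressed before it is decoded: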
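
The failure can be reproduced without any network access. The sketch below is only an illustration: the `html` string is a made-up stand-in for a page body, compressed locally with `gzip.compress` to play the role of a compressed response.

```python
import gzip

# Hypothetical stand-in for a page body; not taken from any real site.
html = "<html><body>你好, world</body></html>"
raw = gzip.compress(html.encode("utf-8"))

# Every gzip stream starts with the magic bytes 0x1f 0x8b.
print(raw[:2])

# Decoding the compressed bytes directly fails, as in the traceback above.
error_message = None
try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    error_message = str(exc)
    print(error_message)

# Decompress first, then decode the resulting bytes as text.
text = gzip.decompress(raw).decode("utf-8")
print(text)
```

The first byte, 0x1f, happens to be a valid ASCII control character, which is why the decoder only complains at position 1.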

import gzip
import urllib.request

url = "https://www.baidu.com"

headers = {
    "Host": "www.baidu.com",
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:66.0) Gecko/20100101 Firefox/66.0",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "zh-CN,zh;q=0.8,zh-TW;q=0.7,zh-HK;q=0.5,en-US;q=0.3,en;q=0.2",
    # Advertise only gzip, since gzip is the only encoding handled below.
    "Accept-Encoding": "gzip",
}

request = urllib.request.Request(url=url, headers=headers)
response = urllib.request.urlopen(request)

content = response.read()

# Read the Content-Encoding response header to see whether the body is compressed.
encoding = response.info().get("Content-Encoding")

if encoding == "gzip":
    # Decompress first, then decode the bytes as UTF-8 text.
    print(gzip.decompress(content).decode())
else:
    # The body was sent uncompressed, so it can be decoded directly.
    print(content.decode())
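
The same check can be pulled into a small reusable helper. This is a sketch rather than part of the original post: `decode_body` and its parameters are hypothetical names, and the deflate branch is an addition that assumes a server may send either zlib-wrapped or raw deflate data.

```python
import gzip
import zlib


def decode_body(raw, content_encoding=None, charset="utf-8"):
    """Return a response body as text, undoing gzip or deflate compression first."""
    encoding = (content_encoding or "").strip().lower()
    if encoding == "gzip":
        raw = gzip.decompress(raw)
    elif encoding == "deflate":
        try:
            # zlib-wrapped deflate data.
            raw = zlib.decompress(raw)
        except zlib.error:
            # Assumption: fall back to raw deflate with no zlib header.
            raw = zlib.decompress(raw, -zlib.MAX_WBITS)
    # Any other value (or no header at all) is treated as uncompressed.
    return raw.decode(charset or "utf-8")


# Offline demonstration with locally compressed sample data.
sample = "页面内容 page content"
print(decode_body(gzip.compress(sample.encode("utf-8")), "gzip"))
print(decode_body(zlib.compress(sample.encode("utf-8")), "deflate"))
print(decode_body(sample.encode("utf-8")))
```

With a live `urllib` response, the arguments would presumably come from `response.read()`, `response.headers.get("Content-Encoding")`, and `response.headers.get_content_charset()`. The last of these returns `None` when the server declares no charset, in which case the helper falls back to UTF-8.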


Original article: https://www.cnblogs.com/liwuming/p/10851045.html
