Scraping an HTML page with Python

Date: 2017-04-13 14:56:01


# coding=utf-8
import urllib.request


def gethtml(url):
    # Fetch the page and decode the response body as UTF-8.
    page = urllib.request.urlopen(url)
    html = page.read().decode("utf-8")
    return html


url = "........"  # placeholder; set this to the page to fetch

html = gethtml(url)
print(html)

Note: on some websites this approach cannot fetch the complete page.
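The note does not say why some pages come back incomplete. Two common causes, offered here only as possibilities, are servers that respond differently to the default urllib User-Agent and pages that are not encoded as UTF-8. Content that a page loads with JavaScript is a third cause, and this sketch does not address it. The helper names `build_request` and `decode_body`, the User-Agent string, and the GBK fallback are illustrative assumptions, not part of the original post.

```python
import urllib.request


def build_request(url, user_agent="Mozilla/5.0 (compatible; example-fetcher)"):
    # Hypothetical helper: attach a browser-like User-Agent header,
    # since some servers treat the default urllib agent differently.
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def decode_body(raw, charset=None):
    # Hypothetical helper: try the charset declared by the server first,
    # then UTF-8, then GBK (assumed here as a likely fallback for Chinese sites).
    for enc in (charset, "utf-8", "gbk"):
        if not enc:
            continue
        try:
            return raw.decode(enc)
        except (UnicodeDecodeError, LookupError):
            continue
    # Last resort: keep what can be decoded and mark the rest.
    return raw.decode("utf-8", errors="replace")


def gethtml(url, timeout=10):
    # Same idea as the original gethtml, with a header and a charset fallback.
    req = build_request(url)
    with urllib.request.urlopen(req, timeout=timeout) as page:
        charset = page.headers.get_content_charset()
        return decode_body(page.read(), charset)
```

Calling `gethtml(url)` works as before. If the result is still missing content that is visible in a browser, the page is probably built by JavaScript after loading, and plain urllib cannot retrieve that part.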


Original article: http://www.cnblogs.com/jjj-fly/p/6703779.html
