Python web scraping: basic usage of the BeautifulSoup library

Date: 2018-10-23 14:28:28

Tags: python, web crawler

import urllib
import urllib2
from bs4 import BeautifulSoup

url = 'http://www.someserver.com/cgi-bin/register.cgi'
values = {}
values['name'] = 'Michael Foord'
values['location'] = 'Northampton'
values['language'] = 'Python'

data = urllib.urlencode(values)    # encode the fields as a query string (name=value&...)
req = urllib2.Request(url, data)   # passing data to Request makes this a POST request
response = urllib2.urlopen(req)    # returns a file-like object
the_page = response.read()         # page source as a string
# Build a BeautifulSoup object from the page source; soup holds the parsed page
soup = BeautifulSoup(the_page, "html.parser")
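
The listing above uses Python 2's urllib2. As a rough sketch only, the same request can be built on Python 3, where the functions live in urllib.parse and urllib.request (an assumption about the reader's environment, not part of the original post). The request is constructed but not sent, so the snippet runs without network access.

```python
from urllib.parse import urlencode
from urllib.request import Request

url = 'http://www.someserver.com/cgi-bin/register.cgi'
values = {
    'name': 'Michael Foord',
    'location': 'Northampton',
    'language': 'Python',
}

# urlencode returns a str; in Python 3 the request body must be bytes
data = urlencode(values).encode('ascii')

# Supplying data switches the request method from GET to POST
req = Request(url, data)

print(data)              # the encoded form body
print(req.get_method())  # the HTTP method urlopen(req) would use
```

Calling urllib.request.urlopen(req) would then send the form and return a file-like response, as in the Python 2 version.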
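Once the BeautifulSoup object is built, the find() and find_all() functions let you filter the HTML down to just the elements you want, selecting by tag name and by tag attributes.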
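for line in soup.find_all('a'):        # iterate over every <a> tag
    url_name = line.get('href')        # URL of the <a> tag (None if it has no href)
    Title = line.get_text().strip()    # text content of the <a> tag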
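
To make the difference between find() and find_all() concrete, here is a minimal sketch run against a small inline HTML string. The markup, the "nav" class and the "title" id are made up for illustration and stand in for the_page; only the bs4 calls themselves are real.

```python
from bs4 import BeautifulSoup

# Hypothetical page source standing in for the_page
html = """
<html><body>
  <h1 id="title">Registration</h1>
  <a class="nav" href="/home"> Home </a>
  <a class="nav" href="/about">About</a>
  <a href="http://www.someserver.com/help">Help</a>
  <a>No link</a>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

first_link = soup.find('a')                   # first matching tag, or None if absent
nav_links = soup.find_all('a', class_='nav')  # every <a> whose class is "nav"
with_href = soup.find_all('a', href=True)     # only <a> tags that carry an href
heading = soup.find(id='title')               # look up by attribute instead of tag name

# Collect (url, text) pairs the same way the loop above does
links = [(a.get('href'), a.get_text().strip()) for a in with_href]
for url_name, title in links:
    print(url_name, title)
```

find() returns a single tag (or None when nothing matches), while find_all() always returns a list, so the result of find() should be checked before calling get() on it.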


Original article: http://blog.51cto.com/weadyweady/2307779
