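BeautifulSoup is a library for parsing and manipulating a DOM tree.
The example below collects the title attribute of every img tag that has the class item-image.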
# -*- coding: utf-8 -*-
from urllib.request import urlopen
from urllib.error import HTTPError
from bs4 import BeautifulSoup

# The second assignment overrides the first, so only this URL is fetched.
url = "http://qa.beloved999.com/category/view?id=2"
url = "http://beloved.finley.com/category/view?id=24"

html = urlopen(url)
bs = BeautifulSoup(html.read(), "html.parser")

# Select every <img> tag whose CSS class is "item-image".
res = bs.find_all("img", "item-image")
print(len(res))
for a in res:
    # Raises KeyError if an image has no title attribute.
    print(a['title'])
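The script above depends on a live page. As a minimal offline sketch of the same lookup, the snippet below parses an inline HTML string and uses Tag.get, which returns None instead of raising KeyError when an img has no title attribute. The sample markup and the helper name collect_titles are made up for illustration and are not from the original post.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for the real category page.
SAMPLE_HTML = """
<div>
  <img class="item-image" src="a.jpg" title="Red dress">
  <img class="item-image" src="b.jpg">
  <img class="banner" src="c.jpg" title="Sale banner">
  <img class="item-image large" src="d.jpg" title="Blue coat">
</div>
"""


def collect_titles(html):
    """Return the title of every <img class="item-image"> that has one."""
    soup = BeautifulSoup(html, "html.parser")
    titles = []
    # class_ matches any single CSS class, so "item-image large" also qualifies.
    for img in soup.find_all("img", class_="item-image"):
        title = img.get("title")  # None when the attribute is missing
        if title is not None:
            titles.append(title)
    return titles


if __name__ == "__main__":
    print(collect_titles(SAMPLE_HTML))
```

Skipping images without a title is one possible choice; a caller that wants to detect them could keep the None values instead.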
The request is sent with HTTP_USER_AGENT set to Python-urllib/3.4. The HTTP-related values received by the server were:
'HTTP_ACCEPT_ENCODING' => 'identity'
'HTTP_CONNECTION' => 'close'
'HTTP_HOST' => 'beloved.finley.com'
'HTTP_USER_AGENT' => 'Python-urllib/3.4'
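The Python-urllib/3.4 value comes from urllib's default opener, which sends "Python-urllib/" followed by the major.minor version of the running interpreter. As a sketch of how one might inspect that default and override it, the snippet below builds a Request carrying a custom User-Agent. The agent string my-crawler/0.1 and both helper names are made-up examples, and nothing is sent over the network.

```python
from urllib.request import Request, build_opener


def default_user_agent():
    """Return the User-Agent that urllib's default opener would send."""
    opener = build_opener()
    # addheaders is a list of (name, value) pairs added to every request.
    return dict(opener.addheaders)["User-agent"]


def make_request(url, user_agent):
    """Build (but do not send) a request carrying a custom User-Agent."""
    return Request(url, headers={"User-Agent": user_agent})


if __name__ == "__main__":
    print(default_user_agent())
    req = make_request(
        "http://beloved.finley.com/category/view?id=24",
        "my-crawler/0.1",  # hypothetical agent string
    )
    # Request stores header names via str.capitalize(), hence "User-agent".
    print(req.get_header("User-agent"))
```

Passing the resulting Request object to urlopen in place of the bare URL would send the custom header with the request.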
Original article: http://www.cnblogs.com/mafeifan/p/4655782.html