Tags: coding ref decode tab import insecure col XML for
I wrote a simple web crawler:
# coding=utf-8
from bs4 import BeautifulSoup
import requests

url = "http://www.weather.com.cn/textFC/hb.shtml"


def get_temperature(url):
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36',
        'Upgrade-Insecure-Requests': '1',
        'Referer': 'http://www.weather.com.cn/weather1d/10129160502A.shtml',
        'Host': 'www.weather.com.cn'
    }
    res = requests.get(url, headers=headers)
    res.encoding = "utf-8"  # only affects res.text
    content = res.content  # raw bytes, not yet decoded
    content = content.decode('UTF-8')  # decode the bytes as UTF-8
    # print(content)
    soup = BeautifulSoup(content, 'lxml')
    conMidtab = soup.find('div', class_='conMidtab')
    conMidtab2_list = conMidtab.find_all('div', class_='conMidtab2')
    for table in conMidtab2_list:
        tr_list = table.find_all('tr')[2:]  # all rows except the first two
        province = ''
        min_temp = 0  # renamed from `min` to avoid shadowing the builtin
        for index, tr in enumerate(tr_list):
            td_list = tr.find_all('td')
            if index == 0:
                # the first row also holds the province cell, so columns shift by one
                province = td_list[0].text.replace('\n', '')
                city = td_list[1].text.replace('\n', '')
                min_temp = td_list[7].text.replace('\n', '')
            else:
                city = td_list[0].text.replace('\n', '')
                min_temp = td_list[6].text.replace('\n', '')
            print(province, city, min_temp)
        # province_list = tr_list[2]
        # td_list = province_list.find_all('td')
        # province_td = td_list[0]
        # province = province_td.text
        # print(province.replace('\n', ''))


get_temperature(url)
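The only subtle part of the loop above is the column offset: in the first row of each province table, cell 0 is the province name, so the city and the minimum temperature sit one column further right than in the rows that follow. Here is a minimal sketch of just that step, with no network access and no BeautifulSoup, so it can be checked on its own. The function name `rows_to_records` and the sample rows are made up for illustration. The filler cells do not reproduce the real page's columns.

```python
def rows_to_records(rows):
    """Turn one province table into (province, city, min_temp) tuples.

    `rows` is a list of rows, each a list of cell strings. This mirrors
    the indexing used in the crawler: the first row has one extra
    leading cell holding the province name.
    """
    records = []
    province = ''
    for index, cells in enumerate(rows):
        # Only the first row carries the province cell, so only there
        # every other column is shifted right by one.
        offset = 1 if index == 0 else 0
        if index == 0:
            province = cells[0].replace('\n', '')
        city = cells[0 + offset].replace('\n', '')
        min_temp = cells[6 + offset].replace('\n', '')
        records.append((province, city, min_temp))
    return records


# Hypothetical sample data: "-" marks columns this sketch does not use.
sample = [
    ['\nProvinceX\n', '\nCityA\n', '-', '-', '-', '-', '-', '-5'],
    ['\nCityB\n', '-', '-', '-', '-', '-', '-8'],
]
records = rows_to_records(sample)
for record in records:
    print(*record)
```

Keeping the indexing in a plain function like this makes it easy to test against a saved copy of the page before pointing the crawler at the live site.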
Original article: https://www.cnblogs.com/e0yu/p/9505490.html