Python: Web Crawler

Date: 2018-08-20 15:45:10

Tags: coding ref decode tab import insecure col XML for

I wrote a simple web crawler:

# coding=utf-8
from bs4 import BeautifulSoup
import requests

url = "http://www.weather.com.cn/textFC/hb.shtml"


def get_temperature(url):
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36',
        'Upgrade-Insecure-Requests': '1',
        'Referer': 'http://www.weather.com.cn/weather1d/10129160502A.shtml',
        'Host': 'www.weather.com.cn'
    }
    res = requests.get(url, headers=headers)
    # res.content is the raw response body as bytes (not ASCII text);
    # decode it explicitly as UTF-8 to get a str
    content = res.content.decode('UTF-8')
    # print(content)

    soup = BeautifulSoup(content, 'lxml')
    con_mid_tab = soup.find('div', class_='conMidtab')
    con_mid_tab2_list = con_mid_tab.find_all('div', class_='conMidtab2')
    for table_div in con_mid_tab2_list:
        # All tr elements after the first two (presumably header rows)
        tr_list = table_div.find_all('tr')[2:]
        province = ''
        for index, tr in enumerate(tr_list):
            td_list = tr.find_all('td')
            if index == 0:
                # The first row has an extra leading cell holding the province,
                # so every other column is shifted right by one
                province = td_list[0].text.replace('\n', '')
                city = td_list[1].text.replace('\n', '')
                min_temp = td_list[7].text.replace('\n', '')
            else:
                city = td_list[0].text.replace('\n', '')
                min_temp = td_list[6].text.replace('\n', '')
            # min_temp avoids shadowing the built-in min()
            print(province, city, min_temp)
        # province_list = tr_list[2]
        # td_list = province_list.find_all('td')
        # province_td = td_list[0]
        # province = province_td.text
        # #print(province.replace('\n',''))


get_temperature(url)
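
The row handling can be pulled out of the network and HTML-parsing code so that it can be checked without fetching the page. Below is a minimal sketch, assuming each row has already been reduced to a list of cell texts (for example with `[td.text.strip() for td in tr.find_all('td')]`). The function name `extract_min_temps` and the sample rows are hypothetical; the data is made up and only mimics the cell positions the crawler reads (index 7 in the first row, index 6 in the rest).

```python
def extract_min_temps(rows):
    """Return (province, city, min_temp) tuples for one table's data rows.

    rows: list of rows, each a list of cell texts. The first row is assumed
    to carry the province name in an extra leading cell, as in the crawler.
    """
    results = []
    province = ''
    for index, cells in enumerate(rows):
        if index == 0:
            # Extra leading province cell shifts the other columns right by one
            province = cells[0]
            city, min_temp = cells[1], cells[7]
        else:
            # Later rows reuse the province remembered from the first row
            city, min_temp = cells[0], cells[6]
        results.append((province, city, min_temp))
    return results


# Made-up sample data, not real weather readings
sample_rows = [
    ['ProvinceA', 'CityA', 'Sunny', 'S <3', '30', 'Cloudy', 'N <3', '20', 'details'],
    ['CityB', 'Sunny', 'S <3', '31', 'Cloudy', 'N <3', '21', 'details'],
]

for record in extract_min_temps(sample_rows):
    print(*record)
```

Returning tuples instead of printing inside the loop keeps the parsing step reusable, for instance to sort the cities by temperature afterwards.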


Original article: https://www.cnblogs.com/e0yu/p/9505490.html
