Scraping Google search results and saving them

Date: 2018-08-10 14:32:08

Demo:

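# coding: utf-8
import requests
from bs4 import BeautifulSoup


def getHTMLText(url):
    """Fetch a page and return its text, or '' if the request fails."""
    try:
        r = requests.get(url, timeout=30)
        r.raise_for_status()
        # Guess the encoding from the content rather than trusting the headers.
        r.encoding = r.apparent_encoding
        return r.text
    except requests.RequestException:
        return ''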


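def fillList(ulist, html):
    """Append [cite, abstract] pairs parsed from a results page to ulist."""
    soup = BeautifulSoup(html, 'lxml')
    # Each result is expected in a <div class="g">; these selectors match the
    # markup this script was written against and may not match other pages.
    for node in soup.find_all('div', {'class': 'g'}):
        cite_node = node.find('cite')
        abstract_node = node.find('span', {'class': 'st'})
        # time_node = node.find('span', {'class': 'f'})  # result date, unused
        # Skip results without a URL or snippet instead of failing on None.
        if cite_node is None or abstract_node is None:
            continue
        ulist.append([cite_node.text, abstract_node.text])
    print(ulist)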


uinfo = []
# Search URL copied from the browser; q= is the URL-encoded query "明略数据CTO".
url = "https://www.google.com.hk/search?safe=strict&source=hp&ei=mQltW6O1CLe60PEP-_eY-AQ&q=%E6%98%8E%E7%95%A5%E6%95%B0%E6%8D%AECTO&oq=%E6%98%8E%E7%95%A5%E6%95%B0%E6%8D%AECTO&gs_l=psy-ab.3...7917.11610.0.12024.14.12.0.0.0.0.896.1417.5-1j1.2.0....0...1c.1j4.64.psy-ab..12.2.1416...0j0i30k1j0i5i30k1.0.uovOOEULNls"
html = getHTMLText(url)
fillList(uinfo, html)
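
The demo above collects the results in uinfo and prints them, but does not write them anywhere. One possible way to save them is sketched below, assuming CSV output is acceptable. The function name save_results and the column names are hypothetical and not part of the original script.

```python
import csv


def save_results(rows, path):
    """Write [cite, abstract] pairs to a UTF-8 CSV file with a header row.

    save_results is a hypothetical helper, not part of the original demo.
    """
    # newline='' lets the csv module control line endings itself.
    with open(path, 'w', newline='', encoding='utf-8') as f:
        writer = csv.writer(f)
        writer.writerow(['cite', 'abstract'])  # assumed column names
        writer.writerows(rows)
```

Calling save_results(uinfo, 'results.csv') after fillList would write one row per result; the file name is only an example. The csv module quotes fields as needed, so abstracts containing commas or line breaks stay in a single column.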

Original article: https://www.cnblogs.com/elpsycongroo/p/9454551.html
