A small Python 3 crawler

Date: 2018-06-18 19:56:18

Target URL

https://findicons.com/pack/2787/beautiful_flat_icons

import re
import time

import requests


def get_html(url):
    # Request headers that imitate a browser's User-Agent
    headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.18 Safari/537.36"}
    # Send a GET request and keep the response
    response = requests.get(url, headers=headers)
    # Return the response body decoded as text, i.e. the page source
    return response.text

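def get_image(html):
    # Capture the part of each image URL that follows /files/; re.S lets "." match newlines as well
    reexpr = re.compile('<img src="https://findicons.com/files/(.*?)"', re.S)
    # Find every match of the pattern in the page source
    a = reexpr.findall(html)
    global count
    for i in a:
        # The page also contains a few unrelated images, so the pattern only
        # captures what follows the /files/ prefix; rebuild the full URL here
        full_web = "https://findicons.com/files/" + i
        # Request the image itself using the rebuilt URL
        full_web_image = requests.get(full_web)
        # Save the image (the F:\image directory must already exist)
        with open(r"F:\image\%d图.jpg" % count, "wb") as f:
            print("Downloading image %d" % count)
            f.write(full_web_image.content)
        # Pause between downloads, after the file has been closed
        time.sleep(1)
        count += 1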

count = 1  # Global counter used to number the saved images

def main(offset=1):
    # Append the page number to the pack URL
    url = "https://findicons.com/pack/2787/beautiful_flat_icons/" + str(offset)
    html_text = get_html(url)
    get_image(html_text)


if __name__ == "__main__":
    last_page = input("Enter the last page to crawl (press Enter for page 1 only): ")
    if last_page == "":
        main()
    else:
        for i in range(1, int(last_page) + 1):
            main(i)
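
The script saves every file with a .jpg extension, whatever the image type really is. As a rough sketch, the extraction step can be separated from the download and the file name can take its extension from the URL, which also lets the regular expression be checked offline. The helper names (extract_image_urls, local_name) and the sample markup below are hypothetical and are not taken from the original script or copied from the real site.

```python
import re
from pathlib import PurePosixPath
from urllib.parse import urlsplit

FILES_PREFIX = "https://findicons.com/files/"
# Same pattern as in get_image, built from the prefix constant
IMG_PATTERN = re.compile(r'<img src="' + re.escape(FILES_PREFIX) + r'(.*?)"', re.S)


def extract_image_urls(html):
    """Return the full URL of every <img> whose src starts with FILES_PREFIX."""
    return [FILES_PREFIX + path for path in IMG_PATTERN.findall(html)]


def local_name(image_url, index):
    """Build a name such as '3.png'; fall back to .jpg if the URL has no extension."""
    suffix = PurePosixPath(urlsplit(image_url).path).suffix or ".jpg"
    return "%d%s" % (index, suffix)


# Hypothetical sample markup (assumption: the real page looks roughly like this)
sample = (
    '<img src="https://findicons.com/files/icons/2787/a/128/cat.png" alt="">\n'
    '<img src="https://example.com/banner.gif">\n'
    '<img src="https://findicons.com/files/icons/2787/a/128/dog.png" alt="">'
)
urls = extract_image_urls(sample)
names = [local_name(u, i) for i, u in enumerate(urls, start=1)]
print(names)  # ['1.png', '2.png']
```

Only the two images under the /files/ prefix are kept; the unrelated banner is skipped, which is the same filtering that get_image relies on.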

Original article: https://www.cnblogs.com/liutao97/p/9195879.html
