
Scraping campus-beauty (xiaohua) site images with Scrapy

Posted: 2020-04-17 23:26:55

Tags: sts, crawler, enc, file, stdout, write, from, extract, rac

Part 1: Basic version (scraping the home-page images)

Spider .py file code:

# -*- coding: utf-8 -*-
import io
import re
import sys

import scrapy
from scrapy.selector import Selector
from scrapy.http import Request

from ..items import Day96XiaohuaItem

# Force UTF-8 stdout so Chinese titles print correctly on Windows consoles
sys.stdout = io.TextIOWrapper(sys.stdout.buffer, encoding="utf-8")


class XiaohuaSpider(scrapy.Spider):
    name = "xiaohua"
    # allowed_domains takes bare domain names, not URLs with paths
    allowed_domains = ["www.xueshengmai.com"]
    start_urls = ["http://www.xueshengmai.com/hua/"]

    def parse(self, response):
        # ---------- persist the data ----------
        hxs = Selector(response=response).xpath(
            '//div[@class="item_t"]/div[@class="img"]/a/img'
        ).extract()
        for i in hxs:
            # Pull the title and image path out of the raw <img> tag text
            title = re.findall(r'alt="(.*?)"', i)[0] + ".jpg"
            src = "http://www.xueshengmai.com%s" % re.findall(r'src="(.*?)"', i)[0]
            print(title, src)
            item_obj = Day96XiaohuaItem(title=title, src=src)
            yield item_obj
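The regex-over-raw-HTML extraction above is fragile; the same idea can be checked in isolation with plain `re`. A minimal sketch, where the sample `<img>` markup is an assumption for illustration:

```python
import re

# A sample tag shaped like the ones the XPath above extracts
# (hypothetical markup, for illustration only).
tag = '<img alt="Alice" src="/img/alice.png">'

# Non-greedy groups grab just the attribute values
title = re.findall(r'alt="(.*?)"', tag)[0] + ".jpg"
src = "http://www.xueshengmai.com%s" % re.findall(r'src="(.*?)"', tag)[0]

print(title)  # Alice.jpg
print(src)    # http://www.xueshengmai.com/img/alice.png
```

In Scrapy the regex can be skipped entirely by selecting the attributes directly, e.g. `response.xpath('//div[@class="item_t"]//img/@src').extract()`.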

items.py code:

import scrapy


class Day96XiaohuaItem(scrapy.Item):
    # define the fields for your item here like:
    # name = scrapy.Field()
    title = scrapy.Field()
    src = scrapy.Field()

pipelines.py code:

import os

import requests


class Day96XiaohuaPipeline(object):
    def process_item(self, item, spider):
        os.makedirs("imgs", exist_ok=True)  # make sure the target directory exists
        file_path = "imgs/%s" % item["title"]
        img_data = requests.get(item["src"])  # download the image bytes
        with open(file_path, "wb") as f:
            f.write(img_data.content)
        return item  # pass the item on to any later pipelines
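The pipeline only runs if it is registered in the project's settings.py. A minimal sketch, assuming the Scrapy project module is named `day96_xiaohua` (the name is inferred from the `Day96XiaohuaItem` class and is an assumption):

```python
# settings.py (module name day96_xiaohua is an assumption)
ITEM_PIPELINES = {
    # lower numbers run earlier when several pipelines are enabled
    "day96_xiaohua.pipelines.Day96XiaohuaPipeline": 300,
}
```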

Part 2: Scraping the site's images across pages
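To crawl beyond the first page, the spider needs the URLs of the later listing pages. A minimal sketch of a page-URL helper, where the `index_N.html` pattern is an assumption for illustration (check the site's actual pagination links before relying on it):

```python
def page_url(page):
    """Build the listing URL for 1-based page number `page`.

    The index_N.html pattern is hypothetical; inspect the site's
    real "next page" links to confirm the format.
    """
    base = "http://www.xueshengmai.com/hua/"
    return base if page == 1 else "%sindex_%d.html" % (base, page)

# Inside parse(), each further page would then be scheduled with
# something like (spider context required, shown as a comment):
#   yield Request(url=page_url(n), callback=self.parse)

print(page_url(1))  # http://www.xueshengmai.com/hua/
print(page_url(2))  # http://www.xueshengmai.com/hua/index_2.html
```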



Original post: https://www.cnblogs.com/sun-10387834/p/12723029.html
