Various Scrapy Crawler Pipelines

Date: 2018-08-23 21:02:26

Tags: .json   api   pipeline   dict   sel   finish   datetime   exporter   ret

from datetime import datetime
from scrapy.exporters import JsonItemExporter, CsvItemExporter
import pymongo
import redis
from .settings import REDIS_HOST, REDIS_PORT, MONGO_HOST, MONGO_PORT


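# Data-source pipeline: stamps every item with crawl metadata
class AqiDataPipeline(object):
    def process_item(self, item, spider):
        # Record the crawl time (naive UTC timestamp)
        item['crawl_time'] = datetime.utcnow()
        # Record which spider produced the item
        item['spider'] = spider.name
        return item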


# JSON pipeline
class AqiJsonPipeline(object):
    def open_spider(self, spider):
        # Scrapy exporters write bytes, so the file is opened in binary mode
        self.file = open("aqi.json", 'wb')
        self.write = JsonItemExporter(self.file)
        self.write.start_exporting()

    def process_item(self, item, spider):
        self.write.export_item(item)
        return item

    def close_spider(self, spider):
        self.write.finish_exporting()
        self.file.close()


# CSV pipeline
class AqiVscPipeline(object):
    def open_spider(self, spider):
        # Scrapy exporters write bytes, so the file is opened in binary mode
        self.file = open("aqi.csv", 'wb')
        self.write = CsvItemExporter(self.file)
        self.write.start_exporting()

    def process_item(self, item, spider):
        self.write.export_item(item)
        return item

    def close_spider(self, spider):
        self.write.finish_exporting()
        self.file.close()


# MongoDB pipeline
class AqiMongoPipeline(object):
    def open_spider(self, spider):
        self.client = pymongo.MongoClient(host=MONGO_HOST, port=MONGO_PORT)
        self.db = self.client['Aqi']
        self.collection = self.db['aqi']

    def process_item(self, item, spider):
        # Collection.insert() was removed in PyMongo 4; insert_one() replaces it
        self.collection.insert_one(dict(item))
        return item

    def close_spider(self, spider):
        self.client.close()


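import json  # standard library; used below to serialize items for Redis


# Redis pipeline
class AqiRedisPipeline(object):
    def open_spider(self, spider):
        self.client = redis.Redis(host=REDIS_HOST, port=REDIS_PORT)

    def process_item(self, item, spider):
        # redis-py 3.0+ accepts only bytes, str, int, or float values, so the
        # item is serialized to JSON; default=str covers the datetime in crawl_time
        self.client.lpush('aqi', json.dumps(dict(item), default=str))
        return item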
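
The pipelines above only run once they are registered in the project's settings.py, and the module also imports four connection constants from that file. The original post does not show it, so the following is only a minimal sketch. It assumes a project package named aqi (hypothetical) and local default ports for Redis and MongoDB. The priority numbers are illustrative; what matters is that AqiDataPipeline gets the lowest value, so it stamps crawl_time and spider on each item before the exporters and database pipelines see it.

```python
# settings.py (sketch). All values are assumptions for a local setup.

# Connection constants imported by pipelines.py
REDIS_HOST = "127.0.0.1"
REDIS_PORT = 6379
MONGO_HOST = "127.0.0.1"
MONGO_PORT = 27017

# "aqi" is a hypothetical package name; replace it with your project's.
# Scrapy runs pipelines in ascending order of these values (0-1000),
# and each pipeline receives the item returned by the previous one.
ITEM_PIPELINES = {
    "aqi.pipelines.AqiDataPipeline": 100,   # adds crawl_time and spider
    "aqi.pipelines.AqiJsonPipeline": 200,   # writes aqi.json
    "aqi.pipelines.AqiVscPipeline": 300,    # writes aqi.csv
    "aqi.pipelines.AqiMongoPipeline": 400,  # inserts into MongoDB
    "aqi.pipelines.AqiRedisPipeline": 500,  # pushes onto a Redis list
}
```

Any pipeline left out of ITEM_PIPELINES is simply not run, so a single output target can be chosen by removing the other entries.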


Original article: https://www.cnblogs.com/hanjian200ok/p/9526028.html
