Tags: imp manager line process select sts object img check
Key features
#!/usr/bin/env python
# -*- coding: utf-8 -*-
from bs4 import BeautifulSoup as bs
from sasila.slow_system.base_processor import BaseProcessor
from sasila.slow_system.downloader.http.spider_request import Request
from sasila.slow_system.core.request_spider import RequestSpider

# NOTE: the checkResponse decorator used below must also be imported from
# sasila; the original listing does not show that import.


class Mzi_Processor(BaseProcessor):
    spider_id = 'mzi_spider'
    spider_name = 'mzi_spider'
    allowed_domains = ['mzitu.com']
    start_requests = [Request(url='http://www.mzitu.com/', priority=0)]

    @checkResponse
    def process(self, response):
        # Parse the downloaded page with the lxml parser
        soup = bs(response.m_response.content, 'lxml')
        print(soup.title.string)
        # Queue a follow-up request for every link on the page
        href_list = soup.select('a')
        for href in href_list:
            yield Request(url=response.nice_join(href['href']))

The code is written almost exactly as it would be in Scrapy.
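# `spider` is assumed to be a RequestSpider built from the processor above,
# e.g. RequestSpider(Mzi_Processor()); ConsolePipeline must be imported from
# sasila as well (neither line is shown in the original listing).
spider = spider.set_pipeline(ConsolePipeline())
spider.start()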
import sasila
from sasila.slow_system.manager import manager

# Register the spider with the manager, then start sasila
manager.set_spider(spider)
sasila.start()
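
To make the `process` step above easier to follow without installing the framework, here is a minimal standard-library sketch of the same idea: collect every link on a page, resolve it against the page URL, and keep only links on allowed domains. It is not Sasila's implementation. `LinkCollector` and `follow_links` are hypothetical names, and it assumes that `response.nice_join` behaves roughly like `urllib.parse.urljoin` and that `allowed_domains` also matches subdomains, as it does in Scrapy.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Hypothetical helper: collects the href of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:  # skip anchors that have no href attribute
                self.hrefs.append(href)


def follow_links(page_url, html, allowed_domains):
    """Yield absolute URLs for links that stay on an allowed domain."""
    parser = LinkCollector()
    parser.feed(html)
    for href in parser.hrefs:
        # Assumption: nice_join resolves relative links like urljoin does
        absolute = urljoin(page_url, href)
        host = urlparse(absolute).hostname or ""
        # Assumption: a domain also matches its subdomains
        if any(host == d or host.endswith("." + d) for d in allowed_domains):
            yield absolute
```

In a real spider each yielded URL would be wrapped in a `Request`, as in the `process` method above. Skipping anchors without an `href` avoids the `KeyError` that `href['href']` would raise on such tags.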
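Architecture
Immediate crawler
The immediate crawler can be called through an API: you pass in the page to crawl, or a description of what you need, and it crawls the data on the spot and returns the result. It is not yet fully developed and is offered only as a reference for the approach. The core sample code is in sasila.immediately_system.
Why is it called Sasila?
As a WoW player, can you guess? ヾ( ̄▽ ̄)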
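
The request-in, result-out shape described above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the code in sasila.immediately_system: `crawl_now`, `fetch_html` and `TitleParser` are hypothetical names, the "result" is reduced to the page title, and the fetcher is injectable so the function can be tried without network access.

```python
import json
from html.parser import HTMLParser
from urllib.request import urlopen


class TitleParser(HTMLParser):
    """Hypothetical helper: captures the text inside <title>."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def fetch_html(url, timeout=10):
    """Download a page and decode it with the charset the server declares."""
    with urlopen(url, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def crawl_now(url, fetch=fetch_html):
    """Crawl one page right away and return the outcome as a JSON string."""
    try:
        html = fetch(url)
    except Exception as exc:  # report the failure instead of raising
        return json.dumps({"url": url, "ok": False, "error": str(exc)})
    parser = TitleParser()
    parser.feed(html)
    return json.dumps({"url": url, "ok": True, "title": parser.title.strip()})
```

An HTTP handler in any web framework could call `crawl_now(url)` and send the returned JSON string back to the caller. Crawling happens synchronously inside the call, which is what distinguishes this mode from the queued spider shown earlier.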
For more source code, join the group: 125240963
Introducing a super flexible, friendly, and practical crawler framework that is a pleasure to use!
Original article: https://www.cnblogs.com/PY147/p/9191330.html