码迷,mamicode.com
首页 > 其他好文 > 详细

爬取股票scrapy

时间:2019-03-14 13:38:45      阅读:224      评论:0      收藏:0      [点我收藏+]

标签:define   cal   utf-8   yield   工程   lin   name   建立   test   

步骤1.建立工程和Spider

scrapy startproject BaiduStocks
cd BaiduStocks 
scrapy genspider stocks baidu.com

 步骤2.编写爬虫Spider

 配置stocks.py文件

修改返回页面的处理

修改对新增url爬取请求的处理

import scrapy
import re
class StocksSpider(scrapy.Spider):
    name = "stocks"
    start_urls = [http://quote.eastmoney.com/stocklist.html]
    def parse(self,response):
            for href in response.css(a::aattr(href)).extract():
                try:
                    stock = re.findall(r"[s][hz]\d{6}",href)[0]
                    url = https://gupiao.bai.com/stock/ + stock +.html
                    yield scrapy.Request(url,callback=self.parse_stock)
                except:
                    continue
    def parse_stock(self.response):
        infoDict = {}
        stockInfo = respomse.css(.stock-bets)
        name = stockInfo.css(.bets-name).extact()[0]
        keyList = stockInfo.css(dt).extract()
        valueList = stockInfo.css(dd).extract()
        for i in range(len(keyList)):
            key = re.findall(r>.*</dt>,keyList[i])[0][1:-5]
            try:
                val = re.findall(r\d+\.?.*,valueList[i])[0][0:-5]
            except:
                val = --
            infoDict[key]=val
        infoDict.update(
        {股票名称}:re.findall(\s.*\(,name)[0].split()[0]+        re.findall(\>.*\<,name)[0][1:-i]})
     yield inofDict

 

 步骤3.编写Piplines.py文件

定义对爬取项(Scraped Item)的处理类

# -*- coding: utf-8 -*-

# Define your item pipelines here
#
# Don‘t forget to add your pipeline to the ITEM_PIPELINES setting
# See: https://doc.scrapy.org/en/latest/topics/item-pipeline.html


class BaidustocksPipeline(object):
    def process_item(self, item, spider):
        return item
class BaidustocksinfoPipeline(object):
    def open_spider(self,spider):
        self.f = open(BaiduStockInfo.txt,w)
    def close_spider(self.spider):
        self.f.close()
    def process_item(self.item.spider):
        try:
            line = str(dict(item)) + \n
            self.f.write(line)
        except:
            pass
        return item

配置ITEM_PIPLINES选项

修改setting.py

ITEM_PIPELINES = {
    BaiduStocks.pipelines.BaidustocksInfoPipeline: 300,

 

爬取股票scrapy

标签:define   cal   utf-8   yield   工程   lin   name   建立   test   

原文地址:https://www.cnblogs.com/HomeG/p/10529500.html

(0)
(0)
   
举报
评论 一句话评论(0
登录后才能评论!
© 2014 mamicode.com 版权所有  联系我们:gaon5@hotmail.com
迷上了代码!