Saving Python Scrapy crawler data to a MySQL database

Posted: 2019-04-17 23:18:18

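Besides writing the scraped information to a file, a Scrapy project can also save it to a database by modifying its Pipeline file. To store the scraped information in a database, first run the following SQL statement in MySQL's python database to create the job_inf table: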

CREATE TABLE job_inf (
  id INT(11) NOT NULL AUTO_INCREMENT PRIMARY KEY,
  title VARCHAR(255),
  salary VARCHAR(255),
  company VARCHAR(255),
  url VARCHAR(500),
  work_addr VARCHAR(255),
  industry VARCHAR(255),
  company_size VARCHAR(255),
  recruiter VARCHAR(255),
  publish_date VARCHAR(255)
) DEFAULT CHARSET=utf8mb4;

Then change the Pipeline file as follows, and the scraped information will be saved to the MySQL database:

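# Import the module used to access MySQL (package: mysql-connector-python)
import mysql.connector
class ZhipinspiderPipeline(object):
    # Constructor: open the database connection and create a cursor
    def __init__(self):
        self.conn = mysql.connector.connect(user='root', password='32147',
            host='localhost', port=3306,
            database='python', use_unicode=True)
        self.cur = self.conn.cursor()
    # Override the close_spider callback to release the database resources
    def close_spider(self, spider):
        print('---------- Closing database resources -----------')
        # Close the cursor
        self.cur.close()
        # Close the connection
        self.conn.close()
    def process_item(self, item, spider):
        # Parameterized INSERT: the driver fills each %s from the tuple below;
        # null for the first column lets MySQL generate the auto-increment id
        self.cur.execute(
            "INSERT INTO job_inf VALUES (null, %s, %s, %s, %s, %s, %s, %s, %s, %s)",
            (item['title'], item['salary'], item['company'],
             item['url'], item['work_addr'], item['industry'],
             item.get('company_size'), item['recruiter'], item['publish_date']))
        self.conn.commit()
        # Return the item so that any later pipelines still receive it
        return item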

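In the code above, the execute() call in process_item() inserts the information from each item object into the database. The %s placeholders are filled in by the driver from the tuple of values, the null in the first position lets MySQL assign the auto-increment id, and commit() makes each insert permanent.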
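The INSERT above only works if the nine %s placeholders and the nine tuple entries stay in the same order as the table columns. A minimal sketch of one way to keep them in sync, assuming items behave like dicts (scrapy.Item supports .get()): list the column names once and derive both the SQL and the parameter tuple from that list. JOB_INF_COLUMNS, build_insert and item_to_row are hypothetical helpers, not part of Scrapy or mysql.connector. Unlike the original, which uses .get() only for company_size, this version turns every missing field into NULL instead of raising KeyError.

```python
# Hypothetical helpers: column order of job_inf, excluding the auto-increment id
JOB_INF_COLUMNS = ("title", "salary", "company", "url", "work_addr",
                   "industry", "company_size", "recruiter", "publish_date")


def build_insert(table, columns):
    """Return an INSERT statement with one %s placeholder per column.

    Table and column names are formatted in directly, so they must be
    trusted constants; only the values go through placeholders.
    """
    placeholders = ", ".join(["%s"] * len(columns))
    return "INSERT INTO {} ({}) VALUES ({})".format(
        table, ", ".join(columns), placeholders)


def item_to_row(item, columns=JOB_INF_COLUMNS):
    """Pull values out of a dict-like item in column order.

    Missing fields become None, which the driver sends as SQL NULL.
    """
    return tuple(item.get(col) for col in columns)
```

Inside process_item() the call would then read self.cur.execute(build_insert("job_inf", JOB_INF_COLUMNS), item_to_row(item)). Naming the columns explicitly also removes the need for the leading null, since the id column is simply left out.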

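The Pipeline class defines a constructor that initializes the database connection and cursor, and it overrides the close_spider() method, which Scrapy calls when the spider finishes and which releases the database resources opened in the constructor.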
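Scrapy also calls an open_spider(self, spider) hook when the spider starts, and it can build a pipeline through a from_crawler classmethod, which makes it possible to keep the credentials in settings.py rather than hard-coding them in the constructor. The following is a sketch under those assumptions, not the author's code: MYSQL_PARAMS is a hypothetical setting name, the connect argument exists only so the class can be exercised without a live MySQL server, and the INSERT is trimmed to two columns for brevity.

```python
class MySQLPipeline:
    """Sketch of a Scrapy item pipeline that opens MySQL in open_spider()."""

    def __init__(self, db_params, connect=None):
        self.db_params = db_params  # keyword arguments for the connect function
        self._connect = connect     # None means "use mysql.connector.connect"
        self.conn = None
        self.cur = None

    @classmethod
    def from_crawler(cls, crawler):
        # MYSQL_PARAMS is a hypothetical dict setting defined in settings.py
        return cls(crawler.settings.getdict("MYSQL_PARAMS"))

    def open_spider(self, spider):
        connect = self._connect
        if connect is None:
            import mysql.connector  # imported lazily so a fake can be injected
            connect = mysql.connector.connect
        self.conn = connect(**self.db_params)
        self.cur = self.conn.cursor()

    def process_item(self, item, spider):
        # Trimmed to two columns for brevity; the real table has nine
        self.cur.execute(
            "INSERT INTO job_inf (title, salary) VALUES (%s, %s)",
            (item.get("title"), item.get("salary")))
        self.conn.commit()
        return item

    def close_spider(self, spider):
        # Guard against open_spider() having failed before these were set
        if self.cur is not None:
            self.cur.close()
        if self.conn is not None:
            self.conn.close()
```

With this version, settings.py would hold something like MYSQL_PARAMS = {"user": "root", "host": "localhost", "database": "python"}, and the connection is opened only once the crawl actually starts.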

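Start the spider with the scrapy crawl job_position command. When the run finishes, the job_inf table in the python database will contain the newly scraped job postings; in the author's run, 300 rows were added.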
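Scrapy only sends items to pipelines registered in the ITEM_PIPELINES setting, so the crawl above assumes an entry like the following in the project's settings.py. The package name ZhipinSpider is an assumption inferred from the class name; adjust the dotted path to wherever ZhipinspiderPipeline actually lives. The integer controls the order in which pipelines run (lower values run first, conventionally within 0 to 1000).

```python
# settings.py (fragment)
# "ZhipinSpider" is an assumed project package name; change the dotted path
# to match the module that defines ZhipinspiderPipeline in your project.
ITEM_PIPELINES = {
    "ZhipinSpider.pipelines.ZhipinspiderPipeline": 300,
}
```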

Original post: https://www.cnblogs.com/jackzz/p/10726911.html
