scrapy crawl itcast -o teachers.json: a crawler example

Date: 2018-01-11 20:39:41


spider.py configuration

# -*- coding: utf-8 -*-
import scrapy
from itTeachers.items import ItteachersItem


class ItcastSpider(scrapy.Spider):
    name = 'itcast'
    allowed_domains = ['itcast.cn']
    start_urls = ['http://www.itcast.cn/channel/teacher.shtml#']

    def parse(self, response):
        # Uncomment to save the raw page for inspection.
        # response.body is bytes, so the file is opened in binary mode.
        # with open("teacher.html", "wb") as f:
        #     f.write(response.body)

        items = []

        teacher_list = response.xpath('//div[@class="li_txt"]')
        for each in teacher_list:

            # Wrap the scraped data in an ItteachersItem object
            item = ItteachersItem()
            name = each.xpath('h3/text()').extract()
            title = each.xpath('h4/text()').extract()
            info = each.xpath('p/text()').extract()

            # Each xpath(...).extract() call returns a list with one element
            item['name'] = name[0]
            item['title'] = title[0]
            item['info'] = info[0]

            items.append(item)
        # Return the collected items directly
        return items
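The spider indexes each result with [0], which raises IndexError as soon as one teacher block lacks an h3, h4, or p element. The sketch below shows the same per-block extraction with a fallback value. It is only an illustration: it uses the standard library's xml.etree.ElementTree (which supports a limited XPath subset) instead of Scrapy's selectors so it runs without a network connection, and the SAMPLE fragment and the helper names first_text and extract_teachers are hypothetical, not taken from the real page or project.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment shaped like the markup the spider targets;
# the real page's content and layout may differ.
SAMPLE = """<root>
  <div class="li_txt">
    <h3>Teacher A</h3>
    <h4>Senior Lecturer</h4>
    <p>Ten years of Python experience.</p>
  </div>
  <div class="li_txt">
    <h3>Teacher B</h3>
    <h4>Lecturer</h4>
  </div>
</root>"""


def first_text(node, tag, default=""):
    """Return the stripped text of the first child `tag`, or `default`."""
    child = node.find(tag)
    if child is None or child.text is None:
        return default
    return child.text.strip()


def extract_teachers(xml_text):
    """Collect name/title/info from every div with class "li_txt"."""
    root = ET.fromstring(xml_text)
    teachers = []
    # Same idea as response.xpath('//div[@class="li_txt"]') in the spider
    for each in root.findall(".//div[@class='li_txt']"):
        teachers.append({
            "name": first_text(each, "h3"),
            "title": first_text(each, "h4"),
            "info": first_text(each, "p"),  # empty string when <p> is missing
        })
    return teachers


if __name__ == "__main__":
    for teacher in extract_teachers(SAMPLE):
        print(teacher)
```

In the spider itself, a similar guard could be written by checking that the extracted list is non-empty before indexing it.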

items.py configuration

# -*- coding: utf-8 -*-

# Define here the models for your scraped items
#
# See documentation in:
# https://doc.scrapy.org/en/latest/topics/items.html

import scrapy


class ItteachersItem(scrapy.Item):
    # define the fields for your item here like:
    # name = scrapy.Field()
    name = scrapy.Field()
    title = scrapy.Field()
    info = scrapy.Field()
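Running the command from the title, scrapy crawl itcast -o teachers.json, writes the returned items to teachers.json as a single JSON array of objects with the name, title, and info keys defined above. A minimal sketch of reading that file back with the standard library follows. The function names load_teachers and summarize are hypothetical, and the sample record in the demo is made up rather than real crawl output.

```python
import json


def load_teachers(path):
    """Load a JSON feed file: one JSON array of item objects."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def summarize(teachers):
    """Return one 'name (title)' line per item, skipping incomplete items."""
    lines = []
    for teacher in teachers:
        if "name" in teacher and "title" in teacher:
            lines.append("{} ({})".format(teacher["name"], teacher["title"]))
    return lines


if __name__ == "__main__":
    # Made-up record standing in for real crawl output
    sample = '[{"name": "Teacher A", "title": "Lecturer", "info": "..."}]'
    print(summarize(json.loads(sample)))
```

After a crawl, summarize(load_teachers("teachers.json")) would list the scraped teachers. Non-ASCII text may appear in the file as \uXXXX escapes depending on the project's export encoding setting; json.load decodes either form.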


Original source: https://www.cnblogs.com/hizf/p/8270008.html
