Tags: transfer, item, using, erro, debug, reference, define, fine, too many
The main goal in scraping is to extract structured data from unstructured sources, typically, web pages. Scrapy spiders can return the extracted data as Python dicts. While convenient and familiar, Python dicts lack structure: it is easy to make a typo in a field name or return inconsistent data, especially in a larger project with many spiders.
To define a common output data format, Scrapy provides the Item class. Item objects are simple containers used to collect the scraped data. They provide a dictionary-like API with a convenient syntax for declaring their available fields.
Various Scrapy components use extra information provided by Items: exporters look at declared fields to figure out columns to export, serialization can be customized using Item fields metadata, trackref tracks Item instances to help find memory leaks (see Debugging memory leaks with trackref), etc.
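Put simply: once a project has many spiders, plain dicts make it easy to mistype a key name, and the bad key then travels silently through the rest of the data flow. With Item, assigning to a field that was never declared raises a KeyError, so the programmer is alerted immediately and the problem is avoided.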
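The difference can be seen without installing Scrapy. The sketch below is only an illustration: `DeclaredItem`, `ProductItem`, `fill_dict` and `fill_item` are hypothetical names, and `DeclaredItem` is a simplified stand-in for `scrapy.Item`, not Scrapy's actual implementation. It imitates the one behaviour described above, namely that assigning to an undeclared field raises a KeyError, while a plain dict accepts the typo without complaint.

```python
class DeclaredItem:
    """Hypothetical stand-in for scrapy.Item (NOT Scrapy's real code):
    a dict-like container that only accepts keys listed in `fields`."""

    fields = ()  # subclasses declare their allowed field names here

    def __init__(self, **values):
        self._values = {}
        for key, value in values.items():
            self[key] = value  # reuse the check in __setitem__

    def __setitem__(self, key, value):
        # Reject any key that was not declared on the class.
        if key not in self.fields:
            raise KeyError(
                f"{type(self).__name__} does not support field: {key}"
            )
        self._values[key] = value

    def __getitem__(self, key):
        return self._values[key]


class ProductItem(DeclaredItem):
    fields = ("name", "price")


def fill_dict():
    """Plain dict: the misspelled key is stored silently."""
    record = {}
    record["name"] = "Desktop PC"
    record["pirce"] = 1000  # typo for "price", nothing complains
    return record


def fill_item():
    """Item-like container: the same typo fails at the assignment."""
    item = ProductItem(name="Desktop PC")
    item["pirce"] = 1000  # typo for "price", raises KeyError
    return item
```

In a real project the declaration would instead use Scrapy's own classes, for example `class Product(scrapy.Item)` with `name = scrapy.Field()`, and the spider would `yield` instances of that class rather than dicts.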
Why Scrapy uses yield item rather than yield dict to pass data
Original source: https://www.cnblogs.com/WalkOnMars/p/12168584.html