import scrapy
from goose import Goose


class Article(scrapy.Item):
    # Fields filled in from Goose's extraction result.
    title = scrapy.Field()
    text = scrapy.Field()


class MyGooseSpider(scrapy.Spider):
    name = 'goose'
    start_urls = [
        'http://blog.scrapinghub.com/2014/06/18/extracting-schema-org-microdata-using-scrapy-selectors-and-xpath/',
        'http://blog.scrapinghub.com/2014/07/17/xpath-tips-from-the-web-scraping-trenches/',
    ]

    def parse(self, response):
        # Scrapy has already downloaded the page, so hand the HTML to Goose
        # through raw_html rather than having Goose fetch the URL again.
        article = Goose().extract(raw_html=response.body)
        yield Article(title=article.title, text=article.cleaned_text)
Source: http://stackoverflow.com/questions/26940002/can-i-use-scrapy-with-goose
Original article: http://www.cnblogs.com/bushe/p/4757981.html