Tags: res join mis first extract pat color contain bsp
import scrapy

# ImgsproItem comes from the project's items.py (import not shown in the original post).

class ImgSpider(scrapy.Spider):
    name = 'img'
    # allowed_domains = ['https://sc.chinaz.com/tupian/']
    start_urls = ['https://sc.chinaz.com/tupian/']

    def parse(self, response):
        res_list = response.xpath('//div[@id="container"]/div')
        for i in res_list:
            # src2 holds the image address; it may not be a complete URL
            res_item = i.xpath('./div/a/img/@src2').extract_first()
            item = ImgsproItem()
            item['src'] = res_item
            yield item
Running the spider produces the following error:
raise ValueError(f'Missing scheme in request url: {self._url}')
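Cause analysis:
The address stored in res_item is not a complete URL: it lacks the scheme and domain that a request needs. To make sure the request address is correct, resolve it against the page URL with urljoin().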
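As a rough illustration of what the join does, the sketch below uses `urllib.parse.urljoin` from the standard library, whose resolution rules Scrapy's `response.urljoin()` is understood to follow with the response URL as the base. The image hosts and paths are hypothetical and only stand in for whatever `@src2` actually contains.

```python
from urllib.parse import urljoin

# The page URL is the spider's start URL; the src values below are
# hypothetical examples, not addresses taken from the real page.
page_url = "https://sc.chinaz.com/tupian/"

# Protocol-relative address: only the scheme is missing.
protocol_relative = urljoin(page_url, "//img.example.com/pic/a.jpg")

# Root-relative address: scheme and domain are both missing.
root_relative = urljoin(page_url, "/static/b.jpg")

# Path-relative address: resolved against the page's directory.
path_relative = urljoin(page_url, "c.jpg")

# An address that is already absolute is returned unchanged.
already_absolute = urljoin(page_url, "http://img.example.com/d.jpg")

for url in (protocol_relative, root_relative, path_relative, already_absolute):
    print(url)
```

In every case the result carries a scheme, which is what the `Missing scheme in request url` check requires.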
See https://stackoverflow.com/questions/42026244/scrapy-valueerrormissing-scheme-in-request-url-s-self-url
After the following change, the code runs correctly:
def parse(self, response):
    res_list = response.xpath('//div[@id="container"]/div')
    for i in res_list:
        res_item = i.xpath('./div/a/img/@src2').extract_first()
        item = ImgsproItem()
        # item['src'] = res_item
        item['src'] = response.urljoin(res_item)
        yield item
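One edge case may be worth guarding against: `extract_first()` returns `None` when the XPath matches nothing, and `urllib.parse.urljoin` returns the base URL when given an empty value, so an item could end up with the page address as its `src`. The helper below is a minimal sketch of such a guard using only the standard library; `absolute_srcs` and the sample addresses are hypothetical names chosen for illustration, not part of the original spider.

```python
from urllib.parse import urljoin


def absolute_srcs(page_url, raw_srcs):
    """Join each extracted src against page_url, skipping missing values."""
    result = []
    for src in raw_srcs:
        if not src:
            # None (no XPath match) or an empty string: nothing to download
            continue
        result.append(urljoin(page_url, src.strip()))
    return result


# Hypothetical extracted values: one usable address, one miss, one empty string.
urls = absolute_srcs(
    "https://sc.chinaz.com/tupian/",
    ["//img.example.com/a.jpg", None, ""],
)
print(urls)
```

Inside the spider, the equivalent check would be a simple `if res_item:` before building and yielding the item.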
Python Scrapy error: raise ValueError(f'Missing scheme in request url:
Original source: https://www.cnblogs.com/codingsea/p/14823473.html