
Scrapy - grab English names

Date: 2017-02-14 19:00:22


Workflow: locate with XPath, scrape and verify, store in the database, use.

 

 

from scrapy import Spider
from scrapy.crawler import CrawlerProcess


class EnglishName(Spider):
    name = 'EnglishName'
    # One start URL per (initial letter, page number) pair, pages 1 to 29.
    start_urls = ['http://babynames.net/all/starts-with/%(first)s?page=%(page)i' % {'first': first,
                                                                                      'page': page}
                  for first in 'abcdefghijklmnopqrstuvwxyz'
                  for page in range(1, 30)]

    def parse(self, response):
        # Each <li> in the results list is one name entry.
        for wname in response.xpath(".//ul[@class='names-results listing-view']/li"):
            grab_url = response.url
            print(grab_url)
            wboy = wname.xpath("a/span[@class='result-gender boy']")
            wgirl = wname.xpath("a/span[@class='result-gender girl']")
            wres = wname.xpath("a/span[@class='result-name']/text()").extract()
            # isboy is 1 when the entry has a "boy" gender marker, otherwise 0.
            isboy = 1 if wboy else 0
            for w in wres:
                print(isboy)
                print(w)


if __name__ == '__main__':
    # Run the spider from a plain script, with a polite delay between requests.
    process = CrawlerProcess({'DOWNLOAD_DELAY': 2,
                              'CONCURRENT_REQUESTS_PER_DOMAIN': 6,
                              'USER_AGENT': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2950.5 Safari/537.36'
                              })
    process.crawl(EnglishName)
    process.start()
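
The "scrape and verify" step can be tried offline before running the crawl. The following is only a minimal sketch: it swaps Scrapy's selectors for the standard library's xml.etree.ElementTree (which supports the attribute predicates used here), and the SAMPLE markup is a hypothetical snippet whose structure is inferred from the XPath expressions in the spider, not copied from the real site. The helper name extract_names is also an assumption.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample of the list the spider targets.
SAMPLE = """
<html><body>
<ul class="names-results listing-view">
  <li><a href="/names/aaron"><span class="result-name">Aaron</span><span class="result-gender boy"></span></a></li>
  <li><a href="/names/abby"><span class="result-name">Abby</span><span class="result-gender girl"></span></a></li>
</ul>
</body></html>
"""


def extract_names(markup):
    """Return (name, isboy) pairs using the same XPath expressions as the spider."""
    root = ET.fromstring(markup.strip())
    rows = []
    for li in root.findall(".//ul[@class='names-results listing-view']/li"):
        name = li.findtext("a/span[@class='result-name']")
        boy = li.find("a/span[@class='result-gender boy']")
        isboy = 1 if boy is not None else 0
        if name:
            rows.append((name.strip(), isboy))
    return rows


for name, isboy in extract_names(SAMPLE):
    print(isboy, name)
```

ElementTree needs well-formed markup, so this only checks the XPath logic on a hand-written sample; real pages should still be checked with Scrapy itself.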

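The spider above only prints its results; the "store in the database" step is not shown in the original. One possible way to do it, as a sketch under stated assumptions, is to write the (name, isboy) pairs to SQLite with the standard library. The table name english_name, the column names, and the helper save_names are all hypothetical.

```python
import sqlite3


def save_names(conn, rows):
    """Insert (name, isboy) pairs, silently skipping pairs already stored."""
    # Hypothetical schema: a name may appear once per gender flag.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS english_name ("
        "name TEXT NOT NULL, isboy INTEGER NOT NULL, "
        "PRIMARY KEY (name, isboy))"
    )
    conn.executemany(
        "INSERT OR IGNORE INTO english_name (name, isboy) VALUES (?, ?)",
        rows,
    )
    conn.commit()


# In-memory database for demonstration; pass a file path to persist the data.
conn = sqlite3.connect(":memory:")
save_names(conn, [("Aaron", 1), ("Abby", 0), ("Aaron", 1)])
stored = conn.execute(
    "SELECT name, isboy FROM english_name ORDER BY name"
).fetchall()
print(stored)
```

In a real crawl, the parse method would collect or yield these pairs instead of printing them, and the duplicate check keeps repeated runs from inserting the same name twice.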
 


Original article: http://www.cnblogs.com/yuanjiangw/p/6398707.html
