A Python crawler that scrapes the Chinaz (站长之家) IP database. It is single-threaded and meant purely as practice: the IPv4 space holds about 4.3 billion addresses, so crawling it this way would take years at the very least. This is rough beginner code, so please bear with me.
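To see why "years" is no exaggeration, here is a rough back-of-envelope estimate. The requests-per-second figure is an assumption, not a measurement:

# Back-of-envelope: how long would a full IPv4 sweep take?
# Assumes roughly 10 requests/second sustained, which is already
# optimistic for a single-threaded crawler doing one HTTP request per IP.
total_ips = 2 ** 32                        # 4,294,967,296 IPv4 addresses
reqs_per_sec = 10                          # assumed throughput
seconds = total_ips / float(reqs_per_sec)
years = seconds / (3600 * 24 * 365)
print years                                # about 13.6 years

With that caveat out of the way, the script: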
#!/usr/bin/python
#coding=UTF-8
import urllib2
import re
import csv
import codecs

def gethtml(url):
    # Fetch the page, sending a browser-like User-Agent so the
    # request looks less like a bot.
    user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
    headers = {'User-Agent': user_agent}
    request = urllib2.Request(url, headers=headers)
    response = urllib2.urlopen(request)
    html = response.read()
    return html

def getdate(html):
    # Pull the location fields out of the result page; the pattern
    # targets <span class="Whwtdhalf w50-0">...</span> blocks.
    r = r'<span class="Whwtdhalf w50\-0">(.*?)</span>\s*</p>'
    rs = re.compile(r, re.S)
    iplace = rs.findall(html)
    return iplace

def getdataid(html):
    # Pull the numeric form of the IP out of
    # <span class="Whwtdhalf w15-0">...</span> blocks.
    r = r'<span class="Whwtdhalf w15\-0">(\d+?)</span>'
    rs = re.compile(r)
    dataid = rs.findall(html)
    return dataid

def geturl(ip):
    # Chinaz lookup page for a dotted-quad IP address.
    url = 'http://ip.chinaz.com/%s' % ip
    return url

count = 1
with open('ipku.csv', 'a+') as csvfile:
    # Write a UTF-8 BOM first so Excel renders the Chinese headers correctly.
    csvfile.write(codecs.BOM_UTF8)
    spamwriter = csv.writer(csvfile, dialect='excel')
    # Headers: numeric address, IP address, server location.
    spamwriter.writerow(['数字地址', 'IP地址', '服务器地址'])
    # Walk the address space one IP at a time (starting from 211.0.10.0,
    # per the original ranges).
    for k in xrange(211, 256):
        for v in xrange(0, 256):
            for m in xrange(10, 256):
                for n in xrange(0, 256):
                    ip = '%d.%d.%d.%d' % (k, v, m, n)
                    url = geturl(ip)
                    html = gethtml(url)
                    dataid = getdataid(html)
                    iplace = getdate(html)
                    for h in dataid:
                        print count
                        count += 1
                        # iplace[1] is the second matched span, which
                        # holds the location on the lookup page.
                        spamwriter.writerow([h, ip, iplace[1]])

Printing count has no special meaning; it is only there so you can see that the process is still running.
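Since the single-threaded approach is the bottleneck, one natural next step is to parallelize the per-IP fetches. Below is a minimal sketch, not part of the original script: it assumes the gethtml/getdataid/getdate/geturl functions above are in scope, fetch_one is a hypothetical helper introduced here for illustration, and multiprocessing.dummy.Pool (the Python 2 standard-library thread pool) is used because the work is I/O-bound:

from multiprocessing.dummy import Pool  # thread pool; threads suit I/O-bound work

def fetch_one(ip):
    # Hypothetical helper: fetch and parse one IP's page, reusing the
    # functions defined in the script above.
    try:
        html = gethtml(geturl(ip))
        return ip, getdataid(html), getdate(html)
    except Exception:
        return ip, [], []  # skip unreachable pages instead of crashing

ips = ['211.0.10.%d' % n for n in xrange(0, 256)]  # one /24 block as a demo batch
pool = Pool(16)  # 16 worker threads
for ip, dataid, iplace in pool.map(fetch_one, ips):
    for h in dataid:
        print h, ip, iplace[1] if len(iplace) > 1 else ''
pool.close()
pool.join()

Even with 16 threads this only divides the earlier time estimate by a constant factor; covering the whole IPv4 space would still take months, so treat this purely as the practice exercise the author says it is.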
Original post: http://gosweet.blog.51cto.com/11759495/1905737