no2. Batch-reading crossdomain.xml (work in progress)

Date: 2016-05-02 02:17:33

Known issue: the script runs into problems when it has to read too many URLs.

# coding=utf-8
# Batch-read crossdomain.xml for each site listed in xmlsource.txt and
# write the allowed domains to reslut.txt (Python 3).
import re
import time
import urllib.request

# Capture everything from "domain=" up to the closing "/>" of a tag.
DOMAIN_RE = re.compile(r'(?=domain=)(.*?)(?=/>)')


def getxml(url):
    """Fetch url/crossdomain.xml and return the raw domain=... matches."""
    # rstrip('/') avoids a double slash when the URL already ends with '/'.
    target = url.rstrip('/') + '/crossdomain.xml'
    # The timeout keeps one unresponsive site from stalling the whole run.
    with urllib.request.urlopen(target, timeout=10) as xml:
        xmlread = xml.read().decode('utf-8', 'replace')
    return DOMAIN_RE.findall(xmlread)


def report(text, out):
    """Print text to the console and to the result file."""
    print(text)
    print(text, file=out)


with open('xmlsource.txt', 'r') as f, open('reslut.txt', 'w', encoding='utf-8') as f1:
    for line in f:
        x = line.strip()
        if not x:
            continue
        try:
            # Fetch once per site and reuse the result.
            domains = getxml(x)
        except Exception as err:
            report('website:' + x + ' failed: ' + str(err), f1)
            continue
        report('website:' + x + ' have ' + str(len(domains)) + ' domain:', f1)
        for falresult in domains:
            falresult = falresult.replace('"', '').replace('domain=', '')
            report(falresult, f1)
        report('\n', f1)
        time.sleep(1)
    report('Over', f1)
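The regular expression in getxml keeps everything between "domain=" and "/>", so attributes such as secure="false" end up in the output too. As a rough alternative sketch (not the approach used in this post), the policy file could be parsed as XML with the standard library so that only the domain attribute is returned. The function name extract_domains and the SAMPLE document are made up for illustration.

```python
import xml.etree.ElementTree as ET


def extract_domains(xml_text):
    """Return the domain attribute of every <allow-access-from> element."""
    root = ET.fromstring(xml_text)
    # iter() walks the whole tree, so nesting depth does not matter.
    return [
        node.get("domain")
        for node in root.iter("allow-access-from")
        if node.get("domain") is not None
    ]


# Hypothetical policy file used only to demonstrate the function.
SAMPLE = """<?xml version="1.0"?>
<cross-domain-policy>
  <allow-access-from domain="*.example.com" />
  <allow-access-from domain="static.example.org" secure="false" />
</cross-domain-policy>"""

print(extract_domains(SAMPLE))
# prints ['*.example.com', 'static.example.org']
```

A malformed policy file makes ET.fromstring raise ET.ParseError, so a real run would probably want to catch that per site and move on.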

Contents of xmlsource.txt:

http://www.sina.com.cn/
http://www.discuz.net/
http://www.rising.com.cn/
http://www.ifeng.com//
http://www.sdo.com/
http://www.sogou.com/
http://www.163.com/
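
Entries in the list above end with one or two slashes, so appending "/crossdomain.xml" directly produces URLs with doubled slashes. A small sketch of how the list lines might be normalized before fetching, assuming one site per line; the helper name policy_urls is hypothetical and not part of the original script.

```python
def policy_urls(lines):
    """Turn raw list lines into unique crossdomain.xml URLs, keeping order."""
    seen = set()
    urls = []
    for line in lines:
        # Drop surrounding whitespace, then any trailing slashes.
        site = line.strip().rstrip("/")
        if not site:
            continue  # skip blank lines
        target = site + "/crossdomain.xml"
        if target not in seen:
            seen.add(target)
            urls.append(target)
    return urls


print(policy_urls(["http://www.ifeng.com//\n", "http://www.sdo.com/\n", "\n"]))
# prints ['http://www.ifeng.com/crossdomain.xml', 'http://www.sdo.com/crossdomain.xml']
```

Removing duplicates up front also means each site is requested only once, which may help when the list grows long.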

Original post: http://www.cnblogs.com/crac/p/5451639.html
