码迷,mamicode.com

Python mini-script: fetching WooYun vendor sites

Date: 2016-04-12 21:00:57

# encoding=utf-8
import re
import requests


class getUrl(object):

    def __init__(self, num):
        self.totle = num  # number of listing pages to crawl
        # Request headers
        self.myheader = {
            'Host': 'www.wooyun.org',
            'Connection': 'keep-alive',
            'User-Agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.80 Safari/537.36',
            'Accept': '*/*',
            'Referer': 'http://www.wooyun.com/',
            'Accept-Encoding': 'gzip, deflate, sdch',
            'Accept-Language': 'zh-CN,zh;q=0.8',
        }

    def beginer(self):
        print('get start')
        page = 1
        urlliset = []
        while page <= self.totle:
            url = 'http://www.wooyun.org/corps/page/' + str(page)
            r = requests.get(url, headers=self.myheader)
            # Capture everything after the scheme in each absolute link
            site = re.findall('href="http://(.*?)"', r.text)
            # site = re.findall('(!www.)(.*?)', r.text)  # disabled: it overwrote the http matches with tuples that f.write() cannot handle
            site2 = re.findall('href="https://(.*?)"', r.text)
            page += 1
            for elem in site:
                urlliset.append(elem)
            for elem in site2:
                urlliset.append(elem)
        self.writeQQ(text=urlliset, file_dir='site.text', mode='w')

    def writeQQ(self, text, file_dir, mode):
        # Write one entry per line
        with open(file_dir, mode) as f:
            for site in text:
                f.write(site)
                f.write("\n")


spidre = getUrl(44)
spidre.beginer()
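
The script above keeps everything that follows `http://` or `https://` in each `href`, so paths and repeated entries end up in the output file. As a minimal sketch (not part of the original post), the two patterns could be merged into one and the results de-duplicated. `SAMPLE_HTML`, `HREF_PATTERN`, and `extract_hosts` are hypothetical names, and the sample markup is made up for illustration rather than taken from the site.

```python
import re

# Hypothetical sample markup standing in for one listing page.
SAMPLE_HTML = '''
<a href="http://www.example.com">Example</a>
<a href="https://shop.example.org/index">Shop</a>
<a href="http://www.example.com">Example again</a>
<a href="/corps/page/2">next</a>
'''

# Match absolute http/https links and capture only the host part,
# stopping at the first slash or closing quote.
HREF_PATTERN = re.compile(r'href="https?://([^"/]+)')


def extract_hosts(html):
    """Return unique hostnames from absolute links, in first-seen order."""
    seen = set()
    hosts = []
    for host in HREF_PATTERN.findall(html):
        if host not in seen:
            seen.add(host)
            hosts.append(host)
    return hosts


print(extract_hosts(SAMPLE_HTML))
```

Relative links such as `/corps/page/2` are skipped because the pattern requires a scheme. Whether hostnames alone are enough depends on what the output file is used for.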

 

Original post: http://www.cnblogs.com/zxcx/p/5384350.html
