Python一日一练05----怒刷点击量

时间：2015-04-02 15:16:57 阅读：139 评论：0 收藏：0 [点我收藏+]

标签：

功能

自动获取CSDN文章列表，并对每篇文章增加点击量.

源码

import urllib.request
import re
import time
import random
from bs4 import BeautifulSoup

p = re.compile('/a359680405/article/details/........')

#自己的博客主页
url = "http://blog.csdn.net/a359680405"

#使用build_opener()是为了让python程序模仿浏览器进行访问
opener = urllib.request.build_opener()
opener.addheaders = [('User-agent', 'Mozilla/5.0')]

html = opener.open(url).read().decode('utf-8')

allfinds = p.findall(html)
print(allfinds)

urlBase = "http://blog.csdn.net"#需要将网址合并的部分
#页面中的网址有重复的，需要使用set进行去重复
mypages = list(set(allfinds))
for i in range(len(mypages)):
    mypages[i] = urlBase+mypages[i]

print('要刷的网页有：')
for index , page in enumerate(mypages) :
    print(str(index), page)

#设置每个网页要刷的次数
brushMax = 200

#所有的页面都刷
print('下面开始刷了哦：')
for index , page in enumerate(mypages) :
    brushNum=random.randint(0,brushMax)
    for j in range(brushNum):
        try :
            pageContent = opener.open(page).read().decode('utf-8')
            #使用BeautifulSoup解析每篇博客的标题
            soup = BeautifulSoup(pageContent)
            blogTitle = str(soup.title.string)
            blogTitle = blogTitle[0:blogTitle.find('-')]
            print(str(j) , blogTitle) 
            
        except urllib.error.HTTPError:
            print('urllib.error.HTTPError')
            time.sleep(1)#出现错误，停几秒先
            
        except urllib.error.URLError:
            print('urllib.error.URLError')
            time.sleep(1)#出现错误，停几秒先
        time.sleep(0.1)#正常停顿，以免服务器拒绝访问

Python一日一练05----怒刷点击量

标签：

原文地址：http://blog.csdn.net/a359680405/article/details/44830847

踩

(0)

评论一句话评论（0）

分享档案

更多>

2021年07月29日 (22)
2021年07月28日 (40)
2021年07月27日 (32)
2021年07月26日 (79)
2021年07月23日 (29)
2021年07月22日 (30)
2021年07月21日 (42)
2021年07月20日 (16)
2021年07月19日 (90)
2021年07月16日 (35)

周排行