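Without further ado, let's look at how CSDN counts reads: every visit to a page is recorded as one read, so all we need is a small crawler that requests the page. Let's get started.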
    import requests
    import time
    from lxml import etree
    import random

    # Links to all of the blog's posts, filled in by post_article()
    article_url = []

    def post_article():
        '''Replace the URL below with your own; collects the links to all of your posts.'''
        response = requests.get(url='me_url', headers=getHeaders())
        text = response.content.decode('utf-8')
        html = etree.HTML(text)
        urls = html.xpath('//h4/a/@href')
        for url in urls:
            article_url.append(url)

    def access_url():
        '''Visit one URL, picked at random from your own posts.'''
        try:
            url = random.choice(article_url)
            response = requests.get(url, headers=getHeaders())
            time.sleep(2)
        except Exception as e:
            print(e)
With the code above, your blog's read count will shoot up. Sigh, it almost brings me to tears that it takes something like this.
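Next, a few things to watch out for: set the headers, add a sleep interval, and so on. If requests arrive too frequently, the server will refuse to increase your read count. You ok? (pardon the broken English)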
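The sleep interval mentioned above can be pulled out into a small helper. This is only a minimal sketch: the name `polite_pause`, its parameters, and the idea of adding a random extra on top of the fixed delay are assumptions for illustration, not part of the original script, which simply calls `time.sleep(2)`.

```python
import random
import time


def polite_pause(base=2.0, jitter=1.0, sleep=time.sleep):
    """Pause for `base` seconds plus a random extra of up to `jitter` seconds.

    `sleep` is injectable so the helper can be exercised without waiting.
    Returns the delay that was used, so the caller can log it.
    """
    delay = base + random.uniform(0, jitter)
    sleep(delay)
    return delay
```

Calling `polite_pause()` after each request keeps the pacing in one place, and calling it outside the `try` block means a failed request still waits before the next attempt.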
Add to that the headers we set:
    def getHeaders():
        # Pool of browser User-Agent strings; one is picked at random per request
        user_agent_list = [
            "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/22.0.1207.1 Safari/537.1",
            "Mozilla/5.0 (X11; CrOS i686 2268.111.0) AppleWebKit/536.11 (KHTML, like Gecko) Chrome/20.0.1132.57 Safari/536.11",
            "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/536.6 (KHTML, like Gecko) Chrome/20.0.1092.0 Safari/536.6",
            "Mozilla/5.0 (Windows NT 6.2) AppleWebKit/536.6 (KHTML, like Gecko) Chrome/20.0.1090.0 Safari/536.6",
            "Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/19.77.34.5 Safari/537.1",
            "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/536.5 (KHTML, like Gecko) Chrome/19.0.1084.9 Safari/536.5",
            "Mozilla/5.0 (Windows NT 6.0) AppleWebKit/536.5 (KHTML, like Gecko) Chrome/19.0.1084.36 Safari/536.5",
            "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/536.3 (KHTML, like Gecko) Chrome/19.0.1063.0 Safari/536.3",
            "Mozilla/5.0 (Windows NT 5.1) AppleWebKit/536.3 (KHTML, like Gecko) Chrome/19.0.1063.0 Safari/536.3",
            "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_0) AppleWebKit/536.3 (KHTML, like Gecko) Chrome/19.0.1063.0 Safari/536.3",
            "Mozilla/5.0 (Windows NT 6.2) AppleWebKit/536.3 (KHTML, like Gecko) Chrome/19.0.1062.0 Safari/536.3",
            "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/536.3 (KHTML, like Gecko) Chrome/19.0.1062.0 Safari/536.3",
            "Mozilla/5.0 (Windows NT 6.2) AppleWebKit/536.3 (KHTML, like Gecko) Chrome/19.0.1061.1 Safari/536.3",
            "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/536.3 (KHTML, like Gecko) Chrome/19.0.1061.1 Safari/536.3",
            "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/536.3 (KHTML, like Gecko) Chrome/19.0.1061.1 Safari/536.3",
            "Mozilla/5.0 (Windows NT 6.2) AppleWebKit/536.3 (KHTML, like Gecko) Chrome/19.0.1061.0 Safari/536.3",
            "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/535.24 (KHTML, like Gecko) Chrome/19.0.1055.1 Safari/535.24",
            "Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/535.24 (KHTML, like Gecko) Chrome/19.0.1055.1 Safari/535.24"
        ]
        UserAgent = random.choice(user_agent_list)
        headers = {'User-Agent': UserAgent}
        return headers
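One detail worth checking in any long list of string literals like this one: Python joins adjacent string literals into a single string, so a missing comma between two entries silently merges them instead of raising an error. A small sketch with made-up placeholder values (`agent-a` and so on are not real User-Agent strings):

```python
# Adjacent string literals are concatenated, so the first list below
# ends up with two entries instead of three.
missing_comma = [
    "agent-a"
    "agent-b",
    "agent-c",
]

with_comma = [
    "agent-a",
    "agent-b",
    "agent-c",
]

print(len(missing_comma))  # 2
print(missing_comma[0])    # agent-aagent-b
print(len(with_comma))     # 3
```

A merged entry would be sent as one overly long User-Agent header whenever `random.choice` happens to pick it.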
The main program block:
    if __name__ == '__main__':
        index = 0
        post_article()
        print('Got this far...')
        while True:
            access_url()
            print(index)
            index += 1
            # An arbitrary number of iterations; pick your own
            if index == 100000:
                break
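The counting loop above can also be written with `range`, which removes the manual counter and the `break`. A minimal sketch, where `run_visits` and its `visit` parameter are hypothetical names (in the script above, `visit` would be `access_url`):

```python
def run_visits(visit, total):
    """Call visit() exactly `total` times, printing the running index."""
    for index in range(total):
        visit()
        print(index)
    return total
```

For example, `run_visits(access_url, 100)` would make 100 visits and print the indexes 0 through 99.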
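And that's the little crawler. Don't overuse it; it is only meant for learning the technique, and any disputes that arise have nothing to do with me (trembling).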
Original article: https://www.cnblogs.com/xbhog/p/11745793.html