
Scraping Baidu Tieba

Time: 2018-06-08 19:31:47



import urllib.request
from urllib import parse

# Example list URL: https://tieba.baidu.com/f?kw=%E7%BE%8E%E5%A5%B3&pn=50


def writePage(filename, html):
    """
    Write the downloaded page to disk.

    :param filename: name of the output file
    :param html: raw response body, as bytes
    """
    with open(filename, "wb") as f:
        f.write(html)


def request(url, begPage, endPage, headers):
    """Download pages begPage..endPage (inclusive) and save each one."""
    for index in range(begPage, endPage + 1):
        # pn advances by 50 per page, so page 1 maps to pn=0.
        ua_url = url + "&pn=" + str((index - 1) * 50)
        print(ua_url)
        req = urllib.request.Request(ua_url, headers=headers)
        with urllib.request.urlopen(req) as response:
            html = response.read()
        filename = "第" + str(index) + "页.html"  # e.g. "第1页.html" (page 1)
        writePage(filename, html)


if __name__ == "__main__":
    url = "https://tieba.baidu.com/f?"
    key = input("Enter the keyword to search for >> ")
    begPage = int(input("Enter the first page number >> "))
    endPage = int(input("Enter the last page number >> "))
    headers = {"User-Agent": "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:60.0) Gecko/20100101 Firefox/60.0"}
    kw = parse.urlencode({"kw": key})
    fullurl = url + kw
    request(fullurl, begPage, endPage, headers)
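
The URL construction can be checked without touching the network. The following is a minimal sketch, assuming the pn offset advances by 50 per page as in the script; build_page_url and BASE_URL are hypothetical names introduced here for illustration and are not part of the original code. It lets parse.urlencode build both query parameters in one call instead of concatenating "&pn=" by hand.

```python
from urllib import parse

# Same base URL as in the script above.
BASE_URL = "https://tieba.baidu.com/f?"


def build_page_url(keyword, page, page_size=50):
    """Return the list URL for a 1-based page number (hypothetical helper)."""
    # urlencode percent-encodes the keyword as UTF-8 and joins pairs with "&".
    query = parse.urlencode({"kw": keyword, "pn": (page - 1) * page_size})
    return BASE_URL + query


if __name__ == "__main__":
    # Page 2 of the keyword from the sample URL in the script's comment.
    print(build_page_url("美女", 2))
```

For the keyword in the sample URL, page 2 yields the same address as the one in the script's comment, which is a quick way to confirm the page-to-offset arithmetic before downloading anything.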


Original source: https://www.cnblogs.com/angle90/p/9157065.html
