Tags: writing, storage, platform, EDA, nat, a database, allow, subject, sts
Around mid-May this year, Douban stopped allowing access to its public API, including the following endpoints:
https://api.douban.com/v2/movie/in_theaters
https://api.douban.com/v2/movie/top250
https://api.douban.com/v2/movie/coming_soon
https://api.douban.com/v2/movie/search
https://api.douban.com/v2/movie/subject/:id
Even switching the domain from https://api.douban.com to https://douban.uieee.com no longer works.
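Searching online, I learned that adding an apikey made the API usable again. A few days later, however, the search endpoint stopped working, so that workaround was dead too.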
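Adding the key simply means sending it as an `apikey` query parameter on each request. A minimal sketch of building such URLs with the standard library; `build_url` is a hypothetical helper, `YOUR_KEY` is a placeholder, and since Douban has shut these endpoints down, this only illustrates the request format:

```python
from urllib.parse import urlencode

# Base path taken from the endpoint list above
BASE = "https://api.douban.com/v2/movie"


def build_url(endpoint, apikey, **params):
    """Hypothetical helper: append the given query parameters plus `apikey` to an endpoint URL."""
    query = urlencode({**params, "apikey": apikey})
    return "%s/%s?%s" % (BASE, endpoint, query)


# Movie details for one subject ID
print(build_url("subject/26709256", "YOUR_KEY"))
# → https://api.douban.com/v2/movie/subject/26709256?apikey=YOUR_KEY

# List endpoints take start/count; search takes a keyword q (喜剧 = "comedy"), percent-encoded as UTF-8
print(build_url("top250", "YOUR_KEY", start=0, count=50))
print(build_url("search", "YOUR_KEY", q="喜剧"))
```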
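The project I was working on needed search, and I could not find a workaround. Eventually I learned that Douban had officially disabled the API.
So I decided to build my own movie database. A database needs data first, so step one is to scrape movie data from the web. This post shares a simple, easy-to-follow snippet of the scraper I wrote.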
```python
import json
from time import sleep  # built-in; pause between requests so the IP is not banned for requesting too often

import requests


# Append one record to the JSON file
def jsonFile(fileData):
    with open("D:\\python_pycharm\\movies.json", "ab") as file:
        file.write(fileData)


# url = 'http://api.douban.com/v2/movie/celebrity/'
# url = 'http://api.douban.com/v2/movie/top250'
# url = 'https://api.douban.com/v2/movie/search'
count = 344  # running count of saved movies
# 10554898
for start in range(26709256, 35576316, 1):
    url = 'https://api.douban.com/v2/movie/subject'
    # r = requests.get(url, params={'start': start, 'count': 50,
    #                               'apikey': '0b2bdeda43b5688921839c8ecb20399b'})
    # List endpoints take `start` and `count`; for search, also pass e.g. 'q': '喜剧' (comedy)
    # Fetch the details of the movie whose subject ID is `start`
    url = url + '/' + str(start)
    r = requests.get(url,
                     params={'apikey': '0b2bdeda43b5688921839c8ecb20399b'})
    print('processing %s' % r.url)  # print the URL being fetched
    print('Movies fetched so far: %d' % count)
    print('r.status_code is %d' % r.status_code)
    print("\n")
    sleep(0.3)  # throttle every request, including the ones skipped below
    if r.status_code != 200:
        continue
    res = r.json()  # r is a Response object; res is a dict holding the response's JSON data
    # print(res['casts'][0]['id'])
    # print(res['directors'])
    # print(res['directors'][0]['id'])
    # print(res['rating']['average'])
    # print(res['title'])
    # print(res['popular_comments'])
    # print(res['popular_reviews'])
    year = str(res.get('year', ''))  # may be empty, so check it before calling int()
    if res['directors'] != [] and res['casts'] != [] and year.isdigit():
        if (res['casts'][0]['id'] is not None
                and res['directors'][0]['id'] is not None
                and res['rating']['average'] != 0 and res['title'] != ''
                and res['popular_comments'] != []
                and res['popular_reviews'] != []
                and 2010 <= int(year) <= 2019):
            print(year)
            jsObj = json.dumps(res)
            # Write one JSON object per line (JSON Lines) so records can be read back one by one
            jsonFile(bytes(jsObj + '\n', encoding='utf-8'))
            print('Saved to local file')
            count = count + 1
```
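The nested condition in the loop decides which records are kept. Pulled out into a standalone function (the name `is_complete` is hypothetical), the same checks can be tested offline against hand-made dictionaries without calling the API. Like the loop, it skips records whose year is not a plain number instead of raising `ValueError`:

```python
def is_complete(res, first_year=2010, last_year=2019):
    """Return True if a movie record passes the same checks as the scraper loop."""
    directors = res.get('directors') or []
    casts = res.get('casts') or []
    year = str(res.get('year', ''))
    if not directors or not casts or not year.isdigit():
        return False
    rating = res.get('rating') or {}
    return (casts[0].get('id') is not None
            and directors[0].get('id') is not None
            and rating.get('average', 0) != 0
            and res.get('title', '') != ''
            and bool(res.get('popular_comments'))   # non-empty list
            and bool(res.get('popular_reviews'))    # non-empty list
            and first_year <= int(year) <= last_year)


sample = {
    'directors': [{'id': '1'}], 'casts': [{'id': '2'}],
    'rating': {'average': 7.5}, 'title': 'Example',
    'popular_comments': [{}], 'popular_reviews': [{}], 'year': '2015',
}
print(is_complete(sample))                    # True
print(is_complete({**sample, 'year': '2008'}))  # False: outside 2010-2019
```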
The results are as follows.
The scraped data is stored as JSON and can then be imported into whichever data platform you plan to use.
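As one possible import target, here is a sketch that loads the file into SQLite using Python's built-in `sqlite3` module. It assumes the file holds one JSON object per line (JSON Lines) and that every record has `id`, `title`, `year` and `rating.average` fields; the table layout and the name `load_movies` are hypothetical:

```python
import json
import sqlite3


def load_movies(jsonl_path, conn):
    """Load a JSON Lines file of movie records into a `movies` table; return the number of rows written."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS movies ("
        "id TEXT PRIMARY KEY, title TEXT, year INTEGER, rating REAL, raw TEXT)"
    )
    written = 0
    with open(jsonl_path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            res = json.loads(line)
            # Keep a few columns for querying and the full record in `raw`
            conn.execute(
                "INSERT OR REPLACE INTO movies VALUES (?, ?, ?, ?, ?)",
                (res["id"], res["title"], int(res["year"]),
                 res["rating"]["average"], line),
            )
            written += 1
    conn.commit()
    return written
```

`INSERT OR REPLACE` on the primary key means re-running the import after another scraping session updates existing movies instead of duplicating them.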
An excerpt of the saved movie data (these two entries come from a movie's popular_comments list):
```json
{
    "rating": {
        "max": 5,
        "value": 5.0,
        "min": 0
    },
    "useful_count": 20,
    "author": {
        "uid": "kletva",
        "avatar": "https://img1.doubanio.com/icon/u38323388-8.jpg",
        "signature": "\u6709\u5931\u5fc5\u6709\u5f97",
        "alt": "https://www.douban.com/people/kletva/",
        "id": "38323388",
        "name": "\u8475\u4ed4\uff01"
    },
    "subject_id": "30335636",
    "content": "\u6bcf\u5468\u4e00GO\u7ec8\u4e8e\u53c8\u5f00\u59cb\u5566\uff01\uff01\uff01\u8fd8\u662f\u719f\u6089\u7684\u914d\u65b9\u719f\u6089\u7684\u5473\u9053\u719f\u6089\u7684\u4e09\u8f69\u5bb6\u719f\u6089\u7684\u56e7\u5b50\uff01\uff01\uff01\u70ed\u8840\u6cb8\u817e\uff01\u4e0d\u6789\u8110\u5e26(\u2606\u25bd\u2606) \u53e6\u5916\u8fd8\u6dfb\u52a0\u4e86\u5f88\u591a\u548c\u8bfe\u957f\u7684\u751f\u6d3b\u516b\u5366\uff0c\u548c\u7b2c\u4e00\u5b63\u76f8\u6bd4\u66f4\u6709\u8da3\u4e86\uff01\uff01",
    "created_at": "2019-01-10 09:30:52",
    "id": "1488502571"
},
{
    "rating": {
        "max": 5,
        "value": 4.0,
        "min": 0
    },
    "useful_count": 70,
    "author": {
        "uid": "myshobeat",
        "avatar": "https://img3.doubanio.com/icon/u55024488-2.jpg",
        "signature": "\u5e78\u4f1a",
        "alt": "https://www.douban.com/people/myshobeat/",
        "id": "55024488",
        "name": "Surprise"
    },
    "subject_id": "30335636",
    "content": "\u672c\u5267\u6210\u529f\u8be0\u91ca\u4e86\u201c\u4e70\u4e0e\u4e0d\u4e70\u6709\u65f6\u5c31\u5728\u4e00\u5ff5\u4e4b\u95f4\u201d\u4ee5\u53ca\u201c\u7cbe\u795e\u5bfc\u5e08\u548c\u4f20\u9500\u5934\u76ee\u7684\u4e00\u7ebf\u4e4b\u9694\u201d\uff0c\u76f4\u64ad\u786c\u5e7f\u592a\u591a\u6263\u4e00\u5206\uff0c\u597d\u5728\u76ee\u524d\u5267\u60c5\u8db3\u591f\u5438\u5f15\u4eba\uff0cPS\uff1a\u770b\u5230\u67d0\u6761\u5f3a\u884c\u5b89\u63d2\u7684\u611f\u60c5\u7ebf\u6211\u5c31\u77e5\u9053\u7f16\u5267\u540e\u9762\u80af\u5b9a\u8981\u641e\u4e8b\u4e86\u2190\u7ad9\u5b9a\u6b64CP\u4e0d\u52a8\u6447w",
    "created_at": "2019-01-09 23:00:58",
    "id": "1620646383"
}
```
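The \uXXXX sequences above are not corruption. `json.dumps` escapes every non-ASCII character by default (`ensure_ascii=True`), and the scraper calls it with default arguments. A short sketch of the difference: passing `ensure_ascii=False` keeps the Chinese text readable in the file, and since the scraper encodes to UTF-8 bytes before writing, both forms load back to the same data:

```python
import json

name = "\u8475\u4ed4"  # the first author's name from the sample, without the trailing full-width "!"

escaped = json.dumps({"name": name})                       # default: ensure_ascii=True
readable = json.dumps({"name": name}, ensure_ascii=False)  # keeps the characters as-is

print(escaped)   # {"name": "\u8475\u4ed4"}
print(readable)  # same object, written with the Chinese characters themselves

# Both serializations decode to the identical Python object
assert json.loads(escaped) == json.loads(readable) == {"name": name}
```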
Original article: https://blog.51cto.com/15069472/2577213