
Code I wrote when I first started learning; looking at it now stirs up a lot of feelings

Date: 2018-08-11 23:20:21

Tags: content   start   request   url   movies   except   mat   learning   string

from urllib import request
import json
import time
import re

# One listing page per URL; the start parameter advances by 10 per page.
urls = [
    "https://movie.douban.com/celebrity/1032800/movies?start=0&format=pic&sortby=time&",
    "https://movie.douban.com/celebrity/1032800/movies?start=10&format=pic&sortby=time&",
    "https://movie.douban.com/celebrity/1032800/movies?start=20&format=pic&sortby=time&",
    "https://movie.douban.com/celebrity/1032800/movies?start=30&format=pic&sortby=time&",
]

for url in urls:
    rsp = request.urlopen(url)
    print(type(rsp))

    data = rsp.read().decode()
    print(type(data))
    # data = json.loads(data)
    # time.sleep(1)
    # print(data)

    # Extract everything between <dt> and </div>.
    s = r'<dt>(.*?)</div>'

    # re.S makes "." match newlines as well, so a single match can span
    # several lines; without it, matching stops at the first line break.
    pattern = re.compile(s, re.S)

    dianying = pattern.findall(data)  # one HTML fragment per film
    # print(len(dianying))
    # print(dianying)
    # print("===" * 50)

    for dyname in dianying:
        # biaoti (title): the alt text of the image.
        ss = r'<img.*?alt="(.*?)"'
        pattern = re.compile(ss)
        biaoti = pattern.findall(dyname)
        print(biaoti)

        # fenshu (score): the text inside <span> elements.
        sss = r'<span>(.*?)</span>'
        pattern = re.compile(sss)
        fenshu = pattern.findall(dyname)
        print(fenshu)

        # The first <dd> is read as the director (daoyan),
        # the second, if present, as the cast (yanyuan).
        sss = r'<dd>(.*?)</dd>'
        pattern = re.compile(sss)
        daoyan = pattern.findall(dyname)[0]
        print("Director:", daoyan)
        try:
            yanyuan = pattern.findall(dyname)[1]
            print("Cast:", yanyuan)
        except IndexError:
            # Only one <dd> was found, so there is no cast entry.
            print("No cast listed")
        print("-----------------------------" * 8)
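The comment about re.S in the script is easier to see on a small input. Below is a minimal sketch that runs the same patterns against an inline string instead of the live site. The SAMPLE markup and the helper names extract_blocks and parse_block are made up for illustration; the real page layout may differ.

```python
import re

# Hypothetical sample markup, invented for this sketch (not copied from the site).
SAMPLE = """<dl>
<dt><img src="poster.jpg" alt="Example Film"></dt>
<dd>Director A</dd>
<dd>Actor B / Actor C</dd>
</div>"""

# Same outer pattern as in the script above.
BLOCK = r'<dt>(.*?)</div>'


def extract_blocks(html, flags=0):
    """Return every fragment found between <dt> and </div>."""
    return re.findall(BLOCK, html, flags)


def parse_block(block):
    """Pull the image alt text and the <dd> entries out of one fragment."""
    titles = re.findall(r'<img.*?alt="(.*?)"', block)
    people = re.findall(r'<dd>(.*?)</dd>', block)
    return titles, people


if __name__ == "__main__":
    # Without re.S, "." cannot cross a newline, so the multi-line block is missed.
    print(extract_blocks(SAMPLE))
    # With re.S, "." also matches newlines and the whole block is captured.
    for block in extract_blocks(SAMPLE, re.S):
        print(parse_block(block))
```

Testing the patterns on a fixed string like this avoids hitting the site repeatedly while the regular expressions are still being adjusted.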


Original article: https://www.cnblogs.com/cwkcwk/p/9461378.html
