Kindle: A Calibre2 Recipe Script for Automatically Following Serial Updates

Date: 2017-05-27 14:24:24


#!/usr/bin/env python2
# vim:fileencoding=utf-8
from __future__ import unicode_literals, division, absolute_import, print_function
import os

from calibre.web.feeds.news import BasicNewsRecipe


class rdzs(BasicNewsRecipe):
    title = '儒道至圣'
    description = '这是一个读书人掌握天地之力的世界。  才气在身，诗可杀敌，词能灭军，文章安天下。  秀才提笔，纸上谈兵；举人杀敌，出口成章；进士一怒，唇枪舌剑。  圣人驾临，口诛笔伐，可诛人，可判天子无道，以一敌国。  此时，圣院把持文位，国君掌官位，十国相争，蛮族虎视，群妖作乱。  此时，无唐诗大兴，无宋词鼎盛，无创新文章，百年无新圣。  一个默默无闻的寒门子弟，被人砸破头后，挟传世诗词，书惊圣文章，踏上至圣之路。'
    max_articles_per_feed = 20000
    # Text file that stores the href of the last chapter seen on the previous run.
    fileName = 'xx/rdzs.txt'
    cover_url = 'http://www.50zw.la/files/article/image/2/2806/2806s.jpg'
    url_prefix = 'http://www.xxbiquge.com'
    no_stylesheets = True
    # Keep only the chapter title and the chapter body of each page.
    keep_only_tags = [dict(name='div', attrs={'class': 'bookname'}),
                      dict(name='div', attrs={'id': 'content'})]

    # Read the saved href; an empty value (or a missing file) means a first run.
    lastHref = ''
    if os.path.exists(fileName):
        with open(fileName, 'r') as file_object:
            lastHref = file_object.read().strip()
    # True while chapters fetched by an earlier run are still being skipped.
    hasLoad = bool(lastHref)

    def get_title(self, link):
        return link.contents[0].strip()

    def parse_index(self):
        soup = self.index_to_soup(self.url_prefix + '/5_5690')

        div = soup.find('div', {'id': 'list'})
        lastHref = self.lastHref
        articles = []
        for link in div.findAll('a'):
            til = self.get_title(link)
            href = link['href']
            self.lastHref = href
            # Stop skipping once the saved chapter is reached; that chapter is
            # included again, so the feed is not empty when nothing new exists.
            if href == lastHref:
                self.hasLoad = False
            if self.hasLoad:
                continue
            articles.append({'title': til, 'url': self.url_prefix + href})

        tutorial = [(self.title, articles)]
        # Save the href of the newest chapter for the next run.
        with open(self.fileName, 'w') as file_write:
            file_write.write(self.lastHref)
        return tutorial

Notes:

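fileName: a plain text file that records the URL of the last chapter fetched, so that a run picks up from there instead of downloading every chapter again.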
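
The bookkeeping around fileName can be tried outside calibre. The following is a minimal sketch, not the recipe itself: the helper names (load_last_href, save_last_href, select_new_chapters) and the sample hrefs are hypothetical, and treating a missing file as a first run is an assumption. The selection rule follows the loop in parse_index: everything before the saved href is skipped, and the saved chapter itself is kept.

```python
import os


def load_last_href(path):
    """Return the href saved by the previous run, or '' if no file exists (assumption)."""
    if not os.path.exists(path):
        return ''
    with open(path, 'r') as f:
        return f.read().strip()


def save_last_href(path, href):
    """Remember the newest chapter href for the next run."""
    with open(path, 'w') as f:
        f.write(href)


def select_new_chapters(hrefs, last_href):
    """Skip every chapter before last_href, mirroring the loop in parse_index.

    The chapter at last_href is kept. With no saved href (first run) every
    chapter is returned; if last_href is not in the list, nothing is.
    """
    if not last_href:
        return list(hrefs)
    if last_href not in hrefs:
        return []
    return hrefs[hrefs.index(last_href):]


if __name__ == '__main__':
    # Hypothetical chapter links, in the order they appear on the index page.
    chapters = ['/5_5690/1.html', '/5_5690/2.html', '/5_5690/3.html']
    print(select_new_chapters(chapters, '/5_5690/2.html'))
```

If the saved href has disappeared from the index page, this rule selects nothing, so it may be worth deleting the text file to force a full download in that case.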

For a detailed walkthrough of the code above, see: http://abirdcfly.github.io/2016/03/07/calibre2mobi/

Original article: http://www.cnblogs.com/loveclumsybaby/p/6912542.html
