Scraping Web Pages to Generate a Kindle E-book

Date: 2015-04-27 23:37:45

References:

http://calibre-ebook.com/download_linux
http://blog.codinglabs.org/articles/convert-html-to-kindle-book.html

The Linux Command Line

# TLCL.recipe
from calibre.web.feeds.recipes import BasicNewsRecipe


class The_Linux_Command_Line(BasicNewsRecipe):

    title = 'The Linux Command Line'
    description = 'The Linux Command Line'
    cover_url = 'http://img5.douban.com/lpic/s7056078.jpg'

    url_pre = 'http://billie66.github.io/TLCL/book/'
    no_stylesheets = True
    keep_only_tags = [{'class': 'typo'}]  # keep only the content area of each page

    def parse_index(self):
        soup = self.index_to_soup(self.url_pre)  # table-of-contents page

        div = soup.find('div', {'class': 'contents'})  # part of the page that holds the chapter links

        articles = []
        for link in div.findAll('a'):
            til = link.contents[0].strip()
            url = self.url_pre + link['href']
            a = {'title': til, 'url': url}

            articles.append(a)

        # One section named after the book, containing every chapter.
        results = [('The Linux Command Line', articles)]

        return results
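
The recipe's parse_index returns a list of (section title, article list) tuples, where each article is a dict with 'title' and 'url' keys. The sketch below reproduces that shape outside calibre, using only the Python standard library in place of calibre's index_to_soup. The names TocLinkParser and build_index, the sample HTML, and the chapter file names are all hypothetical and serve only as illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class TocLinkParser(HTMLParser):
    """Collect (title, href) pairs for <a> tags inside <div class="contents">."""

    def __init__(self):
        super().__init__()
        self.div_depth = 0        # > 0 while inside the contents div
        self.current_href = None  # href of the <a> currently being read
        self.current_text = []
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            if self.div_depth > 0:
                self.div_depth += 1  # nested div inside the contents div
            elif "contents" in (attrs.get("class") or "").split():
                self.div_depth = 1
        elif tag == "a" and self.div_depth > 0 and attrs.get("href"):
            self.current_href = attrs["href"]
            self.current_text = []

    def handle_data(self, data):
        if self.current_href is not None:
            self.current_text.append(data)

    def handle_endtag(self, tag):
        if tag == "div" and self.div_depth > 0:
            self.div_depth -= 1
        elif tag == "a" and self.current_href is not None:
            title = "".join(self.current_text).strip()
            self.links.append((title, self.current_href))
            self.current_href = None


def build_index(html, base_url, section_title):
    """Return the same structure as parse_index: [(section, [article, ...])]."""
    parser = TocLinkParser()
    parser.feed(html)
    articles = [
        {"title": title, "url": urljoin(base_url, href)}
        for title, href in parser.links
    ]
    return [(section_title, articles)]


# Hypothetical table-of-contents markup; the real page may differ.
SAMPLE_TOC = """
<div class="nav"><a href="index.html">Home</a></div>
<div class="contents">
  <ul>
    <li><a href="chap01.html"> Introduction </a></li>
    <li><a href="chap02.html">What Is the Shell?</a></li>
  </ul>
</div>
"""

index = build_index(
    SAMPLE_TOC,
    "http://billie66.github.io/TLCL/book/",
    "The Linux Command Line",
)
for section, articles in index:
    print(section)
    for article in articles:
        print("  ", article["title"], article["url"])
```

Only links inside the contents div are collected, which mirrors the soup.find('div', {'class': 'contents'}) step in the recipe. With calibre installed, the recipe file itself can be converted from the command line, for example with ebook-convert TLCL.recipe TLCL.mobi; the output file name there is only an example.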

Original article: http://www.cnblogs.com/flowjacky/p/4461595.html
