
Learning Python Web Scraping (2): Scraping Qiushibaike Jokes and Storing Them in a MySQL Database

Posted: 2016-08-12 20:02:39


The full script: it connects to MySQL with pymysql, fetches the Qiushibaike front page with requests, parses out each joke with BeautifulSoup, and stores every title/content pair in the pages table.

import pymysql
import requests
from bs4 import BeautifulSoup

# Connect to the database with pymysql and switch to the scraping schema.
conn = pymysql.connect(host='127.0.0.1', unix_socket='/tmp/mysql.sock',
                       user='root', passwd='19950311', db='mysql',
                       charset='utf8mb4')
cur = conn.cursor()
cur.execute("USE scraping")

def store(title, content):
    # Store one joke's title and content. pymysql quotes the parameters
    # itself, so the %s placeholders must not be wrapped in quotes.
    cur.execute("INSERT INTO pages (title, content) VALUES (%s, %s)",
                (title, content))
    cur.connection.commit()

class QiuShi(object):
    def __init__(self, start_url):
        self.url = start_url

    def crawl(self):
        # Fetch the page. Some sites reject the default requests
        # User-Agent, so send a browser-like one.
        try:
            html = requests.get(self.url,
                                headers={'User-Agent': 'Mozilla/5.0'})
            return html.content
        except requests.exceptions.ConnectionError:
            return ''

    def extract(self, htmlContent):
        if len(htmlContent) > 0:
            bsobj = BeautifulSoup(htmlContent, 'lxml')
            jokes = bsobj.findAll('div', {'class': 'article block untagged mb15'})
            for j in jokes:
                text = j.find('h2').text
                content = j.find('div', {'class': 'content'}).string
                if text is not None and content is not None:
                    # The database is UTF-8 encoded, so the strings can
                    # be stored as-is.
                    store(text, content)
                    print(text, content)
                    print('-' * 78)

    def main(self):
        text = self.crawl()
        self.extract(text)

try:
    qiushi = QiuShi('http://www.qiushibaike.com/')
    qiushi.main()
finally:
    # Close the cursor and the connection.
    cur.close()
    conn.close()
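The INSERT above assumes the scraping database and its pages table already exist; the post never shows that schema. A minimal one-time setup sketch follows, in which the column types, sizes, and the utf8mb4 character set are assumptions rather than anything from the original:

import pymysql

# One-time setup: create the database and table the scraper writes to.
# The column names come from the INSERT statement; the types are guesses.
conn = pymysql.connect(host='127.0.0.1', unix_socket='/tmp/mysql.sock',
                       user='root', passwd='19950311', charset='utf8mb4')
cur = conn.cursor()
cur.execute("CREATE DATABASE IF NOT EXISTS scraping "
            "DEFAULT CHARACTER SET utf8mb4")
cur.execute("USE scraping")
cur.execute("""
    CREATE TABLE IF NOT EXISTS pages (
        id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
        title VARCHAR(255),
        content TEXT,
        created TIMESTAMP DEFAULT CURRENT_TIMESTAMP
    )
""")
cur.close()
conn.close()

With the table in place, running the scraper prints each joke as it is stored, and the rows can be checked from the mysql client with SELECT title, content FROM pages LIMIT 5.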



Original post: http://www.cnblogs.com/yunwuzhan/p/5765963.html
