Scraping the names and prices of all iPhones on JD.com with Python

Date: 2015-04-30 08:55:26

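I had originally planned to level up and try creep or something similar, but my regular expression suddenly ran into a small problem today. That annoyed me, so I just did the scraping with plain regular expressions.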

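These patterns work with either re.search or re.findall. I prefer re.search, because the captured result can be pulled out directly with group(), with no further filtering.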
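As a small illustration of that difference, the sketch below runs both functions over a made-up string (the markup and prices are invented for the example, not taken from JD). re.findall returns the captured text of every match as a list, while re.search stops at the first match and returns a match object, or None when nothing matches, so the result should be checked before calling group().

```python
import re

# Hypothetical snippet shaped like the price markup the script targets.
sample = ('<strong class="J_1" data-price="4788.00">'
          '<strong class="J_2" data-price="5288.00">')
pattern = r'data-price="(.*?)"'

# findall returns a list holding the captured group of every match.
all_prices = re.findall(pattern, sample)

# search returns a match object for the first match only (or None).
first = re.search(pattern, sample)
first_price = first.group(1) if first else None

# A pattern that matches nothing: search yields None rather than ''.
missing = re.search(r'data-stock="(.*?)"', sample)

print(all_prices)
print(first_price)
print(missing)
```

Because a failed re.search yields None, comparing its result with an empty string never detects a miss. A truthiness or `is None` check is the safer test.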

The code is below. This little crawler is fairly simple.

# -*- coding: utf-8 -*-

import urllib2
import re

# JD search URL for the keyword "苹果手机" (Apple phone), first page of results
SearchIphoneUrl = 'http://search.jd.com/Search?keyword=%E8%8B%B9%E6%9E%9C%E6%89%8B%E6%9C%BA&enc=utf-8&qr=&qrst=UNEXPAND&as_key=title_key%2C%2C%E6%89%8B%E6%9C%BA&rt=1&stop=1&click=&psort=1&page=1'
# Request headers imitating a desktop browser
header = {'User-Agent':'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/31.0.1650.63 Safari/537.36 SE 2.X MetaSr 1.0','Accept':'*/*'}

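def getHtmlSrc(url,header):
    # Pass the headers by keyword: the second positional argument of Request is the POST data
    req = urllib2.Request(url,headers=header)
    # Open the Request object rather than the bare URL so the headers are actually sent
    res = urllib2.urlopen(req,timeout = 5)
    htmlSrc = res.read()
    return htmlSrc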

def saveHtmlSrc(url):
    html = getHtmlSrc(url,header)
    with open('jd_iphone.txt','w') as f:
        f.write(html)

saveHtmlSrc(SearchIphoneUrl)
print '++++++++++++++++++++JD free-range crawler++++++++++++++++++++'

with open('jd_iphone.txt','r') as fhtml:
    localhtml = fhtml.read()#.replace("'",'"').replace(' ','')
    for skuid in re.findall(r'<li sku="\d+" >',localhtml):
        # Product SKU
        sku = skuid.split('"')[1]
        # Phone name
        pname = re.search('''<font class="skcolor_ljg">苹果</font>(.*?)<font class="skcolor_ljg">手机</font>(.*?)<font class='adwords' id='AD_%s'></font>''' % sku,localhtml) # regex extracts the product-name HTML
        # Phone price
        price = re.search('''<strong class="J_%s" data-price="(.*?)">'''%sku,localhtml)
        # re.search returns None (not '') when nothing matches
        if pname is not None and price is not None:
            print "SKU: %s"%sku
            print "Name: %s\nPrice: %s\n\n"%(pname.group(1),price.group(1))

print '++++++++++++++++++++JD free-range crawler++++++++++++++++++++'
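
To try the extraction logic without requesting anything from JD, the same patterns can be run against a small hand-written HTML fragment. The following is only a minimal sketch, written for Python 3 rather than the Python 2 used in the script. The fragment, the SKU numbers, the names and prices, and the helper name `extract_items` are all invented for illustration, and the live page markup may well differ from what these patterns expect.

```python
# -*- coding: utf-8 -*-
import re

# Hand-written fragment imitating the markup the script's patterns expect.
# SKUs, names and prices are invented for this example.
SAMPLE_HTML = '''
<li sku="1001" >
<font class="skcolor_ljg">苹果</font> iPhone 6 16GB <font class="skcolor_ljg">手机</font> demo<font class='adwords' id='AD_1001'></font>
<strong class="J_1001" data-price="4788.00">
</li>
<li sku="1002" >
<strong class="J_1002" data-price="5288.00">
</li>
'''

NAME_TMPL = (r'<font class="skcolor_ljg">苹果</font>(.*?)'
             r'<font class="skcolor_ljg">手机</font>(.*?)'
             r"<font class='adwords' id='AD_%s'></font>")
PRICE_TMPL = r'<strong class="J_%s" data-price="(.*?)">'


def extract_items(html):
    """Return (sku, name, price) tuples for items where both patterns match."""
    items = []
    # Capturing the digits directly avoids splitting the matched tag afterwards.
    for sku in re.findall(r'<li sku="(\d+)" >', html):
        name = re.search(NAME_TMPL % sku, html)
        price = re.search(PRICE_TMPL % sku, html)
        # re.search returns None on failure, so skip incomplete items.
        if name is None or price is None:
            continue
        items.append((sku, name.group(1).strip(), price.group(1)))
    return items


for sku, name, price in extract_items(SAMPLE_HTML):
    print("SKU: %s  name: %s  price: %s" % (sku, name, price))
```

The second item in the fragment has a price but no name markup, so it is skipped instead of raising an AttributeError on group().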

Tags: crawler, python, unicode, JD, regex

Original article: http://blog.csdn.net/djd1234567/article/details/45379019
