
Getting Started with the requests Library: Small Crawlers

Date: 2017-09-05 16:48:42


General code framework:

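import requests

def getHTMLText(url):
    try:
        r = requests.get(url, timeout=30)
        r.raise_for_status()              # raise HTTPError for 4xx/5xx status codes
        r.encoding = r.apparent_encoding  # use the encoding guessed from the body
        return r.text
    except requests.RequestException:
        return "An exception occurred"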

 

Time taken to fetch a web page 100 times:

import requests
import time

def getHTMLText(url):
    try:
        r = requests.get(url, timeout=30)
        r.raise_for_status()
        r.encoding = r.apparent_encoding
        return r.text
    except requests.RequestException:
        return "An exception occurred"

if __name__ == "__main__":
    url = "http://www.baidu.com"
    a = time.time()   # start time
    for i in range(100):
        getHTMLText(url)
    b = time.time()   # end time
    print("Fetching the page 100 times took %d seconds" % (b - a))
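
The timing loop above can be pulled into a small reusable helper. This is only a sketch: `time_calls` is a hypothetical name, not part of requests, and it takes any zero-argument callable so it can be tried without network access. It uses `time.perf_counter()`, which is meant for measuring intervals.

```python
import time

def time_calls(func, repeat=100):
    """Call func() `repeat` times and return the total elapsed seconds."""
    start = time.perf_counter()  # monotonic clock suited to interval timing
    for _ in range(repeat):
        func()
    return time.perf_counter() - start

if __name__ == "__main__":
    # Stand-in workload; pass lambda: getHTMLText(url) to time real requests.
    elapsed = time_calls(lambda: sum(range(1000)), repeat=100)
    print("100 calls took %.3f seconds" % elapsed)
```

Formatting with `%.3f` instead of `%d` keeps the fractional seconds, which matters when the whole run finishes in under a second.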

Fetching a JD.com product page:

import requests

url = "https://item.jd.com/5369026.html"
try:
    r = requests.get(url)
    r.raise_for_status()
    r.encoding = r.apparent_encoding
    print(r.text[:1000])   # first 1000 characters of the page
except requests.RequestException:
    print("Fetch failed")

Fetching a page that restricts crawlers:

import requests

url = "http://yzb.tju.edu.cn/xwzx/tkbs_xw/201609/t20160914_285521.htm"
try:
    kv = {"user-agent": "Mozilla/5.0"}   # present the request as coming from a browser
    r = requests.get(url, headers=kv)
    r.raise_for_status()
    r.encoding = r.apparent_encoding
    print(r.text[1000:2000])
except requests.RequestException:
    print("Fetch failed")

Baidu keyword search:

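import requests

keyword = "Python"
try:
    kv = {"wd": keyword}   # Baidu takes the search term in the wd parameter
    r = requests.get("http://www.baidu.com/s", params=kv)
    print(r.request.url)   # the URL that was actually requested
    r.raise_for_status()
    print(len(r.text))
except requests.RequestException:
    print("Fetch failed")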
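
To see what the `params=kv` argument contributes to the request URL, the query string can be built by hand with the standard library. This is a rough sketch, not how requests is implemented: `build_search_url` is a hypothetical helper, and for unusual characters the exact encoding requests produces may differ.

```python
from urllib.parse import urlencode

def build_search_url(base, params):
    """Append params to base as a URL-encoded query string."""
    return base + "?" + urlencode(params)

if __name__ == "__main__":
    # The same key/value dictionaries used in the search examples.
    print(build_search_url("http://www.baidu.com/s", {"wd": "Python"}))
    print(build_search_url("http://www.so.com/s", {"q": "Python"}))
```

For the keyword `Python` this gives `http://www.baidu.com/s?wd=Python`, the same shape of URL that `r.request.url` prints.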

360 keyword search, full code:

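import requests

keyword = "Python"
try:
    kv = {"q": keyword}   # 360 search takes the search term in the q parameter
    r = requests.get("http://www.so.com/s", params=kv)
    print(r.request.url)
    r.raise_for_status()
    print(len(r.text))
except requests.RequestException:
    print("Fetch failed")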

Image download:

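import requests
import os

url = "http://image.nationalgeographic.com.cn/2017/0905/20170905114825283.jpg"
root = "E://pics//"
path = root + url.split("/")[-1]   # save under the file name taken from the URL
try:
    if not os.path.exists(root):
        os.mkdir(root)
    if not os.path.exists(path):
        r = requests.get(url)
        r.raise_for_status()       # do not save an error page as an image
        with open(path, "wb") as f:
            f.write(r.content)     # the with block closes the file automatically
        print("File saved")
    else:
        print("File already exists")
except (requests.RequestException, OSError):
    print("Fetch failed")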
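
The file name above comes from `url.split("/")[-1]`, which also picks up any query string that follows the name. A slightly more defensive variant, sketched here with the standard library only, parses the URL first. `filename_from_url` is a hypothetical helper name introduced for illustration.

```python
import posixpath
from urllib.parse import urlsplit

def filename_from_url(url):
    """Return the last path segment of url, ignoring any query string or fragment."""
    return posixpath.basename(urlsplit(url).path)

if __name__ == "__main__":
    url = "http://image.nationalgeographic.com.cn/2017/0905/20170905114825283.jpg"
    print(filename_from_url(url))
```

The result can be joined with the target directory in the same way as `root + ...` in the example above.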

IP address lookup:

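import requests

url = "http://m.ip138.com/ip.asp?ip="
try:
    r = requests.get(url + "202.204.80.112")   # append the IP address to look up
    r.raise_for_status()
    r.encoding = r.apparent_encoding
    print(r.text[-500:])   # the last 500 characters of the page
except requests.RequestException:
    print("Fetch failed")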
 


Original article: http://www.cnblogs.com/jiangyaju/p/7479100.html
