Simple Python web crawler

Posted: 2018-08-13 23:53:14

from urllib import request, parse
from urllib.error import HTTPError, URLError


def get(url, headers=None):
    """Send a GET request and return the response body as bytes."""
    return urlrequest(url, headers=headers)


def post(url, form, headers=None):
    """Send a POST request with form data and return the response body as bytes."""
    return urlrequest(url, form, headers=headers)


def urlrequest(url, form=None, headers=None):
    user_agent = ('Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 '
                  '(KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36')
    # By default, send a browser User-Agent so the server treats us like a normal browser
    if headers is None:
        headers = {
            'User-Agent': user_agent
        }
    html_bytes = b''
    try:
        if form:
            # POST
            # Convert the form dict to a query string
            form_str = parse.urlencode(form)
            # Convert the string to bytes (the request body must be bytes)
            form_bytes = form_str.encode('utf-8')
            req = request.Request(url, data=form_bytes, headers=headers)
        else:
            # GET
            # Build the Request object
            req = request.Request(url, headers=headers)
        # Send the request (5-second timeout) and read the response body
        response = request.urlopen(req, timeout=5)
        html_bytes = response.read()
    except HTTPError as e:
        print(e)
    except URLError as e:
        print(e)
    return html_bytes


if __name__ == '__main__':
    # POST example
    # url = 'http://fanyi.baidu.com/sug'
    # form = {
    #     'kw': '鹰'
    # }
    # html_bytes = post(url, form=form)
    # print(html_bytes)

    # GET example
    url = 'http://www.baidu.com'
    html_bytes = get(url)
    print(html_bytes.decode('utf-8'))
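A possible refinement, sketched under assumptions: the crawler above builds, sends, and decodes the request in a single function, and it always decodes the response as UTF-8. The hypothetical helpers `build_request` and `decode_body` below separate out two steps. The first builds the request, so the GET/POST logic can be checked without touching the network. The second takes the charset from the response's `Content-Type` header, falling back to UTF-8. The helper names and the fallback policy are illustrative choices, not part of the original post.

```python
from http.client import HTTPMessage
from urllib import parse, request

# Same User-Agent string as the crawler above (hypothetical constant name)
DEFAULT_HEADERS = {
    'User-Agent': ('Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 '
                   '(KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36'),
}


def build_request(url, form=None, headers=None):
    """Build (but do not send) a urllib Request, mirroring urlrequest().

    With a form dict the request gets a URL-encoded body and becomes a POST;
    without one it is a plain GET.
    """
    headers = headers if headers is not None else DEFAULT_HEADERS
    data = parse.urlencode(form).encode('utf-8') if form else None
    return request.Request(url, data=data, headers=headers)


def decode_body(body, headers, default='utf-8'):
    """Decode response bytes using the charset declared in Content-Type, if any."""
    charset = headers.get_content_charset() or default
    try:
        return body.decode(charset, errors='replace')
    except LookupError:
        # The header named a charset Python does not know: use the default
        return body.decode(default, errors='replace')


if __name__ == '__main__':
    get_req = build_request('http://www.baidu.com')
    post_req = build_request('http://fanyi.baidu.com/sug', form={'kw': '鹰'})
    print(get_req.get_method(), get_req.data)    # GET None
    print(post_req.get_method(), post_req.data)  # POST b'kw=%E9%B9%B0'
```

With a live response, the decoder would be called as `decode_body(response.read(), response.headers)`, because the object returned by `request.urlopen` exposes its headers as an `HTTPMessage`.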

Original article: https://www.cnblogs.com/lxh777/p/9471646.html
