
URL Encoding and Decoding

Date: 2017-06-13 18:51:20


0. References

[Summary] Encoding and decoding URLs in HTTP (GET or POST) requests

How does urlopen in Python 3 handle URLs that contain Chinese characters?

The encoding problem of Chinese URLs

1. RFC 1738

2.1. The main parts of URLs

   A full BNF description of the URL syntax is given in Section 5.

   In general, URLs are written as follows:

       <scheme>:<scheme-specific-part>

   A URL contains the name of the scheme being used (<scheme>) followed
   by a colon and then a string (the <scheme-specific-part>) whose
   interpretation depends on the scheme.

   Scheme names consist of a sequence of characters. The lower case
   letters "a"--"z", digits, and the characters plus ("+"), period
   ("."), and hyphen ("-") are allowed. For resiliency, programs
   interpreting URLs should treat upper case letters as equivalent to
   lower case in scheme names (e.g., allow "HTTP" as well as "http").

Note that letters in scheme names are case-insensitive.
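
As a small illustration of this rule, the sketch below uses Python 3's `urllib.parse.urlsplit`, which returns the scheme in lower case. `SCHEME_RE`, `is_valid_scheme` and `normalized_scheme` are hypothetical names introduced here; the pattern follows the character set quoted above from RFC 1738 and accepts upper case for resiliency.

```python
import re
from urllib.parse import urlsplit

# Hypothetical helper: characters allowed in a scheme name per the RFC 1738
# text quoted above (letters, digits, "+", ".", "-"); upper case is accepted
# as equivalent to lower case.
SCHEME_RE = re.compile(r'[a-z0-9+.-]+', re.IGNORECASE)

def is_valid_scheme(scheme):
    return SCHEME_RE.fullmatch(scheme) is not None

def normalized_scheme(url):
    # urlsplit() returns the scheme in lower case.
    return urlsplit(url).scheme

print(normalized_scheme('HTTP://web.example/'))
print(is_valid_scheme('HTTP'), is_valid_scheme('ht tp'))
```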

2. Python 2

2.1

>>> import urllib
>>> url = 'http://web page.com'
>>> url_en = urllib.quote(url)    # a space is encoded as "%20"
>>> url_plus = urllib.quote_plus(url)    # a space is encoded as "+"
>>> url_en_twice = urllib.quote(url_en)
>>> url
'http://web page.com'
>>> url_en
'http%3A//web%20page.com'
>>> url_plus
'http%3A%2F%2Fweb+page.com'
>>> url_en_twice
'http%253A//web%2520page.com'    # "%25" in the result indicates double encoding
>>> # the corresponding decoding
>>> urllib.unquote(url_en)
'http://web page.com'
>>> urllib.unquote_plus(url_plus)
'http://web page.com'
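
The double-encoding effect shown above can be reproduced and undone in a loop. This is only a minimal sketch: `fully_unquote` is a hypothetical helper, not part of urllib, and the import fallback is there so the same code runs on both Python 2 and Python 3. It assumes the original text contains no literal percent escapes of its own, otherwise repeated decoding would go one step too far.

```python
try:
    from urllib import quote, unquote          # Python 2
except ImportError:
    from urllib.parse import quote, unquote    # Python 3

def fully_unquote(text, max_rounds=5):
    """Hypothetical helper: unquote repeatedly until the text stops changing."""
    for _ in range(max_rounds):
        decoded = unquote(text)
        if decoded == text:
            break
        text = decoded
    return text

url = 'http://web page.com'
once = quote(url)       # 'http%3A//web%20page.com'
twice = quote(once)     # every '%' from the first pass becomes '%25'

print(twice)
print(fully_unquote(twice) == url)
```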

2.2 URLs containing Chinese characters

>>> import urllib
>>> url_zh = u'http://movie.douban.com/tag/美国'
>>> url_zh_en = urllib.quote(url_zh.encode('utf-8'))    # the argument is a str (byte string)
>>> url_zh_en
'http%3A//movie.douban.com/tag/%E7%BE%8E%E5%9B%BD'
>>> print urllib.unquote(url_zh_en).decode('utf-8')
http://movie.douban.com/tag/美国

3. Python 3

3.1

>>> import urllib.parse    # "import urllib" alone does not guarantee that urllib.parse is loaded
>>> url = 'http://web page.com'
>>> url_en = urllib.parse.quote(url)    # note: this is urllib.parse.quote
>>> url_plus = urllib.parse.quote_plus(url)
>>> url_en
'http%3A//web%20page.com'
>>> url_plus
'http%3A%2F%2Fweb+page.com'
>>> urllib.parse.unquote(url_en)
'http://web page.com'
>>> urllib.parse.unquote_plus(url_plus)
'http://web page.com'
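
The difference between the two outputs above (`//` kept by `quote`, `%2F%2F` produced by `quote_plus`) comes from the `safe` parameter: `quote` defaults to `safe='/'`, while `quote_plus` defaults to `safe=''`. The sketch below shows how `safe` changes the result; the particular `safe` values are chosen only for illustration.

```python
from urllib.parse import quote, quote_plus

url = 'http://web page.com'

default_quote = quote(url)             # safe='/' by default: "/" is kept
strict_quote = quote(url, safe='')     # no extra safe characters: "/" becomes %2F
keep_delims = quote(url, safe=':/')    # keep ":" and "/" so the URL stays usable
plus_form = quote_plus(url)            # safe='' by default, a space becomes "+"

for value in (default_quote, strict_quote, keep_delims, plus_form):
    print(value)
```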

3.2 URLs containing Chinese characters

>>> import urllib.parse
>>> url_zh = 'http://movie.douban.com/tag/美国'
>>> url_zh_en = urllib.parse.quote(url_zh)
>>> url_zh_en
'http%3A//movie.douban.com/tag/%E7%BE%8E%E5%9B%BD'
>>> urllib.parse.unquote(url_zh_en)
'http://movie.douban.com/tag/美国'
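
Quoting the whole URL also escapes the colon after the scheme (`http%3A//...`), so the result cannot be handed to an HTTP client as is. One possible approach, sketched below, is to quote only the path component. `quote_path` is a hypothetical helper name; it does not touch the host name or the query string. The `encoding` argument of `quote` defaults to UTF-8, and GBK is used here only to show that a different encoding yields different escapes.

```python
from urllib.parse import quote, unquote, urlsplit, urlunsplit

def quote_path(url, encoding='utf-8'):
    """Hypothetical helper: percent-encode only the path of a URL."""
    parts = urlsplit(url)
    encoded_path = quote(parts.path, encoding=encoding)
    return urlunsplit(parts._replace(path=encoded_path))

url_zh = 'http://movie.douban.com/tag/美国'

utf8_url = quote_path(url_zh)            # UTF-8 escapes, 3 bytes per character here
gbk_url = quote_path(url_zh, 'gbk')      # GBK escapes, 2 bytes per character here

print(utf8_url)
print(unquote(utf8_url))
print(unquote(gbk_url, encoding='gbk'))
```

When decoding, `unquote` has to be given the same encoding that was used for quoting; otherwise the bytes are decoded as UTF-8 and the text comes back garbled.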

4. Miscellaneous

>>> help(urllib.urlencode)
Help on function urlencode in module urllib:

urlencode(query, doseq=0)
    Encode a sequence of two-element tuples or dictionary into a URL query string.

    If any values in the query arg are sequences and doseq is true, each
    sequence element is converted to a separate parameter.

    If the query arg is a sequence of two-element tuples, the order of the
    parameters in the output will match the order of parameters in the
    input.
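
To illustrate the `doseq` behaviour described in the help text, here is a short sketch that uses the Python 3 location of the function, `urllib.parse.urlencode`, together with `parse_qs` for the reverse direction. The parameter names `q` and `tag` and their values are made up for the example.

```python
from urllib.parse import urlencode, parse_qs

# A sequence of two-element tuples keeps its order in the output.
params = [('q', 'web page'), ('tag', ['python', 'url'])]

flat = urlencode(params)                  # the list is encoded as one value via str()
expanded = urlencode(params, doseq=True)  # each list element becomes its own parameter

print(flat)
print(expanded)
print(parse_qs(expanded))
```

Note that `urlencode` encodes spaces as "+", the same form that `quote_plus` produces.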

 


Original article: http://www.cnblogs.com/my8100/p/7002876.html
