Tags: inter geturl log csharp gen col arp integer ati
urlparse() splits a URL into its components and returns a ParseResult:
>>> from urllib.parse import urlparse
>>> o = urlparse('http://www.cwi.nl:80/%7Eguido/Python.html')
>>> o
ParseResult(scheme='http', netloc='www.cwi.nl:80', path='/%7Eguido/Python.html', params='', query='', fragment='')
>>> o.scheme
'http'
>>> o.port
80
>>> o.geturl()
'http://www.cwi.nl:80/%7Eguido/Python.html'
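Because ParseResult is a named tuple, one component can be swapped with `_replace()` and the URL rebuilt with `geturl()` or `urlunparse()`. A minimal sketch; the helper name `with_https` is hypothetical and not part of the library:

```python
from urllib.parse import urlparse, urlunparse


def with_https(url: str) -> str:
    """Hypothetical helper: return url with its scheme switched to https."""
    parts = urlparse(url)
    # ParseResult is a namedtuple, so _replace returns a modified copy
    return parts._replace(scheme="https").geturl()


o = urlparse("http://www.cwi.nl:80/%7Eguido/Python.html")
print(with_https("http://www.cwi.nl:80/%7Eguido/Python.html"))
# urlunparse accepts the ParseResult itself (any 6-item iterable works)
print(urlunparse(o))
```

`geturl()` is simply `urlunparse()` applied to the tuple, so either call reassembles the same string.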
urlparse() recognizes a netloc only when it is introduced by '//'. Otherwise the input is treated as a relative URL that starts with a path component:
>>> from urllib.parse import urlparse
>>> urlparse('//www.cwi.nl:80/%7Eguido/Python.html')
ParseResult(scheme='', netloc='www.cwi.nl:80', path='/%7Eguido/Python.html', params='', query='', fragment='')
>>> urlparse('www.cwi.nl/%7Eguido/Python.html')
ParseResult(scheme='', netloc='', path='www.cwi.nl/%7Eguido/Python.html', params='', query='', fragment='')
>>> urlparse('help/Python.html')
ParseResult(scheme='', netloc='', path='help/Python.html', params='', query='', fragment='')
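When a crawler receives scheme-less strings such as 'www.cwi.nl/...', one workaround is to prepend '//' so the host lands in netloc. This is only a sketch under an assumption: every scheme-less input starts with a host name and carries no port (for 'help/Python.html' it would wrongly treat 'help' as the host). The helper name `parse_host_first` is hypothetical:

```python
from urllib.parse import urlparse


def parse_host_first(url: str):
    """Hypothetical helper: parse a URL that may lack a scheme.

    Assumes any scheme-less input begins with a host name and has no port.
    """
    parsed = urlparse(url)
    if not parsed.scheme and not url.startswith("//"):
        # Prefix '//' so urlparse treats the leading segment as the netloc
        parsed = urlparse("//" + url)
    return parsed


p = parse_host_first("www.cwi.nl/%7Eguido/Python.html")
print(p.netloc, p.path)
```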
The fields of the returned ParseResult can be read by attribute or, for the first six, by index:

| Attribute | Index | Value | Value if not present |
|---|---|---|---|
| scheme | 0 | URL scheme specifier | scheme parameter |
| netloc | 1 | Network location part | empty string |
| path | 2 | Hierarchical path | empty string |
| params | 3 | Parameters for last path element | empty string |
| query | 4 | Query component | empty string |
| fragment | 5 | Fragment identifier | empty string |
| username | | User name | None |
| password | | Password | None |
| hostname | | Host name (lower case) | None |
| port | | Port number as integer, if present | None |
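A short sketch exercising every row of the table on one URL. The URL is made up for illustration; it packs in credentials, an upper-case host, a port, params, a query and a fragment:

```python
from urllib.parse import urlparse

# Made-up URL that populates every field in the table
u = urlparse("http://user:pw@WWW.Example.COM:8080/a/b;type=x?q=1#top")

print(u[0], u.scheme)        # index 0 and .scheme are the same field
print(u.netloc)              # network location, exactly as written
print(u.path, u.params)      # params are split off the last path element
print(u.query, u.fragment)
print(u.username, u.password)
print(u.hostname, u.port)    # hostname is lower-cased; port is an int

# Fields absent from the URL fall back to the "not present" column
v = urlparse("help/Python.html")
print(v.netloc == "", v.hostname, v.port)
```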
>>> from urllib.parse import urljoin
>>> urljoin('http://www.cwi.nl/%7Eguido/Python.html', 'FAQ.html')
'http://www.cwi.nl/%7Eguido/FAQ.html'
>>> urljoin('http://www.cwi.nl/%7Eguido/Python.html',
...         '//www.python.org/%7Eguido')
'http://www.python.org/%7Eguido'
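In a crawler, urljoin() is the usual way to turn href values found on a page into absolute URLs. A few more cases against the same base URL as the example above; the relative paths are made up for illustration:

```python
from urllib.parse import urljoin

base = "http://www.cwi.nl/%7Eguido/Python.html"

# A bare name replaces the last path segment of the base
print(urljoin(base, "FAQ.html"))
# '..' climbs one directory above the base's directory
print(urljoin(base, "../index.html"))
# A leading '/' keeps scheme and host but replaces the whole path
print(urljoin(base, "/about.html"))
# A full URL with its own scheme ignores the base entirely
print(urljoin(base, "https://docs.python.org/3/"))
```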
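quote() percent-encodes a string (by default '/' is left as-is) and unquote() reverses it. Both are documented in urllib.parse:
>>> import urllib.parse
>>> urllib.parse.quote('http://www.baidu.com')
'http%3A//www.baidu.com'
>>> urllib.parse.unquote('http%3A//www.baidu.com')
'http://www.baidu.com'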
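A sketch of the knobs that matter when building crawler URLs: the `safe` parameter, non-ASCII (here Chinese) keywords, and the form-style `quote_plus()`. The sample strings are illustrative only:

```python
from urllib.parse import quote, unquote, quote_plus, unquote_plus

url = "http://www.baidu.com"
# Default safe='/' leaves slashes alone, so only ':' is encoded
print(quote(url))
# safe='' encodes '/' too, useful when a URL is itself a query value
print(quote(url, safe=""))

# Non-ASCII text is UTF-8 encoded first, then percent-escaped
word = "爬虫"
encoded = quote(word)
print(encoded, unquote(encoded) == word)

# quote_plus turns spaces into '+', as HTML form submissions do
print(quote_plus("a b&c"), unquote_plus("a+b%26c"))
```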
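A simple demo
The approach: build a Request that carries a browser User-Agent header, fetch the page with urlopen(), and write the raw response bytes to a local file: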
import urllib.request

url = 'http://www.baidu.com'
# Send a browser-like User-Agent so the server returns the normal page
header = {
    'User-Agent': 'Mozilla/5.0 (X11; Fedora; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36'
}
request = urllib.request.Request(url, headers=header)
# read() returns the response body as bytes
response = urllib.request.urlopen(request).read()
# Open in binary mode ("wb") because response is bytes, not str
with open("./1.html", "wb") as h:
    h.write(response)
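The demo above can be reorganized so that urlparse() chooses the output file name, with a timeout and context managers added. This is a sketch, not the author's code: `local_filename`, `build_request` and `save_page` are hypothetical names, and the "host_lastsegment" naming scheme is an arbitrary choice:

```python
import os
import urllib.request
from urllib.parse import urlparse

UA = ("Mozilla/5.0 (X11; Fedora; Linux x86_64) AppleWebKit/537.36 "
      "(KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36")


def local_filename(url: str) -> str:
    """Hypothetical helper: name the file '<hostname>_<last path segment>'."""
    parts = urlparse(url)
    # An empty path (or one ending in '/') falls back to index.html
    name = os.path.basename(parts.path) or "index.html"
    return f"{parts.hostname}_{name}"


def build_request(url: str) -> urllib.request.Request:
    # Same idea as the demo: attach a browser User-Agent header
    return urllib.request.Request(url, headers={"User-Agent": UA})


def save_page(url: str, timeout: float = 10.0) -> str:
    """Fetch url and write the raw bytes to local_filename(url)."""
    path = local_filename(url)
    # 'with' closes the HTTP response; timeout avoids hanging forever
    with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
        data = resp.read()
    with open(path, "wb") as f:
        f.write(data)
    return path
```

Calling `save_page('http://www.baidu.com')` needs network access and would write `www.baidu.com_index.html`. Note that Request stores header names via `str.capitalize()`, so the header is read back with `get_header('User-agent')`.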
References:
https://docs.python.org/3/library/urllib.parse.html?highlight=urlparse#urllib.parse.urlparse
https://blog.csdn.net/fengxinlinux/article/details/77281253
https://www.runoob.com/python/python-func-open.html
Python Basics: Using urlparse for Web Scraping, with a Simple Example (Part 1)
Original post: https://www.cnblogs.com/guanbin-529/p/12833766.html