The Requests module
1. The headers keyword argument
import requests
from urllib.parse import urlencode

# Read the search keyword and percent-encode it into a query string.
keyword = input('>>:').strip()
res = urlencode({'wd': keyword}, encoding='utf-8')
url = 'https://www.baidu.com/s?' + res
print(url)

# Send a browser-like User-Agent along with the request.
response = requests.get(
    url,
    headers={
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.79 Safari/537.36',
    },
)
print(response.status_code)

# Save the returned page to a local file.
with open('a.html', 'w', encoding='utf-8') as f:
    f.write(response.text)
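To see what urlencode actually produces, here is a small sketch that needs no network access. The keyword 中文 is only an illustrative value, not something from the original notes.

```python
from urllib.parse import urlencode

# Percent-encode a non-ASCII keyword as UTF-8 bytes (illustrative keyword).
query = urlencode({'wd': '中文'}, encoding='utf-8')
url = 'https://www.baidu.com/s?' + query
print(url)
```

Each Chinese character becomes three %XX groups because it takes three bytes in UTF-8.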
2. Using the params argument to do the same search (a cleaner version): requests builds and encodes the query string itself.
import requests

keyword = input('>>:').strip()

# params is encoded and appended to the URL by requests, so urlencode is not needed.
response = requests.get(
    'https://www.baidu.com/s',
    params={
        'wd': keyword,
        'pn': 20,
    },
    headers={
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.79 Safari/537.36',
    },
)
print(response.status_code)

# Save the returned page to a local file.
with open('a.html', 'w', encoding='utf-8') as f:
    f.write(response.text)
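One way to check the URL that params produces, without sending anything, is to prepare the request and inspect it. This is a minimal sketch; the keyword python is an assumed sample value.

```python
import requests

# Build the request without sending it, so no network access is needed.
prepared = requests.Request(
    'GET',
    'https://www.baidu.com/s',
    params={'wd': 'python', 'pn': 20},  # sample keyword, for illustration only
).prepare()
print(prepared.url)
```

The printed URL carries both parameters in the query string, which is the same result as concatenating the urlencode output by hand.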
3. User-Agent in headers: this site only returns its content when the request carries a User-Agent header.
import requests

response = requests.get(
    'https://www.zhihu.com/explore',
    headers={
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36',
    },
)
print(response.status_code)
print(response.text)
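A likely reason the header matters is that, when none is given, requests identifies itself with its own default User-Agent, which some sites reject. The sketch below only inspects that default and sends nothing.

```python
import requests

# The value requests sends when no User-Agent header is supplied.
default_ua = requests.utils.default_user_agent()
print(default_ua)

# A fresh Session carries the same default in its headers.
session_ua = requests.Session().headers['User-Agent']
print(session_ua == default_ua)
```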
4. Cookies: this page only returns its content when the request carries the cookies of a logged-in session.
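import requests

response = requests.get(
    url='https://github.com/settings/emails',
    headers={
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36',
    },
    # Placeholder values; use the cookies of a logged-in browser session.
    cookies={
        'k1': 'v1',
    },
)
# True only if the logged-in page (which shows the account email) came back.
print('378533872@qq.com' in response.text)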
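The cookies argument ends up as an ordinary Cookie request header. A minimal sketch that prepares the request without sending it, reusing the placeholder pair k1/v1 from the example above:

```python
import requests

# Prepare only; nothing is sent over the network.
prepared = requests.Request(
    'GET',
    'https://github.com/settings/emails',
    cookies={'k1': 'v1'},  # placeholder cookie, not a real session value
).prepare()
print(prepared.headers['Cookie'])
```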
5. allow_redirects: controls whether redirects are followed. They are followed by default; set it to False to stop at the first response.
import requests

response = requests.get(
    url='https://github.com/settings/emails',
    headers={
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36',
    },
    cookies={
        'k1': 'v1',
    },
    allow_redirects=False,
)
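The difference is easier to see against a server whose behaviour is known. The sketch below is an assumption-laden illustration: it starts a throwaway local server (the /old and /new paths and the RedirectHandler class are made up for this demo) and requests the same URL with and without redirect following.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests


class RedirectHandler(BaseHTTPRequestHandler):
    """Hypothetical demo handler: /old redirects to /new."""

    def do_GET(self):
        if self.path == '/old':
            self.send_response(302)
            self.send_header('Location', '/new')
            self.send_header('Content-Length', '0')
            self.end_headers()
        else:
            body = b'final page'
            self.send_response(200)
            self.send_header('Content-Length', str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo output quiet


# Port 0 lets the OS pick a free port.
server = HTTPServer(('127.0.0.1', 0), RedirectHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
base = 'http://127.0.0.1:%d' % server.server_port

session = requests.Session()
session.trust_env = False  # ignore proxy environment variables for the local server

followed = session.get(base + '/old')  # default: redirects are followed
not_followed = session.get(base + '/old', allow_redirects=False)

# With following enabled, the 302 hop is recorded in .history.
print(followed.status_code, [r.status_code for r in followed.history])
# With following disabled, the 302 itself is returned, with its Location header.
print(not_followed.status_code, not_followed.headers['Location'])

server.shutdown()
server.server_close()
```

With allow_redirects=False the caller gets the redirect response itself and can read the Location header to decide what to do next.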