Tags: cep rap script Baidu search ogr url word port info
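import requests

url = 'https://item.jd.com/2967929.html'
try:
    r = requests.get(url)
    r.raise_for_status()  # raise an exception for 4xx/5xx responses
    r.encoding = r.apparent_encoding  # use the encoding guessed from the content
    print(r.text[:1000])
except requests.RequestException:
    print("Fetch failed")
Fetching a JD product page ^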
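import requests

url = 'https://www.amazon.cn/gp/product/B01M8L5Z3Y'
try:
    kv = {'user-agent': 'Mozilla/5.0'}
    r = requests.get(url, headers=kv)
    r.raise_for_status()
    r.encoding = r.apparent_encoding
    print(r.text[1000:2000])
except requests.RequestException:
    print("Fetch failed")
Fetching an Amazon product page ^. Compared with the JD example, this adds the headers parameter. The point is that some sites block page requests that come from programs.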
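To make the blocking behaviour concrete without touching a real site, here is a minimal offline sketch. It is not taken from the original post: it uses the standard-library urllib instead of requests, and `PickyHandler`, `fetch_status` and `demo` are hypothetical names. The local server simply assumes that "blocking" means answering 403 to any User-Agent that starts with "python"; real sites may use quite different rules.

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class PickyHandler(BaseHTTPRequestHandler):
    """Hypothetical server that rejects clients whose User-Agent looks like a script."""

    def do_GET(self):
        agent = self.headers.get("User-Agent", "")
        blocked = agent.lower().startswith("python")
        self.send_response(403 if blocked else 200)
        self.end_headers()
        self.wfile.write(b"blocked" if blocked else b"ok")

    def log_message(self, *args):
        pass  # keep the demo quiet


def fetch_status(url, headers=None):
    """Return the HTTP status code for url, sending the given extra headers."""
    # An empty ProxyHandler keeps the request on localhost even if a proxy is configured.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    request = urllib.request.Request(url, headers=headers or {})
    try:
        with opener.open(request, timeout=5) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code


def demo():
    """Request the same page with the default and with a browser-like User-Agent."""
    server = HTTPServer(("127.0.0.1", 0), PickyHandler)  # port 0: pick any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    url = "http://127.0.0.1:%d/" % server.server_port
    try:
        default_agent = fetch_status(url)  # urllib sends "Python-urllib/3.x"
        browser_agent = fetch_status(url, {"User-Agent": "Mozilla/5.0"})
        return default_agent, browser_agent
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(demo())
```

The first request is refused and the second succeeds, which is the same effect the headers parameter has in the Amazon example.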
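import requests

keyword = 'Python'
try:
    kv = {'wd': keyword}
    r = requests.get('http://www.baidu.com/s', params=kv)
    # Baidu search URLs generally look like: http://www.baidu.com/s?wd=<keyword>
    # The params argument appends "?wd=Python". As of 2020-02-25 the approach from the
    # Amazon example is also needed here: Baidu blocks the python user-agent as well.
    print(r.request.url)
    r.raise_for_status()
    print(len(r.text))
except requests.RequestException:
    print("Fetch failed")
Submitting a keyword search from code and fetching the result page ^. The details behind the comment above are as follows: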
print(r.request.headers)
# {'User-Agent': 'python-requests/2.22.0', 'Accept-Encoding': 'gzip, deflate', 'Accept': '*/*', 'Connection': 'keep-alive'}
r = requests.get('http://www.baidu.com/s', params=kv)
Change it to:
r = requests.get('http://www.baidu.com/s', headers={'user-agent': 'Mozilla/5.0'}, params=kv)
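As a side note on the params argument, the query string it produces can be reproduced offline. This is only a sketch using the standard-library `urllib.parse.urlencode`, not the call requests makes internally, and `build_search_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode


def build_search_url(keyword, base="http://www.baidu.com/s"):
    """Return base with ?wd=<keyword> appended, URL-encoding the keyword."""
    return base + "?" + urlencode({"wd": keyword})


if __name__ == "__main__":
    print(build_search_url("Python"))
    print(build_search_url("web crawler"))  # the space is encoded as "+"
```

For the keyword Python this should give the same string that `print(r.request.url)` shows in the example above.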
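import requests
import os

url = 'http://image.nationalgeographic.com.cn/2017/0211/20170211061910157.jpg'
root = "D://pics//"
path = root + url.split('/')[-1]  # use the last URL segment as the file name
try:
    if not os.path.exists(root):
        os.mkdir(root)
    if not os.path.exists(path):
        r = requests.get(url)
        r.raise_for_status()  # do not save an error page as an image
        with open(path, 'wb') as f:
            f.write(r.content)  # write the binary content; the with block closes the file
        print('File saved')
    else:
        print('File already exists')
except (requests.RequestException, OSError):
    print("Fetch failed")
Saving an image from the National Geographic website ^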
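The file handling in the example above can be tried without any network access. The sketch below assumes the image bytes are already in hand and writes placeholder bytes into a temporary directory. `local_path_for` and `save_if_missing` are hypothetical helper names, and `os.makedirs(..., exist_ok=True)` is swapped in for the exists-then-mkdir check.

```python
import os
import tempfile


def local_path_for(url, root):
    """Return root joined with the last path segment of url."""
    return os.path.join(root, url.split("/")[-1])


def save_if_missing(url, root, content):
    """Write content under root unless the file exists; return True if written."""
    path = local_path_for(url, root)
    os.makedirs(root, exist_ok=True)  # also creates missing parent directories
    if os.path.exists(path):
        return False
    with open(path, "wb") as f:  # the with block closes the file on exit
        f.write(content)
    return True


if __name__ == "__main__":
    url = "http://image.nationalgeographic.com.cn/2017/0211/20170211061910157.jpg"
    root = os.path.join(tempfile.mkdtemp(), "pics")
    print(local_path_for(url, root))
    print(save_if_missing(url, root, b"placeholder bytes"))  # first call writes the file
    print(save_if_missing(url, root, b"placeholder bytes"))  # second call skips it
```

In a real download, `r.content` from requests would take the place of the placeholder bytes.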
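import requests
import re

c = re.compile(r"<li>(.*?)</li>")  # non-greedy match
url = 'http://www.ip138.com/iplookup.asp?ip='
try:
    # the URL is adjusted from the one used in the tutorial
    r = requests.get(url + '202.204.80.112' + '&action=2', headers={'User-Agent': 'Mozilla/5.0'})
    r.raise_for_status()
    r.encoding = r.apparent_encoding
    cMatch = c.search(r.text)
    if cMatch:
        print(cMatch.group(1))
    else:
        print("No match found")
except requests.RequestException:
    print("Fetch failed")
IP address location lookup ^. Compared with Professor Song Tian's code, this version adds a regular expression that extracts the result directly.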
D:\python_work\venv\Scripts\python.exe D:/python_work/test.py
本站数据:北京市海淀区 北京理工大学 教育网 (Site data: Haidian District, Beijing; Beijing Institute of Technology; education network)
Process finished with exit code 0
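The non-greedy pattern used in the lookup can be checked offline on a small string. This is only a sketch: the sample HTML below is made up for illustration and is not the markup ip138 returns, and `first_li` is a hypothetical helper name.

```python
import re

LI_PATTERN = re.compile(r"<li>(.*?)</li>")     # non-greedy: stops at the first </li>
GREEDY_PATTERN = re.compile(r"<li>(.*)</li>")  # greedy: runs on to the last </li>


def first_li(html):
    """Return the text of the first <li> element, or None if there is none."""
    match = LI_PATTERN.search(html)
    return match.group(1) if match else None


sample = "<ul><li>first entry</li><li>second entry</li></ul>"

if __name__ == "__main__":
    print(first_li(sample))
    print(GREEDY_PATTERN.search(sample).group(1))
```

The greedy variant swallows both list items in one match, which is why the lookup uses `.*?`. Returning None when nothing matches avoids calling `group` on a failed search.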
Original article: https://www.cnblogs.com/leisurelyRD/p/12359855.html