Search keyword: user-agent. 1107 results found. (码迷, mamicode.com)

Getting cookies from Set-Cookie when accessing a page with requests

import requests headers = { 'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:57.0) Gecko/20100101 Firefox/57.0', 'cookie': '' } url = "http ...

Category: Other Articles | Posted: 2020-02-26 11:33:08 | Views: 150
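With requests, a `requests.Session` already stores cookies that responses set through Set-Cookie, and `response.cookies` exposes them. If you want to fill the empty `'cookie'` header yourself instead, the minimal sketch below does the parsing with the standard library's `http.cookies.SimpleCookie`. The helper names and the sample header value are hypothetical.

```python
from http.cookies import SimpleCookie
from typing import Dict


def parse_set_cookie(header_value: str) -> Dict[str, str]:
    """Turn one Set-Cookie header value into {name: value}, dropping attributes such as Path."""
    jar = SimpleCookie()
    jar.load(header_value)
    return {name: morsel.value for name, morsel in jar.items()}


def to_cookie_header(cookies: Dict[str, str]) -> str:
    """Serialize cookies as the value of a request's 'cookie' header."""
    return "; ".join(f"{name}={value}" for name, value in cookies.items())


# Hypothetical Set-Cookie value copied from a first response
cookies = parse_set_cookie("sessionid=abc123; Path=/")
headers = {"cookie": to_cookie_header(cookies)}  # reuse on the next request
```

Parse each Set-Cookie header separately. Several headers joined with commas are not reliably split by `SimpleCookie`.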

Scraping Bing wallpapers with Python

import re import os import requests from time import sleep headers = { "User-Agent": ("Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:64.0) " "Gecko/201 ...

Category: Programming Languages | Posted: 2020-02-24 09:51:46 | Views: 83
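The snippet splits its User-Agent across two string literals inside parentheses. Python joins adjacent literals at compile time. The sketch below shows that, plus two hypothetical helpers built on the imported `re` and `os`: one pulls `.jpg` URLs out of HTML and one derives a local filename. Bing's real page markup is not shown in the snippet, so the regex is a generic assumption. The UA string is borrowed from the Firefox 57 example above because this post's own UA is truncated.

```python
import os
import re
from urllib.parse import urlparse

# Adjacent string literals in parentheses are concatenated into one string.
USER_AGENT = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:57.0) "
              "Gecko/20100101 Firefox/57.0")
HEADERS = {"User-Agent": USER_AGENT}

# Hypothetical pattern: any http(s) URL ending in .jpg, stopping at quotes or spaces
JPG_URL = re.compile(r"""https?://[^\s"'<>]+?\.jpg""")


def find_jpg_urls(html: str) -> list:
    """Return every .jpg URL found in the HTML text."""
    return JPG_URL.findall(html)


def local_path(url: str, folder: str = "wallpapers") -> str:
    """Build a local save path from the last segment of the URL path."""
    return os.path.join(folder, os.path.basename(urlparse(url).path))
```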

A first look at web crawlers

Two commonly used libraries. Determine the page you want to visit and build the request headers: url="http://www.xxx.com" headers = {"User-Agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Ge ...

Category: Other Articles | Posted: 2020-02-23 18:31:27 | Views: 82
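The header name is `User-Agent`, with a hyphen. A key spelled `User Agent` is a different header, and servers will not read it as the user agent. Here is a minimal sketch of building the request with the standard library's `urllib.request.Request`. It keeps the post's placeholder URL. The UA string is an assumption, since the post's is truncated. Constructing the `Request` sends nothing; only `urlopen(req)` would.

```python
import urllib.request

url = "http://www.xxx.com"  # placeholder URL from the post
headers = {
    # Hyphenated name; example UA string, since the post's is truncated
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:57.0) Gecko/20100101 Firefox/57.0",
}

# Only builds the request object; urllib.request.urlopen(req) would send it
req = urllib.request.Request(url, headers=headers)
```

`urllib` normalizes stored header names with `str.capitalize()`, so the header is looked up as `req.get_header("User-agent")`.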

Scraping Maoyan Top 100 movie information with multiple threads

import requests import parsel import time import threading # simulate a browser headers = {"Referer": "https://maoyan.com/board/4?offset=0", "User-Agent": "Mozilla/5.0 ...

Category: Programming Languages | Posted: 2020-02-23 09:56:39 | Views: 70
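The Referer points at `board/4?offset=0`, which suggests the list is paged through an `offset` parameter. This sketch assumes 10 movies per page, giving offsets 0 to 90. It starts one `threading.Thread` per page and collects results under a lock. The `fetch` callable is a hypothetical stand-in for the post's requests + parsel code, so the threading logic can be shown without network access.

```python
import threading
from typing import Callable, Dict, List

BASE_URL = "https://maoyan.com/board/4?offset={}"
HEADERS = {"Referer": "https://maoyan.com/board/4?offset=0"}  # add a User-Agent as in the post


def page_urls(total: int = 100, per_page: int = 10) -> List[str]:
    """Assumed paging scheme: offset 0, 10, ..., total - per_page."""
    return [BASE_URL.format(offset) for offset in range(0, total, per_page)]


def crawl_all(urls: List[str], fetch: Callable[[str], object]) -> Dict[str, object]:
    """Run fetch(url) in one thread per URL and gather the results by URL."""
    results: Dict[str, object] = {}
    lock = threading.Lock()

    def worker(url: str) -> None:
        data = fetch(url)
        with lock:  # guard the shared dict
            results[url] = data

    threads = [threading.Thread(target=worker, args=(u,)) for u in urls]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait for every page before returning
    return results
```

For real requests, `fetch` would call `requests.get(url, headers=HEADERS)` and parse the page. A short `time.sleep` inside the worker keeps the load polite.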

222

import requests from bs4 import BeautifulSoup import re def getPage(url): headers = { 'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/5 ...

Category: Other Articles | Posted: 2020-02-18 20:33:44 | Views: 301
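A `getPage(url)` helper like this one usually sends the headers and returns the page text, or nothing on failure. Below is a minimal sketch of that pattern with the standard library's `urllib` in place of requests. The function name and the UA string are assumptions, since the post's UA is truncated. It decodes with the charset the server declares and falls back to UTF-8.

```python
import urllib.request
from typing import Optional

# Example UA; the one in the post is truncated
HEADERS = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:57.0) Gecko/20100101 Firefox/57.0"}


def get_page(url: str, timeout: float = 10.0) -> Optional[str]:
    """Return the decoded page body, or None if the request fails."""
    try:
        req = urllib.request.Request(url, headers=HEADERS)  # ValueError for a malformed URL
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            charset = resp.headers.get_content_charset() or "utf-8"
            return resp.read().decode(charset, errors="replace")
    except (OSError, ValueError):  # URLError, HTTPError and timeouts are OSError subclasses
        return None
```

The result can then go to BeautifulSoup, or to `re` as in the post's imports.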

Holiday Day 12

Crawler scraping: from bs4 import BeautifulSoup import requests import xlwt def getHouseList(url): house = [] headers = { 'User-Agent': 'Mozilla/5.0 (Windows NT 6. ...

Category: Other Articles | Posted: 2020-02-12 23:57:47 | Views: 127
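The post collects listings into a `house` list and imports `xlwt`, which writes Excel `.xls` files. As a swapped-in alternative, this sketch writes the same kind of rows with the standard library's `csv` module. The column names and sample rows are hypothetical.

```python
import csv
import io
from typing import Iterable, Sequence, TextIO


def write_house_list(houses: Iterable[Sequence[str]], out: TextIO,
                     header: Sequence[str] = ("title", "price")) -> None:
    """Write a header row followed by one row per scraped listing."""
    writer = csv.writer(out)
    writer.writerow(header)
    writer.writerows(houses)


buf = io.StringIO()
write_house_list([["Flat A", "1200"], ["Flat B", "950"]], buf)
```

For a real file, use `open("houses.csv", "w", newline="", encoding="utf-8-sig")`. The BOM from `utf-8-sig` helps Excel detect UTF-8 when the data contains Chinese text.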

Blocking spiders from crawling a server with Nginx

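Modify nginx.conf to block web crawlers by UA and return 403. Add an agent_deny.conf configuration file: # block crawling by tools such as Scrapy if ($http_user_agent ~* (Scrapy|Curl|HttpClient)) { return 403; } # block specified UAs and requests with an empty UA i ...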

Category: Other Articles | Posted: 2020-02-12 18:38:56 | Views: 84
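In nginx, `~*` is a case-insensitive regular-expression match that is not anchored. A UA therefore matches if the pattern appears anywhere in it, so `curl/7.68.0` is caught by `Curl`. The Python sketch below mirrors that condition so you can check which UAs the rule would reject. The empty-UA check follows the post's comment. The post's actual second rule is truncated, so that part is an assumption.

```python
import re
from typing import Optional

# Same pattern as: if ($http_user_agent ~* (Scrapy|Curl|HttpClient))
BLOCKED_UA = re.compile(r"(Scrapy|Curl|HttpClient)", re.IGNORECASE)


def would_return_403(user_agent: Optional[str]) -> bool:
    """Approximate the agent_deny.conf rules for a given User-Agent value."""
    if not user_agent:  # assumed empty-UA rule (truncated in the post)
        return True
    return BLOCKED_UA.search(user_agent) is not None  # search() = unanchored, like ~*
```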

Scraping Baidu Baike with Python (automatically matching scraped trending terms to their explanations) and storing the data in a database

import requests from lxml import etree import time, json, requests import pymysql header = { 'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) ...

Category: Databases | Posted: 2020-02-12 00:28:03 | Views: 82
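The post stores results with `pymysql`. To keep the example runnable without a MySQL server, this sketch swaps in the standard library's `sqlite3`. The table name and columns are hypothetical. The pattern is the same either way: parameterized inserts, so scraped text is escaped by the driver rather than pasted into SQL. Note that `pymysql` uses `%s` placeholders where `sqlite3` uses `?`.

```python
import sqlite3
from typing import Iterable, Tuple


def save_entries(conn: sqlite3.Connection, entries: Iterable[Tuple[str, str]]) -> None:
    """Store (term, explanation) pairs, replacing any existing row for the same term."""
    # Hypothetical table layout: one row per scraped term
    conn.execute(
        "CREATE TABLE IF NOT EXISTS baike (term TEXT PRIMARY KEY, explanation TEXT)"
    )
    # Placeholders let the driver escape scraped text safely
    conn.executemany(
        "INSERT OR REPLACE INTO baike (term, explanation) VALUES (?, ?)", entries
    )
    conn.commit()
```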

Python Study 5: What to do when your crawler keeps getting blocked

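First, set wait times. There are two common kinds: an "explicit" wait (forcibly pausing for a few seconds) and an "implicit" wait (waiting as the situation requires, for example until an element has finished loading). Figure 1 shows the explicit wait setting and Figure 2 the implicit one. Second, modify the request headers: the main clue a site uses to tell whether you are a program or a person browsing with a web browser is the User-Agent. For example, when a user browses with a browser ...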

Category: Programming Languages | Posted: 2020-02-12 00:16:10 | Views: 83
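This sketch covers the two waiting styles and the User-Agent change described above. `fixed_wait` is the unconditional pause; `wait_until` polls a condition until it holds or a timeout expires. `random_headers` picks a UA from a list. The function names and UA list are illustrative assumptions. `sleep` and `clock` are injectable so the logic can be checked without real delays.

```python
import random
import time
from typing import Callable, Dict

# Example UA strings for illustration only
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:57.0) Gecko/20100101 Firefox/57.0",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:64.0) Gecko/20100101 Firefox/64.0",
]


def random_headers(rng=None) -> Dict[str, str]:
    """Pick a User-Agent at random so consecutive requests do not all look identical."""
    chooser = rng if rng is not None else random
    return {"User-Agent": chooser.choice(USER_AGENTS)}


def fixed_wait(seconds: float, sleep: Callable[[float], None] = time.sleep) -> None:
    """'Explicit' wait in the post's sense: pause unconditionally."""
    sleep(seconds)


def wait_until(condition: Callable[[], bool], timeout: float = 10.0, poll: float = 0.5,
               sleep: Callable[[float], None] = time.sleep,
               clock: Callable[[], float] = time.monotonic) -> bool:
    """Conditional wait: poll until condition() is true or the timeout expires."""
    deadline = clock() + timeout
    while True:
        if condition():
            return True
        if clock() >= deadline:
            return False
        sleep(poll)
```

In Selenium the conditional style corresponds to `WebDriverWait`. For plain requests crawling, a `fixed_wait(random.uniform(1, 3))` between pages is a common choice.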

Extracting content with a crawler using requests and Beautiful Soup

import requests import time from bs4 import BeautifulSoup class getContents(): # fetch the HTML page def getHTMLText(self, url): try: kv = {'user-agent': 'Mozilla/5 ...

Category: Other Articles | Posted: 2020-02-10 12:03:57 | Views: 77
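Once `getHTMLText` returns the page, the extraction step pulls out the fields you need. With Beautiful Soup that is typically `soup.title.string` and `soup.find_all("a", href=True)`. As a dependency-free stand-in, this sketch does the same with the standard library's `html.parser.HTMLParser`. The choice of fields (page title and link targets) is an assumption.

```python
from html.parser import HTMLParser
from typing import List, Tuple


class TitleAndLinkParser(HTMLParser):
    """Collect the <title> text and every <a href=...> value."""

    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self.links: List[str] = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:  # skip anchors without a link target
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract(html: str) -> Tuple[str, List[str]]:
    parser = TitleAndLinkParser()
    parser.feed(html)
    parser.close()
    return parser.title.strip(), parser.links
```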
