robotparser (File Formats) – Python Chinese Developer's Manual

Posted: 2020-07-12 10:39:13

[



Note

The robotparser module has been renamed urllib.robotparser in Python 3. The 2to3 tool will automatically adapt imports when converting your sources to Python 3.

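This module provides a single class, RobotFileParser, which answers questions about whether or not a particular user agent can fetch a URL on the Web site that published the robots.txt file. For more details on the structure of robots.txt files, see http://www.robotstxt.org/orig.html.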

class robotparser.RobotFileParser(url='')

This class provides methods to read, parse and answer questions about the robots.txt file at url.
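A minimal sketch of constructing the parser, written for Python 3, where the class lives in urllib.robotparser. Passing a URL to the constructor has the same effect as calling set_url(); nothing is fetched yet. In CPython 3's implementation, until robots.txt has been read or parsed, can_fetch() answers False for every URL and mtime() reports 0. The example.com URL is a placeholder.

```python
from urllib.robotparser import RobotFileParser

# Placeholder URL; the constructor only records it, no request is made.
rp = RobotFileParser("http://www.example.com/robots.txt")

# No rules have been loaded yet, so the parser refuses every URL
# rather than risk a false "allowed" answer.
print(rp.can_fetch("*", "http://www.example.com/"))  # False
print(rp.mtime())  # 0
```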

set_url(url)

Sets the URL referring to a robots.txt file.
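Because robots.txt always sits at the path /robots.txt on a site's scheme and host, a crawler usually derives the URL to pass to set_url() from a page it wants to visit. The sketch below uses Python 3; robots_url_for is a hypothetical helper, not part of the module.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.robotparser import RobotFileParser


def robots_url_for(page_url: str) -> str:
    """Hypothetical helper: build the robots.txt URL for the site hosting page_url.

    Only the scheme and network location are kept; path, query and
    fragment are replaced by the fixed path /robots.txt.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


rp = RobotFileParser()
rp.set_url(robots_url_for("http://www.musi-cal.com/cgi-bin/search?city=San+Francisco"))
# A subsequent rp.read() would fetch http://www.musi-cal.com/robots.txt
```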

read()

Reads the robots.txt URL and feeds it to the parser.
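read() fetches whatever URL was set, so it can be tried offline by pointing it at a local file:// URL. A Python 3 sketch, assuming hypothetical robots.txt content written to a temporary file:

```python
import tempfile
from pathlib import Path
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, stored locally so no network is needed.
ROBOTS_TXT = "User-agent: *\nDisallow: /cgi-bin/\n"

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "robots.txt"
    path.write_text(ROBOTS_TXT, encoding="utf-8")

    rp = RobotFileParser()
    rp.set_url(path.as_uri())  # file:// URL instead of http(s)://
    rp.read()                  # fetches the file and hands its lines to parse()

print(rp.can_fetch("*", "http://www.example.com/cgi-bin/search"))  # False
print(rp.can_fetch("*", "http://www.example.com/index.html"))      # True
```

When the URL is HTTP, CPython 3's read() treats a 401 or 403 response as "disallow everything" and any other 4xx response as "allow everything".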

parse(lines)

Parses the lines argument.
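parse() is the way to feed the parser robots.txt content obtained some other way, for example with a different HTTP client. It expects an iterable of lines, so a string is split first. The Python 3 sketch below uses hypothetical rules; in CPython's implementation the rules of a record are tried in file order and the first matching path prefix wins, which is why the Allow line is listed before the broader Disallow. Some other implementations pick the longest matching rule instead (the approach RFC 9309 describes).

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt, e.g. downloaded with another HTTP client.
robots_txt = """\
User-agent: *
Allow: /private/public.html
Disallow: /private/
"""

rp = RobotFileParser()
# parse() wants individual lines, not one big string.
rp.parse(robots_txt.splitlines())

# The first matching rule decides: Allow wins for this one page,
# Disallow catches everything else under /private/.
print(rp.can_fetch("*", "http://www.example.com/private/public.html"))  # True
print(rp.can_fetch("*", "http://www.example.com/private/secret.html"))  # False
print(rp.can_fetch("*", "http://www.example.com/index.html"))           # True
```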

can_fetch(useragent, url)

Returns True if the useragent is allowed to fetch the url according to the rules contained in the parsed robots.txt file.
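The useragent argument selects which record of robots.txt applies. In CPython 3's implementation, the part of the user agent string before the first "/" is lowercased, and a record applies if its lowercased User-agent value occurs in that text; the "*" record is used only when no named record matches. A sketch with a hypothetical robots.txt and robot names:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one record for a named robot, one catch-all.
robots_txt = """\
User-agent: FigTree
Disallow: /tmp/

User-agent: *
Disallow: /cgi-bin/
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

base = "http://www.example.com"
# "FigTree/1.0" selects the FigTree record, so only its rules apply.
print(rp.can_fetch("FigTree/1.0", base + "/tmp/cache"))  # False
print(rp.can_fetch("FigTree/1.0", base + "/cgi-bin/x"))  # True
# Any other agent falls back to the "*" record.
print(rp.can_fetch("OtherBot", base + "/cgi-bin/x"))     # False
print(rp.can_fetch("OtherBot", base + "/tmp/cache"))     # True
```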

mtime()

Returns the time the robots.txt file was last fetched. This is useful for long-running web spiders that need to check for new robots.txt files periodically.
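A long-running spider can compare mtime() with the current time to decide when to re-read robots.txt. A Python 3 sketch: needs_refresh and the one-day MAX_AGE policy are hypothetical choices, and it relies on CPython 3's parse() recording the fetch time itself.

```python
import time
from urllib.robotparser import RobotFileParser

MAX_AGE = 24 * 60 * 60  # hypothetical policy: re-read robots.txt once a day


def needs_refresh(rp: RobotFileParser, max_age: float = MAX_AGE) -> bool:
    """Hypothetical helper: True if rp was never loaded or its copy is stale."""
    last = rp.mtime()  # 0 until read()/parse() has run
    return last == 0 or time.time() - last > max_age


rp = RobotFileParser()
print(needs_refresh(rp))  # True: nothing loaded yet

rp.parse(["User-agent: *", "Disallow: /cgi-bin/"])  # parse() stamps the time
print(needs_refresh(rp))  # False: just loaded
```

In a crawl loop this becomes a check such as `if needs_refresh(rp): rp.read()` before each batch of requests.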

modified()

Sets the time the robots.txt file was last fetched to the current time.
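modified() moves the timestamp without touching the parsed rules. One hypothetical use is a crawler that revalidated robots.txt by some other means, learned it was unchanged, and wants to restart its refresh clock without re-parsing. A Python 3 sketch:

```python
import time
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.parse(["User-agent: *", "Disallow: /cgi-bin/"])  # also stamps the time
loaded_at = rp.mtime()

# Hypothetical situation: a later check found robots.txt unchanged,
# so keep the rules and only move the "last fetched" time forward.
rp.modified()

print(rp.mtime() >= loaded_at)         # True
print(time.time() - rp.mtime() < 1.0)  # True: stamped just now
print(rp.can_fetch("*", "http://www.example.com/cgi-bin/x"))  # False: rules kept
```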

The following example demonstrates basic use of the RobotFileParser class.

>>> import robotparser
>>> rp = robotparser.RobotFileParser()
>>> rp.set_url("http://www.musi-cal.com/robots.txt")
>>> rp.read()
>>> rp.can_fetch("*", "http://www.musi-cal.com/cgi-bin/search?city=San+Francisco")
False
>>> rp.can_fetch("*", "http://www.musi-cal.com/")
True
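The session above uses the Python 2 module name and needs a reachable musi-cal.com server. A Python 3 sketch that reproduces the same two answers offline by calling parse() instead of read(); the robots.txt lines are a hypothetical stand-in, since the site's real file is not shown here.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical stand-in for the contents of http://www.musi-cal.com/robots.txt.
ROBOTS_LINES = [
    "User-agent: *",
    "Disallow: /cgi-bin/",
]

rp = RobotFileParser()
rp.set_url("http://www.musi-cal.com/robots.txt")
rp.parse(ROBOTS_LINES)  # used instead of rp.read() to avoid the network

print(rp.can_fetch("*", "http://www.musi-cal.com/cgi-bin/search?city=San+Francisco"))  # False
print(rp.can_fetch("*", "http://www.musi-cal.com/"))  # True
```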



]


When reposting, please keep the page address: https://www.breakyizhan.com/python/35187.html


Original article: https://www.cnblogs.com/breakyizhan/p/13287202.html
