PythonCrawl Self-Study Log (4)

Posted: 2016-09-23 21:25:32


2016-09-22 10:34:02
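I. Selector
1. How to construct one
(1) From text:
from scrapy.selector import Selector
body = '<html><body><span>good</span></body></html>'
Selector(text=body).xpath('//span/text()').extract()
(2) From a response:
from scrapy.http import HtmlResponse
# encoding lets HtmlResponse accept a str body on Python 3
response = HtmlResponse(url='http://example.com', body=body, encoding='utf-8')
Selector(response=response).xpath('//span/text()').extract()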
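2. How to use it
Call response.xpath("<XPath expression>") or response.css("<CSS selector>").
3. Nesting selectors
xpath() and css() return selectors themselves, so calls can be chained: response.xpath(...).xpath(...)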
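4. Using regular expressions
Selector.re() extracts data with a regular expression. It returns a list of unicode strings rather than selectors, so re() calls cannot be nested.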
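5. Using relative XPaths
When extracting child elements from an already selected node, start the expression with a dot, as in .//element_name. An expression that starts with // is absolute and searches the whole document.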
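6. Using EXSLT extensions (re, set)
The re namespace adds regular-expression functions to XPath expressions, and the set namespace adds set operations. Both prefixes are pre-registered.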
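II. Selector in detail
1. Constructor arguments
(1) response: the HtmlResponse or XmlResponse object to select from
(2) text: a unicode string of markup, used when no response is available
(3) type: the selector type, one of "html", "xml", or None (the default); with None, the type is chosen from the response type, or falls back to "html" when text is used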
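2. Methods
(1) xpath(query): finds the nodes matching the XPath query and returns the result as a SelectorList instance
(2) css(query): applies the given CSS selector and returns a SelectorList instance; the CSS query is translated into an XPath query
(3) extract(): serializes the matched nodes and returns them as unicode strings; percent-encoded content is unquoted
(4) re(regex): applies the given regex and returns a list of unicode strings with the matches; regex is either a compiled regular expression or a string that will be compiled with re.compile(regex)
(5) register_namespace(prefix, uri): registers a namespace for use in this Selector; without it, data in non-standard namespaces cannot be selected or extracted
(6) remove_namespaces(): removes all namespaces, so the document can be traversed with namespace-less XPaths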
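III. SelectorList objects
SelectorList is a subclass of the built-in list class. Besides the list methods, it provides the following:
1. xpath(query): calls xpath() on every element of the list and returns the results flattened into another SelectorList
2. css(query): calls css() on every element of the list and returns the results flattened into another SelectorList
3. extract(): calls extract() on every element and returns the results flattened, as a list of unicode strings
4. re(): calls re() on every element and returns the results flattened, as a list of unicode strings
5. __nonzero__(): returns True if the list is not empty, False otherwise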

Original article: http://www.cnblogs.com/AlloCa/p/5901492.html
