Parsing tags. The file to parse, index.html, contains:
<book>
    <author>Dusty Phillips</author>
    <publisher>Packt Publishing</publisher>
    <title>Python 3 Object Oriented Programming</title>
    <content>
        <chapter>
            <number>1</number>
            <title>Object Oriented Design</title>
        </chapter>
        <chapter>
            <number>2</number>
            <title>Objects In Pyhton</title>
        </chapter>
    </content>
</book>
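Before writing a hand-rolled parser, it can help to see the tree this file describes. Below is a minimal cross-check using Python's standard xml.etree.ElementTree, which the original post does not use. The listing helper and the shortened DOC string are illustrative assumptions. Element.iter() yields elements in document (depth-first) order, which is the same order the post's output follows.

```python
import xml.etree.ElementTree as ET

# A shortened copy of the book file above (assumption: trimmed for brevity).
DOC = """<book>
    <author>Dusty Phillips</author>
    <content>
        <chapter>
            <number>1</number>
            <title>Object Oriented Design</title>
        </chapter>
    </content>
</book>"""


def listing(xml_text):
    """Return one 'tag' or 'tag: text' line per element, in document order."""
    lines = []
    for elem in ET.fromstring(xml_text).iter():  # depth-first, document order
        text = (elem.text or "").strip()  # container tags only hold whitespace
        lines.append(f"{elem.tag}: {text}" if text else elem.tag)
    return lines


if __name__ == "__main__":
    print("\n".join(listing(DOC)))
```

Container tags such as book and content have only indentation whitespace as their text, which is why the helper strips it before deciding whether to print "tag" or "tag: text".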
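Parsing flow: the parser is a small state machine. FirstTag reads the root tag and makes it the current node. From then on, ChildNode inspects the next piece of input and hands control to OpenTag (an opening tag such as <author>), CloseTag (a closing tag such as </author>) or TextNode (anything else, i.e. text content). Each of these three states consumes its piece of input and passes control back to ChildNode, and this repeats until no input is left.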
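The dispatch decision that ChildNode makes in the code can be isolated into a pure function, which makes the order of the checks explicit. This is a minimal sketch: next_state and TRANSITIONS are hypothetical names introduced only for illustration, and states are represented by their class names as strings.

```python
# Hypothetical summary of the post's state machine:
# each state name maps to the states it can hand control to next.
TRANSITIONS = {
    "FirstTag": ["ChildNode"],
    "ChildNode": ["OpenTag", "CloseTag", "TextNode"],
    "OpenTag": ["ChildNode"],
    "CloseTag": ["ChildNode"],
    "TextNode": ["ChildNode"],
}


def next_state(remaining: str) -> str:
    """Mirror ChildNode's decision: pick the next state from the upcoming input."""
    stripped = remaining.strip()
    if stripped.startswith("</"):  # must be tested before "<"
        return "CloseTag"
    if stripped.startswith("<"):
        return "OpenTag"
    return "TextNode"


if __name__ == "__main__":
    for sample in ["  <author>Dusty Phillips</author>", "</author>", "Dusty Phillips</author>"]:
        print(repr(sample), "->", next_state(sample))
```

Checking for "</" before "<" matters: a closing tag also starts with "<", so the reverse order would send every closing tag to OpenTag.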
The code is as follows:
class Node:
    """A single tag in the parsed tree."""

    def __init__(self, tag_name, parent=None):
        self.parent = parent
        self.tag_name = tag_name
        self.children = []  # child Node objects, in document order
        self.text = ""

    def __str__(self):
        # Formatted output: "tag: text" when the tag has text, otherwise just "tag"
        if self.text:
            return self.tag_name + ": " + self.text
        return self.tag_name


class Parser:
    """The parser: delegates each step to the current state object."""

    def __init__(self, parse_string):
        self.parse_string = parse_string
        self.root = None
        self.current_node = None
        self.state = FirstTag()

    def process(self, remaining_string):
        # Run the logic of whichever state is current; it returns the unconsumed input
        remaining = self.state.process(remaining_string, self)
        if remaining:
            self.process(remaining)

    def start(self):
        self.process(self.parse_string)


class FirstTag:
    """Handles the top-level (root) tag."""

    def process(self, remaining_string, parser):
        i_start_tag = remaining_string.find('<')
        i_end_tag = remaining_string.find('>')
        tag_name = remaining_string[i_start_tag + 1:i_end_tag]
        root = Node(tag_name)
        # Important: the root must also be assigned to current_node so later tags attach to it
        parser.root = parser.current_node = root
        parser.state = ChildNode()
        return remaining_string[i_end_tag + 1:]


class ChildNode:
    """Dispatcher: looks at the next piece of input and picks the next state."""

    def process(self, remaining_string, parser):
        stripped = remaining_string.strip()
        if stripped.startswith("</"):
            parser.state = CloseTag()
        elif stripped.startswith("<"):
            parser.state = OpenTag()
        else:
            parser.state = TextNode()
        return stripped


class OpenTag:
    """Handles an opening tag."""

    def process(self, remaining_string, parser):
        i_start_tag = remaining_string.find("<")
        i_end_tag = remaining_string.find(">")
        tag_name = remaining_string[i_start_tag + 1:i_end_tag]
        # Create the node with the enclosing tag's node as its parent, then append it
        node = Node(tag_name, parser.current_node)
        parser.current_node.children.append(node)
        parser.current_node = node  # descend into the new tag
        parser.state = ChildNode()
        return remaining_string[i_end_tag + 1:]


class CloseTag:
    """Handles a closing tag."""

    def process(self, remaining_string, parser):
        i_start_tag = remaining_string.find('<')
        i_end_tag = remaining_string.find(">")
        assert remaining_string[i_start_tag + 1] == "/"
        tag_name = remaining_string[i_start_tag + 2:i_end_tag]
        assert tag_name == parser.current_node.tag_name
        # The closing tag has the same name as the open one, so the pair is complete:
        # move back up to the parent node, so following siblings attach to the parent
        # and nothing is added twice
        parser.current_node = parser.current_node.parent
        parser.state = ChildNode()
        return remaining_string[i_end_tag + 1:].strip()


class TextNode:
    """Handles the text content of a tag."""

    def process(self, remaining_string, parser):
        i_start_tag = remaining_string.find("<")
        text = remaining_string[:i_start_tag]
        # OpenTag left the enclosing tag's Node in current_node, so the text is bound to it
        parser.current_node.text = text
        parser.state = ChildNode()
        return remaining_string[i_start_tag:]


with open('index.html') as file:
    contents = file.read()

p = Parser(contents)
p.start()

nodes = [p.root]
while nodes:
    node = nodes.pop(0)
    print(node)  # formatted by Node.__str__
    # Every node keeps its children in .children; putting them in front of the
    # remaining list gives a depth-first traversal in document order
    nodes = node.children + nodes
Output:
book
author: Dusty Phillips
publisher: Packt Publishing
title: Python 3 Object Oriented Programming
content
chapter
number: 1
title: Object Oriented Design
chapter
number: 2
title: Objects In Pyhton
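Parser.process in the code above calls itself once per state transition, so with CPython's default recursion limit of 1000 a file with a few hundred tags and text chunks would raise RecursionError. Below is a minimal sketch of the same states driven by a flat loop instead. It is not the author's code: each state is a plain function returning (next_state, unconsumed_input), the names parse and walk are hypothetical, and it parses a string rather than reading index.html.

```python
# Hypothetical loop-driven variant of the post's state machine.


class Node:
    def __init__(self, tag_name, parent=None):
        self.parent = parent
        self.tag_name = tag_name
        self.children = []
        self.text = ""

    def __str__(self):
        return f"{self.tag_name}: {self.text}" if self.text else self.tag_name


class Parser:
    """Holds the tree being built; the state functions read and update it."""

    def __init__(self):
        self.root = None
        self.current_node = None


def first_tag(s, p):
    start, end = s.find("<"), s.find(">")
    p.root = p.current_node = Node(s[start + 1:end])
    return child_node, s[end + 1:]


def child_node(s, p):
    s = s.strip()
    if s.startswith("</"):
        return close_tag, s
    if s.startswith("<"):
        return open_tag, s
    return text_node, s


def open_tag(s, p):
    start, end = s.find("<"), s.find(">")
    node = Node(s[start + 1:end], p.current_node)
    p.current_node.children.append(node)
    p.current_node = node  # descend into the new tag
    return child_node, s[end + 1:]


def close_tag(s, p):
    start, end = s.find("<"), s.find(">")
    name = s[start + 2:end]
    if name != p.current_node.tag_name:
        raise ValueError(f"expected </{p.current_node.tag_name}>, got </{name}>")
    p.current_node = p.current_node.parent  # climb back to the parent
    return child_node, s[end + 1:].strip()


def text_node(s, p):
    start = s.find("<")
    if start == -1:
        raise ValueError("text with no following tag")
    p.current_node.text = s[:start]
    return child_node, s[start:]


def parse(source):
    p = Parser()
    state, remaining = first_tag, source
    while remaining:  # flat loop: no recursion depth limit
        state, remaining = state(remaining, p)
    return p.root


def walk(root):
    """Depth-first, document-order listing, like the post's while loop."""
    nodes, lines = [root], []
    while nodes:
        node = nodes.pop(0)
        lines.append(str(node))
        nodes = node.children + nodes
    return lines


if __name__ == "__main__":
    doc = "<book> <author>Dusty Phillips</author> <content> <chapter> <number>1</number> </chapter> </content> </book>"
    print("\n".join(walk(parse(doc))))
```

Returning the next state instead of calling it keeps the state-pattern structure while letting one loop drive everything. Each step still copies the rest of the string when slicing, which is fine for small files like this one; tracking an index into the string would avoid those copies.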
Original article: https://www.cnblogs.com/su-sir/p/12541941.html