6-3 How to parse a simple XML document

Posted: 2018-04-28 14:17:10



Element nodes and the element tree

>>> from xml.etree.ElementTree import parse

>>> help(parse)
Help on function parse in module xml.etree.ElementTree:

parse(source, parser=None)


>>> f = open(r'C:\视频\python高效实践技巧笔记\6数据编码与处理相关话题\linker_log.xml')
>>> 
>>> et = parse(f)    # et is an ElementTree object

>>> help(et.getroot)
Help on method getroot in module xml.etree.ElementTree:

getroot(self) method of xml.etree.ElementTree.ElementTree instance


>>> root = et.getroot()    # get the root node, which is an Element object

>>> root
<Element 'DOCUMENT' at 0x2e87f90>

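# Properties of this node

>>> root.tag                  # the element's tag
'DOCUMENT'

>>> root.attrib               # attributes as a dict; here it has a value, and it is empty ({}) when there are none
{'gen_time': 'Fri Dec 01 16:04:26 2017 '}

>>> root.text                 # the node's text, which here is just a newline
'\n'
>>> root.text.strip()         # strip the whitespace from the node's text
''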


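The session above parses linker_log.xml, which is not included in the post. Below is a minimal self-contained sketch of the same steps. It assumes a small hypothetical XML string shaped like that file: a DOCUMENT root with MSG children that contain TEXT. `io.BytesIO` stands in for the opened file.

```python
import io
from xml.etree.ElementTree import parse

# Hypothetical stand-in for linker_log.xml (the real file is not shown in the post).
SAMPLE = b"""<DOCUMENT gen_time="Fri Dec 01 16:04:26 2017 ">
  <MSG id="01ABBC90" type="info"><TEXT>first</TEXT></MSG>
  <MSG id="01BF8610" type="info"><TEXT>second</TEXT></MSG>
</DOCUMENT>"""

# parse() accepts a filename or any file-like object; BytesIO plays the role of open(...).
et = parse(io.BytesIO(SAMPLE))
root = et.getroot()

print(root.tag)                 # DOCUMENT
print(root.attrib)              # {'gen_time': 'Fri Dec 01 16:04:26 2017 '}
print(repr(root.text))          # '\n  ' (only the whitespace before the first child)
print(repr(root.text.strip()))  # ''
```

With a real file, `parse()` also accepts a path directly, so the file does not have to be opened by hand.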

# root itself is iterable: iterating over it yields its direct child elements

>>> for child in root:
    print(child.get('id'))  # child is a child element; get() returns the value of one attribute

Output:

01ABBC90
01BF8610
01BF8AF0
01BFC5F0
01BFE3E8
01BFE850
01BFEAC8
01BFF128
01BFF2B0
01BFF4B8
01BFF730
01BFF960
01BFFB68
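This sketch shows what `get()` does when an attribute is missing. It uses a hypothetical sample in which one MSG has no `id`. The optional second argument to `get()` supplies a default instead of `None`.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; the second MSG deliberately lacks an id attribute.
SAMPLE = """<DOCUMENT>
  <MSG id="01ABBC90"><TEXT>first</TEXT></MSG>
  <MSG><TEXT>no id here</TEXT></MSG>
  <MSG id="01BF8AF0"><TEXT>third</TEXT></MSG>
</DOCUMENT>"""

root = ET.fromstring(SAMPLE)  # fromstring() returns the root Element directly

# Iterating over an Element yields its direct children in document order.
ids = [child.get("id") for child in root]
print(ids)  # ['01ABBC90', None, '01BF8AF0']

# get() takes an optional default, returned when the attribute is absent.
with_default = [child.get("id", "<missing>") for child in root]
print(with_default)  # ['01ABBC90', '<missing>', '01BF8AF0']
```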

# Given a plain tag name, find(), findall() and iterfind() only search the direct children of the current element: in this example root can find "MSG" but not "TEXT"

>>> root.find('MSG')       # find() returns the first matching element
<Element 'MSG' at 0x2e87fd0>
>>> root.findall('MSG')    # findall() returns a list of all matching elements
[<Element 'MSG' at 0x2e87fd0>, <Element 'MSG' at 0x2e9f0d0>, <Element 'MSG' at 0x2e9f170>, <Element 'MSG' at 0x2e9f210>, <Element 'MSG' at 0x2e9f2b0>, <Element 'MSG' at 0x2e9f350>, <Element 'MSG' at 0x2e9f3f0>, <Element 'MSG' at 0x2e9f490>, <Element 'MSG' at 0x2e9f530>, <Element 'MSG' at 0x2e9f5d0>, ...]

>>> root.find('TEXT')      # "TEXT" is a child of "MSG", so root.find() does not find it
>>> 
>>> msg = root.find('MSG')
>>> msg.find('TEXT')
<Element 'TEXT' at 0x2e9f090>
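This sketch contrasts `find()` and `findall()` on a hypothetical two-message sample, not the real linker_log.xml. It also reaches the grandchild TEXT in two ways: through the MSG element, and with `findtext()`, which returns a matching element's text or a default.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample with the same DOCUMENT > MSG > TEXT nesting.
SAMPLE = """<DOCUMENT>
  <MSG id="01ABBC90"><TEXT>first</TEXT></MSG>
  <MSG id="01BF8610"><TEXT>second</TEXT></MSG>
</DOCUMENT>"""

root = ET.fromstring(SAMPLE)

first = root.find("MSG")        # first direct child tagged MSG
all_msgs = root.findall("MSG")  # list of every direct MSG child
missing = root.find("TEXT")     # None: TEXT is a grandchild of root, not a child

print(first.get("id"), len(all_msgs), missing)  # 01ABBC90 2 None
# Go one level down from the MSG element to reach its TEXT child.
print(first.find("TEXT").text)                  # first
# findtext() combines find() with .text and returns the default if nothing matches.
print(root.findtext("MSG/TEXT"), root.findtext("TEXT", default="n/a"))  # first n/a
```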


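# iterfind() returns an iterator over the matching elements
>>> iterMsg = root.iterfind('MSG')
>>> for _ in range(5):
    x = next(iterMsg)
    print(x.get('id'))

Output (it starts at the second MSG, 01BF8610, presumably because iterMsg had already been advanced once earlier in the original session; a fresh iterator starts at 01ABBC90):

01BF8610
01BF8AF0
01BFC5F0
01BFE3E8
01BFE850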

>>> iterMsg = root.iterfind('MSG')
>>> i = 0
>>> for x in iterMsg:
    print(x.get('id'))
    i += 1
    if i == 5:
        break

Output:

01ABBC90
01BF8610
01BF8AF0
01BFC5F0
01BFE3E8
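`itertools.islice` can replace the manual counter in the loop above, because it stops after n items. This is an alternative to the author's loop, shown on hypothetical ids. The last lines show `next()`, the Python 3 spelling of `iterMsg.next()`.

```python
import itertools
import xml.etree.ElementTree as ET

# Hypothetical sample with six MSG elements.
SAMPLE = """<DOCUMENT>
  <MSG id="01ABBC90"/><MSG id="01BF8610"/><MSG id="01BF8AF0"/>
  <MSG id="01BFC5F0"/><MSG id="01BFE3E8"/><MSG id="01BFE850"/>
</DOCUMENT>"""

root = ET.fromstring(SAMPLE)

# iterfind() yields matches lazily; islice() stops after the first 5 without a counter.
first_five = [msg.get("id") for msg in itertools.islice(root.iterfind("MSG"), 5)]
print(first_five)
# ['01ABBC90', '01BF8610', '01BF8AF0', '01BFC5F0', '01BFE3E8']

# next() pulls a single item from a fresh iterator.
it = root.iterfind("MSG")
print(next(it).get("id"))  # 01ABBC90
```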

# iter() walks every element node: the element itself and all of its descendants

>>> root.iter()
<generator object iter at 0x02ED3CD8>


# Recursively find every element with a given tag

>>> list(root.iter('TEXT'))

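The difference between `iter()` and `iter(tag)` is easiest to see on a small tree. This sketch uses a hypothetical sample with one TEXT nested an extra level deep. `iter()` visits the element itself and then every descendant, depth-first.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; the second MSG wraps one TEXT inside a DETAIL element.
SAMPLE = """<DOCUMENT>
  <MSG id="1"><TEXT>a</TEXT></MSG>
  <MSG id="2"><TEXT>b</TEXT><DETAIL><TEXT>nested</TEXT></DETAIL></MSG>
</DOCUMENT>"""

root = ET.fromstring(SAMPLE)

# iter() with no argument walks root plus every descendant, depth-first.
all_tags = [el.tag for el in root.iter()]
print(all_tags)
# ['DOCUMENT', 'MSG', 'TEXT', 'MSG', 'TEXT', 'DETAIL', 'TEXT']

# iter(tag) filters that walk, so it finds TEXT at any depth.
texts = [el.text for el in root.iter("TEXT")]
print(texts)  # ['a', 'b', 'nested']
```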

III. Advanced search syntax

1. "*" matches every child node

>>> root.findall('MSG/*')   # all children of each MSG; only children are matched, not grandchildren

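Here is a small sketch of the wildcard. It uses a hypothetical sample in which one MSG has a TEXT nested one level deeper, inside a hypothetical EXTRA element. `MSG/*` returns EXTRA itself but not the TEXT inside it.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; EXTRA holds a TEXT that is a grandchild of MSG.
SAMPLE = """<DOCUMENT>
  <MSG id="1"><TEXT>a</TEXT><LEVEL>2</LEVEL></MSG>
  <MSG id="2"><TEXT>b</TEXT><EXTRA><TEXT>deep</TEXT></EXTRA></MSG>
</DOCUMENT>"""

root = ET.fromstring(SAMPLE)

# "MSG/*": every child of every MSG child of root, whatever its tag.
matches = root.findall("MSG/*")
tags = [el.tag for el in matches]
print(tags)  # ['TEXT', 'LEVEL', 'TEXT', 'EXTRA']
```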

2. ".//" finds a node at any depth

>>> root.find('.//TEXT')        # found
<Element 'TEXT' at 0x2e9f090>
>>> root.find('TEXT')           # not found: TEXT is not a direct child of root
>>>
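This sketch uses a hypothetical sample in which one TEXT sits two levels below its MSG. It shows that `.//` searches all depths and can also appear in the middle of a path.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; the second TEXT sits inside a DETAIL element.
SAMPLE = """<DOCUMENT>
  <MSG id="1"><TEXT>a</TEXT></MSG>
  <MSG id="2"><DETAIL><TEXT>deep</TEXT></DETAIL></MSG>
</DOCUMENT>"""

root = ET.fromstring(SAMPLE)

print(root.find("TEXT"))                            # None: not a direct child
print(root.find(".//TEXT").text)                    # a (first match in document order)
every_text = [t.text for t in root.findall(".//TEXT")]
print(every_text)                                   # ['a', 'deep']
# "//" mid-path: TEXT at any depth below each MSG child of root.
below_msg = [t.text for t in root.findall("MSG//TEXT")]
print(below_msg)                                    # ['a', 'deep']
```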

3. ".." selects the parent node

>>> root.find('.//TEXT/..')
<Element 'MSG' at 0x2e87fd0>
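ElementTree elements do not store a reference to their parent. As a result, `..` only works inside a path evaluated from an ancestor; `find()` returns `None` if the path tries to go above the element it was called on. This sketch runs on hypothetical data and adds a common workaround: a child-to-parent dictionary built with `iter()`.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample with two MSG/TEXT pairs.
SAMPLE = """<DOCUMENT>
  <MSG id="01ABBC90"><TEXT>a</TEXT></MSG>
  <MSG id="01BF8610"><TEXT>b</TEXT></MSG>
</DOCUMENT>"""

root = ET.fromstring(SAMPLE)

# ".//TEXT/..": the parent of each TEXT found anywhere below root.
parent_ids = [m.get("id") for m in root.findall(".//TEXT/..")]
print(parent_ids)  # ['01ABBC90', '01BF8610']

# Workaround for walking upward from any element: map each child to its parent.
parent_of = {child: parent for parent in root.iter() for child in parent}
text_b = root.findall(".//TEXT")[1]
print(parent_of[text_b].get("id"))  # 01BF8610
```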

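4. "[@attr]" selects elements that have a given attribute

>>> root.find('MSG[@name]')          # no MSG has a name attribute
>>> root.find('MSG[@Type]')          # no MSG has a Type attribute (attribute names are case-sensitive)
>>> root.find('MSG[@type]')          # MSG elements with a type attribute exist; the first is returned
<Element 'MSG' at 0x2e87fd0>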
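This sketch runs on hypothetical data. Attribute names are matched case-sensitively, which is why `[@Type]` finds nothing when the attribute is spelled `type`.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; only the second MSG carries a type attribute.
SAMPLE = """<DOCUMENT>
  <MSG id="1"><TEXT>no type</TEXT></MSG>
  <MSG id="2" type="warning"><TEXT>typed</TEXT></MSG>
</DOCUMENT>"""

root = ET.fromstring(SAMPLE)

print(root.find("MSG[@name]"))            # None: no MSG has a name attribute
print(root.find("MSG[@Type]"))            # None: attribute names are case-sensitive
print(root.find("MSG[@type]").get("id"))  # 2
typed_ids = [m.get("id") for m in root.findall("MSG[@type]")]
print(typed_ids)                          # ['2']
```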

5. Attribute equal to a specific value

>>> root.find('MSG[@id="01BFE3E8"]')   # the value after the = sign must be quoted
<Element 'MSG' at 0x2e9f2b0>
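This sketch runs on hypothetical data. The attribute-value predicate accepts either single or double quotes. ElementTree also supports `[tag='text']`, which tests a child element's text instead of an attribute.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample with two MSG elements of different types.
SAMPLE = """<DOCUMENT>
  <MSG id="01BFC5F0" type="info"><TEXT>ok</TEXT></MSG>
  <MSG id="01BFE3E8" type="error"><TEXT>failed</TEXT></MSG>
</DOCUMENT>"""

root = ET.fromstring(SAMPLE)

# The value must be quoted; single or double quotes both work.
hit = root.find('MSG[@id="01BFE3E8"]')
print(hit.get("type"))                        # error
print(root.find("MSG[@id='NOPE']"))           # None
# [tag='text'] matches MSG elements whose TEXT child has exactly this text.
ok_msg = root.find("MSG[TEXT='ok']")
print(ok_msg.get("id"))                       # 01BFC5F0
```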

6. Position

>>> root.find("MSG[2]")          # the second MSG (positions start at 1)
<Element 'MSG' at 0x2e9f0d0>

>>> root.find("MSG[last()]")     # the last MSG
<Element 'MSG' at 0x2ecdef0>

>>> root.find("MSG[last()-1]")   # the second-to-last MSG
<Element 'MSG' at 0x2ecde30>
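XPath positions are 1-based, unlike Python list indexes. This sketch uses four hypothetical MSG elements and shows the equivalent `findall()` indexing for comparison.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample with four MSG elements.
SAMPLE = """<DOCUMENT>
  <MSG id="m1"/><MSG id="m2"/><MSG id="m3"/><MSG id="m4"/>
</DOCUMENT>"""

root = ET.fromstring(SAMPLE)

print(root.find("MSG[1]").get("id"))         # m1 (positions start at 1, not 0)
print(root.find("MSG[2]").get("id"))         # m2
print(root.find("MSG[last()]").get("id"))    # m4
print(root.find("MSG[last()-1]").get("id"))  # m3

# Equivalent list indexing on findall(): 0-based, negative indexes count from the end.
msgs = root.findall("MSG")
print(msgs[1].get("id"), msgs[-1].get("id"), msgs[-2].get("id"))  # m2 m4 m3
```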


Original post: https://www.cnblogs.com/smulngy/p/8966738.html
