Tags: encode data content doc let net tcl direct etop
import zipfile

# An XPS file is a ZIP package, so unpack it first
f = zipfile.ZipFile(xpsPath, 'r')
f.extractall(xpsUnzipDir)
f.close()
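The extraction step can also be written with a context manager, so the archive is closed even if extraction fails. This is only a minimal sketch: the function name unzip_xps and its parameters are hypothetical and do not come from the original code.

```python
import zipfile


def unzip_xps(xps_path, out_dir):
    """Extract every part of the XPS package into out_dir.

    Returns the list of part names found in the archive.
    (unzip_xps is a hypothetical helper, not part of the original code.)
    """
    with zipfile.ZipFile(xps_path, "r") as package:
        package.extractall(out_dir)
        return package.namelist()
```

The returned part names make it easy to check which page files exist before listing the pages directory.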
coordinateDic = {'Resource': None, 'Pages': []}
# Join the path components separately instead of encoding a backslash path by hand
pagesDir = os.path.join(xpsUnzipDir, 'Documents', '1', 'Pages')
listDir = os.listdir(pagesDir)
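# fs is the open file object for one of the page files listed in listDir
xmlContent = fs.read()
fs.close()
pageXml = etree.XML(xmlContent)
FixedPageChildren = list(pageXml)  # direct children of the FixedPage root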
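The later code reads from a dictionary named xmlDic whose Canvas entries hold 'RenderTransform', 'Clip', 'PathList', 'CanvasList' and 'GlyphsList'. How the original builds it is not shown, so the following is only a sketch of one way to group the page's children into a similar shape. It uses the standard-library xml.etree.ElementTree rather than whatever etree module the original imports, the names group_fixed_page, local_name and parse_matrix are hypothetical, and it only reads RenderTransform when it is written as an attribute.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tags."""
    return tag.rsplit("}", 1)[-1]


def parse_matrix(element):
    """Read RenderTransform as six floats; fall back to the identity matrix."""
    raw = element.get("RenderTransform")
    if raw:
        return [float(value) for value in raw.split(",")]
    return [1, 0, 0, 1, 0, 0]


def children_named(element, name):
    """Direct children of element whose local tag name equals name."""
    return [child for child in element if local_name(child.tag) == name]


def group_fixed_page(xml_text):
    """Group the top-level children of a FixedPage by tag (hypothetical layout)."""
    root = ET.fromstring(xml_text)
    grouped = {"Glyphs": [], "Path": []}
    canvas_count = 0
    for child in root:
        name = local_name(child.tag)
        if name == "Canvas":
            canvas_count += 1
            grouped["Canvas%d" % canvas_count] = {
                "RenderTransform": parse_matrix(child),
                "Clip": child.get("Clip"),
                "PathList": children_named(child, "Path"),
                "CanvasList": children_named(child, "Canvas"),
                "GlyphsList": children_named(child, "Glyphs"),
            }
        elif name in ("Glyphs", "Path"):
            grouped[name].append(child)
    return grouped
```

Giving every Canvas its own key ('Canvas1', 'Canvas2', ...) keeps a test such as `'Canvas' in key` workable, which is the kind of check the loop over xmlDic relies on.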
The approach is classification: each element is handled in one of three ways, depending on whether its tag is 'Glyphs', 'Path' or 'Canvas':
for key in xmlDic.keys():
    if 'Canvas' in key:  # elements that have a Canvas parent node
        canvasRenderTransformMatrix = xmlDic[key]['RenderTransform']
        # Clip restricts the rendering area (anything outside it is not displayed); it is not handled for now
        canvasRootClip = xmlDic[key]['Clip']
        addHLineToPage(pageDic, xmlDic[key]['PathList'], canvasRenderTransformMatrix)
        addVLineToPage(pageDic, xmlDic[key]['PathList'], canvasRenderTransformMatrix)
Here addHLineToPage() extracts the horizontal lines and addVLineToPage() extracts the vertical lines.
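The bodies of these two functions are not shown, so the following is only a sketch of the underlying idea: split a path into segments and label each one as horizontal or vertical. It assumes the Path's Data attribute uses only absolute M and L commands with plain decimal numbers, and the names parse_segments, classify and tolerance are hypothetical.

```python
import re


def parse_segments(path_data):
    """Return [(x1, y1, x2, y2), ...] for a simple 'M x,y L x,y ...' path.

    Assumption: only absolute move/line commands, no curves or exponents.
    """
    numbers = [float(n) for n in re.findall(r"-?\d+(?:\.\d+)?", path_data)]
    points = list(zip(numbers[0::2], numbers[1::2]))
    return [(x1, y1, x2, y2)
            for (x1, y1), (x2, y2) in zip(points, points[1:])]


def classify(segment, tolerance=0.5):
    """Label a segment 'horizontal', 'vertical' or 'other'."""
    x1, y1, x2, y2 = segment
    if abs(y1 - y2) <= tolerance and abs(x1 - x2) > tolerance:
        return "horizontal"
    if abs(x1 - x2) <= tolerance and abs(y1 - y2) > tolerance:
        return "vertical"
    return "other"
```

A small tolerance is used instead of an exact comparison because coordinates in the page markup are floating-point values and a ruled line may be off by a fraction of a unit.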
An inner canvas may contain lines as well, and those must also be extracted:
        for childCanvas in xmlDic[key]['CanvasList']:  # second-level canvas
            if 'RenderTransform' in childCanvas.attrib:
                childCanvasRenderTransformMatrix = [float(it) for it in childCanvas.attrib['RenderTransform'].split(',')]
            else:
                childCanvasRenderTransformMatrix = [1, 0, 0, 1, 0, 0]  # identity matrix
            childCanvasChildren = list(childCanvas)
            childCanvasPathList = []
            for child in childCanvasChildren:
                if 'Path' in child.tag:
                    childCanvasPathList.append(child)
            # Pass the inner matrix first, then the outer canvas matrix
            addHLineToPage(pageDic, childCanvasPathList, childCanvasRenderTransformMatrix,
                           canvasRenderTransformMatrix)
            addVLineToPage(pageDic, childCanvasPathList, childCanvasRenderTransformMatrix,
                           canvasRenderTransformMatrix)
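Passing both matrices matters because a point inside the inner canvas must go through the inner transform first and the outer canvas transform second before it is in page coordinates. The sketch below illustrates this, assuming the six numbers follow the usual m11,m12,m21,m22,offsetX,offsetY order of the abbreviated matrix syntax. The helper names apply_matrix and to_page_coords are hypothetical, and how addHLineToPage() really combines its matrix arguments is not shown in the original.

```python
def apply_matrix(matrix, x, y):
    """Apply a 2D affine matrix given as [m11, m12, m21, m22, dx, dy]."""
    m11, m12, m21, m22, dx, dy = matrix
    return (m11 * x + m21 * y + dx,
            m12 * x + m22 * y + dy)


def to_page_coords(x, y, *matrices):
    """Apply the matrices in order, innermost canvas first."""
    for matrix in matrices:
        x, y = apply_matrix(matrix, x, y)
    return x, y


child_matrix = [1, 0, 0, 1, 10, 20]   # inner canvas: translate by (10, 20)
parent_matrix = [2, 0, 0, 2, 0, 0]    # outer canvas: scale by 2
page_point = to_page_coords(1, 1, child_matrix, parent_matrix)
```

With these example matrices the point (1, 1) is first moved to (11, 21) and then scaled to (22, 42). Applying the matrices in the opposite order would give a different result, which is why the order of the arguments is significant.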
        addHLineToPage(pageDic, xmlDic[key], canvasRenderTransformMatrix)
        addVLineToPage(pageDic, xmlDic[key], canvasRenderTransformMatrix)
        for glyph in xmlDic[key]['GlyphsList']:
            extractGlyphs(glyph, pageDic)
            extractGlyphs(glyph, pageDic)
coordinateDic['Pages'].append(pageDic)
dataDic = {'Resource': coordinateDic['Resource'], 'Pages': coordinateDic['Pages']}
dataDic = Test.getInTableTextCoodinate(dataDic, 'xps')
coordinateDic['Pages'] = dataDic['Pages']
Original article: https://www.cnblogs.com/monkey-moon/p/9255617.html