码迷,mamicode.com
首页 > 其他好文 > 详细

关于文档处理的各种脚本

时间:2019-12-17 01:00:32      阅读:101      评论:0      收藏:0      [点我收藏+]

标签:ica   stack   链接   nta   shu   end   false   extension   enum   

  暂时记录,改天再整理

import re
import os
import pandas as pd
from requests import get
from docx import Document
import win32com.client as win
import subprocess

# dir2 = ‘C:/Users/User/Documents/gzzw/‘
# names = os.listdir(dir2)
# data = {}
# word = win.Dispatch(‘Word.Application‘)
# for (i, n) in enumerate(names):
    # subprocess.call([‘soffice‘, ‘--headless‘, ‘--convert-to‘, ‘docx‘, ‘--outdir‘, dir2 + str(i) + ‘.docx‘, dir + n])
    # doc = Document(docx=dir2 + n)
    # table = doc.tables[1]
    # row = table.rows[0]
    # row1 = table.rows[1]
    # for k, v in zip(row.cells, row1.cells):
    #     if i == 0:
    #         data[k.text] = [v.text]
    #     else:
    #         data[k.text].append(v.text)
    # gs = re.match(r‘.*_(.*)_.*‘, n)
    # dn.append(gs.group(1))
    # os.rename(dir + n, dir + str(i) + ‘.doc‘)
    # doc = word.Documents.Open(dir + n)
    # doc.SaveAs(dir2 + str(i) + ‘.docx‘, FileFormat=12)
    # table = doc.Tables(2)
    # for j in range(table.Columns.Count):
        # print(table.Cell(Row=1, Column=i + 1).Range.Text)
        # label.append(table.Cell(Row=1, Column=i + 1).Range.Text.encode(‘utf8‘))
        # dn.append(table.Cell(Row=2, Column=i + 1).Range.Text.encode(‘utf8‘))

# word.Quit()

# sheel = pd.DataFrame(data)
# sheel.to_excel(dir2 + ‘statics.xlsx‘, index=False, encoding=‘utf8‘)

  一些参考链接:

    1.https://code.activestate.com/recipes/279003-converting-word-documents-to-text/

    2.https://stackoverflow.com/questions/1468099/python-win32-extensions-documentation

    3.https://stackoverflow.com/questions/10366596/how-to-read-contents-of-an-table-in-ms-word-file-using-python

    4.https://stackoverflow.com/questions/38468442/multiple-doc-to-docx-file-conversion-using-python

    5.https://www.jianshu.com/p/4fa504c720c1

关于文档处理的各种脚本

标签:ica   stack   链接   nta   shu   end   false   extension   enum   

原文地址:https://www.cnblogs.com/darkchii/p/12051950.html

(0)
(0)
   
举报
评论 一句话评论(0
登录后才能评论!
© 2014 mamicode.com 版权所有  联系我们:gaon5@hotmail.com
迷上了代码!