How to read a Word document with Python

To read a plain text file, use Python's built-in open() function:

f = None
try:
    f = open('/file', 'r')
    print(f.read())
finally:
    # Close the file even if reading failed; f stays None if open() itself raised
    if f:
        f.close()
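Modern Python usually writes this with a with statement, which closes the file automatically even when reading raises an error. Below is a minimal sketch. The helper name read_text_file and the UTF-8 default encoding are assumptions made for illustration:

```python
def read_text_file(path, encoding='utf-8'):
    """Return the whole contents of a text file (hypothetical helper)."""
    # Leaving the with-block closes the file, replacing the try/finally above
    with open(path, 'r', encoding=encoding) as f:
        return f.read()
```

For example, print(read_text_file('/file')) does the same job as the try/finally version.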

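To read Word documents, the third-party library python-docx is recommended. You can download it from its official site or install it with pip install python-docx. Note that it reads .docx files, not the older binary .doc format.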

Usage:

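import docx

file_path = 'your_file.docx'  # placeholder: path to your .docx file
document = docx.Document(file_path)
# paragraph.text is already a str in Python 3, so no encode() is needed
docText = '\n\n'.join(paragraph.text for paragraph in document.paragraphs)
print(docText)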

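You can also read a .docx file without python-docx, because a .docx file is a ZIP archive of XML files. Step 1: extract the XML part that holds the document body.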

import zipfile

def get_word_xml(docx_filename):
    # A .docx file is a ZIP archive; ZipFile opens it in binary mode itself
    with zipfile.ZipFile(docx_filename) as zf:
        xml_content = zf.read('word/document.xml')
    return xml_content
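Listing the parts inside a .docx archive shows why step 1 reads word/document.xml. Below is a small sketch that uses only the standard library. The helper name list_docx_parts is hypothetical. The demo builds a stripped-down archive in memory instead of using a real Word file, which contains more parts:

```python
import io
import zipfile

def list_docx_parts(docx_file):
    """Return the names of all parts stored in a .docx (ZIP) archive."""
    with zipfile.ZipFile(docx_file) as zf:
        return zf.namelist()

# Demo: a hand-made archive with two typical part names
buf = io.BytesIO()
with zipfile.ZipFile(buf, 'w') as zf:
    zf.writestr('[Content_Types].xml', '<Types/>')
    zf.writestr('word/document.xml', '<document/>')
buf.seek(0)
print(list_docx_parts(buf))  # ['[Content_Types].xml', 'word/document.xml']
```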

Step 2: parse the XML into a tree structure.

from lxml import etree

def get_xml_tree(xml_string):
    return etree.fromstring(xml_string)
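After parsing, each element's tag is stored in {namespace}localname form, which is why step 3 compares tags against a '{%s}%s' string. The quick check below uses the standard library's xml.etree.ElementTree instead of lxml so that it needs no third-party install; lxml reports tags in the same form. The sample XML is a hand-written minimal fragment, not taken from a real document:

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace (the 'w:' prefix in document.xml)
W_NS = 'http://schemas.openxmlformats.org/wordprocessingml/2006/main'

xml_string = (
    '<w:document xmlns:w="%s"><w:body><w:p><w:r><w:t>Hi</w:t></w:r></w:p>'
    '</w:body></w:document>' % W_NS
)
root = ET.fromstring(xml_string)
# The 'w:' prefix is expanded to the full namespace URI in braces
print(root.tag)  # {http://schemas.openxmlformats.org/wordprocessingml/2006/main}document
```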

Step 3: read the Word content:

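# WordprocessingML main namespace (the original used the placeholder '99999')
WORD_SCHEMA = 'http://schemas.openxmlformats.org/wordprocessingml/2006/main'

def _check_element_is(element, type_char):
    # Parsed tags look like '{namespace}t', so build the same form to compare
    return element.tag == '{%s}%s' % (WORD_SCHEMA, type_char)

def _itertext(my_etree):
    """Iterator to go through xml tree's text nodes"""
    # tag=etree.Element skips comments and processing instructions
    for node in my_etree.iter(tag=etree.Element):
        if _check_element_is(node, 't'):
            yield (node, node.text)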
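The sketch below puts the three steps together and returns one string per paragraph, built by joining the w:t runs inside each w:p element. It uses the standard library's xml.etree.ElementTree in place of lxml. The names read_docx_paragraphs and make_minimal_docx are hypothetical. The demo archive contains only word/document.xml, so it is not a complete Word file:

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# WordprocessingML main namespace, in ElementTree's {namespace} tag form
W = '{http://schemas.openxmlformats.org/wordprocessingml/2006/main}'


def read_docx_paragraphs(docx_file):
    """Return a list with the text of each w:p paragraph.

    docx_file may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(docx_file) as zf:
        root = ET.fromstring(zf.read('word/document.xml'))
    paragraphs = []
    for p in root.iter(W + 'p'):
        # A paragraph's text is split across one or more w:t runs
        paragraphs.append(''.join(t.text or '' for t in p.iter(W + 't')))
    return paragraphs


def make_minimal_docx(lines):
    """Build an in-memory ZIP holding only word/document.xml (demo only)."""
    body = ''.join(
        '<w:p><w:r><w:t>%s</w:t></w:r></w:p>' % escape(line) for line in lines
    )
    xml = (
        '<w:document xmlns:w="http://schemas.openxmlformats.org/'
        'wordprocessingml/2006/main"><w:body>%s</w:body></w:document>' % body
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, 'w') as zf:
        zf.writestr('word/document.xml', xml)
    buf.seek(0)
    return buf


print(read_docx_paragraphs(make_minimal_docx(['Hello', 'World'])))  # ['Hello', 'World']
```

With a real file, call read_docx_paragraphs('your_file.docx'). This collects only the raw text runs from the document body. Headers and footers live in separate parts of the archive, so they are not included.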

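Another approach works only on Windows with Microsoft Word installed: drive Word itself through COM.

>>> def PrintAllParagraphs(doc):
...     count = doc.Paragraphs.Count
...     # Visits paragraphs from last to first, so the last one prints first
...     for i in range(count - 1, -1, -1):
...         pr = doc.Paragraphs[i].Range
...         print(pr.Text)
...
>>> app = my.Office.Word.GetInstance()
>>> doc = app.Documents[0]
>>> PrintAllParagraphs(doc)
1.什么是域
域应用基础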

@staticmethod
def GetInstance():
    '''Return Word's Application object'''
    import win32com.client
    return win32com.client.Dispatch('Word.Application')

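The code above shows how my.Office.Word.GetInstance is implemented. It is the author's own wrapper around Word's COM interface, which it controls through win32com (part of the pywin32 package).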

Every Paragraph object exposes its text through Paragraph.Range.Text.

Source: 内存溢出 (outofmemory.cn). Original article: http://outofmemory.cn/sjk/9413602.html
