How to parse docx documents on Android

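On Android, the usual way to view Word documents is to install an app such as WPS or Office, which lets users open the document directly. For app developers, though, it is sometimes worth building Word viewing into their own app, so that it does not depend on whether a third-party app is installed.

Word viewing can be integrated into an app in several ways: for example, by buying a dedicated SDK such as Aspose (which costs money), or by having a server convert the document into images or HTML that the Android client then requests. For most individual developers, both approaches are rather heavyweight.

Below are two ways to parse docx files specifically: docx4j and POI.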

Docx4j

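GitHub: https://github.com/plutext/AndroidDocxToHtml

This is the official demo and can be used more or less as-is. The output covers most formatting and the styling stays fairly close to the original document, but parsing speed is poor: on a phone, a typical docx took 30 s or more, even when the test file was only a few dozen KB. For larger, more complex documents the wait is unbearable.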

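Bugs encountered during parsing tests

1. Missing tables and content: with nested tables (a table inside a table), part of the content and styling is lost.

2. Table styling (tables again?): if a table extends beyond the page margins in the Word layout, the content past the margin disappears.

3. Garbled table of contents: if the document has a table of contents, hyperlinks are added to it and have to be removed by hand.

4. Images that cannot be parsed: some image formats, such as EMF and WMF, are not handled.

5. Comments not displayed: I have not found where comments end up, so treat them as lost for now; I will look into it again later.

6. ...and other issues not yet discovered.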
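Several of the problems above can at least be detected before conversion, so the app can warn the user or choose another path. A .docx file is a ZIP archive: the body lives in word/document.xml, comments in word/comments.xml, and embedded images under word/media/. The sketch below is a hypothetical pre-flight check (DocxPreflight and its fields are illustrative names, not part of docx4j or POI) that uses only the JDK's java.util.zip and DOM parser to count nested tables (item 1), list EMF/WMF images (item 4) and count comments (item 5). It only reports; it does not fix anything.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

/** Hypothetical pre-flight check for docx features that broke the docx4j conversion above. */
public class DocxPreflight {
    // WordprocessingML main namespace used by word/document.xml and word/comments.xml
    static final String W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    public int nestedTables;                                    // w:tbl elements inside another w:tbl
    public final List<String> vectorImages = new ArrayList<>(); // EMF/WMF entries under word/media/
    public int comments;                                        // w:comment elements in word/comments.xml

    public static DocxPreflight scan(InputStream docx) throws Exception {
        DocxPreflight report = new DocxPreflight();
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            for (ZipEntry entry; (entry = zip.getNextEntry()) != null; ) {
                String name = entry.getName();
                String lower = name.toLowerCase(Locale.ROOT);
                if (name.startsWith("word/media/") && (lower.endsWith(".emf") || lower.endsWith(".wmf"))) {
                    report.vectorImages.add(name);
                } else if (name.equals("word/document.xml")) {
                    NodeList tables = parseEntry(zip).getElementsByTagNameNS(W_NS, "tbl");
                    for (int i = 0; i < tables.getLength(); i++) {
                        if (insideTable(tables.item(i))) report.nestedTables++;
                    }
                } else if (name.equals("word/comments.xml")) {
                    report.comments = parseEntry(zip).getElementsByTagNameNS(W_NS, "comment").getLength();
                }
            }
        }
        return report;
    }

    // True if any ancestor of the node is itself a w:tbl element
    private static boolean insideTable(Node node) {
        for (Node p = node.getParentNode(); p != null; p = p.getParentNode()) {
            if (W_NS.equals(p.getNamespaceURI()) && "tbl".equals(p.getLocalName())) return true;
        }
        return false;
    }

    // Buffers the current ZIP entry first: DocumentBuilder.parse may close the stream
    // it is given, which would also close the ZipInputStream.
    private static Document parseEntry(ZipInputStream zip) throws Exception {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        for (int n; (n = zip.read(chunk)) != -1; ) buf.write(chunk, 0, n);
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newDocumentBuilder().parse(new ByteArrayInputStream(buf.toByteArray()));
    }
}
```

A report with nestedTables > 0 or a non-empty vectorImages list is a hint that the rendered result will be incomplete for this file.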

POI

POI is an open-source Apache project; no need to say more, just download it from the official site.

Official site: http://poi.apache.org/

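If you use Android Studio, it is simple.

Just add the dependency (the version number may vary; Gradle downloads the related dependencies automatically). Current Gradle versions use the implementation configuration, since compile has been removed:

implementation 'fr.opensagres.xdocreport:org.apache.poi.xwpf.converter.xhtml:1.0.5'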

If you use Eclipse (buddy, switch to Studio already),

you need to add the following jars manually:

poi, poi-ooxml, ooxml-schemas, org.apache.poi.xwpf.converter.xhtml, org.apache.poi.xwpf.converter.core


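Problems with POI parsing

It is a bit faster than docx4j, but some document content may not be parsed completely and some styles are lost.

Workflow

Call the converter to turn the docx into HTML, then load that HTML in a WebView inside the app to display it.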
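Because a conversion can take tens of seconds (see the docx4j timings above), it should not run on the UI thread. Here is a minimal sketch of the workflow, assuming a hypothetical Converter interface that wraps whichever library you use (for instance the POI code in this article) and java.util.concurrent.CompletableFuture, which Android provides from API level 24. DocxHtmlPipeline and open() are illustrative names:

```java
import java.io.File;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Consumer;

/** Hypothetical helper: runs a slow docx -> HTML conversion off the UI thread. */
public class DocxHtmlPipeline {

    /** Hypothetical: any converter that writes an HTML file for a docx and returns it. */
    public interface Converter {
        File convert(File docx) throws Exception;
    }

    // One daemon worker thread: conversions run one at a time and never keep the process alive
    private final ExecutorService worker = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "docx-to-html");
        t.setDaemon(true);
        return t;
    });

    /**
     * Converts on the worker thread, then calls onReady on uiThread with the HTML file.
     * Failures are not passed to onReady; they surface through the returned future.
     */
    public CompletableFuture<File> open(File docx, Converter converter,
                                        Executor uiThread, Consumer<File> onReady) {
        return CompletableFuture.supplyAsync(() -> {
            try {
                return converter.convert(docx);
            } catch (Exception e) {
                throw new CompletionException(e);
            }
        }, worker).whenCompleteAsync((html, error) -> {
            if (error == null) onReady.accept(html);
        }, uiThread);
    }
}
```

On Android you might pass activity::runOnUiThread as uiThread and call webView.loadUrl("file://" + html.getAbsolutePath()) inside onReady; errors can be handled on the returned future, for example with exceptionally(...).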

The conversion code is as follows (I really have to ask: how do you get code formatting right in this editor? So frustrating):

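import android.content.Context;
import android.util.Log;
import org.apache.poi.xwpf.converter.core.BasicURIResolver;
import org.apache.poi.xwpf.converter.core.FileImageExtractor;
import org.apache.poi.xwpf.converter.xhtml.XHTMLConverter;
import org.apache.poi.xwpf.converter.xhtml.XHTMLOptions;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;

// Inside an Activity method: `file` is the .docx to convert; TAG and baseUrl are fields
try (InputStream is = new FileInputStream(file);
     // Write the HTML straight to a file ("/sdcard/xxx/html文件" is the original placeholder path)
     OutputStream os = new FileOutputStream("/sdcard/xxx/html文件")) {
    XWPFDocument docx = new XWPFDocument(is);
    // Directory that receives the images extracted from the docx
    String imgDesPath = "/sdcard/img";
    File imgFile = new File(imgDesPath);
    // Base URL of the app-private "image" directory (File.toURL() is deprecated)
    this.baseUrl = this.getDir("image", Context.MODE_PRIVATE).toURI().toURL().toString();
    if (!imgFile.exists()) {
        imgFile.mkdirs(); // the original called file.mkdirs(), i.e. on the docx file itself
    }
    // Prefix the image src attributes in the HTML with imgDesPath
    XHTMLOptions options = XHTMLOptions.create().URIResolver(new BasicURIResolver(imgDesPath));
    // Save the embedded images under imgFile
    options.setExtractor(new FileImageExtractor(imgFile));
    options.setIgnoreStylesIfUnused(false);
    // Produce an HTML fragment instead of a complete page
    options.setFragment(true);
    XHTMLConverter.getInstance().convert(docx, os, options);
} catch (Exception e) {
    Log.d(TAG, "catch " + e.getMessage());
}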

Then simply load the generated HTML file in the WebView.
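Two small post-processing steps can help before the HTML reaches the WebView. The code above sets setFragment(true), which, as I understand XDocReport's option, emits the body content without a full html/head wrapper; wrapping it in a minimal page with a UTF-8 meta tag makes the encoding explicit, which matters for Chinese text. And if table-of-contents hyperlinks like those in bug 3 of the docx4j list appear, they can be unwrapped, assuming they point at Word's usual _Toc bookmarks. HtmlForWebView is a hypothetical helper, and its regex is a heuristic, not a real HTML parser:

```java
import java.util.regex.Pattern;

/** Hypothetical helper that prepares converter output for a WebView. */
public final class HtmlForWebView {
    // Assumes TOC entries are emitted as <a href="#_Toc...">...</a>, Word's usual TOC bookmark names
    private static final Pattern TOC_LINK = Pattern.compile(
        "<a\\b[^>]*href=\"#_Toc[^\"]*\"[^>]*>(.*?)</a>",
        Pattern.DOTALL | Pattern.CASE_INSENSITIVE);

    private HtmlForWebView() {}

    /** Replaces each TOC hyperlink with its inner content; other links are left alone. */
    public static String unwrapTocLinks(String html) {
        return TOC_LINK.matcher(html).replaceAll("$1");
    }

    /** Wraps an HTML fragment in a minimal UTF-8 page sized for a phone screen. */
    public static String wrapFragment(String fragment) {
        return "<!DOCTYPE html><html><head><meta charset=\"utf-8\">"
            + "<meta name=\"viewport\" content=\"width=device-width, initial-scale=1\">"
            + "</head><body>" + fragment + "</body></html>";
    }
}
```

For example, read the generated file into a String html and call webView.loadDataWithBaseURL(baseUrl, HtmlForWebView.wrapFragment(HtmlForWebView.unwrapTocLinks(html)), "text/html", "utf-8", null).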

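I don't know which specific elements of the Word file you want to read, so here are two code examples, one reading text and one reading images, for your reference:

1. Read text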

import com.spire.doc.Document;

import java.io.FileWriter;
import java.io.IOException;

public class ExtractText {

    public static void main(String[] args) throws IOException {
        // Load the Word document
        Document document = new Document();
        document.loadFromFile("C:\\Users\\Administrator\\Desktop\\sample.docx");

        // Get the document's text as a String
        String text = document.getText();

        // Write the String to a txt file
        writeStringToTxt(text, "ExtractedText.txt");
    }

    public static void writeStringToTxt(String content, String txtFileName) throws IOException {
        // Append mode (true); try-with-resources flushes and closes the writer
        try (FileWriter fWriter = new FileWriter(txtFileName, true)) {
            fWriter.write(content);
        }
    }
}
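If only the plain text is needed and bundling spire.doc.jar (or POI) is not an option on Android, a rough substitute is to read word/document.xml straight out of the .docx ZIP. This hypothetical DocxPlainText sketch is a different technique from the Spire example above: using only the JDK, it concatenates the w:t text runs of each w:p paragraph. Tabs, line breaks, fields and list numbering are ignored, and text inside text boxes may appear twice because those paragraphs are nested inside other paragraphs.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical JDK-only text extractor for .docx files. */
public final class DocxPlainText {
    private static final String W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    private DocxPlainText() {}

    /** Returns the text of word/document.xml, one line per paragraph (w:p). */
    public static String extract(InputStream docx) throws Exception {
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            for (ZipEntry entry; (entry = zip.getNextEntry()) != null; ) {
                if (!entry.getName().equals("word/document.xml")) continue;
                // Buffer the entry so the XML parser cannot close the ZIP stream
                ByteArrayOutputStream buf = new ByteArrayOutputStream();
                byte[] chunk = new byte[8192];
                for (int n; (n = zip.read(chunk)) != -1; ) buf.write(chunk, 0, n);
                DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
                factory.setNamespaceAware(true);
                Document xml = factory.newDocumentBuilder().parse(new ByteArrayInputStream(buf.toByteArray()));

                StringBuilder out = new StringBuilder();
                NodeList paragraphs = xml.getElementsByTagNameNS(W_NS, "p");
                for (int i = 0; i < paragraphs.getLength(); i++) {
                    // All w:t text nodes below this paragraph, in document order
                    NodeList texts = ((Element) paragraphs.item(i)).getElementsByTagNameNS(W_NS, "t");
                    for (int j = 0; j < texts.getLength(); j++) out.append(texts.item(j).getTextContent());
                    out.append('\n');
                }
                return out.toString();
            }
        }
        throw new IllegalArgumentException("word/document.xml not found; is this a .docx file?");
    }
}
```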

2. Read images

import com.spire.doc.Document;
import com.spire.doc.documents.DocumentObjectType;
import com.spire.doc.fields.DocPicture;
import com.spire.doc.interfaces.ICompositeObject;
import com.spire.doc.interfaces.IDocumentObject;

import javax.imageio.ImageIO;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;
import java.util.Queue;

public class ExtractImages {

    public static void main(String[] args) throws IOException {
        // Load the Word document
        Document document = new Document();
        document.loadFromFile("C:\\Users\\Administrator\\Desktop\\sample.docx");

        // Queue for a breadth-first walk over the document's object tree
        Queue<ICompositeObject> nodes = new LinkedList<>();
        nodes.add(document);

        // List that collects the images found
        List<BufferedImage> images = new ArrayList<>();

        // Traverse the child objects of the document
        while (!nodes.isEmpty()) {
            ICompositeObject node = nodes.poll();
            for (int i = 0; i < node.getChildObjects().getCount(); i++) {
                IDocumentObject child = node.getChildObjects().get(i);
                // Composite children (sections, paragraphs, tables, ...) are visited later
                if (child instanceof ICompositeObject) {
                    nodes.add((ICompositeObject) child);
                }
                // Get pictures and add them to the list
                if (child.getDocumentObjectType() == DocumentObjectType.Picture) {
                    DocPicture picture = (DocPicture) child;
                    images.add(picture.getImage());
                }
            }
        }

        // Save the images as PNG files (ImageIO.write fails if the directory is missing)
        new File("output").mkdirs();
        for (int i = 0; i < images.size(); i++) {
            File file = new File(String.format("output/image-%d.png", i));
            ImageIO.write(images.get(i), "PNG", file);
        }
    }
}

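Note that these examples use spire.doc.jar, which must be imported into the Java project first. They are written for desktop Java: the paths are Windows paths, and the image example relies on javax.imageio and java.awt, which Android does not provide.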
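Images can also be pulled out without Spire (and without javax.imageio) by copying the entries under word/media/ inside the .docx ZIP. The hypothetical DocxMedia helper below does that with java.util.zip only; unlike the Spire example, it keeps each image in its original format (an EMF stays an EMF) instead of converting it to PNG.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

/** Hypothetical helper: copies the images embedded in a .docx (word/media/*) into a directory. */
public final class DocxMedia {
    private DocxMedia() {}

    /** Returns the files written, in the order they appear in the archive. */
    public static List<File> extract(InputStream docx, File outDir) throws IOException {
        if (!outDir.isDirectory() && !outDir.mkdirs()) {
            throw new IOException("Cannot create " + outDir);
        }
        List<File> written = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            for (ZipEntry entry; (entry = zip.getNextEntry()) != null; ) {
                String name = entry.getName();
                if (entry.isDirectory() || !name.startsWith("word/media/")) continue;
                // Keep only the last path segment so entries like "word/media/../../x" stay inside outDir
                String fileName = new File(name).getName();
                if (fileName.isEmpty() || fileName.equals(".") || fileName.equals("..")) continue;
                File target = new File(outDir, fileName);
                // Copy the bytes unchanged: PNG stays PNG, EMF/WMF stay EMF/WMF
                try (OutputStream os = new FileOutputStream(target)) {
                    byte[] chunk = new byte[8192];
                    for (int n; (n = zip.read(chunk)) != -1; ) os.write(chunk, 0, n);
                }
                written.add(target);
            }
        }
        return written;
    }
}
```

On Android, outDir could be an app-private directory such as the one returned by getDir("image", Context.MODE_PRIVATE) in the POI code above.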

Source: 内存溢出 (outofmemory.cn). Original article: http://outofmemory.cn/bake/11875292.html
