How do I convert a Microsoft Word document to a Word document? (Tutorial)

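First, prepare a test .odt file (for demonstration). Search Baidu for openoffice, then download and install the software.

Run OpenOffice and choose Open File. Select the file you want to open in the file path, and it will open.

Once the file is open, simply save the document in Word format with Save As, and you are done.

Apache OpenOffice Writer can open and save Word file formats.

Apache OpenOffice is an advanced open-source office software suite that includes text documents, spreadsheets, presentations, drawings, databases, and more. It supports many languages and works on all common computers. It stores all your data in an international open standard format and can read and write files from other common office software packages. It can be downloaded and used completely free of charge for any purpose.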

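Converting Word to HTML works like this:

1. The client uploads a Word document to the server.

2. The server calls OpenOffice to open the uploaded Word document.

3. OpenOffice saves the Word document in HTML format.

4. Done.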

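As you can see, this requires OpenOffice to be installed on the server. MS Office would also work, but OpenOffice has the advantage of being cross-platform. Note that this article was tested on MS Win7 Ultimate X64.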
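The flow above can be sketched in plain Java before any OpenOffice code is involved. This is only an illustration: the names `UploadToHtmlSketch`, `DocumentToHtml`, and `handleUpload` are hypothetical, and the converter used in `main` is a fake that stands in for the "open in OpenOffice and save as HTML" step.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class UploadToHtmlSketch {

    /** Hypothetical stand-in for "open the document in OpenOffice and save it as HTML". */
    interface DocumentToHtml {
        void convert(Path source, Path target) throws IOException;
    }

    /** Stores the upload, hands it to the converter, and reads the HTML back. */
    static String handleUpload(byte[] uploadedBytes, DocumentToHtml converter)
            throws IOException {
        Path workDir = Files.createTempDirectory("doc2html");
        Path source = workDir.resolve("upload.doc");
        Path target = workDir.resolve("upload.html");
        try {
            Files.write(source, uploadedBytes);   // step 1: the uploaded file lands on the server
            converter.convert(source, target);    // steps 2 and 3: open it and save it as HTML
            return new String(Files.readAllBytes(target), StandardCharsets.UTF_8);
        } finally {
            // Clean up the temporary files whether or not the conversion succeeded
            Files.deleteIfExists(source);
            Files.deleteIfExists(target);
            Files.deleteIfExists(workDir);
        }
    }

    public static void main(String[] args) throws IOException {
        // Fake converter (assumption): wraps the raw text in a <p> instead of calling OpenOffice
        DocumentToHtml fake = (src, dst) -> {
            String text = new String(Files.readAllBytes(src), StandardCharsets.UTF_8);
            Files.write(dst, ("<html><body><p>" + text + "</p></body></html>")
                    .getBytes(StandardCharsets.UTF_8));
        };
        System.out.println(handleUpload("hello".getBytes(StandardCharsets.UTF_8), fake));
    }
}
```

Keeping the conversion behind a small interface means the OpenOffice-based converter could be swapped for MS Office or another backend without touching the upload handling.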

What follows is the step-by-step implementation.

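1. Download OpenOffice.

2. Download Jodconverter. This is a third-party jar that drives OpenOffice to perform format conversion.

3. Make a cup of hot tea and wait for the downloads.

4. Install OpenOffice. When the installation finishes, open cmd and start OpenOffice as a service: C:\Program Files (x86)\OpenOffice.org 3\program>soffice -headless -accept="socket,port=8100;urp;"

5. Open eclipse.

6. Drink the hot tea while waiting for eclipse to open.

7. Create a new eclipse project and import the jars under Jodconverter/lib: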

* commons-io

* jodconverter

* juh

* jurt

* ridl

* slf4j-api

* slf4j-jdk14

* unoil

* xstream

8. Coding...
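Before running any conversion, it can help to confirm that the service started in step 4 is actually accepting connections on port 8100. The following is a minimal sketch using only the JDK. The class name `PortCheck`, the method `isListening`, and the one-second timeout are assumptions made for illustration; they are not part of Jodconverter.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class PortCheck {

    /** Returns true if something accepts TCP connections on host:port within timeoutMs. */
    static boolean isListening(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // Connection refused or timed out: nothing usable is listening
            return false;
        }
    }

    public static void main(String[] args) {
        // 8100 is the port passed to soffice in step 4
        System.out.println("OpenOffice service reachable: "
                + isListening("127.0.0.1", 8100, 1000));
    }
}
```

A successful check only shows that some process is listening on the port; it does not prove that the process is OpenOffice.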


package com.mzule.doc2html.util;

import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.ConnectException;
import java.util.Date;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import com.artofsolving.jodconverter.DocumentConverter;
import com.artofsolving.jodconverter.openoffice.connection.OpenOfficeConnection;
import com.artofsolving.jodconverter.openoffice.connection.SocketOpenOfficeConnection;
import com.artofsolving.jodconverter.openoffice.converter.OpenOfficeDocumentConverter;

/**
 * Utility class that converts a Word document into an HTML string.
 *
 * @author MZULE
 */
public class Doc2Html {

    public static void main(String[] args) {
        System.out.println(toHtmlString(new File("C:/test/test.doc"), "C:/test"));
    }

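    /**
     * Converts a Word document to an HTML file.
     *
     * @param docFile  the Word document to convert
     * @param filepath the directory in which to store the generated HTML file
     * @return the generated HTML file
     */
    public static File convert(File docFile, String filepath) {
        // Create the target HTML file, named after the current timestamp
        File htmlFile = new File(filepath + "/" + new Date().getTime() + ".html");
        // Create the OpenOffice connection (port 8100, as started in step 4)
        OpenOfficeConnection con = new SocketOpenOfficeConnection(8100);
        try {
            // Connect
            con.connect();
        } catch (ConnectException e) {
            // Stop here: a conversion cannot succeed without a connection
            throw new IllegalStateException("Failed to connect to OpenOffice", e);
        }
        try {
            // Create the converter
            DocumentConverter converter = new OpenOfficeDocumentConverter(con);
            // Convert the document to HTML
            converter.convert(docFile, htmlFile);
        } finally {
            // Close the OpenOffice connection even if the conversion fails
            con.disconnect();
        }
        return htmlFile;
    }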

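    /**
     * Converts a Word document to an HTML file and returns the HTML source.
     *
     * @param docFile  the document to convert
     * @param filepath the directory where the document's images are stored
     * @return the cleaned HTML source
     */
    public static String toHtmlString(File docFile, String filepath) {
        // Convert the Word document to a temporary HTML file
        File htmlFile = convert(docFile, filepath);
        // Read the HTML file into one string; line breaks are dropped on purpose
        // so that the single-line regular expressions in clearFormat can match
        StringBuilder htmlSb = new StringBuilder();
        try (BufferedReader br = new BufferedReader(new InputStreamReader(
                new FileInputStream(htmlFile)))) {
            String line;
            while ((line = br.readLine()) != null) {
                htmlSb.append(line);
            }
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
        // Delete the temporary file
        htmlFile.delete();
        // The HTML file as a string
        String htmlStr = htmlSb.toString();
        // Return the cleaned HTML text
        return clearFormat(htmlStr, filepath);
    }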

    /**
     * Removes HTML markup that is not needed.
     *
     * @param htmlStr    HTML source containing complex markup
     * @param docImgPath path prefix to put in front of image addresses
     * @return the HTML source with the unneeded markup removed
     */
    protected static String clearFormat(String htmlStr, String docImgPath) {
        // Regular expression that captures the body content
        String bodyReg = "<BODY .*</BODY>";
        Pattern bodyPattern = Pattern.compile(bodyReg);
        Matcher bodyMatcher = bodyPattern.matcher(htmlStr);
        if (bodyMatcher.find()) {
            // Take the BODY content and turn the BODY tag into a DIV
            htmlStr = bodyMatcher.group().replaceFirst("<BODY", "<DIV")
                    .replaceAll("</BODY>", "</DIV>");
        }
        // Adjust image addresses; quoteReplacement keeps '\' and '$' in the path literal
        htmlStr = htmlStr.replaceAll("<IMG SRC=\"",
                Matcher.quoteReplacement("<IMG SRC=\"" + docImgPath + "/"));
        // Alternative: convert <P></P> to <div></div> and keep the style
        // content = content.replaceAll("(<P)([^>]*>.*?)(<\\/P>)",
        // "<div$2</div>")
        // Normalize <P ...></P> to <p></p> and drop the style
        htmlStr = htmlStr.replaceAll("(<P)([^>]*)(>.*?)(<\\/P>)", "<p$3</p>");
        // Remove tags that are not needed
        htmlStr = htmlStr.replaceAll(
                "<[/]?(font|FONT|span|SPAN|xml|XML|del|DEL|ins|INS|meta|META|[ovwxpOVWXP]:\\w+)[^>]*?>",
                "");

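        // Remove attributes that are not needed.
        // NOTE: the original pattern is missing from this copy of the article; only the
        // replacement "<$1$2>" survived. The pattern below is an assumed reconstruction
        // that strips one lang/class/style/size/face attribute per tag.
        htmlStr = htmlStr.replaceAll(
                "<([^>]*?)\\s+(?:lang|LANG|class|CLASS|style|STYLE|size|SIZE|face|FACE)"
                        + "=(?:'[^']*'|\"[^\"]*\"|[^\\s>]+)([^>]*)>",
                "<$1$2>");
        return htmlStr;
    }
}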
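To see what the cleanup does, the sketch below applies the same body, image, paragraph, and tag expressions to a small hand-written sample using only the JDK. The class name `CleanupDemo`, the sample markup, and the `/img` prefix are made up for illustration. The attribute-removal step is left out because its pattern is not available in this copy of the article.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CleanupDemo {

    /** Applies body extraction, image prefixing, <P> normalization and tag removal. */
    static String clean(String html, String imgPath) {
        // Keep only the BODY element and rename it to DIV
        Matcher body = Pattern.compile("<BODY .*</BODY>").matcher(html);
        if (body.find()) {
            html = body.group().replaceFirst("<BODY", "<DIV")
                    .replaceAll("</BODY>", "</DIV>");
        }
        // Prefix every image address with imgPath
        html = html.replaceAll("<IMG SRC=\"",
                Matcher.quoteReplacement("<IMG SRC=\"" + imgPath + "/"));
        // Replace <P ...>...</P> with a bare <p>...</p>
        html = html.replaceAll("(<P)([^>]*)(>.*?)(<\\/P>)", "<p$3</p>");
        // Drop font, span and similar tags, keeping their inner text
        html = html.replaceAll(
                "<[/]?(font|FONT|span|SPAN|xml|XML|del|DEL|ins|INS|meta|META|[ovwxpOVWXP]:\\w+)[^>]*?>",
                "");
        return html;
    }

    public static void main(String[] args) {
        String input = "<HTML><HEAD></HEAD><BODY LANG=\"zh-CN\">"
                + "<P STYLE=\"margin-bottom: 0cm\"><FONT FACE=\"Arial\">Hello</FONT>"
                + "<IMG SRC=\"a.png\"></P></BODY></HTML>";
        System.out.println(clean(input, "/img"));
        // prints: <DIV LANG="zh-CN"><p>Hello<IMG SRC="/img/a.png"></p></DIV>
    }
}
```

Because the body expression has no DOTALL flag, it only matches when the whole document is on one line, which is why toHtmlString joins the lines without separators before calling clearFormat.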

Feel free to share. When reposting, please credit the source: 内存溢出

Original article: http://outofmemory.cn/tougao/12208578.html
