What are the ways to parse a Word document in Java?


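For reading Word documents in Java, the web describes plenty of libraries (POI, java2Word, jacob, iText and so on), but each had problems when I tried it. POI could not read formatting (the newer API seemed to still be under development and not very stable, so I did not dare use it in a project). java2Word and jacob tend to fail with a "registration not found" error, which is rather strange: I tried them on different machines with exactly the same steps, and some machines reported the error while others did not. Even the experts on their forums could not explain why, so deploying a project on top of them felt risky. iText seems convenient for writing, but after a long search I found no good way to read with it.

After weighing the options, I settled on RTF as the best compromise. RTF is an openly documented format, it needs no plugin, and basic IO operations plus an encoding conversion are enough. On the surface an RTF file looks no different from a .doc file: both open in Word, and all kinds of formatting can be applied.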

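----- Functionality: read the contents of an RTF template (formatting and text), replace the variable parts, and produce a new RTF document.

----- Approach: type the fixed parts of the template by hand and mark each variable part with a placeholder such as $info$; producing a document then only requires replacing $info$.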

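1. Read the contents of the RTF template as bytes

2. Convert the variable content strings to RTF encoding

3. Replace the variable parts of the original text to form the new RTF document

The main code is as follows: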

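// Converts every byte of the string into an RTF hex escape of the form \'XX.
// getBytes() uses the platform default charset, so that charset has to match
// the code page the RTF template declares.
public String bin2hex(String bin) {
    char[] digital = "0123456789ABCDEF".toCharArray();
    StringBuffer sb = new StringBuffer("");
    byte[] bs = bin.getBytes();
    int bit;
    for (int i = 0; i < bs.length; i++) {
        bit = (bs[i] & 0x0f0) >> 4; // high nibble
        sb.append("\\'");
        sb.append(digital[bit]);
        bit = bs[i] & 0x0f; // low nibble
        sb.append(digital[bit]);
    }
    return sb.toString();
}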
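The hex escapes above depend on the code page, so the output is only correct when the platform charset matches the template. As an alternative (not the method used in this article), RTF also has a Unicode escape, \uN followed by a fallback character, where N is a signed 16-bit decimal number. The sketch below is a minimal illustration under that assumption; the class and method names are made up for the example, and it does not turn line breaks into \par.

```java
final class RtfUnicodeEscape {

    /**
     * Escapes text for use inside an RTF document. ASCII characters pass
     * through (with \, { and } backslash-escaped); every other UTF-16 code
     * unit becomes \uN? where N is the code unit as a signed 16-bit decimal.
     */
    static String escape(String text) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '\\' || c == '{' || c == '}') {
                sb.append('\\').append(c);
            } else if (c < 128) {
                sb.append(c);
            } else {
                // The cast to short maps values above 32767 to negative numbers,
                // which is the form RTF expects for \u.
                sb.append("\\u").append((int) (short) c).append('?');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // U+4E2D is 20013 in decimal, so this prints a\{b\}\u20013?
        System.out.println(escape("a{b}\u4e2d"));
    }
}
```

The "?" after each number is the fallback character shown by readers that do not understand \u.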

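// Reads the whole RTF template at the given path and returns it as a string.
// The ins parameter is kept from the original signature; it is reassigned
// immediately, so callers may pass null.
public String readByteRtf(InputStream ins, String path) {
    String sourcecontent = "";
    try {
        ins = new FileInputStream(path);
        byte[] b = new byte[1024];
        int bytesRead = 0;
        while (true) {
            bytesRead = ins.read(b, 0, 1024); // number of bytes actually read
            if (bytesRead == -1) { // end of InputStream
                System.out.println("Finished reading the template file");
                break;
            }
            sourcecontent += new String(b, 0, bytesRead); // convert the bytes to a string
        }
    } catch (java.io.FileNotFoundException e) {
        System.out.println("Source template file does not exist");
    } catch (Exception e) {
        e.printStackTrace();
    } finally {
        if (ins != null) {
            try {
                ins.close();
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }
    return sourcecontent;
}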
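Decoding each 1024-byte chunk separately can split a multi-byte character at a chunk boundary if the template contains raw non-ASCII bytes. One way around this, sketched here as an assumption rather than as the article's approach, is to read the whole file at once and decode it as ISO-8859-1, which maps each byte to exactly one character and therefore round-trips losslessly. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class RtfTemplateReader {

    /** Maps every byte to one char, so no byte sequence can be corrupted. */
    static String decode(byte[] raw) {
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    /** Inverse of decode: every char below 256 becomes the same byte again. */
    static byte[] encode(String rtf) {
        return rtf.getBytes(StandardCharsets.ISO_8859_1);
    }

    /** Reads the whole template in one call instead of looping over chunks. */
    static String read(Path template) throws IOException {
        return decode(Files.readAllBytes(template));
    }

    /** Writes the filled-in document back with the same byte-preserving mapping. */
    static void write(Path target, String rtf) throws IOException {
        Files.write(target, encode(rtf));
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("template", ".rtf");
        write(tmp, "{\\rtf1\\ansi $info$}");
        System.out.println(read(tmp));
        Files.delete(tmp);
    }
}
```

Because the replacement text is inserted as \'XX escapes, which are plain ASCII, it survives this encoding unchanged.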

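The above is the core code. What remains is replacing and reassembling, which Java's String.replace(oldstr, newstr) method can do, so I won't paste it here. See the attachment for the full source code.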
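The replacement step can be sketched as follows. This is a minimal illustration, not the attached source: the class name, the fill helper and the explicit Charset parameter are assumptions made for the example. The demo uses ISO-8859-1 because every JVM provides it; a Chinese template would presumably need the charset that matches its declared code page.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class RtfPlaceholderDemo {
    private static final char[] HEX = "0123456789ABCDEF".toCharArray();

    /** Encodes text as RTF hex escapes (\'XX), one escape per byte in the given charset. */
    static String toHexEscapes(String text, Charset charset) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(charset)) {
            sb.append("\\'").append(HEX[(b & 0xF0) >> 4]).append(HEX[b & 0x0F]);
        }
        return sb.toString();
    }

    /** Replaces every occurrence of the placeholder with the escaped value. */
    static String fill(String template, String placeholder, String value, Charset charset) {
        return template.replace(placeholder, toHexEscapes(value, charset));
    }

    public static void main(String[] args) {
        String template = "{\\rtf1\\ansi Note: $info$}";
        // U+00E9 is the single byte 0xE9 in ISO-8859-1,
        // so this prints {\rtf1\ansi Note: \'E9}
        System.out.println(fill(template, "$info$", "\u00e9", StandardCharsets.ISO_8859_1));
    }
}
```

String.replace treats the placeholder literally, so the "$" characters need no escaping (unlike String.replaceAll, which takes a regular expression).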

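Prerequisites for running the source code:

Create a YQ directory on drive C, copy "模板.rtf" from the attachment into the YQ directory, and run OperatorRTF.java. A file with a name such as 21时15分19秒_cheney_记录.rtf will then be generated in the YQ directory.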

package com;

import java.io.File;
import java.io.FileInputStream;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStream;
import java.io.PrintWriter;
import java.text.SimpleDateFormat;
import java.util.Date;

public class OperatorRTF {

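    // Converts every byte of the string into an RTF hex escape of the form \'XX.
    // getBytes() uses the platform default charset, which has to match the
    // code page declared in the template.
    public String strToRtf(String content) {
        char[] digital = "0123456789ABCDEF".toCharArray();
        StringBuffer sb = new StringBuffer("");
        byte[] bs = content.getBytes();
        int bit;
        for (int i = 0; i < bs.length; i++) {
            bit = (bs[i] & 0x0f0) >> 4; // high nibble
            sb.append("\\'");
            sb.append(digital[bit]);
            bit = bs[i] & 0x0f; // low nibble
            sb.append(digital[bit]);
        }
        return sb.toString();
    }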

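    // Replaces one placeholder in content with the RTF-encoded replacement text.
    // flag selects the placeholder:
    // 0 = $timetop$, 1 = $info$, 2 = $idea$, 3 = $advice$, 4 = $infosend$.
    public String replaceRTF(String content, String replacecontent, int flag) {
        String rc = strToRtf(replacecontent);
        // Start from the input so that an unknown flag leaves the text unchanged
        // instead of discarding it.
        String target = content;
        if (flag == 0) {
            target = content.replace("$timetop$", rc);
        }
        if (flag == 1) {
            target = content.replace("$info$", rc);
        }
        if (flag == 2) {
            target = content.replace("$idea$", rc);
        }
        if (flag == 3) {
            target = content.replace("$advice$", rc);
        }
        if (flag == 4) {
            target = content.replace("$infosend$", rc);
        }
        return target;
    }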

    // Returns the output directory, creating it if it does not exist yet.
    public String getSavePath() {
        String path = "C:\\YQ";
        File fDirectory = new File(path);
        if (!fDirectory.exists()) {
            fDirectory.mkdirs();
        }
        return path;
    }

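    // Converts half-width ASCII characters to their full-width forms.
    public String ToSBC(String input) {
        char[] c = input.toCharArray();
        for (int i = 0; i < c.length; i++) {
            if (c[i] == 32) {
                c[i] = (char) 12288; // space becomes the ideographic space U+3000
                continue;
            }
            // Only printable ASCII (33..126) has a full-width form;
            // control characters such as line breaks are left alone.
            if (c[i] > 32 && c[i] < 127) {
                c[i] = (char) (c[i] + 65248);
            }
        }
        return new String(c);
    }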

    // Fills the template C:\YQ\模板.rtf with the "~"-separated fields of content
    // and writes the result to a time-stamped file in the same directory.
    public void rgModel(String username, String content) {
        Date current = new Date();
        SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
        String stamp = sdf.format(current);
        String targetname = stamp.substring(11, 13) + "时";
        targetname += stamp.substring(14, 16) + "分";
        targetname += stamp.substring(17, 19) + "秒";
        targetname += "_" + username + "_记录.rtf";
        String strpath = getSavePath();
        String sourname = strpath + "\\" + "模板.rtf";
        String sourcecontent = "";
        InputStream ins = null;
        try {
            ins = new FileInputStream(sourname);
            byte[] b = new byte[1024];
            int bytesRead = 0;
            while (true) {
                bytesRead = ins.read(b, 0, 1024); // number of bytes actually read
                if (bytesRead == -1) { // end of InputStream
                    System.out.println("Finished reading the template file");
                    break;
                }
                sourcecontent += new String(b, 0, bytesRead); // convert the bytes to a string
            }
        } catch (java.io.FileNotFoundException e) {
            System.out.println("Source template file does not exist");
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            if (ins != null) {
                try {
                    ins.close();
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        }
        // Field i of the content replaces the placeholder selected by flag i
        String targetcontent = "";
        String array[] = content.split("~");
        for (int i = 0; i < array.length; i++) {
            if (i == 0) {
                targetcontent = replaceRTF(sourcecontent, array[i], i);
            } else {
                targetcontent = replaceRTF(targetcontent, array[i], i);
            }
        }
        try {
            FileWriter fw = new FileWriter(getSavePath() + "\\" + targetname, true);
            PrintWriter out = new PrintWriter(fw);
            // Fall back to the unmodified template if nothing was replaced
            if (targetcontent.equals("")) {
                out.println(sourcecontent);
            } else {
                out.println(targetcontent);
            }
            out.close();
            fw.close();
            System.out.println(getSavePath() + ": file " + targetname + " generated successfully");
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    public static void main(String[] args) {
        OperatorRTF oRTF = new OperatorRTF();
        // Five "~"-separated fields, matched in order to
        // $timetop$, $info$, $idea$, $advice$ and $infosend$
        String content = "2008年10月12日9时-2008年10月12日6时~我们参照检验药品的方法~我们参照检验药品的方法~我们参照检验药品的方法~我们参照检验药品的方法";
        oRTF.rgModel("cheney", content);
    }

}
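The file name in rgModel is assembled from three substrings of a formatted date. As a hedged alternative, the same name can probably be produced with a single java.time pattern; the class and method names below are assumptions for the example, and the fixed time is only there to reproduce the sample name from the text.

```java
import java.time.LocalTime;
import java.time.format.DateTimeFormatter;

final class RecordFileName {
    // Quoted sections in the pattern are emitted literally.
    private static final DateTimeFormatter STAMP =
            DateTimeFormatter.ofPattern("HH'时'mm'分'ss'秒'");

    /** Builds a name of the form HH时mm分ss秒_username_记录.rtf. */
    static String build(LocalTime time, String username) {
        return STAMP.format(time) + "_" + username + "_记录.rtf";
    }

    public static void main(String[] args) {
        // prints 21时15分19秒_cheney_记录.rtf
        System.out.println(build(LocalTime.of(21, 15, 19), "cheney"));
    }
}
```

In real use, LocalTime.now() would take the place of the fixed time.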

Java can work with Word documents through POI.

Below is what I use in my own program. It only extracts the text, but you can use it as a reference:

import java.io.FileInputStream;

import org.apache.poi.hwpf.extractor.WordExtractor;
import org.apache.poi.xwpf.extractor.XWPFWordExtractor;
import org.apache.poi.xwpf.usermodel.XWPFDocument;

/**
 * Utilities for handling MS Office documents
 * @author caoshen
 */
public class OfficeUtils {

    /**
     * Returns the full text content of a Word document
     * @param filePath path of the .doc or .docx file
     * @return the extracted text, or an empty string if both attempts fail
     */
    public static String getWordContent(String filePath) {
        String content = "";
        // First attempt: treat the file as a binary .doc (HWPF)
        try (FileInputStream fis = new FileInputStream(filePath)) {
            WordExtractor we = new WordExtractor(fis);
            content = we.getText();
        } catch (Exception e) {
            // Second attempt: reopen the file and treat it as a .docx (XWPF)
            try (FileInputStream fis = new FileInputStream(filePath)) {
                XWPFWordExtractor oleTextExtractor = new XWPFWordExtractor(new XWPFDocument(fis));
                content = oleTextExtractor.getText();
            } catch (Exception e1) {
                e1.printStackTrace();
            }
        }
        return content;
    }
}
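The method above picks the format by letting the first extractor fail. A possible alternative, shown here only as a sketch using the plain JDK, is to look at the first bytes of the file: binary .doc files are OLE2 containers that begin with D0 CF 11 E0 A1 B1 1A E1, while .docx files are ZIP archives that begin with 50 4B 03 04. The class, enum and method names are hypothetical, and a ZIP signature only shows that the file is some ZIP archive, not that it is really a .docx.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

final class WordFormatSniffer {
    enum Kind { DOC_OLE2, DOCX_ZIP, UNKNOWN }

    private static final int[] OLE2 = {0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1};
    private static final int[] ZIP = {0x50, 0x4B, 0x03, 0x04};

    /** Classifies a file by the signature at the start of its content. */
    static Kind sniff(byte[] header) {
        if (startsWith(header, OLE2)) return Kind.DOC_OLE2;
        if (startsWith(header, ZIP)) return Kind.DOCX_ZIP;
        return Kind.UNKNOWN;
    }

    private static boolean startsWith(byte[] data, int[] magic) {
        if (data.length < magic.length) return false;
        for (int i = 0; i < magic.length; i++) {
            if ((data[i] & 0xFF) != magic[i]) return false; // compare as unsigned
        }
        return true;
    }

    /** Reads up to eight bytes from the file and classifies them. */
    static Kind sniffFile(String filePath) throws IOException {
        byte[] header = new byte[8];
        int total = 0;
        try (InputStream in = Files.newInputStream(Paths.get(filePath))) {
            while (total < header.length) {
                int n = in.read(header, total, header.length - total);
                if (n < 0) break; // file shorter than eight bytes
                total += n;
            }
        }
        return sniff(Arrays.copyOf(header, total));
    }

    public static void main(String[] args) {
        byte[] zipStart = {0x50, 0x4B, 0x03, 0x04};
        System.out.println(sniff(zipStart)); // prints DOCX_ZIP
    }
}
```

With the kind known up front, the caller can go straight to the matching extractor and reserve exceptions for files that are actually damaged.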

Create a Student entity class to hold the data, then read the XML with a DOM parser:

import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

public static List<Student> readXml() {
    List<Student> list = new ArrayList<Student>();
    // Create a DOM parser factory instance
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    try {
        // Obtain a DOM parser from the factory
        DocumentBuilder dom = factory.newDocumentBuilder();
        // Locate the XML document
        File file = new File("src/com/jereh/ch05/Students.xml");
        Document doc = dom.parse(file);
        Element root = doc.getDocumentElement();
        NodeList stuNodeList = root.getChildNodes();
        for (int i = 0; i < stuNodeList.getLength(); i++) {
            Node stu = stuNodeList.item(i);
            // Skip the whitespace text nodes between the student elements
            if (stu != null && stu.getNodeType() == Node.ELEMENT_NODE) {
                Student student = new Student();
                Element stuElement = (Element) stu;
                student.setNo(stuElement.getAttribute("id"));
                NodeList info = stuElement.getChildNodes();
                for (int j = 0; j < info.getLength(); j++) {
                    Node n = info.item(j);
                    if ("name".equals(n.getNodeName())) {
                        student.setName(n.getLastChild().getNodeValue());
                    } else if ("age".equals(n.getNodeName())) {
                        student.setAge(Integer.parseInt(n.getFirstChild().getNodeValue()));
                    }
                }
                // Add each student once, after all of its child nodes have been read
                list.add(student);
            }
        }
    } catch (ParserConfigurationException e) {
        e.printStackTrace();
    } catch (SAXException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    }
    return list;
}
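The Students.xml file is not shown, so the sketch below assumes a layout of the form <students><student id="..."><name>...</name><age>...</age></student></students> and parses it from a string so that it runs on its own. The class names and the immutable Student type are made up for the example; getElementsByTagName and getTextContent are used in place of the manual child-node walk.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

final class StudentXmlDemo {

    /** Hypothetical value type standing in for the Student entity class. */
    static final class Student {
        final String no;
        final String name;
        final int age;

        Student(String no, String name, int age) {
            this.no = no;
            this.name = name;
            this.age = age;
        }
    }

    /** Parses the assumed layout; wraps parser failures in an unchecked exception. */
    static List<Student> parse(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            // Returns only <student> elements, so no whitespace nodes to filter out
            NodeList nodes = doc.getElementsByTagName("student");
            List<Student> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element e = (Element) nodes.item(i);
                String name = e.getElementsByTagName("name").item(0).getTextContent().trim();
                String age = e.getElementsByTagName("age").item(0).getTextContent().trim();
                result.add(new Student(e.getAttribute("id"), name, Integer.parseInt(age)));
            }
            return result;
        } catch (Exception ex) {
            throw new IllegalArgumentException("invalid student XML", ex);
        }
    }

    public static void main(String[] args) {
        String xml = "<students><student id=\"s1\"><name>Li</name><age>20</age></student></students>";
        for (Student s : parse(xml)) {
            System.out.println(s.no + " " + s.name + " " + s.age); // prints s1 Li 20
        }
    }
}
```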


Feel free to share; when reposting, please credit the source: 内存溢出

Original URL: http://outofmemory.cn/yw/12385346.html
