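Open the damaged file, then click Save As.
Choose to save the file to a different location in Web Page format. Make sure the entire workbook is selected.
Open the saved file in Excel and save it again in Excel format.
With some luck, the corruption will be gone.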
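In Java, the open-source Apache POI framework can read, merge, and otherwise manipulate Word documents. Apache POI is an open-source project for reading and writing Excel, Word, and other Microsoft OLE2 compound documents from Java. Version 3.5, the latest at the time of writing, brought many improvements, including support for the OOXML formats introduced with Office 2007, such as xlsx, docx, and pptx. An example for .doc files:
import java.io.FileInputStream;
import org.apache.poi.POITextExtractor;
import org.apache.poi.hwpf.extractor.WordExtractor;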
// Create a text extractor for the .doc file
WordExtractor doc = new WordExtractor(new FileInputStream(filePath));
// Extract the body text of the .doc file
String text = doc.getText();
// Extract the comments of the .doc file
String[] comments = doc.getCommentsText();
For Word 2007 (.docx) files:
import org.apache.poi.POIXMLDocument;
import org.apache.poi.POITextExtractor;
import org.apache.poi.xwpf.extractor.XWPFWordExtractor;
import org.apache.poi.xwpf.usermodel.XWPFComment;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
// Create a text extractor for the .docx file
XWPFWordExtractor docx = new XWPFWordExtractor(POIXMLDocument.openPackage(filePath));
// Extract the body text of the .docx file
String text = docx.getText();
// Extract the comments of the .docx file;
// getDocument() returns a POIXMLDocument, so it is cast to XWPFDocument first
XWPFComment[] comments = ((XWPFDocument) docx.getDocument()).getComments();
for (XWPFComment comment : comments) {
    comment.getId();     // comment ID
    comment.getAuthor(); // comment author
    comment.getText();   // comment text
}
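Because .doc (OLE2) and .docx (OOXML) files need different extractors, it can help to check the file header before choosing one instead of trusting the file extension. The following is a minimal sketch using only the JDK; the class name `WordFormatSniffer` and the `Format` enum are hypothetical names made up for this illustration and are not part of Apache POI. It relies on two well-known signatures: OLE2 compound files start with the bytes D0 CF 11 E0 A1 B1 1A E1, and OOXML files are ZIP archives that start with "PK" followed by the bytes 3 and 4.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

public class WordFormatSniffer {
    // Hypothetical enum for illustration only; not an Apache POI type.
    public enum Format { OLE2, OOXML, UNKNOWN }

    // Signature found at the start of every OLE2 compound file (.doc, .xls, .ppt).
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    // OOXML files (.docx, .xlsx, .pptx) are ZIP archives: "PK" 0x03 0x04.
    // Note that any other ZIP archive matches this signature as well.
    private static final byte[] ZIP_MAGIC = { 'P', 'K', 3, 4 };

    /** Classifies a file from its leading bytes. */
    public static Format detect(byte[] header) {
        if (startsWith(header, OLE2_MAGIC)) {
            return Format.OLE2;
        }
        if (startsWith(header, ZIP_MAGIC)) {
            return Format.OOXML;
        }
        return Format.UNKNOWN;
    }

    /** Reads at most the first 8 bytes of the file and classifies it. */
    public static Format detect(Path file) throws IOException {
        byte[] header = new byte[8];
        int n;
        try (InputStream in = Files.newInputStream(file)) {
            n = in.readNBytes(header, 0, header.length); // requires Java 9+
        }
        return detect(Arrays.copyOf(header, n));
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        return Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 1) {
            System.out.println(detect(Paths.get(args[0])));
        } else {
            System.out.println("usage: java WordFormatSniffer <file>");
        }
    }
}
```

A caller could then pick WordExtractor when the result is OLE2 and XWPFWordExtractor when it is OOXML, and report an error otherwise.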
The following code converts a .doc file to HTML:
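import java.io.BufferedWriter;
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.converter.PicturesManager;
import org.apache.poi.hwpf.converter.WordToHtmlConverter;
import org.apache.poi.hwpf.usermodel.Picture;
import org.apache.poi.hwpf.usermodel.PictureType;
import org.w3c.dom.Document;

public class Word2Html {
    public static void main(String[] argv) {
        try {
            // Word input path, HTML output path
            convert2Html("D:/doctohtml/1.doc", "D:/doctohtml/1.html");
        } catch (Exception e) {
            e.printStackTrace();
        }
    }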
    // Writes the given content to the file at path, encoded as UTF-8
    public static void writeFile(String content, String path) {
        FileOutputStream fos = null;
        BufferedWriter bw = null;
        try {
            File file = new File(path);
            fos = new FileOutputStream(file);
            bw = new BufferedWriter(new OutputStreamWriter(fos, "utf-8"));
            bw.write(content);
        } catch (FileNotFoundException fnfe) {
            fnfe.printStackTrace();
        } catch (IOException ioe) {
            ioe.printStackTrace();
        } finally {
            try {
                if (bw != null)
                    bw.close();
                if (fos != null)
                    fos.close();
            } catch (IOException ie) {
                // a failure while closing is ignored
            }
        }
    }
    public static void convert2Html(String fileName, String outPutFile)
            throws TransformerException, IOException,
            ParserConfigurationException {
        // Alternatively: WordToHtmlUtils.loadDoc(new FileInputStream(inputFile))
        HWPFDocument wordDocument = new HWPFDocument(new FileInputStream(fileName));
        WordToHtmlConverter wordToHtmlConverter = new WordToHtmlConverter(
                DocumentBuilderFactory.newInstance().newDocumentBuilder()
                        .newDocument());
        wordToHtmlConverter.setPicturesManager(new PicturesManager() {
            public String savePicture(byte[] content,
                    PictureType pictureType, String suggestedName,
                    float widthInches, float heightInches) {
                // Path written into the HTML image tag, e.g. <img src="d:/test/0.jpg"/>
                return "d:/doctohtml/" + suggestedName;
            }
        });
        wordToHtmlConverter.processDocument(wordDocument);
        // Save the pictures embedded in the Word document
        List<Picture> pics = wordDocument.getPicturesTable().getAllPictures();
        if (pics != null) {
            for (Picture pic : pics) {
                // Directory where the extracted pictures are stored;
                // try-with-resources closes the stream after each picture
                try (FileOutputStream picOut = new FileOutputStream(
                        "D:/doctohtml/" + pic.suggestFullFileName())) {
                    pic.writeImageContent(picOut);
                } catch (FileNotFoundException e) {
                    e.printStackTrace();
                }
            }
        }
        // Serialize the generated DOM tree to HTML
        Document htmlDocument = wordToHtmlConverter.getDocument();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        DOMSource domSource = new DOMSource(htmlDocument);
        StreamResult streamResult = new StreamResult(out);
        TransformerFactory tf = TransformerFactory.newInstance();
        Transformer serializer = tf.newTransformer();
        serializer.setOutputProperty(OutputKeys.ENCODING, "utf-8");
        serializer.setOutputProperty(OutputKeys.INDENT, "yes");
        serializer.setOutputProperty(OutputKeys.METHOD, "html");
        serializer.transform(domSource, streamResult);
        out.close();
        // Decode with the same charset the serializer used, not the platform default
        writeFile(new String(out.toByteArray(), "UTF-8"), outPutFile);
    }
}
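The last step of convert2Html, turning the DOM tree into HTML and writing it out as UTF-8, can be tried on its own without Apache POI. The sketch below uses only the JDK and is one possible variation, not the approach used in the code above: it serializes into a StringWriter so that no byte-to-string decoding is needed, and it writes the file with try-with-resources. The class name `HtmlDomWriter`, its methods, and the tiny sample document are hypothetical stand-ins for the Document returned by wordToHtmlConverter.getDocument().

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class HtmlDomWriter {

    /** Serializes a DOM tree to an HTML string using the JDK's identity transformer. */
    public static String toHtml(Document doc) throws TransformerException {
        Transformer serializer = TransformerFactory.newInstance().newTransformer();
        serializer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        serializer.setOutputProperty(OutputKeys.METHOD, "html");
        // Writing to a character stream avoids decoding bytes with a guessed charset.
        StringWriter out = new StringWriter();
        serializer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    /** Writes text as UTF-8; try-with-resources closes the writer even on failure. */
    public static void writeUtf8(String content, Path path) throws IOException {
        try (BufferedWriter bw = Files.newBufferedWriter(path, StandardCharsets.UTF_8)) {
            bw.write(content);
        }
    }

    /** Builds a tiny stand-in for the document a converter would produce. */
    public static Document sampleDocument() throws ParserConfigurationException {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        Element html = doc.createElement("html");
        Element body = doc.createElement("body");
        Element p = doc.createElement("p");
        p.setTextContent("Hello, POI");
        body.appendChild(p);
        html.appendChild(body);
        doc.appendChild(html);
        return doc;
    }

    public static void main(String[] args) throws Exception {
        Path target = Files.createTempFile("word2html", ".html");
        writeUtf8(toHtml(sampleDocument()), target);
        System.out.println("Wrote " + target);
    }
}
```

Unlike writeFile above, this version lets I/O errors propagate to the caller instead of printing the stack trace, which is a design choice rather than a requirement.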