How to handle Chinese labels when reading PDFs with Java

To read or create a PDF containing Chinese text with a system font directly, pay attention to the versions of the jars you use:

<dependency>
    <groupId>com.itextpdf</groupId>
    <artifactId>itextpdf</artifactId>
</dependency>
<dependency>
    <groupId>com.itextpdf</groupId>
    <artifactId>itext-asian</artifactId>
</dependency>
<dependency>
    <groupId>com.itextpdf.tool</groupId>
    <artifactId>xmlworker</artifactId>
</dependency>

The code is below. Overriding getFont in XMLWorkerFontProvider is enough to make the Chinese text come through:

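import com.itextpdf.text.BaseColor;
import com.itextpdf.text.Document;
import com.itextpdf.text.DocumentException;
import com.itextpdf.text.Font;
import com.itextpdf.text.pdf.BaseFont;
import com.itextpdf.text.pdf.PdfWriter;
import com.itextpdf.tool.xml.XMLWorkerFontProvider;
import com.itextpdf.tool.xml.XMLWorkerHelper;

import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;

public class HtmlToPdf {

    public void createPdf(String src, String dest) throws IOException, DocumentException {
        Document document = new Document();
        PdfWriter writer = PdfWriter.getInstance(document, new FileOutputStream(dest));
        document.open();
        try (InputStream html = new FileInputStream(src)) {
            // Parse the XHTML file; every font request goes through the provider below
            XMLWorkerHelper.getInstance().parseXHtml(writer, document, html, null,
                    new XMLWorkerFontProvider() {
                        @Override
                        public Font getFont(final String fontname, final String encoding,
                                            final boolean embedded, final float size, final int style,
                                            final BaseColor color) {
                            try {
                                // A system font that contains Chinese glyphs (Windows path)
                                BaseFont bf = BaseFont.createFont("C:/Windows/Fonts/SIMYOU.TTF",
                                        BaseFont.IDENTITY_H, BaseFont.NOT_EMBEDDED);
                                return new Font(bf, size, style, color);
                            } catch (Exception e) {
                                e.printStackTrace();
                                // Fall back to the default lookup instead of building a Font from null
                                return super.getFont(fontname, encoding, embedded, size, style, color);
                            }
                        }
                    });
        }
        document.close();
    }
}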
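The getFont override above returns the Chinese font for every text run, whatever it contains. If you would rather switch fonts only when a string actually contains Chinese characters, a check like the following could drive that decision. It is a minimal sketch using only the JDK; CjkText and containsHan are hypothetical names, and treating the Unicode Han script as "Chinese" is an assumption (it also matches Japanese kanji and Korean hanja, and it does not match Chinese punctuation such as the full-width comma).

```java
public class CjkText {

    // Hypothetical helper (not part of iText): true if the text contains at least
    // one character from the Unicode Han script, i.e. a CJK-capable font is needed
    public static boolean containsHan(String text) {
        // Walk code points rather than chars so supplementary-plane ideographs count too
        return text.codePoints()
                .anyMatch(cp -> Character.UnicodeScript.of(cp) == Character.UnicodeScript.HAN);
    }

    public static void main(String[] args) {
        System.out.println(containsHan("Hello World"));                   // false
        System.out.println(containsHan("\u8D22\u52A1\u62A5\u8868.xlsx")); // true (four Han characters)
    }
}
```

Inside getFont you could, for example, keep the Chinese BaseFont for runs where containsHan is true and defer to super.getFont otherwise; that wiring is left out here because it depends on how your XHTML is structured.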

When creating a PDF, simply use a system font (here, on Windows):

BaseFont baseFont = BaseFont.createFont("C:/Windows/Fonts/SIMYOU.TTF", BaseFont.IDENTITY_H, BaseFont.NOT_EMBEDDED);
Font font = new Font(baseFont);
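The path above only exists on Windows. Below is a minimal, JDK-only sketch for picking the first font file that is actually present on the machine; the class FontLocator, the method firstExisting and the second candidate path are hypothetical placeholders, not part of iText.

```java
import java.nio.file.Files;
import java.nio.file.InvalidPathException;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

public class FontLocator {

    // Hypothetical helper: return the first candidate that exists as a regular file,
    // so the font path handed to BaseFont.createFont is not tied to one OS
    public static Optional<String> firstExisting(List<String> candidates) {
        for (String candidate : candidates) {
            try {
                if (Files.isRegularFile(Paths.get(candidate))) {
                    return Optional.of(candidate);
                }
            } catch (InvalidPathException e) {
                // Skip strings that are not valid paths on this platform
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Illustrative candidates: the Windows path is the one used in the article;
        // the second entry is a placeholder for a CJK font installed on Linux or macOS
        List<String> candidates = Arrays.asList(
                "C:/Windows/Fonts/SIMYOU.TTF",
                "/path/to/your/cjk-font.ttf");
        System.out.println(firstExisting(candidates).orElse("no CJK font found"));
    }
}
```

The string it returns could then be passed to BaseFont.createFont in place of the hard-coded path.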

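① Create an instance of com.lowagie.text.Document (in the itextpdf 5.x artifact listed above, the class is com.itextpdf.text.Document).

Document document = new Document();

② Create a writer (PdfWriter) associated with the document; the writer is what writes the document to disk.

PdfWriter.getInstance(document, new FileOutputStream("Helloworld.PDF"));

③ Open the document.

document.open();

④ Add content to the document.

document.add(new Paragraph("Hello World"));

⑤ Close the document.

document.close();

These five steps produce a file named Helloworld.PDF whose content is "Hello World".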
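To check quickly that the five steps really produced a PDF, you can look at the file header: PDF files written by iText start with the bytes %PDF-. The following is a small JDK-only sketch (Java 9 or later, for readNBytes); PdfHeaderCheck and looksLikePdf are hypothetical names, and the check is only a sanity test, not a validation of the document structure.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

public class PdfHeaderCheck {

    private static final byte[] MAGIC = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    // Hypothetical helper: true if the file begins with the "%PDF-" header bytes
    public static boolean looksLikePdf(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            byte[] head = new byte[MAGIC.length];
            int read = in.readNBytes(head, 0, head.length); // reads up to 5 bytes
            return read == MAGIC.length && Arrays.equals(head, MAGIC);
        }
    }

    public static void main(String[] args) throws IOException {
        Path pdf = Paths.get(args.length > 0 ? args[0] : "Helloworld.PDF");
        if (!Files.exists(pdf)) {
            System.out.println(pdf + " not found");
            return;
        }
        System.out.println(pdf + (looksLikePdf(pdf) ? " looks like a PDF" : " is not a PDF"));
    }
}
```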

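You can also use the Spire.PDF for Java library to add attachments to a PDF document. For reference, the code below inserts a Word attachment and an Excel attachment, with a Chinese label drawn next to the Excel one: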

import com.spire.pdf.PdfDocument;
import com.spire.pdf.annotations.*;
import com.spire.pdf.attachments.PdfAttachment;
import com.spire.pdf.graphics.*;

import java.awt.*;
import java.awt.geom.Dimension2D;
import java.awt.geom.Rectangle2D;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;

public class AttachFiles {

    public static void main(String[] args) throws IOException {
        // Create a PdfDocument object
        PdfDocument doc = new PdfDocument();
        // Load the PDF document
        doc.loadFromFile("C:\\Users\\Administrator\\Desktop\\sample.pdf");

        // Add the Word file to the PDF as a document-level attachment
        PdfAttachment attachment = new PdfAttachment("C:\\Users\\Administrator\\Desktop\\使用说明书.docx");
        doc.getAttachments().add(attachment);

        // Draw a label; Arial Unicode MS contains Chinese glyphs, so the Chinese label renders
        String label = "财务报表.xlsx";
        PdfTrueTypeFont font = new PdfTrueTypeFont(new Font("Arial Unicode MS", Font.PLAIN, 12), true);
        double x = 35;
        double y = doc.getPages().get(0).getActualSize().getHeight() - 200;
        doc.getPages().get(0).getCanvas().drawString(label, font, PdfBrushes.getOrange(), x, y);

        // Add the Excel file as an attachment annotation placed right after the label
        String filePath = "C:\\Users\\Administrator\\Desktop\\财务报表.xlsx";
        byte[] data = toByteArray(filePath);
        Dimension2D size = font.measureString(label);
        Rectangle2D bound = new Rectangle2D.Float((float) (x + size.getWidth() + 2), (float) y, 10, 15);
        PdfAttachmentAnnotation annotation = new PdfAttachmentAnnotation(bound, filePath, data);
        annotation.setColor(new PdfRGBColor(new Color(0, 128, 128)));
        annotation.setFlags(PdfAnnotationFlags.Default);
        annotation.setIcon(PdfAttachmentIcon.Graph);
        annotation.setText("点击打开财务报表.xlsx");
        doc.getPages().get(0).getAnnotationsWidget().add(annotation);

        // Save the document
        doc.saveToFile("Attachments.pdf");
    }

    // Read a file into a byte array
    public static byte[] toByteArray(String filePath) throws IOException {
        File file = new File(filePath);
        long fileSize = file.length();
        if (fileSize > Integer.MAX_VALUE) {
            System.out.println("file too big...");
            return null;
        }
        byte[] buffer = new byte[(int) fileSize];
        // try-with-resources closes the stream even if reading fails
        try (FileInputStream fi = new FileInputStream(file)) {
            int offset = 0;
            int numRead;
            while (offset < buffer.length
                    && (numRead = fi.read(buffer, offset, buffer.length - offset)) >= 0) {
                offset += numRead;
            }
            if (offset != buffer.length) {
                throw new IOException("Could not completely read file " + file.getName());
            }
        }
        return buffer;
    }
}
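The toByteArray helper above reads the file by hand. On Java 7 or later the JDK can do the same in one call; here is a minimal sketch, with FileBytes as a hypothetical class name. One behavioral difference to be aware of: instead of printing a message and returning null for oversized files, Files.readAllBytes throws (an OutOfMemoryError if the array cannot be allocated, an IOException on read errors).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class FileBytes {

    // Read the whole file into memory; throws instead of returning null on failure
    public static byte[] toByteArray(String filePath) throws IOException {
        return Files.readAllBytes(Paths.get(filePath));
    }

    public static void main(String[] args) throws IOException {
        // Round-trip through a temporary file to show the call in isolation
        Path tmp = Files.createTempFile("attachment", ".bin");
        Files.write(tmp, new byte[] {1, 2, 3});
        System.out.println(toByteArray(tmp.toString()).length); // 3
        Files.delete(tmp);
    }
}
```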


Feel free to share; please credit the source when reposting: 内存溢出 (outofmemory.cn)

Original article: http://outofmemory.cn/bake/11787732.html
