python – PDF bleed detection


I am currently writing a little tool (Python + pyPdf) to test PDFs for printer conformity.

Alas, I am already confused by the first task: detecting whether the PDF has at least 3 mm of "bleed" (a border around the pages where nothing is printed). I have already gathered that I cannot detect the bleed for the document as a whole, since there does not seem to be a global setting. On each page, however, I can detect a total of five different boxes:

> mediaBox
> bleedBox
> trimBox
> cropBox
> artBox

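I read the pyPdf documentation on those boxes, but the only one I understood is the mediaBox, which seems to represent the entire page size (i.e. the paper).

The bleedBox pretty obviously ought to define the bleed, but that does not always seem to be the case.

Another thing I noticed is that in one example PDF, all of those boxes have exactly the same size on every page (implying no bleed at all), yet when I open the file there is a huge amount of bleed. This leads me to think that the individual text elements have their own offsets.

So, obviously, just calculating the bleed from the mediaBox and the bleedBox is not a viable option.

I would be more than delighted if anyone could shed some light on what those boxes actually are and what I can conclude from them (e.g. is one box always smaller than another one?).

Bonus question: can someone tell me what exactly the "default user space unit" mentioned in the documentation is? I am pretty sure it refers to mm on my machine, but I would like to enforce mm everywhere.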

Solution

Quoting the PDF specification ISO 32000-1:2008, as published by Adobe:

14.11.2 Page Boundaries

14.11.2.1 General

A PDF page may be prepared either for a finished medium, such as a
sheet of paper, or as part of a prepress process in which the content
of the page is placed on an intermediate medium, such as film or an
imposed reproduction plate. In the latter case, it is important to
distinguish between the intermediate page and the finished page. The
intermediate page may often include additional production-related
content, such as bleeds or printer marks, that falls outside the
boundaries of the finished page. To handle such cases, a PDF page
may define as many as five separate boundaries to control various
aspects of the imaging process:

The media box defines the boundaries of the physical medium on which
the page is to be printed. It may include any extended area
surrounding the finished page for bleed, printing marks, or other such
purposes. It may also include areas close to the edges of the medium
that cannot be marked because of physical limitations of the output
device. Content falling outside this boundary may safely be discarded
without affecting the meaning of the PDF file.

The crop box defines the region to which the contents of the page
shall be clipped (cropped) when displayed or printed. Unlike the other
boxes, the crop box has no defined meaning in terms of physical page
geometry or intended use; it merely imposes clipping on the page
contents. However, in the absence of additional information (such as
imposition instructions specified in a JDF or PJTF job ticket), the
crop box determines how the page's contents shall be positioned on the
output medium. The default value is the page's media box.

The bleed box (PDF 1.3) defines the region to which the contents of
the page shall be clipped when output in a production environment.
This may include any extra bleed area needed to accommodate the
physical limitations of cutting, folding, and trimming equipment. The
actual printed page may include printing marks that fall outside the
bleed box. The default value is the page's crop box.

The trim box (PDF 1.3) defines the intended dimensions of the
finished page after trimming. It may be smaller than the media box to
allow for production-related content, such as printing instructions,
cut marks, or colour bars. The default value is the page's crop box.

The art box (PDF 1.3) defines the extent of the page's meaningful
content (including potential white space) as intended by the page's
creator. The default value is the page's crop box.

The page object dictionary specifies these boundaries in the MediaBox,
CropBox, BleedBox, TrimBox, and ArtBox entries, respectively (see
Table 30). All of them are rectangles expressed in default user space
units. The crop, bleed, trim, and art boxes shall not ordinarily
extend beyond the boundaries of the media box. If they do, they are
effectively reduced to their intersection with the media box. Figure
86 illustrates the relationships among these boundaries. (The crop box
is not shown in the figure because it has no defined relationship with
any of the other boundaries.)
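
To make the default and clipping rules from the quoted passage concrete, here is a minimal sketch in plain Python. It is not pyPdf code: the `Rect` tuple layout and the helper names `intersect` and `resolve_boxes` are assumptions made for illustration, and you would have to fill in the rectangles yourself from whatever your PDF library reports for each page.

```python
from typing import Dict, Optional, Tuple

# Hypothetical representation: (left, bottom, right, top) in default user space units.
Rect = Tuple[float, float, float, float]


def intersect(a: Rect, b: Rect) -> Rect:
    """Overlap of two rectangles. If they do not overlap at all, the
    result is degenerate (left > right or bottom > top)."""
    return (max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3]))


def resolve_boxes(media: Rect,
                  crop: Optional[Rect] = None,
                  bleed: Optional[Rect] = None,
                  trim: Optional[Rect] = None,
                  art: Optional[Rect] = None) -> Dict[str, Rect]:
    """Apply the rules quoted above: a missing crop box defaults to the
    media box; missing bleed, trim and art boxes default to the crop box;
    boxes extending beyond the media box are reduced to their
    intersection with it."""
    crop = intersect(crop, media) if crop is not None else media
    resolved = {"media": media, "crop": crop}
    for name, box in (("bleed", bleed), ("trim", trim), ("art", art)):
        resolved[name] = intersect(box, media) if box is not None else crop
    return resolved
```

With only a media box given, all five resolved boxes come out identical, which matches the situation described in the question where every box has the same size.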

The specification follows this passage with a nice graphic (Figure 86) showing the boxes in relation to each other.

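The reasons why in many cases only the media box is set are that

> for PDFs meant for electronic consumption (i.e. reading on a computer) the other boxes hardly matter; and
> even in a prepress context they are no longer as necessary as they used to be, cf. the article Pedro mentioned in his comment.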

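Concerning your "bonus question": the user space unit is 1/72 inch by default. Since PDF 1.6, though, it can be changed to any (not necessarily integer) multiple of that size using the UserUnit entry in the page dictionary. Changing it in an existing PDF effectively scales the page, because the user space unit is the basic unit of the page's device-independent coordinate system. So unless you want to update each and every coordinate-related command in the page descriptions to preserve the page dimensions, you will not want to enforce a millimetre user space unit... ;)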
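
Rather than changing the unit inside the PDF, you can convert to millimetres in your own tool. The following is only a sketch in plain Python: the function names, the `(left, bottom, right, top)` tuple layout and the tolerance are assumptions for illustration, and it only compares the declared trim and bleed boxes. It says nothing about whether page content actually extends into the bleed area, which is the case you ran into with your example PDF.

```python
from typing import Tuple

# Hypothetical representation: (left, bottom, right, top) in user space units.
Rect = Tuple[float, float, float, float]

MM_PER_INCH = 25.4
UNITS_PER_INCH = 72.0  # default user space: 1 unit = 1/72 inch


def units_to_mm(value: float, user_unit: float = 1.0) -> float:
    """Convert user space units to millimetres. user_unit is the page's
    UserUnit entry (a multiple of 1/72 inch, 1.0 when absent)."""
    return value * user_unit / UNITS_PER_INCH * MM_PER_INCH


def bleed_margins_mm(trim: Rect, bleed: Rect,
                     user_unit: float = 1.0) -> Tuple[float, ...]:
    """Distance from each trim edge out to the matching bleed edge,
    as (left, bottom, right, top) in millimetres."""
    margins = (trim[0] - bleed[0], trim[1] - bleed[1],
               bleed[2] - trim[2], bleed[3] - trim[3])
    return tuple(units_to_mm(m, user_unit) for m in margins)


def has_min_bleed(trim: Rect, bleed: Rect, min_mm: float = 3.0,
                  user_unit: float = 1.0, tol: float = 1e-6) -> bool:
    """True if the declared bleed is at least min_mm on all four sides.
    tol absorbs floating point rounding in the box coordinates."""
    return all(m >= min_mm - tol
               for m in bleed_margins_mm(trim, bleed, user_unit))
```

For example, an A4 trim box placed 8.504 units inside the bleed box on every side gives margins of roughly 3 mm, so `has_min_bleed` accepts it, while a page whose trim and bleed boxes are identical is rejected.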
