Pytesseract OCR: multiple config options


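tesseract-4.0.0a supports the page segmentation modes (psm) listed below. To recognize a single character, set psm to 10 (--psm 10). If your text contains only digits, you can also set tessedit_char_whitelist=0123456789.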

Page segmentation modes:
  0    Orientation and script detection (OSD) only.
  1    Automatic page segmentation with OSD.
  2    Automatic page segmentation, but no OSD, or OCR.
  3    Fully automatic page segmentation, but no OSD. (Default)
  4    Assume a single column of text of variable sizes.
  5    Assume a single uniform block of vertically aligned text.
  6    Assume a single uniform block of text.
  7    Treat the image as a single text line.
  8    Treat the image as a single word.
  9    Treat the image as a single word in a circle.
 10    Treat the image as a single character.
 11    Sparse text. Find as much text as possible in no particular order.
 12    Sparse text with OSD.
 13    Raw line. Treat the image as a single text line, bypassing hacks that are Tesseract-specific.
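As a rough sketch of how a mode number from the list above ends up in the config string that pytesseract forwards to Tesseract, the helper below assembles the flags and rejects values outside the documented ranges. The name build_tesseract_config and its parameters are hypothetical and chosen for illustration only; they are not part of pytesseract.

```python
def build_tesseract_config(psm=3, oem=3, whitelist=None):
    """Assemble a Tesseract config string (hypothetical helper, not a pytesseract API)."""
    # Page segmentation modes run from 0 to 13 in the list above.
    if not 0 <= psm <= 13:
        raise ValueError(f"psm must be between 0 and 13, got {psm}")
    # Tesseract 4 accepts OCR engine modes 0 to 3 (3 selects the default engine).
    if not 0 <= oem <= 3:
        raise ValueError(f"oem must be between 0 and 3, got {oem}")

    parts = [f"--psm {psm}", f"--oem {oem}"]
    if whitelist:
        # Restrict recognition to the given characters, e.g. digits only.
        parts.append(f"-c tessedit_char_whitelist={whitelist}")
    return " ".join(parts)


# Single-character mode, digits only.
config = build_tesseract_config(psm=10, whitelist="0123456789")
print(config)  # --psm 10 --oem 3 -c tessedit_char_whitelist=0123456789
```

The resulting string can be passed as the config argument of pytesseract.image_to_string.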

Here is an example of calling image_to_string with multiple parameters:

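import pytesseract

# image is a PIL.Image loaded beforehand; boxes=False is dropped because only older pytesseract releases accept it
target = pytesseract.image_to_string(image, lang='eng',
                                     config='--psm 10 --oem 3 -c tessedit_char_whitelist=0123456789')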
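The string returned by such a call usually still needs cleaning before it can be used as a number. The following is a minimal sketch, assuming a single-digit image and the same config as above; read_single_digit and its ocr parameter are hypothetical names introduced here so the post-processing can be exercised without Tesseract installed.

```python
from typing import Callable, Optional

DIGIT_CONFIG = "--psm 10 --oem 3 -c tessedit_char_whitelist=0123456789"


def read_single_digit(image, ocr: Optional[Callable[..., str]] = None) -> Optional[int]:
    """Return the digit recognised in `image`, or None if the result is not one digit.

    `ocr` is a hypothetical injection point; by default it is
    pytesseract.image_to_string.
    """
    if ocr is None:
        # Imported lazily so the helper can be tested with a stand-in OCR function.
        import pytesseract
        ocr = pytesseract.image_to_string

    # Tesseract output may carry trailing whitespace such as newlines or form feeds.
    text = ocr(image, lang="eng", config=DIGIT_CONFIG).strip()
    if len(text) == 1 and text in "0123456789":
        return int(text)
    return None
```

With Pillow installed, a call might look like read_single_digit(Image.open("digit.png")), where digit.png is a placeholder file name.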

Hope this helps.



Sharing is welcome; when reposting, please credit the source: 内存溢出

Original article: http://outofmemory.cn/zaji/5643588.html
