Pytesseract OCR: multiple configuration options

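tesseract-4.0.0a supports the page segmentation modes (psm) listed below. To recognize a single character, set psm to 10. If your text contains only digits, you can also set tessedit_char_whitelist=0123456789.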

Page segmentation modes:
  0    Orientation and script detection (OSD) only.
  1    Automatic page segmentation with OSD.
  2    Automatic page segmentation, but no OSD, or OCR.
  3    Fully automatic page segmentation, but no OSD. (Default)
  4    Assume a single column of text of variable sizes.
  5    Assume a single uniform block of vertically aligned text.
  6    Assume a single uniform block of text.
  7    Treat the image as a single text line.
  8    Treat the image as a single word.
  9    Treat the image as a single word in a circle.
 10    Treat the image as a single character.
 11    Sparse text. Find as much text as possible in no particular order.
 12    Sparse text with OSD.
 13    Raw line. Treat the image as a single text line, bypassing hacks that are Tesseract-specific.
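The psm, oem and whitelist settings all end up in one option string that pytesseract passes through to Tesseract. As a rough sketch, a small helper could assemble that string and reject mode numbers outside the 0 to 13 range. The helper name build_tesseract_config and its defaults are assumptions made for illustration; they are not part of pytesseract.

```python
def build_tesseract_config(psm=3, oem=3, whitelist=None):
    """Assemble an option string for pytesseract's `config` argument.

    Hypothetical helper: the name and defaults are illustrative only.
    """
    # Page segmentation modes are numbered 0 through 13.
    if not 0 <= psm <= 13:
        raise ValueError(f"psm must be between 0 and 13, got {psm}")
    parts = [f"--psm {psm}", f"--oem {oem}"]
    if whitelist:
        # -c sets a Tesseract config variable from the command line.
        parts.append(f"-c tessedit_char_whitelist={whitelist}")
    return " ".join(parts)


print(build_tesseract_config(psm=10, whitelist="0123456789"))
# prints: --psm 10 --oem 3 -c tessedit_char_whitelist=0123456789
```

Building the string in one place keeps the mode number and the whitelist from drifting apart when several call sites use different settings.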

Here is an example of calling image_to_string with multiple parameters.

# The boxes=False argument from older pytesseract releases is omitted; current releases do not accept it.
target = pytesseract.image_to_string(image, lang='eng', config='--psm 10 --oem 3 -c tessedit_char_whitelist=0123456789')
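For a version that can be run end to end, the call needs its imports and an opened image. The sketch below assumes Pillow and pytesseract are installed, that the tesseract binary is on the PATH, and that the image holds a single digit. The function names read_single_digit and keep_digits are hypothetical. The LSTM engine in Tesseract 4.0.x reportedly ignores tessedit_char_whitelist (support came back in 4.1), so the sketch also filters the result in Python as a safety net.

```python
import re


def keep_digits(text):
    """Drop every character that is not an ASCII digit."""
    return re.sub(r"[^0-9]", "", text)


def read_single_digit(image_path):
    """OCR an image that is assumed to contain exactly one digit."""
    # Imported lazily so keep_digits() stays usable without the OCR stack.
    import pytesseract
    from PIL import Image

    config = "--psm 10 --oem 3 -c tessedit_char_whitelist=0123456789"
    with Image.open(image_path) as image:
        raw = pytesseract.image_to_string(image, lang="eng", config=config)
    # Post-filter in case the engine ignores the whitelist.
    return keep_digits(raw)


print(keep_digits(" 7\n\x0c"))  # prints 7
```

A call such as read_single_digit("digit.png") would then return the recognized digit as a string; the file name here is only a placeholder.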

Hope this helps.

Feel free to share; when reposting, please credit the source: 内存溢出

Original URL: http://outofmemory.cn/zaji/5643588.html
