自然场景下特定人工标识的识别_python

文章目录

一、目标
二、需学习和调用的内容
三、解决方案和实施
- 1. 总体方案
- 2.具体实施
详细代码

一、目标

在自然场景下检测并识别出五类特定标识（特定标志是用蓝色A4纸打印）。输入为自然场景下的图像，输出为该图像中该标识对应的数字标号。对于五类之外的样本输出数字0，达到字母识别分类.

二、需学习和调用的内容

（1）Python的Opencv库调用，百度OCR字符识别
（2）K均值聚类，HSV蓝色区域提取，矩形形态比计算

三、解决方案和实施 1. 总体方案

对于字母的识别采用两步走，分别为图像截取和字母识别分类，首先将图片进行聚类之后提取蓝色区域，再根据形态比截取出矩形A4纸区域，然后对截取后的图像进行优化处理。再调用OCR字符识别AIP，对截取后的图像进行识别分类。

2.具体实施

（1）人工标识提取：根据训练样本设计思路，我们只需要将图片中的蓝色联通区域提取出来即可。将原始图片转化为HSV图像，根据HSV各个通道的阈值，根据HSV阈值表，设置蓝色阈值参数。cv2.inRange(hsv,blue_lower,blue_upper)函数用来判断图像中每个像素是否在[low,upper]之间，如果是，则对应的像素赋值为255，即白色，其余的赋值为0，即黑色。

但是从提取出蓝色区域后的图像可以看出，虽然可以看出来图片中有一些白色噪点，以及字母矩形框周围有毛边。因此考虑对图像进行降噪优化。首先对提取蓝色区域后的图像进行模糊化 *** 作，将小的白色噪点过滤掉。再将图片进行二值化处理，只有0和1表示。对于字母矩形框周围的毛边，采用腐蚀和膨胀处理。其中腐蚀 *** 作可以腐蚀图像中的白色像素，从而消除白色小斑点，而膨胀处理将剩余的白色像素扩张并增长回去。经过优化 *** 作之后，噪点和白边可见明显地去除

之后查找轮廓，将轮廓转换为矩形，cv2.minAreaRect()函数可以将连续区域的像素点集合用最小矩形框出来。然后根据矩形的坐标，原图标注，并进行裁剪，即可得到目标区域。

发现此时错误框取了车玻璃，因此分析如何过滤掉车窗。从直观看，字母的蓝色区域和车窗的黑色区域应该很容易识别，但是根据HSV表可以看出，黑色和蓝色的各个通道阈值有交叉。通常思路为改变HSV取值范围，从而更精准地只提取出字母所在的蓝色矩形区域。但是考虑颜色相近时，以及光照的影响等，手动调参较为繁琐，鲁棒性差。因此考虑聚类，将原始图像的蓝色区域更加明显，聚类后的图像进行之前的提取步骤。由于聚类后的图像由于采用均值聚类，蓝色聚类中心附近的周围像素点均赋值为蓝色，因此颜色更加均匀。如下图可以看出，聚类后的矩形标注过滤掉了车窗，最后只截取了字母矩形框。

但是在后续图像处理中，发现有些图片背景中会有与字母矩形框颜色相同的区域。此时会框取出诸多蓝色矩形区域。

此时考虑在本次任务中，字母矩形框时标准A4纸，因此可以设计形态比，通过设置矩形区域长度、宽度、长宽比阈值过滤背景干扰。此过程在于通过读取训练样本中的每个图片中字母矩形的长度和和宽度，由于拍摄远近不同，因此不同图片的长宽可能相差较大，但是长宽比是基本一定的，因此加入长宽比阈值进行判断。整个判断的公式如下：

if h1-h2>0 and l1-l2>0:
   if h1-h2>150 and h1>0 and h2>0 and l1>0 and l2>0 and h1-h2<800 and l1-l2>250 and l1-l2<1000 and abs(float((l1-l2)/(h1-h2)-1.5))<0.3:

其中h1，h2分别代表矩形的最高点和最低点，l1和l2分别代表矩形水平方向的最远点和最近点。h1-h2代表宽度，l1-l2代表长度。通过设置形态比后，可以准确唯一截取出字母矩形。在接下来的OCR字符读取环节中，由于截取的矩形边缘有白色条纹影响，如图11所示，出现错误多识别现象。因此采取向内截取的方法，将标注出的矩形区域，向内部多截取一部分。在此实验中，采用宽度向内截取60，长度向内截取80。重新截取后的图像如图12。

python temp = image[h2+30:h1-30,l2+40:l1-40]

最后将训练集中的图片字母均提取出，可见达到了较好的效果。

但是仍会出现一种情况，即当字母矩形区域与背景蓝色区域是连通时，如图14所示，用形态比框取矩形区域不再适用，考虑OCR具有较为强大的识别能力，因此此类问题简化为截取原始图像的2/3区域，再进行识别。

（2）OCR字符识别：调用百度OCR字符识别API ```text=client.basicGeneral(image)```，进行字母识别。虽然OCR字符识别是全方位的，但是由于在自然场景中有诸多干扰。当测试图片中字母较歪时，会难以识别。因此加入循环旋转识别。如下图可见当对图片进行两次10°旋转后，能准确读取字母。 ![在这里插入图片描述](http://www.kaotop.com/file/tupian/20220423/124b0e6a4c594dc186fc5d6c36976598.png?x-oss-process=image/watermark,type_d3F5LXplbmhlaQ,shadow_50,text_Q1NETiBA5YiY5Yqg5rK56KaB5Yqq5Yqb,size_17,color_FFFFFF,t_70,g_se,x_16) 详细代码

# coding=utf-8
import os
from aip import AipOcr
from PIL import Image
import glob
import cv2
import numpy as np
import time
import matplotlib.pyplot as plt

# '''先对text文本进行清空  '''
# with open("20053057.txt",'a') as f:
#     f.truncate(0)
# '''文本清空结束'''

config = {
# 可以自己申请百度账号，申请完只需要改下边三个内容即可  网址：https://console.bce.baidu.com/ai/?_=1609926825165#/ai/ocr/app/detail~appId=2166744
'appId':'23492925',
'apiKey':'xjGgjAde0RT8i7vG27Y9wr23',
'secretKey':'ke7yfqSW8hmiyW7EVBolAM2H0PXn0O0g'
}
client = AipOcr(**config)
def get_file_content(filePath):
    with open(filePath,'rb') as fp:
        return fp.read()


for i in range(246,248):
    # time.sleep(0.5)
    # img= "detected_img/""cut_res"+str(i)+".jpg"
    img= "test/"+str(i)+".jpg"
    # piclevel = Image.open(img)
    # imgSize = piclevel.size #图片的大小
    # w = piclevel.width      #图片的宽
    # h = piclevel.height     #图片的高
    # print('w',w)
    # print('h',h)
    # print('size',imgSize)
    try:
        image = Image.open(img)
        # image.show()
    except:
        print('Open Error! Try again!')
        continue
    else:
        print("image"+str(i))
        """ 读取图片 """
        img=cv2.imread(img)
        # cv2.imshow('img',img)

        '''下列代码为聚类'''
        Z = img.reshape((-1,3))
        # convert to np.float32
        Z = np.float32(Z)
        # define criteria, number of clusters(K) and apply kmeans()
        criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 10, 1.0)
        K = 8
        ret,label,center=cv2.kmeans(Z,K,None,criteria,10,cv2.KMEANS_RANDOM_CENTERS)
        #print(center)
        #print(len(label))
        # Now convert back into uint8, and make original image
        center = np.uint8(center)
        res = center[label.flatten()]
        # print(res)
        img = res.reshape((img.shape))
        #print(res2)
        # cv2.imshow('img',img)
        '''聚类结束'''

        # print('img:',type(img),img.shape,img.dtype)
        #cv2.imshow('img',img)
        #cv2.waitKey(0)
        hsv=cv2.cvtColor(img,cv2.COLOR_BGR2HSV)
        # cv2.imshow('hsv',hsv)

        #提取蓝色区域
        blue_lower=np.array([100,50,50])
        blue_upper=np.array([124,255,255])
        mask=cv2.inRange(hsv,blue_lower,blue_upper)
        # print('mask',type(mask),mask.shape)
        # cv2.imshow('mask',mask)

        #模糊
        blurred=cv2.blur(mask,(9,9))
        # cv2.imshow('blurred',blurred)
        #二值化
        ret,binary=cv2.threshold(blurred,127,255,cv2.THRESH_BINARY)
        # cv2.imshow('blurred binary',binary)
        # cv2.imwrite("erzhi.jpg", binary)

        #使区域闭合无空隙
        kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (21, 7))
        closed = cv2.morphologyEx(binary, cv2.MORPH_CLOSE, kernel)
        # cv2.imshow('closed',closed)
        # cv2.imwrite("bihe.jpg", closed)
        #腐蚀和膨胀
        '''
        腐蚀 *** 作将会腐蚀图像中白色像素，以此来消除小斑点，
        而膨胀 *** 作将使剩余的白色像素扩张并重新增长回去。
        '''
        erode=cv2.erode(closed,None,iterations=4)
        # cv2.imshow('erode',erode)
        dilate=cv2.dilate(erode,None,iterations=4)
        # cv2.imshow('dilate',dilate)
        # cv2.imwrite("pengzhang.jpg", dilate)
        # 查找轮廓
        contours, hierarchy=cv2.findContours(dilate.copy(), cv2.RETR_EXTERNAL,cv2.CHAIN_APPROX_SIMPLE)
        # print('轮廓个数：',len(contours))
        res=img.copy()
        # j=0
        # temp=0
        cut=0   #是否能截图的标志
        for con in contours:
            #轮廓转换为矩形
            rect=cv2.minAreaRect(con)
            #矩形转换为box
            box=np.int0(cv2.boxPoints(rect))
            #在原图画出目标区域
            cv2.drawContours(res,[box],-1,(0,0,255),2)
            # print([box])
            #计算矩形的行列
            h1=max([box][0][0][1],[box][0][1][1],[box][0][2][1],[box][0][3][1])
            h2=min([box][0][0][1],[box][0][1][1],[box][0][2][1],[box][0][3][1])
            l1=max([box][0][0][0],[box][0][1][0],[box][0][2][0],[box][0][3][0])
            l2=min([box][0][0][0],[box][0][1][0],[box][0][2][0],[box][0][3][0])
            # print('h1',h1)
            # print('h2',h2)
            # print('l1',l1)
            # print('l2',l2)

            #加上防错处理，确保裁剪区域无异常 if h1-h2>0 and l1-l2>0:
            if h1-h2>150 and h1>0 and h2>0 and l1>0 and l2>0 and h1-h2<800 and l1-l2>250 and l1-l2<1000 and abs(float((l1-l2)/(h1-h2)-1.5))<0.3:
                #裁剪矩形区域
                temp=img[h2+30:h1-30,l2+40:l1-40]
                cut=1
        if cut==0:
        # try:
        #     cv2.imwrite("./pic/"+str(i)+".jpg", temp)
        # except:
            print('不能框图，截取中间部位')
            piclevel = Image.open("test/"+str(i)+".jpg")
            w = piclevel.width      #图片的宽
            h = piclevel.height     #图片的高
            w_center=int(w/2)
            h_center=int(h/2)
            # print('w',w)
            # print('h',h)
            w_l=int(w/3)
            h_l=int(h/3)
            temp=img[w_center-w_l:w_center+w_l,h_center-h_l:h_center+h_l]
            # temp=img[w_center-400:w_center+400,h_center-200:h_center+200]
            # j=j+1
            #显示裁剪后的标志
            # cv2.imshow('sign'+str(j),temp)
        cv2.imwrite("./pic/"+str(i)+".jpg", temp)

    '''开始进行识别'''
    # time.sleep(1)
    # img= "VOCdevkit/VOC2007/JPEGImages/C"+str(i)+".jpg"
    # img= "detected_img/""cut_res"+str(i)+".jpg"
    pic= "pic/"+str(i)+".jpg"
    # img= "img/"+str(i)+".jpg"
    try:
        image = Image.open(pic)
        # image.show()
    except:
        print('Open Error! Try again!')
        continue
    else:
        # print("image"+str(i))
        """ 读取图片 """
        # image_1=get_file_content(img)
        # text = client.basicAccurate(image_1)
        # result = text["words_result"]

        #  '''下列代码为聚类'''
        # image=cv2.imread(img)
        # # image = Image.open(img)
        # Z = image.reshape((-1,3))
        # # convert to np.float32
        # Z = np.float32(Z)
        # # define criteria, number of clusters(K) and apply kmeans()
        # criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 10, 1.0)
        # K = 8
        # ret,label,center=cv2.kmeans(Z,K,None,criteria,10,cv2.KMEANS_RANDOM_CENTERS)
        # #print(center)
        # #print(len(label))
        # # Now convert back into uint8, and make original image
        # center = np.uint8(center)
        # res = center[label.flatten()]
        # # print(res)
        # image = res.reshape((image.shape))
        # '''聚类结束'''
        # hsv=cv2.cvtColor(image,cv2.COLOR_BGR2HSV)
        # # cv2.imshow('hsv',hsv)
        # #提取蓝色区域
        # blue_lower=np.array([100,50,50])
        # blue_upper=np.array([124,255,255])
        # mask=cv2.inRange(hsv,blue_lower,blue_upper)
        # print('mask',type(mask),mask.shape)
        # # cv2.imshow('mask',mask)
        # #模糊
        # blurred=cv2.blur(mask,(9,9))
        # # cv2.imshow('blurred',blurred)
        # #二值化
        # ret,binary=cv2.threshold(blurred,127,255,cv2.THRESH_BINARY)
        # # cv2.imshow('blurred binary',binary)

        # cv2.imwrite("./img/"+str(i)+".jpg", binary)

        result=""
        for k in range(8):
            time.sleep(0.5)
            image_1=get_file_content(pic)
            '''basicGeneral一般精度5000张限制'''
            text = client.basicGeneral(image_1)    
            '''高精度500张限制'''
            # text = client.basicAccurate(image_1)  
            result = text["words_result"]
            if len(result)!=0:
                break
            image_1 = Image.open(pic)
            image_1=image_1.rotate(5)
            image_1.save(pic)
            print("rotate:",k+1)
        for j in result:
            print(j["words"])
        '''以下用来进行识别和保存二进制txt文件'''
        if len(result)==0:
            num=0
        elif len(result)>1:
            num=0
        elif len(result)==1:
            for p in result:
                if p["words"]=='A':
                    num=1
                elif p["words"]=='B':
                    num=2
                elif p["words"]=='C':
                    num=3
                elif p["words"]=='D':
                    num=4
                elif p["words"]=='E':
                    num=5
                else:
                    num=0
        with open("20053057.txt",'a') as f:
            f.write(str(num))
            f.write('\n')
            f.close()