
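I am a university student serving in my school's student union, which I will be leaving at the end of this semester, and I wanted to leave it a small gift before I go. After every recruitment round, each department head has to go through the new members' class schedules by hand and compile a free-period timetable for assigning routine duties, which takes a lot of time. So I decided to build a small project that exports the free-period timetable in one click, to make things easier for the students who come after me.
My skills are still immature and much of this was improvised, so treat parts of the code as a reference only. Better ways of writing it are welcome.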

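Contents

Part 1: Register a Baidu Cloud account and create a text recognition application
- 1: Visit the Baidu AI Cloud website and register an account
- 2: After registering you land on the console page; click “Claim free resources” (领取免费资源)
- 3: Complete personal verification to get more resources
- 4: Tick the resources you need (the screenshot is marked with small numbers)
Part 2: Create the application
- 1: Click “Create application”
  - Note: the iOCR interface is already included in the text recognition application here
- 2: After entering the information, click “Create now”
- 3: View the application
- 4: Get the all-important AppID, API Key (AK), and Secret Key (SK)
Part 3: Set up a custom iOCR recognition template
- 1: Visit Baidu AI
- 2: Click “Open capabilities” (开放能力) in the top-left corner, find “Text recognition” (文字识别), then “iOCR General” (iOCR通用版)
- 3: On the page that opens, click “Use now”
- 4: Create a template
- 5: Upload an image, name it, and start creating
- 6: Create reference fields
  - Note:
- 7: Box-select the recognition areas (as needed)
- 8: Once the steps above are done, you can publish
- 9: Get the important template ID, a key parameter in the later code (the template is looked up through it)
Part 4: Read the API documentation and build the core code
- 1: Open the documentation center, find the iOCR docs, then the API docs
- 2: You can skip everything else, but read the parameters section carefully (you will see why in a moment)
- 3: Scroll down to the sample request code and choose Python
  - Then Ctrl+C, Ctrl+V, and take it away
  - Note: after reading the parameters, keep only templateSign
- 4: Get the access_token (very important)
  - Note: what the official sample returns
  - What we want
  - The two parameters in the code above
Part 5: Export the free-period timetable in one click
- Extract the key information with regular expressions
Part 6: Complete code (two places need manual changes)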

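Part 1: Register a Baidu Cloud account and create a text recognition application

1: Visit the Baidu AI Cloud website and register an account

https://login.bce.baidu.com/account=&redirect=http%3A%2F%2Fconsole.bce.baidu.com%2Fai%2F%3F_%3D1650510786757%26fromai%3D1#/ai/ocr/overview/resource/list

2: After registering you land on the console page; click “Claim free resources” (领取免费资源)

3: Complete personal verification to get more resources

4: Tick the resources you need (the screenshot is marked with small numbers)

Once claimed, the free test resources take effect within 30 minutes. After that, move on to Part 2.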

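Part 2: Create the application

1: Click “Create application”

The text recognition options are all ticked by default, so there is nothing to change.

Note: the iOCR interface is already included in the text recognition application here

2: After entering the information, click “Create now”

3: View the application

4: Get the all-important AppID, API Key (AK), and Secret Key (SK)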

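Part 3: Set up a custom iOCR recognition template

1: Visit Baidu AI

https://ai.baidu.com/

2: Click “Open capabilities” (开放能力) in the top-left corner, find “Text recognition” (文字识别), then “iOCR General” (iOCR通用版)

(Follow the numbers and small arrows in the screenshot.)

3: On the page that opens, click “Use now”

4: Create a template

5: Upload an image, name it, and start creating

6: Create reference fields

(I shrank the picture here to protect privacy, since it is not my own timetable. PS: it is my roommate's.)

Note:

When creating reference fields, a field must not span lines (that is, lines of text in the uploaded image): “a reference field may not contain two lines.”
**Box-select at least four reference fields and spread them toward the four corners as far as possible.** This matters a lot: if the template gets this wrong, later images with the same layout will not be recognised either, like this:

If you see this, the reference fields were chosen badly.
So choose the reference fields carefully. In my experience it pays to follow the recommendation: spread them around the edges and make sure text can be recognised in every one of them (the recognition result of a reference field must not be empty).
For details, see the text recognition FAQ: https://ai.baidu.com/ai-doc/OCR/ik3h7y8b1

7: Box-select the recognition areas (as needed)

8: Once the steps above are done, you can publish

9: Get the important template ID. It is a key parameter in the later code (the template is looked up through it)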

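Part 4: Read the API documentation and build the core code

1: Open the documentation center, find the iOCR docs, then the API docs

2: You can skip everything else, but read the parameters section carefully (you will see why in a moment)

3: Scroll down to the sample request code and choose Python

Then Ctrl+C, Ctrl+V, and take it away

Here it is (I know you are lazy, but do not copy it verbatim yet; read on first. It contains the parameters we need to pass, which are marked in the code):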

import base64
import requests
import sys

if sys.version_info.major == 2:
    from urllib import quote
else:
    from urllib.parse import quote

headers = {
    'Content-Type': "application/x-www-form-urlencoded",
    'charset': "utf-8"
}

if __name__ == '__main__':
    recognise_api_url = "https://aip.baidubce.com/rest/2.0/solution/v1/iocr/recognise"

    ###################################### change these
    access_token = "your_access_token"
    templateSign = "your_template_sign"
    # classifierId = "your_classifier_id"
    ###################################### change these

    # Path to the test image
    image_path = "your_image_path"
    try:
        with open(image_path, 'rb') as f:
            image_data = f.read()
        if sys.version_info.major == 2:
            image_b64 = base64.b64encode(image_data).replace("\r", "")
        else:
            image_b64 = base64.b64encode(image_data).decode().replace("\r", "")

        # Request body for template recognition
        recognise_bodys = "access_token=" + access_token + "&templateSign=" + templateSign + \
                "&image=" + quote(image_b64.encode("utf8"))
        # Request body for classifier recognition. It stays commented out together
        # with classifierId above; left active it raises a NameError.
        # classifier_bodys = "access_token=" + access_token + "&classifierId=" + classifierId + \
        #         "&image=" + quote(image_b64.encode("utf8"))

        # Send the template recognition request
        response = requests.post(recognise_api_url, data=recognise_bodys, headers=headers)
        # Send the classifier recognition request
        # response = requests.post(recognise_api_url, data=classifier_bodys, headers=headers)

        print(response.text)
    except Exception as e:
        print(e)

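Note: after reading the parameters section, only one of them needs to stay: templateSign

The value to pass in is the template ID obtained in Part 3.

The only remaining parameter is access_token, so all that is left is to obtain it.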
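
As a rough sketch of what the request body contains once only `templateSign` is kept, the snippet below builds it with `urllib.parse.urlencode` instead of manual string concatenation. The helper name `build_recognise_body` and the sample values are made up for illustration; the three field names (`access_token`, `templateSign`, `image`) are the ones used in the official sample above.

```python
import base64
from urllib.parse import parse_qs, urlencode


def build_recognise_body(access_token, template_sign, image_bytes):
    """Build the form-encoded body for a template recognition request.

    Hypothetical helper: it mirrors the three fields of the official
    sample, but lets urlencode handle the percent-encoding.
    """
    image_b64 = base64.b64encode(image_bytes).decode("ascii")
    return urlencode({
        "access_token": access_token,
        "templateSign": template_sign,
        "image": image_b64,
    })


# Placeholder values, not real credentials or real image data.
body = build_recognise_body("TOKEN", "TEMPLATE_ID", b"\xff\xfb\xfe")

# Decoding the body again shows which fields the server would receive.
fields = parse_qs(body)
print(sorted(fields))
```

Unlike `quote` in the official sample, `urlencode` also escapes `/`; both forms should decode to the same base64 string on the receiving side.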

4: Get the access_token (very important)

Official documentation: https://ai.baidu.com/ai-doc/REFERENCE/Ck3dwjhhu

The official sample looks like this (do not copy it as is; read on):

import requests 
host = 'https://aip.baidubce.com/oauth/2.0/token?grant_type=client_credentials&client_id=【官网获取的AK】&client_secret=【官网获取的SK】'
response = requests.get(host)
if response:
    print(response.json())

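Note: the official sample prints the entire JSON response.

What we want from it is the access_token field, not the refresh_token that appears at the beginning (this is where I tripped up at first).

So adjust the code to pick that field out of the response dictionary: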

import requests 
host = 'https://aip.baidubce.com/oauth/2.0/token?grant_type=client_credentials&client_id=【官网获取的AK】&client_secret=【官网获取的SK】'
response = requests.get(host)
if response:
  Access=response.json().get('access_token')

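The two placeholders in the code above,

【官网获取的AK】 (the AK from the console)
【官网获取的SK】 (the SK from the console)
are the AK and SK obtained in Part 2. Substitute your own values and the request returns the access_token,
which you then plug into the code from step 3.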
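
The two steps can also be separated into small functions, which makes the `access_token` versus `refresh_token` mix-up harder to repeat. This is only a sketch: the function names `build_token_url` and `pick_access_token` are hypothetical, and the sample response is invented (a real one comes from `response.json()`).

```python
from urllib.parse import urlencode

TOKEN_ENDPOINT = "https://aip.baidubce.com/oauth/2.0/token"


def build_token_url(api_key, secret_key):
    """Return the token URL with AK and SK filled in as query parameters."""
    query = urlencode({
        "grant_type": "client_credentials",
        "client_id": api_key,
        "client_secret": secret_key,
    })
    return TOKEN_ENDPOINT + "?" + query


def pick_access_token(payload):
    """Return the access_token field of a decoded token response.

    Raises ValueError when the field is missing, so that a wrong AK/SK is
    noticed here instead of later in the recognition request.
    """
    token = payload.get("access_token")
    if not token:
        raise ValueError("no access_token in response: %r" % (payload,))
    return token


# Made-up response for illustration only.
sample = {"refresh_token": "25.aaa", "expires_in": 2592000, "access_token": "24.bbb"}
print(build_token_url("my-ak", "my-sk"))
print(pick_access_token(sample))
```

In real use, the URL would be passed to `requests.get` and the decoded JSON to `pick_access_token`.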

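Part 5: Export the free-period timetable in one click

Extract the key information with regular expressions

Once everything above is done, the custom Baidu iOCR interface can be called to recognise the images.

The response is one long string of characters, so regular expressions are used here to pull the data out.
If you are not familiar with regular expressions, see:
Regular expressions: https://blog.csdn.net/weixin_55159605/article/details/124085670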
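
To make the regular-expression step concrete, here is a small sketch that runs the same kind of patterns as the full code over a made-up response string. The sample text (the student name 张三 and the course strings) is invented and far shorter than a real iOCR response; the assumption, taken from the patterns in the full code, is that each recognised field appears as a `"word_name":"…","word":"…"` pair.

```python
import re

# Invented, heavily shortened stand-in for response.text.
sample = (
    '[{"word_name":"姓名","word":"张三课表"},'
    '{"word_name":"周一","word":"星期一高等数学(1-2节)1-16周"},'
    '{"word_name":"周二","word":"星期二"}]'
)

# Student name: whatever precedes "课表" in a "word" value.
name = re.findall(r'"word":"(.*?)课表"}', sample)[0]

# Text recognised for Monday and Tuesday, without the weekday prefix.
monday = re.findall(r'"word_name":"周一","word":"星期一(.*?)"', sample)[0]
tuesday = re.findall(r'"word_name":"周二","word":"星期二(.*?)"', sample)[0]

print(name)
print(monday)
print(repr(tuesday))  # empty string: nothing was recognised for Tuesday
```

Indexing with `[0]` raises an `IndexError` when a pattern finds nothing, which is why the full code wraps the extraction in `try`/`except`.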

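What remains is bits and pieces: reading the image files, applying the regular expressions, filtering out the key information, and saving it to Word files. I will not go through it here; see the code for details **(you also need to create a folder named 图片 in the same directory and put the images in it).**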
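
The filtering step boils down to one question per weekday and period: does the recognised text contain that period's marker? A minimal sketch of that check follows, assuming the four markers the full code looks for (`1-2节`, `3-4节`, `5-6节`, `7-8节`). The names `PERIODS` and `free_periods` and the example data are made up for illustration.

```python
# Period label used in the output file names -> text that marks a class
# in that period.
PERIODS = {"一大": "1-2节", "二大": "3-4节", "三大": "5-6节", "四大": "7-8节"}


def free_periods(day_texts):
    """Return (day, period label) pairs for which no class marker was found.

    day_texts maps a weekday name to the text recognised for that day.
    """
    free = []
    for day, text in day_texts.items():
        for label, marker in PERIODS.items():
            if marker not in text:
                free.append((day, label))
    return free


# Invented example: one class on Monday in periods 1-2, nothing on Tuesday.
example = {"周一": "高等数学(1-2节)1-16周", "周二": ""}
for day, label in free_periods(example):
    print(day, label)
```

Like the full code, this ignores week ranges: a class that only runs in some weeks still counts as occupying its period.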

Part 6: Complete code (two places need manual changes)

import base64
import os
import re
from urllib.parse import quote

import requests

# Get the access_token here
######################################  change this
API_KEY = "your API Key (AK)"
SECRET_KEY = "your Secret Key (SK)"
######################################  change this
token_url = ('https://aip.baidubce.com/oauth/2.0/token?grant_type=client_credentials'
             '&client_id=' + API_KEY + '&client_secret=' + SECRET_KEY)
response = requests.get(token_url)
# Take the access_token field, not refresh_token
ACCESS = response.json().get('access_token')

headers = {
    'Content-Type': "application/x-www-form-urlencoded",
    'charset': "utf-8"
}

# Field name in the iOCR template -> weekday prefix at the start of the recognised text
DAYS = {'周一': '星期一', '周二': '星期二', '周三': '星期三', '周四': '星期四', '周五': '星期五'}
# Period label used in the output file name -> text that marks a class in that period
SLOTS = {'一大': '1-2节', '二大': '3-4节', '三大': '5-6节', '四大': '7-8节'}


if __name__ == '__main__':
    recognise_api_url = "https://aip.baidubce.com/rest/2.0/solution/v1/iocr/recognise"

    access_token = ACCESS
    ######################################  change this
    templateSign = "your template ID"
    ######################################  change this

    # The timetable images live in the folder 图片 next to the script;
    # the results go into the folder 空课表
    base_dir = os.getcwd()
    dir_path = os.path.join(base_dir, '图片')
    out_dir = os.path.join(base_dir, '空课表')
    os.makedirs(out_dir, exist_ok=True)

    # Go through every timetable image in the folder
    for file in os.listdir(dir_path):
        image_path = os.path.join(dir_path, file)
        try:
            with open(image_path, 'rb') as f:
                image_data = f.read()
            image_b64 = base64.b64encode(image_data).decode()

            # Request body for template recognition
            recognise_bodys = "access_token=" + access_token + "&templateSign=" + templateSign + \
                    "&image=" + quote(image_b64.encode("utf8"))

            # Send the template recognition request
            response = requests.post(recognise_api_url, data=recognise_bodys, headers=headers)
            result = response.text

            # Find the student's name
            name = re.findall('"word":"(.*?)课表"}', result)[0]

            for day, prefix in DAYS.items():
                # Text recognised for this weekday, without the weekday prefix
                day_text = re.findall('"word_name":"' + day + '","word":"' + prefix + '(.*?)"', result)[0]
                for label, period in SLOTS.items():
                    # One file per weekday and period, e.g. 周一一大空格表.docx.
                    # The names are appended as plain GBK text; only the extension says .docx.
                    out_path = os.path.join(out_dir, day + label + '空格表.docx')
                    with open(out_path, 'a+', encoding='GBK') as f:
                        # No marker for this period means the student is free then
                        if period not in day_text:
                            f.write(name + '\n')  # one name per line
        except Exception as e:
            print(e)

Original address: http://outofmemory.cn/langs/755894.html
