转自AI Studio,原文链接:

【官方】十分钟完成 PP-
OCRv3 识别全流程实战 - 飞桨AI Studio

十分钟完成 PP-OCRv3 识别全流程实战

项目地址:PaddleOCR github 地址: https://github.com/PaddlePaddle/PaddleOCR

PaddleOCR是百度开源的超轻量级OCR模型库,提供了数十种文本检测、识别模型,旨在打造一套丰富、领先、实用的文字检测、识别模型/工具库,助力使用者训练出更好的模型,并应用落地。同时PaddleOCR也几经更新, 🔥在2022.5.9 发布最新版本PaddleOCR release/2.5 :

  • 发布PP-OCRv3,速度可比情况下,中文场景效果相比于PP-OCRv2再提升5%,英文场景提升11%,80语种多语言模型平均识别准确率提升5%以上;
  • 发布半自动标注工具PPOCRLabelv2:新增表格文字图像、图像关键信息抽取任务和不规则文字图像的标注功能;
  • 发布OCR产业落地工具集:打通22种训练部署软硬件环境与方式,覆盖企业90%的训练部署环境需求;
  • 发布交互式OCR开源电子书《动手学OCR》,覆盖OCR全栈技术的前沿理论与代码实践,并配套教学视频。

本教程旨在帮助使用者快速了解PP-OCRv3识别,并掌握其使用方式,包括:

    1. PP-OCR3识别快速使用
    1. 十分钟完成文本识别模型的训练和预测方式

最后带来PP-OCRv3直播预告,敬请期待!

1 PP-OCRv3识别快速使用

本节介绍如何使用PaddleOCR的轻量级模型完成文本识别的任务。

1.1 准备运行环境

首先,安装PaddleOCR的依赖库。

In [ ]

import os
# 修改代码运行的默认目录为 /home/aistudio/
os.chdir("/home/aistudio")
# 如果git clone方式下载速度慢,您可直接在github中下载PaddleOCR的dygraph分支的zip压缩文件,然后上传到工作环境中解压使用
#!unzip PaddleOCR-dygraph.zip
!git clone -b dygraph https://github.com/PaddlePaddle/PaddleOCR.git

In [ ]

# 安装依赖库
os.chdir("/home/aistudio/PaddleOCR")
!pip install -r requirements.txt -i https://mirror.baidu.com/pypi/simple

1.2. 快速预测文字内容

测试图片:

In [8]


import os
os.chdir('/home/aistudio/PaddleOCR')
# 也可安装paddleocr whl包进行快速使用
# !pip install paddleocr
from paddleocr import PaddleOCR

ocr = PaddleOCR()  # need to run only once to download and load model into memory
img_path = '/home/aistudio/PaddleOCR/doc/imgs_words/en/word_1.png'
result = ocr.ocr(img_path, det=False)
for line in result:
    print(line)
  0%|          | 0.00/3.67M [00:00<?, ?iB/s]
download https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_det_infer.tar to /home/aistudio/.paddleocr/whl/det/ch/ch_PP-OCRv3_det_infer/ch_PP-OCRv3_det_infer.tar
100%|██████████| 3.67M/3.67M [00:00<00:00, 6.36MiB/s]
  0%|          | 0.00/11.9M [00:00<?, ?iB/s]
download https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_rec_infer.tar to /home/aistudio/.paddleocr/whl/rec/ch/ch_PP-OCRv3_rec_infer/ch_PP-OCRv3_rec_infer.tar
100%|██████████| 11.9M/11.9M [00:00<00:00, 42.0MiB/s]
 19%|█▉        | 279k/1.45M [00:00<00:00, 2.67MiB/s]
download https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar to /home/aistudio/.paddleocr/whl/cls/ch_ppocr_mobile_v2.0_cls_infer/ch_ppocr_mobile_v2.0_cls_infer.tar
100%|██████████| 1.45M/1.45M [00:00<00:00, 4.66MiB/s]
[2022/05/05 14:53:56] ppocr DEBUG: Namespace(alpha=1.0, benchmark=False, beta=1.0, cls_batch_num=6, cls_image_shape='3, 48, 192', cls_model_dir='/home/aistudio/.paddleocr/whl/cls/ch_ppocr_mobile_v2.0_cls_infer', cls_thresh=0.9, cpu_threads=10, crop_res_save_dir='./output', det=True, det_algorithm='DB', det_db_box_thresh=0.6, det_db_score_mode='fast', det_db_thresh=0.3, det_db_unclip_ratio=1.5, det_east_cover_thresh=0.1, det_east_nms_thresh=0.2, det_east_score_thresh=0.8, det_fce_box_type='poly', det_limit_side_len=960, det_limit_type='max', det_model_dir='/home/aistudio/.paddleocr/whl/det/ch/ch_PP-OCRv3_det_infer', det_pse_box_thresh=0.85, det_pse_box_type='quad', det_pse_min_area=16, det_pse_scale=1, det_pse_thresh=0, det_sast_nms_thresh=0.2, det_sast_polygon=False, det_sast_score_thresh=0.5, draw_img_save_dir='./inference_results', drop_score=0.5, e2e_algorithm='PGNet', e2e_char_dict_path='./ppocr/utils/ic15_dict.txt', e2e_limit_side_len=768, e2e_limit_type='max', e2e_model_dir=None, e2e_pgnet_mode='fast', e2e_pgnet_score_thresh=0.5, e2e_pgnet_valid_set='totaltext', enable_mkldnn=False, fourier_degree=5, gpu_mem=500, help='==SUPPRESS==', image_dir=None, ir_optim=True, label_list=['0', '180'], lang='ch', layout=True, layout_label_map=None, layout_path_model='lp://PubLayNet/ppyolov2_r50vd_dcn_365e_publaynet/config', max_batch_size=10, max_text_length=25, min_subgraph_size=15, mode='structure', ocr=True, ocr_version='PP-OCRv3', output='./output', precision='fp32', process_id=0, rec=True, rec_algorithm='CRNN', rec_batch_num=6, rec_char_dict_path='/home/aistudio/PaddleOCR/ppocr/utils/ppocr_keys_v1.txt', rec_image_shape='3, 32, 320', rec_model_dir='/home/aistudio/.paddleocr/whl/rec/ch/ch_PP-OCRv3_rec_infer', save_crop_res=False, save_log_path='./log_output/', scales=[8, 16, 32], show_log=True, structure_version='PP-STRUCTURE', table=True, table_char_dict_path=None, table_max_len=488, table_model_dir=None, total_process_num=1, type='ocr', use_angle_cls=False, use_dilation=False, use_gpu=True, use_mp=False, use_onnx=False, use_pdserving=False, use_space_char=True, use_tensorrt=False, vis_font_path='./doc/fonts/simfang.ttf', warmup=False)
 
[2022/05/05 14:53:59] ppocr WARNING: Since the angle classifier is not initialized, the angle classifier will not be uesd during the forward process
('JOINT', 0.9179949760437012)

2. 训练文字识别模型

本节提供了PaddleOCR文本识别任务的全流程指南,包括数据准备、模型训练、调优、评估、预测,各个阶段的详细说明:

2.1. 数据准备

2.1.1 自定义数据集

下面以通用数据集为例, 介绍如何准备数据集:

  • 训练集

建议将训练图片放入同一个文件夹,并用一个txt文件(rec_gt_train.txt)记录图片路径和标签,txt文件里的内容如下:

注意: txt文件中默认请将图片路径和图片标签用 \t 分割,如用其他方式分割将造成训练报错。

" 图像文件名                 图像标注信息 "

train_data/rec/train/word_001.jpg   简单可依赖
train_data/rec/train/word_002.jpg   用科技让复杂的世界更简单
...

最终训练集应有如下文件结构:

|-train_data
  |-rec
    |- rec_gt_train.txt
    |- train
        |- word_001.png
        |- word_002.jpg
        |- word_003.jpg
        | ...

除上述单张图像为一行格式之外,PaddleOCR也支持对离线增广后的数据进行训练,为了防止相同样本在同一个batch中被多次采样,我们可以将相同标签对应的图片路径写在一行中,以列表的形式给出,在训练中,PaddleOCR会随机选择列表中的一张图片进行训练。对应地,标注文件的格式如下。

["11.jpg", "12.jpg"]   简单可依赖
["21.jpg", "22.jpg", "23.jpg"]   用科技让复杂的世界更简单
3.jpg   ocr

上述示例标注文件中,"11.jpg"和"12.jpg"的标签相同,都是简单可依赖,在训练的时候,对于该行标注,会随机选择其中的一张图片进行训练。

  • 验证集

同训练集类似,验证集也需要提供一个包含所有图片的文件夹(test)和一个rec_gt_test.txt,验证集的结构如下所示:

|-train_data
  |-rec
    |- rec_gt_test.txt
    |- test
        |- word_001.jpg
        |- word_002.jpg
        |- word_003.jpg
        | ...

2.1.2 数据下载

  • ICDAR2015

若您本地没有数据集,可以在官网下载 ICDAR2015 数据,用于快速验证。也可以参考DTRB ,下载 benchmark 所需的lmdb格式数据集。

如果你使用的是icdar2015的公开数据集,PaddleOCR 提供了一份用于训练 ICDAR2015 数据集的标签文件,通过以下方式下载:

# 训练集标签
wget -P ./train_data/ic15_data  https://paddleocr.bj.bcebos.com/dataset/rec_gt_train.txt
# 测试集标签
wget -P ./train_data/ic15_data  https://paddleocr.bj.bcebos.com/dataset/rec_gt_test.txt

PaddleOCR 也提供了数据格式转换脚本,可以将ICDAR官网 label 转换为PaddleOCR支持的数据格式。 数据转换工具在 ppocr/utils/gen_label.py, 这里以训练集为例:

# 将官网下载的标签文件转换为 rec_gt_label.txt
python gen_label.py --mode="rec" --input_path="{path/of/origin/label}" --output_label="rec_gt_label.txt"

数据样式格式如下,(a)为原始图片,(b)为每张图片对应的 Ground Truth 文本文件: 

我们在 ~/data/data34824/ 目录下准备了数据集,可以使用如下指令解压数据文件。

In [9]

!mkdir train_data && cd ./train_data/ && mkdir -p ic15_data && cd ic15_data && cp ~/data/data34824/ic15_rec.zip ./ && unzip -o -q ic15_rec.zip && tar xf ic15.tar

2.2. 开始训练

PaddleOCR提供了训练脚本、评估脚本和预测脚本,本节将以 PP-OCRv3中文识别模型为例:

首先下载pretrain model,您可以下载训练好的模型在 icdar2015 数据上进行finetune

In [10]

!wget -P ./pretrain_models/ https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_rec_train.tar

# 解压模型参数
!cd ./pretrain_models/ && tar -xf ch_PP-OCRv3_rec_train.tar && rm -rf ch_PP-OCRv3_rec_train.tar
--2022-05-05 14:54:16--  https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_rec_train.tar
正在解析主机 paddleocr.bj.bcebos.com (paddleocr.bj.bcebos.com)... 182.61.200.229, 182.61.200.195, 2409:8c04:1001:1002:0:ff:b001:368a
正在连接 paddleocr.bj.bcebos.com (paddleocr.bj.bcebos.com)|182.61.200.229|:443... 已连接。
已发出 HTTP 请求,正在等待回应... 200 OK
长度: 287467520 (274M) [application/x-tar]
正在保存至: “./pretrain_models/ch_PP-OCRv3_rec_train.tar”

ch_PP-OCRv3_rec_tra 100%[===================>] 274.15M  47.9MB/s    in 8.6s    

2022-05-05 14:54:24 (31.9 MB/s) - 已保存 “./pretrain_models/ch_PP-OCRv3_rec_train.tar” [287467520/287467520])

2.2.1 启动训练

注意

需将configs/rec/PP-OCRv3/ch_PP-OCRv3_rec.yml中的训练和评估数据集路径修改为ic15的数据集路径:

Train:
  dataset:
    name: SimpleDataSet
    data_dir: ./train_data/ic15_data/
    ext_op_transform_idx: 1
    label_file_list: ["./train_data/ic15_data/rec_gt_train.txt"]
    ......
Eval:   
  dataset:
    name: SimpleDataSet
    data_dir: ./train_data/ic15_data
    label_file_list: ["./train_data/ic15_data/rec_gt_test.txt"]

如果您安装的是cpu版本,请将配置文件中的 use_gpu 字段修改为false

启动训练命令很简单,指定好配置文件即可。另外在命令行中可以通过 -o 修改配置文件中的参数值。启动训练命令如下所示

其中:

  • Global.pretrained_model: 加载的预训练模型路径
  • Global.character_dict_path : 字典路径(这里只支持26个小写字母+数字)
  • Global.eval_batch_step : 评估频率
  • Global.epoch_num: 总训练轮数

如果训练速度慢,可去掉数据增强,但是当数据量较少,应用场景复杂时,建议保留数据增强,可提高模型泛化性和精度。

Train:
  dataset:
    name: SimpleDataSet
    data_dir: ./train_data/ic15_data/
    ext_op_transform_idx: 1
    label_file_list: ["./train_data/ic15_data/rec_gt_train.txt"]
    transforms:
    - DecodeImage:
        img_mode: BGR
        channel_first: false
    # - RecConAug:
    #     prob: 0.5
    #     ext_data_num: 2
    #     image_shape: [48, 320, 3]
    # - RecAug:

In [ ]

# 由于预训练模型提供的是蒸馏模型,需先将Student模型的参数提取出
import paddle
params = paddle.load('./pretrain_models/ch_PP-OCRv3_rec_train/best_accuracy' + '.pdparams')
new_state_dict = {}
for k1 in params.keys():
    if 'Student.' in k1:
        new_state_dict[k1.replace('Student.','')] = params[k1]
        # print(k1)
paddle.save(new_state_dict, './pretrain_models/ch_PP-OCRv3_rec_train/best_accuracy'+'_new.pdparams')

# CPU 训练
# 训练icdar15英文数据 训练日志会自动保存为 "{save_model_dir}" 下的train.log
# !python3 tools/train.py -c configs/rec/PP-OCRv3/ch_PP-OCRv3_rec.yml \
#  -o Global.pretrained_model=./pretrain_models/ch_PP-OCRv3_rec_train/best_accuracy_new Global.use_gpu=False \
#     Global.character_dict_path=ppocr/utils/en_dict.txt \
#     Global.eval_batch_step=[0,200] \
#     Global.epoch_num=40


# GPU训练 支持单卡,多卡训练
#单卡训练
!python3 tools/train.py -c configs/rec/PP-OCRv3/ch_PP-OCRv3_rec.yml \
 -o Global.pretrained_model=./pretrain_models/ch_PP-OCRv3_rec_train/best_accuracy_new\
    Global.character_dict_path=ppocr/utils/en_dict.txt \
    Global.eval_batch_step=[0,200] \
    Global.epoch_num=40
#多卡训练,通过--gpus参数指定卡号
#!python3 -m paddle.distributed.launch --gpus '0,1,2,3'  tools/train.py -c configs/rec/PP-OCRv3/ch_PP-OCRv3_rec.yml -o Global.pretrained_model=./pretrain_models/ch_PP-OCRv3_rec_train/best_accuracy

2.3. 模型评估与预测

2.3.1 评估

训练中模型参数默认保存在Global.save_model_dir目录下。在评估指标时,需要设置Global.checkpoints指向保存的参数文件。评估数据集可以通过 configs/rec/PP-OCRv3/ch_PP-OCRv3_rec.yml 修改Eval中的 label_file_path 设置。

In [22]


# GPU 评估
!python3 -m paddle.distributed.launch --gpus '0' tools/eval.py -c configs/rec/PP-OCRv3/ch_PP-OCRv3_rec.yml -o Global.checkpoints=./output/rec_ppocr_v3/best_accuracy Global.character_dict_path=ppocr/utils/en_dict.txt
# CPU 评估, Global.checkpoints 为待测权重
# !python3 tools/eval.py -c configs/rec/PP-OCRv3/ch_PP-OCRv3_rec.yml -o Global.checkpoints=./output/rec_ppocr_v3/best_accuracy Global.use_gpu=False Global.character_dict_path=ppocr/utils/en_dict.txt
-----------  Configuration Arguments -----------
backend: auto
elastic_server: None
force: False
gpus: 0
heter_devices: 
heter_worker_num: None
heter_workers: 
host: None
http_port: None
ips: 127.0.0.1
job_id: None
log_dir: log
np: None
nproc_per_node: None
run_mode: None
scale: 0
server_num: None
servers: 
training_script: tools/eval.py
training_script_args: ['-c', 'configs/rec/PP-OCRv3/ch_PP-OCRv3_rec.yml', '-o', 'Global.checkpoints=./output/rec_ppocr_v3/best_accuracy', 'Global.character_dict_path=ppocr/utils/en_dict.txt']
worker_num: None
workers: 
------------------------------------------------
WARNING 2022-05-05 16:08:53,089 launch.py:423] Not found distinct arguments and compiled with cuda or xpu. Default use collective mode
launch train in GPU mode!
INFO 2022-05-05 16:08:53,095 launch_utils.py:528] Local start 1 processes. First process distributed environment info (Only For Debug): 
    +=======================================================================================+
    |                        Distributed Envs                      Value                    |
    +---------------------------------------------------------------------------------------+
    |                       PADDLE_TRAINER_ID                        0                      |
    |                 PADDLE_CURRENT_ENDPOINT                 127.0.0.1:55453               |
    |                     PADDLE_TRAINERS_NUM                        1                      |
    |                PADDLE_TRAINER_ENDPOINTS                 127.0.0.1:55453               |
    |                     PADDLE_RANK_IN_NODE                        0                      |
    |                 PADDLE_LOCAL_DEVICE_IDS                        0                      |
    |                 PADDLE_WORLD_DEVICE_IDS                        0                      |
    |                     FLAGS_selected_gpus                        0                      |
    |             FLAGS_selected_accelerators                        0                      |
    +=======================================================================================+

INFO 2022-05-05 16:08:53,095 launch_utils.py:532] details abouts PADDLE_TRAINER_ENDPOINTS can be found in log/endpoints.log, and detail running logs maybe found in log/workerlog.0
launch proc_id:10834 idx:0
[2022/05/05 16:08:54] ppocr INFO: Architecture : 
[2022/05/05 16:08:54] ppocr INFO:     Backbone : 
[2022/05/05 16:08:54] ppocr INFO:         last_conv_stride : [1, 2]
[2022/05/05 16:08:54] ppocr INFO:         last_pool_type : avg
[2022/05/05 16:08:54] ppocr INFO:         name : MobileNetV1Enhance
[2022/05/05 16:08:54] ppocr INFO:         scale : 0.5
[2022/05/05 16:08:54] ppocr INFO:     Head : 
[2022/05/05 16:08:54] ppocr INFO:         head_list : 
[2022/05/05 16:08:54] ppocr INFO:             CTCHead : 
[2022/05/05 16:08:54] ppocr INFO:                 Head : 
[2022/05/05 16:08:54] ppocr INFO:                     fc_decay : 1e-05
[2022/05/05 16:08:54] ppocr INFO:                 Neck : 
[2022/05/05 16:08:54] ppocr INFO:                     depth : 2
[2022/05/05 16:08:54] ppocr INFO:                     dims : 64
[2022/05/05 16:08:54] ppocr INFO:                     hidden_dims : 120
[2022/05/05 16:08:54] ppocr INFO:                     name : svtr
[2022/05/05 16:08:54] ppocr INFO:                     use_guide : True
[2022/05/05 16:08:54] ppocr INFO:             SARHead : 
[2022/05/05 16:08:54] ppocr INFO:                 enc_dim : 512
[2022/05/05 16:08:54] ppocr INFO:                 max_text_length : 25
[2022/05/05 16:08:54] ppocr INFO:         name : MultiHead
[2022/05/05 16:08:54] ppocr INFO:     Transform : None
[2022/05/05 16:08:54] ppocr INFO:     algorithm : SVTR
[2022/05/05 16:08:54] ppocr INFO:     model_type : rec
[2022/05/05 16:08:54] ppocr INFO: Eval : 
[2022/05/05 16:08:54] ppocr INFO:     dataset : 
[2022/05/05 16:08:54] ppocr INFO:         data_dir : ./train_data/ic15_data
[2022/05/05 16:08:54] ppocr INFO:         label_file_list : ['./train_data/ic15_data/rec_gt_test.txt']
[2022/05/05 16:08:54] ppocr INFO:         name : SimpleDataSet
[2022/05/05 16:08:54] ppocr INFO:         transforms : 
[2022/05/05 16:08:54] ppocr INFO:             DecodeImage : 
[2022/05/05 16:08:54] ppocr INFO:                 channel_first : False
[2022/05/05 16:08:54] ppocr INFO:                 img_mode : BGR
[2022/05/05 16:08:54] ppocr INFO:             MultiLabelEncode : None
[2022/05/05 16:08:54] ppocr INFO:             RecResizeImg : 
[2022/05/05 16:08:54] ppocr INFO:                 image_shape : [3, 48, 320]
[2022/05/05 16:08:54] ppocr INFO:             KeepKeys : 
[2022/05/05 16:08:54] ppocr INFO:                 keep_keys : ['image', 'label_ctc', 'label_sar', 'length', 'valid_ratio']
[2022/05/05 16:08:54] ppocr INFO:     loader : 
[2022/05/05 16:08:54] ppocr INFO:         batch_size_per_card : 128
[2022/05/05 16:08:54] ppocr INFO:         drop_last : False
[2022/05/05 16:08:54] ppocr INFO:         num_workers : 4
[2022/05/05 16:08:54] ppocr INFO:         shuffle : False
[2022/05/05 16:08:54] ppocr INFO: Global : 
[2022/05/05 16:08:54] ppocr INFO:     cal_metric_during_train : True
[2022/05/05 16:08:54] ppocr INFO:     character_dict_path : ppocr/utils/en_dict.txt
[2022/05/05 16:08:54] ppocr INFO:     checkpoints : ./output/rec_ppocr_v3/best_accuracy
[2022/05/05 16:08:54] ppocr INFO:     debug : False
[2022/05/05 16:08:54] ppocr INFO:     distributed : False
[2022/05/05 16:08:54] ppocr INFO:     epoch_num : 500
[2022/05/05 16:08:54] ppocr INFO:     eval_batch_step : [0, 2000]
[2022/05/05 16:08:54] ppocr INFO:     infer_img : doc/imgs_words/ch/word_1.jpg
[2022/05/05 16:08:54] ppocr INFO:     infer_mode : False
[2022/05/05 16:08:54] ppocr INFO:     log_smooth_window : 20
[2022/05/05 16:08:54] ppocr INFO:     max_text_length : 25
[2022/05/05 16:08:54] ppocr INFO:     pretrained_model : None
[2022/05/05 16:08:54] ppocr INFO:     print_batch_step : 10
[2022/05/05 16:08:54] ppocr INFO:     save_epoch_step : 3
[2022/05/05 16:08:54] ppocr INFO:     save_inference_dir : None
[2022/05/05 16:08:54] ppocr INFO:     save_model_dir : ./output/rec_ppocr_v3
[2022/05/05 16:08:54] ppocr INFO:     save_res_path : ./output/rec/predicts_ppocrv3.txt
[2022/05/05 16:08:54] ppocr INFO:     use_gpu : True
[2022/05/05 16:08:54] ppocr INFO:     use_space_char : True
[2022/05/05 16:08:54] ppocr INFO:     use_visualdl : False
[2022/05/05 16:08:54] ppocr INFO: Loss : 
[2022/05/05 16:08:54] ppocr INFO:     loss_config_list : 
[2022/05/05 16:08:54] ppocr INFO:         CTCLoss : None
[2022/05/05 16:08:54] ppocr INFO:         SARLoss : None
[2022/05/05 16:08:54] ppocr INFO:     name : MultiLoss
[2022/05/05 16:08:54] ppocr INFO: Metric : 
[2022/05/05 16:08:54] ppocr INFO:     ignore_space : False
[2022/05/05 16:08:54] ppocr INFO:     main_indicator : acc
[2022/05/05 16:08:54] ppocr INFO:     name : RecMetric
[2022/05/05 16:08:54] ppocr INFO: Optimizer : 
[2022/05/05 16:08:54] ppocr INFO:     beta1 : 0.9
[2022/05/05 16:08:54] ppocr INFO:     beta2 : 0.999
[2022/05/05 16:08:54] ppocr INFO:     lr : 
[2022/05/05 16:08:54] ppocr INFO:         learning_rate : 0.001
[2022/05/05 16:08:54] ppocr INFO:         name : Cosine
[2022/05/05 16:08:54] ppocr INFO:         warmup_epoch : 5
[2022/05/05 16:08:54] ppocr INFO:     name : Adam
[2022/05/05 16:08:54] ppocr INFO:     regularizer : 
[2022/05/05 16:08:54] ppocr INFO:         factor : 3e-05
[2022/05/05 16:08:54] ppocr INFO:         name : L2
[2022/05/05 16:08:54] ppocr INFO: PostProcess : 
[2022/05/05 16:08:54] ppocr INFO:     name : CTCLabelDecode
[2022/05/05 16:08:54] ppocr INFO: Train : 
[2022/05/05 16:08:54] ppocr INFO:     dataset : 
[2022/05/05 16:08:54] ppocr INFO:         data_dir : ./train_data/ic15_data/
[2022/05/05 16:08:54] ppocr INFO:         ext_op_transform_idx : 1
[2022/05/05 16:08:54] ppocr INFO:         label_file_list : ['./train_data/ic15_data/rec_gt_train.txt']
[2022/05/05 16:08:54] ppocr INFO:         name : SimpleDataSet
[2022/05/05 16:08:54] ppocr INFO:         transforms : 
[2022/05/05 16:08:54] ppocr INFO:             DecodeImage : 
[2022/05/05 16:08:54] ppocr INFO:                 channel_first : False
[2022/05/05 16:08:54] ppocr INFO:                 img_mode : BGR
[2022/05/05 16:08:54] ppocr INFO:             MultiLabelEncode : None
[2022/05/05 16:08:54] ppocr INFO:             RecResizeImg : 
[2022/05/05 16:08:54] ppocr INFO:                 image_shape : [3, 48, 320]
[2022/05/05 16:08:54] ppocr INFO:             KeepKeys : 
[2022/05/05 16:08:54] ppocr INFO:                 keep_keys : ['image', 'label_ctc', 'label_sar', 'length', 'valid_ratio']
[2022/05/05 16:08:54] ppocr INFO:     loader : 
[2022/05/05 16:08:54] ppocr INFO:         batch_size_per_card : 128
[2022/05/05 16:08:54] ppocr INFO:         drop_last : True
[2022/05/05 16:08:54] ppocr INFO:         num_workers : 4
[2022/05/05 16:08:54] ppocr INFO:         shuffle : True
[2022/05/05 16:08:54] ppocr INFO: profiler_options : None
[2022/05/05 16:08:54] ppocr INFO: train with paddle 2.2.2 and device CUDAPlace(0)
[2022/05/05 16:08:54] ppocr INFO: Initialize indexs of datasets:['./train_data/ic15_data/rec_gt_test.txt']
W0505 16:08:54.870386 10834 device_context.cc:447] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 10.1, Runtime API Version: 10.1
W0505 16:08:54.875574 10834 device_context.cc:465] device: 0, cuDNN Version: 7.6.
[2022/05/05 16:08:59] ppocr INFO: resume from ./output/rec_ppocr_v3/best_accuracy
[2022/05/05 16:08:59] ppocr INFO: metric in ckpt ***************
[2022/05/05 16:08:59] ppocr INFO: acc:0.5551275851943784
[2022/05/05 16:08:59] ppocr INFO: norm_edit_dis:0.8100207578002598
[2022/05/05 16:08:59] ppocr INFO: fps:1855.1658462248745
[2022/05/05 16:08:59] ppocr INFO: best_epoch:36
[2022/05/05 16:08:59] ppocr INFO: start_epoch:37

eval model::   0%|          | 0/17 [00:00<?, ?it/s]
eval model::   6%|▌         | 1/17 [00:00<00:14,  1.08it/s]
eval model::  18%|█▊        | 3/17 [00:01<00:09,  1.49it/s]
eval model::  29%|██▉       | 5/17 [00:01<00:05,  2.04it/s]
eval model::  41%|████      | 7/17 [00:01<00:03,  2.73it/s]
eval model::  53%|█████▎    | 9/17 [00:01<00:02,  3.59it/s]
eval model::  65%|██████▍   | 11/17 [00:01<00:01,  4.72it/s]
eval model::  76%|███████▋  | 13/17 [00:01<00:00,  6.07it/s]
eval model::  88%|████████▊ | 15/17 [00:01<00:00,  7.59it/s]
eval model:: 100%|██████████| 17/17 [00:02<00:00,  8.22it/s]
[2022/05/05 16:09:01] ppocr INFO: metric eval ***************
[2022/05/05 16:09:01] ppocr INFO: acc:0.5551275851943784
[2022/05/05 16:09:01] ppocr INFO: norm_edit_dis:0.8100207578002598
[2022/05/05 16:09:01] ppocr INFO: fps:2231.024193160108
INFO 2022-05-05 16:09:05,136 launch.py:311] Local processes completed.

2.3.2 测试识别效果

使用 PaddleOCR 训练好的模型,可以通过以下脚本进行快速预测。

默认预测图片存储在 infer_img 里,通过 -o Global.checkpoints 加载训练好的参数文件:

根据配置文件中设置的的 save_model_dir 和 save_epoch_step 字段,会有以下几种参数被保存下来:

output/rec/
├── best_accuracy.pdopt  
├── best_accuracy.pdparams  
├── best_accuracy.states  
├── config.yml  
├── iter_epoch_3.pdopt  
├── iter_epoch_3.pdparams  
├── iter_epoch_3.states  
├── latest.pdopt  
├── latest.pdparams  
├── latest.states  
└── train.log

其中 best_accuracy.* 是评估集上的最优模型;iter_epoch_x.* 是以 save_epoch_step 为间隔保存下来的模型;latest.* 是最后一个epoch的模型。

In [23]

# 预测英文结果
# GPU预测
!python3 tools/infer_rec.py -c configs/rec/PP-OCRv3/ch_PP-OCRv3_rec.yml -o Global.pretrained_model=./output/rec_ppocr_v3/best_accuracy Global.character_dict_path=ppocr/utils/en_dict.txt Global.infer_img=doc/imgs_words/en/word_1.png
[2022/05/05 16:09:44] ppocr INFO: Architecture : 
[2022/05/05 16:09:44] ppocr INFO:     Backbone : 
[2022/05/05 16:09:44] ppocr INFO:         last_conv_stride : [1, 2]
[2022/05/05 16:09:44] ppocr INFO:         last_pool_type : avg
[2022/05/05 16:09:44] ppocr INFO:         name : MobileNetV1Enhance
[2022/05/05 16:09:44] ppocr INFO:         scale : 0.5
[2022/05/05 16:09:44] ppocr INFO:     Head : 
[2022/05/05 16:09:44] ppocr INFO:         head_list : 
[2022/05/05 16:09:44] ppocr INFO:             CTCHead : 
[2022/05/05 16:09:44] ppocr INFO:                 Head : 
[2022/05/05 16:09:44] ppocr INFO:                     fc_decay : 1e-05
[2022/05/05 16:09:44] ppocr INFO:                 Neck : 
[2022/05/05 16:09:44] ppocr INFO:                     depth : 2
[2022/05/05 16:09:44] ppocr INFO:                     dims : 64
[2022/05/05 16:09:44] ppocr INFO:                     hidden_dims : 120
[2022/05/05 16:09:44] ppocr INFO:                     name : svtr
[2022/05/05 16:09:44] ppocr INFO:                     use_guide : True
[2022/05/05 16:09:44] ppocr INFO:             SARHead : 
[2022/05/05 16:09:44] ppocr INFO:                 enc_dim : 512
[2022/05/05 16:09:44] ppocr INFO:                 max_text_length : 25
[2022/05/05 16:09:44] ppocr INFO:         name : MultiHead
[2022/05/05 16:09:44] ppocr INFO:     Transform : None
[2022/05/05 16:09:44] ppocr INFO:     algorithm : SVTR
[2022/05/05 16:09:44] ppocr INFO:     model_type : rec
[2022/05/05 16:09:44] ppocr INFO: Eval : 
[2022/05/05 16:09:44] ppocr INFO:     dataset : 
[2022/05/05 16:09:44] ppocr INFO:         data_dir : ./train_data/ic15_data
[2022/05/05 16:09:44] ppocr INFO:         label_file_list : ['./train_data/ic15_data/rec_gt_test.txt']
[2022/05/05 16:09:44] ppocr INFO:         name : SimpleDataSet
[2022/05/05 16:09:44] ppocr INFO:         transforms : 
[2022/05/05 16:09:44] ppocr INFO:             DecodeImage : 
[2022/05/05 16:09:44] ppocr INFO:                 channel_first : False
[2022/05/05 16:09:44] ppocr INFO:                 img_mode : BGR
[2022/05/05 16:09:44] ppocr INFO:             MultiLabelEncode : None
[2022/05/05 16:09:44] ppocr INFO:             RecResizeImg : 
[2022/05/05 16:09:44] ppocr INFO:                 image_shape : [3, 48, 320]
[2022/05/05 16:09:44] ppocr INFO:             KeepKeys : 
[2022/05/05 16:09:44] ppocr INFO:                 keep_keys : ['image', 'label_ctc', 'label_sar', 'length', 'valid_ratio']
[2022/05/05 16:09:44] ppocr INFO:     loader : 
[2022/05/05 16:09:44] ppocr INFO:         batch_size_per_card : 128
[2022/05/05 16:09:44] ppocr INFO:         drop_last : False
[2022/05/05 16:09:44] ppocr INFO:         num_workers : 4
[2022/05/05 16:09:44] ppocr INFO:         shuffle : False
[2022/05/05 16:09:44] ppocr INFO: Global : 
[2022/05/05 16:09:44] ppocr INFO:     cal_metric_during_train : True
[2022/05/05 16:09:44] ppocr INFO:     character_dict_path : ppocr/utils/en_dict.txt
[2022/05/05 16:09:44] ppocr INFO:     checkpoints : None
[2022/05/05 16:09:44] ppocr INFO:     debug : False
[2022/05/05 16:09:44] ppocr INFO:     distributed : False
[2022/05/05 16:09:44] ppocr INFO:     epoch_num : 500
[2022/05/05 16:09:44] ppocr INFO:     eval_batch_step : [0, 2000]
[2022/05/05 16:09:44] ppocr INFO:     infer_img : doc/imgs_words/en/word_1.png
[2022/05/05 16:09:44] ppocr INFO:     infer_mode : False
[2022/05/05 16:09:44] ppocr INFO:     log_smooth_window : 20
[2022/05/05 16:09:44] ppocr INFO:     max_text_length : 25
[2022/05/05 16:09:44] ppocr INFO:     pretrained_model : ./output/rec_ppocr_v3/best_accuracy
[2022/05/05 16:09:44] ppocr INFO:     print_batch_step : 10
[2022/05/05 16:09:44] ppocr INFO:     save_epoch_step : 3
[2022/05/05 16:09:44] ppocr INFO:     save_inference_dir : None
[2022/05/05 16:09:44] ppocr INFO:     save_model_dir : ./output/rec_ppocr_v3
[2022/05/05 16:09:44] ppocr INFO:     save_res_path : ./output/rec/predicts_ppocrv3.txt
[2022/05/05 16:09:44] ppocr INFO:     use_gpu : True
[2022/05/05 16:09:44] ppocr INFO:     use_space_char : True
[2022/05/05 16:09:44] ppocr INFO:     use_visualdl : False
[2022/05/05 16:09:44] ppocr INFO: Loss : 
[2022/05/05 16:09:44] ppocr INFO:     loss_config_list : 
[2022/05/05 16:09:44] ppocr INFO:         CTCLoss : None
[2022/05/05 16:09:44] ppocr INFO:         SARLoss : None
[2022/05/05 16:09:44] ppocr INFO:     name : MultiLoss
[2022/05/05 16:09:44] ppocr INFO: Metric : 
[2022/05/05 16:09:44] ppocr INFO:     ignore_space : False
[2022/05/05 16:09:44] ppocr INFO:     main_indicator : acc
[2022/05/05 16:09:44] ppocr INFO:     name : RecMetric
[2022/05/05 16:09:44] ppocr INFO: Optimizer : 
[2022/05/05 16:09:44] ppocr INFO:     beta1 : 0.9
[2022/05/05 16:09:44] ppocr INFO:     beta2 : 0.999
[2022/05/05 16:09:44] ppocr INFO:     lr : 
[2022/05/05 16:09:44] ppocr INFO:         learning_rate : 0.001
[2022/05/05 16:09:44] ppocr INFO:         name : Cosine
[2022/05/05 16:09:44] ppocr INFO:         warmup_epoch : 5
[2022/05/05 16:09:44] ppocr INFO:     name : Adam
[2022/05/05 16:09:44] ppocr INFO:     regularizer : 
[2022/05/05 16:09:44] ppocr INFO:         factor : 3e-05
[2022/05/05 16:09:44] ppocr INFO:         name : L2
[2022/05/05 16:09:44] ppocr INFO: PostProcess : 
[2022/05/05 16:09:44] ppocr INFO:     name : CTCLabelDecode
[2022/05/05 16:09:44] ppocr INFO: Train : 
[2022/05/05 16:09:44] ppocr INFO:     dataset : 
[2022/05/05 16:09:44] ppocr INFO:         data_dir : ./train_data/ic15_data/
[2022/05/05 16:09:44] ppocr INFO:         ext_op_transform_idx : 1
[2022/05/05 16:09:44] ppocr INFO:         label_file_list : ['./train_data/ic15_data/rec_gt_train.txt']
[2022/05/05 16:09:44] ppocr INFO:         name : SimpleDataSet
[2022/05/05 16:09:44] ppocr INFO:         transforms : 
[2022/05/05 16:09:44] ppocr INFO:             DecodeImage : 
[2022/05/05 16:09:44] ppocr INFO:                 channel_first : False
[2022/05/05 16:09:44] ppocr INFO:                 img_mode : BGR
[2022/05/05 16:09:44] ppocr INFO:             MultiLabelEncode : None
[2022/05/05 16:09:44] ppocr INFO:             RecResizeImg : 
[2022/05/05 16:09:44] ppocr INFO:                 image_shape : [3, 48, 320]
[2022/05/05 16:09:44] ppocr INFO:             KeepKeys : 
[2022/05/05 16:09:44] ppocr INFO:                 keep_keys : ['image', 'label_ctc', 'label_sar', 'length', 'valid_ratio']
[2022/05/05 16:09:44] ppocr INFO:     loader : 
[2022/05/05 16:09:44] ppocr INFO:         batch_size_per_card : 128
[2022/05/05 16:09:44] ppocr INFO:         drop_last : True
[2022/05/05 16:09:44] ppocr INFO:         num_workers : 4
[2022/05/05 16:09:44] ppocr INFO:         shuffle : True
[2022/05/05 16:09:44] ppocr INFO: profiler_options : None
[2022/05/05 16:09:44] ppocr INFO: train with paddle 2.2.2 and device CUDAPlace(0)
W0505 16:09:44.179414 10923 device_context.cc:447] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 10.1, Runtime API Version: 10.1
W0505 16:09:44.184576 10923 device_context.cc:465] device: 0, cuDNN Version: 7.6.
[2022/05/05 16:09:48] ppocr INFO: load pretrain successful from ./output/rec_ppocr_v3/best_accuracy
[2022/05/05 16:09:48] ppocr INFO: infer_img: doc/imgs_words/en/word_1.png
[2022/05/05 16:09:48] ppocr INFO: 	 result: JOINT	0.9950313568115234
[2022/05/05 16:09:48] ppocr INFO: success!

预测图片:

预测使用的配置文件必须与训练一致.

测试文件夹下所有图像的文字识别效果

In [24]

# GPU预测
!python3 tools/infer_rec.py -c configs/rec/PP-OCRv3/ch_PP-OCRv3_rec.yml -o Global.pretrained_model=./output/rec_ppocr_v3/best_accuracy Global.character_dict_path=ppocr/utils/en_dict.txt Global.infer_img=./doc/imgs_words_en/
[2022/05/05 16:09:59] ppocr INFO: Architecture : 
[2022/05/05 16:09:59] ppocr INFO:     Backbone : 
[2022/05/05 16:09:59] ppocr INFO:         last_conv_stride : [1, 2]
[2022/05/05 16:09:59] ppocr INFO:         last_pool_type : avg
[2022/05/05 16:09:59] ppocr INFO:         name : MobileNetV1Enhance
[2022/05/05 16:09:59] ppocr INFO:         scale : 0.5
[2022/05/05 16:09:59] ppocr INFO:     Head : 
[2022/05/05 16:09:59] ppocr INFO:         head_list : 
[2022/05/05 16:09:59] ppocr INFO:             CTCHead : 
[2022/05/05 16:09:59] ppocr INFO:                 Head : 
[2022/05/05 16:09:59] ppocr INFO:                     fc_decay : 1e-05
[2022/05/05 16:09:59] ppocr INFO:                 Neck : 
[2022/05/05 16:09:59] ppocr INFO:                     depth : 2
[2022/05/05 16:09:59] ppocr INFO:                     dims : 64
[2022/05/05 16:09:59] ppocr INFO:                     hidden_dims : 120
[2022/05/05 16:09:59] ppocr INFO:                     name : svtr
[2022/05/05 16:09:59] ppocr INFO:                     use_guide : True
[2022/05/05 16:09:59] ppocr INFO:             SARHead : 
[2022/05/05 16:09:59] ppocr INFO:                 enc_dim : 512
[2022/05/05 16:09:59] ppocr INFO:                 max_text_length : 25
[2022/05/05 16:09:59] ppocr INFO:         name : MultiHead
[2022/05/05 16:09:59] ppocr INFO:     Transform : None
[2022/05/05 16:09:59] ppocr INFO:     algorithm : SVTR
[2022/05/05 16:09:59] ppocr INFO:     model_type : rec
[2022/05/05 16:09:59] ppocr INFO: Eval : 
[2022/05/05 16:09:59] ppocr INFO:     dataset : 
[2022/05/05 16:09:59] ppocr INFO:         data_dir : ./train_data/ic15_data
[2022/05/05 16:09:59] ppocr INFO:         label_file_list : ['./train_data/ic15_data/rec_gt_test.txt']
[2022/05/05 16:09:59] ppocr INFO:         name : SimpleDataSet
[2022/05/05 16:09:59] ppocr INFO:         transforms : 
[2022/05/05 16:09:59] ppocr INFO:             DecodeImage : 
[2022/05/05 16:09:59] ppocr INFO:                 channel_first : False
[2022/05/05 16:09:59] ppocr INFO:                 img_mode : BGR
[2022/05/05 16:09:59] ppocr INFO:             MultiLabelEncode : None
[2022/05/05 16:09:59] ppocr INFO:             RecResizeImg : 
[2022/05/05 16:09:59] ppocr INFO:                 image_shape : [3, 48, 320]
[2022/05/05 16:09:59] ppocr INFO:             KeepKeys : 
[2022/05/05 16:09:59] ppocr INFO:                 keep_keys : ['image', 'label_ctc', 'label_sar', 'length', 'valid_ratio']
[2022/05/05 16:09:59] ppocr INFO:     loader : 
[2022/05/05 16:09:59] ppocr INFO:         batch_size_per_card : 128
[2022/05/05 16:09:59] ppocr INFO:         drop_last : False
[2022/05/05 16:09:59] ppocr INFO:         num_workers : 4
[2022/05/05 16:09:59] ppocr INFO:         shuffle : False
[2022/05/05 16:09:59] ppocr INFO: Global : 
[2022/05/05 16:09:59] ppocr INFO:     cal_metric_during_train : True
[2022/05/05 16:09:59] ppocr INFO:     character_dict_path : ppocr/utils/en_dict.txt
[2022/05/05 16:09:59] ppocr INFO:     checkpoints : None
[2022/05/05 16:09:59] ppocr INFO:     debug : False
[2022/05/05 16:09:59] ppocr INFO:     distributed : False
[2022/05/05 16:09:59] ppocr INFO:     epoch_num : 500
[2022/05/05 16:09:59] ppocr INFO:     eval_batch_step : [0, 2000]
[2022/05/05 16:09:59] ppocr INFO:     infer_img : ./doc/imgs_words_en/
[2022/05/05 16:09:59] ppocr INFO:     infer_mode : False
[2022/05/05 16:09:59] ppocr INFO:     log_smooth_window : 20
[2022/05/05 16:09:59] ppocr INFO:     max_text_length : 25
[2022/05/05 16:09:59] ppocr INFO:     pretrained_model : ./output/rec_ppocr_v3/best_accuracy
[2022/05/05 16:09:59] ppocr INFO:     print_batch_step : 10
[2022/05/05 16:09:59] ppocr INFO:     save_epoch_step : 3
[2022/05/05 16:09:59] ppocr INFO:     save_inference_dir : None
[2022/05/05 16:09:59] ppocr INFO:     save_model_dir : ./output/rec_ppocr_v3
[2022/05/05 16:09:59] ppocr INFO:     save_res_path : ./output/rec/predicts_ppocrv3.txt
[2022/05/05 16:09:59] ppocr INFO:     use_gpu : True
[2022/05/05 16:09:59] ppocr INFO:     use_space_char : True
[2022/05/05 16:09:59] ppocr INFO:     use_visualdl : False
[2022/05/05 16:09:59] ppocr INFO: Loss : 
[2022/05/05 16:09:59] ppocr INFO:     loss_config_list : 
[2022/05/05 16:09:59] ppocr INFO:         CTCLoss : None
[2022/05/05 16:09:59] ppocr INFO:         SARLoss : None
[2022/05/05 16:09:59] ppocr INFO:     name : MultiLoss
[2022/05/05 16:09:59] ppocr INFO: Metric : 
[2022/05/05 16:09:59] ppocr INFO:     ignore_space : False
[2022/05/05 16:09:59] ppocr INFO:     main_indicator : acc
[2022/05/05 16:09:59] ppocr INFO:     name : RecMetric
[2022/05/05 16:09:59] ppocr INFO: Optimizer : 
[2022/05/05 16:09:59] ppocr INFO:     beta1 : 0.9
[2022/05/05 16:09:59] ppocr INFO:     beta2 : 0.999
[2022/05/05 16:09:59] ppocr INFO:     lr : 
[2022/05/05 16:09:59] ppocr INFO:         learning_rate : 0.001
[2022/05/05 16:09:59] ppocr INFO:         name : Cosine
[2022/05/05 16:09:59] ppocr INFO:         warmup_epoch : 5
[2022/05/05 16:09:59] ppocr INFO:     name : Adam
[2022/05/05 16:09:59] ppocr INFO:     regularizer : 
[2022/05/05 16:09:59] ppocr INFO:         factor : 3e-05
[2022/05/05 16:09:59] ppocr INFO:         name : L2
[2022/05/05 16:09:59] ppocr INFO: PostProcess : 
[2022/05/05 16:09:59] ppocr INFO:     name : CTCLabelDecode
[2022/05/05 16:09:59] ppocr INFO: Train : 
[2022/05/05 16:09:59] ppocr INFO:     dataset : 
[2022/05/05 16:09:59] ppocr INFO:         data_dir : ./train_data/ic15_data/
[2022/05/05 16:09:59] ppocr INFO:         ext_op_transform_idx : 1
[2022/05/05 16:09:59] ppocr INFO:         label_file_list : ['./train_data/ic15_data/rec_gt_train.txt']
[2022/05/05 16:09:59] ppocr INFO:         name : SimpleDataSet
[2022/05/05 16:09:59] ppocr INFO:         transforms : 
[2022/05/05 16:09:59] ppocr INFO:             DecodeImage : 
[2022/05/05 16:09:59] ppocr INFO:                 channel_first : False
[2022/05/05 16:09:59] ppocr INFO:                 img_mode : BGR
[2022/05/05 16:09:59] ppocr INFO:             MultiLabelEncode : None
[2022/05/05 16:09:59] ppocr INFO:             RecResizeImg : 
[2022/05/05 16:09:59] ppocr INFO:                 image_shape : [3, 48, 320]
[2022/05/05 16:09:59] ppocr INFO:             KeepKeys : 
[2022/05/05 16:09:59] ppocr INFO:                 keep_keys : ['image', 'label_ctc', 'label_sar', 'length', 'valid_ratio']
[2022/05/05 16:09:59] ppocr INFO:     loader : 
[2022/05/05 16:09:59] ppocr INFO:         batch_size_per_card : 128
[2022/05/05 16:09:59] ppocr INFO:         drop_last : True
[2022/05/05 16:09:59] ppocr INFO:         num_workers : 4
[2022/05/05 16:09:59] ppocr INFO:         shuffle : True
[2022/05/05 16:09:59] ppocr INFO: profiler_options : None
[2022/05/05 16:09:59] ppocr INFO: train with paddle 2.2.2 and device CUDAPlace(0)
W0505 16:09:59.636674 10950 device_context.cc:447] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 10.1, Runtime API Version: 10.1
W0505 16:09:59.641562 10950 device_context.cc:465] device: 0, cuDNN Version: 7.6.
[2022/05/05 16:10:04] ppocr INFO: load pretrain successful from ./output/rec_ppocr_v3/best_accuracy
[2022/05/05 16:10:04] ppocr INFO: infer_img: ./doc/imgs_words_en/word_10.png
[2022/05/05 16:10:04] ppocr INFO: 	 result: PAIN	0.9976047277450562
[2022/05/05 16:10:04] ppocr INFO: infer_img: ./doc/imgs_words_en/word_116.png
[2022/05/05 16:10:04] ppocr INFO: 	 result: QBHOUSE	0.9709253311157227
[2022/05/05 16:10:04] ppocr INFO: infer_img: ./doc/imgs_words_en/word_19.png
[2022/05/05 16:10:04] ppocr INFO: 	 result: SLOW	0.9971550703048706
[2022/05/05 16:10:04] ppocr INFO: infer_img: ./doc/imgs_words_en/word_201.png
[2022/05/05 16:10:04] ppocr INFO: 	 result: HOUSE	0.9960419535636902
[2022/05/05 16:10:04] ppocr INFO: infer_img: ./doc/imgs_words_en/word_308.png
[2022/05/05 16:10:04] ppocr INFO: 	 result: LITTLE	0.9545474052429199
[2022/05/05 16:10:04] ppocr INFO: infer_img: ./doc/imgs_words_en/word_336.png
[2022/05/05 16:10:04] ppocr INFO: 	 result: SUPER	0.9802681803703308
[2022/05/05 16:10:04] ppocr INFO: infer_img: ./doc/imgs_words_en/word_401.png
[2022/05/05 16:10:04] ppocr INFO: 	 result: BURGE	0.827716052532196
[2022/05/05 16:10:04] ppocr INFO: infer_img: ./doc/imgs_words_en/word_461.png
[2022/05/05 16:10:04] ppocr INFO: 	 result: SPED	0.912112832069397
[2022/05/05 16:10:04] ppocr INFO: infer_img: ./doc/imgs_words_en/word_52.png
[2022/05/05 16:10:04] ppocr INFO: 	 result: Future	0.9685637354850769
[2022/05/05 16:10:04] ppocr INFO: infer_img: ./doc/imgs_words_en/word_545.png
[2022/05/05 16:10:04] ppocr INFO: 	 result: EORIT	0.9076364636421204
[2022/05/05 16:10:04] ppocr INFO: success!

2.4. 模型导出与预测

inference 模型(paddle.jit.save保存的模型) 一般是模型训练,把模型结构和模型参数保存在文件中的固化模型,多用于预测部署场景。 训练过程中保存的模型是checkpoints模型,保存的只有模型的参数,多用于恢复训练等。 与checkpoints模型相比,inference 模型会额外保存模型的结构信息,在预测部署、加速推理上性能优越,灵活方便,适合于实际系统集成。

识别模型转inference模型与检测的方式相同,如下:

In [25]

# -c 后面设置训练算法的yml配置文件
# -o 配置可选参数
# Global.pretrained_model 参数设置待转换的训练模型地址,不用添加文件后缀 .pdmodel,.pdopt或.pdparams。
# Global.save_inference_dir参数设置转换的模型将保存的地址。
!python3 tools/export_model.py -c configs/rec/PP-OCRv3/ch_PP-OCRv3_rec.yml -o Global.pretrained_model=./output/rec_ppocr_v3/best_accuracy Global.character_dict_path=ppocr/utils/en_dict.txt Global.save_inference_dir=./inference/ch_PP-OCRv3_rec/
W0505 16:10:41.224627 11024 device_context.cc:447] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 10.1, Runtime API Version: 10.1
W0505 16:10:41.229585 11024 device_context.cc:465] device: 0, cuDNN Version: 7.6.
[2022/05/05 16:10:45] ppocr INFO: load pretrain successful from ./output/rec_ppocr_v3/best_accuracy
[2022/05/05 16:10:47] ppocr INFO: inference model is saved to ./inference/ch_PP-OCRv3_rec/inference

**注意:**如果您是在自己的数据集上训练的模型,并且调整了中文字符的字典文件,请注意修改配置文件中的character_dict_path为自定义字典文件。

转换成功后,在目录下有三个文件:

inference/ch_PP-OCRv3_rec/
    ├── inference.pdiparams         # 识别inference模型的参数文件
    ├── inference.pdiparams.info    # 识别inference模型的参数信息,可忽略
    └── inference.pdmodel           # 识别inference模型的program文件
  • 自定义模型推理

    如果训练时修改了文本的字典,在使用inference模型预测时,需要通过--rec_char_dict_path指定使用的字典路径

注意

  • 使用PP-OCRv3识别进行推理时,不需要使用--rec_algorithm指定算法名称,使用默认的推理方式即为PP-OCRv3识别的推理过程。

In [27]

# GPU预测
!python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words_en/word_336.png" --rec_model_dir=./inference/ch_PP-OCRv3_rec/ --rec_char_dict_path=ppocr/utils/en_dict.txt

# CPU预测
!python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words_en/word_336.png" --rec_model_dir=./inference/ch_PP-OCRv3_rec/ --rec_char_dict_path=ppocr/utils/en_dict.txt --use_gpu=False
[2022/05/05 16:13:50] ppocr INFO: Predicts of ./doc/imgs_words_en/word_336.png:('SUPER', 0.9802668690681458)
[2022/05/05 16:13:53] ppocr INFO: Predicts of ./doc/imgs_words_en/word_336.png:('SUPER', 0.9802700281143188)

推理预测的图片为:

3. FAQ

Q1: 训练模型转inference 模型之后预测效果不一致?

A:此类问题出现较多,问题多是trained model预测时候的预处理、后处理参数和inference model预测的时候的预处理、后处理参数不一致导致

Q2: 如何自定义字典、修改backbone、训练多语言模型?

A:请参考PP-OCRv3识别详细教程。

4. 直播预告

🔥2022.5.11~13 每晚8:30【超强OCR技术详解与产业应用实战】三日直播课

  • 11日:开源最强OCR系统PP-OCRv3揭秘
  • 12日:云边端全覆盖的PP-OCRv3训练部署实战
  • 13日:OCR产业应用全流程拆解与实战 赶紧扫码报名吧!

Logo

学大模型,用大模型上飞桨星河社区!每天8点V100G算力免费领!免费领取ERNIE 4.0 100w Token >>>

更多推荐