0 Project Background

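In this series of projects, we build a vertical OCR application with the Paddle toolkits. The raw dataset is a collection of photos of electricity meters of many different types. The task is to recognize the meter reading and, for meters that carry a serial number, the serial number as well.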

1 Dataset Overview

Note: due to confidentiality and authorization restrictions, the dataset has not been made public yet (to be updated).

First, let's take a quick look at the dataset. Overall, this scenario poses several major challenges:

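  • Many meter types. By comparison, the available data (500 images) may not be enough.
  • Severely tilted viewpoints. This is easy to understand: some meters cannot be photographed head-on, so many images are shot from below looking up, or even from the lower left toward the upper right.
  • Strong reflections. Glare can affect both locating the target boxes and recognizing the digits.
  • Serial numbers are printed as dot-matrix digits, which are hard to recognize. We noticed this during annotation: for some labels, the four-point boxes that PPOCRLabel detected automatically were already very accurate, yet the digits recognized inside them were far off.
  • Very high demands on box precision. The area around the reading is usually not blank; it often contains units, other characters, or the digits after the decimal point. If a detection box is not tight, it pulls in other recognizable content, and if that content is also digits, even post-processing cannot remove it.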

The typical images below give a first impression of the dataset.



2 Development Approach

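Because of the problems above, development got stuck as early as the annotation stage. For example, should we annotate once (PPOCRLabel) or twice (LabelImg for detection boxes plus PPOCRLabel for recognition fine-tuning)? And should we use PPOCR throughout, or PPDET plus PPOCR?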

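In the end, we found that annotation could be done in a single pass: label everything with PPOCRLabel, then convert the OCR annotations into object detection annotations by rule. Annotating the dataset separately for object detection would mean labeling twice (once for the boxes and once for the content inside them), so labeling it as an OCR dataset with PPOCRLabel is clearly the more cost-effective choice.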
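The conversion rule itself is not spelled out here. As a minimal sketch, assuming the standard PPOCRLabel label-file layout (one line per image: the image path, a tab, then a JSON list of objects with `transcription` and `points`), each labeled quadrilateral can be turned into an axis-aligned detection box. The class rule in `classify` (10 or more digits means serial number) is a hypothetical placeholder, not the rule used in this project.

```python
import json
from typing import List, Sequence, Tuple

# (xmin, ymin, xmax, ymax) in pixel coordinates
BBox = Tuple[float, float, float, float]


def quad_to_bbox(points: Sequence[Sequence[float]]) -> BBox:
    """Smallest axis-aligned box that encloses the labeled polygon."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (min(xs), min(ys), max(xs), max(ys))


def classify(transcription: str) -> str:
    """Hypothetical rule: a long digit string is a serial number, otherwise a reading."""
    digits = sum(ch.isdigit() for ch in transcription)
    return "serial_number" if digits >= 10 else "reading"


def convert_line(line: str) -> Tuple[str, List[Tuple[str, BBox]]]:
    """Convert one PPOCRLabel line into (image_path, [(class_name, bbox), ...])."""
    img_path, annotations = line.rstrip("\n").split("\t", 1)
    objects = []
    for ann in json.loads(annotations):
        # "###" marks regions to ignore in PaddleOCR detection labels
        if ann.get("difficult", False) or ann["transcription"] == "###":
            continue
        objects.append((classify(ann["transcription"]), quad_to_bbox(ann["points"])))
    return img_path, objects


if __name__ == "__main__":
    # Hypothetical label line; the box matches the "3127" result shown later
    sample = ('M2021/example.jpg\t[{"transcription": "3127", '
              '"points": [[1065, 2131], [2008, 2131], [2008, 2366], [1065, 2366]], '
              '"difficult": false}]')
    print(convert_line(sample))
```

Note that an axis-aligned box built from a tilted quadrilateral is looser than the original label, which is exactly the weakness of the rectangular-detection route discussed in 2.1.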

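As for the development route, at first we planned to try both. Our early attempts at fine-tuning the PPOCR text detection model kept plateauing, which made us doubt whether an OCR detector could localize the target boxes precisely in a single step; in addition, given the characteristics of the dataset, we also considered PaddleDetection's rotated object detection models.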

So the initial exploration plan for the project was as follows:
https://ai-studio-static-online.cdn.bcebos.com/f8befdf5c9b84bcbb5bb6e36633199708917da0e949044b6b5c395516211d959

2.1 Exploration with PaddleDetection

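In the project "PPOCR+PPDET Meter Reading and Serial Number Recognition", we completed the first route: meter recognition based on rectangular (axis-aligned) object detection. That stage focused on getting something that works at all.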

Its prediction results are shown below:


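These predictions show that plain rectangular boxes have a problem of their own: because the input images are often tilted, a rectangular box can enclose extra text, which in turn hurts recognition.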
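To see how quickly a rectangle loses precision on tilted input, the sketch below computes what fraction of an axis-aligned box is actually covered by a rotated text line. The 700 × 100 pixel line and the angles are hypothetical values chosen for illustration; the uncovered area is where neighboring units, characters, or digits can leak into the crop.

```python
import math


def axis_aligned_fill_ratio(width: float, height: float, angle_deg: float) -> float:
    """Fraction of the axis-aligned bounding box covered by a width x height
    rectangle rotated by angle_deg; the rest is background or other text."""
    t = math.radians(angle_deg)
    c, s = abs(math.cos(t)), abs(math.sin(t))
    box_w = width * c + height * s   # horizontal extent of the rotated rectangle
    box_h = width * s + height * c   # vertical extent of the rotated rectangle
    return (width * height) / (box_w * box_h)


if __name__ == "__main__":
    for angle in (0, 2, 5, 10):
        print(angle, round(axis_aligned_fill_ratio(700, 100, angle), 3))
```

For the hypothetical 700 × 100 line, a 10° tilt (the upper bound of the rotation augmentation in the training config) leaves the text covering only about 45% of its axis-aligned box.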

2.2 End-to-End Pipeline with PaddleOCR

In this project, we carry out the entire meter recognition task with PaddleOCR alone.

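What made this pipeline possible is that, through extensive tuning, we substantially improved how accurately the PaddleOCR text detection model localizes the meter boxes, bringing it to, and even beyond, the PaddleDetection baseline.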

3 PaddleOCR Text Detection Models

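PaddleOCR provides a rich set of text detection, text recognition, and end-to-end algorithms. The PaddleOCR overview diagram lists the text detection algorithms it supports.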
https://gitee.com/paddlepaddle/PaddleOCR/raw/release/2.4/doc/overview.png

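By fine-tuning a general-purpose text detection algorithm on the annotated data, we can train a meter-specific text detection model that automatically discards irrelevant text boxes and keeps only the meter reading and serial number.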

With the goal clear, let's move on to the next step.

3.1 Training the DB Model

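To save training time, we provide a well-performing pretrained model and configuration file. You can either fine-tune from the pretrained model or train from scratch.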

When training on AI Studio, pay close attention to these key points:

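  • Use the Premium GPU tier! The original images have very high resolution while the target boxes are relatively small, so training works poorly when the model input size is small, and a larger input size naturally needs more GPU memory.
  • Set use_shared_memory to False.
  • Don't set batch_size_per_card too large, because the input size is large.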

If you skip the last two tricks, training will crash.
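A back-of-the-envelope calculation shows why input size dominates memory use. The sketch below only counts the float32 input batch itself (activations in the backbone and FPN scale with height × width in the same way and are far larger). The 960 × 960 comparison size is an assumption for illustration; 1600 × 1600 with batch size 4 matches the training config.

```python
def input_batch_mib(height: int, width: int, batch_size: int,
                    channels: int = 3, bytes_per_value: int = 4) -> float:
    """MiB occupied by one NCHW float32 input batch (inputs only, no activations)."""
    return batch_size * channels * height * width * bytes_per_value / 2**20


if __name__ == "__main__":
    for side in (960, 1600):
        print(side, input_batch_mib(side, side, batch_size=4))
```

Going from 960 to 1600 multiplies per-batch memory by (1600 / 960)² ≈ 2.8, which is why batch_size_per_card has to stay small. The use_shared_memory switch is a separate issue; one plausible explanation (an assumption, not verified here) is that such large batches exceed the limited shared memory available to DataLoader workers in the AI Studio container.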

In this article, we simply replace the contents of configs/det/ch_PP-OCRv2/ch_PP-OCRv2_det_distill.yml with the configuration below.

The configuration file is as follows:

Global:
  debug: false
  use_gpu: true
  epoch_num: 1200
  log_smooth_window: 20
  print_batch_step: 10
  save_model_dir: ./output/det_dianbiao_v3
  save_epoch_step: 1200
  eval_batch_step:
  - 0
  - 100
  cal_metric_during_train: false
  pretrained_model: my_exps/student.pdparams
  checkpoints: null
  save_inference_dir: null
  use_visualdl: false
  infer_img: M2021/台安站公寓楼段值班房间.jpg
  save_res_path: ./output/det_db/predicts_db.txt
Architecture:
  model_type: det
  algorithm: DB
  Transform: null
  Backbone:
    name: MobileNetV3
    scale: 0.5
    model_name: large
    disable_se: true
  Neck:
    name: DBFPN
    out_channels: 96
  Head:
    name: DBHead
    k: 50
Loss:
  name: DBLoss
  balance_loss: true
  main_loss_type: DiceLoss
  alpha: 5
  beta: 10
  ohem_ratio: 3
Optimizer:
  name: Adam
  beta1: 0.9
  beta2: 0.999
  lr:
    name: Cosine
    learning_rate: 0.0001
    warmup_epoch: 2
  regularizer:
    name: L2
    factor: 0
PostProcess:
  name: DBPostProcess
  thresh: 0.3
  box_thresh: 0.6
  max_candidates: 1000
  unclip_ratio: 1.5
Metric:
  name: DetMetric
  main_indicator: hmean
Train:
  dataset:
    name: SimpleDataSet
    data_dir: ./
    label_file_list:
    - M2021/M2021_label_train.txt
    ratio_list:
    - 1.0
    transforms:
    - DecodeImage:
        img_mode: BGR
        channel_first: false
    - DetLabelEncode: null
    - CopyPaste: null
    - IaaAugment:
        augmenter_args:
        - type: Fliplr
          args:
            p: 0.5
        - type: Affine
          args:
            rotate:
            - -10
            - 10
        - type: Resize
          args:
            size:
            - 0.5
            - 3
    - EastRandomCropData:
        size:
        - 1600
        - 1600
        max_tries: 50
        keep_ratio: true
    - MakeBorderMap:
        shrink_ratio: 0.4
        thresh_min: 0.3
        thresh_max: 0.7
    - MakeShrinkMap:
        shrink_ratio: 0.4
        min_text_size: 8
    - NormalizeImage:
        scale: 1./255.
        mean:
        - 0.485
        - 0.456
        - 0.406
        std:
        - 0.229
        - 0.224
        - 0.225
        order: hwc
    - ToCHWImage: null
    - KeepKeys:
        keep_keys:
        - image
        - threshold_map
        - threshold_mask
        - shrink_map
        - shrink_mask
  loader:
    shuffle: true
    drop_last: false
    batch_size_per_card: 4 # Important!
    num_workers: 4
    use_shared_memory: False # Important!
Eval:
  dataset:
    name: SimpleDataSet
    data_dir: ./
    label_file_list:
    - M2021/M2021_label_eval.txt
    transforms:
    - DecodeImage:
        img_mode: BGR
        channel_first: false
    - DetLabelEncode: null
    - DetResizeForTest:
        limit_side_len: 1280
        limit_type: min
    - NormalizeImage:
        scale: 1./255.
        mean:
        - 0.485
        - 0.456
        - 0.406
        std:
        - 0.229
        - 0.224
        - 0.225
        order: hwc
    - ToCHWImage: null
    - KeepKeys:
        keep_keys:
        - image
        - shape
        - polys
        - ignore_tags
  loader:
    shuffle: false
    drop_last: false
    batch_size_per_card: 1
    num_workers: 2
    use_shared_memory: False # Important!
profiler_options: null
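Two values in this config, shrink_ratio (MakeShrinkMap) and unclip_ratio (DBPostProcess), control how DB turns polygons into training targets and back. Following the DB formulation as implemented in PaddleOCR, a label polygon with area A and perimeter L is shrunk inward by D = A(1 - r²)/L with r = shrink_ratio, and a predicted region with area A' and perimeter L' is expanded by D' = A' · unclip_ratio / L'. The sketch below only computes these offset distances; the real code performs the polygon offset itself with pyclipper, which is omitted here.

```python
import math
from typing import Sequence, Tuple


def area_and_perimeter(points: Sequence[Tuple[float, float]]) -> Tuple[float, float]:
    """Shoelace area and perimeter of a simple polygon."""
    twice_area, perimeter = 0.0, 0.0
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]
        twice_area += x1 * y2 - x2 * y1
        perimeter += math.hypot(x2 - x1, y2 - y1)
    return abs(twice_area) / 2.0, perimeter


def shrink_distance(points, shrink_ratio: float = 0.4) -> float:
    """Inward offset used to build the shrink map (the training target)."""
    area, perimeter = area_and_perimeter(points)
    return area * (1 - shrink_ratio ** 2) / perimeter


def unclip_distance(points, unclip_ratio: float = 1.5) -> float:
    """Outward offset applied to a predicted region during post-processing."""
    area, perimeter = area_and_perimeter(points)
    return area * unclip_ratio / perimeter


if __name__ == "__main__":
    # Reading box from the inference demo later in this article: 943 x 235 px
    box = [(1065, 2131), (2008, 2131), (2008, 2366), (1065, 2366)]
    print(round(shrink_distance(box), 2))
```

For that 943 × 235 box, the label is shrunk by about 79 px on every side, so the positive region the model learns is roughly 785 × 77 px. MakeShrinkMap also ignores polygons whose shorter side is below min_text_size (8 here).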
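!git clone https://gitee.com/paddlepaddle/PaddleOCR.git
# Unzip the dataset (run from /home/aistudio, where data/ lives)
!unzip -O GB2312  data/data117381/M2021.zip
%cd PaddleOCR/
/home/aistudio/PaddleOCR
# Copy the dataset into the repo so the relative paths in the config resolve
!cp -r ../M2021 ./M2021
# Install PaddleOCR (run inside the repo so requirements.txt is found)
!pip install fasttext==0.8.3
!pip install paddleocr --no-deps -r requirements.txt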
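# Pretrained model and config files provided for reference (if you use the config as-is without applying the two points above, training will fail)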
!tar -xvf ../my_exps.tar -C ./
my_exps/
my_exps/student.pdparams
my_exps/det_dianbiao_size1600_copypaste/
my_exps/det_dianbiao_size1600_copypaste/best_accuracy.pdopt
my_exps/det_dianbiao_size1600_copypaste/config.yml
my_exps/det_dianbiao_size1600_copypaste/train.log
my_exps/det_dianbiao_size1600_copypaste/best_accuracy.pdparams
my_exps/det_dianbiao_size1600_copypaste/best_accuracy.states
my_exps/det_dianbiao_size1600/
my_exps/det_dianbiao_size1600/best_accuracy.pdopt
my_exps/det_dianbiao_size1600/config.yml
my_exps/det_dianbiao_size1600/latest.pdopt
my_exps/det_dianbiao_size1600/train.log
my_exps/det_dianbiao_size1600/latest.pdparams
my_exps/det_dianbiao_size1600/best_accuracy.pdparams
my_exps/det_dianbiao_size1600/latest.states
my_exps/det_dianbiao_size1600/best_accuracy.states
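# Start training (weights are initialized from Global.pretrained_model; clear that field to train from scratch)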
!python tools/train.py -c configs/det/ch_PP-OCRv2/ch_PP-OCRv2_det_distill.yml
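The Optimizer section of the config uses a Cosine learning-rate schedule with learning_rate: 0.0001 and warmup_epoch: 2. As a rough sketch of that shape (not PaddleOCR's exact step accounting, which wraps Paddle's CosineAnnealingDecay in a LinearWarmup and may differ slightly), the learning rate ramps up linearly during warmup and then follows half a cosine down to zero. steps_per_epoch = 100 below is a hypothetical value; the real number depends on the training-set size and batch_size_per_card.

```python
import math


def warmup_cosine_lr(step: int, base_lr: float, warmup_steps: int, total_steps: int) -> float:
    """Linear warmup from 0 to base_lr, then cosine decay from base_lr to 0."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    progress = min(1.0, (step - warmup_steps) / max(1, total_steps - warmup_steps))
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    steps_per_epoch = 100            # hypothetical
    warmup = 2 * steps_per_epoch     # warmup_epoch: 2
    total = 1200 * steps_per_epoch   # epoch_num: 1200
    for epoch in (0, 1, 2, 138, 600, 1200):
        print(epoch, warmup_cosine_lr(epoch * steps_per_epoch, 1e-4, warmup, total))
```

Under these assumptions, epoch 138 (the best_epoch reported in the evaluation log) is still early in the 1200-epoch schedule, where the learning rate remains close to its peak.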

3.2 Model Evaluation

# You can also evaluate the provided trained model
!python tools/eval.py -c configs/det/ch_PP-OCRv2/ch_PP-OCRv2_det_distill.yml  -o Global.checkpoints="my_exps/det_dianbiao_size1600_copypaste/best_accuracy"
[2022/01/20 01:40:18] root INFO: train with paddle 2.1.2 and device CUDAPlace(0)
[2022/01/20 01:40:18] root INFO: Initialize indexs of datasets:['M2021/M2021_label_eval.txt']
W0120 01:40:18.252579 10189 device_context.cc:404] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 10.1, Runtime API Version: 10.1
W0120 01:40:18.257786 10189 device_context.cc:422] device: 0, cuDNN Version: 7.6.
[2022/01/20 01:40:23] root INFO: resume from my_exps/det_dianbiao_size1600_copypaste/best_accuracy
[2022/01/20 01:40:23] root INFO: metric in ckpt ***************
[2022/01/20 01:40:23] root INFO: hmean:0.8543046357615895
[2022/01/20 01:40:23] root INFO: precision:0.7914110429447853
[2022/01/20 01:40:23] root INFO: recall:0.9280575539568345
[2022/01/20 01:40:23] root INFO: fps:3.2759908010222296
[2022/01/20 01:40:23] root INFO: best_epoch:138
[2022/01/20 01:40:23] root INFO: start_epoch:139
eval model:: 100%|██████████████████████████████| 70/70 [01:12<00:00,  1.04s/it]
[2022/01/20 01:41:36] root INFO: metric eval ***************
[2022/01/20 01:41:36] root INFO: precision:0.7081081081081081
[2022/01/20 01:41:36] root INFO: recall:0.9225352112676056
[2022/01/20 01:41:36] root INFO: hmean:0.801223241590214
[2022/01/20 01:41:36] root INFO: fps:2.3143317331617412
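DetMetric reports hmean, the harmonic mean (F1 score) of precision and recall. Recomputing it from the logged values is a quick sanity check. The gap between precision (0.708) and recall (0.923) suggests that the model finds almost all labeled boxes but also outputs extra ones, which is consistent with the extra "WW" box in the inference demo further down.

```python
def hmean(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Values from the "metric eval" block of the log above
    print(hmean(0.7081081081081081, 0.9225352112676056))
```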
!python tools/infer_det.py -c configs/det/ch_PP-OCRv2/ch_PP-OCRv2_det_distill.yml -o Global.infer_img="./M2021/台安站公寓楼段值班房间.jpg" Global.checkpoints="my_exps/det_dianbiao_size1600_copypaste/best_accuracy"

The detection result looks very good. Next, we chain the detection model with the recognition model.

3.3 Model Export and Chaining

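Here we take a shortcut: export the trained model first, then point the paddleocr whl package at it (via det_model_dir) in place of its built-in detection model. The fine-tuned detection results then show up directly in the full OCR pipeline.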
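The swap relies on the export directory containing the inference model files the whl loads. As a small pre-flight check, assuming the Paddle 2.x export layout in which tools/export_model.py writes inference.pdmodel and inference.pdiparams into Global.save_inference_dir, the hypothetical helper below reports missing files before PaddleOCR is constructed.

```python
import os
from typing import List

# Files expected in an exported Paddle 2.x inference directory (assumption)
REQUIRED_FILES = ("inference.pdmodel", "inference.pdiparams")


def missing_inference_files(model_dir: str) -> List[str]:
    """Return the required files that are not present in model_dir."""
    return [name for name in REQUIRED_FILES
            if not os.path.isfile(os.path.join(model_dir, name))]


if __name__ == "__main__":
    missing = missing_inference_files("./inference/det_db")
    if missing:
        print("Export the model first; missing:", ", ".join(missing))
    else:
        print("det_model_dir looks complete")
```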

# Export the model
!python tools/export_model.py -c configs/det/ch_PP-OCRv2/ch_PP-OCRv2_det_distill.yml -o Global.pretrained_model=./my_exps/det_dianbiao_size1600_copypaste/best_accuracy Global.save_inference_dir=./inference/det_db
from paddleocr import PaddleOCR, draw_ocr
# det_model_dir must contain the exported inference model files (inference.pdmodel / inference.pdiparams)
ocr = PaddleOCR(det_model_dir='./inference/det_db', 
                use_angle_cls=True)
img_path = './M2021/台安站公寓楼段值班房间.jpg'
result = ocr.ocr(img_path, cls=True)
for line in result:
    print(line)


[2022/01/20 02:12:49] root WARNING: version 2.1 not support cls models, use version 2.0 instead
Namespace(benchmark=False, cls_batch_num=6, cls_image_shape='3, 48, 192', cls_model_dir='/home/aistudio/.paddleocr/2.2.1/ocr/cls/ch_ppocr_mobile_v2.0_cls_infer', cls_thresh=0.9, cpu_threads=10, det=True, det_algorithm='DB', det_db_box_thresh=0.6, det_db_score_mode='fast', det_db_thresh=0.3, det_db_unclip_ratio=1.5, det_east_cover_thresh=0.1, det_east_nms_thresh=0.2, det_east_score_thresh=0.8, det_limit_side_len=960, det_limit_type='max', det_model_dir='./inference/det_db', det_sast_nms_thresh=0.2, det_sast_polygon=False, det_sast_score_thresh=0.5, drop_score=0.5, e2e_algorithm='PGNet', e2e_char_dict_path='./ppocr/utils/ic15_dict.txt', e2e_limit_side_len=768, e2e_limit_type='max', e2e_model_dir=None, e2e_pgnet_mode='fast', e2e_pgnet_polygon=True, e2e_pgnet_score_thresh=0.5, e2e_pgnet_valid_set='totaltext', enable_mkldnn=False, gpu_mem=500, help='==SUPPRESS==', image_dir=None, ir_optim=True, label_list=['0', '180'], lang='ch', layout_path_model='lp://PubLayNet/ppyolov2_r50vd_dcn_365e_publaynet/config', max_batch_size=10, max_text_length=25, min_subgraph_size=15, output='./output/table', precision='fp32', process_id=0, rec=True, rec_algorithm='CRNN', rec_batch_num=6, rec_char_dict_path='/home/aistudio/PaddleOCR/ppocr/utils/ppocr_keys_v1.txt', rec_char_type='ch', rec_image_shape='3, 32, 320', rec_model_dir='/home/aistudio/.paddleocr/2.2.1/ocr/rec/ch/ch_PP-OCRv2_rec_infer', save_log_path='./log_output/', show_log=True, table_char_dict_path=None, table_char_type='en', table_max_len=488, table_model_dir=None, total_process_num=1, type='ocr', use_angle_cls=True, use_dilation=False, use_gpu=True, use_mp=False, use_pdserving=False, use_space_char=True, use_tensorrt=False, version='2.1', vis_font_path='./doc/fonts/simfang.ttf', warmup=True)
[2022/01/20 02:12:52] root DEBUG: dt_boxes num : 4, elapse : 0.048544883728027344
[2022/01/20 02:12:52] root DEBUG: cls num  : 4, elapse : 0.0068705081939697266
[2022/01/20 02:12:52] root DEBUG: rec_res num  : 4, elapse : 0.019389867782592773
[[[1359.0, 1911.0], [2065.0, 1890.0], [2068.0, 1991.0], [1362.0, 2012.0]], ('2013207034088', 0.91313225)]
[[[1576.0, 2011.0], [1689.0, 2011.0], [1689.0, 2069.0], [1576.0, 2069.0]], ('WW', 0.5822574)]
[[[1065.0, 2131.0], [2008.0, 2131.0], [2008.0, 2366.0], [1065.0, 2366.0]], ('3127', 0.98005736)]
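The Namespace log above shows that the whl pipeline resizes images with det_limit_side_len=960 and det_limit_type='max', while the Eval config uses limit_side_len: 1280 and limit_type: min. The sketch below mirrors my reading of PaddleOCR's DetResizeForTest (scale by a side limit, then snap each side to a multiple of 32); treat it as an approximation and check it against your PaddleOCR version. The 3000 × 4000 photo size is hypothetical.

```python
def resize_for_test(h: int, w: int, limit_side_len: int, limit_type: str):
    """Return (new_h, new_w) for a DetResizeForTest-style resize (sketch)."""
    if limit_type == "max":
        ratio = limit_side_len / max(h, w) if max(h, w) > limit_side_len else 1.0
    elif limit_type == "min":
        ratio = limit_side_len / min(h, w) if min(h, w) < limit_side_len else 1.0
    else:
        raise ValueError(f"unsupported limit_type: {limit_type}")
    # Scale, then snap each side to a multiple of 32 (at least 32)
    new_h = max(int(round(int(h * ratio) / 32) * 32), 32)
    new_w = max(int(round(int(w * ratio) / 32) * 32), 32)
    return new_h, new_w


if __name__ == "__main__":
    h, w = 3000, 4000  # hypothetical phone photo size
    print("eval config  :", resize_for_test(h, w, 1280, "min"))
    print("whl inference:", resize_for_test(h, w, 960, "max"))
```

Under these assumptions, the hypothetical 3000 × 4000 photo is processed at about 3008 × 4000 during evaluation but at 704 × 960 in the whl pipeline. If the chained pipeline misses boxes that evaluation finds, passing det_limit_side_len and det_limit_type values closer to the Eval setting to PaddleOCR(...) is one thing worth trying.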
# Visualize the results
from PIL import Image

image = Image.open(img_path).convert('RGB')
boxes = [line[0] for line in result]
txts = [line[1][0] for line in result]
scores = [line[1][1] for line in result]
im_show = draw_ocr(image, boxes, txts, scores)
im_show = Image.fromarray(im_show)
im_show.save('result.jpg')

Extra recognized content (such as the "WW" box above) can be removed in two ways:

  • Adjust the thresholds
  • Post-process away non-numeric content
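For threshold adjustment, the relevant knobs are visible in the Namespace log above (for example drop_score=0.5 and det_db_box_thresh=0.6). Post-processing can be as simple as a regular-expression filter over the result list. The sketch below assumes the result format printed above ([box, (text, score)] per line), treats only pure ASCII digit strings as valid, and uses a hypothetical min_score of 0.6; readings that contain a decimal point or separators would need a looser pattern.

```python
import re
from typing import List, Sequence, Tuple

# One entry as printed by ocr.ocr() above: [box, (text, score)]
OcrLine = Tuple[Sequence[Sequence[float]], Tuple[str, float]]

DIGITS = re.compile(r"[0-9]+")


def keep_numeric(result: Sequence[OcrLine], min_score: float = 0.6) -> List[OcrLine]:
    """Drop low-confidence lines and lines whose text is not purely digits."""
    kept = []
    for box, (text, score) in result:
        if score >= min_score and DIGITS.fullmatch(text):
            kept.append((box, (text, score)))
    return kept


if __name__ == "__main__":
    # The three lines printed in the demo above
    demo = [
        [[[1359.0, 1911.0], [2065.0, 1890.0], [2068.0, 1991.0], [1362.0, 2012.0]], ("2013207034088", 0.91313225)],
        [[[1576.0, 2011.0], [1689.0, 2011.0], [1689.0, 2069.0], [1576.0, 2069.0]], ("WW", 0.5822574)],
        [[[1065.0, 2131.0], [2008.0, 2131.0], [2008.0, 2366.0], [1065.0, 2366.0]], ("3127", 0.98005736)],
    ]
    print([text for _, (text, _) in keep_numeric(demo)])
```

On the demo output this keeps "2013207034088" and "3127" and drops "WW". As noted in the dataset overview, a filter like this cannot help when a loose box captures neighboring digits, so box precision still matters most.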

4 Summary

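The route based on fine-tuning the PaddleOCR text detection model now works end to end as well. Next, using an expanded dataset, we will look into rotated object detection and fine-tuning the recognition model. Stay tuned!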
