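Reposted from AI Studio. Original link: PaddlePaddle Regular Competition: Chinese Scene Text Recognition - December 3rd-Place Solution - PaddlePaddle AI Studio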

Regular Competition: December 2021 Chinese Scene Text Recognition, 3rd-Place Technical Solution

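  • This project shares the 3rd-place technical solution of the December 2021 Chinese Scene Text Recognition regular competition. The final score was 84.22158.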

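  • This project uses PaddleOCR-develop (the static-graph version). PaddleOCR has three parts: DB text detection, detection-box rectification, and CRNN text recognition. This competition only needs the third stage, the text recognizer, so the CRNN recognition model serves as the baseline. For more on PaddleOCR, see PaddleOCR-develop.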

  • The sections below cover four topics: environment setup, data processing, model tuning, and training and prediction.
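The CRNN recognizer named above is trained with a CTC loss. At inference, its per-frame class scores are turned into a string by CTC decoding. The sketch below shows best-path (greedy) decoding and rests on a few assumptions:

- The function `ctc_greedy_decode` is hypothetical.
- The blank is class 0.
- `charset[i - 1]` is the character for class `i`.

PaddleOCR's own index convention may differ, so this is an illustration rather than its implementation.

```python
from typing import List, Sequence


def ctc_greedy_decode(probs: Sequence[Sequence[float]], charset: str) -> str:
    """Best-path CTC decoding (illustrative sketch, not PaddleOCR's code).

    Assumptions: class 0 is the CTC blank, and class i (i >= 1) maps to charset[i - 1].
    """
    # 1. Pick the most likely class at every time step.
    best: List[int] = [max(range(len(step)), key=step.__getitem__) for step in probs]
    out: List[str] = []
    prev = None
    for idx in best:
        # 2. Merge consecutive repeats and 3. drop blanks.
        if idx != prev and idx != 0:
            out.append(charset[idx - 1])
        prev = idx
    return "".join(out)


# Seven frames over 4 classes (blank, 'a', 'b', 'c'); argmax path is 1,1,0,1,2,2,0.
frames = [
    [0.1, 0.8, 0.05, 0.05],
    [0.1, 0.7, 0.1, 0.1],
    [0.9, 0.05, 0.03, 0.02],
    [0.2, 0.6, 0.1, 0.1],
    [0.1, 0.1, 0.7, 0.1],
    [0.1, 0.1, 0.6, 0.2],
    [0.8, 0.1, 0.05, 0.05],
]
print(ctc_greedy_decode(frames, "abc"))  # aab
```

The blank between the two `a` frames is what lets CTC emit a genuinely doubled character. Without it, the repeats would be merged into one.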

Introduction

(1) Overview

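To recognize text in street-scene images, this project uses PaddleOCR, a text recognition toolkit well suited to the task. The project proceeds in these steps:

1. Clone and install PaddleOCR with git.
2. Process the data: extract the dataset, prepare the annotations, and augment the images.
3. Configure the model parameters, then train, export, and run prediction.
4. Reorder the prediction txt file with a simple sorting routine so that it meets the competition's submission requirements.
5. Summarize the lessons learned and outline directions for improvement, as a reference for other participants.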

Note: some steps require manual configuration, for example copying code provided in this project into the corresponding config files.

(2) Competition Overview

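  • Chinese scene text recognition receives wide attention and has many everyday applications, such as photo translation, image retrieval, and scene understanding. However, text in Chinese scenes is complicated by several factors, which make the task highly challenging:
    - lighting variation and low resolution;
    - diverse fonts and layouts;
    - a very large number of Chinese character classes.
  • The Chinese scene text recognition regular competition has been upgraded. It provides a lightweight Chinese scene text dataset and asks participants to use the PaddlePaddle framework to predict the text line in each image region and return its content.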

(3) Key Points and Difficulties

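  • The competition mainly tests tool selection, and PaddleOCR is a good fit. Its PP-OCR system is a practical, ultra-lightweight OCR system released on the PaddlePaddle platform. It consists of text detection, detection-box rectification, and text recognition.
  • Using PaddleOCR proficiently is therefore the first difficulty.
  • Data augmentation and sensible hyperparameter tuning are also essential for a satisfactory score, and they form the second difficulty.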

(4) Dataset

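  • The dataset contains 60,000 images: 50,000 for training and 10,000 for testing. The images come from Chinese street views. Each one was cropped from a text-line region of a street-view photo, such as a shop sign or a landmark. All images have undergone some preprocessing.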

  • Annotations: the platform provides a .csv file whose four columns are image width, image height, file name, and text label.

  • Note: only the competition-provided data may be used for training; other open-source datasets are not allowed.
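PaddleOCR's recognition data loader reads a plain-text label file with one `image_path<TAB>label` entry per line. The competition CSV therefore has to be converted first. The sketch below does that under these assumptions:

- The CSV has a header row.
- The columns are in the order stated above (width, height, file name, label).
- `csv_to_rec_labels` and `convert_file` are hypothetical helper names.
- The optional `image_dir` prefix is illustrative. It should match however your config resolves image paths.

```python
import csv
import os
from typing import Iterable, List, Sequence


def csv_to_rec_labels(rows: Iterable[Sequence[str]], image_dir: str = "",
                      has_header: bool = True) -> List[str]:
    """Turn competition CSV rows into 'path<TAB>label' lines (illustrative sketch)."""
    lines: List[str] = []
    for i, row in enumerate(rows):
        if has_header and i == 0:
            continue  # assumed header row
        # Assumed column order: width, height, file name, text label.
        _width, _height, filename, label = row[:4]
        lines.append(f"{os.path.join(image_dir, filename)}\t{label}")
    return lines


def convert_file(csv_path: str, out_path: str, image_dir: str = "") -> int:
    """Read a CSV file and write a PaddleOCR-style label file; returns the line count."""
    with open(csv_path, encoding="utf-8", newline="") as f:
        lines = csv_to_rec_labels(csv.reader(f), image_dir)
    with open(out_path, "w", encoding="utf-8") as f:
        f.write("\n".join(lines) + "\n")
    return len(lines)
```

Using the `csv` module rather than `str.split(",")` keeps labels that contain quoted commas intact.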

(5) Highlights of This Solution

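  • Uses text_render for data augmentation, with corresponding adjustments to the config file.
  • Puts particular emphasis on hyperparameter tuning.
  • Includes a Python script that reorders the out-of-order contents of the prediction txt file as the competition requires and generates the submission file.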
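The reordering step can be sketched as a numeric sort on file names. This is not the author's script. It assumes the following:

- Each prediction line looks like `filename<TAB>text`.
- Test images are named with an integer, such as `123.jpg`.
- The submission expects ascending image order.

The exact submission format is not given here, so `sort_predictions` is a hypothetical helper.

```python
import os
import re
from typing import Iterable, List, Tuple


def _sort_key(line: str) -> Tuple[int, int, str]:
    """Key on the last integer in the file name; names without digits go last."""
    name = os.path.basename(line.split("\t", 1)[0])
    digits = re.findall(r"\d+", name)
    if digits:
        return (0, int(digits[-1]), name)
    return (1, 0, name)


def sort_predictions(lines: Iterable[str]) -> List[str]:
    """Drop blank lines and sort predictions by numeric file name (illustrative sketch)."""
    kept = [ln.rstrip("\r\n") for ln in lines if ln.strip()]
    return sorted(kept, key=_sort_key)


# Usage: read the unordered prediction file, then write the sorted lines back out.
print(sort_predictions(["10.jpg\t店铺", "2.jpg\t标牌", "1.jpg\t地标", ""]))
```

A plain string sort would put `10.jpg` before `2.jpg`. Converting the number to `int` avoids that.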

1. Environment Setup

1.1 Install PaddleOCR

  • AI Studio already provides a PaddlePaddle 1.8.4 and Python 3.7 environment, so you only need to install PaddleOCR by following the official tutorial.

In [1]

!cd ~/work && git clone -b develop https://gitee.com/paddlepaddle/PaddleOCR.git
Cloning into 'PaddleOCR'...
remote: Enumerating objects: 31443, done.
remote: Counting objects: 100% (5822/5822), done.
remote: Compressing objects: 100% (2358/2358), done.
remote: Total 31443 (delta 3940), reused 5082 (delta 3373), pack-reused 25621
Receiving objects: 100% (31443/31443), 258.73 MiB | 37.88 MiB/s, done.
Resolving deltas: 100% (21801/21801), done.
Checking connectivity... done.

In [2]

!cd ~/work/PaddleOCR && pip install -r requirements.txt && python setup.py install
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
Collecting shapely
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/9d/4d/4b0d86ed737acb29c5e627a91449470a9fb914f32640db3f1cb7ba5bc19e/Shapely-1.8.1.post1-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (2.0 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.0/2.0 MB 24.7 MB/s eta 0:00:0000:0100:01
Collecting imgaug
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/66/b1/af3142c4a85cba6da9f4ebb5ff4e21e2616309552caca5e8acefe9840622/imgaug-0.4.0-py2.py3-none-any.whl (948 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 948.0/948.0 KB 214.2 kB/s eta 0:00:0000:0100:01
Collecting pyclipper
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/c5/fa/2c294127e4f88967149a68ad5b3e43636e94e3721109572f8f17ab15b772/pyclipper-1.3.0.post2-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.whl (603 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 603.5/603.5 KB 76.5 kB/s eta 0:00:0000:0100:01
Collecting lmdb
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/4d/cf/3230b1c9b0bec406abb85a9332ba5805bdd03a1d24025c6bbcfb8ed71539/lmdb-1.3.0-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (298 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 298.8/298.8 KB 37.1 kB/s eta 0:00:0000:0100:01
Requirement already satisfied: tqdm in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from -r requirements.txt (line 5)) (4.36.1)
Requirement already satisfied: numpy in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from -r requirements.txt (line 6)) (1.16.4)
Collecting opencv-python==4.2.0.32
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/34/a3/403dbaef909fee9f9f6a8eaff51d44085a14e5bb1a1ff7257117d744986a/opencv_python-4.2.0.32-cp37-cp37m-manylinux1_x86_64.whl (28.2 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 28.2/28.2 MB 1.9 MB/s eta 0:00:00:00:0100:01
Requirement already satisfied: matplotlib in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from imgaug->-r requirements.txt (line 2)) (2.2.3)
Requirement already satisfied: six in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from imgaug->-r requirements.txt (line 2)) (1.16.0)
Requirement already satisfied: Pillow in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from imgaug->-r requirements.txt (line 2)) (7.1.2)
Collecting scikit-image>=0.14.2
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/d2/d9/d16d4cbb4840e0fb3bd329b49184d240b82b649e1bd579489394fbc85c81/scikit_image-0.19.2-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (13.5 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 13.5/13.5 MB 1.3 MB/s eta 0:00:00:00:0100:01
Requirement already satisfied: scipy in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from imgaug->-r requirements.txt (line 2)) (1.3.0)
Requirement already satisfied: imageio in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from imgaug->-r requirements.txt (line 2)) (2.6.1)
Requirement already satisfied: packaging>=20.0 in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from scikit-image>=0.14.2->imgaug->-r requirements.txt (line 2)) (21.3)
Collecting numpy
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/6d/ad/ff3b21ebfe79a4d25b4a4f8e5cf9fd44a204adb6b33c09010f566f51027a/numpy-1.21.6-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (15.7 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 15.7/15.7 MB 2.5 MB/s eta 0:00:00:00:0100:01
Collecting tifffile>=2019.7.26
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/d8/38/85ae5ed77598ca90558c17a2f79ddaba33173b31cf8d8f545d34d9134f0d/tifffile-2021.11.2-py3-none-any.whl (178 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 178.9/178.9 KB 24.0 kB/s eta 0:00:00a 0:00:01
Collecting PyWavelets>=1.1.1
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/ae/56/4441877073d8a5266dbf7b04c7f3dc66f1149c8efb9323e0ef987a9bb1ce/PyWavelets-1.3.0-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl (6.4 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 6.4/6.4 MB 833.4 kB/s eta 0:00:000:0100:01
Requirement already satisfied: networkx>=2.2 in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from scikit-image>=0.14.2->imgaug->-r requirements.txt (line 2)) (2.4)
Collecting scipy
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/58/4f/11f34cfc57ead25752a7992b069c36f5d18421958ebd6466ecd849aeaf86/scipy-1.7.3-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (38.1 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 38.1/38.1 MB 1.6 MB/s eta 0:00:00:00:0100:01
Requirement already satisfied: cycler>=0.10 in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from matplotlib->imgaug->-r requirements.txt (line 2)) (0.10.0)
Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.1 in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from matplotlib->imgaug->-r requirements.txt (line 2)) (3.0.7)
Requirement already satisfied: pytz in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from matplotlib->imgaug->-r requirements.txt (line 2)) (2022.1)
Requirement already satisfied: kiwisolver>=1.0.1 in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from matplotlib->imgaug->-r requirements.txt (line 2)) (1.1.0)
Requirement already satisfied: python-dateutil>=2.1 in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from matplotlib->imgaug->-r requirements.txt (line 2)) (2.8.2)
Requirement already satisfied: setuptools in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from kiwisolver>=1.0.1->matplotlib->imgaug->-r requirements.txt (line 2)) (41.4.0)
Requirement already satisfied: decorator>=4.3.0 in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from networkx>=2.2->scikit-image>=0.14.2->imgaug->-r requirements.txt (line 2)) (4.4.0)
Installing collected packages: pyclipper, lmdb, shapely, numpy, tifffile, scipy, PyWavelets, opencv-python, scikit-image, imgaug
  Attempting uninstall: numpy
    Found existing installation: numpy 1.16.4
    Uninstalling numpy-1.16.4:
      Successfully uninstalled numpy-1.16.4
  Attempting uninstall: scipy
    Found existing installation: scipy 1.3.0
    Uninstalling scipy-1.3.0:
      Successfully uninstalled scipy-1.3.0
  Attempting uninstall: opencv-python
    Found existing installation: opencv-python 4.1.1.26
    Uninstalling opencv-python-4.1.1.26:
      Successfully uninstalled opencv-python-4.1.1.26
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
parl 1.4.1 requires pyzmq==18.1.1, but you have pyzmq 22.3.0 which is incompatible.
Successfully installed PyWavelets-1.3.0 imgaug-0.4.0 lmdb-1.3.0 numpy-1.21.6 opencv-python-4.2.0.32 pyclipper-1.3.0.post2 scikit-image-0.19.2 scipy-1.7.3 shapely-1.8.1.post1 tifffile-2021.11.2
running install
running bdist_egg
running egg_info
creating paddleocr.egg-info
writing paddleocr.egg-info/PKG-INFO
writing dependency_links to paddleocr.egg-info/dependency_links.txt
writing entry points to paddleocr.egg-info/entry_points.txt
writing requirements to paddleocr.egg-info/requires.txt
writing top-level names to paddleocr.egg-info/top_level.txt
writing manifest file 'paddleocr.egg-info/SOURCES.txt'
reading manifest file 'paddleocr.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
warning: no files found matching 'LICENSE.txt'
writing manifest file 'paddleocr.egg-info/SOURCES.txt'
installing library code to build/bdist.linux-x86_64/egg
running install_lib
running build_py
creating build
creating build/lib
creating build/lib/paddleocr
copying paddleocr.py -> build/lib/paddleocr
copying __init__.py -> build/lib/paddleocr
copying MANIFEST.in -> build/lib/paddleocr
copying README.md -> build/lib/paddleocr
creating build/lib/paddleocr/paddleocr.egg-info
copying paddleocr.egg-info/PKG-INFO -> build/lib/paddleocr/paddleocr.egg-info
copying paddleocr.egg-info/SOURCES.txt -> build/lib/paddleocr/paddleocr.egg-info
copying paddleocr.egg-info/dependency_links.txt -> build/lib/paddleocr/paddleocr.egg-info
copying paddleocr.egg-info/entry_points.txt -> build/lib/paddleocr/paddleocr.egg-info
copying paddleocr.egg-info/requires.txt -> build/lib/paddleocr/paddleocr.egg-info
copying paddleocr.egg-info/top_level.txt -> build/lib/paddleocr/paddleocr.egg-info
creating build/lib/paddleocr/ppocr
creating build/lib/paddleocr/ppocr/data
creating build/lib/paddleocr/ppocr/data/det
copying ppocr/data/det/__init__.py -> build/lib/paddleocr/ppocr/data/det
copying ppocr/data/det/data_augment.py -> build/lib/paddleocr/ppocr/data/det
copying ppocr/data/det/dataset_traversal.py -> build/lib/paddleocr/ppocr/data/det
copying ppocr/data/det/db_process.py -> build/lib/paddleocr/ppocr/data/det
copying ppocr/data/det/east_process.py -> build/lib/paddleocr/ppocr/data/det
copying ppocr/data/det/make_border_map.py -> build/lib/paddleocr/ppocr/data/det
copying ppocr/data/det/make_shrink_map.py -> build/lib/paddleocr/ppocr/data/det
copying ppocr/data/det/random_crop_data.py -> build/lib/paddleocr/ppocr/data/det
copying ppocr/data/det/sast_process.py -> build/lib/paddleocr/ppocr/data/det
creating build/lib/paddleocr/ppocr/postprocess
copying ppocr/postprocess/__init__.py -> build/lib/paddleocr/ppocr/postprocess
copying ppocr/postprocess/db_postprocess.py -> build/lib/paddleocr/ppocr/postprocess
copying ppocr/postprocess/east_postprocess.py -> build/lib/paddleocr/ppocr/postprocess
copying ppocr/postprocess/locality_aware_nms.py -> build/lib/paddleocr/ppocr/postprocess
copying ppocr/postprocess/sast_postprocess.py -> build/lib/paddleocr/ppocr/postprocess
creating build/lib/paddleocr/ppocr/postprocess/lanms
copying ppocr/postprocess/lanms/.gitignore -> build/lib/paddleocr/ppocr/postprocess/lanms
copying ppocr/postprocess/lanms/.ycm_extra_conf.py -> build/lib/paddleocr/ppocr/postprocess/lanms
copying ppocr/postprocess/lanms/__init__.py -> build/lib/paddleocr/ppocr/postprocess/lanms
copying ppocr/postprocess/lanms/__main__.py -> build/lib/paddleocr/ppocr/postprocess/lanms
copying ppocr/postprocess/lanms/adaptor.cpp -> build/lib/paddleocr/ppocr/postprocess/lanms
copying ppocr/postprocess/lanms/lanms.h -> build/lib/paddleocr/ppocr/postprocess/lanms
creating build/lib/paddleocr/ppocr/postprocess/lanms/include
creating build/lib/paddleocr/ppocr/postprocess/lanms/include/clipper
copying ppocr/postprocess/lanms/include/clipper/clipper.cpp -> build/lib/paddleocr/ppocr/postprocess/lanms/include/clipper
copying ppocr/postprocess/lanms/include/clipper/clipper.hpp -> build/lib/paddleocr/ppocr/postprocess/lanms/include/clipper
creating build/lib/paddleocr/ppocr/postprocess/lanms/include/pybind11
copying ppocr/postprocess/lanms/include/pybind11/attr.h -> build/lib/paddleocr/ppocr/postprocess/lanms/include/pybind11
copying ppocr/postprocess/lanms/include/pybind11/buffer_info.h -> build/lib/paddleocr/ppocr/postprocess/lanms/include/pybind11
copying ppocr/postprocess/lanms/include/pybind11/cast.h -> build/lib/paddleocr/ppocr/postprocess/lanms/include/pybind11
copying ppocr/postprocess/lanms/include/pybind11/chrono.h -> build/lib/paddleocr/ppocr/postprocess/lanms/include/pybind11
copying ppocr/postprocess/lanms/include/pybind11/class_support.h -> build/lib/paddleocr/ppocr/postprocess/lanms/include/pybind11
copying ppocr/postprocess/lanms/include/pybind11/common.h -> build/lib/paddleocr/ppocr/postprocess/lanms/include/pybind11
copying ppocr/postprocess/lanms/include/pybind11/complex.h -> build/lib/paddleocr/ppocr/postprocess/lanms/include/pybind11
copying ppocr/postprocess/lanms/include/pybind11/descr.h -> build/lib/paddleocr/ppocr/postprocess/lanms/include/pybind11
copying ppocr/postprocess/lanms/include/pybind11/eigen.h -> build/lib/paddleocr/ppocr/postprocess/lanms/include/pybind11
copying ppocr/postprocess/lanms/include/pybind11/embed.h -> build/lib/paddleocr/ppocr/postprocess/lanms/include/pybind11
copying ppocr/postprocess/lanms/include/pybind11/eval.h -> build/lib/paddleocr/ppocr/postprocess/lanms/include/pybind11
copying ppocr/postprocess/lanms/include/pybind11/functional.h -> build/lib/paddleocr/ppocr/postprocess/lanms/include/pybind11
copying ppocr/postprocess/lanms/include/pybind11/numpy.h -> build/lib/paddleocr/ppocr/postprocess/lanms/include/pybind11
copying ppocr/postprocess/lanms/include/pybind11/operators.h -> build/lib/paddleocr/ppocr/postprocess/lanms/include/pybind11
copying ppocr/postprocess/lanms/include/pybind11/options.h -> build/lib/paddleocr/ppocr/postprocess/lanms/include/pybind11
copying ppocr/postprocess/lanms/include/pybind11/pybind11.h -> build/lib/paddleocr/ppocr/postprocess/lanms/include/pybind11
copying ppocr/postprocess/lanms/include/pybind11/pytypes.h -> build/lib/paddleocr/ppocr/postprocess/lanms/include/pybind11
copying ppocr/postprocess/lanms/include/pybind11/stl.h -> build/lib/paddleocr/ppocr/postprocess/lanms/include/pybind11
copying ppocr/postprocess/lanms/include/pybind11/stl_bind.h -> build/lib/paddleocr/ppocr/postprocess/lanms/include/pybind11
copying ppocr/postprocess/lanms/include/pybind11/typeid.h -> build/lib/paddleocr/ppocr/postprocess/lanms/include/pybind11
creating build/lib/paddleocr/ppocr/utils
copying ppocr/utils/character.py -> build/lib/paddleocr/ppocr/utils
copying ppocr/utils/check.py -> build/lib/paddleocr/ppocr/utils
copying ppocr/utils/ic15_dict.txt -> build/lib/paddleocr/ppocr/utils
copying ppocr/utils/ppocr_keys_v1.txt -> build/lib/paddleocr/ppocr/utils
copying ppocr/utils/utility.py -> build/lib/paddleocr/ppocr/utils
creating build/lib/paddleocr/ppocr/utils/corpus
copying ppocr/utils/corpus/occitan_corpus.txt -> build/lib/paddleocr/ppocr/utils/corpus
creating build/lib/paddleocr/ppocr/utils/dict
copying ppocr/utils/dict/french_dict.txt -> build/lib/paddleocr/ppocr/utils/dict
copying ppocr/utils/dict/german_dict.txt -> build/lib/paddleocr/ppocr/utils/dict
copying ppocr/utils/dict/japan_dict.txt -> build/lib/paddleocr/ppocr/utils/dict
copying ppocr/utils/dict/korean_dict.txt -> build/lib/paddleocr/ppocr/utils/dict
copying ppocr/utils/dict/occitan_dict.txt -> build/lib/paddleocr/ppocr/utils/dict
creating build/lib/paddleocr/tools
creating build/lib/paddleocr/tools/infer
copying tools/infer/__init__.py -> build/lib/paddleocr/tools/infer
copying tools/infer/predict_cls.py -> build/lib/paddleocr/tools/infer
copying tools/infer/predict_det.py -> build/lib/paddleocr/tools/infer
copying tools/infer/predict_rec.py -> build/lib/paddleocr/tools/infer
copying tools/infer/predict_system.py -> build/lib/paddleocr/tools/infer
copying tools/infer/utility.py -> build/lib/paddleocr/tools/infer
creating build/bdist.linux-x86_64
creating build/bdist.linux-x86_64/egg
creating build/bdist.linux-x86_64/egg/paddleocr
copying build/lib/paddleocr/paddleocr.py -> build/bdist.linux-x86_64/egg/paddleocr
copying build/lib/paddleocr/MANIFEST.in -> build/bdist.linux-x86_64/egg/paddleocr
creating build/bdist.linux-x86_64/egg/paddleocr/ppocr
creating build/bdist.linux-x86_64/egg/paddleocr/ppocr/postprocess
copying build/lib/paddleocr/ppocr/postprocess/locality_aware_nms.py -> build/bdist.linux-x86_64/egg/paddleocr/ppocr/postprocess
copying build/lib/paddleocr/ppocr/postprocess/sast_postprocess.py -> build/bdist.linux-x86_64/egg/paddleocr/ppocr/postprocess
copying build/lib/paddleocr/ppocr/postprocess/db_postprocess.py -> build/bdist.linux-x86_64/egg/paddleocr/ppocr/postprocess
creating build/bdist.linux-x86_64/egg/paddleocr/ppocr/postprocess/lanms
copying build/lib/paddleocr/ppocr/postprocess/lanms/adaptor.cpp -> build/bdist.linux-x86_64/egg/paddleocr/ppocr/postprocess/lanms
copying build/lib/paddleocr/ppocr/postprocess/lanms/lanms.h -> build/bdist.linux-x86_64/egg/paddleocr/ppocr/postprocess/lanms
creating build/bdist.linux-x86_64/egg/paddleocr/ppocr/postprocess/lanms/include
creating build/bdist.linux-x86_64/egg/paddleocr/ppocr/postprocess/lanms/include/clipper
copying build/lib/paddleocr/ppocr/postprocess/lanms/include/clipper/clipper.hpp -> build/bdist.linux-x86_64/egg/paddleocr/ppocr/postprocess/lanms/include/clipper
copying build/lib/paddleocr/ppocr/postprocess/lanms/include/clipper/clipper.cpp -> build/bdist.linux-x86_64/egg/paddleocr/ppocr/postprocess/lanms/include/clipper
creating build/bdist.linux-x86_64/egg/paddleocr/ppocr/postprocess/lanms/include/pybind11
copying build/lib/paddleocr/ppocr/postprocess/lanms/include/pybind11/common.h -> build/bdist.linux-x86_64/egg/paddleocr/ppocr/postprocess/lanms/include/pybind11
copying build/lib/paddleocr/ppocr/postprocess/lanms/include/pybind11/eigen.h -> build/bdist.linux-x86_64/egg/paddleocr/ppocr/postprocess/lanms/include/pybind11
copying build/lib/paddleocr/ppocr/postprocess/lanms/include/pybind11/options.h -> build/bdist.linux-x86_64/egg/paddleocr/ppocr/postprocess/lanms/include/pybind11
copying build/lib/paddleocr/ppocr/postprocess/lanms/include/pybind11/typeid.h -> build/bdist.linux-x86_64/egg/paddleocr/ppocr/postprocess/lanms/include/pybind11
copying build/lib/paddleocr/ppocr/postprocess/lanms/include/pybind11/attr.h -> build/bdist.linux-x86_64/egg/paddleocr/ppocr/postprocess/lanms/include/pybind11
copying build/lib/paddleocr/ppocr/postprocess/lanms/include/pybind11/descr.h -> build/bdist.linux-x86_64/egg/paddleocr/ppocr/postprocess/lanms/include/pybind11
copying build/lib/paddleocr/ppocr/postprocess/lanms/include/pybind11/pybind11.h -> build/bdist.linux-x86_64/egg/paddleocr/ppocr/postprocess/lanms/include/pybind11
copying build/lib/paddleocr/ppocr/postprocess/lanms/include/pybind11/stl.h -> build/bdist.linux-x86_64/egg/paddleocr/ppocr/postprocess/lanms/include/pybind11
copying build/lib/paddleocr/ppocr/postprocess/lanms/include/pybind11/buffer_info.h -> build/bdist.linux-x86_64/egg/paddleocr/ppocr/postprocess/lanms/include/pybind11
copying build/lib/paddleocr/ppocr/postprocess/lanms/include/pybind11/complex.h -> build/bdist.linux-x86_64/egg/paddleocr/ppocr/postprocess/lanms/include/pybind11
copying build/lib/paddleocr/ppocr/postprocess/lanms/include/pybind11/cast.h -> build/bdist.linux-x86_64/egg/paddleocr/ppocr/postprocess/lanms/include/pybind11
copying build/lib/paddleocr/ppocr/postprocess/lanms/include/pybind11/pytypes.h -> build/bdist.linux-x86_64/egg/paddleocr/ppocr/postprocess/lanms/include/pybind11
copying build/lib/paddleocr/ppocr/postprocess/lanms/include/pybind11/operators.h -> build/bdist.linux-x86_64/egg/paddleocr/ppocr/postprocess/lanms/include/pybind11
copying build/lib/paddleocr/ppocr/postprocess/lanms/include/pybind11/functional.h -> build/bdist.linux-x86_64/egg/paddleocr/ppocr/postprocess/lanms/include/pybind11
copying build/lib/paddleocr/ppocr/postprocess/lanms/include/pybind11/eval.h -> build/bdist.linux-x86_64/egg/paddleocr/ppocr/postprocess/lanms/include/pybind11
copying build/lib/paddleocr/ppocr/postprocess/lanms/include/pybind11/embed.h -> build/bdist.linux-x86_64/egg/paddleocr/ppocr/postprocess/lanms/include/pybind11
copying build/lib/paddleocr/ppocr/postprocess/lanms/include/pybind11/stl_bind.h -> build/bdist.linux-x86_64/egg/paddleocr/ppocr/postprocess/lanms/include/pybind11
copying build/lib/paddleocr/ppocr/postprocess/lanms/include/pybind11/class_support.h -> build/bdist.linux-x86_64/egg/paddleocr/ppocr/postprocess/lanms/include/pybind11
copying build/lib/paddleocr/ppocr/postprocess/lanms/include/pybind11/chrono.h -> build/bdist.linux-x86_64/egg/paddleocr/ppocr/postprocess/lanms/include/pybind11
copying build/lib/paddleocr/ppocr/postprocess/lanms/include/pybind11/numpy.h -> build/bdist.linux-x86_64/egg/paddleocr/ppocr/postprocess/lanms/include/pybind11
copying build/lib/paddleocr/ppocr/postprocess/lanms/.ycm_extra_conf.py -> build/bdist.linux-x86_64/egg/paddleocr/ppocr/postprocess/lanms
copying build/lib/paddleocr/ppocr/postprocess/lanms/.gitignore -> build/bdist.linux-x86_64/egg/paddleocr/ppocr/postprocess/lanms
copying build/lib/paddleocr/ppocr/postprocess/lanms/__main__.py -> build/bdist.linux-x86_64/egg/paddleocr/ppocr/postprocess/lanms
copying build/lib/paddleocr/ppocr/postprocess/lanms/__init__.py -> build/bdist.linux-x86_64/egg/paddleocr/ppocr/postprocess/lanms
copying build/lib/paddleocr/ppocr/postprocess/east_postprocess.py -> build/bdist.linux-x86_64/egg/paddleocr/ppocr/postprocess
copying build/lib/paddleocr/ppocr/postprocess/__init__.py -> build/bdist.linux-x86_64/egg/paddleocr/ppocr/postprocess
creating build/bdist.linux-x86_64/egg/paddleocr/ppocr/utils
creating build/bdist.linux-x86_64/egg/paddleocr/ppocr/utils/dict
copying build/lib/paddleocr/ppocr/utils/dict/occitan_dict.txt -> build/bdist.linux-x86_64/egg/paddleocr/ppocr/utils/dict
copying build/lib/paddleocr/ppocr/utils/dict/korean_dict.txt -> build/bdist.linux-x86_64/egg/paddleocr/ppocr/utils/dict
copying build/lib/paddleocr/ppocr/utils/dict/french_dict.txt -> build/bdist.linux-x86_64/egg/paddleocr/ppocr/utils/dict
copying build/lib/paddleocr/ppocr/utils/dict/german_dict.txt -> build/bdist.linux-x86_64/egg/paddleocr/ppocr/utils/dict
copying build/lib/paddleocr/ppocr/utils/dict/japan_dict.txt -> build/bdist.linux-x86_64/egg/paddleocr/ppocr/utils/dict
creating build/bdist.linux-x86_64/egg/paddleocr/ppocr/utils/corpus
copying build/lib/paddleocr/ppocr/utils/corpus/occitan_corpus.txt -> build/bdist.linux-x86_64/egg/paddleocr/ppocr/utils/corpus
copying build/lib/paddleocr/ppocr/utils/check.py -> build/bdist.linux-x86_64/egg/paddleocr/ppocr/utils
copying build/lib/paddleocr/ppocr/utils/ppocr_keys_v1.txt -> build/bdist.linux-x86_64/egg/paddleocr/ppocr/utils
copying build/lib/paddleocr/ppocr/utils/utility.py -> build/bdist.linux-x86_64/egg/paddleocr/ppocr/utils
copying build/lib/paddleocr/ppocr/utils/character.py -> build/bdist.linux-x86_64/egg/paddleocr/ppocr/utils
copying build/lib/paddleocr/ppocr/utils/ic15_dict.txt -> build/bdist.linux-x86_64/egg/paddleocr/ppocr/utils
creating build/bdist.linux-x86_64/egg/paddleocr/ppocr/data
creating build/bdist.linux-x86_64/egg/paddleocr/ppocr/data/det
copying build/lib/paddleocr/ppocr/data/det/make_shrink_map.py -> build/bdist.linux-x86_64/egg/paddleocr/ppocr/data/det
copying build/lib/paddleocr/ppocr/data/det/random_crop_data.py -> build/bdist.linux-x86_64/egg/paddleocr/ppocr/data/det
copying build/lib/paddleocr/ppocr/data/det/sast_process.py -> build/bdist.linux-x86_64/egg/paddleocr/ppocr/data/det
copying build/lib/paddleocr/ppocr/data/det/db_process.py -> build/bdist.linux-x86_64/egg/paddleocr/ppocr/data/det
copying build/lib/paddleocr/ppocr/data/det/make_border_map.py -> build/bdist.linux-x86_64/egg/paddleocr/ppocr/data/det
copying build/lib/paddleocr/ppocr/data/det/dataset_traversal.py -> build/bdist.linux-x86_64/egg/paddleocr/ppocr/data/det
copying build/lib/paddleocr/ppocr/data/det/data_augment.py -> build/bdist.linux-x86_64/egg/paddleocr/ppocr/data/det
copying build/lib/paddleocr/ppocr/data/det/east_process.py -> build/bdist.linux-x86_64/egg/paddleocr/ppocr/data/det
copying build/lib/paddleocr/ppocr/data/det/__init__.py -> build/bdist.linux-x86_64/egg/paddleocr/ppocr/data/det
creating build/bdist.linux-x86_64/egg/paddleocr/paddleocr.egg-info
copying build/lib/paddleocr/paddleocr.egg-info/requires.txt -> build/bdist.linux-x86_64/egg/paddleocr/paddleocr.egg-info
copying build/lib/paddleocr/paddleocr.egg-info/SOURCES.txt -> build/bdist.linux-x86_64/egg/paddleocr/paddleocr.egg-info
copying build/lib/paddleocr/paddleocr.egg-info/dependency_links.txt -> build/bdist.linux-x86_64/egg/paddleocr/paddleocr.egg-info
copying build/lib/paddleocr/paddleocr.egg-info/top_level.txt -> build/bdist.linux-x86_64/egg/paddleocr/paddleocr.egg-info
copying build/lib/paddleocr/paddleocr.egg-info/entry_points.txt -> build/bdist.linux-x86_64/egg/paddleocr/paddleocr.egg-info
copying build/lib/paddleocr/paddleocr.egg-info/PKG-INFO -> build/bdist.linux-x86_64/egg/paddleocr/paddleocr.egg-info
creating build/bdist.linux-x86_64/egg/paddleocr/tools
creating build/bdist.linux-x86_64/egg/paddleocr/tools/infer
copying build/lib/paddleocr/tools/infer/predict_rec.py -> build/bdist.linux-x86_64/egg/paddleocr/tools/infer
copying build/lib/paddleocr/tools/infer/utility.py -> build/bdist.linux-x86_64/egg/paddleocr/tools/infer
copying build/lib/paddleocr/tools/infer/predict_system.py -> build/bdist.linux-x86_64/egg/paddleocr/tools/infer
copying build/lib/paddleocr/tools/infer/predict_cls.py -> build/bdist.linux-x86_64/egg/paddleocr/tools/infer
copying build/lib/paddleocr/tools/infer/predict_det.py -> build/bdist.linux-x86_64/egg/paddleocr/tools/infer
copying build/lib/paddleocr/tools/infer/__init__.py -> build/bdist.linux-x86_64/egg/paddleocr/tools/infer
copying build/lib/paddleocr/README.md -> build/bdist.linux-x86_64/egg/paddleocr
copying build/lib/paddleocr/__init__.py -> build/bdist.linux-x86_64/egg/paddleocr
byte-compiling build/bdist.linux-x86_64/egg/paddleocr/paddleocr.py to paddleocr.cpython-37.pyc
byte-compiling build/bdist.linux-x86_64/egg/paddleocr/ppocr/postprocess/locality_aware_nms.py to locality_aware_nms.cpython-37.pyc
byte-compiling build/bdist.linux-x86_64/egg/paddleocr/ppocr/postprocess/sast_postprocess.py to sast_postprocess.cpython-37.pyc
byte-compiling build/bdist.linux-x86_64/egg/paddleocr/ppocr/postprocess/db_postprocess.py to db_postprocess.cpython-37.pyc
byte-compiling build/bdist.linux-x86_64/egg/paddleocr/ppocr/postprocess/lanms/.ycm_extra_conf.py to .ycm_extra_conf.cpython-37.pyc
byte-compiling build/bdist.linux-x86_64/egg/paddleocr/ppocr/postprocess/lanms/__main__.py to __main__.cpython-37.pyc
byte-compiling build/bdist.linux-x86_64/egg/paddleocr/ppocr/postprocess/lanms/__init__.py to __init__.cpython-37.pyc
byte-compiling build/bdist.linux-x86_64/egg/paddleocr/ppocr/postprocess/east_postprocess.py to east_postprocess.cpython-37.pyc
byte-compiling build/bdist.linux-x86_64/egg/paddleocr/ppocr/postprocess/__init__.py to __init__.cpython-37.pyc
byte-compiling build/bdist.linux-x86_64/egg/paddleocr/ppocr/utils/check.py to check.cpython-37.pyc
byte-compiling build/bdist.linux-x86_64/egg/paddleocr/ppocr/utils/utility.py to utility.cpython-37.pyc
byte-compiling build/bdist.linux-x86_64/egg/paddleocr/ppocr/utils/character.py to character.cpython-37.pyc
byte-compiling build/bdist.linux-x86_64/egg/paddleocr/ppocr/data/det/make_shrink_map.py to make_shrink_map.cpython-37.pyc
byte-compiling build/bdist.linux-x86_64/egg/paddleocr/ppocr/data/det/random_crop_data.py to random_crop_data.cpython-37.pyc
byte-compiling build/bdist.linux-x86_64/egg/paddleocr/ppocr/data/det/sast_process.py to sast_process.cpython-37.pyc
byte-compiling build/bdist.linux-x86_64/egg/paddleocr/ppocr/data/det/db_process.py to db_process.cpython-37.pyc
byte-compiling build/bdist.linux-x86_64/egg/paddleocr/ppocr/data/det/make_border_map.py to make_border_map.cpython-37.pyc
byte-compiling build/bdist.linux-x86_64/egg/paddleocr/ppocr/data/det/dataset_traversal.py to dataset_traversal.cpython-37.pyc
byte-compiling build/bdist.linux-x86_64/egg/paddleocr/ppocr/data/det/data_augment.py to data_augment.cpython-37.pyc
byte-compiling build/bdist.linux-x86_64/egg/paddleocr/ppocr/data/det/east_process.py to east_process.cpython-37.pyc
byte-compiling build/bdist.linux-x86_64/egg/paddleocr/ppocr/data/det/__init__.py to __init__.cpython-37.pyc
byte-compiling build/bdist.linux-x86_64/egg/paddleocr/tools/infer/predict_rec.py to predict_rec.cpython-37.pyc
byte-compiling build/bdist.linux-x86_64/egg/paddleocr/tools/infer/utility.py to utility.cpython-37.pyc
byte-compiling build/bdist.linux-x86_64/egg/paddleocr/tools/infer/predict_system.py to predict_system.cpython-37.pyc
byte-compiling build/bdist.linux-x86_64/egg/paddleocr/tools/infer/predict_cls.py to predict_cls.cpython-37.pyc
byte-compiling build/bdist.linux-x86_64/egg/paddleocr/tools/infer/predict_det.py to predict_det.cpython-37.pyc
byte-compiling build/bdist.linux-x86_64/egg/paddleocr/tools/infer/__init__.py to __init__.cpython-37.pyc
byte-compiling build/bdist.linux-x86_64/egg/paddleocr/__init__.py to __init__.cpython-37.pyc
creating build/bdist.linux-x86_64/egg/EGG-INFO
copying paddleocr.egg-info/PKG-INFO -> build/bdist.linux-x86_64/egg/EGG-INFO
copying paddleocr.egg-info/SOURCES.txt -> build/bdist.linux-x86_64/egg/EGG-INFO
copying paddleocr.egg-info/dependency_links.txt -> build/bdist.linux-x86_64/egg/EGG-INFO
copying paddleocr.egg-info/entry_points.txt -> build/bdist.linux-x86_64/egg/EGG-INFO
copying paddleocr.egg-info/requires.txt -> build/bdist.linux-x86_64/egg/EGG-INFO
copying paddleocr.egg-info/top_level.txt -> build/bdist.linux-x86_64/egg/EGG-INFO
zip_safe flag not set; analyzing archive contents...
paddleocr.__pycache__.paddleocr.cpython-37: module references __file__
paddleocr.ppocr.postprocess.__pycache__.east_postprocess.cpython-37: module references __file__
paddleocr.ppocr.postprocess.__pycache__.sast_postprocess.cpython-37: module references __file__
paddleocr.ppocr.postprocess.lanms.__pycache__..ycm_extra_conf.cpython-37: module references __file__
paddleocr.ppocr.postprocess.lanms.__pycache__.__init__.cpython-37: module references __file__
paddleocr.tools.infer.__pycache__.predict_cls.cpython-37: module references __file__
paddleocr.tools.infer.__pycache__.predict_det.cpython-37: module references __file__
paddleocr.tools.infer.__pycache__.predict_rec.cpython-37: module references __file__
paddleocr.tools.infer.__pycache__.predict_system.cpython-37: module references __file__
creating dist
creating 'dist/paddleocr-1.1.2-py3.7.egg' and adding 'build/bdist.linux-x86_64/egg' to it
removing 'build/bdist.linux-x86_64/egg' (and everything under it)
Processing paddleocr-1.1.2-py3.7.egg
creating /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddleocr-1.1.2-py3.7.egg
Extracting paddleocr-1.1.2-py3.7.egg to /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages
Adding paddleocr 1.1.2 to easy-install.pth file
Installing paddleocr script to /opt/conda/envs/python35-paddle120-env/bin

Installed /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddleocr-1.1.2-py3.7.egg
Processing dependencies for paddleocr==1.1.2
Searching for tqdm==4.36.1
Best match: tqdm 4.36.1
Adding tqdm 4.36.1 to easy-install.pth file
Installing tqdm script to /opt/conda/envs/python35-paddle120-env/bin

Using /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages
Searching for opencv-python==4.2.0.32
Best match: opencv-python 4.2.0.32
Adding opencv-python 4.2.0.32 to easy-install.pth file

Using /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages
Searching for numpy==1.21.6
Best match: numpy 1.21.6
Adding numpy 1.21.6 to easy-install.pth file
Installing f2py script to /opt/conda/envs/python35-paddle120-env/bin
Installing f2py3 script to /opt/conda/envs/python35-paddle120-env/bin
Installing f2py3.7 script to /opt/conda/envs/python35-paddle120-env/bin

Using /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages
Searching for lmdb==1.3.0
Best match: lmdb 1.3.0
Adding lmdb 1.3.0 to easy-install.pth file

Using /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages
Searching for pyclipper==1.3.0.post2
Best match: pyclipper 1.3.0.post2
Adding pyclipper 1.3.0.post2 to easy-install.pth file

Using /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages
Searching for imgaug==0.4.0
Best match: imgaug 0.4.0
Adding imgaug 0.4.0 to easy-install.pth file

Using /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages
Searching for Shapely==1.8.1.post1
Best match: Shapely 1.8.1.post1
Adding Shapely 1.8.1.post1 to easy-install.pth file

Using /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages
Searching for Pillow==7.1.2
Best match: Pillow 7.1.2
Adding Pillow 7.1.2 to easy-install.pth file

Using /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages
Searching for scipy==1.7.3
Best match: scipy 1.7.3
Adding scipy 1.7.3 to easy-install.pth file

Using /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages
Searching for imageio==2.6.1
Best match: imageio 2.6.1
Adding imageio 2.6.1 to easy-install.pth file
Installing imageio_download_bin script to /opt/conda/envs/python35-paddle120-env/bin
Installing imageio_remove_bin script to /opt/conda/envs/python35-paddle120-env/bin

Using /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages
Searching for scikit-image==0.19.2
Best match: scikit-image 0.19.2
Adding scikit-image 0.19.2 to easy-install.pth file
Installing skivi script to /opt/conda/envs/python35-paddle120-env/bin

Using /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages
Searching for matplotlib==2.2.3
Best match: matplotlib 2.2.3
Adding matplotlib 2.2.3 to easy-install.pth file

Using /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages
Searching for six==1.16.0
Best match: six 1.16.0
Adding six 1.16.0 to easy-install.pth file

Using /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages
Searching for PyWavelets==1.3.0
Best match: PyWavelets 1.3.0
Adding PyWavelets 1.3.0 to easy-install.pth file

Using /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages
Searching for packaging==21.3
Best match: packaging 21.3
Adding packaging 21.3 to easy-install.pth file

Using /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages
Searching for tifffile==2021.11.2
Best match: tifffile 2021.11.2
Adding tifffile 2021.11.2 to easy-install.pth file
Installing lsm2bin script to /opt/conda/envs/python35-paddle120-env/bin
Installing tiff2fsspec script to /opt/conda/envs/python35-paddle120-env/bin
Installing tiffcomment script to /opt/conda/envs/python35-paddle120-env/bin
Installing tifffile script to /opt/conda/envs/python35-paddle120-env/bin

Using /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages
Searching for networkx==2.4
Best match: networkx 2.4
Adding networkx 2.4 to easy-install.pth file

Using /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages
Searching for cycler==0.10.0
Best match: cycler 0.10.0
Adding cycler 0.10.0 to easy-install.pth file

Using /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages
Searching for pyparsing==3.0.7
Best match: pyparsing 3.0.7
Adding pyparsing 3.0.7 to easy-install.pth file

Using /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages
Searching for pytz==2022.1
Best match: pytz 2022.1
Adding pytz 2022.1 to easy-install.pth file

Using /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages
Searching for python-dateutil==2.8.2
Best match: python-dateutil 2.8.2
Adding python-dateutil 2.8.2 to easy-install.pth file

Using /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages
Searching for kiwisolver==1.1.0
Best match: kiwisolver 1.1.0
Adding kiwisolver 1.1.0 to easy-install.pth file

Using /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages
Searching for decorator==4.4.0
Best match: decorator 4.4.0
Adding decorator 4.4.0 to easy-install.pth file

Using /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages
Searching for setuptools==41.4.0
Best match: setuptools 41.4.0
Adding setuptools 41.4.0 to easy-install.pth file
Installing easy_install script to /opt/conda/envs/python35-paddle120-env/bin

Using /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages
Finished processing dependencies for paddleocr==1.1.2

2. Data Processing

2.1 Extract the Datasets

In [3]

# Extract the training-set archive and the test-set archive in their respective dataset folders
!cd ~/data/data62842/ && unzip train_images.zip
!cd ~/data/data62843/ && unzip test_images.zip
  inflating: test_images/3373.jpg    
  inflating: test_images/3415.jpg    
  inflating: test_images/5064.jpg    
  inflating: test_images/1202.jpg    
  inflating: test_images/796.jpg     
  inflating: test_images/8540.jpg    
  inflating: test_images/7673.jpg    
  inflating: test_images/9886.jpg    
  inflating: test_images/1955.jpg    
  inflating: test_images/219.jpg     
  inflating: test_images/2284.jpg    
  inflating: test_images/4193.jpg    
  inflating: test_images/3824.jpg    
  inflating: test_images/6784.jpg    
  inflating: test_images/8971.jpg    
  inflating: test_images/6790.jpg    
  inflating: test_images/8965.jpg    
  inflating: test_images/6948.jpg    
  inflating: test_images/4187.jpg    
  inflating: test_images/5299.jpg    
  inflating: test_images/3830.jpg    
  inflating: test_images/4839.jpg    
  inflating: test_images/2290.jpg    
  inflating: test_images/1941.jpg    
  inflating: test_images/1799.jpg    
  inflating: test_images/4811.jpg    
  inflating: test_images/1969.jpg    
  inflating: test_images/225.jpg     
  inflating: test_images/543.jpg     
  inflating: test_images/6960.jpg    
  inflating: test_images/8795.jpg    
  inflating: test_images/3818.jpg    
  inflating: test_images/8959.jpg    
  inflating: test_images/557.jpg     
  inflating: test_images/6974.jpg    
  inflating: test_images/8781.jpg    
  inflating: test_images/231.jpg     
  inflating: test_images/4805.jpg    
  inflating: test_images/3159.jpg    
  inflating: test_images/4636.jpg    
  inflating: test_images/5528.jpg    
  inflating: test_images/2247.jpg    
  inflating: test_images/9312.jpg    
  inflating: test_images/6021.jpg    
  inflating: test_images/1996.jpg    
  inflating: test_images/7459.jpg    
  inflating: test_images/9474.jpg    
  inflating: test_images/6747.jpg    
  inflating: test_images/1028.jpg    
  inflating: test_images/2521.jpg    
  inflating: test_images/4150.jpg    
  inflating: test_images/2535.jpg    
  inflating: test_images/4144.jpg    
  inflating: test_images/9460.jpg    
  inflating: test_images/6753.jpg    
  inflating: test_images/9306.jpg    
  inflating: test_images/6035.jpg    
  inflating: test_images/1982.jpg    
  inflating: test_images/8018.jpg    
  inflating: test_images/4622.jpg    
  inflating: test_images/2253.jpg    
  inflating: test_images/1772.jpg    
  inflating: test_images/7303.jpg    
  inflating: test_images/8030.jpg    
  inflating: test_images/3165.jpg    
  inflating: test_images/5514.jpg    
  inflating: test_images/5272.jpg    
  inflating: test_images/3603.jpg    
  inflating: test_images/7465.jpg    
  inflating: test_images/8756.jpg    
  inflating: test_images/9448.jpg    
  inflating: test_images/1014.jpg    
  inflating: test_images/580.jpg     
  inflating: test_images/7471.jpg    
  inflating: test_images/8742.jpg    
  inflating: test_images/594.jpg     
  inflating: test_images/1000.jpg    
  inflating: test_images/2509.jpg    
  inflating: test_images/5266.jpg    
  inflating: test_images/4178.jpg    
  inflating: test_images/3617.jpg    
  inflating: test_images/3171.jpg    
  inflating: test_images/5500.jpg    
  inflating: test_images/1766.jpg    
  inflating: test_images/6009.jpg    
  inflating: test_images/7317.jpg    
  inflating: test_images/8024.jpg    
  inflating: test_images/6196.jpg    
  inflating: test_images/1821.jpg    
  inflating: test_images/7288.jpg    
  inflating: test_images/4781.jpg    
  inflating: test_images/4959.jpg    
  inflating: test_images/3950.jpg    
  inflating: test_images/2496.jpg    
  inflating: test_images/3788.jpg    
  inflating: test_images/6828.jpg    
  inflating: test_images/8805.jpg    
  inflating: test_images/8811.jpg    
  inflating: test_images/3944.jpg    
  inflating: test_images/2482.jpg    
  inflating: test_images/4795.jpg    
  inflating: test_images/6182.jpg    
  inflating: test_images/379.jpg     
  inflating: test_images/1835.jpg    
  inflating: test_images/4965.jpg    
  inflating: test_images/9299.jpg    
  inflating: test_images/351.jpg     
  inflating: test_images/8187.jpg    
  inflating: test_images/6814.jpg    
  inflating: test_images/8839.jpg    
  inflating: test_images/437.jpg     
  inflating: test_images/3978.jpg    
  inflating: test_images/6800.jpg    
  inflating: test_images/423.jpg     
  inflating: test_images/345.jpg     
  inflating: test_images/1809.jpg    
  inflating: test_images/8193.jpg    
  inflating: test_images/4971.jpg    
  inflating: test_images/2333.jpg    
  inflating: test_images/4742.jpg    
  inflating: test_images/8178.jpg    
  inflating: test_images/9266.jpg    
  inflating: test_images/6155.jpg    
  inflating: test_images/9500.jpg    
  inflating: test_images/6633.jpg    
  inflating: test_images/4024.jpg    
  inflating: test_images/3993.jpg    
  inflating: test_images/2455.jpg    
  inflating: test_images/4030.jpg    
  inflating: test_images/3987.jpg    
  inflating: test_images/2441.jpg    
  inflating: test_images/1148.jpg    
  inflating: test_images/9514.jpg    
  inflating: test_images/6627.jpg    
  inflating: test_images/7539.jpg    
  inflating: test_images/9272.jpg    
  inflating: test_images/6141.jpg    
  inflating: test_images/2327.jpg    
  inflating: test_images/5448.jpg    
  inflating: test_images/4756.jpg    
  inflating: test_images/3039.jpg    
  inflating: test_images/7277.jpg    
  inflating: test_images/8144.jpg    
  inflating: test_images/6169.jpg    
  inflating: test_images/1606.jpg    
  inflating: test_images/392.jpg     
  inflating: test_images/5460.jpg    
  inflating: test_images/3011.jpg    
  inflating: test_images/3777.jpg    
  inflating: test_images/4018.jpg    
  inflating: test_images/5306.jpg    
  inflating: test_images/2469.jpg    
  inflating: test_images/1160.jpg    
  inflating: test_images/7511.jpg    
  inflating: test_images/8622.jpg    
  inflating: test_images/1174.jpg    
  inflating: test_images/9528.jpg    
  inflating: test_images/7505.jpg    
  inflating: test_images/8636.jpg    
  inflating: test_images/3763.jpg    
  inflating: test_images/5312.jpg    
  inflating: test_images/5474.jpg    
  inflating: test_images/3005.jpg    
  inflating: test_images/7263.jpg    
  inflating: test_images/8150.jpg    
  inflating: test_images/386.jpg     
  inflating: test_images/1612.jpg    
  inflating: test_images/91.jpg      
  inflating: test_images/7934.jpg    
  inflating: test_images/609.jpg     
  inflating: test_images/9919.jpg    
  inflating: test_images/2694.jpg    
  inflating: test_images/5845.jpg    
  inflating: test_images/4583.jpg    
  inflating: test_images/6394.jpg    
  inflating: test_images/6380.jpg    
  inflating: test_images/5689.jpg    
  inflating: test_images/5851.jpg    
  inflating: test_images/4597.jpg    
  inflating: test_images/2858.jpg    
  inflating: test_images/2680.jpg    
  inflating: test_images/85.jpg      
  inflating: test_images/7920.jpg    
  inflating: test_images/1389.jpg    
  inflating: test_images/2870.jpg    
  inflating: test_images/635.jpg     
  inflating: test_images/7908.jpg    
  inflating: test_images/9925.jpg    
  inflating: test_images/8385.jpg    
  inflating: test_images/153.jpg     
  inflating: test_images/5879.jpg    
  inflating: test_images/8391.jpg    
  inflating: test_images/147.jpg     
  inflating: test_images/621.jpg     
  inflating: test_images/9931.jpg    
  inflating: test_images/2864.jpg    
  inflating: test_images/2657.jpg    
  inflating: test_images/5138.jpg    
  inflating: test_images/4226.jpg    
  inflating: test_images/3549.jpg    
  inflating: test_images/812.jpg     
  inflating: test_images/6431.jpg    
  inflating: test_images/52.jpg      
  inflating: test_images/9702.jpg    
  inflating: test_images/1438.jpg    
  inflating: test_images/6357.jpg    
  inflating: test_images/9064.jpg    
  inflating: test_images/7049.jpg    
  inflating: test_images/5886.jpg    
  inflating: test_images/0.jpg       
  inflating: __MACOSX/test_images/._0.jpg  
  inflating: test_images/4540.jpg    
  inflating: test_images/2131.jpg    
  inflating: test_images/5892.jpg    
  inflating: test_images/4554.jpg    
  inflating: test_images/2125.jpg    
  inflating: test_images/6343.jpg    
  inflating: test_images/9070.jpg    
  inflating: test_images/806.jpg     
  inflating: test_images/8408.jpg    
  inflating: test_images/46.jpg      
  inflating: test_images/6425.jpg    
  inflating: test_images/9716.jpg    
  inflating: test_images/2643.jpg    
  inflating: test_images/4232.jpg    
  inflating: test_images/8420.jpg    
  inflating: test_images/7713.jpg    
  inflating: test_images/1362.jpg    
  inflating: test_images/5104.jpg    
  inflating: test_images/3575.jpg    
  inflating: test_images/3213.jpg    
  inflating: test_images/5662.jpg    
  inflating: test_images/190.jpg     
  inflating: test_images/1404.jpg    
  inflating: test_images/9058.jpg    
  inflating: test_images/8346.jpg    
  inflating: test_images/7075.jpg    
  inflating: test_images/1410.jpg    
  inflating: test_images/184.jpg     
  inflating: test_images/8352.jpg    
  inflating: test_images/7061.jpg    
  inflating: test_images/3207.jpg    
  inflating: test_images/4568.jpg    
  inflating: test_images/5676.jpg    
  inflating: test_images/2119.jpg    
  inflating: test_images/5110.jpg    
  inflating: test_images/3561.jpg    
  inflating: test_images/8434.jpg    
  inflating: test_images/7707.jpg    
  inflating: test_images/6419.jpg    
  inflating: test_images/1376.jpg    

In [4]

!cd ~/data/data62842/ && mv train_images ../ && mv train_label.csv ../
!cd ~/data/data62843/ && mv test_images ../ 

2.2 Data Augmentation

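  • In this competition, data augmentation is used to reduce overfitting. It is especially useful when the dataset is small.

  • I used text_renderer to synthesize additional training images. The main effects are brightness changes, text border adjustment, added noise, color adjustment and font effect transformations.

  • After installing text_renderer, edit text_renderer/configs/default.yaml by hand as shown below.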

# Small font_size will make text look blurred/prydown
font_size:
  min: 14
  max: 23

# choose Text color range
# color boundary is in R,G,B format
font_color:
  enable: true
  blue:
    fraction: 0.5
    l_boundary: [0,0,150]
    h_boundary: [60,60,255]
  brown:
    fraction: 0.5
    l_boundary: [139,70,19]
    h_boundary: [160,82,43]

# By default, text is drawn by Pillow (see https://stackoverflow.com/questions/43828955/measuring-width-of-text-python-pil)
# If `random_space` is enabled, some text will be drawn char by char with a random space
random_space:
  enable: false
  fraction: 0.3
  min: -0.1 # -0.1 will make chars very close or even overlapped
  max: 0.1

# Do remap with sin()
# Currently this process is very slow!
curve:
  enable: false
  fraction: 0.3
  period: 360  # degrees, period of the sin function
  min: 1 # amplitude range of the sin function
  max: 5

# random crop text height
crop:
  enable: false
  fraction: 0.5

  # top and bottom will be applied equally
  top:
    min: 5
    max: 10 # in pixels; this value should be smaller than img_height
  bottom:
    min: 5
    max: 10 # in pixels; this value should be smaller than img_height

# Use image in bg_dir as background for text
img_bg:
  enable: false
  fraction: 0.5

# Does not work when random_space is applied
text_border:
  enable: true
  fraction: 0.3

  # lighter than word color
  light:
    enable: true
    fraction: 0.5

  # darker than word color
  dark:
    enable: true
    fraction: 0.5

# https://docs.opencv.org/3.4/df/da0/group__photo__clone.html#ga2bf426e4c93a6b1f21705513dfeca49d
# https://www.cs.virginia.edu/~connelly/class/2014/comp_photo/proj2/poisson.pdf
# Use opencv seamlessClone() to draw text on background
# For some background image, this will make text image looks more real
seamless_clone:
  enable: true
  fraction: 0.5

perspective_transform:
  max_x: 25
  max_y: 25
  max_z: 3

blur:
  enable: true
  fraction: 0.03

# If blur is applied to an image, prydown will not be applied to it
prydown:
  enable: true
  fraction: 0.03
  max_scale: 1.5 # Image is first resized to 1.5x, then resized back to 1x

noise:
  enable: true
  fraction: 0.3

  gauss:
    enable: true
    fraction: 0.25

  uniform:
    enable: true
    fraction: 0.25

  salt_pepper:
    enable: true
    fraction: 0.25

  poisson:
    enable: true
    fraction: 0.25

line:
  enable: false
  fraction: 0.05

  under_line:
    enable: false
    fraction: 0.2

  table_line:
    enable: false
    fraction: 0.3

  middle_line:
    enable: false
    fraction: 0.5

line_color:
  enable: false
  black:
    fraction: 0.5
    l_boundary: [0,0,0]
    h_boundary: [64,64,64]
  blue:
    fraction: 0.5
    l_boundary: [0,0,150]
    h_boundary: [60,60,255]

# These operations are applied on the final output image,
# so they can also be applied during training as a data augmentation method.

# By default, text is darker than background.
# If `reverse_color` is enabled, some images will have dark background and light text
reverse_color:
  enable: true
  fraction: 0.3

emboss:
  enable: true
  fraction: 0.3

sharp:
  enable: true
  fraction: 0.3

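  • PaddleOCR FAQ 1.1.8 states that the PaddleOCR recognition models were trained on about 5.2 million images (about 260K real and 5 million synthetic). This shows how important data augmentation is.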

In [5]

!cd ~/work && git clone https://github.com/Sanster/text_renderer
!cd ~/work/text_renderer && pip install -r requirements.txt
Cloning into 'text_renderer'...
remote: Enumerating objects: 707, done.
remote: Counting objects: 100% (19/19), done.
remote: Compressing objects: 100% (17/17), done.
remote: Total 707 (delta 4), reused 7 (delta 2), pack-reused 688
Receiving objects: 100% (707/707), 12.92 MiB | 29.00 KiB/s, done.
Resolving deltas: 100% (387/387), done.
Checking connectivity... done.
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
Requirement already satisfied: Cython in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from -r requirements.txt (line 1)) (0.29)
Requirement already satisfied: opencv-python in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from -r requirements.txt (line 2)) (4.2.0.32)
Requirement already satisfied: pillow in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from -r requirements.txt (line 3)) (7.1.2)
Requirement already satisfied: numpy in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from -r requirements.txt (line 4)) (1.21.6)
Requirement already satisfied: matplotlib in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from -r requirements.txt (line 5)) (2.2.3)
Collecting fontTools
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/bc/83/43991c6f0dfb395cc9ccf5c19fd51fc6068cb3919cee4b78eddd4b16efd1/fonttools-4.32.0-py3-none-any.whl (900 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 900.8/900.8 KB 20.2 MB/s eta 0:00:0000:01
Collecting tenacity
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/f2/a5/f86bc8d67c979020438c8559cc70cfe3a1643fd160d35e09c9cca6a09189/tenacity-8.0.1-py3-none-any.whl (24 kB)
Requirement already satisfied: easyDict in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from -r requirements.txt (line 8)) (1.9)
Collecting pyyaml==5.1
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/9f/2c/9417b5c774792634834e730932745bc09a7d36754ca00acf1ccd1ac2594d/PyYAML-5.1.tar.gz (274 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 274.2/274.2 KB 24.9 MB/s eta 0:00:00
  Preparing metadata (setup.py) ... done
Requirement already satisfied: kiwisolver>=1.0.1 in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from matplotlib->-r requirements.txt (line 5)) (1.1.0)
Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.1 in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from matplotlib->-r requirements.txt (line 5)) (3.0.7)
Requirement already satisfied: pytz in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from matplotlib->-r requirements.txt (line 5)) (2022.1)
Requirement already satisfied: six>=1.10 in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from matplotlib->-r requirements.txt (line 5)) (1.16.0)
Requirement already satisfied: python-dateutil>=2.1 in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from matplotlib->-r requirements.txt (line 5)) (2.8.2)
Requirement already satisfied: cycler>=0.10 in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from matplotlib->-r requirements.txt (line 5)) (0.10.0)
Requirement already satisfied: setuptools in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from kiwisolver>=1.0.1->matplotlib->-r requirements.txt (line 5)) (41.4.0)
Building wheels for collected packages: pyyaml
  Building wheel for pyyaml (setup.py) ... done
  Created wheel for pyyaml: filename=PyYAML-5.1-cp37-cp37m-linux_x86_64.whl size=44074 sha256=daf0a0735387c6b458dfe6ea44d8171e841f474a5b6c03b02e6e466946b4b4e3
  Stored in directory: /home/aistudio/.cache/pip/wheels/94/62/26/bddefc8ed4a42614da5668d92d375be4e5c7818a4847ff2a21
Successfully built pyyaml
Installing collected packages: tenacity, pyyaml, fontTools
  Attempting uninstall: pyyaml
    Found existing installation: PyYAML 5.1.2
    Uninstalling PyYAML-5.1.2:
      Successfully uninstalled PyYAML-5.1.2
Successfully installed fontTools-4.32.0 pyyaml-5.1 tenacity-8.0.1
  • Statistics on the training image sizes show that the height is always 48, while the width depends on the number of characters in the image.

In [ ]

import glob
import os
from collections import Counter

import cv2


def get_aspect_ratio(img_set_dir):
    """Mean aspect ratio, mean width/height, and width/height histograms
    (most frequent size first) of all .jpg images in img_set_dir."""
    width_counter = Counter()
    height_counter = Counter()
    total_width = 0
    total_height = 0
    for image in glob.glob(os.path.join(img_set_dir, '*.jpg')):
        img = cv2.imread(image)
        if img is None:  # skip files OpenCV cannot decode
            continue
        height, width = img.shape[:2]
        width_counter[width] += 1
        height_counter[height] += 1
        total_width += width
        total_height += height
    num_images = sum(width_counter.values())
    m_width = total_width / num_images
    m_height = total_height / num_images
    aspect_ratio = m_width / m_height
    width_dict = dict(width_counter.most_common())
    height_dict = dict(height_counter.most_common())
    return aspect_ratio, m_width, m_height, width_dict, height_dict


aspect_ratio, m_width, m_height, width_dict, height_dict = get_aspect_ratio("/home/aistudio/data/train_images/")
print("aspect ratio is: {}, mean width is: {}, mean height is: {}".format(aspect_ratio, m_width, m_height))
print("Width dict:{}".format(width_dict))
print("Height dict:{}".format(height_dict))
import pandas as pd

def Q2B(s):
    """Convert one full-width character to half-width."""
    inside_code=ord(s)
    if inside_code==0x3000:  # full-width (ideographic) space
        inside_code=0x0020
    else:
        inside_code-=0xfee0
    if inside_code<0x0020 or inside_code>0x7e: # not a half-width character after conversion: return the original
        return s
    return chr(inside_code)

def stringQ2B(s):
    """Convert every full-width character in a string to half-width."""
    return "".join([Q2B(c) for c in s])

def is_chinese(s):
    """Check whether every character is a Chinese character (U+4E00 to U+9FA5)."""
    for c in s:
        if c < u'\u4e00' or c > u'\u9fa5':
            return False
    return True

def is_number(s):
    """Check whether every character is a digit 0-9."""
    for c in s:
        if c < u'\u0030' or c > u'\u0039':
            return False
    return True

def is_alphabet(s):
    """Check whether every character is a lowercase English letter a-z."""
    for c in s:
        if c < u'\u0061' or c > u'\u007a':
            return False
    return True

def del_other(s):
    """Remove every character that is not a Chinese character, digit or lowercase English letter."""
    res = str()
    for c in s:
        if not (is_chinese(c) or is_number(c) or is_alphabet(c)):
            c = ""
        res += c
    return res


df = pd.read_csv("/home/aistudio/data/train_label.csv", encoding="gbk")
name, value = list(df.name), list(df.value)
for i, label in enumerate(value):
    # full-width to half-width
    label = stringQ2B(label)
    # uppercase to lowercase
    label = "".join([c.lower() for c in label])
    # keep only Chinese characters, digits and lowercase letters (removes spaces and punctuation)
    label = del_other(label)
    value[i] = label

# drop samples whose label is now empty
data = zip(name, value)
data = list(filter(lambda c: c[1]!="", list(data)))
# save to the data directory as "<file name>\t<label>" lines
with open("/home/aistudio/data/train_label.txt", "w", encoding="utf-8") as f:
    for line in data:
        f.write(line[0] + "\t" + line[1] + "\n")

# find the longest label in the training set
label_max_len = 0
with open("/home/aistudio/data/train_label.txt", "r", encoding="utf-8") as f:
    for line in f:
        name, label = line.strip().split("\t")
        if len(label) > label_max_len:
            label_max_len = len(label)

print("label max len: ", label_max_len)
def create_label_list(train_list):
    classSet = set()
    with open(train_list, encoding="utf-8") as f:
        # train_label.txt has no header line, so every line is a sample
        for line in f:
            img_name, label = line.strip().split("\t")
            for e in label:
                classSet.add(e)
    # sort the characters for a stable index order (no CTC blank is added here)
    classList = sorted(list(classSet))
    with open("/home/aistudio/data/label_list.txt", "w", encoding="utf-8") as f:
        for idx, c in enumerate(classList):
            f.write("{}\t{}\n".format(c, idx))

    # write the character set as the corpus for text_renderer
    with open("/home/aistudio/work/text_renderer/data/chars/ch.txt", "w", encoding="utf-8") as f:
        for c in classList:
            f.write("{}\n".format(c))

    return classSet

classSet = create_label_list("/home/aistudio/data/train_label.txt")
print("classify num: ", len(classSet))
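The label cleaning above (full-width to half-width, lowercase, then keep only Chinese characters, digits and lowercase letters) can also be sketched more compactly with the standard library. This is an alternative we suggest, not the author's code, and `normalize_label` is a hypothetical helper name. `unicodedata.normalize('NFKC', ...)` maps full-width ASCII and the ideographic space U+3000 to their half-width forms just like `Q2B`, but it also folds other compatibility characters (for example circled digits), so a few labels may come out differently.

```python
import re
import unicodedata

# Same character classes that del_other() keeps:
# CJK ideographs U+4E00..U+9FA5, ASCII digits and lowercase ASCII letters.
_DISALLOWED = re.compile(r'[^\u4e00-\u9fa50-9a-z]')


def normalize_label(label):
    """NFKC-fold full-width forms, lowercase, then drop every other character."""
    label = unicodedata.normalize('NFKC', label).lower()
    return _DISALLOWED.sub('', label)


if __name__ == '__main__':
    print(normalize_label('ＡＢｃ　１２3 店铺!'))  # abc123店铺
```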
  • Generate 2,000 images for each text length of 1, 2, 3, 4 and 5 characters, 10,000 images in total.

In [ ]

# Clear any previously generated images
!cd ~/work/text_renderer/output/default && rm ./* 

In [ ]

!cd ~/work/text_renderer && python main.py --length 1 --img_width 32 --img_height 48 --chars_file "./data/chars/ch.txt" --corpus_mode 'random' --num_img 2000
!cd ~/work/text_renderer && python main.py --length 2 --img_width 64 --img_height 48 --chars_file "./data/chars/ch.txt" --corpus_mode 'random' --num_img 2000
!cd ~/work/text_renderer && python main.py --length 3 --img_width 96 --img_height 48 --chars_file "./data/chars/ch.txt" --corpus_mode 'random' --num_img 2000
!cd ~/work/text_renderer && python main.py --length 4 --img_width 128 --img_height 48 --chars_file "./data/chars/ch.txt" --corpus_mode 'random' --num_img 2000
!cd ~/work/text_renderer && python main.py --length 5 --img_width 160 --img_height 48 --chars_file "./data/chars/ch.txt" --corpus_mode 'random' --num_img 2000
  • Merge the generated images into the original training set

In [9]

!cp ~/work/text_renderer/output/default/*.jpg ~/data/train_images

In [10]

# Append the text_renderer labels ("<index> <text>" per line) to train_label.txt
# in the "<file name>\t<text>" format used above.
with open('work/text_renderer/output/default/tmp_labels.txt', 'r', encoding='utf-8') as src_label:
    with open('data/train_label.txt', 'a', encoding='utf-8') as dst_label:
        for line in src_label:
            line = line.rstrip('\n')
            if not line:
                continue
            img, text = line.split(' ', 1)
            dst_label.write('{}.jpg\t{}\n'.format(img, text))
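After the merge, a quick consistency check can catch malformed lines before training starts. This is a minimal sketch of our own, not part of the original project, and `find_bad_label_lines` is a hypothetical helper. It assumes each line must look like `<file name>\t<text>` with non-empty text, and that the file name must exist in the training image directory.

```python
def find_bad_label_lines(label_lines, image_names):
    """Return 1-based line numbers whose format or image reference looks wrong.

    label_lines: lines of train_label.txt, e.g. "00000001.jpg\t店铺\n"
    image_names: set of file names present in the training image directory
    """
    bad = []
    for lineno, line in enumerate(label_lines, start=1):
        name, sep, text = line.rstrip('\n').partition('\t')
        # missing tab, empty label, or label pointing at a non-existent image
        if not sep or not text or name not in image_names:
            bad.append(lineno)
    return bad
```

In the notebook it could be called as `find_bad_label_lines(open('data/train_label.txt', encoding='utf-8').readlines(), set(os.listdir('data/train_images')))`; an empty list means every line passed.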

3. Model Tuning

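  • You can start from the CRNN pretrained model provided by PaddleOCR, or from another model.
  • Based on the training image size statistics above, the model input size is set to height 48 and width 256.
  • A cosine_decay schedule with warmup is used to speed up convergence.
  • CRNN was proposed in the 2015 paper "An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition" for recognizing variable-length text sequences.

  • The figure below shows the CRNN architecture, which has three parts: CNN layers, RNN layers and a CTC transcription layer. The CNN layers extract features from the input image. The original paper uses a VGG network, while PaddleOCR provides ResNet34 and MobileNetV3 backbones. Larger models usually perform better, so this project uses ResNet34 as the baseline. One direction for improvement is the CNN feature extractor, for example trying ResNet50 or deeper networks.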
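The CTC transcription step can be illustrated with greedy (best-path) decoding: take the most likely class at each time step, merge consecutive repeats, then drop the blank symbol. Below is a minimal pure-Python sketch with a toy probability matrix. The blank is placed as the extra class after the last character. We believe this matches how PaddleOCR 1.x's CTC head configures `ctc_greedy_decoder`, but treat that as an assumption.

```python
def ctc_greedy_decode(probs, charset, blank):
    """Best-path CTC decoding.

    probs: list of per-time-step score lists, one score per class
    charset: list mapping class index -> character (blank excluded)
    blank: index of the CTC blank class
    """
    # 1) most likely class at every time step (argmax)
    best_path = [max(range(len(step)), key=step.__getitem__) for step in probs]
    # 2) merge consecutive duplicates, 3) remove blanks
    chars = []
    prev = None
    for idx in best_path:
        if idx != prev and idx != blank:
            chars.append(charset[idx])
        prev = idx
    return ''.join(chars)


if __name__ == '__main__':
    charset = ['店', '铺']
    blank = len(charset)  # toy example: blank is the extra last class
    probs = [
        [0.9, 0.05, 0.05],  # 店
        [0.8, 0.1, 0.1],    # 店 again (consecutive repeat, merged)
        [0.1, 0.1, 0.8],    # blank
        [0.1, 0.8, 0.1],    # 铺
        [0.1, 0.1, 0.8],    # blank
        [0.1, 0.8, 0.1],    # 铺 again (separated by a blank, kept)
    ]
    print(ctc_greedy_decode(probs, charset, blank))  # 店铺铺
```

The blank is what lets CTC emit a genuinely doubled character: repeats are merged only when they are adjacent in the best path.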

In [ ]

!cd ~/work/PaddleOCR && mkdir -p pretrain_weights && cd pretrain_weights && wget https://paddleocr.bj.bcebos.com/20-09-22/server/rec/ch_ppocr_server_v1.1_rec_pre.tar

In [13]

!cd ~/work/PaddleOCR/pretrain_weights && tar -xf ch_ppocr_server_v1.1_rec_pre.tar
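  • In PaddleOCR/configs/rec, add two config files: my_rec_ch_train.yml and my_rec_ch_reader.yml.

  • Tuning used for the final results: 161 epochs (epoch 0 to epoch 160), an initial learning rate of 0.0001, fc_decay of 0.00001, and L2 weight decay (l2_decay) of 0.00001.

  • To speed up training, cosine_decay with warmup is used, with step_each_epoch = 1000, warmup_minibatch = 2000 and a total decay length of 161 epochs.

  • In my tests, these settings gave good results.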
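The learning-rate schedule described above can be sketched as follows. This sketch assumes that PaddleOCR 1.x's `cosine_decay_warmup` does a linear warmup from 0 to `base_lr` over `warmup_minibatch` iterations, followed by a half-cosine decay over `total_epoch * step_each_epoch` iterations. That is our reading, so check `ppocr/optimizer.py` in your checkout. The clamp at the end of the schedule and the helper `steps_per_epoch` are our own additions.

```python
import math


def warmup_cosine_lr(step, base_lr=1e-4, step_each_epoch=1000,
                     total_epoch=161, warmup_minibatch=2000):
    """Learning rate at a global iteration: linear warmup, then half-cosine decay.

    Defaults mirror the Optimizer section of my_rec_ch_train.yml.
    """
    if step < warmup_minibatch:
        # linear ramp from 0 to base_lr
        return base_lr * step / warmup_minibatch
    decay_steps = total_epoch * step_each_epoch
    # fraction of the cosine schedule completed (clamped to 1 in this sketch)
    progress = min((step - warmup_minibatch) / decay_steps, 1.0)
    return base_lr * (math.cos(math.pi * progress) + 1) / 2


def steps_per_epoch(num_samples, batch_size=256, num_cards=1):
    """Number of iterations one pass over num_samples actually takes."""
    return math.ceil(num_samples / (batch_size * num_cards))


if __name__ == '__main__':
    for step in (0, 1000, 2000, 82500, 163000):
        print(step, warmup_cosine_lr(step))
```

Under this formula, the cosine curve is laid out over `step_each_epoch * total_epoch` iterations, regardless of how many iterations an epoch really takes. Comparing `step_each_epoch` with `steps_per_epoch(number of lines in train_label.txt)` shows how far along the curve training gets. If the real count is smaller, the learning rate will still be above zero when training stops.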

#my_rec_ch_train.yml
Global:
  algorithm: CRNN
  use_gpu: true
  epoch_num: 161        # number of training epochs
  log_smooth_window: 20
  print_batch_step: 10   
  save_model_dir: ./output/my_rec_ch
  save_epoch_step: 20    # save a checkpoint every 20 epochs
  eval_batch_step: 1000
  train_batch_size_per_card: 256
  test_batch_size_per_card: 128
  image_shape: [3, 48, 256]
  max_text_length: 80
  character_type: ch
  character_dict_path: ./ppocr/utils/ppocr_keys_v1.txt
  loss_type: ctc
  distort: true
  use_space_char: true
  reader_yml: ./configs/rec/my_rec_ch_reader.yml
  pretrain_weights: ./pretrain_weights/ch_ppocr_server_v1.1_rec_pre/best_accuracy
  checkpoints:
  save_inference_dir: 
  infer_img:

Architecture:
  function: ppocr.modeling.architectures.rec_model,RecModel

Backbone:
  function: ppocr.modeling.backbones.rec_resnet_vd,ResNet
  layers: 34

Head:
  function: ppocr.modeling.heads.rec_ctc_head,CTCPredict
  encoder_type: rnn
  fc_decay: 0.00001
  SeqRNN:
    hidden_size: 256
    
Loss:
  function: ppocr.modeling.losses.rec_ctc_loss,CTCLoss

Optimizer:
  function: ppocr.optimizer,AdamDecay
  base_lr: 0.0001      # initial learning rate
  l2_decay: 0.00001    # L2 weight decay (regularization)
  beta1: 0.9
  beta2: 0.999
  decay:
    function: cosine_decay_warmup
    step_each_epoch: 1000
    total_epoch: 161
    warmup_minibatch: 2000

#my_rec_ch_reader.yml
TrainReader:
  reader_function: ppocr.data.rec.dataset_traversal,SimpleReader
  num_workers: 1
  img_set_dir: /home/aistudio/data/train_images
  label_file_path: /home/aistudio/data/train_label.txt
  
EvalReader:
  reader_function: ppocr.data.rec.dataset_traversal,SimpleReader
  img_set_dir: /home/aistudio/data/train_images
  label_file_path: /home/aistudio/data/train_label.txt

TestReader:
  reader_function: ppocr.data.rec.dataset_traversal,SimpleReader
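`character_dict_path` points to PaddleOCR's `ppocr_keys_v1.txt`, not to the character set collected from the training labels. As a result, a label character missing from that dictionary can never be predicted. Below is a small sketch that lists such characters. `missing_chars` is a hypothetical helper, and the sketch assumes the dictionary file has one character per line.

```python
def missing_chars(label_lines, dict_lines):
    """Characters that appear in '<name>\t<text>' label lines but not in the dictionary."""
    dict_chars = {line.rstrip('\n') for line in dict_lines}
    label_chars = set()
    for line in label_lines:
        _, _, text = line.rstrip('\n').partition('\t')
        label_chars.update(text)
    return label_chars - dict_chars
```

In the notebook, pass `open('/home/aistudio/data/train_label.txt', encoding='utf-8')` and `open('/home/aistudio/work/PaddleOCR/ppocr/utils/ppocr_keys_v1.txt', encoding='utf-8')` as the two arguments. An empty set means the dictionary covers every training character.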

4. Training and Prediction

4.1 Train the Model

  • After adding the config files my_rec_ch_train.yml and my_rec_ch_reader.yml, run the following command to start training.

In [ ]

!cd ~/work/PaddleOCR && python tools/train.py -c configs/rec/my_rec_ch_train.yml

4.2 Export the Model (optional)

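  • After training, models and parameters are saved under PaddleOCR/output. Choose the checkpoint to export. The command below exports iter_epoch_160 and writes the inference model to PaddleOCR/inference/CRNN_R34. Both paths can be changed as needed.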

In [ ]

!cd ~/work/PaddleOCR && python tools/export_model.py -c configs/rec/my_rec_ch_train.yml -o Global.checkpoints=./output/my_rec_ch/iter_epoch_160 Global.save_inference_dir=./inference/CRNN_R34

4.3 Predict

Create a new Python file infer_rec_new.py under work/PaddleOCR/tools/ and copy the following code into it:

# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import numpy as np
import os
import sys
import glob
import re
__dir__ = os.path.dirname(os.path.abspath(__file__))
sys.path.append(__dir__)
sys.path.append(os.path.abspath(os.path.join(__dir__, '..')))


def set_paddle_flags(**kwargs):
    for key, value in kwargs.items():
        if os.environ.get(key, None) is None:
            os.environ[key] = str(value)


# NOTE(paddle-dev): All of these flags should be
# set before `import paddle`. Otherwise, it would
# not take any effect.
set_paddle_flags(
    FLAGS_eager_delete_tensor_gb=0,  # enable GC to save memory
)

import tools.program as program
from paddle import fluid
from ppocr.utils.utility import initial_logger
logger = initial_logger()
from ppocr.utils.utility import enable_static_mode
from ppocr.data.reader_main import reader_main
from ppocr.utils.save_load import init_model
from ppocr.utils.character import CharacterOps
from ppocr.utils.utility import create_module
from ppocr.utils.utility import get_image_file_list


def main():
    config = program.load_config(FLAGS.config)
    program.merge_config(FLAGS.opt)
    logger.info(config)
    char_ops = CharacterOps(config['Global'])
    config['Global']['char_ops'] = char_ops

    # check if set use_gpu=True in paddlepaddle cpu version
    use_gpu = config['Global']['use_gpu']
    #     check_gpu(use_gpu)

    place = fluid.CUDAPlace(0) if use_gpu else fluid.CPUPlace()
    exe = fluid.Executor(place)

    rec_model = create_module(config['Architecture']['function'])(params=config)

    startup_prog = fluid.Program()
    eval_prog = fluid.Program()
    with fluid.program_guard(eval_prog, startup_prog):
        with fluid.unique_name.guard():
            _, outputs = rec_model(mode="test")
            fetch_name_list = list(outputs.keys())
            fetch_varname_list = [outputs[v].name for v in fetch_name_list]
            print(fetch_varname_list)
    eval_prog = eval_prog.clone(for_test=True)
    exe.run(startup_prog)

    init_model(config, eval_prog, exe)

    # Assumes the test reader yields one preprocessed image per file, in the
    # same order as get_image_file_list() returns the files below.
    blobs = reader_main(config, 'test')()
    infer_img = config['Global']['infer_img']
    infer_list = get_image_file_list(infer_img)

    max_img_num = len(infer_list)
    if max_img_num == 0:
        logger.info("Can not find img in infer_img dir.")
    from tqdm import tqdm

    # Write "<file name>\t<prediction>" lines to test2.txt in the working directory
    with open('test2.txt', mode='w', encoding='utf8') as f:
        f.write('new_name\tvalue\n')
        for i in tqdm(range(max_img_num)):
            img = next(blobs)
            predict = exe.run(program=eval_prog,
                              feed={"image": img},
                              fetch_list=fetch_varname_list,
                              return_numpy=False)
            preds = np.array(predict[0])
            if preds.shape[1] == 1:
                # CTC head: a flat sequence of character indices
                preds = preds.reshape(-1)
                preds_text = char_ops.decode(preds)
            else:
                # attention-style head
                end_pos = np.where(preds[0, :] == 1)[0]
                if len(end_pos) <= 1:
                    preds_text = preds[0, 1:]
                else:
                    preds_text = preds[0, 1:end_pos[1]]
                preds_text = preds_text.reshape(-1)
                preds_text = char_ops.decode(preds_text)
            # basename works whether or not infer_img ends with a slash
            f.write('{}\t{}\n'.format(os.path.basename(infer_list[i]), preds_text))

    # save for inference model
    #target_var = []
    #for key, values in outputs.items():
    #    target_var.append(values)

    #fluid.io.save_inference_model(
    #    "./output/",
    #    feeded_var_names=['image'],
    #    target_vars=target_var,
    #    executor=exe,
    #    main_program=eval_prog,
    #    model_filename="model",
    #    params_filename="params")


if __name__ == '__main__':
    enable_static_mode()
    parser = program.ArgsParser()
    FLAGS = parser.parse_args()
    # Note: this hardcodes the config path and overrides any -c argument
    FLAGS.config = 'configs/rec/my_rec_ch_train.yml'
    main()
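
In the `preds.shape[1] == 1` branch above, `char_ops.decode` maps a sequence of class ids to characters. For a CRNN+CTC model those ids are normally the result of greedy CTC decoding, which in the static-graph CTC head appears to run inside the network (via `fluid.layers.ctc_greedy_decoder`). The sketch below shows what greedy CTC decoding does to per-frame argmax ids; the function name, the toy charset and the blank id (placed right after the regular characters) are all hypothetical.

```python
def ctc_greedy_decode(frame_ids, charset, blank):
    """Greedy CTC decoding: collapse repeated ids, then drop blanks.

    frame_ids: per-frame argmax class ids from the recognizer.
    charset:   list mapping class id -> character (toy values here).
    blank:     id of the CTC blank symbol (an assumption; models differ).
    """
    chars = []
    prev = None
    for idx in frame_ids:
        # Keep a symbol only if it differs from the previous frame and is not blank
        if idx != prev and idx != blank:
            chars.append(charset[idx])
        prev = idx
    return ''.join(chars)


# Toy charset with the blank at id 3; a blank between two 1s keeps both
charset = ['中', '文', '字']
frames = [0, 0, 3, 1, 1, 3, 3, 1, 2]
print(ctc_greedy_decode(frames, charset, blank=3))  # 中文文字
```

The blank is what allows genuinely doubled characters to survive: without the blank between the two runs of id 1, they would collapse into a single "文".
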

  • The results (.txt) are saved in work/PaddleOCR/ as test2.txt.

Note: the rows in test2.txt are not in order. The competition requires the predictions to be sorted before submission.
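
Why the order is scrambled: `get_image_file_list` appears to build its list from the directory listing, so the rows follow file-system order (or, if the list is sorted, string order) rather than the numeric ids in the file names. A minimal illustration, assuming every test image is named `<integer>.jpg` (ZhuanHuan.py in 4.4 makes the same assumption); `numeric_key` is a hypothetical helper:

```python
import os


def numeric_key(path):
    """Sort key: the integer id in a file name such as '.../123.jpg'.

    Assumes the file stem is an integer, e.g. '0.jpg' ... '9999.jpg'.
    """
    stem, _ = os.path.splitext(os.path.basename(path))
    return int(stem)


names = ['10.jpg', '2.jpg', '1.jpg', '0.jpg']
print(sorted(names))                   # ['0.jpg', '1.jpg', '10.jpg', '2.jpg'] (string order)
print(sorted(names, key=numeric_key))  # ['0.jpg', '1.jpg', '2.jpg', '10.jpg'] (numeric order)
```
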

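  • For the final competition submission, the checkpoint used was best_accuracy under /home/aistudio/work/PaddleOCR/output/my_rec_ch/.
  • Run the command below to predict on the test-set images.
  • Global.checkpoints: the model checkpoint file
  • -c: the configuration file
  • Global.infer_img: path of the images to predict; either a single image file or a directory of images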

In [ ]

%cd ~/work/PaddleOCR
!python tools/infer_rec_new.py \
    -c configs/rec/my_rec_ch_train.yml \
    -o Global.checkpoints=./output/my_rec_ch/best_accuracy \
    Global.infer_img=/home/aistudio/data/test_images
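
Each `-o Section.key=value` pair overrides one entry of the nested YAML config, so the same config file can serve both training and prediction. The sketch below shows the idea on a plain dict. `apply_overrides` is a hypothetical helper, not PaddleOCR's own parser, and parsing values with `ast.literal_eval` (falling back to the raw string) is an assumption made for illustration.

```python
import ast


def apply_overrides(config, opts):
    """Apply 'Section.key=value' overrides to a nested config dict in place.

    Hypothetical helper for illustration only. Values that look like Python
    literals (numbers, True/False, lists) are parsed; anything else, such as
    a file path, is kept as a string.
    """
    for opt in opts:
        dotted_key, raw_value = opt.split('=', 1)
        try:
            value = ast.literal_eval(raw_value)
        except (ValueError, SyntaxError, TypeError):
            value = raw_value  # e.g. ./output/my_rec_ch/best_accuracy
        *parents, leaf = dotted_key.split('.')
        node = config
        for name in parents:
            node = node.setdefault(name, {})
        node[leaf] = value
    return config


config = {'Global': {'checkpoints': None}}
apply_overrides(config, [
    'Global.checkpoints=./output/my_rec_ch/best_accuracy',
    'Global.infer_img=/home/aistudio/data/test_images',
])
print(config['Global'])
```
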

4.4 Sorting the contents of the txt file

  • A small Python script, ZhuanHuan.py, sorts the contents of the txt file and writes the result to test1112.txt.
  • Create ZhuanHuan.py under work/PaddleOCR/ and copy the following code into it.
# ZhuanHuan.py: sort the predictions in test2.txt by numeric image id
# and write them to test1112.txt in the submission format.
# Assumes every test image is named <integer>.jpg (0.jpg ... 9999.jpg).

rows = []
with open('test2.txt', 'r', encoding='utf8') as f:
    f.readline()  # skip the "new_name\tvalue" header line
    for line in f:
        line = line.rstrip('\n')
        if not line:
            continue
        # Split on the first tab only, so predictions containing spaces stay intact
        parts = line.split('\t', 1)
        name = parts[0]
        text = parts[1] if len(parts) == 2 else ''  # empty prediction
        rows.append((int(name.replace('.jpg', '')), text))

# Sort by numeric id, so that 2.jpg comes before 10.jpg
rows.sort(key=lambda r: r[0])

with open('test1112.txt', mode='w', encoding='utf8') as f:
    f.write('new_name\tvalue\n')
    for img_id, text in rows:
        f.write('{}.jpg\t{}\n'.format(img_id, text))

print("finish")
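
Before uploading, a quick check that the sorted file has the expected shape can catch problems such as a wrong input file or missing rows. A minimal sketch: `check_submission` is a hypothetical helper, and the format it checks (a `new_name\tvalue` header followed by one `<id>.jpg\t<text>` row per image in ascending id order) is taken from the files written above.

```python
def check_submission(lines, expected_count=None):
    """Sanity-check the rows of a sorted submission file (hypothetical helper).

    `lines` is any iterable of text lines, e.g. an open file object.
    Raises ValueError on a bad header, a malformed row, ids that are not
    strictly ascending, or a wrong row count. Returns the number of rows.
    """
    rows = [line.rstrip('\n') for line in lines]
    if not rows or rows[0] != 'new_name\tvalue':
        raise ValueError('missing or unexpected header line')
    prev_id = -1
    for row in rows[1:]:
        name, sep, _text = row.partition('\t')
        if sep != '\t' or not name.endswith('.jpg'):
            raise ValueError('malformed row: %r' % row)
        img_id = int(name[:-len('.jpg')])  # also raises ValueError if not an integer
        if img_id <= prev_id:
            raise ValueError('ids not strictly ascending at row: %r' % row)
        prev_id = img_id
    count = len(rows) - 1
    if expected_count is not None and count != expected_count:
        raise ValueError('expected %d rows, got %d' % (expected_count, count))
    return count
```

For example, run from work/PaddleOCR/ with the 10,000-image test set: `check_submission(open('test1112.txt', encoding='utf8'), expected_count=10000)`.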

In [ ]

!python /home/aistudio/work/PaddleOCR/ZhuanHuan.py

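5. Summary and Outlook

  • The parameters in the data-augmentation configuration file could be tuned further.
  • The hyperparameters could be adjusted further.
  • Any single network has limited capacity, so improving the network model itself is another option.

6. Suggestions for Other Participants

  • Those with some experience can use competitions to sharpen their skills. Consulting the official Paddle API documentation and tutorials while learning helps solve problems quickly.

  • It also pays to study projects others have shared, to pick up ideas and tuning experience.

  • Beginners can sign up for PaddlePaddle (飞桨) training camps and related courses to build a solid foundation.

  • In short, combining plenty of study with plenty of practice is the way to improve.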
