Baidu Netdisk AI Competition — Table Detection Advanced: Structuring Tables, a Baseline
Detect rows, columns, and merged (spanning) cells in tables with PaddleDetection. The project ships with a submission-ready example that scores 30+.
Competition Introduction
Documents containing forms and tables, such as invoices and name lists, are used everywhere, and converting these paper documents into electronic data that can be stored and managed has become unavoidable work for many enterprises. Traditional manual entry is inefficient, error-prone, and slow. If the content of a table image can be recovered as structured data automatically, costs drop sharply while efficiency and user experience improve. This competition asks participants to solve this pain point with OCR and related techniques: recognize the content and coordinates in table images and faithfully reconstruct the original paper data.
The competition defines four categories: the whole table (table), table rows (row), table columns (column), and merged cells spanning multiple rows or columns (spanning_cell). The annotation file is annos.txt, in JSON format.
Task Analysis
Inspecting the data shows that the annotations describe labeled bounding boxes, which is a standard object detection setup, so PaddleDetection can be used; the quick check right after unzipping the data below confirms this.
Code
Unzip the data
! unzip -oq /home/aistudio/data/data182509/train.zip
! unzip -oq /home/aistudio/data/data182509/testA.zip
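A quick look at annos.txt confirms the annotation format described in the introduction: a single JSON object that maps each image filename to a list of boxes with a class label. The field names ("box" as [xmin, ymin, xmax, ymax] and "label") are the ones the VOC conversion step below relies on.
# Sanity-check the annotation format (assumes train/annos.txt from the unzip step above)
import json

with open('train/annos.txt', 'r') as f:
    annos = json.load(f)

name = next(iter(annos))
print(name)
print(annos[name][:2])   # each entry looks like {"box": [xmin, ymin, xmax, ymax], "label": "row"}
print(len(annos), 'annotated images')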
Install dependencies
# Clone the PaddleDetection repository
import os
if not os.path.exists('PaddleDetection'):
    !git clone https://github.com/PaddlePaddle/PaddleDetection.git
# Install the remaining dependencies
%cd PaddleDetection
! pip install -r requirements.txt
# Compile and install paddledet
! python setup.py install
# Check that the PaddleDetection environment is ready
! python ppdet/modeling/tests/test_architectures.py
Convert the data to the format needed for training
# Convert to VOC format
%cd ~
! mkdir -p train/annotations
# thanks to https://blog.csdn.net/hu694028833/article/details/81089959
import xml.dom.minidom as minidom
import cv2
import json
with open('train/annos.txt', 'r') as f:
    data = json.load(f)

img_root = 'train/imgs/'
annotation_dir = 'train/annotations/'

with open('train.txt', 'w') as fw:
    for name in data.keys():
        img_dir = img_root + name
        save_dir = annotation_dir + name.split('.')[0] + '.xml'
        try:
            # generate the VOC-style xml annotation for this image
            img = cv2.imread(img_dir)
            h, w, c = img.shape
            dom = minidom.getDOMImplementation().createDocument(None, 'annotations', None)
            root = dom.documentElement
            # filename node
            element = dom.createElement('filename')
            element.appendChild(dom.createTextNode(img_dir.split('/')[-1]))
            root.appendChild(element)
            # size node: width / height / depth
            element = dom.createElement('size')
            element_c = dom.createElement('width')
            element_c.appendChild(dom.createTextNode(str(w)))
            element.appendChild(element_c)
            element_c = dom.createElement('height')
            element_c.appendChild(dom.createTextNode(str(h)))
            element.appendChild(element_c)
            element_c = dom.createElement('depth')
            element_c.appendChild(dom.createTextNode(str(c)))
            element.appendChild(element_c)
            root.appendChild(element)
            # one object node per annotated box
            objects = data[name]
            for a_object in objects:
                element = dom.createElement('object')
                element_c = dom.createElement('name')
                element_c.appendChild(dom.createTextNode(a_object['label']))
                element.appendChild(element_c)
                element_c = dom.createElement('bndbox')
                element.appendChild(element_c)
                element_cc = dom.createElement('xmin')
                element_cc.appendChild(dom.createTextNode(str(a_object['box'][0])))
                element_c.appendChild(element_cc)
                element_cc = dom.createElement('ymin')
                element_cc.appendChild(dom.createTextNode(str(a_object['box'][1])))
                element_c.appendChild(element_cc)
                element_cc = dom.createElement('xmax')
                element_cc.appendChild(dom.createTextNode(str(a_object['box'][2])))
                element_c.appendChild(element_cc)
                element_cc = dom.createElement('ymax')
                element_cc.appendChild(dom.createTextNode(str(a_object['box'][3])))
                element_c.appendChild(element_cc)
                root.appendChild(element)
            with open(save_dir, 'w', encoding='utf-8') as f:
                dom.writexml(f, addindent='\t', newl='\n', encoding='utf-8')
        except:
            # skip images that fail to load or convert
            print(name)
            continue
        fw.write(img_dir + ' ' + save_dir + '\n')
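The dataset config below reuses train.txt for both TrainDataset and EvalDataset, so evaluation runs on the training data. As an optional tweak that is not part of the original baseline, you could hold out a small validation split and point EvalDataset at it, for example:
# Optional (not in the original baseline): hold out ~10% of train.txt as a validation split
import random

with open('train.txt') as f:
    lines = f.readlines()

random.seed(0)
random.shuffle(lines)
n_val = max(1, len(lines) // 10)

with open('val.txt', 'w') as f:
    f.writelines(lines[:n_val])
with open('train_split.txt', 'w') as f:
    f.writelines(lines[n_val:])
# If you use this, set TrainDataset.anno_path to train_split.txt and EvalDataset.anno_path to val.txt in voc.yml.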
# Prepare the label list file
%cd ~
with open('label_list.txt', 'w') as f:
    for item in ['table', 'row', 'column', 'spanning_cell']:
        f.write(item + '\n')
Prepare the yml config
Create a file named myPicodetXsVoc.yml under PaddleDetection/configs/picodet/ with the following content:
Note 1: To continue training from a model you trained earlier, just change the pretrain_weights and weights paths.
Note 2: snapshot_epoch is the interval (in epochs) between saved checkpoints. When you change epoch, adjust snapshot_epoch accordingly so that checkpoint files do not take up too much storage.
_BASE_: [
  '../datasets/voc.yml',
  '../runtime.yml',
  '_base_/picodet_v2.yml',
  '_base_/optimizer_300e.yml',
  '_base_/picodet_320_reader.yml',
]

pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/LCNet_x0_35_pretrained.pdparams
weights: output/picodet_xs_320_coco/best_model
find_unused_parameters: True
use_ema: true
epoch: 4
snapshot_epoch: 2

LCNet:
  scale: 0.35
  feature_maps: [3, 4, 5]

LCPAN:
  out_channels: 96

PicoHeadV2:
  conv_feat:
    name: PicoFeat
    feat_in: 96
    feat_out: 96
    num_convs: 2
    num_fpn_stride: 4
    norm_type: bn
    share_cls_reg: True
    use_se: True
  feat_in_chan: 96

TrainReader:
  batch_size: 64

LearningRate:
  base_lr: 0.32
  schedulers:
  - !CosineDecay
    max_epochs: 300
  - !LinearWarmup
    start_factor: 0.1
    steps: 300
Then modify PaddleDetection/configs/datasets/voc.yml to the following:
metric: VOC
map_type: 11point
num_classes: 4

TrainDataset:
  !VOCDataSet
    dataset_dir: /home/aistudio
    anno_path: /home/aistudio/train.txt
    label_list: /home/aistudio/label_list.txt
    data_fields: ['image', 'gt_bbox', 'gt_class', 'difficult']

EvalDataset:
  !VOCDataSet
    dataset_dir: /home/aistudio
    anno_path: /home/aistudio/train.txt
    label_list: /home/aistudio/label_list.txt
    data_fields: ['image', 'gt_bbox', 'gt_class', 'difficult']

TestDataset:
  !ImageFolder
    anno_path: /home/aistudio/label_list.txt
Training
%cd ~/PaddleDetection
! export CUDA_VISIBLE_DEVICES=0  # not needed on Windows or macOS
! python tools/train.py -c configs/picodet/myPicodetXsVoc.yml
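If you want periodic evaluation on EvalDataset during training, PaddleDetection's tools/train.py also accepts an --eval flag; this variant is an optional addition and not part of the original baseline command:
# Optional: evaluate on EvalDataset at every snapshot_epoch while training
! python tools/train.py -c configs/picodet/myPicodetXsVoc.yml --eval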
Run inference and inspect the result
# Inference command
%cd ~/PaddleDetection
! export CUDA_VISIBLE_DEVICES=0  # not needed on Windows or macOS
! python tools/infer.py -c configs/picodet/myPicodetXsVoc.yml \
--infer_img=/home/aistudio/pubtest/imgs/border_0_WOM2RARU4CRXH44VAHC0.jpg \
--output_dir=infer_output/ \
--draw_threshold=0.5 \
-o weights=output/myPicodetXsVoc/model_final \
--use_vdl=False
Export the model
%cd ~/PaddleDetection
! python tools/export_model.py -c configs/picodet/myPicodetXsVoc.yml \
--output_dir=/home/aistudio/mymodel \
-o weights=output/myPicodetXsVoc/model_final \
TestReader.inputs_def.image_shape=[3,320,320]
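If the export succeeds, the target directory should contain the static-graph inference model that the prediction scripts below load via paddle.jit.load (a model.pdmodel / model.pdiparams pair plus an infer_cfg.yml). A quick check:
! ls /home/aistudio/mymodel/myPicodetXsVoc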
Write a prediction script based on Paddle
%cd ~
import paddle
import cv2
import numpy as np

# load the exported inference model
model = paddle.jit.load('./mymodel/myPicodetXsVoc/model')
model.eval()

img = cv2.imread('/home/aistudio/pubtest/imgs/border_0_WOM2RARU4CRXH44VAHC0.jpg')
h, w, c = img.shape
input_size = (320, 320)
# scale factors from the original size to the network input size
scale_factor = [input_size[0] / img.shape[0], input_size[1] / img.shape[1]]
factor = np.array(scale_factor, dtype=np.float32)
factor = paddle.to_tensor(factor).reshape((1, 2)).astype('float32')
# preprocessing: resize, scale to [0, 1], ImageNet mean/std normalization, HWC -> NCHW
img = cv2.resize(img, (320, 320))
img = img / 255
img = (img - np.array([0.485, 0.456, 0.406])) / np.array([0.229, 0.224, 0.225])
img = img.transpose([2, 0, 1])
img = paddle.to_tensor(img).astype('float32')
img = paddle.reshape(img, [1] + img.shape)

# each output row is [class_id, score, xmin, ymin, xmax, ymax] in original-image coordinates
pre = model(img, factor)
img = cv2.imread('/home/aistudio/pubtest/imgs/border_0_WOM2RARU4CRXH44VAHC0.jpg')
for item in pre[0].numpy():
    cls, value, xmin, ymin, xmax, ymax = item
    xmin, ymin, xmax, ymax = [int(x) for x in [xmin, ymin, xmax, ymax]]
    if value > 0.5:
        img = cv2.rectangle(img, (xmin, ymin), (xmax, ymax), color=(255, 0, 0))
cv2.imwrite('test.jpg', img)
Prepare the predict.py file
Place predict.py in the project root directory with the following content:
# Usage:
# python predict.py [src_image_dir] [results]
import os
import sys
import glob
import json
import cv2
import paddle
import numpy as np


def process(src_image_dir, save_dir):
    # load the exported inference model
    model = paddle.jit.load('./mymodel/myPicodetXsVoc/model')
    model.eval()
    label_list = ['table', 'row', 'column', 'spanning_cell']
    image_paths = glob.glob(os.path.join(src_image_dir, "*.jpg"))
    result = {}
    for image_path in image_paths:
        filename = os.path.split(image_path)[1]
        # preprocessing: resize to 320x320, scale to [0, 1], normalize, HWC -> NCHW
        img = cv2.imread(image_path)
        input_size = (320, 320)
        scale_factor = [input_size[0] / img.shape[0], input_size[1] / img.shape[1]]
        factor = np.array(scale_factor, dtype=np.float32)
        factor = paddle.to_tensor(factor).reshape((1, 2)).astype('float32')
        img = cv2.resize(img, input_size)
        img = img / 255
        img = (img - np.array([0.485, 0.456, 0.406])) / np.array([0.229, 0.224, 0.225])
        img = img.transpose([2, 0, 1])
        img = paddle.to_tensor(img).astype('float32')
        img = paddle.reshape(img, [1] + img.shape)
        # each output row is [class_id, score, xmin, ymin, xmax, ymax] in original-image coordinates
        pre = model(img, factor)
        if filename not in result:
            result[filename] = []
        for item in pre[0].numpy():
            cls, value, xmin, ymin, xmax, ymax = item
            cls, xmin, ymin, xmax, ymax = [int(x) for x in [cls, xmin, ymin, xmax, ymax]]
            cls = label_list[cls]
            if value > 0.5:
                result[filename].append({
                    "box": [xmin, ymin, xmax, ymax],
                    "label": cls
                })
    with open(os.path.join(save_dir, "result.txt"), 'w', encoding="utf-8") as f:
        f.write(json.dumps(result))


if __name__ == "__main__":
    assert len(sys.argv) == 3
    src_image_dir = sys.argv[1]
    save_dir = sys.argv[2]
    if not os.path.exists(save_dir):
        os.makedirs(save_dir)
    process(src_image_dir, save_dir)
Verify the prediction script
%cd ~
! python predict.py pubtest/imgs test_A_result
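It can help to open the generated result.txt and confirm it has the expected structure: one JSON object mapping each test image to its detected boxes, matching what predict.py writes above.
# Inspect the submission file produced by the step above
import json

with open('test_A_result/result.txt') as f:
    result = json.load(f)

name = next(iter(result))
print(len(result), 'images in result.txt')
print(name, result[name][:3])   # per image: a list of {"box": [...], "label": ...}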
Package the submission
%cd ~
! zip -r submit.zip predict.py mymodel
Summary
This project uses PicoDet to build a simple table-structure detector; 4 epochs of training already reach a score of 30+. For a higher score, try tuning the training parameters.
This article is a repost; see the original project link.