Pedestrian Detection and Tracking with Swin Transformer (copy)
Adding Swin Transformer as a new backbone network to the PaddleDetection kit and training an object detection model with it
Reposted from AI Studio
Project link: https://aistudio.baidu.com/aistudio/projectdetail/2022805
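Introduction
- I previously used Swin Transformer for an image classification task
- This time I switch to a different downstream task, object detection, and try Swin Transformer as the backbone of a detector built with the PaddleDetection kit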
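Known issues
- The code for this backbone is not yet stable. The issues below are the ones I have run into so far, and with my limited knowledge I have not found fixes for them yet
- In the DropPath module, the paddle.rand() function occasionally fails with a "system error" message
- When training RCNN-type models, CUDA reports error 700 if the input resolution or the batch size is too large
- When training YOLO-type models, the BCE loss becomes abnormal if the input resolution is too large, which looks like it might be caused by vanishing gradients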
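For context on the first issue: DropPath (stochastic depth) zeroes the whole residual branch for randomly chosen samples in a batch during training and rescales the surviving samples by 1/keep_prob. The sketch below is a NumPy stand-in written only to illustrate that idea, not the project's Paddle code; the function name `drop_path` and its arguments are assumptions. In a Paddle version, the per-sample random draw is where a call such as `paddle.rand()` would appear.

```python
import numpy as np


def drop_path(x, drop_prob=0.0, training=True, rng=None):
    """Stochastic depth: drop the whole branch output for some samples.

    x is expected to have the batch dimension first, e.g. (N, L, C).
    """
    if drop_prob == 0.0 or not training:
        return x
    rng = rng if rng is not None else np.random.default_rng()
    keep_prob = 1.0 - drop_prob
    # One random draw per sample, broadcast over all remaining dimensions.
    mask_shape = (x.shape[0],) + (1,) * (x.ndim - 1)
    mask = np.floor(keep_prob + rng.random(mask_shape))  # 1 = keep, 0 = drop
    # Rescale kept samples so the expected value of the output equals x.
    return x / keep_prob * mask


x = np.ones((4, 3, 2), dtype=np.float32)
y = drop_path(x, drop_prob=0.5, rng=np.random.default_rng(0))
print(y[:, 0, 0])  # each entry is either 0.0 (dropped) or 2.0 (kept, rescaled)
```

At evaluation time (`training=False`) the function returns its input unchanged, so the random draw is only exercised during training.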
PaddleDetection
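PaddleDetection is the PaddlePaddle object detection development kit. It aims to help developers complete the whole workflow of building, training, optimizing, and deploying detection models faster and better.
PaddleDetection implements many mainstream object detection algorithms in a modular way, provides a rich set of data augmentation strategies, network components (such as backbones), and loss functions, and integrates model compression and cross-platform high-performance deployment.
After long-term refinement in industrial practice, PaddleDetection offers a smooth, polished user experience and is widely used by developers in more than ten fields, including industrial quality inspection, remote-sensing image detection, unmanned inspection, new retail, internet services, and scientific research.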
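Release news
- 2021.04.14: Released release/2.0. PaddleDetection now fully supports dynamic graph mode, covers the algorithms of the static graph models, and upgrades model performance across the board. This release also ships the PP-YOLO v2 and PPYOLO tiny models, the enhanced anchor-free model PAFNet, and the new rotated-box detection model S2ANet. See PaddleDetection for details.
- 2021.02.07: Released release/2.0-rc, the trial dynamic graph version of PaddleDetection. See PaddleDetection dynamic graph for details.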
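Features
- Rich model zoo: more than 100 pretrained models for object detection, instance segmentation, face detection, and more, including several winning solutions from international competitions
- Simple to use: the modular design decouples the network components, so developers can easily assemble and try out different detection models and optimization strategies and quickly obtain high-performance, customized algorithms.
- End to end: data augmentation, network construction, training, compression, and deployment are connected end to end, with full support for multi-architecture, multi-device deployment on cloud and edge.
- High performance: built on the high-performance PaddlePaddle core, with clear advantages in training speed and GPU memory usage. Supports FP16 training and multi-machine training.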
Kit structure overview
Architectures | Backbones | Components | Data Augmentation
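Model performance overview
Comparison chart of accuracy (mAP on the COCO dataset) and inference speed (FPS on a single Tesla V100) for representative models of each architecture and backbone.

Notes:
- CBResNet is the Cascade-Faster-RCNN-CBResNet200vd-FPN model, which reaches 53.3% mAP on COCO
- Cascade-Faster-RCNN is Cascade-Faster-RCNN-ResNet50vd-DCN; PaddleDetection optimized it to run at 20 FPS with 47.8% mAP on COCO
- PP-YOLO reaches 45.9% mAP on COCO at 72.9 FPS on a Tesla V100, better than YOLOv4 in both accuracy and speed
- PP-YOLO v2 is a further optimization of PP-YOLO, reaching 49.5% mAP on COCO at 68.9 FPS on a Tesla V100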
Sync the PaddleDetection code
# !git clone https://github.com.cnpmjs.org/PaddlePaddle/PaddleDetection -b release/2.0 --depth 1
Add the backbone
- Add the model code: PaddleDetection/ppdet/modeling/backbones/swin_transformer.py
- Edit __init__.py: PaddleDetection/ppdet/modeling/backbones/__init__.py
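Both steps are needed because PaddleDetection builds a model from the class names written in the YAML config, so a new backbone has to be registered under its class name and imported by the backbones package. The sketch below is a toy registry in plain Python that imitates this kind of lookup. `REGISTRY`, `register`, and `create` are simplified stand-ins written for this example, not the real `ppdet.core.workspace` API, and the constructor arguments are illustrative only.

```python
REGISTRY = {}


def register(cls):
    """Record a class under its own name so a config can refer to it."""
    REGISTRY[cls.__name__] = cls
    return cls


def create(name, **kwargs):
    """Instantiate a registered class from its name plus config values."""
    if name not in REGISTRY:
        raise KeyError(f"{name} is not registered; was its module imported?")
    return REGISTRY[name](**kwargs)


@register
class SwinTransformer:
    # Hypothetical, heavily reduced constructor for illustration only.
    def __init__(self, out_indices=(0, 1, 2, 3), pretrained=None):
        self.out_indices = list(out_indices)
        self.pretrained = pretrained


# Rough equivalent of `backbone: SwinTransformer` plus its config section.
backbone = create("SwinTransformer", out_indices=[0, 1, 2, 3])
print(type(backbone).__name__, backbone.out_indices)
```

A decorator like this only runs when its module is imported, which is why the package's `__init__.py` also has to import the new `swin_transformer` module.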
Write the config file
- The config file used this time is as follows:
# faster_rcnn_swin_ti.yaml
use_gpu: true
log_iter: 10
save_dir: output
snapshot_epoch: 1
epoch: 12
LearningRate:
  base_lr: 0.001
  schedulers:
  - !PiecewiseDecay
    gamma: 0.1
    milestones: [8, 11]
  - !LinearWarmup
    start_factor: 0.1
    steps: 1000
OptimizerBuilder:
  optimizer:
    momentum: 0.9
    type: Momentum
  regularizer:
    factor: 0.0001
    type: L2
architecture: FasterRCNN
FasterRCNN:
  backbone: SwinTransformer
  neck: FPN
  rpn_head: RPNHead
  bbox_head: BBoxHead
  # post process
  bbox_post_process: BBoxPostProcess
SwinTransformer:
  out_indices: [0,1,2,3]
  pretrained: https://bj.bcebos.com/v1/ai-studio-online/19a72dd9eb884f4581492a61fab901e60e858e34569f4805b619eceabd6a4315?responseContentDisposition=attachment%3B%20filename%3Dswin_tiny_patch4_window7_224.pdparams
FPN:
  out_channel: 256
RPNHead:
  anchor_generator:
    aspect_ratios: [0.5, 1.0, 2.0]
    anchor_sizes: [[32], [64], [128], [256], [512]]
    strides: [4, 8, 16, 32, 64]
  rpn_target_assign:
    batch_size_per_im: 256
    fg_fraction: 0.5
    negative_overlap: 0.3
    positive_overlap: 0.7
    use_random: True
  train_proposal:
    min_size: 0.0
    nms_thresh: 0.7
    pre_nms_top_n: 2000
    post_nms_top_n: 1000
    topk_after_collect: True
  test_proposal:
    min_size: 0.0
    nms_thresh: 0.7
    pre_nms_top_n: 1000
    post_nms_top_n: 1000
BBoxHead:
  head: TwoFCHead
  roi_extractor:
    resolution: 7
    sampling_ratio: 0
    aligned: True
  bbox_assigner: BBoxAssigner
BBoxAssigner:
  batch_size_per_im: 512
  bg_thresh: 0.5
  fg_thresh: 0.5
  fg_fraction: 0.25
  use_random: True
TwoFCHead:
  out_channel: 1024
BBoxPostProcess:
  decode: RCNNBox
  nms:
    name: MultiClassNMS
    keep_top_k: 100
    score_threshold: 0.05
    nms_threshold: 0.5
worker_num: 2
TrainReader:
  sample_transforms:
  - Decode: {}
  - RandomResize: {target_size: [[640, 1333]], interp: 2, keep_ratio: True}
  - RandomFlip: {prob: 0.5}
  - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]}
  - Permute: {}
  batch_transforms:
  - PadBatch: {pad_to_stride: 32, pad_gt: true}
  batch_size: 1
  shuffle: true
  drop_last: true
EvalReader:
  sample_transforms:
  - Decode: {}
  - Resize: {interp: 2, target_size: [640, 1333], keep_ratio: True}
  - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]}
  - Permute: {}
  batch_transforms:
  - PadBatch: {pad_to_stride: 32, pad_gt: false}
  batch_size: 1
  shuffle: false
  drop_last: false
  drop_empty: false
TestReader:
  sample_transforms:
  - Decode: {}
  - Resize: {interp: 2, target_size: [640, 1333], keep_ratio: True}
  - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]}
  - Permute: {}
  batch_transforms:
  - PadBatch: {pad_to_stride: 32, pad_gt: false}
  batch_size: 1
  shuffle: false
  drop_last: false
metric: VOC
map_type: integral
num_classes: 4
TrainDataset:
  !VOCDataSet
    dataset_dir: dataset/roadsign_voc
    anno_path: train.txt
    label_list: label_list.txt
    data_fields: ['image', 'gt_bbox', 'gt_class', 'difficult']
EvalDataset:
  !VOCDataSet
    dataset_dir: dataset/roadsign_voc
    anno_path: valid.txt
    label_list: label_list.txt
    data_fields: ['image', 'gt_bbox', 'gt_class', 'difficult']
TestDataset:
  !ImageFolder
    anno_path: dataset/roadsign_voc/label_list.txt
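To make the LearningRate section of the config concrete, the sketch below computes the learning rate at a given iteration under my reading of those settings: a linear warmup from `start_factor * base_lr` up to `base_lr` over the first 1000 steps, then a 10x decay when epochs 8 and 11 are reached. `steps_per_epoch` depends on the dataset size and batch size and is a made-up value here, and the function is an illustration rather than PaddleDetection's own scheduler code.

```python
def learning_rate(step, steps_per_epoch, base_lr=0.001, gamma=0.1,
                  milestones=(8, 11), warmup_steps=1000, start_factor=0.1):
    """Learning rate at a 0-based global iteration `step`."""
    if step < warmup_steps:
        # Linear interpolation from start_factor * base_lr up to base_lr.
        alpha = step / warmup_steps
        return base_lr * (start_factor * (1 - alpha) + alpha)
    # After warmup: multiply by gamma once for every milestone epoch reached.
    epoch = step // steps_per_epoch
    passed = sum(1 for m in milestones if epoch >= m)
    return base_lr * gamma ** passed


steps_per_epoch = 700  # hypothetical: number of training images / batch_size
for step in (0, 500, 1000, 8 * steps_per_epoch, 11 * steps_per_epoch):
    print(step, learning_rate(step, steps_per_epoch))
```

With `epoch: 12`, this means the last epoch runs at one hundredth of the base learning rate.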
Model training
%cd ~/PaddleDetection
!python tools/train.py -c ~/faster_rcnn_swin_ti.yaml --eval
%cd ~/PaddleDetection
!python tools/train.py -c ~/yolov3_swin_ti.yaml --eval
%cd work/PaddleDetection/
/home/aistudio/work/PaddleDetection
!python -u tools/infer.py -c faster_rcnn_swin_ti.yaml -o weights=output/faster_rcnn_swin_ti/best_model.pdparams --infer_img=output/000000014439_640x640.jpg
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/layers/utils.py:26: DeprecationWarning: `np.int` is a deprecated alias for the builtin `int`. To silence this warning, use `int` by itself. Doing this will not modify any behavior and is safe. When replacing `np.int`, you may wish to use e.g. `np.int64` or `np.int32` to specify the precision. If you wish to review your current use, check the release note link for additional information.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
def convert_to_list(value, n, name, dtype=np.int):
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/tensor/creation.py:143: DeprecationWarning: `np.object` is a deprecated alias for the builtin `object`. To silence this warning, use `object` by itself. Doing this will not modify any behavior and is safe.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
if data.dtype == np.object:
W0605 10:17:16.612674 925 device_context.cc:362] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 10.1, Runtime API Version: 10.1
W0605 10:17:16.617413 925 device_context.cc:372] device: 0, cuDNN Version: 7.6.
2021-06-05 10:17:19,274 - INFO - unique_endpoints {''}
2021-06-05 10:17:19,274 - INFO - Found /home/aistudio/.cache/paddle/hapi/weights/19a72dd9eb884f4581492a61fab901e60e858e34569f4805b619eceabd6a4315?responseContentDisposition=attachment%3B%20filename%3Dswin_tiny_patch4_window7_224.pdparams
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/tensor/creation.py:143: DeprecationWarning: `np.object` is a deprecated alias for the builtin `object`. To silence this warning, use `object` by itself. Doing this will not modify any behavior and is safe.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
if data.dtype == np.object:
[06/05 10:17:21] ppdet.utils.checkpoint INFO: Finish loading model weights: output/faster_rcnn_swin_ti/best_model.pdparams
[06/05 10:17:21] ppdet.engine INFO: Detection bbox results save in output/000000014439_640x640.jpg
import numpy as np
import os
image_path = 'mot_images/3/'
imgs = os.listdir(image_path)
infer_imgs = np.random.choice(imgs, 10, replace=False)  # sample 10 distinct images
infer_imgs
array(['00092.jpg', '00187.jpg', '00083.jpg', '00005.jpg', '00036.jpg',
'00032.jpg', '00203.jpg', '00247.jpg', '00103.jpg', '00106.jpg'],
dtype='<U9')
from tqdm import tqdm
# Example code for single-GPU inference
# Set the variable through os.environ so the os.system() calls below inherit it;
# a `!CUDA_VISIBLE_DEVICES=0` shell line would not persist to later commands.
os.environ['CUDA_VISIBLE_DEVICES'] = '0'
# !python tools/infer.py -c ppyolov2.yml -o weights=output/ppyolov2/best_model.pdparams --infer_img=/home/aistudio/work/PaddleDetection/mot_imgs/0/00161.jpg
for img in tqdm(infer_imgs):
    cmd = "python tools/infer.py -c faster_rcnn_swin_ti.yaml -o weights=output/faster_rcnn_swin_ti/best_model.pdparams --infer_img=mot_images/3/" + img
    print(cmd)
    os.system(cmd)
0%| | 0/10 [00:00<?, ?it/s]
python tools/infer.py -c faster_rcnn_swin_ti.yaml -o weights=output/faster_rcnn_swin_ti/best_model.pdparams --infer_img=mot_images/3/00092.jpg
10%|█ | 1/10 [00:07<01:05, 7.26s/it]
python tools/infer.py -c faster_rcnn_swin_ti.yaml -o weights=output/faster_rcnn_swin_ti/best_model.pdparams --infer_img=mot_images/3/00187.jpg
20%|██ | 2/10 [00:14<00:57, 7.22s/it]
python tools/infer.py -c faster_rcnn_swin_ti.yaml -o weights=output/faster_rcnn_swin_ti/best_model.pdparams --infer_img=mot_images/3/00083.jpg
30%|███ | 3/10 [00:21<00:50, 7.18s/it]
python tools/infer.py -c faster_rcnn_swin_ti.yaml -o weights=output/faster_rcnn_swin_ti/best_model.pdparams --infer_img=mot_images/3/00005.jpg
40%|████ | 4/10 [00:28<00:43, 7.24s/it]
python tools/infer.py -c faster_rcnn_swin_ti.yaml -o weights=output/faster_rcnn_swin_ti/best_model.pdparams --infer_img=mot_images/3/00036.jpg
50%|█████ | 5/10 [00:36<00:36, 7.25s/it]
python tools/infer.py -c faster_rcnn_swin_ti.yaml -o weights=output/faster_rcnn_swin_ti/best_model.pdparams --infer_img=mot_images/3/00032.jpg
60%|██████ | 6/10 [00:43<00:28, 7.23s/it]
python tools/infer.py -c faster_rcnn_swin_ti.yaml -o weights=output/faster_rcnn_swin_ti/best_model.pdparams --infer_img=mot_images/3/00203.jpg
70%|███████ | 7/10 [00:50<00:21, 7.26s/it]
python tools/infer.py -c faster_rcnn_swin_ti.yaml -o weights=output/faster_rcnn_swin_ti/best_model.pdparams --infer_img=mot_images/3/00247.jpg
80%|████████ | 8/10 [00:57<00:14, 7.19s/it]
python tools/infer.py -c faster_rcnn_swin_ti.yaml -o weights=output/faster_rcnn_swin_ti/best_model.pdparams --infer_img=mot_images/3/00103.jpg
90%|█████████ | 9/10 [01:04<00:07, 7.17s/it]
python tools/infer.py -c faster_rcnn_swin_ti.yaml -o weights=output/faster_rcnn_swin_ti/best_model.pdparams --infer_img=mot_images/3/00106.jpg
100%|██████████| 10/10 [01:11<00:00, 7.19s/it]
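Each call above starts a fresh Python process that has to load the model again, which is likely a large part of the roughly 7 seconds per image. As a small variation on the loop, the sketch below builds the same command as an argument list and runs it with `subprocess.run`, so a failed run raises an exception instead of passing silently. `build_infer_cmd` and `run_inference` are hypothetical helpers, and only the flags already used above appear in them.

```python
import subprocess
import sys

CONFIG = "faster_rcnn_swin_ti.yaml"
WEIGHTS = "output/faster_rcnn_swin_ti/best_model.pdparams"


def build_infer_cmd(img_path, config=CONFIG, weights=WEIGHTS):
    """Return the tools/infer.py command for one image as an argument list."""
    return [
        sys.executable, "tools/infer.py",
        "-c", config,
        "-o", f"weights={weights}",
        f"--infer_img={img_path}",
    ]


def run_inference(img_paths):
    """Run inference image by image; stop at the first failing command."""
    for img_path in img_paths:
        # check=True raises CalledProcessError on a non-zero exit code.
        subprocess.run(build_infer_cmd(img_path), check=True)


# Show the command for one image without actually running it.
print(" ".join(build_infer_cmd("mot_images/3/00092.jpg")[1:]))
```

Passing the arguments as a list also avoids quoting problems if a file name ever contains spaces.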
import glob
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
from tqdm import tqdm
%matplotlib inline
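imgs = sorted(glob.glob('output/*.jpg'))
# Two images per row; derive the row count so more than 10 images do not break subplot()
n_rows = max(1, (len(imgs) + 1) // 2)
plt.figure(figsize=(16, 40))
for i, path in enumerate(imgs):
    img = mpimg.imread(path)
    plt.subplot(n_rows, 2, i + 1)
    plt.imshow(img)
plt.show()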
<Figure size 1152x2880 with 0 Axes>
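Summary
- With these steps, the Swin Transformer model has been added to the PaddleDetection kit
- However, Swin Transformer is still not very stable as a detection backbone in PaddleDetection
- I will keep debugging to pin down where the problems are and see whether they can be fixed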