Pedestrian Detection and Tracking with Swin Transformer

Adding a new backbone network, Swin Transformer, to the PaddleDetection suite and training an object detection model with it

AI Studio · Published 2022-02-19 20:29:18

Reposted from AI Studio

Project link: https://aistudio.baidu.com/aistudio/projectdetail/2022805

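Introduction

I previously used Swin Transformer for an image classification task.
This time I am switching to a different downstream task, object detection, and trying Swin Transformer as the backbone of a detection model in the PaddleDetection suite.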

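Known issues

The code for this backbone is not yet stable. The following problems remain, and I have not found fixes for them so far:
- The paddle.rand() call in the DropPath module occasionally fails with a "system error" message
- When training RCNN-style models, CUDA reports error 700 (an illegal memory access) if the input resolution or batch size is too large
- When training YOLO-style models, the BCE loss becomes abnormal if the input resolution is too large; this looks like vanishing gradients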
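
For context on the DropPath issue listed above: DropPath (stochastic depth) normally draws one random number per sample, zeroes the whole residual branch for the samples that are dropped, and rescales the ones that are kept. The sketch below shows that logic in plain Python so it runs without Paddle or a GPU. It is not the project's code, and `drop_path`, `drop_prob` and the injectable `rng` callable are names chosen here for illustration.

```python
import random


def drop_path(batch, drop_prob=0.0, training=True, rng=random.random):
    """Stochastic depth on a batch given as a list of per-sample feature lists.

    Each sample is either kept (and scaled by 1 / keep_prob so the expected
    value is unchanged) or replaced by zeros as a whole.
    """
    if drop_prob == 0.0 or not training:
        return [list(sample) for sample in batch]
    keep_prob = 1.0 - drop_prob
    out = []
    for sample in batch:
        # One random draw per sample, not per element.
        keep = rng() < keep_prob
        scale = 1.0 / keep_prob if keep else 0.0
        out.append([v * scale for v in sample])
    return out


if __name__ == "__main__":
    random.seed(0)
    print(drop_path([[1.0, 2.0], [3.0, 4.0]], drop_prob=0.5))
```

Because the random source is a parameter, the mask can be generated by a different call when debugging (for example on the CPU) without touching the rest of the logic. Whether that would avoid the paddle.rand() error is untested.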

PaddleDetection

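PaddleDetection is PaddlePaddle's object detection development suite. It aims to help developers build, train, optimize, and deploy detection models faster and better across the whole development workflow.

PaddleDetection implements many mainstream object detection algorithms in a modular way, provides a rich set of data augmentation strategies, network components (such as backbones), and loss functions, and integrates model compression and cross-platform high-performance deployment.

After long refinement in industrial practice, PaddleDetection offers a smooth user experience and is widely used by developers in more than ten fields, including industrial quality inspection, remote sensing image detection, unmanned inspection, new retail, internet services, and scientific research.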

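Release notes

2021.04.14: Released release/2.0. PaddleDetection now fully supports dynamic graph mode, covers the algorithms of the static graph models, and upgrades model performance across the board. This release also adds the PP-YOLO v2 and PPYOLO tiny models, the enhanced anchor-free model PAFNet, and the rotated-box detection model S2ANet. See PaddleDetection for details.
2021.02.07: Released release/2.0-rc, a trial version of PaddleDetection with dynamic graph support. See PaddleDetection dynamic graph for details.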

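Features

Rich model zoo: more than 100 pretrained models for object detection, instance segmentation, face detection, and more, covering several winning solutions from international competitions.
Easy to use: the modular design decouples network components, so developers can easily build and try out detection models and optimization strategies and quickly obtain high-performance, customized algorithms.
End to end: data augmentation, network construction, training, compression, and deployment are connected end to end, with full support for multi-architecture, multi-device deployment in the cloud and at the edge.
High performance: built on PaddlePaddle's high-performance kernels, with clear advantages in training speed and GPU memory usage. FP16 training and multi-machine training are supported.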

Suite structure overview

Architectures

Two-Stage Detection
- Faster RCNN
- FPN
- Cascade-RCNN
- Libra RCNN
- Hybrid Task RCNN
- PSS-Det

One-Stage Detection
- RetinaNet
- YOLOv3
- YOLOv4
- PP-YOLO
- SSD

Anchor Free
- CornerNet-Squeeze
- FCOS
- TTFNet

Instance Segmentation
- Mask RCNN
- SOLOv2

Face Detection
- FaceBoxes
- BlazeFace
- BlazeFace-NAS

Backbones
- ResNet(&vd)
- ResNeXt(&vd)
- SENet
- Res2Net
- HRNet
- Hourglass
- CBNet
- GCNet
- DarkNet
- CSPDarkNet
- VGG
- MobileNetv1/v3
- GhostNet
- EfficientNet

Components

Common
- Sync-BN
- Group Norm
- DCNv2
- Non-local

FPN
- BiFPN
- BFP
- HRFPN
- ACFPN

Loss
- Smooth-L1
- GIoU/DIoU/CIoU
- IoUAware

Post-processing
- SoftNMS
- MatrixNMS

Speed
- FP16 training
- Multi-machine training

Data Augmentation
- Resize
- Flipping
- Expand
- Crop
- Color Distort
- Random Erasing
- Mixup
- Cutmix
- Grid Mask
- Auto Augment

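Model performance overview

Figure: comparison of accuracy (mAP) on the COCO dataset and inference speed (FPS) on a single Tesla V100 for representative models of each architecture and backbone.

Notes:

- CBResNet is the Cascade-Faster-RCNN-CBResNet200vd-FPN model, which reaches 53.3% mAP on COCO
- Cascade-Faster-RCNN is Cascade-Faster-RCNN-ResNet50vd-DCN; PaddleDetection optimizes it to run at 20 FPS with 47.8% mAP on COCO
- PP-YOLO reaches 45.9% mAP on COCO at 72.9 FPS on a Tesla V100, better than YOLOv4 in both accuracy and speed
- PP-YOLO v2 further optimizes PP-YOLO, reaching 49.5% mAP on COCO at 68.9 FPS on a Tesla V100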

Syncing the PaddleDetection code

# !git clone https://github.com.cnpmjs.org/PaddlePaddle/PaddleDetection -b release/2.0 --depth 1

Adding the backbone

Add the model code: PaddleDetection/ppdet/modeling/backbones/swin_transformer.py
Update __init__.py: PaddleDetection/ppdet/modeling/backbones/__init__.py
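
A detection neck such as FPN needs to know how many channels each backbone output has and at what stride it sits. For Swin-T (patch size 4, embedding dimension 96, four stages) both values double at every stage. The sketch below computes them. `ShapeSpec` here is a local stand-in defined for this example rather than an import from PaddleDetection, and `swin_out_shapes` is a hypothetical helper name.

```python
from collections import namedtuple

# Local stand-in for the shape descriptor a detection neck consumes.
ShapeSpec = namedtuple("ShapeSpec", ["channels", "stride"])


def swin_out_shapes(embed_dim=96, patch_size=4, num_stages=4,
                    out_indices=(0, 1, 2, 3)):
    """Channels and strides of the selected Swin stages.

    Stage i has embed_dim * 2**i channels and a stride of patch_size * 2**i,
    because each patch-merging step halves the resolution and doubles the width.
    """
    shapes = []
    for i in out_indices:
        if not 0 <= i < num_stages:
            raise ValueError(f"stage index {i} out of range")
        shapes.append(
            ShapeSpec(channels=embed_dim * 2 ** i, stride=patch_size * 2 ** i))
    return shapes


if __name__ == "__main__":
    for spec in swin_out_shapes():
        print(spec)
```

With the defaults this gives 96, 192, 384 and 768 channels at strides 4, 8, 16 and 32. These line up with the first four entries of the anchor generator strides in the configuration file; the fifth level (stride 64) presumably comes from an extra FPN level, not from the backbone.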

Writing the configuration file

The configuration file used here is as follows:

# faster_rcnn_swin_ti.yaml
use_gpu: true
log_iter: 10
save_dir: output
snapshot_epoch: 1

epoch: 12

LearningRate:
  base_lr: 0.001
  schedulers:
  - !PiecewiseDecay
    gamma: 0.1
    milestones: [8, 11]
  - !LinearWarmup
    start_factor: 0.1
    steps: 1000

OptimizerBuilder:
  optimizer:
    momentum: 0.9
    type: Momentum
  regularizer:
    factor: 0.0001
    type: L2


architecture: FasterRCNN

FasterRCNN:
  backbone: SwinTransformer
  neck: FPN
  rpn_head: RPNHead
  bbox_head: BBoxHead
  # post process
  bbox_post_process: BBoxPostProcess

SwinTransformer:
  out_indices: [0,1,2,3]
  pretrained: https://bj.bcebos.com/v1/ai-studio-online/19a72dd9eb884f4581492a61fab901e60e858e34569f4805b619eceabd6a4315?responseContentDisposition=attachment%3B%20filename%3Dswin_tiny_patch4_window7_224.pdparams

FPN:
  out_channel: 256

RPNHead:
  anchor_generator:
    aspect_ratios: [0.5, 1.0, 2.0]
    anchor_sizes: [[32], [64], [128], [256], [512]]
    strides: [4, 8, 16, 32, 64]
  rpn_target_assign:
    batch_size_per_im: 256
    fg_fraction: 0.5
    negative_overlap: 0.3
    positive_overlap: 0.7
    use_random: True
  train_proposal:
    min_size: 0.0
    nms_thresh: 0.7
    pre_nms_top_n: 2000
    post_nms_top_n: 1000
    topk_after_collect: True
  test_proposal:
    min_size: 0.0
    nms_thresh: 0.7
    pre_nms_top_n: 1000
    post_nms_top_n: 1000


BBoxHead:
  head: TwoFCHead
  roi_extractor:
    resolution: 7
    sampling_ratio: 0
    aligned: True
  bbox_assigner: BBoxAssigner

BBoxAssigner:
  batch_size_per_im: 512
  bg_thresh: 0.5
  fg_thresh: 0.5
  fg_fraction: 0.25
  use_random: True

TwoFCHead:
  out_channel: 1024

BBoxPostProcess:
  decode: RCNNBox
  nms:
    name: MultiClassNMS
    keep_top_k: 100
    score_threshold: 0.05
    nms_threshold: 0.5

worker_num: 2
TrainReader:
  sample_transforms:
  - Decode: {}
  - RandomResize: {target_size: [[640, 1333]], interp: 2, keep_ratio: True}
  - RandomFlip: {prob: 0.5}
  - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]}
  - Permute: {}
  batch_transforms:
  - PadBatch: {pad_to_stride: 32, pad_gt: true}
  batch_size: 1
  shuffle: true
  drop_last: true


EvalReader:
  sample_transforms:
  - Decode: {}
  - Resize: {interp: 2, target_size: [640, 1333], keep_ratio: True}
  - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]}
  - Permute: {}
  batch_transforms:
  - PadBatch: {pad_to_stride: 32, pad_gt: false}
  batch_size: 1
  shuffle: false
  drop_last: false
  drop_empty: false


TestReader:
  sample_transforms:
  - Decode: {}
  - Resize: {interp: 2, target_size: [640, 1333], keep_ratio: True}
  - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]}
  - Permute: {}
  batch_transforms:
  - PadBatch: {pad_to_stride: 32, pad_gt: false}
  batch_size: 1
  shuffle: false
  drop_last: false

metric: VOC
map_type: integral
num_classes: 4

TrainDataset:
  !VOCDataSet
    dataset_dir: dataset/roadsign_voc
    anno_path: train.txt
    label_list: label_list.txt
    data_fields: ['image', 'gt_bbox', 'gt_class', 'difficult']

EvalDataset:
  !VOCDataSet
    dataset_dir: dataset/roadsign_voc
    anno_path: valid.txt
    label_list: label_list.txt
    data_fields: ['image', 'gt_bbox', 'gt_class', 'difficult']

TestDataset:
  !ImageFolder
    anno_path: dataset/roadsign_voc/label_list.txt
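
To make the LearningRate section of the configuration above concrete, here is a small sketch of the schedule as I read it: a linear warmup over the first 1000 steps, starting at 0.1 × base_lr, followed by a ×0.1 decay at epochs 8 and 11. The function name `learning_rate` and the `steps_per_epoch` argument are assumptions made for this example; the real value depends on the dataset size and batch size, and the framework's own scheduler may differ in edge cases such as a milestone that falls inside the warmup.

```python
def learning_rate(step, steps_per_epoch, base_lr=0.001, gamma=0.1,
                  milestones=(8, 11), warmup_steps=1000, start_factor=0.1):
    """Learning rate at a global step for piecewise decay plus linear warmup."""
    if step < warmup_steps:
        # Warmup: the factor ramps linearly from start_factor to 1.
        alpha = step / warmup_steps
        return base_lr * (start_factor * (1 - alpha) + alpha)
    # After warmup: multiply by gamma once per milestone epoch already reached.
    epoch = step // steps_per_epoch
    passed = sum(1 for m in milestones if epoch >= m)
    return base_lr * gamma ** passed


if __name__ == "__main__":
    # steps_per_epoch=500 is an arbitrary value chosen for the demonstration.
    for step in (0, 500, 1000, 4000, 5500):
        print(step, learning_rate(step, steps_per_epoch=500))
```

With `batch_size: 1`, one epoch has as many steps as there are training images, so on a small dataset the 1000-step warmup can span more than one epoch.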

Model training

%cd ~/PaddleDetection

!python tools/train.py -c ~/faster_rcnn_swin_ti.yaml --eval

%cd ~/PaddleDetection

!python tools/train.py -c ~/yolov3_swin_ti.yaml --eval

%cd work/PaddleDetection/

/home/aistudio/work/PaddleDetection

!python -u tools/infer.py -c faster_rcnn_swin_ti.yaml -o weights=output/faster_rcnn_swin_ti/best_model.pdparams --infer_img=output/000000014439_640x640.jpg

/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/layers/utils.py:26: DeprecationWarning: `np.int` is a deprecated alias for the builtin `int`. To silence this warning, use `int` by itself. Doing this will not modify any behavior and is safe. When replacing `np.int`, you may wish to use e.g. `np.int64` or `np.int32` to specify the precision. If you wish to review your current use, check the release note link for additional information.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
  def convert_to_list(value, n, name, dtype=np.int):
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/tensor/creation.py:143: DeprecationWarning: `np.object` is a deprecated alias for the builtin `object`. To silence this warning, use `object` by itself. Doing this will not modify any behavior and is safe. 
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
  if data.dtype == np.object:
W0605 10:17:16.612674   925 device_context.cc:362] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 10.1, Runtime API Version: 10.1
W0605 10:17:16.617413   925 device_context.cc:372] device: 0, cuDNN Version: 7.6.
2021-06-05 10:17:19,274 - INFO - unique_endpoints {''}
2021-06-05 10:17:19,274 - INFO - Found /home/aistudio/.cache/paddle/hapi/weights/19a72dd9eb884f4581492a61fab901e60e858e34569f4805b619eceabd6a4315?responseContentDisposition=attachment%3B%20filename%3Dswin_tiny_patch4_window7_224.pdparams
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/tensor/creation.py:143: DeprecationWarning: `np.object` is a deprecated alias for the builtin `object`. To silence this warning, use `object` by itself. Doing this will not modify any behavior and is safe. 
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
  if data.dtype == np.object:
[06/05 10:17:21] ppdet.utils.checkpoint INFO: Finish loading model weights: output/faster_rcnn_swin_ti/best_model.pdparams
[06/05 10:17:21] ppdet.engine INFO: Detection bbox results save in output/000000014439_640x640.jpg


import numpy as np
import os

image_path = 'mot_images/3/'
imgs = os.listdir(image_path)
# replace=False so the same frame is not sampled twice
infer_imgs = np.random.choice(imgs, 10, replace=False)
infer_imgs

array(['00092.jpg', '00187.jpg', '00083.jpg', '00005.jpg', '00036.jpg',
       '00032.jpg', '00203.jpg', '00247.jpg', '00103.jpg', '00106.jpg'],
      dtype='<U9')

from tqdm import tqdm
# Single-GPU example. `!CUDA_VISIBLE_DEVICES=0` only affects a throwaway subshell,
# so set the variable in this process instead; child processes inherit it.
os.environ['CUDA_VISIBLE_DEVICES'] = '0'
# !python tools/infer.py -c ppyolov2.yml -o weights=output/ppyolov2/best_model.pdparams --infer_img=/home/aistudio/work/PaddleDetection/mot_imgs/0/00161.jpg
for img in tqdm(infer_imgs):
    cmd = "python tools/infer.py -c faster_rcnn_swin_ti.yaml -o weights=output/faster_rcnn_swin_ti/best_model.pdparams --infer_img=" + os.path.join(image_path, img)
    print(cmd)
    os.system(cmd)

  0%|          | 0/10 [00:00<?, ?it/s]

python tools/infer.py -c faster_rcnn_swin_ti.yaml -o weights=output/faster_rcnn_swin_ti/best_model.pdparams --infer_img=mot_images/3/00092.jpg


 10%|█         | 1/10 [00:07<01:05,  7.26s/it]

python tools/infer.py -c faster_rcnn_swin_ti.yaml -o weights=output/faster_rcnn_swin_ti/best_model.pdparams --infer_img=mot_images/3/00187.jpg


 20%|██        | 2/10 [00:14<00:57,  7.22s/it]

python tools/infer.py -c faster_rcnn_swin_ti.yaml -o weights=output/faster_rcnn_swin_ti/best_model.pdparams --infer_img=mot_images/3/00083.jpg


 30%|███       | 3/10 [00:21<00:50,  7.18s/it]

python tools/infer.py -c faster_rcnn_swin_ti.yaml -o weights=output/faster_rcnn_swin_ti/best_model.pdparams --infer_img=mot_images/3/00005.jpg


 40%|████      | 4/10 [00:28<00:43,  7.24s/it]

python tools/infer.py -c faster_rcnn_swin_ti.yaml -o weights=output/faster_rcnn_swin_ti/best_model.pdparams --infer_img=mot_images/3/00036.jpg


 50%|█████     | 5/10 [00:36<00:36,  7.25s/it]

python tools/infer.py -c faster_rcnn_swin_ti.yaml -o weights=output/faster_rcnn_swin_ti/best_model.pdparams --infer_img=mot_images/3/00032.jpg


 60%|██████    | 6/10 [00:43<00:28,  7.23s/it]

python tools/infer.py -c faster_rcnn_swin_ti.yaml -o weights=output/faster_rcnn_swin_ti/best_model.pdparams --infer_img=mot_images/3/00203.jpg


 70%|███████   | 7/10 [00:50<00:21,  7.26s/it]

python tools/infer.py -c faster_rcnn_swin_ti.yaml -o weights=output/faster_rcnn_swin_ti/best_model.pdparams --infer_img=mot_images/3/00247.jpg


 80%|████████  | 8/10 [00:57<00:14,  7.19s/it]

python tools/infer.py -c faster_rcnn_swin_ti.yaml -o weights=output/faster_rcnn_swin_ti/best_model.pdparams --infer_img=mot_images/3/00103.jpg


 90%|█████████ | 9/10 [01:04<00:07,  7.17s/it]

python tools/infer.py -c faster_rcnn_swin_ti.yaml -o weights=output/faster_rcnn_swin_ti/best_model.pdparams --infer_img=mot_images/3/00106.jpg


100%|██████████| 10/10 [01:11<00:00,  7.19s/it]

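import glob
import math
import matplotlib.pyplot as plt
import matplotlib.image as mpimg

%matplotlib inline
imgs = sorted(glob.glob('output/*.jpg'))
# Two columns; derive the row count so that more than 10 images
# (output/ also holds the earlier single-image result) do not overflow the grid.
rows = max(1, math.ceil(len(imgs) / 2))
plt.figure(figsize=(16, 8 * rows))
for i, path in enumerate(imgs):
    plt.subplot(rows, 2, i + 1)
    plt.imshow(mpimg.imread(path))
plt.show()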