PP-Tracking: A Hands-On Guide to Single-Camera Pedestrian Tracking

PP-Tracking is the industry's first open-source real-time tracking system built on the PaddlePaddle deep learning framework. Targeting the pain points of real-world business scenarios, **PP-Tracking ships with pedestrian and vehicle tracking, cross-camera tracking, multi-class tracking, small-object tracking, and flow-counting capabilities together with industrial applications, and provides a visual development interface.** The models integrate lightweight multi-object tracking, object detection, and ReID algorithms to further improve PP-Tracking's server-side deployment performance. Both Python and C++ deployment are supported, covering Linux and NVIDIA Jetson platforms.

The following example walks through how to use the sample code to train, evaluate, and run inference for a single-camera tracking model based on a dataset you have already created in BML, as well as multi-camera deployment.

Without further ado, let's look at the results first.

Step 1: Environment Setup

  • Download the open-source code
  • Install the required dependencies
!cd work/ && git clone https://gitee.com/paddlepaddle/PaddleDetection.git -b develop
Cloning into 'PaddleDetection'...
remote: Enumerating objects: 760, done.
remote: Counting objects: 100% (760/760), done.
remote: Compressing objects: 100% (418/418), done.
remote: Total 20290 (delta 493), reused 508 (delta 342), pack-reused 19530
Receiving objects: 100% (20290/20290), 201.01 MiB | 31.58 MiB/s, done.
Resolving deltas: 100% (15042/15042), done.
Checking connectivity... done.
!cd work/PaddleDetection/ && pip install -r requirements.txt && python setup.py install
.../modeling/backbones/lite_hrnet.py:702: SyntaxWarning: assertion is always true, perhaps remove parentheses?
Successfully installed ... terminaltables-3.1.0 typeguard-2.13.2 xmltodict-0.12.0
Finished processing dependencies for paddledet==2.3.0

Step 2: Dataset Preparation

Reorganize the MOT dataset into the directory layout expected by the mot configs, shown below.
Before reorganizing:

MOT16
  └——————train
  └——————test

After reorganizing:
MOT16
   |——————images
   |        └——————train
   |        └——————test
   └——————labels_with_ids
            └——————train

See the MOT data preparation documentation for details.
In this tutorial there is no need to download the dataset manually: this project already references the MOT-16 dataset uploaded by a platform user. You can also download it locally for training if you wish.

!mv /home/aistudio/data/data118993/MOT16.zip work/PaddleDetection/dataset/mot/
!cd work/PaddleDetection/dataset/mot && unzip MOT16.zip   -d MOT16
  inflating: MOT16/train/MOT16-13/img1/000750.jpg  

1. Generate labels_with_ids

Create an images folder, move the test and train directories into it, and run the label-generation script. A quick way to check the generated labels is sketched after the commands below.

!cd  work/PaddleDetection/dataset/mot/MOT16/ && mkdir -p images
!cd work/PaddleDetection/dataset/mot/MOT16 && mv ./train ./images && mv ./test ./images
!cd work/PaddleDetection/dataset/mot && python gen_labels_MOT.py
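
Once the script finishes, every frame under images/ should have a matching .txt under labels_with_ids/, in the JDE/FairMOT label format: one line per box, class identity x_center y_center width height, with coordinates normalized to [0, 1]. A minimal sanity check (a sketch; the sequence and frame in the path are just an example):

# Inspect one generated annotation file (any frame works).
# Each line: class identity x_center y_center width height (normalized).
label_file = ('work/PaddleDetection/dataset/mot/MOT16/'
              'labels_with_ids/train/MOT16-02/img1/000001.txt')
with open(label_file) as f:
    print(f.read().splitlines()[:3])  # first few boxes of this frame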

2. Generate the mot16.train image list and copy it into image_lists

# Build the training image list: one image path per line,
# relative to dataset_dir (dataset/mot).
import glob
import os.path as osp

image_list = []
for seq in sorted(glob.glob('work/PaddleDetection/dataset/mot/MOT16/images/train/*')):
    for image in glob.glob(osp.join(seq, 'img1', '*.jpg')):
        image_list.append(image.replace('work/PaddleDetection/dataset/mot/', ''))
with open('mot16.train', 'w') as image_list_file:
    image_list_file.write('\n'.join(image_list))
!mkdir -p work/PaddleDetection/dataset/mot/image_lists && cp -r mot16.train work/PaddleDetection/dataset/mot/image_lists
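
As a quick sanity check, the list should cover every frame of the seven MOT16 training sequences, 5316 images in total, matching the "Total images: 5316" line in the training log later on:

# Count the entries we just wrote (expect 5316 for the full MOT16 train split).
with open('work/PaddleDetection/dataset/mot/image_lists/mot16.train') as f:
    print(sum(1 for line in f if line.strip()))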

3. Configure the dataset YAML file

Point the dataset settings in the config file at the data we just prepared.

Append the following at the end of the PaddleDetection/configs/mot/fairmot/fairmot_dla34_30e_1088x608.yml file (the same config used for training and evaluation below):

# for MOT training
TrainDataset:
  !MOTDataSet
    dataset_dir: dataset/mot   # root directory of the training data
    image_lists: ['mot16.train']
    data_fields: ['image', 'gt_bbox', 'gt_class', 'gt_ide']

# for MOT evaluation
# If you want to change the MOT evaluation dataset, please modify 'data_root'
EvalMOTDataset:
  !MOTImageFolder
    dataset_dir: dataset/mot
    data_root: MOT16/images/train
    keep_ori_im: False # set True if save visualization images or video, or used in DeepSORT

# for MOT video inference
TestMOTDataset:
  !MOTImageFolder
    dataset_dir: dataset/mot
    keep_ori_im: True # set True if save visualization images or video

After appending, the dataset section at the end of the YAML file should match the block above. As a quick check that the config still parses, see the sketch below.
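
A minimal sketch to verify the modified config loads cleanly, using PaddleDetection's workspace utilities (assumes it is run from work/PaddleDetection):

# Load the modified config and print the dataset settings we appended.
from ppdet.core.workspace import load_config

cfg = load_config('configs/mot/fairmot/fairmot_dla34_30e_1088x608.yml')
print(cfg['TrainDataset'])  # should reference dataset/mot and ['mot16.train']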

Step 3: Model Training

Train for 30 epochs with MOT16 as the training data. Since we are using the full MOT16 training set, training takes considerably longer than training on a subset of MOT16; for a quick demo you can pick a subset of the MOT16 sequences instead.
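
One quick consistency check on the log below: the logged throughput (ips ≈ 9.36 images/s at batch_cost ≈ 0.64 s) implies a batch size of 6, so an epoch over the 5316 images is 886 iterations, which matches the "[ 0/886]" counters:

# Iterations per epoch = ceil(total images / batch size); batch size 6 is
# inferred from the log (ips * batch_cost ≈ 6), not set explicitly here.
import math
print(math.ceil(5316 / 6))  # -> 886, matching the training log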

!cd work/PaddleDetection/ && python -m paddle.distributed.launch --log_dir=./fairmot_dla34_30e_1088x608/ --gpus 0 tools/train.py -c configs/mot/fairmot/fairmot_dla34_30e_1088x608.yml
-----------  Configuration Arguments -----------
backend: auto
elastic_server: None
force: False
gpus: 0
heter_worker_num: None
heter_workers: 
host: None
http_port: None
ips: 127.0.0.1
job_id: None
log_dir: ./fairmot_dla34_30e_1088x608/
np: None
nproc_per_node: None
run_mode: None
scale: 0
server_num: None
servers: 
training_script: tools/train.py
training_script_args: ['-c', 'configs/mot/fairmot/fairmot_dla34_30e_1088x608.yml']
worker_num: None
workers: 
------------------------------------------------
WARNING 2021-12-06 17:31:34,347 launch.py:416] Not found distinct arguments and compiled with cuda or xpu. Default use collective mode
launch train in GPU mode!
INFO 2021-12-06 17:31:34,348 launch_utils.py:527] Local start 1 processes. First process distributed environment info (Only For Debug): 
    +=======================================================================================+
    |                        Distributed Envs                      Value                    |
    +---------------------------------------------------------------------------------------+
    |                       PADDLE_TRAINER_ID                        0                      |
    |                 PADDLE_CURRENT_ENDPOINT                 127.0.0.1:47105               |
    |                     PADDLE_TRAINERS_NUM                        1                      |
    |                PADDLE_TRAINER_ENDPOINTS                 127.0.0.1:47105               |
    |                     PADDLE_RANK_IN_NODE                        0                      |
    |                 PADDLE_LOCAL_DEVICE_IDS                        0                      |
    |                 PADDLE_WORLD_DEVICE_IDS                        0                      |
    |                     FLAGS_selected_gpus                        0                      |
    |             FLAGS_selected_accelerators                        0                      |
    +=======================================================================================+

INFO 2021-12-06 17:31:34,348 launch_utils.py:531] details abouts PADDLE_TRAINER_ENDPOINTS can be found in ./fairmot_dla34_30e_1088x608//endpoints.log, and detail running logs maybe found in ./fairmot_dla34_30e_1088x608//workerlog.0
launch proc_id:1911 idx:0
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/tensor/creation.py:130: DeprecationWarning: `np.object` is a deprecated alias for the builtin `object`. To silence this warning, use `object` by itself. Doing this will not modify any behavior and is safe. 
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
  if data.dtype == np.object:
[12/06 17:31:38] ppdet.data.source.mot INFO: MOT dataset summary: 
[12/06 17:31:38] ppdet.data.source.mot INFO: OrderedDict([('mot16.train', 518)])
[12/06 17:31:38] ppdet.data.source.mot INFO: Total images: 5316
[12/06 17:31:38] ppdet.data.source.mot INFO: Image start index: OrderedDict([('mot16.train', 0)])
[12/06 17:31:38] ppdet.data.source.mot INFO: Total identities: 519
[12/06 17:31:38] ppdet.data.source.mot INFO: Identity start index: OrderedDict([('mot16.train', 0)])
W1206 17:31:41.335633  1911 device_context.cc:447] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 11.0, Runtime API Version: 10.1
W1206 17:31:41.340693  1911 device_context.cc:465] device: 0, cuDNN Version: 7.6.
[12/06 17:31:47] ppdet.utils.download INFO: Downloading fairmot_dla34_crowdhuman_pretrained.pdparams from https://paddledet.bj.bcebos.com/models/pretrained/fairmot_dla34_crowdhuman_pretrained.pdparams

  0%|          | 0/127930 [00:00<?, ?KB/s]
100%|██████████| 127930/127930 [00:02<00:00, 49433.09KB/s]
[12/06 17:31:50] ppdet.utils.checkpoint INFO: The shape [14455] in pretrained weight reid.classifier.bias is unmatched with the shape [519] in model reid.classifier.bias. And the weight reid.classifier.bias will not be loaded
[12/06 17:31:50] ppdet.utils.checkpoint INFO: The shape [128, 14455] in pretrained weight reid.classifier.weight is unmatched with the shape [128, 519] in model reid.classifier.weight. And the weight reid.classifier.weight will not be loaded
[12/06 17:31:50] ppdet.utils.checkpoint INFO: Finish loading model weights: /home/aistudio/.cache/paddle/weights/fairmot_dla34_crowdhuman_pretrained.pdparams
[12/06 17:31:51] ppdet.engine INFO: Epoch: [0] [  0/886] learning_rate: 0.000100 loss: 11.543140 heatmap_loss: 0.707743 size_loss: 0.841231 offset_loss: 0.199213 det_loss: 0.991079 reid_loss: 6.887893 eta: 4:44:04 batch_cost: 0.6412 data_cost: 0.0004 ips: 9.3568 images/s
[12/06 17:32:03] ppdet.engine INFO: Epoch: [0] [ 20/886] learning_rate: 0.000100 loss: 11.371736 heatmap_loss: 0.790844 size_loss: 1.153176 offset_loss: 0.207417 det_loss: 1.129737 reid_loss: 6.450063 eta: 4:35:46 batch_cost: 0.6221 data_cost: 0.0003 ips: 9.6453 images/s
[12/06 17:32:16] ppdet.engine INFO: Epoch: [0] [ 40/886] learning_rate: 0.000100 loss: 10.481521 heatmap_loss: 0.754798 size_loss: 0.956801 offset_loss: 0.211136 det_loss: 1.079346 reid_loss: 6.054442 eta: 4:36:08 batch_cost: 0.6257 data_cost: 0.0003 ips: 9.5895 images/s
[12/06 17:32:28] ppdet.engine INFO: Epoch: [0] [ 60/886] learning_rate: 0.000100 loss: 7.048557 heatmap_loss: 0.578546 size_loss: 0.739798 offset_loss: 0.202264 det_loss: 0.905977 reid_loss: 4.017921 eta: 4:44:41 batch_cost: 0.6728 data_cost: 0.0003 ips: 8.9184 images/s

Step 4: Model Evaluation

Run the evaluation below; the content at the end of its output is the model's final evaluation result.


!cd work/PaddleDetection && CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/fairmot/fairmot_dla34_30e_1088x608.yml -o weights=output/fairmot_dla34_30e_1088x608/model_final.pdparams

The output is quite long; only the final summary at the end of the log is of interest.

Step 5: Model Inference

Run inference with the trained model. For convenience, we only run on the frames under dataset/mot/MOT16/images/test/MOT16-01/img1.

The tracking output video is saved to output/mot_outputs/img1_vis.mp4.

The text results are saved to output/mot_results/img1.txt; each row has the format frame_id, id, bbox_left, bbox_top, bbox_width, bbox_height, score, x, y, z. A small parsing sketch follows the inference log below.

!cd work/PaddleDetection/ && CUDA_VISIBLE_DEVICES=0 python tools/infer_mot.py -c configs/mot/fairmot/fairmot_dla34_30e_1088x608.yml -o weights=output/fairmot_dla34_30e_1088x608/model_final.pdparams --image_dir=dataset/mot/MOT16/images/test/MOT16-01/img1  --save_videos
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/tensor/creation.py:130: DeprecationWarning: `np.object` is a deprecated alias for the builtin `object`. To silence this warning, use `object` by itself. Doing this will not modify any behavior and is safe. 
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
  if data.dtype == np.object:
W1206 22:39:30.301265 15564 device_context.cc:447] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 11.0, Runtime API Version: 10.1
W1206 22:39:30.306080 15564 device_context.cc:465] device: 0, cuDNN Version: 7.6.
[12/06 22:39:32] ppdet.utils.checkpoint INFO: Finish resuming model weights: output/fairmot_dla34_30e_1088x608/model_final.pdparams
[12/06 22:39:32] ppdet.engine.tracker INFO: Starting tracking folder dataset/mot/MOT16/images/test/MOT16-01/img1, found 450 images
[12/06 22:39:32] ppdet.engine.tracker INFO: Processing frame 0 (100000.00 fps)
[12/06 22:39:36] ppdet.engine.tracker INFO: Processing frame 40 (19.35 fps)
[12/06 22:39:39] ppdet.engine.tracker INFO: Processing frame 80 (19.52 fps)
[12/06 22:39:43] ppdet.engine.tracker INFO: Processing frame 120 (19.66 fps)
[12/06 22:39:46] ppdet.engine.tracker INFO: Processing frame 160 (19.72 fps)
[12/06 22:39:50] ppdet.engine.tracker INFO: Processing frame 200 (19.77 fps)
[12/06 22:39:54] ppdet.engine.tracker INFO: Processing frame 240 (18.92 fps)
[12/06 22:39:58] ppdet.engine.tracker INFO: Processing frame 280 (18.88 fps)
[12/06 22:40:01] ppdet.engine.tracker INFO: Processing frame 320 (18.99 fps)
[12/06 22:40:05] ppdet.engine.tracker INFO: Processing frame 360 (19.10 fps)
[12/06 22:40:08] ppdet.engine.tracker INFO: Processing frame 400 (19.19 fps)
[12/06 22:40:12] ppdet.engine.tracker INFO: Processing frame 440 (19.29 fps)
ffmpeg version 2.8.15-0ubuntu0.16.04.1 Copyright (c) 2000-2018 the FFmpeg developers
  built with gcc 5.4.0 (Ubuntu 5.4.0-6ubuntu1~16.04.10) 20160609
  configuration: --prefix=/usr --extra-version=0ubuntu0.16.04.1 --build-suffix=-ffmpeg --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --cc=cc --cxx=g++ --enable-gpl --enable-shared --disable-stripping --disable-decoder=libopenjpeg --disable-decoder=libschroedinger --enable-avresample --enable-avisynth --enable-gnutls --enable-ladspa --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm --enable-libmodplug --enable-libmp3lame --enable-libopenjpeg --enable-libopus --enable-libpulse --enable-librtmp --enable-libschroedinger --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libssh --enable-libtheora --enable-libtwolame --enable-libvorbis --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx265 --enable-libxvid --enable-libzvbi --enable-openal --enable-opengl --enable-x11grab --enable-libdc1394 --enable-libiec61883 --enable-libzmq --enable-frei0r --enable-libx264 --enable-libopencv
  libavutil      54. 31.100 / 54. 31.100
  libavcodec     56. 60.100 / 56. 60.100
  libavformat    56. 40.101 / 56. 40.101
  libavdevice    56.  4.100 / 56.  4.100
  libavfilter     5. 40.101 /  5. 40.101
  libavresample   2.  1.  0 /  2.  1.  0
  libswscale      3.  1.101 /  3.  1.101
  libswresample   1.  2.101 /  1.  2.101
  libpostproc    53.  3.100 / 53.  3.100
[mjpeg @ 0x1485720] Changeing bps to 8
Input #0, image2, from 'output/mot_outputs/img1/%05d.jpg':
  Duration: 00:00:18.00, start: 0.000000, bitrate: N/A
    Stream #0:0: Video: mjpeg, yuvj420p(pc, bt470bg/unknown/unknown), 1920x1080 [SAR 1:1 DAR 16:9], 25 fps, 25 tbr, 25 tbn, 25 tbc
No pixel format specified, yuvj420p for H.264 encoding chosen.
Use -pix_fmt yuv420p for compatibility with outdated media players.
[libx264 @ 0x14883e0] using SAR=1/1
[libx264 @ 0x14883e0] using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2 AVX FMA3 AVX2 LZCNT BMI2
[libx264 @ 0x14883e0] profile High, level 4.0
[libx264 @ 0x14883e0] 264 - core 148 r2643 5c65704 - H.264/MPEG-4 AVC codec - Copyleft 2003-2015 - http://www.videolan.org/x264.html - options: cabac=1 ref=3 deblock=1:0:0 analyse=0x3:0x113 me=hex subme=7 psy=1 psy_rd=1.00:0.00 mixed_ref=1 me_range=16 chroma_me=1 trellis=1 8x8dct=1 cqm=0 deadzone=21,11 fast_pskip=1 chroma_qp_offset=-2 threads=34 lookahead_threads=5 sliced_threads=0 nr=0 decimate=1 interlaced=0 bluray_compat=0 constrained_intra=0 bframes=3 b_pyramid=2 b_adapt=1 b_bias=0 direct=1 weightb=1 open_gop=0 weightp=2 keyint=250 keyint_min=25 scenecut=40 intra_refresh=0 rc_lookahead=40 rc=crf mbtree=1 crf=23.0 qcomp=0.60 qpmin=0 qpmax=69 qpstep=4 ip_ratio=1.40 aq=1:1.00
Output #0, mp4, to 'output/mot_outputs/img1/../img1_vis.mp4':
  Metadata:
    encoder         : Lavf56.40.101
    Stream #0:0: Video: h264 (libx264) ([33][0][0][0] / 0x0021), yuvj420p(pc), 1920x1080 [SAR 1:1 DAR 16:9], q=-1--1, 25 fps, 12800 tbn, 25 tbc
    Metadata:
      encoder         : Lavc56.60.100 libx264
Stream mapping:
  Stream #0:0 -> #0:0 (mjpeg (native) -> h264 (libx264))
Press [q] to stop, [?] for help
frame=  450 fps=8.1 q=-1.0 Lsize=   18529kB time=00:00:17.92 bitrate=8470.4kbits/s    
video:18523kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.034153%
[libx264 @ 0x14883e0] frame I:2     Avg QP:20.49  size:150857
[libx264 @ 0x14883e0] frame P:177   Avg QP:22.36  size: 78490
[libx264 @ 0x14883e0] frame B:271   Avg QP:26.49  size: 17610
[libx264 @ 0x14883e0] consecutive B-frames:  3.3% 31.6% 52.7% 12.4%
[libx264 @ 0x14883e0] mb I  I16..4: 14.4% 80.9%  4.7%
[libx264 @ 0x14883e0] mb P  I16..4:  3.9% 27.3%  1.6%  P16..4: 20.9% 15.6% 14.6%  0.0%  0.0%    skip:16.0%
[libx264 @ 0x14883e0] mb B  I16..4:  1.3%  6.5%  0.2%  B16..8: 32.7%  6.9%  1.8%  direct: 2.1%  skip:48.5%  L0:50.4% L1:40.5% BI: 9.1%
[libx264 @ 0x14883e0] 8x8 transform intra:82.5% inter:87.0%
[libx264 @ 0x14883e0] coded y,uvDC,uvAC intra: 62.4% 60.5% 3.6% inter: 20.4% 13.9% 2.5%
[libx264 @ 0x14883e0] i16 v,h,dc,p: 35% 29% 35%  1%
[libx264 @ 0x14883e0] i8 v,h,dc,ddl,ddr,vr,hd,vl,hu: 18% 24% 49%  2%  1%  1%  1%  1%  3%
[libx264 @ 0x14883e0] i4 v,h,dc,ddl,ddr,vr,hd,vl,hu: 44% 24% 12%  3%  3%  3%  4%  3%  3%
[libx264 @ 0x14883e0] i8c dc,h,v,p: 37% 29% 32%  2%
[libx264 @ 0x14883e0] Weighted P-Frames: Y:0.0% UV:0.0%
[libx264 @ 0x14883e0] ref P L0: 53.1% 13.1% 20.0% 13.8%
[libx264 @ 0x14883e0] ref B L0: 71.1% 22.9%  6.0%
[libx264 @ 0x14883e0] ref B L1: 84.8% 15.2%
[libx264 @ 0x14883e0] kb/s:8429.62
[12/06 22:41:08] ppdet.engine.tracker INFO: Save video in output/mot_outputs/img1/../img1_vis.mp4
MOT results save in output/mot_results/img1.txt
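
With the results file written, here is a minimal parsing sketch for the MOT-format rows described above (for 2D tracking the trailing x, y, z fields are placeholders):

# Group MOT results by frame: frame_id -> list of (track_id, bbox, score).
from collections import defaultdict

tracks = defaultdict(list)
with open('output/mot_results/img1.txt') as f:
    for line in f:
        fields = line.strip().split(',')
        frame_id, track_id = int(float(fields[0])), int(float(fields[1]))
        x, y, w, h, score = map(float, fields[2:7])
        tracks[frame_id].append((track_id, (x, y, w, h), score))

print(f'{len(tracks)} frames, {len(tracks[1])} tracked boxes in frame 1')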

After inference finishes, open the generated mp4 video locally to watch the pedestrian tracking in real time; the result looks similar to the screenshot below.
