[PaddleDetection] Fast Head Detection with a Quantized PicoDet-NPU

Fast head detection based on PicoDet-NPU

AI Studio · Published 2022-10-07 21:26:18

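Lightweight head detection with a fully quantized PicoDet-NPU

Project Background

Against the backdrop of the COVID-19 pandemic, projects that automatically raise a warning when crowds gather are receiving more and more attention, and I happen to be working on one. Put simply, we need to count the people in the current scene in real time and have the system raise a warning automatically once the count reaches a set limit.

Project Approach

Task Overview

Clearly, the first task is to count the people in the current scene. How? The first approach is crowd counting based on density-map generation: it is fairly accurate in high-density scenes but less accurate in low-density ones. The second approach is based on object detection: it is fairly accurate in low-density scenes but has a higher miss rate in high-density ones.

A high-density scene is, by definition, already crowded, so the warning should have been triggered long before the scene reaches that point. What matters for this project is accuracy while the count is still low, so the density-map approach is a poor fit and object detection is the remaining choice.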
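
To make the warning rule concrete, here is a minimal sketch of the counting and alert step. It assumes the detector returns one dictionary per detected head with a `score` field; the function names, the field names, the 0.5 score threshold, and the limit of 2 people are all hypothetical and are not taken from the original project.

```python
from typing import Dict, List


def count_heads(detections: List[Dict], score_threshold: float = 0.5) -> int:
    """Count detections whose confidence is at least score_threshold."""
    return sum(1 for det in detections if det["score"] >= score_threshold)


def should_alert(detections: List[Dict], max_people: int,
                 score_threshold: float = 0.5) -> bool:
    """Return True when the head count reaches the alert limit."""
    return count_heads(detections, score_threshold) >= max_people


# Hypothetical detector output for one frame (boxes are [x, y, w, h]).
frame_detections = [
    {"bbox": [10, 20, 30, 30], "score": 0.91},
    {"bbox": [80, 25, 28, 31], "score": 0.76},
    {"bbox": [150, 40, 25, 27], "score": 0.32},  # below threshold, ignored
]

print(count_heads(frame_detections))                 # 2
print(should_alert(frame_detections, max_people=2))  # True
```

In a real deployment the limit and the score threshold would be tuned per scene; the sketch only shows where the detector output feeds into the warning decision.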

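Project Advantages

To address the problems found while surveying related solutions, I made the following targeted improvements:

Improvement 1: The original dataset contains some errors, so PaddleDetection's built-in dataset conversion script raises an error during conversion. I modified the conversion script x2coco.py so that it now skips the faulty parts of the dataset automatically.
Improvement 2: The project trains the latest PicoDet-NPU model, which is suitable for deployment on embedded devices.
Improvement 3: A quantization step has been added, which makes the model more hardware-friendly.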

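Model Selection

To do object detection we first need to pick a suitable model. When choosing, keep in mind that the goal is to ship the project: the model ultimately has to run on an embedded device, so it must be small without being too inaccurate. PicoDet, which ships with PaddleDetection, is a good fit. But PicoDet comes in many variants, so which one should we choose?

PaddleDetection's introduction to PicoDet lists the following baselines: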

Model	Input size	mAP (val, 0.5:0.95)	mAP (val, 0.5)	Params (M)	FLOPS (G)	Latency, CPU (ms)	Latency, Lite (ms)	Weights	Config	Exported model
PicoDet-XS	320*320	23.5	36.1	0.70	0.67	3.9	7.81	model | log	config	w/ post-processing | w/o post-processing
PicoDet-XS	416*416	26.2	39.3	0.70	1.13	6.1	12.38	model | log	config	w/ post-processing | w/o post-processing
PicoDet-S	320*320	29.1	43.4	1.18	0.97	4.8	9.56	model | log	config	w/ post-processing | w/o post-processing
PicoDet-S	416*416	32.5	47.6	1.18	1.65	6.6	15.20	model | log	config	w/ post-processing | w/o post-processing
PicoDet-M	320*320	34.4	50.0	3.46	2.57	8.2	17.68	model | log	config	w/ post-processing | w/o post-processing
PicoDet-M	416*416	37.5	53.4	3.46	4.34	12.7	28.39	model | log	config	w/ post-processing | w/o post-processing
PicoDet-L	320*320	36.1	52.0	5.80	4.20	11.5	25.21	model | log	config	w/ post-processing | w/o post-processing
PicoDet-L	416*416	39.4	55.7	5.80	7.10	20.7	42.23	model | log	config	w/ post-processing | w/o post-processing
PicoDet-L	640*640	42.6	59.2	5.80	16.81	62.5	108.1	model | log	config	w/ post-processing | w/o post-processing

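All of the models above run at fp32 precision. When ported to an embedded device (here, the RK series), they would run at fp16. The accuracy loss is small, but this does not make full use of the device's performance. Further down the same page, however, we find this baseline: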

Model	Input size	mAP (val, 0.5:0.95)	mAP (val, 0.5)	Params (M)	FLOPS (G)	Latency, CPU (ms)	Latency, Lite (ms)	Weights	Config
PicoDet-S-NPU	416*416	30.1	44.2	-	-	-	-	model | log	config

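So that the NPU on embedded edge devices can be used, PaddleDetection released PicoDet-NPU, which can run as int8 on embedded devices with good results. This is the model we train in this project.

Dataset

Dataset Overview

The dataset follows the AI Studio project "可能逃不了课了！如何使用PaddleX来点人头？" (roughly, "No More Skipping Class! How to Count Heads with PaddleX") and uses its head detection dataset for training. To run as int8 on an embedded device, the model eventually has to go through quantization training. According to the PP-PicoDet full quantization example, quantization training takes a COCO dataset as input, so we need to convert the dataset to COCO format.

Improvements to the Dataset

While converting the dataset, we found that it contains some errors, which would clearly affect the final model. I modified PaddleDetection's conversion script x2coco.py so that it now skips the faulty parts automatically, leaving a relatively "clean" dataset.

Extracting the Dataset

Before converting to COCO format, we first need to extract the dataset.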

# Run the following to extract the dataset
# !unzip -qo ~/data/data168347/PaddleDetection-release-2.5.zip -d ~/work
!unzip -qo ~/data/data104969/SCUT_HEAD_Part_A_B.zip -d ~/work/PaddleDetection-release-2.5/dataset

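Splitting the Dataset

After extracting the dataset we need to split it, so that the program knows which images are for training and which are for validation.

Here I follow "可能逃不了课了！如何使用PaddleX来点人头？" and split the dataset with PaddleX, with one small change.

As coco_detection in the config shows, PicoDet uses the same JSON file for the test set and the validation set, so I changed the split ratio of training : validation : test to 8:2:0.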

!pip install paddlex==2.0rc
%cd ~/work/PaddleDetection-release-2.5
!paddlex --split_dataset --format VOC --dataset_dir ./dataset/MyDataset --val_value 0.2
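
Conceptually, an 8:2:0 split is just a reproducible shuffle followed by a cut. The sketch below illustrates the idea in plain Python; it is not PaddleX's implementation, and the function name, the seed, and the sample names are made up for illustration.

```python
import random
from typing import List, Tuple


def split_dataset(names: List[str], val_ratio: float = 0.2,
                  seed: int = 0) -> Tuple[List[str], List[str]]:
    """Shuffle sample names reproducibly and split them into train/val."""
    shuffled = sorted(names)               # fixed starting order
    random.Random(seed).shuffle(shuffled)  # same seed, same split
    val_count = int(len(shuffled) * val_ratio)
    return shuffled[val_count:], shuffled[:val_count]


# Hypothetical sample names standing in for the annotation files.
samples = [f"img_{i:04d}" for i in range(10)]
train, val = split_dataset(samples, val_ratio=0.2)
print(len(train), len(val))  # 8 2
```

No test portion is produced, which matches the 8:2:0 ratio: the validation file doubles as the test file.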

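Converting the Dataset Format

Next, we convert the dataset format, following the PaddleDetection guide "如何准备训练数据" (How to Prepare Training Data).

Because the original dataset contains errors, the stock conversion fails. We therefore modified tools/x2coco.py; the code now skips faulty XML files automatically.

The modified parts are shown below for anyone interested: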

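# Excerpt from the modified tools/x2coco.py. It relies on these imports and on
# voc_get_image_info(), which is defined elsewhere in the same file.
import json
import os
import xml.etree.ElementTree as ET

from tqdm import tqdm


def voc_get_coco_annotation(obj, label2id):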
    label = obj.findtext('name')
    assert label in label2id, "label is not in label2id."
    category_id = label2id[label]
    bndbox = obj.find('bndbox')
    xmin = float(bndbox.findtext('xmin'))
    ymin = float(bndbox.findtext('ymin'))
    xmax = float(bndbox.findtext('xmax'))
    ymax = float(bndbox.findtext('ymax'))
    # Reject degenerate boxes; the caller then skips the whole XML file.
    if not (xmax > xmin and ymax > ymin):
        print("Box size error. xmax:{} xmin:{} ymax:{} ymin:{}".format(xmax,xmin,ymax,ymin))
        return None
    o_width = xmax - xmin
    o_height = ymax - ymin
    anno = {
        'area': o_width * o_height,
        'iscrowd': 0,
        'bbox': [xmin, ymin, o_width, o_height],
        'category_id': category_id,
        'ignore': 0,
    }
    return anno

def voc_xmls_to_cocojson(annotation_paths, label2id, output_dir, output_file):
    output_json_dict = {
        "images": [],
        "type": "instances",
        "annotations": [],
        "categories": []
    }
    bnd_id = 1  # bounding box start id
    im_id = 0
    print('Start converting !')
    for a_path in tqdm(annotation_paths):
        # Read annotation xml
        ann_tree = ET.parse(a_path)
        ann_root = ann_tree.getroot()

        img_info = voc_get_image_info(ann_root, im_id)
        img_info['file_name'] = os.path.basename(a_path).split(".")[0] + ".jpg"
        is_ok = True
        ann_list = []
        for obj in ann_root.findall('object'):
            ann = voc_get_coco_annotation(obj=obj, label2id=label2id)
            if ann is None:
                # One invalid box marks the whole file as faulty.
                is_ok = False
                break
            ann.update({'image_id': im_id, 'id': bnd_id})
            bnd_id = bnd_id + 1
            ann_list.append(ann.copy())
        # Only files without errors are written to the output.
        if is_ok:
            output_json_dict['images'].append(img_info)
            im_id += 1
            for ann in ann_list:
                output_json_dict['annotations'].append(ann)

    for label, label_id in label2id.items():
        category_info = {'supercategory': 'none', 'id': label_id, 'name': label}
        output_json_dict['categories'].append(category_info)
    output_file = os.path.join(output_dir, output_file)
    with open(output_file, 'w') as f:
        output_json = json.dumps(output_json_dict)
        f.write(output_json)
    print(bnd_id)

%cd ~/work/PaddleDetection-release-2.5
# Convert the training set
!python tools/x2coco.py \
        --dataset_type voc \
        --voc_anno_dir ./dataset/MyDataset/ \
        --voc_anno_list ./dataset/MyDataset/train_list.txt \
        --voc_label_list ./dataset/MyDataset/labels.txt \
        --voc_out_name ./dataset/MyDataset/voc_train.json

# Convert the validation set
!python tools/x2coco.py \
        --dataset_type voc \
        --voc_anno_dir ./dataset/MyDataset/ \
        --voc_anno_list ./dataset/MyDataset/val_list.txt \
        --voc_label_list ./dataset/MyDataset/labels.txt \
        --voc_out_name ./dataset/MyDataset/voc_val.json
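
After conversion, a quick consistency check of the generated JSON can catch leftover problems before training starts. The following is a sketch under the assumption that the file follows the usual COCO detection layout (`images`, `annotations`, `categories`, boxes as `[x, y, width, height]`); `check_coco` is a hypothetical helper, not part of PaddleDetection.

```python
from collections import Counter
from typing import Dict, List


def check_coco(coco: Dict) -> List[str]:
    """Return a list of problems found in a COCO-style detection dict."""
    problems = []
    image_ids = {img["id"] for img in coco["images"]}
    category_ids = {cat["id"] for cat in coco["categories"]}
    ann_id_counts = Counter(ann["id"] for ann in coco["annotations"])
    for ann in coco["annotations"]:
        _, _, width, height = ann["bbox"]
        if ann["image_id"] not in image_ids:
            problems.append(f"annotation {ann['id']}: unknown image_id")
        if ann["category_id"] not in category_ids:
            problems.append(f"annotation {ann['id']}: unknown category_id")
        if width <= 0 or height <= 0:
            problems.append(f"annotation {ann['id']}: empty box")
        if ann_id_counts[ann["id"]] > 1:
            problems.append(f"annotation {ann['id']}: duplicate id")
    return problems


# Tiny in-memory example. For a real file, load the dict with json.load().
sample = {
    "images": [{"id": 0, "file_name": "a.jpg"}],
    "categories": [{"id": 1, "name": "head"}],
    "annotations": [
        {"id": 1, "image_id": 0, "category_id": 1, "bbox": [5, 5, 20, 20]},
        {"id": 2, "image_id": 7, "category_id": 1, "bbox": [5, 5, 0, 20]},
    ],
}
for problem in check_coco(sample):
    print(problem)
```

An empty result for both voc_train.json and voc_val.json is a reasonable signal that the skipped files did not leave dangling annotations behind.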

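Training

Following the PP-PicoDet full quantization example, training is split into four stages: train the full-precision model -> evaluate the full-precision model -> quantization training -> evaluate the quantized model.

Installing PaddleDetection

Before training, install the required environment as described in the PaddleDetection installation guide.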

# Install the other dependencies
%cd ~/work/PaddleDetection-release-2.5
!pip install -r requirements.txt

# Build and install paddledet
!python setup.py install

Modifying the Configuration Files

Modifying coco_detection.yml

Before training, modify configs/datasets/coco_detection.yml so that the program can read our dataset.

metric: COCO
num_classes: 1

TrainDataset:
  !COCODataSet
    image_dir: JPEGImages
    anno_path: voc_train.json
    dataset_dir: dataset/MyDataset
    data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd']

EvalDataset:
  !COCODataSet
    image_dir: JPEGImages
    anno_path: voc_val.json
    dataset_dir: dataset/MyDataset

TestDataset:
  !ImageFolder
    anno_path: voc_val.json # also support txt (like VOC's label_list.txt)
    dataset_dir: dataset/MyDataset # if set, anno_path will be 'dataset_dir/anno_path'

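Modifying picodet_s_416_coco_npu.yml

We also need to modify the PicoDet-S-NPU config file, ./configs/picodet/picodet_s_416_coco_npu.yml.

To train faster we want a large batch size. A batch size of 64 raised an error on my machine, so I set it to 32 images per batch.
Note: if you train on a single GPU, you must also divide the learning rate by 4!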
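
The divide-by-4 rule is an instance of scaling the learning rate with the number of GPUs while the per-GPU batch size stays fixed. The sketch below uses the two values given in this article (0.2 for 4 GPUs, 0.05 for 1 GPU); the helper function itself is hypothetical.

```python
def scale_learning_rate(base_lr: float, base_gpus: int, gpus: int) -> float:
    """Scale the learning rate in proportion to the number of GPUs.

    The per-GPU batch size is assumed to stay the same, so the total
    batch size (and with it the learning rate) follows the GPU count.
    """
    if base_gpus <= 0 or gpus <= 0:
        raise ValueError("GPU counts must be positive")
    return base_lr * gpus / base_gpus


print(scale_learning_rate(0.2, base_gpus=4, gpus=1))  # 0.05
```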

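I trained on a single GPU. The modified file is shown below. Note that the listing shows base_lr: 0.2, the value recommended for 4-GPU training in the training section; for single-GPU training set it to 0.05.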

_BASE_: [
  '../datasets/coco_detection.yml',
  '../runtime.yml',
  '_base_/picodet_v2.yml',
  '_base_/optimizer_300e.yml',
]

pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPLCNet_x0_75_pretrained.pdparams
weights: ./output/picodet_s_416_coco_npu/best_model.pdparams
find_unused_parameters: True
keep_best_weight: True
use_ema: True
epoch: 300
snapshot_epoch: 10

PicoDet:
  backbone: LCNet
  neck: CSPPAN
  head: PicoHeadV2

LCNet:
  scale: 0.75
  feature_maps: [3, 4, 5]
  act: relu6

CSPPAN:
  out_channels: 96
  use_depthwise: True
  num_csp_blocks: 1
  num_features: 4
  act: relu6

PicoHeadV2:
  conv_feat:
    name: PicoFeat
    feat_in: 96
    feat_out: 96
    num_convs: 4
    num_fpn_stride: 4
    norm_type: bn
    share_cls_reg: True
    use_se: True
    act: relu6
  feat_in_chan: 96
  act: relu6

LearningRate:
  base_lr: 0.2
  schedulers:
  - !CosineDecay
    max_epochs: 300
    min_lr_ratio: 0.08
    last_plateau_epochs: 30
  - !ExpWarmup
    epochs: 2

worker_num: 4
eval_height: &eval_height 416
eval_width: &eval_width 416
eval_size: &eval_size [*eval_height, *eval_width]

TrainReader:
  sample_transforms:
  - Decode: {}
  - Mosaic:
      prob: 0.6
      input_dim: [640, 640]
      degrees: [-10, 10]
      scale: [0.1, 2.0]
      shear: [-2, 2]
      translate: [-0.1, 0.1]
      enable_mixup: True
  - AugmentHSV: {is_bgr: False, hgain: 5, sgain: 30, vgain: 30}
  - RandomFlip: {prob: 0.5}
  batch_transforms:
  - BatchRandomResize: {target_size: [320, 352, 384, 416, 448, 480, 512], random_size: True, random_interp: True, keep_ratio: False}
  - NormalizeImage: {mean: [0, 0, 0], std: [1, 1, 1], is_scale: True}
  - Permute: {}
  - PadGT: {}
  batch_size: 32
  shuffle: true
  drop_last: true
  mosaic_epoch: 180


EvalReader:
  sample_transforms:
  - Decode: {}
  - Resize: {interp: 2, target_size: *eval_size, keep_ratio: False}
  - NormalizeImage: {mean: [0, 0, 0], std: [1, 1, 1], is_scale: True}
  - Permute: {}
  batch_transforms:
  - PadBatch: {pad_to_stride: 32}
  batch_size: 8
  shuffle: false


TestReader:
  inputs_def:
    image_shape: [1, 3, *eval_height, *eval_width]
  sample_transforms:
  - Decode: {}
  - Resize: {interp: 2, target_size: *eval_size, keep_ratio: False}
  - NormalizeImage: {mean: [0, 0, 0], std: [1, 1, 1], is_scale: True}
  - Permute: {}
  batch_size: 1

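Changes to the PaddleDetection Code

Modifying operators.py

Because some annotations in the dataset are flawed, training keeps printing warnings, which is annoying. I suggest turning them off by commenting out the logger.warning calls in the following part of ppdet/data/transform/operators.py: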

class Decode(BaseOperator):
    def __init__(self):
        """ Transform the image data to numpy format following the rgb format
        """
        super(Decode, self).__init__()

    def apply(self, sample, context=None):
        """ load image if 'im_file' field is not empty but 'image' is"""
        if 'image' not in sample:
            with open(sample['im_file'], 'rb') as f:
                sample['image'] = f.read()
            sample.pop('im_file')

        try:
            im = sample['image']
            data = np.frombuffer(im, dtype='uint8')
            im = cv2.imdecode(data, 1)  # BGR mode, but need RGB mode
            if 'keep_ori_im' in sample and sample['keep_ori_im']:
                sample['ori_image'] = im
            im = cv2.cvtColor(im, cv2.COLOR_BGR2RGB)
        except:
            im = sample['image']

        sample['image'] = im
        if 'h' not in sample:
            sample['h'] = im.shape[0]
        elif sample['h'] != im.shape[0]:
            # logger.warning(
            #     "The actual image height: {} is not equal to the "
            #     "height: {} in annotation, and update sample['h'] by actual "
            #     "image height.".format(im.shape[0], sample['h']))
            sample['h'] = im.shape[0]
        if 'w' not in sample:
            sample['w'] = im.shape[1]
        elif sample['w'] != im.shape[1]:
            # logger.warning(
            #     "The actual image width: {} is not equal to the "
            #     "width: {} in annotation, and update sample['w'] by actual "
            #     "image width.".format(im.shape[1], sample['w']))
            sample['w'] = im.shape[1]

        sample['im_shape'] = np.array(im.shape[:2], dtype=np.float32)
        sample['scale_factor'] = np.array([1., 1.], dtype=np.float32)
        return sample
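
An alternative that avoids editing library code is to filter the messages out. The sketch below assumes that PaddleDetection emits these warnings through Python's standard `logging` module and that the logger is named after the module path; both the logger name and the matched substring are assumptions that should be checked against your installed version.

```python
import logging


class DropMessages(logging.Filter):
    """Discard log records whose message contains a given substring."""

    def __init__(self, substring: str):
        super().__init__()
        self.substring = substring

    def filter(self, record: logging.LogRecord) -> bool:
        # Returning False drops the record; True lets it through.
        return self.substring not in record.getMessage()


# Assumed logger name; verify it for your PaddleDetection version.
logger = logging.getLogger("ppdet.data.transform.operators")
logger.addFilter(DropMessages("is not equal to the"))
```

A filter attached to a logger only affects records logged directly on that logger, so other PaddleDetection messages are left untouched.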

Training the Full-Precision Model

%cd ~/work/PaddleDetection-release-2.5

# Single-GPU training
# Note: set the learning rate (base_lr) to 0.05
# %env CUDA_VISIBLE_DEVICES=0
# !python tools/train.py -c configs/picodet/picodet_s_416_coco_npu.yml \
#                         --eval \
#                         --use_vdl=true \
#                         --vdl_log_dir=vdl_dir/scalar

# Multi-GPU training
# Note: set the learning rate (base_lr) to 0.2
# %env is used because "!export" runs in its own subshell and does not affect later commands
%env CUDA_VISIBLE_DEVICES=0,1,2,3
!python -m paddle.distributed.launch --gpus 0,1,2,3 \
                                    tools/train.py \
                                    -c configs/picodet/picodet_s_416_coco_npu.yml \
                                    --eval \
                                    --vdl_log_dir=vdl_dir/scalar

Model Evaluation

After training, we check how accurate the model is. These are the results for the model I trained:

Model	Input size	mAP (val, 0.5:0.95)	mAP (val, 0.5)
PicoDet-S-NPU	416*416	35.6	78.2

!python tools/eval.py -c configs/picodet/picodet_s_416_coco_npu.yml \
              -o weights=./output/picodet_s_416_coco_npu/best_model.pdparams
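
For readers new to the metric: mAP at 0.5 counts a detection as correct when its intersection over union (IoU) with a ground-truth box is at least 0.5, while mAP at 0.5:0.95 averages over stricter IoU thresholds as well. Below is a minimal IoU sketch for boxes in the COCO `[x, y, width, height]` layout; it only illustrates the overlap measure and is not the evaluation code that tools/eval.py runs.

```python
from typing import Sequence


def iou_xywh(box_a: Sequence[float], box_b: Sequence[float]) -> float:
    """IoU of two boxes given as [x_min, y_min, width, height]."""
    ax1, ay1, aw, ah = box_a
    bx1, by1, bw, bh = box_b
    ax2, ay2 = ax1 + aw, ay1 + ah
    bx2, by2 = bx1 + bw, by1 + bh
    # Overlap is zero when the boxes do not intersect on either axis.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


# Two 10x10 boxes shifted by 5 pixels overlap by one third.
print(iou_xywh([0, 0, 10, 10], [5, 0, 10, 10]))
```

With an IoU of about 0.33, this pair would not count as a match at the 0.5 threshold.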

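Running Inference with the Trained Model

After evaluation, we can try running inference with the trained model. Here is the inference result I obtained after training:

As you can see, the result is quite good: most of the heads in the classroom are detected.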

%cd ~/work/PaddleDetection-release-2.5
!python tools/infer.py -c configs/picodet/picodet_s_416_coco_npu.yml \
                        -o use_gpu=true \
                        --infer_img=./2.jpeg

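Exporting the Inference Model

Use the following command to export the inference model for full quantization training. By default the exported model is saved under the output_inference folder and consists of the *.pdmodel and *.pdiparams files used for full quantization.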

%cd ~/work/PaddleDetection-release-2.5
!python tools/export_model.py \
        -c configs/picodet/picodet_s_416_coco_npu.yml \
        -o weights=./output/picodet_s_416_coco_npu/best_model.pdparams
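
Before moving on to quantization it can help to confirm that the export actually produced both files. The helper below is a hypothetical convenience, not a PaddleDetection tool; the directory and file names are the ones this article uses in the quantization config (output_inference/picodet_s_416_coco_npu, model.pdmodel, model.pdiparams).

```python
from pathlib import Path
from typing import List, Sequence


def missing_model_files(
    model_dir: str,
    names: Sequence[str] = ("model.pdmodel", "model.pdiparams"),
) -> List[str]:
    """Return the expected inference files that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in names if not (root / name).is_file()]


missing = missing_model_files("output_inference/picodet_s_416_coco_npu")
if missing:
    print("Missing files:", ", ".join(missing))
else:
    print("Exported model looks complete.")
```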

Quantizing the Model

Following the full quantization training guide, once the model is trained we quantize it, converting it from fp32 to int8.
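
To give an intuition for what converting fp32 to int8 means, here is a sketch of symmetric abs-max quantization on a handful of numbers. It is a simplified illustration and not PaddleSlim's implementation: the quantization-aware training used in this article inserts simulated quantization into the network and keeps training, whereas this example only rounds fixed values. The function names and sample weights are made up.

```python
from typing import List, Tuple


def quantize_int8(values: List[float]) -> Tuple[List[int], float]:
    """Symmetric abs-max quantization of floats to the range [-127, 127]."""
    abs_max = max(abs(v) for v in values)
    scale = abs_max / 127.0 if abs_max > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized: List[int], scale: float) -> List[float]:
    """Map int8 codes back to approximate float values."""
    return [q * scale for q in quantized]


weights = [0.6, -1.0, 0.2, 0.0]
codes, scale = quantize_int8(weights)
print(codes)                     # [76, -127, 25, 0]
print(dequantize(codes, scale))  # close to the original weights
```

Each restored value differs from the original by at most half a quantization step (`scale / 2`), which is the rounding error that quantization training teaches the network to tolerate.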

Installing the Packages Required for Quantization

!pip install pyzmq==18.1.1
!pip install paddleslim==2.3.0
# !pip install paddledet==2.4.0

Modifying the Quantization Configuration Files

Modifying picodet_reader.yml

To quantize, first modify deploy/auto_compression/configs/picodet_reader.yml so that the system can read the dataset.

metric: COCO
num_classes: 1


# Dataset configuration
TrainDataset:
  !COCODataSet
    image_dir: JPEGImages
    anno_path: voc_train.json
    dataset_dir: ./dataset/MyDataset

EvalDataset:
  !COCODataSet
    image_dir: JPEGImages
    anno_path: voc_val.json
    dataset_dir: ./dataset/MyDataset

worker_num: 6
eval_height: &eval_height 416
eval_width: &eval_width 416
eval_size: &eval_size [*eval_height, *eval_width]

EvalReader:
  sample_transforms:
  - Decode: {}
  - Resize: {interp: 2, target_size: *eval_size, keep_ratio: False}
  - NormalizeImage: {mean: [0, 0, 0], std: [1, 1, 1], is_scale: True}
  - Permute: {}
  batch_transforms:
  - PadBatch: {pad_to_stride: 32}
  batch_size: 8
  shuffle: false

After editing, copy this file into PaddleDetection's configs directory:

%cd ~/work/PaddleDetection-release-2.5
!cp ./deploy/auto_compression/configs/picodet_reader.yml ./configs

Modifying picodet_s_qat_dis.yaml

After picodet_reader.yml, we also need to modify deploy/auto_compression/configs/picodet_s_qat_dis.yaml so that the system can load the model. This is my configuration after the change:

Global:
  reader_config: ./configs/picodet_reader.yml
  input_list: ['image', 'scale_factor']
  Evaluation: True
  model_dir: output_inference/picodet_s_416_coco_npu
  model_filename: model.pdmodel
  params_filename: model.pdiparams

Distillation:
  alpha: 1.0
  loss: l2

Quantization:
  use_pact: true
  activation_quantize_type: 'moving_average_abs_max'
  weight_bits: 8
  activation_bits: 8
  quantize_op_types:
  - conv2d
  - depthwise_conv2d

TrainConfig:
  train_iter: 8000
  eval_iter: 1000
  learning_rate:  
    type: CosineAnnealingDecay
    learning_rate: 0.00001
    T_max: 8000
  optimizer_builder:
    optimizer:
      type: SGD
    weight_decay: 4.0e-05
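
The TrainConfig above uses a cosine annealing schedule: the learning rate starts at 0.00001 and decays along half a cosine wave over T_max = 8000 iterations. The sketch below evaluates the standard cosine annealing formula with the values from this config; the minimum learning rate of 0 is an assumption, since the config does not set one, and the function is an illustration rather than Paddle's own scheduler.

```python
import math


def cosine_annealing_lr(step: int, base_lr: float, t_max: int,
                        eta_min: float = 0.0) -> float:
    """Learning rate after `step` iterations of cosine annealing."""
    cosine = math.cos(math.pi * step / t_max)
    return eta_min + 0.5 * (base_lr - eta_min) * (1.0 + cosine)


# Values taken from the config: learning_rate 0.00001, T_max 8000.
for step in (0, 2000, 4000, 6000, 8000):
    print(step, cosine_annealing_lr(step, base_lr=1e-5, t_max=8000))
```

The rate equals the base value at step 0, is about half of it at step 4000, and reaches roughly zero at step 8000, which is also where train_iter ends.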

Running Quantization

%cd ~/work/PaddleDetection-release-2.5
%env CUDA_VISIBLE_DEVICES=0
!python ./deploy/auto_compression/run.py --config_path=./deploy/auto_compression/configs/picodet_s_qat_dis.yaml \
                                        --save_dir='./output/'

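Evaluation after Quantization

After quantization, we still need to evaluate the accuracy.

Modifying picodet_s_qat_dis.yaml Again

The quantized model is saved in a different location, so before evaluating, modify deploy/auto_compression/configs/picodet_s_qat_dis.yaml again. My configuration is: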

Global:
  reader_config: ./configs/picodet_reader.yml
  input_list: ['image', 'scale_factor']
  Evaluation: True
  model_dir: output
  model_filename: model.pdmodel
  params_filename: model.pdiparams

Distillation:
  alpha: 1.0
  loss: l2

Quantization:
  use_pact: true
  activation_quantize_type: 'moving_average_abs_max'
  weight_bits: 8
  activation_bits: 8
  quantize_op_types:
  - conv2d
  - depthwise_conv2d

TrainConfig:
  train_iter: 8000
  eval_iter: 1000
  learning_rate:  
    type: CosineAnnealingDecay
    learning_rate: 0.00001
    T_max: 8000
  optimizer_builder:
    optimizer:
      type: SGD
    weight_decay: 4.0e-05

Evaluating on the Validation Set

%env CUDA_VISIBLE_DEVICES=0
!python ./deploy/auto_compression/eval.py --config_path=./deploy/auto_compression/configs/picodet_s_qat_dis.yaml

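Changelog

V0.0 -> 2022-09-11

Initial release of the basic training code
Added multi-GPU training

V0.1 -> 2022-09-11

Fixed an error in dataset splitting
Added quantization training code

V0.2 -> 2022-09-11

Added quantization training code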

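Summary

We completed an initial round of training with PicoDet-NPU and obtained good results. The model looks good on the validation set, with mAP@0.5 reaching 78.2%. However, this figure is somewhat inflated: it only reflects this particular dataset, and real-world performance still needs improvement.

For example, given the following input image:

we get this output image:

Shown the "鸡你太美" meme image, the model is "captivated": it labels the basketball as a head, with a confidence score as high as 80%. This suggests that the original dataset does not cover enough scene types, so the model overfits to some degree. I plan to retrain on a larger dataset later.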


This article is a repost.
Original project link
