基于PaddleLite实现yolov5的移动端部署

本项目是参加飞桨黑客松二期第63号任务，也是本人是上一个项目的续集，将yolov5部署到安卓手机

AI Studio

2258人浏览 · 2022-07-16 23:06:17

AI Studio · 2022-07-16 23:06:17 发布

项目背景

本项目是参加PaddlePaddle Hackathon第二期活动的任务, 在此分享将yolov5模型部署在手机端上的一些实践记录和踩坑经历。

任务目标：

参考ssd_mobilnetv1目标检测的 Android demo，使用 yolo_v5 模型在安卓手机上完成 demo 开发，输入为摄像头实时视频流，输出为包含检测框的视频流；在界面上添加一个 backend 选择开关，用户可以选择将模型运行在 CPU 或 GPU 上，保证分别运行 CPU 和 GPU 结果均正确。

项目任务链接： https://github.com/PaddlePaddle/Paddle/issues/40234

GitHub 任务提交链接: https://github.com/PaddlePaddle/Paddle-Lite-Demo/pull/237

基于之前项目的后续开发

链接： https://aistudio.baidu.com/aistudio/projectdetail/3452279

导出nb文件

基于之前的项目稍微做了调整，代码已上传至GitHub: https://github.com/thunder95/yolov5_paddle_prune

%cd /home/aistudio/
!rm -rf yolov5_paddle_prune/
!git clone  https://github.com/thunder95/yolov5_paddle_prune.git
%cd yolov5_paddle_prune
!git checkout android

# 安装依赖库
!pip install gputil==1.4.0 pycocotools terminaltables
!mkdir -p /home/aistudio/.config/thunder95/
!cp /home/aistudio/Arial.ttf  /home/aistudio/.config/thunder95/

数据集

本项目不需要训练, 但需要解压并修改数据集加载的配置文件

%cd /home/aistudio/data/data127016/
!unzip /home/aistudio/data/data127016/yolov5_mot20.zip
!python create_txt.py
!sed -i 's|\/f\/dataset\/person\/out|\/home\/aistudio\/data\/data127016|g' /home/aistudio/yolov5_paddle_prune/data/coco.yaml

在models/common.py中修改，基于PaddleAPI实现PaddleLite不支持的算子silu：

Error: This model is not supported, because 1 ops are not supported on ‘arm’. These unsupported ops are: ‘silu’

在这里插入图片描述

修改原后处理逻辑

遇到的问题一：

[F 4/10 8:51:53.568 …-Lite/lite/kernels/opencl/image_helper.h:72 DefaultGlobalWorkSize] Unsupport DefaultGlobalWorkSize with tensor_dim.size():5, image_shape.size():2

所提Issue(https://github.com/PaddlePaddle/Paddle-Lite/issues/8803)反馈, 在PaddleLite中使用opencl不支持>4维的算子

遇到的问题二：

使用paddle.max和paddle.argmax组合会跟原网络输出无法对齐

所提Issue(https://github.com/PaddlePaddle/Paddle-Lite/issues/8647)反馈, 官方待修复这个问题。

修改models/yolo.py中检测头的foward部分

class Detect(nn.Layer):
    def __init__(self, nc=80, anchors=(), ch=(), inplace=True):  # detection layer
        super().__init__()
        self.onnx_dynamic = False  # ONNX export parameter
        self.stride = None  # strides computed during build
        self.nc = nc  # number of classes
        self.no = nc + 5  # number of outputs per anchor
        self.nl = len(anchors)  # number of detection layers
        self.na = len(anchors[0]) // 2  # number of anchors
        self.grid = [paddle.zeros([1])] * self.nl  # init grid
        self.anchor_grid = [paddle.zeros([1])] * self.nl  # init anchor grid
        self.register_buffer('anchors', paddle.to_tensor(anchors).astype('float32').reshape([self.nl, -1, 2]))  # shape(nl,na,2)
        self.m = nn.LayerList(nn.Conv2D(x, self.no * self.na, 1) for x in ch)  # output conv
        self.inplace = inplace  # use in-place ops (e.g. slice assignment)
    def forward(self, x):
        z = []  # inference output
        for i in range(self.nl):
            x[i] = self.m[i](x[i])  # conv
            bs, _, ny, nx = x[i].shape  # x(bs,255,20,20) to x(bs,3,20,20,85)
            x[i] = x[i].reshape([bs, self.na, self.no, ny * nx]).transpose([0, 1, 3, 2])
            if not self.training:  # inference
                y = F.sigmoid(x[i])
                z.append(y)
        return x if self.training else z

修改detect.py中提取boxes代码，将在andriod端用c实现

boxes = []  # xywh, score, cls
for l in range(len(pred)):
    pred[l] = pred[l].numpy()
    bs, ch, nd, cn = pred[l].shape
    rows = dim_num[l]
    print(l, rows)
    for b in range(bs):
        for c in range(ch):
            for r in range(rows):
                offset = r * row_num
                max_cls_val = 0
                max_cls_id = 0
                score = pred[l][b, c, r, 4]
                if score < conf_thres:
                    continue

                for i in range(80):
                    if pred[l][b, c, r, i + 5] > max_cls_val:
                        max_cls_val = pred[l][b, c, r, i + 5]
                        max_cls_id = i

                score *= max_cls_val
                if score < conf_thres:
                    continue

                y = int(r / xdim[l])
                x = int(r % xdim[l])

                tmp_box = [
                    (pred[l][b, c, r, 0] * 2 - 0.5 + x) * strides[l],
                    (pred[l][b, c, r, 1] * 2 - 0.5 + y) * strides[l],  # i, y, x, 2
                    (pred[l][b, c, r, 2] * 2) ** 2 * anchors[l][c * 2],
                    (pred[l][b, c, r, 3] * 2) ** 2 * anchors[l][c * 2 + 1],
                    max_cls_id,
                    score,  # scores
                ]

                tmp_box[0] = tmp_box[0] - tmp_box[2] / 2  # x1
                tmp_box[1] = tmp_box[1] - tmp_box[3] / 2  # y1
                tmp_box[2] = tmp_box[0] + tmp_box[2]  # x2
                tmp_box[3] = tmp_box[1] + tmp_box[3]  # y2

                boxes.append(tmp_box)

# 导出静态图模型
%cd /home/aistudio/yolov5_paddle_prune
!python convert_static.py --weights ./weights/yolov5n.pdparams --cfg models/yolov5n.yaml --data ./data/coco.yaml --source crop.jpg

生成NB文件

安装PaddleLite优化工具paddle_lite_opt，并通过命令生成nb权重文件。

本项目推荐编译安装release/v2.11的PaddleLite, 参考issue:https://github.com/PaddlePaddle/Paddle-Lite/issues/9168

git clone https://github.com/PaddlePaddle/Paddle-Lite.git

cd Paddle-Lite

git checkout release/v2.11

export NDK_ROOT=~/Android/Sdk/ndk/24.0.8215888/

./lite/tools/build_android.sh --with_opencl=ON --arch=armv8 --with_extra=ON --with_log=ON

%cd /home/aistudio/yolov5_paddle_prune/
!pip install paddlelite==2.11

# cpu
!paddle_lite_opt --model_file=./simple_net.pdmodel --param_file=simple_net.pdiparams  --optimize_out_type=naive_buffer --optimize_out=./yolov5n_cpu --valid_targets=arm

#gpu
!paddle_lite_opt --model_file=./simple_net.pdmodel --param_file=simple_net.pdiparams  --optimize_out_type=naive_buffer --optimize_out=./yolov5n_gpu --valid_targets=opencl,arm

/home/aistudio/yolov5_paddle_prune
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
Collecting paddlelite==2.11
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/8b/d7/8babc059bff1d02dc85e32fb8bff3c56680db5f5b20246467f9fcbcfe62c/paddlelite-2.11-cp37-cp37m-manylinux1_x86_64.whl (47.1 MB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m47.1/47.1 MB[0m [31m16.5 MB/s[0m eta [36m0:00:00[0m00:01[0m00:01[0m
[?25hInstalling collected packages: paddlelite
Successfully installed paddlelite-2.11
[33mWARNING: You are using pip version 22.0.4; however, version 22.1.2 is available.
You should consider upgrading via the '/opt/conda/envs/python35-paddle120-env/bin/python -m pip install --upgrade pip' command.[0m[33m
[0mLoading topology data from ./simple_net.pdmodel
Loading params data from simple_net.pdiparams
1. Model is successfully loaded!
2. Model is optimized and saved into ./yolov5n_cpu.nb successfully
Loading topology data from ./simple_net.pdmodel
Loading params data from simple_net.pdiparams
1. Model is successfully loaded!
2. Model is optimized and saved into ./yolov5n_gpu.nb successfully

Adb shell运行

安裝Adb软件（以ubuntu为例）

准备一部测试用的安卓手机armv8版本，处理器Qualcomm Snapdrgon 450。

系统 -> 版本号 -> 连续点击7次，开启开发模式
设置 -> 搜索开发人员选项 -> 打开usb调试
连接安卓手机，测试连接状态

 sudo apt update
 sudo apt install -y wget adb
 adb devices

下方出现设备，表示设备连接成功

List of devices attached
XKKBB19A10231894	device

修改本地NDK配置:
export NDK_ROOT=~/Android/Sdk/ndk/24.0.8215888

关键代码

下载代码：
git clone https://github.com/PaddlePaddle/Paddle-Lite-Demo.git

注意: 之前的版本opencl会存在精度对齐的问题，需要切换到develop或者release/v2.11，参考本人issue: https://github.com/PaddlePaddle/Paddle-Lite/issues/8807

下载, 解压并拷贝预编译库到Paddle-Lite-Demo项目: https://paddle-lite.readthedocs.io/zh/develop/quick_start/release_lib.html

/cxx/include$ cp ./* /home/hulei/projects/tmp/Paddle-Lite-Demo/libs/android/cxx/include/
/cxx/lib$ cp libpaddle_light_api_shared.so /home/hulei/projects/tmp/Paddle-Lite-Demo/libs/android/cxx/libs/arm64-v8a/

模型加载:

  // 1. Set MobileConfig
  MobileConfig config;
  config.set_model_from_file(model_file);
  std::cout << "model_file: " << model_file << std::endl;
  config.set_power_mode(static_cast<paddle::lite_api::PowerMode>(power_mode));
  config.set_threads(thread_num);

  // 2. Create PaddlePredictor by MobileConfig
  std::shared_ptr<PaddlePredictor> predictor =
      CreatePaddlePredictor<MobileConfig>(config);

  // 3. Prepare input data from image
  // read img and pre-process
  std::unique_ptr<Tensor> input_tensor0(std::move(predictor->GetInput(0)));
  input_tensor0->Resize({1, 3, height, width});
  auto *data0 = input_tensor0->mutable_data<float>();
  cv::Mat img = imread(img_path, cv::IMREAD_COLOR);
  pre_process(img, width, height, data0);

  // 4. Run predictor
  double first_duration{-1};
  for (size_t widx = 0; widx < warmup; ++widx) {
    if (widx == 0) {
      auto start = GetCurrentUS();
      predictor->Run();
      first_duration = (GetCurrentUS() - start) / 1000.0;
    } else {
      predictor->Run();
    }
  }

数据输入预处理

void pre_process(const cv::Mat &img_ori, int width, int height, float *data) {
  cv::Mat img = img_ori.clone();
  int w, h, x, y;
  int channelLength = width * height;

  float r_w = width / (img.cols * 1.0);
  float r_h = height / (img.rows * 1.0);
  if (r_h > r_w) {
    w = width;
    h = r_w * img.rows;
    x = 0;
    y = (height - h) / 2;
  } else {
    w = r_h * img.cols;
    h = height;
    x = (width - w) / 2;
    y = 0;
  }

  cv::Mat re(h, w, CV_8UC3);
  cv::resize(img, re, re.size(), 0, 0, cv::INTER_CUBIC);
  cv::Mat out(height, width, CV_8UC3, cv::Scalar(128, 128, 128));
  re.copyTo(out(cv::Rect(x, y, re.cols, re.rows)));

  // split channels
  out.convertTo(out, CV_32FC3, 1. / 255.);
  cv::Mat input_channels[3];
  cv::split(out, input_channels);
  for (int j = 0; j < 3; j++) {
    memcpy(data + width * height * j, input_channels[2 - j].data,
           channelLength * sizeof(float));
  }
}

后处理代码

void post_process(std::shared_ptr<PaddlePredictor> predictor, float thresh,
                  std::vector<std::string> class_names, const cv::Mat &image,
                  int in_width, int in_height) { // NOLINT
  const int strides[3] = {8, 16, 32};
  const int anchors[3][6] = {{10, 13, 16, 30, 33, 23},
                             {30, 61, 62, 45, 59, 119},
                             {116, 90, 156, 198, 373, 326}};
  std::map<int, std::vector<Object>> raw_outputs;
  float r_w = in_width / static_cast<float>(image.cols);
  float r_h = in_height / static_cast<float>(image.rows);
  float r, off_x, off_y;
  if (r_h > r_w) {
    r = r_w;
    off_x = 0;
    off_y = static_cast<int>((in_height - r_w * image.rows) / 2);
  } else {
    r = r_h;
    off_y = 0;
    off_x = static_cast<int>((in_width - r_h * image.cols) / 2);
  }

  for (int k = 0; k < 3; k++) {
    std::unique_ptr<const Tensor> output_tensor(
        std::move(predictor->GetOutput(k)));
    auto *outptr = output_tensor->data<float>();
    auto shape_out = output_tensor->shape();
    std::vector<int> shape_new(shape_out.begin(), shape_out.end());
    int xdim = static_cast<int>(in_width / strides[k]);
    extract_boxes(outptr, &raw_outputs, strides[k], anchors[k], shape_new, r,
                  thresh, off_x, off_y, xdim);
  }

  std::vector<Object> outs;
  nms(&raw_outputs, &outs, 0.45);
}

打开日志调试

模型可能算子不支持，也可能有其他问题，需要更多详细的日志信息，这是需要打开日志模式。下载PaddleLite源码重新编译：

./lite/tools/build_android_armv8.sh --with_opencl=ON --arch=armv8 --with_extra=ON --with_log=ON

运行Yolov5n推理

 cd Paddle-Lite-Demo/libs
 # 下载所需要的 Paddle Lite 预测库
 sh download.sh
 cd ../object_detection/assets
 # 下载OPT 优化后模型、测试图片、标签文件
 sh download.sh
 cd ../android/app/shell/cxx/yolov5n_detection
 # 更新 NDK_ROOT 路径，然后完成可执行文件的编译和运行
 sh build.sh
 # CMakeList.txt 里的 System 默认设置是linux；如果在Mac 运行，则需将 CMAKE_SYSTEM_NAME 变量设置为 drawn

cpu推理性能:

在这里插入图片描述

推理结果：

在这里插入图片描述

gpu推理性能: shell下相比cpu测试没有明显的性能提升，输出精度与cpu对齐

在这里插入图片描述

部署到Anroid手机

参考教程: https://aistudio.baidu.com/aistudio/projectdetail/3431580

注意：如果您的 Android Studio 尚未配置 NDK ，请根据 Android Studio 用户指南中的安装及配置 NDK 和 CMake 内容，预先配置好 NDK 。您可以选择最新的 NDK 版本，或者使用 Paddle Lite 预测库版本一样的 NDK

使用AndroidStudio打开工程目录: https://github.com/PaddlePaddle/Paddle-Lite-Demo/tree/develop/object_detection/android/shell/cxx/yolov5n_detection

更新预测库到demo工程中:

/java/jar$ cp PaddlePredictor.jar yolov5n_detection_demo/app/PaddleLite/java/

/java/so$ cp libpaddle_lite_jni.so yolov5n_detection_demo/app/PaddleLite/java/libs/arm64-v8a/

cxx/include$ cp ./* yolov5n_detection_demo/app/PaddleLite/cxx/include/

cxx/lib$ cp libpaddle_light_api_shared.so yolov5n_detection_demo/app/PaddleLite/cxx/libs/arm64-v8a/

启动Android部署到到手机上

在性能不错的小米手机上可达到50fps

基于裁剪的模型进行部署

使用上一个项目裁剪训练的权重: yolov5_paddle_prune/weights/finetune.pdparams

转换静态图时需要适配cfg配置文件，将裁剪后的模型通过cfg配置文件加载

相比之前的nb文件大小，裁剪后从 7.5MB 降低到 2.8MB.

%cd /home/aistudio/yolov5_paddle_prune
!python convert_static_prune.py --weights ./weights/finetune.pdparams --cfg cfg/prune_0.5_keep_0.01_8x_yolov5n_v6_person.cfg --data ./data/coco_person.yaml --source crop.jpg

/home/aistudio/yolov5_paddle_prune
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/matplotlib/__init__.py:107: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
  from collections import MutableMapping
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/matplotlib/rcsetup.py:20: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
  from collections import Iterable, Mapping
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/matplotlib/colors.py:53: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
  from collections import Sized
[34m[1mconvert_static_prune: [0mweights=['./weights/finetune.pdparams'], cfg=cfg/prune_0.5_keep_0.01_8x_yolov5n_v6_person.cfg, source=crop.jpg, imgsz=[320, 320], conf_thres=0.01, iou_thres=0.6, max_det=1000, device=, view_img=False, save_txt=False, save_conf=False, save_crop=False, nosave=False, classes=None, agnostic_nms=False, augment=False, visualize=False, update=False, project=runs/detect, name=exp, exist_ok=False, line_thickness=3, hide_labels=False, hide_conf=False, dnn=False, single_cls=False, data=./data/coco_person.yaml, hyp=data/hyps/hyp.scratch.yaml
[31m[1mrequirements:[0m tqdm>=4.36.1 not found and is required by YOLOv5, attempting auto-update...

# 按照同样的方式生成nb文件， 并部署到同样的安卓手机,  paddle-lite-demo代码不需要额外的改动
%cd /home/aistudio/yolov5_paddle_prune/
!pip install paddlelite==2.11

# cpu
!paddle_lite_opt --model_file=./prune_net.pdmodel --param_file=prune_net.pdiparams  --optimize_out_type=naive_buffer --optimize_out=./yolov5n_cpu_prune --valid_targets=arm

#gpu
!paddle_lite_opt --model_file=./prune_net.pdmodel --param_file=prune_net.pdiparams  --optimize_out_type=naive_buffer --optimize_out=./yolov5n_gpu_prune --valid_targets=opencl,arm

/home/aistudio/yolov5_paddle_prune
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
Collecting paddlelite==2.11
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/8b/d7/8babc059bff1d02dc85e32fb8bff3c56680db5f5b20246467f9fcbcfe62c/paddlelite-2.11-cp37-cp37m-manylinux1_x86_64.whl (47.1 MB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m47.1/47.1 MB[0m [31m12.8 MB/s[0m eta [36m0:00:00[0m00:01[0m00:01[0m
[?25hInstalling collected packages: paddlelite
Successfully installed paddlelite-2.11
[33mWARNING: You are using pip version 22.0.4; however, version 22.1.2 is available.
You should consider upgrading via the '/opt/conda/envs/python35-paddle120-env/bin/python -m pip install --upgrade pip' command.[0m[33m
[0mLoading topology data from ./prune_net.pdmodel
Loading params data from prune_net.pdiparams
1. Model is successfully loaded!
2. Model is optimized and saved into ./yolov5n_cpu_prune.nb successfully
Loading topology data from ./prune_net.pdmodel
Loading params data from prune_net.pdiparams
1. Model is successfully loaded!
2. Model is optimized and saved into ./yolov5n_gpu_prune.nb successfully