Binary Neural Networks, Part 2: Efficient Inference with Paddle Custom Operators


AI Studio · Published 2022-12-28 23:10:39

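Inference for Binarized Neural Networks (BNN) with Custom Operators

0 Project Overview

This project builds on the previous project, "The First Binary Neural Network on the Whole Platform" (全平台首个二值神经网络), and refactors the training and prediction code on paddle 2.3.2 to make it more readable.

It also implements the fully connected layer as a C++ custom operator that replaces multiplication with XOR, and verifies that the operator is functionally correct.

References: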

Binarized Neural Networks: Training Neural Networks with Weights and Activations Constrained to +1 or −1

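2 Review of Binarized Neural Network Principles and Implementation

Binarizing a network only binarizes its parameters and activations. The network structure is unchanged, and the probabilities produced by the output layer are still floating point.

2.1 Binarization Method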

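The binarization formula is the deterministic sign function: $x^{b}$ = sign( $x$ ), which is +1 if $x \ge 0$ and −1 otherwise.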

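2.2 Forward Propagation

First, the real-valued weights are binarized to obtain the binary weights, i.e. $W^{b}_{k}$ = sign( $W_{k}$ ). The binarized parameters are then used to compute a real-valued intermediate vector, which passes through Batch Normalization to give the real-valued hidden-layer activation vector. Unless the layer is the output layer, this vector is then binarized by the activation. The complete formula is:

$W^{b}_{k}$ = sign( $W_{k}$ )
$s_{k}$ = $x^{b}_{k-1}$ $W^{b}_{k}$
$x_{k}$ = BatchNorm( $s_{k}$ )
$x^{b}_{k}$ = sign( $x_{k}$ ), if layer k is not the output layer

Here $s_{k}$ is the real-valued intermediate vector, $x_{k}$ is the real-valued activation, and $x^{b}_{k}$ is the result of layer k after binarized activation.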
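The forward pass described above can be sketched in plain Python. This is a minimal illustration rather than the project's training code: the function name `bnn_layer_forward`, its parameters, and the toy numbers are assumptions made for the example, and batch normalization is written in its inference form with fixed statistics.

```python
import math

def sign(v):
    """Deterministic binarization: +1 for v >= 0, otherwise -1."""
    return 1.0 if v >= 0 else -1.0

def bnn_layer_forward(x_bin, weights, mean, var, gamma, beta, is_output=False, eps=1e-5):
    """One binarized fully connected layer followed by batch normalization.

    x_bin   : list of +1/-1 inputs, length n_in
    weights : real-valued weights, n_in rows by n_out columns
    mean, var, gamma, beta : per-output batch-norm statistics and parameters
    """
    n_out = len(weights[0])
    # Binarize the real-valued weights.
    w_bin = [[sign(w) for w in row] for row in weights]
    # Real-valued intermediate vector from the binarized inputs and weights.
    s = [sum(x_bin[j] * w_bin[j][k] for j in range(len(x_bin))) for k in range(n_out)]
    # Batch normalization (inference form) gives the real-valued activation.
    a = [gamma[k] * (s[k] - mean[k]) / math.sqrt(var[k] + eps) + beta[k] for k in range(n_out)]
    # Hidden layers binarize the activation; the output layer stays real-valued.
    return a if is_output else [sign(v) for v in a]

x_bin = [1.0, -1.0, 1.0]
weights = [[0.3, -0.7], [-0.2, 0.5], [0.9, 0.1]]
hidden = bnn_layer_forward(x_bin, weights, [0.0, 0.0], [1.0, 1.0], [1.0, 1.0], [0.0, 0.0])
output = bnn_layer_forward(x_bin, weights, [0.0, 0.0], [1.0, 1.0], [1.0, 1.0], [0.0, 0.0], is_output=True)
print(hidden, output)
```

Passing `is_output=True` skips the final binarization, matching the statement that the output layer stays in floating point.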

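2.3 Backward Propagation

First, note that during training the weights are stored and updated as full-precision float32 values. In the forward pass, however, the weights are binarized before they are used, and the activation function re-binarizes the floating-point features that come out of batch_norm, so that the input to the next layer is a binarized feature map.
Because the derivative of sign(x) is zero (almost) everywhere, the gradient that backpropagation computes for $W_{k}$ through sign() is zero and cannot be used directly to update the weights. To solve this, the straight-through estimator is used: gradient propagation bypasses the sign() operation, so a zero derivative is never propagated.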
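A small sketch of the straight-through idea on a single weight follows. It assumes a toy squared-error loss and hypothetical helper names (`ste_backward`, `train_step`) that do not come from the project's code. Some formulations additionally zero the gradient where the input magnitude exceeds 1; that refinement is omitted here.

```python
def sign(v):
    """Deterministic binarization: +1 for v >= 0, otherwise -1."""
    return 1.0 if v >= 0 else -1.0

def ste_backward(grad_output):
    """Straight-through estimator: treat sign() as the identity in the backward pass."""
    return grad_output

def train_step(w_real, x, target, lr=0.1):
    """One SGD step on loss = 0.5 * (sign(w) * x - target)^2 for a single weight."""
    w_bin = sign(w_real)                    # forward pass uses the binarized weight
    y = w_bin * x
    grad_y = y - target                     # dLoss/dy
    grad_w_bin = grad_y * x                 # dLoss/dw_bin
    grad_w_real = ste_backward(grad_w_bin)  # bypass sign(): the gradient reaches the float weight
    return w_real - lr * grad_w_real        # update the full-precision weight

w = 0.05
for _ in range(3):
    w = train_step(w, x=1.0, target=-1.0)
print(w, sign(w))
```

The full-precision weight is what accumulates the updates; its binarized value only flips once the float weight crosses zero.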

# Train the binary neural network (use a GPU environment)
%cd /home/aistudio/work/
!python ./LeNet_MNIST_train.py

# Run prediction with the trained parameters
%cd /home/aistudio/work/
!python ./LeNet_MNIST_predict.py

download testing data and load testing data
load finished
Evaluation accuracy: 97.2460925579071%

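3 C++ Custom-Operator Implementation of the Binary Neural Network

In the prediction program above, the weights and input feature maps of the fully connected layers are binarized, but the computation still goes through paddle's Linear layer, so it still performs multiplications. The advantage of a binary neural network is that the compute-heavy fully connected (or convolution) operations can replace multiplication with bit operations. The rest of this section shows how to implement the binary fully connected layer with bit operations in a C++ custom operator.

3.1 How the Binary Neural Network Is Implemented

Multiplication in a binary neural network has only the four cases shown in the table below:

Feature map	Weight	Result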
-1	-1	1
-1	1	-1
1	-1	-1
1	1	1

The truth table of the XOR operation is:

In1	In2	Out
True	True	False
True	False	True
False	True	True
False	False	False

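Comparing the two tables shows that mapping -1 to True and 1 to False lets XOR implement the multiplication of a binary neural network.
The +1/-1 weights therefore have to be mapped this way in advance and stored. Likewise, the original binary activation function (+1 for positive inputs, -1 for negative inputs) is changed to output False for positive inputs and True for negative inputs.
Because a fully connected layer accumulates the products, each XOR result must also be mapped back: True to -1 and False to 1. Since True=1 and False=0, this inverse mapping is simply f(x)=1-2*x.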
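The mapping can be checked exhaustively in a few lines of Python. The helper names `to_bool`, `from_bool`, and `xor_dot` are illustrative only and are not part of the project.

```python
def to_bool(v):
    """Map +1 -> False and -1 -> True."""
    return v < 0

def from_bool(b):
    """Inverse mapping f(x) = 1 - 2*x: False -> +1, True -> -1."""
    return 1 - 2 * int(b)

# Every +1/-1 product matches XOR on the mapped booleans.
for a in (-1, 1):
    for w in (-1, 1):
        assert a * w == from_bool(to_bool(a) ^ to_bool(w))

def xor_dot(xs, ws):
    """Dot product of two +1/-1 vectors using XOR instead of multiplication."""
    return sum(from_bool(to_bool(x) ^ to_bool(w)) for x, w in zip(xs, ws))

xs = [1, -1, -1, 1]
ws = [1, 1, -1, -1]
print(xor_dot(xs, ws))  # same value as sum(x * w for x, w in zip(xs, ws))
```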

3.2 Implementation of the Binary Activation Function

#include <vector>
#include "paddle/extension.h"

// Binary activation: negative inputs map to true (-1), all others to false (+1).
std::vector<paddle::Tensor> custom_binary_act_forward(const paddle::Tensor& x) {
    if (x.place() == paddle::PlaceType::kCPU) {
        auto out = paddle::Tensor(paddle::PlaceType::kCPU);
        out.reshape(x.shape());

        auto x_numel = x.size();
        auto* x_data = x.data<float>();
        // The output holds the boolean encoding, so it is allocated as bool.
        auto* out_data = out.mutable_data<bool>(x.place());

        for (int64_t i = 0; i < x_numel; ++i) {
            out_data[i] = x_data[i] < 0;
        }
        return {out};
    } else {
        PD_THROW("Not implemented.");
    }
}

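The code above is the core C++ code of the binary activation function; the complete code is in work/custom_binary_act.cc.
Note that the data type of the output out_data must be set to bool.

3.3 Implementation of the Binary Fully Connected Layer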

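#include <vector>
#include "paddle/extension.h"

// Binary fully connected layer: x is [batch, in] (bool), w is [in, out] (bool), b is [out] (float).
std::vector<paddle::Tensor> custom_bnn_linear_forward(const paddle::Tensor& x, const paddle::Tensor& w, const paddle::Tensor& b) {
    if (x.place() == paddle::PlaceType::kCPU) {
        const int64_t batch = x.shape()[0];
        const int64_t in_features = x.shape()[1];
        const int64_t out_features = w.shape()[1];

        auto out = paddle::Tensor(paddle::PlaceType::kCPU);
        out.reshape({batch, out_features});

        auto* x_data = x.data<bool>();
        auto* w_data = w.data<bool>();
        auto* b_data = b.data<float>();
        auto* out_data = out.mutable_data<float>(x.place());

        for (int64_t i = 0; i < batch; ++i) {
            for (int64_t k = 0; k < out_features; ++k) {
                float acc = 0.0f;
                for (int64_t j = 0; j < in_features; ++j) {
                    // XOR yields true (1) where the +1/-1 product is -1 and false (0) where it is +1;
                    // 1 - 2*x maps that back to -1 / +1 before accumulating.
                    acc += 1 - 2 * static_cast<float>(x_data[i * in_features + j] ^ w_data[j * out_features + k]);
                }
                // The bias stays in floating point.
                out_data[i * out_features + k] = acc + b_data[k];
            }
        }
        return {out};
    } else {
        PD_THROW("Not implemented.");
    }
}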

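The code above is the core C++ code of the binary fully connected layer; the complete code is in work/custom_bnn_linear.cc.
Note that the bias of a fully connected layer in a binary neural network is still a floating-point value, so the data type of the output out_data is set to float.

3.4 Complete Workflow

Step 1: import the custom C++ operators with just-in-time (JIT) compilation.

Note: the following code cell, which registers the custom operators, must be run (a CPU environment is recommended).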
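To make the index arithmetic of the C++ loops easier to check, here is a pure-Python reference that works on the same flat row-major layout and compares the XOR result against ordinary +1/-1 multiplication. The function names and the toy data are assumptions for illustration. The sketch also uses the identity that summing 1-2*xor over n_in terms equals n_in minus twice the number of mismatching positions.

```python
def bnn_linear(x_bits, w_bits, bias, batch, n_in, n_out):
    """Reference for the XOR-based fully connected layer on flat row-major buffers.

    x_bits: batch*n_in booleans (True = -1, False = +1)
    w_bits: n_in*n_out booleans with the same mapping
    bias  : n_out floats
    """
    out = [0.0] * (batch * n_out)
    for i in range(batch):
        for k in range(n_out):
            # Count positions where input and weight differ (their product is -1).
            mismatches = sum(x_bits[i * n_in + j] ^ w_bits[j * n_out + k] for j in range(n_in))
            # sum_j (1 - 2*xor_j) == n_in - 2 * mismatches
            out[i * n_out + k] = (n_in - 2 * mismatches) + bias[k]
    return out

def float_linear(x_bits, w_bits, bias, batch, n_in, n_out):
    """Same layer computed with ordinary +1/-1 multiplications, for comparison."""
    val = lambda b: -1.0 if b else 1.0
    return [sum(val(x_bits[i * n_in + j]) * val(w_bits[j * n_out + k]) for j in range(n_in)) + bias[k]
            for i in range(batch) for k in range(n_out)]

x_bits = [True, False, True, False, False, True]   # batch=2, n_in=3
w_bits = [False, True, True, True, False, False]   # n_in=3, n_out=2
bias = [0.5, -0.25]
result = bnn_linear(x_bits, w_bits, bias, 2, 3, 2)
print(result)
```

Counting mismatches first and applying the affine map once per output is the form that lends itself to packing bits into machine words, although the operator in this project works on one bool per element.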

%cd /home/aistudio/
import paddle
import paddle.nn as nn
from paddle.utils.cpp_extension import load
custom_ops = load(
    name="custom_jit_ops",
    sources=[
             "work/custom_bnn_linear.cc",
             "work/custom_binary_act.cc"
            ]
)

/home/aistudio
Compiling user custom op, it will cost a few seconds.....



Step 2: build the network with the custom C++ operators.

import paddle
import paddle.nn as nn
from work.LeNet_MNIST_train import *
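class LeNet_inference(paddle.nn.Layer):  # The structure is not strictly LeNet; the name is kept for convenience.
                               # Binarization loses more information than float32, so the hidden layers can be made somewhat wider.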
    def __init__(self):
        super(LeNet_inference, self).__init__()
        self.infl_ratio=1
        self.fc1 = BinarizeLinear(784, 2048*self.infl_ratio, bias_attr=True )
        self.bn1 = paddle.nn.BatchNorm1D(2048*self.infl_ratio)
        self.fc2 = BinarizeLinear(2048*self.infl_ratio, 2048*self.infl_ratio, bias_attr=True )
        self.bn2 = paddle.nn.BatchNorm1D(2048*self.infl_ratio)
        self.fc3 = BinarizeLinear(2048*self.infl_ratio, 2048*self.infl_ratio, bias_attr=True )
        self.bn3 = paddle.nn.BatchNorm1D(2048*self.infl_ratio)
        self.fc4 = BinarizeLinear(2048*self.infl_ratio, 10, bias_attr=True )
        self.act = Binary_act()

    def convert_weight(self):
        self.weight1 = paddle.Tensor(self.fc1.weight.numpy())
        self.weight2 = paddle.Tensor(self.fc2.weight.numpy())
        self.weight3 = paddle.Tensor(self.fc3.weight.numpy())
        self.weight4 = paddle.Tensor(self.fc4.weight.numpy())
        self.weight1 = custom_ops.custom_binary_act(self.weight1)
        self.weight2 = custom_ops.custom_binary_act(self.weight2)
        self.weight3 = custom_ops.custom_binary_act(self.weight3)
        self.weight4 = custom_ops.custom_binary_act(self.weight4)

    def forward(self, x):
        x = paddle.reshape(x, [-1, 28*28])
        x = nn.functional.linear(x, self.fc1.weight, self.fc1.bias)
        x = self.bn1(x)
        x = custom_ops.custom_binary_act(x)
        x = custom_ops.custom_bnn_linear(x, self.weight2, self.fc2.bias)
        x = self.bn2(x)
        x = custom_ops.custom_binary_act(x)
        x = custom_ops.custom_bnn_linear(x, self.weight3, self.fc3.bias)
        x = self.bn3(x)
        x = custom_ops.custom_binary_act(x)
        x = custom_ops.custom_bnn_linear(x, self.weight4, self.fc4.bias)
        return x

Step 3: evaluate the accuracy.

import numpy as np
import paddle
from paddle.vision.transforms import Compose, Resize, Transpose, Normalize
def main():
    state_dict = paddle.load('work/best_lenet_model.pdparams')

    paddle.seed(42)
    np.random.seed(42)
    transform = Compose([Normalize(mean=[127.5],
                                std=[127.5],
                                data_format='CHW')])
    # Normalize the dataset with transform
    print('download testing data and load testing data')
    batch_size = 512
    test_dataset = paddle.vision.datasets.MNIST(mode='test', transform=transform)
    valid_loader = paddle.io.DataLoader(test_dataset, batch_size=batch_size)
    print('load finished')

    model = LeNet_inference()
    model.eval()
    model.set_state_dict(state_dict)

    model.convert_weight()

    accuracies = []
    for batch_id, data in enumerate(valid_loader()):
        x_data = paddle.cast(data[0], 'float32')
        y_data = paddle.cast(data[1], 'int64')
        y_data = paddle.reshape(y_data, (-1, 1))
        y_predict = model(x_data)
        acc = paddle.metric.accuracy(y_predict, y_data)
        accuracies.append(np.mean(acc.numpy()))
        print("{}/{} batch acc is: {}".format(batch_id, len(valid_loader), acc.numpy()))

    avg_acc = np.mean(accuracies)
    print("Evaluation accuracy: {}%".format(avg_acc*100))
main()
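One detail of the evaluation loop is worth noting: it averages per-batch accuracies, and the MNIST test set has 10,000 images, so with a batch size of 512 the last of the 20 batches holds only 272 samples. An unweighted mean then gives that last batch slightly more influence than it deserves. A minimal sketch of the weighted alternative, with made-up accuracy numbers and a hypothetical helper name `weighted_accuracy`:

```python
def weighted_accuracy(batch_accuracies, batch_sizes):
    """Overall accuracy from per-batch accuracies, weighted by batch size."""
    correct = sum(acc * n for acc, n in zip(batch_accuracies, batch_sizes))
    return correct / sum(batch_sizes)

# Hypothetical numbers: 19 full batches of 512 and a final batch of 272 samples.
sizes = [512] * 19 + [272]
accs = [0.96] * 19 + [1.0]
plain_mean = sum(accs) / len(accs)
print(plain_mean, weighted_accuracy(accs, sizes))
```

In the loop above, the batch size would be available as the first dimension of `x_data`; the difference is small here but grows when the last batch is much smaller than the rest.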

download testing data and load testing data

Cache file /home/aistudio/.cache/paddle/dataset/mnist/t10k-images-idx3-ubyte.gz not found, downloading https://dataset.bj.bcebos.com/mnist/t10k-images-idx3-ubyte.gz 
Begin to download




Download finished
Cache file /home/aistudio/.cache/paddle/dataset/mnist/t10k-labels-idx1-ubyte.gz not found, downloading https://dataset.bj.bcebos.com/mnist/t10k-labels-idx1-ubyte.gz 
Begin to download

Download finished


load finished


W1222 08:17:49.397006   168 place.cc:147] The `paddle::PlaceType::kCPU/kGPU` is deprecated since version 2.3, and will be removed in version 2.4! Please use `Tensor::is_cpu()/is_gpu()` method to determine the type of place.
W1222 08:17:49.397060   168 place.cc:136] The `paddle::PlaceType::kCPU/kGPU` is deprecated since version 2.3, and will be removed in version 2.4! Please use `paddle::CPUPlace()/DefaultGPUPlace()` to represent the place type.
W1222 08:17:49.397073   168 tensor.cc:54] The Tensor(place) constructor is deprecated since version 2.3, and will be removed in version 2.4! Please use `paddle::empty/full` method to create a new Tensor instead. Reason: A legal tensor cannot be constructed only based on the `place`, and datatype, shape, layout, etc. is also required.
W1222 08:17:49.397099   168 tensor.cc:104] The function of resetting the shape of the uninitialized Tensor of the `reshape` method is deprecated since version 2.3, and will be removed in version 2.4, please use `paddle::empty/full` method to create a new Tensor instead. reason: `reshape` means changing the tensor shape without touching underlying data, this requires the total size of the tensor to remain constant.
W1222 08:17:49.397119   168 tensor.cc:199] Allocating memory through `mutable_data` method is deprecated since version 2.3, and `mutable_data` method will be removed in version 2.4! Please use `paddle::empty/full` method to create a new Tensor with allocated memory, and use data<T>() method to get the memory pointer of tensor instead. Reason: When calling `mutable_data` to allocate memory, the datatype, and data layout of tensor may be in an illegal state.
W1222 08:17:49.522763   168 tensor.cc:199] Allocating memory through `mutable_data` method is deprecated since version 2.3, and `mutable_data` method will be removed in version 2.4! Please use `paddle::empty/full` method to create a new Tensor with allocated memory, and use data<T>() method to get the memory pointer of tensor instead. Reason: When calling `mutable_data` to allocate memory, the datatype, and data layout of tensor may be in an illegal state.


0/20 batch acc is: [0.9746094]
1/20 batch acc is: [0.9609375]
2/20 batch acc is: [0.95703125]
3/20 batch acc is: [0.9628906]
4/20 batch acc is: [0.9453125]
5/20 batch acc is: [0.9589844]
6/20 batch acc is: [0.9765625]
