

Cross-Entropy Loss (CrossEntropyLoss)

Cross-entropy loss is a loss function used in machine learning to measure the difference between two distributions. It is widely used in supervised classification, where it quantifies the error between a model's predicted output and the true labels. In deep learning in particular, cross-entropy loss is used to train neural networks by minimizing this error, so that the model's predictions come as close as possible to the true values.

Loss Function

A loss function measures how well a model fits the data: it quantifies the gap between the actual values and the predicted values. The higher the loss, the worse the prediction; the lower the loss, the closer the prediction is to the true value. The loss is computed for each individual observation (data point). The function that averages the losses over all samples is called the cost function; put simply, the loss function applies to a single sample, while the cost function applies to the whole dataset.

Loosely speaking (informally, not rigorously), it is the gap between the model's current prediction on the training data and the actual target (for a more detailed plain-language explanation, see 机器学习之回归【李宏毅机器学习特训营】).

Cross-Entropy

Cross-entropy (Cross-Entropy) measures the distance between two probability distributions. It can be used to evaluate how good a predictive model is: in general, the better the model, the smaller the cross-entropy.

$$\ell(w)=-\frac{1}{n}\sum_{i=1}^{n}\left[y_i\log\left(\hat{y}_i\right)+(1-y_i)\log\left(1-\hat{y}_i\right)\right]$$

  • Here $\ell(w)$ is the average loss over the dataset, $y_i$ is the true label of the $i$-th sample, $\hat{y}_i$ is its predicted probability, and $n$ is the total number of samples.
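As a quick numeric illustration of the formula above, the following minimal sketch (plain NumPy; the labels and predicted probabilities are made up for this example) computes the average binary cross-entropy of four samples:

import numpy as np

y = np.array([1, 0, 1, 1], dtype=np.float64)               # true labels y_i
y_hat = np.array([0.9, 0.2, 0.7, 0.6], dtype=np.float64)   # predicted probabilities
# average binary cross-entropy, exactly as in the formula above
loss = -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))
print(loss)  # roughly 0.30; it shrinks as the predictions approach the labels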

CrossEntropyLoss API

paddle.nn.CrossEntropyLoss(weight=None, ignore_index=-100, reduction='mean', soft_label=False, axis=-1, use_softmax=True, name=None)

By default, this operator applies softmax to the input and then computes the cross-entropy loss.

Fusing the softmax operation with the cross-entropy computation yields a numerically more stable result.

When use_softmax=False, the operator computes the cross-entropy loss without applying softmax.

It can compute the cross-entropy for either hard labels or soft labels. Hard labels are the actual label values, e.g. 0, 1, 2…; soft labels are the probabilities of the labels, e.g. 0.6, 0.8, 0.2….

By default, the operator returns the mean of the per-sample results; this default behavior can be changed through the reduction parameter. See the detailed parameter descriptions below.
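As a small sketch of the reduction behavior (assuming PaddlePaddle 2.x is installed; the tensors here are invented for illustration):

import paddle

logits = paddle.to_tensor([[2.0, 0.5, 0.3], [0.1, 1.5, 0.2]])   # [N=2, C=3]
labels = paddle.to_tensor([0, 2], dtype='int64')                # one hard label per sample

for reduction in ['none', 'mean', 'sum']:
    loss_fn = paddle.nn.CrossEntropyLoss(reduction=reduction)
    print(reduction, loss_fn(logits, labels).numpy())
# 'none' returns one loss per sample, 'sum' their total, and 'mean' their average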

API documentation (Chinese)

API documentation (English)

Source code

I. Softmax cross-entropy

1. Hard labels (each sample belongs to exactly one class):

1.1. When use_softmax=True:
$$loss_j=-\text{logits}_{label_j}+\log\left(\sum_{i=0}^{C}\exp(\text{logits}_i)\right),\quad j=1,\dots,N$$ where $N$ is the number of samples and $C$ is the number of classes.

1.2. When use_softmax=False:
$$loss_j=-\log\left(P_{label_j}\right),\quad j=1,\dots,N$$ where $N$ is the number of samples, $C$ is the number of classes, and $P$ is the input (the output of a softmax).
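The two hard-label cases can be checked numerically against the functional API, as in the sketch below (assuming PaddlePaddle 2.x; the single sample and its logits are made up for illustration):

import paddle
import paddle.nn.functional as F

logits = paddle.to_tensor([[1.0, 2.0, 0.5]])        # N = 1 sample, C = 3 classes
label = paddle.to_tensor([1], dtype='int64')        # the true class index

# 1.1 use_softmax=True: loss = -logits[label] + log(sum(exp(logits)))
manual = -logits[0, 1] + paddle.log(paddle.sum(paddle.exp(logits)))
api = F.cross_entropy(logits, label, reduction='none', use_softmax=True)
print(manual.numpy(), api.numpy())   # should agree up to floating-point error

# 1.2 use_softmax=False: the input is already a probability distribution P
probs = F.softmax(logits)
manual2 = -paddle.log(probs[0, 1])
api2 = F.cross_entropy(probs, label, reduction='none', use_softmax=False)
print(manual2.numpy(), api2.numpy())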

2. Soft labels (each sample is assigned to several classes with probabilities, and the probabilities sum to 1).

2.1. When use_softmax=True:
$$loss_j=-\sum_{i=0}^{C}\text{label}_i\left(\text{logits}_i-\log\left(\sum_{i=0}^{C}\exp(\text{logits}_i)\right)\right),\quad j=1,\dots,N$$
where $N$ is the number of samples and $C$ is the number of classes.

2.2. When use_softmax=False:
$$loss_j=-\sum_{i=0}^{C}\left(\text{label}_i\cdot\log\left(P_i\right)\right),\quad j=1,\dots,N$$
where $N$ is the number of samples, $C$ is the number of classes, and $P$ is the input (the output of a softmax).
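Since $\text{logits}_i-\log\sum_k\exp(\text{logits}_k)$ is just the log-softmax, case 2.1 can be verified directly, as in the sketch below (assuming PaddlePaddle 2.x; the values are invented for illustration):

import paddle
import paddle.nn.functional as F

logits = paddle.to_tensor([[1.0, 2.0, 0.5]])        # N = 1, C = 3
soft_labels = paddle.to_tensor([[0.2, 0.7, 0.1]])   # probabilities summing to 1

# 2.1: loss = -sum_i label_i * log_softmax(logits)_i
manual = -paddle.sum(soft_labels * F.log_softmax(logits, axis=-1))
api = F.cross_entropy(logits, soft_labels, soft_label=True, reduction='none')
print(manual.numpy(), api.numpy())   # should agree up to floating-point error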

II. Weighting and reduction

  1. Weighting
    If the weight parameter is None, the computation runs in its default form.

    If weight is not None, the cross-entropy of each sample is weighted,
    with two cases depending on whether soft_label is False or True.

    1.1. Hard labels (soft_label = False)
    $$loss_j=loss_j*weight[label_j]$$
    1.2. Soft labels (soft_label = True)
    $$loss_j=loss_j*\sum_{i}\left(weight[label_i]*logits_i\right)$$

  2. Reduction

    2.1 If reduction is none, the raw per-sample results are returned.

    2.2 If reduction is sum, the sum of the previous results is returned:
    $$loss=\sum_{j}loss_j$$
    2.3 If reduction is mean, the result depends on the weight parameter, as follows.

    2.3.1. If weight is None, the mean of the previous results is returned:
    $$loss=\sum_{j}loss_j/N$$
    where $N$ is the number of samples and $C$ is the number of classes.

    2.3.2. If weight is not None, a weighted average of the previous results is returned:

    1. Hard labels (soft_label = False)
    $$loss=\sum_{j}loss_j/\sum_{j}weight[label_j]$$

    2. Soft labels (soft_label = True)
    $$loss=\sum_{j}loss_j/\sum_{j}\left(\sum_{i}weight[label_i]\right)$$
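The weighted mean in 2.3.2 (hard-label case) can be reproduced by hand from the per-sample losses, as in the sketch below (assuming PaddlePaddle 2.x; the tensors and class weights are made up for illustration):

import paddle
import paddle.nn.functional as F

logits = paddle.to_tensor([[2.0, 0.5, 0.3], [0.1, 1.5, 0.2]])
labels = paddle.to_tensor([0, 2], dtype='int64')
weight = paddle.to_tensor([1.0, 2.0, 3.0])   # one weight per class

# with reduction='none' the API already returns loss_j * weight[label_j]
per_sample = F.cross_entropy(logits, labels, weight=weight, reduction='none')
manual_mean = paddle.sum(per_sample) / (weight[0] + weight[2])   # sum(loss_j) / sum(weight[label_j])
api_mean = F.cross_entropy(logits, labels, weight=weight, reduction='mean')
print(manual_mean.numpy(), api_mean.numpy())   # should match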

Parameters

  • weight (Tensor, optional): a manually specified weight for each class. Defaults to None. If given, it must be a Tensor of size C (the number of classes), with dtype float32 or float64.

  • ignore_index (int64, optional): a label value to ignore; samples with this label do not contribute to the loss. A negative value means no label is ignored. Only valid when soft_label=False. Defaults to -100. Dtype is int64. (A short example follows this list.)

  • reduction (str, optional): how the output is reduced. One of none, mean, or sum. Defaults to mean, which returns the mean of the mini-batch loss; sum returns the sum of the mini-batch loss; none returns the per-sample loss Tensor.

  • soft_label (bool, optional): whether label is a soft label. Defaults to False, meaning label is a hard label; set soft_label=True for soft labels.

  • axis (int, optional): the dimension index along which softmax is computed. It must lie in [-1, dim-1], where dim is the number of dimensions of the input logits. Default: -1.

  • use_softmax (bool, optional): whether to apply softmax normalization to input. Default: True.

  • name (str, optional): see Name for details; usually there is no need to set this. Default: None.
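As promised above, a short sketch of ignore_index (assuming PaddlePaddle 2.x; the label value 255 and the tensors are invented for illustration). Samples whose label equals ignore_index contribute neither to the loss nor to the averaging denominator.

import paddle
import paddle.nn.functional as F

logits = paddle.to_tensor([[2.0, 0.5, 0.3], [0.1, 1.5, 0.2], [0.3, 0.3, 2.2]])
labels = paddle.to_tensor([0, 255, 2], dtype='int64')   # 255 marks the sample to ignore

loss = F.cross_entropy(logits, labels, ignore_index=255, reduction='mean')
print(loss.numpy())   # the mean over the two samples that are not ignored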

Examples

# hard labels
import paddle
paddle.seed(99999)  # fix the random seed for reproducibility
# build 2-D input data of shape [N, C], where N is the number of samples and C the number of classes
N=100  # number of samples
C=200  # number of classes
reduction='mean'  # reduction mode: return the mean of the mini-batch loss
input =  paddle.rand([N, C], dtype='float64')  # random Tensor drawn uniformly from [0, 1)
print(input[0][:20])  # look at the first 20 values of sample 0
label =  paddle.randint(0, C, shape=[N], dtype='int64')   # N random integer labels drawn uniformly from [0, C)
weight = paddle.rand([C], dtype='float64')

# input is random input data, label is random label data, weight is random per-class weight data

# define the weights and the reduction mode
cross_entropy_loss = paddle.nn.loss.CrossEntropyLoss(
    weight=weight, reduction=reduction)  # since 'mean' is the default, this is equivalent to paddle.nn.loss.CrossEntropyLoss(weight)

# run the loss on the random inputs
dy_ret = cross_entropy_loss(
                           input,
                           label)

print(dy_ret.numpy())
Tensor(shape=[20], dtype=float64, place=Place(cpu), stop_gradient=True,
       [0.56245385, 0.28769494, 0.96811303, 0.87189548, 0.58035937, 0.53197589,
        0.89708411, 0.43403399, 0.48542086, 0.57983019, 0.57505820, 0.66562431,
        0.30480728, 0.55020241, 0.20474006, 0.79212845, 0.51198698, 0.35735818,
        0.28328114, 0.17039875])
[5.35419278]
# soft labels
import paddle
paddle.seed(99999)
axis = -1
ignore_index = -100
N = 4
C = 3
shape = [N, C]
reduction='mean'
weight = None
logits = paddle.uniform(shape, dtype='float64', min=0.1, max=1.0)  # random Tensor uniformly distributed in [min, max)
labels = paddle.uniform(shape, dtype='float64', min=0.1, max=1.0)
labels /= paddle.sum(labels, axis=axis, keepdim=True)  # normalize each row so it sums to 1 (a valid soft label)
paddle_loss_mean = paddle.nn.functional.cross_entropy(
                                                      logits,
                                                      labels,
                                                      soft_label=True,  # True means soft labels
                                                      axis=axis,
                                                      weight=weight,
                                                      reduction=reduction)  # all other arguments keep their defaults
print(paddle_loss_mean.numpy())
[1.12801195]
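To sanity-check the printed value, the mean loss can be recomputed by hand from the same logits and labels (continuing the snippet above; this reuses its variables and assumes nothing beyond them):

# continuing the soft-label snippet above: recompute the mean loss by hand
manual = paddle.mean(
    -paddle.sum(labels * paddle.nn.functional.log_softmax(logits, axis=axis), axis=axis))
print(manual.numpy())  # should match paddle_loss_mean printed above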

forward(input, label): forward computation

Defines the computation performed on every call. It should be overridden by all subclasses.

Parameters

  • *inputs (tuple) - the unpacked tuple of positional arguments
  • **kwargs (dict) - the unpacked dict of keyword arguments

CrossEntropyLoss source code

# Do not run; shown for reference only!
# https://github.com/PaddlePaddle/Paddle/blob/dddc5d9d10317ff90f8f6c3ab48f6ee7a3a1a919/python/paddle/nn/layer/loss.py#L141
class CrossEntropyLoss(Layer):
    # define the parameters and their default values
    def __init__(
        self,
        weight=None,
        ignore_index=-100,
        reduction='mean',
        soft_label=False,
        axis=-1,
        use_softmax=True,
        name=None,
    ):
        super(CrossEntropyLoss, self).__init__()
        self.weight = weight
        self.reduction = reduction
        self.ignore_index = ignore_index
        self.soft_label = soft_label
        self.axis = axis
        self.use_softmax = use_softmax
        self.name = name

    # forward computation: call the functional API with the stored parameters
    def forward(self, input, label):
        ret = paddle.nn.functional.cross_entropy(
            input,
            label,
            weight=self.weight,
            ignore_index=self.ignore_index,
            reduction=self.reduction,
            soft_label=self.soft_label,
            axis=self.axis,
            use_softmax=self.use_softmax,
            name=self.name,
        )

        return ret  # return the computed loss

cross_entropy API

CrossEntropyLoss 相似,多包装了一层接口(非官方解释,为个人理解)。

The official documentation describes it as follows:

Implements the softmax cross-entropy loss function. The function fuses the softmax operation with the computation of the cross-entropy loss, which gives a numerically more stable result.

For brevity, the full description is not repeated here.
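A quick check of the wrapper relationship mentioned above (assuming PaddlePaddle 2.x; the shapes and seed are chosen only for this example):

import paddle

paddle.seed(2023)
x = paddle.rand([4, 5])                              # logits for 4 samples and 5 classes
y = paddle.randint(0, 5, shape=[4], dtype='int64')   # hard labels

loss_class = paddle.nn.CrossEntropyLoss()(x, y)
loss_func = paddle.nn.functional.cross_entropy(x, y)
print(loss_class.numpy(), loss_func.numpy())   # the two values should be identical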

cross_entropy source code

For the ideas behind the comments below, refer to the corresponding sections I.x.x or II.x.x; the relevant section is noted at each point.

Because the listing is long, the raise ValueError branches below are not analyzed one by one; they are all input-validation errors.

# Do not run; shown for reference only!
# https://github.com/PaddlePaddle/Paddle/blob/dddc5d9d10317ff90f8f6c3ab48f6ee7a3a1a919/python/paddle/nn/layer/loss.py#L1438
def cross_entropy(input,  # the input data (the model's output at computation time, hence the name input)
                  label,  # labels
                  weight=None,  # per-class weights
                  ignore_index=-100,  # label value to ignore
                  reduction='mean',  # how the output is reduced
                  soft_label=False,  # hard vs soft labels (actual values vs probabilities)
                  axis=-1,  # dimension index for softmax
                  use_softmax=True,  # whether to apply softmax normalization to input
                  name=None):
    
    # check that reduction is one of the allowed values
    if reduction not in ['sum', 'mean', 'none']:
        raise ValueError(
            "The value of 'reduction' in softmax_cross_entropy"
            "should be 'sum', 'mean' or 'none', but received %s, which is not allowed."
            % reduction)
    
    # ignore_index is incompatible with soft labels; see the soft_label parameter description
    if ignore_index > 0 and soft_label == True:
        raise ValueError(
            "When soft_label == True, the value of 'ignore_index' in softmax_cross_entropy"
            "should be '-100', but received %s, which is not allowed." %
            ignore_index)

    # check the number of dimensions of input
    input_dims = len(list(input.shape))
    if input_dims == 0:
        raise ValueError('The dimention of input should be larger than zero!')

    # check label's number of dimensions against input's
    label_dims = len(list(label.shape))
    if input_dims - 1 != label_dims and input_dims != label_dims:
        raise ValueError(
            'Expected nput_dims - 1 = label_dims or input_dims == label_dims\
             (got nput_dims{}, label_dims{})'.format(input_dims, label_dims))

    # if label has one dimension fewer than input, unsqueeze it to match
    if input_dims - 1 == label_dims:
        label = paddle.unsqueeze(label, axis=axis)


    if _non_static_mode():
        # hard labels: validate the label values
        if soft_label == False:
            # zero out ignored labels so the range checks below skip them
            valid_label = paddle.cast(
                label != ignore_index, dtype=label.dtype) * label
            
            # get the minimum and maximum label values
            label_min = paddle.min(valid_label)
            label_max = paddle.max(valid_label)

            # check that the label values lie within [0, C)
            if label_min < 0:
                raise ValueError("Target {} is out of lower bound.".format(
                    label_min.item()))
            if label_max >= input.shape[axis]:
                raise ValueError("Target {} is out of upper bound.".format(
                    label_max.item()))

        # check which device/compile target is in use
        if core.is_compiled_with_npu() or core.is_compiled_with_mlu():
            # fused softmax-with-cross-entropy op (NPU/MLU path)
            _, _, out = _C_ops.softmax_with_cross_entropy(
                input, label, 'soft_label', soft_label, 'ignore_index',
                ignore_index, 'numeric_stable_mode', True, 'axis', axis,
                'use_softmax', use_softmax)
        else:
            # default path (CPU/GPU)
            # fused softmax + cross-entropy op
            if in_dygraph_mode():
                _, out = _C_ops.final_state_cross_entropy_with_softmax(
                    input, label, soft_label, use_softmax, True, ignore_index,
                    axis)
            if _in_legacy_dygraph():
                _, out = _C_ops.softmax_with_cross_entropy(
                    input, label, 'soft_label', soft_label, 'ignore_index',
                    ignore_index, 'numeric_stable_mode', True, 'axis', axis,
                    'use_softmax', use_softmax)

        # when weight is provided
        if weight is not None:

            # transform weight from per-class to per-sample; shape N or [N,H,W] for the 1-D and 2-D cases
            # see II.1 (weighting)
            if soft_label == True:
                # chajchaj:
                # weight has shape C, where C is the number of classes.
                # 1-D case: label has shape [N,C], weight_gather has shape N.
                # 2-D case: label has shape [N,H,W,C], weight_gather has shape [N,H,W].
                weight_gather = paddle.matmul(  # tensor product of label and weight
                    x=paddle.cast(label, weight.dtype),  # cast label to weight's dtype
                    y=weight,
                    transpose_x=False,
                    transpose_y=True)
                out_shape = list(out.shape)  # shape of the loss output
                weight_gather_reshape = reshape(weight_gather, shape=out_shape)  # reshape to match the loss
                out = paddle.cast(out, weight_gather_reshape.dtype)  # dtype conversion

                out = _C_ops.elementwise_mul(out, weight_gather_reshape)

            else:
                # check that the class dimensions of input and weight match
                if input.shape[axis] != weight.shape[-1]:
                    raise ValueError(
                        "input's class_dimension({}) must equal to "
                        "weight's class_dimension({}) "
                        "when weight is provided" \
                            .format(input.shape[axis], weight.shape[-1]))
                
                # build a mask that zeroes out the ignored labels
                ignore_weight_mask = paddle.cast((label != ignore_index),
                                                 out.dtype)

                if ignore_weight_mask.ndim > 1 and ignore_weight_mask.shape[
                        axis] == 1:
                    # TODO: Temporarily use squeeze instead of squeeze_
                    # drop the size-1 dimension
                    ignore_weight_mask = paddle.squeeze(ignore_weight_mask,
                                                        axis)
                # when the softmax axis is not the last one, permute before gather_nd
                if axis != -1 and axis != valid_label.ndim - 1:
                    temp_perm = list(range(axis % valid_label.ndim)) \
                                + list(range((axis % valid_label.ndim + 1), valid_label.ndim)) \
                                + [axis % valid_label.ndim]
                    weight_gather = _C_ops.gather_nd(
                        weight, valid_label.transpose(temp_perm))
                else:
                    weight_gather = _C_ops.gather_nd(weight, valid_label)
                weight_gather = _C_ops.elementwise_mul(weight_gather,
                                                       ignore_weight_mask)
                input_shape = list(label.shape)
                weight_gather_reshape = reshape(
                    weight_gather, shape=input_shape)
                out = paddle.cast(out, weight_gather_reshape.dtype)
                out = _C_ops.elementwise_mul(out, weight_gather_reshape)

        # reduction == 'sum': total mini-batch loss
        if reduction == "sum":
            #   because of fluid_softmax_with_cross_entropy op's inner logic,
            #   in the out tensor of this op, the loss of sample with class_index==ignore_index is 0
            #   so, reduce_sum all directly is ok
            return _C_ops.reduce_sum(out, 'reduce_all', True)
        
        # reduction == 'mean': mean of the mini-batch loss
        elif reduction == "mean":
            # 1. if weight==none,
            #     numerator: reduce_sum all loss directly is ok causeof fluid_softmax_with_cross_entropy's inner logic
            #     denominator: count sample num with class_index!=ignore_index
            # 2. else
            #     numerator: loss's weighted sum
            #     denominator: cal the sum of weight where the sample's class_index!=ignore_index
            if ignore_index != -100:
                out_sum = _C_ops.reduce_sum(out, 'reduce_all', True)
                # for each label[i],set 1 or 0, according to ignore_index
                # mask[i]=0, if label[i]==ignore_index
                # mask[i]=1, otherwise
                mask = (label != ignore_index)
                
                
                if weight is None:
                    mask = paddle.cast(mask, dtype=out_sum.dtype)
                    count = _C_ops.reduce_sum(mask, 'reduce_all', True)
                    ret = out_sum / (count + (count == 0.0))
                
                else:
                    mask = paddle.cast(mask, weight_gather_reshape.dtype)
                    weight_ignored = _C_ops.elementwise_mul(
                        mask, weight_gather_reshape)
                    weight_sum = _C_ops.reduce_sum(weight_ignored, 'reduce_all',
                                                   True)
                    ret = out_sum / (weight_sum + (weight_sum == 0.0))
                return ret

            # see II.2.3.2 (weighted mean)
            elif weight is not None:
                out_sum = _C_ops.reduce_sum(out, 'reduce_all', True)
                total_weight = _C_ops.reduce_sum(weight_gather_reshape,
                                                 'reduce_all', True)
                return out_sum / (total_weight + (total_weight == 0.0))
            
            # see II.2.3.1 (plain mean)
            else:
                return _C_ops.mean(out)

        # reduction == 'none': return the raw per-sample results
        else:
            if input_dims - 1 == label_dims:
                out = paddle.squeeze(out, axis=axis)  # drop the size-1 dimension
            return out

    fluid.data_feeder.check_variable_and_dtype(
        input, 'input', ['float32', 'float64'], 'softmax_cross_entropy')
    fluid.data_feeder.check_variable_and_dtype(
        label, 'label',
        ['uint8', 'int8', 'int16', 'int32', 'int64', 'float32', 'float64'],
        'softmax_cross_entropy')
    attrs = {
        'soft_label': soft_label,
        'ignore_index': ignore_index,
        'numeric_stable_mode': True,
        'axis': axis,
        'use_softmax': use_softmax
    }
    helper = LayerHelper('softmax_with_cross_entropy', **locals())
    softmax = helper.create_variable_for_type_inference(dtype=input.dtype)
    out = helper.create_variable_for_type_inference(dtype=input.dtype)

    outputs = {'Softmax': softmax, 'Loss': out}
    if core.is_compiled_with_npu() or core.is_compiled_with_mlu():
        backprop = helper.create_variable_for_type_inference(dtype=input.dtype)
        outputs['Backprop'] = backprop
    helper.append_op(
        type='softmax_with_cross_entropy',
        inputs={'Logits': input,
                'Label': label},
        outputs=outputs,
        attrs=attrs)

    # see II.1 (weighting); static-graph branch
    if weight is not None:
        fluid.data_feeder.check_variable_and_dtype(
            weight, 'weight', ['float32', 'float64'], 'softmax_cross_entropy')
        weight_name = name if reduction == 'none' else None
        if soft_label == True:
            # chajchaj:
            # trans weight from class to sample, shape:N or [N,H,W] for 1d and 2d cases.
            # weight's shape is C, where C is class num.
            # for 1d case: label's shape is [N,C], weight_gather's shape is N.
            # for 2d case: label's shape is [N,H,W,C], weight_gather's shape is [N,H,W].
            weight_gather = paddle.matmul(
                x=paddle.cast(label, weight.dtype),
                y=weight,
                transpose_x=False,
                transpose_y=True)

            out_shape = list(out.shape)
            weight_gather_reshape = reshape(weight_gather, shape=out_shape)
            out = paddle.cast(out, weight_gather_reshape.dtype)
        # II.1.2
        else:
            if input.shape[axis] != weight.shape[-1]:
                raise ValueError("input's class_dimension({}) must equal to "
                                 "weight's class_dimension({}) "
                                 "when weight is provided" \
                                 .format(input.shape[axis], weight.shape[-1]))

            valid_label = paddle.multiply(
                paddle.cast(
                    label != ignore_index, dtype=label.dtype), label)
            ignore_weight_mask = paddle.cast((label != ignore_index),
                                             input.dtype)
            if ignore_weight_mask.ndim > 1 and ignore_weight_mask.shape[
                    axis] == 1:
                ignore_weight_mask = paddle.squeeze(ignore_weight_mask, axis)
            if axis != -1 and axis != valid_label.ndim - 1:
                temp_perm = list(range(axis % valid_label.ndim)) \
                            + list(range((axis % valid_label.ndim + 1), valid_label.ndim)) \
                            + [axis % valid_label.ndim]
                weight_gather = paddle.gather_nd(
                    weight, paddle.transpose(valid_label, temp_perm))
            else:
                weight_gather = paddle.gather_nd(weight, valid_label)
            weight_gather = paddle.multiply(weight_gather, ignore_weight_mask)

            input_shape = list(label.shape)
            weight_gather_reshape = reshape(weight_gather, shape=input_shape)
        out = paddle.multiply(out, weight_gather_reshape, name=weight_name)

    if reduction == "sum":
        return paddle.sum(out, name=name)
    elif reduction == "mean":
        if ignore_index != -100:
            out_sum = paddle.sum(out, name=name)
            # for each label[i],set 1 or 0, according to ignore_index
            # mask[i]=0, if label[i]==ignore_index
            # mask[i]=1, otherwise
            mask = (label != ignore_index)
            if (weight is None):
                mask = paddle.cast(mask, dtype=out_sum.dtype)
                count = paddle.sum(mask, name=name)
                ret = out_sum / (count + (count == 0.0))
            else:
                mask = paddle.cast(mask, weight_gather_reshape.dtype)
                weight_ignored = paddle.multiply(mask, weight_gather_reshape)
                weight_sum = paddle.sum(weight_ignored, name=name)
                ret = out_sum / (weight_sum + (weight_sum == 0.0))
            return ret
        elif weight is not None:
            out_sum = paddle.sum(out, name=name)
            total_weight = paddle.sum(weight_gather_reshape)
            return out_sum / (total_weight + (total_weight == 0.0))
        else:
            return paddle.mean(out, name=name)

    else:
        if input_dims - 1 == label_dims:
            out = paddle.squeeze(out, axis=axis)

        return out        

骑驴看代码 (the code-reading series)

This series aims to build familiarity with commonly used operators: their principles, how they are composed, and how the code comes together.

The author's coding skills and documentation-reading level are limited, so corrections are very welcome. If you spot problems, please leave a comment and they will be fixed as soon as possible.

Some additional foundational theory on the cross-entropy loss remains; it will be merged in once it has been organized.

This is 三岁 with a new series; your support and reminders to keep updating are appreciated, and the hope is to see the series through to the end.

Note: the code shown here is from PaddlePaddle==2.4.0; problems encountered while completing this project will be reported as issues to the official repository.
