【AI达人特训营】高可信度航班延误预测模型及模型可解释性

本项目利用使用简单的MLP模型完成航班延误预测任务，并探究该模型的可解释性。进一步使用bootstrapping构建高可信度航班延误预测的集成模型。

AI Studio

1001人浏览 · 2022-07-01 22:47:08

AI Studio · 2022-07-01 22:47:08 发布

高可信度航班延误预测模型及模型可解释性

在这里插入图片描述

随着航空运输市场需求量的不断增加，在规模较大的机场，航班易出现延误。在发生大面积航班延误时，如果没有良好的应对机制，不仅会产生旅客群体性事件，还将带来一系列延误造成的连锁反应。

本项目基于航班起飞时可以测量的特征，构建MLP模型，以预测航班是否延误。并计算不同特征的重要程度来对模型做出解释，并推断延迟的原因。我们还将使用bootstrapping来估计模型的预测区间。我们利用这些预测区间来建立一种新的模型:在不自信的情况下避免做出预测的模型。该项目所用数据集为2015 Flight Delays and Cancellations数据集的一部分。

关于延误，FAA和BTS将晚点15分钟或以上的航班定义为延误，因此本项目为二分类任务(正点/延误)。

!pip install -qq scikit-lego

# imports
import sys
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from sklearn.utils import resample #bootstrap
from sklearn.metrics import accuracy_score
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.inspection import permutation_importance
from sklego.preprocessing import RepeatingBasisFunction #RBF

import paddle
import paddle.nn as nn

import warnings
warnings.filterwarnings("ignore",category=DeprecationWarning)

# 不知道为什么，切换成GPU各种错误，直接CPU得了
# https://blog.csdn.net/qq_37668436/article/details/114336142
# 最好是V100 32G或以上，CPU核心多
paddle.device.set_device("cpu")
print(paddle.device.get_device())

cpu

1. Data

该文件原始形状为（10000，10），本节对数据预处理，并构建适合训练的数据形式。

1.1. Preprocessing

如上文所说，首先需要根据ARRIVAL_DELAY来新建DELAY_OR_NOT列，作为真实标签。接着对非数值的分类变量进行独热码操作，如果有缺失值，处理缺失值，扩展数据，并将数据分割为训练集和测试集(8000/2000)。

其中对于时间的编码，可以有些特殊操作：

时间信息的三种编码方式（RBF）
https://scikit-lego.netlify.app/preprocessing.html?highlight=repeatingbasisfunction
https://scikit-lego.netlify.app/api/preprocessing.html?highlight=repeatingbasisfunction#sklego.preprocessing.RepeatingBasisFunction.fit

使用RBF(radial basis function)对星期和月份编码，可以提升2%左右的准确率。但其中的操作会影响后面的部分，且本文重点不在于性能，因此本次未使用(代码已给出)。

# Load the data
df = pd.read_csv('data/data155027/flights.csv')

# Create a variable DELAY_OR_NOT that denotes whether ARRIVAL_DELAY is greater than or equal to 15 minutes
df['DELAY_OR_NOT'] = df['ARRIVAL_DELAY'].apply(lambda x: 1 if x >= 15 else 0)
df = df.drop(['ARRIVAL_DELAY'], axis=1)

df.head()

	DISTANCE	SCHEDULED_TIME	MONTH	SCHED_DEP_HOUR	SCHED_ARR_HOUR	FLIGHT_COUNT	DAY_OF_WEEK	ORIGIN_AIRPORT	DESTINATION_AIRPORT	DELAY_OR_NOT
0	2586	342	9	7	15	240	3	SFO	JFK	1
1	1235	185	5	6	11	366	4	LAX	DFW	0
2	184	76	4	17	18	172	7	BOS	LGA	0
3	862	148	7	19	21	260	7	IAH	DEN	1
4	236	71	3	20	21	266	2	LAS	LAX	1

# # radial basis functions
# rbf_weeks = RepeatingBasisFunction(n_periods=7,
#                                     remainder='drop',
#                                     column='DAY_OF_WEEK',
#                                     input_range=(1,7))

# rbf_months = RepeatingBasisFunction(n_periods=12, 
#                                     remainder='drop', 
#                                     column='MONTH', 
#                                     input_range=(1,12))        

# rbf_weeks.fit(df)
# rbf_months.fit(df)

# rbfW = rbf_weeks.transform(df)
# rbfM = rbf_months.transform(df)

# use mean values to fill NAN
df = df.fillna(df.mean())

# one-hot-encode and make certain to drop one column for each predictor
df = pd.get_dummies(df, drop_first=True)

# # for RBF
# X = df.drop(['DELAY_OR_NOT','DAY_OF_WEEK','MONTH'], axis=1)
# X = np.concatenate((X, rbfW, rbfM), axis=1)

# Split the data into training and test sets
X = df.drop(['DELAY_OR_NOT'], axis=1)
y = df['DELAY_OR_NOT']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=111)

# scale the data
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# shape of the dataset
print(f'The shape of the X_train is {X_train.shape}')
print(f'The shape of the y_train is {y_train.shape}')
print(f'The shape of the X_test is {X_test.shape}')
print(f'The shape of the y_test is {y_test.shape}')

The shape of the X_train is (8000, 808)
The shape of the y_train is (8000,)
The shape of the X_test is (2000, 808)
The shape of the y_test is (2000,)

1.2. Generate Dataset

定义Dataset来适配该数据集，接着使用DataLoader将数据集化成batch。

class FlightDataset(paddle.io.Dataset):
    """
    步骤一：继承paddle.io.Dataset类
    """
    def __init__(self, data, label):
        """
        步骤二：传入data和label
        """
        super(FlightDataset, self).__init__()
        self.num_samples = data.shape[0]
        self.data = data
        self.label = label

    def __getitem__(self, index):
        """
        步骤三：实现__getitem__方法，定义指定index时如何获取数据，并返回单条数据（训练数据，对应的标签）
        """
        # row = np.array(self.data[index]).astype('float32')
        # predic = np.array(self.label[index]).astype('float32')
        # return row, predic
        return self.data[index], self.label[index]

    def __len__(self):
        """
        步骤四：实现__len__方法，返回数据集总数目
        """
        return self.num_samples

# to_tensor
X_train_tensor = paddle.to_tensor(X_train, dtype='float32')  
y_train_tensor = paddle.to_tensor(np.array(y_train), dtype='float32')  
X_test_tensor = paddle.to_tensor(X_test, dtype='float32')  
y_test_tensor = paddle.to_tensor(np.array(y_test), dtype='float32') 
# 数据导入
train_dataset = FlightDataset(X_train_tensor, y_train_tensor)
test_dataset = FlightDataset(X_test_tensor, y_test_tensor)
# 实例化数据读取器
batch_size = 128
train_dataloader = paddle.io.DataLoader(train_dataset, shuffle=True, batch_size=batch_size, num_workers=2)
test_dataloader = paddle.io.DataLoader(test_dataset, shuffle=False, batch_size=batch_size, num_workers=2)

2. Baseline Model

本项目的Baseline模型是简单的MLP加上一些正则化方式，模型结构见下文summary。
模型最终测试精度为72%。从下图中可以看出模型已存在过拟合现象，因此增加正则化或许能提高模型性能，在此不做深究。

在这里插入图片描述

# model parameters
n_input = X_train.shape[1]
n_hidden = 15
n_output = 1

# define the model
nn_model = nn.Sequential(
    nn.Linear(n_input, n_hidden, weight_attr=nn.initializer.KaimingNormal()), 
    nn.ReLU(),
    nn.Dropout(0.5),
    nn.Linear(n_hidden, n_hidden, weight_attr=nn.initializer.KaimingNormal()), 
    nn.ReLU(),
    nn.Linear(n_hidden, n_output), 
    nn.Sigmoid()
)

# model summary
paddle.summary(nn_model, (1, 3, 808))

---------------------------------------------------------------------------
 Layer (type)       Input Shape          Output Shape         Param #    
===========================================================================
   Linear-4        [[1, 3, 808]]          [1, 3, 15]          12,135     
    ReLU-3          [[1, 3, 15]]          [1, 3, 15]             0       
   Dropout-2        [[1, 3, 15]]          [1, 3, 15]             0       
   Linear-5         [[1, 3, 15]]          [1, 3, 15]            240      
    ReLU-4          [[1, 3, 15]]          [1, 3, 15]             0       
   Linear-6         [[1, 3, 15]]          [1, 3, 1]             16       
   Sigmoid-2        [[1, 3, 1]]           [1, 3, 1]              0       
===========================================================================
Total params: 12,391
Trainable params: 12,391
Non-trainable params: 0
---------------------------------------------------------------------------
Input size (MB): 0.01
Forward/backward pass size (MB): 0.00
Params size (MB): 0.05
Estimated Total Size (MB): 0.06
---------------------------------------------------------------------------






{'total_params': 12391, 'trainable_params': 12391}

# compile it and run it
# compile the model
model = paddle.Model(nn_model)
learning_rate = 0.001
clip = paddle.nn.ClipGradByGlobalNorm(clip_norm=1.0)
opt = paddle.optimizer.Adam(learning_rate=learning_rate, parameters=model.parameters(), grad_clip=clip, weight_decay=0.05)
model.prepare(optimizer=opt, loss=paddle.nn.BCELoss(), metrics=paddle.metric.Precision())

# train
early_stop = paddle.callbacks.EarlyStopping(monitor='precision', mode='max', patience=50, verbose=0, save_best_model=True)
visualdl = paddle.callbacks.VisualDL('log/baseline')
model.fit(train_dataloader, test_dataloader, epochs=500, verbose=1, 
                        callbacks=[early_stop,visualdl], save_dir='checkpoint/baseline', save_freq=50)

# evaluate baseline model
best_model = paddle.Model(nn_model)
best_model.load('checkpoint/baseline/best_model.pdparams')
learning_rate = 0.001
clip = paddle.nn.ClipGradByGlobalNorm(clip_norm=1.0)
opt = paddle.optimizer.Adam(learning_rate=learning_rate, parameters=best_model.parameters(), grad_clip=clip, weight_decay=0.1)
best_model.prepare(optimizer=opt, loss=paddle.nn.BCELoss(), metrics=paddle.metric.Precision())
best_model.evaluate(test_dataloader, verbose=0)

{'loss': [0.5625631], 'precision': 0.7204968944099379}

3. Model Interpretation

这到了本项目第一个重点，如何对模型做出解释？

3.1. Feature Importance

第一个方法

为了解释最终的baseline model，我们将首先使用一个我们知道如何解释的“代理模型”，并用我们的baseline模型的预测结果上训练它。

为此我们需要修改我们的训练集。首先，为训练集生成一组baseline model的预测结果。对于“代理模型”的数据集：(a)X与baseline训练集的X相同，(b)将y替换为baseline model对训练集的预测结果。

接下来，用修改后的训练数据集拟合一个logistics regression model(代理模型)。打印代理模型的测试精度以确认它与我们的baseline model测试精度相似。为了达到类似的精度，可能需要调整LogisticRegression中的C。

最后使用sklearn的permutation_importance函数来计算特征的重要性。并生成一个条形图，展示使用permutation_importance得到的前10个最重要的预测因子的相对重要性。

Resource:

# generate logreg dataset
# Convert probabilities into labels
# 切记不要用train_dataloader来predict，因为有shuffle
y_train_hat = best_model.predict(X_train_tensor, stack_outputs=True, verbose=0)[0]
y_train_logreg = np.round(y_train_hat).ravel()

# Fit the logistic regression model
logreg = LogisticRegression(penalty='l2', max_iter=1000)
logreg.fit(X_train, y_train_logreg)

# Print the logreg test accuracy
y_test_logreg_hat = logreg.predict(X_test)
print("logreg_model_test_acc:", accuracy_score(y_test_logreg_hat, y_test))

logreg_model_test_acc: 0.7265

# 需要时间较长(4核需3分钟)

# compute the feature importances use permutation_importance
perm_results  = permutation_importance(logreg, X_test, y_test, n_jobs=1)

# calculate relative importance of top 10 predictors in descending order
relimp10 = np.flip(np.sort(perm_results.importances_mean)[-10:])/perm_results.importances_mean.max()

# identify indices of top 10 predictors in descending order
relimp10_idx = np.flip(np.argsort(perm_results.importances_mean)[-10:])

# identify top 10 predictor names based on sorted top 10 indices
relimp10_preds = [X.columns[i] for i in relimp10_idx]

# plot results of top 10 predictors
fig, ax = plt.subplots(figsize=(9, 6))
plt.title(
    "Top 10 predictors by relative importance\n"
    "as identfied using \"permutation_importance\"",
    fontsize=16,
)
ax.barh(relimp10_preds[::-1], relimp10[::-1], alpha=0.5)
ax.tick_params(labelsize=12)
ax.set_xlabel("relative feature importance", fontsize=12)
ax.grid(':', alpha=0.4)
plt.tight_layout()
plt.show()

在这里插入图片描述

3.2. Visualize Important Features

第二种方法

另一种解释的方法是将数据集中除选中的特征外，其他特征全部设置为平均值，以此来观察选中变量对预测结果的影响。需要注意的是，我们最好从上面发现的最重要的特征中选择（SCHED_DEP_HOUR， FLIGHT_COUNT， SCHED_ARR_HOUR， DISTANCE）。为了便于解释，以下所有图的预测因子按其原始比例显示。

# examining the response as a function of any of the predictors
def predictors_examining_14(examining_predictors, NN_model):
    '''
    input:
        examining_predictors: list of strings, the names of the predictors to examine
        NN_model: the trained model
    output:
        None(just print the plot)
    '''

    # set all predictors to their means of scaled values except the examining variables

    # get the original data
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=111)
    # get the column index of the examining variables
    examining_predictors_index = X_train.columns.get_indexer(examining_predictors)
    
    # scale the data and get the mean of the scaled data
    scaler_temp = StandardScaler()
    X_train_temp = scaler_temp.fit_transform(X_train)
    X_train_final = X_train_temp.copy()
    X_train_final[:]= X_train_temp.mean(axis=0)
    # keep the examing variables to their scaled values
    X_train_final[:,examining_predictors_index] = X_train_temp[:,examining_predictors_index]
    X_train_final = paddle.to_tensor(X_train_final, dtype='float32')  
    # predict the probability of delay with the new data
    y_train_hat = NN_model.predict(X_train_final, verbose=0)


    # plot the predicted probabilities of delay vs. examining predictors on the training set.
    # set the first axis
    fig, ax1 = plt.subplots(figsize=(9, 6))
    # ax1.set_title('Predicted Probability of Delay vs. '+' & '.join(examining_predictors), fontsize=24)
    ax1.set_title(
        "\"Model\" training predictions\nby {}\n"\
        "with all other predictors are held constant at their means"
        .format(
            ' & '.join(examining_predictors),
        ),
        fontsize=15,
    )

    ax1.set_xlabel('Predicted Probability of Delay', fontsize=16)
    ax1.set_ylabel('Original '+examining_predictors[0], color='b', fontsize=16)
    s1 = ax1.scatter(y_train_hat, X_train[examining_predictors[0]], c='b', marker='o', label=examining_predictors[0])
    ax1.tick_params(axis='y', labelcolor='b')

    # set the second axis(if have more than one examining variables)
    if len(examining_predictors) == 1:
        ax1.legend(loc=0, fontsize=12)
        plt.show()
    else:
        # Create a second Y-axis that shares the X-axis
        ax2 = ax1.twinx()
        ax2.set_ylabel('Original '+examining_predictors[1], color='r', fontsize=16)
        s2 = ax2.scatter(y_train_hat, X_train[examining_predictors[1]], c='r', marker='o', label=examining_predictors[1])
        ax2.tick_params(axis='y', labelcolor='r')

        # Add the legend to the plot
        sca = [s1,s2]
        labels = [lable.get_label() for lable in sca]
        plt.legend(sca, labels,loc = 2, fontsize=12)

        plt.show()

# except SCHED_DEP_HOUR (3.2.1)
predictors_examining_14(['SCHED_DEP_HOUR'], best_model)

在这里插入图片描述

通过保持所有其他变量不变，我们可以观察到SCHED_DEP_HOUR对模型预测结果的影响。可以看到，随着一天中预定起飞时间的增加，航班延误的预测概率也会增加。这一发现有助于说明了该变量在我们的模型中的影响。然而，这一发现也提出了一些问题，即其他哪些混杂因素可能导致这一预测的影响。下面将探究变量之间的影响。

# except SCHED_DEP_HOUR and FLIGHT_COUNT (3.2.2)
predictors_examining_14(['SCHED_DEP_HOUR','FLIGHT_COUNT'], best_model)

在这里插入图片描述

# except SCHED_DEP_HOUR and SCHED_ARR_HOUR (3.2.3)
predictors_examining_14(['SCHED_DEP_HOUR','SCHED_ARR_HOUR'], best_model)

在这里插入图片描述

# except SCHED_DEP_HOUR and DISTANCE (3.2.4)
predictors_examining_14(['SCHED_DEP_HOUR','DISTANCE'], best_model)

在这里插入图片描述

上面三张图最明显的是，"SCHED_DEP_HOUR"对预测结果的影响显著大于另一个对比的变量，这与上面得到的feature importance结果一致。表明模型预测结果主要受该变量影响。
在3.2.2中，每天的航班数量可以集中在0-300的范围内，延误的概率随着航班数量的增加而略有增加。由此可以推断出，繁忙的机场会增加延误的概率。
在3.2.3中，随着到达时间从0增加到24，延误概率略有增加，需要注意的是，右下角的点是第二天到达的（延误概率高主要是受前一天出发时间晚的影响）。
在3.2.4中，我们可以看到，飞行距离集中在0到3000英里之间，随着飞行距离的增加，延误概率也略有增加。

4. Bagging (bootstrap-aggregated)

这是该项目的第二个重点

这里使用与baseline model相同的网络架构(层、节点、激活等)，使用bootstrap创建多个训练集，并为每个bootstrap数据集拟合一个单独的MLP模型。预测每个模型的测试数据的输出。

Resource：

https://www.zhihu.com/column/p/262211315
https://blog.csdn.net/dmsgames/article/details/81943206

def progressbar(n_step, n_total):
    """Prints self-updating progress bar to stdout to track for-loop progress
    
    There are entire 3rd-party libraries dedicated to custom progress-bars.
    A simple function like this is often more than enough to get the job done.
    
    :param n_total: total number of expected for-loop iterations
    :type n_total: int
    :param n_step: current iteration number, starting at 0
    :type n_step: int

    .. example::
    
        for i in range(n_iterations):
            progressbar(i, n_iterations)
            
    .. source:
    
        This function is a simplified version of code found here:
        https://stackoverflow.com/questions/3160699/python-progress-bar/15860757#15860757
    """
    n_step = n_step + 1
    barlen = 50
    progress = n_step / n_total
    block = int(round(barlen * progress))
    status = ""
    if n_step == n_total:
        status = "Done...\r\n\n"
    text = "\r [{0}] {1}/{2} {3}".format(
        "=" * block + "-" * (barlen - block),
        n_step,
        n_total,
        status,
    )
    sys.stdout.write(text)
    sys.stdout.flush()

# build and compile the model
def buildModel(learning_rate = 0.001):
    '''
    input:
    learning_rate: float, the learning rate of the model
    output:
    model: the model
    '''
    # build NN model
    # model parameters
    n_input = X_train.shape[1]
    n_hidden = 15
    n_output = 1

    # define the model
    nn_model = nn.Sequential(
        nn.Linear(n_input, n_hidden, weight_attr=nn.initializer.KaimingNormal()), 
        nn.ReLU(),
        nn.Dropout(0.5),
        nn.Linear(n_hidden, n_hidden, weight_attr=nn.initializer.KaimingNormal()), 
        nn.ReLU(),
        nn.Linear(n_hidden, n_output), 
        nn.Sigmoid()
    )

    # compile the model
    model = paddle.Model(nn_model)
    clip = paddle.nn.ClipGradByGlobalNorm(clip_norm=1.0)
    opt = paddle.optimizer.Adam(learning_rate=learning_rate, parameters=model.parameters(), grad_clip=clip, weight_decay=0.05)
    model.prepare(optimizer=opt, loss=paddle.nn.BCELoss(), metrics=paddle.metric.Precision())

    return model

%%time
# 需要时间较长(4核需xx分钟)
# bootstraps不应过小，否则体现不出效果
# bootstraps数量与训练时间成正比，可以通过减小bootstraps缩短学习时间
# Bootstrap and train your networks and get predictions on fixed X test data
bootstraps = 100
epochs = 500
batch_size = 256
learning_rate = 0.001

boot_models = []
for i in range(bootstraps):
    # progress bar
    progressbar(i, bootstraps)

    # build, compile and train the model
    boot_X_train, boot_y_train = resample(X_train, y_train, replace=True, n_samples=int(X_train.shape[0]))
    boot_model = buildModel(learning_rate=learning_rate)

    # generate resample dataset
    boot_X_train = paddle.to_tensor(boot_X_train, dtype='float32')  
    boot_y_train = paddle.to_tensor(np.array(boot_y_train), dtype='float32')
    boot_train_dataset = FlightDataset(boot_X_train, boot_y_train)  
    boot_train_dataloader = paddle.io.DataLoader(boot_train_dataset, shuffle=True, batch_size=256, num_workers=4)

    early_stop = paddle.callbacks.EarlyStopping(monitor='precision', mode='max', patience=50, verbose=0, save_best_model=True)
    boot_model.fit(boot_train_dataloader, test_dataloader, epochs=epochs, verbose=0, callbacks=[early_stop])

    # save model
    boot_models.append(boot_model)

4.1. Confidence Interval

随机选择6个测试观测数据，在6个子图上绘制预测概率的分布(即n个bootstrapped probability)，在每个子图中标记95%CI边界，在每个子图的标题中包含每个观测数据的实际类别，以便于参考。

根据下面显示的6个图可以观察到一些有趣的东西，大部分情况下，无论使用的是何种bootstrap数据集，该模型都可以"自信"的预测出结果。这意味着，大多数概率预测都处于产生“正确”的预测(p= 0.5的一侧)。我们也可以观察到其中模型预测错误的情况，也非常"自信"。其他的测试预测似乎远没有那么确定，在这些情况下，bootstrap预测的分布更加分散。

# ramdomly select 10 test observations
boot_X_test, boot_y_test = resample(X_test, y_test, replace=0, n_samples=6)
boot_y_hat = np.zeros((bootstraps, boot_X_test.shape[0]))
for i in range(bootstraps):
    boot_y_hat[i] = boot_models[i].predict(paddle.to_tensor(boot_X_test, dtype='float32') , stack_outputs=True, verbose=0)[0].ravel()
boot_y_hat = boot_y_hat.T

# plot the distribution of predicted probabilities with the 95% CI bounds clearly marked and reported in each subplot
fig, ax = plt.subplots(6, 1, figsize=(9, 25))
for i in range(6):
    ax[i].set_title(f'The distribution of {bootstraps} bootstrapped probabilites with the 95% CI bounds , sample {i}, True label: {boot_y_test.iloc[i]}')
    ax[i].hist(boot_y_hat[i], bins=30, range=(0, 1), color='blue')
    ax[i].axvline(boot_y_hat[i].mean(), color='k', linestyle='dashed', linewidth=2, label='Mean')
    ax[i].axvspan(np.percentile(boot_y_hat[i], 2.5), np.percentile(boot_y_hat[i], 97.5), color='red', alpha=0.3, label='95% CI')
    ax[i].set_xlabel('Predicted Probability of Delay')
    ax[i].set_ylabel('Frequency')
    ax[i].legend()
fig.tight_layout()
plt.show()

在这里插入图片描述

4.2. Abstain Model

使用从上面的bootstrap样本中获得的预测的概率分布，我们可以评估我们的bagging(即bootstrap-aggregated)预测对每个测试观察的“重要性”。为了实现这一点，首先计算跨越阈值p=0.5的bootstrapped的比例。我们称这个比例为PPR(Posterior Prediction Ratio)。当PPR=0时，该条数据(一行数据)的所有bootstrap概率都在p=0.5的同一侧。同样地，当PPR=0.5时，该条数据(一行数据)的bootstrap预测的一半是y=0，另一半是y=1。在计算所有测试观察值后，应该有 2000个PPR值(即每条数据一个PPR值)。

接下来，为了获得更准确的预测，我们可以创建一个Abstain Model，如果未满足某个定义的显著性阈值(即最大PPR值)，该模型将不会对某一行数据进行预测。

最后，绘制出abstain bagging model中未弃权的测试观察结果的比例(即有预测结果的比例)，以及预测准确的比较。

最后的图中显示，随着PPR阈值的减小(更高可信度)，有预测结果的比例有明显下降。反之，预测比例上升。我们还可以看到，在更低的PPR阈值下，我们的分类准确率越来越高。而我们abstain bagging model所能做到的最好的情况是在牺牲50%的可用预测的基础上，将测试精度增加约10个额外点。

# predict all test data with the bootstrapped models
boot_y_hat_all = np.zeros((bootstraps, X_test.shape[0]))
for i in range(bootstraps):
    # progress bar
    progressbar(i, bootstraps)
    # predict
    boot_y_hat_all[i] = boot_models[i].predict(X_test_tensor, stack_outputs=True, verbose=0)[0].ravel()
boot_y_hat_all = boot_y_hat_all.T

# calculate posterior prediction ratio
boot_y_hat_label = np.round(boot_y_hat_all)
PPR = (boot_y_hat_label==0).mean(axis=1)

 [==================================================] 100/100 Done...

def abstain_bagging_model(threshold=0.5):
    '''
    input:
        threshold: float, max is 0.5, when it is 0 the bagging model have 100% confidence
    output:
        The rest of the samples, accuracy and proportion of the samples that not abstained
    '''

    # delete the abstained samples
    indexes = np.where(abs(PPR-0.5) >= abs(threshold-0.5))
    X, y = X_test[indexes], y_test.iloc[indexes]

    # calculate the proportion of the samples that not abstained
    proportion = X.shape[0]/X_test.shape[0]

    # calculate the accuracy
    acc = ((np.round(boot_y_hat_all[indexes]).sum(axis=1)>(len(boot_models)/2))==y_test.iloc[indexes]).mean()

    return X, y, acc, proportion

# test accuracy and proportion of test observations not abstained for abstain bagging model
def  plot_acc_proportion_16(start, end, step):
    '''
    input:
        start: the start of the range
        end: the end of the range
        step: the step of the range
    output:
        plot the accuracy and proportion of the samples that not abstained
    '''

    # generate the data for the next part
    arr = np.arange(start, end, step)
    accs=[]
    pros=[]
    for i in arr:
        _, _, acc, proportion = abstain_bagging_model(threshold=i)
        accs.append(acc)
        pros.append(proportion)


    # plot accuracies and proportion of test observations not abstained for abstain bagging model
    
    # plot accuracies
    # set the first axis
    fig16, ax16 = plt.subplots(figsize=(14,7))
    ax16.set_title('PPR Threshold vs. Accuracy and Rest Propotion of Test Observation', fontsize=24)
    ax16.set_xlabel('Posterior Prediction Ratio Threshlod', fontsize=16)
    ax16.set_ylabel('Accuracy of abstain bagging model', color='b', fontsize=16)
    s1 = ax16.plot(arr, accs, c='b', label='Accuracy')
    ax16.tick_params(axis='y', labelcolor='b')
    ax16.legend(loc=0)

    # plot the proportion of test observations not abstained
    # Create a second Y-axis that shares the X-axis
    ax2 = ax16.twinx()
    ax2.set_ylabel('Rest ratio of test observation', color='r', fontsize=16)
    s2 = ax2.plot(arr, pros, c='r', label='Rest Propotion')
    ax2.tick_params(axis='y', labelcolor='r')
    ax2.legend(loc=0)

    # # fig16.tight_layout()
    plt.show()

# test accuracy and proportion of test observations not abstained for abstain bagging model when the threshold is increased
plot_acc_proportion_16(0, 0.51, 0.01)

在这里插入图片描述