
I. Project Introduction

❤️ Quantitative Prediction of Sulfur Dioxide Concentration Using a Neural Network with UV Differential Spectroscopy ❤️

Sulfur dioxide (SO2) is a common environmental pollutant with wide-ranging effects on the atmosphere, water, and soil. Accurately monitoring and predicting atmospheric SO2 concentrations is therefore important for environmental management and pollution control. UV differential spectroscopy is a widely used method for SO2 monitoring: it quantifies the gas by measuring the characteristic absorption of SO2 in the ultraviolet band.

This project applies neural networks to UV differential spectra to produce accurate quantitative predictions of SO2 concentration. UV differential spectra collected from different environments, covering the SO2 absorption features along with environmental parameters (such as temperature and humidity), serve as the input features. A neural network model is built on these features and trained on historical data to predict SO2 concentration.

The project plan comprises the following steps:

  1. Data collection and preparation: collect UV differential spectra from different environments, including the SO2 absorption features and environmental parameters. Process and prepare the collected data, including data cleaning, feature extraction, and feature engineering.

  2. Model selection and design: choose a neural network architecture suited to the task and design the model. Common architectures such as the multilayer perceptron (MLP), convolutional neural network (CNN), or recurrent neural network (RNN) are all candidates.

  3. Model training and tuning: train and tune the selected network on the collected spectra, including splitting the data into training and validation sets and adjusting the model's parameters to obtain the best predictive performance.

  4. Model evaluation and validation: evaluate and validate the model, including performance testing on a test set to assess prediction accuracy, stability, and reliability, then adjust and optimize the model based on the results.

  5. Interpretation and application: use the trained network to predict SO2 concentrations.

II. Running the Code

1. Unzip the data

# Run this once, then comment it out
!unzip /home/aistudio/data/data208645/Data.zip -d ./data

2. Import packages

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import paddle
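
One optional addition (not in the original notebook): fixing the random seeds makes the data shuffling and weight initialization below reproducible, though the exact losses can still vary across hardware and Paddle versions.

# Optional: fix random seeds for reproducible shuffling and initialization
paddle.seed(42)
np.random.seed(42)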

3. Load the data

train_data = pd.read_excel("./data/Data/train.xlsx", header=None)
val_data = pd.read_excel("./data/Data/val.xlsx", header=None)
test_data = pd.read_excel("./data/Data/test.xlsx", header=None)
print("加载数据完成!")

print("train_data:",train_data)
print("val_data:",val_data)
print("test_data:",test_data)
Data loaded!
train_data:           0         1         2         3         4         5         6    \
0   -0.012270 -0.008444 -0.004375 -0.000184  0.003440  0.006620  0.008990   
1   -0.025047 -0.015357 -0.005686  0.002572  0.008761  0.013994  0.018033   
2   -0.037505 -0.021944 -0.007038  0.005147  0.014224  0.021056  0.026412   
3   -0.061983 -0.034999 -0.010056  0.009798  0.023887  0.034412  0.042387   
4   -0.073650 -0.041423 -0.011709  0.011549  0.027902  0.040159  0.049526   
..        ...       ...       ...       ...       ...       ...       ...   
164 -0.148829 -0.115722 -0.075014 -0.028676  0.019919  0.065222  0.104061   
165 -0.160321 -0.122449 -0.077428 -0.027611  0.023061  0.070258  0.110567   
166 -0.170898 -0.128527 -0.079606 -0.027007  0.026049  0.074729  0.116371   
167 -0.194200 -0.142182 -0.084521 -0.025135  0.032484  0.084201  0.128401   
168 -0.205352 -0.148857 -0.087424 -0.024942  0.034898  0.088087  0.133736   

          7         8         9    ...       414       415       416  \
0    0.011761  0.013728  0.014204  ...  0.000974  0.001178  0.001687   
1    0.022009  0.024704  0.025219  ...  0.001391  0.001821  0.002151   
2    0.031662  0.035359  0.035733  ...  0.002669  0.003190  0.003518   
3    0.050285  0.055814  0.056690  ...  0.004162  0.005100  0.005648   
4    0.059010  0.065689  0.067010  ...  0.005132  0.005943  0.006464   
..        ...       ...       ...  ...       ...       ...       ...   
164  0.139079  0.164084  0.175114  ...  0.006676  0.008426  0.009503   
165  0.146746  0.173061  0.184654  ...  0.007423  0.009236  0.010182   
166  0.153701  0.181050  0.193047  ...  0.007897  0.009787  0.010914   
167  0.168198  0.197904  0.211459  ...  0.009296  0.011503  0.012656   
168  0.175235  0.206353  0.220749  ...  0.009904  0.012182  0.013512   

          417       418       419       420       421       422  423  
0    0.001411  0.000760  0.000105 -0.000448 -0.000898 -0.001340    1  
1    0.002091  0.001451  0.000401 -0.000655 -0.001414 -0.002175    2  
2    0.003363  0.001538  0.000307 -0.001323 -0.002451 -0.003273    3  
3    0.005315  0.002817  0.000515 -0.001990 -0.004093 -0.005589    5  
4    0.005909  0.002990  0.000467 -0.002356 -0.004476 -0.006186    6  
..        ...       ...       ...       ...       ...       ...  ...  
164  0.009424  0.006349  0.003002 -0.000789 -0.003904 -0.005904   11  
165  0.009977  0.006880  0.003316 -0.000988 -0.004208 -0.006403   12  
166  0.010501  0.007303  0.003547 -0.000943 -0.004642 -0.007005   13  
167  0.012375  0.008722  0.004119 -0.001417 -0.005753 -0.008739   15  
168  0.013215  0.009154  0.004446 -0.001490 -0.006323 -0.009596   16  

[169 rows x 424 columns]
val_data:         0         1         2         3         4         5         6    \
0 -0.052783 -0.033933 -0.015029  0.002018  0.016323  0.028161  0.037847   
1 -0.110682 -0.065505 -0.023264  0.011161  0.037215  0.057251  0.073270   
2 -0.166297 -0.097069 -0.034448  0.015865  0.053162  0.081216  0.103936   
3 -0.058965 -0.048355 -0.032819 -0.013860  0.007229  0.027811  0.045168   
4 -0.116864 -0.079927 -0.041054 -0.004716  0.028120  0.056900  0.080591   
5 -0.172479 -0.111491 -0.052238 -0.000012  0.044067  0.080866  0.111258   
6 -0.064833 -0.062929 -0.051263 -0.030225 -0.002288  0.026767  0.051877   
7 -0.122731 -0.094501 -0.059498 -0.021082  0.018603  0.055857  0.087301   
8 -0.178346 -0.126065 -0.070682 -0.016378  0.034551  0.079822  0.117967   

        7         8         9    ...       414       415       416       417  \
0  0.046858  0.053088  0.054782  ...  0.003508  0.004392  0.005005  0.004664   
1  0.088549  0.099801  0.103189  ...  0.006914  0.008079  0.008879  0.008336   
2  0.125875  0.142550  0.148810  ...  0.010057  0.011853  0.012906  0.012040   
3  0.060715  0.071836  0.076016  ...  0.002705  0.003693  0.004151  0.004271   
4  0.102406  0.118550  0.124423  ...  0.006111  0.007379  0.008026  0.007943   
5  0.139731  0.161299  0.170044  ...  0.009254  0.011154  0.012052  0.011647   
6  0.073872  0.089537  0.096120  ...  0.002488  0.003486  0.004277  0.004285   
7  0.115563  0.136250  0.144526  ...  0.005894  0.007173  0.008151  0.007957   
8  0.152889  0.178999  0.190148  ...  0.009037  0.010947  0.012178  0.011661   

        418       419       420       421       422  423  
0  0.001598  0.000032 -0.001605 -0.002903 -0.003765    4  
1  0.004480  0.001232 -0.002686 -0.005762 -0.007993    9  
2  0.007287  0.002562 -0.003217 -0.008333 -0.011865   14  
3  0.001805  0.000592 -0.000741 -0.001296 -0.001838    4  
4  0.004688  0.001792 -0.001822 -0.004155 -0.006066    9  
5  0.007494  0.003122 -0.002353 -0.006726 -0.009938   14  
6  0.002136  0.001090 -0.000121 -0.000501 -0.000710    4  
7  0.005019  0.002291 -0.001202 -0.003360 -0.004938    9  
8  0.007825  0.003621 -0.001733 -0.005931 -0.008810   14  

[9 rows x 424 columns]
test_data:         0         1         2         3         4         5         6    \
0 -0.047434 -0.028410 -0.009918  0.005256  0.017354  0.027271  0.035861   
1 -0.099031 -0.056289 -0.017596  0.013699  0.037585  0.055516  0.070344   
2 -0.062197 -0.040211 -0.017950  0.001974  0.019550  0.034519  0.047329   
3 -0.078246 -0.054903 -0.029160 -0.003992  0.019580  0.040875  0.059377   
4 -0.100289 -0.076556 -0.046752 -0.014660  0.017760  0.048068  0.074909   
5 -0.145121 -0.085051 -0.029667  0.015279  0.049730  0.076468  0.099693   
6 -0.132055 -0.085548 -0.039095  0.002191  0.037861  0.067773  0.093635   
7 -0.149218 -0.098919 -0.047757 -0.001335  0.039021  0.073859  0.104074   
8 -0.181601 -0.126136 -0.067020 -0.011380  0.039078  0.084366  0.123608   

        7         8         9    ...       413       414       415       416  \
0  0.043041  0.047589  0.046583  ...  0.001697  0.002205  0.002398  0.002375   
1  0.083724  0.092225  0.090622  ...  0.004748  0.006000  0.006237  0.006257   
2  0.058283  0.065208  0.064650  ...  0.002485  0.003446  0.003920  0.004008   
3  0.075338  0.085535  0.086047  ...  0.002587  0.003817  0.004476  0.004762   
4  0.097708  0.112348  0.115042  ...  0.002646  0.003971  0.005033  0.005154   
5  0.120608  0.133603  0.132983  ...  0.006556  0.008714  0.010171  0.010531   
6  0.116255  0.130705  0.130851  ...  0.005689  0.007372  0.008379  0.008518   
7  0.130882  0.148112  0.149885  ...  0.006297  0.008249  0.009537  0.009972   
8  0.157677  0.180059  0.184107  ...  0.006566  0.009127  0.010725  0.011372   

        417       418       419       420       421       422  
0  0.002135  0.003274  0.001698 -0.000402 -0.002128 -0.003881  
1  0.005262  0.005548  0.002415 -0.001448 -0.004943 -0.008167  
2  0.003807  0.003320  0.001366 -0.001075 -0.002917 -0.004651  
3  0.004385  0.003837  0.001748 -0.000906 -0.003038 -0.004939  
4  0.005022  0.004741  0.002553 -0.000666 -0.002847 -0.004768  
5  0.009600  0.007677  0.003151 -0.002592 -0.007531 -0.011657  
6  0.007794  0.006505  0.002671 -0.001938 -0.005730 -0.009093  
7  0.009068  0.006826  0.002848 -0.002285 -0.006388 -0.009745  
8  0.010681  0.008655  0.003997 -0.002015 -0.006974 -0.011061  

[9 rows x 423 columns]
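
From the printouts above, train_data and val_data each have 424 columns (423 spectral features plus the concentration label in the last column), while test_data carries only the 423 features. A quick sanity check of that layout (an addition, assuming the shapes shown above):

# Verify the column layout inferred from the printouts:
# train/val carry the label in the last column, test does not.
assert train_data.shape[1] == 424 and val_data.shape[1] == 424
assert test_data.shape[1] == 423
print(train_data.shape, val_data.shape, test_data.shape)  # (169, 424) (9, 424) (9, 423)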

4. Build the network

class Regressor(paddle.nn.Layer):
    def __init__(self):
        # Initialize the parent Layer
        super(Regressor, self).__init__()
        
        # Three fully connected layers: 423 spectral features -> 40 -> 20 -> 1
        self.fc1 = paddle.nn.Linear(in_features=423, out_features=40)
        self.fc2 = paddle.nn.Linear(in_features=40, out_features=20)
        self.fc3 = paddle.nn.Linear(in_features=20, out_features=1)

        self.relu = paddle.nn.ReLU()
    
    # Forward pass of the network
    def forward(self, inputs):
        x = self.fc1(inputs)
        x = self.relu(x)
        x = self.fc2(x)
        x = self.relu(x)
        x = self.fc3(x)
        # The final ReLU clamps the output at zero, matching the physical
        # constraint that a concentration cannot be negative
        x = self.relu(x)
        return x
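
A quick way to confirm the 423 -> 40 -> 20 -> 1 wiring is `paddle.summary`, part of the Paddle 2.x API (this check is an addition to the original):

# Print a layer-by-layer summary for a batch of one 423-feature sample
paddle.summary(Regressor(), input_size=(1, 423))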

5. Set up the optimizer

# Instantiate the regressor defined above
model = Regressor()

# Put the model in training mode
model.train()

# Optimizer: plain stochastic gradient descent (SGD)
opt = paddle.optimizer.SGD(learning_rate=0.01, parameters=model.parameters())
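
Plain SGD with a fixed learning rate is what produced the losses logged below. As a hedged alternative (not used for those results), an adaptive optimizer such as Adam often converges faster on small regression tasks:

# Alternative optimizer (swap in for `opt` above if desired; untested here)
opt_adam = paddle.optimizer.Adam(learning_rate=0.001, parameters=model.parameters())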

6. Train the model

EPOCH_NUM = 20   # number of training epochs
BATCH_SIZE = 32  # mini-batch size
loss_train = []
loss_val = []
training_data = train_data.values.astype(np.float32)
val_data = val_data.values.astype(np.float32)
# Outer loop over epochs
for epoch_id in range(EPOCH_NUM):
    # Shuffle the training rows at the start of every epoch
    np.random.shuffle(training_data)
    
    # Split the training data into mini-batches of BATCH_SIZE rows each
    mini_batches = [training_data[k:k+BATCH_SIZE] for k in range(0, len(training_data), BATCH_SIZE)]
    
    train_loss = []
    for iter_id, mini_batch in enumerate(mini_batches):
        # Reset gradients before the next backward pass
        opt.clear_grad()

        x = np.array(mini_batch[:, :-1])   # 423 spectral features
        y = np.array(mini_batch[:, -1:])   # concentration label (last column)
        
        # Convert the numpy arrays to Paddle tensors
        features = paddle.to_tensor(x)
        y = paddle.to_tensor(y)
        
        # Forward pass
        predicts = model(features)
        
        # L1 loss (mean absolute error); l1_loss already averages by default,
        # so paddle.mean here is a harmless safeguard
        loss = paddle.nn.functional.l1_loss(predicts, label=y)
        avg_loss = paddle.mean(loss)
        train_loss.append(avg_loss.numpy())
        
        # Backward pass: compute gradients for every parameter
        avg_loss.backward()

        # Update the parameters one step at the configured learning rate
        opt.step()
    
    # Validation pass; gradients are not needed, so disable autograd
    mini_batches = [val_data[k:k+BATCH_SIZE] for k in range(0, len(val_data), BATCH_SIZE)]
    val_loss = []
    with paddle.no_grad():
        for iter_id, mini_batch in enumerate(mini_batches):
            x = np.array(mini_batch[:, :-1])
            y = np.array(mini_batch[:, -1:])
            
            features = paddle.to_tensor(x)
            y = paddle.to_tensor(y)
            
            predicts = model(features)
            loss = paddle.nn.functional.l1_loss(predicts, label=y)
            avg_loss = paddle.mean(loss)
            val_loss.append(avg_loss.numpy())
    
    loss_train.append(np.mean(train_loss))
    loss_val.append(np.mean(val_loss))

    print(f'Epoch {epoch_id}, train MAE {np.mean(train_loss)}, val MAE {np.mean(val_loss)}')
Epoch 0, train MAE 8.263873100280762, val MAE 8.513092994689941
Epoch 1, train MAE 7.547825336456299, val MAE 8.191972732543945
Epoch 2, train MAE 7.6819167137146, val MAE 7.773591995239258
Epoch 3, train MAE 6.952884674072266, val MAE 7.235118865966797
Epoch 4, train MAE 6.29564905166626, val MAE 6.492135047912598
Epoch 5, train MAE 5.78107213973999, val MAE 5.391508102416992
Epoch 6, train MAE 4.520540714263916, val MAE 3.7607827186584473
Epoch 7, train MAE 2.98671817779541, val MAE 1.7697641849517822
Epoch 8, train MAE 1.3634449243545532, val MAE 1.000611662864685
Epoch 9, train MAE 1.0004390478134155, val MAE 0.9018352031707764
Epoch 10, train MAE 0.9352921843528748, val MAE 0.7989349365234375
Epoch 11, train MAE 0.8162664771080017, val MAE 0.7369043827056885
Epoch 12, train MAE 0.7658093571662903, val MAE 0.604179859161377
Epoch 13, train MAE 0.5806207060813904, val MAE 0.4997282028198242
Epoch 14, train MAE 0.48825696110725403, val MAE 0.6940793395042419
Epoch 15, train MAE 0.510711669921875, val MAE 0.3516807556152344
Epoch 16, train MAE 0.33238792419433594, val MAE 0.2861652374267578
Epoch 17, train MAE 0.29805561900138855, val MAE 0.24866832792758942
Epoch 18, train MAE 0.3073848783969879, val MAE 0.612284243106842
Epoch 19, train MAE 0.37934020161628723, val MAE 0.20900627970695496
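
Once training has converged, the weights can be saved so that evaluation does not require retraining. This step is an addition to the original notebook, and the filename is arbitrary:

# Persist the trained weights (arbitrary filename); restore later with:
#   model.set_state_dict(paddle.load('regressor.pdparams'))
paddle.save(model.state_dict(), 'regressor.pdparams')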

7. Visualize the loss

# Plot per-epoch training and validation MAE
x = np.arange(EPOCH_NUM)

plt.figure()
plt.plot(x, loss_train, color='red', linewidth=1.0, linestyle='--', label='train MAE')
plt.plot(x, loss_val, color='y', linewidth=1.0, label='val MAE')
plt.legend()
plt.title("Loss")
plt.xlabel('epoch_num')
plt.ylabel('loss value')
# Save after the legend and title are set so they appear in the file
plt.savefig('loss.png', dpi=600, bbox_inches='tight', transparent=False)

(Figure: training and validation MAE per epoch, saved as loss.png)

8. Model evaluation on the test set

model.eval()  # switch the model to inference mode
test_data = paddle.to_tensor(test_data.values.astype(np.float32))
test_predict = model(test_data)
test_predict = test_predict.numpy().flatten()
# Round the continuous outputs to the nearest integer concentration level
test_predict = test_predict.round().astype(int)
print("test_predict:", test_predict)
test_predict: [ 4  9  5  6  7 13 11 12 13]


<div align='center'>
  <img src='https://ai-studio-static-online.cdn.bcebos.com/51bd56eab06c464e95f4c3a8e06abb3827d7334f174044a0bddb0c953b216598' width='1000' />
</div>


III. Results


# Sample index for the 9 test spectra
x = np.arange(len(test_predict))
# Ground-truth SO2 values for the test set
Y_test = np.array([4, 9, 5, 6, 7, 14, 12, 13, 15])

predicted = test_predict
plt.figure()
plt.scatter(x, predicted, color='red')  # predicted points
plt.scatter(x, Y_test, color='y')       # true points
plt.plot(x, predicted, color='red', linewidth=1.0, linestyle='--', label='predict value')
plt.plot(x, Y_test, color='y', linewidth=1.0, label='true value')
plt.legend()
plt.title("SO2")
plt.xlabel('X')
plt.ylabel('Absorption intensity')
# Save after the legend and title are set so they appear in the file
plt.savefig('result.png', dpi=600, bbox_inches='tight', transparent=False)

(Figure: predicted vs. true SO2 values for the 9 test samples, saved as result.png)
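
To put a number on the gap visible in the plot, the test-set MAE follows directly from the arrays above (an added check, using the values printed in section 8):

# Mean absolute error on the 9 test spectra:
# predictions [4 9 5 6 7 13 11 12 13] vs. truth [4 9 5 6 7 14 12 13 15]
mae = np.abs(predicted - Y_test).mean()
print(f"test MAE: {mae:.3f}")  # 0.556; all of the error is at the higher concentrations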

IV. Summary

  • As the plot shows, the predictions are least accurate at high SO2 concentrations, most likely because of nonlinear effects in the absorption response.
  • How to remove such nonlinearity is a long-standing research topic in quantitative gas analysis.
  • Adding spectral preprocessing can improve model accuracy.
  • For example, the data can be improved with differential fitting, wavelet transforms, or Fourier transforms; a minimal smoothing sketch follows this list.
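
As one concrete, hedged illustration of such preprocessing, the sketch below smooths each spectrum with a Savitzky-Golay filter (via SciPy) before training. This is a suggestion rather than part of the original pipeline, and the window length and polynomial order are arbitrary starting points:

from scipy.signal import savgol_filter

# Smooth each 423-point spectrum along the wavelength axis;
# the concentration label in the last column is left untouched.
features = training_data[:, :-1]
training_data[:, :-1] = savgol_filter(features, window_length=11, polyorder=3, axis=1)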

About the Author

New Star Creator in the Artificial Intelligence category on CSDN

Member of the official Baidu PaddlePaddle help team

Tencent Cloud certified junior development engineer

Diamond rank on AI Studio with 9 badges lit; come follow me and I'll follow back!

This article is a repost.
Original project link
