
Quantitative Prediction of Sulfur Dioxide Concentration Using a Neural Network with UV Differential Absorption Spectra
I. Project Introduction
This project aims to quantitatively predict sulfur dioxide (SO2) concentration by applying neural network techniques to UV differential absorption spectra. Spectra collected in different environments, covering the absorption features of atmospheric SO2 together with environmental parameters such as temperature and humidity, serve as the input features. A neural network model is trained on historical data to learn the mapping from these features to SO2 concentration.
The project plan includes the following steps:
- Data collection and preparation: collect UV differential absorption spectra from different environments, including SO2 absorption features and environmental parameters, then clean the data and perform feature extraction and feature engineering.
- Model selection and design: choose a neural network architecture suited to the task, such as a multilayer perceptron (MLP), convolutional neural network (CNN), or recurrent neural network (RNN).
- Model training and tuning: train the selected network on the collected spectra, splitting the data into training and validation sets and tuning the model parameters for the best predictive performance.
- Model evaluation and validation: test the model on a held-out test set, assessing prediction accuracy, stability, and reliability, and adjust the model based on the results.
- Interpretation and application: use the trained network to predict SO2 concentration.
II. Running the Code
1. Extract the Data
# Comment this line out after it has run once
!unzip /home/aistudio/data/data208645/Data.zip -d ./data
2. Import Packages
import pandas as pd
import paddle
import numpy as np
from sklearn.model_selection import cross_val_score, train_test_split
import matplotlib.pyplot as plt
3. Load the Data
train_data = pd.read_excel("./data/Data/train.xlsx", header=None)
val_data = pd.read_excel("./data/Data/val.xlsx", header=None)
test_data = pd.read_excel("./data/Data/test.xlsx", header=None)
print("加载数据完成!")
print("train_data:",train_data)
print("val_data:",val_data)
print("test_data:",test_data)
Data loaded!
train_data: [169 rows x 424 columns] (columns 0-422: differential absorption spectrum; column 423: SO2 concentration label)
val_data: [9 rows x 424 columns] (same layout as train_data)
test_data: [9 rows x 423 columns] (spectra only; no label column)
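As an optional sanity check (not in the original notebook), the expected layout can be verified programmatically; the shapes below are taken from the printed output above.
# Optional sanity check, assuming the layout shown above:
# train/val carry 423 spectral columns plus one label column; test has spectra only
assert train_data.shape == (169, 424)
assert val_data.shape == (9, 424)
assert test_data.shape == (9, 423)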
4. Build the Network
class Regressor(paddle.nn.Layer):
    # self refers to the class instance itself
    def __init__(self):
        # Initialize the parent class
        super(Regressor, self).__init__()
        self.fc1 = paddle.nn.Linear(in_features=423, out_features=40)
        self.fc2 = paddle.nn.Linear(in_features=40, out_features=20)
        self.fc3 = paddle.nn.Linear(in_features=20, out_features=1)
        self.relu = paddle.nn.ReLU()

    # Forward pass of the network
    def forward(self, inputs):
        x = self.fc1(inputs)
        x = self.relu(x)
        x = self.fc2(x)
        x = self.relu(x)
        x = self.fc3(x)
        # The final ReLU clamps outputs to be non-negative, which is
        # acceptable here because concentrations cannot be negative
        x = self.relu(x)
        return x
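A quick shape check (illustrative, not part of the original notebook) confirms that the network maps a batch of 423-point spectra to one prediction per sample.
# Illustrative only: pass a random batch of 8 spectra through an untrained model
dummy = paddle.randn([8, 423])
print(Regressor()(dummy).shape)  # expected: [8, 1]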
5. Set Up the Optimizer
# Instantiate the regression model defined above
model = Regressor()
# Switch the model to training mode
model.train()
# Define the optimization algorithm: stochastic gradient descent (SGD)
opt = paddle.optimizer.SGD(learning_rate=0.01, parameters=model.parameters())
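As a possible variant (not used in this run), an adaptive optimizer such as Adam often converges faster than plain SGD on small regression problems; a one-line sketch:
# Alternative optimizer (illustrative, not used above):
# opt = paddle.optimizer.Adam(learning_rate=0.001, parameters=model.parameters())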
6. Model Training
EPOCH_NUM = 20   # number of training epochs
BATCH_SIZE = 32  # mini-batch size
loss_train = []
loss_val = []
training_data = train_data.values.astype(np.float32)
val_data = val_data.values.astype(np.float32)
# Outer loop over epochs
for epoch_id in range(EPOCH_NUM):
    # Shuffle the training data before each epoch
    np.random.shuffle(training_data)
    # Split the training data into mini-batches of BATCH_SIZE rows each
    mini_batches = [training_data[k:k+BATCH_SIZE] for k in range(0, len(training_data), BATCH_SIZE)]
    train_loss = []
    for iter_id, mini_batch in enumerate(mini_batches):
        # Clear gradients before the next backward pass
        opt.clear_grad()
        x = np.array(mini_batch[:, :-1])   # spectra
        y = np.array(mini_batch[:, -1:])   # concentration labels
        # Convert numpy data to Paddle dynamic-graph tensors
        features = paddle.to_tensor(x)
        y = paddle.to_tensor(y)
        # Forward pass
        predicts = model(features)
        # Compute the loss; l1_loss returns the mean absolute error by default
        loss = paddle.nn.functional.l1_loss(predicts, label=y)
        avg_loss = paddle.mean(loss)
        train_loss.append(avg_loss.numpy())
        # Backward pass: compute gradients of each layer's parameters
        avg_loss.backward()
        # Update parameters by one step at the configured learning rate
        opt.step()
    # Evaluate on the validation set after each epoch
    mini_batches = [val_data[k:k+BATCH_SIZE] for k in range(0, len(val_data), BATCH_SIZE)]
    val_loss = []
    for iter_id, mini_batch in enumerate(mini_batches):
        x = np.array(mini_batch[:, :-1])
        y = np.array(mini_batch[:, -1:])
        features = paddle.to_tensor(x)
        y = paddle.to_tensor(y)
        predicts = model(features)
        loss = paddle.nn.functional.l1_loss(predicts, label=y)
        avg_loss = paddle.mean(loss)
        val_loss.append(avg_loss.numpy())
    loss_train.append(np.mean(train_loss))
    loss_val.append(np.mean(val_loss))
    print(f'Epoch {epoch_id}, train MAE {np.mean(train_loss)}, val MAE {np.mean(val_loss)}')
Epoch 0, train MAE 8.263873100280762, val MAE 8.513092994689941
Epoch 1, train MAE 7.547825336456299, val MAE 8.191972732543945
Epoch 2, train MAE 7.6819167137146, val MAE 7.773591995239258
Epoch 3, train MAE 6.952884674072266, val MAE 7.235118865966797
Epoch 4, train MAE 6.29564905166626, val MAE 6.492135047912598
Epoch 5, train MAE 5.78107213973999, val MAE 5.391508102416992
Epoch 6, train MAE 4.520540714263916, val MAE 3.7607827186584473
Epoch 7, train MAE 2.98671817779541, val MAE 1.7697641849517822
Epoch 8, train MAE 1.3634449243545532, val MAE 1.000611662864685
Epoch 9, train MAE 1.0004390478134155, val MAE 0.9018352031707764
Epoch 10, train MAE 0.9352921843528748, val MAE 0.7989349365234375
Epoch 11, train MAE 0.8162664771080017, val MAE 0.7369043827056885
Epoch 12, train MAE 0.7658093571662903, val MAE 0.604179859161377
Epoch 13, train MAE 0.5806207060813904, val MAE 0.4997282028198242
Epoch 14, train MAE 0.48825696110725403, val MAE 0.6940793395042419
Epoch 15, train MAE 0.510711669921875, val MAE 0.3516807556152344
Epoch 16, train MAE 0.33238792419433594, val MAE 0.2861652374267578
Epoch 17, train MAE 0.29805561900138855, val MAE 0.24866832792758942
Epoch 18, train MAE 0.3073848783969879, val MAE 0.612284243106842
Epoch 19, train MAE 0.37934020161628723, val MAE 0.20900627970695496
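After training, the weights can be persisted so inference can run later without retraining; this step is not in the original notebook, and the file name is arbitrary.
# Save the trained parameters (illustrative path)
paddle.save(model.state_dict(), 'regressor.pdparams')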
7. Visualize the Loss
# Plot training and validation MAE per epoch
x = np.arange(1, EPOCH_NUM + 1)
plt.figure()
plt.plot(x, loss_train, color='red', linewidth=1.0, linestyle='--')
plt.plot(x, loss_val, color='y', linewidth=1.0)
plt.legend(["train MAE", "val MAE"])
plt.title("Loss")
plt.xlabel('epoch_num')
plt.ylabel('loss value')
# Save after the legend and labels are set so they appear in the saved figure
plt.savefig('loss.png', dpi=600, bbox_inches='tight', transparent=False)

8. Model Evaluation on the Test Set
# Switch to evaluation mode and predict on the test spectra
model.eval()
test_data = paddle.to_tensor(test_data.values.astype(np.float32))
test_predict = model(test_data)
test_predict = test_predict.numpy().flatten()
# Round to the nearest integer concentration level
test_predict = test_predict.round().astype(int)
print("test_predict:", test_predict)
test_predict: [ 4 9 5 6 7 13 11 12 13]
<div align='center'>
<img src='https://ai-studio-static-online.cdn.bcebos.com/51bd56eab06c464e95f4c3a8e06abb3827d7334f174044a0bddb0c953b216598' width='1000' />
</div>
III. Results
# Sample indices for the 9 test spectra
x = np.arange(9)
# Reference SO2 concentrations for the test set
Y_test = np.array([4, 9, 5, 6, 7, 14, 12, 13, 15])
predicted = test_predict
plt.figure()
plt.scatter(x, predicted, color='red')  # predicted points
plt.scatter(x, Y_test, color='y')       # true points
plt.plot(x, predicted, color='red', linewidth=1.0, linestyle='--')
plt.plot(x, Y_test, color='y', linewidth=1.0)
plt.legend(["predict value", "true value"])
plt.title("SO2")
plt.xlabel('sample index')
plt.ylabel('SO2 concentration')
# Save after the legend and labels are set so they appear in the saved figure
plt.savefig('result.png', dpi=600, bbox_inches='tight', transparent=False)
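To quantify the agreement shown in the plot, error metrics can be computed against the reference values; this is an illustrative addition, not part of the original notebook.
# Illustrative metrics: mean absolute error and R^2 against the reference values
mae = np.mean(np.abs(predicted - Y_test))
r2 = 1 - np.sum((Y_test - predicted) ** 2) / np.sum((Y_test - Y_test.mean()) ** 2)
print(f"test MAE: {mae:.3f}, R^2: {r2:.3f}")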

IV. Summary
- The figure shows that predictions are less accurate at high SO2 concentrations, most likely because of nonlinear effects.
- How to remove nonlinear effects is a long-standing research topic in quantitative gas concentration analysis.
- Spectral preprocessing can be added to improve the model's accuracy.
- For example, the data could be improved with differential fitting, wavelet transforms, or Fourier transforms (see the sketch below).
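A minimal sketch of one such preprocessing step, Fourier-domain low-pass denoising, is shown below; the fft_lowpass helper and the cutoff of 50 coefficients are illustrative assumptions, not part of the original project.
# Illustrative Fourier-domain denoising (hypothetical helper): keep only the
# first `keep` low-frequency FFT coefficients of each spectrum
def fft_lowpass(spectrum, keep=50):
    coeffs = np.fft.rfft(spectrum)
    coeffs[keep:] = 0  # discard high-frequency components (mostly noise)
    return np.fft.irfft(coeffs, n=len(spectrum))

# Apply to the training spectra (all columns except the label)
denoised = np.apply_along_axis(fft_lowpass, 1, training_data[:, :-1])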
About the Author
CSDN New Star Creator in the artificial intelligence field
Member of the official Baidu PaddlePaddle support team
Tencent Cloud certified junior development engineer
Diamond level on AI Studio with 9 badges lit; feel free to follow each other!
This article is a repost.
Original project link