    Hyperspectral Estimation of Soil Organic Matter Content in Tobacco Fields Based on Generative Adversarial Networks

    • Abstract: Soil organic matter (SOM) is a key indicator of soil fertility and plays an important role in tobacco growth. In this study, soil samples were collected from tobacco fields in Hubei Province, and a generative adversarial network (GAN) was used to generate pseudo-samples that expand the modeling set. Spectra were preprocessed with standard normal variate (SNV) and multiplicative scatter correction (MSC), each combined with the first derivative (FD), the logarithm of the reciprocal (LR), and the first derivative of the logarithm of the reciprocal (LRFD). Sensitive feature bands were then selected using the Pearson correlation coefficient (PCC). Three machine learning methods, partial least squares regression (PLSR), random forest (RF), and back propagation neural networks (BPNN), were used to build SOM estimation models for the tobacco fields. The results show that: (1) after 25,000 training iterations, the GAN produced pseudo-samples with characteristics and patterns similar to those of the real samples; (2) after MSC+LRFD preprocessing, the correlation between full-band reflectance and SOM content increased, with a maximum correlation coefficient of 0.66; (3) with a pseudo-sample proportion of 150% and after feature band selection, the MSC+BPNN model achieved the best validation accuracy, with a coefficient of determination (R²), relative percent difference (RPD), and root mean square error (RMSE) of 0.80, 2.22, and 3.18, respectively. Compared with the best model built on the original dataset, model accuracy improved by 9.59%. The study confirms that adding GAN-generated pseudo-samples to the modeling set can effectively improve estimation performance, providing a new approach for SOM estimation in tobacco fields in complex mountainous areas.
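The preprocessing, band selection, and validation metrics named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the use of the mean spectrum as the MSC reference, the simple first difference standing in for the first derivative, the correlation threshold, and the definition of RPD as the standard deviation of the observed values divided by RMSE are all assumptions made for illustration.

```python
import numpy as np


def snv(spectra):
    """Standard normal variate: scale each spectrum (row) by its own mean and std."""
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, ddof=1, keepdims=True)
    return (spectra - mean) / std


def msc(spectra, reference=None):
    """Multiplicative scatter correction.

    Each spectrum is regressed on a reference spectrum (assumed here to be
    the mean spectrum) and corrected with the fitted slope and intercept.
    """
    ref = spectra.mean(axis=0) if reference is None else reference
    corrected = np.empty_like(spectra, dtype=float)
    for i, row in enumerate(spectra):
        slope, intercept = np.polyfit(ref, row, 1)
        corrected[i] = (row - intercept) / slope
    return corrected


def lrfd(spectra):
    """log10(1/R) followed by a first difference along the band axis.

    The first difference is a simple stand-in for the first derivative;
    reflectance values must be strictly positive.
    """
    return np.diff(np.log10(1.0 / spectra), axis=1)


def select_bands(features, som, threshold):
    """Return indices of bands with |Pearson r| >= threshold, plus all r values.

    The threshold is a hypothetical parameter, not a value from the abstract.
    """
    fc = features - features.mean(axis=0)
    yc = som - som.mean()
    r = (fc * yc[:, None]).sum(axis=0) / np.sqrt(
        (fc ** 2).sum(axis=0) * (yc ** 2).sum()
    )
    return np.flatnonzero(np.abs(r) >= threshold), r


def validation_metrics(y_true, y_pred):
    """Return (R2, RMSE, RPD), with RPD assumed to be std(observed) / RMSE."""
    residual = y_true - y_pred
    rmse = np.sqrt(np.mean(residual ** 2))
    r2 = 1.0 - (residual ** 2).sum() / ((y_true - y_true.mean()) ** 2).sum()
    rpd = y_true.std(ddof=1) / rmse
    return r2, rmse, rpd
```

Under these assumptions, an MSC+LRFD pipeline would be `lrfd(msc(reflectance))`, with `select_bands` applied to the result before fitting a PLSR, RF, or BPNN model on the retained bands.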

       
