Abstract:
Soil organic matter (SOM) is a crucial indicator of soil fertility and plays an important role in tobacco growth. In this study, soil samples were collected from tobacco fields in Hubei Province, and a generative adversarial network (GAN) was used to generate pseudo-samples to expand the modeling set. Reflectance data were preprocessed using standard normal variate (SNV), multiplicative scatter correction (MSC), first derivative (FD), logarithm reciprocal (LR), and logarithm reciprocal first derivative (LRFD) transformations. Sensitive spectral bands were selected based on Pearson correlation coefficients. Partial least squares regression (PLSR), random forest (RF), and back propagation neural network (BPNN) models were then constructed to estimate SOM in the tobacco fields. The results were as follows: (1) After the GAN was trained for 25,000 iterations, the generated pseudo-samples showed characteristics and patterns similar to those of the real samples. (2) After MSC+LRFD preprocessing, the correlation between full-band spectral reflectance and SOM content increased, with the correlation coefficient reaching up to 0.66. (3) When the pseudo-sample quantity reached 150%, the MSC+BPNN model built after feature band selection showed the best validation accuracy, with a coefficient of determination (R²), relative percent difference (RPD), and root mean square error (RMSE) of 0.80, 2.22, and 3.18, respectively. Compared with the optimal model constructed from the original dataset, model accuracy improved by 9.59%. These results confirm that adding GAN-generated pseudo-samples to the modeling set effectively enhanced model estimation performance, providing a new approach for SOM estimation in complex mountainous tobacco fields.
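
The spectral transformations, band selection, and accuracy metrics named above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the unit-spacing finite difference used for the first derivative, and the 0.5 correlation threshold are all hypothetical, and RPD is computed here as the standard deviation of the observed values divided by RMSE, which is the common chemometrics definition and may differ from the one used in the study. The GAN and the PLSR, RF, and BPNN models are not reproduced.

```python
import numpy as np

def snv(X):
    """Standard normal variate: center and scale each spectrum (row) by its own mean and std."""
    return (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, ddof=1, keepdims=True)

def msc(X, reference=None):
    """Multiplicative scatter correction: regress each spectrum on a reference
    (the mean spectrum by default) and remove the fitted offset and slope."""
    ref = X.mean(axis=0) if reference is None else reference
    out = np.empty_like(X, dtype=float)
    for i, row in enumerate(X):
        slope, intercept = np.polyfit(ref, row, 1)  # row ~ slope * ref + intercept
        out[i] = (row - intercept) / slope
    return out

def log_reciprocal(X):
    """Logarithm reciprocal, log10(1/R); assumes strictly positive reflectance."""
    return np.log10(1.0 / X)

def first_derivative(X):
    """First difference along the band axis (assumes evenly spaced bands)."""
    return np.diff(X, axis=1)

def pearson_by_band(X, y):
    """Pearson correlation between each band (column of X) and the target y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    return (Xc * yc[:, None]).sum(axis=0) / np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())

def select_bands(r, threshold=0.5):
    """Indices of bands with |r| >= threshold (threshold value is an assumption)."""
    return np.flatnonzero(np.abs(r) >= threshold)

def evaluate(y_true, y_pred):
    """Return (R2, RPD, RMSE), with RPD assumed to be SD(observed) / RMSE."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
    r2 = 1.0 - np.sum((y_true - y_pred) ** 2) / np.sum((y_true - y_true.mean()) ** 2)
    rpd = np.std(y_true, ddof=1) / rmse
    return r2, rpd, rmse
```

Under these assumptions, a combined MSC+LRFD transform would be `first_derivative(log_reciprocal(msc(X)))`, and the selected band indices would then be used to subset the columns before model fitting.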