      Construction of a Tobacco Leaf Origin Prediction Model Based on Multi-Model Ensemble and Metabolomics

        Abstract: Traditional methods for identifying tobacco production regions suffer from insufficient feature coverage and strong subjectivity. To address these limitations, this study proposes a tobacco leaf origin prediction model based on multi-feature selection and ensemble learning. The dataset comprised 576 tobacco samples: middle-leaf strips from five major production regions and upper-leaf strips from comprehensive modules. After preprocessing 7,024 aroma-component features, we combined analysis of variance (ANOVA), the Pearson correlation coefficient, and LightGBM feature importance to select 130 highly discriminative features, a dimensionality reduction of 98.15%. The model integrates support vector machine (SVM), random forest (RF), and multilayer perceptron (MLP) classifiers, with hyperparameters tuned by grid search and predictions fused by weighted voting. In 4-fold cross-validation, the ensemble achieved macro-averaged precision, recall, and F1 score of 1.0, significantly outperforming the individual models and correctly classifying every sample. The study reveals nonlinear relationships between aroma components and ecological regions and provides a highly accurate method for tobacco origin identification, with theoretical and practical value for tobacco quality traceability and production process optimization.
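
The weighted voting, macro-averaged metrics, and dimensionality reduction rate mentioned in the abstract can be illustrated with a minimal standard-library sketch. This is not the authors' implementation: the abstract does not say whether voting is hard or soft, nor which weights were used, so the soft-voting rule, the weights, the class probabilities, and the labels `region_1` and `region_2` below are all hypothetical.

```python
from collections import defaultdict


def weighted_vote(prob_by_model, weights):
    """Weighted soft voting: sum each model's class probabilities,
    scaled by that model's weight, and return the top-scoring label."""
    scores = defaultdict(float)
    for model, probs in prob_by_model.items():
        for label, p in probs.items():
            scores[label] += weights[model] * p
    # Iterate labels in sorted order so ties are broken deterministically.
    return max(sorted(scores), key=lambda label: scores[label])


def macro_scores(y_true, y_pred):
    """Macro-averaged precision, recall, and F1: compute each metric
    per class, then take the unweighted mean over classes."""
    labels = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(labels)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


def reduction_rate(n_original, n_selected):
    """Fraction of features removed by feature selection."""
    return 1 - n_selected / n_original


# Hypothetical weights and per-model probabilities for one sample.
example_weights = {"svm": 0.4, "rf": 0.3, "mlp": 0.3}
example_probs = {
    "svm": {"region_1": 0.9, "region_2": 0.1},
    "rf": {"region_1": 0.4, "region_2": 0.6},
    "mlp": {"region_1": 0.45, "region_2": 0.55},
}

predicted = weighted_vote(example_probs, example_weights)
rate_percent = round(reduction_rate(7024, 130) * 100, 2)
```

In this made-up sample, two of the three models lean toward `region_2`, yet the weighted sum favors `region_1` because the more heavily weighted model is confident. That is the kind of case where weighted voting differs from a simple majority vote. The last line reproduces the 98.15% reduction reported for selecting 130 of 7,024 features.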

         
