    Study on Quality Prediction Model of Shandong Tobacco Based on XGBoost Algorithm

    Abstract: This study examined the relationship between the chemical components and the sensory quality of tobacco leaves, and assessed how well machine learning algorithms perform in tobacco quality evaluation. Shandong tobacco leaves were used as experimental materials. Twenty major chemical components, covering conventional components, alkaloids, organic acids, polyphenols, and mono- and disaccharides, were measured, and sensory quality was evaluated. Based on sensory quality, the samples were divided into three quality grades: good, medium, and poor. A genetic algorithm was used to optimize the hyperparameters of XGBoost, yielding a model that predicts the quality grade of Shandong tobacco leaves from their chemical composition. The SHAP value framework was introduced to interpret the model, both globally and through feature dependence analysis. The model discriminated the quality grades of Shandong tobacco leaves with an accuracy of 85% on the test set, and identified the third quality grade best. The global SHAP interpretation ranked the contributions of seven feature indicators to the quality of Shandong flue-cured tobacco as follows: acid-phenol ratio > sucrose > chlorine > nicotine > nornicotine > citric acid > sugar-nicotine ratio. The sugar-nicotine ratio, sucrose, and the acid-phenol ratio contributed most to discriminating the good, medium, and poor grades, respectively. The XGBoost-based quality prediction model for Shandong tobacco is effective, reliable, and highly interpretable for discriminating tobacco quality grades, and offers some guidance for tobacco quality evaluation and tobacco production.
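The abstract does not say how the genetic algorithm was configured, so the following is only a generic sketch of GA-based hyperparameter search. It assumes a hypothetical search space over four real XGBoost parameters (`max_depth`, `learning_rate`, `n_estimators`, `subsample`), with tournament selection, uniform crossover, random-reset mutation, and elitism. `toy_fitness` is a hypothetical stand-in so the example runs without dependencies. In practice the fitness would more likely be a cross-validated score, for example `cross_val_score(XGBClassifier(**params), X, y, cv=5).mean()` from scikit-learn and xgboost, with `X` holding the chemical indicators and `y` the quality grades.

```python
import random
from typing import Callable, Dict, List, Tuple

Params = Dict[str, float]

# Hypothetical search space: real XGBoost parameter names, but the paper does
# not report which hyperparameters or ranges its genetic algorithm searched.
SEARCH_SPACE: Dict[str, Tuple[float, float]] = {
    "max_depth": (2, 10),
    "learning_rate": (0.01, 0.3),
    "n_estimators": (50, 500),
    "subsample": (0.5, 1.0),
}
INT_PARAMS = {"max_depth", "n_estimators"}


def sample_gene(name: str, rng: random.Random) -> float:
    """Draw one value for a hyperparameter, rounding integer-valued ones."""
    lo, hi = SEARCH_SPACE[name]
    value = rng.uniform(lo, hi)
    return round(value) if name in INT_PARAMS else value


def random_individual(rng: random.Random) -> Params:
    return {name: sample_gene(name, rng) for name in SEARCH_SPACE}


def tournament(scored: List[Tuple[float, Params]], rng: random.Random, k: int = 3) -> Params:
    """Tournament selection: the fittest of k randomly chosen individuals wins."""
    return max(rng.sample(scored, k), key=lambda t: t[0])[1]


def crossover(a: Params, b: Params, rng: random.Random) -> Params:
    """Uniform crossover: each hyperparameter is inherited from a random parent."""
    return {name: (a if rng.random() < 0.5 else b)[name] for name in SEARCH_SPACE}


def mutate(ind: Params, rng: random.Random, rate: float = 0.2) -> Params:
    """Resample each hyperparameter independently with probability `rate`."""
    return {name: sample_gene(name, rng) if rng.random() < rate else value
            for name, value in ind.items()}


def genetic_search(fitness: Callable[[Params], float], pop_size: int = 20,
                   generations: int = 15, elite: int = 2,
                   seed: int = 0) -> Tuple[Params, float]:
    """Maximize `fitness` over SEARCH_SPACE; return the best individual seen."""
    rng = random.Random(seed)
    population = [random_individual(rng) for _ in range(pop_size)]
    best, best_score = population[0], float("-inf")
    for _ in range(generations):
        scored = sorted(((fitness(p), p) for p in population),
                        key=lambda t: t[0], reverse=True)
        if scored[0][0] > best_score:
            best_score, best = scored[0]
        # Elitism: the top `elite` individuals survive unchanged.
        next_gen = [p for _, p in scored[:elite]]
        while len(next_gen) < pop_size:
            child = crossover(tournament(scored, rng), tournament(scored, rng), rng)
            next_gen.append(mutate(child, rng))
        population = next_gen
    return best, best_score


def toy_fitness(p: Params) -> float:
    """Hypothetical stand-in objective peaking at max_depth=6, learning_rate=0.1."""
    return -(p["max_depth"] - 6) ** 2 - 100 * (p["learning_rate"] - 0.1) ** 2


best, score = genetic_search(toy_fitness)
print(best, score)
```

Rounding integer-valued genes at sampling time keeps every individual directly usable as XGBoost keyword arguments, and elitism ensures the best fitness in the population never decreases from one generation to the next.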
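The reported 85% is an overall accuracy, while "identified the third grade best" is a per-grade statement. The abstract does not name the per-grade metric, so the sketch below assumes per-class recall (the share of each true grade predicted correctly), computed from a confusion matrix. The labels in the usage lines are hypothetical and are not data from the study.

```python
from typing import Dict, List, Sequence

GRADES = ("good", "medium", "poor")  # the three sensory quality grades


def confusion_matrix(y_true: Sequence[str], y_pred: Sequence[str],
                     labels: Sequence[str] = GRADES) -> List[List[int]]:
    """Count predictions: rows are true grades, columns are predicted grades."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def accuracy(matrix: List[List[int]]) -> float:
    """Share of all samples on the diagonal (graded correctly)."""
    total = sum(sum(row) for row in matrix)
    return sum(matrix[i][i] for i in range(len(matrix))) / total


def per_class_recall(matrix: List[List[int]],
                     labels: Sequence[str] = GRADES) -> Dict[str, float]:
    """Share of each true grade that the model assigned to that same grade."""
    return {label: matrix[i][i] / sum(matrix[i]) if sum(matrix[i]) else 0.0
            for i, label in enumerate(labels)}


# Hypothetical labels for illustration only, not data from the study.
y_true = ["good", "good", "medium", "medium", "poor", "poor"]
y_pred = ["good", "medium", "medium", "good", "poor", "poor"]
cm = confusion_matrix(y_true, y_pred)
print(cm, accuracy(cm), per_class_recall(cm))
```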
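The abstract reports two kinds of SHAP findings: an overall ranking of seven indicators, and the single indicator contributing most to each grade. The sketch below shows one plausible way to derive both from per-class SHAP values of a three-class model. It assumes global importance is the mean absolute SHAP value per feature, summed across classes, which is a common convention; the paper does not state its aggregation. With the real `shap` package, `shap.TreeExplainer(model).shap_values(X)` returns per-class values for a multiclass XGBoost model (as a list of arrays or a 3-D array, depending on the version) that could be rearranged into the dictionary used here. The feature names and numbers below are made up and are not results from the study.

```python
from typing import Dict, List, Sequence, Tuple

# shap_values[cls][i][j]: SHAP value of feature j for sample i toward class cls.
ShapByClass = Dict[str, Sequence[Sequence[float]]]


def mean_abs_shap(values: Sequence[Sequence[float]],
                  features: Sequence[str]) -> Dict[str, float]:
    """Mean |SHAP| of each feature over all samples, for one class."""
    n = len(values)
    return {f: sum(abs(row[j]) for row in values) / n for j, f in enumerate(features)}


def global_ranking(shap_values: ShapByClass,
                   features: Sequence[str]) -> List[Tuple[str, float]]:
    """Sum per-class mean |SHAP| for each feature and sort in descending order."""
    totals = {f: 0.0 for f in features}
    for values in shap_values.values():
        for f, v in mean_abs_shap(values, features).items():
            totals[f] += v
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


def top_feature_per_class(shap_values: ShapByClass,
                          features: Sequence[str]) -> Dict[str, str]:
    """Feature with the largest mean |SHAP| for each class."""
    return {cls: max(mean_abs_shap(v, features).items(), key=lambda kv: kv[1])[0]
            for cls, v in shap_values.items()}


# Made-up SHAP values for three hypothetical features; not results from the study.
features = ["feat_a", "feat_b", "feat_c"]
shap_values = {
    "good":   [[0.2, -0.1, 0.0], [-0.4, 0.1, 0.1]],
    "medium": [[0.0, 0.3, -0.1], [0.1, -0.5, 0.1]],
    "poor":   [[0.1, 0.0, 0.6], [-0.1, 0.2, -0.2]],
}
print(global_ranking(shap_values, features))
print(top_feature_per_class(shap_values, features))
```

Taking absolute values before averaging matters: a feature that pushes some samples strongly toward a grade and others strongly away from it would otherwise appear unimportant.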

       
