Abstract:
To explore the relationship between the chemical components and the sensory quality of tobacco leaves, this study examined how well machine learning algorithms perform in tobacco quality evaluation. Shandong tobacco samples were used as experimental material: 20 chemical components were determined, covering conventional components, alkaloids, organic acids, polyphenols, and mono- and disaccharides, and sensory quality was evaluated. Based on sensory quality, the tobacco leaves were divided into three quality grades: good, medium, and poor. A genetic algorithm was used to optimize the hyperparameters of XGBoost, and a model predicting the quality grade of Shandong tobacco leaves from chemical composition was established. The SHAP value model-interpretation framework was then introduced for global interpretation and feature dependence analysis. The model reached an accuracy of 85% on the test set and performed best at identifying third-class tobacco. SHAP values ranked the contributions to the quality grade of Shandong flue-cured tobacco as follows: acid-phenol ratio > sucrose > chlorine > nicotine > nornicotine > citric acid > sugar-nicotine ratio; among these, the sugar-nicotine ratio, sucrose, and the acid-phenol ratio were the most important chemical indexes for distinguishing the three quality grades. The XGBoost-based quality prediction model for Shandong tobacco is effective, reliable, and highly interpretable in tobacco quality discrimination, and it offers some guidance for tobacco quality evaluation and tobacco production.
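The genetic-algorithm search over hyperparameters mentioned above can be illustrated with a minimal, standard-library sketch. It is not the study's implementation: the search space, its bounds, the operators (uniform crossover, resampling mutation, elitism), and the names `SEARCH_SPACE`, `genetic_search`, and `toy_fitness` are all assumptions made for illustration. In a real setting the fitness function would presumably be something like the cross-validated accuracy of an XGBoost classifier trained with the candidate hyperparameters; here a toy function stands in so the example runs without external packages.

```python
import random

# Hypothetical search space. The keys mirror common XGBoost hyperparameter
# names, but the bounds are illustrative assumptions, not values from the study.
SEARCH_SPACE = {
    "max_depth": (2, 10),          # integer-valued
    "learning_rate": (0.01, 0.3),  # real-valued
    "subsample": (0.5, 1.0),       # real-valued
}

def random_individual(rng):
    """Draw one candidate hyperparameter set uniformly from the search space."""
    individual = {}
    for name, (low, high) in SEARCH_SPACE.items():
        if isinstance(low, int) and isinstance(high, int):
            individual[name] = rng.randint(low, high)
        else:
            individual[name] = rng.uniform(low, high)
    return individual

def crossover(parent_a, parent_b, rng):
    """Uniform crossover: each gene is copied from one parent at random."""
    return {k: (parent_a[k] if rng.random() < 0.5 else parent_b[k])
            for k in parent_a}

def mutate(individual, rng, rate=0.2):
    """With probability `rate`, resample a gene from its allowed range."""
    fresh = random_individual(rng)
    return {k: (fresh[k] if rng.random() < rate else v)
            for k, v in individual.items()}

def genetic_search(fitness, pop_size=20, generations=15, elite=2, seed=0):
    """Maximize `fitness` over SEARCH_SPACE with a simple elitist GA."""
    rng = random.Random(seed)
    population = [random_individual(rng) for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: pop_size // 2]   # truncation selection
        children = ranked[:elite]           # elitism: keep the best unchanged
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), rng))
        population = children
    return max(population, key=fitness)

def toy_fitness(params):
    """Stand-in objective (assumption): peaks at an arbitrary target point.
    A real run would return a validation score of the trained model."""
    return -((params["max_depth"] - 6) ** 2
             + (params["learning_rate"] - 0.1) ** 2
             + (params["subsample"] - 0.8) ** 2)

best = genetic_search(toy_fitness)
print(best)
```

Because the best individuals are carried over unchanged each generation, the best fitness found never decreases as the number of generations grows; the trade-off is the cost of evaluating `fitness` once per individual per generation, which dominates when each evaluation trains a model.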