Similarity Measurement Method for Tobacco Leaf Quality in High-Dimensional Space
Abstract: To judge the similarity of tobacco leaf quality in high-dimensional data space, this study proposes a similarity measurement method based on locally linear embedding (LLE) that uses a kernel transformation and geodesic distance. The method was verified through feature analysis and similarity measurement experiments on the quality distribution features of 450 redried strip tobacco samples. The results showed that principal component analysis (PCA), a linear dimensionality reduction method, could reflect the inherent nonlinear characteristics of the raw-material quality data, but its sample points overlapped heavily, whereas the geodesic-distance LLE method separated the sample classes well and was well suited to the domain data. In the similarity search, the embedding mapping retrieved more tobacco leaves of the same producing area, the same stalk position, and a similar grade than searches on the original data set or on the PCA-transformed data set. The method can effectively address the requirement in traditional raw-material similarity measurement for a distance-preserving (isometric) mapping into low-dimensional space.
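The geodesic distance that the method relies on can be illustrated with a minimal sketch. It is not the authors' implementation: it covers only the geodesic-distance step (shortest paths over a k-nearest-neighbor graph) and leaves out the kernel transformation and the LLE embedding. The function names (`knn_graph`, `geodesic_distances`, `most_similar`), the parameter values, and the toy semicircle data are all assumptions made for illustration, not tobacco quality data.

```python
import heapq
import math


def knn_graph(points, k):
    """Link each point to its k nearest Euclidean neighbors (undirected graph)."""
    n = len(points)
    graph = {i: {} for i in range(n)}
    for i in range(n):
        nearest = sorted(
            (math.dist(points[i], points[j]), j) for j in range(n) if j != i
        )
        for d, j in nearest[:k]:
            graph[i][j] = d
            graph[j][i] = d  # store the edge in both directions
    return graph


def geodesic_distances(graph, source):
    """Dijkstra shortest-path lengths from source, approximating geodesic distance."""
    dist = {node: math.inf for node in graph}
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist[v]:
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def most_similar(points, query_index, k_graph=2, top=3):
    """Rank the other samples by geodesic distance to the query sample."""
    dist = geodesic_distances(knn_graph(points, k_graph), query_index)
    ranked = sorted((d, i) for i, d in dist.items() if i != query_index)
    return [i for _, i in ranked[:top]]


# Toy data (hypothetical): 7 samples spaced evenly along a semicircle.
points = [(math.cos(t), math.sin(t)) for t in (i * math.pi / 6 for i in range(7))]

euclidean_ends = math.dist(points[0], points[6])
geodesic_ends = geodesic_distances(knn_graph(points, 2), 0)[6]
neighbors_of_first = most_similar(points, 0)
```

On this toy curve the two end samples are farther apart along the graph than in a straight line, which is the property that lets a geodesic neighborhood follow a curved data manifold where a Euclidean one would cut across it.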