Abstract:
Single nucleotide polymorphisms (SNPs), which are third-generation molecular markers, are widely used. A total of 317 175 expressed sequence tags (ESTs) derived from different tobacco cDNA libraries in GenBank were used to identify high-quality candidate SNPs. In a redundancy-based approach, SNPs were considered valid when they were represented multiple times in an alignment of sequence reads. A second measure of validity was calculated from the cosegregation of the SNP pattern among multiple SNP loci within an alignment. CAP3 assembly yielded 15 429 contigs containing at least four reads each, and 53 477 candidate SNPs or insertions/deletions (indels) were identified. The transition/transversion ratio was 1.67:1, and the indel sequences showed a bias toward A and T nucleotides. Based on sequence diversity, the SNP density of tobacco was estimated at 0.34%. These markers can contribute to functional gene research and molecular breeding in tobacco.
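As a rough illustration of the redundancy criterion and the transition/transversion tally described above, the sketch below scans a toy read alignment column by column. It assumes the aligned reads are equal-length strings with '-' for a gap and '.' for a position the read does not cover. A column is treated as a candidate variant when it is covered by at least four reads and at least two alleles are each seen in two or more reads, so a single-read mismatch (a likely sequencing error) is ignored. The thresholds and the function names (`candidate_snps`, `classify`, `ti_tv_ratio`) are hypothetical and not the study's actual pipeline, and the cosegregation-based validity score is not modeled here.

```python
from collections import Counter

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def candidate_snps(reads, min_allele_count=2, min_depth=4):
    """Return (position, sorted supported alleles) for redundancy-supported variants.

    Assumed convention: '-' is an alignment gap, '.' means the read does not
    cover that column. Thresholds are illustrative, not from the study.
    """
    if not reads:
        return []
    hits = []
    for pos in range(len(reads[0])):
        # Collect the bases of every read that covers this column.
        column = [r[pos] for r in reads if r[pos] != "."]
        if len(column) < min_depth:
            continue
        counts = Counter(column)
        # Redundancy rule: keep only alleles seen in several independent reads.
        supported = sorted(a for a, n in counts.items() if n >= min_allele_count)
        if len(supported) >= 2:
            hits.append((pos, supported))
    return hits


def classify(alleles):
    """Label a variant as indel, transition, transversion, or multiallelic."""
    if len(alleles) != 2:
        return "multiallelic"
    if "-" in alleles:
        return "indel"
    pair = set(alleles)
    # Transitions stay within purines (A/G) or within pyrimidines (C/T).
    if pair <= PURINES or pair <= PYRIMIDINES:
        return "transition"
    return "transversion"


def ti_tv_ratio(snps):
    """Transition/transversion ratio; None when no transversions are found."""
    labels = Counter(classify(alleles) for _, alleles in snps)
    tv = labels["transversion"]
    return labels["transition"] / tv if tv else None


# Toy alignment (hypothetical data): one C/T SNP, one C/gap indel, one A/T SNP,
# and a single-read G->A mismatch at position 2 that the redundancy rule rejects.
reads = [
    "ACGTACGT",
    "ACGTACGT",
    "ATGTA-GT",
    "ATGTA-GA",
    "ACATACGA",
    "..GTACG.",
]

snps = candidate_snps(reads)
print(snps)               # [(1, ['C', 'T']), (5, ['-', 'C']), (7, ['A', 'T'])]
print(ti_tv_ratio(snps))  # 1.0
```

Requiring each allele to appear in more than one read is what separates this from simply listing every mismatch: a real polymorphism should recur across independent ESTs, while a base-calling error usually appears once.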