Classification of cannabis strains in the Canadian market with discriminant analysis of principal components using genome-wide single nucleotide polymorphisms
Dan Jin | Philippe Henry | Jie Shan | Jie Chen
This study performed whole-genome sequencing of 23 cannabis strains marketed in Canada, aligned the sequences to a reference genome, and, after filtering for a minor allele frequency of 10%, identified 137,858 single nucleotide polymorphisms (SNPs). Discriminant analysis of principal components (DAPC) applied to these SNPs further identified 344 structural SNPs, which classified the individual strains into five chemotype-aligned groups: one CBD-dominant, one balanced, and three THC-dominant clusters. This suggests there may be a relatively limited selection of CBD-dominant strains for breeding balanced strains. DAPC was then repeated using only the 344 identified structural SNPs. Strain 1-balanced sits closer to the THC-dominant strains regardless of whether the whole set of SNPs or the 344 identified SNPs were used.
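The minor allele frequency filter mentioned above can be illustrated with a small sketch. It is not the authors' pipeline: the genotype encoding (alternate-allele counts 0, 1, 2 per diploid sample, with -1 for a missing call), the function names, the toy matrix, and the choice to keep SNPs at or above the 10% threshold are all assumptions made for illustration.

```python
from typing import List


def minor_allele_frequency(genotypes: List[int]) -> float:
    """Return the MAF for one SNP.

    Assumed encoding: each value is the alternate-allele count of a
    diploid sample (0, 1 or 2); -1 marks a missing call and is skipped.
    """
    called = [g for g in genotypes if g >= 0]
    if not called:
        return 0.0  # no information, treat as monomorphic
    alt_freq = sum(called) / (2 * len(called))
    return min(alt_freq, 1.0 - alt_freq)


def filter_by_maf(matrix: List[List[int]], threshold: float = 0.10) -> List[int]:
    """Return the row indices of SNPs whose MAF is at least `threshold`."""
    return [
        i for i, row in enumerate(matrix)
        if minor_allele_frequency(row) >= threshold
    ]


# Hypothetical toy data: 4 SNPs (rows) scored in 10 samples (columns).
toy = [
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 1],   # rare alternate allele
    [0, 1, 2, 1, 0, 0, 1, 2, 0, 1],   # common variant
    [2, 2, 2, 2, 2, 2, 2, 2, 2, 2],   # fixed for the alternate allele
    [0, 0, 1, -1, 0, 0, 0, 1, 0, 0],  # one missing call
]

kept = filter_by_maf(toy)
print(kept)
```

In practice this step is usually done with a dedicated variant-filtering tool on the called variants rather than by hand; the sketch only shows what the 10% threshold means for an individual SNP.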
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8238227/