KTU: K‐mer Taxonomic Units improve the biological relevance of amplicon sequence variant microbiota data

Bibliographic Details
Published in Methods in ecology and evolution Vol. 13; no. 3; pp. 560-568
Main Authors Liu, Po‐Yu; Yang, Shan‐Hua; Yang, Sung‐Yin
Format Journal Article
Language English
Published London John Wiley & Sons, Inc 01.03.2022
Summary: Amplicon sequencing is widely used in microbiome‐associated studies. In recent years, microbial ecologists have switched to new algorithms for taxonomic identification and quantification: the amplicon sequence variant (ASV) denoising algorithm, which picks sequences without bias, has replaced OTU clustering methods. ASVs can be used to detect and distinguish biological variation to the species OTU level (≥97% similarity). However, ASV counts are sparse across samples, and individual ASVs are less prevalent within the same batch. Here, we present 'KTU' (K‐mer Taxonomic Unit), a k‐mer‐based, alignment‐free algorithm that iteratively re‐clusters ASVs into optimal biological taxonomic units. The KTU algorithm comprises four parts: (a) K‐mer frequencies are called by counting tetranucleotides with a sliding window from both ends of the DNA sequence. (b) The similarity in k‐mer frequencies between sequences is measured by cosine dissimilarity. (c) KTUs are detected from the cosine dissimilarity matrix using the partition around medoids (PAM) clustering algorithm; the iterative PAM‐KTU detection process searches for the number of convergent KTU clusters according to the maximum silhouette coefficient. (d) Finally, the ASVs are aggregated into their corresponding KTUs. KTU re‐clustered every 1.38–4.53 ASVs into one feature, with >99% sequence similarity on average and 1% cosine divergence for each KTU. Additionally, the re‐clustering procedure improved the biological interpretation of correlations with clinical and environmental factors and of their significance.
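Steps (a) and (b) of the summary can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names are hypothetical, and reading "from both ends of the DNA sequence" as counting on both the forward strand and its reverse complement is an assumption made here for illustration.

```python
from itertools import product
from math import sqrt

K = 4  # tetranucleotides, as in the summary
KMERS = ["".join(p) for p in product("ACGT", repeat=K)]  # all 256 possible 4-mers
INDEX = {kmer: i for i, kmer in enumerate(KMERS)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def tetra_freq(seq, both_strands=True):
    """Relative tetranucleotide frequencies from a sliding window of step 1.

    ASSUMPTION: 'both ends' is interpreted as forward strand plus reverse
    complement; set both_strands=False to count the forward strand only.
    """
    seq = seq.upper()
    strands = [seq, reverse_complement(seq)] if both_strands else [seq]
    counts = [0] * len(KMERS)
    for strand in strands:
        for i in range(len(strand) - K + 1):
            idx = INDEX.get(strand[i:i + K])
            if idx is not None:  # skip windows containing ambiguous bases
                counts[idx] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else counts


def cosine_dissimilarity(u, v):
    """1 - cosine similarity between two frequency vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm if norm else 1.0
```

Calling `tetra_freq` on every ASV sequence and `cosine_dissimilarity` on every pair of the resulting vectors would give the pairwise dissimilarity matrix that step (c) clusters.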
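Abstract: Amplicon sequencing is one of the main tools in microbiome research. In recent years, microbial ecologists have adopted a new algorithm for species identification and quantification, amplicon sequence variant (ASV) denoising, in place of traditional OTU clustering. Although ASVs offer high resolution for distinguishing species, quantifying microbial taxa with ASV denoising leaves single features (that is, single ASVs) excessively sparse across samples (zero inflation). This study develops 'KTU' (K‐mer Taxonomic Unit), an alignment‐free clustering algorithm based on sequence k‐mer frequencies, which re‐clusters ASVs so that the taxonomic units are more biologically meaningful. The KTU algorithm has four parts: 1) Tetranucleotide frequencies are called in both directions of each DNA sequence with a sliding window, converting each sequence into a profile of 256 tetranucleotide frequencies. 2) The cosine dissimilarity between every pair of sequences is computed. 3) The partition around medoids (PAM) clustering algorithm is applied to the cosine dissimilarities, and the number of new taxonomic units at which the clustering best converges is searched iteratively according to the maximum silhouette coefficient. 4) Finally, the clustered ASVs are merged into their corresponding new KTUs. In tests on several published datasets, KTU re‐clustered 1.38–4.53 ASVs per KTU on average, with >99% sequence similarity and 1% cosine divergence within each new KTU. In addition, KTU re‐clustering strengthened the correlations between microbiota data and clinical or environmental factors, better revealing their biological meaning.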
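The clustering and merging steps (parts 3 and 4) can be sketched as follows. This is an illustration under stated assumptions, not the published procedure: it swaps in a simple alternating k-medoids update for the full PAM swap search, replaces the iterative search with a plain scan over candidate cluster numbers, and uses hypothetical function names throughout. The input `D` is assumed to be a symmetric dissimilarity matrix given as a list of lists.

```python
def assign(D, medoids):
    """Label each item with the index of its nearest medoid."""
    return [min(range(len(medoids)), key=lambda c: D[i][medoids[c]])
            for i in range(len(D))]


def k_medoids(D, k, max_iter=100):
    """Alternating k-medoids (a simplification of PAM, used here for brevity)."""
    n = len(D)
    # Deterministic start: most central item, then farthest-point additions.
    medoids = [min(range(n), key=lambda i: sum(D[i]))]
    while len(medoids) < k:
        medoids.append(max(range(n), key=lambda i: min(D[i][m] for m in medoids)))
    for _ in range(max_iter):
        labels = assign(D, medoids)
        new = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            # Keep the old medoid if a cluster ends up empty.
            new.append(min(members, key=lambda i: sum(D[i][j] for j in members))
                       if members else medoids[c])
        if new == medoids:
            break
        medoids = new
    return assign(D, medoids)


def mean_silhouette(D, labels):
    """Mean silhouette coefficient; singleton clusters contribute 0."""
    clusters = {}
    for i, c in enumerate(labels):
        clusters.setdefault(c, []).append(i)
    if len(clusters) < 2:
        return -1.0  # undefined for one cluster; never preferred
    total = 0.0
    for i, c in enumerate(labels):
        own = clusters[c]
        if len(own) == 1:
            continue
        a = sum(D[i][j] for j in own) / (len(own) - 1)
        b = min(sum(D[i][j] for j in other) / len(other)
                for oc, other in clusters.items() if oc != c)
        total += (b - a) / max(a, b) if max(a, b) > 0 else 0.0
    return total / len(labels)


def best_partition(D, k_values):
    """Scan candidate k and keep the partition with the highest silhouette."""
    best = None
    for k in k_values:
        labels = k_medoids(D, k)
        score = mean_silhouette(D, labels)
        if best is None or score > best[0]:
            best = (score, k, labels)
    return best


def aggregate(asv_counts, labels):
    """Sum per-sample ASV counts (one row per ASV) into one row per cluster."""
    ktu = {}
    for row, c in zip(asv_counts, labels):
        acc = ktu.setdefault(c, [0] * len(row))
        for s, v in enumerate(row):
            acc[s] += v
    return ktu
```

Given a dissimilarity matrix and an ASV count table with rows in the same order, `best_partition(D, range(2, len(D)))` returns a `(score, k, labels)` tuple, and `aggregate(asv_counts, labels)` then yields the merged count rows keyed by cluster label.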
Bibliography: Shan‐Hua Yang and Sung‐Yin Yang contributed equally to this work.
Handling Editor: Daniele Silvestro
ISSN: 2041-210X
DOI: 10.1111/2041-210X.13758