LDSSNV: A Linkage Disequilibrium-Based Method for the Detection of Somatic Single-Nucleotide Variants


Bibliographic Details
Published in IEEE/ACM Transactions on Computational Biology and Bioinformatics Vol. 20; no. 5; pp. 3020-3032
Main Authors Lan, Jingfen; Chen, Wenxiang; Yin, Ganggang; Haque, A. K. Alvi; Xie, Kun; Yu, Qiang; Yuan, Xiguo
Format Journal Article
Language English
Published United States IEEE 01.09.2023
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Summary: Single nucleotide variants (SNVs) are very common in the human genome and have a significant effect on cellular proliferation and tumorigenesis in various cancers. Somatic and germline variants are the two forms of SNVs. Germline variants are major drivers of inherited diseases, and somatic variants are major drivers of acquired tumors. Careful analysis of next-generation sequencing data from cancer genomes can provide crucial information for cancer diagnosis and treatment. Accurately detecting SNVs and distinguishing the two forms remain challenging tasks in cancer analysis. Herein, we propose a new approach, LDSSNV, to detect somatic SNVs without matched normal samples. LDSSNV predicts SNVs by training an XGBoost classifier on a concise combination of features, and distinguishes the two forms based on linkage disequilibrium, which is a characteristic of germline mutations. LDSSNV provides two modes for distinguishing somatic variants from germline variants: the single-mode, which uses a single tumor sample, and the multiple-mode, which uses multiple tumor samples. The performance of the proposed method is assessed on both simulated data and real sequencing datasets. The analysis shows that LDSSNV outperforms competing methods and can become a robust and reliable tool for analyzing tumor genome variation.
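Note: The summary names linkage disequilibrium as the signal used to separate germline from somatic variants but does not give a formula. As a rough illustration only, the sketch below computes the standard r^2 statistic between two biallelic sites from phased 0/1 alleles. The function name ld_r_squared and the input encoding are assumptions made for this example; this is not the LDSSNV implementation, and the record does not say which LD measure the authors use.

```python
from typing import Sequence


def ld_r_squared(site_a: Sequence[int], site_b: Sequence[int]) -> float:
    """Return r^2 between two biallelic sites.

    Each input lists one allele per haplotype, coded 0 (reference) or
    1 (alternate). This encoding is an assumption for illustration.
    """
    if len(site_a) != len(site_b) or len(site_a) == 0:
        raise ValueError("sites must be non-empty and of equal length")

    n = len(site_a)
    p_a = sum(site_a) / n  # alternate-allele frequency at site A
    p_b = sum(site_b) / n  # alternate-allele frequency at site B
    # Frequency of haplotypes carrying the alternate allele at both sites.
    p_ab = sum(1 for a, b in zip(site_a, site_b) if a == 1 and b == 1) / n

    d = p_ab - p_a * p_b  # coefficient of linkage disequilibrium, D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        # A monomorphic site leaves r^2 undefined; 0.0 is returned here
        # purely as a convention of this sketch.
        return 0.0
    return d * d / denom


if __name__ == "__main__":
    print(ld_r_squared([0, 0, 1, 1], [0, 0, 1, 1]))
    print(ld_r_squared([0, 0, 1, 1], [0, 1, 0, 1]))
```

Values near 1 indicate that the two alleles are almost always inherited together; values near 0 indicate that they occur independently.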
ISSN:1545-5963
1557-9964
DOI:10.1109/TCBB.2023.3291134