NAToRA, a relatedness-pruning method to minimize the loss of dataset size in genetic and omics analyses


Bibliographic Details
Published in Computational and structural biotechnology journal Vol. 20; pp. 1821 - 1828
Main Authors Leal, Thiago Peixoto, Furlan, Vinicius C, Gouveia, Mateus Henrique, Saraiva Duarte, Julia Maria, Fonseca, Pablo AS, Tou, Rafael, Scliar, Marilia de Oliveira, Araujo, Gilderlanio Santana de, Costa, Lucas F., Zolini, Camila, Peixoto, Maria Gabriela Campolina Diniz, Carvalho, Maria Raquel Santos, Lima-Costa, Maria Fernanda, Gilman, Robert H, Tarazona-Santos, Eduardo, Rodrigues, Maíra Ribeiro
Format Journal Article
Language English
Published Netherlands Elsevier B.V 01.01.2022
Research Network of Computational and Structural Biotechnology
Elsevier
Summary: Genetic and omics analyses frequently require independent observations, but independence is not guaranteed in real datasets. When relatedness cannot be accounted for, the usual solution is to remove related individuals (or observations), which reduces the available data. We developed a network-based relatedness-pruning method that minimizes dataset reduction while removing unwanted relationships from a dataset. It uses the node degree centrality metric to identify highly connected nodes (individuals) and implements heuristics that approximate the minimal reduction of a dataset, which allows its application to complex datasets. Compared with two other popular population genetics methods (PLINK and KING), NAToRA shows the best combination of removing all relatives while keeping the largest possible number of individuals in all datasets tested, with effects on the allele frequency spectrum and Principal Component Analysis similar to those of PLINK and KING. NAToRA is freely available both as a standalone tool that can be easily incorporated into a pipeline and as a graphical web tool that allows visualization of the relatedness networks. NAToRA also accepts a variety of relationship metrics as input, which facilitates its use. We also release the genealogy simulator software used for different tests performed in this study.
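The degree-centrality idea in the summary can be illustrated with a minimal sketch. This is not NAToRA's actual implementation: the function name `prune_related`, the input layout (tuples of two individual IDs and a kinship value), the threshold of 0.1, and the tie-breaking rule are all assumptions made for illustration. The sketch builds a network in which an edge joins two individuals whose kinship meets the threshold, then repeatedly removes the individual with the most remaining relatives until no related pair is left.

```python
from collections import defaultdict


def prune_related(pairs, threshold):
    """Greedy degree-based pruning (illustrative sketch, hypothetical API).

    pairs: iterable of (id_a, id_b, kinship) tuples.
    threshold: kinship value at or above which two individuals count as related.
    Returns the list of removed individuals, in removal order.
    """
    # Build an undirected relatedness network: one edge per related pair.
    neighbors = defaultdict(set)
    for a, b, kinship in pairs:
        if a != b and kinship >= threshold:
            neighbors[a].add(b)
            neighbors[b].add(a)

    removed = []
    while True:
        # Nodes that still have at least one relative in the network.
        candidates = [(len(nbrs), node) for node, nbrs in neighbors.items() if nbrs]
        if not candidates:
            break  # no related pairs remain
        max_degree = max(degree for degree, _ in candidates)
        # Assumption: ties are broken by the smallest ID, for determinism only.
        target = min(node for degree, node in candidates if degree == max_degree)
        # Remove the most connected individual and all of its edges.
        for other in neighbors.pop(target):
            neighbors[other].discard(target)
        removed.append(target)
    return removed


# Hypothetical example: A is related to B, C and D; B and C are unrelated.
example = [("A", "B", 0.25), ("A", "C", 0.25), ("A", "D", 0.25), ("B", "C", 0.01)]
print(prune_related(example, threshold=0.1))
```

In this example, removing the single hub individual breaks all three relationships, whereas removing B, C, and D one by one would cost three individuals. A greedy pass like this one only approximates the minimal removal set, which is in line with the summary's description of heuristics.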
ISSN:2001-0370
DOI:10.1016/j.csbj.2022.04.009